This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

How Programmers Debug, Revisited: An Information Foraging Theory Perspective

Joseph Lawrance1,2, Christopher Bogart2, Margaret Burnett2, Rachel Bellamy3, Kyle Rector2, Scott D. Fleming2

1 Wentworth Institute of Technology, [email protected]

2 Oregon State University, {bogart, burnett, rectorky, sdf}@eecs.oregonstate.edu

3 IBM TJ Watson Research Center, [email protected]

ABSTRACT

Many theories of human debugging rely on complex mental constructs that offer little practical advice to builders of software engineering tools. Although hypotheses are important in debugging, a theory of navigation adds more practical value to our understanding of how programmers debug. Therefore, in this paper, we reconsider how people go about debugging in large collections of source code using a modern programming environment. We present an information foraging theory of debugging that treats programmer navigation during debugging as being analogous to a predator following scent to find prey in the wild. The theory proposes that constructs of scent and topology provide enough information to describe and predict programmer navigation during debugging, without reference to mental states such as hypotheses. We investigate the scope of our theory through an empirical study of ten professional programmers debugging a real-world open source program. We found that the programmers' verbalizations far more often concerned scent-following than hypotheses. To evaluate the predictiveness of our theory, we created an executable model that predicted programmer navigation behavior more accurately than comparable models that did not consider information scent. Finally, we discuss the implications of our results for enhancing software engineering tools.

Author Keywords

Information foraging theory, debugging, software maintenance, programmer navigation, information scent, empirical software engineering

ACM Classification Keywords

D.2.5 [Software Engineering]: Testing and Debugging; H.1.2 [Information Systems]: User/Machine Systems - Human factors

1. INTRODUCTION

An advantage of some theories of programmer behavior is that they can be used to make predictions about how programmers will use software engineering tools. Such predictions can be used to guide choices about potential


Digital Object Identifier 10.1109/TSE.2010.111

0098-5589 © 2010 IEEE


benefits of new tool features, or can inspire new software engineering practices. In this way, software engineering researchers and tool developers can build on each other's work in a principled manner.

This paper presents a theory of programmer navigation when debugging. The programmer navigation aspect of debugging is important: recent research has shown that programmers spend 35% of their time navigating [Ko et al. 2006]. Older theories of program debugging (e.g., [Brooks 1983], [Vans and von Mayrhauser 1999]) do not explicitly consider navigation; instead, they rely primarily on in-the-head constructs, such as mental models and hypotheses, to explain the behavior of programmers during program comprehension and debugging. Although such theories revealed phenomena that have influenced software tools, they have not revealed concrete behavioral phenomena that tools can directly observe and respond to appropriately. Furthermore, these theories were mostly developed in an age in which programming environments were relatively simple. Modern programming environments provide a plethora of visualizations, clickable links, animations, and other aids, but the use of these devices is not accounted for by the older theories.

Rational Analysis [Anderson 1990] may hold the key to understanding programmer navigation during debugging in modern programming environments, with potential concrete implications for programming tools. It suggests a basis for theories of programmer behavior that do not involve knowledge of what is inside a programmer's head. Rational Analysis assumes that expert behavior is optimally adapted to the structure of the environment. It allows researchers to infer what a person adapted to an environment will do, subject to (1) the person's goals, (2) the costs and benefits of actions in the environment, and (3) a modest set of resource constraints known to apply to the brain's computational capacities.

Applied to debugging, it implies that experts will make the best possible navigational choices, given the information the environment makes available to them at each moment. Anderson has shown good results in this approach to human behavior involving problem solving, categorization, and causal inference.

Information foraging theory [Pirolli and Card 1999] is an example of a rational analysis that has emerged in the last decade as a way to explain how people seek, gather, and make use of information. Like all rational analyses, information foraging theory assumes that humans have evolved to be well adapted to the excessive information in the world around them, and that they behave accordingly in information spaces. The basic idea is that, given the plethora of irrelevant information in the environment, humans have evolved strategies to efficiently find information relevant to their needs without processing everything; in essence, minimizing the mental cost to achieve their goals.

Information foraging theory is based on optimal foraging theory, a theory of how predators and prey behave in the wild. Predators sniff for the prey, and follow the scent to the patch where the prey is likely to be. Applying these notions to the domain of information technology, predators (people in need of information) sniff for the prey (the information itself), and follow the scent through cues in the environment to the information patch that contains the prey. Information foraging theory has been shown to mathematically model which web pages human information foragers select on the web [Chi et al. 2001], and as a result has become extremely useful as a practical tool for web site design and evaluation [Chi et al. 2003], [Nielsen 2003], [Spool et al. 2004]. We believe that information foraging theory should apply to how programmers navigate when searching for and fixing a bug as well.
That is, we propose that if a bug is treated as the “prey,” words in the environment and the source code are treated as “cues” that suggest the “scent” of prey, and modern programming environments’ navigational affordances (such as the ability to mouse over a method call to see its definition) are treated as “topology,”



then the theory of information foraging can be mapped to the domain of debugging. (We detail this mapping in Section 2.) If information foraging theory applies to programmers' debugging behavior, it has the potential to provide a parsimonious and easily understood model to guide the efforts of tool builders. Thus, a tool designer would have a theoretically well-grounded way of evaluating design choices about the navigational devices and cues in a proposed software-debugging tool. (In the conclusion of this paper, we briefly suggest a few such implications; [Scaffidi et al. 2010] provides a more in-depth treatment of the implications for tool design.)

In this paper, we present a theory of programmer navigation during debugging based on information foraging theory. Consistent with Sjøberg et al.'s theory-building process [Sjøberg et al. 2008], we define the constructs and propositions of the theory; we empirically investigate the scope of the theory (i.e., the circumstances in which the theory is applicable); and we empirically evaluate the theory's predictive power.1 To perform the evaluation, we developed an executable model, PFIS, which operationalizes the theory constructs and can predict where expert programmers navigate during debugging [Lawrance et al. 2008b].2 As a point of comparison, we also explored a variant of our theory that bases predictions on programmer hypotheses, rather than scent.
The contributions of this paper are: (1) a theory of information foraging for programmer navigation behavior while debugging; (2) an analysis of the relationship between information foraging theory for debugging and other theories of debugging in the software engineering literature; (3) an empirical comparison of the prevalence of scent-following versus (non-scent) hypothesis processing in debugging; (4) a detailed empirical analysis of how information foraging activity pertains to six debugging "modes"; (5) an empirical analysis of which artifacts are needed most by tools attempting to capitalize on information foraging theory; and (6) an empirical evaluation of PFIS as a predictor of where programmers navigate.

2. AN INFORMATION FORAGING THEORY OF PROGRAMMER BEHAVIOR DURING DEBUGGING

A central aspect of developing a theory is choosing the right constructs [Sjøberg et al. 2008]. A small number of constructs may improve the theory's parsimony; however, too few or overly simple constructs may limit the theory's explanatory power or scope. Information foraging theory [Pirolli and Card 1999] defines a manageably small set of intuitive constructs that has demonstrated utility for explaining and predicting how humans navigate web pages [Chi et al. 2001]. To handle the realm of debugging, we refined the original information foraging constructs as follows:

• Predator: The programmer who is debugging.

• Prey: What the programmer seeks to know to reveal the changes that must be made to fix the bug. Furthermore, any information that the programmer seeks to achieve the goal also constitutes a form of prey.

• Information patches: Localities in the source code, related documents, and displays that may contain prey.

• Proximal cues: Words, objects, and perceptible runtime behaviors in the programming environment that suggest scent relative to the distal prey. Cues act as signposts to prey. For example, words in the source code, including comments, constitute a type of cue.

1 The empirical evaluations (described in Section 5) are original and unique to this paper, but are based on previously collected navigation data and think-aloud protocols (see Section 4 and [Lawrance et al. 2007]).

2 PFIS's implementation is described in Appendix A and in [Lawrance et al. 2008b].



• Information scent: The perceived likelihood of a cue leading to prey, either directly or indirectly. Scent is a measure, and scent from one cue can […]

[…] To estimate information scent, we compute word similarity between proximal cues in the source code and the text of the bug report. First, we stem words (using the Porter stemmer3) and remove stop words (e.g., "the," "to," "be," "or," and "not"). Second, we weight terms in files of source code according to the commonly used tf-idf formula [Baeza-Yates et al. 1999], which we compute as follows:

$w_{i,j} = f_{i,j} \times idf_i$

where

$f_{i,j} = \dfrac{freq_{i,j}}{\max_v freq_{v,j}}$

and

$idf_i = \log \dfrac{N}{n_i}$

Here, $f_{i,j}$ is the frequency of word i in document j (normalized with respect to the most frequently occurring word v in the document), $idf_i$ is the inverse document frequency (N being the total number of documents and $n_i$ the number of documents containing word i), and $w_{i,j}$ is the weight of word i in document $d_j$. Third, we compute the inter-word correlation between proximal cues in source files and the text of a bug report using cosine similarity, a measure commonly used in information retrieval systems [Baeza-Yates et al. 1999].
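To make the weighting scheme concrete, the tf-idf and cosine-similarity steps can be sketched in code. This is a minimal illustration only, not the paper's implementation (PFIS delegated word similarity to Apache Lucene); the function names and the toy word lists in the usage example are invented.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """Compute w[i,j] = f(i,j) * idf(i) for each word i in each document j.

    f(i,j) is the term frequency normalized by the most frequent word in
    document j; idf(i) = log(N / n_i), where N is the number of documents
    and n_i is the number of documents containing word i.
    """
    n_docs = len(docs)
    doc_freq = Counter()               # n_i: how many documents contain word i
    for words in docs:
        doc_freq.update(set(words))
    weights = []
    for words in docs:
        freq = Counter(words)
        max_freq = max(freq.values())  # frequency of the most common word v
        weights.append({word: (count / max_freq) * math.log(n_docs / doc_freq[word])
                        for word, count in freq.items()})
    return weights

def cosine_similarity(w_doc, w_query):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * w_query.get(word, 0.0) for word, w in w_doc.items())
    norm_d = math.sqrt(sum(w * w for w in w_doc.values()))
    norm_q = math.sqrt(sum(w * w for w in w_query.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)

# Usage sketch: treat the bug report as one more "document" so its terms
# receive weights, then score each source file against it.
ws = tf_idf_weights([["feed", "title", "decode", "feed"],   # source file A
                     ["date", "parse", "date"],             # source file B
                     ["feed", "decode"]])                   # bug report text
s_match = cosine_similarity(ws[0], ws[2])
s_nomatch = cosine_similarity(ws[1], ws[2])
```

Ranking files by their similarity to the bug report then gives a scent ordering: file A, which shares "feed" and "decode" with the report, scores above file B, which shares nothing.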

We compute cosine similarity as follows:

$sim(d_j, q) = \dfrac{\sum_i w_{i,j} \, w_{i,q}}{\sqrt{\sum_i w_{i,j}^2} \; \sqrt{\sum_i w_{i,q}^2}}$

where $w_{i,q}$ is the weight of word i in the bug report text (i.e., the query in the terminology of [Baeza-Yates et al. 1999]).

To evaluate the predictive power of the theory, we developed an executable model, PFIS (Programmer Flow by Information Scent), based on the above operationalization. We introduced PFIS in previous work [Lawrance et al. 2008b]. To implement PFIS's approximation of scent, we leveraged the Apache Lucene search-engine library4, which provides functionality for computing word similarity. To make predictions, PFIS propagates its computation of scent throughout the foraging environment (i.e., program source code). It starts by gathering topological information, then calculates the scent of each procedure/method's cues. To simulate the proportion of programmers that will follow each link in the topology, PFIS propagates the scent through the topology using the spreading activation technique [Anderson 1983], [Crestani 1997]. Appendix A provides the details of the spreading-activation algorithm. In Sections 4–6, we describe a study in which we applied PFIS to test the theory's predictive power.

3. RELATIONSHIP OF INFORMATION FORAGING TO OTHER THEORIES OF DEBUGGING

Some researchers have proposed models of cognition during debugging, and have suggested ways that these models might influence navigation. Theories of debugging have evolved in parallel with changes to the task of debugging itself. Broadly speaking, earlier theories (e.g., [Brooks 1983], [Letovsky 1986]) tended to focus on a person


3 Porter stemming is simple and efficient, but, like all stemming algorithms, it can stem words erroneously (i.e., producing different stems for different forms of the same word, or producing the same stem for different words) [Baeza-Yates et al. 1999].

4 http://lucene.apache.org/



reading programs and forming hypotheses in his or her head, until he or she found a fix. Later theories (e.g., [Ko et al. 2006], [Romero et al. 2007]) depict a process of gathering and organizing information; the trend is from largely in-the-head debugging towards a distributed-cognition perspective, which says that some cognition is "in the world" as a replacement for being in the head [Hollan et al. 2000].

Brooks proposed a top-down theory of program comprehension in which a hierarchy of hypotheses drives comprehension [Brooks 1983]. According to Brooks, high-level hypotheses are notions about how the whole of the code might be structured. The pursuit of these high-level hypotheses spawns lower-level hypotheses about parts or aspects of the program. The programmer can validate the most specific, lowest-level hypotheses by inspecting specific lines of code. Brooks observed several patterns in programmers' hypothesis-processing behavior:

• Programmers generally do not give hypotheses names or named components; their hypotheses are "just descriptions of the components in terms of the functions they perform."

• In Brooks's theory, the "primary hypothesis"—that is, the top-level one about the structure of the code—is "global and non-specific," and thus "the programmer will almost always find it impossible to verify directly against the program code." Instead, it prompts "construction of subsidiary hypotheses," with the lowest-level ones "adding specific detail and concreteness." This suggests that the high-level hypotheses will be the least likely to match up directly with the code.

• "Information collection during the search will be broad, rather than focused only on the hypothesis at hand." This observation suggests that a programmer's current hypothesis is unlikely to directly correspond to the code that the programmer visits during debugging.

Brooks’ theory also helps us understand why scent may be a more useful construct to use as a predictor of code navigation. In his theory, he notes that the most concrete hypotheses are verified by the identification of “beacons”: “sets of features that typically indicate the occurrence of certain structures or operations within the code.” For example, a swap operation within nested DO loops is a possible beacon for a sort operation. Other examples of beacons include variable and other names used in code. Beacons are closely related to cues in information foraging theory. The programmer can register scent from beacons because beacons relate the code to some notion the programmer has in his or her head (e.g., a hypothesis about what the code represents). Letovsky proposed a cognitive model that involves both bottom-up and top-down hypothesis5 formation [Letovsky 1986]. His work looked at programmers’ initial phase of comprehension during a small-scale change task. He described what and why hypotheses (and questions) as drawing connections bottom-up from the program code (“implementation”) to the program's domain (“specification”); and how hypotheses (and questions) as going top-down. Letovsky claimed that these bottom-up hypotheses trigger a search through code or documentation, or a reasoning process within the mental model. Moreover, programmers tend to “pick a method which is likely to yield an answer with little effort.” Unlike Brooks’ top-down hypotheses, verbalizations of Letovsky’s bottom-up hypotheses might contain words found in the code itself, rather than in a bug report. However, the bottom-up hypotheses seem unlikely to be useful as a predictor of code navigation, as these hypotheses occur after a programmer has already navigated to a closely related place in the code.

5 Letovsky used the term conjecture.



Program comprehension is one aspect of debugging; however, there are other aspects to consider. Katz and Anderson conducted experiments examining students' LISP-debugging strategies [Katz and Anderson 1988]. The study revealed four distinct debugging subtasks: comprehension, testing, locating the component containing the error, and repairing the component. The students had the most difficulty with locating the component that contained the error. The study also found three general strategies for locating the buggy component: mapping directly from program behavior to the bug, causal reasoning, and hand-simulation. The researchers speculated that causal reasoning becomes less frequent over time, as learners build up a set of familiar problem situations and remedies. The study also found that students engaged in more causal reasoning when debugging their own code as opposed to someone else's. Although the Katz and Anderson study has been criticized because it assumed that program comprehension precedes debugging [Gilmore 1991], subsequent studies, which made no such assumption, corroborated the finding that programmers use these debugging strategies [Romero et al. 2007].

We have already pointed out that the differences between today's environments and those of the early studies suggest that we must tread carefully in generalizing the results of these studies to programming today. Furthermore, the programming tasks studied by Brooks, Letovsky, and Katz and Anderson were very different from those typically encountered by today's professional programmers. Brooks and Letovsky developed their theories using short programs presented on paper. When working with a short program on paper, a programmer could attempt to completely comprehend the whole program before making changes. In fact, navigation in the paper medium was fairly costly because no navigation tools were provided.
But today, professional programmers must deal with large source-code libraries containing thousands of lines of code. For such large programs, the only realistic strategy is to focus on relevant parts of the code and to navigate among them. Today's programming environments include affordances and tools, such as the ubiquitous find utility, that aim to support these capabilities.

Contemporary software-development environments have been found to affect debugging strategies. In an experiment using such environments, Romero et al. identified an additional forward-reasoning strategy that they termed following execution [Romero et al. 2007]. This strategy involves step-by-step tracing of the execution for one particular input example, and observing which lines of code change the data and affect the program's output. Such a strategy requires environmental support, and was not possible in the paper-based experiments of earlier years. Thus, the richness and flexibility of modern development environments make it possible for a programmer to substitute informational manipulation for some cognitively intensive activities. For instance, a programmer can use a debugger's stepping functionality when following execution rather than performing hand simulations and calculating results by hand.

The large size of today's programs also seems to impact debugging strategies; in fact, we believe today's large programs necessitate that programmers use foraging strategies in debugging. To reduce the cost in time and effort of debugging a large program, we conjecture that programmers choose to understand only parts of the program, accepting the risk that incomplete comprehension might lead to an incorrect fix. When following such a strategy, programmers may have to ignore many bottom-up hypotheses. We expect that in programming situations involving large programs and limited time, programmers tend to ignore hypotheses that are not related to the bug report.
Instead, the programmers follow scents that involve words related, directly or indirectly, to the bug report. Our theory is based on this premise.



Ko et al. have also argued the importance of seeking information as a central mechanism in debugging behavior [Ko et al. 2006]. They propose a model of program understanding during software maintenance in which the strategy chosen depends on seeking relevant information in the environment. Their model consists of three stages: seeking, relating, and collecting information. Seeking involves looking through the code for information relevant to the task at hand. Relating involves connecting the different regions of code to each other to see how they relate. Collecting involves the designation of some of these regions as relevant to the task.

Examples of relating and collecting include programmers repeatedly returning to regions of code they had visited before, and using affordances of the programming environment to speed up that repeated navigation, for example, by creating bookmarks, leaving scrollbars strategically placed, and keeping multiple tabs or windows open for fast switching. Actions such as these allow the programmer to improve the density of useful material in a patch, or to reduce the cost of navigating between patches. In turn, these actions change the cost/benefit trade-offs of different debugging strategies. Recall that information foraging theory refers to such actions as enrichment. Through enrichment, the information seeker can deliberately modify the environment to either improve the density of useful material in a patch, or speed travel between patches.

4. EMPIRICAL STUDY

This study investigates the use of information foraging theory to explain and predict the navigation behavior of programmers performing debugging. To better understand and evaluate the theory in this context, we compared it with a variant based on programmers' hypothesis processing, rather than scent following. In conducting the study, we addressed four primary research questions.
The first research question explored the relationship between scent-following and hypothesis-processing behavior:

• RQ1: The literature emphasizes the importance of programmer hypotheses in understanding debugging behavior. Is there reason to expect that a scent-based model of debugging navigation will succeed if it does not explicitly handle programmers' hypothesis processing?

The second and third research questions served to elaborate the scope of the theory:

• RQ2: Debugging involves a variety of activities, such as locating the fault, fixing the fault, and verifying the fix. When do developers engage in scent and hypothesis processing with respect to these activities?

• RQ3: Programmers navigate through a variety of artifacts, such as source code, documentation, and email. Where do programmers navigate during debugging with respect to these artifacts, and in which artifacts do programmers exhibit scent seeking and hypothesis processing?

The fourth question evaluated the predictive power of the theory:

• RQ4: Is the PFIS approach to predicting programmer navigation significantly more accurate than an approach based on programmers' stated information needs and hypotheses?

To answer these questions, we made detailed observations of programmers engaged in debugging. We analyzed verbal protocols collected from our participants, focusing on what those verbalizations could tell us about the roles of hypotheses and scent in our theory and in operational derivatives of it such as PFIS.
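PFIS's propagation of scent over the call topology (Section 2) can be illustrated with a minimal spreading-activation sketch. The update rule shown here, in which each node passes a decayed, evenly split share of its activation to its neighbors, is one common simple form of spreading activation, not necessarily PFIS's exact algorithm (which is given in Appendix A of the paper); the three-method topology, decay constant, and initial activations below are invented for the example.

```python
def spread_activation(graph, activation, decay=0.85, iterations=2):
    """Propagate activation (scent) through a topology.

    graph: adjacency dict mapping each node to the nodes it links to
           (e.g., methods linked by call relationships).
    activation: initial activation per node (e.g., each method's word
           similarity to the bug report).
    Each round, every node sends decay * activation / out-degree to each
    neighbor; returns the accumulated activation after `iterations` rounds.
    """
    act = dict(activation)
    for _ in range(iterations):
        incoming = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                continue
            share = decay * act.get(node, 0.0) / len(neighbors)
            for neighbor in neighbors:
                incoming[neighbor] += share
        for node in graph:
            act[node] = act.get(node, 0.0) + incoming[node]
    return act

# Usage sketch: a method called by a high-scent method gains activation,
# while an unrelated method does not. (Method names are hypothetical.)
graph = {"parseFeed": ["decodeTitle"], "decodeTitle": [], "showPrefs": []}
scent = {"parseFeed": 1.0, "decodeTitle": 0.0, "showPrefs": 0.0}
out = spread_activation(graph, scent, decay=0.5, iterations=1)
```

Under a model like this, the resulting activations can be read as the predicted proportion of programmers likely to navigate to each location next.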



4.1 Design, Participants, and Materials

The study design used the think-aloud method [Ericsson and Simon 1993]. Consistent with the method, we observed participants performing programming tasks and prompted them to "think aloud" as they worked. We recorded video of their sessions as well as a log of their actions. Using a grounded-theory approach [Corbin and Strauss 2008], we performed an in-depth qualitative analysis of their words and actions. Furthermore, we quantitatively analyzed the action logs to evaluate our model's predictions.

We recruited twelve professional programmers from IBM to participate in the study. All participants had at least two years' experience programming in Java, used Java for the majority of their software development, and were familiar with Eclipse6, a standard Java IDE, and with bug-tracking tools, such as Bugzilla7.

As the subject program for our study, we selected RSSOwl8, an open-source news-feed reader that is one of the most actively maintained and downloaded projects hosted at Sourceforge.net. The program met several key criteria: it was a real-world program; we had access to its source code and bug reports; it was written in Java; it was sufficiently large to allow enough navigation for useful analysis; and it was editable and executable through Eclipse. RSSOwl's source code consisted of 193 class files (80,520 lines of code). The popularity of newsreaders and the similarity of its user interface to email clients helped ensure that our participants would understand the functionality and interface after a brief introduction, and that they could quickly begin using and testing the program.

As tasks for our participants, we selected two bug reports from RSSOwl's bug-tracking database. We were more interested in source-code navigation than the actual bug fixes; therefore, we wanted to ensure that the issues could not be solved within the duration of the session.
We also decided that one bug should be about a code-level defect, whereas the other bug should be a specification-level defect. The code-level defect involved a broken feature: bug #1458101: "HTML entities in titles of atom items not decoded." The specification-level defect involved a missing feature: bug #1398345: "Remove Feed Items Based on Age." We refer to the first bug as BF ("Broken Feature") and the second as MF ("Missing Feature"). Although MF might be a less traditional bug than BF (e.g., MF could be characterized as a feature request), this kind of bug report is common. Including it in our analysis adds diversity to the tasks studied and thus improves our ability to assess the general applicability of our findings.

These two bugs were still open when we ran the study—that is, no RSSOwl developers had fixed them yet. We considered looking at closed bugs with solutions we could examine; however, we would have been forced to restrict participants' use of the web to ensure they did not find the existing solutions. Such a restriction would have diminished the realism of the task context, so we went with open bugs to preserve realism. Fortunately, the bugs remained open for the duration of the study.

4.2 Procedure

Initially, participants filled out paperwork, and we briefly described RSSOwl. Next, we individually instructed them to find and fix the defects described in the bug reports. We set up an instant messenger client so that participants could contact us, then we excused ourselves from the room. Each participant worked on both bugs. We also

6 http://www.eclipse.org/

7 http://www.bugzilla.org/

8 http://www.rssowl.org/



asked participants to "think aloud" as they worked. If a participant fell silent for an extended period of time, we prompted the participant over instant messenger to "please, keep talking." We counterbalanced the ordering of tasks among the participants to control for learning effects. We met with each participant for three hours, and observed each participant remotely for two hours.9 We allowed each participant to spend only one hour per bug. We recorded synchronized audio, video, and screen captures, which we were able to replay together, allowing us to hear what participants said, while observing their actions and the screens they were viewing. We also logged their events, and archived any changes they made to the program. The electronic transcripts of their actions, videos with screen captures, and source code served as the data sources we used in our analysis.

5. ANALYSIS METHODOLOGY

We discarded the data for two of the twelve participants: one was used as a pilot, and the data for the other was unreadable (a recording-tool failure). Our analyses of the verbal protocol data comprised three stages. In the first stage, we categorized the participants' explicit verbalizations about hypothesis processing and scent following. In the second stage, we categorized the artifacts that participants were looking at when they made certain types of statements. In the third stage, we performed a fine-grained analysis of two of the participants' verbalizations and videos. We describe the details of each analysis below.

For the first stage of analysis, categorizing verbalizations, we coded all ten participants' verbal protocols in the following manner. Initially, four researchers developed codes while jointly coding two of the protocols. After the four researchers cooperatively refined the initial code sets and developed norms about how to apply them, two researchers coded the remaining protocols, working independently and then checking for agreement.
Thus, at least two researchers coded every protocol. The total rate of agreement was 97%, which indicates extremely high coding reliability. The reliability was even higher (99%) when we excluded from the calculations the code “no code” (i.e., verbalizations deemed by the coders to be neither about hypotheses nor scent). For hypotheses, we coded when participants formed hypotheses, modified hypotheses, confirmed hypotheses, or abandoned hypotheses. We used the hypothesis codes on verbalizations that were explicitly about the part of the code or application behavior implicated in the bug or the fix. The first four rows of Table 1 define these hypothesis codes. For scent, we coded participants’ statements about what scent to look for, when they gained scent, and when they lost scent. Only three codes were needed here, instead of four, because we did not try to discriminate between scents that were and were not variants of previous scents sought. The last three rows of Table 1 define these codes.
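The agreement figures above are rates of simple percent agreement between two coders. A minimal sketch of that calculation follows; the coder sequences in the usage example are invented, and the exact rule used for excluding "no code" segments is an assumption (here, pairs where both coders assigned "no code" are dropped).

```python
def percent_agreement(codes_a, codes_b, exclude=None):
    """Fraction of segments to which two coders assigned the same code.

    If `exclude` is given, segments that BOTH coders marked with that
    code (e.g., "no code") are dropped before computing the rate.
    """
    assert len(codes_a) == len(codes_b), "coders must code the same segments"
    pairs = list(zip(codes_a, codes_b))
    if exclude is not None:
        pairs = [(a, b) for a, b in pairs if not (a == exclude and b == exclude)]
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

# Usage sketch with invented coder output over four verbalizations:
coder1 = ["scent-gained", "no code", "hypothesis-start", "scent-lost"]
coder2 = ["scent-gained", "no code", "hypothesis-start", "scent-to-seek"]
overall = percent_agreement(coder1, coder2)                      # 3 of 4 match
filtered = percent_agreement(coder1, coder2, exclude="no code")  # 2 of 3 match
```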

9 We devoted one hour to overhead: paperwork, briefing participants, a break between issues, and debriefing.


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

Code | Definition | Example
hypothesis-start | Suggesting what part of the code or application behavior is implicated in the bug. | "So it's kind of…the apostrophes are not making it in."
hypothesis-modify | Changing or refining a hypothesis. | "So this newsfeedtitle string data value needs to be escaped."
hypothesis-confirm | Deciding a hypothesis was correct. | "So that's definitely the right spot."
hypothesis-abandon | Deciding a hypothesis was not correct. | "And actually, now we don't need to figure out that it's valid."
scent-to-seek | Stating a need for a certain kind of information. | "So I need to find something about feeds."
scent-gained | Finding a scent (that may or may not have been sought). | "Let's figure out…oh, here's gui i18n translation."
scent-lost | Losing a scent. | "This is all just date stuff."

Table 1. Main codes identifying hypotheses and scents, and examples of participant verbalizations.

For the second stage, categorizing artifacts, we derived a set of "trigger" codes by identifying the artifact in use when a participant decided to pursue a hypothesis (hypothesis-start) or a scent (scent-to-seek). (We call them "triggers" because we considered these artifacts as potentially triggering the participants' decisions to pursue hypotheses or scent.) Table 2 lists these trigger codes. Because the trigger codes did not require interpretation of the videos, we used a simpler coding process than on the main code set. We devised the codes by simply enumerating the artifacts/assets we expected (the first six entries in the table). Coding of the first two videos identified two more categories (the Debugger-menus and Input-file entries). We also allowed "other" in case a coder ran across any more, but there were very few uses of that code. We tested the scheme's robustness by having two researchers code two participants for each of the bugs (i.e., 20% of the data set). After reaching an agreement of 85%, which indicated that the codes were robust, one of the researchers applied the verified codes to the remaining videos.

Code | Definition
Bug-text | Looking at the bug report.
Runtime | Looking at runtime behavior.
Source-code | Looking at source code.
UI-static-inspection | Looking at the application's UI "statically" to identify pieces of functionality, the structure of the application's UI, etc. (as opposed to its runtime behavior).
Web | Browsing or searching web pages.
Other | Looking at anything else, e.g., looking at portions of the Eclipse UI itself.
Debugger-menus | Looking at debugger menus. (Debugger menus enumerate data field names, etc.)
Input-file | Looking at the input file, e.g., "So in an xml document, we have some title, we have some html entities…"

Table 2. Artifacts at which participants were looking when they expressed hypothesis-start or scent-to-seek verbalizations.

The third stage of analysis was very fine grained, so we restricted it to two participants' performance of both tasks (i.e., four tasks in total). The fine-grained analysis had two goals: to scrutinize the videos closely to identify instances of hypotheses and scent that were not apparent from the verbalizations alone, and to characterize in context


Figure 1. Thumbnails of each participant's hypothesis (dark blue) and scent (light orange) verbalizations over time. Rows correspond to participants; the left column shows Bug BF sessions and the right column Bug MF sessions. The x-axes are time (0 to 70+ minutes, in 5-minute intervals); the y-axes denote counts of verbalizations of each type (0 to 18).

why and when participants' hypotheses and scent following occurred. Since such fine-grained qualitative analyses are …, we selected just four tasks. Our criteria were: first, that we were interested in …; second, that task selection should be paired (i.e., two tasks from each of two participants); and third, that participants must have performed tasks in different orders. We thus selected one participant (85) for whom our first stage of analysis revealed that hypothesis activity and scent activity peaks seemed to coincide. In particular, of all the sessions, participant 85's Bug BF session had the highest correlation between hypothesis and scent counts, aggregated over 5-minute periods (r(16)=0.7061; see Figure 1). Participant 85's Bug MF session had one of the lowest correlations (r(16)=-0.0825), so it provided a good contrast. For the second participant we chose 98, because of the large distance between the hypothesis and scent curves in Figure 1 (a total count of … more scents than hypotheses for Bug BF) and because Participant 98 worked on BF first, whereas 85 worked on MF first. For the two participants, three of the researchers watched the videos, taking into account what the participants said their goals were, what actions the participants took, how the participants interacted with artifacts, and what voice inflections and facial expressions … watching the videos as a group and discussing each segment of video in detail. The three researchers worked through three of the four tasks, and two researchers finished up the remaining task.

6. RESULTS

In this section, we present empirical analyses aimed at building and evaluating our theory of information foraging during debugging. Specifically, we first analyzed the relationship between information-foraging navigation behavior and verbalized hypothesis-processing behavior (RQ1); second, we analyzed when information-foraging and


hypothesis-processing behavior occurred by considering each of the six debugging subtasks (or modes) that arose in our participants' data (RQ2); third, we analyzed where the participants sought scent by considering their interactions with various artifacts (RQ3); and fourth, we evaluated the predictive power of the theory by comparing the predictions produced by our PFIS model with those produced by variant models, including one based on hypotheses (RQ4). We did not analyze whether participants' utterances were "correct" in any way.

6.1 The Relationship Between Scent and Hypotheses

The literature has long held that the debugging behavior of programmers is driven by hypothesis processing. We have reason to believe that our theory based on information foraging sufficiently accounts for hypothesis processing to explain and predict programmer navigation during debugging. In particular, we hypothesize that two conditions hold: (1) some hypothesis content closely parallels scent content and therefore is accounted for by our theory, and (2) the remaining amount of hypothesis-based processing (i.e., that is unaccounted for by our theory) is relatively small compared to the amount of scent processing. In this section, we check for these conditions with a qualitative analysis of all ten participants' scent-following and hypothesis-processing behavior.

Regarding the first condition, we found that when there was a relationship between hypotheses and scent, it often reflected a bottom-up hypothesis modification that was based on a scent gained in the code, or sometimes reflected a hypothesis that led top-down to a new scent to look for. For example, after working for some time on Bug BF, Participant 82 gained scent from getTitle:

82: "GetTitle, there we go!"

followed by a hypothesis modification: "so this is the problem…we need to turn this into HTML," followed directly by a scent to seek: "so now the question is how do we test this to see if it's HTML?"

Note that not all scent processing verbalizations led to hypotheses. There were numerous examples of scent verbalizations without hypothesis verbalizations. For example, Participant 82's early work on Bug BF involved scent alone as he tried to gain a better understanding of the way RSSOwl was organized:

82: "GUI, where is that? Okay, GUI RSSOwl tab folder. So is it something in this directory?"

As another example, Participant 99 discovered the class "entity resolver" as he browsed through code while working on Bug BF. This discovery triggered a scent-seeking diversion unrelated to any expression of hypotheses, starting with the statement:

99: "I have no idea what an entity resolver is."

These results make clear that although some instances of hypothesis processing and scent processing were intertwined (and therefore are accounted for explicitly by our model), not all of them were intertwined.

Regarding the second condition, whether the remaining amount of hypothesis processing is relatively small compared to the amount of scent processing, we found that this was indeed the case for our participants. Table 3 compares the numbers and patterns of hypothesis and scent verbalizations. We found strong evidence that verbalizations about scent occurred more often than verbalizations about hypotheses (paired t-test of log of counts: p<.000004, df=9, mean of 4.14 times as many scents as hypotheses). This finding not only held in the aggregate; it was true of each individual participant for both issues. Figure 1 depicts a profile of each participant's hypothesis and scent verbalizations over time.
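The two statistics used in this analysis, a per-session Pearson correlation of hypothesis and scent counts aggregated over 5-minute periods, and a paired t-test on log-transformed per-participant counts, can be sketched as below. The session length, event timestamps, and counts are invented stand-ins for illustration, not the study's data.

```python
from math import log, sqrt

def bin_counts(times_min, session_min=90, width=5):
    """Count events per `width`-minute interval.

    A 90-minute session yields 18 bins, so the correlation over bins
    has df = 18 - 2 = 16, i.e., an r(16) statistic.
    """
    n_bins = session_min // width
    counts = [0] * n_bins
    for t in times_min:
        counts[min(int(t // width), n_bins - 1)] += 1
    return counts

def pearson_r(x, y):
    """Pearson correlation of two equal-length count sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def paired_t_on_logs(scents, hypos):
    """Paired t statistic on log-transformed counts; df = n - 1."""
    diffs = [log(s) - log(h) for s, h in zip(scents, hypos)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / sqrt(var / n), n - 1

# Illustrative event times (minutes into one session):
hypo_times = [3, 8, 9, 22, 24, 41]
scent_times = [2, 7, 10, 11, 21, 25, 26, 40, 43, 44]
r = pearson_r(bin_counts(hypo_times), bin_counts(scent_times))

# Illustrative per-participant totals across ten participants:
scent_counts = [41, 35, 52, 60, 44, 38, 47, 55, 39, 49]
hypo_counts = [10, 9, 12, 15, 11, 8, 13, 14, 9, 12]
t, df = paired_t_on_logs(scent_counts, hypo_counts)
print(r, t, df)
```

Taking logs before the paired t-test turns the per-participant scent/hypothesis ratio into a difference, so the mean difference corresponds to the geometric-mean ratio reported in the text (about 4.14 in the study).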


In fact, the participants expressed few new hypotheses; sometimes only one or two original hypotheses per bug. Half the hypothesis starts were later followed by hypothesis abandons. Interestingly, participants confirmed hypotheses only when working on Bug BF. For both Bug BF and Bug MF, participants modified more hypotheses than they started (paired t-test of log counts: BF: p<…, including one participant who started no hypotheses; MF: p<…) … following activity over hypothesis-oriented activity. (Eisenstadt's work is one of the few classic works in which the researchers did not have a hypothesis-oriented focus when investigating how people debug … activity in which "informants may have had a rough idea of what they were looking for" (emphasis added), but were not explicitly testing any hypotheses in a systematic way …) … information foraging. In particular, we analyzed our data to see when in their debugging activity programmers followed scent and when they worked on hypotheses. We checked whether scent following pervaded all kinds of debugging activities and how it compared to hypothesis activity during different modes of debugging.

To consider this question, we performed a fine-grained analysis of Participant 85 and Participant 98, as explained in Section 5. For this fine-grained analysis, we watched video recordings of the sessions in order to take into account not only the participants' explicit words, but also their actions, facial expressions, and so on. This led to the identification of additional instances of both hypotheses and scent that had not been evident when coding directly from the transcripts. The most prevalent of the additional instances were scent following through use of the pop-up method documentation (in tool-tip form). Whereas in older programming environments, looking up documentation for a method would have required opening another file and scrolling around, in Eclipse a mere mouse-over quickly brings up documentation. Our participants rarely even mentioned this functionality in their words, but used it extensively in their actions to follow up on scents. We did not use these actions in the verbalization-based analyses, which were more conservative because they relied solely on utterances in which participants made their intent explicit. However, the data from both types of analyses were consistent: although the fine-grained analysis increased the raw counts of both hypotheses and scent, it did not change the relative proportions of hypotheses to scent.

We categorized the major contexts in these participants' videos into six categories that emerged from the data. These contexts were major activities, covering almost the entire data set, and we therefore characterize them as debugging modes. The modes, which are summarized in Table 4, were:

(1) Mapping mode: Program understanding is widely understood to be an important part of debugging (e.g., [Ko et al. 2005], [Nanja and Cook 1987], [Vans and von Mayrhauser 1999]). The type of program understanding that stood out in our participants' data was their attempts to build a mental model ("map") of the program. Both participants started off by mapping, not only in their first task, but also in their second one. This mode was characterized by participants scanning proximal cues, noticing beacons [Brooks 1983], and taking inventory of the information patches. Participant 98 summarized this mode succinctly when he said:

98: "Let me see what the project is made out of."

We believe that this mode was necessary to even begin to follow scent, because it provided the information participants would need in later modes to interpret proximal cues' scent, meaning, and relative importance.

Mode | Description
Mapping | Getting a whole-program overview.
Drill-down mapping | Getting a detailed understanding of one particular portion of the code.
Observing the failure | Bug BF: Trying to witness the bug or misbehavior at runtime. Bug MF: Running the application to see where and how the new feature would manifest itself at runtime.
Locating the fault | Trying to find the location in the code that needs to be changed.
Fixing the fault | Bug BF: Fixing the bug. Bug MF: Adding the new feature, including necessary refactoring.
Verifying the fix | Evaluating the changes just made to determine if the fix was a good solution.

Table 4. The six primary modes observed in the bug and feature tasks.


(2) Drill-down mapping mode: Participants worked harder to build a detailed mental model of some patches in the source code than others, drilling down into the details, and we termed those periods of time as "drill-down mapping." For example, during this mode, Participant 98 looked at constructors, reading the detailed comments and trying to figure out exactly what happened when a new instance was created. As with mapping, both participants used drill-down mapping in both tasks.

(3) Observe-the-failure mode: It is common in debugging for programmers to try to replicate the failure, and we termed this mode "observe the failure." (Eisenstadt also reported this mode in his "gather data" category [Eisenstadt 1993].) In our study, an example failure had been included in Bug BF's bug report, but detailed instructions for actually making the code fail were not present, and we noticed that some participants invested a great deal of effort in figuring out how to reproduce the failure so as to observe the program carefully. In our fine-grained analysis, this occurred in three of the four task instances we analyzed. Observing the failure included not only figuring out exactly what kind of data to provide to make the failure happen, but also figuring out where to install breakpoints and the like. From an information foraging perspective, at this point the (interim) prey was the failure, not the fault. For example, when trying to get the software to display the garbled character described in Bug BF, Participant 98 said:

98: "I'm looking at CrookedTimber and I click on some of the items. I'm not seeing any examples of what they are talking about in the bug."

(4) Locate-the-fault mode: We termed the participant to be in the "locate the fault" mode when he or she was currently seeking the location of the fault in the code. In the case of Bug MF, we assigned this mode when the prey was the "hook" in the code where changes should be made. Participant 98 was in drill-down mode when he found a properties page, and went into locate-the-fault mode as he started exploring the code with the idea that he should copy some code and use it as a hook to add the requested feature:

98: "So this looks good; this is a properties page. Is there a view tab? Here's a view one. Yeah. Which one of these is different? I would have to add one of those."

(5) Fix-the-fault mode: Once a participant located the fault, it was not always obvious how exactly to fix it. In fact, for three of the four task instances in which a participant got to this stage, it occupied far more time than all other modes combined. We termed a participant's efforts in devising a fix to be in "fix the fault" mode. Why did the fix mode take so long? These tasks were reasonably challenging, and there were many decisions to be made about how exactly to implement a reasonable solution, any one of which could lead down a path that ultimately had to be undone. For example, Participant 85's fix to implement the new feature for Bug MF required extensive refactoring of existing code. The environment's refactoring tools supported the procedure, but these tools also introduced new problems. In all, Participant 85 spent most of the time on the refactoring aspects of the fix, including solving bugs introduced by the refactoring. He finished the refactoring aspect just before time ran out, so he was never able to spend much time on the remaining aspects of the fix. Participant 98, on the other hand, spent considerable time deciding how to proceed, then began entering new code to implement the fix. After about 15 minutes invested in this new code, however, he decided he had been going about it the wrong way, and removed most of his new code and then started a different approach. However, he ran out of time before he could make progress with the second approach.


(6) Verify mode: It has long been known that, after making a fix, novice and expert programmers alike evaluate whether their fix actually did correct the problem [Nanja and Cook 1987]. In our study, only one of the two participants produced a fix complete enough to evaluate. The results of his first evaluation led him to change his fix, and then evaluate it again.

Using these modes, the graphs in Figure 2 show activity of hypotheses and scent for each of these participants and tasks, mode-by-mode, with each mode on a separate row. The thick gray horizontal bars show the length of time each participant spent in that mode. The blue hash marks below the gray bars denote hypothesis verbalizations, and the red hash marks above the lines denote scent seeking (verbalizations and actions to request the pop-up explanations). Longer hash marks indicate two or more events of the same type in quick succession.


Figure 2. Scent and hypothesis events observed for each mode. Top to bottom: …

Participants' pursuits of scent were triggered in the source code far more than anywhere else. Other noticeable triggers included web resources, the runtime behavior of RSSOwl, and the bug report itself (Bug BF or MF). These results were consistent across participants: each information source was used by at least seven participants (and no participant used fewer than four information sources), with source code being the dominant information source for every participant. The distribution of the scent triggers was also reasonably consistent between Bugs BF and MF, as can be seen by comparing the light and dark bars on the left side of Figure 3. When starting a new hypothesis, however, trigger patterns were less clear because of the low number of hypotheses. However, it appears that, although the bug description itself triggered the hypothesis over one-fourth of the time, no trigger dominated the others as source code did for scent seeking. Other than the bug-report text triggers, the distribution of triggers varied greatly between Bug BF and Bug MF (light and dark bars on the right side of Figure 3, respectively). For Bug BF, hypothesis triggers were mostly in the bug report, source code, and the program input; for Bug MF, hypothesis triggers were mostly in the bug report and web resources.

As expected, comparing the graphs in Figure 3 shows that these artifacts triggered far more scent-seeking behavior than hypothesis-processing behavior. These results suggest that, although non-code artifacts did affect participants' navigation behavior to some extent, it is reasonable for a model or tool to rely solely on the analysis of source code relative to the bug report to obtain predictions, without incurring the expense of the more costly analyses of runtime data, the web, and so on. This concurs with the PFIS modeling approach, which relies solely on static analysis of source code relative to the bug report.
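The idea of estimating scent from source code relative to the bug report can be illustrated with a minimal lexical sketch: split identifiers into words and score each code location by word-overlap cosine with the bug-report text. Note that this is only an illustration of the general idea, not the actual PFIS algorithm, and the method names and bug-report text below are invented for the example.

```python
import re
from collections import Counter
from math import sqrt

def words(text):
    """Split text and camelCase identifiers into lowercase words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical bug report and methods (not from RSSOwl's real code):
bug_report = "Feed title shows raw HTML entities instead of the decoded title"
methods = {
    "Feed.getTitle": "String getTitle() { return title; }",
    "DateUtils.parseDate": "Date parseDate(String s) { return null; }",
}

bug_words = words(bug_report)
scores = {name: cosine(bug_words, words(name + " " + body))
          for name, body in methods.items()}
print(max(scores, key=scores.get))  # the highest-scent location
```

A real scent model would go further, e.g., weighting words by rarity and propagating scores along links between code locations, but even this sketch shows why a title-related method outscores an unrelated date utility for a title-related bug report.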


Figure 3. Scent to seek (left) and hypothesis (right) verbalizations …

REFERENCES

[Brooks 1983] R. Brooks, "Towards a theory of the comprehension of computer programs," Int. Journal of Man-Machine Studies, vol. 18, pp. 543–554, 1983.
[Brooks 1999] R. Brooks, "Toward a theory of the comprehension of computer programming," Int. J. Human-Computer Studies, vol. 51, pp. 197–211, 1999.
[Chi et al. 2001] E. Chi, P. Pirolli, K. Chen and J. Pitkow, "Using information scent to model user information needs and actions on the web," ACM Conf. Human Factors in Computing Systems (CHI), pp. 490–497, 2001.
[Chi et al. 2003] E. Chi, A. Rosien, G. Supattanasiri, A. Williams, …
[Cover and Thomas …] T. Cover and J. Thomas, Elements of Information Theory. John Wiley …
…, IEEE Trans. Software Engineering, vol. 31, no. 6, pp. 446–465, 2005.
[DeLine et al. 2005] R. DeLine, M. Czerwinski and G. Robertson, "Easing program comprehension …"
[Ericsson and Simon 1993] K. Ericsson and H. Simon, Protocol Analysis: Verbal Reports as Data. MIT Press, 1993.
[Fritzson et al. 1991] P. Fritzson et al., "Generalized Algorithmic Debugging and Testing," In Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation (PLDI), pp. 317–326, 1991.
[Gilmore 1991] D.J. Gilmore, "Models of debugging," Acta Psychologica, vol. 78, pp. 151–172, 1991.
[Herlocker et al. 2004] J. Herlocker, J. Konstan, L. Terveen and J. Riedl, "Evaluating collaborative filtering recommender systems," ACM Trans. Information Systems, vol. 22, no. 1, pp. 5–53, 2004.


[Hill et al. 2007] E. Hill, L. Pollock and K. Vijay-Shanker, "Exploring the Neighborhood with Dora to Expedite Software Maintenance," In Proc. Int. Conf. Automated Software Engineering (ASE), pp. 14–23, 2007.
[Hollan et al. 2000] J. Hollan, E. Hutchins and D. Kirsh, "Distributed cognition: Toward a new foundation for human-computer interaction research," ACM Trans. Computer-Human Interaction, vol. 7, pp. 174–196, 2000.
[Katz and Anderson 1988] I.R. Katz and J.R. Anderson, "Debugging: An analysis of bug-location strategies," Human-Computer Interaction, vol. 3, pp. 351–399, 1988.
[Kersten and Murphy 2005] M. Kersten and G. Murphy, "Mylar: A degree of interest model for IDEs," In Proc. Aspect-Oriented Software Development Conf. (AOSD), 2005.
[Ko et al. 2005] A.J. Ko, H. Aung, and B.A. Myers, "Eliciting design requirements for maintenance-oriented IDEs: A detailed study of corrective and perfective maintenance tasks," In Proc. Int. Conf. Software Engineering (ICSE), pp. 126–135, 2005.
[Ko et al. 2006] A.J. Ko, B.A. Myers, M.J. Coblenz and H.H. Aung, "An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks," IEEE Trans. Software Engineering, vol. 32, no. 12, pp. 971–987, 2006.
[Landauer and Dumais 1997] T.K. Landauer and S.T. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge," Psychological Review 104, pp. 211–240, 1997.
[Lawrance et al. 2007] J. Lawrance, R. Bellamy and M. Burnett, "Scents in programs: Does information foraging theory apply to program maintenance?" IEEE Symp. Visual Languages and Human-Centric Computing (VL/HCC), pp. 15–22, 2007.
[Lawrance et al. 2008a] J. Lawrance, R. Bellamy, M. Burnett and K. Rector, "Can information foraging pick the fix? A field study," IEEE Symp. Visual Languages and Human-Centric Computing (VL/HCC), pp. 57–64, 2008.
[Lawrance et al. 2008b] J. Lawrance, R. Bellamy, M. Burnett and K. Rector, "Using information scent to model the dynamic foraging behavior of programmers in maintenance tasks," ACM Conf. Human Factors in Computing Systems (CHI), pp. 1323–1332, 2008.
[Leontjev 1978] A. Leontjev, Activity, Consciousness, and Personality. Englewood Cliffs, NJ: Prentice-Hall, 1978.
[Letovsky 1986] S. Letovsky, "Cognitive processes in program comprehension," in Empirical Studies of Programmers, E. Soloway and S. Iyengar, Eds. Ablex Publishing Corporation, pp. 58–79, 1986.
[Long et al. 2009] F. Long, X. Wang, Y. Cai, "API Hyperlinking via Structural Overlap," In Proc. ESEC/ACM SIGSOFT Symp. Foundations of Software Engineering (FSE), pp. 203–212, 2009.
[Nanja and Cook 1987] M. Nanja and C. Cook, "An analysis of the on-line debugging process," in Empirical Studies of Programmers: Second Workshop, E. Soloway and S. Sheppard, Eds. Ablex Publishing Corporation, pp. 172–184, 1987.
[Nielsen 2003] J. Nielsen, "Information foraging: Why Google makes people leave your site faster," June 30, 2003. [Online]. Available: http://www.useit.com/alertbox/20030630.html [Accessed: March 16, 2009].
[Pirolli 1997]


P. Pirolli, "Computational models of information scent-following in a very large browsable text collection," ACM Conf. Human Factors in Computing Systems (CHI), pp. 3–10, 1997.
… In Proc. Int. Conf. Software Maintenance (ICSM), pp. …, …