PROGRAM COMPREHENSION AND ENHANCEMENT OF SOFTWARE

Anneliese von Mayrhauser, A. Marie Vans, Steve Lang
Computer Science Department, Colorado State University, Fort Collins, CO 80523-1873, USA

Abstract

This paper reports on detailed observations of five programmers performing software enhancement tasks. The enhancement tasks represent realistic work behavior by industrial programmers. The paper describes the kinds of actions the programmers preferred during their task, the level of abstraction at which they were working, and the role of hypotheses in their enhancement strategies.
1 Introduction

Understanding software is an important aspect of software maintenance. To enhance performance, add new features, or leverage existing code, a programmer must understand the code well enough to know what changes are needed, how to make them, and how to integrate new code into existing applications. For larger software products, understanding will, by necessity, be partial. Software products typically evolve through successive enhancements. These enhancements vary in size and complexity. Some programmers develop deep knowledge of a product through several enhancement cycles; others are asked to make enhancements to code with which they are unfamiliar. To learn more about comprehension behavior when enhancing software, we observed programmers in both categories to see how professionals enhance software. The questions we sought to answer were: (1) How do programmers approach enhancing code? (2) Do programmers follow the Integrated Comprehension model [11]? Do they switch between its model components? Is there a preference for a model component? (3) Do they use hypotheses to drive cognition? What types of hypotheses are used and how are they resolved?

Section 2 briefly explains the Integrated Comprehension Model. Section 3 describes the design of the study. Professional programmers worked on realistic enhancement tasks with software that ranged from 7 KLOC to well over 40 KLOC. Section 4 reports the results of the observations with regard to the questions posed above. Section 5 summarizes conclusions and provides working hypotheses, based on our results, that should be evaluated with controlled experiments.
2 Comprehension Model [11]

Existing program understanding models agree that comprehension proceeds top-down, bottom-up, or in a combination of the two. Observations with large scale code [9, 11] indicate that comprehension involves both top-down and bottom-up activities. This led to the formulation of an Integrated Comprehension Model [11]. It consists of the following components: (1) Program Model, (2) Situation Model, (3) Top-Down Model (or Domain Model), and (4) Knowledge Base. Soloway and Ehrlich's model [7] is the basis for the top-down component (the domain model), while Pennington's model [5, 6] inspired the program and situation models.

Program, situation, and top-down (or domain) model building form the three processes that lead to an understanding of code. Any of the three may be activated from any of the others. Beacons, goals, hypotheses, and strategies determine the dynamics of the cognitive tasks and the switches between the models. Each process component includes the internal representation (mental model) of the program being understood. This representation differs in level of abstraction for each model. Each model component also includes strategies to build this internal representation. The knowledge base furnishes the process with information related to the comprehension task. It also stores any new and inferred knowledge.

The Top-Down or Domain model of program understanding is typically invoked if the code or type of code is familiar. It represents knowledge schemas about the application domain. For example, a domain model of an Operating System (OS) would contain knowledge about the components of an OS (memory management, OS structure, etc.) and how they interact with each other. This knowledge often takes the form of specialized schemas, including design rationalization (e.g., the pros and cons of First-Come-First-Serve versus Round Robin scheduling). A new OS will be easier to understand with such knowledge than without it. Domain knowledge provides a motherboard into which specific product knowledge can be integrated more easily. It can also lead to effective strategies to guide understanding (e.g., understanding high paging rates requires understanding how process scheduling and paging algorithms are implemented).

Hypotheses are important drivers of cognition. Letovsky [2] defines hypotheses as conjectures, and comprehension activities (actions) as events that take on the order of seconds or minutes to occur. Actions classify programmer activities, both implicit and explicit, during a maintenance task. Examples of action types include "asking a question" and "generating a hypothesis". Letovsky identified three major types of hypotheses: why conjectures hypothesize the purpose of some function or design choice; how conjectures hypothesize about the method for accomplishing a program goal; what conjectures hypothesize about what something is, for example a variable or function. Conjectures vary in their degree of certainty, from uncertain guesses to almost certain conclusions.
Brooks [1] considers hypotheses the only drivers of cognition. Understanding is complete when the mental model consists entirely of a complete hierarchy of hypotheses in which the lowest level hypotheses are either verified (against actual code or documentation) or fail. At the top is the primary hypothesis, a high-level description of the program function. Once the primary hypothesis exists, subsidiary hypotheses in support of the primary hypothesis are generated. The process continues until the mental model is built. Brooks considers three reasons why hypotheses fail: code to verify a hypothesis cannot be found; confusion due to a single piece of code satisfying different hypotheses; and code that cannot be explained.

Goals or questions [2] embody the cognitive processes by which maintenance engineers understand code. A goal can be explicit or inferred from a hypothesis. It involves the formation of a hypothesis in support of reaching the goal (answering the question). Hypotheses then lead to supporting actions, lower level goals, subsidiary hypotheses, etc. Hypotheses occur at all levels: the application domain, algorithmic, and code levels.

When code to be understood is completely new to the programmer, Pennington [5, 6] found that programmers first build a control-flow abstraction of the program called the program model. Once the program model representation exists, Pennington showed that a situation model is developed. This representation, also built from the bottom up, uses the program model to create a data-flow/functional abstraction.

The Integrated Model assumes that programmers can start building a mental model at any level that appears opportune. Further, programmers switch between any of the three model components during the comprehension process. When constructing the program model, a programmer may recognize clues (called beacons) in the code indicating a common task such as sorting. If, for example, a beacon leads to the hypothesis that a sort is performed, the switch is to the top-down model. The programmer then generates sub-goals to support the hypothesis and searches the code for clues to support these sub-goals. If, during the search, a section of unrecognized code is found, the programmer jumps back to building the program model.
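As a concrete illustration, consider the following C fragment (our hypothetical example; the paper itself reproduces no code). The nested loops around a compare-and-swap are a classic beacon: they suggest the hypothesis "this is a sort" before the code has been read line by line, triggering a switch from program model building to the top-down model.

    /* Hypothetical fragment a maintainer might encounter. The nested
     * loops with a compare-and-swap form a classic beacon for sorting. */
    void order(int a[], int n)
    {
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - 1 - i; j++) {
                if (a[j] > a[j + 1]) {   /* compare ...                 */
                    int t = a[j];        /* ... and swap: the tell-tale */
                    a[j] = a[j + 1];     /* pattern that signals        */
                    a[j + 1] = t;        /* "sorting" to the reader     */
                }
            }
        }
    }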
Comprehension is guided by a systematic bottom-up strategy, an opportunistic top-down strategy, or a mixed systematic/opportunistic strategy. In a systematic approach, the programmer applies a systematic order to understanding the code completely, for example comprehending the code line by line. An opportunistic approach studies code in an as-needed fashion. Littman et al. [3] found that programmers using a systematic approach to comprehension are more successful at modifying code (once they understand it). Unfortunately, the systematic strategy is unrealistic for large programs. A disadvantage of the opportunistic approach is that understanding is incomplete, and code modifications based on this understanding can be error prone [3].

Various aspects of this model were confirmed in prior studies. [9] showed for one enhancement task that the software engineer switched between all model components of the Integrated Model and reported actions occurring at all three levels of the model. [8] extended these results to include a debugging task and also analyzed for detailed action types. Both interpret observations in terms of useful tool capabilities. [10, 13, 14] investigated whether observations could confirm the processes stipulated in the model. [10, 14] report on the comprehension process of one subject who used a systematic understanding strategy. [13] reports on the comprehension process related to an opportunistic strategy; it is structured around a hierarchy of goals, hypotheses, and actions. [15, 16] report on the comprehension behavior of four corrective maintenance subjects. These results support the Integrated Model, the switching behavior between model components, and the role of hypotheses in an opportunistic understanding process.
3 Design of Study

The study comprises five observations of professional maintenance programmers working on software enhancements. Two were field observations; three observed programmers working (separately) on the same enhancement task. The participants were asked to think aloud while working on the enhancements. This was recorded. Sessions lasted about two hours. Four subjects were experts in the language/platform. Three were experts in the application domain. The others had at least intermediate knowledge in these areas. One participant had acquired significant knowledge about the software. The other four had little accumulated knowledge about the software: one had never looked at the code, only at other software documentation; the other three had completed a debugging task prior to the enhancement. All were adding new functionality to existing software. Table 1 summarizes the subject classification.

Subject  Task               Expertise (Language/Platform; Application Domain)         Accumulated Knowledge (Prior Work with Code)
EN1      Add Functionality  Language: expert, application domain: expert              Significant knowledge
EN2      Add Functionality  Language: expert, application domain: expert              Little knowledge (no code)
AO-01    Add Functionality  Language: expert, application domain: expert              Little knowledge (prior debug task)
AL-01    Add Functionality  Language: expert, application domain: intermediate        Little knowledge
AL-02    Add Functionality  Language: intermediate, application domain: intermediate  Little knowledge

Table 1: Subject Classification
The tasks of EN1 and EN2 were real work assignments. The two subjects represent situations in which programmers find themselves on occasion: being assigned software over a longer term and thus having the opportunity to develop significant knowledge about it, or "inheriting" a software product in one's area of expertise but having to develop knowledge about the product itself while enhancing it. The two subjects worked at different stages of the enhancement task. These programmers were allowed to use any tool, dynamic or static, they thought useful.

The remaining three subjects worked on enhancements to a test support tool, C-Patrol [18]. C-Patrol allows testers to insert executable specifications into C code. C-Patrol preprocesses them and translates them into C statements. During execution, C-Patrol collects data and reports discrepancies between code execution and the executable specification. The three subjects had acquired limited knowledge about C-Patrol through a prior two-hour debugging task. A knowledge questionnaire after the debugging task indicated that subjects AL-01 and AL-02 had learned more about the software than AO-01. The three subjects also differed in the environment they used: AO-01 was only allowed to use static analysis (no code execution), but could use any of the UNIX platform tools (grep, awk, etc.) as long as he did not execute the code. AL-01 and AL-02 had access to a static environment, Lemma [4], that also provided various browsing and filtering capabilities useful for code comprehension.
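To give a feel for the kind of software the three subjects were enhancing, the sketch below illustrates the idea of an executable specification attached to C code. The notation is hypothetical (C-Patrol's actual syntax is not reproduced in this paper); a plain C assertion stands in for the run-time check the preprocessor would generate.

    #include <assert.h>

    /* Hypothetical illustration of an executable specification.
     * A tester states a property of the code ...                 */
    /* @spec result >= a && result >= b   (invented notation)     */
    int max2(int a, int b)
    {
        int result = (a > b) ? a : b;
        /* ... and the preprocessor turns it into a check that is
         * evaluated during execution; discrepancies between code
         * and specification are reported. A plain assertion
         * approximates the translated form:                      */
        assert(result >= a && result >= b);
        return result;
    }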
3.1 Protocol Analysis

Protocol analysis is used for analyzing observational data. Think-aloud reports of subjects working on tasks are transcribed and classified using the categories defined in [14]. Each statement in the transcript is encoded as one of these a priori categories. The analysis proceeds from identifying single actions of various types to determining action sequences and extracting cognition processes and strategies.

The first analysis of the protocols involved enumeration of action types as they relate to the Integrated Comprehension Model. Action types classify programmer activities, both implicit and explicit. Examples of action types are "generating hypotheses about program behavior" or "mental simulation of program statement execution". The results of this analysis are traces of action types as they occurred in the protocol. Summary data is then computed as frequencies for individual action types as well as cumulative frequencies of actions per model component. The next step in the analysis combines segmentation of the protocols and identification of information and knowledge items. Segmentation classifies action types in terms of the different levels of abstraction in the mental model.

Hypothesis analysis concentrates on stating goals, making and resolving hypotheses, and actions supporting goal or hypothesis resolution. We distinguished three action types (stating a goal, stating a hypothesis, and supporting actions) at the three levels of the Integrated Model (Program, Situation, and Top-Down or Domain levels). Hypotheses are initially identified during action type analysis, which tags the generation, confirmation, or failure of each hypothesis. Each hypothesis is then classified by how it was stated in the protocol. For example, the hypothesis "warnings are related, we believe, to the resources either not being installed properly or not being compatible with this system" is a hypothesis about the cause of the buggy behavior. We classified the type of each hypothesis according to Letovsky's taxonomy into what, why, or how hypotheses [2]. The analysis follows each hypothesis through the protocol until it is resolved through confirmation, failure, or abandonment. A hypothesis is verbally confirmed or rejected (it fails). Hypothesis abandonment is explicit or implicit. Explicit abandonment occurs when the programmer decides the hypothesis is not relevant or will be too much trouble. Implicit abandonment occurs when the hypothesis is stated but the programmer never returns to it; the hypothesis was forgotten or dismissed without verbal confirmation.

Process analysis determined the nature of actions over time. Each action has already been classified by level of abstraction (Program, Situation, or Domain level). We are interested in switching behavior between these component models: the frequency of switches between each model and whether switching is fairly unidirectional (top-down or bottom-up) or not.
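The bookkeeping behind this analysis can be pictured with a small sketch (our illustration only, not the study's actual analysis tooling): given a trace of actions already encoded by model level, tally the per-level frequencies (as in Table 2) and count the switches between levels (as in Table 4).

    #include <stdio.h>

    /* Minimal sketch of the protocol bookkeeping (illustration only).
     * Each transcript statement has been encoded as an action at one
     * of the three model levels; we tally per-level frequencies and
     * the from/to switch matrix between levels.                      */
    enum model { TOP_DOWN, SITUATION, PROGRAM, NMODELS };
    static const char *name[NMODELS] = { "Top-Down", "Situation", "Program" };

    int main(void)
    {
        /* A toy encoded trace; the real traces held hundreds of actions. */
        enum model trace[] = { PROGRAM, PROGRAM, TOP_DOWN, PROGRAM,
                               SITUATION, PROGRAM, TOP_DOWN, PROGRAM };
        int n = (int)(sizeof trace / sizeof trace[0]);
        int freq[NMODELS] = { 0 };
        int switches[NMODELS][NMODELS] = { { 0 } };

        for (int i = 0; i < n; i++) {
            freq[trace[i]]++;
            if (i > 0 && trace[i] != trace[i - 1])
                switches[trace[i - 1]][trace[i]]++;  /* row = from, column = to */
        }
        for (int m = 0; m < NMODELS; m++)
            printf("%-10s actions: %d (%.0f%%)\n",
                   name[m], freq[m], 100.0 * freq[m] / n);
        for (int from = 0; from < NMODELS; from++)
            for (int to = 0; to < NMODELS; to++)
                if (switches[from][to] > 0)
                    printf("switches %s -> %s: %d\n",
                           name[from], name[to], switches[from][to]);
        return 0;
    }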
4 Results

4.1 Programmer Actions

Subject  Task         Top-Down Model  Program & Situation Model  Situation Model  Program Model  Total Actions
EN1      Enhancement   35  11%          282  89%                   62  20%          220  69%        317
EN2      Enhancement  154  35%          288  65%                   54  12%          234  53%        442
AO-01    Enhancement   88  24%          281  76%                   37  10%          244  66%        369
AL-01    Enhancement   60  22%          214  78%                   11   4%          203  74%        274
AL-02    Enhancement   73  20%          300  80%                   36   9%          264  71%        373
Total    Enhancement  410  23%         1365  77%                  200  11%         1165  66%       1775

Table 2: Action Types by Model – Totals & Frequencies
Table 2 shows how often the subjects performed actions at the three levels of abstraction defined in the Integrated Model. Percentages give an indication of the relative frequency of these actions for each subject. Table 2 contains a column for combined Program and Situation model references, roughly corresponding to Pennington's [5] comprehension model. We wanted to identify patterns based on differences between Pennington's bottom-up model and the top-down model.

Subject EN1 (who knew the most about the software) worked mostly at the program and situation model level (89% of the actions). This reflects his ability to make connections to the code level. It may also be typical of enhancement activities, as all five subjects tended to work at the lower levels and less frequently with the Top-Down model. In this way his behavior is similar to analysis results for corrective maintenance [15], although corrective maintenance actions show a higher percentage of actions at the domain level. EN1 also spent a great deal of time in the program model (he was interested in specific parts of the code he had recently enhanced). Since he was a domain and language expert and had significant experience with the code, we surmise that his task did not require building a top-down mental model. He had worked with this code for several years and already had a domain level representation of the code. His task did not require much adjustment to, nor use of, the domain level mental representation. The focus of the remaining programmers (EN2 to AL-02) reflects an earlier stage of enhancement work, which requires understanding the nature of the enhancement as well as finding the proper location in the code for the enhancement. The number of actions at the domain level for each of these four subjects is higher than for EN1. Even so, all subjects showed more actions at the situation and program model levels than at the domain level.

Next, consider the types and frequency of actions at each model level. They reflect how often programmers worked at each level of abstraction (see Table 3). Consider cumulative results first (rightmost column). The most commonly found action during domain model construction is use of top-down knowledge (OPKNOW). Top-down knowledge is previously acquired knowledge at the domain level that is elicited from long-term memory. Frequent use of top-down knowledge supports the hypothesis that programmers who are familiar with the application domain develop and use the top-down model to structure information about the system. It is easier to decompose the program into functional units if the programmer is well acquainted with the domain. Individual results show an interesting phenomenon. The subject with the least accumulated knowledge (EN2) is responsible for 42% of the references to domain level knowledge. Likewise, most of EN1's domain level actions are uses of domain knowledge (although compared to actions at other model levels he uses very few). The second most frequent action type is to gain high-level information (OP1); 96% of these actions are attributable to subjects AO-01, AL-01, and AL-02. Apparently, these subjects still needed more domain level knowledge to structure and classify information for their task. It is unclear whether the debugging task did not need as much of such high-level structural knowledge or whether the subjects did not know enough about the domain concepts related to C-Patrol. Given that two of them ranked "intermediate" in this area and none scored 100% on the knowledge test after the debugging task, a combination of both is likely. Hypothesis generation (OP3, the third most common action found in the protocols) drives the decomposition process. While all subjects use hypothesis generation, it is the third most frequent action at the domain level only for subjects AO-01 and AL-02. The fourth and fifth most frequent actions are to generate a task (OP20) and to determine the next program segment to examine (OP2). OP20, task generation, is almost exclusively done by EN2. Looking at the most frequent action types at this model level for EN2, it is apparent that EN2 is using the domain model for work planning and information structuring. The same holds true for EN1, except that his actions at the domain level are significantly fewer, indicating less need for such actions.

Overall, these actions represent Goal-Hypothesis-Action triads. A comprehension process starts with a goal (what is to be understood or done); supporting hypotheses are made, which lead to one or more of the following: (1) formulation of subgoals, (2) formulation of sub-hypotheses, (3) actions that help to confirm or refute hypotheses.
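One way to picture these triads is as a small linked structure (a hypothetical representation for illustration; it is not the study's coding scheme): a goal spawns hypotheses, and each hypothesis leads to supporting actions, subgoals, or subsidiary hypotheses until it is confirmed, failed, or abandoned.

    /* Hypothetical representation of Goal-Hypothesis-Action triads. */
    enum resolution { OPEN, CONFIRMED, FAILED, ABANDONED };
    enum level { DOMAIN_LVL, SITUATION_LVL, PROGRAM_LVL };

    struct action {                    /* supporting activity          */
        const char *description;
        struct action *next;
    };

    struct hypothesis {
        const char *conjecture;        /* a what/how/why statement     */
        enum level level;              /* model level where it is made */
        enum resolution resolution;
        struct action *actions;        /* actions taken in its support */
        struct goal *subgoals;         /* subgoals it generates        */
        struct hypothesis *subhyps;    /* subsidiary hypotheses        */
        struct hypothesis *next;
    };

    struct goal {                      /* what is to be understood     */
        const char *question;
        struct hypothesis *hyps;       /* hypotheses made to answer it */
        struct goal *next;
    };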
Code     Action Type                                    EN1  EN2  AO-01  AL-01  AL-02  Total
OP1      Gain high-level program overview                 1    1    12     22     17     53
OP2      Determine next prgm. seg. to examine             9    6     6      7      7     35
OP3      Generate/revise hypothesis re: functionality     1   13    11      4      9     38
OP4      Determine relevance of prgm. segment             0    7     0      0      2      9
OP6      Determine understanding strategy                 0    5     0      0      1      6
OP7      Investigate oversight                            2    0     1      0      3      6
OP8      Failed hypothesis                                1    1     2      1      4      9
OP9      Mental simulation                                0    1     0      0      0      1
OP11     High-level change plan/alternatives              0   17     4      4      9     34
OP13     Study/initiate program execution                 0   10    19      0      0     29
OP14     Compare program segments                         0    3    11      2      0     16
OP15     Generate questions                               0    7     0      8      0     15
OP16     Answer questions                                 0    4     0      4      0      8
OP17     Chunk & store knowledge                          0   13    10      2      5     30
OP20     Generate task                                    0   35     0      0      1     36
OPCONF   Confirm hypothesis                               1    6     6      2      2     17
OPKNOW   Use of top-down knowledge                       19   31     6      4     13     73
Total    Top-Down Model Actions                          34  160    88     60     73    415

SIT1     Gain situation knowledge                         3    1    24      3     10     41
SIT2     Develop questions                                2    1     1      1      2      7
SIT3     Determine answers to questions                   1    0     0      0      2      3
SIT4     Chunk & store                                    7   31     3      2      5     48
SIT5     Determine relevance of sit. knowledge            1    0     0      0      0      1
SIT6     Determine next info to gain                      0    1     0      0      0      1
SIT7     Generate hypothesis                              4    1     3      0      5     13
SIT8     Determine understanding strategy                 1    0     0      0      0      1
SIT9     Determine if error exists (missing funct.)       2    0     0      0      0      2
SIT10    Failed hypothesis                                1    0     1      0      3      5
SIT11    Mental simulation                                2   11     0      0      0     13
SITCONF  Confirm hypothesis                               3    1     2      0      2      8
SITKNOW  Use of situation model knowledge                35    7     3      5      7     57
Total    Situation Model Actions                         62   54    37     11     36    200

SYS1     Read intro code comments/related docs            2   13     6      0      2     23
SYS2     Determine next prg. segmt. to examine            6   10    35     46     34    131
SYS3     Examine next module in sequence                 14   11    24     35     40    124
SYS4     Examine next module in cntrl-flow               11   17     0      0      5     33
SYS5     Examine data structs & definitions               2    1     2      0      0      5
SYS6     Slice on data                                    0    0     3      2      8     13
SYS7     Chunk & store knowledge                         17   40    22     12     22    113
SYS8     Generate hypothesis                             30   11    18     11     20     90
SYS9     Construct call tree                              0    0     1      9      5     15
SYS10    Determine understanding strategy                 9    3     3      1      1     17
SYS11    Generate new task                               14   24    17      6     15     76
SYS12    Generate question                                8    7     2     10     15     42
SYS13    Determine if looking at right code               2    3     0      0      2      7
SYS14    Change direction                                 2    0     0      0      0      2
SYS15    Generate/consider different code changes         3   25    45     30     26    129
SYS16    Answer question                                  1    1     0      7     11     20
SYS17    Add/alter code                                  15   10    45     13     29    112
SYS18    Determine location to set breakpt.              11    6     0      0      1     18
SYS19    Failed hypothesis                               10    2     3      2      2     19
SYS20    Determine error/omitted code to add              7   17     0      1      2     27
SYS21    Mental simulation                                7   13     0      4      0     24
SYS22    Compare code versions                            0    0     0      0      1      1
SYS23    Search for var defines/use                       0    1     3      1     12     17
SYSCONF  Confirm hypothesis                              17    9    10      7      9     52
SYSKNOW  Use of program model knowledge                  32   10     5      6      2     55
Total    Program Model Actions                          220  234   244    203    264   1165

Table 3: Action Types – Enhancement
This interpretation is certainly reasonable if we consider the observations taken together as a "complete" enhancement task: subjects AO-01, AL-01, AL-02, and EN2 were designing and locating where to make the enhancement; EN1 was further along, i.e., implementing and debugging the enhancement. Each group shows different action type preferences. Each individual in the group representing the first phase of the enhancement uses more and a larger variety of actions at the domain level. This reflects the need to build knowledge about the software's functionality and about where and how to make the enhancement. EN1 uses fewer actions and action types (using domain knowledge, OPKNOW, and determining sequence, OP2, are the most frequent).

At the program model level, the most frequent actions are determining the next code segment to examine (SYS2), reading code line by line (SYS3), and considering and making code changes (SYS15). Except for SYS15, these actions are part of bottom-up understanding. We can also recognize this behavior in the subjects' use of chunk and store actions (SYS7) and code reading according to control flow (SYS4). Because of his knowledge, EN1 showed a preference for using his program model knowledge instead. The subjects whose work included deciding where and how to make the enhancement (EN2, AO-01, AL-01, AL-02) also had a large proportion of actions related to this activity (SYS15, SYS17), while EN1, who was debugging his enhancement, had more actions related to debugging (SYS20, SYS21).

At the situation model level, the top three actions were use of situation model knowledge (SITKNOW, primarily driven by EN1's use of situation model knowledge), chunk and store knowledge (SIT4, mostly due to EN2), and gain situation knowledge (SIT1, mostly due to AO-01 and AL-02). As before, the subject with the most knowledge of the software uses his situation model knowledge. The others attempt to gain such knowledge (via SIT1 or SIT4).

From model        To Top-Down Model  To Situation Model  To Program Model  Total
Top-Down Model        N/A                 42   6%            229  31%       271  36%
Situation Model        57   8%            N/A                 93  12%       150  20%
Program Model         212  28%            112  15%           N/A            324  44%
Total                 269  36%            154  20%           322  44%       745

Table 4: Action Switches – Absolute & Frequency (Enhancement, 5 subjects; total switches = 745)
4.2 Processes

Switches occur between all three models (top-down, situation, and program models). Table 4 summarizes switching behavior; the rows represent starting models and the columns represent ending models. This switching behavior confirms the assumptions of the Integrated Model: switching occurs at all three levels, implying that understanding is neither purely top-down nor bottom-up, but always a combination. The specifics of the switching behavior are different than for the re-engineering subjects [12], who showed more of a preference for bottom-up switches (from program to situation model). It is also different from corrective maintenance [15], which shows fairly even switching rates. During enhancement, the subjects switched between situation and top-down models less than between the other model components. If the situation model acts as a bridge between the other two, software enhancement tasks may need this bridge more between program and situation model than between situation and domain model. The subjects preferred to make direct connections between the top-down and the program model.
Model                    Tag     Hypothesis Type                                 Total  EN1  EN2  AO-01  AL-01  AL-02
Top-Down (Domain) Model  OPH1    Domain procedure functionality/concepts          21     0    9     4      2      6
                         OPH2    Variable functionality/domain concepts            4     0    0     0      2      2
                         OPH3    Rules of discourse/expectations                   3     0    0     3      0      0
                         OPH5    Existence of installed (running) program          2     0    2     0      0      0
                         OPH9    Permissions/environment set correctly/
                                 tool functionality                                1     1    0     0      0      0
                         OPH10   Location to add functionality                     4     0    3     1      0      0
                         OPH11   Comparison of functionality at high level         2     0    2     0      0      0
                         OPH16   Level & structure of code/scope                   1     1    0     0      0      0
                         Total   Top-Down Model Hypotheses                        38     2   16     8      4      8

Situation Model          SITH2   Function/code block execution order/state         1     1    0     0      0      0
                         SITH3   Function/procedure function, call function        2     1    1     0      0      0
                         SITH4   Effect of running program                         1     1    0     0      0      0
                         SITH7   Existence of functionality/algorithm/variable     1     1    0     0      0      0
                         SITH8   Program function                                  7     0    0     2      0      5
                         Total   Situation Model Hypotheses                       12     4    1     2      0      5

Program Model            SYSH1   Variable function                                 3     0    0     0      0      3
                         SYSH2   Function/procedure function                       5     1    3     1      0      0
                         SYSH5   Location/type/existence of function call          9     2    1     5      0      1
                         SYSH6   Statement execution order/state                   3     3    0     0      0      0
                         SYSH7   Variable value/defaults                           9     7    1     0      0      1
                         SYSH8   (Non-)existence of construct (var/code)           1     1    0     0      0      0
                         SYSH9   Variable/construct equivalency                    2     0    0     0      1      1
                         SYSH10  Syntax meaning                                    5     0    0     5      0      0
                         SYSH12  Variable definition & location                    3     0    0     1      1      1
                         SYSH13  Code block/procedure comparison                   5     0    1     0      1      3
                         SYSH14  Code block function                               5     0    0     0      0      5
                         SYSH16  Code correctness, cause/location of error        20    15    1     3      1      0
                         SYSH18  Location to add code/alternatives                16     1    4     1      7      3
                         Total   Program Model Hypotheses                         86    30   11    16     11     18

Table 5: Hypothesis-Type Frequencies – Enhancement
4.3 Hypotheses

Table 6 reports, for each subject, the number of hypotheses made at each model level. The subjects made most of the hypotheses at the program model level (86). Fewer than half as many were domain level hypotheses (38), and very few were at the situation model level (12).
Subject  Task         Top-Down Model  Program & Situation Model  Situation Model  Program Model  Total Hypotheses
EN1      Enhancement    2   6%          34  94%                    4  11%           30  83%         36
EN2      Enhancement   16  57%          12  43%                    1   4%           11  39%         28
AO-01    Enhancement    8  31%          18  69%                    2   8%           16  61%         26
AL-01    Enhancement    4  27%          11  73%                    0   0%           11  73%         15
AL-02    Enhancement    8  26%          23  74%                    5  16%           18  58%         31
Total    Enhancement   38  28%          98  72%                   12   9%           86  63%        136

Table 6: Hypotheses by Model – Frequencies & Percentages
EN1 made the fewest domain level hypotheses; the majority of his hypotheses (83%) were at the program model level. This may be because he already had a mental model at the top-down and situation levels, having worked with the system for so long. EN2, AO-01, AL-01, and AL-02 made most of their hypotheses at the domain and program model levels, indicating the earlier stage of the enhancement task (where and how to make the enhancement).

Table 5 lists the hypothesis types used by the subjects by level of abstraction of the Integrated Comprehension Model [11]. The leftmost column identifies at which model level the hypothesis is made, the second column codes the hypothesis type, and the third column describes the hypothesis type. The remaining columns give the frequency of each type, first cumulatively, then for each subject individually. EN1 and EN2 differed in the types of hypotheses they made at the domain model level. The two groups of subjects (EN1 versus EN2, AO-01, AL-01, AL-02) use different hypotheses. This reflects the different stages of the enhancement task (e.g., EN2 hypothesizes about the location to add functionality, OPH10, something EN1 has already solved) and the amount of prior work with the code (e.g., the C-Patrol enhancers make most of the domain level hypotheses regarding the domain functionality of the software (OPH1), expectations (OPH3), and where to add the functionality (OPH10); EN1 has none like these, as he has already implemented the enhancement). The few hypotheses at the situation model level are consistent with debugging or knowledge acquisition.

At the program model level, EN1 makes most of his hypotheses about code correctness and the cause or location of an error (SYSH16); he was debugging an enhancement. The second and third most frequent types of hypotheses concerned variable values (SYSH7) and statement execution order (SYSH6). This is similar to behavior during corrective maintenance [16]: for corrective maintenance, the three most frequent hypotheses at the program level were SYSH6, SYSH16, and SYSH7, in that order. The other four subjects' program model hypotheses reflect their desire to find out where and how to add the enhancement: the most frequent hypotheses were about the location to add code (SYSH18), the location, type, or existence of a function call (SYSH5), and variable definitions (SYSH12).
Next, we consider the nature of the hypotheses made (what, how, and why hypotheses, and whether they were confirmed, abandoned, or failed). Table 7 gives the number of hypotheses in each of these categories for each subject and cumulatively for all subjects. More than twice as many hypotheses were confirmed as were abandoned or failed. This may be due to the subjects' expertise in the application domain and language/platform. On the other hand, accumulated knowledge about the software could play a role in the few abandoned hypotheses for EN1 versus the other subjects. Knowing more about the product may enable a programmer to avoid hypotheses he or she must later abandon. This behavior is diametrically opposite to corrective maintenance (the programmer with more prior knowledge and expertise abandoned twice as many hypotheses as the programmers with less expertise [16]). We speculate that this behavior is task specific and differs between enhancement and debugging. Enhancement tasks show fewer failed hypotheses than corrective maintenance [16]. This might be due to the intrinsic uncertainty (lack of knowledge) inherent in debugging.

There are differences between the other four subjects. They fall into two groups: (1) EN2 and AO-01, and (2) AL-01 and AL-02. The first group had almost twice as many confirmed hypotheses as abandoned ones, while the second group had almost the same number of confirmed and abandoned hypotheses. The second group used a static analysis tool during the enhancement task while the first group had no such tool. The Lemma tool had features that allowed asking complex questions (e.g., regarding slices, variable values, paths, etc.), thus encouraging more, more complex, and alternate hypotheses, which could then be abandoned when the answer was found a different way.

Subject  What  How  Why  Confirmed  Abandoned  Failed
EN1       21    9    6      21          3        12
EN2       10   17    1      16          9         3
AO-01     20    9    1      18          7         5
AL-01      7    8    0       8          5         2
AL-02     21   10    0      12         13         6
Total     79   53    8      75         37        28

Table 7: Distribution of Hypothesis Categories
The majority of the hypotheses were of type what. Letovsky [2] defines what hypotheses as those that conjecture about what something is or does. There were slightly fewer how hypotheses (53, conjectures about the way something is accomplished) and only 8 why hypotheses (conjectures about the objective of an action or design choice). While the subjects voiced a significant number of what and how hypotheses, they had few why hypotheses. This is similar to results for corrective maintenance [16]; thus, this cannot be task specific. It might be related to the amount of accumulated knowledge or expertise (understanding why the code was written a certain way may require more understanding than the programmer has). It could also be driven by (1) attitude ("I don't care why developers made certain decisions"), (2) inability to find or recreate the design or programming rationale, or (3) perceived lack of time. In any case, not knowing why design and implementation decisions were made can lead to enhancements that degrade the software.
5 Conclusions

We observed five experienced programmers while they were enhancing software and analyzed their behavior. The goal was to extend existing results ([17, 16]) and to answer several questions about how programmers go about enhancing software. The answers can be summarized as follows:

(1) Actions. Enhancement work is done predominantly at the program and situation model level, even in the beginning, when decisions are made about how to make the enhancement and where to put it. Using accumulated knowledge about the software, or acquiring it, are key activities. Detailed action types differ depending on the stage of the enhancement task.

(2) Process. Enhancement tasks, like other maintenance tasks, require frequent switching between levels of abstraction. Switches are multi-directional, confirming the Integrated Comprehension Model [11]. Work duration (number of actions) at a given level and the number of switches between levels differ. We hypothesize that this is due to the amount of accumulated knowledge about the software and the programmer's strategy (systematic vs. opportunistic). The early stage of enhancement also requires significantly more cross-referencing between domain level, situation level, and program level information.

(3) Hypotheses. Hypothesis making behavior differed between the subjects. EN1 made hypotheses almost exclusively at the program and situation model levels, while EN2 made most of his at the domain level. This is likely related to the stage of the enhancement task. The amount of prior work with the code should also be a factor. EN2 was a domain expert but had not worked with the code before, so he approached the problem starting from his own expertise, i.e., domain knowledge, hypothesizing about the domain level functionality of the software and cross-referencing into the code from there. The subjects enhancing the C-Patrol code fell in between EN1's and EN2's accumulated knowledge of the code (they had debugged it previously); so did their behavior. Hypothesis resolution was predominantly through confirmation or failure. EN1 in particular abandoned very few hypotheses. It is not clear whether this is a function of his strong knowledge of the code. His behavior supports Brooks' theory on hypotheses [1], although none of our other observations do [13, 15]. Programmers appear to make few "why" hypotheses [13, 16, 17]. The subjects of the present study are no exception. EN1, however, made more than the other subjects. Thus, hypothesizing about "why" something was done might be related to the accumulated knowledge about a software product and the type of maintenance task. This should be investigated further. We expected more why hypotheses during enhancement tasks: knowing why software is built in a certain way appears useful, even necessary, for making good enhancement decisions. So it is a bit surprising that so few hypotheses were why hypotheses.
References

[1] R. Brooks, "Towards a theory of the comprehension of computer programs", International Journal of Man-Machine Studies, 18 (1983), pp. 543-554.
[2] S. Letovsky, "Cognitive Processes in Program Comprehension", Empirical Studies of Programmers, Eds. Soloway and Iyengar, Ablex Publ., 1986, pp. 58-79.
[3] D. C. Littman, J. Pinto, S. Letovsky, E. Soloway, "Mental Models and Software Maintenance", Empirical Studies of Programmers, Eds. Soloway and Iyengar, Ablex Publ., 1986, pp. 80-98.
[4] R. G. Mays and T. M. Stubbs, "Advances in program understanding technology in Lemma", IBM Technical Report 29.2008, 1995.
[5] N. Pennington, "Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs", Cognitive Psychology, 19 (1987), pp. 295-341.
[6] N. Pennington, "Comprehension Strategies in Programming", Empirical Studies of Programmers: Second Workshop, Eds. Olson, Sheppard, and Soloway, Ablex Publ., 1986, pp. 100-112.
[7] E. Soloway, B. Adelson, K. Ehrlich, "Knowledge and Processes in the Comprehension of Computer Programs", The Nature of Expertise, Eds. M. Chi, R. Glaser, M. Farr, Lawrence Erlbaum Ass., 1988, pp. 129-152.
[8] A. von Mayrhauser, A. Vans, "From Program Comprehension to Tool Requirements for an Industrial Environment", Procs. 2nd Workshop on Program Comprehension, July 1993, pp. 78-86.
[9] A. von Mayrhauser, A. Vans, "From Code Understanding Needs to Reverse Engineering Tool Capabilities", Procs. 6th Intl. Workshop on Computer-Aided Software Engineering (CASE93), Singapore, July 1993, pp. 230-239.
[10] A. von Mayrhauser, A. Vans, "Comprehension Processes During Large Scale Maintenance", Procs. 16th Intl. Conf. on Software Engineering, Sorrento, Italy, May 1994, pp. 39-48.
[11] A. von Mayrhauser, A. Vans, "Industrial Experience with an Integrated Code Comprehension Model", IEE Software Engineering Journal, Sept. 1995, pp. 171-182.
[12] A. von Mayrhauser, A. Vans, "On the Role of Program Understanding in Re-engineering Tasks", Procs. 1996 IEEE Aerospace Conference, Snowmass, February 1996, pp. 253-267.
[13] A. von Mayrhauser, A. Vans, "On the Role of Hypotheses during Opportunistic Understanding While Porting Large Scale Code", Procs. 4th Workshop on Program Comprehension, Berlin, March 1996, pp. 68-77.
[14] A. von Mayrhauser, A. Vans, "Identification of Dynamic Comprehension Processes during Large Scale Maintenance", IEEE Trans. Software Engineering, vol. 22, no. 6, June 1996, pp. 424-438.
[15] A. von Mayrhauser, A. Vans, "Program Understanding Needs During Corrective Maintenance of Large-Scale Software", Procs. COMPSAC97, August 1997, Washington, DC.
[16] A. von Mayrhauser, A. Vans, "Hypothesis-Driven Understanding Processes During Corrective Maintenance of Large Scale Software", Procs. 1997 Intl. Conf. Software Maintenance, Oct. 1997.
[17] A. von Mayrhauser, A. M. Vans, A. E. Howe, "Program Understanding Behavior During Enhancement of Large Scale Software", Intl. J. of Software Maintenance, Vol. 9, pp. 299-327.
[18] H. Yin, J. M. Bieman, "Improving Software Testability with Assertion Insertion", Procs. International Test Conference, Oct. 1994, pp. 831-839.