Automated Identification of Tasks in Development Sessions
Irina Diana Coman, Alberto Sillitti
Free University of Bozen-Bolzano
{icoman, asillitti}@unibz.it
Abstract
During a single development session, software developers often work on multiple tasks. Interruptions from the environment or higher priority tasks can cause the developer to switch to a new task even before completing the one at hand. While there are already several approaches aiming at assisting developers in recovering their contexts of previous tasks, they generally rely on the developer to identify the beginning of each task. We propose a new technique for automatically splitting a development session into task-related subsections, based on the interaction of the developer with the IDE. The technique also shows potential benefits for automatic concern detection and for suggestions for code investigation. We present the technique, the results of a study conducted for its initial validation, and we discuss the additional potential benefits under investigation.
1. Introduction
During a single development session, software developers often work on multiple tasks. Interruptions from the environment or higher priority tasks can cause the developer to switch to a new task before completing the one at hand. Studies on information workers, including software developers, show that many task switches (40%) are self-initiated, with almost half of them (19%) corresponding simply to moving on to a new task [4]. In many cases (41%), an interruption also causes a task switch, as the initial task is not resumed immediately after the interruption [3]. The main problem associated with interrupting a task is saving its context, and the main problem associated with resuming a previously interrupted task is recovering that context. For developers working on maintenance tasks, the context usually consists of the code locations that have been identified as relevant to the task. Techniques such as concern inference [6] and concept identification [5]
try to assist developers with the recovery of task context. Building the context itself is hard, as developers have to decide what code to investigate and remember which code is relevant. Various techniques offer developers suggestions on possibly relevant code [9], [8] or propose organizing visualizations of the code elements according to their perceived relevance [7]. The above techniques rely on the developer to identify the actual start of work on a specific task. Without knowing when a switch to another task occurs, such techniques might mix the contexts of two or more tasks, which would have a negative impact on the quality of their results. Moreover, Robillard and Murphy [10] acknowledge the identification of the actual start point of a task (after the initial code exploration) as a technical challenge for attempts at supporting task-aware software development environments. In this paper, we propose a new technique for automatically splitting development sessions into task-related subsections, based on a single basic type of event from the interaction history between the developer and her computer. This technique can be used as a preliminary step before applying a concern identification technique. It also shows potential in identifying locations of code that are perceived as relevant to each task and that could be used later by other developers in understanding the implementation of a solution to a particular task. Another possible future use of this technique is a novel approach to suggestions for code investigation. The paper is organized as follows: section 2 and its subsections describe our approach; section 3 and its subsections present the experiment conducted to validate our approach and the results obtained; section 4 and its subsections discuss the results and the potential additional benefits under investigation; section 5 reviews related work; finally, section 6 draws the conclusions and introduces future work.
2. Our approach
Our approach is based on the interaction history of the developer with the computer. An interaction history can contain many types of events, such as running specific IDE commands, editing, opening files, or navigating code dependencies. Our approach uses a single, basic type of event, namely the change of the method in focus. Thus, the interaction history consists of a stream of events reflecting the sequence in which the developer accessed various methods and the time spent interacting with each method before accessing another. Each event has three characteristics: the timestamp, the complete name of the method, and the duration in seconds. We use only this kind of event in an attempt to keep our approach as simple as possible and independent of the specific IDE or other software that the developer is using. We have two guiding hypotheses:
H1: The locations that are essential to solve a task (the core) are accessed more than other locations throughout the work on the task.
The work of developers on maintenance tasks consists of finding, understanding, and editing (modifying or adding) task-relevant code [1]. While the definition of what code is relevant to a task is subjective, there is nevertheless a set of locations that are essential for each task (e.g., those that have to be modified). We name this set of essential locations the core of a task. This notion is related to that of concerns [6] but it is more restrictive: there can be just one core for each task, while there can be several concerns for each task. In the process of finding the relevant code and understanding the relationships between information, the developer might visit many locations that are not even of interest for the actual task. However, developers access locations of no interest to the task only briefly.
Developers access locations of reduced interest to the task more, possibly repeatedly, and they access the core locations even more, intensively and repeatedly, throughout the task and especially when the actual required modifications are made.
H2: The core of a task contains several locations that are intensively accessed during approximately the same time.
We expect the core of real-world tasks to usually contain several locations. When the developer makes the modifications required to solve the task, she intensively accesses the core locations of that task. Thus, some of the time intervals
of intensive access (TIIAs) of these locations overlap to a large degree.
Based on the above hypotheses, we first identify the cores in a development session. Then, we use their time positions as an initial rough approximation of the time position of each subsection. Starting from these approximated points, the time frame of each subsection is built by grouping together methods with overlapping or close TIIAs. While in the present paper we consider only the interaction of the developer with the IDE, the approach could easily be extended to use the interaction history from outside the IDE as well. Outside the IDE, the events would represent a change of the focused window, while inside the IDE they represent a change of the focused method.
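The event stream described in this section can be sketched as follows. This is only an illustration under stated assumptions: the `FocusEvent` record, the `focus_intervals` helper, the method names, and the choice of expressing timestamps as seconds since the start of the session are all hypothetical and not prescribed by the approach.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusEvent:
    """One change of the focused method (field names are illustrative)."""
    timestamp: float  # assumed: seconds since session start, when focus was gained
    method: str       # complete (fully qualified) name of the method
    duration: float   # seconds the method stayed in focus


def focus_intervals(events):
    """Group events by method into time-ordered (start, end) focus intervals."""
    by_method = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_method[ev.method].append((ev.timestamp, ev.timestamp + ev.duration))
    return dict(by_method)


# Hypothetical session fragment: the developer leaves and returns to one method.
events = [
    FocusEvent(0.0, "Parser.parse", 10.0),
    FocusEvent(10.0, "Lexer.next", 5.0),
    FocusEvent(15.0, "Parser.parse", 20.0),
]
intervals = focus_intervals(events)
```

Grouping by method yields, for each method, the list of periods during which it was in focus, which is the only input the later steps need.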
2.1 Core identification
We identify cores as groups of methods with greatly overlapping TIIAs. We measure the accesses to a method based on the amount of time that a method is focused. To assess how intensively a developer accesses a method, we define a measure called degree of access (DOA). The DOA for a method m at a given time t and considering a first access to m at time t0 is the ratio between the access time (AT) and the total interval of time considered:
\[ \mathrm{DOA}(m, t) = \frac{\mathrm{AT}(m, t)}{t - t_0} \]
where AT is the amount of time that the method m is focused in the time interval between t0 and t. All time values are in seconds. By definition, DOA always takes values between 0 and 1. We consider that the TIIAs for a method m are the time intervals where DOA(m) > th, where th is a threshold. Figure 1 shows the DOA for one method, with a threshold of 0.2. There are two TIIAs for the method, namely the time intervals when DOA is greater than 0.2. Each increase of the DOA marks an actual access to the method. The longer the access, the more the DOA increases. Thus fewer, higher peaks of the DOA mean fewer but longer accesses (first TIIA), while more, smaller peaks of the DOA mean more, shorter accesses (second TIIA). Although the actual values of DOA at each point provide information on the length and number of accesses, for the purpose of core detection we use, as the TIIAs for a method, only the time frames when DOA is greater than the threshold th.
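The DOA and the resulting TIIAs can be sketched as follows. This is a minimal illustration, assuming that the focus periods of one method are given as sorted, non-overlapping (start, end) pairs in seconds and that each TIIA restarts t0 at the first access after the previous TIIA ended. The function names are hypothetical. The closed-form end point of a TIIA comes from solving AT / (t - t0) = th for t while the method is out of focus; it is a derivation from the formula, not a procedure stated in the text.

```python
def access_time(intervals, t0, t):
    """AT(m, t): seconds the method was in focus within [t0, t]."""
    return sum(max(0.0, min(end, t) - max(start, t0)) for start, end in intervals)


def doa(intervals, t0, t):
    """DOA(m, t) = AT(m, t) / (t - t0); taken as 1 at t = t0."""
    if t <= t0:
        return 1.0
    return access_time(intervals, t0, t) / (t - t0)


def tiias(intervals, th):
    """Return (start, end) intervals during which DOA stays above th.

    Assumes `intervals` is sorted and non-overlapping, and 0 < th <= 1.
    """
    result = []
    t0 = None      # start of the current TIIA (its first access)
    focused = 0.0  # access time accumulated since t0
    drop = None    # time at which DOA falls to th if focus is not regained
    for start, end in intervals:
        if t0 is not None and drop <= start:
            # DOA fell to the threshold before this access: close the TIIA.
            result.append((t0, drop))
            t0 = None
        if t0 is None:
            t0, focused = start, 0.0
        focused += end - start
        drop = t0 + focused / th  # solves focused / (t - t0) = th for t
    if t0 is not None:
        result.append((t0, drop))
    return result


# Hypothetical focus periods for one method, with the threshold of Figure 1.
periods = [(0, 10), (20, 30), (200, 210)]
found = tiias(periods, th=0.2)
```

With these hypothetical periods, the first two accesses fall into one TIIA that lasts until about t = 100, and the third access starts a second TIIA at t = 200.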
Figure 2. The height of a peak.
Figure 1. DOA for one method, with threshold 0.2.
The value of DOA for a method m at a point t > t0 is greatly influenced by the value chosen for t0. Each TIIA(m) has a different t0, namely the moment of the first access to the method m during that specific TIIA. Thus, at the beginning of each TIIA, DOA is always 1. During the TIIA, the DOA decreases when the method is not focused and increases when the method is focused again (Figure 1). As soon as the value of DOA(m,t) gets lower than the threshold th, (DOA(m,t)