Untangling Development Tasks with Software Developer’s Activity
Martin Konopka, Pavol Navrat
Slovak University of Technology in Bratislava
Faculty of Informatics and Information Technologies
Ilkovičova 2, 842 16 Bratislava, Slovakia
[email protected],
[email protected]
Abstract—A combination of several activities is required to solve a development task, but in the end the developer reports only part of it. It is then difficult to tell whether all committed files were changed for the reason given in the commit description. Software developers work on multiple tasks at once and often fail to separate them into individual commits, either because they are unaware of the tangling or because of limitations of current source code versioning tools. Our idea is to address this problem by identifying a software developer’s activities from a stream of interaction data in real time. We attempt to identify situations when a developer has worked on multiple tasks, in order to prevent him from tangling them in a single commit, or to help him separate certain activities from the task, e.g., floss refactoring.
Index Terms—Tangled change, composite change, developer activity, interaction data, task context, code change, code review.
I. TANGLED CHANGES
Software development tasks vary from simple ones to complex solutions, e.g., from one-line changes to refactoring a whole codebase. Because developers switch between tasks or get interrupted during their work, the following scenarios occur:
• Implementation of a new feature requires the developer to refactor existing code (floss refactoring).
• The developer fixes a bug, knowingly or unknowingly, while working on a different task.
• The developer begins to work on a task with uncommitted and possibly forgotten changes still present in the code.
• The developer completes multiple tasks at once but does not separate them into individual commits.
In many situations, a software developer solves multiple tasks successively and then tangles them together in a single commit to a revision control system (RCS), describing it with only one task, which is (more or less arbitrarily) considered to be the main one. Unfortunately, developers do not go back to manually split up old commits with tangled changes, which negatively affects the maintainability of source code [6], e.g., when merging changes between development branches. Untangling changes that have already been made is difficult without additional information about the developer’s intent during his work. In this paper we present the idea of an approach that identifies tangled development tasks using development activities and splits them up when a developer attempts to commit them together, or even separates certain activities into their own commits.
II. RELATED WORK
A software developer performs a combination of activities to accomplish a development task [6], such as studying code, adding a new feature, refactoring, or testing. He does so through interactions in an integrated development environment (IDE) and other tools [1], such as opening a source code file, navigating between files, or hitting a breakpoint. Monitoring a developer’s interactions provides fine-grained data about his tasks [1, 4, 5]. The Mylyn project [4] provides contextual information about a solved task, which is helpful when returning to work. However, multiple tasks are often tangled together (even unintentionally) and not clearly distinguished with separate commits [6, 8]. One example is floss refactoring, i.e., task-independent refactoring performed within the solution of a task [3, 8].
Existing approaches mine source code repositories to identify tangled changes [3], even though commits that have already been made lack contextual information about the tangled tasks. We think that untangling development tasks should be done before a commit is made. At the same time, developers fail to do so by themselves, as they often do not remember the reason for every change when submitting a commit [8]. Automatic identification of their activities [7] from interactions [1, 4], together with mapping these activities to tasks, may help to solve this problem.
Machine learning methods, such as Hidden Markov Models [7], are often used for identifying a developer’s activities [7] or development sessions [6] from interaction data [5]. However, we are not aware of work that maps a developer’s interactions to activities and, in turn, activities to the corresponding tasks. Our hypothesis is that this is possible, even incrementally in real time and not only by processing a static data snapshot. We are exploring this line of research.

III. UNEARTHING DEVELOPMENT TASKS
Tangled changes in code are nevertheless activities that a developer duly performed but failed to record in a commit message or when describing a task solution. Consequently, they are hard to find in commit messages or descriptions when tasks overlapped during the developer’s work. To distinguish activities, we base our approach on automatic processing of interaction data.
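A minimal sketch of what such processing of interaction data might look like is given here for illustration only. It is not the method proposed in this paper: it replaces the learned models under consideration (e.g., Hidden Markov Models [7]) with a simple rule-based stand-in. The event type names, the `ACTIVITY_OF_EVENT` mapping, the `max_gap` threshold, and the file names are all hypothetical assumptions. The sketch groups a time-ordered stream of interaction events into activities, starting a new activity whenever the coarse activity label changes or the pause between two events exceeds the threshold.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from interaction event types to coarse activity labels.
# Illustrative only; a real system would more likely learn this from data.
ACTIVITY_OF_EVENT = {
    "open_file": "Study",
    "switch_file": "Study",
    "edit": "Edit",
    "breakpoint_hit": "Debug",
    "run_test": "Test",
}


@dataclass
class Interaction:
    timestamp: float  # seconds since the start of recording
    event_type: str   # e.g. "open_file", "edit" (assumed names)
    file: str         # affected source code file


@dataclass
class Activity:
    label: str
    interactions: List[Interaction]

    @property
    def files(self) -> List[str]:
        """Sorted list of distinct files touched during this activity."""
        return sorted({i.file for i in self.interactions})


def segment(events: List[Interaction], max_gap: float = 300.0) -> List[Activity]:
    """Group an event stream into activities.

    A new activity starts when the activity label changes or when the pause
    since the previous event exceeds max_gap seconds (an assumed threshold).
    """
    activities: List[Activity] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        label = ACTIVITY_OF_EVENT.get(ev.event_type, "Other")
        last = activities[-1] if activities else None
        if (
            last is None
            or last.label != label
            or ev.timestamp - last.interactions[-1].timestamp > max_gap
        ):
            activities.append(Activity(label, [ev]))
        else:
            last.interactions.append(ev)
    return activities


# Hypothetical event stream for demonstration.
events = [
    Interaction(0, "open_file", "Socket.java"),
    Interaction(20, "switch_file", "RestClient.java"),
    Interaction(45, "edit", "RestClient.java"),
    Interaction(70, "edit", "RestClient.java"),
    Interaction(900, "edit", "Parser.java"),
    Interaction(930, "breakpoint_hit", "Parser.java"),
]
activities = segment(events)
for a in activities:
    print(a.label, a.files)
```

In this sketch, the two consecutive runs of edit events are kept apart only because of the long pause between them. Timestamp differences, event types, and affected source code entities are the kinds of properties that could also feed a learned model in place of these fixed rules.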
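[Fig. 1 diagram not reproduced. It shows, along a time axis, three tasks (A: Change sockets to REST, B: Add new feature, C: Fix a bug), the activities performed for them (Refactoring, Debug, Edit, Study), and the underlying interactions, source code files, and commits.]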
Fig. 1. Example of de-structuring of developer’s work in time and separating source code changes to commits per development tasks or activities.

We de-structure software development into the following hierarchy:
• Tasks – high-level work assignments and results, e.g., new feature, bug fix, root canal refactoring.
• Activities – sets of interactions required to accomplish a task [6], e.g., studying code, adding new code, debugging, or testing a new implementation.
• Interactions – low-level interaction events recorded in an IDE [4, 5] or other applications [1], e.g., open a source code file, switch to another file, compile project, run test case, breakpoint hit.
Tasks may be decomposed top-down into activities and their interactions (Fig. 1). Our goal is to automatically identify activities from interactions, and then to find patterns of activities from which tasks can be composed bottom-up. We classify activities hierarchically using ontologies [5]; e.g., the code being studied may be the developer’s own source code, code he has never seen before, or his own source code that was later modified by another developer. Selected properties of interaction data may be used for distinguishing between activities, e.g., differences in timestamps of interaction events, affected source code entities, or different event types. Hidden Markov Models are a viable option for this problem [7], but we are looking into their incremental learning variants. We also consider representing interaction data with metrics as time series [5] and reasoning upon them in real time, e.g., the number of lines changed within a certain time window or the number of visited source code documents.

IV. UNTANGLING CHANGES BEFORE COMMITTING
We consider it more suitable to inform a developer about his attempt to tangle changes when he is preparing a commit than to analyze the commit history in an RCS later. Our idea of such a method consists of two parts:
• Identification of development activities from interaction events in the IDE and RCS, and unearthing development tasks from these activities in real time.
• Intercepting the developer’s attempt to commit all changes in a single commit and suggesting separate commits instead, if separate tasks or activities were identified.
Commit suggestions are generated from the activities that affected not yet committed changes; see Fig. 1 for an example with 3 tasks. The developer may revise the presented suggestions and then commit them to an RCS. We also take into account that multiple changes of the same source code file may belong to different commits, e.g., when the same file was changed at different times. Optionally, a developer can use the types of identified activities to describe a suggested commit, e.g., “added new feature” or “fixed bug #427”.

V. DISCUSSION AND FURTHER WORK
Monitoring a developer’s activity by recording interaction events in an IDE is a rich source of information about the developer’s work. We are at the beginning of our work on the proposed approach for untangling tangled changes. In comparison to existing works focusing on this problem, we attempt to identify overlapping tasks during the developer’s work in real time and to aid him, before he commits them together, with suggestions for a possible separation of changes. We see two viable forms of evaluation: comparison with existing approaches on data for which both interaction data and commit history are available, and manual evaluation of commit suggestions by the developers themselves. We also monitor over 20 software developers in a medium-sized software company, as our work is part of the research project PerConIK – Personalized Conveying of Information and Knowledge [1].

ACKNOWLEDGMENT
This work was partially supported by the Scientific Grant Agency of Slovakia, grants No. VG 1/0752/14 and VG 1/0646/15, and it is the partial result of the Research & Development Operational Programme for the project PerConIK, ITMS 26240220039, co-funded by the ERDF.

REFERENCES
[1] M. Bieliková, I. Polášek, M. Barla, E. Kuric, K. Rástočný, J. Tvarožek, P. Lacko, “Platform independent software development monitoring: Design of an architecture,” in Proc. of SOFSEM 2014, Springer-Verlag, 2014, pp. 126-137.
[2] T. Fritz, D.C. Shepherd, K. Kevic, W. Snipes, C. Bräunlich, “Developer’s code context models for change tasks,” in Proc. of FSE 2014, ACM, 2014, pp. 7-18.
[3] K. Herzig, A. Zeller, “The impact of tangled code changes,” in Proc. of MSR ’13, IEEE, 2013, pp. 121-130.
[4] M. Kersten, G.C. Murphy, “Using task context to improve programmer productivity,” in Proc. of SIGSOFT ’06/FSE-14, ACM, 2006, pp. 1-11.
[5] W. Maalej, T. Fritz, R. Robbes, “Collecting and processing interaction data for recommendation systems,” in Recommendation Systems in Software Engineering, M.P. Robillard, W. Maalej, R.J. Walker, T. Zimmermann, Eds. Springer Berlin Heidelberg, 2014, pp. 173-197.
[6] R. Robbes, M. Lanza, “Characterizing and understanding development sessions,” in Proc. of ICPC ’07, IEEE, 2007, pp. 155-166.
[7] T. Roehm, W. Maalej, “Automatically detecting developer activities and problems in software development work,” in Proc. of ICSE 2012, IEEE, 2012, pp. 1261-1264.
[8] Y. Tao, Y. Dang, T. Xie, “How do software engineers understand code changes?,” in Proc. of FSE ’12, ACM, 2012, article no. 51, 11 p.