Predicting Refactoring Activities via Time Series

Giuliano Antoniol, Massimiliano Di Penta and Ettore Merlo
[email protected] [email protected] [email protected]

RCOST - Research Centre on Software Technology, University of Sannio, Department of Engineering, Palazzo ex Poste, Via Traiano, 82100 Benevento, Italy
École Polytechnique de Montréal, Montréal, Canada

Abstract

Refactoring activities should be carefully planned in advance for different reasons: i) they can be expensive in terms of the resources needed; ii) refactoring is strongly needed when clones, code smells or other potential problems reach a given threshold; and iii) such activities should be performed when the evolution of the software system is in a stable period. This work proposes to apply time series forecasting to determine the future releases at which refactoring activities should be performed. The idea is to analyze different metrics (size and complexity, cloning ratio, number of smells, etc.) from past releases to predict the optimal refactoring point.

Keywords: refactoring, time series forecasting

1. Introduction

Software systems must evolve to cope with users' ever-changing needs [1]. Despite the general agreement that the cost of change interventions for maintenance and evolution purposes largely exceeds development costs, only a few contributions have addressed the problem of identifying and managing refactoring interventions to keep software system quality at an acceptable level. Indeed, changes and maintenance interventions are rarely planned in advance, and very often they are limited to bug fixing, with the undesired consequence of inevitably contributing to deteriorate the structure of the software system itself. Deterioration per se may be extremely difficult to define and to quantify; however, it can be measured in terms of different factors: from clones to code smells, from library sizes and organization to cohesion and coupling measures, and to the correct use of design patterns. In this paper we assume that, when such factors reach certain critical values, different forms of action should be taken to bring the software back to a more maintainable and evolvable state. Possible actions include, but are not limited to, clone detection and refactoring, code smell removal, library reorganization to reduce their size, and module and subsystem reorganization to increase cohesion while decreasing coupling. Interventions should also be performed to promote or limit the usage of design patterns: a limited use can be considered a symptom of poor design, while an excessive use can be the cause of overhead.

It is also worth pointing out that refactoring interventions have to be carefully planned ahead, taking into consideration the technical as well as the organizational and managerial implications of the different choices. For example, it is not always true that clone refactoring will contribute to improving the software system's maintainability and, above all, reliability: it could instead introduce new bugs or have an unpredicted impact [2]. All in all, refactoring activities are expensive and, if performed during a period of significant evolution for the system, they can be risky, especially in terms of system reliability. For example, performing refactoring right before an important release milestone should be avoided. As suggested in [3], refactoring activities should be performed during periods in which the software system's evolution in terms of size, complexity and functionality is not particularly significant. In other words, prior to any other activity, a preliminary phase of identifying suitable windows of opportunity to carry out refactoring is required. This in turn highlights the need for an organization to be able to track and predict its software evolution.

Based on the above considerations, we propose an approach, relying on Time Series (TS) forecasting, to plan refactoring activities. As shown in Figure 3, different types of metrics at different granularity levels are extracted from the current and the past versions of the monitored software system. For each metric, a TS is built and then used to predict future values. Such values can then be used to determine the window of opportunity and an optimal refactoring point, i.e., the release at which particular refactoring activities should be performed.
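As an illustration of this monitoring step, the following minimal sketch (in Python, hypothetical code rather than the authors' tooling) builds one series per metric from the release history, so that each series can later be fed to a forecasting model; the metric names and values are invented for the example.

from collections import defaultdict

# Hypothetical per-release measurements, as produced by metric extractors and
# clone/smell detectors run on each past release (the values are invented).
releases = [
    {"kloc": 310.2, "functions": 8120, "cloning_ratio": 0.11, "smells": 42},
    {"kloc": 318.9, "functions": 8233, "cloning_ratio": 0.12, "smells": 45},
    {"kloc": 325.4, "functions": 8301, "cloning_ratio": 0.13, "smells": 47},
]

# Build one time series per metric, ordered by release.
series = defaultdict(list)
for release in releases:
    for metric, value in release.items():
        series[metric].append(value)

# series["cloning_ratio"] is now a TS ready for model building and prediction.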
2. Time Series Forecasting

A TS is defined as a sequence of observations ordered along a single dimension, such as time. TS data often arise when monitoring industrial processes, business metrics and, in general, the temporal behavior of a phenomenon. One of the possible objectives in analyzing a TS is prediction: given an observed TS, one may want to predict its future values. This kind of analysis is used in many applications, such as economic forecasting, budgetary analysis, stock market analysis, process and quality control, census analysis, and many more.
Figure 1. Linux Kernel - MRE on KLOCs prediction (MRE versus release number, for 1, 5, and 10 steps ahead predictions, with the kernel series from 1.2 to 2.4 marked along the release axis).
There are many methods that can be used to model and forecast TS [4, 5]; some of the most widely adopted are:

- Box-Jenkins Auto Regressive Integrated Moving Average (ARIMA) models;
- Box-Jenkins Multivariate Models;
- Holt-Winters Exponential Smoothing; and
- Multivariate Autoregression.

TS may be affected by two main phenomena: non-stationarity (i.e., the presence of trends) and seasonality (i.e., periodic fluctuations). For our purpose, it is worth pointing out that, while TS related to size often exhibit a trend, the TS related to smells, clones or other characteristics involved in refactoring may exhibit seasonality. In fact, clones may tend to increase until refactoring is performed, then drop, and then tend to increase again. A TS prediction process can be thought of as a sequence of three steps: metrics extraction, model building, and future value prediction. First and foremost, the features to be predicted (i.e., metrics, but also the cloning ratio, the number of code smells, etc.) must be extracted from each (past) release of the software system. It is worth noting that the accuracy of the prediction is highly correlated with the size of the training set (i.e., the number of past releases used to build the model). Subsequently, once the TS has been modeled using one of the above mentioned models, it is possible to predict values $k$ steps ahead. To assess the model performance, given a TS of $N$ points, we can use the first $N - k$ points of the realization to build the model, predict the $k$ future values, and then match them against the actual values. The $k$-th step ahead predicted value is compared against the actual value in order to evaluate the $k$-th step ahead prediction error. Let $\hat{x}_{t+k}$ be the value predicted for time $t+k$ and $x_{t+k}$ its actual value; the Magnitude of Relative Error (MRE) is defined as:

$MRE_k = \frac{|x_{t+k} - \hat{x}_{t+k}|}{x_{t+k}}$  (1)

The Mean Magnitude of Relative Error (MMRE) over a sequence of $n$ experiments is defined as:

$MMRE = \frac{1}{n} \sum_{i=1}^{n} MRE_i$  (2)
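As a minimal illustration (not code from the original study), Eq. (1) and Eq. (2) can be computed directly from vectors of actual and predicted values, for example with numpy; the values in the usage example are invented.

import numpy as np

def mre(actual, predicted):
    # Magnitude of Relative Error, Eq. (1).
    return abs(actual - predicted) / actual

def mmre(actual, predicted):
    # Mean Magnitude of Relative Error over n experiments, Eq. (2).
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))

# Example: actual vs. k-steps-ahead predicted KLOCs (invented numbers).
print(mmre([330.0, 341.5, 350.2], [327.1, 344.0, 355.9]))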
Prediction errors are usually low enough that the predicted quantities are useful for planning and managing software projects [6]. As an example of the achievable accuracy, Figure 1 reports the MRE of the KLOCs prediction on a sequence of over 400 Linux Kernel releases, while MMRE values for different metrics and different numbers of steps ahead are shown in Figure 2. Results on predicting cloning evolution can be found in [7] (where mSQL cloning was predicted with a one-step-ahead MMRE of 3.8%), while examples of clone refactoring that occurred during the Linux Kernel evolution are shown in [8].
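As a purely illustrative example of the model building and prediction steps, the sketch below fits a Box-Jenkins ARIMA model to a synthetic per-release KLOC series using statsmodels, forecasts the last k releases, and scores the forecast with the MMRE of Eq. (2); both the synthetic series and the ARIMA order (1, 1, 1) are assumptions, not values from this paper.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic KLOC series: a noisy growth trend standing in for a release history.
rng = np.random.default_rng(0)
kloc = 100.0 + 2.5 * np.arange(120) + rng.normal(0.0, 3.0, 120)

k = 10                                          # steps ahead to evaluate
train, test = kloc[:-k], kloc[-k:]

fitted = ARIMA(train, order=(1, 1, 1)).fit()    # order chosen for illustration
predicted = fitted.forecast(steps=k)            # 1..k steps ahead forecasts

mre_per_step = np.abs(test - predicted) / test  # Eq. (1), one value per step
print("MMRE over %d steps ahead: %.2f%%" % (k, 100.0 * mre_per_step.mean()))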
Figure 2. Linux Kernel - metrics prediction MMREs (MMRE, in percent, versus the number of steps ahead, from 1 to 10, for LOCs, number of functions, and cyclomatic complexity).
Figure 3. Determining the refactoring point: metrics extraction and smells detection are applied to each past release (release 1 to release n) along the timeline; metrics and smells time series forecasting then produce estimated metrics and estimated smells for a future release n+k, which feed the refactoring decision and lead to the refactored release n+k.
3. The Refactoring Plan Indicators

As stated in the introduction and also shown in Figure 3, the refactoring point may be located by means of TS forecasting. In particular, two different aspects should be considered. First and foremost, forecasting on TS obtained from dimensional and structural metrics (LOCs, average cyclomatic complexity, number of functions/classes/modules) will help to determine a stable evolution region, i.e., a region where refactoring activities should take place [3]. Once a candidate refactoring point has been determined, forecasting of different software system characteristics will help to select the activities to be performed. In particular, we can classify the characteristics to be monitored as follows (a sketch of how these indicators could be combined into a refactoring decision is given after the list):

1. The presence of source code clones: when writing a device driver, porting an existing application to a new processor, or for any other reason, developers may decide to copy an entire working subsystem and then modify the code to cope with the new hardware. This technique ensures that their work will not have any unplanned effect on the original piece of code they have just copied. However, this evolution practice promotes the appearance of duplicated code snippets, also known as clones. The literature proposes various methods for identifying clones in a software system [9, 10, 11, 12]. Sometimes, as also shown in [8], clone refactoring activities are performed to improve the source code structure and, possibly, its maintainability. As a result, if refactoring is consistently performed, new clones tend to appear during software system evolution while old ones are refactored, and the overall cloning ratio tends to remain stable. TS forecasting approaches would help to analyze the cloning ratio evolution [7] and to suggest possible refactoring activities;

2. Source code smells: a large list of source code smells, i.e., portions of source code that can benefit from refactoring, has been extensively described in [3]. Examples are long methods, or the use of the same temporary variable for different purposes (i.e., in different assignments). Grant and Cordy [13] proposed a method for the automatic detection and removal of such smells. We believe that monitoring and predicting their evolution will help to identify when to apply refactoring transformations;

3. Library size, cohesion and coupling: when new functionalities are added, software system libraries often tend to increase in size, new objects are spread across the software system, and the coupling level between different modules/libraries tends to increase. As a consequence, the memory requirements of applications tend to increase. Increasing the used resources may pose serious problems when porting applications to limited-resource devices. Moreover, the maintainability of the overall system also deteriorates. This suggests periodically reorganizing libraries, for example splitting the biggest ones into smaller libraries [14], or clustering recently-added objects into new libraries (or adding them to existing ones). As a consequence, indicators such as the average size of libraries, but also cohesion and coupling levels, should be monitored and predicted for future releases;

4. The presence of design patterns: when monitoring the evolution of object-oriented software systems, it is worth investigating the presence of design patterns in design documents and source code [15]. The identification of design pattern instances provides insight into software artifact structure and reveals places where changes, reuse, or extensions are expected. Moreover, design pattern extraction can give maintainers a measure of source code/design quality and, finally, helps in the identification and extraction of components from existing software systems [16]. However, a trade-off should be pursued between the benefits of design patterns and the side effects of an excessive use of design patterns, which may be a cause of overhead [17].
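The following hypothetical decision rule, in the spirit of Figure 3, sketches how the forecast indicators above could be combined: a future release is flagged as a candidate refactoring point when the predicted size growth is small (stable evolution) while a predicted refactoring indicator exceeds its threshold. All thresholds and the example forecasts are illustrative assumptions, not values proposed in this paper.

def candidate_refactoring_points(kloc_forecast, cloning_ratio_forecast,
                                 smell_forecast, max_growth=0.02,
                                 clone_threshold=0.15, smell_threshold=50):
    """Return the steps ahead (1-based) that look like refactoring points."""
    points = []
    for i in range(1, len(kloc_forecast)):
        # Stable evolution: predicted relative size growth below max_growth.
        growth = (kloc_forecast[i] - kloc_forecast[i - 1]) / kloc_forecast[i - 1]
        stable = abs(growth) <= max_growth
        # Refactoring indicators: predicted cloning ratio or smell count high.
        indicators_high = (cloning_ratio_forecast[i] >= clone_threshold
                           or smell_forecast[i] >= smell_threshold)
        if stable and indicators_high:
            points.append(i)
    return points

# Example with invented forecasts for the next five releases: prints [1, 2].
print(candidate_refactoring_points(
    [400.0, 401.2, 402.0, 415.0, 416.1],
    [0.14, 0.16, 0.17, 0.12, 0.13],
    [44, 48, 53, 40, 41]))

In practice, the thresholds would have to be calibrated on the history of the monitored system and on the cost of the corresponding refactoring activities.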
4. Conclusions

This paper proposes to use TS forecasting to determine candidate refactoring points, i.e., releases at which refactoring should be performed. Such points can be determined from different factors, including the stability of the software system's evolution and the level of refactoring indicators such as the presence of clones and code smells, as well as library size, cohesion and coupling. Previous studies indicated the effectiveness of TS for predicting dimensional and structural metrics [18], and also the cloning level [7]. Work in progress is devoted to applying the approach to correlate software evolution with refactoring points, and to studying the evolution of characteristics such as code smells and design patterns.

References

[1] M. M. Lehman and L. A. Belady, Program Evolution: Processes of Software Change. Academic Press, London, 1985.

[2] J. Cordy, "Comprehending reality - practical barriers to industrial adoption of software maintenance automation," in Proceedings of the IEEE International Workshop on Program Comprehension, (Portland, OR, USA), pp. 196-205, May 2003.

[3] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.

[4] A. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, 1989.

[5] G. Box and G. Jenkins, Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA, USA, 1970.

[6] S. Vicinanza, T. Mukhopadhyay, and M. Prietula, "Software-effort estimation: an exploratory study of expert performance," Information Systems Research, vol. 2, pp. 243-262, Dec 1991.

[7] G. Antoniol, G. Casazza, M. Di Penta, and E. Merlo, "Modeling clones evolution through time series," in Proceedings of the IEEE International Conference on Software Maintenance, (Florence, Italy), pp. 273-280, Nov 2001.

[8] G. Antoniol, E. Merlo, U. Villano, and M. Di Penta, "Analyzing cloning evolution in the Linux Kernel," Information and Software Technology, vol. 44, pp. 755-765, Oct 2002.

[9] B. S. Baker, "On finding duplication and near-duplication in large software systems," in Proceedings of the IEEE Working Conference on Reverse Engineering, July 1995.

[10] I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier, "Clone detection using abstract syntax trees," in Proceedings of the IEEE International Conference on Software Maintenance, pp. 368-377, 1998.

[11] T. Kamiya, S. Kusumoto, and K. Inoue, "CCFinder: A multilinguistic token-based code clone detection system for large scale source code," IEEE Transactions on Software Engineering, vol. 28, pp. 654-670, July 2002.

[12] K. Kontogiannis, R. De Mori, R. Bernstein, M. Galler, and E. Merlo, "Pattern matching for clone and concept detection," Journal of Automated Software Engineering, March 1996.

[13] S. Grant and J. Cordy, "Automatic code smell detection by source transformation," in Proceedings of the IEEE Working Conference on Reverse Engineering, (Victoria, BC, Canada), IEEE CS Press, Nov 2003 (to appear).

[14] G. Antoniol and M. Di Penta, "Library miniaturization using static and dynamic information," in Proceedings of the IEEE International Conference on Software Maintenance, (Amsterdam, The Netherlands), pp. 235-244, IEEE Press, Sep 22-26 2003.

[15] G. Antoniol, G. Casazza, M. Di Penta, and R. Fiutem, "Object-oriented design patterns recovery," Journal of Systems and Software, no. 59, pp. 181-196, 2001.

[16] L. Tahvildari and K. Kontogiannis, "On the role of design patterns in quality-driven re-engineering," in European Conference on Software Maintenance and Reengineering, (Budapest, Hungary), pp. 37-46, Mar 2002.

[17] P. Wendorff, "Assessment of design patterns during software reengineering: Lessons learned from a large commercial project," in European Conference on Software Maintenance and Reengineering, (Lisbon, Portugal), IEEE Press, Mar 2001.

[18] F. Caprio, G. Casazza, M. Di Penta, and U. Villano, "Measuring and predicting the Linux kernel evolution," in Workshop on Empirical Studies of Software Maintenance, Nov 2001.