
2nd EUROPEAN COMPUTING CONFERENCE (ECC’08) Malta, September 11-13, 2008

ISSN: 1790-5109

ISBN: 978-960-474-002-4

Code Smell Eradication and Associated Refactoring

H. HAMZA, S. COUNSELL AND T. HALL
Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK

G. LOIZOU
Department of Computer Science and Information Systems, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK

Abstract: - A refactoring may use many other refactorings to implement its mechanics and the dependencies thus produced form a nested ‘chain’ of other refactorings. In this paper, we propose an approach which provides a coarse guide to the effort required in eradicating code ‘smells’. The approach is based on the quantitative analysis of the dependencies between two sets of code smells, namely, those of Kerievsky and Fowler. A bespoke tool was developed to extract the required dependency information. Results suggest that some code smells require a considerably larger effort to remedy than others, implying that, where possible, developers should avoid eradicating these smells in favour of other code smells which can be eliminated relatively easily. A clear difference in composition between the smells of Kerievsky and those of Fowler was also observed.

Key-Words: Refactoring, chain dependencies, code smells, Fowler, Kerievsky.

1 Introduction
Refactoring is an emerging and important technique in software engineering and seeks to improve the design of code by making it less complex and more maintainable without changing its external behaviour [4, 5, 6, 15]. A refactoring operation is a time-consuming process when applied manually and, while many tools have been proposed to automate the refactoring process, many refactorings have yet to be fully automated [16]. The complexity of a refactoring can be identified in many ways. One such way is through the length and number of chains it generates. A refactoring chain was first defined by Counsell et al. [2] as the number of refactorings that could be sequentially implemented as part of a higher-level refactoring. In other words, each refactoring may induce many further related refactorings as part of its mechanics. For example, the ‘Extract Hierarchy’ refactoring that re-organises the inheritance hierarchy requires consideration of the ‘Extract Class’, ‘Replace Constructor with Factory Method’ and ‘Extract Method’ refactorings [6]. Each of these three in turn may require other refactorings to be considered, which, in turn, may require still further related refactorings.

When time and budget are limited, a developer has to choose the most efficient and effective set of refactorings and to apply those refactorings to the parts of the system most in need of them. Often, the best guide to the developer on where and when to apply refactorings is through identification and then eradication of ‘code smells’ [6], a term used by Fowler to describe code that screams out to be refactored. However, an open research question at present is how much effort is required to eliminate code smells, given that eradicating a code smell requires a number of refactoring chains to be followed. Other things remaining equal, a developer should, in theory, choose to eradicate smells that induce the smallest number of refactoring chains and hence avoid smells inducing large numbers of refactoring chains. In this paper, we explore the chain size of the twelve code smells of Kerievsky and the twenty-two code smells of Fowler as a basis for our analysis and for informing decisions on which smells are less ‘bad’ or ‘smelly’ than others.

2 Related Work
Seminal works on refactoring were the PhD work of Opdyke [15] and the main text of Fowler [6]. The text from which Kerievsky’s code smells are taken appeared significantly later [8]. A full survey of recent refactoring research can be found in [12]. Many studies have theoretically and empirically evaluated Fowler’s refactorings and many issues have been discussed in terms of their implementation simplicity and complexity. Advani et al. [1] showed that there are certain ‘core’ refactorings that are frequently used by developers for two reasons: firstly, their simplicity and, secondly, their low dependency on other refactorings. Additionally, these core refactorings appeared to be used frequently in remedying many of the code smells defined in Fowler’s text [3]. In this paper, we analyse the dependencies between refactorings required to eradicate code smells [11]; so, we are effectively analysing ‘uses’ relationships between refactorings.

Counsell et al. [2] analysed the complete relationships between Fowler’s 72 refactorings theoretically in terms of ‘uses’ or ‘used by’ relationships as a basis of their analysis. The study showed that a subset of those 72 refactorings were predominantly structural-based refactorings and generated longer chains than less complex refactorings. From a testing perspective, ‘test chains’ can also be generated, related to the dependencies of test requirements in the mechanics of the refactorings. In other words, we can generate the likely test requirements and coarse-grained effort for each refactoring. Code smells also have test chains, which therefore reflect the amount of test effort required for their eradication.

An empirical evaluation of code smells was undertaken by Mäntylä and Lassenius [9], who studied industrial developers and their opinions of smells in evolving structures. They also describe mechanisms for making refactoring decisions in subsequent work [10]. Finally, Munro [13] describes a set of product metrics that can be used to guide the developer to bad smells in code; a tool was developed to aid this process.
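To make the notion of a ‘uses’ relationship and of a refactoring chain more concrete, the following is a minimal Python sketch of how chains might be enumerated from a dependency graph. It is an illustration under stated assumptions, not the tool used in this study. Only the ‘Extract Hierarchy’ entry is taken from the text; the other graph entries, and the names `USES`, `chains` and `induced_references`, are hypothetical.

```python
from typing import Dict, List, Tuple

# 'uses' relation: refactoring -> refactorings its mechanics may call on.
# Only the Extract Hierarchy entry comes from the text; the remaining
# entries are hypothetical placeholders for illustration.
USES: Dict[str, List[str]] = {
    "Extract Hierarchy": [
        "Extract Class",
        "Replace Constructor with Factory Method",
        "Extract Method",
    ],
    "Extract Class": ["Move Field", "Move Method"],  # hypothetical
    "Move Method": ["Extract Method"],               # hypothetical
}


def chains(root: str, uses: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Return every root-to-leaf chain reachable from `root`."""
    found: List[Tuple[str, ...]] = []

    def walk(node: str, path: Tuple[str, ...]) -> None:
        path = path + (node,)
        # Ignore refactorings already on this path so that cycles terminate.
        children = [c for c in uses.get(node, []) if c not in path]
        if not children:
            found.append(path)
            return
        for child in children:
            walk(child, path)

    walk(root, ())
    return found


def induced_references(
    root: str, uses: Dict[str, List[str]], path: Tuple[str, ...] = ()
) -> int:
    """Count every reference to another refactoring in the chains of `root`.

    A refactoring reached along two different chains is counted twice,
    mirroring a tree-style expansion of the dependency graph.
    """
    path = path + (root,)
    children = [c for c in uses.get(root, []) if c not in path]
    return sum(1 + induced_references(c, uses, path) for c in children)


if __name__ == "__main__":
    for chain in chains("Extract Hierarchy", USES):
        print(" -> ".join(chain))
    print(induced_references("Extract Hierarchy", USES))
```

With the placeholder graph above, the sketch yields four chains and six induced references for ‘Extract Hierarchy’. Counting a refactoring once per chain, instead of once overall, is a design choice made here to approximate a worst-case effort estimate.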

3 Data Analysis
To inform our analysis, a bespoke software tool was developed to generate the refactoring chains for each of the twelve code smells of Kerievsky and the twenty-two code smells listed by Fowler. Figure 1 shows the tool interface and the references made by a selected code smell (in this case the ‘Long Method’ smell). A Long Method, containing many lines of code and possibly complex structures, should be avoided from a maintenance and comprehension point of view; any such method should be decomposed. From Figure 1, each Kerievsky smell selected by the user shows both the main Kerievsky refactorings needed to remedy this code smell and the number of Kerievsky (K_REF) and Fowler refactorings (F_REFS) used in the generated chains in order to eradicate the smell in question. The algorithm for generating the refactorings was based on a recursive tree-based search of all dependencies between the seventy-two refactorings of Fowler and Kerievsky in their respective texts. In the subsequent analysis, we use the shortened term ‘smell’ to denote a ‘code smell’ for conciseness; we also use the term ‘refactorings induced by smell X’ to describe the total number of refactorings in all the chains of smell X.

Fig. 1: Tool interface

Table 1 shows the twelve code smells of Kerievsky with an indication (asterisked) of which of those are also proposed by Fowler.

Smell   Name
1*      Alternative Classes with Different Interfaces
2       Combinatorial Explosion
3       Conditional Complexity
4*      Duplicated Code
5       Indecent Exposure
6*      Large Class
7*      Lazy Class
8*      Long Method
9       Oddball Solution
10*     Primitive Obsession
11      Solution Sprawl
12*     Switch Statements

Table 1. Kerievsky Code Smells

We note that while those code smells asterisked in Table 1 have the same name as those of Fowler, the individual refactorings that they each use are different. This reflects the strong emphasis of Kerievsky on patterns [8]. Table 2 shows the fifteen of Fowler’s 22 code smells that do not overlap with any in Table 1.

Smell   Name
1       Comments
2       Data Class
3       Data Clumps
4       Divergent Change
5       Feature Envy
6       Inappropriate Intimacy
7       Incomplete Library Class
8       Long Parameter List
9       Message Chains
10      Middle Man
11      Parallel Inheritance Hierarchies
12      Refused Bequest
13      Shotgun Surgery
14      Speculative Generality
15      Temporary Field

Table 2. Unique Fowler Smells

We first investigate the Kerievsky set of smells.

3.1 Refactorings induced by Kerievsky’s Smells
Each of the code smells in Table 1 may require a combination of refactorings drawn from: firstly, those proposed by Kerievsky; secondly, those proposed by Fowler (Kerievsky draws heavily on Fowler’s refactorings); and thirdly, the application and use of design patterns [7]. Figure 2 shows the sum total of Fowler refactorings induced by the twelve smells after generating all chains for the smells in Table 1. For example, generating all possible chains for code smell 3 (Conditional Complexity) induced 122 references to Fowler’s refactorings. The Conditional Complexity smell arises when there are large blocks of logic embedded in a class; these blocks should be simplified (possibly with the use of a design pattern).

Fig. 2: Fowler Refactorings (Kerievsky Smells) (bar chart; x-axis: Code Smell, y-axis: Fowler Refactorings)

Also from Figure 2, smell 10 (Primitive Obsession) induces a total of 200 Fowler refactorings in its complete chains, incorporating 12 Kerievsky refactorings. The Primitive Obsession smell occurs when a developer relies excessively on the use of primitive data types without thinking of using small objects instead. One reason this smell induced so many refactorings is the large number of relatively complex refactorings that it uses. Table 3 shows the main set of 7 refactorings that Kerievsky suggests need to be applied in order to remedy the Primitive Obsession smell.

Refactoring
Encapsulate Composite
Move Embellishment to Decorator
Replace Conditional Logic with Strategy
Replace Implicit Language with Interpreter
Replace Implicit Tree with Composite
Replace State-Altering Conditionals with State
Replace Type Code with Class

Table 3. Primitive Obsession Refactorings

The seven refactorings in Table 3 are the most that any of the code smells in Table 1 induce. The same smell also induces six design patterns, suggesting that undertaking the eradication of this smell would be both effort-intensive and time-consuming. Design patterns typically require a deep understanding of the structure of the code, and achieving the final pattern implementation requires significant change to the code. Again from Figure 2, the Duplicated Code and Large Class smells each induce a total of 163 Fowler refactorings, which again gives a reasonably good impression of how much effort would be needed to remedy these smells. The Duplicated Code smell occurs when the same code appears in more than one place and, as such, represents duplication and potential stored-up, future maintenance problems. The Large Class bad smell occurs when the class is trying to do too much (i.e., either it has too many instance variables or too many methods, or both). The class should be decomposed to make it less complex and, in theory, more maintainable as a result.

From Figure 2, we also see that the simplest of Kerievsky’s code smells to remedy are Alternative Classes with Different Interfaces (when methods have the same name but different signatures), Indecent Exposure (when a class is revealing its internal data), Lazy Class (when a class does not do much; see also Section 3.2), Oddball Solution (when a strange solution to a problem has been implemented) and Solution Sprawl (when too many classes are being used to solve a simple problem). The evidence would suggest that these are relatively simple smells to eradicate. Table 4 shows the number of main refactorings required by each of these 5 code smells.

Smell   Smell Name                                      Refactoring Name
1       Alternative Classes with Different Interfaces   Unify Interfaces with Adapter
2       Indecent Exposure                               Encapsulate Classes with Factory
3       Lazy Class                                      Inline Singleton
4       Oddball Solution                                Unify Interfaces with Adapter
5       Solution Sprawl                                 Move Creation Knowledge to Factory

Table 4. Smells requiring least number of refactorings

Each of the smells in Table 4 requires just a single refactoring to be completed (this further explains why they are relatively easy smells to eradicate). Figure 3 shows the number of complete Kerievsky refactorings generated by the same twelve code smells. Figure 3 is interesting as the number of Kerievsky refactorings mirrors the number of Fowler refactorings in Figure 2, but to a far lesser extent. In other words, to eradicate a Kerievsky-based smell, a significantly larger number of Fowler’s refactorings are required on average. This makes Fowler’s code smells far more attractive to use.

Fig. 3: Kerievsky Refactorings (Kerievsky Smells) (bar chart; x-axis: Code Smell, y-axis: Kerievsky Refactorings)

The code smell that induces the most refactorings is code smell 4, Duplicated Code (13 references to Kerievsky refactorings). Two of the code smells (Conditional Complexity and Primitive Obsession) each induce 12 references to Kerievsky’s refactorings. Table 5 shows the summary data for the twelve smells. It indicates that on average a smell will induce 77 Fowler refactorings. In other words, if we want to remedy any of the code smells in Table 1, we might expect to have to apply a large number of those refactorings, depending on the chain of refactorings we follow. The chain we actually follow is determined by the mechanics of each individual refactoring and the extent of the ‘smell’ in the code. In other words, if the smell is very ‘bad’ we would expect to apply more refactorings than if the smell was relatively ‘nicer’.

Refactoring    Min   Max   Mean    S.D.
Fowler’s       0     200   77.00   74.89
Kerievsky’s    1     13    5.50    5.09

Table 5. Summary data of Kerievsky’s smells

The standard deviation and mean for Fowler refactorings are very high while the standard deviation for Kerievsky’s refactorings is relatively small (5.09). In other words, the code smells proposed by Kerievsky depend heavily on Fowler’s refactorings.

3.2 Fowler code smells complete chains
Each of the code smells in Table 2 requires refactorings drawn from those proposed by Fowler. In contrast to the Kerievsky code smells, Fowler’s do not explore the use of design patterns (and since Fowler’s refactorings predate Kerievsky’s refactorings, do not use any of his refactorings). Figure 4 shows the refactorings induced by the 15 smells shown in Table 2. The highest number of refactorings (17) belongs to the Speculative Generality code smell. This smell arises when code contains features that do not really need to be there and have only been introduced in case of future requirements. Such code should be removed. The code smell with 15 refactorings is the Lazy Class bad smell; this smell arises when a class exists, but does not serve a particularly useful purpose, in which case it should be removed from the system and/or amalgamated with another class. The smell inducing 13 refactorings is the Comments bad smell, which arises when there are inappropriate, unnecessary or overly complex comments in code; these should be either changed or removed.

Fig. 4: Refactorings for Fowler’s Smells (bar chart; x-axis: Code Smell, y-axis: Fowler Refactorings)
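As a minimal sketch of how counts of this kind could inform the choice of which smell to tackle first, the snippet below orders a handful of smells by the number of refactorings they induce, cheapest first. The figures are those quoted in Sections 3.1 and 3.2; the data layout and the function name `cheapest_first` are illustrative assumptions and are not part of the tool described in this paper.

```python
from typing import List, Tuple

# (catalogue of the smell, smell name, induced refactorings) as quoted in
# Sections 3.1 and 3.2. For Kerievsky's smells the figure is the number of
# induced Fowler refactorings.
INDUCED: List[Tuple[str, str, int]] = [
    ("Kerievsky", "Primitive Obsession", 200),
    ("Kerievsky", "Duplicated Code", 163),
    ("Kerievsky", "Large Class", 163),
    ("Kerievsky", "Conditional Complexity", 122),
    ("Fowler", "Speculative Generality", 17),
    ("Fowler", "Lazy Class", 15),
    ("Fowler", "Comments", 13),
]


def cheapest_first(rows: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int]]:
    """Order smells by induced refactorings, breaking ties by smell name."""
    return sorted(rows, key=lambda row: (row[2], row[1]))


if __name__ == "__main__":
    for catalogue, smell, count in cheapest_first(INDUCED):
        print(f"{count:>4}  {smell} ({catalogue})")
```

Such an ordering reflects only the coarse, worst-case counts; it ignores how severe the smell actually is in a given code base.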

It is notable that the smells that seem to induce the most refactorings are those that on the face of it are conceptually the simplest. Figure 5 shows the Fowler refactorings used by the seven smells that overlap with the twelve of Kerievsky. The seven smells are 1) Alternative Classes with Different Interfaces, 2) Duplicated Code, 3) Large Class, 4) Lazy Class, 5) Long Method, 6) Primitive Obsession and 7) Switch Statements. From Figure 5, we see that the Large Class code smell produces around 40 Fowler refactorings and this gives a fair idea of how much effort will be needed to remedy this smell. It is interesting that the Large Class smell of Kerievsky also induced around 160 Fowler refactorings (Figure 2). Eradicating the Large Class code smell is therefore likely to require a significant amount of refactoring effort; Large Class should come with a warning to the developer.

Fig. 5. Seven smells of Fowler (bar chart; x-axis: Code Smell, y-axis: Fowler Refactorings)

The code smell inducing 25 refactorings is Duplicated Code, and it is also interesting that the corresponding code smell of Kerievsky generated 163 references to Fowler refactorings (Figure 2). This is also a smell that should be avoided because of the relatively large number of refactorings it induces. Table 6 shows the summary data for the Fowler code smells. Since the eradication of Fowler’s code smells does not use any of Kerievsky’s refactorings, we enter a null value in each column of the table in row 2.

Refactoring    Min   Max   Mean    S.D.
Fowler’s       1     40    11.95   9.48
Kerievsky’s    -     -     -       -

Table 6. Summary Data of Fowler’s Code Smells

We note that the mean and standard deviation values are significantly lower than those in Table 5. Figure 6 shows the total number of Fowler’s refactorings induced for the corresponding smells of Kerievsky and shows a significant increase on those of Fowler (Figure 5).

Fig. 6: Same 7 smells (Kerievsky) (bar chart; x-axis: Code Smell, y-axis: Fowler Refactorings)

The conclusion we can draw from this is that the seven Kerievsky smells corresponding to Fowler’s are likely to be more expensive in time and effort to eradicate than those of Fowler. We might have expected this result, not least because Kerievsky’s code smells often induce a set of design pattern refactorings as well as relatively large numbers of refactorings.

3.3 Discussion

One question that arises from the preceding analysis is: which of Fowler’s refactorings are used most frequently to remedy code smells? A notable feature of the data is the role that the ‘core’ refactorings identified in [3] play in the elimination of code smells (we focus on Fowler’s 22 code smells). Inspection of the set of refactorings required to remedy each of the twenty-two smells revealed that the Fowler refactorings used most often by those smells are Extract Class (6 times), Move Method (6 times), Extract Method (4 times), and Move Field (4 times). Thirty-seven different Fowler refactorings are used to remedy the 22 code smells, emphasising the role that these four refactorings play in the eradication process. In fact, only five of the 22 code smells are without at least one of these four refactorings, further emphasising the importance of these refactorings to the smell eradication process.

The results in this paper also raise a number of discussion points and pose a number of threats to validity. Firstly, we have assumed that the code smell that a developer will choose to remedy is always based on the potential number of refactorings that would be required to be completed. In reality, it may simply be necessity that drives a developer to eradicate code smells. The amount of work that is involved may be a secondary consideration. Secondly, we have taken the total number of refactoring dependencies as a basis of our analysis and therefore the worst case scenario for a developer. We have to consider the possibility that many of the refactorings that we have stated are related to a code smell may either perfect a refactoring or be inapplicable in certain circumstances; their application would therefore be optional. Finally, we have to accept that time is a limiting factor for developers [6] and that eradication of code smells might have low developer priority in preference to regular maintenance. Developers might take short-cuts or adopt different practices for tackling code smells.
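The frequency analysis of ‘core’ refactorings described in this discussion could be reproduced along the following lines. This is a hedged sketch only: the mapping from smells to refactorings is a small, assumed subset written in the spirit of Fowler’s catalogue and is not the data set used in this study, so its counts will not match the figures reported above. The names `SMELL_TO_REFACTORINGS`, `usage_counts` and `smells_without` are hypothetical; the four core refactorings are those named in the text.

```python
from collections import Counter
from typing import Dict, List, Set

# Assumed, partial mapping of smells to the refactorings used to remedy
# them. Illustrative only; it is not the authors' data set.
SMELL_TO_REFACTORINGS: Dict[str, List[str]] = {
    "Long Method": ["Extract Method", "Replace Temp with Query"],
    "Large Class": ["Extract Class", "Extract Subclass"],
    "Divergent Change": ["Extract Class"],
    "Shotgun Surgery": ["Move Method", "Move Field", "Inline Class"],
    "Feature Envy": ["Move Method", "Extract Method"],
    "Lazy Class": ["Inline Class", "Collapse Hierarchy"],
}

# The four 'core' refactorings named in the discussion.
CORE: Set[str] = {"Extract Class", "Move Method", "Extract Method", "Move Field"}


def usage_counts(mapping: Dict[str, List[str]]) -> Counter:
    """Count, for each refactoring, how many smells use it."""
    counts: Counter = Counter()
    for refactorings in mapping.values():
        counts.update(set(refactorings))  # count each smell at most once
    return counts


def smells_without(mapping: Dict[str, List[str]], core: Set[str]) -> List[str]:
    """Return the smells that use none of the core refactorings."""
    return sorted(s for s, refs in mapping.items() if not core & set(refs))


if __name__ == "__main__":
    counts = usage_counts(SMELL_TO_REFACTORINGS)
    print(counts.most_common())
    print(len(counts), "different refactorings used")
    print(smells_without(SMELL_TO_REFACTORINGS, CORE))
```

Applied to a complete mapping for all 22 smells, the same three quantities would correspond to the usage counts, the number of different refactorings, and the number of smells without a core refactoring discussed above.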

4 Conclusions and Further Work
In this paper, we analysed the dependencies between Kerievsky’s refactorings and those proposed by Fowler in the context of code smells. We compared the refactorings induced by each of the 22 code smells proposed by Fowler and the 12 code smells of Kerievsky. The main purpose of the research was to obtain a rough gauge of the effort likely to be required to eradicate each of the code smells. Results suggest that there are several code smells that would be relatively expensive to eradicate and that, overall, Fowler’s code smells were firstly, less complex to eradicate and secondly, induced fewer refactorings on average. They would thus be the preferred category of code smells to tackle, ahead of Kerievsky’s. Ironically, the hardest smells to eradicate are the ones that appear, at first glance, to be simple code smells to eradicate. Further work will focus on extending the tool so that experiments with actual code smells can be conducted. That would lead to an empirical model of code smells and their eradication. We urge further and replicated research into the area of code smells and to that end the data used in this study is freely available upon request from the authors.

References:
[1] Advani, D., Hassoun, Y., and Counsell, S., “Extracting refactoring trends from OSS and a possible solution to the ‘related refactoring’ conundrum”. Proc. ACM Symp. on Applied Computing. Dijon, France, 2006, pp. 1713-1720.
[2] Counsell, S., Hierons, R. M., Najjar, R., Loizou, G., and Hassoun, Y., “The Effectiveness of Refactoring, Based on a Compatibility Testing Taxonomy and a Dependency Graph”. In Proceedings of the Testing: Academic & Industrial Conference on Practice and Research Techniques. Windsor, UK, August 29-31, 2006, IEEE Computer Society, pp. 181-192.
[3] Counsell, S., Hassoun, Y., Loizou, G., and Najjar, R., “Common refactorings, a dependency graph and some code smells: an empirical study of Java OSS”. In Proceedings of the International Symposium on Empirical Software Engineering. Rio de Janeiro, Brazil, September 21-22, 2006, ACM, pp. 288-296.
[4] Du Bois, B., Demeyer, S., and Verelst, J., “Refactoring Improving Coupling and Cohesion of Existing Code”. In Proceedings of the 11th Working Conference on Reverse Engineering. Delft, Netherlands, November 08-12, 2004, IEEE Computer Society, Washington, DC, pp. 144-151.
[5] Demeyer, S., Ducasse, S., and Nierstrasz, O., “Finding refactorings via change metrics”. ACM Conference on Object Oriented Prog. Systems Languages and Applications. Minneapolis, USA, 2000, pp. 166-177.
[6] Fowler, M., “Refactoring: improving the design of existing code”. Addison-Wesley, 1999.
[7] Gamma, E., Helm, R., Johnson, R., and Vlissides, J., “Design Patterns Elements of Reusable Object Oriented Software”. Addison Wesley, MA, USA, 1995.
[8] Kerievsky, J., “Refactoring to Patterns”. Addison Wesley, 2004.
[9] Mäntylä, M. V. and Lassenius, C., “Subjective Evaluation of Software Evolvability Using Code Smells: An Empirical Study”. Journal of Empirical Soft. Engineering, vol. 11, no. 3, 2006, pp. 395-431.
[10] Mäntylä, M. V. and Lassenius, C., “Drivers for Software Refactoring Decisions”. In Proceedings of the Intl Symposium on Empirical Soft. Engineering. Rio de Janeiro, Brazil, 2006, pp. 297-306.
[11] Mens, T., Taentzer, G., and Runge, O., “Analysing refactoring dependencies using graph transformation”. Software and Systems Modelling, Springer, vol. 6, issue 3, 2007, pp. 269-285.
[12] Mens, T. and Tourwe, T., “A Survey of Software Refactoring”. IEEE Transactions on Software Engineering, vol. 30, no. 2, 2004, pp. 126-139.
[13] Munro, M., “Product Metrics for Automatic Identification of ‘Bad Smell’ Design Problems in Java Source-Code”. IEEE METRICS 2005: 15.
[14] Neill, C. J. and Laplante, P. A., “Paying Down Design Debt with Strategic Refactoring”. Computer, vol. 39, no. 12, 2006, pp. 131-134.
[15] Opdyke, W. F., “Refactoring object-oriented frameworks”. Ph.D. Thesis, University of Illinois, USA, 1992.
[16] Trifu, A. and Reupke, U., “Towards Automated Restructuring of Object Oriented Systems”. Proceedings of the European Conference on Soft. Maintenance and Reengineering. Amsterdam, Netherlands, March 21-23, 2007, IEEE Computer Society, Washington, DC, pp. 39-48.
