How Good are Code Smells for Evaluating Software Maintainability? Results from a Comparative Case Study
Aiko Yamashita
Simula Research Laboratory / Mesan AS
Outline
• Background and Motivation
• Research Objective
• Research Methodology
• Results and Lessons
• The Future!
Software Maintainability
Maintainability has been of paramount importance, not only due to extensive costs entailed by maintenance activities… [Harrison & Cook, 1990] [Abran and Nguyenkim, 1991] [Pigoski, 1996]
…but also because we rely on the proper functioning of the systems that we utilize on a daily basis…
W. Harrison and C. Cook. "Insights on Improving the Maintenance Process through Software Measurement." In: Conference on Software Maintenance, Nov 1990, pp. 37–45.
A. Abran and H. Nguyenkim. "Analysis of Maintenance Work Categories through Measurement." In: Conference on Software Maintenance, 1991.
T. M. Pigoski. Practical Software Maintenance: Best Practices for Managing Your Software Investment. John Wiley & Sons, 1996. 384 pp.
Motivation
Goals
Methodology
Results
Conclusion
A major goal during software maintenance and evolution is to manage an increasingly LARGE and COMPLEX code base as new releases or improvements are made to the software product.
Code smells as indicators of maintainability

What are code smells? Code smells are hints or indicators of suboptimal design choices that can potentially decrease software maintainability.

How can code smells support maintainability? FIRST: They are associated with refactoring strategies.

Smell: Shotgun Surgery ("A change in a class results in the need to make a lot of little changes in several classes")
Refactoring: Move Method
How can code smells support maintainability? SECOND: They are easier to interpret than traditional code metrics, such as the Maintainability Index:

MAX(0, (171 - 5.2 * ln(Halstead Volume) - 0.23 * CC - 16.2 * ln(LOC)) * 100 / 171)
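To make the contrast concrete, the Maintainability Index formula above can be sketched as a small function; this is an illustrative implementation of the slide's formula, not the study's tooling:

```python
import math

def maintainability_index(halstead_volume: float, cyclomatic_complexity: float, loc: int) -> float:
    """Maintainability Index rescaled to 0..100, per the formula on the slide."""
    raw = (171
           - 5.2 * math.log(halstead_volume)   # natural log in this variant
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(loc))
    return max(0.0, raw * 100 / 171)
```

A higher value means "more maintainable," but the number alone gives a developer no hint about what to fix, whereas a smell report names the suspect construct directly.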
In contrast to such metrics, a code smell can be stated in a developer's own terms: "I think there are too many Feature Envy methods in this class..."
Code smells as indicators of maintainability

As such, code smell analysis is a promising approach to support both assessment and improvement of maintainability:

code smell analysis → diagnosis → action plan (refactoring)
However… there are challenges within code smell analysis

Refactoring in order to eliminate a code smell implies a cost and a risk:
• Cost of refactoring, reworking the test sets, and performing testing
• Risk of introducing new defects

Insufficient information on severity levels and the range of effects of code smells makes refactoring prioritization a nontrivial task.
It is not clear how, and to what extent, code smells can reflect or describe how (non)maintainable a system is.
It is not clear which maintenance aspects can be addressed by code smells and which should be addressed by other means (evaluation approaches).
Addressing one ('the') gap in code smell research

Research during the last decade has emphasized the formalization and automated detection of code smells. (Will my superengineered water-bike work outside the lab?)

But relatively little has been done to investigate how comprehensive and informative code smells are for assessing maintainability in practical settings. Even less empirical work on code smells includes in-vivo studies, which limits the applicability of the results in industrial settings.
Research objective

Empirically inquire, in a realistic setting, how useful code smells are in supporting software maintainability assessments, by investigating:

RQ1: How good are code smells at indicating system-level maintainability?
RQ2: How good are code smells at identifying source code files that are likely to require more effort than others?
RQ3: How good are code smells at identifying source code files that are likely to be problematic during maintenance?
RQ4: What proportion of maintenance problems can be explained by the presence of code smells?
RQ5: How well do code smells correspond with maintainability aspects deemed critical by software developers?
Overall research strategy
• Longitudinal, in-vivo case study investigating a maintenance project
• Case study with control for moderator variables
• Combination of qualitative + quantitative evidence
• 4 Java applications with the same functionality but different designs (7 KLOC to 14 KLOC) [Anda et al., 2009]

Each developer maintained one of the four systems (A, B, C, D) through three tasks:
Task 1: Replacing an external data source
Task 2: New authentication mechanism
Task 3: New reporting functionality

Bente C. D. Anda, Dag I. K. Sjøberg, and Audris Mockus. "Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System." IEEE Transactions on Software Engineering 35.3 (2009), pp. 407–429.
Conceptual model, variables and data sources

Project context: Sep–Dec 2008; 50,000 Euros; 7 working weeks; 6 developers; 2 companies.
Moderator variables: system, tasks, programming skill, development technology, project context.

Variables of interest and data sources (** system and file level, * only at system level):
• Code smells (number of smells**, smell density**): source code, analyzed with Borland Together and InCode (12 types of code smells)
• Effort**: study diary, task progress sheets, Eclipse activity logs
• Change size**: Subversion database
• Defects*: Trac (issue tracker), acceptance test reports
• Maintenance problems**: daily interviews (audio files/notes), think-aloud sessions (video files/notes)
• Maintainability perception*: open interviews (audio files/notes)
Control for moderators was used to do case replication

Having four functionally equivalent Java systems allowed for case replication, with control over context (moderator) variables.

Literal replication: same system, same tasks, developers with similar skills, same project setting, same technology. Case 1 and Case 2 share the same context and the code smells of System A, so their maintenance outcomes should be similar (≈).

Theoretical replication: different systems, same tasks, developers with similar skills, same project setting, same technology. Case 1 (System A) and Case 3 (System B) share the same context but differ in code smells (≠), so their maintenance outcomes should differ (≠).

This enables higher confidence in the results, because it addresses threats to internal validity through cross-case comparison.
11 Code smells (and 1 anti-pattern) analyzed in the 4 systems
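Tools such as Borland Together and InCode flag smells via metrics-based detection strategies. As a hedged sketch in the style of a God Class detection strategy (the metrics are standard, but the thresholds here are illustrative, not the study's configuration):

```python
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    wmc: int     # Weighted Methods per Class (complexity)
    atfd: int    # Access To Foreign Data (uses of other classes' attributes)
    tcc: float   # Tight Class Cohesion, in [0, 1]

def looks_like_god_class(m: ClassMetrics) -> bool:
    """Metrics-based detection strategy sketch: complex, data-grabbing, incohesive.
    Thresholds (47, 5, 1/3) are illustrative only."""
    return m.wmc >= 47 and m.atfd > 5 and m.tcc < 1 / 3
```

Each of the 12 smell types analyzed would have its own such strategy combining different metrics and thresholds.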
Analysis at system level

RQ1: Can code smells indicate system-level maintainability?

Systems were ranked according to their number of code smells and their smell density (no. smells/KLOC); standardized scores were calculated for the ranking. Actual maintainability was measured by effort (time) and the number of defects introduced.

Do they correspond? The degree of correspondence between the maintainability assessment (based on code smells) and actual maintainability was measured. In addition, the degree of correspondence between actual maintainability and prior maintainability assessments performed on the same systems, based on:
• Expert judgment [Anda, 2007]
• Chidamber–Kemerer metrics [Anda, 2007]
was compared, to determine which assessment approach was best.

Bente C. D. Anda. "Assessing Software System Maintainability using Structural Measures and Expert Assessments." In: Int'l Conf. Softw. Maint. 2007, pp. 204–213.
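The ranking procedure can be sketched as follows; the smell counts and sizes below are made-up illustrations, not the study's data:

```python
def smell_density(num_smells: int, loc: int) -> float:
    """Smells per KLOC (no. smells / KLOC)."""
    return num_smells / (loc / 1000)

def z_scores(values: list[float]) -> list[float]:
    """Standardized scores, so measures on different scales can be compared."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

# Hypothetical (smell count, LOC) per system -- NOT the study's figures.
systems = {"A": (90, 8000), "B": (150, 14000), "C": (70, 7000), "D": (100, 10000)}
density = {name: smell_density(s, loc) for name, (s, loc) in systems.items()}
ranked = sorted(density, key=density.get, reverse=True)  # smelliest first
```

The resulting smell-based ranking is then compared against the ranking by actual maintainability (effort and defects).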
Analysis at file level

RQ2: Can code smells identify source code files that are likely to require more effort?
Multiple linear regression analysis.
Dependent variable: effort (time) to update a file.
Independent variables: number of each of 12 different code smells in each file.
Control variables:
• File size (LOC)
• Number of revisions of a file
• System
• Developer
• Round

RQ3: Can code smells identify source code files that are likely to be problematic?
Binary logistic regression analysis.
Dependent variable: binary variable (problematic file?).
Independent variables: number of each of 12 different code smells in each file.
Control variables:
• File size (LOC)
• Churn
• System

Principal component analysis + analysis of qualitative evidence.
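The study fitted multiple regression models with controls; the core mechanic, reduced to a single-predictor least-squares sketch on synthetic data (pure Python, for illustration only):

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x (single predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def r_squared(x: list[float], y: list[float], a: float, b: float) -> float:
    """Share of variance in y explained by the model (the fit statistic R^2)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic example: effort (minutes) vs. file size (LOC).
loc = [120.0, 300.0, 450.0, 800.0, 1500.0]
effort = [15.0, 30.0, 50.0, 85.0, 160.0]
a, b = ols_fit(loc, effort)
fit = r_squared(loc, effort, a, b)
```

Comparing the fit of a smells-only model against a model with size and change controls added is exactly how RQ2's finding is established in the results.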
Analysis at project and conceptual level

RQ4: What proportion of maintenance problems can be explained by code smells?
• Observational study
• Daily interviews
• Think-aloud protocol
Maintenance difficulties across systems A–D were classified into source code related vs. non-source code related difficulties, and the source code related ones into code-smell related vs. non-code-smell related difficulties.

RQ5: How well do they correspond with maintainability aspects deemed critical by software developers?
1. Open-ended interview with each developer (audio file)
2. Open coding and axial coding (extract the factors from statements collected during interviews): transcript → coded statement → maintainability factor
3. Cross-case synthesis (summarize and compare the factors across cases): cross-case matrix
The conceptual relatedness of each maintainability factor to the definitions of code smells by [Fowler, 1999] was investigated.
Results

RQ1: Can code smells be used to compare maintainability at system level?
1. Number of code smells displayed the highest correspondence to actual maintainability. However, number of code smells is highly correlated with system size!
2. Smell density outperformed number of smells when comparing only systems of similar size.
3. Expert judgment was considered the most flexible approach, because it considers both the effects of system size and potential maintenance scenarios.
Results

RQ2: Can code smells be used to explain effort at file level?
• A model that only includes code smells (Model 1) displayed a fit of R² = 0.42
• A model that adds file size and number of changes to Model 1 (Model 3) displayed a fit of R² = 0.58
• Removing the code smells from Model 3 did not decrease the fit (R² = 0.58)
• The only smell that remained a significant variable in Model 3 was Refused Bequest, which was associated with a decrease in effort (α < 0.01)
• File size and number of changes remained the most significant predictors of effort (α < 0.001)
Finding: Code smells are not better at explaining sheer effort at file level than size and number of revisions.

RQ3: Can code smells be used to explain if a file is problematic during maintenance?
• The performance measures of the model are: accuracy = 0.847, precision = 0.742, and recall = 0.377
• Interface Segregation Principle Violation (ISPV) was able to explain problems [Exp(B) = 7.610, p = 0.032]
• Data Clump was also a significant contributor to the model [Exp(B) = 0.053, p = 0.029], but associated with fewer problems!
• PCA indicated that ISPV tends not to be associated with code smells that are related to size.
• Qualitative data suggests that ISPV is related to error/change propagation and difficult concept location.
Finding: Some code smells can potentially explain the occurrence of problems during maintenance. Also, not all smells seem to be problematic…
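The reported accuracy, precision, and recall come from a confusion matrix over files; as a reminder of the definitions (the counts below are illustrative, not the study's):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard metrics for a binary 'problematic file?' classifier."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),   # flagged files that really were problematic
        "recall": tp / (tp + fn),      # problematic files the model caught
    }

# Illustrative counts: 3 true positives, 1 false positive, 5 misses, 91 true negatives.
m = classification_metrics(tp=3, fp=1, fn=5, tn=91)
```

A profile like the study's (high accuracy and precision, low recall) means the model rarely cries wolf but misses many problematic files.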
Results

RQ4: How comprehensively can code smells explain the incidence of maintenance problems?
• Of the problems associated with Java code, 37 (58%) were attributed to code smells, 19 (30%) to other code characteristics, and 8 (12%) to a combination (interaction) of properties.
• Found evidence of interaction effects between collocated code smells.
• Found evidence that interaction effects of coupled code smells have the same implications in practice as those of collocated smells.
Finding: Interaction effects between code smells can potentially cause more problems during maintenance.

Interactions can occur between collocated smells (in the same artifact, e.g., a God Method and Feature Envy in the same file) or between coupled smells (distributed across multiple, coupled files). Dependencies should be observed between files displaying code smells or other design flaws.
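The percentage breakdown follows directly from the raw counts (37 + 19 + 8 = 64 Java-code-related problems):

```python
# Counts of Java-code-related maintenance problems by attributed cause (from the study).
attributed = {
    "code smells": 37,
    "other code characteristics": 19,
    "combination (interaction) of properties": 8,
}
total = sum(attributed.values())  # 64 problems in total
shares = {cause: round(100 * n / total) for cause, n in attributed.items()}
# shares -> {'code smells': 58, 'other code characteristics': 30,
#            'combination (interaction) of properties': 12}
```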
Results

RQ5: How well do current code smell definitions correspond with maintainability aspects/factors deemed critical by software developers?
• Many important aspects are not covered by the definitions of code smells, and those aspects need to be addressed by other means: expert assessment, semantic analysis, etc.
• However, design consistency was found to be very important, and is potentially addressable with some code smells.
Finding: Some code smells may deserve more attention from a practical maintenance perspective. To achieve comprehensive assessment, multiple approaches (expert judgment, semantic analysis, etc.) are needed.
Lessons learned…

Controlling for moderator factors in a comparative case study is a powerful approach to strengthen the internal validity of qualitative findings. (You just need to make sure you don't die in the attempt…) It responds well to the current need for inductive research for developing theories in SE.

The element of 'artificiality' should be considered, but not feared. (Context is king! Always report the context.)

Study protocol and pilot study are of paramount importance! Consider who is going to do the data collection and analysis, and when and how.

A research project is a project, after all… (It is important to count on adequate resources.)

Your log-book is your best friend :) (A centralized reference can be used to connect and navigate across different data sources.)
And the adventure continues… • Interaction effects among code smells (and other code properties)
• Study of collocated smells and coupled smells
• Nature and severity of maintenance difficulties
• Cost/benefit based definition/detection of smells
Thanks for your attention! :)
The case for the interaction effect…
Summary of contributions
• Number of smells is no better than system size for comparing maintainability of systems of dissimilar size. However, smell density was found to outperform number of smells when systems of similar size are involved [1].
• The code smells investigated are rather poor indicators of effort at file level, compared to traditional measures such as file size and change frequency [2].
• However, a code smell that is independent of size can potentially explain why some files are likely to be problematic during maintenance (ISP Violation) [3].
• We have found evidence of the "duality" of the nature of code smells, as some are in fact associated with positive effects [2][3].
• Code smells may have a limited scope when it comes to explaining the overall maintenance problems and covering many of the maintainability aspects important to developers [4][5].
• Based on our findings on coupled smells, we suggest re-thinking the level of granularity used in current smell analyses (class, method) and suggest incorporating dependency analysis [4].
• To achieve better maintainability assessments, a combination of approaches should be used. We have suggested the use of Concept Mapping [6] for that purpose.
Limitations and threats to validity

Construct validity
• Code smell detection tools.
• Protocol for identifying maintenance problems.
• Lack of severity levels of maintenance problems.

Internal validity
• Effect of rounds on the effort outcome at system level.
• Effect of the sub-type of task (reading, writing) is not accounted for when analyzing effort at file level.

External validity
• Medium-sized, Java-based, web information systems.
• Medium to small maintenance tasks.
• Solo projects.
Code smell-based assessment of the systems
Maintainability of the 4 systems
Correspondence between code smell-based assessment and actual maintainability
Smell density can distinguish maintainability of systems similar in size
Multiple Regression Model
Logistic Regression Model
Principal component analysis
Distribution of maintenance problems according to source
Interaction effects amongst smells

Interaction effects occur between collocated code smells, and between code smells and other code characteristics. This can result in intensified effects of smells, or in other effect types. Collocated smells (e.g., a God Method and Feature Envy in the same file) may play an important role in the overall effects of code smells on maintenance…!
Interaction effects amongst smells

Interactions also occur between coupled smells (smells distributed across coupled files, e.g., a God Method in one file coupled to a Feature Envy method in another), so from a practical perspective they may have the same effects as collocated smells. Dependencies should be observed between files displaying code smells or other design flaws.
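Detecting coupled smells requires combining per-file smell reports with dependency analysis; a minimal sketch, with hypothetical file names and edges:

```python
# Hypothetical smell report: file -> smells detected in it.
smells = {
    "Order.java": ["God Method"],
    "Customer.java": ["Feature Envy"],
    "Report.java": [],
}

# Hypothetical dependency edges between files (e.g., from static analysis).
deps = [("Order.java", "Customer.java"), ("Order.java", "Report.java")]

# Coupled smells: a dependency edge where both endpoints contain smells.
coupled = [(a, b) for a, b in deps if smells.get(a) and smells.get(b)]
# coupled -> [('Order.java', 'Customer.java')]
```

This is the kind of file-pair analysis the suggested dependency-aware granularity would enable, beyond per-class or per-method smell counts.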
Aspects that cannot be addressed by code smell definitions

Aspect                         | Covered by code smell? | Code smells associated | Detectable via code analysis? | Alternative evaluation techniques
Appropriate technical platform | no                     | NA                     | no                            | Expert judgment
Coherent naming                | no                     | NA                     | no                            | Semantic analysis, manual inspection
Initial defects                | no                     | NA                     | no                            | Acceptance tests, regression testing
Three-layer architecture       | no                     | NA                     | no                            | Expert judgment
Aspects that can (partially) be addressed by code smell definitions

Aspect                              | Covered by code smell? | Code smells associated                                                                      | Detectable via code analysis? | Alternative evaluation techniques
Design suited to the problem domain | partially              | Speculative Generality                                                                      | partially                     | Expert judgment
Encapsulation                       | partially              | Data Clump                                                                                  | partially                     | Manual inspection
Inheritance                         | partially              | Abuse of multiple inheritance (new smell?), Refused Bequest                                 | partially                     | Manual inspection
Libraries                           | partially              | Wide Subsystem Interface                                                                    | partially                     | Expert judgment, dependency analysis
Simplicity                          | partially              | God Class, God Method, Lazy Class, Long Parameter List, Message Chains                      | yes                           | Expert judgment
Use of components                   | partially              | God Class, Misplaced Class                                                                  | yes                           | Semantic analysis, manual inspection
Design consistency                  | partially              | Alternative Classes with Different Interfaces, ISP Violation, Divergent Change, Temporary Field | partially                 | Semantic analysis, manual inspection
Logic spread                        | partially              | Feature Envy, Shotgun Surgery, ISP Violation                                                | yes                           | Manual inspection, dependency analysis
Duplicated code                     | yes                    | Duplicated Code, Switch Statements                                                          | yes                           | Manual inspection
Image credits
http://bestclipartblog.com/clipart-pics/mountain-clip-art-5.png
http://dragonartz.wordpress.com/tag/standing/
http://www.miguelcarrasco.net/miguelcarrasco/WindowsLiveWriter/BlueScreenofDeathTop10_7B1A/blue%20screen%20of%20death%20mac%20airport%5B2%5D.jpg
http://www.katu.com/news/tech/78348972.html
http://www.old-picture.com/american-legacy/013/Foraker-Arthur-Jr.htm
http://www.qrcodepress.com/pioneer-develops-augmented-reality-navigation-system/859146/
http://www.clker.com/cliparts/J/A/q/9/3/2/mountain-range-sunset-hi.png