How Good are Code Smells for Evaluating Software Maintainability?

How Good are Code Smells for Evaluating Software Maintainability? Results from a Comparative Case Study

Aiko Yamashita, Simula Research Laboratory / Mesan AS


Outline
• Background and Motivation
• Research Objective
• Research Methodology
• Results and Lessons
• The Future!

Software Maintainability

Maintainability has been of paramount importance, not only due to the extensive costs entailed by maintenance activities… [Harrison & Cook, 1990] [Abran and Nguyenkim, 1991] [Pigoski, 1996]

…but  also  because  we  rely  on  the  proper   functioning  of  the  systems  that  we  utilize   on  a  daily  basis…

W. Harrison and C. Cook, "Insights on Improving the Maintenance Process Through Software Measurement," Conference on Software Maintenance, 1990, pp. 37-45.
A. Abran and H. Nguyenkim, "Analysis of Maintenance Work Categories Through Measurement," Conference on Software Maintenance, 1991.
T. M. Pigoski, Practical Software Maintenance: Best Practices for Managing Your Software Investment. John Wiley & Sons, 1996.

Motivation

Goals

Methodology

Results

Conclusion

A major goal during software maintenance and evolution is to manage an increasingly LARGE and COMPLEX code base, as new releases or improvements are made to the software product.


Code smells as indicators of maintainability

What are code smells?
Code smells are hints or indicators of suboptimal design choices that can potentially decrease software maintainability.

How can code smells support maintainability?
FIRST: They are associated with refactoring strategies.

Smell: Shotgun Surgery → Refactoring: Move Method
"A change in a class results in the need to make a lot of little changes in several classes."
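To make the smell/refactoring pairing concrete, here is a minimal, hypothetical Python sketch (the class and method names are invented for illustration; the study's systems were Java). Duplicated tax logic forces edits in several classes; consolidating it into one place, in the spirit of Move Method, removes the Shotgun Surgery symptom.

```python
# Before: a change to the tax rate requires little edits in several classes.
class Invoice:
    def total(self, amount):
        return amount * 1.25          # duplicated tax logic


class Quote:
    def total(self, amount):
        return amount * 1.25          # ...and here, and possibly elsewhere


# After: the tax logic lives in one place, so a change touches a single class.
class TaxPolicy:
    RATE = 0.25

    @classmethod
    def with_tax(cls, amount):
        return amount * (1 + cls.RATE)


class InvoiceRefactored:
    def total(self, amount):
        return TaxPolicy.with_tax(amount)
```

Both versions compute the same totals; only the refactored one localizes the change.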


Code smells as indicators of maintainability

How can code smells support maintainability?
SECOND: They are easier to interpret than traditional code metrics, such as the Maintainability Index:

MAX(0, (171 - 5.2 * ln(Halstead Volume) - 0.23 * CC - 16.2 * ln(LOC)) * 100 / 171)
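For comparison, the Maintainability Index above can be computed directly; a small sketch (the input values are hypothetical):

```python
import math

def maintainability_index(halstead_volume, cyclomatic_complexity, loc):
    """Maintainability Index, normalized to 0..100 as in the slide's formula."""
    raw = (171
           - 5.2 * math.log(halstead_volume)   # ln = natural log
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(loc))
    return max(0.0, raw * 100 / 171)

# Hypothetical module: Halstead volume 1000, CC 10, 200 LOC
mi = maintainability_index(1000, 10, 200)
```

The result for this hypothetical module is roughly 27 on a 0-100 scale — a number that is hard to act on without further interpretation, which is exactly the contrast the slide draws with code smells.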


For example, a developer can act directly on an observation such as: "I think there are too many Feature Envy methods in this class..."


Code smells as indicators of maintainability

As such, code smell analysis is a promising approach to support both the assessment and improvement of maintainability:

code smell analysis → diagnosis → action plan (refactoring)


However… there are challenges within code smell analysis

Refactoring in order to eliminate a code smell implies a cost and a risk:
• Cost of refactoring, reworking the test sets, and performing testing
• Risk of introducing new defects

• Insufficient information on the severity levels and the range of effects of code smells makes refactoring prioritization a nontrivial task.
• It is not clear how, and to what extent, code smells can reflect or describe how (non)maintainable a system is.
• It is not clear which maintenance aspects can be addressed by code smells, and which should be addressed by other means (evaluation approaches).


Addressing one ('the') gap in code smell research

Research during the last decade has emphasized the formalization and automated detection of code smells. ("Will my super-engineered water-bike work outside the lab?")

But relatively little has been done to investigate how comprehensive and informative code smells are for assessing maintainability in practical settings. Even less empirical work on code smells includes in-vivo studies, which limits the applicability of the results to industrial settings.


Research objective

Empirically inquire, in a realistic setting, how useful code smells are in supporting software maintainability assessments, by investigating how good code smells are at the following:

RQ1: Can they indicate system-level maintainability?
RQ2: Can they identify source code files that are likely to require more effort than others?
RQ3: Can they identify source code files that are likely to be problematic during maintenance?
RQ4: What proportion of maintenance problems can be explained by the presence of code smells?
RQ5: How well do they correspond with maintainability aspects deemed critical by software developers?


Overall research strategy
• Longitudinal, in-vivo case study investigating a maintenance project
• Case study with control for moderator variables
• Combination of qualitative + quantitative evidence
• 4 Java applications (systems A-D) with the same functionality but different designs (7 KLOC to 14 KLOC) [Anda et al., 2009]
• Each developer performed three tasks on one system:
  - Task 1: Replacing an external data source
  - Task 2: New authentication mechanism
  - Task 3: New reporting functionality

Bente C. D. Anda, Dag I. K. Sjøberg, and Audris Mockus. "Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System." IEEE Transactions on Software Engineering 35.3 (2009), pp. 407-429.


Conceptual model, variables and data sources

Project context: Sep-Dec 2008; 7 working weeks; 6 developers; 2 companies; total cost 50,000 Euros.

Moderator variables: programming skill, development technology, project context, tasks, system.

Variables of interest:
• Code smells (no. of smells**, smell density**) — detected with Borland Together and InCode; 12 types of code smells
• Maintenance outcomes: effort**, change size**, defects*, maintenance problems**, maintainability perception*

Data sources: source code, Subversion database, Eclipse activity logs, Trac (issue tracker), acceptance test reports, study diary, task progress sheets, daily interviews (audio files/notes), open interviews (audio files/notes), think-aloud sessions (video files/notes).

** System and file level; * only at system level.

Control for moderators was used to do case replication

Having four functionally equivalent Java systems allowed for case replication, with control over context (moderator) variables:
• Literal replication: same system, same tasks, developers with similar skills, same project setting, same technology.
• Theoretical replication: different systems, same tasks, developers with similar skills, same project setting, same technology.

This enables higher confidence in the results, because cross-case comparison addresses threats to internal validity.


11 Code smells (and 1 anti-pattern) analyzed in the 4 systems
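Detection tools such as Borland Together and InCode typically flag smells through metric thresholds. A minimal sketch of such a rule for the God Method smell — note that the metric names and thresholds below are invented for illustration, not the tools' actual rules:

```python
def is_god_method(loc, cyclomatic_complexity, num_accessed_fields):
    """Hypothetical threshold-based God Method detector.

    Real detectors combine size, complexity, and data-access metrics
    in a similar spirit; these particular cut-offs are illustrative only.
    """
    return (loc > 100
            and cyclomatic_complexity > 10
            and num_accessed_fields > 8)
```

A long, complex, data-hungry method trips the rule; a small helper does not.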


Analysis at system level

RQ1: Can code smells indicate system-level maintainability?

Systems were ranked according to their number of code smells and their smell density (no. of smells/KLOC). Systems were also ranked according to their actual maintainability, measured by effort (time) and the number of defects introduced; standardized scores were calculated for the ranking. Do the two rankings correspond?

The degree of correspondence between the maintainability assessment (based on code smells) and actual maintainability was measured. In addition, the correspondence between actual maintainability and prior maintainability assessments performed on the same systems, based on:
• Expert judgment [Anda, 2007]
• Chidamber-Kemerer metrics [Anda, 2007]
was compared, to determine which assessment approach was best.

Bente C. D. Anda. "Assessing Software System Maintainability using Structural Measures and Expert Assessments." Int'l Conf. Softw. Maint. 2007, pp. 204-213.
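The ranking step can be sketched as follows (the per-system numbers are hypothetical, not the study's data): compute smell density per system, standardize it, and rank.

```python
import statistics

# Hypothetical per-system measurements (not the study's actual data)
systems = {
    "A": {"smells": 95,  "kloc": 8.0},
    "B": {"smells": 150, "kloc": 14.0},
    "C": {"smells": 60,  "kloc": 7.0},
    "D": {"smells": 121, "kloc": 11.0},
}

# Smell density = no. of smells / KLOC
density = {s: v["smells"] / v["kloc"] for s, v in systems.items()}

# Standardized (z) scores make the ranking comparable across measures
mean = statistics.mean(density.values())
sd = statistics.stdev(density.values())
z = {s: (d - mean) / sd for s, d in density.items()}

# Rank systems from most to least smell-dense
ranking = sorted(density, key=density.get, reverse=True)
```

The same standardization can be applied to effort and defect counts, so that the smell-based ranking and the actual-maintainability ranking can be compared directly.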


Analysis at file level

RQ2: Can code smells identify source code files that are likely to require more effort?
Multiple linear regression analysis.
Dependent variable: effort (time) to update a file.
Independent variables: number of each of 12 different code smells in each file.
Control variables: file size (LOC), number of revisions of a file, system, developer, round.

RQ3: Can code smells identify source code files that are likely to be problematic?
Binary logistic regression analysis.
Dependent variable: binary variable — is the file problematic?
Independent variables: number of each of 12 different code smells in each file.
Control variables: file size (LOC), churn, system.
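The RQ2 analysis can be sketched on synthetic data (illustrative only; the variable names and generative model are invented): when effort is driven by the control variables, a smells-only model fits poorly while a model including the controls fits well, mirroring the study's finding.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic per-file data (not the study's data)
smells = rng.poisson(2, size=(n, 12)).astype(float)     # counts of 12 smells per file
loc = rng.integers(50, 2000, size=n).astype(float)      # file size (control)
revisions = rng.integers(1, 30, size=n).astype(float)   # no. of revisions (control)

# Here effort depends only on the controls, mimicking the RQ2 result
effort = 0.05 * loc + 3.0 * revisions + rng.normal(0, 10, n)

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

r2_smells_only = r_squared(smells, effort)                              # "Model 1"
r2_full = r_squared(np.column_stack([smells, loc, revisions]), effort)  # "Model 3"
```

With this synthetic data the smells-only model explains little of the variance, while adding size and revisions raises the fit sharply — the same pattern reported for Models 1 and 3.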

In addition: principal component analysis + analysis of qualitative evidence.

Analysis at project and conceptual level

RQ4: What proportion of maintenance problems can be explained by code smells?
• Observational study, daily interviews, think-aloud protocol.
• Maintenance difficulties in each system (A-D) were classified as source-code related vs. non-source-code related, and the source-code related ones as code-smell related vs. non-code-smell related.

RQ5: How well do they correspond with maintainability aspects deemed critical by software developers?
1. Open-ended interview with each developer (audio file).
2. Open coding and axial coding: extract the factors from statements collected during interviews (transcript → coded statement → maintainability factor).
3. Cross-case synthesis: summarize and compare the factors across cases (cross-case matrix).

The conceptual relatedness of each maintainability factor to the definitions of code smells by [Fowler, 1999] was investigated.


Results

RQ1: Can code smells be used to compare maintainability at system level?
1. The number of code smells displayed the highest correspondence to actual maintainability. However, the number of code smells is highly correlated with system size!
2. Smell density outperformed the number of smells when comparing only systems of similar size.
3. Expert judgment was considered the most flexible approach, because it considers both the effects of system size and potential maintenance scenarios.

Results

RQ2: Can code smells be used to explain effort at file level?
• A model that only includes code smells (Model 1) displayed a fit of R² = 0.42.
• A model that adds file size and number of changes to Model 1 (Model 3) displayed a fit of R² = 0.58.
• Removing the code smells from Model 3 did not decrease the fit (R² = 0.58).
• The only smell that remained a significant variable in Model 3 was Refused Bequest, which registered a decrease in effort (α < 0.01).
• File size and number of changes remained the most significant predictors of effort (α < 0.001).

Finding: Code smells are no better at explaining sheer effort at file level than size and number of revisions.

RQ3: Can code smells be used to explain if a file is problematic during maintenance?
• The performance measures of the model are: accuracy = 0.847, precision = 0.742, and recall = 0.377.
• Interface Segregation Principle Violation (ISPV) was able to explain problems [Exp(B) = 7.610, p = 0.032].
• Data Clump was also deemed a significant contributor to the model [Exp(B) = 0.053, p = 0.029], but associated with fewer problems!
• PCA indicated that ISPV tends not to be associated with code smells that are related to size.
• Qualitative data suggests that ISPV is related to error/change propagation and difficult concept location.

Finding: Some code smells can potentially explain the occurrence of problems during maintenance. Also, not all smells seem to be problematic…


Results

RQ4: How comprehensively can code smells explain the incidence of maintenance problems?
• Of the problems associated with Java code, 37 (58%) were attributed to code smells, 19 (30%) to other code characteristics, and 8 (12%) to a combination (interaction) of properties.
• Found evidence of interaction effects between collocated code smells.
• Found evidence that interaction effects of collocated code smells and coupled code smells have the same implications in practice.

Finding: Interaction effects between code smells can potentially cause more problems during maintenance. Interactions can occur between collocated smells (in the same artifact, e.g. a God Method and Feature Envy in the same file) or between coupled smells (distributed across multiple, coupled files); dependencies should therefore be observed between files displaying code smells / other design flaws.
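The idea of an interaction effect between collocated smells can be sketched as a product term in a regression (all numbers below are synthetic and illustrative, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Hypothetical per-file smell indicators (0/1)
god_method = rng.integers(0, 2, n)     # file contains a God Method?
feature_envy = rng.integers(0, 2, n)   # file contains Feature Envy?

# Synthetic problem score: each smell adds a little, but the combination
# (collocated smells) adds much more - an interaction effect.
problems = (1.0 * god_method + 1.0 * feature_envy
            + 4.0 * god_method * feature_envy
            + rng.normal(0, 0.5, n))

# OLS with an explicit interaction term recovers the extra effect
X = np.column_stack([np.ones(n), god_method, feature_envy,
                     god_method * feature_envy])
beta, *_ = np.linalg.lstsq(X, problems, rcond=None)
interaction_coef = beta[3]   # should land near the generative value of 4.0
```

A model without the product column would misattribute the joint effect to the individual smells, which is why collocated (and coupled) smells deserve explicit modeling.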


Results

RQ5: How well do current code smell definitions correspond with maintainability aspects/factors deemed critical by software developers?
• Many important aspects are not covered by the definitions of code smells, and those aspects need to be addressed by other means: expert assessment, semantic analysis, etc.
• Design consistency was found to be very important, and is potentially addressable with some code smells.

Finding: Some code smells may deserve more attention from a practical maintenance perspective. However, to achieve a comprehensive assessment, multiple approaches (expert judgment, semantic analysis, etc.) are needed.


Lessons learned…

Controlling for moderator factors in a comparative case study is a powerful approach to strengthen the internal validity of qualitative findings. (You just need to make sure you don't die in the attempt…) It responds well to the current need for inductive research for developing theories in SE.

The element of 'artificiality' should be considered, but not feared. (Context is king! Always report the context.)

Study protocol and pilot study are of paramount importance! Consider who is going to do the data collection and analysis, when, and how.

A research project is a project after all… (It is important to have adequate resources.)

Your log-book is your best friend :) (A centralized reference can be used to connect and navigate across different data sources.)


And the adventure continues…
• Interaction effects among code smells (and other code properties)
• Study of collocated smells and coupled smells
• Nature and severity of maintenance difficulties
• Cost/benefit-based definition/detection of smells


Thanks for your attention! :)


The case for the interaction effect…


Summary of contributions
• The number of smells is no better than system size for comparing the maintainability of systems of dissimilar size. However, smell density was found to outperform size when systems of similar size are involved [1].
• The code smells investigated are rather poor indicators of effort at file level, compared to traditional measures such as file size and change frequency [2].
• However, a code smell that is independent of size can potentially explain why some files are likely to be problematic during maintenance (ISP Violation) [3].
• We have found evidence of the "duality" of the nature of code smells, as some are, in fact, associated with positive effects [2][3].
• Code smells may have a limited scope when it comes to explaining the overall maintenance problems and covering many of the maintainability aspects important to developers [4][5].
• Based on our findings on coupled smells, we suggest a rethinking of the level of granularity used in current smell analyses (class, method) and suggest incorporating dependency analysis [4].
• To achieve better maintainability assessments, a combination of approaches should be used. We have suggested the use of Concept Mapping [6] for that purpose.


Limitations and threats to validity

Construct validity
• Code smell detection tools.
• Protocol for identifying maintenance problems.
• Lack of severity levels of maintenance problems.

Internal validity
• Effect of rounds on the effort outcome at system level.
• Effect of the sub-type of task (reading, writing) is not accounted for when analyzing effort at file level.

External validity
• Medium-sized, Java-based, web information systems.
• Medium to small maintenance tasks.
• Solo projects.


Code smell-based assessment of the systems


Maintainability of the 4 systems


Correspondence between code smell-based assessment and actual maintainability


Smell density can distinguish maintainability of systems similar in size


Multiple Regression Model


Logistic Regression Model


Principal component analysis


Distribution of maintenance problems according to source


Interaction effects amongst smells

Interaction effects occur between collocated code smells, and between code smells and other code characteristics. This can result in intensified effects of smells, or other effect types. (Figure: a single file containing both a God Method and a Feature Envy method.)

Collocated smells may play an important role in the overall effects of code smells on maintenance…!


Interaction effects amongst smells

Interactions also occur between coupled smells (smells distributed across coupled files), so from a practical perspective they may have the same effects as collocated smells. (Figure: a God Method in one file coupled to a Feature Envy method in another file.) Dependencies should be observed between files displaying code smells / other design flaws.


Aspects that cannot be addressed by code smell definitions

Aspect | Covered by code smell? | Code smells associated | Detectable via code analysis? | Alternative evaluation techniques
Appropriate technical platform | no | NA | no | Expert judgment
Coherent naming | no | NA | no | Semantic analysis, manual inspection
Initial defects | no | NA | no | Acceptance tests, regression testing
Three-layer architecture | no | NA | no | Expert judgment

Aspects that can (partially) be addressed by code smell definitions

Aspect | Covered by code smell? | Code smells associated | Detectable via code analysis? | Alternative evaluation techniques
Design suited to the problem domain | partially | Speculative Generality | partially | Expert judgment
Encapsulation | partially | Data Clump | partially | Manual inspection
Inheritance | partially | Abuse of multiple inheritance (new smell?), Refused Bequest | partially | Manual inspection
Libraries | partially | Wide Subsystem Interface | partially | Expert judgment, dependency analysis
Simplicity | partially | God Class, God Method, Lazy Class, Long Parameter List, Message Chains | yes | Expert judgment
Use of components | partially | God Class, Misplaced Class | yes | Semantic analysis, manual inspection
Design consistency | partially | Alternative Classes with Different Interfaces, ISP Violation, Divergent Change, Temporary Field | partially | Semantic analysis, manual inspection
Logic spread | partially | Feature Envy, Shotgun Surgery, ISP Violation | yes | Manual inspection, dependency analysis
Duplicated code | yes | Duplicated Code, Switch Statements | yes | Manual inspection


Image credits
http://bestclipartblog.com/clipart-pics/mountain-clip-art-5.png
http://dragonartz.wordpress.com/tag/standing/
http://www.miguelcarrasco.net/miguelcarrasco/WindowsLiveWriter/BlueScreenofDeathTop10_7B1A/blue%20screen%20of%20death%20mac%20airport%5B2%5D.jpg
http://www.katu.com/news/tech/78348972.html
http://www.old-picture.com/american-legacy/013/Foraker-Arthur-Jr.htm
http://www.qrcodepress.com/pioneer-develops-augmented-reality-navigation-system/859146/
http://www.clker.com/cliparts/J/A/q/9/3/2/mountain-range-sunset-hi.png