How Good are Code Smells for Evaluating Software Maintainability?

How Good are Code Smells for Evaluating Software Maintainability? Results from a Comparative Case Study

Aiko Yamashita, Simula Research Laboratory / Mesan AS


Outline
• Background and Motivation
• Research Objective
• Research Methodology
• Results and Lessons
• The Future!

Software Maintainability

Maintainability has been of paramount importance, not only due to the extensive costs entailed by maintenance activities… [Harrison & Cook, 1990] [Abran and Nguyenkim, 1991] [Pigoski, 1996]

…but  also  because  we  rely  on  the  proper   functioning  of  the  systems  that  we  utilize   on  a  daily  basis…

W. Harrison and C. Cook, "Insights on Improving the Maintenance Process Through Software Measurement," Conference on Software Maintenance, 1990, pp. 37-45.
A. Abran and H. Nguyenkim, "Analysis of Maintenance Work Categories Through Measurement," Conference on Software Maintenance, 1991.
T. M. Pigoski, Practical Software Maintenance: Best Practices for Managing Your Software Investment. John Wiley & Sons, 1996.

Motivation

Goals

Methodology

Results

Conclusion

A major goal during software maintenance and evolution is to manage an increasingly LARGE and COMPLEX code base, as new releases or improvements are made to the software product.


Code smells as indicators of maintainability

What are code smells?
Code smells are hints or indicators of suboptimal design choices that can potentially decrease software maintainability.

How can code smells support maintainability?
FIRST: They are associated with refactoring strategies.

Smell: Shotgun Surgery → Refactoring: Move Method
"A change in a class results in the need to make a lot of little changes in several classes."
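To make the smell/refactoring pairing concrete, here is a minimal, hypothetical Python sketch (the class and method names are invented for illustration; the study's systems were Java). Duplicated tax logic forces edits in several classes; consolidating it into one place, in the spirit of Move Method, removes the Shotgun Surgery symptom.

```python
# Before: a change to the tax rate requires little edits in several classes.
class Invoice:
    def total(self, amount):
        return amount * 1.25          # duplicated tax logic


class Quote:
    def total(self, amount):
        return amount * 1.25          # ...and here, and possibly elsewhere


# After: the tax logic lives in one place, so a change touches a single class.
class TaxPolicy:
    RATE = 0.25

    @classmethod
    def with_tax(cls, amount):
        return amount * (1 + cls.RATE)


class InvoiceRefactored:
    def total(self, amount):
        return TaxPolicy.with_tax(amount)
```

Both versions compute the same totals; only the refactored one localizes the change.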


Code smells as indicators of maintainability

How can code smells support maintainability?
SECOND: They are easier to interpret than traditional code metrics, such as the Maintainability Index:

MAX(0, (171 - 5.2 * ln(Halstead Volume) - 0.23 * CC - 16.2 * ln(LOC)) * 100 / 171)
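For comparison, the Maintainability Index above can be computed directly; a small sketch (the input values are hypothetical):

```python
import math

def maintainability_index(halstead_volume, cyclomatic_complexity, loc):
    """Maintainability Index, normalized to 0..100 as in the slide's formula."""
    raw = (171
           - 5.2 * math.log(halstead_volume)   # ln = natural log
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(loc))
    return max(0.0, raw * 100 / 171)

# Hypothetical module: Halstead volume 1000, CC 10, 200 LOC
mi = maintainability_index(1000, 10, 200)
```

The result for this hypothetical module is roughly 27 on a 0-100 scale — a number that is hard to act on without further interpretation, which is exactly the contrast the slide draws with code smells.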


For example, a developer can act directly on an observation such as: "I think there are too many Feature Envy methods in this class..."


Code smells as indicators of maintainability

As such, code smell analysis is a promising approach to support both the assessment and improvement of maintainability:

code smell analysis → diagnosis → action plan (refactoring)


However… there are challenges within code smell analysis

Refactoring in order to eliminate a code smell implies a cost and a risk:
• Cost of refactoring, reworking the test sets, and performing testing
• Risk of introducing new defects

• Insufficient information on the severity levels and the range of effects of code smells makes refactoring prioritization a nontrivial task.
• It is not clear how, and to what extent, code smells can reflect or describe how (non)maintainable a system is.
• It is not clear which maintenance aspects can be addressed by code smells, and which should be addressed by other means (evaluation approaches).


Addressing one ('the') gap in code smell research

Research during the last decade has emphasized the formalization and automated detection of code smells. ("Will my super-engineered water-bike work outside the lab?")

But relatively little has been done to investigate how comprehensive and informative code smells are for assessing maintainability in practical settings. Even less empirical work on code smells includes in-vivo studies, which limits the applicability of the results to industrial settings.


Research objective

Empirically inquire, in a realistic setting, how useful code smells are in supporting software maintainability assessments, by investigating how good code smells are at the following:

RQ1: Can they indicate system-level maintainability?
RQ2: Can they identify source code files that are likely to require more effort than others?
RQ3: Can they identify source code files that are likely to be problematic during maintenance?
RQ4: What proportion of maintenance problems can be explained by the presence of code smells?
RQ5: How well do they correspond with maintainability aspects deemed critical by software developers?


Overall research strategy
• Longitudinal, in-vivo case study investigating a maintenance project
• Case study with control for moderator variables
• Combination of qualitative + quantitative evidence
• 4 Java applications (systems A-D) with the same functionality but different designs (7 KLOC to 14 KLOC) [Anda et al., 2009]
• Each developer performed three tasks on one system:
  - Task 1: Replacing an external data source
  - Task 2: New authentication mechanism
  - Task 3: New reporting functionality

Bente C. D. Anda, Dag I. K. Sjøberg, and Audris Mockus. "Variability and Reproducibility in Software Engineering: A Study of Four Companies that Developed the Same System." IEEE Transactions on Software Engineering 35.3 (2009), pp. 407-429.


Conceptual model, variables and data sources

Project context: Sep-Dec 2008; 7 working weeks; 6 developers; 2 companies; total cost 50,000 Euros.

Moderator variables: programming skill, development technology, project context, tasks, system.

Variables of interest:
• Code smells (no. of smells**, smell density**) — detected with Borland Together and InCode; 12 types of code smells
• Maintenance outcomes: effort**, change size**, defects*, maintenance problems**, maintainability perception*

Data sources: source code, Subversion database, Eclipse activity logs, Trac (issue tracker), acceptance test reports, study diary, task progress sheets, daily interviews (audio files/notes), open interviews (audio files/notes), think-aloud sessions (video files/notes).

** System and file level; * only at system level.

Control for moderators was used to do case replication

Having four functionally equivalent Java systems allowed for case replication, with control over context (moderator) variables:
• Literal replication: same system, same tasks, developers with similar skills, same project setting, same technology.
• Theoretical replication: different systems, same tasks, developers with similar skills, same project setting, same technology.

This enables higher confidence in the results, because cross-case comparison addresses threats to internal validity.


11 Code smells (and 1 anti-pattern) analyzed in the 4 systems
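Detection tools such as Borland Together and InCode typically flag smells through metric thresholds. A minimal sketch of such a rule for the God Method smell — note that the metric names and thresholds below are invented for illustration, not the tools' actual rules:

```python
def is_god_method(loc, cyclomatic_complexity, num_accessed_fields):
    """Hypothetical threshold-based God Method detector.

    Real detectors combine size, complexity, and data-access metrics
    in a similar spirit; these particular cut-offs are illustrative only.
    """
    return (loc > 100
            and cyclomatic_complexity > 10
            and num_accessed_fields > 8)
```

A long, complex, data-hungry method trips the rule; a small helper does not.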


Analysis at system level

RQ1: Can code smells indicate system-level maintainability?

Systems were ranked according to their number of code smells and their smell density (no. of smells/KLOC). Systems were also ranked according to their actual maintainability, measured by effort (time) and the number of defects introduced; standardized scores were calculated for the ranking. Do the two rankings correspond?

The degree of correspondence between the maintainability assessment (based on code smells) and actual maintainability was measured. In addition, the correspondence between actual maintainability and prior maintainability assessments performed on the same systems, based on:
• Expert judgment [Anda, 2007]
• Chidamber-Kemerer metrics [Anda, 2007]
was compared, to determine which assessment approach was best.

Bente C. D. Anda. "Assessing Software System Maintainability using Structural Measures and Expert Assessments." Int'l Conf. Softw. Maint. 2007, pp. 204-213.
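The ranking step can be sketched as follows (the per-system numbers are hypothetical, not the study's data): compute smell density per system, standardize it, and rank.

```python
import statistics

# Hypothetical per-system measurements (not the study's actual data)
systems = {
    "A": {"smells": 95,  "kloc": 8.0},
    "B": {"smells": 150, "kloc": 14.0},
    "C": {"smells": 60,  "kloc": 7.0},
    "D": {"smells": 121, "kloc": 11.0},
}

# Smell density = no. of smells / KLOC
density = {s: v["smells"] / v["kloc"] for s, v in systems.items()}

# Standardized (z) scores make the ranking comparable across measures
mean = statistics.mean(density.values())
sd = statistics.stdev(density.values())
z = {s: (d - mean) / sd for s, d in density.items()}

# Rank systems from most to least smell-dense
ranking = sorted(density, key=density.get, reverse=True)
```

The same standardization can be applied to effort and defect counts, so that the smell-based ranking and the actual-maintainability ranking can be compared directly.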


Analysis at file level

RQ2: Can code smells identify source code files that are likely to require more effort?
Multiple linear regression analysis.
Dependent variable: effort (time) to update a file.
Independent variables: number of each of 12 different code smells in each file.
Control variables: file size (LOC), number of revisions of a file, system, developer, round.

RQ3: Can code smells identify source code files that are likely to be problematic?
Binary logistic regression analysis.
Dependent variable: binary variable — is the file problematic?
Independent variables: number of each of 12 different code smells in each file.
Control variables: file size (LOC), churn, system.
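The RQ2 analysis can be sketched on synthetic data (illustrative only; the variable names and generative model are invented): when effort is driven by the control variables, a smells-only model fits poorly while a model including the controls fits well, mirroring the study's finding.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic per-file data (not the study's data)
smells = rng.poisson(2, size=(n, 12)).astype(float)     # counts of 12 smells per file
loc = rng.integers(50, 2000, size=n).astype(float)      # file size (control)
revisions = rng.integers(1, 30, size=n).astype(float)   # no. of revisions (control)

# Here effort depends only on the controls, mimicking the RQ2 result
effort = 0.05 * loc + 3.0 * revisions + rng.normal(0, 10, n)

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

r2_smells_only = r_squared(smells, effort)                              # "Model 1"
r2_full = r_squared(np.column_stack([smells, loc, revisions]), effort)  # "Model 3"
```

With this synthetic data the smells-only model explains little of the variance, while adding size and revisions raises the fit sharply — the same pattern reported for Models 1 and 3.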

In addition: principal component analysis + analysis of qualitative evidence.

Analysis at project and conceptual level

RQ4: What proportion of maintenance problems can be explained by code smells?
• Observational study, daily interviews, think-aloud protocol.
• Maintenance difficulties in each system (A-D) were classified as source-code related vs. non-source-code related, and the source-code related ones as code-smell related vs. non-code-smell related.

RQ5: How well do they correspond with maintainability aspects deemed critical by software developers?
1. Open-ended interview with each developer (audio file).
2. Open coding and axial coding: extract the factors from statements collected during interviews (transcript → coded statement → maintainability factor).
3. Cross-case synthesis: summarize and compare the factors across cases (cross-case matrix).

The conceptual relatedness of each maintainability factor to the definitions of code smells by [Fowler, 1999] was investigated.


Results

RQ1: Can code smells be used to compare maintainability at system level?
1. The number of code smells displayed the highest correspondence to actual maintainability. However, the number of code smells is highly correlated with system size!
2. Smell density outperformed the number of smells when comparing only systems of similar size.
3. Expert judgment was considered the most flexible approach, because it considers both the effects of system size and potential maintenance scenarios.

Results

RQ2: Can code smells be used to explain effort at file level?
• A model that only includes code smells (Model 1) displayed a fit of R² = 0.42.
• A model that adds file size and number of changes to Model 1 (Model 3) displayed a fit of R² = 0.58.
• Removing the code smells from Model 3 did not decrease the fit (R² = 0.58).
• The only smell that remained a significant variable in Model 3 was Refused Bequest, which registered a decrease in effort (α < 0.01).
• File size and number of changes remained the most significant predictors of effort (α < 0.001).

Finding: Code smells are no better at explaining sheer effort at file level than size and number of revisions.

RQ3: Can code smells be used to explain if a file is problematic during maintenance?
• The performance measures of the model are: accuracy = 0.847, precision = 0.742, and recall = 0.377.
• Interface Segregation Principle Violation (ISPV) was able to explain problems [Exp(B) = 7.610, p = 0.032].
• Data Clump was also deemed a significant contributor to the model [Exp(B) = 0.053, p = 0.029], but associated with fewer problems!
• PCA indicated that ISPV tends not to be associated with code smells that are related to size.
• Qualitative data suggests that ISPV is related to error/change propagation and difficult concept location.

Finding: Some code smells can potentially explain the occurrence of problems during maintenance. Also, not all smells seem to be problematic…


Results

RQ4: How comprehensively can code smells explain the incidence of maintenance problems?
• Of the problems associated with Java code, 37 (58%) were attributed to code smells, 19 (30%) to other code characteristics, and 8 (12%) to a combination (interaction) of properties.
• Found evidence of interaction effects between collocated code smells.
• Found evidence that interaction effects of collocated code smells and coupled code smells have the same implications in practice.

Finding: Interaction effects between code smells can potentially cause more problems during maintenance. Interactions can occur between collocated smells (in the same artifact, e.g. a God Method and Feature Envy in the same file) or between coupled smells (distributed across multiple, coupled files); dependencies should therefore be observed between files displaying code smells / other design flaws.
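The idea of an interaction effect between collocated smells can be sketched as a product term in a regression (all numbers below are synthetic and illustrative, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Hypothetical per-file smell indicators (0/1)
god_method = rng.integers(0, 2, n)     # file contains a God Method?
feature_envy = rng.integers(0, 2, n)   # file contains Feature Envy?

# Synthetic problem score: each smell adds a little, but the combination
# (collocated smells) adds much more - an interaction effect.
problems = (1.0 * god_method + 1.0 * feature_envy
            + 4.0 * god_method * feature_envy
            + rng.normal(0, 0.5, n))

# OLS with an explicit interaction term recovers the extra effect
X = np.column_stack([np.ones(n), god_method, feature_envy,
                     god_method * feature_envy])
beta, *_ = np.linalg.lstsq(X, problems, rcond=None)
interaction_coef = beta[3]   # should land near the generative value of 4.0
```

A model without the product column would misattribute the joint effect to the individual smells, which is why collocated (and coupled) smells deserve explicit modeling.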


Results

RQ5: How well do current code smell definitions correspond with maintainability aspects/factors deemed critical by software developers?
• Many important aspects are not covered by the definitions of code smells, and those aspects need to be addressed by other means: expert assessment, semantic analysis, etc.
• Design consistency was found to be very important, and is potentially addressable with some code smells.

Finding: Some code smells may deserve more attention from a practical maintenance perspective. However, to achieve a comprehensive assessment, multiple approaches (expert judgment, semantic analysis, etc.) are needed.


Lessons learned…

Controlling for moderator factors in a comparative case study is a powerful approach to strengthen the internal validity of qualitative findings. (You just need to make sure you don't die in the attempt…) It responds well to the current need for inductive research for developing theories in SE.

The element of 'artificiality' should be considered, but not feared. (Context is king! Always report the context.)

Study protocol and pilot study are of paramount importance! Consider who is going to do the data collection and analysis, when, and how.

A research project is a project after all… (It is important to have adequate resources.)

Your log-book is your best friend :) (A centralized reference can be used to connect and navigate across different data sources.)


And the adventure continues…
• Interaction effects among code smells (and other code properties)
• Study of collocated smells and coupled smells
• Nature and severity of maintenance difficulties
• Cost/benefit-based definition/detection of smells


Thanks for your attention! :)


The case for the interaction effect…


Summary of contributions
• The number of smells is no better than system size for comparing the maintainability of systems of dissimilar size. However, smell density was found to outperform size when systems of similar size are involved [1].
• The code smells investigated are rather poor indicators of effort at file level, compared to traditional measures such as file size and change frequency [2].
• However, a code smell that is independent of size can potentially explain why some files are likely to be problematic during maintenance (ISP Violation) [3].
• We have found evidence of the "duality" of the nature of code smells, as some are, in fact, associated with positive effects [2][3].
• Code smells may have a limited scope when it comes to explaining the overall maintenance problems and covering many of the maintainability aspects important to developers [4][5].
• Based on our findings on coupled smells, we suggest a rethinking of the level of granularity used in current smell analyses (class, method) and suggest incorporating dependency analysis [4].
• To achieve better maintainability assessments, a combination of approaches should be used. We have suggested the use of Concept Mapping [6] for that purpose.


Limitations and threats to validity

Construct validity
• Code smell detection tools.
• Protocol for identifying maintenance problems.
• Lack of severity levels of maintenance problems.

Internal validity
• Effect of rounds on the effort outcome at system level.
• Effect of the sub-type of task (reading, writing) is not accounted for when analyzing effort at file level.

External validity
• Medium-sized, Java-based, web information systems.
• Medium to small maintenance tasks.
• Solo projects.


Code smell-based assessment of the systems


Maintainability of the 4 systems


Correspondence between code smell-based assessment and actual maintainability


Smell density can distinguish maintainability of systems similar in size


Multiple Regression Model


Logistic Regression Model


Principal component analysis


Distribution of maintenance problems according to source


Interaction effects amongst smells

Interaction effects occur between collocated code smells, and between code smells and other code characteristics. This can result in intensified effects of smells, or other effect types. (Figure: a single file containing both a God Method and a Feature Envy method.)

Collocated smells may play an important role in the overall effects of code smells on maintenance…!


Interaction effects amongst smells

Interactions also occur between coupled smells (smells distributed across coupled files), so from a practical perspective they may have the same effects as collocated smells. (Figure: a God Method in one file coupled to a Feature Envy method in another file.) Dependencies should be observed between files displaying code smells / other design flaws.


Aspects that cannot be addressed by code smell definitions

Aspect | Covered by code smell? | Code smells associated | Detectable via code analysis? | Alternative evaluation techniques
Appropriate technical platform | no | NA | no | Expert judgment
Coherent naming | no | NA | no | Semantic analysis, manual inspection
Initial defects | no | NA | no | Acceptance tests, regression testing
Three-layer architecture | no | NA | no | Expert judgment

Aspects that can (partially) be addressed by code smell definitions

Aspect | Covered by code smell? | Code smells associated | Detectable via code analysis? | Alternative evaluation techniques
Design suited to the problem domain | partially | Speculative Generality | partially | Expert judgment
Encapsulation | partially | Data Clump | partially | Manual inspection
Inheritance | partially | Abuse of multiple inheritance (new smell?), Refused Bequest | partially | Manual inspection
Libraries | partially | Wide Subsystem Interface | partially | Expert judgment, dependency analysis
Simplicity | partially | God Class, God Method, Lazy Class, Long Parameter List, Message Chains | yes | Expert judgment
Use of components | partially | God Class, Misplaced Class | yes | Semantic analysis, manual inspection
Design consistency | partially | Alternative Classes with Different Interfaces, ISP Violation, Divergent Change, Temporary Field | partially | Semantic analysis, manual inspection
Logic spread | partially | Feature Envy, Shotgun Surgery, ISP Violation | yes | Manual inspection, dependency analysis
Duplicated code | yes | Duplicated Code, Switch Statements | yes | Manual inspection


Image credits
http://bestclipartblog.com/clipart-pics/mountain-clip-art-5.png
http://dragonartz.wordpress.com/tag/standing/
http://www.miguelcarrasco.net/miguelcarrasco/WindowsLiveWriter/BlueScreenofDeathTop10_7B1A/blue%20screen%20of%20death%20mac%20airport%5B2%5D.jpg
http://www.katu.com/news/tech/78348972.html
http://www.old-picture.com/american-legacy/013/Foraker-Arthur-Jr.htm
http://www.qrcodepress.com/pioneer-develops-augmented-reality-navigation-system/859146/
http://www.clker.com/cliparts/J/A/q/9/3/2/mountain-range-sunset-hi.png