Recommendation of Move Method Refactorings Using Coupling, Cohesion and Contextual Similarity

Md. Masudur Rahman MSSE 0404

A Thesis Submitted to the Master of Science in Software Engineering Program Office of the Institute of Information Technology, University of Dhaka in Partial Fulfillment of the requirements for the Degree of MASTER OF SCIENCE IN SOFTWARE ENGINEERING

Institute of Information Technology University of Dhaka DHAKA, BANGLADESH

© Md. Masudur Rahman, 2017

Recommendation of Move Method Refactorings Using Coupling, Cohesion and Contextual Similarity Md. Masudur Rahman

Approved:

Signature

Date

Student: Md. Masudur Rahman

Supervisor: Md. Rayhanur Rahman


To my parents for their endless love, support and encouragement


ABSTRACT

Placement of methods within classes is one of the most important design activities for any object oriented application in terms of coupling and cohesion. Misplacement of methods is known as the feature envy code smell. Due to this code smell, the application becomes tightly coupled and loosely cohesive, reflecting inefficient design. Hence, development and maintenance time, cost and effort are increased. To enhance the design quality, the move method refactoring technique plays a significant role by grouping similar behaviors of methods. It is also used to refactor the feature envy code smell by moving methods from inappropriate classes into more appropriate ones. Applying these refactorings manually requires a lot of time and cost because of the inefficient design. Therefore, an automatic recommendation approach for move method refactorings assists developers by making their maintenance activity easier. Existing approaches consider only dependency based information of non-static entities (methods and attributes) for the recommendation. Therefore, these approaches are not applicable to all types of entities (static and non-static).


This thesis proposes a novel idea of using contextual information based on Information Retrieval (IR) techniques, along with coupling and cohesion based information of the application, for recommending move method refactorings to enhance the design quality. In addition, the approach incorporates both static and non-static entities in its recommendation process to refactor the feature envy code smell. The approach is based on three factors (C3): coupling, cohesion and contextual information of the software application. At first, it analyses the source code through parsing to get dependency (coupling and cohesion) and context based information. Then, it calculates two types of similarity scores between a target method (assuming the method is placed in an inappropriate class) and a class: (1) a dependency based similarity score, and (2) a context based similarity score. After that, these two types of similarity scores are combined to get the total (or actual) similarity score. By comparing the scores between the method and each class, it is decided whether the feature envy code smell exists in the application or not. If the similarity score of the method’s current class is less than the score of one or more other classes, then the technique detects the method as a feature envy code smell. Finally, it recommends the class having the highest similarity score with the method for refactoring. For validating the approach, a framework named ‘Move Method Refactoring Using Coupling, Cohesion and Contextual Similarity’ (MMRUC3) is developed and applied to seven well-known open source projects. The results of the experimental evaluation indicate that the proposed approach, with an average precision of 18.91%, recall of 69.91%, and F-measure of 29.77%, provides better results than the JDeodorant tool (a popular Eclipse plugin for refactorings). Moreover, this thesis establishes several relationships between the accuracy of the approach and project standards and sizes.


ACKNOWLEDGMENTS

This thesis would not have been possible without the advice, help, and support of the kind people around me. Above all, I would like to thank my supervisor, Md. Rayhanur Rahman. Not only did I benefit from his knowledge and the opportunities that he provided me, but he also taught me many important aspects of being a researcher. In addition, he was always more than happy to help me with nonacademic issues, and I am very grateful for his understanding and support at a personal level. I would also like to thank my committee members for serving even at hardship. I also want to thank them for letting my defense be an enjoyable moment and for their brilliant comments and suggestions. I would also like to thank a few wonderful people at the Institute of Information Technology, University of Dhaka. Special thanks to Dr. Kazi Muheymin-Us-Sakib, Dr. B. M. Mainul Hossain, Dr. Muhammad Mahbub Alam, and Dr. Mohammad Shoyaib for their valuable suggestions and feedback. I had many inspiring and fruitful discussions with them that made my experience and knowledge more complete.

I would also like to thank my thesis externals, Dr. Chowdhury Farhan Ahmed and Dr. Moinul Islam Zaber, Department of Computer Science and Engineering, University of Dhaka, for their constructive feedback. I would also like to thank the other faculty members for their participation and constructive feedback in the production of the thesis. I express my gratitude to all my fellow classmates, whose advice, feedback and cooperation are truly incomparable. I am also thankful to the Ministry of Posts, Telecommunications and IT, Government of the People’s Republic of Bangladesh, for granting me the ICT Fellowship 2016-17. Last, but not least, I am very grateful to my parents, Mahmuda and Ismail, for their unconditional love and support that helped me throughout my life. They always put my well-being, education, and happiness ahead of themselves.


LIST OF PUBLICATIONS

1. “Recommendation of Move Method Refactoring to Optimize Modularization Using Conceptual Similarity,” International Journal of Information Technology and Computer Science (IJITCS), 2017.

2. “Recommendation of Move Method Refactorings Using Coupling, Cohesion and Contextual Similarity,” In Proceedings of IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR), University of Dhaka, Dhaka, Bangladesh, February 13, 2017.

3. “A Context Based Approach for Recommending Move Method Refactoring to Optimize Modularization,” In Proceedings of 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Klagenfurt, Austria, February 21-24, 2017. [Accepted]


TABLE OF CONTENTS

Approval
Dedication
Abstract
Acknowledgements
Publications
Table of Contents
List of Figures
List of Tables

1 Introduction
  1.1 Key Terminologies
  1.2 Motivation
  1.3 Issues in State-of-the-Art Approaches
  1.4 Research Questions
  1.5 Contribution
  1.6 Organization of the Thesis

2 Background Study
  2.1 Move Method Refactoring to Remove Feature Envy Code Smell
    2.1.1 Refactorings and Code Smells
    2.1.2 Move Method Refactorings
  2.2 Design Quality: Coupling, Cohesion and Single Responsibility Principle
    2.2.1 Coupling
    2.2.2 Cohesion
    2.2.3 Single Responsibility Principle
  2.3 Contextual Information (CI)
  2.4 Summary

3 Literature Review of Recommending Move Method Refactorings
  3.1 Move Method Refactorings and Feature Envy Code Smells
    3.1.1 Recommendation of Move Method Refactorings to Remove Feature Envy Code Smells
    3.1.2 Recommendation of Move Method Refactorings
    3.1.3 Detection of Feature Envy Code Smells
  3.2 Context Based Refactorings
  3.3 Summary

4 Recommendation of Move Method Refactorings
  4.1 Overview of the Recommendation Approach
  4.2 MMRUC3 Recommendation Framework
    4.2.1 Source Code Analysis
    4.2.2 Move Method Refactorings Recommendation
  4.3 Coupling & Cohesion based Similarity Calculation
  4.4 Context based Similarity Calculation
  4.5 Complexity Analysis of MMRUC3 Algorithm
  4.6 Summary

5 Experimental Results and Discussion
  5.1 Experimental Setup
  5.2 Result Analysis
  5.3 Relationships between MMRUC3 and the Projects
    5.3.1 Relationship based on Project Standards
    5.3.2 Relationship based on Project Sizes
  5.4 Comparative Result Analysis
  5.5 Discussion on Results
  5.6 Summary

6 Case Study
  6.1 About Project VideoStore
  6.2 Recommending Move Method Refactorings for VideoStore
    6.2.1 Parsing & Analyzing VideoStore Project
    6.2.2 Similarity Coefficient Measurement
    6.2.3 Recommendation of Move Method Refactorings
  6.3 Summary

7 Conclusion and Future Direction
  7.1 Conclusion
    7.1.1 Guidelines to Use MMRUC3 Approach
    7.1.2 Threats to Validity
  7.2 Future Direction

A Price and Movie class of VideoStore Project

Bibliography

LIST OF FIGURES

1.1 Code Smell and Refactoring

2.1 Example of Refactoring
2.2 Example of Move Method Refactoring
2.3 Coupling and Cohesion
2.4 Example of Single Responsibility Principle (SRP)
2.5 Illustration of Cosine Similarity

3.1 Overview of Move Method Refactorings Recommendation Literature

4.1 Architecture of Recommending Move Method Refactoring Approach (MMRUC3 Framework)
4.2 Flow Chart of Move Method Refactorings Recommendation Approach
4.3 Schematic Diagram of Contextual Similarity Measurement Process
4.4 Quadratic Nature of Complexity of MMRUC3

5.1 Coding Standards versus Precisions and Recalls of MMRUC3
5.2 Patterns of Precisions of MMRUC3 as Sizes Increase
5.3 Patterns of Recalls of MMRUC3 as Sizes Increase
5.4 Comparison of Precisions
5.5 Comparison of Recalls
5.6 Precision-Recall (PR) Curves for both MMRUC3 and JDeodorant
5.7 Comparison of F-measures
5.8 Comparison between MMRUC3 and MethodBook

6.1 UML Diagram of VideoStore Project (Refactored Version)
6.2 UML Diagram of VideoStore Project (Non-Refactored Version)
6.3 Examples of Java and Byte Code
6.4 Source Code Examples of VideoStore

A.1 Source Code Examples of VideoStore - II

LIST OF TABLES

5.1 Experimental Projects
5.2 Contingency Table for Result Analysis of MMRUC3
5.3 Results of the Proposed MMRUC3 Approach
5.4 Categorization of Project Standards
5.5 Categories of the Source Projects
5.6 Comparative Result Analysis (Precisions & Recalls)
5.7 Comparative Result Analysis (F-measures)
5.8 Comparison between MMRUC3 and MethodBook

6.1 Mathematically Cosine Similarity Calculation between method getRentedCharge and class Movie
6.2 Mathematically Cosine Similarity Calculation between method getRentedCharge and class Rental
6.3 Mathematically Cosine Similarity Calculation between method getRentedCharge and class Price
6.4 Similarity Scores for All Classes for Method getRentedCharge()
6.5 Similarity Scores for All Classes for Method getPrice()
6.6 Recommendation of Move Method Refactorings

CHAPTER

1 INTRODUCTION

Move method is one of the most significant refactoring techniques for enhancing the code quality of a software application in terms of coupling and cohesion [1]. It is generally used to move methods from irrelevant classes to more relevant ones in order to decrease coupling and increase cohesion of the application. Therefore, method placement is one of the most important design activities to ensure code quality. The inappropriate placement of methods is known as the feature envy code smell. This code smell exists in the application when a method uses more features (data or functionality) of other classes than of its current class to accomplish its task. Due to this code smell, an application becomes tightly coupled and loosely cohesive, reflecting poor design, and hence development and maintenance effort, time and cost increase. In structured design and programming, application design with low coupling and high cohesion leads to products that are both more reliable and more maintainable [2].


However, manually identifying misplaced methods requires a great deal of time and human effort, because each class of a system has to be analyzed. An automatic recommendation approach helps developers to place methods into appropriate classes in less time. In addition, placing methods into appropriate classes improves design quality and eases maintenance activity. With this objective, this research intends to provide a recommendation approach of move method refactorings for the methods placed in inappropriate classes (affected methods). This research holds that identifying appropriate classes for the affected methods depends not only on coupling and cohesion, but also on contextual factors of the methods. So this work combines coupling and cohesion based similarity with contextual similarity between methods and classes in order to provide refactoring suggestions for the affected methods. This chapter presents the issues of automated recommendation of move method refactorings and introduces the research challenges that arise from them. It also briefly describes the contribution of the research. Finally, the organization of this thesis is outlined as a reading guideline for the readers.

1.1 Key Terminologies

Before discussing the motivation and aim of this study, several key terminologies are described in this section to make the thesis easier to follow for readers who are new to the code smell and refactoring research domain.

Code Smells are common coding practices which make code difficult to understand for other developers [3]. Code smells do not hamper a program’s performance or accuracy, but they decrease comprehensibility. In this thesis, the terms ‘code smell’, ‘smell’ and ‘bad smell’ are used interchangeably.

Refactoring means rewriting or changing code to remove code smells from it [3]. In Figure 1.1 (Top), the method addUser takes too many parameters to perform its task. Although this function works correctly, it is not easy to understand the method at a glance because of the long list of parameters. This practice is known as the ‘Long Parameter List’ code smell. Since addUser contains the ‘Long Parameter List’ smell, it is a ‘smelly method’ and therefore, it should be refactored. In Figure 1.1 (Bottom), a refactored version of the code is presented, in which addUser is smell-free.

Figure 1.1: Code Smell and Refactoring

Move Method Refactoring is a technique to place methods into more appropriate classes of a software application. It is also used to eradicate the feature envy code smell from the application, which enhances software quality in terms of coupling and cohesion.

Coupling and Cohesion are design characteristics of a software application in Object Oriented (OO) design [4]. High coupling means a lot of interactions among the application components, and low cohesion means that each component performs a lot of unrelated tasks. Such inefficient designs make the application harder to maintain. Therefore, an efficient design should have low coupling and high cohesion.
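Since the figure itself is not reproduced here, the ‘Long Parameter List’ smell and one possible refactoring can be illustrated with a minimal sketch. The parameter names, the `User` parameter object and the `users` list are hypothetical and chosen only for illustration; they are not taken from Figure 1.1.

```python
from dataclasses import dataclass, asdict


def add_user_smelly(users, first_name, last_name, email, phone, city, country):
    """'Long Parameter List': six loosely ordered values describe one user."""
    users.append({
        "first_name": first_name, "last_name": last_name, "email": email,
        "phone": phone, "city": city, "country": country,
    })


@dataclass
class User:
    """Hypothetical parameter object grouping the values of one user."""
    first_name: str
    last_name: str
    email: str
    phone: str
    city: str
    country: str


def add_user(users, user):
    """Refactored version: one self-describing parameter instead of six."""
    users.append(asdict(user))


before, after = [], []
add_user_smelly(before, "Ada", "Lovelace", "ada@example.org",
                "555-0100", "London", "UK")
add_user(after, User(first_name="Ada", last_name="Lovelace",
                     email="ada@example.org", phone="555-0100",
                     city="London", country="UK"))
```

Both versions store the same record; only the readability of the call site changes, which is the point of a refactoring.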

1.2 Motivation

A code smell is a design problem that makes a software application harder to maintain, tightly coupled and complex [3]. The existence of these smells does not hamper the application’s performance or accuracy, but it degrades code quality in terms of software maintenance. Therefore, code smells should be removed from the application in order to make maintenance tasks easier for software engineers. Refactoring is a technique which is used to remove code smells by restructuring existing code. It not only improves aspects of software quality, but also improves productivity [5]. Therefore, refactoring is an important weapon for engineers to ease maintenance activities by removing code smells.

High levels of coupling and low levels of cohesion make an application so complicated that it becomes very difficult for developers to maintain it in the long run. In addition, during the development and maintenance phases, a change in one class affects other classes because of high coupling and low cohesion, which leads to further work to change the affected classes. Modifying existing classes requires more effort if feature envy code smells exist in the application than if other code smells do [6]. Therefore, to maintain and ensure high quality software, developers’ intention should be loose coupling and high cohesion in software design [7]. So refactoring this code smell by moving methods from incorrect classes into appropriate ones plays an important role in decreasing coupling and increasing cohesion of the application.

In an object oriented application, classes encapsulate internal states manipulated by their methods, with low coupling and high cohesion. However, developers often unconsciously implement methods in incorrect classes and thus create the feature envy code smell [3].
The code smell exists in an application when a method makes too many calls to other classes to obtain data or functionality in order to accomplish its task. Among the 22 types of code smells described by Martin Fowler (Father of Code Smell) [3], feature envy is one of the code smells that is directly related to coupling and cohesion. This type of code smell arises when developers violate the principle of grouping similar behavior with related data, which makes the application tightly coupled and loosely cohesive. Feature envy code smells, together with high coupling and low cohesion, make it difficult to change the artifacts of the application consistently [8]. Faults occurred in those artifacts because developers missed areas of the code that needed to be consistently changed after changes were made to the methods displaying the code smell [9].

In structured design and programming, application design with low coupling and high cohesion improves the design quality [10] and leads to products that are both more reliable and more maintainable [2]. Therefore, to maintain high quality software, developers should implement loosely coupled and highly cohesive designs [7, 11]. Moreover, modifying existing classes as well as introducing new features requires more effort if the feature envy code smell is present in the application than if other code smells are [6]. So refactoring the code smell by moving methods from incorrect classes into appropriate ones plays a significant role in reducing coupling and increasing cohesion, and eventually enriches the modularization of the application. In a word, it improves software quality in terms of maintainability and the re-engineering process [12].

In the last decade, code smells have become an established concept for patterns or aspects of software design that may cause problems for further development and maintenance of the system [6]. However, manual inspection to group similar methods in the same classes is a time consuming and risky process, as assumptions about method placement might not always be correct and vary from developer to developer.
As a result, design and maintenance problems might exist in this manual process, which increases time, cost and effort. In order to decrease coupling and increase cohesion in an application, an automatic move method refactoring technique is essential; it detects methods implemented in incorrect classes and recommends more appropriate classes for those methods. Automating the refactoring process solves the problems of the manual process and thereby enhances software quality.
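The feature envy smell and its removal by a move method refactoring, as described in this section, can be sketched as follows. All class and method names (`Address`, `InvoiceBefore`, `InvoiceAfter`, `shipping_label`, `label`) are hypothetical and serve only as an illustration under these assumptions.

```python
class Address:
    def __init__(self, street, city, postal_code):
        self.street = street
        self.city = city
        self.postal_code = postal_code

    def label(self):
        """Target of the move: the method now lives next to the data it reads."""
        return f"{self.street}, {self.postal_code} {self.city}"


class InvoiceBefore:
    def __init__(self, number, address):
        self.number = number
        self.address = address

    def shipping_label(self):
        # Feature envy: three accesses to Address, none to the invoice's own state.
        a = self.address
        return f"{a.street}, {a.postal_code} {a.city}"


class InvoiceAfter:
    def __init__(self, number, address):
        self.number = number
        self.address = address

    def shipping_label(self):
        # After the move, the invoice only delegates to Address.
        return self.address.label()


address = Address("1 Main St", "Dhaka", "1000")
assert InvoiceBefore(7, address).shipping_label() == InvoiceAfter(7, address).shipping_label()
```

The behavior is unchanged, but `InvoiceAfter` depends on a single method of `Address` rather than on three of its attributes, which lowers coupling and keeps address formatting cohesive within `Address`.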

1.3 Issues in State-of-the-Art Approaches

Move method refactoring to remove the feature envy code smell is a significant research field that enhances code quality in terms of coupling and cohesion. JDeodorant, a popular refactoring tool, follows a classical heuristic to refactor the feature envy code smell by recommending move method refactorings [13]. The technique is based on coupling and cohesion measured by the Jaccard distance. The approach provides more recommendations than another technique, proposed by Sales et al. [14], which uses the Sokal and Sneath 2 similarity measure to quantify coupling and cohesion. As JDeodorant is a well-known plugin for the Eclipse IDE for identifying and refactoring the feature envy code smell, Sales et al. compared their results with JDeodorant and claimed that their recommendation technique for move method is more appropriate than JDeodorant's. In another paper, Bavota et al. proposed an approach named MethodBook to detect the feature envy smell using a technique called Relational Topic Model (RTM) [15]. It uses comments, variable names and method invocations for the refactoring of the code smell. There exist more works based on coupling and cohesion that group similar methods into a class [16, 17, 18]. Other works used historical information of an application from its version control system in order to remove the code smell [19, 20].

However, none of these works has addressed the contextual factors of an application when grouping similar methods. The context of a method provides significant information and is an important factor for grouping similar behavior of methods. According to the Single Responsibility Principle (SRP) [21], a class stands for a single responsibility, and the methods within the class perform that responsibility. This responsibility is referred to as a context, and the methods within a class are based on it. Therefore, contextual information might be a significant factor for improving the accuracy of move method refactoring recommendations. Moreover, most of the existing techniques have not considered static entities (methods and attributes of a class), because they use the reference names of method invocations in the similarity measurement process, and static entities are not accessed through object references in an object oriented system. Missing these important factors for grouping methods leads to lower accuracy in the existing approaches. So more work is needed in this field to improve the accuracy of the move method refactoring technique, so that developers need not spend extra time and cost on maintaining the application.
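The Jaccard distance mentioned above can be sketched on plain sets. This is only the generic set formula, 1 - |A ∩ B| / |A ∪ B|; how a real tool builds the entity sets of a method and of a class is more involved and is not shown. The entity names below are hypothetical.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical entities (attributes and methods) accessed by one method.
method_entities = {"Catalog.price", "Catalog.discount", "Cart.items"}

# Hypothetical entity sets of two candidate classes.
class_entities = {
    "Cart": {"Cart.items", "Cart.owner", "Cart.total"},
    "Catalog": {"Catalog.price", "Catalog.discount", "Catalog.find"},
}

distances = {name: jaccard_distance(method_entities, entities)
             for name, entities in class_entities.items()}
closest = min(distances, key=distances.get)
```

A smaller distance means that the method shares more entities with the class, so in this toy data the method is closer to `Catalog` than to `Cart`.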

1.4 Research Questions

Existing research provides various approaches for automatically recommending move method refactorings and detecting feature envy code smells, with varying accuracy. The most important part of recommending the refactorings is grouping methods based on similarities. This can be performed manually to get accurate results, but the manual process is infeasible in terms of time, cost and effort. So automation of the recommendation process is essential to enhance code quality. Existing techniques are mostly based on coupling and cohesion and have not considered the contextual factors of an application. Incorporating the contextual factors into the refactoring recommendation approach adds a new dimension to the research field. Therefore, there is scope to enhance and modify the similarity measurement technique in order to improve the recommendation of move method refactorings. This leads to the following research question:


RQ1: How to automate the recommendation process of move method refactorings?

An approach for recommending move method refactorings is required which is based on contextual similarity as well as coupling and cohesion. At first, it analyzes the source code of an application through parsing in order to acquire dependency and context based information. Then, the similarity between a target method and each class is calculated using method calls and used attributes, which are referred to as coupling and cohesion. After that, another kind of similarity, referred to as contextual similarity, is measured based on the context shared by the target method and each class. Finally, these two kinds of similarity are combined and compared to recommend a more appropriate class for the target method.

In object oriented programming (OOP), two types of entities (methods and attributes) can be created: static and non-static. Most existing research has considered only the non-static entities for the refactorings, because it relies on the references of method calls, and in OOP static entities are not accessed through references. This consideration leads to another research question:

RQ2: How to incorporate static entities (methods, attributes) along with non-static ones for recommending move method refactorings?

An approach is required that considers both static and non-static entities (methods, attributes) to make the recommendation approach more general. To achieve this, class names of method invocations and attribute usages are used instead of reference names when calculating the dependency based similarity. Non-static entities are accessed through object references, whereas static ones are accessed through class names. However, every entity belongs to a class, and hence considering the class names of entity usages is more useful in the similarity measurement process. Moreover, contextual information covers all contexts of an application, including static and non-static entities, so it is applicable to both kinds of entities.
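The steps outlined for RQ1 and RQ2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the MMRUC3 implementation: the dependency score is assumed to be the share of a method's entity usages that belong to each class (identified by class name, so static and non-static usages are counted alike), the context scores are assumed to be given, and the two scores are combined by an assumed weighted mean. All function and class names are hypothetical.

```python
from collections import Counter


def dependency_similarity(usages):
    """usages: owning-class name of every method call or attribute access.

    Counting by class name treats static and non-static usages the same way.
    """
    counts = Counter(usages)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}


def combine(dependency, context, weight=0.5):
    """Assumed combination rule: weighted mean of the two scores per class."""
    classes = set(dependency) | set(context)
    return {c: weight * dependency.get(c, 0.0) + (1 - weight) * context.get(c, 0.0)
            for c in classes}


def recommend(current_class, scores):
    """Return the best class if it beats the current class, else None."""
    best = max(scores, key=scores.get)
    if scores[best] > scores.get(current_class, 0.0):
        return best
    return None


# Hypothetical method in class Cart; TaxRules is used through a static call.
usages = ["Catalog", "Catalog", "TaxRules", "Cart"]
context_scores = {"Cart": 0.2, "Catalog": 0.6, "TaxRules": 0.1}  # assumed given

scores = combine(dependency_similarity(usages), context_scores)
target = recommend("Cart", scores)
```

With these toy numbers the current class `Cart` scores lower than `Catalog`, so the method would be flagged and `Catalog` suggested as the target class; a method whose current class already scores highest yields no recommendation.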

1.5 Contribution

The first contribution of this research is a recommendation approach for move method refactorings based on three factors (C3): coupling, cohesion and contextual similarity. The contextual similarity metric is introduced in this thesis to provide more accurate recommendations than existing approaches, which have not considered this factor. In addition, this contextual metric reflects the key development and design concept of grouping similar behavior of methods into the same classes. So, the integration of the metric differentiates the approach from the traditional approaches, and eventually improves the recommendations. The integration of Information Retrieval (IR) techniques, such as term frequency (tf), inverse document frequency (idf) and cosine similarity, to acquire the contextual information makes the approach novel and adds a new dimension to the refactoring research field. To the best of the author’s knowledge, no research on move method refactorings exists that uses IR techniques to measure contextual similarity along with dependency based similarity. Moreover, the approach is generalized for both static and non-static methods.

Another contribution of the research is that relationships between the accuracy of the proposed approach and two metrics (project standards and sizes) are established. The analysis states that the approach is highly dependent on project standards (naming conventions) and depends little on project sizes. The weak dependency on project sizes indicates that the approach is balanced enough for the recommendations.

For validation, a framework of the proposed approach, named ‘Move Method Refactoring Using Coupling, Cohesion and Contextual Similarity’ (MMRUC3), is developed and a comparative analysis is conducted with the JDeodorant tool (a popular Eclipse plugin for refactorings). The preliminary evaluation on seven well-known open source Java projects provides satisfactory results with an average precision of 18.91%, recall of 69.91% and F-measure of 29.77%, which are better than those of the JDeodorant tool. The results also indicate that the incorporation of the contextual strategy along with coupling and cohesion, and the inclusion of static entities with non-static ones, are important factors in recommending move method refactorings to enhance software design quality in terms of coupling and cohesion.

In summary, this thesis makes the following main contributions:

(i) A recommendation approach of move method refactorings based on coupling, cohesion and contextual similarity to decrease coupling and increase cohesion.

(ii) An approach to automatically refactor feature envy code smells for both static and non-static methods.

(iii) Establishment of relationships between the accuracy of the proposed approach and two metrics (project standards and sizes).

(iv) Evaluations on seven well-known open source Java projects based on precision, recall and F-measure metrics. The results show that the approach recommends move method refactorings more effectively than the JDeodorant tool.
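The IR techniques named in this section (tf, idf and cosine similarity) can be illustrated with a small, self-contained sketch. The tokenizer, the unsmoothed idf variant log(N/df) and the toy identifiers are assumptions made for illustration; the exact weighting used by MMRUC3 is described later in the thesis and may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers into lowercase terms (all-caps tokens are ignored)."""
    return [part.lower() for part in re.findall(r"[A-Z]?[a-z]+", text)]


def tf_idf_vectors(docs):
    """docs maps a name to its term list; returns a sparse tf-idf vector per name."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {t: (c / len(tokens)) * idf[t] for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical identifiers of one method and two candidate classes.
docs = {
    "method": tokenize("computeLateFee dueDate returnDate dailyFee"),
    "Loan": tokenize("Loan dueDate returnDate borrower book"),
    "Member": tokenize("Member name email phone address"),
}
vectors = tf_idf_vectors(docs)
sim_loan = cosine(vectors["method"], vectors["Loan"])
sim_member = cosine(vectors["method"], vectors["Member"])
```

Because the method shares the terms due, date and return with `Loan` and no term with `Member`, its contextual similarity to `Loan` is positive while its similarity to `Member` is zero.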

1.6 Organization of the Thesis

This section gives an overview of the remaining chapters of this thesis. The chapters are organized as follows.

Chapter 2: Some preliminaries regarding refactorings and code smells are discussed along with the basic concept of move method refactorings. The software design qualities of coupling and cohesion are also described, and an overview of the Single Responsibility Principle (SRP) is provided.

Chapter 3: This chapter presents the existing research on the recommendation of refactorings. To the best of the researcher’s knowledge, no existing literature considers contextual information in the refactoring process.

Chapter 4: The methodology of the recommendation framework for move method refactorings and the proposed algorithms are presented in this chapter. Moreover, a lemma about the complexity of the approach is derived.

Chapter 5: The implementation results of the proposed approach and a comparative result analysis are presented here. In addition, relationships between the accuracy of the approach and project standards and sizes are established in this chapter.

Chapter 6: A case study on a renowned sample project is shown here to assess the proposed approach.

Chapter 7: This is the concluding chapter, which contains a discussion of the framework and some future directions.

CHAPTER 2

BACKGROUND STUDY

Coupling and cohesion are the two main design features that have an impact on Object-Oriented (OO) design. Strong coupling and weak cohesion of a system are generally associated with lower productivity, greater rework, more significant effort by developers and higher defect rates [22]. Consequently, weak coupling and strong cohesion can be regarded as indicators of good design quality in terms of maintenance. Move Method is one of the most significant refactoring techniques used to improve design quality. Moreover, it is used to remove the feature envy code smell, which makes the system loosely cohesive and highly coupled. To work with the refactoring of this smell, background knowledge regarding refactorings, code smells and software design quality (coupling and cohesion) is necessary. Therefore, the aim of this chapter is to provide formal definitions of the concepts and terminology relevant to the scope of the problem studied in this thesis.

2.1 Move Method Refactoring to Remove Feature Envy Code Smell

The Move Method refactoring technique assists developers in removing the feature envy code smell, which is the problem addressed in this thesis. Two terms are therefore particularly important here: refactoring and code smell. Both are covered in this section.

2.1.1 Refactorings and Code Smells

Generally, refactoring is a way of improving software design quality and minimizing the existence of design problems, that is, code smells.

2.1.1.1 Refactorings

“Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written.” - Martin Fowler [3].

An example of a refactoring is Extract Method (Figure 2.1). If a method is too long, it should be decomposed using this refactoring technique: find a clump of code within the long method that goes well together, create a new method with a descriptive name and move the code into the new method. If local variables are used, they need to be passed as parameters. The last step is to add a call to the new method and test the code [3].

Figure 2.1: Example of Refactoring

Refactoring improves the design of an existing system and makes an object-oriented system simpler and easier to maintain [3]. Moreover, it helps to find bugs and to add new functionality more easily, and it enhances the reusability of code [23]. The refactoring process consists of a number of distinct activities [24]:

1. Identify where the software should be refactored;
2. Determine which refactoring(s) should be applied to the identified places;
3. Guarantee that the applied refactoring preserves behavior;
4. Apply the refactoring;
5. Assess the effect of the refactoring on quality characteristics of the software (e.g., complexity, understandability, maintainability) or the process (e.g., productivity, cost, effort);
6. Maintain the consistency between the refactored program code and other software artifacts (such as documentation, design documents, requirements specifications, tests and so on).

Types of Refactorings

Move Method is the name of a refactoring technique that is used for method placement. If a method is developed in an incorrect class, this refactoring assists in placing the method into a more appropriate one. According to Martin Fowler, it applies when a method is, or will be, using or used by more features of another class than the class on which it is defined; the method is then moved to the class it uses most. This thesis deals with this refactoring technique. Besides Move Method, there exist a number of refactoring techniques defined by Martin Fowler [3]. The common and related refactorings are stated below [25].

1. Extract Method: A code fragment can be grouped together. Turn the fragment into a method whose name explains the purpose of the method.
2. Pull Up Method: Two subclasses have the same method. Move the method to the superclass.
3. Push Down Method: A method is used only by some subclasses. Move the method to those subclasses.
4. Form Template Method: Two methods in subclasses perform similar steps in the same order, yet the steps are different. Get the steps into methods with the same signature, so that the original methods become the same. Then they can be pulled up to form a template for the subclasses.
5. Decompose Conditional: There is a complicated conditional (if-then-else) statement. Extract methods from the condition, the then part and the else part.
6. Replace Parameter with Method: An object invokes a method, then passes the result as a parameter to another method. The receiver can also invoke this method. Remove the parameter and let the receiver invoke the method.
7. Move Field: A field is, or will be, used by another class more than the class on which it is defined. Move the field to the class that uses it most.

8. Hide Method: A method is not used by any other class. Make the method private.
9. Replace Conditional with Polymorphism: A code fragment has a conditional that chooses different behavior depending on the type of an object. Move each leg of the conditional to an overriding method in a subclass. Make the original method abstract.
10. Substitute Algorithm: An algorithm is to be replaced with one that is clearer. Replace the body of the method with the new algorithm.
11. Collapse Hierarchy: A superclass and subclass are not very different. Merge them together.
12. More Refactoring Techniques: There exist many more refactoring techniques, such as Extract Subclass, Extract Super Class, Push Down Field, Pull Up Field, Hide Delegate, Encapsulate Downcast, Rename Method, Extract Module, Encapsulate Field, Extract Variable, Extract Interface, Inline Class, Inline Method, Inline Module, Remove Middle Man, Remove Parameter, Introduce Local Extension, Introduce Parameter Object, Introduce Null Object, Introduce Assertion, Introduce Foreign Method, Preserve Whole Object, Replace Array with Object, Replace Type Code with Classes, Replace Type Code with Subclasses, Replace Type Code with Super Classes, Replace Data Value with Object, Replace Delegation With Hierarchy, Replace Delegation With Inheritance, Replace Method with Method Object, Remove Setting Method, Encapsulate Collection, etc.
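As an illustration of the Extract Method refactoring listed above (item 1), the following minimal sketch shows a long method before and after extraction. It is written in Python for brevity, and all function names and data are hypothetical; it is not taken from the evaluated projects.

```python
# Before: one long method mixes the banner, the total calculation and the details.
def print_owing_before(name, orders):
    lines = ["*** Customer Owes ***"]
    outstanding = 0.0
    for amount in orders:
        outstanding += amount
    lines.append(f"name: {name}")
    lines.append(f"amount: {outstanding}")
    return lines


# After: each clump of code becomes a small method with a descriptive name.
def banner():
    return ["*** Customer Owes ***"]


def outstanding_amount(orders):
    # The data the fragment needs is passed in as a parameter.
    return float(sum(orders))


def details(name, outstanding):
    return [f"name: {name}", f"amount: {outstanding}"]


def print_owing_after(name, orders):
    # The original method now only calls the extracted methods.
    return banner() + details(name, outstanding_amount(orders))
```

Both versions return the same lines for the same input, which is the behavior-preservation property that a refactoring is expected to guarantee.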

Among the various types of refactorings stated above, Move Method is one of the most significant techniques. It is directly associated with the software design features of coupling and cohesion. In fact, the technique assists developers in their maintenance activities by optimizing coupling and cohesion. Therefore, the thesis is based on this refactoring technique to enhance software quality.

Why Should We Do Refactoring?

As stated earlier, although refactoring does not add features or functionality to a software system, it is a sharp weapon for developers in their maintenance activities. To understand refactoring related activities, it is necessary to learn the importance of refactoring. By changing the internal structure of a software system, refactoring makes the system easier to understand and cheaper to modify without changing its observable behavior. The purposes of refactoring are described below [3].

1. Refactoring Improves the Design of Software

Without refactoring, the design of the program will decay. A unit of code (in most cases, a module) is decayed if it is harder to change than it should be, measured in terms of effort, time interval and quality [26]. As people change code frequently, with or without a full comprehension of its design, the code loses its structure. Therefore, it becomes harder to find the design by reading the code. The harder it is to see the design in the code, the harder it is to preserve it, and the more rapidly it decays. Regular refactoring helps code retain its shape, improves design quality [27, 28] and eventually eases maintenance tasks.

2. Refactoring Makes Software Easier to Understand

The primary business of software is no longer new development; instead it is maintenance [29], and a good understanding of the software system is needed to reduce the cost of maintaining it. Software understanding tasks represent 50% to 90% of the maintenance effort [30]. However, the trouble is that when developers are trying to get the program to work, they are not thinking about the future developer. It takes a change of rhythm to make changes that make the code easier to understand. Refactoring helps developers make the code more readable. When refactoring, developers have code that works but is not ideally structured. A little time spent refactoring can make the code better communicate its purpose. Programming in this mode is all about saying exactly what the coder means. Therefore, refactoring increases the understandability of the application [31].

3. Refactoring Helps in Finding Bugs

During the maintenance phase of the SDLC (Software Development Life Cycle), the developers’ task is to find bugs and fix them. However, locating a bug at its exact position in the source code is a very expensive and time consuming process. Its effectiveness depends on the developers’ understanding of the program being debugged [32]. Therefore, refactoring is an important tool for developers: it enhances the comprehensibility of the code and thus makes it easier to find bugs with less time and effort [33]. In fact, refactoring helps developers be much more effective at writing robust code.

4. Refactoring Helps Programming Faster

A good design is essential to maintain speed in software development. Without a good design, progress is quick for a while, but soon the poor design starts to slow development down. Developers spend time finding and fixing bugs instead of adding new functionality. Changes take longer as they try to understand the system and find the duplicate code. Refactoring helps to develop software more rapidly [34], because it stops the design of the system from decaying. In fact, it improves design and readability, reduces bugs and, as a whole, improves software quality.


When Should We Do Refactoring?

Refactoring is a very important task for developers in order to reduce maintenance cost. Therefore, it is essential to refactor when the cost of refactoring is less than the cost of not refactoring during the maintenance phase of the SDLC. In the maintenance phase, software engineers perform three types of change activities [26, 35]:

1. Adaptive Changes: To adapt the system to new technologies.
2. Corrective Changes: To fix bugs in the system.
3. Perfective Changes: To add new functionality as well as to enhance the code quality of the system.

In order to carry out these activities at lower cost and effort, refactoring is an indispensable tool for software engineers. The code should therefore be refactored at the time of the following three activities [3].

1. Refactor When Adding a Function

The most common time to refactor code is when a new feature is added to the software system. The first reason to refactor here is to enhance the understandability of the code that needs to be modified. The second is to improve design quality so that the software can adapt to the addition of new functionality or technologies in an easier, faster and smoother way.

2. Refactor When Fixing a Bug

For detecting and fixing a bug, it is very important to know and understand the code. Finding bugs usually takes much more time than fixing them. Refactoring helps considerably in this case: by making the system more understandable, it allows the location of a bug to be detected with less time and effort.


3. Refactor When Reviewing Code

Code reviews help spread knowledge through a development team and are also very important for writing clear code. Refactoring helps software engineers judge whether the suggested changes can easily be made in the system; if so, the changes are made. The suggestions of reviewers therefore play a significant role in refactoring the code during the review process.

Software systems need to undergo modifications, improvements and enhancements in order to cope with evolving requirements, and this maintenance can adversely affect their quality. Refactoring is one of the most important and commonly used techniques for improving the quality of software [36]. It is a process that can make object-oriented (OO) code simpler and easier to maintain. Refactoring that is not done properly can set software engineers back days, even weeks, and it becomes risky when practiced informally or ad hoc, which is like digging one’s own grave. Refactoring should therefore be done in a disciplined way, since it has two main benefits for software engineers. (1) Maintainability: It is easier to fix bugs because the source code is easy to read and its intent is easy to grasp [37]. (2) Extensibility: It is easier to extend the functionality of a refactored system if it uses recognizable design patterns, and refactoring provides some flexibility [38].

To enhance software quality and make maintenance tasks easier, Move Method refactoring plays a more significant role than other refactorings. It is one of the most important refactoring techniques and is used to decrease coupling and increase cohesion in order to achieve good software design. That is why the topic of this thesis is the move method refactoring technique for optimizing the coupling and cohesion of a system.

2.1.1.2 Code Smells

“A code smell is a design that duplicates, complicates, bloats or tightly couples code.” – Martin Fowler [3].


Undesired design flaws, known as code smells (or smells or bad smells), are widely considered indicators of a decrease in software quality [39]. In 1999, Martin Fowler described a set of common symptoms in code that threaten software quality and used the term ‘Code Smell’ to denote them [3]. According to his definition, code smells are surface indications of bad design practices by developers that usually correspond to a deeper problem in the system. Code smells are usually not bugs: they are not technically incorrect and do not prevent the program from functioning. They are structural characteristics of software that may indicate a code or design problem that makes the software difficult to evolve and maintain [40]. They exist in the source code due to poor design, which makes the system difficult to maintain, so that maintenance becomes a cost intensive activity. Although code smells do not interfere with the functionality, accuracy or performance of software, they make code difficult to understand [4]. For instance, a long method with thousands of statements might provide accurate outputs in real time, but it certainly is not easy for a developer to understand. In the same way, a system with high coupling and low cohesion is difficult to update or to adapt to changes. Therefore, code smells do not obstruct the users of software, but they are big headaches for developers.

Types of Code Smells

Feature envy is the name of a code smell that is related to method misplacement in an application. It is a threat to software quality, as method misplacement makes a software application complex in terms of coupling and cohesion. According to the definition given by Martin Fowler, “Feature envy code smell occurs when a method is more interested in a class other than the one it actually is in” [3].


Feature envy is one of the 22 types of code smells defined by Martin Fowler [3]. The most common of the other code smells are described below.

1. Duplicated Code: This code smell occurs when the same code structure is implemented in more than one place in an application. In other words, duplicated code is the result of copy-paste programming. Code duplication creates confusion while performing maintenance tasks and hence should be removed or unified. The simplest duplicated code problem is the same expression appearing in two methods of the same class.

2. Long Method: A Long Method, or brain method, is a method that takes on too many responsibilities instead of one. It is large in size and highly complex. Generally, this kind of method does more than its name suggests.

3. Large Class or God Class: A God Class is a class that has become extremely large in size, controls a lot of other classes and performs too many tasks. A God Class is very hard to understand or maintain because of its size and complexity. Such classes do not follow the good practice of divide-and-conquer, which consists of decomposing a complex problem into smaller problems. Therefore, these classes also have low cohesion.

4. Long Parameter List: A method that takes too many parameters every time it is called is said to have the Long Parameter List code smell. Generally, this smell is considered present when the method has more than four parameters. It reduces the understandability and maintainability of the code.

5. Divergent Change: If a developer has to perform a series of changes in many places to implement a change, there is a severe violation of encapsulation, and this situation is known as the Divergent Change smell. Basically, Divergent Change occurs when many changes are made to a single class in different ways for different reasons.

6. Shotgun Surgery: A class is affected by the Shotgun Surgery bad smell when a change to this class (i.e., to one of its fields or methods) triggers many little changes to several other classes. This code smell makes maintenance tasks difficult.

7. Switch Statements: A long conditional statement indicates that code is more structured than object oriented. If conditional switch statements are used to select behavior instead of relying on object types, it is considered a code smell.

8. Parallel Inheritance Hierarchies: When a developer implements multiple hierarchies for a single structure, it is considered a design problem. This situation is the result of unplanned programming.

9. Speculative Generality: When code is implemented at an unnecessarily granular level, it is called the Speculative Generality code smell.

10. Message Chains: When sending a message from one method to another requires going through a chain of other methods, the situation is considered a Message Chains code smell.

11. Refused Bequest: Sometimes a class is forced to extend a superclass just for ease of implementation. This improper use of inheritance is considered a Refused Bequest code smell.

12. More Code Smells: The other code smells are Lazy Class, Data Clumps, Primitive Obsession, Comments, Inappropriate Intimacy, Data Class, Temporary Field, Middle Man, Alternative Classes with Different Interfaces and Incomplete Library Class. Besides the code smells defined by Martin Fowler, there exist some other code smells [41]. These are Blob code smell, Conditional Complexity, Combinatorial Explosion, Inconsistent Names, Indecent Exposure, Oddball Solution, Solution Sprawl, Type Embedded in Name, Uncommunicative Name, etc.

The existence of these code smells in a software system indicates that there are issues with code quality, such as understandability and changeability, and hence that the code needs to be refactored. Feature envy is one of the most significant code smells and is directly correlated with coupling and cohesion. The aim of the thesis is to identify and remove this code smell through move method refactoring. The characteristics of the feature envy smell and its refactoring technique are described briefly in the next section.

2.1.2 Move Method Refactorings

Placement of methods within classes is one of the most important design activities for any object oriented application to optimize software modularization. To optimize interactions among modularized components, Move Method refactoring plays a significant role by grouping similar behaviors (or methods). It is also used as a refactoring technique for the feature envy code smell, moving methods from incorrect classes into correct ones. It is used in the following situation: a method is, or will be, using or used by more features of another class than the class on which it is defined [3].

Application of Move Method Refactorings

Move Method is a significant refactoring for improving software quality in terms of coupling and cohesion. It should be applied to a software system in the following cases.

• Existence of Feature Envy Code Smells: As stated earlier, the move method refactoring technique is used to eradicate this code smell from an application. The smell arises in an application when developers violate the principle of grouping behavior with related data, which makes the application highly coupled and loosely cohesive. When a method makes too many calls to other classes to obtain the data or functionality it needs to accomplish its task, feature envy is present. That is, the method envies the features of other classes rather than those of its current class. Therefore, the method should be placed in the envied class (the class whose data the method uses most).

• Application Having High Coupling and Low Cohesion: Loose coupling and high cohesion make an application easier to maintain. The refactoring technique plays a significant role in optimizing coupling and cohesion. Coupling and cohesion are described in Section 2.2.

• Ensuring the Single Responsibility Principle (SRP): According to the SRP design principle, a component or class stands for a single responsibility, which makes the application more maintainable.
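The idea of an "envied class" can be sketched as a simple count of which class a method accesses most. The snippet below is only a hypothetical, count-based illustration of the definition above and is not the C3 recommendation approach proposed in this thesis; the function name and input format are assumptions.

```python
from collections import Counter


def envied_class(own_class, accessed_members):
    """Return the class whose members the method uses most, or None.

    accessed_members: list of (class_name, member_name) pairs, one per
    access made by the method (duplicates count as repeated accesses).
    """
    counts = Counter(cls for cls, _ in accessed_members)
    if not counts:
        return None
    target, target_count = counts.most_common(1)[0]
    # Feature envy is flagged only when another class is used strictly
    # more often than the method's own class.
    if target != own_class and target_count > counts.get(own_class, 0):
        return target
    return None


accesses = [("Car", "engine"), ("Car", "start"), ("Car", "fuel"), ("Driver", "name")]
print(envied_class("Driver", accesses))  # Car
```

A method that uses its own class at least as often as any other class yields `None`, meaning no move would be suggested by this simplistic rule.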

Example of Move Method Refactoring

The move method refactoring implies that a method placed in an inappropriate class should be moved to a more appropriate one. Figure 2.2 shows the refactoring process as an example.

(a) Before Move Method

(b) After Move Method

Figure 2.2: Example of Move Method Refactoring

Suppose that the Drive() method of the DRIVER class uses more features of the CAR class than of its current class, as shown in Figure 2.2(a). As a result, the module is tightly coupled and loosely cohesive. The method should therefore be moved from the DRIVER class into the CAR class, resulting in low coupling and high cohesion, as shown in Figure 2.2(b).

It is recommended to move a method to the class that contains most of the data used by the method. This makes classes more internally coherent. A method can also be moved in order to reduce or eliminate the dependency of the class calling the method on the class in which it is located. This can be useful if the calling class is already dependent on the class to which the method is to be moved, and it reduces the dependency, or coupling, between classes. Move method refactoring is thus a fundamental support for improving the cohesion of a class and reducing coupling between classes.

Feature envy is a violation of the object oriented design principle of grouping behavior with related data, a principle that is based on the Single Responsibility Principle (SRP). In addition, feature envy increases coupling and decreases cohesion of the system, and hence increases maintenance cost, time and effort. This code smell should therefore be removed from the system to improve software quality. The most important cure for this code smell is the Move Method refactoring technique [3].
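The Driver and Car example of Figure 2.2 can be sketched in code as follows. The attributes (`fuel`, `speed`) and the body of `drive` are hypothetical, since the figure itself is not reproduced here; the `Before` suffix is used only to keep both versions in one listing.

```python
# Before: drive() lives in the driver class but works almost entirely with the car's data.
class CarBefore:
    def __init__(self, fuel, speed):
        self.fuel = fuel
        self.speed = speed


class DriverBefore:
    def __init__(self, name, car):
        self.name = name
        self.car = car

    def drive(self, hours):
        # Every value used here belongs to the car, not the driver.
        distance = self.car.speed * hours
        self.car.fuel -= distance / 10
        return distance


# After: drive() is moved to Car; Driver only delegates to it.
class Car:
    def __init__(self, fuel, speed):
        self.fuel = fuel
        self.speed = speed

    def drive(self, hours):
        distance = self.speed * hours
        self.fuel -= distance / 10
        return distance


class Driver:
    def __init__(self, name, car):
        self.name = name
        self.car = car

    def drive(self, hours):
        return self.car.drive(hours)
```

After the move, `Driver` no longer reads or writes the fields of `Car` directly, so the coupling between the two classes is reduced to a single method call, while `Car.drive` uses only its own data.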

2.2 Design Quality: Coupling, Cohesion and Single Responsibility Principle

Software metrics (e.g., coupling and cohesion) play an important role in determining the design quality of software [4, 42, 43]. These metrics are considered to be the most important attributes in any object oriented environment [7, 44, 21], and they are the two key metrics directly correlated with the feature envy code smell [1]. The Single Responsibility Principle (SRP) plays a significant role in optimizing these metrics in an application. These topics are described in the following subsections.

Figure 2.3: Coupling and Cohesion

2.2.1 Coupling

Coupling was first defined in the realm of procedure-oriented systems [45]. Stevens et al. define coupling as “the measure of the strength of association established by a connection of one module to another. Strong coupling complicates a system, since a module is harder to understand, change, or correct by itself if it is highly interrelated with other modules. Complexity can be reduced by designing systems with the weakest possible coupling between modules”.

For any object oriented (OO) application, method invocations and attribute usage bind classes together. By calling a method, its implementation is imported into the calling method’s scenario. Such dependencies are the main transport routes for ripple effects during local alterations. The value of this type of coupling depends upon the number of distinct methods called and the calling frequency [1].

The coupling of a module refers to the degree to which the module is related to other modules [46, 47]. Modules with low coupling make an application easier to maintain as well as more reusable. Methods are coupled by invoking each other or by sharing data, and thus may have an interaction relationship with each other. When methods use the methods and attributes of other classes to accomplish their tasks, the system is regarded as highly coupled (coupling is represented in Figure 2.3 by dotted lines). Besides methods, object classes also have to be analyzed in terms of their relationships with each other and thus in terms of coupling properties. Object classes may have component relationships and inheritance relationships with each other, in addition to interaction relationships.

The coupling metric shows the relationship between modules in an object oriented application. A class is coupled to another class if it calls methods of that class. Since coupling introduces inter-dependencies among classes, coupled systems are complex, less maintainable and have reduced potential for reuse [48, 22].
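The two quantities mentioned above, the number of distinct methods called and the calling frequency, can be counted with a few lines of code. The sketch below is illustrative only: the input format and the function name `coupling_profile` are assumptions, and it is not the coupling metric defined later in this thesis.

```python
from collections import Counter


def coupling_profile(calls, own_class):
    """Summarise the external calls made by a method.

    calls: list of (class_name, method_name) pairs, one per invocation.
    Returns (number of distinct external methods, total external calls).
    """
    external = Counter(call for call in calls if call[0] != own_class)
    return len(external), sum(external.values())


calls = [("Car", "start"), ("Car", "start"), ("Car", "refuel"), ("Driver", "rest")]
print(coupling_profile(calls, "Driver"))  # (2, 3)
```

Here the method in `Driver` calls two distinct methods of `Car` three times in total, while the call to its own class does not contribute to coupling.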

2.2.2 Cohesion

Cohesion is an important attribute corresponding to the quality of the abstraction captured by the class under consideration. Good abstractions typically exhibit high cohesion. Cohesion has been defined as “the degree of connectivity among the elements of a single module” [45]. For any object oriented (OO) application, when methods share common attribute usages, they are similar with respect to internal data usage and therefore belong together in a single module or class [1]. These methods can also depend upon each other through method invocation. In other words, the cohesion of a module refers to the relatedness of the module’s components.

A module that has high cohesion performs one basic function and cannot easily be split into separate modules. Highly cohesive modules are more understandable, modifiable and maintainable [46, 49]. Furthermore, modules with strong cohesion, in particular functional cohesion, greatly improve the possibility of reuse. A module has strong cohesion if it represents exactly one task of the problem domain and all its elements contribute to this single task. The elements of a module are statements, sub-functions and possibly other modules. More specifically, the object-oriented counterparts of a module are methods and classes. The elements of a method are statements, local variables and also instance variables, since the latter are accessed either directly or via access functions in the methods. A method should perform its task through method calls and attribute usages of its present class; this is referred to as cohesion (shown in Figure 2.3 by solid lines). Besides methods, object classes also have to be analyzed. The elements of an object class are methods and instance variables. Thus the cohesion of a method can be distinguished from the cohesion of an object class.

Highly cohesive modules in an application should stay as they are, because high cohesion indicates a good and reusable class division. Lack of cohesion increases complexity, and complex development is more error prone in software development and maintenance. To improve the design, good practice is to subdivide low cohesion classes in order to increase cohesiveness [48].
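The observation that methods sharing attribute usages belong together suggests a simple way to quantify cohesion. The sketch below computes the fraction of method pairs in a class that share at least one attribute; this pairwise measure is an illustrative stand-in chosen for brevity, not the cohesion metric used by the proposed approach, and the example class data are hypothetical.

```python
from itertools import combinations


def shared_attribute_ratio(attribute_usage):
    """Fraction of method pairs in a class that share at least one attribute.

    attribute_usage: dict mapping method name -> set of attribute names used.
    Returns 1.0 for classes with fewer than two methods.
    """
    pairs = list(combinations(attribute_usage.values(), 2))
    if not pairs:
        return 1.0
    sharing = sum(1 for a, b in pairs if a & b)
    return sharing / len(pairs)


usage = {
    "area": {"width", "height"},
    "perimeter": {"width", "height"},
    "draw": {"canvas"},
}
print(shared_attribute_ratio(usage))  # one of three pairs shares attributes
```

In this example only `area` and `perimeter` share attributes, so the ratio is one third; moving `draw` elsewhere would raise it to 1.0.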

2.2.3 Single Responsibility Principle

Five design principles are together called SOLID: S for the Single Responsibility Principle (SRP), O for the Open/Closed Principle (OCP), L for the Liskov Substitution Principle (LSP), I for the Interface Segregation Principle (ISP) and D for the Dependency-Inversion Principle (DIP). The Single Responsibility Principle (SRP) is one of the simplest of the principles but one of the most difficult to get right. Conjoining responsibilities is something that is done naturally. Finding and separating those responsibilities is much of what software design is really about. The single responsibility principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. Robert C. Martin expresses the principle as follows [21]: “A class should have only one reason to change.” In the context of the SRP, a responsibility is defined as a reason for change. Consequently, most changes will affect a small proportion of classes. If one can think of more than one motive for changing a class, then that class has more than one responsibility. Therefore, refactoring should be performed so that this unanticipated change becomes an anticipated change. The principle is shown in Figure 2.4.

(a) Violation of SRP (More than one responsibility)

(b) Preservation of SRP (Separated responsibilities)

Figure 2.4: Example of Single Responsibility Principle (SRP)


Class Rectangle may be forced to change by two different, unrelated sources, as shown in Figure 2.4 (a). One is the Computational Geometry Application (CGA), which only calculates the area of a rectangle with the help of the Rectangle class (more specifically, using the area() method of the Rectangle class) and never draws the rectangle on the screen. The other is the Graphical Application (GA), which does some computational geometry but definitely draws the rectangle on the screen with the help of the Rectangle class (more specifically, using the draw() method of the Rectangle class). This design violates the SRP and might cause severe problems in the system, because the Rectangle class has two responsibilities. The first responsibility is to provide a mathematical model of the geometry of a rectangle. The second responsibility is to render the rectangle on a GUI. As a result, a change from either of the two sources causes the other application to be recompiled. If one forgets to do so, that application might break in unpredictable ways.

A better design is to separate the two responsibilities into two completely different classes, Geometric Rectangle and Graphic Rectangle, as shown in Figure 2.4 (b). Class Computational Geometry Application is no longer dependent on the graphical side of the class Rectangle, and thus it becomes independent of class Graphical Application. Any change caused by the graphical application no longer requires Computational Geometry Application to be recompiled. However, any changes from the Computational Geometry Application side may cause Graphical Application to be recompiled. Therefore, this design preserves the SRP, and changes affect only a small portion of the application.
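The separated design of Figure 2.4 (b) can be sketched as two small classes. The constructor arguments, the composition of `GraphicRectangle` over `GeometricRectangle`, and the text-based rendering are assumptions made for this illustration; the figure only fixes the class names and the `area()` and `draw()` responsibilities.

```python
class GeometricRectangle:
    """Mathematical model of a rectangle; changes only for geometry reasons."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class GraphicRectangle:
    """Rendering of a rectangle; changes only for GUI reasons."""

    def __init__(self, geometry):
        self.geometry = geometry

    def draw(self):
        # A text rendering stands in for drawing on a real GUI.
        row = "#" * self.geometry.width
        return "\n".join(row for _ in range(self.geometry.height))


rectangle = GeometricRectangle(3, 2)
print(rectangle.area())                    # 6
print(GraphicRectangle(rectangle).draw())  # two rows of "###"
```

A geometry-only client uses `GeometricRectangle` and never touches drawing code, so a change to rendering cannot force a change on it.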

2.3 Contextual Information (CI)

Contextual Information (CI) refers to information that is based on contexts. This information is generally associated with query processing in Information Retrieval (IR) techniques [50, 51]. The aim of CI is to acquire relevant documents based on query contexts. IR techniques such as cosine similarity, which is commonly used in recommender systems [52], rely on term frequency (tf) and inverse document frequency (idf) to identify similar contexts. The application of CI in the field of software refactoring, however, adds a new dimension. The terminology of CI relevant to this thesis is covered in this section.

Term Frequency (TF): Term Frequency (TF) refers to the number of occurrences of a term t (a unique word) in a document d. It is denoted by $TF_{t,d}$. For example, if a document d has 1000 terms in total and the term t = ‘football’ occurs 50 times in d, then $TF_{t,d} = 50$. TF is thus a property of a document. However, the actual TF is obtained by logarithmic normalization.

Document Frequency (DF): Document Frequency (DF) refers to the number of documents that contain a term t. It is denoted by $DF_t$. For example, if a collection has N = 100 documents and the term t = ‘football’ appears in 10 of them, then $DF_t = 10$ for the collection. DF is thus a global property of a term in the collection of documents. However, the actual DF is obtained by logarithmic normalization.

Inverse Document Frequency (IDF): Inverse Document Frequency (IDF) is used to discriminate between documents based on the uniqueness of a term. It is also a global property of a term in the collection of documents, and is defined as follows.

$$IDF_t = \log_2 (N / DF_t)$$

TF-IDF: TF-IDF combines the definitions of term frequency (TF) and inverse document frequency (IDF) to produce a composite weight for each term in each document. The TF-IDF weighting scheme assigns to term t a weight in document d given by:

$$TF\text{-}IDF_{t,d} = TF_{t,d} \times IDF_t$$
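As a worked example, the ‘football’ figures used above (a term count of 50, N = 100 documents, and a document frequency of 10) can be plugged into the two formulas. The sketch below assumes the raw term count is used as TF, without the logarithmic normalization mentioned earlier; the function names are illustrative only.

```python
import math


def idf(num_documents: int, document_frequency: int) -> float:
    """IDF_t = log2(N / DF_t)."""
    return math.log2(num_documents / document_frequency)


def tf_idf(term_frequency: int, num_documents: int, document_frequency: int) -> float:
    """TF-IDF_{t,d} = TF_{t,d} * IDF_t, using the raw term count as TF (assumption)."""
    return term_frequency * idf(num_documents, document_frequency)


# Values taken from the 'football' example in the text.
football_idf = idf(num_documents=100, document_frequency=10)
football_weight = tf_idf(term_frequency=50, num_documents=100, document_frequency=10)
print(round(football_idf, 3), round(football_weight, 1))
```

With these inputs the IDF is log2(10), roughly 3.32, and the resulting weight is roughly 166.1.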

Document As Vector: In the context of IR, each document is viewed as a vector with one component corresponding to each term in the dictionary (collection of terms), together with a weight for each component that is given by TF-IDF. For dictionary terms that do not occur in a document, this weight is zero. This vector form proves to be crucial to scoring and ranking the documents.

Vector Space Model (VSM): The representation of a set of documents as vectors in a common vector space is known as the vector space model (VSM). It is fundamental to a host of information retrieval operations, ranging from scoring documents on a query to document classification and document clustering. To compensate for the effect of document length, the standard way of quantifying the similarity between two documents d1 and d2 is to compute the cosine similarity of their vector representations $\vec{V}(d_1)$ and $\vec{V}(d_2)$:

$$Similarity(d_1, d_2) = \frac{\vec{V}(d_1) \cdot \vec{V}(d_2)}{|\vec{V}(d_1)| \, |\vec{V}(d_2)|}$$

Here, the numerator is the dot product (also known as the inner product) of the two vectors, while the denominator is the product of their Euclidean lengths. Figure 2.5 illustrates cosine similarity.

Figure 2.5: Illustration of Cosine Similarity ($Similarity(d_1, d_2) = \cos\theta$)
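The cosine similarity formula can be sketched as follows. Documents are represented here as sparse dictionaries that map terms to weights (for instance TF-IDF weights); this representation, the example vectors and the decision to return 0.0 for an empty vector are assumptions of the sketch.

```python
import math
from typing import Dict


def cosine_similarity(v1: Dict[str, float], v2: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v2.get(term, 0.0) for term, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # convention chosen for this sketch: an empty vector matches nothing
    return dot / (norm1 * norm2)


d1 = {"move": 1.0, "method": 2.0}
d2 = {"move": 2.0, "method": 4.0}  # same direction as d1, twice the length
d3 = {"class": 3.0}                # shares no term with d1
```

Because the score depends only on the angle, `d1` and `d2` are treated as maximally similar despite their different lengths, while `d1` and `d3` have a similarity of zero.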

Tokenization: Given a character sequence and a defined document unit, tokenization is the process of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. Here is an example of tokenization:

Input: Friends, Romans, Countrymen, lend me your ears;
Output: Friends Romans Countrymen lend me your ears

These tokens of a document are grouped together in order to get semantic information of the document.
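A minimal tokenizer that reproduces the example above might look like this. The regular expression, which keeps runs of letters and digits and discards everything else, is one simple choice among many and is an assumption of this sketch.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and drop punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


tokens = tokenize("Friends, Romans, Countrymen, lend me your ears;")
print(" ".join(tokens))
```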

Bag of Words Model: The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a document is represented as the bag containing its words, disregarding grammar and even word sequence but keeping multiplicity. The bag-of-words model has also been used for computing tf-idf scores for its unique words. It is commonly used in methods of document classification, where the (frequency of) occurrence of each word is used as a feature for training a classifier.

Inclusion of Contextual Information in the refactoring process is significant because each component of a software application is based on a context. If components are thought of as documents containing terms, it becomes possible to acquire the contextual factors of each component. The contexts can be obtained by using various IR techniques, such as TF, IDF, VSM and cosine similarity.
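To make the idea of a component as a document more concrete, the sketch below builds a bag of words from the identifiers of a class. The identifier-splitting rule (camelCase and snake_case) and the sample identifiers are hypothetical and are not taken from any of the tools discussed in this thesis.

```python
import re
from collections import Counter
from typing import Iterable


def split_identifier(identifier: str) -> list:
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", identifier)
    return [part.lower() for part in parts]


def bag_of_words(identifiers: Iterable[str]) -> Counter:
    """Count term occurrences, ignoring order but keeping multiplicity."""
    bag = Counter()
    for identifier in identifiers:
        bag.update(split_identifier(identifier))
    return bag


# Hypothetical identifiers taken from one class, used as its "document".
rectangle_bag = bag_of_words(["drawRectangle", "rectangleArea", "area_width"])
```

The resulting counts can serve as the raw term frequencies from which TF-IDF weights and cosine similarities between components are computed.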

2.4 Summary

This chapter has discussed code smells and refactorings. Code smells make an application harder to maintain and hence should be removed through appropriate refactoring techniques. Move Method is one of the significant refactoring techniques for improving a poor design in terms of coupling, cohesion and SRP. This thesis therefore provides an approach based on this refactoring technique to enhance design quality. The concepts and terminology described in this chapter help in understanding the proposed approach.

CHAPTER 3

LITERATURE REVIEW OF RECOMMENDING MOVE METHOD REFACTORINGS

Research on recommending Move Method refactorings intends to find a way of suggesting that a method be moved from an inappropriate class into a more appropriate class of an application. It is an important field of research, as manually identifying suitable classes for misplaced methods is a tedious task for developers. Method placement depends on the developers' design perspective and intuition, and hence it is difficult for an automated system to capture. Moreover, misplacement of methods causes feature envy code smell (a design problem), high coupling and low cohesion in the application. Move method refactoring helps to achieve the expected design features, i.e., low coupling and high cohesion.

In fact, there is a relationship between Move Method refactoring recommendation and feature envy code smell detection. The refactoring technique is used to eradicate the code smell and, along with it, to improve software design quality, i.e., to achieve low coupling and high cohesion in the application. However, most of the literature has used the refactoring technique to remove the code smell automatically from the application. An overview of the literature in this research area is shown in Figure 3.1.

Figure 3.1: Overview of Move Method Refactorings Recommendation Literature

In the figure, three types of contributions are seen in research related to the recommendation of move method refactorings:
• Recommendation of Move Method Refactorings to Remove Feature Envy Code Smell: Recommends a method to be moved into a more suitable class of an application in order to refactor feature envy code smell.
• Recommendation of Move Method Refactorings: Recommends a method to be moved into a more suitable class of an application in order to optimize coupling and cohesion.
• Detection of Feature Envy Code Smell: Detects the existence of the feature envy code smell.

The above dimensions of research use structural information of source code in the code smell detection and refactoring field. However, contextual information of a component (or a class) is another dimension in the field of refactorings to optimize coupling and cohesion. In this chapter, these literature contributions are discussed.

3.1 Move Method Refactorings and Feature Envy Code Smells

Move Method is a refactoring technique used to remove feature envy code smell in order to optimize coupling and cohesion. Various tools and approaches that have been developed over the years are discussed in this section.

3.1.1 Recommendation of Move Method Refactorings to Remove Feature Envy Code Smells

Move Method refactoring plays a significant role in removing feature envy code smell once it has been detected. Several tools and approaches for refactoring this code smell automatically have been proposed in the literature over the years. They are discussed in this section.

JDeodorant Refactoring Plug-in

JDeodorant is a well-known Eclipse IDE (Integrated Development Environment) plug-in for refactoring code smells. It identifies five kinds of code smells, namely Feature Envy, Duplicate Code, Type Checking, Long Method and God Class, and resolves these smells by applying appropriate refactoring techniques [13, 53]. Feature envy is one of the code smells that the tool identifies, and as a refactoring of the smell it recommends appropriate classes for the affected methods. The recommendation of move method refactoring to remove feature envy code smells is based on the notion of the distance between methods and system classes. The distance between a method m and a class C expresses the dissimilarity between the set of entities (method calls and used attributes) accessed by m and the set of entities belonging to C. JDeodorant follows a classical heuristic for detecting feature envy code smell, proposed by M. Fokaefs et al. in 2007 [13]: a feature envy code smell is identified if the distance of a method from a system class is less than the distance of the method from the class that it belongs to. In other words, a method m envies a class C' when m accesses more services from C' than from the class C in which it currently resides. The refactoring is recommended towards the class that has the lowest distance and satisfies certain preconditions, for example that the envied class must not contain a method with the same signature. The dissimilarity between methods and system classes of the application is measured by the Jaccard Distance Coefficient [50].

Equation of Jaccard Distance Coefficient

$$JaccardSimilarity(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3.1}$$

$$JaccardDistance(A, B) = 1 - JaccardSimilarity(A, B) \tag{3.2}$$

Here, A is the set of entities accessed by the method m, which belongs to class C, and B is the set of entities belonging to another class C'. With this technique, however, the tool detects only non-static methods as feature envy code smells and suggests move method refactorings for them, with an average precision of 26.47% for the JHotDraw system [14]. It does not use static entities in its detection and refactoring approach. JDeodorant has also been extended to identify Extract Method [54] and Extract Class refactorings [55].

JMove Refactoring Plug-in

JMove is another Eclipse IDE plug-in used to identify feature envy code smell and refactor it using the move method refactoring technique [14]. The approach first detects methods located in incorrect classes and then suggests moving such methods to more suitable ones. It initially parses the source code of an application and evaluates the dependency set by calculating coupling and cohesion. The dependency set consists of the references to attributes, parameters, return types and method calls established by a given method m located in a class C. After that, it computes two similarity coefficients: (a) the average similarity between the set of dependencies established by m and by the remaining methods in C, and (b) the average similarity between the dependencies established by m and by the methods in another class Ci. If the similarity score measured in step (b) is greater than the similarity score measured in step (a), the technique infers that m is more similar to the methods in Ci than to the methods in its current class C. Therefore, Ci is a candidate class to receive m. JMove uses the Sokal and Sneath 2 coefficient to measure the similarity between the dependency sets of methods [56, 57, 14].

Equation of Sokal and Sneath 2 Similarity Coefficient

$$Similarity(A, B) = \frac{|A \cap B|}{|A \cap B| + 2\,(|A - B| + |B - A|)} \tag{3.3}$$

Here, A is the set of dependencies established by the method m in its own class C, and B is the set of dependencies established by a method of the other class Ci.
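The two set-based coefficients discussed in this section can be sketched as plain set operations. This is an illustration rather than the implementation of either tool: the Sokal and Sneath 2 function uses the standard form of the coefficient, whose numerator is |A ∩ B|, the `envies` helper reduces the JDeodorant heuristic to a single distance comparison without its preconditions, and the treatment of empty sets is an assumption.

```python
from typing import Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| (Equations 3.1 and 3.2)."""
    union = a | b
    if not union:
        return 0.0  # assumption of this sketch: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def sokal_sneath_2(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / (|A ∩ B| + 2 * (|A - B| + |B - A|))."""
    shared = len(a & b)
    denominator = shared + 2 * (len(a - b) + len(b - a))
    return shared / denominator if denominator else 0.0


def envies(accessed: Set[str], own_class: Set[str], other_class: Set[str]) -> bool:
    """Simplified rule: m envies C' if it is closer to C' than to its own class C."""
    return jaccard_distance(accessed, other_class) < jaccard_distance(accessed, own_class)


# Hypothetical entity sets: a method of Invoice that mostly uses Order's members.
accessed = {"Order.total", "Order.items", "Invoice.id"}
invoice_entities = {"Invoice.id", "Invoice.print"}
order_entities = {"Order.total", "Order.items", "Order.add"}
```

In this example the method is at distance 0.75 from its own class and 0.5 from `Order`, so the simplified rule flags `Order` as the envied class. Compared with Jaccard similarity, Sokal and Sneath 2 penalizes non-shared elements twice as heavily.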

With this technique, the tool correctly recommends 15.80 methods on average with a precision of 60.63% for different versions of the JHotDraw source code, whereas JDeodorant detects only 10.40 with 26.47% precision [14]. However, like JDeodorant, the tool does not consider static entities in its refactoring technique, which is a threat to the external validity (generalization) of the approach.

inCode Refactoring Plug-in

inCode is an Eclipse plug-in that is able to identify four kinds of design problems related to an improper distribution of intelligence among classes: God Class, Data Class, Code Duplication and Feature Envy [58]. The feature envy code smells detected by inCode are only static methods. It does not detect non-static methods, which are the most common kind in object oriented systems, as code smells. A method reported by inCode does not manipulate any data of its source class but processes data of other system classes. According to object oriented design heuristics and principles, a method should be placed in the class whose data it manipulates most. This basic heuristic is used in the inCode approach to detect such methods as feature envy code smells [59]. In addition, it provides a move method refactoring recommendation for the affected methods. Because inCode's documentation is not accessible, it is not clear why it detects only static methods as smells rather than non-static ones.

The difference between the JDeodorant and inCode tools is that JDeodorant detects non-static methods as feature envy smells while inCode detects only static ones. In addition, as inCode continuously analyzes four code smells in the background while code is being written in Eclipse, it might make the Eclipse IDE slower. It might also disturb developers, since not all four smells are important to them, yet to check one smell they have to run the plug-in, which continuously shows notifications for all the smells.

MethodBook Refactoring Approach

To detect feature envy code smell and apply the move method refactoring technique, an approach called MethodBook has been proposed by R. Oliveto et al. [15]. This approach uses Facebook as a metaphor for detecting and refactoring the code smell. It identifies the friend methods of the target method by calculating similarity and recommends the appropriate class based on that calculation. In the implementation of MethodBook, methods and classes play a vital role. Method bodies contain information about structural (e.g., method calls) and conceptual (e.g., similar comments) relationships with other methods in the same class and in other classes. A Relational Topic Model (RTM) is used to identify the ‘friends’ of a method based on shared content (variables, parameters, comments and method calls). Methods are called friends when the similarity between them is high. The more relationships a method of a class has with the methods of another class, the higher the probability that move method should be applied.

A well-designed open source system, namely ArgoUML version 1.6, has been used for the preliminary evaluation of the MethodBook approach. It has 1,071 classes and 9,926 methods in total. For this purpose, 1,000 methods have been randomly extracted from the project. A confidence level is used to indicate the reliability of the proposed refactoring. It uses the concept of information entropy, which measures the amount of uncertainty of a discrete random variable. The accuracy of MethodBook has been evaluated using two well-known Information Retrieval (IR) metrics, namely recall and precision. MethodBook is able to identify 40% of the envied classes with a precision of 95%. Additionally, 75% recall is achieved while precision is at 70%. The correlation between the confidence level and MethodBook's precision has also been evaluated; it is 0.97, which indicates MethodBook's accuracy.
In addition, the comparison performed with JDeodorant has shown that MethodBook is generally more precise than JDeodorant, providing fewer suggestions to the developers but of higher average quality. However, the results have also clearly highlighted that JDeodorant is able to identify good refactoring operations, and thus correct instances of feature envy, that are missed by MethodBook [60]. On the other hand, it is difficult for MethodBook to identify the envied class when a method has significant similarities with almost the same number of methods in multiple classes. In that case the technique may give poor results.

Most of the research regarding the Move Method refactoring technique aims to remove feature envy code smell and eventually improve software modularization in terms of coupling and cohesion. However, most of these works have not considered static entities (methods and attributes) in their approaches, and their accuracy varies. Hence, these approaches do not generalize to all entities of the application.

3.1.2 Recommendation of Move Method Refactorings

Move Method is the name of a refactoring technique used to enhance code quality. Although it is generally used to refactor or remove feature envy code smell from an application, it also increases cohesion and decreases coupling, and eventually optimizes the modularity of the application. Several works that aim to find such refactoring opportunities are discussed in this section.

H. Liu et al. proposed an approach to identify move method refactoring opportunities for a group of methods in a single class that have the highest similarity and the strongest relationship among them [16]. For instance, whenever a method m is moved from Cs (source class) to another class Cd (destination class), the approach looks for other methods within the source class that may deserve to be moved to the destination class Cd, based on the strength of the relationship between the moved method and the others. To measure the strength of the relationship between a pair of methods, the approach computes three metrics:
(i) Coupling: The coupling between two methods is computed based on the collection of their shared properties.
(ii) Conceptual Correlation: Each class represents a concept in the real world and thus it should contain only methods that are conceptually related to it. The conceptual correlation between two methods is computed based on the identifiers within those methods, including method names, parameter names, and names of local variables.
(iii) Similarity in Feature Envy: Feature envy code smell is one of the major reasons for moving methods. It is calculated using the Jaccard Distance Coefficient between a method and a class. The lower the distance, the greater the possibility of moving the method to the class.
The rationale for the approach is that if two methods are strongly coupled and closely related in business logic, then when one of them is moved, the other may deserve to be moved as well.

N. Tsantalis et al. have presented a qualitative analysis of the refactoring suggestions implemented in the JDeodorant tool [17]. The proposed approach can be regarded as semi-automatic, since the designer eventually decides whether a suggested refactoring should be applied or not based on conceptual or other design quality criteria. The refactoring process uses a metric based on two principles:
• High Cohesion: The distances of the entities belonging to a class from the class itself should be the smallest possible.
• Low Coupling: The distances of the entities not belonging to a class from that class should be as large as possible.

The research revealed that the move method refactoring approach can be useful in assisting designers to improve design quality in terms of coupling and cohesion.

In another paper, F. Simon et al. provided a visualization approach for move method refactoring using a distance based cohesion metric of a system [18]. The approach follows the design concept that methods with low distances to classes are cohesive, whereas methods with higher distances are less cohesive. Since the approach calculates cohesion between every pair of entities (attributes and methods) of the system, the calculation might be time consuming for large systems. In addition, it provides a visualization of the target entity along with other entities showing the geometric distances, so manual intervention is needed to identify the move method refactoring opportunities.

S. Kimura et al. proposed a technique to identify refactoring candidates by using dynamic source code analysis rather than static analysis [61]. More specifically, the technique analyzes method traces that contain the method invocations made during program execution. It detects irregular methods as candidates for move method based on patterns of method invocations at run time. The researchers build on the observation that the quality of the source code increases when these methods are moved to appropriate classes, as they cooperate with one another during program execution. However, without method traces, that is, without program execution, the technique does not work.

For large software systems comprising thousands of classes or more, computing metrics can be a time consuming task when performed on a single CPU (Central Processing Unit). For this reason, C. Napoli et al. proposed a solution that computes the metrics on a GPU (Graphics Processing Unit), hence greatly shortening computation time [62]. The purpose of the paper is to tackle two issues related to the suggestion of move method refactoring opportunities.

The first issue is automatically identifying move method refactoring suggestions that improve several components in terms of a cohesion metric; the number of ways in which perfective changes can be introduced increases dramatically in large systems. The second issue is that computing metrics should take only a small amount of time even when a software system consists of a large number of methods, attributes and classes. For this purpose, the approach devises a parallel algorithm that runs on a GPU to compute time consuming product metrics, and thus greatly reduces the typical CPU computing time needed, even by a factor of 50. It is stated that a GPU provides hundreds of computing cores, whereas a CPU provides only a few (typically 8). The approach is based on CBO (Coupling Between Objects) and LCOM (Lack of Cohesion in Methods) and aims at improving modularity for large systems.

Research on the recommendation of Move Method refactorings has provided various approaches in the literature. However, all of these works share a common goal, which is improving software design quality in terms of coupling and cohesion.

3.1.3 Detection of Feature Envy Code Smells

Although detecting feature envy code smell alone is less helpful for developers than refactoring it, a few works address detection only.

HIST Approach

HIST (Historical Information for Smell Detection) is an approach proposed in 2013 to detect five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob and Feature Envy, by exploiting change history information mined from versioning systems [19].

Feature envy can be detected relying solely on structural information, and several approaches based on static source code analysis have been proposed to detect the smell, as discussed in the sections above. However, the HIST approach differs from those traditional source code analysis approaches. It is based on change histories from versioning system logs and identifies the methods affected by this smell as those involved in commits with methods of another class of the system more often than in commits with methods of their own class. At the same time, HIST can be compared directly with the existing source code analysis based approaches for detecting feature envy smells, in order to assess to what extent change history data is also of value in detecting these types of smells. The approach has been applied to eight software projects written in Java and, wherever possible, compared with existing state-of-the-art smell detectors based on source code analysis. The results indicate that HIST's precision ranges between 61% and 80%, and its recall ranges between 61% and 100%. More importantly, the results confirm that HIST is able to identify code smells that cannot be identified through approaches based solely on code analysis. However, to apply this approach, a versioning system needs to be available so that the change histories of several versions can be mined. As a result, the approach cannot detect the code smell in a project that has only a single version.

From another point of view, feature envy may manifest itself when a method of a class tends to change more frequently with methods of other classes than with those of its own class. Based on this consideration, the HIST approach was updated in 2015 to detect smells based on change history information mined from versioning systems, specifically by analyzing co-changes occurring between source code artifacts [20].
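The co-change idea described above can be illustrated with a small sketch. It only captures the basic comparison stated in the text (co-changes with another class versus co-changes with the own class) and is not the HIST implementation; the function names, the commit representation as sets of changed method names, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def co_change_counts(method: str, owner: Dict[str, str],
                     commits: Iterable[Set[str]]) -> Counter:
    """Count, per class, how often its methods change together with `method`."""
    counts = Counter()
    for changed in commits:
        if method in changed:
            for other in changed:
                if other != method:
                    counts[owner[other]] += 1
    return counts


def envied_class(method: str, owner: Dict[str, str],
                 commits: Iterable[Set[str]]) -> Optional[str]:
    """Return a class that co-changes with `method` more than its own class does."""
    counts = co_change_counts(method, owner, commits)
    own = counts.pop(owner[method], 0)
    if not counts:
        return None
    candidate, score = max(counts.items(), key=lambda item: item[1])
    return candidate if score > own else None


# Hypothetical change history: each commit is the set of methods it modified.
owner = {"Invoice.print": "Invoice", "Invoice.total": "Invoice",
         "Order.add": "Order", "Order.remove": "Order"}
commits = [{"Invoice.total", "Order.add"},
           {"Invoice.total", "Order.remove"},
           {"Invoice.total", "Invoice.print"},
           {"Order.add", "Order.remove"}]
```

Here `Invoice.total` changes twice with methods of `Order` and only once with a method of its own class, so it would be reported as a feature envy candidate, whereas `Invoice.print` would not.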

Machine Learning Technique

F. A. Fontana et al. used various machine learning algorithms, such as J48, Naive Bayes, JRip and Random Forest, to detect four code smells including Feature Envy, and evaluated their approach on 74 systems [63]. They showed that the J48 and JRip algorithms outperform the others in detecting feature envy code smell. They also concluded that the metrics ATFD (Access to Foreign Data), FDP (Foreign Data Providers), LAA (Locality of Attribute Accesses), NOA (Number of Attributes) and NMO (Number of Methods Overridden) play a significant role in the code smell detection approach.

Visualization Support in the Detection

To detect several code smells such as Feature Envy (FE), God Class (GC) and Divergent Change (DC), G. F. Carneiro et al. presented a multiple views approach that enriches four categories of code views with concern properties, namely: (i) concerns package-class-method structure, (ii) concerns inheritance-wise structure, (iii) concern dependency, and (iv) concern dependency weight [64]. The purpose of each view can be described as follows:
i Concerns package-class-method structure: how a concern is realized through modularity units of a system, such as packages, classes, and methods.
ii Concerns inheritance-wise structure: how a concern is dispersed through one or more inheritance trees.
iii Concern dependency: how a concern affects the relationships among modules.
iv Concern dependency weight: how a concern can be perceived as affecting the weight with which modules are coupled to each other.

In another work, E. Emden et al. presented an approach for the automatic detection and visualization of several code smells including feature envy [39]. The approach performs automatic code inspection, relieving developers of the burden of manual inspection. Automatic inspection and reporting on the code's quality and conformance to coding standards allow early (and repeated) detection of signs of project deterioration. Early feedback enables early corrections, thereby lowering development costs and increasing the chances of success.

However, the feature envy code smell detection approaches do not solve the design problem of coupling and cohesion, as they only detect the code smell. These approaches do not provide any way of removing it.

3.2 Context Based Refactorings

Context is an important factor in Object Oriented Design (OOD) and adds a new dimension to the refactoring research field. The context of a class provides a concept of that class grounded in the SRP (Single Responsibility Principle) and hence a way of measuring the cohesion and coupling of an application. Generally, Information Retrieval (IR) [51] techniques, such as Latent Semantic Indexing (LSI) [65], are used to capture the concept (or context) of a class. Several works on this topic are discussed in this section.

The author of this thesis, M. Rahman et al., proposed an approach based on dependency and contextual information [66]. This is one of the contributions of this thesis. Although the accuracy of the approach is significant, it was evaluated on only two real-life projects (three in total). Therefore, it is necessary to validate the approach with more projects and analysis.

G. Bavota et al. proposed an approach to identify chains of strongly related methods for Extract Class refactoring [67]. The approach analyzes both the structural and semantic similarity of the methods to group highly cohesive methods. The identified method chains are used to define new classes with higher cohesion than the original class. The approach uses the following two steps:

I Method-by-Method Matrix Construction: The likelihood that method mi and method mj should be in the same class can be estimated by capturing different types of relationships between the methods based on a cohesion metric. Class cohesion is affected by several factors, such as attribute references, method calls and semantic content. The likelihood that two methods should be in the same class is obtained by combining three different (structural and semantic) measures:
(a) Structural Similarity between Methods (SSM): SSM captures the attribute references in methods, i.e., the higher the number of instance variables that two methods share, the higher the similarity between the two methods [3].
(b) Call-based Interaction between Methods (CIM): CIM is the number of calls made by method mi to method mj relative to the total number of incoming calls to mj, and is used to calculate the cohesion between the two methods.
(c) Conceptual Similarity between Methods (CSM): CSM is based on the semantic information (i.e., domain semantics) captured in the code by comments and identifiers [68]. Two methods are conceptually related if their (domain) semantics are similar, i.e., they perform conceptually similar actions.
These measures capture three distinct ways in which methods relate to one another, each reflecting a different type of relationship between methods. The measures are stored in an n × n matrix, called the method-by-method matrix, where n is the number of methods in the class to be refactored.

II Method Chains Extraction: Using the information in the method-by-method matrix, the approach extracts chains of strongly related methods. The methods of the original class are distributed among different classes according to the extracted chains. The attributes of the original class are also distributed among the extracted classes according to how they are used by the methods in the new classes. By exploiting the extracted method chains, it is possible to obtain new classes having higher cohesion than the original class.

In another paper, G. Bavota et al. proposed a novel approach based on game theory to support extract class refactoring opportunities [69]. Given a class to be refactored, the approach models a non-cooperative game where two players contend for the methods of the original class to build two new classes with higher cohesion than the original class. This paper calculates the same three measures (SSM, CIM and CSM) as the paper above. Both approaches compute CSM using an advanced Information Retrieval (IR) technique, namely Latent Semantic Indexing (LSI), which is significant in the research area. However, the approach is semi-automated because it takes as input a class previously identified by the software engineer as a candidate for the refactoring.

G. Bavota et al. proposed another novel approach based on Relational Topic Models (RTM) to recommend Move Class refactoring operations, which aim at moving a class to a more suitable package to improve software modularization [70]. The RTM is computed using two factors extracted from the source code:

I Semantic Information: The approach automatically analyzes the underlying latent topics (i.e., semantic information) inferred from identifiers, comments, and string literals in the source code classes.

II Structural Information: Besides semantic information, the approach exploits static analysis to capture the following two types of structural information.

(a) Dependencies among classes: These provide RTM with information concerning the dependencies (i.e., calls) between classes, which is the main information used for software modularization.

(b) Existing package decomposition: The package decomposition is used in the context of a fine-grained re-modularization to take into account the design decisions made by the developers.

These two types of information are used to adjust the probability distribution so that it takes into account structural relationships between classes, besides semantic information. Using the results of the analysis, the approach is able to identify possible move class refactoring opportunities (i.e., more suitable packages for relocating a class under analysis). The integrated analysis of structural and semantic information, as modeled by the approach, allows the quality of software packages to be analyzed both from a conceptual point of view (that is, the responsibilities implemented in classes in different packages) and from a structural point of view (that is, the dependencies among classes in a package and among other packages).

L. Ponisio et al. proposed a group of contextual metrics that assess the cohesion of a package based on the degree to which its classes are used together by common clients [71]. The main idea of the paper is that, if two classes of a package help to fulfill the responsibility of a common client, they are conceptually related, regardless of the explicit relationships that exist between them. The following four kinds of class interactions form the basis of the group of metrics for package cohesion.

I Inheritance: A class is a subclass of another. A subclass inherits behavior


and state from its parent (inherits dependencies).

II State: A class may directly access instance variables inherited from its ancestors (accesses dependencies).

III Class Reference: A class makes an explicit (i.e., static) reference to another, e.g., by instantiating the class (references dependencies). Only static relationships are considered here, not run-time interactions.

IV Messages Sent: Messages sent within a method of a given class cause methods of other classes to be invoked.

Since these different kinds of interaction may indicate different relationships between a package and its clients, the approach considers them both separately and in combination, thus yielding a group of closely related contextual metrics.

M. Gethers et al. proposed a new coupling metric for object-oriented software systems based on Relational Topic Models (RTM), a generative probabilistic model, to capture latent topics in source code classes and the relationships among them [72]. The use of RTM to measure coupling among source code classes is motivated by two facts. Firstly, RTM provides a comprehensive model for describing documents (i.e., classes are represented as words from identifiers and comments). Secondly, RTM models the existence of links between documents based on the underlying textual information and other knowledge of the document network. In the context of the approach, the binary link indicator, which indicates whether a link exists between two documents (i.e., classes), is used as an indicator of coupling between the pair of classes. That is, if the model identifies a link between two classes in the application with a high probability, these classes are considered to be coupled. One main benefit of the relational topic model is that it does not require knowledge of any existing links to make these predictions.


D. Poshyvanyk et al. presented a new approach for measuring coupling in Object-Oriented (OO) software systems based on the conceptual information of classes [68]. The conceptual information is discovered from the identifiers and comments of different classes that are related to each other. This type of relationship, called conceptual coupling, is measured through the use of Information Retrieval (IR) techniques. The researchers claimed that the approach differs from existing coupling measures because traditional approaches rely on structural information whereas this paper relies on conceptual factors.

As described above, the contextual information of a class is significant in refactorings that optimize software modularization. IR techniques add a new dimension for extracting this information in the research field. However, this information has not yet been used in move method refactorings to remove the feature envy code smell.

3.3 Summary

As stated above, the reviewed research addresses the importance of removing feature envy code smells through move method refactorings, as the smell occurs due to the violation of two vital design principles, coupling and cohesion. Although various automated tools have been proposed over the years, these approaches share a common issue: they are concerned only with the structural information of a source application, rather than its contextual information. Moreover, most of these approaches consider only non-static entities (methods and attributes), and hence are not generalized for both static and non-static ones. Therefore, considerably more work is needed in this research field to improve the move method refactoring approach.

CHAPTER 4

RECOMMENDATION OF MOVE METHOD REFACTORINGS

For recommending move method refactorings, the research considers the contextual factors of an application, along with coupling and cohesion. Moreover, it incorporates both static and non-static entities (methods and attributes) by using the class names of method calls (or used attributes) instead of references for the recommendations. Therefore, this thesis proposes a novel approach for recommending move method refactorings based on the C3 factors: coupling, cohesion and contextual information of a software application. The inclusion of the contextual factor using Information Retrieval (IR) techniques (such as TF, IDF and Cosine Similarity) is a new dimension in the refactoring research field. To the best of the researcher's knowledge, no existing research has considered this significant factor to group similar methods into a class. Moreover, the approach is applicable for both static and non-static entities (methods and attributes) of the application.

Method placement is one of the most important design activities in any object oriented application. Recommendation of move method refactorings plays a significant role by grouping methods with similar behavior. It is also used as a refactoring technique for feature envy code smells, moving methods from incorrect classes into correct ones. Due to this code smell, an application becomes tightly coupled and loosely cohesive, reflecting poor design, and hence development and maintenance effort, time and cost increase. In order to reduce coupling among classes and increase cohesion in an application, feature envy code smells should be refactored automatically, which saves time and minimizes cost. However, existing techniques have not considered the contextual information of an application, which is a significant factor for grouping similar methods in a class. In addition, most of the techniques have used only non-static methods for refactoring the code smell, and hence these approaches are not generalized for all types of methods (static and non-static). This thesis proposes an approach named 'Move Method Refactoring Using Coupling, Cohesion and Contextual Similarity' (MMRUC3 framework) for the recommendation of move method refactorings to improve the design quality, not only using coupling and cohesion but also considering contextual factors, for both types of methods. In this chapter, the proposed recommendation approach is discussed.

4.1 Overview of the Recommendation Approach

The overall architecture of the move method refactoring recommendation approach is shown in Figure 4.1. The approach is divided into two parts. In the first part, the feature envy code smells (design problems) to be detected in the software are analyzed. Then, in the second part, based on this analysis, a framework called Move Method Refactoring Using Coupling, Cohesion and Contextual-Similarity (MMRUC3) is devised. MMRUC3 detects the code smell in the software application, and recommends the refactoring technique to optimize software modularization in terms of coupling, cohesion and SRP (Single Responsibility Principle).

Figure 4.1: Architecture of Recommending Move Method Refactoring Approach (MMRUC3 Framework)

The steps of MMRUC3 are:

• Source Code Analysis: In order to group similar methods into a class and recommend move method refactorings, the source code information of an application must be analyzed through parsing. The analyzed information is the basis of the approach and is used to calculate similarities between the target method and classes. Source code parsing can therefore be considered the initial and fundamental step of the approach.

• Coupling & Cohesion based Similarity Calculation: The main task of this step is to calculate the similarity between the target method and classes based on method calls and used attributes, which represent coupling and cohesion.

• Contextual Similarity Calculation: In this third step, another kind of similarity, called contextual similarity, is measured. Contextual similarity means that a method and a class are similar based on their context, since a class's responsibility is what its methods perform.

• Move Method Refactoring Recommendation: In the fourth and final step, the similarity scores of step-II and step-III are combined to represent the total similarity between the target method and a class. These combined scores are then compared between the method's current class and the other classes to recommend the most appropriate class for the method as the move method refactoring.

Note that the MMRUC3 approach measures two types of similarity scores to group similar methods into a class. The similarity measurement steps are the core of the approach. In the following sections, the MMRUC3 recommendation approach is described in detail.

4.2 MMRUC3 Recommendation Framework

The proposed framework (MMRUC3) is used to recommend move method refactorings to identify and remove the feature envy code smell in any object oriented system. In general, it detects methods located in incorrect classes and then suggests moving such methods to more suitable ones by parsing and analyzing the source code information. More specifically, the MMRUC3 technique analyzes source code information in order to calculate the set of static dependencies (dependency set) and capture the contextual information established by a given method m located in class C. The dependency set consists of the references generated by the method m through method calls and used attributes, and the contextual information is obtained from the method's body, excluding the built-in keywords of the Java programming language [73]. After that, two similarity coefficients are computed from the dependency sets and the contextual information:

1. The average similarity between the dependencies and contextual information established by m, and by the remaining methods in C; and

2. The average similarity between the dependencies and contextual information established by m, and by the methods in another class Ci.

If the similarity measured in (2) is greater than the similarity measured in (1), we infer that m is more similar to the methods in Ci than to the methods in its current class C. Therefore, Ci is a candidate class to receive m.

In the MMRUC3 approach, methods implemented in incorrect classes are detected and a suggestion to move such methods to more appropriate classes is provided. The approach has four phases, shown in Figure 4.1 in the previous section:

I. Source code parser,
II. Similarity measurement using coupling and cohesion,
III. Similarity measurement using contextual information, and
IV. Combine similarity scores and compare.

The whole procedure of the MMRUC3 framework is described in the following subsections.

4.2.1 Source Code Analysis

To refactor the feature envy code smell, the methods and attributes in the source code of a software system must be analyzed, since these are the fundamental entities used to measure coupling and cohesion. In addition, textual information from the source code is necessary to capture the context of the methods and classes. By analyzing the source code, factors such as the method calls, used attributes and contextual information of a method are determined in order to calculate the similarity of the method with the other methods of a class. Source code parsing to acquire those factors can therefore be considered the initial and fundamental step of the MMRUC3 framework.

In the first phase of the MMRUC3 approach, information such as the classes, methods and attributes of the source code is analyzed in order to group similar methods into a class and recommend move method refactorings. To analyze this information, two third party parsers are used in this phase:

• ByteParser: A parser included in the approach in order to analyze the input application. ByteParser analyzes the bytecode of Java .class files. It extracts information such as class names, method names, field names and method calls [74].

• JavaParser: Another third party parser for Java applications that captures source code information. It is incorporated into the approach to obtain the contextual factors of the application.

The two parsers play a significant role in analyzing the source code of the system. First of all, the classes of the source code (.java files) are compiled to bytecode (.class files), and the bytecode of each class is converted to text form using the following command:

"javap -c -private ClassName.class > ClassName.txt"

The resulting bytecode listings are then analyzed with ByteParser to obtain the methods and classes of the source application for further analysis. More specifically, ByteParser is used to acquire the method calls and used attributes established by each method of each class. Such method calls and used attributes represent the coupling and cohesion of the method (or, in general, of the class).

Coupling: represents the method calls to, and used attributes of, other classes of the system made by the caller method (a method which invokes other methods) of a class.

Cohesion: represents the method calls to, and used attributes of, the class to which the caller method belongs.

JavaParser, on the other hand, is used to obtain the context of each class from the .java files, so that the MMRUC3 framework is able to identify methods that perform similar responsibilities. The contextual information supports the SRP (Single Responsibility Principle), which states that a class stands for only one responsibility and that the methods inside the class should provide the functionality needed to perform that responsibility. Following the principle therefore also makes the application more cohesive. However, the built-in keywords of the Java programming language do not represent the context of a class. Therefore, the approach excludes these keywords in order to obtain a more accurate context of the class. Although the framework has been developed for Java projects, the approach is applicable to any object-oriented (OO) application.

The framework not only analyzes method calls and used attributes representing coupling and cohesion, but also acquires contextual information through parsing


source application separately. The inclusion of the contextual factor along with coupling and cohesion makes the approach more meaningful than existing approaches, because methods that perform similar tasks should be kept together to make the application highly cohesive and loosely coupled. As a result, the three analyzed C3 factors (Coupling, Cohesion and Contextual information) are the fundamental basis of the MMRUC3 framework and are used to calculate similarities between the target method and classes in the next refactoring recommendation step.

After the parsing step, similarity based on the C3 factors is measured to recommend move method refactorings that remove the feature envy code smell. The remainder of this chapter describes the recommendation algorithm proposed in this thesis (Subsection 4.2.2) and the two similarity calculation functions that play a key role in this algorithm (Section 4.3 and Section 4.4 respectively).
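The dependency extraction from the javap listing described above can be illustrated with a minimal sketch. It is not the ByteParser implementation, whose internals are not described here; it only assumes that each bytecode instruction printed by "javap -c" carries a trailing comment such as "// Method java/io/PrintStream.println:(I)V" or "// Field total:I", and that a reference without an owner prefix belongs to the class being disassembled. The function name dependency_set and the sample listing are hypothetical.

```python
import re

# Assumed shape of the comment javap -c appends to an instruction, e.g.
#   "7: invokevirtual #19  // Method java/io/PrintStream.println:(I)V"
#   "4: getfield      #13  // Field total:I"
REF = re.compile(r"//\s*(InterfaceMethod|Method|Field)\s+(\S+)")


def dependency_set(javap_lines, current_class):
    """Collect the class names referenced by one method's bytecode listing.

    Class names (not reference names) are recorded, so static and
    non-static accesses are treated alike.
    """
    deps = set()
    for line in javap_lines:
        match = REF.search(line)
        if not match:
            continue  # instruction without a method or field reference
        target = match.group(2).split(":", 1)[0]      # drop the type descriptor
        owner, dot, _member = target.rpartition(".")  # "owner.member"
        # Assumption: no owner prefix means the member is in the current class.
        deps.add(owner.replace("/", ".") if dot else current_class)
    return deps


# Hypothetical listing for a method of a class named Order.
sample = [
    "0: getstatic     #7   // Field java/lang/System.out:Ljava/io/PrintStream;",
    "3: aload_0",
    "4: getfield      #13  // Field total:I",
    "7: invokevirtual #19  // Method java/io/PrintStream.println:(I)V",
    "10: invokestatic #25  // Method util/Tax.rate:()D",
]
order_deps = dependency_set(sample, "Order")
```

In this sketch the static call to util.Tax and the instance call on PrintStream both end up in the same set. Whether library classes such as java.lang.System are filtered out is not stated in the text, so the sketch keeps them.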

4.2.2 Move Method Refactorings Recommendation

Measuring similarities between methods and classes is essential for grouping similar behaviors (or methods) through the move method refactoring recommendation approach. After the source code parsing step, the MMRUC3 approach calculates the similarities using the parsed information of the C3 factors: coupling, cohesion and contextual information. The proposed recommendation approach is shown in Algorithm 1.

Algorithm 1 Recommendation of Move Method Refactorings
Input: Target system S
Output: A list of candidate classes
1: for each method m ∈ S do
2:     C ← getClass(m)
3:     T ← null
4:     for each class Ci ∈ S do
5:         if jaccSim(m, Ci) + contextSim(m, Ci) > jaccSim(m, C) + contextSim(m, C) then
6:             T ← T + Ci
7:         end if
8:     end for
9:     C′ ← bestClass(m, T)
10: end for

Assume that S is a system (source application) having a set of classes, and that m is a target method implemented in a class C of the system. For each method m ∈ S (Line 1), the class C that contains m is obtained from the information acquired in the parsing step (Line 2). For each class Ci ∈ S (Line 4), the algorithm determines whether m is more similar to the methods in Ci than to the methods in its original class C (Line 5). To do so, it computes two similarity scores using the C3 factors and combines them. If Ci satisfies the condition of Line 5, that is, Ci is more similar to m than C is, then Ci is a probable candidate class to receive the method m. Such classes are inserted into a list T (Line 6), initialized as null in Line 3, as there can be multiple candidate classes. Finally, the most suitable class C′ to receive m is determined by the function bestClass(m, T) (Line 9). The function receives the target method m and the list T of candidate classes as parameters. It sorts the candidate classes according to their similarity scores and returns the class having the highest similarity score. Thus the algorithm suggests move method refactorings in order to remove the feature envy code smell from the system S and eventually optimize modularization in terms of coupling and cohesion.

Note that the recommendation approach does not follow the traditional move method recommendation approach. The traditional approach recommends moving a method to the class whose entities (methods and attributes) are used most by the method, without regard to similar behavior. The proposed approach, on the other hand, is based on the concept that a method should be placed in the class whose methods use similar entities and also perform similar tasks or responsibilities. The overall flowchart of the recommendation algorithm is shown in Figure 4.2. If the combined similarity score of the method's present class is less than that of one or more other classes, then the technique detects the method as a feature envy code smell and recommends the class having the highest similarity score.
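The control flow of Algorithm 1 can be sketched as follows. This is an illustrative sketch rather than the thesis implementation: the two similarity functions are passed in as parameters, the system is represented as a hypothetical dictionary from class name to method names, and method names are assumed to be unique across the system.

```python
def recommend_moves(system, jacc_sim, context_sim):
    """Return {method: recommended class} for methods that fit another class better.

    system      -- dict mapping class name -> list of method names (assumed shape)
    jacc_sim    -- callable (method, class) -> dependency based similarity
    context_sim -- callable (method, class) -> contextual similarity
    """
    def score(method, cls):
        # Combined similarity: plain addition of the two scores (Line 5).
        return jacc_sim(method, cls) + context_sim(method, cls)

    recommendations = {}
    for current, methods in system.items():
        for method in methods:
            baseline = score(method, current)
            # Candidate list T: classes strictly more similar than the current one.
            candidates = [c for c in system
                          if c != current and score(method, c) > baseline]
            if candidates:
                # bestClass: the candidate with the highest combined score.
                recommendations[method] = max(
                    candidates, key=lambda c: score(method, c))
    return recommendations


# Toy scores, made up purely to exercise the control flow.
toy_scores = {
    ("total", "Order"): 0.8, ("total", "Tax"): 0.1,
    ("printTax", "Order"): 0.2, ("printTax", "Tax"): 0.9,
    ("rate", "Tax"): 0.7, ("rate", "Order"): 0.3,
}
toy_system = {"Order": ["total", "printTax"], "Tax": ["rate"]}
moves = recommend_moves(
    toy_system,
    lambda m, c: toy_scores[(m, c)],  # stand-in for jaccSim
    lambda m, c: 0.0,                 # stand-in for contextSim
)
```

With these made-up scores only printTax is flagged, because Tax scores higher for it than its current class Order.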

Figure 4.2: Flow Chart of Move Method Refactorings Recommendation Approach


The algorithm relies on two similarity functions that are the keys of the recommendation approach:

• jaccSim(): This function calculates similarity based on coupling and cohesion (method calls and used attributes).

• contextSim(): This function calculates similarity using contextual information (the information in the bodies of the class and the method, excluding the built-in language keywords).

Both key functions compute the similarity between the target method m and the methods in a class C. The first function incorporates both static and non-static entities and the second incorporates contextual information; these are the two main contributions of this thesis. The functions are described in the following sections.

4.3 Coupling & Cohesion based Similarity Calculation

Coupling and cohesion are the two key design factors in any object-oriented (OO) system. These design factors are essential to optimize software modularization through decreasing coupling and increasing cohesion of the system. Therefore, similarity based on coupling and cohesion plays an indispensable role in the MMRUC3 recommendation approach. In this similarity measurement step, analyzed information like - method calls and used attributes by a target method from the parsing step are applied. The main task of this phase is to calculate similarity between the target method and classes based on method calls and used attributes which represent coupling and cohesion. To measure coupling and cohesion, Jaccard Similarity Coefficient is used


in this step (Equation 4.1) [75]. The Jaccard index is a mathematical technique used for comparing the similarity, dissimilarity and distance of data sets. The Jaccard similarity coefficient between two data sets is the number of elements common to both sets divided by the number of elements in their union, as shown below.

\[ \text{Jaccard Similarity}(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (4.1) \]

The function relies on the set of static dependencies (method calls and used attributes) established by a method m to compute its similarity with the methods in a class C, as described in Algorithm 2.

Algorithm 2 Dependency based Similarity Function
Input: Target method m and a class C
Output: Similarity coefficient between m and C
1: similarityScore ← 0
2: avgSimilarityScore ← 0
3: for each method mi ∈ C do
4:     if mi ≠ m then
5:         similarityScore ← similarityScore + getSimilarity(m, mi)
6:     end if
7: end for
8: if m ∈ C then
9:     avgSimilarityScore ← similarityScore / [NOM(C) − 1]
10: else
11:     avgSimilarityScore ← similarityScore / NOM(C)
12: end if
13: return avgSimilarityScore

Initially, the function computes the similarity between the target method m and each method mi other than m in the class C (Line 5). It excludes the target method m by checking whether mi is the method m itself (Line 4), because measuring the similarity of a method with itself is meaningless. The accumulated similarity score over all methods in class C is stored in a variable named similarityScore. In the end, the similarity between m and C is defined as the arithmetic mean


of the similarity coefficients computed in the previous step, and the result is assigned to the variable avgSimilarityScore (Line 9 and Line 11). Line 9 executes when the target method m belongs to the class C and Line 11 executes when m does not belong to C. In this algorithm, NOM(C) denotes the number of methods in a class C. The key function in Algorithm 2 is getSimilarity(m, mi), which computes the similarity between the sets of dependencies established by the two methods (Line 5). The similarity is measured using the Jaccard similarity coefficient, as defined in Equation 4.2.

\[ \text{getSimilarity}(m, m_i) = \frac{|A_m \cap A_{m_i}|}{|A_m \cup A_{m_i}|} \qquad (4.2) \]

Here, A_m = set of dependencies established by the target method m, and A_{m_i} = set of dependencies established by the other method m_i.

Note that the dependency set consists of the class names of the references or objects used for method calls and attribute usages. This is significant for the similarity measurement function, because it uses class names rather than the names of the references through which methods are invoked and attributes are used. It makes the approach more general by incorporating both static and non-static entities (methods and attributes), whereas existing approaches have used reference names in the similarity calculation. Those approaches are therefore not able to incorporate static entities, because static entities are not accessed through object references in an OO system; they are used directly via the names of their classes. The inclusion of both static and non-static entities in the recommendation approach is thus a significant contribution of this thesis.
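Equation 4.2 and the averaging of Algorithm 2 can be sketched together as follows. The sketch assumes that the dependency sets have already been extracted as sets of class names. The handling of two empty dependency sets and of a class with no other methods (both return 0.0) is an assumption of this sketch, since those cases are not specified in the text. The names jaccard, jacc_sim and deps are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two sets (Equation 4.2)."""
    union = a | b
    # Assumption: two empty dependency sets are treated as dissimilar.
    return len(a & b) / len(union) if union else 0.0


def jacc_sim(target, class_methods, deps):
    """Average Jaccard similarity between `target` and the methods of a class.

    Dropping `target` from the list before averaging is equivalent to dividing
    by NOM(C) - 1 when the method belongs to the class and by NOM(C) otherwise.
    """
    others = [m for m in class_methods if m != target]
    if not others:
        return 0.0  # assumption: nothing to compare against
    return sum(jaccard(deps[target], deps[m]) for m in others) / len(others)


# Hypothetical dependency sets made of class names.
deps = {
    "printTax": {"Tax", "java.io.PrintStream"},
    "total": {"Order"},
    "rate": {"Tax"},
    "vat": {"Tax", "Order"},
}
own_class_score = jacc_sim("printTax", ["total", "printTax"], deps)  # class Order
other_class_score = jacc_sim("printTax", ["rate", "vat"], deps)      # class Tax
```

Here printTax shares no dependency with total, so its score for its own class is 0, while the score for the other class is the mean of 1/2 and 1/3.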

4.4 Context based Similarity Calculation

In this section, another kind of similarity, called contextual similarity, is calculated. Contextual similarity means that a method and a class are similar based on their context, since a class's responsibility (or context) is what its methods perform according to SRP (Single Responsibility Principle) [21]. All methods in a class should therefore possess similar contextual information. Including this similarity helps to group methods that perform a single responsibility into one class, which makes the recommendation approach more effective.

In this phase of the approach, contextual information is obtained using Information Retrieval (IR) techniques, namely term frequency (tf), inverse document frequency (idf) and the tf-idf measure, and similarity is calculated on the basis of this contextual information using the cosine similarity of the Vector Space Model (VSM). The purposes of these techniques are listed below.

• tf: used to get the overall information based on the terms in a document. A method or a class is referred to as a document in the approach.

• idf: used to discriminate each document or class based on the uniqueness of a term.

• tf-idf: used to produce a composite weight for each term in each document. It helps to distinguish the responsibilities of the various classes of an application.

• Cosine Similarity: used to calculate the similarity score between a method and a class.

The function relies on the context that a target method and a class stand for accomplishing a specific task (or responsibility). The context is determined from the texts of the method's body and the class's body (referred to as documents), as described in Algorithm 3. The algorithm takes a method m and a class C as input, calculates the contextual similarity between them and returns the calculated score as output.

Algorithm 3 Contextual Similarity Function
Input: Target method m and class C
Output: Similarity between m and C
1: Index used fields and invoked methods in method m
2: Index declared fields and methods of class C
3: Calculate term frequency, tf (number of occurrences), of each indexed element of m & C
4: Calculate inverse document frequency, idf, of each indexed element of m & C
5: Calculate tf-idf of each indexed element of m & C using the formula:
   tf-idf of element e ← (idf ∗ log2(1 + tf)) / max_frequency(e)
6: Calculate cosine similarity between the tf-idf vectors of m & C using the formula:
   Cosine(m, C) ← (vector(m) · vector(C)) / (|vector(m)| ∗ |vector(C)|),
   where vector(m) · vector(C) = dot product of vector(m) and vector(C), and
   |vector(C)| = magnitude of vector C = \sqrt{\sum_i C_i^2}
7: Return Cosine Similarity Value

At first, the body of method m is parsed to identify the used fields, declared variables and method invocations. Then, this information is tokenized to obtain multiple meaningful words from one identifier, based on white-space and on the camel-case and Pascal-case naming conventions. Porter's Stemming Algorithm is then applied to the resulting words [50]. After that, the words are stored in a vector. The vectors can be regarded as bags according to the Bag of Words Model, since they do not consider the position of the words (or items). Similarly, the body of the class C is parsed to identify declared fields and declared methods, and this information is stored in another vector. Then the number of occurrences of each unique element in the vectors is calculated, which is known as term frequency (tf) (Line 3). The tf is used in its logarithmically normalized form (normalized tf_t = 1 + log2(tf)) to dampen the effect of frequently repeated elements. In the rest of this section, tf refers to the normalized tf. After


calculating the term frequency, the inverse document frequency (idf) is calculated for each unique element of the vectors. Inverse document frequency is a numerical statistic that reflects how much information a term provides, that is, how rare the term is across the documents of a collection or corpus. The formula to calculate the idf of a term (t) in a collection of documents (D) is given in Equation 4.3.

\[ idf_{t,D} = \log_2 \frac{N}{|\{d \in D : t \in d\}|} \qquad (4.3) \]

where N = number of documents in the collection, and |{d ∈ D : t ∈ d}| = number of documents in which the term t occurs.

After calculating the idf of each unique element, the tf-idf value of each element in the vectors is calculated in Line 5 using Equation 4.4, and stored in separate tf-idf vectors. Using the tf-idf vector of method m and the tf-idf vector of class C, the cosine similarity between those vectors is calculated in Line 6 using Equation 4.5. Finally, the calculated cosine similarity score is returned.

\[ \text{tf-idf}_{t,D} = tf_{t,D} \ast idf_{t,D} \qquad (4.4) \]

\[ \text{Similarity}(d_1, d_2) = \frac{\vec{V}(d_1) \cdot \vec{V}(d_2)}{|\vec{V}(d_1)| \, |\vec{V}(d_2)|} \qquad (4.5) \]

Here, Equation 4.5 uses the Vector Space Model (VSM) of IR techniques to calculate the similarity between the target method and a class, represented as two documents d1 and d2. It computes the cosine similarity of their vector representations V(d1) and V(d2), where the numerator is the dot product (also known as the inner product) of the vectors V(d1) and V(d2), and the denominator is the product of their Euclidean lengths. An overview of the similarity measurement process is shown in Figure 4.3.
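The contextual similarity pipeline can be sketched end to end as follows. This is a simplified illustration, not the thesis implementation: it uses the normalized term frequency 1 + log2(tf) with Equations 4.3 to 4.5, omits Porter stemming and keyword filtering, and assumes that the idf is computed over a corpus made of the class documents only. All function names and the sample identifiers are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(identifier):
    """Split an identifier on camelCase/PascalCase boundaries and separators."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def document(identifiers):
    """Bag of words for a method or class, built from its identifiers."""
    return [tok for name in identifiers for tok in tokenize(name)]


def inverse_document_frequency(corpus):
    """idf of every term in the corpus (Equation 4.3)."""
    n = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log2(n / df) for term, df in doc_freq.items()}


def tf_idf_vector(doc, idf):
    """Normalized tf (1 + log2 tf) times idf (Equation 4.4); unseen terms weigh 0."""
    return {t: (1 + math.log2(c)) * idf.get(t, 0.0) for t, c in Counter(doc).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (Equation 4.5)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical identifiers collected by the parsing step.
classes = {
    "Order": document(["addItem", "totalPrice", "orderId"]),
    "Tax": document(["taxRate", "computeTax", "vatRate"]),
}
method_doc = document(["printTax", "taxRate"])

idf = inverse_document_frequency(list(classes.values()))
method_vec = tf_idf_vector(method_doc, idf)
sim_order = cosine(method_vec, tf_idf_vector(classes["Order"], idf))
sim_tax = cosine(method_vec, tf_idf_vector(classes["Tax"], idf))
```

In this toy corpus the method shares the terms "tax" and "rate" only with the Tax class, so its cosine similarity to Tax is positive while its similarity to Order is zero. A term that occurs in every document receives an idf of 0 and therefore does not contribute to the score.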


Figure 4.3: Schematic Diagram of Contextual Similarity Measurement Process

Note that the proposed approach uses a similarity that is based not only on coupling and cohesion but also on the contextual information of a method and a class. Contexts are analyzed from the bodies of methods and classes. Therefore, the application should follow appropriate coding conventions, such as naming conventions for methods, classes and attributes, for the approach to produce effective results.

4.5 Complexity Analysis of MMRUC3 Algorithm

The complexity of an algorithm M is the function f(n) which gives the running time and/or storage space requirement of the algorithm in terms of the size n of the input data [76]. An algorithm has mainly two types of complexity:

1. Time Complexity: how fast the algorithm runs.
2. Space Complexity: how much space is needed for executing the algorithm.


Time complexity is the main concern in the refactoring research field, as the goal of refactoring is to speed up development and maintenance activities. Therefore, this section analyses only the time complexity of the MMRUC3 algorithm. The complexity is based on the numbers of methods and classes of an input project. Big O notation is used to represent the complexity of the proposed MMRUC3 algorithm [77]. The notation is denoted by 'O' in this thesis.

Assumptions for complexity calculation:
$m_{c_i}$ = number of methods in a class $c_i$
$n = \sum m_{c_i}$ = total number of methods in the project
$c = \sum c_i$ = total number of classes in the project

The MMRUC3 algorithm takes each method of the project and, for each class, combines two types of similarity functions (dependency based and contextual similarity) by mathematical addition. Each similarity function takes $m_{c_i}$ steps to compute. Therefore, the complexity of the algorithm is $O(n\,c\,m_{c_i})$. Moreover, the complexity is expressed in terms of only the total number of methods in the project. It is claimed that the complexity $O(n\,c\,m_{c_i})$ of the algorithm is quadratic, $O(n^2)$. Therefore, a lemma (LEMMA 1) is derived for the complexity of the algorithm.
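One way to read this cost model is to count unit steps directly. The sketch below is only an illustration under a stated assumption: scoring one method against class $c_i$ is taken to cost $m_{c_i}$ steps, and the function name `count_similarity_steps` is hypothetical. Under that assumption the per-class costs add up to n for every method, so the total is n times n.

```python
def count_similarity_steps(methods_per_class):
    """Count unit steps for scoring every method against every class.

    Assumption: one similarity evaluation against class c_i costs m_ci steps.
    """
    n = sum(methods_per_class)          # total number of methods in the project
    steps = 0
    for _ in range(n):                  # every method in the project
        for m_ci in methods_per_class:  # every candidate class
            steps += m_ci               # assumed cost of one evaluation
    return n, steps


# Three hypothetical class layouts with the same or different totals.
one_big_class = count_similarity_steps([5])
one_method_each = count_similarity_steps([1, 1, 1, 1, 1])
mixed_sizes = count_similarity_steps([3, 1, 4])
```

For each layout the step count equals the square of the total number of methods, regardless of how the methods are distributed over classes.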

LEMMA 1: Let n be the number of methods of a software project, c be the number of classes and $m_{c_i}$ be the number of methods of class $c_i$. The complexity $O(n\,c\,m_{c_i})$ of the MMRUC3 approach for the project is $O(n^2)$.

The lemma is proved by both mathematical induction and proof by cases [78], as given below.

Proof by Mathematical Induction

Let P(n) be the proposition that $O(n\,c\,m_{c_i}) = O(n^2)$, where n is the number of methods in a software project.

Basis Step: P(1) is true, because LHS = $O(1 \cdot 1 \cdot 1) = O(1) = O(1^2)$ = RHS. [For n = 1, only one class contains the one method in the whole application. Therefore, c = 1 and $m_{c_i} = 1$.]

Inductive Step: For the inductive hypothesis, we assume that P(k) is true. That is, we assume that $O(k\,c\,m_{c_i}) = O(k^2)$, i.e., $O(k\,c\,m_{c_i}) = O(k \cdot k)$. Therefore, $c\,m_{c_i} = k$.

To carry out the inductive step using this assumption, we must show that, when P(k) is true, P(k+1) is also true. That is, we must show that $O((k+1)\,c\,m_{c_i}) = O((k+1)^2)$ under the inductive hypothesis P(k). Under the assumption of P(k), we see that $c\,m_{c_i} = k + 1$ for a project with k + 1 methods. Therefore,

LHS = $O((k+1)\,c\,m_{c_i}) = O((k+1)(k+1)) = O((k+1)^2)$ = RHS


Therefore, P(k+1) is true under the assumption that P(k) is true. This completes the inductive step. Having completed the basis step and the inductive step, it follows by mathematical induction that P(n) is true for all n (number of methods in the project).

Proof by Cases

The complexity of MMRUC3 is $O(n\,c\,m_{c_i})$, as stated above. For any project, three cases can occur:
• Case-I: The project has only one class containing all methods. In this case, c = 1 and $m_{c_i} = n$. Therefore, the complexity is $O(n\,c\,m_{c_i}) = O(n \cdot 1 \cdot n) = O(n^2)$.
• Case-II: Each class of the project contains only one method. In this case, c = n and $m_{c_i} = 1$. Therefore, the complexity is $O(n\,c\,m_{c_i}) = O(n \cdot n \cdot 1) = O(n^2)$.
• Case-III: Each class ($c_i$) may contain an arbitrary number of methods ($m_{c_i}$). In this case, as each class has $m_{c_i}$ methods, the c classes together have $c\,m_{c_i}$ methods, which is equal to n, the total number of methods in the application. That is, $c\,m_{c_i} = n$. Therefore, the complexity is $O(n\,c\,m_{c_i}) = O(n \cdot n) = O(n^2)$.

Therefore, it can be concluded from the above scenarios that $O(n\,c\,m_{c_i}) = O(n^2)$ for all cases, where n is the total number of methods in a project. That is, the proposed MMRUC3 algorithm is quadratic. Since MMRUC3 produces recommendations for all the affected methods, in all three cases (best, worst and average) the MMRUC3 algorithm is quadratic, $O(n^2)$, in the number of methods (n) of a project. The quadratic nature of the algorithm is shown in Figure 4.4.

Figure 4.4: Quadratic Nature of Complexity of MMRUC3

4.6 Summary

Misplacement of methods lowers the design quality of an application, leading to high coupling and low cohesion. As a result, maintenance cost, time and effort increase due to the inefficient architecture. Method placement is therefore one of the most important design activities for software design quality in any object oriented application. The MMRUC3 approach proposed in this thesis recommends move method refactorings in order to remove feature envy code smells and enhance the design quality of the application in terms of coupling and cohesion. It incorporates both static and non-static entities (methods and attributes) in its recommendation process. In fact, the approach considers three factors (C3): coupling, cohesion and contextual information of the application. These considerations make the approach more significant than, and different from, the existing approaches. Contextual similarity, measured using Information Retrieval (IR) techniques, provides a new dimension in the refactoring research field to enhance the design quality in terms of coupling, cohesion and the Single Responsibility Principle (SRP).

CHAPTER 5

EXPERIMENTAL RESULTS AND DISCUSSION

The aim of this chapter is to experimentally evaluate the performance of the MMRUC3 approach for recommending move method refactorings. A prototype of MMRUC3 has been implemented to assess the performance of the proposed approach, and it is applied to several well-known open source projects. First, an evaluation is made after running it on the sample projects, and the accuracy of the MMRUC3 recommendations is calculated using the precision, recall and F-measure metrics. After that, patterns or relationships between the proposed approach and the projects are established to discover the best working scenario of the approach. Finally, a comparative result analysis is made against a widely used existing approach, the JDeodorant tool (an Eclipse plug-in for refactorings) [13, 79].


Chapter 5. Experimental Results and Discussion


These results indicate that MMRUC3 outperforms the existing approach. To summarize, the effectiveness of the recommendation approach is depicted in this chapter.

5.1 Experimental Setup

This section discusses the tools used to develop the MMRUC3 prototype and the experimental procedures for the evaluation task. To perform the experimentation, the prototype is developed in the Java programming language using the following tools:
• Eclipse Mars (4.5) [80]
• ByteParser version-1.0.8 [74]
• JavaParser [81]

To implement the proposed algorithm in Java, Eclipse Mars, an Eclipse IDE (Integrated Development Environment), has been used. A source code parser named ByteParser has been included in this implementation in order to analyze the input source code of an application. ByteParser is a parser used to analyze the byte code files (i.e., .class files) of Java source code. It extracts information such as class names, method names, field names and method calls from the source application. In addition, JavaParser is used to parse and analyze the contextual information from the source code. In short, these two parsers are combined into a single parser that is able to analyze coupling and cohesion as well as context based information from the source application. The source code of MMRUC3 is available on GitHub [82].

For the validation of the approach, eight open source Java projects have been used as datasets. The projects, used by existing papers, are collected from an online repository [83]. The descriptions of the projects are shown in Table 5.1, which consists of six columns. The columns of the table represent the project id, project name, project version, number of classes (NOC), number of methods (NOM), and lines of code (LOC) respectively. Each project has a large NOC, NOM and LOC, except the last one. The last one, VideoStore, is used as an example of code smells and refactorings and is collected from the book Refactoring: Improving the Design of Existing Code [3]. Since the project is standard and well organized, it is selected both as a dataset and as the case study (Chapter 6: Case Study). From the table, it is seen that, apart from VideoStore, Weka is the largest project, and Maven and FreeMind are the smallest ones on the basis of NOC, NOM and LOC.

Table 5.1: Experimental Projects

Id No.  Project     Version  NOC    NOM     LOC
1       JHotDraw    7.6      674    6,533   80,536
2       ArgoUML     0.34     1,291  8,077   67,514
3       JMeter      2.5.1    940    7,990   94,778
4       FreeMind    0.9.0    658    4,885   52,757
5       Maven       3.0.5    647    4,888   65,685
6       DrJava      r583     788    7,156   89,477
7       Weka        3.6.9    1,535  17,851  272,611
8       VideoStore  1.0      7      25      212

The above mentioned eight projects are used for evaluating the MMRUC3 approach. After applying the approach on these projects, significant outcomes are found which are depicted in the next sections.

5.2 Result Analysis

The quality of a recommendation system is typically measured using the Precision and Recall metrics of Information Retrieval (IR) techniques. The evaluations based on these metrics are discussed in this section. These metrics are calculated based on the following factors:
• True Positive, TP - the number of affected instances (i.e., methods) that are recommended correctly. Affected instances are misplaced methods.
• False Positive, FP - the number of instances that are recommended incorrectly.
• False Negative, FN - the number of affected instances that are not recommended by the approach.
• True Negative, TN - the number of non-affected instances that are not recommended by the approach.

These notions can be made clear by examining the contingency table (Table 5.2).

Table 5.2: Contingency Table for Result Analysis of MMRUC3

                 Affected Methods     Non-affected Methods
Recommended      True Positive (TP)   False Positive (FP)
Non-recommended  False Negative (FN)  True Negative (TN)
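The four counts of the contingency table can be derived from sets of method identifiers, as in the minimal sketch below. It makes a simplifying assumption: a recommendation counts as correct whenever the recommended method is in the oracle of misplaced methods, without checking the suggested target class. The function name and the method identifiers are hypothetical.

```python
def contingency_counts(all_methods, misplaced, recommended):
    """Return (TP, FP, FN, TN) for a set of recommended methods.

    all_methods: every method of the project
    misplaced:   oracle of affected (misplaced) methods
    recommended: methods for which a move is recommended
    """
    all_methods = set(all_methods)
    misplaced = set(misplaced)
    recommended = set(recommended)
    tp = len(recommended & misplaced)                # affected and recommended
    fp = len(recommended - misplaced)                # recommended but not affected
    fn = len(misplaced - recommended)                # affected but not recommended
    tn = len(all_methods - misplaced - recommended)  # neither affected nor recommended
    return tp, fp, fn, tn


# Hypothetical project with six methods, three of them misplaced.
counts = contingency_counts(
    all_methods=["a", "b", "c", "d", "e", "f"],
    misplaced=["a", "b", "c"],
    recommended=["a", "b", "d"],
)
```

Here two misplaced methods are found, one is missed, one recommendation is spurious, and two methods are correctly left alone.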

Precision

Precision measures how well the recommender approach filters out incorrect results. It is the fraction of correct results in the overall returned result set. Thus, the fraction of true positive results in the total returned results (true positive + false positive) is the precision.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5.1}$$

Recall

Recall measures how well the recommender finds the correct results. It is the fraction of correct returned results in the overall collection of correct results. Thus, recall is the fraction of true positive results in the total relevant results (true positive + false negative).

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5.2}$$

The results of the proposed MMRUC3 approach are shown in Table 5.3. The table columns are project id, project name, TP, FP, FN, Precision and Recall. The precision and recall metrics are calculated using Equations 5.1 and 5.2.

Table 5.3: Results of the Proposed MMRUC3 Approach

Id No.  Project     TP #  FP #  FN #  Precision (%)  Recall (%)
1       JHotDraw    17    52    3     24.64          85.00
2       ArgoUML     24    30    7     44.44          77.42
3       JMeter      7     36    3     16.28          70.00
4       FreeMind    6     36    4     14.29          60.00
5       Maven       14    52    9     21.21          60.87
6       DrJava      7     144   2     4.64           77.78
7       Weka        14    189   10    6.90           58.33
8       VideoStore  2     0     0     100.00         100.00
Average                               29.05          73.68

For instance, for the JHotDraw project, TP = 17, FP = 52 and FN = 3. Therefore, the precision of the project is

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{17}{17 + 52} = 0.2464 = 24.64\%$$

and the recall is

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{17}{17 + 3} = 0.85 = 85\%$$
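The same calculation can be written as two small functions. This is a sketch of Equations 5.1 and 5.2 applied to the JHotDraw counts from Table 5.3; the function names and the zero-denominator convention are choices made for this illustration.

```python
def precision(tp, fp):
    """Equation 5.1: TP / (TP + FP); defined as 0.0 when nothing is recommended."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation 5.2: TP / (TP + FN); defined as 0.0 when nothing is affected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# JHotDraw counts from Table 5.3: TP = 17, FP = 52, FN = 3.
jhotdraw_precision = precision(17, 52)
jhotdraw_recall = recall(17, 3)
print(f"precision = {jhotdraw_precision:.2%}, recall = {jhotdraw_recall:.2%}")
```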

It is observed that the approach gets the highest precision (100%) and recall (100%) for the VideoStore project. The project is very small compared to the others in terms of NOC, NOM and LOC. In addition, it is a well-standardized project, as it follows the proper naming conventions of OOD (Object Oriented Design) [3]. Therefore, it is easier for the MMRUC3 approach to acquire both the dependency and context based information for the recommendations. The second highest precision, 44.44%, is obtained for ArgoUML, a large project, and the lowest, 4.64%, for the DrJava project. Overall, among the large projects, ArgoUML, JHotDraw and Maven have higher precisions (above 20%); JMeter and FreeMind have moderate ones (between 10% and 19%); and Weka and DrJava get lower ones (below 10%). Similarly, JHotDraw, DrJava and ArgoUML have higher recalls (above 75%); JMeter has a moderate one (between 65% and 74%); and Maven, FreeMind and Weka get lower ones (below 65%). The results vary from project to project because of the nature of the projects. It is also seen from the table that DrJava gets a low precision but a high recall, because the project has a moderate number of classes with a large number of methods and statements (Table 5.1). Similarly, Weka gets a low precision due to its very large size, which increases the possibility of false positive results.

To avoid bias from small projects, VideoStore is excluded from the calculation of the average precision and recall of the approach. Therefore, the average precision is 18.91% and the average recall is 69.91% after the exclusion. The subsequent sections of result analysis in this chapter also exclude the project.

As each project has been developed following different coding standards (such as naming conventions and code size) and the approach deals with contexts, the results depend on the projects' standards. These variations in the results imply that there exist relationships between the approach and the projects. These relationships are discussed in the next section.
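The averages quoted above can be reproduced from the counts of Table 5.3 by averaging the per-project precision and recall values. The sketch below assumes that the reported averages are such unweighted means; the names `RESULTS` and `macro_average` are introduced here for illustration only.

```python
# (TP, FP, FN) per project, copied from Table 5.3.
RESULTS = {
    "JHotDraw": (17, 52, 3),
    "ArgoUML": (24, 30, 7),
    "JMeter": (7, 36, 3),
    "FreeMind": (6, 36, 4),
    "Maven": (14, 52, 9),
    "DrJava": (7, 144, 2),
    "Weka": (14, 189, 10),
    "VideoStore": (2, 0, 0),
}


def macro_average(results, exclude=()):
    """Unweighted mean of per-project precision and recall, in percent."""
    kept = [counts for name, counts in results.items() if name not in exclude]
    precisions = [tp / (tp + fp) for tp, fp, fn in kept]
    recalls = [tp / (tp + fn) for tp, fp, fn in kept]
    return (100 * sum(precisions) / len(kept), 100 * sum(recalls) / len(kept))


avg_precision, avg_recall = macro_average(RESULTS, exclude=("VideoStore",))
print(f"{avg_precision:.2f}% {avg_recall:.2f}%")
```

Calling `macro_average(RESULTS)` without the exclusion gives the averages in the last row of Table 5.3.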

5.3 Relationships between MMRUC3 and the Projects

In the previous section, it is shown that the accuracy of MMRUC3 varies on the basis of the project natures (coding standards and sizes). Therefore, there exist some patterns or relationships between the approach and the projects. These relationships are established in this section.

5.3.1 Relationship based on Project Standards

An important code quality aspect of large scale software development is conformance to coding standards. Coding standards ensure that everyone in a company can understand the code and work with each other. If conformance is not achieved, that is, if the code is not written and organized according to the programming guidelines, it becomes much harder for a large team of programmers to develop, integrate and maintain a particular piece of software and to find errors [39, 84, 85]. Concise and consistent naming conventions therefore improve the readability and comprehensibility of the software application [85].

In order to discover the relationship between the MMRUC3 approach and a project on the basis of the coding standard, it is necessary to define the project standard. As the approach considers contextual factors, the project standard is defined by the naming conventions in the project. The more readable, comprehensible and self-descriptive the project is, the higher its standard. The categories (and priority values) of project standard are given below:
• 1 for Excellent: All classes of a project follow an exact naming convention with easily understandable and self-descriptive names. For instance, the method name calculateSalary is comprehensible and self-descriptive.
• 2 for Best: Most of the classes of a project have complete and appropriate names, but a few names are not self-descriptive.
• 3 for Better: Almost all classes of a project follow appropriate naming conventions. However, a few names should be completed to be easily understood; for example, calSalary should be calculateSalary.
• 4 for Good: The naming convention is followed properly. However, some names are confusing, such as cSalary.

In order to categorize the standard of the projects, it is necessary to analyze the source code of each project. For this, two students of MSSE (Master of Science in Software Engineering) at the Institute of Information Technology, University of Dhaka, Bangladesh, were selected. They were not familiar with the seven source projects (shown in Table 5.1) and worked independently to prioritize and categorize the projects based on the naming conventions (or standards). They manually analyzed the projects by choosing several classes randomly from each package of a project. Thus, two category sets were generated independently by the two students. Finally, they merged the two sets into a single set of categories for the projects. Where their combined findings were ambiguous during the merge, they consulted with each other and provided the final list. Their analysis for categorization is shown in Table 5.4 and the category list is shown in Table 5.5.

Table 5.4: Categorization of Project Standards

Project      Standard Category  Standard Value  Justification
1. JHotDraw  Excellent          1               Almost all classes follow an exact naming convention with easily understandable and self-descriptive names.
2. ArgoUML   Excellent          1               Almost all classes follow an exact naming convention with easily understandable and self-descriptive names.
3. JMeter    Best               2               Most of the classes have complete and appropriate names, but a few names are not self-descriptive.
4. FreeMind  Better             3               Many classes follow appropriate naming conventions. However, a few names should be completed to be easily understood; for example, getMemLoad should be getMemoryLoad.
5. Maven     Better             3               Many of the classes and methods follow naming conventions. However, some are nonstandard and hard to understand by name, for example the add2() and mergePluginContainerPlugins() methods.
6. DrJava    Good               4               Method naming conventions are followed in many cases, but variable naming conventions are not maintained properly, for example active_scroll, interpreterResetFailed, etc.
7. Weka      Good               4               The naming convention is followed properly. However, some names are confusing, such as m_linearNormNorm, m_linearNormOrig, etc.

Table 5.5: Categories of the Source Projects

Category      Project Id  Project
1. Excellent  {1, 2}      {JHotDraw, ArgoUML}
2. Best       {3}         {JMeter}
3. Better     {4, 5}      {FreeMind, Maven}
4. Good       {6, 7}      {DrJava, Weka}

After the standard for each project is determined, the patterns between project standards and the two metrics, precision and recall of MMRUC3, are shown as a graph in Figure 5.1. The graph provides a significant finding: as project standards decrease, both metric values decrease. Therefore, the results depend on how well the project is written or developed, because the MMRUC3 approach considers the contextual information of the project for its recommendations.

Figure 5.1: Coding Standards versus Precisions and Recalls of MMRUC3 (The numbers shown in the graphs represent project id(s))

5.3.2 Relationship based on Project Sizes

This section establishes the relationships between the proposed MMRUC3 approach and project sizes. More specifically, there exist a few patterns of precisions and recalls with project sizes: NOCs (numbers of classes), NOMs (numbers of methods), and LOCs (lines of code).

Figure 5.2 shows the relationships between the precisions of MMRUC3 and project sizes (NOCs, NOMs and LOCs) as graphs. Figure 5.2(a) shows no relationship between the NOC of a project and the precision of the approach, because the approach depends on project standards (i.e., how well a class is written) rather than on the number of classes of a project. Intuitively, one might expect that false positive results increase as NOCs increase, and hence that precisions become lower. However, Figure 5.2(a) indicates that NOCs have no impact on the proposed MMRUC3, and therefore it can be said that it is a balanced approach. So the finding of the graph is:

"There is no relationship between precisions of MMRUC3 and NOCs (number of classes) of a project, and so the approach is balanced."

The graphs of Figure 5.2(b) and 5.2(c) show hardly any pattern of precisions, though the trends are downwards when NOMs and LOCs increase. If the projects having almost the same NOMs (project id - 2, 4, and 6) and the projects (project id - 1, 4, and 6) are investigated more closely, it is found that precision decreases on the basis of project standards. This is another significant finding of this thesis. In short, the finding is:

"If NOM (number of methods) and LOC (lines of code) values are almost in the same range for a set of projects, then precision decreases on the basis of project standards (Naming Convention). That is, for those projects, precision decreases as standard deteriorates."

The graphs of Figure 5.3 indicate that recalls trend downwards as the values of NOC, NOM and LOC increase, though hardly any pattern of recalls is found across the projects. The main finding of the graphs is:

"If NOC (number of classes), NOM (number of methods) and LOC (lines of code) values are almost in the same range for a set of projects, then recall decreases on the basis of project standards (Naming Convention). That is, for those projects, recall decreases as standard deteriorates."


(a) NOCs versus Precisions

(b) NOMs versus Precisions

(c) LOCs versus Precisions

Figure 5.2: Patterns of Precisions of MMRUC3 as Sizes Increase (The numbers shown in the graphs represent project id(s))


(a) NOCs versus Recalls

(b) NOMs versus Recalls

(c) LOCs versus Recalls

Figure 5.3: Patterns of recalls of MMRUC3 as Sizes Increase (The numbers shown in the graphs represent project id(s))

5.4 Comparative Result Analysis

This section presents the comparative results between the proposed approach and the widely used refactoring tool JDeodorant (an Eclipse plug-in). The comparative analysis shows the contributions of the proposed approach over the other technique. The comparative results are shown in Table 5.6.

Table 5.6: Comparative Result Analysis (Precisions & Recalls)

                   Precision (%)        Recall (%)
No.  Project    MMRUC3  JDeodorant  MMRUC3  JDeodorant
1    JHotDraw   24.64   21.05       85.00   51.00
2    ArgoUML    44.44   37.5        77.42   56.25
3    JMeter     16.28   12.82       70.00   60.00
4    FreeMind   14.29   10.45       60.00   58.33
5    Maven      21.21   18.84       60.87   45.83
6    DrJava     4.64    4.25        77.78   72.22
7    Weka       6.90    6.57        58.33   64.52
Average         18.91   15.93       69.91   58.31

The table shows that the proposed MMRUC3 approach gets higher precisions and recalls than the JDeodorant tool for each of the projects, except for Weka. Weka is a huge project with a lot of interactions and poor comprehensibility, and so the approach provides more false positive results for this project. However, as MMRUC3 deals with contextual information along with coupling and cohesion, it generally provides fewer false positive results and more true positive results. This is because methods should be grouped together on the basis of the context or responsibility they perform. To perform that responsibility, a class holding the methods should be highly cohesive and loosely coupled, and hence the approach gets higher precisions and recalls than the existing one. On the other hand, JDeodorant does not consider contextual factors in the refactoring recommendation process.

The results of the comparative analysis in terms of precisions and recalls are shown graphically in Figure 5.4 and 5.5 respectively. These results provide evidence that the proposed MMRUC3 approach is effective in the recommendation of move method refactorings, which eventually optimizes the coupling and cohesion of the projects.
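The per-project comparison stated above can be checked mechanically against the values of Table 5.6. The following sketch lists the projects for which MMRUC3 is not strictly higher than JDeodorant on both metrics; the dictionary and function names are introduced here for illustration.

```python
# (precision %, recall %) per project, copied from Table 5.6.
MMRUC3 = {
    "JHotDraw": (24.64, 85.00), "ArgoUML": (44.44, 77.42),
    "JMeter": (16.28, 70.00), "FreeMind": (14.29, 60.00),
    "Maven": (21.21, 60.87), "DrJava": (4.64, 77.78),
    "Weka": (6.90, 58.33),
}
JDEODORANT = {
    "JHotDraw": (21.05, 51.00), "ArgoUML": (37.5, 56.25),
    "JMeter": (12.82, 60.00), "FreeMind": (10.45, 58.33),
    "Maven": (18.84, 45.83), "DrJava": (4.25, 72.22),
    "Weka": (6.57, 64.52),
}


def projects_not_better_on_both(ours, baseline):
    """Projects where `ours` is not strictly higher on both precision and recall."""
    return [
        name
        for name, (p, r) in ours.items()
        if not (p > baseline[name][0] and r > baseline[name][1])
    ]


exceptions = projects_not_better_on_both(MMRUC3, JDEODORANT)
print(exceptions)
```

With the table values, the only exception is Weka, where the recall of JDeodorant (64.52%) exceeds that of MMRUC3 (58.33%) while the precision of MMRUC3 remains higher.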

Figure 5.4: Comparison of Precisions

Figure 5.5: Comparison of Recalls

Precision-Recall (PR) curves are also used in Information Retrieval (IR) for evaluating the performance of an algorithm [86, 87]. As the datasets used in this thesis for validating the proposed approach are highly skewed, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance [88]. From the PR curves in Figure 5.6, it is seen that in almost all cases the MMRUC3 algorithm shows better performance than JDeodorant.

Figure 5.6: Precision-Recall (PR) curves for both MMRUC3 and JDeodorant

Balanced F-score

F-measure is a single measure that trades off precision versus recall. It is the weighted harmonic mean of precision and recall. If precision and recall are equally weighted, the balanced F-score ($F_1$-score) is obtained, which is the default F-measure. The $F_1$-score or F-measure is calculated as the accuracy of the recommendation.

$$F_1\text{-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5.3}$$

Therefore, the F-measure is calculated from precision and recall for both techniques using Equation 5.3. For example, the calculation of the average F-measures using the average precision and recall is given below:

$$F_{MMRUC3} = 2 \times \frac{18.91 \times 69.91}{18.91 + 69.91} = 29.77\%$$

$$F_{JDeodorant} = 2 \times \frac{15.93 \times 58.31}{15.93 + 58.31} = 25.02\%$$
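Equation 5.3 translates directly into a short function. The sketch below applies it to the average precision and recall values of Table 5.6; the function name `f1_score` and the zero-denominator convention are choices made for this illustration.

```python
def f1_score(precision, recall):
    """Equation 5.3: harmonic mean of precision and recall (same unit as inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall (in percent) from Table 5.6.
f_mmruc3 = f1_score(18.91, 69.91)
f_jdeodorant = f1_score(15.93, 58.31)
print(f"{f_mmruc3:.2f}% {f_jdeodorant:.2f}%")
```

Because the inputs are percentages, the result is also a percentage; passing fractions in [0, 1] instead would give a fraction.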

The result of the comparative analysis in terms of F-measures is shown in Table 5.7 and graphically represented in Figure 5.7. This comparative finding shows the overall improvement (4.75 percentage points on average) of the proposed technique over the existing one. Hence, it provides evidence that the proposed MMRUC3 approach is effective in the recommendation of move method refactorings, which eventually optimizes the coupling and cohesion of the projects.

Table 5.7: Comparative Result Analysis (F-measures)

                  F-measure (%)
No.  Project    MMRUC3  JDeodorant
1    JHotDraw   38.20   29.80
2    ArgoUML    56.47   45.00
3    JMeter     26.42   21.13
4    Maven      31.46   26.70
5    Weka       12.33   11.93
6    FreeMind   23.08   17.72
7    DrJava     8.75    8.03
Average         29.77   25.02

Figure 5.7: Comparison of F-measures

The result provides evidence that the inclusion of the C3 factors (coupling, cohesion and contextual information) has made a significant improvement in the proposed MMRUC3 approach. The inclusion of the contextual factor in the recommendation process is one of the most significant contributions of the thesis. Moreover, while the existing techniques considered only non-static entities (methods and attributes), the consideration of both static and non-static entities has made a meaningful improvement to the refactoring process. Finally, the findings of this chapter assist software engineers in making their maintenance task easier and more cost-effective.

Comparison with Other Approach

MethodBook is another approach for suggesting move method refactoring opportunities, proposed by R. Oliveto et al. in 2011 [15]. The approach, based on the Relational Topic Model (RTM), uses variables, parameters and comments to identify the friends (similar methods) of a target method, and it suggests the class having the highest number of friends of the target method as the refactoring. The evaluation of that approach is based on an oracle of only the extracted misplaced methods for the ArgoUML project, rather than all the methods of the project.

Note that an evaluation based on a customized oracle does not reflect the actual accuracy of an approach. Rather, an evaluation is representative of the approach when the oracle covers all the methods of a project, as in the evaluation of the previous sections of this thesis. However, for a fair comparison between the MMRUC3 approach proposed in this thesis and MethodBook, an oracle is generated of only the extracted misplaced methods for the same project. The comparative results based on precisions, recalls and F-measures are shown in Table 5.8 and Figure 5.8.

Table 5.8: Comparison between MMRUC3 and MethodBook for ArgoUML Project

Approaches  Precision (%)  Recall (%)  F-measure (%)
MMRUC3      86             89          87
MethodBook  70             75          72
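The final selection step of MethodBook as described above (suggesting the class that holds the most friends of the target method) can be illustrated with a simple vote. This is only a sketch of that last step under the description given in this section: it does not implement the Relational Topic Model used to find the friends, and the function name and class names are hypothetical.

```python
from collections import Counter


def suggest_class_by_friends(friend_classes):
    """Return the class that owns the most friend methods, or None if there are none.

    friend_classes: the owning class of each friend (similar method) of the target.
    Ties are resolved in favour of the class encountered first.
    """
    if not friend_classes:
        return None
    return Counter(friend_classes).most_common(1)[0][0]


# Hypothetical owning classes of five friends of one target method.
suggestion = suggest_class_by_friends(
    ["Rental", "Customer", "Rental", "Movie", "Rental"]
)
```

In this toy input three of the five friends live in `Rental`, so that class is suggested as the target of the move.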

Figure 5.8: Comparison between MMRUC3 and MethodBook for ArgoUML Project

The table and figure show that the MMRUC3 approach outperforms the MethodBook technique in terms of precision, recall and F-measure for recommending the refactorings.

An approach proposed by F. A. Fontana et al. in 2016 uses machine learning techniques, such as Decision Tree, Random Forest and Naive Bayes, only for detecting the feature envy code smell along with several other smells [63]. In fact, it generates rules for the detections. To generate the training dataset, positive and negative instances have to be produced manually, and the approach then classifies code smells on the testing dataset. Therefore, without the training dataset the approach is unable to detect the code smells; it is highly dependent on the dataset, and the generated rules vary from project to project. Moreover, this approach does not provide any recommendation for refactorings.

On the other hand, the MMRUC3 approach proposed in this thesis does not rely on a training dataset, and its detection rules do not vary between projects. The approach not only detects feature envy code smells but also recommends move method refactorings. Note that detection of the code smell alone does not help developers much, because they still have to manually identify the appropriate places for the smelly methods, which requires considerable time and effort. Therefore, the recommendation of more appropriate places for the smelly methods by the MMRUC3 approach makes developers' work easier in their development and, above all, maintenance activities.

5.5 Discussion on Results

The precision, recall and F-measure values provide evidence that the MMRUC3 approach outperforms the existing well-known technique, JDeodorant (a refactoring tool), for almost all of the seven open source projects used in the comparison. This section justifies and summarizes the overall reasons behind the obtained results of the proposed MMRUC3 recommendation approach.

The reason behind the performance of the MMRUC3 approach is that it considers contextual information in the recommendation process of move method refactorings. The contextual factor is the basis of the approach, as methods should be grouped together based on the responsibility they perform according to SRP (Single Responsibility Principle). This responsibility is acquired by analyzing the context of a class. This consideration makes the approach novel and different from traditional existing approaches.

The categorization of the projects based on standardization (i.e., naming convention) helps to reveal another significant finding of the experimental results. The results based on the project standards show that there are relationships or patterns between the approach and the project nature. Therefore, the inclusion of the contextual factor is important in the recommendation process.

Another factor is that MMRUC3 incorporates both static and non-static entities (methods and attributes), whereas the existing techniques consider only the non-static ones. The approach uses the class names of method calls and used attributes instead of reference names in the coupling and cohesion based similarity measurement process. Therefore, the results are better than those of the existing approach.

The final finding is that the MMRUC3 approach deteriorates if the project size is significantly large (such as the Weka project). This is because a large project has many responsibilities, and these responsibilities may overlap. Therefore, it is difficult for the MMRUC3 approach to distinguish the ambiguous contexts of each responsibility.

This thesis also compares its approach with the MethodBook technique based on a different data setting for the ArgoUML project, and shows better performance than MethodBook in terms of precision, recall and F-measure. This is because ArgoUML is a well-standardized project and the MMRUC3 approach performs better on higher standard projects.

From the results and findings in this chapter, it can be said that contextual information, along with coupling and cohesion, is an important factor for grouping similar methods into classes. This factor is essential to distinguish the classes from each other, as similarities rely on methods' behaviors, and methods with similar behaviors contain similar textual information. So, similar methods should remain in one class, and thereby high cohesion and low coupling are achieved.

5.6 Summary

This chapter has described the results of the proposed approach for recommending move method refactorings. The approach has been applied on three projects to analyze its effectiveness. The results indicate that it is a balanced approach, as MMRUC3 depends on the coding standards of object oriented design (OOD) and hardly depends on project size. Most real-life projects are not giant in size, and software engineers most often follow coding standards. Therefore, the findings of this chapter can help them to develop new refactoring tools.

CHAPTER 6

CASE STUDY ON A SAMPLE PROJECT: “VIDEOSTORE”

In this chapter, a detailed walkthrough of the proposed recommendation approach is given, which provides insight into the process of recommending move method refactorings. The move method refactoring recommender, named MMRUC3, was proposed earlier in this thesis. The recommendation process includes parsing the source code of a software application or project, calculating similarity scores between the target method and the classes using three factors (C3): coupling, cohesion and contextual information, comparing the similarity scores, and finally providing suitable recommendations. This chapter provides a step-by-step analysis of that process. The MMRUC3 approach not only recommends move method refactorings, but also detects and removes feature envy code smells to optimize software modularization. The VideoStore project has been chosen as the case study because it possesses all three factors required to calculate similarities between methods and classes, as mentioned in Chapter 4.

6.1 About Project VideoStore

The VideoStore project [3] is a well-known example of refactoring usage. Because it is well designed, the project is easy to comprehend quickly, which is why it is selected for the case study of the MMRUC3 recommender framework. The scenario of the project is as follows:

Figure 6.1: UML Diagram of VideoStore Project (Refactored Version)

The sample project is simple to understand, as it is developed following refactoring techniques such as Replace Type Conditional with Polymorphism, Extract Method, and Move Method. It is a program that calculates and outputs a statement of a customer’s charges at a video store. The program is told which movies a customer rented and for how long. It then calculates the charges, which depend on how long each movie is rented and on the type of the movie. There are three kinds of movies at the video store: regular movies, children’s movies, and new release movies. In addition to calculating charges, the statement also computes frequent renter points, which vary depending on whether the movie is a new release. The UML (Unified Modeling Language) diagram of the project is shown in Figure 6.1.

The VideoStore project consists of seven classes: Movie, Price, RegularPrice, ChildrenPrice, NewReleasePrice, Rental, and Customer. Each class has a responsibility to perform based on the types of rented movies. The responsibilities of the classes are described below:

• Class Movie
  The responsibility of the class is to determine the movie type and the price for renting the movie by a customer. In addition, it counts the points for a specific movie type.

• Class Price
  This is an abstract class developed using the polymorphism property of the Object Oriented (OO) approach. The main functionality of the class is to calculate the price for each type of movie (regular, children, and new release) based on the number of rented days. To perform this function, the super class has three subclasses (RegularPrice, ChildrenPrice, and NewReleasePrice), one per movie type. It also holds the inheritance relationship between the super class and the subclasses.
  - Sub-class RegularPrice
    It inherits the class Price and calculates the price for regular movies.
  - Sub-class ChildrenPrice
    It inherits the class Price and calculates the price for children’s movies.
  - Sub-class NewReleasePrice
    It inherits the class Price and calculates the price for new release movies. It also calculates the points for renting these movies.

• Class Rental
  Customers use the class for renting movies of any type. Based on the customer’s requirements, this class calculates renting information, such as movie type and price, using the Movie class.

• Class Customer
  This is the class through which customers view renting information after a rent request is made. In other words, it is a client class.

The project is well standardized and self-descriptive, as it follows OO naming conventions and the State Design Pattern [89]. The pattern allows an object to alter its behavior when its internal state changes, so that the object appears to change its class. In this project, the State pattern describes how the Movie class can exhibit different behavior in each Price state, based on the movie type and its renting duration. The gain of using the pattern is that changes to the behavior of a price, such as adding new prices or adding extra price-dependent behavior, are much easier to make. The rest of the application does not know about the use of the State pattern. That is why the project is used as the case study in this chapter, so that the approach is easier to visualize.

Since the project is already refactored, it is easier for developers to make adaptive, corrective or perfective changes in the maintenance phase of the Software Development Life Cycle (SDLC). However, to demonstrate the step-by-step procedure of the MMRUC3 framework proposed in this thesis, the VideoStore project is modified by moving the getRentedCharge() method (colored red in the figure) from the Rental class to the Movie class, and the getPrice(int days) method from the Price class to the Movie class. That is, feature envy code smells are injected manually, as shown in Figure 6.2 (a) and (b) respectively. Placing the methods in incorrect classes increases the interactions between the classes (high coupling), shown by red arrows in the figure, and the responsibilities of the classes (low cohesion). As a result, the proposed approach can be applied to the project to recommend move method refactorings that optimize these design factors.

(a) Feature Envy Code Smell Injected for Method getRentedCharge()

(b) Feature Envy Code Smell Injected for Method getPrice(int days)

Figure 6.2: UML Diagram of VideoStore Project (Non-Refactored Version)

6.2 Recommending Move Method Refactorings for VideoStore

In this section, the MMRUC3 approach is executed on the modified version of the VideoStore project for every injected feature envy code smell. After the source classes of the project are parsed, similarities between methods and classes are calculated using the analyzed information. Based on the similarity scores, move method refactorings are recommended to optimize the project modularization. For simplicity, the Customer class and the subclasses of the Price class are left out of the computation: the Customer class, used only by clients, is developed for a different context, and the context of the subclasses is covered by their super class. The main phases of the case study, described briefly in this section, are as follows:

1. Parsing & Analyzing VideoStore Source Project
2. Similarity Coefficient Measurement
3. Recommendation of Move Method Refactorings

6.2.1 Parsing & Analyzing VideoStore Project

Parsing Source Code

As mentioned in Chapter 5, analyzing source code information through parsing is the initial and significant step of the MMRUC3 approach. The more efficiently the information is parsed, the more accurate the generated result is. In order to acquire the relevant information, several preprocessing steps are applied to the VideoStore project before the source information is analyzed. First, the source classes (.java files) of the project are compiled to get the byte code (.class files) of the corresponding classes. The .class files are not in a readable form, and therefore these files are converted to text form (.txt files) using the following command:

    javap -c -private ClassName.class > ClassName.txt

(a) Source Code Example (Class: Movie Portion)

(b) Byte Code Example (Class: Movie Portion)

Figure 6.3: Examples of Java and Byte Code

Figure 6.3(a) and 6.3(b) show a portion of the source code in Movie.java and the corresponding portion of the byte code (Movie.class) after conversion, respectively. After the conversion, both the byte code and the base source code are in a form that can be parsed.

In short, both the .class files (converted to .txt files) and the .java files are used for parsing the source information of the VideoStore project. An intelligent parser, built by combining ByteParser [74] and JavaParser [81], is used to parse the .class and .java files. These files are parsed to obtain all the classes (Customer, Rental, Movie, Price, ChildrenPrice, NewReleasePrice and RegularPrice) and all the methods in each class of the project.

Analyzing Information of the Project

The information parsed from the source files is analyzed further to acquire more specific information for the recommendation process. Coupling and cohesion based information, such as method calls and used attributes, and contextual information, extracted from the bodies of methods and classes, are acquired by analyzing the parsed class and method information. Figure 6.2(a) shows that the method getRentedCharge() is placed in the incorrect class Movie instead of the correct class Rental, which increases the coupling of the application. The code of these classes is shown in Figure 6.4. After parsing, information is acquired from the method invocations, the attribute usages, and the context of the source code. This information is therefore based on the dependencies and the context of the source classes of the project.

(a) Class Rental

(b) Class Movie

Figure 6.4: Source Code Examples of VideoStore (Target Smelly Method getRentedCharge())

Dependency based Information

The body of the target method getRentedCharge() is analyzed to gain dependency based information. This information is called the Dependency Set of the method. The formal method notation Set [90] is used because a set does not hold redundant information. The Dependency Set contains the names of the classes whose features (other methods and attributes) the method uses:

DependencySet_getRentedCharge() = { Movie, Rental }

Similarly, dependency sets are captured for the system classes Movie (the owner class of the target method, with the target method excluded), Rental and Price. Class level dependencies are calculated from all methods inside a class. The dependency sets are as follows:

DependencySet_Movie = { Movie, Price }
DependencySet_Rental = { Movie }
DependencySet_Price = { }

Contextual Information

Another significant kind of information is obtained from the context of the target method getRentedCharge(). This contextual information is called the Contextual Bag of the method. The formal method notation Bag [90] is used because a bag can hold redundant information and, following the Bag of Words model [51], does not consider the position of words. The Contextual Bag contains the words of the method’s body, including its name:

ContextualBag_getRentedCharge() = [[get, rented, charge, movie, get, movie, charge, rental, days, rented]]

Similarly, contextual bags are captured for the system classes Movie (the owner class of the target method, with the target method excluded), Rental and Price. The contextual bags are as follows:

ContextualBag_Movie = [[movie, children, regular, new, release, title, price, movie, title, price, code, title, title, set, price, code, price, code, get, movie, charge, days, price, get, charge, days, get, frequent, renter, points, days, rented, price, get, frequent, renter, points, days, rented, get, rented, charge, movie, get, movie, charge, rental, days, rented]]

ContextualBag_Rental = [[rental, movie, movie, days, rented, rental, movie, movie, days, rented, movie, movie, days, rented, days, rented, get, frequent, renter, points, movie, get, frequent, renter, points, days, rented, movie, get, movie, movie]]

ContextualBag_Price = [[price, get, price, code, get, charge, days, rented, days, get, frequent, renter, points, rented]]

Note that the Java programming language keywords are excluded from this contextual information, and each identifier is split into multiple words according to camel case, Pascal case, white space, etc. The words in the bags are converted to lowercase letters to remove ambiguity. For stemming the words, Porter’s stemming algorithm is used [51]. Both kinds of information, the dependency set and the contextual bag, are the basis for calculating the similarities between the target method and the other system classes in the recommendation approach.
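As an illustration of how such a contextual bag could be built, the following Java sketch drops keyword tokens, splits the remaining identifiers on camel case, Pascal case, underscores and white space, and lowercases the parts. The class name ContextualBagSketch, the method toBag, the keyword subset and the sample identifiers are assumptions made for this example and are not taken from the MMRUC3 implementation; stemming is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ContextualBagSketch {
    // Illustrative subset of Java keywords (assumption); a real tool would use the full list.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "public", "private", "static", "return", "int", "double", "class", "this", "void"));

    // Split points: lowercase/digit followed by uppercase (camel case),
    // the end of an acronym followed by a capitalized word, and runs of '_' or white space.
    private static final String SPLIT =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|[_\\s]+";

    // Builds a bag (a list that keeps duplicates) of lowercase words from source tokens.
    static List<String> toBag(List<String> tokens) {
        List<String> bag = new ArrayList<>();
        for (String token : tokens) {
            if (KEYWORDS.contains(token)) {
                continue; // language keywords carry no domain context
            }
            for (String part : token.split(SPLIT)) {
                if (!part.isEmpty()) {
                    bag.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        // Hypothetical tokens, chosen only to exercise the splitting rules.
        System.out.println(toBag(Arrays.asList("return", "getRentedCharge", "daysRented", "NEW_RELEASE")));
    }
}
```

With this splitting rule, an identifier written as calculateSalary contributes the two words calculate and salary, whereas calculatesalary remains a single, less informative word.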

6.2.2 Similarity Coefficient Measurement

After analyzing the method and class level information of the VideoStore project, the MMRUC3 framework measures two types of similarity, coupling and cohesion based similarity and context based similarity, which are described below.

Coupling & Cohesion based Similarity

The coupling and cohesion based similarity is calculated using the dependency sets of the target method and all classes of the project. Here the Jaccard Similarity Coefficient formula (Equation 4.1) is used.

\[
JS_1(m, \mathit{Movie})
= \frac{|\mathit{DependencySet}_m \cap \mathit{DependencySet}_{\mathit{Movie}}|}{|\mathit{DependencySet}_m \cup \mathit{DependencySet}_{\mathit{Movie}}|}
= \frac{|\{\mathit{Movie}\}|}{|\{\mathit{Movie}, \mathit{Rental}, \mathit{Price}\}|}
= \frac{1}{3} = 0.33
\]

\[
JS_2(m, \mathit{Rental})
= \frac{|\mathit{DependencySet}_m \cap \mathit{DependencySet}_{\mathit{Rental}}|}{|\mathit{DependencySet}_m \cup \mathit{DependencySet}_{\mathit{Rental}}|}
= \frac{|\{\mathit{Movie}\}|}{|\{\mathit{Movie}, \mathit{Rental}\}|}
= \frac{1}{2} = 0.5
\]

Here,
m = the method getRentedCharge()
JS_1 = Jaccard similarity score for the first class, Movie
JS_2 = Jaccard similarity score for the second class, Rental
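The calculation above can be reproduced with a short Java sketch. The class name JaccardSketch and the helpers jaccard and setOf are hypothetical names introduced for this illustration; only the dependency sets themselves come from the case study. Returning 0 when both sets are empty is an assumption of the sketch, since that case does not occur in the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class JaccardSketch {
    // Convenience helper for building a set of class names.
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Jaccard coefficient: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: two empty sets are treated as dissimilar
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> method = setOf("Movie", "Rental");       // DependencySet of getRentedCharge()
        System.out.println(jaccard(method, setOf("Movie", "Price"))); // class Movie
        System.out.println(jaccard(method, setOf("Movie")));          // class Rental
        System.out.println(jaccard(method, setOf()));                 // class Price
    }
}
```

Because the dependency set of Price is empty, its intersection with the set of the method is empty as well, so the score for Price is 0.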

Contextual Similarity

The context based similarity is calculated using the contextual bags (also known as documents in Information Retrieval) of the target method and all classes of the project. Here, the Cosine Similarity formula is used, as in Equation 4.3. To measure cosine similarity, the documents are represented as vectors according to the Vector Space Model (VSM), so the two document vectors must have the same length [51]. To ensure this, the terms of the method are used as the vector dimensions. The step-by-step similarity calculation is shown in Tables 6.1, 6.2, and 6.3 for the target method and the three classes (Movie, Rental, Price) of the project, respectively.

Meaning of the symbols used in Tables 6.1, 6.2, 6.3, 6.4 and 6.5:
m = the target method getRentedCharge()

Table 6.1: Cosine Similarity Calculation between Method getRentedCharge() and Class Movie

                      D_m for m = getRentedCharge()         D_1 for C_1 = Movie
Term t_i   idf_i      tf_i   normalized tf_i   tf-idf_i      tf_i   normalized tf_i   tf-idf_i
get        0          1      1                 0             6      3.58              0
rented     0          2      2                 0             4      3                 0
charge     0          2      2                 0             4      3                 0
movie      0.58       2      2                 1.16          5      3.32              1.74
rental     0.58       1      1                 0.58          1      1                 0.58
days       0          1      1                 0             5      3.32              0

idf_i = log2(N / df_ti); normalized tf_i = 1 + log2(tf_i)

Similarity Score (m, C_1) = 0.9899

Table 6.2: Cosine Similarity Calculation between Method getRentedCharge() and Class Rental

                      D_m for m = getRentedCharge()         D_2 for C_2 = Rental
Term t_i   idf_i      tf_i   normalized tf_i   tf-idf_i      tf_i   normalized tf_i   tf-idf_i
get        0          1      1                 0             3      2.58              0
rented     0          2      2                 0             5      3.32              0
charge     0          2      2                 0             0      0                 0
movie      0.58       2      2                 1.16          10     4.32              2.66
rental     0.58       1      1                 0.58          1      1                 0.58
days       0          1      1                 0             5      3.32              0

idf_i = log2(N / df_ti); normalized tf_i = 1 + log2(tf_i)

Similarity Score (m, C_2) = 0.969

Table 6.3: Cosine Similarity Calculation between Method getRentedCharge() and Class Price

                      D_m for m = getRentedCharge()         D_3 for C_3 = Price
Term t_i   idf_i      tf_i   normalized tf_i   tf-idf_i      tf_i   normalized tf_i   tf-idf_i
get        0          1      1                 0             3      2.58              2.58
rented     0          2      2                 0             1      1                 0
charge     0          2      2                 0             1      1                 0
days       0          1      1                 0             2      1                 0

idf_i = log2(N / df_ti); normalized tf_i = 1 + log2(tf_i)

Similarity Score (m, C_3) = 0
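The weighting and the cosine computation used in Tables 6.1 to 6.3 can be sketched in Java as follows. The class name CosineSketch and its methods are hypothetical and are introduced only for this illustration. The weight is assumed to be the normalized term frequency, 1 + log2(tf), multiplied by idf = log2(N / df), as the column headings of the tables suggest, and a pair of vectors in which one has zero length is assumed to score 0. The values N = 3 and df = 2 used in main are also an assumption, chosen because they would be consistent with the listed idf of 0.58.

```java
class CosineSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // tf-idf weight: (1 + log2(tf)) * log2(N / df); zero when the term is absent.
    static double tfIdf(int tf, int totalDocs, int docFreq) {
        if (tf <= 0 || docFreq <= 0) {
            return 0.0;
        }
        return (1.0 + log2(tf)) * log2((double) totalDocs / docFreq);
    }

    // Cosine similarity of two equally long weight vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumption: an all-zero document has no similarity
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // tf-idf columns of Table 6.1, in the order get, rented, charge, movie, rental, days.
        double[] method = {0, 0, 0, 1.16, 0.58, 0};
        double[] movie = {0, 0, 0, 1.74, 0.58, 0};
        System.out.println(cosine(method, movie));
        // Weight of a term with tf = 2 under the assumed N = 3 and df = 2.
        System.out.println(tfIdf(2, 3, 2));
    }
}
```

Applied to the tf-idf columns of Table 6.1, the sketch yields approximately 0.99, in line with the reported score of 0.9899. Terms that occur in every document receive an idf of 0 and therefore do not contribute to the score.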

i = { 1, 2, 3, ... }
C_1 = class Movie, C_2 = class Rental, C_3 = class Price
D_i = document for class C_i, D_m = document for method m
tf-idf_i = tf_i × idf_i
JS_i = Jaccard similarity score between m and C_i
CS_i = cosine similarity score between m and C_i
TS_i = total similarity score between m and C_i

Combining the Two Types of Similarity

In this step, the two similarities are added together in order to obtain a more accurate, total similarity score between the target method and each class. The overall similarity scores between the method getRentedCharge() and all the classes of the project are shown in Table 6.4.

Table 6.4: Similarity Scores for All Classes for Method getRentedCharge()

Class C_i      Jaccard Similarity JS_i   Contextual Similarity CS_i   Total Similarity TS_i = JS_i + CS_i
C_1 (Movie)    0.333                     0.989                        1.322
C_2 (Rental)   0.5                       0.969                        1.469
C_3 (Price)    0                         0                            0

Table 6.5: Similarity Scores for All Classes for Method getPrice()

Class C_i      Jaccard Similarity JS_i   Contextual Similarity CS_i   Total Similarity TS_i = JS_i + CS_i
C_1 (Movie)    0                         0.945                        0.945
C_2 (Rental)   0                         0                            0
C_3 (Price)    0                         1                            1

The similarity scores between the smelly method getPrice() (shown in Figure 6.2(b)) and the three classes of the VideoStore project are shown in Table 6.5. They are obtained using the same process as described above in this section.

6.2.3 Recommendation of Move Method Refactorings

Table 6.4 shows that TS_2, the total similarity score between the method getRentedCharge() and the class Rental, is the highest combined similarity score, and in particular is higher than the score of the class Movie, in which the method currently resides. The method is therefore placed in the wrong class and is detected as a feature envy code smell. By comparing the scores, the MMRUC3 framework recommends a move method refactoring for the method: it should be moved from its current class Movie to the class Rental. The final recommendations of move method refactorings for the two feature envy smelly methods (Figure 6.2) of the VideoStore project are shown in Table 6.6. These refactorings restore the project to the original refactored version shown in Figure 6.1, which ensures loose coupling and high cohesion.

Table 6.6: Recommendation of Move Method Refactorings

Smelly Method         Current Class   Recommended Class
getRentedCharge()     Movie           Rental
getPrice(int days)    Movie           Price

It is noticed that CS_2 is less than CS_1, whereas JS_2 is greater than JS_1. After the two similarity scores are combined, however, TS_2 is higher than TS_1. This finding shows that the cosine similarity (context based) alone cannot produce the correct recommendation for this method. On the other hand, for the other injected feature envy code smell (Figure 6.2(b)), Table 6.5 shows that the Jaccard similarity (coupling and cohesion based) alone cannot produce the appropriate recommendation. The conclusion from these findings is that for some methods the refactoring can be suggested using the coupling and cohesion based similarity where the contextual similarity fails, and vice versa. Combining the two similarities therefore improves the accuracy of the MMRUC3 approach.
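The comparison step described in this section can be sketched in Java as follows. The class RecommendationSketch, its methods and the tie handling (a method stays in its current class unless another class scores strictly higher) are assumptions made for this illustration; the scores are the ones listed in Tables 6.4 and 6.5.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RecommendationSketch {
    // Helper that stores one score per class of the case study.
    static Map<String, Double> scores(double movie, double rental, double price) {
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("Movie", movie);
        result.put("Rental", rental);
        result.put("Price", price);
        return result;
    }

    // Returns the class with the highest total score TS = JS + CS.
    // The current class is kept unless another class scores strictly higher (assumption).
    static String recommend(String currentClass,
                            Map<String, Double> jaccard,
                            Map<String, Double> cosine) {
        String best = currentClass;
        double bestScore = jaccard.getOrDefault(currentClass, 0.0)
                + cosine.getOrDefault(currentClass, 0.0);
        for (Map.Entry<String, Double> entry : jaccard.entrySet()) {
            double total = entry.getValue() + cosine.getOrDefault(entry.getKey(), 0.0);
            if (total > bestScore) {
                best = entry.getKey();
                bestScore = total;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // getRentedCharge(): scores from Table 6.4.
        System.out.println(recommend("Movie", scores(0.333, 0.5, 0), scores(0.989, 0.969, 0)));
        // getPrice(int days): scores from Table 6.5.
        System.out.println(recommend("Movie", scores(0, 0, 0), scores(0.945, 0, 1)));
    }
}
```

A recommendation is only reported when the returned class differs from the current class of the method; for the two injected smells, the sketch selects Rental and Price respectively, matching Table 6.6.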

6.3 Summary

The case study shows the complete process of recommending move method refactorings. It shows how the similarity measurement works in the MMRUC3 recommendation approach proposed in this thesis. In addition, it shows the effectiveness of the two types of similarity, coupling and cohesion based similarity and context based similarity, in this recommendation approach.

CHAPTER 7

CONCLUSION AND FUTURE DIRECTION

Placement of methods plays a significant role in object oriented design (OOD) by optimizing coupling and cohesion. A method should therefore be placed in an appropriate class in any object oriented application. Move method is an important refactoring technique used to move methods from incorrect classes into the correct ones. As a result, coupling decreases and cohesion increases, which makes the application more maintainable. The MMRUC3 approach for recommending move method refactorings, proposed in this thesis, therefore helps software engineers to achieve design quality and eases their maintenance activities. In this chapter, the document is concluded with a discussion of the benefits of the approach along with future directions.

7.1 Conclusion

Coupling and cohesion are the two key factors considered during the design phase of a software application. Since developers only need to focus on the coupled classes to meet a change requirement, the application should be loosely coupled and highly cohesive to make maintenance tasks easier, with lower effort, cost and time. The feature envy code smell is a barrier to this goal, as it increases coupling and decreases cohesion. The proposed MMRUC3 approach plays a significant role in removing the code smell by recommending move method refactorings automatically. As a result, it helps to optimize the design quality of the application in terms of coupling and cohesion. Moreover, it assists software engineers in their maintenance activities with less effort, time and cost.

The MMRUC3 approach, which relies on both dependency based and contextual information for both static and non-static entities (methods and attributes), improves the recommendation accuracy, with better precision and recall than the competing refactoring tool. In addition, the approach enriches software modularization by optimizing the interactions among the components of the application, so that low coupling and high cohesion are achieved. The incorporation of the C3 factors (coupling, cohesion and contextual information) in this thesis can assist researchers and software engineering practitioners in developing new refactoring tools, approaches and ideas.

7.1.1 Guidelines to Use MMRUC3 Approach

The proposed MMRUC3 approach uses both dependency based (coupling and cohesion) and context based information to recommend move method refactorings. To get effective output from the approach, the source projects need to meet the following guidelines:

• Since the MMRUC3 approach uses contextual information based on an IR technique, the project should be written according to the object oriented (OO) naming conventions, such as camel case or Pascal case. For example, a method name should be calculateSalary rather than calculatesalary.

• The names of the entities should be readable and self-descriptive, so that the approach is able to distinguish each class effectively based on its context. For example, calculateSalary is a more understandable name for a method than calSalary.

• As context is an important factor of the approach, a project written in a non-readable form might not get effective results. For example, an entity named c does not provide any contextual information, and hence the approach does not provide accurate output for this type of project.

7.1.2 Threats to Validity

The internal and external threats to validity of this study are discussed in this section.

Threats to Internal Validity

Subjectivity in the categorization of project standards is inevitable due to the large manual effort involved in the experimental study. In addition, there might be human errors in analyzing the coding standards (naming conventions) of the source projects. These threats are mitigated by independently double-checking all manual work: the results are individually verified and agreed upon by two MSSE (Master of Science in Software Engineering) students. These threats could be reduced further by involving third-party people with experience in coding standards to verify the results.

Threats to External Validity

An external validity threat is that the relationships between the accuracy of MMRUC3 and the coding standards and sizes of projects are established on the basis of seven open-source projects. Applying the approach to more projects could minimize this threat, but there is a lack of projects with known refactorings.

7.2 Future Direction

The solutions and algorithms of this thesis can be extended in a few interesting directions.

• In this thesis, the coupling and cohesion based similarity factor and the contextual similarity factor have the same priority in the similarity measurement process of the MMRUC3 approach. The future plan is to assign priorities to the two similarity factors in order to analyze the effectiveness of each factor.

• As the contextual factor of an application provides useful information, the plan is to utilize this factor in other refactoring techniques, such as extract method and extract class, which depend on the contexts of the components of the application.

APPENDIX A

PRICE AND MOVIE CLASS OF VIDEOSTORE PROJECT

The class files for the smelly method getPrice(), which is moved from the class Price to the class Movie, are shown in the following figure.

(a) Class Price

(b) Class Movie

Figure A.1: Source Code Examples of VideoStore (Target Smelly Method getPrice())

BIBLIOGRAPHY

[1] B. Du Bois, S. Demeyer, and J. Verelst, “Refactoring-improving coupling and cohesion of existing code,” in Reverse Engineering, 2004. Proceedings. 11th Working Conference on, pp. 144–151, IEEE, 2004.
[2] N. Fenton and J. Bieman, Software metrics: a rigorous and practical approach. CRC Press, 2014.
[3] F. Martin, B. Kent, and B. John, “Refactoring: improving the design of existing code,” Refactoring: Improving the Design of Existing Code, 1999.
[4] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Transactions on software engineering, vol. 20, no. 6, pp. 476–493, 1994.
[5] R. Moser, P. Abrahamsson, W. Pedrycz, A. Sillitti, and G. Succi, “A case study on the impact of refactoring on quality and productivity in an agile team,” in Balancing Agility and Formalism in Software Engineering, pp. 252–266, Springer, 2008.
[6] D. Sjoberg, A. Yamashita, B. C. D. Anda, A. Mockus, T. Dyba, et al., “Quantifying the effect of code smells on maintenance effort,” Software Engineering, IEEE Transactions on, vol. 39, no. 8, pp. 1144–1156, 2013.
[7] S. Sharma and S. Srinivasan, “A review of coupling and cohesion metrics in object oriented environment,” International Journal of Computer Science & Engineering Technology (IJCSET), vol. 4, no. 8, 2013.
[8] A. Yamashita and L. Moonen, “To what extent can maintenance problems be predicted by code smell detection?–an empirical study,” Information and Software Technology, vol. 55, no. 12, pp. 2223–2242, 2013.


[9] A. Yamashita and L. Moonen, “Exploring the impact of inter-smell relations on software maintainability: An empirical study,” in Software Engineering (ICSE), 2013 35th International Conference on, pp. 682–691, IEEE, 2013.
[10] J. Al Dallal, “Incorporating transitive relations in low-level design-based class cohesion measurement,” Software: Practice and Experience, vol. 43, no. 6, pp. 685–704, 2013.
[11] L. Rising and F. W. Calliss, “Problems with determining package cohesion and coupling,” Software: Practice and Experience, vol. 22, no. 7, pp. 553–571, 1992.
[12] S. Demeyer, S. Ducasse, and O. Nierstrasz, Object-oriented reengineering patterns. Elsevier, 2002.
[13] M. Fokaefs, N. Tsantalis, and A. Chatzigeorgiou, “Jdeodorant: Identification and removal of feature envy bad smells.,” in ICSM, pp. 519–520, 2007.
[14] V. Sales, R. Terra, L. F. Miranda, and M. T. Valente, “Recommending move method refactorings using dependency sets.,” in WCRE, vol. 20, p. 13, 2013.
[15] R. Oliveto, M. Gethers, G. Bavota, D. Poshyvanyk, and A. De Lucia, “Identifying method friendships to remove the feature envy bad smell (nier track),” in Proceedings of the 33rd International Conference on Software Engineering, pp. 820–823, ACM, 2011.
[16] H. Liu, Y. Wu, W. Liu, Q. Liu, and C. Li, “Domino effect: Move more methods once a method is moved,” in Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, vol. 1, pp. 1–12, IEEE, 2016.
[17] N. Tsantalis and A. Chatzigeorgiou, “Identification of move method refactoring opportunities,” IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 347–367, 2009.
[18] F. Simon, F. Steinbruckner, and C. Lewerentz, “Metrics based refactoring,” in Software Maintenance and Reengineering, 2001. Fifth European Conference on, pp. 30–38, IEEE, 2001.
[19] F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, and D. Poshyvanyk, “Detecting bad smells in source code using change history information,” in Automated software engineering (ASE), 2013 IEEE/ACM 28th international conference on, pp. 268–278, IEEE, 2013.
[20] F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, D. Poshyvanyk, and A. De Lucia, “Mining version histories for detecting code smells,” IEEE Transactions on Software Engineering, vol. 41, no. 5, pp. 462–489, 2015.
[21] R. C. Martin, Agile software development: principles, patterns, and practices. Prentice Hall PTR, 2003.


[22] R. Subramanyam and M. S. Krishnan, “Empirical analysis of ck metrics for object-oriented design complexity: Implications for software defects,” IEEE Transactions on software engineering, vol. 29, no. 4, pp. 297–310, 2003.
[23] R. Moser, A. Sillitti, P. Abrahamsson, and G. Succi, “Does refactoring improve reusability?,” in International Conference on Software Reuse, pp. 287–297, Springer, 2006.
[24] T. Mens and T. Tourwé, “A survey of software refactoring,” Software Engineering, IEEE Transactions on, vol. 30, no. 2, pp. 126–139, 2004.
[25] “Refactoring catalog.” https://refactoring.com/. Online; Accessed 10 April, 2017.
[26] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, “Does code decay? assessing the evidence from change management data,” IEEE Transactions on Software Engineering, vol. 27, no. 1, pp. 1–12, 2001.
[27] O. Chaparro, G. Bavota, A. Marcus, and M. Di Penta, “On the impact of refactoring operations on code quality metrics,” in Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on, pp. 456–460, IEEE, 2014.
[28] D. Binkley, M. Ceccato, M. Harman, F. Ricca, and P. Tonella, “Automated refactoring of object oriented code into aspects,” in Software Maintenance, 2005. ICSM’05. Proceedings of the 21st IEEE International Conference on, pp. 27–36, IEEE, 2005.
[29] R. L. Glass, “Editor’s corner we have lost our way,” Journal of Systems and Software, vol. 18, no. 2, pp. 111–112, 1992.
[30] T. A. Standish, “An essay on software reuse,” IEEE Transactions on Software Engineering, no. 5, pp. 494–497, 1984.
[31] M. Alshayeb, “Empirical investigation of refactoring effect on software quality,” Information and software technology, vol. 51, no. 9, pp. 1319–1326, 2009.
[32] W. E. Wong and V. Debroy, “A survey of software fault localization,” Department of Computer Science, University of Texas at Dallas, Tech. Rep. UTDCS-45, vol. 9, 2009.
[33] G. Bavota, B. De Carluccio, A. De Lucia, M. Di Penta, R. Oliveto, and O. Strollo, “When does a refactoring induce bugs? an empirical study,” in Source Code Analysis and Manipulation (SCAM), 2012 IEEE 12th International Working Conference on, pp. 104–113, IEEE, 2012.
[34] E. Hieatt and R. Mee, “Going faster: Testing the web application,” IEEE software, vol. 19, no. 2, pp. 60–65, 2002.
[35] R. D. Banker and S. A. Slaughter, “A field study of scale economies in software maintenance,” Management science, vol. 43, no. 12, pp. 1709–1725, 1997.


[36] K. Stroggylos and D. Spinellis, “Refactoring–does it improve software quality?,” in Proceedings of the 5th International Workshop on Software Quality, p. 10, IEEE Computer Society, 2007.
[37] K. Beck, Extreme programming explained: embrace change. Addison-Wesley Professional, 2000.
[38] J. Kerievsky, “Refactoring to patterns, 2004.”
[39] E. Van Emden and L. Moonen, “Java quality assurance by detecting code smells,” in Reverse Engineering, 2002. Proceedings. Ninth Working Conference on, pp. 97–106, IEEE, 2002.
[40] F. A. Fontana, P. Braione, and M. Zanoni, “Automatic detection of bad smells in code: An experimental assessment.,” Journal of Object Technology, vol. 11, no. 2, pp. 5–1, 2012.
[41] “Code smells.” https://blog.codinghorror.com/code-smells/. Online; accessed 10 April, 2017.
[42] M. Kaur and R. Kaur, “Improving the design of cohesion and coupling metrics for aspect oriented software development,” International Journal of Computer Science and Mobile Computing, vol. 4, no. 5, pp. 99–106, 2015.
[43] R. Wirfs-Brock, B. Wilkerson, and L. Wiener, “Designing object-oriented software,” 1990.
[44] J. Eder, G. Kappel, and M. Schrefl, “Coupling and cohesion in object-oriented systems,” tech. rep., Citeseer, 1994.
[45] W. P. Stevens, G. J. Myers, and L. L. Constantine, “Structured design,” IBM Systems Journal, vol. 13, no. 2, pp. 115–139, 1974.
[46] J. Al Dallal, “Empirical analysis of the relation between object-oriented class lack-of-cohesion and coupling,” 2013.
[47] L. C. Briand, J. W. Daly, and J. K. Wust, “A unified framework for coupling measurement in object-oriented systems,” IEEE Transactions on software Engineering, vol. 25, no. 1, pp. 91–121, 1999.
[48] D. Boshnakoska and A. Mišev, “Correlation between object-oriented metrics and refactoring,” in International Conference on ICT Innovations, pp. 226–235, Springer, 2010.
[49] J. Al Dallal and L. C. Briand, “A precise method-method interaction-based cohesion metric for object-oriented classes,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 21, no. 2, p. 8, 2012.
[50] C. D. Manning, P. Raghavan, H. Schütze, et al., Introduction to information retrieval, vol. 1. Cambridge university press Cambridge, 2008.


[51] R. Baeza-Yates, B. Ribeiro-Neto, et al., Modern information retrieval, vol. 463. ACM press New York, 1999.
[52] P. Melville and V. Sindhwani, “Recommender systems,” in Encyclopedia of machine learning, pp. 829–838, Springer, 2011.
[53] N. Tsantalis, T. Chaikalis, and A. Chatzigeorgiou, “Jdeodorant: Identification and removal of type-checking bad smells,” in Software Maintenance and Reengineering, 2008. CSMR 2008. 12th European Conference on, pp. 329–331, IEEE, 2008.
[54] N. Tsantalis and A. Chatzigeorgiou, “Identification of extract method refactoring opportunities for the decomposition of methods,” Journal of Systems and Software, vol. 84, no. 10, pp. 1757–1782, 2011.
[55] M. Fokaefs, N. Tsantalis, E. Stroulia, and A. Chatzigeorgiou, “Identification and application of extract class refactorings in object-oriented systems,” Journal of Systems and Software, vol. 85, no. 10, pp. 2241–2260, 2012.
[56] P. H. A. Sneath and R. R. Sokal, Principles of numerical taxonomy. WH Freeman San Francisco, 1963.
[57] P. H. Sneath, R. R. Sokal, et al., Numerical taxonomy. The principles and practice of numerical classification. 1973.
[58] R. Marinescu, G. Ganea, and I. Verebi, “incode: Continuous quality assessment and improvement,” in Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on, pp. 274–275, IEEE, 2010.
[59] A. Hamid, M. Ilyas, M. Hummayun, and A. Nawaz, “A comparative study on code smell detection tools,” International Journal of Advanced Science and Technology, vol. 60, pp. 25–32, 2013.
[60] F. Palomba, A. De Lucia, G. Bavota, and R. Oliveto, “Anti-pattern detection: Methods, challenges, and open issues.,” Advances in Computers, vol. 95, pp. 201–238, 2015.
[61] S. Kimura, Y. Higo, H. Igaki, and S. Kusumoto, “Move code refactoring with dynamic analysis,” in Software Maintenance (ICSM), 2012 28th IEEE International Conference on, pp. 575–578, IEEE, 2012.
[62] C. Napoli, G. Pappalardo, and E. Tramontana, “Using modularity metrics to assist move method refactoring of large systems,” in Complex, Intelligent, and Software Intensive Systems (CISIS), 2013 Seventh International Conference on, pp. 529–534, IEEE, 2013.
[63] F. A. Fontana, M. V. Mäntylä, M. Zanoni, and A. Marino, “Comparing and experimenting machine learning techniques for code smell detection,” Empirical Software Engineering, vol. 21, no. 3, pp. 1143–1191, 2016.


[64] G. d. F. Carneiro, M. Silva, L. Mara, E. Figueiredo, C. Sant’Anna, A. Garcia, and M. Mendonca, “Identifying code smells with multiple concern views,” in Software Engineering (SBES), 2010 Brazilian Symposium on, pp. 128–137, IEEE, 2010.
[65] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American society for information science, vol. 41, no. 6, p. 391, 1990.
[66] M. M. Rahman, R. R. Riyadh, and M. R. Rahman, “Recommendation of move method refactorings using coupling, cohesion and contextual similarity,” in Imaging, Vision & Pattern Recognition (icIVPR), 2017 IEEE International Conference on, pp. 1–6, IEEE, 2017.
[67] G. Bavota, A. De Lucia, A. Marcus, and R. Oliveto, “A two-step technique for extract class refactoring,” in Proceedings of the IEEE/ACM international conference on Automated software engineering, pp. 151–154, ACM, 2010.
[68] D. Poshyvanyk, A. Marcus, R. Ferenc, and T. Gyimóthy, “Using information retrieval based coupling measures for impact analysis,” Empirical software engineering, vol. 14, no. 1, pp. 5–32, 2009.
[69] G. Bavota, R. Oliveto, A. De Lucia, G. Antoniol, and Y.-G. Gueheneuc, “Playing with refactoring: Identifying extract class opportunities through game theory,” in Software Maintenance (ICSM), 2010 IEEE International Conference on, pp. 1–5, IEEE, 2010.
[70] G. Bavota, M. Gethers, R. Oliveto, D. Poshyvanyk, and A. d. Lucia, “Improving software modularization via automated analysis of latent topics and dependencies,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 23, no. 1, p. 4, 2014.
[71] L. Ponisio and O. Nierstrasz, “Using contextual information to assess package cohesion,” tech. rep., Citeseer, 2006.
[72] M. Gethers and D. Poshyvanyk, “Using relational topic models to capture coupling among classes in object-oriented software systems,” in Software Maintenance (ICSM), 2010 IEEE International Conference on, pp. 1–10, IEEE, 2010.
[73] C. S. Horstmann and G. Cornell, Core Java 2: Volume I, Fundamentals. Pearson Education, 2002.
[74] “Byteparser.” https://github.com/rifatbit0401/ByteParser. Online; accessed 10 November, 2016.
[75] S. Niwattanakul, J. Singthongchai, E. Naenudorn, and S. Wanapu, “Using of jaccard coefficient for keywords similarity,” in Proceedings of the International MultiConference of Engineers and Computer Scientists, vol. 1, p. 6, 2013.


[76] C. H. Papadimitriou, Computational complexity. John Wiley and Sons Ltd., 2003.
[77] R. O. Duda, P. E. Hart, D. G. Stork, et al., Pattern classification, vol. 2. Wiley New York, 1973.
[78] K. H. Rosen, “Discrete mathematics and its applications,” AMC, vol. 10, p. 12, 2007.
[79] “Jdeodorant — eclipse plugins.” https://marketplace.eclipse.org/content/jdeodorant. Online; accessed 12 April, 2017.
[80] “Mars eclipse.” http://www.eclipse.org/mars. Online; accessed 14 October, 2016.
[81] “Javaparser - home.” http://javaparser.org/. Online; accessed 12 January, 2017.
[82] “masudshrabon/thesisonfecodesmell/source/bitbucket.” https://bitbucket.org/masudshrabon/thesisonfecodesmell/src. Online; accessed 19 December, 2015.
[83] “Jmove — applied software engineering research group.” http://aserg.labsoft.dcc.ufmg.br/jmove/. Online; accessed 20 March, 2017.
[84] S. Butler, M. Wermelinger, Y. Yu, and H. Sharp, “Relating identifier naming flaws and code quality: An empirical study,” in Reverse Engineering, 2009. WCRE’09. 16th Working Conference on, pp. 31–35, IEEE, 2009.
[85] F. Deißenbock and M. Pizka, “Concise and consistent naming [software system identifier naming],” in Program Comprehension, 2005. IWPC 2005. Proceedings. 13th International Workshop on, pp. 97–106, IEEE, 2005.
[86] C. D. Manning, H. Schütze, et al., Foundations of statistical natural language processing, vol. 999. MIT Press, 1999.
[87] D. Hull, “Using statistical testing in the evaluation of retrieval experiments,” in Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 329–338, ACM, 1993.
[88] J. Davis and M. Goadrich, “The relationship between precision-recall and roc curves,” in Proceedings of the 23rd international conference on Machine learning, pp. 233–240, ACM, 2006.
[89] E. Gamma, Design patterns: elements of reusable object-oriented software. Pearson Education India, 1995.
[90] J. Woodcock and J. Davies, Using Z: specification, refinement, and proof, vol. 39. Prentice Hall Englewood Cliffs, 1996.
