
Master Thesis Software Engineering Thesis no: MSE-2013:126 March 2013

Classifying Research on UML model inconsistencies with Systematic Mapping

Pavan Kumar Thalanki, Vinay Kiran Maddukuri

School of Computing, Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden

This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 30 weeks of full time studies.

Contact Information: Authors: Pavan Kumar Thalanki E-mail: [email protected] Vinay Kiran Maddukuri E-mail: [email protected]

University advisor(s): Dr. Ludwik Kuzniarz School of Computing (COM), BTH

School of Computing Blekinge Institute of Technology SE-371 79 Karlskrona Sweden

Internet: www.bth.se/com
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

ABSTRACT

Context: The Unified Modeling Language (UML) is a universal, standard modeling language that is used extensively in the software development process. Due to the overlapping and synchronous nature of the different modeling artefacts in UML, consistency issues have been identified in many software development projects, and these issues may lead to project failure. To reduce this threat, a substantial body of research addressing these problems has been carried out over the past decade in both academia and industry. This study investigates the reported research and provides a systematic picture of the researched aspects of UML model inconsistencies, using the systematic mapping method.

Objectives: The overall goal was achieved by fulfilling two main objectives: first, elaborating a proper and justified tool for performing the mapping; and second, using that tool to obtain a systematic and multidimensional picture of the approaches and the research performed on inconsistency issues that arise when UML is used in software development.

Research Methods: To ensure the quality of the final systematic picture of the conducted research, considerable effort was first put into preparing the tool used to obtain the mapping. The tool is a rigorous process based on classification methods and mapping guidelines obtained from a systematic literature review on systematic mapping in software engineering. The tool was then applied systematically to obtain a number of mappings, followed by an analysis of the results.

Results: The systematic literature review identified 5 mapping guidelines, 21 classifications, and 2 categorization methods. After analyzing them, a justified mapping process was developed by selecting standard guidelines and appropriate classifications and categorization methods. The mapping process, applied to the period 1999-2012, revealed 198 relevant studies authored by 321 researchers. On the basis of this evidence, a number of mappings illustrating the conducted research on UML model inconsistencies were obtained. The mapping revealed that the published research focuses mostly on rather formal issues such as semantic, syntactic, intra-model, inter-model and evolution problems, while less attention is paid to more practical problems such as time and security. Regarding the quality of research, 38% of papers proposed solutions and validated them in academia, industry or both, while 35% of papers only proposed solutions. Regarding the use of empirical methods, case studies are used most frequently (in almost half of the relevant papers), followed by experiments (reported in 15% of papers), while 25% of the works do not report a systematic method.

Conclusions: The findings of the systematic mapping study revealed that some aspects related to consistency, such as time and security, have not received much attention. Identification and in-depth study of inconsistencies in UML designs, along with their dependencies, are also missing. Most of the investigations are academic, with no evidence of whether they are of interest to industry. State-of-the-art studies followed by state-of-the-practice studies on consistency checking techniques, validated in real industrial settings, are recommended.

Keywords: UML, Consistency Problems, Systematic Mapping, Systematic Literature Review, modeling.


ACKNOWLEDGEMENT

We sincerely express our gratitude to our parents for their support, motivation, encouragement and forbearance during our master’s studies at BTH. We would also like to thank our supervisor, Dr. Ludwik Kuzniarz, for his support, advice and feedback throughout the work.

Special Thanks to – Sushmitha Thalanki Eva Norling Chaitanya Gurram Vakkalanka Sai Ram Srikar Madhira


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
2 RESEARCH DESIGN
  2.1 BACKGROUND AND RELATED WORK ON UML MODEL INCONSISTENCIES
  2.2 BACKGROUND AND RELATED WORK ON SYSTEMATIC MAPPING METHOD
  2.3 RESEARCH PROBLEM
    2.3.1 Aims and Objectives
    2.3.2 Research Questions
  2.4 RESEARCH METHODOLOGY
3 SYSTEMATIC LITERATURE REVIEW
  3.1 PHASE1: PLANNING REVIEW
    3.1.1 Step1: Need for review
    3.1.2 Step2: Define research scope
    3.1.3 Step3: Developing a review protocol
    3.1.4 Step4: Evaluating a review protocol
  3.2 PHASE2: CONDUCTING THE REVIEW
    3.2.1 Step5: Data retrieval process
    3.2.2 Step6: Study Selection Process
    3.2.3 Step7: Quality Assessment Criteria and Data Extraction Strategy
    3.2.4 Step9: Data analysis
  3.3 PHASE3: REPORTING THE REVIEW
    3.3.1 Step10: Results
    3.3.2 Reflection and Discussion
  3.4 VALIDITY THREATS FOR SLR
    3.4.1 Identification of data regarding ‘classification’
    3.4.2 Publication bias
    3.4.3 Reliability of collected data
    3.4.4 Judging inclusion/exclusion of primary studies
    3.4.5 Search strings
    3.4.6 Data extraction process
4 DEVELOPING SYSTEMATIC MAPPING PROCESS
  4.1 REASONS FOR ADAPTING THE SYSTEMATIC MAPPING PROCESS
  4.2 OVERVIEW OF SYSTEMATIC MAPPING PROCESS
    4.2.1 Step-1: Definition of research questions
    4.2.2 Step-2: Conduct search for primary studies
    4.2.3 Step-3: Study selection process
    4.2.4 Step-4: Classification scheme
    4.2.5 Step-5: Data extraction and Mapping
5 SYSTEMATIC MAPPING
  5.1 DEFINITION OF RESEARCH QUESTIONS
  5.2 CONDUCT SEARCH FOR PRIMARY STUDIES
    5.2.1 Stage-1: Automated search
    5.2.2 Stage-2: Manual search
    5.2.3 Stage-3: Snowball Sampling
  5.3 STUDY SELECTION PROCESS
  5.4 CLASSIFICATION SCHEME
    5.4.1 Keywording process
  5.5 DATA EXTRACTION
  5.6 SYSTEMATIC MAP AND ANALYSIS
    5.6.1 Rigor and Relevance
    5.6.2 Across Years
    5.6.3 Publication fora
    5.6.4 Most Active Researchers
    5.6.5 Research Type
    5.6.6 Research Method Type
    5.6.7 Contribution Type
    5.6.8 UML model inconsistencies
    5.6.9 Comparison between classifications of UML inconsistencies
    5.6.10 Analysis on additional data
  5.7 DISCUSSION
    5.7.1 General observations on our mapping results
    5.7.2 Consistency problems
  5.8 VALIDITY THREATS
    5.8.1 Publication bias
    5.8.2 Reliability of results
    5.8.3 Classification
    5.8.4 Study Selection and Data Extraction
6 CONCLUSION
  6.1 FUTURE WORK
7 REFERENCES
8 APPENDIX
  8.1 REFERENCES FOR SLR PRIMARY STUDIES
  8.2 REFERENCES FOR SMS PRIMARY STUDIES
  8.3 GLOSSARY TERMS
  8.4 SYSTEMATIC LITERATURE REVIEW
    8.4.1 Snowball sampling process:
  8.5 SYSTEMATIC MAPPING
    8.5.1 List of Conference Venues:
    8.5.2 List of Workshops:
    8.5.3 List of Journal Venues:
    8.5.4 Mapping Study – Relevance–Rigor results:
    8.5.5 Contributions on UML model inconsistencies:
    8.5.6 Reliability of inclusion decisions:
    8.5.7 Overview of systematic mapping
    8.5.8 List of classifications used in SMSs


LIST OF TABLES

TABLE 2.1: SYSTEMATIC MAPPING VS. SYSTEMATIC LITERATURE REVIEW
TABLE 3.1: SEARCH STRING FOR MINI LITERATURE SEARCH
TABLE 3.2: PICOC CRITERIA
TABLE 3.3: SEARCH TERMS COLLECTION
TABLE 3.4: DATA SOURCES
TABLE 3.5: BASIC INCLUSION/EXCLUSION CRITERIA
TABLE 3.6: DETAILED INCLUSION/EXCLUSION CRITERIA
TABLE 3.7: INITIAL SEARCH STRING
TABLE 3.8: FINAL KEYWORDS AND SEARCH STRING
TABLE 3.9: KAPPA AGREEMENT VALUES [27]
TABLE 3.10: AGREEMENT VALUES FOR 20 SAMPLE PAPERS
TABLE 3.11: QUALITY ASSESSMENT CHECKLIST
TABLE 3.12: QUALITY AND THEIR DESCRIPTION
TABLE 3.13: QUALITY ASSESSMENT RESULTS
TABLE 3.14: CLUSTERING PRIMARY STUDIES ACCORDING TO QUALITY
TABLE 3.15: DATA EXTRACTION FORM FOR PRIMARY STUDIES
TABLE 3.16: CATEGORIZING EXTRACTED DATA INTO CODES
TABLE 3.17: TRANSLATING CODES INTO THEMES
TABLE 3.18: SUMMARY ABOUT MAPPING GUIDELINES
TABLE 3.19: SUMMARY OF CLASSIFICATION SCHEMES
TABLE 3.20: OVERVIEW OF CLASSIFICATION FACETS
TABLE 3.21: CRITERIA APPLIED ON IDENTIFIED SYSTEMATIC MAPPING GUIDELINES
TABLE 3.22: SELECTED CLASSIFICATIONS MAPPED WITH RQ’S
TABLE 3.23: CRITERIA FOR SELECTING RESEARCH METHOD FACET
TABLE 5.1: SEARCH TERMS COLLECTION TABLE
TABLE 5.2: DATA SOURCES AND WEB LINKS
TABLE 5.3: INITIAL SEARCH STRING
TABLE 5.4: FINAL SEARCH STRING
TABLE 5.5: LIST OF STUDIES FOUND IN DATA SOURCES
TABLE 5.6: MANUAL SEARCH VENUES
TABLE 5.7: INCLUSION AND EXCLUSION CRITERIA
TABLE 5.8: KAPPA AGREEMENT VALUES [27]
TABLE 5.9: AGREEMENT VALUES FOR 30 SAMPLE PAPERS
TABLE 5.10: CONSISTENCY PROBLEMS FACET
TABLE 5.11: CONTRIBUTION TYPE FACET
TABLE 5.12: CLASSIFICATION FACETS AND THEIR DESCRIPTION
TABLE 5.13: RESEARCH TYPE FACET ADAPTED FROM [35]
TABLE 5.14: RESEARCH METHOD FACET ADAPTED FROM [32]
TABLE 5.15: OVERVIEW OF CLASSIFICATIONS OF UML MODEL INCONSISTENCIES
TABLE 8.1: TERMS AND THEIR ABBREVIATIONS
TABLE 8.2: LITERATURE OBTAINED IN VARIOUS DATA SOURCES
TABLE 8.3: DATA EXTRACTION FOR SYSTEMATIC MAPPING STUDIES (PRIMARY STUDIES)
TABLE 8.4: RESEARCH DONE ON SYSTEMATIC MAPPING METHOD
TABLE 8.5: AGREEMENT VALUES FOR 200 SAMPLE PAPERS
TABLE 8.6: OVERVIEW OF CLASSIFICATION SCHEMES
TABLE 8.7: LIST OF CONFERENCES
TABLE 8.8: LIST OF WORKSHOPS
TABLE 8.9: LIST OF JOURNALS
TABLE 8.10: RELEVANCE-RIGOR RESULTS
TABLE 8.11: PAPERS RELATED TO TOOLS (CONTRIBUTION) WITH THEIR MEAN RELEVANCE AND RIGOR
TABLE 8.12: PAPERS RELATED TO PLUG-IN (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.13: PAPERS RELATED TO LANGUAGE (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.14: PAPERS RELATED TO ALGORITHMS (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.15: PAPERS RELATED TO METHODS (CONTRIBUTION) WITH THEIR RELEVANCE, RIGOR
TABLE 8.16: PAPERS RELATED TO FRAMEWORK (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.17: PAPERS RELATED TO LESSONS LEARNED (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.18: PAPERS RELATED TO RULES (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.19: PAPERS RELATED TO PROCESS (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.20: PAPERS RELATED TO ADVICES AND IMPLICATIONS (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.21: PAPERS RELATED TO THEORY (CONTRIBUTION) WITH THEIR RELEVANCE AND RIGOR
TABLE 8.22: AGREEMENT VALUES FOR 80 SAMPLE PAPERS
TABLE 8.23: SYSTEMATIC MAPPING OVERVIEW
TABLE 8.24: MAPPING CLASSIFICATIONS OVER WITH CLASSIFICATION FACETS AND PRIMARY STUDIES


LIST OF FIGURES

FIGURE 1.1: A SIMPLE EXAMPLE FOR INCONSISTENCY
FIGURE 1.2: DOCUMENT STRUCTURE
FIGURE 2.1: OUTLINE OF RESEARCH METHODOLOGY
FIGURE 3.1: SLR PROCESS
FIGURE 3.2: SYSTEMATIC SEARCH PROCESS
FIGURE 3.3: STUDY SELECTION PROCESS
FIGURE 3.4: THEMATIC PROCESS ADAPTED FROM [23]
FIGURE 3.5: PRIMARY STUDIES PLOTTED ACCORDING TO PUBLICATION TYPE
FIGURE 3.6: SYSTEMATIC MAPPING STUDIES IN SOFTWARE ENGINEERING
FIGURE 3.7: USAGE OF MAPPING GUIDELINES
FIGURE 3.8: SYSTEMATIC MAPPING GUIDELINES ALONG WITH PUBLICATION YEAR
FIGURE 3.9: POSSIBLE METHODS FOR CONSTRUCTING/ADOPTING A CLASSIFICATION
FIGURE 4.1: OVERVIEW OF SYSTEMATIC MAPPING PROCESS ADAPTED FROM [13]
FIGURE 5.1: SYSTEMATIC SEARCH PROCESS
FIGURE 5.2: PRIMARY STUDY SELECTION PROCESS
FIGURE 5.3: DISTRIBUTION OF ARTICLES ACROSS AUTOMATED SEARCH, MANUAL SEARCH, AND SNOWBALL SAMPLING
FIGURE 5.4: KEYWORDING PROCESS ADOPTED FROM PETERSEN ET. AL. [13]
FIGURE 5.5: TAG-CLOUD VISUALIZATION FOR COLLECTED KEYWORDS
FIGURE 5.6: RELEVANCE – RIGOR MAP
FIGURE 5.7: PUBLICATION DISTRIBUTION AND THEIR RIGOR, RELEVANCE AMONG YEARS
FIGURE 5.8: SYSTEMATIC MAP – YEAR VS. CONSISTENCY PROBLEMS
FIGURE 5.9: PUBLICATION DISTRIBUTION ON UML MODEL INCONSISTENCIES
FIGURE 5.10: TOP JOURNALS ON UML MODEL INCONSISTENCIES
FIGURE 5.11: TOP CONFERENCES ON UML MODEL INCONSISTENCIES
FIGURE 5.12: TOP WORKSHOPS ON UML MODEL INCONSISTENCIES
FIGURE 5.13: MOST ACTIVE RESEARCHERS ON UML MODEL INCONSISTENCIES
FIGURE 5.14: GEOGRAPHICAL DISTRIBUTION OF THE WORK ON UML MODEL INCONSISTENCIES
FIGURE 5.15: DISTRIBUTION OF RESEARCH TYPE
FIGURE 5.16: AVERAGE RELEVANCE AND RIGOR OVER RESEARCH TYPES
FIGURE 5.17: SYSTEMATIC MAP – RESEARCH TYPE FACET VS. CONSISTENCY PROBLEMS FACET
FIGURE 5.18: RESEARCH METHODS USED
FIGURE 5.19: SYSTEMATIC MAP – RESEARCH METHOD FACET VS. CONSISTENCY PROBLEMS FACET
FIGURE 5.20: STUDY SETTING
FIGURE 5.21: SYSTEMATIC MAP – CONTRIBUTION TYPE FACET VS. CONSISTENCY PROBLEMS FACET
FIGURE 5.22: UML TOOL VERSIONS
FIGURE 5.23: TACKLED DIAGRAMS
FIGURE 8.1: SNOWBALL SAMPLING CRITERIA
FIGURE 8.2: THEMES USED TO DRAW CONCLUSIONS FOR SYSTEMATIC MAPPING
FIGURE 8.3: ANALYSIS TECHNIQUES USED IN SMSS


1 INTRODUCTION

A software design process is a structure imposed on the development of a software product. To handle the software design process effectively, one needs to know and understand software process modeling in order to describe the relationships between the products, resources, activities, and other data involved in the software process. To achieve a high-quality end product, software developers should work according to the selected and implemented software process model [1]. Modeling is a technique used very frequently in software engineering research and practice, and it is also the rationale for the Model Driven Development (MDD) paradigm. Software projects pay close attention to models, which are commonly accepted as the main, and frequently the only, type of artifact considered and developed within the development process.

UML stands for Unified Modeling Language. The language allows rich expression and is used to make software blueprints [1]. The software industry has accepted UML as a commonly used language for object-oriented modeling. The interest in UML is reflected in the number of publications exploring the properties and usage of the language, and especially the inconsistency issues related to its use. This interest is also visible in events devoted to examining and discussing consistency, such as the annual UML-related conferences, which present ongoing and planned research. UML was approved as an industry standard for modeling software-intensive systems by the International Organization for Standardization (ISO) [4] in the year 2000. The primary developers of UML were Jim Rumbaugh, Ivar Jacobson, and Grady Booch, who formerly had their own competing methods: OMT, OOSE, and Booch. They eventually combined their methods into a single unified notation and produced an open standard called UML [1] [9].
UML is a “visual language used for specifying, constructing and documenting the artifacts of a system” [1]. Large, complex software designs are difficult to describe textually, but they can be conveyed through UML diagrams. UML provides three key attributes (visualization, complexity management, and clear communication), which make it easier to describe and design complex software systems [5] [6]. Software systems have been built using the object-oriented (OO) style [2] [7]. UML facilitates early evaluation of the system; it therefore reduces the number of defects in the final code and improves the productivity of the development team. UML is a language, not a methodology. It comprises a number of diagrams that are used in a given development process and make it easier to understand the essential properties of the application under development. Each diagram provides a different look at, or perspective on, the system under design. The essential and most useful standard diagrams are the use case diagram, class diagram, sequence diagram, state chart diagram, activity diagram, component diagram, deployment diagram and object diagram [1] [8] [SMS 25].

When modeling with UML, the first step is to identify and describe the process requirements, i.e. to analyze the system, e.g. with a use case diagram. The second step is to define and describe the important elements of the process model, e.g. with a class diagram or an activity diagram. The third step is to describe the activities and the execution of the elements, e.g. in the form of a state chart diagram or a sequence diagram [1] [8]. The UML definition includes a combination of semantic and syntactic rules that help in understanding and rendering models.
Some rules are formally stated [18] [19] to maintain the well-formedness of the models, and the remaining rules are informally stated to give models more expressiveness and flexibility at different levels of abstraction. The main advantage of UML is its ability to express different types of design aspects, ranging from structural diagrams to behavioral diagrams. Because of these multiple views, there is a danger that the different partial models produced during the development process can be “inconsistent”.

Both researchers and practitioners have investigated the notion of consistency. A substantial amount of research work related to consistency has been published in various forums, but a classification of UML model inconsistencies based on those research works is lacking [SMS 08], [SMS 19]. According to the online dictionary [53], consistency means “a harmonious uniformity or agreement among things or parts”. Based on that definition, inconsistency has been defined from the UML perspective as “lack of uniformity among UML diagrams leads to miscommunication among software designers”, whereas [SMS 91] proposed the definition “a state in which two or more diagrams of UML model portray contradicting or conflicting”.

As a simple example of inconsistency, consider a class diagram and a sequence diagram in UML (see Figure 1.1), where the sequence diagram shows that Class A invokes Method_Z in Class B. However, in the class diagram Class B has no Method_Z. In this case, the sequence diagram is said to be inconsistent with the class diagram.
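
The class diagram versus sequence diagram example can be sketched as a small check. This is only an illustrative sketch, not a technique taken from the surveyed studies: the `Message` type, the `find_missing_operations` function, the dictionary representation of the class diagram, and the methods `Method_X` and `Method_Y` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One call in a sequence diagram: sender invokes method on receiver."""
    sender: str
    receiver: str
    method: str


def find_missing_operations(class_diagram, messages):
    """Return the messages whose receiver class does not declare the invoked method.

    class_diagram maps a class name to the set of method names it declares
    (a simplified, hypothetical stand-in for a real UML class diagram).
    """
    inconsistencies = []
    for msg in messages:
        declared = class_diagram.get(msg.receiver, set())
        if msg.method not in declared:
            inconsistencies.append(msg)
    return inconsistencies


# Hypothetical models mirroring the example: Class B declares no Method_Z.
class_diagram = {
    "A": {"Method_X"},  # Method_X is an assumed placeholder
    "B": {"Method_Y"},  # Method_Y is an assumed placeholder
}
sequence_diagram = [Message("A", "B", "Method_Z")]

for msg in find_missing_operations(class_diagram, sequence_diagram):
    print(f"Inconsistent: {msg.sender} calls {msg.receiver}.{msg.method}, "
          f"which is not declared in class {msg.receiver}")
```

The check covers only this one kind of inter-diagram mismatch; a real consistency checker would operate on the UML meta-model rather than on plain dictionaries.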

Figure 1.1: A simple example for inconsistency

Speaking about ‘inconsistency’, we come across a very important question: why do we examine inconsistency in UML artifacts? The first motivation is to maintain correctness between different levels of abstraction and different artifacts during the software development process. The second motivation is implementability, which usually involves generating precise and unambiguous programming language code from a UML model [SMS 29]. Based on these two motivations, researchers started to investigate UML model inconsistencies. Defining inconsistency is one task and detecting it is another. Inconsistency in a design can lead to an improper design of the complete system, which may result in failures at all test levels, so that the designed system cannot be guaranteed to meet customer needs [9]. In addition, UML may be misused in the typical iterative development process, where the model is built incrementally, beginning with the requirements and finishing with the code. The software development process has several levels and involves a number of engineers with different skills [SMS 37]. Without resolving consistency issues, it is hard to construct a good-quality model, and this may lead to project termination. In the past decade there has been growing interest in, and reported research on, UML model inconsistencies. This thesis is an attempt to provide a systematic view of the published research carried out in that area.


The rest of this thesis is structured as shown in Figure 1.2.

Research Design
  Background and Related work: Relevant information and reports of other work regarding UML model inconsistencies.
  Background and Related work: Relevant information and reports of other work regarding the systematic mapping method.
  Research Problem: Describes the research objectives and RQs.
  Research Methodology: Description, rationale, and execution of the research process.

Systematic Literature Review
  Planning Phase: Developing and evaluating a review protocol.
  Conducting Phase: Describes the identification, refinement, and assessment of studies.
  Reporting Phase: Describes the data analysis, results, discussion, and limitations.

Developing Systematic Mapping Process
  Reasons: Motivation for making/changing the systematic mapping process.
  Overview: Description of our systematic mapping process.

Systematic mapping: Describes the execution of our mapping process and the analysis of maps on UML model inconsistencies. Reports the reflections and discussion of results, and the limitations of the systematic mapping.

Conclusions: Significant findings, answers to the RQs, generalizability of results, and possible future work.

Appendix: Presents the final primary studies for the SLR and SMS, long tables, and complementing material in support of the research.

Figure 1.2: Document structure


2 RESEARCH DESIGN

2.1 Background and Related Work on UML model inconsistencies

Consistency was recognized early as a crucial aspect of modeling, particularly in UML [SMS 95]. Several workshops have been organized to address various kinds of inconsistencies [14], [20], and researchers have published numerous works on different kinds of consistency problems. A two-dimensional view model that describes and addresses consistency is presented in [SMS 95]. According to that model, there are two dimensions associated with consistency: consistency between models at the same level of abstraction, known as ‘horizontal’ or ‘intra-model’ consistency, and consistency between models at different levels of abstraction, known as ‘vertical’ or ‘inter-model’ consistency. In [21], the inconsistency problem in UML is analyzed by specifying a classification and a formal specification of inconsistencies, and a method to find and resolve them is introduced. There exist different types of UML model inconsistencies, but only syntactic, semantic, intra-model, and inter-model inconsistencies have been recognized in the past decade [18], [SMS 41].

Syntactic: This type of inconsistency occurs when a model does not conform to the syntax of the UML meta-model (a meta-model is a schema that defines the rules for constructing a model), or when the same model element is defined and used in different ways in different diagrams of the same model.

Semantic: This type of inconsistency occurs when the same element of the model is assigned different meanings in different views or versions of the same model. It also occurs due to ambiguous definitions of UML semantics. According to [SMS 96], ambiguity is a double-edged sword: on the one hand, it provides the flexibility to express the design at a higher level of abstraction without committing to details; on the other hand, it complicates consistency analysis.
Intra-model: This type of inconsistency occurs when there is a lack of uniformity between different elements of a UML model or between different models at the same level of abstraction. It is also called horizontal inconsistency. This type of inconsistency typically concerns the static and dynamic views of the model.

Inter-model: This type of inconsistency occurs when there is an incompatibility between UML models at different levels of abstraction. It is also called vertical inconsistency.

Several research papers explore not only consistency problems in UML but also methodologies and processes for managing consistency in UML. For instance, a rule-based model that helps in detecting inconsistencies in UML is presented in [SMS 100]. The inconsistencies were identified by defining a production system language and rules specific to software designs modeled in UML. Another example is an algebraic proposal for tackling UML consistency presented in [SMS 112]. In that paper, the inconsistencies were expressed as algebraic specifications, and as many inconsistencies as possible were then reduced by the specification method [22]. There are also numerous papers that report research on UML model inconsistencies. The most significant of them are summarized below.

Hassan et al. in [SMS 104] proposed an approach for identifying and preventing inconsistency and incompleteness across multiple views of a software design, in particular class diagrams, use case diagrams, sequence diagrams, and state chart diagrams. In order to evaluate this approach, a detailed case study was carried out using COMET, a method used to define rules describing the mapping between these different views; it uses the UML notation to express OCL constraints enhanced with accomplishing clauses.

Feng et al. in [SMS 106] performed a case study to discover intra-model inconsistencies in the UML model of a chat room constructed from the initial requirements. Only class diagrams, sequence diagrams, and state chart diagrams of UML are frequently used at different levels of the development process. A component-based approach is used to form the model standard. The development of the chat room model reveals several consistency problems. Some of them are successfully detected by automated checking and resolved; the remaining consistency problems are not solved.

Lange et al. in [SMS 107] investigated the degree of inconsistency and incompleteness in UML designs and its impact on software engineering projects. They found that several types of inconsistency often occur in the design process and that quantifying the inconsistency in a design has an impact on the design process. To resolve these problems, they developed a number of techniques for analyzing UML models.

Huzar et al. in [SMS 41] presented initial general considerations on consistency and discussed related consistency problems. The workshop papers are summarized in order to categorize the approaches that are considered solutions to UML model inconsistencies. This helps both practitioners and researchers in finding specific guidelines or solutions for a specific problem.

Kuzniarz et al. in [SMS 105] discussed typical inconsistencies encountered in student designs. A sample instructive development process is applied in projects during a course on the introduction to Object Oriented (OO) software development with UML. The inconsistency problems are pointed out in several situations using ‘common sense’ judgments. This simple process has a set of rules that help to prevent inconsistency problems in several design situations.

Lucas et al. in [SMS 14] used a systematic literature review to find articles related to new and existing consistency problems. These articles are refined, extracted, and analysed, together with open problems, trends, and future research, in order to help address consistency problems in UML-based software development. In-depth details of a formal approach are presented in the article; the approach is used to handle inconsistency problems and to overcome the identified limitations.

Kuzniarz et al. in [SMS 120] revealed the importance of consistency perspectives and issues at different levels of the development process. The purpose of the voting scheme is to extend the existing research classification framework by including the necessity of the classification elements. A survey was conducted with a sample of 24 stakeholders from academia and industry with different roles. Overall, the pragmatic perspective scored a high level of importance, and these problems are most significant for practitioners from industry and tool builders, while some issues from the concepts perspective are important for respondents from academia. Several workshops were also conducted on consistency issues in UML [14].

After investigating the above literature, it is clear that:
1. Existing classifications of UML model inconsistencies are based on surveys and experiments, but not on state-of-the-art literature (published research).
2. An overview of the state and progress of research is missing in the current literature, and a characterization of published research on UML model inconsistencies appears to be lacking.


Thus, we have selected the systematic mapping method to address the above two research problems. The selection of the research method was based on two motivations. First, due to the large increase in publications, it is desirable to provide an overview and summary of the reported research on UML model inconsistencies. Second, a systematic mapping study often provides a visual summary, the map, which may expose where relevant, high-quality work is lacking. Since 2008, the systematic mapping method has covered areas of software engineering such as Global Software Engineering (GSE), software ecosystems, software reuse, object-oriented software design, software engineering itself, and software testing [SLR 29] [SLR 03] [SLR 17] [SLR 34]. We plan to conduct a systematic mapping in order to provide as complete a structure as possible of the research carried out in the area of UML inconsistency, together with a systematic and visual picture of that research. Such maps may help to identify research gaps and directions.

2.2 Background and Related work on Systematic Mapping Method

In 1968, the term “Software Engineering” was proposed in the debate on software development problems at a NATO conference [41]. Nowadays software engineering plays a significant role in building software, applications, tools, etc. Still, many problems arise with complex software, which lead to project failure or to more cost and time than estimated [41]. To minimize these problems, software engineers must be educated in the software engineering discipline, and a thorough investigation is required to find possible solutions to newly arising problems. A research method is a technique or method used for conducting research [42] [43], while a research methodology is a design used to propose solutions in a systematic way. Researchers should have sufficient knowledge of the selected methods before applying them to the research scope. There are three types of research methods: primary, secondary, and tertiary study methods. Dawson [45] defined four primary study methods/techniques that are commonly used, viz., experiment, survey, case study, and action research, although various other methods/techniques exist in the SE research community. Most SE researchers prefer to use secondary study methods such as systematic literature review and systematic mapping, whereas only one tertiary study method is available in Evidence-Based Software Engineering (EBSE). According to Kitchenham et al. [16], a primary study is an empirical investigation in which direct knowledge is gained on a specific objective or problem (whether by survey, experiment, case study, etc.). A secondary study examines and aggregates data from all primary studies relating to a specific phenomenon in order to generate stronger evidence. A tertiary study is a review of secondary studies that examines and aggregates knowledge from the secondary studies relating to a particular phenomenon.
In this study our primary focus is on a secondary study method, viz., systematic mapping, which is also referred to as a “scoping study”. The term was proposed by Kitchenham in the SLR guidelines [16]. According to a web dictionary, systematic means “characterized by order and planning” [44]. In a mathematical context, mapping is defined as “A mathematical relation such that each element of a given set (the domain of the function) is associated with an element of another set (the range of the function)” [44]. Combining both definitions, we define systematic mapping as “A study that provides a structure of primary studies by classifying them to specify an overview of present, past, and future on particular phenomenon”.


Systematic mapping is a method that is widely used in applied science fields but has been neglected in the SE field. In 2007, Bailey adapted a mapping method from medical research [SLR 12] (“meta-narrative mapping”) to the research topic of object-oriented software design [29]. This is the first empirical evidence of the mapping method being introduced and used in the SE field. Kitchenham et al. [16] provided the definition and also pointed out the differences between SLR and systematic mapping. Based on our understanding, Table 2.1 summarizes the differences between systematic mapping and SLR [16].

Table 2.1: Systematic mapping vs. systematic literature review

No. | Systematic mapping | Systematic literature review
1 | It has broader research questions. | It has narrow research questions.
2 | Fewer search terms are needed to formulate a search string. | More search terms are needed to formulate a search string.
3 | The data extraction process is much broader. | The data extraction process is less broad compared to mapping.
4 | In-depth analysis techniques, viz., meta-analysis and narrative analysis, are unlikely to be used for synthesis. | All types of analysis techniques can be used for synthesizing SLR results.
5 | Dissemination of systematic mapping results is limited. | Dissemination of SLR results is unlimited.
6 | It delivers a broad, coarse-grained overview. | It provides a limited overview.
7 | Less effort is required. | More effort is required.
8 | It cannot show where particular evidence is lacking or inadequately reported in existing studies. | It shows where particular evidence is lacking or inadequately reported in existing studies.
9 | Quality assessment is not required. | Quality studies are required to answer the research questions.
10 | Strict classifications are needed for building a map. | Visualizations can be produced based on the articles.

Performing systematic mapping on a specific topic requires pre-defined guidelines, which were comprehensively produced and suggested by Petersen with respect to [SLR 12]. We then thoroughly reviewed three other mapping guidelines [13], [16], [28] that are available for use, and recognized the importance of classification scheme(s)/facet(s) in the mapping process. Regarding the classification process, many papers use the terms ‘classification scheme’ and ‘classification facet’. To make these terms clear, their definitions are given below.

Classification scheme: An automated or manual process used to construct a group of categories that helps in classifying papers from different perspectives. For instance, keywording of abstracts is a process used to define new classifications [13].

Classification facet: It allows papers to be sorted with the aid of existing classifications, and it requires less effort. For example, a collection of papers can be classified using a research type facet (e.g., Wieringa et al.), a research method facet (e.g., Easterbrook et al.), etc.

The objective of systematic mapping is to summarize and provide an overview of research articles and results quantitatively [13]. This method makes it easier for researchers to find new research gaps and directions on a particular topic, even though the number of articles is larger than in an SLR. The following research papers indicate the importance of systematic mapping. It is also worth noting that very few researchers have experimentally applied other mapping guidelines and classification schemes/facets in the field of software engineering; the relevant studies are summarized below.

Budgen et al. in [SLR 51] conducted an informal review of a number of mapping studies to assess the effectiveness of systematic mapping when used in the SE field, and to specify the challenges that occur while implementing systematic mapping. They explored some areas of the SE field, namely unit testing techniques, software cost estimation, object-oriented design, UML, and software design patterns. They also identified that the mapping process is good at presenting primary studies but poor in terms of classification guidelines. Moreover, they stated that using the systematic mapping method is an especially big challenge for inexperienced researchers.

Kitchenham et al. in [SLR 54] used a participant-observer case study to evaluate the importance of systematic mapping in software engineering. In this study, the researchers identified both advantages and disadvantages of research on a specific topic that is based on the outcomes of a preceding mapping study. The researchers also stated the differences between SLR and systematic mapping. They claimed that the advantage of a mapping study is that it reduces the time and effort SE researchers need to find new research scopes. On the basis of the disadvantages, they suggested guidelines for developing a high-quality mapping study: the mapping process should contain a rigorous search process including both automated and manual searches, and well-defined, reliable classification scheme(s)/facet(s) are required.

Bailey et al. in [SLR 12] conducted a systematic mapping study on the topic of object-oriented software design. This research paper was the first one to use the systematic mapping method in the field of SE.
However, they did not use any classification scheme/facet, but classified the papers according to research method, publication type, and comparison form. The researchers did not clearly motivate how these classifications were chosen. Although their intention was to use the Wieringa et al. [35] framework for the research type perspective, they decided to defer it to further study. Finally, they succeeded in identifying future research gaps in the area of object-oriented software design.

Elberzhager et al. in [SLR 10] executed systematic mapping on software testing in order to identify existing approaches that help to reduce testing effort. The systematic mapping process was implemented according to the Petersen guidelines. For classification purposes, the researchers used only the keywording process and did not apply the Wieringa classification facet.

Pretorius et al. in [SLR 28] implemented systematic mapping on UML to classify the identified empirical studies related to UML. To perform the systematic mapping process smoothly, the researchers adapted the Kitchenham guidelines [16]. They then focused on selecting classifications to classify evidence with respect to the research problem. The keywording process [13] was not performed because the researchers felt that building a new classification for a specific area based on poor-quality abstracts could introduce more defects. The researchers therefore classified the relevant literature according to publication type, research method, and research focus.

After investigating the above literature, we found that an SLR on systematic mapping studies was missing. In order to gain knowledge of existing classifications, mapping guidelines, and classification methods, we conducted research on software engineering systematic mapping studies with the help of an SLR. After identifying the mapping guidelines and classification methods, we will select those that are appropriate and suitable for our research topic.
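The quantitative overview that a systematic map provides boils down to counting papers per combination of facet categories. The following Python sketch illustrates this under stated assumptions: the paper records, the facet names `research_type` and `research_method`, and the category values are hypothetical sample data, not results of this thesis.

```python
from collections import Counter

# Hypothetical papers, each tagged with one category per classification facet.
papers = [
    {"id": "P1", "research_type": "solution proposal", "research_method": "case study"},
    {"id": "P2", "research_type": "solution proposal", "research_method": "experiment"},
    {"id": "P3", "research_type": "evaluation research", "research_method": "case study"},
    {"id": "P4", "research_type": "solution proposal", "research_method": "case study"},
]


def count_by_facets(papers, facet_x, facet_y):
    """Count papers for every (facet_x category, facet_y category) pair."""
    return Counter((p[facet_x], p[facet_y]) for p in papers)


counts = count_by_facets(papers, "research_type", "research_method")
for (research_type, method), n in sorted(counts.items()):
    print(f"{research_type:20} | {method:10} | {n}")
```

Counts of this kind are what a bubble plot or similar map visualizes; cells with a count of zero point to possible research gaps.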

2.3 Research Problem

The study was intended to provide an overview of published research on UML model inconsistencies.

2.3.1 Aims and Objectives

The aim of the study was to investigate research on UML model inconsistencies using the systematic mapping method and to provide a systematic picture of the research done so far in order to better understand what has been done in the investigated area. The aim was to be achieved by fulfilling the following objectives:

Objective-1: Elaborating an appropriate specific systematic mapping method to be used in further research
• Understanding the systematic mapping method based on published studies.
• Identifying different mapping guidelines and classifications from the published studies.
• Analysing the findings in order to select appropriate mapping guidelines and classifications that constitute a proper and justified process for conducting systematic mapping on consistency problems.

Objective-2: Applying the elaborated systematic mapping in order to get a systematic structured picture of the research conducted so far
• Finding publications related to UML model inconsistencies.
• Finding out the quality of published papers.
• Identifying main publication forums for UML model inconsistencies.
• Determining most active authors in UML model inconsistencies.
• Finding publications relevant to UML model inconsistencies across the years.
• Identifying research methods which have been implemented in their research.
• Determining types of research that have been conducted on UML model inconsistencies.
• Identifying primary contributions to the topic.
• Performing an analysis of systematic mapping results.
• Finding hot research topics and research gaps.
• Identifying the commonalities and differences between existing and identified classifications of UML model inconsistencies.

2.3.2 Research Questions

The objectives will be achieved by answering the following research questions:

RQ-1: How should the systematic mapping method be performed in the context of this thesis?
RQ-1.1: What research associated with systematic mapping in SE has been published?
RQ-1.2: Which mapping guidelines, classifications and categorization methods were used when performing systematic mapping?
RQ-1.3: Which mapping guidelines, classifications and categorization methods are most appropriate for conducting systematic mapping on UML model inconsistencies?
RQ-2: How can the research on UML model inconsistencies be mapped?
RQ-2.1: What is the rigor and relevance of the identified publications?
RQ-2.2: Which publication fora are the main targets?
RQ-2.3: Who are the most active authors?
RQ-2.4: How are the publications distributed across the years?
RQ-2.5: What empirical research methods are used?
RQ-2.6: What type of research has been conducted?
RQ-2.7: What kind of contribution has been made?
RQ-3: What observations can be drawn from the mappings of UML model inconsistencies?
RQ-4: What are the commonalities and differences between existing and identified classifications of UML model inconsistencies?

2.4 Research Methodology

This section provides an overview of our research process and lists the selected research methods. Our research process is driven by the research questions and is visually represented in Figure 2.1. In general, several research methods, such as literature review, SLR, tertiary study, and systematic mapping, can be used to collect the data required to answer research questions [40]. Selecting an appropriate research method to solve the research problem is an important activity. We addressed this challenge by looking for the most reasonable solutions. Before commencing the research, we studied and considered different methods for producing promising solutions. The selection process was repeated until a systematic methodology suitable for the study was obtained. The process led us to explore various options, which gave us a better understanding of both research methodology and consistency problems in UML. At the end of the process, we clearly stated the research problem and realized that what we needed was a proper and flexible research methodology. As a result, a well-designed research methodology was constructed (see Figure 2.1).

To provide wider empirical evidence on consistency problems in UML, we selected systematic mapping as the research method. The selected method is useful for sharing a broad range of knowledge and for obtaining a clear picture of a specific topic. When looking for a proper research method for gathering evidence and understanding UML model inconsistencies, we found that the traditional SLR is an inadequate option, and we also found a study that had already performed an SLR on consistency problems in UML. This is one of the major reasons for not selecting the SLR methodology. The second reason is the differences between SLR and SM, which were presented in Section 2.2 and are comprehensively discussed by Petersen et al. in [13].
We planned to start our work by finding answers to RQ-1 by applying the SLR methodology. This evidence-based approach was selected to study systematic mapping research and practice. The foreseen outcomes of the SLR are systematic mapping studies (SMS) and research done on the SM method. This evidence will answer RQ-1.1. We then prepare a list of mapping guidelines, classifications, and categorization methods from the identified systematic mapping studies. This will answer RQ-1.2. After that, we will select standard mapping guidelines and suitable classifications from the list, corresponding to RQ-1.3. The chosen guidelines and classifications are examined to determine whether they are suitable and feasible for answering RQ-2; if so, they will be selected, otherwise alternatives such as refinements will be applied. This step will be repeated until suitable mapping guidelines, classifications, and categorization methods are obtained. Subsequently, the selected categorization methods will be integrated with the selected mapping guidelines, and some partial changes will be made to develop a conventional mapping method.


[Flowchart: Classifying research on UML model inconsistencies. For RQ-1, a systematic literature review yields systematic mapping studies and research done on systematic mapping (answers RQ-1.1); mapping guidelines and classification(s) are extracted from them (answers RQ-1.2); appropriate guidelines and classification(s) are chosen (answers RQ-1.3); and, if refinements are needed, the systematic mapping process is modified (answers RQ-1). For RQ-2, the systematic mapping process is applied to UML model inconsistencies, giving the mapping results (answers RQ-2 and RQ-3). For RQ-4, existing and identified classifications of UML model inconsistencies are compared, giving commonalities and differences (answers RQ-4), followed by the conclusions. The legend distinguishes process, sub-process, decision, and data (results) symbols.]

Figure 2.1: Outline of research methodology

As a result, a feasible mapping method was tailored to our context and applied to the published literature on UML model inconsistencies to answer RQ-2 and RQ-3. The outcome of this process is quantitative data that can be presented in the form of visualizations and that helps to analyse variations in quantitative measures. This analysis fulfils the overall aim of the study. Afterwards, a comparison will be made among classifications of UML model inconsistencies to identify commonalities and differences, which will provide answers to RQ-4. Finally, the contributions will be interpreted in the form of conclusions.

Much current and recent research is committed to finding ways of preventing threats before they arise in the research design. Similarly, we first reviewed our research methodology in depth and predicted some common threats. This was done to ensure that we considered what could go wrong before conducting the study, since it is not possible to go back and redo the study. Producing a quality study plan and controlling threats is not an easy task. The threats may arise with either the SLR or the SM method. For instance, while performing the SLR we might encounter a number of threats, and each threat needs to be mitigated with a backup plan. Consider the study selection activity in the SLR as an example: a threat arises when both researchers screen papers individually, as it is a time-consuming process, and the agreement level between the two researchers then needs to be identified. To mitigate this threat, an objective criteria assessment will be performed, and the agreement level between the researchers will be calculated using Fleiss’s or Cohen’s kappa [25], [27]. Based on the strength of agreement, the decision to proceed or to go back will be taken. If the study selection process is performed jointly, there is no need to calculate agreement levels between the researchers. Another threat predicted in the SLR is the ‘identification of primary studies’, which can be reduced by performing snowball sampling. Likewise, similar threats may occur in the SM method, in which case the same actions will be taken. Other threats that were not predicted during the development of the research methodology are to be identified and reduced to a minimum.
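For two researchers who screen the same papers individually, Cohen's kappa compares the observed agreement with the agreement expected by chance. The following Python sketch illustrates the calculation; the function name `cohen_kappa` and the include/exclude decisions are hypothetical sample data, and the sketch covers only the two-rater case (Fleiss's kappa for more raters is not shown).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' label frequencies, summed over labels.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions of two researchers on six papers.
researcher_1 = ["include", "include", "exclude", "exclude", "include", "exclude"]
researcher_2 = ["include", "exclude", "exclude", "exclude", "include", "exclude"]
print(round(cohen_kappa(researcher_1, researcher_2), 3))
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance; which threshold justifies proceeding is a choice left to the review protocol.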


3 SYSTEMATIC LITERATURE REVIEW

An SLR is a systematic and rigorous stand-alone process that is mainly used for conducting a literature survey and aggregating evidence systematically [47]. It reduces the level of bias compared to a traditional literature review. According to Kitchenham [16], it consists of a set of activities known as a systematic process. We describe our SLR process in this chapter. The basic outline of our SLR process is shown in Figure 3.1.

Start
Planning phase
  Step-1: Need for review
  Step-2: Define research scope
  Step-3: Developing a review protocol
  Step-4: Evaluating a review protocol
Conducting phase
  Step-5: Data retrieval process
  Step-6: Study selection process
  Step-7: Quality assessment criteria and data extraction strategy
  Step-8: Data synthesis
Reporting phase
  Step-9: Results and Analysis
Stop

Figure 3.1: SLR process

The SLR comprises three phases:
1. Planning phase
2. Conducting phase
3. Reporting phase

In the planning phase, we designed a plan for the conducting phase. First, the need for a review was examined to verify whether this kind of study had already been performed. After that, the goals for conducting the review were defined, and a review protocol was developed for executing the conducting phase.

In the conducting phase, a search process was created for refining the search terms and the search string. Initially, a trial search was performed on the selected data sources using the search string, and the results were compared with known related-work papers. If more than 90% of the known papers were found, we proceeded to the automated search with the final search string; otherwise, the search terms and the search string had to be refined. More details about the search process are provided in Section 3.2.1.1. The snowball sampling technique was applied to alleviate the bias of missing studies. The bibliography of the studies was retrieved and recorded using a bibliography management tool. In the study selection process, the study selection criteria were applied to the primary studies to eliminate irrelevant records. Before the actual data extraction, a data extraction form with relevant properties was created and applied to the final primary studies. The resulting evidence was analyzed using thematic analysis.

In the reporting phase, the data obtained from the conducting phase was used to address RQ-1 by interpreting the results in various forms.
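The trial-search check in the conducting phase can be sketched as a simple coverage calculation. The following Python sketch assumes that papers are matched by normalized title and that the 90% threshold is applied to the share of known related-work papers found; the function names and the sample titles are hypothetical.

```python
def _normalize(title):
    """Lower-case a title and collapse whitespace so small differences do not matter."""
    return " ".join(title.lower().split())


def known_paper_coverage(search_results, known_papers):
    """Fraction of the known related-work papers that appear in the search results."""
    if not known_papers:
        raise ValueError("at least one known paper is required")
    found = {_normalize(t) for t in search_results}
    hits = sum(1 for t in known_papers if _normalize(t) in found)
    return hits / len(known_papers)


def search_string_is_acceptable(search_results, known_papers, threshold=0.9):
    """True when more than `threshold` of the known papers were retrieved."""
    return known_paper_coverage(search_results, known_papers) > threshold


# Hypothetical titles: ten known papers, of which the trial search finds eight.
known = [f"Known mapping study {i}" for i in range(10)]
trial_results = known[:8] + ["Some unrelated paper"]
print(known_paper_coverage(trial_results, known))         # share of known papers found
print(search_string_is_acceptable(trial_results, known))  # refine the string if False
```

When the check fails, the search terms and search string are refined and the trial search is repeated.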

3.1 Phase 1: Planning review

The starting phase of the SLR was used to prepare the collection of empirical evidence on systematic mapping. This phase was further broken into three steps: defining the review scope and keywords, developing the review protocol, and evaluating the review protocol. The result of phase 1 acts as input to the next phase.

3.1.1 Step1: Need for review

The motivation for the review was given in the related work section. Furthermore, we conducted a mini literature search on RQ-1 to confirm whether similar research had already been published, and then conducted a systematic search in two databases, INSPEC and COMPENDEX. We selected these databases because they are among the primary databases for the computer science area, they contain articles from ACM, IEEE and Springer Link, and executing a search string in them is simple. Biolchini et al. [48] and Felizardo et al. [SLR 25] listed some synonyms for SLR. In addition, we identified further synonyms for systematic mapping by reviewing related articles. We then formulated a search string (see Table 3.1) by combining the synonyms for systematic literature review and systematic mapping with the logical operators AND and OR.

Table 3.1: Search string for mini literature search

("systematic mapping" OR "mapping study" OR "scoping review") AND ("systematic literature review" OR SLR OR "systematic review" OR "literature review" OR "meta analysis" OR "structured review" OR "literature review" OR "systematic reviews" OR "literature survey" OR "systematic overview" OR "literature analysis" OR "review of studies" OR "systematic research synthesis" OR "systematic literature reviews") wn KY

None of the articles in the results addressed the primary goal (RQ-1) of our study. Since our primary goal differs from those of the studies found, we proceeded to define the goals and their rationale in the next step.

3.1.2 Step2: Define research scope

The goals behind conducting this review are as follows:
• Identify and report the list of software engineering systematic mapping studies and also explore research which is done on SM method.
• Identify mapping guidelines, classifications and categorization methods which have been performed in other mapping studies.
• Prepare an appropriate systematic mapping process to the context of our thesis by using collected data.

3.1.3 Step3: Developing a review protocol

A review protocol is essential for every SLR because it pre-defines the objectives and approaches. It also helps to limit the possibility of biased post-hoc changes to the review methods. The basic steps of the review protocol are:
1. Deriving search terms based on the research questions, the topic area and the phenomenon as a whole.
2. Identifying data sources relevant to the software engineering field.
3. Developing a search strategy in order to formulate a rigorous search string.
4. Conducting the search in the selected data sources using the final search string.
5. Applying basic and detailed inclusion/exclusion criteria to the primary studies.
6. Applying forward and backward tracking to the final primary studies obtained from the automated search.
7. Extracting data from the final set of primary studies.
8. Analyzing the extracted data using thematic analysis and interpreting the findings.

3.1.3.1 Search terms
According to Kitchenham et al. [16], the PICOC (Population, Intervention, Comparison, Outcome, Context) criteria are used to frame research questions, but in this study we used them to identify search terms from RQ-1 [SLR 01] [SLR 19]. The keywords identified from PICOC are listed in Table 3.2.

Table 3.2: PICOC criteria

RQ1
Population:    software engineering
Intervention:  systematic Mapping, study
Comparison:    -
Outcome:       guidelines, classification schemes
Context:       -

In this study, the population and intervention terms alone were sufficient to derive the concept terms (see Table 3.2). The possible search terms (viz., concepts) were collected from known literature [13] [SLR 17] [SLR 34], and we identified various synonyms, related terms and alternative terms for each concept term [50] [44]. From those, only a few associated search terms were selected within the software engineering context. We then arranged all terms according to the concept terms (see Table 3.3).

Table 3.3: Search terms collection

CONCEPTS
Concept Term 1: systematic
Concept Term 2: mapping
Concept Term 3: study
Concept Term 4: software engineering

SYNONYMS / ALTERNATIVE WORDS / RELATED TERMS
review, literature, guidelines, structured, scoping, method


3.1.3.2 Data sources
We selected the following data sources for retrieving relevant articles (see Table 3.4).

Table 3.4: Data sources

#   Data source
1   INSPEC
2   COMPENDEX
3   IEEE
4   ACM
5   Scopus
6   Science Direct
7   ISI Web of Science

There are other data sources available for SE research (such as Springer Link, Safari, Ebrary and Scholar), but the selected data sources are highly recommended, frequently used, and cover most of the records relevant to the topic. We used the expert search option in all data sources (see Table 8.2). The search strategy comprised two parts: automated search and manual search (snowball sampling). The automated search collected peer-reviewed papers from several data sources, whereas the manual search helped in finding additional papers that were not captured by the automated search.

3.1.3.3 Selection Strategy
The selection strategy consists of inclusion criteria and exclusion criteria. Svahnberg et al. [17] used basic and detailed selection strategies. We also employed both strategies when selecting primary studies from the results of the search process. Our basic include/exclude criteria are listed in Table 3.5.

Table 3.5: Basic inclusion/exclusion criteria

Basic Include/Exclude criteria
1. Consider only publications written in English and published in 2001 - 2012.
2. Remove duplicates at an early stage to decrease work effort.
3. Consider only one publication if the same paper was found in several databases.
4. Include only accessible publications.
5. Consider publications relevant to systematic mapping.
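The mechanical part of the basic criteria (year range, language, accessibility, duplicates) can be illustrated with a short Python sketch. This is not the procedure used in the thesis, where duplicates were removed with a reference manager; the `Record` fields, the title-based duplicate key and the function names are assumptions made for illustration. Criterion 5 (relevance to systematic mapping) requires human judgement and is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; the field names are illustrative."""
    title: str
    year: int
    language: str
    accessible: bool


def normalize_title(title: str) -> str:
    # Lowercase and keep only letters and digits, so that copies of the
    # same title with different punctuation or spacing compare equal.
    return "".join(ch for ch in title.lower() if ch.isalnum())


def apply_basic_criteria(records: Iterable[Record]) -> List[Record]:
    """Keep English, accessible, 2001-2012 records; drop title duplicates."""
    seen_titles = set()
    kept = []
    for record in records:
        if not 2001 <= record.year <= 2012:
            continue  # criterion 1: publication year
        if record.language.lower() != "english":
            continue  # criterion 1: language
        if not record.accessible:
            continue  # criterion 4: accessibility
        key = normalize_title(record.title)
        if key in seen_titles:
            continue  # criteria 2 and 3: one copy per publication
        seen_titles.add(key)
        kept.append(record)
    return kept
```

Matching on a normalized title is a simplification; a real reference manager would typically also compare authors, year and DOI.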

To inspect the publications in more depth, detailed inclusion and exclusion criteria were introduced, as shown in Table 3.6.

Table 3.6: Detailed inclusion/exclusion criteria

Study inclusion criteria
1. The publication should be peer reviewed.
2. The publication should be available in full text.
3. Only publications related to the software engineering context are considered.
4. The publication can be a literature review, systematic review, industrial experience, case/field study, experiment, survey, action research or comparative study, tertiary study, or systematic mapping study.
5. The publication discusses a method or guidelines of systematic mapping.
6. The publication gives an overview of systematic mapping.
7. The publication evaluates or analyzes existing guidelines, a method or a study of systematic mapping.
8. The publication discusses a validation of existing guidelines or a study of systematic mapping.
9. The publication was issued in a journal, conference proceedings, symposium or workshop.
10. The publication is included if it compares two or more guidelines of systematic mapping.
11. Publications based on expert opinion are included.
12. If duplicate papers report similar results, we consider only one publication, namely the completed one.

Study exclusion criteria
1. Publications that do not match the inclusion criteria are excluded.
2. Publications that are not relevant to RQ1 are excluded.
3. Publications that are inaccessible are excluded.

3.1.4 Step4: Evaluating a review protocol

The review protocol was discussed with our supervisor, and his feedback was used to revise it. This was mainly helpful in developing the search strategy. The librarian also gave us suggestions for developing a suitable search string for RQ-1. Based on their reviews, we evaluated our protocol in a series of iterations.

3.2 Phase2: Conducting the review

In this phase, we formulated search strings for finding evidence-based publications. The results obtained were recorded for each data source. In the study selection process, both the basic and the detailed inclusion/exclusion criteria were applied to refine the results, and then snowball sampling was performed to find hidden articles. In the quality assessment and data extraction steps, a data extraction form was developed and applied to the final primary studies in order to ease analysis and synthesis.

3.2.1 Step5: Data retrieval process

In our study, automated search alone was not sufficient for qualitative research, so another source had to be included in the evidence search. Hence, snowball sampling was performed after the automated search was completed. This helped in finding hidden articles that were not found by the automated search.

3.2.1.1 Generating a search strategy
The preliminary search focuses both on identifying existing systematic reviews and on assessing the volume of potentially relevant studies [16]. We initially had a set of 14 known articles related to systematic mapping. The quality of the search string was evaluated by checking our search catch against these known primary studies, and the search terms and search string were then refined based on the search catch. This improves the credibility of the search process (see Figure 3.2).
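The check of a trial search against the known primary studies can be sketched in Python as follows. This is a minimal illustration, assuming that studies are matched by title after normalizing case and whitespace; the function names are hypothetical, and the 90% threshold is the one stated in the text.

```python
from typing import Iterable


def _normalize(title: str) -> str:
    # Case-insensitive comparison with collapsed whitespace.
    return " ".join(title.lower().split())


def known_study_recall(retrieved_titles: Iterable[str],
                       known_titles: Iterable[str]) -> float:
    """Fraction of the known studies that the trial search retrieved."""
    retrieved = {_normalize(t) for t in retrieved_titles}
    known = {_normalize(t) for t in known_titles}
    if not known:
        raise ValueError("at least one known study is required")
    return len(known & retrieved) / len(known)


def needs_refinement(recall: float, threshold: float = 0.90) -> bool:
    """The search string is refined unless recall exceeds the threshold."""
    return recall <= threshold
```

With 14 known studies, retrieving 13 of them gives a recall of 13/14 (about 93%) and ends the refinement, whereas retrieving 12 gives 12/14 (about 86%) and triggers another iteration.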


[Figure 3.2 flowchart: Start; Identify keywords (Keywords table); Select data sources; Formulate search string; Conduct trial search; Refinements needed? If yes: refine keywords table and reformulate search string. If no (90%): proceed to automated search with final search string.]

Figure 3.2: Systematic search process

In the first round of the search process, the initial search string (see Table 3.7) retrieved fewer than 50% of the known primary studies [SLR 04] [SLR 20] [SLR 23] [SLR 12]. The results contained very few relevant records compared to irrelevant ones. To make the search string feasible and to obtain relevant evidence for the topic, we consulted a librarian with prior experience in formulating search strings, who evaluated it. The evaluator's suggestions were considered and tested against the search string, and improvements such as interchanging search terms and applying truncations were made to increase the quality of the search string.

Table 3.7: Initial search string

(((systematic OR structured OR literature OR in-depth) AND mapping) OR scoping OR systematic map OR systematic map guidelines OR mapping) AND (study OR review OR method OR process) AND (software engineering)

The truncations were mainly used to reduce the number of queries. This process was iterated until no further refinements were needed. At the end of the search process, we formulated a final search string that retrieved more than 90% of the known primary studies. Because this score is close to the maximum (100%), we stopped refining at this point. The final search string is shown in Table 3.8.


Table 3.8: Final keywords and search string

Collected search keywords
Concept Term 1:  A = systematic, B = structured, C = literature          #1 = (A OR B OR C)
Concept Term 2:  D = map*                                                #2 = (D)
Concept Term 3:  E = stud*, F = review*, G = method?, H = guideline?     #3 = (E OR F OR G OR H)
Concept Term 4:  I = software engineering                                #4 = (I)

Final search string
(#1) AND (#2) AND (#3) AND (#4)
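The composition of the final search string from the concept terms can be expanded mechanically. The Python sketch below joins the terms of each concept with OR and the concept groups with AND, as in Table 3.8. Quoting the multi-word phrase is an assumption made here for illustration; the table does not show quotes, and each database has its own syntax (see Table 8.2).

```python
from typing import Dict, List

# Concept terms as collected in Table 3.8.
CONCEPT_TERMS: Dict[str, List[str]] = {
    "#1": ["systematic", "structured", "literature"],
    "#2": ["map*"],
    "#3": ["stud*", "review*", "method?", "guideline?"],
    "#4": ["software engineering"],
}


def or_group(terms: List[str]) -> str:
    # Assumption: multi-word phrases are quoted so that they are
    # searched as a phrase rather than as separate words.
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_search_string(groups: Dict[str, List[str]]) -> str:
    # Dictionaries preserve insertion order, so the groups appear
    # in the order #1, #2, #3, #4.
    return " AND ".join(or_group(terms) for terms in groups.values())


if __name__ == "__main__":
    print(build_search_string(CONCEPT_TERMS))
```

Keeping the terms in a table-like structure makes each refinement iteration a small, traceable change to one concept group.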

Due to the lack of standardization among online databases, different search strings were formulated and executed for each of them (see Table 8.2).

3.2.1.2 Publication bias
Publication bias refers to the tendency for positive research outcomes to be published more readily than negative ones [16], [67]. To alleviate this threat, we considered publications that discuss the systematic mapping method as well as software engineering systematic mapping studies. Furthermore, we considered a wide range of digital libraries to cover the breadth of the field and did not restrict our search to particular journals or conference proceedings. Even though special care was taken to alleviate publication bias, a threat remains because grey literature, technical papers and non-peer-reviewed publications were not considered. This restriction was made in order to gather reliable information. Complex evidence for an SLR cannot be collected by relying exclusively on a protocol-driven search strategy. To mitigate publication bias, we therefore applied snowball sampling to the final primary studies obtained from the data sources. By scanning this evidence, we could identify hidden literature that was not found through the automated search.

3.2.1.3 Bibliography management and document retrieval
Bibliography management tools such as Endnote, Mendeley and Zotero are available and are primarily used for recording, citing, managing and sharing resources, and also for reducing duplicates [51], [52], [50]. In this study, we selected both Mendeley and Zotero for removing duplicates after the automated search. Both tools work in a reliable and flexible manner. Normally only one tool is used in a thesis, but in our case we applied two tools to test how capable they are in removing duplicates. We observed that Mendeley performed much better than Zotero, so we finally decided to use Mendeley.
3.2.1.4 Documenting the search
We executed different search strings on the selected data sources. The rationale behind their formulation is discussed in Section 3.2.1.1. Table 8.2 lists the field limits used for each database and the records obtained during the search.

3.2.2 Step6: Study Selection Process

Figure 3.3 shows the primary study selection process. An automated search was conducted on the 7 selected data sources using the final search string(s). The publications from all 7 data sources were then combined, resulting in 3031 studies. A filter table (see Figure 3.3) was developed on the basis of both the basic and the detailed inclusion/exclusion criteria for ease of use. The articles were then refined using filter-1 (year) and filter-2 (language), discarding 495 studies (articles not published between 2001 and 2012) and 79 studies (non-English articles).

Final search string, executed per data source:
SS1   ACM              65 articles
SS2   Inspec           398 articles
SS3   Compendex        661 articles
SS4   Scopus           892 articles
SS5   Science Direct   396 articles
SS6   IEEE             556 articles
SS7   ISI WoS          63 articles

Step-1: Studies obtained by search string (3031 articles)
        Filter-1 (495 articles), Filter-2 (79 articles)
Step-2: Relevant studies based on Year and Language (2457 articles)
        Filter-3 (569 articles), Filter-4 (1796 articles)
Step-3: After title, abstract and keyword screening (92 articles)
        Filter-5 (5 articles), Filter-6 (42 articles)
Step-4: Studies obtained based on detailed screening (45 articles)
Step-5: Apply Snowball sampling technique (45 articles)
        Articles found at backward level (3 articles)
        Articles found at forward level (7 articles)
Step-6: Final primary studies (55 articles)

S.No       Inclusion/Exclusion Criterion
Filter-1   Studies ∉ {2001-2012}
Filter-2   Non-English
Filter-3   Database duplicates
Filter-4   Review based on title, abstract and keywords
Filter-5   Non-full text
Filter-6   Removed after full-text reading

Figure 3.3: Study selection process

After that, filter-3 (duplicates) was applied: 569 duplicates were removed from the 2457 studies using the Mendeley reference management tool, which was chosen mainly for its simple and easy-to-use features [52]. Next, we screened the unique studies by applying filter-4 (title, abstract and keywords review) and removed 1796 studies that were not relevant. We then applied filter-5 (non-full text) and filter-6 (full-text reading) to the remaining 92 studies: 5 studies were removed because they were not available in full text, and 42 studies were removed after full-text reading. This resulted in a set of 45 primary studies, to which snowball sampling [49] was applied to collect relevant studies that were missed during the automated search. Snowball sampling methods such as pursuing references of references (viz. reference tracking) and using special citation tracking databases are powerful means of tracing high-potential studies [58]. The criteria that we used to conclude the snowball sampling process are given in the Appendix (see Section 8.4.1). The 45 articles were given as input to the snowball sampling process; as a result, 3 studies were found through reference tracking (i.e., backward snowballing) and 7 studies were found through citation tracking (i.e., forward snowballing). Both tracking methods were carried out for two iterations (levels); no study was found at the third level, so the process was terminated. Finally, a total of 55 primary studies were found, including both systematic mapping studies and research done on systematic mapping.
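The counts reported for the selection process can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers stated in the text and in Figure 3.3; the variable and function names are illustrative.

```python
from typing import List, Tuple

# Records retrieved per data source (Figure 3.3).
RECORDS_PER_SOURCE = {
    "ACM": 65, "Inspec": 398, "Compendex": 661, "Scopus": 892,
    "Science Direct": 396, "IEEE": 556, "ISI WoS": 63,
}

# Number of studies removed by each filter, in the order applied.
FILTERS: List[Tuple[str, int]] = [
    ("Filter-1: outside 2001-2012", 495),
    ("Filter-2: non-English", 79),
    ("Filter-3: database duplicates", 569),
    ("Filter-4: title, abstract and keyword review", 1796),
    ("Filter-5: non-full text", 5),
    ("Filter-6: removed after full-text reading", 42),
]

BACKWARD_SNOWBALL = 3  # studies found through reference tracking
FORWARD_SNOWBALL = 7   # studies found through citation tracking


def apply_filters(start: int, filters: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the number of studies remaining after each filter."""
    remaining = start
    trail = []
    for label, removed in filters:
        remaining -= removed
        trail.append((label, remaining))
    return trail


total_retrieved = sum(RECORDS_PER_SOURCE.values())
trail = apply_filters(total_retrieved, FILTERS)
after_screening = trail[-1][1]
final_primary_studies = after_screening + BACKWARD_SNOWBALL + FORWARD_SNOWBALL

if __name__ == "__main__":
    print(f"Retrieved: {total_retrieved}")
    for label, remaining in trail:
        print(f"{label}: {remaining} remaining")
    print(f"Final primary studies: {final_primary_studies}")
```

The per-source counts sum to the 3031 studies of Step-1, and the intermediate totals of 2457, 92 and 45 studies as well as the final 55 primary studies follow from the reported removals and additions.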


3.2.2.1 Reliability of inclusion decisions
The study selection process was carried out by both authors to make sure that they agreed on whether to include or exclude each paper. To assess the level of agreement and disagreement between the two authors during the study selection process, a kappa analysis was performed. Fleiss’s kappa statistic is widely used [25] [26] for assessing the level of agreement among multiple observers who independently rate the same sample. Hence, the authors used the kappa statistic to assess their overall inter-rater agreement on the literature. The kappa agreement values are given in Table 3.9.

Table 3.9: Kappa agreement values [27]
Kappa analysis k
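As an illustration of the statistic, the following Python sketch computes Fleiss's kappa from a table that records, for each paper, how many raters chose each category. The include/exclude decisions in the example are hypothetical and are not the data of this study; the function names are likewise illustrative.

```python
from typing import List, Sequence


def to_count_table(rater_a: Sequence[str], rater_b: Sequence[str],
                   categories: Sequence[str] = ("include", "exclude")) -> List[List[int]]:
    """Per paper, count how many of the two raters chose each category."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same papers")
    return [[(a == c) + (b == c) for c in categories]
            for a, b in zip(rater_a, rater_b)]


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table of per-item category counts."""
    if not table:
        raise ValueError("the table must contain at least one item")
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(table[0])
    # Proportion of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in table) / (n_items * n_raters)
             for j in range(n_categories)]
    # Observed agreement per item, then averaged over items.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in table]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical decisions of two authors on ten papers (not the thesis data).
author_1 = ["include"] * 4 + ["exclude"] * 4 + ["include", "exclude"]
author_2 = ["include"] * 4 + ["exclude"] * 4 + ["exclude", "include"]
kappa = fleiss_kappa(to_count_table(author_1, author_2))
```

In this hypothetical example the authors agree on 8 of 10 papers, chance agreement is 0.5, and kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.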
