Achievements and Challenges in State-of-the-Art Software Traceability Between Test and Code Artifacts Reza Meimandi Parizi, Sai Peck Lee, Member, IEEE, and Mohammad Dabbagh
Abstract—Testing is a key activity of software development and maintenance that determines the level of reliability. Traceability is the ability to describe and follow the life of software artifacts, and has been promoted as a means of supporting various activities, most importantly testing. Traceability information facilitates the testing and debugging of complex software by modeling the dependencies between code and tests. Actively supplementing testing with traceability enables defects to be rectified more reliably and efficiently. Despite its importance, test-to-code traceability has not been sufficiently addressed in the literature, and, even worse, there is currently no organized review of traceability studies in this field. In this work, we investigated the main conferences, workshops, and journals of requirements engineering, testing, and reliability, and identified the contributions that address traceability topics. From that starting point, we characterized and analyzed the selected contributions against three research questions using a comparative framework of nine criteria. As a result, our study arrives at several interesting findings, and outlines a number of potential research directions. This, in turn, can pave the way for facilitating and empowering traceability research in this domain to assist software engineers and testers in test management.
Index Terms—Software testing, software traceability, test-to-code traceability, traceability recovery.

ACRONYMS AND ABBREVIATIONS

RE      requirements engineering
IDE     integrated development environment
CoEST   center of excellence for software traceability
OO      object-oriented
NC      naming convention
UUT     unit under test
FET     fixture element types
SCG     static call graph
LCBA    last call before assert
LA      lexical analysis
VCS     version control system
LSI     latent semantic indexing
Co-Ev   co-evolution
STS     starting tested set
CTS     candidate tested set
IR      information retrieval
COTS    commercial off-the-shelf
AOP     aspect-oriented programming
FOP     feature-oriented programming
MDD     model-driven development
MDE     model-driven engineering
Manuscript received March 07, 2013; revised March 17, 2014; accepted April 14, 2014. Date of publication July 23, 2014; date of current version November 25, 2014. This work was supported by a High Impact Research Grant with reference UM.C/625/1/HIR/MOHE/FCSIT/13, funded by the Ministry of Education, Malaysia, and a Taylor's University grant with reference TRGS/ERFS/1/2014/SOCIT/006. Associate Editor: J.-C. Lu.
R. M. Parizi is with the School of Computing and IT, Taylor's University, 47500 Subang Jaya, Selangor, Malaysia (e-mail: [email protected]; [email protected]).
S. P. Lee and M. Dabbagh are with the Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TR.2014.2338254
I. INTRODUCTION
SOFTWARE testing is the process of executing a program or system with the intent of finding faults [1] to achieve an acceptable level of reliability before releasing the system to the customer [2]. Software reliability and testing are two research areas strictly connected to each other [3]. The reliability of software is expected to improve through testing and consequent debugging [3]–[5], which also brings broad gains in versatility and functionality for managerial and organizational situations such as software ageing [6]. Traceability is the ability to describe and follow the life of software artifacts, and is described by the links that connect related artifacts [7]. Traceability support (for software projects) is deemed to assist software engineers in the comprehension, efficient development, and effective management of software systems [8]. Research has shown that inadequate traceability can be an important contributing factor to software project failures and budget overruns [9]; it also leads to less maintainable software, and to defects due to inconsistencies or omissions [10]. On the
other hand, achieving affordable traceability can become critical to project success [11], and leads to increasing the maintainability and reliability of software systems by making it possible as a means to verify and trace non-reliable parts [12], mostly through testing. Therefore, traceability is important not only for software artifacts but also as a major component of many standards for software development, such as the CMMI, ISO 9001:2000, and IEEE Std.830-1998, to increase industrial and business considerations. In particular, IEEE Standard 982.1 recognizes traceability as a measure of software reliability. Coding and testing are two key activities of software development [13] that are tightly intermingled, particularly in incremental and iterative methodologies [14] such as agile software development. About half of the resources consumed during the software development cycle are testing resources [15]–[17], i.e. testing is expensive and lengthy. It is beneficial to monitor this stage closely to make it more productive, and, if possible, shorter [18] to develop quality, reliable software [17]. In this regard, supplementing testing by traceability can be a decent remedy to provide useful information (e.g. to help efficiently locate faulty code, analyze impact of the changes, find most effective or redundant tests, and rectify defects more reliably in less time) for developers and testers during the testing and debugging phase. Thus, traceability-aware testing, in turn, helps achieve a highly reliable software system by reducing test effort, and increasing the effectiveness of software fault detection and removal techniques [19], [20]. The trace links between tests (or test cases) and code artifacts (hereafter called test-to-code traceability) are typically implicitly presented in the source code, forcing developers towards overhead code inspection or name matching to identify test cases relevant to a working set of code components. That is, tests also require frequent adaptation to reflect changes in the production code to keep an effective regression suite [14], e.g. during software evolution. Identifying trace links in turn can be seen as a person-power intensive, time-consuming, and error-prone job [21]. Moreover, capturing and creating traceability information as a by-product of development (i.e. performing in parallel with the artifacts) is often said [21] to be tedious to developers, and is rarely done to the necessary level of granularity, which can result in unreliable and incomplete recording of the relevant traceability. Having these drawbacks in mind, traceability link recovery techniques (as after-the-fact way of achieving traceability information) can aggressively tackle these problems by reducing the associated cost and effort needed to construct and maintain a set of traceability links, and compensating incompleteness of trace information during the development process. Traceability link recovery has been attracting more attention, and becoming the subject of research (both fundamental and applied) in recent years to re-establish the traceability relations between artifacts of software production, particularly for large and complex software projects. In this regard, a substantial amount of research effort in the software engineering community has been invested into proposing various techniques to retrieve traceability links between different artifacts (e.g. [22]–[29]). Some need human intervention, and are minimally-automated (e.g. [24], [25]), whereas others can
automatically generate traceability links (e.g. [22], [23], [30]). Of these, however, the realization of test-to-code traceability has received lesser attention in which only a small proportion of research work in the literature targets this subject. Even in practical and technological settings, test-to-code traceability is not common in software development [31] (e.g. some integrated development environments (IDEs) support the generation of a test case stub based on a given code element (e.g. class), but an explicit link between a code element and a test case is seldom maintained [14]). These observations show that the current research and practice on test-to-code traceability recovery is in infancy. Hence, to support the advancement in this area, and the maturity of the test-to-code traceability recovery, new approaches need to be developed or current approaches evolved with new concepts and ideas. However, maturity will not be reached unless the current approaches are thoroughly analyzed, assessed, and compared, and the needs for improvement are identified. In this respect, a detailed evaluation of the current test-to-code traceability recovery approaches in a theoretical and empirical manner would be needed. In response to the theoretical part of this need, this paper provides a qualitative study of the existing approaches by means of a set of comparison criteria. These criteria (see Section IV-A) indicate to which extent an approach satisfies some features in terms of its support for traceability in this domain. Nevertheless, to this end, we formally seek to answer the following three research questions. RQ1: What are the current test-to-code traceability recovery approaches? Answering RQ1 would help software engineers and researchers to gain insight into how much traceability in this particular domain has been so far achieved (in both research and practice). RQ2: What are the major characteristics of the identified test-to-code traceability recovery approaches? Discover similarities and differences by making comparisons and appropriateness studies. Answering RQ2 would help software engineers and researchers to find clear comparisons, which can be used to facilitate and guide them for choosing approaches that are more appropriate. In other words, the answer helps us to know how and when to apply the approaches developed. RQ3: What are the existing challenges as realization of the current state of the art in this domain? Answering RQ3 would be an attempt to discuss the challenges facing traceability approaches, and help software engineers and researchers, from the comparative analysis of the approaches for test-to-code traceability recovery, to better know the current difficulties in traceability implementation, and how to possibly overcome them. As far as we are aware, this is the first attempt made in the literature to survey this particular research area, which is currently scarce. As a general result, shortcomings and challenges, commonalities, and differences of the current approaches are identified. This identification can give a better understanding of the current approaches to pave the way for facilitating and empowering traceability research in this domain by providing comparison basis and analysis.
The remainder of this paper is organized as follows. Section II briefly describes the background of the study, and provides a glimpse of the scope of the work. Section III identifies and presents the current test-to-code traceability recovery approaches, addressing the first research question. Section IV presents and describes the details of survey and evaluation results, addressing the second research question. Section V discusses the challenges that must be tackled to realize the current state of the art in this domain, addressing the third research question. Section VI gives the threats to validity of the study. Section VII presents the related work; and lastly, Section VIII reports the conclusion and future work. II. BACKGROUND AND SCOPE OF THE STUDY To define a common basis within this study, we introduce some definitions, necessary because traceability research has its roots in several different domains, resulting in different understandings of the area and terminology (Section II-A). Moreover, the scope of the study is briefly explained (Section II-B). A. Definitions Traceability, trace link: The Center of Excellence for Software Traceability (CoEST) [32] defines a trace link as a “specified association/relation between a pair of artifacts, one comprising the source artifact (e.g. tests) and one comprising the target artifact (e.g. code).” The task of tracing therefore involves discovering the set of target artifacts that are related to a given source artifact, and then establishing trace links between them. Test-to-code traceability link: In line with the understanding of the notion of traceability link, a test-to-code traceability link is the one to represent the relationship between two elements of tests and code artifacts, i.e. between a test case and the responsible unit under test (UUT). Note that the focus of this paper is on test-to-code traceability; therefore, the term traceability is used to refer to test-to-code traceability throughout this paper. Traceability (link) recovery approach: The term approach here refers to a generic term for methods, techniques, and tools. Thus, a traceability link recovery approach relies on retrieving candidate links between elements in one artifact, and elements in another, using a single or multiple techniques and strategies. A set of high quality candidate links represents a link set between these artifacts that contains as many correct links as possible, and as few fault links as possible [8]. Test-to-code traceability link recovery approach: A traceability link recovery approach is capable of deriving traceability links between specific tests and source code artifacts. The approaches of this type can have the advantage of expediting the process of debugging, and ensuring more effective testing as well, because they can link a test failure with the responsible unit under test, which is a faulty part of the code. In this case, traceability between tests and code artifacts can be seen as another highly desired quality as it can support fast, frequent code-test cycle communications. B. Scope Fig. 1 shows an example of a conjectural traceability graph (taken from [33]) showing trace links among artifacts in five typical phases of the software development life cycle. In this
Fig. 1. (from [33]) Focus of the current work.
figure, an arrow denotes a trace link, which can carry different meanings from one phase to another. For instance, it may mean either the usage of an earlier artifact in developing a later artifact, or an artifact depending on another (e.g. a test case testing a code construct such as a class). The area shaded gray is the general focus of the current work: to survey existing approaches that have so far been proposed to help generate and retrieve trace links between tests and code artifacts, particularly in object-oriented (OO) systems. It is also worth pointing out that extracting high quality and accurate candidate links across the entire hierarchy of software artifacts is very challenging, and generally infeasible [22], [28], [34], [35]. No recovery approach has the capability of recovering all possible links between all the artifacts automatically and accurately, i.e. some useful links might be missed, whereas some incorrect links are extracted. This condition is due both to the imprecision of expressing things in natural language, and to the inherent information loss or acquisition when moving between software artifacts at differing levels of abstraction [8]. Therefore, the literature suggests that traceability must be bounded [35], where the trace-for-a-purpose strategy comes into play by stating that trace links should only be captured if there is a direct usage of the traced information [36]. This explanation thus motivates the study of traceability between tests and source code exclusively in the current paper.
III. APPROACHES ON TEST-TO-CODE TRACEABILITY RECOVERY
To answer the first research question (RQ1), we performed a detailed review of the literature following citations and references using web-based literature search engines. This effort resulted in gathering a number of publications that listed various approaches on the basis of different techniques relating to traceability topics. Thus from all the gathered publications from the relevant literature, we refined and selected those approaches that satisfy the following criteria in accordance with our research purpose (see Fig. 2). • Those traceability approaches that make separation between requirements traceability and traceability related to tests. • Those traceability approaches that start from the source code, inspired by the information derived from the running
Fig. 2. Overview of the identification process of the approaches.
test suite or software repositories such as a version control system (VCS) log (i.e. those that establish traceability links between a test case and the corresponding unit under test).
• Those traceability approaches that cover as many particular views of objective reflections as possible, so as to be the most representative ones.
That is, the focus was to identify those approaches that recover only test-to-code traceability information. In this case, practices and traceability recovery approaches for navigating between source code and other artifacts such as bugs [37], [38] and documents [8], [22], [27], [39], and conversely between tests and other artifacts such as requirements and design [40], were not considered. Moreover, the implicit approaches (i.e. those that are not really meant to be used for recovering test-to-code traceability, and are mostly related to debugging and locating faults in code), such as the one proposed by Sneed [41], were outside the focus of this paper, and were therefore not included. There were other minor approaches tied to IDEs such as Eclipse, which generally offer little support to the developer trying to browse between unit tests and the classes under test. For instance, the Eclipse Java environment provides a wizard for creating unit tests, and requires the developer to indicate the corresponding class under test. In addition, Eclipse offers a search menu entry for referring tests that retrieves all unit tests that call a selected class or method; however, this capability has the problem of issuing more incorrect links, as it cannot differentiate between calls to helper classes and methods and calls to production classes. In line with these Eclipse-based features, Bouillon et al. [42] presented a JUnit Eclipse plug-in that uses a static call graph (SCG) to identify the classes under test for each unit test by using Java annotations from information within a comment string. These approaches were also excluded from the analysis, as they are technologically dependent, and considered auxiliary features that require many configurations and manual effort while yielding less accurate results. Thus, they are not the kinds of approaches that outline important procedures for the success and effectiveness of traceability recovery. Fig. 2 shows an overview of the methodology used for the identification process of the approaches, as explained above. Nevertheless, after the final review and selection of contributions, we realized that only a few specific approaches for traceability link recovery between tests and code have been proposed. Surprisingly, most of them have come from one influential work [14]. In this previous work, a series of traceability recovery strategies (i.e. the first six approaches in the following sub-sections) was proposed that helps establish traceability links between production classes and XUnit test cases in object-oriented programs. This in turn again shows that the current traceability recovery between source code and test artifacts is poor. The following sub-sections present the identified approaches, which encompass all the prior major approaches in this domain. Moreover, for each approach, the main drawbacks or issues associated with its link recovery mechanism are identified as well.
A. Traceability Recovery Approach Using Naming Convention
In this approach [14], [43], traceability links are recovered using a naming convention (NC) technique, with the name of a test case as the source for recovery purposes. In this respect, by including the name of the unit under test in the name of the test case, a developer or tester communicates what the purpose of that test case is. This is usually done by prefixing or suffixing the name of the unit under test with the string test when constructing the name of the respective test case.
Issues: Despite the fact that several tutorials and books describe naming conventions (e.g. [44], [45]), and discuss their widespread usage and advantages in different domains, in this specific domain the approach creates no traceability link when the test case name does not contain the name of the unit under test, or when the name it contains does not correspond to a known type. The approach therefore relies on the presence of the class's identifier in a test file, assuming that developers follow specific naming conventions, and it forces the identification of a single class as the tested class. In fact, such assumptions are not always followed in practice, particularly in industrial settings [46]. Moreover, this approach has the problem of ruling out class types that contribute to the unit under test other than the one that matches the naming convention; thus the resulting set of types for this approach is always a singleton.
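To make the NC heuristic concrete, the sketch below (our own illustration, not the tooling of [14], [43]) derives a candidate unit under test from a JUnit test class name by stripping a Test prefix or suffix and checking the result against the known production classes; the class names and the lookup set are hypothetical.

```java
// Minimal sketch of an NC-style matcher (illustrative only).
import java.util.Optional;
import java.util.Set;

public class NamingConventionMatcher {

    // Known production class names; in a real setting these would be read from the project.
    private final Set<String> productionClasses;

    public NamingConventionMatcher(Set<String> productionClasses) {
        this.productionClasses = productionClasses;
    }

    /** Derives the candidate UUT from a test class name such as "MoneyTest" or "TestMoney". */
    public Optional<String> unitUnderTest(String testClassName) {
        String candidate = null;
        if (testClassName.endsWith("Test")) {
            candidate = testClassName.substring(0, testClassName.length() - "Test".length());
        } else if (testClassName.startsWith("Test")) {
            candidate = testClassName.substring("Test".length());
        }
        // A link is created only if the derived name matches a known production class.
        return (candidate != null && productionClasses.contains(candidate))
                ? Optional.of(candidate)
                : Optional.empty();
    }

    public static void main(String[] args) {
        NamingConventionMatcher matcher =
                new NamingConventionMatcher(Set.of("Money", "Currency"));
        System.out.println(matcher.unitUnderTest("MoneyTest"));   // Optional[Money]
        System.out.println(matcher.unitUnderTest("HelperTest"));  // Optional.empty (no known type)
    }
}
```

As the Issues paragraph above notes, such a matcher yields at most a singleton, and produces no link at all when the convention is not followed.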
B. Traceability Recovery Approach Using Fixture Element Type
In this approach [14], [43], traceability links are recovered using the fixture element types (FET) technique. Similar to the naming convention, the unit under test should be made explicit by declaring fixture elements as instance variables of the test case. These variables are exposed to all test commands in the test case, and can be initialized in the setUp method. To identify the UUT for a test case, two steps are required by this technique. First, the set of types of the fixture elements is identified. Then, a filtering operation is applied to reduce this set by selecting the one or more types that are associated with the most fixture elements. For example, in Fig. 3, Money as the UUT is correctly identified; yet for Fig. 4, Money and Currency
Fig. 3. Prototypical JUnit example [43].
Fig. 5. Test case super class with test helpers [43].
Fig. 4. Test case with a test data object of type Currency [43].
are the units under test because of the tie in the number of fixture elements (i.e. two elements for both types).
Issues: Although making the unit under test explicit in the form of fixture elements is advocated in the literature [47], this approach falls short when no objects are declared as explicit fixture elements, or when the fixture elements are declared with a more generic type than the type of the actual instantiation.
C. Traceability Recovery Approach Using Static Call Graph
In this approach [14], [43], traceability links are recovered using the static call graph [48] technique. The appeal of this approach comes from the fact that the unit under test can be derived by inspecting method invocations in the test case. In contrast to name- or fixture-based approaches, which are merely indicators resulting from developers pursuing explicitness, a static call graph-based approach reveals references to production classes in the test case implementation, known as test code. According to this approach, to identify the unit under test for a test case, all production classes that are directly called by the test case, i.e. classes that are the destination of an outgoing method invocation, are collected. Then, the set of production classes that is most referenced is selected. Referring to Fig. 3, in the testSimpleAdd method the Money class is referenced twice, by means of the constructor call that instantiates a new object and the call to its add method.
Issues: The drawback of this approach, therefore, is that it recovers a potentially large set of helper and data object types that will be included in the unit under test set when there is no dominantly called production class. Thus, the accuracy of the recovered links will not be satisfactory.
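The call-counting step of the SCG heuristic can be illustrated with the following sketch of our own; the static call graph itself is assumed to have been extracted beforehand (e.g. by an AST or bytecode analysis), and the class and method names are illustrative.

```java
// Illustrative ranking step of an SCG-based heuristic: given the outgoing calls of one
// test case, select the production class(es) referenced most often.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticCallGraphHeuristic {

    /**
     * @param outgoingCallTargets production classes targeted by each call in the test case,
     *                            e.g. ["Money", "Money", "Currency"] for testSimpleAdd in Fig. 3
     * @return the most-referenced production class(es); more than one element signals a tie
     */
    public static List<String> unitsUnderTest(List<String> outgoingCallTargets) {
        Map<String, Integer> referenceCount = new HashMap<>();
        for (String target : outgoingCallTargets) {
            referenceCount.merge(target, 1, Integer::sum);
        }
        int max = referenceCount.values().stream().max(Integer::compare).orElse(0);
        List<String> result = new ArrayList<>();
        referenceCount.forEach((clazz, count) -> {
            if (count == max) {
                result.add(clazz);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Money is referenced twice (constructor + add), Currency once, so Money wins.
        System.out.println(unitsUnderTest(List.of("Money", "Money", "Currency")));
    }
}
```

When no production class dominates the counts, the returned set grows, which is exactly the drawback noted in the Issues paragraph above.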
D. Traceability Recovery Approach Based on Last Call Before Assert
In this approach [14], [43], traceability links are recovered using the same machinery as the previous approach, but applying the last call before assert (LCBA) technique to address SCG's drawbacks. Specifically, this approach proposes to avoid the problem of including helper classes, methods, and data types by looking at what happens right before the assert statement. That is, the tested classes identified by LCBA are the classes whose methods are called in the statement that precedes an assert statement. The reasoning is that, to compare the actual outcome with the expected outcome, the test case needs to call the unit under test to retrieve the actual status change. In a case such as the one presented in Fig. 5, this approach enters the verifyResult method to correctly identify Money as the type of the last call (getCurrency) before the assert statement.
Issues: Although the idea behind the underlying technique of this approach was hypothesized to address SCG's drawbacks, a large set of units under test can still be recovered for a test style in which developers write many asserts per test command. Van Deursen et al. refer to this test writing style and problem as Assertion Roulette [49]. Another limitation of this approach comes from its use of the static call graph, as it returns the classes associated with the last called method before the assert statement. It falls short when, right before the assert statement, there is a call to a state-inspector method of a class that is not the class under test [46].
E. Traceability Recovery Approach Using Lexical Analysis
In this approach [14], [43], traceability links are recovered using the lexical analysis (LA) technique from the version control system of a software system such as CVS, SVN, Perforce, or SourceSafe. Unlike the previous approaches, this approach does not rely on programmer discipline, nor does it assume particular call patterns. Rather, it relies on the vocabulary that developers use inside source code, i.e. the natural language used in type names, identifiers, strings, comments, etc. The assumption in this approach is that files related to each other contain similar vocabulary (i.e. textual similarity). This is to say that a test case and the corresponding unit under test contain very similar vocabulary, which helps recover trace links. Looking at the example in Fig. 5, notice how frequently words such as Money, add, and Currency occur both in the test case and in the Money class [47]. To calculate the similarity between two files (one test case and one production class as unit under test), this approach relies on latent semantic indexing (LSI) [50] as an information retrieval (IR) technique. The production file with the highest similarity to a test case is hypothesized to be the unit under test. In this regard, the approach sets the cut point of link recovery to 1, meaning that it takes the best match only (i.e. as the correct unit under test). The constant threshold was set so that a minimum similarity value need not be defined, and, based on the authors' experience, a reduced-rank subspace was adopted for the singular value decomposition.
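The ranking idea behind this approach can be sketched as follows; for brevity the sketch uses a plain term-frequency cosine similarity rather than LSI, so it illustrates the textual-similarity intuition rather than the exact technique evaluated in [14], and the code fragments compared are hypothetical.

```java
// Simplified textual-similarity ranking: tokenize two sources, build term-frequency
// vectors, and compute their cosine similarity. LSI would additionally project the
// vectors into a reduced-rank concept space; that step is omitted here.
import java.util.HashMap;
import java.util.Map;

public class TextualSimilarity {

    static Map<String, Integer> termFrequencies(String source) {
        Map<String, Integer> tf = new HashMap<>();
        // Split identifiers and words; a real implementation would also split camelCase
        // and filter test-specific vocabulary such as "test", "setUp", or "assert".
        for (String token : source.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        String testCase = "public void testSimpleAdd() { Money m = new Money(12, \"CHF\"); }";
        String money = "public class Money { public Money add(Money m) { return null; } }";
        String currency = "public class Currency { private String code; }";
        // The production file with the highest score is taken as the candidate UUT.
        System.out.println("Money:    " + cosine(termFrequencies(testCase), termFrequencies(money)));
        System.out.println("Currency: " + cosine(termFrequencies(testCase), termFrequencies(currency)));
    }
}
```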
Issues: Although IR techniques can provide automated traceability and reduce the effort of manual approaches, they are generally best suited to legacy documents. One of the main drawbacks of applying IR techniques (like this approach) is that a substantial amount of vocabulary in test cases does not reappear in the unit under test, such as test, setUp, TestCase, expect, result, assert, etc. This in turn makes it difficult to recover accurate traceability links from the data mined between these elements. In addition, this approach uses a heavyweight IR technique, which is usually less efficient than lightweight techniques. For instance, Marcus et al. [27] used an LSI-based solution on the same systems used by Antoniol et al., whereas Bacchelli et al. [22] experimented with lightweight methods that involved capturing program elements with regular expressions. The researchers showed that lightweight methods significantly outperform IR methods.
Fig. 6. SCOTCH overview [52].
F. Traceability Recovery Approach Using Co-Evolution
In this approach [14], [43], traceability links are recovered using a co-evolution (Co-Ev) technique. The preceding approach started from the version control system of a software system, such as CVS, SVN, Perforce, or SourceSafe. It is known that such a system captures the changes that are made to the system throughout its history, and keeps versions. In this regard, the motivation underlying this approach is the idea that test cases and their corresponding units under test must change together over time, as a change to the unit under test requires some modifications to the test case as well. Thus, to identify the unit under test for a given test case, this approach looks for the production file(s) that have changed most often together with that test case in the version control change log. The idea used in this approach for recovering traceability links between tests and code is similar to a recent mechanism [51] utilized in the feature location field of research.
Issues: This approach would wrongly identify production files that change very frequently as the unit under test of a variety of test cases; therefore the candidate set of recovered links can grow considerably for large projects. Moreover, this approach requires that developers use the version control system in such a way that changes to production and test code are actually committed to the system at the same time. Additionally, it has to be said that the co-evolution information is not usually captured in the VCS unless developers practice testing during development.
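The co-change heuristic can be pictured with the following simplified sketch of our own; the change log is represented as a list of file sets per commit, and the file names are hypothetical, so this is an illustration of the idea rather than the procedure from [14].

```java
// Simplified co-evolution heuristic: count, per production file, how many commits
// also touched the given test file, and return the most frequently co-changed file(s).
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoChangeHeuristic {

    /**
     * @param commits  each element is the set of file paths changed together in one commit
     * @param testFile the test case file whose unit under test is sought
     */
    public static List<String> unitsUnderTest(List<Set<String>> commits, String testFile) {
        Map<String, Integer> coChangeCount = new HashMap<>();
        for (Set<String> commit : commits) {
            if (!commit.contains(testFile)) {
                continue;
            }
            for (String file : commit) {
                // Only production files are candidates; tests themselves are skipped.
                if (!file.equals(testFile) && !file.endsWith("Test.java")) {
                    coChangeCount.merge(file, 1, Integer::sum);
                }
            }
        }
        int max = coChangeCount.values().stream().max(Integer::compare).orElse(0);
        List<String> result = new ArrayList<>();
        coChangeCount.forEach((file, count) -> {
            if (count == max && max > 0) {
                result.add(file);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> commits = List.of(
                Set.of("Money.java", "MoneyTest.java"),
                Set.of("Money.java", "MoneyTest.java", "Currency.java"),
                Set.of("Currency.java"));
        System.out.println(unitsUnderTest(commits, "MoneyTest.java")); // [Money.java]
    }
}
```

Production files that change in almost every commit would dominate such counts, which is precisely the first issue noted above.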
So far, we have reviewed a series of six approaches (Sections III-A through III-F) for test-to-code traceability recovery, and highlighted some of their differences and similarities. It is important to note that the underlying techniques of these approaches were hypothesized in the original work by Van Rompaey et al. [14] to see whether they could be viable means of revealing traceability links between tests and code. Van Rompaey et al. [14] showed that the IR-based and co-evolution-based approaches are not effective in identifying test-to-code traceability links, even though, in general, these approaches have been shown to be successful in identifying traceability links in other domains with different types of artifacts [26], [27]. The results indicated that NC is the most accurate, whereas LCBA has higher applicability in terms of consistency. However, as we discussed above, NC and LCBA have some important limitations to alleviate. This situation most recently led Qusef et al. [52] to propose a new test-to-code traceability recovery approach to help overcome the limitations imposed by these approaches, and also by their own previous work [53] on test-to-code traceability recovery. This approach is overviewed in Section III-G.
G. Traceability Recovery Approach Using Slicing and Coupling
Most recently, an approach called SCOTCH has been proposed in [52], [54]. Its appeal comes from the shortcomings of the authors' previous work [46], aiming to improve the recall of the results for retrieving the tested classes. In this improved approach, i.e. SCOTCH, traceability links are recovered using the techniques of dynamic slicing [55] and conceptual coupling. An overview of this approach is depicted in Fig. 6. SCOTCH shares with the aforementioned approaches the use of assert statements to derive test-to-code traceability links. Put simply, SCOTCH identifies the set of tested classes using the two steps highlighted in Fig. 6. The first step exploits dynamic slicing to identify an initial set of candidate tested classes, called the starting tested set (STS). In the second step, the STS is filtered by exploiting the conceptual coupling between the identified classes and the unit test, resulting in the candidate tested set (CTS).
Issues: The approach does not take into account the semantics of the STS filtering during the coupling process. Note that an extension of SCOTCH has recently been proposed to enhance the filtering strategy [31]. Moreover, developing the slices requires not only a one-time setup overhead but also tuning and exercising the previous clusters, applying rules for each iteration. Besides, although conceptual coupling is used to discriminate between the actual tested classes and the helper classes, the set identified by dynamic slicing still contains an overestimate of the set of tested classes.
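The STS-to-CTS filtering step can be pictured with a small sketch of our own; the slicing-based starting tested set is assumed to be given, and the conceptual-coupling score is a stub, so this only illustrates the filtering idea rather than SCOTCH itself.

```java
// Illustrative STS -> CTS filtering: keep only candidate classes whose conceptual
// coupling with the unit test reaches a threshold. The coupling scores are stubbed;
// SCOTCH derives them from the textual/conceptual similarity of class and test.
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class TestedSetFilter {

    /**
     * @param startingTestedSet classes identified by dynamic slicing (STS)
     * @param couplingWithTest  conceptual coupling score of each class with the unit test
     * @param threshold         minimum coupling required to remain a candidate
     * @return the candidate tested set (CTS)
     */
    public static Set<String> filter(Set<String> startingTestedSet,
                                     Map<String, Double> couplingWithTest,
                                     double threshold) {
        Set<String> candidateTestedSet = new LinkedHashSet<>();
        for (String clazz : startingTestedSet) {
            if (couplingWithTest.getOrDefault(clazz, 0.0) >= threshold) {
                candidateTestedSet.add(clazz);
            }
        }
        return candidateTestedSet;
    }

    public static void main(String[] args) {
        Set<String> sts = Set.of("Money", "Currency", "AssertHelper");
        Map<String, Double> coupling =
                Map.of("Money", 0.82, "Currency", 0.64, "AssertHelper", 0.11);
        System.out.println(filter(sts, coupling, 0.5)); // helper classes are filtered out
    }
}
```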
IV. EVALUATION OF THE APPROACHES To answer the second research question (RQ2), we have identified a type of analysis that evaluates the approaches. This analysis is a kind of qualitative assessment which provides a characterization of the approaches by means of a set of comparison criteria acting as a comparative framework. In this section, we first explain the comparison criteria for the characterization of the approaches. Then, we present the results of our comparison and analysis, followed by a discussion of important trends and points seen in the results. A. Comparison Criteria The following criteria represent the most contributing aspects of a given effective traceability recovery approach, which can also be used to evaluate any traceability approach in traditional software engineering. They have been derived from a thorough analysis of literature (e.g. [56], [57]), as well as inspired by insights (e.g. [35]) obtained by influential researchers in the field. These criteria are briefly explained as follows. Underlying technique. It represents the core of an approach, through which traceability links between tests and code are recovered. Traceability analysis type. Three types of traceability analysis to recover links can be distinguished: manual, semi-automatic, and automatic. Manual analysis of traceability has the software engineer responsible for the search and final decisions on links between artifacts. The semi-automatic type involves automated traceability tools guiding the search for traceability links, and human intervention for final decisions on the candidate links. Automatic traceability analysis is performed by a special-purpose tracing mechanism responsible for the searching for retrieval of links [21]. Automatic type can be further divided into lightweight, and heavyweight. Lightweight types do not require pre-computation of the input, and can be directly executed at run-time. Heavyweight types, by contrast, require preprocessing of their input. Examples of these techniques include information retrieval (IR), and text mining [8]. Trace type. According to Pinheiro [58], traces can be divided into two different categories. Functional traces are created by transforming one artifact into another using a defined rule set. The transformation is not required to be performed automatically, but it has to follow unambiguous procedures. A subset of functional traces is the set of explicit traces which are either created together with the artifact as a by-product of the transformation, or which can be unambiguously reconstructed at any time by analyzing the original artifact, the transformed artifact, and the transformation rules. Note that functional traces, and particularly explicit traces, are usually applicable to model-based artifacts (but not much to test and code artifacts) where models have formally defined syntax and semantics. Non-functional traces are those that Pinheiro [58] categorizes into the aspects of reason, context, decision, and technical. They refer to traces of an informal nature. These traces result from more or less creative process, such as semantically analyzing and extracting customer requirements from a set of meeting minutes. In the technical category, non-functional traces could,
for instance, exist on the parts of the code that are affected by quality requirements.
Trace direction. A traceability link can be unidirectional (such as depends-on) or bidirectional (such as alternative-for). The direction of a link, however, only serves as an indication of order in time or causality. It does not constrain its (technical) navigability; traceability links can always be followed in both directions [10].
Traceability scheme. A traceability scheme determines the artifacts involved in a trace link recovery process and, for each artifact, the granularity level at which trace links are recorded [10]. In our context, tests and code are the artifacts from which trace links are recovered. The granularity level for code artifacts is a class, whereas for a test artifact it is a JUnit test case. Put simply, this criterion gives information about which types of links were recovered.
Scalability. It is important that traceability approaches be scalable [35], for both capturing links and presenting the linked information to users. In manual or minimally-automated settings, scalability is more achievable due to the incremental capture and maintenance of trace links. Nevertheless, this criterion analyses whether the approach can be efficiently applied to large systems. In this case, a scalable traceability approach is as applicable to larger projects as it is to small projects [56].
Evaluation. This criterion determines how the approach was evaluated in the original work, i.e. by case study, survey study, or controlled experimental study.
Visualization. Visualization is an important aspect of traceability, as it allows users to verify the quality of trace links [35]. Thus, it is important for traceability recovery approaches to provide useful graphical information to help depict traces and to flag outdated or suspect links.
Tool support. An approach might theoretically automate the process of generating links; but, no matter how, it requires a tool to put the underlying theory into practice. Thus, this criterion evaluates whether the approach provides any tool support for facilitating traceability links. In this regard, the architecture of a given tool support can be classified into three types: Web-based, Eclipse plug-in, or standalone.
B. Results and Discussion
The characterization results with respect to the comparison criteria are shown and summarized in Table I. From Table I, it can clearly be seen that only a small body of work provides support for test-to-code traceability recovery in the literature. This observation implies, once again, that previous attempts and studies to recover traceability links between test cases and source code have been very limited, and are still far from ideal for real-world software.
1) Tool Support: Given Table I, one noticeable feature concerns tool support, where the results indicate that the practical realization of test-to-code traceability recovery is very poor at this moment. In the course of reviewing the state of the art in this work, we found that the lack of proper practical support is one of the main issues associated with existing approaches. The existing prototype tools are limited in their functionality, and mostly not distributed beyond the researchers involved in the initial work, typically for evaluation purposes.
This shortcoming in available traceability tools is perhaps why, as [34], [59] show, the majority of companies prefer utilizing manual traceability methods, e.g. traceability matrices stored in spreadsheets, though there might exist some tools claiming to provide traceability support. Nevertheless, to the best of our knowledge, and according to the table, the toolset (in the form of an Eclipse plug-in, and associated with the SCOTCH approach) reported in [60] is the only tool to support traceability in this domain. However, this toolset does not yet have the industrial-strength to support automated extraction of traceability links between applications and tests; it requires users to update many aspects of the traceability data. Moreover, the results show the absence of commercial offthe-shelf (COTS) tools in this domain. Therefore, developing adequate COTS test-to-code traceability tools for the needs of the software engineering industry, particularly testing, would be a growing demand. Note that COTS tools are typically marketed as complete requirements management packages, meaning that traceability is only one added feature. The traceability features usually only work if the project methodology is based around the tool itself. Unless the project is developed from the ground up using a particular tool, the tool is unable to provide much benefit without significant rework. In this case, support for heterogeneous computing environments can also be lacking [59]. Thus the literature stresses the need for industrial-strength tools, instead of providing simplistic support. From another point of view, the lack of tool support in this domain can question the tracing accuracy of the approaches themselves that do not provide any. This argument is based on the results of a series of controlled experiments performed by De Lucia et al. [57] in which it was concluded that the presence of tools can positively affect the tracing accuracy of approaches. This outcome also has led some researchers to conclude that poor tool support is the root cause of the lack of proper traceability implementation [61]. 2) Visualization: An interesting point that also warrants discussion is that none of the current approaches provides visualization support along with the recovered traceability links. As we mentioned before, visualization is an important aspect of traceability as it allows users (either project engineers or customers) to verify the quality of trace links [35] by providing improved process visibility. Because real software projects become naturally larger during their development, and present highly complex structures, testing these software projects would be non-trivial, and require a large set of test suites. Traceability visualization would be an important consideration, and preferred not only by users but also helpful in testing and management tasks. Hence, this can be a fruitful avenue of further research in the traceability community to explore appropriate visualization techniques between tests and code artifacts. 3) Evaluation: According to the recent recommendations of the empirical software engineering community [62], most of the current evaluations in the field are not performed rigorously enough. Thus, it remains difficult to generalize the results of the evaluations or the actual effect size of an approach. This condition is also the case with evaluations performed in the test-to-code traceability domain. Our results show that these
approaches are not usually empirically evaluated. In addition, most of the evaluations do not include the full spectrum of comparisons among all the approaches, whereas they have chosen a subset of approaches to compare. Also, the need for using real and potential software as subjects in empirical studies is another issue in this domain, though reaching them might be difficult for researchers. To address the more rigorous empirical evaluations, it is important to evaluate the claims (with respect to pervious results) made in the existing literature, and critically assess these claims against empirical evidence and replications from the field with a more thorough design of study in the future. 4) Scalability: For scalability criterion, it was challenging to determine the scalability of the approaches as there was no evidence showing their applicability to large software systems. In this regard, with reference to the evaluation part of each original work, considering the size and number of subject projects used, and the fact that the availability of tool support can also be an influencing factor for a better applicability of an approach for supporting larger projects, we determined the scalability of the approaches as follows. This view has been somewhat supported by Almeida et al. [63] and De Lucia et al. [57]. Because all the first six approaches (presented in Sections III-Athrough III-F) went through the exact same evaluation process with no tool support provided, they were all marked to be capable of being used in small up to medium sized projects. On the contrary, because SCOTCH (the seventh approach, presented in Section III-G) had used slightly bigger software projects in its evaluation process, and it had provided tool support, it was marked to be capable of supporting both small and large systems. However, this scalability determination still calls for further research to be conducted to quantitatively address how existing test-to-code traceability recovery approaches can perform in large-scale industrial and practical settings. 5) Automation: The results also show that automation is not only a central issue in traditional software engineering tasks but also in traceability link recovery between tests and code. We observed that only two approaches had the capability of being fully automatic. However, their underlying techniques use a heavyweight process to capture traceability link recovery. This problem can affect the efficiency and performance of using these approaches in practice. But, it has to be noted that, in almost all software domains, there is a classic trade-off to be considered at the balances of efficiency and effectiveness; i.e., in this context, effectiveness refers to the accuracy of recovered links. The results in this particular criterion also created an instance of this typical trade-off in which two approaches were found to be automatic, but perhaps less efficient, in terms of effort or computation time. Thus, if the effectiveness (which is in most of the cases far more important) is important, and time and effort is not limited, it would be preferable to use these automatic approaches. Nevertheless, it has to be said that the current degree of automation of traceability approaches in this domain is still not established due to some factors (such as absence or impreciseness of information concerning code elements and trace links) that inhibit the automation of traceability practices. This result
in turn outlines a need for more appropriate techniques to make the capture of trace information facilitated and automated in more efficient and effective ways.
6) Summary: Based on our observations from reviewing and analyzing this particular research area, and the fact that each of these traceability recovery approaches improves on the issues raised about previous approaches, at the moment, and to the best of our knowledge and understanding of the surveyed approaches, the traceability recovery approach using slicing and coupling (SCOTCH) [52] appears to be one of the most complete test-to-code traceability recovery approaches useful in object-oriented contexts at the present time. Despite the fact that SCOTCH (as the most recently proposed approach) has shown itself to be more promising than the other test-to-code traceability recovery approaches, it has not yet made its way into practice. This means there is no evidence on the application of SCOTCH by practitioners and industry. On the other hand, previous research, and our experience from reviewing the state-of-the-art literature in this study, show that the approaches based on developer conventions and discipline, such as NC, and IR-based approaches, such as LA, have been the best and most frequently used approaches over the past years, recording traces to the greatest possible extent. This outcome is also in line with the findings from [14].
V. CHALLENGES IN TEST-TO-CODE TRACEABILITY RECOVERY
Despite the benefits that test-to-code traceability offers to software developers and testers, from the comparative analysis of the approaches for test-to-code traceability recovery in this study we identified the following further challenges (i.e. open issues) in realizing the current state of the art in this field. Answering these challenges would answer the third research question (RQ3) presented in Section I.
• The variety of techniques used for describing different approaches makes it difficult to manage a fine-grained evaluation basis (to validate the approaches) for recovered trace links, due to their heterogeneity. This difficulty is the reason authors of previous studies did not include a complete set of comparisons among all the approaches under evaluation. Moreover, because different third-party CASE tools might be used during the various traceability steps, particularly in IR-based techniques, traditional evaluation processes might face more inconsistency issues in collecting and managing data. As a remedy, we suggest providing more approaches that support in-place traceability [36].
• Constructing better traces for enabling traceability, and the use of the facilities provided by test-driven techniques, should be better explored. The semantics of traceability links and their structure is an open issue. Although this matter is not impacted by the testing process or other development processes, testers may help the automatic creation of traceability links on the basis of metadata that presents a good taxonomy of trace dependencies, and that expresses the connectivity of trace dependencies in test suites. The semantic gap can be resolved by defining intra-domain synonyms, and utilizing a tool that can support domain-specific synonym matching [36].
• Little automation is available to cope with traceability. The limited degree of automation of traceability approaches, and the generally poor tool support for implementing traceability in this domain, is a big challenge. This challenge is due to the fact that more appropriate techniques are still needed to advance the development of effective, automated approaches. Depending on the way test code is applied and presented, the capture of trace information in test suites can be facilitated and automated with respect to knowledge about the source code and its relevant programming conventions.
• It is still not clear whether recovering trace links between tests and incremental production code processes can support each other efficiently. The reviewed traceability approaches lack support for incremental and iterative methodologies, most importantly agile software development. In a general sense, this issue has posed a challenge not only to the current test-to-code traceability approaches but also to the area of requirements traceability. The reason is the wide industry adoption of agile software development in recent years, which has meant that the assumptions the traditional approaches rely on no longer hold true. This challenge in turn implies that the current traceability approaches cannot keep pace with the rapid growth and adoption of agile development. To overcome this challenge, traceability approaches should take into account the assumptions underlying agile software development processes by investigating techniques that are useful for agile development environments.
• Another open issue is the investigation of trace recovery between code elements and tests when the traces are not described explicitly. An example of this situation is when aspect code (provided by aspect-oriented programming (AOP) [64]) or feature-related code (provided by feature-oriented programming (FOP) [65]) is used in relation to base code for developing software systems. In this case, due to the obliviousness property associated with the nature of these programming paradigms, the relationships between the injected code (e.g. aspect code) and the rest of the code in the system remain implicit. Unfortunately, the reviewed traceability approaches do not explore mechanisms for discovering important implicit traces in their underlying techniques. This issue raises questions for traceability recovery. How should the implicit relationships between modules in code and their respective tests be handled? Further, what kinds of trace information does a module (e.g. a class) bring from its own sources to its targets? In our opinion, the questions regarding this issue are worth addressing when dealing with modularized code development paradigms such as AOP or FOP.
• Techniques for the evolution of trace links are not explored by the reviewed approaches to enable connectivity between code and tests, especially during regression testing. In our view, the automatic update of traces based on a test suite's changes should be considered as a way to keep the traceability information consistent, and thus achieve a high quality traceability process. This
action requires that traces should evolve as tests and code evolve accordingly. In this sense, the tool support offered by the approaches was somehow incomplete, and requires further advancement to address this issue more explicitly. Another related challenge that remains is how we cope with the update of traces (given the target code) using a less costly procedure, if the execution of the complete test suit is not desired again. This matter has also been raised as one of the issues of the existing traceability approaches to fit into modern agile-based software. • Last, the need of considering traceability in distributed environments is another open issue to address. Testing distributed software systems is challenging, and non-trivial, where coding for development of such systems is usually performed from different sites and countries. In this regard, performing traceability tasks in this domain should take into account this matter to gain wide adoption as it can play an important role in providing a better quality and reliability understanding of distributed software systems (as benefits to testers), while consuming less effort and time in testing-related tasks. In our view, many of these challenges can be overcome through technical solutions that involve creating cost-effective, quality, automated tools that improve upon the design and feature set of currently available approaches and tools. Thus, researchers and practitioners interested in traceability in this domain continue to investigate solutions to improve the performance of trace results.
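As a concrete illustration of the trace-evolution challenge raised in the list above, the following sketch (our own, with hypothetical types and file names) recomputes links only for test files reported as changed, rather than rebuilding the entire link set after every change.

```java
// Minimal sketch of incremental trace maintenance: only links whose test file changed
// are recomputed; all other links are kept as-is. The recovery function is a stub that
// could be backed by any of the approaches surveyed in Section III.
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class IncrementalTraceUpdater {

    /**
     * @param currentLinks map from test file to its currently recorded unit under test
     * @param changedTests test files touched by the latest change set (e.g. from the VCS diff)
     * @param recover      recovery strategy that recomputes the UUT for one test file
     */
    public static Map<String, String> update(Map<String, String> currentLinks,
                                             Set<String> changedTests,
                                             Function<String, String> recover) {
        Map<String, String> updated = new HashMap<>(currentLinks);
        for (String test : changedTests) {
            updated.put(test, recover.apply(test)); // refresh only the affected links
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> links = Map.of("MoneyTest.java", "Money.java");
        Map<String, String> refreshed =
                update(links, Set.of("MoneyTest.java"), test -> "Money.java");
        System.out.println(refreshed);
    }
}
```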
VI. THREATS TO VALIDITY
This section discusses the threats to validity for this study, focusing attention on general threats that could affect our results.
• The sources used to extract the approaches are mainly published research papers, especially from international journals or conference proceedings. Such papers usually contain compressed information, so some relevant information with respect to an approach's characterization or data analyses might not be presented. Seen in this way, this could be a threat to the study. However, since these papers were already published and well accepted by the international community in terms of their contributions, completeness, and clarity, it would not be a severe threat. In addition, whenever possible, we tried to access the long versions of the papers, such as dissertation reports or technical reports, to get a broader and clearer understanding.
• This threat is related to the identification process of the papers (i.e. the approaches). Although we have tried to separate requirements traceability from traceability related to tests, it turned out that in some cases a clear separation between these topics was not possible. It is possible that we overlooked some relevant papers. However, we used a methodological process (as illustrated in Fig. 2) to alleviate this threat, ensuring that we selected the most relevant and representative sources.
• The last threat is related to the fact that the answers determined for some criteria (in Table I) were somewhat subjective. For instance, for trace direction, it was not easy to determine the directions, as there was no explicit information regarding the traced artifacts' interactions when the traceability recovery tasks were designed. To mitigate this threat, we tried further contacting the proposers of the selected approaches whenever further clarification and double checking were needed.
VII. RELATED WORK
A close look at the presented test-to-code traceability recovery approaches shows that they are based on different techniques (as can be seen from Table I), and are mostly related to the areas of requirements engineering (RE), model-driven development (MDD), and information retrieval (IR). Accordingly, we focus here on the traceability literature concerning survey work, which has expanded considerably in recent years in these three areas.

A. Requirements Engineering
One of the earliest works, presented in [66], surveyed tracing approaches in traditional software engineering, with a focus on requirements traceability. This survey provided an elaborate taxonomy of the main concepts affecting traceability, and integrated existing tracing approaches from practice and research into that taxonomy (taken from [67]), investigating several aspects such as technology and organization. In another work, Winkler and Pilgrim [10] performed a comprehensive survey of traceability research and practice in both requirements engineering and model-driven development. The results of their literature study were classified and reported according to four categories: basics (a description of traceability and associated topics), working with traces (how traceability can be achieved and used), practice (the state of the practice and what limits the application of traceability in industry), and solutions (approaches to overcome these limitations). One of the achievements of this work was to bridge the communication gap between requirements engineers and model-driven developers as a way of achieving traceability. Rochimah et al. [68] theoretically evaluated seven traceability approaches from the software evolution point of view. The approaches were drawn from different categories of techniques, including IR-based, rule-based, event-based, hypertext-based, feature-model-based, value-based, and scenario-based approaches, all focusing on requirements artifacts to address traceability issues. The evaluation was based on a literature study characterizing the approaches under evaluation with respect to the software evolution framework proposed in [69]. One limitation of this work is that it does not give any clear guideline, based on its results, to help choose an approach. Most recently, Torkar et al. [70] conducted an extensive systematic review of requirements traceability research. In this work, requirements traceability definitions, challenges, tools, and techniques were examined; accordingly, challenges and current issues associated with available tools and
techniques were presented. Additionally, the results and analysis were complemented with a static validation in an industrial case study through a series of interviews. In our view, this recent work can serve as an inspiration for researchers and provide them with interesting avenues of exploration suited to their areas of interest.

B. Model-Driven Development
Model-driven development, also called model-driven engineering (MDE), is a promising paradigm for developing programs by machine-assisted model transformations, which help save human effort and reduce the possibility of introducing program faults [71]. Traceability has been a topic of great interest to software engineers and researchers in this area, resulting in a number of techniques, tools, and survey studies. Aizenbud-Reshef et al. [72] reviewed advances in technologies to automate traceability, and discussed the potential roles of model-driven development in this field. In another work, Galvão and Goknil [56] performed a survey of traceability approaches in MDE. This survey discussed the state of the art and evaluated the approaches with respect to five comparison criteria: representation, mapping, scalability, change impact analysis, and tool support. The approaches considered were classified into three categories: requirements-driven approaches, modeling approaches, and transformation approaches. As a general result, this work gave a list of open issues on tracing requirements and model elements in MDE that deserve further exploration. As with our study in this paper, this survey also showed that tool support to automate traceability in MDE is crucial and needs further development.

C. Information Retrieval
In the last decade, many authors have applied IR methods to the problem of recovering traceability links between different software artifacts, which has resulted in various tracing techniques and approaches. This activity, in turn, has motivated the research community to compare and assess these techniques, which deal with different kinds of entities and relationships to be traced, in order to identify their effectiveness and characteristics. As a result, several analytical and survey studies have emerged, either providing methodological advice or mapping previous research. Abadi et al. [73] compared five IR-based techniques for traceability between code and documentation within their own proposed recovery process. The techniques were latent semantic indexing [50], the vector space model [74], the Jensen-Shannon similarity model, probabilistic latent semantic indexing (PLSI) [75], and sufficient dimensionality reduction (SDR) [76], which were compared on two datasets: SCA version 2.2, and CORBA version 2.3. Their main conclusion was that the two IR techniques that best fit the traceability recovery task are the vector space model and the Jensen-Shannon similarity model, even though neither performs dimensionality reduction.
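To illustrate the vector space idea mentioned above, the following minimal sketch ranks candidate code classes for a given test by cosine similarity over identifier terms. It is not the implementation used by Abadi et al. [73] or by any of the surveyed approaches; it uses raw term frequencies rather than tf-idf, LSI, or the Jensen-Shannon model, and the test and class names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Split camelCase identifiers and punctuation into lower-case terms."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(test_source: str, code_units: dict) -> list:
    """Rank code units by textual similarity to the test source (highest first)."""
    query = Counter(tokenize(test_source))
    scored = [(name, cosine(query, Counter(tokenize(src)))) for name, src in code_units.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    test_src = "class AccountTest { void testDepositIncreasesBalance() { account.deposit(10); } }"
    code = {
        "Account": "class Account { int balance; void deposit(int amount) { balance += amount; } }",
        "Logger": "class Logger { void log(String message) { } }",
    }
    print(rank_candidates(test_src, code))  # 'Account' is expected to rank above 'Logger'
```

In practice, the surveyed IR-based approaches weight terms (e.g., tf-idf or LSI) and often combine textual similarity with structural evidence such as naming conventions, slicing, or call graphs; the sketch only conveys the ranking principle.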
De Lucia et al. [57] assessed the usefulness of IR-based traceability recovery tools through controlled experiments. The main result of the two experiments in this study was that the use of a traceability recovery tool can significantly reduce the time spent by the software engineer compared with manual tracing. They also investigated whether subjects' experience and ability play any role in the traceability link identification process, concluding that different levels of experience and ability of the software engineers can affect the traceability outcomes. Similarly, Sundaram et al. [21] presented an analytical survey focusing on the evaluation of IR-based traceability recovery approaches such as latent semantic indexing, vector space retrieval, and probabilistic IR. They concentrated on the importance of the vocabulary base used for tracing, and on the evaluation and assessment of traceability mappings and methods using secondary measures within the IR context, supplementing their examination with empirical results. Similar to the work presented in [66], a recent work by Borg et al. [77] provided a context taxonomy for IR-based traceability tools as a means of evaluating them, motivated by the observation that most evaluations of such tools have focused on benchmarking tool output. They concluded that datasets used for traceability recovery evaluations should be thoroughly characterized, especially when they cannot be disclosed, so as to enable replications and secondary studies in future research; datasets used without a clear characterization were found to be one of the major issues that can bias traceability outcomes in current evaluations.

D. Summary of Related Work
We observed that there is considerably more research on evaluating IR-based traceability approaches than in the other two areas. One possible reason for this observation is the wide use of IR methods in large-scale software development and maintenance, where large amounts of information are generated and serve as useful sources, particularly for establishing traceability links while consuming less time. Overall, this related work study shows that none of the work presented above provides a synthesized analysis devoted solely to test-to-code traceability recovery approaches and their characterizations. In this sense, the present work differs considerably from all the related work reported in the literature on software traceability.
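The controlled experiments and secondary measures discussed above typically score a recovered set of links against a manually validated reference set. The following is a minimal sketch of these standard measures, not tied to any particular tool; it assumes links are modelled as (test, code unit) pairs, and the example data are hypothetical.

```python
def precision_recall_f1(recovered: set, reference: set) -> tuple:
    """Standard IR measures for a set of recovered traceability links.

    Each link is modelled as a (test, code_unit) pair; `reference` is the
    manually validated answer set against which the tool output is scored.
    """
    true_positives = len(recovered & reference)
    precision = true_positives / len(recovered) if recovered else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    recovered = {("AccountTest", "Account"), ("AccountTest", "Logger")}    # tool output
    reference = {("AccountTest", "Account"), ("TransferTest", "Transfer")}  # ground truth
    print(precision_recall_f1(recovered, reference))  # (0.5, 0.5, 0.5)
```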
VIII. CONCLUSIONS AND FUTURE WORK
Traceability is an important topic in the field of software testing, where traces play an important role in supporting change propagation between tests and code, an activity integral to debugging, regression testing, and system evolution. Research on test-to-code traceability recovery is still immature. This paper has presented a theoretical evaluation of the current approaches, assessed with respect to nine general comparison criteria, as a fundamental step towards increasing that maturity by giving an up-to-date view and analysis of the state of research in this area. Our goal for this survey was to encourage the reader to take a broad view of traceability in this domain by presenting the advances achieved so far and the challenges that the practice of traceability in software projects faces today. Consequently, our results contribute to the body of knowledge on the comparison and evaluation of test-to-code traceability recovery, which is currently scarce but growing, in the interest of both the software testing and reliability audiences. In a future line of this research, we intend to survey and analyze additional test-to-code traceability recovery approaches as new ones emerge from ongoing research. Most importantly, as a supplement to the
theoretical evaluation presented in this paper, we plan to perform an experimental study and replications of the claims associated with the current approaches, to empirically assess their effectiveness (e.g., the accuracy of the reported results) and their efficiency (e.g., the cost and effort required to recover links), as part of our future work agenda.

ACKNOWLEDGMENT
The authors would like to thank Dr. Will Venters and the anonymous reviewers for their insightful comments and useful suggestions on earlier versions of this paper. They also thank all researchers whose works are referenced.
REFERENCES [1] B. Zachariah, “Analysis of software testing strategies through attained failure size,” IEEE Trans. Rel., vol. 61, pp. 569–579, 2012. [2] S. Maghsoodloo, D. B. Brown, and C.-J. Lin, “Reliability and cost analysis of an automatic prototype generator test paradigm,” IEEE Trans. Rel., vol. 41, pp. 547–553, 1992. [3] D. Cotroneo, R. Pietrantuono, and S. Russo, “Combining operational and debug testing for improving reliability,” IEEE Trans. Rel., vol. 62, pp. 408–423, 2013. [4] Y. K. Malaiya, M. N. Li, J. M. Bieman, and R. Karcich, “Software reliability growth with test coverage,” IEEE Trans. Rel., vol. 51, pp. 420–426, 2002. [5] B. Zachariah and R. N. Rattihalli, “Failure size proportional models and an analysis of failure detection abilities of software testing strategies,” IEEE Trans. Rel., vol. 65, pp. 246–253, 2007. [6] R. Matias, P. A. Barbetta, K. S. Trivedi, and P. J. F. Filho, “Accelerated degradation tests applied to software aging experiments,” IEEE Trans. Rel., vol. 59, pp. 102–114, 2010. [7] P. Lago, H. Muccini, and H. van Vliet, “A scoped approach to traceability management,” J. Syst. Software, vol. 82, pp. 168–182, 2009. [8] X. Chen and J. Grundy, “Improving automated documentation to code traceability by combining retrieval techniques,” in Proc. 26th IEEE/ACM Int. Conf. Automated Software Eng., 2011, pp. 223–232. [9] R. Dömges and K. Pohl, “Adapting traceability environments to project-specific needs,” Commun. ACM, vol. 41, pp. 54–62, 1998. [10] S. Winkler and J. V. Pilgrim, “A survey of traceability in requirements engineering and model-driven development,” Software Syst. Model., vol. 9, pp. 529–565, 2010. [11] R. Watkins and M. Neal, “Why and how of requirements tracing,” IEEE Software, vol. 11, pp. 104–106, 1994. [12] A. Ghazarian, “A research agenda for software reliability,” IEEE Trans. Rel., IEEE Reliability Soc. Technical Operations Annual Tech. Rep. 2010, vol. 59, pp. 449–482, 2010. [13] C.-Y. Huang and S.-Y. Kuo, “Analysis of incorporating logistic testing-effort function into software reliability modeling,” IEEE Trans. Rel., vol. 51, pp. 261–270, 2002. [14] B. V. Rompaey and S. Demeyer, “Establishing traceability links between unit test cases and units under test,” in Proc. 2009 European Conf. Software Maint. Reeng., 2009, pp. 209–218. [15] Z. Wang, K. Tang, and X. Yao, “Multi-objective approaches to optimal testing resource allocation in modular software systems,” IEEE Trans. Rel., vol. 59, pp. 563–575, 2010. [16] C.-Y. Huang and M. R. Lyu, “Optimal testing resource allocation, sensitivity analysis in software development,” IEEE Trans. Rel., vol. 54, pp. 592–603, 2005. [17] H. Ohtera and S. Yamada, “Optimal allocation and control problems for software-testing resources,” IEEE Trans. Rel., vol. 39, pp. 171–176, 1990. [18] P. Kubat and H. S. Koch, “Managing test-procedures to achieve reliable software,” IEEE Trans. Rel., vol. R-32, pp. 299–303, 1983. [19] S.-Y. Kuo, C.-Y. Huang, and M. R. Lyu, “Framework for modeling software reliability, using various testing-efforts and fault-detection rates,” IEEE Trans. Rel., vol. 50, pp. 310–320, 2001. [20] A. Espinoza and J. Garbajosa, “A study to support agile methods more effectively through traceability,” Innovations Syst. Software Eng., vol. 7, pp. 53–69, 2011. [21] S. K. Sundaram, J. H. Hayes, A. Dekhtyar, and E. A. Holbrook, “Assessing traceability of software engineering artifacts,” Require. Eng., vol. 15, pp. 313–335, 2010. [22] G. Antoniol, G. Canfora, G. Casazza, A. D. Lucia, and E. 
Merlo, “Recovering traceability links between code and documentations,” IEEE Trans. Software Eng., vol. 28, pp. 970–983, 2002. [23] A. Bacchelli, M. Lanza, and R. Robbes, “Linking e-mails and source code artifacts,” in Proc. 32nd ACM/IEEE Int. Conf. Software Eng., Cape Town, South Africa, 2010, pp. 375–384. [24] A. Egyed, “A scenario-driven approach to trace dependency analysis,” IEEE Trans. Software Eng., vol. 29, pp. 116–132, 2003. [25] W. Jirapanthong and A. Zisman, “XTraQue: Traceability for product line systems,” Software Syst. Model., vol. 8, pp. 1619–1366, 2009. [26] A. D. Lucia, F. Fasano, R. Oliveto, and G. Tortora, “Recovering traceability links in software artifact management systems using information retrieval methods,” ACM Trans. Software Eng. Methodol., vol. 16, pp. 13:1–13:50, 2007. [27] A. Marcus and J. I. Maletic, “Recovering documentation-to-sourcecode traceability links using latent semantic indexing,” in Proc. 25th Int. Conf. Software Eng., Portland, OR, USA, 2003, pp. 125–135.
[28] X. Wang, G. Lai, and C. Liu, “Recovering relationships between documentation and source code based on the characteristics of software engineering,” Electron. Notes Theoret. Comput. Sci., vol. 243, pp. 121–137, 2009. [29] R. Witte, Q. Li, F. F. Informatic, Y. Zhang, and J. Rilling, “Text mining and software engineering: An integrated source code and document analysis approach,” IET Software, vol. 2, pp. 1–19, 2008. [30] J. Cleland-Huang, R. Settimi, C. Duan, and X. Zou, “Utilizing supporting evidence to improve dynamic requirements traceability,” in Proc. 13th IEEE Int. Conf. Require. Eng., 2005, pp. 135–144. [31] A. Qusef, G. Bavota, R. Oliveto, A. D. Lucia, and D. Binkley, “Recovering test-to-code traceability using slicing and textual analysis,” J. Syst. Software, vol. 88, pp. 147–168, 2014. [32] Center of Excellence for Software Traceability. [Online]. Available: http://www.coest.org/ Dec. 2012 [33] C. Wiederseiner, V. Garousi, and M. Smith, “Tool support for automated traceability of test/code artifacts in embedded software systems,” in Proc. IEEE 10th Int. Conf. Trust, Security and Privacy in Computing and Communications, 2011, pp. 1109–1117. [34] O. C. Z. Gotel and A. C. W. Finkelstein, “An analysis of the requirements traceability problem,” in Proc. 1st Int. Conf. Require. Eng., 1994, pp. 94–101. [35] H. U. Asuncion, A. U. Asuncion, and R. N. Taylor, “Software traceability with topic modeling,” in Proc. 32nd ACM/IEEE Int. Conf. Software Eng., Cape Town, South Africa, 2010, pp. 95–104. [36] J. Cleland-Huang, R. Settimi, E. Romanova, B. Berenbach, and S. Clark, “Best practices for automated traceability,” Computer, vol. 40, pp. 27–35, 2007. [37] C. S. Corley, N. A. Kraft, L. H. Etzkorn, and S. K. Lukins, “Recovering traceability links between source code and fixed bugs via patch analysis,” in Proc. 6th Int. Workshop Traceability in Emerging Forms of Software Eng., Honolulu, HI, USA, 2011, pp. 31–37. [38] N. Kaushik, L. Tahvildari, and M. Moore, “Reconstructing traceability between bugs and test cases: An experimental study,” in Proc. 18th Working Conf. Reverse Eng., 2011, pp. 411–414. [39] G. Capobianco, A. De Lucia, R. Oliveto, A. Panichella, and S. Panichella, “Traceability recovery using numerical analysis,” in Proc. 16th Working Conf. Reverse Eng., 2009, pp. 195–204. [40] M. Lormans and A. V. Deursen, “Can LSI help reconstructing requirements traceability in design and test?,” in Proc. Conf. Software Maint. Reeng., 2006, pp. 47–56. [41] H. M. Sneed, “Reverse engineering of test cases for selective regression testing,” in Proc. 8th Eur. Conf. Software Maint. Reeng., 2004, pp. 69–74. [42] P. Bouillon, J. Krinke, N. Meyer, and F. Steimann, “EZUNIT: A framework for associating failed unit tests with potential programming errors,” in Proc. 8th Int. Conf. Agile Process. Software Eng. Extreme Program., Como, Italy, 2007, pp. 101–104. [43] B. V. Rompaey, “Developer testing as an asset during software evolution: A series of empirical studies,” Ph.D. dissertation, Universiteit Antwerpen, Antwerp, Belgium, 2009. [44] M. Fewster and D. Graham, Software Test Automation: Effective Use of Test Execution Tools. Boston, MA, USA: ACM Press/AddisonWesley, 1999. [45] G. Meszaros, XUnit Test Patterns: Refactoring Test Code. Boston, MA, USA: Addison-Wesley, 2007. [46] A. Qusef, R. Oliveto, and A. De Lucia, “Recovering traceability links between unit tests and classes under test: An improved method,” in Proc. 2010 IEEE Int. Conf. Software Maint., 2010, pp. 1–10. [47] K. Beck and E. 
Gamma, “Test infected: Programmers love writing tests,” Java Report, vol. 7, pp. 51–56, 1998. [48] B. G. Ryder, “Constructing the call graph of a program,” IEEE Trans. Software Eng., vol. SE-5, pp. 216–226, 1979. [49] A. V. Deursen, L. Moonen, A. V. D. Bergh, and G. Kok, “Refactoring test code,” in Proc. 1st eXtreme Program. Flexible Process. Conf., 2001, pp. 92–95. [50] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” J. Amer. Soc. Inf. Sci., vol. 41, pp. 391–407, 1990. [51] C. Ziftci and I. Kruger, “Feature location using data mining on existing test-cases,” in Proc. 19th Working Conf. Reverse Eng., 2012, pp. 155–164. [52] A. Qusef, G. Bavota, R. Oliveto, A. De Lucia, and D. Binkley, “SCOTCH: Test-to-code traceability using slicing and conceptual coupling,” in Proc. 27th IEEE Int. Conf. Software Maint., 2011, pp. 63–72.
[53] A. Qusef, R. Oliveto, and A. De Lucia, “Recovering traceability links between unit tests and classes under test: An improved method,” in Proc. 2010 IEEE Int. Conf. Software Maint. (ICSM), 2010, pp. 1–10. [54] A. Qusef, “Recovering test-to-code traceability via slicing and conceptual coupling,” in Proc. 18th Working Conf. Reverse Eng., 2011, pp. 417–420. [55] B. Korel and J. W. Laski, “Dynamic slicing of computer programs,” J. Syst. Software, vol. 13, pp. 187–195, 1990. [56] I. Galvao and A. Goknil, “Survey of traceability approaches in model-driven engineering,” in Proc. 11th IEEE Int. Enterprise Distributed Object Computing Conf., 2007, pp. 313–324. [57] A. D. Lucia, R. Oliveto, and G. Tortora, “Assessing IR-based traceability recovery tools through controlled experiments,” Empirical Software Eng., vol. 14, pp. 57–92, 2009. [58] F. A. C. Pinheiro, “Requirements traceability,” in Perspectives on Software Requirements, J. C. D. Sampaio do Prado Leite and J. Horacio, Eds. Berlin, Germany: Springer, 2003, pp. 93–113. [59] A. Kannenberg and H. Saiedian, “Why software requirements traceability remains a challenge,” CrossTalk, J. Defense Software Eng., pp. 14–19, 2009. [60] A. Qusef, G. Bavota, R. Oliveto, A. D. Lucia, and D. Binkley, “Evaluating test-to-code traceability recovery methods through controlled experiments,” J. Software: Evolution and Process, 2013, doi: 10.1002/smr.1573. [61] G. Spanoudakis, A. Zisman, E. Pérez-Miñana, and P. Krause, “Rule-based generation of requirements traceability relations,” J. Syst. Software, vol. 72, pp. 105–127, 2004. [62] M. Ivarsson and T. Gorschek, “A method for evaluating rigor and industrial relevance of technology evaluations,” Empirical Software Eng., vol. 16, pp. 365–395, 2011. [63] J. P. Almeida, P. V. Eck, and M.-E. Iacob, “Requirements traceability and transformation conformance in model-driven development,” in Proc. 10th IEEE Int. Enterprise Distributed Object Computing Conf., 2006, pp. 355–366. [64] G. Kiczales, “Aspect-oriented programming,” ACM Comput. Surveys, vol. 28, 1996. [65] D. S. Batory, J. N. Sarvela, and A. Rauschmayer, “Scaling step-wise refinement,” IEEE Trans. Software Eng., vol. 30, pp. 355–371, 2004. [66] B. Paech and A. V. Knethen, “A Survey on Tracing Approaches in Practice and Research,” Fraunhofer-Institut Experimentelles Software Engineering, Kaiserslautern, Germany, Technical Report IESE, Report Nr. 095.01/E, 2002. [67] A. V. Knethen, “Change-oriented requirements traceability. Support for evolution of embedded systems,” in Proc. Int. Conf. Software Maint., 2002, pp. 482–485. [68] S. Rochimah, W. M. N. Kadir, and A. H. Abdullah, “An evaluation of traceability approaches to support software evolution,” in Proc. Int. Conf. Software Eng. Adv., 2007, pp. 19–26. [69] J. Buckley, T. Mens, M. Zenger, A. Rashid, and G. Kneisel, “Towards a taxonomy of software change,” Int. J. Software Maint. Evolution: Research and Practice, vol. 17, pp. 309–332, 2005. [70] R. Torkar, T. Gorschek, R. Feldt, M. Svahnberg, U. A. Raja, and K. Kamran, “Requirements traceability: A systematic review and industry case study,” Int. J. Software Eng. Knowledge Eng., vol. 22, pp. 385–433, 2012.
[71] J. S. Her, H. Yuan, and S. D. Kim, “Traceability-centric model-driven object-oriented engineering,” Inf. Software Technol., vol. 52, pp. 845–870, 2010. [72] N. Aizenbud-Reshef, B. T. Nolan, J. Rubin, and Y. Shaham-Gafni, “Model traceability,” IBM Syst. J., vol. 45, pp. 515–526, 2006. [73] A. Abadi, M. Nisenson, and Y. Simionovici, “A traceability technique for specifications,” in Proc. 16th IEEE Int. Conf. Program Comprehension, 2008, pp. 103–112. [74] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Commun. ACM, vol. 18, pp. 613–620, 1975. [75] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf. Research and Development in Information Retrieval, 1999, pp. 50–57. [76] A. Globerson and N. Tishby, “Sufficient dimensionality reduction,” J. Machine Learn. Res., vol. 3, pp. 1307–1331, 2003. [77] M. Borg, P. Runeson, and L. Brodén, “Evaluation of traceability recovery in context: A taxonomy for information retrieval tools,” in Proc. 16th Int. Conf. Evaluation & Assessment in Software Eng., 2012, pp. 111–120.
Reza Meimandi Parizi received the Doctoral degree in software engineering. He is a Lecturer in the School of Computing and IT, Taylor’s University, Selangor, Malaysia. His research interests in software engineering include automated software testing, software traceability, software reliability, object- and aspect-oriented programming, and empirical studies.
Sai Peck Lee (M’99) received the Ph.D. degree in computer science from Université Paris 1 Panthéon-Sorbonne, France. She is a Professor in the Department of Software Engineering, University of Malaya, Kuala Lumpur, Malaysia. Her current research interests include requirements engineering, software traceability and clustering, software quality, software reuse, object-oriented techniques and CASE tools development. She has published an academic book and book chapters, as well as over 100 papers in various refereed journals and conference proceedings. She has actively participated in conference program committees and is currently in several expert review panels, both locally and internationally.
Mohammad Dabbagh received the Master's degree in software engineering. He is pursuing the Ph.D. degree in the Department of Software Engineering at the University of Malaya, Kuala Lumpur, Malaysia, where he is currently working as a research assistant. His research interests include requirements engineering, requirements prioritization, software quality, and software traceability.