The Journal of Systems and Software 86 (2013) 587–603
A mapping study to investigate component-based software system metrics
Majdi Abdellatief, Abu Bakar Md Sultan, Abdul Azim Abdul Ghani, Marzanah A. Jabar
Department of Information System, Faculty of Computer Science & Information Technology, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
Article history: Received 20 May 2011; Received in revised form 7 August 2012; Accepted 5 October 2012; Available online 13 October 2012
Keywords: Systematic mapping study; Software metrics; Software components; Component-based software system; Software quality
Abstract
A component-based software system (CBSS) is a software system that is developed by integrating components that have been deployed independently. In the last few years, many researchers have proposed metrics to evaluate CBSS attributes. However, the practical use of these metrics can be difficult. For example, some of the metrics have concepts that either overlap or are not well defined, which could hinder their implementation. The aim of this study is to understand, classify and analyze existing research on component-based metrics, focusing on the approaches and elements that are used to evaluate the quality of a CBSS and its components from a component consumer's point of view. This paper presents a systematic mapping study of several metrics that were proposed to measure the quality of a CBSS and its components. We found 17 proposals that can be applied to evaluate CBSSs, while 14 proposals can be applied to evaluate individual components in isolation. The various elements of the software components that are measured are reviewed and discussed. Only a few of the proposed metrics are soundly defined. The quality assessment of the primary studies detected many limitations and suggested guidelines for improving and increasing the acceptance of metrics. However, it remains a challenge to characterize and evaluate a CBSS and its components quantitatively. For this reason, much effort must be made to achieve a better evaluation approach in the future.
1. Introduction
Component-based software engineering (CBSE) is characterized by two development processes: the development of components for reuse and the development of component-based software systems (CBSSs) with reuse, by integrating components that have been deployed independently. CBSE has proved to be a best-practice development paradigm in terms of both time and cost (Heineman and Councill, 2001; Crnkovic and Larsson, 2002; Cesare et al., 2006; Pandeya and Tripathi, 2011). For the continued success of this development approach, the evaluation of CBSSs and individual components is an essential research area. Being able to objectively measure the quality of CBSS attributes helps us to better understand, evaluate and control the quality of CBSSs and to isolate weaknesses over the entire software life cycle. The two different processes of CBSE lead us to distinguish between metrics that are relevant to component producers and those that are relevant to component consumers. Component producers are concerned with the design, implementation and maintenance of individual
components, whereas component consumers search for specific components, evaluate them and integrate them to construct a CBSS. In the literature, there is a consensus that CBSE metrics require a different approach from that of structural or object-oriented metrics. One of the difficulties of applying the existing traditional metrics to a CBSS is the inadequacy of the measurement unit (Cho et al., 2001; Gill and Grover, 2003). Procedural metrics focus on measures that are derived from code, for example, lines of code (LOC). Object-oriented metrics focus on measures that are derived from both the code level and higher-level units, such as methods, classes, packages or subsystems. Code-based metrics are clearly inappropriate for CBSS evaluation because components are considered to be black-box software. Object-oriented metrics are also restricted in their application to CBSSs because CBSS interfaces are usually specified at the component level, not at the class level. Thus, several authors have described different techniques and guidelines and have proposed a wide-ranging set of metrics for assessing the quality of CBSS attributes. The limitations of existing CBSE metrics approaches are not only the lack of consistent approaches and measures that provide a reliable method to evaluate component quality, but also the ambiguity of their definitions and the lack of appropriate mathematical properties, which can invalidate quality metrics. When a measure is informally presented (i.e., using human language), practitioners applying a specific metric to the same system can interpret it in different ways and obtain completely different results (Narasimhan et al., 2009; Serban et al., 2010).
The work described in this paper not only extends and updates the previous reviews (Bertoa et al., 2003; Goulão and Abreu, 2004b; Mahmood et al., 2005; Ismail et al., 2008; Kalaimagal and Srinivasan, 2008) but also aims to support and direct future research. Our review differs from previous reviews of the literature on CBSS quality evaluation with respect to the following elements:
• Different goal. The main aim of this review is to understand, classify and analyze existing metrics for measuring the quality of CBSSs and their components, in order to direct and support future research, while the other reviews (Bertoa et al., 2003; Goulão and Abreu, 2004b; Mahmood et al., 2005; Ismail et al., 2008; Kalaimagal and Srinivasan, 2008) aim mainly at providing an overview of quality models for component evaluation. Certainly, a difference in goals leads to a different focus.
• Different scope and review perspective. Software measurement involves not only the definition of adequate metrics for quality attributes but also the extent to which those metrics are empirically validated. In this paper, our review is focused on both metric definition and validation. The reviews in Bertoa et al. (2003), Goulão and Abreu (2004b), Ismail et al. (2008) and Kalaimagal and Srinivasan (2008) covered a wider scope (i.e., CBSS quality models and related attributes, CBSS metrics and validation approaches), and the survey by Mahmood et al. (2005) covered the areas of cost estimation, formalization techniques for component models and metrics for the following quality attributes: reliability, performance, maintainability and testability. While the other reviews considered the perspectives of both the producers and the consumers of the components, our review emphasizes the component consumer's viewpoint only.
• Systematic mapping review and more comprehensive approach. We based our review on a systematic mapping review, which led to the identification of 36 studies. The review in Kalaimagal and Srinivasan (2008) is based on only 3 articles, that in Ismail et al. (2008) is based on only 4 articles, and that in Goulão and Abreu (2004b) is based on 9 articles. In the review by Mahmood et al. (2005), it is difficult to determine how many primary studies contributed to their study. None of the previous reviews is a systematic mapping review (Kitchenham, 2004, 2007; Biolchini et al., 2005). Compared to a traditional literature review, a systematic review has advantages: a well-defined methodology that reduces bias and a wider context that allows for general conclusions (Petersen et al., 2008).
• Classification of studies. We classify the identified papers with respect to the scope (Goulão and Abreu, 2004b), the study context (Jorgensen and Shepperd, 2007), the target component model and the granularity of the metrics. We also classify the metrics-based approaches for measuring the quality of a CBSS with respect to the metric name, description, assumption and metric interpretation. Goulão and Abreu (2004b) provided a framework for the characterization of component evaluation approaches, to enable a comparative review of the proposals. The framework includes the following: an outline of the aim of the proposal, the type of component that is considered, how the metrics are defined, the most noticeable feature of the proposal and an assessment of its maturity level. The reviews in Bertoa et al. (2003), Goulão and Abreu (2004b), Mahmood et al. (2005), Ismail et al.
(2008), Kalaimagal and Srinivasan (2008) and Jianguo et al. (2009) discussed each paper in their review according to this framework. Based on what we believed were interesting issues to review, we conducted a systematic mapping study that is a form of systematic literature review with the same basic methodology but with the aim of identifying and categorizing the available research on a
specific topic. The systematic mapping review method has allowed us to identify the relationship between the researchers and the practitioners, to assess the current state of metrics research in the context of CBSSs and to identify areas that need improvement by outlining the limitations of the current research. We believe that the results obtained from this mapping study are important for the community of researchers who want to know the gaps in the literature and who want to understand the topics that have been researched. This review will also be useful for practitioners, as an indication of maturity in the selection of the existing metrics and as a way to remain up to date with the state of the art. In addition, new and enhanced metrics can be proposed based on the research that has already been performed in this area.
This paper is organized as follows: Section 2 discusses the CBSS concept. Section 3 describes the methodology. Section 4 presents the results for our research questions in more detail. Section 5 discusses and analyzes the results. Section 6 concludes the paper and identifies future trends.
2. CBSS concept from the perspective of defining metrics
Several definitions of a component are given in Crnkovic and Larsson (2002), Faison (2002), Szyperski (2002) and Gill and Grover (2003); each of the definitions states different characteristics of software components. This variety indicates that there is much debate in the literature about the definition of software components. Some definitions focus on structural characteristics and others on functional characteristics. What is common across all of the definitions is the notion of reusability. Reusability implies that the functionality contained in the component can be accessed by others. The functionality of components is defined by their interfaces. Component internals are hidden and are unreachable except via abstract interfaces. The software engineering community as a whole has no agreement on a single definition of software components, but there is a relatively large consensus on Szyperski's definition (Szyperski, 2002). Following Szyperski's definition, CBSSs are complex artifacts that contain component elements, each of which may possess its own attributes. These component elements can be identified in terms of the underlying development methodology. For example, in the CBSE paradigm, the fundamental building blocks could be interfaces, which are organized into components and may have properties, methods and events, and the relationships among the interfaces. As mentioned earlier, CBSE metrics are viewed from the perspectives of the producers and the consumers of the components. From both perspectives, component evaluation is considered to be an important activity. A clear discussion of both the producer and the consumer perspective is provided in Venkatesan and Krishnamoorthy (2009). However, we need metrics for Szyperski's software components if we want to talk about "a real" attempt to turn software development into engineering through the principles of CBSE, with the goal of reducing development cost and effort. In this paper, following Szyperski's definition, we only consider primary studies in which components are black boxes.
From a component consumer's perspective, internal code metrics for the analysis of components are not useful.2 Instead, the complexity of the interface, the reusability of the component in different contexts, testability, suitability and the dependencies of components are the interesting attributes that should
2 This claim should, as we understand it, not be interpreted outside the context of the CBSE metrics hypothesis. Obviously, from both the component consumer and the component producer perspectives, there is a need for metrics for Szyperski's software components as a primary interest, in order to develop consistent approaches and measures that provide a reliable method to evaluate CBSS quality.
Fig. 1. Simplified model of a component: a component specification defines an interface (methods, events and properties) and a body that implements it; the specification and interface are visible to both component developers and CBSS developers, whereas the body is visible only to component developers.
be assessed. Certainly, the perspective of the component producers is also important, because they not only produce the source code but also cannot simply ignore the quality of its internal structure. In addition, much component-based development is performed 'in-house', where developers would have access to the source code (if required). However, in CBSE we need a uniform approach for rating CBSS quality from the perspective of both the component producers and the consumers, according to Szyperski's definition. Therefore, a white-box metrics viewpoint is considered out of the scope of this review.
In this paper, we visualize software component concepts from the perspective of component developers and CBSS developers. This strategy offers an efficient approach to understanding and assessing the different sets of existing metrics. Fig. 1 provides a simplified model of a component in which a specification defines the functionality and behavior of a component, which is composed of an interface part and a body part. The specification and interface are visible to CBSS developers, whereas the specification, interface and body are all visible to component developers. The interface definition includes the declaration of the methods, properties and events that specify the functionality and behavior identified in the specification. The body of the component implements the external methods, and any other internal methods that are needed to provide the functionality and behavior identified in the specification. Metrics may be derived from the specification, interface or body, but only metrics that are derived from the interface and specification can be used by CBSS developers. Therefore, for a better understanding of the interface specification, we further refine the generic model of a component to explain the elements of an interface. This strategy provides a clear picture of the elements of a software component that need to be considered in measurement.
2.1. Required elements of a software component
The measurement of software components should be based on the elements that characterize the software components, such as the interface, which is composed of methods, events, properties and their signatures (Hamilton, 1997; Han, 1998; Gill and Grover, 2004; Sharma et al., 2008). These elements mainly define the overall capability of the software components. In the context of defining metrics, we think of interfaces, methods, events and properties as elements that have characteristics or attributes and also have relationships with other elements.
2.1.1. Interfaces
An interface is defined as the "name of a collection of functions, and provides only the descriptions and the protocols of these functions" (Crnkovic and Larsson, 2002). Specifically, an interface can only contain declarations of named methods, properties and events. The implementation of an interface is completed in the body of a component using any programming language. The code that implements an interface should be completely inaccessible and invisible to CBSS developers.
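To make this visibility split concrete, the following sketch (ours, not taken from any of the primary studies; all names are illustrative) models a component whose interface part is declared separately from the body that implements it:

```python
from abc import ABC, abstractmethod

class ISpellChecker(ABC):
    """Interface part: the only view a CBSS developer gets.

    It declares a method, a property and an event hook, but no implementation.
    """

    @abstractmethod
    def check(self, text: str) -> list[str]:
        """Method declaration: return the misspelled words in `text`."""

    @property
    @abstractmethod
    def language(self) -> str:
        """Property declaration: the dictionary language in use."""

    @abstractmethod
    def on_word_rejected(self, callback) -> None:
        """Event declaration: register a callback fired when a word is rejected."""

class SpellCheckerBody(ISpellChecker):
    """Body part: visible only to the component developer."""

    def __init__(self, language: str = "en"):
        self._language = language
        self._listeners = []
        self._dictionary = {"component", "interface", "metric"}  # toy dictionary

    def check(self, text: str) -> list[str]:
        bad = [w for w in text.lower().split() if w not in self._dictionary]
        for w in bad:
            for cb in self._listeners:
                cb(w)  # raise the event for each rejected word
        return bad

    @property
    def language(self) -> str:
        return self._language

    def on_word_rejected(self, callback) -> None:
        self._listeners.append(callback)
```

A consumer-side metric can only count what ISpellChecker exposes (one method, one property and one event registration point); everything inside SpellCheckerBody is out of reach for black-box measurement.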
2.1.2. Interface methods
The interface specification describes the functionalities of components through method declarations. A method is specified in terms of its parameter types, which constitute the signature of the method. Message sending is a function of the method that is implemented in the component, rather than of the interface; for this reason, an interface receives messages through the methods that are built into a component. Moreover, a method specifies inputs and outputs, to capture the dynamic behavioral capability of the component (Mahmood and Lai, 2008).
2.1.3. Property
First, we should note the difference between attributes in object-oriented programming and properties in component-based development. In object-oriented programming, attributes represent a class's variables, while in CBD, a property is a type of method that exposes the state of an attribute by reading and writing its value. However, the distinction between an attribute and a property depends largely on the context, and in many cases they are synonyms. In an interface specification, a property is a name that is given to an attribute (Hamilton, 1997; Faison, 2002; Sharp, 2008). A set of properties is provided with each component that conforms to a given function (Washizaki et al., 2003). In the .NET and JavaBeans models, CBSS developers can change property values through visual interfaces or by using code, to customize and configure components at run time (Han, 1998; Washizaki et al., 2003; Gill and Grover, 2004; Sharma et al., 2008). For example, consider televisions, personal computers (PCs) and remote-control devices. They have properties such as an ON/OFF button, volume adjustment, a channel changer and color adjustment. According to the way in which CBSS developers manipulate the property values, we can classify the following types of properties:
• Simple property: a property with a single value for which changes are independent of changes in any other property. For example, the PC screen ON/OFF button.
• Indexed property: a property that supports a range of values instead of a single value. For example, the volume adjustment could have a range of values.
• Bound property: a property for which a change results in a notification that is sent to other components. For example, the channel changer.
• Constrained property: a property for which a change results in validation by another component, which may reject the change if it is not appropriate. For example, if the channel changer were set to an invalid channel value, the device may not respond.
2.1.4. Event
An event is a name that is given to a function in an interface specification. In practice, an event is an action that is performed on the control as a notification, which means that something of interest occurred or the current state changed. All of the component technologies, such as .NET and JavaBeans, offer standard mechanisms and tools to control the dynamic behavior of a component (Gill and Grover, 2004). In the context of programming, events are similar to methods, but semantically they stand for calls that are initiated by external causes (and are usually associated with asynchronous processing, for example, event-driven programming), while "normal" methods are for calls that are made under program control (i.e., synchronous processing).
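Purely as an illustration of the four property kinds (a sketch of ours using the television example above; the class and member names are hypothetical, and a real JavaBeans component would use PropertyChangeListener/VetoableChangeListener machinery instead):

```python
class Television:
    """Toy component illustrating simple, indexed, bound and constrained properties."""

    def __init__(self):
        self._power = False           # simple property
        self._volume = 5              # indexed property (accepts 0..10)
        self._channel = 1             # bound + constrained property
        self._channel_listeners = []  # components notified when the channel changes
        self._valid_channels = range(1, 100)

    # Simple property: changing it affects nothing else.
    @property
    def power(self):
        return self._power

    @power.setter
    def power(self, on):
        self._power = bool(on)

    # Indexed property: accepts any value from a range rather than a single value.
    def set_volume(self, level):
        if level not in range(0, 11):
            raise ValueError("volume must be 0..10")
        self._volume = level

    # Bound property: other components register to be notified of changes.
    def add_channel_listener(self, callback):
        self._channel_listeners.append(callback)

    # Constrained property: the change is validated and may be rejected (vetoed).
    def set_channel(self, channel):
        if channel not in self._valid_channels:
            raise ValueError("invalid channel rejected")  # veto the change
        self._channel = channel
        for notify in self._channel_listeners:
            notify(channel)  # bound behaviour: broadcast the new value
```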
Fig. 2. Steps of the search strategy: Step 1, an electronic search of ACM Digital Library, IEEE Xplore, SpringerLink, Scopus, ScienceDirect and Google Scholar (2445 papers); Step 2, exclude studies on the basis of software metrics (455 papers); Step 3, exclude studies on the basis of CBSE metrics (36 papers).
Other than these situations, methods and events are based on similar concepts (Cesare et al., 2006; Sharma et al., 2008).
2.1.5. Signatures
A method declaration includes a list of parameters that can be passed to the method when it is called by another method (or by itself). The signature includes the parameter list and the return values, and each element of the signature must have its data type declared. A property and an event are declared in a similar way (Mahmood and Lai, 2008).
3. Research method
3.1. Protocol development
This paper presents a systematic mapping study of CBSS metrics based on the guidelines proposed by Kitchenham (2004, 2007). We started by reviewing existing systematic literature reviews (Brereton et al., 2007; Jorgensen and Shepperd, 2007; Staples and Niazi, 2007; Beecham et al., 2008; Catal and Diri, 2009; Khurum and Gorschek, 2009; Kitchenham et al., 2009; Lucas et al., 2009; Alves et al., 2010; Karg et al., 2010; Lisboa et al., 2010; Williams and Carver, 2010; Chen and Ali Babar, 2011). We then concentrated on developing a protocol for a systematic mapping study that addresses questions related to the evaluation of CBSS attributes, following the guidelines of Budgen et al. (2008), Petersen et al. (2008), Kitchenham et al. (2010, 2011), Nakagawa et al. (2010) and Palacios et al. (2011). In the following sections, we detail each process that we use. The execution of the overall process involved iteration, consultation and refinement of the defined process as a result of an anonymous review of an earlier version of this paper.
3.2. Research questions and motivation
The following research questions have been addressed.
RQ1. Are the measurements performed on entire CBSSs, or are they instead performed on an individual component in isolation? In assessing software artifacts, it is important to understand the granularity level of the metrics. For example, consider failures per day: a failure could be recorded in terms of computer execution time (i.e., fine granularity) or calendar time (i.e., coarse granularity) (Kaner and Bond, 2004). Some authors (Narasimhan et al., 2009; Sedigh-Ali et al., 2001) claimed that individual component evaluation has become a very important task in CBSSs. Others (Wallnau and Stafford, 2002; Goulao and Abreu, 2005; Mahmood et al., 2005) claimed that evaluations should be performed on assemblies rather than on individual components. To address this question, we investigated and classified existing metrics as to whether a measurement is performed on the CBSS or on a single component; we describe the investigation in Section 5.2.
RQ2. Which elements of a CBSS are being measured? How were these elements defined and validated? Given that a component is similar to a black box, we will discuss this question in Section 5.3 based on the definition of the metrics with respect to a number of issues, as follows:
- An ambiguous definition of the elements can make it difficult to collect metrics data reliably and could lead to an incorrect interpretation of the metric values.
- Whether the elements being measured are visible to CBSS developers or only to the component developers.
- Whether the metrics definition and formulation are validated.
- Limitations that restrict the practical use of the metrics from the perspective of measurement theory (Fenton and Pfleeger, 1997).
RQ3. Are there limitations on the current research? The aim of this question is to identify any gaps in the current research, to suggest areas for future research. We will discuss this question in Section 5.4 with respect to the limitations that were identified by this mapping study.
3.3. Search process
To determine how many primary studies relate to these research questions, we conducted an automated search to collect papers on CBSE metrics. The results obtained are shown in Fig. 2. In step 1, based on our experience and the terms used in Dybå and Dingsøyr (2008), Gómez et al. (2008), Khurum and Gorschek (2009) and Kitchenham (2010), we identified the following search strings:
1. measure OR metric OR quality OR evaluation OR attribute
2. software AND component
3. component-based AND software
4. COTS AND software
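As noted in the next paragraph, all possible combinations of these search strings were tested against the selected databases. Purely as an illustration (our sketch; the exact query syntax submitted to each database is an assumption), such combinations could be enumerated as follows:

```python
from itertools import product

# The four search-string groups listed above (Section 3.3).
measurement_terms = ["measure", "metric", "quality", "evaluation", "attribute"]
subject_strings = ["software AND component",
                   "component-based AND software",
                   "COTS AND software"]

# One way to pair every measurement term with every subject string,
# e.g. 'metric AND (component-based AND software)'.
queries = [f"{m} AND ({s})" for m, s in product(measurement_terms, subject_strings)]

print(len(queries))  # 15 candidate query strings
print(queries[0])    # measure AND (software AND component)
```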
To make the search comprehensive and precise, an expert librarian was consulted. All of the possible combinations of these identified search strings were tested in the following databases: ACM Digital Library, IEEE Xplore, SpringerLink, Scopus, ScienceDirect and Google Scholar. These databases were selected because they are accessible through our library. In step 2, a quick review of the titles resulted in 455 papers that looked relevant to software metrics in general (including object-oriented and procedural metrics). Step 2 was planned to ensure that no important articles were missed. The electronic versions of the papers were stored in a RefWorks3 system, for ease of access during the review. In step 3, a more detailed review of the title, keywords and abstract, using the exclusion and inclusion criteria defined in Section 3.4, was performed. Basically, only studies about the evaluation of CBSSs were selected. Then, the reference lists
3 RefWorks is an online research management, writing and collaboration tool. It is designed to help researchers easily gather, manage, store and share all types of information, as well as generate citations and bibliographies (http://www.refworks.com/).
containing the primary studies identified in the first step were searched manually. This step resulted in a list of 36 papers. A total of 31 of the 36 studies were primary studies, while five were secondary studies. Other researchers (Jorgensen and Shepperd, 2007; Karg et al., 2010) used a similar or the same search approach.
3.4. Inclusion and exclusion criteria
With respect to the research questions that are addressed in this paper, we excluded the following:
(a) In step 2: irrelevant studies or papers that lie outside the field of software metrics.
(b) In step 3:
• Studies that are related to object-oriented and procedural metrics.
• Studies on process metrics and resource metrics, because these metrics do not measure attributes that are used to evaluate CBSSs or their components.
• Duplicate publications of the same study in different journals or by different publishers. This step is necessary because SCOPUS indexes IEEE, ACM and ELSEVIER publications.
• Implementation metrics for individual components (i.e., white-box metrics). Using a white-box viewpoint and considering metrics over the internal structure of a component is no different from what software engineering researchers have done for object-oriented software since the mid-nineties, e.g., C&K metrics (Chidamber and Kemerer, 1994) or MOOD metrics (e Abreu, 1995). As such, it cannot be claimed to be CBSE-specific.
In contrast, papers on the following topics were included:
• Both CBSS metrics and individual component metrics.
• Specifically, metrics that were proposed to evaluate internal and external quality attributes.
• Papers published before 2011.
3.5. Quality Assessment Questions (QAQ) of primary studies
It is not essential to include an assessment of quality in mapping studies, as discussed in Kitchenham et al. (2010). However, in this study, the goal of the quality evaluation is to assess whether the proposed metrics are meaningful and whether the findings presented would be of use to practitioners. While the research questions (RQs) aim to characterize each metric according to the basic principles of CBSE and the representation of measurement theory, the QAQs are an attempt to provide a brief overview of each proposal and to measure the quality of the reporting of a study's concept, aims, context, data collection and analysis. Taken together, these QAQs could represent the concerns of the researchers and practitioners of the metrics. Therefore, the importance of such QAQs is not only to improve the quality of on-going studies but also to encourage researchers to assess their proposals before submitting them for publication. To address our goal, we used the following questions:
QAQ1. Did the authors justify the need for their metrics or state what problem the metrics are intended to solve, and provide a clear statement of the aims of the proposal?
QAQ2. Did the authors appropriately present a research design to address the aims of the metric with respect to the underlying framework for the attributes (or a quality model)?
QAQ3. Did the authors provide a specific hypothesis to be tested, state it clearly prior to defining the metric and discuss the theory
from which it is derived? Without an underlying theory, and with only a shallow hypothesis, we cannot understand the metric. Consequently, we use inconsistent approaches and obtain inconsistent results. A good example of defining a hypothesis from a theory can be found in Vinter et al. (1998).
QAQ4. Did the authors provide a clear, unambiguous definition of the individual metrics and explain how the metric values could be measured from a specific entity at a specific point in time? A clear definition of the entities and attributes, and of how to measure them consistently, is very important (Kitchenham et al., 1995).
QAQ5. Did the authors clearly identify who the metrics user is? It is also important to identify the target users of the metric. These users are mainly software architects, designers, analysts, developers, testers and maintainers, who can be viewed from the component producer or consumer perspectives.
QAQ6. Did the authors specify the context in which the metrics would be used? For example, did they specify the point in the process when the metrics would be extracted and used, such as CBSS construction, CBSS quality assessment, CBSS deployment, or CBSS performance evaluation? It is difficult to understand and apply metrics if the context of a study is not fully defined. For example, for the context of software maintenance, see Kitchenham et al. (1999).
QAQ7. Did the authors explain how the metrics data could be gathered? For example, did they explain the appropriate data collection tool and how the metric values could be interpreted, to meet the needs of the metrics user? Kitchenham et al. (2001) provided a good discussion of many of the problems with data collection. Without a clear data collection template, practitioners applying a certain metric to the same system can interpret the collected data in different ways and obtain completely different results (Narasimhan et al., 2009; Serban et al., 2010).
QAQ8. Did the authors identify any pre-conditions that must be met, or constraints/limitations that are related to the metrics, or how validity is assured? For example, are they appropriate only for specific component models, only one point in the CBSS process, or only large/small systems? Many authors have discussed the differences between the underlying component model technologies (Estublier and Favre, 2002; Hnetynka, 2004; Coronato et al., 2005; Wang and Gian, 2005), and researchers may want to replicate the studies in different contexts.
The questions QAQ1 to QAQ8 were answered on an ordinal scale, as shown in Table 1.
3.6. Data extraction
The candidate studies were collected, and all of the data related to the research questions and the broader aims of this study were extracted. The information that was extracted from each primary study included the following:
1. Whether the proposal applies to individual components or the full CBSS (see Tables 3 and 4).
2. Metric definition context: the goal of the paper, the CBSS attributes that are measured and how the authors defined the software components.
3. Target component model of the proposed metrics.
4. Granularity level of the metrics (see Tables 3 and 4).
5. Whether the metrics are collected at the component level or the CBSS level (see Tables 5–7).
6. Metrics full names and metrics acronyms.
7. Metrics descriptions: which state what measurement approach is used and how it is operationalized (i.e., what is being counted or measured).
Table 1. The answers scored criteria.
The answers | Ordinal scale of the answers
The answers are explicitly written in the primary study | Yes
The answers can be mostly inferred from the primary study | Mostly
The answers can be somewhat inferred from the primary study | Somewhat
The answers are undetectable in the primary study or unknown | No
Table 2. Summary of primary studies: an overview of the approaches that were followed to develop the CBSE metrics.
Approach | Meaning of the approach
Internal structural-based metrics | Paper that attempts to measure the attributes of a component or CBSSs based on the detailed analysis of a component's design, which is not available
Graph theory-based metrics | Paper that attempts to measure the attributes of components or CBSSs based on a graph theory representation
Specification-based metrics | Paper that attempts to measure the attributes of a component or CBSSs based on the specification of the CBSS
Quality model-based metrics | Paper that attempts to measure the attributes of components or CBSSs based on a specific quality model
Others | Paper that attempts to discuss the requirements for metrics for CBSE. No actual metrics are proposed; instead, the papers proposed a set of quality attributes that should be measured through metrics or a framework for metrics definitions
8. Metrics assumptions and interpretation guidelines: how the metric values could be interpreted, to meet the needs of the metrics users.
This information allowed us to record the full details of the primary studies and to be specific about how each of them addressed our research questions.
3.7. Data analysis
Based on Goulão and Abreu (2004b), we first classified the identified papers into component evaluation papers and CBSS evaluation papers, according to the information that was extracted from the metric definition context and the granularity level of the metrics columns in Tables 3 and 4. We further extended the classification with respect to whether the individual metrics apply to individual components or to full CBSSs, according to the context of the metric calculation and the metric definition. A metric is classified as an individual component metric if the context of the metric value calculation requires a single independent component. Such a metric can be applied to measure individual component attributes before
the integration process. A metric is classified as a CBSS metric if the metric value calculation context requires an assembly. These metrics can be applied to measure and compare individual component attributes in the context of a CBSS. If a metric can be applied in both contexts, we base our classification on the authors' view and the paper's objectives. The data extracted in Section 3.6 are analyzed with respect to the RQs and QAQs, as shown in Fig. 3. For example, the answer to RQ1 is extracted from "whether the proposal applies to an individual component or a full CBSS" and "whether the metrics are collected at the component level or the CBSS level".
4. Results
The results for each research question are presented in the subsequent sections.
4.1. Primary study background
The summary data were generated by categorizing the research studies, as shown in Table 2. Of the 36 papers that were identified
Fig. 3. Mapping between research questions and quality assessment questions with data extraction.
Table 3. Approaches to the evaluation of individual components.
Primary studies | Metric context | Target component model | Granularity level of metric | Level of validation
Bertoa and Vallecillo (2002) | They defined, informally, metrics for the COTS component quality model. Szyperski's definition of the software component (Szyperski, 2002) is adopted, but it is treated as COTS (large grained) | – | Component | –
Gill and Grover (2003) | They informally identified the need for specific types of measures, without actually proposing any measures. They analyzed existing popular definitions of components and then derived a new component definition | – | Component | –
Boxall and Araban (2004) | A set of metrics for measuring the understandability and reusability of component interfaces is presented. Szyperski's definition of a software component (Szyperski, 2002) is adopted | .NET and COM | Interface | Small experiment
Washizaki et al. (2003) | Metrics for the reusability of a component are proposed and validated. Component is treated as a JavaBean component model (Hamilton, 1997) | JavaBean | Component | Independently validated by Goulão and Abreu (2004a)
Gill and Grover (2004) | Interface complexity of a component is studied. The characterization of a component presented in Han (1998) is adopted | – | Interface | –
Alvaro et al. (2005) | A component quality model and related metrics for the evaluation are presented. Component is treated as black-box software | – | Component | Industrial experimental
Sharma et al. (2008) | Interface complexity of a component is studied. Component is treated as a JavaBeans component model (Hamilton, 1997) | JavaBean | Interface and component | Small experiments
Khimta et al. (2008) | A component complexity metric is proposed. Szyperski's definition of a software component (Szyperski, 2002) is adopted | JavaBean | Component | Small experiments
Voas and Payne (2000) | A Test Quality Rating (TQR) metric is proposed, to measure the dependability of components. Component is treated as a software system delivered in executable format | – | Component | Industrial experiments
Dias and Richardson (2001) | It presents a high-level discussion of component dependencies rather than a concrete proposal. Component is treated as COTS software | JavaBeans, COM and CORBA | Component | –
Goulão and Abreu (2004a) | A formalization and an independent validation of the reusability metrics proposed by Washizaki et al. (2003) are presented. Component is treated as a JavaBean component model (Hamilton, 1997) | JavaBean | Component | –
Bertoa et al. (2006) | A set of metrics to assess the usability of a software component is developed. Szyperski's definition of a software component (Szyperski, 2002) is adopted, but it is treated as COTS (large grained) | JavaBean, .NET and ActiveX | Component | Small experiments
Rotaru and Dobre (2005) | Adaptability, composability and complexity of individual components are studied. Szyperski's definition of a software component (Szyperski, 2002) is adopted | – | Interface and component | Anecdotal
Kaur and Mann (2010) | Cost-based selection, which is calculated based on the quality attributes of the components, is presented. No specific definition has been adopted | – | Component | Anecdotal
to be research studies, 31 were primary studies, while five were secondary studies.
• 18% of the studies assume that a component is an assemblage of classes and that the standard object-oriented metrics can be applied to the individual classes, summing up to give a measure for the component. There may be additional assumptions related to a component interface, which is assumed to be a class-based interface. However, given that the source code of the software components is not available, and with respect to the basic characteristics of Szyperski's components, these assumptions may not be completely true.
• Another 18% of the studies assume that the CBSS can be modeled graphically, with the components as nodes and the interfaces as edges, to identify direct interactions between components. They utilize this approach mainly because of the unavailability of component source code. However, Deo (2004) has discussed some unsolved problems of graph theory in engineering and computer science.
• 36% of the research papers proposed specification-based metrics. These papers have assumed (correctly) the view of a component that is presented in Section 2 (see Fig. 1), and they proposed a set of metrics that depend on the information that is available in the specification of the components.
Table 4. Approaches to the evaluation of CBSS.
Primary studies | Metric context | Target component model | Granularity level of metrics | Level of validation
Narasimhan and Hendradjaya (2007) | Static and dynamic aspects for the assembly of components are proposed. Szyperski's definition of a software component (Szyperski, 2002) is adopted | CORBA | Components and CBSS | Small experimental, by link-up in Narasimhan et al. (2009)
Alhazbi (2004) | Architecture complexity of CBSS is studied, based on graph theory as a medium to represent a CBSS. No specific definition has been adopted | – | Component and CBSS | –
Mahmood and Lai (2008) | Structural complexity of a CBSS written in UML is studied. Szyperski's definition of a software component (Szyperski, 2002) is adopted | UML | Interface and component | Anecdotal
Salman (2006) | Structural complexity of CBSS is studied and evaluated as maintainability indicators. Szyperski's definition of a software component (Szyperski, 2002) is adopted | – | CBSS | Anecdotal
Wijayasiriwardhane and Lai (2010) | A function-point-like approach, named the component point, to measure the size of a CBSS written in UML is presented. Szyperski's definition of a software component (Szyperski, 2002) is adopted | UML | Interfaces and components | Small experiment
Kharb and Singh (2008) | Interaction complexity of CBSS is studied. Szyperski's definition of a software component (Szyperski, 2002) is adopted | – | Component and CBSS | –
Gill and Balkishan (2008) | Analysis of the dependency and interaction complexity of CBSS is studied based on graph theory. Component is treated as black-box software | – | Component and CBSS | Anecdotal
Sharma et al. (2009) | A link-list based technique to measure the interaction density and dependency level of individual components and a CBSS is presented. Component is treated as black-box software | JavaBean | Component and CBSS | Anecdotal
Hoek et al. (2003) | Based on the concept of service utilization, a set of metrics to measure the fitness of a component in a specific architecture is developed. Product line components | – | Component and CBSS | Anecdotal
Venkatesan and Krishnamoorthy (2009) | Informal metrics for three internal attributes, namely the suitability, accuracy and complexity, and four external attributes, namely the usability, maintainability, reusability and performance, are developed. Software component is treated as black-box software | – | Component and CBSS | Anecdotal
Sedigh-Ali et al. (2001) | Metrics for the analysis of the cost and quality of COTS components are presented. Hopkins' definition of a software component (Hopkins, 2000) is adopted | – | COTS-based system | –
Wei et al. (2009) | Based on graph theory, a set of metrics is developed to measure the structure of the CBSS architecture. Szyperski's definition of a software component (Szyperski, 2002) is adopted | JavaBean and CORBA | CBSS | Anecdotal
Narasimhan et al. (2009) | A comparison of three suites of metrics is studied, using a benchmark software program. Software component is treated as black-box software | – | Classes, components and CBSS | Small experiment
Seker et al. (2004) | Coupling and cohesion metrics are proposed based on graph theory. Software component is treated as black-box software | – | Component and CBSS | Anecdotal
Ratneshwer (2010) | Based on graph theory, they defined a component dependency relationship. No specific definition has been adopted | – | Components | Anecdotal
Serban et al. (2010) | A conceptual framework for component-based metrics definition is presented. Software component is treated as black-box software | – | CBSS | –
Goulao and Abreu (2005) | A metamodel extension to capture the formal definition of metrics for CBSE is presented. Szyperski's definition of a software component (Szyperski, 2002) is adopted | CORBA | Component and CBSS | Anecdotal
• The rest of the papers have adopted a more sophisticated approach for introducing the proposed metric. They argued that, before using metrics for the design or integration of components, a relationship between metrics and quality attributes should be established, to ensure that the metrics provide a correct evaluation of the attribute that is visible to the user.
4.2. Quality assessment of primary studies
We assessed the primary studies for quality using the QAQs that were addressed in Section 3.5. The quality assessment for each primary study is shown in Appendix A. The assessment was extracted in three steps. First, the first author selected the candidate studies
and extracted all of the answers that were related to the quality assessment questions. We then randomly allocated 11 papers to each author of this study to assess independently. Second, all of the answers collected from each primary study were discussed in a formal meeting by the other authors. During the meeting, the authors carefully cross-checked each answer for each QAQ, and justification was provided by each author. In case of a disagreement, a negotiation took place until we reached agreement. Third, the rows that are marked with an asterisk were discussed and checked with the second author, as a sample to ensure that the answers to the criteria did not result in bias. However, at first we did not achieve a result that indicated reliable agreement. We resolved this situation by iterating the above three steps again, to achieve agreement.
Fig. 4. Overall quality assessment.
Fig. 4 presents an overview of the quality levels for each of the QAQs that are described in the previous section. This step is an attempt to measure how strong a case the original authors made when presenting their proposed metric. Our point is that it is not possible to define and validate a metric without clearly stating the addressed QAQs. In this chart, from left to right, we present each QAQ; from front to back, we present each of the analyzed rating scales; on the vertical axis, we have the quality level of each question. The overall low level of quality throughout the several ratings presented in our QAQs suggests that the metrics described in these papers have a number of limitations. The most interesting part is that, for QAQ1, 64% and 25% of the primary studies scored "Yes" and "Mostly", respectively, giving a total of 89% (30 papers). These scores suggest that there is strong justification for the need for CBSE-specific evaluation approaches. In contrast, the most disappointing aspect of the primary studies is the methodological weakness in their research process, which is evident in QAQ2, where 58% and 10% of the primary studies scored "Somewhat" and "No", respectively. This result may have occurred because most of the primary studies are conference papers and would have had limitations on the number of pages.
4.3. Component evaluation versus CBSS evaluation (RQ1)
In this sub-section, we provide detailed information on the set of primary studies that are included in this paper, as shown in Tables 3 and 4. For each paper, we identified the metric context, the target component model, the granularity of the metrics and the level of validation. The metric context column summarizes the aim of the paper with respect to the CBSS attributes being measured and how the authors treated the software components. We present the definition of the component that is adopted in each paper, to avoid any confusion that may arise in its absence. It is interesting to note that 28% of the primary studies explicitly adopted Szyperski's definition, while 33% implicitly adopted it by treating software components as black-box software. The target component model column states whether the authors limit the proposed metrics to a specific component model (e.g., CCM, .NET, JavaBeans or EJB). The fields that are marked with a dash are fields for which the authors of the metrics did not specify a component model for the proposed metrics. Component developers build software components that meet the requirements of a specific
component technology, such as JavaBeans, COM, CORBA, EJB or .NET. Each component technology has its own requirements with regard to the model specification and deployment and has different techniques for interaction. Furthermore, each component technology is supported by a builder tool. For example, Microsoft Visual Studio is a builder tool for .NET and COM technologies, and the BDK (Bean Development Kit) is the builder tool for JavaBeans. However, CBSE metrics should, as much as possible, be defined independently of a specific component technology and then be mapped carefully in exact accordance with the target component technology. The definition of the metrics must be both generic enough to encompass all of the component model technologies and specific enough to provide developers with adequate support for implementing products with high quality (Crnkovic and Larsson, 2002; Coronato et al., 2005). It is difficult to understand metrics without understanding the architectural differences between the underlying component models. The column labeled "granularity level of metric" states the aggregation levels of the metrics. For example, the number of interfaces in a system, which can be computed as the sum of the number of interfaces of all of the components in the system, is grouped under CBSS-level metrics because it cannot be computed at the interface level itself. The granularity level constrains the evaluation that can be performed on the software artifacts. The last column, labeled "level of validation", represents the extent to which the proposed metrics have been validated. The level of validation is classified according to the criteria presented in Goulão and Abreu (2004b), as follows: (1) Anecdotal: an example is provided to motivate the usefulness and applicability of the proposed metrics. (2) Small experimental: an experiment is conducted to assess the proposed metrics, but the data sample does not allow a generalized conclusion. (3) Industrial experimental: an experiment with a significant sample of real-world applications is conducted. (4) Independently validated: an experiment made by a third-party team confirms the conclusions made by the original authors. To collect the above information, we conducted a citation analysis using Google Scholar and Scopus to look up follow-up papers that would complement the validation of the metrics presented in an earlier paper.
Table 5. Example of metrics that can be collected at the CBSS level from the specification.
Reference | Metric name | Description | Assumption and interpretation guidelines
Salman (2006) | Total Number of Components (TNC) | This metric counts the total number of components in the system | A CBSS that has many components indicates the need for extra effort on integration and on making corrections to errors
Salman (2006) | Average Number of Methods per Component (ANMC) | This metric is estimated by dividing the total number of methods by the total number of components in the system | A component that has many methods indicates the need for extra effort on integration and on making corrections to errors
Salman (2006) | Average Number of Interfaces per Component (ANIC) | This metric is estimated by dividing the total number of interfaces by the total number of components in a system | A CBSS with many interfaces indicates the need for extra effort on integration and on making corrections to errors
Narasimhan and Hendradjaya (2007) | Component Interaction Density (CID) | The ratio of the actual number of interactions to the available number of interactions in a component | A CBSS or a component that has a higher density of interactions has a higher complexity
Narasimhan and Hendradjaya (2007) | Component Incoming Interaction Density (CIID) | The ratio of the actual number of incoming interactions (required interfaces) to the available number of incoming interactions in a component | A component that has a higher density of required interfaces needs extra testing effort
Narasimhan and Hendradjaya (2007) | Component Outgoing Interaction Density (COID) | The ratio of the actual number of outgoing interactions (provided interfaces) to the available number of outgoing interactions in a component | A component that has a higher density of provided interfaces needs extra testing effort
Narasimhan and Hendradjaya (2007) | Link Criticality Metric (CRITlink) | This metric counts the number of components whose links exceed a given critical value | A CBSS that has a high link criticality requires substantial testing effort
Narasimhan and Hendradjaya (2007) | Bridge Criticality Metric (CRITbridge) | This metric counts the number of bridge components in a system (a bridge component links two or more components in an application) | A CBSS that has a high bridge criticality has a high chance of failure
Narasimhan and Hendradjaya (2007) | Size Criticality Metric (CRITsize) | This metric counts the number of components whose size exceeds a given value | A component that has a high size criticality has potential risks
Kharb and Singh (2008) | Percentage of Component Interaction (CI%) | This metric is estimated by dividing the number of incoming interactions used by the number of outgoing interactions used in a component | A CBSS or a component that has a higher number of interactions has a higher complexity
Hoek et al. (2003) | Provided Services Utilization (PSU) | This metric estimates the ratio of the number of services provided by component X that are actually used by other components over the total number of services provided by component X | A component with a PSU close to zero exposes extra functionality that is not used by other components, which affects performance and reusability
Hoek et al. (2003) | Required Services Utilization (RSU) | This metric estimates the ratio of the number of services required by component X that are actually used over the total number of services required by component X | A component with an RSU close to zero may not function well within an architecture
Hoek et al. (2003) | Compound Provided Services Utilization (CPSU) | The sum of the actual number of services provided over the sum of the total number of services provided in an application | A low CPSU or CRSU indicates an unbalanced architecture; in practice, a low CPSU and a high CRSU imply that more services are provided than are needed
Hoek et al. (2003) | Compound Required Services Utilization (CRSU) | The sum of the actual number of services required over the sum of the total number of services required in an application | Conversely, a high CPSU and a low CRSU indicate that the required services are fewer than what is needed
Mahmood and Lai (2008) | Interface Complexity (IC) | This metric counts the number of parameters and the number of methods that are in an interface, assigning weights based on IFPUG standard weights | An interface that has a lower value of IC has a lower functional complexity
Mahmood and Lai (2008) | Interaction Complexity of Component (CC) | The ratio of the number of messages exchanged in a method over the total number of messages exchanged in an interface, together with the number of data types involved in the message exchange for an interface method | A component that has a higher CC is more complex
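As an illustration of how the service-utilization metrics in Table 5 could be computed from a specification-level description of an assembly (a sketch of ours; the component names, service sets and wiring are hypothetical, and the formulas follow the descriptions above):

```python
# Each component lists the services it provides and requires; `wiring` records
# which (provider, service) pairs are actually bound to each requiring component.
components = {
    "Catalog": {"provides": {"search", "list", "export"}, "requires": {"log"}},
    "Cart":    {"provides": {"add", "checkout"},          "requires": {"search", "log"}},
    "Logger":  {"provides": {"log"},                      "requires": set()},
}
wiring = {  # requiring component -> set of (provider, service) actually used
    "Cart":    {("Catalog", "search"), ("Logger", "log")},
    "Catalog": {("Logger", "log")},
}

def psu(name):
    """Provided Services Utilization: used provided services / total provided."""
    provided = components[name]["provides"]
    used = {svc for uses in wiring.values() for prov, svc in uses if prov == name}
    return len(used) / len(provided) if provided else 0.0

def rsu(name):
    """Required Services Utilization: required services actually bound / total required."""
    required = components[name]["requires"]
    bound = {svc for _, svc in wiring.get(name, set())}
    return len(required & bound) / len(required) if required else 1.0

def cpsu():
    """Compound PSU: all actually used provided services / all provided services."""
    total = sum(len(c["provides"]) for c in components.values())
    used = sum(len({s for uses in wiring.values() for p, s in uses if p == n})
               for n in components)
    return used / total

print(psu("Catalog"), rsu("Cart"), cpsu())  # ≈0.33, 1.0, ≈0.33
```

In this toy assembly, the low PSU for Catalog flags its unused list and export services, in line with the interpretation guideline for PSU in Table 5.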
The results of the citation analysis were summarized in the column labeled "level of validation". The fields that are marked with a dash are fields for which we did not find any of the above four criteria for the proposed metrics. It should be noted that several of the proposed metrics are from research that is still in progress. The studies presented in Table 3 are mostly targeted at the evaluation of individual components in isolation. The obvious viewpoint for this situation would be that a component consumer should
always attempt to choose the best components to optimize the quality of a CBSS, whereas the studies presented in Table 4 are mostly targeted at evaluation at the CBSS architecture level. The obvious viewpoint in this scenario is that a component consumer should also consider the context in which the components of a CBSS will operate: the way in which a component operates with other components in a CBSS may lead to an evaluation that is more meaningful for the CBSS than an evaluation of the component in isolation.
Table 6
Example of some useful metrics that can be collected at the component level (black-box component).

Ref. no. | Metric name | Description | Metric assumption and interpretation guidelines
Washizaki et al. (2003) | Rate of Component Observability (RCO) | The number of readable properties divided by the total number of properties in a component | A component with an RCO value between 0.17 and 0.42 is likely to be easier to understand and reuse
Washizaki et al. (2003) | Rate of Component Customizability (RCC) | The number of writable properties divided by the total number of properties in a component | A component with an RCC value between 0.17 and 0.34 is likely to be easier to understand and reuse
Boxall and Araban (2004) | Arguments per Procedure (APP) | The total number of arguments divided by the total number of procedures | An interface that has fewer procedures and arguments is likely to be easier to understand and reuse
Sharma et al. (2008) | Interface Complexity Metrics (ICM) | The sum of the number of methods and the number of properties declared in an interface, weighted by their complexity (see Table 9) | An interface that has a higher ICM is more complex and more difficult to maintain and reuse
Voas and Payne (2000) | Test Quality Rating (TQR) | This metric provides information concerning how much testing was performed relative to the predicted faults | Component testability scores are likely to translate into a liberal estimate of the amount of testing that is needed for most of the possible faults in the component (Voas and Miller, 1995)
In Tables 5–7, we identify for each metric a reference to the primary study, to facilitate the discussion of the metrics. In the second column, we present the metric name, followed by a brief description of the metric definition. More often than not, the metric definition rests on some assumptions about the theory behind the metric, or on a statement that is assumed to be true; these are summarized in the assumption column. We also state how the metric values relate to the measured attribute and how they can guide CBSS developers in enhancing the quality of a CBSS (Roche and Jackson, 1994). We categorize each metric according to the classification criteria mentioned in Section 3.7, as follows:
• Examples of metrics that can be collected at the CBSS level from the specification (see Table 5).
• Examples of metrics that can be collected at the component level (black-box component) (see Table 6).
• Examples of metrics that can be collected at the CBSS level during the run time (see Table 7).
As shown in Table 7, Narasimhan and Hendradjaya (2007) assume that the execution of a CBSS can be instrumented to measure its execution characteristics. During system testing, this scenario would probably require a user profile (i.e., the expected usage patterns of the CBSS) to guide the execution if performance or reliability were being assessed. They proposed a metric called NC (number of cycles). To illustrate this metric, suppose that we have an assembly of executable components that perform a common service. When this assembly is executed, the components call other components through various interfaces. Such processes create cycles within the graph representation of the components (Narasimhan and Hendradjaya, 2007). Such dynamic metrics can identify heavily used components in the overall assemblage that require intensive testing and/or identify components
that act as performance bottlenecks and are candidates for replacement. However, although the authors are careful to define AC, they do not clearly state how these metrics are to be measured. The proposed metrics are validated against the theoretical properties proposed by Weyuker (1988). Later on, the proposed metrics were also empirically validated and benchmarked using several large-scale, publicly available software systems in Narasimhan et al. (2009). However, the lack of threshold values could restrict the practical use of the metrics.
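To make the flavour of such dynamic metrics concrete, the following is a minimal sketch (in Python, using the networkx library; the call trace and component names are hypothetical, not taken from Narasimhan and Hendradjaya's work) that approximates NC by counting the elementary cycles in a directed graph of observed component calls.

# A minimal sketch: approximate NC from a recorded trace of component calls.
import networkx as nx

observed_calls = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "B")]
G = nx.DiGraph(observed_calls)          # nodes are components, edges are observed calls

cycles = list(nx.simple_cycles(G))      # elementary cycles in the call graph
nc = len(cycles)                        # the NC value for this execution

# Components that appear in at least one cycle are candidates for intensive testing.
heavily_cycled = sorted({c for cycle in cycles for c in cycle})

print("NC =", nc)                       # NC = 2 (A->B->C->A and B->D->B)
print("components in cycles:", heavily_cycled)

A comparable sketch for AC would instead need per-interval interface-usage data, which is why the authors' instrumentation assumption matters.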
4.4. Which elements of a CBSS are being measured? How were these elements defined?
The results for this research question are summarized in terms of the measurement ontology of García et al. (2006) and the framework presented in Gómez et al. (2008), as shown in Table 8. In the first column of Table 8, we answer the first part of the question. Based on the analysis of the metric elements that are present in Tables 5–7 (Section 4.3), we then answer the second part of the research question and summarize it based on the elements that a metric attempts to measure (Cho et al., 2001; García et al., 2006). As shown in Table 8, the attribute that we measure depends on the elements that we choose to measure (García et al., 2006). For example, we may assess the size of a software component by counting the number of methods or the number of interfaces.
5. Discussion
In this section, we discuss the implications of the quality assessment of the primary studies and the results with respect to our research questions. Overall, the results in this paper are mostly similar to those of the report by Goulão and Abreu (2004b). However, the study unit here is based on primary studies and individual metrics, whereas the study by Goulão and Abreu is a primary study only.
Table 7
Example of some useful metrics that can be collected at the CBSS level during the run time.

Ref. no. | Metric name | Description | Metric assumption and interpretation guidelines
Narasimhan and Hendradjaya (2007) | Number of Cycles (NC) | This metric counts the number of cycles in a graph representation | NC and the derived metrics may identify heavily used components in the overall assemblage that require intensive testing
Narasimhan and Hendradjaya (2007) | Active Component (AC) | A component is active when its interface is in use | A higher density of active components shows a higher utilization of components in the application
Table 8
Elements of a CBSS that are measured.

Entity or element | Attributes | Metrics
Interface | Component complexity (Salman, 2006) | Count of the total number of interfaces in an application
Interface | Degree of utilization of the component (Narasimhan and Hendradjaya, 2007) | The average number of active components per time interval
Interface | Integration complexity (Narasimhan and Hendradjaya, 2007), architecture complexity (Hoek et al., 2003), component dependency (Sharma et al., 2009) or interaction complexity (Kharb and Singh, 2008) | Ratio of the actual number of provided (or required) interfaces to the total number of provided (or required) interfaces
Class | Component's degree of understandability and reusability (Washizaki et al., 2003) | Whether or not the target component provides a BeanInfo class (1 or 0)
Interface method | Interface complexity (Mahmood and Lai, 2008; Sharma et al., 2008) | Count of the number of methods and parameters, weighted by a subjective rating
Interface method | Component composability degree (Rotaru and Dobre, 2005) | Count of the number of methods and parameters
Interface method | Size of interface (Boxall and Araban, 2004) | Count of the number of methods and parameters
Interface method | Degree of component customization (Choi et al., 2009) | Sum of the methods for attribute, behavior, and workflow customization
Interface method | Component reusability (Choi et al., 2009) | Count of the number of methods that provide common functions and methods that provide private functions
Property | Component observability (Washizaki et al., 2003) | Ratio of readable properties to the total number of properties
Property | Component customizability (Washizaki et al., 2003) | Ratio of writable properties to the total number of properties
Signature | Degree of component understandability and reusability (Boxall and Araban, 2004) | Count of the number of arguments with the same name and type
Signature | Degree of component understandability and reusability (Boxall and Araban, 2004) | Sum of the number of arguments passed by reference and arguments passed by value
5.1. Quality assessment of primary studies
Because the total possible quality score is 100% for each QAQ (i.e., when the answer to a QAQ for all 36 papers is "Yes"), we have clearly identified a number of common problems with CBSE metrics that help to explain the current state of affairs. However, most of these problems are not specific to CBSE metrics; indeed, they are common in much software engineering research (Kitchenham et al., 2002). We think that the greatest deficiency in these primary studies is the absence of any serious consideration of QAQ3 and QAQ8. These problems reduce the soundness of their conclusions. Perhaps the most serious problem is QAQ3: without an underlying theory, and with only a shallow hypothesis, we cannot understand a metric (Kitchenham et al., 2002; Runeson and Höst, 2009). Most of the papers fail to attain the required quality score for QAQ7, with 29% and 29% of the primary studies scoring "No" and "Somewhat", respectively. These quality scores suggest that approximately 18 papers might fail to discuss how to collect metrics data and how the metric values could be interpreted to guide practitioners to the needed information. This result is consistent with the view of Goulão and Abreu (2004b), who reported that a problem with existing metrics is their misleading interpretation. If different implementations of metrics collection tools are made, they could produce different results on the same artifacts. Thus, as identified in Tables 2 and 3 (see Section 4), 49% of the proposals are not directly supported by a specific component technology. The metrics discussed in these primary studies are, therefore, more likely to be unreliable than the metrics that are discussed in the other primary studies. As the results confirm, some authors (Crnkovic and Larsson, 2002; Coronato et al., 2005) argued that a certain amount of interpretation is required to map the specifications of the target component technology to the definition of the metrics. For example, Liu and Cunningham (2004) have clearly discussed the mapping from component
specification to Enterprise JavaBeans implementation. However, metrics should remain as language independent as possible, and the definition of metrics should be both generic enough to encompass all component model technologies and specific enough to provide developers with adequate support for implementing a product with high quality.
5.2. Are the measurements performed based on a CBSS or on individual components? (RQ1)
Some authors (Wallnau and Stafford, 2002; Goulao and Abreu, 2005; Mahmood et al., 2005; Narasimhan et al., 2009) claimed that the best component for a specific assembly would not necessarily be the best overall component available for the required functionality under different composition contexts. The claim in those references is broad, in the sense that the best component may not be the best candidate for composition in all composition scenarios. The overall idea is that how good a fit a component is for a particular assembly (or configuration) depends on the other components in that assembly. Thus, the simple rule for separating the evaluations is whether one can collect the metric based only on the information that is conveyed by that component (in isolation). With this interpretation, we mapped the primary studies at the proposal level in Tables 3 and 4 and at the metric level in Tables 5–7. For example, consider the studies by Hoek et al. (2003) that are grouped in Table 5. Some of their metrics are collected upon an individual component, while others are collected upon the whole system. For the former, they proposed the PSU metric (provided service utilization), which is described as "The ratio of the actual used number of services provided by component X to other components in the system over the total number of services provided by component X". Just from the component, we do not truly know this information. In fact, the metric's value can change just by changing something in its external environment. In both cases, we need information from the whole system to collect the metrics, even if, for some metrics, we are collecting them at the component level and for others we are collecting them at the system level. Such metrics are grouped in Tables 5 and 7. This set of
groupings is unlike the set of metrics in Washizaki et al. (2003), which are grouped in Table 6. In this case, we need information only from the component in isolation to collect the metrics. Accordingly, a total of 17 studies out of 31 were proposed to measure CBSS attributes. This result is consistent with the view of Wallnau and Stafford (2002) and Goulao and Abreu (2005), in that we are more interested in the context of the overall CBSS than in the context of individual components in isolation. Usually, a group of components depends on each other to supply complex system functionality. Any modification to a component can cause a change in the composite functionality, because the composite functionality is reflected in different components. On the other hand, a total of 14 studies were proposed to evaluate individual components in isolation. The overall idea is that these metrics would facilitate the modeling of the cost, quality and testing of individual components in isolation. Thus, a component consumer can choose the best components from the marketplace in terms of the cost and the quality of the CBSS. As the results confirm, and with respect to the main characteristic of CBSE, which is the separation of CBSS development from component development, the individual component metrics and the entire-CBSS metrics are equally important during the entire life cycle of CBSE, to compare the system attributes against a predefined threshold. Whenever the result exceeds such a threshold, a decision should be made about the suitability of the individual component or of the CBSS itself. In addition, because CBSS development involves only replacing, adding and deleting components, the evaluation of individual component attributes will make the process of CBSS integration, testing, and maintenance easier and will facilitate the cost and quality evaluation.
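As an illustration of why service-utilization metrics cannot be collected from a component in isolation, the following is a minimal sketch (hypothetical component and service names; one plausible reading of PSU and RSU, not Hoek et al.'s exact formulation): both values can only be computed once the whole assembly is known.

# A minimal sketch: service utilization computed over a whole assembly.
provided = {"X": {"s1", "s2", "s3", "s4"}, "Y": {"s5"}}   # services each component provides
required = {"X": set(), "Y": {"s1", "s2"}}                # services each component requires

def psu(component):
    # Fraction of the services provided by `component` that some other
    # component in the assembly actually requires.
    prov = provided[component]
    if not prov:
        return 0.0
    used = {s for other, reqs in required.items() if other != component for s in reqs}
    return len(prov & used) / len(prov)

def rsu(component):
    # Fraction of the services required by `component` that some other
    # component in the assembly actually provides.
    req = required[component]
    if not req:
        return 0.0
    offered = {s for other, provs in provided.items() if other != component for s in provs}
    return len(req & offered) / len(req)

print(psu("X"))  # 0.5 -- only half of X's provided services are used elsewhere
print(rsu("Y"))  # 1.0 -- everything Y requires is available in the assembly

Swapping one component for another, or adding a new consumer, changes these values even though the component itself is untouched, which is exactly the point made above.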
5.3. Which elements of a CBSS are being measured? How were these elements defined and validated? (RQ2)
As we discussed in Section 4.4, we may want to assess the size of a software component by counting the number of methods. Suppose that we find that there are approximately 50 methods in a component. Unfortunately, after obtaining such a value in isolation, we still face ambiguities when diagnosing the size of the component. For example, knowing that a component has 50 methods does not show us how large the component is, because the methods in question could all be extremely large or extremely small. Furthermore, how the method is defined is very important in the metrics context. For example, should we count property methods, or should we count event methods? Without a clear definition of a metric, its application is likely to lead to different results. Moreover, we recognize the importance of theoretical and empirical validation. Finally, without threshold values, any interpretation of the measurement is difficult. This means that we cannot say whether 50 methods is large or small if we do not have a well-defined metric (Roche and Jackson, 1994; Purao and Vaishnavi, 2003; Laird and Brennan, 2006; Lanza and Marinescu, 2006). In the same way, with respect to the discussion above, the definitions of the existing metrics and elements are, overall, ambiguous and unclear for most if not all of the metrics. At the same time, measures are not only powerful instruments; they also have limits and constraints. In this context, the characterization and evaluation of software in general, and of CBSS in particular, is not an easy job. Consequently, before a measurement can be developed, a clear specification of what is being measured, why it is to be measured and how the metric value is to be measured must be formulated, so that the metric provides real information rather than only numbers. Thus, to answer this question, we discuss each metric according to the elements that it attempts to measure.
5.3.1. Interfaces
Many researchers (Hoek et al., 2003; Narasimhan and Hendradjaya, 2007; Kharb and Singh, 2008; Sharma et al., 2009) proposed sets of metrics based on the concept of provided and required interfaces. They proposed metrics that are very similar in their concepts and definitions, to measure integration complexity, architecture complexity, component dependency and interaction complexity. Overall, they aim to assess the fitness of a component in a specific system. However, how the interface is defined is not clear for most if not all of the metric sets. For example, a public method or a public property can be considered an interface. In addition to what has been discussed so far, the elements used in Narasimhan and Hendradjaya (2007), Kharb and Singh (2008) and Sharma et al. (2009) may not exhibit the attributes that the researchers claim to have been measuring, because their approach considers only an individual attribute (component interaction) as the means of measuring the complexity of a CBSS. Moreover, they consider only the presence of an interaction and do not consider the interaction content while measuring the CBSS complexity. For example, consider two interfaces with N different types of signatures; this scenario could be a source of extra complexity that is not captured by these metrics. Most of the metrics discussed above are therefore only proposed theoretically, without any correlation with quality characteristics and without validation on real-life applications. Likewise, Mahmood and Lai (2008) described the IC metric (interface complexity) as the number of parameters and the number of methods in an interface. However, they did not clearly define parameters and methods. Accordingly, the interface in question could be very complex or less complex: it would be very complex if all of its methods contained complex parameter types, such as pointer or class types, and less complex if all of its methods contained simple parameter types, such as integer or char types. On the other hand, to take the interaction content into account, Mahmood and Lai (2008) proposed the CC metric (the interaction complexity of a component). From a measurement theory perspective, the CC metric combines measures from different scales in a manner that is inconsistent with measurement theory. Specifically, the information content is on an ordinal scale, while the frequency of interaction is on a ratio scale; thus, linear combinations of the two may not be meaningful (Fenton and Pfleeger, 1997). Concerning metric validation, no empirical validation has been conducted for these metrics against the complexity attribute, which leaves the work incomplete. Wijayasiriwardhane and Lai (2010) extended the approach described in Mahmood and Lai (2008) by proposing a function-point-like approach, named Component Point, to measure the system-level size of a CBSS using a specification written in UML. The proposed metrics were validated against five small classroom-based applications. However, there is no empirical validation against any industry projects.
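The following is a minimal sketch (hypothetical interface names, following the density definitions listed in Table 5 rather than any one author's tooling) of the incoming/outgoing interaction-density style of metric; it makes explicit that only the presence of interactions is counted, so two components with identical densities can differ widely in the complexity of the signatures behind those interfaces.

# A minimal sketch: interaction densities count only the presence of interactions.
def interaction_density(bound, available):
    # Ratio of interactions actually wired up in an assembly to interactions available.
    return len(bound) / len(available) if available else 0.0

provided_ifaces = {"IRender", "IPrint", "IExport", "ILog"}   # outgoing (provided)
required_ifaces = {"IStore", "IAuth", "IClock"}              # incoming (required)
bound_provided = {"IRender", "ILog"}                         # actually used by others
bound_required = {"IStore", "IAuth"}                         # actually satisfied

coid = interaction_density(bound_provided, provided_ifaces)  # outgoing density = 0.50
ciid = interaction_density(bound_required, required_ifaces)  # incoming density ~ 0.67

# Note: nothing here inspects parameter or return types, which is the
# content-blindness criticized in the text above.
print(f"COID = {coid:.2f}, CIID = {ciid:.2f}")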
5.3.2. Interface methods
Many authors proposed metrics that are based on the methods specified in an interface. Washizaki et al. (2003) proposed a set of metrics for JavaBeans component reusability assessment. These metrics come with confidence intervals that were set by a statistical analysis of 125 JavaBeans components. However, it may seem surprising that an independent validation of the metric set by Goulão and Abreu (2004a) showed that the metric set fails to meet the quality structure proposed by Washizaki et al. (2003). Another possible concern related to the SCCr (self-completeness of a component's return values) and SCCp (self-completeness of a component's parameters) metrics is that they are blind to parameter type complexity.
Table 9
Weight values for interface methods and attributes (Sharma et al., 2008).

No. of data types | Simple | Medium | Complex | Highly complex
1–3 | 0.1 | 0.15 | 0.2 | 0.25
4–6 | 0.2 | 0.30 | 0.4 | 0.50
7–9 | 0.3 | 0.45 | 0.6 | 0.75
≥10 | 0.4 | 0.60 | 0.8 | 1.00
On the other hand, to take the complexity associated with parameter types into account when evaluating the complexity of method interfaces, Sharma et al. (2008) and Rotaru and Dobre (2005) both divided interface methods into the following categories: (1) interface methods without return values and without signatures, (2) interface methods with return values but without signatures, (3) interface methods with no return values but with signatures, and (4) interface methods with both return values and signatures. They assumed that the complexity of an interface method can be measured based on its return type and the arguments passed to it. They then assigned weight values to these methods based on the nature (i.e., the data types) of the arguments or the return values and on the number of arguments, as shown in Table 9. Sharma et al. (2008) validated their metrics on 10 JavaBean components against execution time, readability and customizability. The only ambiguous point in the definitions of these metrics is the reference value: an upper threshold for a complex interface needs to be established. For example, should we reject any component with a highly complex interface and accept a component with a complex interface, or reject any component with worse than a medium level of complexity?
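The following is a minimal sketch (a hypothetical helper, not Sharma et al.'s tool) of how the Table 9 weights could be applied: each interface method contributes a weight chosen from the number of data types in its signature and their category, and the weights are summed to give the interface complexity.

# A minimal sketch: summing Table 9 weights over the methods of an interface.
WEIGHTS = {
    "simple":         [0.10, 0.20, 0.30, 0.40],
    "medium":         [0.15, 0.30, 0.45, 0.60],
    "complex":        [0.20, 0.40, 0.60, 0.80],
    "highly complex": [0.25, 0.50, 0.75, 1.00],
}

def method_weight(category, n_types):
    # Rows of Table 9: 1-3, 4-6, 7-9, and 10 or more data types.
    row = 0 if n_types <= 3 else 1 if n_types <= 6 else 2 if n_types <= 9 else 3
    return WEIGHTS[category][row]

# An interface described as (data-type category, number of data types) per method.
interface = [("simple", 2), ("complex", 5), ("medium", 11)]
icm = sum(method_weight(cat, n) for cat, n in interface)
print(round(icm, 2))  # 0.1 + 0.4 + 0.6 = 1.1

The open question raised above remains visible in the sketch: without an agreed threshold, the resulting 1.1 cannot by itself tell a consumer whether to accept or reject the component.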
5.3.3. Property
With respect to the previous discussion about CBD properties in Section 2, we found that only one approach was presented to define metrics based on properties. Washizaki et al. (2003) categorized properties into properties whose values can be read ("readable properties") and properties whose values can be written ("writable properties"). According to this classification, they defined two metrics, namely, RCO and RCC. The RCO metric is defined as the percentage of readable properties out of all of the fields implemented within a component. This metric indicates the component's degree of observability for the component user. The RCC metric is defined as the percentage of writable properties out of all of the fields implemented within a component. This metric indicates the component's degree of customizability.
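A minimal sketch of RCO and RCC under these definitions follows (the component description and field names are hypothetical).

# A minimal sketch: RCO and RCC as ratios over a component's fields.
fields = {
    "title":    {"readable": True,  "writable": True},
    "state":    {"readable": True,  "writable": False},
    "cacheKey": {"readable": False, "writable": False},
    "timeout":  {"readable": True,  "writable": True},
}

total = len(fields)
rco = sum(f["readable"] for f in fields.values()) / total   # observability = 0.75
rcc = sum(f["writable"] for f in fields.values()) / total   # customizability = 0.50
print(f"RCO = {rco:.2f}, RCC = {rcc:.2f}")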
5.3.4. Signatures
Boxall and Araban (2004) proposed a suite of metrics based on interface signatures. They assumed that "a large number of arguments with the same name and type make things easier to understand". They then proposed a set of five metrics for component interfaces written in C or C++. A limited empirical evaluation of the proposed metrics has shown that most of the metrics have no direct effect on component reusability. These metrics might provide a better understandability assessment of component interfaces if they were tested under the opposite assumption: we would have thought that "a large number of arguments with the same name and type make things harder to understand". Evaluating the proposed metrics under this assumption could provide further insight into the metrics. Thus, further empirical testing is needed to determine the relative usefulness of the metric set.
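The following is a minimal sketch (hypothetical C-style signatures) in the spirit of the Boxall and Araban suite, computing arguments per procedure (APP) and the recurrence of argument (name, type) pairs across an interface; it illustrates the kind of counting involved, not the authors' exact definitions.

# A minimal sketch: signature-based counts over a small interface.
from collections import Counter

interface = {
    "open":  [("path", "char*"), ("mode", "int")],
    "read":  [("path", "char*"), ("buf", "char*"), ("len", "int")],
    "close": [("path", "char*")],
}

n_args = sum(len(args) for args in interface.values())
app = n_args / len(interface)                                 # arguments per procedure

pair_counts = Counter(p for args in interface.values() for p in args)
repeated = sum(c for c in pair_counts.values() if c > 1)      # args sharing name and type

print(f"APP = {app:.2f}, repeated (name, type) arguments = {repeated}")  # 2.00 and 3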
5.4. Are there limitations on the current research?
Although the set of metrics presented in this paper is indeed useful for the characterization of CBSSs, the above analysis clearly supports a judgment regarding the following:
• The lack of a widely accepted metric and quality model for CBSE from the component consumer and producer perspectives. This lack may arise because most metric definitions were produced in an ad hoc fashion, rather than to meet the information requirements of a specific framework within which the metric is to be interpreted. In the absence of such a framework, the data collection and the interpretation of the metric become subjective. In addition, most of these proposals have not achieved an industrial level of validation.
• The poor quality of some papers identified in the quality evaluation section, which reduces the trustworthiness of the proposed metrics.
• The poor quality of some metric definitions, which makes it difficult for researchers or practitioners to ensure the correct collection of the measurements that were initially intended by the metrics' developers. Overall, many metrics have insufficiencies in their formulation, collection, validation or application.
• Elements of metric definitions that are not visible to CBSS developers, including elements that are incompatible with the standard concepts of a component or a CBSS, such as a class or source code.
• Elements that do not exhibit the attributes that the researchers claim to have been measuring.
• The limitations or constraints that are required to map the specifications of the target component technology to the definition of the metrics.
• Insufficient information, such as unstated hypotheses and inadequate context provided by the original studies, which may cause subjectivity in their replication or interpretation.
• As far as we know, most of the proposals were only proposed theoretically, without any correlation with any external quality attribute. A few proposals were tested only by their authors, limiting knowledge sharing. For example, it is worth noticing that only one independent validation was performed, by Goulão and Abreu (2004a). This is mainly due to difficulties in experimental replication. Third-party validation of a metric is fundamental and very desirable as proof of its usefulness before common acceptance is sought. Thus, there is insufficient experimental validation.
• The poor experimental validation leads to a lack of established metric threshold values, which obscures the metrics' value to practitioners.
• Most of the existing proposals to evaluate an individual component may not be applicable to component-level (black-box) testing.

5.5. Limitations of the study
Our aim was to cover papers that were published between 2000 and 2011. However, regarding the search process, we may have overlooked certain papers because of the accessibility of their publisher sites and the limitations of our library. The limitations of this study are primary study selection bias, inaccuracy in data extraction, misclassification and quality assessment bias. To avoid a selection bias, a multistage process was used in the search strategy that involved many searches and three steps for the inclusion and exclusion criteria. The first author initially performed the steps in Sections 3.3 and 3.4. A second author was to independently repeat the step in Section 3.4 on a random sample of all of the identified papers, and the results were to be tested in an inter-rater agreement test.
Other researchers (Brereton et al., 2007; Staples and Niazi, 2007) use a
similar or the same selection approach. To minimize the chance of misclassification of the metrics and misinterpretation of the terms, the data collected from each primary study (Section 3.6) and the classification scheme (Section 3.7) were also checked independently by the other authors. The procedure of having one extractor is not consistent with the standards (Kitchenham, 2007), but it is useful in practice, as stated by Staples and Niazi (2007) and Kitchenham et al. (2009). With respect to the quality assessment criteria, there is a possibility that the extraction process may have caused some bias in the results. The other authors were to independently check the assessment, and there were no critical differences in the assessments. Disagreements were resolved in a formal meeting and, when necessary, by asking other staff members from the software engineering group in our university.

6. Conclusions and future work
To provide an overview of CBSS metrics and to identify the right metrics to measure the needed attributes, we have presented a systematic mapping study of existing metrics that were proposed for CBSS. We contribute to filling the gap in current approaches to CBSE metrics in general and to CBSS evaluation approaches in particular, from a component consumer perspective (i.e., development with reuse). Obviously, the perspective of a component producer (i.e., development for reuse) is essential to CBSE, but it is beyond the scope of the review performed in this paper. We think that the benefits of CBSE cannot be achieved without metrics for effectively evaluating CBSSs. We found 20 proposals that could be applied to evaluate CBSSs, while 19 proposals could be applied to evaluate individual components. This task resulted in a plethora of metrics for components and CBSSs; however, most of them may not be of much relevance to CBSE. We also investigated and presented the various elements of components that were measured from a component consumer perspective. Unfortunately, the results of our survey showed that there is no consensus yet on many of the concepts and elements that are measured in this field. Even worse, inconsistencies in component definitions can frequently be found among the many studies by measurement researchers. Two main factors could help to resolve this conflict: first, we need agreement on which elements of a component are to be measured. Our work provides a clear discussion in this respect, and it can serve as a starting point for further discussions (see Section 2). Second, we need to define, without any ambiguities, the elements
of a software component that are to be measured. For example, what exactly is a component, an interface or a method? We also contribute a framework for the systematic comparison and quality assessment of metrics proposals by independent research teams. This framework can be further refined and adopted to provide more details concerning metric definitions, which could mitigate many of the identified problems. We do not claim that our review resolves all of the limitations and is agreed on by all parties, but rather that it serves as a basis for further discussion from which the CBSE measurement community can start paving the way to future agreements. From an academic point of view, we believe that this study can act as a starting point for further primary studies as well as for more detailed secondary studies, which could lead to an empirically based body of knowledge. For practitioners, the results of the studies can be used as an indication of the maturity of the current research. We believe that several questions are raised by this investigation, and areas for future research are presented. An interesting area for further research involves revising the existing definitions of CBSS metrics for better precision in measurement. Another interesting area is to develop a more sophisticated approach, such as combining more than one metric based on logical conditions by which a subset of problems is detected, to characterize and evaluate a CBSS with real information. For an overview of this approach, see Lanza and Marinescu (2006). We also note that there are no automated support tools that facilitate the collection and calculation of the metrics. Last, but not least, the majority of the metrics discussed here were either insufficiently validated or not validated at all in their original proposal. Because of space constraints, we have left this concern for future work.

Acknowledgements
We thank Professor Barbara Kitchenham, the mother of the systematic literature review in software engineering, for her ideas, comments, suggestions, and support as we prepared this paper. This work would not have been possible without the help of Kitchenham. We also thank all those researchers whose works are referenced. Finally, we wish to thank the anonymous reviewers for their valuable comments.

Appendix A. Quality assessment of the primary studies
Primary studies
QAQ1
QAQ2
QAQ3
QAQ4
QAQ5
QAQ6
QAQ7
QAQ8
Alhazbi (2004) Washizaki et al. (2003) Gill and Grover (2004) Boxall and Araban (2004) Rotaru and Dobre (2005) Mahmood and Lai (2008) Salman (2006) Narasimhan and Hendradjaya (2007) Wijayasiriwardhane and Lai (2010) Kharb and Singh (2008) Gill and Balkishan (2008) Sharma et al. (2009) Sharma et al. (2008) Hoek et al. (2003) Venkatesan and Krishnamoorthy (2009) Bertoa and Vallecillo (2002) Bertoa et al. (2006) Sedigh-Ali et al. (2001) Wei et al. (2009) Alvaro et al. (2005) Gill and Grover (2003) Khimta et al. (2008) Goulão and Abreu (2004a) Narasimhan et al. (2009)
Somewhat Yes Somewhat Mostly Mostly Yes Mostly Yes Mostly Mostly Mostly Mostly Yes Yes Yes Yes Yes Yes Somewhat Yes Yes Yes Yes Mostly
No Yes No
No Yes No
Somewhat Mostly Somewhat Mostly Mostly No Somewhat Somewhat Somewhat Mostly Somewhat Somewhat Yes Somewhat Somewhat Somewhat Somewhat Somewhat Mostly Somewhat
Somewhat No No No Mostly No No No Mostly Somewhat No No Mostly No No No No Somewhat Somewhat No
Somewhat Mostly No Somewhat Somewhat Mostly Somewhat Somewhat Somewhat Somewhat Somewhat Somewhat Somewhat Yes Somewhat Somewhat Somewhat No Somewhat Somewhat No Somewhat Yes Somewhat
Somewhat Mostly Somewhat Mostly Somewhat Yes Mostly Yes Yes Mostly Mostly Somewhat Somewhat Yes Mostly Yes Mostly Yes Mostly Yes No Somewhat Mostly Somewhat
Somewhat Yes Mostly Yes Somewhat Yes Somewhat Yes Yes Somewhat Somewhat Mostly Mostly Yes Yes Yes Yes Yes Somewhat Yes Somewhat Yes Mostly Somewhat
No Yes No Yes Somewhat Mostly Somewhat Mostly Somewhat Somewhat Somewhat Somewhat Mostly Yes Yes No Yes Somewhat Somewhat No No No Yes Yes
Somewhat Mostly No Mostly Somewhat Mostly Somewhat Mostly Somewhat Somewhat No Somewhat Somewhat Yes No Somewhat Yes No No Somewhat No No Mostly Somewhat
Primary studies
QAQ1
QAQ2
QAQ3
QAQ4
QAQ5
QAQ6
QAQ7
QAQ8
Seker et al. (2004) Voas and Payne (2000) Dias and Richardson (2001) Kaur and Mann (2010) Ratneshwer (2010) Serban et al. (2010) Goulao and Abreu (2005)
Yes Yes Yes Yes Yes Yes Yes
Somewhat Mostly Somewhat Somewhat Somewhat Mostly Mostly
No Mostly No No No No No
Mostly Somewhat No Somewhat Mostly Yes Mostly
Yes Yes Mostly Somewhat Somewhat Yes Yes
Somewhat Yes Mostly Yes Yes Yes Yes
Somewhat Yes No No No Mostly Yes
No No No No No No No
References Alhazbi, S.M., 2004. Measuring the complexity of component-based system architecture. In: Proceeding of International Conference on Information and Communication Technologies: From Theory to Applications, pp. 19–23. Alvaro, A., Almeida, E., Meira, S., 2005. Quality attributes for a component quality model. In: The 10th International Workshop on Component Oriented Programming (WCOP) in Conjunction with the 19th European Conference on Object Oriented Programming (ECOOP), Glasgow, Scotland. Alves, V., Niu, N., Alves, C., Valenca, G., 2010. Requirements engineering for software product lines: a systematic literature review. Information and Software Technology 52 (8), 806–820. Beecham, S., Baddoo, N., Hall, T., Robinson, H., Sharp, H., 2008. Motivation in software engineering: a systematic literature review. Information and Software Technology 50 (9–10), 860–878. Bertoa, M.F., Troya, J.M., Vallecillo, A., 2006. Measuring the usability of software components. Journal of Systems and Software 79 (3), 427–439. Bertoa, M.F., Troya, J.M., Vallecillo, A., 2003. A survey on the quality information provided by software component vendors. In: QAOOSE, vol. 3, pp. 25–30. Bertoa, M.F., Vallecillo, A., 2002. Quality attributes for COTS components. D Computation 1 (2), 128–148. Biolchini, J., Mian, P.G., Natali, A.C.C., Travassos, G.H., 2005. Systematic review in software engineering. System Engineering and Computer Science Department COPPE/UFRJ, Technical Report ES; 679(05). Boxall, M.A.S., Araban, S.,2004. Interface metrics for reusability analysis of components. In: Proceedings of the 2004 Australian Software Engineering Conference. IEEE Computer Society, pp. 40–51. Brereton, P., Kitchenham, B.A., Budgen, D., Turner, M., Khalil, M., 2007. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software 80 (4), 571–583. Budgen, D., Turner, M., Bereton, O., Kitchenham, B.,2008. Using mapping studies in software engineering. In: Proceedings of PPIG’08. Lancaster University, pp. 195–204. Catal, C., Diri, B., 2009. A systematic review of software fault prediction studies. Expert Systems with Applications 36 (4), 7346–7354. Cesare, S.D., Lycett, M., Macredie, R.D., 2006. Development of Component-based Information System. Prentice Hall of India, New Delhi. Chen, L., Ali Babar, M., 2011. A systematic review of evaluation of variability management approaches in software product lines. Information and Software Technology 53 (4), 344–362. Chidamber, S.R., Kemerer, C.F., 1994. A metrics suite for object oriented design. IEEE Transactions on Software Engineering 20 (6), 476–493. Cho, E.S., Kim, M.S., Kim, S.D.,2001. Component metrics to measure component quality. In: Proceedings of the Eighth Asia-Pacific on Software Engineering Conference. IEEE Computer Society, pp. 419–426. Choi, M., Kim, I.J., Hong, J., Kim, J., 2009. Component-based metrics applying the strength of dependency between classes. In: Proceedings of the 2009 ACM Symposium on Applied Computing, pp. 530–536. Coronato, A., d’Acierno, A., De Pietro, G., 2005. Automatic implementation of constraints in component based applications. Information and Software Technology 47 (7), 497–509. Crnkovic, I., Larsson, M., 2002. Building Reliable Component-based Software Systems. Artech House, London. Deo, N., 2004. Graph Theory with Applications to Engineering and Computer Science. PHI Learning Pvt. Ltd., New Delhi, India. Dias, M.E.R.V.M.S., Richardson, D.J., 2001. 
Describing dependencies in component access points. In: Proceedings of the 23rd International Conference on Software Engineering, pp. 115–118. Dybå, T., Dingsøyr, T., 2008. Empirical studies of agile software development: a systematic review. Information and Software Technology 50 (9–10), 833–859. E Abreu, F.B., 1995. The MOOD metrics set. In: Proc. ECOOP 95 Workshop on Metrics. Estublier, J., Favre, J.M., 2002. Component Model and Technology. Building Reliable Component-based Software System. Artech House, London. Faison, T., 2002. Component-based Development with Visual C#. Hungry Minds, New York. Fenton, N., Pfleeger, S.L., 1997. Software Metrics: A Rigorous & Practical approach, 2nd ed. International Thomson Publishing, Boston. GarcÃa, F., Bertoa, M.F., Calero, C., Vallecillo, A., RuÃz, F., Piattini, M., et al., 2006. Towards a consistent terminology for software measurement. Information and Software Technology 48 (8), 631–644. Gill, N.S., Grover, P.S., 2003. Component-based measurement: few useful guidelines. SIGSOFT Software Engineering Notes 28 (6), 1–6. Gill, N.S., Balkishan, 2008. Dependency and interaction oriented complexity metrics of component-based systems. SIGSOFT Software Engineering Notes 33 (2), 1–5.
Gill, N.S., Grover, P.S., 2004. Few important considerations for deriving interface complexity metric for component-based systems. SIGSOFT Software Engineering Notes 29 (2), 1–4. Gómez, O., Oktaba, H., Piattini, M., García, F., 2008. A systematic review measurement in software engineering: state-of-the-art in measures. Software and Data Technologies 10 (3), 165–176. Goulão, M., Abreu, F.B., 2004a. Cross-validation of a component metrics suite. In: Proceedings of the IX Jornadas de Ingeniería del Software y Bases de Datos (JISBD 04), Spain, pp. 73–86. Goulão, M., Abreu, F.B., 2004b. Software components evaluation: an overview. In: Proceedings of the 5a Conferência da APSI, Lisbon, pp. 1–12. Goulao, M., Abreu, F.B., 2005. Composition assessment metrics for CBSE. In: Proceeding of 31st EUROMICRO Conference on Software Engineering and Advanced Applications, pp. 96–103. Hamilton, G., 1997. JavaBeans.sun microsystems. Han, J.,1998. A comprehensive interface definition framework for software components. In: Proceedings of the Fifth Asia Pacific Software Engineering Conference. IEEE Computer Society, pp. 110–117. Heineman, G.T., Councill, W.T., 2001. Building instead of buying: a rebuttal. In: Heineman, G.T. (Ed.), Component-based Software Engineering: Putting pieces Together. Addison Wesley, Boston, MA, USA. Hnetynka, P., 2004. Component model for unified deployment of distributed component-based software. Université de Charles, http://nenya.ms.mff.cuni. cz/publications/Hnetynka-tr-2004-4.pdf, in Technical Report No. 2004/4, Charles University, Praque. Hoek, A., Dincel, E., Medvidovic, N.,2003. Using service utilization metrics to assess the structure of product line architectures. In: Proceedings of the 9th International Symposium on Software Metrics. IEEE Computer Society, pp. 298–308. Hopkins, J., 2000. Component Primer. Communications of the ACM 43 (10), 27–30. Ismail, S., Wan-Kadir, W., Saman, Y.M., Mohd-Hashim, S.Z., 2008. A review on the component evaluation approaches to support software reuse. In: Proceeding of Information Technology, 2008. ITSim 2008. International Symposium on: IEEE, pp. 1–6. Jianguo, C., Yeap, W.K., Bruda, S.D., 2009. A Review of component coupling metrics for component-based development. In: Proceeding of WCSE’09 WRI World Congress on Software Engineering, pp. 19–21. Jorgensen, M., Shepperd, M., 2007. A systematic review of software development cost estimation studies. IEEE Transactions on Software Engineering 33 (1), 33–53. Kalaimagal, S., Srinivasan, R., 2008. A retrospective on software component quality models. SIGSOFT Software Engineering Notes 33 (6), 1–10. Kaner, C., Bond, W.P., 2004. Software engineering metrics: what do they measure and how do we know? In: Proceedings of the 10th International Software Metrics Symposium, pp. 1–12. Karg, L.M., Grottke, M., Beckhaus, A., 2010. A systematic literature review of software quality cost research. Journal of Systems and Software 84 (3), 415–427. Kaur, A., Mann, K.S., 2010. Component Based Software Engineering. International Journal of Computer Applications (IJCA) 2 (1), 105–108. Kharb, L., Singh, R., 2008. Complexity metrics for component-oriented software systems. SIGSOFT Software Engineering Notes 33 (2), 1–3. Khimta, S., Sandhu, P.S., Brar, A.S., 2008. A complexity measure for JavaBean based software components. World Academy of Science, Engineering and Technology 42, 449–452. Khurum, M., Gorschek, T., 2009. A systematic review of domain analysis solutions for product lines. 
Journal of Systems and Software 82 (12), 1982–2003. Kitchenham, B., 2010. What’s up with software metrics? A preliminary mapping study. Journal of Systems and Software 83 (1), 37–51. Kitchenham, B.A., Hughes, R.T., Linkman, S.G., 2001. Modeling software measurement data. IEEE Transactions on Software Engineering 27 (9), 788–804. Kitchenham, B.A., Pfleeger, S.L., Pickard, L.M., Jones, P.W., Hoaglin, D.C., El Emam, K., et al., 2002. Preliminary guidelines for empirical research in software engineering. IEEE Transactions on Software Engineering 28 (8), 721–734. Kitchenham, B.A., Travassos, G.H., von Mayrhauser, A., Niessink, F., Schneidewind, N.F., Singer, J., et al., 1999. Towards an ontology of software maintenance. Journal of Software Maintenance 11 (6), 365–390. Kitchenham, B., Budgen, D., Bereton, O., 2010. The value of mapping studies – a participant observer case study. In: Proceeding of EASE’10: BCS eWic, pp. 1–9. Kitchenham, B., Budgen, D., Brereton, O., 2011. Using mapping studies as the basis for further research – a participant–observer case study. Information and Software Technology 53 (6), 638–651. Kitchenham, B., 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering; Tech. Rep. EBSE 2007-001 Keele University and Durham University Joint Report.
M. Abdellatief et al. / The Journal of Systems and Software 86 (2013) 587–603 Kitchenham, B., 2004. Procedures for Performing Systematic Reviews. Joint Technical Report TR/SE – 0401, ISSN: 1353-7776, and NICTA Technical Report 0400011T.1. Kitchenham, B., Brereton, O.P., Budgen, D., Turner, M., Bailey, J., Linkman, S., 2009. Systematic literature reviews in software engineering – a systematic literature review. Journal of Systems and Software 51 (1), 7–15. Kitchenham, B., Pfleeger, S.L., Fenton, N., 1995. Towards a framework for software measurement validation. IEEE Transactions on Software Engineering 21 (12), 929–944. Laird, L.M., Brennan, M.C., 2006. Software Measurement and Estimation. A Practical Approach. IEEE Computer Society/Wiley & Sons, Inc., Hoboken, New Jersey. Lanza, M., Marinescu, R., 2006. Object-Oriented Metrics in Practices: Using Software Metrics to Characterize, Evaluate and improve the Design of Object-Oriented Systems. Springer, Berlin Heidelberg – Germany. Lisboa, L.B., Garcia, V.C., Lucradio, D., de Almeida, E.S., de Lemos Meira, Silvio Romero, de Mattos Fortes, Renata Pontin, 2010. A systematic review of domain analysis tools. Information and Software Technology 52 (1), 1–13. Liu, Y., Cunningham, H.C., 2004. Mapping component specifications to Enterprise JavaBeans implementations. In: Proceedings of the 42nd Annual Southeast Regional Conference, ACM, pp. 177–182. Lucas, F.J., Molina, F., Toval, A., 2009. A systematic review of UML model consistency management. Information and Software Technology 51 (12), 1631–1645. Mahmood, S., Lai, R., Soo, K., Hong Kim, Y., Cheon Park, J., Suk Oh, S.H., 2005. A survey of component based system quality assurance and assessment. Information and Software Technology 47 (10), 693–707. Mahmood, S., Lai, R., 2008. A complexity measure for UML component-based system specification. Software: Practice and Experience 38 (2), 117–134. Nakagawa, E., Feitosa, D., Felizardo, K., 2010. Using systematic mapping to explore software architecture knowledge. In: Proceedings of the 2010 ICSE Workshop on Sharing and Reusing Architectural Knowledge: ACM, pp. 29–36. Narasimhan, L., Hendradjaya, B., 2007. Some theoretical considerations for a suite of metrics for the integration of software components. Information Sciences 177 (3), 844–864. Narasimhan, V.L., Parthasarathy, P.T., Das, M., 2009. Evaluation of a suite of metrics for component based software engineering (CBSE). Issues in Informing Science and Information Technology 6 (5/6), 731–740. Palacios, M., GarcÃal, J., Tuya, J., 2011. Testing in service oriented architectures with dynamic binding: a mapping study. Information and Software Technology 53 (3), 171–189. Pandeya, S.S., Tripathi, A.K., 2011. Testing component-based software: what it has to do with design and component selection. Journal of Software Engineering and Applications 1, 37–47. Petersen, K., Feldt, R., Mujtaba, S., Mattsson, M., 2008. Systematic mapping Studies in Software Engineering. In: Proceeding of EASE 08: BSC eWIC, pp. 71–80. Purao, S., Vaishnavi, V., 2003. Product metrics for object-oriented systems. ACM Computing Surveys 35 (2), 191–221. Ratneshwer, T.A.K., 2010. Dependence analysis of software component. ACM SIGSOFT Software Engineering Notes 35 (4), 1–9. Roche, J., Jackson, M., 1994. Software measurement methods: recipes for success? Information and Software Technology 36 (3), 173–189. Rotaru, O.P., Dobre, M.,2005. Reusability metrics for software components. 
In: Proceedings of the ACS/IEEE 05 International Conference on Computer Systems and Applications (AICCSA-05). IEEE Computer Society. Runeson, P., Höst, M., 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering 14 (2), 131–164. Salman, N., 2006. Complexity metrics AS predictors of maintainability and integrability of software components. Journal of Arts and Sciences 5, 39–50. Sedigh-Ali, S., Ghafoor, A., Paul, R.A., 2001. Software engineering metrics for COTSbased systems. Computer 34 (5), 44–50. Seker, R., van der Merwe, A., Kotze, P., Tanik, M.M., Paul, R., 2004. Assessment of coupling and cohesion for component-based software by using Shannon languages. Journal of Integrated Design & Process Science; 8 (4), 33–43. Serban, C., Vescan, A., Pop, H.F., 2010. A conceptual framework for component-based system metrics definition. In: Proceeding of Roedunet International Conference (RoEduNet), 9th 2010, pp. 73–78. Sharma, A., Grover, P.S., Kumar, R., 2009. Dependency analysis for component-based software systems. SIGSOFT Software Engineering Notes 34 (4), 1–6. Sharma, A., Kumar, R., Grover, P.S., 2008. Empirical evaluation and validation of interface complexity metrics for software components. International Journal of Software Engineering and Knowledge Engineering (IJSEKE) 18 (7), 919–931. Sharp, J., 2008. Microsoft Visual C# 2008 Step by Step. Microsoft Press, Washington, USA. Staples, M., Niazi, M., 2007. Experiences using systematic review guidelines. Journal of Systems and Software 80 (9), 1425–1437. Szyperski, C., 2002. Component Software: Beyond Object Oriented Programming, 2nd ed. Addison Wesley, New York. Venkatesan, V.P., Krishnamoorthy, M., 2009. A metrics suite for measuring software components. Journal of Convergence Information Technology 4 (2), 138–153.
Vinter, R., Loomes, M., Kornbrot, D., 1998. Applying software metrics to formal specifications: a cognitive approach. In: Proceedings of Software Metrics Symposium – Fifth International: IEEE, pp. 216–223. Voas, J., Payne, J., 2000. Dependability certification of software components. Journal of Systems and Software 52 (2–3), 165–172. Voas, J.M., Miller, K.W., 1995. Software testability: the new verification. IEEE Software 12 (3), 17–28. Wallnau, K., Stafford, J., 2002. Dispelling the myth of component evaluation. In: Larsson iICaM (Ed.), Building Reliable Component-Based Software Systems. Artech House Publishers, pp. 157–177. Wang, A.J., Gian, A.K., 2005. Component Oriented Programming. Wiley & Sons, Inc., New Jersey, USA. Washizaki, H., Yamamoto, H., Fukazawa, Y.,2003. A metrics suite for measuring reusability of software components. In: Proceedings of the 9th International Symposium on Software Metrics. IEEE Computer Society, pp. 211–223. Wei, G., Zhong-Wei, X., Ren-Zuo, X., 2009. Metrics of graph abstraction for component-based software architecture. In: Proceeding of Computer Science and Information Engineering, WRI World Congress on: IEEE, pp. 518–522. Weyuker, E.J., 1988. Evaluating software complexity measures. IEEE Transactions on Software Engineering 14 (9), 1357–1365. Wijayasiriwardhane, T., Lai, R., 2010. Component point: a system-level size measure for component-based software systems. Journal of Systems and Software 83 (12), 2456–2470. Williams, B.J., Carver, J.C., 2010. Characterizing software architecture changes: a systematic review. Information and Software Technology 52 (1), 31–51. Majdi Abdellatief is an assistant professor in the Department of Computer Science at Technical Education Corporation, Ministry of Higher Education & Scientific Research, Sudan. He is also an academic member at University of Shaqra, Suadi Arabia. He holds a Doctoral degree in Software Engineering from University Putra, Malaysia, M.Sc. in Information Technology from the Faculty of Computer Science and Information Technology, Alneelain University, Sudan. His research interstates include software measurements, Componentbased Software Engineering (CBSE) and software quality. Dr. Abdellatief has extensive experience in Systematic Literature Review and in System Design modeling using UML. Abu Bakar Md Sultan received his Bachelor of Computer Science from Universiti Kebangsaan Malaysia in 1993, Master in Software Engineering from Universiti Putra Malaysia (UPM) and Ph.D. in Artificial Intelligence also from UPM. He is currently an academic member at the Faculty of Information System and Computer Science of UPM. His research interest includes optimization and Searchbased Software Engineering (SBSE).
Abdul Azim Abd Ghani is a professor in the Department of Information System at University Putra Malaysia. His research interests include software measurement, software testing, and software quality. He holds a Doctoral degree in Software Engineering from the University of Strathclyde, an M.Sc. in Computer Science from the University of Miami, and a B.Sc. in Mathematics/Computer Science from Indiana State University, and he is a member of the IEEE.
Marzanah A. Jabar is a Senior Lecturer at the Department of Information System, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Malaysia. She holds a Ph.D. in Management Information Systems from UPM, obtained in 2007. Dr. Marzanah has extensive experience as a System Analyst and Software Project Manager, having worked in a Computer Centre for 20 years before joining the Faculty. Her current research focuses on emerging trends and applications in applied information systems and software engineering.