SDPS-2015, November 2015. Printed in the United States of America. 2015 Society for Design and Process Science

SOFTWARE METRICS THRESHOLDS: A STUDY PROPOSAL

Marco Canaparo, Elisabetta Ronchieri
INFN CNAF, Bologna, Italy
{[email protected], [email protected]}

ABSTRACT
Many papers cover the topic of thresholds in software metrics to assess software quality. Their authors provide a plethora of different results, obtained by various approaches (from expert experience up to machine learning techniques) and applied to heterogeneous contexts. This makes the results difficult to use effectively. This paper proposes how to investigate existing studies that have tackled the issue of thresholds, from the very beginning until now. We intend to use papers in journals, conference proceedings and technical reports published from 1970 to 2014, with the objective of synthesizing the quantitative and qualitative results of those studies that report sufficient contextual and methodological information according to the criteria we applied. The problem is twofold: the unclear selection of the right approach for calculating thresholds, and the direct application of metric thresholds to a given code context.

INTRODUCTION
Organizations and computer scientists have given many definitions of software quality over time. The IEEE defines quality as "the degree to which a system, component, or process meets specified requirements or customer or user needs or expectations" (IEEE, 1990). The International Organization for Standardization (ISO, 1987) defines quality as "the degree to which a set of inherent characteristics fulfills requirements". Other experts define quality based on conformance to requirements and fitness for use. However, a good definition must lead us to measure quality meaningfully. According to Fenton and Bieman (Fenton & Bieman, 2014), "measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules". Software metrics represent a widespread way to measure code characteristics.
The IEEE defines software metrics as "the quantitative measure of the degree to which a system, component or process possesses a given software attribute" (IEEE, 1990) related to quality characteristics. Measurement allows us to reason about the quality of the software over time. Moreover, software metrics can help anticipate and reduce future maintenance needs, supporting the decision-making process. To change the use of metrics from simple measurement to a decision-making tool, it is essential to define threshold values meaningfully. For McCabe's complexity metric (McCabe, 1976), its author provided a threshold of 10; as a consequence, a subroutine with a higher value was expected to be unmaintainable and untestable. However, determining suitable threshold values is arduous, because they may depend on specific domains (such as aerospace or student exercises) and on programming language characteristics.

In our research, we found many papers that address the topic of thresholds, but none of them provides objective rules to use them effectively. In this paper, we describe the methodology we applied to analyze the current status of this field. Our final goals are the categorization of thresholds research and the introduction of a finer-grained aggregation of the papers, addressing the following two questions:
1. What papers are currently most important in the software metrics thresholds research community?
2. Can existing papers be aggregated into finer-grained categories according to the technique they adopt?

We refer to this as a study proposal, because we make no claim of completeness. We have concentrated on the years 1970-2014 and have mainly used the SCOPUS tool to search for relevant papers, leaving other tools (such as the ACM, IEEE and CiteSeer digital libraries) for further studies. SCOPUS is a general indexing system that covers publishers such as IEEE, ACM, Elsevier, Wiley and the Springer Lecture Notes publications. This makes SCOPUS potentially a very powerful tool for research. To the best of our knowledge, there are no surveys in the thresholds field.

This paper is organized in four macro sections: the first provides background information about software metrics thresholds; the second describes the methodology process that we have started to employ for paper selection; the third shows the results; the fourth reports conclusions and future work.

BACKGROUND
Software metrics thresholds represent a way to determine the quality of code through a quantitative criterion.
Software metrics are widely classified into process and product metrics (Bundschuh & Dekkers, 2008): the former measure the software development process, such as the average level of experience of the programming staff; the latter measure the software product at any stage of its development. In this section, we only consider common product metrics. In addition, we provide a set of criteria that determine their appropriate measurement scales, useful for analytic studies and statistical analysis.

Detailing product metrics categories
Product metrics may measure, e.g., the complexity of the software design, the size of the final program and the quality characteristics of software. A number of categories, whose thresholds are computable, are discussed below.

Size metrics quantify code size. The most widely used metric is lines of code.

Complexity metrics measure the relative simplicity of the system design. One of these metrics is McCabe's complexity, also called cyclomatic complexity, which quantifies the control flow within a program by counting the independent paths on a control flow graph, indicating the degree of structuredness of an application.

Quality metrics can be computed in terms of the length of time between occurrences of defects (mean time between failures) or of defect density (e.g., defects per size). Since quality has a variety of definitions (IEEE, 1990; ISO, 1987; Fenton & Bieman, 2014), there are several metrics reflecting the different viewpoints. As regards ISO, quality includes:
- internal attributes, which are intrinsic to the software and can be measured by the developers;
- external attributes, which are a function of the product and can be measured by the customers;
- quality in use, which is based on the product and can also be measured during customer use.

Object-oriented metrics are often used to measure complexity, maintenance and clarity. As such, they are mostly quality metrics and are used mainly to understand to what extent the concepts of object orientation are realized in a system.

Illustrating criteria for thresholds identification
We briefly summarize the various approaches used to determine metrics thresholds (Alves et al., 2010):
- computer scientists' experience in coding;
- descriptive statistics, such as average and standard deviation;
- techniques such as error information and cluster analysis.

Many authors (McCabe, 1976; Nejmeh, 1988) have defined thresholds according to their experience, making it difficult to reproduce or generalize these results and leading to dispute about their values.
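To make the complexity category concrete, the following sketch (our own illustration, not taken from any of the surveyed papers) approximates cyclomatic complexity for Python functions by counting decision points with the standard ast module, and flags functions exceeding McCabe's threshold of 10:

```python
import ast

# Decision-point node types that add an independent path to the control
# flow graph. A common approximation of McCabe's metric is:
# complexity = number of decision points + 1.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(func_source):
    """Approximate cyclomatic complexity of a single Python function."""
    tree = ast.parse(func_source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

def flag_complex_functions(module_source, threshold=10):
    """Return (name, score) for functions whose complexity exceeds threshold."""
    tree = ast.parse(module_source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            score = 1 + sum(isinstance(n, DECISION_NODES)
                            for n in ast.walk(node))
            if score > threshold:
                flagged.append((node.name, score))
    return flagged
```

Note that counting decision points plus one is an approximation of the path count on the control flow graph; compound boolean conditions are counted once per operator node here, so the score is a lower bound for such conditions.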
Other scientists (Erni & Lewerentz, 1996) have used the average and standard deviation of the metric values to determine the [Tmin, Tmax] range (where Tmin is the lower bound and Tmax the higher) and identify outliers. Last but not least, some researchers have investigated the relation between metrics thresholds and software failures (Shatnawi et al., 2009), and the use of the K-means clustering algorithm to identify outliers in the measurement data (Yoon et al., 2007).
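As a minimal sketch of the statistical approach (an illustration of the general mean-and-standard-deviation idea, not Erni and Lewerentz's exact procedure), the following code derives a [Tmin, Tmax] range from sampled metric values and flags outliers:

```python
import statistics

def threshold_range(values, k=1.0):
    """Derive [Tmin, Tmax] as mean -/+ k population standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return mean - k * std, mean + k * std

def outliers(values, k=1.0):
    """Return the metric values falling outside [Tmin, Tmax]."""
    tmin, tmax = threshold_range(values, k)
    return [v for v in values if v < tmin or v > tmax]

# Example: cyclomatic complexity measured over eight hypothetical modules,
# one of which is anomalously complex.
complexities = [3, 4, 5, 4, 6, 5, 4, 30]
tmin, tmax = threshold_range(complexities)
suspects = outliers(complexities)
```

Whether one uses the population or the sample standard deviation, and what value of k to take, are choices the surveyed papers make differently; k = 1 is only a default here.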

METHODOLOGY PROCESS
For the study proposal we defined all the steps that compose the methodology we decided to adopt.

Identifying relevant papers
We mainly used SCOPUS to search for software metrics thresholds papers published from 1970 up to 2014. The search process started on 7th January 2015 and ended on 7th February 2015; an example of a search query is outlined in Table 1.

Table 1: Search query example
Search Id | Goal | Search String
Search 1 | Metrics papers 1970-2014 | ( TITLE-ABS-KEY ( software ) AND TITLE-ABS-KEY ( metrics ) AND TITLE-ABS-KEY ( thresholds ) ) AND PUBYEAR > 1969 AND PUBYEAR < 2015

Search 1 found 253 articles, many of which were irrelevant, for example papers whose titles contained the following words: biodiversity, photon, radiation therapy, proteomic, dynamic supply voltages, genotyping array data, macroinvertebrates, metal-oxide-semiconductor, shear wave speed, medical image, transistor, tomotherapy treatment, cellular networks, placental calcification, mammograms, ionospheric anomalies, oropharyngeal cancer, lung tumor, ECG quality metrics, hearing protectors, multi-antenna, large synoptic survey telescope, fog events, visual acuity, wireless ad hoc networks, cervical spine, tomosynthesis, left ventricle, network traffic, lethal toxin neutralization. We removed all the papers with the words above in the title, plus 2 others with no title at all; as a consequence, 95 records were left. Then, amongst these, we found 78 papers that looked relevant after a brief review of both the abstract and the text. We also removed some papers that were off topic, not found on the Internet (not even the abstract), or not written in English. Afterwards, we sorted the papers in terms of relevance, removing the ones without any citations. 55 records were left.

We included papers in our survey if the paper describes research on software metrics thresholds. Up to now, we prefer to keep papers that do not include experimental results. Papers have been examined with respect to their years, datasets, metrics, techniques, evaluation criteria and results. We based the inclusion of papers on the degree of similarity of the study with this topic. We did not exclude conference papers, because conference proceedings publish experience reports, which are also a source of information about the industry's experience. For the years 1970-2014, we reviewed the titles and, in a second phase, skimmed both the abstracts and texts. Then we sorted the selected papers on the basis of the citation count and removed the ones without any citations.

Extracting data

We collected some standard information about all papers, such as the authors, the full reference, whether the paper was related to a conference or a journal, and the total number of citations. We aim to classify the papers according to the following criteria:
1. the main topic (see Table 2);
2. whether the paper was empirical, theoretical or both (see Table 3);
3. whether the paper applied metrics in the context of open source software;
4. whether the paper uses public datasets of metrics;
5. which languages the analyzed projects are written in;
6. whether the techniques discussed were statistical or artificial intelligence based;
7. whether the context of the paper was maintenance;
8. what sort of metrics were being evaluated;
9. the summarized goal of the study.
We have completed the first and second criteria, whereas the others need more investigation.

Table 2: Main topic of thresholds
Category | Meaning
Development | The paper is about a specification of a new technique for calculating thresholds.
Evaluation | The paper is about the evaluation, validation or assessment of existing thresholds or techniques.
Analysis | The paper discusses and/or illustrates methods for analyzing software metrics thresholds, suggests new or improved methods, or critiques existing methods.
Framework | The paper is about a general procedure or process by which thresholds are defined, extracted and analyzed.
Tool | The paper is about an automated framework.
Literature survey | The paper summarizes the literature on some aspect of thresholds.

Table 3: General type of paper
Category | Meaning
Empirical | The paper describes the results of analyzing software metrics thresholds. Typically, papers that evaluate existing thresholds or a technique for calculating them are included in this category.
Theoretical | The paper is descriptive. It discusses some issues and may (but not always) consider some theoretical issues concerning software metrics thresholds.
Both | The paper is a mixed theoretical and empirical paper. Typically, papers that develop new techniques for computing thresholds and provide some evaluation or demonstration of the technique are included in this category.
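For illustration only, the per-paper information and the categories of Tables 2 and 3 could be captured in a small record type; the field names below are our own and not part of the original extraction protocol:

```python
from dataclasses import dataclass

# Category vocabularies taken from Tables 2 and 3.
MAIN_TOPICS = {"development", "evaluation", "analysis",
               "framework", "tool", "literature survey"}
PAPER_TYPES = {"empirical", "theoretical", "both"}

@dataclass
class PaperRecord:
    reference: str
    venue_type: str   # "journal" or "conference"
    citations: int
    main_topic: str   # one of MAIN_TOPICS (Table 2)
    paper_type: str   # one of PAPER_TYPES (Table 3)

    def __post_init__(self):
        # Reject values outside the classification scheme.
        if self.main_topic not in MAIN_TOPICS:
            raise ValueError("unknown main topic: %s" % self.main_topic)
        if self.paper_type not in PAPER_TYPES:
            raise ValueError("unknown paper type: %s" % self.paper_type)
```

Validating the category fields at construction time keeps the extracted dataset consistent with the scheme of Tables 2 and 3.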

We extracted data from all the relevant papers identified by Search 1 (see Table 1) before extracting data from the most cited papers. There were no discrepancies between the data related to the type of paper (journal or conference), whether the papers concerned object-oriented metrics, or whether they concerned open source systems.
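The selection procedure described earlier (dropping off-topic or untitled records and uncited papers, then ranking by citation count) can be sketched as follows; the paper records and the keyword list are illustrative, not the actual SCOPUS data:

```python
# Hypothetical sample of exported records; the real study worked on
# 253 SCOPUS results.
OFF_TOPIC_WORDS = {"biodiversity", "photon", "transistor", "mammograms"}

papers = [
    {"title": "Deriving metrics thresholds from benchmark data", "citations": 42},
    {"title": "Photon counting in radiology", "citations": 15},
    {"title": None, "citations": 3},
    {"title": "A complexity measure", "citations": 0},
]

def select(papers):
    kept = []
    for p in papers:
        if not p["title"]:                  # drop records with no title
            continue
        words = set(p["title"].lower().split())
        if words & OFF_TOPIC_WORDS:         # drop off-topic titles
            continue
        if p["citations"] == 0:             # drop uncited papers
            continue
        kept.append(p)
    # rank the surviving papers by citation count
    return sorted(kept, key=lambda p: p["citations"], reverse=True)
```

In this toy sample only the first record survives: the second matches an off-topic keyword, the third has no title and the fourth has no citations.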

Aggregating data
The initial search identified analysis and evaluation papers as the types of most interest to the metrics community. Moreover, it suggested more detailed categories according to the different techniques used to calculate thresholds.

RESULTS
In this section we present some tabulations of the results of categorizing the identified papers. In particular, we provide the publication sources for conference and journal papers.

Table 4: Sources of conference papers
Conference (or source) | Number of papers
2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering, CSMR-WCRE 2014 - Proceedings | 1
Conference on Software Maintenance | 1
ICSOFT 2010 - Proceedings of the 5th International Conference on Software and Data Technologies | 1
IEEE International Conference on Program Comprehension | 1
IEEE International Conference on Software Maintenance, ICSM | 1
IEEE International Working Conference on Mining Software Repositories | 1
IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD | 1
Information and Software Technology | 1
International Journal on Artificial Intelligence Tools | 1
ITNG 2009 - 6th International Conference on Information Technology: New Generations | 1
Journal of Physics: Conference Series | 1
Journal of Systems and Software | 1
Lecture Notes in Business Information Processing | 1
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 3
Lecture Notes in Electrical Engineering | 1
Proceedings - 2010 IEEE International Conference on Granular Computing, GrC 2010 | 1
Proceedings - 2011 IEEE International Symposium on Network Computing and Applications, NCA 2011 | 1
Proceedings - International Conference on Advanced Information Networking and Applications, AINA | 1
Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI | 1
Proceedings - International Symposium on Software Reliability Engineering, ISSRE | 1
Proceedings - Joint Conference of the 21st International Workshop on Software Measurement, IWSM 2011 and the 6th International Conference on Software Process and Product Measurement, MENSURA 2011 | 1
Proceedings of IEEE International Symposium on High Assurance Systems Engineering | 1
Proceedings of the 16th IEEE International Requirements Engineering Conference, RE'08 | 1
Proceedings of the 2010 IEEE/IFIP Network Operations and Management Symposium, NOMS 2010 | 1
Proceedings of the 2012 4th International Conference on Computational Aspects of Social Networks, CASoN 2012 | 1
Proceedings of the 24th International Florida Artificial Intelligence Research Society, FLAIRS - 24 | 1
Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR | 2
Proceedings of the IEEE International Workshop on Systems Management | 1
Proceedings of the International Symposium on Software Reliability Engineering, ISSRE | 1
SEKE 2010 - Proceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering | 1
SEKE 2011 - Proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering | 1
Total | 34

Table 5: Sources of journal papers
Journal | Number of papers
Empirical Software Engineering | 4
Expert Systems with Applications | 1
IEEE Transactions on Reliability | 1
IEEE Transactions on Software Engineering | 2
Information and Software Technology | 5
Information Sciences | 1
Innovations in Systems and Software Engineering | 1
International Review on Computers and Software | 1
Journal of Machine Learning Research | 1
Journal of Software Maintenance and Evolution | 1
Neurocomputing | 1
Pattern Recognition | 1
The Journal of Systems and Software | 1
Total | 21

In order to provide the classification results related to all the criteria shown in the extracting data section, we believe it is necessary to compare what we obtained from SCOPUS with the data derived from other tools (such as the ACM, IEEE and CiteSeer digital libraries).

CONCLUSIONS
We believe this study is useful because it may act as a starting point for more detailed work. In particular, we may extend this work by taking into consideration all the criteria listed in the extracting data section, to provide further classification results. Furthermore, we could use other tools (such as the ACM, IEEE and CiteSeer digital libraries) in order to compare their results with those obtained from SCOPUS. To fully analyze the current status of the thresholds field, we will first undertake a mapping study (Kitchenham & Charters, 2007) and then a systematic review (Cronin et al., 2008). The former allows identifying the set of primary works, highlighting their gaps according to the established question. The latter provides a list, as complete as possible, of all the published and unpublished studies relating to a particular subject area.

ACKNOWLEDGMENT
The findings and opinions in this study belong solely to the authors.

REFERENCES
Alves, T. L., Ypma, C., & Visser, J. (2010). Deriving metrics thresholds from benchmark data. Proceedings of the IEEE International Conference on Software Maintenance, pp. 1-10.
Bundschuh, M., & Dekkers, C. (2008). The IT Measurement Compendium: Estimating and Benchmarking Success with Functional Size Measurement. Springer Berlin Heidelberg, Ch. 8, pp. 207-239.
Cronin, P., Ryan, F., & Coughlan, M. (2008). Undertaking a literature review: a step-by-step approach. British Journal of Nursing, 17(1), pp. 38-43.
Erni, K., & Lewerentz, C. (1996). Applying design-metrics to object-oriented frameworks. Proceedings of the 3rd International Software Metrics Symposium, pp. 64-74.
Fenton, N., & Bieman, J. (2014). Software Metrics: A Rigorous and Practical Approach, Third Edition. CRC Press, pp. 1-67.
IEEE (1990). IEEE Standard Glossary of Software Engineering Terminology. IEEE Std 610.12-1990.
ISO (1987). ISO 9000 - Quality Management. http://www.iso.org/iso/iso_9000.
Kitchenham, B., & Charters, S. (2007). Guidelines for performing Systematic Literature Reviews in Software Engineering. Technical Report EBSE 2007-001, Keele University and Durham University Joint Report.
McCabe, T. J. (1976). A complexity measure. IEEE Transactions on Software Engineering, SE-2(4), pp. 308-320.
Nejmeh, B. A. (1988). NPATH: A measure of execution path complexity and its applications. Communications of the ACM, 31(2), pp. 188-200.
Shatnawi, R., Li, W., Swain, J., & Newman, T. (2009). Finding software metrics threshold values using ROC curves. Journal of Software Maintenance and Evolution: Research and Practice, 22(1), pp. 1-16.
Yoon, K.-A., Kwon, O.-S., & Bae, D.-H. (2007). An approach to outlier detection of software measurement data using the K-means clustering method. Proceedings of the First International Symposium on Empirical Software Engineering and Measurement, pp. 443-445.
