2014 IEEE International Conference on Software Testing, Verification, and Validation Workshops

Towards a Test Automation Improvement Model (TAIM)

Sigrid Eldh
Radio System and Technology, Ericsson AB, Stockholm
Dept. Math and Comp. Science, Karlstad University, Karlstad, Sweden
[email protected]

Kenneth Andersson, Andreas Ermedahl
Radio System and Technology, Ericsson AB, Stockholm, Sweden
[email protected] [email protected]

Kristian Wiklund
Radio Base Systems, Ericsson AB, Stockholm
Dept. Innovation, Design and Eng., Mälardalens Högskola, Västerås, Sweden
[email protected]

Abstract— In agile software development, industries are becoming increasingly dependent on automated test suites. Thus, test code quality is an important factor for overall system quality and maintainability. We propose a Test Automation Improvement Model (TAIM) defining ten key areas and one general area. Each area should be based on measurements, to fill the gap left by existing assessment models. The main contribution of this paper is to provide the outline of TAIM and to present our intermediate results and some initial metrics to support the model. Our initial target has been the key area addressing implementation and structure of test code. We have used common static measurements to compare the test code and the source code of a unit test automation suite that is part of a large, complex telecom subsystem. Our intermediate results show that it is possible to outline such an improvement model, and our metrics approach seems promising. However, to arrive at a generally useful model that aids test automation evolution and provides comparable measurements, many problems remain to be solved. TAIM can thus be viewed as a framework to guide research on metrics for test automation artifacts.

Keywords—software; industry; test automation; measurement; improvement model; test code

I. INTRODUCTION

To improve test and software quality, industries have traditionally measured faults/failures, coverage and the number of passed/failed test cases, and have in addition relied on improvement and maturity models, such as Test Process Improvement (TPI) [4], Test Maturity Model (TMM) [15] and Test Improvement Model (TIM) [1]. The area is evolving with new generations of these models, including Test Management Approach (TMAP) Next [24], TPI Next [23] and TMMi (integrated) [26]. These models have been partly successful in aiding stepwise improvements in industry, but are insufficient in describing improvement steps for test automation in enough detail. Most of the existing test improvement models are based on a subjective, assumed "order" of importance for improvements [25]. Another caveat of these models is their lack of scientific rigor. As a consequence, we propose a new improvement model that we call the Test Automation Improvement Model (TAIM), where most of the evaluation is based on validated metrics. This enables us to define what could be considered a better, more cost-efficient, objective and mature way to conduct test automation.

TAIM defines ten key areas (KA) and one general area (GA). For each KA we aim to provide measurements, allowing the model to offer objectively defined stepwise improvements, where each step should mean measurably better, to best aid improvements. Research has in many ways addressed how to measure tests, e.g. by using coverage [11] or mutation testing [30]. Both can be viewed as automated measurement approaches that provide factual measurements which can be used to compare two test suites. Coverage is beneficial in supporting an assessment of quality, but does not answer all quality questions about a test. Other aspects must be taken into account: is the test suite maintainable? Can the fully automated test execution suite execute faster? Can any other step of the test process be automated, and what are the cost and consequences of such automation? TAIM must be able to provide measurements that support the "best" automation. We have taken the first step of validating our model on a large, complex telecommunication system, where we have measured properties of test suites and performed detailed comparisons of test and source code characteristics. This is by no means a new approach, since the work of Nagappan et al. [41] bears a strong resemblance to this initial attempt. Are these metrics good measurements of what we are after? We are still not convinced. We share some highlights of this metrics study. We can support the test community by gathering know-how from earlier published research, by seeking facts through applying and replicating measurements on existing test systems and systems under test, and by contributing to a simplification of how to improve. Moreover, we believe that our presented TAIM – the main contribution of this paper – should challenge the research community and provide a way to combine the body of knowledge in a manner that is more deployable for industry. The remainder of this paper is organized as follows: Section II describes related work. Section III presents the requirements and observations upon which TAIM is built. Section IV presents the GA and KAs of TAIM. In Section V we present our initial case study and some metrics. The last section contains conclusions and further work.


II. RELATED WORK

We are approaching this area based on measurement. There is an abundance of measurement frameworks; as our main guidance we have used Abran [42], which focuses on software metrics. One can learn from Baker et al. [36], who provide metrics and refactorings specifically for TTCN-3 test specifications. The test code smells for TTCN-3 by Neukirchen and Bisanz [37] are also highly interesting input. Both papers target existing test artifacts in order to measure and improve test code. JUnit test patterns have been explored by van Deursen et al. [38] and by Zaidman et al. [39]. Their work on refactoring test code moves towards measuring the quality of the test code, with similarities to our initial metrics. According to Yamashita [40], further research is needed to conclusively confirm refactoring as a strong (and cost-effective) approach to maintainability. We conclude that TAIM fills a gap in the current research, where the KAs included in testware, and thus in test automation improvement, have not been defined. By creating TAIM we provide ourselves with a goal and a research strategy that puts our earlier research [9] in context and aids a more holistic approach. Thus, our previously published papers can be seen both as validating and as motivating parts of TAIM. The overall ambition helps us identify missing areas. For example, we have identified technical debt and impediments in test automation [5][6]. We have looked at the automatic build and regression test management that is performed after a system fix [10], thereby combining several KAs in our model. We have worked with triaging and fault localization [8] as well as fault classifications [12], and have attempted to compare requirements on test tools [13].

We define testware as the subset of software in a system that targets the automation of testing. Testware equally describes all of the materials (artifacts) used to perform a test in all its phases. Gelperin and Hayashi [15] describe in TMM that the primary issue with testware is reuse, and that to enable reuse, one must support the testware with configuration management, modularization and independence. It is recommended that the test environment be automated, flexible and easy to use. This is emphasized by Jacobs et al. [18], who discuss the TMM and claim that the test system is paramount in testing and must be addressed in any test process improvement model. However, no new requirements are proposed to expand the TMM to cover test automation aspects. In TMMi [26], adaptations are made to cater for new processes (agile), but no real test automation advancement is proposed. In "TIM - A Test Improvement Model" [1], Ericson et al. dedicate one of the five KAs to testware. Apart from a requirement on configuration management, that KA is primarily concerned with what the testware does, such as automated result checking, and not as much with automation per se; the review KA adds that testware shall be reviewed. TPI [4] promotes reusable testware, centrally managed testware with a testware product owner, configuration management, proper delivery procedures for testware, and testware architecture. Heiskanen et al. [14] build on TPI to address concerns for automated test generation. In the field of testware management, they include an evaluation of when to automate, and propose that a test environment specialist is included in the staff. TPI and TMM were analyzed by Blaschke et al. [7] against the requirements of the ISO/IEC 15504 standard. This led to the definition of a new reference model in line with the standard, where they introduce "Test Environment Operation" as one of the four primary life cycle processes of the model, with sub-areas in user support and operational use of the test environment. TMAP Next aims at result-driven testing [24] with a more business-adapted process improvement; the same holds for TPI Next [23]. In these two works from Sogeti, test tools and the test environment get more focus, including a better discussion of their management, but no metrics support is provided. Kakkonen [25] describes the change-over from TPI to the new TPI Next, gives an account of how these models lack support for agile process organizations, and discusses how they are based on assumptions that do not hold for all types of organizations, which confirms the earlier result provided by Eldh [27] for TPI. Basili's well-established Goal/Question/Metric (GQM) paradigm [31][32][33][34] can support our aim in some of our suggested KAs. We challenge GQM by suggesting a more bottom-up approach: we do not want to create goals of "better" from a limited viewpoint (e.g. by asking industry), but by defining concepts based on what is truly measurable. Our combined approach of using top-down GQM together with a more bottom-up approach would aid in identifying the gaps in research on test automation.

III. REQUIREMENTS ON TAIM AND OBSERVATIONS

A. Requirements on TAIM
The list below presents the requirements on TAIM:
- Simplified improvement steps that enable juxtaposing different automation implementations for each KA. E.g., step 0 is the basis, "no automation", for each KA; step 1 means that "part" of the area is automated. One must then consider full automation, which can in turn be implemented "better" on a further scale.
- Measurements: the ability to determine that x is better than y, providing predictable outcomes and direct guidance. Scientifically validated metrics, well defined, and preferably available in tools.
- A cost/gain ROI (return on investment) focus: using the method, technique or approach, including the cost of the competence needed, should be compared to the results and savings.
- Availability: of tools, artifacts and competences to deploy.
- Test automation viewed in a more end-to-end life-cycle perspective, instead of only focusing on test execution and/or test generation.
- Guidelines for each KA with "major steps", based on thorough literature reviews for each KA and sub-area, to better define the gap relative to existing knowledge.
- Support for self-assessment and qualitative complementary information.
- Domain independence (or suggested priority adaptations) and dependency analysis (e.g. hardware).
- TAIM is based on the view that all aspects of the test process could be partly or fully automated. However, even though a high level of test automation is possible, it is not always economically viable. An improvement step must be economically sound to be suggested by the model.
- Process, domain and organization independence: TAIM should allow for some variation of when, not dictate by whom, and focus on what and how.

B. Observations on the TAIM Requirements
The first step of test automation in industry is to boost efficiency by making test cases repeatable [2][3], by automating the existing test suites. This is performed either by automating a selection of regression tests (usually functional tests) or by creating a unit test harness. The result is an automated test execution suite, containing no or some support for verdict evaluation. When the artifacts or the number of users grow, the test artifacts are often put into a test management system [10]. In agile development, the build, integration and regression testing are highly automated, making industries rely heavily on a successful test suite. There is little evidence that this new way of working results in good test automation. If the focus is only to increase the efficiency of testing, then little or no testing is the fastest way to perform the activity. Thus, thoroughness of testing, e.g. through coverage, is a necessity. Measuring only faults found is insufficient for the same reason. Our experience shows that for complex systems, even if high levels of code coverage are reached at one level, there is no guarantee that enough faults are found, nor that the correct aspects of the system have been considered [11]. In our targeted system, the coverage levels are consistently high and faults are fewer, but this is not enough. Generally speaking, automation suites can be more or less well made, resulting in more or less costly maintenance of the test suites. This is important when several agile teams have to check the same code. We conclude that there is a need to measure how good a particular test suite actually is at testing the system.

For a scientist, test automation means automatically generating test cases, based either on a more formal test specification [17] or on a search problem with a test goal defining the data, e.g. model-based testing [21] and search-based testing [20]. Another approach is to use methods like model checking [22], invariants or constraints to automatically limit faults and check properties of the targeted software. The industrial use of test case generation varies with the expected quality levels, which are strongly related to the delivery times of the software and its domain. A measurement model that can discuss the costs and stepwise improvements of such investments will provide better support for investing in and deploying such techniques. Thus, a model must take all aspects of automation in the test process into account.

Our identified TAIM KAs provide an end-to-end view of a full automation of the entire test process. The first issue we are facing is whether each KA should be viewed independently or together with some of the other areas. Our experience from utilizing earlier models has taught us that any model always needs to be adapted when implemented. It seems that within each area [25][27] it is difficult to define and describe the implicit assumptions and to make the recommendations true for all businesses. We believe that the problem arises because the models are not based on scientifically validated and generally applicable measurements. Thus, by utilizing a more measurement-based model, the stepwise improvement might become a "continuous" improvement on a floating scale, where a collection of measurements allows for trade-off discussions and priorities for a particular business. At some point it should be possible to agree on which metrics could be viewed as "better". Such a comparison could for example be achieved by translating all aspects either into time or into costs. This, however, is already a trade-off in itself. Even if we define a test execution to include result comparison (a test oracle) and the result "pass", and have 100% coverage by a set of test cases, it is still possible to do this automation in many different ways. For example, should preference be given to a test suite that executes fast, or to a test suite with small creation effort? Or should we spend most of the time on the creation of the test cases? Is high test suite maintainability a strong indicator of good test automation (a cost-over-time view)? Or should we rather look at aspects like "adaptability" to the constant software updates (a product-line view)? Thus, should we add aspects like "well-structured test automation suite" – or are we just digging a grave for our 100% automated test execution with 100% statement (or MC/DC) coverage? Should we promote high readability of test suites (something that is often important for software maintenance)? This does not really fit newer, more automated testing techniques, e.g. search-based testing [20], where the generated test cases are often almost unreadable. Defining the better "automation" is a challenge. It is not sufficient to just describe the software under test or its domain. Thus, for each KA, there is a multitude of practices that need to be taken into account.

IV. TEST AUTOMATION IMPROVEMENT MODEL

We have so far identified ten KAs and one GA. The overall quality of all test artifacts, including the test code and the source code, should be combined and aggregated into an overall result. The list below presents the identified areas. For each KA (1-10) it should be possible to identify the aspects provided in the GA.

A. General Area
The general area consists of: Traceability; Defined measurements such as Efficiency, Effectiveness and Cost (return on investment); Analysis, e.g. Trends and Data Aggregation; Testware; Standards; Quality of Automation; and Competence.


B. Key Areas in TAIM
1) Test Management: Planning, Deployment including Resource Management, Evaluation, Reports, Status and Progress, Automation Analysis, Technical Debt
2) Test Requirements: Specific Test Automation Requirements, e.g. tools or environment, Testability, Analysis of System, Analysis of Architecture (e.g. testability, scope, dependencies/slicing – components/integration levels)
3) Test Specifications: Test Case Generation, Test Design Techniques, Pre-process Analysis of Tests, Built-in Features, e.g. Constraints, Properties, Invariants
4) Test Code (implementation): Architecture of Test Code, Standards/Templates/Patterns, Test Code Language, Static/Dynamic Measurements on Test Code
5) Test Automation Process: Context, Type, Level, Process Metrics on e.g. Speed
6) Test Execution: Test Case Selection, Priority, Type of Test Technique (test goal), Regression Tests
7) Test Verdicts: Test Oracles, Post-process Analysis, Results (e.g. test case verdict, logs, reports)
8) Test Environment (context): Test Environment Specification and Set-up, Type (e.g. Simulation, Emulation, Target), Test Data, Certification Suites, APIs
9) Test Tools: Tool Selection, Integration (Interchange), Tool Chains, Tool(s) Architecture, Frameworks, APIs, Components, Installation, Upgrade, Changeability
10) Fault/Defect Handling: Change Reports/Anomaly (Failure/Bug) Reports, Classifications, Fault Identification, Triaging, Fault Localization, Fault Correction, Fault Prediction
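To make the intended structure concrete, the sketch below shows one possible way of recording a TAIM assessment as data: each KA gets a step (0 = no automation, 1 = partly automated, and so on, as required in Section III.A) together with the metrics backing that judgment and a qualitative note. This is purely illustrative; the class and field names are ours and the example values echo the kind of metrics reported later in Table 1, not part of the model definition.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class KeyAreaAssessment:
    """Assessment of one TAIM key area (KA)."""
    name: str                 # e.g. "Test Code (implementation)"
    step: int                 # 0 = no automation, 1 = partly automated, ...
    metrics: Dict[str, float] = field(default_factory=dict)  # validated measurements
    notes: str = ""           # qualitative, self-assessed complement

@dataclass
class TaimAssessment:
    """A complete assessment: one entry per KA plus general-area aspects."""
    key_areas: Dict[int, KeyAreaAssessment]                    # keyed by KA number 1-10
    general_area: Dict[str, str] = field(default_factory=dict) # traceability, cost, ...

# Hypothetical example for KA 4.
ka4 = KeyAreaAssessment(
    name="Test Code (implementation)",
    step=1,
    metrics={"test_kloc": 1023.7, "constants_per_kloc": 290.98,
             "maintainability_index": 39.1},
    notes="Unit test suite automated; test code architecture not defined.",
)
assessment = TaimAssessment(key_areas={4: ka4},
                            general_area={"traceability": "partial"})
print(assessment.key_areas[4].metrics["maintainability_index"])
```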

V. OUR CASE - THE INITIAL VALIDATION

Our case study provides an example of typical industrial software, containing a mixture of proprietary and open-source solutions. Our case-study target system is a sub-system of a large, complex telecommunication system, which utilizes a proprietary programming language, compiler and linker. The programming language is a variant of DSP-C [35], somewhat adapted to our proprietary multicore hardware. The test code and the source code are written in the same language. Specific set-up and closure (clean-up) libraries are provided in our proprietary testing framework. The testing framework partly integrates some open-source software, such as Google Mock [28] and Jenkins [29]. Our initial validation only scratches the surface. In our validation approach we looked at KA 4, Test Code, and attempted to establish some initial measurable points that could provide valuable insight for our model (well aware of [41] and [42]). The effort involved in only partly investigating one KA made us realize why we need to invite other researchers to improve on TAIM.

A. KA 4 - The Architecture of Test Code
Architecture of test code specifically means how the test code in itself is structured. This goes well with the use of standards, templates and patterns, which implicitly define an internal structure of the test case. An architecture is not the same as a test framework (though it can be); if the framework does not dictate how the internal order, grouping etc. is created, the actual test suite architecture is seldom defined. In Table 1 we see some results obtained by measuring the internal structure of the test and source code.

TABLE 1. SOME (AVERAGE) INITIAL MEASUREMENTS ON TEST CODE

Metrics                      Source Code   Test Code
Size (kLOC)                  853.98        1023.7
#files                       3378          1379
#functions                   8802          11744
#constants/kLOC              82.53         290.98
#literal assigns/kLOC        6.53          25.93
#defines/kLOC                6.53          25.93
Maintainability Index [19]   51.1          39.1

First, from Table 1 we conclude that the amount of test code (in this case, unit test code) is much larger than the amount of source code. This is our main argument for investing in this research, in the context of the cost of maintainability. This in itself sends an important message – how much cost, effort etc. is invested in creating and maintaining the test code – something that cannot be ignored by any serious business. Another related observation is that, in comparison to the source code, the test code contains a larger set of functions organized in fewer files. This indicates a lack of architecture and organization in the test code itself. This is an example of a claim that we hope to verify better and for which we hope to propose improvement steps. Our case study indicates that there is a lack of test architects and not enough time spent on test architecture for our specific test cases. Those facts are not measurable in the code and are only indirect. Are library functions in use in the test code? Even if, in this case, we could not lean on much more than design rules (as guidelines) for the written code, we need to investigate better test patterns. Can we measure this?
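The counts in Table 1 were produced with a proprietary in-house tool (see Section V.D), which we cannot show here. Purely as an illustration of the kind of static measurement involved, the sketch below counts files, kLOC, function definitions, #define macros and literal assignments in C-like code using simple regular expressions; the patterns and the "src"/"test" directory names are our own simplifications and would miss many real-world cases.

```python
import re
from pathlib import Path

# Naive patterns for C-like code; a real tool would use a proper parser.
FUNC_DEF = re.compile(r'^\s*[A-Za-z_][\w\s\*]*\([^;]*\)\s*\{', re.MULTILINE)
DEFINE = re.compile(r'^\s*#define\s+\w+', re.MULTILINE)
LITERAL_ASSIGN = re.compile(r'=\s*-?\d+(\.\d+)?\s*;')  # e.g. "x = 42;"

def measure(root: str, suffixes=(".c", ".h")) -> dict:
    """Collect a few crude static metrics over all matching files under root."""
    files = [p for p in Path(root).rglob("*") if p.suffix in suffixes]
    text = "\n".join(p.read_text(errors="ignore") for p in files)
    kloc = max(text.count("\n") / 1000.0, 0.001)  # avoid division by zero
    return {
        "#files": len(files),
        "Size (kLOC)": round(kloc, 2),
        "#functions": len(FUNC_DEF.findall(text)),
        "#defines/kLOC": round(len(DEFINE.findall(text)) / kloc, 2),
        "#literal assigns/kLOC": round(len(LITERAL_ASSIGN.findall(text)) / kloc, 2),
    }

if __name__ == "__main__":
    for label, path in [("Source Code", "src"), ("Test Code", "test")]:
        print(label, measure(path))
```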

B. KA 4 - The Standards, Templates and Patterns
Is the use of a test design technique a pattern? In Table 1 we can see a much higher use of constants in the test code than in the source code. With simple variation of them (and of the literal assigns and defines) [9] we could probably obtain better and more efficient test code (using input variables instead of hard-coded constants). Thus, if the data indicates overuse of hard-coded constants, it could be viewed as a lack of test design techniques and as an indication of poor quality in how the test cases use test data. We note, however, that some constants are often needed to set up the particulars of the multi-core environment needed for the test case, meaning that these specific constants should not be varied. Set-up constants are part of this test code pattern. Can we distinguish them from other constants? Can one see an evolution of the test cases and the patterns provided? Is test code really similar in structure to source code?

Compared to source code, there is a general lack of guidelines, guidance and education on designing test code. Moreover, very few of the test suites investigated explicitly utilize specific test design techniques. We can again confirm that basic competence in test design techniques [9] should be improved.
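The test code in our case is written in a proprietary DSP-C variant, so we cannot reproduce it here. As a language-neutral illustration of the point about hard-coded constants versus varied input data, the Python sketch below contrasts a literal-heavy test with a data-driven one; the function under test and the values are invented for the example.

```python
import unittest

def scale(value, factor):
    """Hypothetical function under test."""
    return value * factor

class HardCodedTest(unittest.TestCase):
    def test_scale(self):
        # Hard-coded constants: the test data is frozen into the test body.
        self.assertEqual(scale(3, 2), 6)

class DataDrivenTest(unittest.TestCase):
    # The same check expressed over varied input data (a simple test design
    # technique); adding a case means adding a row, not copy-pasting a test.
    CASES = [(3, 2, 6), (0, 5, 0), (-4, 2, -8)]

    def test_scale(self):
        for value, factor, expected in self.CASES:
            with self.subTest(value=value, factor=factor):
                self.assertEqual(scale(value, factor), expected)

if __name__ == "__main__":
    unittest.main()
```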

C. KA 4 - The Test Code Language
The test code language sub-area can be divided in many ways, e.g. compiled vs. interpreted, functional vs. procedural, or by other types of grouping: object-oriented, or family-wise, e.g. C, C++, C#. There is very little research providing good estimates for comparing the use of specific test languages (as is often required by e.g. specific test tools) with using the same language for the tests as the source code is written in. Using the same programming language for unit tests as for the source code seems obvious, since it is easier to interface between them. Others insist on limiting the number of different unit test frameworks and aligning them with "higher level" test case frameworks. These claims have not been tested sufficiently, as the strategies are not scientifically based. Maybe usability, ease of constructing test cases, sufficient detail and ease of learning the language should be taken into account at this stage of measurement. Another important factor could be the availability and cost of tools that support these test code languages. In our case it is easy to lower the score of proprietary languages, since static and dynamic measurements from e.g. the compiler or tools were non-existent, and we had to implement a tool from scratch (which is very costly compared to using a simple open-source tool) to be able to measure our data. Still, there might be justifications for a specialist language other than tool availability.

D. KA 4 - Static and Dynamic Measurements on Test Code
In comparison with other industrial tests at this level, the dynamic measurement, coverage, reaches very high levels for such a complex system, in addition to being measured at integration level. These metrics indicate that, compared with other industrial test automations, our implementation should be ranked rather high. On the other hand, we tried several maintainability indexes, and they all pointed in the same direction. The maintainability index used [19] is reported in Table 1 and indicates that both the source code and the test code are overly complex. Especially the test code value is shockingly low, since the target is a number above 80 [19].
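For reference, one widely cited formulation of the maintainability index (we have not verified which exact variant the tools compared in [19] implement) is:

MI = 171 - 5.2 * ln(avgV) - 0.23 * avgG - 16.2 * ln(avgLOC)

where avgV is the average Halstead volume per module, avgG the average cyclomatic complexity and avgLOC the average lines of code per module; some variants add a comment-ratio term and rescale the result, and higher values indicate more maintainable code.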

We realize that the data are largely insufficient at this point, and the collected results are therefore somewhat questionable. However, they provide a first indication that the test code simply does not measure up to the source code in quality. We have also looked at code duplication, see Table 2. The duplication tool compares sets of 3 characters for a code block size of 4. We conclude that the amount of code duplication in the test code is much higher than in the source code. The reason might be that test code writing often becomes a copy-paste activity. Is this good or bad? On one hand, reuse should be encouraged – and without doubt, copy-paste is probably very much how test cases evolve when people learn from each other. On the other hand, proper structuring of the test code would encourage the creation of functions or libraries, together with a more keyword- or template-driven approach to writing the test code. Maybe this is not a static metric but rather evidence of code patterns? What is really a big enough chunk of duplication? Is code not always this regular? It is costly to refactor test code, and the business gain can be discussed. The perspective should be how to value a test case and its likelihood of finding faults. The cost of fixing old test cases should be compared to that of making new test cases for uncovered areas. The solution is not always straightforward. Further, this is in itself an activity that could be automated. Should automatic test case transformation be a part of TAIM? We would assume that any automatic transformation with regard to testing is an important aspect that has to be considered for the model.

TABLE 2. CODE DUPLICATION (BLOCK SIZE: 4, CHARACTERS: 3)

Metrics                   Source Code   Test Code
Number of Files           1971          593
Lines                     205776        248626
Duplicate LOC             99756         437666
Total Duplicate Blocks    17672         77298

To collect the above data, both static and dynamic measurements had to be performed. We have implemented a measurement tool with very basic static measurements, common to most static analysis tools, with metrics visible in STREW [41], in addition to dynamic measurements, e.g. coverage. Moreover, in [9] we could see that open-source coverage tools yielded different results on the same code for the same measurement. It is therefore extra important to discuss the good, better and best when it comes to the quality of the measurements in the data collected, as well as of the tool itself.
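The duplication numbers in Table 2 come from the in-house tool mentioned above; we read its configuration as comparing normalized blocks of four lines in three-character units. Purely as an illustration of that style of measurement, the sketch below counts duplicate four-line blocks after whitespace normalization; it ignores the three-character granularity and is not the tool used for Table 2.

```python
from collections import Counter

BLOCK_SIZE = 4  # number of consecutive (non-empty) lines per compared block

def normalize(line: str) -> str:
    # Collapse whitespace so formatting differences do not hide duplicates.
    return " ".join(line.split())

def duplicate_block_count(code: str) -> int:
    """Count sliding 4-line blocks that occur more than once in the code."""
    lines = [normalize(l) for l in code.splitlines() if l.strip()]
    blocks = ["\n".join(lines[i:i + BLOCK_SIZE])
              for i in range(len(lines) - BLOCK_SIZE + 1)]
    counts = Counter(blocks)
    return sum(c for c in counts.values() if c > 1)

if __name__ == "__main__":
    chunk = "x = 1;\ny = 2;\nz = x + y;\ncheck(z);\n"
    print(duplicate_block_count(chunk * 3))  # repeated chunk -> duplicates reported
```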

VI. CONCLUSION & FURTHER WORK

TAIM can be viewed as work in progress and as a research challenge. Our quest for metrics that objectively compare data from the testware is inspiring for further work. We have only started to validate our proposal of gathering measurements, by mainly addressing the test code KA. We expect the sub-areas within the KAs to evolve as we mature in gathering evidence and validation through a better and wider collection of metrics in real systems. Further work is also directed towards automatically creating patterns. It is important to compare the TAIM KAs with existing models. We intend to further explore measurements and meta-data from the test automation process for a series of languages. We strive to establish a basic set of metrics to further improve and validate TAIM with the aid of the research community.

ACKNOWLEDGMENT


We would like to thank Ericsson for supporting our work through the ATAC project, an ITEA2 project funded by Vinnova, Sweden. We also give our acknowledgments to the ITS-EASY Research School, which is funded by The Knowledge Foundation, Sweden.

REFERENCES

[1] T. Ericson, A. Subotic, and S. Ursing, "TIM - A Test Improvement Model," Software Testing, Verification and Reliability 7.4, pp. 229-246 (1997)
[2] B. W. Boehm, R. K. McClean, and D. E. Urfrig, "Some experience with automated aids to the design of large-scale reliable software," IEEE Transactions on Software Engineering 1.1, pp. 125-133 (1975)
[3] H. Do and G. Rothermel, "An empirical study of regression testing techniques incorporating context and lifetime factors and improved cost-benefit models," Proc. of the 14th ACM SIGSOFT Int. Symposium on Foundations of Software Engineering, ACM (2006)
[4] T. Koomen and M. Pol, "Test Process Improvement: A Practical Step-by-Step Guide to Structured Testing," Addison-Wesley Professional, ISBN 978-0201596243 (1999)
[5] K. Wiklund, S. Eldh, D. Sundmark, and K. Lundqvist, "Technical Debt in Test Automation," Fifth Int. Conf. on Software Testing, Verification and Validation (ICST), IEEE, pp. 887-892 (2012)
[6] K. Wiklund, D. Sundmark, S. Eldh, and K. Lundqvist, "Impediments in Agile Software Development: An Empirical Investigation," 14th Int. Conf. on Product Focused Software Development and Process Improvement (PROFES), Springer, Paphos, Cyprus, June (2013)
[7] M. Blaschke, M. Philipp, and T. Schweigert, "The Test SPICE Approach," Journal of Software: Evolution and Process 24.5, pp. 471-480 (2012)
[8] L. Jonsson, D. Broman, K. Sandahl, and S. Eldh, "Towards Automated Anomaly Report Assignment in Large Complex Systems using Stacked Generalization," ICST, IEEE (2012)
[9] S. Eldh, "On Test Design," PhD Thesis, Mälardalen University Press, Västerås, Sweden (2011)
[10] S. Eldh, J. Brandt, M. Street, H. Hansson, and S. Punnekkat, "Towards Fully Automated Test Management for Large Complex Systems," ICST, IEEE (2010)
[11] S. Eldh, S. Punnekkat, and H. Hansson, "Experiments with Component Test to Improve Software Quality," ISSRE (2007)
[12] S. Eldh, S. Punnekkat, H. Hansson, and P. Jönsson, "Component Testing is Not Enough - A Study of Software Faults in Telecom Middleware," TESTCOM/FATES, Springer LNCS (2007)
[13] Y. Zhang, E. Alba, J. J. Durillo, S. Eldh, and M. Harman, "Today/Future Importance Analysis," GECCO, ACM (2010)
[14] H. Heiskanen, M. Maunumaa, and M. Katara, "Test Process Improvement for Automated Test Generation," Tech. rep., Tampere University of Technology, Department of Software Systems (2010)
[15] D. Gelperin and Hayashi, "How to support better software testing," Application Development Trends, May (1996)
[16] D. M. Rafi and K. Petersen, "Benefits and limitations of automated software testing: Systematic literature review and practitioner survey," 7th Int. Workshop on Automation of Software Test (2012)
[17] C. V. Ramamoorthy and S. F. Ho, "Testing large software with automated software evaluation systems," ACM SIGPLAN Notices, Vol. 10, No. 6, ACM (1975)
[18] J. Jacobs, J. van Moll, and T. Stokes, "The Process of Test Process Improvement," XOOTIC Magazine 8.2, pp. 23-29 (2000)
[19] J. Novak and G. Rakić, "Comparison of software metrics tools for .NET," Proc. of the 13th Int. Multiconference Information Society (IS), Vol. A, pp. 231-234 (2010)
[20] P. McMinn, "Search-based software test data generation: a survey," Software Testing, Verification and Reliability 14.2, pp. 105-156 (2004)
[21] A. Pretschner et al., "Model-based testing for real," International Journal on Software Tools for Technology Transfer 5.2-3, pp. 140-157 (2004)
[22] G. Fraser, F. Wotawa, and P. E. Ammann, "Testing with model checkers: a survey," Software Testing, Verification and Reliability 19.3 (2009)
[23] A. van Ewijk, B. Linker, M. van Oosterwijk, and B. Visser, "TPI Next - Business Driven Test Process Improvement," UTN, ISBN 978-9072194978 (2009)
[24] T. Koomen, L. van der Aalst, B. Broekman, and M. Vroon, "TMap Next, for Result-Driven Testing," UTN, ISBN 978-9072194800 (2006)
[25] K. Kakkonen, "Improve through Individual - TPI Next and Personal Testing Process," Nordic Testing Days, Tallinn (2012)
[26] E. van Veenendaal, "Test Maturity Model integration (TMMi) - Guidelines for Test Process Improvement," UTN, ISBN 978-9490986100 (2012)
[27] S. Eldh, "Test Assessments based on TPI," Proc. EuroSTAR Conf., Edinburgh, UK (2002)
[28] Google Mock, https://code.google.com/p/googlemock/ (accessed 20/01/2014)
[29] K. Kawaguchi, (Hudson/)Jenkins, http://jenkins-ci.org/ (accessed 20/01/2014)
[30] J. H. Andrews, L. C. Briand, and Y. Labiche, "Is mutation an appropriate tool for testing experiments?" Proc. 27th International Conference on Software Engineering (ICSE), IEEE (2005)
[31] V. R. Basili, G. Caldiera, and H. D. Rombach, "Goal Question Metric Paradigm," Encyclopedia of Software Engineering (Marciniak, J.J., ed.), Vol. I, John Wiley, pp. 528-532 (1994)
[32] V. R. Basili, "Applying the Goal/Question/Metric Paradigm in the Experience Factory," Software Quality Assurance and Measurement: A Worldwide Perspective, pp. 21-44 (1993)
[33] R. van Solingen and E. Berghout, "Integrating Goal-Oriented Measurement in Industrial Software Engineering: Industrial Experiences with and Additions to the Goal/Question/Metric Method (GQM)," Proc. of the 7th International Software Metrics Symposium (METRICS 2001), IEEE Computer Society, pp. 246-258 (2001)
[34] R. van Solingen, V. Basili, G. Caldiera, and H. D. Rombach, "Goal Question Metric (GQM) Approach," Encyclopedia of Software Engineering (Marciniak, J.J., ed.), John Wiley & Sons (2002)
[35] DSP-C, http://www.dsp-c.org/ (accessed 20/01/2014)
[36] P. Baker, D. Evans, J. Grabowski, H. Neukirchen, and B. Zeiss, "TRex - the refactoring and metrics tool for TTCN-3 test specifications," Testing: Academic and Industrial Conference - Practice And Research Techniques (TAIC PART), IEEE, pp. 90-94 (2006)
[37] H. Neukirchen and M. Bisanz, "Utilising code smells to detect quality problems in TTCN-3 test suites," Testing of Software and Communicating Systems, Springer Berlin Heidelberg, pp. 228-243 (2007)
[38] A. van Deursen, L. Moonen, A. van den Bergh, and G. Kok, "Refactoring Test Code," in Extreme Programming Perspectives, Addison-Wesley, Boston, pp. 141-152 (2002)
[39] A. Zaidman, B. Van Rompaey, S. Demeyer, and A. van Deursen, "Mining software repositories to study co-evolution of production & test code," 1st Int. Conf. on Software Testing, Verification, and Validation (ICST), IEEE, pp. 220-229 (2008)
[40] A. Yamashita, "How Good Are Code Smells for Evaluating Software Maintainability? Results from a Comparative Case Study," 29th IEEE Int. Conf. on Software Maintenance (ICSM), 22-28 Sept. (2013)
[41] N. Nagappan, L. Williams, J. Osborne, M. Vouk, and P. Abrahamsson, "Providing test quality feedback using static source code and automatic test suite metrics," 16th IEEE Int. Symposium on Software Reliability Engineering (ISSRE), IEEE (2005)
[42] A. Abran, "Software Metrics and Software Metrology," John Wiley & Sons (2010)

