Empirical Observations on Software Testing Automation

Katja Karhu, Tiina Repo, Ossi Taipale, Kari Smolander
Lappeenranta University of Technology
P. O. Box 20, FI-53851 Lappeenranta, FINLAND
+358 5 621 6650
[email protected], [email protected], [email protected], [email protected]

Abstract

This study explores the factors that affect the use of software testing automation in different types of organizations. A case study was conducted in five organizations that develop and test technical software for the automation or telecommunication domains. The data was collected from interviews with managers, testers, and developers in each organization. Grounded theory was used as the data analysis method. According to the results, the factors that support the wide use of testing automation include tested products that are generic and independent of third-party systems. Low human involvement in testing, a stable underlying technology, and reusability also support the use of testing automation.

1. Introduction

Testing can be divided into automated and manual testing. In a definition by Dustin et al. [1], automated software testing means the automation of software testing activities, including the development and execution of test scripts, verification of testing requirements, and the use of automated test tools. According to Dustin et al. [1], reasons for using automated software testing include, for example, that manual testing is time-consuming, and that testing automation increases efficiency, especially in regression testing, where test cases are executed iteratively after making changes to the software. Manual testing is a viable solution when, for example, automation is not cost-effective. Software testing automation, when applicable, is seen to reduce software testing costs, which, according to Kit's [2] estimate, may take more than 50% of the overall software development effort.
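To make the regression testing scenario concrete, the following minimal Python sketch shows an automated test that can be re-executed unchanged after every modification of the code under test. The example is hypothetical and not taken from the case organizations; the function and test names are invented for illustration.

    # Hypothetical regression test: rerun automatically after each code change.
    import unittest

    def parse_temperature(reading: str) -> float:
        """Toy function under test: parses a sensor reading such as '21.5C'."""
        return float(reading.rstrip("C"))

    class ParseTemperatureRegressionTest(unittest.TestCase):
        def test_plain_value(self):
            self.assertEqual(parse_temperature("21.5C"), 21.5)

        def test_negative_value(self):
            self.assertEqual(parse_temperature("-4C"), -4.0)

    if __name__ == "__main__":
        unittest.main()

Because the suite runs without human intervention, it can be repeated after every change, which is exactly the repetitive situation in which automation is reported to pay off.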

However, Persson and Yilmaztürk [3] note that the establishment of automated testing is a high-risk and high-investment project. Models for estimating testing automation costs, for example by Ramler and Wolfmaier [4], support decision making in the trade-off between automated and manual testing. According to Bertolino [5], testing automation is a significant area of interest in current testing research, with the aim of improving the degree of automation, either by developing advanced techniques for generating the test inputs or by finding support procedures to automate the testing process. Although there are studies and reports on testing automation in practice (such as [6], [7], [3], and [8]), there seems to be a lack of studies that investigate and compare the practice of software testing automation in different kinds of software development organizations. According to Briand [9], "empirical studies are crucial to software testing research in order to compare and improve software testing techniques and practices". To provide more empirical knowledge about this area, we conducted a case study in five organizations with the purpose of understanding which factors influence testing automation in practice. The organizations in the study were selected from companies that produce and/or test automation or telecommunication software. The data was collected by interviewing people in different organizational positions (managers, testers, and developers) in each of the organizations. The grounded theory research method, described by Glaser and Strauss [10] and later extended by Strauss and Corbin [11], was used in analyzing the interview data. We selected grounded theory as the research method because, to our knowledge, the practice of software testing automation has not been covered widely in previous research. According to Glaser and Strauss [10], grounded theory has the

ability to uncover issues from the practice under observation that may not have been identified in earlier literature. The paper is structured as follows: The research process and the grounded theory method are described in Section 2. The analysis results are presented in Section 3. Finally, the discussion and conclusions are given in Section 4.

2. Research process

This paper describes some of the later analysis results of a long empirical research project that included several phases. First, we collected quantitative data with the survey method and interviews from 30 organizational units and analyzed the data statistically. The process was complemented with a qualitative case study of five organizational units that we selected from the organizational units of the survey. In the case study phase we used both the interview data that was tape-recorded during the survey data collection and new qualitative data collected with theme-based interviews. To understand the practice of software testing automation in different kinds of software development organizations and to make theoretical observations that explain the phenomenon in question, an exploratory and qualitative strategy following the grounded theory approach [10, 11] was selected. According to Seaman [12], a grounded approach enables the identification of new theories and concepts, making it a valid choice for software engineering research. The need for qualitative approaches in areas related to human behavior is also widely recognized in software engineering [12-14]. We followed the process of building a theory from case study research described by Eisenhardt [15] and its implementation example [16]. Principles for an interpretive field study were derived from [17]. Other example studies included [18, 19]. Our data analysis included Within-Case Analysis and Cross-Case Analysis, as in [15]. We used the tactic of looking for within-group similarities coupled with inter-group differences [15] in the five case organizational units.

2.1. Selecting case organizational units

The standard ISO/IEC 15504-1 [20] specifies an organizational unit (OU) as a part of an organization that is the subject of an assessment. An OU deploys one or more processes having a coherent process context, and operates within a coherent set of business

goals. An OU is typically a part of a larger organization, although in a small organization, the OU may cover the whole organization. The OUs and the interviewees of this study are presented in Table 1. The reason to use an OU as the unit of observation was that we wanted to normalize the effect of company size to get comparable data. The criticality of the produced or tested software was measured during the first interview round by asking how severe the problems caused by faults in their products could be [21]. The OUs estimated the criticality of the software using a five-point scale from irritation and dissatisfaction to loss of a human life/lives. The objective was to select OUs with a software criticality above average [21]. For the first interview round, the selection from the population to the sample was based on probability sampling. The population was identified with the help of national and local authorities. The OUs were in a random order in our database and every second organization was selected. The sample of the first interview round consisted of 30 OUs. From this sample, five OUs were further selected as case OUs for the second, third, and fourth interview rounds. The sampling was theoretical [16]: the researcher's goal was not to collect a representative sample of all possible variations, but to gain a deeper understanding of the analyzed cases and to identify affecting factors and their relationships. In theoretical sampling, we selected polar types [15], which means that the cases represent different types of OUs, e.g. different businesses, different company sizes, and different kinds of operation.

2.2. Data collection

Our data collection consisted of four interview rounds, which are presented in Table 1. The company size classification is taken from [22]. During the first interview round, managers responsible for testing and/or development were interviewed. We selected managers as the first interviewees because they have wide experience, and they are able to give an overview of testing in the OU. The purpose of the first interview round was to collect multiple choice survey data, answers to open questions, and information for the theoretical sampling of the case OUs. The interviews were tape-recorded for further qualitative analysis. The results of the survey conducted during the first interview round are reported in [21, 23, 24]. The second, third, and fourth interview rounds were conducted in the selected case OUs. The purpose of the second, third, and fourth interview rounds was to get a detailed understanding of software testing in the OUs.

Table 1. OUs and interviewees

Interview round 1 (all 30 OUs, including cases A to E):
Business: automation or telecommunication domain.
Company size: the interviewed OUs were parts of large companies (53%) and small and medium-sized enterprises (47%).
Interviewees: managers; 52% of the interviewees were responsible for both development and testing, 28% were responsible for testing, and 20% were responsible for development.

Interview rounds 2 to 4 (case OUs):
Case A: a MES producer and integrator; large/international; interviewees: testing manager, tester, developer.
Case B: software producer and testing service provider; small/national; interviewees: testing manager, tester, developer.
Case C: a process automation and information management provider; large/international; interviewees: testing manager, tester, developer.
Case D: electronics manufacturer; large/international; interviewees: testing manager, 2 testers, developer.
Case E: testing service provider; small/national; interviewees: testing manager, tester, developer.

The interviewees of the second round were managers of testing, those of the third round were testers, and those of the fourth round were developers. New ideas that emerged were reflected in the subsequent interview rounds. In total, the interviews generated 946 pages of transcriptions.

2.3. Data analysis

The grounded theory method was used in analyzing the interview data from the five case OUs. According to Strauss and Corbin [11], grounded theory contains three data analysis steps: open coding, where categories of the study are extracted from the data; axial coding, where connections between the categories are identified; and selective coding, where the core category is identified and described. A category can be described as a conceptual label where different variations of the same phenomena are combined. Categorizing is done to reduce the number of units to work with [11]. In practice, the data collection and data analysis overlapped and merged because the process proceeded iteratively. First, the interviews from the first interview round were transcribed and analyzed. After that, the interviews were transcribed and analyzed after each interview round to collect new issues, and to see if it was worthwhile to continue the data collection procedure. The general rule in grounded theory is to sample until theoretical saturation is reached. This means, until (1) no new or relevant data seem to emerge regarding a category; (2) the category development is dense, insofar as all of the paradigm elements are accounted for, along with variation and process; (3) the relationships between categories are well established and validated [11]. The theoretical saturation was reached during the fourth interview round because new categories did not appear anymore,


categories were not merged, shared, or removed, attributes or attribute values of the categories did not change, and the relationships between the categories were considered stable, i.e. the already described phenomena began to repeat in the data. The objective of the open coding was to classify the data into categories and identify leads in the data. The open coding of the interviews was done using ATLAS.ti software [25]. The process started with "seed categories" [26] that contained essential stakeholders (such as organizational units and organizational roles), phenomena (such as testing knowledge, testing automation, and reuse), and problems. Seaman [12] notes that the initial set of codes (seed categories) comes from the goals of the study, the research questions, and predefined variables of interest. In our case, the seed categories were deduced by the research group from the research question, from the areas of interest defined in the earlier phases of the research project [27], and from the phenomena observed when transcribing the interviews from the first round. The open coding was then done by four researchers of the research project. When the open coding and the data collection were considered complete, three researchers were each given a viewpoint (testing automation, knowledge management, and processes) from which they started the axial coding of the data. The senior researcher, who conducted the interviews in the first round, acted as a referee. The objective of the axial coding was to further develop categories, dimensions, and causal conditions or any other kinds of connections between the categories. During the axial coding, changes were made to the existing categories so that they would be more descriptive and make further analysis easier. The categories were further developed by defining dimensions. The dimensions represent the locations of

the property or the attribute of a category along a continuum [11]. The objective of the selective coding was to identify the core category [11] and systematically relate it to the other categories. Selective coding was mainly done by an individual researcher who presented the results to the other researchers. The findings were then discussed and further developed in collaboration with the research group members. We found that there was no single core category among the existing categories. According to Strauss and Corbin [11], sometimes the core category is one of the existing categories, and at other times no single category is broad enough to cover the central phenomenon. In that case, the central phenomenon must be given a name. In this study, the creation of the core category meant the identification of the affecting factors (categories) that explain the practice of software testing automation, and finding the relationships between these categories.
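To make the coding vocabulary concrete for software engineering readers, the following hypothetical Python sketch shows only the shape of the coding artifacts (categories, dimensions, coded quotations, and relationships between categories). The actual coding was done manually with ATLAS.ti, and all names in the sketch are invented for illustration.

    # Hypothetical sketch of grounded theory coding artifacts; not a tool used in the study.
    from dataclasses import dataclass, field

    @dataclass
    class Category:
        name: str        # e.g. "Type of Testing Automation"
        dimension: str   # e.g. "automated - manual"
        quotations: list = field(default_factory=list)  # coded interview excerpts

    # Open coding: attach interview excerpts to (seed) categories.
    automation = Category("Type of Testing Automation", "automated - manual")
    automation.quotations.append("We do not have testing automation.")

    # Axial coding: record a relationship between two categories.
    reuse = Category("Reuse", "wide - small")
    relationships = [(reuse.name, "facilitates", automation.name)]

    print(relationships)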

3. Analysis results

As suggested by Eisenhardt [15], we iteratively collected evidence for each category, explored the logic across the case OUs, and searched the data for evidence of each identified relationship between categories. The categories were selected by going through the categories formed in open coding that concerned software testing automation. After that, the categories were further developed by merging categories that seemed to be closely connected, dividing categories that seemed to contain separate phenomena, and defining dimensions for the categories. The purpose was first to get a comprehensive picture of the current use of testing automation in the case OUs. After that, other categories were compared with the testing automation categories to find connections between them. This process was repeated until the categories that seemed to have the most influence on testing automation in the case OUs were found.

The final categories were the following (see also Table 2): Type of Tested Products describes what kind of products are developed and tested in the OUs. Sommerville [28] divides software products into two broad classes, generic products and customized products. Generic products are produced by a development organization and sold on an open market to any customer who is able to buy them. Customized products are systems commissioned by a particular customer and developed specially for that customer by some contractor. Size and Complexity of Tested Products describes how large and complex the tested products are. Type of Testing Automation describes the level of automated testing currently utilized in the OUs. Advantages and Disadvantages of Testing Automation describes the perceived benefits and problems of testing automation in the OUs. Development of Testing Automation describes how testing automation is being developed in the OU. Reuse describes the level of reuse in development and testing. Testing Knowledge Management Strategy describes the knowledge management strategy applied in testing; according to Nonaka [29], organizational knowledge is created through a continuous dialogue between tacit (personalized) and explicit (codified) knowledge. In the following, we describe our observations of these categories in each of the cases.

3.1 Description of cases

3.1.1. Case A: A MES producer and integrator. Case A developed and tested manufacturing execution systems (MES). Its services included systems integration and customizing of systems. Customized systems were based on uniform product kernels. Case A used internal product development tools in managing the product kernel and its variations. One of the major problems in systems testing was that the MES systems can only be fully tested at the customer's site. This is because the customized systems are complex and have interfaces to customer-specific external systems, which cannot be fully simulated in the development environment. Changes in technology also caused additional problems because the OU had to take into account the possible version variations of, for example, operating systems and other infrastructure that different customers had. Testing automation was utilized in the development of product development tools, but not in the product kernel or in the final customized system. There were plans to develop testing automation further so that there would be automation in testing the product kernel. We observed that the lack of resources prevented the development of testing automation. According to the observations, there might be potential for testing automation in integration testing, where test cases were repeatable, but this had not yet been implemented.

Table 2. Categories for the case OUs

Type of Tested Products. Dimension: generic – customized. Description: the type of software tested in the OU.
Size and Complexity of Tested Products. Dimension: large – small, complex – simple. Description: size and complexity of the tested software.
Type of Testing Automation. Dimension: automated – manual. Description: level of testing automation in the OU.
Advantages and Disadvantages of Testing Automation. Dimension: advantage – disadvantage. Description: issues that facilitate or hinder testing automation.
Development of Testing Automation. Dimension: list of development subjects. Description: development subjects of testing automation.
Reuse. Dimension: wide – small. Description: reuse in software development and testing.
Testing Knowledge Management Strategy. Dimension: codification – personalization. Description: type of knowledge needed in testing.

System testing of MES systems was mainly manual and it was indicated that testers therefore needed wide domain knowledge.

3.1.2. Case B: Software producer and testing service provider. Case B developed and tested its own software products and offered testing services to an external customer. The tested products were standardized and generic products. Although Case B had not made investments in testing automation, their testing services included working with the customer's automated testing systems. The testing services consisted of scripting test cases for the customer's testing systems, supervising the execution of the systems testing, and reporting the test results. There was a project underway in the OU to introduce systematic unit testing to the development of their own software products. The major motivation for developing unit testing automation was that it makes testing faster. According to the interviewed developer, in automated unit testing there has to be some human reasoning in the selection of test cases to achieve good test coverage. The testing manager felt that testing automation made testing faster, but also noted that there always has to be some human involvement in testing. For example, they had problems with the customer's automated testing systems that were not working properly. Testing was mainly based on codification because Case B had to follow their customer's knowledge management strategy, which emphasized the transfer of knowledge (such as test cases and results) in documents.

3.1.3. Case C: Process automation and information management provider. Case C developed customized process automation and information management systems. The systems were large and complex, and also depended on other suppliers' systems. Especially

interoperability testing was found difficult because often it could only be done at the customer's site. Managing several versions of the product for different platforms and operating systems was also considered challenging. Testing automation did not seem to be widely utilized, especially in the systems testing phase. There had been attempts to develop testing automation, but the results were not satisfactory. One issue that may be hindering the development of testing automation is that the employees may have a negative attitude towards it or resist change. It seemed that manual testing was the primary strategy in systems testing because the systems testing of process automation and information management required the testers to have a significant amount of domain knowledge when they put themselves in the end-user's position. Automating user interface testing, in particular, was considered difficult and was rather left to human testers. In addition, the lack of resources was viewed as a problematic issue because implementing and maintaining a testing automation system requires a considerable amount of time, money, and human resources. In the testing of customized systems, domain knowledge was seen as essential because the testing was mainly systems testing.

3.1.4. Case D: Electronics manufacturer. Case D developed and tested generic, independent, and highly standardized products. Their end-products had to be of high quality because recalls in mass production are very expensive. Testing automation was widely utilized, constantly developed, and, according to the observations, it improved the quality of testing. According to the observations, using testing automation does not necessarily shorten the product development project, but it allows more tests to be run than manual testing would. This improves quality through better test coverage.

However, Case D had problems in the development of testing automation. One of the problems was that they had no specifications against which they could test. Therefore, they had to find people who knew the domain to make test specifications. It was also noted that testing automation required significant investments. The greatest current problem in testing automation was that the testing automation systems were sometimes faulty themselves and needed maintenance. Case D used a quality system which was applied throughout the organization. Reuse had been taken into account when developing the product development and testing infrastructure. Because there was less need for manual testing in Case D, there was also less need for testers' domain knowledge. Knowledge of testing tools was viewed as more important than knowledge of the domain.

3.1.5. Case E: Testing service provider. Case E provided testing services to other organizations. The tested products were mainly generic and standardized. Testing customized systems was seen as problematic because it was difficult to estimate the required testing effort. Testing automation was not utilized in Case E because they had so many different customers with different types of products, and it is not profitable to build testing automation systems that can be used only once. In other words, there was little chance of reuse between the different customer projects, while the costs of setting up a testing automation infrastructure are significant. In addition to test execution, testing tasks in Case E consisted of test planning, documentation, and defining the testable components and systems. Knowledge of the customers' software development processes was viewed as more important than domain knowledge.

3.2 Summary of the observations

There was a clear difference between the cases in their approaches to software testing and testing automation. We found that Case D was the only OU where testing automation was widely utilized, developed, and taken into account throughout the software development process. Case B's situation was twofold: although the OU had not itself invested in testing automation, it worked with its customer's automated testing systems. Automated unit testing had

also been introduced to product development in Case B. Cases A and C were similar in that they had no systematic approach to testing automation. Large systems, complexity, and the need for human evaluation and domain knowledge (especially in systems testing) seemed to be the most decisive factors hindering testing automation investments. It was also observed in Case C that there may be some resistance among the testers to accepting testing automation. With the testing service provider E, no significant utilization of testing automation was observed. There was no reuse value for software testing automation because the projects were short and different customers had different types of products. The investment in a testing automation infrastructure would be too expensive for basically one-time use.

The observations were shaped following Eisenhardt [15]. According to her, the central idea is that researchers constantly compare theory and data, iterating toward a theory that closely fits the data. We formulated the following observations on the basis of the interview data and the grounded theory analysis of categories, dimensions, and relationships between the categories.

3.2.1. Observation 1: Automated software testing may reduce costs and improve quality because of more testing in less time, but it causes new costs in, for example, implementation, maintenance, and training. According to the data, the major perceived benefits of testing automation are quality improvement through better test coverage and the ability to do more testing in less time. The major disadvantages were costs. The implementation cost items mentioned in the interviews consisted of direct investment costs, extra implementation time, and the need for extra human resources. Implementation of the testing automation infrastructure is costly and takes time. Testing automation also requires constant maintenance. Motivating the employees to adopt new testing practices seems to be important. Adopting testing automation means that especially the testers' way of working changes, and that they probably need training in automated testing systems. Unreliability was also mentioned as a problem: automated testing systems consist of hardware and software and suffer from the same issues as any other systems. "There is often less time for testing and we should test more, and using automation it can be made possible." (Case B, developer)

3.2.2. Observation 2: Generic and independent products facilitate, and customized and complex products hinder, testing automation. If the tested products are generic and independent of third-party systems, test cases, and therefore test automation needs, are easier to specify. Customized systems have more variability and interfaces to third-party systems, and it is therefore more difficult to write test specifications because the system environment and functionalities vary between customers. "We have been trying to include automation in our operations for years. There have been projects to automate some testing phases, but they have not been successful. It's too difficult. And we have also found out that it doesn't suit our type of products." (Case C, tester)

3.2.3. Observation 3: Products based on a uniform product kernel or variants of a product family facilitate, and dissimilar products hinder, testing automation. If the testing organization (such as Case E) tests several different types of products, there is little chance of finding similarities between them and automating testing tasks. "We do not have testing automation." (Case E, tester & developer) On the other hand, products based on a uniform product kernel or variants of a product family support the use of testing automation, as in Case D (a small illustrative sketch of such variant-spanning tests follows Observation 5 below).

3.2.4. Observation 4: Low human involvement facilitates, and high human involvement hinders, testing automation. The need for human involvement was emphasized in the data. Automated testing systems have to be supervised in case of faults. Some testing tasks are difficult to automate, especially if they require extensive domain knowledge. The need for domain knowledge was especially observed in the systems testing phase of customized systems. "It (domain knowledge) is essential because the tester has to be in the end-user's position." (Case A, testing manager)

3.2.5. Observation 5: A standardized technological infrastructure facilitates, and rapid changes in the underlying technology hinder, testing automation. If there are constant technological changes either in the development or product infrastructure, testing automation systems may soon become obsolete. Keeping the testing automation systems up to date will therefore require significant maintenance costs. "We have to test the testing systems constantly [because of changes]." (Case B, testing manager)
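The following hypothetical Python sketch illustrates Observation 3: when products are variants built on a uniform kernel, one automated test can be written once and reused across all variants. Every name in the sketch (KERNEL_VARIANTS, start_kernel, max_connections) is invented for illustration and does not come from the case OUs.

    # Hypothetical sketch: one automated smoke test reused across the
    # variants of a product family built on a uniform kernel.
    import unittest

    KERNEL_VARIANTS = {
        "basic": {"max_connections": 10},
        "standard": {"max_connections": 100},
        "enterprise": {"max_connections": 1000},
    }

    def start_kernel(variant: str) -> dict:
        """Toy stand-in for starting one configured variant of the product kernel."""
        config = dict(KERNEL_VARIANTS[variant])
        config["started"] = True
        return config

    class ProductFamilySmokeTest(unittest.TestCase):
        def test_kernel_starts_in_every_variant(self):
            # The same test body is executed once per variant.
            for variant in KERNEL_VARIANTS:
                with self.subTest(variant=variant):
                    state = start_kernel(variant)
                    self.assertTrue(state["started"])
                    self.assertGreater(state["max_connections"], 0)

    if __name__ == "__main__":
        unittest.main()

For dissimilar one-off products, as in Case E, no such shared test skeleton exists, so the set-up cost of automation cannot be spread over several products.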

3.2.6. Observation 6: High reusability facilitates, and low reusability hinders, testing automation. In relation to Observation 1, reuse seems to bring long-term benefits by spreading set-up costs over a longer period of time. When investing in testing automation, there should be possibilities for reusing the automated system over a long period of time to get returns on the investment. "It is always expensive to set up an automated system. The price may be tenfold compared to one test. --- But later, if there is more repetition, the unit cost per test decreases quite significantly at some point." (Case E, testing manager)
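This trade-off can be made concrete with a simple, illustrative break-even calculation; the notation is ours and is not taken from the interviews or from the cost models in [4]. Let A_setup be the one-time cost of automating a test, A_run the cost of one automated execution (including a maintenance share), M_run the cost of one manual execution, and n the number of executions. Automation pays off roughly when

    A_{setup} + n \cdot A_{run} < n \cdot M_{run}, \quad \text{i.e.} \quad n > \frac{A_{setup}}{M_{run} - A_{run}} \quad (M_{run} > A_{run}).

Under the testing manager's "tenfold" figure, if A_setup is about ten times the cost of one manual test run and an automated run costs close to nothing, the investment would be recovered after roughly ten executions; with fewer repetitions, as in Case E's one-off projects, the break-even point is never reached.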

4. Discussion and conclusions

The objective of this case study was to observe and identify factors that affect the state of testing automation in different types of organizations. Our cases included three different types of organizations: product-oriented software development, customized systems development, and testing service providers. We interviewed employees from different organizational positions in each of the cases. The interview data was analyzed using the grounded theory method. The major perceived benefits of testing automation include quality improvement through better test coverage, and that more testing can be done in less time. However, we observed that automation alone is not enough to achieve better test coverage; human involvement is needed in the selection of test cases. We found that the main disadvantage of testing automation was costs, which include implementation costs, maintenance costs, and training costs. Implementation costs included direct investment costs, time, and human resources. Fewster [7] notes that if the maintenance of testing automation is ignored, updating an entire automated test suite can cost as much as, or even more than, performing all the tests manually. Also, according to Fewster [7], there is a connection between implementation costs and maintenance costs: if the testing automation system is designed with the minimization of maintenance costs in mind, the implementation costs increase, and vice versa. Training costs arise because employees have to change their working practices. Especially testers' daily tasks may be completely transformed from manual testing to maintaining automated testing systems. This requires major training of the testing staff.

The unreliability of testing automation was also mentioned as a problem: automated testing systems suffer from the same problems as any other software. This means that someone always has to supervise automated testing. Bach [6] has also noticed that "all automated test suites require human intervention, if only to diagnose the results and fix broken tests". One of the hindering factors of testing automation that we found was that if there is a great need for domain knowledge in testing, automating testing tasks becomes difficult. Domain knowledge is often tacit and embedded in employees, and transferring tacit knowledge is difficult. According to Bertolino [5], one of the dreams of software testing is 100% automatic testing. However, according to our observations, as well as, for example, Bach's [6], it cannot be achieved. The type of the tested product influences the use of testing automation. If the tested products are generic and independent, automated tests are easier to specify. If the tested products are customized and complex, specifying automated tests becomes more difficult. The reusability of the testing automation system is essential in making the investment worthwhile. Jones [30] argues that reuse can be successful when the reusable materials are of a high quality, i.e. "certified to levels of quality that approach or achieve zero defects", and when artefacts are constructed so that subsequent reuse is straightforward and efficient. According to Jones [30], another barrier to reuse is finding the time and funds to construct reusable materials under the schedule and cost pressure that most software projects face. According to our observations, the testing schedules were tight, and when the customer defined the schedule and budget, there were no extra resources left for developing software testing automation. Constant technological changes in the software development or product infrastructures may prevent the use of testing automation and make testing automation systems built for the old infrastructure obsolete. This seems to be a profound problem. For example, Sommerville [28] has noted that changes in hardware over the past twenty years have been remarkable, and that changes in software seem to be equally significant. To guarantee the validity of the study, we used probability sampling when selecting the sample for the first interview round and theoretical sampling when selecting the case OUs. Robson [31] lists three threats to validity in this kind of research: reactivity (the interference of the researcher's presence), researcher bias, and respondent bias. To avoid reactivity, the interpretation of the data was confirmed by presenting the results to company participants in the research

project. To avoid bias, the interviews were conducted by two researchers and the data was analyzed by four researchers (observer triangulation) [32]. The results of this study can only be directly generalized when discussing comparable OUs. In fact, this is a limitation of any survey, regardless of the sample size (see, for example, Lee & Baskerville [33]). An obvious limitation of the study is the small number of case OUs. Increasing the number of cases to cover a wider variety of different types of organizations could reveal more details, but we believe that this study has already revealed important empirical observations on software testing automation.

5. References

[1] E. Dustin, J. Rashka, and J. Paul, Automated Software Testing: Introduction, Management, and Performance. Boston: Addison-Wesley, 1999.
[2] E. Kit, Software Testing in the Real World: Improving the Process. Reading, MA: Addison-Wesley, 1995.
[3] C. Persson and N. Yilmaztürk, "Establishment of Automated Regression Testing at ABB: Industrial Experience Report on 'Avoiding the Pitfalls'," in 19th International Conference on Automated Software Engineering (ASE'04): IEEE Computer Society, 2004.
[4] R. Ramler and K. Wolfmaier, "Economic Perspectives in Test Automation: Balancing Automated and Manual Testing with Opportunity Cost," in AST'06, Shanghai, China: ACM, 2006.
[5] A. Bertolino, "Software Testing Research: Achievements, Challenges, Dreams," in Future of Software Engineering: IEEE Computer Society, 2007, pp. 85-103.
[6] J. Bach, "Test Automation Snake Oil," 1999.
[7] M. Fewster, "Common Mistakes in Test Automation," Grove Consultants, 2001.
[8] T. Parveen, S. Tilley, and G. Gonzalez, "A Case Study in Test Management," in ACM Southeast Regional Conference, 2007, pp. 82-87.
[9] L. C. Briand, "A Critical Analysis of Empirical Research in Software Testing," in First International Symposium on Empirical Software Engineering and Measurement, Madrid, Spain: IEEE Computer Society, 2007.
[10] B. Glaser and A. L. Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine, 1967.

[11] A. Strauss and J. Corbin, Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park, CA: SAGE Publications, 1990.
[12] C. B. Seaman, "Qualitative Methods in Empirical Studies of Software Engineering," IEEE Transactions on Software Engineering, vol. 25, pp. 557-572, 1999.
[13] M. Shaw, "Writing Good Software Engineering Research Papers: Minitutorial," in Proceedings of the 25th International Conference on Software Engineering (ICSE'03), Portland, Oregon, 2003, pp. 726-736.
[14] D. I. K. Sjöberg, T. Dybå, and M. Jorgensen, "The Future of Empirical Methods in Software Engineering Research," in FOSE'07: 2007 Future of Software Engineering, 2007, pp. 358-378.
[15] K. M. Eisenhardt, "Building Theories from Case Study Research," Academy of Management Review, vol. 14, pp. 532-550, 1989.
[16] G. Paré and J. J. Elam, "Using Case Study Research to Build Theories of IT Implementation," in the IFIP TC8 WG International Conference on Information Systems and Qualitative Research, Philadelphia, USA, 1997.
[17] H. K. Klein and M. D. Myers, "A Set of Principles for Conducting and Evaluating Interpretive Field Studies in Information Systems," MIS Quarterly, vol. 23, pp. 67-94, 1999.
[18] C. R. Carter and M. Dresner, "Purchasing's Role in Environmental Management: Cross-Functional Development of Grounded Theory," Journal of Supply Chain Management, vol. 37, pp. 12-28, 2001.
[19] K. Smolander, M. Rossi, and S. Purao, "Going beyond the Blueprint: Unraveling the Complex Reality of Software Architectures," in the 13th European Conference on Information Systems: Information Systems in a Rapidly Changing Economy, Regensburg, Germany, 2005.
[20] ISO/IEC, ISO/IEC 15504-1, Information Technology - Process Assessment - Part 1: Concepts and Vocabulary, 2002.
[21] O. Taipale, K. Smolander, and H. Kälviäinen, "A Survey on Software Testing," in 6th International SPICE Conference on Software Process Improvement and Capability dEtermination (SPICE 2006), Luxembourg, 2006.
[22] EU, "SME Definition," European Commission, 2003.
[23] O. Taipale, K. Smolander, and H. Kälviäinen, "Cost Reduction and Quality Improvement in Software Testing," in Software Quality Management Conference, Southampton, UK, 2006.
[24] O. Taipale, K. Smolander, and H. Kälviäinen, "Factors Affecting Software Testing Time Schedule," in the Australian Software Engineering Conference, Sydney, 2006.
[25] "ATLAS.ti - The Knowledge Workbench," Scientific Software Development, 2005.
[26] M. B. Miles and A. M. Huberman, Qualitative Data Analysis. Thousand Oaks, CA: SAGE Publications, 1994.
[27] O. Taipale, K. Smolander, and H. Kälviäinen, "Finding and Ranking Research Directions for Software Testing," in the European Software Process Improvement and Innovation Conference 2005, Budapest, Hungary, 2005.
[28] I. Sommerville, Software Engineering. Essex, England: Addison-Wesley, 1995.
[29] I. Nonaka, "A Dynamic Theory of Organizational Knowledge Creation," Organization Science, vol. 5, pp. 14-37, 1994.
[30] C. Jones, "Economics of Software Reuse," IEEE Computer, vol. 27, 1994, pp. 106-107.
[31] C. Robson, Real World Research, Second Edition. Blackwell Publishing, 2002.
[32] N. K. Denzin, The Research Act: A Theoretical Introduction to Sociological Methods. McGraw-Hill, 1978.
[33] A. S. Lee and R. L. Baskerville, "Generalizing Generalizability in Information Systems Research," Information Systems Research, vol. 14, pp. 221-243, 2003.