2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
Combinatorial Testing Tool Learnability in an Industrial Environment

Peter M. Kruse
Berner & Mattner Systemtechnik GmbH, Gutenbergstr. 15, Berlin, Germany
[email protected]

Nelly Condori-Fernández(1,2), Tanja Vos
(1) ProS Research Center, Universitat Politècnica de València, Valencia, Spain
{nelly, tvos}@pros.upv.es
(2) Information System Group, University of Twente, Enschede, The Netherlands
[email protected]

Alessandra Bagnato, Etienne Brosse
SOFTEAM R&D Department, 8 Parc Ariane, Guyancourt, France
{alessandra.bagnato, etienne.brosse}@softeam.fr

Abstract — [Context] Numerous combinatorial testing techniques are available for generating test cases. However, many of them are never used in practice. [Objective] Considering that learnability plays a vital role in the initial adoption or rejection of a technology, in this paper we aim to investigate the learnability of a combinatorial testing tool in an industrial environment. [Method] A case study was designed and conducted, including i) the definition of learnability measures for test case models built using a combinatorial testing tool, ii) the implementation of a training program, and iii) a qualitative and quantitative evaluation based on a three-level strategy (Reaction, Learning, and Performance). [Results] At the first level, the tool was perceived by the trainees as easy to learn (on a five-point ordinal scale). At the second level, however, this impression changed slightly during hands-on learning: according to the working diaries, there were major difficulties. At the third level, analyzing the learning curve of each trainee, we observed that the semantic errors made by each subject decreased slightly over time.

Keywords — combinatorial testing, learnability assessment, technology acceptance, classification tree method, industrial case study.

I. INTRODUCTION

Combinatorial testing can be a very efficient and effective strategy for testing software systems [11]. Given a System under Test (SUT) with k parameters, t-way combinatorial testing requires that all combinations of values of t (out of k) parameters be covered at least once, where t is usually a small integer. Consequently, if the test parameters are modeled properly, all faults caused by interactions involving no more than t parameters will be detected.
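To make the t-way criterion concrete, the following sketch (our own illustration, not taken from any cited tool; the parameter names are invented) checks whether a test suite achieves 2-way (pairwise) coverage:

    from itertools import combinations, product

    # Hypothetical SUT parameters and their possible values.
    parameters = {
        "browser": ["firefox", "chrome"],
        "os": ["linux", "windows", "macos"],
        "user_role": ["admin", "customer"],
    }

    def uncovered_pairs(test_suite, parameters):
        # Return every 2-way value combination not exercised by the suite.
        # Each test case is a dict assigning one value to every parameter.
        missing = []
        for (p1, values1), (p2, values2) in combinations(parameters.items(), 2):
            for v1, v2 in product(values1, values2):
                if not any(t[p1] == v1 and t[p2] == v2 for t in test_suite):
                    missing.append(((p1, v1), (p2, v2)))
        return missing

    suite = [
        {"browser": "firefox", "os": "linux", "user_role": "admin"},
        {"browser": "chrome", "os": "windows", "user_role": "customer"},
        {"browser": "firefox", "os": "macos", "user_role": "customer"},
    ]
    print(len(uncovered_pairs(suite, parameters)), "pairs still uncovered")

For these three parameters, exhaustive testing needs 2 × 3 × 2 = 12 test cases, while a pairwise-adequate suite needs only 6; this gap widens rapidly with the number of parameters, which is what makes the technique economical.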
In a survey by Nie and Leung on the state of research in combinatorial testing [12], the following topics are listed as less investigated: i) modeling, i.e., how to identify the appropriate parameters, values, and interrelations of parameters of the SUT; ii) constraints to avoid invalid test cases during test suite generation; iii) failure characterization and diagnosis; iv) prioritization of test cases to detect faults as early as possible in the most economical way; and v) studies on the degree to which combinatorial testing contributes to the improvement of software quality. Moreover, most of the published studies (50 out of the 93 articles reviewed) are motivated by generating small, not industrial-sized, test suites.

Compared to the vast amount of work reported on the theoretical side, there is a lack of empirical studies and experience reports on applying testing techniques to real systems in an industrial context with real subjects [16]. The current paper concentrates on this and presents a case study whose objective is to investigate the learnability of a combinatorial testing tool (CTE XL Professional [15]) in the industrial context of a company. The company, SOFTEAM (http://www.softeam.fr/), is a private software vendor and engineering company with about 700 employees located in Paris, France. The objective of the study presented in this paper is twofold. First, we want to gain insights into the learnability of combinatorial testing in industrial practice. Second, we want to identify potential improvements of the tools and learning material for combinatorial testing when applied in a real context with industrial subjects.

The remainder of this paper is structured as follows: Section II presents the motivation of our work. Section III provides an overview of the learnability model and training program used in our study. Section IV describes the planning and design of our case study. Section V discusses the evaluation results, and in Section VI we conclude and suggest further research work.

II. MOTIVATION

The motivation of the presented study is twofold and is derived from two different perspectives.

A. From the client perspective (SOFTEAM)

One of the priorities of SOFTEAM is to maximize the user-interaction coverage of its test suites at minimum cost, specifically for the test suites created for the Modelio SaaS system. To achieve this, SOFTEAM is eager to investigate the benefits and drawbacks of using the combinatorial testing tool Classification Tree Editor (CTE XL Professional) [15]. This motivation is mainly due to the fact that the current testing process has several limitations: i) there is no guarantee of, or even information about, the code coverage of the test suite; ii) test case design and execution are performed manually; iii) test cases are hardly traceable to the requirements specifications; and iv) resources for manual inspection of test cases are limited.
Modelio SaaS is a web administration console written in PHP, which allows an administrator to connect to his account for managing modeling projects created with the Modelio UML modeling tool [1]. Modelio SaaS provides services to two kinds of users: server administrators and customers. Learning to use and integrate a combinatorial testing tool into SOFTEAM's current testing processes could allow testers to cover all user/action combinations and to avoid costly redundancy within the test suite. SOFTEAM also wants to be able to prioritize the test cases within the test suite in order to optimize test efforts, i.e., to know which test cases are to be executed when there is not enough time to execute all of them. The downside of this optimization potential is the extra effort of applying a new test approach in the existing development process. To decide whether this extra effort is worth spending, a case study was planned and designed. The results support the decision making on whether or not to adopt this combinatorial testing tool at SOFTEAM.

B. From the tool owner's perspective (Berner & Mattner)

The development of high-quality software tools is a challenging task. Typical quality assurance focuses on (functional) software testing; usability and learnability are often less prominently evaluated during software development. With this case study, as part of an on-going optimization of its products, Berner & Mattner (http://www.berner-mattner.com) (B&M) wants to assess the quality of training, courses, and learning materials.

III. LEARNABILITY AND TRAINING PROGRAM

According to Grossman et al. [5], learnability can be understood and evaluated in two different ways:

• Initial learning allows users to reach a reasonable level of usage proficiency within a short time, but it does not account for the learning that occurs after such a level has been reached.

• Extended learning, in contrast to initial learning, considers a larger scope and a longer term of learning. It applies to the nature of performance change over time.

In the presented study, we are interested in assessing extended learnability. For this purpose, a training program was designed to develop an individual level of knowledge of combinatorial testing and of skills in using the combinatorial testing tool. This training program consisted of four components: planning, method, logistics, and evaluation.

Planning. Two staff members of SOFTEAM were involved in the planning and implementation of the training program. Their competence level in testing was determined by answering some specific questions. This information helped structure an initial training that started with an introductory course on combinatorial testing, which took 4 hours. Subsequently, the tool was installed and set up during a hands-on session. The SOFTEAM employees were then able to model and generate test cases with continuous online feedback from the instructor. Teachers and researchers of the Polytechnic University of Valencia (UPVLC) coordinated this process, and the training was performed by specialized personnel of B&M.

Method. As the training was planned to teach a combinatorial method, it was conducted using the respective tool from the beginning of the program. During the introductory course, audio-visual presentations (i.e., tool demos and slides) were used. The hands-on learning activities were supported by the individual problem-solving method.

Logistics. The important issues considered were location and materials. The hands-on learning activities were carried out at the SOFTEAM premises. Before the actual hands-on part, an introductory course was given in-house at SOFTEAM. The training materials (e.g., slides and example files) were prepared by the trainer.

Evaluation. In order to determine the effectiveness of the training program, feedback from the SOFTEAM staff members on the training program as a whole was gathered in different ways. A level-based strategy for evaluating learning processes, adapted from [7], was applied. We briefly explain each level next.

• Level 1 - Reaction. Assessment at this level measures how the learners perceive and react to the learning and performance process. This level is often measured with attitude questionnaires handed out after most training classes.

• Level 2 - Learning. This is the extent to which learners improve knowledge, increase skill, and change attitudes as a result of participating in a learning process.

• Level 3 - Performance. This evaluation involves testing the learners' capabilities to perform the learned skills on the job. These evaluations can be performed formally (testing) or informally (observation).

• Level 4 - Impact. This level measures the effectiveness of the initiative. Although it is normally more difficult and time-consuming to perform than the other three levels, it provides information of increasingly significant value, as it proves the worth of a learning and performance process.

In this paper we focus our analysis on the first three levels; the evaluation at the last level is still an on-going process. As shown in Figure 1, for each level (reaction, learning, and performance), different instruments were designed for obtaining evidence about the training effectiveness and the learning process. Sections IV.D and IV.E provide more detail about the instrumentation.
Figure 1. Scope of the learnability process evaluation

IV. RESEARCH METHOD

Our research was conducted as an industrial case study at SOFTEAM, planned for a period of six months (January 2013 to June 2013). The main goal of this study is to identify and analyze the immediate benefits and drawbacks of using CTE XL Professional within the SOFTEAM testing process. However, given SOFTEAM's limited experience with combinatorial testing techniques, our study started by investigating the learnability of CTE XL Professional from the viewpoint of a novice tester. The following research questions were formulated:

RQ1. How learnable is the CTE XL Professional when it is used by testing practitioners of SOFTEAM?

RQ2. What potential CTE XL Professional features could be improved to increase its learnability?

A. The empirical context

This section describes the empirical context of the study on learnability: the subjects, the objects, and the interaction between them in an industrial environment.

The subjects are two computer scientists from SOFTEAM who are novice testers. Trainee 1 is a senior analyst (5 years of experience) and trainee 2 is a software developer (10 years of experience). Both have less than one year of experience in software testing; both had previously modeled test cases using the OMG UML Testing Profile (UTP) [2] and the Modelio implementation of the UML Testing Profile [3].

The object (System under Test) selected by the SOFTEAM partner for this study is the Modelio SaaS system, a prototype system developed at SOFTEAM. Modelio SaaS is a web application written in PHP that allows for the easy and transparent configuration of distributed environments. It runs in virtualized environments on different cloud platforms, presents a high number of configurations, and hence poses various challenges to testing [10].

The part we focus on is the web administration console, which allows administrators to manage projects created with the Modelio modeling tool [1] and to specify the users allowed to work on projects. The source code is composed of 50 PHP files with a total of 2141 lines of executable code.

B. The treatment

As indicated before, the combinatorial testing tool used in this study is the CTE XL Professional [14]. It implements and supports the test design technique called the Classification Tree Method (CTM) [13]. Applying the classification tree method involves two steps: (1) designing the classification tree and (2) generating test cases.

Designing the classification tree. In the first step, all aspects of interest and their disjoint values are identified. Aspects of interest, also known as parameters, are called classifications; their corresponding parameter values are called classes. Classes can again be classified in different ways, in so-called refinements [13]. A good rule of thumb is to avoid classifications with more than 10 classes. Compositions are used to structure and group classifications; a good rule of thumb is not to have more than 20 classifications per composition. This creates a tree made of classes and classifications (and compositions), the classification tree (CT).

Generating test cases. Having composed the classification tree, the CTE generally allows two ways of generating test cases. The first is based on combinatorial testing techniques: test cases are defined by combining classes of different classifications. For each classification, a representative class is selected. Since classifications contain only disjoint values, a test case cannot contain several values of one classification. The most common coverage criteria are 2-way and 3-way testing, which are fulfilled if every possible pair/triplet of values is covered by at least one test case in the resulting test set. Since large trees can generate many test cases due to combinatorial explosion, CTE XL Professional offers prioritization of the classes in a tree, so that the generated test cases carry a corresponding prioritization [15]. There are two editions of the tool, CTE XL Professional (commercial software) and CTE XL (freeware); for this study the commercial variant was used.
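To illustrate the two CTM steps and the role of dependency rules, here is a small sketch of our own (plain Python, not the CTE XL implementation; the classifications and the rule are invented, loosely inspired by the Modelio SaaS example):

    from itertools import product

    # Step 1: design the classification tree. Classifications (parameters)
    # have disjoint classes (values); compositions that merely group
    # classifications are left out of this flat sketch.
    tree = {
        "user_role": ["server_administrator", "customer"],
        "action": ["create_project", "delete_project", "browse_project"],
        "project_state": ["absent", "existing"],
    }

    # Dependency rule: deleting or browsing a project requires that it exists.
    def is_valid(test_case):
        if test_case["action"] in ("delete_project", "browse_project"):
            return test_case["project_state"] == "existing"
        return True

    # Step 2: generate test cases by combining one class per classification,
    # discarding combinations that violate the dependency rule.
    names = list(tree)
    all_cases = [dict(zip(names, combo)) for combo in product(*tree.values())]
    valid_cases = [case for case in all_cases if is_valid(case)]
    print(len(all_cases), "combinations,", len(valid_cases), "valid test cases")

A 2-way criterion would then select a subset of the valid combinations such that every valid pair of classes still appears in at least one test case.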
C. The case study procedure

Figure 2 illustrates the training and the activities conducted with CTE XL Professional.

Figure 2. Activities conducted with CTE along the training process

The SOFTEAM trainees installed and set up CTE XL Professional using manuals and online assistance provided by the trainer from B&M. A one-day course on CTE XL Professional was organized at the SOFTEAM premises to assist the trainees and to give them basic information on combinatorial testing and an overview of the tool. After that, the trainees carried out hands-on learning sessions using CTE XL Professional, including automatic generation of abstract test cases and test case prioritization. The learned competency was evaluated by means of a final exam, which was conducted under test conditions. After both trainees successfully passed this exam, a last feedback session was carried out by the trainer. This second training part took approximately one month (from 4 January to 6 February 2013).

Next, as shown in Figure 2, both trainees started producing a consolidated classification tree of the Modelio SaaS case without any further support from the trainer, applying their acquired competences to generate the abstract test cases. Both the finalization of the consolidated classification trees and the generation of the abstract test cases were performed in various iterations, including manual inspection of the resulting test cases by the trainees.

Particular care was taken while inspecting the probabilities associated with the different classes of the classification trees, since these affect the order/prioritization of the test cases. Once the trainees were satisfied with the resulting abstract test cases, the next step, the actual implementation of concrete test cases (not covered in this paper), was carried out.

Regarding data collection along all these activities, questionnaires were administered on paper and collected individually, and the working diaries were self-reported by the trainees and analyzed by the second author. Similarly, the trainer attended all online training sessions and reviewed all test cases created by each subject, using the learnability measures discussed next.

D. The learnability measures

For the three learnability evaluation criteria, quantitative and qualitative measurements were carried out as follows.

Reaction criteria. These are operationalized by means of a post-questionnaire capturing first responses (impressions) on the introductory course.

Learning criteria. These are operationalized by means of a test consisting of four parts related to classification tree elements, test case elements, test generation, and dependency rules. In addition, self-reports in working diaries were collected to analyze the learning difficulties.

Performance criteria. These are operationalized by 1) a measure adapted from [4] related to actual on-the-job performance: the decrease in errors made while modeling the classification tree (CT) over a certain time interval; and 2) a performance exam on combinatorial testing.

In order to identify the CT-modeling errors, we focus on semantic aspects of CTs built with the CTE: duplicates in the tree structure (E1), non-separation of concerns (E2), occurrence of abstract values over concrete values (E3), non-definition of custom tags for metadata (E4), non-usage of boundary values (E5), existence of special values (E6), and non-equivalence partitioning (E7).
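Some of these error types are mechanically detectable. As a rough illustration (our own sketch, not a CTE XL Professional feature; the heuristics are deliberately simplistic), duplicates in the tree structure (E1) can be found by counting class names, and a classification consisting solely of enumerated concrete numbers hints at missing equivalence partitioning (E7):

    from collections import Counter

    # A classification tree flattened to classification -> list of classes.
    tree = {
        "server_count": ["1", "2", "3", "4", "5"],    # concrete values only
        "user_role": ["admin", "customer", "admin"],  # duplicated class name
    }

    def check_tree(tree):
        findings = []
        for classification, classes in tree.items():
            duplicates = [c for c, n in Counter(classes).items() if n > 1]
            if duplicates:
                findings.append((classification, "E1: duplicate classes", duplicates))
            if classes and all(c.isdigit() for c in classes):
                findings.append((classification,
                                 "E7 candidate: enumerated values instead of "
                                 "equivalence classes", classes))
        return findings

    for finding in check_tree(tree):
        print(finding)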
E. Data collection

Data collection methods included the administration of three questionnaires, a test-based examination (see Table 5 in the Appendix), working diaries, inspection of the different CT models, and interviews with SOFTEAM development and management staff.

Table 1 shows descriptive data of the CT models built by the trainees (S1 and S2) over the entire training. Each model is described in terms of size measures. Most of them (#compositions, #classifications, and #classes) were calculated automatically by the CTE XL Professional tool; the width of the tree was obtained manually. With this data, the performance criteria were measured (level 3).

Table 1. CT models built by the trainees (S1 & S2).

CT model | #Compositions S1/S2 | #Classifications S1/S2 | #Classes S1/S2 | Tree width S1/S2
M1       | 3/3                 | 18/18                  | 70/70          | 70/70
M2       | 6/1                 | 10/17                  | 29/71          | 29/56
M3       | 3/1                 | 18/3                   | 70/33          | 70/32
M4       | 3/-                 | 18/-                   | 70/-           | 70/-

Regarding the working diaries, the trainees reported all the activities carried out over the hands-on learning period, without a pre-established schedule. Most of the activities were performed individually, but some were also performed in pairs. Table 2 shows the collected data for these activities, which contributed to measuring the learning criteria (level 2).

Table 2. Self-reported activities during the hands-on learning process (time in minutes).

Activity                    | S1  | S2  | In pairs
CTE material analysis       | 370 | 360 | -
Creation of the CTE tree    | 180 | 160 | -
CTE model reproduction      | 55  | 35  | 120
Internal discussion meeting | -   | -   | 120
Skype meeting with trainer  | 120 | 80  | 90
Total time                  | 725 | 635 | 330

Table 3 shows the questionnaires administered to the trainees, used to measure the reaction criteria (level 1).

Table 3. Questionnaires implemented along the case study.

Questionnaire | Focus | Scale
Post-questionnaire A | Perceived learnability, content, time, and quality of the material of the CTE course. | 1 item on a 5-point ordinal scale
Pre-questionnaire B | Perceived usefulness of and satisfaction with the CTE tool; perceived effectiveness of the hands-on learning environment; perceived learnability (applied before the exam). | 9 items on a 5-point ordinal scale; 6 items on a 5-point Likert scale
Post-questionnaire C | Perceived usefulness of and satisfaction with the CTE tool; perceived effectiveness of the hands-on learning environment; perceived learnability (applied after the exam). | 8 items on a 5-point ordinal scale; 5 items on a 5-point Likert scale
V. DATA ANALYSIS AND INTERPRETATION

This section discusses the results regarding the initial research questions.

A. How learnable is the CTE XL Professional when it is used by testing practitioners of SOFTEAM?

Empirical data was collected along the entire training process in order to analyze learnability at the three identified levels.

Reaction (level 1) - Responses from three questionnaires were analyzed: first impressions of the course (post-questionnaire A), the hands-on learning environment (pre-questionnaire B), and one applied after the exam (post-questionnaire C). With respect to the course, both respondents were satisfied with its content and the time allocated for it. The practical examples using CTE XL Professional were perceived as very useful for understanding the combinatorial testing concepts. Besides, both respondents considered the dependency rules one of the most important and useful features of the tool.

With respect to the perceived effectiveness of the hands-on learning, we found that after one month of using the tool for generating abstract test cases for the Modelio SaaS system, their impressions of the training material had changed slightly: both respondents found that the tool manuals were actually not easy to understand and use. Comparing the responses given before and after the exam, we observe no change in the responses of either trainee with regard to perceived usefulness and learnability. The tool's usefulness was in general rated as good, and the tool's learnability was also perceived as easy. However, the satisfaction with the tool changed slightly for one of the trainees, from good (before the exam) to very good (after the exam).

Learning (level 2) - Analyzing the working diaries (Table 2), we observed that around 54% of the total time reported by the trainees (51% and 56%, respectively) was dedicated to analyzing the training material (course slides and the CTE manual). This was mainly due to difficulties in appropriately modeling a variable number of servers and users of the Modelio SaaS system, and in defining logical rules. As these difficulties could not be solved using only the manual, they were directly addressed by the instructor in the online feedback sessions. On the other hand, when the trainees were asked again (through an interview) about the three most important features of the tool, the dependency rules remained among the most important features for both respondents. The remaining features were related to coverage analysis, state machine modeling, test case prioritization, and test report generation. With respect to difficulties during this phase, one trainee found it difficult to link dependency rules to allowed arcs. Both respondents identified the modeling capability (needed for good model-based generation of test cases) as one of the main learning difficulties.

Both were also very satisfied with the online feedback sessions with the instructor: due to their interactive nature, all understanding problems with the manual were directly resolved. The reassurance of having performed all required steps in the correct way was much appreciated by both trainees.

Performance (level 3) - In order to analyze the actual performance level of the trainees, the different models produced were evaluated by counting semantic modeling errors. As can be seen in Figure 3, for subject 1 the total number of semantic errors per deliverable is almost constant.
Figure 3. Number of modeling errors of subject 1 (left) and subject 2 (right) per deliverable (CT models)

The most repeated error was related to non-equivalence partitioning (E7). In the second deliverable, the trainee made a new error related to duplicates in the tree structure (E1) and another related to the occurrence of abstract values over concrete values (E3). However, both errors were rectified in the following two deliverables.

For subject 2, the most repeated error (2 out of 5), made in the first deliverable, was related to the equivalence partitioning issue (E7); however, it was immediately corrected in the remaining models. Non-separation of concerns (E2) was, as for the first subject, the most common error made in all deliverables. This error may have been made because the modeling of the classification trees was based on the description of the SUT and not on a functional specification, e.g., in terms of requirements, which was not available to either subject.

As shown in Figure 2, an exam was conducted to determine how much was learned from the program (CTE course and hands-on learning). 22 out of 29 questions were successfully answered by both trainees (see Table 5 in the Appendix). Figure 4 shows the distribution of success percentage over the topics included in the exam.

Figure 4. Distribution of success percentage by topic (classification tree elements: 80%; test elements: 100%; abstract test generation: 62.5%; dependency rules: 50%)

The questions QA1, Q4, and Q6 of Part 1 (classification trees), Q1, Q3, and Q4 of Part 3 (abstract test generation), and Q10 of Part 4 (dependency rules) were answered unsuccessfully. The trainer gave final feedback on these questions.

Given that the trainer was satisfied with the exam results, the consolidation phase (an activity performed in pairs) was initiated. Analyzing the different models built (D1-D6), as shown in Table 4, we found that the most repeated errors were related to the definition of custom tags for metadata (E4) and to separation of concerns (E2).

Table 4. Distribution of semantic errors (in pairs).

Type of error | D1 | D2 | D3 | D4 | D5 | D6
E1            | 1  | 1  | 1  | 1  | 0  | 0
E2            | 1  | 1  | 1  | 1  | 1  | 1
E3            | 0  | 0  | 0  | 0  | 0  | 0
E4            | 1  | 1  | 1  | 1  | 1  | 1
E5            | 0  | 0  | 0  | 0  | 0  | 0
E6            | 0  | 0  | 0  | 0  | 0  | 0
E7            | 0  | 0  | 0  | 0  | 0  | 0

Although fewer errors of both types (E2 and E4) were detected this time, we observed that they still persisted. It is also important to remark that, when working in pairs, the most frequent error made by trainee 1, non-equivalence partitioning (E7), was not identified at this phase. The total number of semantic errors detected is again almost constant (see Figure 5).

Figure 5. Number of errors performed in pairs
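The "almost constant" totals can be read directly off Table 4; as a quick check (a sketch of ours, with the table hard-coded), summing the indicator values per deliverable gives the per-deliverable error counts:

    # Table 4 rows: error type -> occurrence (1) or absence (0) per D1..D6.
    table4 = {
        "E1": [1, 1, 1, 1, 0, 0],
        "E2": [1, 1, 1, 1, 1, 1],
        "E3": [0, 0, 0, 0, 0, 0],
        "E4": [1, 1, 1, 1, 1, 1],
        "E5": [0, 0, 0, 0, 0, 0],
        "E6": [0, 0, 0, 0, 0, 0],
        "E7": [0, 0, 0, 0, 0, 0],
    }

    totals = [sum(column) for column in zip(*table4.values())]
    print("errors per deliverable D1..D6:", totals)  # [3, 3, 3, 3, 2, 2]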
B. What potential CTE features could be improved to increase its learnability?

Reports on the number of errors found should be supplied within CTE XL Professional. This would help users to identify problems with their classification trees already in the early stages of learning the tool. More examples could also be provided, giving an introduction and a possible solution for many features.

With respect to the training course, the preciseness of the slides, e.g., the definition and use of technical terms, can be improved. Also, following the examination results, the training should focus more on the basic modeling of combinatorial problems and keep advanced features of the tool for a secondary training session.

C. Threats to validity

In [6], various types of threats that can affect the results of a case study are listed and explained. This section discusses the threats that were addressed.

Construct validity reflects to what extent our operational measures really represent what is investigated according to the research questions. In our case, although the learnability evaluation was based on a four-level strategy [7], some of the threats could not be fully mitigated, at least for the first two levels (reaction and learning), because most of the collected data was based on the trainees' responses (interviews, post-questionnaires, working diaries). However, in order to reduce possible misinterpretations of the formulated questions and the gathered answers, the data analyzed and interpreted by the second author was also validated by the respondents (trainees).

With respect to the performance level, in order to facilitate the correct identification of modeling errors, representative elements of the classification trees (to be inspected) were identified by the first two authors. However, the completeness of our list of semantic errors (seven in total) could have been improved by a revision by an external reviewer. Another threat is that all classification trees were inspected by the trainer (an expert on combinatorial testing). Although the identification of semantic errors could have been made more precise by including at least two reviewers, this was not possible due to resource limitations.
Internal validity is of concern when causal relations are examined. Although the learning (level 2) and performance (level 3) criteria are conceptually related [8], [9], this threat was not mitigated because the environmental variables of the hands-on learning process could not be monitored; only working diaries were self-reported by the trainees. Another threat relates to the requirements documentation and SUT description used by the trainees for building classification trees and generating abstract test cases. We used the existing documentation provided by SOFTEAM, without any improvement, because the company was mainly interested in comparing the quality of the test cases obtained from this case study with that of its own test cases obtained within its testing process.

External validity is concerned with the extent to which it is possible to generalize the findings, and to which the findings are of interest to people outside the investigated case. Statistical generalization is not possible from a single case study, and the obtained results about the learnability of CTE XL Professional need to be evaluated further in different contexts. However, these results could be interesting for other companies like SOFTEAM, whose staff is still very motivated to enhance its actual testing process. The system under test (SUT) was carefully selected by the trainees with the approval of the rest of the research team (UPVLC, B&M) and the management staff of SOFTEAM. The selected SUT is thus relevant not only from a technical perspective (to investigate a combinatorial testing problem), but also from an organizational perspective, which facilitated performing all the case study activities.

Reliability is concerned with the extent to which the data and the analysis are dependent on the specific researchers. Although a reliability analysis of the post-questionnaires was not conducted, all the formulated questions were reviewed for clarity by three volunteer colleagues from UPVLC. A detailed protocol was also developed, and all collected data was appropriately coded and reviewed by the case subjects.
VI. SUMMARY AND FUTURE WORK

In this paper we report the first results of a case study on the learnability of a combinatorial testing tool, CTE XL Professional [15], in an industrial environment. The case study was conducted with a twofold purpose. The first is a consumer perspective, in which the SOFTEAM company tries to improve its current testing process, which has several limitations, by adopting a combinatorial testing strategy within its own software development process. The second is the perspective of a tool owner, in which B&M is interested in optimizing the quality of its products by assessing the quality of training, courses, and learning materials, and the learnability of CTE XL Professional.

Regarding the two research questions defined in Section IV, we conclude:

RQ1. How learnable is the CTE when it is used by testing practitioners of SOFTEAM? As we are interested in investigating learning over a long term and a large scope, a level-based strategy was applied, using the three evaluation criteria proposed by [8]: reaction, learning, and performance.

At the first level, during the course, the CTE XL Professional was rated by the trainees as easy (on a five-point ordinal scale), and the tool's usefulness was in general rated as good. They were also satisfied with the content, the time allotted, and the quality of the material provided during the one-day course. However, when the trainees were involved in a hands-on learning environment, these first impressions changed slightly with respect to the quality of the material (the manual was actually not perceived as easy to understand and use).

At the second level (learning), according to the working diaries, their major difficulty was the appropriate modeling of a variable number of servers and users of the Modelio SaaS system. Thanks to the interactive approach (online feedback sessions), this difficulty and others, found during their attempts to solve problems, were directly addressed by the instructor.

But how easy was it to actually learn the combinatorial testing tool? As part of our analysis was based on self-reported working diaries, it was not possible to measure the time invested in all the learning activities. However, according to the trainees' responses, the CTE XL Professional is perceived as easy to learn, and they would adopt it in their further testing activities.

At the third level (performance), the outcomes (test case models and abstract test cases) produced by the trainees were evaluated in terms of the number of semantic errors made over time by each subject. Working in pairs, both subjects demonstrated an acceptable performance in modeling and generating abstract test cases with CTE XL Professional. The most repeated errors were related to the definition of custom tags for metadata (E4) and to separation of concerns (E2). An increase in the understanding of the tool and the underlying method can be recognized. Analyzing the exam results, both respondents (trainees) had certain difficulties in correctly answering some questions related to the application of dependency rules (one of two questions). However, they passed the exam with a 76% success rate (22 out of 29 questions). The questions that caused the most difficulty were related to dependency rules and abstract test generation.

RQ2. What potential CTE features could be improved to increase its learnability? Based on the observations during the entire training program and the collected data, a very useful feature would be to supply the learnability metrics within CTE XL Professional. This would help users to identify modeling problems in the early stages of learning the tool.

We are planning to conduct structured interviews, including not only the trainees but also the management staff, in order to evaluate the impact of the training results at SOFTEAM. Besides, in order to detect further benefits of the CTE, the test cases will be executed in order to measure their effectiveness (using coverage measures).

For future iterations of a case study like this, we would suggest supplying subjects with these metrics instantly during the creation of the classification trees, instead of in a post-evaluation as was done in this case study.

ACKNOWLEDGMENTS

This work is partly supported by EU grant ICT-257574 (FITTEST).
REFERENCES

[1] Modelio.org. [Online]. Available: http://www.modelio.org/
[2] UML Testing Profile (UTP) Web site. [Online]. Available: http://utp.omg.org/
[3] Modelio implementation of the UML Testing Profile. [Online]. Available: http://www.modeliosoft.com/en/modelio-store/modules/modeling-extensions/utp.html
[4] C. D. Michelsen, W. D. Dominick, and J. E. Urban. A methodology for the objective evaluation of the user/system interfaces of the MADAM system using software engineering principles. In Proc. ACM Southeast Regional Conference, pp. 103-109, 1980.
[5] T. Grossman, G. Fitzmaurice, and R. Attar. A survey of software learnability: metrics, methodologies and guidelines. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09), ACM, New York, NY, USA, pp. 649-658, 2009.
[6] P. Runeson and M. Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2):131-164, April 2009.
[7] D. L. Kirkpatrick and J. D. Kirkpatrick. Evaluating Training Programs. Berrett-Koehler Publishers, third edition, 2009.
[8] J. A. Colquitt, J. A. LePine, and R. A. Noe. Toward an integrative theory of training motivation: a meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85:678-707, 2000.
[9] P. Edens and S. T. Bell. Effectiveness of training in organizations: a meta-analysis of design and evaluation features. Journal of Applied Psychology, 88(2):234-245, 2003.
[10] A. Bagnato, A. Sadovykh, E. Brosse, and T. Vos. The OMG UML Testing Profile in use: an industrial case study for the Future Internet testing. In CSMR 2013, 17th European Conference on Software Maintenance and Reengineering, pp. 457-460, 2013.
[11] K. Burr and W. Young. Combinatorial test techniques: table-based automation, test generation and code coverage. In Proc. of the Intl. Conf. on Software Testing Analysis & Review, 1998.
[12] C. Nie and H. Leung. A survey of combinatorial testing. ACM Computing Surveys, 43(2):1-29, January 2011.
[13] M. Grochtmann and K. Grimm. Classification trees for partition testing. Software Testing, Verification and Reliability, 3(2):63-82, 1993.
[14] E. Lehmann and J. Wegener. Test case design by means of the CTE XL. In Proceedings of the 8th European International Conference on Software Testing, Analysis & Review (EuroSTAR 2000), Copenhagen, Denmark, 2000.
[15] P. M. Kruse and M. Luniak. Automated test case generation using classification trees. Software Quality Professional, 13(1):4-12, 2010.
[16] L. Briand and Y. Labiche. Empirical studies of software testing techniques: challenges, practical strategies, and future research. ACM SIGSOFT Software Engineering Notes, 29(5):1-3, 2004.
APPENDIX

Table 5. Final exam results on the classification tree method and the classification tree editor (CTE XL Professional)

[Table: per-question exam answers and scores for the reference solution and both subjects (S1, S2), organized in four parts: classification tree elements (questions QA1-QA6 and QB1-QB9), test elements, abstract test generation, and dependency rules (Q9-Q10), with per-part scores for the solution and each subject.]