Empir Software Eng (2007) 12:331–357 DOI 10.1007/s10664-006-9031-3

A practical approach to testing GUI systems

Ping Li & Toan Huynh & Marek Reformat & James Miller

Published online: 19 December 2006
© Springer Science + Business Media, LLC 2006
Editor: José Carlos Maldonado

Abstract GUI systems are becoming increasingly popular thanks to their ease of use when compared against traditional systems. However, GUI systems are often challenging to test due to their complexity and special features. Traditional testing methodologies are not designed to deal with the complexity of GUI systems; using these methodologies can result in increased time and expense. In our proposed strategy, a GUI system is divided into two abstract tiers—the component tier and the system tier. On the component tier, a flow graph is created for each GUI component. Each flow graph represents a set of relationships between the pre-conditions, event sequences and post-conditions for the corresponding component. On the system tier, the components are integrated to build up a viewpoint of the entire system. Tests on the system tier interrogate the interactions between the components. This method for GUI testing is simple and practical; we show the effectiveness of this approach by performing two empirical experiments and describing the results found.

Keywords GUI testing . Two-tiered testing . Testing . GUI . GUI component . Complete interaction sequence . Event sequence

1 Introduction

Graphical user interfaces (GUIs) are now the common interface for user interaction. They significantly simplify the use of a software system, allowing novice users to perform tasks with ease.

P. Li (*) : T. Huynh : M. Reformat : J. Miller
Department of Electrical and Computer Engineering, Electrical and Computer Engineering Research Facility, University of Alberta, Edmonton, AB T6G 2V4, Canada
e-mail: [email protected]
T. Huynh e-mail: [email protected]
M. Reformat e-mail: [email protected]
J. Miller e-mail: [email protected]


However, as GUIs attempt to become more usable, the source code for these systems is increasing in size and complexity. Over half of the code for modern GUI systems is responsible for GUI display rather than functionality (Myers and Olsen 1993; Memon 2002). GUI systems' security, robustness and usability are improved once they are tested for correctness. Unfortunately, due to their special characteristics (Gerrard 1997; Memon et al. 2001), conventional code-based testing techniques are often difficult to apply to GUI testing; hence testing GUIs is a challenging task that requires a unique solution. Some of the special characteristics of GUI systems include:

- Event-driven software and extremely large input space—The input to a GUI system is a set of event sequences triggered by the user. The permutations of these events can be huge. Furthermore, many GUI objects require inputs from users. The users may choose to input values by clicking on a selection or by entering a value directly. The permutations of all these input values grow dramatically when the GUI system is complex. Consequently, the extremely large number of different inputs and events leads to a large number of GUI states that need to be tested.

- Hidden synchronization and dependencies—It is very common for GUI objects to have some form of synchronization and dependency implemented. In many situations, the synchronizations and dependencies between objects are not restricted to objects in the same window (e.g., a value in one window may be linked to a value in another window). The concern for testers is discovering these links; however, finding all of them is not easy.

- Object-oriented software programming—Today's GUIs map very well to the object-oriented paradigm. During development, a developer typically uses instances of precompiled objects stored in a library to build the GUIs hierarchically. The source code for these elements may not always be available for code-based coverage evaluation, so traditional coverage criteria cannot be applied to GUI testing directly.

- Many ways in, many ways out—One of the advantages of GUIs is that they can provide multiple ways for users to interface with an application. In the majority of situations in GUI applications, the user may use a keyboard shortcut, a button click, a menu option, a click on another window, etc. How many of these should be tested? If the user is allowed to access a feature in all these different ways, must we test the same feature many times?

- Window management—In a GUI environment, users employ the standard features of the window management system, a software package that helps users monitor and control different contexts by separating them physically onto different parts of one or more display screens. These features include window movement, resizing, maximization, minimization, and closure. Although the operating system handles the windows' behavior, the programmer must anticipate the impact of these behaviors on the application. For example, closing a window before completing a transaction may leave the application or the database in an inconsistent state. Programmers might avoid such complications by disabling all the standard window buttons and commands; however, doing so might also make it impossible for users to reverse or undo certain actions. From the tester's point of view, which standard window controls need to be tested? Where is the dividing line between testing the application and testing the operating system? Do we need to test navigation paths both forwards and backwards?

Today, developers can use many user interface design tools to help speed up the GUI design and implementation process. These tools can shorten development time by allowing


developers to concentrate on other tasks rather than focusing on GUI development. However, testing GUIs can be a daunting experience for testers due to the special properties that are unique to GUI systems (Gerrard 1997; Memon et al. 2001). Surprisingly, the research done on GUI testing is rather limited when compared against the popularity of GUI systems (Belli 2001; Memon et al. 2001; Memon et al. 2000a, b; White et al. 2001; White and Almezen 2000; White 1996). We will attempt to address this deficiency by introducing a very practical approach to GUI testing.

2 Background

Originally, researchers applied the concept of Latin Squares to reduce the number of possible test cases substantially (White 1996). However, the assumption of testing only pairs of GUI objects seems too limiting, as more complex GUI interactions cause many "real-world" failures. Subsequently, White introduced the concept of Complete Interaction Sequences (CIS), which, to some extent, assists in partitioning the entire GUI system into sets of independent tasks (White and Almezen 2000). The steps of their GUI testing process are:

1. Responsibilities are identified using all the available information sources concerning the GUI system. A responsibility is a GUI activity applied on one or more GUI objects to produce an observable effect on the surrounding environment of the GUI. This definition tends to be ambiguous in many situations, and no information was provided on the process of identifying the responsibilities; we will address this issue in Section 3.1.
2. For each identified responsibility, the corresponding CIS is defined. A CIS consists of the sequence of relevant GUI objects and selections that will invoke a responsibility.
3. For each CIS, a finite state machine (FSM) model is constructed. State reduction algorithms are then applied to produce a reduced FSM for each original FSM.
4. Tests are generated for each component according to its corresponding reduced FSM, which is represented as a directed graph. Each test takes the form of a directed path through a component, starting with a specific input to the component and terminating with a specific output from that component.

White further investigated the use of memory tools to detect missing effects and the interactions between different CISs (White et al. 2001). Belli extended this method by transforming the CISs into Fault Complete Interaction Sequences to test GUIs in a holistic view (Belli 2001).

To some extent, Belli's and White's models tend to oversimplify the behavior of each individual GUI object. For example, Fig. 1 shows a diagram of an open file interaction sequence based on White's model. This model simplifies the behavior of all GUI objects to a single situation. A successful transition from the D (Name) state means that the user has entered a valid file name. But what happens when the user enters an invalid filename? Only considering valid situations makes the testing model incomplete and inaccurate. Belli extends White's model to include illegal interactions between states; however, the invalid situation still remains missing.

An AI plan generation technique to generate testing information was also developed by Memon et al. (2000b). Memon claims that GUI testing can be treated as a typical planning problem: given a set of operators, an initial state and a goal state, a planning system produces a sequence of the operators that transforms the initial state to the goal state. The


Fig. 1 An FSM model of an open file interaction sequence based on White's model (nodes: A: File, B: Open, D: Name, E: Select, F: Cancel, H: Open)

additional benefit of this method is its ability to automate the construction of a testing oracle (Memon et al. 2000a). This test case generation process includes two phases: the setup phase and the plan-generation phase. In the setup phase, PATHS (Planning Assisted Tester for graphical user interface Systems) creates an abstract model of a GUI and uses that to produce a list of operators representing GUI events. Then test designers use their knowledge of the GUI to define the preconditions and the effects of the operators. In the plan-generation phase, the test designer first describes a task. The task is defined by a set of initial states and goal states. Once the task has been specified, PATHS automatically generates a set of distinct test cases to reach the goal. The test designer should iterate through the plan-generation phase a number of times with different scenarios to generate alternate plans for each task. Additionally, Memon et al. (2001) define a set of Coverage Criteria using event sequences to specify a measure of test adequacy for GUI software. However, using event sequences to evaluate the coverage of a test suite can be rather inaccurate due to the large


amount of infeasible event sequences in the system under test. We will provide more details about these infeasible event sequences in Section 3.4.

Memon's model also suffers from inaccuracy. For example, Fig. 5 shows a modal dialog. The events within this modal dialog form a GUI component. The three main events in this GUI component are "Yes," "No" and "Cancel." Events "No" and "Cancel" are termination events because they terminate the modal window. However, the event "Yes" can be a restricted-focus event or a termination event depending on the preconditions before the invocation of this modal dialog. We cannot create an event flow graph for this component using Memon's definition because we cannot classify the "Yes" event.

Figure 2 is a model of the integration tree; this integration tree is based on Memon's definition of how GUI components can be integrated to form the GUI. The invocation relationships between the FileNew, ConfirmSave, and FileSaveAs components and the relationship between FileSave and FileSaveAs are incomplete. FileNew only invokes the ConfirmSave component if the text content is modified; the ConfirmSave component then only invokes the FileSaveAs component if a filename is not defined. The FileSave component only invokes the FileSaveAs component if a filename is not defined. Memon's definition does not allow these conditional invocations.

3 A Two-Tiered GUI Testing Strategy

Effective testing cannot be achieved without using abstraction to conquer the astronomical complexity of typical software systems. GUIs have many special characteristics (Gerrard 1997; Memon et al. 2001; White 2000) that have to be taken into consideration during the test model creation process. The mapping between the GUI and the underlying code is not always simple and straightforward, so traditional testing methods cannot be used to test GUI systems without modification. Creating a single model of the GUI that represents all features within a system, and abstracts away details without losing the information essential for revealing faults, is difficult to accomplish. Especially for large and complex systems, detecting various types of errors based on a single testing

Fig. 2 A partial integration tree


model is difficult. A wise strategy is to separate the testing concerns for the system and conquer them respectively. Each type of model has its advantages and weaknesses. For different testing concerns, appropriate testing models are applied to achieve the optimum result for the entire system.

Our strategy breaks a system down into two abstract tiers: the component tier and the system tier. This strategy allows us to apply different testing models that best suit the appropriate "to-be-tested" tier. The testing objective for the component tier can be realized by verifying whether each component completes its corresponding responsibility correctly under various input space divisions. The testing model on this tier is a flow graph combined with state-based models of the involved GUI objects. Some well-known heuristics for test case selection, including equivalence classes, boundary value analysis, and special values testing (Myers 1979), play a key role here. The flow graph model facilitates the application of these testing techniques. On the system tier, the components are integrated to build up a viewpoint of the entire system. Tests on this tier try to interrogate the interactions between components. An Event Sequence Diagram is created to represent the system on this tier.

3.1 Responsibility Definition and Identification

Before we can begin producing test cases on the component tier, we must first identify all responsibilities for the GUI system under test. This is a process of dividing a large, complex GUI into independent components that can be tested in isolation. Components will be constructed according to their corresponding responsibilities. Tests on the component tier will deal with the validity inside these components.

White's definition of a responsibility is: a GUI activity applied on one or more GUI objects to produce an observable effect on the surrounding environment of the GUI (White and Almezen 2000). This definition is ambiguous because the size of the "observable effect" is unknown; it can be very small or very big. Hence, the corresponding responsibilities can be as simple as involving only a couple of activities or as complex as involving several windows. There is no clear idea about the size of a responsibility and how to identify it. For example, most setting dialogs require the user to click on the "Apply" or "Ok" button before the changes take effect. A user can toggle a setting many times before clicking on the "Apply" or "Ok" button. So, should the simple act of toggling a setting without clicking on the confirmation button be considered a responsibility? While using a definition similar to that given by White, we provide clear and practical guidelines on identifying responsibilities.

3.1.1 A Brief Introduction to the Example GUI System Used

In order to evaluate our testing strategy, we applied it to two different GUI systems. The details of these two experiments will be discussed in Section 4. This section gives a brief overview of the GUI system we used in our first experiment, because we will use it as an example in the following sections. This GUI system is a clone of the Notepad software, written in Java. The software defines nine classes and is approximately 1,400 lines of code in length. The GUI includes a main window, three menus (the help menu was not considered in our experiment), five dialogs and a number of message boxes. Figure 3 shows the main window of the application. We will provide screenshots of the other parts of the application as required to illustrate the system.
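To make the input-space heuristics mentioned above (equivalence classes, boundary value analysis, special values) concrete on this example system, the sketch below lists candidate test inputs for the Save dialog's file-name field. The partitions and values are our own illustration; the paper does not enumerate them:

import java.util.List;
import java.util.Map;

// Illustrative equivalence classes for the Save dialog's file-name field.
// One representative value per class is sufficient for equivalence-class
// testing; the boundary classes come from boundary value analysis.
final class FileNameTestData {
    static final Map<String, List<String>> CLASSES = Map.of(
        "valid simple name",         List.of("notes.txt"),
        "empty name (boundary)",     List.of(""),
        "illegal characters",        List.of("a:b?.txt"),
        "name of an existing file",  List.of("existing.txt"),  // exercises overwrite handling
        "very long name (boundary)", List.of("x".repeat(251) + ".txt")
    );
}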


Fig. 3 The GUI of the notepad application

3.1.2 Responsibility Definition

A responsibility, using our definition, is a group of activities performed by users on one or more GUI objects to complete a task. The key difference between this definition and White's is that this definition requires a task to be completed before a responsibility can be classified. Using the same example as above, the act of opening a setting dialog and toggling a setting on and off many times without closing the dialog would not be considered a responsibility, because the task is not completed until the setting is either saved or cancelled. A responsibility can be viewed as a process involving one or more steps that, once invoked, will capture the user's focus until it is finished or aborted. The "task" here refers to a function specified in the software specification; thus, "task" can be used interchangeably with "function." Usually, a task has to be invoked directly by the user. Observable effects on the GUI and its environment can be produced either indirectly or directly by the completion of a task. For example, dialogs that allow a user to change settings are almost universal in today's GUI systems. When a user completes the setting function, whether or not the change alters the GUI's appearance, certain parameters of the application settings must have been changed as required.

Before going any further, we must discuss some commonly confusing topics about responsibilities. First, on the system tier, an Event Sequence Diagram (ESD) is used to represent the entire GUI system—the definition of the ESD is discussed in Section 3.3. The sequences of activities that implement the responsibilities are the input event sequences to the ESD (more details will be discussed in Section 3.3). Hence, the number of identified responsibilities will significantly affect the scale of the model. For a complex GUI system, the number of responsibilities may be large. This causes the structure of the ESD to become very complicated and the testing cost to increase. We deal with this problem by simply grouping certain responsibilities together as a compound state of the ESD. The compound state can be represented as a sub-ESD if required. Second, some GUI objects can contain further sub-objects. For example, a "Save a File" dialog (Fig. 4) has buttons, lists, a combo box and so on. These GUI objects are on the same level; that is, the whole set of variables from the objects will not be passed down to the underlying code until the user closes the dialog. Hence, these objects are treated as one GUI object.


Fig. 4 The save a file dialog box

In general, during responsibility identification, we only consider GUI objects such as menu items, dialogs or sub-windows, and controls on toolbars as objects for analysis.

3.1.3 Identifying Responsibilities

The process of identifying the responsibilities of a GUI begins with gathering and exploring a variety of reference materials for the system. The possible sources include the requirements specification, the design document, the user manual, and even experience interacting with the system. In the remainder of this section, we provide some guidelines and examples describing how to explore the available information to identify responsibilities.

The first possible source for identifying responsibilities is the Software Requirements Specification (SRS) document. It is a fundamental document, which defines what the system is required to do and what constraints are placed upon the implementation. The major functions and sub-functions that the software should perform are described in this document. Based on these descriptions, we can determine which parts will require user interaction; these parts are the starting point for identifying responsibilities. However, since the SRS only contains functional and non-functional requirements and does not describe many details of the GUI design, generating accurate responsibilities from it alone is not possible.

Responsibilities can be identified during the design stage, once the design is detailed enough to be implemented. This responsibility list should be reviewed once implementation has been completed, as there may be changes from the initial design. At this stage, we were able to identify 13 responsibilities for our Notepad application: open an existing text file, create a new text file, redo a cancelled action, undo the last action, cut, copy, paste, wrap line, find a word in the text, set the attributes of the text's font, save the file, print out the file, and exit the application.


If the design is not detailed or has been omitted, testers will have to wait until implementation and unit testing have been completed before they can identify responsibilities. They will have to explore all GUI objects associated with each function based on the SRS. Responsibilities are then identified based on this work. Another information source that is helpful for responsibility identification is the user manual, which describes what the software is designed to accomplish and gives step-by-step processes for completing tasks.

3.2 The Component Tier

3.2.1 GUI Components

In the proposed approach, a component is defined as a subsystem of the GUI system, and each component is a unit of testing. Testers will have to identify all the responsibilities; each identified responsibility can then be mapped to a component. In other words, a component is a collection of GUI objects that contribute to the implementation of the corresponding responsibility. For example, the Create a new text file component is defined after we identified the GUI objects required for the Create a new text file responsibility. These objects are the menu item File-new, the Confirm Save message box, the Save dialog, and the text area. We note that a GUI object may belong to multiple responsibilities; thus, multiple components may contain the same GUI object.

Testing on this tier aims at verifying whether the behavior of an individual component meets the requirements of the corresponding responsibility. Some responsibilities are very simple and straightforward; as soon as users invoke the corresponding GUI objects, their event handling programs start to run and complete the task without any more input from the users. Examples are the Redo and Undo responsibilities in the Notepad application. Checking whether these responsibilities work well independently is straightforward. For these components, testing efforts focus on checking whether they work properly in different circumstances. Doing so involves testing the interactions among the components; we leave this to the system tier testing. Conversely, many responsibilities are complex, involving complicated logical relationships and multiple steps; for example, the Open an existing text file responsibility in our Notepad application includes several steps:

1. Checking whether the current text area needs to be saved.
2. Getting the drive and file name.
3. Loading the contents of the file into the text area.

Usually, multiple GUI objects are required to make up the components associated with these responsibilities. The execution path of a component is dependent on the inputs to the GUI objects. The input space of the GUI objects and the logical relationships among them are explored and analyzed. Hence, all objects that need inputs from the user are tested on this tier. Generally, the execution of these components starts from the GUI objects in the top-level frame, such as the application's main window. These GUI objects invoke sub-frames comprising more GUI objects, which in turn can invoke more sub-frames. These objects can be visualized as a tree structure, each one having a single parent and a number of children. Our testing strategy combines all GUI objects that are part of a responsibility into a component. These components will usually follow different execution paths depending on the inputs received. Thus, verifying whether a component works properly


through different paths will satisfy our testing objective on this tier. This verification can be done by creating a flow graph for each component (the flow graph is a diagrammatic representation of the possible alternative control flows) and then creating inputs that follow all of these alternative flows. In traditional branch testing, it is convenient to examine a directed graph representation of a program, which shows the flow of control through the program using a network of nodes and edges. Our graph is a directed graph showing different execution paths through the various GUI objects of a GUI component. Each node in the flow graph represents a GUI object. Edges leaving a GUI object represent users' GUI activities on the object that trigger an event. Each flow graph has one start node (the first GUI object required for the responsibility) and one end node (the last GUI object of the responsibility). Each complete path is a sequence of events causing the completion of a task. There are GUI attributes that can affect the execution path of the component and attributes that can change during the execution of the component; these are the pre-conditions and post-conditions, respectively.

A formal definition of the flow graph is as follows. A flow graph of a GUI component C is a 4-tuple ⟨N, E, ni, nf⟩ where:

1. N is a set of nodes representing all the GUI objects for the responsibility. Each n ∈ N represents a GUI object in C.
2. E ⊆ N × N is a set of directed edges between the nodes. Each e ∈ E represents the control flow from one GUI object to the next. We say that the control flows from n1 to n2 iff the user can interact with n2 after interacting with n1.
3. ni ∈ N is the initial node.
4. nf ∈ N is the final node.

For each n ∈ N, we define to_set(n) as the set of all nx such that the control flows from n to nx. We then determine to_set(n) for all n ∈ N using the following algorithm. The algorithm initializes an empty set called to_set, then initiates an interaction with n. Then, for each v ∈ N, it checks whether the user can now interact with v. If the user can interact with v, it adds v to to_set. If not, it forces the preconditions for the GUI object v to be true, interacts with n again, and checks whether v is now accessible. Once all v ∈ N are checked, it returns the final to_set(n). More formally, the algorithm is as follows:

Get_to_set(n) {
    to_set = {};
    interact_with!(n);
    foreach (v ∈ N) {
        if (canInteract?(v) and v ≠ n) {
            to_set = to_set ∪ {v};
        } elseif (v ∉ to_set) {
            setPreconditionsForV!(v);
            interact_with!(n);
            if (canInteract?(v) and v ≠ n) {
                to_set = to_set ∪ {v};
            }
        }
    }
    return to_set;
}
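Below is a minimal Java sketch of this algorithm, assuming a hypothetical GuiDriver interface for driving the GUI under test; the paper prescribes no implementation API, so the interface and names are our own:

import java.util.HashSet;
import java.util.Set;

// Marker type for GUI objects (menu items, dialogs, controls, ...).
interface GuiObject {}

// Hypothetical driver abstraction over the GUI under test.
interface GuiDriver {
    void interactWith(GuiObject n);         // perform the user interaction on n
    boolean canInteract(GuiObject v);       // is v currently reachable/enabled?
    void setPreconditionsFor(GuiObject v);  // force v's preconditions to hold
}

final class FlowGraphBuilder {
    // Computes to_set(n): every GUI object the control can flow to after
    // interacting with n, mirroring Get_to_set above.
    static Set<GuiObject> getToSet(GuiObject n, Set<GuiObject> allNodes, GuiDriver gui) {
        Set<GuiObject> toSet = new HashSet<>();
        gui.interactWith(n);
        for (GuiObject v : allNodes) {
            if (v == n) continue;
            if (gui.canInteract(v)) {
                toSet.add(v);
            } else if (!toSet.contains(v)) {
                // Force v's preconditions, re-trigger n, and probe again.
                gui.setPreconditionsFor(v);
                gui.interactWith(n);
                if (gui.canInteract(v)) {
                    toSet.add(v);
                }
            }
        }
        return toSet;
    }
}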


Fig. 5 The confirm save message box

3.2.2 An Example of a Flow Graph

In this section, we present an example to show how to create a flow graph in detail. The GUI of the Notepad application is used as the example. After careful analysis of the responsibilities identified, we came up with eight components for our example application. The identified components are:

- Open an existing text file
- Create a new text file
- Edit a text file
- Find a word in the text
- Set the attributes of the font
- Save a text file
- Print out a text file
- Exit the application

The Edit a text file component is actually a compound component containing six simple components: redo, undo, cut, copy, paste, and wrap line. We decided to make this compound component because the individual components within it are very simple and testing them is straightforward.

The Create a new text file component is used (by the user) to create a new file (the filename is unspecified at this point) with a blank text area. Our Notepad application enters this state when it is executed. We identified the following GUI objects to be a part of this component:

- Menu item File-new (shown in Fig. 3).
- The Confirm Save message box (Fig. 5).
- The Save dialog (Fig. 4).
- The text area (Fig. 3).

All these objects should appear in the flow graph. Some of these objects are sub-frame windows, meaning they are composed of more GUI objects.

Table 1 Summary of the attributes for "Create a new file"

Pre-condition attributes:
- The text area may or may not be modified
- The original file may or may not contain a filename

Post-condition attributes:
- The text area is blank and the window title is called "untitled"
- The original content of the text area is saved to a file

For simple sub-frames such as the Confirm Save message box, which has only three buttons without any complex


interaction, we can incorporate their models into the flow graph directly. We need to create sub-graphs for complex sub-frames that have many sub-GUI objects and complex interactions among them, such as the Save dialog. Table 1 shows the pre-condition attributes and the post-condition attributes for Create a new text file.

Now that all the necessary information has been obtained, we create a flow graph for the Create a new text file component. This flow graph is shown in Fig. 6. In the flow graph, a node can be represented as a rectangle or a diamond. A rectangle represents the model of a GUI object. A diamond-shaped node represents a decision-making step in the flow graph. We should note that some of the nodes are complex sub-frames. For these nodes, sub-graphs will

Fig. 6 The flow graph of the Create a new text file component (nodes: File-New Menu Item; Save Confirm Dialog with Yes, No and Cancel buttons; Save Dialog; Text Area; branches depend on whether preconditions 1 and 2 are satisfied)


Fig. 7 The sub-graph of the save a file dialog (a dashed box groups the Look In List, File List, New Folder button, Up-level button, Home button, File Type List and File Name Text Box as one compound object; the Save and Cancel buttons terminate the dialog; the Save Confirm Dialog and Text Area nodes show continuity with Fig. 6)

be created using the models of the associated GUI objects. In many situations, testing different event-sequences inside a sub-frame is not very meaningful. For example, a dialog, regardless of its type, is always composed of many GUI objects designed to solicit information from users. These objects are located on the same tier. The possible sequences of activities performed on them do not result in significantly different outputs. Before the user clicks on a confirmation button, the software does not pass the selected data to the underlying code. In other words, the underlying code will only run according to the final state of the GUI objects before the confirmation action. Therefore, testing the permutations of the various states of the GUI object models is much more important than testing the permutations of event sequences that can be performed.

Figure 4 shows the Save dialog within the Create a new text file component. The dialog is composed of seven buttons, two drop-down lists, one list box, one file list box, one text box and some labels.¹ Our testing strategy does not look at the labels and the aesthetic conditions of the GUI. Figure 7 is a simplified graph of the Save dialog. The objects in the dashed box are treated as one compound object because they are on the same level as each other (that is, the software does not pass the selections or data from these buttons to the underlying code until the dialog box is closed). The Cancel and the Save buttons are not included in the dashed box because they cause the dialog to terminate. The start node is the same as the Save Confirm Dialog node in Fig. 6, while the end node is the same as the Text Area node in Fig. 6. These two nodes are not part of the Save dialog; they are displayed to illustrate the continuity between Fig. 6 and this sub-graph.

¹ We used the FileChooser class for our Save dialog. However, the "Detail" button and "List" button do not work in JDK 1.2, and the Rational Robot tool that we used in the experiment only supports JDK 1.2. Thus, we excluded these two buttons from our experiment.

The flow graph for the Create a new text file component is complete at this stage. Test cases can now be created to test this component. We also created the flow graphs for the other components based on the steps above; the test cases for these components were then generated (the coverage criteria used are described in Section 3.4). Our testing objective on this tier is to verify whether all these functional implementations meet their requirements under varied inputs. Testing on this tier is very important because a large percentage of errors can be found at this level. In the system testing stage, the testing concerns will focus on the interactions among the components.

3.3 System Tier Testing

The goal on this tier is to find the most effective and efficient input event sequences that will allow testers to verify the GUI's responses and whether all components are working together correctly. Although finite state machines represent a very powerful way of describing and implementing the logic of an application, they need to be modified for GUI systems (due to the special characteristics associated with GUI applications). As discussed, the number of distinct event sequence inputs to a GUI may be huge and can result in a large number of GUI states. This makes the building, maintaining, reviewing, checking and testing of FSM models very difficult or even impossible. For large and complex systems, state explosion can create a serious challenge for testers.

There are two ways to deal with state explosion: abstraction and exclusion (El-Far and Whittaker 2002). Abstraction involves the incorporation of complex data into a simple structure. For example, a dialog box requiring the entry of multiple fields of data before clicking the OK button can be modeled as two abstract inputs: valid data entry and invalid data entry. All the fields are treated as a structure. If the entire set of data is valid, then the transition to the correct state is made. When the data is invalid, the model contains enough knowledge to go to an error state. The downside of using abstraction is that it always loses information: in this case, any possible error associated with the entry is simplified to invalid data, and we lose the information about the exact field that caused the error.

The other way to deal with state explosion is exclusion. This method drops information rather than abstracting it. This can be done by decomposing the software behavior into multiple models. Each model will contain certain inputs and exclude others; hence, the models cover different aspects of a system. Our testing strategy is based on the exclusion philosophy; we break the GUI system down into two tiers: the component tier and the system tier. Component tier testing focuses on whether the components work as expected in isolation. System tier testing focuses on verifying whether all the components in the GUI system can communicate with each other to complete a task. Each component is treated as a trusted unit on this tier because it has passed the component tier testing phase. The execution path of the system will be represented by a sequence of these trusted units.
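To make the abstraction idea concrete, the sketch below (our own illustration; the field names are hypothetical) collapses a multi-field dialog into the two abstract inputs just described. Note how the knowledge of which field caused the error is deliberately lost:

// Hypothetical multi-field dialog data, treated as a single structure.
final class DialogData {
    final String fileName;
    final String fileType;

    DialogData(String fileName, String fileType) {
        this.fileName = fileName;
        this.fileType = fileType;
    }

    enum AbstractInput { VALID_DATA_ENTRY, INVALID_DATA_ENTRY }

    // All fields are validated together; the model only distinguishes a
    // valid entry from an invalid one, not which field was at fault.
    AbstractInput abstractInput() {
        boolean valid = fileName != null && !fileName.isBlank()
                     && fileType != null && !fileType.isBlank();
        return valid ? AbstractInput.VALID_DATA_ENTRY
                     : AbstractInput.INVALID_DATA_ENTRY;
    }
}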
The testing model for this tier is based on the Finite State Machine (FSM) model and is called an Event Sequence Diagram (ESD). This ESD model represents all the possible execution paths in the system. One other problem with using FSMs for GUI testing is state definition. A state in an FSM is identified with respect to the variables of the system at a certain time. Usually, a GUI has many


attributes that can affect its state, so explicitly defining the states of a GUI is not easy. However, the testing objective for a GUI system is not to check its rigid states and verify every single transition, but to evaluate the external behavior of the GUI and the operating sequences of its components. What we need is a model that integrates all the components identified in the previous testing stage into a system and represents the way in which users interact with the GUI. We use an Event Sequence Diagram (ESD) for our FSM model. This is a diagram where a state is characterized by a set of inputs and its responses to those inputs. In other words, each node represents an event, while a state transition is determined based on how the current node responds to the inputs. Sub-ESD models can represent nodes with complex behavior; this setup allows complex ESD models to be broken down into simpler models. For complex GUI systems, the number of event sequences will be very large and hence impractical to analyze without this abstraction. Testers with some practical experience and domain knowledge of the GUI system under test are needed for this modeling process because it is done manually.

An Event Sequence Diagram D is a 2-tuple ⟨N, E⟩ where:

1. N is a set of nodes representing all the events for the system. Each n ∈ N represents an event in D.
2. E ⊆ N × N is a set of directed edges between the nodes. Each e ∈ E represents a transition from one event to the next. We say that event e2 follows e1 iff e2 can be initiated after e1.

For each n ∈ N, we define follow_set(n) as the set of all nx such that nx follows n. We then determine follow_set(n) for all n ∈ N using the following algorithm. The algorithm initializes an empty set (follow_set), then triggers the event associated with node n. Then, for each v ∈ N, it checks whether event v can be triggered by the user. If the user can trigger event v, it adds v to follow_set. Once all v ∈ N are checked, it returns the final follow_set(n). More precisely, the algorithm is as follows:

Get_follow_set(n) {
    follow_set = {};
    performEvent!(n);
    foreach (v ∈ N) {
        if (canTriggerEvent?(v) and v ≠ n) {
            follow_set = follow_set ∪ {v};
        }
    }
    return follow_set;
}

Our model differs from those of White and Almezen (2000) and Belli (2001) in that we use our model on the system tier, knowing that the GUI objects have already been tested on the component tier. The input events now deal with the completion of a component's functionality rather than trivial operations performed on the GUI objects themselves. White and Almezen (2000) and Belli (2001) tried to cover all low-level information, including user actions on the GUI objects, in their models; as a result, the GUI objects' behaviors are often simplified to just a node on the state diagram. In practice, however, GUI objects usually have many different states, and their output can change depending on the current inputs. Bugs in a GUI system can be missed if we overlook these characteristics of the GUI objects.


Fig. 8 The event sequence diagram for the notepad application (top-level states: Entry, Main, Open a file, Edit file, Save file, Print the file, Exit; the Edit file sub-ESD comprises the events A: Highlight, B: Copy, C: Cut, D: Paste, E: Undo, F: Redo, G: Find, H: Font, I: Linewrap, with O1: Enter edit and O2: Exit edit)

As discussed previously, a GUI system may contain a large number of components depending on its complexity. If the number of components is large, testing costs may be too high to be effective because the ESD will be very complicated. We can resolve this problem by grouping certain components together to create a compound state of the ESD. Each compound state can then be represented by a sub-ESD. The system event sequence diagram for our Notepad application was created using the process described above. The following five states were created from the 13 components that we identified in the previous section:

- Open File
- Edit File
- Save File
- Print File
- Exit
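To illustrate how such an ESD can be processed mechanically, the sketch below encodes a top-level ESD as an adjacency map and derives one test sequence per transition, so that every edge is covered at least once (the transition coverage criterion of Section 3.4). The transitions listed approximate Fig. 8 rather than transcribe it exactly:

import java.util.*;

final class EsdCoverage {
    // Illustrative top-level ESD; states follow Fig. 8, edges are approximate.
    static final Map<String, List<String>> ESD = Map.of(
        "Entry",          List.of("Main"),
        "Main",           List.of("Open a file", "Edit file", "Save file",
                                  "Print the file", "Exit"),
        "Open a file",    List.of("Main"),
        "Edit file",      List.of("Main"),
        "Save file",      List.of("Main"),
        "Print the file", List.of("Main"),
        "Exit",           List.of()
    );

    // One test sequence per edge: shortest path from the entry state to the
    // edge's source, followed by the edge itself.
    static List<List<String>> transitionCoveringSequences(String entry) {
        List<List<String>> sequences = new ArrayList<>();
        for (var source : ESD.entrySet()) {
            for (String target : source.getValue()) {
                List<String> seq = shortestPath(entry, source.getKey());
                seq.add(target);
                sequences.add(seq);
            }
        }
        return sequences;
    }

    private static List<String> shortestPath(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        parent.put(from, null);
        while (!queue.isEmpty()) {                      // plain BFS
            String s = queue.removeFirst();
            if (s.equals(to)) break;
            for (String next : ESD.getOrDefault(s, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, s);
                    queue.addLast(next);
                }
            }
        }
        LinkedList<String> path = new LinkedList<>();
        for (String s = to; s != null; s = parent.get(s)) path.addFirst(s);
        return path;
    }
}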

The event sequence diagram for the system can be seen in Fig. 8. After the construction of the event sequence diagrams, we can begin generating test cases for the system. These test cases should cover all edges in the diagram. The next section of this paper describes the coverage criteria for our testing strategy.

3.4 Coverage Criteria

The coverage criteria answer an important question that every tester has to face: "when should I stop testing?" They define the rules for measuring the adequacy of a test suite. Typically, an effort to detect a certain class of potential errors is measured with a certain type of coverage. For instance, 100% statement coverage does not only mean that every line of code has been executed; it also means that the software has been tested for bugs that can be revealed by the simple execution of a line of code. Thus, discussing completeness or thoroughness is not meaningful without reference to an explicit coverage criterion.

According to Memon et al. (2001), the events identified within each component are represented as an event-flow graph. Three coverage criteria are defined based upon the events within the components. They are:

- Event Coverage: each event in the component is triggered at least once.
- Event-Interaction Coverage: after an event E has been performed, all events that can interact with E should be executed at least once.
- Length-n Event-sequence Coverage: a set P of event sequences satisfies the length-n event-sequence coverage criterion if and only if P contains all event sequences of length equal to n.

Three inter-component coverage criteria were also defined, based on an integration tree constructed from the components. These criteria are:

- Invocation Coverage: each restricted-focus event that opens a modal window must be performed at least once.
- Invocation-termination Coverage: all length-2 event sequences consisting of a restricted-focus event followed by one of the invoked component's termination events must be tested.
- Length-n Event-sequence Coverage: all length-n event sequences that start with an event in one component and end with an event in another component must be tested.

Since our method of modeling a GUI system differs from the one discussed in Memon et al. (2001), we need to define different event-based coverage criteria for our approach. Our coverage needs to reveal all aspects of a GUI's behavior from the set of distinct event sequences generated. We also try to find the minimal set of these event sequences. Below is the definition of the coverage criteria used in our approach.

The coverage criterion on the component tier requires all unique paths in the flow graph to be covered. That is, the generated test cases should cause the execution of all paths in the flow graph. A unique sequence of branches from the flow graph's entry node to its exit node forms a path. Equivalent, non-unique paths are paths that differ only in the number of loop traversals (the paths contain the same set of branches). The test cases should cover all edges, including those in the sub-graphs of the nodes and in all models of the different GUI objects, at least once.
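One way to enumerate the unique paths required by this criterion is sketched below, assuming the flow graph is given as an adjacency map from node names to successor lists (a representation of our own choosing; the paper does not prescribe one). Forbidding the reuse of a directed edge within a path collapses all paths that differ only in extra loop traversals:

import java.util.*;

final class UniquePaths {
    // Enumerates unique entry-to-exit paths of a flow graph. Each directed
    // edge may appear at most once per path, so equivalent paths that merely
    // repeat a loop reduce to a single representative.
    static List<List<String>> enumerate(Map<String, List<String>> edges,
                                        String entry, String exit) {
        List<List<String>> paths = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>();
        path.addLast(entry);
        dfs(edges, entry, exit, path, new HashSet<>(), paths);
        return paths;
    }

    private static void dfs(Map<String, List<String>> edges, String node,
                            String exit, Deque<String> path,
                            Set<String> usedEdges, List<List<String>> out) {
        if (node.equals(exit)) {
            out.add(new ArrayList<>(path));   // record one complete path
            return;
        }
        for (String next : edges.getOrDefault(node, List.of())) {
            String edge = node + "->" + next;
            if (usedEdges.add(edge)) {        // take each edge at most once
                path.addLast(next);
                dfs(edges, next, exit, path, usedEdges, out);
                path.removeLast();
                usedEdges.remove(edge);
            }
        }
    }
}

A test case is then derived for each enumerated path by choosing inputs that force the corresponding branch decisions.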


Our model on the system tier is an ESD. The coverage hierarchy of node coverage, transition coverage, transition pair coverage, and longer sequence coverage (Beizer 1995) is associated with this model. In general, node coverage tells us almost nothing, because if the software has already been tested at a lower level, such as unit testing, it is likely that node coverage has been achieved. The essential issue in an ESD is the transitions among the states. Transition coverage, which requires all edges in the ESD to be covered at least once by the tests, is used for our model. Thus, as a minimum, all edges have to be covered at least once. It is also possible to adopt longer-sequence coverage in some specific circumstances.

The criteria of Memon et al. (2001) suffer from some serious limitations; specifically, they include many infeasible event sequences in the total set of event sequences. In the case study provided in Memon et al. (2001), 21,659 event sequences were generated up to a length of three, with 4,189 (or 19.3%) of them being infeasible. This percentage is likely to vary significantly between systems. Using the number of total event sequences to evaluate the coverage of a test suite can therefore be somewhat inaccurate. The criteria proposed in this paper do not have this problem, because the logical relationships between GUI objects are captured in the testing models. Second, the invocation relationships between components can be oversimplified using Memon's (2001) method. For example, one message box may be triggered through different states of a GUI. However, the coverage criteria defined in Memon et al. (2001) can be satisfied when the message box appears without any regard to the state of the GUI. Hence, crucial information will be lost during the testing process. The criteria in this paper solve this problem by including the states of the GUI objects in the testing models.

4 Empirical Investigations and Results

We performed two experiments on two different GUI systems in order to test the fault-detection capability of our proposed approach. The first experiment was performed on the simplified Notepad application. We injected a large number of faults into the program, then generated test cases using our approach and tried to discover the injected faults. WinCvs was used in our second experiment. This application is a GUI front end to CVS, a very popular and useful open-source version control system.

4.1 Case Study One

This experiment was performed on the Notepad program written in Java. It is the example application used throughout this paper to demonstrate the process of applying our proposed GUI testing strategy. We performed the following steps to apply our strategy to the testing process:

- Identify the components and create the corresponding flow-graphs. After analyzing all available information for the system, eight components were defined. For each of the components, a corresponding flow-graph was created.
- Construct the Event Sequence Diagram for the system tier. An ESD is used to represent the finite state machine model of the GUI system.
- Generate test cases according to the defined coverage criteria. The test cases were generated with respect to the models on the component and system tiers. On the component tier, test cases were designed to cover all branches on the flow-graph of each component. On the system tier, test cases were designed to cover all edges of the event sequence diagram of the system. As a result, 12 test cases for the component tier and four test cases for the system tier were created. The tool we used to record and play back the test script files was Rational Robot (2004). It should be noted that a test case script is an entire sequence of actions required to create a complete user scenario for the system—from start-up, through all of the actions, to shutdown. A test script can be decomposed into a series of individual test primitives that accomplish specific actions. The primitives can be combined in a specific sequence to provide a test script that verifies a unique usage scenario for the product. For the testing strategy in this paper, the test scripts on the component tier may be decomposed into primitives, which can contribute to the construction of test scripts for the system tier (see the sketch after this list). Doing so can reduce the effort of developing the test scripts on the system tier.
- Create the mutants for the experiment. These mutants hold the same meaning as the ones used in mutation testing. A mutant is killed when a test case causes it to fail (Offutt 1995). To date, this method appears to be commonly used to evaluate the effectiveness of test strategies and the adequacy of coverage criteria (Frankl et al. 1996; Briand et al. 2004; Andrews et al. 2005). In fact, Andrews et al. (2005) empirically demonstrated that, within the industrial system they examined, the mutants they injected were similar to the actual faults. Eighty mutants were created for our investigation. In general, in order to avoid interaction effects between faults, only one fault is seeded in the program at a time.
- Execute the test cases on the mutants. A "live" mutant is one that has passed all 16 test cases. Conversely, a "killed" mutant is one that failed at least one test case. We can measure the effectiveness of the test cases by the number of mutants killed divided by the total number of mutants generated. An "equivalent" mutant is one that is functionally equivalent to the original program. We must examine all live mutants to see whether they are equivalent mutants or not. When evaluating the effectiveness of a test suite, these equivalent mutants should not be counted, because they behave the same as the original program.
- Analyze and evaluate the results of the execution. After the execution of the test cases on all the mutants, the test results were analyzed to evaluate the fault-detection capability of our proposed approach to GUI testing.
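The reuse of component-tier primitives in system-tier scripts can be pictured as follows. This is a generic Java sketch of the structure only; Rational Robot records scripts in its own scripting language, and the primitive names below are invented for the Notepad example:

import java.util.List;

final class TestScripts {
    // A test primitive accomplishes one specific user action.
    interface Primitive { void run(); }

    // Component-tier primitives (bodies elided in this sketch).
    static final Primitive START_APP = () -> { /* launch the application */ };
    static final Primitive FILE_NEW  = () -> { /* File-New, handle Confirm Save */ };
    static final Primitive TYPE_TEXT = () -> { /* type into the text area */ };
    static final Primitive SAVE_FILE = () -> { /* drive the Save dialog */ };
    static final Primitive SHUTDOWN  = () -> { /* exit the application */ };

    // A system-tier script is a complete scenario, from start-up to shutdown,
    // composed from the primitives above.
    static void runScenario(List<Primitive> script) {
        script.forEach(Primitive::run);
    }

    public static void main(String[] args) {
        runScenario(List.of(START_APP, FILE_NEW, TYPE_TEXT, SAVE_FILE, SHUTDOWN));
    }
}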

In this experiment, the quantity and quality of the mutants significantly affect the effectiveness of the evaluation. In order to draw statistical inferences, a large population of mutants was required to be seeded into the system under test. Furthermore, the seeded faults should be similar to practical faults made by designers or developers during the software engineering process. Hence, in order to be systematic and as complete as possible, faults were seeded using mutation operators. Due to the limited available research on creating mutation operators for GUI systems, we decided to use two different methods to create and seed the faults. First, based on our previous programming experience with GUI software, we created and seeded faults that are commonly made during the design and development process. For example, we did not reset all the variables back to the default state when a new file is opened; thus, the undo function did not get updated correctly. Second, based on the four levels of faults discussed by Ma et al. (2002), we created various mutants to be seeded in our application. Ma et al. (2002) defined the fault levels as:

1. Intra-method level faults, which occur when the functionality of a method is implemented incorrectly.
2. Inter-method faults and (3) intra-class faults, which are made at the interaction between pairs of methods of a single class or between pairs of methods that are not part of a class.
4. Inter-class faults, which occur due to specific object-oriented features such as encapsulation, inheritance, polymorphism, and dynamic binding.

On the intra-method level, traditional mutation operators for procedural programs were adaptable for Java; these can be found in Alexander and Bieman (2002). On the inter-method and intra-class levels, the mutation operators proposed in Delamaro et al. (2001) were used in our experiment. Lastly, Ma et al. (2002) provided six groups of mutation operators for the inter-class level. It should be noted that our Notepad application is not a very complex system; hence, not all the mutation operators in the four categories discussed could be applied. After analyzing all the mutation operators and the Notepad system, we selected 11 mutation operators to create 74 mutants for the experiment. Adding six mutants designed by seeding the naturally occurring faults found during development, a total of 80 mutants were created. Table 2 shows concise descriptions of all the mutant operators. Some of the operators can be used on both the intra-method level and the inter-method level, because faults on different levels can be seeded when the operators are applied to different kinds of variables or constants.

Table 2 The mutant operators used in the experiment

1. Increment the value of constants by 1: for scalar constants, increment the value by 1. This operator can trigger level 1 faults.
2. Decrement the value of constants by 1: for scalar constants, decrement the value by 1. This operator can trigger level 1 faults.
3. Mutators for map: mutators are implemented for the Map object in Java. This operator can trigger level 1 faults.
4. Change the value of the constants used in the function: change the values of non-scalar constants. This operator can trigger level 1 faults.
5. Variable increment and decrement operators ("variable"): for scalar variables, increment or decrement their values. This operator can trigger level 1 faults.
6. Unary operator insertion: insert an arithmetic, logical, or bit negation operator before the use of each variable and constant. This operator can trigger level 1 faults.
7. Replace return statement operator: perturb the value returned by return statements. This operator can trigger level 1, 2, and 3 faults.
8. Replace the operators: replace the operators that influence the value of non-interface or interface variables. This operator can trigger level 1, 2, and 3 faults.
9. Inheritance (IPC): deletion of an explicit call to a parent's constructor. This operator can trigger level 4 faults.
10. Java features (JTD): deletion of the "this" keyword. This operator can trigger level 1 and 4 faults.
11. Common programming mistakes (EOC): replacement of reference comparison with content comparison. This operator can trigger level 1, 2, and 3 faults.
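For illustration, here is what two of these operators might produce on a hypothetical fragment of Notepad-like code; the fragment is invented for the example and is not taken from the actual application:

// Original (hypothetical) fragment:
class FindDialog {
    private String query;
    private int maxHistory = 10;

    FindDialog(String query) {
        this.query = query;
    }
}

// Operator 1 (increment the value of constants by 1) yields the mutant:
//     private int maxHistory = 11;
//
// Operator 10 (JTD, "this" keyword deletion) yields the mutant:
//     FindDialog(String query) { query = query; }
// which assigns the parameter to itself and leaves the field null.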


4.1.1 Test Execution Results and Analysis

To help speed up the testing process, we recorded the test cases using Rational Robot, then played back the scripts on the 80 mutants. The results of the experiment are shown in Table 3. Across the row labeled "Total," we can clearly see that a large proportion of the mutants was killed (90%). Upon closer inspection of the remaining 17 live mutants, we discovered 10 equivalent mutants. The remaining seven only caused changes in the appearance of certain GUI objects rather than in the functional behavior of the application. For example, one mutant caused the application background to stop being repainted when the Print Dialog (a modal dialog) was relocated. Other mutants caused minor bugs such as modal dialogs being changed to modeless dialogs, incorrect layout of the GUI objects, etc. Our test cases were not designed to verify the appearance of the GUI; hence, these mutants were not caught by our tests. These errors are hard to detect without manual inspection; testers should use a checklist (GUI Checklist 2004) to perform a visual inspection of the system to detect these types of bugs. Furthermore, we know of no testing approach, as opposed to an inspection approach, that would find these types of errors.

We will now compare the generated test cases against the coverage defined in Memon et al. (2001) in order to get a better understanding of our testing approach. The coverage metrics of our test cases are shown in Table 4; these metrics are the percentages of the total event-sequences covered by the test cases. The invocation and invocation-termination coverage metrics are shown in columns 1′ and 2′, respectively. Based on the data in the columns for event-sequence lengths 2 and 3, we can see that length 2 and 3 event sequences are only covered to a small percentage. We can also expect the coverage percentage to decrease dramatically as the length of the sequences increases. Yet, 90% of the faults seeded in the application were detected by our test cases. Now, we must ask whether the remaining 10% of the faults could be revealed by test cases with 100% coverage (using the coverage defined by Memon et al. (2001)). As discussed previously, these remaining faults only caused changes in the GUI's appearance, not its functionality; such faults are not easy to detect even with the methods proposed by Memon et al. (2001) and White and Almezen (2000). Thus, the answer to our question is "No." Usually in an industrial setting, testers have to pick the most effective test cases and discard low-value test cases because of time and money constraints. From the discussion above, we can see that our test cases and coverage criteria are very effective, and thus very practical.

Table 3 A summarization of the results

Fault found | Total mutants | Live mutants | Killed mutants | Equiv. mutants | Killed/total (%)
Development | 6 | 2 | 4 | 0 | 66.7
Intra-method, inter-method, intra-class | 54 | 7 | 47 | 3 | 92.2
Inter-class | 20 | 8 | 12 | 7 | 92
Total | 80 | 17 | 63 | 10 | 90

(Killed/total is computed against the non-equivalent mutants, i.e., killed / (total − equivalent).)


Table 4 The percentage of the event sequences covered by the test cases

Component name | 1' | 2' | 1 | 2 | 3
Main | 100 | 100 | 100 | 43.6 | 7
File open | 100 | 50 | 100 | 12.7 | 2
File save | 100 | 50 | 100 | 12.7 | 1.6
Print | 100 | 50 | 100 | 30 | 3.5
Font | 100 | 100 | 100 | 55 | 8.9
Find word | 100 | 100 | 100 | 66.7 | –

(Columns 1' and 2' give invocation and invocation-termination coverage; columns 1, 2 and 3 give coverage of event sequences of length 1, 2 and 3.)

4.1.2 Exploring the Validity of the Mutants

While results from other studies suggest that mutation provides a set of faults similar to industrial-strength faults, we explore the validity of this conjecture in this section. Simple, easy-to-find mutants should be killed by the majority of the test cases generated; hence, Fig. 9 shows the number of times each mutant is killed. Only a small number of mutants fall into this easy-to-find category (three out of the 80 mutants used). Looking at the distribution in Fig. 9, we quickly see that the majority of the mutants that are killed are killed by very few test cases.

[Fig. 9 The distribution of how often every mutant is killed. Bar chart; x-axis: number of times a mutant is killed (1 to 16); y-axis: frequency (0 to 30).]


Specifically, 63 mutants are killed, of which 44.4% (28/63) are found by only a single test case, and 16% (10/63) are found by only two test cases! Hence, while no absolute proof is possible, our results show that the mutants do not conform to any obvious detection patterns. Therefore, we conclude that, in general, the mutants used in this experiment are NOT trivial to locate, and that the majority of them appear to be "difficult" to find.

4.2 Case Study Two

The second experiment was performed on WinCvs (2004), a graphical front end to the popular CVS version control system. CVS (Concurrent Versions System) is a version control system that has been developed in the public domain since 1986. Using it, developers can record the history of their source files. For example, bugs sometimes creep in when software is modified, and developers may not detect these bugs until after they make the modification. With CVS, developers can easily retrieve old versions to see the changes that introduced the bugs. The biggest problem with CVS is that it uses a command-line interface. Since most developers prefer a graphical user interface, several graphical front ends to the CVS core have been developed; one of the best front ends for Windows systems is WinCvs. Many developers around the world depend on WinCvs to help speed up their development effort. WinCvs is currently being developed under the GPL license, and over 20 developers actively contribute to its development.

Our approach in this second experiment is to apply our testing strategy to WinCvs to see whether it interacts correctly with both the user and the underlying application. This second experiment provides a counter-point to experiment 1, as it gives insight into the approach applied to software used in the commercial world and developed by independent professional programmers.

We had to collect information about WinCvs through many different resources due to its open-source nature. We looked through the user manual for WinCvs, the user manual for CVS, and the mailing lists containing discussions between the users and the developers. We used WinCvs version 1.0, CVS core version 1.1 and Windows 2000 for this experiment.

A single novice tester, a graduate student who was independent from the research team, performed the testing. Unfortunately, WinCvs has very little documentation, and the tester had no prior experience with this system; hence, it took the tester 60 h to test the system. First, the tester took 15 h to study the CVS system and the CVS operations supported in WinCvs. Then 30 h were used to analyze the GUI objects, collect documentation, identify the responsibilities, create the flow graphs, produce the event sequence diagram, and create the test cases based on the coverage criteria. The test cases were executed and examined in the last 15 h.

Due to the lack of documentation associated with the open-source project, the tester was unable to discover all of the functionality implemented in WinCvs. As a result, the test models and test cases produced do not consider the complete WinCvs system, and hence the coverage of the entire system is not 100%. Nevertheless, thanks to the extensive amount of information gathered, approximately 90% of the functionality of the system was discovered by the tester. This figure is a retrospective estimate produced by one of the researchers, who reanalyzed the WinCvs documentation and discovered functionality that existed outside of the tester's models.
The tester created 30 test cases for the component tier, which covered all of the unique paths in all of the flow graphs that were created. For the system tier, four test cases were designed to traverse all the edges of the ESD at least once. Although the test cases designed allow the coverage criteria for both tiers to be met


(that is, 100% coverage for the subset of the system the tester recovered from the documentation), the coverage on both of these tiers is only at 90% because of the limitation noted above: the tester's failure to recover the entire set of implemented features. Hence, we consider that the tester executed the testing process perfectly after the initial production of the models. Additionally, after retrospective review, the researchers consider that the models themselves are consistent, just not complete.
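As a sketch of how the system-tier criterion could be checked mechanically, the following Java fragment reports the ESD edges left uncovered by a set of event-sequence test cases. This is our own illustration under stated assumptions: the ESD is modelled as a directed graph over component names, and the component names shown are hypothetical, not the tester's actual model.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EsdEdgeCoverage {

    // An ESD edge is an ordered pair of component names.
    record Edge(String from, String to) {}

    // Returns the ESD edges not exercised by any of the test sequences.
    static Set<Edge> uncoveredEdges(Set<Edge> esdEdges,
                                    List<List<String>> testSequences) {
        Set<Edge> uncovered = new HashSet<>(esdEdges);
        for (List<String> sequence : testSequences) {
            for (int i = 0; i + 1 < sequence.size(); i++) {
                uncovered.remove(new Edge(sequence.get(i), sequence.get(i + 1)));
            }
        }
        return uncovered;
    }

    public static void main(String[] args) {
        // Hypothetical ESD and one system-tier test sequence.
        Set<Edge> esd = Set.of(
                new Edge("Main", "File open"), new Edge("File open", "Main"),
                new Edge("Main", "Print"), new Edge("Print", "Main"));
        List<List<String>> tests = List.of(
                List.of("Main", "File open", "Main", "Print", "Main"));
        // Prints "[]" here: the single sequence traverses every edge.
        System.out.println("Uncovered ESD edges: " + uncoveredEdges(esd, tests));
    }
}
```

The criterion is met exactly when the returned set is empty for the tester's four system-tier test cases.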

Table 5 lists the faults and defects discovered by the test cases. Clearly, the possibility exists that further defects lie outside of the functionality described in the test models.

Table 5 Faults found in WinCvs

No. | Related responsibility | Description
1 | Editing a file | After a file in the work area has been modified, click the "Unedit selection" button. WinCvs will display a message box asking the user whether to revert the changes or not. However, when the user then clicks the "Edit selection" button and clicks the "Unedit selection" button again, WinCvs does not display any message box. WinCvs shows inconsistent behavior in this situation.
2 | Check the status of the files | When checking the status of files in a repository, the import time of a file shown in the status window differs from the actual import time and the operating system time. There is nowhere to adjust the time system in WinCvs.
3 | Check the difference before committing | When there is no difference between the compared revisions, the exit code is 0; when there is a difference between the two revisions, the code is 1. When CVS diff aborts, the exit status code is also 1. These exit codes should be different to avoid confusion.
4 | Print | The print function does not work in our test environment. We believe this problem is caused by our network setup coupled with the old printer driver we were using.
5 | Print preview | Starting the print preview multiple times can cause WinCvs to hang.
6 | Lock files | When a file is locked, it should be possible for the same user who locked it to unlock it. But in WinCvs, users cannot unlock files they locked themselves.
7 | Create a tag | In the "create a tag" dialog, if users do not select the check box "Tag also the attic of module folder," the attic folder should not be tagged; but WinCvs tags it anyway.

5 Conclusion

Our testing strategy breaks a GUI system down into a component tier and a system tier, allowing us to apply two different testing models to the same system. We also defined coverage criteria for GUI systems to complete our approach. We then performed two empirical investigations to examine the effectiveness of our strategy. From the experiments, we can conclude that our strategy is quite effective and practical to use. Most of the characteristics of the GUI are covered by our strategy, and our test cases are very effective at revealing bugs.

Our strategy also has two other benefits. First, by breaking the GUI system down into components, our approach is flexible under the cost and time pressures often found in software development companies. This approach to test case management is value-oriented and is very similar to the use of operational profiling (Musa 1998).


Second, regression testing can be accomplished easily with our approach. Modifications are made throughout the life cycle of a software system, and when the system is modified, either for bug fixes or for updates, all parts that have been modified should be re-tested for errors. These modified parts can easily be located, since our strategy has broken the entire system into different components. Additionally, by breaking the system into different components, the testing effort is also minimized, because we can restrict testing to only the affected components.

While the strategy proposed can be effective, it also has limitations. The modeling step requires a tester with some practical experience and domain knowledge of the GUI system. If the tester has no prior experience, then they will need additional time to study the GUI system; however, this limitation exists in most manual testing approaches. In order for all of the responsibilities to be identified, either detailed design or specification documents need to be available. If the system lacks documentation, then some responsibilities may be missed, which leads to incomplete coverage of the system under test, as happened in case study 2. Therefore, the approach would probably experience difficulties in integrating into "Agile" development approaches and is perhaps better suited to development approaches whose style can be described as more "Waterfall" based.

In our future research, we will attempt to automate the process of creating the testing model and generating the test cases. The possibility of converting the model into an expression, so that commercial test tools can use it, will also be investigated. Finally, we plan to explore the feasibility of applying our technique to web-based systems.

References

Alexander RT, Bieman JM (2002) Mutation of Java objects. In: Proceedings of the IEEE international symposium on software reliability engineering (ISSRE), pp 341–351
Andrews JH, Briand LC, Labiche Y (2005) Is mutation an appropriate tool for testing experiments? In: ICSE '05: Proceedings of the 27th international conference on software engineering. ACM, pp 402–411
Beizer B (1995) Black-box testing: techniques for functional testing of software and systems. Wiley, New York
Belli F (2001) Finite state testing and analysis of graphical user interfaces. In: Proc. ISSRE 2001, IEEE Computer Society, pp 34–43
Briand LC, Labiche Y, Wang Y (2004) Using simulation to empirically investigate test coverage criteria based on statecharts. In: Proc. 26th international conference on software engineering, Edinburgh, Scotland, United Kingdom, pp 86–95
Delamaro ME, Maldonado JC, Mathur AP (2001) Interface mutation: an approach for integration testing. IEEE Trans Softw Eng 27(3):228–247
El-Far IK, Whittaker JA (2002) Model-based software testing. In: Encyclopedia of software engineering, 2nd edn. Wiley, New York
Frankl PG, Weiss SN, Hu C (1996) All-uses versus mutation testing: an experimental comparison of effectiveness. J Syst Softw 38(3):235–253
Gerrard P (1997) Testing GUI applications. In: EuroSTAR'97, pp 24–28
GUI Checklist. http://www.members.tripod.com/~bazman/checklist.html; Accessed November 2004
Ma Y, Kwon Y, Offutt J (2002) Inter-class mutation operators for Java. In: 13th international symposium on software reliability engineering, Annapolis, MD, pp 352–363
Memon AM (2002) GUI testing: pitfalls and process. IEEE Computer 35(8):87–88
Memon A, Pollack ME, Soffa ML (2000a) Automated test oracles for GUIs. In: Proceedings of the eighth international symposium on the foundations of software engineering, San Diego, CA, pp 30–39
Memon A, Pollack M, Soffa ML (2000b) Plan generation for GUI testing. In: The 5th international conference on artificial intelligence planning and scheduling, Breckenridge, CO, pp 226–235
Memon A, Soffa ML, Pollack ME (2001) Coverage criteria for GUI testing. In: 8th European software engineering conference and 9th ACM SIGSOFT symposium on the foundations of software engineering (ESEC/FSE-9), pp 256–267
Musa J (1998) Software reliability engineering. McGraw-Hill, New York
Myers GJ (1979) The art of software testing. Wiley, New York


Myers BA, Olsen DR (1993) User interface tools. In: Proceedings of ACM INTERCHI'93 conference on human factors in computing systems, adjunct proceedings, tutorials, p 239
Offutt AJ (1995) A practical system for mutation testing: help for the common programmer. In: 12th international conference on testing computer software, Washington, DC, pp 99–109
Rational Robot. http://www-306.ibm.com/software/awdtools/tester/robot/; Accessed November 2004
White LJ (1996) Regression testing of GUI event interactions. In: Proceedings of the international conference on software maintenance, Washington, DC, pp 350–358
White L, Almezen H (2000) Generating test cases for GUI responsibilities using complete interaction sequences. In: International symposium on software reliability engineering, pp 110–121
White LJ, Almezen H, Alzeidi N (2001) User-based testing of GUI sequences and their interactions. In: 12th international symposium on software reliability engineering (ISSRE'01), Hong Kong, China, pp 54–63
WinCVS. http://www.wincvs.org; Accessed November 2004

Ping Li received her M.Sc. in Computer Engineering from the University of Alberta, Canada, in 2004. She is currently working for Waterloo Hydrogeologic Inc., a Schlumberger Company, as a Software Quality Analyst.

Toan Huynh received a B.Sc. in Computer Engineering from the University of Alberta, Canada. He is currently a PhD candidate at the same institution. His research interests include: web systems, e-commerce, software testing, vulnerabilities and defect management, and software approaches to the production of secure systems.


Marek Reformat received his M.Sc. degree from the Technical University of Poznan, Poland, and his Ph.D. from the University of Manitoba, Canada. His interests were related to simulation and modeling in the time domain, as well as evolutionary computing and its application to optimization problems. For three years he worked for the Manitoba HVDC Research Centre, Canada, where he was a member of a simulation software development team. Currently, Marek Reformat is with the Department of Electrical and Computer Engineering at the University of Alberta. His research interests lie in the application of Computational Intelligence techniques, such as neuro-fuzzy systems and evolutionary computing, as well as probabilistic and evidence theories, to intelligent data analysis leading to translating data into knowledge. He applies these methods to conduct research in the areas of Software Engineering, Software Quality in particular, and Knowledge Engineering. Dr. Reformat has been a member of the program committees of several conferences related to Computational Intelligence and evolutionary computing. He is a member of the IEEE Computer Society and ACM.

James Miller received the B.Sc. and Ph.D. degrees in Computer Science from the University of Strathclyde, Scotland. During this period, he worked on the ESPRIT project GENEDIS on the production of a real-time stereovision system. Subsequently, he worked at the United Kingdom’s National Electronic Research Initiative on Pattern Recognition as a Principal Scientist, before returning to the University of Strathclyde to accept a lectureship, and subsequently a senior lectureship in Computer Science. Initially during this period his research interests were in Computer Vision, and he was a co-investigator on the ESPRIT 2 project VIDIMUS. Since 1993, his research interests have been in Software and Systems Engineering. In 2000, he joined the Department of Electrical and Computer Engineering at the University of Alberta as a full professor and in 2003 became an adjunct professor at the Department of Electrical and Computer Engineering at the University of Calgary. He is the principal investigator in a number of research projects that investigate software verification and validation issues across various domains, including embedded, web-based and ubiquitous environments. He has published over one hundred refereed journal and conference papers on Software and Systems Engineering (see www.steam.ualberta.ca for details on recent directions); and currently serves on the program committee for the IEEE International Symposium on Empirical Software Engineering and Measurement; and sits on the editorial board of the Journal of Empirical Software Engineering.
