Automatic Test Generation for Database-Driven Applications

Zhenyu Dai
Amazon.com
705 5th Ave. S, Seattle, WA 98104
206-266-7865
[email protected]

Mei-Hwa Chen
SUNY at Albany, Computer Science Dept.
Albany, NY 12222
518-442-4283
[email protected]

Abstract
Database-driven software has been widely adopted in many application areas. In this type of software, the database is an integral part of the system. Traditional testing techniques have focused either on the software or on the databases, but have ignored the interactions between these two core components of the system. Recently, the importance of testing database-driven applications has gradually been recognized: testing cannot be considered complete until the software and its interactions with databases are adequately exercised. In this paper, we present an automatic test case generation technique for testing the interactions between applications and databases. The aim is to generate a test suite that fulfills a set of test adequacy criteria covering the def-use associations of database interactions and the boundary points. We present a tool that supports automatic generation of test cases, and a case study conducted to demonstrate the validity of the technique and the tool.
1. Introduction
A database-driven application is a program strongly coupled with external databases. A database is a persistent component of a database-driven application; the behavior of the application is highly dependent on the correctness of the databases and the interactions between the application and its databases. To ensure the reliability of database-driven applications, testing is crucial and must cope with the following complex situations: (1) Modern software is often deployed in a shared-resource execution environment in which more than one application may access the same database. Under such circumstances, when one application changes a shared database, other applications may be affected. Testing must take into account the shared database as well as all the applications using it. (2) Many database-driven applications are designed to be deployed in various execution environments, each of which may use a different database management system (DBMS). Although there are standard interfaces such as SQL, JDBC, and ODBC, interoperability between the applications and the DBMSs cannot be fully guaranteed. For example, Oracle enforces the "CHECK" constraint in a database schema while MySQL does not. Thus, testing database-driven applications in every deployment context is necessary. (3)
Databases are frequently updated, and each update may affect the applications that make use of the data; thus, after each update, all the applications using the database must be re-tested. To effectively and efficiently test database-driven applications under these complex situations, there is a need for a testing tool that can automatically generate test cases to fully exercise not only the applications and the databases but, more importantly, the interactions between them. The importance of testing database-driven applications has recently been recognized. Several unit-level testing techniques have been proposed [3] [6] [13]. These techniques focus on testing the database at the unit level to investigate the states of the database after the execution of a single SQL statement or a group of SQL statements. They were not designed to detect database faults affecting the application programs, or program faults propagating to databases and to other applications. To test the interactions between the applications and the databases, Kapfhammer and Soffa proposed a family of test adequacy criteria based on data-flow analysis of database entities [9]. These coverage criteria can be used to design test cases that exercise the dependences between the databases and the applications. Nevertheless, as in traditional dataflow testing, a significant amount of effort is required to generate test cases that fulfill the coverage requirement if a tool or technique for automatic test case generation is lacking. Existing techniques for automatic test case generation include symbolic execution [1] [4] [10], execution-based techniques [7] [8] [11], and model-based approaches [2] [5] [12]; however, none of these techniques accounts for database-driven applications. In this paper, we present an automatic test case generation technique designed to fully exercise the interactions between applications and their databases. Our test model covers a set of database-related dataflow coverage criteria by extending the concept of symbolic execution [10], and selects inputs from the boundary values of critical points. We developed an automatic test case generation tool implementing the technique to demonstrate its feasibility, and conducted a case study using this tool to show the potential strengths of the technique. The remainder of this paper is organized as follows: Section 2 describes our test model; the technique for test case generation is presented in Section 3; an automatic test case generation tool is shown in Section 4; a case study is presented in Section 5; and we give our conclusions in Section 6.
2. The test model
A database-driven application normally contains many database interaction operations. A database interaction operation can be viewed as the computation of a (partial) function from the input space I to the output space O, with the database state (i.e., the contents of the database) being a part of I, of O, or of both [3]. Our test model targets database-related faults that cause the output of database interaction operations to depart from their specifications. Three test coverage criteria are proposed to guide the selection of test cases to expose database-related faults: all-DIPs, all-DUs, and all-constraints. We use a database entity flow analysis to identify all-DIPs and all-DUs, and a boundary analysis to cover all-constraints. Figure 1 shows a high-level view of the test model. It takes the database schema in the SQL data definition language (DDL) and the program source code as inputs, and conducts DE-dataflow analysis and boundary value analysis. The test case generator then generates test cases, each of which includes values for both the input parameters and the database state. All these steps in the model are automated in a tool.
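For concreteness in the sketches that follow, assume a small hypothetical schema (ours, not the paper's case study) whose constraints are exactly the kinds the analyses below consume:

import java.sql.*;

public final class ExampleSchema {
    // Hypothetical DDL input to the test model. The NOT NULL, UNIQUE,
    // DEFAULT, and CHECK constraints are what the boundary value analysis
    // in Section 2.3 mines for boundary values.
    static final String STUDENT_DDL =
        "CREATE TABLE student ("
      + "  id       INT PRIMARY KEY,"
      + "  name     VARCHAR(30) NOT NULL,"
      + "  email    VARCHAR(50) UNIQUE,"
      + "  standing VARCHAR(10),"
      + "  credits  INT DEFAULT 0,"
      + "  gpa      DECIMAL(3,2) CHECK (gpa >= 0.0 AND gpa <= 4.0)"
      + ")";
}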
[Figure 1: Testing Model. The database schema and program source are inputs to the DE-dataflow analysis and the boundary value analysis; both analyses guide the test adequacy criteria used by the test case generator, which produces test cases for automatic execution and validation.]
In the following, we first describe the types of database-related faults, and then we describe the database entity flow analysis and the boundary value analysis.
2.1 Database-related faults
Database-related faults can, in general, be classified into two categories by the two types of failures they can cause: (1) database integrity failures; (2) output inconsistency failures.
Database integrity failure: In [9], Kapfhammer and Soffa defined four types of failures relevant to database integrity. The first two types are related to database validity. A program can violate the validity of one of its databases if (1-v) it inserts a record into the database that does not reflect the real world, or (2-v) it fails to insert a record into the database when the status of the real world changes. The last two types are related to database completeness. A program can violate the completeness of one of its databases if (1-c) it deletes a record from the database while the record still reflects the real world, or (2-c) the status of the real world changes and the program fails to include this information as a record in the database. We adopt their definitions of database integrity failures with minor modifications. Under the assumption that when the real world changes, the user conducts a corresponding database interaction operation to commit these changes to the database, type (2-v) can be defined as an insertion operation that fails to insert a record to reflect the real world, and type (2-c) can be defined as a deletion operation that fails to delete a record to reflect the real world. Moreover, in addition to insertions and deletions, modifications of records can also violate database integrity. Thus, we define database integrity failures as the following two types: (1) an operation changes the database to a state that is inconsistent with the real world, which covers types 1-v and 1-c; (2) an operation fails to change the database content to reflect the real world, which covers types 2-v and 2-c.
Output inconsistency failure: A database interaction operation normally renders some outputs to the user based on the contents of the database. For example, if a student conducts a "personal information retrieval" operation, the application should display the student's personal information on the screen; and if an advisor wants to check whether a student has finished a specific course, the application should respond with a "yes" or "no" according to that student's record in the database. An output inconsistency failure occurs when the outputs of a database interaction operation are inconsistent with the database content. For example, if the application responds with a "no" while the student has indeed finished the course, it is an output inconsistency failure.
2.2 Dataflow analysis for database entities
The purpose of database entity dataflow (DE-dataflow) analysis is to investigate where a database entity is defined and where the defined value of the database entity is subsequently referenced. A statement contains a definition of a database entity if it inserts, deletes, or modifies that entity, and a statement contains a use of a database entity if it references that entity. A statement containing a definition or use of a database entity is called a database interaction point (DIP).
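As a hypothetical JDBC illustration (the method and column names are ours, not from the paper), each SQL-executing statement below is a DIP: the SELECT is a use of the student entity, the UPDATE is a definition of it, and the later reference to the program variable gpa is what Section 2.2 calls a content-use.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GpaService {
    public String classify(Connection conn, int studentId) throws SQLException {
        double gpa;
        // DIP 1: a *use* of the student entity (the SELECT references it).
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT gpa FROM student WHERE id = ?")) {
            ps.setInt(1, studentId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                gpa = rs.getDouble("gpa");  // entity content flows into 'gpa'
            }
        }
        // A *content-use*: the variable 'gpa' carries database content and is
        // referenced in a statement away from the interaction point itself.
        String standing = (gpa >= 3.5) ? "honors" : "regular";

        // DIP 2: a *definition* of the student entity (the UPDATE modifies it).
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE student SET standing = ? WHERE id = ?")) {
            ps.setString(1, standing);
            ps.setInt(2, studentId);
            ps.executeUpdate();
        }
        return standing;
    }
}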
Database integrity failures are caused by incorrect modifications of a database, which usually take place at database interaction points. Furthermore, database entities are the persistent components of an application and can therefore exist before and after the execution of the application program. Accordingly, we define the root node of the control flow graph of the application program as the pseudo-definition of all the database entities. A database entity or its content may be assigned to a program variable, and this variable may be used in a statement different from the statement that references the database entity. Therefore, we introduce the notion of a content-use, which is a statement where a program variable representing a database entity (or some attribute(s) of this entity) is used. The all-DIPs criterion requires the test set to cover all the database interaction points in the program, and all-DUs requires that, in addition to covering all DIPs, the test set cover all the paths between any def-use, def-contentUse, pseudoDef-use, and pseudoDef-contentUse pairs of any database record in the program. Thus, the all-DUs criterion subsumes all-DIPs. The coverage of all the def-use and pseudoDef-use pairs is intended to reveal the faults causing database integrity failures, since every possible change to the database is tested together with a subsequent use of the changed content through all possible paths, which gives a good chance of detecting any incorrect database modification. The coverage of all the def-contentUse and pseudoDef-contentUse pairs is intended to reveal the faults causing output inconsistency failures.
2.2.1 DE-dataflow based path selection
To generate test cases that fulfill the all-DIPs and all-DUs criteria, we use a depth-first search to perform dataflow analysis for database entities and then select paths in the application's system control flow graph (CFG) that satisfy the two test adequacy criteria. The CFG is annotated with DE-dataflow information during the search. In the depth-first search, we maintain a definition/pseudo-definition set that contains all the record definitions/pseudo-definitions with their CFG node numbers on the current path. Every time a use or content-use node is visited in the search, if it matches any entry in the definition/pseudo-definition set with a definition-clear path, the current control flow path is selected. We also select paths to cover all the DIPs that are not covered by any def/pseudoDef and use/contentUse pair. A selected path starts from the root node of the system CFG and ends at a database-record use or content-use node. The following steps are performed to select the paths (a sketch follows the list):
Step 1. We maintain a definition/pseudo-definition set for the current path during the search. Specifically, when a node is added to (or removed from) the current path during the search, if this node is a def/pseudo-def of a database record, we insert this node into (or delete it from) the set with the corresponding def/pseudo-def information.
Step 2. If a visited node is a use/content-use of a database record, we find all the nodes in the def/pseudo-def set that have a definition-clear path to the current node, and mark the corresponding paths as selected paths.
Step 3. After the depth-first search is completed, we perform a second depth-first search. In this run, when a DIP node that is not covered by the paths selected in the first run is visited, we mark the current path (from the root node to the current node) as a selected path. This ensures that every DIP is covered, because some DIPs may not be in any pair of def/pseudo-def and use/content-use.
To determine the number of iterations in the presence of loops during path selection, we use a technique common in traditional dataflow analysis, which assumes a loop can iterate at most two times. This assumption uses "two" to represent multiple iterations, and thus gains efficiency in testing by sacrificing some completeness.
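A minimal sketch of the first search pass, under stated assumptions: the CFG classes are ours, the def-clear check is simplified to pairing a use with any matching definition already on the current path, and Step 3's second pass over uncovered DIPs is omitted.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CfgNode {
    List<CfgNode> successors = new ArrayList<>();
    boolean isDef, isPseudoDef, isUse, isContentUse;
    String entity;  // database record this node defines or uses, if any
}

class PathSelector {
    final List<List<CfgNode>> selected = new ArrayList<>();
    private final Deque<CfgNode> path = new ArrayDeque<>();

    void dfs(CfgNode n) {
        path.addLast(n);
        // Steps 1-2: instead of keeping a separate def/pseudo-def set, this
        // sketch scans the current path; a use or content-use matching a def
        // or pseudo-def of the same record selects the current path.
        if ((n.isUse || n.isContentUse) && n.entity != null) {
            for (CfgNode d : path) {
                if ((d.isDef || d.isPseudoDef) && n.entity.equals(d.entity)) {
                    selected.add(new ArrayList<>(path));  // root .. current
                    break;
                }
            }
        }
        for (CfgNode s : n.successors) {
            // Loop bound from the text above: a node may appear at most
            // twice on a path, i.e., loops iterate at most two times.
            if (path.stream().filter(x -> x == s).count() < 2) {
                dfs(s);
            }
        }
        path.removeLast();  // backtrack
    }
}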
2.3 Boundary value analysis
Since database integrity failures may also be caused by violations of database constraints, simply covering a path may not expose the fault. The all-constraints criterion complements all-DUs and all-DIPs to catch constraint-violation faults: it requires that, for each "definition"-type DIP, the test set exercise all the conditions ensuring that the DIP does not violate any database constraint, and exercise all the possible database-constraint violation conditions for this DIP. The database constraints can be classified into four types: (1) database attribute type and length constraints; (2) "UNIQUE", "NOT NULL", and "DEFAULT" constraints; (3) primary key and foreign key constraints; (4) constraints in "CHECK" statements. To satisfy the all-constraints criterion, we conduct boundary value analysis to complement our DE-dataflow analysis in test case generation. Boundary values are the boundaries of the input domain that satisfies the database constraints. We first find the boundary values for both database entities and input parameters based on the database constraints, and then partition the input domain into sub-domains based on the boundary values. In test case generation, we generate test cases to exercise the sub-domains and the boundaries of the sub-domains. Boundary value analysis focuses on the boundary of the input space to identify test cases; the rationale behind boundary value testing is that "errors tend to occur near the extreme values of an input variable." In boundary value analysis, a boundary value can also be used to partition the input domain into sub-domains, and each sub-domain should be tested at least once. We define a boundary value for a variable or database attribute as a value such that the program may behave differently (or produce different results) in the following three cases: (1) the variable/attribute is assigned this value; (2) the variable/attribute is assigned a larger value; (3) the variable/attribute is assigned a smaller value. In a database-driven application, boundary values for database attributes can be obtained in three ways: (1) from the database constraints in the database schema; (2) from the SQL queries in the application; (3) from the "general boundary values" of the different data types.
We use the following three rules to derive boundary values for database attributes from a database schema: (1) the value specified by a "DEFAULT" constraint should be a boundary value of the corresponding attribute; (2) the values specified in "CHECK" statements should also be boundary values; (3) if an attribute is a string, the length of this attribute should also have boundary values. In addition to the boundary values found in the database schema, the database attributes should also have "general boundary values" that depend on their data types: (1) "NULL" should be a boundary value for all data types; (2) "0" should be a boundary value for numeric types (integer, double, float, etc.); (3) "TRUE" and "FALSE" should be boundary values for the Boolean type; (4) the empty string should be a boundary value for a string type (i.e., a string of length 0). After we find the boundary values for all the database attributes, the boundary values for the input parameters can be easily found. First, we introduce the notion of a "host variable." Host variables are the "variables in the host language that are used as parameters in SQL queries" [3]. If an input parameter is a host variable, its boundary values are the same as those of the corresponding database attribute; if not, its boundary values are just the "general boundary values." In [3], Chays et al. proposed a similar approach to finding boundary values for database attributes, but their analysis is limited to the "CHECK" statements in the database schema and the SQL queries in the program.
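A sketch of how the three schema rules plus the general boundary values could be realized for a single attribute; the Attribute record and its fields are our assumptions about the intermediate representation, not the tool's actual interface.

import java.util.ArrayList;
import java.util.List;

class Attribute {
    String name;
    String type;          // e.g. "INT", "DECIMAL", "VARCHAR", "BOOLEAN"
    Integer length;       // declared VARCHAR(n) length, null otherwise
    String defaultValue;  // value of a DEFAULT constraint, null if none
    List<String> checkValues = new ArrayList<>(); // literals in CHECK clauses
}

class BoundaryValues {
    static List<String> forAttribute(Attribute a) {
        List<String> b = new ArrayList<>();
        // Rule 1: DEFAULT values are boundary values.
        if (a.defaultValue != null) b.add(a.defaultValue);
        // Rule 2: literals appearing in CHECK constraints are boundary values.
        b.addAll(a.checkValues);
        // Rule 3: string attributes get a boundary value on their length.
        if (a.type.startsWith("VARCHAR") && a.length != null) {
            b.add("string of length " + a.length);
        }
        // General boundary values by data type.
        b.add("NULL");                                           // all types
        if (a.type.equals("INT") || a.type.equals("DECIMAL")) b.add("0");
        if (a.type.equals("BOOLEAN")) { b.add("TRUE"); b.add("FALSE"); }
        if (a.type.startsWith("VARCHAR")) b.add("");             // empty string
        return b;
    }
}

For the gpa attribute of the hypothetical schema above, this would yield the CHECK literals 0.0 and 4.0 together with the general values NULL and 0.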
3. Test Case Generation
In this section, we present a technique for automatic test case generation, which generates test cases that not only exercise the selected paths satisfying the first two test adequacy criteria but also cover the all-constraints criterion. Each test case includes all the input parameter values and a database state. For each selected path, we calculate the generalized path condition (GPC); the notion of a GPC is inherited from the symbolic execution methodology proposed in [14]. In symbolic execution, symbolic values are used instead of real values for the input parameters, the values of program variables are represented by symbolic expressions during execution, and a path condition (PC) is calculated. The GPC extends the PC by treating the database state as a special input parameter: it is the condition that both the normal and the special input parameters must satisfy for a specific path to be executed, and it is calculated by combining the initial conditions and all the branch conditions along a path with a logical "AND".
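In a simplified form (our encoding, not the paper's implementation), a GPC can be represented as the conjunction of the branch conditions collected along a selected path, over both ordinary parameters and symbolic database-state variables:

import java.util.ArrayList;
import java.util.List;

class GeneralizedPathCondition {
    // Each element is one branch condition over input parameters and/or
    // symbolic database-state variables, e.g. "credits >= 0" or
    // "db.student.gpa <= 4.0" (the "db." naming is our convention).
    private final List<String> conjuncts = new ArrayList<>();

    void and(String branchCondition) {    // conjoin one branch condition
        conjuncts.add(branchCondition);
    }

    @Override public String toString() {  // GPC = c1 AND c2 AND ... AND cn
        return String.join(" AND ", conjuncts);
    }
}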
The test cases satisfying the GPC and covering the all-constraints criterion are generated using a sub-domain generation technique. In this technique, the domain of each input parameter defined by the GPC is first partitioned into several sub-domains by the boundary values of that input parameter. Then one value is randomly chosen strictly within each sub-domain (i.e., not on the boundary of any sub-domain) and is put into the candidate value set of that input parameter. The intuition is that values from different sub-domains may have different violation conditions for a database constraint. The boundaries of all the sub-domains within the GPC should also be put into the candidate value set, whether the boundaries are open or closed, since boundaries are usually more fault-sensitive. A test case can be generated by extracting one value from each input parameter's candidate value set. We provide four value-combination rules that can be used in sub-domain generation: (1) all the candidate value combinations should be exhausted; (2) each candidate value should appear at least once; (3) for every two input parameters, every pair of candidate values should appear at least once (this rule is called pair-wise coverage in [5]); (4) the user provides a normal test case that represents the normal running of the application; in each generated test case, one and only one input parameter has a value different from the normal test case, and each candidate value should appear at least once in the test set. The first rule generates the largest number of test cases and the second rule generates the smallest number of test cases. A sketch of the candidate-value selection and of rule (2) is given at the end of this section. The sub-domain generation cannot be directly used when the GPC is non-statically defined, i.e., when the domain of some input parameter defined by the GPC depends on the values of other input parameters. For example, in GPC "x>y and x
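Returning to the statically defined case, here is a minimal sketch of candidate-value selection and of combination rule (2); the class and method names are ours, the midpoint stands in for the random interior choice, boundaries are assumed to lie within [lo, hi], and candidate lists are assumed non-empty.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SubDomainGeneration {
    // Candidate values for one parameter: every boundary value inside the
    // GPC-defined domain, plus one interior value per sub-domain.
    static List<Double> candidates(List<Double> boundaries, double lo, double hi) {
        List<Double> b = new ArrayList<>(boundaries);
        b.add(lo); b.add(hi);
        Collections.sort(b);
        List<Double> c = new ArrayList<>(b);             // the boundaries themselves
        for (int i = 0; i + 1 < b.size(); i++) {
            double left = b.get(i), right = b.get(i + 1);
            if (right > left) c.add((left + right) / 2); // strictly inside sub-domain
        }
        return c;
    }

    // Combination rule (2): each candidate value of each parameter appears at
    // least once; shorter candidate lists repeat their last value as padding.
    static List<List<Double>> eachValueOnce(List<List<Double>> perParam) {
        int rows = perParam.stream().mapToInt(List::size).max().orElse(0);
        List<List<Double>> tests = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            List<Double> t = new ArrayList<>();
            for (List<Double> vals : perParam) {
                t.add(vals.get(Math.min(r, vals.size() - 1)));
            }
            tests.add(t);
        }
        return tests;
    }
}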