Systematic Testing of Database Engines Using a Relational Constraint Solver

Shadi Abdul Khalek
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX, USA
[email protected]

Abstract—We describe an automated approach for systematic black-box testing of database management systems (DBMS) using a relational constraint solver. We reduce the problem of automated database testing to generating three artifacts: (1) SQL queries for testing, (2) meaningful input data to populate test databases, and (3) expected results of executing the queries on the generated data. We leverage our previous work on ADUSA and the Automated SQL Query Generator to form high-quality test suites for testing DBMS engines. This paper presents a detailed description of our framework for automated SQL query generation using the Alloy tool-set, and experimental results of testing database engines using our framework. We show how the main SQL grammar constraints can be solved by translating them to Alloy constraints to generate semantically and syntactically correct SQL queries. We also present experimental results of combining ADUSA and the Automated SQL Query Generator, and of applying our framework to test the Oracle 11g database. Our framework generated 5 new queries which reveal erroneous behavior of Oracle 11g.

I. INTRODUCTION

Database management systems (DBMSs) have been in wide use for decades, and they are steadily growing in complexity and size. At the same time, reliability is becoming a more vital concern; the cost of user data loss or incorrect query processing can be prohibitively expensive. DBMS testing, in general, is a labor-intensive, time-consuming process, often performed manually. For example, to test the correctness of a query execution, the tester must populate the database with interesting values that enable bug discovery, and manually check the execution result of the query against the input data. Automating DBMS testing not only reduces development costs, but also increases the reliability of the developed systems.
This paper presents a novel SAT-based approach to automate systematic testing of database management systems. There are three fundamental steps in testing a DBMS: (1) generating test queries with respect to a database schema, (2) generating a set of test databases (tables), and (3) generating oracles to verify the result of executing the queries on the input databases using the DBMS. Previous work has addressed each of these three steps, but largely in isolation from the others [4], [5]. While a brute-force combination of existing approaches to automate DBMS

Sarfraz Khurshid
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX, USA
[email protected]

testing is possible in principle, the resulting framework is unlikely to be practical: it would generate a prohibitively large number of test cases, a high percentage of which are redundant or invalid, and hence represent a significant amount of wasted effort. Some approaches, such as [3], target generating queries with cardinality constraints. Integrating query generators with data generators, however, is still either specialized [5] or sometimes not possible [3]. Several academic and commercial tools target the problem of test database generation [6], [7], [19], [24]. Nevertheless, they support neither query generation nor test oracle generation. Recent work on query-aware input generation [16] takes a parameterized SQL query as input and produces input tables and parameter values, but does not generate an oracle. Recent approaches introduced query-aware database generation [9], [11]. These approaches use information from queries to constrain the data generator to produce databases that yield interesting results upon query execution. Query-aware generation is gaining popularity in both DBMS and database application testing [21], [23], but it requires providing queries manually.
The insight of our work is that a relational engine backed by SAT provides a sound and practical basis for a unified approach that supports all three fundamental steps in DBMS testing and allows generation of a higher-quality test suite: the queries generated are valid, the database states generated are query-aware, and the expected outputs represent meaningful executions. Thus, each test case checks some core functionality of a DBMS.
We have leveraged our previous work on automated SQL query generation and ADUSA to provide a framework that fully automates systematic testing of database management systems. Syntactically and semantically valid SQL queries, combined with query-aware data generation and expected-output oracles, form the backbone of our framework. The framework builds on these and automates the validation of each test suite on a DBMS, reporting any errors along with the information needed to reproduce them. In this paper we make the following contributions:
• Framework for DBMS testing. We present a framework leveraging Alloy and the Alloy Analyzer to produce SQL queries, test data, and the expected results of executing the queries on the data sets.
• Extensible Alloy library for SQL and DBMS testing. We provide the Alloy models which create the skeleton for SQL queries and database schemas. These models can be extended to cover additional parts of the SQL grammar and to add user constraints or specifications for more specific test suites. Users can also use the models as an example for other grammar-based testing techniques using SAT solvers.
• Experiments. We conduct experiments applying the framework to the Oracle 11g database management system. We generate SQL queries based on different subsets of the SQL grammar and automatically verify the output of the DBMS. Our experiments successfully identified queries which result in erroneous output.

II. EXAMPLE

In this section we illustrate our testing approach by applying our framework to a test database schema. We start by describing the input database schema and show how the framework produces the corresponding Alloy specifications, which generate (1) SQL queries to test the database, (2) input test data to populate the database, and (3) the expected output of running each query on every set of test data.

CREATE TABLE students(
  id int,
  name varchar(50)
);

CREATE TABLE grades(
  studentID int,
  courseID int,
  grade int
);

Figure 1. Example database schema.

Let us consider the sample database schema shown in Fig. 1. These SQL statements create two relations (also known as tables): (1) a students table with two attributes, id of type int and name of type varchar; (2) a grades table with three attributes, studentID of type int representing a student ID number, courseID of type int representing a course ID number, and grade of type int representing the grade which the student earned in that course.
The first step in our framework is to automatically generate valid SQL queries for testing. Let us consider a subset of the SQL grammar consisting of selecting up to two table attributes from either one table or two cross-joined tables. The terminal strings of the grammar are the table names and attribute names: students, grades, id, name, studentID, and courseID. In addition, the grammar allows the use of aggregate functions when selecting a field; we consider the MAX and MIN aggregate functions in this example. Below is the grammar of SQL queries that we consider in this example:

QUERY      ::= SELECT FROM
SELECT     ::= ’SELECT’ selectTerm+
FROM       ::= ’FROM’ (table | table JOIN table)
selectTerm ::= term | agg(term)
table      ::= ’students’ | ’grades’
term       ::= ’id’ | ’name’ | ’studentID’ | ’courseID’ | ’grade’
agg        ::= ’MAX’ | ’MIN’

SELECT courseID, studentID FROM GRADES, STUDENT;
SELECT MAX (courseID), MAX (studentID) FROM GRADES, STUDENT;
SELECT MIN (courseID), MIN (studentID) FROM GRADES, STUDENT;
SELECT courseID FROM GRADES, STUDENT;
SELECT MAX (courseID), MIN (NAME) FROM GRADES, STUDENT;
SELECT MAX (NAME), MIN (courseID) FROM GRADES, STUDENT;
SELECT courseID, MIN (NAME) FROM GRADES, STUDENT;
SELECT courseID, MAX (NAME) FROM GRADES, STUDENT;
SELECT NAME, MAX (courseID) FROM GRADES, STUDENT;
SELECT courseID, NAME FROM GRADES, STUDENT;
SELECT NAME, MIN (courseID) FROM GRADES, STUDENT;
SELECT courseID, MIN (courseID) FROM GRADES;
SELECT studentID, MIN (studentID) FROM GRADES;
SELECT MAX (NAME), MIN (id) FROM STUDENT;
SELECT MAX (id), MIN (NAME) FROM STUDENT;
SELECT id, MAX (NAME) FROM STUDENT;
...

Figure 2. Sample SQL queries generated by our approach.
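For illustration, the grammar above could be encoded as Alloy signatures roughly as follows. This is a minimal sketch in the spirit of the models presented in Section III; the declarations in our actual library differ in detail, and the exact signature names here are assumptions:

abstract sig Table {}
one sig students, grades extends Table {}

abstract sig Field {}
one sig id, name, studentID, courseID, grade extends Field {}

abstract sig Agg {}
one sig MAX, MIN extends Agg {}

-- a selected term: a field, optionally wrapped in an aggregate
sig term {
  field: one Field,
  agg: lone Agg
}

one sig SELECT { fields: some term }
one sig FROM { tables: some Table }

The Alloy Analyzer's scope then bounds the number of term atoms and Table atoms per instance, which corresponds to the limit of two SELECT terms and two FROM tables used in this example.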

After automatically generating the complete Alloy model for this SQL grammar, the Alloy Analyzer, based on SAT, converts all Alloy formulas into Boolean formulas and enumerates all solutions satisfying the model. We run the output through our concretization program to convert Alloy instances into complete SQL queries. For the grammar in this example, considering up to two SELECT terms, up to two FROM tables, and two aggregate functions, we get 186 unique, non-isomorphic SQL queries (two queries are considered isomorphic here if they differ only in the order of the SELECT terms or the FROM tables), which is what we would expect. Fig. 2 shows a sample subset of the SQL queries generated by our approach for this example.
For each SQL query generated, ADUSA automatically generates test inputs and a test oracle to verify the output of the query upon running it on a database management system. ADUSA [1] is a framework for query-aware test generation. It uses the Alloy Analyzer, which in turn uses SAT, to generate test data and test oracles. The use of Alloy enables specifying constraints on both the query and its results, which enables more precise test input generation. Using the same students table schema described above, the framework adds Alloy constraints to model the relational properties between the tables of the database schema, such as primary and foreign key constraints. The following Alloy specification models the students schema (each row maps an id to exactly one name):

one sig student {
  rows: Int -> varchar
} {
  all x: rows.varchar | one x.rows
}
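The paper shows only the students model; a table with more columns would be modeled with a higher-arity relation. The following is a sketch of our own (not taken from ADUSA's actual output) of how the grades table might be encoded under the same convention, assuming the (studentID, courseID) pair acts as a key:

one sig grades {
  -- each row maps a studentID and a courseID to a grade
  rows: Int -> Int -> Int
} {
  -- at most one grade per (studentID, courseID) pair
  all s: Int, c: Int | lone c.(s.rows)
}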

ADUSA continues by adding Alloy functions to model the SQL query under test. Each section of the SQL statement is modeled as a separate Alloy function which enforces the constraints on the data selected by the query. For example, the following Alloy paragraphs model the query SELECT id FROM STUDENT;:

fun query () : Int {
  {select[from[student.rows]]}
}
fun select (rows: Int -> varchar) : Int {
  {rows.varchar}
}
fun from (rows: Int -> varchar) : Int -> varchar {
  {rows}
}
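Further clauses compose in the same way. As a sketch of our own (not ADUSA output; the function name and the literal bound of 2 are assumptions), a WHERE clause such as id > 2 could be modeled as one more filtering function, with the query function becoming select[whereClause[from[student.rows]]]:

-- keep only the rows whose id is greater than 2
fun whereClause (rows: Int -> varchar) : Int -> varchar {
  { i: Int, v: varchar | i -> v in rows and i > 2 }
}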

Functions (fun) in Alloy represent named expressions. A fun paragraph takes a relation as input and returns a relation of the same or a different arity. As the SQL queries get more complicated, ADUSA adds more functions and predicates to model the WHERE, GROUP BY, and HAVING clauses.
After generating the Alloy specification for the schema and the query, ADUSA uses the Alloy Analyzer to find database instances that satisfy the specification, i.e., valuations for the types and relations that satisfy all the constraints. The Alloy Analyzer finds all the instances within a given scope, i.e., a bound on the number of data elements of each type to consider during generation. Below is an example of an instance generated by the Alloy Analyzer:

varchar: {varchar$0, varchar$1}
student: {student$0}
rows: {(1, varchar$1), (2, varchar$0), (4, varchar$1)}
$query: {1, 2, 4}

In the above instance, the varchar set has two elements, labeled varchar followed by an element number. The rows relation consists of three tuples, each of which represents a row in the student table. The $query set represents the result of executing the query on the rows relation. Since the example query is SELECT id FROM STUDENT;, the $query set holds the id attribute of the tuples; thus it is the set of integers {1, 2, 4}.
After generating the Alloy instance, ADUSA translates it into INSERT SQL statements that are used to populate an empty database. For example, for the above Alloy instance, ADUSA identifies the rows relation and generates the following SQL statements:

INSERT INTO student VALUES (1, varchar$1)
INSERT INTO student VALUES (2, varchar$0)
INSERT INTO student VALUES (4, varchar$1)
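The number of instances enumerated in this way is governed by the Alloy scope. The paper does not show the exact command ADUSA issues; a hypothetical command matching the scope used in Section IV (3 varchar atoms, 4-bit integers) would look like this:

pred show {}
-- default scope of 3 atoms per signature, with a 4-bit integer bitwidth
run show for 3 but 4 Int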

Once the database is populated with data, the given SQL query is executed on the DBMS and the result is verified against the one found in the Alloy instance. The process is repeated for each generated instance as well as for each generated SQL query.

III. FRAMEWORK

In this section, we discuss the general algorithm of our approach. We describe the integration of the automated SQL generator with ADUSA to create a systematic, fully automated testing framework. The full framework models the syntax and semantics of both SQL queries and relational databases without considering specific DBMS implementation details.

A. Framework Outline

Figure 3 shows the complete framework for DBMS testing. Boxes represent the processing modules; ovals represent

the inputs and outputs of these modules. The main components of the framework are as follows:
1) The Automated SQL Query Generator component, which models the syntax and semantics of valid SQL queries. It includes two main modules:
   a) The SQL Model Generator module, which generates an Alloy specification by modeling the user requirements in addition to the main syntactic and semantic requirements of valid SQL queries.
   b) The Alloy2SQL_Query Translator module, which translates the Alloy instances into SQL query statements that are used to test the database management systems.
2) The ADUSA component, which automates the generation of test databases and the expected output for a given test SQL query. It comprises the following sub-modules:
   a) The SQL2Alloy module, which generates an Alloy specification by translating a given set of SQL statements representing a database schema and a test query.
   b) The Alloy2SQL module, which translates the Alloy instances into SQL statements that are used to populate the database on the DBMS under test.
3) The Alloy Analyzer module, which generates instances that satisfy the translated Alloy specification. It uses off-the-shelf SAT solvers to find solutions for the formulas generated from the Alloy specifications. Both of the previous components use the Alloy Analyzer to find and enumerate solutions for the generated Alloy models.
4) The Verifier module, which automates the process of loading the test data into the databases under test and comparing the DBMS results with the Alloy query results. It reports any inconsistencies found and provides a report with the status of the database, which is used for reproducing any errors.
5) The FullTest module, which integrates all the modules together, providing a benchmark of the queries tested and the ones remaining, plus the number of inconsistencies found and useful testing measurement information.

B. The Automated SQL Query Generator

In our approach, we consider the subset of the SQL query grammar shown in Fig. 4. We have chosen this subset since it serves as a good example of the validity of our approach and has proven its usefulness. In Section V, we show how this grammar can be extended to more complicated SQL queries.
We describe how to write an Alloy specification to model a database schema and a subset of the SQL query grammar. Then, using the SAT-based Alloy tool-set, we generate syntactically and semantically valid queries.

Figure 3. Full framework for DBMS testing.

QUERY      ::= SELECT FROM WHERE GROUP_BY HAVING
SELECT     ::= ’SELECT’ selectTerm+
selectTerm ::= term | aggregate(term)
FROM       ::= ’FROM’ (table | table JOIN table)
WHERE      ::= ’WHERE’ term operator (term | value)
GROUP_BY   ::= ’GROUP BY’ term
HAVING     ::= ’HAVING’ term operator value
aggregate  ::= ’MAX’ | ’MIN’ | ’AVG’ | ’COUNT’
operator   ::= ’=’ | ’<’ | ’>’ | ’<=’ | ’>=’

Figure 4. Supported SQL grammar.

We next describe the subset of Alloy that we use throughout this paper; more details about Alloy can be found in [12], [13]. Alloy is a strongly typed specification language. It assumes a universe of atoms (or elements) partitioned into subsets, where every type is associated with a subset. An Alloy specification consists of a sequence of paragraphs, where a paragraph can define new types, introduce relations between types, and add constraints on these types and relations. Being an analyzable relational language, Alloy has semantics closely related to those of relational databases. This enables systematic modeling of relational databases and automated analysis of relational operations on them.
For example, to model the fields of a database table, the SQL Query Generator automatically creates a signature for each field in the tables. These signatures extend the Field type defined in the SQL Alloy skeleton (more information about the basic SQL Alloy skeleton can be found in [2]). In addition, for each extended Field type we explicitly set the relation constraints, fixing the name and type of every field. Similarly, for each table in the schema, we create a signature extending the Table type. The code below shows the declarations of such signatures for both fields and tables of the schema. Note that for the fields relation in the signatures extending Table, we use the keyword in, which guarantees that the field (on the left-hand side) belongs to the set of fields for that table.

lone sig field_id extends Field {} {
  name = id
  type = intType
}
lone sig field_name extends Field {} {
  name = name
  type = stringType
}
...
lone sig table_student extends Table {} {
  name = students
  field_id in fields
  field_name in fields
}
lone sig table_grades extends Table {} {
  name = grades
  field_studentID in fields
  field_courseID in fields
  field_grade in fields
}
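The base skeleton from [2] that these signatures extend is not reproduced in this paper; a plausible shape for it (declarations assumed, shown only to make the extended signatures above self-contained) is:

abstract sig Name, Type {}
-- concrete name atoms (id, name, students, grades, ...) extend Name
one sig intType, stringType extends Type {}

abstract sig Field {
  name: one Name,
  type: one Type
}
abstract sig Table {
  name: one Name,
  fields: some Field
}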

The field and table signatures declared above do not add any new relations to the Table and Field types; thus, the empty pair of braces {} follows each declaration. Instead of altering the base signatures of the model, we add Alloy facts to the new signatures. Facts are explicit constraints that we want satisfied. Thus we use another pair of braces directly following each signature declaration (facts in Alloy can also be written as separate paragraphs; here we embed them in the signature declarations for readability).
Modeling the main components in Alloy guarantees that the SQL grammar is satisfied in the Alloy model. However, we need to add constraints to the model to guarantee semantically correct queries. Looking back at the SQL grammar in this example, a grammar-based string generator would generate queries such as:

SELECT courseID from STUDENT;      (1)
SELECT name, name from STUDENT;    (2)
SELECT id from STUDENT, STUDENT;   (3)

These queries are syntactically correct, but each is problematic. Query (1) selects a field which does not belong to the table listed in the FROM section; it causes an execution error on most database systems. To prune such queries from our query generation, we add Alloy constraints to the model. The constraint is added by introducing a new fact paragraph as follows:

fact field_in_table {
  all f: term.field |
    some t: FROM.tables | f in t.fields
}

The above Alloy fact paragraph reads as: for every field of a term element, there exists a table t among the FROM tables such that the field belongs to t's fields. The all quantifier stands for universal quantification; the some quantifier stands for existential quantification. The dot operator '.' represents a relational join, e.g., FROM.tables returns the image of FROM under the tables relation, and t.fields returns the image of an element t under the fields relation.
Query (2) is semantically correct; it would run and execute on any database system. However, we prefer selecting different fields in the SELECT section to get more meaningful queries for testing. We add a new fact to the model to enforce this property:

fact unique_select_terms {
  all a, b : SELECT.fields.term |
    (a.field = b.field and a.agg = b.agg) => a = b
}

The above Alloy fact paragraph reads as follows: for any two elements in the term set of the SELECT.fields image, if both are related to the same field element and have the same aggregate element, then the two elements are identical. The '=' operator in Alloy is not an assignment operator; it is a boolean operator which checks whether the elements on its left- and right-hand sides are the same. The 'and' operator is logical conjunction, which evaluates to true if and only if both of its operands are true. The '=>' operator is logical implication, as in mathematical logic. Thus this constraint specifies that if any two fields belonging to the set of selected fields have the same table field and aggregate, they must be the same selected field, guaranteeing that the SELECT section of the query will not have duplicate terms.
Similar to query (2), query (3) is syntactically and semantically correct, but seldom do queries require joining a table to itself. Adding a constraint in the grammar to avoid joining a table with itself is complex: it could be done manually by enumerating all admissible combinations of tables to be cross-joined and explicitly listing them in the grammar. In our example, since we only have two tables to pick from, we can easily update the grammar to avoid choosing the same table twice. But in the more general case, where we have more tables to pick from, enumerating possible table joins is tedious and prone to mistakes. In our approach, we instead add a constraint to the model to guarantee that the same table is never picked twice. We specify this fact as: for any two tables in the tables image of the FROM element, if they map to the same table element then they must be identical:

fact unique_tables {
  all a, b: FROM.tables | a.table = b.table => a = b
}

C. SQL GROUP BY Constraints

Modeling the GROUP BY section requires declaring a new signature paragraph in Alloy. We add a one-to-many relation named fields to the GROUPBY signature, mapping it to a set of term elements. The multiplicity of the fields relation is at least one, which is enforced by the Alloy some keyword. The following Alloy code models the GROUP BY section of the query:

lone sig GROUPBY {
  fields : some term
} {
  fields in SELECT.fields
  #fields.agg = 0
}

The main constraint on the GROUP BY section is that the terms used within it must be a subset of the terms used within the SELECT section, excluding those used with an aggregate function. Adding this constraint to a grammar-based generator is complex. One conventional way to do so is to mark each term in the SELECT section with a unique alias (using the SQL 'as' keyword), and then have the terms in the GROUP BY section be selected from those alias names. In our approach, we add a fact to the GROUPBY signature to indicate that the terms in the fields mapping belong to the terms in the SELECT fields. We then add another constraint forcing the number of aggregate functions among the GROUPBY terms to be exactly zero, guaranteeing that the terms are drawn from the ones which are not aggregated. For example, query (1) below is valid while query (2) is not, because NAME is not selected in the SELECT clause without an aggregate function:

SELECT ID, MAX(NAME) FROM STUDENT GROUP BY ID;   (1)
SELECT ID, MAX(NAME) FROM STUDENT GROUP BY NAME; (2)

D. SQL HAVING Constraints

We introduce two new signatures for the HAVING section: HAVING and havingTerm. The havingTerm signature represents the grammar "aggregate(term) operator value". Each havingTerm element is related to one term element, one Operator element, and one Value element. The HAVING signature contains the fields relation, mapping it to a non-empty set of havingTerm elements. The following Alloy code models the HAVING section of the query:

abstract sig Operator {}
abstract sig Value {}
sig havingTerm {
  field : one term,
  operator : one Operator,
  value : one Value
}
lone sig HAVING {
  fields: some havingTerm
} {
  fields.field in SELECT.fields
  all f: fields.field | #f.agg = 1
}

The constraint on the HAVING clause is that the terms used must be a subset of the terms used in the SELECT clause with an aggregate function. The fact following the HAVING signature declaration ensures two properties: (1) all the terms in the HAVING clause are a subset of the terms used in the SELECT clause, and (2) every term in the HAVING clause is used with an aggregate function. For example, query (1) below is valid while query (2) is not, because MAX (NAME) is not selected in the SELECT clause:

SELECT NAME, MAX (ID) FROM STUDENT GROUP BY NAME HAVING MAX (ID) > 5;   (1)
SELECT NAME, MAX (ID) FROM STUDENT GROUP BY NAME HAVING MAX (NAME) > 5; (2)

E. SQL WHERE Constraints

We introduce two new signatures for the WHERE clause: whereTerm and WHERE. A whereTerm element represents a clause containing a term compared to another term or to a constant value. The operator relation maps the whereTerm element to a single operator. The WHERE signature maps the WHERE clause to a set of whereTerm elements. We discuss the relation between the where terms in the discussion section. The following code models the SQL WHERE clause and its constraints in Alloy:

sig whereTerm {
  leftTerm : one term,
  operator : one Operator,
  rightTerm : one term + Value
} {
  leftTerm != rightTerm
}
lone sig WHERE {
  fields: some whereTerm
}
fact whereTerms {
  all n: (WHERE.fields.leftTerm + WHERE.fields.rightTerm).field |
    some t: FROM.tables | n in t.fields
  all a, b: whereTerm |
    a.leftTerm = b.leftTerm and a.rightTerm = b.rightTerm and
    a.operator = b.operator => a = b
}

We rule out comparing an element with itself using the fact leftTerm != rightTerm. Inside the whereTerms fact paragraph, we constrain the terms in the WHERE clause to belong to one of the tables in the FROM clause: the fact states that for any field on either the left or the right side of a where term, there exists a table in the FROM section to which the field belongs. Another fact statement prunes out duplicate where terms by specifying that any two where terms with the same left- and right-hand-side fields and the same operator are identical.

F. Alloy2SQL Query Translator

The Alloy Analyzer tool-set compiles an Alloy model into a Boolean formula and uses SAT technology to solve it. It iterates over all possible solutions of the Boolean formula, converting each into an Alloy instance. In our approach, we take the Alloy instances and convert them into valid SQL queries that are then used for testing databases and applications.
An Alloy instance generated by the Alloy Analyzer is a set of valuations assigned to the signatures and relations declared by the Alloy model. To translate an instance into a SQL query, we first identify the signatures associated with the fields, tables, aggregate functions, and other singleton elements which serve as the initial parts of the grammar. Then we iterate over the relations on the signatures, setting the

for all sig : Instance.Signatures
  for each valuation of sig in the solution
    Create a Java object for the valuation
    // map the Alloy object to the Java object
    map.add(AlloyObject, JavaObject)
for all sig : Instance.Signatures
  for all field : sig.fields
    // need a loop for non-singleton fields
    for every valuation of field
      sourceObject = map.get(field.source)
      targetObject = map.get(field.target)
      sourceObject.setField(field.name, targetObject)
print map.values()

Figure 5. Algorithm for translating an Alloy instance into a SQL query.

corresponding field values for every relation. We use Java classes to represent each part of the SQL grammar, together with a mapping from Alloy objects to Java objects. We set the relations between signatures by calling setter methods of the Java classes corresponding to the field being set. Fig. 5 illustrates the basic algorithm for creating a concrete SQL query out of an Alloy instance.

IV. EXPERIMENTS

In this section we discuss the results of exercising our full framework on the Oracle 11g database management system. We generate SQL queries based on different subsets of the SQL grammar and automatically verify the output of the DBMS. Our experiments successfully identified new queries which result in erroneous output.
One of the main motivations for automatic generation of syntactically and semantically valid SQL queries is to automate the three fundamental steps in database testing. Our previous work on Automated Database Testing Using SAT (ADUSA) [1] uses model-based testing to perform (1) query-aware database generation, constructing a useful test input suite that covers the various scenarios for query execution, and (2) test oracle generation, verifying query execution results on the generated databases. ADUSA takes as input (1) a database schema and (2) a SQL query. It populates the database with meaningful data and verifies the output of executing the query on the database using an automatically generated oracle. Our current approach closes the gap of having the SQL query as a user-provided input for ADUSA: given an input schema, our approach automatically generates valid SQL queries for testing. These queries, along with the schema, are used by ADUSA to perform black-box testing of the database system. Having both approaches based on Alloy and SAT enables us to combine the Alloy models to minimize the number of variables used to solve the model. Tests generated using ADUSA were able to find and reproduce bugs in Oracle 11g [17], MySQL 4.0 [22], and HSQLDB [18] (injected bug) [1].
In the following sections we describe the results of full tests which illustrate the functionality of the framework while testing Oracle 11g (Release 1) [17]. The first set of tests shows the performance and complexity of the Alloy models as the number of selected tables increases. The second set of tests focuses on queries which revealed bugs in the Oracle DBMS. Our previous work on ADUSA was able to reproduce and provide a counterexample for a bug in Oracle 11g; however, that bug was only reproduced using one specific SQL query. Using our automated framework, we identified 5 new queries which reveal erroneous output in Oracle 11g that we were not previously aware of, along with the data required to reproduce each of them.

A. Generating full tests

First we demonstrate the effectiveness of our framework in providing a complete test cycle for testing databases. Our framework generates valid SQL queries for test, and each query is modeled in Alloy in addition to the database schema. The framework then produces test inputs to populate the database and the expected output of running the query on each test input. In the following tests, we consider a database schema consisting of 3 tables: (1) a student table with ID (primary key) and Name fields, (2) a course table with cID (primary key) and Name fields, and (3) a department table with ID (primary key) and Name fields. We consider all fields of the tables to be of varchar type. Given this schema, we consider automatically generating all valid SQL queries satisfying the following grammar:

QUERY      ::= SELECT FROM
SELECT     ::= ’SELECT’ selectTerm+
selectTerm ::= term | aggregate(term)
FROM       ::= ’FROM’ (table | table NATURAL JOIN table |
               table NATURAL JOIN table NATURAL JOIN table)
aggregate  ::= ’COUNT’
term       ::= ’ID’ | ’NAME’ | ’CID’ | ’*’

While generating SQL queries, we add the constraint that no table is selected twice in the FROM clause; this guarantees that a table is not cross-joined with itself. We do not have to distinguish fields with the same names between tables, since NATURAL JOIN joins tables based on common field names. (Our approach supports tables with the same field names by creating alias names for the fields of the selected tables; every field is then uniquely represented as table name.field name.) In addition, we add a specific constraint for the COUNT aggregate function: no other terms may be selected if COUNT is used. A further constraint specifies that if the * term is selected then no other terms can be selected at the same time, to guarantee correct SQL query syntax.
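These experiment-specific constraints might be expressed as Alloy facts along the following lines; this is a sketch reusing the model names of Section III, with the COUNT and star atoms assumed rather than taken from the released library:

one sig COUNT extends Agg {}
one sig star extends Field {}

fact count_and_star_alone {
  -- if COUNT is used, it is the only selected term
  all t: SELECT.fields | t.agg = COUNT implies SELECT.fields = t
  -- if '*' is selected, no other term may be selected
  all t: SELECT.fields | t.field = star implies SELECT.fields = t
}
-- no table appears twice in the FROM clause (as in Section III-B)
fact unique_from_tables {
  all a, b: FROM.tables | a.table = b.table implies a = b
}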

Using the schema and the SQL grammar subset discussed above, our automated query generator produces 57 unique queries. To show the results, due to lack of space, we group the queries into 7 groups, each identified by the tables in its FROM clause. We show the average time needed to generate a database instance and verify the correctness of execution by comparing it to the oracle (expected result). Table I shows the details of the test suite executions. The tests were run on an Intel Core 2 Duo 2.0GHz with 2GB RAM and Java Runtime Environment 1.6.0.
Table I shows the details of testing Oracle 11g with sets of queries; for each query generated, ADUSA enumerates all test cases and checks the DBMS output for correctness. The FROM Clause Set identifies the queries by the tables in the FROM clause; the 'x' indicates a NATURAL JOIN between tables, and S, D, and C are used as abbreviations of the table names for simplicity. #Queries is the total number of queries within the FROM clause set. Total #DB tests is the total number of database instances which ADUSA generated for testing the database. Avg. #tests/query is the average number of database instances generated per query by ADUSA. Primary variables and clauses are the average numbers of variables and clauses in the Alloy model generated by ADUSA to produce a test. Average time per query is the average time ADUSA takes per query to enumerate all possible database instances and, for every instance, delete the data from the database, populate the database with the new instance data, query the database, and verify the correctness of the DBMS output. Total time is the total time consumed by ADUSA to verify the correctness of all the queries in a given FROM clause set.
The scope of variables used by ADUSA in all the tests above is 3 varchar and 4 bits for Int. We set a threshold of 1,000 database instances to test per query. In most cases, ADUSA was able to verify 1,000 test cases for correctness per query in less than 10 seconds. It is worth mentioning that each test case incurs overhead time for emptying the database, inserting new elements into it, and finally querying it. The time required for concretizing the oracle (expected output) from an Alloy instance into real data and comparing it to the actual DBMS output is negligible. For each query, we establish a database connection once and close it after all instances generated by ADUSA for that query have been tested. Note that in Table I the average time (ms) per query is the time elapsed to test all instances generated for one query; in almost all cases, the number of database instances is more than 500.
In addition to testing Oracle 11g, we ran the same tests on MySQL 5.0. The most noticeable difference is the time consumed for running the tests: in almost all cases, the tests ran in half the time on MySQL compared to Oracle 11g, which points out that the overhead of database connection and querying is a bottleneck.

B. Experimenting with Oracle11g

Table I shows that out of the total 11,000 tests on the FROM set which includes student NATURAL JOIN course NATURAL JOIN department, 988 tests revealed bugs. We experiment further with these bug-revealing queries, examining the relevance of the variable scope to finding a bug faster versus the complexity of the Alloy model. Increasing the scope used by ADUSA

FROM Clause Set  #Queries  Total      Avg. #tests/  Prim.  Clauses  Avg. Time(ms)/  Total      Bugs
                           #DB tests  query         Vars            query           time(sec)
STUDENT (S)      6         3036       506           30     468      4072.8          24.4       0
COURSE (C)       6         3036       506           30     468      4312.5          25.8       0
DEPARTMENT (D)   6         3036       506           30     468      4221.2          25.3       0
SxD              6         5096       850           39     1252     7721.5          46.3       0
SxC              11        11000      1000          44     1531     9642.1          106.1      0
DxC              11        11000      1000          40     1486     8836.7          97.2       0
SxCxD            11        11000      1000          48     3485     12670           139.4      988

Table I. Results of running our framework on Oracle 11g.

can significantly blow up the number of database instances generated. On the other hand, having a richer set of test cases may reveal bugs faster. The set of test queries with the 3 tables NATURAL JOIN'ed together is as follows:

SELECT DISTINCT id FROM ...
SELECT DISTINCT id, NAME FROM ...
SELECT DISTINCT id, NAME, cID FROM ...
SELECT DISTINCT NAME FROM ...
SELECT DISTINCT NAME, cID FROM ...
SELECT DISTINCT cID FROM ...
SELECT DISTINCT id, cID FROM ...
SELECT COUNT (DISTINCT cID) FROM ...
SELECT COUNT (DISTINCT id) FROM ...
SELECT COUNT (DISTINCT NAME) FROM ...
SELECT COUNT (*) FROM ...

All these queries select from student NATURAL JOIN course NATURAL JOIN department. They constitute all possible valid queries adhering to the constraints and grammar we specified for these tests. Using the ADUSA framework to test these queries on Oracle 11g, we were able to identify bugs in 6 queries. Table II shows the queries which revealed unexpected, wrong output. The scope is the maximum number of varchar elements used to populate the database with test instances. #DB tests is the number of database test instances, generated by ADUSA, considered for each query. #Bugs is the number of tests which produced a result inconsistent with the expected output. The first 5 queries were newly discovered using our approach of automated SQL query generation combined with the ADUSA framework; the 6th query was discovered in our previous work on ADUSA.
To verify that a bug exists, we used two methods. First, for every bug detected, ADUSA provides the exact data needed to reproduce the bug. We use the same data to test the query on a different database with the same schema, in this case MySQL 5.0. For all the queries which revealed bugs in Oracle 11g, none showed any inconsistencies on MySQL 5.0. In addition, we ran random manual verifications: for randomly chosen counterexamples, we manually analyzed the data, predicted the output, and verified our analysis. Indeed, the output was erroneous whenever ADUSA predicted so.
Table II shows the number of database instances generated and the corresponding inconsistencies found. An interesting query is Query #4, where ADUSA was able to find an inconsistency only after enumerating 3,798 instances with a scope of exactly 3 varchar; it took 2,148 instances to reproduce the bug with a scope of 4 varchar. This indicates that a tester could easily come up with a test suite out of the thousands of possible tests and not detect the bug. Manually generating data to reproduce the bug would have taken hours, while exhaustive bounded checking guarantees that, up to a given scope, all possible instances have been verified; a bug may, of course, still exist at bigger scopes.
We compare running the tests with scopes of 3 versus 4 for the last 3 queries in Table II. The results show that a bigger scope raised the chance of detecting the bug faster. For example, the tests with scope 4 on Query #5 showed only 1 inconsistency in the first 1,000 tests, but revealed 138 more inconsistencies in the next 1,000 tests, while with scope 3 the first inconsistency was detected only after 2,721 tests. On the other hand, a bigger scope increases the number of clauses generated by Alloy, and running the tests is a little slower; compared to the database overhead, however, the extra solving time at higher scopes is negligible. The total time consumed increased significantly in most cases because of the blow-up in the number of possible database instances to generate. Originally we used 1,000 instances as a threshold to stop ADUSA from generating more instances; as our experience in finding erroneous queries grows, we think a higher threshold may be needed for higher confidence in the results.
Fig. 6 shows a screenshot of the data produced by ADUSA to generate a counterexample for Query #5 of Table II on Oracle 11g. The expected output of the query is 'varchar$0', while Oracle 11g wrongly reports both 'varchar$3' and 'varchar$0' as output. It is worth mentioning that without our framework we were not aware of any bugs in the first 5 queries of Table II, and it would have been hard to find and reproduce them. Having known from our previous work that Query #6 produces a bug, we suspected a bug in the implementation of the COUNT aggregate function; but using our full framework for automating tests enabled us to detect new queries producing erroneous output. These queries show that the bug is not related to the aggregate function, and multiple related bugs could underlie these queries.

#  Query                                   Scope  #DB tests  Total time(ms)  #Bugs
1  SELECT DISTINCT NAME, cID FROM ...      3      1000       13500           229
2  SELECT COUNT (DISTINCT cID) FROM ...    3      1000       11703           229
3  SELECT COUNT (DISTINCT NAME) FROM ...   3      1000       11593           228
4  SELECT DISTINCT NAME FROM ...           3      3798       75812           1
                                           4      2148       36500           1
5  SELECT DISTINCT cID FROM ...            3      2721       32828           1
                                           4      1000       17141           1
                                           4      2000       33265           139
6  SELECT COUNT (*) FROM ...               3      1000       11938           302
                                           3      4000       47984           1139
                                           4      4000       80188           2414

Table II. Queries which revealed bugs in Oracle 11g.

Figure 6. Snapshot of an erroneous query output in Oracle 11g detected by our framework. The counterexample is automatically produced. The correct query execution should return 'varchar$0'.

V. EXTENSION

Our approach shows how to use Alloy and the Alloy Analyzer to model a subset of the SQL query grammar and its constraints to ensure the validity of the syntax and semantics of the generated queries. The approach is extensible; we can systematically add support for a larger subset of the SQL grammar. For example, we showed how to integrate the types of table attributes into our framework; these types can be used to add type-checking constraints in the WHERE, GROUP BY, and HAVING clauses. The SQL transactional grammar can be extended as well. DELETE statements can be introduced by modifying the grammar as: DELETE FROM table WHERE term IN (SELECT term FROM table WHERE condition). The constraint that the term to be deleted is the same as the term being selected is simple to write in Alloy. Nested SELECT statements can be supported by ensuring that the inner SELECT statements have access to the outer SELECT terms, but not vice versa, using the same approach.
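To illustrate, the DELETE constraint mentioned above could be sketched in Alloy as follows; the DELETE signature and its relations are our assumptions for illustration, not part of the released library:

sig DELETE {
  table : one Table,
  deleteTerm : one term,
  inner : one SELECT
}
fact delete_matches_inner_select {
  -- the inner query selects exactly one term, and it names the
  -- same field as the term being deleted
  all d: DELETE |
    one d.inner.fields and d.inner.fields.field = d.deleteTerm.field
}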

VI. RELATED WORK

The use of Alloy for modeling and analyzing software systems is not new; Alloy has previously been used to analyze software systems [14]. These techniques involve modeling both the program inputs and the computations in Alloy. Alloy has also been used for specification-based testing. The TestEra [15] framework for specification-based testing generates test cases from Alloy specifications of data structures. Our framework uses an approach similar to TestEra's in using Alloy to specify the input and to check the correctness of the output. However, it differs from TestEra in two key ways: (1) TestEra is specialized for testing programs that take linked data structures as inputs, whereas our framework targets testing database management systems; (2) TestEra requires the programmer to learn a new specification language in order to specify the input, whereas we only require the input described in SQL (the language of the application under test), and the Alloy specifications are generated systematically.
The use of Alloy and the Alloy Analyzer enables the unification of the three key approaches for database testing: generation of SQL queries, test databases, and test oracles. A popular framework for query generation is the Random Query Generator (RQG) [10], which uses the SQL grammar as the basis of query generation, in the spirit of production grammars. Given a grammar, RQG generates random queries and tests databases by running the tests against two or more databases and comparing their results. Since the query generation is purely grammar-based, it generates a large number of invalid queries as well as redundant ones. Moreover, validating that the queries generated are syntactically correct is hard and sometimes impossible to ensure [10].
Test database generation is a well-studied problem [6], [7], [11], [19], [24]. Several approaches perform data generation by analyzing a given database schema [19], [20]. Such

approaches aim at generating large databases that are used as benchmarks for performing various analyses on databases. Our approach to data generation is query-aware and targets generating a large set of small databases for exhaustively testing a DBMS. Unlike other approaches which use constraint solvers [9] or an object modeling language [24], our framework uses the Alloy Analyzer, which in turn uses SAT, to generate its data. The use of Alloy enables specifying constraints on both the query and the results, which enables more precise test input generation.
Several database testing approaches target transaction testing, i.e., checking the effect of executing a sequence of related SQL queries on a database. For example, a recent tool, AGENDA [8], uses state validation techniques to verify the consistency of the database after executing a transaction. Our approach does not target transaction testing; the framework primarily considers SQL selection statements, which, unlike transactions, do not update the state of the database.

VII. CONCLUSION

We presented a novel approach for systematic testing of database engines using a relational constraint solver. Our framework automatically generates (1) syntactically and semantically correct SQL queries for testing, (2) meaningful input data to populate test databases, and (3) expected results of executing the queries on the DBMS. Our approach leverages the SAT-based Alloy tool-set: we systematically model the SQL queries and the database schema in Alloy, add constraints to ensure the semantic meaningfulness of the queries and test oracles, and then use the Alloy Analyzer to generate complete test suites to verify the correctness of a DBMS. Experimental results show the effectiveness of our framework and its ability to find and reproduce erroneous behavior in different database management systems.

REFERENCES

[1] Shadi A. Khalek, Bassem Elkarablieh, Y. O. Laleye, and Sarfraz Khurshid. Query-aware test generation using a relational constraint solver. In ASE '08, pages 238–247. IEEE.
[2] Shadi A. Khalek and Sarfraz Khurshid. Automated SQL query generation for systematic testing of database engines. In ASE '10, pages 329–332. ACM.
[3] Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas. Generating queries with cardinality constraints for DBMS testing. IEEE Transactions on Knowledge and Data Engineering, 18(12):1721–1725, 2006.
[4] Donald R. Slutz. Massive stochastic testing of SQL. In VLDB '98, pages 618–622. Morgan Kaufmann, 1998.
[5] Meikel Poess and John M. Stephens, Jr. Generating thousand benchmark queries in seconds. In VLDB '04, pages 1045–1053.
[6] Nicolas Bruno and Surajit Chaudhuri. Flexible database generators. In VLDB '05, pages 1097–1107. VLDB Endowment.
[7] Kenneth Houkjaer, Kristian Torp, and Rico Wind. Simple and realistic data generation. In VLDB '06, pages 1243–1246. VLDB Endowment.
[8] Yuetang Deng, Phyllis Frankl, and David Chays. Testing database transactions with AGENDA. In ICSE '05, pages 78–87, 2005.
[9] Carsten Binnig, Donald Kossmann, Eric Lo, and M. Tamer Özsu. QAGen: Generating query-aware test databases. In SIGMOD Conference, pages 341–352, 2007.
[10] MySQL Forge Random Query Generator. http://forge.mysql.com/wiki/RandomQueryGenerator/.
[11] Carsten Binnig, Donald Kossmann, and Eric Lo. Reverse query processing. In ICDE '07, pages 506–515. IEEE.
[12] Emina Torlak and Daniel Jackson. Kodkod: A relational model finder. In TACAS '07, pages 632–647.
[13] Daniel Jackson. Alloy: A lightweight object modeling notation. ACM Transactions on Software Engineering and Methodology (TOSEM), 11(2), April 2002.
[14] Daniel Jackson and Mandana Vaziri. Finding bugs with a constraint solver. In ISSTA '00, Portland, OR.
[15] Sarfraz Khurshid and Darko Marinov. TestEra: Specification-based testing of Java programs using SAT. Automated Software Engineering Journal, 2004.
[16] Margus Veanes, Nikolai Tillmann, and Jonathan de Halleux. Qex: Symbolic SQL query explorer. Microsoft Research Technical Report MSR-TR-2009-2015, October 2009.
[17] Oracle Corp. Oracle database engine. http://www.oracle.com/technology/software/products/database/index.html.
[18] The HSQLDB Development Group. HSQL database engine. http://www.hsqldb.org/.
[19] IBM DB2. Test database generator. www.ibm.com/software/data/db2imstools/db2tools/db2tdbg/.
[20] Datanamic. DB Data Generator. http://www.datanamic.com/datagenerator/index.html.
[21] Michael Emmi, Rupak Majumdar, and Koushik Sen. Dynamic test input generation for database applications. In ISSTA '07, pages 151–162, 2007.
[22] MySQL open source database. http://www.mysql.com/.
[23] David Willmor and Suzanne M. Embury. An intensional approach to the specification of test cases for database applications. In ICSE '06, pages 102–111. ACM.
[24] Yannis Smaragdakis, Christoph Csallner, and Ranjith Subramanian. Scalable automatic test data generation from modeling diagrams. In ASE '07, pages 4–13. ACM.
