Praspel: A Specification Language for Contract-Based Testing in PHP
Ivan Enderlin, Abdallah Ben Othman, Frédéric Dadeau and Alain Giorgetti
LIFC - University of Franche-Comté - INRIA CASSIS Project
16 route de Gray - 25030 Besançon cedex, France
Email: {ivan.enderlin, abdallah.ben othman}@edu.univ-fcomte.fr, {frederic.dadeau, alain.giorgetti}@univ-fcomte.fr
Abstract. We introduce in this paper a new specification language named Praspel, for PHP Realistic Annotation and SPEcification Language. This language is based on the Design-by-Contract paradigm. Praspel clauses annotate methods of a PHP class in order to both specify their contracts, using pre- and postconditions, and assign realistic domains to the method parameters. A realistic domain describes a set of concrete, and hopefully relevant, values that can be assigned to the data of a program (class attributes and method parameters). Realistic domains are more precise than types, and can specify data in any programming language, typed or not. Realistic domains are designed to provide a basis for automated unit test generation. To this end, they present two useful features. The first one is a predicate that is used to check whether a value belongs to the realistic domain. The second one is a dedicated value generator that is used to produce input test data. These features are implemented in a test generator for PHP (also named Praspel) that offers a random test generator, which computes test data, coupled with a runtime assertion checker, which decides whether a test passes or fails.
Keywords: Realistic domains, PHP, Design by Contract, annotation language, random testing.
I. INTRODUCTION
Over the years, testing has become the main way to validate software at low cost. Current research aims to automate test generation so as to relieve developers from writing their tests manually. Recent development techniques, such as Agile methods, consider tests as first-class citizens that are written prior to the code. Model-based testing [1] is an efficient paradigm for automating test generation. It considers a model of the system that is used for generating conformance test cases (w.r.t. the model) and for computing the oracle (i.e. the expected result) that makes it possible to decide whether a test passes or fails. In order to ease the model description, annotation languages have been designed, first introduced by B. Meyer [2] with the Design by Contract paradigm. These languages make it possible to express formal properties (invariants, preconditions and postconditions) that directly annotate program entities (class attributes, method parameters, etc.) in the source code. Many annotation languages exist, such as the Java Modeling Language (JML) [3] for Java, Spec# [4] for C#, or the ANSI-C Specification Language (ACSL) [5] for C. Design by Contract considers that a system has to be used in a contractual way:
to invoke a method, the caller has to fulfil its precondition; in return, the method establishes its postcondition.
A. Contract-Driven Testing
Annotations can be checked at runtime to make sure that the system behaves as specified and does not break any contract. Contracts are thus well suited to testing, and especially to unit testing [6]. The idea of Contract-Driven Testing [7] is to rely on contracts both for producing tests, by computing test data satisfying the contract described by the precondition, and for test verdict assessment, by checking that the contract described by the postcondition is ensured after execution.
On the one hand, method preconditions can be used to generate test data, as they characterize the states and parameters for which the method call is licit. For example, the Jartege tool [8] generates random test data, in accordance with the domain of the inputs of a given Java method, and rejects the values that falsify the precondition. The JML-Testing-Tools toolset [9] uses the JML precondition of a method to identify boundary states from which the Java method under test will be invoked. The JMLUnit [10] approach considers systematic test data for method input parameters, and filters irrelevant ones by removing those falsifying the method precondition. On the other hand, postconditions are employed similarly in all the approaches [9], [10], [11], [12]. By runtime assertion checking, the postcondition is verified after each method execution, to provide the oracle.
B. Contributions
In this paper, we define the concept of realistic domains. Realistic domains are data domains encountered in concrete implementation situations. A data domain should come with a practical way to generate values in it. These values are expected to be relevant data in the input domain of functions submitted to unit testing. For example, email addresses constitute a realistic domain.
Many methods take an email address as input and it is easy to generate an email address (e.g. at random) from its syntactic definition. Our second contribution is a new language named Praspel, for PHP Realistic Annotation and SPEcification Language. Praspel is a specification language for PHP [13] which illustrates the concept of realistic domains. Consequently, Praspel is adapted to test generation. Praspel annotates PHP (which is dynamically typed) by specifying realistic domains from
which PHP methods are tested with realistic data generated at random. Our third contribution is a random testing tool (also named Praspel) working in three steps: (i) the tool generates random values for variables according to the contract, (ii) it runs the PHP program with the generated values, and (iii) the tool checks the contract postcondition to assign the test verdict.
C. Paper outline
The paper is organized as follows. Section II introduces the concept of realistic domains. Section III presents an implementation of realistic domains in the Praspel language, a new annotation language for PHP. Section IV describes the mechanism of automated generation of unit tests from PHP files annotated with Praspel specifications. Section V compares our approach with related work. Finally, Section VI concludes and presents our current investigations.
II. REALISTIC DOMAINS
When a function precondition is any logical predicate, say from first-order logic, it can be arbitrarily difficult to generate input data satisfying the precondition. One could argue that the problem does not appear in practice because usual preconditions are only simple logical predicates. But defining what a "simple" predicate is w.r.t. the problem of generating values satisfying it (its models) is a difficult question, still partly open. We plan to address this question and to put its positive answers at the disposal of the test community. In order to reach this goal we introduce the concept of realistic domains.
Realistic domains are intended to be used for test generation purposes. They specify the set of values that can be assigned to a piece of data in a given program. We first describe the principles of realistic domains, before introducing their associated features, aimed at test generation. We then present the instantiation of realistic domains in PHP.
A. Principles of realistic domains
Principle 1.
Realistic domains are not types.
Realistic domains are an overlay of types, adding necessary properties for the validation and generation of data values. They are also more specific than types. For example, a realistic domain can be an integer between a bound A and a bound B. While types define sets of possible values, domains define sets of values that can be generated or checked for membership.
Principle 2. Realistic domains describe accurate data values.
Realistic domains are not intended to provide a type system for typing data; they are meant to automatically and efficiently generate realistic data (e.g. at random), i.e. data that fit a given context. Therefore, realistic domains have to be defined so as to fit as closely as possible the situation in which they are employed, in accordance with the semantics of the data that was intended by the programmer.
Example 1 (A simple realistic domain). Consider a function mail($address, $text) that respectively takes as input
an email address and the text to be emailed. Unlike typed languages, which would assign the type string to $address, we would rather assign it the EmailAddress realistic domain, which defines as precisely as possible a string that represents an email address.
B. Features of realistic domains
Realistic domains are defined with two features, which are now described and illustrated.
1) Predicability: The first feature of a realistic domain is to carry a characteristic predicate. This predicate makes it possible to check whether a value belongs to the set of possible values described by the realistic domain.
Example 2 (Realistic domain predicate). Consider the EmailAddress realistic domain from Example 1. A characteristic predicate attached to this realistic domain checks that the string representing an email address matches a regular expression specifying the presence of exactly one @ symbol, the absence of forbidden characters, the termination with a valid extension name, etc.
2) Samplability: The second feature of a realistic domain is to provide a value generator, called the sampler, that makes it possible to generate values in the realistic domain. The data value generator can be of many kinds: a random generator, a walk in the domain, an incrementation of values, etc. A realistic domain owns at least one value generator.
Example 3 (Realistic domain sampler). Consider again the EmailAddress realistic domain. An email address sampler could be a function in charge of picking a valid email address at random, which would behave as follows:
• it randomly selects the number of parts before the @ sign (to be separated by a ".");
• it randomly chooses the length of these parts;
• it randomly generates the characters composing the different parts;
• it randomly generates the domain name;
• it randomly generates an extension amongst the existing ones.
Another possibility would be to have at one's disposal a finite set of already well-formed email addresses, amongst which a random selection would occur.
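The second sampling strategy mentioned in Example 3 (drawing from a finite set of well-formed addresses) can be sketched as follows. This is only an illustration under stated assumptions: the class name PooledEmailAddress and the method name sample() are hypothetical and are not part of Praspel, and PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL is used as a stand-in for the regular expression of Example 2.

```php
<?php
// Hypothetical sketch: names are illustrative, not Praspel's API.
final class PooledEmailAddress
{
    /** @var string[] finite pool of well-formed addresses */
    private $pool;

    public function __construct(array $pool)
    {
        if ($pool === []) {
            throw new InvalidArgumentException('The pool must not be empty.');
        }
        // Every pooled value must itself belong to the domain.
        foreach ($pool as $address) {
            if (!$this->predicate($address)) {
                throw new InvalidArgumentException('Ill-formed address in pool.');
            }
        }
        $this->pool = array_values($pool);
    }

    // Characteristic predicate (assumption: filter_var() replaces the
    // regular expression mentioned in Example 2).
    public function predicate($q)
    {
        return is_string($q)
            && filter_var($q, FILTER_VALIDATE_EMAIL) !== false;
    }

    // Sampler: uniform random selection in the finite pool.
    public function sample()
    {
        return $this->pool[mt_rand(0, count($this->pool) - 1)];
    }
}

$domain = new PooledEmailAddress(['alice@example.org', 'bob@example.net']);
$value  = $domain->sample();
var_dump($domain->predicate($value)); // sampled values satisfy the predicate
```

With this design the sampler only covers a subset of the domain, while the predicate still accepts any well-formed address; this trades coverage for the guarantee that every generated value is valid.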
We now present the implementation of realistic domains in PHP and show some interesting additional principles they could obey. C. Realistic domains in PHP In PHP, realistic domains are implemented as classes providing at least two functions, corresponding to the two features of realistic domains. The first function is named predicate($q) and takes a value $q as input. It returns a boolean indicating the membership of the value to the realistic domain. The second function generates values that belong to the realistic domain. An example of realistic domain implementation in a PHP class is given in Fig. 1. This class
class EmailAddress extends string {

    public function predicate($q) {
        // regular expression for email addresses
        // see RFC 2822, 3.4.1. address specification
        $regexp = '...';
        if (false === parent::predicate($q)) {
            return false;
        }
        // preg_match() returns 1 when the pattern matches
        return 1 === preg_match($regexp, $q);
    }

    public function randomize() {
        // string of authorized chars
        $chars = 'ABCDEFGHIJKL...';
        // array of possible domain extensions
        $domains = array('com', 'org', 'fr', 'net', 'edu');
        $nbparts = rand(2, 4);
        $q = '';
        for ($i = 0; $i < $nbparts; $i++) {
            if ($i > 0) {
                // add separator or arobase
                $q .= ($i == $nbparts - 1) ? '@' : '.';
            }
            $len = rand(1, 10);
            for ($j = 0; $j < $len; $j++) {
                $index = rand(0, strlen($chars) - 1);
                $q .= $chars[$index];
            }
        }
        $q .= '.' . $domains[rand(0, count($domains) - 1)];
        return $q;
    }
}
Fig. 1. PHP code of the EmailAddress realistic domain
represents the EmailAddress realistic domain presented in the preceding examples. In PHP, realistic domains may obey additional interesting principles that exploit the PHP object programming paradigm.
1) Hierarchical inheritance: PHP realistic domains can inherit from other realistic domains. A child realistic domain inherits the two features of its parent and is able to redefine them. Consequently, all the realistic domains constitute a universe. This universe is hierarchically organized in three layers:
• the first layer, called the seed, is a singleton: it only contains the undefined realistic domain, with an always-true predicate and a randomizer yielding a realistic domain from the next layer;
• the second layer, called the sprout, contains all the built-in realistic domains, which are 4 scalar realistic domains: void, integer, float and boolean, and 3 structured realistic domains: string, array, and class. All these realistic domains inherit from the undefined realistic domain and refine its predicate and its sampler, by overriding the related functions;
• the third layer, called the fruit, contains all the other realistic domains. These realistic domains inherit from realistic domains that belong to the sprout layer. They are organized in two categories: the standard library and the user-defined realistic domains.
The standard library provides, among others, the following realistic domains:
– boundinteger and boundfloat are respectively integer or float where it is possible to constrain the yielded numbers to be included in a closed or an open interval;
– smallinteger is a child of integer, defining an interval from −128 to 127; smallfloat has the same behavior but applied to float, with an interval from −128.0 to 127.0;
– integerpp is like integer, but its sampler returns 0 the first time it is called, and increments the last generated integer at each invocation.
2) Parameterizable realistic domains: Realistic domains may have parameters. They can receive arguments of many kinds. In particular, it should be possible to give realistic domains as arguments of realistic domains. This notion is very important for the generation of recursive structures, such as arrays, objects, graphs, automata, etc.
Example 4 (Realistic domains with simple arguments). The realistic domains boundinteger(7, 42) and string('fr_FR') respectively admit two numbers and one string as arguments. The realistic domain boundinteger(X, Y) contains all the integers between X and Y. The realistic domain string('fr_FR') is intended to contain all the strings composed of French words.
III. IMPLEMENTATION IN PRASPEL
Realistic domains are implemented for PHP in Praspel, a dedicated annotation language based on the Design-by-Contract paradigm [2]. In this section, we present the syntax and semantics of the language. Praspel specifications are written using API documentation comments such as:
/**
 * An API documentation comment.
 */
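Since annotations live in API documentation comments, a tool can retrieve them at runtime through PHP's reflection API. The following is a minimal sketch of that mechanism and not a description of how Praspel itself parses annotations: the helper listAtClauses() is hypothetical, and it merely collects the lines of a documentation comment that start with the @ symbol. Only the standard ReflectionFunction::getDocComment() method and PCRE functions are used.

```php
<?php
/**
 * Doubles its argument (informal documentation).
 *
 * @throwable FooBarException;
 */
function foo($x)
{
    return $x * 2;
}

/**
 * Hypothetical helper: returns the [keyword, text] pairs found on the
 * lines of a documentation comment that begin with "@".
 */
function listAtClauses($docComment)
{
    $clauses = array();
    foreach (preg_split('/\R/', (string) $docComment) as $line) {
        // Drop the leading comment decoration: "/**", "*/" or "*".
        $line = trim(preg_replace('#^\s*(/\*\*|\*/|\*)#', '', $line));
        if (preg_match('/^@(\w+)\s*(.*)$/', $line, $m) === 1) {
            $clauses[] = array($m[1], rtrim($m[2], ' ;'));
        }
    }
    return $clauses;
}

// getDocComment() returns the comment attached to the function.
$doc = (new ReflectionFunction('foo'))->getDocComment();
print_r(listAtClauses($doc));
```

Because the comment is ordinary documentation for the PHP interpreter, the annotated function runs unchanged whether or not such a tool is present.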
Praspel makes it possible to mix informal documentation and formal constraints, called clauses and described hereafter. Praspel clauses are ignored by PHP interpreters and integrated development environments. Moreover, since each Praspel clause begins with the standard @ symbol for API keywords, it is usually well handled by pretty printers and API documentation generators.
The grammar of Praspel annotations is given in Fig. 2. Notation (σ)? means that σ is optional. (σ)^r_s represents finite sequences of elements matching σ, in which r is either + for one or more matches, or * for zero or more matches, and s is the separator for the elements in the sequence. Underlined words are PHP entities. They are exactly the same as in PHP¹. An identifier is the name of a PHP class or the name of a function or method parameter. It cannot be the name of a global variable or an attribute, which are respectively prohibited (as bad programming) and typed in invariants in
¹ See http://php.net/language.types.integer for integers, http://php.net/language.types.float for floating-point numbers and http://fr.php.net/language.types.string for strings.
annotation        ::= (clause)*
clause            ::= requires-clause; | ensures-clause; | throwable-clause;
                    | predicate-clause; | invariant-clause; | behavior-clause
requires-clause   ::= @requires expressions
ensures-clause    ::= @ensures expressions
throwable-clause  ::= @throwable (identifier)^+_,
predicate-clause  ::= @predicate extended
invariant-clause  ::= @invariant expressions
behavior-clause   ::= @behavior identifier {
                        (requires-clause; | ensures-clause; | throwable-clause;)^+ }
expressions       ::= (expression)^+_and
extended          ::= identifier [...]
[The right-hand sides of the remaining rules (expression, real-dom-spec, variable, constructors, real-doms, real-dom, built-in, arguments, argument, array, pairs, pair) are missing from this extract.]
Fig. 2. Grammar of Praspel annotations

[...] = \old($x));
 *
 * @throwable FooBarException;
 */
function foo($x) {
    if ($x === 42)
        throw new FooBarException();
    return $x * 2;
}
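The function foo above can serve to sketch the three steps of contract-driven random testing listed in the introduction: generating a value, running the function, and assigning a verdict. The sketch below rests on assumptions that are not taken from the paper: the driver runOneTest() is hypothetical, input values are drawn from the integers 0 to 50, the postcondition is assumed to be "the result equals twice the argument" (the actual @ensures clause is truncated above), and an exception listed in @throwable is assumed not to make the test fail.

```php
<?php
class FooBarException extends Exception
{
}

// Function under test, as in the example above.
function foo($x)
{
    if ($x === 42) {
        throw new FooBarException();
    }
    return $x * 2;
}

/**
 * Hypothetical driver: runs foo() on $x and returns 'pass' or 'fail'.
 * $throwable lists the exception classes the contract allows.
 */
function runOneTest($x, array $throwable)
{
    try {
        $result = foo($x);              // (ii) run the function
    } catch (Exception $e) {
        foreach ($throwable as $class) {
            if ($e instanceof $class) {
                return 'pass';          // exception allowed by the contract
            }
        }
        return 'fail';                  // unexpected exception
    }
    // (iii) assumed postcondition: the result is twice the argument
    return ($result === 2 * $x) ? 'pass' : 'fail';
}

$failures = 0;
for ($i = 0; $i < 100; $i++) {
    $x = mt_rand(0, 50);                // (i) sample an input value
    if (runOneTest($x, array('FooBarException')) === 'fail') {
        $failures++;
    }
}
echo "failed tests: $failures\n";
```

Removing FooBarException from the allowed list makes the input 42 produce a failing verdict, which shows how the throwable list takes part in the oracle under the assumptions above.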
3) Behavioral clauses: In addition, Praspel makes it possible to describe explicit behaviors inside contracts. A behavior is defined by a name and local @requires, @ensures, and @throwable clauses, as shown in Fig. 4. The semantics of the behavioral contract is as follows:
• the caller of the function must guarantee that the call is performed in a state where the property R1 ∧ ... ∧ Rn holds. Nevertheless, property A1 ∧ ... ∧ Ak should also hold;
• the called function establishes a state where the property (A1 ∧ ... ∧ Ak ⇒ E1 ∧ ... ∧ Ej) ∧ Ej+1 ∧ ... ∧ Em holds, meaning that the postcondition of the specified behavior only has to hold if the precondition of the behavior was satisfied;
• exceptions Ti (for i ∈ 1..t) can only be thrown if all preconditions R1 ∧ ... ∧ Rn ∧ A1 ∧ ... ∧ Ak hold.
The @behavior clause only contains @requires, @ensures and @throwable clauses. If a clause is declared or used outside a behavior, it is automatically placed in a default/global behavior. If a clause is missing in a behavior, the clause of the default behavior is chosen.
Example 10 (Behavior with default clauses). We give here a complete example of behavioral clauses:
This contract means: the first argument $x is always an integer and the result is always a boolean, but if the second argument $y is a positive integer, then the third argument $z is a boolean (behavior foo), else if $y is a negative integer, then $z is a float and the function may throw an exception (behavior bar).
D. Creating realistic domains on-the-fly
The @predicate clause allows the user to create a new realistic domain on-the-fly by predicate inheritance and overloading. It has the following form: /** * @predicate P