Graphical editing support for QuickCheck models

Thomas Arts∗, Kirill Bogdanov†, Alex Gerdes‡, John Hughes‡

∗ QuviQ AB, Göteborg, Sweden
† Dept of Computer Science, The University of Sheffield, Sheffield S1 4DP, UK
‡ QuviQ AB and Chalmers University of Technology, Göteborg, Sweden

Abstract—QuickCheck can test a system by specifying a state machine for the API exported by that system. This state machine specification includes a list of possible API calls. Each call is accompanied by a precondition, a postcondition, a generator for the arguments, and a description of how the state is changed. Based on this specification QuickCheck generates a random sequence of API calls. The preconditions ensure that a generated sequence is valid, and the postconditions check that the system behaves as expected. Many systems require an initialisation call before other calls, describing the transition from an uninitialised to an initialised state. QuickCheck offers two ways of specifying transitions between states: using preconditions or a finite state machine abstraction. In this paper we show, by means of an example, that the latter approach is to be preferred. In addition, we present a recent extension to QuickCheck that allows a user to graphically create and edit a finite state machine specification. This extension considerably simplifies and speeds up the specification of a finite state machine, which can be regarded as a formal model of the system. The graphical representation makes the formal model easier to understand and visualises the distribution of API calls. Moreover, the extension allows the user to change this distribution.

I. INTRODUCTION

Software testing is a task that can consume a significant amount of project time, which is why automation matters. Generating test inputs on its own is often not enough: if a program crashes, there is clearly a problem, but if it produces a plausible output, one needs an oracle to tell whether that output is correct. This is where model-based testing is useful: by utilising models of software behaviour, one may both explore what a program is supposed to do and check whether the output produced is appropriate. Additionally, one can attempt invalid inputs to check that all such cases are correctly categorised as invalid by the system under test and that appropriate error messages are produced. One can distinguish two related approaches to model-based test generation: in the first, the model is a precise model of system behaviour from which tests are generated; in the second, the model is a test model, directly determining how to generate test data and encoding an oracle to verify responses from the system under test. In this work, the emphasis is on the latter type of model-based testing. Quviq QuickCheck [1], [2] is a commercial tool widely used for model-based testing of software, with emphasis on

software written in Erlang and C. It was inspired by the freely available QuickCheck tool for Haskell [3]. Test models for Quviq QuickCheck are written as Erlang modules, specifying the operations to invoke and including generators for their arguments. In practice, a generator is combined with a definition of a precondition that has to be satisfied for a test to be generated; where a specific set of randomly generated values fails to satisfy a precondition, a new set of values is generated. Other generators include selection of an arbitrary value from a list of possible values; this is used where there is a limited range of values to choose from.

Traditionally, QuickCheck models have been defined purely in the form of Erlang text, with or without explicit EFSM states. In addition to generators of test data, both modelling styles permit a tester to specify generators that produce the operations that can be invoked in each state of a test model. Tests are generated from state machines by performing a random walk, whilst making sure that the preconditions of all operations in each generated test sequence are satisfied.

By means of a simple example, this paper demonstrates how the inclusion of state-related constraints in preconditions of operations can make the underlying specification difficult to read. We develop a state machine specification for a simple resource locking implementation in two different ways: one using the eqc_statem style with explicit preconditions, and one using eqc_fsm with preconditions implicit in the allowed state transitions. The two resulting specifications are very similar, but we find the eqc_fsm state machine specification easier to read. We have developed both state machines by starting with a simple specification in Sect. II and then gradually improving it. For the explicit state notation described in Sect. III, we show how a graphical editor can be used to construct the eqc_fsm state machine from scratch and how one can evolve from the eqc_statem state machine to the eqc_fsm state machine. Related work is described in Sect. IV, followed by conclusions in Sect. V.
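Before turning to the example, a small illustration of the generator style just described (our sketch, not code from the paper): elements/1 picks an arbitrary value from a list, and the ?SUCHTHAT macro regenerates a value until a predicate holds.

  %% Sketch only, inside a module that includes eqc.hrl:
  %% key() chooses one of four atoms; even_int() keeps
  %% regenerating integers until the predicate is satisfied.
  key()      -> elements([a, b, c, d]).
  even_int() -> ?SUCHTHAT(N, int(), N rem 2 =:= 0).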

II. LOCKING RESOURCES WITH IMPLICIT STATE TRANSITIONS

Assume we have a simple server used to read a resource and write to it. The resource is a key-value store. Many clients can read at the same time, but only one can write. In order to support a single writer, we have an operation lock, after which the client that asked for the lock can write. We do not use a lock per key, but lock the complete key-value store

at once. The resource, i.e., the complete key-value store, is released by the operation unlock. We write a QuickCheck state machine using the eqc_statem library. We start by building a model for starting the server and locking/unlocking the resource; reading and writing the resource are described in Sect. II-B.

A. Start, stop, lock and unlock

The naive first attempt at a QuickCheck state machine specification is to just add the API calls to a module and use the standard property of creating any possible sequence of these API calls. The model below uses the Erlang record language construct to store state data. The record is called state and it does not contain any elements, denoted by the empty field list {}. We extend the state with a number of elements as we go. The function initial_state returns the initial state of the system; the expression #state{} constructs a new instance of the state record. Each API call of the system under test is identified by specifying a generator for the arguments of the call, using the QuickCheck convention of appending _args to the name of the API call and taking a state as an argument. The underscore in _S denotes that the argument is not used.

  -record(state,{}).

  initial_state() -> #state{}.

  start_args(_S)  -> [].
  stop_args(_S)   -> [].
  lock_args(_S)   -> [].
  unlock_args(_S) -> [].
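The paper does not show the surrounding module or the property itself. A plausible skeleton, following the usual eqc_statem pattern (the module name lock_eqc and the property name prop_lock are our assumptions), would be:

  %% Assumed wrapper module; only the property is QuickCheck boilerplate.
  -module(lock_eqc).
  -include_lib("eqc/include/eqc.hrl").
  -include_lib("eqc/include/eqc_statem.hrl").
  -compile(export_all).

  %% Generate a random command sequence from the callbacks above,
  %% run it against the real lock module, and check the result.
  prop_lock() ->
      ?FORALL(Cmds, commands(?MODULE),
              begin
                  {H, S, Res} = run_commands(?MODULE, Cmds),
                  pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok)
              end).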

This model is clearly insufficient for testing: as soon as we run QuickCheck, a weird test case is generated and the property fails. The error trace below shows the sequence run by QuickCheck and the exception thrown when the first element of the trace is run.

  Failed! After 1 tests.
  [{set,{var,1},{call,lock,stop,[]}},
   {set,{var,2},{call,lock,unlock,[]}},
   {set,{var,3},{call,lock,stop,[]}},
   {set,{var,4},{call,lock,lock,[]}},
   {set,{var,5},{call,lock,lock,[]}}]
  lock:stop() ->
    !!! {exception,{'EXIT',{noproc,{gen_fsm, ..., [locker, stop]}}}}

Elements such as {call,lock,stop,[]} denote an API call of function stop in module lock (the module under test), with no arguments (the empty list []). Elements such as {set,{var,1},...} make it possible for QuickCheck to symbolically record values returned by API calls, permitting tests to store them in an element of the state in order to construct tests where arguments to API calls are values returned by previous calls. This is, for example, very useful when a module under test spawns processes and returns their identifiers, or where later API calls are expected to use values returned by earlier calls.
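To make the symbolic-value mechanism concrete, a hypothetical variant of the model (not used in this paper) could record the pid returned by start and pass it back to a later call:

  %% Hypothetical sketch, assuming the record had a server field:
  %% Pid is symbolic ({var,N}) during generation and only bound to a
  %% real pid when the test is executed.
  start_next(S, Pid, []) -> S#state{server = Pid}.
  stop_args(S)           -> [S#state.server].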

Clearly, it makes no sense to stop a server before you start it, or to ask for the lock before you start it. The software under test behaves correctly: it should raise an exception when you try to perform operations on a server that has not been started. We could model that by adding a postcondition check that validates that the server indeed raises this exception. Adding postconditions, however, is not the right approach: we would generate more invalid tests than valid tests, and we would effectively be testing that the Erlang standard library correctly raises exceptions. This is not what we want to test; we want to test that the software behaves as expected when it is started. This can be solved by adding preconditions that exclude certain commands if the server is not started. As soon as we start the server, we update the state by returning a new value of it from the next-state function start_next in the model. In other words, the model state contains a flag reflecting whether the server is started, and the state data is passed as an argument S to the tester-written functions. The extended model is shown below, with the record state extended to contain the element started. The function initial_state constructs and returns an instance of this state, setting started to false using the expression #state{started=false}; for updates to a record, the value to be modified is placed before the #.

  -record(state,{started}).

  initial_state() -> #state{started = false}.

  start_pre(S)       -> not S#state.started.
  start_args(_S)     -> [].
  start_next(S,_,[]) -> S#state{started = true}.

  stop_pre(S)        -> S#state.started.
  stop_args(_S)      -> [].
  stop_next(S,_,[])  -> S#state{started = false}.

We add preconditions to lock and unlock stating that the system must be started, similar to the stop_pre precondition above. When we now generate tests, QuickCheck always selects the start command as the first command, since the other commands are disabled by their preconditions. The price for this is rather high: we add a field to our state record and have to update each command. The other price we pay is that it becomes more difficult to understand what kind of tests QuickCheck will generate, since we need to understand each precondition in order to know whether an API call can be included in a sequence or not. When we now run QuickCheck we find a number of interesting failing examples.

  lock:start() -> {ok, }
  lock:unlock() -> ok
  lock:unlock() ->
    !!! {exception,{'EXIT',{badarg, ....}}}

Apparently, we cannot unlock after starting. The second unlock error message suggests that the server died just before: the first call to unlock does return ok, but even so, the server does not survive the request. Similarly, it seems that we cannot lock the server once it is locked:

  lock:start() -> {ok, }
  lock:lock() -> ok
  lock:lock() -> ok
  lock:stop() ->
    !!! {exception,{'EXIT',{noproc, ... }}}

We could add postconditions to reflect this behaviour, but since the server no longer exists after calling the API functions in the wrong state, it makes little sense to have test cases that perform these actions. We would rather stop these

cases from being generated. We restrict our test case generation once more by adding preconditions; this time we add a field to the record to indicate whether the resource is locked or not.

  lock_pre(S)         -> S#state.started andalso not S#state.locked.
  lock_args(_S)       -> [].
  lock_next(S,_,[])   -> S#state{locked = true}.

  unlock_pre(S)       -> S#state.started andalso S#state.locked.
  unlock_args(_S)     -> [].
  unlock_next(S,_,[]) -> S#state{locked = false}.

This addition, however, is insufficient. Here we see a clear case where QuickCheck users would be helped by assistance in the process of adding preconditions. Because we add information to the state, we need to consider the consequences of this state data in all commands, even those that seem unrelated. This is demonstrated by running QuickCheck on the model with the modifications above:

  lock:start() -> {ok, }
  lock:lock() -> ok
  lock:stop() -> ok
  lock:start() -> {ok, }
  lock:unlock() -> ok
  lock:stop() ->
    !!! {exception,{'EXIT',{noproc,{gen_fsm, ... }}}}

If the server is stopped while it is locked, the model remembers the state as being locked, whereas the implementation clearly resets the locked state to unlocked when restarting. We need to refine our model by changing the next-state function of start to also reset the state from locked to unlocked; in fact, we reset it to the initial state and then update the field started to true.

  start_next(S,_,[]) ->
      (initial_state())#state{started = true}.

When we do that, the tests pass.

  OK, passed 100 tests
  36.1% {lock,start,0}
  31.4% {lock,stop,0}
  22.5% {lock,lock,0}
   9.9% {lock,unlock,0}
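Statistics like these are conventionally produced by instrumenting the property; a sketch under the same assumptions as before (aggregate and command_names are standard QuickCheck functions):

  %% Collect the name of every command in each generated sequence
  %% and print the distribution after testing.
  prop_lock() ->
      ?FORALL(Cmds, commands(?MODULE),
              begin
                  {H, S, Res} = run_commands(?MODULE, Cmds),
                  aggregate(command_names(Cmds),
                            pretty_commands(?MODULE, Cmds, {H, S, Res},
                                            Res == ok))
              end).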

The test statistics indicate that we test the unlock function in only 10 percent of all API calls. We return to this test distribution challenge later. One important observation is that we cannot see from the statistics whether any of the tests stop and start the server again, nor how often we actually lock and unlock repeatedly.

B. Reading and writing

Now that we have made sure we generate valid test cases, we add read and write operations. We do not want to restrict reading, but writing should only succeed if we hold the lock on the server; otherwise it should return not_locked. We add a new field kvs to the record to store the values for

the different resources. When we read a resource, we check that we received the right value. Clearly, we also need to add the common precondition that we can only read and write when the server has been started. The precondition for write does not check whether the server is locked, since we do want to exercise a write in the unlocked state and see that it actually returns an error.

  initial_state() ->
      #state{started = false, locked = false, kvs = []}.

  write_pre(S) -> S#state.started.
  write_args(_S) -> [key(), int()].
  write_next(S, _, [Key, Value]) ->
      S#state{kvs = [{Key,Value} |
                     proplists:delete(Key, S#state.kvs)]}.

  read_pre(S) -> S#state.started.
  read_args(_S) -> [key()].
  read_post(S, [Key], Res) ->
      eq(Res, proplists:get_value(Key, S#state.kvs)).

This addition does not model the behaviour correctly, as can be seen when we use the model for testing:

  lock:start() -> {ok, }
  lock:write(a, 0) -> not_locked
  lock:read(a) -> undefined

  Reason: undefined /= 0

We made a common mistake: updating the state data both when the operation succeeds and when it does not. We know that success depends on whether we are in the locked state or not, so we use the state data to determine whether the write is successful. The change we make to the model uses the state both in the computation of the next state data and in the postcondition for write.

  write_next(S, _, [Key, Value]) ->
      case S#state.locked of
          true  -> S#state{kvs = [{Key,Value} |
                                  proplists:delete(Key, S#state.kvs)]};
          false -> S
      end.

  write_post(S, [_,_], Res) ->
      case Res of
          ok         -> S#state.locked;
          not_locked -> not S#state.locked
      end.

With those changes, the software under test passes a hundred tests with this specification.

  OK, passed 100 tests
  24.8% {lock,start,0}
  19.3% {lock,read,1}
  19.2% {lock,write,2}
  18.9% {lock,stop,0}
  12.4% {lock,lock,0}
   5.4% {lock,unlock,0}

Again, we cannot directly see how often we have written in the locked state or whether we have tested a write in the unlocked state. In order to do so, we would have to collect

more statistics, for example using the features one can define in the state machine libraries, or by adding even more information to the state data and extracting it after the test has completed. The complete specification we derive in this way is half a page long; in a realistic industrial setting, specifications are easily 1000 to 2000 lines of code, including comments and helper functions.

III. LOCKING RESOURCES WITH EXPLICIT STATE TRANSITIONS

We will again develop a state machine specification for the same resource locker implementation, but now using the eqc_fsm library of QuickCheck. In this formalism we use named abstract states. In analogy with the model evolution in Sect. II, the first iteration only includes the start, stop, lock and unlock operations.

A. Start, stop, lock and unlock

In eqc_fsm the state is divided between state data and an abstract state. The initial state data is specified by a function initial_state_data(). The abstract state machine consists of Erlang functions, but when using graphical editing of this abstract state machine, it suffices to mention in the specification that the graph is stored in a separate file. The code we write is almost identical to the start we made in Sect. II. One difference, though, is that the argument generators for the API functions now take From and To states as input; these are the states of the abstract state machine. This allows the user to specify the command arguments depending on the transition taken in the abstract state machine. For this simple example, we only use this feature in Sect. III-C. When we compile this code, the compiler notices that the external fsm is missing and creates the default state machine, with one state and all defined transitions looping in that state. The state machine that we specify in this way is visualised in Fig. 1. All API calls are possible from the same state, by default called init.
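In the classic textual eqc_fsm notation, the generated default machine would plausibly look as follows (our rendering; the paper keeps this in a separate, generated file):

  %% One abstract state; every API call loops back to it.
  initial_state() -> init.

  init(_S) ->
      [{init, {call, lock, start,  []}},
       {init, {call, lock, stop,   []}},
       {init, {call, lock, lock,   []}},
       {init, {call, lock, unlock, []}}].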

Fig. 1. Visualization with all API calls in same state

In the same way as we have seen in Sect. II, in this model we see similar errors when we start testing, such as the exception {'EXIT',{badarg,...}}: we cannot unlock a resource before we have started the system. We now want to split the init state into two states, init and started. We do this graphically with our editing tool, written in JavaScript using the vis.js library [4] and running in a web browser. We only start in the init state and perform all other operations in the started state. Although the editing is done in the web browser, the corresponding Erlang code is automatically updated and compiled in an Erlang shell. We run a web server as part of QuickCheck to talk to the web browser graph editing tool. The resulting graph is depicted in Fig. 2.

Fig. 2. Splitting the initial state into init and started states

When we run QuickCheck on this model we get errors, unsurprisingly similar to those we got in Sect. II: the unlock API can cause a crash, observed by a command performed after the unlock call, viz.

  lock:start() -> {ok, }
  lock:unlock() -> ok
  lock:stop() ->
    !!! {exception, ...}

We cannot unlock before we have locked; to fix that, we again edit the graph and split the started state into two different states, an unlocked state and a locked state. Since we know the specification, we can immediately draw the right transitions for each state, arriving at the state machine depicted in Fig. 3.

Fig. 3. Splitting the started state into locked and unlocked states

The textual representation is updated behind the scenes and the code is automatically recompiled. The main advantage is

that we do not add many preconditions, but simply modify a simple finite state machine with states and transitions, making the specification clear and making changes much less error-prone.
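A plausible textual counterpart of Fig. 3, in the same assumed eqc_fsm notation as before (the actual file is maintained by the editor):

  initial_state() -> init.

  init(_S) ->
      [{unlocked, {call, lock, start, []}}].

  unlocked(_S) ->
      [{init,   {call, lock, stop, []}},
       {locked, {call, lock, lock, []}}].

  locked(_S) ->
      [{init,     {call, lock, stop,   []}},
       {unlocked, {call, lock, unlock, []}}].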

Without adding a single precondition, we end up with almost the same result as at the end of Sect. II-A. When we run QuickCheck, all tests pass.

  OK, passed 100 tests

The statistics tell us slightly more: for example, we can see that we called stop in both the locked and the unlocked state during testing, and that we only called unlock in the locked state. We get the coverage of the state machine as a result of testing, and we learn that we have actually exercised all possible transitions in the states where we expected them to be called. This information is fed back to the graph, in which we visualise the coverage of a test run as shown in Fig. 4.

Fig. 4. Visualizing the test statistics in the graph

The reason that all tests pass is that we do not check anything: there are no postconditions in our model! But merely by specifying a finite state machine, we have excluded all tests that violate the correct usage of the API functions. The API specification is done; now the logic for reading and writing must be modelled.

B. Reading and writing

To add the read and write calls to our state machine we need to perform two steps: we add the appropriate transitions to the state transition graph, and we add the corresponding generator functions (i.e., read_args and write_args) to the specification. The order of these two steps is irrelevant. The result of editing the graph is shown in Fig. 5.

Fig. 5. The final state graph

Our initial state data is simpler than the record type we used in Sect. II: it suffices to use a list of key-value pairs as state data. We continue with the same steps as before, adding the logic for read and write. Note that we have now specified that we cannot write in the unlocked state, which comes more or less naturally when using the graphical editor. Therefore, we only get the positive case of the write, not the error case. We add:

  write_args(_From, _To, _S) -> [key(), value()].
  write_next(_,_,S,_,[Key,Value]) ->
      [{Key,Value} | proplists:delete(Key, S)].
  write_post(_,_,_,_,R) -> eq(R, ok).

  read_args(_, _, _) -> [key()].
  read_post(_,_,S,[Key],Res) ->
      eq(Res, proplists:get_value(Key, S)).
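The generator value() is not shown in the paper; a plausible stand-in, consistent with the int() used in Sect. II:

  %% Assumed definition for the sketches in this section.
  value() -> int().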

We know from the state machine evolution in Sect. II that we had a problem with this model, since there we needed to reset the state when we restarted the server: the state data and the actual initial state were not connected. In the explicit modelling using eqc_fsm we get this reset for free; the semantics of visiting the initial state is that the initial state data is reset.

C. Complete model

The Erlang description of the constructed state machine is shown below; it additionally includes the negative test case that calls write in the unlocked state. This is possible because the models described in this paper are test models, developed with the intent to describe what is being tested and how to validate responses from an implementation, not to be a complete specification of the system under test. Compared to a complete specification, test models only include a subset of inputs. We add a write_next and a write_post to the specification.

  initial_state_data() -> [].

  start_args(_,_,_)  -> [].
  stop_args(_,_,_)   -> [].
  lock_args(_,_,_)   -> [].
  unlock_args(_,_,_) -> [].

  write_args(_,_,_) -> [key(), value()].
  write_next(locked,_,S,_,[Key,Value]) ->
      [{Key,Value} | proplists:delete(Key, S)];
  write_next(unlocked,_,S,_,[_,_]) -> S.
  write_post(locked,_,_,_,R)   -> eq(R, ok);
  write_post(unlocked,_,_,_,R) -> eq(R, not_locked).

  read_args(_, _, _) -> [key()].
  read_post(_,_,S,[Key],Res) ->
      eq(Res, proplists:get_value(Key, S)).

This second specification is half as long in number of lines and arguably easier to read, since it separates the logic of state transitions from the logic of modelling the key-value store. Of course, the corresponding picture is part of such a specification. Finite state machine pictures like this are well known to engineers, and they have little trouble understanding such formal specifications; reading expressions in preconditions is more difficult.

Fig. 6. The coverage stemming from updated weights

D. Statistics – steering the test distribution

When test statistics similar to those in Fig. 4 are obtained for the complete model, it turns out that the transition read is hardly ever invoked from the unlocked state. This is not surprising, because it can only be executed if we have called write in the locked state before. In order to make read run more often, it is possible to adjust so-called weights, which determine how hard QuickCheck attempts to run a specific transition as part of a randomly generated walk. By switching the graphical editor into the coverage and weights mode, it is possible to see that all transitions have the same weight. In order to make read run more often from the unlocked state, we have to increase its weight as well as the weights of write and unlock from the locked state. After re-running the tests, the diagram shown in Fig. 6 clearly shows that the transition of interest accounts for 11% of all executed transitions, compared to 2% previously. If the weights of these three transitions are increased very significantly (such as to 10), it is possible that the transition read from the locked state will never be invoked and hence be shown in red on the diagram. Obtaining the desired frequency of transition invocation usually requires a few attempts at adjusting weights, made significantly easier when the outcome is immediately visible.
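In textual form, such weights can plausibly be expressed with eqc_fsm's weight callback (our sketch; the signature weight(From, To, Call) is assumed from the eqc_fsm documentation):

  %% Favour the three transitions singled out above; everything
  %% else keeps the default weight of 1.
  weight(unlocked, unlocked, {call, lock, read,   _}) -> 3;
  weight(locked,   locked,   {call, lock, write,  _}) -> 3;
  weight(locked,   unlocked, {call, lock, unlock, _}) -> 3;
  weight(_From, _To, _Call)                           -> 1.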

IV. RELATED WORK

The separation between the graphical description of a state machine and the Erlang code that defines generators of test data is close to what is known as X-machines [5], as well as to Extended Finite-State Machines (EFSM). In both cases there is a clear separation between the graphical representation and the description of operations, unlike in many other notations such as UML state machines. QuickCheck is provided with a test model and a series of weights to prioritise its exploration. In this way, it is similar to existing approaches to test-purpose-driven EFSM-based test generation such as [6], [7]. The main difference is the focus of the described work on test models written in Erlang. QuickCheck has a number of other features that are not covered in this paper: 'shrinking' is an automatic process that reduces the length of a failure trace in order to help a tester locate a fault. In addition, QuickCheck associates symbolic values with those returned by a system under test; this behaviour is similar to the object binding between a model and an implementation in [6].

V. CONCLUSIONS AND FUTURE WORK

This paper shows how a state machine diagram can complement the description of a test model written in QuickCheck. It shows that by using a graphical notation to define permitted sequences of calls, one can easily define and maintain a test model, writing less than half the Erlang code that would otherwise be needed, while making the model easier to understand. Future work involves experimental quantification of the effectiveness of the described tool, in which a number of developers are given a test model maintenance problem, with half of the developers working with pure Erlang models and the other half using the described graphical notation.

ACKNOWLEDGEMENTS

This project has received funding from the EU FP7 Collaborative project PROWESS, grant number 317820, http://www.prowessproject.eu

REFERENCES

[1] T. Arts, J. Hughes, J. Johansson, and U. T. Wiger, "Testing telecoms software with Quviq QuickCheck," in Erlang Workshop, 2006, pp. 2–10.
[2] J. Hughes, "QuickCheck testing for fun and profit," in Practical Aspects of Declarative Languages, ser. Lecture Notes in Computer Science, vol. 4354, M. Hanus, Ed. Springer Berlin Heidelberg, 2007, pp. 1–32. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-69611-7_1
[3] K. Claessen and J. Hughes, "QuickCheck: a lightweight tool for random testing of Haskell programs," in Proceedings of the ACM SIGPLAN International Conference on Functional Programming, 2000, pp. 268–279.
[4] Almende B.V., "vis.js," http://visjs.org/, 2015.
[5] M. Holcombe and F. Ipate, Correct Systems: Building a Business Process Solution. Springer-Verlag Berlin and Heidelberg GmbH & Co. KG, Sep. 1998.
[6] M. Veanes, C. Campbell, W. Grieskamp, W. Schulte, N. Tillmann, and L. Nachmanson, "Model-based testing of object-oriented reactive systems with Spec Explorer," in Formal Methods and Testing, vol. 4949. Springer Verlag, 2008, pp. 39–76.
[7] C. Artho, A. Biere, M. Hagiya, M. Seidl, E. Platon, Y. Tanabe, and M. Yamamoto, "Modbat: A model-based API tester for event-driven systems," in 9th Haifa Verification Conference (HVC), Israel, 2013.