Initial steps towards assessing the usability of a verification tool

Mansur Khazeev, Victor Rivera and Manuel Mazzara
Innopolis University, Software Engineering Lab, Innopolis, Russia
Email: [email protected], [email protected], [email protected]
Abstract—In this paper we report on the experience of using AutoProof to statically verify a small object-oriented program. We identify the problems that emerged from this activity and classify them according to their nature. In particular, we distinguish between tool-related and methodology-related issues, and propose the changes necessary to simplify both the tool and the method.
I. INTRODUCTION

Formal proof of software correctness is still not common practice nowadays, even though hardware and software verification technologies have improved significantly since the idea of a verifying compiler, an integrated set of tools checking correctness in a very broad sense, was first proposed [1]. In an ideal world, verifying software would require nothing more than pushing a button. Provers of this kind exist, but they can verify only simple or implicit properties, such as the absence of invalid pointer dereferences [2]. To fully verify a piece of software, however, a specification must be provided against which it is verified, such as the contracts of the Design-by-Contract methodology. The term was introduced in connection with the design of the Eiffel programming language, but the methodology has since been adopted by many other popular languages. AutoProof is a prover for the functional correctness of programs written in the Eiffel language. Equipped with a powerful methodology for framing and class invariants, it fully supports advanced object-oriented features [2]. In order to test the usability of the tool and its applicability in general practice, a series of case studies is being conducted: verification of a) an ordinary class; b) a subset of classes of the Tokeneer system [3]; and c) the entire Tokeneer system, all carried out by a non-expert in the field. This paper describes the results of the first step: the verification of a class MY_SET, which implements classic sets from set theory together with their classical operations. The challenge of this exercise is mainly related to the difficulties of using the tool. There is no comprehensive documentation available to users: only the website as the main source, plus several papers from the authors of the tool. However, the notation has been evolving, and in some cases these papers are no longer up to date.
As previously mentioned, verification with AutoProof often requires additional annotations that help the tool derive more complex properties from trivial ones. However, for someone who does not know how the tool works and what is going on under the hood, the feedback from the tool is not always helpful. Of course, for some specialized tool, which is
to be used by a narrow group of scientists, this is excusable; but since the main purpose is to make static verification applicable in industrial practice, complete documentation should be developed while, in the meantime, minimizing the need for additional assertions. This is essential because the tool still requires knowledge of the underlying mechanisms and a number of extra annotations.

II. AUTOPROOF

AutoProof [4] is a static verifier for programs written in Eiffel, a full-fledged object-oriented programming language that natively supports the Design-by-Contract methodology. Users can specify the behavior of Eiffel classes by equipping them with contracts: pre- and postconditions and class invariants, expressed as assertions [3]. This makes it possible to minimize the amount of specification and the effort needed to reason about program properties. AutoProof follows the auto-active paradigm, in which verification is carried out completely automatically, similarly to model checking [5], but users are expected to supply the classes with additional information in the form of annotations to help the proof. The tool is capable of identifying software issues without executing the code, thereby opening a new frontier for "static debugging", software verification and reliability, and software quality in general. AutoProof verifies the functional correctness of Eiffel code equipped with contracts. The tool checks that routines satisfy their pre- and postconditions, that class invariants are maintained, that loops and recursive calls terminate, that integers do not overflow, and that no calls are made on Void references (null in other programming languages). To do so, AutoProof uses an intermediate verification language: it translates the Eiffel code and feeds it to a tool called Boogie [6]. The Boogie tool generates verification conditions (logic formulas whose validity entails the correctness of the input programs), which are then passed to the SMT solver Z3.
Finally, the answer is propagated back to Eiffel. The tool supports most Eiffel language constructs: inlined assertions such as check (assert in other programming languages), types, multiple inheritance, and polymorphism. By default, AutoProof only verifies user-written classes when a program is verified; referenced libraries should be verified separately, or the program should be based on the preverified base library of EVE [7]. This preverified library contains most of the abstract classes; their names start with the prefix V, meaning "verified" [3].
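As a minimal illustration of the kind of contracts AutoProof checks, consider the following sketch of a hypothetical class (not part of the case study; names are illustrative):

```eiffel
class ACCOUNT
feature
    balance: INTEGER

    deposit (amount: INTEGER)
            -- Add `amount' to the balance.
        require
            positive_amount: amount > 0
        do
            balance := balance + amount
        ensure
            balance_increased: balance = old balance + amount
        end
invariant
    non_negative_balance: balance >= 0
end
```

For every call of deposit satisfying the precondition, AutoProof attempts to prove statically that the postcondition holds and that the class invariant is preserved, without ever executing the code.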
III. CASE STUDY EXPERIENCE

This case study is a small exercise: implementing in the Eiffel programming language a generic class MY_SET using the V_LINKED_LIST class and equipping it with contracts. In the next step, this class had to be proved correct, adding all the needed annotations. Essentially, the exercise was about expressing some properties of a set as invariants:
• no duplicate elements;
• the order of elements in the set does not matter;
• the cardinality is always greater than or equal to 0;
and implementing some basic operations:
• is_empty - whether the set has no elements in it;
• cardinality - the number of elements in the set;
• has - whether the set contains some specific element;
• is_strict_subset, is_subset - whether the set is a subset of another set;
• union/intersection/difference - functions returning a new set built from two others.
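Under these requirements, the skeleton of the class could look roughly as follows (an illustrative sketch, not the exact case-study code; feature bodies are elided):

```eiffel
class MY_SET [G]
feature -- Queries
    cardinality: INTEGER
            -- Number of elements in the set.

    is_empty: BOOLEAN
            -- Does the set contain no elements?
        do
            Result := cardinality = 0
        ensure
            definition: Result = (cardinality = 0)
        end

    has (x: G): BOOLEAN
            -- Does the set contain `x'?
        do
            ...
        end
invariant
    non_negative_cardinality: cardinality >= 0
end
```

The "no duplicates" and "order does not matter" properties are the harder part: they constrain the underlying list representation, and expressing them is where most of the annotation effort goes.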
During the verification process it turned out that, for non-expert users, working with the V classes was too complicated; therefore the decision was made to simplify the example by replacing V_LINKED_LIST with SIMPLE_LIST.

IV. PROBLEMS TAXONOMY

Although the class under study was rather simple, many different problems were encountered because of the lack of prior experience with the AutoProof tool, beginning with issues installing the tool and continuing up to checking the verified class with tests. While analyzing the results, all these problems were divided into two main categories: problems with the tool itself, and problems with the approaches or methodologies the tool uses. The first category includes rather minor problems and bugs, mostly related to the particular implementation of the tool, which require only local fixes. The second category, however, requires improving the methodologies or replacing them with alternative ones.

A. Problems with the tool

The challenge of this exercise was mainly related to the fact that it had to be carried out by a person with no previous experience with AutoProof or any similar tool. The difficulty does not lie in a sophisticated user interface (UI): the UI is, in fact, rather simple - a "Verify" button and a table where the results are displayed (see figure 1). The main obstacle is that the tool expects input in the form of assertions, and it is not always clear what the real effect of each input is.
Fig. 1. UI of AutoProof
1) Lack of documentation: As previously mentioned, the tool requires additional annotations that guide the verification and help derive one property from another. Although AutoProof uses the syntax of the Eiffel language, many new constructs have been introduced by the developers of the tool. Most of these are briefly described in an on-line manual available on the website of EVE1. In addition, there is an online tutorial, which is useful for a quick acquaintance with the tool. However, this is surely not enough to serve as a reference while working with the tool. Overall, there is not much documentation available on the Internet: the website and several papers from the authors of the AutoProof tool, but the notation in some of these papers is no longer current. Of course, for a specialized tool intended for a narrow group of scientists this would be excusable, but if we are talking about applying verification in industrial practice, documentation is essential, because the tool still requires quite a few additional annotations and knowledge of the underlying mechanisms.

2) Feedback from the tool: The verification process (or static debugging) starts with pushing the "Verify" button; when the tool returns some feedback (error and failure messages), the user fixes the reported issues by adding the required assertions and repeats the procedure until everything is "successful". A failure message describes a property that could not be satisfied. An error message reports some issue with the input. AutoProof implements a collection of heuristics known as two-step verification that helps discern whether failed verification is due to real errors or just to insufficiently detailed annotations [2]. The failure messages are usually informative: they describe the property and sometimes suggest what the reason might be.
The error messages, however, usually say no more than that the tool could not process the input it received. This occurs due to problems in "translation": if there is a failure during the translation between AutoProof and Boogie, the verification process stops and AutoProof returns an error message about an "internal failure", in some cases with no additional information. Usually this can be resolved only if the error was caused by a newly entered assertion, because otherwise it is not possible to learn what exactly causes the error. This may become an issue when verifying a class with all features implemented and contracts stated, because it is not possible to determine the source of the error. A workaround is to add the features one by one, verifying each one separately.

1 EVE (Eiffel Verification Environment) is a development environment integrating formal verification with the standard development tools.

3) Redundancy in notations: AutoProof supports most of the Eiffel language as used in practice [2], and also introduces some new notations to support the methodologies used for verification. Usually these notations are useful for decreasing the amount of assertions by skipping repetitions, or for manipulating the defaults of semantic collaboration in a feature or a class (discussed in IV-B1). However, some of these additional notations introduce redundancy. For example, an Eiffel class containing one or more creation procedures must list these procedures explicitly under the keyword create. In figure 2, make is defined as a creation procedure. However, the verifier additionally expects a note clause with status: creator in order to treat it as a creation procedure.

create
    make

feature
    make
        note
            status: creator
        do
            ...

Fig. 2. Denoting the procedure make as a creation procedure
Another example is depicted in figure 3, where the status of a function has to be declared with the impure keyword, meaning that it modifies the state of the object, and then a modify clause with empty square brackets, which stands for function purity. This is done just to be able to use wrap/unwrap in the function.

4) Misleading notations: AutoProof supports inline assertions and assumptions, which can be expressed using the check clause. Checks are intermediate assertions used during the debugging process to check whether you have the same understanding of the state at a program point as the verifier [8]. However, removing an intermediate check clause from a successfully verified feature might make later verification fail, because check assertions guide the verifier towards successful verification. Perhaps the developers of the tool decided not to add a new keyword and to reuse the check clause from Eiffel in order to keep things simple. However, this might give users a wrong understanding of the impact of the clause.

5) Order of assertions: The check clauses can be useful because the verifier not only checks the enclosed property, but also uses it in further derivations, once it has been proven correct. The same applies to invariants, which makes the order of properties significant for the tool. Essentially, this means that properties are joined not by the and operator but by and then, which may lead to verification failures even when all the needed properties are stated, but in an improper order.
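Both observations can be illustrated with a sketch (attribute names are illustrative, not from the case-study code):

```eiffel
some_feature
    do
        ...
        -- Intermediate hint: once proven, the verifier reuses it in
        -- later derivations; removing it may make verification fail.
        check data_owned: owns = [data] end
        ...
    end

invariant
    -- Invariants are joined by `and then', so the order matters:
    -- `data.count' may only be evaluated after `data /= Void'.
    data_attached: data /= Void
    count_non_negative: data.count >= 0
```

Reversing the two invariant clauses above would make the second one refer to data before its attachment is established, which can cause a verification failure even though both properties are stated.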
6) Limitations of the tool: Currently, AutoProof has some limitations on the use of the verified library eve-base. Even though this library is fully verified, it can only be used when the void-safety check of the tool is disabled. The latest works report that this library has been checked for void safety, but these versions are not available yet.

7) UI bugs: The tool lacks polish, which can be observed in some rare bugs. For example, it may skip some of the features of a class, or verify only one feature instead of the whole class. Even though the tool never returned an improper successful verification result, this kind of bug can be annoying.

8) Compilation from the sources: The tool is still raw, and it is better not to use the latest versions. The repository requires some clean-up: currently, if one tries to build the verifier from the latest sources, it simply will not compile.

B. Methodology: problems with semantic collaboration

AutoProof supports advanced object-oriented features and a powerful methodology for framing and class invariants, which make it applicable in practice to idiomatic object-oriented patterns [2]. But this power comes at the price of simplicity: the tool requires an understanding of all the methodologies under the hood. This makes the tool accessible to expert users only, by overcomplicating the verification of even relatively simple classes.

1) Semantic collaboration: AutoProof supports semantic collaboration, a full-fledged framing methodology designed for reasoning about class invariants of structures made of collaborating objects [2]. This methodology introduces its own annotations, which do not exist in the Eiffel language. The annotations are used to equip features and entire classes with additional information that is processed by the verifier. These include many "ghost" attributes2, which are quite useful when maintenance of global consistency is required, as in the subject/observer or iterator pattern examples [9].
Meanwhile, all these ghost attributes and the default assertions that are added to pre- and postconditions unreasonably overcomplicate the verification of rather simple classes. In the earlier stages of verification, quite a bit of time was spent trying to understand the failure message "default is_closed may be violated on calling some feature" for a private attribute data. Essentially, the tool was expecting owns = [data] in the invariants of the class, which is not obvious without understanding the methodology. Moreover, for this specific example the property could be derived from the export status of the attribute. The verifier ignores this useful information and requires the properties to be stated explicitly. The Eiffel language supports the notion of "selective export", which exports the features that follow to specific classes and their descendants [10]. Taking this information into account might reduce the need for semantic collaboration [11].

2) Framing: The framing model is used in AutoProof to help reason about which objects are and are not allowed to be updated. There are different ways to specify this by adding modify clauses in preconditions: one can specify one or more model fields, attributes of the class, or a list of objects that may be updated. This is rather intuitive and straightforward, although it seems more relevant to postcondition clauses. Default clauses are included in each routine, so modify should be used only if the behavior of a routine differs from the default. In the MY_SET example all the routines had to be pure, and because of that a modify clause had to be specified in each of them. Even in a function, which is pure by default, if one wishes to use the is_wrapped clause, that function needs to be specified as impure and, at the same time, as not modifying anything (see figure 3), which looks confusing.

feature -- Queries
    union (other: like Current): like Current
            -- New set of values contained in `Current' or `other'
        note
            status: impure
        require
            modify_nothing: modify([])
        ...

Fig. 3. A pure function specified as impure

2 Class members used only in specifications.
V. RELATED WORK

Formal notations to specify and verify software systems have existed for a long time. A survey of the major approaches can be found in [12], while [13] discusses the most common methodological issues of such approaches. Design-by-Contract [14], combined with static verification technologies, could offer wide applicability in practice. Nowadays, many popular programming languages support this methodology through embedded contract languages [15], and some are expected to support it at the language level soon. In [16] the authors present an extensive survey of algorithms for automatic static analysis of software. The techniques discussed there (static analysis with abstract domains, model checking, and bounded model checking) are different from, but complementary to, the one discussed in this paper, and they are also able to detect programming errors or prove their absence. The importance of focusing on usability requirements for verification tools has been identified in [17]. In particular, the non-expert usage of AutoProof has been studied in [18], where programmers with little formal-methods experience were exposed to the tool.

VI. CONCLUSION

AutoProof is not trivial to use and requires detailed knowledge of what is going on behind the scenes. The tool requires a number of additional assertions in pre- and postconditions, as well as in invariants, for successful verification, while ignoring some information that has already been provided. To be usable in practice, the usability of the tool should be significantly improved, making verification simple enough for ordinary programmers. By simple we mean that it should:
• require fewer additional annotations, by automatically deriving properties from information that is currently neglected, removing redundant clauses, and reworking some of the ghost class members;
• provide clearer feedback when some property cannot be satisfied, offering hints and possible solutions.
In addition, it is important to:
• develop documentation describing all the methodologies used, including detailed information about the notations, with examples;
• clean up and rebuild the tool from the latest sources available in the EVE repository, fixing all the bugs that we identified.
REFERENCES

[1] J. Woodcock, E. G. Aydal, and R. Chapman, The Tokeneer Experiments, pp. 405-430. London: Springer London, 2010.
[2] J. Tschannen, C. A. Furia, M. Nordio, and N. Polikarpova, "AutoProof: Auto-active functional verification of object-oriented programs," CoRR, vol. abs/1501.03063, 2015.
[3] M. Khazeev, V. Rivera, M. Mazzara, and A. Tchitchigin, "Usability of AutoProof: a case study of software verification," in Proceedings of the Institute for System Programming, vol. 28, issue 2, pp. 111-126, 2016.
[4] J. Tschannen, C. A. Furia, M. Nordio, and N. Polikarpova, "AutoProof: Auto-active functional verification of object-oriented programs," in 21st International Conference, TACAS 2015, London, UK, April 11-18, 2015, Proceedings, pp. 566-580, 2015.
[5] E. M. Clarke, Jr., O. Grumberg, and D. A. Peled, Model Checking. Cambridge, MA, USA: MIT Press, 1999.
[6] K. R. M. Leino, "This is Boogie 2," tech. rep., June 2008.
[7] "AutoProof manual." [Online] http://se.inf.ethz.ch/research/autoproof/manual/ (Date last accessed: 2016-03-15).
[8] ETH Zurich, Chair of Software Engineering, "AutoProof tutorial."
[9] N. Polikarpova, J. Tschannen, C. A. Furia, and B. Meyer, Flexible Invariants through Semantic Collaboration, pp. 514-530. Cham: Springer International Publishing, 2014.
[10] B. Meyer, Touch of Class: Learning to Program Well with Objects and Contracts. Springer Publishing Company, Incorporated, 1st ed., 2009.
[11] D. de Carvalho, "Modularly reasoning in object-oriented programming using export status." Unpublished, 2017.
[12] M. Mazzara and A. Bhattacharyya, "On modelling and analysis of dynamic reconfiguration of dependable real-time systems," in 2010 Third International Conference on Dependability, pp. 173-181, 2010.
[13] M. Mazzara, "Deriving specifications of dependable systems: toward a method," in Proceedings of the 12th European Workshop on Dependable Computing, EWDC, 2009.
[14] B. Meyer, Object-Oriented Software Construction, ch. 11: Design by Contract: building reliable software. Prentice Hall PTR, 1997.
[15] M. Fähndrich, M. Barnett, and F. Logozzo, "Embedded contract languages," in Proceedings of the 2010 ACM Symposium on Applied Computing, SAC '10, (New York, NY, USA), pp. 2103-2110, ACM, 2010.
[16] V. D'Silva, D. Kroening, and G. Weissenbacher, "A survey of automated techniques for formal software verification," IEEE Trans. on CAD of Integrated Circuits and Systems, no. 7, pp. 1165-1178.
[17] R. Razali and P. Garratt, "Usability requirements of formal verification tools: A survey," Journal of Computer Science, no. 6, pp. 1189-1198.
[18] C. A. Furia, C. M. Poskitt, and J. Tschannen, "The AutoProof verifier: Usability by non-experts and on standard code," in Proceedings of the 2nd Workshop on Formal Integrated Development Environment (F-IDE) (C. Dubois, P. Masci, and D. Mery, eds.), vol. 187 of Electronic Proceedings in Theoretical Computer Science, pp. 42-55, EPTCS, June 2015. Workshop co-located with FM 2015.