Generating Invariants for Linked Heap structures

Vajihollah Montaghami
April 25, 2011

Abstract

Verifying programs that manipulate dynamic linked structures in the heap is a complicated procedure. In this project, a technique for generating invariants from abstraction functions is proposed. The goal is to improve programmer productivity by producing higher-quality programs with less effort. I built a library of linked data structures that have been specified by professionals, and translated the specifications in that library into JFSL/Alloy, which has a simple syntax and is more concise for a variety of things that one wants to say about linked structures. Given the abstraction functions, the technique generates a list of candidate invariants from the class invariants in this library. These candidates are checked by JForge against the abstraction function and the implementation code. Compared to other methods, the results show that fewer lines of specification are required to specify the invariants and verify the system.

1 Introduction

To verify program blocks, their invariants have to be clearly determined. Because formal specification is time-consuming and requires more knowledge than other programming activities, many practitioners do not bother to specify such statements explicitly. For example, the specification of a binary search tree in Jahob [6] amounts to more than seventy percent of the source code. In [8], D. Rayside et al. developed techniques to specify abstraction functions and obtain synthesized code for tree-like heap data structures. Given the abstraction function specification, I derive invariants that can be examined by a checker, so that the programmer can verify whether or not they are consistent with what she intended. To this end, I created a reliable library of linked data structures that have been specified by professionals. Then, by looking at the abstraction functions and class invariants in the library, a list of candidate invariants is guessed. Inevitably, a subset of the candidates does not correspond to the specification. The JForge [9] verification system is used to check which candidates are consistent with the program specification and code. To build the specification library, I use JFSL, an Alloy-based specification language. Since Alloy is based entirely on relational logic, its syntax is simple and concise for use cases such as specifying the abstraction functions and invariants of linked data structures.


Unlike other invariant generation methods, this method relies on a loose generation algorithm. Instead of generating the proper invariants directly, it generates a set of candidate invariants. The candidates are derived from the abstraction function, which is written in a relational logic specification language such as Alloy. The abstraction function describes how the ADT's information can be viewed from outside. Since it only describes how the ADT's information is exposed to clients, the abstraction function alone is not enough to verify the correctness of the implementation. One step toward verifying the implementation is to define the proper domain of the abstraction function, that is, the invariants over the concrete variables. This means that if the implementation always satisfies the associated invariants, the abstract specification fields always hold the abstract state of the system. The candidates can be regarded as guesses that might be accepted or rejected later. For example, provided F1 is an abstraction function of a tree, it can be guessed that the right and left pointers are not allowed to point to the same node. As the candidate generation process does not consider the implementation, some irrelevant invariants might be included.

The next section describes the required background, which is mostly about Alloy and JFSL/JForge, and illustrates the concepts with an example. Then, the candidate invariant generation procedure and the related software architecture are explained in detail. Because of their important role, the invariant generation rules are explained in a separate section. Related work is discussed next, and the conclusion and future work are discussed in the last section.

2 Background

2.1 Alloy

Alloy is an object-modeling language based on the combination of first-order logic and set theory. It can be used to create a lightweight model specification of a system. Lightweight here means that it offers the advantages of traditional formal methods at a lower initial investment [4]. A benefit of lightweight modeling is that the model can be developed incrementally by focusing on the most important demands and the risky areas [?]. Because it relies on a complete and formal semantics, Alloy makes automatic model analysis possible, and the language has been developed together with an analyzer tool. The Alloy Analyzer converts an Alloy model into a boolean formula and solves it using a SAT solver. The conversion is done in two steps: translating the Alloy syntax into an Intermediate Representation (IR), and building the boolean formula from the IR. To help the reader become familiar with the Alloy language, an example is provided later in this section. Before going through the example, however, some concepts have to be introduced.

All Alloy structures are built upon atoms and relations. An atom is an elementary entity that is indivisible, immutable, and uninterpreted. As an atom has no implicit meaning, Alloy uses relations to capture further properties or meaning. A relation is a structure that relates atoms. It can be seen as a set of tuples, each containing a sequence of atoms. A relation can also be regarded as a table in which each row represents a tuple. The order of the columns matters, but the order of the rows does not. Relations are typed, and all atoms in the same column have the same type. The number of columns is called the arity. The minimum arity is one; such a relation is called unary, or a set [?].

An Alloy model represents atoms and relations as data, facts, and assertions. Data are sets and relations. A fact is a formula that restricts the data. An assertion is a formula to be checked against the data. Such formulas are defined using first-order logic and can be structured by predicates and functions. To check an assertion, Alloy builds a boolean formula by conjoining all facts with the negation of the assertion. If sets or relations are found that satisfy this conjunction, they are reported as a counterexample to the assertion.
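The concepts above can be illustrated with a minimal Python sketch. It is not how the Alloy Analyzer works internally (the analyzer translates to a boolean formula and uses a SAT solver); here relations are simply Python sets of tuples, and a counterexample is searched for among a handful of explicitly given instances. All names (join, arity, fact, assertion, counterexample) are hypothetical and chosen only for this illustration.

```python
# Relations modeled as Python sets of tuples; every name here is illustrative.

def join(p, q):
    """Dot join: glue tuples of p and q where p's last column equals q's first."""
    return {a[:-1] + b[1:] for a in p for b in q if a[-1] == b[0]}

def arity(rel):
    """Number of columns; all tuples of one relation have the same length."""
    return len(next(iter(rel))) if rel else 0

# A unary relation (a set of atoms) and two binary relations over those atoms.
node = {("N0",), ("N1",), ("N2",)}
left = {("N0", "N1")}
right = {("N0", "N2")}

def fact(l, r):
    """A fact restricts the data: left and right share no tuple."""
    return not (l & r)

def assertion(l, r):
    """An assertion to check: N0 is nobody's child."""
    children = join(node, l | r)
    return ("N0",) not in children

def counterexample(instances):
    """Return the first instance that satisfies the fact and the NEGATED assertion."""
    for l, r in instances:
        if fact(l, r) and not assertion(l, r):
            return (l, r)
    return None
```

Calling counterexample with a list of (left, right) pairs mirrors the check described above: an instance is reported only if it satisfies every fact while violating the assertion.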

2.2 JForge & JFSL

The JForge Specification Language (JFSL) is a lightweight specification language for object-oriented languages such as Java. The language is based on the Alloy relational logic. Together with a tool called JForge, JFSL is suitable for bounded verification analysis. Because JForge is based on bounded verification, the logic becomes decidable. It can therefore provide a push-button interface: it finds counterexample traces and converts them to an initial program state that is understandable for the programmer. JFSL provides a collection of Java 5 annotations to integrate the specification into the Java source code. The contents of the annotations are essentially Alloy expressions and formulae. Yessenov defined ten major annotations in his master's thesis [9].

To specify the abstraction function and check the conjectured invariants, I use @SpecField and @Invariant. As in D. Rayside's other work [8], the programmer deals only with @SpecField, which specifies the abstraction function and relates the concrete fields to the abstract ones. The @Invariant annotation is transparent from the programmer's view and is declared automatically when the candidate invariants are about to be checked by JForge. The user does not touch it directly. She only sees the result, which consists of relevant, irrelevant, and semi-relevant invariant expressions.

The general form of the @SpecField annotation is:

@SpecField("f : r from g | α")

where f is the name of the specification field. The namespace of specification field names is the same as that of the concrete fields. Specification fields are public, which means that any specification expression has access to them. r determines the multiplicity of f's range, and g is f's data group. Finally, α is the Alloy-based abstraction function expression. As D. Rayside mentioned in [8], g is not applicable to the abstraction function specification and is ignored in the following formulae.

I am mostly interested in α, the abstraction function part, which is declared by a JFSL formula. Generally, the abstraction function can have either the form this.f = e or the form p(e1, ..., en, this.f). In the former, e is an Alloy/JFSL expression that relates concrete values to the abstract one. In the latter, p is a logical predicate and each ei is a JFSL expression. For the programmer's convenience, D. Rayside [8] has provided a library of useful predicates that mainly deal with sequences.
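As a rough illustration of the annotation format f : r from g | α, the following Python sketch splits a specification string into its four parts. It is only a sketch under stated assumptions: the regular expression, the SpecField tuple, and parse_spec_field are hypothetical helpers written for this example and are not part of JFSL or JForge, and the sample strings are made up to match the general form given above.

```python
import re
from typing import NamedTuple, Optional

class SpecField(NamedTuple):
    name: str                  # f: the specification field name
    multiplicity: str          # r: multiplicity and type of f's range
    data_group: Optional[str]  # g: optional data group (ignored in this paper)
    abstraction: str           # alpha: the abstraction function formula

# Hypothetical pattern for "f : r from g | alpha"; the "from g" part is optional.
_PATTERN = re.compile(
    r"^\s*(?P<name>\w+)\s*:\s*(?P<mult>.+?)"
    r"(?:\s+from\s+(?P<group>[\w.,\s]+?))?\s*\|\s*(?P<af>.+?)\s*$"
)

def parse_spec_field(text):
    """Split a specification string into name, multiplicity, data group, formula."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("not of the form 'f : r from g | alpha': " + text)
    return SpecField(match["name"], match["mult"], match["group"], match["af"])
```

For a string such as "entries : set BTree | this.entries = this.root.*(left+right)", the sketch yields the field name entries, the multiplicity set BTree, no data group, and the formula after the bar.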

2.3 JForge

2.4 Example

In this section, I provide an example to illustrate JFSL/Alloy and the concept of an abstraction function. Suppose that we want to find the invariants for the binary search tree shown in Figure 1. An object of class BTree represents a binary search tree. The root field is a sentinel reference to the tree's root. The value field denotes the primitive integer value stored in each node object. The left field points to the left-hand child node and the right field points to the right-hand child node. In this example, the root, left, and right references all have the same type, BTree. A group of methods manipulates these fields to create, insert, and delete the tree's nodes. Due to lack of space, the implementation of the methods has been omitted.

Figure 1: Sample Code

The abstraction function for the BTree class is written in the @SpecField annotation above the class name. The abstract field is "entries", its multiplicity is set, and its type is BTree, the type of the data structure. The abstraction function, in its assignment-like equation form, relates the concrete values to the abstract one, "this.entries". The Alloy expression this.root.*(left + right) declares the connection between the nodes through the concrete fields root, left, and right. Literally, it means that all nodes are accessible from the root node by following left or right references. Recall from Alloy that the join, or more specifically the dot join, of relations p and q is the relation obtained by taking every combination of a tuple in p and a tuple in q and including their join, if it exists. The '.*' operator is a reflexive join, which joins the left-hand side relation with the reflexive-transitive closure of the right-hand side. The '+' operator is the set union of the relations on both sides. Such an expression is only a general abstraction function that relates concrete values to abstract values. It determines the general structure of the heap-based data structure and remains silent about other constraints, such as whether the structure is cyclic or acyclic, its size, incoming edges, and shared edges. Those constraints are provided by another concept, which is called an invariant.

Table 1: Abstraction Function Syntax Rules

AFExpr     = UnaryRoot UnaryExpr
           | CallName Param

UnaryExpr  = this
           | null
           | UnaryExpr '+' UnaryExpr
           | UnaryExpr '-' UnaryExpr
           | UnaryExpr '.' BinaryExpr
           | UnaryExpr '.*' BinaryExpr
           | UnaryExpr '.^' BinaryExpr

BinaryExpr = FieldExpr FieldName
           | UnaryExpr '+' UnaryExpr
           | UnaryExpr '-' UnaryExpr
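To make the meaning of this.root.*(left + right) concrete, here is a minimal Python sketch that evaluates it on a small hand-built heap. The heap contents (tree, n5, n3, n1, n8) and the function names are assumptions made up for this illustration; binary relations are again represented as sets of pairs.

```python
def reachable_from(start, edges):
    """Atoms reachable from `start` in zero or more steps along `edges`,
    i.e. the image of `start` under the reflexive-transitive closure."""
    reached = set(start)
    frontier = list(start)
    while frontier:
        src = frontier.pop()
        for a, b in edges:
            if a == src and b not in reached:
                reached.add(b)
                frontier.append(b)
    return reached

def entries(this, root, left, right):
    """Evaluate this.root.*(left + right) for the object `this`."""
    start = {b for a, b in root if a == this}     # this.root
    return reachable_from(start, left | right)    # .*(left + right)

# A hypothetical heap: one tree object whose root is n5.
root = {("tree", "n5")}
left = {("n5", "n3"), ("n3", "n1")}
right = {("n5", "n8")}

tree_entries = entries("tree", root, left, right)

# Adding a back edge makes the heap cyclic, yet the abstract value is unchanged.
cyclic_entries = entries("tree", root, left, right | {("n1", "n5")})
```

Both evaluations produce the same set of nodes, which shows why the abstraction function by itself says nothing about acyclicity or sharing; those properties have to come from invariants.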

3 The Architecture

In this section, the procedure and the related software architecture are described. The system's input is source code annotated with a specification that declares the abstraction function. The system's output is a set of invariants that are preserved throughout the object's behavior. The specification expressions, here the abstraction function, are based on the JFSL/Alloy grammar. In the first step, the program takes the annotated source code and extracts the abstraction function expression specified in @SpecField. Next, an Abstract Syntax Tree (AST) is constructed to represent the structure of the abstraction function. The abstract syntax, which is presented in Table 1, has the following features to make it easier to generate invariants.

1. The syntax rules are classified by the arity of the output relation, which can be unary, binary, or set. As this classification gives more information about the context, it helps to generate more relevant invariants.

2. The root production of the AST is always a set or a sequence. This is more convenient for representing data structures.

3. The leaves are always either fields or predefined singletons such as this, null, or int.

After constructing the AST, the set of candidate invariants is generated by visiting each intermediate node. The rules for doing so are explained in Section X. The initial source code is then annotated with the set of candidate invariants. In the last step, JForge checks the source code to determine whether the data structure implementation complies with the specification, including the invariants and the abstraction function. The check is performed per specification and per method, which makes it possible to track the progress of the verification. The result tells us which invariant/method pairs have passed. An invariant that is passed by all methods is an invariant of the class; an invariant that is passed by most but not all methods can be treated as a warning point. Warning points tell the programmer that some methods might have implementation problems that make the invariant unacceptable.
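The AST construction step can be sketched in Python as a small recursive-descent parser. This is a simplified reading of the syntax rules, not the implemented Syntax Analyzer: it covers only '+', '-', '.', '.*', '.^', parentheses, and names, it ignores the unary/binary classification and predicate calls, and the tuple-based node shape and all function names are assumptions made for this illustration.

```python
import re

_TOKEN = re.compile(r"\s*(\.\*|\.\^|[.+\-()]|\w+)")

def tokenize(text):
    """Split an abstraction function expression into operator and name tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Build a nested-tuple AST: ('leaf', name) or (operator, lhs, rhs)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok == "(":
            node = union()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if not re.fullmatch(r"\w+", tok):
            raise ValueError("expected a name, found " + tok)
        return ("leaf", tok)

    def joins():
        node = atom()
        while peek() in (".", ".*", ".^"):
            op = take()
            node = (op, node, atom())
        return node

    def union():
        node = joins()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, joins())
        return node

    tree = union()
    if pos != len(tokens):
        raise ValueError("trailing input: " + tokens[pos])
    return tree

def intermediate_nodes(node):
    """Operators of all non-leaf nodes in post-order (children before parent)."""
    if node[0] == "leaf":
        return []
    op, lhs, rhs = node
    return intermediate_nodes(lhs) + intermediate_nodes(rhs) + [op]
```

Parsing this.root.*(left+right) with this sketch gives a '.*' node at the root whose children are a '.' node and a '+' node; visiting the intermediate nodes is where candidate invariants would be emitted.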

Figure 2: Software Architecture

Figure 2 shows a blueprint of the implemented procedure for generating candidate invariants. The Syntax Analyzer component creates the AST according to the syntax rules. Given an invariant repository, the Invariant Generator component generates a candidate whenever it meets a relevant production, as described in the next section. The candidates are checked by JForge to see which invariants pass all of the class's operations. The result is two sets: accepted and rejected invariants. In this version, I do not examine the rejected invariants; as a next step, they will be investigated to see why they were rejected. If an invariant is rejected by only a portion of the methods, the cause might be a defect in the implementation of those methods.
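The bookkeeping of per-invariant, per-method results can be sketched as follows. This is only an illustration: the checker itself (JForge) is not invoked, the pass/fail table is invented sample data, and the threshold that separates a "mostly passed" warning from a plain rejection (warn_ratio) is an assumption, since the text does not fix one.

```python
def classify(results, warn_ratio=0.5):
    """Group invariants by how many methods preserve them.

    results maps invariant name -> {method name -> passed?}.
    warn_ratio is an assumed cut-off for "mostly but not completely passed".
    """
    report = {"accepted": [], "warning": [], "rejected": []}
    for invariant, per_method in results.items():
        total = len(per_method)
        passed = sum(1 for ok in per_method.values() if ok)
        if total and passed == total:
            report["accepted"].append(invariant)
        elif total and passed / total > warn_ratio:
            report["warning"].append(invariant)
        else:
            report["rejected"].append(invariant)
    return report

def failing_methods(results, invariant):
    """Methods that do not preserve the given invariant (the suspects)."""
    return sorted(m for m, ok in results[invariant].items() if not ok)

# Invented sample outcome of checking three candidates against three methods.
sample = {
    "acyclic":      {"insert": True,  "delete": True,  "find": True},
    "no_sharing":   {"insert": True,  "delete": False, "find": True},
    "size_is_zero": {"insert": False, "delete": False, "find": True},
}
report = classify(sample)
```

In this sample, the candidate that fails only for delete ends up as a warning point, and failing_methods names delete as the method worth inspecting.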

4 Invariant Candidate Generation

In this section, I explain how to extract the candidate invariants from the abstraction function using a syntax-directed approach. The result is checked by JForge to filter out the improper invariants. The remaining candidates are preserved by all class methods and operations. I use an Alloy model to describe each production. The Alloy models show only the AST-related part of the abstraction function. Each model draws on a signature, JObject, and two relations, udenotes and bdenotes. Runtime Java objects are modeled by the JObject signature. The udenotes and bdenotes relations model the interpretation of the unary and binary productions in the abstraction function. Recall that a unary relation is like a one-column table and can be interpreted as a set, whereas a binary relation is a relation whose tuples contain two atoms. The value of each is determined by the semantics of the abstraction function, which is written in JFSL/Alloy.

Table 2: Invariant Generation Rules

Predicate: '+' or '-'
Expression:
    no lhs.udenotes & rhs.udenotes

Predicate: '.'
Expression:
    no disj a, b : lhs.udenotes | let r = rhs.bdenotes | some a.r & b.r

Predicate: '.*' or '.^'
Expression:
    no o : JObject | { some disj x, y : JObject | { ((x→o) + (y→o)) in (rhs.bdenotes) ⇒ (x + y) in (lhs.udenotes) + (lhs.udenotes).^(rhs.bdenotes) } }
    and
    no o : JObject | { (o→o) in ^(rhs.bdenotes) ⇒ some x : lhs.udenotes | (x→o) in ^(rhs.bdenotes) }
    and
    no disj p, q : lhs.udenotes | (p→q) in ^(rhs.bdenotes)

Predicate: link2seq(header, next, result)
Expression:
    all n : JObject | n !in n.^next
    all n : JObject | n in (header).*next
    all n : header.*next | n.next & header
    result.size = #((header).*next)

The first invariant concerns the difference (-) and union (+) productions. As shown in Table 2, it states that there is no intersection between the interpretation of the left-hand side and that of the right-hand side. This is taken to mean that the left and right pointers do not point to the same object. For example, in the abstraction function of the binary tree, this.root.*(left + right), the left and right pointers have no intersection and do not point to the same object. This invariant can be applied to either unary or binary relations.

The join production ('.') is constrained by a more complicated invariant. In a nutshell, the invariant says that an object is not accessible through two different paths after the join operation. According to the formula in Table 2, no two distinct starting points a and b on the left-hand side of the join lead, through the right-hand side relation, to a common object.

The invariants for the reflexive join closure ('.*') and the join closure ('.^') are the same and comprise three further invariants. Assuming that the closure is applied to a heap data structure, the left-hand side can intuitively be regarded as the starting point and the right-hand side as the traversal path. As Table 2 depicts, the first invariant says that each object is accessible through only one path. Namely, no object has two different incoming edges, here from x and y. This invariant is more intuitive when tree-like structures, or acyclic structures in general, are being analyzed.

The next invariant protects the data structure from cyclic pointers. It means that a starting point on the left-hand side does not lead to a cycle in the right-hand side. In other words, it does not allow the heap structure to contain a loop. This invariant is more usual in tree-like heap structures; in list-like structures, however, all linked objects may be visited iteratively in one global loop. In that case, the last pointer of the last object in the right-hand side has to point to the interpretation of the left-hand side. Table 2 shows the formula for this cyclic invariant.

The previous invariants relate to the right-hand side of the operator and try to preserve the data structure described by the right-hand side. The third invariant checks whether the left-hand side enters the data structure properly. For example, in a tree data structure, the left-hand side has to point to the root of the tree and not to one of the intermediate nodes. It means that no two starting points on the left-hand side are hierarchically related.

The invariants of join, reflexive join, union, and difference have now been determined. These operators are useful for declaring the abstraction functions of heap data structures. In JFSL, a predicate library has been provided to simplify the declaration of abstraction functions. As an illustration, it is somewhat tricky to declare an abstraction function that converts a linked-list structure into a sequence. This is needed when we want to write the abstraction function of java.util.LinkedList directly, which is implemented as a circular list with a sentinel node.
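The following Python sketch shows how candidate invariants of this kind can be evaluated on one concrete heap snapshot. It follows the informal reading given in the prose (disjoint sides, at most one incoming edge, no cycle, unrelated starting points) rather than transcribing the Alloy formulas term by term, and it checks a single snapshot instead of verifying all method executions as JForge does. The function names and the sample heaps are assumptions made for this illustration.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing `edges` (a set of pairs)."""
    closure = set(edges)
    while True:
        extra = {(a, d) for a, b in closure for c, d in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra

def disjoint(lhs, rhs):
    """'+' / '-' candidate: the two sides share no tuple."""
    return not (lhs & rhs)

def single_incoming_edge(edges):
    """Closure candidate 1: no object is the target of two different sources."""
    sources = {}
    for src, dst in edges:
        sources.setdefault(dst, set()).add(src)
    return all(len(srcs) <= 1 for srcs in sources.values())

def acyclic(edges):
    """Closure candidate 2: no object reaches itself in one or more steps."""
    return all(a != b for a, b in transitive_closure(edges))

def starts_unrelated(starts, edges):
    """Closure candidate 3: no starting point is reachable from another one."""
    closure = transitive_closure(edges)
    return not any((p, q) in closure for p in starts for q in starts if p != q)

# A well-formed tree: a is the root, b and c are its children.
left, right = {("a", "b")}, {("a", "c")}
tree = left | right

# Three broken variants of the same heap.
shared = tree | {("b", "d"), ("c", "d")}   # d has two parents
looped = tree | {("c", "a")}               # c points back to the root
```

On the well-formed tree all four checks succeed; the shared variant violates single_incoming_edge, the looped variant violates acyclic, and starting from both a and b violates starts_unrelated.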
The link2seq predicate (Table 2) takes three parameters, called header, next, and result, for passing the list's header, the next relation, and the returned sequence, respectively. In effect, it takes a list of concrete fields and returns a sequence of abstract values. In JFSL/Alloy, a sequence is a relation between a series of numbers and a sequence of atoms, that is, "Int → Atom". As the name of the predicate suggests, it takes a linked list and returns a sequence. Hence, it can be inferred that the header and next relations probably satisfy some invariants that correspond to linked lists. Since only one pointer is mentioned, the list is taken to be a singly linked list in which all nodes are reachable in one direction, from the header to the tail.

As Table 2 depicts, one invariant concerns the reachability of all nodes. That is, by iterating over the next relation starting from the header, all nodes have to be accessible. It ensures that all links are properly chained from the header to the tail. Two further invariants deal with the circularity of the list. In a cyclic linked list, the header is reachable from the last reachable node in one step. In an acyclic list, in contrast, there is no path from any node back to itself or to its predecessor nodes. Since the parameters do not mention the circularity of the list, it can be either cyclic or acyclic. Both invariants are generated and checked in order to select one of them; a list cannot satisfy both. The last invariant checks the size of the returned sequence. The seq type in Alloy has a size property indicating the number of contained items. This value has to be exactly the same as the number of nodes reachable from the header.
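The list candidates can be tried out on concrete lists with a short Python sketch. It is an illustration only: the next relation is modeled as a dictionary from node to successor, the checks follow the informal description rather than the exact Alloy formulas, and a single snapshot is examined instead of all method executions. All names and the two sample lists are assumptions.

```python
def reachable(header, nxt):
    """Nodes visited by following `nxt` from `header`, in visiting order."""
    seen, node = [], header
    while node is not None and node not in seen:
        seen.append(node)
        node = nxt.get(node)
    return seen

def all_reachable(header, nxt, nodes):
    """Reachability candidate: every node is reachable from the header."""
    return set(nodes) <= set(reachable(header, nxt))

def is_acyclic(nodes, nxt):
    """Acyclic candidate: no node gets back to itself by following `nxt`."""
    for start in nodes:
        current = nxt.get(start)
        for _ in range(len(nodes)):
            if current is None:
                break
            if current == start:
                return False
            current = nxt.get(current)
    return True

def is_circular(header, nxt):
    """Cyclic candidate: some reachable node points back to the header."""
    return any(nxt.get(n) == header for n in reachable(header, nxt))

def size_matches(result, header, nxt):
    """Size candidate: the sequence has one entry per reachable node."""
    return len(result) == len(reachable(header, nxt))

# Two hypothetical lists: a plain chain and a ring closed through the header.
chain = {"a": "b", "b": "c"}
ring = {"h": "x", "x": "y", "y": "h"}
```

For the chain only the acyclic candidate survives, and for the ring only the cyclic one, which matches the observation that a list cannot satisfy both.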

5 Related Work

Dynamic analysis and symbolic execution are the two most relevant lines of work. In [7], a technique is proposed to generate an Alloy specification from a set of given example instances. The tool, Deryaft, builds on Daikon [2], which detects invariants from heap snapshots. It extends Daikon by supporting complex data structures, employing optimization heuristics, and generating Alloy invariants. It inherits the restriction of extracting invariants from instances of limited size. Generally, a dynamic invariant detection system extracts possible invariants from program executions. The program source is instrumented to trace particular variables through the execution of a set of test cases. Finally, the result is evaluated over both the instrumented and the derived variables.

DySy [1] and KRYSTAL [5] employ symbolic analysis techniques to extract likely local data structure invariants. The former uses symbolic execution and dynamic testing to extract pre- and postconditions for each procedure. It generates predicates from the program execution and elicits invariants by symbolic variable replacement. The latter uses universal symbolic execution, which introduces a new symbolic variable for every unassigned left-hand-side value. To build this symbolic memory mapping and path flow, the program is executed against test cases, and the result is used to generate a set of local symbolic predicates. The final invariants result from validating the invariants that are simplified and elicited by taking the conjunction and disjunction of local symbolic predicates, respectively. The soundness and completeness of the generated invariants depend heavily on the quality of the test cases. Moreover, global invariants of linked data structures, such as the acyclicity of a linked list, cannot be generated by this technique.

Daikon and DIDUCE [3] are based on dynamic approaches. DIDUCE employs online analysis to extract simple invariants from the values of program variables. Daikon generates more relevant invariants by using a set of templates. The set is fixed and contains patterns for (in)equalities and some relationships between program variables. At particular program points, such as function entries, function returns, and loop heads, Daikon instantiates the templates over all values in scope and tests them against all executions of the test cases. One of the main drawbacks of Daikon is its inability to generate intricate data structure invariants, such as those relating to path length [5]. Moreover, the quality of the inferred invariants is influenced by the test cases and the preset invariant patterns.
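The template-based dynamic approach can be illustrated with a toy Python sketch. It is written in the spirit of such detectors and does not reproduce Daikon's actual templates, input format, or interface; the three templates, the function name, and the sample observations are all assumptions made for this illustration.

```python
from itertools import permutations

# A fixed set of binary templates: (display format, predicate).
TEMPLATES = [
    ("{0} <= {1}", lambda x, y: x <= y),
    ("{0} == {1}", lambda x, y: x == y),
    ("{0} != {1}", lambda x, y: x != y),
]

def likely_invariants(samples, variables):
    """Keep every template instance that holds on all observed samples.

    samples is a list of dicts mapping variable name -> observed value,
    one dict per visit of the instrumented program point.
    """
    survivors = []
    for a, b in permutations(variables, 2):
        for text, holds in TEMPLATES:
            if all(holds(sample[a], sample[b]) for sample in samples):
                survivors.append(text.format(a, b))
    return survivors

# Invented observations of two variables at one program point.
observations = [{"lo": 1, "hi": 3}, {"lo": 2, "hi": 2}]

from_all_runs = likely_invariants(observations, ["lo", "hi"])
from_first_run = likely_invariants(observations[:1], ["lo", "hi"])
```

With both observations only lo <= hi survives, whereas with the first observation alone two spurious inequalities are also reported, which illustrates how strongly the inferred invariants depend on the test cases.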


6 Conclusion & Future Work

In this project, I have developed a technique to mechanically generate invariants of heap data structures from the abstraction function. The technique relies on the JForge tool to check the generated candidates. Compared to other methods, it needs less specification: given an abstraction function, it provides a list of invariants that always hold at run time. As future work, I am going to expand the invariant repository by exploring more examples and supporting more data structures, such as HashTable or even BTree. As a next step, rejected invariants will be analyzed further, in an interactive procedure, to see whether a defect in a method's implementation caused the rejection. Finally, it might be possible to make a rejected invariant acceptable by slightly changing its specification.

References

[1] Christoph Csallner and Georgia Tech. DySy: Dynamic Symbolic Execution for Invariant Inference. Categories and Subject Descriptors. Science.

[2] Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 69(1-3):35–45, December 2007.

[3] S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. Proceedings of the 24th International Conference on Software Engineering (ICSE 2002), pages 291–301.

[4] Daniel Jackson. Alloy: a lightweight object modelling notation. ACM Transactions on Software Engineering and Methodology, 11(2):256–290, April 2002.

[5] Yamini Kannan and Koushik Sen. Universal symbolic execution and its application to likely data structure invariant generation. In Proceedings of the 2008 International Symposium on Software Testing and Analysis, pages 283–294, New York, New York, USA, 2008. ACM.

[6] Viktor Kuncak. Modular Data Structure Verification. PhD thesis, 2007.

[7] M. Malik, Aman Pervaiz, and Sarfraz Khurshid. Generating representation invariants of structurally complex data. Tools and Algorithms for the Construction and Analysis of Systems, pages 34–49, 2007.

[8] Derek Rayside, Zev Benjamin, Rishabh Singh, Joseph P. Near, Aleksandar Milicevic, and Daniel Jackson. Equality and hashing for (almost) free: Generating implementations from abstraction functions. 2009 IEEE 31st International Conference on Software Engineering, pages 342–352, 2009.

[9] Kuat T. Yessenov. A Lightweight Specification Language for Bounded Program Verification. Master's thesis, 2009.
