A New Abstraction Framework for Affine Transformers

Tushar Sharma^1 and Thomas Reps^{1,2}

1 University of Wisconsin; Madison, WI, USA
2 GrammaTech, Inc.; Ithaca, NY, USA

Abstract. This paper addresses the problem of abstracting a set of affine transformers $\vec{v}' = \vec{v} \cdot C + \vec{d}$, where $\vec{v}$ and $\vec{v}'$ represent the pre-state and post-state, respectively. We introduce a framework to harness any base abstract domain $\mathcal{B}$ in an abstract domain of affine transformations. Abstract domains are usually used to define constraints on the variables of a program. In this paper, however, abstract domain $\mathcal{B}$ is re-purposed to constrain the elements of $C$ and $\vec{d}$, thereby defining a set of affine transformers on program states. This framework facilitates intra- and interprocedural analyses to obtain function and loop summaries, as well as to prove program assertions.

1 Introduction

Supported, in part, by a gift from Rajiv and Ritu Batra; by DARPA under cooperative agreement HR0011-12-2-0012; by NSF under grant CCF-0904371; DARPA MUSE award FA8750-14-2-0270 and DARPA STAC award FA8750-15-C-0082; and by the UW-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors, and do not necessarily reflect the views of the sponsoring agencies.
T. Reps has an ownership interest in GrammaTech, Inc., which has licensed elements of the technology discussed in this publication.

Most critical applications, such as airplane and rocket controllers, need correctness guarantees. Usually these correctness guarantees can be described as safety properties in the form of assertions. Verifying an assertion amounts to showing that the assertion holds true for all possible runs of an application. Proving an assertion is, in general, an undecidable problem. Nevertheless, there exist static-analysis techniques that are able to verify automatically some kinds of program assertions. One such technique is abstract interpretation [3], which soundly abstracts the concrete executions of the program to elements in an abstract domain, and checks the correctness guarantees using the abstraction. In this paper, we provide analysis techniques to abstract the behavior of the program as a set of affine transformations over bit-vectors. An affine transformer is a relation on states, defined by $\vec{v}' = \vec{v} \cdot C + \vec{d}$, where $\vec{v}'$ and $\vec{v}$ are row vectors that represent the post-transformation state and the pre-transformation state, respectively. $C$ is the linear component of the transformation and $\vec{d}$ is a constant vector. For example, $[x'\ y'] = [x\ y] \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix} + [10\ 0]$ denotes the affine transformation $(x' = x + 2y + 10 \wedge y' = 0)$ over variables $\{x, y\}$. We denote an affine transformation by $C : \vec{d}$. The paper is based on the following observation:

Observation 1. Abstract domains are usually used to define constraints on the variables of a program. However, they can be re-purposed to constrain the elements of $C : \vec{d}$, thereby defining a set of affine transformers on program states.

The need for abstraction over affine transformers. Abstractions of affine transformers can be used to obtain affine-relation invariants at each program point in the program [12]. An affine relation is a linear-equality constraint between numeric-valued variables of the form $\sum_{i=1}^{n} a_i v_i + b = 0$. For a given set of variables $\{v_i\}$, affine-relation analysis (ARA) identifies affine relations that are invariants of a program. The results of ARA can be used to determine a more precise abstract value for a variable via semantic reduction [4], or detect the relationship between program variables and loop-counter variables. Furthermore, when the abstract-domain elements are abstractions of affine transformers, abstract interpretation can be used to provide useful function summaries or loop summaries [2, 18]. In principle, summaries can be computed offline for large libraries of code so that client static analyses can use them to provide verification results more efficiently.

Previous work [6] compared two abstract domains for affine-relation analysis over bitvectors: (i) an affine-closed abstraction of relations over program variables (AG), and (ii) an affine-closed abstraction of affine transformers over program variables (MOS). Müller-Olm and Seidl [13] introduced the MOS domain, whose elements are the affine-closed sets of affine transformers. An MOS element can be represented by a set of square matrices. Each matrix $T$ is an affine transformer of the form $T = \begin{bmatrix} 1 & \vec{d} \\ 0 & C \end{bmatrix}$, which represents the state transformation $\vec{v}' := \vec{v} \cdot C + \vec{d}$, or, equivalently, $[1 \mid \vec{v}'] := [1 \mid \vec{v}]\, T$. In [6], the authors observe that the MOS domain can encode two-vocabulary relations that are not affine-closed even though the affine transformers themselves are affine closed. (See §2.5 for an example.) Thus, moving the abstraction from affine relations over program variables to affine relations over affine transformations possibly offers some advantages because it allows some non-affine-closed sets to be representable.

While the MOS domain is useful for finding affine-relation invariants in a program, the join operation used at confluence points can lose precision in many cases, leading to imprecise function summaries. Furthermore, the analysis does not scale well as the number of variables in the vocabulary increases. In other words, it has one baked-in performance-versus-precision aspect.
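The equivalence between the form $\vec{v}' = \vec{v} \cdot C + \vec{d}$ and the homogeneous-matrix form $[1 \mid \vec{v}'] = [1 \mid \vec{v}]\, T$ can be checked on the transformer $x' = x + 2y + 10 \wedge y' = 0$ from the introduction. The following is a minimal sketch, assuming a bit width of $w = 4$ and an arbitrary pre-state; the function names are illustrative and do not come from the paper.

```python
W = 4            # assumed bit width, chosen only for illustration
MOD = 1 << W     # arithmetic is carried out in Z_{2^w}

def apply_affine(v, C, d, mod=MOD):
    """Return v' = v*C + d over Z_{2^w}, where v is a row vector."""
    k = len(v)
    return [(sum(v[i] * C[i][j] for i in range(k)) + d[j]) % mod
            for j in range(k)]

def homogeneous(C, d):
    """Build T = [[1, d], [0, C]] so that [1|v'] = [1|v] * T."""
    return [[1] + list(d)] + [[0] + list(row) for row in C]

def apply_homogeneous(v, T, mod=MOD):
    """Multiply the row vector [1|v] by T; the result is [1|v']."""
    row = [1] + list(v)
    n = len(row)
    return [sum(row[i] * T[i][j] for i in range(n)) % mod for j in range(n)]

C = [[1, 0], [2, 0]]   # linear component of x' = x + 2y + 10, y' = 0
d = [10, 0]            # constant vector
v = [3, 5]             # hypothetical pre-state (x, y)

post = apply_affine(v, C, d)
assert apply_homogeneous(v, homogeneous(C, d)) == [1] + post
```

Both routes compute the same post-state, with the leading 1 in the homogeneous row vector carrying the constant term.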



Problem Statement. Our goal is to generalize the ideas used in the MOS domain (in particular, to have an abstraction of sets of affine transformers) but to provide a way for a client of the abstract domain to have some control over the performance/precision trade-off. Toward this end, we define a new family of numerical abstract domains, denoted by ATA[$\mathcal{B}$]. (ATA stands for Affine-Transformers Abstraction.) Following Obs. 1, ATA[$\mathcal{B}$] is parameterized by a base numerical abstract domain $\mathcal{B}$, and allows one to represent a set of affine transformers (or, alternatively, certain disjunctions of transition formulas).

Summary of the Approach. Let the $(k + k^2)$-tuple $(d_1, d_2, \ldots, d_k, c_{11}, c_{12}, \ldots, c_{1k}, c_{21}, c_{22}, \ldots, c_{kk})$ denote the affine transformation $\bigwedge_{j=1}^{k} \bigl( v'_j = \sum_{i=1}^{k} (c_{ij} v_i) + d_j \bigr)$, also written as "$C : \vec{d}$." The key idea is that we will use $(k + k^2)$ symbolic constants to represent the $(k + k^2)$ coefficients in a transformation of the form $C : \vec{d}$, and use a base abstract domain $\mathcal{B}$, provided by a client of the framework, to represent sets of possible values for these symbolic constants. In particular, $\mathcal{B}$ is an abstract domain for which, for all $b \in \mathcal{B}$, $\gamma(b)$ is a set of $(k + k^2)$-tuples, each tuple of which provides values for $\{d_i\} \cup \{c_{ij}\}$, and can thus be interpreted as an affine transformation $C : \vec{d}$.

With this approach, a given $b \in \mathcal{B}$ represents the disjunction $\bigvee \{ (C : \vec{d}) \in \gamma(b) \}$. When $\mathcal{B}$ is a non-relational domain, each $b \in \mathcal{B}$ constrains the values of $\{d_i\} \cup \{c_{ij}\}$ independently. When $\mathcal{B}$ is a relational domain, each $b \in \mathcal{B}$ can impose intra-component constraints on the allowed tuples $(d_1, d_2, \ldots, d_k, c_{11}, c_{12}, \ldots, c_{1k}, c_{21}, c_{22}, \ldots, c_{kk})$.

ATA[$\mathcal{B}$] generalizes the MOS domain, in the sense that the MOS domain is exactly ATA[AG], where AG is a relational abstract domain that captures affine equalities of the form $\sum_i a_i x_i = b$, where $a_i, b \in \mathbb{Z}_{2^w}$ [8, 6] (see §2.4). For instance, an element in ATA[AG] can capture the set of affine transformers "$x' = k_1 * x + k_1 * y + k_2$, where $k_1$ is odd, $k_2$ is even, and $k_1$ is the coefficient of both $x$ and $y$." On the other hand, an element in the abstract domain ATA[$\mathcal{I}_{\mathbb{Z}_{2^w}}^{(k+k^2)}$], where $\mathcal{I}_{\mathbb{Z}_{2^w}}^{(k+k^2)}$ is the abstract domain of $(k + k^2)$-tuples of intervals over bitvectors, can capture a set of affine transformers such as $x' = k_3 * x + k_4 * y + k_5$, where $k_3 \in [0, 1]$, $k_4 \in [2, 2]$, and $k_5 \in [0, 10]$.

This paper addresses a wide variety of issues that arise in defining the ATA[$\mathcal{B}$] framework, including describing the abstract-domain operations of ATA[$\mathcal{B}$] in terms of the abstract-domain operations available in the base domain $\mathcal{B}$.

Contributions. The overall contribution of our work is the framework ATA[$\mathcal{B}$], for which we present
– methods to perform basic abstract-domain operations, such as equality and join.
– a method to perform abstract composition, which is needed to perform abstract interpretation.
– a faster method to perform abstract composition when the base domain is non-relational.

§2 introduces the terminology used in the paper and presents some needed background material. §3 demonstrates the framework with the help of an example. §4 formally introduces the parameterized abstract domain ATA[$\mathcal{B}$]. §5 provides discussion and related work. Proofs are given in App. A and App. B.


2 Preliminaries

All numeric values in this paper are integers in $\mathbb{Z}_{2^w}$ for some bit width $w$. That is, values are $w$-bit machine integers with the standard operations for machine addition and multiplication. Addition and multiplication in $\mathbb{Z}_{2^w}$ form a ring, not a field, so some facets of standard linear algebra do not apply.

Throughout the paper, $k$ is the size of the vocabulary $V = \{v_1, v_2, \ldots, v_k\}$, i.e., the variable-set under analysis. We use $\vec{v}$ to denote the vector $[v_1\ v_2\ \ldots\ v_k]$ of variables in vocabulary $V$. A two-vocabulary relation $R[V; V']$ is a transition relation between values of variables in the pre-state vocabulary $V$ and values of variables in the post-state vocabulary $V'$. For instance, a transition relation $R[V; V']$ in the concrete collecting semantics is a subset of $\mathbb{Z}_{2^w}^{k} \times \mathbb{Z}_{2^w}^{k}$ (which is isomorphic to $\mathbb{Z}_{2^w}^{2k}$).

Matrix addition and multiplication are defined as usual, forming a matrix ring. We denote the transpose of a matrix $M$ by $M^t$. A one-vocabulary matrix is a matrix with $k + 1$ columns. A two-vocabulary matrix is a matrix with $2k + 1$ columns. In each case, the "+1" is related to the fact that we capture affine rather than linear relations. $I_n$ denotes the $n \times n$ identity matrix. Given a matrix $C$, we use $C[i, j]$ to refer to the entry at the $i$-th column and $j$-th row of $C$. Given a vector $\vec{d}$, we use $\vec{d}[j]$ to refer to the $j$-th entry in $\vec{d}$.

2.1 Affine Programs

⟨Block⟩ ::= l : (⟨Stmt⟩ ;)* ⟨Next⟩
⟨Next⟩ ::= jump l; | jump ⟨Cond⟩ ? l1 : l2
⟨Cond⟩ ::= ? | ⟨Expr⟩ ⟨Op⟩ ⟨Expr⟩
⟨Op⟩ ::= = | ≠ | ≥ | ≤
⟨Expr⟩ ::= $c_0 + \sum_{i=1}^{k} c_i * v_i$
⟨Stmt⟩ ::= $v_j$ := ⟨Expr⟩ | $v_j$ := ?

We borrow the notion of affine programs from [13]. We restrict our affine programs to consist of a single procedure. The statements are restricted to either affine assignments or non-deterministic assignments. The control-flow instruction consists of either an unconditional jump statement, or a conditional jump with an affine equality, an affine disequality, an affine inequality, or unknown guard condition.

2.2 Abstract-Domain Operations

The two important steps in abstract interpretation (AI) are:
1. Abstraction: The abstraction of the program is constructed using the abstract domain and abstract semantics.
2. Fixpoint analysis: Fixpoint iteration is performed on the abstraction of the program to identify invariants.

For the purpose of our analysis, the program is abstracted to a control-flow graph, where each edge in the graph is labeled with an abstract transformer. An abstract transformer is a two-vocabulary transition relation $R[V; V']$. Concrete states described by an abstract transformer are represented by row vectors of length $2k$. A (two-vocabulary) concrete state is sometimes called an assignment to the variables of the pre-state and the post-state vocabulary.

Table 1. Abstract-domain operations.

| Type | Operation | Description |
|------|-----------|-------------|
| A | $\bot$ | bottom element |
| bool | $(a_1 == a_2)$ | equality |
| A | $(a_1 \sqcup a_2)$ | join |
| A | $(a_1 \nabla a_2)$ | widen |
| A | $Id$ | identity element |
| A | $\alpha(v_j := ?)$ | abstraction for nondeterministic assignments |
| A | $\alpha(v_j := c_0 + \sum_{i=1}^{k} c_{ij} * v_i)$ | abstraction for affine assignments |
| A | $(a_1 \circ a_2)$ | composition |

Tab. 1 lists the abstract-domain operations needed to generate the program abstraction and perform fixpoint analysis on it. Bottom, equality, and join are standard abstract-domain operations. The widen operation is needed for domains with infinite ascending chains to ensure termination. The two operations of the form $\alpha(Stmt)$ perform abstraction on an assignment statement $Stmt$ to generate an abstract transformer. $Id$ is the identity element, which represents the identity transformation $(\bigwedge_{i=1}^{k} v'_i = v_i)$. Finally, the abstract-composition operation $a_1 \circ a_2$ returns a sound overapproximation of the composition of the abstract transformation $a_1$ with the abstract transformation $a_2$.

2.3 The Müller-Olm/Seidl Domain

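An element in the Müller-Olm/Seidl domain (MOS) is an affine-closed set of affine transformers, as detailed in [13]. An MOS element is represented by a set of $(k+1)$-by-$(k+1)$ matrices. Each matrix $T$ is a one-vocabulary transformer of the form $T = \begin{bmatrix} 1 & \vec{b} \\ 0 & M \end{bmatrix}$, which represents the state transformation $\vec{v}' := \vec{v} \cdot M + \vec{b}$, or, equivalently, $[1 \mid \vec{v}'] := [1 \mid \vec{v}]\, T$.

An MOS element $\mathcal{M}$, consisting of a set of matrices, represents the affine span of the set, denoted by $\langle \mathcal{M} \rangle$. $\langle \mathcal{M} \rangle$ is defined as follows:

$$\langle \mathcal{M} \rangle \overset{\text{def}}{=} \Bigl\{ T \Bigm| \exists w \in \mathbb{Z}_{2^w}^{|\mathcal{M}|} : T = \sum_{M \in \mathcal{M}} w_M M \wedge T_{1,1} = 1 \Bigr\}.$$

The meaning of $\mathcal{M}$ is the union of the graphs of the affine transformers in $\langle \mathcal{M} \rangle$. Thus,

$$\gamma_{MOS}(\mathcal{M}) \overset{\text{def}}{=} \bigl\{ (\vec{v}, \vec{v}') \bigm| \vec{v}, \vec{v}' \in \mathbb{Z}_{2^w}^{k} \wedge \exists T \in \langle \mathcal{M} \rangle : [1 \mid \vec{v}]\, T = [1 \mid \vec{v}'] \bigr\}.$$

Example 1. If $w = 4$, the MOS element $\mathcal{M} = \left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \right\}$ represents the affine span

$$\langle \mathcal{M} \rangle = \left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \ldots, \begin{bmatrix} 1 & 0 & 12 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 14 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \right\},$$

which corresponds to the transition relation in which $v'_1 = v_1$, $v_2$ can have any value, and $v'_2$ can have any even value. □

Tab. 2 gives the abstract-domain operations for the MOS domain. The bottom element of the MOS domain is the empty set $\emptyset$, and the MOS element that represents the identity relation is the singleton set $\{I\}$. The equality check can be done by checking if the span of the matrices in the two values is equal.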
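The affine span in Ex. 1 is small enough to enumerate by brute force. The sketch below is an illustration only, not the representation used by the MOS domain: it tries every weight vector over $\mathbb{Z}_{2^4}$ and keeps the combinations whose top-left entry is 1, as the definition of the affine span requires. The function name `affine_span` is an assumption made for this example.

```python
from itertools import product

def affine_span(mats, w):
    """Enumerate all weighted sums of `mats` over Z_{2^w} whose [0][0] entry is 1."""
    mod = 1 << w
    n = len(mats[0])
    span = set()
    for weights in product(range(mod), repeat=len(mats)):
        T = tuple(
            tuple(sum(wt * M[i][j] for wt, M in zip(weights, mats)) % mod
                  for j in range(n))
            for i in range(n)
        )
        if T[0][0] == 1:          # the side condition T_{1,1} = 1
            span.add(T)
    return span

# The two generator matrices of Ex. 1 (rows/columns ordered 1, v1, v2).
M = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[1, 0, 2], [0, 1, 0], [0, 0, 0]],
]

span = affine_span(M, 4)
constants_for_v2 = sorted(T[0][2] for T in span)
```

Under these assumptions the span contains eight matrices, one for each even constant assigned to $v'_2$, and every matrix leaves the $v_1$ column untouched.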


[6] provides a normal form for the MOS domain, which can be used to reduce the equality check to syntactic equality checks on the matrices in $\mathcal{M}_1$ and $\mathcal{M}_2$. The widening operation is not applicable to MOS because it is a finite-height lattice. The abstraction operation for the affine-assignment statement $\alpha(v_j := d_0 + \sum_{i=1}^{k} c_{ij} * v_i)$ gives back an MOS element with a single matrix where every variable $v \in V \setminus \{v_j\}$ is left unchanged, and the variable $v_j$ is transformed to reflect the assignment by updating the corresponding column in the matrix with the assignment coefficients. The abstraction operation for the non-deterministic assignment statement $\alpha(v_j := ?)$ gives back an MOS element containing two matrices. Similar to the abstraction for the affine-assignment operation, every variable $v \in V \setminus \{v_j\}$ is left unchanged in both matrices. $v_j$ is set to 0 in the first matrix and to 1 in the second matrix. The affine closure of these two matrices ensures that $v_j$ is assigned non-deterministically. The abstract-composition operation performs matrix multiplication for each pair of matrices in $\mathcal{M}_1$ and $\mathcal{M}_2$.
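The MOS abstraction and composition operations described above can be sketched with plain nested lists. This is a hedged illustration under stated assumptions: variables are numbered from 1, row and column 0 of each matrix correspond to the constant 1, and the helper names (`alpha_affine`, `alpha_havoc`, `compose`) and the sample statement `x := 10 + x + 2*y` are hypothetical, not taken from the paper.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def alpha_affine(k, j, d0, coeffs):
    """Single-matrix MOS element for v_j := d0 + sum_i coeffs[i-1] * v_i."""
    T = identity(k + 1)
    T[0][j] = d0                      # constant term goes in row 0
    for i in range(1, k + 1):
        T[i][j] = coeffs[i - 1]       # column j holds the assignment coefficients
    return [T]

def alpha_havoc(k, j):
    """Two-matrix MOS element for v_j := ? (v_j set to 0 and to 1)."""
    T0 = identity(k + 1)
    T0[j][j] = 0
    T1 = identity(k + 1)
    T1[j][j] = 0
    T1[0][j] = 1
    return [T0, T1]

def matmul(A, B, mod):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) % mod for j in range(n)]
            for i in range(n)]

def compose(M1, M2, mod):
    """{A2 * A1 | Ai in Mi}; with row vectors, A2 acts first and A1 second."""
    return [matmul(A2, A1, mod) for A1 in M1 for A2 in M2]

# Hypothetical two-statement sequence over (x, y) = (v1, v2), with w = 4:
assign = alpha_affine(2, 1, 10, [1, 2])   # x := 10 + x + 2*y
havoc = alpha_havoc(2, 2)                 # y := ?
summary = compose(havoc, assign, 16)      # assignment first, then havoc
```

The first matrix of `summary` is the homogeneous form of the transformer $x' = x + 2y + 10 \wedge y' = 0$; the second differs only in assigning $y' = 1$.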

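Table 2. Abstract-domain operations for the MOS-domain.

| Type | Operation | Description |
|------|-----------|-------------|
| A | $\bot_{MOS}$ | $\emptyset$ |
| bool | $(\mathcal{M}_1 == \mathcal{M}_2)$ | $\langle \mathcal{M}_1 \rangle == \langle \mathcal{M}_2 \rangle$ |
| A | $(\mathcal{M}_1 \sqcup \mathcal{M}_2)$ | $\mathcal{M}_1 \cup \mathcal{M}_2$ |
| A | $(a_1 \nabla a_2)$ | not applicable |
| A | $\alpha(v_j := d_0 + \sum_{i=1}^{k} c_{ij} * v_i)$ | $\left\{ \begin{pmatrix} 1 & 0 & d_0 & 0 \\ 0 & I_{j-1} & [c_{1j}, c_{2j}, \ldots, c_{(j-1)j}]^t & 0 \\ 0 & 0 & c_{jj} & 0 \\ 0 & 0 & [c_{(j+1)j}, c_{(j+2)j}, \ldots, c_{kj}]^t & I_{k-j} \end{pmatrix} \right\}$ |
| A | $\alpha(v_j := ?)$ | $\left\{ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & I_{j-1} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{k-j} \end{pmatrix}, \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & I_{j-1} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{k-j} \end{pmatrix} \right\}$ |
| A | $Id$ | $\{ I_{k+1} \}$ |
| A | $(\mathcal{M}_1 \circ \mathcal{M}_2)$ | $\{ A_2 A_1 \mid A_i \in \mathcal{M}_i \}$ |

2.4 The Affine-Generator Domain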

An element in the Affine Generator domain (AG[$\vec{v}; \vec{v}'$]) is a two-vocabulary matrix whose rows are the affine generators of a two-vocabulary relation over variables $\vec{v}$. An AG[$\vec{v}; \vec{v}'$] element is an $r$-by-$(2k+1)$ matrix $G$, with $0 < r \le 2k + 1$. The concretization of an AG[$\vec{v}; \vec{v}'$] element is

$$\gamma_{AG}(G) \overset{\text{def}}{=} \bigl\{ (\vec{v}, \vec{v}') \bigm| \vec{v}, \vec{v}' \in \mathbb{Z}_{2^w}^{k} \wedge [1 \mid \vec{v}\ \vec{v}'] \in \text{row}\, G \bigr\}.$$

The row space of a matrix $G$ is defined by $\text{row}\, G \overset{\text{def}}{=} \{ r \mid \exists w : wG = r \}$. The AG[$\vec{v}; \vec{v}'$] domain captures all two-vocabulary affine spaces, and treats them as relations between pre-states and post-states.


The bottom element of the AG domain is the empty matrix, and the AG[$\vec{v}; \vec{v}'$] element that represents the identity relation is the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & I & I \end{bmatrix}$, whose column blocks correspond to $1$, $\vec{v}$, and $\vec{v}'$.

The AG[$\{v_1, v_2\}; \{v'_1, v'_2\}$] element

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}$$

(columns $1, v_1, v_2, v'_1, v'_2$) represents the transition relation in which $v'_1 = v_1$, $v_2$ can have any value, and $v'_2$ can have any even value. To compute the join of two AG elements, stack the two matrices vertically and get the canonical form of the result [6, §2.1].
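The concretization $\gamma_{AG}$ can be computed by exhaustive enumeration for a small bit width. The following is a sketch only, assuming $w = 3$ so that the row space stays small; the names `row_space` and `gamma_ag` are illustrative rather than part of any existing implementation.

```python
from itertools import product

def row_space(G, w):
    """All linear combinations of the rows of G with coefficients in Z_{2^w}."""
    mod = 1 << w
    cols = len(G[0])
    return {
        tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % mod
              for j in range(cols))
        for coeffs in product(range(mod), repeat=len(G))
    }

def gamma_ag(G, k, w):
    """Pairs (v, v') such that [1 | v v'] lies in the row space of G."""
    return {(r[1:1 + k], r[1 + k:]) for r in row_space(G, w) if r[0] == 1}

# The AG element from the text; columns are 1, v1, v2, v1', v2'.
G = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 2],
]

relation = gamma_ag(G, 2, 3)
```

Every pair in `relation` satisfies $v'_1 = v_1$ with $v'_2$ even, while $v_2$ ranges over all of $\mathbb{Z}_{2^3}$, matching the description of this element.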

2.5 Relating MOS and AG

There are two ways to relate the MOS and AG domains. One way is to use them as abstractions of two-vocabulary relations and provide (approximate) interconversion methods. The other is to use a variant of the AG domain to represent the elements of the MOS domain exactly.

Comparison of MOS and AG elements as abstraction of two-vocabulary relations. As shown in [6, §4.1], the MOS and AG domains are incomparable: some relations are expressible in each domain that are not expressible in the other. Intuitively, the central difference is that MOS is a domain of sets of functions, while AG is a domain of relations. AG can capture restrictions on both the pre-state and post-state vocabularies, while MOS can capture restrictions only on its post-state vocabulary.

Example 2. When $k = 1$, the AG element for "assume $x = 2$" is $\begin{bmatrix} 1 & 2 & 2 \end{bmatrix}$ (columns $1, x, x'$), i.e., "$x = 2 \wedge x' = 2$". In contrast, there is no MOS element that represents $x = 2 \wedge x' = 2$. The smallest MOS element that over-approximates "assume $x = 2$" is the identity transformer $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right\}$. □

On the other hand, the MOS domain can encode two-vocabulary relations that are not affine-closed.

Example 3. One example is the matrix basis $M = \left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix} \right\}$. The set that $M$ encodes is

$$\begin{aligned}
\gamma_{MOS}(M) &= \left\{ [x\ y\ x'\ y'] \;\middle|\; \exists w_0, w_1 : [1\ x\ y] \begin{bmatrix} 1 & 0 & 0 \\ 0 & w_0 & w_0 \\ 0 & w_1 & w_1 \end{bmatrix} = [1\ x'\ y'] \wedge w_0 + w_1 = 1 \right\} \\
&= \bigl\{ [x\ y\ x'\ y'] \bigm| \exists w_0 : x' = y' = w_0 x + (1 - w_0) y \bigr\} \\
&= \bigl\{ [x\ y\ x'\ y'] \bigm| \exists w_0 : x' = y' = x + (1 - w_0)(y - x) \bigr\} \\
&= \bigl\{ [x\ y\ x'\ y'] \bigm| \exists p : x' = y' = x + p(y - x) \bigr\}
\end{aligned} \qquad (1)$$
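Eqn. (1) can be sanity-checked by brute force for a small bit width. The sketch below assumes $w = 3$ and compares the relation obtained from the affine span of the two matrices of $M$ against the closed form $x' = y' = x + p(y - x)$; the function names are assumptions made for this illustration.

```python
from itertools import product

W = 3            # assumed bit width, kept small so enumeration is cheap
MOD = 1 << W

A = [[1, 0, 0], [0, 1, 1], [0, 0, 0]]   # first matrix of M
B = [[1, 0, 0], [0, 0, 0], [0, 1, 1]]   # second matrix of M

def gamma_mos_by_span():
    """Relation obtained by applying every matrix in the affine span of {A, B}."""
    rel = set()
    for w0 in range(MOD):
        w1 = (1 - w0) % MOD             # w0 + w1 = 1 keeps the top-left entry at 1
        T = [[(w0 * A[i][j] + w1 * B[i][j]) % MOD for j in range(3)]
             for i in range(3)]
        for x, y in product(range(MOD), repeat=2):
            row = [1, x, y]
            post = [sum(row[i] * T[i][j] for i in range(3)) % MOD
                    for j in range(3)]
            rel.add((x, y, post[1], post[2]))
    return rel

def gamma_mos_closed_form():
    """Relation described by the last line of Eqn. (1)."""
    rel = set()
    for x, y, p in product(range(MOD), repeat=3):
        t = (x + p * (y - x)) % MOD
        rel.add((x, y, t, t))
    return rel

same = gamma_mos_by_span() == gamma_mos_closed_form()
```

The two sets coincide for this bit width, which is consistent with the derivation; the check is an illustration and not a proof for general $w$.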

Affine spaces are closed under affine combinations of their elements. Thus, $\gamma_{MOS}(M)$ is not an affine space because some affine combinations of its elements are not in $\gamma_{MOS}(M)$. For instance, let $a = [1\ {-1}\ 1\ 1]$, $b = [2\ {-2}\ 6\ 6]$, and $c = [0\ 0\ {-4}\ {-4}]$. By Eqn. (1), we have $a \in \gamma_{MOS}(M)$ when $p = 0$ in Eqn. (1), $b \in \gamma_{MOS}(M)$ when $p = -1$, and $c \notin \gamma_{MOS}(M)$ (the equation "$-4 = 0 + p(0 - 0)$" has no solution for $p$). Moreover, $2a - b = c$, so $c$ is an affine combination of $a$ and $b$. Thus, $\gamma_{MOS}(M)$ is not closed under affine combinations of its elements, and so $\gamma_{MOS}(M)$ is not an affine space. □

Soundly converting an MOS element $M$ to an overapproximating AG element is equivalent to stating two-vocabulary affine constraints satisfied by $M$ [6, §4.2].

Reformulation of MOS elements as AG elements. An MOS element $M = \{M_1, M_2, \ldots, M_n\}$ represents the set of $(k+1) \times (k+1)$ matrices in the affine closure of the matrices in $M$. Each matrix can be thought of as a $(k+1) \times (k+1)$ vector, and hence $M$ can be represented by an AG element of size $n \times ((k+1) \times (k+1))$.

Example 4. Tab. 3 shows the two ways MOS and AG elements can be related. Column 1 shows the MOS element $M$ from Ex. 3, which represents the set of matrices in the affine closure of the two $(k+1) \times (k+1)$ matrices, with $k = 2$. The second column gives the AG element $A_1$ (a matrix with $2k + 1$ columns) representing the affine-closed space over $\{x, y, x', y'\}$ satisfied by $M$. Consequently, $\gamma_{AG}(A_1) \supseteq \gamma_{MOS}(M)$. Column 3 shows the two matrices of $M$ as the $2 \times ((k+1) \times (k+1))$ AG element $A_2$. Because $A_2$ is just a reformulation of $M$, $\gamma_{AG}(A_2) = \gamma_{MOS}(M)$. □

Table 3. Example demonstrating two ways of relating MOS and AG.

| MOS element ($M$) | Overapproximating AG element ($A_1$) | Reformulation as abstraction over affine transformers ($A_2$) |
|---|---|---|
| $\left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix} \right\}$ (rows and columns $1, x, y$) | $\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}$ (columns $1, x, y, x', y'$) | $\begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}$ (columns $1, a_{01}, a_{02}, a_{10}, a_{11}, a_{12}, a_{20}, a_{21}, a_{22}$) |

3 Overview

In this section, we motivate and illustrate the ATA[$\mathcal{B}$] framework, with the help of several examples. The first two examples illustrate the following principle, which restates Obs. 1 more formally:

Observation 2. Each affine transformation $C : \vec{d}$ in a set of affine transformations involves $(k+1)^2$ coefficients $\in \mathbb{Z}_{2^w}$: $(1, d_1, d_2, \ldots, d_k, 0, c_{11}, c_{12}, \ldots, 0, c_{21}, \ldots, c_{kk})$.³ Thus, we may use any abstract domain whose elements concretize to subsets of $\mathbb{Z}_{2^w}^{(k+1)^2}$ as a method for representing a set of affine transformers. □

³ $k$ of the coefficients are always 0, and one coefficient is always 1 (i.e., the first column is always $(1 \mid 0\ 0\ \ldots\ 0)^t$). For this reason, we really need only $k + k^2$ elements, but we will sometimes refer to $(k+1)^2$ elements for brevity.


Example 5. The AG element $A_2$ in column 3 of Tab. 3 illustrates how an AG element with $(k+1)^2$ columns represents the same set of affine transformers as the MOS element $M$ shown in column 1. For instance, the first row of $A_2$ represents the first matrix in $M$. □

Example 6. Consider the element $E = ([1, 1], [0, 10], [0, 0], [0, 0], [1, 1], [2, 3], [0, 0], [0, 0], [1, 1])$ of $\mathcal{I}_{\mathbb{Z}_{2^w}}^{9}$. $E$ can be depicted more mnemonically as the following matrix (rows and columns $1, x, y$):

$$\begin{bmatrix} [1, 1] & [0, 10] & [0, 0] \\ [0, 0] & [1, 1] & [2, 3] \\ [0, 0] & [0, 0] & [1, 1] \end{bmatrix},$$

where every element in $E$ is an interval ($\mathcal{I}_{\mathbb{Z}_{2^w}}$). $E$ represents the point set $\{ (x', y', x, y) : \exists i_1, i_2 \in \mathbb{Z}_{2^w} : x' = x + i_1 \wedge y' = i_2 x + y \wedge 0 \le i_1 \le 10 \wedge 2 \le i_2 \le 3 \}$. □
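The point set represented by the interval element $E$ of Ex. 6 can be explored by enumerating the concrete matrices that $E$ admits. This is a minimal sketch under two assumptions that are not from the paper: the intervals do not wrap around, and the bit width is $w = 8$. The helper names are illustrative.

```python
from itertools import product

# Interval matrix E from Ex. 6; rows and columns are ordered 1, x, y.
E = [
    [(1, 1), (0, 10), (0, 0)],
    [(0, 0), (1, 1), (2, 3)],
    [(0, 0), (0, 0), (1, 1)],
]

def concretize(E):
    """Yield every concrete matrix whose entries lie in E's (non-wrapping) intervals."""
    n = len(E)
    ranges = [range(lo, hi + 1) for row in E for (lo, hi) in row]
    for flat in product(*ranges):
        yield [list(flat[r * n:(r + 1) * n]) for r in range(n)]

def in_point_set(E, x, y, xp, yp, w=8):
    """Check whether some transformer admitted by E maps (x, y) to (xp, yp)."""
    mod = 1 << w
    row = [1, x, y]
    for T in concretize(E):
        post = [sum(row[i] * T[i][j] for i in range(3)) % mod for j in range(3)]
        if post == [1, xp % mod, yp % mod]:
            return True
    return False

num_transformers = sum(1 for _ in concretize(E))
reachable = in_point_set(E, 3, 4, 8, 10)      # i1 = 5, i2 = 2
unreachable = in_point_set(E, 3, 4, 8, 5)     # no i2 in [2, 3] gives y' = 5
```

Because the base domain here is non-relational, each interval is enumerated independently of the others, which is why the number of admitted transformers is simply the product of the interval sizes.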

Examples 5 and 6 both exploit Observation 2, but use different abstract domains. Ex. 5 uses the AG domain with $(k+1)^2$ columns, whereas Ex. 6 uses the domain $\mathcal{I}_{\mathbb{Z}_{2^w}}^{(k+1)^2}$. In particular, an abstract-domain element in our framework ATA[$\mathcal{B}$] is a set of affine transformations $\vec{v}' = \vec{v} \cdot C + \vec{d}$, such that the allowed coefficients in the matrix $C$ and the vector $\vec{d}$ are abstracted by a base abstract domain $\mathcal{B}$. The remainder of this section shows how different instantiations of Observation 2 allow different properties of a program to be recovered.

Example 7. In this example, the variable r of function f is initialized to 0 and conditionally incremented by 2x inside a loop with 10 iterations. The exact function summary for function ENT: int f(int x) { L0: int i = 0, r = 0; f, denoted by $S_f$, is $(\exists k.\ r' = 2kx \wedge 0 \le$ L1: while(i