l Introduction - CiteSeerX

0 downloads 0 Views 328KB Size Report
plicable crashes. Many crashes can be traced to buffer overruns [14]. .... does not have a mechanism for dealing with possible changes to a buffer - only ..... Let P½ be an equilateral triangle, P¾ a hexagon, P¿ a dodecagon and so forth ..... does not bar overruns and any overrun has the potential of side-effecting a variable ...
Analyzing String Bu ers in C Axel Simon and Andy King Computing Laboratory, University of Kent, Canterbury, CT2 7NF, UK A bu er overrun o

urs in a C program when input is read into a bu er whose length ex eeds that of the bu er. Overruns often lead to rashes and are a widespread form of se urity vulnerability. This paper des ribes an analysis for dete ting overruns before deployment whi h is

onservative in the sense that it lo ates every possible bu er overrun. The paper details the subtle relationship between overrun analysis and pointer analysis and explains how bu ers an be modeled with a linear number of variables. As far as we know, the paper gives the rst formal a

ount of how this software and se urity problem an be ta kled with abstra t interpretation, setting it on a rm, mathemati al basis.

Abstra t.

1 Introdu tion Although C programs are ubiquitous, o

urring in many se urity- riti al, safety riti al and enterprise- riti al appli ations, they are parti ularly prone to inexpli able rashes. Many rashes an be tra ed to bu er overruns [14℄. A bu er overrun o

urs when input is read into a bu er and the length of the input ex eeds the length of the bu er. Bu er overruns typi ally arise from unbounded string opies using sprintf(), str py() and str at(), as well as loops that manipulate strings without an expli it length he k in the loop invariant [15℄. Un he ked input whi h over ows a bu er an overwrite portions of the sta k frame. Ha kers have exploited this e e t to redire t the instru tion pointer in the sta k frame to mali ious ode within the string and thereby gain partial or total ontrol of a host [17℄. Bu er overruns are a parti ularly widespread lass of se urity vulnerability [5℄ and the National Se urity Agen y predi ts that overrun atta ks will ontinue to be a problem for at least the next de ade [20℄. Finding se urity faults, su h as string handling that may overrun, has been likened to the problem of nding a needle in a hay-sta k [11℄. This paper des ribes an abstra t interpretation s heme that does not attempt to nd the needle, but rather aims to prune the hay-sta k down so that it an be sear hed manually.

1.1 Design spa e for overrun analysis One (surprising) tradeo that is applied in program analysis is that of sa ri ing soundness for tra tability [24℄. In the ontext of dete ting overruns, one potential tradeo in the design-spa e is that of a hieving simpli ity by, say ignoring aliasing, at the ost of possibly missing real errors and possibly reporting spurious errors. Whether this is a

eptable depends on whether the analysis aims

to dete t most errors [12, 13, 23℄ (debugging) or dete t every error [8℄ (veri ation). One point in this design-spa e is represented by LCLint. LCLint is an annotation-assisted stati he king tool founded on the philosophy that it is ne to \a

ept a solution that is both unsound and in omplete" [12℄. This enables, for example, loops to be analyzed straightforwardly using heuristi s. Its annotation me hanism a hieves both ompositionality and s alability. However, as [12, 13℄ point out, it seems optimisti to believe that a programmer will add annotations to their programs to dete t overrun vulnerabilities. Another point in the design spa e is the onstraint based analysis of [23℄ whi h an dete t potential overruns in unannotated (lega y) C ode. This analysis ignores pointer arithmeti and is therefore unsound, but it is fast enough to reason about medium-sized appli ations without assistan e. For speed and simpli ity, the analysis is ow insensitive whi h means that it annot reason about (non-idempotent) library fun tions su h as str at() whi h are a frequent sour e of overruns [15℄. The work of Dor, Rodeh and Sagiv [8℄ is unique in that it attempts to formulate an analysis that is onservative in the sense that it never misses a potential overrun. This is a parti ularly laudable goal in the ontext of se urity where there is a history of elite ha kers levering in onspi uous and inno uous features into major se urity holes. Our work builds on the foundation of Dor et al [8℄ and is a systemati examination of the abstra tions and analyses ne essary to reason about overruns in a truly onservatively way.

1.2 Conservative overrun analysis The bu er abstra tion of Dor et al [8℄ is essentially that of Wagner [23℄ but spe i ed as a program transformation. Ea h bu er is abstra ted by its size, allo , and the position of the null hara ter, len . This two-variable model, however, is not ri h enough to a

urately tra k the null position. Suppose, for example, if two pointers p and q point to di erent positions within the same bu er, then altering the len attribute of p (by writing a null hara ter to the bu er) might alter the len attribute of q . Dor et al [8℄ therefore introdu e spe ial p overlaps q variables to quantify the pointer o set and thereby model string length intera tion. This ta ti potentially in reases the number of variables quadrati ally whi h is unfortunate sin e relational numeri abstra tions su h as polyhedra [4℄ be ome less tra table as the number of variables in rease. This is an eÆ ien y issue. More subtle is the orre tness issue that relates to pointers that de nitely or possibly point to the same bu er. This is illustrated in the following ode: 1 2 3 4 5 6 7

har *p, *q, s[32℄, t[32℄; str py(s,"Boat "); /* str py(t,"Aero"); /* p = t+4; /* str at(p, "plane "); /* if (rand()) q=s; else q=t;/* str at(q,"to R eunion"); /*

s[5℄ */ s[5℄ t[4℄ */ p[4℄ s[5℄ t[4℄ */ p[10℄ s[5℄ t[10℄ */ p[10℄ q[5,10℄ s[5℄ t[10℄ */ p[10,20℄ q[15,20℄ s[5,15℄ t[10,20℄*/

The omments indi ate the possible null positions at the various lines. The str at in line 7 either updates the s bu er or the t bu er (but not both) depending on the random value. The transformation des ribed by Dor et al [8℄ does not have a me hanism for dealing with possible hanges to a bu er { only de nite hanges are tra ked. In parti ular the transformation does not orre tly re e t that the null is either at position 5 or 15 in the s bu er. Our remedy to this is to infer that q possibly shares with s and t so that both 5 and 15 are possible null positions for s and symmetri ally both 10 and 20 are potential null positions for t. Thus possible sharing information is used as an aid to orre tness. Conversely, de nite sharing information is used to in rease pre ision. It enables a write to a bu er through one pointer to destru tively update the null positions for those pointers that de nitely share the same bu er. For example, the str at at line 5 hanges the null position of the t bu er from 4 to 10. The information that t and p de nitely share is needed to infer that the null position in t is no longer 4. Moreover, possible sharing is also required to dedu e that no other bu ers are a e ted. To summarize, our work makes the following ontributions:

{ It gives the rst formal a

ount of bu er overrun analysis. A bu er abstra -

tion map is proposed that spe i es how to model a bu er. This abstra tion is then used to formalize the orre tness of the analysis. This means that a programmer an be on dent that every possible overrun is dete ted. { It shows that any string bu er overrun analysis that is both a

urate and

orre t needs to reason about pointer aliasing. Inter-pro edural points-to analysis [1, 21℄ is required to as ertain whi h pointers possibly point to the same bu er. The paper also shows how to improve pre ision by employing a de nite sharing analysis [9℄. { It explains how bu ers an be modeled with three variables per pointer and shows how the need for p overlaps q is nessed by sharing analysis, thereby avoiding quadrati blowup. The paper thus des ribes a pra ti al foundation for analyzing string bu ers.

Se tion 2 introdu es the language String C to formalize operations on string bu ers. Se tion 3 explains how polyhedra and sharing domains an be used to des ribe bu er properties, and then Se tion 4 explains how these properties an be tra ked by analysis. Se tions 5 and 6 and present the related and future work and Se tion 7 on ludes. Proofs are omitted be ause of la k of spa e.

2

The String C language and its semanti s

The analysis is formulated in terms of a C-like mini language alled String C that expresses the essential operations on bu ers.

2.1 Abstra t syntax Let (n 2) N denote the set of non-negative integers and (l 2) Lab denote a set of labels whi h mark program points. Let (x; y 2) X and (f 2) F denote the

[skip℄  ` hskip; i! [num℄  ` hx = n; i!:[(x) 7! n℄ [str℄



` hx = "n1 : : : n "; 1 i! 2 :[(x) 7! ha; a; a + m + 1i℄ m

[var℄  ` hx = y; i!:[(x) 7! :(y)℄ [add℄  ` hx = y1 + y2 ; i!:[(x) 7! v℄ [sub℄  ` hx = y1 y2 ; i!:[(x) 7! v℄

if n 2 N if n1 ; : : : ; nm 2 N ^ [a; a + m + 1℄ \ dom(m1 ) =1 ; ^ 2 = 1 :[a + i 7! ni+1 ℄i=0 : [a + m 7! 0℄

if :(yi ) = vi ^ v1  v2 = v if :(yi ) = vi ^ v1 v2 = v  if 1 :(yi ) = vi 2 if ab  a < ae ^ v1  v2 = hab; a; ae i [arr1 ℄  ` hx = y1 [y2 ℄; 1 i! err otherwise ^ 2 = 1 :[(x) 7! 1 (a)℄  if 1 :(yi ) = vi ^ 1 :(x) 2 N  if ab  a < ae [arr2 ℄  ` hy1 [y2 ℄ = x; 1 i! 2 ^ v1  v2 = hab ; a; ae i err otherwise ^ 2 = 1 :[a 7! 1 :(x)℄ if :(y) = v ^ [b; b + v℄ \ dom() = ; [mallo ℄  ` hx = mallo (y); 1 i!1 :2 :3 ^ 2 = [b + i 7! rand()℄vi=0 ^ 3 = [:(x) 7! hb; b; b + vi℄ Con rete semanti s for simple statements. The fun tion rand() returns random values and models mallo 's behavior to allo ate uninitialized memory Fig. 1.

( nite) sets of variables and fun tion names o

urring in a program. F in ludes the symbol main to indi ate the program entry point and ea h fun tion f 2 F has an asso iated variable fr 2 X to return values from fun tions in Pas al-style.

C ::=f (x1 ; : : : ; xn )L; C j " L ::=[skip℄l j [x = n℄l j [x = "n1 : : : nm "℄l j [x = y℄l j [x = y1 + y2 ℄l j[x = y1 y2 ℄l j [x = y1 [y2 ℄℄l j [y1 [y2 ℄ = x℄l j [x = mallo (y)℄l j[if x then L1 else L2 ℄l j [while x L℄l j [L1 ; L2 ℄l j [return℄l j [x = f (y1 ; : : : yn )℄l The language L(C ) de nes the set of valid String C programs. Given a xed program, the map : F ! ([i2N X i )  L(L) retrieves the formal arguments hx1 ; : : : ; xn i and body L for a fun tion f . String C does not distinguish between

integers and unsigned hars be ause reasoning about bu ers of di erent sized obje ts in reases the number of ases in the abstra t semanti s and obs ures the underlying ideas. String C an be enri hed following the ideas detailed in [1, 18℄.

2.2 Instrumented operational semanti s The semanti s of String C instruments ea h pointer with details of its underlying bu er. Spe i ally, the set of pointer values is de ned as Pnt = fhab ; a; ae i 2 N 3 j ab < ae g. The interpretation of a triple hab ; a; ae i is that the ab and ae re ord the rst lo ation within the bu er and the rst lo ation past the end of the bu er. The address a re ords the urrent position of the pointer within the

[if1 ℄  ` hif x then L1 else L2 ; i!hL1 ; i if :(x) 6= 0 [if2 ℄  ` hif x then L1 else L2 ; i!hL2 ; i if :(x) = 0 [while1 ℄  ` hwhile x L; i!hL; while x L; i if :(x) 6= 0 [while2 ℄  ` hwhile x L; i!  if :(x) = 0  ` hL1 ; 1 i!hL2 ; 2 i [seq1 ℄  ` hL1 ; L3 ; 1 i!hL2 ; L3 ; 2 i  ` hL1 ; 1 i! 2 [seq2 ℄  ` hL1 ; L2 ; 1 i!hL2 ; 2 i 2 ` hL1 ; 1 i! hL2 ; 2 i [env1 ℄ 1 ` henv 2 in L1 ; 1 i!henv 2 in L2 ; 1 i 2 ` hL; 1 i! 2 [env2 ℄ 1 ` henv 2 in L; 1 i!2 [ret℄  ` hreturn; L; i!  if (f ) = hz1 ; : : : ; zn ; Li 1 ` hx = f (y1 ; : : : ; yn ); 1 i! ^ 2 = [zi 7! ai ℄ni=1 [ all℄ henv 1 :2 :[fr 7! (x)℄ in L; 2 :1 i ^ dom(1) \ rng(2 ) = n; ^ 2 = [ai 7! 1 :1 (yi )℄i=1 Fig. 2.

Con rete semanti s for ontrol statements

bu er. For a valid bu er a

ess ab  a < ae must hold. The set of values is then de ned as Val = N [ Pnt whereas the set of environment and store maps are de ned Env = X ! N and Str = N ! Val respe tively. The following (partial) maps formalize address arithmeti .

De nition 1. The fun tions ; : Val 2 ! Val are de ned as follows:

i  j = i+j i  hbb ; b; be i = hbb ; b + i; be i hab ; a; aei  j = hab ; a + j; ae i hab ; a; aei  hbb ; b; be i = ?

i j = i j i hbb ; b; be i = ? hab ; a; ae i j = hab ; a j; ae i hab ; a; ae i hbb ; b; be i = a b

Figures 1 and 2 detail the operational semanti s for simple statements and the

ontrol statements. In Figure 2, dom(f ) and rng(f ) denote the domain and range of a mapping f and : denotes fun tion omposition de ned su h that g:f (x) = g(f (x)). The env statement introdu ed in Figure 2 models s oping.

3 Abstra t domains 3.1 Linear inequality (polyhedral) domain Let Y denote the variables fyP1 ; : : : ; yn g, let LinY denote the set of (possibly rearranged) linear P equalities ni=1 mi yiP= m and (possibly rearranged) nonstri t inequalities ni=1 mi yi  m and ni=1 mi yi  m where m; mi 2 Z. Let EqnY denote all nite subsets of LinY . Note that although ea h set E 2 EqnY P is nite, EqnY is not nite. De ne [ e℄ = fhx1 ; : : : ; xn i 2 R n j ni=1 mi xi } mg

where } 2 f; =; g. Then de ne [ E ℄ to be the polyhedron [ E ℄ = \f[ e℄ j e 2 Eg if E 2 EqnY . EqnY is ordered by entailment, that is, E1 j= E2 i [ E1 ℄  [ E2 ℄ . Equivalen e on EqnY is de ned E1  E2 i E1 j= E2 and E2 j= E1 . Let Poly Y = EqnY = . Poly Y inherits entailment j= from EqnY . In fa t hPoly Y ; j=; u; ti is a latti e (rather that a omplete latti e) with [E1 ℄ u [E2℄ = [E1 [ E2 ℄ and [E1 ℄ t [E2 ℄ = [E ℄ where [ E ℄ = l( onv ([[E1 ℄ [ [ E2 ℄ )) and

l(S ) and onv(S ) denote the losure and onvex hull of a set S 2 R n respe tively [19℄. Note that in general onv ([[E1 ℄ [ [ E2 ℄ ) is not losed and therefore annot be des ribed by a system of non-stri t linear inequalities as is illustrated below.

Example 1. Let Y = fx; yg, E1 = fx = 0; y = 1g and E2 = f0  x; x y = 0g so that [ E1 ℄ = fh0; 1ig and [ E2 ℄ = fhx; yi j 0  x ^ x = yg. These polyhedra are illustrated in the left-hand graph. Then onv ([[E1 ℄ [ [ E2 ℄ ) in ludes the point h0; 1i but not the ray fhx; yi j 0  x ^ x + 1 = yg and hen e is not losed. This

onvex spa e is depi ted in the right-hand graph.

y

3

 E

6

[[ 2 ℄℄

2

3

6



2

[[E1 ℄℄1 r 0

y

0

1

2

3

-x

1 r 0

0

1

2

3

-x

It is useful to augment u and t with three operations: proje tion, minimization and maximization. The minimization and maximization operations are respe P tively de ned min(hm1 ; : : : ; mn i; [E ℄ ) = minf ni=1 mi xi j hx1 ; : : : ; xn i 2 [ E ℄ g and max(hm The ve tor hm1 ; : : : ; mn i is 1 ; : : : ; mn i; [E ℄ ) is de ned analogously. P Pn written as niP =1 mi yi for brevity. Note that min( i =1 mi yi ; [E ℄ ) 2 R [ f 1g whereas max( ni=1 mi yi ; [E ℄ ) 2 R [ f+1g. Proje tion is de ned 9xi ([E ℄ ) = [E 0 ℄ where [ E 0 ℄ = fhx1 ; : : : ; xi 1 ; x; xi+1 ; : : : ; xn i j x 2 R ^hx1 ; : : : ; xn i 2 [ E ℄ g. For brevity, let 9yi1 ;:::;yim ([E ℄ ) = 9yim (: : : 9yi1 ([E ℄ ) : : :). The variable set for [E ℄ is de ned var([E ℄ ) = \fvar(E 0 ) j E  E 0 g where var(E 0 ) is the set of variables (synta ti ally) o

urring in E 0 . Let false = [f0 = 1g℄ so that [ false℄ = ;. Proje tion an be omputed using the Fourier algorithm [3℄, min and max using Simplex [3℄, u is straightforward to ompute whereas t an either be

al ulated using the point, ray andPline representation [4℄ or by using onstraint relaxation te hniques [7℄. Let s = ni=1 mi yi . Then entailment an be tested by: P j= fs < mg i max(s; P ) < m (even though fs < mg is stri t); P j= fs  mg i max(s; P )  m; P j= fs = mg i P j= fs  mg and P j= fs  mg. Moreover, P j= fe1 ; : : : ; ek g i P j= fe1 g . . . P j= fek g. Hen eforth, whenever unambiguous, [E ℄ will be written as E for brevity. Finally note that Poly Y does not satisfy the as ending hain ondition, that is, hains P1 j= P2 j= : : : exist for whi h ti>0 Pi does not exist. Example 2. Let P1 be an equilateral triangle, P2 a hexagon, P3 a dode agon and so forth su h that the verti es of Pi are ontained within those of Pi+1 . Then P1 j= P2 j= P3 : : : is an as ending hain whi h onverges onto a ir le whi h

itself is not a polyhedron. Thus widening is required to enfor e termination in polyhedral x-point al ulations [4℄.

3.2 Polyhedral bu er domain

Xn , Xo and Xs denote sets of variables su h that x 2 X i xn 2 Xn , xo 2 Xo and xs 2 Xs and suppose that X , Xn , Xo and Xs are pair-wise disjoint. Let PB X = Poly X [X [X [X . The latti e hPB X ; j=; u; ti is used to des ribe

Let

n

o

s

numeri bu er properties salient to overrun analysis. In parti ular, suppose an environment  maps the variable x 2 X to the address (x) = b and a store  maps b to a pointer triple (b) = hab ; a; ae i. Then the variable xs 2 Xs des ribes the number of lo ations between ab and ae (the size of the underlying bu er); xo 2 Xo represents the o set of a relative to ab (the position of x within the bu er); and xn 2 Xn aptures the rst lo ation between ab and ae whose

ontents is zero (the position of the null when the bu er is interpreted as a string). If P 2 PB X des ribes the pair h; i, then P aptures the numeri relationships between xs , xo and xn . This idea is formalized below.

De nition 2. The polyhedral on retization map XPB : PB X ! }(Env  Str) PB (P ) = fh; i j s (h; i) u bf (h; i) j= Pg where is de ned X X X

Xs (h; i) = fx = :(x) jx 2 X ^ :(x) 2 N g 9 8 ab ; x 2 X ^ :(x) = hab ; a; ae i ^ = < xo = a Xbf (h; i) = : xs = ae ab ; b = min(faeg [ fn 2 [ab ; ae 1℄ j (n) = 0g) ; xn = b a b If zero does not o

ur within the bu er then f n 2 N j ab  n < ae ^  (n) = 0g = ; so the fae g element is used to ensure the map is well-de ned. It also re ords the de nite absen e of a zero (and thereby enables ertain de nite errors PB : }(Env  Str) ! PB X annot be to be found). An abstra tion map X PB s synthesized from X sin e PB X is not omplete. In parti ular, tf X (h; i) u bf X (h; i)jh; i 2 Mg is not well de ned for arbitrary M 2 }(Env  Str). To put it another way, M does not ne essarily have a best polyhedral abstra tion. Tra king the rst zero (rather than, say, every zero) keeps the number of variables in the model low whi h aids simpli ity and omputational eÆ ien y. However, there are unusual ombinations of string operations where just tra ing the rst null an lead to a loss of pre ision and hen e false warnings. Example 3. Consider the following (syntheti ) C program that zeros the bu er lo ated by s, then and writes the bu er hara ter by hara ter.

har *s = mallo (10), t[4℄; memset(s,0,10); s[0℄='O'; s[1℄='k'; str py(t, s);

The all to memset will set sn = 0, i.e. the rst null position is 0. After the rst write the analysis an only dedu e sn  1 sin e the rst null (if it exists) must o

ur to the right of the 'O'. Likewise after the se ond write the analysis infers sn  2. The write to the 4 hara ter bu er t is safe if sn  3. However sn  2 does not imply sn  3 and therefore the all to str py generates a spurious warning. Inserting s[2℄='a'; s[3℄='y'; in front of the all to str py will yield a de nite error sin e sn  4 implies that sn  3 annot be satis ed. On the other hand, writing the same four hara ters to the same positions in reverse order only leads to a warning: sn = 0 holds valid until s[0℄ is written whi h then updates the null position to sn  1.

3.3 Possible and de nite bu er sharing domains

Let PS X = }(X 2 ) and DS X = }(X 2 ). The omplete latti es hPS X ; ; \; [i and hDS X ; ; [; \i are used to express (pair-wise) possible bu er sharing and de nite bu er sharing respe tively. De nition 3. The abstra tion maps XPS : }(Env  Str) ! PS X and XDS : }(Env  Str) ! DS X and on retization maps XPS : PS X ! }(Env  Str) DS : DS X ! }(Env  Str) are de ned as follows: and X XPS (M ) = [f Xsh(h; i) j h; i 2 Mg XDS (M ) = \f Xsh(h; i) j h; i 2 Mg

PS (S ) = fh; i j sh(h; i)  Sg DS (D) = fh; i j D  sh (h; i)g X

where

X

X

X

h; i)= fhx; yi 2 X j ((x)) = hab ; ax ; ae i^((y)) = hab ; ay ; ae ig. The intuition is that ea h D 2 DS X aptures the ertain presen e of sharing in sh (h; i) for all h; i 2 DS (D ). Conversely, that if hx; yi 2 D then hx; yi 2 X X ea h S 2 PS X des ribes the ertain la k of sharing, that is, if hx; yi 62 S then hx; yi 62 Xsh(h; i) for all h; i 2 XPS (S ). This is the negative interpretation of S . The positive interpretation of S is that S des ribes the possible present PS (S ) su h that of sharing, that is, if hx; yi 2 S then there exists h; i 2 X sh hx; yi 2 X (h; i). Proposition 1. XP S and XP S are monotoni ; XP S ( XPS (S ))  S for all S 2 PS X ; and M  XP S ( XP S (M )) for all M 2 }(Env  Str). Sin e }(EnvStr) and PS X are omplete latti es, it follows that the quadruple h}(Env  Str); XP S ; PS X ; XP S i is a Galois onne tion between }(Env  Str) DS ; DS ; DS i is also a Galois onne tion. and PS X . Likewise h}(Env  Str); X X X sh X(

2

The overrun analysis presented in this paper pre-supposes possible and definite bu er sharing information. The inter-pro edural ow-insensitive points-to analysis of [21℄ an be adapted to derive the former whereas intra-pro edural analysis is likely to be suÆ ient for the latter. The sharing abstra tions S l and Dl are introdu ed to abstra t away from a parti ular sharing analysis. De nition 4. S l and Dl are de ned so that h; 2 i 2 XPS (S l ) \ XDS (Dl ) if  ` hx = main(); 1 i ! hLl ; 2 i where  = [x 7! 0℄, 1 = [0 7! n℄ and n 2 N .

3.4 Domain intera tion A polyhedral abstra tion an be used to re ne a possible sharing abstra tion whereas a de nite sharing abstra tion an improve a polyhedral abstra tion.

De nition 5. The operators %D : PB X ! PB X and %P : PS X de ned %D (P ) = P u fxn = yn ; xs = ys j hx; yi 2 Dg and

%P (S ) = S n hx; yi 2 S PP jj== xxsn > err else if P j= fyo + z  ys g > xn ;xo ;xs (P )  fx = 0g else if P j= fyo + z = yn g > if P j= fyo + z < yn g > :99xx;xn ;xo;x;xs;x(P(P) ) fx  1g else otherwise 8 n o s err if P j= fyo + z < 0g > :P 00  fvi = yn j vi 2 V g otherwise [mallo 0 ℄ hx = mallo (y); P i!0 9x (P )  fxn  0; xo = 0; xs = yg Abstra t semanti s for simple statements where W = fwn j hw; yi 2 Dl g, l 00 = % l (update (9W nfy g (P 0 )). n j hv; y i 2 %P 0 (S )gn W and P D D x;y;z n

Fig. 3.

P 0 = % l (P ), V = fv

Proposition 3. Let  : f1; : : : ; kg!f1; : : : ; kg be a permutation, ei = fxi }i si g, si = mi + jn=1 mi;j yi;j and xi 62 var(ej ) for all i 6= j . Then P  he1 ; : : : ; ek i = P  he(1) ; : : : ; e(k)i and P  he1 ; : : : ; ek i = P  he(1) ; : : : ; e(k) i. For brevity, de ne P fe1 ; : : : ; ek g = P he1 ; : : : ; ek i if xi 62 var(ej ) for all i 6= j . Proposition 3 ensures that P fe1 ; : : : ; ek g is well-de ned whereas Proposition 4 explains how  an be used to approximate  when a polyhedra an be updated i

with di erent ombinations of equations (and possibly not at all).

Proposition 4. P  E 0 j= P  E for all E 0  E . 4.2 Non-array statements To onstru t abstra t versions of the non-array statements, the s and bf maps are introdu ed to dete t whether an obje t is a s alar or a pointer. The following

proposition states an invariant of any orre t analysis: a polyhedra annot simultaneously re ord s alar and bu er attributes for a given variable. The lemma then asserts orre tness for the abstra t semanti s of the non-array statements. De nition 7. The maps s : PB X ! }(X ) and bf : PB X ! }(X ) are de ned s (P ) = X \ var(P ) and bf(P ) = fx 2 X j fxn ; xo ; xs g \ var(P ) 6= ;g. Proposition 5. If s (P ) \ bf(P ) 6= ; then XPB (P ) = ;. Lemma 1. Suppose L = [skip℄l j : : : j [x = y z ℄l . Then if  ` hL; 1 i ! 2 , h; 1 i 2 XPB (P ) and hL; P1i!0 P2 it follows that h; 2 i 2 XPB (P2 ).

4.3 Array statements To reason about array statements that an error, the store is extended to EStr = Str [ ferrg where err denotes an error state. Likewise, the polyhedral domain is extended to EPB X = PB X [ f?; err; warng. hEPB X ; ; f; gi is a latti e where the ordering  is de ned by ?  P and P  warn for all P 2 EPB X whereas for all Pi 2 PB X , P1  P2 i P1 j= P2 . Moreover, f and g are given by:

P1 t P2 if P1 ; P2 2 PB X P1 u P2 if P1 ; P2 2 PB X > > > > err else if P1 = P2 = err < err else if P1 = P2 = err P1 f P2 = > P1 else if P2 = warn P1 g P2 = > P1 else if P2 = ? > > P2 else if P1 = ? P2 else if P1 = warn > > > > : : warn otherwise ? otherwise 8 > > > >
yn g > A > > > P else if P j= fx > 0; yo + z < yn g > A > > > P else if P j= fx = 0; yo + z < yn g > B > > > P else if P j= fyo + z < yn g > C > < PD else if P j= fx > 0g ^ wu = nl = wl PB else if P j= fx = 0g ^ wu = nl = wl > > > > P > E else if P j= fx > 0g > > > P > A else if P j= fx = 0g ^ nl  wl  nu < wu > > > P else if P j= fx = 0g > F > : PG otherwise

PA = P PB = P  (yn = yo + z) PC = P  (yn = yo + z) PD = P  (yn  yo + z + 1) PE = P  (yn  yn ) PF = PC u fyn  minfnu; wu gg PG = P  (yn  yo + z) wl = dmin(yo + z; P )e wu = bmax(yo + z; P ) nl = dmin(yn ; P )e nu = bmax(yn ; P )

P

j= y

o

P

+z

o


0 P

j= y pn

wl

+z

>

B

otherwise

Fig. 4.

=n

l

pn

nl

=w

u

wl

w

l

n

w

u