Solving Classes of Set Constraints with Tree Automata

Publication IT - 303, Laboratoire d'Informatique Fondamentale de Lille

P. Devienne, JM. Talbot and S. Tison
{devienne,talbot,tison}@lifl.fr

May 1997

© L.I.F.L. - U.S.T.L. LABORATOIRE D'INFORMATIQUE FONDAMENTALE DE LILLE, U.R.A. 369 C.N.R.S., UNIVERSITÉ DES SCIENCES ET TECHNOLOGIES DE LILLE, U.F.R. d'I.E.E.A., Bât. M3, 59655 VILLENEUVE D'ASCQ CEDEX. Tel. (+33) 3 20 43 47 24, Fax (+33) 3 20 43 65 66, E-mail direction@lifl.fr

Résumé

Set constraints are a suitable formalism for the static analysis of programs. However, it is well known that, for the most general classes of set constraints, the main problems have a very high complexity (NEXPTIME-completeness of the satisfiability test). Many works therefore aim at identifying subclasses of lower complexity. In this article, we consider two classes of set constraints that have proved useful for program analysis: the first is the class of set constraints with intersection, defined as inclusions between expressions built from function symbols and the intersection operation. The second consists of constraints of the form $X \subseteq exp$, where $exp$ is built from function symbols and the intersection, union and projection operators. The "duality" between these two classes makes it possible to define a common approach for solving them. This approach is based on tree automata, which offer a formalism suited both to solving the constraints and to representing the solution. It also allows a simple characterization of the complexity of these problems.

Keywords: Set Constraints, Tree Automata, Set-based Analysis.

Abstract

Set constraints are a suitable formalism for static program analysis. However, the complexity of set constraint problems is known to be very high in the most general cases (the satisfiability test is NEXPTIME-complete). Much work has therefore been devoted to finding more tractable subclasses. In this paper, we investigate two classes of set constraints shown to be useful for program analysis: the first one is the class of set constraints with intersection, defined as inclusions between expressions built over function symbols and the intersection operator. The second one concerns constraints of the form $X \subseteq exp$, where $exp$ is built with function symbols and the intersection, union and projection operators. The dual aspects of those two classes allow us to find a common approach for solving both of them. This approach uses tree automata as its basic tool; they are suitable both for computing and for representing the solutions of those problems. It also leads to simple algorithms and an easy characterization of complexity.

Keywords: Set Constraints, Tree Automata, Set-based Analysis.


1 Introduction

Set constraints express relations between sets of (ground) terms. They are defined as inclusions or non-inclusions between expressions built over variables, function symbols and set operators. The set operators encountered in most works are intersection, union, complementation and projection. The main problems that have been addressed concerning set constraints are the classical ones of the constraint paradigm, that is, satisfiability [AW92] [GTT93], entailment [CP96] [MNP97] and solving [AM91] [Hei92b]. Set constraints have been shown to be very useful for program analysis [Hei92b] [Hei94] [MNP97].

The satisfiability problem for the largest class of set constraints (where all the set operations cited above can occur) has been proved to be NEXPTIME-complete [CP94], and the complexity remains the same if projection is omitted [Ste94] [BGW92] [Tom94]. This has led to wide interest in more tractable subclasses, defined according to the set operations considered and/or other syntactic restrictions.

In [CP97a], Podelski and Charatonik proposed a class of set constraints, called set constraints with intersection, expressed as inclusions between expressions built over function symbols and intersection. Their satisfiability algorithm is defined as a set of axioms used to saturate the input system of constraints. This algorithm provides a proof of EXPTIME-completeness of the satisfiability test. Using the independence property of this class, they also proposed an EXPTIME algorithm for the satisfiability of non-inclusion constraints and for entailment. Finally, by showing that this class is equivalent to the definite set constraints, they gave a complexity characterization of the satisfiability test for the latter.

The same authors proposed in [CP97b] a method à la Heintze for approximating the non-failure semantics of logic programs. This analysis amounts to computing the greatest solution (over sets of finite or infinite trees) of a class of set constraints that is symmetric to the one used by Heintze [Hei92a], that is, constraints of the form $X \subseteq exp$, where $exp$ is built over function symbols, intersection, union and projection. Their method, dealing with sets of finite or infinite trees, combines syntactic transformations of the constraints with tree automata for testing emptiness of variables and representing the greatest solution.

We propose in this paper an elegant and homogeneous framework for solving positive (i.e. with inclusion) set constraints with intersection and the second class (over finite trees). This framework is based on tree automata, which provide a tool both for computing and for representing the solution of the constraints. The solving algorithms are described as relations (defined as inference rules) over tree automata. This leads to a very simple and uniform approach, in which the dual aspects of those two classes are revealed. Roughly speaking, for set constraints with intersection, we start with an "empty" automaton (i.e. each variable is interpreted as the empty set) and we modify this automaton (by adding terms to the interpretation of the variables) according to the constraints. For the second class, we start from the "universe" automaton (i.e. each variable is interpreted as the set of all terms) and we remove terms from the interpretation of the variables according to the constraints.

After giving a few definitions and properties about set constraints in the next section, we present our basic tool, tree automata, in section 3. Section 4 is devoted to an algorithm for deciding satisfiability of set constraints with intersection. Finally, section 5 deals with an algorithm for computing the greatest solution (over finite trees) for the class of set constraints introduced in [CP97b].


2 Preliminaries

We assume given a finite set of function symbols $\Sigma$ and a countable set of variables $V$ (denoted $X, Y, Z, X_1, X_2, \ldots$). $TERM(\Sigma)$ is the set of ground terms built over $\Sigma$, and $TERM(\Sigma \cup V)$ the set of terms built over $\Sigma$ and $V$. Inclusion set constraints are defined as inclusions ($sexp_1 \subseteq sexp_2$) between set expressions, built over $\Sigma$, $V$, the boolean connectives $\{\cup, \cap\}$ and the projection symbols $f_i^{-1}$ (with $f \in \Sigma$ and $1 \leq i \leq arity(f)$).

$$sexp ::= X \mid f(sexp_1, \ldots, sexp_n) \mid sexp \cup sexp' \mid sexp \cap sexp' \mid f_i^{-1}(sexp)$$

As usual, a set (or system) of set constraints is viewed as the conjunction of its constraints. An interpretation $I$ is a valuation which maps variables onto sets of ground terms and is extended to set expressions s.t.:
• $I(f(t_1, \ldots, t_n)) = \{ f(s_1, \ldots, s_n) \mid s_1 \in I(t_1), \ldots, s_n \in I(t_n) \}$.
• $I(f_i^{-1}(t)) = \{ s_i \mid \exists s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n,\ f(s_1, \ldots, s_n) \in I(t) \}$.
• $\cap$ and $\cup$ are interpreted in the canonical way.

A partial order is defined on interpretations for a set of variables $V$ as: $I \subseteq I'$ iff $\forall X \in V$, $I(X) \subseteq I'(X)$. When an interpretation is related to a system of set constraints $SC$, $V$ is by convention the set of variables occurring in $SC$. $I$ is a model (or a solution) of $sexp_1 \subseteq sexp_2$ iff $I(sexp_1) \subseteq I(sexp_2)$. $I$ is a model of a system of set constraints $SC$ if it is a model of each constraint of $SC$. $SC$ is said to be satisfiable if it has a model. $SOL(SC)$ denotes the set of models of $SC$. $SC$ has a least (resp. a greatest) solution $I_{min}$ (resp. $I_{max}$) iff $\forall I \in SOL(SC)$, $I_{min} \subseteq I$ (resp. $I \subseteq I_{max}$). One should notice that, for all $X$:
• $t \in I_{min}(X) \Leftrightarrow \forall I \in SOL(SC),\ t \in I(X)$
• $t \in I_{max}(X) \Leftrightarrow \exists I \in SOL(SC),\ t \in I(X)$
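The extension of an interpretation to set expressions can be made concrete with a small sketch. It assumes that interpretations map variables to finite sets of ground terms (the definition above allows infinite sets), that a ground term is a tuple `(symbol, subterm, ...)`, and that expressions are tagged tuples; the names `interpret` and `is_model` and the tags are illustrative choices, not notation from the paper.

```python
from itertools import product

def interpret(expr, I):
    """Evaluate a set expression under a finite interpretation I (dict: variable -> set of terms)."""
    kind = expr[0]
    if kind == "var":                       # ("var", "X")
        return set(I[expr[1]])
    if kind == "app":                       # ("app", f, e1, ..., en)
        f = expr[1]
        subsets = [interpret(e, I) for e in expr[2:]]
        return {(f, *args) for args in product(*subsets)}
    if kind == "union":                     # ("union", e1, e2)
        return interpret(expr[1], I) | interpret(expr[2], I)
    if kind == "inter":                     # ("inter", e1, e2)
        return interpret(expr[1], I) & interpret(expr[2], I)
    if kind == "proj":                      # ("proj", f, i, e), i is 1-based
        f, i = expr[1], expr[2]
        return {t[i] for t in interpret(expr[3], I) if t[0] == f}
    raise ValueError(f"unknown expression kind: {kind}")

def is_model(I, lhs, rhs):
    """I is a model of lhs <= rhs iff I(lhs) is included in I(rhs)."""
    return interpret(lhs, I) <= interpret(rhs, I)

a = ("a",)
I = {"X": {a}, "Y": {a, ("f", a, a)}}
print(interpret(("proj", "f", 2, ("var", "Y")), I))   # second arguments of f-terms in I(Y)
```

With this encoding, a constant such as `a` is the application of a 0-ary symbol, so `("app", "a")` evaluates to the singleton containing `("a",)`.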

3 Tree Automata

The basic tool for the two classes we consider is tree automata, more precisely an extension of ascending tree automata [GS84].

Definition 1 An n-ranked tree automaton (TA) $A$ is a tuple $(\Sigma, Q, F, S)$ where $\Sigma$ is a set of function symbols, $Q$ is a finite set of states, $F = (F_1, \ldots, F_n)$ is a tuple of sets of final states ($F_i \subseteq Q$) and $S$ is a set of transition rules of the form $f(q_1, \ldots, q_m) \to q$, where $f \in \Sigma$ is an m-ary symbol and $\{q, q_1, \ldots, q_m\} \subseteq Q$.¹

In fact, we consider only a restricted subclass of such automata, that is the class of deterministic and complete TA (denoted TAdc).

Definition 2 A TA is said to be
• deterministic iff $\forall r_i, r_j \in S$ s.t. $i \neq j$, $r_i$ and $r_j$ do not have the same left-hand side;
• complete iff $\forall f \in \Sigma$, $\forall q_1, \ldots, q_m \in Q$, $\exists q \in Q$ s.t. $f(q_1, \ldots, q_m) \to q \in S$.

¹ This definition of transition rule includes the case where $f$ is a constant symbol $a$: $a \to q$.

A deterministic and complete tree automaton runs over $TERM(\Sigma)$. This run is formally defined using a function $run_A$ from $TERM(\Sigma \cup Q)$ (i.e. the set of terms built over $\Sigma$ and $Q$, where states are viewed as constants) onto $Q$.

Definition 3 For all $t \in TERM(\Sigma \cup Q)$, $run_A(t) = q$ iff $t \to_A^* q$, where $\to_A^*$ is the transitive closure of the move function $\to_A$, defined from $TERM(\Sigma \cup Q)$ onto itself as $t \to_A t'$ iff $t = T[l]$, $t' = T[r]$ and $l \to r \in S$. The language recognized by an n-ranked TAdc $A$ is a tuple $(L_1, \ldots, L_n)$, where $L_i$ is the set of ground terms defined by: $\forall t \in TERM(\Sigma)$, $t \in L_i$ iff $run_A(t) \in F_i$.

A basic property for automaton states is reachability.

Definition 4 A state $q$ is said to be reachable in a TA $A$ iff there exists a term $t$ in $TERM(\Sigma)$ s.t. $run_A(t) = q$. $Reachable(A)$ will denote the set of reachable states in a TA $A$.

This definition implies, in particular, that if $q \in F_i$ and $q$ is reachable, then $L_i$ is non-empty.
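The run function and the reachability test can be sketched as follows. The encoding is an assumption made for illustration: a ground term is a tuple `(symbol, subterm, ...)`, and the rule set of a deterministic and complete automaton is a dictionary from `(symbol, tuple of argument states)` to a state. The function names and the toy automaton are hypothetical.

```python
def run(rules, term):
    """Bottom-up run: evaluate the arguments first, then apply the unique matching rule."""
    symbol, *args = term
    return rules[(symbol, tuple(run(rules, arg) for arg in args))]

def in_language(rules, final, i, term):
    """t belongs to L_i iff run(t) is in F_i (final is the tuple (F_1, ..., F_n), i is 1-based)."""
    return run(rules, term) in final[i - 1]

def reachable(rules):
    """Least set of states closed under the rules: q is reachable iff some ground term runs to q."""
    reach, grew = set(), True
    while grew:
        grew = False
        for (_, arg_states), q in rules.items():
            if q not in reach and all(p in reach for p in arg_states):
                reach.add(q)
                grew = True
    return reach

# Toy 1-ranked automaton over a/0 and f/1: state 0 = even number of f, 1 = odd, 2 = unused.
rules = {("a", ()): 0, ("f", (0,)): 1, ("f", (1,)): 0, ("f", (2,)): 2}
final = ({0},)
a = ("a",)
print(run(rules, ("f", ("f", a))), reachable(rules))
```

State 2 is never produced from reachable arguments, so it is not in the set returned by `reachable`, even though the automaton is complete.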

4 Set Constraints with Intersection

In this section, we present a method using tree automata for deciding satisfiability for a class of set constraints, called "set constraints with intersection", introduced in [CP97a]. Those constraints are built with functional composition and intersection connectives. Without loss of generality, we may assume that those constraints are in "shallow" form²:

$$f(X_1, \ldots, X_m) \subseteq X \qquad X \cap Y \subseteq Z \qquad X \subseteq f(X_1, \ldots, X_m)$$

This class of set constraints is equivalent to the class of definite set constraints presented in [HJ90] and used for program analysis [Hei92a]. Such a system of set constraints is either unsatisfiable or has a least solution. We propose an algorithm which builds a tree automaton representation of the least solution of a system $SC$ if $SC$ is satisfiable, and returns $\bot$ otherwise.

4.1 Algorithm

Let $SC$ be a system of set constraints and $\{X_1, \ldots, X_n\}$ be the set of variables occurring in $SC$. We will consider the set of TAdc $A = (\Sigma, Q, F, S)$ s.t. $\Sigma$ is the signature of $SC$, $Q = \{0,1\}^n$ (the vectors of size $n$ of boolean values) and $F = (F_1, \ldots, F_n)$, where $F_i$ is the set of states having 1 on the $i$th component. In other words, for a ground term $t$, if the state equal to $run_A(t)$ has 1 on its $i$th component, then $t$ belongs to $X_i$ in any solution. We present our algorithm as a system of inference rules $R$. Starting from the "empty" automaton, $R$ computes either a tree automaton representing the solution of a system of set constraints if it is satisfiable, or $\bot$ otherwise. Since $\Sigma$, $Q$ and $F$ are fixed, no difference is made between an automaton and its set of transition rules.

² A system of set constraints with intersection can be transformed into an equivalent "shallow" one by adding "fresh" variables and by noticing that $X \subseteq Y \cap Z \Leftrightarrow X \subseteq Y \wedge X \subseteq Z$ and that $X \subseteq Y \Leftrightarrow X \cap X \subseteq Y$. This can be done with a linear time and space algorithm.

Notations:

- for $q, q' \in Q$, $switch\_on(q, i) = q'$ iff $\forall j$, $q' \in F_j \Leftrightarrow (q \in F_j \vee i = j)$
- $S_0$ denotes the automaton having the right-hand sides of its rules set to $\{0\}^n$
- $Subst(V, E)$ will denote the set of substitutions having $V$ for domain and ranging over the set $E$.

Let $R$ be the following system of inference rules:

(Compose)
$$\frac{S \cup \{f(q_1, \ldots, q_m) \to q\}}{S \cup \{f(q_1, \ldots, q_m) \to switch\_on(q, i)\}} \quad \text{if } \begin{cases} f(X_{i_1}, \ldots, X_{i_m}) \subseteq X_i \in SC \\ \forall k,\ q_k \in F_{i_k} \end{cases}$$

(Inter)
$$\frac{S \cup \{f(q_1, \ldots, q_m) \to q\}}{S \cup \{f(q_1, \ldots, q_m) \to switch\_on(q, i)\}} \quad \text{if } \begin{cases} X_{i_1} \cap X_{i_2} \subseteq X_i \in SC \\ q \in F_{i_1} \text{ and } q \in F_{i_2} \end{cases}$$

(Clash)
$$\frac{S \cup \{f(q_1, \ldots, q_m) \to q\}}{\bot} \quad \text{if } \begin{cases} X_i \subseteq g(X_{i_1}, \ldots, X_{i_l}) \in SC \\ f \neq g,\ q \in F_i \\ \{q_1, \ldots, q_m\} \subseteq Reachable(S) \end{cases}$$

(Project)
$$\frac{S \cup \{f(q_1, \ldots, q_m) \to q\}}{S \cup \{f(q_1, \ldots, q_m) \to switch\_on(q, i_k)\}} \quad \text{if } \begin{cases} X_i \subseteq g(X_{i_1}, \ldots, X_{i_l}) \in SC \\ \{q_1, \ldots, q_m\} \subseteq Reachable(S) \\ \exists\, g(q_{i_1}, \ldots, q_{i_l}) \to q' \in S \text{ s.t. } q' \in F_i,\ q = q_{i_k} \\ \{q_{i_1}, \ldots, q_{i_l}\} \subseteq Reachable(S) \end{cases}$$

Informally, the first rule means that for any solution $I$, if for all $k$, $t_k \in I(X_{i_k})$, then according to the considered constraint $f(t_1, \ldots, t_m) \in I(X_i)$. For (Inter), if $t$ belongs to $I(X_{i_1})$ and $I(X_{i_2})$ for any solution, then $t \in I(X_i)$. (Clash) means that there exists a ground term $f(t_1, \ldots, t_m)$ in $I(X_i)$ for any interpretation $I$; therefore, [...]
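A minimal sketch of this saturation process, under one reading of the four rules (Compose), (Inter), (Clash) and (Project), is given below. It is not the authors' implementation: the constraint tags `"fsub"`, `"inter"` and `"subf"`, the function names, and the representation of a state as the frozenset of variable indices whose component is 1 are all assumptions made for illustration. The sketch enumerates every rule of the complete automaton, so it is only usable on very small systems.

```python
from itertools import product

def solve(sig, n, constraints):
    """Saturate the "empty" automaton; return its rules, or None when (Clash) applies.

    sig: {symbol: arity}; variables are numbered 0..n-1.
    constraints: ("fsub", f, args, i)  for f(X_args) <= X_i
                 ("inter", i1, i2, i)  for X_i1 & X_i2 <= X_i
                 ("subf", i, g, args)  for X_i <= g(X_args)
    """
    states = [frozenset(i for i in range(n) if mask >> i & 1)
              for mask in range(2 ** n)]
    # S_0: every right-hand side is the all-zero state.
    rules = {(f, qs): frozenset()
             for f, m in sig.items() for qs in product(states, repeat=m)}

    def reachable():
        reach, grew = set(), True
        while grew:
            grew = False
            for (_, qs), q in rules.items():
                if q not in reach and all(p in reach for p in qs):
                    reach.add(q)
                    grew = True
        return reach

    def find_move():
        """Return (rule key, component to switch on), "clash", or None at a fix-point."""
        reach = reachable()
        for (f, qs), q in rules.items():
            live = all(p in reach for p in qs)
            for c in constraints:
                if c[0] == "fsub":                        # (Compose)
                    _, g, args, i = c
                    if f == g and i not in q and all(a in p for a, p in zip(args, qs)):
                        return (f, qs), i
                elif c[0] == "inter":                     # (Inter)
                    _, i1, i2, i = c
                    if i1 in q and i2 in q and i not in q:
                        return (f, qs), i
                elif c[1] in q and live:                  # X_i <= g(...), q in F_i
                    _, i, g, args = c
                    if f != g:                            # (Clash)
                        return "clash"
                    for a, p in zip(args, qs):            # (Project)
                        if a not in p:
                            for key, r in rules.items():
                                if r == p and all(x in reach for x in key[1]):
                                    return key, a
        return None

    while True:
        move = find_move()
        if move is None:
            return rules
        if move == "clash":
            return None
        key, i = move
        rules[key] = rules[key] | {i}                     # switch_on(q, i)
```

For instance, with variables numbered Y = 0, X = 1, Z = 2 and the system $a \subseteq Y$, $f(Y) \subseteq X$, $X \subseteq f(Z)$, calling `solve({"a": 0, "f": 1}, 3, [("fsub", "a", (), 0), ("fsub", "f", (0,), 1), ("subf", 1, "f", (2,))])` yields an automaton in which `a` runs to the state with components Y and Z switched on, which corresponds to the least solution Y = Z = {a}, X = {f(a)}.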

[...] $q' \notin F_k$

Intuitively, the first rule means that, according to the constraint, no term $f(t_1, \ldots, t_m)$ can belong to $I(X_i)$ for any solution $I$, and that for a term $g(t_1, \ldots, t_l)$ s.t. $t_k \notin I(X_{i_k})$, $g(t_1, \ldots, t_l) \notin I(X_i)$. For (Union), if for any solution $I$, $t \notin I(X_{i_1})$ and $t \notin I(X_{i_2})$, then $t \notin I(X_i)$. (Project) means that if, for a term $t_j$, for any solution $I$ and for any term $g(t_1, \ldots, t_l)$, $g(t_1, \ldots, t_l) \notin I(X_k)$, then $t_j \notin I(X_i)$.

Starting from $S_1$, our algorithm computes $S_f$, a fix-point of $\to_R$, that is, $S_1 \to_R^* S_f$ and $S_f \to_R S$ implies $S = S_f$.

⁶ Intersections can be cancelled since $X \subseteq Y \cap Z \Leftrightarrow X \subseteq Y \wedge X \subseteq Z$.
⁷ The interpretation $I_\emptyset$, defined as $\forall X$, $I_\emptyset(X) = \emptyset$, is an obvious solution.

5.2 Termination

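We can define a partial ordering relation $\downarrow_Q$ on states as $q' \downarrow_Q q$ iff, for all $j$, $q' \notin F_j \Rightarrow q \notin F_j$. It can be extended to a partial ordering $\downarrow_A$ on TAdc (in the same way as the ordering on automata was obtained from the ordering on states for set constraints with intersection). Termination follows from the facts that $S \to_R S'$ implies $S \downarrow_A S'$ and that the set of TAdc is finite.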
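The ordering on states can be checked mechanically. The sketch below assumes that states are tuples of 0/1 components and that the extension to automata is pointwise over rules sharing the same left-hand sides; that extension, as well as the names `below`, `automaton_below` and `switch_off`, are assumptions of this sketch rather than definitions taken from the text.

```python
def below(q_prime, q):
    """q' is related to q iff every component that is 0 in q' is also 0 in q."""
    return all(b == 0 for a, b in zip(q_prime, q) if a == 0)

def automaton_below(s, s_prime):
    """Assumed pointwise extension: compare right-hand sides rule by rule."""
    return all(below(s[lhs], s_prime[lhs]) for lhs in s)

def switch_off(q, i):
    """Hypothetical counterpart of switch_on: set component i (0-based) to 0."""
    return tuple(0 if j == i else bit for j, bit in enumerate(q))

before = {("a", ()): (1, 1, 1)}
after = {("a", ()): switch_off(before[("a", ())], 2)}
print(automaton_below(before, after))   # a step only turns components off
```

Each step can only turn a 1 into a 0, so any chain of steps over a finite set of automata must stop, which is the termination argument in miniature.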

5.3 Correctness

We start with soundness: we prove that our algorithm computes a solution. Let $(L_1, \ldots, L_n)$ be the language recognized by $S_f$ and $I_f$ be the interpretation defined by $\forall X_i$, $I_f(X_i) = L_i$.

Theorem 3 $I_f$ is a solution of $SC$.

Proof: Let us assume that $I_f$ is not a solution. Then there exists a violated constraint in $SC$. We proceed by cases on the kind of constraint:
• $X_i \subseteq f(X_{i_1}, \ldots, X_{i_m})$: there exists a term $t$ s.t. $t \in I_f(X_i)$ and $t \notin I_f(f(X_{i_1}, \ldots, X_{i_m}))$. By definition, $run_{S_f}(t) \in F_i$. If $t = g(t_1, \ldots, t_l)$ with $g \neq f$, there exists in $S_f$ a rule $g(q_1, \ldots, q_l) \to q$ s.t. $q \in F_i$. This is impossible since $S_f$ is a fix-point. If $t = f(t_1, \ldots, t_m)$, there exists a $k$ s.t. $t_k \notin I_f(X_{i_k})$, that is, $run_{S_f}(t_k) \notin F_{i_k}$. So there exists a rule $f(q_1, \ldots, q_m) \to q$ in $S_f$ s.t. $run_{S_f}(t) = q \in F_i$ and $q_k \notin F_{i_k}$, which is impossible.
• $X_i \subseteq X_{i_1} \cup X_{i_2}$: there exists a term $t$ s.t. $t \in I_f(X_i)$, $t \notin I_f(X_{i_1})$ and $t \notin I_f(X_{i_2})$. So, by definition, $run_{S_f}(t)$ belongs to $F_i$, but neither to $F_{i_1}$ nor to $F_{i_2}$. This is impossible since $S_f$ is a fix-point.
• $X_i \subseteq g_j^{-1}(X_k)$: there exists a term $t$ s.t. $t \in I_f(X_i)$ and $t \notin I_f(g_j^{-1}(X_k))$. So, for any term $g(t_1, \ldots, t_l)$ s.t. $t_j = t$, $g(t_1, \ldots, t_l) \notin I_f(X_k)$. So, for any $g(q_{i_1}, \ldots, q_{i_l}) \to q'$ s.t. $q_{i_1}, \ldots, q_{i_l}$ are reachable, $q' \notin F_k$. Since $run_{S_f}(t) \in F_i$, this would imply that $S_f$ is not a fix-point. □

We now deal with completeness, starting with a basic lemma which states an invariant property of $\to_R$.

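Lemma 2 $\forall I \in SOL(SC)$,

(a) $\forall f(q_1, \ldots, q_m) \to q \in S$, $\forall t_1, \ldots, t_m \in TERM(\Sigma)$,
$$\bigwedge_{1 \leq o \leq m} \ \bigwedge_{\{p \mid q_o \notin F_p\}} t_o \notin I(X_p) \ \Rightarrow \bigwedge_{\{p \mid q \notin F_p\}} f(t_1, \ldots, t_m) \notin I(X_p)$$

(b) $\forall t \in TERM(\Sigma \cup V)$, $\forall \sigma \in Subst(Var(t), Q)$, $\forall \theta \in Subst(Var(t), TERM(\Sigma))$,
$$\bigwedge_{\{y \mid y \in Var(t)\}} \ \bigwedge_{\{p \mid \sigma(y) \notin F_p\}} \theta(y) \notin I(X_p) \ \Rightarrow \bigwedge_{\{p \mid run_S(\sigma(t)) \notin F_p\}} \theta(t) \notin I(X_p)$$

If (a) and (b) hold for $S$ and $S \to_R S'$, then (a) and (b) hold for $S'$.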

Proof:

For (a): As in lemma 1 and for the same reasons, it is sufficient to prove, for a switch to 0 on component $i$ of the modified rule, that
$$\bigwedge_{1 \leq o \leq m} \ \bigwedge_{\{p \mid q_o \notin F_p\}} t_o \notin I(X_p) \ \Rightarrow \ f(t_1, \ldots, t_m) \notin I(X_i)$$
holds for $S'$. So, according to the rule applied for $\to_R$:
• For (Compose): If $g \neq f$, then $\forall t_1, \ldots, t_m \in TERM(\Sigma)$, $f(t_1, \ldots, t_m) \notin I(g(X_{i_1}, \ldots, X_{i_m}))$, therefore $f(t_1, \ldots, t_m) \notin I(X_i)$. If $f = g$, the conditions of (a) and the inference rule imply that $\exists k$, $\forall t_k$, $t_k \notin I(X_{i_k})$. Thus, $f(t_1, \ldots, t_m) \notin I(f(X_{i_1}, \ldots, X_{i_m}))$. Since $I$ is a solution, $f(t_1, \ldots, t_m) \notin I(X_i)$.
• For (Union): Since (a) holds for the unchanged parts of $S$ in $S'$, $q \notin F_{i_1}$ and $q \notin F_{i_2}$ imply that $f(t_1, \ldots, t_m) \notin I(X_{i_1})$ and $f(t_1, \ldots, t_m) \notin I(X_{i_2})$. Since $I$ is a solution, (a) holds.
• For (Project): Let us consider a term $t = g(z_1, \ldots, z_{j-1}, f(y_1, \ldots, y_m), z_{j+1}, \ldots, z_l)$. For any substitution $\theta'$ ranging over ground terms ($\theta'(z_o) = s_o$), let $\sigma'$ range over (reachable) states s.t. $\sigma'(z_o) = q_{i_o} = run_S(s_o)$. Let $\sigma''$ (resp. $\theta''$) be a substitution s.t. $\forall o$, $\sigma''(y_o) = q_o$ (resp. $\theta''(y_o) = t_o$). Finally, let $\sigma = \sigma' \circ \sigma''$ and $\theta = \theta' \circ \theta''$. (b) holds in $S$ (in particular for $t$, $\sigma$ and $\theta$). Moreover, (b) implies for the $s_o$ that
$$\bigwedge_{1 \leq o \leq l \,\wedge\, o \neq j} \ \bigwedge_{\{p \mid q_{i_o} \notin F_p\}} s_o \notin I(X_p)$$
Finally, since the conditions of the (Project) rule imply that $run_S(\sigma(t)) \notin F_k$, it can be deduced from these that
$$\bigwedge_{1 \leq o \leq m} \ \bigwedge_{\{p \mid q_o \notin F_p\}} t_o \notin I(X_p) \ \Rightarrow \ g(s_1, \ldots, s_{j-1}, f(t_1, \ldots, t_m), s_{j+1}, \ldots, s_l) \notin I(X_k)$$
As $I$ is a solution, the condition implies that $f(t_1, \ldots, t_m) \notin I(X_i)$.

For (b): for any $S$, by induction on the structure of $t$. The proof goes in a similar way to lemma 1 (b). □

Theorem 4 [Completeness] $\forall t \in TERM(\Sigma)$, $(\exists I \in SOL(SC),\ t \in I(X_i)) \Rightarrow run_{S_f}(t) \in F_i$

Proof: Since (b) obviously holds for $S_1$, by lemma 2, it holds for $S_f$. This implies for ground terms that $\forall I \in SOL(SC)$, $\bigwedge_{\{p \mid run_{S_f}(t) \notin F_p\}} t \notin I(X_p)$. So, $run_{S_f}(t) \notin F_i \Rightarrow \forall I \in SOL(SC),\ t \notin I(X_i)$. □

This implies that $S_f$ is unique and represents the greatest solution of the system of set constraints.

5.4 Complexity

We apply to those set constraints the same strategy as for set constraints with intersection. Let $f$ (resp. $a$) be the number of function symbols (resp. the maximal arity) in $\Sigma$; $n$ (resp. $v$) will denote the number of constraints (resp. of variables) in $SC$.

The number of states and transition rules, and the size of an automaton ($T$), are the same as for set constraints with intersection. Therefore, the maximal number of iterations remains the same. For a rule and a set constraint: (Union) can be achieved in constant time $O(c)$, (Compose) costs at most $O(a)$ in time, and (Project) $O(a \cdot T \cdot 2^{a \cdot v})$. So, globally, this exponential-time algorithm costs at most $O(n f^3 a^2 2^{4va})$.

6 Example

For the following system of constraints:

$$X \subseteq a \qquad Y \subseteq X \cup Z \qquad Z \subseteq f(X, Y) \qquad W \subseteq f_2^{-1}(Y)$$

this tree automaton, where states are codified as $WXYZ$, is obtained:

$$a \to 1110 \qquad f(1110, 1110) \to 1011 \qquad f(1110, 1011) \to 1011$$

In a tree grammar representation:

$$X \Rightarrow a \qquad Z \Rightarrow f(a, Y) \qquad W, Y \Rightarrow a \mid Z$$
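The example automaton can be replayed on a few terms with a short sketch. It assumes that every transition not listed in the example leads to the all-zero state `0000` (the listed rules only involve reachable, non-trivial states), and the names `RULES`, `SINK`, `run` and `members` are illustrative.

```python
RULES = {
    ("a", ()): "1110",
    ("f", ("1110", "1110")): "1011",
    ("f", ("1110", "1011")): "1011",
}
SINK = "0000"  # assumption: unlisted transitions lead to the all-zero state

def run(term):
    """Bottom-up run of the example automaton on a term (symbol, subterm, ...)."""
    symbol, *args = term
    return RULES.get((symbol, tuple(run(arg) for arg in args)), SINK)

def members(term, names="WXYZ"):
    """Variables whose component is 1 in the state reached by the term."""
    return {name for name, bit in zip(names, run(term)) if bit == "1"}

a = ("a",)
for t in (a, ("f", a, a), ("f", a, ("f", a, a)), ("f", ("f", a, a), a)):
    print(t, sorted(members(t)))
```

The terms `a`, `f(a, a)` and `f(a, f(a, a))` land in the variables predicted by the tree grammar, while `f(f(a, a), a)` belongs to no variable because its first argument is not in X.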

7 Conclusion

We have proposed in this paper a common approach, based on tree automata, for two different problems concerning set constraints that are useful for the analysis of (logic) programs. Tree automata are shown to be suitable both for computing and for representing the solutions of both problems: the first one is a satisfiability test for (positive) set constraints with intersection, and the second one is the computation of the greatest solution for a class of set constraints which can be viewed as the symmetric of those used for set-based analysis [Hei92a]. This approach leads to an easy characterization of the complexity of those problems.

From a practical point of view, a representation of states with "undefined" components (i.e. meaning either 1 or 0), together with considering at each step only rules involving reachable states, should give an efficient implementation. We also aim to extend our approach to infinite trees, as done in [CP97b]. This could be achieved by considering the automata in a descending way and modifying the condition of reachability.

As mentioned in [CP97a], set constraints with union (that is, inclusions between expressions built over function symbols and the union operator) are not dual to set constraints with intersection. It seems that our method cannot easily address this class of constraints. Therefore, the complexity of the satisfiability problem for set constraints with union remains open.

References

[AM91] A. Aiken and B. Murphy. Implementing regular trees. In Proceedings of the 5th ACM Conference on Functional Programming and Computer Architecture, pages 427-447. LNCS 523, Aug 1991.
[AW92] A. Aiken and E.L. Wimmers. Solving Systems of Set Constraints. In Proceedings of the 7th Symposium on LICS, pages 329-340, 1992.
[BGW92] L. Bachmair, H. Ganzinger, and U. Waldmann. Set constraints are the monadic class. Technical Report MPI-I-92-240, Max-Planck-Institut für Informatik, Dec 1992.
[CP94] W. Charatonik and L. Pacholski. Set constraints with projections are in NEXPTIME. In Proc. 35th Symp. Foundations of Computer Science, pages 642-653, 1994.
[CP96] W. Charatonik and A. Podelski. The independence property of a class of set constraints. In Proceedings of Conference on Principles and Practice of Constraint Programming (CP'96), pages 76-90. LNCS 1118, 1996.
[CP97a] W. Charatonik and A. Podelski. Set constraints with intersection. In Proceedings of the 12th Symposium in Logic in Computer Sciences (LICS'97), 1997.
[CP97b] W. Charatonik and A. Podelski. Solving set constraints for greatest models. Technical Report MPI-I-2-004, Max-Planck-Institut für Informatik, 1997.
[GS84] F. Gécseg and M. Steinby. Tree Automata. Akadémiai Kiadó, Budapest, 1984.
[GTT93] R. Gilleron, S. Tison, and M. Tommasi. Solving systems of set constraints with negated subset relationships. In Proceedings of the 34th Symp. on Foundations of Computer Science, pages 372-380, 1993.
[Hei92a] N. Heintze. Practical aspects of set based analysis. In Proceedings of the International Joint Conference and Symposium on Logic Programming, Nov 1992.
[Hei92b] N. Heintze. Set Based Program Analysis. PhD thesis, Carnegie Mellon University, Sep 1992.
[Hei94] N. Heintze. Set-based analysis of ML programs. In Lisp and Functional Programming, pages 306-317. ACM, 1994.
[HJ90] N. Heintze and J. Jaffar. A decision procedure for a class of Herbrand set constraints. In Proceedings 5th IEEE Conference on LICS, pages 42-51, Jun 1990.
[MNP97] M. Müller, J. Niehren, and A. Podelski. Inclusion constraints over non-empty sets of trees. In Proceedings of CAAP - TAPSOFT'97, Apr 1997.
[Ste94] K. Stefansson. Systems of set constraints with negative constraints are NEXPTIME-complete. In Proceedings of the 9th Symposium on LICS, 1994.
[Tom94] M. Tommasi. Automates et Contraintes Ensemblistes. PhD thesis, Université des Sciences et Technologies de Lille, 1994.