ARTICLE IN PRESS
Journal of Computer and System Sciences 69 (2004) 617–655 http://www.elsevier.com/locate/jcss
An expressive language for linear spatial database queries$
Luc Vandeurzen (a,1), Marc Gyssens (a), and Dirk Van Gucht (b)
(a) Dept. WNI, Limburgs Universitair Centrum, Universitaire Campus, B-3590 Diepenbeek, Belgium
(b) Computer Science Department, Indiana University, Bloomington, IN 47405-4101, USA
Received 25 April 2002; revised 8 April 2004. Available online 20 July 2004.
Abstract
We exhibit a coordinate-based language, called PFOL, which is sound for the linear queries computable in first-order logic over the reals and extends the latter's restriction to linear arithmetic. To evaluate its expressive power, we first consider PFOL-finite, the PFOL programs that compute finite outputs upon finite inputs. In order to study this fragment of PFOL, we also define an equivalent syntactical language, called SPFOL. We show that SPFOL has the same expressive power as SafeEuql (S. Cluet, R. Hull (Eds.), Proceedings of the Sixth International Workshop on Database Programming Languages, Lecture Notes in Computer Science, Vol. 1369, Springer, Berlin, 1997, pp. 1–24), whence all ruler-and-compass constructions in the plane on finite sets of points can be expressed in SPFOL. This result gives a geometrical justification of SPFOL, and, hence, also of PFOL-finite. Then, we define finite representations for arbitrary semi-linear sets and show that there are PFOL programs for both the encoding and the decoding. This result is used (i) to identify a broad, natural class of FO + poly-expressible linear queries of which we show equivalence with the class of PFOL-expressible queries, and (ii) to establish a general theorem about lifting query languages on finite databases to query languages on arbitrary linear databases. This theorem is applied to a recent result of Benedikt and Libkin (SIAM J. Comput. 29 (2000) 1652–1682) from finite to arbitrary semi-linear sets, yielding the existence of a natural, syntactically definable fragment of FO + poly sound and complete for all FO + poly-expressible linear queries. © 2004 Elsevier Inc. All rights reserved.
Keywords: Database theory; Query languages; Constraint databases; Linear constraint databases; Spatial databases; Linear spatial databases; Expressiveness
$
A preliminary version of parts of this paper was presented at the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems—PODS ’98 (Seattle, Washington, USA, June 1998). Corresponding author. E-mail addresses:
[email protected] (L. Vandeurzen),
[email protected] (M. Gyssens),
[email protected] (D. Van Gucht).
1 Currently no longer working at the LUC, but can be reached at the indicated e-mail address.
0022-0000/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jcss.2004.04.005
L. Vandeurzen et al. / Journal of Computer and System Sciences 69 (2004) 617–655
1. Introduction
Following the seminal work by Kanellakis et al. [12] on constraint query languages with polynomial constraints, various researchers have introduced geometric database models and query languages within this framework [13]. We adopt the formalism of [13], which we shall call the polynomial spatial database model, in which both geometric objects and queries are expressed using polynomial inequalities. Geometric objects described by polynomial inequalities are called semi-algebraic sets, and the query language using polynomial inequalities is referred to as FO + poly.
Several authors [1,2,6,9–12,18] discussed linear spatial database models which can be seen as linear restrictions of the polynomial database model. These linear models suffice for the majority of applications encountered in GIS, geometric modeling, and spatial and temporal databases [14,17]. Geometric objects described by linear inequalities are called semi-linear sets, and the restriction of FO + poly using only linear inequalities is referred to as FO + linear.
Unfortunately, not all linear queries (i.e., mappings between spatial databases describable in the linear model) expressible in FO + poly can be described in FO + linear [2]. The present authors showed that, in fact, a whole class of rather straightforward linear queries, among which computing the convex hull and deciding collinearity, are not expressible in FO + linear [18].
Several attempts have been made to enrich the expressive power of FO + linear, but this turns out to be a difficult task. Afrati et al. [2] and the present authors [18] showed that naive extensions of FO + linear easily cease to remain sound with respect to the linear queries, and yield a language equivalent in expressive power to FO + poly. Sound ways to extend FO + linear have been proposed (e.g., [7,18]), but the geometric significance of these extensions is not always clear.
This objection also holds for FO + linear itself, which is a second reason why FO + linear is not such a desirable linear query language: unlike for FO + poly, no geometric characterization of FO + linear is known. This observation has led some authors to consider point-based linear languages [16].
In this paper, we try to obtain a geometrically meaningful, but still coordinate-based, extension of FO + linear. We relax the condition that FO + linear formulae may only contain linear inequalities by introducing a second sort of variables, the so-called product variables. These product variables must be quantified and are allowed to range over a finite set of numbers only. In compensation, however, product variables may occur in multiplications with other product variables or with ordinary real variables. The language thus obtained will be called PFOL.
In order to study the merits of PFOL, we first consider PFOL-finite, the PFOL programs that compute finite outputs upon finite inputs. Since intermediate results of PFOL-finite programs may be infinite, structural induction as proof technique is not applicable, whence this language is hard to study. Therefore, we introduce an alternative language, called SPFOL, which is safe with respect to queries from finite inputs to finite outputs. We prove that SPFOL and PFOL-finite are equivalent in expressive power, as a consequence of which SPFOL can be seen as a kind of normal form for PFOL-finite.
We then study the expressive power of SPFOL. We show that SPFOL has the same expressive power as SafeEuql [16], a language which captures the notion of ruler-and-compass constructions on finite sets of points in the plane. This result gives a geometrical justification of SPFOL, and, hence, also of PFOL-finite, and, by extension, of PFOL.
In order to show the equivalence in expressive power of PFOL-finite and SPFOL, we propose a technique to encode an arbitrary semi-linear set into a finite database, called its finite representation, as well as a decoding technique to restore the original semi-linear sets. We show that both encoding and decoding can be expressed in PFOL. Apart from being a tool in proving the equivalence of PFOL-finite and SPFOL, this result also yields a "lifting theorem" which guarantees that a sensible query language on arbitrary semi-linear databases can be obtained by defining a sensible query language from finite inputs to finite outputs, and letting this language operate on finite representations.
We call an FO + poly-expressible linear query constructible if the corresponding transformation from a finite representation of the input to a finite representation of the output can be constructed by ruler and compass. As an immediate consequence of the lifting theorem, PFOL is sound and complete for the constructible queries. The constructible queries are a vast class of FO + poly-expressible linear queries; in particular, they include all Boolean queries (such as, e.g., collinearity), as well as, for example, the convex-hull query.
Our work was partly inspired by recent results of Benedikt and Libkin [5]. Benedikt and Libkin exhibit a syntactically definable fragment of FO + poly which is sound and complete for the FO + poly-expressible queries from finite inputs to finite outputs, which they claim to be natural. From this, it follows that there exists a query language which is sound and complete for the FO + poly-expressible linear queries, and which is arguably more natural than the one proposed in [8]. It should be noted that the latter language was proposed precisely to encourage the search for more natural query languages which are sound and complete for the FO + poly-expressible linear queries.
This paper is organized as follows.
In Section 2, we review concisely the polynomial and linear database model, and recall some notions and results from earlier work [8] on structural properties of semi-linear sets. In Section 3, we define the query language PFOL, and show it is a linear query language. In Section 4, we consider PFOL-finite and introduce SPFOL. We also establish some basic properties of SPFOL. In order to prove the equivalence of PFOL-finite and SPFOL, we exhibit a finite representation for arbitrary semi-linear sets in Section 5. In particular, we propose encoding and decoding algorithms that are expressible in PFOL. The actual equivalence proof for the expressive powers of PFOL-finite, respectively, SPFOL can be found in Section 6. In Section 7, we then characterize SPFOL by showing that it has the same expressive power as SafeEuql. Finally, in Section 8, we define the class of constructible queries in terms of their effect on finite representations, and present a ‘‘lifting theorem’’ based on the encoding and decoding technique introduced in Section 5. From this, it readily follows that PFOL is complete for the constructible queries.
2. Preliminaries
2.1. The polynomial and linear spatial database models
We first review the polynomial model [12,13]. The polynomial model is described using the first-order language of the ordered field of the real numbers $(\mathbb{R}, \le, +, \times, 0, 1)$, i.e., the language $(\le, +, \times, 0, 1)$. Every real formula $\varphi(x_1, \ldots, x_n)$ with free real variables among $x_1, \ldots, x_n$ defines a geometrical figure $\{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\}$ in n-dimensional Euclidean space $\mathbb{R}^n$. Point-sets defined in this way are called semi-algebraic sets.
A spatial database scheme $S$ is a finite set of relation names. Each relation name $R$ has a fixed arity associated to it, corresponding with the dimension of the Euclidean space containing the semi-algebraic sets that will be the instantiation of $R$.$^{2}$ A database scheme has type $[n_1, \ldots, n_k]$ if the scheme consists of relation names, say $R_1, \ldots, R_k$, respectively of arity $n_1, \ldots, n_k$.
A syntactic spatial database instance is a mapping $\mathcal{I}$ assigning to each relation name $R$ of a database scheme $S$ a syntactic spatial relation $\mathcal{I}(R)$ of the same type. A syntactic spatial relation of arity n is a real formula $\varphi(x_1, \ldots, x_n)$ with n free variables. The semantics of a syntactic database instance $\mathcal{I}$ over a database scheme $S$ is the mapping $I$ assigning to each relation name $R$ in $S$ the semantic spatial relation $I(\mathcal{I}(R))$, which is the semi-algebraic set defined by the syntactic spatial relation $\mathcal{I}(R)$.
In the polynomial model, we consider a query of signature $[n_1, \ldots, n_k] \to [n]$ to be a mapping from instances of a spatial database scheme of type $[n_1, \ldots, n_k]$ to instances of a spatial database scheme of type $[n]$ that can be regarded in a consistent way both at the syntactic and semantic level, and is computable at the syntactic level.
In this context, we define FO + poly as the query language obtained by adding to the language of real formulae atomic formulae of the form $R(x_1, \ldots, x_n)$, with $x_1, \ldots, x_n$ real variables and $R$ a relation name of arity n. A query of signature $[n_1, \ldots, n_k] \to [n]$ is definable in FO + poly if there exists an FO + poly formula $\varphi$ with n free real variables such that, for every input database instance of signature $[n_1, \ldots, n_k]$, $\{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\}$ evaluates to the corresponding output database, which is of type $[n]$.
From the polynomial model, a linear spatial database model can be obtained by only considering real formulae containing linear polynomials. The corresponding restrictions of semi-algebraic sets, spatial databases, queries, and the language FO + poly are called semi-linear sets, linear spatial databases, linear queries, and FO + linear, respectively. There are essentially two ways to do this [8]. In the minimalistic approach, the linear model is described using the first-order language of the ordered additive group of the real numbers $(\mathbb{R}, \le, +, 0, 1)$, i.e., the language $(\le, +, 0, 1)$. This amounts to allowing only rational numbers as coefficients of linear polynomials. In the maximalistic approach, we want all semi-algebraic polytopes$^{3}$ to be semi-linear. This amounts to allowing arbitrary real algebraic numbers as coefficients of linear polynomials.$^{4}$ Many results concerning the linear model hold both in the minimalistic and in the maximalistic approach. However, the language PFOL to be introduced in Section 3 is only closed under the maximalistic approach. We shall therefore take this approach for the remainder of the paper.
Queries of signature $[n_1, \ldots, n_k] \to [0]$ are called Boolean queries, because the sets $\{()\}$ and $\{\}$ can be seen as encoding the truth values true and false, respectively. Since both these sets are semi-linear, every Boolean query induces a linear query, when restricted to linear inputs.
We denote by $\mathrm{FO} + \mathrm{poly}^{\mathrm{lin}}$ the class of all linear queries definable in FO + poly.
$^{2}$ In general, spatial databases can also contain a non-spatial part. In this paper, we are not concerned with this non-spatial part, and, therefore, do not include it to simplify the presentation.
$^{3}$ A polytope is the convex closure of a finite set of points.
$^{4}$ Obviously, we cannot allow arbitrary real numbers as coefficients, as this would place us outside the polynomial model.
We conclude this overview by mentioning that some common linear topological queries, such as taking the closure, the interior, or the boundary, are expressible in FO + linear [18]. Where necessary, or where useful for the purpose of abbreviation, we use vector notation to denote points. Equalities or inequalities between vectors must be interpreted coordinate-wise.
2.2. Structural properties of semi-linear sets
We also need some notions and results introduced and discussed in earlier work of the present authors with Dumortier [8], which we recall here in a more informal style.
First, we introduce some standard geometrical terminology in the setting of the linear model. An affine space is a semi-linear set defined by equations only. E.g., in $\mathbb{R}^3$, a nonempty affine space is either a point, a line, a plane, or $\mathbb{R}^3$ itself. Given a semi-linear set $S$ of $\mathbb{R}^n$, the affine support of $S$ is the smallest affine space (w.r.t. inclusion) containing $S$.$^{5}$
Now, given a semi-linear set $S$ of $\mathbb{R}^n$, we say that a point $\vec{p}$ of $S$ is a regular point if there exists a sufficiently small neighborhood $V$ of $\vec{p}$ such that $S \cap V$ and the affine support of $S \cap V$ coincide within $V$. If, moreover, $S \cap V$ has the same dimension as $S$, then $\vec{p}$ is called a regular point of maximal dimension of $S$.
Example 1. Consider the semi-linear set
$$S = \{(x, y, z) \mid (0 \le x \le 3 \wedge 0 \le y \le 3 \wedge 0 \le z \le 3) \vee (3 \le x \le 5 \wedge y = 1 \wedge z = 0) \vee (x = 5 \wedge y = 5 \wedge z = 0)\}$$
in three-dimensional space, shown in Fig. 1, which consists of a closed filled cube with a closed line segment attached to it at the point $(3, 1, 0)$ and an isolated point. The regular points of $S$ are the points of the open cube, the points of the open line segment, and the isolated point. The regular points of maximal dimension of $S$ are only the points of the open cube.
We recall that the set $\mathrm{Reg}(S)$ of the regular points of maximal dimension of a semi-linear set $S$ is again semi-linear, and that the corresponding linear query can be expressed in FO + linear. (See, e.g., [8].)
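To make Example 1 concrete, the defining formula of $S$ can be transcribed directly into a membership test. The sketch below is only an illustration: the function names `in_S` and `in_reg_S` are hypothetical, and `in_reg_S` hand-codes the answer stated in the example (the open cube) rather than computing $\mathrm{Reg}(S)$ by the FO + linear query of [8].

```python
def in_S(x, y, z):
    """Membership in the semi-linear set S of Example 1 (direct transcription)."""
    cube = 0 <= x <= 3 and 0 <= y <= 3 and 0 <= z <= 3
    segment = 3 <= x <= 5 and y == 1 and z == 0
    isolated_point = x == 5 and y == 5 and z == 0
    return cube or segment or isolated_point


def in_reg_S(x, y, z):
    """Regular points of maximal dimension of this particular S.

    Hand-coded from the statement of Example 1: only the open cube qualifies.
    """
    return 0 < x < 3 and 0 < y < 3 and 0 < z < 3


print(in_S(4, 1, 0), in_reg_S(4, 1, 0))  # a point of the attached segment
```

A point of the attached segment belongs to $S$ but is not a regular point of maximal dimension, since the segment is one-dimensional whereas $S$ is three-dimensional.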
From a semi-linear set $S$, several special point-sets can be computed, as follows. First, we compute $\mathrm{Reg}(S)$. The connected components of $\mathrm{Reg}(S)$ are our first special point-sets. Then, we consider $S \setminus \mathrm{Reg}(S)$, the complement of $\mathrm{Reg}(S)$ with respect to $S$, and $\bar{S} \setminus \mathrm{Reg}(S)$, the complement of $\mathrm{Reg}(S)$ with respect to $\bar{S}$, the topological closure of $S$. Notice that both sets, which can be computed in FO + linear, have strictly lower dimension than $S$. In particular, they are empty if $S$ is zero-dimensional. If $S$ is not zero-dimensional, we repeat the procedure on both sets independently. In this way, we obtain $m$-, $(m-1)$-, ..., three-, two-, down to zero-dimensional point-sets. We are particularly interested in these zero-dimensional point-sets, which are called special points. By construction, the linear query computing the set of special points of a semi-linear set is expressible in FO + linear.
$^{5}$ Alternatively, the affine support of $S$ can be defined as the intersection of all affine spaces containing $S$, which in turn is an affine space. This definition, while perhaps less intuitive, shows that the concept is well defined.
Fig. 1. The semi-linear set of Example 1.
Example 2. Consider again the semi-linear set $S$ of Example 1, displayed in Fig. 1. Then $S \setminus \mathrm{Reg}(S)$ and $\bar{S} \setminus \mathrm{Reg}(S)$ coincide and consist of the faces of the cube, the closed line segment attached to it, and the isolated point, which constitute a two-dimensional semi-linear set, say $T$. We see that $\mathrm{Reg}(T)$ consists of the open faces of the cube. Again, $T \setminus \mathrm{Reg}(T)$ and $\bar{T} \setminus \mathrm{Reg}(T)$ coincide. They consist of the edges of the cube, the closed line segment attached to it, and the isolated point, which constitute a one-dimensional semi-linear set, say $U$. We see that $\mathrm{Reg}(U)$ consists of the open edges of the cube, and the open line segment attached to it. Once more, $U \setminus \mathrm{Reg}(U)$ and $\bar{U} \setminus \mathrm{Reg}(U)$ coincide. They consist of the corner points of the cube, the end points of the line segment attached to it, and the isolated point, which constitute a discrete set of points. This set of 11 points is the set of special points of $S$.
It must be noted that, in general, the special points of a semi-linear set $S$ need not belong to $S$ (although, of course, they always belong to the topological closure $\bar{S}$ of $S$). Also, the sets $S \setminus \mathrm{Reg}(S)$ and $\bar{S} \setminus \mathrm{Reg}(S)$ do not always coincide, as was the case in Example 2. In general, both sets must be considered to ensure that no special points are missed.
To highlight the significance of the special points, we need to make the following observation. It has been shown [8] that, if the affine supports of all the special point-sets of a semi-linear set $S$ in n-dimensional space, including the special points themselves, are described as intersections of $(n-1)$-dimensional hyperplanes,$^{6}$ then $S$ is a finite union of cells of the partition of n-dimensional space induced by all these hyperplanes.
In the case of a bounded semi-linear set $S$, all these special point-sets (whence their affine supports) are supported by special points. Hence, if we restrict the partition meant in the previous paragraph to the affine support of $S$, say $A$, we can obtain this partition by considering hyperplanes of the affine space $A$ which are supported by special points. We illustrate this on an example.
$^{6}$ Mathematically, n-dimensional space itself can be obtained as the intersection of the empty family of $(n-1)$-dimensional hyperplanes.
Example 3. Consider the semi-linear set
$$S = \{(x, y) \mid (3x - y > 2 \wedge y > 1 \wedge x + 2y < 10) \wedge \neg(x \le 3 \wedge y \ge 2 \wedge x \ge y) \wedge \neg(1 \le y \le 3 \wedge x = 3)\}$$
in the plane, shown in Fig. 2, which consists of an open triangle, out of which a closed triangle and a closed line segment have been cut. It can easily be seen that there are 7 special points, namely the corner points of the two triangles involved, and the missing point on the base of the outer triangle. Without elaborating, we mention that there are 8 one-dimensional special point-sets, namely the open line segments in which the special points divide the boundary of $S$. The affine support of $S$ is the plane itself, so hyperplanes are lines. Hence, to obtain the partition meant above, we must certainly consider the lines supporting the 8 special line segments. Since the vertical side of the inner triangle and the removed line segment both lie on the line $x = 3$, and the two parts of the base of the outer triangle both lie on the line $y = 1$, these are 6 distinct lines. Since these lines can also be used to determine the special points, they suffice. Fig. 3 shows these lines (the dots indicating the boundary of $S$). Coordinate axes are hidden for clarity.
Fig. 2. The semi-linear set of Example 3.
Fig. 3. The lines needed to describe the affine supports of the special point-sets of the semi-linear set of Example 3, shown in Fig. 2. The numbers indicate the open regions in the induced partition.
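The partition of Example 3 can be explored with a short computation. The following sketch assumes that the supporting lines are the six distinct lines read off from the inequalities defining $S$ ($3x - y = 2$, $y = 1$, $x + 2y = 10$, $x = 3$, $y = 2$, and $x = y$); the representation of a line as a coefficient triple and the helper names are illustrative choices, not part of the paper. It counts the vertices and edges of the arrangement exactly and derives the number of regions from Euler's relation $V - E + F = 1$ for line arrangements in the plane.

```python
from fractions import Fraction
from itertools import combinations

# Lines a*x + b*y = c, read off from the inequalities defining S in Example 3.
LINES = [
    (3, -1, 2),   # 3x - y = 2
    (0, 1, 1),    # y = 1
    (1, 2, 10),   # x + 2y = 10
    (1, 0, 3),    # x = 3
    (0, 1, 2),    # y = 2
    (1, -1, 0),   # x = y
]

# The 7 special points named in Example 3: corners of both triangles and (3, 1).
SPECIAL_POINTS = {(1, 1), (8, 1), (2, 4), (2, 2), (3, 2), (3, 3), (3, 1)}


def intersection(l1, l2):
    """Exact intersection point of two lines, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))


def arrangement(lines):
    """Return (vertices, number of edges, number of regions) of a line arrangement."""
    points_on_line = [set() for _ in lines]
    for (i, l1), (j, l2) in combinations(enumerate(lines), 2):
        p = intersection(l1, l2)
        if p is not None:
            points_on_line[i].add(p)
            points_on_line[j].add(p)
    vertices = set().union(*points_on_line)
    # k distinct points cut a line into k + 1 open segments or half-lines.
    edges = sum(len(ps) + 1 for ps in points_on_line)
    regions = 1 - len(vertices) + edges  # Euler: V - E + F = 1
    return vertices, edges, regions


vertices, edges, regions = arrangement(LINES)
print(len(vertices), edges, regions)
print(SPECIAL_POINTS <= vertices)
```

Under these assumptions, every special point of $S$ is a vertex of the arrangement, which is consistent with the remark that the supporting lines can also be used to determine the special points.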
Fig. 4. The semi-linear set of Example 4.
These lines, together with the open half-planes they define, induce a partition of the entire plane, consisting of 12 points, 31 open line segments or half-lines, and 20 open regions. In Fig. 3, these regions have been identified by numbers. Notice that $S$ is indeed the finite union of cells of this partition, namely the open regions 13, 14, 15, 17, 18, and 19, all of which are filled polygons, and the open intervals separating them.
An unbounded semi-linear set $S$ in n-dimensional space is still a finite union of cells of the partition of n-dimensional space induced by any finite collection of $(n-1)$-dimensional hyperplanes for which all special point-sets of $S$ can be described as an intersection of some of those hyperplanes. Unfortunately, such a collection of hyperplanes can in general no longer be obtained from the special points of $S$ alone.
Example 4. Consider the semi-linear set $S = \{(x, y) \mid 0 \le y \le 2x\}$ in the two-dimensional plane, shown as the closed angular sector in Fig. 4 (coordinate axes are hidden for clarity). The one-dimensional special point-sets of $S$ are the two open half-lines on the boundary of $S$. The only special point of $S$ is the angular point, which is marked in Fig. 4. The partition induced by the affine supports of the two open half-lines (these lines form the required set of hyperplanes in the plane) consists of 4 open angular sectors (one of which is the interior of $S$), 4 open half-lines, and the angular point. Obviously, the two lines defining this partition cannot be obtained from the angular point alone.
To solve the problem sketched above, the concept of bounding box was introduced in [8]. A bounding box of a semi-linear set $S$ in n-dimensional space is an open hypercube $C$ (for practical purposes usually centered around the origin $\vec{0}$ of $\mathbb{R}^n$), of which the length of the edges is a real algebraic number, such that each special point-set of $S$ has a non-empty intersection with $C$. In particular, a bounding box of a semi-linear set contains all its special points.
In [8], we exhibit an FO + linear-expressible linear query which returns a bounding box for a given semi-linear set. Given a semi-linear set $S$, this query first computes the special points of each of the $2^n$ intersections of $S$ with one of the hyperquadrants of $\mathbb{R}^n$, and then finds an open hypercube centered around the origin of the coordinate system containing all these points. Once a bounding box $C$ is found for a semi-linear set $S$ of $\mathbb{R}^n$, the desired partition can be obtained from the special points of $S \cap C$, as this latter semi-linear set is bounded.
Example 5. Consider again the semi-linear set $S$ of Example 4, displayed in Fig. 4. Fig. 5 shows a bounding box $C$ for $S$.
Fig. 5. A bounding box for the semi-linear set of Example 4, shown in Fig. 4.
Notice that $S \cap C$ is a triangle, and that the special points of $S \cap C$ are the corner points of this triangle, which are marked in Fig. 5. Notice that the two lines defining the partition of the two-dimensional plane described in Example 4 are supported by the special points of $S \cap C$.
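The final step of the bounding-box query described above, finding an open hypercube centered at the origin that contains a given finite set of points, is simple enough to sketch. The code below covers only that step; computing the special points of the hyperquadrant intersections is taken as given. The function names and the `margin` parameter are assumptions made for illustration, and the sample points are the special points of Example 3.

```python
from fractions import Fraction


def bounding_box_half_width(points, margin=Fraction(1)):
    """Half the edge length of an open hypercube, centered at the origin,
    that strictly contains every point in `points`.

    `margin` (an illustrative parameter) keeps the points off the boundary,
    which matters because the hypercube is open.
    """
    largest = max((abs(c) for p in points for c in p), default=Fraction(0))
    return largest + margin


def in_open_box(point, half_width):
    """Membership in the open hypercube (-h, h)^n."""
    return all(-half_width < c < half_width for c in point)


# Special points of Example 3, used here as sample input.
SAMPLE_POINTS = [(1, 1), (8, 1), (2, 4), (2, 2), (3, 2), (3, 3), (3, 1)]

h = bounding_box_half_width(SAMPLE_POINTS)
print(h, all(in_open_box(p, h) for p in SAMPLE_POINTS))
```

Any larger half-width would serve equally well; the definition only requires that the hypercube meets every special point-set and that its edge length is a real algebraic number.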
3. The query language PFOL
We now turn to the limitations of FO + linear. As observed in the Introduction, some straightforward linear queries, such as collinearity or convex closure, are not expressible in FO + linear [2,18], not even for a finite set of points. Let us try to see why.
Example 6. Consider the collinearity query in $\mathbb{R}^2$. The standard way to express that all points of a set $S$ in $\mathbb{R}^2$ are collinear is by stating that there are two points, $\vec{p}$ and $\vec{q}$, such that each point of $S$ is on the line defined by $\vec{p}$ and $\vec{q}$. The corresponding formula is
$$(\exists p_x)(\exists p_y)(\exists q_x)(\exists q_y)(\forall x)(\forall y)(S(x, y) \Rightarrow (\exists \lambda)(x = \lambda p_x + (1 - \lambda) q_x \wedge y = \lambda p_y + (1 - \lambda) q_y)).$$
Unfortunately, this is not an FO + linear formula, but a general FO + poly formula, because of the variable multiplication.
Example 7. Consider the convex-closure query in $\mathbb{R}^2$. The standard way to express that a point $\vec{s}$ is in the convex closure of a set $S$ in $\mathbb{R}^2$ is by stating that there are three points in $S$, $\vec{p}$, $\vec{q}$, and $\vec{r}$, and three nonnegative numbers $\lambda_1$, $\lambda_2$, and $\lambda_3$, adding up to unity, such that $\vec{s} = \lambda_1 \vec{p} + \lambda_2 \vec{q} + \lambda_3 \vec{r}$. The convex closure is the set
$$\{(x, y) \mid (\exists p_x)(\exists p_y)(\exists q_x)(\exists q_y)(\exists r_x)(\exists r_y)(\exists \lambda_1)(\exists \lambda_2)(\exists \lambda_3)(S(p_x, p_y) \wedge S(q_x, q_y) \wedge S(r_x, r_y) \wedge \lambda_1 \ge 0 \wedge \lambda_2 \ge 0 \wedge \lambda_3 \ge 0 \wedge \lambda_1 + \lambda_2 + \lambda_3 = 1 \wedge x = \lambda_1 p_x + \lambda_2 q_x + \lambda_3 r_x \wedge y = \lambda_1 p_y + \lambda_2 q_y + \lambda_3 r_y)\}.$$
Again, this is not an FO + linear formula, but a general FO + poly formula, because of the variable multiplication.
If we want to express these queries, we need to augment FO + linear with a limited amount of multiplicative power, but without sacrificing the soundness of the language with respect to the class of linear queries. This is exactly what we wish to achieve by introducing the query language PFOL, below.
In order to define the linear query language PFOL, we first need to define an intermediate language FO + linear + $P(D_1, \ldots, D_k)$, where $D_1, \ldots, D_k$ are so-called domain symbols denoting finite sets. For this purpose, we assume two sorts of variables, called real variables and product variables. We shall use $x, y, z, \ldots$, possibly subscripted, to denote real variables, and $p, q, r, \ldots$, possibly subscripted, to denote product variables. Finally, we shall use the symbol $t$, possibly subscripted, to denote a variable that can either be a real variable or a product variable.
Definition 8. Let $D_1, \ldots, D_k$ be domain symbols. An atomic partial FO + linear + $P(D_1, \ldots, D_k)$ formula is defined as follows:
* $R(t_1, \ldots, t_n)$, with $R$ a relation name of arity n and $t_1, \ldots, t_n$ real variables or product variables, is an atomic partial FO + linear + $P(D_1, \ldots, D_k)$ formula;
* $\sum_{i=1}^{n} a_i t_i \;\theta\; a$, with $a_1, \ldots, a_n$, and $a$ real algebraic numbers, $t_1, \ldots, t_n$ real variables or product variables, and $\theta$ one of the relation symbols $=$, $\ne$, $<$, $\le$, $>$, and $\ge$, is an atomic partial FO + linear + $P(D_1, \ldots, D_k)$ formula;
* $t_1 = p\,t_2$, with $t_1$ and $t_2$ real variables or product variables and $p$ a product variable, is an atomic partial FO + linear + $P(D_1, \ldots, D_k)$ formula; and
* $t = \sqrt{|p|}$, with $t$ a real variable or a product variable and $p$ a product variable, is an atomic partial FO + linear + $P(D_1, \ldots, D_k)$ formula.
A partial FO + linear + $P(D_1, \ldots, D_k)$ formula is built from atomic partial FO + linear + $P(D_1, \ldots, D_k)$ formulae using the connectives $\neg$ and $\wedge$ and the quantifiers $(\exists x)$, with $x$ a real variable, and $(\exists p \in D_i)$, with $p$ a product variable and $1 \le i \le k$. An FO + linear + $P(D_1, \ldots, D_k)$ formula is a partial FO + linear + $P(D_1, \ldots, D_k)$ formula without free product variables.
In the context of sets of real numbers as interpretations for $D_1, \ldots, D_k$ and a linear constraint database containing interpretations for the relevant predicate symbols, the semantics of an FO + linear + $P(D_1, \ldots, D_k)$ formula is the obvious one. Notice that, for $k = 0$, the above definition reduces to the definition of an FO + linear formula.
We now state an important soundness property of an FO + linear + $P(D_1, \ldots, D_k)$ formula.
Proposition 9. Let $D_1, \ldots, D_k$ be domain symbols and let $\varphi(x_1, \ldots, x_n)$ be an FO + linear + $P(D_1, \ldots, D_k)$ formula with free variables $x_1, \ldots, x_n$. If $D_1, \ldots, D_k$ are interpreted as finite sets of real algebraic numbers, then the set $\{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\}$ defined by $\varphi$ is interpreted as a semi-linear set.
Proof. Let, for $i = 1, \ldots, k$, the interpretation of $D_i$ be the finite set of real algebraic numbers $\{c_{i1}, \ldots, c_{im_i}\}$. We consecutively eliminate product variables in $\varphi$ by replacing formulae of the form $(\exists p \in D_i)\psi$ by $\psi(c_{i1}/p) \vee \cdots \vee \psi(c_{im_i}/p)$ (or false, if $D_i$ is interpreted as the empty set). Hence, if $\bar{\varphi}$ is the result of all the substitutions, then $\{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\} = \{(x_1, \ldots, x_n) \mid \bar{\varphi}(x_1, \ldots, x_n)\}$. Since the latter set is expressed as the result of an FO + linear query on a linear constraint database, it is semi-linear. □
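The substitution step in the proof of Proposition 9 can be mimicked at the semantic level. In the sketch below, a formula with a leading product variable is modelled as a Python predicate, and `expand_exists` (a hypothetical helper, not notation from the paper) turns the bounded quantifier into a finite disjunction over the constants of the domain. The example formula $\varphi(x, y) = (\exists p \in D)(y = p\,x)$ and the domain $D = \{1, 2, 3\}$ are assumptions chosen for illustration.

```python
from fractions import Fraction


def expand_exists(domain, psi):
    """Semantic counterpart of replacing (exists p in D) psi
    by psi(c_1/p) or ... or psi(c_m/p).

    `psi` takes the product variable first, then the real variables.
    """
    constants = sorted(domain)

    def expanded(*reals):
        # any() over an empty domain is False, matching the proof's convention.
        return any(psi(c, *reals) for c in constants)

    return expanded


# Illustrative formula: phi(x, y) = (exists p in D)(y = p * x).
D = {Fraction(1), Fraction(2), Fraction(3)}
phi = expand_exists(D, lambda p, x, y: y == p * x)

# After substitution, each disjunct is a linear equation c*x - y = 0,
# stored here as a coefficient triple (a, b, rhs) for a*x + b*y = rhs.
linear_disjuncts = [(c, Fraction(-1), Fraction(0)) for c in sorted(D)]

print(phi(2, 4), phi(2, 5), len(linear_disjuncts))
```

In this example the product $p\,x$ is nonlinear as long as $p$ is a variable, but every disjunct obtained after substitution is linear in $x$ and $y$, so the defined set is a union of three lines.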
Using Definition 8, we now define the language PFOL.
Definition 10. A PFOL program is of the form
$$D_1 \leftarrow \varphi_1(x);\ \ldots;\ D_k \leftarrow \varphi_k(x);\ \{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\},$$
with, for $i = 1, \ldots, k$, $D_i$ domain symbols, $\varphi_i(x)$ an FO + linear + $P(D_1, \ldots, D_{i-1})$ formula, and $\varphi(x_1, \ldots, x_n)$ an FO + linear + $P(D_1, \ldots, D_k)$ formula.
The semantics of a PFOL program is as follows. First, $D_1, \ldots, D_k$ are consecutively interpreted as finite sets of real algebraic numbers. In this process, $D_i$ is interpreted as the set $\{x \mid \varphi_i(x)\}$ if this set is finite, and as the empty set otherwise.$^{7}$ Next, the set $\{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\}$ is interpreted in the obvious way.
From Proposition 9, the following is immediate.
Theorem 11. The language PFOL is sound with respect to the $\mathrm{FO} + \mathrm{poly}^{\mathrm{lin}}$ queries.
Notice that the key restrictions that limit the expressive power of PFOL sufficiently to guarantee Theorem 11 are the following: (1) while multiplicative relations are allowed among product variables or between product variables and free variables, they are banned among free variables; (2) only active-domain quantification is allowed over product variables. Both properties are essential for getting closure. However, the particular relationships we allowed between product variables and free variables (or, again, product variables) are to some extent arbitrary. Of course, we want to allow multiplicative relationships in order to be able to express queries such as collinearity or convex closure. Relationships of the form $t = \sqrt{|p|}$ are less essential, however, and are included in order to be able to express linear queries involving distances. Moreover, other relationships, not allowed in PFOL, could have been considered as well without violating Theorem 11. We return to this issue in Section 8.
We now show how collinearity and convex closure can be expressed in the case of finite inputs. As before, the binary predicate symbol $S$ represents a semi-linear set in the two-dimensional plane.
Example 12. The PFOL program
$$D_1 \leftarrow (\exists y)(S(x, y) \vee S(y, x));$$
$$\{() \mid (\forall p \in D_1)(p \ne p) \vee (\exists p_x \in D_1)(\exists p_y \in D_1)(\exists q_x \in D_1)(\exists q_y \in D_1)(\forall x)(\forall y)(S(x, y) \Rightarrow (\exists \lambda)(x = \lambda p_x + (1 - \lambda) q_x \wedge y = \lambda p_y + (1 - \lambda) q_y))\}$$
decides whether the points stored in $S$ are collinear, if $S$ is finite, and returns the empty set otherwise.
$^{7}$ Recall that it can be decided in FO + linear whether a semi-linear set is finite, as was shown in [18]. By Proposition 9, the set $\{x \mid \varphi_i(x)\}$ is semi-linear. If, in addition, it is finite, it must therefore consist of real algebraic numbers.
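For a finite input, the program of Example 12 can be evaluated by brute force, which may help in reading the formula. The sketch below is an illustration under stated assumptions: the input is a finite Python set of coordinate pairs, the infinite case (where $D_1$ would be interpreted as the empty set) is not modelled, and the inner quantifier $(\exists \lambda)$ is decided by a cross-product test rather than by searching for $\lambda$. The function names are hypothetical.

```python
from itertools import product


def active_domain(S):
    """D1 <- (exists y)(S(x, y) or S(y, x)): every coordinate occurring in S."""
    return {c for point in S for c in point}


def on_line(point, p, q):
    """Decide (exists lambda)(point = lambda * p + (1 - lambda) * q)."""
    if p == q:
        # Degenerate case: the formula then only describes the point p itself.
        return point == p
    (x, y), (px, py), (qx, qy) = point, p, q
    # point - q must be parallel to p - q (zero cross product).
    return (x - qx) * (py - qy) == (y - qy) * (px - qx)


def collinear(S):
    """Brute-force evaluation of the PFOL program of Example 12 on a finite S."""
    D1 = active_domain(S)
    if not D1:
        # (forall p in D1)(p != p) holds exactly when D1 is empty.
        return True
    return any(
        all(on_line(point, (px, py), (qx, qy)) for point in S)
        for px, py, qx, qy in product(D1, repeat=4)
    )


print(collinear({(0, 0), (1, 1), (2, 2)}), collinear({(0, 0), (1, 1), (2, 3)}))
```

The four product variables range over coordinates rather than over points, so the candidate points $\vec{p}$ and $\vec{q}$ need not belong to $S$; this does not affect the outcome, since any line through all points of $S$ is already determined by two points of $S$.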
Notice that the real variables $x$ and $y$ may be replaced by product variables ranging over $D_1$ without changing the semantics of the PFOL program in any way.

Example 13. The PFOL program

$$D_1 \leftarrow (\exists y)(S(x,y) \lor S(y,x));$$
$$\{(x,y) \mid (\exists p_x \in D_1)(\exists p_y \in D_1)(\exists q_x \in D_1)(\exists q_y \in D_1)(\exists r_x \in D_1)(\exists r_y \in D_1)(\exists \lambda_1)(\exists \lambda_2)(\exists \lambda_3)(S(p_x,p_y) \land S(q_x,q_y) \land S(r_x,r_y) \land \lambda_1 \geq 0 \land \lambda_2 \geq 0 \land \lambda_3 \geq 0 \land \lambda_1 + \lambda_2 + \lambda_3 = 1 \land x = \lambda_1 p_x + \lambda_2 q_x + \lambda_3 r_x \land y = \lambda_1 p_y + \lambda_2 q_y + \lambda_3 r_y)\}$$

computes the convex closure of the points stored in $S$, if $S$ is finite, and the empty set otherwise. The modification of the above PFOL program obtained by removing the conditions $\lambda_1 \geq 0$, $\lambda_2 \geq 0$, and $\lambda_3 \geq 0$ computes the affine support of the points stored in $S$, if $S$ is finite, and the empty set otherwise.

One can also write PFOL programs deciding whether the points of an arbitrary semi-linear set of $\mathbb{R}^n$ are collinear, or computing the convex closure or the affine support of such a set. However, we shall not exhibit such programs, as their existence, as well as the way in which they can be obtained, will follow from Theorem 49 and its proof in Section 8.

Remark 14. In order to write down PFOL programs concisely, we remark that the syntax of an FO + linear + $P(D_1, \ldots, D_k)$ formula may be relaxed without increasing the expressive power of PFOL. In particular, one can allow atomic formulae of the form

$$\sum_{i=1}^{n} P_i(p_1, \ldots, p_m)\, t_i \;\theta\; P(p_1, \ldots, p_m),$$

with $P_1, \ldots, P_n, P$ expressions built from the product variables $p_1, \ldots, p_m$ and the real algebraic numbers, using addition, multiplication and absolute-value square rooting, and $t_1, \ldots, t_n$ product or real variables. We can also allow Boolean connectives other than "$\land$" and "$\neg$", and subformulae of the form $(\forall p \in D)\varphi$ as abbreviation of $\neg(\exists p \in D)\neg\varphi$.

We now illustrate how the square-root construct of PFOL can be put to use. Again, $S$ is a two-dimensional semi-linear set in the plane.

Example 15. The PFOL program

$$D_1 \leftarrow (\exists y)(S(x,y) \lor S(y,x));$$
$$\{(x) \mid (\exists p_x \in D_1)(\exists p_y \in D_1)(\exists q_x \in D_1)(\exists q_y \in D_1)(S(p_x,p_y) \land S(q_x,q_y) \land x = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2})\}$$
(using the abbreviations proposed in Remark 14) computes the set of all distances between pairs of points stored in $S$, if $S$ is finite, and the empty set otherwise.

We conclude our introduction of the language PFOL with a remark on the number of finite domains introduced in a PFOL program. One may wonder whether introducing more than one finite domain in a PFOL program can be avoided. Although this question is open, we believe that the use of several finite domains in a PFOL query contributes to a more natural language and cannot be avoided in some cases. To illustrate our point, consider the query which takes a finite semi-linear set as input and computes, for each positive number in the input, the 4th root of that number. The most natural way to express this query in PFOL is as follows:

$$D_1 \leftarrow (\exists x)(R(x) \land x \geq 0);$$
$$D_2 \leftarrow (\exists p \in D_1)(x = \sqrt{|p|});$$
$$\{(x) \mid (\exists p \in D_2)(x = \sqrt{|p|})\}.$$

It is far from clear how to express this query in PFOL using just one finite domain.

In the remainder of this paper, we study the expressive power of the query language PFOL. Thereto, we first study the expressive power of PFOL-finite, the fragment of PFOL that returns finite outputs upon finite inputs. To do so, we need to define a finite representation of an arbitrary geometric database, for which the corresponding encoding and decoding algorithms are expressible in PFOL. Using this result, we will then be able to lift the expressiveness results for the finite case to the general case where also infinite geometric databases are involved. This approach has also been considered by other authors, e.g., Paredaens et al. [16] in the context of SafeEuql, and Benedikt and Libkin [5] on a more abstract level.
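Returning to the fourth-root query discussed above, the staging through two finite domains can be pictured with a short Python sketch. This is only an analogy under stated assumptions: the name `fourth_roots` is hypothetical, the input relation is a plain Python set of numbers, and floating-point `math.sqrt` stands in for the exact square roots of real algebraic numbers used in PFOL.

```python
import math


def fourth_roots(r):
    """Two-stage computation mirroring the domains D1 and D2."""
    d1 = {x for x in r if x >= 0}            # D1: nonnegative input numbers
    d2 = {math.sqrt(abs(p)) for p in d1}     # D2: their square roots
    return {math.sqrt(abs(p)) for p in d2}   # output: square roots of D2
```

Each stage applies the square-root construct once to values taken from the previous finite set, which is why a second domain arises naturally.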
4. The languages PFOL-finite and SPFOL

We start by formally introducing PFOL-finite:

Definition 16. A PFOL-finite program is a PFOL program which returns finite outputs upon finite inputs.

Example 17. The PFOL programs in Examples 12 and 15 are examples of PFOL-finite programs.

The language PFOL-finite is defined semantically, and is therefore difficult to study. The most straightforward way to obtain a syntactically defined language "safe" with respect to finite databases is to add a "finiteness test" at the end of an arbitrary PFOL query, comparable to the finiteness test on the domains in PFOL. Unfortunately, this is not of much help, because intermediate results of such queries may be infinite. Therefore, we shall introduce a variation of PFOL-finite, called SPFOL, in which intermediate results are finite. We shall prove that, on finite inputs, PFOL-finite and SPFOL are equivalent in expressive power, whence SPFOL can be regarded as a kind of normal form for PFOL-finite. We start with the definition of SPFOL.
Definition 18. Let $D_1, \ldots, D_k$ be domain symbols. An atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula and its set of free variables are defined as follows:

* true is an atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula, and its set of free variables is the empty set;
* $\mathrm{adom}(t)$, with $t$ a real variable or a product variable, is an atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula, and its set of free variables equals $\{t\}$;
* $R(t_1, \ldots, t_n)$, with $R$ a relation name of arity $n$ and $t_1, \ldots, t_n$ real variables or product variables, is an atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula, and its set of free variables equals $\{t_1, \ldots, t_n\}$;
* $Q_1(p_1, \ldots, p_n)\, x = Q_2(p_1, \ldots, p_n) \land Q_1(p_1, \ldots, p_n) \neq 0$, with $p_1, \ldots, p_n$ product variables, $x$ a real variable, and $Q_1$ and $Q_2$ expressions built from the product variables $p_1, \ldots, p_n$ and the real algebraic numbers using addition, multiplication, and absolute-value square rooting, is an atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula,$^8$ and its set of free variables equals $\{x, p_1, \ldots, p_n\}$; and
* $Q(p_1, \ldots, p_n)\; \theta\; 0$, with $p_1, \ldots, p_n$ product variables and $Q$ an expression built from the product variables $p_1, \ldots, p_n$ and the real algebraic numbers using addition, multiplication, and absolute-value square rooting, is an atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula, and its set of free variables equals $\{p_1, \ldots, p_n\}$.
A partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula and its set of free variables are inductively defined as follows:

* an atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $F$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $F$;
* if $\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_r\}$, and $\psi(x_1, \ldots, x_m, q_1, \ldots, q_s)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, q_1, \ldots, q_s\}$, then
  $$\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r) \lor \psi(x_1, \ldots, x_m, q_1, \ldots, q_s)$$
  is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_r, q_1, \ldots, q_s\}$;
* if $\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_r\}$, and $\psi(y_1, \ldots, y_n, q_1, \ldots, q_s)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{y_1, \ldots, y_n, q_1, \ldots, q_s\}$, then
  $$\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r) \land \psi(y_1, \ldots, y_n, q_1, \ldots, q_s)$$
  is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, y_1, \ldots, y_n, p_1, \ldots, p_r, q_1, \ldots, q_s\}$;
* if $\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_r\}$, and $\psi(y_1, \ldots, y_n, q_1, \ldots, q_s)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{y_1, \ldots, y_n, q_1, \ldots, q_s\}$, and $\{y_1, \ldots, y_n\} \subseteq \{x_1, \ldots, x_m\}$, then
  $$\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r) \land \neg\psi(y_1, \ldots, y_n, q_1, \ldots, q_s)$$
  is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_r, q_1, \ldots, q_s\}$; and
* if $\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r)$ is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_r\}$, and if $1 \leq i \leq r$ and $1 \leq j \leq k$, then
  $$(\exists p_i \in D_j)\,\varphi(x_1, \ldots, x_m, p_1, \ldots, p_r)$$
  is a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula with set of free variables $\{x_1, \ldots, x_m, p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_r\}$.

8 Obviously, if $Q_1(p_1, \ldots, p_n)$ is a constant $a \neq 0$, we abbreviate this atomic partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula to $a\,x = Q_2(p_1, \ldots, p_n)$.
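The bookkeeping of free variables in Definition 18 can be made concrete with a small Python sketch. It is an illustration only and makes several assumptions: formulas are reduced to their two sets of free variables, and the names `Partial`, `atom`, `disj`, `conj`, `conj_not`, `exists`, and `is_safe` are hypothetical helpers, not notation from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """A partial safe formula, reduced to its free-variable sets."""
    real: frozenset  # free real variables
    prod: frozenset  # free product variables


def atom(real=(), prod=()):
    """Any atomic formula, described only by its free variables."""
    return Partial(frozenset(real), frozenset(prod))


def disj(f, g):
    """phi OR psi: both sides must have the same free real variables."""
    if f.real != g.real:
        raise ValueError("disjuncts need identical free real variables")
    return Partial(f.real, f.prod | g.prod)


def conj(f, g):
    """phi AND psi: the free variables are united."""
    return Partial(f.real | g.real, f.prod | g.prod)


def conj_not(f, g):
    """phi AND NOT psi: real variables of psi must already occur in phi."""
    if not g.real <= f.real:
        raise ValueError("negated part introduces new real variables")
    return Partial(f.real, f.prod | g.prod)


def exists(p, f):
    """(exists p in D_j) phi: binds one free product variable."""
    if p not in f.prod:
        raise ValueError("quantified variable must be a free product variable")
    return Partial(f.real, f.prod - {p})


def is_safe(f):
    """A formula is safe when no free product variable remains."""
    return not f.prod


# Shape of the body of Example 15: two relation atoms on (px, py) and
# (qx, qy), and an equation defining x from the four product variables.
body = conj(conj(atom(prod={"px", "py"}), atom(prod={"qx", "qy"})),
            atom(real={"x"}, prod={"px", "py", "qx", "qy"}))
closed = body
for p in ("px", "py", "qx", "qy"):
    closed = exists(p, closed)
```

After the four quantifications, `closed` has `x` as its only free variable and no free product variables, so it counts as safe in the sense defined right after the list.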
We define a safe FO + linear + $P(D_1, \ldots, D_k)$ formula as a partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula without free product variables. In the context of sets of real numbers as interpretations for $D_1, \ldots, D_k$ and a linear constraint database containing interpretations for the relevant predicate symbols, the semantics of a safe FO + linear + $P(D_1, \ldots, D_k)$ formula is the obvious one, provided adom is interpreted as the active domain of the input database.

The syntax of an SPFOL program is the same as that of a PFOL program, except that all formulae are safe. The semantics of an SPFOL program is defined in the same way as the semantics of a PFOL program.

Example 19. The PFOL program in Example 15 is actually an SPFOL program, provided $D_1$ is defined using the active-domain atom.

The language SPFOL has the following basic properties:

Proposition 20. (1) Each constant linear query returning a finite output can be expressed in SPFOL. (2) Each SPFOL program is equivalent to a PFOL program. (3) SPFOL programs return finite outputs upon finite inputs.

The first statement follows from the observation that the most obvious way to describe a finite relation yields a safe FO + linear formula. The second statement is obvious, whereas the third statement can be proved by structural induction.

In order to establish the equivalence in expressive power of PFOL-finite and SPFOL as far as finite inputs are concerned, it remains to show that each PFOL-finite program is equivalent to an SPFOL program. The remainder of this section is concerned with proving a series of properties of SPFOL which will contribute to the desired result, shown in Section 6. Unless stated otherwise, all SPFOL-expressible queries considered below are implicitly assumed to take finite inputs only. We first introduce the following notion.
Definition 21. Let $Q$ be a linear query. The active range of $Q$ upon an input of $Q$ is the set of all real numbers in the corresponding output of $Q$. The active range query is the query returning upon an input of $Q$ the active range of $Q$. An active range superset of $Q$ upon an input of $Q$ is a superset of the active range of $Q$. An active range superset query of $Q$ is any query returning upon an input of $Q$ an active range superset of $Q$.

We can now state the first in our series of properties, one which will turn out to be of key importance in Section 6.

Proposition 22. Let $Q$ be an FO + poly query returning finite outputs upon finite inputs. If an active range superset query of $Q$ is expressible in SPFOL, then $Q$ is expressible in SPFOL.

Proof. Let $\varphi(x_1, \ldots, x_n)$ be an FO + poly formula computing $Q$. From a result by Benedikt and Libkin [3,4], it follows that there exists an equivalent formula $\psi(x_1, \ldots, x_n)$ in which all quantified variables range over the active domain (of the input of $\psi$). Without loss of generality, we assume that there are no universal quantifiers in $\psi$. Let $D_{in}$ be a domain symbol corresponding to the active domain of the input. Let $\tilde{\psi}$ be $\psi$ in which (i) each bound variable $x$ has been replaced by a distinct product variable $p_x$; and (ii) each quantifier "$(\exists x)$" has been replaced by "$(\exists p_x \in D_{in})$". Finally, let

$$D_1 \leftarrow \vartheta_1(x); \ldots; D_k \leftarrow \vartheta_k(x); \{(x) \mid \vartheta(x)\}$$

be an SPFOL program computing an active range superset query of $Q$. Consider the "program"

$$D_{in} \leftarrow \mathrm{adom}(x); D_1 \leftarrow \vartheta_1(x); \ldots; D_k \leftarrow \vartheta_k(x); D_{out} \leftarrow \vartheta(x);$$
$$\{(x_1, \ldots, x_n) \mid (\exists p_1 \in D_{out}) \cdots (\exists p_n \in D_{out})(x_1 = p_1 \land \cdots \land x_n = p_n \land \tilde{\psi}(p_1, \ldots, p_n))\}.$$

It should first be observed that $\tilde{\psi}(p_1, \ldots, p_n)$ can be interpreted as a safe FO + linear + $P(D_{in}, D_1, \ldots, D_k, D_{out})$ formula, because (i) all quantification is of the form "$(\exists p_x \in D_{in})$", and (ii) apart from quantification, the syntax of SPFOL does not impose restrictions on formulae in which only product variables occur. Hence the above "program" is an SPFOL program. Clearly, it expresses $Q$. $\Box$

The following result is immediate from Proposition 22:

Proposition 23. The restriction to finite inputs of each FO + poly-expressible Boolean query is expressible in SPFOL.

To a certain extent, the following property is the converse of Proposition 22.

Proposition 24. The active range query of an SPFOL-expressible query is SPFOL-expressible.
Proof. Let $Q$ be an SPFOL-expressible query. Then, clearly, the active range query of $Q$ is expressible in FO + poly. By Proposition 22, it thus suffices to show that an active range superset query of $Q$ is expressible in SPFOL. Let

$$D_1 \leftarrow \varphi_1(x); \ldots; D_k \leftarrow \varphi_k(x); \{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\}$$

be an SPFOL program for $Q$. Let $\psi(x_{i_1}, \ldots, x_{i_r}, p_1, \ldots, p_m)$, with $1 \leq i_1, \ldots, i_r \leq n$ and $p_1, \ldots, p_m$ product variables, be a partial safe subformula of $\varphi(x_1, \ldots, x_n)$ with set of free variables $\{x_{i_1}, \ldots, x_{i_r}, p_1, \ldots, p_m\}$. For $j = 1, \ldots, m$, let $u_j$, $1 \leq u_j \leq k$, be the index such that $p_j$ is quantified over $D_{u_j}$ in $\varphi$. From the partial safe FO + linear + $P(D_1, \ldots, D_k)$ formula $\psi(x_{i_1}, \ldots, x_{i_r}, p_1, \ldots, p_m)$, we define the safe FO + linear + $P(D_1, \ldots, D_k)$ formula

$$\bar{\psi}(x_{i_1}, \ldots, x_{i_r}) \equiv (\exists p_1 \in D_{u_1}) \cdots (\exists p_m \in D_{u_m})\, \psi(x_{i_1}, \ldots, x_{i_r}, p_1, \ldots, p_m).$$

Then, by structural induction, we prove, for each partial safe subformula $\psi(x_{i_1}, \ldots, x_{i_r}, p_1, \ldots, p_m)$ of $\varphi(x_1, \ldots, x_n)$, that an active range superset query of the query expressed by the SPFOL program

$$D_1 \leftarrow \varphi_1(x); \ldots; D_k \leftarrow \varphi_k(x); \{(x_{i_1}, \ldots, x_{i_r}) \mid \bar{\psi}(x_{i_1}, \ldots, x_{i_r})\}$$

can be expressed in SPFOL. We shall call the above program $P_\psi$.

If $\psi$ is atomic, then either the empty set, $\{(x) \mid \mathrm{adom}(x)\}$, or $P_\psi$ itself is an SPFOL program computing an active range superset query of $P_\psi$, settling the basis of the induction.

If $\psi$ has the form $\psi_1 \lor \psi_2$, with $\psi_1$ and $\psi_2$ safe, then the query of which the output is the union of the output of an active range superset query of $P_{\psi_1}$ and the output of an active range superset query of $P_{\psi_2}$ is an active range superset query of $P_\psi$. (The validity of this claim hinges on the fact that $\psi_1$ and $\psi_2$ have the same free real variables.) Clearly, this union can be computed in SPFOL if the subqueries can be computed in SPFOL.

If $\psi$ has the form $\psi_1 \land \psi_2$ or $\psi_1 \land \neg\psi_2$, with $\psi_1$ and $\psi_2$ safe, then an active range superset query of $P_{\psi_1}$ is also an active range superset query of $P_\psi$.

Finally, if $\psi$ has the form $(\exists p_i \in D_{u_i})\vartheta$, with $\vartheta$ safe, then $\bar{\psi} = \bar{\vartheta}$, whence an active range superset query of $P_\vartheta$ is trivially an active range superset query of $P_\psi = P_\vartheta$, concluding the induction step.

It now suffices to observe that $\bar{\varphi} = \varphi$, whence $P_\varphi$ is an SPFOL program expressing $Q$. $\Box$

We conclude this series of propositions with the following fundamental property of SPFOL.

Proposition 25. The SPFOL-expressible queries are closed under composition.
Proof. Consider a composition

$$R_1 := Q_1; \ldots; R_m := Q_m; Q(R_1, \ldots, R_m),$$

with $Q_1, \ldots, Q_m$, and $Q$ SPFOL-expressible queries, and $R_1, \ldots, R_m$ the input predicates of $Q$. To compute this composition, the input predicates in $Q$ must be substituted by the corresponding outputs of $Q_1, \ldots, Q_m$. Let, for $i = 1, \ldots, m$,

$$D_{i1} \leftarrow \varphi_{i1}(x); \ldots; D_{ik_i} \leftarrow \varphi_{ik_i}(x); \{(x_1, \ldots, x_{n_i}) \mid \varphi_i(x_1, \ldots, x_{n_i})\}$$

be an SPFOL program computing $Q_i$. Also, let, for $i = 1, \ldots, m$,

$$E_{i1} \leftarrow \vartheta_{i1}(x); \ldots; E_{il_i} \leftarrow \vartheta_{il_i}(x); \{(x) \mid \vartheta_i(x)\}$$

be an SPFOL program expressing the active range query of $Q_i$ (Proposition 24). Finally, let

$$D_1 \leftarrow \psi_1(x); \ldots; D_k \leftarrow \psi_k(x); \{(x_1, \ldots, x_n) \mid \psi(x_1, \ldots, x_n)\}$$

be an SPFOL program computing $Q$. Without loss of generality, we may assume that all domain symbols are pairwise different (even if some of the queries involved are identical).

Let $\bar{\psi}_1, \ldots, \bar{\psi}_k$ and $\bar{\psi}$ be obtained from $\psi_1, \ldots, \psi_k$ and $\psi$, respectively, by the following substitutions:

(1) each occurrence of "$R_i(t_1, \ldots, t_{n_i})$", $1 \leq i \leq m$, with $t_1, \ldots, t_{n_i}$ real variables or product variables, is replaced by "$\varphi_i(t_1, \ldots, t_{n_i})$"; and
(2) each occurrence of "$\mathrm{adom}(t)$", with $t$ a real variable or a product variable, is replaced by "$\vartheta_1(t) \lor \cdots \lor \vartheta_m(t)$".

Then,

$$\begin{array}{l}
D_{11} \leftarrow \varphi_{11}(x); \ldots; D_{1k_1} \leftarrow \varphi_{1k_1}(x); \\
\quad\vdots \\
D_{m1} \leftarrow \varphi_{m1}(x); \ldots; D_{mk_m} \leftarrow \varphi_{mk_m}(x); \\
E_{11} \leftarrow \vartheta_{11}(x); \ldots; E_{1l_1} \leftarrow \vartheta_{1l_1}(x); \\
\quad\vdots \\
E_{m1} \leftarrow \vartheta_{m1}(x); \ldots; E_{ml_m} \leftarrow \vartheta_{ml_m}(x); \\
D_1 \leftarrow \bar{\psi}_1(x); \ldots; D_k \leftarrow \bar{\psi}_k(x); \\
\{(x_1, \ldots, x_n) \mid \bar{\psi}(x_1, \ldots, x_n)\}
\end{array}$$

is an SPFOL program computing the composition under consideration. $\Box$

We shall make extensive use of Proposition 25 in the remainder of this paper, without explicitly mentioning it each time.
The proof that each PFOL-finite program can be transformed into an equivalent SPFOL program will use structural induction. Unfortunately, intermediate results (i.e., outputs corresponding to subformulae of formulae in the program) of PFOL-finite programs may be infinite. Therefore, we intend to prove by structural induction that finite representations of these intermediate results can be computed in SPFOL. Introducing these finite representations is the purpose of the following section. The actual equivalence proof will then follow in Section 6.

5. Finite representations of semi-linear sets

In this section, we present an encoding and decoding algorithm, both implementable in PFOL, which compute a finite representation of an arbitrary semi-linear set and recompute the semi-linear set from its finite representation, respectively. We proceed as follows. First, we present the encoding and decoding algorithms, and define the canonical finite representation of a semi-linear set as the result of the encoding algorithm. Then, we define finite representations as a generalization of canonical finite representations in such a way that the decoding algorithm is guaranteed to recompute the original semi-linear set on any finite representation of it. For clarity, we first consider the case where the linear constraint database consists of bounded semi-linear sets, and then say which modifications are required if unbounded semi-linear sets occur.

5.1. The bounded case

Let $S$ be a bounded semi-linear set of $\mathbb{R}^n$. We describe the encoding. First, denote by $T$ the set of special points of $S$ which can be computed in FO + linear as shown in Section 2.2. We recall from Section 2.2 that any finite collection of hyperplanes such that every affine support of a subset of points of $T$ can be described as an intersection of hyperplanes of that collection, partitions the $n$-dimensional space such that $S$ is a finite union of cells of that partition.

We now describe how such a collection of hyperplanes can be computed within PFOL. First, we determine the affine support of $T$, which can be done in PFOL (Example 13). We also determine the dimension of this affine support, say $k \leq n$, which can be done in FO + linear (see [18]). Then, consider all possible sequences $\vec{p}_{11}, \ldots, \vec{p}_{1k}, \ldots, \vec{p}_{k1}, \ldots, \vec{p}_{kk}$ of $k^2$ points of $T$. Let, for $i = 1, \ldots, k$, $A_i$ be the affine support of $\vec{p}_{i1}, \ldots, \vec{p}_{ik}$. Let $U$ consist of all points $\vec{p}$ for which $\bigcap_{i=1}^{k} A_i = \{\vec{p}\}$. Clearly, $T \subseteq U$, and $U$ can be computed from $T$ in PFOL. (Where necessary, product variables can be introduced which range over the set of coordinates of points in $T$.)

Example 26. We recall Example 3 with $S$ the two-dimensional semi-linear set shown in Fig. 2 and $T$, the set of the special points of $S$, consisting of the three corner points of the outer triangle, the three corner points of the inner triangle, and the special point on the base of the outer triangle, 7 points in total. Clearly, the affine support of $T$ is the two-dimensional plane itself, and hyperplanes are lines in the plane. All 6 lines shown in Fig. 3 connect points of $T$, so the 12 points
they generate (including the 7 points of $T$) all belong to $U$. The set $U$, however, contains many more points. For instance, two additional points are obtained by intersecting the line connecting the tops of both triangles with the two horizontal lines in Fig. 3.

Next, we compute the finite relation $S^{enc}$ with arity $n(n+1)$. The relation consists of $(n+1)$-tuples of $n$-dimensional points of $U$, not necessarily distinct, such that the open$^9$ convex closure of these points is contained in $S$. In particular, some tuples of $S^{enc}$ denote a point of $U$ which is contained in $S$, other tuples of $S^{enc}$ denote two different points of $U$ such that the open interval between these points is contained in $S$, still other tuples of $S^{enc}$ denote three non-collinear points of $U$ such that the open triangle of which these points are the corners is contained in $S$, and so on. Notice that tuples of $S^{enc}$ consist of at most $k+1$ different points ($k$ being the dimension of $S$).

Since product variables can be used to represent the points of the finite relation $U$, and since the convex closure of a finite set of points can be computed in PFOL (Example 13), we conclude that the entire query of type $[n] \rightarrow [(n+1)n]$ which, given a semi-linear set $S$ as input, returns the finite relation $S^{enc}$ as output, can be computed in PFOL. The output relation $S^{enc}$ is the canonical finite representation of $S$.

We consider the following property to be the key property of this canonical finite representation:

Proposition 27. Let $S$ be a bounded semi-linear set of $\mathbb{R}^n$ and let $S^{enc}$ be the canonical finite representation of $S$. Then

$$S = \bigcup_{(\vec{p}_1, \ldots, \vec{p}_{n+1}) \in S^{enc}} CH(\vec{p}_1, \ldots, \vec{p}_{n+1}),$$

where $CH(\vec{p}_1, \ldots, \vec{p}_{n+1})$ is the open convex closure of $\vec{p}_1, \ldots, \vec{p}_{n+1}$.

Proof. By construction of the relation $S^{enc}$, the inclusion from right to left is trivial. Therefore, we concentrate on the other inclusion. As explained in Section 2.2, we know that $S$ is a finite union of cells of the partition of the affine support $A$ of $S$ induced by the $(k-1)$-dimensional hyperplanes supported by the points of $T$. By construction, the cells of this partition have their corner points in $U$. Since $S$ is bounded, $S$ is a finite union of bounded cells, and since these are intersections of half-planes and thus convex, they can be triangulated using their corner points only. Thus, each cell contained in $S$, whence $S$ itself, is fully contained in the right-hand side of the above equality. $\Box$

Decoding the canonical finite representation of a bounded semi-linear set can be done in PFOL in the obvious way, suggested by the formula in the statement of Proposition 27.
9 "Open" is to be understood with respect to the topology of the affine support of the points under consideration. In Example 13, the open convex closure can be obtained by requiring that $\lambda_1$, $\lambda_2$, and $\lambda_3$ are strictly greater than 0. This technique also works in general.
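The decoding suggested by Proposition 27 can be sketched for the planar case $n = 2$, where each tuple of the representation holds three points. The Python below is an illustrative membership test, not the PFOL decoding program: the names `cross`, `in_open_closure`, and `decode_member` and the sample relation `S_ENC` are assumptions made for this sketch, and the sample relation is a hand-written finite representation of a closed triangle rather than the output of the canonical encoding.

```python
from fractions import Fraction


def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_open_closure(x, a, b, c):
    """Is x in the open convex closure of the (possibly repeated) points a, b, c?"""
    if cross(a, b, c) != 0:
        # Proper triangle: x must lie strictly on the same side of all edges.
        d = (cross(a, b, x), cross(b, c, x), cross(c, a, x))
        return all(v > 0 for v in d) or all(v < 0 for v in d)
    # Collinear points: lexicographic extremes are the ends of the segment.
    lo, hi = min(a, b, c), max(a, b, c)
    if lo == hi:
        return x == lo  # all three points coincide
    # Open segment: x on the line and strictly between the extremes.
    return cross(lo, hi, x) == 0 and lo < x < hi


def decode_member(x, s_enc):
    """Membership of x in the union over all tuples of the representation."""
    return any(in_open_closure(x, a, b, c) for a, b, c in s_enc)


# Hand-written representation of the closed triangle (0,0), (2,0), (0,2).
A, B, C = (0, 0), (2, 0), (0, 2)
S_ENC = [(A, A, A), (B, B, B), (C, C, C),   # the three corners
         (A, B, B), (B, C, C), (C, A, A),   # the three open edges
         (A, B, C)]                         # the open interior
INNER = (Fraction(1, 2), Fraction(1, 2))
```

Here the point `(1, 1)` is not in the open triangle itself but is picked up by the tuple for the open edge between `B` and `C`, which shows why tuples with repeated points are part of the representation.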
5.2. The general case

Let $S$ be a semi-linear set of $\mathbb{R}^n$ which may be bounded or unbounded. The methodology of the bounded case can easily be extended to this general case, provided we can incorporate into the finite representation the "corner points at infinity" of any unbounded cells of the partition. "Corner points at infinity" will be represented by directional vectors. Hence, for each direction, we consider two "corner points at infinity", one for each orientation. To compute these directional vectors, it is necessary to introduce a bounding box $B$ for $\bar{S}$. As shown in [8], one such bounding box can be computed in FO + linear, whence in PFOL.

Thus, we construct in PFOL from $S \cap B$ the set $T$ of special points. Let $U$ be, in the same way as in the bounded case, those points $\vec{p}$ for which $\bigcap_{i=1}^{k} A_i = \{\vec{p}\}$, with $k$ the dimension of the affine support of $T$ and $A_i$ the affine support of $k+1$ linearly independent points of $T$. To find the "corner points at infinity", we compute the set $V$ of directional vectors $\vec{s} = \pm(\vec{c} - \vec{p})$, where $\vec{c}$ is a point of $T$ on the boundary of $B$ and $\vec{p}$ is an arbitrary point of $T$, and the open interval between both points is fully contained in $S$.$^{10}$ Using the coordinates of the points of $T$ as a finite domain for restricting the range of product variables, we can easily write a PFOL program to compute $V$.

The canonical finite representation of $S$ is again a finite relation $S^{enc}$, but which is now of arity $(n+1)^2$. Intuitively, we add an extra coordinate to each point, which equals "1" for a "finite" corner point, and "0" for a "corner point at infinity", i.e., for a directional vector. The relation $S^{enc}$ is the subset of

$$\bigcup_{i=1}^{n+1} (U \times \{1\})^i \times (V \times \{0\})^{n+1-i},$$

with $1 \leq i \leq n+1$, consisting of all $(n+1)$-tuples

$$((\vec{p}_1, 1), \ldots, (\vec{p}_i, 1), (\vec{s}_{i+1}, 0), \ldots, (\vec{s}_{n+1}, 0))$$

such that the open convex closure of all these, not necessarily different, points and "directions" is contained in $S$. We notice that a point $\vec{p}$ is in this open convex closure if there exist $\lambda_1 > 0, \ldots, \lambda_i > 0, \mu_{i+1} > 0, \ldots, \mu_{n+1} > 0$ such that $\lambda_1 + \cdots + \lambda_i = 1$ and

$$\vec{p} = \lambda_1 \vec{p}_1 + \cdots + \lambda_i \vec{p}_i + \mu_{i+1} \vec{s}_{i+1} + \cdots + \mu_{n+1} \vec{s}_{n+1}.$$

Since product variables can be used to represent the points of the finite relations $U$ and $V$, we conclude that the entire query of type $[n] \rightarrow [(n+1)^2]$, that, given a semi-linear set $S$ as input, returns its canonical finite representation as output, can be computed in PFOL.

10 The latter condition restricts the set of directional vectors to be considered, but will turn out not to be strictly necessary.

Example 28. Recall Example 5 with the two-dimensional set $S$ shown with a bounding box $B$ in Fig. 5. For convenience, we shall assume that the angular point has coordinates $(0, 0)$, while the special points on the bounding box have coordinates $(1, 0)$ and $(1, 0.5)$, respectively. The set $T$ consists of those points. Clearly, $U = T$. The set $V$ consists of 6 directional vectors, with coordinates $(1, 0)$, $(-1, 0)$, $(1, 0.5)$, $(-1, -0.5)$, $(0, 0.5)$, and $(0, -0.5)$, respectively. The canonical finite representation $S^{enc}$ of $S$ is the closure under permutations of generalized points within one
tuple of the set

$$\begin{array}{l}
\{(0, 0, 1, 0, 0, 1, 0, 0, 1),\ (1, 0, 1, 1, 0, 1, 1, 0, 1),\ (1, \tfrac12, 1, 1, \tfrac12, 1, 1, \tfrac12, 1), \\
(0, 0, 1, 1, 0, 1, \cdot, \cdot, \cdot),\ (0, 0, 1, 1, \tfrac12, 1, \cdot, \cdot, \cdot),\ (1, 0, 1, 1, \tfrac12, 1, \cdot, \cdot, \cdot), \\
(0, 0, 1, 1, 0, 0, \cdot, \cdot, \cdot),\ (0, 0, 1, 1, \tfrac12, 0, \cdot, \cdot, \cdot),\ (1, 0, 1, 1, 0, 0, \cdot, \cdot, \cdot),\ (1, \tfrac12, 1, 1, \tfrac12, 0, \cdot, \cdot, \cdot), \\
(0, 0, 1, 1, 0, 1, 1, \tfrac12, 1),\ (1, 0, 1, 1, \tfrac12, 1, 1, 0, 0),\ (1, 0, 1, 1, \tfrac12, 1, 1, \tfrac12, 0), \\
(0, 0, 1, 1, 0, 1, 1, \tfrac12, 1),\ (0, 0, 1, 1, \tfrac12, 1, 1, 0, 0),\ (0, 0, 1, 1, 0, 0, 1, \tfrac12, 0), \\
(1, 0, 1, 1, 0, 0, 1, \tfrac12, 0),\ (1, \tfrac12, 1, 1, 0, 0, 1, \tfrac12, 0)\},
\end{array}$$

with $(\cdot, \cdot, \cdot)$ standing for one of the other points occurring in that tuple.

Proposition 27 can immediately be generalized:

Proposition 29. Let $S$ be an arbitrary semi-linear set of $\mathbb{R}^n$ and let $S^{enc}$ be the canonical finite representation of $S$. Then

$$S = \bigcup_{((\vec{p}_1, t_1), \ldots, (\vec{p}_{n+1}, t_{n+1})) \in S^{enc}} CH((\vec{p}_1, t_1), \ldots, (\vec{p}_{n+1}, t_{n+1})),$$

where, for $i = 1, \ldots, n+1$, $t_i = 1$ or $t_i = 0$ and $CH((\vec{p}_1, t_1), \ldots, (\vec{p}_{n+1}, t_{n+1})) \subseteq \mathbb{R}^n$ is the open convex closure of $(\vec{p}_1, t_1), \ldots, (\vec{p}_{n+1}, t_{n+1})$, where, for $i = 1, \ldots, n+1$, $(\vec{p}_i, t_i)$ is interpreted as the "finite" corner point $\vec{p}_i$, if $t_i = 1$, and as the "corner point at infinity" described by the directional vector $\vec{p}_i$, if $t_i = 0$.

Again, decoding the canonical finite representation of an arbitrary semi-linear set can be done in PFOL in the obvious way, suggested by the formula in the statement of Proposition 29.

We now define a finite representation of $S$ as any relation of the right arity that satisfies Proposition 29. For these relations, the decoding query described above always returns the original semi-linear set.
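To see how generalized points behave in the simplest setting, the following Python sketch decodes finite representations in dimension $n = 1$, where every tuple holds two generalized points $(v, t)$. It is only an illustration under assumptions: the names `in_generalized_closure`, `decode_member`, and `HALF_LINE` are hypothetical, and the sample relation is a hand-written finite representation of the half-line $[0, +\infty)$, not the output of the canonical encoding.

```python
def in_generalized_closure(x, g1, g2):
    """Is x in the open convex closure of two generalized points of R^1?

    A generalized point is a pair (v, t): a finite point v when t == 1,
    a directional vector v (a corner point at infinity) when t == 0.
    """
    finite = [v for v, t in (g1, g2) if t == 1]
    dirs = [v for v, t in (g1, g2) if t == 0]
    if len(finite) == 2:
        lo, hi = min(finite), max(finite)
        # A single point, or the open interval between two points.
        return x == lo if lo == hi else lo < x < hi
    if len(finite) == 1:
        p, s = finite[0], dirs[0]
        # x = p + mu * s for some mu > 0.
        return x == p if s == 0 else (x - p) * s > 0
    return False  # every tuple contains at least one finite point


def decode_member(x, s_enc):
    """Membership of x in the union over all tuples of the representation."""
    return any(in_generalized_closure(x, g1, g2) for g1, g2 in s_enc)


# Hand-written representation of the half-line [0, +infinity).
HALF_LINE = [((0, 1), (0, 1)),   # the finite corner point 0
             ((0, 1), (1, 0))]   # 0 + mu * 1 with mu > 0
```

The second tuple pairs a finite point with a direction, so it contributes the unbounded open ray; in higher dimensions the same idea requires solving for the coefficients $\lambda_i$ and $\mu_j$ rather than a sign check.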
6. The equivalence in expressive power of the languages PFOL-finite and SPFOL We now return to our goal of proving that, for finite inputs, PFOL-finite and SPFOL are equivalent in expressive power. Therefore, we still need to show that each PFOL-finite program can be transformed into an equivalent SPFOL program. As pointed out at the end of Section 4, there is no straightforward way to achieve this with structural induction, as intermediate results (i.e., outputs corresponding to subformulae of formulae in the program) of PFOL-finite programs may be infinite. We can prove, however, that finite representations of these intermediate results can be computed in SPFOL. The equivalence result will then readily follow. In the first part of this section, we present some preliminary results; in the second part, we return to our main line of argument.
6.1. Preliminary results

In preparation of our main argument, we first establish some closure properties with respect to set operations.

Lemma 30. Let $S$ be a semi-linear set of $\mathbb{R}^n$. There exists an SPFOL program that, upon a finite representation of $S$ as input, computes a finite representation of the complement $\mathbb{R}^n - S$ as output.

Proof. Let $S^{enc} \subseteq \mathbb{R}^{(n+1)^2}$ be an arbitrary finite representation of $S$. From $S^{enc}$, we shall compute the corner points (both finite and infinite) of the cells of a partition $P$ of $\mathbb{R}^n$ into convex cells such that, for every tuple of $S^{enc}$, the convex subset of $\mathbb{R}^n$ represented by that tuple is a union of cells of $P$. Consequently, $S$, whence also $\mathbb{R}^n - S$, is a union of cells of $P$. From the obtained corner points, we shall compute a finite representation of $\mathbb{R}^n - S$. We show that the entire computation can be expressed in FO + poly, and that the set of the coordinates of the corner points, which is an active range superset of this query, can be computed in SPFOL. By Proposition 22, it then follows that the entire computation can be expressed in SPFOL.

Let $T$ be the set of all (finite) corner points occurring in a tuple of $S^{enc}$, all points $\vec{p} + \vec{d}$, with $\vec{p}$ a (finite) corner point and $\vec{d}$ a directional vector occurring in a tuple of $S^{enc}$, the origin $\vec{0}$ of $\mathbb{R}^n$, and the unit vectors $\vec{e}_1, \ldots, \vec{e}_n$ of the canonical coordinate basis of $\mathbb{R}^n$. Clearly, $T$ can be computed from $S^{enc}$ in SPFOL.

Next, we perform the construction in Section 5.2 to obtain the set $U$ consisting of the (finite) points that can be described as intersections of $n$ $(n-1)$-dimensional hyperplanes supported by points of $T$. Let $\vec{p}_{ij} = (p_{ij1}, \ldots, p_{ijn})$, $1 \leq i, j \leq n$, be points of $T$. The condition that $\vec{p}_{i1}, \ldots, \vec{p}_{in}$, $1 \leq i \leq n$, supports an $(n-1)$-dimensional hyperplane of $\mathbb{R}^n$ has the form $Q_i(p_{i11}, \ldots, p_{inn}) \neq 0$, with $Q_i$ a polynomial in $p_{i11}, \ldots, p_{inn}$, expressing some determinant. To obtain the common point(s) of the $n$ hyperplanes, we have to solve a system of equations

$$\left\{\begin{array}{l}
Q_{11}(p_{111}, \ldots, p_{1nn})\, x_1 + \cdots + Q_{1n}(p_{111}, \ldots, p_{1nn})\, x_n = Q_{10}(p_{111}, \ldots, p_{1nn}), \\
\quad\vdots \\
Q_{n1}(p_{n11}, \ldots, p_{nnn})\, x_1 + \cdots + Q_{nn}(p_{n11}, \ldots, p_{nnn})\, x_n = Q_{n0}(p_{n11}, \ldots, p_{nnn}),
\end{array}\right.$$

where, for $i = 1, \ldots, n$,

$$Q_{i1}(p_{i11}, \ldots, p_{inn})\, x_1 + \cdots + Q_{in}(p_{i11}, \ldots, p_{inn})\, x_n = Q_{i0}(p_{i11}, \ldots, p_{inn})$$

is an equation describing the hyperplane supported by $\vec{p}_{i1}, \ldots, \vec{p}_{in}$. This system has a unique solution if and only if its determinant is nonzero, which is a condition of the form $Q(p_{111}, \ldots, p_{nnn}) \neq 0$, with $Q$ a polynomial in $p_{111}, \ldots, p_{nnn}$. If this is the case, then the solution is given by equations of the form

$$Q(p_{111}, \ldots, p_{nnn})\, x_i = Q'_{i0}(p_{111}, \ldots, p_{nnn}),$$

with $Q$ and $Q'_{10}, \ldots, Q'_{n0}$ polynomials in $p_{111}, \ldots, p_{nnn}$. Since, obviously, product variables can be used to represent coordinates of points of $T$, it follows that $U$ can be computed from $T$ in SPFOL.
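For $n = 2$, the solution step just described is Cramer's rule for two lines, and it can be sketched in Python to show why the result has the shape $Q\,x_i = Q'_{i0}$ with a nonzero-determinant side condition. The function names `line_through`, `intersection`, and `solve` are hypothetical, and integer or rational input coordinates are assumed.

```python
from fractions import Fraction


def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through p and q.

    Each coefficient is a polynomial in the coordinates of p and q.
    """
    a = q[1] - p[1]
    b = p[0] - q[0]
    return a, b, a * p[0] + b * p[1]


def intersection(p1, q1, p2, q2):
    """Cramer's rule: (det, num_x, num_y) with det*x = num_x and det*y = num_y."""
    a1, b1, c1 = line_through(p1, q1)
    a2, b2, c2 = line_through(p2, q2)
    det = a1 * b2 - a2 * b1
    return det, c1 * b2 - c2 * b1, a1 * c2 - a2 * c1


def solve(p1, q1, p2, q2):
    """The unique common point of the two lines, or None if det == 0."""
    det, num_x, num_y = intersection(p1, q1, p2, q2)
    if det == 0:
        return None  # parallel, identical, or degenerate lines
    return Fraction(num_x, det), Fraction(num_y, det)
```

The triple returned by `intersection` consists only of sums and products of input coordinates, which matches the form of the atomic formula $Q_1\,x = Q_2 \land Q_1 \neq 0$ allowed in Definition 18; the division happens only in `solve`.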
Since $\vec{0}, \vec{e}_1, \ldots, \vec{e}_n$ are in $T$, the (finite) corner points occurring in a tuple of $S^{\mathrm{enc}}$ also occur in $U$. Hence, the partition of $\mathbf{R}^n$ into convex cells defined by $T$ (of which the set of the (finite) corner points of its cells is precisely $U$) satisfies the requirements imposed at the beginning of this proof. Next, let $V = \{\vec{p} - \vec{q} \mid \vec{p}, \vec{q} \in U \wedge \vec{p} \neq \vec{q}\}$. The set $V$ is a set of directional vectors, which will be used to describe the infinite cells of $P$. (Actually, $V$ generally contains more directional vectors than needed, resulting in a refinement of $P$, which is no objection.) Clearly, $U \cup V$ can be computed from $T$ in SPFOL. Since the (finite) corner points of $U$ and the directional vectors of $V$ suffice to obtain a finite representation of $\mathbf{R}^n \setminus S$, the query returning the set of the coordinates of the points of $U$ and the points of $V$ upon input $S^{\mathrm{enc}}$ is an active range superset query of the query we intend to express. By Proposition 24, this query is computable in SPFOL. Finally, to obtain the finite representation of $\mathbf{R}^n \setminus S$, we must select all tuples of $n+1$ (finite and infinite) points of $U$ and $V$ (of which at least one belongs to $U$) for which the open convex closure does not meet any of the open convex closures of $n+1$ (finite and infinite) points in a tuple of $S^{\mathrm{enc}}$. Obviously, this query can be expressed in FO + poly. □
Lemma 31. Let $S_1$ and $S_2$ be semi-linear sets of $\mathbf{R}^n$. There exist SPFOL programs that, upon finite representations of $S_1$ and $S_2$ as input, compute finite representations of the union $S_1 \cup S_2$, respectively the intersection $S_1 \cap S_2$, respectively the difference $S_1 \setminus S_2$, as output.
Proof. Let $S_1^{\mathrm{enc}}, S_2^{\mathrm{enc}} \subseteq \mathbf{R}^{(n+1)^2}$ be arbitrary finite representations of $S_1$ and $S_2$, respectively. Clearly, $S_1^{\mathrm{enc}} \cup S_2^{\mathrm{enc}}$ is a finite representation of $S_1 \cup S_2$, which can be computed from $S_1^{\mathrm{enc}}$ and $S_2^{\mathrm{enc}}$ in SPFOL. Since intersection can be expressed in terms of union and complementation, and since difference can be expressed in terms of intersection and complementation, the above result and Lemma 30 together with Proposition 25 yield the second and the third claim of Lemma 31. □
Lemma 32. Let $S$ be a semi-linear set of $\mathbf{R}^n$. Let $i_1, \ldots, i_m$ be indices, not necessarily distinct and not necessarily in ascending order, such that $1 \leq i_1, \ldots, i_m \leq n$. There exists an SPFOL program that, upon a finite representation of $S$ as input, computes a finite representation of the generalized projection $\pi_{i_1,\ldots,i_m}(S) = \{(x_{i_1}, \ldots, x_{i_m}) \mid (x_1, \ldots, x_n) \in S\}$ as output.
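Proof. Let $S^{\mathrm{enc}} \subseteq \mathbf{R}^{(n+1)^2}$ be an arbitrary finite representation of $S$. By definition,
$$S = \bigcup_{(\vec{x}_1, \ldots, \vec{x}_{n+1}) \in S^{\mathrm{enc}}} \mathrm{CH}(\vec{x}_1, \ldots, \vec{x}_{n+1}).$$
Hence,
$$\pi_{i_1,\ldots,i_m}(S) = \bigcup_{(\vec{x}_1, \ldots, \vec{x}_{n+1}) \in S^{\mathrm{enc}}} \mathrm{CH}(\pi_{i_1,\ldots,i_m,n+1}(\vec{x}_1), \ldots, \pi_{i_1,\ldots,i_m,n+1}(\vec{x}_{n+1})).$$
In these expressions, $\vec{x}_1, \ldots, \vec{x}_{n+1}$ are in $\mathbf{R}^n \times \{0, 1\}$ and represent generalized points of $\mathbf{R}^n$; clearly, $\pi_{i_1,\ldots,i_m,n+1}(\vec{x}_1), \ldots, \pi_{i_1,\ldots,i_m,n+1}(\vec{x}_{n+1})$ are in $\mathbf{R}^m \times \{0, 1\}$ and represent the generalized points of $\mathbf{R}^m$ that are the relevant projections of $\vec{x}_1, \ldots, \vec{x}_{n+1}$. Furthermore, if $\vec{y}_1, \ldots, \vec{y}_{n+1} \in \mathbf{R}^m \times \{0, 1\}$ are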
generalized points of $\mathbf{R}^m$, then $\mathrm{CH}(\vec{y}_1, \ldots, \vec{y}_{n+1})$, which is topologically open in its affine support, is the union of all convex closures $\mathrm{CH}(\vec{y}_{j_1}, \ldots, \vec{y}_{j_{m+1}})$ for which (i) $1 \leq j_1, \ldots, j_{m+1} \leq n+1$ (notice that $j_1, \ldots, j_{m+1}$ are not necessarily all distinct or in ascending order), (ii) there is at least one finite point among $\vec{y}_{j_1}, \ldots, \vec{y}_{j_{m+1}}$, and (iii) $\mathrm{CH}(\vec{y}_{j_1}, \ldots, \vec{y}_{j_{m+1}}) \subseteq \mathrm{CH}(\vec{y}_1, \ldots, \vec{y}_{n+1})$. From these observations, it follows that a finite representation of $\pi_{i_1,\ldots,i_m}(S)$ can be computed from the relevant projections of the generalized points in $S^{\mathrm{enc}}$. Obviously, this computation can be expressed in FO + poly. Moreover, the active domain of the input of this query, which is a primitive in SPFOL, is an active range superset of this query. By Proposition 22, it follows that this query can be expressed in SPFOL. □
Lemma 33. Let $S$ be a semi-linear set of $\mathbf{R}^n$. Let $m \geq 0$. There exists an SPFOL program that, upon a finite representation of $S$ as input, computes a finite representation of $S \times \mathbf{R}^m$ as output.
Proof. Because of Proposition 25, it suffices to consider the case $m = 1$ (notice that $S \times \mathbf{R}^0 = S$).
Let $S^{\mathrm{enc}} \subseteq \mathbf{R}^{(n+1)^2}$ be an arbitrary finite representation of $S$. Then, for each tuple $(\vec{x}_1; i_1; \ldots; \vec{x}_{n+1}; i_{n+1})$ of $S^{\mathrm{enc}}$,
$$\{(\vec{x}_1; 0; i_1; \ldots; \vec{x}_{n+1}; 0; i_{n+1}; \underbrace{0; \ldots; 0}_{n \text{ times}}; \pm 1; 0),\ (\vec{x}_1; 0; i_1; \ldots; \vec{x}_{n+1}; 0; i_{n+1}; \vec{x}_1; 0; i_1)\},$$
which is a subset of $\mathbf{R}^{(n+2)^2}$, is a finite representation of $\mathrm{CH}((\vec{x}_1; i_1), \ldots, (\vec{x}_{n+1}; i_{n+1})) \times \mathbf{R}$. (In the above expression, $(\vec{x}_j; i_j)$, $1 \leq j \leq n+1$ and $i_j \in \{0, 1\}$, is an element of $\mathbf{R}^{n+1}$ and represents a generalized point of $\mathbf{R}^n$; semi-colons have been inserted for clarity.) Clearly, a finite representation of $S \times \mathbf{R}$ can be computed from $S^{\mathrm{enc}}$ in FO + poly by taking the union over all tuples of $S^{\mathrm{enc}}$ of the finite representations. Moreover, the active domain of the input of this query, which is a primitive in SPFOL, augmented with the constants $0$, $+1$, and $-1$, is an active range superset of this query. By Proposition 22, it follows that this query can be expressed in SPFOL. □
Lemma 34. Let $S_1$ be a semi-linear set of $\mathbf{R}^n$, and $S_2$ a semi-linear set of $\mathbf{R}^m$. There exists an SPFOL program that, upon finite representations of $S_1$ and $S_2$ as input, computes a finite representation of the Cartesian product $S_1 \times S_2 \subseteq \mathbf{R}^{n+m}$ as output.
Proof. We have that $S_1 \times S_2 = (S_1 \times \mathbf{R}^m) \cap (\mathbf{R}^n \times S_2)$. The result now follows from Lemmas 31–33 (notice that $\mathbf{R}^n \times S_2$ can be obtained from $S_2 \times \mathbf{R}^n$ by a generalized projection), together with Proposition 25. □
Lemmas 31, 32, and 34 yield some more closure properties for SPFOL.
Lemma 35. Let $S$ be an arbitrary finite semi-linear set of $\mathbf{R}^n$, and let $S^{\mathrm{enc}}$ be the unique finite representation of $S$. The query mapping $S$ to $S^{\mathrm{enc}}$ and the query mapping $S^{\mathrm{enc}}$ to $S$ are both SPFOL-expressible.
Proof. We start by observing that
$$S^{\mathrm{enc}} = \{(\underbrace{\vec{x}; 1; \ldots; \vec{x}; 1}_{n+1 \text{ times}}) \mid \vec{x} \in S\}.$$
Obviously, the query mapping $S$ to $S^{\mathrm{enc}}$ and the query mapping $S^{\mathrm{enc}}$ to $S$ are both FO + poly-expressible. For the former query, the active range equals the active domain (which is an SPFOL primitive) augmented with the constant 1. Hence, the active range query is clearly SPFOL-expressible. For the query mapping $S^{\mathrm{enc}}$ to $S$, the active domain is an active range superset. By Proposition 22, it follows that both queries in the statement of Lemma 35 are SPFOL-expressible. □
The following is an immediate corollary to Lemmas 31, 32, 34, and 35, and Proposition 25.
Proposition 36. The SPFOL-expressible queries are closed under union, intersection, difference, generalized projection, and Cartesian product.
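The encoding of Lemma 35 can be illustrated with a minimal Python sketch. It assumes that points are tuples of exact rationals and that a finite representation is stored as a set of flat tuples; the function names `encode_finite` and `decode_finite` are hypothetical and not part of the paper.

```python
from fractions import Fraction

def encode_finite(points):
    """Map a finite set S of points of R^n to {(x;1;...;x;1) | x in S}."""
    enc = set()
    for x in points:
        n = len(x)
        generalized = tuple(x) + (1,)      # the finite generalized point (x; 1)
        enc.add(generalized * (n + 1))     # repeated n+1 times, (n+1)^2 numbers
    return enc

def decode_finite(enc, n):
    """Recover S by reading the first n coordinates of every tuple."""
    return {t[:n] for t in enc}

S = {(Fraction(1, 2), Fraction(3)), (Fraction(0), Fraction(-1))}
S_enc = encode_finite(S)
```

Every number occurring in the encoding is either a coordinate of an input point or the constant 1, which is the active-range observation used in the proof.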
6.2. Main result
We are now ready for the proof that each PFOL-finite program can be transformed into an equivalent SPFOL program. The proof consists of a double induction, the inner one of which is exhibited in Lemma 37.
Lemma 37. Let $Q$ be a linear query returning finite outputs upon finite inputs, and let
$$D_1 \leftarrow \varphi_1(x);\ \ldots;\ D_k \leftarrow \varphi_k(x);\ \{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\}$$
be a PFOL program computing $Q$. Assume that, for $i = 1, \ldots, k$, there exists an SPFOL program which computes the interpretation of the domain symbol $D_i$ in the above program. Then there exists an SPFOL program computing $Q$.
Proof. Let, for each product variable $p$ that occurs in $\varphi$ (as bound variable), $u(p)$, $1 \leq u(p) \leq k$, be the index such that $p$ is quantified over $D_{u(p)}$. (Without loss of generality, we assume that each product variable occurs in only one quantification.) For each real variable $x$ that occurs in $\varphi$ (as bound or unbound variable), we put $u(x) = 0$. In the remainder of this proof, we interpret $D_0$ as $\mathbf{R}$. If $t_1, \ldots, t_r$ are real variables or product variables occurring in $\varphi$, then, by the assumption and by Lemmas 32–35, and Proposition 25, the query returning upon an input of $Q$ a finite representation of $D_{u(t_1)} \times \cdots \times D_{u(t_r)}$ can be computed in SPFOL. Now, let $\psi(t_1, \ldots, t_r)$ be a subformula of $\varphi(x_1, \ldots, x_n)$ with set of free real and product variables $\{t_1, \ldots, t_r\}$. We define the output of $\psi(t_1, \ldots, t_r)$ as the query returning upon an input (in particular, a finite one) of $Q$ the set
$$\{(t_1, \ldots, t_r) \mid \psi(t_1, \ldots, t_r) \wedge D_{u(t_1)}(t_1) \wedge \cdots \wedge D_{u(t_r)}(t_r)\}$$
as output. By structural induction, we show that, for each such subformula $\psi(t_1, \ldots, t_r)$ of $\varphi(x_1, \ldots, x_n)$, there exists an SPFOL program returning upon an input of $Q$ a finite representation of the output of $\psi(t_1, \ldots, t_r)$. For the basis of this induction, we first examine the atomic subformulae.
* $R(t_1, \ldots, t_r)$, with $R$ an input predicate symbol. By definition, the output of $R(t_1, \ldots, t_r)$ is the intersection of $R$ with $D_{u(t_1)} \times \cdots \times D_{u(t_r)}$. By Lemmas 31 and 35, and Proposition 25, it follows that a finite representation of the output of $R(t_1, \ldots, t_r)$ is also SPFOL-computable.
* $\sum_{i=1}^{r} a_i t_i \;\theta\; a$, with $a_1, \ldots, a_r, a$ real algebraic numbers and $\theta$ one of the relation symbols $=$, $\neq$, $<$, $\leq$, $>$, and $\geq$. The output of $\sum_{i=1}^{r} a_i t_i \;\theta\; a$ is the intersection of the set $\{(y_1, \ldots, y_r) \mid \sum_{i=1}^{r} a_i y_i \;\theta\; a\}$ with $D_{u(t_1)} \times \cdots \times D_{u(t_r)}$. The query returning the first set is a constant linear query (the output is input-independent). Therefore, the query returning some finite representation of this set is also a constant query, which, by Proposition 20, is SPFOL-computable. By Lemma 31 and Proposition 25, it follows that a finite representation of the output of $\sum_{i=1}^{r} a_i t_i \;\theta\; a$ is also SPFOL-computable.
* $t_1 = p\,t_2$, with $p$ a product variable. First, consider the query $Q_{\mathrm{init}}$ which, upon input the interpretation of $D_{u(p)}$, returns the set $\{(y_1, y_2, y_3) \mid y_1 = y_3 y_2 \wedge D_{u(p)}(y_3)\}$. The partial output of this query corresponding to some fixed value of $y_3$ is a line in $\mathbf{R}^3$ through the points $(0, 0, y_3)$ and $(y_3, 1, y_3)$. Hence, the SPFOL program (using some vector notation for the purpose of abbreviation)
$$D \leftarrow D_{u(p)}(x);\ \{(\vec{z}_1, \vec{z}_2, \vec{z}_3, \vec{z}_4) \mid (\exists p \in D)(\vec{z}_1 = (0, 0, p, 1) \wedge \vec{z}_2 = (p, 1, 0, 0) \wedge \vec{z}_3 = (-p, -1, 0, 0) \wedge \vec{z}_4 = \vec{z}_3)\}$$
computes, upon input the interpretation of $D_{u(p)}$, a finite representation of the output of $Q_{\mathrm{init}}$. By Proposition 25, it follows that the query which, upon input an input of $Q$, returns a finite representation of $Q_{\mathrm{init}}$, is also SPFOL-computable. Since the output of $t_1 = p\,t_2$ is the intersection of the output of $Q_{\mathrm{init}}$ with $D_{u(t_1)} \times D_{u(t_2)} \times D_{u(p)}$, it follows from Lemma 31 that a finite representation of the output of $t_1 = p\,t_2$ is also SPFOL-computable.
* $t = \sqrt{|p|}$, with $p$ a product variable. First, consider the query $Q_{\mathrm{init}}$ which, upon input the interpretation of $D_{u(p)}$, returns the finite set $\{(y_1, y_2) \mid y_1 = \sqrt{|y_2|} \wedge D_{u(p)}(y_2)\}$. This query is computed by the SPFOL program
$$D \leftarrow D_{u(p)}(x);\ \{(y_1, y_2) \mid (\exists p \in D)(y_1 = \sqrt{|p|} \wedge y_2 = p)\}.$$
Since the output of $Q_{\mathrm{init}}$ is always finite, it follows from Lemma 35 that the query returning a finite representation of the output of $Q_{\mathrm{init}}$ is also SPFOL-computable. Since the output of $t = \sqrt{|p|}$ is the intersection of the output of $Q_{\mathrm{init}}$ with $D_{u(t)} \times D_{u(p)}$, it follows from Lemma 31 that a finite representation of the output of $t = \sqrt{|p|}$ is also SPFOL-computable.
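The representation used for the atomic formula $t_1 = p\,t_2$ can be checked numerically. The sketch below is only an illustration under two assumptions: the third generalized point is read as the direction opposite to the second one, $(-p, -1, 0, 0)$, and the cell described by a tuple is sampled as the finite point plus non-negative multiples of the two directions. The helper names are hypothetical.

```python
from fractions import Fraction

def q_init_representation(domain):
    """One tuple of four generalized points of R^3 per value p of the domain."""
    rep = set()
    for p in domain:
        z1 = (0, 0, p, 1)       # finite point (0, 0, p)
        z2 = (p, 1, 0, 0)       # direction (p, 1, 0)
        z3 = (-p, -1, 0, 0)     # opposite direction (assumed reading)
        rep.add((z1, z2, z3, z3))
    return rep

def point_on_cell(tup, lam, mu):
    """Sample z1 + lam*z2 + mu*z3 (lam, mu >= 0), dropping the 0/1 flags."""
    z1, z2, z3, _ = tup
    return tuple(b + lam * u + mu * w for b, u, w in zip(z1[:3], z2[:3], z3[:3]))

rep = q_init_representation({Fraction(2), Fraction(-3, 2)})
samples = [(0, 0), (1, 0), (0, 1), (Fraction(5, 2), Fraction(1, 3))]
points = [point_on_cell(t, lam, mu) for t in rep for lam, mu in samples]
```

Each sampled point $(y_1, y_2, y_3)$ has the form $((\lambda - \mu)p,\ \lambda - \mu,\ p)$, so it satisfies $y_1 = y_3 y_2$ with $y_3$ in the domain.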
This concludes the basis of the induction. For the induction step, we consider the ways in which larger formulae can be constructed.
* Boolean combination. It suffices to consider the connectives "$\neg$" and "$\wedge$". First, let $\neg\psi(t_1, \ldots, t_r)$ be a subformula of $\varphi(x_1, \ldots, x_n)$. Clearly, if a finite representation of the output of $\psi(t_1, \ldots, t_r)$ is SPFOL-computable, then, by Lemma 30 and Proposition 25, a finite representation of the output of $\neg\psi(t_1, \ldots, t_r)$ is also SPFOL-computable. Next, let $\psi_1(t_1, \ldots, t_v, \ldots, t_s) \wedge \psi_2(t_v, \ldots, t_s, \ldots, t_r)$ be a subformula of $\varphi(x_1, \ldots, x_n)$. Without loss of generality, we assume that $t_v, \ldots, t_s$ are the free (real and product) variables shared by $\psi_1$ and $\psi_2$. If $S_1 \subseteq \mathbf{R}^s$ and $S_2 \subseteq \mathbf{R}^{r-v+1}$ are the outputs of $\psi_1(t_1, \ldots, t_s)$ and $\psi_2(t_v, \ldots, t_r)$, then $(S_1 \times \mathbf{R}^{r-s}) \cap (\mathbf{R}^{v-1} \times S_2)$ is the output of $\psi_1(t_1, \ldots, t_v, \ldots, t_s) \wedge \psi_2(t_v, \ldots, t_s, \ldots, t_r)$. Hence, if finite representations of the outputs of $\psi_1(t_1, \ldots, t_s)$ and $\psi_2(t_v, \ldots, t_r)$ are SPFOL-computable, then, by Lemmas 31–33, and Proposition 25, a finite representation of the output of $\psi_1(t_1, \ldots, t_v, \ldots, t_s) \wedge \psi_2(t_v, \ldots, t_s, \ldots, t_r)$ is SPFOL-computable.
* Quantification. If $(\exists x)\psi(t_1, \ldots, t_r)$, with $x \in \{t_1, \ldots, t_r\}$ a real variable, is a subformula of $\varphi(x_1, \ldots, x_n)$, then the output of $(\exists x)\psi(t_1, \ldots, t_r)$ is obtained from the output of $\psi(t_1, \ldots, t_r)$ by projecting out the coordinate position corresponding to the free real variable $x$ of $\psi$. Similarly, if $(\exists p \in D_{u(p)})\psi(t_1, \ldots, t_r)$, with $p \in \{t_1, \ldots, t_r\}$ a product variable, is a subformula of $\varphi(x_1, \ldots, x_n)$, then the output of $(\exists p \in D_{u(p)})\psi(t_1, \ldots, t_r)$ is obtained from the output of $\psi(t_1, \ldots, t_r)$ by projecting out the coordinate position corresponding to the free product variable $p$ of $\psi$. Hence, if a finite representation of the output of $\psi(t_1, \ldots, t_r)$ is SPFOL-computable, then, by Lemma 32, a finite representation of the output of $(\exists x)\psi(t_1, \ldots, t_r)$, respectively $(\exists p \in D_{u(p)})\psi(t_1, \ldots, t_r)$, is SPFOL-computable.
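The two induction steps above amount to cylindrification followed by intersection, and to projection. The following sketch illustrates them on finite relations only; as a simplifying assumption, a finite domain stands in for $\mathbf{R}$ so that the cylinders stay finite, and all function names are hypothetical.

```python
from itertools import product

def cylindrify_right(S, domain, k):
    """S x domain^k."""
    return {s + ext for s in S for ext in product(domain, repeat=k)}

def cylindrify_left(S, domain, k):
    """domain^k x S."""
    return {ext + s for s in S for ext in product(domain, repeat=k)}

def conjunction(S1, S2, s, v, r, domain):
    """Output of psi1(t1..ts) AND psi2(tv..tr), with tv..ts shared."""
    return cylindrify_right(S1, domain, r - s) & cylindrify_left(S2, domain, v - 1)

def project_out(S, position):
    """Existential quantification: drop the coordinate at a 1-based position."""
    return {t[:position - 1] + t[position:] for t in S}

domain = {0, 1, 2}
S1 = {(a, b) for a in domain for b in domain if a < b}   # psi1(t1, t2): t1 < t2
S2 = {(b, b) for b in domain}                            # psi2(t2, t3): t3 = t2
both = conjunction(S1, S2, s=2, v=2, r=3, domain=domain)
exists_t2 = project_out(both, 2)
```

Here `both` collects the triples with $t_1 < t_2 = t_3$, and `exists_t2` is the result of quantifying $t_2$ away.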
We have thus shown by structural induction that, for each subformula $\psi(t_1, \ldots, t_r)$ of $\varphi(x_1, \ldots, x_n)$, there exists an SPFOL program returning upon an input of the given query $Q$ a finite representation of the output of $\psi(t_1, \ldots, t_r)$. By definition, the output of $\varphi(x_1, \ldots, x_n)$ is the output of $Q$. Hence we have shown that, in particular, a finite representation of the output of $Q$ is SPFOL-computable. Since, by assumption, the output of $Q$ is always finite, it follows from Lemma 35 that $Q$ itself is SPFOL-computable. □
Theorem 38. Upon finite inputs, the languages PFOL-finite and SPFOL are equivalent in expressive power.
Proof. By Proposition 20, each SPFOL program is equivalent to a PFOL-finite program. We are going to show that each PFOL-finite program is equivalent to an SPFOL program, by induction on the number of domain symbols in the PFOL-finite program. For the basis of this induction, we observe that a PFOL-finite program without domain symbols satisfies all the conditions of Lemma 37, whence such a program is equivalent to an SPFOL program. For the induction step, let $k > 0$, and assume as induction hypothesis that all PFOL-finite programs with at most $k - 1$ domain symbols are equivalent to SPFOL programs. Let
$$D_1 \leftarrow \varphi_1(x);\ \ldots;\ D_k \leftarrow \varphi_k(x);\ \{(x_1, \ldots, x_n) \mid \varphi(x_1, \ldots, x_n)\},$$
be a PFOL-finite program with $k$ domain symbols, which we shall call $P$. For $i = 1, \ldots, k$, let $\bar{\varphi}_i(x)$ be the formula
$$\varphi_i(x) \wedge (\forall x)(\exists \varepsilon)(\forall y)\bigl(\varepsilon > 0 \wedge (\varphi_i(x) \wedge y \neq x \wedge x - \varepsilon \leq y \leq x + \varepsilon \Rightarrow \neg\varphi_i(y))\bigr).$$
Since $\varphi_i(x)$ is an FO + linear + $P(D_1, \ldots, D_{i-1})$ formula, $\bar{\varphi}_i(x)$ is also an FO + linear + $P(D_1, \ldots, D_{i-1})$ formula. Clearly, $\{(x) \mid \bar{\varphi}_i(x)\}$ equals $\{(x) \mid \varphi_i(x)\}$, if this set is finite, and the empty set, otherwise. Hence, we may conclude that
$$D_1 \leftarrow \bar{\varphi}_1(x);\ \ldots;\ D_{i-1} \leftarrow \bar{\varphi}_{i-1}(x);\ \{(x) \mid \bar{\varphi}_i(x)\},$$
is a PFOL-finite program with $i - 1 < k$ domain symbols computing the interpretation of $D_i$ in $P$. By the induction hypothesis, there exists an equivalent SPFOL program. We have thus shown that, for $i = 1, \ldots, k$, there exists an SPFOL program which computes the interpretation of the domain symbol $D_i$ in $P$. Another application of Lemma 37 yields the existence of an SPFOL program equivalent to $P$. □
We may thus regard SPFOL as a kind of normal form for PFOL-finite, and we shall use this observation to learn more about the expressiveness of PFOL-finite.
7. Expressiveness of SPFOL
Although SPFOL is a coordinate-based language, we claim that it is nevertheless geometrically meaningful. To substantiate this claim, we prove that SPFOL programs in which the coefficients of the polynomials in atomic formulae are restricted to rational (or, equivalently, integer) numbers (instead of real algebraic numbers), define the same queries in the two-dimensional plane as the query language SafeEuql [16]. Indeed, SafeEuql has been devised to capture precisely the ruler and compass constructions on finite sets of points in the two-dimensional plane which return finite sets of points.$^{11}$ We take the liberty of redefining SafeEuql here, so that the definition may better accommodate proofs by structural induction. We assume that the points $\vec{0}$, $\vec{e}_1$, and $\vec{e}_2$ represent the origin and both unit coordinate vectors of the canonical coordinate system of the two-dimensional plane.
Variables, denoted as $\vec{p}, \vec{q}, \vec{r}, \ldots$, and possibly subscripted, should be interpreted as points of the plane rather than coordinate vectors. Predicates, denoted as $\hat{R}(\vec{p}_1, \ldots, \vec{p}_k)$, and possibly subscripted, represent relations consisting of $k$-ary tuples of points.
11
A point of the plane is constructible with ruler and compass from a finite set of points if and only if that point can be obtained by applying a finite number of ruler and compass constructions on the given set of points and already constructed points. The following constructions can be executed with ruler and compass on a finite set of points of the plane: (1) the construction of a line through two points of the given set; (2) the construction of a circle of which the midpoint and a point belong to the given set; (3) the construction of the intersection point(s) of two lines, two circles, or a line and a circle.
Definition 39. The SafeEuql formulae are the smallest collection of formulae closed under the following rules:
* true is a safe formula;
* $\hat{R}(\vec{p}_1, \ldots, \vec{p}_k)$ is a safe formula;
* if $\varphi(\vec{p}_1, \ldots, \vec{p}_k)$ is a safe formula, and if $\{\vec{q}_1, \ldots, \vec{q}_6\} \subseteq \{\vec{p}_1, \ldots, \vec{p}_k\}$, and if $\vec{r}$ is an arbitrary variable, then $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \vec{r} = \vec{0}$, $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \vec{r} = \vec{e}_1$, $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \vec{r} = \vec{e}_2$, $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \vec{r} = \vec{q}_1$, and $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \text{geom-pred}(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, with geom-pred one of the geometric predicates defined below, are safe formulae;
* if $\varphi(\vec{p}_1, \ldots, \vec{p}_k)$ and $\psi(\vec{p}_1, \ldots, \vec{p}_k)$ are safe formulae, then $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \vee \psi(\vec{p}_1, \ldots, \vec{p}_k)$ is a safe formula;
* if $\varphi(\vec{p}_1, \ldots, \vec{p}_k)$ and $\psi(\vec{q}_1, \ldots, \vec{q}_l)$ are safe formulae, then $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \psi(\vec{q}_1, \ldots, \vec{q}_l)$ is a safe formula;
* if $\varphi(\vec{p}_1, \ldots, \vec{p}_k)$ and $\psi(\vec{q}_1, \ldots, \vec{q}_l)$ are safe formulae with $\{\vec{q}_1, \ldots, \vec{q}_l\} \subseteq \{\vec{p}_1, \ldots, \vec{p}_k\}$, then $\varphi(\vec{p}_1, \ldots, \vec{p}_k) \wedge \neg\psi(\vec{q}_1, \ldots, \vec{q}_l)$ is a safe formula; and
* if $\varphi(\vec{p}_1, \ldots, \vec{p}_k)$ is a safe formula, and $1 \leq i \leq k$, then $(\exists \vec{p}_i)\varphi(\vec{p}_1, \ldots, \vec{p}_k)$ is a safe formula.
The semantics of a SafeEuql formula is the obvious one, provided the interpretation of geom-pred is one of the following geometric predicates:$^{12}$
* on-line$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_2$ and $\vec{q}_3$ are distinct and $\vec{q}_1$, $\vec{q}_2$, and $\vec{q}_3$ are collinear;
* on-same-side$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_3$ and $\vec{q}_4$ are distinct, and $\vec{q}_1$ and $\vec{q}_2$ are on the same side of the line defined by $\vec{q}_3$ and $\vec{q}_4$;
* on-circle$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_3$ and $\vec{q}_4$ are distinct and $\vec{q}_1$ is on the circle with center $\vec{q}_2$ and radius the distance between $\vec{q}_3$ and $\vec{q}_4$;
* in-circle$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_3$ and $\vec{q}_4$ are distinct, and $\vec{q}_1$ is in the topological interior of the disk with center $\vec{q}_2$ and radius the distance between $\vec{q}_3$ and $\vec{q}_4$;
* l-order$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_1$, $\vec{q}_2$, and $\vec{q}_3$ are collinear and distinct, and $\vec{q}_2$ is between $\vec{q}_1$ and $\vec{q}_3$;
* c-order$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_1, \ldots, \vec{q}_4$ are pairwise distinct and on the same circle in clockwise or counter-clockwise order;
* l-l-crossing$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_1$ and $\vec{q}_2$, respectively $\vec{q}_3$ and $\vec{q}_4$, are distinct, $\vec{q}_1, \ldots, \vec{q}_4$ are not collinear, and $\vec{r}$ is the intersection point of the line through $\vec{q}_1$ and $\vec{q}_2$, and the line through $\vec{q}_3$ and $\vec{q}_4$;
* l-c-crossing$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_1$ and $\vec{q}_2$, respectively $\vec{q}_4$ and $\vec{q}_5$, are distinct, and $\vec{r}$ is an intersection point of the line through $\vec{q}_1$ and $\vec{q}_2$, and the circle with center $\vec{q}_3$ and radius the distance between $\vec{q}_4$ and $\vec{q}_5$;
$^{12}$ Some variables serve as dummy variables to make geometric predicates of equal width.
* c-c-crossing$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_6)$, which evaluates to true if $\vec{q}_2$ and $\vec{q}_3$, respectively $\vec{q}_5$ and $\vec{q}_6$, are distinct, and $\vec{r}$ is an intersection point of the circle with center $\vec{q}_1$ and radius the distance between $\vec{q}_2$ and $\vec{q}_3$, and the circle with center $\vec{q}_4$ and radius the distance between $\vec{q}_5$ and $\vec{q}_6$.
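Several of the Boolean predicates listed above can be evaluated with exact arithmetic once points carry rational coordinates. The sketch below is an illustration, not the paper's definition: it drops the dummy arguments, represents points as pairs of integers or `Fraction` values, and reads "on the same side" as strictly on the same side. All function names are hypothetical.

```python
from fractions import Fraction

def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def dist2(a, b):
    """Squared Euclidean distance, which avoids square roots."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def on_line(q1, q2, q3):
    return q2 != q3 and cross(q2, q3, q1) == 0

def on_same_side(q1, q2, q3, q4):
    # Strict version (an assumption): neither point lies on the line q3 q4.
    return q3 != q4 and cross(q3, q4, q1) * cross(q3, q4, q2) > 0

def on_circle(q1, q2, q3, q4):
    return q3 != q4 and dist2(q1, q2) == dist2(q3, q4)

def in_circle(q1, q2, q3, q4):
    return q3 != q4 and dist2(q1, q2) < dist2(q3, q4)

origin, e1, e2 = (0, 0), (1, 0), (0, 1)
```

For instance, `on_same_side(p, e1, origin, e2)` holds exactly when the first coordinate of `p` is positive, which is how a sign test on a coordinate can be phrased geometrically.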
Example 40. To illustrate Definition 39, we give an example of a SafeEuql query involving the geometric predicate l-l-crossing$(\vec{r}, \vec{q}_1, \ldots, \vec{q}_4)$. Let $\hat{R}$ be a finite set of points in the two-dimensional plane. The SafeEuql query
$$\{(\vec{x}) \mid (\exists \vec{p})(\exists \vec{q})(\exists \vec{r})(\exists \vec{s})(\exists \vec{t})(\exists \vec{u})(\hat{R}(\vec{p}) \wedge \hat{R}(\vec{q}) \wedge \hat{R}(\vec{r}) \wedge \hat{R}(\vec{s}) \wedge \hat{R}(\vec{t}) \wedge \hat{R}(\vec{u}) \wedge \text{l-l-crossing}(\vec{x}, \vec{p}, \vec{q}, \vec{r}, \vec{s}, \vec{t}, \vec{u}))\}$$
computes the intersection points of all pairs of different non-parallel lines supported by points of $\hat{R}$.
For a comparison between SPFOL and SafeEuql to make sense, we must realize that (i) expressions in atomic formulae of SPFOL are allowed to have real algebraic coefficients, and (ii) SPFOL works with variables representing real numbers, whereas SafeEuql works with variables representing points in the two-dimensional plane. Clearly, as SafeEuql expresses precisely the ruler and compass constructions, only constructible numbers$^{13}$ can be computed in SafeEuql. Observe that not every algebraic number can be constructed with ruler and compass from 0 and 1. To show this, we consider the following well-known result.
Proposition 41. In three-dimensional space, the cube with volume twice the volume of a given cube cannot be constructed by ruler and compass.
It follows from Proposition 41 that the real algebraic number $\sqrt[3]{2}$ is not a constructible number. We shall, therefore, only consider SPFOL programs in which the expressions in atomic formulae are built from product variables and rational numbers, using addition, multiplication, and absolute-value square rooting. We denote this restriction of SPFOL as SPFOL$_{\mathbf{Q}}$. Moreover, we assume from now on that all relation names involved, as well as the output, have even arity. This latter assumption allows us to interpret consecutive variables in a relation name, or in the output format, as coordinates of a point in the two-dimensional plane.
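The query of Example 40 can be mimicked on a finite set of rational points. The following sketch is an illustration under stated assumptions: points are pairs of integers or `Fraction` values, the two dummy arguments of l-l-crossing are omitted, and the intersection is obtained by Cramer's rule from the two line equations. The names `ll_crossing` and `example_40` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def ll_crossing(p1, p2, p3, p4):
    """Intersection of line p1p2 with line p3p4, or None if it is not unique."""
    if p1 == p2 or p3 == p4:
        return None
    # Line through a and b: (b_y - a_y) x - (b_x - a_x) y = constant.
    a1, b1 = p2[1] - p1[1], -(p2[0] - p1[0])
    c1 = a1 * p1[0] + b1 * p1[1]
    a2, b2 = p4[1] - p3[1], -(p4[0] - p3[0])
    c2 = a2 * p3[0] + b2 * p3[1]
    det = a1 * b2 - a2 * b1
    if det == 0:                      # parallel or coincident lines
        return None
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

def example_40(R):
    """All intersection points of pairs of non-parallel lines supported by R."""
    hits = (ll_crossing(p, q, r, s) for p, q, r, s in product(R, repeat=4))
    return {x for x in hits if x is not None}

square = {(0, 0), (2, 0), (2, 2), (0, 2)}
crossings = example_40(square)
```

On the corners of a square, the result consists of the four corners together with the center, where the diagonals meet. Every output coordinate is a quotient of polynomials in the input coordinates, in line with the equations $A_1 p^x = A_2$ and $B_1 p^y = B_2$ used in the proof of Lemma 42.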
We first show that every SafeEuql-expressible query can be expressed in SPFOL$_{\mathbf{Q}}$. Thereto, we need the following lemma.
$^{13}$ With every point in the plane, we can associate a pair of real numbers corresponding to the coordinates of that point with respect to the canonical coordinate system of the plane. We then define the constructible numbers as those $x \in \mathbf{R}$ such that the point with coordinates $(x, 0)$ can be constructed with ruler and compass from the points with coordinates $(0, 0)$ and $(1, 0)$. It is well known that, for $a$ and $b$ real numbers, $a + b$, $ab$, $-a$, and $1/a$ can be constructed with ruler and compass from 0 and 1 (see Figs. 6 and 7). Hence, starting from 0 and 1, all rational numbers are constructible with ruler and compass. Moreover, the absolute-value square root of a given number can be constructed with ruler and compass (see also Fig. 6), whence the constructible numbers are a strict superset of the rational numbers. The following alternative characterization of the constructible numbers can be found in the literature: a real number $x$ is constructible with ruler and compass from 0 and 1 if and only if $x$ belongs to a finite Galois-extension of $\mathbf{Q}$ with degree $2^n$, $n \in \mathbf{N}$.
Lemma 42. Let $R$ be a finite relation of arity 2, i.e., a finite set of two-dimensional points. Let geom-pred$(\vec{p}, \vec{p}_1, \ldots, \vec{p}_6)$ be one of the geometric predicates l-l-crossing, l-c-crossing, and c-c-crossing defined in Definition 39. There exists an SPFOL$_{\mathbf{Q}}$ program of which the output upon input $R$ equals the set
$$\{(\vec{p}) \mid (\exists \vec{p}_1) \ldots (\exists \vec{p}_6)(\hat{R}(\vec{p}_1) \wedge \cdots \wedge \hat{R}(\vec{p}_6) \wedge \text{geom-pred}(\vec{p}, \vec{p}_1, \ldots, \vec{p}_6))\}.$$
Proof. We describe, for every geometric predicate, how to obtain an SPFOL$_{\mathbf{Q}}$ program that has the desired point-set as output upon input $R$.
* The geometric predicate l-l-crossing$(\vec{p}, \vec{p}_1, \ldots, \vec{p}_6)$. First, we observe that the Boolean conditions "$\vec{p}_1$ and $\vec{p}_2$ are distinct", "$\vec{p}_3$ and $\vec{p}_4$ are distinct", and "$\vec{p}_1, \ldots, \vec{p}_4$ are not collinear", with $\vec{p}_1, \ldots, \vec{p}_4$ points of $R$, can be expressed in FO + poly, whence, by Proposition 23, they can also be expressed in SPFOL$_{\mathbf{Q}}$.$^{14}$ Next, the intersection point $\vec{p}$ is found as the unique solution of the system of equations$^{15}$
$$\begin{cases} (p^y - p_1^y)(p_2^x - p_1^x) - (p^x - p_1^x)(p_2^y - p_1^y) = 0, \\ (p^y - p_3^y)(p_4^x - p_3^x) - (p^x - p_3^x)(p_4^y - p_3^y) = 0. \end{cases}$$
Solving this system of equations for $p^x$ and $p^y$ yields equations of the form $A_1 p^x = A_2$ and $B_1 p^y = B_2$, with $A_1 \neq 0$, $A_2$, $B_1 \neq 0$, and $B_2$ polynomials in the variables $p_1^x, p_1^y, \ldots, p_4^x, p_4^y$. Hence, if $D$ is the finite domain containing the active domain of $R$, all computations can be done with product variables ranging over domain $D$, whence there exists an SPFOL$_{\mathbf{Q}}$ program that computes the intersection point of two non-parallel lines. By Proposition 25, there exists an SPFOL$_{\mathbf{Q}}$ program that, upon input $R$, has the desired point-set as output.
* The geometric predicate l-c-crossing$(\vec{p}, \vec{p}_1, \ldots, \vec{p}_6)$. First, we observe that the Boolean conditions "$\vec{p}_1$ and $\vec{p}_2$ are distinct", and "$\vec{p}_4$ and $\vec{p}_5$ are distinct", with $\vec{p}_1, \vec{p}_2, \vec{p}_4, \vec{p}_5$ points of $R$, can be expressed in FO + poly, whence, by Proposition 23, they can also be expressed in SPFOL$_{\mathbf{Q}}$. Next, the intersection point $\vec{p}$ has to satisfy the following system of equations:
$$\begin{cases} (p^y - p_1^y)(p_2^x - p_1^x) = (p^x - p_1^x)(p_2^y - p_1^y), \\ (p^x - p_3^x)^2 + (p^y - p_3^y)^2 = (p_4^x - p_5^x)^2 + (p_4^y - p_5^y)^2. \end{cases}$$
We can isolate variable $p^x$ in the first equation and then substitute $p^x$ in the second equation, which yields a system of equations $A_1 p^x = B_1 p^y + C_1$ and $A_2 (p^y)^2 + B_2 p^y + C_2 = 0$, with $A_1 \neq 0$, $A_2 \neq 0$, $B_1$, $B_2$, $C_1$ and $C_2$ polynomials in the variables $p_1^x, p_1^y, \ldots, p_5^x, p_5^y$. The solution $p^y$ then equals $-B_2 / 2A_2$ when $B_2^2 - 4A_2C_2 = 0$, or $(-B_2 \pm \sqrt{B_2^2 - 4A_2C_2}) / 2A_2$ when $B_2^2 - 4A_2C_2 > 0$. When $B_2^2 - 4A_2C_2 < 0$, no real solution $p^y$ exists. Let $D$ be the finite domain
$^{14}$ Indeed, an inspection of the proof of Lemma 22 shows that, if the active range superset query of an FO + poly-expressible query $Q$ can be expressed in SPFOL$_{\mathbf{Q}}$, the query $Q$ can also be expressed in SPFOL$_{\mathbf{Q}}$. Hence, Proposition 23 remains valid for SPFOL$_{\mathbf{Q}}$.
$^{15}$ We shall denote the coordinates of a two-dimensional point $\vec{p}$ as $p^x$ and $p^y$.
containing the active domain of $R$. Then, all computations involved can be performed with product variables ranging over $D$. Hence, by Proposition 25, there exists an SPFOL$_{\mathbf{Q}}$ program that, upon input $R$, has the desired point-set as output.
* The geometric predicate c-c-crossing$(\vec{p}, \vec{p}_1, \ldots, \vec{p}_6)$. First, we observe that the Boolean conditions "$\vec{p}_2$ and $\vec{p}_3$ are distinct", and "$\vec{p}_5$ and $\vec{p}_6$ are distinct", with $\vec{p}_2, \vec{p}_3, \vec{p}_5, \vec{p}_6$ points of $R$, can be expressed in FO + poly, whence, by Proposition 23, they can also be expressed in SPFOL$_{\mathbf{Q}}$. Next, the intersection point $\vec{p}$ has to satisfy the equations
$$(p^x - p_1^x)^2 + (p^y - p_1^y)^2 = (p_2^x - p_3^x)^2 + (p_2^y - p_3^y)^2, \qquad (1)$$
$$(p^x - p_4^x)^2 + (p^y - p_4^y)^2 = (p_5^x - p_6^x)^2 + (p_5^y - p_6^y)^2, \qquad (2)$$
which is equivalent with the system consisting of Eq. (1), and Eq. (2) minus Eq. (1). The latter equation is of the form $A p^x + B p^y + C = 0$, with $A \neq 0$, $B$, and $C$ polynomials in the variables $p_1^x, p_1^y, \ldots, p_6^x, p_6^y$. Hence, the initial problem is reduced to the computation of the intersection of a line and a circle, both defined by points of which the coordinates are described as polynomials in variables that range over $D$, where $D$ is the finite domain containing the active domain of $R$. This problem is solved above.
We are now ready to prove the following result:
Proposition 43. Every SafeEuql-expressible query can be expressed in SPFOL$_{\mathbf{Q}}$.
Proof. Let $Q$ be a query which is expressible in SafeEuql by the formula $\varphi$. First, we observe that, for each point variable in a SafeEuql formula, only a finite number of points can be substituted to satisfy that formula. We show how the finite superset $D$, which contains all coordinates of these points, can be computed in SPFOL$_{\mathbf{Q}}$ by induction on the structure of the SafeEuql formula $\varphi$. We start with $D$ equal to $\{0, 1\}$.
For an atomic SafeEuql formula $\hat{R}(\vec{p}_1, \ldots, \vec{p}_k)$, we augment the domain $D$ with the active domain of the input database.
Assume now SafeEuql formulae $\psi_1(\vec{p}_1, \ldots, \vec{p}_k)$ and $\psi_2(\vec{q}_1, \ldots, \vec{q}_l)$. Let $D_1$ and $D_2$ be the finite domains which contain the coordinates of the finitely many points that can be substituted for point variables of, respectively, $\psi_1$ and $\psi_2$, to satisfy, respectively, $\psi_1$ and $\psi_2$.
Consider a subformula of $\varphi$ of the form $\psi_1(\vec{p}_1, \ldots, \vec{p}_k) \wedge \text{geom-pred}(\vec{p}, \vec{p}_{i_1}, \ldots, \vec{p}_{i_6})$, with $1 \leq i_1, \ldots, i_6 \leq k$ and geom-pred one of the following geometric predicates: l-l-crossing, l-c-crossing, or c-c-crossing.
Then, let $D$ be equal to $D_1 \cup D'$, where $D'$ is the finite domain of all coordinates of points in the output of the SPFOL$_{\mathbf{Q}}$ program of Lemma 42 for the corresponding geometric predicate geom-pred$(\vec{p}, \vec{p}_{i_1}, \ldots, \vec{p}_{i_6})$ upon input $D_1$. For subformulae of $\varphi$ of the following form:
(1) $\psi_1(\vec{p}_1, \ldots, \vec{p}_k) \wedge \psi_2(\vec{q}_1, \ldots, \vec{q}_l)$,
(2) $\psi_1(\vec{p}_1, \ldots, \vec{p}_k) \vee \psi_2(\vec{p}_1, \ldots, \vec{p}_k)$, or
(3) $\psi_1(\vec{p}_1, \ldots, \vec{p}_k) \wedge \neg\psi_2(\vec{p}_{i_1}, \ldots, \vec{p}_{i_m})$, with $1 \leq i_1, \ldots, i_m \leq k$,
let $D$ be the union of $D_1$ and $D_2$.
In all other cases, $D$ remains unchanged. Clearly, there exists an SPFOL$_{\mathbf{Q}}$ program which, upon an input of $Q$, outputs $D$. Moreover, this SPFOL$_{\mathbf{Q}}$ program expresses an active range superset query of $Q$. Furthermore, every SafeEuql formula has an equivalent FO + poly formula [1], whence $Q$ can also be expressed in FO + poly. Then, by Lemma 22, $Q$ can be expressed in SPFOL$_{\mathbf{Q}}$. □
In what follows, we say that a SafeEuql formula $\varphi(\vec{p})$ represents a finite domain $D$ if $\{(x, 0) \mid D(x)\} = \{\vec{p} \mid \varphi(\vec{p})\}$. To prove the converse of Proposition 43, we use the following lemma.
Lemma 44. Let $Q_1$ and $Q_2$ be expressions built from the variables $p_1, \ldots, p_m$, and the rational numbers, using addition, multiplication, and absolute-value square rooting. There exists a SafeEuql formula $\varphi$ such that
$$\{(\vec{p}, \vec{p}_1, \ldots, \vec{p}_m) \mid \varphi(\vec{p}, \vec{p}_1, \ldots, \vec{p}_m)\} = \{((p, 0), (p_1, 0), \ldots, (p_m, 0)) \mid D(p_1) \wedge \cdots \wedge D(p_m) \wedge Q_1(p_1, \ldots, p_m)\,p = Q_2(p_1, \ldots, p_m) \wedge Q_1(p_1, \ldots, p_m) \neq 0\},$$
provided that a representation of $D$, the domain of the variables $p_1, \ldots, p_m$, can be computed from the input in SafeEuql.
Proof. First, we observe that the point $(p, 0)$, with $p$ a rational number, can easily be constructed with ruler and compass from the points $\vec{0}$ and $\vec{e}_1$. Second, we show that addition, multiplication, and absolute-value square rooting on variables bound by $D$ can be computed by ruler and compass. Thereto, consider the geometric constructions for multiplication, shown in Fig. 6, and for addition and absolute-value square rooting, shown in Fig. 7. All these geometric constructions can be performed in SafeEuql, assuming that we add conjuncts that state that the coordinates of $\vec{p}$ and $\vec{q}$ satisfy the SafeEuql formula representing $D$. By a simple induction on the structure of the expressions under consideration, we obtain a SafeEuql formula $\varphi$ with the desired result. □
We are now ready to prove the converse of Proposition 43.
Proposition 45. Every SPFOL$_{\mathbf{Q}}$-expressible query can be expressed in SafeEuql.
Proof.
Since an SPFOL$_{\mathbf{Q}}$ program is defined by safe FO + linear + $P(D_1, \ldots, D_k)$ formulae (in which the polynomials have rational coefficients), it suffices to show that we can translate an arbitrary safe FO + linear + $P(D_1, \ldots, D_k)$ formula into a SafeEuql formula, given SafeEuql formulae that represent the finite domains $D_1, \ldots, D_k$. To do the translation, we observe that, in the SafeEuql query to be constructed, the point variables that occur in input relational predicates correspond to pairs of product variables in the input relational predicates of the given SPFOL$_{\mathbf{Q}}$ program. However, it is much easier to represent a product variable $p$ by a point variable $\vec{p}$ corresponding to the point $(p, 0)$. This transition is possible in SafeEuql, as the points $\vec{p} = (p, 0)$ and $\vec{q} = (q, 0)$ can be constructed from the point $\vec{r} = (p, q)$ with ruler and compass. This technique
Fig. 6. Geometric construction of the product.
Fig. 7. Geometric construction for addition (left), and absolute-value square rooting (right).
can also be used to construct a SafeEuql formula which represents the active domain of the input database. To obtain the output point variables of the SafeEuql program, which correspond to pairs of output variables of the SPFOL$_{\mathbb{Q}}$ program, the reverse construction is required, which can also be done with ruler and compass.
Assume now finite domains $D_1, \ldots, D_k$ for which there exist SafeEuql formulae $\varphi_1, \ldots, \varphi_k$ such that $\{\vec{p} \mid \varphi_i(\vec{p})\}$ equals $\{(x, 0) \mid D_i(x)\}$, with $1 \le i \le k$. Let $\varphi(x_1, \ldots, x_{2n})$ be a safe FO + linear + $P(D_1, \ldots, D_k)$ formula. The SafeEuql formula is obtained by inductively translating $\varphi$ to SafeEuql syntax.
Every occurrence of an atomic formula $R(t_1, \ldots, t_{2k})$ or $\mathrm{adom}(t)$, with $t, t_1, \ldots, t_{2k}$ real or product variables, is replaced by a SafeEuql formula as explained above.
Every occurrence of an atomic formula $Q(p_1, \ldots, p_m) > 0$, with $Q$ an expression built from the product variables $p_1, \ldots, p_m$ and the rational numbers, using addition, multiplication and absolute-value square rooting, is replaced by
$$\varphi_Q(\vec{p}, \vec{p}_1, \ldots, \vec{p}_m) \wedge \text{on-same-side}(\vec{p}, \vec{e}_1, \vec{0}, \vec{e}_2),$$
with $\varphi_Q$ the SafeEuql formula of Lemma 44 applied to $Q_1 = 1$ and $Q_2 = Q$. Similar formulae can be constructed for the formulae $Q(p_1, \ldots, p_m)\ \theta\ 0$ with $\theta$ one of the relation symbols $=$, $\neq$, $<$, $\le$, $>$, and $\ge$. Notice that product variables in $\varphi$ are appropriately bound to finite domains, and, hence, during the translation, we can add, for each point variable representing a product variable, a conjunct stating that the point variable belongs to the appropriate domain.
Every occurrence of an atomic formula $Q_1(p_1, \ldots, p_m)\,x = Q_2(p_1, \ldots, p_m) \wedge Q_1(p_1, \ldots, p_m) \neq 0$, with $Q_1$ and $Q_2$ expressions built from the product variables $p_1, \ldots, p_m$ and the rational numbers, using addition, multiplication and absolute-value square rooting, is replaced by the SafeEuql formula $\varphi_Q(\vec{p}, \vec{p}_1, \ldots, \vec{p}_m)$, where $\varphi_Q(\vec{p}, \vec{p}_1, \ldots, \vec{p}_m)$ is the result of Lemma 44 applied to $Q_1$ and $Q_2$. As explained above, we can add, for each point variable representing a product variable, a conjunct stating that the point variable belongs to the domain to which the product variable is bound.
This concludes the basis of the induction. The rest of the induction process is straightforward. □
The combination of Propositions 43 and 45 yields the main result of this section:
Theorem 46. The languages SPFOL$_{\mathbb{Q}}$ and SafeEuql are equivalent in expressive power.

8. Expressiveness of PFOL

We now return to our motivation for considering PFOL-finite (and SPFOL), namely to study the expressive power of the full query language PFOL. Indeed, the concept of finite representation of a semi-linear set, introduced in Section 5, allows us to lift query languages defined on finite databases to query languages defined on general linear constraint databases.
Let $Q$ be a query language defined on finite databases. We define $\mathrm{Lift}(Q)$ to be the query language defined on arbitrary linear constraint databases consisting of all compositions of the PFOL encoding program, a query of $Q$, and the PFOL decoding program.
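The definition of Lift can also be written as a single set expression. In the sketch below, $\mathrm{enc}$ and $\mathrm{dec}$ are shorthand introduced here (not notation from the paper) for the PFOL encoding and decoding programs of Section 5, and $B$ stands for an arbitrary linear constraint database:

```latex
\[
  \mathrm{Lift}(Q) \;=\; \{\, \mathrm{dec} \circ q \circ \mathrm{enc} \;\mid\; q \in Q \,\},
  \qquad
  (\mathrm{dec} \circ q \circ \mathrm{enc})(B) \;=\; \mathrm{dec}\bigl(q(\mathrm{enc}(B))\bigr).
\]
```

Read from right to left: the input is first encoded into its finite representation, the finite-database query $q$ is then applied to that representation, and the result is finally decoded back into a semi-linear set.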
The following property is now immediate.
Theorem 47. Let $P$ be a linear query language defined on arbitrary semi-linear databases that is at least as expressive as PFOL. Let $Q$ be a query language on finite databases whose expressive power is bounded by $P$. Then $\mathrm{Lift}(Q)$ is a query language on arbitrary databases whose expressive power is bounded by $P$. If, moreover, $Q$ is complete for the $P$-expressible queries from finite inputs to finite outputs, then $\mathrm{Lift}(Q)$ has the same expressive power as $P$.
Clearly, PFOL has the same expressive power as Lift(PFOL-finite), and, hence, by Theorem 46, as Lift(SPFOL).
The equivalence result for SPFOL and SafeEuql (Theorem 46) inspires us to propose the following definition.
Definition 48. An FO + poly$^{\mathrm{lin}}$ query $Q$ is constructible if there exists an SPFOL program that computes, from the canonical finite representation of the input, the active range of the canonical
finite representation of the output, or, equivalently, an active range superset of the finite representation of the output.
The following result is now immediate:
Theorem 49. The class of queries expressible in PFOL is the same as the class of constructible queries.
The relevance of Theorem 49 with respect to the expressive power of PFOL is that PFOL can compute a wide range of natural, linear queries, while remaining sound for the linear constraint databases.
Example 50. Let $S$ be an arbitrary semi-linear set. Clearly, a bounding box for $S$ is also a bounding box for both the convex closure and the affine support of $S$. It is now easy to see that a superset of the active domain of the canonical finite representation of the convex closure, respectively, the affine support, of $S$ can be computed from the canonical finite representation of $S$. As both the convex-closure query and the affine-support query on arbitrary semi-linear sets are FO + poly-expressible, we may conclude that they are also constructible, and hence PFOL-expressible.
In particular, each Boolean FO + poly-expressible query on linear databases is constructible and hence PFOL-expressible (cf. Proposition 23). As a consequence, collinearity is PFOL-expressible.
In summary, convex closure, affine support, and collinearity are PFOL-expressible, and not only the restricted versions of these queries considered in Section 3.
The observation that SPFOL is closely related to SafeEuql, a geometrically inspired point-based language, suggests that PFOL, while coordinate-based just like FO + linear, is geometrically much more meaningful than the latter.
Nevertheless, PFOL is not complete with respect to FO + poly$^{\mathrm{lin}}$, the set of all FO + poly-expressible linear queries.
This confirms what might already have been expected intuitively, in view of our observation in Section 3 that the particular choice of operations allowed on product variables in PFOL was to some extent arbitrary (though exactly what is needed to obtain the equivalence between PFOL-finite and SafeEuql). To prove this formally, we now give an example of a linear FO + poly-expressible query which is not PFOL-expressible.
Thereto, let $\vec{p}$ and $\vec{q}$ be arbitrary points of the plane such that $\vec{0}$, $\vec{p}$, and $\vec{q}$ are not collinear. It is well known that the point $\vec{r}$ on the line segment defined by $\vec{p}$ and $\vec{q}$ such that the angle defined by $\vec{0}$, $\vec{p}$, and $\vec{r}$ is the trisection of the angle defined by $\vec{0}$, $\vec{p}$, and $\vec{q}$ cannot be constructed with ruler and compass from $\vec{0}$, $\vec{e}_1$, $\vec{p}$, and $\vec{q}$. This result remains true even if to these points a fixed (i.e., independent of $\vec{p}$ and $\vec{q}$) finite set of points is added [15].
By the equivalence between SPFOL$_{\mathbb{Q}}$ and SafeEuql, established in Theorem 46, and by Theorem 49, we can now state the following result.
Proposition 51. The trisection query of type $[2] \to [2]$ computing for each point $\vec{p} \neq \vec{0}$ in the input all points on a trisector of the angle defined by $\vec{0}$, $\vec{e}_1$, and $\vec{p}$ cannot be expressed in PFOL.
Proof. Assume to the contrary that there exists a PFOL program that computes the trisection query. We now consider the query which outputs, on finite input, the output of the trisection query on that input intersected with the line $x = 1$, and, on infinite input, the empty set. Clearly, if the trisection query is expressible in PFOL, this modified trisection query is also expressible in PFOL. Moreover, the latter query yields finite outputs upon finite inputs, whence it is in PFOL-finite. By Theorem 38, there exists an SPFOL program
$$D_1 \leftarrow \varphi_1(x);\ \ldots;\ D_k \leftarrow \varphi_k(x);\ \{(x_1, x_2) \mid \varphi(R, x_1, x_2)\}$$
computing that query. If all expressions in $\varphi$ have rational coefficients, there exists, by Theorem 46, an equivalent SafeEuql expression, whence the trisection of an angle defined by the points $\vec{0}$, $\vec{e}_1$, and an arbitrary point $\vec{p}$ can be constructed by ruler and compass, which yields a contradiction.
Thus, assume that some expressions in $\varphi$ have real algebraic coefficients. Denote by $C_a$ the finite set of real algebraic coefficients occurring in a polynomial of $\varphi$ and let $\vec{C}_a$ be the set of points $\{(a, 0) \mid a \in C_a\}$. We can then generalize Lemma 44 to expressions with coefficients from $C_a$ and SafeEuql expressions with input $\vec{C}_a \cup \{\vec{p}\}$. Hence, using a proof similar to that of Proposition 45, we can construct a SafeEuql expression $\psi(R, S, \vec{x})$ such that, for every finite relation $R$,
$$D_1 \leftarrow \varphi_1(x);\ \ldots;\ D_k \leftarrow \varphi_k(x);\ \{(x_1, x_2) \mid \varphi(R, x_1, x_2)\}$$
and $\psi(R, \vec{C}_a, \vec{x})$ compute the same set of points in the plane. Hence, we can construct the trisection of an angle defined by the points $\vec{0}$, $\vec{e}_1$, and an arbitrary point $\vec{p}$ by ruler and compass from the finite set of points $\vec{C}_a \cup \{\vec{0}, \vec{e}_1, \vec{p}\}$, a contradiction. □
As already observed, the expressive power of (S)PFOL can be manipulated in a flexible way.
For instance, if we supplemented the operations allowed on product variables with a cubic-root construct, the trisection query considered above would become expressible. An intriguing question is whether completeness with respect to the FO + poly-expressible linear queries could be reached by adding all functions returning roots of polynomial equations with integer coefficients. This question is still open.
Conversely, we can also restrict the expressive power of PFOL by removing the square-root construct, which is not needed in the PFOL programs encoding and decoding an arbitrary semi-linear set. We believe that the safe language then obtained precisely expresses the constructions with ruler alone on finite sets of points in the plane.
Clearly, the languages obtained by these and similar manipulations deserve further study.
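The operations on product variables discussed in this section (addition, multiplication, and absolute-value square rooting) are the ones that Lemma 44 realizes by ruler and compass. As a rough numeric illustration, the Python sketch below replays the classical constructions with floating-point coordinates. It is an assumption that Figs. 6 and 7 use exactly these configurations (intercept theorem for the product, a semicircle over the segment from 0 to |p|+1 for the square root), and all function names are hypothetical; none of this is SafeEuql or PFOL syntax.

```python
import math

def line_through(a, b):
    """Line through points a and b, as (A, B, C) with A*x + B*y = C."""
    (x1, y1), (x2, y2) = a, b
    A, B = y2 - y1, x1 - x2
    return (A, B, A * x1 + B * y1)

def parallel_through(line, p):
    """Line parallel to `line` passing through point p."""
    A, B, _ = line
    return (A, B, A * p[0] + B * p[1])

def intersect(l1, l2):
    """Intersection point of two non-parallel lines (Cramer's rule)."""
    A1, B1, C1 = l1
    A2, B2, C2 = l2
    det = A1 * B2 - A2 * B1
    if det == 0:
        raise ValueError("lines are parallel")
    return ((C1 * B2 - C2 * B1) / det, (A1 * C2 - A2 * C1) / det)

def product(x, y):
    """Intercept-theorem construction: join (1,0) to (0,y), draw the
    parallel through (x,0); it meets the Y axis at (0, x*y)."""
    base = line_through((1.0, 0.0), (0.0, y))
    par = parallel_through(base, (x, 0.0))
    y_axis = (1.0, 0.0, 0.0)  # the line x = 0
    return intersect(par, y_axis)

def add(p, q):
    """Lay off the segment of signed length q from (p,0) along the X axis."""
    return (p + q, 0.0)

def sqrt_abs(p):
    """Semicircle over the diameter from (0,0) to (|p|+1, 0); the
    perpendicular to the X axis at (|p|, 0) meets it at height sqrt(|p|)."""
    a = abs(p)
    center = radius = (a + 1.0) / 2.0
    # Circle-line intersection; max() guards against rounding below zero.
    h = math.sqrt(max(0.0, radius ** 2 - (a - center) ** 2))
    return (a, h)
```

For instance, `product(3.0, 4.0)` yields a point on the Y axis whose second coordinate is 12 up to rounding, and `sqrt_abs(-9.0)` yields a point at height 3 above the X axis. The height follows from the right-triangle altitude relation h² = |p| · 1 in the semicircle.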
Acknowledgments

We thank an anonymous referee for many valuable comments that helped us to improve the paper.

References

[1] F. Afrati, T. Andronikos, T. Kavalieros, On the expressiveness of first-order constraint languages, in: G. Kuper, M. Wallace (Eds.), Proceedings of the First Workshop on Constraint Databases and Applications, Lecture Notes in Computer Science, Vol. 1034, Springer, Berlin, 1995, pp. 22–39.
[2] F. Afrati, S. Cosmadakis, S. Grumbach, G. Kuper, Linear versus polynomial constraints in database query languages, in: A. Borning (Ed.), Proceedings of the Second International Workshop on Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, Vol. 874, Springer, Berlin, 1994, pp. 181–192.
[3] M. Benedikt, G. Dong, L. Libkin, L. Wong, Relational expressive power of constraint query languages, J. ACM 45 (1) (1998) 1–34.
[4] M. Benedikt, L. Libkin, Relational queries over interpreted structures, J. ACM 47 (4) (2000) 644–680.
[5] M. Benedikt, L. Libkin, Safe constraint queries, SIAM J. Comput. 29 (2000) 1652–1682.
[6] A. Brodsky, Y. Kornatzky, The lyric language: querying constraint objects, in: M. Carey, D. Schneider (Eds.), Proceedings of the 14th ACM SIGMOD International Conference on Management of Data, ACM Press, New York, 1995, pp. 35–46.
[7] J. Chomicki, D. Goldin, G. Kuper, Variable independence and aggregation closure, in: Proceedings of the 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, New York, 1996, pp. 40–48.
[8] F. Dumortier, M. Gyssens, L. Vandeurzen, D. Van Gucht, On the decidability of semi-linearity for semi-algebraic sets and its implications for spatial databases, J. Comput. Syst. Sci. 58 (3) (1999) 68–77; see also the Corrigendum, J. Comput. Syst. Sci. 59 (3) (1999) 535–571.
[9] S. Grumbach, P. Rigaux, M. Scholl, L. Segoufin, DEDALE, a spatial constraint database, in: S. Cluet, R. Hull (Eds.), Proceedings of the Sixth International Workshop on Database Programming Languages, Lecture Notes in Computer Science, Vol. 1369, Springer, Berlin, 1997, pp. 38–59.
[10] S. Grumbach, J. Su, C. Tollu, Linear constraint query languages: expressive power and complexity, in: D. Leivant (Ed.), Proceedings of the Logic and Computational Complexity Workshop '94, Lecture Notes in Computer Science, Vol. 960, Springer, Berlin, 1995, pp. 426–446.
[11] P. Kanellakis, D. Goldin, Constraint programming and database query languages, in: M. Hagiya, J. Mitchell (Eds.), Proceedings of the Second Conference on Theoretical Aspects of Computer Software, Lecture Notes in Computer Science, Vol. 789, Springer, Berlin, 1994, pp. 96–120.
[12] P. Kanellakis, G. Kuper, P. Revesz, Constraint query languages, J. Comput. Syst. Sci. 51 (1995) 26–52.
[13] G. Kuper, L. Libkin, J. Paredaens, Constraint Databases, Springer, Berlin, 2000.
[14] J. Nievergelt, M. Freeston (Eds.), Special issue on spatial data, Comput. J. 37 (1) (1994) 1–42 (special issue).
[15] F. Van Oystaeyen, personal communication, 1999.
[16] J. Paredaens, B. Kuijpers, G. Kuper, L. Vandeurzen, Euclid, Tarski, and Engeler encompassed, in: S. Cluet, R. Hull (Eds.), Proceedings of the Sixth International Workshop on Database Programming Languages, Lecture Notes in Computer Science, Vol. 1369, Springer, Berlin, 1997, pp. 1–24.
[17] N. Pissinou, R. Snodgrass, R. Elmasri, I. Mumick, T. Özsu, B. Pernici, A. Segev, B. Theodoulidis, U. Dayal, Towards an infrastructure for temporal databases, SIGMOD Record 23 (1) (1994) 35–51.
[18] L. Vandeurzen, M. Gyssens, D. Van Gucht, An expressive language for linear spatial database queries, Theoret. Comput. Sci. 254 (1–2) (2001) 423–463.