Verification of Data-Intensive Web Services - FORTH-ICS

Verification of Data-Intensive Web Services

Grigoris Karvounarakis

December 21, 2004

Abstract

Data-intensive web services are interactive web applications that are supported by databases. As such services become more complex, reasoning about their correctness is a non-trivial task. Unfortunately, techniques from the verification literature do not handle programs whose execution involves steps such as queries over a database. For this reason, database researchers have proposed ways to model data-intensive web services using databases. Moreover, they have identified interesting verification problems whose answers can help in reasoning about the correctness of web service specifications. Most of these problems concern the verification of formulas of different temporal logics (LTL, CTL). Since these problems are undecidable in general, appropriate restrictions have been proposed both for the models and for the classes of properties to be verified, and complexity bounds for the corresponding verification problems have been established. This paper reviews this line of research, comparing different proposals in terms of the applications they are able to express and the classes of properties that can be verified on them. We also identify limitations of these approaches and suggest interesting extensions and related problems that could be the object of future research in this area.

1 Introduction

The last several years have seen an explosion of activity around web services, viewed broadly as interactive web applications providing access to information as well as transactions. Typically, and especially in application fields such as e-commerce or science, these services require database support to manipulate large data sets and provide features that are common in databases, such as concurrency control and recovery. As web services become more complex, the need for tools to verify the correctness of service specifications becomes more important. However, traditional program verification techniques (e.g., model checking [6]) do not apply to programs that interact with databases (e.g., by executing queries), since they treat the database as a “black box”. For this reason, database researchers [2, 3, 11, 12, 7] followed a different approach, proposing models of computing devices that manipulate relational data and can be expressed with database languages. These models were inspired by the traditional computing device of transducers, i.e., automata with output. Service specifications in such models have the form of logic programs in rule-based languages over relational vocabularies, similar to datalog programs [1]. As a result, one can employ techniques from logic, database theory and verification to reason about the behaviour and correctness of such web service specifications. This paper attempts a critical analysis of this line of work, comparing the different models and reasoning techniques that have been proposed for the static analysis of web services. We also attempt an evaluation of the proposed models in terms of their ability to express real-life web service applications. Moreover, we present different properties that one may want to verify about such specifications, illustrate how the expressiveness of both the model and the language in which the properties are defined affects the decidability of the verification problem, and discuss the resulting trade-offs involved in choosing the appropriate model and property language for a given web service application. Finally, we show some complexity results presented in these papers for the cases where the verification problem is decidable. The rest of this paper is organized as follows: In Section 2 we present an example application that illustrates the role of databases in common web services and motivates the creation of such models for their specification. In Section 3 we present the different models that have been proposed, analyze their strengths and shortcomings, and explain how they can be used to express our example application. In Section 4 we identify important problems that have been proposed in order to reason about the correctness of service specifications, focusing mostly on verification of properties that can be expressed as formulas of temporal logics [8]. Finally, in Section 5 we recollect the most important contributions of these papers and outline some directions for future research.

2 A Motivating Example Application

As a running example throughout this paper we consider a common electronic commerce web service, which allows users to order books from an online bookstore and is responsible for receiving payments and subsequently shipping orders. More precisely, the store maintains a database that contains information about the prices of available books. Users can submit orders for books and are sent a bill with the price they have to pay. Users respond to the bill by sending an appropriate payment, and after this payment is received the store ships the books to the user. The system may also want to keep track of orders that have been received but not yet paid for, e.g., in order to be able to recover from a crash without losing submitted orders. Applications such as the one just described obviously require database support in order to handle large amounts of service data and access them efficiently (e.g., for the store catalogs). Moreover, common features of database systems, such as transactions, concurrency control, distribution and recovery from failure, are very important for this kind of application. As a result, any programming language used for the specification of such applications would have to interact with a database and possibly reimplement some of its functionality. This motivates - as a simpler and more uniform approach - the use of databases to represent the dynamic part of the application as well. Analogous extensions, which add dynamic capabilities to databases, have already been proposed in the context of active databases. Furthermore, rule-based database languages can serve as high-level, declarative specification languages for such services. This is desirable from a business-model perspective, since it allows business experts who are not computer experts to abstract away implementation details and focus on the business logic.
As a positive side-effect, the high-level specification language can also provide a rapid prototype implementation of the application, e.g., on top of a database system, whose performance can benefit from database query optimization and parallel evaluation techniques. More importantly, such formal specifications are necessary in order to pursue logical reasoning about the behaviour of web services.


3 Formal Models for Data-Intensive Web Services

In this section we are going to explore the different models that have been proposed along these lines, in order to represent such services using rule-based database languages. For each model we will also illustrate how it can be used to express the aforementioned example application.

3.1 Relational Transducers

Relational transducers, proposed in [2, 3], are the first model along these lines. In general (e.g., in language theory), transducers are automata with output. Their relational counterpart uses relations to represent input, state and output. Input relations are populated by the “external” program input; an arbitrary number of tuples can be input at every step of the transducer execution. The transition and output functions of the transducer are expressed as datalog¬ programs ([1]) over the transducer schema, i.e., the database, input, state and output relations. For state relations the semantics of datalog are inflationary, i.e., newly asserted relational atoms are appended to the corresponding relation at every step. On the other hand, the semantics of output rules are non-inflationary, i.e., at every step only the newly inferred facts are retained. Note that the use of relations (i.e., sets, which are unordered) to represent the state of a machine automatically imposes a limitation on its expressiveness: suppose that at some point a state relation contains tuples a and b, and let in_a and in_b be the corresponding inputs that resulted in the insertion of those tuples into the state relation. Then the machine cannot distinguish which tuple was inserted first. As a result, it cannot express a situation in which on input in_a followed by in_b it goes to a different state than on input in_b followed by in_a. This restriction is not so important if one can use arbitrary expressions in the state rules, since one could then circumvent the problem by adding different symbols to the state along the two different input sequences above; however, it becomes more significant when combined with the additional restriction that state rules can only accumulate previous input (introduced in the next section for spocus relational transducers, to achieve decidability of some problems).
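To make the two rule semantics concrete, here is a minimal Python sketch of one execution step (hypothetical code, not from the papers: relations are sets of tuples, and rule bodies are plain functions standing in for datalog¬ programs):

```python
# Sketch: one execution step of a relational transducer. Relations are sets
# of tuples; each rule is a function deriving new tuples from the current
# database, input and state (standing in for a datalog-with-negation program).

def step(db, inp, state, state_rules, output_rules):
    # State rules are inflationary: derived tuples are ADDED to the existing
    # state relations and can never be removed.
    new_state = {rel: set(tuples) for rel, tuples in state.items()}
    for rel, rule in state_rules.items():
        new_state[rel] |= rule(db, inp, state)
    # Output rules are non-inflationary: only the facts derived at THIS step
    # are retained; previous outputs are discarded.
    output = {rel: rule(db, inp, state) for rel, rule in output_rules.items()}
    return new_state, output
```

Note that both kinds of rules are evaluated against the state as it was at the start of the step; only the state relations carry information forward, growing monotonically across steps.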
Moreover, since the state relations grow at every step - due to the inflationary nature of state rules - a relational transducer can never return to a previous state, i.e., it cannot contain a loop, which would require some transition that removes tuple(s) from the state relations. This will become more obvious in the following implementation of our simple example application in the relational transducer model. The schema would look like the following (where, intuitively, U stands for a user-id, X for a product-id and Y for a price):

schema
  database: price(X,Y);
  input: order(U,X), pay(U,X,Y);
  state: past-order(U,X);
  output: sendbill(U,X,Y), ship(U,X);

Consider now the rule updating the state relation past-order. Since the state rules of relational transducers are inflationary, they can only add tuples to this relation, and it is impossible to remove those tuples later. As a result, the natural way to implement this functionality - as a loop that adds tuples for submitted orders and removes them when payment has been received - is not expressible in this model. To overcome this problem, one can consider introducing another state relation past-pay(U,X,Y), which is updated when a payment is received. It is then possible to retrieve the orders that have not yet been paid for with the query past-order − π_{U,X} past-pay. With this, the following datalog rules provide an implementation of our bookstore application:

state rules:
  past-order(U,X) +:- order(U,X);
  past-pay(U,X,Y) +:- pay(U,X,Y);
output rules:
  sendbill(U,X,Y) :- order(U,X) ∧ price(X,Y) ∧ ¬ past-pay(U,X,Y);
  ship(U,X) :- past-order(U,X) ∧ price(X,Y) ∧ pay(U,X,Y) ∧ ¬ past-pay(U,X,Y).

The output rules of this program express the natural application semantics, i.e., that a bill should be sent if an order has been placed for some book and the book has not been paid for yet. Moreover, a book is shipped if it has been ordered in the past, as soon as the correct payment is received (i.e., pay(U,X,Y) arrives as input). The conjunction with ¬ past-pay(U,X,Y) ensures that the book is not shipped again one execution step later, when the payment has been logged as a past payment. Notice another shortcoming that comes from the use of relations to represent state and the inability to remove tuples: since relations are sets, for a given user u and value x there can only be one tuple past-order(u,x). If we want to allow a customer to buy the same book more than once (which is quite likely!), x cannot be, e.g., the ISBN of the book. For suppose it is: after the first time the book is ordered, past-order contains ⟨u, x⟩, and after the customer has paid for the first order, past-pay contains ⟨u, x, y⟩, where y is the price of book x. As a result, the output rules for sendbill(U,X,Y) and ship(U,X) will never fire again for this customer and book (because of the conjunction with ¬ past-pay(U,X,Y))! To overcome this problem one has to use a value for X that is unique for every copy of the book (note that this means that price would have to contain one tuple for every copy of every book in stock!).
Alternatively, we could add another attribute to the order/past-order relations to represent a unique order id. However, in this model it is not possible to express such a constraint (i.e., that this attribute is unique) or to add a rule that describes how it is created. As a result, following this approach we would essentially have a partial specification of the service, and we would not be able to verify properties that involve this attribute. On the other hand, one can easily see how short, simple and easy to understand this specification is, e.g., compared to writing a similar application in a general-purpose programming language. In the next section we are going to discuss how, with some further restrictions, this specification also facilitates static analysis.
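Under the per-copy identifier convention just described, the behaviour of this specification can be replayed directly. The following Python sketch is hypothetical (relation names are rendered with underscores) and simulates the rules above over a sequence of steps:

```python
# Sketch: replaying the relational-transducer bookstore rules. All relations
# are sets of tuples; state relations (past_order, past_pay) are inflationary,
# and output rules are evaluated against the state BEFORE this step's update.

def bookstore_step(price, order, pay, past_order, past_pay):
    # Output rules (non-inflationary):
    sendbill = {(u, x, y) for (u, x) in order for (x2, y) in price
                if x2 == x and (u, x, y) not in past_pay}
    ship = {(u, x) for (u, x) in past_order for (x2, y) in price
            if x2 == x and (u, x, y) in pay and (u, x, y) not in past_pay}
    # State rules (inflationary: tuples are only ever added):
    past_order = past_order | order
    past_pay = past_pay | pay
    return sendbill, ship, past_order, past_pay
```

Running it on a single copy shows the bill fired at the order step and the shipment at the payment step; because past_pay only grows, a later attempt to ship the same copy is blocked, which is exactly the behaviour analysed above.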

3.2 ASM Transducers

At first sight this model, which was proposed in [11, 12], looks quite different from the previous one, because it is based on different logical foundations than relational transducers, namely on the theory of Abstract State Machines (originally called “Evolving Algebras” [9]). However, it can be viewed as an extension of relational transducers that also overcomes some of their shortcomings. An ASM transducer schema also consists of database, input, state (confusingly called “memory” relations in the paper) and output relations, together with a program consisting of state and output rules. For output rules the semantics are non-inflationary, as in the case of relational transducers, and the left-hand side can only occur positively. On the other hand, state rules can both add and remove tuples from state relations; deletions are expressed by rules with negation on the left-hand side (similar to the one-step semantics of datalog¬¬). The right-hand side of the rules is also more expressive, since it allows the use of any first-order sentence (although only conjunctive expressions appear in the paper's examples, some expressions in [7] also contain quantifiers), instead of just the conjunctive bodies of the datalog rules employed by relational transducers. Note that the ability to remove tuples from the state relations allows much more flexibility compared to the model of the previous section. The following implementation of our bookstore application should serve to justify this claim (the syntax used in this example, as in the examples for the other models, is not the same as in the original papers; a common syntax has been chosen for all three models in an attempt to hide their syntactic discrepancies and focus on the differences in expressiveness):

schema
  database: price(X,Y);
  input: order(U,X), pay(U,X,Y);
  state: past-order(U,X);
  output: sendbill(U,X,Y), ship(U,X);
state rules:
  past-order(U,X) +:- order(U,X) ∧ ¬ past-order(U,X)
  ¬past-order(U,X) +:- past-order(U,X) ∧ pay(U,X,Y) ∧ price(X,Y)
output rules:
  sendbill(U,X,Y) :- order(U,X) ∧ price(X,Y) ∧ ¬ past-order(U,X);
  ship(U,X) :- past-order(U,X) ∧ price(X,Y) ∧ pay(U,X,Y).

Observe that the ability to remove tuples from the state relations allows one to express the order-bill-payment process as a loop: by removing the order from past-order the system can return to the state it was in before the order was placed. This way we can also use, e.g., the ISBN of a book as its identifier in the price catalog; the only limitation enforced by these rules is that one cannot buy the same book again until the payment for the first order has been sent. Of course, the limitations that come from the lack of order in relations remain, i.e., given a state relation with contents {a, b} one cannot tell the order in which those tuples were inserted (and thus cannot express programs that would behave differently depending on that order).
However, this limitation is not so important if one is allowed to use any expression in the state rules, as explained earlier.
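A hypothetical Python sketch of the modified state-rule semantics makes the loop explicit: insertion and deletion sets are both computed against the old state and then applied (applying deletions second is just one of several possible conventions):

```python
# Sketch: one step of an ASM-transducer state update. Each state relation has
# rules that derive tuples to insert and rules that derive tuples to delete
# (the latter correspond to rules with a negated head). Both sets are computed
# against the OLD state; applying deletions after insertions here is only one
# possible convention for resolving insert/delete conflicts.

def asm_state_step(state, insert_rules, delete_rules, db, inp):
    new_state = {rel: set(tuples) for rel, tuples in state.items()}
    for rel, rule in insert_rules.items():
        new_state[rel] |= rule(db, inp, state)
    for rel, rule in delete_rules.items():
        new_state[rel] -= rule(db, inp, state)
    return new_state
```

With an insert rule triggered by orders and a delete rule triggered by matching payments, the state relation returns to empty after the payment step - precisely the loop that the inflationary model of Section 3.1 cannot express.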

3.3 Web Page Schemas

The most recent specification language, introduced in [7], allows the specification of a web service as a set of web page schemas. This model is a small extension of ASM transducers that supports a limited - but, from the point of view of practical web service applications, very natural - form of interaction between transducers, namely a (unidirectional) transition from one transducer (web page) to another. All these transducers access the same schema (i.e., database, input, state and output relations), but only one of them is considered to be the “current” page at any point in time (i.e., the one corresponding to the page the user is viewing), and only the rules of that transducer may be fired. The transition is expressed by a new category of rules, namely target rules, which specify the next page to be considered current. Notice, however, that the fact that there is only one current page at any given time limits the ability of the model to handle multiple concurrent programs “seamlessly”: while in the previous models all rules were considered at every step, and some of them could be applied to data corresponding to different concurrent runs (e.g., for different users using the system at the same time), in this model only one page is current at every step, and that page corresponds to only one of the users. As a result, one could claim that this model expresses the behaviour of the system within a particular run, roughly corresponding to the interaction with one user within a session, while the models of the previous sections represent the behaviour of the whole system (and there can be cases in which different concurrent runs are not independent; this would be the case, for example, if we maintained a count of the number of books in stock and decreased it when a book was shipped). Another interesting feature of this model is that it allows a limited form of input that comes from “outside” (i.e., a user) instead of from the database (i.e., input relations), as is common in web services, e.g., when users are required to log in to a website. Such input is then treated as a constant for the rest of the run. More interestingly, this model gives the transducer some limited control over the input it can receive: one can specify as “input options” a set of tuples for the input relations, among which the user selects those to be considered as input tuples for the next execution step. These input option relations are populated by expressions over the contents of the database, state and output relations, evaluated after the rules of the previous page have been fired.
This feature, while natural in web service applications, where users often select products and options from lists that are presented to them, imposes a restriction that simplifies the static analysis of such specifications: it bounds the maximal input flow, i.e., the amount of data that can be fed as input at any step. This was also considered a reasonable assumption (though not enforced through the specification language) by Spielmann [12], as will be explained in the next section. To facilitate comparison with the models presented above, we illustrate a possible implementation of our bookstore application using web page schemas. It employs two pages: one in which an already logged-in user can order books, called OP (for Order Page), and one in which users can send payments for billed orders, called PP (for Payment Page). It also assumes the existence of a Home Page (HP), to which the system goes if the user clicks the “cancel” button in OP or after clicking “pay” in PP, and of a constant user that holds the login name entered on an earlier page:

Page OP
inputs I_OP: order(X), button(X)
input rules:
  Options_order(X) :- price(X,Y)
  Options_button(X) :- X = “order” ∨ X = “cancel”
state rules:
  past-order(U,X) +:- order(X) ∧ U = user ∧ ¬ past-order(U,X)
target web pages T_OP: PP, HP
target rules:
  PP :- order(X) ∧ button(“order”) ∧ ¬ past-order(U,X);
  HP :- button(“cancel”);
output (called “action” in the paper) rules:
  sendbill(U,X,Y) :- order(X) ∧ U = user ∧ button(“order”) ∧ price(X,Y) ∧ ¬ past-order(U,X);

Page PP
inputs I_PP: pay(U,X,Y), button(X)
input rules:
  Options_button(X) :- X = “pay”
state rules:
  ¬past-order(U,X) +:- past-order(U,X) ∧ pay(U,X,Y) ∧ price(X,Y)
target web pages T_PP: HP
target rules:
  HP :- button(“pay”);
output rules:
  ship(U,X) :- past-order(U,X) ∧ pay(U,X,Y) ∧ button(“pay”) ∧ price(X,Y) ∧ ∃Z (credit-limit(U,Z) ∧ Z ≥ Y).

In the last rule we have slightly extended our application and its schema with a database relation credit-limit, which contains each user's remaining credit balance, in order to illustrate the use of quantifiers in the right-hand side of a rule. The input rule for Options_button also depicts the use of disjunction; in general, any first-order sentence may be used. Observe that the introduction of the notion of a web page, and of target rules that specify the transitions between pages, does not seem to add any significant power to the model. Indeed, one could express the specification above using a single ASM transducer, by simply adding a state relation CurrentPage and replacing target rules with rules that update this state relation appropriately. Then, to achieve the “separation” of rules into different schemas, it would suffice to add to the right-hand side of every rule a conjunction with CurrentPage(P), where P is the name of the page in whose schema that rule belongs.
However, the introduction of web page schemas yields a clearer design and, more importantly, distinguishes web pages as a special construct of the model, making it possible - as will be shown in the next section - to express properties about the transitions themselves. If one treated web pages as part of the state, their update at every transition would result from the firing of two seemingly independent rules, which have the effect of removing the old page from the CurrentPage relation and adding the new one. As a result, it would be impossible to statically reproduce the target rules from such a specification and, thus, to reason about properties of the sequence of pages in a run. Put differently, using a state relation one can express properties about its contents at some point in time, but not about the sequence of its contents over a period of time.
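As a sketch of how target rules drive a run, consider the following hypothetical Python rendering: pages carry output, state and target rules; only the current page's rules fire at each step, and at most one target guard is assumed to hold per step:

```python
# Sketch: a run over a set of web page schemas. At each step only the rules
# of the current page fire; a target rule whose guard holds selects the next
# current page (we assume at most one guard fires per step).

def run_pages(pages, start, inputs, db, state):
    current = start
    trace = [current]
    for inp in inputs:
        page = pages[current]
        # output rules of the current page only (non-inflationary)
        outputs = {rel: rule(db, inp, state) for rel, rule in page["output"].items()}
        for update in page["state"]:
            update(db, inp, state)          # state rules mutate the shared state
        for target, guard in page["target"]:
            if guard(db, inp, state):
                current = target
                break
        trace.append(current)
    return trace
```

Encoding the same behaviour in a single ASM transducer would replace the target list with rules updating a CurrentPage state relation, as discussed above; the explicit target rules keep the page graph statically visible.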

4 Reasoning about Services

The main reason for pursuing formal models for the specification of data-intensive web services was to enable automated reasoning about their correctness, based on static analysis of their specification and using techniques from logic and verification. Although this “high-level” aim has been the main focus of researchers in all papers published in the area, the precise research questions have evolved, either because some research questions were considered more important than others or because some of the problems were found to be reducible to others. Indeed, the first paper(s) ([2, 3]) proposed several questions in an attempt to identify the most interesting problems, such as:

• Log validation: given a log sequence, consisting of the contents of some - but not all - of the relevant relations, can we check whether there exists a sequence of inputs for which the system would produce this log?

• Goal reachability: given an FO sentence on the schema that represents a “goal”, does the execution ever satisfy this sentence?

• (Limited) verification of temporal properties, expressed by conjunctions of sentences of the form ∀x̄[φ(state, db, in)(x̄) → ψ(state, db, in)(x̄)]. Let T_past-input denote the set of all properties of this form. A run satisfies such a property if the sentence is satisfied at every transition of the run. Such sentences can express “backwards” properties (from a temporal perspective) of the kind: “if φ is satisfied at some point in the run, then there must have been an earlier state in the run in which ψ was satisfied” (assuming both φ and ψ remain satisfied after they are first asserted). For example, the property “whenever a product is shipped, a payment with the correct price for that product has been received at some previous step” belongs to T_past-input.

• Another limited kind of verification that was proposed requires the addition of special error rules to the specification. One would then like to verify that all runs of a transducer are error-free (i.e., no error rules are ever fired), or that some property in T_past-input is satisfied by every error-free run.

• Containment/equivalence testing between two specifications.

Over time it turned out that the most interesting of these problems, for expressing properties about the correctness of specifications, was the verification of temporal properties.
However, in its original formulation it was not sufficient, because the proposed class of properties, presented above, does not contain all interesting temporal properties. In particular, it does not contain formulas that express statements about the future, such as formulas of Linear Temporal Logic [8] employing the operator F (read “eventually”). As a result, more recent work focused on the problem of verification of LTL-FO [8] properties for relational transducers, which allows the use of temporal operators in expressions with relational atoms. This was further justified by the fact that the problems of goal reachability and log validation can be reduced to verification of LTL-FO properties, as proved in [12]. Finally, [7] takes verification of temporal properties one step further, by allowing even more expressive temporal operators, such as those used in the branching-time logics CTL/CTL∗ [8], which allow quantification over paths in runs. Unfortunately, all of these interesting problems are undecidable, for all of these transducer models. In order to overcome undecidability one needs to impose some restrictions on the expressiveness of the model and/or simplifications of the problems to be solved. Some of these restrictions may be somewhat “artificial”, in that they are devised to achieve decidability of some interesting problems but may limit the set of interesting applications that can be expressed in the model. In some cases there may exist more “natural” restrictions that come from the kind of applications that we want to express. One restriction common to all three models is that no updates of the database relations are possible during a run; as a matter of fact, the specification languages do not even allow such updates to be expressed (i.e., there are no “database rules” similar to “state rules”). On the other hand, since some (limited) form of updates is allowed for some relations (namely, the state relations), one may wonder whether allowing insertions into and deletions from database relations as well would make verification much harder. In the following sections we present the different restrictions that were proposed, and try to assess the extent to which they limit the ability to express useful applications in the resulting model. Since verification of temporal properties has turned out to be the most important among these problems, we focus our presentation on it, also showing the “evolution” of the problem over time. Moreover, we illustrate the trade-offs between the expressiveness of the model and of the temporal language that are needed for the problem of verifying sentences of that language on that model to remain decidable.

4.1 Spocus transducers

Along these lines, [2, 3] propose a restriction of relational transducers, called spocus, which stands for Semi-Positive Output and CUmulative State. The first part of the name corresponds to the restriction on output rules: the body of a rule must be a conjunction of relational atoms, positive or negative, such that every variable used in the rule occurs in (i.e., is bound to) at least one positive atom. State rules are much more restricted, in that their body can only contain a single positive input relation atom, i.e., each of them can only accumulate the contents of an input relation over previous steps. As a result, spocus transducers suffer from the problem first discussed in Section 3.1: after receiving inputs a and b in some order, and following the corresponding “transitions”, they cannot determine the order in which the input was received. One could therefore claim that this restriction imposes a severe limitation on the practicality of the model. On the other hand, spocus transducers are expressive enough to model some interesting applications, such as our bookstore application (with the limitations analysed in Section 3.1); indeed, the relational transducer that implements the bookstore web service in Section 3.1 is a spocus transducer. Moreover, given this restriction, an interesting verification problem is decidable in NEXPTIME, namely whether every run of a spocus transducer T satisfies a sentence ψ ∈ T_past-input. This class of sentences can be used to make judgements about input that was received in previous steps, given the current “state” of the machine. For instance, one such property, which the owner of the bookstore would like to verify in our bookstore application, is whether every time the system ships a book to a customer, a proper payment for that book has already been received from that customer:

∀u∀x∀y [(ship(u, x) ∧ price(x, y)) → past-pay(u, x, y)]

Moreover, goal reachability - i.e., the problem of whether, given a spocus transducer T and a goal γ expressed as an FO sentence, there exists a run of T whose output in the last step satisfies γ - is also decidable in NEXPTIME. The same is true for verification of properties over error-free runs, on the condition that error rules do not contain any negative state literals. However, T_past-input does not include interesting properties such as:

∀u∀x∀y(past-order(u, x) ∧ price(x, y) ∧ pay(u, y) → F ship(u, x))

which states that if a customer pays for a book she has ordered, then eventually the book will be shipped to her.
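While deciding such a property for all runs is the hard problem, checking it on one concrete run is straightforward. The following hypothetical Python sketch evaluates the shipping property above on a run given as a list of per-step snapshots of the relevant relations:

```python
# Sketch: checking the T_past-input property
#   forall u, x, y:  ship(u, x) and price(x, y)  ->  past_pay(u, x, y)
# at every step of one concrete run. A run is a list of snapshots, each
# mapping a relation name to a set of tuples. This checks a SINGLE run;
# the verification problem asks whether the property holds on ALL runs.

def holds_on_run(run, price):
    for snapshot in run:
        for (u, x) in snapshot.get("ship", set()):
            for (x2, y) in price:
                if x2 == x and (u, x, y) not in snapshot.get("past_pay", set()):
                    return False
    return True
```

The gap between this per-run check and verification over all runs (and all databases) is exactly what makes the decidability results discussed here non-trivial.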

4.2 Input-bounded quantification and LTL-FO verification

Motivated by this shortcoming, more recent proposals pursued verification of a richer temporal language, namely LTL-FO, an extension of (Propositional) Linear Temporal Logic that allows the use of relational atoms. This logic is based on temporal operators such as neXt and Until, which can be combined to also express Future (also called “eventually”), Before and Globally (also called “always”). Verification of sentences of this language was first investigated in [11, 12] in the context of ASM transducers. This work followed a different approach in order to achieve decidability: it identifies a simplifying, yet quite natural, assumption, by requiring the existence of a bound on the maximal amount of input that can be “fed” to the system at every execution step. Such an assumption is very reasonable, even just for technical reasons - e.g., the memory size of the computer on which the system is running. Moreover, it has the effect of bounding one of the arbitrary parameters that affect the size of the state space that must be explored in order to “cover” all possible executions of such a specification. In order to take advantage of this fact, a slight restriction on both the model and the temporal language is proposed, namely that all formulas used in the rules that form the specification of the web service, and in the properties to be verified, are input-bounded, i.e., the range of first-order quantifiers in the formulas must be restricted to the active domain of the current input. Formally, the set of input-bounded formulas is obtained from FO by replacing the quantified-formula formation rule with the following:

• If x̄ is a tuple of variables, α is an input atom with x̄ ⊆ free(α), and φ is a formula such that for every state and output atom β occurring in φ, free(β) ∩ x̄ = ∅, then ∃x̄(α ∧ φ) and ∀x̄(α → φ) are formulas.

An ASM transducer is input-bounded if the formulas in the bodies of all its rules are input-bounded.
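Since input-boundedness is a purely syntactic condition, it can be checked by a recursive walk over the formula tree. The following Python sketch is hypothetical (the papers define no such tool): formulas are nested tuples, atoms are tagged with the kind of relation they mention, and only the existential formation rule is shown (the universal case is symmetric):

```python
# Sketch: syntactic input-boundedness check. A formula is one of
#   ("atom", kind, vars)   with kind in {"input", "state", "output", "db"},
#   ("not", f), ("and", f, g), ("or", f, g), or
#   ("exists", xs, guard, body)   standing for  Exists xs (guard AND body).

def input_bounded(f):
    tag = f[0]
    if tag == "atom":
        return True
    if tag == "not":
        return input_bounded(f[1])
    if tag in ("and", "or"):
        return input_bounded(f[1]) and input_bounded(f[2])
    if tag == "exists":
        _, xs, guard, body = f
        # the guard must be an input atom containing all quantified variables
        if guard[0] != "atom" or guard[1] != "input" or not set(xs) <= set(guard[2]):
            return False
        # no state/output atom in the body may mention a quantified variable
        return all(not (set(xs) & set(vs)) for kind, vs in atoms(body)
                   if kind in ("state", "output")) and input_bounded(body)
    raise ValueError(tag)

def atoms(f):
    # enumerate (kind, vars) for every atom occurring in a formula
    if f[0] == "atom":
        yield f[1], f[2]
    elif f[0] == "not":
        yield from atoms(f[1])
    elif f[0] in ("and", "or"):
        yield from atoms(f[1])
        yield from atoms(f[2])
    elif f[0] == "exists":
        yield from atoms(f[3])
```

A quantifier guarded by a database atom, or a body whose state atom mentions a quantified variable, is rejected; this mirrors the two non-input-bounded example sentences discussed in this section.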
For example, the following sentence, requiring that a product must be paid for before it can be shipped, is not input-bounded, because x does not occur in any input relation atom:

∀u∀y[pay(u, y) ∧ ∃x price(x, y) B ¬ship(x, y)]

The following similar sentence is not input-bounded either, since x occurs in past_order(u, x), which is not an input relation:

∀u∀x∀y[past_order(u, x) ∧ pay(u, y) ∧ price(x, y) B ¬ship(x, y)]

⁵ Formally, input-bounded ASM transducers and Spocus transducers are incomparable, in the sense that each model admits runs that are not expressible in the other. In one direction we have already seen such an example: Spocus transducers cannot express loops by removing tuples from the state, as ASM transducers can. In the other direction, Spielmann claims that such a run exists, but I could not find an example. Nevertheless, I would conjecture that input-bounded ASM transducers are more appropriate for specifying web services than Spocus transducers, which have some serious limitations that make some common patterns awkward, if not impossible, to express. The fact that one of the authors who proposed Spocus transducers (Vianu) based his later work ([7]) on input-bounded ASM transducers further supports this conjecture.


This example reveals one of the limitations imposed by the input-boundedness condition: a formula can only combine expressions about data that are passed as input at the same execution step. This is not the case for the property above, in which orders and payments are not received at the same step. To allow a little more flexibility, [7] slightly extended the definition to allow the use of state relations that hold only the last value of a particular input relation. With this relaxation, we could change the rule for past_order so that it holds only the last order; the above formula would then be input-bounded and, assuming that users have to pay for an order before placing a new one, it would also express the desired property. Moreover, input-boundedness turns out to be a sufficient condition for solving the problem of verification of (input-bounded) temporal properties. Indeed, it was proven in [11, 12] that, given an input-bounded ASM transducer T and an input-bounded LTL-FO sentence φ, deciding whether every run of T with maximal input flow ≤ N satisfies φ is in EXPSPACE for any N ≥ 0 if the database schema is not fixed (and PSPACE-complete if the schema is fixed)⁶. Even with the restriction of input-boundedness, this problem is certainly more general than the verification problems that were solved for Spocus transducers. Interestingly, and perhaps surprisingly, it was proved in [11, 12] that problems such as log validation, goal reachability and equivalence that were considered in [2, 3] are polynomial-time reducible to it (for input-bounded ASM transducers) and, as a result, also in EXPSPACE. Furthermore, it was proved in [7] that deciding whether every run is error-free is as hard as LTL-FO verification. Naturally, since [7] proposes an extension of ASM transducers, it also adjusts the notion of input-boundedness to the context of web page schemas.
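For intuition about what these results quantify over: checking a property such as "every payment is eventually followed by shipping" on a single finite run is easy; the difficulty addressed by the cited results is deciding it for all runs of a specification. A minimal sketch of the single-run check, with hypothetical relation names, where a run is a sequence of instants mapping relation names to tuple sets:

```python
# Sketch: checking an LTL-FO-style eventuality on one concrete run.
# Property: G( pay(u, y) → F ship(u, y) ), universally quantified
# over (u, y). Relation names are hypothetical.

def holds_globally_eventually(run, trigger, goal):
    """Every tuple appearing in `trigger` at some instant i must also
    appear in `goal` at some instant j >= i of the run."""
    for i, instant in enumerate(run):
        for t in instant.get(trigger, set()):
            if not any(t in later.get(goal, set()) for later in run[i:]):
                return False
    return True

run = [
    {"pay": {("alice", 30)}},
    {"pay": {("bob", 15)}},
    {"ship": {("alice", 30), ("bob", 15)}},
]
print(holds_globally_eventually(run, "pay", "ship"))  # True
```

Static verification must establish this for every run of bounded input flow, which is what drives the EXPSPACE/PSPACE bounds above.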
An interesting remark is that the assumption of a bound on the maximal input flow is an implicit part of the model, since input comes from a precomputed set of options of known size. Indeed, this seems to be the norm for interactive web services, in which users navigate between pages and enter information or pick from lists of choices presented to them, further justifying the assumption. Not surprisingly, since the models are quite similar, the verification problem for web page schemas is also decidable in EXPSPACE (and PSPACE-complete for schemas with a fixed bound on the arity). Moreover, it turns out that the conditions set forth by [11, 12], with the extension about state relations holding the last input, are quite tight: it was proved in [7] that even small relaxations of these conditions make the verification problem undecidable. For example, we could relax the input-bounded restriction by allowing state rules of the following form (which is not input-bounded but still quite limited), where S, S′ are state relations:

S(x̄) :− ∃ȳ S′(x̄, ȳ)

However, the problem then becomes undecidable, as shown by a reduction from the implication problem for functional and inclusion dependencies. Moreover, allowing quantification over state relations that hold anything more than the last input value (e.g., all past input values), as in the example above, also leads to undecidability, by a reduction from Trakhtenbrot's theorem, i.e., the undecidability of finite validity of FO formulas.

⁶ It is unclear to me what they mean by "fixed"; it could be that they verify a property on a given schema (and thus the proof only shows that the property holds for that schema, but possibly not for some other schema). However, the schema seems to be known in advance in any case, since the relevant database relations are declared as part of the transducer specification (in the schema part), and one would imagine that their arities/types are known as well, if they are used in rules.


4.3 CTL-FO/CTL∗-FO

One of the reasons for splitting specifications into different pages, and for distinguishing information about the "current page" as something special rather than as state information, was to enable the formulation of properties about transitions between pages. To express such properties, more powerful operators that allow quantification over paths in runs need to be added to the temporal language. Computation Tree Logic with FO atoms (CTL-FO) extends LTL-FO with such operators, namely A, which stands for "for every path in the continuation of the current run" (starting from some initial page that is specified by the context in which the expression appears), and E, which means "there exists a continuation of the current run". CTL∗ allows any combination of path quantifiers and temporal operators, while in CTL temporal operators must always be immediately preceded by a path quantifier. Using such operators one can express navigational properties such as the following:

AG EF(HP)

which intuitively means that from every page that is accessible from the entry page of the system there exists a continuation of the run that eventually leads to the page HP. Note that without the operator E in the middle, the resulting sentence would mean that every continuation of the run eventually goes through the page HP; as a result, this property cannot be expressed without the added operators (e.g., in LTL-FO).

Unfortunately, the additional expressiveness in the property language comes at a cost: input-boundedness is not sufficient to guarantee decidability of verification. Further restrictions have to be applied to the specification of the transducer to make this possible. The restriction proposed in [7] is that specifications are both input-bounded and propositional, i.e., all state, input and output relations in the specification have arity 0 (database relations can be as before).
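Once the specification is propositional, the page-transition behavior induces a finite graph, and a property of the AG EF(HP) shape can be decided by the standard CTL labelling method: compute the states satisfying EF(HP) as a backward fixpoint from the states labelled HP, then check that every state reachable from the entry state lies in that set. A minimal sketch, with a hypothetical page graph:

```python
# Sketch: deciding AG EF(HP) on a finite page-transition graph using
# the standard CTL labelling method. Pages and edges are hypothetical.

def states_satisfying_EF(graph, targets):
    """Backward fixpoint: states from which some path reaches `targets`."""
    sat = set(targets)
    changed = True
    while changed:
        changed = False
        for s, succs in graph.items():
            if s not in sat and any(t in sat for t in succs):
                sat.add(s)
                changed = True
    return sat

def check_AG(graph, start, sat):
    """AG: every state reachable from `start` must be in `sat`."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if s not in sat:
            return False
        stack.extend(graph.get(s, []))
    return True

graph = {"entry": ["search", "HP"], "search": ["cart"],
         "cart": ["HP"], "HP": ["entry"]}
ef_hp = states_satisfying_EF(graph, {"HP"})
print(check_AG(graph, "entry", ef_hp))  # True
```

On an explicit finite graph this runs in polynomial time; the complexity bounds below arise because the graph is given implicitly by a specification and is exponentially large in it.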
Then, given a propositional input-bounded web service specification and a formula φ, the verification problem can be solved in co-NEXPTIME if φ is in CTL-FO, and in EXPSPACE if φ is in CTL∗-FO. However, the usefulness of the resulting model for expressing interesting web services that are also worth verifying is questionable, since propositional input-bounded web services only allow passing boolean values as input and expressing the state using boolean variables (relations of arity 0). One can also obtain more tractable decision procedures by further restricting the specification language: fully propositional specifications additionally disallow the use of database relations. Verification of CTL∗-FO formulas over fully propositional web service specifications is in PSPACE. However, the resulting model does not seem expressive enough for any interesting web service. The authors suggest that the service specification itself does not have to be propositional: one can first "abstract out" the data in order to obtain a propositional specification and focus on verification of navigational properties. This is probably feasible for properties like the one presented above, which involves no relational atoms, but not for more complex ones. For example, in a more detailed implementation of the bookstore application, such a property could be:

∀pid ∀pr [AG(ξ(pid, pr) → A((EF cancel(user, pid)) U ship(user, pid)))]

where ξ(pid, pr) is the formula:

PP ∧ pay(pr) ∧ button("pay") ∧ pick(pid, pr) ∧ price(pid, pr)

Moreover, the complete lack of relations defeats the purpose of introducing this family of models

(relational transducers) in the first place, as well as the whole discussion about "data-intensive" web services, since such specifications contain no data at all; perhaps more "mature" models from the verification community, for which more decidability results are known, could then be employed to greater benefit. The authors seem to acknowledge this lack of expressiveness and propose a slight extension that helps express some very specific kinds of applications while not affecting the decidability of verification. This extension allows the use of a single binary relation in the database but further restricts input to a single unary relation; this way one can express applications in which consecutive pages correspond to successive stages of refinement in a user-driven search over the same data. Under these conditions, verification of CTL-FO formulas is in EXPTIME, while for CTL∗-FO formulas it is in 2-EXPTIME.

5 Conclusions

In this paper we have reviewed formal specification languages that have been proposed to capture the semantics of web services that involve the manipulation of data. In this context, we reviewed the model of relational transducers and the more flexible ASM transducers, as well as the extension of the latter to web page schemas. The main reason for pursuing this formalization was to enable static analysis of some notion of correctness of services. Although such problems are undecidable in general, several restrictions have been proposed for these models under which verification of temporal properties is decidable. For Spocus transducers, verification of a limited form of temporal properties can be decided in NEXPTIME, while for input-bounded ASM transducers and web page schemas verification of input-bounded LTL-FO formulas is decidable in EXPSPACE. Finally, verification of CTL-FO formulas is also in EXPSPACE for propositional input-bounded web page schemas, which however seems too restricted a model to represent practical applications. The complexity results, although they may look intimidating, are quite reasonable as static analysis goes. However, for an implementation of a system with such functionality to be feasible, significant work is needed to obtain practical algorithms and heuristics. As a problem of theoretical importance, one may also want to find further restrictions to the problems above under which the complexity bounds are lower, e.g., in PTIME. However, the resulting models would probably be too restrictive to be useful for modeling practical web service applications.
Another direction for future work, more important from a practical point of view, is related to a severe limitation shared by all three models: they do not allow any updates to the database relations, although such functionality is necessary for practical applications (e.g., a store would probably like to decrease the quantity of a product in its inventory when an item is shipped to a customer). A way to circumvent this problem could be the introduction of sessions. It may then be adequate to verify that some properties hold within a session (during which updates are still not allowed), while allowing updates to the database relations between sessions. An important aspect of web services (and e-services in general) that has gained increasing attention in the last few years (e.g., [5, 10, 4]), and is not covered in the reviewed papers, is the composition of services that interact with each other in order to form more complex services. For the case of interacting data-intensive services, a model was outlined in [10] in the form of relational Mealy machines (RMMs). It is likely that the design of such models could take advantage of the work presented above, or even be built as an extension of these models with the addition of communication primitives. Different problems may also arise with such extensions. Since verification would most likely be harder in more powerful models such as these, more restrictions may need to be discovered, in the models or in the questions we want to answer about them, under which interesting problems are decidable.

References

[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

[2] S. Abiteboul, V. Vianu, B. Fordham, and Y. Yesha. Relational transducers for electronic commerce. In Proceedings of ACM Symposium on Principles of Database Systems, pages 179–187, Seattle, Washington, United States, 1998. ACM Press.

[3] S. Abiteboul, V. Vianu, B. Fordham, and Y. Yesha. Relational transducers for electronic commerce. Journal of Computer and System Sciences, 61(2):236–269, 2000.

[4] D. Berardi, D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Mecella. Automatic composition of e-services that export their behavior. In International Conference on Service-Oriented Computing, volume 2910 of Lecture Notes in Computer Science, Trento, Italy, 2003. Springer.

[5] T. Bultan, X. Fu, R. Hull, and J. Su. Conversation specification: a new approach to design and analysis of e-service composition. In Proceedings of International World Wide Web Conference, pages 403–410, Budapest, Hungary, 2003. ACM Press.

[6] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 2000.

[7] A. Deutsch, L. Sui, and V. Vianu. Specification and verification of data-driven web services. In Proceedings of ACM Symposium on Principles of Database Systems, pages 71–82, Paris, France, 2004.

[8] E. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Vol. B, pages 995–1072. Elsevier Science Publishers B.V., 1990.

[9] Y. Gurevich. Evolving algebras 1993: Lipari Guide. In E. Börger, editor, Specification and Validation Methods, pages 9–37. Oxford University Press, 1994.

[10] R. Hull, M. Benedikt, V. Christophides, and J. Su. E-services: a look behind the curtain. In Proceedings of ACM Symposium on Principles of Database Systems, pages 1–14, San Diego, California, 2003. ACM Press.

[11] M. Spielmann. Verification of relational transducers for electronic commerce. In Proceedings of ACM Symposium on Principles of Database Systems, pages 92–103, Dallas, Texas, USA, 2000. ACM Press.

[12] M. Spielmann. Verification of relational transducers for electronic commerce. Journal of Computer and System Sciences, 66(1):40–65, 2003.
