Formal Verification of Web Service Interaction Contracts

German Shegalov
Oracle, Java Platform Group
1211 SW Fifth Ave, Ste. 800
Portland, Oregon 97204, USA

Gerhard Weikum
Max-Planck-Institut Informatik
Stuhlsatzehausweg 85
D-66123 Saarbruecken, Saarland, Germany

Abstract

Recovery is the last resort when other components exhibit bugs. It is therefore of paramount importance that the correctness of the recovery protocols be formally verified. Recovery not only needs to cope with database failures but should also handle and ideally mask message and process failures in clients and servers. Otherwise, when a reply message is lost, the application must be able to determine “manually” whether the interaction is to be repeated. This paper develops a statechart specification of a recovery framework that generically guarantees exactly-once execution and applies model checking to prove its correctness.

1. Introduction

Information systems comprise many components and reach a complexity at which even stress-tested production systems are rarely free of bugs. Transaction recovery is the last resort for preventing data losses and preserving data consistency in the presence of transient software failures. Unfortunately, transactional atomicity and persistence do not suffice to guarantee the correct behavior of applications. It is the application code that needs to retry failed requests to servers by means of custom failure-handling code. Since modern multi-tier applications involve a rich set of clients, several layers of middleware, and backend data servers, the complexity of handling failures is enormous. This situation calls for a better generic infrastructure that provides comprehensive forms of data, process, and message recovery and is able to mask all failures from applications [10, 2, 9, 6, 11, 4]. Such an infrastructure drastically simplifies the application code, boosting development productivity.

A typical challenge for applications is coping with non-idempotent requests whose repeated execution must be avoided. A conventional approach to handling these problems generically is to manage all application-relevant state, at all tiers, in transactional components (databases and message queues), where each interaction boils down to an expensive 2PC transaction comprising a read of a previous reply from an input queue and a write to an output queue [7]. This cost prevents its adoption in large-scale Web applications. The approach also requires an unnatural programming style of keeping application state externally, which makes it unsuitable for state-rich applications such as collaborative E-science (e.g., distributed data analysis). It cannot be expected that applications be rewritten to include the necessary calls to a SQL engine or a TP-monitor API at all points where components interact with each other.

We have introduced generic recovery protocols in the interaction contracts (IC) framework [1] and shown its efficient implementation for Web Services, EOS, that transparently provides recovery guarantees from the fingertips to the data server throughout all tiers. The original work provided only a semi-formal textual specification. However, precisely because recovery is such a crucial system component, its formal specification and verification are mandatory.

2. Review of the IC Framework

The framework [1] considers a set of components of three types. (i) Persistent components (Pcom), e.g., clients and application servers, are able to recreate their state and messages as of the time of the last external interaction upon crashes, and eliminate duplicates; (ii) Transactional components (Tcom), e.g., database servers, provide the same guarantees only for the final (commit) request; (iii) eXternal components (Xcom), without any recovery guarantees, represent human users and legacy non-compliant components.

The Pcom is the central concept of the framework. Pcom’s are piecewise deterministic, i.e., their execution is deterministic up to nondeterministic input that is under the control of other components. A Pcom’s state as of any particular time can be recreated by replaying a log of nondeterministic events. Exactly-once execution in the overall system is guaranteed when components obey the obligations defined in the IC’s. User messages passed between two Pcom’s must comply with either the Committed IC (CIC) or the Immediately Committed IC (ICIC). Pcom’s and Tcom’s exchange messages using the Transactional IC (TIC) [13]. Xcom’s are allowed to communicate only with Pcom’s, as determined by the External IC (XIC). Full details can be found in [12].
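The idea of piecewise determinism can be illustrated with a minimal sketch. It assumes a toy component whose only nondeterministic events are incoming integer values; the class `ToyPcom`, its methods, and the in-memory list standing in for a stable log are all hypothetical names chosen for illustration and are not part of the IC framework.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPcom:
    """Hypothetical piecewise-deterministic component (illustration only)."""

    log: list = field(default_factory=list)  # stands in for a stable log on disk
    total: int = 0                           # volatile state, lost on a crash

    def apply(self, value: int) -> None:
        # Deterministic step: the same input always yields the same state change.
        self.total = self.total * 2 + value

    def receive(self, value: int) -> None:
        # Record the nondeterministic input before acting on it.
        self.log.append(value)
        self.apply(value)

    def crash(self) -> None:
        # Volatile state is wiped; the log survives.
        self.total = 0

    def recover(self) -> None:
        # Recreate the state by replaying the logged inputs in their original order.
        self.total = 0
        for value in self.log:
            self.apply(value)


pcom = ToyPcom()
for v in (3, 1, 4):
    pcom.receive(v)
state_before_crash = pcom.total
pcom.crash()
pcom.recover()
assert pcom.total == state_before_crash
```

Because `apply` is order-sensitive, the sketch also shows why replay must preserve the original order of the logged events.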

3. Specification of (I)CIC

In this section we give a textual specification of the CIC and the ICIC and describe how we map it to a formal specification using the statechart language of Statemate [8]. Statecharts are finite-state automata with various enhancements. Each state can in turn be an entire automaton, so statecharts can be nested. Orthogonal components are two or more statecharts that capture parallel execution, providing a compact way of describing a cross-product of automata. Transitions are labeled with event-condition-action (ECA) rules, where events and conditions are propositional-logic formulae and actions are typically the starting or terminating of activities. The behavior of an activity is in turn specified by a statechart. There are rigorous methods of encoding a statechart into a standard finite-state machine, which can then serve as the basis for model checking.

We consider only transient omission failures, such as component crashes and communication outages, resulting from so-called Heisenbugs, e.g., nondeterministic concurrency bugs occurring under unspecified workload. A CIC between a Pcom sender (S) and a Pcom receiver (R) consists of the following obligations:

S1: Persistent State. The sender promises that its state at the time of the message send or later is persistent.
S2: Persistent Message.
S2a: The sender promises to send the message repeatedly (driven by timeouts) until the receiver releases it (perhaps implicitly) from this obligation.
S2b: The sender promises to resend the message upon explicit receiver request until the receiver releases it from this obligation. This is distinct from S2a, typically longer lasting and usually more explicit.
S3: Unique Messages. The sender promises that its messages have unique contents (including all header information such as timestamps, HTTP cookies, etc.).
R1: Duplicate Message Elimination. The receiver promises to eliminate duplicate messages (which the sender may send to satisfy S2a).
R2: Persistent State.
R2a: The receiver promises that before releasing sender obligation S2a, its state at the time of message receive or later is persistent without the sender periodically resending. After S2a release, the receiver must explicitly request the message from the sender should it be needed. The interaction is stable, i.e., it persists (via recovery if needed) with the same state transition as originally.
R2b: The receiver promises that before releasing the sender from obligation S2b, its state at the time of the message receive or later is persistent without the need to request the message from the sender. After S2b release, the interaction is installed, i.e., replay of the interaction is no longer needed.

An ICIC is a CIC in which the receiver immediately installs the interaction, and therefore the sender is released from both message persistence requirements, S2a and S2b, at once.

Now we consider the mapping to statechart elements. We introduce a variable suffixed with _last_logged for each instance of the IC parts; updating it with the current IC state represents a synchronous (forced) log write. After a crash, the component first goes into the recovery states of all IC instances it has run and jumps to the state recorded in _last_logged. A message m is modeled as an internal event appearing in the action part of an ECA label (e.g., /m). Communication failures that lead to a message loss are captured by an external event link_outage. Transitions of a receiver reacting to a message m use a compound event m_ok abbreviating m ∧ ¬link_outage. Message uniqueness follows from modeling IC’s as generic activities, such that each individual message in a specification for a concrete application is local to the corresponding instance. This makes the use of additional sequence numbers unnecessary.

Component crashes are modeled by external events as well. All IC statecharts terminate components on a crash. This is done by enclosing the IC’s orthogonal components in a superstate with a transition to a termination connector (a circle labeled T). The crash transition supersedes all regular transitions. We also model process monitors that restart crashed activities as soon as the crash event is no longer generated.

System failures result in message losses. Some interaction contracts require that the sender resend the message periodically until an acknowledgment is received. Request timeouts on a sender component may also occur due to unacceptable performance loss during peak load. We model this by having the receiver react to messages originating from other components after a random delay, which we refer to as message execution time, and which subsumes network and CPU processing times. Statemate supports special timeout events of the form tm(e, d), generated d steps after the most recent occurrence of the event e. We use external integer variables with the range 0 ... 30 as timeout values. Timeout events in the specifications are suffixed _tm.

Figure 1 shows the statecharts defining the behavior of a sender Pcom (cic_sender_sc) passing a message to a receiver Pcom (cic_receiver_sc) using the (I)CIC. When verifying a single (I)CIC instance, we assume that the message being sent is a result of internal computations and is not caused by any external event (sndr_trigger is generated). The sender obligation of persistent state S1 from the CIC definition is outside the specification when only one protocol instance is concerned. However, it will play a role when we verify a complex Web service specification with multiple different protocol instances at the application level. According to the framework design, the Pcom commits its state (resulting from processing incoming messages whose replay must be in the same order) when sending a message. Thus, the Pcom must force the log to disk. This is why an output parameter event sndr_nd is generated. This parameter is usually bound to the corresponding rcvr_nd parameters of the parallel receiving activities on the same Pcom, where it is used as a trigger for the log forcing. The sender obligation S3 requiring message uniqueness is provided in the specification without any special measures, as discussed above.

[Figure 1: Statemate charts CIC_SENDER_SC (version 2) and CIC_RECEIVER_SC (version 3), both dated 5-NOV-2003.]

Figure 1. Statechart of a Pcom sender and receiver implementing CIC. ICIC is used when the corresponding receiver flag is on.

4.1. Formal Verification of (I)CIC
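As a rough illustration of what checking such a specification involves, the following sketch runs an explicit-state reachability analysis over a deliberately simplified, ICIC-like abstraction. It is not the Statemate model of Figure 1 and not the model checker used in this paper. The state tuple `(sndr, rcvr, logged, effects)`, the `IDLE` state, the `effects` counter, and the functions `successors`, `reachable`, and `check` are assumptions made for the example. Only message loss and receiver crashes are modeled; sender crashes are left out.

```python
from collections import deque

# Desired end state: sender released, receiver idle, interaction installed once.
GOAL = ("INSTALLED_S", "IDLE", "INSTALLED", 1)


def successors(state, dedup=True):
    """Yield every state reachable in one step from (sndr, rcvr, logged, effects)."""
    sndr, rcvr, logged, effects = state
    # Sender timeout: resend until released. A lost message leaves the
    # state unchanged, so that self-loop is not generated.
    if sndr == "SENDING" and rcvr == "IDLE":
        yield (sndr, "MSG_RECEIVED", logged, effects)
    # Receiver processes a buffered message.
    if rcvr == "MSG_RECEIVED":
        if logged == "" or not dedup:
            # Forced log write and effect happen together. The counter is capped
            # at 2 so the state space stays finite even without deduplication.
            new = ("INSTALLED", min(effects + 1, 2))
        else:
            new = (logged, effects)  # duplicate: acknowledge only
        yield (sndr, "IDLE") + new           # acknowledgment lost
        yield ("INSTALLED_S", "IDLE") + new  # acknowledgment delivered
    # Receiver crash: volatile state is lost, the log survives.
    if rcvr != "IDLE":
        yield (sndr, "IDLE", logged, effects)


def reachable(start, dedup=True):
    """Breadth-first search over the transition relation."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors(queue.popleft(), dedup):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check(dedup=True):
    states = reachable(("SENDING", "IDLE", "", 0), dedup)
    # Safety: never more than one effect; a released sender implies one effect.
    safe = all(e <= 1 and (s != "INSTALLED_S" or e == 1) for s, _, _, e in states)
    # Weak liveness: the goal remains reachable from every reachable state.
    live = all(GOAL in reachable(st, dedup) for st in states)
    return safe, live


print(check(dedup=True))
print(check(dedup=False))
```

With duplicate elimination switched on, both checks hold in this abstraction; switching it off (`dedup=False`) lets a resent message take effect twice, which the safety check reports. The liveness check here only asks that the goal stay reachable, which is weaker than an AF property; proving the latter would additionally require fairness assumptions about message loss and crashes.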

The formula AG(¬sndr_crash) → AG(rcvr_last_logged = '' → AF
