Approximating Continuous Markov Processes

Vineet Gupta†
Caelum Research Corporation, NASA Ames Research Center, Moffett Field CA 94035, USA

Josée Desharnais
School of Computer Science, McGill University, Montreal, Quebec, Canada

Radha Jagadeesan
Dept. of Math. and Computer Sciences, Loyola University Lake Shore Campus, Chicago IL 60626, USA

Prakash Panangaden
School of Computer Science, McGill University, Montreal, Quebec, Canada
April 23, 1999

Abstract
Markov processes with continuous state spaces arise in the analysis of stochastic physical systems or stochastic hybrid systems. The standard logical and algorithmic tools for reasoning about discrete (finite-state) systems are, of course, inadequate for reasoning about such systems. In this work we develop three related ideas for making such reasoning principles applicable to continuous systems. We show how to approximate continuous systems by a countable family of finite-state probabilistic systems, and how to reconstruct the full system from these finite approximants; we define a metric between processes and show that the approximants converge in this metric to the full process; and we show that reasoning about properties definable in a rich logic can be carried out in terms of the approximants. The systems that we consider are Markov processes where the state space is continuous but the time steps are discrete. We allow such processes to interact with the environment by synchronizing on labels in the manner familiar from process algebra. Previously, we defined a notion of equivalence - strong probabilistic bisimulation - and gave a logical characterization of this equivalence in terms of a certain weak modal logic. The approximation results are new and open the door to the use of extant verification technology. A fundamental shift in viewpoint relative to our previous work is to move from traditional logic to a logic with reals as truth values. This is not a shift to an ad-hoc many-valued logic but a shift from traditional logical formulas to measurable functions viewed as formulas; a point of view advocated by Kozen almost 20 years ago.
1 Introduction

The dichotomy between discrete and continuous models of systems has been reflected in deep conceptual divisions and fundamentally different mathematical tools. There have been meeting points
† Research supported in part by NSERC. Contact author. Phone: 1.650.604.2462, Fax: 1.650.604.4036, [email protected]
before - for example, numerical analysis - but the last few years have witnessed new relationships between these subjects, most notably in robotics, computer vision, hybrid systems and the theory of cellular automata. These subjects have brought together mathematicians, physicists, control theorists and computer scientists. Nevertheless the subjects remain far apart. Until a few years ago one saw very little effect of logic, for example, on traditional control theory - the COCOLOG project [CW95a] is an exception - or physics. Similarly one sees no influence of ideas from analytical mechanics in the field of verification, though such ideas have appeared in the cellular automata literature. With the emergence of the hybrid systems community [AHS96, AKNS97] there is much more interaction between the two sets of ideas, i.e. between continuous and discrete models. One of the pioneering papers in this subject was the development of timed automata by Alur and Dill [AD94]. Here time is continuous and the state space (including clock states) is also continuous. The key breakthrough was the so-called region construction, which allowed an infinite state space to be collapsed into finitely many equivalence classes under a suitable behavioral equivalence. This paved the way for the use of standard verification techniques for the verification of continuous-time systems. The region construction is also used in the study of linear hybrid automata [ACH+95, HHWT97]. Here the state space may be a continuum and the dynamics may be partly governed by discrete transitions and partly by differential equations. The ability to exploit the geometry of the problem to reduce the system to a finite-state system is crucial. In each case the reduction to a finite-state system rests on judicious limitations placed on the systems as well as clever use of a behavioral equivalence - in fact a type of bisimulation - in order to achieve the reduction.
This approach is very successful but it limits the types of systems one can deal with. For example a bouncing ball cannot be handled by linear hybrid automata since the differential equations have to be first order. Our previous work [DEP98, DEP99] on stochastic systems was similarly motivated. We sought behavioral equivalences that could be used with continuous state-space stochastic systems. If one is lucky, the system of interest is actually equivalent to a finite-state system. In the present paper we take a new viewpoint; namely that we should try to approximate a continuous system with a discrete one. The goal is that one should not have to put too many restrictions on the type of system one works with. One no longer expects that a system of interest will turn out to be equivalent to a finite-state system, but rather that an arbitrary system will be close to a finite-state system. In this paper we present an approximation scheme for a class of stochastic systems with continuous state spaces with exactly this property. This scheme does not just provide a discrete approximation to a continuous system; it also supports logical reasoning principles and shows how one can perform "approximate reasoning." In a recent paper [DGJP99] we developed metrics between processes for discrete systems. We use one of the metrics defined in that paper (actually a family of them), extended to the continuous case, to show that the approximations converge to the original system. We also introduce a notion of simulation and prove that a system simulates its finite approximants. This shows that each of the finite approximants embodies only behaviour that belongs to the system being approximated. We also show that there is a countable family of finite-state systems that approximates all systems. The systems that we consider are Markov processes where the state space is continuous - it is assumed to be an analytic space - but the time steps are discrete.
Analytic spaces are metrizable topological spaces that can be given a separable metric. We will suppress the discussion of these details - they are thoroughly discussed in our previous papers [DEP99, Eda99] - since they are not needed to follow any of the new results of this paper. It suffices to note that all the familiar continuous spaces, for example open, closed or Borel subsets of R^n, are analytic.
The systems that we consider - continuous state space and discrete time - are of interest for two reasons. First, this is a reasonable middle ground before we get to a completely continuous model. Several authors have looked at discrete-space, continuous-time models. Second, these systems occur in natural examples. A significant one that we have become involved with is the treatment of a flight-control system. Here the system is coded as a loop with a certain fixed periodicity. Every cycle some actions are taken and some interactions occur. The system itself is inherently continuous-state but the temporal evolution has a discrete time step imposed by the main loop. Another situation where such systems occur is in the context of probabilistic programming languages. As soon as one adds recursion or iteration, even a binary probabilistic choice can be the basis of a continuous state space. In a recent paper [GJP99] we developed a probabilistic concurrent constraint programming language in which we had to deal with continuous state spaces. That paper contains a discussion of more examples where this type of system is important. The background to the present work is a previous investigation into such processes. We defined a notion of equivalence - a strong bisimulation - and a logical characterization of this equivalence in terms of a certain weak modal logic. The present work builds on this. The metrics we define refine the notion of equivalence defined previously, and are based on the logic. However, the approximation results are entirely new and make it possible to consider the use of extant verification technology to validate stochastic systems. A fundamental shift in viewpoint relative to our previous work is to move from traditional logic to a logic with reals as truth values. This is not a shift to an ad-hoc many-valued logic but a shift from traditional logical formulas to measurable functions viewed as formulas; a point of view advocated by Kozen almost 20 years ago. This allows one to capture the notion of approximate reasoning in a much smoother fashion. Just as with logic we have a syntactically defined class of functional expressions. When interpreted on a specific system we obtain a measurable function. These functions "probe" a measure in exactly the same way that a logical formula probes a state.
In fact Kozen, following suggestions of Plotkin, noticed a Stone-type duality between measures and measurable functions, which led him to propose the idea that measurable functions should be viewed as the analogues of formulas in probabilistic semantics. Just as the satisfaction relation can be seen as a pairing of states and formulas, integration is a pairing of measures and measurable functions, with the new feature that a real number between 0 and 1 is assigned to each such pairing. We prove that when a functional expression is evaluated on an approximating sequence of finite-state processes, the result converges to the interpretation of the same functional expression on the original process. This kind of result can only be sensibly stated with the functional expressions rather than with formulas. However, the fundamental link with the logic that we prove is that - for the negation-free fragment of the logic - for every formula φ we can define a functional expression f_φ such that for any state s, s |= φ if and only if f_φ(s) > 0. From this one can extract many facts about how the logic interacts with the approximations. In fact the class of functions that we deal with is richer than what the logic might suggest. The logic we can deal with is negation-free, but the class of functions for which we can prove the convergence property includes functions that are not monotone with respect to simulation. In fact the functions obtained from our functional expressions are rich enough to approximate all bounded measurable functions obeying a Lipschitz condition. The definition of the Hutchinson metric [Hut81] is based on exactly such functions, suggesting that this class is indeed a mathematically significant class. We cannot prove the fundamental result mentioned above for a richer logic including negation, but we conjecture that it is true. Thus if we look at the functions in their own right we have a significantly stronger result.
Indeed we claim that if we think of these functions as observations of process behaviour, we have included any reasonable observations that
one can expect to make. The search for such finite approximation techniques was motivated by ideas from domain theory, but we use none of the results or techniques of that subject. The technical contributions all involve combinatorial constructions and classical measure theory. In domain theory one of the central contributions is to develop a general qualitative notion of information and, in particular, of finite information. For us the notion of finite process happily turns out to be precisely a finite-state process. In the present paper we avoid categorical or domain-theoretic ideas in order to make the ideas more generally accessible. A future paper will discuss those issues. This paper is organized as follows. Section 2 is a review of our basic model, previous work first reported in [BDEP97]. The next section discusses the notion of bisimulation, related to what we reported in [BDEP97, DEP98] and to appear in [DEP99]. The present version is however much more "down to earth" and does not use any categorical language. This section also introduces the notion of simulation and gives the logical characterization of bisimulation. This presentation is new but the results are only marginally different from what we have previously reported. The next section describes the approximation scheme and reconstruction. The section after that describes the use of functional expressions. Those results are almost all new, but the idea of such a class of functional expressions appeared in our recent paper [DGJP99]. In particular, the way that approximations interact with these functions is new. The last technical section introduces the metrics and proves convergence results. These metrics appeared in our paper [DGJP99], but of course the convergence results are new.
2 Labeled Markov Processes

In this section we recapitulate the definitions of labeled Markov processes from [DEP99].^1 The first few paragraphs recall basic notions of process algebra and the intuition of bisimulation. A reader familiar with these concepts can safely skip to just before Definition 2.1. The idea is to model probabilistic systems that may interact with the environment. The nature of the interaction is of the very primitive kind used by the process algebra community [Mil89]. The system is assumed to be capable of a number of different possible "actions", each represented by a "label" which is not interpreted further. The environment selects an action and the system reacts by executing a state transition corresponding to the chosen action. This response may be determined completely by the action (label) or it may be indeterminate (nondeterministic) - the case extensively studied in process algebra. The actions offered by the system to the environment will, in general, depend on the state. From the point of view of the environment (or of an observer performing experiments on the system) the state is not observed. Instead the observer sees which actions the system performs and which it rejects. A whole variety of process equivalences have been developed, with different motivations and mathematical theories. The relation most studied is bisimulation, introduced by Milner [Mil80] and Park [Par81]; see [Mil89] for an expository introduction. The intuition behind bisimulation is that the two processes being compared have matching moves. If one process can make a transition with an a-action then so can the other, and the processes resulting from the transitions are themselves bisimilar. Whatever the variation in the notion of equivalence, the key point is always that the acceptance or rejection of actions by the process is the basic observable. The fact that one observes rejection of actions is what distinguishes this subject from classical automata theory.
Labeled Markov processes are probabilistic transition systems with a state space which may

^1 The work was first announced in [BDEP97] and [DEP98].
be continuous, and a set of labels. Corresponding to each label a Markov process is defined. The transition probability is given by a stochastic kernel (Feller's terminology [Fel71]), also commonly called a Markov kernel. Thus the indeterminacy has two sources: the "choice" made by the environment - no probabilities are attributed to this at all - and the probabilistic transitions made by the process. This is the so-called "reactive" model and is due to Larsen and Skou [LS91] in a discrete state-space setting. In brief, a labeled Markov process can be described as follows. There is a set of states and a set of labels. The process is in a state at a point in time and transitions occur between states. The state which the process moves to is governed by which interaction with the environment is taking place, and this is indicated by the labels. The process evolves according to a probabilistic law. If the process interacts with the environment by synchronizing on a label, it makes a transition to a new state governed by a transition probability distribution. So far, this is essentially the model developed by Larsen and Skou [LS91] in their very important and influential work on probabilistic bisimulation. They specify the transitions by giving, for each label, a probability for going from one state to another. Probabilistic bisimulation then amounts to matching the moves; this means that both the labels and the probabilities must be the same. Our formalism has to be couched in measure-theoretic terms since we are interested in continuous as well as discrete processes. We cannot always take all subsets as the σ-field on the state space. In fact we need to assume metric-space structure on the state space. The classical theory of Markov processes is typically carried out in the setting of Polish spaces rather than on abstract measure spaces. We work with analytic spaces, which generalize Polish spaces.
However, in this paper we focus on more combinatorial issues and suppress some of the finer points of measure theory. The paper "Bisimulation for Labeled Markov Processes" by Desharnais, Edalat and Panangaden [DEP99] discusses all these issues in detail. A key ingredient in the theory is the stochastic kernel or Markov kernel. We will call it a transition probability function.

Definition 2.1 A transition (sub-)probability function on a measurable space (S, Σ) is a function τ : S × Σ → [0, 1] such that for each fixed s ∈ S, the set function τ(s, ·) is a (sub)probability measure, and for each fixed A ∈ Σ the function τ(·, A) is a measurable function.

One interprets τ(s, A) as the probability of the process starting in state s making a transition into one of the states in A. The transition probability is really a conditional probability; it gives the probability of the process being in one of the states of the set A after the transition, given that it was in the state s before the transition. In general the transition probabilities could depend on time, in the sense that the transition probability could be different at every step (but still independent of past history); we always consider the time-independent case. We will work with sub-probability functions, i.e. with functions where τ(s, S) ≤ 1 rather than τ(s, S) = 1. The mathematical results go through in this extended case. We view processes where the transition functions are only sub-probabilities as being partially defined, opening the way for a notion of approximation. The stochastic systems studied in the literature are usually only the very special version where τ(s, S) is either 1 or 0. We call such processes total and the general processes partial. We capture the idea that an action is rejected by setting τ(s, S) to be 0.
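On a finite state space, Definition 2.1 becomes very concrete: a transition sub-probability function for a single label is just a sub-stochastic matrix, i.e. a matrix of entries in [0, 1] whose rows each sum to at most 1. The following sketch (the function name and encoding are ours, not the paper's) checks this condition.

```python
def is_sub_stochastic(tau, tol=1e-9):
    """Check that tau, a list of rows where tau[s][t] is the probability of
    moving from state s to state t, is a transition sub-probability function
    on a finite state space: entries lie in [0, 1] and each row sums to at
    most 1.  A row summing to 0 models a state where the action is rejected."""
    for row in tau:
        if any(p < -tol or p > 1 + tol for p in row):
            return False
        if sum(row) > 1 + tol:
            return False
    return True
```

A total process has every row summing to exactly 0 or 1; a partial process may leave some probability mass undefined, which is what the approximants of the later sections exploit.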
Definition 2.2 A partial labeled Markov process S with label set A is a structure (S, i, Σ, {τ_a | a ∈ A}), where S is the set of states, i is the initial state, Σ is the Borel σ-field on S, and for every a ∈ A,

τ_a : S × Σ → [0, 1]

is a transition sub-probability function.
We will fix the label set to be A once and for all. The resulting theory is not seriously restricted by this. We will write (S, i, Σ, τ) for partial labeled Markov processes, instead of the more precise (S, i, Σ, {τ_a | a ∈ A}). We define a total labeled Markov process as a partial labeled Markov process, as above, together with a predicate Can on S × A such that for every (s, a) ∈ Can we have τ_a(s, S) = 1 and for every (s, a) ∉ Can we have τ_a(s, S) = 0. In other words, if the action is enabled, the transition probability is normalized to 1. We give a simple example, taken from [DEP99], to illustrate the ideas. Consider a process with two labels {a, b}. The state space is the real plane, R^2. When the process makes an a-move from state (x0, y0), it jumps to (x, y0), where the probability distribution for x is given by the density K_α exp(−α(x − x0)^2), where K_α = √(α/π) is the normalizing factor. When it makes a b-move it jumps from state (x0, y0) to (x0, y), where the distribution of y is given by the density K_β exp(−β(y − y0)^2). The meaning of these densities is as follows. The probability of jumping from (x0, y0) to a state with x-coordinate in the interval [s, t] under an a-move is ∫_s^t K_α exp(−α(x − x0)^2) dx. Note that the probability of jumping to any given point is, of course, 0. In this process the interaction with the environment controls whether the jump is along the x-axis or along the y-axis, but the actual extent of the jump is governed by a probability distribution. If there were just a single label we would have an ordinary (time-independent) Markov process.
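As a sanity check on the example, the a-move probability ∫_s^t K_α exp(−α(x − x0)^2) dx has a closed form in terms of the error function, since the antiderivative of the density is erf(√α (x − x0))/2. The sketch below (the function name is ours) computes it.

```python
import math

def a_move_prob(x0, s, t, alpha=1.0):
    """Probability that an a-move from a state with x-coordinate x0 lands in
    a state whose x-coordinate lies in [s, t].  The density is
    K_alpha * exp(-alpha * (x - x0)**2) with K_alpha = sqrt(alpha / pi)."""
    g = lambda u: 0.5 * math.erf(math.sqrt(alpha) * (u - x0))
    return g(t) - g(s)
```

Note that a degenerate interval gets probability 0, matching the remark above that the probability of jumping to any given point is 0, while a very wide interval gets probability close to 1.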
3 Bisimulation and Logic

The fundamental process equivalence that we consider is strong probabilistic bisimulation, or just "bisimulation" for the present paper. The definition that we use is an adaptation of a definition due to Larsen and Skou [LS91]. In an earlier paper [BDEP97] we had introduced a version of this definition based on categorical ideas, but in the present paper we present a version much closer in form to that of Larsen and Skou.^2 In this section we also recapitulate the result of our paper [DEP98] on a logical characterization of bisimulation.
3.1 A Relational Definition of Bisimulation
The notion of bisimulation that we use captures the idea that processes are equivalent if they react in exactly the same way to all external interactions, in terms of accepting or rejecting actions. Thus we do not see the "internal dynamics", i.e. the state transitions, but we do see whether an action is accepted or rejected and with what probability. The definition that we give is essentially the same as that of Larsen and Skou but with the evident extra conditions to deal with measure-theoretic issues. Let R be a relation on a set S. We say a set A ⊆ S is R-closed if R(A) = {t | ∃s ∈ A, sRt} is a subset of A. If R is reflexive, this becomes R(A) = A.
Definition 3.1 Let S = (S, i, Σ, τ) be a labeled Markov process. An equivalence relation R on S is a bisimulation if whenever sRs', with s, s' ∈ S, we have that for all a ∈ A and every R-closed measurable set A ∈ Σ, τ_a(s, A) = τ_a(s', A). Two states are bisimilar if they are related by a bisimulation relation. Let S = (S, i, Σ, τ) and S' = (S', i', Σ', τ') be a pair of labeled Markov processes. Then we say that S and S' are bisimilar if there is a binary relation R on S ⊎ S' such that

^2 For those who are interested, our version amounts to working with cospans rather than with spans.
1. iRi';
2. if A is any R-closed subset of S ⊎ S' such that A ∩ S ∈ Σ and A ∩ S' ∈ Σ', and sRs', then τ_a(s, A ∩ S) = τ'_a(s', A ∩ S');
3. R restricted to S and to S' yields bisimulation relations on the processes S and S' separately.
The intuition of this definition is that the relation R relates those states that can be "lumped" together. Bisimulation is the largest such relation. Across a pair of processes the definition is the evident extension: intuitively one is taking bisimulation in "the direct sum" of the two processes. Bisimulation is obviously reflexive and symmetric. We will use the logical characterization of bisimulation in order to show that it is transitive.
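On a finite-state process, Definition 3.1 can be computed by partition refinement: start with all states in one block and repeatedly split any block whose states disagree, for some label, on the probability of jumping into some current block. The R-closed measurable sets are then exactly the unions of blocks. The sketch below uses our own (hypothetical) encoding of a finite process, with each τ_a given as a dictionary of successor distributions.

```python
def bisim_classes(states, labels, tau):
    """Partition `states` into probabilistic bisimulation classes.
    tau[a][s] is a dict mapping successor states to probabilities
    (a sub-probability: values may sum to less than 1)."""
    partition = [frozenset(states)]
    while True:
        def signature(s):
            # For each label and each current block, the total probability of
            # an a-move from s into that block (rounded to dodge float noise).
            return tuple(tuple(round(sum(tau[a].get(s, {}).get(t, 0.0) for t in block), 9)
                               for block in partition)
                         for a in labels)
        groups = {}
        for s in states:
            groups.setdefault(signature(s), set()).add(s)
        refined = [frozenset(g) for g in groups.values()]
        if len(refined) == len(partition):   # no block was split: stable
            return refined
        partition = refined
```

For example, a state that moves on a with probability 1/2 to each of two bisimilar dead states ends up in the same class as a state that moves on a with probability 1 to a single dead state.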
3.2 A Logical Characterization of Bisimulation
One can define a simple modal logic and prove that two states are bisimilar if and only if they satisfy exactly the same formulas. Indeed, for finite-state processes one can decide whether two states are bisimilar and effectively construct a distinguishing formula in case they are not [DEP99]. As before we assume that there is a fixed set of "actions" A. The logic is called L and has the following syntax:

φ ::= T | φ1 ∧ φ2 | ⟨a⟩_q φ
where a is an action and q is a rational number. This is the basic logic with which we establish the logical characterization. In later sections we will work with richer logics. We introduce one of them here, a logic L∨ with infinitary disjunction, obtained by adding to L the construct

⋁{φ_i | i ∈ N}.

Given a labeled Markov process S = (S, i, Σ, τ) we write s |= φ to mean that the state s satisfies the formula φ. The definition of the relation |= is given by induction on formulas. The definition is obvious for the propositional constant T, conjunction and disjunction. We say s |= ⟨a⟩_q φ if and only if ∃A ∈ Σ. (∀s' ∈ A. s' |= φ) ∧ (τ_a(s, A) > q). In other words, the process in state s can make an a-move to a state that satisfies φ, with probability strictly greater than q.^3 We write ⟦φ⟧_S for the set {s ∈ S | s |= φ}. We often omit the subscript when no confusion can arise. The logic that Larsen and Skou used in [LS91] has more constructs, including disjunction and some negative constructs. They show that for finitely branching systems,^4 two states of the same process are bisimilar if and only if they satisfy the same formulas of their logic. The fact that a logic without negation and without infinitary conjunction is sufficient for processes with infinite branching was somewhat of a surprise, based on what we expect from the nonprobabilistic case. It is even more surprising that this logical characterization goes through even in the continuous case. The example of the present paragraph is taken from [DEP98, DEP99]. Consider the processes shown in Figure 1. They are both nonprobabilistic processes. The usual formula distinguishing them is ⟨a⟩¬⟨b⟩T, which says that the process can perform an a-action and then be in a state where it cannot perform a b-action. The process on the left satisfies this formula while the process on

^3 In our earlier work we had used ≥ instead of >.
^4 They actually use a stronger property, the "minimum deviation condition", which uniformly bounds the degree of branching everywhere.
the right does not. However, it is well known that they cannot be distinguished by a negation-free formula of Hennessy-Milner logic. If we now consider probabilistic versions of these processes we find that the situation is different. For no assignment of probabilities are the two processes going to be bisimilar. Suppose that the two a-labeled branches of the left-hand process are given probabilities p and q, and assume that the b-labeled transitions have probability 1. Now if the right-hand process has its a-labeled transition given any probability other than p + q, say r > p + q, we can immediately distinguish the two processes by the formula ⟨a⟩_{p+q}T, which will not be satisfied by the left-hand process. If r = p + q then we can use the formula ⟨a⟩_{r'}⟨b⟩_{1−ε}T, where q < r' < p. The left-hand process cannot satisfy this formula unless p = 0, in which case the two processes are in fact identical.
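Distinguishing formulas like these can be checked mechanically on finite processes. Below is a small evaluator for L, using our own tuple encoding of formulas; the probabilities 0.4, 0.3 and 0.7 are hypothetical instances of the branch probabilities and their sum in the discussion above.

```python
def sat(tau, s, phi):
    """Does state s of the finite process tau satisfy formula phi?
    Formulas of L are encoded as ('T',), ('and', phi1, phi2), and
    ('dia', a, q, phi) for <a>_q phi, which holds when the probability
    of an a-move into states satisfying phi is strictly greater than q."""
    if phi[0] == 'T':
        return True
    if phi[0] == 'and':
        return sat(tau, s, phi[1]) and sat(tau, s, phi[2])
    if phi[0] == 'dia':
        _, a, q, sub = phi
        mass = sum(p for t, p in tau[a].get(s, {}).items() if sat(tau, t, sub))
        return mass > q
    raise ValueError(phi[0])

# Left process: a-moves from state 0 with probability 0.4 (to a b-capable
# state) and 0.3 (to a dead state); right process: a single a-move of
# probability 0.7 to a b-capable state.
left  = {'a': {0: {1: 0.4, 2: 0.3}}, 'b': {1: {3: 1.0}}}
right = {'a': {0: {1: 0.7}},         'b': {1: {3: 1.0}}}
```

Both initial states satisfy ⟨a⟩_{0.6}T, but ⟨a⟩_{0.5}⟨b⟩_{0.9}T holds only on the right, since on the left the a-mass into b-capable states is just 0.4.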
[Figure 1 diagram omitted.]

Figure 1: Two processes that cannot be distinguished without negation.

The main theorem relating the logic and bisimulation is the following. It was proved in [DEP98, DEP99]. The present proof is adapted to our relational presentation of bisimulation; certain technical details are suppressed, see [DEP99] for the complete proof.
Theorem 3.2 Let (S, i, Σ, τ) and (S', i', Σ', τ') be two labeled Markov processes. Two states s ∈ S, s' ∈ S' are bisimilar if and only if they satisfy the same formulas of L.

Proof. (⇒): Let R be a bisimulation between S and S'. We prove by induction on the structure of formulas that if sRs' then s and s' satisfy the same formulas. The cases of T and conjunction are trivial. Now assume the implication is true for φ, i.e., for every pair of bisimilar states either both satisfy φ or neither of them does. This means that the set ⟦φ⟧_S ∪ ⟦φ⟧_{S'} is R-closed. Note that these two sets are disjoint; we will call them A and A' respectively. It is easy to prove (see [DEP99] for details) that ⟦φ⟧ is always measurable. Thus we have that A and A' are R-closed measurable sets. Since R is a bisimulation, τ_a(s, ⟦φ⟧_S) = τ'_a(s', ⟦φ⟧_{S'}) for all a ∈ A. So s and s' satisfy the same modal formulas of the form ⟨a⟩_q φ.

(⇐): In our previous papers [DEP98, DEP99] we constructed a quotient process obtained by defining an equivalence relation on the states of a process S = (S, i, Σ, τ) as follows. Briefly, two states were defined to be equivalent if they satisfy the same formulas of the logic; we write s ≈ s' when this is the case. We then form the quotient Q = (Q, q0, Ω, ρ) with the canonical projection q : S → Q, where Ω is the largest σ-field such that q is measurable. We defined a transition probability ρ on the quotient process and proved the following properties:

1. B ∈ Ω if and only if q^{-1}(B) ∈ Σ,
2. ∀s ∈ S, B ∈ Ω : ρ(q(s), B) = τ(s, q^{-1}(B)).

Now we will use these facts to show that the relation ≈ defined on the states of S is in fact a bisimulation relation. Let A ∈ Σ be ≈-closed. Then we have A = q^{-1}(q(A)) and hence q(A) ∈ Ω. Now if s ≈ s', then q(s) = q(s'), and τ_a(s, A) = ρ_a(q(s), q(A)) = τ_a(s', A), as wanted. It is straightforward to adapt this to the case where we are talking about bisimulation between two processes.

Corollary 3.3 Bisimulation is an equivalence relation.

Proof. Suppose that S and S' are bisimilar and that S' and S'' are bisimilar. This means that we have relations R ⊆ S × S' and R' ⊆ S' × S'' which are bisimulations. Related states satisfy the same formulas. Thus given sRs' and s'R's'' we know that s and s'' satisfy the same formulas. Thus, by Theorem 3.2, we can find a bisimulation relation between them.
3.3 Simulation
It will be convenient to explicitly define the notion of direct sum of two labeled Markov processes.

Definition 3.4 Let S = (S, i, Σ, τ) and S' = (S', i', Σ', τ') be two labeled Markov processes. The direct sum of these processes is a process U = (U, u0, Ω, ρ) with U = S ⊎ S' ⊎ {u0}, where u0 is a new state, Ω is the σ-field generated by Σ ∪ Σ', and ∀a ∈ A, ρ_a(u0, {i}) = ρ_a(u0, {i'}) = 1/2, and for all s ∈ S, s' ∈ S', ρ_a(s, A ⊎ A') = τ_a(s, A) and ρ_a(s', A ⊎ A') = τ'_a(s', A').

The choice of 1/2 as the transition probability is arbitrary. This construction is purely formal and is only used in order to define a relation on a common state space. With this definition we do not, for example, have an associative direct sum. However this is of no importance for the use that we make of this definition.

Definition 3.5 Let S = (S, i, Σ, τ) be a labeled Markov process. A reflexive and transitive relation (a preorder) R on S is a simulation if whenever sRs', with s, s' ∈ S, we have that for all a ∈ A and every R-closed measurable set A ∈ Σ, τ_a(s, A) ≤ τ_a(s', A). R is a strict simulation if the inequality is strict whenever τ_a(s', A) > 0. We say s is simulated (strictly simulated) by s' if sRs' for some simulation (resp. strict simulation) relation R. Let S = (S, i, Σ, τ) and S' = (S', i', Σ', τ') be a pair of labeled Markov processes. S is simulated (strictly simulated) by S' if there is a simulation (resp. strict simulation) relation on some process U of which S and S' are direct summands, relating i and i' in U.

Proposition 3.6 Every bisimulation is a simulation, and simulation and strict simulation are transitive. In fact we also have that if S strictly simulates S', which in turn simulates S'', then S strictly simulates S''.

Proof. We prove that simulation is transitive. First let us consider two simulations R1 and R2 on a single process S = (S, i, Σ, τ). Let R be the transitive closure of R1 ∪ R2.
Every measurable R-closed set is also R_i-closed, i = 1, 2, and it follows easily that R is a simulation on S. Now let R1 be a simulation between S and S' through process U1 and R2 a simulation between S' and S'' through process U2. Then construct the direct sum U of U1 and U2 and consider R, the transitive closure of R1 ∪ R2 on U, as above. Then R is a simulation on U that relates i and i'', and S, S'' are direct summands of U.
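For finite processes the direct sum of Definition 3.4 is a purely mechanical construction; here is a sketch, in which the state-tagging scheme and names are ours.

```python
def direct_sum(tau1, i1, tau2, i2, labels):
    """Direct sum of two finite processes with transition functions
    tau[a][s] = {successor: probability}.  States of the first process are
    tagged (1, s), states of the second (2, s), and a fresh initial state
    'u0' moves with probability 1/2 to each original initial state on every
    label, as in Definition 3.4."""
    rho = {}
    for a in labels:
        rho[a] = {'u0': {(1, i1): 0.5, (2, i2): 0.5}}
        for s, succ in tau1.get(a, {}).items():
            rho[a][(1, s)] = {(1, t): p for t, p in succ.items()}
        for s, succ in tau2.get(a, {}).items():
            rho[a][(2, s)] = {(2, t): p for t, p in succ.items()}
    return rho, 'u0'
```

The tagging realizes the disjoint union S ⊎ S': a simulation or bisimulation between the two summands is then just a relation on the states of the one combined process.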
The notion of simulation meshes properly with the logic in the sense of the following proposition.

Proposition 3.7 If s is simulated (or strictly simulated) by s', then for all formulas φ ∈ L∨, s |= φ implies s' |= φ.
Proof. Let R be a simulation on a single process S = (S, i, Σ, τ). We prove by induction on the structure of formulas that for every formula φ, ⟦φ⟧ is R-closed, which implies the result. It is obvious for T, conjunction and disjunction. Now assume it is true for φ, and let sRs'. Then, since R is a simulation and ⟦φ⟧ is measurable and R-closed, we have τ_a(s, ⟦φ⟧) ≤ τ_a(s', ⟦φ⟧), and hence ⟦⟨a⟩_q φ⟧ is R-closed for every rational q. Now if s and s' come from two different processes, observe that if S is a direct summand of U, a state of S satisfies exactly the same formulas in S as in U. Hence the result.
3.4 Adding least fixpoints to $\mathcal{L}$
As summarized in Subsection 3.2, we showed in [DEP98] that bisimulation is characterized by a very weak logic. In the present paper we go in the opposite direction. We are interested in reasoning about a process in terms of its finite approximants, so we would like the logic to be as rich as possible, in order that very rich forms of reasoning can be carried out using the approximation mechanism that we have developed. Recall the logic $\mathcal{L}_\vee$ obtained by adding a countable disjunction to $\mathcal{L}$. Formally, we have

$\mathcal{L}_\vee ::= \top \mid \phi_1 \wedge \phi_2 \mid \langle a\rangle_q \phi \mid \bigvee\{\phi_i \mid i \in \mathbb{N}\}$,

with the definition of the relation $\models$ extended by:

$t \models \bigvee\{\phi_i \mid i \in \mathbb{N}\} \iff (\exists j)\ t \models \phi_j$.

Theorem 3.2 can be trivially extended to show the following.

Proposition 3.8 Let states $s$ and $t$ of a process $S$ be bisimilar. Then $(\forall \phi \in \mathcal{L}_\vee)\ [s \models \phi \iff t \models \phi]$.

The presence of countable disjunction enables the definition of least fixed points, because of the continuity of the constructions in the logic.

Proposition 3.9 Let $F[X]$ be an expression built from the following grammar:

$X \mid \top \mid \phi_1 \wedge \phi_2 \mid \langle a\rangle_q \phi \mid \bigvee\{\phi_i \mid i \in \mathbb{N}\}$

Let $S$ be any process and let $\{A_i \mid i \in \mathbb{N}\}$ be an increasing sequence of measurable subsets of $S$ such that $\bigcup_i A_i = A$. Then

$\llbracket F[A/X]\rrbracket = \bigcup_i \llbracket F[A_i/X]\rrbracket.$
Proof. The proof proceeds by induction on the structure of $F[X]$. We actually have to prove in addition that the sets $\llbracket F[A_i/X]\rrbracket$ form an increasing chain, which we carry along as an inductive hypothesis; we omit explicitly verifying this, as it is easy. The base cases $X$ and $\top$ are immediate. We consider the inductive cases below.
Let $F[X] = F_1[X] \wedge F_2[X]$. Then

$\llbracket F[A/X]\rrbracket = \llbracket F_1[A/X]\rrbracket \cap \llbracket F_2[A/X]\rrbracket = \bigcup_i \llbracket F_1[A_i/X]\rrbracket \cap \bigcup_i \llbracket F_2[A_i/X]\rrbracket$, by induction, $= \bigcup_i \big[\llbracket F_1[A_i/X]\rrbracket \cap \llbracket F_2[A_i/X]\rrbracket\big]$.

Let $F[X] = \bigvee\{F_j[X] \mid j \in \mathbb{N}\}$. Then

$\llbracket F[A/X]\rrbracket = \bigcup_j \llbracket F_j[A/X]\rrbracket = \bigcup_j \big[\bigcup_i \llbracket F_j[A_i/X]\rrbracket\big]$, by induction, $= \bigcup_i \big[\bigcup_j \llbracket F_j[A_i/X]\rrbracket\big]$.

Let $F[X] = \langle a\rangle_q F_1[X]$. Then

$\llbracket F[A/X]\rrbracket = \{s \mid q < \tau_a(s, \llbracket F_1[A/X]\rrbracket)\} = \{s \mid q < \tau_a(s, \bigcup_i \llbracket F_1[A_i/X]\rrbracket)\} = \{s \mid q < \sup_i \tau_a(s, \llbracket F_1[A_i/X]\rrbracket)\}$, because the $\llbracket F_1[A_i/X]\rrbracket$ form an increasing chain, $= \bigcup_i \{s \mid q < \tau_a(s, \llbracket F_1[A_i/X]\rrbracket)\} = \bigcup_i \llbracket F[A_i/X]\rrbracket$.
Thus, the least fixpoint operator, $\mu$, is definable in the logic $\mathcal{L}_\vee$.
3.5 Examples
In this section, we describe probabilistic versions of some standard temporal operators. In these examples, we essentially follow the encoding of temporal operators in the modal $\mu$-calculus. For notational simplicity, we use only the label $a$ in the following.

$EF\phi = \mu X.\ \phi \vee \langle a\rangle_0 X$
$A_q F\phi = \mu X.\ \phi \vee \langle a\rangle_q X$
$E\,\phi\,U\,\psi = \mu X.\ \psi \vee [\phi \wedge \langle a\rangle_0 X]$
$A_q\,\phi\,U\,\psi = \mu X.\ \psi \vee [\phi \wedge \langle a\rangle_q X]$

Here $F\phi$ is the temporal logic formula corresponding to "eventually $\phi$". $EF\phi$ captures the idea that there is a path along which $\phi$ eventually holds. $A_qF\phi$ captures the idea that at any state either $\phi$ is true or one can make a transition with probability at least $q$ to states which eventually satisfy $\phi$. The $U$s are until formulas. $E\,\phi\,U\,\psi$ captures the idea that there is a path along which $\phi$ holds until $\psi$ holds. $A_q\,\phi\,U\,\psi$ captures the idea that at any state, either $\psi$ holds, or $\phi$ holds and we can make a transition with probability at least $q$ to states where this property is again true.
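On a finite-state process these least fixpoints can be computed by Kleene iteration from the empty set, since Proposition 3.9 guarantees the continuity needed for convergence. The following sketch (hypothetical dictionary encoding of the process, as elsewhere in our illustrations) computes $\llbracket EF\phi\rrbracket = \mu X.\ \phi \vee \langle a\rangle_0 X$:

```python
def diamond(tau, a, X, q=0.0):
    """Semantics of <a>_q X on a finite process: the states from which the
    a-transition probability into X exceeds q."""
    return {s for s in tau[a]
            if sum(p for t, p in tau[a][s].items() if t in X) > q}

def ef(tau, a, phi):
    """Least fixpoint mu X. phi v <a>_0 X, by Kleene iteration from the
    empty set; each pass adds the states that can step into the current X."""
    X = set()
    while True:
        Xn = phi | diamond(tau, a, X)
        if Xn == X:
            return X
        X = Xn
```

For a chain $s_0 \to s_1 \to s_2$ (with positive probabilities) and $\phi = \{s_2\}$, the iteration yields $\{s_0, s_1, s_2\}$, while an isolated self-looping state stays outside.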
4 Finite Approximation and Reconstruction

This section describes the heart of our approximation results. We construct a family of finite-state processes that approximate a given process. We show that one can reconstruct the original process (more precisely, a bisimulation equivalent of the original process) from the approximants. We do not reconstruct the original state space (this has to be known), but we reconstruct all the transition probability information, i.e. the dynamical aspects of the process. The construction can be viewed as a kind of "unfolding" construction. The approximants that we construct are trees, thus a very special kind of finite-state process. As the approximation is refined, more and more transitions become possible. There are two parameters to the approximation: one is a natural number $n$, and the other is a positive real $\epsilon$. The number $n$ gives the number of successive transitions possible from the start state, i.e. the height of the tree. The number $\epsilon$ measures the accuracy with which the probabilities approximate the transition probabilities of the original process. The states of the approximants are labeled with subsets of the state space of the original process.

Given a labeled Markov process $S = (S, i, \Sigma, \tau)$, an integer $n$ and $\epsilon > 0$, we construct a finite approximation $S(n, \epsilon)$ to it. We will prove that $S$ simulates all its finite approximations, so that in some sense the approximants really only capture properties of the original process. In later sections we prove much stronger statements about how the approximations converge in a suitable metric and how the logical reasoning principles also "converge". $S(n, \epsilon)$ is an $n$-step unfolding approximation of $S$. Its state space is divided into $n + 1$ levels, numbered $0, 1, \ldots, n$. A state is a pair $(A, l)$ where $A \in \Sigma$ and $l \in \{0, 1, \ldots, n\}$. At each level the sets that define states form a partition of $S$. The initial state of $S(n, \epsilon)$ is at level $n$, and transitions only occur from a state of one level to a state of the next lower level. Thus, in particular, states of level 0 have no outgoing transitions. In the following we omit the curly brackets around singleton sets; this only happens in finite-state processes.
Definition 4.1 Let $(S, i, \Sigma, \tau)$ be a labeled Markov process. We denote the finite approximation by $S(n, \epsilon) = (P, p_0, \mathcal{P}(P), p)$, where $P$ is a subset of $\Sigma \times \{0, \ldots, n\}$. It is defined as follows, for $n \in \mathbb{N}$ and $\epsilon > 0$. $S(n, \epsilon)$ has $n + 1$ levels, and states are defined inductively with respect to the level they are in.

Level 0 has one state, $(S, 0)$. Now, given level $l$ containing $m$ states, we define level $l + 1$ as follows. Consider $(B_i)_{i \in I}$, the partition of $[0, 1]$ into intervals of size $\epsilon/m$: $\{\{0\}, (0, \epsilon/m], (\epsilon/m, 2\epsilon/m], \ldots\}$. States of level $l + 1$ are obtained from the partition of $S$ generated by the sets $\tau_a(\cdot, C)^{-1}(B_i)$, for $C$ a union of sets appearing in level $l$, every label $a \in \{a_1, \ldots, a_n\}$, and every $i \in I$. So if a set $A$ is in this partition of $S$, then $(A, l + 1)$ is a state of level $l + 1$. Transitions can happen only from a state of level $l + 1$ to a state of level $l$, and the transition probability function is given by

$p_a((A, i), (B, j)) = \inf_{t \in A} \tau_a(t, B)$ if $j = i - 1$, and $0$ otherwise.

The initial state $p_0$ of $S(n, \epsilon)$ is the state $(A, n)$ such that $A$ contains $i$, the initial state of $S$.

We could actually produce approximants with fewer states if we were a bit more careful, but this makes little difference to the discussion in this paper. If $B = \bigcup B_i$, where $(B_i, l) \in S(n, \epsilon)$, we will sometimes write $p_a((A, l+1), (B, l))$ to mean $\sum_{i \in I} p_a((A, l+1), (B_i, l))$. If $s \in S$, we denote by $(A_s, l)$ the unique state at level $l$ such that $s \in A_s$. The following lemma is a simple but useful result.
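For a *finite* process the construction of Definition 4.1 can be carried out literally: two states go into the same level-$(l{+}1)$ block exactly when, for every label and every union $C$ of level-$l$ blocks, their transition probabilities into $C$ fall in the same interval of width $\epsilon/m$. The sketch below enumerates all unions, so it is exponential in the number of blocks; it is an illustration of the definition, not an efficient algorithm, and the dictionary process encoding is ours.

```python
from itertools import combinations
from math import floor

def bucket(p, width):
    """Index of p in the partition {0}, (0,w], (w,2w], ... of [0,1]."""
    if p == 0:
        return 0
    k = floor(p / width)
    return k if p == k * width else k + 1  # right-closed intervals

def approximate(states, labels, tau, n, eps):
    """Levels of S(n, eps) for a finite process: level l+1 groups states by
    the interval in which tau_a(., C) falls, for every union C of level-l
    blocks (Definition 4.1)."""
    levels = [[frozenset(states)]]            # level 0: a single block
    for _ in range(n):
        blocks = levels[-1]
        m = len(blocks)
        unions = [frozenset().union(*c) for k in range(1, m + 1)
                  for c in combinations(blocks, k)]
        def signature(s):
            return tuple(bucket(sum(tau[a][s].get(t, 0) for t in C), eps / m)
                         for a in labels for C in unions)
        part = {}
        for s in states:
            part.setdefault(signature(s), set()).add(s)
        levels.append([frozenset(b) for b in part.values()])
    return levels

def p_hat(tau, a, A, B):
    """Approximant transition probability: inf (here min) over A of tau_a(., B)."""
    return min(sum(tau[a][s].get(t, 0) for t in B) for s in A)
```

With $\tau_a(s, \{u\}) = 0.5$ and $\tau_a(t, \{u\}) = 0.9$, taking $\epsilon = 0.2$ separates $s$, $t$ and $u$ at level 1, while the coarser $\epsilon = 1$ keeps $s$ and $t$ together.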
Lemma 4.2 Let $S$ be a labeled Markov process, and $s \in S$. In $S(n, \epsilon)$, if $B$ is a finite union of sets appearing at level $l$, then $0 \le \tau_a(s, B) - p_a((A_s, l+1), (B, l)) \le \epsilon$.
Proof. Let $(A, l+1)$ and $(B_i, l)$, $i = 1, \ldots, j$, be states of $S(n, \epsilon)$. Then for all $s, t \in A$ we have $|\tau_a(s, B_i) - \tau_a(t, B_i)| < \epsilon/m$, because of the way $S$ is partitioned at level $l + 1$ ($m$ is the number of states at level $l$). Since $j \le m$, the result follows trivially.

It turns out that every state $(A, l)$ in $S(n, \epsilon)$ is simulated in $S$ by every state $s \in A$.

Proposition 4.3 Every labeled Markov process $S$ simulates all its finite approximations. More precisely, every state $(A, l)$ of $S(n, \epsilon)$ is simulated in $S$ by every $s \in A$.
Proof. Let $S(n, \epsilon) = (P, p_0, \mathcal{P}(P), p)$ and let $U = (U, u_0, \Omega, \rho)$ be the direct sum of $S(n, \epsilon)$ and $S$. Let $R$ be the reflexive relation on $U$ relating each state $(A, l)$ from $S(n, \epsilon)$ to every state $s \in A$ from $S$. We prove that $R$ is a simulation. Let $X \in \Omega$ be $R$-closed, that is, $X \cap S \in \Sigma$ and $R(X \cap P) \subseteq X \cap S$. Now consider two related states, $(A, l)$ and $s \in A$. The only positive transitions from $(A, l)$ are to states of the form $(B', l-1)$, so we can assume that $X \cap P$ is a union $\mathcal{B}$ of states of level $l - 1$; let $B$ denote the corresponding union of subsets of $S$, so that $R(\mathcal{B}) = B$. By the preceding lemma we have

$\rho_a((A, l), \mathcal{B} \cup B) = p_a((A, l), \mathcal{B}) \le \tau_a(s, B) = \rho_a(s, \mathcal{B} \cup B)$,

and hence the result.

The following lemma shows how the interpretation of logical formulas of $\mathcal{L}$ interacts with the approximation. We use the notation depth($\phi$) to stand for the maximum depth of nesting of the modal operator in the formula $\phi$.

Lemma 4.4 Let $(S, i, \Sigma, \tau)$ be a labeled Markov process. For every formula $\phi$ we have
$\llbracket\phi\rrbracket_S = \bigcup_i C_i$, with $(C_i, l) \in S(n, \epsilon)$ and $C_i \subseteq \llbracket\phi\rrbracket_S$,

where $n \ge l \ge$ depth($\phi$) and all the probabilities occurring in $\phi$ are integer multiples of $\epsilon$.
Proof. The proof is by induction on the structure of formulas. It is trivial for $\top$. Now assume it is true for $\phi$ and $\psi$. Suppose that all the probabilities occurring in $\phi \wedge \psi$ are multiples of $\epsilon$; then this is true for both $\phi$ and $\psi$. Now let $l \ge$ depth($\phi \wedge \psi$); then $l \ge$ depth($\phi$) and $l \ge$ depth($\psi$). Consequently, we have $\llbracket\phi\rrbracket_S = \bigcup_i C_i$, where the index $i$ runs over states (subsets of $S$) occurring at level $l$, i.e. $(C_i, l) \in S(n, \epsilon)$ and $C_i \subseteq \llbracket\phi\rrbracket_S$, and similarly for $\psi$. Since the sets involved in one level are all disjoint, it is easy to see that the lemma is also true for $\phi \wedge \psi$. Now assume the lemma is true for $\phi$ and consider the formula $\langle a\rangle_q \phi$. Let $l$ and $\epsilon$ be as above for this formula. By the induction hypothesis, $\llbracket\phi\rrbracket_S$ is a union of sets appearing at level $l - 1$ in $S(n, \epsilon)$ (because depth($\phi$) $\le n - 1$). Therefore, the sets of level $l$ refine the sets $C_i = \tau_a(\cdot, \llbracket\phi\rrbracket_S)^{-1}((i\epsilon/m, (i+1)\epsilon/m])$, and since $q$ is an integer multiple of $\epsilon/m$, there is some $j$ such that $j\epsilon/m = q$; hence the sets partitioning the $C_i$ for $i \ge j$ form a partition of $\llbracket\langle a\rangle_q \phi\rrbracket_S$, and we have the result.
The next theorem is the main result of this section. It shows how the original process can be reconstructed from the approximants.

Theorem 4.5 Let $(S, i, \Sigma, \tau)$ be a labeled Markov process that is maximally collapsed, that is, $S = S/{\sim}$. If we are given all the finite approximations $S(n, \epsilon)$, we can recover $(S, i, \Sigma, \tau)$.
Proof. We can recover the state space trivially by taking the union of the states at any level of any approximation. We know from the fact that $S$ is maximally collapsed that $\Sigma$ is generated by the sets of the form $\llbracket\phi\rrbracket$; see [DEP99] for details. Thus Lemma 4.4 implies that

$\mathcal{B} := \{B : (B, n) \in S(n, \epsilon) \text{ for some } n \in \mathbb{N} \text{ and some } \epsilon > 0\}$

generates $\Sigma$ (obviously, $\mathcal{B} \subseteq \Sigma$).

The main difficulty is that we have to recover the transition probability function. To do so, let $F(\mathcal{B})$ be the set containing finite unions of sets in $\mathcal{B}$. We first argue that $F(\mathcal{B})$ forms a field, then we define $\rho_a(s, \cdot)$ on it, and we show that $\rho_a(s, \cdot)$ and $\tau_a(s, \cdot)$ agree on it for all $s \in S$. This implies that $\rho_a(s, \cdot)$ is finitely additive on $F(\mathcal{B})$, hence that it can be extended uniquely to a measure on $\Sigma$, and hence that $\rho_a$ and $\tau_a$ agree on $\Sigma$, as desired.

We show that $F(\mathcal{B})$ forms a field. It is obviously closed under finite unions. To see that it is also closed under intersection and complementation, note that if $(C, n) \in S(n, \epsilon)$, then for all $m > n$ and all $\epsilon'$ such that $\epsilon$ is an integer multiple of $\epsilon'$, $C$ is a union of a family of sets $C_i$ such that $(C_i, m) \in S(m, \epsilon')$. Now let $C \in F(\mathcal{B})$, $s \in S$, $a \in \mathcal{A}$, and let

$\rho_a(s, C) := \sup_{n, \epsilon} \sum_{B \subseteq C,\ (B, n-1) \in S(n, \epsilon)} p_a((A_s, n), (B, n-1)).$
We prove that $\rho_a(s, \cdot)$ and $\tau_a(s, \cdot)$ agree on $F(\mathcal{B})$ for all $s \in S$. Obviously, $\rho_a(s, C) \le \tau_a(s, C)$ for $C \in F(\mathcal{B})$. We now prove the reverse inequality. Let $m$ be the number of states at level $n - 1$ in $S(n, \epsilon)$. Then

$\sup_{n,\epsilon} \sum_{B \subseteq C,\ (B,n-1) \in S(n,\epsilon)} p_a((A_s, n), (B, n-1)) \ \ge\ \sup_{n,\epsilon} \sum_{B \subseteq C,\ (B,n-1) \in S(n,\epsilon)} \big(\tau_a(s, B) - \epsilon/m\big) \ \ge\ \sup_{n,\epsilon} \big(\tau_a(s, \textstyle\bigcup B) - \epsilon\big) \ \ge\ \sup_{(n,\epsilon) \in I} \big(\tau_a(s, C) - \epsilon\big) \ =\ \tau_a(s, C),$
where $I$ is the set of pairs $(n, \epsilon)$ such that in $S(n, \epsilon)$, level $n$ contains a partition of $C$ (note that arbitrarily small $\epsilon$'s are involved in $I$). This concludes the proof that $\rho$ and $\tau$ agree, and we are done.

The next proposition shows how the partitions produced in the construction define equivalence relations which give, "in the limit", the bisimulation relation.

Proposition 4.6 Let $S = (S, i, \Sigma, \tau)$ be a process. Then states $s, t \in S$ are bisimilar iff they are in the same partition in all finite approximations to $S$.
Proof. It is not hard to show by induction that every union of states $(B_i)_{1 \le i \le k}$ at some level of an approximation corresponds to a formula of $\mathcal{L}_\vee$ (with $\vee$ here the finite disjunction) to which we add negation, and that this formula is satisfied in $S$ by, and only by, the states that belong to $\bigcup_i B_i$. So if two states do not belong to the same state of an approximation, they are distinguishable by a formula and hence are not bisimilar. Conversely, if $s$ and $t$ are not bisimilar, then there is a formula $\phi \in \mathcal{L}$ that distinguishes them; assume $s \models \phi$ and $t \not\models \phi$. Then by Theorem 4.7, there is an approximation $S(n, \epsilon)$ such that $(A_s, n) \models \phi$. The state $t$ cannot be in $A_s$, for it would then satisfy $\phi$ by Proposition 4.3, which is not true.
This last result justifies working with the approximants if one is interested in reasoning about bisimulation. The next result shows that if one is interested in logical reasoning about processes, then any formula satisfied by a process is already satisfied by one of its finite approximants.
Theorem 4.7 If a state $s \in S$ satisfies a formula $\phi$, then there is some approximation $S(n, \epsilon)$ such that $(A_s, n) \models \phi$.

Proof. The proof is by induction on the structure of formulas. However, a direct induction does not work in any obvious way; we need to prove a significantly stronger result in order to use a stronger induction hypothesis. We prove that for every formula $\phi$ and every $l \ge$ depth($\phi$) there is an increasing sequence $(A_n)_{n \ge l}$ of sets in $\Sigma$ which satisfies:

(i) $\bigcup_{n \ge l} A_n = \llbracket\phi\rrbracket_S$;
(ii) there exist $(C_i, l) \in S(n, 1/2^n)$, $i = 1, \ldots, m$, such that $A_n = \bigcup_{i=1}^m C_i$, for $n \ge l$;
(iii) the states $(C_i, l)$ satisfy $\phi$ in $S(n, 1/2^n)$.

This is obvious for $\top$, for which we choose $A_n = S$ for all $n$. We fix $\epsilon_n = 1/2^n$. Note that every non-trivial formula is of the form $\bigwedge_{i=1}^j \langle a_i\rangle_{q_i} \phi_i$, so assume the claim is true for each $\phi_i$, $i = 1, \ldots, j$, and let $l \ge$ depth($\bigwedge_{i=1}^j \langle a_i\rangle_{q_i} \phi_i$). Then $l - 1 \ge$ depth($\phi_i$) for all $i = 1, \ldots, j$. Let $(A^i_n)_{n \ge l-1}$ be the sequence for $\phi_i$ at level $l - 1$. Now define, for $n \ge l$, the sequence

$B_n = \{s \in S : \tau_{a_i}(s, A^i_n) > q_i + \epsilon_n,\ i = 1, \ldots, j\}.$

Note that this is an increasing sequence of sets in $\Sigma$. We first prove (i), that is, for every $s \models \phi$ there is some $n$ such that $s \in B_n$. So assume $\tau_{a_i}(s, \llbracket\phi_i\rrbracket) > q_i$ for all $i = 1, \ldots, j$. Then, since $\tau_{a_i}(s, \cdot)$ is a measure and $A^i_n$ is an increasing sequence which converges to $\llbracket\phi_i\rrbracket$, there is some $n$ such that $\tau_{a_i}(s, A^i_n) > q_i$, $i = 1, \ldots, j$. Moreover, there is some $n$ such that $\tau_{a_i}(s, A^i_n) > q_i + \epsilon_n$, $i = 1, \ldots, j$, because $A^i_n$ is increasing and $\epsilon_n$ decreases to 0. Thus $s \in B_n$ and (i) is proved.

We now prove (ii) and (iii). Let $s \in B_n$, for a fixed $n \ge l$. Then, because all states $(A, l-1)$ with $A \subseteq A^i_n$ satisfy $\phi_i$, and by Lemma 4.2, we have

$p_{a_i}((C_s, l), (\llbracket\phi_i\rrbracket_{S(n,\epsilon_n)}, l-1)) \ge p_{a_i}((C_s, l), (A^i_n, l-1)) \ge \tau_{a_i}(s, A^i_n) - \epsilon_n > q_i + \epsilon_n - \epsilon_n = q_i,$

and hence $(C_s, l) \models \phi$. This means that $B_n$ is a union of sets of level $l$ which satisfy $\phi$, as required for (ii) and (iii). So the proof is complete.
We conclude this section with a discussion of a special class of finite processes. Finite processes with tree-like structure and rational transition probabilities play a special role; we examine their properties in the next results. For brevity we will just say "rational tree" for a finite-state process with a tree-like transition graph and rational transition probabilities. We have already seen that the processes of the form $S(n, \epsilon)$ are always trees. It turns out that if we look only at rational trees we can still approximate all labeled Markov processes. We will make this precise in a later section, after we have introduced a metric between processes. Here we establish some basic facts about simulation and strict simulation between rational trees and other processes. In the present discussion we need to work with the strict simulation relation rather than plain simulation, because the results fail for ordinary simulation, for technical reasons having to do with strict inequalities providing more "room to maneuver". The next few results state special properties of simulation between finite processes.

Lemma 4.8 If a process simulates a finite process, then it simulates it through their direct sum.
Proof. Assume there is a simulation $R$ between $\mathcal{P}$ and $S$. Consider the relation $W$, induced by the logic, on the direct sum of $\mathcal{P}$ and $S$, defined as follows: $p \in P$ is related to $s \in S$ if $s$ satisfies all the formulas that $p$ satisfies. Then $W$ relates the initial states $p_0$ of $\mathcal{P}$ and $i$ of $S$, since they are related by the simulation $R$ and hence, by Proposition 3.7, $i$ satisfies all the formulas that $p_0$ satisfies. Now let $pWs$ and let $Y$ be a $W$-closed set in the direct sum. Then $Y \cap S$ must contain $B = \bigcup_{p' \in Y \cap P} \bigcap_{p' \models \phi} \llbracket\phi\rrbracket_S$ by definition of $W$. It is not hard to show that $B$ and the corresponding set $A$ in $\mathcal{P}$ (where the $\llbracket\phi\rrbracket_P$ are considered in $\mathcal{P}$) are the limits of a decreasing chain of formulas, which implies that $\rho_a(p, A) \le \tau_a(s, B)$, since this inequality holds for formulas because $pWs$. Observe that if $Y$ is $W$-closed, then $Y \cap P = A$ by definition of $W$. Hence we have

$\rho_a(p, Y \cap P) \le \rho_a(p, A) \le \tau_a(s, B) \le \tau_a(s, Y \cap S)$,

as wanted.
Corollary 4.9 If a process strictly simulates a rational tree, then it strictly simulates it through their direct sum.
Proof. Let $T$ be a finite rational tree strictly simulated by $S$. Consider a finite rational tree $T'$ obtained from $T$ by increasing all the probabilities of the non-zero transitions in such a way that $T'$ is still simulated by $S$. Then it is easy to see that $T'$ strictly simulates $T$, through the relation $R$ that relates a state in $T$ to the same state in $T'$; this relation lives on the direct sum of $T$ and $T'$. Now we know that $T'$ is simulated by $S$ through the direct sum of $T'$ and $S$ (the above result is for simulation relations). Let $R'$ be the witnessing relation. We say $tWs$ if $t'R's$, where $t'$ is the state in $T'$ corresponding to $t$. Now let $Y$ be a $W$-closed set of $T + S$. Then $R(Y \cap T)$ is obviously measurable in $T'$ and $R$-closed. Moreover, $R(Y \cap T) \cup (Y \cap S)$ is $R'$-closed. Hence we have

$\tau_a(t, Y \cap T) < \tau'_a(t', R(Y \cap T)) \le \tau_a(s, Y \cap S).$
The preceding results allow us to use a simpler definition of simulation and strict simulation, not involving direct sums, when a finite process is involved. It is easy to check that the following definition of simulation and strict simulation is equivalent to the one given previously.

Definition 4.10 A simulation between a finite process $\mathcal{P} = (P, p_0, \mathcal{P}(P), \pi)$ and another process $S$ is a reflexive and transitive relation $R$ on $P \cup S$ such that the restrictions of $R$ to $P$ and to $S$ are simulations, and such that $pRs$ implies that for every $R$-closed set $A \subseteq P \cup S$ with $A \cap S \in \Sigma$, we have $\pi_a(p, A \cap P) \le \tau_a(s, A \cap S)$. If $\mathcal{P}$ is a finite tree, a simulation $R$ is a strict simulation if the above inequality is strict whenever $\tau_a(s, A \cap S) > 0$.

Now observe that if a finite process $\mathcal{P}$ is strictly simulated by some process $S$, then there is an $\epsilon > 0$ such that $\pi_a(p, A \cap P) < \tau_a(s, A \cap S) - \epsilon$ whenever $\tau_a(s, A \cap S) > 0$. We then call $R$ an $\epsilon$-strict simulation and write $R_\epsilon$ instead of $R$.
Lemma 4.11 Let $T$ be a finite tree that is strictly simulated by a labeled Markov process $S$. Then there is a finite approximation $S(n, \epsilon)$ strictly simulating $T$.

Proof. Let $R_\epsilon$ be the strict simulation between $T$ and $S$. Consider $S(n, \epsilon/4)$, where $n$ is the height of $T$. We assume that $T$ involves only the labels $a_1, \ldots, a_n$; the proof is easily adapted if this is not the case, since $T$ certainly involves only finitely many labels. We first extend $R$ to $R'$ in the following way. Let $R'$ be the transitive closure of the relation that puts $tR's$, for $t \in T$ at level $l$ and $s \in S$, whenever there is some $s' \in B$ with $tRs'$, where $(B, l)$ is the state of $S(n, \epsilon/4)$ containing $s$. We claim that $R'$ is a strict simulation between $T$ and $S$. Observe that $R'$ coincides with $R$ on both $T$ and $S$. Let $tR's$ and let $Y \subseteq T \cup S$ be an $R'$-closed set such that $Y \cap S \in \Sigma$. Then $Y$ is also $R$-closed, and $Y \cap S$ is a union of sets at level $l - 1$ of $S(n, \epsilon/4)$. Hence we have:

$\rho_a(t, Y \cap T) < \tau_a(s', Y \cap S) - \epsilon \le \tau_a(s, Y \cap S) + \epsilon/4 - \epsilon = \tau_a(s, Y \cap S) - 3\epsilon/4$,

because $s$ and $s'$ belong to the same set of level $l$. We have thus proved that $R'$ is a $3\epsilon/4$-strict simulation between $T$ and $S$.

We now define the relation $W$ between $T$ and $S(n, \epsilon/4)$. Let $W$ be the transitive closure of the reflexive relation which contains the restriction of $R'$ to $T$ and relates $t$ and $(A, l)$ if $t$ is at level $l$ in $T$ and $tR's$ for some $s \in A$. Observe that $W$ coincides with $R'$ on $T$ and is the identity relation on $S(n, \epsilon/4)$. Now take two $W$-related states $t \in T$ and $(A, l) \in S(n, \epsilon/4)$. Let $Y \subseteq T \cup P$ be $W$-closed, and let $(B, l-1)$ be the "set" formed by taking the union of the sets of the states of $Y \cap S(n, \epsilon/4)$ restricted to level $l - 1$. Then $B$ is obviously measurable in $S$ and $(Y \cap T) \cup B$ is $R'$-closed. Then, if $s$ is a state in $A$ such that $tR's$,

$\rho_a(t, Y \cap T) < \tau_a(s, B) - 3\epsilon/4 \le p_a((A, l), (B, l-1)) + \epsilon/4 - 3\epsilon/4 < p_a((A, l), Y \cap S(n, \epsilon/4))$

by Lemma 4.2. So we are done.
Theorem 4.12 Let $T$ and $T'$ be two finite trees that are strictly simulated by a labeled Markov process $S$. Then there is a finite tree which is strictly simulated by $S$ and which also strictly simulates both $T$ and $T'$; i.e., the finite rational trees that are strictly simulated by a process form a directed set.
Proof. Let $T = (T, t_0, \mathcal{P}(T), \rho)$ and $T' = (T', t'_0, \mathcal{P}(T'), \rho')$ be strictly simulated by $S$ via relations $R$ and $R'$. By Lemma 4.11, there are $n, n' \in \mathbb{N}$ and $\epsilon, \epsilon', \delta, \delta' > 0$ such that $T$ is $\delta$-strictly simulated by $S(n, \epsilon) = (P, p_0, \mathcal{P}(P), \pi)$, and similarly for $T'$. Let $\epsilon^* < \delta, \delta'$ be a common divisor of both $\epsilon$ and $\epsilon'$, and let $n^* = \max(n, n')$. We show that $T$ and $T'$ are strictly simulated by $S(n^*, \epsilon^*) = (P^*, p^*_0, \mathcal{P}(P^*), \pi^*)$.

Since $\epsilon^*$ divides $\epsilon$, the partition at any level $l$ of $S(n^*, \epsilon^*)$ is a refinement of the partition at level $l$ of $S(n, \epsilon)$. Let $W$ be the reflexive relation on $T \cup P^*$ which coincides with $R$ on $T$ and relates $t$ and $(A, l)$ if $tR(B, l)$ for some $(B, l) \in P$ such that $A \subseteq B$. Let $Y \subseteq T \cup P^*$ be $W$-closed. Then $(Y \cap T) \cup R(Y \cap T)$ is $R$-closed and $R(Y \cap T)$ is measurable when considered as a set in $S$. We now prove that $R(Y \cap T)$ is included in $Y \cap P^*$ when both are considered as sets in $S$. If $(B, l) \in R(Y \cap T)$, then there is some $t \in Y \cap T$ such that $tR(B, l)$, which implies that every $(A, l) \in S(n^*, \epsilon^*)$ with $A \subseteq B$ is in $Y \cap P^*$. Since the sets of level $l$ in $S(n^*, \epsilon^*)$ refine the sets of level $l$ in $S(n, \epsilon)$, we have $B \subseteq Y \cap P^*$ (with $Y \cap P^*$ considered as a set in $S$). Now let $s \in A \subseteq B$. Then

$\rho_a(t, Y \cap T) < \pi_a((B, l), R(Y \cap T)) - \delta \le \tau_a(s, R(Y \cap T)) - \delta \le \tau_a(s, Y \cap P^*) - \delta \le \pi^*_a((A, l), Y \cap P^*) + \epsilon^* - \delta < \pi^*_a((A, l), Y \cap P^*),$

as wanted. A similar proof shows that $T'$ is strictly simulated by $S(n^*, \epsilon^*)$.

Observe that $S(n^*, \epsilon^*)$ is a tree. We construct a finite rational tree from it. Let $T^* = (P^*, p^*_0, \mathcal{P}(P^*), \pi^{**})$, where $\pi^{**}$ is defined as follows: $T^*$ has the same state space, initial state and tree shape as $S(n^*, \epsilon^*)$, and $\pi^{**}$ is obtained by decreasing every non-zero transition probability of $S(n^*, \epsilon^*)$ to a rational number below it, in such a way that $T^*$ is still strictly simulated by $S(n^*, \epsilon^*)$ and still strictly simulates both $T$ and $T'$.
5 Probabilistic logic via functions into $[0, 1]$
We ended the last section with the proof that every formula satisfied by a process is satisfied by a finite approximant. In order to really capture the idea of approximate reasoning we need to shift from the traditional view of logical formulas to measurable functions; working with measurable functions, one has the right setting in which to talk about convergence. In this section, we present an alternate presentation of probabilistic logic using functions into the reals. Our technical development is based on the following key idea, expounded by Kozen [Koz85] to generalize logic to handle probabilistic phenomena. Some of this work has appeared before in the setting of discrete systems; the class of functions we use here is richer than in our earlier presentation, and the setting is that of continuous or discrete labeled Markov processes, but mainly the approximation results are new.

Classical logic                   | Generalization
Truth values {0, 1}               | Interval [0, 1]
Propositional function            | Measurable function
State                             | Measure
Evaluation of prop. functions     | Integration
The key results of this section are:

- We present syntactic and semantic characterizations of the functions that are used in this paper.
- We explore the expressiveness of the functional view, by showing that function expressions are at least as expressive as $\mathcal{L}_\vee$, and by showing that the functional viewpoint is sound and complete for bisimulation.
- We show that reasoning with functions on processes is finitary in nature, by showing that the value of a function at a state of a process is the limit of its values at the finite and rational approximants.
5.1 Functional expressions
We define a set of functional expressions by giving an explicit syntax. It is worth clarifying our terminology here. A functional expression becomes a function when we interpret it in a process. Sometimes we may loosely say "the same function" when we move from one process to another; what we really mean is the "same functional expression", since it obviously cannot be the same function when the domains are different. This is no different from having syntactically defined formulas of some logic which become boolean-valued functions when they are interpreted on a structure. We now give the class of functional expressions. First, some notation: let $\lfloor r\rfloor_q = r - q$ if $r > q$, and $0$ otherwise; let $\lceil r\rceil_q = q$ if $r > q$, and $r$ otherwise. Note that $\lfloor r\rfloor_q + \lceil r\rceil_q = r$.
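The two clipping operators are elementary; a direct transcription (the ASCII names `floor_q` and `ceil_q` are ours), together with a numeric check of the identity $\lfloor r\rfloor_q + \lceil r\rceil_q = r$:

```python
def floor_q(r, q):
    """|r|_q = r - q if r > q, else 0: the part of r above the threshold q."""
    return r - q if r > q else 0.0

def ceil_q(r, q):
    """|r|^q = q if r > q, else r: r clipped from above at q."""
    return q if r > q else r
```

For example, `floor_q(0.7, 0.5)` is `0.2` (up to rounding) and `ceil_q(0.7, 0.5)` is `0.5`; the two always recombine to `r`.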
Definition 5.1 For each $c \in (0, 1]$, we consider a family $\mathcal{F}^c$ of functional expressions generated by the following grammar, where $q$ is a rational in $[0, 1]$:

$f^c ::= \lambda s.1$ (constant schema)
$\quad\mid \lambda s.1 - f^c(s)$ (negation schema)
$\quad\mid \lambda s.\min(f_1^c(s), f_2^c(s))$ (min schema)
$\quad\mid \lambda s.\sup_{i \in \mathbb{N}} f_i^c(s)$ (sup schema)
$\quad\mid \lambda s.c\int_{t \in S} f^c(t)\,\tau_a(s, dt)$ (prefix schema)
$\quad\mid \lambda s.\lfloor f^c(s)\rfloor_q \mid \lambda s.\lceil f^c(s)\rceil_q$ (conditional schemas)

The functional expressions generated by these schemas will be written as $1$, $1 - f$, $\min(f_1, f_2)$, $\sup_{i \in \mathbb{N}} f_i$, $\langle a\rangle.f$, $\lfloor f\rfloor_q$ and $\lceil f\rceil_q$ respectively. The indexing set in the sup schema is the natural numbers. One can informally associate functional expressions with every connective of the logic $\mathcal{L}$ in the following way (the precise formalization is presented in Lemma 5.7): $\top$ is represented by $\lambda s.1$, conjunction by min, disjunction by sup, and negation by $1 - {}$. The content of the connective $\langle a\rangle_q$ is split into two expression schemas: the $\langle a\rangle.f$ schema, which intuitively corresponds to prefixing, and the conditional schema $\lfloor f\rfloor_q$, which captures the "greater than $q$" idea.
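On a finite process the integral in the prefix schema becomes a finite sum, so the whole grammar can be evaluated mechanically. The sketch below uses an ad-hoc nested-tuple syntax for expressions (our own encoding, not the paper's) and a finite sup in place of the countable one:

```python
def evaluate(expr, s, tau, c):
    """Evaluate a functional expression (Definition 5.1) at state s of a
    finite process given by tau[a][s] -> {target: probability}."""
    op = expr[0]
    if op == "one":                       # constant schema: lambda s. 1
        return 1.0
    if op == "neg":                       # negation schema: 1 - f
        return 1.0 - evaluate(expr[1], s, tau, c)
    if op == "min":                       # min schema
        return min(evaluate(expr[1], s, tau, c), evaluate(expr[2], s, tau, c))
    if op == "sup":                       # finite sup stands in for the countable one
        return max(evaluate(f, s, tau, c) for f in expr[1])
    if op == "prefix":                    # <a>.f : c * sum_t tau_a(s, t) * f(t)
        a, f = expr[1], expr[2]
        return c * sum(p * evaluate(f, t, tau, c) for t, p in tau[a][s].items())
    if op == "floor":                     # |f|_q
        v = evaluate(expr[1], s, tau, c)
        return v - expr[2] if v > expr[2] else 0.0
    if op == "ceil":                      # |f|^q
        v = evaluate(expr[1], s, tau, c)
        return expr[2] if v > expr[2] else v
    raise ValueError(op)
```

On a two-step chain $s_0 \to s_1 \to s_2$ with probability 1 at each step, $(\langle a\rangle.1)^c$ evaluates to $c$ at $s_0$ and $(\langle a\rangle.\langle a\rangle.1)^c$ to $c^2$, matching the pattern of Example 5.5 below.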
Definition 5.2 $\mathcal{F}^c_+$ is the sub-collection of $\mathcal{F}^c$ that does not use the negation schema. $\mathrm{Fin}^c$ is the sub-collection of $\mathcal{F}^c$ that uses only a binary version of the sup schema. $\mathrm{Fin}^c_+$ is the sub-collection that uses only a binary version of the sup schema and does not use the negation schema.

We now present a semantic characterization of the functions induced by functional expressions. Given a labeled Markov process $S$, any expression $f^c \in \mathcal{F}^c$ induces a function $f^c_S : S \to [0, 1]$.
Figure 2: Examples of labeled Markov processes
Theorem 5.3 The functions $1$, $1 - f$, $\min(f_1, f_2)$, $\lfloor f\rfloor_q$, $\lceil f\rceil_q$ can be used to approximate any continuous Lipschitz function from $[0, 1]$ to $[0, 1]$.
Proof. We exploit a property shown in [Wei97]: given any continuous function $f : [0, 1] \to [0, 1]$ satisfying $|f(x) - f(y)| \le |x - y|$, and any $\epsilon > 0$, there exists a continuous polygonal function $g$, built out of line segments of $x$-length $\epsilon$ with slope $0$, $1$ or $-1$, such that for all $x \in [0, 1]$, $|f(x) - g(x)| < \epsilon$. The proof of this is straightforward: we construct the function $g$ from left to right by appending the appropriate segment each time, showing that there is always a segment enabling $g$ to satisfy the property. So we need to construct these polygonal functions from the given functions. We build any polygonal function by showing that to any such function $f_x$ with domain $[0, x]$ we can append a segment of slope $0$, $1$ or $-1$ to get a new function $f_x^+$ which agrees with $f_x$ on $[0, x]$ and has slope $0$, $1$ or $-1$ after that. For instance, to add a segment of slope $1$, extend the segment backward until it meets the $X$ or $Y$ axis; suppose it hits the $X$ axis at $z$. We set $f_x^+ = \max(f_x, \lfloor I\rfloor_z)$, where $I$ is the identity function. We know from the Lipschitz property that $f_x \ge \lfloor I\rfloor_z$ over $[0, x]$. The other cases are dealt with similarly.
This shows that we can replace the constant schema, min schema, negation schema and conditional schemas with a single schema $\lambda s.g(f(s))$, where $g$ is any continuous Lipschitz function. To get the positive versions of the schemas, we restrict $g$ to monotone continuous Lipschitz functions. With this characterization in hand, we can observe that the Hutchinson metric is very close in spirit to our metric.
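The left-to-right construction in the proof can be sketched numerically. The greedy variant below (our simplification: it chooses, at each grid point $k\epsilon$, the slope in $\{-1, 0, +1\}$ whose endpoint lands closest to $f$, and omits clipping to $[0, 1]$) tracks a 1-Lipschitz $f$ to within $\epsilon$ at the grid points:

```python
def polygonal_approx(f, eps):
    """Greedy sketch of the polygonal construction in Theorem 5.3: approximate
    a 1-Lipschitz f : [0,1] -> [0,1] by a polygonal g whose segments have
    x-length eps and slope in {-1, 0, +1}. Returns g's values at k*eps.

    Invariant: |g(k*eps) - f(k*eps)| <= eps, since f moves at most eps per
    step and the three candidate endpoints are eps apart."""
    n = round(1 / eps)
    g = [f(0.0)]
    for k in range(1, n + 1):
        target = f(k * eps)
        candidates = [g[-1] - eps, g[-1], g[-1] + eps]
        g.append(min(candidates, key=lambda v: abs(v - target)))
    return g
```

For example, with $f(x) = |x - 0.37|$ and $\epsilon = 0.1$, the returned grid values stay within $0.1$ of $f$.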
5.2 Examples
Example 5.4 Consider the processes $A_1$ and $A_2$ of Figure 2; all transitions are labeled $a$. The functional expression $(\langle a\rangle.1)^c$ evaluates to $c$ at states $s_0, s_2$ of both $A_1$ and $A_2$; it evaluates to $0$ at states $s_1, s_3$ of $A_1$ and $s_3, s_4$ of $A_2$; and it evaluates to $c/2$ at state $s_1$ of $A_2$. The functional expression $(\langle a\rangle.\langle a\rangle.1)^c$ evaluates to $3c^2/4$ at the states $s_0$ of $A_1, A_2$ and to $0$ elsewhere. The functional expression $(\langle a\rangle.\lfloor\langle a\rangle.1\rfloor_{1/2})^c$ evaluates to $3c^2/8$ at state $s_0$ of $A_1$ and to $c^2/4$ at state $s_0$ of $A_2$.
Example 5.5 Consider the process $A_3$ of Figure 2; all transitions are labeled $a$. A functional expression of the form $(\langle a\rangle.\cdots.\langle a\rangle.1)^c$, with $n$ nested prefixes, evaluates to $c^n$ at state $s_0$. At state $s_0$ of process $A_4$ the same functional expression evaluates to $(0.4c)^n$.
5.3 Expressiveness of functional expressions
In this subsection we show how logical formulas of $\mathcal{L}_\vee$ can be captured by the functional expressions we have introduced. The following lemma is the functional analog of Proposition 3.7.

Lemma 5.6 If $R$ is a simulation relation between processes $S$ and $S'$, then $(\forall f^c \in \mathcal{F}^c_+)\ (\forall s \in S,\ s' \in S')\ [sRs' \Rightarrow f^c_S(s) \le f^c_{S'}(s')]$.
Proof. We prove the lemma for $R$ a simulation on a single process $S$; for the general case, observe that for every state of a process, the value of a functional expression is the same in the process as in any direct sum of which the process is a summand. The proof proceeds by induction on the construction of the functional expression $f^c$. The key case is $g = \langle a\rangle.f$. By the inductive assumption on $f$, we have $s'Rt' \Rightarrow f(s') \le f(t')$. Let $sRt$; we prove that $g(s) \le g(t)$. Let $\mu(\cdot) = \tau_a(s, \cdot)$ and $\nu(\cdot) = \tau_a(t, \cdot)$. Consider simple functions⁵ $h$ derived from $f$ as follows: let $v_1 < \cdots < v_n$ be finitely many values of $f$, and define $h(s) = \max\{v_i \mid v_i \le f(s)\}$. Such an $h$ satisfies $f(x) \le f(y) \Rightarrow h(x) \le h(y)$ and $(\forall u)\ h(u) \le f(u)$. We have

$\int f\,d\mu = \sup_h \int h\,d\mu,$

where the sup ranges over all simple functions $h$ satisfying the above conditions. Hence it suffices to prove that for any such simple function $h$, $\int h\,d\mu \le \int h\,d\nu$. Consider one such $h$ with range $\{v_1, \ldots, v_n\}$. Then, for each $1 \le i \le n$, the set $S_i = h^{-1}\{v_i, \ldots, v_n\}$ is measurable. It is also $R$-closed: if $s \in S_i$ and $sRt$, then $f(s) \le f(t)$, so $t \in S_i$. Thus for each $S_i$, $\mu(S_i) \le \nu(S_i)$, and this proves the result, since $\int h\,d\mu = \sum_i (v_i - v_{i-1})\mu(S_i)$ (with $v_0 = 0$).
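The layer decomposition $\int h\,d\mu = \sum_i (v_i - v_{i-1})\,\mu(S_i)$ used above can be checked numerically on a finite state space, where every function is simple and integration is a finite sum. A small sketch (dictionary encodings are ours):

```python
def layered_integral(f, mu):
    """Integral of a nonnegative simple function computed layer by layer, as
    in the proof of Lemma 5.6: sum_i (v_i - v_{i-1}) * mu(S_i), where
    S_i = {s : f(s) >= v_i} and v_1 < ... < v_n are the nonzero values of f."""
    vs = sorted(set(f.values()) - {0.0})
    total, prev = 0.0, 0.0
    for v in vs:
        total += (v - prev) * sum(mu[s] for s in f if f[s] >= v)
        prev = v
    return total
```

On any finite example this agrees with the direct sum $\sum_s f(s)\,\mu(\{s\})$, which is the point of the identity.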
Lemma 5.7 Given any $\phi \in \mathcal{L}_\vee$, any labeled Markov process $S$, and any $c \in (0, 1]$, there exists $f^c_\phi \in \mathcal{F}^c_+$ such that

1. $\forall s \in S$, $f^c_{\phi,S}(s) > 0$ iff $s \models \phi$;
2. for any other labeled Markov process $S'$, $\forall s' \in S'$, $f^c_{\phi,S'}(s') > 0 \Rightarrow s' \models \phi$.

Proof. We first prove this for finite-state processes. Let $\mathcal{P} = (P, p_0, \mathcal{P}(P), \pi)$ be a finite-state process. The proof is by induction on the structure of $\phi$. If $\phi = \top$, the functional expression $\lambda s.1$ suffices. If $\phi = \phi_1 \wedge \phi_2$, let $f_1^c$ and $f_2^c$ be the functional expressions corresponding to $\phi_1$ and $\phi_2$; then $\lambda s.\min(f_1^c(s), f_2^c(s))$ satisfies the conditions. If $\phi = \bigvee_i \phi_i$, let $f_i^c$ be the functional expression corresponding to $\phi_i$; then $\lambda s.\sup(f_1^c, \ldots, f_n^c, \ldots)$ satisfies the conditions. If $\phi = \langle a\rangle_q \psi$, let $g^c$ be the functional expression corresponding to $\psi$ yielded by the induction. Let $x = \min\{g(s) \mid s \in \llbracket\psi\rrbracket_P\}$; by the induction hypothesis, $x > 0$. Consider the functional expression $f^c$ given by $\lfloor\langle a\rangle.\lceil g\rceil_x\rfloor_{cxq}$. For all $t \in \llbracket\psi\rrbracket_P$, $(\lceil g\rceil_x)(t) = x$. Now for any state $p \in P$,

$(\langle a\rangle.\lceil g\rceil_x)^c(p) = cx \sum_{t \in \llbracket\psi\rrbracket} \pi_a(p, t) = cx\,\pi_a(p, \llbracket\psi\rrbracket_P).$

For each state $p \in \llbracket\phi\rrbracket_P$ we have $\pi_a(p, \llbracket\psi\rrbracket_P) > q$, so $f^c$ satisfies the first condition. The second condition holds because for any state $s'$ of any process $S'$, $(\langle a\rangle.\lceil g\rceil_x)(s') \le cx\,\tau'_a(s', \llbracket\psi\rrbracket_{S'})$, so if $\tau'_a(s', \llbracket\psi\rrbracket_{S'}) \le q$ then $(\lfloor\langle a\rangle.\lceil g\rceil_x\rfloor_{cxq})(s') = 0$.

Now let $S$ be an arbitrary process and let $s$ be a state of $S$ such that $s \models \phi$. By Theorem 4.7, there is a finite approximation $S(n, \epsilon)$ of $S$ such that $(A_s, n) \models \phi$. By the finite-state case just proved, there exists $f^c \in \mathcal{F}^c_+$ such that $f^c_{S(n,\epsilon)}((A_s, n)) > 0$ and, for any process $S'$, $\forall s' \in S'$, $s' \not\models \phi \Rightarrow f^c_{S'}(s') = 0$. By Proposition 4.3 and Lemma 5.6, $f^c_S(s) \ge f^c_{S(n,\epsilon)}((A_s, n)) > 0$, so $f^c$ satisfies the conditions required by the lemma.

⁵A function is simple if its range is finite.
21
Note that for the L subfragment of the logic, the resulting function is in Fin^c_+.
Corollary 5.8 For all formulas φ ∈ L, there exists f^c_φ ∈ F^c_+ such that (∀S) (∀s ∈ S) [f^c_S(s) > 0 ⇔ s ⊨ φ].
Proof. Take the sup of the functions corresponding to φ at the (countably many) states of finite processes. By the previous lemma, this function satisfies the desired property.
Example 5.9 f^c_φ satisfies:
f^c_T = 1
f^c_{⟨a⟩_q.T}(s) = ⌊c τ_a(s, P)⌋_{cq}, for any state s in a process P
f^c_{φ∧ψ} = min(f^c_φ, f^c_ψ)
f^c_{φ∨ψ} = max(f^c_φ, f^c_ψ)
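The clauses of Example 5.9 can be animated directly. The following is a minimal sketch, not from the paper: an assumed Python encoding of functional expressions as closures over a small finite-state process (the process, c and q are made-up values), evaluating the function corresponding to ⟨a⟩_q.T.

```python
# tau[a][s] is a sub-probability distribution over successor states
tau = {"a": {"s0": {"s1": 0.7}, "s1": {}}}

def const1(s):
    return 1.0

def f_min(f, g):        # min(f, g)
    return lambda s: min(f(s), g(s))

def f_max(f, g):        # max(f, g)
    return lambda s: max(f(s), g(s))

def floor_q(f, q):      # ⌊f⌋_q = max(f - q, 0)
    return lambda s: max(f(s) - q, 0.0)

def ceil_q(f, q):       # ⌈f⌉_q = min(f, q)
    return lambda s: min(f(s), q)

def prefix(a, f, c):    # (⟨a⟩.f)(s) = c · Σ_t τ_a(s, t)·f(t)
    return lambda s: c * sum(p * f(t) for t, p in tau[a].get(s, {}).items())

c, q = 0.5, 0.4
# the function for ⟨a⟩_q.T from Example 5.9: ⌊⟨a⟩.1⌋_{cq}
f = floor_q(prefix("a", const1, c), c * q)
print(f("s0"))   # ≈ c·0.7 - c·q = 0.15, positive since τ_a(s0, S) = 0.7 > q
print(f("s1"))   # 0.0: s1 has no a-transition, so it fails ⟨a⟩_q.T
```

The other combinators (f_min, f_max, ceil_q) mirror the remaining clauses of Example 5.9 and the construction in Lemma 5.7.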
The next results say that the functions are sound and complete for bisimulation.
Proposition 5.10 For any labeled Markov process S, ∀c ∈ (0, 1], ∀s, s′ ∈ S, if s and s′ are bisimilar then (∀f ∈ F^c) [f^c_S(s) = f^c_S(s′)].
Proof. We show that for any bisimulation R, sRs′ implies that (∀f ∈ F^c) [f^c_S(s) = f^c_S(s′)].
The proof proceeds by induction on the structure of the functional expression f^c. The key case is when f^c is of the form (⟨a⟩.g)^c. Then we would like to show that ∫_{t∈S} g^c(t) τ_a(s, dt) = ∫_{t∈S} g^c(t) τ_a(s′, dt). Consider any simple function h approximating g^c, with values v_i, i = 1..n, defined by h(s) = max{v_i | v_i ≤ g^c(s)}. Then the set S_i = h⁻¹(v_i) is measurable, and if t ∈ S_i and tRt′ then by induction g^c(t) = g^c(t′), so t′ ∈ S_i. Thus τ_a(s, S_i) = τ_a(s′, S_i), which shows the result.
Theorem 5.11 For any labeled Markov process S, ∀c ∈ (0, 1], ∀s, s′ ∈ S,
(∀f ∈ Fin^c_+) [f^c_S(s) = f^c_S(s′)] ⇒ (∀φ ∈ L) [s ⊨ φ ⇔ s′ ⊨ φ].
Proof. Suppose instead that there is a φ such that s ⊨ φ and s′ ⊭ φ. By Lemma 5.7, there is a functional expression f^c ∈ Fin^c_+ such that f^c_S(s) > 0 and f^c_S(s′) = 0, contradicting the hypothesis.
Example 5.12 Consider the processes A_1, A_2 of Figure 2. The calculations of Example 5.4 show that the s_0 states of A_1, A_2 are distinguishable. Furthermore, these states are indistinguishable if we use only the function schemas Constant, Min and Prefixing. Thus, Example 5.4 shows that the conditional functional expressions are necessary.
5.4 Finitary reasoning with functional expressions
Lemma 5.13 Let (S, i, Σ, τ) be a labeled Markov process, S(n, ε) one of its approximations, and f a functional expression in F^c of depth at most n involving only labels a_1, …, a_n. Then
|f_S(s) − f_{S(n,ε)}((A_s, n))| ≤ nε.
Proof. Let depth(f) = d ≤ n; we prove that for every state s ∈ S and for every l with d ≤ l ≤ n, |f_S(s) − f_n((A_s, l))| ≤ lε (we write f_n for the evaluation of f in S(n, ε)). This will be proved by induction on the structure of f. Of course, the inequality is true if f is λs.1. Now assume that the inequality is true for f, i.e.,
|f_S(t) − f_n((B_t, l))| ≤ lε, for d ≤ l ≤ n, t ∈ S.

It is easy to verify that the constructors min, max f_i, and g∘f, where g is any Lipschitz function, satisfy the inequality. Now for ⟨a⟩.f we have

  |⟨a⟩.f_S(s) − ⟨a⟩.f_n((A_s, l+1))|
    = | c ∫_S f_S(t) τ_a(s, dt) − c Σ_{(B,l)∈S(n,ε)} f_n((B,l)) p_a((A_s, l+1), (B,l)) |
    = | c Σ_{(B,l)∈S(n,ε)} [ ∫_B f_S(t) τ_a(s, dt) − ∫_B f_n((B,l)) τ_a(s, dt) ]
        + c Σ_{(B,l)} f_n((B,l)) [ τ_a(s, B) − p_a((A_s, l+1), (B,l)) ] |
    ≤ c | Σ_{(B,l)} ∫_B [ f_S(t) − f_n((B,l)) ] τ_a(s, dt) |
        + c | Σ_{(B,l)} f_n((B,l)) [ τ_a(s, B) − p_a((A_s, l+1), (B,l)) ] |
    ≤ c lε Σ_{(B,l)} τ_a(s, B) + c Σ_{(B,l)} | τ_a(s, B) − p_a((A_s, l+1), (B,l)) |
    ≤ c lε + c ( τ_a(s, S) − p_a((A_s, l+1), (S, l)) )
    ≤ c lε + c ε ≤ (l + 1) ε.
The second inequality follows from the induction hypothesis and from the fact that f_n is bounded by 1, and hence it holds for all d ≤ l ≤ n. The last inequality follows from the preceding lemma. So we have proved that for ⟨a⟩.f, which is of depth d + 1,
|⟨a⟩.f(s) − ⟨a⟩.f_n((A_s, m))| ≤ mε for all (A_s, m) with d + 1 ≤ m ≤ n, as wanted.
Corollary 5.14 Let (S, i, Σ, τ) be a labeled Markov process and f a functional expression of F^c involving finitely many labels. Then for all n ≥ max(depth(f), N), where N is the greatest index of a label appearing in f, and for every s ∈ S,
f_S(s) = lim_{ε→0} f_{S(n,ε)}((A_s, n)).
6 Metrics and Convergence

In our recent paper [DGJP99] we introduced a notion of metric between discrete processes. In the present section we extend these ideas to the continuous case and show how the approximants of
a process converge to the original process under this metric. Intuitively, the metrics measure how "visibly" different the processes are. In terms of the logic, one can say that two processes are very close if the formulas that tell them apart are very complex. To capture this intuition quantitatively we use the functions introduced in the last section. There is also a second notion of how far apart processes are: the distinguishing functions could have values which are very different or only slightly different. We study a family of definitions which assign different weights to these differences⁶. The class of functional expressions F^c motivates the definition of a function d^c from pairs of processes to [0, 1].
Definition 6.1 Each collection of functional expressions - let F^c be the set of expressions in one such collection - induces a distance function as follows:
d^c(S, S′) = sup_{f^c ∈ F^c} |f^c_S(i) − f^c_{S′}(i′)|
Of course we have to show that we get a metric from this definition. The definition is close in form to that of the Hutchinson metric [Hut81], which is used in the theory of fractals. The difference is in the class of functions used: in the Hutchinson metric one uses the family of Lipschitz functions. Later we will see that our definition is even closer to the Hutchinson metric than appears at first sight.
We study the family of metrics {d^c | c ∈ (0, 1]}. These metrics support the spectrum of possibilities of relative weighting of the two factors that contribute to the distance between processes: the complexity of the functions distinguishing them versus the amount by which each function distinguishes them. d¹ captures only the differences in the probability numbers; probability differences at the first transition are treated on a par with probability differences that arise very deep in the evolution of the process. In contrast, d^c for c < 1 gives more weight to the probability differences that arise earlier in the evolution of the process, i.e. differences identified by simpler functions. As c approaches 0, the future gets discounted more.
As is usual with metrics, the actual numerical values of the metric are less important than the notions of convergence that they engender. Thus, we take the uniformity view of metrics (see e.g. [Ger85])⁷, and will view the metric via properties like the significance of zero distance, relative distance of processes, contractivity and the notion of convergence, rather than a detailed justification of the exact numerical values.
The main results of this section are:
- We show that each d^c, c ∈ (0, 1], is a metric. In particular, processes at distance 0 are bisimilar. The finite representation results of the earlier section show that the space of processes is a separable metric space for each of these metrics.
- We describe some perturbation results: informally, we show that small perturbations of the probabilities in a process yield a process that is within a small distance of the unperturbed process.
- The definition of the metric above has a quantification over all functional expressions. To ease working with the metrics, we show that for c < 1, there is a single function that characterizes the ε-balls around a given state.
- For c < 1, we show that the problem d(S, S′) < ε is decidable.
6 There are other interesting notions of metric that we do not address here.
7 Intuitively, a uniformity captures relative distances, e.g. that x is closer to z than y is; it does not tell us what the
actual distances are. For example, a uniformity on a metric space M is induced by the collection of all balls S_ε where S_ε = {{y | d(x, y) < ε} | x ∈ M}.
6.1 Basic properties and examples of d^c
The first main result is that we have a separable metric space. We need an easy lemma to start.
Lemma 6.2 Given any process of the form S(n, ε), we can construct a sequence of rational trees T_i such that T_1 ≤ T_2 ≤ … ≤ S(n, ε) and with lim_{i→∞} d^c(T_i, S(n, ε)) = 0.
Proof. The process S(n, ε) is already a tree. We just take a family of trees with the same shape, with the probabilities chosen to be rational and to converge to the probabilities occurring in S(n, ε). We can always choose these numbers to be strictly increasing, so we immediately get that the strict simulation relation holds. It is easy to see that the family of rational trees converges in the metric to S(n, ε).
Now we have the key theorem.
Theorem 6.3 For all c ∈ (0, 1], d^c is a metric on bisimulation equivalence classes of processes, and this yields a separable metric space.
Proof. The triangle inequality is an easy exercise and symmetry of d^c is immediate. The fact that d^c(S, S′) = 0 iff S and S′ are bisimilar follows from Theorem 5.11.
We now show that the rational trees introduced in section 4 form a countable dense subset. Given any process S, consider the countable family of finite approximations S(n, 2⁻ⁿ) of S. For each such finite process we have a countable sequence of rational trees {T_j^{(n)} | j, n ∈ N}, as in Lemma 6.2. Now, since the rational trees form a directed set under the strict simulation order, we can construct a sequence of rational trees as follows. We choose T_1 to be T_1^{(1)}. For T_{i+1}, we compare T_{i+1}^{(i+1)} with T_i; because we have a directed set, there is some rational tree strictly above both, and we designate one such to be T_{i+1}. Thus we have a sequence of rational trees, ordered by strict simulation, which converges to S.
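The rational approximations used in the proof of Lemma 6.2 can be sketched concretely: for each transition probability, pick strictly increasing rationals converging to it from below. A minimal sketch, with the probability 7/10 as an arbitrary choice:

```python
from fractions import Fraction

def increasing_rationals(p, k):
    """Strictly increasing rationals p - 10^-1, ..., p - 10^-k below p."""
    return [p - Fraction(1, 10 ** i) for i in range(1, k + 1)]

p = Fraction(7, 10)
approx = increasing_rationals(p, 5)
print([str(a) for a in approx])   # each below p, strictly increasing toward p
```

Using exact rationals (Fraction) rather than floats matches the lemma's requirement that the tree probabilities themselves be rational.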
Example 6.4 The analysis of Example 5.12 yields d^c(A_1, A_2) = c²/8.
Example 6.5 Example 5.5 shows the fundamental difference between the metrics d^c, c < 1, and d¹. For c < 1, d^c(A_3, A_4) is witnessed by (⟨a⟩.1)^c and is given by d^c(A_3, A_4) = 0.6c. In contrast, d¹(A_3, A_4) = sup{1 − (0.4)ⁿ | n = 0, 1, …} = 1.
Example 6.6 Consider the family of processes {P_ε | 0 < ε ≤ r} where P_ε = a_{r−ε}.Q, i.e. P_ε is the process that does an a with probability r − ε and then behaves like Q. The functional expression (⟨a⟩.1)^c evaluates to (r − ε)c at P_ε. This functional expression witnesses the distance between any two P_ε's (other functions give smaller distances). Thus, we get d(P_{ε_1}, P_{ε_2}) = c|ε_1 − ε_2|. This furthermore ensures that P_ε converges to P_0 as ε tends to 0.
Example 6.7 (from [DEP98]) Consider the processes P (left) and Q (right) of Figure 3. Q is just like P except that there is an additional transition, with probability q_∞, to a state which then has an a-labeled transition back to itself. The probability numbers are as shown. We show that if both processes have the same values on all functional expressions then q_∞ = 0, i.e. the extra state really cannot be present. The functional expression (⟨a⟩.1)^c yields c(Σ_{i≥0} p_i) on P and c(q_∞ + Σ_{i≥0} q_i) on Q. The functional expression (⟨a⟩.⟨a⟩.1)^c yields c²(Σ_{i≥1} p_i) on P and c²(q_∞ + Σ_{i≥1} q_i) on Q. Thus, we deduce that p_0 = q_0. Similarly, considering the functional expressions (⟨a⟩.⟨a⟩.⟨a⟩.1)^c etc., we deduce that p_n = q_n. Thus, q_∞ = 0.
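The closed form of Example 6.6 is easy to check numerically. A sketch; the values of c, r and the two epsilons are arbitrary illustrative choices, not data from the paper:

```python
c, r = 0.5, 0.8

def prefix_one(eps):
    # (⟨a⟩.1)^c evaluated at P_eps, which does an a with probability r - eps
    return c * (r - eps)

e1, e2 = 0.1, 0.3
observed = abs(prefix_one(e1) - prefix_one(e2))
predicted = c * abs(e1 - e2)
print(observed, predicted)   # the witnessing expression realizes c·|e1 - e2|
```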
Figure 3: Probability and countable branching
6.2 Decidability of the Metric
In this subsection we show how to approximate the metric. This result is a small variation of our result in [DGJP99]. The key part is showing how to approximate the functions of the form f^c, c < 1. Define the depth of an f^c ∈ Fin^c inductively as follows: depth(λs.1) = 0, depth(⟨a⟩.f) = 1 + depth(f), depth(min(f_1, f_2)) = depth(max(f_1, f_2)) = max(depth(f_1), depth(f_2)), depth(1 − f) = depth(⌊f⌋_q) = depth(⌈f⌉_q) = depth(f). Then it is clear that for any state s of any process S, |f^c(s)| ≤ c^{depth(f)}. The class F of all the functions of depth n or less satisfies the following inequality:
d^c(P, Q) − sup_{f ∈ F} |f^c(s_P) − f^c(s_Q)| ≤ cⁿ.
What this inequality says is that if we use functions of depth n or less to compute the sup, we will be close to the actual metric distance. However, there are infinitely many functional expressions of depth n. We now construct a finite subset of these such that the above inequality still holds. We construct the set of functions inductively as follows. Let F^i be the set of functions of depth i. Define:
F_1^{i+1} = {⟨a⟩.f | f ∈ F^i}
F_2^{i+1} = {1 − f | f ∈ F_1^{i+1}}
F_3^{i+1} = {⌊f⌋_q | f ∈ F_2^{i+1}, q ∈ A_{i+1}}
F_4^{i+1} = {1 − f | f ∈ F_3^{i+1}}
F_5^{i+1} = {⌈f⌉_q | f ∈ F_4^{i+1}, q ∈ A_{i+1}}
F_6^{i+1} = {1 − f | f ∈ F_5^{i+1}}
F_7^{i+1} = {⌈f⌉_q | f ∈ F_6^{i+1}, q ∈ A_{i+1}}
F_8^{i+1} = {1 − f | f ∈ F_7^{i+1}}
F_9^{i+1} = {min(f_1, …, f_n) | f_i ∈ F_8^{i+1} ∪ F^i}
F^{i+1} = {max(f_1, …, f_n) | f_i ∈ F_9^{i+1}}
We can prove that for any f^c ∈ Fin^c of depth i ≤ n, there is a function in F^i that approximates it closely enough.
Lemma 6.8 Let f^c ∈ Fin^c be of depth i ≤ n. Then, there exists g_f^c ∈ F^i such that:
(∀ processes S) (∀s ∈ S) [ |f^c(s) − g_f^c(s)| ≤ (n − i)/3^{m+1} ].
Proof. The proof proceeds by induction on i. Here, we sketch the two basic ideas of the proof for the inductive step.
The following identities show that repeating steps 2 onwards on F^{i+1} does not yield any new functions.
⌊⌊f⌋_q⌋_r = ⌊f⌋_{q+r}
⌊⌈f⌉_q⌋_r = ⌈⌊f⌋_r⌉_{q−r}
⌊min(f_1, f_2)⌋_r = min(⌊f_1⌋_r, ⌊f_2⌋_r)
1 − (1 − f) = f
1 − max(f_1, f_2) = min(1 − f_1, 1 − f_2)
⌊1 − ⌊f⌋_q⌋_r = ⌈⌊1 − f⌋_{r−q}⌉_{1−r}
⌈1 − ⌈f⌉_r⌉_q = 1 − ⌈1 − ⌈1 − f⌉_q⌉_r
⌊1 − ⌈f⌉_q⌋_r = ⌊1 − f⌋_r, if q + r ≥ 1
⌊1 − ⌈f⌉_q⌋_r = ⌈⌊1 − f⌋_r⌉_{1−q−r}, if q + r < 1
⌈⌈f⌉_q⌉_r = ⌈f⌉_{min(q,r)}
⌈min(f_1, f_2)⌉_r = min(⌈f_1⌉_r, ⌈f_2⌉_r)
1 − min(f_1, f_2) = max(1 − f_1, 1 − f_2)
min(max(f_1, f_2), max(f_3, f_4)) = max(min(f_1, f_3), min(f_1, f_4), …)

Define f_1, f_2 to be ε-close if for all states s in all processes S, |f_1(s) − f_2(s)| < ε. If f_1 and f_2 are ε-close, then so are (⟨a⟩.f_1, ⟨a⟩.f_2), (⌊f_1⌋_q, ⌊f_2⌋_q), (⌈f_1⌉_q, ⌈f_2⌉_q) and (1 − f_1, 1 − f_2). In addition, if f_1′ and f_2′ are also ε-close, then min(f_1, f_1′) and min(f_2, f_2′) are ε-close. Furthermore, |q_1 − q_2| ≤ ε ⇒ |⌊f⌋_{q_1}(s) − ⌊f⌋_{q_2}(s)| ≤ ε, and similarly for ⌈f⌉_{(·)}.
Given finite processes P, Q, we have an algorithm for computing d^c(P, Q) for c < 1 to any desired accuracy cⁿ, where n is a natural number. We do this by computing sup_F |f^c(s_P) − f^c(s_Q)| for a finite set of functions F, and then showing that for this F, d^c(P, Q) − sup_F |f^c(s_P) − f^c(s_Q)| ≤ cⁿ. The required finite set of functions is given by Lemma 6.8. While the algorithm is not interesting in itself - it is of very high complexity - it is certainly worth noting that our metric is constructive.
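The shape of this brute-force algorithm can be sketched on two tiny finite processes. This is only a fragment of the construction above: it enumerates expressions built from prefixing and ⌊·⌋_q over a small rational grid (omitting 1 − f, min and max), so it computes a lower approximation of the sup; the processes, the grid and the depth bound are illustrative choices.

```python
c = 0.5
GRID = [k / 4 for k in range(5)]          # finite set of rationals for ⌊·⌋_q

# a process: state -> list of (probability, successor) for the single label a
P = {"s0": [(0.9, "s1")], "s1": [(0.5, "s2")], "s2": []}
Q = {"t0": [(0.8, "t1")], "t1": [(0.5, "t2")], "t2": []}

def generate(depth):
    """Expressions as functions taking (process, state); depth-bounded."""
    exprs = [lambda proc, s: 1.0]                          # λs.1
    for _ in range(depth):
        new = []
        for f in exprs:
            pre = (lambda g: lambda proc, s:
                   c * sum(p * g(proc, t) for p, t in proc[s]))(f)
            new.append(pre)                                # ⟨a⟩.f
            for q in GRID:
                new.append((lambda g, r: lambda proc, s:
                            max(g(proc, s) - r, 0.0))(pre, q))   # ⌊⟨a⟩.f⌋_q
        exprs = exprs + new
    return exprs

approx = max(abs(f(P, "s0") - f(Q, "t0")) for f in generate(3))
print(approx)   # a depth-3 lower approximation of d^c(P, Q)
```

Here the processes differ only in the first transition probability, so the depth-1 expression ⟨a⟩.1 already realizes the largest difference, c·|0.9 − 0.8| = 0.05; the discount c makes deeper expressions contribute less.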
6.3 Behavioral Properties of the Metric
The theme of this subsection is extracting behavioral information from the metric. Our first lemma shows that we can find a function - in our class of functions - which is characteristic for ε-balls around a given state.
Lemma 6.9 Let s be a state in a labeled Markov process S and ε ∈ (0, 0.5). Let {f_i | i = 1, …, n} be a finite set of functional expressions. Then, there is a single function f such that f(s) = ε, and f(t) = 0 iff for some i, |f_i(s) − f_i(t)| ≥ ε.
Proof. Define functional expressions g_i as follows. Let f_i(s) = q, and set
g_i = min(⌊f_i⌋_{q−ε}, ⌊1 − f_i⌋_{1−q−ε}), if q > ε;
g_i = ⌊1 − f_i⌋_{1−q−ε}, if q ≤ ε.
Then g_i(s) = ε. Also, for any state t in any process, if f_i(t) ≥ q + ε or f_i(t) ≤ q − ε, then g_i(t) = 0. The functional expression f = min(g_1, …, g_n) satisfies the required properties.
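The construction of g_i in Lemma 6.9 can be sanity-checked pointwise on values. A sketch; the test values f_i(s), f_i(t) and ε are made up:

```python
def floor_q(x, q):            # ⌊x⌋_q = max(x - q, 0), on values in [0, 1]
    return max(x - q, 0.0)

def g(fi_val, q, eps):
    """g_i evaluated at a state where f_i takes value fi_val; q = f_i(s)."""
    if q > eps:
        return min(floor_q(fi_val, q - eps), floor_q(1 - fi_val, 1 - q - eps))
    return floor_q(1 - fi_val, 1 - q - eps)

eps, q = 0.1, 0.6             # f_i(s) = 0.6
print(g(q, q, eps))           # at s itself: g_i(s) ≈ eps
print(g(q + eps, q, eps))     # f_i(t) >= q + eps  =>  g_i(t) ≈ 0
print(g(q - eps, q, eps))     # f_i(t) <= q - eps  =>  g_i(t) ≈ 0
```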
Proposition 6.10 Let s be a state in a process, and let ε ∈ (0, 0.5), c ∈ (0, 1]. Then, there is a function f such that f(s) = ε; and for any state t in any process, f(t) = 0 if d^c(s, t) ≥ ε, and f(t) ≥ ε − d^c(s, t) if d^c(s, t) < ε.
Proof. Using the finitely many functions from Lemma 6.8, we can form f as in the previous lemma. This f has the required property.
Corollary 6.11 Let P and P′ be finite processes with start states p_0 and p_0′ such that d^c(p_0, p_0′) < ε. Let there be a transition from p_0 to p on label a with probability r. Then there is a state p′ of P′ to which p_0′ has an a-transition and d(p, p′) < ε/rc.
Proof. Consider the function f given by the above proposition for the state p, using ε/rc. Then f(p) = ε/rc, and

  [⟨a⟩.f](p_0) = c Σ_i τ_a(p_0, p_i) f(p_i) ≥ c τ_a(p_0, p) f(p) = ε.

Thus [⟨a⟩.f](p_0′) > 0. So there is some p′ with f(p′) > 0 to which p_0′ has an a-transition. Now, by the definition of f, d(p, p′) < ε/rc.
Example 6.12 Let P be a finite process with initial state p_0 and let S = (S, i, Σ, τ) be any process such that d(P, S) < ε. Let f be any function whose values on the states of P are y_1, …, y_n, where y_1 < y_2 < … < y_n. Let the states be numbered 1, …, n, lumping together states with equal values, and let p_1, …, p_n be the probabilities of going to states 1, …, n from p_0 in P. Then

  Σ_{k≥i} p_k − ε/(y_i − y_{i−1}) < τ_a(i, {t | f(t) ≥ y_i}) < Σ_{k≥i} p_k + ε/(y_i − y_{i−1})

Thus, we have bounded τ_a(i, {t | f(t) ≥ y_i}) by two numbers. The gaps are caused by the fact that some states in S may have f(t) = y_i.
6.4 Convergence of Approximations
The metrics that we have introduced measure how close the "observed behavior" of two processes is. We will now prove that the approximations converge in the metric, which strengthens the results of section 4 significantly. We are not just saying that the approximations somehow encode the information present in the process being approximated, but that they come close in a behavioral sense.
Corollary 6.13 If S involves a finite number of labels, then S(n, cⁿ/n) converges to S in the metric d^c, for c < 1.
Proof. Assume there is a finite number N of labels involved in S; that is, for every other label a ∈ A, τ_a(s, S) = 0 for all s ∈ S. Then Lemma 5.13 applies to every functional expression of depth at most n, provided n ≥ N, i.e., |f_S(i) − f_n(p_0)| ≤ n·cⁿ/n = cⁿ. Now, if depth(f) ≥ n, we have 0 ≤ f ≤ cⁿ by the way f is defined. This implies that d^c(S(n, cⁿ/n), S) ≤ cⁿ. Since c < 1, as n increases S(n, ε) gets arbitrarily close to S.
6.5 Perturbation
One of the major criticisms of process equivalences is that they are not robust. The results of this section show that if one slightly perturbs the probabilities in a process, the resulting process is close to the original.
Definition 6.14 Let S be a process. A process S′ is an ε-perturbation of S if: the state set of S′ is the same as that of S; and for each state s, each label a, and each measurable set A_0 of S′, |τ_a(s, A_0) − τ′_a(s, A_0)| < ε.
Our metric accommodates this notion of small perturbations of the probabilities.
Proposition 6.15 If c < 1 and S′ is an ε-perturbation of S, then d^c(S, S′) < kε, where k = sup_n n·cⁿ (see footnote 8).
Proof. The proof is by induction on the functional expressions. The sole non-trivial case is ⟨a⟩.f. We write f for f_S and f′ for f_{S′}. Let depth(f) = n, so that by the induction hypothesis |f(t) − f′(t)| < n cⁿ ε. Then f(s) ≤ cⁿ and

  c | ∫_t f(t) τ_a(s, dt) − ∫_t f′(t) τ′_a(s, dt) |
    = c | ∫ f(t) [τ_a(s, dt) − τ′_a(s, dt)] + ∫ τ′_a(s, dt) [f(t) − f′(t)] |
    < cⁿ⁺¹ |τ_a(s, A_0) − τ′_a(s, A_0)| + n cⁿ⁺¹ ε τ′_a(s, S)
    < cⁿ⁺¹ ε + n cⁿ⁺¹ ε = (n + 1) cⁿ⁺¹ ε.

Here A_0 is the set on which the measure τ_a(s, ·) − τ′_a(s, ·) is positive.
For c = 1, n·cⁿ increases without limit, and example 5.5 shows that the above proposition does not hold for c = 1. However, in this case we can still perturb the process S in the following way: let S be unfolded, so that it has no loops. Let ε_i, i ∈ N, be non-negative rationals such that Σ_i ε_i = ε < 1/3. Now we obtain S′ by taking the same state set as S and requiring, for each state s at depth n, that |τ_a(s, A_0) − τ′_a(s, A_0)| < ε_n for each label a and each measurable set A_0. Then we can show, by a calculation similar to the one above, that d¹(S, S′) < 1 − e^{−2ε}; thus as ε → 0, d¹(S, S′) → 0.
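The bound of Proposition 6.15 can be observed empirically on a small chain. A sketch; the chain probabilities, ε and c are arbitrary illustrative choices, and only one functional expression (a nested prefix of depth 3) is evaluated:

```python
c, eps = 0.5, 0.01
probs  = [0.9, 0.7, 0.6]               # a-labeled chain in S
probs_ = [p + eps for p in probs]      # an eps-perturbation S'

def nested_prefix(ps):
    # (⟨a⟩.⟨a⟩.⟨a⟩.1) evaluated at the head of the chain
    v = 1.0
    for p in reversed(ps):
        v = c * p * v
    return v

diff = abs(nested_prefix(probs) - nested_prefix(probs_))
k = max(n * c ** n for n in range(1, 100))   # sup_n n·c^n, attained at small n
print(diff, k * eps)     # the observed difference stays below k·eps
```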
Example 6.16 Consider "straight line" formulas generated by φ ::= T | ⟨a⟩_q.φ. Consider one such formula φ = ⟨a_1⟩_{q_1} … ⟨a_n⟩_{q_n} T. Let P be a finite-state process, unfolded to the depth of the formula, such that p_0, the start state of P, satisfies the formula. An easy induction using the proof of Lemma 5.7 shows that

  f^c_φ(p_0) ≥ cⁿ Π_i (r_i − q_i)

where

  r_i = inf_{s ∈ A_i} τ_{a_i}(s, A_{i+1})

and A_{i+1} is the set of all the states at level i+1 which satisfy the suffix formula ⟨a_{i+1}⟩_{q_{i+1}} … ⟨a_n⟩_{q_n} T. Note that this bound is achieved by the n-length chain automaton which has transition probabilities r_i. The form of the expression f^c_φ(p_0) ≥ cⁿ Π_i (r_i − q_i) tells us that if f^c_φ(p_0) > ε, we can perturb the probabilities at some level by up to ε^{1/n}/c, and the resulting process will continue to satisfy the formula.
8 E.g., k = 1 for c ≤ 1/2.
7 Related Work

There has been a substantial amount of work on probabilistic transition systems and their associated equivalences. As far as we are aware, none of it has looked at bisimulation for continuous state spaces, except for the work of de Vink and Rutten [RdV97]. The starting points of work in the area of probabilistic semantics are the fundamental papers of Saheb-Djahromi [SD78, SD80] and of Kozen [Koz81, Koz85]. These are concerned with domain theory and programming languages rather than with process equivalences, but they both introduced nontrivial measure-theoretic ideas. Kozen also noticed a very interesting duality between state-transformer semantics, as described by stochastic kernels, and a probabilistic predicate-transformer semantics in which programs are seen as inducing linear continuous maps on the Banach algebra of bounded measurable functions. This duality influenced our search for a logical characterization, though the logic we used owes more to Larsen and Skou [LS91]. The fundamental work on probabilistic bisimulation is the paper by Larsen and Skou [LS91], which analyzes not just bisimulation but also testing, in the setting of discrete systems. Our previous work on bisimulation for continuous state spaces will appear in final form in [DEP99]; it rests on some results of Edalat [Eda99].
From the point of view of applications there have been a number of very interesting results. The most interesting work, in our opinion, is that of Jane Hillston [Hil94] on developing a process algebra for performance evaluation. Her work is not comparable to ours, because she works with temporal delay in discrete-space Markov chains. The main point of her work is a compositional approach to performance evaluation. In her framework, she addresses continuous time in the following way. The systems being modeled are described by a probabilistic process algebra called PEPA. The semantics of PEPA is given in terms of labeled transition systems.
Associated with the transitions is a continuous-time random process with an exponentially distributed delay, and associated with each type of action is a different rate. Indeterminacy is resolved by races between events executing at different rates. In her work a crucial role is played by a congruence called strong equivalence. Strong equivalence is in fact closely analogous to Larsen-Skou bisimulation and - as she points out - to an old idea in queuing theory called lumpability. It is defined in a way very similar to the way that Larsen and Skou proceed, with the difference that instead of using the probabilities associated with the actions she uses the rates associated with the actions. In this sense it is related to our probabilistic bisimulation but, of course, her treatment of time is totally different.
The other area where bisimulation has appeared in a continuous context is the theory of timed automata [AD94]. Here the basic framework is ordinary automata theory augmented with clocks. These concepts have proved to be very useful in practice and lead to fundamental theoretical questions. The main technical result is the so-called region construction. This is a quotienting of the state space of the timed automaton - which is a continuum because of the clocks - by an
equivalence relation which is very much like bisimulation. By imposing certain conditions on the way clocks can be read, they guarantee that the region construction leads to a finite-state system. The bisimulation relation that they use is not like ours; theirs is concerned with how actions are enabled as time passes. The region construction is also used [ACH+95, HHWT97] for linear hybrid automata. This is a setup even closer in spirit to ours, since there are explicitly continuous state spaces. Here also a relation like bisimulation is used to collapse the continuum state space to a finite state space.
The group at Oxford has been developing an extensive theory of probabilistic systems; see the collection of reports available from the web [Pro]. The focus has been on equational laws satisfied by processes. From the semantic point of view, they have extensively developed and enriched Kozen's [Koz85] predicate-transformer view. They have considered continuous state space systems and have incorporated nondeterminism in their framework. As with the earlier work of Kozen [Koz81, Koz85], stochastic kernels play an important role.
There are several papers now on probabilistic analysis, modeling and verification. There are even several papers on probabilistic process algebra analyzing notions of testing and simulation, investigating model checking and exploring various other ideas [SL94, vGSST90, JL91, JY95, JS90, CSZ92, BK96, HK96]. There are several interesting practical developments, other than PEPA, which are worthy of attention. In particular, telecommunications [AJKvO97], real-time systems [BLFG95] and the modeling of physical systems [GSS95] are areas where probabilistic systems are very important. It is particularly for the last type of application that we expect the continuous-space formalism developed here to be useful. In a recent paper [GJP99] a programming language with probabilistic choice and recursion was developed.
This immediately puts the work in the realm of continuous spaces. The semantics of such systems involves the basic ideas of measure theory that we found useful in the present work. The other related paper is the work of de Vink and Rutten [dVR97]. This is also an investigation into the realm of continuous state spaces; however, they work with ultrametric spaces, not with the kind of metric spaces that actually arise in physical examples.
8 Conclusions

We summarize our main contributions. We gave:
- a notion of finite approximation to a class of stochastic systems and a proof that one can reconstruct the process from its finite approximants,
- a definition of distance between processes and a proof that the finite processes approximating a process converge to the original process,
- a logic for reasoning about processes and results showing that the interpretations of logical formulas themselves "converge" to give results about the original process,
- perturbation results which say that if a process is slightly altered, the result will be close to the original unperturbed process and the logical interpretation will also be close,
- a proof that there is a countable family of finite-state processes that form a basis for the space of all processes.
The theme of all these results is that one can approximate a continuous state process with a family of finite state processes. This approximation carries with it the logic, in the sense that approximate
reasoning principles are supported. We have also eliminated one of the key objections to working with probabilistic process equivalences, namely that they are not robust.
The research reported here is part of a larger investigation. We are interested in relating our work to the whole issue of discrete versus continuous models. Work in cellular automata has shown that many physical phenomena usually associated with continuous systems are manifested in discrete systems [Wol94]. In control theory there has been work on how controllability ideas transfer from continuous to discrete systems [CW95b]. We are ultimately interested in foundations for continuous stochastic systems and in methods for dealing with them algorithmically. In particular, we are eager to extend our theories to work with continuous time.
The main practical impact of these results should be in reasoning about continuous stochastic systems. We are currently looking at a number of potential applications; these include some toy examples to explore the ideas and, more seriously, a flight-control system and a robotic system. We are planning to explore the use of standard verification technology, such as probabilistic model checkers [BCHG98, HG98, han94], in conjunction with our approximations. Such activity would entail much more attention to the algorithmic and complexity aspects of the work.
Acknowledgments Prakash Panangaden would like to thank Franck van Breugel, Erik de Vink and Martin Escardo for helpful discussions. He would also like to thank Dexter Kozen for inspirational lectures on measure theory in 1985, which he hopes he has finally grasped!
References

[ACH+95] R. Alur, C. Courcoubetis, N. Halbwachs, T. A. Henzinger, P.-H. Ho, X. Nicollin, A. Olivero, J. Sifakis, and S. Yovine. The algorithmic analysis of hybrid systems. Theoretical Computer Science, 138:3-34, 1995.
[AD94] R. Alur and D. Dill. A theory of timed automata. Theoretical Computer Science, 126:183-235, 1994.
[AHS96] R. Alur, T. Henzinger, and E. Sontag, editors. Hybrid Systems III, number 1066 in Lecture Notes in Computer Science. Springer-Verlag, 1996.
[AJKvO97] R. Alur, L. Jagadeesan, J. J. Kott, and J. E. von Olnhausen. Model-checking of real-time systems: A telecommunications application. In Proceedings of the 19th International Conference on Software Engineering, 1997.
[AKNS97] P. Antsaklis, W. Kohn, A. Nerode, and S. Sastry, editors. Hybrid Systems IV, volume 1273 of Lecture Notes in Computer Science. Springer-Verlag, 1997.
[BCHG98] C. Baier, E. M. Clarke, and V. Hartonas-Garmhausen. On the semantic foundations of probabilistic Verus. In Probabilistic Methods in Verification - PROBMIV 98. University of Birmingham, 1998.
[BDEP97] R. Blute, J. Desharnais, A. Edalat, and P. Panangaden. Bisimulation for labelled Markov processes. In Proceedings of the Twelfth IEEE Symposium On Logic In Computer Science, Warsaw, Poland, 1997.
[BK96] C. Baier and M. Kwiatkowska. Domain equations for probabilistic processes. Available from URL http://www.cs.bham.ac.uk/~mzk/, March 1996.
[BLFG95] A. Benveniste, B. C. Levy, E. Fabre, and P. Le Guernic. A calculus of stochastic systems for the specification, simulation and hidden state estimation of mixed stochastic/nonstochastic systems. Theoretical Computer Science, 152(2):171-217, 1995.
[CSZ92] R. Cleaveland, S. Smolka, and A. Zwarico. Testing preorders for probabilistic processes. In Proceedings of the International Colloquium On Automata, Languages And Programming 1992, number 623 in Lecture Notes In Computer Science. Springer-Verlag, 1992.
[CW95a] P. E. Caines and S. Wang. A conditional observer and controller for finite machines. SIAM Journal of Control and Optimization, 33(6):1687-1715, Nov 1995.
[CW95b] P. E. Caines and Y. J. Wei. The hierarchical lattices of a finite machine. Systems and Control Letters, 25:257-263, 1995.
[DEP98] J. Desharnais, A. Edalat, and P. Panangaden. A logical characterization of bisimulation for labeled Markov processes. In Proceedings of the 13th IEEE Symposium On Logic In Computer Science, Indianapolis, pages 478-489. IEEE Press, June 1998.
[DEP99] J. Desharnais, A. Edalat, and P. Panangaden. Bisimulation for labeled Markov processes. Information and Computation, 1999.
[DGJP99] J. Desharnais, V. Gupta, R. Jagadeesan, and P. Panangaden. Metrics for labeled Markov processes. In Proceedings of CONCUR 99, Lecture Notes in Computer Science. Springer-Verlag, 1999.
[dVR97] E. de Vink and J. J. M. M. Rutten. Bisimulation for probabilistic transition systems: A coalgebraic approach. In Proceedings of the 24th International Colloquium On Automata, Languages And Programming, 1997.
[Eda99] A. Edalat. Semi-pullbacks and bisimulation in categories of Markov processes. Mathematical Structures in Computer Science, 1999.
[Fel71] W. Feller. An Introduction to Probability Theory and its Applications II. John Wiley and Sons, 2nd edition, 1971.
[Ger85] Robert Geroch. Mathematical Physics. Chicago Lectures in Physics. University of Chicago Press, 1985.
[GJP99] V. Gupta, R. Jagadeesan, and P. Panangaden. Stochastic processes as concurrent constraint programs. In Proceedings of the 26th Annual ACM Symposium On Principles Of Programming Languages, 1999.
[GSS95] V. Gupta, V. Saraswat, and P. Struss. A model of a photocopier paper path. In Proceedings of the 2nd IJCAI Workshop on Engineering Problems for Qualitative Reasoning, 1995.
[han94] Hans A. Hansson. Time and Probability in Formal Design of Distributed Systems, volume 1 of Real-Time Safety-Critical Systems. Elsevier, 1994.
[HG98] V. Hartonas-Garmhausen. Probabilistic Symbolic Model Checking with Engineering Models and Applications. PhD thesis, Carnegie-Mellon University, 1998.
[HHWT97] T. Henzinger, P.-H. Ho, and H. Wong-Toi. HyTech: A model checker for hybrid systems. Software Tools for Technology Transfer, 1(1), 1997.
[Hil94] J. Hillston. A Compositional Approach to Performance Modelling. PhD thesis, University of Edinburgh, 1994. To be published as a Distinguished Dissertation by Cambridge University Press.
[HK96] M. Huth and M. Kwiatkowska. On probabilistic model checking. Technical Report CSR-96-15, University of Birmingham, 1996. Available from http://www.cs.bham.ac.uk/~mzk/.
[Hut81] J. E. Hutchinson. Fractals and self-similarity. Indiana University Mathematics Journal, 30:713–747, 1981.
[JL91] B. Jonsson and K. Larsen. Specification and refinement of probabilistic processes. In Proceedings of the 6th Annual IEEE Symposium on Logic in Computer Science, 1991.
[JS90] C.-C. Jou and S. A. Smolka. Equivalences, congruences, and complete axiomatizations for probabilistic processes. In J. C. M. Baeten and J. W. Klop, editors, CONCUR 90: First International Conference on Concurrency Theory, number 458 in Lecture Notes in Computer Science. Springer-Verlag, 1990.
[JY95] B. Jonsson and W. Yi. Compositional testing preorders for probabilistic processes. In Proceedings of the 10th Annual IEEE Symposium on Logic in Computer Science, pages 431–441, 1995.
[Koz81] D. Kozen. Semantics of probabilistic programs. Journal of Computer and System Sciences, 22:328–350, 1981.
[Koz85] D. Kozen. A probabilistic PDL. Journal of Computer and System Sciences, 30(2):162–178, 1985.
[LS91] K. G. Larsen and A. Skou. Bisimulation through probabilistic testing. Information and Computation, 94:1–28, 1991.
[Mil80] R. Milner. A Calculus of Communicating Systems, volume 92 of Lecture Notes in Computer Science. Springer-Verlag, 1980.
[Mil89] R. Milner. Communication and Concurrency. Prentice-Hall, 1989.
[Par81] D. Park. Concurrency and automata on infinite sequences. In Proceedings of the Fifth GI Conference, number 154 in Lecture Notes in Computer Science, pages 561–572. Springer-Verlag, 1981.
[Pro] Probabilistic Systems Group, collected reports. Available from www.comlab.ox.ac.uk in the directory /oucl/groups/probs/bibliography.html.
[RdV97] J. J. M. M. Rutten and E. de Vink. Bisimulation for probabilistic transition systems: A coalgebraic approach. In P. Degano, editor, Proceedings of ICALP 97, number 1256 in Lecture Notes in Computer Science, pages 460–470. Springer-Verlag, 1997.
[SD78] N. Saheb-Djahromi. Probabilistic LCF. In Mathematical Foundations of Computer Science, number 64 in Lecture Notes in Computer Science. Springer-Verlag, 1978.
[SD80] N. Saheb-Djahromi. CPOs of measures for nondeterminism. Theoretical Computer Science, 12(1):19–37, 1980.
[SL94] R. Segala and N. Lynch. Probabilistic simulations for probabilistic processes. In B. Jonsson and J. Parrow, editors, Proceedings of CONCUR 94, number 836 in Lecture Notes in Computer Science, pages 481–496. Springer-Verlag, 1994.
[vGSST90] R. van Glabbeek, S. Smolka, B. Steffen, and C. Tofts. Reactive, generative and stratified models for probabilistic processes. In Proceedings of the 5th Annual IEEE Symposium on Logic in Computer Science, 1990.
[Wei97] K. Weihrauch. Computability on the probability measures on the Borel sets of the unit interval. In P. Degano, R. Gorrieri, and A. Marchetti-Spaccamela, editors, Automata, Languages and Programming, 24th International Colloquium, volume 1256 of Lecture Notes in Computer Science, pages 166–176, Bologna, Italy, 7–11 July 1997. Springer-Verlag.
[Wol94] S. Wolfram. Cellular Automata and Complexity. Addison-Wesley, 1994.