Article published in Science of Computer Programming 74 (2009) 568–589
Proofs of randomized algorithms in Coq

Philippe Audebaud
LIP - ENS Lyon, 46, allée d’Italie - 69437 Lyon - France

Christine Paulin-Mohring
INRIA Futurs, ProVal - Parc Orsay Université - F-91893
LRI, Univ Paris-Sud, CNRS - Orsay F-91405

Abstract

Randomized algorithms are widely used either for finding efficient approximate solutions to complex problems, for instance primality testing, or for obtaining good average behavior. Proving properties of such algorithms requires subtle reasoning on both the algorithmic and the probabilistic aspects of programs. Thus, providing tools for the mechanization of reasoning is an important issue. This paper presents a new method for proving properties of randomized algorithms in a proof assistant based on higher-order logic. It is based on the monadic interpretation of randomized programs as probabilistic distributions (Giry, 1982; Ramsey and Pfeffer, 2002). It requires neither the definition of an operational semantics for the language nor the development of a complex formalization of measure theory. Instead, it uses functional and algebraic properties of the unit interval. Using this model, we show the validity of general rules for estimating the probability that a randomized algorithm satisfies specified properties. This approach addresses only discrete distributions and gives rules for analysing general recursive functions. We apply this theory to the formal proof of a program implementing a Bernoulli distribution from a coin flip and to the (partial) termination of several programs. All the theories and results presented in this paper have been fully formalized and proved in the Coq proof assistant.

Key words: randomized algorithms, proof of partial and total correctness, functional language, axiomatic semantics, probability framing, call-by-value, monadic interpretation

Email addresses: [email protected] (Philippe Audebaud), [email protected] (Christine Paulin-Mohring).


1

Introduction

Randomized algorithms are widely used either for finding efficient approximate solutions to complex problems such as primality testing, or in order to obtain good average behavior. Proving properties of such algorithms requires subtle reasoning about both algorithmic and probabilistic aspects of programs. Providing tools for the mechanization of reasoning is consequently an important issue.

1.1

Models

The first problem is to find an appropriate mathematical representation of a randomized algorithm. Methods for modeling randomized programs go back to the early work of Kozen (1981, 1983), which proposes to interpret randomized imperative programs as measure transformers. This approach has been studied further by Morgan and McIver (1999), who extend the interpretation to non-deterministic as well as probabilistic choices and define a refinement relation. Using an extension of weakest-precondition computation to randomized programs, they propose a method to compute lower bounds on the probability that the result of the program satisfies a given property. The semantic foundations of probabilistic languages have been the subject of much research, following at least two different approaches. The first one is an operational view using access to an arbitrary number of independent random variables following a given distribution, which can be a coin flip (Hurd, 2002b, 2003) or a uniform distribution (Park et al., 2005). This interpretation is a monadic transformation. If Ω denotes the type of infinite sequences of independent random values, then a computation of type A is interpreted as a function of type Ω → A × Ω: it computes a value of type A and updates the global state of type Ω after consuming a finite prefix of the sequence of random values. Reasoning on randomized programs using this approach requires modelling the base probability distribution on Ω. The second approach interprets randomized programs as probability distributions; it can also be presented as a syntactic monadic transformation. In the discrete case, a probability distribution can be represented as a function mapping subsets of some set σ into the interval [0, 1] or, using expectations, as a functional mapping a real-valued function on σ to an element of R. The monadic structure of probability theory was studied in Giry (1982), developing unpublished ideas of Lawvere (1962).
This approach is used, for instance, in Ramsey and Pfeffer (2002), where a randomized functional term is interpreted as a Haskell program using the so-called expectation monad, i.e. functions of type (σ → R) → R.

1.2

Proofs

Mechanized reasoning about probabilistic programs also requires tools for analysing the behavior of these programs, which is an active research topic. Hurd, McIver and Morgan designed a mechanization of the quantitative logic for probabilistic guarded commands using the proof assistant HOL (Hurd et al., 2005). Their goal is very similar to ours, except that they analyse a different source language, handling both probabilistic and non-deterministic choice in an imperative setting, while we only consider probabilistic choice, but in a functional language that includes recursive functions. Their work also contains a formalisation in HOL of meta-reasoning on the source language, whereas we have so far only considered a shallow embedding of our programs in Coq. With regard to algorithms, Hurd (2002b, 2003) shows how to model and prove properties of randomized programs in the HOL proof assistant using a monadic transformation of programs, assuming access to an infinite sequence of independent coin flips.
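For comparison with the distribution-based view adopted in this paper, the operational interpretation Ω → A × Ω can be sketched in OCaml as a state-passing monad. This is a minimal illustration under our own assumptions, not Hurd’s HOL development: Ω is modelled as a lazy `bool Seq.t` of coin flips, and the names `sampler`, `flip`, `walk` and `cycle` are hypothetical.

```ocaml
(* Ω: an infinite sequence of independent coin flips, modelled here
   (as an assumption) by a lazy OCaml sequence of booleans. *)
type omega = bool Seq.t

(* A randomized computation of type 'a is read as Ω → 'a × Ω. *)
type 'a sampler = omega -> 'a * omega

let return (x : 'a) : 'a sampler = fun s -> (x, s)

(* Run m, then pass its result and the remaining flips to k. *)
let bind (m : 'a sampler) (k : 'a -> 'b sampler) : 'b sampler =
  fun s ->
    let (x, s') = m s in
    k x s'

(* flip consumes exactly one element of the sequence. *)
let flip : bool sampler = fun s ->
  match s () with
  | Seq.Cons (b, rest) -> (b, rest)
  | Seq.Nil -> failwith "flip: the sequence of coin flips must be infinite"

(* Hypothetical example: count flips until the first true, starting at x. *)
let rec walk (x : int) : int sampler =
  bind flip (fun b -> if b then return x else walk (x + 1))

(* A deterministic, periodic stand-in for Ω, useful only for testing. *)
let cycle (pattern : bool list) : omega =
  if pattern = [] then invalid_arg "cycle: empty pattern";
  let rec from = function
    | [] -> from pattern
    | b :: rest -> fun () -> Seq.Cons (b, from rest)
  in
  from pattern
```

For instance, `fst (walk 1 (cycle [false; false; true]))` is 3: two flips are consumed before the first `true`. Reasoning with this representation requires a probability measure on Ω itself, which is precisely what the approach of this paper avoids.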

1.3

Our choices

In this article, we intend to prove specifications for probabilistic programs inside the Coq proof assistant. We start by turning a (probabilistic) functional program p on some type A into a pure functional term, denoted [p], of type MA ≡ (A → [0, 1]) → [0, 1], where MA is equipped with a monadic structure. In this setting, [p] represents a (mathematical) discrete measure: a sub-probability. Although this monad appears more restrictive than the one proposed in Ramsey and Pfeffer (2002), it turns out to be sufficient for the goal of approximating probabilities. To keep the monadic transformation simple, we design a tiny probabilistic language Rml, equipped with a rather restricted type system, yet expressive enough for coding interesting algorithms. Program specifications are then proved using a specific inference system for axiomatic semantics. On the proof assistant side, tools are required for interactive reasoning about probabilistic programs (in fact, about their images under the above transformation). We thus share Hurd’s approach, but our design choices do not require a full development of measure theory inside Coq. Our tools are based upon a specific library which axiomatizes the properties required on some abstract type U representing the real interval [0, 1]. This library is developed as an independent contribution (Paulin-Mohring, 2007), and is designed to provide the back-end tools needed by the user. Our axiomatic semantics enhances previous work by Morgan and McIver (1999), whose rules only allowed weakening of probabilities. We prove the validity of our rules with respect to our semantics. We also propose schemes to reason on general recursive functions which generalize the usual schemes for loops. Our framework does not rely on a particular choice of primitive randomized function. In this paper, we use a boolean flip and a finite random function, and we show how to interpret a randomized choice operator directly. We only build discrete distributions: dealing with continuous distributions would require modifying the interpretation to restrict the functionals to measurable functions, an extension we plan to investigate later.

1.4

Paper outline

The paper is organized as follows. In Section 2, we introduce the input language and its semantics: an interpretation of programs as measures using a monadic transformation. We analyze our monadic interpretation from the functional point of view. In Section 3 we introduce the basic Coq theories for representing measures. In Section 4, we show the derived rules for framing the probability for a randomized program to satisfy a given property, in particular for the case of recursive programs. In Section 5, we apply our method to proofs of simple probabilistic properties of programs. The current paper extends Audebaud and Paulin-Mohring (2006) by suggesting an interpretation of higher-order functional programs in section 2.5.6 and introducing rules for intervals in section 4.1 and also more general rules for reasoning on recursive functions in section 4.4. We also develop an example of partial termination in section 5.2.

1.5

Remark

The interpretation of random functional programs as probabilistic distributions using a monadic interpretation is not new: it appears in many theoretical works on semantics, see Giry (1982), or, more concretely, for representing random programs in Haskell in Ramsey and Pfeffer (2002). To our knowledge, however, the approach of mechanizing reasoning about random functional expressions is new. In Ramsey and Pfeffer (2002), the interpretation does not cover general recursive programs and its inefficiency is criticized; the authors propose instead an alternative method which only covers discrete distributions. The possibility of covering recursion was, however, studied in Jones (1989), on which the approach of this paper is based. That the interpretation can lead to inefficient or even infeasible computations in practice will be illustrated in Sections 2.6.1 and 2.6.2. Our work advocates that operational behavior is not relevant here, since our model allows abstract reasoning on programs anyway, using the general rules presented in Section 4 and illustrated on examples in Section 5. This is analogous to Hoare rules for axiomatic semantics, which do not rely on computations per se but on denotational semantics. From this point of view, our approach compares with Kozen’s second semantics in Kozen (1981) and with the framework proposed in Kozen (1983).

2

Monadic interpretation of randomized algorithms

Sections 2.1 and 2.2 provide background results on probabilities which underlie our framework. We present in Section 2.3 a reasonably simple probabilistic language Rml. The monadic interpretation is the subject of Section 2.4, where we also discuss the consequences of relaxing the typing rules. We conclude this section by putting our interpretation to work on concrete examples in Section 2.6. The approach in this part is very similar to the one proposed in Ramsey and Pfeffer (2002). The main differences are that we measure functions with values in the interval [0, 1] instead of real numbers, and that we concentrate on a first-order language, which is sufficient for the applications we want to address, although we also show how to extend the approach to the general functional case. Unlike Ramsey and Pfeffer (2002), we address the question of general recursive functions, in Section 4.

2.1

Randomized programs as measure transformers

Usually, an imperative or functional program returns at most one state (or value, in the functional case) from any given initial state. Moreover, the returned state is entirely determined by the program and the initial state. With probabilistic programs this is no longer the case: running the program several times from the same initial state may yield different results. Rather, the returned state is a random variable, whose distribution is a measure over the set of states. This change of view has been investigated by Kozen (1981, 1983), Jones and Plotkin (1989), Jones (1989) and McIver and Morgan (2005), among others. Whilst the observation of the actual returned states is non-deterministic, the measure built from the initial state by applying the denotation of a probabilistic program is a deterministic value. This naturally leads to viewing randomized programs as measure transformers. The distribution of output states is what we want to analyse: if it is known, then, given a property P on the output state, we can compute the probability that the result of the program satisfies P. A randomized program uses basic randomized primitives such as a random function which, given a natural number n, produces a number between 0 and n with uniform probability 1/(1+n), or a more basic flip function which produces the boolean values true or false with equal probability 1/2. Another classical operator is the probabilistic choice P p+ Q, which behaves like the program P with probability p and like Q with probability 1−p. The implicit assumption is that each access to a random primitive in the program is independent of the others. Since we are concerned with a functional language, we do not have to take global states into consideration. Programs are interpreted as functions which compute values, and our aim is to estimate the distribution of these returned values.
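The operational reading of these primitives can be sketched with OCaml’s standard `Random` module, assuming that its pseudo-random draws stand in for independent random sources. Sampling only yields empirical estimates of the output distribution, which is one motivation for representing the distribution itself. The helper `estimate` and its default of 100000 trials are our own choices.

```ocaml
(* random n: uniform on 0..n, i.e. each value with probability 1/(1+n). *)
let random (n : int) : int = Random.int (n + 1)

(* flip (): true or false, each with probability 1/2. *)
let flip () : bool = Random.bool ()

(* Probabilistic choice P p+ Q: run P with probability p, Q otherwise.
   Random.float 1.0 is uniform on [0, 1), so the test succeeds with probability p. *)
let choice (p : float) (prog_p : unit -> 'a) (prog_q : unit -> 'a) : 'a =
  if Random.float 1.0 < p then prog_p () else prog_q ()

(* Empirical frequency with which [prog ()] satisfies [prop]. *)
let estimate ?(trials = 100_000) (prog : unit -> 'a) (prop : 'a -> bool) : float =
  let hits = ref 0 in
  for _i = 1 to trials do
    if prop (prog ()) then incr hits
  done;
  float_of_int !hits /. float_of_int trials

let () =
  Random.init 42; (* fixed seed, for reproducible runs *)
  Printf.printf "P(flip () = true) ~ %.3f\n" (estimate flip (fun b -> b));
  Printf.printf "P(random 3 = 0)   ~ %.3f\n"
    (estimate (fun () -> random 3) (fun k -> k = 0));
  Printf.printf "P(P 0.3+ Q runs P) ~ %.3f\n"
    (estimate (fun () -> choice 0.3 (fun () -> true) (fun () -> false)) (fun b -> b))
```

The printed frequencies approach 1/2, 1/4 and 0.3 as the number of trials grows, but each run only approximates them.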

2.2

Representation of distributions

In this section, we explain our choice for a mathematical representation of probability distributions. We introduce the notation [0, 1] for the set of real numbers x such that 0 ≤ x ≤ 1.

2.2.1

The measure perspective

A (positive) measure on a set A is a linear functional µ which, given a (measurable) function f from A to R+, computes a non-negative real number, its integral $\int f\,d\mu$. A required condition for µ to be a measure, besides linearity, is that µ preserves least upper bounds of increasing sequences: $\int (\bigvee_n f_n)\,d\mu = \bigvee_n \int f_n\,d\mu$. In the following, we shall use the notation µ(f) instead of $\int f\,d\mu$.

2.2.2

Notations for characteristic functions

If X is a subset of A, $I_X \in A \to [0, 1]$ denotes the characteristic function of X, such that for all x ∈ A, $I_X(x) = 0 \Leftrightarrow x \notin X$ and $I_X(x) = 1 \Leftrightarrow x \in X$. We simply write I for the function which is 1 everywhere. If P(x) is a formula with a free variable x, we write $I_{P(.)}$ for the characteristic function of the set X such that x ∈ X ⇔ P(x). For instance, $I_{.=k}$ is the characteristic function of the singleton {k}.

2.2.3

Our abstract notion of measure

From now on in this article, probability distributions are represented as positive measures whose total mass is bounded by 1. In order to define a probability distribution, it is sufficient to be able to measure functions which take values in the unit interval [0, 1]. Note that if ∀x. f(x) ∈ [0, 1], then µ(f) ∈ [0, 1], because the total mass of a probability distribution is at most one. Hence, a measure µ on A can be interpreted as a function of type (A → [0, 1]) → [0, 1] satisfying some extra algebraic properties, made precise in Section 3.2.

2.3

Basic language for randomized programs

For the sake of simplicity, we shall use in the following a simple first-order functional language. We will explain in Section 2.5.6 how it could be extended to full functional constructions.

2.3.1

Expressions

Our language (called Rml) contains the following constructions:

• Variables: x
• Primitive constants: c
• Conditional: if b then e1 else e2
• Local binding: let x = e1 in e2
• Application: f e1 . . . en, with f a primitive or user-defined function.

We shall introduce parentheses in concrete notations when needed. Functions can be declared as follows:

let f x1 . . . xn = e

Remark. Recursive definitions can be introduced as well:

let rec f x1 . . . xn = e

However, their introduction raises some technical issues with respect to the material developed in this section. Therefore, we postpone further details to Section 3.3. In order to deal with probabilistic programs, Rml also includes a few random primitive functions, such as the random function which, given a positive integer n, computes with uniform distribution an integer k such that 0 ≤ k ≤ n, and the flip function which computes a boolean which is true with probability 1/2.
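To fix ideas, the constructions of Rml can be transcribed as an OCaml datatype. This is a hypothetical encoding: the constructor names, the `value` type and the `free_vars` function are ours, and `flip ()` is represented as an application with no argument. The function `free_vars` illustrates that `let x = e1 in e2` binds x in e2 only.

```ocaml
(* Base values: booleans and natural numbers. *)
type value = Bool of bool | Nat of int

type expr =
  | Var of string                (* x *)
  | Const of value               (* c *)
  | If of expr * expr * expr     (* if b then e1 else e2 *)
  | Let of string * expr * expr  (* let x = e1 in e2 *)
  | App of string * expr list    (* f e1 ... en, f primitive or user-defined *)

(* A declaration  let f x1 ... xn = e. *)
type decl = { name : string; params : string list; body : expr }

(* Set union on small lists, keeping the order of first occurrence. *)
let union xs ys = xs @ List.filter (fun y -> not (List.mem y xs)) ys

let rec free_vars (e : expr) : string list =
  match e with
  | Var x -> [ x ]
  | Const _ -> []
  | If (b, e1, e2) -> union (free_vars b) (union (free_vars e1) (free_vars e2))
  | Let (x, e1, e2) ->
      union (free_vars e1) (List.filter (fun y -> y <> x) (free_vars e2))
  | App (_, args) ->
      List.fold_left (fun acc a -> union acc (free_vars a)) [] args

(* let g n = let k = random n in if flip () then k else 0 *)
let g =
  {
    name = "g";
    params = [ "n" ];
    body =
      Let ("k", App ("random", [ Var "n" ]),
           If (App ("flip", []), Var "k", Const (Nat 0)));
  }
```

Here `free_vars g.body` is `["n"]`: the local variable k is bound by the let, and n is the function parameter.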

2.3.2

Types

We assume that all Rml expressions are well-formed with respect to a restricted simple type system. This system is built over base types such as bool for boolean values and nat for natural numbers (non-negative integers), and allows arrow types only in the restricted case where arguments have a base type. In the following, we write β, βi, . . . to denote base types. We write e : β when e is a well-formed expression of type β, and f : β1 → · · · → βn → β when f e1 . . . en : β whenever ei : βi for i = 1 . . . n.

2.3.3

Meta-language

Randomized expressions are interpreted in a higher-order functional language. The target type system is richer: a type τ is either a base type β (including the base type [0, 1] for reals between 0 and 1) or a functional type τ1 → τ2. We use the same notations as in Rml for local bindings and conditionals, but we also introduce typed abstraction fun (x : τ) ⇒ e and binary application (e1 e2). Application is left associative, and types can be omitted in lambda-abstractions, written fun x ⇒ e, when the type is clear from the context. As a matter of fact, Rml (except for the randomized primitive functions) corresponds to a restricted subset of our meta-language where variables always have base types and functions are in eta-long normal form. An alternative would have been to use a monadic meta-language as in Moggi (1991) or Pfenning and Davies (2001), but it would have introduced an extra level of syntax that we are able to avoid here, owing to the restrictions on Rml syntax. Doing otherwise would require more complex notations, which would obscure the key ideas. Section 2.5.6 develops these points further.

2.4

Interpretation of random expressions

A (random) expression e of a base type β actually represents a set of values of type β, as different evaluations of the expression will in general lead to different values. As pointed out above, to analyze the distribution of these values, we interpret e : β as a measure on β, i.e. a function of type (β → [0, 1]) → [0, 1]. In the following, Mτ denotes the type (τ → [0, 1]) → [0, 1] of measures on values of type τ. We write [e] for the measure associated with the expression e. If we know [e], then, given a property Q on β, we can compute the probability that the evaluation of e satisfies Q: it is just $[e]\,I_Q$, namely the application of the measure associated with e to the characteristic function of the predicate Q, interpreted as a subset of β.
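In code, the recipe "probability of Q = [e] applied to the characteristic function of Q" is a one-liner. The sketch below assumes that [0, 1] is approximated by OCaml floats (nothing enforces the bounds); `indicator`, `prob` and the die measure are hypothetical illustrations.

```ocaml
(* Mτ = (τ → [0,1]) → [0,1], with [0,1] approximated by float. *)
type 'a m = ('a -> float) -> float

(* Characteristic function I_Q of a predicate Q. *)
let indicator (q : 'a -> bool) : 'a -> float =
  fun x -> if q x then 1.0 else 0.0

(* Probability that a value distributed according to mu satisfies q. *)
let prob (mu : 'a m) (q : 'a -> bool) : float = mu (indicator q)

(* A hand-written measure: a fair six-sided die with values 1..6. *)
let die : int m = fun f ->
  List.fold_left (fun acc k -> acc +. f k /. 6.0) 0.0 [ 1; 2; 3; 4; 5; 6 ]
```

For instance, `prob die (fun k -> k mod 2 = 0)` is 1/2 up to floating-point rounding, and `die (fun _ -> 1.0)`, the measure of the function which is 1 everywhere, is the total mass 1.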

2.5

Monadic transformation

The interpretation of e of type β as a measure [e] of type Mβ = (β → [0, 1]) → [0, 1] is defined by structural induction on e.

2.5.1

Definition of unit and bind

As usual in monadic transformations, we first introduce two operators:

unit : τ → Mτ = fun (x : τ) ⇒ fun (f : τ → [0, 1]) ⇒ f x
bind : Mτ → (τ → Mσ) → Mσ = fun (µ : Mτ) ⇒ fun (M : τ → Mσ) ⇒ fun (f : σ → [0, 1]) ⇒ µ (fun (x : τ) ⇒ M x f)

As expected, these definitions satisfy the usual monadic properties, where equality on Mβ is defined point-wise (µ1 = µ2 ⇔ ∀f, µ1(f) = µ2(f)):

• bind (unit x) M = M x
• bind (bind µ M1) M2 = bind µ (fun x ⇒ bind (M1 x) M2)
• bind µ unit = µ
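The operators unit and bind transcribe directly into OCaml. In this sketch [0, 1] is approximated by float (an assumption), and since equality on Mβ is point-wise over all functions f, the three laws can only be spot-checked here on a few hand-picked measures and test functions, not proved. The measure `coin`, the kernels `k1` and `k2`, and the test lists are hypothetical.

```ocaml
type 'a m = ('a -> float) -> float

(* unit x: the Dirac measure at x, which evaluates f at x. *)
let unit (x : 'a) : 'a m = fun f -> f x

(* bind mu k: integrate (fun x -> k x f) against mu. *)
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)

(* Point-wise equality, checked only on a finite list of test functions. *)
let agree (mu1 : 'a m) (mu2 : 'a m) (tests : ('a -> float) list) : bool =
  List.for_all (fun f -> abs_float (mu1 f -. mu2 f) < 1e-12) tests

(* Hypothetical measures and kernels used for the spot checks. *)
let coin : bool m = fun f -> 0.5 *. f true +. 0.5 *. f false

let k1 (b : bool) : int m =
  if b then unit 1 else bind coin (fun c -> unit (if c then 2 else 3))

let k2 (n : int) : bool m = unit (n mod 2 = 0)

let int_tests : (int -> float) list =
  [ (fun n -> if n = 2 then 1.0 else 0.0); (fun n -> float_of_int n /. 3.0) ]

let bool_tests : (bool -> float) list =
  [ (fun b -> if b then 1.0 else 0.0); (fun _ -> 0.25) ]

(* The three monad laws, instantiated on the examples above. *)
let left_unit = agree (bind (unit true) k1) (k1 true) int_tests

let associativity =
  agree (bind (bind coin k1) k2) (bind coin (fun x -> bind (k1 x) k2)) bool_tests

let right_unit = agree (bind coin unit) coin bool_tests
```

All three checks evaluate to `true`. Note that `bind coin k1` gives the value 2 probability 1/4: first `coin` must return false, then the inner coin must return true.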


2.5.2

Interpretation of functions

A function with name f and type τ ≡ β1 → · · · → βn → β leads to a new function name [f] of type [τ] ≡ β1 → · · · → βn → Mβ.

Primitive randomized functions
Each primitive randomized function is given a functional interpretation of the corresponding type. In this paper, we shall use the following constructions:

[random] n : Mnat = fun (f : nat → [0, 1]) ⇒ $\sum_{i=0}^{n} \frac{1}{1+n}\,(f\ i)$

[flip] () : Mbool = fun (f : bool → [0, 1]) ⇒ $\frac{1}{2}\,(f\ \mathrm{true}) + \frac{1}{2}\,(f\ \mathrm{false})$

It is also possible to start from other primitive notions of randomness, like the random choice operator used in Ramsey and Pfeffer (2002):

[e1 p+ e2] : Mβ = fun (f : β → [0, 1]) ⇒ p × ([e1] f) + (1−p) × ([e2] f)
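The three primitive interpretations can be written as OCaml functionals, assuming floats for [0, 1]; `random`, `flip` and `choice` are hypothetical names for [random], [flip] and the interpretation of p+. Each one is simply a finite weighted sum of values of its argument function.

```ocaml
type 'a m = ('a -> float) -> float

(* [random] n = fun f -> Σ_{i=0}^{n} (f i)/(1+n) *)
let random (n : int) : int m = fun f ->
  let total = ref 0.0 in
  for i = 0 to n do
    total := !total +. f i /. float_of_int (1 + n)
  done;
  !total

(* [flip] () = fun f -> 1/2 (f true) + 1/2 (f false) *)
let flip () : bool m = fun f -> 0.5 *. f true +. 0.5 *. f false

(* [e1 p+ e2] = fun f -> p × ([e1] f) + (1-p) × ([e2] f), for 0 <= p <= 1 *)
let choice (p : float) (e1 : 'a m) (e2 : 'a m) : 'a m =
  fun f -> p *. e1 f +. (1.0 -. p) *. e2 f
```

For instance, `choice 0.25 (random 1) (random 3)` gives the value 0 probability 0.25 × 1/2 + 0.75 × 1/4 = 0.3125.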

User defined functions
For a (non-recursive) user-defined function introduced by let f x1 . . . xn = e, the interpretation [f] is introduced by let [f] x1 . . . xn = [e]. This turns [f] into a function of type β1 → · · · → βn → Mβ, belonging to the target language.

Shortcut
More generally, when f is any Rml function of type β1 → · · · → βn → β, x1, . . . , xn are terms of the meta-language such that xi has type βi (for each 1 ≤ i ≤ n), and g : β → [0, 1] is any function, we allow ourselves to write [f x1 . . . xn] g (instead of ([f] x1 . . . xn) g) for the expectation of g with respect to the measure [f] x1 . . . xn. We also abusively use the same notation [φ x1 . . . xn] g (instead of (φ x1 . . . xn) g) when φ is some function of type β1 → · · · → βn → Mβ defined in the meta-language, in order to emphasize that we are computing the expectation of g with respect to the measure (φ x1 . . . xn).

Recursively defined functions
When dealing with let rec f x1 . . . xn = e, we likewise define [f] as a new recursively defined function in the target language, introduced by let rec [f] x1 . . . xn = [e]. However, although this case looks identical from a purely syntactic point of view, it is not as simple. We address this issue in more depth in Section 3.3.

2.5.3

Interpretation of expressions

Computation a : β           Functional value [a] : Mβ
v (variable or constant)    unit v
let x = a in b              bind [a] (fun x ⇒ [b])
f a1 . . . an               bind [a1] (fun x1 ⇒ . . . bind [an] (fun xn ⇒ [f] x1 . . . xn) . . .)
if b then a1 else a2        bind [b] (fun (x : bool) ⇒ if x then [a1] else [a2])
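The table can be read as the clauses of an interpreter. The self-contained OCaml sketch below rests on our own assumptions: a small AST for Rml, a `value` type with booleans and naturals, floats for [0, 1], and a hypothetical table `prim` giving the interpretations of `flip`, `random` and a pure `plus`. The program `example` is also ours.

```ocaml
(* Abstract syntax of Rml and its base values (our own encoding). *)
type value = B of bool | N of int

type expr =
  | Var of string
  | Const of value
  | If of expr * expr * expr
  | Let of string * expr * expr
  | App of string * expr list

(* Measures, with [0, 1] approximated by float. *)
type 'a m = ('a -> float) -> float

let unit (x : 'a) : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)

(* Hypothetical table of interpreted primitives [f]. *)
let prim (name : string) (args : value list) : value m =
  match (name, args) with
  | "flip", [] -> (fun f -> 0.5 *. f (B true) +. 0.5 *. f (B false))
  | "random", [ N n ] ->
      (fun f ->
        let total = ref 0.0 in
        for i = 0 to n do
          total := !total +. f (N i) /. float_of_int (n + 1)
        done;
        !total)
  | "plus", [ N a; N b ] -> unit (N (a + b)) (* a pure primitive *)
  | _ -> invalid_arg ("prim: " ^ name)

(* One clause per row of the table. *)
let rec interp (env : (string * value) list) (e : expr) : value m =
  match e with
  | Const v -> unit v
  | Var x -> unit (List.assoc x env)
  | Let (x, a, b) -> bind (interp env a) (fun v -> interp ((x, v) :: env) b)
  | App (f, args) ->
      (* bind [a1] (fun x1 -> ... bind [an] (fun xn -> [f] x1 ... xn)) *)
      let rec go acc = function
        | [] -> prim f (List.rev acc)
        | a :: rest -> bind (interp env a) (fun v -> go (v :: acc) rest)
      in
      go [] args
  | If (b, e1, e2) ->
      bind (interp env b) (function
        | B true -> interp env e1
        | B false -> interp env e2
        | N _ -> invalid_arg "interp: condition is not a boolean")

(* let x = random 1 in if flip () then plus x 1 else x *)
let example =
  Let ("x", App ("random", [ Const (N 1) ]),
       If (App ("flip", []), App ("plus", [ Var "x"; Const (N 1) ]), Var "x"))

(* Probability that [example] returns the natural number k. *)
let prob_result (k : int) : float =
  interp [] example (fun v -> if v = N k then 1.0 else 0.0)
```

Here `prob_result 1` is 1/2, while `prob_result 0` and `prob_result 2` are 1/4 each: x is 0 or 1 with probability 1/2, and it is incremented with probability 1/2.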

2.5.4

Properties of the interpretation

It is easy to prove that our interpretation is well-typed:

Proposition 1. Given an expression e in Rml of (base) type β, [e] is defined and has type Mβ.

PROOF. We prove a more precise result by a simple induction on the expression e: assume e has type β, contains the finite set of free variables (xi)i, where each xi has type βi, and calls some finite set (fj)j of functions. Then [e] has type Mβ in an environment containing the same variables (xi)i with the same types (βi)i and the corresponding function symbols ([fj])j, where [fj] has type [τj] when fj has type τj.

If a term e contains no random primitives, then its translation [e] can be simplified:

Proposition 2. Let e be a pure expression of base type β in Rml, i.e. an expression in which no randomized construction occurs. Then e can be translated in our meta-language into a term of type β (still written e), and [e] = unit e.

PROOF. The proof is by induction on the structure of terms not involving randomized constructions. The translation uses the meta-language abstraction, application and local definition for the interpretation of the corresponding Rml constructions. The equality [e] = unit e is a consequence of the monadic properties of unit and bind.


2.5.5

On the meaning of the interpretation

Let us have a look at this interpretation from the measure-theoretic point of view.

• The monad operator unit x represents the Dirac measure δx at point x. If x : β and θ : β → [0, 1], then

$$[x]\,\theta = \mathrm{unit}\ x\ \theta = \theta(x) = \int \theta(y)\, d\delta_x(y)$$

• Given a measure µ of type Mα and a family fun x ⇒ e, of type α → Mβ, of measures on β parametrized by x ∈ α, the measure bind µ (fun x ⇒ e) is defined by

$$\mathrm{bind}\ \mu\ (\mathrm{fun}\ x \Rightarrow e)\ \theta = \int \left( \int \theta(y)\, de(y) \right) d\mu(x)$$

In particular,

$$[\mathrm{let}\ x = a\ \mathrm{in}\ b]\,\theta = \int \left( \int \theta(y)\, d[b](y) \right) d[a](x)$$

so that both the let construct and the integral bind the variable x.

• The measure associated with the conditional e = if e0 then e1 else e2 behaves as expected:

$$[e]\,\theta = \int \left( I_{b=\mathrm{true}} \int \theta(y)\, d[e_1](y) + I_{b=\mathrm{false}} \int \theta(y)\, d[e_2](y) \right) d[e_0](b) = ([e_0]\, I_{b=\mathrm{true}}) \int \theta(y)\, d[e_1](y) + ([e_0]\, I_{b=\mathrm{false}}) \int \theta(y)\, d[e_2](y)$$

since the variable b occurs neither in e1 nor in e2.

• Accordingly, the application e = (f a1 . . . an) corresponds to a multiple integral:

$$[e]\,\theta = \int \cdots \int \left( \int \theta(y)\, d[f\ x_1 \ldots x_n](y) \right) d[a_n](x_n) \ldots d[a_1](x_1)$$

• The definitions for random primitives such as flip and random n actually involve finite sums, as presented in Section 2.5.2. The general integral obviously includes finite and countable sums as particular cases.
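The identity for the conditional can be spot-checked numerically. The following self-contained OCaml sketch computes both sides of the equation; floats for [0, 1] are our assumption, and the choices e0 = flip, e1 = random 3, e2 = unit 5 and the test functions θ are arbitrary. The identity relies on e1 and e2 not depending on b.

```ocaml
type 'a m = ('a -> float) -> float

let unit (x : 'a) : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)
let indicator (q : 'a -> bool) : 'a -> float = fun x -> if q x then 1.0 else 0.0

let flip : bool m = fun f -> 0.5 *. f true +. 0.5 *. f false

let random (n : int) : int m = fun f ->
  let total = ref 0.0 in
  for i = 0 to n do total := !total +. f i /. float_of_int (1 + n) done;
  !total

let e0 = flip
let e1 = random 3
let e2 = unit 5

(* Left-hand side: [if e0 then e1 else e2] θ, computed with bind. *)
let lhs (theta : int -> float) : float =
  bind e0 (fun b -> if b then e1 else e2) theta

(* Right-hand side: ([e0] I_{b=true}) ∫θ d[e1] + ([e0] I_{b=false}) ∫θ d[e2]. *)
let rhs (theta : int -> float) : float =
  e0 (indicator (fun b -> b)) *. e1 theta
  +. e0 (indicator (fun b -> not b)) *. e2 theta

(* Test functions θ : int -> [0, 1]. *)
let thetas : (int -> float) list =
  [ indicator (fun y -> y <= 1); (fun y -> float_of_int y /. 10.0) ]
```

For θ = I_{.=0}, both sides equal 1/2 × 1/4 = 1/8: the value 0 can only come from the branch e1.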

2.5.6

A general higher-order interpretation

The basic term language presented in Section 2.3 will turn out to be sufficient for dealing with interesting examples. Its main restriction, however, comes from its very design: we do not take into account programs which could generate randomized functions. For instance, the program Φ defined as

let Φ = let x = random 100 in fun (n:nat) ⇒ let y = random n in y+x

provides a random variable of type nat → nat. As such, one would expect its monadic interpretation to be given over some type expression [[nat → nat]] ≡ Mρ ≡ (ρ → [0, 1]) → [0, 1], where ρ is a type described more precisely below. This leads to the general higher-order interpretation, based upon the fact that [[β]] = Mβ for any base type β, and the observation that variables (as well as abstractions) are expected to be considered as values with respect to our interpretation. Given an arbitrary expression e : σ, we still want to turn e into a measure on some type expression $\bar\sigma$, i.e. a function of type $[[\sigma]] = M\bar\sigma$. The transformation turns out to be Plotkin’s well-known call-by-value transformation, with $\bar\beta \equiv \beta$ and $\overline{\sigma \to \tau} \equiv \bar\sigma \to M\bar\tau$. For e : τ, we define [[e]] : [[τ]] as follows:

Term e : τ              Interpretation [[e]] : [[τ]]
x                       unit x
fun (x : σ) ⇒ t         unit (fun (x : $\bar\sigma$) ⇒ [[t]])
let x = a in b          bind [[a]] (fun (x : $\bar\sigma$) ⇒ [[b]])
t u                     bind [[t]] (bind [[u]])
if b then e1 else e2    bind [[b]] (fun (x : bool) ⇒ if x then [[e1]] else [[e2]])

In other words, turning our former monadic transformation over Rml (Section 2.3) into a more general transformation amounts to applying a CPS transformation to our programs, where measurable functions f : β → [0, 1] are now seen as particular cases of continuations. This interpretation extends the situation by interpreting random primitives such as flip () (resp. random n) as genuine inhabitants of Mbool (resp. Mnat), as shown in Section 2.5.2. It can be shown that this interpretation is a conservative extension of the former monadic one: the two coincide when we restrict ourselves to the Rml case.

Proposition 3. Assume p is a well-formed term of Rml. If p : β, where β is a base type, then [p] = [[p]] as elements of Mβ.

In this paper, randomized functions such as Φ cannot be considered, since the type system chosen in this work does not allow for building measures on functional types.
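Although Φ is outside the scope of this paper’s Coq development, the call-by-value interpretation itself is easy to transcribe in OCaml, whose type system does allow measures over functions. In this sketch, floats for [0, 1] and the names `phi`, `psi`, `apply_at` and `same_twice` are our assumptions, and [[nat → nat]] becomes `(int -> int m) m`. The contrast with `psi`, which redraws x at each call, shows why Φ is a measure over randomized functions rather than a single randomized function.

```ocaml
type 'a m = ('a -> float) -> float

let unit (x : 'a) : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)

let random (n : int) : int m = fun f ->
  let total = ref 0.0 in
  for i = 0 to n do total := !total +. f i /. float_of_int (1 + n) done;
  !total

(* [[Φ]] : M(nat → M nat); x is drawn once, before the abstraction. *)
let phi : (int -> int m) m =
  bind (random 100) (fun x ->
      unit (fun n -> bind (random n) (fun y -> unit (y + x))))

(* A different program: x is drawn anew at every call. *)
let psi : (int -> int m) m =
  unit (fun n ->
      bind (random 100) (fun x -> bind (random n) (fun y -> unit (y + x))))

(* Application t u ~> bind [[t]] (bind [[u]]), for a value u with [[u]] = unit u. *)
let apply_at (t : (int -> int m) m) (u : int) : int m = bind t (bind (unit u))

(* Draw one function g, call it twice on 0, and test whether the results agree. *)
let same_twice (t : (int -> int m) m) : bool m =
  bind t (fun g -> bind (g 0) (fun a -> bind (g 0) (fun b -> unit (a = b))))
```

With `phi`, both calls see the same x, so `same_twice phi` yields true with probability 1; with `psi` the probability is only 1/101. Similarly, `apply_at phi 3` returns 0 with probability 1/101 × 1/4.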

2.6

Examples of functional interpretation

Now that the monadic translation is defined, we can transform an expression e which computes a value randomly into a deterministic expression [e] which returns the measure associated with e. Before using this interpretation to prove facts about some program e, notice that [e] is an ordinary functional term, and can be evaluated as such in the interactive toplevel of, say, O’Caml.

2.6.1

Primality test

A basic example of a randomized algorithm is the primality test. The principle of this algorithm is the following. We want to check whether a number p is prime. There is a deterministic test (test) which applies to p and to 1 ≤ k < p, such that:

• if p is prime, then (test k p) evaluates to true for all k;
• if p is not prime, then (test k p) evaluates to true for only a limited number of k, say N, with N less than (p−1)/2.

We choose k randomly and run the test: if the answer is false, then p is not prime; if the answer is true, then p is not prime with probability N/(p−1), which is less than 1/2. Iterating the test improves the level of confidence, provided the random choices of k are independent. In our language, the function which iterates the primality test n times for p can be written:

let rec prime_test p n =
  if n = 0 then true
  else let k = random (p-2) in
       if test (k+1) p then prime_test p (n-1) else false

Using the monadic transformation and the monad simplification laws, we get the functional computation of the associated measure:

let rec prime_test_fun p n =
  if n = 0 then unit true
  else bind (random_fun (p-2))
         (fun k -> if test (k+1) p then prime_test_fun p (n-1) else unit false)

Now, to evaluate the probability that our program gives a correct answer, we define prime_correct, the characteristic function of the correctness predicate, which says that the result is true exactly when p is prime:

let prime_correct p b = if b = exact_prime p then 1. else 0.

One can now explicitly compute the probability that our program gives a correct answer after n iterations:

let evaluate p n = prime_test_fun p n (prime_correct p)

The function can be run in O’Caml and gives the following results:

# evaluate 23 1;;
- : float = 1.
# [evaluate 9 0; evaluate 9 1; evaluate 9 2; evaluate 9 3];;
- : float list = [0.; 0.75; 0.9375; 0.984375]

If the number is prime (for example p = 23), then the result is correct with probability one. On the other hand, if p is not prime (for example p = 9), then the probability that the program gives a correct answer after 0 iterations is 0; after 1 iteration, we get the correct answer 3 times out of 4, and after 3 iterations more than 98% of the answers are correct. One nice point is that we have been able to compute these probabilities with a simple ML program, without any specific knowledge of probability theory or number theory (except for the interpretation of random). On the other hand, if we analyze the program, we notice that it is very inefficient:

• in order to build the characteristic function to be tested, we need to know (or to test) exactly whether p is prime or not;
• because of the interpretation of random, the program is executed for all the values of k between 1 and p − 1 before computing the average number of correct answers.
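The session above can be reproduced with the following self-contained OCaml sketch. The text does not say which deterministic test is used; as a hypothetical stand-in we take the Fermat test k^(p−1) mod p = 1, which for p = 9 accepts exactly k = 1 and k = 8 among the eight candidates, so that one iteration is correct with probability 3/4. The helpers `random_fun`, `pow_mod` and `exact_prime` (trial division) are also our own.

```ocaml
type 'a m = ('a -> float) -> float

let unit (x : 'a) : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)

(* Interpretation of random: uniform on 0..n. *)
let random_fun (n : int) : int m = fun f ->
  let total = ref 0.0 in
  for i = 0 to n do total := !total +. f i /. float_of_int (1 + n) done;
  !total

(* b^e mod m by repeated squaring (small inputs only, no overflow check). *)
let rec pow_mod (b : int) (e : int) (m : int) : int =
  if e = 0 then 1 mod m
  else
    let h = pow_mod b (e / 2) m in
    let h2 = h * h mod m in
    if e mod 2 = 0 then h2 else h2 * b mod m

(* Hypothetical deterministic test: Fermat's test. It does not satisfy the
   bound on N for every composite p (e.g. Carmichael numbers), so it only
   serves as an illustration. *)
let test (k : int) (p : int) : bool = pow_mod k (p - 1) p = 1

(* Exact primality by trial division, needed only to build prime_correct. *)
let exact_prime (p : int) : bool =
  let rec no_divisor d = d * d > p || (p mod d <> 0 && no_divisor (d + 1)) in
  p >= 2 && no_divisor 2

let rec prime_test_fun (p : int) (n : int) : bool m =
  if n = 0 then unit true
  else
    bind (random_fun (p - 2)) (fun k ->
        if test (k + 1) p then prime_test_fun p (n - 1) else unit false)

let prime_correct (p : int) (b : bool) : float =
  if b = exact_prime p then 1. else 0.

let evaluate (p : int) (n : int) : float = prime_test_fun p n (prime_correct p)
```

Under these assumptions, `evaluate 9 n` for n = 0, 1, 2, 3 gives 0, 0.75, 0.9375 and 0.984375, and `evaluate 23 1` gives 1 up to floating-point rounding. As noted above, every evaluation enumerates all p − 1 candidates at each iteration.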

2.6.2

Random walk

Furthermore, this computational approach does not work in all cases. Our previous program uses a structural recursion which always terminates. Many interesting probabilistic programs only terminate with probability one, which is a weaker requirement. For instance, the following function, called with x = 1, flips a coin and returns how many flips it took to get true; this is a typical example of a random walk:

let rec walk x = if flip () then x else walk (x+1)

If we test this function in O’Caml several times, we get small answers such as 1, 2, 3. We may apply our translation scheme:

let rec walk_fun x =
  bind flip_fun (fun (b:bool) -> if b then unit x else walk_fun (x+1))

and measure the function which is 1 everywhere:

# walk_fun 1 (fun n -> 1.);;
Stack overflow during evaluation (looping recursion?).

It loops because our interpretation explores all the cases, in particular the one where the result of flip is always false. This example shows that, when general fixpoints are involved, we can no longer use computation of the monadic interpretation to analyse the probability of events. We need to reason about these programs instead. For that purpose, we first define a Coq theory for representing distributions, and then we prove several theorems for analysing programs.
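One way to see what the looping computation is trying to compute is to unfold the recursion a bounded number of times, giving measure 0 to the runs that have not yet stopped. The OCaml sketch below is our own illustration (floats for [0, 1]; `walk_approx` is a hypothetical name), not the method developed in the paper, which reasons in Coq instead. For the function which is 1 everywhere, the n-th approximant equals 1 − 2^(−n), increasing towards 1, the probability of termination; this anticipates the treatment of fixpoints as least upper bounds in Section 3.1.1.

```ocaml
type 'a m = ('a -> float) -> float

(* n-th approximant of walk_fun: unfold at most [depth] calls; runs that
   have not stopped yet contribute 0 (the least element). Following
   bind flip_fun (fun b -> if b then unit x else walk_fun (x+1)),
   each unfolding gives 1/2 (f x) + 1/2 (walk_fun (x+1) f). *)
let rec walk_approx (depth : int) (x : int) : int m = fun f ->
  if depth = 0 then 0.0
  else 0.5 *. f x +. 0.5 *. walk_approx (depth - 1) (x + 1) f

(* Print the approximated termination probability for a few depths. *)
let () =
  List.iter
    (fun d -> Printf.printf "depth %2d: %.6f\n" d (walk_approx d 1 (fun _ -> 1.0)))
    [ 1; 2; 5; 10; 20 ]
```

The same approximants also give, for instance, probability 1/8 to the result 3 as soon as the depth is at least 3.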

3

Coq representation of randomized programs

The monadic interpretation transforms a probabilistic term e of type β into a purely functional one, [e], which is understood as a measure on this same type. Our next step towards reasoning on these randomized terms consists in providing tools, on the Coq side, for reasoning on e through its interpretation [e]. Consequently, we develop tools to reason on measures instead. Section 3.1 presents an axiomatization U of the unit interval [0, 1] that is sufficient for our purpose, and the representation of Rml types and terms is explained in Sections 3.2 and 3.3.

3.1

U : an axiomatization of the set [0, 1]

Our model is based on measures seen as functionals of type (A → [0, 1]) → [0, 1]. To construct this model in Coq, we have chosen to axiomatize a type U which corresponds to the interval [0, 1]. The complete development is available as a Coq contribution (see http://coq.inria.fr).¹

¹ Our development currently runs with Coq V8.1.


3.1.1

Notations for complete partial orders

Our development makes extensive use of the notion of complete partial order. Our Coq library therefore starts with the definition of a structure for ordered sets and one for complete partial orders. An ordered set is given by a type O and a relation ≤ which is reflexive and transitive. An equality on O is defined by x == y iff x ≤ y ∧ y ≤ x. Given two ordered sets $O_1$ and $O_2$, we introduce the type $O_1 \xrightarrow{m} O_2$ of monotonic functions. An ω-complete partial order (ω-cpo) is given by an ordered set D, a least element 0 and a least upper bound operation lub f on monotonic sequences $f : \mathit{nat} \xrightarrow{m} D$. Given two ω-cpos $D_1$ and $D_2$, a monotonic function $F : D_1 \xrightarrow{m} D_2$ is said to be continuous whenever F(lub f) ≤ lub(F ∘ f). Because the opposite inequality is always provable, a continuous function also satisfies F(lub f) == lub(F ∘ f).

There is a standard way to introduce fix-points in an ω-cpo D. Let F be a monotonic operator on D (i.e. $F : D \xrightarrow{m} D$); we introduce the sequence $F_n \equiv F^n\,0$ (with $F^{n+1} = F \circ F^n$) and define $\mathrm{fix}\,F = \mathrm{lub}\,F_n$. It is easy to show that fix F ≤ F(fix F); the equality fix F == F(fix F) requires F to be continuous.

The ω-cpo structure can be extended to function spaces. If we have an ω-cpo structure on a set D, then we can define the same structure on the set A → D of functions with values in D, taking:
$$f \le_{A\to D} g \iff \forall x,\ f\,x \le_D g\,x \qquad 0_{A\to D} = \mathrm{fun}\ x \Rightarrow 0_D \qquad \mathrm{lub}_{A\to D}\,f_n = \mathrm{fun}\ x \Rightarrow \mathrm{lub}_D\,(f_n\,x)$$

Given an ordered set O and an ω-cpo D, the set of monotonic functions from O to D is also an ω-cpo.
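The construction fix F = lub F^n 0 can be illustrated numerically. The sketch below assumes [0, 1] is modelled by OCaml floats and only computes finitely many approximants F^n 0, since the least upper bound itself is a limit; half_step is a hypothetical monotonic, continuous operator whose fix-point is 1.

```ocaml
(* f^n x, the n-fold iteration of an operator f on [0,1] (floats). *)
let rec iterate (f : float -> float) (n : int) (x : float) : float =
  if n = 0 then x else iterate f (n - 1) (f x)

(* n-th Kleene approximant F^n 0 of fix F = lub_n F^n 0. *)
let approx_fix f n = iterate f n 0.

(* Hypothetical operator F x = 1/2 + x/2: monotonic and continuous. *)
let half_step x = 0.5 +. (0.5 *. x)

let () =
  List.iter
    (fun n -> Printf.printf "F^%d 0 = %.6f\n" n (approx_fix half_step n))
    [ 0; 1; 2; 5; 20 ]
```

The approximants form an increasing sequence 0, 1/2, 3/4, … whose least upper bound is the fix-point 1.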

3.1.2

Definitions

Our axiomatisation of [0, 1] starts by introducing an ω-cpo U. Consequently, we can use the following symbols:
• Constant: 0
• Predicates: x ≤ y, x == y with x, y ∈ U
• Least upper bounds of monotonic sequences: lub f with $f \in \mathit{nat} \xrightarrow{m} U$. If f is an expression with a free variable n, we write $\mathrm{lub}(f)_n$ instead of lub (fun n ⇒ f).
We also introduce the following constructions building new elements in U:
• bounded addition: x + y with x, y ∈ U
• multiplication: x × y with x, y ∈ U
• inverse: 1 − x with x ∈ U
• values $\frac{1}{1+n}$ with n : nat

The addition in U is bounded: it gives the minimum of addition on reals and 1.

3.1.3

Axioms

In addition to the ω-cpo properties, we introduce a set of axioms for the operations on U .

3.1.3.1 Order We assume that 1 is different from 0, that 1 is greater than or equal to every element of U, and that the order is total:
• Non-confusion: ¬0 == 1
• Bounds: ∀x, x ≤ 1
• Totality: ∀x y, x ≤ y ∨c y ≤ x
Because Coq implements an intuitionistic logic and we did not want to commit ourselves to a classical axiomatisation of real numbers, we chose a classical version of disjunction for expressing totality: the property A ∨c B is defined as ∀C, (¬¬C → C) → (A → C) → (B → C) → C. We also added an axiom stating that the order relation is classical:
• Classical: ¬¬(x ≤ y) → x ≤ y

3.1.3.2 Addition, multiplication and inverse As expected, we include the usual axioms stating that addition and multiplication are symmetric (commutative) and associative, with 0 and 1 as their respective neutral elements. Some properties of addition are only valid when there is no overflow during addition; in our formalism, the non-overflow condition for x + y is expressed as x ≤ 1 − y. We express the relationship between least upper bounds (lubs) and addition and multiplication by assuming that addition and multiplication are continuous with respect to their second argument.

The complete set of axioms is:
• Addition
  · Symmetry: ∀x y, x + y == y + x
  · Associativity: ∀x y z, x + (y + z) == (x + y) + z
  · Neutral element: ∀x, 0 + x == x
  · Compatibility: ∀x y z, y ≤ z ⇒ x + y ≤ x + z
  · Simplification: ∀x y z, z ≤ 1 − x ⇒ x + z ≤ y + z ⇒ x ≤ y
  · lub and addition: $\forall (f : \mathit{nat} \xrightarrow{m} U)\ k,\ k + \mathrm{lub}\,f \le \mathrm{lub}(k + f\,n)_n$
• Multiplication
  · Symmetry: ∀x y, x × y == y × x
  · Associativity: ∀x y z, x × (y × z) == (x × y) × z
  · Neutral element: ∀x, 1 × x == x
  · Distributivity on addition: ∀x y z, x ≤ 1 − y ⇒ (x + y) × z == x × z + y × z
  · Compatibility: ∀x y z, y ≤ z ⇒ x × y ≤ x × z
  · Simplification: ∀x y z, ¬0 == z ⇒ z × x ≤ z × y ⇒ x ≤ y
  · lub and multiplication: $\forall (f : \mathit{nat} \xrightarrow{m} U)\ k,\ k \times \mathrm{lub}\,f \le \mathrm{lub}(k \times f\,n)_n$
• Inverse
  · Inverse maps 1 to 0: 1 − 1 == 0
  · Inverse property: ∀x, (1 − x) + x == 1
  · Compatibility: ∀x y, x ≤ y ⇒ 1 − y ≤ 1 − x
  · Inverse and addition: ∀x y, y ≤ 1 − x ⇒ (1 − (x + y)) + x == 1 − y
  · Inverse and multiplication: ∀x y, 1 − (x × y) == (1 − x) × y + 1 − y
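The intended model of these axioms is the real interval [0, 1] with truncated addition. As a sanity check, not a proof, the following OCaml sketch models U by floats and tests a selection of the axioms, together with the axiom on the constant 1/(1+n) of section 3.1.3.3, on dyadic sample points; the float model, the sample points and the tolerance are our assumptions.

```ocaml
(* Sketch: U modelled by OCaml floats in [0,1] (an assumption). *)
let plus x y = min (x +. y) 1.          (* bounded addition *)
let mult x y = x *. y                   (* multiplication *)
let inv x = 1. -. x                     (* the "inverse" 1 - x *)

let eq x y = abs_float (x -. y) < 1e-12 (* x == y, up to rounding *)
let le x y = x <= y +. 1e-12            (* x <= y, up to rounding *)
let implies a b = (not a) || b

(* Dyadic sample points: their sums and products are exact in binary. *)
let samples = [ 0.; 0.25; 0.5; 0.75; 1. ]

let forall3 p =
  List.for_all
    (fun x -> List.for_all (fun y -> List.for_all (fun z -> p x y z) samples) samples)
    samples

(* A selection of the axioms above, checked on all triples of samples. *)
let axioms_hold () =
  forall3 (fun x y z ->
      eq (plus x (plus y z)) (plus (plus x y) z)                               (* associativity *)
      && implies (le z (inv x)) (implies (le (plus x z) (plus y z)) (le x y))  (* simplification *)
      && implies (le x (inv y)) (eq (mult (plus x y) z) (plus (mult x z) (mult y z)))
      && eq (plus (inv x) x) 1.                                                (* inverse property *)
      && implies (le y (inv x)) (eq (plus (inv (plus x y)) x) (inv y))
      && eq (inv (mult x y)) (plus (mult (inv x) y) (inv y)))

(* n x y as the bounded sum y + ... + y (n times), and the constant 1/(1+n). *)
let rec nmult n y = if n = 0 then 0. else plus y (nmult (n - 1) y)
let unit_frac n = 1. /. float_of_int (1 + n)

(* Axiom of section 3.1.3.3: 1/(1+n) == 1 - (n x 1/(1+n)) *)
let frac_axiom n = eq (unit_frac n) (inv (nmult n (unit_frac n)))
```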

3.1.3.3 Constant $\frac{1}{1+n}$ The constant $\frac{1}{1+n}$ satisfies the axiom:
• $\frac{1}{1+n} == 1 - \left(n \times \frac{1}{1+n}\right)$, where $n \times \frac{1}{1+n}$ is a generalized sum defined by induction on n.
Finally, the fact that U is Archimedean is axiomatized by the property:
• $\forall x,\ \neg\, x == 0 \Rightarrow \exists_c\, n,\ \frac{1}{1+n} \le x$
As for the total order property, we use a classical version of the existential quantifier.

3.1.4

Remarks

Our modeling of randomized programs does not depend on our particular axiomatization of [0, 1]. Our choices are somewhat arbitrary: we tried to find an axiomatization with few operations and axioms, so that the theory could easily be instantiated by different representations of real numbers. We are particularly interested in constructive reals, and we plan to investigate a possible encoding using the reals defined by Geuvers and Niqui (2002) or the axioms proposed for interval objects as described by Escardo and Simpson (2001). We use the functor mechanism of Coq in order to keep the axiomatization of [0, 1] as a parameter of the theory.

3.1.5

Derived operations

The usual minus operation x − y (which is zero when x ≤ y) can be defined using our special inverse by x − y ≡ 1 − ((1 − x) + y). The operation max can be defined as (x − y) + y. Using the max operation, we can define the least upper bound of an arbitrary sequence, and the greatest lower bound can be defined by glb f ≡ 1 − lub(1 − f). It is also easy to define n × x and $x^n$ for an integer n by induction on n. In Morgan and McIver (1999), the authors use an operation x & y defined on non-negative real numbers as the maximum of 0 and x + y − 1. The same operation can be defined in our theory using the inverse operation and addition by x & y ≡ 1 − ((1 − x) + (1 − y)). It is the dual of addition, since 1 − (x & y) == (1 − x) + (1 − y) and 1 − (x + y) == (1 − x) & (1 − y). This operation captures the intersection of properties, because $I_{P \cap Q} == I_P \,\&\, I_Q$, and it will be used in the fix-point rules of section 4.4.2. Altogether, the Coq theory for [0, 1] contains approximately 1100 lines of definitions and lemmas (and almost twice as many lines of proofs).
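The derived operations can be checked on a float model of [0, 1] (an assumption of this sketch, not the Coq development). The names minus, umax, amp and glb are hypothetical OCaml renderings of x − y, max, & and glb, the last one restricted to finite lists.

```ocaml
(* Float model of U: bounded addition and inverse. *)
let plus x y = min (x +. y) 1.
let inv x = 1. -. x

let minus x y = inv (plus (inv x) y)        (* x - y = 1 - ((1 - x) + y) *)
let umax x y = plus (minus x y) y           (* max x y = (x - y) + y *)
let amp x y = inv (plus (inv x) (inv y))    (* x & y = 1 - ((1 - x) + (1 - y)) *)

(* glb of a finite list via glb f = 1 - lub (1 - f); lub starts from 0. *)
let glb xs = inv (List.fold_left (fun acc x -> umax acc (inv x)) 0. xs)

let () =
  Printf.printf "0.75 & 0.5 = %g, 0.25 & 0.5 = %g\n" (amp 0.75 0.5) (amp 0.25 0.5)
```

On characteristic values, amp 1. 1. = 1. and amp 1. 0. = 0., which is the intersection property of & stated above.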

3.2

Dealing with Rml in Coq

Given e : β, we get [e] : Mβ = (β → [0, 1]) → [0, 1]. In Coq, the type Mβ is represented by a record type (distr β), which packages the functionals in Mβ together with proofs of their measure properties.

3.2.1

Representation of types

In the following, we extend the operations on U in a standard way to operations and relations on functions of type β → U, using the same notations: f + g is the function fun x ⇒ f x + g x and k × f is the function fun x ⇒ k × f x. Given a type β, we define a distribution on β to be a monotonic function $\mu : (\beta \to U) \xrightarrow{m} U$ which furthermore satisfies the following stability properties:
• linearity:
  · ∀f g : β → U, f ≤ 1 − g ⇒ µ(f + g) == µ(f) + µ(g)
  · ∀(k : U)(f : β → U), µ(k × f) == k × µ(f)
• compatibility with inverse: ∀f : β → U, µ(1 − f) ≤ 1 − µ(f)
• continuity: $\forall f : \mathit{nat} \xrightarrow{m} (\beta \to U),\ \mu(\mathrm{lub}\,f) \le \mathrm{lub}\,(\mu \circ f)$

In Coq, we introduce a type (distr β) as a dependent record which contains the measure µ together with the proofs of its stability properties. There is a natural order on that type, inherited from the functional order on (β → U) → U. Formally, in the Coq development, there is a difference between the type Mβ of functionals and the type (distr β), which contains a functional of type Mβ plus the proofs of the stability properties. However, for the sake of readability, we shall not emphasize this distinction in this paper and simply use the type Mβ in place of (distr β), assuming that all the objects of that type satisfy the required stability properties.
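To make the stability properties concrete, the following OCaml sketch represents a finite sub-probability by a list of weighted points and checks linearity, compatibility with inverse and the derived property µ(1 − f) == µ(I) − µ(f) on one example. The representation, the weights and the functions f and g are assumptions of this illustration; since all sums stay below 1, ordinary float addition coincides with bounded addition here.

```ocaml
(* A finite sub-probability on 'a, given by weighted points, seen as the
   functional mu : ('a -> float) -> float. *)
let of_points (pts : ('a * float) list) : ('a -> float) -> float =
 fun f -> List.fold_left (fun acc (x, w) -> acc +. (w *. f x)) 0. pts

(* Total mass 0.9 < 1: a sub-probability. *)
let mu = of_points [ (0, 0.25); (1, 0.25); (2, 0.25); (3, 0.15) ]

(* Two functions with f <= 1 - g pointwise, so f + g does not overflow. *)
let f x = if x mod 2 = 0 then 0.5 else 0.25
let g x = if x < 2 then 0.25 else 0.5
let close a b = abs_float (a -. b) < 1e-12

let linear_ok = close (mu (fun x -> f x +. g x)) (mu f +. mu g)
let scalar_ok = close (mu (fun x -> 0.5 *. f x)) (0.5 *. mu f)
let inverse_ok = mu (fun x -> 1. -. f x) <= 1. -. mu f
```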

3.2.2

Remarks

We allow a distribution to be a sub-probability, possibly with µ(1 − f) < 1 − µ(f) (i.e. µ(I) < 1). This is useful for interpreting non-terminating programs. In Coq, the definition and properties of a measure on a type β are stated for an arbitrary Coq type, and not just for the base types coming from the Rml interpretation.

3.2.3

Derived properties

From this definition, we can deduce further properties, such as:
• µ(fun x ⇒ 0) == 0,
• µ(1 − f) == µ(I) − µ(f),
• ∀f g, µ(f + g) ≤ µ(f) + µ(g) (even when there is an overflow),
• ∀f g, µ(f) & µ(g) ≤ µ(f & g).

3.2.4

Representation for Rml terms

We easily check that the monadic operators unit and bind introduced in section 2.5 satisfy the stability properties of measures given in section 3.2.1. This is also the case for the primitive random constructions introduced in section 2.5.2, [random] and [flip], and for the choice operator $P \overset{p}{+} Q$. With the help of these operators, we can represent our Rml terms. For example, following our general monadic translation scheme, one can define a conditional operation Mif of type Mbool → Mβ → Mβ → Mβ:
Mif µb µ1 µ2 ≡ bind µb (fun b ⇒ if b then µ1 else µ2).
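The conditional operator Mif can be sketched in OCaml on a functional representation of measures (an assumption of this sketch). Here mif is our rendering of Mif, and coin p is a hypothetical biased coin used only to build examples; with p = 1/2 it behaves like [flip].

```ocaml
type 'a m = ('a -> float) -> float

let unit x : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)

(* Mif mu_b mu1 mu2 = bind mu_b (fun b -> if b then mu1 else mu2) *)
let mif (mub : bool m) (mu1 : 'a m) (mu2 : 'a m) : 'a m =
  bind mub (fun b -> if b then mu1 else mu2)

(* Hypothetical biased coin: true with probability p. *)
let coin p : bool m = fun f -> (p *. f true) +. ((1. -. p) *. f false)

(* Interpretation of [if flip () then 1 else 2] *)
let prog : int m = mif (coin 0.5) (unit 1) (unit 2)
```

For instance, prog applied to the indicator of 1 gives the probability 1/2, and mif (coin 0.25) (unit 1) (unit 2) applied to float_of_int gives the expected value 1.75.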

We use this operator for interpreting conditional programs:
[if b then e1 else e2] ≡ Mif [b] [e1] [e2]

3.2.5

Properties

We prove the monotonicity of the bind operation. Assuming µ1, µ2 : Mα and M1, M2 : α → Mβ:
$$\frac{\mu_1 \le \mu_2 \qquad M_1 \le M_2}{\mathrm{bind}\ \mu_1\ M_1 \le \mathrm{bind}\ \mu_2\ M_2}$$

3.3

Managing recursive definitions

As expected, the difficult part is the interpretation of general fix-points. We distinguish two cases: the case where termination is total, as for primality testing, in which we can use the fix-point constructions of Coq to interpret the recursively defined distribution; and the general case, as for the random walk, in which we use a limit construction.

3.3.1

Total recursive functions

We assume the function f is recursively defined in Rml and has type β1 → · · · → βn → β:

    let rec f x1 . . . xn = e

A natural idea for interpreting f in Coq as a function [f] of type β1 → · · · → βn → Mβ defining a measure would be to use the same recursive definition in Coq:

    let rec [f] x1 . . . xn = [e]

However, this is not always possible in Coq. The prover accepts a recursive definition for f when there is an argument xi of type βi, with βi an inductive type, such that in all recursive calls (f a1 . . . an) in the body e, ai is a value structurally smaller than xi. If the definition of f in Rml satisfies this criterion (for one of its arguments) and if the structurally smaller elements ai do not contain randomized constructions, then the same holds for the recursive calls to [f] in [e], and the recursive definition of [f] in Coq is valid. The function prime test in section 2.6 gives an example of this case: it is a structural recursion on the variable n. Another important case of recursive definitions in Coq is the case of well-founded recursive definitions. We assume given a relation ≺ on one of the arguments xi of type βi which is proved to be well-founded, and such that in all recursive calls (f a1 . . . an) in the body e, ai is a non-randomized construction and ai ≺ xi is provable. In that case, the Coq definition of [f] using well-founded recursion is also valid.

3.3.2

Limit of distributions

In order to interpret recursive functions whose recursive calls are not obviously terminating, unlike in the previous cases, we need to take limits of sequences of distributions. As mentioned in section 3.1.1, there is an ω-cpo structure on the functional type $M\beta = (\beta \to [0,1]) \xrightarrow{m} [0,1]$; it is not difficult to show that the least upper bound operation preserves the measure stability properties, so the set distr β is also an ω-cpo.

3.3.3

Fix-points

For the sake of clarity, this explanation is restricted to unary recursive definitions; the n-ary case is handled similarly. Suppose we want to define a function which satisfies the equation

    let rec f x = e

where f takes an argument of type α and returns a random value of type β, so that f has type α → β and [f] has type α → Mβ. We introduce F of type (α → Mβ) → α → Mβ defined by (fun [f] ⇒ fun x ⇒ [e]). We assume F to be monotonic: h ≤ g ⇒ F h ≤ F g. Using the ω-cpo structure on α → Mβ, we construct the fix-point fix F of type α → Mβ; this function is our interpretation of f. As mentioned in section 3.1.1, the inequality fix F x ≤ F (fix F) x holds, and the equality is only provable when F is continuous. We have proven lemmas stating that the bind operation, seen as a monotonic function of type $\mathit{distr}\,A \xrightarrow{m} (A \to \mathit{distr}\,B) \xrightarrow{m} \mathit{distr}\,B$, is continuous. We have also proven that the fix-point operation, seen as a monotonic function from $D \xrightarrow{c} D$ to D, is continuous, where $D \xrightarrow{c} D$ denotes the set of continuous functions from D to D. We can deduce (as a meta-theorem that we did not formalize) that functionals generated from Rml expressions satisfy the continuity hypothesis.

To summarize this section, when a recursive function is introduced in Rml using the declaration

    let rec f x = e

we interpret it as a function [f] defined in our meta-language by

    let rec [f] x = fix (fun [f] ⇒ fun x ⇒ [e]) x

We will explain in the next section how to prove properties of such programs.
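The fix-point interpretation can be approximated by its Kleene approximants F^n ⊥, where ⊥ is the null measure. The following OCaml sketch represents measures as functionals of type ('a -> float) -> float (an assumption), defines a hypothetical approx function computing F^n ⊥, and applies it to the functional of walk from section 2.6.2. Unlike the direct evaluation of walk_fun, each approximant terminates, and the approximants increase towards fix F.

```ocaml
type 'a m = ('a -> float) -> float

let unit x : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)
let flip_fun : bool m = fun f -> (0.5 *. f true) +. (0.5 *. f false)

(* Bottom element of alpha -> M beta: the null measure at every input. *)
let bottom : 'a -> 'b m = fun _ _ -> 0.

(* n-th Kleene approximant F^n bottom of fix F; fix F itself is the lub
   over all n, which is a limit and is not computed here. *)
let rec approx (big_f : ('a -> 'b m) -> 'a -> 'b m) (n : int) : 'a -> 'b m =
  if n = 0 then bottom else big_f (approx big_f (n - 1))

(* fun [walk] -> fun x -> [if flip () then x else walk (x + 1)] *)
let walk_functional (w : int -> int m) : int -> int m =
 fun x -> bind flip_fun (fun b -> if b then unit x else w (x + 1))

let () =
  List.iter
    (fun n ->
      Printf.printf "n = %d: total mass %g\n" n
        (approx walk_functional n 1 (fun _ -> 1.)))
    [ 0; 1; 2; 10 ]
```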

4

Derived rules for reasoning on programs

As far as fix-points are concerned, well-founded recursive definitions are dealt with as usual in Coq and need no further development in this article. In this section, we develop an extended axiomatic semantics for Rml programs (section 4.1), paying particular attention to general recursive definitions. The real novelty when considering a probabilistic program e is that e may not terminate on every initial state, but only terminate almost surely, which is a weaker property. From the operational point of view, this property expresses that e terminates with probability one, even though some executions may run forever. This is developed further in section 4.4.

4.1

Extending Kozen’s minoring derivation rules

For reasoning about programs, it is convenient to use an axiomatic semantics that provides rules by induction on the structure of the program, stating as usual how some post-condition is satisfied after execution, provided some precondition holds. In the context of probabilistic programs, we are interested (see also (Kozen, 1983)) in deriving information on the probability for a certain property to hold. Given e : β, its monadic interpretation [e] : Mβ is meant to represent a measure on β, which computes, for a function f : β → [0, 1], its expectation [e]f ∈ [0, 1]. (Usually f will be the characteristic function $I_P$ of some predicate P of type β → bool, in which case $[e]I_P$ computes the probability of the property P.) The expression [e]f denotes the exact expectation, while in general it is easier to reason on an approximation of this value, given by an interval of possible values. Obviously, 0 ≤ [e]f ≤ 1 is the weakest bound we can get for this expectation. Whenever [e]I = 1, [e] is a probability, which also means that e terminates almost surely. Conversely, [e]I = 0 means that e diverges almost surely. Besides these particular cases, we expect to derive framings a ≤ [e]f ≤ b, where a ≤ b ∈ [0, 1], that is to say [e]f ∈ [a, b]. Therefore, our preconditions are intervals I ⊆ [0, 1]. Post-conditions are similar, but they are expected to depend on the value returned by the computation of e, since we are dealing with functional programs. Thus, post-conditions are taken to be interval-valued functions F such that ∀x : A, F x ⊆ [0, 1]. Consequently, we provide rules for deriving judgements of the form [e]F ⊆ I, which extend Kozen’s k ≤ [e]f rules (where k ∈ [0, 1], e is an expression of type β and f is a function of type β → [0, 1]) in a consistent way: the minoration k ≤ [e]f is rewritten as k ≤ [e]f ∧ [e]I ≤ 1, owing to the fact that the interpretation [e] is monotonic.

Before going through the details, let us notice that this presentation could have been set in the usual Scott domains framework (Scott, 1972), where the set I of intervals included in [0, 1] is turned into an ω-cpo, with the converse of inclusion as ordering, [0, 1] as bottom element and intersection as the least upper bound operation. In fact, if we do not restrict ourselves to the unit interval, this is Scott’s Interval Domain, which is the interpretation of the abstract data type R in his model for functional programming. We do not need the full presentation for our purpose, except for two important points. First, the maximal elements of the Interval Domain are the singleton sets {r} ≡ [r, r], where r ∈ R. In our framework, maximal elements are the same, restricted to r ∈ [0, 1], and they correspond to equality proofs. In other words, a maximal interval represents the best information we can derive for some probability, while [0, 1] represents the worst, useless information. Secondly, we have to cope with recursive definitions, for which we need monotonic interval sequences $(I_n)_n$ such that $I_{n+1} \subseteq I_n$ for all n. Then the least upper bound $\bigcap_n I_n$ is well defined. This is sufficient in this setting.

4.2

Definition on intervals

An interval I is given by its lower bound low I and its upper bound up I such that 0 ≤ low I ≤ up I ≤ 1; we write it [low I, up I] and use the notation {r} for the singleton interval [r, r]. We write I for the set of intervals. We have the expected definitions of membership and inclusion:
• x ∈ [a, b] is defined as a ≤ x ≤ b
• [a, b] ⊆ [c, d] is defined as c ≤ a ∧ b ≤ d.
Operations on intervals can be lifted to interval functions. For an interval function F, we write low F for the function fun x ⇒ low (F x) and similarly up F for the function fun x ⇒ up (F x). The application of a distribution e on A to an interval function F on A is written [e]F; it is the interval [[e](low F), [e](up F)]. Given two functions f and g of type β → [0, 1], we write [f, g] for the interval function fun x ⇒ [f x, g x] and {f} for the singleton function [f, f]. Because distributions are monotonic, it is easy to show that, for a function f in β → [0, 1], if f x belongs to the interval F x for all x, then [e]f ∈ [e]F. We also have [e]{f} = {[e]f}, so nothing is lost when considering intervals. We also extend the operations of addition and multiplication to intervals:
• [a1, b1] + [a2, b2] = [a1 + a2, b1 + b2]
• k × [a, b] = [k × a, k × b]
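The interval operations can be sketched in OCaml as follows. The record type interval, the constructor mk (which checks 0 ≤ low ≤ up ≤ 1) and the function apply, computing [e]F = [[e](low F), [e](up F)], are hypothetical names; addition is taken to be the bounded addition of U, and [flip] is represented as a float functional (an assumption of this sketch).

```ocaml
type interval = { low : float; up : float }

(* Build [low, up], checking 0 <= low <= up <= 1. *)
let mk low up =
  assert (0. <= low && low <= up && up <= 1.);
  { low; up }

let singleton r = mk r r
let mem x i = i.low <= x && x <= i.up
let subset i j = j.low <= i.low && i.up <= j.up   (* i included in j *)

(* Interval addition (with the bounded addition of U) and scaling. *)
let add i j = { low = min (i.low +. j.low) 1.; up = min (i.up +. j.up) 1. }
let scale k i = { low = k *. i.low; up = k *. i.up }

(* [mu] applied to an interval function: [mu (low F), mu (up F)]. *)
let apply (mu : ('a -> float) -> float) (big_f : 'a -> interval) : interval =
  mk (mu (fun x -> (big_f x).low)) (mu (fun x -> (big_f x).up))

(* [flip ()] as a measure on booleans. *)
let flip_fun : (bool -> float) -> float = fun f -> (0.5 *. f true) +. (0.5 *. f false)
```

For instance, with G true = [1/2, 1] and G false = {0}, apply flip_fun G is [1/4, 1/2], and any f with f x ∈ G x for all x has its expectation in that interval.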

4.3

Basic (non recursive) rules

From now on, I, J, K ⊆ [0, 1] stand for intervals and F, G, H for interval-valued functions. We derive proofs of [e]F ⊆ I according to the following cases. Representing intervals in Coq requires no additional effort. However, the interpretation of Rml terms now needs to be reconsidered as acting on interval-valued functions instead of simple functions. This is straightforward:
• [v]G = G v when v is a variable, a constant or a non-randomized term
• [let x = a in e]G = [a] (fun x ⇒ [e]G)
• $[\mathtt{if}\ e_0\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2]G = [e_0]I_{\cdot=\mathit{true}} \times [e_1]G + [e_0]I_{\cdot=\mathit{false}} \times [e_2]G$
The functions [random] and [flip] associated with the primitive randomized constructions also operate on interval functions as they do on real functions:
• $[\mathit{random}\ n]G = \sum_{i=0}^{n} \frac{1}{1+n}\,(G\,i)$
• $[\mathit{flip}\ ()]G = \frac{1}{2}\,(G\,\mathit{true}) + \frac{1}{2}\,(G\,\mathit{false})$

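From these equalities, we can derive the following rules:
$$\frac{G_2 \subseteq G_1 \qquad [e]G_1 \subseteq I_1 \qquad I_1 \subseteq I_2}{[e]G_2 \subseteq I_2}$$
$$\frac{[a]F \subseteq I \qquad \forall x,\ [e]G \subseteq F\,x}{[\mathtt{let}\ x = a\ \mathtt{in}\ e]G \subseteq I}$$
$$\frac{[e_1]G \subseteq I_1 \qquad [e_2]G \subseteq I_2}{[\mathtt{if}\ e_0\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2]G \subseteq [e_0]I_{\cdot=\mathit{true}} \times I_1 + [e_0]I_{\cdot=\mathit{false}} \times I_2}$$
We can derive in our formalism useful schemes which generalize reasoning on deterministic programs. For instance, if we have established that an expression a satisfies a predicate P with probability 1, then it is possible to reason subsequently exactly as if P were true for the result of the computation of a. This is stated in the following derivable rule:
$$\frac{[a]I_P = 1 \qquad \forall x,\ P\,x \Rightarrow [e]F \subseteq I}{[\mathtt{let}\ x = a\ \mathtt{in}\ e]F \subseteq I}$$

4.4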

Rules for fix-points

In this part, we use the same notations as in section 3.3.3. We want to prove properties of a recursive definition in Rml, let rec f x = e, with x of type α and e of type β. As in 3.3.3, we introduce a monotonic operator $F : (\alpha \to M\beta) \xrightarrow{m} \alpha \to M\beta$ such that [f] = fix F. We also introduce the notation f · G when f has type α → Mβ and G has type α → β → I. The expression f · G denotes the function of type α → I defined by (f · G) x = [f x](G x), the value of the measure f x on the function G x. We allow ourselves to use the same notation when g is a real-valued function of type α → β → [0, 1], in which case f · g is a function of type α → [0, 1]. The function g plays the role of an input-output relation: given a binary relation R on α and β, we can take g of type α → β → [0, 1] to be the characteristic function of R; in that case, f · g is the function which associates to x the probability of R(x, f x).

4.4.1

Basic estimation

We now justify the rule for estimating fix-points, which agrees with and extends the ideas presented by Jones (1989). Let us first give the general idea. The Rml definition let rec f x = e can also be considered as the fix-point of some functional F such that [f] x = fix F x. Given the interval-valued function G, we want to estimate [f x]G, that is, to find I such that [f x]G ⊆ I. The widest interval I = [0, 1] is a trivial solution. Now, the fix-point is the result of the iteration of the functional F, so if it is possible to shrink the interval at each step, we can deduce an approximation for f. This leads to the following provable rule, assuming a given monotonic sequence $(I_n)_n$ of interval-valued functions on type α such that ∀x, 0 ∈ $I_0\,x$, and $I_{n+1} \subseteq I_n$ for n ≥ 0:
$$\frac{\forall n,\ \forall h : \alpha \to M\beta,\ (h \cdot G \subseteq I_n) \Rightarrow (F\,h) \cdot G \subseteq I_{n+1}}{\mathrm{fix}\,F \cdot G \subseteq \bigcap_n I_n}$$
The proof is a direct consequence of the following equalities, with $G = [g_1, g_2]$ and $I_n = [p_n, q_n]$, where $(p_n)_n$ is an increasing sequence starting from 0 and $(q_n)_n$ is a decreasing sequence:
$$\mathrm{fix}\,F \cdot [g_1, g_2] = [\mathrm{lub}(F^n\,0) \cdot g_1,\ \mathrm{lub}(F^n\,0) \cdot g_2] \subseteq [\mathrm{lub}\,(p_n),\ \mathrm{glb}\,(q_n)]$$
The rule above estimates an upper bound of the fix-point using a decreasing sequence; it is sometimes more convenient to use increasing sequences for both the lower and the upper bounds of the intervals. In this case, assuming $(p_n)_n$ and $(q_n)_n$ are both increasing sequences of functions of type α → [0, 1], with the proviso that $p_0\,x = 0$ for all x, we can prove the following result:
$$\frac{\forall n,\ \forall h : \alpha \to M\beta,\ (h \cdot G \subseteq [p_n, q_n]) \Rightarrow (F\,h) \cdot G \subseteq [p_{n+1}, q_{n+1}]}{\mathrm{fix}\,F \cdot G \subseteq [\mathrm{lub}\,(p_n),\ \mathrm{lub}\,(q_n)]}$$
No continuity condition on F is required to validate the above rules. As mentioned in section 3.3.3, continuity is only necessary to ensure that fix F is indeed a fix-point of F.

4.4.2

Advanced schemes

The previous scheme gives the general idea. However, reasoning with fix-points is always tricky, and it is convenient to have more advanced schemes at hand. While one still needs to find an appropriate invariant, there are systematic ways to find it, depending on the form of F. In this section, we make intensive use of the notations introduced at the beginning of section 4.4.

In this part, we took inspiration from the loop rules of pGCL introduced by Morgan (as described in McIver and Morgan (2005)) and propose a systematic generalization to the case of recursive functions. Let us make some preliminary observations. We start from a recursive definition let rec f x = e on type α → β. Assuming f is deterministic and we want to prove that ∀x, P (f x), a natural approach is to find an inductive argument showing that the body e of the function f satisfies P, assuming the recursive calls in e do. More formally, if the definition of f corresponds to the functional F, we can try to prove, for an arbitrary function h, that ∀x, P (h x) implies ∀x, P (F h x). We use a similar approach for randomized programs. Instead of the property P, we start from a function g : α → β → [0, 1] to be estimated, and we try to relate the estimation of the body of the recursive function (F [f] · g) to the estimation of the recursive calls by using properties of F. If we succeed, it means that we have found a functional $F_g : (\alpha \to U) \xrightarrow{m} (\alpha \to U)$ such that the following diagram commutes for an arbitrary h of type α → Mβ, where $\omega_g$ denotes the map h ↦ h · g:
$$\begin{array}{ccc} h & \xrightarrow{\;\omega_g\;} & h \cdot g \\ \Big\downarrow{\scriptstyle F} & & \Big\downarrow{\scriptstyle F_g} \\ F\,h & \xrightarrow{\;\omega_g\;} & (F\,h) \cdot g \end{array}$$

Whenever $F_g$ exists, we get for all n > 0 the relation
$$\omega_g \circ F^n = F_g^{\,n} \circ \omega_g$$
which expresses a simulation relation between the fix-point obtained from the source program by iterating the functional F, observed through g, and the fix-point which can be computed by iterating the functional $F_g$. Therefore, the value [f] · g can also be reached from the sequence of iterations $F_g^{\,n}$. In fact, using 0 · g = 0 in the last step:
$$[f] \cdot g = \mathrm{fix}\,F \cdot g = \mathrm{lub}\,(F^n\,0) \cdot g = \mathrm{lub}\,(F^n\,0 \cdot g) = \mathrm{lub}\,(F_g^{\,n}\,(0 \cdot g)) = \mathrm{fix}\,F_g$$
We now give the general definition.

Definition 4 Given a functional F of type (α → Mβ) → (α → Mβ) and a function g of type α → β → [0, 1], we say that a functional $F_g : (\alpha \to [0,1]) \xrightarrow{m} (\alpha \to [0,1])$ commutes with F for the expectation g when the following property holds:
$$\forall h,\ (F\,h) \cdot g = F_g\,(h \cdot g) \tag{1}$$
We say that $F_g$ weakly commutes with F when
$$\forall h,\ (F\,h) \cdot g \le F_g\,(h \cdot g) \tag{2}$$

An important consequence of the existence of $F_g$ is that the expectation for the fix-point can be related to the fix-point of $F_g$, as stated in the following proposition.
Proposition 5 Given a real-valued function g of type α → β → [0, 1] and a monotonic operator $F_g : (\alpha \to [0,1]) \xrightarrow{m} (\alpha \to [0,1])$:
• if $F_g$ weakly commutes with F for g, then fix F · g ≤ fix $F_g$;
• if $F_g$ commutes with F for g, then fix F · g = fix $F_g$.
Now we can use the fact that fix $F_g$ is below every pre-fix-point of $F_g$: if we can find a real-valued function φ of type α → [0, 1] such that $F_g\,\varphi \le \varphi$, then fix $F_g$ ≤ φ. Combining this with the previous proposition, we obtain the following result.
Proposition 6 Given a real function g of type α → β → [0, 1] such that there exists a monotonic operator $F_g$ which weakly commutes with F for g, if $F_g\,\varphi \le \varphi$ then fix F · g ≤ φ.
In most cases, we also want a lower bound for fix F · g. For that, we reverse this result and consider how the distribution fix F operates on 1 − g.
Proposition 7 Given a real function g of type α → β → [0, 1] such that there exists a monotonic operator $F_{1-g}$ which weakly commutes with F for 1 − g, if $F_{1-g}(1 - \varphi) \le 1 - \varphi$ then φ & (fix F · I) ≤ fix F · g.
The function fix F · I associates to each x the probability that the recursive function terminates on x.

PROOF. The value x & y is defined in our formalism as 1 − ((1 − x) + (1 − y)) using our bounded addition, and corresponds to the real max(0, x + y − 1). In particular x & 1 = x, so for any function f, f & I = f. The proof uses the fact that for any distribution µ of type Mβ, we have (1 − µ(1 − h1)) & µ(h2) ≤ µ(h1 & h2). Applying the previous proposition to 1 − g and 1 − φ, we have fix F · (1 − g) ≤ 1 − φ, so φ ≤ 1 − fix F · (1 − g), and then:
$$\varphi \,\&\, (\mathrm{fix}\,F \cdot I) \le (1 - \mathrm{fix}\,F \cdot (1-g)) \,\&\, (\mathrm{fix}\,F \cdot I) \le \mathrm{fix}\,F \cdot (g \,\&\, I) = \mathrm{fix}\,F \cdot g$$

There is a special case where we get φ itself as a lower bound: when φ ≤ fix F · I, which can be seen as a generalisation of the fact that our invariant estimation φ implies termination of the fix-point. In order to obtain this result, we need (fix F · I) − φ to be a pre-fix-point of $F_{1-g}$.
Proposition 8 Let g be a real function of type α → β → [0, 1] such that there exists a monotonic operator $F_{1-g}$ which weakly commutes with F for 1 − g. If the properties $F_{1-g}((\mathrm{fix}\,F \cdot I) - \varphi) \le (\mathrm{fix}\,F \cdot I) - \varphi$ and φ ≤ fix F · I hold, then φ ≤ fix F · g.

PROOF. This result is obtained using the previous proposition with the invariant φ′ = φ + (1 − fix F · I). We have 1 − φ′ = (fix F · I) − φ, so $F_{1-g}(1 - \varphi') \le 1 - \varphi'$ by hypothesis; consequently φ′ & (fix F · I) ≤ fix F · g. The final result comes from properties of + and & on [0, 1], using φ ≤ fix F · I:
$$\varphi' \,\&\, (\mathrm{fix}\,F \cdot I) = (\varphi + (1 - \mathrm{fix}\,F \cdot I)) \,\&\, (\mathrm{fix}\,F \cdot I) = \varphi$$
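The commutation property (1) and Proposition 5 can be checked numerically on the walk function of section 2.6.2. The following OCaml sketch is ours: it assumes measures are modelled as float functionals and that g depends only on the output value; the names big_f, big_f_g and omega are hypothetical renderings of F, $F_g$ and $\omega_g$. It compares $\omega_g(F^n\,0)$ with $F_g^{\,n}\,0$ for finitely many n, since the fix-points themselves are limits.

```ocaml
type 'a m = ('a -> float) -> float

let unit x : 'a m = fun f -> f x
let bind (mu : 'a m) (k : 'a -> 'b m) : 'b m = fun f -> mu (fun x -> k x f)
let flip_fun : bool m = fun f -> (0.5 *. f true) +. (0.5 *. f false)

(* F for walk, acting on candidate interpretations w : int -> int m *)
let big_f (w : int -> int m) : int -> int m =
 fun x -> bind flip_fun (fun b -> if b then unit x else w (x + 1))

(* F_g for walk, acting directly on expectations h : int -> float *)
let big_f_g (g : int -> float) (h : int -> float) : int -> float =
 fun x -> (0.5 *. g x) +. (0.5 *. h (x + 1))

(* omega_g w = w . g, i.e. fun x -> [w x] g *)
let omega (g : int -> float) (w : int -> int m) : int -> float = fun x -> w x g

(* n-fold iteration of an operator, starting from z *)
let rec iter f n z = if n = 0 then z else f (iter f (n - 1) z)

(* Both sides of the commutation, started from the bottom elements. *)
let lhs g n = omega g (iter big_f n (fun _ _ -> 0.))
let rhs g n = iter (big_f_g g) n (fun _ -> 0.)
```

With g the constant function 1, rhs g n computes the approximants 1 − 1/2^n of fix $F_I$ that appear in the termination proof of section 5.1.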

4.4.3

Application to loops

We can define a loop function recursively in Rml. We assume given a type S of states, a boolean condition cond of type S → bool and a body body of type S → S:

    let rec loop s = if cond s then let s’ = body s in loop s’ else s

The interpretation [cond] has type S → Mbool and [body] has type S → MS. We introduce the terms $\mathit{ctrue}\ s = [\mathit{cond}\ s]I_{\cdot=\mathit{true}}$ and $\mathit{cfalse}\ s = [\mathit{cond}\ s]I_{\cdot=\mathit{false}}$. We want to measure a function g of type S → [0, 1] on the output state of loop, which does not depend on the input state; we again write f · g in place of the more verbose f · fun s ⇒ g. We write F for the functional associated with loop. We have:
$$(F\,f) \cdot g = \mathrm{fun}\ s \Rightarrow (\mathit{ctrue}\ s) \times [\mathit{body}\ s](f \cdot g) + (\mathit{cfalse}\ s) \times (g\ s)$$
so the functional $F_g$ which commutes with F for g can be defined as follows:
$$F_g\,h = \mathrm{fun}\ s \Rightarrow (\mathit{ctrue}\ s) \times [\mathit{body}\ s]\,h + (\mathit{cfalse}\ s) \times (g\ s)$$

It is easy to check the property $F_{1-g}(1 - h) \le 1 - (F_g\,h)$, so the condition $\varphi \le F_g\,\varphi$ is sufficient to ensure $F_{1-g}(1 - \varphi) \le 1 - \varphi$. We can then derive the following theorem.
Proposition 9 Given g, φ and ψ of type S → [0, 1], assuming
∀s, φ s ≤ (ctrue s) × [body s]φ + (cfalse s) × (g s) and ∀s, (ctrue s) × [body s]ψ + (cfalse s) × (g s) ≤ ψ s,
we can deduce φ & [loop] · I ≤ [loop] · g ≤ ψ.
In case cond is a non-randomized construction, let C s be the property cond s = true. The condition φ s ≤ (ctrue s) × [body s]φ + (cfalse s) × (g s) becomes C s ⇒ φ s ≤ [body s]φ and ¬C s ⇒ φ s ≤ g s. This is a generalization of the loop rule of axiomatic semantics, φ being the invariant which should be preserved by the body (when the condition is true) and should establish the post-condition at the end (when the condition is false). We consequently have the following rule, which corresponds to the total loop correctness rule in McIver and Morgan (2005):
$$\frac{\forall s,\ C\,s \Rightarrow \varphi\,s \le [\mathit{body}\ s]\,\varphi}{\forall s,\ \varphi\,s \,\&\, [\mathit{loop}\ s]I \le [\mathit{loop}\ s](\varphi \,\&\, I_{\neg C})}$$
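Proposition 9 can be exercised on a small, entirely hypothetical loop over the states 0..5, whose condition is s > 0 and whose body decrements the state with probability 1/2 and leaves it unchanged otherwise. The OCaml sketch below models [0, 1] by floats, defines $F_g$ for this loop, checks both premises of Proposition 9 for the constant invariant φ = ψ = 1 with g the indicator of the final state 0, and tabulates the approximants $F_g^{\,n}\,0$ of [loop] · g.

```ocaml
(* Hypothetical loop: states 0..max_state, cond s = (s > 0); the body
   decrements s with probability 1/2. cond is non-randomized, so ctrue and
   cfalse are 0 or 1 and F_g reduces to a case split on cond. *)
let max_state = 5
let states = List.init (max_state + 1) (fun s -> s)
let cond s = s > 0

(* [body s] h: expectation of h after one execution of the body from s *)
let body s (h : int -> float) = (0.5 *. h (s - 1)) +. (0.5 *. h s)

(* F_g h = fun s -> ctrue s * [body s] h + cfalse s * g s *)
let f_g (g : int -> float) (h : int -> float) : int -> float =
 fun s -> if cond s then body s h else g s

(* F_g^n 0, tabulated on 0..max_state to avoid exponential recomputation *)
let approx g n =
  let tab = ref (Array.make (max_state + 1) 0.) in
  for _i = 1 to n do
    let h = !tab in
    tab := Array.init (max_state + 1) (fun s -> f_g g (fun t -> h.(t)) s)
  done;
  !tab

(* Lower-bound premise of Proposition 9: phi <= F_g phi on every state *)
let lower_premise g phi = List.for_all (fun s -> phi s <= f_g g phi s) states

(* Upper-bound premise of Proposition 9: F_g psi <= psi on every state *)
let upper_premise g psi = List.for_all (fun s -> f_g g psi s <= psi s) states
```

Here Proposition 9 yields [loop] · I ≤ [loop] · g ≤ 1, and the tabulated approximants approach 1 from below as n grows.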

5

Applications

We apply our approach for proving properties of simple randomized programs.

5.1

Probabilistic termination

We return to our example of section 2.6.2, a random walk which illustrates probabilistic termination:

    let rec walk x = if flip () then x else walk (x+1)

We show that this program terminates with probability one. For that, it is enough to prove that ∀x, [walk x]I = 1.

The functional F to be considered is:
$$F = \mathrm{fun}\ [\mathit{walk}] \Rightarrow \mathrm{fun}\ x \Rightarrow [\mathtt{if}\ \mathit{flip}()\ \mathtt{then}\ x\ \mathtt{else}\ \mathit{walk}\ (x+1)]$$
When $w : \mathit{nat} \to M\mathit{nat}$, $x : \mathit{nat}$ and $g : \mathit{nat} \to [0,1]$ is the function to be measured, we have:
$$(F\,w \cdot g)\,x = \tfrac{1}{2}\,(g\,x) + \tfrac{1}{2}\,(w \cdot g)\,(x+1)$$
We can introduce $F_g : (\mathit{nat} \to [0,1]) \xrightarrow{m} (\mathit{nat} \to [0,1])$ such that
$$F_g\,h\,x = \tfrac{1}{2}\,g\,x + \tfrac{1}{2}\,h\,(x+1)$$
and check the commutation property between $F_g$ and F. In case g is the function I, we get the functional
$$F_I\,h\,x = \tfrac{1}{2} + \tfrac{1}{2}\,h\,(x+1)$$
We know by Proposition 5 that $[\mathit{walk}\ x]I = \mathrm{fix}\,F_I\,x$, so what remains is to compute $\mathrm{fix}\,F_I\,x$. This real is the least upper bound of a sequence $(p_i)_i$ such that $p_0 = 0$ and $p_{i+1} = \frac{1}{2} + \frac{1}{2}\,p_i$. It is easy to show that $p_n = 1 - \frac{1}{2^n}$ and that the least upper bound of $(p_i)_i$ is 1, so $\mathrm{fix}\,F_I\,x = \mathrm{lub}(p_n)_n = 1$.

5.2

Parametrized termination

This example is taken from Ycart (2002), adapted here to fit our restriction to discrete random distributions. It can be seen as a generalisation of walk where the probability of stopping or continuing at each point x is given by an arbitrary function K. We assume given a non-randomized function K of type nat → nat and an integer N. We also write Y x for the element of [0, 1] defined as (K x)/(1 + N).

The function we want to study is defined by the following Rml program:

let rec ω x = if random N < K x then x else ω (x + 1)

We have $[\omega] = \mathrm{fix}\ F$, where $F\,f\,x \equiv [\mathbf{if}\ \mathit{random}\ N < K\,x\ \mathbf{then}\ x\ \mathbf{else}\ f\,(x+1)]$. Let us start with some informal observations. Given $\theta : \mathit{nat} \to [0,1]$, assume we want to approximate the value of $[\omega\ x]\theta \in [0,1]$. From a mathematical point of view, this is a summation. Let us have a naive look at it:

$$\int \theta(y)\,d[\omega\ x](y) = (Y\,x)\,\theta\,x + (1 - Y\,x)\int \theta(y)\,d[\omega\ (x+1)](y)$$

From section 2.5.5, we know that our monadic interpretation expresses the same idea in a more formal setting.

5.2.1

Putting advanced schemes to work

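$$\begin{aligned}
[\omega\ x]\theta &= (Y\,x)\,\theta\,x + (1-Y\,x)\,[\omega\ (x+1)]\theta\\
&= (Y\,x)\,\theta\,x + (1-Y\,x)\,(Y\,(x+1))\,\theta\,(x+1) + (1-Y\,x)\,(1-Y\,(x+1))\,[\omega\ (x+2)]\theta\\
&= \cdots\\
&= (Y\,x)\times\theta\,x + \cdots + \prod_{k=x}^{x+n}(1-Y\,k)\,[\omega\ (x+1+n)]\theta
\end{aligned}$$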
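The $\cdots$ in this unfolding stands for the sum $\sum_{j=0}^{n} \big(\prod_{k=x}^{x+j-1}(1-Y\,k)\big)\,(Y\,(x+j))\,\theta\,(x+j)$. A small Python check with exact rationals illustrates this; everything in it is a hypothetical choice of ours: `N`, `K` (chosen so that $K\,x \le N+1$), `theta`, and `tail`, an arbitrary function standing for the still-unknown $[\omega\ y]\theta$. It compares applying the one-step equation $n+1$ times with the expanded sum plus the product term.

```python
from fractions import Fraction
from math import prod

N = 4                                  # hypothetical bound for random N
K = lambda x: x % 6                    # hypothetical K : nat -> nat, with K x <= N + 1
Y = lambda x: Fraction(K(x), N + 1)
theta = lambda y: Fraction(1, y + 2)   # hypothetical theta : nat -> [0,1]
tail = lambda y: Fraction(y % 3, 3)    # stands for the unknown [omega y]theta

def unfold(n, x):
    """Apply [omega x]theta = Y x * theta x + (1 - Y x) * [omega (x+1)]theta, n+1 times."""
    rest = tail(x + 1) if n == 0 else unfold(n - 1, x + 1)
    return Y(x) * theta(x) + (1 - Y(x)) * rest

def R(n, x):
    """R_n(x) = prod_{k=x}^{x+n-1} (1 - Y k); the empty product R_0(x) is 1."""
    return prod((1 - Y(k) for k in range(x, x + n)), start=Fraction(1))

def expanded(n, x):
    """Intermediate terms plus the product term in front of [omega (x+1+n)]theta."""
    terms = sum(R(j, x) * Y(x + j) * theta(x + j) for j in range(n + 1))
    return terms + R(n + 1, x) * tail(x + 1 + n)

for x in range(4):
    for n in range(8):
        assert unfold(n, x) == expanded(n, x)
```

The product in front of the remaining call is $R_{n+1}(x)$, which is why the behaviour of these products governs termination.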

We observe that the potential source of divergence depends on the behaviour of the infinite product $R_\infty(x)$, limit of the sequence $R_n(x) \equiv \prod_{k=x}^{x+n-1}(1-Y\,k)$. Let us make this observation more formal. Considering the functional $F$ which defines the fixpoint, we get more precisely:

$$[F\,f\,x]\theta = (Y\,x)\,\theta\,x + (1-Y\,x)\,[f\,(x+1)]\theta \qquad (3)$$
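Equation (3) can also be observed on the monadic side. This Python sketch reuses the integration-operator encoding (our own, not the Coq development) and assumes, as the definition $Y\,x = (K\,x)/(1+N)$ suggests, that `random N` is uniform on $\{0, \dots, N\}$. `random_`, `N`, `K`, the sample `f` and `theta` are hypothetical.

```python
from fractions import Fraction

N = 4
K = lambda x: x % 6                    # hypothetical K with K x <= N + 1
Y = lambda x: Fraction(K(x), N + 1)

def unit(a):
    return lambda g: g(a)

def bind(m, f):
    return lambda g: m(lambda a: f(a)(g))

def random_(n):
    """Assumed semantics of random n: uniform on {0, ..., n}."""
    return lambda g: sum(Fraction(1, n + 1) * g(i) for i in range(n + 1))

def F(f):
    """F f x = [if random N < K x then x else f (x + 1)]."""
    return lambda x: bind(random_(N), lambda r: unit(x) if r < K(x) else f(x + 1))

# Hypothetical f : nat -> M nat (Dirac at x + 3, scaled by 1/2) and theta.
f = lambda x: (lambda g: Fraction(1, 2) * g(x + 3))
theta = lambda y: Fraction(1, y + 2)

# Equation (3): [F f x]theta = Y x * theta x + (1 - Y x) * [f (x + 1)]theta.
for x in range(8):
    assert F(f)(x)(theta) == Y(x) * theta(x) + (1 - Y(x)) * f(x + 1)(theta)
```

Exactly $K\,x$ of the $N+1$ equally likely outcomes of `random N` fall below $K\,x$, which is where the weight $Y\,x$ comes from.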

This turns out to be an application of the properties presented in section 4.4.2. From equation (3), we get that the commutation property holds with the functional

$$F_\theta\,h\,x = (Y\,x)\times(\theta\,x) + (1-Y\,x)\times(h\,(x+1))$$

When $\theta$ is the unit function $I$, we obtain:

$$F_I\,h\,x = (Y\,x) + (1-Y\,x)\times(h\,(x+1))$$

Proposition 5 ensures that $[\omega\ x]I = \mathrm{fix}\ F_I\ x$, so it remains to compute this fixpoint. It is the limit of a sequence $s_n$ such that $s_0\,x = 0$ and $s_{n+1}\,x = (Y\,x) + (1-Y\,x)\times(s_n\,(x+1))$. One shows by induction on $n$ that

$$s_n\,x = \sum_{k=0}^{n-1} Y\,(x+k)\times R_k(x)$$

with $R_n(x)$ as defined above. Then, using the fact that $Y\,(x+k)\times R_k(x) = R_k(x) - R_{k+1}(x)$ (because $R_{k+1}(x) = R_k(x)\times(1-Y\,(x+k))$), we deduce $s_n\,x = R_0(x) - R_n(x) = 1 - R_n(x)$, the empty product $R_0(x)$ being 1. Consequently the limit of $s_n\,x$ is equal to $1 - \prod_{i=x}^{\infty}(1-Y\,i)$, and we deduce the expected result:

$$[\omega\ x]I = 1 - \prod_{i=x}^{\infty}(1-Y\,i)$$
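The induction and the telescoping step can be checked with exact rationals. In this Python sketch `N` and `K` are hypothetical choices of ours; `s` follows the recurrence for $s_n$ and `R` the definition of $R_n(x)$.

```python
from fractions import Fraction
from math import prod

N = 4
K = lambda x: x % 6                    # hypothetical K with K x <= N + 1
Y = lambda x: Fraction(K(x), N + 1)

def s(n, x):
    """s_0 x = 0 and s_{n+1} x = Y x + (1 - Y x) * s_n (x + 1)."""
    if n == 0:
        return Fraction(0)
    return Y(x) + (1 - Y(x)) * s(n - 1, x + 1)

def R(n, x):
    """R_n(x) = prod_{k=x}^{x+n-1} (1 - Y k), with R_0(x) = 1."""
    return prod((1 - Y(k) for k in range(x, x + n)), start=Fraction(1))

for x in range(6):
    for n in range(10):
        # Closed form obtained by induction, then its telescoped form.
        assert s(n, x) == sum(Y(x + k) * R(k, x) for k in range(n))
        assert s(n, x) == 1 - R(n, x)
```

With this particular $K$, $Y\,k = 1$ whenever $k \bmod 6 = 5$, so $R_n(x)$ drops to 0 after at most six factors and $\omega$ terminates with probability 1. If instead $K\,x = 0$ for every $x$, then every factor $1 - Y\,k$ equals 1, $R_n(x) = 1$ and $[\omega\ x]I = 0$: the test `random N < 0` never succeeds.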

We now illustrate the use of other rules for fix-points. We may be interested in showing that the function ω applied to x never outputs a value less than x. Because this property always holds, one possibility would be to use the power of the Coq type system and give a semantics which associates to x a distribution on numbers greater than or equal to x. However, if we stay in our Rml framework, we may want to prove that the probability for ω x to output a value less than x is 0, which can be rephrased as [ω x]I.