Formalization of statistical conditional independence relations using Coq/SSReflect

Jinfang Wang, Manabu Hagiwara and Mitsuharu Yamamoto
Department of Mathematics and Informatics, Graduate School of Science, Chiba University, Japan
Abstract
The cain (Wang, 2010) is an axiomatic algebraic system for manipulating probabilistic conditional independence (PCI) relations, in which both the axioms and the PCI relations take equational form. In this paper we show how to formalize the theory of the cain in Coq/SSReflect, an interactive theorem prover that mechanically checks the correctness of mathematical assertions. This formalization opens the possibility of automatically checking whether a set of PCI relations follows from another given set of PCI relations.
1 Introduction
Several axiomatic systems have been proposed for studying probabilistic conditional independence relations. The best known are the graphoid of Pearl and Paz (1987) and the separoid of Dawid (2001), both of which are based on fundamental properties satisfied by PCI. For instance, symmetry is an intrinsic property of the ternary PCI relation X ⊥⊥ Y | Z: X ⊥⊥ Y | Z holds if and only if Y ⊥⊥ X | Z.

Another axiomatic system, the cain (Wang, 2010), takes a quite different approach. The cain algebra axiomatizes the most fundamental algebraic properties of probability density (mass) functions; among other properties, the collection of probability density (mass) functions forms an Abelian group with respect to the usual product of real numbers. The cain has a great advantage over both the graphoid and the separoid in that all relations concerning PCI are expressed in equational form. To demonstrate the potential of this approach, we formalize the theory of the cain using the theorem prover Coq/SSReflect.

Coq is an interactive theorem prover based on the Calculus of Inductive Constructions. Coq allows one to express mathematical assertions and mechanically checks proofs of these assertions. The Curry-Howard isomorphism expresses a direct relationship between computer programs and mathematical proofs: propositions are types and (correct) proofs are programs. SSReflect (small scale reflection) is an extension of Coq developed in the course of formalizing the proof of the Four Color Theorem (2004). The proof of the Feit-Thompson Theorem (Feit and Thompson, 1962, 1963) has also been formalized using SSReflect (???, 2012). The Feit-Thompson Theorem states that every finite group of odd order is solvable. In SSReflect this statement can be written as follows.
Theorem Feit_Thompson (gT : finGroupType) (G : {group gT}) : odd #|G| -> solvable G.
The complete proof involved more than 170,000 lines of code, with more than 15,000 definitions and 4,200 theorems. More recently, the formalization of the Kepler Conjecture (??) has also been completed (??, 2014), and SSReflect was partially used in that formalization as well.
2 The Cain Algebra for Conditional Independence
We now give a quick review of the cain algebra (Wang, 2010). The motivation of the cain algebra is to develop a theory of PCI in a purely universal-algebraic fashion, so that all PCI relations are represented in equational form. One potential advantage of this approach is that PCI relations may be derived from other PCI relations automatically by computer algorithms.
2.1 The cainoid
Let (L, ≤) be a bounded lattice with bottom ∅. We use the symbol $\binom{x}{y}$ (read as "coin x over y") to denote the elements of the direct product L ⊗ L, with the conventions
$$\binom{x}{} = \binom{x}{\emptyset}, \qquad \binom{}{y} = \binom{\emptyset}{y}, \qquad \binom{\emptyset}{\emptyset} = 1 .$$
We call $\binom{x}{}$ the raising coin with context x, $\binom{}{y}$ the lowering coin with context y, and $\binom{x}{y}$ the mixed coin with raising context x and lowering context y. All these coins are called atom coins. A coin is a string, or concatenation, $\theta = \binom{x_1}{y_1}\cdots\binom{x_n}{y_n}$ of n atom coins; the collection of all coins is denoted by
$$C = \Bigl\{ \binom{x_1}{y_1}\cdots\binom{x_n}{y_n} \;:\; x_i, y_i \in L,\ i = 1, \dots, n,\ n \in \mathbb{N} \Bigr\}. \tag{2.1}$$
We introduce a binary dot operator · : C × C → C, which obeys the following axioms.

DEFINITION 1 (Cainoid; Wang (2010)). An algebraic structure (C, ·) is called a cainoid if for any θ, θ′, θ″ ∈ C and any x, y ∈ L the following hold:
C1: $\theta\cdot\theta' = \theta'\cdot\theta$;
C2: $(\theta\cdot\theta')\cdot\theta'' = \theta\cdot(\theta'\cdot\theta'')$;
C3: $1\cdot\theta = \theta$;
C4: $\binom{x}{}\cdot\binom{}{x} = 1$;
C5: $\binom{x}{y} = \binom{x\vee y}{}\cdot\binom{}{y}$ (where x > ∅).
In Definition 1, $\binom{x}{}$, $\binom{}{x}$ and $\binom{x}{y}$ may be regarded as algebraic abstractions of the joint probability density function (p.d.f.) f(x), its reciprocal 1/f(x), and the conditional p.d.f. f(x|y), respectively. A coin $\binom{x_1}{y_1}\cdots\binom{x_n}{y_n}$ may then be regarded as an abstraction of the product f(x_1|y_1) ⋯ f(x_n|y_n) of n conditional p.d.f.'s. Axiom C5 is an abstraction of the definition of a conditional density, f(x|y) = f(x, y) · 1/f(y).
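To illustrate how these axioms are used in calculations, the following worked derivation (our own illustration, assuming x > ∅ and y > ∅ so that C5 applies to each factor and to x ∨ y) combines C1 to C5 into a chain rule for coins. It is the abstract counterpart of f(x|y, z) f(y|z) = f(x, y|z), and identities of this kind are used in the proof of Theorem 2.2 below.

```latex
\begin{aligned}
\binom{x}{y\vee z}\binom{y}{z}
  &= \binom{x\vee y\vee z}{}\binom{}{y\vee z}\,\binom{y\vee z}{}\binom{}{z}
     && \text{(C5, twice)}\\
  &= \binom{x\vee y\vee z}{}\Bigl(\binom{y\vee z}{}\binom{}{y\vee z}\Bigr)\binom{}{z}
     && \text{(C1, C2)}\\
  &= \binom{x\vee y\vee z}{}\cdot 1\cdot\binom{}{z}
     && \text{(C4)}\\
  &= \binom{x\vee y\vee z}{}\binom{}{z}
     && \text{(C1, C3)}\\
  &= \binom{x\vee y}{z}
     && \text{(C5, since } x\vee y > \emptyset\text{)}
\end{aligned}
```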
2.2 The cain
The properties in Definition 1 are not sufficient for studying PCI relations. For instance, we clearly need to deal with the relation between marginal and joint p.d.f.'s, the former being defined from the latter through integration. The next key idea in constructing the cain algebra is to abstract those properties of p.d.f.'s that concern integration. In the following definition we use the notation θ{y} to denote a coin θ with context y.

DEFINITION 2 (x-integrability). Let x ∈ L. A coin θ is said to be x-integrable if θ has context no less than x, that is, θ = θ{y} with y ≥ x.
We have not defined the context of a coin, which is a somewhat involved concept; see Wang (2010) for details. To introduce integration, we make a further assumption on the lattice L. A bounded lattice L with top ⊤ and bottom ∅ is called a complemented lattice if for each x ∈ L there exists a complement, denoted by x̄, such that x ∨ x̄ = ⊤ and x ∧ x̄ = ∅. If L is also distributive, the complement x̄ is unique. From now on we assume that L is a complemented distributive lattice, i.e. a Boolean algebra.

DEFINITION 3 (coin marginalization). For an arbitrary x ∈ L, let D(x) be the set of all x-integrable coins. The x-marginalization is a function $\int_x : D(x) \to C$ such that for any θ{y} ∈ D(x) there is a unique coin θ′{y ∧ x̄} ∈ C with
$$\int_x \theta\{y\} = \theta'\{y\wedge\bar{x}\}. \tag{2.2}$$
Further, the function $\int_x$ satisfies the following three properties.

(i) If $\binom{y}{}$ is x-integrable, then
$$\int_x \binom{y}{} = \binom{y\wedge\bar{x}}{}. \tag{2.3}$$

(ii) Let x = x_1 ∨ x_2 with x_1 ∧ x_2 = ∅ (that is, x_1 and x_2 are relative complements w.r.t. x). Let θ = θ_1{y_1} θ_2{y_2} be x-integrable, where θ_1{y_1} is x_1-integrable and θ_2{y_2} is x_2-integrable. Further assume that x_1 ∧ y_2 = x_2 ∧ y_1 = ∅. Then
$$\int_{x_1\vee x_2} \theta_1\{y_1\}\,\theta_2\{y_2\} = \int_{x_1}\theta_1\{y_1\}\int_{x_2}\theta_2\{y_2\}. \tag{2.4}$$

(iii) For any θ ∈ C,
$$\int_{\emptyset} \theta = \theta. \tag{2.5}$$

NOTATION 2.1. To mimic the conventional notation for integration, we replace $\int_x \theta$ by $\int \theta\,dx$. With this notation, axioms (2.3)-(2.5) can be rewritten as
$$\int \binom{y}{}\,dx = \binom{y\wedge\bar{x}}{}, \tag{2.6}$$
$$\int \bigl(\theta_1\{y_1\}\,\theta_2\{y_2\}\bigr)\,d(x_1\vee x_2) = \int \theta_1\{y_1\}\,dx_1 \int \theta_2\{y_2\}\,dx_2, \tag{2.7}$$
$$\int \theta\,d\emptyset = \theta. \tag{2.8}$$
(2.6) is analogous to the definition of marginal probability density functions. (2.7) is an analogue of the following property of conventional integration:
$$\iint f(x,z)\,g(y,z)\,dx\,dy = \int f(x,z)\,dx \int g(y,z)\,dy .$$
Ordinary integration also satisfies
$$\int c\,f(x,y)\,dx = c\int f(x,y)\,dx,$$
where c is a constant independent of both x and y. (2.8) is an abstraction of this property: taking x_2 = ∅ in (2.7) and applying (2.8) gives $\int \theta_1\{y_1\}\,\theta_2\{y_2\}\,dx_1 = \bigl(\int \theta_1\{y_1\}\,dx_1\bigr)\,\theta_2\{y_2\}$ whenever x_1 ∧ y_2 = ∅.

DEFINITION 4 (cain). A cain is a cainoid further satisfying (2.6)-(2.8).
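As a worked example of (2.6), suppose x ∧ y = ∅. The raising coin $\binom{x\vee y}{}$ has context x ∨ y ≥ y, so it is y-integrable, and (2.6) together with the Boolean-algebra laws gives the cain analogue of ∫ f(x, y) dy = f(x):

```latex
\int \binom{x\vee y}{}\,dy
  = \binom{(x\vee y)\wedge\bar{y}}{}
  = \binom{(x\wedge\bar{y})\vee(y\wedge\bar{y})}{}
  = \binom{x\wedge\bar{y}}{}
  = \binom{x}{}
```

The second step uses distributivity, the third uses y ∧ ȳ = ∅, and the last uses x ≤ ȳ, which follows from x ∧ y = ∅ in a Boolean algebra.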
2.3 Conditional Independence
Conditional independence will be defined in terms of coin identities. The following two rules concerning certain types of coin equations are of great importance for manipulating conditional independence relations. Let [x] denote a coin in the marginal cain C_x.

LEMMA 2.1 (law of normalization). Let x̄, ȳ, z̄ be the complements of x, y and z, respectively. Then
$$\binom{x}{y\vee z} = [\bar z] \;\Longrightarrow\; \binom{x}{y\vee z} = \binom{x\wedge\bar z}{y\wedge\bar z} \quad (x > \emptyset), \tag{2.9}$$
$$\binom{x\vee y\vee z}{} = [\bar y]\,[\bar x] \;\Longrightarrow\; \binom{x\vee y\vee z}{} = \binom{(x\vee z)\wedge\bar y}{}\binom{(y\vee z)\wedge\bar x}{}\binom{}{z\wedge\bar x\wedge\bar y}. \tag{2.10}$$
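For intuition, consider the special case in which x, y, z are mutually exclusive, so that (x ∨ z) ∧ ȳ = x ∨ z, (y ∨ z) ∧ x̄ = y ∨ z and z ∧ x̄ ∧ ȳ = z. Reading coins as p.d.f.'s (a heuristic in the spirit of Section 2.1, not a formal statement), [ȳ] is a function not involving y and [x̄] one not involving x, and (2.10) becomes the familiar fact that a joint density which factorizes as a function of (x, z) times a function of (y, z) factorizes through its marginals:

```latex
f(x, y, z) = g(x, z)\,h(y, z)
\quad\Longrightarrow\quad
f(x, y, z) = \frac{f(x, z)\,f(y, z)}{f(z)}
```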
The law of normalization, or N-law for short, is a powerful rule that enables one to coerce an 'ambiguous' coin equation into an 'exact' form. This is useful, for instance, when many atom coins enter into a coin equation but we are only interested in relations among a small portion of them; those 'nuisance' coins can then be treated as 'proportionality' constants. Conversely, a 'large' coin identity can give rise to many 'small' identities by the following law of marginalization, or M-law.

LEMMA 2.2 (law of marginalization). If x ∧ y = ∅, then for any a, b ∈ L,
$$\binom{x\vee y}{z} = \binom{x}{z}\binom{y}{z} \;\Longrightarrow\; \binom{(x\wedge a)\vee(y\wedge b)}{z} = \binom{x\wedge a}{z}\binom{y\wedge b}{z}. \tag{2.11}$$
In particular, if z = ∅ then
$$\binom{x\vee y}{} = \binom{x}{}\binom{y}{} \;\Longrightarrow\; \binom{(x\wedge a)\vee(y\wedge b)}{} = \binom{x\wedge a}{}\binom{y\wedge b}{}.$$
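A typical use of the M-law is the step "x ⊥⊥ (y ∨ z) | w implies x ⊥⊥ z | w" in the proof of Theorem 2.2 below. Assuming x, y, z, w are mutually exclusive, x ⊥⊥ (y ∨ z) | w can be written in the product form on the left (a consequence of Definition 5 and the cainoid axioms). Applying (2.11) with y replaced by y ∨ z, z by w, a = ⊤ and b = z, and using (y ∨ z) ∧ z = z, gives the product form of x ⊥⊥ z | w:

```latex
\binom{x\vee y\vee z}{w} = \binom{x}{w}\binom{y\vee z}{w}
\quad\Longrightarrow\quad
\binom{x\vee z}{w} = \binom{x}{w}\binom{z}{w}
```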
Let L be a Boolean algebra, that is, a complemented distributive lattice with bottom ∅ and top ⊤ > ∅, and let C be a cain defined on L.

DEFINITION 5 (conditional independence). x is said to be independent of y conditional on z, written x ⊥⊥ y | z, if and only if $\binom{x}{y\vee z} = \binom{x}{z}$. If x ⊥⊥ y | ∅, we say that x is independent of y, written x ⊥⊥ y.

The N-law leads to the following seemingly weaker but equivalent condition for conditional independence.

THEOREM 2.1 (factorization). If x, y, z are nontrivial and mutually exclusive, then
$$x \perp\!\!\!\perp y \mid z \iff \binom{x\vee y\vee z}{} = [\bar x]\,[\bar y],$$
where [x̄] and [ȳ] are coins of the marginal cains C_x̄ and C_ȳ, respectively.
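To appreciate how PCI relations are stated and proved in the cain, we now state the well-known contraction property of conditional independence, which is one of the defining properties of the graphoid.

THEOREM 2.2 (contraction). If x, y, z, w are nontrivial and mutually exclusive, then x ⊥⊥ y | (z ∨ w) and x ⊥⊥ z | w hold if and only if x ⊥⊥ (y ∨ z) | w.

Proof. "Sufficiency." First, x ⊥⊥ (y ∨ z) | w implies, by the M-law, x ⊥⊥ z | w. Using C4, C5 and Definition 5, these two relations give
$$\binom{x\vee y\vee z\vee w}{} = \binom{x}{w}\binom{y\vee z\vee w}{} = \binom{x}{z\vee w}\binom{y\vee z\vee w}{}.$$
Multiplying both sides by $\binom{}{y\vee z\vee w}$ and applying C4 and C5 once more yields $\binom{x}{y\vee z\vee w} = \binom{x}{z\vee w}$, implying x ⊥⊥ y | (z ∨ w).

"Necessity." x ⊥⊥ y | (z ∨ w) and x ⊥⊥ z | w imply
$$\binom{x\vee y\vee z\vee w}{} = \binom{x\vee z\vee w}{}\binom{y}{z\vee w} \quad\text{and}\quad \binom{x\vee z\vee w}{} = \binom{x\vee w}{}\binom{z}{w},$$
so that
$$\binom{x\vee y\vee z\vee w}{} = \binom{x\vee w}{}\binom{z}{w}\binom{y}{z\vee w},$$
which by the M-law implies $\binom{y\vee z\vee w}{} = \binom{w}{}\binom{z}{w}\binom{y}{z\vee w}$, or equivalently $\binom{y\vee z}{w} = \binom{z}{w}\binom{y}{z\vee w}$. Hence
$$\binom{x\vee y\vee z\vee w}{} = \binom{x\vee w}{}\binom{y\vee z}{w},$$
implying x ⊥⊥ (y ∨ z) | w.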
3 Computer Verified Conditional Independence Relations
The following code declares the lattice L with bottom bot. To formalize the cainoid we only need the join operator, which is assumed idempotent, commutative and associative (meet is declared but not used here); the order ge is defined from join, and bot is its least element.

From mathcomp Require Import all_ssreflect.

Parameter L : eqType.
Parameter bot : L.
Parameters meet join : L -> L -> L.
Axiom join_idempotent : idempotent join.
Axiom join_commutative : commutative join.
Axiom join_associative : associative join.
Definition ge x y := join x y = x.
Axiom bot_minimum : forall x, ge x bot.
We next introduce the type coins of all coins, declared as a choiceType, together with the binary dot operator dot on coins and the map mix, which sends a pair (x, y) of lattice elements to the mixed coin $\binom{x}{y}$.

Parameter coins : choiceType.
Parameter dot : coins -> coins -> coins.
Parameter mix : L -> L -> coins.
The following are special cases of mix: bob, up x and down x correspond to 1, $\binom{x}{}$ and $\binom{}{x}$, respectively.

Definition bob := mix bot bot.
Definition up x := mix x bot.
Definition down x := mix bot x.
The cainoid axioms C1 through C5 can then be stated as follows.

Axiom dotC : commutative dot.
Axiom dotA : associative dot.
Axiom bob_unitL : left_id bob dot.
Axiom up_down_unitL : forall x, dot (up x) (down x) = bob.
Axiom mix_up_down : forall x y, x != bot -> mix x y = dot (up (join x y)) (down y).
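As a small illustration of working with these axioms, the following sketch derives two consequences that are convenient in later proofs: bob is also a right unit, and C4 holds with the factors swapped. It is a self-contained restatement under our own assumptions: it relies only on the ssreflect and ssrfun libraries bundled with Coq, it declares L and coins as plain Types rather than eqType and choiceType to avoid a MathComp dependency, and the lemma names bob_unitR and down_up_unitL are hypothetical.

```coq
From Coq Require Import ssreflect ssrfun.

(* Plain-Type stand-ins for the declarations above (assumption: no MathComp). *)
Parameter L : Type.
Parameter bot : L.
Parameter coins : Type.
Parameter dot : coins -> coins -> coins.
Parameter mix : L -> L -> coins.

Definition bob := mix bot bot.
Definition up x := mix x bot.
Definition down x := mix bot x.

(* Cainoid axioms C1 to C4, as in the formalization above. *)
Axiom dotC : commutative dot.
Axiom dotA : associative dot.
Axiom bob_unitL : left_id bob dot.
Axiom up_down_unitL : forall x, dot (up x) (down x) = bob.

(* bob is also a right unit: swap the factors with C1, then apply C3. *)
Lemma bob_unitR : right_id bob dot.
Proof. by move=> a; rewrite dotC bob_unitL. Qed.

(* C4 with the raising and lowering coins in the other order. *)
Lemma down_up_unitL (x : L) : dot (down x) (up x) = bob.
Proof. by rewrite dotC up_down_unitL. Qed.
```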
The following is the cain-algebraic analogue of Bayes' Theorem, together with its formalization in SSReflect.

THEOREM 3.1 (Bayes' Theorem). If x > ∅ and y > ∅, then
$$\binom{x}{y} = \binom{y}{x}\binom{x}{}\binom{}{y}.$$

Theorem Bayes : forall x y, (x != bot) /\ (y != bot) ->
  mix x y = dot (mix y x) (dot (up x) (down y)).
The formal proof of this theorem is obtained by applying the cainoid axioms in a direct fashion.

Proof.
move=> x y; case=> xneb yneb.
rewrite [in RHS]mix_up_down; last exact.
rewrite dotA.
by rewrite -![(dot (dot (up _) _) _)]dotA [dot (_ x) (_ x)]dotC up_down_unitL
  bob_unitL join_commutative mix_up_down.
Qed.
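For comparison, here is the pencil-and-paper calculation that such a script mechanizes. It is our own reconstruction from the axioms, and the rewrite order in the script above may differ. With x > ∅ and y > ∅:

```latex
\begin{aligned}
\binom{y}{x}\binom{x}{}\binom{}{y}
  &= \binom{x\vee y}{}\binom{}{x}\,\binom{x}{}\binom{}{y}
     && \text{(C5 with } y > \emptyset\text{, and } y\vee x = x\vee y\text{)}\\
  &= \binom{x\vee y}{}\Bigl(\binom{x}{}\binom{}{x}\Bigr)\binom{}{y}
     && \text{(C1, C2)}\\
  &= \binom{x\vee y}{}\binom{}{y}
     && \text{(C4, C1, C3)}\\
  &= \binom{x}{y}
     && \text{(C5 with } x > \emptyset\text{)}
\end{aligned}
```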
The axioms defining a cainoid suffice to prove the sufficiency part of the contraction property, namely:

THEOREM 3.2. If x, y, z, w are nontrivial and mutually exclusive, then x ⊥⊥ y | (z ∨ w) and x ⊥⊥ z | w imply x ⊥⊥ (y ∨ z) | w.

The formal statement below treats the marginal case w = ∅, with both hypotheses written in factorized form.

Theorem contraction_sufficiency : forall x y z,
  (x != bot) /\ (y != bot) /\
  ((up (join x z) = dot (up x) (up z)) /\
   (mix (join x y) z = dot (mix x z) (mix y z))) ->
  up (join x (join y z)) = dot (up x) (up (join y z)).
To make the proof of this theorem more readable, we first prove a lemma.

Lemma Delete_Down : forall A B x, dot A (down x) = B -> A = dot B (up x).
Proof.
move=> A B x AxB.
by rewrite -[in RHS]AxB -!dotA [dot (_ x) (_ x)]dotC up_down_unitL dotC bob_unitL.
Qed.
Lemma Delete_Down says that for any coins A, B and any x ∈ L, $A\binom{}{x} = B$ implies $A = B\binom{x}{}$. In a formal proof system such as Coq, propositions are types and proofs are programs, i.e., terms of the required types. To see the type of Lemma Delete_Down, we input

Check Delete_Down.

which outputs

Delete_Down : forall (A B : coins) (x : L), dot A (down x) = B -> A = dot B (up x)
To see the proof term of Lemma Delete_Down, we input

Print Delete_Down.

which outputs

Delete_Down =
fun (A B : coins) (x : L) (AxB : dot A (down x) = B) =>
(fun _evar_0_ : A = dot (dot A (down x)) (up x) =>
 eq_ind (dot A (down x)) (fun _pattern_value_ : coins => A = dot _pattern_value_ (up x)) _evar_0_ B AxB)
((fun _evar_0_ : A = dot A (dot (down x) (up x)) =>
  eq_ind (dot A (dot (down x) (up x))) [eta eq A] _evar_0_ (dot (dot A (down x)) (up x)) (dotA A (down x) (up x)))
((fun _evar_0_ : A = dot A (dot (up x) (down x)) =>
  eq_ind_r (fun _pattern_value_ : coins => A = dot A _pattern_value_) _evar_0_ (dotC (down x) (up x)))
((fun _evar_0_ : A = dot A bob =>
  eq_ind_r (fun _pattern_value_ : coins => A = dot A _pattern_value_) _evar_0_ (up_down_unitL x))
((fun _evar_0_ : A = dot bob A => eq_ind_r [eta eq A] _evar_0_ (dotC A bob))
((fun _evar_0_ : A = A => eq_ind_r [eta eq A] _evar_0_ (bob_unitL A)) (erefl A))))))
     : forall (A B : coins) (x : L), dot A (down x) = B -> A = dot B (up x)
Arguments A, B, x are implicit
Note that the type printed after the proof term is the one we have just seen with Check. We now give a proof of the contraction property; Lemma Delete_Down is applied as a view in the move=> /Delete_Down step. The lemma join_eq_bot used for the last subgoal is an auxiliary lemma whose statement is not shown here.

Proof.
move=> x y z; case=> xneb; case=> yneb; case=> xzI.
rewrite mix_up_down.
move=> /Delete_Down xyIz.
rewrite join_associative [in LHS]xyIz -!dotA mix_up_down // [in LHS]xzI
  -![dot (dot (up x) (up z)) (down z)]dotA up_down_unitL [dot _ bob]dotC bob_unitL
  mix_up_down // -![dot (dot _ _) _]dotA [dot (_ z) (_ z)]dotC up_down_unitL
  [dot _ bob]dotC bob_unitL //.
(* The last subgoal *)
apply/negP; case=> /join_eq_bot /andP.
case=> /eqP xb yb.
move: xneb; by move=> /eqP.
Qed.
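The formal statement of Theorem 3.2 phrases its hypotheses in factorized form rather than through Definition 5 directly. As a hedged sketch of how the two are related, the following self-contained script introduces a hypothetical predicate ci for Definition 5 and checks that, for z = bot, ci x z bot yields the factorization up (join x z) = dot (up x) (up z) used as a hypothesis of contraction_sufficiency. The names ci, join_bot and ci_bot_factor are ours; L and coins are plain Types to avoid a MathComp dependency, so the side condition of mix_up_down is written with <> instead of the boolean !=, and Delete_Down is re-proved with an equivalent script that avoids occurrence patterns.

```coq
From Coq Require Import ssreflect ssrfun.

Parameter L : Type.
Parameter bot : L.
Parameter join : L -> L -> L.
Parameter coins : Type.
Parameter dot : coins -> coins -> coins.
Parameter mix : L -> L -> coins.

Definition bob := mix bot bot.
Definition up x := mix x bot.
Definition down x := mix bot x.
Definition ge x y := join x y = x.

Axiom bot_minimum : forall x, ge x bot.
Axiom dotC : commutative dot.
Axiom dotA : associative dot.
Axiom bob_unitL : left_id bob dot.
Axiom up_down_unitL : forall x, dot (up x) (down x) = bob.
(* Assumption: <> replaces the paper's boolean != since L is a plain Type. *)
Axiom mix_up_down :
  forall x y, x <> bot -> mix x y = dot (up (join x y)) (down y).

(* Same statement as Delete_Down above. *)
Lemma Delete_Down (A B : coins) (x : L) :
  dot A (down x) = B -> A = dot B (up x).
Proof.
move=> AxB.
by rewrite -AxB -dotA (dotC (down x) (up x)) up_down_unitL dotC bob_unitL.
Qed.

(* Hypothetical encoding of Definition 5: x is independent of y given z. *)
Definition ci x y z := mix x (join y z) = mix x z.

Lemma join_bot (z : L) : join z bot = z.
Proof. exact (bot_minimum z). Qed.

(* For z = bot, Definition 5 gives the factorized form used in Theorem 3.2. *)
Lemma ci_bot_factor (x z : L) :
  x <> bot -> ci x z bot -> up (join x z) = dot (up x) (up z).
Proof.
move=> xneb; rewrite /ci join_bot (mix_up_down x z xneb) => H.
exact (Delete_Down _ _ _ H).
Qed.
```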
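In fact, a cainoid satisfies all the properties of an Abelian group. This allows us to use the known facts about Abelian groups that are available in the SSReflect (MathComp) libraries. The following code registers coins as a Z-module, i.e. an additive Abelian group, in the MathComp algebraic hierarchy, with dot as the group operation and bob as its unit, so that in ring_scope dot can be written as + and the generic Z-module lemmas apply. It uses the MathComp 1.x ZmodMixin/ZmodType interface and an inverse lemma dotV whose statement is not shown here.

Definition zmodMixin := ZmodMixin dotA dotC bob_unitL dotV.
Canonical zmodType := ZmodType _ zmodMixin.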
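As a hedged sketch of the kind of group fact this structure makes available, the following self-contained script derives right cancellation directly from C1 to C3 plus an inverse law, without going through the MathComp Z-module interface. It assumes a hypothetical inverse map inv on coins and an axiom dotV : left_inverse bob inv dot (both are our assumptions, since the paper does not show them), and it declares bob as an abstract unit coin standing for mix bot bot.

```coq
From Coq Require Import ssreflect ssrfun.

Parameter coins : Type.
Parameter dot : coins -> coins -> coins.
Parameter bob : coins.          (* stands for the unit coin 1, i.e. mix bot bot *)
Parameter inv : coins -> coins. (* hypothetical inverse of a coin *)

Axiom dotC : commutative dot.
Axiom dotA : associative dot.
Axiom bob_unitL : left_id bob dot.
Axiom dotV : left_inverse bob inv dot. (* hypothetical: dot (inv c) c = bob *)

(* Multiplying by c and then by inv c returns the original coin. *)
Lemma dotK (c t : coins) : dot (dot t c) (inv c) = t.
Proof. by rewrite -dotA (dotC c (inv c)) dotV dotC bob_unitL. Qed.

(* Right cancellation: equal products with the same right factor. *)
Lemma dot_cancelR (a b c : coins) : dot a c = dot b c -> a = b.
Proof. by move=> H; rewrite -(dotK c a) H dotK. Qed.
```

Lemmas of this shape are what the Delete_Down argument above establishes by hand for the special case of a lowering coin.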
4 Discussion
This is an ongoing project on the formalization of the cain algebra, which allows formal study of probabilistic conditional independence relations. We have shown one possible way to formalize the cainoid in SSReflect, which may be refined in future research. Formalizing the remaining axioms of the cain is left for future work.
References
[1] Bertot, Y. and Castéran, P. (2004). Interactive Theorem Proving and Program Development: Coq'Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science, An EATCS Series. Springer: Tokyo.
[2] Chlipala, A. (2013). Certified Programming with Dependent Types: A Pragmatic Introduction to the Coq Proof Assistant. The MIT Press.
[3] Dawid, A. P. (2001). Separoids: a mathematical framework for conditional independence and irrelevance. Annals of Mathematics and Artificial Intelligence, 32, 335–372.
[4] Gonthier, G. and Mahboubi, A. (2010). An introduction to small scale reflection in Coq. Journal of Formalized Reasoning, 3, 95–152.
[5] Pearl, J. and Paz, A. (1987). Graphoids: a graph-based logic for reasoning about relevance relations. In Advances in Artificial Intelligence (D. Hogg and L. Steels, eds.), 357–363. North-Holland, Amsterdam.
[6] Wang, J. (2010). A universal algebraic approach for conditional independence. Annals of the Institute of Statistical Mathematics, 62, 747–773.