Orientation rules for constructing Markov equivalence classes for ...

1 downloads 276 Views 250KB Size Report
Mar 24, 2005 - equivalence classes for maximal ancestral graphs. R. Ayesha Ali ... Markov equivalence class representative, given a member of the equiv-.
Orientation rules for constructing Markov equivalence classes for maximal ancestral graphs R. Ayesha Ali National University of Singapore, Singapore [email protected]

Thomas Richardson

Peter Spirtes, Jiji Zhang

University of Washington, USA

Carnegie-Mellon University, PA, USA

[email protected]

[email protected]

March 24, 2005

Abstract It is well known that there may be many causal explanations that are consistent with a given set of data. For processes that can be represented by a directed acyclic graph (DAG), many authors have characterized the common aspects of these explanations into one representation. What is less well known is how the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of latent variables. Ancestral graphs provide a class of graphs that can encode conditional independence relations that arise in DAG models with latent and selection variables. In this paper we present a set of orientation rules that construct the Markov equivalence class representative, given a member of the equivalence class. These rules are sound and complete. We also show that when the equivalence class includes a DAG, the equivalence class rep-

1

resentative is the essential graph for the said DAG. Keywords: maximal ancestral graphs, joined graphs, Markov equivalence, DAG models, essential graph.

1

INTRODUCTION

A directed acyclic graph (DAG) consists of vertices and directed ( −−−Â ) edges that encode the conditional independence relations holding among the variables (vertices) of some data generating process. However, given a set of conditional independence relations, there are often many DAGs that can encode the same set of conditional independence statements. A set of DAGs that encode the same set of conditional independence relations is termed Markov equivalent. Frydenberg (1990), Verma and Pearl (1991), Chickering (1995) and Andersson et al. (1997) have characterized Markov equivalence classes for DAGs, and have presented algorithms for constructing an equivalence class representation given a member (DAG) of the equivalence class. In this paper we present an algorithm for constructing a Markov equivalence class representative for ancestral graphs, given a single member of the class. We suppose our observed data was generated by a process represented by a DAG with latent and selection variables. These models may correspond to processes in which we have observed only a subset of the variables, in a specific subpopulation. Hence some variables in the underlying DAG are not observed (‘latent’), while other variables, specifying the specific subpopulation from which our data is sampled, are conditioned upon (‘selection variables’). Unfortunately, the set of distributions obtained by conditioning on the selection variables and marginalizing over the latent variables cannot be represented by a DAG, in general, even though the underlying model can. 2

H (i) time:

Azt 1

Pcp 2

Ap 3

(ii)

CD4

time:

4

Azt 1

Pcp 2

Ap 3

CD4 4

Figure 1: (i) A DAG with a latent variable H. (ii) The ancestral graph resulting from marginalizing over H adds a bi-directed edge between P cp and CD4.

The DAG in Figure 1(i) entails the relation that Azt is marginally independent of CD4; but conditional on P cp, Azt and CD4 are dependent. Similarly, Ap is marginally independent of P cp; but conditional on CD4, Ap and P cp are dependent. There is no DAG over the variables {Azt, P cp, Ap, CD4} that can simultaneously encode these relations. However, ancestral graphs are a class of graphs that, using only the observed variables, can encode the conditional independence relations given by any data-generating process that can be represented by a DAG with latent and selection variables. In fact, every DAG with latent variables L, selection variables S and observed variables O, can be represented by an ancestral graph over the vertices in O such that the two graphs are Markov equivalent on the O margin. Further, the associated statistical models retain many of the desirable properties that are associated with DAG models. Richardson and Spirtes (2002) provide a detailed introduction to ancestral graphs and discusses the causal interpretation of such graphs. Following Andersson et al. (1997), we refer to the DAG equivalence class representative as the essential graph. For data generated by some DAG with no latent variables, one could correctly specify the associated essential graph. However, without knowing the underlying graph, one may worry that there were latent variables present, and that the learned essential graph is no longer valid. Section 2 provides relevant definitions and concepts related to DAGs and 3

ancestral graphs. Section 3 presents relevant results on Markov equivalence. In particular, Section 3.3 provides a unique representation of equivalence classes for maximal ancestral graphs. Analogous to Meek (1995), we present in Section 4 a set of orientation rules that construct the equivalence class representative for maximal ancestral graphs. We prove that these orientation rules are sound and complete, and show that whenever the equivalence class includes a DAG, the corresponding equivalence class representative is simply the DAG’s corresponding essential graph.

2

BACKGROUND

A graphical Markov model is a pair hV, Ei that represents a set of distributions where V is a set of vertices (or variables) and E is a set of edges. These distributions are compatible with an independence model, Im (·), which is a list of conditional independence statements such as “A is independent of B given S” for disjoint subsets {A, B, S} of V . Hence we write the independence model associated with a graph G as Im (G). We also use the notation “A⊥ ⊥B|S” to mean “A is independent of B given S”.

2.1

Vertex Relations

We use the following terminology to describe relations between vertices in a graph G:

If

   α −−−− β       α ≺−−− β

        

  α −−−Â β       α ≺−−− β

       

in G then α is a

      neighbour            spouse         

parent child

4

       

of β, and

   α ∈ neG (β)       α ∈ spG (β)

        

  α ∈ paG (β)       α ∈ ch (β) G

       

.

A pair of vertices which are connected by some edge will be said to be adjacent, so no vertex is adjacent to itself. A path, π is a sequence of distinct vertices that are adjacent. If there is an edge α −−−Â β, or α ≺−−Â β then there is said to be an arrowhead at β on this edge. Conversely, if there is an edge α −−−Â β, or α −−−− β then there is said to be a tail at α. Suppose hα, β, δi are vertices in a graph such that α and β are adjacent, and β and δ are adjacent. If α and δ are also adjacent, then the triple hα, β, δi is “shielded”. Otherwise, if α and δ are not adjacent, then the triple hα, β, δi is “unshielded”. A shielded triple forms a triangle. We use the following shorthand notation for endpoints in a graph containing undirected, directed or bi-directed edges: 1. “α −−−? β” is used to denote that there is a tail at α in the graph, on the edge between α and β, and that there may be a tail or an arrowhead at the β end of this edge. 2. “α ≺−−? β” is used to denote that there is an arrowhead at α, and either an arrowhead or a tail at β on the edge between α and β. 3. “α ?−−? β” is used to denote that there could be an arrowhead or tail at either end of the hα, βi edge.

2.2

Directed Acyclic Graphs

A directed acyclic graph (DAG) is a graph such that i) all edges are directed ( −−−Â ,

≺−−− ),

and ii) there are no directed paths, beginning and ending at

some vertex, that do not contradict any arrowheads (i.e. no directed cycles). For DAGs, if on a path two arrowheads meet at vertex v (i.e.

−−−Â v ≺−−− ),

then we call v a collider on that path. All other configurations ( −−−Â v −−−Â , ≺−−− v ≺−−− , ≺−−− v −−−Â )

are called non-colliders. If every non-endpoint on

a path is a collider, then the path is a collider path. The independence relations entailed by a DAG can be determined through d-separation. 5

Definition 2.1 In a directed acyclic graph, a path π between α and β is said to be d-connecting given Z if the following hold: (i) No non-collider on π is in Z; (ii) Every collider on π is an ancestor of a vertex in Z. Two vertices α and β are said to be d-separated given Z if there is no path d-connecting α and β given Z. In particular, if vertices α and β are d-separated given Z, then α⊥ ⊥β|Z. However, as discussed in the introduction, if some of the variables in the graph are not observed, then it is not always possible to represent all and only the conditional independence relations implied by the observed variables using a DAG.

2.3

Ancestral Graphs

The basic motivation for developing ancestral graphs is to enable one to focus on the independence structure over the observed variables that results from the presence of latent variables without explicitly including latent variables in the model. Permitting bi-directed ( ≺−−Â ) edges in the graph allows one to graphically represent the existence of an unobserved common cause of observed variables. For Figure 1(i) this corresponds to removing H from the graph and adding a bi-directed edge between P cp and CD4. Undirected edges ( −−−− ) are also introduced to represent unobserved selection variables that have been conditioned on rather than marginalized over. However, interpreting ancestral graphs is not always straightforward. Richardson and Spirtes (2003) provides a detailed discussion on the interpretation of edges in an ancestral graph as well as details on the basic definitions and concepts presented here.

6

Definition 2.2 A graph, which may contain undirected ( −−−− ), directed ( −−−Â ) and bi-directed edges ( ≺−−Â ) is ancestral if: (a) there are no directed cycles; (b) whenever an edge x ≺−−Â y is in the graph, then x is not an ancestor of y, (and vice versa); (c) if there is an undirected edge x −−−− y then x and y have no spouses or parents. Conditions (a) and (b) state that if x and y are joined by an edge and there is an arrowhead at x then x is not an ancestor of y, which motivates the term ‘ancestral’. Condition (c) guarantees that the configurations −−−Â γ −−−−

and

≺−−Â γ −−−−

never occur in an ancestral graph. Note that

DAGs form a subclass of ancestral graphs. DAGs and undirected graphs form subclasses of ancestral graphs. Given an ancestral graph G with vertex set V , for arbitrary disjoint sets S, L (both possibly empty) Richardson and Spirtes (2002) defined a graphical transformation such that the independence model corresponding to the transformed graph will be the independence model obtained by marginalizing over L and conditioning on S in the independence model of the original graph. Though this transformation is defined for any ancestral graph G, the primary motivation is the case in which G is an underlying (causal) DAG that is partially observed. The following two results describe configurations that may arise in ancestral graphs. Lemma 2.1 Let G be an ancestral graph containing vertices {a, b, c}. If a ?−−Â b −−−Â c? −−−? a, or a −−−Â b ?−−Â c? −−−? a then c ≺−−? a in G. Proof: If c −−−? a in G, then a ?−−Â b −−−Â c −−−? a or a −−−Â b ?−−Â c −−−? a violates the ancestral property. Hence c ≺−−? a in G. 2 7

Lemma 2.2 If hu, a, vi is a collider in the ancestral graph G, hu, b, vi is a non-collider in G, and a is adjacent to b, then a ≺−−? b in G. Proof: Since hu, b, vi is a non-collider, either b −−−Â u or b −−−Â v in G. In other words, either b −−−Â u ?−−Â a? −−−? b or b −−−Â v ?−−Â a? −−−? b in G and by Lemma 2.1 a ≺−−? b. 2 For ancestral graphs, call a non-endpoint vertex v on a path a collider if two arrowheads meet at v, i.e. −−−Â v ≺−−Â ;

−−−Â v ≺−−− , ≺−−Â v ≺−−Â , ≺−−Â v ≺−−−

or

all other non-endpoint vertices on a path are non-colliders, i.e.

−−−− v −−−− , −−−− v −−−Â , −−−Â v −−−Â , ≺−−− v −−−Â .

A natural extension of d-

separation may then be applied to ancestral graphs. Definition 2.3 In an ancestral graph, a path π between α and β is said to be m-connecting given Z if the following hold: (i) No non-collider on π is in Z; (ii) Every collider on π is an ancestor of a vertex in Z. Two vertices α and β are said to be m-separated given Z if there is no path m-connecting α and β given Z. Definition 2.3 is an extension of the definition of d-separation for DAGs (Definition 2.1) in that the notions of ‘collider’ and ‘non-collider’ now include bi-directed and undirected edges. Since m-separation characterizes the independence relations in an underlying probability distribution compatible with a graph, tests of m-separation can be used to determine when ancestral graphs are Markov equivalent to each other.

3

MARKOV EQUIVALENCE

We introduce the following: 8

Definition 3.1 Two graphs G1 and G2 are said to be Markov equivalent if for all disjoint sets A, B, Z (where Z may be empty), A and B are mseparated given Z in G1 if and only if A and B are m-separated given Z in G2 . i.e. A⊥ ⊥B|Z ∈ Im (G1 ) ⇐⇒ A⊥ ⊥B|Z ∈ Im (G2 ). In other words, two graphs G1 and G2 are Markov equivalent if Im (G1 ) = Im (G2 ), and we write G1 ∼ G2 . If G is a maximal ancestral graph then we define [G] = {G 0 | G 0 is maximal, and G ∼ G 0 } to be the Markov equivalence class containing G. The graphs in Figure 2 are Markov equivalent, as are G1 and G2 in Figure 3. As outlined in the introduction, whether or not two graphs are Markov equivalent is of great significance since the associated models contain the same set of distributions. Consequently they are indistinguishable in the asymptotic limit. At the same time, some members of the equivalence class may lead more directly to closed form MLEs (see Drton and Richardson (2004b)). In addition, for the purposes of model search it is more efficient to perform a search across equivalence classes rather than scoring individual models (Chickering (2002a), Chickering (2002b)). Frydenberg (1990) and Verma and Pearl (1991) gave simple graphical conditions for determining when two DAGs are Markov equivalent. Theorem 3.1 (DAG Equivalence) Two directed acyclic graphs D1 and D2 are Markov equivalent if and only if D1 and D2 have the same vertex set, same adjacencies, and same unshielded colliders. Meek (1995) provided the following set of orientation rules to construct the essential graph for a DAG.

9

Constructing the essential graph for DAG D (E1) Construct an undirected graph H with the same adjacencies as D. (E2) For all triples x, y, z, if hx, y, zi forms an unshielded collider in G then add arrowheads at y to the (x, y) and (y, z) edges in H thus: x −−−Â y and y ≺−−− z. (E3) If x −−−Â y −−−− z and x is not adjacent to z in H, then orient y −−−Â z. (E4) If x −−−Â y −−−Â z −−−− x in H, then orient x −−−Â z. (E5) If w −−−− x −−−Â y ≺−−− z −−−− w and w −−−− z in H, then orient w −−−Â z. (E6) Iterate steps (E3) and (E5) until no further arrowheads are added.

3.1

Maximal ancestral graphs

Independence models described by DAGs satisfy pairwise Markov properties such that every missing edge corresponds to a conditional independence relation. In general, this property does not apply to ancestral graphs. For example, there is no set which m-separates γ and δ in the graph in Figure 2(a), which motivates the following definition: Definition 3.2 An ancestral graph G is said to be “maximal” if, for every pair of non-adjacent vertices α, β there exists a set Z(α, β ∈ / Z), such that α and β are m-separated conditional on Z. These graphs are termed maximal in the sense that no additional edge may be added to the graph without changing the associated independence model. It has been shown in Richardson and Spirtes (2002) that if an ancestral graph is not maximal, then there exists at least one pair of nonadjacent vertices {α, β}, for which there is an “inducing path” between α and β where: 10

Definition 3.3 An inducing path π is a path in a graph such that each non-endpoint vertex is a collider, and an ancestor of at least one of the endpoints.

α

β

(a)

α

β

γ

δ

(b) γ

δ

Figure 2: (a) The path {γ, β, α, δ} is an example of an inducing path in an ancestral graph. (b) A maximal ancestral graph Markov equivalent to (a).

By definition, inducing paths always consist of a single edge in DAGs and in undirected graphs; hence, such graphs are always maximal. Richardson and Spirtes (2002) also showed that any non-maximal ancestral graph can be made ancestral by appropriately adding bi-directed edges to the graph. The non-maximal ancestral graph in Figure 2(a) can be made maximal by adding a bi-directed edge between γ and δ, as shown in Figure 2(b). Note that strictly speaking an inducing ‘path’ π = hα, ν1 , . . . νp , βi is a collection of paths: the collider path π, together with directed paths from each vertex νi to one of the endpoints. (The definition given here is called a ‘primitive’ inducing path in Richardson and Spirtes (2002); the concept was introduced by Verma and Pearl (1990).) In the remainder of this paper, we focus on maximal ancestral graphs since every non-maximal ancestral graph can uniquely be transformed to a Markov equivalent maximal ancestral graph by appropriately adding bidirected edges. In other words, the problem of characterizing Markov equivalence for ancestral graphs naturally reduces to that of characterizing equivalence in the case where both graphs are maximal.

11

Definition 3.4 An inducing path π = hq0 , q1 , . . . , qn i, will be said to be minimal if no proper subsequence of the vertices on π form an inducing path between q0 and qn . In other words, it is not possible to find an inducing path between q0 and qn that is shorter than π. This concept will be used to prove Corollaries 4.2 and 4.3.

3.2

Discriminating paths in ancestral graphs

A key difference between DAGs and ancestral graphs is that the having the same adjacencies and unshielded colliders, though necessary, are not sufficient conditions for determining Markov equivalence of ancestral graphs. Consider the graphs shown in Figure 3: G1 and G3 contain the same adjacencies and the same unshielded colliders, but these two graphs are not Markov equivalent to each other: In G1 , x is m-separated from y given q; but according to G3 , x is m-connected to y given q. In fact in any graph Markov equivalent to G1 , hq, β, yi forms a shielded collider. (There is only one such graph, G2 , so {G1 , G2 } forms a Markov equivalence class.) However, in general, it is clearly not necessary that two graphs have all of the same shielded colliders in order for them to be Markov equivalent. 

b

 







b











b









Figure 3: G1 , G2 , G3 have the same adjacencies and the same unshielded colliders, but G1 and G3 are not Markov equivalent. π = hx, q, β, yi forms a discriminating path for β in every graph.

Discriminating paths are special paths that, if present in two Markov 12

equivalent graphs, imply that certain shielded colliders (or non-colliders) will be present in both graphs: Definition 3.5 U = hx, q1 , q2 , . . . , qp , β, yi is a discriminating path for β in an ancestral graph G if and only if: (i) U is a path between x and y with at least three edges, (ii) U contains β, β 6= x, β 6= y, (iii) β is adjacent to y on U , x is not adjacent to y, and (iv) For every vertex qi , 1 ≤ i ≤ p on U , excluding x, y, and β, qi is a collider on U and qi is a parent of y. Given a set Z, if Z does not contain all qi , 1 ≤ i ≤ p, then the path hx, q1 , . . . , qj , yi is m-connecting where qj ∈ / Z and qi ∈ Z for all i < j. If Z contains {q1 , . . . , qp } and β is a collider on the path U in the graph G, then β ∈ / Z if Z m-separates x and y. Consequently, in any graph Markov equivalent to G containing the discriminating path U , β is also a collider on U . Similarly, if β is a non-collider on the path U then β is a member of any set that m-separates x and y, and β is a non-collider on U in any graph Markov equivalent to G containing U . In other words, hqp , β, y is “discriminated” to be either a collider or a non-collider on the path U in any graph Markov equivalent to G in which U forms a discriminating path, even though it is shielded. The paths hx, q, β, yi in Figure 3 are examples of discriminating paths for β. Note that if β is a non-collider on U , then β −−−Â y as in G3 . Lemma 3.1 If π = hx, q1 , . . . , qp , b, yi forms a discriminating path for b given Z in an ancestral graph G, and hqp , b, yi is a non-collider, then b −−−Â y in G. Proof: Since π forms a discriminating path for b given Z, b ?−−Â qp −−−Â y ?−−? b is in G. By Lemma 2.1, y ≺−−? b in G. Since hqp , b, yi forms a non-collider 13

b ≺−−Â y is ruled out: y ≺−−Â b −−−Â qp −−−Â y violates the ancestral property. Hence b is a parent of y. 2 Like inducing paths, a discriminating ‘path’ π = hx, q1 , . . . , qp , β, yi is in fact a collection of paths: x ?−−Â ≺−−− · · · ≺−−− qp ≺−−? β ?−−Â y x ?−−Â q1 ≺−−− · · · ≺−−− qj −−−Â y

(1 ≤ j ≤ p)

with the requirement that the endpoints x and y are not adjacent. We will often refer to some path π between x and y as a discriminating path for β, thereby implicitly specifying the triple for which the path is discriminating to be the triple containing y and β and the vertex prior to β on the path; by convention we order the endpoints of the discriminating path so that it is the second endpoint (in this case, y) which is in the discriminated triple. It is clear that discriminating paths, when present in both graphs, lead directly to necessary conditions for Markov equivalence. However, a discriminating path for a given triple may not be present in all graphs within a Markov equivalence class. We avoid this problem by identifying, via a recursive definition, a sub-class of discriminating paths (those ‘with order’) which are always present. In particular, define a hierarchy of triples as follows: Definition 3.6 Let Oi (i ≥ 0) be the set of triples of order i, defined recursively as follows: Order 0: A triple hα, β, γi ∈ O0 if α and γ are not adjacent in G. Order i + 1: A triple hα, β, γi ∈ Oi+1 if (1) hα, β, γi ∈ / Oj , for some j < i + 1 and

14

(2) there exists a discriminating path π = hx, q1 , . . . , qp = α, β, γi for β in G, and each of the colliders on the path: hx, q1 , q2 i, . . . hqp−1 , qp , βi ∈

[

Oj .

j≤i

If hα, β, γi ∈ Oi then the triple is said to have order i. A discriminating path is said to have order i if every triple on the path has order at most i, and at least one triple has order i. If a triple has order i for some i, then we will say that the triple has order, likewise for discriminating paths. In each graph in Figure 3, the triple hx, q, βi has order 0, while hq, β, yi has order 1. It is important to note that not every triple in a graph will have an order. For example, no triple in Figure 2(b) has order. Ali et al. (2004) proved the following Markov equivalence result for maximal ancestral graphs. Theorem 3.2 (Markov Equivalence) Maximal ancestral graphs G1 and G2 are Markov equivalent if and only if G1 and G2 have the same adjacencies and the same colliders with order. Though discriminating paths can exist in DAGs, they are not important for determining Markov equivalence because such paths always discriminate non-colliders (see G3 in Figure 3): if hx, q, βi forms a collider, then since there are no bi-directed edges in a DAG, it follows that β and x are parents of q.

3.3

Joined Graphs

Andersson et al. (1997) constructed the essential graph, a Markov equivalence class representation for DAGs, by retaining all edges common to every member of the equivalence class. In other words, the essential graph associated with DAG D is a partially directed graph with the same skeleton 15

as D such that an edge along a particular path is oriented if and only the edge has the same orientation along the analogous path in every DAG in the equivalence class. Also, the essential graph [D] is a chain graph and is Markov equivalent to every DAG in the equivalence class. See Frydenberg (1990) and Andersson et al. (1997) for more details on chain graphs. Note that the arrowheads in the essential graph form a subset of the arrowheads that were present in D. Suppose that one could correctly specify the essential graph associated with some DAG. One may subsequently worry that if we allow for the presence of latent variables, then the essential graph may no longer entail all and only the arrowheads present in the entire equivalence class. However, it turns out that even in the presence of latent variables, the essential graph remains correct (see Theorem 4.3). The join operation allows one to associate each equivalence class with a unique graph by identifying the features common to the set of Markov equivalent ancestral graphs. Briefly, the join operation can be thought of as an AND operation on the “arrowheads” of the set of Markov equivalent ancestral graphs being joined, and an OR operation on the “tails” of these graphs. Following Andersson et al. (1997), Ali and Richardson (2002) define: Definition 3.7 If G1 and G2 are two graphs with the same adjacencies then define G1 ∨ G2 to be a graph with the same adjacencies such that, on an edge between α and β, (i) there is an arrowhead at β in G1 ∨ G2 only if there is an arrowhead at β in G1 and G2 . (ii) there is a tail at β in G1 ∨ G2 if there is either a tail at β in G1 or a tail at β in G2 . Note that if two maximal ancestral graphs are Markov equivalent, then by Theorem ?? they will have the same adjacencies. 16

We also define:

_

sup[G] =

G0

G 0 ∈[G]

Hence sup[G] is a simple representation of Markov equivalence classes of maximal ancestral graphs. Ideally, any representation of an equivalence class of ancestral graphs would encode the same independence model encoded by all the ancestral graphs in the equivalence class. Ali and Richardson (2002) and Ali and Richardson (2004) defined Markov properties for joined graphs and proved that sup[G] is in fact Markov equivalent to every ancestral graph in the equivalence class.

x z

y G

x



w

y

z

G

1

x

=

w

z

2

y w

H

Figure 4: An example of joining two Markov equivalent ancestral graphs in which the joined graph is not ancestral.

Figure 4 provides an example of a joined graph. Note that since there are arrowheads that meet the undirected edge x −−−− w in the joined graph, H is not ancestral as it violates condition (c) of Definition 2.2. Figure 5 shows another example of two Markov equivalent graphs being joined. Here, H is itself a member of the equivalence class of ancestral graphs. β

G

1

x

q

y

β

G



2

x

q

y

β

H

= x

q

y

Figure 5: An example of joining two Markov equivalent ancestral graphs in which the joined graph is itself a member of the equivalence class.

17

3.4

Inferring edge ends and Invariance

In this section we introduce the notion of “invariance” and present results that will help prove the main results of this paper. Definition 3.8 Let G be a maximal ancestral graph. If an edge end in G is present in every graph in the equivalence class, then the edge end is “invariant”. In other words, if a is adjacent to b and either (i) there is a tail at b on the (a, b) edge in every graph in the equivalence class, or (ii) there is an arrowhead at b on the (a, b) edge in every graph in the equivalence class, then the b edge end on the (a, b) edge is invariant. If the (a, b) edge has the same orientation in every graph of an equivalence class, then we say that the (a, b) edge is invariant. Clearly every arrowhead in sup[G], for some maximal ancestral graph G, is invariant. However, only some of the tails are invariant. Corollary 3.1 If π = hx, q1 , . . . , qp , b, yi forms a discriminating path with order for b in a maximal ancestral graph G, and hqp , b, yi is a non-collider in G, then b −−−Â y in every maximal ancestral graph Markov equivalent to G and the (b, y) edge is invariant. Proof: By Markov equivalence, the path analogous to π forms a discriminating path for b and hqp , b, yi is a non-collider in every graph in the equivalence class. By Lemma 3.1 b is a parent of y in every maximal ancestral graph Markov equivalent to G. 2 Partial characterizations of Markov equivalence classes for ancestral graphs have been obtained using POIPGs and PAGs (partial ancestral graphs) by 18

Richardson and Spirtes (2003) and Spirtes et al. (1993). The key difference between PAGs and joined graphs is that, unlike joined graphs, PAGs track not only arrowheads that are invariant in an equivalence class, but also the tails in an equivalence class. Given a maximal ancestral graph G, Spirtes et al. (1993, 1995, 1999) developed the Fast Causal Inference (FCI) algorithm to construct a PAG to represent a set of features common to every graph in [G], but this algorithm is not complete.

4

CONSTRUCTING sup[G]

Meek (1995) presented a set of orientation rules that could be applied to a DAG to construct its associated essential graph. In this section we define a set of orientation rules that can be applied to a maximal ancestral graph G to construct sup[G]. Orientation Procedure (S1) Construct an undirected graph H with the same adjacencies as G. (S2) For all triples x, y, z, if hx, y, zi forms an unshielded collider in G then add arrowheads at y to the (x, y) and (y, z) edges in H thus: x ?−−Â y and y ≺−−? z. (S3) If hx, q1 , . . . qp , b, yi forms a discriminating path for b in H, and hqp , b, yi forms a collider in G then add arrowheads at b to the (qp , b) and (b, y) edges in H thus: qp ?−−Â b and b ≺−−? y. (S4) If hu, a, vi forms an unshielded collider in H, and hu, b, vi forms an unshielded non-collider in H, and a and b are adjacent then add an arrowhead at a to the (a, b) edge in H: a ≺−−? b. (S5) If either of the following hold in H: 19

(S5i) There is an unshielded non-collider ha, b, ci in G, and there is an arrowhead at b on the (a, b) edge in H. (S5ii) hx, q1 , . . . qp ≡ a, b, ci forms a discriminating path for a in H and ha, b, ci forms a non-collider in G then perform the following orientations: (S5a) Add an arrowhead at c on the (b, c) edge in H: b −−−Â c. (S5b) For every vertex z adjacent to b and c, such that there is an arrowhead at b on the (b, z) edge in H add an arrowhead at c on the (c, z) edge: z ?−−Â c. (S5c) For every vertex z adjacent to b and c such that there is an arrowhead at z on the (c, z) edge add an arrowhead at z on the (b, z) edge: z ≺−−? b. (S6) Iterate steps (S3) to (S5) until no further arrowheads are added.

4.1

Soundness of Orientation Rules

Theorem 4.1 The orientation rules given above are sound. Proof: In order to prove that the orientations rules are sound, we must show that the arrowheads introduced by the orientation rules are present in every graph in the equivalence class (i.e. invariant), and thus are present in sup[G]. We first show that the arrowheads introduced by steps (S1) and (S2) are invariant. We then use an induction argument to show that the arrowheads introduced by steps (3) to (6) are invariant. Base case: By the definition of Markov equivalence, all graphs in an equivalence class have the same set of adjacencies and unshielded colliders. So

20

steps (S1) and (S2) are sound. Inductive case: Assume that after k iterations of applying steps (S3) to (S5), the arrowheads in H are invariant. We now show that the next iteration of steps (S3) to (S5) introduce to H arrowheads that are invariant in [G] . (S3): If π = hx, q1 , . . . , qp , b, yi forms a discriminating path for b in H then, by the inductive hypothesis, the analogous path also discriminates b in every member of the equivalence class. By Lemma 3.1, hqp , b, yi is a collider in every member of the equivalence class if and only if hqp , b, yi is a collider in G. Hence step (S3) is sound. (S4): If hu, a, vi is an unshielded collider, then by Markov equivalence, hu, a, vi forms an unshielded collider in every equivalence class member. Similarly, hu, b, vi is an unshielded non-collider in every equivalence class member. Hence, by Lemma 2.2, a ≺−−? b in every equivalence class member and step (S4) is sound. (S5): Conditions (S5i) and (S5ii) are required to show that step (S5a) is valid and that the b −−−Â c edge is invariant in [G]. Steps (S5b) and (S5c) follow as a consequence of the invariance of the b −−−Â c edge. We consider each sub-step separately. (S5a) To show that edge b −−−Â c is invariant and hence, in sup[G]. (S5i) By Markov equivalence, ha, b, ci is an unshielded non-collider in every equivalence class member. If a ?−−Â b ?−−? c in H, then by the inductive hypothesis this arrowhead is invariant in [G]. Further, since ha, b, ci is a non-collider, a ?−−Â b −−−Â c in every equivalence class member. Consequently, b −−−Â c in sup[G]. (S5ii) If π = hx, q1 , . . . , qp , b, ci forms a discriminating path for b in H then, by the inductive hypothesis the analogous path 21

also discriminates b in every member of the equivalence class. Further, if hqp , b, ci is a non-collider in G, then by Corollary 3.1, b −−−Â c in every member of the equivalence class. So if conditions (S5i) or (S5ii) are met, then b −−−Â c in every member of the equivalence class and step (S5a) is sound. Suppose there is some vertex z which is adjacent to b and to c. We now show that steps (S5b) and (S5c) are sound. (S5b) z ?−−Â b −−−Â c ?−−? z in H. By the soundness of steps (S1) to (S5a) the arrowheads at b and at c are invariant in [G]. Hence, by Lemma 2.1 c ≺−−? z in every equivalence class member. (S5c) b −−−Â c ?−−Â z ?−−? b in H. By the soundness of steps (S1) to (S5a) the arrowheads at b and at z are invariant in [G]. Hence, by Lemma 2.1 z ≺−−? b in every equivalence class member. (S6): We have shown that steps (S1) and (S2) are sound, so all arrowheads in H after applying steps (S1) and (S2) are invariant. Furthermore, the arrowheads introduced by each subsequent step are invariant in [G]. Hence the arrowheads in H introduced by step (S6) are invariant. Since all arrowheads introduced by the orientation procedure are invariant in [G], the graph resulting from joining the entire equivalence class will also contain these arrowheads. Hence sup[G] contains all the arrowheads introduced by the orientation procedure and the procedure is sound. 2

4.2

Balanced Triangles

The following characterization of triangles is central to showing that the orientation rules presented above are arrowhead complete (see Theorem 4.2).

22

Definition 4.1 (Balanced) A triangle with vertex set {x, y, z} is said to be balanced at x if one of the following holds: (i) y ?−−Â x ≺−−? z ?−−? y (ii) y ?−−− x −−−? z ?−−? y (iii) y ?−−Â x −−−? z ≺−−? y In summary, the triangle is balanced at x if the edge ends at x are of the same type (arrowhead or tail); if the edge ends differ, then the triangle is balanced if (iii) holds. If every vertex in a triangle is balanced, then the triangle is said to be balanced. Further, a graph containing directed, undirected and bi-directed edges will be said to be balanced if every triangle in the graph is balanced. It can easily be verified that ancestral graphs are balanced. Lemma 4.1 The graph H produced by orientation rules (S1) to (S6) is balanced. Proof: Suppose that H is not balanced. Then there is some triangle hx, y, zi in H in which x ?−−Â y −−−? z −−−? x The proof proceeds by considering the first arrowhead x ?−−Â y introduced by the orientation procedure into a triangle hx, y, zi which remains unbalanced after the procedure has completed. We will show that each step of the procedure either could not have introduced an arrowhead into a triangle which remained unbalanced, or that the supposition that it did implies that there was already an arrowhead that had been introduced earlier into a triangle which remained unbalanced, which is also a contradiction.

23

(S2) Suppose that there is an unshielded collider x ?−−Â y ≺−−? t, where hx, y, zi is a triangle which is unbalanced after the procedure completes. Note that z 6= t since y ≺−−? t, but y −−−? z by supposition. Further, z and t are adjacent since otherwise we have y −−−Â z by step (S5a) and further x ?−−Â z by step (S5b). If hx, z, ti forms an unshielded collider then hx, y, zi is balanced at y which is a contradiction. If hx, z, ti forms an unshielded non-collider then we have y ≺−−− z by step (S4), again a contradiction. (S3) Suppose that hq0 , q1 , . . . , qp−1 , qp ≡ x0 , y, x1 i forms a discriminating path for y in H, and that hx0 , y, x1 i forms a collider in G. Further suppose that one of the arrowheads introduced at y in H leads to an unbalanced triangle hxi , y, zi with some vertex adjacent to y and xi for some i. By definition q0 is not adjacent to x1 , so z 6= q0 . Also by definition, qi −−−Â x1 for 1 ≤ i ≤ p and x0 ≺−−Â y ≺−−? x1 ≺−−− x0 in H. Note that z is adjacent to both x0 and x1 because otherwise we have y −−−Â z by step (S5a) and xj ?−−Â z by step (S5b), which is a contradiction. It follows that z 6= qi 1 < i ≤ p: hq0 , q1 , . . . , qi−1 , qi ≡ z, x1 i forms a discriminating path for qi with qi −−−Â x1 and by step (S5c) z ?−−Â y, which is a contradiction (by assumption z ?−−− y). Thus far we have that x0 −−−Â x1 ?−−? z ?−−? x0 ; and by assumption, z ?−−− y and either x0 ?−−− z or x1 ?−−− z or both in H. Claim I: If any of the vertices {q1 , . . . , qp−1 , qp } are adjacent to z then there is no arrowhead at qi on the (qi , z) edge (i.e. qi −−−? z). If there is an edge qi ≺−−? z in H then π = hq0 , . . . , qi , z, x1 i forms a discriminating path for z. If z is a non-collider on π, then by step

24

(S5a) z −−−Â x1 and z ?−−Â y by step (S5c), which is a contradiction. If z is a collider on π then we have x1 ?−−Â z by step (S3) and x0 ?−−Â z by step (S5c), which is a contradiction. Claim II: qi is adjacent to z for all i, 0 ≤ i ≤ p. Suppose that the claim is false. Let i be the greatest j such that qj is not adjacent to z. We have already shown that qp ≡ x0 is adjacent to z, so i < p. Since qi is not adjacent to z, but qj is adjacent to z for i < j ≤ p, by Claim I we have qj −−−? z. By step (S5i)(S5a) qi+1 −−−Â z and by step (S5ii)(S5a) qj ≺−−− z for (i + 1) < j ≤ p. Hence hqi , . . . , qp , y, zi forms a discriminating path for y, on which y is a non-collider. Hence we have y −−−Â z by step (S5a), and further xk ?−−Â z by step (S5b) (since xk ?−−Â y) for k = 0, 1. Thus the triangles hxk , y, zi are both balanced at y, which is a contradiction. By Claim I and Claim II, qi −−−? z, 1 ≤ i ≤ p and q0 is adjacent to z. Hence hq0 , z, x1 i forms an unshielded triple. If hq0 , z, x1 i is a collider then hx1 , y, zi is balanced at y. Further, since hq0 , q1 , . . . , qp−1 , x0 , x1 i discriminates hqp−1 , x0 , x1 i to be a noncollider, by step (S5a) qp −−−Â x1 , and by step (S5c) we have x0 ≡ qp ?−−Â z (since x1 ?−−Â z). So triangle hx0 , y, zi is also balanced, which is a contradiction. Hence hq0 , z, x1 i is a non-collider. Note that hq0 , q1 , . . . , qp−1 , x0 , x1 i discriminates hqp−1 , x0 , x1 i to be an unshielded non-collider, we have y ≺−−Â x1 in H. If q0 ?−−Â z in H then z −−−Â x1 by step (S5a), and further z ?−−Â y by step (S5c), which is a contradiction. Hence we have q0 ?−−− z in H. But then triangle hq0 , q1 , zi is unbalanced at q1 , and the

25

arrowhead at q1 was present before the arrowheads introduced at y in step (S3). Step (4) Suppose that ha0 , x, a1 i is an unshielded non-collider, and ha0 , y, a1 i is an unshielded collider, and that there is some vertex z forming a triangle hx, y, zi which is unbalanced at y in H. Note that a0 6= z 6= a1 . Further, z is adjacent to each vertex ai : else hai , y, zi forms an unshielded non-collider; by step (S5a) we have y −−−Â z and hence x ?−−Â z by (S5b), which is a contradiction. If ha0 , z, a1 i forms an unshielded collider then x ?−−Â z by step (S4), which is a contradiction. If ha0 , z, a1 i forms an unshielded non-collider then y ≺−−? z, by step (S4), again a contradiction. (S5a) We consider the two subcases separately: (S5i) Suppose that ha, x, yi forms an unshielded non-collider, and there is an arrowhead at x on the (a, x) edge in H. If a and z are not adjacent then ha, x, zi forms an unshielded triple. If it is a non-collider, then x −−−Â z in H by step (S5a). If ha, x, zi forms a collider then we have z ?−−Â y by step (S5b). In either case, the triangle hx, y, zi is balanced at y, which is a contradiction. Consequently a and z are adjacent in H. Now consider the triangle ha, x, zi. If we have x ≺−−? z in H, then z ?−−Â y by step (S5b). Similarly if we have a ?−−Â z then z −−−Â y by step (S5a), and x ≺−−? z by step (S5b). In both cases, the triangle hx, y, zi is balanced at y, which is a contradiction. Consequently, we have x −−−? z and a ?−−− z in H. But then triangle ha, x, zi is unbalanced at x and this arrowhead was introduced before the arrowhead at y on edge (x, y). (ii) Suppose that hq0 , . . . , qp , x, yi forms a discriminating path for x 26

and hqp , x, yi forms a non-collider. Then qp −−−Â y in H. If qp and z are not adjacent then hqp , y, zi forms an unshielded non-collider; y −−−Â z by step (S5a); and thus x ?−−Â z by step (S5b), which is a contradiction. Thus qp and z are adjacent.

If qp ≺−−? z in H then hq0 , . . . , qp , z, yi forms a discriminating path for z. If hqp , z, yi is a collider then, since hqp , x, yi is discriminated to be a non-collider, by step (S5c) we have x ?−−Â z. If hqp , z, yi is a non-collider then by step (S5a) we have z ?−−Â y in H. In both cases triangle hx, y, zi is balanced at y, which is a contradiction. Hence qp −−−? z in H. But in this case hqp , x, zi forms an unbalanced triangle at qp and the arrowhead on the (qp , x) edge was present before the arrowhead at y was added. (S5b) Let b and y be vertices satisfying either (i) or (ii) in the antecedent of step 5, i.e. either there is an unshielded non-collider ha, b, yi, or there is a discriminating path for b terminating at y on which b is a non-collider. Suppose that x ?−−Â b, and hence we have x ?−−Â y by step (S5b). Further suppose there is a vertex z forming a triangle with x and y that is unbalanced at y. Note that z 6= b since x ?−−Â b, but x ?−−− z by hypothesis. If b and z are not adjacent then hb, y, zi forms an unshielded non-collider. Since b −−−Â y, we have y −−−Â z in H by step (S5a), and further x ?−−Â z by step (S5b), which is a contradiction. Hence b and z are adjacent. If b ≺−−? z in the completed graph then y ≺−−? z by step (S5b) since b −−−Â y by step (S5a). Thus b −−−? z in H. But now hb, x, zi forms a triangle which is unbalanced at b, but the arrow at b on the (b, x) edge was present before the arrow at y was introduced.

27

(S5c) Let x and b be vertices satisfying either (i) or (ii) in the antecedent of step (S5), i.e. either there is an unshielded non-collider ha, x, bi with an arrowhead at x on the (a, x) edge, or there is some discriminating path for x, terminating at b, on which x is a non-collider. Further suppose that b ?−−Â y, and there is a triangle hx, y, zi. By rule (S5c) we have x ?−−Â y, and again we suppose that hx, y, zi is unbalanced at y. Note that b 6= z since we have x −−−Â b but x ?−−− z. If b and z are not adjacent then b ?−−Â y −−−? z forms an unshielded non-collider. Hence by step (S5a) we have y −−−Â z and by step (S5b) x ?−−Â z, which is a contradiction. Hence b and z are adjacent. If b ?−−Â z in H then x ?−−Â z by step (S5c) since x −−−Â b by step (S5a). Thus b ?−−− z in H. But now hb, y, zi forms an unbalanced triangle at y, and the arrowhead at y on the (b, y) edge was present before the arrowhead was introduced on the (x, y) edge by step (S5c). 2

The balanced property of the graph H is powerful in that it helps us to characterize the graph output by the orientation rules. The analogy of the Lemma 2.1 for H is: Corollary 4.1 Let H be the graph produced by the orientation rules. If in H either i) a −−−Â b −−−− c, or ii) a ↔ b −−−− c, then a −−−Â c or a ↔ c respectively. Proof: a is adjacent to c else edge b −−−− c contradicts Step (S5i)(a) of the orientation rules. By two applications of Lemma 4.1 we can complete the proof: the triangle is balanced at b hence a ?−−Â c in H; similarly, the triangle is balanced at b hence a ↔ c if and only if a ↔ b in H. 2 Corollary 4.1 states that the endpoints of an undirected edge in H, that was not undirected in the maximal ancestral graph G that gave rise to H, 28

share the same parents and spouses. Corollaries 4.2 and 4.3 will be instrumental in proving Theorem 4.2. Before stating and proving these results, we make the following definitions. Definition 4.2 Let π = hx0 , x1 , . . . , xn i be a path in a graph. Any vertex xj is non-consecutive to xi , 1 ≤ i ≤ n if |j − i| > 1, provided such a xj exists. Definition 4.3 Let inducing path π = hq0 , . . . , qn i and a0 be the first vertex after qi on the directed path from qi to an endpoint, for some 1 < i < n. Then π is said to be a concise inducing path if: (i) a0 is adjacent to a vertex qj , 1 ≤ j < i; (ii) a0 is adjacent to a vertex qk i < k ≤ n; and (iii) hq0 , . . . qj , a0 , qk , . . . qn i forms an inducing path. In other words, if π is a concise inducing path then it is not possible to form a new inducing path with a shorter directed path from the collider to an endpoint via the replacement described. It is easy to see that if there is an inducing path between q0 and qn then there is a concise inducing path. A directed path is said to be minimal if no subsequence of vertices with the same endpoints forms a directed path. Proposition 4.1 If π is an inducing path between q0 and qn then there exists a path π ∗ that is both a minimal and concise inducing path. Corollary 4.2 Let H be the graph produced by the orientation rules. Suppose H∗ , the induced subgraph of obtained by removing the undirected edges from H, forms an ancestral graph. Then H∗ is maximal.

29

Proof: Suppose for a contradiction that H∗ contains an inducing path with non-adjacent endpoints. Let π = hq0 , q1 , ..., qn i be a concise minimal inducing path in H∗ . Then π is a collider path; q1 is an ancestor of qn ; qn−1 is an ancestor of q1 ; and every other non-endpoint is an ancestor of an endpoint. Note that by Theorem 4.1, every edge on the collider path π is invariant. Since H∗ is ancestral, n ≥ 2 (otherwise q0 ?−−Â q1 −−−Â · · · −−−Â q0 or q2 ?−−Â q1 −−−Â · · · −−−Â q2 would violate the ancestral assumption). Claim A : If qi −−−? qj , |i − j| > 1, in H then qi −−−Â qj , and this edge is invariant. Suppose for a contradiction that qi−1 ?−−Â qi −−−− qj . Without loss of generality, suppose i < j ≤ n. By Corollary 4.1 qi−1 ?−−Â qj . Since π is minimal, i 6= 1. Therefore qi−1 ≺−−Â qi −−−− qj , and by Corollary 4.1 qi−1 ≺−−Â qj . But then hq0 , . . . , qi−1 , qj , . . . , qn i violates the minimalty of π. Hence qi −−−Â qj in H and hence in H∗ . Suppose for a contradiction that the edge qi −−−Â qj is not invariant. Then qi−1 is adjacent to qj (to shield non-colider hqi−1 , qi , qj i) and by Lemma 4.1 qi−1 ?−−Â qj . By the minimality of π, qi−1 ≺−−Â qj is ruled out. Hence qi−1 −−−Â qj . But then either hq0 , . . . , qi−1 , qi , qj i forms a discriminating path with order for {qi−1 , qi , qj }; or q0 −−−Â qj and hq0 , qj , . . . , qn i is more minimal than π, both of which are contradicitons. The case 0 ≤ j < i < n uses a symmetric argument. Let hqi , a0 , a1 , . . . am = qt i for some 1 ≤ i < n, and t = 0, n, be the minimal directed path from vertex qi to an endpoint. Claim B : The edge qi −−−Â a0 is invariant. Suppose for a contradiction that the edge qi −−−Â a0 is not invariant. By Claim A, we have a0 6= qj , j 6= i. Recall that n ≥ 2. There are three subcases to consider: 30

(i) Suppose i = 1 and edge q1 −−−Â a0 is invariant. Then non-colliders q0 ?−−Â q1 −−−Â a0 } and q2 ≺−−Â q1 −−−Â a0 } are shielded, and by Lemma 4.1 q0 ?−−Â a0 ≺−−? q2 . Further, if we have a0 ≺−−Â q2 then hq0 , a0 , q2 , . . . , qn i violates the conciseness of π. Hence q2 −−−Â a0 . Consider the smallest j, 2 < j < n, such that qj is not a parent of a0 . Then qj is adjacent to a0 : otherwise hqj , qj−1 , . . . , q1 , a0 i forms a discriminating path for hq2 , q1 , a0 i. By Lemma 4.1 qj ?−−Â a0 and by the minimality of π qj ≺−−Â a0 is ruled out. It follows that qj −−−Â a0 for 2 < j < n and qn ?−−Â . But then hq0 , a0 , qn i violates the conciseness of π. Hence edge q1 −−−Â a0 is invariant. (ii) The case where i = n−1 is symmetric to (i), and edge qn−1 −−−Â a0 is invariant. (iii) Suppose 1 < i < (n − 1), and edge qi −−−Â a0 is not invariant. Then the non-colliders qi−1 ?−−Â qi −−−Â a0 and qi+1 ≺−−Â qi −−−Â a0 are shielded, and by Lemma 4.1 qi−1 ?−−Â a0 ≺−−? qi+1 . If qi+1 ≺−−Â a0 then qi+2 is adjacent to a0 (else hqi+2 , qi+1 , qi , a0 i forms a discriminating path for a0 ). Let k be the largest r in a sequence such that qr −−−Â a0 , i ≤ r < k and either qk ≺−−Â a0 or qn −−−Â a0 . By a symmetric argument, there exists a j that is the smallest s in a sequence such that qs ≺−−Â a0 , j < s ≤ i and either qj ≺−−Â a0 or q1 −−−Â a0 . But all of these cases: q0 ?−−Â a0 ≺−−? qn , q0 ?−−Â a0 ≺−−Â qk , qj ≺−−Â a0 ≺−−? qn , and qj ≺−−Â a0 ≺−−Â qk violate the conciseness of π, which is a contradiction. Since all cases lead some contradiction, the edge qi −−−Â a0 is invariant. Claim C : Every edge on the minimal directed path δ i = hqi , a0 , a1 , . . . am = qt i for some 1 ≤ i < n, and t = 0, n is invariant. 31

Suppose for a contradiction that this is not the case. B, edge qi −−−Â a0 is invariant.

By Claim

Let j be the smallest r such that

edge ar −−−Â ar+1 is not invariant for 0 ≤ r < m. Then non-collider qj−1 −−−Â qj −−−Â qj+1 is shielded and by Corollary 4.1 qj−1 −−−Â qj+1 , but then the path hqi , a0 , . . . , qj−1 , qj+1 , . . . , qt i violates the minimality of the directed path δ. So, by an inductive argument on the edges along δ i , we find that every edge is invariant. Since all the edges along δ i are invariant for every 0 < i < n, and the edges along collider path π are invariant, π forms an inducing path in the maxmial ancestral graph G that gave rise to H. Further, since G is maximal, but q0 and qn are not adjacent in H∗ , we have q0 −−−− qn in H. In particular, there is some path qn ?−−Â qn−1 −−−Â a0 . . . −−−Â am −−−Â q0 −−−− qn in H. By Corollary 4.1 am−1 −−−Â q0 . By induction we have ak −−−Â q0 for 0 ≤ k ≤ m, and qn−1 −−−Â q0 . But then triangle qn ?−−Â qn−1 −−−Â q1 −−−− qn is not balanced at qn−1 , which is a contradiction. Hence H∗ is maximal. 2

Corollary 4.3 Let H be the graph produced by the orientation rules. Then no replacement of the undirected edges in H by directed edges will give rise to an inducing path with non-adjacent endpoints, that includes an edge that was oriented by the orientation rules. Proof: Let H∗ be a graph obtained by replacing the undirected edges in H by directed edges, and suppose H∗ contains an inducing path π with non-adjacent endpoints. Let π ∗ = hq0 , q1 , ..., qn i be the most concise minimal such path. Since we focus on replacing undirected edges by only directed edges, the collider path hq0 , q1 , ..., qn i was oriented by the orientation rules. Label the minimal directed path from a vertex qi to an endpoint as δ i , 1 ≤ i < n. Then there is at least one edge along some path 32

δ i = hqi , a0 , a1 , . . . , am−1 , am = qt i for t = 0, n, that was not oriented by the rules, i.e. is udirected in H. Suppose, for a contradiction, that aj−1 −−−− aj in H for some 0 ≤ j ≤ m, a−1 = qi , for some 1 ≤ i < n. j=0: If qi −−−− a0 in H, then since qi−1 ?−−Â qi ≺−−? qi+1 with qi −−−Â a0 , we have that qi−1 and qi+1 are adjacent. Further, by Lemma 4.1 qi−1 ?−−Â a0 and qi+1 ?−−Â a0 . If n = 2, then we have q0 ?−−Â a0 ≺−−? q2 which violates the conciseness of the path π ∗ . If n ≥ 3 then there are three cases to consider: (a) i = 1. By Corollary 4.1 q0 ?−−Â a0 ≺−−Â q2 (since a0 −−−− q1 ≺−−Â q2 in H), which violates the conciseness of π ∗ . (b) i = n − 1. By Corollary 4.1, we have qn ?−−Â a0 ≺−−Â qn−2 (since a0 −−−− qn−1 ≺−−Â qn−2 in H), which violates the conciseness of π ∗ . (c) 1 < i < n − 1. Since qi−1 ≺−−Â qi ≺−−Â qi+1 in H, and qi −−−− a0 , by Corollary 4.1 qi−1 ≺−−Â a0 ≺−−Â qi+1 . But then hq0 . . . , qi−1 , a0 , qi+1 , . . . , qn i violates the conciseness of π ∗ . Hence edge qi −−−Â a0 is in H. Consider the smallest r in sequence such that ar−1 −−−− ar in H, for 0 ≤ r ≤ m. By the above argument, r 6= 0. By assumption, we have ar−2 −−−Â ar−1 −−−− ar , and by Corollary 4.1 ar−2 −−−Â ar , which violates the minimality of the directed path δ i . Hence, by induction, we find that no edge aj−1 −−−Â aj is undirected in H, which is a contradiction. 2

4.3

Orientation Rules are Arrowhead Complete

Another key concept required to prove that the orientation rules are complete involves defining an order on the variables in a graph. A graph is 33

chordal if and only if every cycle over four or more vertices has an edge between two non-adjacent vertices, i.e. has a chord. A partial order (≺) for a graph induces an orientation such that if x ≺ y, then there is no directed path from y to x (y is not an ancestor of x). For an undirected graph U, a total order induces an orientation such that for {x, y} adjacent, x −−−Â y if and only if x ≺ y. Let Uα be the induced directed graph obtained by a total order α. Then we say that U has a consistent order α if and only if Uα has no unshielded colliders. We will make use of the following results for undirected graphs, presented by Meek (1995). Lemma 4.2 For undirected graphs, only chordal graphs have consistent orderings. Lemma 4.3 (Orienting chordal graphs) Let U be an undirected chordal graph. For all pairs of adjacent vertices x and y in U there exist total orderings α and γ which are consistent with respect to U and such that x −−−Â y is in Uα and y −−−Â x is in Uγ . Theorem 4.2 The orientation rules are arrowhead complete. Proof: There are four steps to the proof. I. Removing the undirected edges in H leaves a disjoint union of maximal ancestral graphs. Let H∗ be the graph resulting from removing the undirected edges in H. Suppose for a contradiction that H∗ contains a nonancestral configuration. By construction, there are no undirected edges in H∗ . Hence configurations such as a ?−−Â b −−−− c or a ?−−Â b −−−− c −−−− d −−−Â a do not occur in H∗ . Then H∗ contains a partially directed k-cycle of the form q1 ?−−Â q2 −−−Â · · · −−−Â qk −−−Â q1 . Consider k = 3. By Lemma 4.1, there are no partially directed cycles involving only three vertices in H∗ . By induction, H∗ contains no partially 34

directed k-cycles: any ancestral graph G that gave rise to H∗ would include at least one collider in a k-cycle, hence at least one triple in the k-cycle is shielded in G. Suppose triple {qi−1 , qi , qi+1 } is shielded, 1 < i < k. By Lemma 4.1, qi−1 ?−−Â qi+1 in H, and thus in H∗ too. But then either hq1 , q2 , . . . , qi−1 , qi+1 , . . . , qk , q1 i or hqi−1 , qi , qi+1 i forms a shorter cycle in H∗ , which is a contradiction. Hence H∗ is ancestral (with no undirected edges) and by Corollary 4.2, H∗ is maximal. II. No replacement of the undirected edges in H to directed edges will give rise to a partially directed cycle, an unshielded collider, a collider with order, or an inducing path with non-adjacent endpoints, that includes an edge that was oriented by the orientation rules. By Corollary 4.1, no orientation of the undirected edges in H will give rise to a partially directed cycle or an unshielded collider. By Corollary 4.3, no orientation of the undirected edges in H will give rise to an inducing path with non-adjacent endpoints. Suppose for a contradiction that we can replace, by directed edges, the undirected edges in H such that triple {a, b, c} is discriminated to be a collider by path π = hq0 , q1 , ..., qp = a, b, ci. Call this graph H∗ . Then every non-endpoint of π is a collider oriented by the orientation rules, and a ≺−−Â b ≺−−Â c. Then there exists an i, 1 ≤ i ≤ p such that qi −−−− c in H. Since q0 ?−−Â q1 −−−? c is unshielded, by (S5i)(S5a) the rules would orient q1 −−−Â qn . Further, by (S5ii)(S5a) qi −−−Â qn (since hq0 , q1 , ..., qi−1 , qi , ci forms a discriminating path for a non-collider with qi−1 ≺−−Â qi −−−? c) for 1 ≤ i ≤ p, which is a contradiction. III. Let U be the induced undirected graph obtained by removing from H the directed and bi-directed edges, as well as all undirected edges that have no parents or spouses. Then U is a disjoint union of chordal undirected graphs. Suppose for a contradiction that U is not decomposable. Then all

35

total orderings of U lead to an unshielded collider. Let H∗ be the graph obtained by removing the undirected edges from H. By II., we know that no replacement of the undirected edges in H to directed edges gives rise to a collider with order or an inducing path with non-adjacent endpoints involving the edges oriented by the orientation rules. Consider U 0 , the subgraph of G analogous to U. By Theorem 4.1, U 0 does not contain any colliders with order. By the maximality of G, U 0 does not contain any inducing paths with non-adjacent endpoints. Let D be the skeleton of U 0 . Construct a total order for D as follows: remove all bi-directed edges from U 0 , which leaves a DAG. Find a total ordering compatible with this DAG and orient, as directed edges, the edges in D according to this ordering. Note that every arrowhead in D is present in G on the corresponding directed or bi-directed edge in U 0 . If U is not chordal, then D contains an unshielded collider. But this unshielded collider is also present in G, which is a contradiction. IV. By Lemma 4.3 there are at least two such orderings for every (x, y) edge in U: one in which x −−−Â y and another in which x ≺−−− y. Hence, H is maximally oriented and therefore the orientation rules are arrowhead complete. 2 In this section, we have shown that every arrowhead introduced by the orientation rules is invariant, and therefore is in sup[G]. We have also shown that for every tail in H, the graph constructed by rules, we can find a maximal ancestral graph G ∗ in the equivalence class for which the said tail in H is a tail in G ∗ .

4.4

sup[G] and the essential graph

If an equivalence class contains a DAG, is the joined graph for the entire class the same as the essential graph for the equivalence class for DAGs?

36

Theorem 4.3 Let D be a DAG with observed variables O and no latent variables. Let G be a maximal ancestral graph over the same observed variables. If D is Markov equivalent to G then the essential graph for D equals sup[G]. Proof: Let H be the graph resulting from applying the orientation procedure to G, and E be the graph resulting from applying Meek’s rules for DAGs to D. By Markov equivalence, D and G entail the same set of unshielded colliders. Hence the graph H, after applying steps (S1) and (S2) to G, is the same as E after applying steps (E1) and (E2) to D. By definition, DAG D contains no bi-directed edges. Consequently, (i) all discriminating paths in D discriminate non-colliders; and (ii) any discriminating path with order has order of at most one. By Markov equivalence, (i) and (ii) also apply to G. Hence step (S3) and condition (S5ii) do not apply to G. So step (S4) is equivalent to step (E5), step (S5a) is the same as step (E3), and steps (S5b) and (S5c) are equivalent to step (E4). Note that the orientation rules for maximal ancestral graphs are not sensitive to the order in which steps (S3) to (S5) are performed (since we iterate through them in step S(6). Consequently, the operations performed on G are equivalent to the steps performed on D. Hence the essential graph E = sup[G]. 2

5

Conclusions and Future Work

Ancestral graphs are a class of graphs that can represent the independence relations holding among the observed variables of a DAG model with latent and selection variables. Unfortunately, as with DAG models, there often are a number of ancestral graphs that can encode the same independence model. Joined graphs, which extract the arrowheads common to Markov equivalent ancestral graphs, allow one to associate a unique graph with each

37

ancestral graph equivalence class. In this paper we have presented a set of orientation rules such that, given a member of an equivalence class, one could construct the corresponding joined graph for the entire equivalence class (i.e. sup[G]). We have also shown that if an equivalence class contains a DAG, then the joined graph for the entire class the same as the essential graph for the equivalence class for DAGs. The proof of Theorem 4.2 shows that sup[G] can be decomposed into three subgraphs: one which is a maximal ancestral graph with no undirected edges, another is made up by a disjoint union of undirected chordal graphs, and the third subgraph is a non-chordal undirected graph (possibly empty). It also suggests a way to construct a member of the equivalence class with the minimum number of additional arrowheads. Drton and Richardson (2004a) presented an algorithm for fitting ancestral graphs. Their results, along with the above implications of the completeness proof form the framework for conducting an efficient equivalence class search across maximal ancestral graphs. The authors are in the process of developing a Markov equivalence class search procedure for maximal ancestral graphs. Acknowledgments The authors are thankful to Michael Perlman for his helpful comments and for posing the question of whether the essential graph for DAG D is equal to sup[G] if D ∈ [G]. This research was supported by the National University of Singapore.

References Ali, R. A. and T. Richardson (2002). Markov equivalence classes for maximal ancestral graphs. In A. Darwiche and N. Friedman (Eds.), Uncertainty in 38

Artificial Intelligence: Proceedings of the 18th Conference, San Francisco, pp. 1–8. Morgan Kaufmann. Ali, R. A. and T. Richardson (2004). Searching across markov equivalence classes of maximal ancestral graphs. In 2004 Proceedings of the American Statistical Association, Statistical Computing Section [CD-ROM], Toronto, ON. American Statistical Association. Ali, R. A., T. Richardson, and P. Spirtes (2004). Markov equivalence for ancestral graphs. Technical Report 466, Department of Statistics, University of Washington. Andersson, S. A., D. Madigan, and M. D. Perlman (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25, 505–541. Chickering, D. (1995). A transformational characterization of equivalent Bayesian networks. In P. Besnard and S. Hanks (Eds.), Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence, San Mateo, CA, pp. 87–98. Morgan Kaufmann. Chickering, D. (2002a). Learning equivalence classes of Bayesian network structures. Journal of Machine Learning Research 2, 445–498. Chickering, D. (2002b). Optimal structure identification with greedy search. Journal of Machine Learning Research 3, 507–554. Drton, M. and T. Richardson (2004a). Iterative conditional fitting for gaussian ancestral graph models. In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI-04), Arlington, Virginia, pp. 130–137. AUAI Press.

39

Drton, M. and T. Richardson (2004b). Simplicial sets and orientable edges in graphical models for marginal independence. Technical report, Department of Statistics, University of Washington. Frydenberg, M. (1990).

The chain graph Markov property.

Scand. J.

Statist. 17, 333–353. Meek, C. (1995). Causal inference and causal explanation with background knowledge. In P. Besnard and S. Hanks (Eds.), Uncertainty in Artificial Intelligence: Proceedings of the 11th Conference, San Francisco, pp. 403– 410. Morgan Kaufmann. Richardson, T. and P. Spirtes (2003). Causal inference via ancestral graph models. In P. Green, N. Hjort, and S. Richardson (Eds.), Highly Structured Stochastic Systems. Oxford: Oxford University Press. Richardson, T. S. and P. Spirtes (2002). Ancestral graph Markov models. Annals of Statistics 30 (4), 962–1030. Spirtes, P., C. Glymour, and R. Scheines (1993). Causation, Prediction and Search. Number 81 in Lecture Notes in Statistics. Springer-Verlag. Spirtes, P., C. Meek, and T. S. Richardson (1995). Causal inference in the presence of latent variables and selection bias. In P. Besnard and S. Hanks (Eds.), Uncertainty in Artificial Intelligence: Proceedings of the 11th Conference, San Francisco, pp. 403–410. Morgan Kaufmann. Spirtes, P., C. Meek, and T. S. Richardson (1999). Causal inference in the presence of latent variables and selection bias. In C. Glymour and G. F. Cooper (Eds.), Computation, Causality and Discovery, pp. 211– 252. Cambridge, Mass.: MIT Press.

40

Verma, T. and J. Pearl (1990). Equivalence and synthesis of causal models. In M. Henrion, R. Shachter, L. Kanal, and J. Lemmer (Eds.), Uncertainty in Artificial Intelligence: Proceedings of the 6th Conference, Mountain View, CA, pp. 220–227. Association for Uncertainty in AI. Verma, T. and J. Pearl (1991). Equivalence and synthesis of causal models. Technical Report R-150, Cognitive Systems Laboratory, UCLA.

41

Suggest Documents