
Statistics and Computing (1996) 6, 169-176

Testing implication of hierarchical log-linear models for probability distributions

F. M. MALVESTUTO

Department of Electrical Engineering, University of L'Aquila, 67040 Rojo Poggio (AQ), Italy

Received October 1992 and accepted September 1994

The problem of the logical implication between two hierarchical log-linear models is proved to be equivalent to the problem of deciding whether their generating classes satisfy the graph-theoretic condition of hinging. Moreover, a polynomial-time procedure is worked out to test implication.

Keywords: Probabilistic dependency models, maximum-entropy distribution, implication problem, intersection graph

0960-3174 © 1996 Chapman & Hall

1. Introduction

Consider a probability distribution of a set of discrete variables and suppose we are interested in searching for dependency models. The problem of model search has several applications in statistics, e.g. for analysis of categorical data (Bishop et al., 1975; Whittaker, 1990) and for approximation to empirical probability distributions (Brown, 1959; Chow and Liu, 1968; Ku and Kullback, 1969; Lewis, 1959; Malvestuto, 1991). Sometimes, dependency models can be inferred on the basis of conceptual relationships among variables (Pearl, 1988); however, it is often the case that such relationships are not known beforehand and dependency models are searched for with an exploratory analysis using time-consuming numerical tests.

Now, suppose that we are given a probability distribution and that, on the basis of results of previous numerical tests, we know that certain dependency models hold (e.g. some groups of variables are mutually or conditionally independent); then, we are interested in inferring new dependency models on logical grounds, that is, without resorting to additional numerical tests. More precisely, we focus on the logical question whether a dependency model is or is not a logical consequence of another dependency model (implication problem). If we are able to answer this question, then we can speed up the search for models, since only a suitable subset of models needs to be tested and the remaining ones will be accepted or rejected on logical grounds. Other computational applications of the implication problem are discussed in Section 5.

At present, efficient model-search procedures for some classes of hierarchical log-linear models have been worked out (Edwards and Havranek, 1985; 1987; Havranek, 1984; 1988; 1990a; 1990b; Malvestuto, 1992a), based on sound (and sometimes complete) axiomatizations (Pearl, 1988; Studeny, 1989; 1992; Geiger et al., 1991; Havranek, 1991; Geiger and Pearl, 1993; Malvestuto, 1992a; 1994; Malvestuto and Studeny, 1992). In this paper, we show that the implication problem for hierarchical log-linear models can be solved using a polynomial-time algorithm. To this end, we introduce some basic definitions in Section 2 and a graph representation of generating classes of hierarchical log-linear models in Section 3. In Section 4 we present a graph algorithm to solve the implication problem, whose correctness is rigorously proved in the Appendix. Finally, in Section 5 we discuss further applications of the implication problem.

2. Basic definitions

Let V be a finite set of discrete variables with associated finite value sets, called domains. A V-tuple, denoted by v, is an element of the Cartesian product of the domains of the variables in V; moreover, the subtuple of v consisting of exactly the values of the variables belonging to a given non-empty subset X of V is called the projection of v on X. Now let $S = \{X_1, \ldots, X_s\}$ ($s \geq 1$) be a non-redundant set class over a non-empty (possibly improper) subset X of V (that is, $X = \bigcup_i X_i$ and no set in S is included in another set). The (hierarchical log-linear) model on X generated by S is a sentence which states that the variables in X are 'separable' according to S, which is called the generating class of the model and whose members are called the generators of the model. More precisely, we say that the model generated by S holds for a probability distribution p on V if there exist non-negative functions $\phi_1(X_1), \ldots, \phi_s(X_s)$ such that for all x

$$p(x) = \phi_1(x_1) \times \cdots \times \phi_s(x_s)$$

where p(X) is the marginal of p on X and $x_i$ denotes the projection of x on $X_i$; in equivalent terms (see, for example, Asmussen and Edwards, 1983), the model generated by S holds for p if there exist functions $\lambda_1(X_1), \ldots, \lambda_s(X_s)$ taking their values from the interval $[-\infty, +\infty)$ such that for all x

$$\log p(x) = \lambda_1(x_1) + \cdots + \lambda_s(x_s),$$

with the convention that $\log 0 = -\infty$. The model generated by S is said to be marginal or full depending on whether X is a proper subset of V or X = V. Moreover, a full or marginal model is trivial if it has exactly one generator.

Finally, let $\alpha$ and $\beta$ be two models on the same set of variables; we say $\alpha$ is simpler than $\beta$ (or, equivalently, $\beta$ is more general than $\alpha$) if each generator of $\alpha$ is a subset of some generator of $\beta$. This relation between models on the same set of variables defines a partial ordering and the set of all models on a set of variables X is a lattice L(X) whose zero is the model generated by the point partition of X and whose unit is the trivial model on X. Special families of models (independence models, conditional independence models, totally decomposable models, decomposable models and graphical models) have been defined by making assumptions on generating classes (Malvestuto, 1992b).

A probability distribution for which a model $\alpha$ holds is said to satisfy $\alpha$. Notice that every probability distribution satisfies the trivial model on every subset of V. Now, given two (full or marginal) models $\alpha$ and $\beta$, we say that $\alpha$ logically implies $\beta$ (denoted by $\alpha \Rightarrow \beta$) or, equivalently, that $\beta$ is a logical consequence of $\alpha$, if every probability distribution that satisfies $\alpha$ also satisfies $\beta$. Notice that the implication $\alpha \Rightarrow \beta$ always holds if $\beta$ is any saturated model, or $\beta$ is more general than $\alpha$. On the other hand, if $\alpha$ is a marginal model and $\beta$ is a full non-saturated model, then the implication does not hold. We shall prove (see Theorem 2 in the Appendix) that, given a full model $\alpha$ and a non-empty set of variables X, the family $L_\alpha(X)$ of the models on X that are implied by $\alpha$ is a sublattice of L(X).
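The 'simpler than' relation defined above reduces to a subset test between generators. The following is a minimal sketch of that test, assuming that each variable is named by a single character so that a generator such as AB can be written as the string "AB"; the function names `as_class` and `is_simpler` are hypothetical and do not come from the paper.

```python
def as_class(generators):
    """Turn an iterable of generators (strings of one-letter variables)
    into a set of frozensets. Assumption: one character per variable."""
    return {frozenset(g) for g in generators}


def is_simpler(alpha, beta):
    """True if each generator of alpha is a subset of some generator of beta."""
    a, b = as_class(alpha), as_class(beta)
    return all(any(x <= y for y in b) for x in a)


# The model generated by {AB, C} is simpler than the one generated by {AB, AC},
# but not the other way round (AC is contained in neither AB nor C).
print(is_simpler(["AB", "C"], ["AB", "AC"]))
print(is_simpler(["AB", "AC"], ["AB", "C"]))
```

Since the relation is only a partial ordering, both calls can return False for the same pair of models, as happens for {AB, AC} and {AB, BC}.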

Example 1 Let V = ABCD. Consider the full model $\alpha$ generated by {AB, CD} and the marginal model $\beta$ generated by {AB, AC}. It is easy to show that $\beta$ is a logical consequence of $\alpha$, that is, $\beta$ belongs to $L_\alpha(ABC)$. In fact, consider any probability distribution p(ABCD) that satisfies $\alpha$, that is, the variable sets AB and CD are independent in p(ABCD). Since $\alpha$ logically implies the model $\alpha'$ on ABC generated by {AB, C} (that is, AB and C are independent) and $\beta$ is more general than $\alpha'$, one has $\alpha \Rightarrow \alpha' \Rightarrow \beta$. With similar arguments we can determine $L_\alpha(ABC)$. Figure 1 shows the lattice of the classes over ABC that generate the models in $L_\alpha(ABC)$.

Fig. 1. The lattice of classes generating the models in $L_\alpha(ABC)$, with nodes {ABC}; {AB, AC, BC}; {AB, AC}; {AB, BC}; {AB, C}

Turning to the implication problem raised in Section 1, with the exception of trivial cases we state it in the following precise terms: given a full model $\alpha$ and a marginal or full model $\beta$, decide whether $\beta$ is or is not a logical consequence of $\alpha$.
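The chain of implications in Example 1 can be checked by marginalizing over D. The derivation below is only a sketch: lower-case letters denote values of the corresponding variables, and the particular choice of the functions $\phi_1$ and $\phi_2$ is one convenient assumption among many possible ones.

```latex
\begin{align*}
p(a,b,c,d) &= p(a,b)\,p(c,d)
  && \text{($\alpha$ holds: $AB$ and $CD$ are independent)} \\
p(a,b,c) &= \sum_{d} p(a,b)\,p(c,d) = p(a,b)\,p(c)
  && \text{($\alpha'$ holds: $AB$ and $C$ are independent)} \\
p(a,b,c) &= \phi_1(a,b)\,\phi_2(a,c),
  \quad \phi_1(a,b) = p(a,b),\ \phi_2(a,c) = p(c)
  && \text{($\beta$ holds with generators $AB$, $AC$)}
\end{align*}
```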

3. Graph representation of generating classes

Generating classes over a fixed set are partially ordered by the covering relation: a generating class S covers a generating class R if each member of R is a subset of some member of S. (Notice that the condition that S covers R is equivalent to the condition that the model generated by R is simpler than the model generated by S.) Given a generating class S and an ordering X(1), ..., X(s) of its members, with S we associate a graph G with node set $N = \{1, 2, \ldots, s\}$ such that node i is labelled by X(i) (for $i \in N$). Moreover, two nodes i and j of G are joined by edge (i, j) if and only if $X(i) \cap X(j) \neq \emptyset$, and the edge (i, j) is labelled by the (non-empty) variable set $X(i) \cap X(j)$. In what follows we refer to G as the intersection graph of S.

Fig. 2. The intersection graph of S = {AB, AC, CD}

Let G be the intersection graph of a generating class S, V(G) the union of labels of nodes of G and U a non-empty subset of V(G); the partial graph of G induced by U, denoted by $(G)_U$, is the graph obtained from G by deleting all edges whose labels are subsets of U. If G' is a connected component of $(G)_U$, the boundary of G' is the intersection of V(G') with U. By $\partial_U S$ we denote the set of the maximal (with respect to set inclusion) boundaries of the connected components of $(G)_U$; we call $\partial_U S$ the derivative of S with respect to U. Finally, we introduce the key notion we need to solve the implication problem raised in Section 1. We say that S hinges upon a generating class R over U or, equivalently, that R is a generalized hinge of S (Gyssens, 1985; Malvestuto, 1992b) if R covers $\partial_U S$.

Example 2 Consider the generating class S = {AB, AC, CD}. The intersection graph G of S is shown in Fig. 2. Let U = BCD. Figure 3 shows $(G)_U$. The boundaries of the two connected components of $(G)_U$ are BC and CD; so, we have that $\partial_U S = \{BC, CD\}$. Thus, for example, S does not hinge upon {B, CD}.

Remark 1 For each variable $A \in V(G) \setminus U$ there exists exactly one connected component G' of $(G)_U$ such that $A \in V(G')$.

Remark 2 If U = V(G), then the nodes of $(G)_U$ are all isolated; therefore, $\partial_U S = S$. Consequently, S hinges upon a generating class R over V(G) if and only if R covers S.

Let S be a generating class over V and U a non-empty (proper or improper) subset of V. Denote by $(S)_U$ the generating class over U obtained by taking the maximal (with respect to set inclusion) intersections of members of S with U. Generally speaking, if a member X of S is not a subset of U then, by Remark 1, $X \cap U$ is a subset of the boundary of one connected component of $(G)_U$; therefore, $\partial_U S$ always covers $(S)_U$. We say that U is convex ('closed' in Malvestuto, 1992b) in S if $(S)_U = \partial_U S$, that is, if the boundary of each connected component of $(G)_U$ is a subset of some member of S. It is easily seen that U is convex in S if and only if S hinges upon $(S)_U$. Notice that, by Remark 2, V is convex in S, and that each member of S is convex in S.

Example 2 (continued) Consider again the set U = BCD. Then the generating class $(S)_U$ is {B, CD} and, since $(S)_U \neq \partial_U S$, U is not convex in S. An example of a convex set in S is ABC.
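The definitions of this section can be tried out on Example 2 with a few lines of code. The sketch below is not the procedure of Section 4: it finds the connected components of the partial graph with a simple union-find structure, and it assumes one-letter variable names so that a set such as BCD can be written as the string "BCD". The function names `derivative`, `restriction` and `is_convex` are hypothetical.

```python
from itertools import combinations


def maximal(sets):
    """Keep only the sets that are maximal with respect to inclusion."""
    return {a for a in sets if not any(a < b for b in sets)}


def derivative(s, u):
    """Maximal boundaries of the connected components of the partial graph
    of the intersection graph of s induced by u."""
    members = [frozenset(x) for x in s]
    u = frozenset(u)
    parent = list(range(len(members)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # An edge survives only if its label is non-empty and not a subset of u.
    for i, j in combinations(range(len(members)), 2):
        label = members[i] & members[j]
        if label and not label <= u:
            parent[find(i)] = find(j)

    # Boundary of a component: union of its node labels, intersected with u.
    boundary = {}
    for i, x in enumerate(members):
        root = find(i)
        boundary[root] = boundary.get(root, frozenset()) | (x & u)
    return maximal(set(boundary.values()))


def restriction(s, u):
    """Maximal non-empty intersections of the members of s with u."""
    u = frozenset(u)
    return maximal({frozenset(x) & u for x in s} - {frozenset()})


def is_convex(s, u):
    return restriction(s, u) == derivative(s, u)


S = ["AB", "AC", "CD"]
print(sorted("".join(sorted(b)) for b in derivative(S, "BCD")))
print(sorted("".join(sorted(b)) for b in restriction(S, "BCD")))
print(is_convex(S, "BCD"), is_convex(S, "ABC"))
```

For S = {AB, AC, CD} and U = BCD the two printed classes are {BC, CD} and {B, CD}, in agreement with Example 2 and its continuation.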

4. The algorithm

In this section we present an algorithm which allows us to answer the question raised by the implication problem. Let S be a generating class over a variable set V and R a generating class over a non-empty subset U of V; the question is whether the model generated by R is or is not a logical consequence of the model generated by S. To answer this question we resort to the following result stated by Malvestuto (1992b): the model generated by R is a logical consequence of the model generated by S if and only if S hinges upon R. (The proof that appears in Malvestuto, 1992b, is very sketchy; a rigorous proof is stated in the Appendix.) Now, to decide whether S hinges upon R we resort to the very definition, which naturally leads to a polynomial-time algorithm. An efficient implementation can be obtained by representing S by its intersection graph G whose structure is specified by the adjacency lists of its nodes. Then, given the set U of variables appearing in R, it is easy to find the connected components of $(G)_U$ and test their boundaries for inclusion in members of R, as stated by the following procedure.

Hinging algorithm The connected components of $(G)_U$ are found with a traversal of G starting at vertex 1. Once a connected component of $(G)_U$ has been found, its boundary is tested for inclusion in some member of R. The set variables U, I and W are used to keep track respectively of variables appearing in R, of nodes 'visited' during the traversal of G and of the boundary of a connected component of $(G)_U$ when found; moreover, a queue Q is used to guide the traversal of $(G)_U$ according to a breadth-first search strategy.

Input The adjacency lists of the intersection graph of S, an ordering X(1),...,X(s) of members of S and an ordering Y(1),..., Y(r) of members of R.

Fig. 3. The partial graph of the intersection graph of Fig. 2 induced by BCD

Output The truth value of a Boolean variable success, which eventually is true if and only if S hinges upon R.


Initialization Set $U := \emptyset$; $U := U \cup Y(j)$ for $j = 1, \ldots, r$; $I := \{1\}$; $W := X(1) \cap U$; $Q := [1]$.

Procedure
1. Extract the first member of Q, say node i, and scan the adjacency list L(i) of node i. Suppose that, during the scan of L(i), node i' is reached; if $i' \notin I$ and $X(i) \cap X(i')$ is not a subset of U, then 'visit' node i', that is, insert i' into Q and set $W := W \cup (X(i') \cap U)$. Reach the next member of L(i), if any. After examining all members of L(i), if Q is not empty, then repeat step 1.
2. If W is a subset of no Y(j), 1
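The breadth-first traversal described above can be sketched in a few lines. The statement of step 2 is cut off in the text as reproduced here, so the sketch below fills in the rest from the definition of hinging rather than from the paper: it assumes that the test fails as soon as one boundary W is contained in no member of R, and that the traversal restarts from the next unvisited node otherwise. It also assumes one-letter variable names, marks a node as visited when it is inserted into the queue, and uses the hypothetical function name `hinges_upon`.

```python
from collections import deque


def hinges_upon(s, r):
    """Return True if generating class s hinges upon generating class r.

    Sketch only: the handling of each finished component is an assumption
    derived from the definition of hinging.
    """
    x = [frozenset(m) for m in s]      # X(1), ..., X(s), 0-indexed here
    y = [frozenset(m) for m in r]      # Y(1), ..., Y(r)
    u = frozenset().union(*y)          # U: variables appearing in R
    # Adjacency lists of the intersection graph of S.
    adj = [[j for j in range(len(x)) if j != i and x[i] & x[j]]
           for i in range(len(x))]
    visited = set()                    # the set I
    for start in range(len(x)):
        if start in visited:
            continue
        visited.add(start)
        w = x[start] & u               # W: boundary of the current component
        queue = deque([start])         # the queue Q
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                # Follow only edges whose label is not a subset of U.
                if j not in visited and not (x[i] & x[j]) <= u:
                    visited.add(j)
                    queue.append(j)
                    w = w | (x[j] & u)
        # Boundary test: W must be included in some member of R.
        if not any(w <= member for member in y):
            return False
    return True


S = ["AB", "AC", "CD"]
print(hinges_upon(S, ["BC", "CD"]))             # Example 2: R covers the derivative
print(hinges_upon(S, ["B", "CD"]))              # Example 2: BC fits in no member
print(hinges_upon(["AB", "CD"], ["AB", "AC"]))  # Example 1
```

Testing each boundary as soon as its component is complete makes it unnecessary to compute the maximal boundaries explicitly, since a boundary contained in another one passes the inclusion test whenever the larger one does.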
