The Annals of Statistics 1997, Vol. 25, No. 4, 1740-1761
CONDITIONALLY EXTERNALLY BAYESIAN POOLING OPERATORS IN CHAIN GRAPHS
BY ÁLVARO E. FARIA, JR.¹ AND JIM Q. SMITH
University of Warwick

We address the multivariate version of French's group decision problem where the m members of a group, who are jointly responsible for the decisions they should make, wish to combine their beliefs about the possible values of n random variables into the group consensus probability distribution. We shall assume the group has agreed on the structure of associations of variables in a problem, as might be represented by a commonly agreed partially complete chain graph (PCG) which we define in the paper. However, the members diverge about the actual conditional probability distributions for the variables in the common PCG. The combination algorithm we suggest they adopt is one which demands, at least on learning information which is common to the members and which preserves the originally agreed PCG structure, that the pools of conditional distributions associated with the PCG are externally Bayesian (EB). We propose a characterization for such conditionally EB (CEB) poolings which is more general and flexible than the characterization proposed by Genest, McConway and Schervish. In particular, such a generalization allows the weights attributed to the joint probability assessments of different individuals in the pool to differ across the distinct components of each joint density. We show that the group's commitment to being CEB on chain elements can be accomplished by the group being EB on the whole PCG when the group also agrees to perform the conditional poolings in an ordering compatible with evidence propagation in the graph.
1. Introduction. Bayesian statistical theory is a coherent normative methodology for the individual. However, groups of individuals are left in difficulty since no concept equivalent to the classical notion of objectivity is available to them. Consequently, methods arising either from the axiomatic or the Bayesian modelling approaches [Winkler (1986)] have been developed to attempt to deal with this problem. We adopt here the axiomatic approach to the aggregation of beliefs, where the group is assumed to choose certain "desirable" and "reasonable" conditions that the aggregating formula should obey and which in the end characterize its form. Early important works in this area are those of Madansky (1964, 1978), who proposed the external Bayesianity property (see Section 2) as a Bayesian prescription for group belief consensus. Later, Bacharach
Received August 1995; revised November 1996.
¹ Supported by Grant 200918/92.3 from CNPq, the Brazilian research council.
AMS 1991 subject classifications. Primary 62C10; secondary 62H99.
Key words and phrases. Bayesian multivariate decision analysis, expert systems, external Bayesianity, logarithmic opinion pools, chain graphs.
(1972, 1975) introduced the logarithmic opinion pool (LogOp), as well as some work on the implications axioms have on pooling rules. Genest (1984) and Genest, McConway and Schervish (1986) characterized the LogOps as the only EB pools. Also, McConway (1981) showed the linear opinion pool (LinOp) of Stone (1961) to be the only type which satisfies an axiom called the marginalization property. The reader should refer to Genest and Zidek (1986) for an excellent survey and annotated bibliography of the axiomatic approaches. Other important results in the area are the impossibility theorems obtained by Bacharach (1975), Dalkey (1972, 1975) and Lehrer and Wagner (1983), among others. In general, those impossibility results mean that no pooling rule for preferences or beliefs works sensibly well in all situations. Combining rules general enough to satisfy certain "desirable" axioms, on the other hand, can fail to be democratic. More closely related to our work, Genest and Wagner (1984) proved that only dictatorships could accommodate both the likelihood principle [see (3) in Section 2] and the independence preservation property (IPP) [Laddaga (1977)]. Basically, the IPP demands that the pooling of joint quantities equal the product of the poolings marginally in each quantity (in the same spirit as statistical independence for probability functions), whenever the individuals preserve the quantities' independence in their subjective assessments. Since the arrival of these results, there have been two lines of work which attempt to circumvent the impossibility theorems. One line has attempted to weaken group behavioral assumptions [e.g., Pill (1971)] in the hope that the impossibility results would vanish. This has had little success [Winkler (1968)]. The other line, to which this paper is a contribution, has looked at weakening the insistence on universal applicability.
Here, our assumption that the group agrees that a specific belief network is valid, and that it is examined only in the light of experiments which do not destroy its structure, prevents some of the impossibility results. In particular, the above-mentioned impossibility of Genest and Wagner (1984) is circumvented [Faria (1996)]. Despite the restrictions imposed on our problem, and thus on the pooling operators discussed here, they are of a general enough class to be useful in certain practical situations. For instance, it is easy to devise situations where a common belief net would be an obvious assumption. Moreover, the side condition of the decomposability of the common belief net is not critical since a nondecomposable net can always be made decomposable, as we shall see. The class of decomposable graphical models introduced by Lauritzen, Speed and Vijayan (1984) is the one which, under certain restrictions on the form of input data, retains its structure (coded in terms of conditional independence statements) after data has been observed. This fact is used extensively to create quick algorithms for calculating posterior distributions in high-dimensional problems [e.g., Dawid (1992), Jensen, Jensen and Dittmer (1994) and Smith and Papamichail (1996)]. However, here we use this same property to define classes of combination rules which are, in a partial sense, externally Bayesian.
In this text we concentrate on generalizing the external Bayesianity property so that on the one hand it is suitable for the analysis of multivariate structures, and on the other it allows the group to learn about the combination's vector of weights. This responds, at least in part, to some of the criticisms [Lindley (1985)] about the inflexibility of the weights in the well-discussed logarithmic pool [Genest and Zidek (1986)], which usually are not allowed to vary with the members' individual expertise. Some examples of how the weights can be updated based on the individuals' relative expertise on causal variables are given in Faria and Smith (1994, 1996) and in Faria (1996). In fact, the weights should be chosen on the basis of the common information that the members of the group have about their own specialities and expertise. A group which cannot, or is not prepared to, agree about the members' relative expertise may prefer to adopt a class of combination rules which sequentially adapts the weights based on each expert's relative predictive performance, as done, for example, by Smith and Makov's (1978) quasi-Bayes approach [see, e.g., Faria and Souza (1995) and Faria (1996)]. Other methods for appropriate updating of weights can be found, for example, in Bates and Granger (1969), Mendel and Sheridan (1987), Bayarri and DeGroot (1988), DeGroot and Mortera (1991) and Cooke (1991). Also, Cooke (1991) comments on the main problems in developing a theory of weights. Although developed for linear pools, there is in principle no reason why sequential methods should not be adapted for time series in our generalized logarithmic pools, where the weights can be functions of past data [see Faria (1996) for comments on this]. The externally Bayesian group has some advantages. For instance, Raiffa (1968)
illustrated how the relevance of the order in which pooling and updating are done can lead to subjects trying to increase their influence on the consensus by insisting that their opinions be computed before the outcome of an experiment is known. If a group is externally Bayesian (EB), then all such arguments would be pointless. Also, as Genest (1984) points out, if such a pool can be agreed, then it has great practical advantages. For example, its members need not meet again after data has been observed. However, the main reason we have chosen the external Bayesianity criterion on which to base this methodology is twofold. First, because conditional independence (CI) statements are at the heart of all belief networks, the types of multiplicative pools which result from external Bayesianity tend to preserve the CI structures which relate to chain graph (CG) associations, whereas other pools, like the linear opinion pools [McConway (1981)], do not. Second, using a loosened form of the external Bayesianity criterion with a CG structure allows a generalization of the modified logarithmic opinion pool (modified LogOp) of Genest, McConway and Schervish (1986) that answers some of the criticisms of its use. The structure of the paper is as follows. In Section 2 we review the external Bayesianity property and its logarithmic characterization through pooling operators. In Section 3 we define partially complete chain graphs (PCGs) and state the conditions under which their structure is preserved
after sampling. A definition of conditional external Bayesianity, together with its characterization, is proposed in Section 4. In Section 5 we show how CEB poolings in chain elements can characterize an EB pooling on the PCG. The discussion of the results and of the plausibility of our assumptions is given in Section 6. The Appendices contain the proofs of some of the stated theorems.

2. The EB logarithmic opinion pool. Let (Ω, μ) be a measure space. Let Δ be the class of all μ-measurable functions f: Ω → [0, ∞), f > 0 μ-a.e., such that ∫f dμ = 1. A pooling operator T: Δ^m → Δ is one which maps a vector of functions (f_1, …, f_m), where f_i ∈ Δ (for all i = 1, …, m), into a single function also in Δ. Madansky (1964, 1978) characterized the EB pooling operators as those synthesizing the diverging individual opinions f_1, …, f_m into a group opinion expressed by a single density f = T(f_1, …, f_m) that must satisfy
(1)   T( ℓf_1 / ∫ℓf_1 dμ, …, ℓf_m / ∫ℓf_m dμ ) = ℓT(f_1, …, f_m) / ∫ℓT(f_1, …, f_m) dμ,   μ-a.e.,
where ℓ: Ω → (0, ∞) is the group's common likelihood for (f_1, …, f_m) such that 0 < ∫ℓf_i dμ < ∞, i = 1, …, m. Genest, McConway and Schervish (1986) proved that the following modified logarithmic opinion pool is, under certain regularity conditions discussed later, the most general EB logarithmic pool satisfying condition (1) above:
(2)   T(f_1, …, f_m) = g ∏_{i=1}^m f_i^{w_i} / ∫ g ∏_{i=1}^m f_i^{w_i} dμ,   μ-a.e.,
where g: Ω → [0, ∞), g > 0 μ-a.e., is an arbitrary bounded function on Ω and w_i, i = 1, …, m, are arbitrary weights (not necessarily nonnegative) adding up to 1. The w_i's are the weights of the experts' opinions in the combination and must be suitably chosen to reflect relative expertise. They should possibly be elicited by the experts based on their common knowledge set [see, e.g., Geanakoplos (1992)] of their own relative predictive capabilities. One of the above-mentioned regularity conditions on (Ω, μ) is that, for an existing Lebesgue measurable function P: Ω × (0, ∞)^m → (0, ∞), the pooling operator T: Δ^m → Δ is such that
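The modified LogOp (2) and the external Bayesianity condition (1) can be sketched numerically on a finite sample space, where normalization by a sum plays the role of the integral with respect to μ. This is only an illustrative sketch: the two experts' densities, the weights, the likelihood and the function g below are hypothetical numbers invented for the example.

```python
import numpy as np

def modified_log_op(densities, weights, g=None):
    # Modified logarithmic opinion pool (2) on K atoms:
    # pooled density proportional to g * prod_i f_i^{w_i}, renormalized.
    densities = np.asarray(densities, dtype=float)   # shape (m, K)
    weights = np.asarray(weights, dtype=float)       # length m, summing to 1
    if g is None:
        g = np.ones(densities.shape[1])
    pooled = g * np.prod(densities ** weights[:, None], axis=0)
    return pooled / pooled.sum()

def bayes_update(f, lik):
    # Individual Bayesian updating: posterior proportional to likelihood * density.
    post = lik * f
    return post / post.sum()

# Two experts, four atoms, a common likelihood l and a bounded positive g.
f1 = np.array([0.1, 0.2, 0.3, 0.4])
f2 = np.array([0.4, 0.3, 0.2, 0.1])
w = [0.7, 0.3]
l = np.array([2.0, 1.0, 0.5, 0.25])
g = np.array([1.0, 2.0, 1.0, 0.5])

# External Bayesianity (1): updating each expert and then pooling gives
# the same consensus as pooling first and then updating the pool.
update_then_pool = modified_log_op([bayes_update(f1, l), bayes_update(f2, l)], w, g)
pool_then_update = bayes_update(modified_log_op([f1, f2], w, g), l)
assert np.allclose(update_then_pool, pool_then_update)
```

The identity holds because the common likelihood factors out of the weighted geometric average; the normalizing constants of the individual updates are absorbed by the overall normalization of the pool.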
(3)   T(f_1, …, f_m)(x) = P(x, f_1(x), …, f_m(x)) / ∫ P(·, f_1, …, f_m) dμ,   μ-a.e.
This condition is called a likelihood principle in the sense that it restricts the value of the combined density T at a particular point x in Ω to depend, except for a normalizing constant, on x and on the individual densities f_i, i = 1, …, m, only through their values at x, and not upon other points and the densities at points which might have occurred but did not. The likelihood principle represented by (3) is a strong and rather arbitrary requirement, particularly in the case where T is a multivariate density [e.g., Faria and Smith (1994, 1996)]. In the generalization of EB and
CEB pools we make in Section 4, we require (3) to hold when applied to marginal and conditioned variables related to individual nodes in a graphical representation structure. Another assumption made here is that the underlying space (Ω, μ) can be partitioned into at least four nonnegligible sets. In this case such a measure space is called quaternary. This therefore includes the case where μ is Lebesgue and many (but not all) cases where μ is a counting measure. McConway (1978) proved that in the case where Ω is countable and μ is a counting-type (purely atomic) measure, formulas of the type (2) are the only ones to qualify as EB if (3) holds. In the case where Ω is purely continuous (thus excluding the case where the sample space is countable), Genest (1984) showed that the only nondictatorial externally Bayesian pooling operator satisfying (3) applied to a P that is not indexed by x, that is, T(f_1, …, f_m)(x) = P[f_1(x), …, f_m(x)] / ∫P(f_1, …, f_m) dμ, μ-a.e., is Bacharach's (1972) logarithmic opinion pool (LogOp):
(4)   T(f_1, …, f_m) = ∏_{i=1}^m f_i^{w_i} / ∫ ∏_{i=1}^m f_i^{w_i} dμ,   μ-a.e.,
where w_i ≥ 0, i = 1, …, m, are arbitrary constants such that ∑_{i=1}^m w_i = 1. Unfortunately, as currently developed, EB pools have serious drawbacks. Perhaps the most obvious one is that, as Lindley (1985) points out, it appears perverse that the weights w = (w_1, …, w_m) must be common knowledge to the group a priori. Surely, as evidence appears which sheds light on the relative expertise of its members, the group should agree to adapt its weights to favor the better forecasters [e.g., Faria and Smith (1994, 1996)]. It should be natural that, given the members' commonly held information about their specialities, relative expertise and predictive capabilities, the weights associated with some variables be allowed to depend on the values of other variables in the problem. As we mentioned in the Introduction, our methodology allows for this (see also Section 4). First, we shall introduce the conditions under which the class of special chain graphs we adopt, the PCGs, have their structures preserved after the input of observed data.

3. The preservation of PCG structures. In this section we define a useful subclass of graphical models, the PCGs (partially complete chain graphs), and an associated family of likelihood functions. Provided the members of the group are able to agree on a causal ordering of the variables and on a dependence structure between them, as embodied in a chain graph (CG), a class of combination rules much richer than (but containing a special case of) the EB combination rules given in (2) can be obtained. The way this is achieved is by demanding a weakened form of external Bayesianity to hold only when the agreed likelihood does not destroy the conditional independence (CI) structure on the variables represented by the group's agreed PCG. The properties of the set of uncertain measurements X = (X_1, …, X_n) stated below are assumed to be common knowledge (CK) to all members of
the group. In fact, there are two basic required conditions on the beliefs of the group's members. The first is that all members agree on the association structure on X such that the following holds.

PROPERTY 3.1. (i) The vector X can be represented as an ordered list of subvectors X_1, …, X_n, where it is CK to all members of a group G that, for j = 1, …, n, the random vector X_j receives a directed association from (or, more loosely, is caused by) π(X_j), where π(X_j) is a subvector of (X_1, X_2, …, X_{j−1}), henceforth called the group parent set of the chain element X_j.

(ii) It is CK to the group G that X_j is conditionally independent of the elements in (X_1, …, X_{j−1}) which are not in π(X_j), given its parent set π(X_j); that is, in the usual notation of Dawid (1979),

X_j ⊥⊥ {(X_1, …, X_{j−1}) \ π(X_j)} | π(X_j),   j = 1, …, n.
Assuming that all members have a common dominating measure, we can then assert that the ith member of the group can write his joint density over X, f_i(x), in the form

f_i(x) = ∏_{j=1}^n f_{ij}(x_j | π(x_j)),   i = 1, …, m,
where x_j and π(x_j) are defined above for j = 2, …, n and π(x_1) = ∅. Henceforth we shall impose the usual positivity condition that f_{ij}(x_j | π(x_j)) > 0 for each value of π(x_j) at all values of x_j in this space, which is common to all members of the group. The association structure described by (i) and (ii) above defines a class of multivariate structures which are a subclass of graphical models called chain graph (CG) models. Briefly, a CG is a graph which does not contain any directed cycles and whose set of nodes (vertices) can be partitioned into numbered subvectors. These subvectors form a dependence chain such that all associations between nodes within the same subvector are undirected (forming the chain elements of the CG) and all associations between components of distinct subvectors are directed. A directed edge from a component of a lower labelled vector points to a component of a higher labelled vector. The literature on these models is extensive [e.g., Wermuth and Lauritzen (1990), Frydenberg (1990) and Lauritzen (1996)]. The PCGs are defined as follows [see Faria and Smith (1994) or Faria (1996) for an example].

DEFINITION 3.2 (PCG). A chain graph G^p(X) with nodes labelled by its chain elements X_1, …, X_n is called a partially complete chain graph if, for each j = 1, …, n, all the components of the subvector (chain element) X_j of G^p(X) are connected together to form a complete undirected subgraph. A directed edge connects X ∈ X_i to Y ∈ X_j if and only if X_i is a subvector of π(X_j), where π(X_j) is the parent set of X_j.
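The chain-element factorization above can be sketched for a small, hypothetical PCG. The example below assumes three binary chain elements forming a Markov chain X_1 → X_2 → X_3, so that π(X_1) = ∅, π(X_2) = {X_1} and π(X_3) = {X_2}; the conditional tables are invented numbers satisfying the positivity condition.

```python
import itertools

# Hypothetical conditional tables for one member of the group.
f_1 = {0: 0.6, 1: 0.4}                                        # f(x1)
f_2 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}    # f(x2 | x1), keyed (x2, x1)
f_3 = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1}    # f(x3 | x2), keyed (x3, x2)

def joint(x1, x2, x3):
    # The member's joint density over X as the product of the
    # chain-element conditionals f_ij(x_j | pi(x_j)).
    return f_1[x1] * f_2[(x2, x1)] * f_3[(x3, x2)]

# The positivity condition holds, and the conditionals combine into a
# proper joint density (it sums to 1 over the eight configurations).
assert all(joint(*x) > 0 for x in itertools.product((0, 1), repeat=3))
total = sum(joint(*x) for x in itertools.product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```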
The PCGs are ones which, by judicious changes of definition, can be represented by influence diagrams (IDs) on vectors of variables [e.g., Smith (1989) and Queen and Smith (1993)]. The ID induced by the PCG defined above consists of the statements of Property 3.1 together with a directed graph whose n nodes are labelled X_1, …, X_n and where X_i is connected by an edge to X_j if and only if X_i is a subvector of π(X_j). We shall call a PCG decomposable, G^pd(X), if the ID induced by G^p(X) is decomposable; that is, when for some k (k = 1, …, n), X_i and X_j are both subvectors of π(X_k), i ≠ j, then either X_i is a subvector of π(X_j) or X_j is a subvector of π(X_i). Decomposable models are very common in statistics. They include the cases when (a) all variables are independent of one another, (b) they are all dependent on one another and (c) they form a Markov chain. They also include most hierarchical models like the dynamic linear model [West and Harrison (1989)]. If a model is not originally decomposable, it can be made so by ignoring certain stated conditional independencies [see Lauritzen and Spiegelhalter (1988) for an example of how to do this]. The moral graph obtained from a CG G(X) is the undirected graph G^m(X) with the same set of nodes X but with any pair of nodes joined together by an undirected edge in G^m if and only if (i) they were already joined in G (by either a directed or an undirected edge) or (ii) they are both parents of nodes in the same chain element. Thus, the moral graph of a directed acyclic graph is built up by joining with undirected edges all nodes that have a common child and by replacing all directed edges with undirected ones [see, e.g., Lauritzen and Spiegelhalter (1988)]. Since the components of the chain elements X_j of a PCG G^p(X) form a complete undirected graph, any new information about a set of such components is informative about all the other components of that chain element and no CI assumption is destroyed within that chain element. In order to generalize external Bayesianity, it is also necessary to introduce a second condition, which will act on the class of likelihoods for models defined on a PCG G^p(X). In a general setting, these likelihoods, which we shall call cutting, are those which can be informative only about one chain element X_j and/or its parents π(X_j) in the group's common PCG.

DEFINITION 3.3 (Cutting likelihood). Say that ℓ(x | z) is in the class of cutting likelihoods related to a PCG G^p(X | Z), henceforth denoted by L(G^p), if it is a likelihood function which could have resulted from a sample Z = (Z_1, …, Z_n) whose density g(z | x) can be written in the following form:
(5)   g(z | x) = g_1(z_1 | x_1) g_2(z_2 | x_2, π(x_2), z^1) × ⋯ × g_n(z_n | x_n, π(x_n), z^{n−1}),

where x = (x_1, …, x_n) are values of the components of X in G^p, π(x_j) are fixed values of the parents of X_j in G^p(X) and z^k = (z_1, z_2, …, z_k), for k ≥ 1, are the observed values of Z^k = (Z_1, …, Z_k).
If G^p is complete, then L(G^p) is the class of all likelihoods. So we only constrain our class of likelihoods when there is some substantive agreement between the members about some lack of association between certain sets of variables given another. In the case when G^p(X) is not complete, there is an ordering of X into (X_1, …, X_n) such that X_i can be thought of (loosely) as being caused by π(X_i), a subvector of (X_1, X_2, …, X_{i−1}), i = 2, …, n. When ℓ(x | z) ∈ L(G^p(X | Z)), we assume that we have taken an observation Z which, possibly after some transformation, can be represented as (Z_1, …, Z_n), where Z_i depends on X only through X_i and the values of its direct causes π(X_i). The most extreme case is when X_1, …, X_n are independent. In this case a cutting likelihood is one which separates in each of these variables. In the case of noncomplete PCGs, the ID induced from G^p(X) must be decomposable; otherwise original relevances may no longer be valid after sampling, as we shall see. Now, the second condition the group is required to obey, regarding the likelihoods obtained from sampling over a decomposable PCG, is the following:

PROPERTY 3.4. External Bayesianity holds only with respect to incoming information Z_j, about the chain elements X_j of a decomposable PCG G^pd(X), for which the value of the ancestral set a(z_j) = (z_1, …, z_{j−1}) is already known, and whose likelihood ℓ_j[x_j | z_j, π(x_j)], j = 1, …, n, is a component of a cutting likelihood ℓ(x | z) related to G^pd(X).

In particular, the form (5) prevents a variable X_k in a decomposable PCG G^pd, not belonging to the parent set π(X_j) of X_j [which is associated with the index of the product component g_j in (5)], from conditioning the observation z_j in g_j[z_j | x_j, π(x_j), z^{j−1}]. In other words, no direct associations between X_k ∉ π(X_j) and Z_j are allowed in G^pd(X | Z). This, in turn, avoids the introduction of any new association in the moral graph of G^pd(X | Z) not originally present in G^pd(X). Therefore, no CI is lost in G^pd, according to Pearl's (1988) d-separation theorem. The class of cutting likelihoods is a very natural one to consider in the context of PCGs, that is, for CGs G^p whose induced ID is decomposable, G^pd. This is because, provided Property 3.4 is satisfied by the group, the class is determined by those data sets which, for each member, are guaranteed to preserve the CI structure implicit in G^pd after data assimilation. Thus it is simply information which does not destroy the association structure agreed by members of the group. Now we present the formal statement and proof of this result.

THEOREM 3.5. (a) If Property 3.4 is satisfied by the group G for likelihood functions ℓ(x | z) related to the variables of a PCG whose induced ID is decomposable, G^pd(X), then, for each member of G, G^pd(X | Z) is a CG of that member's joint density of X | Z.

(b) If Property 3.4 is not satisfied, then, for some member of G, G^pd(X | Z) may not be the CG of X | Z.
PROOF. (a) First we draw the ID I(X, Z) whose nodes are (Z_1, …, Z_n, X_1, …, X_n). The ID of X_1, …, X_n, I(X), is the one induced by G^pd(X). A node Y is a parent of Z_j, j = 1, …, n, if either (i) Y = X_j or Y ∈ π(X_j), or (ii) Y = Z_k, k = 1, …, j − 1. Now use Pearl's (1988) d-separation theorem. New edges inducing the marriage of parents of Z_j, j = 1, …, n, can only occur between a Z node and an X node. This is because G^pd(X, Z) has an induced decomposable ID within I(X, Z), and the subgraph on subvectors of [π(X_j), X_j] is complete. It follows that all new paths in I(X, Z) between nodes on the subgraph I(X) of I(X, Z) induced by the marriage of parents are blocked by {Z_1, …, Z_n}. Also, there will be no unblocked new path linking X nodes which were not linked in the original association structure. This is because the value of the ancestral set a(Z_j) is assumed to be already known when Z_j is observed. Therefore if a CI statement on X is implied in I(X), then it is also implied in I(X, Z).

(b) If ℓ(x | z) ∉ L(G^pd), then for some index j, j = 1, …, n, there exists a Z_j such that the parents of Z_j will have an edge induced to X_j after moralization. Marrying the parents of the conditioning Z_j will now produce an unblocked path between X_j and another node X_k, k = 1, …, j, not connected to X_j in I(X). So, whereas all members agreed a priori that X_j ⊥⊥ X_k | π(X_j), after observing Z_j this can no longer be deduced. ∎

4. Characterization of conditionally externally Bayesian pooling operators. Suppose that the m members of a group have agreed on the structure of a decomposable PCG G^pd(X) relating n random vectors X = (X_1, …, X_n) in a certain problem. Let A_j be the event that the parent nodes of X_j, π(X_j), have fixed values; that is, A_j = {π(X_j) = π(x_j)} for j = 1, …, n.
Despite believing the common PCG, G^pd, each member has his own particular opinion about the parameters of his conditional densities f_{ij}(x_j | A_j) associated with the graph structure. For technical reasons we shall assume that f_{ij} > 0 for i = 1, …, m and j = 1, …, n. The members agree to follow the external Bayesianity axiom, thus satisfying (1) applied to sets of conditional densities. Hence, according to the probability breakdown in CGs, external Bayesianity on X_j | A_j is required for all possible values of X_j, j = 1, …, n. Unlike in (1), the likelihood over which we demand external Bayesianity to hold for these conditional densities is restricted to the family L(G^pd) of cutting likelihoods defined in Section 3. Thus we demand that Property 3.4 be satisfied. In particular, this means that external Bayesianity is required only for new information that might come from a designed experiment whose design points are the parents or causes of that variable. We can now define the conditional external Bayesianity property, which will characterize conditionally externally Bayesian (CEB) pooling operators representing combined probability density functions associated with a PCG.

DEFINITION 4.1 (Conditional external Bayesianity). Say a group G obeys the conditional external Bayesianity property if the joint density f(x) of the
variables in its common decomposable PCG G^pd(X) is combined in the following way. For each component X_j of X and each set A_j of possible values of the parents π(X_j) of X_j, each of the conditional densities f_j(x_j | A_j) is pooled so as to preserve the external Bayesianity property (j = 1, …, n) with respect to the component ℓ_j(x_j | A_j, z_j) of the common cutting likelihood ℓ(x | z) associated with G^pd(X).

In terms of obtaining a characterization of conditional external Bayesianity through a class of pooling operators, we propose, in line with Madansky (1964, 1978) and Genest, McConway and Schervish (1986), that a CEB pooling operator T_j associated with a PCG G^p(X) is one which is EB on appropriate conditional distributions, and only for data that respect an ordering of that conditioning, X_j | A_j, j = 1, …, n, implicit in that chain when Property 3.4 is satisfied. Define a measure space [Ω_j(A_j), μ*_j(A_j)], hereafter denoted (Ω_j, μ*_j), with Ω_j being the product space of the spaces related to components of X_j and μ*_j being the product reference measure associated with the r(j)-dimensional vector X_j in Ω_j [see, e.g., Rudin (1986)]. Let T_j: Δ_j^m → Δ_j be a CEB pooling operator, where Δ_j is the class of all μ*_j-measurable functions f_{ij}: Ω_j → [0, ∞) with f_{ij} > 0 (μ*_j-a.e.) such that ∫ ⋯ ∫ f_{ij} dμ_{1j} ⋯ dμ_{r(j)j} = 1 for all i, j and r(j). The μ_{kj}'s are measures associated with components of X_j. If such a pooling operator satisfies the following multivariate conditional version of (3):

(6)   T_j(f_{1j}, …, f_{mj})(x_j | A_j) = P_j[x_j | A_j, f_{1j}(x_j | A_j), …, f_{mj}(x_j | A_j)] / ∫ ⋯ ∫ P_j(·, f_{1j}, …, f_{mj}) dμ_{1j} ⋯ dμ_{r(j)j},   μ*_j-a.e.,
for each j = 1, …, n, where P_j: Ω_j × (0, ∞)^m → (0, ∞) is some arbitrary Lebesgue measurable function, then T = (T_1, …, T_n) is said to satisfy the PCG G^pd likelihood principle. This condition has the same interpretation as condition (3), but for X_j conditioned on A_j; that is, except for a normalization factor which does not depend on X_j, the density of the consensus at X_j | A_j is required to depend only on X_j and its fixed parents π(X_j), as well as on the individual densities at the actual value of the unseen quantities given its parents, but not upon the densities at values which might have obtained but did not. Similarly to Madansky's condition (1) for EB pooling operators, the CEB pooling operator T_j is required to satisfy the following condition [see Faria and Smith (1994) and Faria (1996)]:

(7)   T_j( ℓ_j f_{1j} / ∫ ⋯ ∫ ℓ_j f_{1j} dμ_{1j} ⋯ dμ_{r(j)j}, …, ℓ_j f_{mj} / ∫ ⋯ ∫ ℓ_j f_{mj} dμ_{1j} ⋯ dμ_{r(j)j} )
      = ℓ_j T_j(f_{1j}, …, f_{mj}) / ∫ ⋯ ∫ ℓ_j T_j(f_{1j}, …, f_{mj}) dμ_{1j} ⋯ dμ_{r(j)j},

μ*_j-a.e., for j = 1, …, n, where the likelihood ℓ_j is a component of ℓ ∈ L(G^pd).
Assuming the underlying measure space (Ω_j, μ*_j) of each vector X_j in G^pd(X) can be partitioned into at least four nonnegligible sets (which includes the continuous and most of the countable cases), it is straightforward to extend the characterization theorem of Genest, McConway and Schervish (1986) for such an operator in the following way.
THEOREM 4.2 (Conditional modified LogOp). Let (Ω_j, μ*_j) be a quaternary measure space. Let f_j(x_j | A_j): Δ_j^m → Δ_j be a CEB pooling operator representing the m individuals' combined conditional density for the r(j)-dimensional random vector X_j, j = 1, …, n, given its parents in a PCG G^p(X). If

f_j(x_j | A_j) = T_j(f_{1j}, …, f_{mj})(x_j | A_j),   j = 1, …, n,

where, for all f_{ij} (i = 1, …, m) in Δ_j^m and for an existing μ*_j × Lebesgue measurable function P_j: Ω_j × (0, ∞)^m → (0, ∞), T_j: Δ_j^m → Δ_j satisfies (6), then f_j takes the form

(8)   f_j(x_j | A_j) = p_j ∏_{i=1}^m f_{ij}(x_j | A_j)^{w_{ij}(A_j)} / ∫ ⋯ ∫ p_j ∏_{i=1}^m f_{ij}(x_j | A_j)^{w_{ij}(A_j)} dμ_{1j} ⋯ dμ_{r(j)j},   μ*_j-a.e.,
where $\pi_j\colon \Omega_j \to [0,\infty)$, with $\pi_j > 0$ $\mu$-a.e., are essentially bounded functions and $w_{ij}(A_j)$ are weights such that $\sum_{i=1}^m w_{ij}(A_j) = 1$ holds for each index $j = 1, \ldots, n$, and $A_j$ are the variables whose values are commonly known by the group when the combination rule is applied. Furthermore, the weights are nonnegative unless $\Omega_j$ is finite or there does not exist a countably infinite partition of $(\Omega_j, \mu^*_j)$ into nonnegligible sets.

The proof of this theorem is straightforward by a slight adaptation of Theorem 4.4 in Genest, McConway and Schervish (1986), applied individually to each node in $G^p$ associated with the underlying measure space $(\Omega_j, \mu^*_j)$ but conditioned on $X_j \mid A_j$, $j = 1, \ldots, n$. The proof, together with some intermediate results, is given in Appendix A.

The $w_{ij}(A_j)$ are weights that should measure individual $i$'s expertise associated with the vector $X_j$. They may possibly be functions of other components in $X_j$. It is rather difficult to give an interpretation to $\pi_j$ in the context of our group decision problem. However, in the majority of problems it is reasonable to require that $T_j$ preserve the group's unanimity. This leads us to set $\pi_j$ equal to 1. Note that in this case condition (6) can be restricted to

$$T_j(f_{1j}, \ldots, f_{mj})(x_j \mid A_j) = \frac{P_j\big(A_j, f_{1j}(x_j \mid A_j), \ldots, f_{mj}(x_j \mid A_j)\big)}{\int\cdots\int P_j(\cdot, f_{1j}, \ldots, f_{mj})\,d\mu_{1j}\cdots d\mu_{r(j)j}}, \qquad \mu^*_j\text{-a.e.},$$

thus not allowing $T_j$ to depend on $x_j$ directly. The following corollary can be stated, with the proof easily obtained from Theorem 4.2.
COROLLARY 4.3 (Conditional LogOp). Let $(\Omega_j, \mu^*_j)$ be a quaternary measure space and let $T_j\colon \Delta_j^m \to \Delta_j$ be a CEB pooling operator which preserves unanimity. If there exists a $\mu^*_j \times$ Lebesgue measurable function $P_j\colon \Omega_j \times (0,\infty)^m \to (0,\infty)$ such that $T_j$ satisfies (6) for all vectors of conditional opinions $(f_{1j}, \ldots, f_{mj}) \in \Delta_j^m$, then $T_j$ is a logarithmic opinion pool, that is,

$$(9)\qquad T_j(f_{1j}, \ldots, f_{mj})(x_j \mid A_j) = \frac{\prod_{i=1}^m f_{ij}(x_j \mid A_j)^{w_{ij}(A_j)}}{\int\cdots\int \prod_{i=1}^m f_{ij}(x_j \mid A_j)^{w_{ij}(A_j)}\,d\mu_{1j}\cdots d\mu_{r(j)j}}, \qquad \mu^*_j\text{-a.e.},$$

for some arbitrary weights $w_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, possibly functions of $A_j$ (the commonly known past when the densities of $X_j$ are combined), adding up to 1. Moreover, the weights $w_{ij}(A_j)$ are nonnegative unless $\Omega_j$ is finite or there does not exist a countable partition of $(\Omega_j, \mu^*_j)$ into nonnegligible sets.

5. CEB poolings appearing EB. We have already shown in Section 3 that, in order to ensure that the CEB rules are well defined, the common PCG $G^p(X)$ must be decomposable. Note that the CEB rules are not based on pooling operators, since their arguments are not necessarily just the values of the joint densities in those pools. The question now is: when $l(x \mid z) \in L(G^{pd})$, in what sense, if any, are the CEB poolings on chain elements EB on the whole PCG $G^{pd}$?

Certainly, when data $Z$ about $X$ in $G^{pd}(X)$ is observed, that evidence must be propagated through the PCG. Therefore, all the conditional poolings $T_j(f_{1j}, \ldots, f_{mj})[x_j \mid x^{(j-1)}]$ on the chain elements $X_j$ and their predecessors in $G^{pd}(X)$, that is, $X_1, \ldots, X_{j-1}$, must be updated to (omitting the members' densities) $T_j[x_j \mid x^{(j-1)}, z]$ for $j = 1, \ldots, n$. Suppose we demand that the group agrees to update those conditional densities in a backwards sequence: thus assume that the group agrees to update $X_n \mid X^{(n-1)}$ first, $X_{n-1} \mid X^{(n-2)}$ second and so on down to $X_1$. Let $\tilde f$ be the group's combined joint density which takes the individual posterior densities of $X_n \mid [X^{(n-1)}, Z]$, pools them, uses the derived (agreed) density of $Z \mid X^{(n-1)}$ to obtain individual densities of $X_{n-1} \mid [X^{(n-2)}, Z]$, pools these, and so on down to the density of $X_1$. Also, let $f$ be the group's combined joint density which pools the individual prior densities of $X_n \mid X^{(n-1)}$, forms a group posterior density of $X_n \mid [X^{(n-1)}, Z]$ and a density of $Z \mid X^{(n-1)}$, uses this agreed density of $Z \mid X^{(n-1)}$ and the pool of the prior densities of $X_{n-1} \mid X^{(n-2)}$ to obtain the posterior density of $X_{n-1} \mid [X^{(n-2)}, Z]$, and so on down to the pooling of $X_1$. Thus the question is: when does $\tilde f(x \mid z) = f(x \mid z)$? The answer is provided by the following theorem.

THEOREM 5.1. Suppose that Property 3.4 is satisfied by the group $G$ for a PCG $G^{pd}(X)$. Also assume that the vector of weights $w_j$ of the conditional LogOps used to combine the beliefs of the members of $G$ on chain elements $X_j$ of $G^{pd}(X)$ is a function only of variables in $\pi(x_j)$ for all $j = 1, \ldots, n$. Then, for the whole graph $G^{pd}(X \mid Z)$,

$$\tilde f(x \mid z) = f(x \mid z),$$

where the conditional LogOp components of $\tilde f$ or $f$ are backwards sequentially updated.

See Appendix B for the proof of this theorem. Note that the above result guarantees that the original PCG structure is preserved after new information is incorporated into the model and that there is agreement on how the graph is updated, although the updating is only strictly EB when the corresponding graph is completely disconnected.

6. Discussion. Complete CG's are always decomposable and make no statements about CI being probabilistically valid for all situations. Therefore, for complete CG's, Theorem 5.1 implies that all CEB rules are sequentially EB (in the sense defined in the previous section) for general likelihoods. Each ordering of the variables in a complete induced ID gives a different class of sequentially EB pools, so with $n$ variables there are $n!$ different classes defined by different CG's. Consequently, the collection of such sequentially EB pools is extremely rich, a fact obscured by the insistence that a pooling should be an operator on a joint density [see Faria and Smith (1996), Example 2.2]. On the other hand, for CG's representing unconditional independence between all the variables, Theorem 5.1 implies that the joint pooling on the whole CG is EB for cutting likelihoods. In this case, the agreement on the sequential backwards performance of the conditional pools need not be made.

The CEB pools are not formed as pooling operators on joint densities of all the variables of a system; they act as components of pooling operators on conditional densities. The argument that a pooling should be a pooling operator on a joint density appears to us to be weak, since relative expertise on particular variables should be allowed.

This multiplicity in the complete case is, in one sense, a problem, since we need to choose which CEB pool to use. But this choice will largely be determined by the time or causal order in which the random variables are observed.
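Theorem 5.1 can be checked numerically on the smallest nontrivial example: a chain $X_1 \to X_2$ with two members and data $z$ observed on $X_2$. The sketch below is our own illustration with hypothetical numbers, not a construction from the paper; the weights for the pool of $X_2 \mid X_1$ depend only on the parent value $x_1$, as the theorem requires. Route 1 builds $\tilde f$ (members update first, then the group pools backwards from $X_2$ to $X_1$); route 2 builds $f$ (the group pools the priors, then updates).

```python
import numpy as np

def log_pool(fs, ws):
    """Logarithmic pool of 1-D densities with weights summing to 1."""
    g = np.prod([f ** w for f, w in zip(fs, ws)], axis=0)
    return g / g.sum()

def pool_cond(fs, w):
    """Pool conditional tables row by row; row x1 uses weights w[x1]."""
    return np.stack([log_pool([f[x1] for f in fs], w[x1]) for x1 in range(len(w))])

# Chain X1 -> X2, two members (all numbers hypothetical).
f11, f21 = np.array([0.3, 0.7]), np.array([0.6, 0.4])   # member priors on X1
f12 = np.array([[0.2, 0.5, 0.3], [0.7, 0.2, 0.1]])      # f_1(x2 | x1), rows indexed by x1
f22 = np.array([[0.5, 0.25, 0.25], [0.3, 0.3, 0.4]])    # f_2(x2 | x1)
g2 = np.array([0.9, 0.4, 0.1])                          # likelihood g(z | x2), z fixed
w1 = [0.5, 0.5]                                         # weights for the X1 pool
w2 = np.array([[0.8, 0.2], [0.3, 0.7]])                 # weights for X2 | X1: functions of x1 only

# Agreed density of z given x1, from the pooled conditional of X2 | X1.
gz = (g2 * pool_cond([f12, f22], w2)).sum(axis=1)

# Route 1: members update first, then the group pools (f~).
post = [g2 * f for f in (f12, f22)]
post = [p / p.sum(axis=1, keepdims=True) for p in post]
ft2 = pool_cond(post, w2)                               # f~(x2 | x1, z)
ft1 = log_pool([f11 * gz / (f11 * gz).sum(),
                f21 * gz / (f21 * gz).sum()], w1)       # f~(x1 | z)

# Route 2: pool the priors first, then update (f).
fp2 = pool_cond([f12, f22], w2)
f2post = g2 * fp2 / (g2 * fp2).sum(axis=1, keepdims=True)   # f(x2 | x1, z)
fp1 = log_pool([f11, f21], w1)
f1post = fp1 * gz / (fp1 * gz).sum()                        # f(x1 | z)

print(np.allclose(ft2, f2post) and np.allclose(ft1, f1post))  # True: f~ = f
```

If the weights for $X_2 \mid X_1$ were allowed to vary with $x_2$ as well, the likelihood would no longer factor out of the pool and the two routes would generally disagree, which is why Theorem 5.1 restricts the weights to functions of the parents.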
We need this ordering to fix the weights $w_j$ associated with the $j$th variable $X_j$, since $w_j$ is allowed to depend only on the parents of $X_j$. In the complete case, $\pi(X_j) = \{X_1, \ldots, X_{j-1}\}$ and, for all possible pools to operate in the class of CEB pools associated with this graph, $X_1, \ldots, X_{j-1}$ will not need to be known before the pooling takes place [Faria (1996) and Faria and Smith (1996)].

When the induced ID is not decomposable, assimilation of data tends to destroy that agreed structure. It is natural therefore to work only with information in the ID which is not destroyed by the assimilation of new information. On the other hand, an agreed nondecomposable ID $I$ can always be made decomposable by marrying parents and adding directed edges until it is decomposable [Lauritzen and Spiegelhalter (1988)]. By using this derived ID $I^d$ as a basis for the CEB pools instead of $I$, some information might be lost, because there may be CI statements supplied by $I$ but not by $I^d$. Rather, we claim that it is only sensible to formulate explicitly the pooling on those agreed conditional independencies which can reasonably be assumed to be preserved after simple types of sampling. Otherwise, the combination obtained a posteriori would not even be defined [see Faria and Smith (1996), Example 2.1].

One legitimate question that might be asked is: why use CG's whose induced ID's are not complete? Because an ID is always acyclic, there will always be a complete graph which has it as a subgraph and so is a valid (ID) description of the problem. Furthermore, the CEB pools which are associated with complete graphs have the advantage that they are sequentially EB with respect to all likelihoods. On the other hand, CEB pools related to incomplete graphs are only sequentially EB for data which give rise to a certain structure of likelihood. There are four answers to this question:

1. Simplicity. If the type of information you expect to receive will automatically preserve the CI structure, it seems perverse to demand methods of combining densities which exhibit dependence structures the individuals will never believe.

2. Preservation of symmetry. Suppose two random variables $X_i$, $X_j$ in the random vector $X$ are agreed to exhibit CI and are symmetric in the sense that there is no clear order of causality or association between them. To introduce such an association into the pooling algorithms then seems artificial and undesirable.

3. Fixing a frame. In a given problem, experts will agree on a set of random vectors $X_1, \ldots, X_n$ on which they will pool their opinions. However, in most circumstances each expert will have beliefs about other variables $X^*$, agreed to be independent of $X$. Implicitly, in any pooling they will ignore the disparity between their beliefs about $X^*$.
Similarly, at a future time, if asked to combine their beliefs about $X^*$, they will choose to ignore the disparity in their beliefs about $X$. But to do this implies the use of a CEB rule which explicitly demands in its associated CG that $X \perp X^*$. Thus, if we did not allow incomplete cases, then in different and independent forecasting problems about $X$ and $X^*$ we would need to prioritize $X$ and $X^*$.

4. External Bayesianity. The sparser the CG, the more informative it is in terms of CI statements and the closer to being EB the group is in the joint combination.

APPENDIX A

Proof of Theorem 4.2. The proof of Theorem 4.2 needs intermediate results that we state here as further theorems. Basically, the proof itself is split up to include all possible configurations of the underlying measure spaces, that is, the cases in which $(\Omega_j, \mu^*_j)$ does not contain any atoms, or is purely atomic, or contains atoms but is not purely atomic. Some of these intermediate theorems themselves need other results and concepts. The complete proof, together with the proofs of all related lemmas and theorems, can be found in Faria and Smith (1994) or in Faria (1996).

We begin Section A.1 with two characterization theorems for the conditional modified LogOp. Theorem A.1 covers the case in which $(\Omega_j, \mu^*_j)$ is not purely atomic and condition (6) holds for the functions $P_j$, $j = 1, \ldots, n$. Theorem A.2 characterizes (8) for the case in which $(\Omega_j, \mu^*_j)$ contains at least three atoms but is not purely atomic. Those two theorems are then used in Section A.2 to prove Theorem 4.2.

A.1. Some characterization theorems for the conditional modified LogOp.

THEOREM A.1. Assume that $(\Omega_j, \mu^*_j)$ is not purely atomic and that $N$ is the complement of the set of atoms. Let $f_j\colon \Delta^m \to \Delta$, $j = 1, \ldots, n$, be externally Bayesian. Assume that (6) holds for $\mu^*_j \times$ Lebesgue measurable functions $P_j\colon \Omega_j \times (0,\infty)^m \to (0,\infty)$, $j = 1, \ldots, n$. Then

$$(10)\qquad f_j(f_{1j}, \ldots, f_{mj})(x_j \mid A_j) = \frac{\pi_j(x_j \mid A_j) \prod_{i=1}^m f_{ij}(x_j \mid A_j)^{w_{ij}(A_j)}}{\int\cdots\int P_j(\cdot, f_{1j}, \ldots, f_{mj})\,d\mu_{1j}\cdots d\mu_{r(j)j}}, \qquad \mu^*_j\text{-a.e. on } N,$$

where $w_{ij}(A_j) \ge 0$ and $\sum_{i=1}^m w_{ij}(A_j) = 1$, $j = 1, \ldots, n$.

See Faria and Smith (1994) or Faria (1996) for a complete proof of this theorem, which is a straightforward generalization of Lemma 4.3 in Genest, McConway and Schervish (1986).

THEOREM A.2. Let $(\Omega_j, \mu^*_j)$ be a quaternary measure space that contains at least two atoms, and let $f_j\colon \Delta^m \to \Delta$ be a CEB pooling operator. Suppose there exist $\mu^*_j \times$ Lebesgue measurable functions $P_j\colon \Omega_j \times (0,\infty)^m \to (0,\infty)$ such that (6) holds for all conditional densities $f_{ij} \in \Delta$, $i = 1, \ldots, m$ and $j = 1, \ldots, n$, associated with a PCG $G^p$. Then there exist, for each vector of atoms $X_j \mid A_j$ in $\Omega_j$, $j = 1, \ldots, n$, constant terms $v_{1j}(A_j), \ldots, v_{mj}(A_j)$ such that

$$(11)\qquad f_j(f_{1j}, \ldots, f_{mj})(x_j \mid A_j) = \frac{\pi_j(x_j \mid A_j) \prod_{i=1}^m f_{ij}(x_j \mid A_j)^{v_{ij}(A_j)}}{\int\cdots\int \pi_j \prod_{i=1}^m f_{ij}^{v_{ij}(A_j)}\,d\mu_{1j}\cdots d\mu_{r(j)j}}.$$
PROOF. Lemma B.8 in Faria and Smith (1994) [a generalization of Lemma 4.2 in Genest, McConway and Schervish (1986)] proves that, under the hypotheses above, there exist, for each pair of vectors of atoms $(X_j \mid A_j, Y_j \mid D_j)$ in $\Omega_j^2$, constant terms $v_{1j}(A_j, D_j), \ldots, v_{mj}(A_j, D_j)$ such that, for all functions $g_{1j}, \ldots, g_{mj} > 0$ in $\Delta$, we have for each $j$ that

$$Q_j(x_j \mid A_j, y_j \mid D_j)(g_{1j}, \ldots, g_{mj}) = Q_j(x_j \mid A_j, y_j \mid D_j)(\mathbf{1}) \prod_{i=1}^m g_{ij}^{v_{ij}(A_j, D_j)},$$

where $Q_j\colon (0,\infty)^m \to (0,\infty)$ are Lebesgue measurable functions and $\mathbf{1} = (1, \ldots, 1)$ is $m$-dimensional. So, if for all $j$ we fix the vector of atoms $y_j \mid D_j \in \Omega_j$ and choose $0 < \varepsilon_j < 1/\mu^*_j(y_j \mid D_j)$, then we can define for all atoms $x_j \mid A_j \in \Omega_j$ the functions

$$\pi_j(x_j \mid A_j) = Q_j(x_j \mid A_j, y_j \mid D_j)(\mathbf{1})\, P_j(y_j \mid D_j, \boldsymbol{\varepsilon}_j)\, \varepsilon_j^{-v_j(A_j, D_j)},$$

where $\boldsymbol{\varepsilon}_j = (\varepsilon_j, \ldots, \varepsilon_j)$ and $v_j(A_j, D_j) = \sum_{i=1}^m v_{ij}(A_j, D_j)$. Thus, for all the atoms $x_j \mid A_j$, we have that

$$P_j(x_j \mid A_j, g_{1j}, \ldots, g_{mj}) = \pi_j(x_j \mid A_j) \prod_{i=1}^m g_{ij}^{v_{ij}(A_j)}$$

for all $0 < g_{1j}, \ldots, g_{mj} < 1/\mu^*_j(x_j \mid A_j)$. This implies that, for all $j$, (11) holds. ∎
We are now in a position to prove Theorem 4.2.

A.2. The proof of Theorem 4.2. If $(\Omega_j, \mu^*_j)$ contains no atoms at all, then the proof is immediate from Theorem A.1.

If $(\Omega_j, \mu^*_j)$ is purely atomic, then (8) is easily obtained from (11) with $w_{ij}(A_j) = v_{ij}(A_j)$ for all $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Moreover, from the fact that $f_j$ is CEB, it is easy to verify that the weights must sum to 1 for each $j$, that is, $\sum_{i=1}^m w_{ij}(A_j) = 1$.

If $(\Omega_j, \mu^*_j)$ has atoms but is not purely atomic, we use Theorem A.1 to obtain the result on the set $N$, the complement of the set of atoms of $(\Omega_j, \mu^*_j)$. Consider the atoms $x_j^{(1)} \mid A_j^{(1)}, x_j^{(2)} \mid A_j^{(2)}, \ldots$ in $\Omega_j$ and let $P_{kj}(g_{1j}, \ldots, g_{mj})$ denote $P_j(x_j^{(k)} \mid A_j^{(k)}, g_{1j}, \ldots, g_{mj})$ for all $g_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, such that $0 < g_{ij} < 1/\mu^*_j(x_j^{(k)} \mid A_j^{(k)})$. From (7) in the definition of CEB for a PCG $G^{pd}$ we have for each $j$ that

$$(12)\qquad \frac{l_j(x_j \mid A_j)\, P_j\big(x_j \mid A_j, f_{1j}(x_j \mid A_j), \ldots, f_{mj}(x_j \mid A_j)\big)}{P_j\big(x_j \mid A_j, h_{1j}(x_j \mid A_j), \ldots, h_{mj}(x_j \mid A_j)\big)} = \text{constant}, \qquad \mu^*_j\text{-a.e.},$$

whenever $h_{ij} \propto l_j f_{ij}$ for all $i$ and $j$. From (10), we have that for $x_j \mid A_j$ on $N$ the left-hand side of (12) equals $\prod_{i=1}^m t_{ij}^{w_{ij}(A_j)}$, where $t_{ij} = \int\cdots\int l_j f_{ij}\,d\mu_{1j}\cdots d\mu_{r(j)j}$ for all $i$ and $j$.

Fix $t_{1j}, \ldots, t_{mj}$, pick a single vector of atoms $x_j^{(k)} \mid A_j^{(k)}$ in $\Omega_j$ and let $\varepsilon_j$ be small enough that $0 < \varepsilon_j/t_{ij} < 1/\mu^*_j(x_j^{(k)} \mid A_j^{(k)})$ for each $i$ and $j$. Then set $s_{ij} = f_{ij}(x_j^{(k)} \mid A_j^{(k)}) = \varepsilon_j$ and $s^*_{ij} = h_{ij}(x_j^{(k)} \mid A_j^{(k)}) = \varepsilon_j/t_{ij}$ for each $i$ and $j$. For another vector $x_j^{(q)} \mid A_j^{(q)}$, set $u_{ij} = f_{ij}(x_j^{(q)} \mid A_j^{(q)})$ and $u^*_{ij} = h_{ij}(x_j^{(q)} \mid A_j^{(q)})$. Now choose four sets in $\Omega_j$ for each $j$, say $B_{vj}$, with $0 < \mu^*_j(B_{vj}) < \infty$ for $v = 1, 2, 3$ and $\mu^*_j(B_{4j}) > 0$, such that $s_{ij}\mu^*_j(B_{1j}) + u_{ij}\mu^*_j(B_{2j}) < 1$ and $s^*_{ij}\mu^*_j(B_{1j}) + u^*_{ij}\mu^*_j(B_{2j}) < 1$ for all $i$ and $j$. Make $B_{1j} = \{x_j^{(k)} \mid A_j^{(k)}\}$ and $B_{2j} = \{x_j^{(q)} \mid A_j^{(q)}\}$ and construct densities $(f'_{1j}, \ldots, f'_{mj})$ and likelihoods $l_j$ for these densities such that $f'_{ij}(x_j^{(k)} \mid A_j^{(k)}) = s_{ij}$, $f'_{ij}(x_j^{(q)} \mid A_j^{(q)}) = u_{ij}$, $h'_{ij}(x_j^{(k)} \mid A_j^{(k)}) = s^*_{ij}$ and $h'_{ij}(x_j^{(q)} \mid A_j^{(q)}) = u^*_{ij}$, where $h'_{ij} = l_j f'_{ij}\big/\int\cdots\int l_j f'_{ij}\,d\mu_{1j}\cdots d\mu_{r(j)j}$ (for all $i$ and $j$).
Denote $\gamma_{ij} = s_{ij}\mu^*_j(B_{1j}) + u_{ij}\mu^*_j(B_{2j})$. Observe that $s_{ij}\mu^*_j(B_{1j}) + u_{ij}\mu^*_j(B_{2j}) = t_{ij}\big[s^*_{ij}\mu^*_j(B_{1j}) + u^*_{ij}\mu^*_j(B_{2j})\big] < t_{ij}$ for all $i$ and $j$, and thus $0 < \gamma_{ij} < \min\{t_{ij}, 1\}$ for each $i$ and $j$. Also choose $\lambda_j > 0$ and $\xi_j < \infty$ such that

$$\lambda_j < \min\big\{(1 - \gamma_{ij})^{-1}(t_{ij} - \gamma_{ij})\colon i = 1, \ldots, m\big\} \le \max\big\{(1 - \gamma_{ij})^{-1}(t_{ij} - \gamma_{ij})\colon i = 1, \ldots, m\big\} < \xi_j.$$

Fixing an arbitrary density $\zeta_j \in \Delta$, we can define, for all $i$ and $j$,

$$f'_{ij} = s_{ij} I(B_{1j}) + u_{ij} I(B_{2j}) + \frac{t_{ij} - \gamma_{ij} - \lambda_j(1 - \gamma_{ij})}{\mu^*_j(B_{3j})(\xi_j - \lambda_j)}\, I(B_{3j}) + \frac{\xi_j(1 - \gamma_{ij}) - (t_{ij} - \gamma_{ij})}{R_j(\xi_j - \lambda_j)}\, \zeta_j\, I(B_{4j}),$$

where $R_j = \int\cdots\int I(B_{4j})\, \zeta_j\,d\mu_{1j}\cdots d\mu_{r(j)j} > 0$ and $I(B)$ is the indicator of the set $B$. Note that $\int\cdots\int f'_{ij}\,d\mu_{1j}\cdots d\mu_{r(j)j} = 1$ and that $f'_{ij}(x_j^{(k)} \mid A_j^{(k)}) = s_{ij}$, $f'_{ij}(x_j^{(q)} \mid A_j^{(q)}) = u_{ij}$ for all $i$ and $j$. Finally, consider the likelihoods

$$l_j = I(B_{1j}) + I(B_{2j}) + \xi_j I(B_{3j}) + \lambda_j I(B_{4j}).$$

Using the fact that (12) also holds on all of $\Omega_j$, we have

$$\frac{P_{kj}(\varepsilon_j, \ldots, \varepsilon_j)}{P_{kj}(\varepsilon_j/t_{1j}, \ldots, \varepsilon_j/t_{mj})} = \prod_{i=1}^m t_{ij}^{w_{ij}(A_j^{(k)})}$$

for each $j$ and an arbitrary $k$. This means that, for all $g_{1j}, \ldots, g_{mj}$ strictly between 0 and $1/\mu^*_j(x_j^{(k)} \mid A_j^{(k)})$,

$$P_{kj}(g_{1j}, \ldots, g_{mj}) = P_{kj}(\varepsilon_j, \ldots, \varepsilon_j)\, \varepsilon_j^{-1} \prod_{i=1}^m g_{ij}^{w_{ij}(A_j^{(k)})}.$$
Let $\pi_j(x_j^{(k)} \mid A_j^{(k)}) = P_{kj}(\varepsilon_j, \ldots, \varepsilon_j)\, \varepsilon_j^{-1}$ and (8) is proved.

Although the weights can possibly vary with $A_j$, they are constant relative to $x_j \mid A_j$ for fixed values of $A_j$, $j = 1, \ldots, n$. Also, they must be nonnegative if $\mu^*_j$ is not purely atomic or if it is purely atomic but $\Omega_j$ includes a countably infinite number of atoms. In this latter case, we can easily construct densities $f_{1j}, \ldots, f_{mj}$ which make the integrals

$$\int\cdots\int \pi_j \prod_{i=1}^m f_{ij}^{w_{ij}(A_j)}\,d\mu_{1j}\cdots d\mu_{r(j)j}$$

infinite unless all the weights are nonnegative. However, these integrals are always finite when $\Omega_j$ is finite and $\mu^*_j$ is a counting-type measure; the weights can take negative values in this case. Furthermore, $\pi_j$ must be essentially bounded, or else there exist $f_j$ such that the above integrals are infinite when all the $f_{ij}$ are equal to $f_j$ for each $j$, according to Hewitt and Stromberg [(1965), Theorem 20.15].
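The nonnegativity constraint at the end of the proof can be seen concretely on a countably infinite space. In the sketch below (our own illustration, with two arbitrarily chosen geometric densities and $\pi_j \equiv 1$), the unnormalized pooled mass $\sum_x f_1(x)^{w_1} f_2(x)^{w_2}$ stays bounded when both weights are nonnegative but grows geometrically once one weight is negative, so no normalizing constant exists.

```python
import numpy as np

# Two geometric densities on {0, 1, 2, ...}: f_a(x) = (1 - a) * a**x.
a, b = 0.5, 0.05

def pooled_partial_sum(w, n_terms=200):
    """Partial sum of the unnormalized pool f_a**w[0] * f_b**w[1]."""
    x = np.arange(n_terms)
    fa = (1 - a) * a ** x
    fb = (1 - b) * b ** x
    return (fa ** w[0] * fb ** w[1]).sum()

bounded = pooled_partial_sum((0.5, 0.5))    # terms decay geometrically: finite total mass
diverges = pooled_partial_sum((1.5, -0.5))  # term ratio a**1.5 / b**0.5 > 1: divergent series
print(bounded < 1.0, diverges > 1e30)       # True True
```

With the negative weight, each successive term is larger than the last by the constant factor $a^{1.5}/b^{0.5} \approx 1.58$, so the partial sums grow without bound, exactly the mechanism the proof exploits on a countably infinite atom set.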
APPENDIX B

PROOF OF THEOREM 5.1. Here we use the convention that a function will explicitly depend only on the values of its arguments. For simplicity, the argument of $w_{ik}$, that is, $\pi(x_k)$, will be omitted in this proof. First, observe that there is a proportionality constant for each $k = 1, \ldots, n$,

$$h_k[\pi(x_k), z^k] = 1\Big/\int\cdots\int \prod_{i=1}^m f_{ik}^{w_{ik}}[x_k \mid \pi(x_k), z^k]\,dx_{1k}\cdots dx_{r(k)k},$$

$r(k)$ being the dimension of the vector element $X_k$, such that

$$\tilde f_k[x_k \mid \pi(x_k), z^k] = h_k[\pi(x_k), z^k] \prod_{i=1}^m f_{ik}^{w_{ik}}[x_k \mid \pi(x_k), z^k],$$

where $Z^k = (Z_1, \ldots, Z_k)$. Also, if information $Z^k$ about $X^k$ is observed, then the density of $(X_{k+1}, \ldots, X_n)$ given $X^k$ remains unchanged.

We begin the proof by showing that, under the conditions of the theorem, the density of $X_n \mid \{\pi(X_n), Z^n\}$ does not depend on when $Z^n$ is incorporated. Notice that $Z^n = Z$. Since $g(z \mid x)$ is a function of $x_n$ only through $g_n[z_n \mid x_n, \pi(x_n), z^{n-1}]$, we can write

$$\tilde f_n[x_n \mid \pi(x_n), z^n] = h_n[\pi(x_n), z^n] \prod_{i=1}^m \left(\frac{f_{in}[x_n \mid \pi(x_n)]\, g_n[z_n \mid x_n, \pi(x_n), z^{n-1}]}{h_{in}[\pi(x_n), z^n]}\right)^{w_{in}},$$

where throughout we let

$$h_{in}[\pi(x_n), z^n] = \int\cdots\int f_{in}[x_n \mid \pi(x_n)]\, g_n[z_n \mid x_n, \pi(x_n), z^{n-1}]\,dx_{1n}\cdots dx_{r(n)n}$$

for $i = 1, \ldots, m$. Noting that $\sum_{i=1}^m w_{in} = 1$, this can be rearranged as

$$\tilde f_n[x_n \mid \pi(x_n), z^n] = \frac{h_n[\pi(x_n), z^n]}{\prod_{i=1}^m h_{in}^{w_{in}}[\pi(x_n), z^n]} \prod_{i=1}^m f_{in}^{w_{in}}[x_n \mid \pi(x_n)]\, g_n[z_n \mid x_n, \pi(x_n), z^{n-1}] = V_n[\pi(x_n), z^n]\, f_n[x_n \mid \pi(x_n), z^n],$$

where

$$V_n[\pi(x_n), z^n] = \frac{h_n[\pi(x_n), z^n]}{v_n[\pi(x_n), z^n] \prod_{i=1}^m h_{in}^{w_{in}}[\pi(x_n), z^n]}$$

and where

$$v_n[\pi(x_n), z^n] = 1\Big/\int\cdots\int \prod_{i=1}^m f_{in}^{w_{in}}[x_n \mid \pi(x_n)]\, g_n[z_n \mid x_n, \pi(x_n), z^{n-1}]\,dx_{1n}\cdots dx_{r(n)n}.$$
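The normalization identity used next, $V_n \equiv 1$, can be checked numerically on a single discrete node: with the individual normalizers $h_{in}$, the pooled-posterior normalizer $h_n$ and the updated-pooled-prior normalizer $v_n$ computed as above, the ratio equals 1 for any weights summing to 1. This is our own sanity check with randomly generated member densities and likelihood values, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, K = 3, 4                                # members; support size of X_n (parents held fixed)
f = rng.dirichlet(np.ones(K), size=m)      # member priors f_in(x_n | pi(x_n)), one row each
g = rng.uniform(0.1, 1.0, size=K)          # likelihood g_n(z_n | x_n, ...), z_n fixed
w = rng.dirichlet(np.ones(m))              # weights summing to 1

h_in = (f * g).sum(axis=1)                 # individual normalizing constants
post = f * g / h_in[:, None]               # member posteriors of X_n
h_n = 1.0 / np.prod(post ** w[:, None], axis=0).sum()     # normalizer of pooled posteriors
v_n = 1.0 / (np.prod(f ** w[:, None], axis=0) * g).sum()  # normalizer of updated pooled prior
V_n = h_n / (v_n * np.prod(h_in ** w))
print(np.isclose(V_n, 1.0))                # True
```

Since both routes produce proper densities in $x_n$, their ratio $V_n$ cannot depend on $x_n$ and must be 1, which is the argument made in the text that follows.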
Now we know that $\tilde f_n$ and $f_n$ must integrate to 1 over $x_n$ for all values of $\pi(x_n)$ and $z^n$. It follows therefore that $V_n[\pi(x_n), z^n]$ is identically 1 and so

$$(13)\qquad \tilde f_n[x_n \mid \pi(x_n), z^n] = f_n[x_n \mid \pi(x_n), z^n],$$

as required for each node of the ID induced by $G^{pd}$.

Having shown that whether the combination is done before or after observing $Z$ does not affect the conditional density of $X_n \mid \{\pi(X_n), Z^n\}$, we next consider the updating of $X_{n-1} \mid \pi(X_{n-1})$ given $Z^n$. First note that to update the distribution of $X_{n-1}$, and hence of $X_{n-1} \mid \pi(X_{n-1})$, in the light of $Z^n$, we need to calculate the density $g^{(1)}$ of $z^n$ given $x^{n-1}$. We can then simply use Bayes' rule as above and, since the likelihood of $Z$ is cutting, from the usual probability calculus we can write

$$g^{(1)}[z^n \mid x^{n-1}] = g_1(z_1 \mid x_1)\, g_2[z_2 \mid x_2, \pi(x_2), z^1] \cdots g_{n-1}[z_{n-1} \mid x_{n-1}, \pi(x_{n-1}), z^{n-2}]\, \tilde g_n(z_n \mid x^{n-1}, z^{n-1}),$$

where

$$\tilde g_n(z_n \mid x^{n-1}, z^{n-1}) = \int\cdots\int g_n[z_n \mid x_n, \pi(x_n), z^{n-1}]\, f_n[x_n \mid \pi(x_n)]\,dx_{1n}\cdots dx_{r(n)n}.$$

Note that $f_n$ above is unambiguously defined, since $f_n = \tilde f_n$ by (13) and

$$f_n[x_n \mid \pi(x_n)] = h[\pi(x_n)] \prod_{i=1}^m f_{in}^{w_{in}}[x_n \mid \pi(x_n)].$$

Now, provided that $w_{in}$ is a function of $x$ only through terms in $\pi(x_n)$, it is clear that $\tilde g_n$ is a function of $z^n$ and $\pi(x_n)$ only. So we can write

$$\tilde g_n(z_n \mid x^{n-1}, z^{n-1}) = \tilde g_n[z_n \mid \pi(x_n), z^{n-1}],$$

where $\tilde g_n$ is a function of its arguments only. Also, since $G^{pd}(X)$ has a decomposable induced ID, it will exhibit the running intersection property [Lauritzen and Spiegelhalter (1988)]. This states that there will exist an index $j(n)$ (say) such that $\pi(x_n) \subseteq \{x_{j(n)}, \pi(x_{j(n)})\}$ with $j(n) \in \{1, \ldots, n-1\}$. So $\tilde g_n$ can be further simplified into

$$\tilde g_n(z_n \mid x^{n-1}, z^{n-1}) = \tilde g_n[z_n \mid x_{j(n)}, \pi(x_{j(n)}), z^{n-1}].$$

Hence

$$g^{(1)}[z^n \mid x^{n-1}] = g_1^{(1)}[z_1^{(1)} \mid x_1]\, g_2^{(1)}[z_2^{(1)} \mid x_2, z_1^{(1)}] \cdots g_{n-1}^{(1)}[z_{n-1}^{(1)} \mid x_{n-1}, \pi(x_{n-1}), z^{(1), n-2}],$$

where

$$g_j^{(1)}[z_j^{(1)} \mid x_j, \pi(x_j), z^{(1), j-1}] = g_j[z_j^{(1)} \mid x_j, \pi(x_j), z^{(1), j-1}]$$

and $z_k^{(1)} = z_k$ for $k \ne j(n)$, $k = 1, \ldots, n-1$, and

$$g_{j(n)}^{(1)}[z_{j(n)}^{(1)} \mid x_{j(n)}, \pi(x_{j(n)}), z^{j(n)-1}] = g_{j(n)}[z_{j(n)} \mid x_{j(n)}, \pi(x_{j(n)}), z^{j(n)-1}]\, \tilde g_n[z_n \mid x_{j(n)}, \pi(x_{j(n)}), z^{n-1}].$$
It follows that $g^{(1)}(z^n \mid x^{n-1})$ is a likelihood function which is cutting on $x^{(1)} = x^{n-1}$, since the only changed term is the likelihood arising as if from two conditionally independent observations $z_{j(n)}^{(1)} = (z_{j(n)}, z_n)$.

Now consider the density of $X_{n-1} \mid \{\pi(X_{n-1}), Z\}$. The argument leading to (13) can be applied with $n-1$ replacing $n$, $g$ replaced by $g^{(1)}$ and $x$ by $x^{(1)}$ to give us that

$$f_{n-1}[x_{n-1} \mid \pi(x_{n-1}), z^{n-1}] = \tilde f_{n-1}[x_{n-1} \mid \pi(x_{n-1}), z^{n-1}].$$

Since the induced ID is decomposable, we can find an index $j(n-1)$ such that $\pi(x_{n-1}) \subseteq \{x_{j(n-1)}, \pi(x_{j(n-1)})\}$. We can therefore use an argument exactly analogous to the one above, replacing $x^{(1)}$ by $x^{(2)} = x^{n-2} = (x_1, \ldots, x_{n-2})$, $(n-1)$ by $(n-2)$, $g$ by $g^{(1)}$ and $g^{(1)}$ by $g^{(2)}$, where

$$g^{(2)}[z^{n-1} \mid x^{n-2}] = g_1^{(2)}[z_1^{(2)} \mid x_1] \cdots g_{n-2}^{(2)}[z_{n-2}^{(2)} \mid x_{n-2}, \pi(x_{n-2}), z^{(2), n-3}]$$

with

$$g_j^{(2)}[z_j^{(2)} \mid x_j, \pi(x_j), z^{(2), j-1}] = g_j[z_j^{(2)} \mid x_j, \pi(x_j), z^{(2), j-1}]$$

and $z_j^{(2)} = z_j^{(1)}$ for $j \ne j(n-1)$, $j = 1, \ldots, n-2$. Also,

$$g_{j(n-1)}^{(2)}[z_{j(n-1)}^{(2)} \mid x_{j(n-1)}, \pi(x_{j(n-1)}), z^{(2), j(n-1)-1}] = g_{j(n-1)}^{(1)}[z_{j(n-1)} \mid x_{j(n-1)}, \pi(x_{j(n-1)}), z^{j(n-1)-1}]\, g_{n-1}^{(1)}[z_{n-1}^{(1)} \mid \pi(x_{j(n-1)}), z^{(2), j(n-1)}]$$

and we have that

$$f_{n-2}[x_{n-2} \mid \pi(x_{n-2}), z^{n-2}] = \tilde f_{n-2}[x_{n-2} \mid \pi(x_{n-2}), z^{n-2}].$$

It should now be clear that we can proceed inductively backwards through the indices $k$, starting from $k = n$, to prove that $f_k[x_k \mid \pi(x_k), z^k] = \tilde f_k[x_k \mid \pi(x_k), z^k]$ for $k = 1, \ldots, n$. However, since the density $f$ is uniquely determined by the product of the conditional densities $f_k$, and $\tilde f$ by the product of the conditional densities $\tilde f_k$, $k = 1, \ldots, n$, we have then proved that

$$f(x \mid z) = \tilde f(x \mid z),$$

that is, sequential external Bayesianity holds for $G^{pd}(X \mid Z)$. ∎

Acknowledgments. We are grateful to two referees and an Associate Editor for their helpful comments and suggestions on the first version of the paper.

REFERENCES

BACHARACH, M. (1972). Scientific disagreement. Unpublished manuscript, Christ Church, Oxford.
BACHARACH, M. (1975). Group decisions in the face of differences of opinion. Management Science 22 182–191.
BATES, J. M. and GRANGER, C. W. J. (1969). The combination of forecasts. Oper. Res. 20 451–468.
BAYARRI, M. J. and DEGROOT, M. H. (1988). Gaining weight: a Bayesian approach. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 25–44. Oxford Univ. Press.
COOKE, R. M. (1991). Experts in Uncertainty: Expert Opinion and Subjective Probability in Science. Oxford Univ. Press.
DALKEY, N. C. (1972). An impossibility theorem for group probability functions. Report P-4862, The Rand Corporation, Santa Monica, CA.
DALKEY, N. C. (1975). Toward a theory of group estimation. In The Delphi Method: Techniques and Applications (H. A. Linstone and M. Turoff, eds.) 236–261. Addison-Wesley, Reading, MA.
DAWID, A. P. (1979). Conditional independence in statistical theory (with discussion). J. Roy. Statist. Soc. Ser. B 41 1–31.
DAWID, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert systems. Statist. Comput. 2 25–36.
DEGROOT, M. H. and MORTERA, J. (1991). Optimal linear opinion pools. Management Science 37 546–558.
FARIA, A. E. (1996). Graphical Bayesian models in multivariate expert judgements and conditional external Bayesianity. Ph.D. dissertation, Dept. Statistics, Univ. Warwick.
FARIA, A. E. and SMITH, J. Q. (1994). Conditional external Bayesianity in chain graphs. Technical Report 271, Dept. Statistics, Univ. Warwick.
FARIA, A. E. and SMITH, J. Q. (1996). Conditional external Bayesianity in decomposable influence diagrams. In Bayesian Statistics 5 (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 551–560. Clarendon Press, Oxford.
FARIA, A. E. and SOUZA, R. C. (1995). A re-evaluation of the quasi-Bayes approach to the linear combination of forecasts. Journal of Forecasting 14 533–542.
FRENCH, S. (1985). Group consensus probability distributions: a critical survey. In Bayesian Statistics 2 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 183–202. North-Holland, Amsterdam.
FRYDENBERG, M. (1990). The chain graph Markov property. Scand. J. Statist. 17 333–353.
GEANOKOPLOS, J. (1992). Common knowledge. Journal of Economic Perspectives 6 53–82.
GENEST, C. (1984). A characterization theorem for externally Bayesian groups. Ann. Statist. 12 1110–1115.
GENEST, C., MCCONWAY, K. J. and SCHERVISH, M. J. (1986). Characterization of externally Bayesian pooling operators. Ann. Statist. 14 487–501.
GENEST, C. and WAGNER, C. G. (1984). Further evidence against independence preservation in expert judgement synthesis. Technical Report 84-10, Dept. Statistics and Actuarial Science, Univ. Waterloo.
GENEST, C. and ZIDEK, J. V. (1986). Combining probability distributions: a critique and an annotated bibliography (with discussion). Statist. Sci. 1 114–148.
HEWITT, E. and STROMBERG, K. (1965). Real and Abstract Analysis. Springer, New York.
JENSEN, F., JENSEN, F. V. and DITTMER, S. L. (1994). From influence diagrams to junction trees. Report R-94-2013, Dept. Mathematics and Computer Science, Aalborg Univ., Denmark.
LADDAGA, R. (1977). Lehrer and the consensus proposal. Synthese 36 473–477.
LAURITZEN, S. L. (1996). Graphical Models. Oxford Univ. Press.
LAURITZEN, S. L., SPEED, T. P. and VIJAYAN, K. (1984). Decomposable graphs and hypergraphs. J. Austral. Math. Soc. Ser. A 36 12–29.
LAURITZEN, S. L. and SPIEGELHALTER, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. J. Roy. Statist. Soc. Ser. B 50 157–224.
LEHRER, K. and WAGNER, C. G. (1983). Probability amalgamation and the independence issue: a reply to Laddaga. Synthese 55 339–346.
LINDLEY, D. V. (1985). Reconciliation of discrete probability distributions. In Bayesian Statistics 2 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 375–390. North-Holland, Amsterdam.
MADANSKY, A. (1964). Externally Bayesian groups. Rand Memo RM-4141-PR.
MADANSKY, A. (1978). Externally Bayesian groups. Unpublished manuscript, Univ. Chicago.
MCCONWAY, K. J. (1978). The combination of experts' opinions in probability assessment: some theoretical considerations. Ph.D. dissertation, Univ. College London.
MCCONWAY, K. J. (1981). Marginalization and linear opinion pools. J. Amer. Statist. Assoc. 76 410–414.
MENDEL, M. B. and SHERIDAN, T. B. (1987). Optimal estimation using human experts. Man-Machine Systems Lab., Dept. Mechanical Engineering, MIT.
PEARL, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA.
PILL, J. (1971). The Delphi method: substance, context, a critique and an annotated bibliography. Socio-Economic Planning Sciences 5 57–71.
QUEEN, C. M. and SMITH, J. Q. (1993). Multiregression dynamic models. J. Roy. Statist. Soc. Ser. B 55 849–870.
RAIFFA, H. (1968). Decision Analysis: Introductory Lectures on Choices under Uncertainty 211–226. Addison-Wesley, Reading, MA.
RUDIN, W. (1986). Real and Complex Analysis. McGraw-Hill, New York.
SMITH, A. F. M. and MAKOV, U. E. (1978). A quasi-Bayes sequential procedure for mixtures. J. Roy. Statist. Soc. Ser. B 40 106–112.
SMITH, J. Q. (1989). Influence diagrams for statistical modelling. Ann. Statist. 17 654–672.
SMITH, J. Q. and PAPAMICHAIL, K. N. (1996). Fast Bayes and the dynamic junction forest. Real Time Online Decision Support System document WG1-TN(96)-02.
STONE, M. (1961). The opinion pool. Ann. Math. Statist. 32 1339–1342.
WERMUTH, N. and LAURITZEN, S. L. (1990). On substantive research hypotheses, conditional independence graphs and graphical chain models. J. Roy. Statist. Soc. Ser. B 52 21–50.
WEST, M. and HARRISON, P. J. (1989). Bayesian Forecasting and Dynamic Models. Springer, New York.
WINKLER, R. L. (1968). The consensus of subjective probability distributions. Management Science 15 B61–B75.
WINKLER, R. L. (1986). Expert resolution. Management Science 32 298–303.
DEPARTMENT OF STATISTICS UNIVERSITY OF WARWICK COVENTRY CV4 7AL ENGLAND