arXiv:quant-ph/9706039v1 17 Jun 1997
QUANTUM BAYESIAN NETS

Robert R. Tucci
P.O. Box 226, Bedford, MA 01730

Published in: Int. Jour. of Mod. Phys. B9 (1995) 295-337

The principles of this paper have been implemented in commercial software available at www.ar-tiste.com
ABSTRACT

We begin with a review of a well known class of networks, Classical Bayesian (CB) nets (also called causal probabilistic nets by some). Given a situation which includes randomness, CB nets are used to calculate the probabilities of various hypotheses about the situation, conditioned on the available evidence. We introduce a new class of networks, which we call Quantum Bayesian (QB) nets, that generalize CB nets to the quantum mechanical regime. We explain how to use QB nets to calculate quantum mechanical conditional probabilities (in case of either sharp or fuzzy observations), and discuss the connection of QB nets to Feynman path integrals. We give examples of QB nets that involve a single spin-$\frac{1}{2}$ particle passing through a configuration of two or three Stern-Gerlach magnets. For the examples given, we present the numerical values of various conditional probabilities, as calculated by a general computer program especially written for this purpose.
1. INTRODUCTION
The artificial intelligence and expert systems literature contains a large number of articles ([1]-[8]) discussing the theory and application of Classical Bayesian (CB) nets. At least one software package[9] that implements this theory is commercially available. Amazingly, the physics community, to whom this paper is mainly addressed, seems not to have discovered or used CB nets yet. Therefore, we begin this paper with a review of CB nets. However, the real purpose of this paper is to introduce a new class of nets, which we shall call Quantum Bayesian (QB) nets, that generalize CB nets to the quantum mechanical regime. In this paper, to illustrate QB nets, we use them to predict conditional probabilities for experiments comprising combinations of two or three Stern-Gerlach magnets. In future papers, we will use QB nets to analyze quantum optical experiments and other physical situations. Earlier workers have devised networks for quantum mechanics (e.g., Feynman diagrams, and the "trajectory graphs" of Ref.[10]), but their nets differ substantially from ours.
Henceforth, we will underline random variables. We will write $P(\underline{x} = x)$ for the probability that the random variable $\underline{x}$ assumes the particular value $x$[11]. Sometimes, if there is no danger of confusion, we will write $P(x)$ rather than $P(\underline{x} = x)$. Similarly, we will often write $P(y|x)$ instead of $P(\underline{y} = y|\underline{x} = x)$. $P(\underline{y} = y|\underline{x} = x)$, the conditional probability that $\underline{y}$ assumes the value $y$ given that $\underline{x}$ assumes the value $x$, is defined by

$$P(\underline{y} = y|\underline{x} = x) = \frac{P(\underline{y} = y, \underline{x} = x)}{P(\underline{x} = x)} . \qquad (1.1)$$
Suppose $\underline{h}$ is a random variable that can take on as values each of a number of hypotheses, and $\underline{e}$ is a random variable representing the observations that constitute the available evidence. We often want to calculate the posterior probability $P(h|e)$ in terms of the prior probability $P(h)$. To do this, one may use Bayes's rule, which says

$$P(h|e) = \frac{P(e|h)P(h)}{\sum_{h'} P(e|h')P(h')} . \qquad (1.2)$$
Equation (1.2) is a simple consequence of Eq.(1.1). The subject of CB nets may be viewed as an extension of Bayes's rule. A CB net has two parts: a diagram consisting of nodes with arrows connecting some pairs of these nodes, and a collection of probabilities, one per node. For example, the digital circuits of electrical engineering can be modelled as CB nets if each NAND gate (or, alternatively, each AND, OR and NOT gate) is replaced by a node, and each connecting cable is replaced by an arrow pointing in the direction of current flow. Of course, in the usual digital circuits, the NAND gates are deterministic, their output being a deterministic function of their inputs. More generally, if some of the NAND gates were to act erratically, and if the input sources to the circuit also acted in a random fashion, then such a circuit could also be modelled by a CB
net. In this non-deterministic CB net, the signal flowing in each wire would have a certain probability of being 0 and of being 1. If the signal in a certain wire were measured so that one knew that it was definitely 1, then one would want to revise the probability distributions for the signals in all other wires so as to reflect the new evidence. Hence, conditional probabilities and Bayes’s rule arise naturally when considering probabilistic nets. As we shall see, a major difference between CB nets and QB nets is that for QB nets, one assigns complex amplitudes rather than probabilities to each node.
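As a minimal illustration, Eq.(1.2) can be evaluated directly. The following Python fragment is a sketch only; the hypothesis names, priors and likelihoods are invented for illustration:

```python
# Minimal sketch of Bayes's rule, Eq.(1.2); the hypotheses and
# numbers below are invented for illustration.
prior = {"h1": 0.7, "h2": 0.3}          # P(h)
likelihood = {"h1": 0.2, "h2": 0.9}     # P(e|h) for the observed evidence e

norm = sum(likelihood[h] * prior[h] for h in prior)             # denominator of Eq.(1.2)
posterior = {h: likelihood[h] * prior[h] / norm for h in prior}
print(posterior)   # {'h1': 0.3414..., 'h2': 0.6585...}
```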
2. THEORY OF CB NETS
In this section, we will review the simple theory of CB nets. The next section will present some examples.
We call a graph (or a diagram or an architecture) a collection of nodes with arrows connecting some pairs of these nodes. The arrows of the graph must satisfy certain constraints that will be specified below. We call a labelled graph a graph whose nodes are labelled. A CB net consists of two parts: a labelled graph with each node labelled by a random variable, and a collection of node matrices, one matrix for each node. These two parts must satisfy certain constraints that will be specified below.
We define two kinds of arrows: internal arrows are those that have a starting node and a different ending one; external arrows are those that have a starting node but no ending one. We define two types of nodes: external nodes are those that have a single external arrow leaving them, and internal nodes are those that have one or more internal arrows leaving them. It is also common to use the terms root node or prior probability node for a node which has no incoming arrows, only outgoing ones. We restrict all nodes of a graph to be either internal or external. Hence, no node may have both an external and one or more internal arrows leaving it.
We define each node of a CB net to represent a numerical value, and the whole net to represent the product of all these node values. We assign a numerical value to each node as follows. First, we assign a random variable to each node. Suppose the random variables assigned to the $N$ nodes are $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$. Define $Z_N = \{1, 2, \cdots, N\}$. For any finite set $S$, let $|S|$ be the number of elements in $S$. If $S = \{k_1, k_2, \cdots, k_{|S|}\} \subset Z_N$, define $(x_\cdot)_S = (x_{k_1}, x_{k_2}, \cdots, x_{k_{|S|}})$ and $(\underline{x}_\cdot)_S = (\underline{x}_{k_1}, \underline{x}_{k_2}, \cdots, \underline{x}_{k_{|S|}})$. Sometimes, we also abbreviate $(x_\cdot)_{Z_N}$ (i.e., the vector that includes all the possible $x_j$ components) by just $x_\cdot$, and $(\underline{x}_\cdot)_{Z_N}$ by just $\underline{x}_\cdot$.
For $j \in Z_N$, we imagine node $\underline{x}_j$ to lie in state $x_j$ (see Fig.1). We also imagine all arrows leaving the node $\underline{x}_j$ to lie in state $x_j$, and thus we label all of them $x_j$. At this point we've shown how to label each arrow in the graph by $x_k$ for some $k \in Z_N$. Define $S_j$ to be the set of all $k$ such that an arrow labelled $x_k$ enters node $\underline{x}_j$. Now we assign a value $P[x_j|(x_\cdot)_{S_j}]$ to node $\underline{x}_j$. $P[x_j|(x_\cdot)_{S_j}]$ is what we referred to earlier as a node matrix; $x_j$ is the matrix's row index and $(x_\cdot)_{S_j}$ is its column index. As the notation
suggests, we assume that the values $P[x_j|(x_\cdot)_{S_j}]$ are conditional probabilities; i.e., that they satisfy

$$P[x_j|(x_\cdot)_{S_j}] \geq 0 , \qquad (2.1)$$

$$\sum_{x_j} P[x_j|(x_\cdot)_{S_j}] = 1 , \qquad (2.2)$$
where the sum in Eq.(2.2) is over all states $x_j$ that the random variable $\underline{x}_j$ can assume, and where Eqs.(2.1) and (2.2) must be satisfied for all $j \in Z_N$ and for all possible values of the vector $(x_\cdot)_{S_j}$ of random variables. The left hand side of Eq.(2.2) is just the sum over the entries along a column of a node matrix.
The CB net is taken to represent the product of all the probabilities $P[x_j|(x_\cdot)_{S_j}]$ for $j \in Z_N$. This product is a function $P(x_\cdot)$ of the current states $x_1, x_2, \cdots, x_N$ of the nodes. Thus,

$$P(x_\cdot) = \prod_{j \in Z_N} P[x_j|(x_\cdot)_{S_j}] . \qquad (2.3)$$
We require that

$$\sum_{x_\cdot} P(x_\cdot) = 1 , \qquad (2.4)$$
as expected for a joint probability distribution of the random variables $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$.
Next, to illustrate the CB net concepts just presented, we will discuss all possible CB nets with two or three nodes. For 2 nodes labelled by random variables $\underline{x}$ and $\underline{y}$, there are only 2 possible labelled graphs, depending on whether the arrow points from $\underline{x}$ to $\underline{y}$ (Fig.2a) or vice versa (Fig.2b). Figure 2a is diagrammatic notation for the following equation:

$$P(x, y) = P(y|x)P(x) . \qquad (2.5)$$
Notice that each probability factor on the right hand side of Eq.(2.5) is represented by a node in Fig.2a. Notice also that the prior probability $P(x)$, since it has no conditions placed on it, is portrayed in Fig.2a by a node which has no arrows pointing into it. The probability $P(y|x)$, on the other hand, depends on the value $x$ of node $\underline{x}$, and thus it is represented by a node with an arrow labelled $x$ pointing into it. According to Eq.(1.1), Eq.(2.5) is a tautology (i.e., a statement that is always true). Thus, the net of Fig.2a may represent any probability distribution of two random variables $\underline{x}$ and $\underline{y}$.
Figure 2b is diagrammatic notation for the following equation:

$$P(x, y) = P(x|y)P(y) . \qquad (2.6)$$
Again, this equation is a tautology. Thus, like the net of Fig.2a, the net of Fig.2b may represent an arbitrary probability distribution of two random variables $\underline{x}$ and $\underline{y}$. (A corollary is that two CB nets with different labelled graphs may still represent the same probability distribution.)
Figure 3a is diagrammatic notation for

$$P(x, y, z) = P(z|x, y)P(y)P(x) . \qquad (2.7)$$
Contrary to Eqs.(2.5) and (2.6), Eq.(2.7) does not represent a tautology: not all probability distributions of three random variables $\underline{x}$, $\underline{y}$ and $\underline{z}$ must satisfy Eq.(2.7). In fact, summing both sides of the last equation over the values of $z$ yields

$$P(x, y) = P(y)P(x) , \qquad (2.8)$$
i.e., $\underline{x}$ and $\underline{y}$ are independent. Thus, the net of Fig.3a represents two independent random variables $\underline{x}$ and $\underline{y}$.
Figure 3b is diagrammatic notation for the following equation:

$$P(x, y, z) = P(z|x)P(y|x)P(x) . \qquad (2.9)$$
Dividing both sides of Eq.(2.9) by $P(x)$ yields:

$$P(y, z|x) = P(z|x)P(y|x) . \qquad (2.10)$$
Even though $\underline{y}$ and $\underline{z}$ are not necessarily independent, they are conditionally independent, i.e., they are independent at fixed $\underline{x} = x$. Thus, the net of Fig.3b represents two conditionally independent random variables $\underline{y}$ and $\underline{z}$.
Figure 3c is diagrammatic notation for the equation:

$$P(x, y, z) = P(z|y)P(y|x)P(x) . \qquad (2.11)$$
Random variables $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$ form an $N$ step Markov chain ($N$ may be infinite) if $P(x_{n+1}|x_n, x_{n-1}, \cdots, x_1) = P(x_{n+1}|x_n)$ for $n = 1, 2, \cdots, N-1$, i.e., if the $(n+1)$th step $\underline{x}_{n+1}$ depends only on the step $\underline{x}_n$ immediately preceding it. Clearly, the random variables $\underline{x}$, $\underline{y}$ and $\underline{z}$ of Fig.3c represent a 3 step Markov chain.
Figure 3d is diagrammatic notation for the equation:

$$P(x, y, z) = P(z|y, x)P(y|x)P(x) . \qquad (2.12)$$
Replacing the conditional probabilities on the right hand side of Eq.(2.12) by their definitions in terms of probabilities without conditions, we obtain $P(x, y, z)$ for the right hand side. Thus, Eq.(2.12) is a tautology, and the net of Fig.3d can represent any probability distribution of three random variables $\underline{x}$, $\underline{y}$ and $\underline{z}$.
The graph of Fig.3d is acyclic. That is, it does not contain any cycles (a cycle is a closed path of arrows with the arrows all pointing in the same sense). On the other hand, the graph of Fig.4 is paracyclic (i.e., it contains at least one cycle). In
fact, the whole graph of Fig.4 is a cycle. The net of Fig.4 shall be forbidden because, if one sums over its free indices $(x, y, z)$, one does not always obtain unity, as must be the case if the net is to represent a probability distribution $P(x, y, z)$. For example, if we assume that $\underline{x}$, $\underline{y}$ and $\underline{z}$ can assume only states 0 and 1, and if $P(y|x) = \delta_{y,x}$, $P(x|z) = \delta_{x,z}$ and $P(z|y) = \delta_{z,y}$, where $\delta_{x,y}$ is the Kronecker delta function, then Fig.4 adds up to $\sum_{x,y,z} \delta_{x,z}\,\delta_{z,y}\,\delta_{y,x} = 2$. The acyclic net of Fig.3d does add up to one, as proven in diagrammatic notation in Fig.5. In this figure, summation over the states of an arrow is indicated by giving the arrow a double shaft. Note that in Fig.5, we first add over the index $z$ of the external arrow. This produces a new external arrow, and we add over its index $y$, and so on. Each time we add over the index of the current external arrow until finally we get unity. One cannot follow this summation procedure to show that the net of Fig.4 adds up to one. Indeed, Fig.4 has no external arrow to begin the procedure with.
Certain aspects of the preceding discussion of nets with two or three nodes can be generalized to any number $N$ of nodes. For any number of nodes, we will say that a graph is fully connected if it is acyclic, and if every one of its nodes is connected to all other nodes by either incoming or outgoing arrows. We will say that a net is fully connected if its graph is fully connected. Any fully connected CB net represents a completely general joint probability distribution of the random variables labelling its nodes. Indeed, it is always possible to label the nodes of an $N$ node fully connected CB net so that the net represents the right hand side of the following equation:

$$P(x_\cdot) = P(x_N|x_{N-1}, x_{N-2}, \cdots, x_1)\, P(x_{N-1}|x_{N-2}, x_{N-3}, \cdots, x_1) \cdots P(x_2|x_1)\, P(x_1) . \qquad (2.13)$$

And this last equation is a tautology.
To label the nodes of a fully connected net so that Eq.(2.13) applies to it, one proceeds as follows. There always exists exactly one external node. Here is why. The external node, call it $\underline{x}_N$, must exist. Otherwise, all nodes would have at least one outgoing internal arrow. Then one could start from any node and travel along one of its outgoing internal arrows to reach another node, and so on, until one came back to a previously visited node. Therefore, the graph would not be acyclic. The external node $\underline{x}_N$ is unique because, since all other nodes have outgoing arrows that point to $\underline{x}_N$, all other nodes must be internal. Remove $\underline{x}_N$ from the graph. For the same reasons as before, the resulting diminished graph contains a unique external node. Call the latter node $\underline{x}_{N-1}$. Continue removing nodes in this way until all the nodes are labelled $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$. Since Eq.(2.13) suggests that $\underline{x}_j$ occurs after $\underline{x}_{j-1}$ for $j = 2, 3, \cdots, N$, we call this node labelling (ordering) the chronological labelling of the graph. Two corollaries of the preceding proof are that fully connected graphs have a single external node, and that all fully connected $N$-node labelled graphs are identical once they are relabelled in the fashion described above.
Figure 6a shows the fully connected four node graph with its chronological labelling. By deforming Fig.6a into a topologically equivalent diagram, one obtains
Fig.6b, a more "stylized" version of the same thing. Fig.6b might make clearer to some readers how the arrows of a fully connected graph are organized.
Note that one can relabel any graph chronologically, even if it isn't fully connected. Indeed, given any graph $G$, one may add arrows to it to form a fully connected graph $G'$. Call $G'$ a completion of $G$. Sometimes it is possible to add arrows to $G$ in two different ways and arrive at two different completions. Hence completions are not always unique. Any completion of a graph $G$ may be labelled chronologically following the procedure described above. This, in turn, gives the graph $G$ a chronological ordering, albeit not a necessarily unique one. Suppose that $\underline{x}$ and $\underline{y}$ are two nodes in a graph $G$. We say that $\underline{x}$ precedes $\underline{y}$ and write $\underline{x} < \underline{y}$ if, for any completion of $G$, there exist integers $j$ and $j'$ with $j < j'$ so that $\underline{x} = \underline{x}_j$ and $\underline{y} = \underline{x}_{j'}$. We say that $\underline{x}$ and $\underline{y}$ are concurrent and write $\underline{x} \sim \underline{y}$ if $\underline{x}$ precedes $\underline{y}$ in some completions but $\underline{y}$ precedes $\underline{x}$ in others. And we say that $\underline{x}$ succeeds or follows $\underline{y}$ and write $\underline{x} > \underline{y}$ if $\underline{y} < \underline{x}$.
Call a CB pre-net a labelled graph and an accompanying set of node matrices that satisfy Eqs.(2.1), (2.2) and (2.3), but don't necessarily satisfy the overall normalization condition Eq.(2.4). An acyclic CB pre-net always satisfies Eq.(2.4), and a paracyclic CB pre-net may not satisfy Eq.(2.4). Here is why. Following the procedure just discussed, the nodes of any acyclic pre-net can be relabelled chronologically as $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$. The relabelled graph, even if it is not fully connected, corresponds to the right hand side of Eq.(2.13), except that some of the conditional probabilities on the right hand side of Eq.(2.13) might include redundant conditioning. (If a conditional probability $P(x|y, R)$ is known to satisfy $P(x|y, R) = P(x|y)$, then we would say that the expression $P(x|y, R)$ shows redundant conditioning on $R$.) This correspondence between any acyclic pre-net and the right hand side of Eq.(2.13) guarantees that acyclic pre-nets will satisfy the overall normalization condition Eq.(2.4). On the other hand, as we've shown with the 3-node pre-net of Fig.4, an $N$-node paracyclic pre-net may not reduce to unity upon summing over its free indices. If one tries the process of "adding over all the current external arrows" on a pre-net with a cycle embedded in it, at some point the process comes to a stop and cannot be completed due to a lack of a current external arrow to sum over next. Note that, in some sense, Eq.(2.13) embodies the principle of causality. Thus, acyclic CB pre-nets preserve causality, and paracyclic CB pre-nets violate this principle. If one considers only acyclic graphs, as we shall do henceforth, then there is no difference between CB nets and CB pre-nets.
Note that if one sums both sides of Eq.(2.11) over $y$ and divides by $P(x)$, one obtains
$$P(z|x) = \sum_y P(z|y)P(y|x) . \qquad (2.14)$$
This last equation, valid for a Markov chain $(\underline{x}, \underline{y}, \underline{z})$, is the so called Chapman-Kolmogorov equation. Equation (2.14) is represented diagrammatically in Fig.7. If one sums both sides of Eq.(2.12) over $y$ and divides by $P(x)$, one obtains
$$P(z|x) = \sum_y P(z|y, x)P(y|x) . \qquad (2.15)$$
The last equation is a generalization of the Chapman-Kolmogorov equation to arbitrary random variables $\underline{x}, \underline{y}, \underline{z}$ that don't necessarily form a Markov chain. Equation (2.15) is represented diagrammatically in Fig.8.
Note that in both Figs.7 and 8, we follow a process of adding over some of the arrows of a net to obtain a new net with fewer nodes. This process may be called coarsening or data compression, because it reduces the number of nodes and because the joint probability distribution of the new net carries less information than the joint probability distribution of the old one. In general, if we start with $N$ nodes $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$, and we sum over $x_1, x_2, \cdots, x_k$, then the resulting probability distribution of the variables $x_{k+1}, x_{k+2}, \cdots, x_N$ can always be represented by a fully connected net with nodes $\underline{x}_{k+1}, \underline{x}_{k+2}, \cdots, \underline{x}_N$.
In modelling a classical physical situation by a CB net, if one knows very little about the nature of the probability distributions involved, one can always use a fully connected net. Later on, if one learns that certain pairs of random variables are conditionally independent, one may remove the arrows connecting these pairs of variables without changing the value of the full net.
Once we have designed a net architecture and the net's node matrices have been calculated, what next? How does one use this information? The information is sufficient for calculating the joint probability $P(x_1, x_2, \cdots, x_N)$, where $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$ are the nodes of the net. From this joint probability, one can calculate $P(x_a|x_b, x_c, \cdots)$, the probability that any one node $\underline{x}_a$ of the net assumes one of its states $x_a$, conditioned on several other nodes $\underline{x}_b, \underline{x}_c, \cdots$ of the net assuming respective states $x_b, x_c, \cdots$. To go from $P(x_1, x_2, \cdots, x_N)$ to $P(x_a|x_b, x_c, \cdots)$, one can sum $P(x_1, x_2, \cdots, x_N)$ over those variables which are not in the list $x_a, x_b, x_c, \cdots$ to get $P(x_a, x_b, x_c, \cdots)$. One can obtain $P(x_b, x_c, \cdots)$ similarly, and then divide the two. However, this brute force method takes no advantage of the particular topology of the net, and so it is very labor intensive. Artificial intelligence and expert systems workers have found an algorithm for calculating $P(x_a|x_b, x_c, \cdots)$ from $P(x_1, x_2, \cdots, x_N)$ that takes advantage of the net topology to reduce dramatically the number of arithmetical operations required. For a discussion of this great achievement, the reader can consult Refs.[1]-[8]. The software package of Ref.[9] implements this fast algorithm.
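The brute force method just described is easy to prototype. The following Python sketch evaluates Eq.(2.3) and a brute-force conditional probability for the Markov chain of Fig.3c; the node matrices are invented for illustration, and this is emphatically not the fast algorithm of Refs.[1]-[8]:

```python
from itertools import product

# Minimal sketch of a CB net, Eq.(2.3): each node stores a conditional
# probability table P[x_j | parents], and the net's value P(x.) is the
# product of the node values. Example net: the Markov chain of Fig.3c,
# with invented binary node matrices.
parents = {"x": [], "y": ["x"], "z": ["y"]}
# cpt[node][(parent values)][value] = P(value | parent values)
cpt = {
    "x": {(): [0.6, 0.4]},
    "y": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
    "z": {(0,): [0.7, 0.3], (1,): [0.4, 0.6]},
}
nodes = ["x", "y", "z"]

def joint(assign):  # Eq.(2.3): P(x.) = prod over j of P[x_j | (x.)_{S_j}]
    p = 1.0
    for n in nodes:
        pa = tuple(assign[q] for q in parents[n])
        p *= cpt[n][pa][assign[n]]
    return p

def prob(hypo, evid):  # P(hypo | evid) by brute-force marginalization
    num = den = 0.0
    for vals in product([0, 1], repeat=len(nodes)):
        a = dict(zip(nodes, vals))
        if all(a[k] == v for k, v in evid.items()):
            p = joint(a)
            den += p
            if all(a[k] == v for k, v in hypo.items()):
                num += p
    return num / den

print(prob({"z": 1}, {"x": 1}))   # P(z=1 | x=1)
```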
3. EXAMPLES OF CB NETS
The references at the end of this paper present examples of CB nets used in medical diagnosis[4],[6], monitoring of processes[1], genetics[5], etc. And this is just a small fraction of the possible applications of CB nets. Indeed, any probabilistic model can be discussed in terms of CB nets. In this section, we do not intend to present a comprehensive collection of CB net examples, but only to give a few simple introductory ones.
(a) digital circuits
Figure 9 shows a Bayesian net version of an AND gate. The random variables $\underline{x}$, $\underline{y}$ and $\underline{z}$ are binary (i.e., they assume values in $\{0, 1\}$). The node matrices for this net consist of the prior probabilities $P(x)$, $P(y)$ and of the conditional probability $P(z|x, y)$. $P(z|x, y)$ is given by the table in Fig.9. With states 0 and 1 standing for false and true, respectively, this table is what one would expect if $z = (x \text{ and } y)$. Note that the columns of the node matrix $P(z|x, y)$ add up to one, as required by Eq.(2.2). A node matrix whose entries are all either 0 or 1 (like $P(z|x, y)$ in this example) will be called a deterministic node matrix. CB nets representing OR and NOT gates can be defined analogously to the AND net. Then one can construct CB nets for any combination of AND, OR and NOT gates.
(b) constraint nodes
Figure 10 shows a net for the binary random variables $\underline{x}$, $\underline{y}$ and the random variable $\underline{z} \in \{0, 1, 2\}$. To fully specify this net, one must give the prior probabilities $P(x)$, $P(y)$ and the conditional probability $P(z|x, y)$. The node matrix $P(z|x, y)$ is given by the table of Fig.10. This node matrix is deterministic and agrees with $z = x + y$. Note that if we know for sure that $x + y = 1$, then we can "fix" the sum node $\underline{z}$ to one, and calculate probabilities, like $P(x|z = 1)$, which have $z = 1$ as evidence. In such a case, we call $\underline{z}$ a constraint node, because it is used to enforce the constraint $x + y = 1$.
As another example of a net that possesses a node which is useful to constrain, consider Fig.11. In this net the random variables $\underline{x}$, $\underline{y}$ and $\underline{z}$ are binary. To fully specify the net, one must give prior probabilities $P(x)$, $P(y)$ and the conditional probability $P(z|x, y)$. $P(z|x, y)$ is given by the table in Fig.11. This table is what one would expect if $z = (\text{if } x \text{ then } y)$. Since an "if $x$ then $y$" statement does not say anything about what to do when $x = 0$, the probabilities $P(z|x, y)$ are arbitrary and not necessarily deterministic when $x = 0$. The net of Fig.11 could be used by fixing the if-then node $\underline{z}$ to one and considering only probabilities with $z = 1$ as evidence. In such a case, we would say that the $\underline{z}$ node was a constraint node.
(c) clauser-horne experiment
Consider the Clauser-Horne experiment, which is used to observe violations of Bell-type inequalities[12]. Two particles, call them 1 and 2, are created at a common vertex and fly apart. Let $\lambda$ represent the "hidden variables". Particle 1 is subjected to a spin measurement along the direction $A$ with outcome $x_1^A \in \{+, -\}$ or along the direction $A'$ with outcome $x_1^{A'} \in \{+, -\}$. Particle 2 is subjected to a spin measurement along the direction $B$ with outcome $x_2^B \in \{+, -\}$ or along the direction $B'$ with outcome $x_2^{B'} \in \{+, -\}$. One can draw the net of Fig.12, which has nodes $\underline{\lambda}$, $\underline{x}_1^{\theta_1}$ and $\underline{x}_2^{\theta_2}$. (Figure 12 really represents 4 nets, one for each of the following possibilities: $(\theta_1, \theta_2) = (A, B), (A, B'), (A', B), (A', B')$.) To specify the net, one must give probabilities $P(\lambda)$, $P(x_1^{\theta_1}|\lambda)$ and $P(x_2^{\theta_2}|\lambda)$. Note that if one sums the net of Fig.12 over $\lambda$, one gets
$$P(x_1^{\theta_1}, x_2^{\theta_2}) = \sum_\lambda P(x_1^{\theta_1}|\lambda)\, P(x_2^{\theta_2}|\lambda)\, P(\lambda) , \qquad (3.1)$$

which is the starting point in the derivation of the Bell inequalities for the Clauser-Horne experiment.
As a slightly more complicated experiment, one might choose at random whether $\theta_1 = A$ or $A'$ and whether $\theta_2 = B$ or $B'$. Such an experiment can be represented by a CB net with nodes $\underline{\theta}_1$, $\underline{x}_1$, $\underline{\theta}_2$, $\underline{x}_2$ and $\underline{\lambda}$, where $\underline{x}_j$ for $j \in \{1, 2\}$ represents the outcome of a measurement on particle $j$. See Fig.13. This net is specified if we give probabilities $P(x_1|\theta_1, \lambda)$, $P(\theta_1)$, $P(x_2|\theta_2, \lambda)$, $P(\theta_2)$ and $P(\lambda)$. Summing the net over $\lambda$ and dividing by $P(\theta_1)P(\theta_2)$ yields an equation analogous to Eq.(3.1):

$$P(x_1, x_2|\theta_1, \theta_2) = \sum_\lambda P(x_1|\theta_1, \lambda)\, P(x_2|\theta_2, \lambda)\, P(\lambda) . \qquad (3.2)$$
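As a minimal numerical illustration of Eq.(3.2), the following Python sketch evaluates the right hand side for an invented hidden-variable distribution and invented local response functions (any choices satisfying Eq.(2.2) would do):

```python
from itertools import product
import numpy as np

# Minimal sketch of Eq.(3.2): a local hidden-variable model for the
# net of Fig.13. The distributions below are invented for illustration.
lams = [0, 1, 2]
P_lam = [0.2, 0.5, 0.3]                      # P(lambda)

def P_x(x, theta, lam):                      # P(x | theta, lambda)
    p_plus = np.cos(theta + lam) ** 2        # invented local response
    return p_plus if x == '+' else 1 - p_plus

def P_joint(x1, x2, th1, th2):               # right hand side of Eq.(3.2)
    return sum(P_x(x1, th1, lam) * P_x(x2, th2, lam) * P_lam[i]
               for i, lam in enumerate(lams))

# check normalization over outcomes, for fixed settings
print(sum(P_joint(a, b, 0.3, 1.1) for a, b in product('+-', repeat=2)))
```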
(d) random walk
Suppose that a particle moves in a straight line, taking unit length steps either forwards or backwards, with probabilities $p_+$ and $p_-$, respectively, where $p_+ + p_- = 1$. Let $x_j$ be the position of the particle at time $j$, with $j \in \{0, 1, 2, \cdots\}$. Assume $x_0$, the starting position, is zero. Define $\Delta x_j$ by $\Delta x_j = x_j - x_{j-1}$, for $j \in \{1, 2, \cdots\}$. Figure 14a shows a CB net that represents the probability distribution of $\underline{x}_0, \underline{x}_1, \cdots$ and $\underline{\Delta x}_1, \underline{\Delta x}_2, \cdots$. Clearly, $x_j \in \{0, \pm 1, \pm 2, \cdots, \pm j\}$ and $\Delta x_j \in \{\pm 1\}$. To fully specify the net of Fig.14a, we must give the probability matrices associated with each node. These matrices are

$$\begin{array}{l} P(x_0 = 0) = 1 , \\ P(\Delta x_j = \pm 1) = p_\pm \ \text{for } j = 1, 2, \cdots , \\ P(x_j = y|x_{j-1} = x, \Delta x_j = \pm 1) = \delta(y, x \pm 1) \ \text{for } j = 1, 2, \cdots , \end{array} \qquad (3.3)$$

where $\delta$ is the Kronecker delta function.
By summing the net of Fig.14a over all possible values of the indices $\Delta x_j$ that label the arrows from the $\underline{\Delta x}_j$ to the $\underline{x}_j$ nodes, one obtains Fig.14b. For this coarser net, one obtains

$$\begin{array}{l} P(x_0 = 0) = 1 , \\ P(x_j = y|x_{j-1} = x) = p_+ \delta(y, x + 1) + p_- \delta(y, x - 1) \ \text{for } j = 1, 2, \cdots . \end{array} \qquad (3.4)$$

One may go even further: For any $k \in \{0, 1, 2, \cdots\}$, one may sum Fig.14b over all $x_j$ such that $j \notin \{0, k\}$. One obtains the net of Fig.14c, with

$$P(x_0 = 0) = 1 , \qquad (3.5a)$$

$$P(x_k = x_k|x_0 = 0) = \binom{r+s}{r} p_+^r\, p_-^s , \qquad (3.5b)$$

where $r - s = x_k$ and $r + s = k$, and the first factor on the right hand side of Eq.(3.5b) is a combinatorial factor.
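One can verify Eq.(3.5b) numerically by propagating the coarser net of Fig.14b with Eq.(3.4). A minimal Python sketch, with invented step probabilities:

```python
from math import comb

# Minimal sketch: propagate the random-walk net of Fig.14b by repeated
# use of Eq.(3.4), then compare with the closed form Eq.(3.5b). The
# step probabilities are invented for illustration.
p_plus, p_minus = 0.6, 0.4
k = 6
dist = {0: 1.0}                      # P(x_0 = 0) = 1
for _ in range(k):                   # one application of Eq.(3.4) per step
    new = {}
    for x, p in dist.items():
        new[x + 1] = new.get(x + 1, 0.0) + p * p_plus
        new[x - 1] = new.get(x - 1, 0.0) + p * p_minus
    dist = new

for xk in sorted(dist):
    r, s = (k + xk) // 2, (k - xk) // 2     # r - s = x_k, r + s = k
    closed = comb(r + s, r) * p_plus**r * p_minus**s   # Eq.(3.5b)
    assert abs(dist[xk] - closed) < 1e-12
```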
4. THEORY OF QB NETS
In this section, we will define QB nets and explain how to use them to calculate quantum mechanical conditional probabilities. The next section will give examples of QB nets.
Like a CB net, a QB net consists of two parts: a labelled graph and a collection of node matrices. These two parts must satisfy certain constraints that will be specified below. External and internal arrows, external and internal nodes, and root nodes are all defined in the same way for QB nets as for CB nets. All nodes of a QB net must be either internal or external.
We define each node of a QB net to represent a numerical value, and the whole net to represent the product of all these node values. We assign a numerical value to each node as follows. First, we assign a random variable to each node. Suppose the random variables assigned to the $N$ nodes are $\underline{x}_1, \underline{x}_2, \cdots, \underline{x}_N$. Define $Z_N = \{1, 2, \cdots, N\}$. For any finite set $S$, let $|S|$ be the number of elements in $S$. If $S = \{k_1, k_2, \cdots, k_{|S|}\} \subset Z_N$, define $(x_\cdot)_S = (x_{k_1}, x_{k_2}, \cdots, x_{k_{|S|}})$ and $(\underline{x}_\cdot)_S = (\underline{x}_{k_1}, \underline{x}_{k_2}, \cdots, \underline{x}_{k_{|S|}})$. Sometimes, we also abbreviate $(x_\cdot)_{Z_N}$ (i.e., the vector that includes all the possible $x_j$ components) by just $x_\cdot$, and $(\underline{x}_\cdot)_{Z_N}$ by just $\underline{x}_\cdot$. For $j \in Z_N$, we imagine node $\underline{x}_j$ to lie in state $x_j$. We also imagine all arrows leaving the node $\underline{x}_j$ to lie in state $x_j$, and thus we label all of them $x_j$. At this point we've shown how to label each arrow in the graph by $x_k$ for some $k \in Z_N$. Define $S_j$ to be the set of all $k$ such that an arrow labelled $x_k$ enters node $\underline{x}_j$. Now we assign a complex number $A[x_j|(x_\cdot)_{S_j}]$ to node $\underline{x}_j$. $A[x_j|(x_\cdot)_{S_j}]$ is the node matrix for node $\underline{x}_j$; $x_j$ is the matrix's row index and $(x_\cdot)_{S_j}$ is its column index. We require that the quantities $A[x_j|(x_\cdot)_{S_j}]$ be probability amplitudes that satisfy

$$\sum_{x_j} \left| A[x_j|(x_\cdot)_{S_j}] \right|^2 = 1 , \qquad (4.1)$$
where the sum in Eq.(4.1) is over all states $x_j$ that the random variable $\underline{x}_j$ can assume, and where Eq.(4.1) must be satisfied for all $j \in Z_N$ and for all possible values of the vector $(x_\cdot)_{S_j}$ of random variables.
The QB net is taken to represent the product of all the probability amplitudes $A[x_j|(x_\cdot)_{S_j}]$ for $j \in Z_N$. This product is a function $A(x_\cdot)$ of the current states $x_1, x_2, \cdots, x_N$ of the nodes. Thus,

$$A(x_\cdot) = \prod_{j \in Z_N} A[x_j|(x_\cdot)_{S_j}] . \qquad (4.2)$$
Let $Z_N^{ext}$ be the set of all $j \in Z_N$ such that $\underline{x}_j$ is an external node, and let $Z_N^{int}$ be the set of those $j \in Z_N$ such that $\underline{x}_j$ is an internal node. Clearly, $Z_N^{ext}$ and $Z_N^{int}$ are disjoint and their union is $Z_N$. We require $A(x_\cdot)$ to satisfy

$$\sum_{(x_\cdot)_{Z_N^{ext}}} \left| \sum_{(x_\cdot)_{Z_N^{int}}} A(x_\cdot) \right|^2 = 1 \qquad (4.3)$$

and

$$\sum_{x_\cdot} |A(x_\cdot)|^2 = 1 . \qquad (4.4)$$
Note that as a consequence of Eqs.(4.1) and (4.4), given any QB net, one can construct a special CB net by replacing the value $A[x_j|(x_\cdot)_{S_j}]$ of each node by its magnitude squared. We call this special CB net the parent CB net of the QB net from which it was constructed. We call it so because, given a parent CB net, one can replace the value of each node by its square root times a phase factor. For a different choice of phase factors, one generates a different QB net. Thus, a parent CB net may be used to generate a whole family of QB nets.
A QB pre-net is a labelled graph and an accompanying set of node matrices that satisfy Eqs.(4.1), (4.2) and (4.3), but don't necessarily satisfy Eq.(4.4). A QB pre-net that is acyclic satisfies Eq.(4.4), because its parent CB pre-net is acyclic and this implies that Eq.(4.4) is satisfied. If one considers only acyclic graphs, as we shall do henceforth, then there is no difference between QB nets and QB pre-nets.
In the second quantized formulation of quantum mechanics, one speaks of $M$ modes represented by $M$ annihilation operators $a_1, a_2, \cdots, a_M$, which obey certain commutation relations amongst themselves. Define $Z_{0+} = \{0, 1, 2, \cdots\}$ and let $n_i \in Z_{0+}$ for $i = 1$ to $M$. For $M$ modes, one uses quantum states like

$$\frac{a_1^{\dagger n_1}}{\sqrt{n_1!}}\, \frac{a_2^{\dagger n_2}}{\sqrt{n_2!}} \cdots \frac{a_M^{\dagger n_M}}{\sqrt{n_M!}}\, |0\rangle . \qquad (4.5)$$
The state given by Eq.(4.5) is specified by a vector $(n_1, n_2, \cdots, n_M)$ of occupation numbers. For the rest of this paper, we will consider only QB nets whose states $x_j$ are vectors $(n_{j,1}, n_{j,2}, \cdots, n_{j,K_j})$ of occupation numbers. Define $\Gamma$ to be the set of all $\alpha$ such that $n_\alpha$ is an occupation number of the QB net under consideration. For example, $\alpha = (j, 1)$ in $n_{j,1}$, where $n_{j,1}$ is a component of $x_j = (n_{j,1}, n_{j,2}, \cdots, n_{j,K_j})$. For any set $\Gamma' = \{\alpha_1, \alpha_2, \cdots, \alpha_{|\Gamma'|}\} \subset \Gamma$, let $(n_\cdot)_{\Gamma'} = (n_{\alpha_1}, n_{\alpha_2}, \cdots, n_{\alpha_{|\Gamma'|}})$ and $(\underline{n}_\cdot)_{\Gamma'} = (\underline{n}_{\alpha_1}, \underline{n}_{\alpha_2}, \cdots, \underline{n}_{\alpha_{|\Gamma'|}})$. We will sometimes abbreviate $(n_\cdot)_\Gamma$ by $n_\cdot$ and $(\underline{n}_\cdot)_\Gamma$ by $\underline{n}_\cdot$. For each $j$, define $\Gamma_j$ to be the set such that $(n_\cdot)_{\Gamma_j}$ is the occupation number vector specifying the state of node $\underline{x}_j$. Thus, $x_j = (n_\cdot)_{\Gamma_j}$ and $\underline{x}_j = (\underline{n}_\cdot)_{\Gamma_j}$. If $\underline{x}_j$ is an external node, call all the components of $(\underline{n}_\cdot)_{\Gamma_j}$ external occupation numbers, and if $\underline{x}_j$ is an internal node, call all the components of $(\underline{n}_\cdot)_{\Gamma_j}$ internal occupation numbers. Let $\Gamma^{ext}$ be the set of all $\alpha$ such that $\underline{n}_\alpha$ is an external occupation number, and let $\Gamma^{int}$ be the
set of all $\alpha$ such that $\underline{n}_\alpha$ is an internal occupation number. Clearly, $\Gamma^{int}$ and $\Gamma^{ext}$ are disjoint and their union is $\Gamma$. Note that $(x_\cdot)_{Z_N^{ext}} = (n_\cdot)_{\Gamma^{ext}}$, $(x_\cdot)_{Z_N^{int}} = (n_\cdot)_{\Gamma^{int}}$, and $(x_\cdot)_{Z_N} = (n_\cdot)_\Gamma$ (analogous statements with $x$ and $n$ underlined to indicate random variables also hold).
Consider a classical probability distribution $P(n_\cdot)$. For any $\Gamma_0 \subset \Gamma$, we may define the characteristic function $\chi_c$ by

$$\chi_c[(n_\cdot)_{\Gamma_0}] = \sum_{m_\cdot} P(m_\cdot) \prod_{\alpha \in \Gamma_0} \delta(m_\alpha, n_\alpha) , \qquad (4.6)$$
where $\delta$ is the Kronecker delta function. (The "c" subscript in $\chi_c$ stands for "classical".) If $\Gamma_H$ and $\Gamma_E$ are disjoint subsets of $\Gamma$, one defines the classical conditional probability that $(\underline{n}_\cdot)_{\Gamma_H} = (n_\cdot)_{\Gamma_H}$ (hypothesis) given that $(\underline{n}_\cdot)_{\Gamma_E} = (n'_\cdot)_{\Gamma_E}$ (evidence), by

$$P[(\underline{n}_\cdot)_{\Gamma_H} = (n_\cdot)_{\Gamma_H}|(\underline{n}_\cdot)_{\Gamma_E} = (n'_\cdot)_{\Gamma_E}] = \frac{\chi_c[(n_\cdot)_{\Gamma_H}, (n'_\cdot)_{\Gamma_E}]}{\chi_c[(n'_\cdot)_{\Gamma_E}]} . \qquad (4.7)$$
By setting $\Gamma_E = \phi$ in the last equation, we conclude that for any $\Gamma_0 \subset \Gamma$,

$$P[(\underline{n}_\cdot)_{\Gamma_0} = (n_\cdot)_{\Gamma_0}] = \chi_c[(n_\cdot)_{\Gamma_0}] . \qquad (4.8)$$

According to the definition Eq.(4.6), if $\Gamma_0$ and $\Gamma'_0$ are disjoint subsets of $\Gamma$, then

$$\sum_{(n'_\cdot)_{\Gamma'_0}} \chi_c[(n_\cdot)_{\Gamma_0}, (n'_\cdot)_{\Gamma'_0}] = \chi_c[(n_\cdot)_{\Gamma_0}] . \qquad (4.9)$$
Equations (4.8) and (4.9) imply that

$$\sum_{(n'_\cdot)_{\Gamma'_0}} P[(n_\cdot)_{\Gamma_0}, (n'_\cdot)_{\Gamma'_0}] = P[(n_\cdot)_{\Gamma_0}] , \qquad (4.10)$$

for $\Gamma_0, \Gamma'_0 \subset \Gamma$, $\Gamma_0 \cap \Gamma'_0 = \phi$. Equation (4.9), when applied to Eq.(4.7), yields

$$\sum_{(n_\cdot)_{\Gamma_H}} P[(n_\cdot)_{\Gamma_H}|(n'_\cdot)_{\Gamma_E}] = 1 . \qquad (4.11)$$
Furthermore, from Eq.(4.7) it is obvious that

$$P[(n_\cdot)_{\Gamma_H}|(n'_\cdot)_{\Gamma_E}] \geq 0 . \qquad (4.12)$$
Equations (4.11) and (4.12) imply that Eq.(4.7) is an adequate definition of conditional probability.
Now consider a quantum mechanical probability amplitude $A(n_\cdot)$. For any $\Gamma_0 \subset \Gamma$, we define the characteristic function $\chi$ by

$$\chi[(n_\cdot)_{\Gamma_0}] = \sum_{(m_\cdot)_{\Gamma^{ext}}} \left| \sum_{(m_\cdot)_{\Gamma^{int}}} A(m_\cdot) \prod_{\alpha \in \Gamma_0} \delta(m_\alpha, n_\alpha) \right|^2 . \qquad (4.13)$$

If $\Gamma_H$ and $\Gamma_E$ are disjoint subsets of $\Gamma$, one defines the quantum mechanical conditional probability that $(\underline{n}_\cdot)_{\Gamma_H} = (n_\cdot)_{\Gamma_H}$ (hypothesis) given that $(\underline{n}_\cdot)_{\Gamma_E} = (n'_\cdot)_{\Gamma_E}$ (evidence), by

$$P[(\underline{n}_\cdot)_{\Gamma_H} = (n_\cdot)_{\Gamma_H}|(\underline{n}_\cdot)_{\Gamma_E} = (n'_\cdot)_{\Gamma_E}] = \frac{\chi[(n_\cdot)_{\Gamma_H}, (n'_\cdot)_{\Gamma_E}]}{\sum_{(m_\cdot)_{\Gamma_H}} \chi[(m_\cdot)_{\Gamma_H}, (n'_\cdot)_{\Gamma_E}]} . \qquad (4.14)$$
Note that the denominator of the right hand side of Eq.(4.14) depends on both $\Gamma_E$ and $\Gamma_H$, unlike the analogous denominator in the classical definition Eq.(4.7). In quantum mechanics, with $\chi_c$ replaced by $\chi$, Eq.(4.8) is not true for all $\Gamma_0 \subset \Gamma$ (although it is true if $\Gamma_0 \subset \Gamma^{ext}$). In quantum mechanics, with $\chi_c$ replaced by $\chi$, Eqs.(4.9) and (4.10) are not necessarily true, whereas Eqs.(4.11) and (4.12) are obviously always true. Because Eqs.(4.11) and (4.12) are satisfied in quantum mechanics, Eq.(4.14) is an adequate definition of conditional probability.
Note that both the classical and quantum mechanical definitions of conditional probability, Eqs.(4.7) and (4.14), implicitly assume that we perform non-destructive measurements on any internal nodes that might be measured. Hence, if a particle is detected at an internal node, it is allowed to continue past that node so that it can reach other nodes further downstream. If one wishes to perform a destructive measurement on a particle when it passes through an internal node $\underline{x}$, then one is representing the physical situation by the wrong CB or QB net; what is required is a net that has the node $\underline{x}$ as an external node.
In Appendix A, the classical and quantum mechanical definitions of conditional probability Eqs.(4.7) and (4.14) are generalized so that they allow either sharp or fuzzy hypotheses and pieces of evidence. If the evidence narrows the set of possible values for $n_1$ to a single number, say $n_1 = 1$, then we say that the evidence for $n_1$ is sharp. If the evidence doesn't do this, then we say that the evidence for $n_1$ is fuzzy. Sharp and fuzzy hypotheses are defined analogously.
Note that if we had used Eq.(4.7) (with $\chi_c$ replaced by $\chi$) as the quantum mechanical definition of conditional probability, there would have been no guarantee that Eq.(4.11) would be satisfied. For given $\Gamma_H$, $\Gamma_E$ and $(n'_\cdot)_{\Gamma_E}$, define the quantum non-additivity factor $f_{qna}$ by

$$f_{qna}[\Gamma_H, \Gamma_E, (n'_\cdot)_{\Gamma_E}] = \frac{\sum_{(m_\cdot)_{\Gamma_H}} \chi[(m_\cdot)_{\Gamma_H}, (n'_\cdot)_{\Gamma_E}]}{\chi[(n'_\cdot)_{\Gamma_E}]} . \qquad (4.15)$$

This quantity will be calculated for the examples in the next section. If $f_{qna} = 1$, then Eq.(4.7) (with $\chi_c$ replaced by $\chi$) and Eq.(4.14) agree; if $f_{qna} \neq 1$, then Eq.(4.7) does not give a well defined probability distribution for $(\underline{n}_\cdot)_{\Gamma_H}$ whereas Eq.(4.14) does.
In quantum mechanics, χ[(n· )ΓH , (n′· )ΓE ] is proportional to the number of occurrences of (n· )ΓH = (n· )ΓH and (n· )ΓE = (n′· )ΓE in an experiment that measures all the ΓH and ΓE nodes. χ[(n′· )ΓE ] is proportional to the number of occurrences of (n· )ΓE = (n′· )ΓE in an experiment that measures only the ΓE nodes, leaving the ΓH nodes undisturbed. Thus, Eq.(4.7) (with χc replaced by χ) refers to the ratio of the number of occurrences in two different types of experiments (one type measuring the ΓH ∪ ΓE nodes and the other only the ΓE nodes). On the other hand, Eq.(4.14) refers to the ratio of occurrences in a single type of experiment (measuring the ΓH ∪ ΓE nodes). It seems to us that the latter ratio is the more useful of the two. By using CB and QB nets, one is led easily and naturally to calculate probabilities by considering sums over paths, the type of sums advocated by Feynman for quantum mechanics and by Kac for Brownian motion. Indeed, one can express the classical and quantum mechanical definitions of conditional probability Eqs.(4.7) and (4.14) in terms of sums over paths rather than sums over node states. We do so for arbitrary CB and QB nets in Appendix B. Specific examples illustrating the affinity of QB nets with sums over paths can be found in the next section and in Appendix C. Appendix C presents a QB net which yields the Feynman path integral for a single mass particle under the influence of an arbitrary potential. Using an approach similar to that of Appendix C, it should be possible to define QB nets that yield the Feynman integrals employed in non-relativistic and relativistic quantum field theories.
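Although the computer program used in this paper is not listed here, the definitions above are easy to prototype. The following minimal Python sketch implements Eqs.(4.13)-(4.15) for an invented three-variable toy amplitude (variable 0 internal, variables 1 and 2 external); the amplitude table is fabricated for illustration, chosen so that it satisfies Eqs.(4.3) and (4.4):

```python
from itertools import product

# Minimal sketch of Eqs.(4.13)-(4.15) for a toy QB net whose occupation
# numbers are all binary. amp(m) plays the role of A(m.) of Eq.(4.2);
# the table is invented, chosen to satisfy Eqs.(4.3) and (4.4).
ext, intr = [1, 2], [0]          # external / internal variable positions
table = {(0, 0, 0): .5, (0, 1, 1): .5j, (1, 0, 1): .5, (1, 1, 0): -.5j}

def amp(m):                      # A(m.)
    return table.get(m, 0)

def chi(fixed):                  # Eq.(4.13); fixed maps index -> value
    total = 0.0
    for e in product([0, 1], repeat=len(ext)):
        a = 0j
        for i in product([0, 1], repeat=len(intr)):
            m = [0, 0, 0]
            for pos, v in zip(ext, e):
                m[pos] = v
            for pos, v in zip(intr, i):
                m[pos] = v
            # the product of Kronecker deltas in Eq.(4.13):
            if all(m[pos] == v for pos, v in fixed.items()):
                a += amp(tuple(m))
        total += abs(a) ** 2
    return total

def cond_prob(hypo, evid):       # Eq.(4.14); hypo, evid are dicts
    hvars = list(hypo)
    den = sum(chi({**dict(zip(hvars, vals)), **evid})
              for vals in product([0, 1], repeat=len(hvars)))
    return chi({**hypo, **evid}) / den

def f_qna(hvars, evid):          # Eq.(4.15), quantum non-additivity factor
    num = sum(chi({**dict(zip(hvars, vals)), **evid})
              for vals in product([0, 1], repeat=len(hvars)))
    return num / chi(evid)

print(cond_prob({1: 0}, {2: 1}), f_qna([1], {2: 1}))
```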
5. EXAMPLES OF QB NETS
In this section, we will present the results of a computer program that calculates conditional probabilities for QB nets. In particular, we shall consider QB nets for experiments containing either two or three Stern-Gerlach magnets. We will restrict our attention to experiments involving a single particle. For simplicity, for each experimental configuration, we will assume that the magnetic field vectors of all the Stern-Gerlach magnets are coplanar.
Figures 15, 16 and 17 show the three kinds of nodes that will be used in this section. In Fig.15, the triangle represents a root node. This node will stand for $\psi_{n_{z-} n_{z+}}$, with $(n_{z-}, n_{z+}) \in \{(0, 1), (1, 0)\}$ and $|\psi_{01}|^2 + |\psi_{10}|^2 = 1$. $\psi_{n_{z-} n_{z+}}$ is just the initial wavefunction for the single particle under consideration. In Fig.16a, the black-filled circle represents a marginalizer node. The single incoming arrow is in a state characterized by a vector $(n_1, n_2, \cdots, n_K)$ of occupation numbers. The outgoing arrow is in a state characterized by a single occupation number $n'_1$. The amplitude associated with the node is

$$A(n'_1|n_1, n_2, \cdots, n_K) = \delta(n'_1, n_1) . \qquad (5.1a)$$
Thus, a marginalizer node takes a vector of occupation numbers and projects out one of its components. Note that Eq.(5.1a) satisfies Eq.(4.1). In Fig.16b, the black-filled circle with a phase factor next to it represents a phase shifter node. The single incoming (outgoing) arrow is in a state characterized by a single occupation number $n_1$ ($n'_1$). The amplitude associated with the node is

$$A(n'_1|n_1) = e^{i\xi}\, \delta(n'_1, n_1) , \qquad (5.1b)$$
where $\xi$ is a real constant. Note that Eq.(5.1b) satisfies Eq.(4.1). In Figs.17, the white-filled circles represent Stern-Gerlach magnets. The outgoing arrows are labelled by a vector $n^\alpha_u = (n^\alpha_{u-}, n^\alpha_{u+})$ of occupation numbers. Here the unit vector $\hat{u}$ represents the direction of the magnet's magnetic field, and $\alpha$ labels the magnet ($\hat{u}$ is not enough to label the magnet if the experiment contains more than one magnet whose magnetic field points along the $\hat{u}$ direction). The vector $n^\alpha_u \in \{(0, 0), (0, 1), (1, 0)\}$ specifies a state (see Appendix D)

$$(a^{\alpha\dagger}_{u-})^{n^\alpha_{u-}}\, (a^{\alpha\dagger}_{u+})^{n^\alpha_{u+}}\, |0\rangle . \qquad (5.2)$$

The creation operators $a^{\alpha\dagger}_{u-}$ and $a^{\alpha\dagger}_{u+}$ create particles in the $|-_u\rangle$ and $|+_u\rangle$ states respectively. In Fig.17a there is one arrow entering the node, whereas in Fig.17b there are two. In general, since we are considering a single particle experiment, there may be any number of arrows entering the node, but only one may be in state 1; all others must be in state 0. In Figs.17a and 17b, the amplitude assigned to each node is given by a table next to the graph. Clearly, the tables in Figs.17a and 17b both satisfy Eq.(4.1). For each experimental configuration, we will assume that the magnetic fields of all magnets are coplanar. The plane containing these magnetic field vectors may be chosen to be the X-Z plane with $\phi = 0$. Thus, the matrix elements in the tables of Figs.17a and 17b are given in terms of angular parameters by Eqs.(D.7) and (D.8).
(a) experiments with 2 stern-gerlach magnets
We will consider 2 configurations with 2 magnets: Fig.18 (the tree diagram) and Fig.19 (the loop diagram). The diagrams are like road maps, with the arrows representing the various roads along which the particle may travel.
A single particle travelling through the experimental configuration of Fig.18 could exit through either the $n_{z-}$, $n_{u-}$ or $n_{u+}$ nodes. There is only one possible path leading to each of these outcomes. Thus, one has

$$\begin{array}{ll} FI_1 = A(\pi_1) , & A(\pi_1) = \psi_{10} , \\ FI_2 = A(\pi_2) , & A(\pi_2) = \langle -_u|+_z\rangle\, \psi_{01} , \\ FI_3 = A(\pi_3) , & A(\pi_3) = \langle +_u|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.3)$$
For $i \in \{1, 2, 3\}$, $A(\pi_i)$ is the amplitude for path $\pi_i$. For $j \in \{1, 2, 3\}$, $FI_j$ is a Feynman integral; that is, the sum of the amplitudes for all paths with a given final state. (See Appendix B.) We have already checked that Eq.(4.1) is satisfied. We did so when we defined the wavefunction, marginalizer and Stern-Gerlach nodes. It is easy to show that $\sum_{j=1}^{3} |FI_j|^2 = 1$, so Eq.(4.3) is also satisfied for this net.
A single particle travelling through the experimental configuration of Fig.19 could exit through either the $n_{u-}$ or the $n_{u+}$ nodes. There are two possible paths leading to each of these two outcomes. Thus, one has

$$\begin{array}{lll} FI_1 = A(\pi_1) + A(\pi_2) , & A(\pi_1) = \langle -_u|-_z\rangle\, \psi_{10} , & A(\pi_2) = \langle -_u|+_z\rangle\, \psi_{01} , \\ FI_2 = A(\pi_3) + A(\pi_4) , & A(\pi_3) = \langle +_u|-_z\rangle\, \psi_{10} , & A(\pi_4) = \langle +_u|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.4)$$
As before, for $i \in \{1, 2, 3, 4\}$, $A(\pi_i)$ is the amplitude for path $\pi_i$, and for $j \in \{1, 2\}$, $FI_j$ is a Feynman integral. It is easy to show that $\sum_{j=1}^{2} |FI_j|^2 = 1$, so Eq.(4.3) is satisfied by this net.
We will call an arrow or a node simple if its state is characterized by a single occupation number (for example, a marginalizer node and its outgoing arrow are both simple). We will say that a net is fully marginalized if the incoming arrows of any node that is not a marginalizer are all simple, and all external arrows are simple. The nets of Figs.18 and 19, and, in fact, all the nets considered in this section, are fully marginalized.
For each of the nets of Figs.18 and 19, we used a computer program to calculate probabilities with one or two hypotheses and with zero, one or two pieces of evidence. More precisely, for the 2 QB nets in Figs.18 and 19, and for their parent CB nets, for $\underline{n}, \underline{n}', \underline{m}$ and $\underline{m}' \in \{\underline{n}_{u+}, \underline{n}_{u-}, \underline{n}_{z+}, \underline{n}_{z-}\}$, for all $n, n', m$ and $m' \in \{0, 1\}$, we calculated (using Eqs.(4.7) or (4.14)) the unconditional probabilities $P(n)$ and $P(n, n')$, and the following conditional probabilities:

$$P(\underline{n} = n|\underline{m} = m) , \qquad (5.5)$$

$$P(\underline{n} = n|\underline{m} = m, \underline{m}' = m') , \qquad (5.6)$$

$$P(\underline{n} = n, \underline{n}' = n'|\underline{m} = m) , \qquad (5.7)$$

$$P(\underline{n} = n, \underline{n}' = n'|\underline{m} = m, \underline{m}' = m') . \qquad (5.8)$$
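Probabilities like these can also be checked by hand from Eq.(5.4). The following minimal Python sketch computes the two Feynman integrals of the loop net of Fig.19, using the parameter values of Eq.(5.9); since the tables of Eqs.(D.7) and (D.8) are not reproduced in this section, we assume the standard spin-$\frac{1}{2}$ overlaps for fields in the X-Z plane:

```python
import numpy as np

# Minimal sketch of Eq.(5.4) for the loop net of Fig.19. Assumed overlap
# convention (Appendix D is not reproduced here):
#   <+u|+z> = cos(t/2), <+u|-z> = sin(t/2),
#   <-u|+z> = -sin(t/2), <-u|-z> = cos(t/2),  t = theta_u - theta_z.
t = np.pi / 5                                   # value from Eq.(5.9)
psi01, psi10 = (1 + 1j) / 2, 1 / np.sqrt(2)     # values from Eq.(5.9)

FI1 = np.cos(t / 2) * psi10 + (-np.sin(t / 2)) * psi01   # paths pi_1 + pi_2
FI2 = np.sin(t / 2) * psi10 + np.cos(t / 2) * psi01      # paths pi_3 + pi_4

print(abs(FI1) ** 2 + abs(FI2) ** 2)    # = 1, so Eq.(4.3) holds
print(abs(FI1) ** 2, abs(FI2) ** 2)     # P(n_u- = 1) and P(n_u+ = 1)
```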
The table of Fig.20, call it the evidence-case file, gives the various sets of evidence that were considered. For example, in case 2 (the row that starts with a 2), we assumed $n_{z+} = 0$, whereas the values of the remaining occupation numbers ($n_{z-}$, $n_{u+}$, $n_{u-}$) were assumed to be unknown (i.e., we assumed there was no evidence
as to whether they were 0 or 1, and this is indicated in Fig.20 with a blank space). In case 11, we assumed $n_{z-} = 1$ and $n_{z+} = 0$, whereas the values of the remaining occupation numbers ($n_{u+}$, $n_{u-}$) were assumed to be unknown. Note that for the evidence-cases 2 to 9 we considered 1 piece of evidence (as in Eqs.(5.5) and (5.7)), and for the evidence-cases 10 to 33 we considered 2 pieces of evidence (as in Eqs.(5.6) and (5.8)).
To get numerical values for the probabilities associated with the nets Figs.18 and 19, particular values had to be assumed for the initial wavefunction and for the magnetic field direction of each magnet. We used

$$\psi_{01} = \frac{1+i}{2} , \quad \psi_{10} = \frac{1}{\sqrt{2}} , \quad \theta_u - \theta_z = \frac{\pi}{5} . \qquad (5.9)$$

For the tree graph of Fig.18, all the probability distributions for the QB net and for its parent CB net were identical. This is not surprising since tree graphs, i.e., graphs without any loops, have no interfering paths. That is, for a given final state, there is only one possible path that produces that final state.
More interesting results were obtained for the loop graph of Fig.19. Figures 21 and 22 show our computer program's output for the loop net, for cases 1, 2, 4, 10, 12 (see evidence-case file Fig.20). Figure 21 gives one-hypothesis probabilities of the type of Eqs.(5.5) and (5.6), whereas Fig.22 gives two-hypotheses probabilities of the type of Eqs.(5.7) and (5.8).
First consider Fig.21. Columns A to D refer to the parent CB net and columns F to I to the QB net. Column A for the CB net (F for the QB net) gives the identity of the occupation number $\underline{n}$ in Eqs.(5.5) and (5.6). Columns B and C for the CB net (G and H for the QB net) give the probabilities that $n = 0$ and $n = 1$, respectively, in light of the evidence. Column D for the CB net (I for the QB net) gives the quantity $f_{qna}$ defined by Eq.(4.15). Note that for evidence-case 10, there was no output, because the computer program detected a contradiction. (In case 10, we were assuming that $n_{z+} = n_{z-} = 0$, which is impossible since $n_{z+} + n_{z-} = 1$.) More interesting is the evidence-case 4. In this case, for the QB net, column I indicates that $f_{qna} \neq 1$ for the distributions $P(n_{z+}|n_{u+} = 0)$ and $P(n_{z-}|n_{u+} = 0)$.[13]
Next consider Fig.22. Columns A to F refer to the parent CB net and columns H to M to the QB net. Column A for the CB net (H for the QB net) gives the identity of the occupation numbers $\underline{n}$ and $\underline{n}'$ in Eqs.(5.7) and (5.8). Columns B to E for the CB net (I to L for the QB net) give the probabilities that $(n, n') = (0, 0), (0, 1), (1, 0), (1, 1)$, respectively. Column F for the CB net (M for the QB net) gives $f_{qna}$. For example, in Fig.22 we see that for evidence-case 4, for the QB net, all the probability distributions except $P(n_{u+}, n_{u-}|n_{u+} = 0)$ have $f_{qna} \neq 1$.[13]
(b) experiments with three stern-gerlach magnets
We will consider 7 configurations with 3 Stern-Gerlach magnets: Figs.23 to 29.
For Fig.23, define
$$\begin{array}{ll} FI_1 = A(\pi_1) , & A(\pi_1) = \psi_{10} , \\ FI_2 = A(\pi_2) , & A(\pi_2) = \langle -_v|+_z\rangle\, \psi_{01} , \\ FI_3 = A(\pi_3) , & A(\pi_3) = \langle -_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} , \\ FI_4 = A(\pi_4) , & A(\pi_4) = \langle +_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.10)$$

It is easy to show that $\sum_{j=1}^{4} |FI_j|^2 = 1$, so Eq.(4.3) is satisfied by this net.
For Fig.24, define

$$\begin{array}{l} FI_1 = A(\pi_1) , \quad A(\pi_1) = \psi_{10} , \\ FI_2 = A(\pi_2) + A(\pi_3) , \quad A(\pi_2) = \langle -_u|-_v\rangle \langle -_v|+_z\rangle\, \psi_{01} , \quad A(\pi_3) = \langle -_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} , \\ FI_3 = A(\pi_4) + A(\pi_5) , \quad A(\pi_4) = \langle +_u|-_v\rangle \langle -_v|+_z\rangle\, \psi_{01} , \quad A(\pi_5) = \langle +_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.11)$$

It is easy to show that $\sum_{j=1}^{3} |FI_j|^2 = 1$, so Eq.(4.3) is satisfied by this net.
For Fig.25, define

$$\begin{array}{l} FI_1 = A(\pi_1) + A(\pi_2) , \quad A(\pi_1) = \langle -_v|-_z\rangle\, \psi_{10} , \quad A(\pi_2) = \langle -_v|+_z\rangle\, \psi_{01} , \\ FI_2 = A(\pi_3) + A(\pi_4) , \quad A(\pi_3) = \langle -_u|+_v\rangle \langle +_v|-_z\rangle\, \psi_{10} , \quad A(\pi_4) = \langle -_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} , \\ FI_3 = A(\pi_5) + A(\pi_6) , \quad A(\pi_5) = \langle +_u|+_v\rangle \langle +_v|-_z\rangle\, \psi_{10} , \quad A(\pi_6) = \langle +_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.12)$$

It is easy to show that $\sum_{j=1}^{3} |FI_j|^2 = 1$, so Eq.(4.3) is satisfied by this net.
For Fig.26, define

$$\begin{array}{l} FI_1 = A(\pi_1) + A(\pi_2) + A(\pi_3) + A(\pi_4) , \\ \quad A(\pi_1) = \langle -_u|+_v\rangle \langle +_v|-_z\rangle\, \psi_{10} , \quad A(\pi_2) = \langle -_u|-_v\rangle \langle -_v|-_z\rangle\, \psi_{10} , \\ \quad A(\pi_3) = \langle -_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} , \quad A(\pi_4) = \langle -_u|-_v\rangle \langle -_v|+_z\rangle\, \psi_{01} , \\ FI_2 = A(\pi_5) + A(\pi_6) + A(\pi_7) + A(\pi_8) , \\ \quad A(\pi_5) = \langle +_u|+_v\rangle \langle +_v|-_z\rangle\, \psi_{10} , \quad A(\pi_6) = \langle +_u|-_v\rangle \langle -_v|-_z\rangle\, \psi_{10} , \\ \quad A(\pi_7) = \langle +_u|+_v\rangle \langle +_v|+_z\rangle\, \psi_{01} , \quad A(\pi_8) = \langle +_u|-_v\rangle \langle -_v|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.13)$$

It is easy to show that $\sum_{j=1}^{2} |FI_j|^2 = 1$, so Eq.(4.3) is satisfied by this net.
For Fig.27, define

$$\begin{array}{ll} FI_1 = A(\pi_1) , & A(\pi_1) = \langle -_v|-_z\rangle\, \psi_{10} , \\ FI_2 = A(\pi_2) , & A(\pi_2) = \langle +_v|-_z\rangle\, \psi_{10} , \\ FI_3 = A(\pi_3) , & A(\pi_3) = \langle -_u|+_z\rangle\, \psi_{01} , \\ FI_4 = A(\pi_4) , & A(\pi_4) = \langle +_u|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.14)$$

It is easy to show that $\sum_{j=1}^{4} |FI_j|^2 = 1$, so Eq.(4.3) is satisfied by this net.
For Fig.28, define

$$\begin{array}{l} FI_1 = A(\pi_1) , \quad A(\pi_1) = \langle -_v|-_z\rangle\, \psi_{10} , \\ FI_2 = A(\pi_2) + A(\pi_3) , \quad A(\pi_2) = e^{i\xi} \langle +_u|+_v\rangle \langle +_v|-_z\rangle\, \psi_{10} , \quad A(\pi_3) = \langle +_u|+_z\rangle\, \psi_{01} , \\ FI_3 = A(\pi_4) + A(\pi_5) , \quad A(\pi_4) = e^{i\xi} \langle -_u|+_v\rangle \langle +_v|-_z\rangle\, \psi_{10} , \quad A(\pi_5) = \langle -_u|+_z\rangle\, \psi_{01} . \end{array} \qquad (5.15)$$
∗ ψ01 ψ10 , ∗ |ψ01 ψ10 |
20
(5.16)
then
P3
j=1 |F Ij |
2
= 1.[14] For Fig.29, define F I1 = A(π1 ) + A(π2 ) + A(π3 ) , A(π1 ) = h+u |+z iψ01 , A(π2 ) = eiξ h+u |+v ih+v |−z iψ10 , A(π3 ) = eiξ h+u |−v ih−v |−z iψ10 , F I2 = A(π4 ) + A(π5 ) + A(π6 ) , A(π4 ) = h−u |+z iψ01 , A(π5 ) = eiξ h−u |+v ih+v |−z iψ10 , A(π6 ) = eiξ h−u |−v ih−v |−z iψ10 .
(5.17)
It is easy to show that 2j=1 |F Ij |2 = 1 for any value of ξ. However, we believe the correct choice of ξ to be the one given by Eq.(5.16). For this choice, P (nu− = 1|nv− = 0) is the same for Figs.28 and 29. And this is what we expect. Otherwise, empty De Broglie waves could influence the outcome of an experiment (and thus could be detected), which does not appear to be the case experimentally. For the QB nets of Figs.23 to 29, and for their parent CB nets, for n, n′ , m and m′ ∈ {nu± , nv± , nz± }, for all n, n′ , m and m′ ∈ {0, 1}, we calculated the conditional probabilities of Eqs.(5.5) to (5.8). The table of Fig.30 is an evidence-case file that gives the sets of evidence that were considered for the nets of Figs.23 to 29. For example, in case 2 ( the row that starts with a 2), we assumed nz+ = 0, whereas the values of the remaining occupation numbers (nu± , nv± , nz− ) were assumed to be unknown. To get numerical values for the probabilities associated with the nets of Figs.23 to 29, particular values for ψ01 , ψ10 , θu , θv , θz were assumed. For the tree graphs Figs.23 and 27, and also for the non-tree graph Fig.28, we found that the QB net and its parent CB net yielded identical probability distributions. P
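The role of the phase factor in Fig.28 can be checked numerically. The following minimal Python sketch (using the same assumed X-Z-plane overlap convention as the earlier sketch, and invented angles $\theta_u, \theta_v, \theta_z$) shows that $\sum_j |FI_j|^2 \neq 1$ for $\xi = 0$ but equals 1 for the choice of Eq.(5.16):

```python
import numpy as np

# Numerical check of the phase-factor discussion around Eq.(5.16) for
# the net of Fig.28. Overlap convention and theta values are assumptions.
def pp(a, b):  return np.cos((a - b) / 2)    # <+a|+b>
def pm(a, b):  return np.sin((a - b) / 2)    # <+a|-b>
def mp(a, b):  return -np.sin((a - b) / 2)   # <-a|+b>
def mm(a, b):  return np.cos((a - b) / 2)    # <-a|-b>

tu, tv, tz = 0.9, 0.4, 0.0                   # invented angles
psi01, psi10 = (1 + 1j) / 2, 1 / np.sqrt(2)  # values from Eq.(5.9)

def total_prob(xi):                          # sum_j |FI_j|^2 for Eq.(5.15)
    e = np.exp(1j * xi)
    FI1 = mm(tv, tz) * psi10
    FI2 = e * pp(tu, tv) * pm(tv, tz) * psi10 + pp(tu, tz) * psi01
    FI3 = e * mp(tu, tv) * pm(tv, tz) * psi10 + mp(tu, tz) * psi01
    return abs(FI1) ** 2 + abs(FI2) ** 2 + abs(FI3) ** 2

print(total_prob(0.0))                             # != 1 in general
xi = np.angle(1j * psi01 * np.conj(psi10))         # Eq.(5.16)
print(total_prob(xi))                              # = 1
```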
APPENDIX A. CONDITIONAL PROBABILITIES FOR FUZZY MEASUREMENTS
In this appendix, we will generalize the classical and quantum mechanical definitions of conditional probability Eqs.(4.7) and (4.14) so as to include either sharp or fuzzy hypotheses and pieces of evidence.
The following simple results from set theory are relevant. Given any finite set $S$, we will denote the number of elements in $S$ by $|S|$. Given two sets $R$ and $S$, we define the direct product set $R \times S$ by

$$R \times S = \{(x, y)|x \in R, y \in S\} . \qquad (A.1)$$
$R \times S$ is also denoted by the vector of sets $(R, S)$. Given sets $R_1, R_2, S_1, S_2$, it is easy to show that (see Fig.A.1)

$$(R_1 \times R_2) \cap (S_1 \times S_2) = (R_1 \cap S_1) \times (R_2 \cap S_2) . \qquad (A.2)$$
Therefore, $(R_1 \times R_2) \cap (S_1 \times S_2) = \phi$ if and only if $R_1 \cap S_1 = \phi$ or $R_2 \cap S_2 = \phi$. Given any set $S$ and any integer $n$, we will denote by $S^n$ the product $S \times S \times \cdots \times S$ of $n$ copies of $S$. Suppose $Q_\alpha \subset Z_{0+}$ for each $\alpha$. If $\Gamma' = \{\alpha_1, \alpha_2, \cdots, \alpha_{|\Gamma'|}\} \subset \Gamma$, we define the vector of sets or direct product set $(Q_\cdot)_{\Gamma'}$ by $(Q_\cdot)_{\Gamma'} = (Q_{\alpha_1}, Q_{\alpha_2}, \cdots, Q_{\alpha_{|\Gamma'|}})$. Equivalently, one may write $(Q_\cdot)_{\Gamma'} = Q_{\alpha_1} \times Q_{\alpha_2} \times \cdots \times Q_{\alpha_{|\Gamma'|}}$. Note that $(Q_\cdot)_{\Gamma'} \subset Z_{0+}^{|\Gamma'|}$. Sometimes, we will abbreviate $(Q_\cdot)_\Gamma$ by just $Q_\cdot$. Two direct product sets $(R_\cdot)_{\Gamma'}$ and $(S_\cdot)_{\Gamma'}$ are disjoint if and only if there exists an $\alpha \in \Gamma'$ such that $R_\alpha \cap S_\alpha = \phi$. We will need to consider collections $H = \{H^i_\cdot | i = 1, 2, \cdots, |H|\}$ of direct product sets $H^i_\cdot \subset Z_{0+}^{|\Gamma|}$. Such a collection will be said to be a partition of $Z_{0+}^{|\Gamma|}$ if any pair of distinct sets $H^i_\cdot$ and $H^j_\cdot$ is disjoint and $\cup_{i=1}^{|H|} H^i_\cdot = Z_{0+}^{|\Gamma|}$.
We are interested in $P(\underline{n}_\cdot \in H_\cdot | \underline{n}_\cdot \in E_\cdot)$, i.e., the probability that $\underline{n}_\alpha \in H_\alpha$ for all $\alpha \in \Gamma$ given that $\underline{n}_\beta \in E_\beta$ for all $\beta \in \Gamma$. $H_\alpha \subset Z_{0+}$ is the hypothesis for $\underline{n}_\alpha$, and $E_\beta \subset Z_{0+}$ is the evidence for $\underline{n}_\beta$. If $H_\alpha$ contains only one element of $Z_{0+}$, then we say that the hypothesis for $\underline{n}_\alpha$ is sharp, whereas if $H_\alpha$ contains more than one element of $Z_{0+}$, we say that the hypothesis for $\underline{n}_\alpha$ is fuzzy. Analogously, the number of elements in $E_\beta$ determines whether the evidence for $\underline{n}_\beta$ is sharp or fuzzy.
For any set $S$, define the indicator function $1_S(x)$ by

$$1_S(x) = \begin{cases} 0 & \text{if } x \notin S \\ 1 & \text{if } x \in S \end{cases} . \qquad (A.3)$$
Consider first a classical probability distribution $P(n_\cdot)$. One defines

$$P(\underline{n}_\cdot \in H_\cdot | \underline{n}_\cdot \in E_\cdot) = \frac{\sum_{n_\cdot \in H_\cdot \cap E_\cdot} P(n_\cdot)}{\sum_{n_\cdot \in E_\cdot} P(n_\cdot)} = \frac{\sum_{n_\cdot} P(n_\cdot) \prod_{\alpha \in \Gamma} 1_{H_\alpha \cap E_\alpha}(n_\alpha)}{\sum_{n_\cdot} P(n_\cdot) \prod_{\alpha \in \Gamma} 1_{E_\alpha}(n_\alpha)} . \qquad (A.4)$$
To write the last equation more succinctly, it is convenient to define the filter function $f_{Q_\cdot}(n_\cdot)$, for any direct product set $Q_\cdot \subset Z_{0+}^{|\Gamma|}$, by

$$f_{Q_\cdot}(n_\cdot) = \prod_{\alpha \in \Gamma} 1_{Q_\alpha}(n_\alpha) . \qquad (A.5)$$
It is also convenient to define the characteristic functional $\chi_c[K]$ of any function $K(n_\cdot)$ of $n_\cdot$ by

$$\chi_c[K] = \sum_{n_\cdot} P(n_\cdot) K(n_\cdot) . \qquad (A.6)$$
(The "c" subscript in $\chi_c$ stands for "classical".) Now Eq.(A.4) can be written succinctly as

$$P(\underline{n}_\cdot \in H_\cdot | \underline{n}_\cdot \in E_\cdot) = \frac{\chi_c[f_{H_\cdot \cap E_\cdot}]}{\chi_c[f_{E_\cdot}]} . \qquad (A.7)$$
Note that $f_{H_\cdot \cap E_\cdot} = f_{H_\cdot} f_{E_\cdot}$. Note that if $H = \{H^i_\cdot | i = 1, 2, \cdots, |H|\}$ is a partition of $Z_{0+}^{|\Gamma|}$, then Eq.(A.7) satisfies

$$\sum_{i=1}^{|H|} P(\underline{n}_\cdot \in H^i_\cdot | \underline{n}_\cdot \in E_\cdot) = 1 . \qquad (A.8)$$
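As a quick illustration of the classical definitions just given, the following minimal Python sketch evaluates Eq.(A.7) for an invented two-variable distribution and invented fuzzy sets $H_\cdot$ and $E_\cdot$:

```python
from itertools import product

# Minimal sketch of Eqs.(A.3)-(A.7) for a classical distribution over
# two occupation numbers, each in {0, 1, 2}. P, H and E are invented.
vals = [0, 1, 2]

def P(n):                       # an invented joint distribution P(n.)
    return (n[0] + 2 * n[1] + 1) / 36.0   # sums to 1 over vals x vals

H = [{0, 1}, {2}]               # fuzzy hypothesis H_alpha for each alpha
E = [{0, 1, 2}, {1, 2}]         # fuzzy evidence E_beta for each beta

def flt(Q, n):                  # filter function, Eq.(A.5)
    return all(n[a] in Q[a] for a in range(len(n)))

def chi_c(Q):                   # characteristic functional, Eq.(A.6)
    return sum(P(n) for n in product(vals, repeat=2) if flt(Q, n))

HE = [h & e for h, e in zip(H, E)]          # H. intersect E.
print(chi_c(HE) / chi_c(E))                 # Eq.(A.7)
```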
Now consider a quantum mechanical probability amplitude $A(n_\cdot)$. We define the characteristic functional $\chi[K]$ by

$$\chi[K] = \sum_{(n_\cdot)_{\Gamma^{ext}}} \left| \sum_{(n_\cdot)_{\Gamma^{int}}} A(n_\cdot) K(n_\cdot) \right|^2 . \qquad (A.9)$$

Suppose $H = \{H^i_\cdot | i = 1, 2, \cdots, |H|\}$ is a partition of $Z_{0+}^{|\Gamma|}$. In quantum mechanics, the conditional probability that $\underline{n}_\cdot \in H^i_\cdot$ given that $\underline{n}_\cdot \in E_\cdot$ depends on the choice of partition $H$. We define this probability by

$$P_H(\underline{n}_\cdot \in H^i_\cdot | \underline{n}_\cdot \in E_\cdot) = \frac{\chi[f_{H^i_\cdot \cap E_\cdot}]}{\sum_{j=1}^{|H|} \chi[f_{H^j_\cdot \cap E_\cdot}]} . \qquad (A.10)$$

Equation (A.8) is trivially satisfied by Eq.(A.10).
APPENDIX B. CONDITIONAL PROBABILITIES EXPRESSED IN TERMS OF PATH SUMS
In this appendix, we will express the classical and quantum mechanical definitions Eqs.(A.7) and (A.10) of conditional probability in terms of sums over paths. Simple examples of the results of this appendix may be found in Section 5 of this paper.
Let $P(n_\cdot)$ and $A(n_\cdot)$ be the values of a CB net and a QB net, respectively. Suppose that $\mathcal{P}(n_\cdot) = P(n_\cdot)$ if a CB net is being considered and $\mathcal{P}(n_\cdot) = A(n_\cdot)$ if a QB net is.
Let $\Pi$ be the set of all possible paths. $\Pi$ is defined so that there is exactly one $\pi \in \Pi$ for each $n_\cdot$ that satisfies $\mathcal{P}(n_\cdot) \neq 0$. Thus, there is a one-to-one onto map $n_\cdot(\pi)$ from $\Pi$ into $\{n_\cdot | \mathcal{P}(n_\cdot) \neq 0\}$. For any $\Gamma' \subset \Gamma$, if $m_\cdot$ is such that $(m_\cdot)_{\Gamma'} = (n_\cdot)_{\Gamma'}$, call $m_\cdot$ an extension of $(n_\cdot)_{\Gamma'}$. Let $\Sigma$ be the set of all possible final states. $\Sigma$ is defined so that there is
exactly one $\sigma \in \Sigma$ for each $(n_\cdot)_{\Gamma^{ext}}$ for which there exists an extension $n_\cdot$ such that $\mathcal{P}(n_\cdot) \neq 0$. Thus, there is a one-to-one onto map from $\Sigma$ into $\{(n_\cdot)_{\Gamma^{ext}} | \mathcal{P}(n_\cdot) \neq 0\}$. Define the partition function $Z$ to be a function from $\Pi$ to $\Sigma$ such that $Z(\pi) = \sigma$ if the path $\pi$ has $\sigma$ as final state. For each $\sigma \in \Sigma$, let $C(\sigma)$ be the set of all paths $\pi$ that have $\sigma$ as final state; i.e., $C(\sigma) = \{\pi \in \Pi | Z(\pi) = \sigma\}$. Clearly, the sets (equivalence classes) $C(\sigma)$ and $C(\sigma')$ are disjoint when $\sigma \neq \sigma'$, and $\cup_{\sigma \in \Sigma} C(\sigma) = \Pi$. For each $\pi \in \Pi$, define $\mathcal{P}(\pi)$ by $\mathcal{P}(\pi) = \mathcal{P}(n_\cdot(\pi))$.
For any direct product set $Q_\cdot \subset Z_{0+}^{|\Gamma|}$, define the filter function $\bar{f}_{Q_\cdot}(\pi)$ by

$$\bar{f}_{Q_\cdot}(\pi) = \prod_{\alpha \in \Gamma} 1_{Q_\alpha}(n_\alpha(\pi)) . \qquad (B.1)$$
Classically, one defines the characteristic functional $\chi_c[K]$ of any function $K(\pi)$ of $\pi$ by

$$\chi_c[K] = \sum_{\sigma \in \Sigma} \sum_{\pi \in C(\sigma)} P(\pi) K(\pi) . \qquad (B.2)$$
Then

$$P(\underline{n}_\cdot \in H_\cdot | \underline{n}_\cdot \in E_\cdot) = \frac{\chi_c(\bar{f}_{H_\cdot \cap E_\cdot})}{\chi_c(\bar{f}_{E_\cdot})} . \qquad (B.3)$$
Quantum mechanically, one defines the characteristic functional $\chi[K]$ by

$$\chi[K] = \sum_{\sigma \in \Sigma} \left| \sum_{\pi \in C(\sigma)} A(\pi) K(\pi) \right|^2 . \qquad (B.4)$$

If $H = \{H^i_\cdot | i = 1, 2, \cdots, |H|\}$ is a partition of $Z_{0+}^{|\Gamma|}$, then one defines

$$P_H(\underline{n}_\cdot \in H^i_\cdot | \underline{n}_\cdot \in E_\cdot) = \frac{\chi(\bar{f}_{H^i_\cdot \cap E_\cdot})}{\sum_{j=1}^{|H|} \chi(\bar{f}_{H^j_\cdot \cap E_\cdot})} . \qquad (B.5)$$
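For concreteness, the following minimal Python sketch evaluates the path-sum functional Eq.(B.4) for the four paths of the loop net of Fig.19 (Eq.(5.4)), with the same assumed overlap convention and parameter values used in the Section 5 sketches:

```python
import numpy as np

# Minimal sketch of Eq.(B.4): paths pi carry amplitudes A(pi) and are
# grouped into classes C(sigma) by their final state Z(pi) = sigma.
# Overlap convention and parameters are the Section 5 assumptions.
t = np.pi / 5
psi01, psi10 = (1 + 1j) / 2, 1 / np.sqrt(2)
paths = {                        # pi -> (final state sigma, amplitude A(pi))
    "pi1": ("u-", np.cos(t / 2) * psi10),
    "pi2": ("u-", -np.sin(t / 2) * psi01),
    "pi3": ("u+", np.sin(t / 2) * psi10),
    "pi4": ("u+", np.cos(t / 2) * psi01),
}

def chi(keep):                   # Eq.(B.4), K = a filter on final states
    total = 0.0
    for sigma in {s for s, _ in paths.values()}:     # classes C(sigma)
        a = sum(A for s, A in paths.values() if s == sigma and keep(s))
        total += abs(a) ** 2
    return total

print(chi(lambda s: True))         # sum over all paths: 1
print(chi(lambda s: s == "u+"))    # chi for the evidence n_u+ = 1
```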
APPENDIX C. QB NET FOR SINGLE PARTICLE FEYNMAN PATH INTEGRAL
In this appendix, we will present a QB net which yields the Feynman path integral[15] that in turn yields the Schroedinger equation for a single mass particle in an arbitrary potential.
We begin by restricting positions $x$ to lie in the interval ("box") $[-\frac{L}{2}, \frac{L}{2}]$, where $L$ is much larger than any other length (including the particle's position coordinate) occurring in the problem. Then we divide the box into subintervals of
infinitesimal length $\Delta x$, and call the midpoints of these subintervals $x_s$ with $s \in \{0, \pm 1, \pm 2, \cdots, \pm N_x/2\} = Z_x$. Similarly, we restrict times $t$ to lie in the interval $[0, T]$. Then we divide the interval $[0, T]$ into subintervals of infinitesimal length $\Delta t$, and call the midpoints of these subintervals $t_i$ with $i \in \{0, 1, 2, \cdots, N_t\} = Z_t$. Henceforth, for any function $f(x_s)$, we will use $f(\cdot)$ to represent the vector whose $s$th component is $f(x_s)$. For example, $\delta_{x_r}(\cdot)$ will represent the vector whose $s$th component $\delta_{x_r}(x_s)$ is 1 if $r = s$ and zero otherwise.
As in Fig.C.1, we assign a node and a random variable $\underline{n}(x_s, t_i)$ to all spacetime lattice points $(x_s, t_i)$ with $s \in Z_x$ and $i \in Z_t - \{0\}$ and to the lattice point $(x_0, t_0)$. (Thus, at time $t_0 = 0$, only position $x_0$ gets a node.) We draw arrows pointing from each node to all nodes occurring a time $\Delta t$ later. We also draw external arrows pointing out of each node at time $t_{N_t}$. The net starts at time $t_0$ with a single root node at position $x_0$, and it ends at time $t_{N_t}$ with external nodes at each position $x_s$ for all $s \in Z_x$. The random variables $\underline{n}(x_s, t_i)$ are occupation numbers that assume values $n(x_s, t_i) \in Z_{0+}$. These occupation numbers specify states

$$|n(x_s, t_i)\rangle = \frac{[a^\dagger(x_s, t_i)]^{n(x_s, t_i)}}{\sqrt{n(x_s, t_i)!}}\, |0\rangle . \qquad (C.1)$$
From the states of Eq.(C.1), one can form states $|n(\cdot, t_i)\rangle$ defined by

$$|n(\cdot, t_i)\rangle = \prod_{s \in Z_x} |n(x_s, t_i)\rangle . \qquad (C.2)$$

Figure C.2 shows the input and output arrows for a single node $(x_s, t_i)$ of our net. This node is assigned a value

$$A[n(x_s, t_i)|n(\cdot, t_{i-1})] = \langle n(x_s, t_i)|\Omega|n(\cdot, t_{i-1})\rangle , \qquad (C.3)$$
where $\Omega$ is a second quantized operator. In general, the theory characterized by this QB net is a multi-particle quantum field theory. In fact, $n(x_s, t_i)$ is the classical field $\phi(x_s, t_i)$ that is commonly used to define Feynman integrals in quantum field theories. In this appendix, we will consider a single particle. Thus, for all $i$, $\sum_{s \in Z_x} n(x_s, t_i) = 1$ and $n(\cdot, t_i) = \delta_{x_r}(\cdot)$ for some $r \in Z_x$. For a single particle, the root node $(x_0, t_0)$ will be taken to have amplitude 1 for $n(x_0, t_0) = 1$ and amplitude 0 for $n(x_0, t_0) = 0$. All other nodes will be taken to have the value $A[n(x_s, t_i)|n(\cdot, t_{i-1})]$ that is specified by the table of Fig.C.3. The quantity $\alpha_{s,r}$ in Fig.C.3 is given by

$$\alpha_{s,r} = \langle x_s| e^{\frac{-i}{\hbar} \Delta t H} |x_r\rangle . \qquad (C.4)$$
In this last equation, $\hbar$ is Planck's constant; $H$ is the single particle, first quantized Hamiltonian; $|x_r\rangle$ for $r \in Z_x$ are the position eigenvectors, normalized so that $\langle x_s|x_r\rangle = \delta_{s,r}$ and $\sum_{r \in Z_x} |x_r\rangle\langle x_r| = 1$, where $\delta_{s,r}$ is the Kronecker delta. Equation (4.1) is trivially satisfied by the table of Fig.C.3. As for Eq.(4.3), it follows from the identity
$$\sum_{s \in Z_x} |\langle x_s| e^{\frac{-i}{\hbar}(t_{N_t} - t_0)H} |x_0\rangle|^2 = 1 . \qquad (C.5)$$

For $H = \frac{p^2}{2m} + V(x, t)$, one may write

$$\alpha_{s,r} = \exp\left\{ \left(\frac{-i\Delta t}{\hbar}\right) \left[ \left(\frac{-\hbar^2}{2m}\right) \left(\frac{d}{dx_s}\right)^2 + V(x_s, t) \right] \right\} \Delta x\, \delta(x_s - x_r) , \qquad (C.6)$$
where $\delta(\cdot)$ is the Dirac delta function. If one were to expand the exponential in the last equation, terms in which $(\frac{d}{dx_s})^2$ acted only on the delta function would be much larger than those in which it acted on $V(x_s, t)$. Thus, one may approximate $\alpha_{s,r}$ by keeping only those terms in which $(\frac{d}{dx_s})^2$ does not act on $V(x_s, t)$. One then gets

$$\alpha_{s,r} \approx (\alpha_{s,r})_{free}\; e^{\frac{-i}{\hbar} \Delta t\, V(x_s, t)} , \qquad (C.7)$$
where

$$(\alpha_{s,r})_{free} = \Delta x \int_{-\infty}^{+\infty} \frac{dk}{2\pi}\; e^{ik(x_s - x_r) - \frac{i\Delta t}{\hbar}\left(\frac{\hbar^2 k^2}{2m}\right)} . \qquad (C.8)$$

The Gaussian integral in Eq.(C.8) is easily performed. One obtains

$$\alpha_{s,r} = \sqrt{\frac{-i\Delta\theta}{\pi}}\; e^{\frac{i}{\hbar} \Delta t\, L_{s,r}} , \qquad (C.9)$$
where

$$\Delta\theta = \left(\frac{\Delta t}{\hbar}\right) \left(\frac{m}{2}\right) \left(\frac{\Delta x}{\Delta t}\right)^2 , \qquad (C.10)$$

and[16]

$$L_{s,r} = \frac{m}{2} \left(\frac{x_s - x_r}{\Delta t}\right)^2 - V(x_s, t) . \qquad (C.11)$$

In general, $|\alpha_{s,r}|$