Statistical-Mechanical Informatics 2008 (IW-SMI 2008)

International Workshop on

Statistical-Mechanical Informatics 2008 (IW-SMI 2008) Sendai International Center, Sendai, Japan September 14-17, 2008

Sponsored by Grant-in-Aid for Scientific Research on Priority Area “Deepening and Expansion of Statistical Mechanical Informatics” (DEX-SMI)

In collaboration with:

The Physical Society of Japan (JPS) JST CREST Project “Creation of New Technology Aiming for the Realization of Quantum Information Processing Systems”

IEICE-ES QIT Technical Committee

Preface

Statistical mechanical informatics (SMI) is an approach that applies physics to information science, in which many-body problems in information processing are tackled using statistical mechanics methods. In the last decade, the use of SMI has resulted in great advances in research into classical information processing, in particular, theories of information and communications, probabilistic inference and combinatorial optimization problems. It is expected that the success of SMI can be extended to quantum systems. The importance of many-body problems is also being recognized in quantum information theory (QIT), for which the estimation theory of few-body systems has recently been almost completely established after considerable effort. SMI and QIT are sufficiently well developed that it is now appropriate to consider applying SMI to quantum systems and developing many-body theory in QIT. This combination of SMI and QIT is highly likely to contribute significantly to the development of both research fields.

The International Workshop on Statistical-Mechanical Informatics has been organized in response to this situation. This workshop, held at Sendai International Conference Center, Sendai, Japan, from 14th to 17th of September, 2008, and sponsored by the Grant-in-Aid for Scientific Research on Priority Areas "Deepening and Expansion of Statistical Mechanical Informatics (DEX-SMI)" (Head investigator: Yoshiyuki Kabashima, Tokyo Institute of Technology) (Project WWW page: http://dex-smi.sp.dis.titech.ac.jp/DEX-SMI), is intended to provide leading researchers with strong interdisciplinary interests in QIT and SMI with the opportunity to engage in intensive discussions. The aim of the workshop is to expand SMI to quantum systems and QIT research on quantum (entangled) many-body systems, to discuss possible future directions, and to offer researchers the opportunity to exchange ideas that may lead to joint research initiatives.

We would like to extend a warm welcome to the invited contributors and other participants in this workshop and wish them an enjoyable stay in Sendai. Finally, we anticipate that this workshop will be successful and will stimulate further development of the interdisciplinary research field of informatics and statistical mechanics.

The IW-SMI 2008 Organizing Committee:
Kazuyuki Tanaka, General Chair (Tohoku University)
Yoshiyuki Kabashima, Vice-General Chair (Tokyo Institute of Technology)
Jun-ichi Inoue, Program Chair (Hokkaido University)
Masahito Hayashi, Publications Chair (Tohoku University)
Hidetoshi Nishimori (Tokyo Institute of Technology)
Toshiyuki Tanaka (Kyoto University)



Table of Contents

Program …………………………………………………………………………………………vi
Quantum universal coding protocols and universal approximation of multi-copy states
Masahito Hayashi (Tohoku University, Japan)…………………………………………………..1
How could the replica method improve accuracy of performance assessment of channel coding?
Yoshiyuki Kabashima (Tokyo Institute of Technology, Japan)…………………………………11
A survey on locking of bipartite correlations
Debbie Leung (University of Waterloo, Canada)……………………………………………….25
Spin Chain under Next Nearest Neighbor Interaction
Kwek Leong Chuan (National Institute of Education, Singapore)……………………………...35
On the irreversibility of measurements of correlations
Masato Koashi (Osaka University, Japan)………………………………………………………49
Quantum response to time-dependent external field
Seiji Miyashita (University of Tokyo, Japan)…………………………………………………...59
Dissipative quantum dynamics
Leticia F Cugliandolo (Universite' Pierre et Marie Curie, France)…………………………….69
Non-classical Role of Potential Energy in Adiabatic Quantum Annealing
Arnab Das (Abdus Salam International Center for Theoretical Physics, Italy)………………...81
A comparison of classical and quantum annealing dynamics
Sei Suzuki (Aoyama Gakuin University, Japan)………………………………………………..89
Quantum annealing for problems with ground-state degeneracy
Hidetoshi Nishimori (Tokyo Institute of Technology, Japan)…………………………………..99
Re`class'ification of `quant'ified classical simulated annealing
Toshiyuki Tanaka (Kyoto University, Japan)………………………………………………….107
Monte Carlo Approach to Phase Transitions in Quantum Systems
Naoki Kawashima (University of Tokyo, Japan)……………………………………………....115
Machine Learning with Quantum Relative Entropy
Koji Tsuda (Max Planck Institute for Biological Cybernetics, Germany)…………………….127


A Novel Quantum Transition in a Fully Frustrated Transverse Ising Antiferromagnet
Bikas K. Chakrabarti (Saha Institute of Nuclear Physics, India)……………………………...137
Symmetries, Dimensional Reduction, and Topological Quantum Order
Gerardo Ortiz (Indiana University, USA)……………………………………………………..143
Quantum algorithms and complexity
Michele Mosca (University of Waterloo, Canada)…………………………………………….145
Variational Bayesian inference for partially observed stochastic dynamical systems
Mike Titterington (University of Glasgow, UK)………………………………………………147
Mathematical Structures of Loopy Belief Propagation and Cluster Variation Method
Kazuyuki Tanaka (Tohoku University, Japan)............................................................................159
Entanglement Manipulation under Non-Entangling Operations
Fernando G.S.L. Brandao (Imperial College London, UK)…………………………………...177
Entanglement Production in Non-Equilibrium Thermodynamics
Vlatko Vedral (University of Leeds, UK)……………………………………………………...189
Complementarity and the algebraic structure of finite quantum systems
Denes Petz (Alfred Renyi Institute of Mathematics, Hungary)……………………………….197
Quantum spin glasses at finite connectivity: cavity method and quantum satisfiability
Antonello Scardicchio (Princeton University, USA)………………………………………….207
Quantum mean-field decoding algorithm for error-correcting codes
Jun-ichi Inoue (Hokkaido University, Japan)………………………………………………….209


Scientific Program


International Workshop on Statistical-Mechanical Informatics 2008

14th September 9:30— Registration

10:30—10:40

Yoshiyuki Kabashima (Tokyo Institute of Technology, Japan): Opening

10:40—11:20

Masahito Hayashi (Tohoku University, Japan): Quantum universal coding protocols and universal approximation of multi-copy states

11:20—12:00

Yoshiyuki Kabashima (Tokyo Institute of Technology, Japan): How could the replica method improve accuracy of performance assessment of channel coding?

12:00—14:00

Lunch Time & Free discussion

14:00—14:40

Debbie Leung (University of Waterloo, Canada): A survey on locking of bipartite correlations

14:40—15:20

Kwek Leong Chuan (National Institute of Education, Singapore): Spin Chain under Next Nearest Neighbor Interaction

15:20—16:00

Masato Koashi (Osaka University, Japan): On the irreversibility of measurements of correlations

16:00—16:40

Coffee Break & Free Discussion

16:40—17:20

Seiji Miyashita (University of Tokyo, Japan): Quantum response to time-dependent external field

17:20—18:00

Leticia F Cugliandolo (Universite' Pierre et Marie Curie, France): Dissipative quantum dynamics

15th September 9:40—10:20

Arnab Das (Abdus Salam International Center for Theoretical Physics, Italy): Non-classical Role of Potential Energy in Adiabatic Quantum Annealing

10:20—11:00

Sei Suzuki (Aoyama Gakuin University, Japan): A comparison of classical and quantum annealing dynamics

11:00—11:40

Hidetoshi Nishimori (Tokyo Institute of Technology, Japan): Quantum annealing for problems with ground-state degeneracy

11:40—12:20

Toshiyuki Tanaka (Kyoto University, Japan): Re`class'ification of `quant'ified classical simulated annealing

12:20—

Lunch Time & Free Discussion


16th September 10:00—10:40

Naoki Kawashima (University of Tokyo, Japan): Monte Carlo Approach to Phase Transitions in Quantum Systems

10:40—11:20

Koji Tsuda (Max Planck Institute for Biological Cybernetics, Germany): Machine Learning with Quantum Relative Entropy

11:20—12:00

Bikas K. Chakrabarti (Saha Institute of Nuclear Physics, India): A Novel Quantum Transition in a Fully Frustrated Transverse Ising Antiferromagnet

12:00—14:00

Lunch Time & Free Discussion

14:00—14:40

Gerardo Ortiz (Indiana University, USA): Symmetries, Dimensional Reduction, and Topological Quantum Order

14:40—15:20

Michele Mosca (University of Waterloo, Canada): Quantum algorithms and complexity

15:20—16:00

Frank Verstraete (University of Vienna, Austria): Strongly correlated quantum systems from the point of view of quantum information theory

16:00—16:40

Coffee Break & Free Discussion

16:40—17:20

Mike Titterington (University of Glasgow, UK): Variational Bayesian inference for partially observed stochastic dynamical systems

17:20—18:00

Kazuyuki Tanaka (Tohoku University, Japan): Mathematical Structures of Loopy Belief Propagation and Cluster Variation Method

17th September 10:00—10:40

Fernando G.S.L. Brandao (Imperial College London, UK): Entanglement Manipulation under Non-Entangling Operations

10:40—11:20

Vlatko Vedral (University of Leeds, UK): Entanglement Production in Non-Equilibrium Thermodynamics

11:20—12:00

Denes Petz (Alfred Renyi Institute of Mathematics, Hungary): Complementarity and the algebraic structure of finite quantum systems

12:00—14:00

Lunch Time & Free Discussion

14:00—14:40

Antonello Scardicchio (Princeton University, USA): Quantum spin glasses at finite connectivity: cavity method and quantum satisfiability

14:40—15:20

Jun-ichi Inoue (Hokkaido University, Japan): Quantum mean-field decoding algorithm for error-correcting codes

15:20—

Kazuyuki Tanaka (Tohoku University, Japan): Closing



Proceedings



Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Quantum universal coding protocols and universal approximation of multi-copy states Masahito Hayashi Graduate School of Information Sciences, Tohoku University, Sendai, 980-8579, Japan E-mail: [email protected] Abstract. We have constructed universal codes for quantum lossless source coding and classical-quantum channel coding. In this construction, we essentially employ group representation theory. In order to treat quantum lossless source coding, universal approximation of multi-copy states is discussed in terms of the quantum relative entropy.

1. Introduction
Main limiting theorems in information theory were proven by Shannon [1], in which the protocols constructed depend on the distributions of the information source/the communication channels. Later, many researchers improved Shannon's original results so that the protocol to achieve the optimal rate does not depend on the distributions of the information source/the communication channels. Such protocols are called universal, and Csiszár and Körner [2] developed the type method as a unified method to treat them. For the channel coding, they showed that their code has the same exponential rate of the error probability as Gallager's exponent [3], which is the optimal one among known codes in the stationary memoryless case. For the source coding, universal coding was discussed for both the fixed-length and the variable-length cases when the information source is given as an independent and identical distribution. In the fixed-length case, the universal code attains the optimal exponential rate of the error probability [2]. Clarke and Barron [4] treated the variable-length case in the following way. Since the average coding length is given by the Shannon entropy when the distribution of the information source is known, they regarded the difference between the average coding length and the Shannon entropy as the redundancy of the given code, and studied its asymptotic behavior. Thanks to the Kraft inequality, there exists a one-to-one correspondence between a variable-length prefix code and a probability distribution. Hence, the redundancy is given as the relative entropy between the distribution of the information source and the distribution corresponding to the code.

Consider the set of commutative densities {ρ_θ | θ ∈ Θ ⊂ R^m} on H := C^d, and a prior distribution w(θ) on Θ. Clarke and Barron [4] showed that the mixture state σ_{w,n} := \int_\Theta \rho_\theta^{\otimes n} w(\theta)\,d\theta satisfies the relation

D(\rho_\theta^{\otimes n} \| \sigma_{w,n}) \cong \frac{m}{2}\log\frac{n}{2\pi e} + \log\sqrt{\det I(\theta)} - \log w(\theta),    (1)

where I(θ) is the Fisher information matrix (see [5]). In this paper, logarithms are taken to base 2. In particular, Clarke and Barron [6] proved that the mini-max of the constant term is attained when the prior w is chosen as the Jeffreys prior

\frac{\sqrt{\det I(\theta)}}{\int \sqrt{\det I(\theta')}\,d\theta'} .
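As a concreteness check of the classical relation (1), the following short Python sketch evaluates the exact redundancy D(P_θ^n ‖ M_n) for a Bernoulli(θ) source mixed over the Jeffreys prior Beta(1/2, 1/2) and compares it with the right hand side of (1) (m = 1, I(θ) = 1/(θ(1−θ))). The script is illustrative and not part of the paper; the function names are ours and only NumPy and SciPy are assumed.

import numpy as np
from scipy.special import gammaln, betaln

def clarke_barron_check(theta=0.3, n=1000):
    # Exact redundancy D(P_theta^n || M_n) in bits for a Bernoulli(theta) source
    # mixed over the Jeffreys prior Beta(1/2, 1/2); the mixture probability of a
    # sequence with k ones is B(k+1/2, n-k+1/2)/B(1/2, 1/2).
    k = np.arange(n + 1)
    log_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    log_p_seq = k * np.log(theta) + (n - k) * np.log(1 - theta)      # log P_theta(x)
    log_m_seq = betaln(k + 0.5, n - k + 0.5) - betaln(0.5, 0.5)      # log M_n(x)
    d_exact = np.sum(np.exp(log_binom + log_p_seq) * (log_p_seq - log_m_seq)) / np.log(2)
    # right hand side of (1): (m/2) log n/(2 pi e) + log sqrt(I(theta)) - log w(theta)
    fisher = 1.0 / (theta * (1 - theta))
    jeffreys = 1.0 / (np.pi * np.sqrt(theta * (1 - theta)))
    d_asym = 0.5 * np.log2(n / (2 * np.pi * np.e)) + np.log2(np.sqrt(fisher) / jeffreys)
    return d_exact, d_asym

print(clarke_barron_check())   # the two values agree up to a vanishing correction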

Concerning the quantum system, several universal protocols were given by several papers. Jozsa et al. [7] constructed a universal fixed-length source coding, which depends only on the compression rate and attains the minimum compression rate. Hayashi [8] proved that their code attains the optimal exponential rate of the error probability. Further, Hayashi and Matsumoto [9] constructed a universal variable-length source coding in the quantum system. In their formulation, the quantum state is compressed after the decision of the compression rate. That is, a superposition among states with different coding length is not allowed. Hence, since the measurement is necessary for the decision of the coding length, the quantum state is inevitably demolished. They proved that it is possible to decrease the degree of state demolition to infinitesimal and compress the state with the optimal compression rate even though the entropy rate of the information source is unknown. Therefore, it is impossible to treat the optimization of compression protocol in the same way as Clarke and Barron [4] in this scenario. In this paper, we consider a different formulation of quantum variable-length source coding, in which a superposition among different coding lengths is allowed. In this formulation, there is a one-to-one correspondence between codes and density matrices. Hence, when the information ensemble is known, the optimal average compression rate is given by its von Neumann entropy. Hence, the redundancy is given by the quantum relative entropy between the density of information source and the density corresponding to the code. Concerning the quantum case of this problem, in 1996, Krattenthaler and Slater[10] derived its quantum extension for the qubit case. However, the general case has remained an open problem for more than ten years. Their paper did not provide a complete solution to the mini-max problem for the constant term. As one of main results, we prove the existence of states σn on H⊗n satisfying d2 − 1 D(ρ⊗n ∥σn ) ∼ log n + O(1) = 2

(2)

for all faithful states ρ on H, i.e., states ρ in the set S := {ρ | rank ρ = d}. Since the dimension of the state family S is d² − 1, the relation (2) can be regarded as a natural quantum extension of (1). More precisely, we calculate the following mini-max value

\min_{\{\sigma_n\}} \sup_{\rho \in S} \lim_{n\to\infty}\left( D(\rho^{\otimes n}\|\sigma_n) - \frac{d^2-1}{2}\log n \right),    (3)

which is one of the main results in this paper. Krattenthaler and Slater[10] treated the same problem for a restricted class of states {σn } in the qubit case. As the other main result, this paper treats a universal code for quantum channel. That is, we construct a universal coding for a classical-quantum channel, which attains the quantum mutual information and depends only on the coding rate and the ‘type’ of the input system. In the proposed construction, the following three methods play essential roles. The first method is the information spectrum method, which is essential for construction of the decoder. In the information spectrum method, the decoder is constructed by the square root measurement of the projectors given by the quantum analogue of the likelihood ratio between the signal state and the mixture state[11, 12]. The second method is the irreducible decomposition of the dual representation of the special unitary group and the permutation group. The method of irreducible decomposition provides the universal protocols in quantum setting[7, 9, 13, 14, 15, 16, 17]. However, even in the classical case, the universal channel coding requires the conditional type as well as the type[2]. In the present paper, we introduce a quantum analogue of the conditional type, which is the most essential part of the present paper. The third method is the packing lemma, which yields a suitable combination of the signal states independent of the form of the channel in the classical case[2]. This method plays the same role in the present paper.
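Since the channel-coding result above is stated in terms of the quantum mutual information of a classical-quantum channel (the Holevo quantity S(Σ_x p_x ρ_x) − Σ_x p_x S(ρ_x) for an input distribution p and signal states ρ_x), a minimal numerical sketch of that quantity may be helpful. The two-state qubit channel below and the function names are purely illustrative assumptions, not taken from the paper; only NumPy is required.

import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = -Tr rho log2 rho, computed from the eigenvalues
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

def cq_mutual_information(priors, states):
    # quantum mutual information of a classical-quantum channel x -> rho_x:
    # I(p, W) = S(sum_x p_x rho_x) - sum_x p_x S(rho_x)
    avg = sum(p * r for p, r in zip(priors, states))
    return von_neumann_entropy(avg) - sum(p * von_neumann_entropy(r)
                                          for p, r in zip(priors, states))

# toy binary cq channel: two pure qubit states with overlap cos(theta)
theta = np.pi / 5
psi0 = np.array([1.0, 0.0]); psi1 = np.array([np.cos(theta), np.sin(theta)])
W = [np.outer(psi0, psi0), np.outer(psi1, psi1)]
print(cq_mutual_information([0.5, 0.5], W))   # between 0 and 1 bit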


The remainder of the present paper is organized as follows. In section 2, the notation for group representation theory is presented and a quantum analogue of conditional type is introduced. Group representation theory is essential for our derivation, as in the previous papers [7, 9, 13, 14, 15, 16, 17]. Using these facts, we show that it is possible to approximate the tensor product state universally. In section 3, we give the first main result, which treats the universal quantum channel coding. In section 4, we give a code that works well universally. In section 5, we treat universal approximation of tensor product states more precisely, and discuss the mini-max problem of this redundancy, which is the second main result of this paper. In section 6, we discuss the relation between the second main result and the quantum data compression.

2. Group representation theory
In this section, we focus on the dual representation on the n-fold tensor product space by the special unitary group SU(d) and the n-th symmetric group S_n. For this purpose, we focus on the Young diagram and the 'type'. The former is a key concept in group representation theory and the latter is one in information theory [2]. When the vector of integers \vec{n} = (n_1, n_2, \ldots, n_d) satisfies the condition n_1 \ge n_2 \ge \ldots \ge n_d \ge 0 and \sum_{i=1}^d n_i = n, the vector \vec{n} is called the Young diagram (frame) with size n and depth d, the set of which is denoted as Y_n^d. When the vector of integers \vec{n} satisfies the condition n_i \ge 0 and \sum_{i=1}^d n_i = n, the vector \vec{p} = \vec{n}/n is called the 'type' with size n, the set of which is denoted as T_n^d. Further, for \vec{p} \in T_n^d, the subset of X^n is defined as T_{\vec{p}} := \{\vec{x} \in X^n \mid \text{the empirical distribution of } \vec{x} \text{ equals } \vec{p}\}. The cardinalities of these sets are evaluated as follows:

|Y_n^d| \le |T_n^d| \le (n+1)^{d-1},    (4)

(n+1)^{-d}\, e^{nH(\vec{p})} \le |T_{\vec{p}}|,    (5)

where H(\vec{p}) := -\sum_{i=1}^d p_i \log p_i [2]. Since the sets \{(n_{s(i)}) \mid \vec{n}\in Y_n^d\} and \{(n_{s'(i)}) \mid \vec{n}\in Y_n^d\} are distinct for any s \neq s' \in S_d, the relation |\{\vec{n} \mid \sum_i n_i = n\}| \cong \frac{n^{d-1}}{(d-1)!} implies the following asymptotic behavior of the cardinality |Y_n^d|:

|Y_n^d| \cong \frac{n^{d-1}}{d!\,(d-1)!}.    (6)
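A small brute-force check of the counting bounds (4) and (5) can be done in a few lines of Python for modest n and d; the helper names below are illustrative assumptions and the enumeration is not meant to scale.

from itertools import product
from math import comb, exp, log

def compositions(n, d):
    # all vectors of d nonnegative integers summing to n (types scaled by n)
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, d - 1):
            yield (first,) + rest

def check_bounds(n=12, d=3):
    types = list(compositions(n, d))
    young = [t for t in types if all(t[i] >= t[i + 1] for i in range(d - 1))]
    # bound (4): |Y_n^d| <= |T_n^d| <= (n+1)^(d-1)
    print(len(young), len(types), (n + 1) ** (d - 1))
    # bound (5): (n+1)^{-d} e^{nH(p)} <= |T_p| (multinomial coefficient)
    for t in types:
        p = [c / n for c in t]
        H = -sum(q * log(q) for q in p if q > 0)      # natural log, matching e^{nH}
        size, rem = 1, n
        for c in t:
            size *= comb(rem, c)
            rem -= c
        assert (n + 1) ** (-d) * exp(n * H) <= size

check_bounds()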

Using the Young diagram, the irreducible decomposition of the above representation can be characterized as follows: M H⊗n = U⃗n ⊗ V⃗n , ⃗ n∈Ynd

where U⃗n is the irreducible representation space of SU (d) characterized by ⃗n, and V⃗n is the irreducible representation space of n-th symmetric group Sn characterized by ⃗n. Here, the representation of the n-th symmetric group Sn is denoted as V : s ∈ Sn 7→ Vs . According to Weyl’s dimension formula, the dimension of U⃗n can be expressed as dim U⃗n =

Y ni − nj + j − i d(d−1) · · · > pd , when (p1 , . . . , pd ) = ( nn1 , . . . , nnd ), Q d(d−1) i0 0≤t≤1 1+t max

because there exists a parameter t ∈ (0, 1) such that ϕW,⃗p (t) − tR > 0. That is, the average error probability ε[Φn , W ] goes to zero.
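The decoder used in the universal code constructed in the next section is the square-root ("pretty good") measurement built from the code projectors: each POVM element is S^{-1/2} P S^{-1/2} with S the sum of all projectors. As a standalone illustration, the NumPy sketch below applies the recipe to a toy pair of rank-one projectors on C^2; these are stand-ins, not the projectors P(⃗x) of the actual construction, which act on H^{⊗n}.

import numpy as np

def square_root_measurement(projectors):
    # Y_x = S^{-1/2} P(x) S^{-1/2} with S = sum_x' P(x'); the inverse square root
    # is taken on the support of S only.
    S = sum(projectors)
    vals, vecs = np.linalg.eigh(S)
    inv_sqrt = np.zeros_like(vals)
    inv_sqrt[vals > 1e-12] = 1.0 / np.sqrt(vals[vals > 1e-12])
    S_inv_sqrt = vecs @ np.diag(inv_sqrt) @ vecs.conj().T
    return [S_inv_sqrt @ P @ S_inv_sqrt for P in projectors]

# toy check with two overlapping rank-one projectors in C^2
v0 = np.array([1.0, 0.0]); v1 = np.array([1.0, 1.0]) / np.sqrt(2)
Y = square_root_measurement([np.outer(v0, v0), np.outer(v1, v1)])
print(np.allclose(sum(Y), np.eye(2)))   # the elements sum to the identity on the support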


4. Construction of universal code
According to Csiszár and Körner [2], the proposed code is constructed as follows. For a type \vec{p} \in T_n^d and a real positive number R < H(\vec{p}), there exist M_n := e^{nR-\sqrt{n}} distinct elements \mathcal{M}_n := \{\vec{x}_1, \ldots, \vec{x}_{M_n}\} \subset T_{\vec{p}} such that their empirical distributions are \vec{p} and

|T_{\vec{V}}(\vec{x}) \cap (\mathcal{M}_n \setminus \{\vec{x}\})| \le |T_{\vec{V}}(\vec{x})|\, e^{-n(H(\vec{p})-R)}

for \vec{x} \in \mathcal{M}_n \subset T_{\vec{p}} and \vec{V} \in V(\vec{x}, X). As is explained in [19], this argument can be shown by substituting the identical map into \hat{V} in Lemma 5.1 in Csiszár and Körner [2]. Our encoder is constructed from the above argument.

Next, for any \vec{x} \in X^n and any real number C_n, we define the projection P(\vec{x}) := \{\rho_{\vec{x}} - C_n \sigma_{U,n} \ge 0\}, where \{X \ge 0\} denotes the projection \sum_{i: x_i \ge 0} E_i for a Hermitian matrix X with the diagonalization X = \sum_i x_i E_i. Remember that the density \rho_{\vec{x}} is commutative with the other density \sigma_{U,n}. Using the projection P(\vec{x}), we define the decoder

Y_{\vec{x}'} := \left(\sqrt{\sum_{\vec{x}\in\mathcal{M}_n} P(\vec{x})}\right)^{-1} P(\vec{x}') \left(\sqrt{\sum_{\vec{x}\in\mathcal{M}_n} P(\vec{x})}\right)^{-1}.

Then, the above-constructed code (e^{nR-\sqrt{n}}, \mathcal{M}_n, \{Y_{\vec{x}}\}_{\vec{x}\in\mathcal{M}_n}) is denoted by \Phi_{U,n}(\vec{p}, R). As is shown in [19], the properties (5) and (11) imply that

\lim_{n\to\infty} \frac{-1}{n}\log \varepsilon(\Phi_{U,n}(\vec{p}, R), W) \ge \max_{t\in(0,1)} \frac{\phi_{W,\vec{p}}(t) - tR}{1+t}

for any channel W . Therefore, we obtain Theorem 1. 5. Approximation of multi-copy states Next, we discuss the information-theoretic approximation more precisely. That is, we calculate the value D(ρ⊗n ∥σU,n ) more deeply. In the following calculation, we focus on the set Y d := {⃗ p = (p1 , p2 , . . . , pd−1 , 1 − p1 − . . . − pd−1 )|p1 > p2 > . . . > pd−1 > 1 − p1 − . . . − pd−1 > 0} of the probability distributions on {1, . . . , d}, and the density ρ(⃗ p) =

d X

pi |i⟩⟨i|,

i=1

{|i⟩}di=1

where is the standard orthonormal basis of H. Thus, for any state ρ, there exist Ω ∈ SU(d)/U (1)d−1 and p⃗ ∈ Y d such that ρ = ρp⃗,Ω := UΩ ρ(⃗ p)UΩ† , where UΩ is a representative of Ω. In this calculation, it is essential to calculate the average of the random variable p)⊗n I⃗n . In order to treat log |Ynd | + log dim U⃗n + log dim V⃗n under the distribution Qp⃗ (⃗n) := Tr ρ(⃗ P n) log dim V⃗n asymptotically, Matsumoto and Hayashi [17] introduced the quantity ⃗ (⃗ ⃗ n∈Ynd Qp n! n! 2 ⃗ n! := n1 !n2 !···nd ! . In their Appendix D , they showed that X n! Qp⃗ (⃗n)(log dim V⃗n − log ) ⃗n! d ⃗ n∈Yn

X sgn(s) Q pδs(i) i i ∼ Q log = (p i 0 and ∀ρ > 0. This indicates that the accuracy of the upper-bound can be improved by minimizing the right hand side with respect to these parameters. 3.2. Ensemble average as an upper-bound for the minimum Unfortunately, direct minimization of the right hand side of equation (4) is non-trivial due to the complicated dependence on C. However, the expression can still be useful for assessing the minimum error probability among all possible codes, Pe = minC∈{all codes} {Pe (C)}, for classical channels. K For this purpose, we introduce an ensemble of all codes Q(C) = 2s=1 Q(xs ), where Q(x) is an identical distribution for generating codewords x1 , x2 , . . . , x2K independently. Averaging equation (4) with respect to Q(C) gives an upper-bound of Pe as 

P_e \le \langle P_e(C)\rangle \le \left\langle 2^{-K} \sum_{m,y} P(y|x_m) \Big( \sum_{s\neq m} \big(P(y|x_s)/P(y|x_m)\big)^{\lambda} \Big)^{\rho} \right\rangle_{C\in\{\mathrm{all\ codes}\}}
 = 2^{-K} \sum_{y} \sum_{m=1}^{2^K} \sum_{x_m} Q(x_m)\, P(y|x_m)^{1-\lambda\rho} \sum_{C\backslash x_m} \prod_{s\neq m} Q(x_s) \Big( \sum_{s\neq m} P(y|x_s)^{\lambda} \Big)^{\rho},    (5)

due to the fact that the minimum value over a given ensemble is always smaller than the average over the ensemble. Here, · · · represents the average over a code ensemble Q(C) and C\xm denotes a subset of C = {x1 , x2 , . . . , x2K } from which only xm is excluded. 3.3. Jensen’s inequality and random coding exponent Equation (5) is still difficult to assess for large K because the right hand side

involves the fracρ 2K λ , tional moment of a sum of exponentially many terms C\xm s=m Q(xs ) s=m P (y|xs ) the direct and rigorous evaluation of which requires an exponentially large computational cost even while the code ensemble is factorizable with respect to codewords. Jensen’s inequality ⎛ ⎞ρ ⎛ ⎞ρ 2K 2K     Q(xs ) ⎝ P (y|xs )λ ⎠ ≤ ⎝ Q(xs ) P (y|xs )λ ⎠ s=m s=m C\xm s=m C\xm s=m  ρ  ρ   K ρ λ ρK λ = (2 − 1) ≤2 , (6) Q(x)P (y|x) Q(x)P (y|x) x x which holds for 0 < ρ ≤ 1, is a standard technique of information theory to overcome this difficulty. Plugging this into equation (5), in conjunction with an additional restriction ρ ≤ 1,


we obtain the expression

P_e \le \langle P_e(C)\rangle \le 2^{\rho K} \sum_{y} \Big( \sum_{x} Q(x) P(y|x)^{1-\rho\lambda} \Big) \Big( \sum_{x} Q(x) P(y|x)^{\lambda} \Big)^{\rho},    (7)

(0 \le \rho \le 1), where 2^{-K}\sum_{m=1}^{2^K}\sum_{x_m} Q(x_m)P(y|x_m)^{1-\lambda\rho} = \sum_x Q(x)P(y|x)^{1-\lambda\rho} is utilized and the trivial case \rho = 0 is included. For any given 0 \le \rho \le 1, the upper-bound of equation (7) is generally minimized by \lambda = 1/(1+\rho), as assumed in Gallager's paper [11]. The computational difficulty of assessing equation (7) is resolved for memoryless channels P(y|x) = \prod_{l=1}^N P(y_l|x_l) by assuming factorizable distributions Q(x) = \prod_{l=1}^N Q(x_l). This assumption naturally indicates that the upper-bound depends exponentially on the code length N as P_e \le \exp[-N(-\rho R + E_0(\rho, Q))], where R = K/N and

E_0(\rho, Q) = -\ln\Big[ \sum_y \Big( \sum_x Q(x)\, P(y|x)^{\frac{1}{1+\rho}} \Big)^{1+\rho} \Big],    (8)

are often termed the code rate and Gallager function, respectively. This means that if N is sufficiently large and the random coding exponent

E_r(R) = \max_{0\le\rho\le 1,\, Q} \{ -\rho R \ln 2 + E_0(\rho, Q) \},    (9)

is positive for a given R, there exists a code with a decoding error probability smaller than an arbitrary positive number. For a fixed Q(x), E_0(\rho, Q) is a convex upward function satisfying E_0(\rho = 0, Q) = 0 and

\frac{\partial}{\partial\rho} E_0(\rho, Q)\Big|_{\rho=0} = \sum_{y,x} Q(x) P(y|x) \ln\frac{P(y|x)}{\sum_x Q(x) P(y|x)} \equiv \ln 2 \times I(Q),    (10)

where I(Q) represents the mutual information between x and y (in bits). This implies that the critical rate R_c below which E_r(R) becomes positive is given by \rho = 0 as

R_c = \max_Q \{ I(Q) \},    (11)

which agrees with the definition of the channel capacity [14]. As R is reduced from Rc , the value of ρ that optimizes the right hand side of equation (9) increases and reaches ρ = 1 at a certain rate Rb . Below Rb , equation (9) is always optimized at the boundary ρ = 1. Figure 1 shows an example of Er (R) for the binary symmetric channel (BSC), which is characterized by a crossover rate of 0 ≤ p ≤ 1 as P (1|0) = P (0|1) = p and P (1|1) = P (0|0) = 1 − p for binary alphabets x, y ∈ {0, 1}. Er (R) characterizes an upper-bound of a typical decoding error probability of randomly constructed codes. However, surprisingly enough, it is known that for certain classes of channels, Er (R) represents the performance of the best codes at the level of exponent for a relatively high code rate region R ≥ Ra , which contains R = Rb , since Er (R) agrees with the exponent of a lower bound of the best possible code [15]. This is far from trivial because the restriction ρ ≤ 1, which governs Er (R) of R ≤ Rb , is introduced in an ad hoc manner when employing Jensen’s inequality in the above methodology.
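For the BSC it is straightforward to evaluate (8), (9) and (11) numerically and to reproduce R_c ≈ 0.531 quoted in Figure 1. The sketch below uses the closed form of the Gallager function for the BSC with a uniform input distribution Q; the function names are illustrative and SciPy is assumed.

import numpy as np
from scipy.optimize import minimize_scalar

def E0_bsc(rho, p):
    # Gallager function (8) for the BSC with uniform Q:
    # E0 = rho*ln2 - (1+rho)*ln( p^{1/(1+rho)} + (1-p)^{1/(1+rho)} )
    return rho * np.log(2) - (1 + rho) * np.log(p ** (1 / (1 + rho)) + (1 - p) ** (1 / (1 + rho)))

def Er_bsc(R, p):
    # random coding exponent (9), in nats (the ln 2 factor converts R from bits)
    f = lambda rho: -(-rho * R * np.log(2) + E0_bsc(rho, p))
    res = minimize_scalar(f, bounds=(0.0, 1.0), method='bounded')
    return -res.fun

p = 0.1
Rc = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)    # BSC capacity, ~0.531 bit
print(Rc, Er_bsc(0.4, p), Er_bsc(0.6, p))             # Er > 0 below Rc, ~0 above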


Figure 1. Random coding exponent E_r(R) for BSC for a crossover rate p = 0.1. E_r(R) (solid curve) becomes positive for R < R_c (≈ 0.531). The functional form of E_r(R) for R < R_b (≈ 0.189) differs from that for R_b ≤ R ≤ R_c. The broken curve represents the value of the upper-bound exponent that is maximized without the restriction ρ ≤ 1.

4. Performance assessment by the replica method
4.1. Expanding the upper-bound for ρ = 1, 2, . . .
In order to clarify the origin of the superficially artificial restriction ρ ≤ 1, we evaluate the exponent without using Jensen's inequality. For this purpose, we assess the right hand side of equation (5) analytically, continuing the expressions obtained for ρ = 1, 2, . . . to ρ ∈ R. This is often termed the replica method [16, 17]. For the current problem, the first step of the replica method is to evaluate the expression

\sum_{C\backslash x_m} \prod_{s\neq m} Q(x_s) \Big( \sum_{s\neq m} P(y|x_s)^{\lambda} \Big)^{\rho}
 = \sum_{\{s^a \neq m\}_{a=1}^{\rho}} \prod_{\tau\neq m} \sum_{x_\tau} Q(x_\tau)\, P(y|x_\tau)^{\lambda \sum_{a=1}^{\rho}\delta(s^a,\tau)}
 = \sum_{(i_1, i_2, \ldots, i_\rho)} \mathcal{W}(i_1, i_2, \ldots, i_\rho) \prod_{t=1}^{\rho} \Big( \sum_x Q(x)\, P(y|x)^{\lambda t} \Big)^{i_t},    (12)

analytically for ρ = 1, 2, . . ., where δ(x, y) = 1 for x = y and vanishes otherwise, and W(i_1, i_2, . . . , i_ρ) is the number of ways of partitioning ρ replica messages s^1, s^2, . . . , s^ρ to i_1 states (out of τ = 1, 2, . . . , 2^K except for τ = m) by one, to i_2 states by two, . . . and to i_ρ states by ρ. Obviously, W(i_1, i_2, . . . , i_ρ) = 0 unless \sum_{t=1}^{\rho} i_t\, t = \rho. It is worth noting that the expression of the right hand side is valid only for ρ = 1, 2, . . ..
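The combinatorial bookkeeping behind equation (12) can be verified by brute force for small ρ. The toy script below is illustrative only: it uses length-one "codewords" so that Q(x) is a single-symbol distribution and h(x) stands for P(y|x)^λ at one fixed y; it enumerates all replica assignments, tallies W(i_1, . . . , i_ρ), and checks that the two sides of (12) coincide.

import itertools, collections
import numpy as np

rng = np.random.default_rng(2)
M, rho = 3, 3                        # M competing codewords, rho replicas
alphabet, Q = [0, 1], [0.5, 0.5]     # toy input distribution
h = {x: rng.random() for x in alphabet}   # stands for P(y|x)^lambda at a fixed y

# left hand side: average of (sum_s h(x_s))^rho over independent codewords
lhs = sum(np.prod([Q[x] for x in xs]) * sum(h[x] for x in xs) ** rho
          for xs in itertools.product(alphabet, repeat=M))

# right hand side of (12): group replica assignments by their occupancy profile
profiles = collections.Counter()
for assign in itertools.product(range(M), repeat=rho):
    counts = collections.Counter(assign).values()
    profile = tuple(sum(1 for c in counts if c == t) for t in range(1, rho + 1))
    profiles[profile] += 1           # this count is W(i_1, ..., i_rho)
moments = [sum(q * h[x] ** t for q, x in zip(Q, alphabet)) for t in range(1, rho + 1)]
rhs = sum(W * np.prod([moments[t] ** it for t, it in enumerate(profile)])
          for profile, W in profiles.items())

print(np.isclose(lhs, rhs))   # True: the partition expansion is exact for integer rho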


4.2. Saddle point assessment under the replica symmetric ansatz Exactly evaluating equation (12) is difficult for large K = N R. However, in many systems, quantities of this kind scale exponentially with respect to N , which implies that the exponent characterizing the exponential dependence can be accurately evaluated by the “saddle point method” with respect to the partition of ρ, (i1 , i2 , . . . , iρ ), under an appropriate assumption of the symmetry underlying the objective system in the limit of N → ∞. The replica symmetry, for which equation (12) is invariant under any permutation of the replica indices a = 1, 2, . . . , ρ, is critical for the current evaluation. This implies that it is natural to assume that, for large N , the final expression of equation (12) is dominated by a single term possessing the same symmetry, which yields the following two types of replica symmetric (RS) solutions: • RS1: Dominated by (i1 , i2 , . . . , iρ ) = (ρ, 0, . . . , 0), giving ⎛ ⎞ρ  ρ 2K    Q(xs ) ⎝ P (y|xs )λ ⎠  W(ρ, 0, . . . , 0) Q(x)P (y|x)λ x s=m C\xm s=m  ρ  Q(x)P (y|x)λ .  2N ρR x • RS2: Dominated by (i1 , i2 , . . . , iρ ) = (0, 0, . . . , 1), giving ⎛ ⎞ρ  1 2K    Q(xs ) ⎝ P (y|xs )λ ⎠  W(0, 0, . . . , 1) Q(x)P (y|x)λρ x s=m C\xm s=m  NR  2 Q(x)P (y|x)λρ . x

(13)

(14)

Plugging these into the final expression of equation (5), in conjunction with P(y|x) = \prod_{l=1}^N P(y_l|x_l) and Q(x) = \prod_{l=1}^N Q(x_l), gives the exponents

E_{\mathrm{RS1}}(\rho, \lambda, Q, R) = -\rho R \ln 2 - \ln\Big[ \sum_y \Big( \sum_x Q(x) P(y|x)^{1-\lambda\rho} \Big) \Big( \sum_x Q(x) P(y|x)^{\lambda} \Big)^{\rho} \Big],    (15)

and

E_{\mathrm{RS2}}(\rho, \lambda, Q, R) = -R \ln 2 - \ln\Big[ \sum_y \Big( \sum_x Q(x) P(y|x)^{1-\lambda\rho} \Big) \Big( \sum_x Q(x) P(y|x)^{\lambda\rho} \Big) \Big],    (16)

where the suffixes RS1 and RS2 correspond to equations (13) and (14), respectively, as two candidates of the exponent E(ρ, λ, Q, R) for upper-bounding the minimum decoding error probability as Pe ≤ exp [−N E(ρ, λ, Q, R)]. 4.3. Phase transition between RS solutions: origin of the restriction ρ ≤ 1 Although we have so far assumed that ρ is a natural number, both the functional forms of the saddle point solutions, (15) and (16), can be defined over ρ ∈ R. Therefore, we analytically continue these expressions from ρ = 1, 2, . . . to ρ ∈ R, and select the relevant solution for each set of (ρ, λ, Q, R) in order to obtain the correct upper-bound exponent E(ρ, λ, Q, R). This is the second step of the replica method. For ρ = 1, 2, . . . and sufficiently large N , this can be carried out by selecting the solution of the lesser exponent value. Unfortunately, as yet a mathematically justified general guideline


for selection of the relevant solution for ρ ≤ 1 has not been determined. Such a guideline is necessary for determining the channel capacity by assessment at ρ = 0. However, there is an empirical criterion for this purpose, which is indicated by the analysis of exactly solvable models [18]. In the current case, this means that for fixed λ, Q and R we should choose the solution for which the partial derivative with respect to ρ at ρ = 1, (∂/∂ρ)ERS1 (ρ, λ, Q, R)|ρ=1 or (∂/∂ρ)ERS2 (ρ, λ, Q, R)|ρ=1 , is lesser, as the relevant solution for ρ ≤ 1. This criterion implies that ERS1 (ρ, λ, Q, R) should be chosen to provide the tightest bound Ereplica (R) = max0≤ρ,0≤λ,Q {E(ρ, λ, Q, R)} for relatively large R, which yields the expression {ERS1 (ρ, λ, Q, R)} ⎡  1+ρ ⎤⎫ ⎬ ⎨   1 ⎦ . Q(x)P (y|x) 1+ρ = max −ρR ln 2 − ln ⎣ ⎭ 0≤ρ,Q ⎩ y x

Ereplica (R) =

max

0≤ρ,0≤λ,Q ⎧

(17)

As R is reduced from R = Rc , below which equation (17) becomes positive, the value of ρ that maximizes the right hand side of equation (17) increases from ρ = 0, keeping the relation λ = 1/(1 + ρ) at the maximum point. When R reaches Rb , the optimal value of ρ becomes unity and λ = 1/2, for which     ∂ ∂ ERS1 (ρ, λ, Q, R) ERS2 (ρ, λ, Q, R) − ∂ρ ∂ρ (ρ,λ,R)=(1,1/2,R (ρ,λ,R)=(1,1/2,Rb ) b) ⎧ ⎡  1+ρ ⎤⎫ ⎨ ⎬    1 ∂ ⎦  −ρRb ln 2 − ln ⎣ = Q(x)P (y|x) 1+ρ = 0. (18)  ⎭ ∂ρ ⎩  y x ρ=1

This implies that for R < Rb , (∂/∂ρ)ERS2 (ρ, λ, Q, R)|ρ=1 < (∂/∂ρ)ERS1 (ρ, λ, Q, R)|ρ=1 holds when the condition for a maximum is satisfied. Therefore, we should not select ERS1 (ρ, λ, Q, R), but rather ERS2 (ρ, λ, Q, R) for assessing the tightest bound Ereplica (R) = max0≤ρ,0≤λ,Q {E(ρ, λ, Q, R)} for R < Rb , which yields {ERS2 (ρ, λ, Q, R)}         1−λρ λρ Q(x)P (y|x) Q(x)P (y|x) −R ln 2 − ln = max 1≤ρ,0≤λ,Q y x x ⎧ ⎡  2 ⎤⎫ ⎬ ⎨   1 Q(x)P (y|x) 2 ⎦ . (19) = max −R ln 2 − ln ⎣ ⎭ Q ⎩ y x

Ereplica (R) =

max

0≤ρ,0≤λ,Q

In the second line, any choice of (ρ, λ) that satisfies λρ = 1/2 and ρ ≥ 1 optimizes the exponent. Although the style of the derivation seems somewhat different from that of the conventional approach, the exponents obtained by equations (17) and (19) are identical to those assessed using equation (9). Therefore, Ereplica (R) = Er (R) holds, implying that no improvement is gained by the replica method in the analysis of the ensemble of all codes. Nevertheless, our approach is still useful for clarifying the origin of the seemingly artificial restriction ρ ≤ 1 in the conventional scheme. The above analysis indicates that there is no such restriction as long as the upper-bound of equation (5) is directly evaluated. Instead, what is the most relevant is the breaking of the analyticity with respect to ρ of the upper-bound exponent E(ρ, λ, Q, R), which can be interpreted as a phase transition between the two types of replica symmetric solutions ERS1 (ρ, λ, Q, R) and ERS2 (ρ, λ, Q, R) in the terminology of physics. As a consequence, we have to appropriately switch the functional forms of the objective function in


order to correctly obtain the optimized exponent. However, this procedure, in practice, can be completely simulated by optimizing a single function in conjunction with introducing an additional restriction ρ ≤ 1, which can be summarized by a conventional formula of the random coding exponent, namely equation (9). Of course, it must be kept in mind that the mathematical validity of our methodology is still open while the known correct results are reproduced. Although applying the saddle point assessment is a major reason for the weakening of mathematical rigor, the most significant issue in the current context is mathematical justification of the empirical criterion at ρ = 1 to select the appropriate solution for ρ ≤ 1 when multiple saddle point solutions exist. Accumulated knowledge about error exponents of various codes in information theory [11, 19, 20, 21, 22] may be of assistance for solving this issue. Although we have applied the replica method to an upper-bound following the conventional framework in order to clarify the relation to an information theory method, it can be utilized to directly assess the minimum possible decoding error probability. For a region of lower R, there still exists a gap between the lower- and upper-bounds of the error exponents of the best possible code. An analysis based on the replica method indicates that the lower-bound of the exponent, which corresponds to the upper-bound of the decoding error probability, agrees with the correct solution [23]. 5. Analysis of low-density parity-check codes 5.1. Definition of an LDPC code ensemble Although a novel interpretation is obtained, our approach does not update known results in the analysis of the ensemble of all codes. However, this is not the case in general; the replica method usually offers a smaller upper-bound than conventional schemes for general code ensembles. We will show this for an ensemble of low-density parity-check (LDPC) codes. A (k, j) LDPC code is defined by N − K parity checks composed of k components,  selecting  N combinations of indices for characterizing a binary xl1 ⊕ xl2 ⊕ . . . ⊕ xlk = 0, out of k codeword of length N , x = (xl ) ∈ {0, 1}N , where l1 , l2 , . . . , lk = 1, 2, . . . , N and ⊕ denotes addition over the binary field. There are several ways to define an LDPC code ensemble. For analytical convenience, we here focus on an ensemble constructed by uniformly selecting N − K ordered combinations of k different indices l1 , l2 , . . . , lk , l1 l2 . . . lk , for parity checks, so that each component index of codewords l(= 1, 2, . . . , N ) appears j times in the total set of parity checks. A code C constructed in this way is specified by a set of binary variables c = {cl1 l2 ...lk  }, where cl1 l2 ...lk  = 1 if the combination l1 l2 . . . lk is used for a parity check and cl1 l2 ...lk  = 0 otherwise. For simplicity, we assume symmetric channels, where we can assume that the sent message m is encoded into the null codeword x = 0. Under this assumption, the generalized Chernoff’s bound (4) for an LDPC code is expressed as ⎛ ⎞ρ   I(x|c)P (y|x)λ ⎠ , P (y|0)1−λρ ⎝ (20) Pe (C) ≤ y x=0 where I(x|c) =



! 1 − cl1 l2 ...lk  + cl1 l2 ...lk  δ(xl1 ⊕ xl2 ⊕ . . . ⊕ xlk , 0) ,

(21)

l1 l2 ...lk 

returns unity if x satisfies all the parity checks and vanishes otherwise, screening only codewords in the summation over x in the right hand side of equation (20).
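The ensemble just described can be sampled naively in a few lines. The sketch below is only illustrative: it pairs variable and check "sockets" at random, so it does not remove short cycles or repeated indices within a check (unlike the expurgated ensembles discussed below), and it assumes Nj is divisible by k. It then evaluates the indicator I(x|c) of equation (21).

import numpy as np

def random_regular_checks(N, k, j, rng):
    # naive (k, j)-regular check set: N*j/k checks of k variable indices each,
    # every variable appearing in exactly j checks (short cycles not removed)
    sockets = np.repeat(np.arange(N), j)
    rng.shuffle(sockets)
    return sockets.reshape(-1, k)          # rows are the checks <l1 ... lk>

def indicator(x, checks):
    # I(x|c) of equation (21): 1 iff every parity check is satisfied
    return int(all(x[row].sum() % 2 == 0 for row in checks))

rng = np.random.default_rng(0)
checks = random_regular_checks(N=12, k=6, j=3, rng=rng)
print(indicator(np.zeros(12, dtype=int), checks))   # the all-zero word is always a codeword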


5.2. Performance assessment by the replica method Unlike the random code ensemble explored in the previous section, a statistical dependence arises among codewords in an LDPC code. This yields atypically bad codes, the minimum distance of which is of the order of unity with a probability of algebraic dependence on N . The contribution of such atypical codes causes the average of the decoding error probability over a naive LDPC code ensemble to decay algebraically with respect to N , indicating that the error exponent vanishes even for a sufficiently small rate R [24]. However, we can reduce the fraction of the bad codes to as small as required by removing short cycles in the parity check dependence by utilizing certain feasible algorithms [25]. This implies that, in practice, the performance of the LDPC code ensembles can be characterized by analysis with respect to the typical codes utilizing the saddle point method as shown below [26]. In order to employ the replica method, we assess the average of the right hand side of equation (20) with respect to the LDPC code ensemble ⎛ ⎞ N  1 δ⎝ cll2 l3 ...lk  , j ⎠ , (22) Q(c) = N (k, j) l=1

l2 l3 ...lk 

  δ c , j stands for the number of (k, j) LDPC codes. where N (k, j) = c N l2 l3 ...lk  ll2 l3 ...lk  l=1 For ρ = 1, 2, . . .  and sufficiently large N , evaluating this using the saddle method, substituting with P (y|x) = N l=1 P (yl |xl ), gives an upper-bound for the average decoding error probability over an ensemble of typical LDPC codes from which atypically bad codes are expurgated as Pe (C) ≤ exp [−N ELDPC (ρ, λ, R)], where ⎧ ρ k ⎨ N k−1  χ(bt ) δ(ba1 ⊕ ba2 ⊕ . . . ⊕ bak , 0) ELDPC (ρ, λ, R) = − Extr χ,b χ ⎩ k! t=1 a=1 b1 ,b 2 ,...,bk   ρ   P (y|0)1−λρ P (y|xa )λ χ (x)j + ln y x a=1  ⎫  ⎬ 1−1/k  (jN ) j − j + j ln , (23) − χ (b)χ(b) − ⎭ k ((k − 1)!)1/k b b = (b1 , b2 , . . . , bρ ) ∈ {0, 1}ρ and x = (x1 , x2 , . . . , xρ ) ∈ {0, 1}ρ . ExtrX denotes the operation of extremization with respect to X, which corresponds to the saddle point assessment of a certain complex integral and therefore does not necessarily mean maximization or minimization. An outline of the derivation is shown in Appendix A. An RS solution which is relevant for 0 ≤ ρ ≤ 1 corresponding to RS1 in the previous section is obtained under the RS ansatz " +1 ρ  a 1 + u(−1)b , (24) duπ(u) χ(b) = q 2 −1 a=1 " +1 ρ  a 1+u (−1)b , (25) d uπ ( u) χ (b) = q 2 −1 a=1 where q and q are normalization variables that constrain the respective variational functions π(u) and π ( u) to be distributions over [−1, 1], making it possible to analytically continue the expression (23) from ρ = 1, 2, . . . to ρ ∈ R. Carrying out partial extremization with respect to


q and q yields an analytically continued RS upper-bound exponent  RS ELDPC (ρ, λ, R)

j ln k

"



k +1

1+

k

t=1 ut

ρ 

= − Extr dut π(ut ) π,b π 2 −1 t=1 ⎞ρ ⎤ ⎛ ⎡  " +1 j  j x   1+u μ (−1) P (y|x)λ ⎠ ⎦ P (y|0)1−λρ d uμ π ( uμ ) ⎝ + ln ⎣ 2 −1 μ=1 y x=0,1 μ=1 #" +1  $%  1+u u ρ − j ln du π ( u)π(u) , (26) 2 −1

where the functional extremization Extrπ,bπ {· · ·} can be performed numerically in a feasible time by Monte Carlo methods in practice [27]. 5.3. Comparison of lower-bound estimates of error threshold When the noise level is sufficiently small and the code length N is sufficiently large, there exists at least one (k, j) LDPC code with a decoding error probability smaller than an arbitrary positive number. The maximum value of such noise levels is sometimes termed the error threshold. Equation (26) can be utilized to assess a lower-bound of the error threshold. Table 1 shows the lower-bounds obtained by maximizing this equation with respect to ρ ≥ 0 and λ ≥ 0 for several sets of (k, j). Estimates obtained by the conventional schemes utilizing Jensen’s inequality, which in the current case are determined by an upper-bound exponent Jensen (ρ, λ, R) ELDPC

# $ & 1 + uk j = − Extr ρ ln k ⎛2 ⎞ρ ⎤ ⎡u,bu j x  1 + u  (−1) P (y|0)1−λρ ⎝ P (y|x)λ ⎠ ⎦ + ln ⎣ 2 y x=0,1 $% # 1+u u , − ρj ln 2

(27)

are also provided for comparison. Table 1 indicates that, in general, the lower-bounds estimated by the replica method are not smaller than those of the conventional schemes. This implies that unlike the case of the ensemble of all codes, employing Jensen’s inequality can relax an upper-bound for general code ensembles and therefore there may be room for improvement in results obtained by conventional schemes based on this inequality. 6. Summary and discussion In summary, we have explored the relation between statistical mechanics and information theory methods for assessing performance of channel coding, based on a framework developed by Gallager [11]. An average of a generalized Chernoff’s bound for probability of decoding error over a given code ensemble can be directly evaluated by the replica method of statistical mechanics, while Jensen’s inequality must be applied in a conventional information theory approach. The direct evaluation of the average associated a switch of two analytic functions in the random coding exponent known in information theory with a phase transition between two replica symmetric solutions obtained by the replica method. Better lower-bounds of the error threshold were obtained for ensembles of LDPC codes under the assumption that the replica method produces the correct results. This may motivate an improvement in the accuracy of performance assessment, refining the conventional methodologies.


(j, k)    R     Jensen 1   Jensen 2   replica   Shannon
(3, 6)   1/2    0.0678     0.0915     0.0998    0.109
(3, 5)   2/5    0.115      0.129      0.136     0.145
(4, 6)   1/3    0.1705     0.1709     0.173     0.174
(2, 3)   1/3    0          0.0670     0.0670    0.174
(2, 4)   1/2    0          0.0286     0.0286    0.109

Table 1. Lower-bound estimates of the error threshold of BSC. In columns "Jensen 1", "Jensen 2" and "replica", the estimates represent the critical crossover rates p_c, below which the maximized values of equation (26) or (27) are positive. In the evaluation, the exponents are maximized with respect to two parameters ρ ≥ 0 and λ ≥ 0 for "Jensen 2" and "replica", while a single-parameter maximization with respect to ρ ≥ 0, keeping λ = 1/(1+ρ), is performed for "Jensen 1". "Shannon" represents the channel capacity for a given code rate R.
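The "Shannon" column of Table 1 is simply the crossover probability at which the BSC capacity 1 − h_2(p) equals the code rate R, so it can be reproduced with a one-line root find. The sketch below is illustrative and assumes SciPy; small differences from the table are rounding.

import numpy as np
from scipy.optimize import brentq

def h2(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def shannon_threshold(R):
    # crossover probability at which the BSC capacity 1 - h2(p) equals the rate R
    return brentq(lambda p: 1 - h2(p) - R, 1e-9, 0.5 - 1e-9)

for R in (1/2, 2/5, 1/3):
    print(R, round(shannon_threshold(R), 3))   # approx. 0.110, 0.146, 0.174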

A characteristic feature of the methods developed in statistical mechanics is the employment of the saddle point assessment utilizing a certain symmetry underlying the objective system, which, in some cases, makes it possible to accurately analyze macroscopic properties of large systems even when there are statistical correlations or constraints among system components. Such approaches may also be useful for analyzing codes of quantum information, for which, in many cases, there arise non-trivial correlations among codewords for the purpose of dealing with noncommutativity of operators [28]. Acknowledgments This work was partially supported by a Grant-in-Aid from MEXT, Japan, No. 1879006. Appendix A. Outline of derivation of equation (23) Equation (23) is obtained by averaging the right hand side of equation (20) with respect to the (k, j) LDPC code ensemble (22). For this assessment, we first evaluate the normalization constant N (k, j) utilizing the identity ⎞ ⎛ P '  1 l l ...l  cll2 l3 ...lk  −(j+1) ⎠ ⎝ dZl Zl cll2 l3 ...lk  , j = Zl 2 3 k , (A.1) δ 2πi l2 l3 ...lk 

( √ −1 and dZ denotes the contour integral along a closed curve surwhere i = rounding the origin on the complex plane. Plugging this expression into N (k, j) =  N l2 l3 ...lk  cll2 l3 ...lk  , j yields c l=1 δ N (k, j) =

=

=

1 (2πi)N 1 (2πi)N 1 (2πi)N

' N

−(j+1)

dZl Zl

l=1

' N l=1

(1 + Zl1 Zl2 . . . Zlk )

l1 l2 l3 ...lk 

l=1

' N



−(j+1)

dZl Zl



exp ⎣

⎤ ln (1 + Zl1 Zl2 . . . Zlk )⎦

⎡l1 l2 l3 ...lk  ⎤  −(j+1) dZl Zl exp ⎣ Zl1 Zl2 . . . Zlk + higher order terms⎦ l1 l2 l3 ...lk 


 

1 (2πi)N 1 (2πi)N

' N l=1

' N l=1

⎡ −(j+1)

dZl Zl

exp ⎣



⎤ Zl1 Zl2 . . . Zlk ⎦

⎡l1 l2 l3...lk  k ⎤ N k  1 N −(j+1) dZl Zl exp ⎣ Zl ⎦ . k! N

(A.2)

l=1

Here, in the third to fifth lines we have omitted irrelevant higher order terms since they do not saddle point assessment.* Inserting the identity 1 =

affect the following  + ) ) ) +i∞ N N −1 −1 q0 exp q0 dq0 δ dq0 −i∞ d = (2πN ) into this N l=1 Zl − N q0 l=1 Zl − N q0 expression makes it possible to analytically integrate equation (A.2) with respect to Zl (l = 1, 2, . . . , N ). For large N , the most dominant contribution to the resulting integral with respect to q0 and q0 can be evaluated by the saddle point method as      j (jN )j−j/k q0j N k−1 k 1 ,(A.3) ln N (k, j)  Extr q − q0 q0 + ln = − j + ln N q0 ,b q0 k! 0 j! k ((k − 1)!)j/k j! where the saddle point is given as q0 = ((k−1)!)1/k j 1/k N −1+1/k and q0 = ((k−1)!)−1/k (jN )1−1/k . The average of the right hand side of equation (20) for ρ = 1, 2, . . . can be evaluated in ρ λ and take the average with a similar manner. For this, we expand x=0 I(x|c)P (y|x) respect to c, utilizing the LDPC code ensemble (22). For each fixed set of x1 , x2 , . . . , xρ , we obtain the expression ⎞ ⎛ ρ   ⎠ ⎝ cll2 l3 ...lk  , j I(xa |c) δ c a=1 l2 l3 ...lk    ' ρ N 1 −(j+1) a a a dZl Zl δ(xl1 ⊕ xl2 ⊕ . . . ⊕ xlk , 0) 1 + Zl1 Zl2 . . . Zlk = (2πi)N a=1 l=1 l1 l2 l3 ...lk  ⎡ ⎤ ' ρ N  1 −(j+1) dZl Zl exp ⎣ Zl1 Zl2 . . . Zlk δ(xal1 ⊕ xal2 ⊕ . . . ⊕ xalk , 0)⎦  (2πi)N a=1 l=1 l1 l2 l3 ...lk  ' N 1 −(j+1) dZl Zl ×  (2πi)N l=1 ⎤ ⎡  ρ  ρ k N k   N 1 Zl δ(xal , bat ) δ(ba1 ⊕ ba2 ⊕ . . . ⊕ bak , 0)⎦ , (A.4) exp ⎣ k! N t=1 a=1 a=1 l=1 b1 ,b2 ,...,bk where we have introduced the dummy variables bt = (b1t , b2t , . . . , bρt ) (t = 1, 2, . . . , k) as ρ

δ(xal1

⊕ xal2

⊕ ...

⊕ xalk , 0)

=

a=1



 b1 ,b2 ,...,bk

ρ k

a=1 t=1

δ(xalt , bat )

ρ

 δ(ba1 ⊕ ba2 ⊕ . . . ⊕ bak , 0), (A.5)

a=1

in order to decouple xal1 , xal2 , . . . , xalk of the left hand side. Inserting the identity 1=N

−2ρ

" b

dχ(b)δ

N  l=1

Zl

ρ

 δ(xal , ba ) − N χ(x)

a=1


=

1 ρ (2πN )2

=

1 (2πN )2ρ

⎡ ⎤ N ρ   ⎝ dχ(b)d Zl δ(xal , ba ) − N χ(b) ⎦ χ(b)⎠ exp ⎣ χ (b) a=1 l=1 ⎤ ⎞ ⎡ b ⎛b " N   ⎝ dχ(b)d Zl χ (xl ) − N χ (b)χ(b)⎦ , (A.6) χ(b)⎠ exp ⎣ l=1 b b "







where xl = (x1l , x2l , . . . , xρl ) (l = 1, 2, . . . , N ), into equation (A.4) allows integration with respect to Zl (l = 1, 2, . . . , N ) to be performed analytically. The resulting expression enables us to take summations with respect to xl (l = 1, 2, . . . , N ) independently in assessing the average, which yields identical contributions for l = 1, 2, . . . , N and leads to the saddle point evaluation of equation (23). References [1] Nishimori H 2001 Statistical Physics of Spin Glasses and Information Processing: An Introduction (Oxford: Oxford University Press) [2] Watkin T L H, Rau A and Biehl M 1993 Rev. Mod. Phys. 65 499 [3] Engel A and van den Broeck C 2001 Statistical Mechanics of Learning (Cambridge: Cambridge University Press) [4] Kabashima Y and Saad D 1998 Europhys. Lett. 44 668 [5] Kabashima Y and Saad D 1999 Europhys. Lett. 45 97 [6] Montanari A and Sourlas N 2000 Eur. Phys. J. B 18 107 [7] Tanaka T 2002 IEEE Trans. Inform. Theory 48 2888 [8] Kabashima Y 2003 J. Phys. A: Math. Gen. 36 11111 [9] Mezard M and Parisi G 2001 Eur. Phys. J. B 20 217 [10] Kabashima Y 2003 J. Phys. Soc. Jpn 72 1645 [11] Gallger R G 1965 IEEE Trans. Inform. Theory 11 3 [12] Gallager R G 1968 Information Theory and Reliable Communication (New York: John Wiley & Sons) [13] Viterbi A J and Omura J K 1979 Principles of Digital Communication and Coding (New York: McGraw-Hill) [14] Shannon C E 1948 Bell System Technical Journal 27 379; ibid. 623 [15] Burnashev M V 2005 Problems of Information Transmission 41 301 [16] Mezard M, Parisi G and Virasolo M A 1987 Spin Glass Theory and Beyond (Singapore: World Scientific) [17] Dotzenko V S 2001 Introduction to the Replica Theory of Disordered Statistical Systems (Cambridge: Cambridge University Press) [18] Ogure K and Kabashima Y 2004 Prog. Theor. Phys. 111 661 [19] McEliece R J and Omura J K 1977 IEEE Trans. Inform. Theory 23 157 [20] Litsyn S 1999 IEEE Trans. Inform. Theory 45 385 [21] Burnashev M V 1984 IEEE Trans. Inform. Theory 30 23 [22] Ashikhmin A, Barg A and Litsyn S 2000 IEEE Trans. Inform. Theory 46 1945 [23] Skantzos N S, van Mourik J, Saad D and Kabashima Y 2003 J. Phys. A: Math. Gen. 36 11131 [24] Miller G and Burshtein D 2001 IEEE Trans. Inform. Theory 47 2696 [25] van Mourik J and Kabashima Y 2003 The polynomial error probability for LDPC codes Preprint arXiv:condmat/0310177 [26] Kabashima Y and Saad D 2004 J. Phys. A: Math. Gen. 37 R1 [27] Kabashima Y, Sazuka N, Nakamura K and Saad D 2001 Phys. Rev. E 64 046113(1-4) [28] Hayashi M 2006 Quantum Information: An Introduction (Berlin: Springer-Verlag)



Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

A survey on locking of bipartite correlations Debbie Leung Institute for Quantum Computing, University of Waterloo, Waterloo, Ontario, N2L3G1, Canada E-mail: [email protected] Abstract. Unlocking of a correlation refers to the unexpected phenomenon that a small amount of communication increases that correlation (as a function of the state of the distributed system) by a disproportionate amount. Locking refers to the suppression of the correlation prior to the communication. The notion was subsequently extended to abrupt changes in a correlation due to the manipulation (in particular, the addition or removal) of a small subsystem. In this proceeding, we review the basic ideas and summarize the results known so far.

1. Intuitive properties for correlations Intuitively, we expect any good correlation measure to satisfy certain axiomatic properties. First, correlation is a nonlocal property and should not increase under local processing (monotonicity). Second, starting from an uncorrelated initial state, a certain amount of communication should not increase the correlation in a disproportionate way. We call this property total proportionality. In fact, we expect the above to hold for any initial state. We call this more stringent property incremental proportionality. Other properties such as continuity or convexity are also expected. For the most general form of communication that can involve any number of rounds of forward or backward quantum or classical communication between Alice and Bob, the quantum mutual information satisfies all of the above properties [1]. The quantum mutual information is defined as Iq (ρ) ≡ S(ρA )+S(ρB )−S(ρ) with S(η) ≡ −Tr η log η being the von Neumann entropy and ρA = TrB ρ, ρB = TrA ρ being the reduced density matrices. Consequently, mutual information in the classical setting (classical states and communication) also satisfies all the stated properties. A natural definition of classical mutual information of a quantum state ρAB is the maximum classical mutual information that can be obtained by local measurements MA ⊗ MB [1]: Ic (ρ) ≡ max I(A : B). MA ⊗MB

(1)

Here I(A : B) is the classical mutual information defined as I(A :B) ≡ H(pA ) + H(pB ) − H(pAB ), H is the entropy function [2], and pAB , pA , pB are the probability distributions of the joint and individual outcomes of performing the local measurement MA ⊗MB on ρ. The physical relevance of Ic is many-fold. • Ic (ρ) is the maximum classical correlation obtainable from ρ by purely local processing.


• Ic (ρ) corresponds to the usual classical mutual information when ρ is “classical,” i.e. ρ is diagonal in some local product basis, corresponding to a classical distribution. • When ρ is pure, Ic (ρ) is the correlation calculated in the Schmidt basis and is thus equal to the entanglement of ρ [3, 4]. • For a classical-quantum system (that is, the density matrix of the distributed state is block diagonal), Ic (ρ) is the accessible information of the ensemble of quantum states living in system B with classical labels in system A. (See Section 2.2 Eq. (3) for the definition of the accessible information.) • Ic (ρ) = 0 if and only if ρ = ρA ⊗ ρB [5]. Ic is proved to satisfy monotonicity, total proportionality, continuity, and convexity hold [1]. Incremental proportionality was proved for pure initial states ρ for any communication [1]. 2. Violation of intuition – Locking of Ic 2.1. Earliest example While working on [1], the authors were initially trying to prove incremental proportionality for Ic , only to find states that violate it [5]. These states, ρACB , are supported on Cd ⊗ C2 ⊗ Cd such that Ic (ρAC|B ) = 21 log d and Ic (ρA|CB ) = log d + 1 (we use “|” to indicate the bipartite cut that defines the correlation). Furthermore, systems AC are classical. Thus an alternative statement of the result is that, providing Bob with one extra bit of classical information C increases his accessible information by 21 log d where d is his system dimension. Naturally, we think of system C as the “key” that “locks” his accessible information. Many of the subsequent developments in the subject are based on this specific example provided in [5]. In detail, 1 XX (|xihx| ⊗ |kihk|)A ⊗ |kihk|C ⊗ (Uk |xihx|Uk† )B . 2d x=0 d−1 1

ρACB =

(2)

k=0

Here U0 = I and U1 is any unitary that changes the computational basis to a conjugate basis (∀i,x |hi|U1 |xi| = √1 ). In this example, Bob is given a message |xi uniformly distributed over d d possibilities in two possible random bases (depending on whether k = 0 or 1), while Alice holds the classical label of his state (both the message and the key). The 1-bit communication that unlocks the message is the basis information. Clearly, Ic (ρA|CB ) = log d + 1. The challenge is to prove that Ic (ρAC|B ) = 12 log d. This has to be done for all possible POVM measurements that Bob can apply. 2.2. Connection to entropic uncertainty Now we prove that Ic (ρAC|B ) = 12 log d. This takes several steps and a connection to a subject concerning “entropic uncertainty relations.” First, the complete measurement MA along {|xi ⊗ |ki} is provably optimal for Alice: Since the outcome tells her precisely which pure state from the ensemble she has, she can apply classical, local post-processing to obtain the output distribution for any other measurement she could have performed. For Alice’s choice of optimal measurement, Ic (ρAC|B ) is simply Bob’s accessible information Iacc [3] about the uniform ensemble of states {|xi, U1 |xi}x=0,···,d−1 . In general, let I be a random variable such that outcome i occurs with probability pi ≥ 0. A draw from the ensemble of states E = {pi , ηi } is a specimen of the corresponding random state ηI . 26

The accessible information Iacc of E is the maximum mutual information between the random variable I and the outcome of a measurement performed on ηI . Iacc (E) can be maximized by a POVM that has only rank-1 elements [3]. Let M = {αj |φj ihφj |}j stand for a POVM with rank-1 elements where each |φj i is normalized and αj > 0. Then Iacc (E) can be expressed as h X XX pi hφj |ηi |φj i i Iacc (E) = max − pi log pi + pi αj hφj |ηi |φj i log , M hφj |µ|φj i i

where µ =

P

i

(3)

j

i pi ηi .

1 We now apply Eq. (3) to the present problem. Our ensemble is { 2d , Uk |xi}x,k with i = x, k a 1 I 1 double index, px,k = 2d , µ = d , and hφj |µ|φj i = d for all j. Putting all these in Eq. (3),

h X αj |hφj |Uk |xi|2 i |hφj |Uk |xi|2 log Ic (ρAC|B ) = max log 2d + M 2d 2 jxk h i X αj  1 X = max log d+ |hφj |Uk |xi|2 log |hφj |Uk |xi|2 M d 2 j

xk

P P where we have used αj = d and ∀jt k |hφj |Ut |ki|2 = 1 to obtain the last line. Since j P αj j d = 1, the second term is a convex combination, and can be upper bounded by maximization over just one term: Ic (ρAC|B ) ≤ log d + max |φi

1X |hφ|Uk |xi|2 log |hφ|Uk |xi|2 . 2

(4)

xk

P Note that − xk |hφ|Uk |xi|2 log |hφ|Uk |xi|2 is the sum of the entropies of measuring |φi in the computational basis and the conjugate basis. Reference [6] proves that such a sum of entropies is at least log d, no matter how |φi is chosen. Lower bounds of these type are called entropic uncertainty inequalities, which quantify how much a vector |φi cannot be simultaneously aligned with states from a set of bases. It follows that Ic (ρAC|B ) ≤ 21 log d. Equality can in fact be attained when Bob measures in the computational basis, so that Ic (ρAC|B ) = 12 log d and Ic (ρA|BC ) − Ic (ρAC|B ) = 1 + 21 log d. We remark that incremental proportionality remains violated for multiple copies of ρ. Wootters proved that [7] the accessible information from m independent draws of an ensemble E of separable states is additive, Iacc (E ⊗m ) = mIacc (E). It follows Ic (ρ⊗m ) = mIc (ρ) in our example. 2.3. Direct generalizations – locking by encoding in an unknown basis The result in Section 2.1 is suggestive – when encoding the classical message in a number of possible bases, Bob’s best measurement is to guess the basis and measure accordingly. Consider direct generalizations of the locking scheme above – in which multiple bases are used to lock accessible information for Bob. It is highly desirable to be able to lock all but a negligible amount of accessible information, using a key with size negligible compared to the change in Iacc . Furthermore, beyond information theoretical considerations, it will also be nice to have schemes that can be efficiently implemented. A variety of interesting results are obtained in the past 5 years, but there are no definitive solution. We now explain these results briefly.

27

2.3.1. Full set of mutually unbiased bases The earliest example uses 2 conjugate bases to lock the message. There is a natural generalization of 2 conjugate bases to a set of mutual unbiased bases (MUBs) [8, 9], with the defining property that the inner product between any two states from two different bases has magnitude √1d in a d-dimensional system. Thus, it is natural to suspect that the same effect holds for t ≥ 2 mutual unbiased bases (MUBs) [9] – that Bob’s accessible information is upper bounded by 1t log d before the communication. Indeed, for the case when the dimension d is a prime power and a full set of MUBs are used (t = d + 1) Ic (ρAC|B ) ≤ log d − log(d + 1) + 1 = 1 − log 1 +

1 d

(5)

and Ic (ρAB|C ) − Ic (ρA|BC ) ≥ 2 log(d + 1) − 1

(6)

This follows from an entropic uncertainty inequality for the full set of MUBs due to Sanchez [10]. The scheme removes nearly all the accessible information from Bob, but the key size is now comparable to the change in Ic and thus the scheme cannot be properly considered as “locking.” 2.3.2. Small number of MUBs Since a set of MUBs conceals accessible information well (for t = 2 or t = d + 1), researchers naturally turn to the possibility of locking with a small number of MUBs, with the hope that choosing one out of t MUBs at random to encode a message can lock the accessible information down to 1t log d. (We will see in Section 2.4 that one cannot lock better.) The problem of bounding Bob’s accessible information appears to be mathematically complicated. The community only has some partial results. (i) A simple investigation is a numerical optimization of Bob’s measurements for small primes d. During the investigation leading to [5], an optimization for |φi in Eq.(4) was performed to provide a numerical upper bound of the accessible information that varies as ≈ 1t log d+ c where c ≈ 1 is independent of t. It becomes clear in this investigation that different t-subsets of the full set of MUBs are generally inequivalent, and the numerical bound depends on the particular choice. Since the observed dependence is small, this simulation uses a particular order of the MUBs and simply takes the first t MUBs. This study focusing on prime values of d misses an important phenomenon to be discussed next. (ii) For a while, mutual unbiasedness was believed to be a central ingredient for locking. Such √ intuition turns out questionable. Reference [11] presents special choices of t = d MUBs that do not lock better than t = 2 MUBs. One simple example holds when t is a prime-power. Let X and Z be the generalized Pauli matrices acting on t dimensions, and ω be a primitive tth root of unity. (In other words, X|ai = |(a+1) mod ti and Z|ai = ω a |ai.) Consider the eigenbases of Z and XZ k for k = 0, · · · , t−1. It was shown in [12] that they form a maximal set of MUBs on t dimensions. The encoded message can be taken as the label for the eigenvalue of the eigenvector used. Let B1 , B2 be a pair of MUBs for a Hilbert space H1 , and B3 , B4 be another pair of MUBs for another Hilbert space H2 . Consider the tensor product basis formed from B1 and B3 , and that formed from B2 and B4 . They are mutually unbiased on H1 ⊗ H2 . 28

√ The above method can be used to define t+1 = d+1 MUBs for a d-dimensional Hilbert space as follows. The first one is the tensor product basis defined by Z and Z −1 . The log dbit message has two parts – the label of the eigenvalue of Z ⊗ I and that of I ⊗ Z −1 . Each of the other t bases is the tensor product basis defined by XZ k and XZ −k for k = 0, · · · , t−1. The message consists of the labels of the eigenvalues of XZ k ⊗ I and I ⊗ XZ −k . Now, Bob’s optimal measurement is that of X ⊗ X and Z ⊗ Z −1 (this is the measurement along the generalized Bell basis). To see optimality, note that Bob can multiply the appropriate powers of his two measurement outcomes to obtain the product of the pair of messages for each possible choice of the basis, thus his accessible information is at least 1 2 log d. Following earlier work for t = 2, it cannot be higher. (iii) Note that locking is defined information theoretically, without reference to limitations of computation power. In practical applications, for states with dimension d or size n = log d, it makes sense to consider a weaker notion of locking in which Bob’s optimal measurement is limited to those implementable with poly(n)-sized circuits. This was proposed and analyzed k in [13]. It states that, if Bob’s circuit is no larger than 2n , then, choosing (k + 2) log n + 3 bases randomly from the full set of d + 1 MUBs can lock Bob’s accessible information down to less than 2 bits. k The idea is simply that there are at most 2n possible measurements, yet on average, these MUBs locks well (because of the bound by Sanchez [10]). Thus a random choice of a small subset also locks quite well (according to the Chernoff bound) and the probability that the random choice fails is small. The union bound over all allowed measurements can then be applied to obtain the stated result. Note that the argument is an existential proof of an efficient scheme. A main advantage of MUBs is that, the encoding and decoding schemes are easy to implement. However, locking with MUBs occurs to be extremely difficult to analyze. It is still an open question to find an efficiently implementable locking scheme which takes a negligible key and provably locks all but a negligible amount of accessible information. However, if the concern is primarily information thereotic – focusing on the limit to which accessible information can be locked rather than practicality, then locking schemes such as those using random bases are much easier to analyze. 2.3.3. Locking with random bases One expects random bases to lock at least as well as MUBs because two random bases are approximately mutually unbiased in large dimensions. The advantage of using random bases is that they are much easier to analyse. More precise, the new locking scheme can be obtained from Eq. (2) by replacing the t = 2 conjugate bases with an arbitrary number t of independently drawn random bases. In other words, the U0 , · · · , Ut−1 are independently drawn from the Haar measure. The analysis in Section 2.2 stays the same, but now it is possible to prove an entropic uncertainty inequality for these random bases [14]. It was proved that there is a positive constant C ′′ such that   t  1X Pr inf Hj ≤ (1−ǫ) log d − 3 ≤ ( 10 )2d exp − t ǫ |φi t j=1

ǫC ′′ d 2(log d)2

−1



,

(7)

where pj (i) = |hi|Uj† |φi|2 , and Hj is the entropy for the distribution pj . Choosing t = (log d)3 ensure the vanishing of the RHS of Eq. (7) – in particular it is smaller than 1 and there

29

P exist instances (of the random Uk ’s) with inf |φi 1t tj=1 Hj > (1 − ǫ) log d − 3. Then both Ic (ρAC|B ) ≤ ǫ log d + 3 and the key size log t = 3 log log d are negligible compared to log d.

In [13] it is noted that choosing ǫ = log1 d and t = (log d)4 in the above can lock the accessible information to ≤ 4 bits (a constant) while increasing the key size only to log t = 4 log log d. 2.4. Limitations of locking with bases

We have seen various results on the locking scheme based on Eq. (2). Each of these schemes consists of a set of bases known to both Alice and Bob, and Alice encodes the message |xi in one of these bases at random. Each can be specified by the dimension d, the number of bases t, and the set of bases. 2.4.1. Lower bound on accessible information By using a full set of MUBs, the accessible information is upper bounded by 1, and by using poly(log log d) random bases, the accessible information is upper bounded by 4 or so. This begs the question – are there better choices of the unitaries that removes the small residual accessible information? This question was studied at the very early days of “modern quantum information” (1994). Reference [15] presents a lower bound of ≈ 0.577 (Euler’s constant) on the accessible information of any ensemble of pure states. It means that even for very big dimension and key size, the accessible information is a constant far from being “small.” In contrast, consider encryption of a quantum or classical message using a classical key (the classical or quantum one-time-pad [16, 17, 18]) which is similar to locking in encoding the message in a basis unknown to the eavesdropper who has possession of the encrypted/encoded state. The key size is at least as big as the message in this case, but the accessible information on the encoded message is exactly 0. The reason for the difference is the following. In locking, we consider the accessible information of the ensemble of all possible (pure) states sent to Bob, each with a label on the message x and one on the basis k. In encryption, the key is not part of the classical label in the ensemble. P In particular, the ensemble consists of mixed states ρx = 1t tk=1 Uk |xihx|Uk† . In Section 2.5, we will discuss locking schemes involving mixed states. 2.4.2. Neccessity of a uniform prior message distribution We remark in passing that if x, k are not uniformly distributed, the various locking scheme described may fail. 2.4.3. The guessing bound [5] When locking with an unknown basis, and more generally, when the message system C is classical, one possible measurement for Bob to perform on ρAC|B (the locked state) is to guess the message and then apply the corresponding optimal measurement. This provides an upper bound (called the guessing bound) on the jump of the classical mutual information in terms of the size of C. Let there be t possible messages, i.e, the size of C is log t bits. Then, the above argument says  1 (8) t Ic (ρA|CB ) − log t ≤ Ic (ρAC|B ) and thus the jump is upper bounded as Ic (ρA|CB ) − Ic (ρAC|B ) ≤ log t + (t − 1)Ic (ρAC|B ) .

30

(9)

2.4.4. The continuity bound [5] Finally, the amount of initial correlation also limits the extent of the locking effect. In particular, an initially uncorrelated state does not exhibit locking. A continuity argument can be used to bound the extent of locking as a function of the initial (locked) mutual information. This holds for the most general communication structure (quantum, two-way). More precisely, let ρ be a bipartite state on C d ⊗ C d and ρ′ be obtained 1 1 from ρ by l qubits of two-way communication. If Ic (ρ) ≤ 6 ln 2 (d+1)2 , then, p p Ic (ρ′ ) − Ic (ρ) ≤ 2l − (2d)2 (2 ln 2)Ic(ρ) log (2 ln 2)Ic(ρ) .

(10)

Note that the bound grows weak rapidly with the dimension d. 2.5. Mixed state locking schemes We have seen in Section 2.4.1 that pure state schemes cannot lock accessible information to below Euler’s constant. However, if we use mixed states for encoding the message, the accessible information can be made vanishing while the key remains negligibly-sized compared to the amount of mutual information it can unlock. 2.5.1. Locking only the message but not the key [19] In some applications, the secrecy of the key is not of concern. In this case, we can consider a locking scheme given by X 1 |xihx|A ⊗ |kihk|C ⊗ (Uk |xihx|Uk† )B . (11) ρACB = dt x,k

Ic (ρAC|B ) is simply the accessible information on the uniform ensemble of states ηx = † 1P k Uk |xihx|Uk . t

The analysis is similar to (and easier than) that in Section 2.2 and Eq. (4) is replaced by Ic (ρAC|B ) ≤ log d − min H({qx }) |φi

(12)

P where qx = hφ| 1t k Uk |xihx|Uk† |φi The Lipschitz constant of H({qx }) is upper bounded by  1 2 and the expectation is log d. A standard randomized argument then shows t 4 + 8(log d) that Ic (ρAC|B ) → 0 for t ≈ polyloglog(d).

An interesting fact to note is the following. We can denote the three random variables: the (a) message, (b) Bob’s measurement outcome, and (c) the basis by X, Y , and K respectively. Somewhat surprisingly, I(X:Y K) = I(XK:Y ), because they are both equal to I(X:Y |K) – this can be proved by using the chain rule and noting that I(X:K) = I(Y :K) = 0. Thus, the lower bound of Euler’s constant on I(XK:Y ) also applies to I(X:Y K) = I(X:Y ) + I(X:K|Y ). But the mixed state locking scheme makes the locked mutual information I(X:Y ) vanish and thus the I(X:K|Y ) term is responsible for the lower bound, and it is the correlation created between X and K (initially uncorrelated) due to knowing Y . This is what Bob learns in the pure state locking scheme! 2.5.2. Symmetric locking [20] Consider a computation basis |xki in Cd ⊗ Ct . Choose an arbitrary unitary on Cd ⊗ Ct according to the Haar measure, and fix it for the rest of the discussion. Apply the unitary to the message |xki and trace out the second t-dimensional quantum system. The remaining state has negligible accessible information when t ≈ polylog(d). 31

2.6. Consequences and Implications The fact accessible information can increase abruptly with a little extra classical information means that it is not a suitable security measure for some applications. For example, it has been used as a measure of security of QKD. The locking effect has motivated research in better security measure of QKD (universally composable security definition). More detail and references will be included in the conference presentation and the final version. 3. Locking other quantities These are omitted for this proceeding version due to time and space limitations. The schemes for locking accessible information (with unknown basis) can be applied to locking accssible information of Eve (the environment) which makes the entanglement of formation EF nearly maximal (by using the construction in [21] but the random basis locking scheme). Giving the environment the key (by discarding) changes EF to nearly zero. The same holds for the entanglement costs EC which is just EF in that example. A very different method but similar result for EF is given in [22] (but whether EF = EC is unknown in this case). These results will be discussed in the conference presentation and also in the final version. 3.1. Acknowledgments My deepest thanks to friends and colleagues whose research and discussion in this area have been a source of profound excitment and inspiration to me. Risking some careless omissions, this list includes Andris Ambainis, Manuel Ballester, Harry Buhrman, Matthias Christandl, David DiVincenzo, Frederic Dupuis, Aram Harrow, Patrick Hayden, Karol, Michal, and Pawel Horodecci, Robert Koenig, Hoi-Kwong Lo, Jonathan Oppenheim, Renato Renner, Graeme Smith, John Smolin, Barbara Terhal, Steph Wehner, Andreas Winter, and Jan whose last name I still have to learn. References [1] B. M. Terhal, M. Horodecki, D. W. Leung, and D. P. DiVincenzo, “The entanglement of purification,” J. Math. Phys., vol. 43, pp. 4286–98, 2002, arXiv e-print quant-ph/0202044. [2] T. Cover and J. Thomas, Elements of Information Theory. Wiley, New York, 1991. [3] A. Peres, Quantum Theory: Concepts and Methods. Dordrecht: Kluwer Academic, 1993. [4] When ρ is classical (pure), the measurement along the local product basis (Schmidt basis) is optimal by the data processing inequality [2] and the fact other measurement outcomes are obtainable by local processing of the optimal one. [5] D. DiVincenzo, M. Horodecki, D. Leung, J. Smolin, and B. Terhal, “Locking classical correlation in quantum states,” Phys. Rev. Lett., vol. 92, p. 067902, 2004. [6] H. Maassen and J. Uffink, “Generalized entropic uncertainty relations,” Phys. Rev. Lett., vol. 60, pp. 1103– 1106, 1988. [7] D. DiVincenzo, D. Leung, and B. Terhal, “Quantum data hiding,” IEEE Trans. Info. Theory, vol. 48, p. 580, 2001, quant-ph/0103098. [8] J. Schwinger, “Unitary operator bases,” Proc. Nat. Acad. Sci. USA, vol. 46, no. 4, pp. 570–579, 1960. [9] W. Wootters and B. Fields, “Geometrical description of quantal state determination,” Annals of Physics, vol. 191, pp. 368–381, 1989. [10] J. Sanchez, “Entropy uncertainty and certainty relations for complementary observables,” Physics Letters A, vol. 173, pp. 233–39, 1993. [11] M. Ballester and S. Wehner, “Entropic uncertainty relations and locking: tight bounds for mutually unbiased bsaes,” Phys. Rev. A, vol. 73, p. 022110, 2006. [12] S. Bandyopadhyay, P. Boykin, V. Roychowdhury, and F. Vatan, “A new proof for the existence of mutually unbiased bases,” 2001, quant-ph/0103162. [13] H. Buhrman, M. 
Christandl, P. Hayden, H.-K. Lo, and S. Wehner, “Possibility, impossibility and cheatsensitivity of quantum bit string commitment,” 2005, qquant-ph/0504078v1, Section 5.3 and Appendix.

32

[14] P. Hayden, D. W. Leung, P. W. Shor, and A. Winter, “Randomizing quantum states: Constructions and applications,” Commun. Math. Phys., vol. 250, pp. 371–391, 2004, arXiv:quant-ph/0307104. [15] R. Jozsa, D. Robb, and W. Wootters, “A lower bound for accessible information in quantum mechanics,” Phys. Rev. A, vol. 49, p. 668, 1994. [16] C. E. Shannon, “Communication theory of secrecy systems,” Bell Sys. Tech. J., vol. 28, pp. 656–715, 1949. [17] A. Ambainis, M. Mosca, A. Tapp, and R. de Wolf, “Private quantum channels,” in IEEE Symposium on Foundations of Computer Science (FOCS), 2000, pp. 547–553, LANL e-print quant-ph/0003101. [18] P. O. Boykin and V. Roychowdhury, “Optimal encryption of quantum bits,” LANL e-print quant-ph/0003059. [19] “Locking only the message but not the key,” the possibility to bring the accessible information on x (but not on k) to a vanishing amount was brought up during a Bellairs Workshop on Randomization Techniques in March 2005, between P. Hayden, D. Leung, G. Smith, and A. Winter. A photograph of the blackboard on which an upper bound of the Lipschitz constant was derived remains the only write-up. [20] F. Dupuis, P. Hayden, D. Leung, and Jan, “Symmetric locking,” in preparation. [21] K. Horodecki, M. Horodecki, sP. Horodecki, and J. Oppenheim, “Locking entanglement measures with a single qubit,” arXiv:quant-ph/0404096. [22] P. Hayden, D. W. Leung, and A. Winter, “Aspects of generic entanglement,” Commun. Math. Phys., vol. 265, pp. 95–117, 2006, arXiv:quant-ph/0407049.

33

34

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Spin Chain under Next Nearest Neighbor Interaction L.C. Kwek1−3 , Y. Takahashi1 and K.W. Choo

2

1

Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543 2 National Institute of Education, Nanyang Technological University, 1 Nanyang Walk, Singapore 637616 3 Institute of Advanced Studies (IAS), Nanyang Technological University, 60 Nanyang View, Singapore 639673 E-mail: [email protected] Abstract. We perform a numerical study on the concurrence of the ground state Heisenberg XXX with next-nearest-neighbor interaction with open and periodic boundary condition.

1. Introduction Despite its simplicity, one-dimensional spin chains have always provided profound insights into numerous complex many-body problems. Moreover, there exists quasi one-dimensional systems that provides invaluable test-beds for many theoretical predictions, like spin-Peierls transition and magnetic susceptibility. Examples of these quasi one-dimensional systems are simple inorganic compounds like SrCu2O3 , VO2 P2 O7 , and CuGeO3 and organic compounds like TTF-CuS4 C4 (CF3 )4 [1]. One particularly successful model is the Heisenberg magnet described by a nearest neighbor Hamiltonian of the form H=J

N 

i+1 i · S S

(1)

i=1

i represents spin-1/2 operators along a chain of N sites and periodic boundary condition where S is assumed for a closed chain. Historically, the Heisenberg magnet or XXX model has provided a solid background and a strong motivation for many elegant studies into the algebraic structures and symmetry of the model. The model, together with the anisotropic XXZ model, forms the fundamental basis for the elegant mathematical structure of a quantum group. It has also been shown that there exists a rich algebraic structure called a Yangian [2] which provides the underlying symmetry for this model. Another interesting model is the XY model described by a Hamiltonian of the form H=

N 

y x Jx Six · Si+1 + Jy Siy · Si+1 ,

(2)

i=1

in which the z-component of the spin-spin interaction is missing. Indeed, under the HeitlerLondon approach, it is possible to describe the Coulomb interaction subject to Pauli Exclusion principle for two quantum dots as an XY model.

35

The study of cooperative behavior in magnets have also led to a prevalent interest in competing (frustrated) systems. Frustrated systems possess a high degeneracy of low energy states. Indeed, in the context of magnetic systems, geometrical frustration often leads to new states such as spin glass and spin liquid. In fact, a simple and natural way to incorporate frustration into a spin chain is to consider next nearest neighbor interaction of the form H=

N 

 i,i+1 + J2 H  i,i+2 J1 H

(3)

i=1

 could be a Heisenberg magnet or an XY model. where H In recent years, there has been increasing interest in observing entanglement in spin chains. Entanglement is perhaps the one of the most striking property of quantum mechanics. Entangled systems have been shown to be extremely useful as a resource for quantum information processing such as quantum teleportation [3] and superdense coding [4]. In this regard, entanglement in the ground state and thermal state of quantum spin chains, like the Heisenberg magnets, XY, XXZ models, has been extensively studied in many works[10, 11, 12]. In particular, it has been shown that spin chains can act as a quantum wire to transfer a quantum state [5] from one site to another. Recently, in addition to spin chains with only nearest neighbor interactions, a number of studies have been performed on spin chains with next-nearest-neighbor(NNN) interactions[6]. Also, extension to periodic spin-one Heisenberg model [7] has been studied, including an analytic investigation into 3-qubit next-nearest-neighbor model [8]. Previous studies regarding the effect of next-nearest-neighbor interaction focused on Heisenberg XXX antiferromagnetic next nearest neighbor interacion with periodic boundary condition(PBC). For finite chains, there is a distinct qualitative difference between open and closed chains with periodic boundary conditions. To carry out quantitative study of entanglement, here we introduce a qualititative measure of entanglement called concurrence [9], which is one of the useful quantitative measure of entanglement for spin-1/2 bipartite system. Define ρ˜ for a bipartite density matrix ρ as

ρ˜ = (σ y ⊗ σ y )ρ∗ (σ y ⊗ σ y )

(4)

where asterisk denotes the complex conjugate. Concurrence is given by 

C = max 0,



λ1 −



λ2 −



λ3 −





λ4

(5)

where λ1 ≥ λ2 ≥ λ3 ≥ λ4 is the eigenvalues of ρ˜ρ. Concurrence takes value between 0 ≤ C ≤ 1, C = 0 corresponds to a separable state and C = 1 corresponds to a maximally entangled state. The Hamiltonian for Heisenberg model with next-nearest-neighbor interaction can be written as H =



y x z J1 (σix σi+1 + σiy σi+1 + σiz σi+1 )

i

+



y x z J2 (σix σi+2 + σiy σi+2 + σiz σi+2 )

(6)

i

where σx , σy , σz are the Pauli matrices and J1 , J2 are coupling constants. It is hard to solve the eigenvalue problem of this Hamiltonian in general and spectrums have not been obtained yet, except for the particular value of J1 and J2 . For Heisenberg XXX model with periodic

36

boundary condition, when 2J1 = J2 , the model reduces to so-called Majumdar-Ghosh model and the ground state is known to be a superposition of state |φ1  and |φ2 . |φ1  = [1, 2][3, 4] · · · [N − 1, N ] |φ2  = [N, 1][2, 3] · · · [N − 2, N − 1]

(7)

where [i, j] denotes singlet 1 [i, j] = √ (|0i |1j − |1i |0j ) 2

(8)

at ith and jth site. The J1 − J2 model has been investigated theoretically over the decades. For arbitrary values of J1 and with J2 > 0, it is a frustrated (competing) system. As shown in Fig. 1, when J1 > 0 and J2 > 0 (AF-AF), the ground state is a spin liquid. The increase of the ratio of the coupling constants α ≡ J2 /J1 induces an infinite-order phase transition from a gapless state to a gapful dimerized state. The critical point αc is numerically estimated to be α ≈ 0.241. When α is further increased to the so-called Majumdar-Ghosh (MG) point at α = 1/2, the ground state is the products of singlet pairs formed by nearest neighboring spins.7,8 It is two-fold degenerate since the Z2 symmetry of translations by one site is spontaneously broken. When J1 < 0 and J2 > 0 (F-AF) with −1/4 ≤ α ≤ 0, the ground state is fully ferromagnetic (FM), and becomes an S = 0 incommensurate state for α > −1/4. It is suggested that in this incommensurate state the gap is strongly suppressed.10 For α = −1/4, the exact ground state can be shown11 to have a N + 2-fold degeneracy, comprised by S =0 and S=N/2 states (with N the lattice size). When J1 > 0 and J2 > 0 (AF-F), the system is believed to be in a gapless antiferromagnetic phase for any permissible values of J1 and J2 . In this article, we perform numerical analysis on entanglement property in Heisenberg XXX with next-nearest-neighbor interaction employing concurrence measure. Specifically, numerical simulation of concurrence betwenn nearest-neighbor sites and next-nearest-neighbor sites is carried out up to N = 11. In Sec.2, we present detailed numerical results on Heisenberg XXX model with next-nearest-neighbor interaction. Finally, in Sec.3, we present some discussion and provide our conclusion to the results. 2. Heisenberg XXX chain Hamiltonian for antiferromagnetic nearest-neighbor interaction is given as H = J1

N 

y x z (σix σi+1 + σiy σi+1 + σiz σi+1 )

i=1

+ J2

N 

y x z (σix σi+2 + σiy σi+2 + σiz σi+2 )

(9)

i=1

2.1. Periodic Boundary Condition 2.1.1. Antiferromagnetic nearest-neighbor interaction(J1 = 1) Results are plotted in Fig.2(even number of sites) and Fig.3(odd number of sites). We observe that NNN interactions in general suppresses NN entanglement both in J2 > 0 and J2 < 0, since NN entalglement takes its maximum around J2 = 0. However, it is possible to induce entanglement between NNN sites by AFM NNN interaction(J2 > 0). Note however that for N = 6, NNN entanglement is absent in all value of J2 . These results have already been observed in Ref.[6].

37

gapful

J2 aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa ferromagnetic aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa ground state aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa a= -0.25 aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa

a = 0.5 (M-J point) a = 0.241

J1 gapless gapless antiferromagnetic phase

Figure 1. Phase diagram for the Heisenberg magnet with next-nearest neighbor coupling Plot of C12 and C13 Versus J2 for J1= 1, N=4, 6, 8, 10 0.7 N=4 N=6 N=8 N=10

0.6 0.5 C12

0.4 0.3 0.2 0.1 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J2 1 N=4 N=6 N=8 N=10

0.8

C13

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J2

Figure 2. XXX model(J1 = 1) with PBC for N:even 2.1.2. Ferromagnetic nearest-neighbor interaction(J1 = −1) Results are plotted in Fig.4(even number of sites) and Fig.5(odd number of sites). Note that NN entanglement is absent for all

38

Plot of C

12

and C

13

Versus J for J = 1, N=5, 7, 9, 11. 2

1

0.4 N=5 N=7 N=9 N=11

C12

0.3

0.2

0.1

0 −1.5

−1

−0.5

0

0.5

1

1.5

2

J

2

0.3

C

13

0.25 0.2

0.15 N=5 N=7 N=9 N=11

0.1 0.05 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

J2

Figure 3. XXX model(J1 = 1) with PBC for N:odd value of J2 . In fact, the absence of NN entanglement in usual periodic Heisenberg XXX model has been proved analytically [13]. Also, NNN entanglement for N = 6 is absent for all value of J2 . We see that NNN interaction does not contribute to enhancement of entanglement between NN pairs. Also, we observe that although the qualitataive behavior of entanglement is similar to the case of J1 = 1, the transition from non-entangled state to entangled state occurs at lower value of J2 for J1 = −1. 2.2. Open Boundary Condition 2.2.1. Antiferromagnetic nearest-neighbor interaction(J1 = 1) Results are plotted in Fig.6(C12 and C13 , even number of sites), Fig.7(C12 and C13 , odd number of sites), Fig.8(C23 and C24 , even number of sites), Fig.9(C23 and C24 , odd number of sites), Fig.10(C34 and C35 , even number of sites), and Fig.11(C34 and C35 , even number of sites). We find that in OBC, entanglement between certain pairs of NN can be enhanced by NNN interaction, which significantly differs from PBC case where all the NN pair entanglement are suppressed by NNN interaction. Also, NNN entanglement can be induced only when J2 > 0, which is identical to PBC case. Moreover, we observe some interesting peaks in the NNN entanglement(C35 ) for odd number of sites(N = 7, 9, 11), which can be found in Fig.11. In addition, it can be seen that C12 tends to have higher concurrence compared to C23 and C34 . We deem that this is because of the unique property of entanglement called monogamy. Since the sites in the middle of the chain have more sites to interact compared to the sites at the end, they tend to have stronger correlation with other parties. Note that for even number of sites, all the neighboring pairs such as (1, 2), (3, 4), · · · , (N − 1, N ) becomes maximally entangled state at J2 = 1/2. This opens up a possibility of quantum teleportation along an arbitrarily long chain by successive Bell state measurement [14].

39

Plot of C

12

and C

13

Versus J for J = −1, N=4, 6, 8, 10 2

1

1 N=4 N=6 N=8 N=10

C12

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J

2

1 N=4 N=6 N=8 N=10

0.8

C13

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J2

Figure 4. XXX model(J1 = −1) with PBC for N:even Plot of C12 and C13 Versus J2 for J1= −1, N=5, 7, 9, 11. 1 N=5 N=7 N=9 N=11

C12

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

J

2

0.3

C13

0.25 0.2

0.15 N=5 N=7 N=9 N=11

0.1 0.05 0 −1.5

−1

−0.5

0

0.5

1

1.5

J2

Figure 5. XXX model(J1 = −1) with PBC for N:odd

40

2

Plot of C12 and C13 Versus J2 for J1= 1, N=4, 6, 8, 10 1 N=4 N=6 N=8 N=10

0.8

C12

0.6 0.4 0.2 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

1

1.5

2

J2 1 N=4 N=6 N=8 N=10

0.8

C13

0.6 0.4 0.2 0 −1.5

−1

−0.5

0

0.5 J2

Figure 6. C12 and C13 for XXX model(J1 = 1) with OBC(N:even) Plot of C12 and C13 Versus J2 for J1= 1, N=5, 7, 9, 11. 1 N=5 N=7 N=9 N=11

0.8

C12

0.6 0.4 0.2 0 −1.5

−1

−0.5

0

0.5 J

1

1.5

2

1

1.5

2

2

1 N=5 N=7 N=9 N=11

0.8

C13

0.6 0.4 0.2 0 −1.5

−1

−0.5

0

0.5 J2

Figure 7. C12 and C13 in for XXX model(J1 = 1) with OBC(N:odd)

41

Plot of C

23

and C

24

Versus J for J = 1 , N=4, 6, 8, 10 2

1

2

0.35 N=4 N=6 N=8 N=10

0.3

C

23

0.25 0.2

0.15 0.1 0.05 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

J

2

0.8

C24

0.6 0.4

N=4 N=6 N=8 N=10

0.2 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

J2

Figure 8. C23 and C24 for XXX model(J1 = 1) with OBC(N:even) Plot of C23 and C24 Versus J2 for J1= 12, N=5, 7, 9, 11. 0.35 N=5 N=7 N=9 N=11

0.3

C23

0.25 0.2

0.15 0.1 0.05 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J

2

N=5 N=7 N=9 N=11

0.8

C24

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J2

Figure 9. C23 and C24 for XXX model(J1 = 1) with OBC(N:odd)

42

Plot of C34 and C35 Versus J2 for J1= 12, N=6, 8, 10 1 N=6 N=8 N=10

0.8

C34

0.6 0.4 0.2 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J2

N=6 N=8 N=10

0.5 0.4 C35

0.3 0.2 0.1 0 −1.5

−1

−0.5

0 J2

Figure 10. C34 and C35 for XXX model(J1 = 1) with OBC(N:even) Plot of C34 and C35 Versus J2 for J1= 12, N=5, 7, 9, 11. 0.8 N=5 N=7 N=9 N=11

C34

0.6

0.4

0.2

0 −1.5

−1

−0.5

0 J

0.6

0.5

1

1.5

2

0.5

1

1.5

2

2

N=5 N=7 N=9 N=11

0.5 C35

0.4 0.3 0.2 0.1 0 −1.5

−1

−0.5

0 J2

Figure 11. C34 and C35 for XXX model(J1 = 1) with OBC(N:odd)

43

2.2.2. Ferromagnetic nearest-neighbor interaction(J1 = −1) Results are plotted in Fig.12(C12 and C13 , even number of sites), Fig.13(C12 and C13 , odd number of sites), Fig.14(C23 and C24 , even number of sites), Fig.15(C23 and C24 , odd number of sites), Fig.16(C34 and C35 , even number of sites), and Fig.17(C34 and C35 , even number of sites). We see that as in the case of PBC, entanglement in NN pair is absent in the whole region of J2 if NN interaction is ferromagnetic(J1 < 0). Also, transition from non-entangled state to entangled state occurs at lower value of J2 . We observe an interasting shark peak in C35 for N = 7, however peaks are not observed for N = 9 and N = 11 which are observed in the case of J1 = 1. In addition, absence of entanglement in whole region of J2 is observed in C35 for N = 8. Plot of C

12

and C

13

Versus J for J = −1, N=4, 6, 8, 10 2

1

1 N=4 N=6 N=8 N=10

C12

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J2 1 N=4 N=6 N=8 N=10

0.8

C13

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J

2

Figure 12. C12 and C13 for XXX model(J1 = −1) with OBC(N:even)

3. Conclusion We have carried out comprehensive numerical study on the ground state concurrence of both Heisenberg XXX model and Heisenberg XX model with periodic and open boundary condition. We have found that although next-nearest-neighbor interaction does not enhance entanglement in nearest-neighbor pair in Heisenberg XXX model with periodic boundary condition, for Heisenberg XXX model with open boundary condition, nearest-neighbor entanglement could be enhanced by next-nearest-neighbor interaction. This feature is also present in Heisenberg XX model both in periodic and open boundary condition. Also in Heisenberg XXX model, next-nearest-neighbor entanglement can be induced only by antiferromagnetic(J2 > 0) nextnearest-neighbor interaction regardless of periodic or open boundary condition. In addition, we find that in XXX system although the sign of nearest-neighbor interaction has little effect on the behavior of concurrence, transition from non-entangled state to entangled state occurs at lower value of J2 if nearest-neighbor interaction is ferromagnetic(J1 < 0).

44

Plot of C

12

and C

13

Versus J for J = −1, N=5, 7, 9, 11. 2

1

1 N=5 N=7 N=9 N=11

C12

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J

2

1 N=5 N=7 N=9 N=11

0.8

C13

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J2

Figure 13. C12 and C13 in for XXX model(J1 = −1) with OBC(N:odd) Plot of C23 and C24 Versus J2 for J1= −1, N=4, 6, 8, 10 1 N=4 N=6 N=8 N=10

C23

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

J

2

1 0.8

C24

0.6 0.4

N=4 N=6 N=8 N=10

0.2 0 −1.5

−1

−0.5

0

0.5

1

1.5

2

J2

Figure 14. C23 and C24 for XXX model(J1 = −1) with OBC(N:even)

45

Plot of C

23

and C

24

Versus J for J = −1, N=5, 7, 9, 11. 2

1

1 N=5 N=7 N=9 N=11

C23

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J

2

1 N=5 N=7 N=9 N=11

0.8

C24

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J2

Figure 15. C23 and C24 for XXX model(J1 = −1) with OBC(N:odd) Plot of C34 and C35 Versus J2 for J1= −1, N=6, 8, 10 1 N=6 N=8 N=10

C34

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J

2

0.6 N=6 N=8 N=10

0.5

C35

0.4 0.3 0.2 0.1 0 −1.5

−1

−0.5

0 J2

Figure 16. C34 and C35 for XXX model(J1 = −1) with OBC(N:even)

46

Plot of C

34

and C

35

Versus J for J = −1, N=5, 7, 9, 11. 2

1

1 N=5 N=7 N=9 N=11

C34

0.5

0

−0.5

−1 −1.5

−1

−0.5

0

0.5

1

1.5

2

0.5

1

1.5

2

J

2

0.8

N=5 N=7 N=9 N=11

C35

0.6 0.4 0.2 0 −1.5

−1

−0.5

0 J2

Figure 17. C34 and C35 for XXX model(J1 = −1) with OBC(N:odd) 4. Acknowledgment KLC would like to acknowledge financial support provided under the A*Star Grant No. 012-1040040. This work is also supported by the National Research Foundation Ministry of Education, Singapore [1] [2] V. G. Drinfeld, Quantum groups, Proceedings of the International Congress of Mathematicians, Berkeley, USA, 798 (1987). [3] C.H.Bennett et.al., Phys. Rev. Lett. 70, 1985 (1993) [4] C.H.Bennett and S.J.Wiesner, Phys. Rev. Lett. 69, 2881 (1992) [5] Sougato Bose, Phys. Rev. Lett. 91, 207901 (2003); quant-ph/0212041 [6] Shi-Jian Gu, Haibin Li, You-Quan Li, and Hai-Qin Lin, Phys. Rev. A 70, 052302 (2004); quant-ph/0403026 [7] Zhe Sun, XiaoGuang Wang and You-Quan Li, New Journal of Physics 7, 83 (2005) [8] R.Liu, M.-L. Liang, and B.Yuan, Eur. Phys. J. D 41 571-578 (2007) [9] W.K.Wootters, Phys. Rev. Lett. 80, 2245 (1998) [10] M.C.Arnesen, S.Bose and V.Vedral, Phys. Rev. Lett. 87, 017901(2001); quant-ph/0009060 [11] K.M.O’Connor, W.K.Wootters, Phys. Rev. A 63, 052302 (2001) [12] X.Wang, Phys. Rev. A 64, 012313(2001) [13] Xiaoguang Wang and Paolo Zanardi, Phys. Lett. A 301, 1 (2002); quant-ph/0202108 [14] J.P.Barjaktarevic, R.H.Mckenzie, J.Links, and G.J.Milburn, Phys. Rev. Lett. 95, 230501 (2005)

47

48

      

         

      !

               

                                       !     "#$%"  &   '(!   )   *    +   ,% -  . /     $$0 &     1      

       2      3            2 /        2 /    45''     6       7            2     /     8 9  /          2       2          /        3 1         2                  2         

4  2     68 *     :    2         5''              7           2     2       

8   :  /             +  /                8 - / /  :            /    472 6 /         + /     7     

  8   :           2       / 2                2   :2  2          

  /   :  8

 

                                                           ! "                                    #                         $                       

                                                 %          

                                   $               &         '                 '            49

                                   #                                         () *+          "     , -            

  "     ,  -                 .  - , , , ,            /         012   33    .            4  '           .  -   (. ).  .  + #    $     .    -   .  

4     .               %  . .   .  # .   .       .      5            012.                '               .   .    4          012                              $         3      6                         

                #                                           3                     (7+              3            '               4  "          %                  3    & ) )   *  8  8  )    *  *  )  *    *  ) *  8   *   8 )  *  )             9 (7+ 6       :      5                         #                    4            "                              $  4                                     "                        

                

          '             #        %             ;   <                                         ¼

¼

¼





¼

¼

¼



¼



¼



 

¼

 



 

¼









¼

¼

 







¼

¼





¼

¼



¼

¼

¼











 

¼

¼

¼

¼

¼



¼

¼

¼



¼

¼





¼

¼

¼

¼

¼

 



¼

¼



¼





¼

¼

¼











 

 













 

 











 

 

 

 

¼

 

 

50





















¼

4                %     

  &                

                       (=+ !                  "           %        3)*   #                    

                  

   ! "  %            3           3    (>+    %                     4               "               &                     :   4        . - 88 88 * . - )* ? ? ? ? ?  7  *   8 )  4    %                       $   4 %          $                4        

     (7+   $              4    '   # 6 *                 %  $ # 6 7      '            # 6 =   %                    6 >                 6  @  A    '    6  B  '   







 





 



 



 

 

 

 

    

4             (@+      .  .   3     $  C *  7 4                             4                                        

    .     6           012       . .  - . .  - 8 =      - *          - 8 )    $    . !  - 8 )   . .              .   %               $"                     '           ?            4   '                      - " (  ?   + #                       -   % (B+          !                  :         88 ))      8) )8 #     .      





























 







 

















51

       )D*          $

               88  *  88 ? ))     - )* 4  3%      $        )*      $   ) )   (E+ 4        

  - ) (=)  + >  8  )* 

















 





 

 



























 



      

#              -       #           .  .      

   "       88 8) )8 ))  ) 8 8 8 ) 8 8 )  8 8 8 8  . - )  8 ) ) 8 

. -  @ 8 8 8 8 = 8 ) ) 8  ) 8 8 ) 8 8 8 8 4              F C"              &     G     A     &    )   G         &     G          &              .             012                  .   (H+            4   '     "      *            $   '          *  ?          4 012    '     .    ' 3 :                "      8 8   8   8 

B 8   8  8 8  #      $       8    

E 4    (. . + -   (. . + - 8     ) 8 8 ) .  -   88 )) )) 88 

H ) 8 8 ) 4   (. . + - 8       .        8 8 8 8 .  -  88 8 8 88 

)8 8 8 8    

















  











































 









































52











4     

)   8 8  .  -  88 )   )   88  ))  8 8 )      ) *   ) * )   4       - (. . + - ?*=

  

  - )  (=)  + )*  8  )* 4       .  - .   . *  . . ? . .  )7 . ) *  . . ? . .  ?)   . . )= . () ?  *+ . . ? . .  )> )  8  )     .      )             #      '            4                                $  012                    8 ) *           $           #   $        $            #                   012 .   . - . .)  . - .) .) /"                       4 012 $  .    . - . .      . - . 9                  012 .      .  - . . !      $    - 8 ) *        - I  #              $     .     :   !  4               !.            . !.    !       . . 4  $    012   .) .) . . . )@

    :  .) !.)   .  !.)   .  !.    .  !.  )B        !           5  :   !   ' '     ! ) 4  A          $                $     

             







































 





































 



 





 



















 





 















  





 









 



 









½ 

½

½

½



½ ¾

½ 

½ ½





½ ¾

½ ¾

¾



½ ¾ ½ ¾

½

½

 ½

½ ¾ ¡¡¡ 

½

½ ¾ ¡¡¡







½





½

½ ¾

½

½ ¾ ¿ 

½ ¾

53



½ ¾ ¿

½ ¾



y 1

R−

1 + ξ0 2

R−

x ?1 -1

0



1

1 + ξ0 2

R+ R+

?1 -1

!  9$        4                                  " .  4    3      012  .   .  -  .     " .   -  " .    5  " .    8  "          " .     8 4     $           "              $   4        & " .    8    " .      8 )E             :          $   ½ ¡¡¡

½ ¡¡¡

½ ¡¡¡

½

 

½ ¡¡¡

 

½ ¡¡¡

½

 

½ ¡¡¡

½

 

½ ¡¡¡

½

 

½

 

½

 

½





½ ¡¡¡





½ ¡¡¡

·½ ¡¡¡

" #             

#               .  8 #  # )*   

       ; <   012  .  . . . $ 88  88 - .     $  8       ' ' :  $   . 9  !.  9   )H    .    . 4       :  .) .)  .) !.) - 8 8 4  $ ! %       - 8 ) #  - I  - 8   .  .  - 8  .        . .   . .    ! - ) )   ) )   - 8

*8 

 











 







 















½





54



 

 





#  - I  - ) .  .  - 8   $.    - 8   - )

*) #       -         - I  - *            !  **  ! -            - *

4      C )>      .  - .    ?  ?              8 5 .        . . 4   $          "        $   .  4   $  " .   ?  ?  ?  =  ) ?   ?     *7     .  !     . - . .    . -  !.  - !   *= " .  - $. (= !   *) ?   !+ - $. (%  ! ? )   + *>  %  ! (* ) ?  +(*! ) ?  +

*@   C *8J**  *=     " .        8  E) ?  $        -              6    8        - 8    $.   8 ! C *8       ! - ) )            ) )    $   &  ! %  ! 8           ! ) #     ' '          &          ) )  .       & ! C *>   " .   $.  )   $. )    8   # )*                "      $      " .     8 5      $ 3       -   

 























 









¦

½

¦

¦

½



¦









 





 









 











¦





¦





 









¦







 







½

¦





½ ¡¡¡



¦

 





¦







½ ¡¡¡







½ ¡¡¡





½ ¡¡¡









½ ¡¡¡





 ½ ¡¡¡   

$ %&     '  !

·½ ¡¡¡

6                 "                                '       8              " .   8         4                    -         "                  4       9 ()8+      

  '    =) ) ()*? =) *B   +  

¦

 





  



½ ¡¡¡

     

 









 





55





















0.05 (sep)

Popt

u

3 γ1 − (1 − 2γ0 ) 4

(LOCC)

Popt

l1

l2 0

0.50

γ0

!              4       K  

  

7) * =     %        -       "           8 )          ? 5                         4       -             























 



      '   8 #  # )*    ! *            $   $   #   $     ?  L  %         8 ?    8         -    ) (*)  + #         8 )        - 8 7= 2 "       - 8 7=       -  * )   * )            4     ! *  

















































 







( ) *



                4            "           "           4          $          4        "    F

                 - -                   ())+ 4     "                               %          4                        '                     "                                  

 

+ ,! 

 

 

 

4   % ! 4%  4 M  / #               4 %     N 6    0  56

 6  N606 O3 3  6  $ 9 *8>=87EH

-   ;< '  & *

= 9 .  >  5 /    0$$   

 ",, ;0< !  >  ? @   5  9 0$$   

 "%$A ; < > 

' - ? @  B ' C  ! (     9   & C  9

 9 . DDD   $A$ ;,< -    -     -   ( DD%   

 "0 D ;"< -    -     -   ( DDA   

  "A, ;#< *  * D%A  

  0"A ;A< >  & C - @   -

  0$$   

 0"AD$ ;%< &     C DD"  

   % ;D<   C DD#   

  , ;$< .   !    B E    !  *  F 4    6 ;< '  E G  E  0$$    $#, $

57

58

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Quantum response to time-dependent external field Seiji Miyashita1,5 , Shu Tanaka2 , Hans De Raedt3 , and Bernard Barbara4 1

Department of Physics, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo 113-0033, Japan 2 Institute for Solid State Physics, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8581, Japan 3 Department of Applied Physics Zernike Institute of Advanced Materials University of Groningen Nijenborgh 4, NL-9747 AG Groningen The Netherlands 4 Laboratoire Louis N´eel, CNRS 25 Ave. des martyrs, BP 166, 38 042 Grenoble Cedex 09, France. 5 CREST, JST, 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan E-mail: [email protected] Abstract. Recently, explicit real time dynamics has been studied in various systems. These quantum mechanical dynamics could provide new recipes in information processing. Here, we study quantum dynamics under time dependent external field. In particular, we compare the thermal and quantum fluctuation in the transverse Ising model. Moreover we study other types of quantum fluctuations, e.g. effects of the Dzyaloshinsky-Moriya interaction in single molecular magnets. Effects of dissipation are also discussed.

1. Introduction Recently, quantum dynamics has attracted interest in the field of information processing, i.e., data propagation by quantum cryptography and quantum computing [1], and quantum annealing method [2]. In order to manipulate the quantum state, we need to understand how the quantum state changes under a change of external field. In general, when we study quantum dynamics, we describe a state of the system by a wavefunction which is generally a superposition of the eigenstates. Therefore, the quantum state covers all the configuration space (Hilbert space) and has an advantage in the data processing. In particular, the quantum annealing makes use of this advantage to find the desired state. Peculiar quantum properties have been found in the so-called single molecular magnets, such as Mn12 [3], Fe8 [4], V15 [5], etc. These systems could be candidates for a storage of information. Here, we will study some properties of quantum fluctuation. 2. Quantum fluctuation 2.1. single spin case First, let us study the quantum fluctuation of a single spin system defined by HTI0 = −Hσ z − Γσ x ,

59

(1)

where σ α denotes the α-th component of the Pauli operator  x

σ =

0 1 1 0



 y

σ =

,

0 −i i 0



 z

σ =

,

1 0 0 −1



.

(2)

Here, we take the z axis as the quantization axis. In the case Γ = 0, the magnetization of the z-component is a good quantum number and no quantum fluctuations exist. The eigenstates are given by σ z |+ = |+,

and

σ z |− = −|−.

(3)

The eigenenergies are E1 = −H and E2 = H, which are depicted by the dotted lines in the Fig. 1(a) and we call them “diabatic state”. When we set an initial state at a negative field (say the point denoted by the open circle in the Fig. 1(a)) and change the field to a positive large field, the state follows the curve and simply comes to the point denoted by the open triangle. Here, the spin state does not change at all. If we add the term of Γ (the transverse field), the energy levels are given by the solid curve. In this case, the infinitesimally slow sweep of the field adiabatically leads the state to the point denoted by the closed circle. Here, the state is always in the ground state, which we call “adiabatic state”. If we sweep the field with a finite speed, then some amount of the state is scattered to the excited state (the open triangle). This process is described by the LandauZener-St¨ uckelberg mechanism [6, 7], where the probability of staying in the ground state is given by   (ΔE)2 , (4) PLZS = 1 − exp −π 4¯hvΔM where ΔE is the energy gap at the avoided level-crossing point (H = 0) and in the present case ΔE = 2Γ, and v is the velocity of the field dH(t)/dt, and ΔM is the difference of the magnetizations of the diabatic state (ΔM = 2). The magnetization processes M (t) for various sweep speeds are depicted in Fig. 1(b) 2.2. Cooperative systems: The transverse Ising model The effects of quantum fluctuation on the ordering phenomena have been studied. The most typical model is given by the transverse Ising model [8] H = −J



σiz σjz − Γ





σix ,

(5)

i

where ij denotes interacting pair sites. The transverse field causes tilt of spins in the classical case, and it causes spins to flip in the quantum case. Thus, this field reduces the correlation function of the z-components. In the case Γ = 0, there are two eigenstates | + + · · · + and | − − · · · − which are degenerate. We may consider these states as classical stable states. In the presence of Γ, the ground state is given by a linear combination of states. This state can be considered as a quantum mixing state of  two classically stable states. Competition between z z the classical order due to the interaction J σi σi+1 and the quantum fluctuation due to  x Γ i σi causes a so-called quantum phase transition in the ground state. First we study the one-dimensional transverse Ising model. In Fig. 2(a), we depict the eigenenergies as functions of H with Γ = 0.5J (N = 6). In the case H = 0, a quantum phase transition takes place at J = Γ. Also at H = 0, the Hamiltonian is expressed in terms of fermion annihilation and creation operators, cq and c†q as H=2

 



Γ2 + 2ΓJ cos q + J 2 c†q cq + const. .

q

60

(6)

E1, E2

1

1 M(t)

H 0 0

−1

0 0

1

−1

t

2

−1

(a)

(b)

Figure 1. (a) The eigenenergies as functions of the field(H). The dotted curves give those for Γ = 0, and the solid curves give eigenenergies for Γ = 0.1. (b) The time dependences of the magnetization M (t) for various values of the sweeping speeds v = 0.005, 0.01, 0.02, 0.03, 0.05 and 0.1. The bold solid curve denotes the adiabatic change. ΔE 3

Δ E03

10 2

5

Δ E02

E/J

0 -5 1

-10 Δ E01 X 1000

-15 -20 -2

-1.5

-1

-0.5

0 0.5 Hz/J

1

1.5

0 0

2

20

40 N

(a)

(b)

Figure 2. (a) Eigenenergies as functions of H of the one-dimensional transverse Ising model with Γ = 0.5J. (b) The energy gaps at H = 0 between the ground state and the first excited state (dots), the second excited state (circles) and the third excited state (crosses). Γ = 0.5J. Using this dispersion relation, we can calculate the energies of low energy states. In Fig. 2(b), we depict the size dependence of the energy gaps between the ground state and the excited states. We find that the energy gap between the ground state and the first excited state ΔE01 becomes exponentially small which represents the symmetry breaking phenomenon below the critical point (Γ < J). On the other hand, the energy gap between the ground state and the second and third excited states, ΔE02 and ΔE03 , respectively, stays almost constant. The time

61

T/J

1 disordered state

0.5 ordered state

0 0

Γ /J

1

Figure 3. Schematic phase diagram of the transverse Ising model in high dimensions (D ≥ 2). evolution of this magnetization under field sweep M (t) = Ψ(t)|



σiz |Ψ(t)

(7)

i

reflects the energy structure. Here, Ψ(t) is the wavefunction in the dynamical process. When we slowly sweep the magnetic field H(t) the magnetization shows a stepwise magnetization process as a sequence of LZS transitions at the avoided level crossing points [9]. If we sweep fast, then the magnetization shows a size-independent shape, which we call “quantum spinodal decomposition” [10]. The process sweeping Γ at H = 0 can be studied exactly [11]. Although, in one dimension, the model has no ordered phase at finite temperature, in higher dimensions, the model has an order-disorder phase transition at a finite temperature for small values of Γ. At Γ = 0 the system is the usual Ising model. Therefore, we have a phase diagram schematically depicted in Fig. 3. When we change parameters along the solid arrow (Γ = 0) quantum fluctuations are absent, and the thermal fluctuation induces the phase transition. On the other hand, when we change parameters along the dotted arrow, (T = 0), the thermal fluctuations are absent, and the quantum fluctuation induces the phase transition. 3. Comparison of thermal and quantum fluctuations First, let us compare the natures of the thermal and quantum fluctuations. In order to describe both fluctuations, it is convenient to express the state by the density matrix ρ=

M 

wij |ij|.

(8)

i=1

In the classical case, the diagonal elements represent the probabilities of the state i, i.e. P (i) = wii . In the quantum system, the off-diagonal elements give the quantum fluctuations. The dynamics of the density matrix is given by the Bloch equation i¯h

∂ ρ = [H, ρ] . ∂t

(9)

In order to find the density matrix for the equilibrium state, quantum Monte Carlo method can be used. Then, the density matrix is expressed by (d + 1)-dimensional configuration using

62

the Suzuki-Trotter decomposition which is a kind of path-integral representation of the density matrix [12]. The equilibrium density matrix of the model is expressed by ρ = e−βH /Z,

Z = Tre−βH .

(10)

In order to study the nature of fluctuation, we study matrix elements of the density matrix which correspond to snapshots of quantum Monte Carlo simulation. β

−M H i|ρ|i = i|ΠM |i. m=1 e

(11)

For large M we may approximate as 

ρ ∝ exp βJ



z σiz σi+1

+ βΓ

i









 exp

σix

i



βJ  z z σ σ exp M i i i+1



βΓ  x σ M i i

 M

.

(12)

Introducing intermediate states {σik = ±1} which are the eigenstates of σiz at the k-th product, the Boltzmann weight is expressed by ρ∝

M



exp

k=1



βJ  k k σ σ σik | exp M i i i+1





βΓ  x σ |σik+1 , M i i

(13)

where the boundary condition σM +1 = σ1 . Using the following relation 



β β 1 x σ|e M ΓS |σ   = A exp − ln(tanh Γ)σσ  , 2 M

(14)

with

2β 1 sinh Γ, (15) 2 M the d-dimensional transverse Ising model is mapped into (d + 1)-dimensional Ising model {σik } with anisotropic couplings A=

β βJ 1 , and Kimaginary time = − ln(tanh Γ). (16) M 2 M If β/M is small, i.e. at a high temperature, Kreal space is small and Kimaginary time is large. Thus, the fluctuation occurs in the real space. On the other hand, at low temperatures, the fluctuation occurs in imaginary axis. In Fig. 4, we depict typical configurations of (a) the thermal and (b) quantum fluctuations in the quantum Monte Carlo simulation. Kreal

space

=

3.1. Reentrant phenomena In classical systems, we have pointed out that frustrated configurations cause non-monotonic developments of ordering. In some systems, the sign of the correlation function changes as a function of the temperature, i.e. an antiferromagnetic correlation appears at high temperatures while a ferromagnetic one at low temperatures. This kind of non-monotonic effect causes a reentrant phase transition where different types of phases appear successively when the temperature changes [13, 14]. There the distribution of density of states (entropy) takes an important role. Similar behavior has been observed in the transverse Ising model with the change of Γ [15]. There, the quantum fluctuation is affected by the frustration as well as the thermal fluctuation. A typical example of reentrant type Γ dependence of the correlation function is depicted in Fig. 5 for a frustrated lattice whose Hamiltonian is given by Hreentrant = J  σ1 σ2 − J

n 

(σ1 + σ2 )sk ,

k=1

with J = 1 and J  = n2 J. 63

σi = ±1,

and si = ±1,

(17)

Transverse−Ising model T=3, G=5 classical fluctiation

Imaginary time

Transverse−Ising model T=0.01, G=5 quantum fluctiation

Imaginary time

40

40

20

20

0

0 0

20

40

0

Real space

20

40

Real space

(a)

(b)

Figure 4. Typical configurations of (a) thermal (T /Γ = 3/5) and (b) quantum fluctuations (T /Γ = 0.002) in the quantum Monte Carlo simulation. The red points denote up spins, and the blue ones down spins.

(a)

(b)

(c)

Figure 5. A typical example of reentrant type of the correlation function. (a) A frustrated lattice. The circles denote the spins σ1 and σ2 , and the triangles denote the decoration spins sk . (b) Correlation function between the spin 1 and 2 as a function of the temperature (n = 4, 8, 12, 16, and 20), and (c) correlation function between the spin 1 and 2 as a function of Γ. (N = 2, 4, 6 and 8). 3.2. Quantum annealing In order to find the ground state of a complicated system H0 , there have been proposed various methods. The so-called annealing method is one of the typical methods for this purpose. In the usual thermal annealing method, the thermal fluctuations produce possible candidates of states for the update. Monte Carlo method provides an ensemble of states for an equilibrium state by making use of a kind of Markov chain (master equation). If we use only local updates of the state, the realization of equilibrium ensemble is often difficult due to the frozen effects due to energy barriers and/or entropy barriers [14]. To avoid the freezing, there have been

64

introduced various techniques such as multi-canonical Monte Carlo method [16], temperature exchange Monte Carlo method [17], etc. These methods employ a wide variety of fluctuations that cause the sampling to be more efficient, and accelerate to converge to the desired ensemble. The Swendsen-Wang algorithm [18] introduces a graphical representation to realize an efficient sampling in which cluster flips performed systematically. Quantum fluctuations have been also used to efficiently search for the ground state, a technique that is called quantum annealing [2]. There the transverse field generates the quantum fluctuations. In the limit Γ = ∞, the state is the ferromagnetic state aligned to the x direction, which is the sum of all state 

|Fx  =

|σi , · · · , σN 

(18)

σ1 =±1,···,σN =±1

is the ground state. When we reduce Γ the ground state changes adiabatically to the ground state of the original system. Generally we believe that there is no level crossing during the process Γ → 0. Thus when we gradually reduce Γ, the ground state moves to the ground state at Γ = 0. This is the idea of the quantum annealing and has been successfully applied to various systems. Recently, the proof of the convergence has been given [19] for the quantum case as well as the classical case [20].i We are attempted to apply the idea of quantum annealing for the cluster classification[21] and also for improvement of the variational Bayes inference[22]. Here let us consider mechanisms to find the ground state. In order to find ground state, the simplest method is the exact diagonalization. If the system has K states we need K 2 memory. In order to reduce the necessary memory several methods have been invented, e.g. the power method, Lanczos method, etc. Then we need the memory proportional to K. In the spin system, K = 2N . Thus, these methods correspond to a full search over all the states. On the other hand, the Monte Carlo method in the classical systems requires only memory of the order N . In the quantum Monte Carlo method N × Nτ , where Nτ is the number of points along the imaginary axis. Thus, both systems has advantage when N is large. As we saw in Fig. 3, beside the thermal annealing (the solid line) and the quantum annealing (the dotted line), there are many other paths to reach the ground state. It would be an interesting problem to find the optimal path starting from the point (T = Γ = ∞). 4. Other types of quantum fluctuation The so-called single molecular magnets such as Mn12 , Fe8 , V15 , etc. have attracted interests because they show a sign of quantum dynamics through the discrete energy level structure. The degree of freedom inside the single molecules would be used as a storage of information. Above we have studied the quantum fluctuation in the transverse Ising model. In this section, we study other types of quantum fluctuations. In general, when an interaction H does not commute with the order parameter M (the magnetization in the above case) [M, H ] = 0

(19)

the magnetization is no longer good quantum number, and we say that the system has the quantum fluctuations. Then we find interesting properties in the dynamics of M under the change of parameters of the Hamiltonian. Here we introduce some of examples. 4.1. Dzyaloshinsky-Moriya interaction in the triangle lattice As the important interaction which does not commute with the magnetization and causes a kind of quantum mixing is the Dzyaloshinsky-Moriya (DM) interaction HDM =



Dij · S i × S j .



65

(20)

−2

E(H)

E(H)

E(H)

1

1

1

0 0

H

2

0 0

−2

2

H

0 0

−2

−1

−1

−1

−2

−2

−2

(a)

(b)

2

H

(c)

Figure 6. Energy structure of an equilateral triangle lattice with DM interaction (Dx = Dz = 0.2J) for (a) θ = 0◦ , (b) θ = 45◦ and (c) θ = 90◦ , where H · D = cos θ. This interaction is characterized the vectors D ij and thus the vectors must be compatible with the symmetry of the lattice. Here, we study the effect of this interaction on a system consisting of three spins making an equilateral triangle. The Hamiltonian is given by H3 = J

3 

S i · S i+1 −

3 

i

D i · S i × S i+1 − H

i

3 

Si,

(21)

i=1

where S 4 = S 1 . Because of the symmetry, it is required that D1z = D2z = D3z and 

D1x D1y





=R

D2x D2y





= R2

D3x D3y



,

(22)

where R is a matrix of the rotation of 120◦ . Here it should be noted that when H is parallel to D, the magnetization along the field commutes with the Hamiltonian and no adiabatic transition takes place (Fig. 6(a)). On the other hand, if they are not parallel, avoided level crossing structures appear. In Fig. 6, we show the energy structures as a function of the field for the case the angle between H and D is 0◦ , 45◦ and 90◦ . Avoided level crossing structures also appear at the crossing of the states with M = 1/2 and M = 3/2. In Fig. 6(b), we have the following interesting characteristic of the adiabatic change of the magnetization. We start from the ground state in a large positive H where the magnetization is almost +3/2. When we reduce the magnetization from it, the state follows the curve drown by a thick curve, and it goes to a level with M = −1/2 at large negative H. If we start from the ground state in a large negative H, it goes to the state of M = 1/2. This indicates that the adiabatic change does not follow the ground state. The double degeneracy of the S = 1/2 states are characterized by the chirality, because the eigenstates of the states for the translation operation are ei2π/3 and ei4π/3 . In the presence of the DM interaction the states with different chirality are not degenerate. The same property holds for other cases except the case of θ = 0. Various interesting magnetization loops in a field cycling appear according to the structures of the adiabatic energy levels [24]. As we saw above, the DM interaction causes energy gaps which strongly depend on the direction of the field. In contrast, it is known that the hyperfine interaction between the electric spins and nuclear spins causes gaps which are independent of the field direction [25]. Recently, coherent quantum dynamics of driven Rabi oscillation was observed in V15 [26].

66

1

0.5

0.5 v=0.1 v=0.2 v=0.4 adiabatic

0 -0.5 -1 -40 -20

0

20

40

M(H)

M(H)

1

v=2×10−3 v=8×10−3 v=2×10−2 isothermal

0 -0.5

60

-1 -10

80

-5

0

H(t)

5

10

15

H(t)

(a)

(b)

Figure 7. (a) Magnetization process for v = 0.1, 0.2, and v = 0.4. (b)Magnetization process for v = 0.001, v = 0.006, and v = 0.01. (From J. Phys. Soc. Jpn. 70 (2001) 3385.) 4.2. Particle conveyance by a trap potential When we consider the motion of particles, the operators of the momentum p and the position x do not commute. Thus, in the process of acceleration of the particle, there occurs various interesting quantum effects[27]. 4.3. Realization of the Nagaoka-ferromagnetism by removal of an electron It is known that the total spin of the Hubbard model is zero in the half-filled bipartite lattices, while it takes the maximum value when an electron is removed. This mechanism is called “Nagaoka ferromagnetism” [28]. We can demonstrate an adiabatic change between these state if we add an extra lattice point and absorb an electron by a strong chemical potential in a magnetic field [29]. 5. Dissipation effect Now, we discuss effects of environments on the adiabatic process, which we call ‘magnetic Foehn effect’[30]. Here, we consider the effects in the simplest model (1) to simulate the phononbottleneck effect found in a magnetic molecule V15 which belongs to this category [5]. We use a quantum master equation[31] for various sweeping velocities. From now on, we set parameters as Γ = 0.5, T = 1.0, and λ = 0.001. In Fig. 7(a), we present the magnetization curves for fast sweeping rates, v = 0.1, 0.2, and 0.4. Here we clearly find that the magnetic plateau decreases when v increases, which is consistent with (4). The increase of the magnetization after the plateau is a process of relaxation to the equilibrium state caused by the dissipation term. The dotted line there denotes the adiabatic curve of the magnetization. In the case of much slow sweeping rates, we again find the magnetic plateau as shown in Fig. 7(b), although the LZS transition probability (4) is almost one in these sweeping rates. Here, the sweeping rates are v = 2 × 10−3 , 8 × 10−3 , and 2 × 10−2 . The dotted line denotes the isothermal curve of the magnetization at the present temperature. We should note that the magnetic plateau in this figure increases when v increases, which is opposite to the fast sweeping case. 6. Summary and Discussion Here we explored properties of quantum fluctuations and possible manipulations of them. Because the nature of quantum fluctuation is different from that of thermal fluctuation, it could play important roles in information processing. In particular, we expect that the nontrivial properties of quantum dynamics would provide new developments of information processing.

67

6.1. Acknowledgments This work was partially supported by a Grant-in-Aid for Scientific Research on Priority Areas “Physics of new quantum phases in superclean materials” (Grant No. 17071011), and also by the Next Generation Super Computer Project, Nanoscience Program of MEXT. We also thank the supercomputer center, Institute for Solid State Physics, University of Tokyo for the use of the facilities. 7. References [1] Nielsen M A and Chuang I L 2000 Quantum Computation and Quantum Information (Cambridge University Press). [2] Kadowaki T and Nishimori H 1998 Phys. Rev. E 58 5355, Das A and Chakrabarti B K 2005 Quantum Annealing And Related Optimization Methods (Springer-Verlag, New York), Santoro G E and Tosatti E 2006 J. Phys. A 39 R393, Das A and Chakrabarti B K 2008 arXiv:0801.2193 to appear in Rev. Mod. Phys. [3] Thomas L, Lionti F, Ballou R, Sessoli R, Gatteschi D and Barbara B 1996 Nature 383 145. Friedman J R, Sarachik M P, Tejada J and Ziolo R 1996 Phys. Rev. Lett 76 3830. [4] Sangregorio C Ohm T, Paulsen C, Sessoli R and Gatteschi D 1997 Phys. Rev. Lett. 78 4645. Wernsdorfer W and Sessoli R 1999 Science 284 233. [5] Chiorescu I, Wernsdorfer W, M¨ uller A, B¨ ogge H and Barbara B 2000 Phys. Phys. Lett. 84 3454. [6] Zener C 1932 Proc. R. Soc. London Ser A137 697, Landau L 1932 Phys. Z. Sowjetunion 2 46, St¨ uckelberg E C G 1932 Helv. Phys. Acta 5, 3207. [7] Miyashita S 1995 J. Phys. Soc. Jpn. 64 3207, Miyashita S 1996 J. Phys. Soc. Jpn. 65 2734. [8] Chakrabati B K, Dutta A and Sen P 1996, Quantum Ising Phase Transitions in Transverse Ising Models (Springer-Verlag, Heiderlberg). [9] De Raedt H, Miyashita S, Saito K, Garcia-Pablos D and Garcia N 1997 Phys. Rev. B 56, 11761. [10] Miyashita S, De Raedt H, and Barbara B, unpublished. [11] Dziarmaga J 2005 Phys. Rev. Lett. 95, 245701. [12] Suzuki M 1976 Prog. Theor. Phys. 56 1454. [13] Miyashita S 1983 Prog. Theor. Phys. 69 714. Kitatani H, Miyashita S and Suzuki M 1986 J. Phys. Soc. Jpn. 55 865. Miyashita S and Vincent E 2001 Eur. Phys. J B 22 203. [14] Tanaka S and Miyashita S 2005 Prog. Theor. Phys. Suppl. 157 34. [15] Tanaka S and Miyashita S unpublished. [16] Berg B A and Neuhaus T 1991 Phys. Lett. B 267 249. Berg B A and Neuhaus T 1992 Phys. Rev. Lett. 68 9. [17] Hukushima K and Nemoto K 1996 J. Phys. Soc. Jpn 65 1604. [18] Swendsen R H and Wang J S 1987 Phys. Rev. Lett. 58 86. Wolff U 1989 Phys. Rev. Lett. 62 361. [19] Morita S and Nishimori H 2006 J. Phys. A 39 13903. Morita S and Nishimori H 2007 J. Phys. Soc. Jpn. 76 064002. [20] Geman S and Geman D 1984 IEEE Trans. Pattern Anal. Mach. Intell. 6 721. [21] Kurihara K, Tanaka S and Miyashita S unpublished. [22] Sato I, Kurihara K, Tanaka S, Nakagawa H, Miyashita S unpublished. [23] Miyashita S and Nagaosa N 2001 Prog. Theor. Phys. 106 533. De Raedt H, Miyashita S, Michielsen K and Machida M 2004 Phys. Rev. B 70 064401. [24] Choi K Y, Matsuda Y H, Nojiri H, Kortz U, Hussain F, Stowe A C, Ramsey C and Dalal N S 2006 Phys. Rev. Lett. 96 107202. [25] Miyashita S, De Raedt H and Michielsen K 2003 Prog. Theor. Phys. 110 889. [26] Bertaina S, Gambarelli S, Mitra T, Tsukerblet B, M¨ uller A and Barbara B 2008 Nature 453 203. [27] Miyashita S 2007 J. Phys. Soc. Jpn 76 104003. [28] Nagaoka Y 1966 Phys. Rev. 147 392. [29] S. Miyashita, unpublished. [30] Saito K and Miyashita S 2001 J. Phys. Soc. Jpn.70 3385. [31] Weidlich W and Haake F 1965 Z. Phys. 
185 30, Kubo R, Toda M and Hashitsume N 1985 Statistical Physics II (Springer-Verlag, New York), Louisell W H 1973 Quantum Statistical Properties of Radiation (Wiley, New York). Saito K, Takesue S and Miyashita S 2000 Phys. Rev. E 61 2397.

68

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Dissipative quantum dynamics Leticia F Cugliandolo Universit´e Pierre et Marie Curie – Paris VI Laboratoire de Physique Th´eorique et Hautes Energies UMR 7589 4, Place Jussieu, 75252 Paris Cedex 05 France E-mail: [email protected] Abstract. This article reviews recent research on dissipative macroscopic quantum bosonic and fermionic systems. These studies address the following issues. (i) The existence of static and dynamic phase transitions; order of the critical lines and type of phases. (ii) The dynamics of systems that are unable to reach equilibrium with their (equilibrium) environment due to their intrinsic slow dynamics or driven by an external drive in the form of a time-dependent magnetic field, coupling to source and drain leads, etc. (iii) The development of an effective temperature that controls the large-scale dynamics and a two-time dependent decoherence phenomenon. (iv) The role played by the environment in the behaviour of the systems.

1. Introduction Interest in the out of equilibrium dynamics of (possibly macroscopic) quantum systems has been recently boosted by a number of experiments and applications. On the experimental side spin-glass phases have been identified in many condensed matter systems at very low temperature. Among them we can cite the bi-layer Kagome system Sr Crs Ga4 O19 [1], the polychlore structure Lix Zn1−x V2 O4 , [2], the dipolar magnet LiHox Y1−x F4 [3] and several high Tc compounds [4]. Quantum glassy phases generated by other physical mechanisms appear also in electronic systems, an example being the Coulomb glass [5, 6], and ‘structural’ glasses as Mylar or BK7 [7]. As it is well-known, glasses are characterised by a extremely slow relaxation that occurs out of equilibrium. The study of out of equilibrium steady states in small quantum systems has a practical interest due to their relevance to nano devices [8]. The external drive can be provided by time-dependent magnetic fields [9, 10] or electric currents flowing through the systems. It is only recently that some authors started to study the effect of a drive on quantum macroscopic systems undergoing phase transitions [11]. A rather satisfactory understanding of the non-equilibrium dynamics of coarsening phenomena, glasses, weakly driven complex liquids and granular matter have been achieved and it is based on the solution of mean-field simple models. This approach leads to a common picture for the dynamics of classical systems in the limit of small entropy production [12]. How much of the classical glassy phenomenology survives at very low temperatures where quantum effects are important is a question that deserves careful theoretical and experimental analysis. The impossibility of simulating the real-time evolution of quantum systems of moderate size enhances the importance of solving simple mean-field or low dimensional models with quantum fluctuations.

69

On the other hand, the rapidly developing field of quantum computation aims at solving hard optimisation problems with the help of quantum mechanics. Typically, hard problems are also glassy ones at least in their typical instances. Indeed, a mapping between dilute spin-glass models and the typical instances of several combinatorial optimisation problems has been established (K-satisfiability problems are related to p-spin glass models [13], etc.). Understanding the metastable state organisation and slow dynamics of quantum disordered systems appears then as a necessary step before attempting to develop an special quantum device to solve them. Concretely, a number of questions we tried to give an answer to are: (i) Find the (static and dynamic) critical surfaces in the T, Γ, g, V parameter space, with T temperature, Γ strength of quantum fluctuations, g coupling strength to the environment and V drive strength. Establish the order of the all phase transitions. Characterise the static and dynamic phases. Determine the conditions under which equilibrium, steady state or glassy phases are reached. (ii) Analyze the real-time dynamics after a quench to the different phases. (iii) An important theoretical result in the field of classical glassy nonequilibrium dynamics is that an ‘effective temperature’, Tef f , controls the low-frequency linear response in the limit of small entropy production (long waiting-time, weak drive) [17, 18]. Tef f is defined as the parameter replacing the environmental temperature, T , in the fluctuation-dissipation relation (fdr) evaluated at low-frequencies and it has the properties of a temperature in the sense that can be measured with a thermometer and controls heat flows and partial equilibration. Whether an effective temperature develops in quantum out of equilibrium problems and which are its consequences is a generic question that we also wished to address. (iv) Study decoherence and localization in the different phases. The effect of dissipation on the phase transition, critical behaviour and ordered phase of macroscopic interacting systems is a relatively recent subject of study. Whether a localization transition [14, 15], known to exist in individual two-level system for sufficiently strong coupling strength to the bath, exists and which are the properties of quantum decoherence in a macroscopic ensemble of interacting two-level systems in contact with an environment is an interesting problem that might be of relevance in several areas of condensed-matter and, also, in establishing limitations to quantum computation (see, for example, [16] for a discussion in the context of Josephson junction realizations of qubits). In short, we summarize here the results presented in [19]-[29]. The organisation of the paper is the following. In Sect. 2 we recall the definition of the models that we studied. Sect. 3 is devoted to a very short review of the analytic and numeric techniques that we used. Section 4 summarizes the main results of our studies. 2. The models We analyzed the statics and dynamics of macroscopic quantum systems coupled to equilibrium and out of equilibrium quantum environments. We mainly studied bosonic models coupled to bosonic (oscillator) baths but we also treated cases in which the system degrees of freedom and/or the environment ones are fermionic. 2.1. The system Disordered quantum spin- 12 models with two-body interactions are defined by HS = −

 ij

Jij σ ˆiz σ ˆjz +

 i

Γi σ ˆix +



hi σ ˆiz .

(1)

i

The spins are represented by Pauli matrices satisfying the SU(2) algebra. i = 1, . . . , N labels the spins in the sample. In finite dimensions, the spins lie on the vertices of a cubic d dimensional lattice and the interaction strengths Jij couple near-neighbours only. The coupling strengths

70

Jij are chosen from a probability distribution, P (J). Mean values over P (J) are indicated with square brackets and the average and variance are defined as [Jij ] = Jo and [Jij2 ] = J 2 /(2c), where Jo and J are order one and c is the connectivity of the lattice. The next-to-last term is a coupling to a local “transverse” field Γi that can be taken from another probability distribution, Φ(Γ). The last term represents the coupling to a longitudinal field that one can include to compute local susceptibilities. Several generalizations that render the model easier to treat analytically, and become especially interesting in the context of optimization problems, have been considered. 2.1.1. Fully-connected limit Allowing each spin to interact with all others in the sample, c → N − 1, leads to the quantum extension of the Sherrington-Kirkpatrick mean-field spinglass. This is the d → ∞ limit in which the model is defined on the complete random graph. The scaling of the variance, [Jij2 ] ≈ J 2 /(2N ) ensures a good thermodynamic limit, N → ∞. 2.1.2. Multi-spin interactions In the fully-connected case one can consider p-spin interactions with Hamiltonian [21, 22, 25, 31, 32] HS = −



Ji1 ...ip σ ˆiz1 . . . σ ˆizp +

i1 ...ip

 i

Γi σ ˆix +



hi σ ˆiz .

(2)

i

p is an integer parameter that may take any integer value p ≥ 3, and the sum runs over all p-uplets. The exchanges are Gaussian random independent variables with variance p!J 2 /(2N p−1 ). This model is particularly interesting since it provides a mean-field description of the structural glass transition and glassy physics that is also intimately related to the modecoupling approach [12]. The p-spin model can also be defined on a dilute graph. The connection with the K-satisfiability optimization problem has been discussed in a number of papers, see, for example, [13]. 2.1.3. Spherical case – a particle in a random potential If, instead of working with spinN 1 σi2  = N in which the σ ˆi play the role of i=1 ˆ 2 variables one considers a ‘spherical limit’ coordinate on an N -dimensional space and one includes a kinetic term K=

N 1  Pˆ 2 , 2M i=1 i

(3)

ˆj ] = with Pˆi the conjugated momentum satisfying the commutation rules [Pˆi , Pˆj ] = 0 and [Pˆi , σ −i¯hδij one obtains the model of a particle moving on an N -dimensional hypersphere under the effects of the random potential (1) [19, 20]. The spherical quadratic model (p = 2) is equivalent to a fully-connected model of rotors in the limit in which the number of components diverges, and to quantum coarsening in d = 3 as described by an O(N ) field theory in the large N limit [29]. 2.1.4. Disordered field-theories Models of elastic manifolds in quenched random potentials are simple extensions of the previous case. For proper choices of the quenched random potential they model a large class of systems including charge density waves, Wigner crystals (upon generalization to a two component vector u) and Luttinger liquids in d = 1 (see [28] and [34] for a list of references).

71

2.1.5. A metallic wire We also studied the dynamics of a metallic wire [26] driven out of equilibrium by a time-dependent magnetic flux and coupled to an electronic reservoir. We modelled the wire with an ideal system of non-interacting spin-less electrons that we described with a one dimensional periodic tight-binding chain with length L = N a (N is the number of sites and a the lattice spacing), hopping matrix element w = W/4 and bandwidth W : HS = −w

N  

e−iφt c†i ci+1 + eiφt c†i+1 ci



.

(4)

i=1

The time-dependent phase φt with φ ≡ Φ/(Φ0 N ) and Φ0 = hc/e accounts for the external magnetic flux that depends linearly on time, ΦM (t) = Φt. 2.1.6. Other cases Of course, many other quantum situations deserve to be explored. A particular interesting case, which is actively being studied experimentally [5, 6], is the case of Coulomb glasses. Numerical [35] and analytic [36, 37] studies of different models intended to describe the physical problem appeared in the literature. 2.2. Environment The dynamics of a classical system coupled to a classical environment are described by different kinds of stochastic evolution rules (Langevin, Glauber, Montecarlo, etc.) depending on the symmetry of the dynamic variables. Quantum mechanically, the interaction of a system with a bath is highly non-trivial. We focused on two types of environments. 2.2.1. Harmonic oscillators in equilibrium A standard model of a (bosonic) bath is an ensemble of independent quantum harmonic oscillators with different masses and frequencies. The coupling between system variables and oscillators is typically chosen to be linear in the coordinates. The oscillator degrees of freedom appear quadratically and can be treated exactly. The interaction leads to dissipation, decoherence, and a localization transition in the case of a single two level system when the coupling is sufficiently strong (see below). 2.2.2. Electronic reservoirs (possibly out of equilibrium) By coupling a system to one or more fermionic reservoirs (whose properties are not affected by the coupling to the system) one also introduces dissipation. When using more than one ‘lead’ at, say, the same temperature but different chemical potential a voltage drop between the leads is induced and a current flows across the system driving it out of equilibrium. In the ‘bosonic’ systems the coupling of the ‘coordinate’ or ‘spin’ to an electronic reservoir can be treated perturbatively as described in detail in [47, 29]. For free-electrons, at this level of the approximation, just the electron Green functions appear and then the environment is fully characterised by its density of states, temperature and chemical potential. At sufficiently low frequency (ω β¯ h) the effect of the fermionic reservoir approaches the one of an Ohmic bosonic bath not necessarily in equilibrium. In the same limit a parameter playing a role similar to a classical temperature, T ∗ , can be identified. Still, as we shall discuss below, T ∗ , cannot be identified with the temperature of the (out of equilibrium) system. In [26] we modelled the lead with a semi-infinite tight-binding chain in equilibrium at temperature T and chemical potential μ and with a semicircular spectral density. 2.3. System-bath coupling The coupling to the environment is modelled by three terms that are added to the Hamiltonian: H = HS + HB + HI + HCT ,

72

(5)

where HB is the Hamiltonian of the bath, HI represents the interaction between the system and the bath and HCT is a counter-term that is added to eliminate an undesired mass renormalization induced by the coupling to the oscillators [30, 19, 20]. In the case of a bosonic bath the full environment is usually taken to be a set of independent quantum harmonic oscillators [19, 20, 28]. For simplicity we considered a bilinear coupling with counterterm given by HI = −

N  i=1

σ ˆiz

˜ N 

cil x ˆl

HCT =

l=1

˜ N  l=1

1 2ml ωl2

N 

2

cil σ ˆiz

.

(6)

i=1

For p = 2 the fully-connected model reduces to a model for metallic spin-glasses [33]. In the problem of the metallic ring [26] the contact term between the lead and the ring was chosen to be HI = −w1α (c†1 cα + c†α c1 ). 3. The techniques The basic techniques used to study classical glassy models with or without quenched disorder are well documented in the literature (the replica trick, the cavity method, scaling arguments and droplet theories, the dynamic functional method used to derive macroscopic equations from the microscopic Langevin dynamics, functional renormalization, Montecarlo and molecular dynamic numerical methods). On the contrary, the methods needed to deal with the statics and dynamics of quantum macroscopic systems are much less known in general. In this Section we briefly mention the tools we used to study the statics and dynamics of the models listed above. 3.1. Matsubara replicas The statics of a quantum model in the canonical ensemble can be analyzed by using the Matsubara approach to express the partition function as a path-integral. The quantum model can be the one of a system or a system+bath in which case the whole ensemble is assumed to be described by the canonical measure at a given temperature. In mean-field cases with quenched randomness one uses the replica approach to average over disorder and Parisi’s replica symmetry breaking prescription to study the ordered and disordered phases (see, e.g., [22, 24, 25] for more details). In the freely relaxing problems a ‘trick’ within the replica method can be used to derive the dynamic phase diagram and some properties of the various phases: the Ansatz of marginal stability (ams). Originally developed for classical systems by Kirkpatrick and Thirumalai, this method was recently used to discuss the low-temperature properties of quantum glasses [22, 38]. Its main advantage is that it uses a formalism that is closely related to the imaginary-time approach to equilibrium quantum statistical mechanics. In Ref. [22] the ams was extensively applied to the quantum spherical p-spin model in the absence of the bath. It was shown that the position of the dynamic transition line predicted by this method coincides precisely with that obtained using the real-time approach [19, 20]. It was also shown that the time dependent correlation function computed using the ams in the absence of the bath is identical to the stationary part of the non-equilibrium correlation function (C > qEA ) when one takes the long-time limit first and the limit in which the coupling to the bath goes to zero next. The marginality condition imposed by the Ansatz is intimately related to the fact that the correlation will further decay from qEA towards zero. (The details of this second decay as, for instance, the two-time scaling are not accessible with this method.) A localized solution with C(τ + tw , tw ) approaching, and never leaving, the plateau at qEA corresponds, in replica terms, to a stable replica symmetry solution. In [24, 25] we extended the ams to study the dynamics of models coupled to an environment.

73

3.2. Schwinger-Keldysh real-time The usual methods of equilibrium quantum statistical mechanics are inappropriate to describe a system that is unable to relax to equilibrium or a system that is permanently driven. We studied the dynamics of the fully-connected models [19, 20, 24, 28, 29] using a method especially designed to treat systems out of equilibrium: the Schwinger-Keldysh real-time approach [30]. The typical initial condition we used is one in which system and bath are uncoupled until t = 0 when they set into contact. The initial density matrix is then ρ = ρS ⊗ ρB with ρB in equilibrium at the working temperature (and chemical potentials in the case of fermions) and ρS representing a situation that is completely uncorrelated with the system Hamiltonian. Thus, the initial time corresponds to a rapid quench to a point in parameter space selected by the external parameters. The analysis of initial conditions that ‘know’ about the equilibrium (or metastable) states of disordered systems is harder and we shall discuss it elsewhere [39]. 3.3. Thouless-Anderson-Palmer method In the classical case, the study of the Thouless-Anderson-Palmer (tap) free energy landscape has been of much help to understand the behaviour of these systems [12]. A tap approach can also be developed for the quantum problem [23]. It helps understanding the change in nature of the transition close to the quantum critical point. 3.4. Cavity method The cavity method is specially suited to solve classical spin models on sparse random graphs (mean-field models with finite connectivity). An extension of this method to deal with quantum spin- 12 models in a transverse field was introduced in [42] using a discrete time Trotter-Suzuki representation of the path-integral and improved in [43] by taking the continuous imaginary-time limit analytically and presenting an explicit replica symmetry Ansatz for ferromagnetic model in a Bethe lattice. 4. The results In this Section we present a short survey of the main results found in the series of papers [19]-[29]. 4.1. Phase transition We start by summarizing the phase transitions found. 4.1.1. Pair interactions (p = 2) In the p = 2 case without a drive (V = 0) there is a static second order critical line in the T, Γ plane separating a disordered and an ordered phase. The former ‘continues’ the classical paramagnet while the latter is an ordered phase. In the real-time dynamics of with a bosonic [40] or fermionic [29] bath in equilibrium, taking the g → 0 limit after the asymptotic long-times limit one finds the same second order critical line (in the sense that the Edwards-Anderson order parameter, qEA ≡ limτ →∞ limtw →∞ C(τ + tw , tw ) vanishes at criticality). The dynamics in the ordered phase are equivalent to 3d coarsening in the quantum O(N ) model in the large N limit [29]. If g remains finite the critical lines depend on g as discussed below. When a drive is applied (V = 0) there is a trully out of equilibrium phase transition between the ‘outer’ quantum nonequilibrium steady state and the ‘inner’ driven quantum coarsening regime. The form of the critical manifold depends on the details of the fermionic baths and it is discussed in [29]. The dynamic transitions are of second order.

74

4.1.2. Multi-spin interactions (p ≥ 3) In the freely relaxing multi-spin models coupled to harmonic oscillator baths the static [31, 32, 21, 22] and dynamic [21, 21] transitions at the critical point (T = 0, Γc ) are of first-order. The first-order line extends at small temperatures and ends in a tricritical point in which the transition becomes second order. Across the first order critical line the susceptibility is discontinuous and shows hysteresis. This is similar to what has been observed in the dipolar-coupled Ising magnet LiHox Y1−x F4 [3]. This behaviour appears to be generic of classical disordered models with random first order transitions at Γ = 0 (models whose statics is solved by a one-step replica symmetry Ansatz). It was shown in [41], using the random energy model as a simple example in this family of models, that at the first order transition the eigenstate suddenly projects onto the unperturbed ground state and the gap between the lowest states is exponentially small in the system size. This sets a clear limit into the performance of quantum annealing procedures to find the ground state of this class of systems. 4.2. Dynamics Let us first discuss the freely relaxing (undriven) dynamics in the macroscopic disordered spin models [19, 20, 24] (see also [40, 46, 47] for similar studies of different models). In the disordered phase the dynamics are fast and occur in equilibrium. The auto correlation and the linear response functions are invariant under time-translations. They both show oscillations, as is typical of a quantum problem. The frequency of the oscillations depends on Γ [and the characteristics of the bath (g, s), see below]. Correlations and responses are linked by the quantum fluctuation dissipation theorem (fdt). At high temperatures and after a short transient there is a decoherence effect and the dynamics become totally classical. For example, responses and correlations are related by the classical fdt. In the ordered phase the dynamics show typical glassy features. There is separation of time scales controled by tw : for short t − tw with respect to tw the relaxation is stationary and the quantum fdt holds. For longer t − tw (comparable to tw ) the waiting-time dependence remains explicit and the relaxation is the slower the longer tw . The quantum fdt is not verified but, instead, one observes that a classical one, with an effective temperature Tef f > 0 holds. The relaxation of the symmetric correlation and linear response show oscillations in the stationary regime but they are monotonic in the aging regime. In all respects, the relaxation in the aging (t/tw finite) regime looks classical at an effective temperature Tef f . The effect of a drive on the p = 2 spherical model introduced in the form of a potential drop between the two fermionic reservoirs was analysed in [29]. The current flowing through the system drives it out of equilibrium in the full phase diagram. In the ‘outer’ phase a steady state is reached but the correlation and linear response do not satisfy the quantum fdt. In the ‘inner’ phase the system tends to order in the sense that one identifies a coarsening phenomenon as in the freely relaxing case with aging properties. Note the difference with the classical sheared problem [45] in which the external drive kills the aging relaxation (and renders the dynamics stationary) at all finite bath temperatures (T > 0). The effective temperature linking correlations and linear responses in the coarsening phase diverges, as in the classical limit. 
Moreover, in the p = 2 driven problem we found an extension of the irrelevance of T in classical ferromagnetic coarsening (T = 0 ‘fixed-point’ scenario): after a suitable normalization of the observables that takes into account all microscopic fluctuations (e.g. qEA ) the scaling functions are independent of all parameters including V and Γ. We expect this result to hold in all instances with the same type of ordered phase, say ferromagnetic, and a long-time aging dynamics dominated by the slow motion of large domains. Thus, a large class of coarsening systems (classical, quantum, pure and disordered) should be characterized by the same scaling functions. The effect of a drive on the p ≥ 3 cases and, in particular, its effect on the (first-order) phase

75

transition lines has not been studied yet. We plan to explore this problem in the future. 4.3. Coupling to the bath: localization The coupling of quantum two-level systems (tls) to a dissipative environment has decisive effects on their dynamical properties. The dilute case, in which interactions between the tls can be neglected, has been extensively investigated in the literature [14, 15, 44]. Under certain circumstances the model can be mapped onto the 1d Ising model with inverse squared interactions, the anisotropic Kondo model, or the resonant model [44]. Three different cases exist depending on the value of g: in the Ohmic case, at zero temperature, there is a phase transition at g = 1 [15]. For g < 1 there is tunneling and two distinct regimes develop. If g < 1/2 the system relaxes with damped coherent oscillations; in the intermediate region 1/2 < g < 1 the system relaxes incoherently. For g > 1 quantum tunneling is suppressed and ˆ σz  = 0 signalling that the system remains localized in the state in which it was prepared. These results also hold for sub-Ohmic baths while weakly damped oscillations persist for super-Ohmic baths. At finite temperatures (but low enough such that thermal activation can be neglected), there is no localization but the probability of finding the system in the state it was prepared decreases slowly with time for g > gcrit . In thermodynamic equilibrium, in the absence of the bath, interactions between the tls lead to the appearance of an ordered state at low enough temperature. If the interactions are of random sign, as in models with quenched randomness the latter will be a spin glass (sg) state. In this phase the symmetry between the states σiz = ±1 at any particular site is broken but  z σi  = 0. Since the coupling to the bath also tends to there is no global magnetization, i ˆ locally break the symmetry between the degenerate states of the tls, both interactions compete with the tunneling term in the Hamiltonian. One thus expects the quantum noise to increase the stability of the sg state against quantum fluctuations. The consequences of this fact are particularly interesting when the coupling to the bath leads by itself to localization at some g = gcrit . Consider a system of size N with g > gcrit at T = 0 and suppose that we turn off the interactions between the tls. The ground state of the system is then 2N -fold degenerate as each tls can be in one of the states ˆ σiz  = ± σ0 (say) independently. If we now turn on an infinitesimal random interaction between the tls, this macroscopic degeneracy will be immediately lifted as the system will select among its 2N degenerate configurations the one (or one among the ones) that minimizes the interaction energy. If we denote by J the typical scale of the interactions and by gcrit the localization threshold, we thus expect a quantum critical point at J = 0, g = gcrit between a quantum paramagnet and the ordered state such that, for g > gcrit , the sg phase survives down to J = 0. A system of non-interacting localized tls and a sg state in equilibrium are in some way  z σi  = 0 and the presence of order is reflected  by a non-vanishing value similar: in both cases i ˆ of the long-time limit of the correlation function, qEA = limt→∞ N −1 i σiz (t)σiz (0) (since we assume equilibration the correlation is stationary and the reference time can be taken to be zero). 
However, this resemblance is only superficial since the dynamics are quite different, in particular, the way in which the correlation function reaches its asymptotic limit, qEA . Further differences between the localized state and the sg state are seen from the study of the out of equilibrium relaxation of such states. As already mentioned, an important feature of glassy systems is that their low-temperature dynamics occur out of equilibrium and the dynamic correlation functions loose time translation invariance. If tw denotes the time elapsed since a quench from the high temperature phase into the sg phase, C(τ + tw , tw ) depends on both τ and tw . The order in which the limits tw → ∞ and τ → ∞ are taken is in this case very important. For sufficiently long τ and tw but in the regime τ tw , the dynamics are stationary and the correlation function reaches a plateau qEA . Much of what was said above for the equilibrium state also holds for this stationary regime. However, for times τ ∼ tw , the system enters an

76

aging regime where the correlation function depends on the waiting-time tw explicitly. In this regime, the dynamic correlation function vanishes at long times, limτ →∞ C(τ + tw , tw ) = 0, at a rate that depends on tw . This is to be confronted to the dynamics in the localized state, where C(τ + tw , tw ) reaches, for any waiting-time tw and long enough τ a plateau that it never leaves. In the aging regime, even for g > gcrit , small interactions will result in the destruction of localization of the tls at long enough times. The problem of a single tls being a difficult one, that of an infinite set of interacting tls seems hardly solvable. Therefore, as a first step, we focused on the effect of the reservoir on the p-spin spherical model [24]. We found that the position of the static critical line separating the disordered and the ordered phases strongly depends on the strenght of the coupling to the bath and the type of bath (ohmic, subOhmic, superOhmic). For a given type of bath, the ordered glassy phase is favored by a stronger coupling. Ohmic, subOhmic and superOhmic baths lead to different transition lines. The classical static transition temperature corresponding to Γ → 0, with Γ the parameter controlling the strenght of the quantum fluctuations, remains unchanged by the coupling to the quantum heat reservoir. The spherical model localizes in the absence of interactions when coupled to a subOhmic bath. When interactions are switched on localization disappears and the system undergoes a phase transition towards a glassy phase. The Ansatz of marginal stability allowed us to identify the dynamic critical line that is consistent with the one found studying the real-time dynamics of the coupled system with the Schwinger-Keldysh closed time-path formalism. The effect of the coupling to the environment on the dynamic critical line is similar. The classical dynamic transition temperature corresponding to Γ → 0, also remains unchanged. 4.4. Effective temperature The thermodynamic relevance of the parameter replacing the environmental temperature in the fdt in slow classical glassy relaxation was discussed in [17, 18]. A similar analysis in the quantum context has not been developed yet. In particular, questions such as what happens with heat flows when two quantum problems with different Tef f are set in contact, can Tef f be measured with a thermometer, is the violation of the quantum FDT – and hence the difference between the effective and the environmental temperature – bounded by a function of some kind entropy production time-variation as done in [48] for classical stochastic problems. The environment plays, in a sense, a dual rˆole: its quantum character basically determines the phase diagram but the coarsening process at long times and large length-scales is the one of a classical problem in a white noise at an effective temperature Tef f . This two-time dependent decoherence phenomenon (absence of oscillations, validity of a classical FDT when t/tw = O(1), etc.) is thus intimately related to the development of a non-zero (infinite in the p = 2 case) effective temperature, Tef f , of the system as defined from the deviation from the (quantum) FDT. Tef f should be distinguished any temperature, say T ∗ , that could be identified from a low-frequency analysis of the noise kernels. Tef f is generated not only by the environment but by the system interactions as well (e.g. cases in which Tef f > 0 even at T ∗ = 0 can be seen in [19, 20, 28, 29, 46, 47]). 
We conclude that the non-vanishing character of the effective temperature is responsible for the decoherence effect observed at large time and length scales that renders the aging dynamics classical. In all quantum models studied up to present a quench from the disordered state into the ordered one was considered and the effective temperature was found to be larger or equal than the ambient one. In classical problems the ‘reversed’ procedure in which one heats the system from the zero-temperature ground state to a finite temperature still within the ordered phase was considered and an effetive temperature that is smaller than the working one was found. It would be extremely interesting to check whether a similar result is found quantum mechanically.

77

Acknowledgments In this contribution we summarize the main results of work done in collaboration with C. Aron, L. Arrachea, G. Biroli, C. Chamon, S. Garnerone, T. Giamarchi, D. R. Grempel, M. P. Kennett, P. Le Doussal, G. Lozano, H. Lozza and C. A. da Silva Santos. [1] Uemura YJ et al, Phys. Rev. Lett. 73, 3306 (1994). [2] Urano C et al, Phys. Rev. Lett. 85, 1032 (2002). [3] Ancona-Torres C, Silevitch DM, Aeppli G and Rosenbaum TF Phys. Rev. Lett. 101, 057201 (2008) and references therein. [4] Julien M-H, Physica B 329, 693 (2003). [5] Ovadyahu Z, Phys. Rev. B 73, 214204 (2006). [6] Popovic D et al. Proceedings of SPIE 5112, 99 (2003). [7] Rogge S, Natelson D, and Osheroff DD Phys. Rev. Lett. 76, 3136 (1996). [8] Imry Y, Introduction to mesoscopic physics, (Oxford Univ. Press, 1997). [9] Arrachea L, Phys. Rev. B 66, 045315 (2002). [10] Arrachea L, Eur. Phys. J. B 36 253, (2003). [11] Mitra A and Millis AJ, arXiv:0804.3980 and references therein. [12] Cugliandolo LF, in Les Houches Session 77, J-L. Barrat et al eds. (Springer-EDP Sciences, 2002), arXiv:condmat/0210312. G. Biroli, J. Stat. Mech. P05014 (2005). [13] Monasson R, in Les Houches Session 85, J-P Bouchaud, M. M´ezard and J. Dalibard eds. (Elsevier, 2007), arXiv:0704.2536. [14] Chakravarty S, Phys. Rev. Lett. 49, 681 (1982). [15] Bray AJ and Moore M, Phys. Rev. Lett. 49, 1545 (1982). [16] Faoro L and Ioffe LB, Microscopic origin of low frequency flux noise in Josephson circuits arXiv:0712.2834; Quantum two level systems and Kondo-like traps as possible sources of decoherence in superconducting qubits arXiv:cond-mat/0510554. [17] Cugliandolo LF, Kurchan J and Peliti L, Phys. Rev. E 55, 3898 (1997). [18] Cugliandolo LF and Kurchan J, J. Phys. Soc. Japan 69, 247 (2000). [19] Cugliandolo LF and Lozano G, Phys. Rev. Lett. 80, 4979 (1998), [20] Cugliandolo LF and Lozano G, Phys. Rev. B 59, 915 (1999). [21] Cugliandolo LF, Grempel DR, and da Silva Santos CA, Phys. Rev. Lett. 85, 2589 (2000). [22] Cugliandolo LF, Grempel DR, and da Silva Santos CA, Phys. Rev. B 64, 014403 (2001). [23] Biroli G and Cugliandolo LF, Phys. Rev. B 64, 014206 (2001). [24] Cugliandolo LF, Grempel DR, Lozano G, Lozza H, da Silva Santos CA, Phys. Rev. B 66, 014444 (2002). [25] Cugliandolo LF, Grempel DR, Lozano G and Lozza H, Phys. Rev. B 70, 024422 (2004). [26] Arrachea L and Cugliandolo LF, Europhys. Lett. 70, 642 (2005). [27] Kennett MP, Chamon C and Cugliandolo LF, Phys. Rev. B 72, 024417 (2005). In this paper a model for a 2d doped classical antiferromagnet is studied with the scope of discussing super-conductor physics. [28] Cugliandolo LF, Giamarchi T and Le Doussal P, Phys. Rev. Lett. 96, 217203 (2006). [29] Aron C, Biroli G and Cugliandolo LF, arXiv:cond-mat/. [30] Weiss U, Quantum dissipative systems (World Scientific, Singapore, 1993). Kamenev A, in Les Houches Session 81, H. Bouchiat et al. eds. (Springer-EDP Sciences, 2004), arXiv:cond-mat/0412296. [31] Goldschmidt Y, Phys. Rev. B 41, 4858 (1990). [32] Nieuwenhuizen TM and Ritort F, Physica A 250, 89 (1998). [33] Grempel DR and Rozenberg MJ, Phys. Rev. B 60, 4702 (1999). [34] Giamarchi T, arXiv.org:cond-mat/0403531, Lecture notes of the course given at the E. Fermi school on ”Quantum Phenomena in Mesoscopic Systems”, Varenna 2002. [35] Grempel DR, Europhys. Lett. 66, 854 (2004). [36] Pankov S and Dobrosavljevic V, Phys. Rev. Lett. 94, 046402 (2005). 
[37] Mueller M and Ioffe LB, Collective modes in quantum electron glasses and electron-assisted hopping arXiv:0711.2668 [38] Giamarchi T and Le Doussal P, Phys. Rev. B 53, 15206 (1996). [39] Cugliandolo LF, Grempel DR and da Silva Santos CA, unpublished. Aron C, Biroli G and Cugliandolo LF, in preparation. [40] Rokni M and Chandra P, Phys. Rev. B. 69, 094403 (2004).

78

[41] [42] [43] [44] [45] [46] [47] [48]

Jorg T, Krzakala F, Kurchan J and Maggs AJ, arXiv:cond-mat/0806414. Laumann C, Scardicchio A and Sondhi SL, arXiv:cond-mat/0706.4391. Krzakala F, Rosso A, Semerjian G, Zamponi F, arXiv:cond-mat/0807:2553. Leggett AJ, Chakravarty S, Dorsey AT, Fisher MPA, Garg A and Zwerger W, Rev. Mod. Phys. 59, 1 (1987); 67, 725 (1995). Cugliandolo LF, Kurchan J, Le Doussal P and Peliti L, Phys. Rev. Lett. 78, 350 (1997). Kennett MP and Chamon C, Phys. Rev. Lett. 86 1622 (2001). Biroli G and Parcollet O, Phys. Rev. B 65, 094414 (2002). Cugliandolo LF, Dean DS and Kurchan J, Phys. Rev. Lett. 79, 2168 (1997).

79

80

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Non-classical Role of Potential Energy in Adiabatic Quantum Annealing Arnab Das The Abdus Salam International Center for Theoretical Physics, Starda Costiera 11, 34014 Trieste, Italy E-mail: [email protected] Abstract. Adiabatic quantum annealing is a way of realizing analog quantum computation, where a computational task is mapped to the problem of finding the global minimum of some classical potential energy function or Hamiltonian, using externally introduced kinetic quantum fluctuations. In this method, the entire potential energy landscape (PEL) may be accessed simultaneously through a delocalized wave-function, in contrast to a classical search, where the searcher has to visit different points in the landscape (i.e., individual classical configurations) sequentially. Thus in such searches, the role of the potential energy might be significantly different in the two cases. Here we discuss this in the context of searching of a single isolated hole (potential minimum) in a golf-course type gradient free PEL. We show, that the quantum particle would be able to locate the hole faster if the hole is deeper, while the classical particle of course would have no scope to exploit the depth of the hole. We also discuss the effect of the underlying quantum phase transition on the adiabatic dynamics.

1. Introduction Adiabatic quantum annealing (AQA) [1]-[12] is a method of finding the ground state (minimum energy state) of a given classical Hamiltonian by employing external quantum fluctuations and subsequent adiabatic reduction of them. One is given with a classical Hamiltonian H, which may be a physical Hamiltonian with many degrees of freedom, or a suitable mathematical function depending on many variables, whose ground state is to be determined. In order to introduce the quantum fluctuations necessary for the AQA of such a Hamiltonian, one adds a quantum kinetic part H′ (t) to it, such that H′ (t) and H do not commute. Initially, one keeps |H′ (t = 0)| ≫ |H| so that the total Hamiltonian Htot (t) = H′ (t) + H is well approximated by the kinetic part only (Htot (0) ≈ H′ (0)). If the system is initially prepared to be in the ground state of H′ (0) (one chooses H′ (0) to have a easily realizable ground state) and H′ (t) is reduced slowly enough, then according to the adiabatic theorem of quantum mechanics, the overlap |hψ(t)|E− (t)i|, between the instantaneous lowest-eigenvalue state |E− (t)i and the instantaneous state of |ψ(t)i of the evolving system, will always stay near its initial value (which is close to unity, since Htot (0) ≈ H′ (0)). Hence at the end of such an evolution, when H′ (t) is reduced to zero at t = τ (the annealing time), the system will be found in a state |ψ(τ )i with |hψ(τ )|E− (τ )i| ≈ 1, where |E− (τ )i is the ground state of Htot (τ ), which is nothing but the surviving classical part H. Thus at the end of an adiabatic annealing the system is found in the ground state of the classical Hamiltonian with a high probability. Based on this principle, algorithms can be framed to anneal complex physical systems like spin glasses as well as the objective functions of


Based on this principle, algorithms can be framed to anneal complex physical systems like spin glasses, as well as the objective functions of hard combinatorial optimization problems (like the Traveling Salesman Problem), towards their ground (optimal) states [2, 6, 8, 11]. In order to ensure adiabaticity, the evolution should be such that τ ≫ α, where

$$\alpha = \frac{|\langle \dot H_{\rm tot}\rangle|_{\rm max}}{\Delta^{2}_{\rm min}}, \qquad (1)$$

$$|\langle \dot H_{\rm tot}\rangle|_{\rm max} = \max_{0\le s\le 1}\left|\left\langle E_-(s)\left|\frac{dH_{\rm tot}}{ds}\right|E_+(s)\right\rangle\right|, \qquad \Delta^{2}_{\rm min} = \min_{0\le s\le 1}\Delta^{2}(s); \qquad s = t/\tau,\;\; 0\le s\le 1, \qquad (2)$$

|E+(t)⟩ being the instantaneous first excited state of the total Hamiltonian Htot(t), ∆(t) the instantaneous gap between the ground-state and first-excited-state energies, and α the adiabatic factor (for a simple, general proof see [14]). One key feature believed to be behind the success of AQA over classical algorithms [2, 6, 8] in glass-like rugged PELs is the ability of quantum systems to tunnel easily through potential energy barriers, even very high ones, provided they are narrow enough, in contrast to a classical searcher, which always has to scale the barrier height with its kinetic energy (temperature) irrespective of the width [5, 7, 10, 13, 15]. Here we show that there is another respect in which a quantum-mechanical searcher gains an advantage over a classical one: it can utilize the depth of the potential energy minimum to locate it in the absence of any potential gradient, which a classical searcher cannot.

2. Searching a hole on a gradient-free PEL
We consider a lattice with N sites, |i⟩ denoting the state of a particle localized at the i-th site. At each site there is a potential, which is zero at all sites i ≠ w and is −χ at i = w, where w is chosen randomly. Thus the PEL is essentially flat, without any gradient, with a single hole (minimum) of depth χ at i = w. This is precisely an analog version of Grover's algorithm for searching for a particular entry in an unstructured database [16]-[18]. In those studies, however, the possibility of utilizing the depth of the hole in favor of a faster search was not considered. Let us consider that the lattice points are connected to each other by an infinite-range hopping term Γ between any two sites. The question is how fast a particle can locate the hole, starting from a state which does not assume any knowledge of the position of the hole, by reducing its kinetic energy Γ and tuning the hole depth χ. The Hamiltonian for a particle on such a lattice is given by

$$H_{\rm tot}(t) = -\chi(t)\,|w\rangle\langle w| \;-\; \Gamma(t)\sum_{i,j;\,i\neq j}|i\rangle\langle j|; \qquad \chi(t),\,\Gamma(t) > 0. \qquad (3)$$

In order to anneal the particle to the hole, one has to reduce Γ from a very high value to a very low final value and tune χ in the opposite manner, so that Γ(t = 0) ≫ χ(t = 0) and Γ(t = τ) ≪ χ(t = τ), where τ is the annealing time. The evolution should satisfy the adiabatic condition (1). The eigen-spectrum of Htot(t) contains a ground state |E−(t)⟩ and a first excited state |E+(t)⟩ (in order of increasing eigenvalue) with energies

$$E_{\pm}(t) = -\frac{1}{2}\left[(N-2)\Gamma + \chi \mp \sqrt{(N\Gamma-\chi)^{2} + 4\chi\Gamma}\,\right], \qquad (4)$$

respectively, all time dependencies being implicit through the time dependence of Γ and χ. The instantaneous gap is thus given by

$$\Delta(t) = \sqrt{(N\Gamma-\chi)^{2} + 4\chi\Gamma}. \qquad (5)$$

The instantaneous ground and first excited states |E±(t)⟩ are given by

$$|E_{\pm}(t)\rangle = \frac{1}{\sqrt{C_{\pm}^{2}(t) + N - 1}}\left(C_{\pm}(t)\,|w\rangle + \sum_{i\neq w}^{N}|i\rangle\right), \qquad (6)$$

where

$$C_{\pm}(t) = \frac{1}{2\Gamma}\left[-(N-2)\Gamma + \chi \mp \Delta\right]. \qquad (7)$$

The second excited state is (N − 2)-fold degenerate, with eigenvalue Γ, and the time-evolving Hamiltonian never mixes the two lowest eigenstates with any of the second excited states. This can be argued easily by noting that a state of the form |E₂⟩ = (1/√2)(|i⟩ − |j⟩) (i, j ≠ w) is an eigenstate of Htot(t) with eigenvalue Γ, and ⟨E₂|E−(t)⟩ = ⟨E₂|E+(t)⟩ = 0 for all t. For all allowed combinations of i and j we get (N − 2) such linearly independent eigenstates, from which we can construct (N − 2) mutually orthogonal eigenstates, each of which obviously satisfies the above non-mixing condition. Thus we have to take care of only the two lowest-lying states and the gap between them. Here one may note that if χ is independent of N, the gap ∆ scales approximately linearly with N for large N at any given non-zero value of Γ, and so does the term |⟨Ḣtot⟩|. At Γ = 0, however, the N-dependence of the former vanishes, but not that of the latter, if Γ has a linear time dependence. Thus one needs a linear N-dependence in χ in order to obtain an adiabatic factor α that does not diverge as N → ∞. Henceforth we consider only the large-N limit, and replace '=' by '≈' whenever the correction vanishes in that limit. Physically this means that the depth of the hole has to be increased, in order for it to be sensed and successfully tracked by the searching wave-function, if the space of search is made larger. Such non-local sensing of the hole depth, and its use in tracking the hole, is impossible for a classical searcher, since it cannot sense the hole depth unless it drops right into it. In investigating the condition of adiabatic evolution we consider two separate cases. In a general AQA program one might not have the facility of tuning both the potential energy part and the kinetic part in practice, since, say, one might depend on the strength of the interactions between the elementary constituents (hard to tune), while the other might be introduced through an applied external field (easily tunable). Hence in our analysis we consider two separate cases: in the first case we tune χ keeping Γ constant, while in the second case we do the reverse.
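As a quick numerical illustration of the spectrum discussed above, the following sketch (not part of the original paper; it uses NumPy and arbitrarily chosen values of N, χ and Γ) builds the Hamiltonian of Eq. (3) explicitly and compares its two lowest eigenvalues with the closed forms of Eqs. (4)-(5).

```python
import numpy as np

def h_tot(N, chi, gamma, w=0):
    # Eq. (3): flat landscape with a hole of depth chi at site w,
    # plus infinite-range hopping of strength gamma between all distinct sites.
    H = -gamma * (np.ones((N, N)) - np.eye(N))
    H[w, w] -= chi
    return H

N, chi, gamma = 200, 3.0, 0.5            # illustrative values only
evals = np.linalg.eigvalsh(h_tot(N, chi, gamma))

root = np.sqrt((N * gamma - chi) ** 2 + 4 * chi * gamma)
E_minus = -0.5 * ((N - 2) * gamma + chi + root)   # ground state, Eq. (4)
E_plus = -0.5 * ((N - 2) * gamma + chi - root)    # first excited state, Eq. (4)

print(evals[0] - E_minus)          # ~0: lowest level matches Eq. (4)
print(evals[1] - E_plus)           # ~0: first excited level matches Eq. (4)
print(evals[1] - evals[0] - root)  # ~0: gap matches Eq. (5)
```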

2.1. Constant-Γ Annealing
We start with the democratic initial state |ψ(0)⟩ = (1/√N) Σᵢᴺ |i⟩ (which is of course the ground state of Htot(0)) and adopt a linear annealing schedule with an explicit N-dependence of χ:

$$\Gamma = \Gamma_0; \qquad \chi = \chi_0\,\frac{t}{\tau}; \qquad \chi_0 = rN\Gamma_0. \qquad (8)$$

When Γ is kept constant, we have to keep it sufficiently low (or, in other words, χ sufficiently high) so that the final ground state has a substantial (non-vanishing in the N → ∞ limit) overlap with the hole state |w⟩. At t = τ we get, from Eq. (7), the amplitude of |w⟩ in the final ground state to be C−(τ)/√(C−²(τ) + N − 1), where

$$C_-(\tau) = 1 + \frac{N(r-1)}{2} + \frac{\sqrt{N^{2}(r-1)^{2} + 4rN}}{2} \;\approx\; 1 + \frac{N(r-1)}{2} + \frac{|N(r-1)|}{2}. \qquad (9)$$
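A minimal numerical check of this amplitude (again an illustrative sketch with arbitrarily chosen parameters, not taken from the paper) is to evaluate C−(τ) from Eq. (7) at t = τ and the resulting overlap with |w⟩, confirming that the overlap survives the N → ∞ limit only for r > 1, as argued below:

```python
import numpy as np

def overlap_with_hole(N, r, gamma0=0.5):
    # C_-(tau) from Eq. (7) with chi = chi0 = r*N*gamma0 and Gamma = gamma0
    chi = r * N * gamma0
    delta = np.sqrt((N * gamma0 - chi) ** 2 + 4 * chi * gamma0)   # Eq. (5)
    c_minus = (-(N - 2) * gamma0 + chi + delta) / (2 * gamma0)    # Eq. (7)
    return c_minus / np.sqrt(c_minus ** 2 + N - 1)                # amplitude of |w>

for N in (10**2, 10**4, 10**6):
    print(N, overlap_with_hole(N, r=2.0), overlap_with_hole(N, r=0.5))
# The r = 2.0 column tends to 1 with increasing N, while the r = 0.5 column
# tends to 0, in line with the discussion following Eq. (9).
```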

[Figure 1 (plot): panel (a) τmin versus N; panel (b) P(|w⟩) versus τ for Γ = 10⁻³, 0.05, 0.5 and 5.0.]

Figure 1. Panel (a) shows the variation with N of the minimum time τmin required to achieve the target success probability PT = 0.33 for constant-Γ annealing (Γ(t) = Γ0, χ(t) = χ0 t/τ; r = 2.0, Γ0 = 0.5), obtained by solving the time-dependent Schrödinger equation numerically. Panel (b) shows the variation of the final probability P(|w⟩) of finding the system in the state |w⟩ with the annealing time τ, for different values of Γ0. Here we have taken N = 10⁶.

Clearly, if r > 1 the amplitude C−(τ) ∼ N, and thus the overlap amplitude ∼ 1, whereas if r < 1 then C−(τ) ∼ 1 and the amplitude vanishes as N → ∞. Thus, to be able to locate the hole at the end, we have to take r > 1. In fact, the adiabatic factor in that case (r > 1) is given by

$$\alpha_1 = \frac{|\langle \dot H_{\rm tot}(t)\rangle|_{\rm max}}{|\Delta^{2}(t)|_{\rm min}} = \frac{r\Gamma_0^{2}N\sqrt{N-1}}{\tau\,|\Delta^{3}(t)|_{\rm min}}. \qquad (10)$$

This has its maximum at tm = τ(N − 2)Γ0/χ0 ≈ τ/r, and the maximum value is given by

$$\alpha_1 = \frac{\chi_0\sqrt{N}}{8\Gamma_0^{2}(N-2)^{3/2}} \;\approx\; \frac{r}{8\Gamma_0}. \qquad (11)$$

This clearly shows that if the depth of the hole scales linearly with the size N of the search space, one can in fact find the hole in a time independent of N. We calculated numerically the minimum time τmin required to obtain a target success probability PT = 0.33 for different N, over many decades. The evolution is computed by solving the time-dependent Schrödinger equation numerically, and τmin is determined up to an accuracy of 10⁻⁴ using the following bisection scheme. We first pick an arbitrarily high value of τ (call it τhi) for which P(|w⟩) ≡ |⟨ψ(τ)|w⟩|² > PT. Next we find a low τ (τlo) for which P(|w⟩) < PT. Then we evaluate P(|w⟩) for τ = τm = (τhi + τlo)/2. If the result is greater than PT, we replace τhi by τm (and retain the old τlo); otherwise we replace τlo by τm (and retain the old τhi), and repeat the same process. We iterate until the value of |P(|w⟩) − PT| for both τhi and τlo lies within the desired accuracy limit. The results (Fig. 1a) clearly show that τmin becomes independent of N for large N, as expected. The relaxation behavior for large N at a given annealing time τ of course depends on the value of Γ0 (see Fig. 1b). If Γ0 is too small, the system takes a longer time to feel the changes in the landscape, and hence the adiabatic relaxation requires a longer time (the adiabatic factor becomes bigger; see Eq. (11)). On the other hand, if Γ0 is too large, the ground state itself is substantially delocalized, and hence the final state, though closer to the ground state, again has only a small overlap with the target state |w⟩.
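A compact sketch of the bisection search described above could look as follows. This is illustrative only: the routine `success_probability` is a stand-in for the actual Schrödinger integration, which is not specified in the paper, and the stopping criterion is simplified to the width of the bracketing interval.

```python
def find_tau_min(success_probability, target=0.33, tau_lo=0.01, tau_hi=1e6, tol=1e-4):
    """Bisection on the annealing time tau for the smallest tau reaching `target`.

    `success_probability(tau)` is assumed to return P(|w>) = |<psi(tau)|w>|^2
    after annealing for time tau, and to grow (on the whole) with tau.
    """
    assert success_probability(tau_hi) > target > success_probability(tau_lo)
    while tau_hi - tau_lo > tol:
        tau_m = 0.5 * (tau_lo + tau_hi)
        if success_probability(tau_m) > target:
            tau_hi = tau_m   # target already reached: the minimum lies below tau_m
        else:
            tau_lo = tau_m   # target not reached yet: the minimum lies above tau_m
    return tau_hi
```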


For Γ0 ≈ 0.5 the schedule is found to be optimal (Fig. 1(b)). The relaxation behavior is seen to be linear in the annealing time τ for large τ.

2.2. Constant-χ Annealing

[Figure 2 (plot): τmin versus N for N = 10³–10⁸, with curves for Γ0 = 100 and Γ0 = 1.]

Figure 2. Variation with N of the minimum time τmin required to achieve the success probability PT = 0.9 for constant-χ annealing. The result is obtained by solving the time-dependent Schrödinger equation numerically with Γ(t) = Γ0(1 − t/τ), χ(t) = χ0 = rNΓ0 for r = 0.5.

Next we consider the case where χ is kept fixed and Γ is reduced linearly, i.e., χ = χ0 = rNΓ0 and Γ = Γ0(1 − t/τ). In this case we start with the same democratic initial state |ψ0⟩ as in the previous case, which of course is not the ground state in the presence of the hole (we cannot construct the actual initial ground state without explicit knowledge of the location of the hole). All we need to show in this case is that there is a non-vanishing (i.e., non-zero in the N → ∞ limit) overlap |⟨ψ(0)|E−(0)⟩| between our initial state and the true ground state at t = 0. If we can assure adiabaticity for the subsequent evolution, this overlap will be conserved and will emerge as the final overlap between |ψ(τ)⟩ and |w⟩. A calculation similar to that of the previous section shows that, for r < 1, C−(0) → 0 as N → ∞, implying that the overlap actually tends to unity in that limit, while it vanishes for r > 1. The adiabatic factor in the former case (r < 1) is given by

$$\alpha_2 \approx \frac{1}{8\Gamma_0 r^{2}}, \qquad (12)$$

again a constant independent of N. We present a numerical calculation similar to that of the previous section for the variation of τmin with N in Fig. 2. It shows that the value of τmin becomes independent of N in the large-N limit (its asymptotic value, however, depends on Γ0).

3. The Phase Transition
The expression for the gap ∆ (Eq. 5) shows that the gap scales linearly with N everywhere except at the point NΓ = χ, where it scales as √N. In our notation this happens at the instant t* = τ/r in the case of constant-Γ annealing, and at t** = τ(1 − r) in the constant-χ case. For successful annealing this point cannot be avoided during the evolution in either case. To see what happens there, let us focus on the amplitude ⟨w|E−(χ, Γ)⟩ before and after passing through the transition point. Consider, say, the constant-Γ annealing case. From Eq. (7) we have C−(t) ≈ [−NΓ0(1 − rt/τ) + NΓ0|1 − rt/τ|]/(2Γ0). Taking the modulus into account, one finds C−(t) ≈ 0 for t < t* (which means the amplitude vanishes), while C−(t) ∼ N for t > t* (the amplitude tends to unity). This means the ground state of Htot(χ, Γ) undergoes a global change in character from a completely delocalized one to a completely localized one


at this special point. Thus, to follow the wave-function, a lot of tunneling from all the sites to the hole is required at this point. Here the depth of the hole plays a crucial role in making this tunneling possible, by preventing the gap ∆ from closing at the transition point. If, however, the system passes this point very fast, this massive tunneling may remain incomplete and a resulting loss of adiabaticity may occur. If, instead of the Hamiltonian we considered, one takes a bounded version of it (energy not growing with N), say by keeping χ independent of N and scaling Γ0 by N, then one can easily see that this special transition point turns out to be a true quantum critical point, with the gap vanishing as 1/√N [17]. The two states – the one localized at the hole and the other delocalized over the entire lattice – have exactly equal energies at this critical point, where the gap closes. Before reaching the critical point (i.e., in the kinetic-energy-dominated region) the ground state is non-degenerate and delocalized, while after the point (i.e., in the potential-energy-dominated region) the ground state is degenerate and localized at the hole. Since there is no energy difference between the two states (the localized and the delocalized one) at the critical point, the evolving system cannot sense the global exchange of character between the ground state and the first excited state at this point, and thus fails to adapt (tunnel) accordingly to follow the ground state. Letting the hole depth scale linearly with N and taking infinite-range hopping, one prevents the gap from closing, and thus favors the localized state energetically over the delocalized one. This acts as a drive for the necessary tunneling and thus guides the system to follow the ground state as it changes its global character, provided the system does not pass through the point too fast. However, it is also worth noting that increasing the hole depth χ0 indiscriminately does not pay. In the case of constant-Γ annealing the adiabatic factor increases with r (Eq. (11)), which means we have to keep r as small as possible, i.e., r → 1 gives the best result. This can be explained by noting that the bigger χ0 is, the longer the time needed to evolve adiabatically. In the constant-χ case the adiabatic factor decreases with r, but one has to keep r < 1, which means χ can be increased only linearly with N, up to the upper limit NΓ0, to accelerate the annealing effectively. This is because too large a χ0 implies too much error in the initial state, and the overlap is negligible right from the onset; even with a perfectly adiabatic evolution we would then end up with a negligible overlap. To summarize, we have studied the adiabatic search of a gradient-free potential energy landscape on a lattice with a randomly placed single isolated minimum (hole) by a quantum searcher. We have found that the depth of the hole plays a non-classical role in accelerating the search. This is because the delocalized wave-function of the quantum adiabatic searcher can detect the hole from the very onset of the search and keep track of it, in contrast to a classical searcher, which of course cannot sense the hole depth non-locally and utilize it. We found that the condition for an adiabatic search requires an infinite-range hopping between the points in the PEL and a hole depth scaling linearly with the lattice size N.
We have also discussed how the hole depth plays a role in preventing the quantum criticality, and the consequent divergence of the timescale, from occurring.

Acknowledgments: The author is thankful to B. K. Chakrabarti, G. E. Santoro and E. Tosatti for useful discussions.

References
[1] T. Kadowaki and H. Nishimori, Phys. Rev. E 58 5355 (1998)
[2] E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren and D. Preda, Science 292 472 (2001)
[3] A. Das and B. K. Chakrabarti, Rev. Mod. Phys. (in press, 2008)


[4] G. Santoro and E. Tosatti, News and Views in Nature Physics 3, 593 (2007); G. E. Santoro and E. Tosatti, J. Phys. A 39 R393 (2006)
[5] A. Das and B. K. Chakrabarti, Eds., Quantum Annealing and Related Optimization Methods, Lecture Notes in Physics 679, Springer-Verlag, Heidelberg (2005)
[6] G. E. Santoro, R. Martoňák, E. Tosatti and R. Car, Science 295 2427 (2002)
[7] J. Brooke, D. Bitko, T. F. Rosenbaum and G. Aeppli, Science 284 779 (1999)
[8] R. Martoňák, G. E. Santoro and E. Tosatti, Phys. Rev. E 70 057701 (2004)
[9] J.-I. Inoue, Phys. Rev. E 63 046114 (2001); see also J.-I. Inoue in [5]
[10] A. Das, B. K. Chakrabarti and R. B. Stinchcombe, Phys. Rev. E 72 026701 (2005)
[11] A. Das and B. K. Chakrabarti, arXiv:0803.4508 (2008)
[12] R. D. Somma, C. D. Batista and G. Ortiz, Phys. Rev. Lett. 99 030603 (2007)
[13] E. Farhi, J. Goldstone and S. Gutmann, arXiv:quant-ph/0201031 (2002)
[14] M. S. Sarandy, L.-A. Wu and D. A. Lidar, Quantum Information Processing 3 331 (2004)
[15] P. Ray, B. K. Chakrabarti and A. Chakrabarti, Phys. Rev. B 39 11828 (1989)
[16] L. K. Grover, Phys. Rev. Lett. 79 325 (1997)
[17] E. Farhi and S. Gutmann, Phys. Rev. A 57 2403 (1998)
[18] J. Roland and N. J. Cerf, Phys. Rev. A 65 042308 (2001)


Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

A comparison of classical and quantum annealing dynamics
Sei Suzuki
Department of Physics and Mathematics, Aoyama Gakuin University, Fuchinobe, Sagamihara 229-8558, Japan
E-mail: [email protected]
Abstract. Simulated annealing and quantum annealing are algorithms for combinatorial optimization problems. The former obtains solutions using thermal fluctuations through classical dynamics, while the latter does so using quantum fluctuations through quantum dynamics. In this paper the dynamics of these two algorithms are compared by means of the Kibble-Zurek argument, employing a one-dimensional random Ising model as a test case. We reveal that quantum annealing reduces residual errors faster than simulated annealing as the annealing rate is decreased. This result implies an advantage of quantum annealing over simulated annealing.

1. Introduction
Physics has inspired algorithms of computation. A celebrated example is simulated annealing [1], an algorithm for general combinatorial optimization problems. In simulated annealing we consider a system representing the problem and place it at a finite temperature; the problem is translated into finding the ground state of the system. To this end, we initially require the temperature to be sufficiently high, since it is easy to obtain the equilibrium state of the system at high temperature. Suppose that the system is in equilibrium initially. We then lower the temperature toward zero according to a certain schedule. If the temperature changes slowly enough, the system maintains its equilibrium state and reaches the ground state when the temperature becomes zero, and we thus obtain the solution. Simulated annealing therefore produces the solution through the physical dynamics of classical statistical mechanics. The idea of simulated annealing has since developed into more elaborate algorithms such as parallel tempering [2, 3]. Algorithms using quantum mechanics have also attracted a lot of attention. Quantum annealing is one such quantum-mechanical algorithm for general combinatorial optimization problems [4, 5, 6]. Instead of the thermal fluctuations of simulated annealing, quantum annealing introduces quantum fluctuations. If the problem is encoded in an interacting Ising-spin model, the quantum fluctuations are typically induced by a transverse field. We note that the Hamiltonian of the transverse field and that of the Ising spins do not commute. We assume that the transverse field is strong enough initially and that the initial state is the ground state of the initial Hamiltonian; the ground state of the system with a strong transverse field is easily obtained. Then we lower the transverse field gradually with time. The time evolution of the state is governed by Schrödinger dynamics. If the change of the transverse field is slow enough, the state evolves


adiabatically and reaches the ground state of the Ising model when the transverse field vanishes. Quantum annealing thus produces the solution using quantum fluctuations, through the dynamics of quantum mechanics. One may hope that quantum mechanics outperforms classical mechanics in the search for the ground state of a classical system. Several studies have compared simulated annealing and quantum annealing numerically [7, 8, 9, 10] and experimentally [11], and most of them have reported results in favor of quantum annealing. However, there has been no analytic evidence showing the superior power of quantum annealing over simulated annealing. To understand the features of these algorithms more deeply, it is important to reveal the difference between quantum annealing and simulated annealing analytically. The purpose of this paper is to provide analytic evidence that quantum annealing has an advantage over simulated annealing. In order to discuss the dynamics of a system with a time-dependent parameter, one has to pay attention to the phase transition. In both simulated annealing and quantum annealing the initial state is a disordered state, while the target ground state is usually a state with a certain order, so a phase transition lies between the initial state and the target state. If the temperature is lowered, it is the thermal phase transition; if the transverse field applied to the Ising model is weakened, the quantum phase transition takes place. The state evolved from the disordered state cannot become the ordered state completely as long as the parameter is changed at a finite rate. This is because the characteristic time diverges at the critical point. Thus the final state inevitably contains spatial defects. This mechanism of defect formation is called the Kibble-Zurek mechanism [15, 16]. We explain the Kibble-Zurek mechanism in more detail in the next section. The errors of simulated annealing and quantum annealing are associated with the Kibble-Zurek mechanism. In our study we compare the errors of simulated annealing and quantum annealing estimated by means of the Kibble-Zurek argument. The model we employ is the random ferromagnetic Ising model in one dimension: H0 = −

X

Ji σi σi+1 ,

(1)

i

where we assume that the coupling constant Ji is drawn from the uniform distribution between 0 and 1, namely ½ 1 0 ≤ Ji ≤ 1 P (Ji ) = . (2) 0 otherwise The ground state of this model is the complete ferromagnetic state. However the first excited state reflects the randomness. Hence this model is trivial as the optimization problem but possesses non-trivial dynamics. In our model, the error of the obtained state can be measured by density of kinks: 1 X ρ≡ (1 − hσi σi+1 i) (3) 2N i as well as residual energy per spin: hH0 i 1 + , (4) N 2 where N (→ ∞) is the number of spins and h· · ·i denotes the expectation value with respect to the state after simulated or quantum annealing. We remark that 21 in eq. (4) stands for the minus of the true ground energy per spin. Quantum annealing dynamics of the present model has been studied by Dziarmaga on the basis of the Kibble-Zurek mechanism a couple of years ago [12]. He revealed that density of kinks decays with annealing rate 1/τ of the transverse field as εres ≡

ρQA ∼ 1/ (ln τ )2

90

(5)

for large τ . Caneva et al. [13] arrived at the same result by the Landau-Zener formula and scaling of the distribution of energy gaps at the quantum critical point. In ref. [13], the decay rate of residual energy is also estimated numerically. Numerical result for large τ is written as ζ εQA , res ∼ 1/ (ln τ )

ζ ≈ 3.4.

(6)

As for simulated annealing of the same model, the author of the present paper revealed on the basis of the Kibble-Zurek mechanism that density of kinks decays as

and residual energy per spin as

ρSA ∼ 1/ ln τ

(7)

2 εSA res ∼ 1/ (ln τ )

(8)

for large τ [19]. The result of residual energy is the reproduction of the Huse-Fisher’s law [14]. Comparison of residual energies, eqs. (6) and (8), might indicate the advantage of quantum annealing. However some ambiguity remains since analytic result has not obtained for quantum annealing. Comparing density of kinks after quantum annealing and that after simulated annealing, it is obvious that the former decays faster than the latter. This result gives a firm evidence that quantum annealing has an advantage over simulated annealing. The organization of the paper is as follows. We first explain the argument of the KibbleZurek mechanism in the next section. Then, according to ref. [12], we apply the Kibble-Zurek argument to quantum annealing of one-dimensional random Ising ferromagnet in sec. 3. We obtain the decay rate of density of kinks, eq. (5). Dynamics of simulated annealing is studied in sec. 4. We derive density of kinks and residual energy, eqs. (7) and (8), by means of the Kibble-Zurek argument. We also present the confirmation of analytic results by the Monte-Carlo simulation. The paper is concluded in sec. 5. 2. Kibble-Zurek argument Let us consider a uniform Ising ferromagnet. We suppose that a parameter γ specifies static state of the system. It corresponds to the temperature or the transverse field. We assume that the static state exhibits the phase transition at γ = γc . The order parameter of the system is zero for γ > γc , while it is finite and uniform for γ < γc . In the argument of the KibbleZurek mechanism, the correlation length and the characteristic (relaxation) time at fixed γ are important quantities. We write the correlation length and the characteristic time as ξ(γ) and τr (γ) respectively. These quantities grow with decreasing γ toward γc and diverge at γc . Now we consider dynamics which follows quenching γ with time from γ > γc to γ < γc . We assume that γ is quenched linearly with time. As the initial condition, we assume that the system is in the static (equilibrium or ground) state of γ. Starting from the initial state, the state evolves with decreasing γ. If γ is still large enough, the state should maintain the static state of γ since the characteristic time τr is small. However, as γ comes close to γc , γ decreases further before the state attains the static state of γ. Hence the state cannot evolve into the static state of γ when γ comes below γc as far as the quenching rate is finite. It follows that, after the evolution, the order parameter is not uniform and ferromagnetic domains are formed. This is the Kibble-Zurek mechanism of spatial defects formation. In the Kibble-Zurek argument, one can derive the relation between the mean size of ferromagnetic domains and the quenching rate. We assume linear quenching of γ and introduce the dimensionless parameter as follows, ²(t) ≡

t γ(t) − γc =− , γc τ

91

(9)

where t is time and 1/τ is the quenching rate of γ. Time t is assumed to go forward from −∞ to τ . We here pose an equation: τr (γ(tˆ)) = |tˆ|. (10) This equation defines the time tˆ at which the characteristic time is equal to the remaining time to the critical point. Since the characteristic time is longer than the remaining time to the critical point, the system cannot attain the static state of γ(t) for t > tˆ. We then assume that the state maintains the static state for t < tˆ and the evolution of the state stops at t = tˆ. This state possesses a finite correlation length. Hence this state consists of ferromagnetic domains. Once the domain structure forms, it could scarcely develop into the complete ferromagnetic state. Therefore it is reasonable to consider that the state remains the state at t = tˆ until t = τ . Thus the size of domain of the final state is estimated by the correlation length at γ = γˆ ≡ γ(tˆ), ˆ namely ξ(ˆ γ ) ≡ ξ. It is instructive to consider the standard phase transition of second order. Using the critical exponents ν and z, the correlation length and characteristic time obey the power law near the critical point: ξ ∼ |²|−ν and τr ∼ ξ z ∼ |²|−zν , where dispensable factors are omitted. Applying the relation of τr and eq. (9) to eq. (10), we obtain the equation of ²ˆ ≡ ²(tˆ): ²ˆ−zν ∼ ²ˆτ , where we supposed that tˆ is negative and hence ²ˆ is positive. It follows that ²ˆ ∼ 1/τ 1/(1+zν) . Using the relation between ξ and ², we obtain ξˆ ∼ τ ν/(1+zν) . Thus the domain size of the state after evolution obeys the power law of τ and its exponent is determined by z and ν. 3. Quantum annealing In this section we derive the decay rate of density of kinks after quantum annealing of the random Ising ferromagnet in one dimension. Let us consider the following quantum Ising model: H = H0 − Γ

X

hi σ x ,

(11)

i

where H0 is the Ising Hamiltonian defined by eq. (1) with replacing σi by σiz . σix and σiz are the Pauli’s spin operators at site i. hi is a positive factor randomly chosen from [0, 1]. It has been known that this model exhibits the quantum phase transition at Γ = Γc = 1. We define here the dimensionless field ² ≡ (Γ − 1). According to the Fisher’s analysis by the renormalization group, the correlation length behaves as ξ ≈ ξ0 /|²|2 ,

(12)

and the characteristic time as µ

τr ≈ τ0

ξ ξ0

¶1/2|²|

≈ τ0 |²|−1/|²| ,

(13)

where ξ0 and τ0 are the critical amplitudes. Note that the anomalous dynamical exponent, z = 1/2|²|, represents the universality of the random system. Now we suppose that the transverse field is quenched linearly with time and with rate τ as eq. (9). According to the Kibble-Zurek argument, we pose eq. (10). Then we obtain the equation of ²ˆ as τ0 /ˆ ²1/ˆ² ≈ τ ²ˆ, (14) where we defined ²ˆ by ² that satisfies this equation. For τ /τ0 À 1, since ²ˆ ¿ 1, the equation is reduced to τ 1 1 ln ≈ ln . ²ˆ ²ˆ τ0

92

The solution of this equation is approximately 1 ln(τ /τ0 ) ≈ . ²ˆ ln ln(τ /τ0 )

(15)

As mentioned in the previous section, the correlation length of the state after evolution is estimated by ξˆ = ξ(ˆ ²). From eqs. (12) and (15), we obtain [ln(τ /τ0 )]2 ξˆ ≈ . [ln ln(τ /τ0 )]2

(16)

The inverse of the correlation length is almost the same as the density of kinks in the state. Hence the density of kinks after quantum annealing is written as ρQA ≈

1 [ln ln(τ /τ0 )]2 1 ≈ > . 2 ˆ [ln(τ /τ )] [ln(τ /τ0 )]2 0 ξ

(17)

Thus, ignoring the double logarithm in the numerator, one obtains eq. (5). The decay rate of density of kinks derived here has been confirmed by the numerical simulation using the JordanWigner transformation [12, 13]. 4. Simulated annealing We consider the classical system represented by the Hamiltonian, eq. (1). To discuss the dynamics of the present system, we assume the Glauber model [17]. In the Glauber model with random coupling, the motion of the thermal average of spin at site i, Si (t), is governed by ´ d 1³ − Si (t) = −Si (t) + Ci Si−1 (t) + Ci+ Si+1 (t) , dt 2

(18)

Ci± = tanh β(Ji + Ji−1 ) ± tanh β(Ji − Ji−1 ).

(19)

where Ci± is defined by

In a following few paragraphs we give the correlation length, energy, and characteristic relaxation time at fixed temperature in the present model. The correlation function between sites i and i + k of the system with fixed {Ji } is given by Qi+k−1 hσi σi+k i = j=i tanh βJj . Taking the average over randomness, the correlation function in the thermodynamic limit is obtained as [hσi σi+k i]av =

Z hY

i

dJj P (Jj ) hσi σi+k i =

j

³ ln cosh β ´k

β

.

(20)

The correlation length ξ is defined by [hσi σi+k i]av = e−k/ξ , where the lattice constant is taken as the unit of the length. From the above equation, we obtain an explicit expression of ξ. In particular, an expression for low temperature limit (β À 1) is given by ξ(T ) ≈ β/ ln 2.

(21) P

The energy of the system with quenched {Ji } is written as hHi = − i Ji tanh βJi . The average over randomness yields an expression of the energy per spin in the thermodynamic limit. Z [hHi]av = − dJP (J)J tanh βJ ε = lim N →∞ N 93

∞ 1 1 π2 1 1 X (−1)n −2nβ =− + 2 − ln(1 + e−2β ) − 2 e . 2 β 24 β 2β n=1 n2

(22)

In the low temperature limit, this formula is reduced to 1 1 π2 ε≈− + 2 . 2 β 24

(23)

We remark that the ground state energy is − 12 . The relaxation time is available in ref.[18] by Dhar and Barma. 1/(1 − tanh 2β). The low temperature expression is

It is given by τr =

τr (T ) ≈ e4β /2.

(24)

Combination of this expression with eq. (21) yields relation between τR and ξ: τr (T ) = e(4 ln 2)ξ(T ) /2.

(25)

Now let us consider that the temperature is quenched according to the following schedule: T (t) = −t/τ,

(26)

where time t evolves from −∞ to 0. Remark that the critical temperature of the present system is Tc = 0. This is because there is no long-range order in the equilibrium at any finite temperature, while the ground state is the complete ferromagnetic state. Anyway we pose eq. (10). We have the relation between t and ξ, |t| = τ /(ξ ln 2), derived by eqs. (21) and (26). Using this relation and eq. (25) in eq. (10), we obtain an equation of ξˆ ≡ ξ(T (tˆ)): Ã

ξˆ =

!

1 ξˆ ln 2 ln τ − ln . 4 ln 2 2

(27)

ˆ ξˆ → 0 for ξˆ → ∞, ξˆ is almost This equation cannot be solved analytically. However, since (ln ξ)/ proportional to ln τ when τ À 1. The inverse of correlation length corresponds to the density of kinks. Consequently the density of kinks in the state after simulated annealing is estimated as ρSA ≈

4 ln 2 ˆ

2 ln τ − ln ξ ln 2

.

(28)

The second term in the denominator is negligible for sufficiently long τ as mentioned above. Hence eq. (7) is derived. Residual energy is also estimated from the energy at T = T (tˆ). Using eqs. (24) and (26), eq. (10) is rewritten as 1 4βˆ ˆ e = τ /β, (29) 2 ˆ where we defined βˆ ≡ 1/T (tˆ). This equation is followed by βˆ = 41 ln τ − 14 ln(β/2). Substituting this for β in eq. (23), we obtain residual energy per spin as εres =

1 2π 2 . 2 ˆ 3 (ln τ − ln(β/2))

94

(30)

Since the second term in the denominator is negligible for large τ , hence eq. (8) is derived. We next consider a logarithmic schedule: T (t) =

T0 , 1 + a ln(− T0t τ )

(31)

where T0 and a are positive numbers and t is assumed to evolve from −T0 τ to 0. In this schedule, the temperature is reduced from T0 at t = −T0 τ to 0 at t = 0. Arranging eq. (31) as ˆ 1 e4βˆ = T0 τ e1/a−(T0 /a)βˆ, from eqs. (10) and −t = T0 τ e1/a−(T0 /a)β , one obtains the equation of β, 2 (24). This equation can be solved analytically and yield βˆ = ln(2e1/a T0 τ )/(4 + T0 /a). From eq. (21), one obtains the expression for density of kinks as SA

ρ

ln 2(4 + Ta0 ) 1 ≈ ≈ . ξˆ ln τ + ln(2T0 ) + a1

(32)

This expression is reduced to eq. (7) for τ → ∞. The expression of residual energy per spin is obtained as (4 + Ta0 )2 π2 (33) εres ≈ ´ , ³ 24 ln τ + ln(2T ) + 1 2 0

a

which yields eq. (8) for τ → ∞. The asymptotic behaviors of density of kinks and residual energy for τ → ∞ are insensitive to the schedule of quenching temperature. 4.1. Monte-Carlo simulation We confirm the results on the basis of the Kibble-Zurek argument by the Monte-Carlo simulation for systems with 500 spins. The temperature is quenched according to the linear schedule, eq. (26), and the logarithmic schedule, eq. (31). We choose the initial condition of the temperature as T = 5 at t = −5τ for both schedules. This condition is attained by T0 = 5 and an arbitrary a in the logarithmic schedule. a is fixed at 10. To average over randomness of the system, we generated 100 configurations of coupling constants {Ji } according to eq. (2). For each configuration, simulated annealing is carried out 500 times. Figures 1 and 2 are results for density of kinks and residual energy respectively. Square symbols are obtained by Monte-Carlo simulation. We have to care about units of quantities to interpret results of the simulation. At first, the unit of time in the Glauber’s dynamics should be different from that in Monte-Carlo simulation. Then we bring up the relation, τ = cτ mc , between the inverse of annealing rate τ in the Glauber dynamics and that in Monte-Carlo simulation τ mc , where c is an unknown parameter. Next, the inverse of correlation length, 1/ξ is not exactly the same as the density of kinks defined by eq. (3). Then we introduce another parameter b and demand ξ = b/ρ. The curve for the linear schedule in Fig. 1 is given by ρ = 1/ξˆ with ξˆ determined by eq. (27) and τ = cτ mc . Parameters, b and c, are determined by the Monte-Carlo results with the largest two τ mc ’s. The fitting function for the logarithmic schedule is chosen as ρ = A/(ln τ mc + B). Parameters A and B are determined by means of the least square method. The obtained curves fit Monte-Carlo results nicely for both schedules. In order to fit residual energies after simulated annealing with linear schedule in Fig. 2, we 2 ˆ 2, put forward an ansatz that residual energy is given with a parameter α by εres = π24 /(αβ) where βˆ is determined by eq. (29) with τ = cτ mc . The parameter c is given from results for density of kinks. The other parameter α is determined by the Monte-Carlo result of residual energy for the largest τ mc . We assume the fitting function, εres = D/(ln τ mc + B)2 , for the logarithmic schedule. The fitting parameter D is averaged over values obtained from Monte(k) Carlo results εres for given τ mc (k) . B is given by the result of density of kinks. Evidently, these curves for residual energy are in good agreement with Monte-Carlo results.

95

ρ

linear schedule

logarithmic schedule

ln τmc Figure 1. Density of kinks after simulated annealing with linear and logarithmic schedules obtained by Monte-Carlo simulation (square symbols). The fitting function for linear schedule is given by ρ = b/ξˆ with the correlation length ξˆ determined by eq. (27) with τ = cτ mc . For logarithmic schedule, the fitting function is given by ρ = A/(ln τ mc + B). Parameters b, c, A, and B are adjusted so as to fit Monte-Carlo data.

εres

linear schedule

logarithmic schedule

ln τmc Figure 2. Residual energy obtained by Monte-Carlo simulation (square symbols). The fitting 2 ˆ 2 with βˆ determined by eq. (29) with function for linear schedule is given by εres = π24 /(αβ) mc τ = cτ . Data for logarithmic schedule are fitted by εres = D/(ln τ mc + B)2 . α and D are the fitting parameters. c and B are given from results of density of kinks. The parameters A, B and D for the logarithmic schedule relate with b, c, and α as A = b(4 + T0 /a) ln 2, B = ln(2cT0 ) + 1/a, and D = (π 2 /24)(4 + T0 /a)2 /α2 , where T0 = 5 and a = 10 in our simulation. From the values of A, B, and D, we estimate b ≈ 0.240, c ≈ 27.09, and α ≈ 2.145. These are roughly consistent with the values b ≈ 0.240, c ≈ 21.46, and α ≈ 2.137 obtained from results of the linear schedule. Although there seems to be inconsistency in c, we attribute it to statistical errors of the Monte-Carlo simulation. Indeed, the curve taking this difference into account lies inside error bars of the Monte-Carlo simulation. 5. Conclusion We compared quantum annealing and simulated annealing of the random Ising model in one dimension. Using the Kibble-Zurek argument, we can analytically obtain the estimation of residual errors after annealing. The results are summarized as follows. For quantum annealing,

96

density of kinks decays as

ρQA ∼ 1/(ln τ )2

for large τ , where τ is the inverse of the annealing rate. Residual energy after quantum annealing has not been available so far. For simulated annealing, density of kinks decays as ρSA ∼ 1/ ln τ, and residual energy as

2 εSA res ∼ 1/(ln τ )

for large τ . The decay rate of residual energy after simulated annealing is the same as the Huse-Fisher law, but we reproduced it in the different manner. Comparing density of kinks, the decay rate of quantum annealing is faster than that of simulated annealing. This is the first analytic evidence that quantum annealing has an advantage over simulated annealing. Acknowledgments The author acknowledges T. Caneva, G. E. Santoro, and H. Nishimori for valuable discussions and comments. The present work has been partially supported by CREST, JST. References [1] S. Kirkpatrick, C. D. Gelett, and M. P. Vecchi, Science 220, 671 (1983). [2] K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn., 65, 1604 (1996). [3] D. J. Earl and M. W. Deem, Phys. Chem. Chem. Phys., 7, 3910 (2005). [4] A. B. Finnila, M. A. Gomez, C. Sebenik, C. Stenson, and J. D. Doll, Chem. Phys. Lett. 219, 343 (1994). [5] T. Kadowaki and H. Nishimori, Phys. Rev. E 58, 5355 (1998). [6] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, e-print arXiv:quant-ph/0001106. [7] T. Kadowaki, Thesis, Tokyo Institute of Technology, e-print arXiv:quant-ph/0205020. [8] G. Santoro, R. Martoˇ na ´k, E. Tosatti, and R. Car, Science 295, 2427 (2002). [9] R. Martoˇ na ´k, G. Santoro, and E. Tosatti, Phys. Rev. E 70, 057701 (2004). [10] D. Battaglia, G. Santoro, and E. Tosatti, Phys. Rev. E 71, 066707 (2005). [11] J. Brooke, D. Bitko, T. F. Rosenbaum, and G. Aeppli, Science 284, 779 (1999). [12] J. Dziarmaga, Phys. Rev. B 74, 064416 (2006). [13] T. Caneva, R. Fazio, and G. Santoro, Phys. Rev. B 76, 144427 (2007). [14] D. Huse and D. Fisher, Phys. Rev. Lett. 57, 2203 (1986). [15] T. W. B. Kibble, Phys. Rep. 67, 183 (1980). [16] W. H. Zurek, Nature 317, 505 (1985). [17] R. J. Glauber, J. Math. Phys. 4, 294 (1963). [18] D. Dhar and M. Barma, J. Stat. Phys. 22, 259 (1980). [19] S. Suzuki, arXiv: 0807.2933.

97

98

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Quantum annealing for problems with ground-state degeneracy
Yoshiki Matsuda1, Hidetoshi Nishimori1 and Helmut G Katzgraber2

1 Department of Physics, Tokyo Institute of Technology, Tokyo 152-8551, Japan
2 Theoretische Physik, ETH Zürich, CH-8093 Zürich, Switzerland

Abstract. We study the performance of quantum annealing for systems with ground-state degeneracy by directly solving the Schrödinger equation for small systems and by quantum Monte Carlo simulations for larger systems. The results indicate that quantum annealing may not be well suited to identifying all degenerate ground-state configurations, although the value of the ground-state energy is often efficiently estimated. The strengths and weaknesses of quantum annealing for problems with degenerate ground states are discussed in comparison with classical simulated annealing.

1. Introduction Quantum annealing (QA) [1, 2] is the quantum-mechanical version of the simulated annealing (SA) [3] algorithm to study optimization problems. While the latter uses the slow annealing of (classical) thermal fluctuations to obtain a ground-state estimate, the former uses quantum fluctuations. An extensive body of numerical [4, 5, 6] as well as analytical [7] studies show that QA is generally more efficient than SA for the ground-state search (optimization) of classical Hamiltonians of the Ising type. This fact does not immediately imply, however, that SA will soon be replaced by QA in practical applications because the full implementation of QA needs an efficient method for solving the Schr¨odinger equation for large systems, a task optimally achievable only on quantum computers. Given continuing progress in the implementability of quantum computers, we thus continue to study the theoretical efficiency and the limit of applicability of QA using small-size prototypes and classical simulations of quantum systems, following the general spirit of quantum information theory. The present paper is a partial report on these efforts with a focus on the efficiency of QA when the ground state of the studied model is degenerate, i.e., when different configurations of the degrees of freedom yield the same lowest-possible energy. So far, almost all problems studied with QA have been for nondegenerate cases, and researchers have not paid particular attention to the role played by degeneracy. This question, however, needs careful scrutiny because many practical problems have degenerate ground states. If the goal of the minimization of a Hamiltonian (cost function) of a given problem is to obtain the ground-state energy (minimum of the cost function), it suffices to reach one of the degenerate ground states, which might often be easier than an equivalent nondegenerate problem because there are many states that are energetically equivalent. If, on the other hand, we are asked to identify all (or many of) the degenerate ground-state configurations (arguments of the cost function which minimize it) and not just the lowest value of the energy, we have to carefully

99

check if all ground states can be found. This would thus mean that the chosen algorithm can reach all possible ground-state configurations ergodically. We have investigated this problem for a few typical systems with degeneracy caused by frustration effects in the interactions between Ising spins. Our results indicate that QA is not necessarily well suited for the identification of all the degenerate ground states, i.e., the method fails to find certain ground-state configurations independent of the annealing rate. This is in contrast to SA, with which all the degenerate states are reached with almost equal probability if the annealing rate of the temperature is sufficiently slow. Nevertheless, when only the ground-state energy is needed, QA is found to be superior to SA in some example systems. The present paper is organized as follows: section 2 describes the solution of a small system by direct diagonalization and numerical integration of the Schrödinger equation; section 3 is devoted to studies of larger degenerate systems via quantum Monte Carlo simulations, followed by concluding remarks in section 4.

2. Schrödinger dynamics for a small system
It is instructive to first study a small system by a direct solution of the Schrödinger equation, both in stationary and nonstationary contexts. The classical optimization problem for this purpose is chosen to be a five-spin system with interactions as shown in figure 1.

Figure 1. Five-spin toy model studied. Full lines denote ferromagnetic interactions (Jij = 1) while dashed lines stand for antiferromagnetic interactions (Jij = −1). Because of the geometry of the problem the system has a degenerate ground state by construction. The Hamiltonian of this system is given by X H0 = − Jij σiz σjz ,

(1)

hiji

where the sum is over all nearest-neighbor interactions Jij = ±1 and σiz denote Ising spins parallel to the z-axis. The system has six degenerate ground states, three of which are shown in figure 2. We apply a transverse field H1 = −

5 X

σix

(2)

i=1

to the system H0 to induce a quantum transitions between classical states. Hamiltonian H(t) changes from H1 at t = 0 to H0 at t = τ , i.e.,   t t H1 + H0 . H(t) = 1 − τ τ

The total

(3)
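The following sketch (not from the paper) illustrates the procedure of equations (1)-(3): build H(t), integrate the Schrödinger equation over the annealing time τ, and read off the final weights on the degenerate classical ground states. It uses NumPy/SciPy; the couplings in `bonds`, the value τ = 50 and the number of integration steps are arbitrary stand-ins, since the actual bonds of figure 1 are only specified graphically there.

```python
import numpy as np
from scipy.linalg import expm

n = 5
# Hypothetical frustrated couplings on pairs (i, j); NOT the ones of figure 1.
bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 0): 1, (0, 2): -1, (0, 3): -1}

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def op(single, site):
    # single-site operator embedded in the n-spin Hilbert space
    mats = [np.eye(2, dtype=complex)] * n
    mats[site] = single
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

H0 = -sum(J * op(sz, i) @ op(sz, j) for (i, j), J in bonds.items())   # Eq. (1)
H1 = -sum(op(sx, i) for i in range(n))                                 # Eq. (2)

def anneal(tau, steps=2000):
    # Ground state of H1: uniform superposition of all basis states.
    psi = np.ones(2 ** n, dtype=complex) / np.sqrt(2 ** n)
    dt = tau / steps
    for k in range(steps):
        s = (k + 0.5) / steps
        H = (1 - s) * H1 + s * H0                                      # Eq. (3)
        psi = expm(-1j * H * dt) @ psi
    return psi

# Degenerate classical ground states of H0 (diagonal in the sigma^z basis).
diag = np.real(np.diag(H0))
ground = np.flatnonzero(np.isclose(diag, diag.min()))

psi = anneal(tau=50.0)
for idx in ground:
    spins = [1 - 2 * int(b) for b in format(idx, f"0{n}b")]
    print(spins, abs(psi[idx]) ** 2)   # final weight on each degenerate ground state
```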

For large τ the system is more likely to follow the instantaneous ground state according to the adiabatic theorem. If the target optimization Hamiltonian H0 had no degeneracy in the

100

|1i

|2i

|3i

Figure 2. Nontrivial degenerate ground states of the toy model shown in figure 1. The other three ground states |¯1i, |¯2i, and |¯3i are obtained from |1i, |2i, and |3i by reversing all spins. ground state, the simple adiabatic evolution (τ ≫ 1) would drive the system from the trivial initial ground state of H1 to the nontrivial final ground state of H0 (solution of the optimization problem). The situation changes significantly for the present degenerate case as illustrated in figure 3, which depicts the instantaneous energy spectrum. Some of the excited states reach the final 4

Energy

2 0 -2 -4 0.0

0.2

0.4

0.6 tΤ

0.8

1.0

Figure 3. Instantaneous energy spectrum of the five-spin system depicted in figure 1. For simplicity we have omitted the energy levels that are not reachable from the ground state due to different symmetry properties.

ground state as t/τ → 1. In particular, the instantaneous ground state configurations have been found to be continuously connected to a special symmetric combination of four of the final ground states at t = τ , |2i + |¯2i + |3i + |¯3i, whereas the other two states |1i and |¯1i are out of reach as long as the system faithfully follows the instantaneous ground state (τ ≫ 1). A relatively quick time evolution with an intermediate value of τ may catch the missed ground states. However, there is no guarantee that the obtained state using this procedure is a true ground state since one of the final excited states may be reached. As shown in the left panel of figure 4, intermediate values of τ around 10 give almost an even probability to all the true ground states, an ideal situation. However, the problem is that we do not know an appropriate value of τ beforehand. In contrast, the right panel of figure 4 shows the result of SA by a direct numerical integration of the master equation, in which all the states are reached evenly in the limit of large τ . Figure 3 suggests that it might be plausible to start from one of the low-lying excited states of H1 to reach the missed ground state. However, such a process has also been found to cause similar problems as above. We therefore conclude that QA is not suitable to find all degenerate ground-state configurations of the target system H0 , at least in the present example. This aspect is to be contrasted with SA, in which infinitely slow annealing of the temperature certainly finds all ground states with equal probability as assured by the theorem of Geman and Geman [8]. QA nevertheless shows astounding robustness against a small perturbation that lifts part of the degeneracy if our interest is in the value of the ground-state energy. Figure 5 depicts the

101

1.0

|1> |2> |3>

0.8

total

0.8

total

)

|1> |2> |3>

0.6

P(

)

P(

1.0

0.6

0.4

0.4

0.2

0.2

0.0 10

-2

10

-1

10

0

10

1

10

2

10

3

10

0.0

4

10

-2

10

-1

10

0

10

1

10

2

10

3

10

4

Figure 4. Annealing-time dependence of the final probability that the system is in any one of the ground states. Left panel: Data for the five-spin model using QA. Only the states |2i and |3i (and their reversals |¯2i and |¯3i) are reached for large τ . Right panel: In contrast, SA finds all the states with equal probability. residual energy—the difference between the obtained approximate energy and the true groundstate energy—as a function of the annealing time τ . Data using SA are also shown in figure

0

10

QA SA

10

10

10

10

Residual Energy

Residual Energy

10

-2

-4

-6

10

-2

10

-1

10

0

10

1

10

2

10

3

10

10

10

10

4

QA SA

10

-8

0

-2

-4

-6

-8

10

-2

10

-1

10

0

10

1

10

2

10

3

10

4

Figure 5. Residual energy as a function of the annealing time τ for the five-spin toy model. Left panel: Degenerate case. Right panel: A small perturbation h = 0.10 [see equation (4)] has been added to lift the overall spin-reversal symmetry and thus break the degeneracy. While QA is rather robust against the inclusion of a field term and the residual energy decays in both cases ∼ τ −2 for large τ , SA seems to not converge after the inclusion of a field term. Dotted lines represent the results in zero field for comparison. 5 using an annealing schedule of temperature T = (τ − t)/t, corresponding to the ratio of the first and the second terms on the right-hand side of equation (3). In the degenerate case (left panel) it seems that SA outperforms QA since the residual energy decays more rapidly using

102

SA. However, if we apply a small longitudinal field to H0 H2 = −h

5 X

σiz

(4)

i=1

and regard H0 +H2 as the target Hamiltonian to be minimized, the situation changes drastically: The convergence of SA slows down significantly while the convergence of QA remains almost intact (right panel of figure 5). As already observed in the simple double-well potential problem [9], the energy barrier between the two almost degenerate states of H0 + H2 may be too high to be surmountable by SA whereas the width of the barrier remains thin enough to allow for quantum tunnelling to keep QA working. 3. Monte Carlo simulations for larger systems The simplest Ising model which possesses an exponentially-large (in the system size) groundstate degeneracy is the two-dimensional Villain fully-frustrated Ising model [10]. The Ising model, which we study here using quantum Monte Carlo simulations, is defined on a square lattice of size N = L × L and has alternating ferromagnetic and antiferromagnetic interactions in the horizontal direction and ferromagnetic interactions in the vertical direction; see figure 6. We use periodic boundary conditions and show data for L = 6. The left panel of figure 7 shows

Figure 6. Fully-frustrated Ising model with N = 36 spins (dots) and periodic boundary conditions. The system has 45088 degenerate ground states (excluding spin-reversal symmetry). The horizontal bonds alternate between ferromagnetic (full lines) and antiferromagnetic (dashed lines). The vertical bonds are all ferromagnetic. This ensures that the product of all bonds around any plaquette is negative, i.e., the system is maximally frustrated. a linear-log plot of the relative number of ground-state hits versus the ground-state numbering (sorted by the number of hits). While a small part of the total set of ground states are reached very frequently, some ground states seem exponentially suppressed. Furthermore, this seems almost independent of the chosen value of the annealing time τ . In the right panel of figure 7 we show the relative number of ground-state hits versus the ground-state numbering for SA. In contrast to QA, SA finds almost all ground states with even probability. While some ground states seem to be preferred, all ground states can be reached with a frequency of at least 40% (see also Ref. [11]). The residual energy is shown in the left panel of figure 8. The data for SA follow a rapid decrease beyond τ ≈ 2 × 103 , whereas QA stays unimproved beyond this region. Note that the energy for QA has been estimated as the average value of energies of all Trotter slices that emerge in the quantum-classical mapping for the quantum Monte Carlo simulation. The best value among the Trotter slices shows better performance. Since we are not certain if such a process constitutes a fare comparison with SA, we avoid further details here. Finally, we also show some preliminary data for the two-dimensional bimodal (±J) Ising spin glass [12]. In this case the situation is quite different, as can be seen in the right panel of figure 8. The residual energy using QA decreases more rapidly than when using SA. For this particular model it seems that QA is a powerful tool, as witnessed in previous studies [1, 4, 5, 6].

103

0

10

1.0

-1

0.8

Frequency

Frequency

10

-2

10

-3

10

0.6

0.4

=100

=100 -4

10

0.2

=300

=1000

=1000

-5

0.0

10

0

10000

=300

20000

30000

0

40000

10000

20000

30000

40000

Figure 7. Histograms of the relative frequency that a given ground state is reached by QA (left panel) and SA (right panel). In the abscissa the ground states are numbered according to their relative frequency to be reached, and thus the histograms are monotonically decreasing. While SA finds most ground states evenly and all ground states can be reached at least 40% of the time, in QA some ground states seem exponentially suppressed.

Residual Energy

10

10

10

10

-1

-2

Residual Energy

10

-3

-4

-5

10

10

-1

-2

QA

QA

10

SA

-6

10

1

10 10

2

10

3

4

10

10

SA

-3

10

5

1

10

2

10

3

10

4

10

5

Figure 8. Left panel: Residual energy for the fully-frustrated Ising model with L = 40 using QA and SA. While the residual energy for SA decreases monotonically, for QA it saturates around τ ≈ 2 × 103 . Right panel: Residual energy for the ±J Ising spin glass with L = 40. QA shows a more rapid decay than SA both in the sample average and minimum value between samples. 4. Conclusion We have studied the performance of QA for systems with degeneracy in the ground state of the target Hamiltonian. Our results show that QA reaches only a limited part of the set of degenerate ground-state configurations and misses the other states. The instantaneous energy spectrum as a function of time, figure 3, is a useful tool to understand the situation: some of the excited states merge to the final ground states as t/τ → 1. It is usually difficult, however, to initially select the appropriate excited states which reach the ground state when t/τ → 1. We therefore have to be biased a priori in the state search by QA for degenerate cases. Simulations

104

on nontrivial Ising models with ground-state degeneracy such as the two-dimensional Villain model show that certain ground-state configurations are exponentially suppressed when using QA, whereas this is not the case when using SA. Nevertheless, if the problem is to find the ground-state energy, QA is often (but not necessarily always) an efficient method as exemplified in figure 5 where the residual energies are shown. It is clearly necessary to study more examples and to construct an analytical theory based on the numerical evidence to establish criteria when to (and when not to) use QA. Acknowledgement The present work was supported by the Grant-in-Aid for Scientific Research for the priority area ‘Deepening and expansion of statistical-mechanical informatics’ (DEX-SMI) and CREST, JST. H.G.K. acknowledges support from the Swiss National Science Foundation under Grant No. PP002-114713. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]

Kadowaki T and Nishimori H 1998 Phys. Rev. E 58 5355 Finnila A B, Gomez M A, Sebenik C, Stenson C and Doll J D 1994 Chem. Phys. Lett. 219 343 Kirkpatrick S, Gelatt, Jr C D and Vecchi M P 1983 Science 220 671 Das A and Chakrabarti B K 2005 Quantum Annealing and Related Optimization Methods (Edited by A. Das and B.K. Chakrabarti, Lecture Notes in Physics 679, Berlin: Springer) Santoro G E and Tosatti E 2006 J. Phys. A 39 R393 Das A and Chakrabarti B K 2008 Quantum Annealing and Analog Quantum Computation (arXiv:0801.2193 [quant-ph]), to appear in Rev. Mod. Phys. Morita S and Nishimori H 2008 Mathematical Foundation of Quantum Annealing (arXiv:0806.1859 [quantph]), to appear in J. Math. Phys. Geman S and Geman D 1984 IEEE Trans. Pattern. Analy. Mach. Intell. PAMI-6 721 Stella L, Santoro G E and Tosatti E 2005 Phys. Rev. B 72 014303 Villain J 1977 J. Phys. C 10 1717 Moreno J J, Katzgraber H G and Hartmann A K 2003 Int. J. Mod. Phys. C 14 285 Binder K and Young A P 1986 Rev. Mod. Phys. 58 801

105

106

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Re‘class’ification of ‘quant’ified classical simulated annealing Toshiyuki Tanaka Department of Systems Science, Graduate School of Informatics, Kyoto University, 36-1 Yoshida Hon-machi, Sakyo-ku, Kyoto-shi, Kyoto 606-8501, Japan E-mail: [email protected] Abstract. We discuss a classical reinterpretation of quantum-mechanics-based analysis of classical Markov chains with detailed balance, that is based on the quantum-classical correspondence. The classical reinterpretation is then used to demonstrate that it successfully reproduces a sufficient condition for cooling schedule in classical simulated annealing, which has the inverse-logarithm scaling.

1. Introduction Theoretical properties of simulated annealing [1] have been extensively studied in the 1980s [2, 3]. One of the main issues in those research activities was regarding the annealing schedule: How should one decrease temperature T (t) as a function of time t in order to finally arrive at a globally optimum solution? Geman and Geman [4] were the first to obtain an answer, which states a sufficient condition of the form T (t) ≥ O(1/ log t). The inverse-logarithm scaling turned out to be universal, in the sense that it is also sufficient for many variants of simulated annealing and some other algorithms. Hajek [5] proved a necessary and sufficient condition which also has the inverse-logarithm form, showing that one cannot do the cooling any faster than that while guaranteeing global optimality. Somma et al., in their recent contribution [6], have shown that the inverse-logarithm scaling of simulated annealing can also be obtained via the adiabatic condition [7] of a related quantummechanical system. The relationship between the original Markov chain in simulated annealing and the quantum system is established via the so-called classical-quantum mapping or quantumclassical correspondence [8, 9, 10]. In this paper, we discuss a classical reformulation of quantum equivalent of a classical Markov chain with detailed balance, in order to elucidate mathematical structure of the correspondence between a Markov chain and its quantum equivalent, without making reference to quantum mechanics. We also discuss another classical reformulation of the argument deriving the optimal inverse-logarithm scaling of annealing schedule [6] (see also [11]), which is based on the quantum adiabatic theorem. This paper is organized as follows. In section 2 we first provide a basic formulation of classical Markov chains with detailed balance, and derive its α-representation. A local linear approximation of the time evolution in terms of α-representation is also discussed. Our derivation of the inverse-logarithm scaling of simulated annealing is discussed in section 3. A “chasing” view of simulated annealing, that is based on the local linear approximation based on the 0-representation, and a bound of the largest negative eigenvalue are used in the derivation.

107

In section 4 we discuss relation between our formulation and the stochastic matrix form decomposition, which is defined and discussed extensively in [9]. Section 5 concludes the paper. 2. Basic formulations 2.1. Markov chains Let S denote a state space, which is a finite set of cardinarity N . Let E be an “energy function” defined on S, which associates a state i ∈ S with its energy Ei . Then, one can define a probability distribution on S, in terms of a probability vector ρ¯ = (¯ ρi ), as ρ¯i =

e−βEi , Z

Z=



e−βEi ,

(1)

i∈S

which is the Gibbs-Boltzmann distribution induced by the energy function E, with β > 0 a parameter corresponding to the inverse temperature. Let us consider a undirected graph G with S its vertex set and an edge set L. We assume G to be a connected graph, without self-edge (loop). We define a transition matrix M = (mij ) as ⎧ β(Ej −Ei )/2 ((ij) ∈ L) ⎨ w ij e (i = j) − k: (ik)∈L mki , mij = ⎩ 0 (otherwise)

(2)

where W = (wij ) is a symmetric matrix with wij > 0 for (ij) ∈ L. On the basis of the transition matrix M , one can define a continuous-time Markov chain, as ρ˙ = M ρ.

(3)

The connectedness of the graph G induces irreducibility of the Markov chain. The Markov chain is also aperiodic, so that it is ergodic, and therefore bears a unique equilibrium distribution. The Gibbs-Boltzmann distribution (1) is the equilibrium distribution of the Markov chain, since M ρ¯ = 0 holds. The formulation presented here is general, including various typical systems as special cases. For example, conventional Ising spin systems are described by letting S = {−1, 1}n with N = 2n and L having an n-dimensional hypercubic structure. Metropolis and Glauber dynamics are implemented by letting   (4) wij ∝ max eβ(Ej −Ei )/2 , eβ(Ei −Ej )/2 , and wij ∝

eβ(Ej −Ei )/2

1 , + eβ(Ei −Ej )/2

(5)

respectively, as mentioned in [9]. 2.2. α-representation We discuss a different representation of the continuous-time Markov chain (3), in view of the classical-to-quantum mapping utilized in [6]. Although the quantum reformulation mapped √ from a classical Markov chain makes use of square roots of probabilities { ρi }, we here discuss a slightly more generalized expression which is based on the so-called α-representation of ρ. (α) Definition. We define the α-representation ψ (α) = (ψi ) of ρ as (α)

ψi

=

2 (1−α)/2 ρ . 1−α i

108

(6)

The concept of α-representation is originally introduced in information geometry [12, 13], in order to discuss intrinsic geometrical structures of statistical manifolds. Taking square roots of probabilities corresponds to considering 0-representation. Although not used in this paper, 1-representation is defined as (1) (7) ψi = log ρi . We next derive an expression of the Markov chain in terms of the α-representation. One has (α) −(1+α)/2 = ρi ψ˙ i



mij ρj

j∈S

=

−(1+α)/2 1 ρi

−α (1+α)/2 (α) mij ρj ψj , 2

(8)

j∈S

which is rewritten, in a vector-matrix form, as 1 − α (−α) (α) H ψ , ψ˙ (α) = 2

(9)

where the matrix H (α) is defined as H (α) = (Ψ(α) )−1 M Ψ(α) ,

(10)

with Ψ(α) = diag(ψiα ). Clearly, eigenvalues of the matrix H (α) are the same as those of M . The (−α) elements of the matrix H (−α) = (hij ) are given by

(−α)

hij



−1 (−α) β(Ej −Ei )/2 ψ (−α) ⎪ w e ψj ((ij) ∈ L) ⎨ ij i  , = (i = j) − k: (ik)∈L wki eβ(Ei −Ek )/2 ⎪ ⎩ 0 (otherwise)

(11)

and consequently, ¯ (−α) = h(−α) |ρ=ρ¯ = h ij ij

⎧ ⎨ ⎩



wij eαβ(Ei −Ej )/2 ((ij) ∈ L) β(Ei −Ek )/2 w e (i = j) . ki k: (ik)∈L 0 (otherwise)



(12)

The above expression evidently shows that the 0-representation is special in our formulation, ¯ (−α) = H (−α) |ρ=ρ¯ becomes symmetric when α = 0, that is, under the in that the matrix H 0-representation. The fact that the 0-representation symmetrizes the transition matrix M was also mentioned in [14], in order to state that eigenvalues of M are all real. It should be noted that the matrix H (−α) is dependent on ψ (α) via Ψ(−α) and therefore H (0) does not symmetric at ρ = ρ¯ in general. 2.3. Time evolution We discuss linearization of the α-representation of the dynamical equation. Starting from the nonlinear dynamics

1 − α (α) −(1+α)/(1−α)  (α) (α) 2/(1−α) ψi mij ψj , ψ˙ i = 2 j∈S

109

(13)

and considering a small perturbation δψ (α) around ψ (α) , we obtain the following linearized system which describes time evolution of δψ (α) :



 (α) (α) −(1+α)/(1−α) (α) (1+α)/(1−α) (α) = ψi mij ψj δψj δψ˙ i j∈S







2/(1−α) 1 + α ⎣ (α) −2/(1−α)  (α) ⎦ δψ (α) + o δψ (α)  . ψi − mij ψj i 2

(14)

j∈S

In particular, observing that the second term of the right-hand side of (14) vanishes at the ¯ irrespective of the value of α, the linearization around the Gibbs-Boltzmann distribution ρ, equilibrium point becomes, ignoring higher-order terms, ¯ (−α) δψ (α) . (15) δψ˙ (α) = H ¯ (−α) governs the local dynamics described in terms Equation (15) states that the matrix H of α-representation in the vicinity of the equilibrium distribution ρ¯. It should be noted that the right-hand side of (14) is in general not a projection of H (−α) δψ (α) onto the manifold of probability distributions in α-representation, defined as   1 − α (α) 2/(1−α) ψi = 1. (16) 2 i∈S

3. Simulated annealing 3.1. Relaxation in annealing With the inverse temperature β fixed, the distribution following the Markov chain relaxes toward the Gibbs-Boltzmann distribution. The basic idea behind simulated annealing is that by gradually reducing the temperature one can arrive at a distribution which concentrates on a set of minimum-energy states. Thus, by performing simulations of the Markov chain with a proper cooling schedule, one expects to obtain minimum-energy states with probability close to 1. One of the basic questions regarding simulated annealing is to determine the cooling schedule which guarantees convergence to minimum-energy states. We wish to study this problem via the linearized local dynamics in α-representation (15), with α = 0. Intuitively, our expectation is that if simulated annealing works well the distribution should stay very close to instantaneous equilibrium distributions as β is changed slowly enough. If it is the case, then arguments that are based on the local linear approximation around the ¯ (0) is symmetric, all eigenvalues equilibrium (15) will be justified. Since the coefficient matrix H are real, so that the local dynamics around equilibrium is a simple linear relaxation toward ¯ (0) govern the speed of relaxation. In simulated the equilibrium, with negative eigenvalues of H annealing the instantaneous equilibium distribution is also slowly drifting as β changes. One can therefore expect to obtain a minimum-energy distribution only if the drift is slow enough so that the relaxation process is managed to catch up with the drift. What is important for ¯ (0) . successful convergence of simulated annealing is thus the largest negative eigenvalue of H 3.2. Bound on largest negative eigengalue We let ¯ (0) )N , M = (bI + χH where χ = e−βd/2 /wmax , with d = maxi, j |Ei −Ej | and wmax ¯ (0) nondiverging as β gets large, and where elements of χH 

b = 1 + max i∈S

k:(ki)∈L

110

(17) = max(ij)∈L wij , is to make diagonal

wki wmax

(18)

¯ (0) becomes a non-negative matrix. Irreducibility of the original Markov is chosen so that bI +χH chain guarantees M to be a (strictly) positive matrix. The following theorem for positive matrices, due to Hopf [15] in its operator form, is applied to obtain an upper bound of the largest negative eigenvalue. Theorem 1 Let A = (aij ) be a square matrix that is positive, i.e., aij > 0 holds for all i, j. Then the maximum eigenvalue λ0 of A and any other eigenvalues λ satisfy the inequality |λ| ≤

κ−1 λ0 , κ+1

(19)

aik . ajk

(20)

where κ = max i, j, k

¯ (0) are bounded from below by min{1, wmin χ}, All positive elements of the matrix bI + χH where wmin = min(ij)∈L wij , and wmin χ actually gives the lower bound for not too small values of β. A lower bound of the minimum element of M is thus (wmin χ)N . Alternatively, the matrix ¯ (0) is upper bounded componentwise by the matrix (b − 1)I + 11T , where 11T is an all-1 bI + χH matrix, so that an upper bound of the maximum element of M is given by (3N )N . An upper bound of the parameter κ is therefore evaluated as  κ≤

3N wmin χ

N (21)

.

¯ (0) , and hence of M, makes the argument of bounding Note that symmetry of the matrix bI +χH κ straightforward, thereby demonstrating efficiency of the 0-representation. ¯ (0) has a zero eigenvalue which ¯ (0) . Since we know that H Let λ be a negative eigenvalue of H is the largest, applying theorem 1 yields (b + χλ)N ≤

κ−1 N b , κ+1

(22)

and consequently, λ≤−

2b(wmin χ)N b(wmin χ)N 2b ≤− ≤ − , N (κ + 1) N [(3N )N + (wmin χ)N ] N (3N )N

(23)

where we used the inequality 1 − [(κ − 1)/(κ + 1)]1/N ≥ 2/[N (κ + 1)] for κ, N ≥ 1. To make clear its dependence on β, we rewrite it as 

λ ≤ −δe−βN (d+d )/2 ,

δ=

b , N (3N )N

(24)

where we have taken into account possible dependence of wij on β, by assuming that  wmin ≥ e−βd /2 , wmax

holds.

111

d ≥ 0

(25)

≤ Ce−βg/2β˙ r

Target

≥ |λ|r Chaser

Figure 1. A “chasing” view of simulated annealing. 3.3. Simulated annealing as a chase of target From now on we assume the inverse temperature β to be a function of time t, and consider speed of drift of the instantaneous equilibrium distribution ψ¯(0) . We have  (0) 2  ˙¯  (26) ψ  = Cov(E)β˙ 2 ≤ C 2 e−βg β˙ 2 , where g is an energy gap between the lowest and the second lowest energies in {Ei ; i ∈ S}, and where C > 0 is a constant independent of β. Now the problem is recast into the problem of “chasing” a drifting target (see figure 1), whose ˙ The speed of the chaser is no less than |λ|r, where r is the velocity is no more than Ce−βg/2 β. “distance” between the chaser and the target, because the speed is determined by gradient¯ (0) . In view of the adiabatic theorem, which lays the descent of a potential surface induced by H basis of the quantum-mechanics-based analysis of simulated annealing [6], we assume that r is small throughout the process, so that the local linear approximation of the dynamics is valid. We wish to obtain a sufficient condition for β, as a function of time t, such that r tends to 0 as t → ∞ and β → ∞. With a modest amount of foresight, we assume that r approaches zero as r ∼ r0 t−γ with 0 < γ < 1. Since the speed of the chaser should be larger than that of the target, as a sufficient condition one has 

˙ δr0 e−βN (d+d )/2 t−γ > Ce−βg/2 β.

(27)

Solving it for β, we obtain for large enough t, β


N (d + d ) 2 log t

(29)

as a sufficient condition for simulated annealing to converge to a minimum-energy distribution.

112

4. Stochastic matrix form decomposition The stochastic matrix form (SMF) decomposition, defined in [9], is a key to establishing the classical-to-quantum mapping. In this section, we briefly discuss relation between our formulation and the SMF decomposition. The SMF decomposition of H (α) is given by H (α) =



(α)

wij Hij ,

(30)

(ij)∈L

with Hij = eβ(Ej −Ei )/2 (α)



 

 (α) −1 (α) (α) −1 (α) ψj Eij − Ejj + eβ(Ei −Ej )/2 ψj ψi Eji − Eii , ψi

(31)

¯ (α) denote the matrix where Eij is a matrix with (i, j) element being 1 and others being 0. Let H ¯ (α) = H (α) |ρ=ρ¯. When evaluated at the equilibrium distribution of the Markov chain, that is, H ij α = 0, it becomes ¯ (0) = Eij + Eji − eβ(Ei −Ej )/2 Eii − eβ(Ej −Ei )/2 Ejj . (32) H ij ¯ (0) is symmetric. The matrix H ij The α-representation of the Gibbs-Boltzmann distribution, ψ¯(α) , is an eigenvector of the ¯ (−α) with eigenvalue 0, that is, matrix H ij ¯ (−α) ψ¯(α) = 0 H ij

(33)

holds. This condition corresponds to the detailed-balance condition of the original formulation of the Markov chain. Note that it is consistent with the fact that ψ¯(α) is an eigenvector of the ¯ (−α) with eigenvalue 0. matrix H 5. Conclusion In this paper, we have discussed a classical reinterpretation of the quantum-mechanicsbased analysis of classical simulated annealing [6], that is based on the quantum-classical correspondence [8, 9, 10]. We have provided a reformulation of a Markov chain with detailed balance via the α-representation, as well as its local linear approximation of time evolution. It has been shown that the local linear approximation preserving the eigenvalues of the original Markov chain (equation (15)) is valid only in the vicinity of the equilibrium distribution. On the basis of the 0-representation-based reformulation, we have shown that the inverse-logarithm scaling of temperature in simulated annealing that guarantee optimality is successfully reproduced on the basis of our formulation, without recourse to quantum adiabatic theorem. We believe that usefulness of the α-representation of Markov chains with detailed balance goes well beyond just deriving the inverse-logarithm scaling, and hope that our reformulation helps shed light on the usefulness of the α-representation in more general context. Acknowledgments Support from the Grant-in-Aid for Scientific Research on Priority Areas, Ministry of Education, Culture, Sports, Science and Technology, Japan (no. 18079010) is acknowledged.

113

References [1] Kirkpatrick S, C D Gelatt J and Vecchi M P 1983 Science 220 671–680 [2] van Laarhoven P J M and Aarts E H L 1988 Simulated Annealing: Theory and Applications (Kluwer Academic Publishers) [3] Aarts E and Korst J 1989 Simulated Annealing and Boltzmann Machines: A Stochastic Ppproach to Combinatorial Optimization and Neural Computing (John Wiley and Sons Ltd.) [4] Geman D and Geman S 1984 IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6 721–741 [5] Hajek B 1988 Mathematics of Operations Research 13 311–329 [6] Somma R D, Batista C D and Ortiz G 2007 Physical Review Letters 99 030603–1–4 [7] Messiah A 1958 Quantum Mechanics (John Wiley & Sons) [8] Henley C L 2004 Journal of Physics: Condensed Matter 16 S891–S898 [9] Castelnovo C, Chamon C, Mudry C and Pujol P 2005 Annals of Physics 318 316–344 [10] Verstraete F, Wolf M M, Perez-Garcia D and Cirac J I 2006 Physical Review Letters 96 220601–1–4 [11] Morita S and Nishimori H 2008 Mathematical foundation of quantum annealing [online] arXiv:0806.1859v1 [quant-ph] [12] Amari S 1985 Differential-Geometrical Methods in Statistics (Lecture Notes in Statistics vol 28) (SpringerVerlag) [13] Amari S and Nagaoka H 2000 Methods of Information Geometry (Translations of Mathematical Monographs vol 191) (American Mathematical Society) [14] Mukherjee S, Nakanishi H and Fuchs N H 1994 Physical Review E 49 5032–5045 [15] Hopf E 1963 Journal of Mathematics and Mechanics 12 683–692

114

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Monte Carlo Approach to Phase Transitions in Quantum Systems Naoki Kawashima and Yasuyuki Kato Institute for Solid State Physics, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8581, JAPAN E-mail: [email protected] Abstract. We propose algorithms of the path-ingetral-based quantum Monte Carlo simulation, which is otherwise prohibitively slow. While the basic idea is the loop-cluster update, there are some important “tricks” that are vital to make the simulation practical. In the present paper, we show two such techniques and their successful applications to the two-dimensional SU(N ) Heisenberg model and the three-dimensional Bose Hubbard model. In the former, we obtain a new type of the valence-bond-solid state for the two-boson representation, while in the latter we equillibrate a system of which the size is comparable to a typical experiment of optical lattices.

1. Introduction Quantum Monte Carlo simulation [1] is a robust tool for investigating novel phenomena in strongly correlated quantum systems, potential candidates for a quantum computer. Various quantum systems on lattices are investigated recently as candidates of quantum computers, such as squid arrays [2], trapped ions [3], optical lattices [4], and quantum dots [5]. In most of such systems, the basic degrees of freedom (q-bits) are some discrete quantum degrees of freedom defined on a discrete space (i.e., lattice). Therefore, in theoretical studies of the quantum computers, quantum lattice models, such as the transverse Ising model or the XXZ model, often appear as convenient play-grounds. Boson models are also of a considerable interest from the same view-point since the model system can be the Bose-Hubbard model in some cases such as bosonic atoms trapped in an optical lattice. In addition, these lattice quantum models are worth studying in their own right, since these models exhibits strange quantum phases and critical phenomena which may lead to discovery of novel and fundamental quantum phenomena. In the present paper, we first review one of the most powerful numerical methods developed recently for quantum lattice models, namely, the world-line Monte Carlo method with directedloop algorithm (DLA) [6]. This is a simple framework suitable for general use, ranging from quantum spin systems to soft-core boson systems. It is also the basis of the new methods discussed in the present paper. Then, we discuss two typical examples, the SU(N ) Heisenberg model and the Bose-Hubbard model, and show that a straight-forward application of the DLA to these models does not necessarily yield a practical algorithm but the problem can be solved with some additional improvements. The SU(N ) Heisenberg model is the simplest generalization of the conventional SU(2) Heisenberg model to higher symmetry. For the ground state of this model, there is an interesting theoretical predictions [7] that the ground state is characterised by broken lattice-translational

115

and lattice-rotational symmetry and that the symmetry of the ground state depends periodically on the number of bosons in the representation of the model. The broken-symmetry state are generally called “valence-bond-solid (VBS)” states. In some quantum systems, including the SU(N ) Heisenberg model, there is a phase transition between the VBS state and the N´eel state as some parameter (N in the case of SU(N ) model) is changed. The nature of this phase transition is one of the subjects of the most intense arguments in recent year, because if the phase transition is of the second order, it does not fit in the conventional Landau-Fisher-Wilson framework of critical phenomena. However, this is not the main subject of the present paper. The Bose-Hubbard model is considered to be the model of the ultra-cold atom systems [8] trapped in an optical lattice. [9] Many researchers in condensed-matter physics expect that this system will eventually provide us with a variety of “real” strongly-correlated quantum systems which otherwise are mere toys of theoreticians. The Bose-Hubbard model is the first example system that is materialized in the optical lattice. Here, the central issue is to check various theoretical predictions in a real system, such as the relationship between the superfluidity and the condensation, the development of interference pattern in the static structure factor and its temperature dependence, etc. [10] A Monte Carlo simulation for a uniform system has been done to reveal that the periodic structure in the interference pattern does not precisely characterize the onset of the superfluidity [11], while the larger scale simulation with non-uniform chemical potential, which may be directly comparable with experiments, are still missing. For both systems, the SU(N ) model and the BH model, a straight-forward application of the directed-loop update algorithm becomes unpractical in several crucial conditions. In the case of the SU(N ) Heisenberg model, as we go to larger N and higher representations the number of local states becomes too large to handle in the form of a simple array in the computer memory. [12] In the case of the BH model, as we go to larger on-site repulsion U , the number of vertices, which is proportional to U and the squared upper-bound for the particle numbers, increases. This slows down the simulation significantly. [13, 14] In the present paper, we show methods for solving these problems. A part of the application results has been published in [12] for the SU(N ) model, and more details of the method together with some application results will be published in [14]. 2. Directed-Loop Update in Quantum Monte Carlo The directed-loop update was proposed [6] as a method of updating the configuration in the stochastic series expansion [15, 16]. Since the SSE formulation and the world-line Monte Carlo simulation based on the path-integral are identical in some limit [1], the directed-loop update can be regarded as a method of updating the world-line pattern in the path-integral Monte Carlo simulation. Since the present algorithm is based on the world-line quantum Monte Carlo simulation with the directed-loop update, we first give a brief review on this framework. For more details, see [1]. Suppose we want to study static properties of a d-dimensional quantum system characterized by the Hamiltonian H. The problem is reduced to computing expectation values of various quantities Q(Σ) with the weight W (Σ). 
Here, Σ(r, τ ) is the scalar field defined on a (d + 1)dimensional space-time, and Q and W are some functionals of Σ. In the present case, Σ(r, τ ) is an integer field called world-line configuration with r specifying a point on a discrete lattice Λ and τ ∈ [0, β) a continuous imaginary time. The weight W is given by the Feynman path integral   W (Σ) ≡



DΣ exp −

 β

dτ L(Σ(τ ), ∂τ Σ(τ )) .

(1)

0

A simulation with the world-line Monte Carlo method is a Markov process in the space of the world-line configurations. The updating procedure defines the transition matrix of the Markov

116

process and is arranged so that the resulting transition matrix satisfies the detailed-balance condition. In the directed-loop algorithm, a Monte Carlo step (often referred to as a sweep) consists of two phases. In the first phase we place vertices that represents various terms in the Hamiltonian. Vertices are often depicted as equal-time (horizontal) lines connecting several spatial positions. For instance, a two-body interaction corresponds to a vertex connecting two spatial positions. In the second phase we modify the world-line configurations. These two phases corresponds to the graph-updating process and the configuration-updating process in a more general framework of loop-cluster algorithm [17]. The procedure of the second phase is achieved by repeating a worm-updating cycles a number of times. In the directed-loop algorithm, this is done by extending a phase space. We extend the phase space by introducing two special points, which we call the head and the tail of the worm at which the continuity condition of the world-line configuration is violated. A worm-updating cycle consists of three procedures: creating a worm, moving the head, and annihilating a worm. The head moves along the temporal axis and changes its direction when it hits a vertex. When a worm head hits a vertex it can also change its spatial position (to which the current position is connected by the vertex). As a result, the Monte Carlo simulation is characterized by the following ingredients: (1) the vertex assignment density for the first phase, (2) the worm creation/annihilation probability for the second phase, and (3) the scattering probability of the head at vertices for the second phase. In general, one can study any lattice model with this prescription provided that it does not have the notorious negative sign problem. However, in practice, one often encounter problems in various specific applications. In what follows, we see what kind of problems may exist and how we can solve those. 3. SU(N ) Model and VBS States We first consider the simple generalization of the conventional Heisenberg model, i.e., the SU(N ) Heisenberg model. The model Hamiltonian we discuss is defined as N J   S αβ (r)S¯βα (r  ) H= N (r ,r  ) α,β=1

(2)

where the operators S αβ and S¯αβ are the generators of the SU(N ) algebra. Here we consider the simple square lattice with the periodic boundary condition. Similar to the case of the conventional SU(2) Heisenberg model, we need to specify the representation of the algebra we work with. Unlike the SU(2) case, however, since the conjugate representation is not identical to the original representation, we have two options in generalizing the SU(2) antiferromagnetic Heisenberg model to the SU(N) symmetry. Namely, we can use two representations, conjugate to each other, for two sub-lattices, while we can also use the same representation for all spins. While both options are equally natural extension of the conventional antiferromagnetic Heisenberg model, we use the first option, i.e., the alternating representation. In Eq.(2), we have used the ¯ for spins on one sub-lattice to remind ourselves of the fact that they belong to a symbol, S, different representation. In this paper, we further specify the model by considering only the symmetric representations (for the sub-lattice A). This corresponds to the Young tableaux with a single row. Still we have a tunable parameter, i.e., the number of columns, which we refer to as n in what follows. This number corresponds to 2S in the SU(2) case. Accordingly, the representation on the sub-lattice B is characterized by the N − 1 rows and n columns. Based on the 1/N expansion treatment, Read and Sachdev [7] predicted that the model with sufficiently large N has a valence-bond-solid (VBS) ground state with spontaneously-broken lattice symmetry. The nature of the ground states may depend on the representation, somewhat analogous to what is known as Haldane’s conjecture in the SU(2) models in one dimension. More

117

specifically, it was suggested [18, 7] that, for the model with the Young tableaux with m rows and n columns, the N -n phase diagram does not strongly depend on m, and has a line of phase transition separating the small-N N´eel region from the large-N VBS region. It was also argued that the nature of the VBS ground state can be classified according to the quotient of the division of n by 4. If n ≡ 1 or 3 (mod 4), the ground state has a columnar ordering with both the translational symmetry and the 90-degree rotational symmetry both broken, whereas if n ≡ 2 (mod 4) it has a “nematic” VBS ordering with only the lattice-rotational symmetry broken. Finally, if n is a multiple of 4, there is no spontaneous breaking of the lattice symmetry. A direct check of the spontaneous breaking-down of the translational symmetry for m = n = 1 was carried out in the previous paper [19], which yielded the transition value of N , namely 4 < N ∗ (m = n = 1) < 5. When we try to apply the above-mentioned framework of the directed-loop algorithm to this system, we immediately encounter the difficulty arising from very large number of local states of a vertex. In order to study systematically the N and n dependence of the ground state we need to generate tables of the scattering probabilities with which the head of the worm scatters at each vertex. The size of this table is proportional to Ns4 where Ns is the number of local state on a site. It is related to N and n by Ns ∝ N n , which makes the table size proportional to N 4n . Since the VBS state appears in the region N > 5n, in order to study, for example, the first uniform VBS state, which is expected around n = 4 and N ∼ 20, the table size will be 2016 , which is well beyond the capacity of the computer memory. We could reduce this number by making use of the fact that for a given initial state the number of possible final states is much smaller than N 2n . This may reduce the table size down to ∝ N 2n , which is 208 in the previous example. However, this may be still too large. Of course, instead of storing the whole table in the memory, we could compute the probability every time the head hits the vertex. However, if this is done in an arbitrary way, it slows down the simulation considerably. In the following, we show how we can handle this problem in a systematic and economical way. In the present article, we present a way of solving this problem by splitting each one of the original SU(N ) spins into n spins in the fundamental representation. In the familiar SU(2) case, this is to split a large spin of the magnitude S into 2S Pauli spins, which was originally proposed in [20]. In the present case, formally we replace the original SU(N ) spin operator with the n-boson representation by the sum of operators corresponding to the fundamental (i.e., the single-boson) representation. S αβ (r) ⇒ S˜αβ (r) ≡

n 

σμαβ (r).

(3)

μ=1

with σμαβ (r) being an SU(N ) spin in the fundamental representation. Accordingly, the phase space summation should be replace by 

TrS (f {S αβ (r)}) = Trσ (P f {S˜αβ (r)}).

(4)

Here P ≡ r P (r) where P (r) is the symmetrizer of n particles on the site r. Let us assume that a site r  belongs to the sub-lattice B for which the conjugate representation is used. Note that there is a one-to-one correspondence between states in one representation and those in the conjugate representation. In the case of the conjugate to the fundamental representation, for example, if we define |¯ αr  ≡ |1, 2, · · · , α − 1, α + 1, · · · , N r  ,

(5)

the singlet state can be formed between two sites r and r  by |singlet =

N 

|αr ⊗ |¯ αr 

α=1

118

(6)

provided that the states |¯ αr  are defined with appropriate signs. By using |¯ αr  as the basis vector for the sub-lattice B, the operator σμαβ (r  ) can be expressed as the same matrix that represents −σμβα (r) on sub-lattice A. Therefore, we can rewrite the pair Hamiltonian in the following way. N  n n  N  J   σ βα (r)σνβα (r  ). (7) Hrr = − N α=1 β=1 μ=1 ν=1 μ This form of the Hamiltonian immediately suggests a loop algorithm to be used. Namely, we should assign a graph (i.e., a vertex) which may be represented by two horizontal lines (Fig.1(a)). In the directed loop framework, this vertex means that when a worm head arrives at one of the four legs of a vertex, it should hop, with probability 1, to the neighboring site and change the direction of the motion. (Because of this deterministic nature of the vertex, the resulting directed-loop algorithm is almost identical with the loop algorithm.) The density of the vertices is J/N for a given pair of σ-spins if the local states on both ends are identical. Otherwise, the density is 0. With this density, we assign the vertices to all the n2 pairs of σ-spins. As in the SU(2) case, the symmetrization operator can be taken into account by random reconnection of the ends of the n lines at τ = β to those at τ = 0. [20] In Fig.1(b), loops of a system with only 4 spins are illustrated as an example. Along a loop, the local state is the same on any point, and it is one of the N values. Different loops can have different values.

D

C

Figure 1. The split-spin algorithm for the SU(N ) Heisenberg model in the case of two-boson representation. The elementary graph (a) and an example loop pattern in a 4-spin system (b).

In an actual simulation, we do not directly work with each individual σ spins. The actual computer program is written based on the “occupation number”, i.e., the number of bosons that has a specified color, nα (r) ≡

n 

σμ (r)

(8)

μ=1

where σμ (r) = 1, 2, · · · , N is the local state (color) of the μ-th boson. Shifting to this “secondquantized” picture can be done by the “coarse-graining” introduced in [21]. As a demonstration of the new algorithm, we here present results of simulation of the SU(15) Heisenberg model with the two-boson representation upto L = 32. According to the theoretical predictions [7], the ground state of the two-boson representation model should be the nematic VBS state at sufficiently large value of N . This is the state where the translational symmetry is

119

preserved while the 90-rotational symmetry is broken. In other words, the correlation between two nearest-neighbor spins is uniform but direction dependent. In our previous paper [19], we presented some results on the higher representations as well as the results on the fundamental representation. However, no clear evidence was found for the spontaneous rotational symmetry breaking in the case of two- and three- boson representations even if N is larger than the critical value, in contrast to the theoretical prediction. In the present paper, we carry out further parameter search and obtain an evidence of the rotational symmetry breaking. We define the following local quantity. A(r) ≡ Bx (r) − By (r) where Bμ (r) ≡

(9)

N 1  nα (r)nα (r + eμ ) n2 α=1

(10)

When integrated over all the lattice points, A(r) clearly yields the order parameter characterizing the spontaneous rotational symmetry breaking. Instead of measuring the integrated quantity, however, we here compute the two-point correlation with respect to this local quantity at the distance L/2 where L is the system size, A(0, 0)A(L/2, 0). If there is a finite longe-renge ordering, the correlation must converge to a finite value as L increases, whereas it should decrease exponentially when there is no long-range ordering.

㪓㪘㩿㪇㪃㪇㪀㪘㩿㪣㪆㪉㪃㪇㪀㪕

0.01

0.001

0.0001

1e-05

1e-06

β,0=1.0 β,0=2.0 β,0=4.0

1

10

100

㪣 Figure 2. The two-point correlation function of A, the rotation-symmetry-breaking orderparameter, of the SU(15) Heisenberg model as a function of the system size. The horisontal axis is shifted according to the simulation length. (Each symbol corresponds to a simulation 4 times shorter than the one represented by its next right neighbor.)

120

In Fig.2, we plot the correlation as a function of the system size in log-log scale. The horizontal axis is shifted according to the simulation length, so that the convergence as a function of the simulation length becomes more evident. In the initial equilibration stage of the simulation, the system is forced to have a finite asymmetry by making the coupling constant asymmetric. Therefore, for short simulations, the quantity A(r) is biased (to be positive), resulting in a relatively large non-zero values for shorter simulation, while for longer simulations such a bias should become negligible. For most cases presented in the figure, in particular at higher temperatures βJ/N = 1 and 2, the simulation has reached the equilibrium, whereas for βJ/N = 4 and L = 32, it is close to the equilibrium value but may still be larger than that. However, the trend is very clear that at sufficiently low temperature the two point correlation function does not exponentially decay (as a function of L). Rather, it is likely to converge to a finite value. 4. Bose Hubbard Model The second example of recent progress in the quantum Monte Carlo method we see in the preset paper is the Bose Hubbard model. The Bose-Hubbard model is considered to be the model of the ultra-cold atom systems [8] trapped in an optical lattice. Recently the Mott-superfluidity phase transition was observed in an optical lattice system [9]. Since the real system is “small” unlike most conventional solid-state-physics experiments, direct comparison between experiments and numerical simulation is possible. The system size in a well-known experiment by Greiner et al. [22] was L = 64. In the present paper we present some result of a simulation of the same systems size together with the technique that makes the simulation of such large systems possible. In the present paper, we consider the following Bose Hubbard model in a non-uniform chemical potential. H=−

 U t  † (bR bR + h.c.) + nR (nR − 1) − μR n R Z (RR ) 2 R R

(11)

The non-uniform chemical potential μR represents the trapping potential which is usually approximated by a parabolic form μR = μ0 − Ωr 2 . In the directed loop algorithm, the number of local states must be finite. Therefore, when we apply the algorithm to the BH model, we have to truncate the number of the particles on a single lattice point. If this artificial upper bound is not touched (so frequently) in the actual simulation, the error due to the truncation can be neglected. Therefore, in order to reduce the systematic error due to this truncation, we might want to make the maximum number as large as possible. However, this may make the number of vertices very large because the vertex density increases as a function of the artificial upper bound [13, 14]. To be more specific, the number of vertices is proportional to U and the squared upper-bound. Existence of too many vertices slows down the simulation significantly. In the previous paper [13], we presented a method for solving this problem. However, the increase of the computational time is not the only problem that arises in the Boson simulation. Another problem may arise from the memory requirement when the worm passes straight through most of the two-site vertices. In the present paper, we discuss a method for solving this problem and demonstrate the efficiency of the resulting algorithm by performing a large scale simulation whose size is the same as the above-mentioned experiment [22]. The essential idea is to skip all the stochastic events that actually do not change the direction or the spatial position of the worm head. In the first modification [13], we only skipped the straight-passings at one-site vertices. In the present paper, we propose a procedure in which straight-passings of all types of vertices are skipped. As a result, we no longer carry out the first phase, i.e., the phase of placing the vertices, separately. The vertices are created only when the

121

worm head changes the direction and/or the position and, as a result, a kink is generated there. In this regard, the resulting algorithm is very close to the worm algorithm [23], in which there is no notion of vertices. However, the present algorithm may be simpler in that we consider only scattering processes of the worm head whereas in the worm algorithm several different types of procedures, such as jumps/anti-jumps and reconnections must be executed in addition to somewhat complicated processes of choosing the next position of the worm head according to the “mean-field” that the head feels [23].

C

D

E

+

+

+

Figure 3. The head’s motion in the improved directed-loop algorithm for the Bose-Hubbard model. The open triangle indicates the position and the direction of the worm head just after the previous step, whereas the filled triangle represents the worm head at the end of the current step. The short right-pointing arrow indicates the position of the stochastically generated first scattering time τfirst . The local states are indicated by the thickness of the vertical lines. The dashed vertical line represents the part that will be changed by the current step. The dashed horizontal ones are vertices. Three cases are illustrated: (a) the first scattering time τfirst is outside of the current constant-environment interval, I, and there is no change in the state on the current site at the end of I, (b) τfirst is beyond the boundary of I, and there is a change in the state at the end of I, (c) τfirst is within I. The actual prescription of avoiding all the straight-passing processes is simple. It is analogous to the decay of a radio active atom. When the decay rate is λ, the probability of having a decay event in an infinitesimal time interval Δτ is λΔτ provided that the event has not taken place at any earlier time. We can simulate this decaying process by discretizing the time. Namely, starting from the first time window, we stochastically decide whether the atom decays or not for each time window, and when the decay event finally takes place at some time window we stop there. An alternative procedure which is statistically equivalent to the former is that we first stochastically generate the first decay time, wait until this time, let the atom decay, and stop there. Although statistically equivalent, in the first procedure, the number of operation is proportional to the skipped time-windows whereas the number of operations is O(1) in the latter procedure. If we apply this analogy to the process of a worm head passing a number of two-site vertices, we obtain the following prescription. Consider a head traveling along a vertical line and an “constant-environment” interval I is ahead of the worm. In this imaginary-time interval, there is no change in the environment, namely, there is no change in the local state at any of the

122

site that interacts with the current site. In other words, the constant-environment interval is an interval in which the molecular field felt by the worm’s head is constant. Because of this definition of the interval, the vertex density is uniform in this interval. Here we have to consider all types of vertices, i.e., all the interaction terms in the Hamiltonian that involves the current site. We here use the index i to specify the type of the vertices. For example, if there is only two-body interaction terms, i specifies a nearest neighbor site. We let ρi denote the vertex density of the type i in the current interval I. The probability of having a scattering event in the imaginary time interval Δτ is Δτ λ = Δτ ×



(scatter)

ρi pi

(12)

i (scatter)

is the probability of the head being scattered when it hits a vertex of type i. where pi Then we can generate the first scattering time τfirst according to the distribution P (τfirst ) ∝ e−λτ .

(13)

This can be achieved by generating a uniform random number r ∈ [0, 1) and define τfirst = λ−1 log

1 r

(14)

If this turns out to fall out of the constant-environment interval I we simply let the head proceed to the end of the interval. In case the head hits a kink as a result, let it be scattered at the kink in the same way as the conventional directed-loop update (Fig.3(b)). If there is no kink there, we do nothing other than moving the head to the end of I (Fig.3(a)). On the other hand, if tfirst falls within I, we let the head advance by tfirst . We now need to choose the the type i of the vertex that actually scatters the head. This should be done with the probability, (scatter) . (15) pi ∝ ρi pi Once the type of the vertex i is chosen, we place a vertex of this type at the head’s position and let the head be scattered by this vertex. The scattering procedure is the same as that of the conventional directed-loop update. Then, in all cases, we start again from the beginning with the new constant-environment interval I  ahead of the head’s new position. In order to demonstrate that the present method makes it possible to simulate a system of which the size is comparable to the real experiment, we perform a Monte Carlo simulation with the parameters which are close to the ones used in the experiment [22]. Figure 4 presents the phase diagram of the BH model in three dimensions (a) and the density profile at U = 20t, μ0 = 66t, Ω = 0.08t, βt = 5.0 and nmax = 15. The system size is L = 64. With these parameters, the average number of bosons is 2.6 × 105 . Both the system size and the number of particles are close to the experimental values [22]. The characteristic shape of the density profile, which is often referred to as “big wedding cake” structure, is clearly observed in the figure. The equilibration has been confirmed by performing the simulation with various simulation length and checking the absence of systematic dependence on the length of the simulation. 5. Summary In the present paper, we have reviewed the directed loop algorithm, one of the standard method that provides a robust and general framework for the quantum Monte Carlo simulation of a very broad variety of quantum systems. Based upon this framework, we develop two algorithms for quantum spin and boson models. While in both cases a straight-forward application of

123

C

D

μ7

ρ

κ V7

TC

Figure 4. The phase diagram of the uniform Bose-Hubbard model (a), and the ’big-weddingcake’ structure in the particle density profile. The parameters used here are U = 20t, μ0 = 66t, Ω = 0.08t, βt = 5.0, nmax = 15.

the conventional directed loop algorithm is not practical, we have demonstrated that there are solutions to the problems that greatly enhances the efficiency of the method. We have also presented some new physical results in order to illustrate the efficiency of the resulting algorithms. In the case of the SU(N) model, in particular, we have presented the result that suggests the spontaneous breaking of the lattice rotational symmetry at N = 15 for the twoboson representation model. Acknowledgments The computation in the present work is executed on computers at the Supercomputer Center, Institute for Solid State Physics, University of Tokyo. The present work is financially supported by MEXT Grant-in-Aid for Scientific Research (B) (19340109), MEXT Grant-inAid for Scientific Research on Priority Areas “Novel States of Matter Induced by Frustration” (19052004), and by Next Generation Supercomputing Project, Nanoscience Program, MEXT, Japan

124

References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23]

Kawashima N. and Harada K., J. Phys. Soc. Jpn. 73, 1379 (2004). Clarke J. and Wilhelm F. K., Nature 453, 1031 (2008). Cirac J. I. and Zoller P., Phys. Rev. Lett. 74, 4091 (1995). Brennen G. K., Caves C. M., Jessen P. S., and Deutsch I. H., Phys. Rev. Lett. 82, 1060 (1999). Loss D. and DiVincenzo D. P., Phys. Rev. A 57, 120 (1998). Sylju˚ asen O. F. and Sandvik A. W.: Phys. Rev. E. 66, 046701 (2002). Read N. and Sachdev S., Phys. Rev. B 42, 4568 (1990). Inouye S., et al., Nature 392, 151 (1998). Greiner M., et al., Nature 415, 39 (2002). Griffin A., Nature Physics 4, 592 (2008). Kato Y., Zhou Q., Kawashima N. and Trivedi N., Nature Physics 4, 617 (2008). Kawashima N., Phys. Rev. Lett. 98, 057202 (2007). Kato Y., Suzuki T. and Kawashima N., Phys. Rev. E 75, 066703 (2007). Kato Y. and Kawashima N., unpublished. Sandvik A. W., J. Phys. A: Math. Gen. 25, 3667 (1992). Sandvik A. W., Phys. Rev. B 59, 14157 (1999). Kawashima N. and Gubernatis J. E., Phys. Rev. E 51, 1547 (1995). Read N. and Sachdev S., Nuc. Phys. B316, 609 (1989). Harada K., Kawashima N. and Troyer M., Phys. Rev. Lett. 90, 117203 (2002). Kawashima N. and Gubernatis J. E., Phys. Rev. Lett. 73, 1295 (1994). Harada K. and Kawashima N., Phys. Rev. E 66, 056705 (2002); ibid. 67, 039903(E) (2003). Greiner M., Mandel O., Esslinger T., H¨ ansch T. W., and Bloch I., Nature 415, 39 (2002). Prokof’ev N. V., Svistunov B. V. and Tupitsyn I. S., Sov. Phys. JETP 87, 310 (1998).

125

126

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Machine Learning with Quantum Relative Entropy Koji Tsuda Max Planck Institute for Biological Cybernetics, Spemannstr. 38, T¨ ubingen, 72076 Germany E-mail: [email protected] Abstract. Density matrices are a central tool in quantum physics, but it is also used in machine learning. A positive definite matrix called kernel matrix is used to represent the similarities between examples. Positive definiteness assures that the examples are embedded in an Euclidean space. When a positive definite matrix is learned from data, one has to design an update rule that maintains the positive definiteness. Our update rule, called matrix exponentiated gradient update, is motivated by the quantum relative entropy. Notably, the relative entropy is an instance of Bregman divergences, which are asymmetric distance measures specifying theoretical properties of machine learning algorithms. Using the calculus commonly used in quantum physics, we prove an upperbound of the generalization error of online learning.

1. Introduction Machine learning and quantum physics are totally different subjects which do not seem to be related. In machine learning, the central aim is to develop a good algorithm to learn from the data for predicting some important property of yet unseen objects. Figure 1 illustrates a pattern recognition task of predicting which category a shape belongs to (i.e., red or blue). Each shape is represented as a numerical vector. From a set of training examples, we need to learn a discriminant function f (x) which classifies x to positive if f (x) > 0, and otherwise to negative. Category labels of a set of test examples are predicted by the discriminant function. The two class classification setting appears in many applications like character recognition, speech recognition and gene expression profile analysis. However, recent rise of kernel methods [7] has brought the two subjects slightly closer. Kernel methods represent the similarity of n objects (e.g., shapes) as n × n kernel matrix W . When the n objects include both training and test examples, all computation in learning and prediction are completely determined by W . By definition, W has to be positive semidefinite, W  0, otherwise it causes problem in learning (i.e., local minima in parameter optimization). Usually, the kernel matrix is determined a priori by users’ prior knowledge, but it would also be possible to learn the matrix from data [13]. Learning is basically a parameter adjustment process. In batch learning, a set of training examples is given a priori, and the parameters are determined as the solution of an optimization problem. On the other hand, in online learning, we assume a stream of examples. At a trial, an example is given to the learning machine. Prediction with respect to the example is computed based on the current parameters. Then, the true value is exposed and the loss is incurred by comparing the prediction and the true value. Finally the parameters are updated for better prediction in next steps. The sum of losses from trials 1 to T is called “total loss”. A good learning algorithm has a good updating rule leading to small total loss. Gradient descent and

127

Figure 1. Typical pattern recognition task. exponentiated gradient descent are among the most popular algorithms [4]. However, they deal with numerical vectors, not positive semidefinite matrices.  , W  0, W   0, tr(W ) = tr(W  ) = 1, the quantum Given density matrices W and W relative entropy is defined as  , W ) = tr(W  log W  −W  log W ). ΔF (W This formulation is introduced by Umegaki [5], and regarded as a natural extension of the von Neumann entropy[5]. As kernel matrices are positive semidefinite, the quantum entropy can be used for computing the distance between kernel matrices. In designing an online learning algorithm, a distance measure is necessary. In each trial, the parameters are shifted to reduce the loss of an example, but should not be moved too much in order not to forget the previously learned result. The updating rule is determined by the trade-off between the loss and the distance to the previous parameters. Using the quantum relative entropy, an updating rule for kernel matrices, called “matrix exponential gradient updates” [13], can be derived. It has an advantage that the positive definiteness is preserved after updates. To evaluate the algorithm theoretically, it is possible to build an upperbound of the total loss [4]. For matrix parameters, it is not straightforward to generalize the bounds for vectors due to the lack of commutativity (i.e., AB = BA). However, it turns out that the bound can still be derived via the Golden-Thompson inequality, tr(exp(A + B)) ≤ tr(exp(A) exp(B)),

(1)

which holds for arbitrary symmetric matrices A and B. We also need the following basic inequalities for symmetric matrices. The first one may be seen as a generalization of Jensen’s inequality when applied to the matrix exponential. Lemma 1. If a symmetric matrix A ∈ Rd×d satisfies 0 ≺ A  I, then exp(ρ1 A+ ρ2 (I − A))  exp(ρ1 )A + exp(ρ2 )(I − A) for finite ρ1 , ρ2 ∈ R. The rest of this paper is organized as follows. Section 2 prepares mathematical definitions about kernels and Bregman divergences. In Section 3, the online kernel learning problem is defined and the bound of total loss is proven. Section 4 presents an experiment illustrating the tightness of the bound. Section 5 concludes the paper with discussions. 2. Preliminaries 2.1. Kernels Kernel methods process data based on pairwise similarity function called kernel function. Given two objects x, x ∈ X , the kernel function is described as w(x, x ). The domain X can be

128

φ F

X x

d(x,x )

φ( x )

φ(x )

x

Figure 2. Given a space X endowed with a kernel, a distance can be defined between points of X mapped to the feature space F associated with the kernel. This distance can be computed without explicitly knowing the mapping φ thanks to the kernel trick. numerical vector, strings, graphs, etc [8]. For example, in pattern recognition with support vector machines, the discriminant function is described as f (x) =

 

αi w(x, xi ),

i=1

where x1 , . . . , x are training examples, and α denotes weight parameters. If and only if k is positive semidefinite, the whole domain X can be embedded in a Hilbert space such that the kernel is preserved as the inner product (Figure 2). In the following, denote by W the matrix of kernel values involving all training and test examples. In most cases, the kernel function is given a priori. However, there are multiple literature [9, 10, 16] about the “meta”-problem of learning kernels. Sometimes, it is possible to measure the similarity between some pairs, but not all. In protein sequence comparison, the similarity of close homolog can be measured reliably by sequence alignment, but it is difficult to come up with a reasonable similarity measure for distant sequences. In such cases, a series of measurements yt = tr(W X t ), where X t is a sparse matrix, is given, and W is estimated based on measurements. In estimation, we have to make sure that W is positive semidefinite. In [16], the matrix is parameterized as W = XX  and X is optimized, but this method introduces non-uniqueness of the solution and non-convexity. It will be shown in the next section that our learning rule can keep positive definiteness without any correction steps. 2.2. Bregman divergence In machine learning, Bregman divergences are an important tool to define asymmetric distances between parameters [2]. The Kullback-Leibler divergence, Hellinger distance, Euclidean distance are instances of Bregman divergences. In this section, it is shown that the quantum relative entropy is also an instance of Bregman divergences. We found that Petz [6] recently pointed out this fact and discussed the relationship with the relative operator entropy. If F is a real-valued strictly convex differentiable function on the parameter domain and  and W is f (W ) := ∇W F(W ), then the Bregman divergence between two parameters W defined as  , W ) = F(W  ) − F(W ) − tr((W  − W )f (W )T ). ΔF (W

129

 , W ) is also strictly convex in its first argument. Furthermore, Since F is strictly convex, ΔF (W the gradient in the first argument has the following simple form:   ∇W f ΔF (W , W ) = f (W ) − f (W ), since ∇A tr(AB) = B  . Quantum relative entropy is obtained when F(W ) = tr(W log W − W ). The strict convexity of this function is well known [5]. Furthermore, ∇W F(W ) = f (W ) = log W .   If W = i λi v i v i is our notation for the eigenvalue decomposition, we can rewrite the divergence as   2 ˜i − ˜ i log λ ˜ i log λj (˜ ,W) = v (2) λ λ ΔF (W i vj ) . i

i,j

This divergence quantifies the difference in the eigenvalues as well as the eigenvectors. When ˜ i = v i ), then the divergence becomes the usual relative both eigensystems are the same (i.e. v ˜i λ ˜ ,W) =  λ entropy between the eigenvalues ΔF (W i i log λi . 3. Online Learning of Kernels In this section, we consider on-line learning which proceeds in trials. In the most basic form, the on-line algorithm produces a parameter W t at trial t and then incurs a loss Lt (W t ). In this paper the parameters are square matrices in Rd×d . In the refined form several actions occur in each trial: The algorithm first receives an instance X t in some instance domain X . It then produces a prediction Yˆt for the instance X t based on the algorithm’s current parameter matrix W t and receives a label y t in some labeling domain Y. Finally it incurs a real valued loss L(ˆ yt , y t ) and updates its parameter matrix to W t+1 . For example in Section 3.1 we will analyze an on-line algorithm that predicts with yˆt = yt , yt ) = (ˆ yt − yt )2 . tr(W t X t ) and is based on the loss Lt (W t ) = L(ˆ In this section we only discuss updates at a high level and only consider the basic form of the on-line algorithm. We assume that Lt (W ) is convex in the parameter W (for all t) and that the gradient ∇W Lt (W ) is a well-defined matrix in Rd×d . In the update we aim to solve the following problem (see e.g. [4, 3]): W t+1 = argminW

ΔF (W , W t ) + ηLt (W ),

(3)

where the convex function F defines the Bregman divergence and η is a non-negative learning rate. The update balances two conflicting goals: staying close to the old parameter W t (as quantified by the divergence) and achieving small loss on the current labeled instance. The learning rate becomes a trade-off parameter. Setting the gradient with respect to W of the objective in the argmin to zero, we obtain W t+1 = f −1 (f (W t ) − η∇W Lt (W t+1 )) .

(4)

If we assume that f and f −1 preserve symmetry, then constraining W in (3) to be symmetric1 changes the update to W t+1 = f −1 (f (W t ) − η sym(∇W Lt (W t+1 ))) , where sym(X) = (X + X  )/2. 1

Note that square matrices with real eigenvalues are not closed under addition.

130

(5)

The above implicit update is usually not solvable in closed form. A common way to avoid this problem [4] is to approximate ∇W Lt (W t+1 ) by ∇W Lt (W t ), leading to the following explicit update for the constraint case: W t+1 = f −1 (f (W t ) − η sym(∇W Lt (W t ))) . In the case of the quantum relative entropy, the functions f (W ) = log W and f −1 (Q) = exp Q clearly preserve symmetry. When using this divergence we arrive at the following (explicit) update: ⎛ ⎞ any sq. matrix sym.pos.def. 

 ⎜ ⎟ (6) −η sym( ∇W Lt (W t ) )⎠ . Wt W t+1 = exp ⎝log 



symmetric

 symmetric positive definite



We call this update the Unnormalized Matrix Exponentiated Gradient Update. Note that f (W ) = log W maps symmetric positive definite matrices to arbitrary symmetric matrices, and after adding a scaled symmetrized gradient, the function f −1 (Q) = exp Q maps the symmetric exponent back to a symmetric positive definite matrix. When the parameters are constrained to trace one, then we arrive at the Matrix Exponentiated Gradient (MEG) Update, which generalizes the Exponentiated Gradient (EG) update of [4] to non-diagonal matrices: W t+1 =

1 exp (log W t − η sym(∇W Lt (W t ))) . Zt

(7)

where Zt = tr (exp (log W t − η sym(∇W Lt (W t )))) is the normalizing constant. 3.1. Relative Loss Bounds In this section we prove a certain type of relative loss bound for the MEG update which generalize the analogous known bounds for the EG algorithm to the non-diagonal case. For the sake of simplicity we now restrict ourselves to the case when the algorithm predicts yt , yt ) := (ˆ yt − yt )2 . with yˆt = tr(W t X t ) and the loss function is quadratic: Lt (W t ) = L(ˆ We begin with the definitions needed for the relative loss bounds. Let S = (X 1 , y1 ), . . . , (X T , yT ) denote a sequence of examples, where the instance matrices X t ∈ Rd×d and the labels yt ∈ R. For any symmetric positive semi-definite matrix U with tr(U ) = 1, define T 2 its total loss as LU (S) = t=1 (tr(U X t ) − yt ) . The total loss of the on-line algorithm is T LM EG (S) = t=1 (tr(W t X t ) − yt )2 . We prove a bound on the relative loss LM EG (S) − LU (S) that holds for any comparator parameter U . The proof generalizes a similar bound for the Exponentiated Gradient update (Lemmas 5.8 and 5.9 of [4]). The relative loss bound is derived in two steps: Lemma 2 upper bounds the relative loss for an individual trial i.t.o. the progress towards the comparator parameter U (as measured by the divergence). In the second Lemma 3, the bound for individual trials is summed to obtain a bound for a whole sequence. Lemma 2. Let W t be any symmetric positive definite matrix. Let X t be any symmetric matrix whose eigenvalues have range at most r, i.e. λmax (X t ) − λmin (X t ) ≤ r. Assume W t+1 is produced from W t by the MEG update with learning rate η, and let U be any symmetric positive semi-definite matrix. Then for any b > 0 and a = η = 2b/(2 + r 2 b): a (yt − tr(W t X t ))2 −b (yt − tr(U X t ))2 ≤ ΔF (U , W t ) − ΔF (U , W t+1 )



 

MEG-loss U -loss progress towards U

131

The proof is given in Appendix. In the proof, we use the Golden-Thompson inequality (1). and the approximation of the matrix exponential (Lemma 1). Lemma 3. Let S be any sequence of examples with positive symmetric matrices as instances and real labels and let r be an upper bound on the range of eigenvalues of each instance matrix of S. Let W 1 and U be arbitrary symmetric positive definite initial and comparison matrices, respectively. Then for any c such that η = 2c/(r 2 (2 + c)),    c 1 1 LU (S) + + r 2 ΔF (U , W 1 ). (8) LM EG (S) ≤ 1 + 2 2 c Proof. For the maximum tightness of (2), a should be chosen as a = η = 2b/(2 + r 2 b). Let b = c/r 2 , and thus a = 2c/(r 2 (2 + c)). Then (2) is rewritten as 2c (yt − tr(W t X t ))2 − c(yt − tr(U X t ))2 ≤ r 2 (ΔF (U , W t ) − ΔF (U , W t+1 )) 2+c Adding the bounds for t = 1, · · · , T , we get 2c LM EG (S) − cLU (S) ≤ r 2 (ΔF (U , W 1 ) − ΔF (U , W t+1 )) ≤ r 2 ΔF (U , W 1 ), 2+c which is equivalent to (8). Assuming LU (S) ≤ Lmax and ΔF (U , W 1 ) ≤ dmax , the bound (8) is tightest when c =  r 2dmax /Lmax . With this choice of c, we have  r2 2Lmax dmax + ΔF (U , W 1 ). 2  In particular, if W 1 = d1 I, then ΔF (U , W 1 ) = log d − i λi log λ1i ≤ log d. Additionally, when LM EG (S) − LU (S) ≤ r

2

d . Lmax = 0, then the total loss of the algorithm is bounded by r log 2 Note that the MEG algorithm generalizes the EG algorithm of [4]. In the case of linear regression, a square of a product of dual norms appears in the bounds for the EG algorithm: 2 . Here u is a parameter vector and X ||u||21 X∞ ∞ is an upper bound on the infinity norm of the instance vectors xt . Note the correspondence with the above bound (which generalizes the bounds for EG to the non-diagonal case): the one norm of the parameter vector is replaced by the trace and the infinity norm by the maximum range of the eigenvalues.

4. Experiments In this section, our technique is applied to learning a kernel matrix from a set of distance measurements. This application is not on-line per se, but it shows nevertheless that the theoretical bounds can be reasonably tight on natural data. When K is a d × d kernel matrix among d objects, then the Kij characterizes the similarity between objects i and j. In the feature space, Kij corresponds to the inner product between object i and j, and thus the Euclidean distance can be computed from the entries of the kernel matrix [7]. In some cases, the kernel matrix is not given explicitly, but only a set of distance measurements is available. The data are represented either as (i) quantitative distance values (e.g., the distance between i and j is 0.75), or (ii) qualitative evaluations (e.g., the distance between i and j is small) [16, 12]. Our task is to obtain a positive definite kernel matrix which fits well to the given distance data. In the experiment, we consider the on-line learning scenario in which only one distance example is shown to the learner at each time step. The distance example at time t is described

132

1.8

0.45

1.6

0.4

1.4 Classification Error

0.35

Total Loss

1.2 1 0.8 0.6

0.3 0.25 0.2

0.4

0.15

0.2

0.1

0 0

0.5

1

1.5 Iterations

2

2.5

0.05 0

3 5

x 10

0.5

1

1.5 Iterations

2

2.5

3 5

x 10

Figure 3. Numerical results of on-line learning. (Left) total loss against the number of iterations. The dashed line shows the loss bound. (Right) classification error of the nearest neighbor classifier using the learned kernel. The dashed line shows the error by the target kernel. as {at , bt , yt }, which indicates that the squared Euclidean distance between objects at and bt is yt . Let us define a time-developing sequence of kernel matrices as {W t }Tt=1 , and the corresponding points in the feature space as {xti }di=1 (i.e. (W t )ab = x ta xtb ). Then, the total loss incurred by this sequence is T  

xtat − xtbt 2 − yt

2

=

t=1

T 

(tr(W t X t ) − yt )2 ,

t=1

where X t is a symmetric matrix whose (at , at ) and (bt , bt ) elements are 0.5, (at , bt ) and (bt , at ) elements are -0.5, and all the other elements are zero. We consider a controlled experiment in which the distance examples are created from a known target kernel matrix. We used a 52 × 52 kernel matrix among gyrB proteins of bacteria (d = 52). This data contains three bacteria species (see [11] for details). Each distance example is created by randomly choosing one element of the target kernel. The initial parameter was set as W 1 = d1 I. When the comparison matrix U is set to the target matrix, LU (S) = 0 and Lmax = 0, because all the distance examples are derived from the target matrix. Therefore we choose learning rate η = 2, which minimizes the relative loss bound of Lemma 3. The total loss of the kernel matrix sequence obtained by the matrix exponential update is shown in Figure 3 (left). In the plot, we have also shown the relative loss bound. The bound seems to give a reasonably tight performance guarantee—it is about twice the actual total loss. To evaluate the learned kernel matrix, the prediction accuracy of bacteria species by the nearest neighbor classifier is calculated (Figure 3, right), where the 52 proteins are randomly divided into 50% training and 50% testing data. The value shown in the plot is the test error averaged over 10 different divisions. It took a large number of iterations (∼ 2×105 ) for the error rate to converge to the level of the target kernel. In practice one can often increase the learning rate for faster convergence, but here we chose the small rate suggested by our analysis to check the tightness of the bound. 5. Discussion In this paper, we have shown that online learning learning algorithms can be derived from quantum relative entropy, and the upperbound of its total loss can be derived. The main

133

difficulty in deriving the bound was due to non-commutativity of matrices, and the quantumstatistical calculus such as Golden-Thompson inequality was very effective in breaking the wall. Since the introduction of the quantum entropy to machine learning in [13], several follow-up studies have appeared. We dealt with a full rank matrix W , but if W is low-rank, it can represent a subspace in a high dimensional space. According to this idea, online learning algorithms for principal component analysis and subspace Winnow were proposed [15, 14]. They use very similar updates as shown in this paper, and the relative loss bounds can be derived. Recently, [1] proposed to use the matrix updates to solve certain classes of semidefinite programs with promising results. We hope such attempts lead to increased communication between quantum physics and machine learning. Reference [1] S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. In Annual ACM Symposium on Theory of Computing (STOC), pages 227–236. ACM, 2007. [2] L.M. Bregman. Finding the common point of convex sets by the method of successive projections. Dokl. Akad. Nauk SSSR, 165:487–490, 1965. [3] J. Kivinen and M. K. Warmuth. Relative loss bounds for multidimensional regression problems. Machine Learning, 45(3):301–329, 2001. [4] J. Kivinen and M.K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997. [5] M.A. Nielsen and I.L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000. [6] D. Petz. Bregman divergence as relative operator entropy. Acta Mathematica Hungarica, 116:127–131, 2007. [7] B. Sch¨ olkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. [8] B. Sch¨ olkopf, K. Tsuda, and J.P. Vert, editors. Kernel Methods in Computational Biology. MIT Press, 2004. [9] S. Shai-Shwartz, Y. Singer, and A.Y. Ng. Online and batch learning of pseudo-metrics. In C.E. Brodley, editor, Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004). ACM, 2004. [10] I.W. Tsang and J.T. Kwok. Distance metric learning with kernels. In Proceedings of the International Conference on Artificial Neural Networks (ICANN’03), pages 126–129, 2003. [11] K. Tsuda, S. Akaho, and K. Asai. The em algorithm for kernel matrix completion with auxiliary data. Journal of Machine Learning Research, 4:67–81, May 2003. [12] K. Tsuda and W.S. Noble. Learning kernels from biological networks by maximizing entropy. Bioinformatics, 20(Suppl. 1):i326–i333, 2004. [13] K. Tsuda, G. R¨ atsch, and M.K. Warmuth. Matrix exponentiated gradient updates for online learning and Bregman projection. Journal of Machine Learning Research, 6:995–1018, 2005. [14] M.K. Warmuth. Winnowing subspaces. In Proceedings of the 24th International Conference for Machine Learning (ICML 07), pages 999–1006, 2007. [15] M.K. Warmuth and D. Kuzmin. Online kernel pca with entropic matrix updates. In Proceedings of the 24th International Conference for Machine Learning (ICML 07), pages 465–472. ACM Press, 2007. [16] E.P. Xing, A.Y. Ng, M.I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In S. Thrun S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 505–512. MIT Press, Cambridge, MA, 2003.

Appendix: Proof of Lemma 2 Let δt = −2η(tr(XW t ) − yt ), then the right hand side of (2) is rewritten as ΔF (U , W t ) − ΔF (U , W t+1 ) = δt tr(U X t ) − log tr(exp(log W t + δt sym(X t ))). Therefore, (2) is equivalent to f ≤ 0, where f = log tr(exp(log W t + δt sym(X t ))) − δt tr(U X t ) + a(yt − tr(W t X t ))2 − b(yt − tr(U X t ))2 . Let us bound the first term. Due to Golden-Thompson inequality (1), we have tr (exp(log W t + δt sym(X t ))) ≤ tr (W t exp(δt sym(X t ))) .

(.1)

The right hand side can be rewritten as exp(δt sym(X t )) = exp(r0 δt ) exp(δt (sym(X t ) − r0 I)).

134

Using Jensen’s inequality for matrices (Lemma 1), we have exp(δt (sym(X t ) − r0 I))  I − sym(Xrt )−r0 I (1 − exp(rδt )). Here 0 ≺ A  I, because r0 I ≺ sym(X t )  (r0 + r)I by assumption. Since W t is strictly positive definite, tr(W t B) ≤ tr(W t C) if B  C. So, the right hand side of (.1) can be written as   tr(W t X t ) − r0 (1 − exp(rδt )) , tr (W t exp(δt sym(X t ))) ≤ exp(r0 δt ) 1 − r where we used the assumption tr(W t ) = 1. We now plug this upper bound of the first term of f back into f and obtain f ≤ g, where t )−r0 (1 − exp(rδt ))) − tr(U X t )δt g = r0 δt + log(1 − tr(W t X r +a(yt − tr(W t X t ))2 − b(yt − tr(U X t ))2 .

(.2)

Let us define z = tr(U X t ) and maximize the upper bound (.2) with respect to z. Solving ∂g ∂z = 0, we have z = yt − δt /(2b) = yt + η(tr(X t W t ) − yt )/b. Substituting this into (.2), we have the upper bound g ≤ h where   t )−r0 (1 − exp(2ηr(y − tr(X W )))) h = 2ηr0 (yt − tr(X t W t )) + log 1 − tr(Xt W t t r −2ηyt (yt − tr(X t W t )) + (a +

η2 b (y

− tr(X t W t ))2 .

Using the upper bound log(1 − q(1 − exp p)) ≤ pq + p2 /8 in the second term, we have h≤

(yt − tr(X t W t ))2 ((2 + r 2 b)η 2 − 4bη + 2ab). 2b

It remains to show q = (2 + r 2 b)η 2 − 4bη + 2ab ≤ 0. We easily see that q is minimized for η = 2b/(2 + r 2 b) and that for this value of η we have q ≤ 0 if and only if a ≤ 2b/(2 + r 2 b).

135

136

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

A Novel Quantum Transition in a Fully Frustrated Transverse Ising Antiferromagnet Anjan Kumar Chandra Centre for Applied Mathematics and Computational Science, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata-700064, India. E-mail: [email protected]

Bikas K. Chakrabarti Centre for Applied Mathematics and Computational Science and Theoretical Condensed Matter Physics Division, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata-700064, India. E-mail: [email protected] Abstract. We consider a long-range Ising antiferromagnet put in a transverse field. Applying quantum Monte Carlo method, we study the variation of order parameter (spin correlation in Trotter time direction), susceptibility and average energy of the system for various values of the transverse field at different temperatures. Indications of a novel quantum phase transition is observed as the transverse field is tuned. Here the thermal fluctuations are seen to stabilise the order, while the quantum fluctuations destroy that.

1. Introduction Quantum phases in frustrated systems are being intensively investigated these days; in particular in the context of quantum spin glass and quantum ANNNI models [1]. Here we study a fullyfrustrated quantum antiferromagnetic model. Specifically, the long-range antiferromagnetic Ising model put under transverse field. The finite temperature properties of sub-lattice decomposed version of this model was already considered earlier [2, 3]. The quantum phase transition and entanglement properties of the full long-range model at zero temperature was studied by Vidal et al [4]. Here we present some results obtained by applying quantum Monte Carlo technique [5] to the same full long-range model at finite temperature. We observe indications of a quantum phase transition in the model, where the antiferromagnetically ordered phase seems to get stabilised by the thermal (classical) fluctuations while the quantum fluctuations (due to tunnelling or transverse field) drive the system to disordered or para phase. This paper is organized in the following manner. In Section II we introduce the quantum model and then the quantum Monte Carlo technique in Section III. In Section IV we present our results obtained for the model and in the final section we present some analytical discussions on our results. 2. The Model The Hamiltonian of the infinite-range quantum Ising antiferromagnet is

137

H ≡ H (C) + H (T ) N N N   J  = σiz σjz − h σiz − Γ σix , N i=1

i,j(>i)=1

(1)

i=1

where J denotes the long-range antiferromagnetic (J > 0) exchange constant; we fix the value J = 1 in this study. Here σ x and σ z denote the x and z component of the N Pauli spins  σiz

=

1 0 0 −1



 ;

σix

=

0 1 1 0

 ;

i = 1, 2, ...., N.

h and Γ denote respectively the longitudinal and transverse fields. We have denoted the cooperative term of H (including the external longitudinal field term) by H (C) and the transverse field part as H (T ) . As such the model has a fully frustrated (infinite-range or infinite dimensional) co-operative term. At zero temperature and at zero longitudinal and transverse fields, the H (C) would prefer the spins to orient in ±z directions only with zero net magnetization in the z-direction. This antiferromagnetically ordered state is completely frustrated and highly degenerate. Switching on the transverse field Γ would immediately induce all the spins to orient in the x-direction (losing the degeneracy), corresponding to a maximum of the kinetic energy term and this discontinuous transition to the para phase occurs at Γ = 0. However, at any finite temperature the entropy term coming from the extreme degeneracy of the antiferromagnetically ordered state and the close-by excited states would induce a stability of this phase and shifts the transition (to para phase) point from Γ = Γc = 0 at T = 0 to Γc (T ) > 0 for T > 0. We investigate this continuous quantum transition behaviour in the model. 3. MONTE CARLO SIMULATION 3.1. SUZUKI-TROTTER MAPPING AND SIMULATION This Hamiltonian (1) can be mapped to a (∞ + 1)-dimensional classical Hamiltonian [5] using the Suzuki-Trotter formula. The effective Hamiltonian can be written as H = −

1 NP

N 

P 

σi,k σj,k

i,j(>i)=1 k=1

N P N P h  Jp   σi,k − σi,k σi,k+1 , P P i=1 k=1

(2)

i=1 k=1

where Jp = −(P T /2) ln(tanh(Γ/P T )).

(3)

Here P is the number of Trotter replicas and k denotes the k-th row in the Trotter direction. Jp denotes the nearest-neighbour interaction strength along the Trotter direction. We have studied the system for N = 100. Because of the diverging growth of interaction Jp for very low values of Γ and also for high values of P , and the consequent non-ergodicity (the system relaxes to different states for identical thermal and quantum parameters, due to frustrations, starting from different initial configurations), we have kept the value of P at a fixed value of 5. This choice of P value helped satisfying the ergodicity of the system upto very low values of the transverse field at the different temperatures considered T = 0.10 and 0.20. Starting from a random initial configurations (including all up or 50-50 up-down configurations) we follow the time variations of different quantities until they relax and study the various quantities after they relax.

138

1 T = 0.10,N = 100 T = 0.10,N = 200 T = 0.20,N = 100 T = 0.20,N = 200 T = 0.30,N = 100 T = 0.30,N = 200

0.8

1

q

q

0.6

0.5

0.4

0

0.2

0

4

8

12

16

Γ/T 0 0

1

2

Γ

3

4

5

Figure 1. Variation of the order parameter q (correlation in the Trotter direction) with transverse field Γ for T = 0.10, 0.20 and 0.30 (h = 0) for two different system sizes (N = 100 and 200). q = 0 for Γ > Γc (T ), where Γc (T ) is an increasing function of T . The inset shows the plot of q against the scaled variable Γ/T .

3.2. RESULTS We studied results for three different temperatures T = 0.10, 0.20 and 0.30 and all the results are for N = 100 and 200 and P = 5. We estimated the following quantities after relaxation : (i) Correlation along Trotter direction (q) : We studied the variation of the order parameter N P 1  σi,k σi,k+1 , q= NP

(4)

i=1 k=1

which is the first neighbour correlation along Trotter direction. Here, ... indicate the average over initial spin configurations. This quantity q shows a smooth vanishing behaviour at and beyond a value of the transverse field Γ = Γc (T ). We consider this correlation q as the order parameter for the transition at Γc . A larger transverse field is needed for the vanishing of the order parameter for larger temperature. The observed values (see Fig. 1) of Γc are  1.6, 2.2 and 3.0 for T = 0.1, 0.2 and 0.3 respectively. As shown in the inset, an unique data collapse occurs when q is plotted against Γ/T and one gets q = 0 for Γ/T ≥ (Γ/T ) c  13. (ii) Susceptibility (χ) : The longitudinal susceptibility χ = (1/N P )∂[ i,k σi,k ]/∂h, where h (→ 0) is the applied longitudinal field, has also been measured. We went upto h = 0.1 and estimated the χ values. As we increase the value of the transverse field Γ from a suitably chosen low value, χ initially starts with a value almost equal to unity and then gradually saturates at lower values (corresponding to the classical system where Jp = 0 in Eq.(2)) as Γ is increased. Also at Γ = 0, the classical values are indicated in Fig. 2. This saturation value of χ decreases with temperature. Again the field at which the susceptibility saturates are the same as for the vanishing of the order parameter for each temperature. (iii) Average energy (E) : We have measured the value of the co-operative energy for each Trotter index and then take its average E i.e. E = H (C)  of Eq. (1) with J = 1. It initially begins with −1.0 and after a sharp rise the average energy saturates at Γ ≥ Γc (T ) ( 1.6, 2.2 and 3.0 for T = 0.1, 0.2 and 0.3) values corresponding to the classical equilibrium energy (Ecl for Jp = 0 in Eq.(2)) at those temperatures. Again it takes larger values of Γ at higher temperatures

139

1 T = 0.10,N = 100 T = 0.10,N = 200 T = 0.20,N = 100 T = 0.20,N = 200 T = 0.30,N = 100 T = 0.30,N = 200

0.9

χ

0.8

1 N = 100 N = 200

0.7

χcl

0.9

0.6

0.8 0.7 0.6 0.5 0

0.1

0.2

0.3

0.4

0.5

T 0.5 0

1

2

Γ

3

4

5

Figure 2. Variation of the susceptibility χ with transverse field Γ for T = 0.10, 0.20 and 0.30 (h ≤ 0.1) for two different system sizes (N = 100 and 200). The corresponding susceptibility χcl for various temperatures for N = 100 and 200 for the classical system are shown in the inset. χ converges to the classical values χcl for Γ > Γc (T ). -0.8 T = 0.10,N = 100 T = 0.10,N = 200 T = 0.20,N = 100 T = 0.20,N = 200 T = 0.30,N = 100 T = 0.30,N = 200

-0.8 -0.82

Ecl

-0.84

-0.84 -0.86

-0.88 -0.92 -0.96 -1

-0.88

E

N = 100 N = 200

0

0.1

0.2

0.3

0.4

0.5

T -0.9 -0.92 -0.94 -0.96 -0.98 -1 0

0.5

1

1.5

Γ

2

2.5

3

Figure 3. Variation of average energy E with transverse field Γ for T = 0.10, 0.20 and 0.30 (h = 0) for two different values of N (= 100, 200). The corresponding average energy Ecl for various temperatures for N = 100 and 200 for the are shown in the inset. E converges to the classical values Ecl for Γ > Γc (T ).

to acheive the classical equilibrium energy. At Γ = 0, the corresponding classical values of E are plotted in Fig. 3. The variations of all these quantities indicate that the ‘quantum order’ disappears and the quantities reduce to their classical values (corresponding to Jp = 0 in Eq.(2)) as the transverse field Γ exceeds Γc (T ). Γc (T ) is observed to be an increasing function of temperature T . These Γc (T ) values are seen to be independent of the system sizes (N = 100 and 200) considered.

140

4. Discussion Let us first rewrite our Hamiltonian H in Eq.(1) as H=

N N N N   1  z 2 1  z 2 ( σi ) − (σi ) − h σiz − Γ σix 2N N i=1

i=1

i=1

(5)

i=1

 σi (where N |σ| = 0, 1, 2, ...., N ), then If we now denote the total spin by σtot i.e. σtot = N1 N i=1  the above mentioned Hamiltonian H can be expressed as 1 z 2 1 H z x = (σtot ) − hσtot − Γσtot − . N 2 N

(6)

Let us assume the average total spin σ  to be oriented at an angle θ with the z-direction : z  = σ cos θ and σ x  = σ sin θ. Hence the average total energy E σtot tot = H can be written tot as 1 1 Etot = σ 2 cos2 θ − hσcos θ − Γσ sin θ − . N 2 N

(7)

At the zero temperature and at Γ = 0, for h = 0, the energy Etot is minimised when θ = 0 and σ = 0 (complete antiferromagnetic order in z-direction). As soon as Γ = 0 (h = 0) the minimisation of Etot requires θ = π/2 and σ = 1 (the maximum possible value); driving the system to paramagnetic phase. This discontinuous transition at T = 0 was also seen in [4]. As observed in our Monte Carlo study in the previous section, Γc (T ) → 0 as T → 0. This is consistent with this exact result Γc = 0 at T = 0. For T = 0 (and h = 0), therefore, the transition from antiferromagnetic (θ = 0 = σ) to para (θ = π/2, σ = 1) phase, driven by the transverse field Γ, occurs at Γ = 0 itself. One can also estimate the susceptibility χ at Γ = 0 = T . Here Etot /N = 12 σ 2 cos2 θ −hσcos θ − 1 N and the minimisation of this energy gives σ cos θ = h giving the (longitudinal) susceptibility χ = σ cos θ/h = 1. This is consistent with the observed behaviour of χ shown in Fig. 2 where the extrapolated value of χ at Γ = 0 increases with decreasing T and approaches χ = 1 as T → 0. At finite temperatures T = 0, for h = 0, we have to consider also the entropy term and minimise the free energy F = Etot − T S rather than Etot where S denotes the entropy of the state. This entropy term will also take part in fixing the value of θ and σ at which the free energy F is minimised. As soon as the temperature T becomes non-zero, the extensive entropy of the system for antiferromagnetically ordered state with σ  0 (around and close-by excited states with θ = 0) helps stabilisation near θ = 0 and σ = 0 rather than near the para phase with θ = π/2 and σ = 1, where the entropy drops to zero. While the transverse field tends to align the spins along x direction (inducing θ = π/2 and σ = 1), the entropy factor prohibits that and the system adjusts θ and σ values accordingly and they do not take the disordered or para state values (θ = π/2 and σ = 1) for any non-zero value of Γ (like at T = 0). For very large values of Γ, of course, the free energy F is practically dominated by the transverse field term in H and again θ = π/2 and σ = 1, beyond Γ = Γc (T ) > 0 for T > 0. However, this continuous transition-like behaviour may be argued [6] to correspond to a crossover type property of the model at finite temperatures (suggesting that the observed finite values of Γc (T ) are only effective numerical values). fact, for h =0 one adds the entropy term −T ln Ds to Etot in Eq.(7) to get F, where  In N  N [4, 6]. One can then get [6], after minimising the F with respect to Ds = N −σ − N −σ−1 2 2 σ and θ, σ = tanh(Γ/2T ), which indicates a smooth variation of σ. This also indicates the competing (rather than complementary) role of Γ and T in determining the order in the system.

141

Acknowledgements The work of one author (AKC) was supported by the Centre for Applied Mathematics and Computational Science (CAMCS) of the Saha Institute of Nuclear Physics. We are grateful to I. Bose, S. Dasgupta, J.-I. Inoue, D. Sen, P. Sen and K. Sengupta for useful discussions and comments. References [1] See e.g., R.N. Bhatt, in Spin Glasses and Random Fields, edited by A.P. Young, pp. 225-249 (World Scientific, Singapore, 1998); B.K. Chakrabarti, A. Dutta, P. Sen, Quantum Ising Phases and Transitions inTransverse Ising Models (Springer, Heidelberg, 1996) [2] B.K. Chakrabarti and J.-I. Inoue, Indian J. Phys. 80 (6), 609 (2006) [3] B.K. Chakrabarti, A. Das and J.-I. Inoue, Eur. Phys. J. B 51, 321 (2006) [4] J. Vidal, R. Mosseri and J. Dukelsky, Phys. Rev. A 69, 054101 (2004) [5] M. Suzuki, Prog. Theor. Phys. 56, 2454 (1976); see also B.K. Chakrabarti, A. Das, pp. 3-36 in Quantum Annealing and Related Optimization Methods, edited by A. Das, B.K. Chakrabarti, LNP 679 (Springer, Heidelberg, 2005) [6] D. Sen, private communication.

142

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Symmetries, Dimensional Reduction, and Topological Quantum Order Gerardo Ortiz Department of Physics, Indiana University, Bloomington, IN 47405, USA E-mail: [email protected] Abstract. We prove sufficient conditions for Topological Quantum Order at zero and finite temperatures. The crux of the proof hinges on the existence of low-dimensional Gauge-Like Symmetries, thus providing a unifying framework based on a symmetry principle. All known examples of Topological Quantum Order display Gauge-Like Symmetries. Other systems exhibiting such symmetries include Hamiltonians depicting orbital-dependent spin exchange and Jahn-Teller effects in transition metal orbital compounds, short-range frustrated Klein spin models, and p+ip superconducting arrays. We analyze the physical consequences of GaugeLike Symmetries (including topological terms and charges) and, most importantly, show the insufficiency of the energy spectrum, (recently defined) entanglement entropy, maximal string correlators, and fractionalization in establishing Topological Quantum Order. Duality mappings illustrate that not withstanding the existence of spectral gaps, thermal fluctuations may impose restrictions on suggested topological quantum computing schemes. Our results allow us to go beyond standard topological field theories and engineer new systems with Topological Quantum Order.

143

144

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Quantum algorithms and complexity Michele Mosca Institute for Quantum Computing, University of Waterloo, Canada E-mail: [email protected] Abstract. The quantum features of nature lead to qualitatively different and apparently more powerful models of computation and communication. This talk will focus on the power of quantum computation. Quantum computation allows us to efficiently solve problems that were previously believed to be intractable. One important example is the problem of factoring integers, which would have an enormous impact on the existing information security infrastructure. Another important application is the simulation of quantum mechanical systems. There are also “polynomial” speed-ups for a very general class of search and optimization problems. This talk will also review the more recent developments in quantum algorithms. Quantum (computational) complexity is a measure of the intrinsic difficulty of a computational problem on a quantum computer. I will also give an overview of quantum complexity theory, and discuss the relevance to finding efficient quantum algorithms.

145

146

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Variational Bayesian inference for partially observed stochastic dynamical systems B Wang1 and D M Titterington2 1 2

Department of Mathematics, University of York, Heslington, York YO10 5DD, UK Department of Statistics, University of Glasgow, Glasgow G12 8QQ, UK.

E-mail: 1 [email protected]

2

[email protected]

Abstract. In this paper the variational Bayesian approximation for partially observed continuous time stochastic processes is studied. We derive an EM-like algorithm and describe its implementation. The variational Expectation step is explicitly solved using the method of conditional moment generating functions and stochastic partial differential equations. The numerical experiments demonstrate that the variational Bayesian estimate is more robust than the EM algorithm.

1. Introduction For most models of interest involving missing data a full Bayesian analysis is computationally complex because complicated multiple integrations are involved. Markov chain Monte Carlo for numerical integration helps to side-step this problem, but it is clearly time-consuming, samples of parameter values have to be stored and it may not be clear whether or not convergence has occurred. Recently, a deterministic approximate approach to intractable Bayesian learning problems, the variational Bayesian approximation, has been introduced in the neural-computing literature, for instance in [1] – [3], and is widely recognised to be effective [4] – [9]. Variational Bayes draws together variational ideas from the analysis of intractable latent variable models [10] and from Bayesian inference [11] [12]. This framework facilitates calculation of approximations to posterior distributions over the hidden variables, parameters and structures. They are computed via an iterative algorithm that is closely related to the Expectation-Maximisation (EM) algorithm and its convergence is guaranteed in an analogous way. Suppose that θ represents the set of parameters in the model and that x and y denote respectively the missing and observed data. Bayesian inference centres on the posterior distribution of θ, given y, which may in theory be obtained as the marginal distribution from the joint distribution of θ and x, given y. In the variational approach, this joint posterior density is approximated by a density qy (x, θ), chosen to maximise  qy (x, θ) log

p(x, y, θ) dx dθ, qy (x, θ)

which is equivalent to minimising the Kullback-Leibler divergence between the exact and approximate joint distributions of θ and x, given y. To be at all useful, the approximation

147

qy (x, θ) must be simple enough for computations to be feasible. This is achieved by neglecting some or even all of the dependences between the variables. Typically qy (x, θ) is assumed to take a factorised form, one factor involving only x and the other involving only θ. Given a suitable prior on θ, the factors corresponding to the parameters turn out to have the same distributional form as the conjugate family that would obtain were there no missing data, if such a conjugate family exists; the appropriate values of the hyperparameters are obtained by optimisation. The literature concerning the variational Bayesian method has been dominated by contexts involving discrete-time models, such as hidden Markov model [1], graphical models [2] [3], mixture models [5] [8] [9] and state space models [6] [7]. In this paper we apply the variational Bayesian approach to linear partially observed continuous-time stochastic processes. This kind of model is widely used in the fields of signal filtering, prediction and control. The EM algorithm for calculating the Maximum Likelihood estimate or Maximum A-Posteriori estimate for partially observed diffusions was developed in [13]. In [14] this algorithm was compared with the direct maximisation method for the MLE of parameters in these models. In [15] [16] the implementation of the E-step was simplified by the use of finite-dimensional filters for integral processes. The purpose of this paper is to derive the variational Bayesian algorithm for continuous-time linear Gaussian systems. In this setting the variational posterior densities for the unobserved states have to be properly replaced by Radon-Nikodym derivatives. We shall see that the resulting algorithm preserves the basic features of the variational methods in discrete-time contexts. Moreover, as in the discrete-time case, there are similarities between the variational Bayesian method and the EM algorithm for the partially observed diffusions model [13]. For example, the Expectation steps of both algorithms normally involve Kalman filtering and smoothing. In this paper we employ the method of conditional moment generating functions for integrals and stochastic integrals to deal with the variational Expectation step without using Kalman smoothing. Section 2 presents the statistical model and the prior distributions of the parameters. In Section 3 we derive the EM-like algorithm, and we show how to implement it in Section 4. Subsection 4.1 implements the variational maximisation step, and Subsection 4.2 describes the variational expectation step, although only sketchily; full details are available in a technical report by the authors. Some numerical experiments are reported in Section 5, and we conclude in Section 6 with some discussion. 2. Statistical model On a measurable space (Ω, A, P ) the following are given: (a) a family of probability measures M = {Pθ , θ ∈ Θ} depending on a parameter θ  (A, C); (b) a pair of stochastic processes X = {Xt , 0 ≤ t ≤ T } and Y = {Yt , 0 ≤ t ≤ T } taking values in IRm and IRd , respectively, such that, under Pθ , dXt = AXt dt + BdWt , dYt = CXt dt + DdVt ,

X0 = ξ, Y0 = 0,

(1) (2)

where {Wt , 0 ≤ t ≤ T } and {Vt , 0 ≤ t ≤ T } are respectively m-dimensional and d-dimensional independent standard Brownian motions, and ξ is an IRm -valued Gaussian variable, independent of the Brownian motions, with mean λ0 and covariance matrix Π0 , and with density denoted by p0 (x). Unlike in the case of discrete-time state-space models, the probability measures induced by stochastic differential equations driven by Brownian motions with different covariance structures are not mutually absolutely continuous, so it is not possible to estimate B and D by the maximum likelihood method. In fact these parameters can be estimated in terms of the quadratic variations

148

of the observation and filtered state processes; see, for example, [15] [17]. Therefore we assume that B and D are known and there is no loss of generality in taking them to be identity matrices. Assume that Ω is the canonical space C([0, T ]; IRm+d ), in which case X and Y are the canonical processes on C([0, T ]; IRm ) and C([0, T ]; IRd ), respectively, and Pθ is the probability law of (X, Y ). Here X = {Xt , 0 ≤ t ≤ T } is the state process, which is not directly observed; rather, the information about its evolution is obtained through the noisy observed process Y = {Yt , 0 ≤ t ≤ T }. Let YT denote the σ-algebra generated by the process Y , and let PθY denote the restriction of Pθ to YT . The framework defined in (1) and (2) ensures that the probability measures in M are mutually absolutely continuous. Thus, if we let θ0 = (A0 , C0 ) be the reference parameter and write Pθ0 as P0 , according to [18], the likelihood function for estimating the parameter θ on the basis of a given observation path {Yt , 0 ≤ t ≤ T } can be expressed as L(θ) 

dPθY = IE0 (ZTθ |YT ), dP0Y

where IE0 denotes the expectation under P0 , and ZTθ is the Radon-Nikodym derivative of Pθ with respect to P0 ; that is ZTθ



dPθ dP0

 1 T (AXt − A0 Xt ) dWt − |AXt − A0 Xt |2 dt 2 0 0  T  T 1  (CXt − C0 Xt ) dVt − |CXt − C0 Xt |2 dt} + 2 0 0  T  T 1 (AXt − A0 Xt ) dWt − |AXt − A0 Xt |2 dt = exp{ 2 0 0   T 1 T   (CXt − C0 Xt ) dYt − Xt (C − C0 ) (C + C0 )Xt dt}, + 2 0 0  = exp{

T

(3)

which is also called the “complete data” likelihood [13]. Using the Bayesian approach, we assign a prior density p0 (θ) with the following structure to the unknown parameters: the ith row vector of the A matrix, denoted by a i , is given a Gaussian and covariance matrix equal to a diagonal matrix α prior with mean μ 0 ; the ith row vector 0  and covariance matrix equal to a , is given a Gaussian prior with mean ν of C, denoted by c 0 i diagonal matrix β0 ; and all these row and column vectors are assumed independent. Thus the posterior probability distribution of the parameter θ is defined by Bayes’ Theorem as IE0 (ZTθ |YT )p0 (θ) L(θ)p0 (θ) = . p(θ|Y ) =  L(θ)p0 (θ)dθ IE0 (ZTθ |YT )p0 (θ)dθ As pointed out in [4] [7], the exact Bayesian treatment would require us to compute marginals of the posterior distribution over all the unknown parameters and hidden states. This involves the cross-integration terms of up to fourth order; for example, expression (3) contains terms in the exponent of the form |CXt − C0 Xt |2 . Therefore, integrating over the parameters and the hidden states would be time-consuming. Now we consider the variational approximation for this continuous-time partially-observed diffusion model, in which the approximate posteriors are computed via an iterative algorithm.

149

3. The variational Bayesian treatment The basic idea of the variational Bayesian method is simultaneously to approximate the intractable joint distribution of both hidden states and parameters with a simpler distribution, usually of a factorised form, corresponding to assuming that the hidden states and parameters are independent. In the case of the continuous-time model, the idea is the same but we need to deal with a Radon-Nikodym derivative rather than a density for the variational posterior of the hidden state. X|Y the probability law of X conditional on the observed process Y with the Denote by P0 fixed parameter θ0 . The ensemble loglikelihood can thus be expressed as   log L(θ)p0 (θ)dθ = log IE0 (ZTθ |YT )p0 (A)p0 (C)dAdC  X|Y = log ZTθ · p0 (A)p0 (C)P0 (dX)dAdC. ˜ We use an approximating conditional distribution Q(X, A, C), which factorises as follows:   ˜ Φ(X, A, C)dQ(X, A, C) = Φ(X, A, C)QA (A)QC (C)QX (dX)dAdC for any function of interest Φ, where QX (·) is a probability measure on the space C([0, T ]; IRm ). By Jensen’s inequality we have  log L(θ)p0 (θ)dθ 

X|Y

dP p0 (A) p0 (C) · · QA (A)QC (C)QX (dX)dAdC · ZTθ · 0 dQX QA (A) QC (C) ⎫ ⎧  ⎨ Z θ · p (A)p (C) ⎬ 0 0 T QA (A)QC (C)QX (dX)dAdC ≥ log X ⎩ dQX|Y · QA (A)QC (C) ⎭ = log

dP0

 F(QX (X), QA (A), QC (C)). The function F is called free energy. For variational Bayesian learning the free energy F is the key quantity with which we work. Learning proceeds with iterative updates of the variational posteriors. The optimum forms of these approximate posteriors can be found by taking functional derivatives of F with respect to each distribution. This results in the following theorem. Theorem 1. The free energy F is maximised by the following distributions: (i) the optimal variational posteriors of the parameters are multivariate Gaussian and satisfy QA (A) ∝ exp{log ZTθ X }p0 (A), QC (C) ∝ exp{log ZTθ X }p0 (C),

(4) (5)

where ·X denotes expectation under QX (X); (ii) the optimal variational posterior of the unobserved state satisfies dQX X|Y dP0

∝ exp{log ZTθ θ },

where ·θ denotes expectation with respect to QA (A) and QC (C).

150

(6)

Proof. The proof of (i) is straightforward if we solve the following variational equations associated with the functional F: δF =0 δQA (A)

and

δF = 0. δQC (C)

For (ii), the functional F can be rewritten as

 dQX θ Q(dX) + constant independent of Q(X) F = log ZT θ − log X|Y dP0

 dQ dQX X|Y X = log ZTθ θ − log · P0 (dX) + constant. · X|Y X|Y dP0 dP0 X|Y

Noting that P0 (·) is a reference probability measure and is independent of Q(X), we take the X|Y functional derivatives of F with respect to dQX /dP0 , and (ii) follows. Theorem 1 suggests that we can maximise the free energy numerically by iterating between the variational posteriors given by (i) and (ii). Therefore, we obtain the following EM-like algorithm. VE Step: compute log ZTθ X , and obtain the variational posteriors of A and C according to (i): ai ∼ N (μi , αi ) and ci ∼ N (νi , βi ). VM Step: compute log ZTθ θ , and obtain the variational posterior of the hidden state according to (ii), namely, the Radon-Nikodym derivative of the variational distribution with respect X|Y to the reference measure P0 . 4. Implementation of the variational approximation Practical implementation of the iterative algorithm provided in the previous section is not simple, mainly because the VE step involves the conditional expectation under a measure on the space of continuous functions on [0, T ]. In general, this kind of expectation results in nonlinear filtering and smoothing problems which do not have any analytical form. However, under our linear setting, finite-dimensional solutions exist, as demonstrated in the sequel. 4.1. VM step Since the variational posteriors of A and C are Gaussian the computation of the VM step is straightforward. From (6) one has dQX X|Y

dP0

∝ exp{log ZTθ θ }

  T 1 T  Xt (A − A0 ) A dWt − Xt (A − A0 ) (A − A0 )A Xt dt = exp{ 2 0 0  T  T 1 Xt (C − C0 ) C dYt − Xt (C − C0 ) (C + C0 )C Xt dt} + 2 0 0  ΓT . ¯  = (μ1 , · · · , μm ), V¯  = (ν1 , · · · , νm ) and Define U ¯ − A0 , F1 = U

¯ − A0 ) (U ¯ − A0 ) + diag(tr(α1 ), · · · , tr(αm )), H1 = (U

151

H2 = (V¯ − C0 ) (V¯ − C0 ) + diag(tr(β1 ), · · · , tr(βd )).

F2 = V¯ − C0 , Then ΓT can be rewritten as ΓT

  T 1 T    = exp{ Xt F1 dWt − Xt H1 Xt dt 2 0 0   T 1 T  Xt F2 dYt − Xt H2 Xt dt}. + 2 0 0 X|Y

Therefore, we obtain dQX /dP0



it follows that K=

= KΓT . Since  dQX X|Y · P0 (dX) = 1, X|Y dP0 X|Y

ΓT · P0

−1 −1 (dX) = IE0 [ΓT |YT ] .

This Radon-Nikodym derivative implies that {Xt , 0 ≤ t ≤ T } is no longer a diffusion process under the variational distribution QX ; that is, it does not satisfy any stochastic differential equation under QX . 4.2. VE step Given an observation path {Yt , 0 ≤ t ≤ T }, ZTθ is a functional on the space C([0, T ]; IRm ) of IRm -valued continuous functions on [0, T ]. Now we consider the expectation of log ZTθ under the probability measure QX on C([0, T ]; IRm ). ¿From the VM step, we have  θ log ZTθ · Q(dX) log ZT X =  dQX X|Y · P0 (dX) = log ZTθ · X|Y dP0

dQ X |YT . = IE0 log ZTθ · X|Y dP0 Hence, from (4), (5) and (3), we obtain, for each i,    dQX |Y p0 (A) QA (A) ∝ exp IE0 log ZTθ · T X|Y dP0   T  dQX Xt (A − A0 ) dWt · |Y ∝ exp IE0 T X|Y 0 dP0   dQX 1  T   Xt (A − A0 ) (A − A0 )Xt dt · |YT p0 (A), − IE0 X|Y 2 0 dP0 which gives

  ˜  −1 −1 ˜ −1 −1 ˜ −1 , QA (a i ) ∼ N (Si + a0,i S + μ0 α0 )[α0 + S] , [α0 + S]

152

with S˜ = IE0



T

Xt Xt dt ·

0



T

Si = IE0

Xt dWti ·

0

 |YT ,

(7)

 |YT ,

(8)

dQX X|Y

dP0

dQX X|Y

dP0

where {Wti } is the ith component of {Wt }. Similarly,

  ˜  −1 −1 ¯ ˜ −1 −1 ˜ −1 , QC (c i ) ∼ N (Si + a0,i S + ν0 β0 )[β0 + S] , [β0 + S] where S¯i = IE0



T

Xt dYti ·

0

dQX X|Y

dP0

 |YT ,

(9)

in which {Yti } is the ith component of {Yt }. Therefore we see that, similarly to the discrete-time case [4] [6] [7], in general, the VE step involves filtered estimates of the (stochastic) integrals of the unobserved state and their exponents that normally have to be computed via a Kalman smoothing procedure. These do not have any analytical form and require large memory in any numerical implementation. In laying the groundwork for their version of the EM algorithm, Elliott and Krishnamurthy [15] developed the method of conditional moment generating functions for integrals and stochastic T T integrals in order to compute the estimates such as IE0 [ 0 Xti Xtj dt|YT ], IE0 [ 0 Xtj dWti |YT ] T j i and IE0 [ 0 Xt dYt |YT ]. Next we employ this technology to obtain the explicit form of the VE step without using Kalman smoothing but using the Kalman filter, which is much simpler than the former. Instead of using a different group of stochastic partial differential equations for calculating the conditional expectation of each integral process as presented in [15] [16], we use a method of ensemble computation in which only a single group of stochastic partial differential equations is needed to deal with all the conditional expectations (7)-(9). In fact, if we write  T  T  T Xt R1 Xt dt + Xt R2 dWt + Xt R3 dYt , (10) ΥT = 0

0

0

then the conditional expectations (7)-(9) can be computed as   dQX |Y IE0 ΥT · T , X|Y dP0 for suitable matrices R1 ∈ IRm×m , R2 ∈ IRm×m and R3 ∈ IRm×d . After considerable algebra, details of which are available in the accompanying technical report, we finally achieve that

dQX |YT IE0 ΥT · X|Y dP0  T ˆ0 ¯ 0 ) + (r 0 ) ( 1 G G0 − F0 )X tr(Σ0t K0 + Pt0 K = t t 2 0 0  1 1 1 0   0 0    0 ¯ G0 + G0 G ˆ t ) ( G0 G0 − F0 )rt + (X ˆt ) ( G ˆ t dt ¯ 0 − F¯0 )X +(X 2 2 0 2  T ¯ 0X ˆ 0 + H0 r 0 ) dYt . (H + t t 0

153

In the above, 1 1 1 F0 = C0 C0 + H1 + H2 , G0 = F1 , 2 2 2 1  H0 = F2 + C0 , K0 = (G0 G0 + H0 H0 − 2F0 ), 2 ¯ ¯ 0 = R3 , ¯ F0 = −R1 , G0 = R2 , H  ¯ ¯ ¯ ¯ 0 = 1 (G ¯  G0 + G ¯ K 0 G0 + H0 H0 + H0 H0 − 2F0 ). 2 0 ˆ t0 and Pt0 satisfy respectively the stochastic and ordinary differential equations In addition, X ˆ 0 dt + P 0 H  dYt , X ˆ 0 = λ0 , ˆ 0 = (A0 + 2P 0 K0 + G0 − P 0 H  H0 )X dX t t t 0 t t 0 0 0 0 0  0  0 ˙ Pt = (A0 + G0 )Pt + Pt (A0 + G0 ) − Pt (H0 H0 − 2K0 )Pt + 1, P00 = Π0 , and rt0 and Σ0t satisfy ¯ 0 )dYt G − 2P 0 F + G0 )rt0 dt + (Σ0t H0 + Pt0 H drt0 = (A0 + Pt0 G  0  0 0  t 0 ¯ 0 G0 + G0 G ¯ 0 − 2F¯0 ) + Pt (G  ¯ ˆ0 +Σ0t (G 0 G0 − 2F0 ) + G0 Xt dt, 0 0 ˙ 0t = (A0 + G0 + Pt0 G Σ 0 G0 − 2Pt F0 )Σt 0  +Σ0t (A0 + G0 + Pt0 G 0 G0 − 2Pt F0 ) 0 ¯ 0 P 0 + P 0 (G ¯  G0 + G G ¯ ¯ + G ¯ +Pt0 G 0 t t 0 0 0 − 2F0 )Pt ,

with initial conditions r00 = Σ00 = 0. Thus the integrals (7), (8) and (9) can be computed correspondingly for suitable R1 , R2 and R3 , and the VE step is implemented explicitly. 5. Numerical experiments In this section, we look at the special case of the one-dimensional autoregressive system, dXt = AXt dt + dWt , dYt = hXt dt + dVt ,

X0 = 0, Y0 = 0,

(11) (12)

where A is an unknown parameter which is assigned a normal prior distribution with mean μ0 and variance α0 , and h is a known gain. We seek the variational posterior of A. The reference parameter A0 is taken to be 0. VM step: Since, according to the variational posterior for A, A ∼ N (μ, α), we have   T 1 T 2 dQX ∝ exp{ μX dW − (μ + α)Xt2 dt}. (13) t t X|Y 2 0 0 dP0 VE step: for our special model we have QA (A) ∼ N ( with



T

Xt2 dt

a = IE0 

μ0 + bα0 α0 , ), 1 + aα0 1 + aα0

0 T

b = IE0

·

dQX

Xt dWt ·

0

154

X|Y

dP0

|YT ,

dQX X|Y

dP0



 |YT .

(14)

We take R3 = 0 in (10), so that 1 2 (μ + α + h2 ), F¯0 = −R1 , 2 ¯ 0 = R2 , H0 = h, H ¯ 0 = 0, G0 = μ, G 1 ¯ 0 = R1 + R2 μ. K0 = − α, K 2 F0 =

ˆ t0 , Pt0 , rt0 and Σ0t satisfy the equations The corresponding X ˆ 0 dt + hP 0 dYt , ˆ 0 = [μ − (h2 + α)P 0 ]X dX t t t t P˙t0 = 2μPt0 − (h2 + α)(Pt0 )2 + 1, drt0 = [μ − (h2 + α)Pt0 ]rt0 dt + hΣ0t dYt ˙ 0t Σ

ˆ 0 dt, +[R2 + 2(R1 + R2 μ)Pt0 − (h2 + α)Σ0t ]X t = 2[μ − (h2 + α)Pt0 ]Σ0t + 2R2 Pt0 +2(μR2 + R1 )(Pt0 )2 ,

(15) (16) (17) (18)

ˆ 0 = P 0 = r 0 = Σ0 = 0. with initial conditions X 0 0 0 0 Thus, we have

dQX |YT IE0 ΥT · X|Y dP0   T 1 0 0 2 0 ˆ0 0 2 ˆ − Σs α + Ps (R1 + R2 μ) − (h + α)rs Xs + (R1 + R2 μ)(Xs ) ds = 2 0  T hrs0 dYs . + 0

To compute a and b, we take R1 = 1, R2 = 0 and R1 = 0, R2 = 1 respectively in the last expression. We choose T = 20 and discretise time into steps of Δt = 0.01. The SDE’s (11) and (12) are simulated by the Euler time-discretisation scheme. For example, (11) is approximated by √ xn+1 = xn + Axn Δt + Δt · wn , in which {wn } is a standard Gaussian white noise sequence. Solution of equations (15)-(18) is described in our technical report; alternatively they can be sampled by a discretisation scheme. We generate a sample of size T /Δt using A = −1 and h = 10 so that our results are comparable with [13], and then compute the mean and variance of A’s variational posterior by iterating (13) and (14) using different hyperparameters (μ0 , α0 ) in the prior distributions. Some of the experimental results are summarised in Table 1. The parameter A can also be estimated by the maximum likelihood method, and the estimate can be computed using the EM algorithm; see [13] [15]. In Table 2 we list the computational results using the method of [15] for different initial guesses μ0 . The experiments show that the VEM and EM algorithms have almost the same computational burden, though Bayesian inference needs one to evaluate the normalising constant more often than in the maximum likelihood method, and both become close to the true value quickly, even from initial guesses that are far from the true values. However, from Tables 1 and 2 we see that, if it converges, the EM algorithm always converges to the same point for different initial values, whereas the stationary points of the VEM algorithm differ a little, though all of them

155

Table 1. Experimental results for the VEM algorithm, displayed as posterior means, with variances in parentheses. Results are shown for the first two iterations and then iterations at which the various means have converged to 4 decimal places. The estimates are not displayed after convergence has occurred. Prior

-1.0 (4.0)

-10.0 (100.0)

10.0 (100.0)

-50.0 (100.0)

20.0 (100.0)

25.0 (100.0)

Iteration 1

-1.0156 (0.0950) -0.9518 (0.0889) -0.9502 (0.0887)

-5.6210 (0.5512) -1.5001 (0.1457) -0.9764 (0.0929) -0.9574 (0.0906)

-6.5221 (0.5695) -1.6849 (0.1674) -0.9665 (0.0938) -0.9389 (0.0906)

-47.6119 (4.5237) -44.0303 (4.1832) -40.0878 (3.8082) -30.4072 (2.8875) -7.8165 (0.7401) -0.9946 (0.0908)

-28.9921 (1.8395) -20.8580 (2.1206) -11.7174 (1.1898) -1.1521 (0.1145) -0.9296 (0.0906)

-91.0143 (4.5063) -80.3540 (8.2243) -71.2609 (7.2935) -55.9601 (5.7266) -35.5292 (3.6338) -1.0685 (0.1063) -0.9250 (0.0905)

2 3 5 8 13 16

Table 2. EM experimental results. ‘NaN’ means ‘not a number’ and indicates that convergence has failed. Results are shown for the first iteration and then iterations at which the various means have converged to 4 decimal places. The estimates are not displayed after convergence has occurred. Initial Value

-1.0

-10.0

10.0

-50.0

20.0

25.0

Iteration 1 2 5 11 16 17

-0.9487 -0.9475

-2.9777 -1.0688 -0.9475

-4.4837 -1.2629 -0.9475

-46.3949 -42.3395 -25.6663 -0.9475

-257.7984 NaN

71.1369 74.4490 83.1805 97.4681 112.2916 115.8845

are close to the true value; this is a consequence of the different values taken by the prior’s hyperparameters. On the other hand, when the initial values are too far away from the true values (as in the cases μ0 = 20.0, 25.0 in Tables 1 and 2), the EM algorithm no longer converges but the VEM algorithm may still converge to a point close to the true value. This indicates that the VEM algorithm is more robust than the EM algorithm. 6. Discussion If data are observed only at discrete times, a similar variational algorithm can be developed along the above lines and using the technology introduced in [19]. This algorithm will be the same as that presented in [6] [7] for state-space models, but the implementation is different. The

156

former uses only the Kalman filter, whereas both the Kalman filter and smoother are needed for the latter. Our method can also be applied to certain more general cases than linear models. If the systems are linear in the parameters and nonlinear in the state, similar algorithms can be developed and explicit implementation exists for certain models, such as the Benes processes [20], as well as the systems discussed in [21] and in [22] [23]. The VE step cannot be computed explicitly for general nonlinear models, but the Galerkins approximation can be used to compute the conditional expections involved in the VE step; see for example [17]. Variational inference for stochastic dynamical systems is receiving more and more attention because of its potentially wide applications. While we were finishing this paper, we became aware of a project entitled “Variational Inference in Stochastic Dynamic Environmental Models” (VISDEM) [24]. One of the aims of this project is to study systematically the application of variational Bayesian methods to inference in stochastic dynamical systems and to investigate their applications in weather and climate forecasting. Acknowledgments This work was supported by a grant from the UK Science and Engineering Research Council. This version was compiled while the second author was a Visiting Fellow on the ‘Statistical Theory and Methods for Complex High-Dimensional Data’ Research Programme at the Isaac Newton Institute for Mathematical Sciences in Cambridge. References [1] MacKay D J C 1997 Cavendish Laboratory Report, University of Cambridge [2] Attias H 1999 Proc. 15th Conf. on Uncertainty in Artificial Intelligence (Stockholm), ed H. Prade and K Laskey (Morgan Kaufmann) pp 21–30 [3] Attias H 2000 Advances in Neural Information Processing Systems vol 12, ed S Solla, T Leen and K R Muller (Cambridge MA: MIT Press) pp 209–15 [4] Beal M J 2003 PhD thesis, University College London [5] Corduneanu A and Bishop C M 2001 Proc. 8th Int. Conf. on Artificial Intelligence and Statistics, ed T Richardson and T. Jaakkola (Morgan Kaufmann) pp 27–34 [6] Ghahramani Z and Beal M J 2001a Advances in Neural Information Processing Systems vol 13, ed T Leen, T Dietterich and V Tresp (Cambridge MA: MIT Press) pp 507–13 [7] Ghahramani Z and Beal M J 2001b Technical report, University College London [8] Humphreys K and Titterington D M 2000 COMPSTAT2000, ed J G Bethlehem and P G M van der Heijden (Heidelberg: Physica-Verlag) pp 331–6 [9] Penny W D and Roberts S J 2000 Technical Report PARG-2000-01, Oxford University [10] Saul L K, Jaakkola T and Jordan M I 1996 J. Artific. Intell. Res. 4 61–76 [11] Waterhouse S, MacKay D and Robinsion T 1996 Advances in Neural Information Processing Systems vol. 8, ed D S Touretzky, M Mozer and M Hasselmo (Cambridge MA: MIT Press) pp 351–7 [12] Hinton G E and van Camp D 1993 Proc. 6th Annual ACM Conf. on Computational Learning Theory (Santa Cruz CA) (New York: ACM Press) pp 5–13 [13] Dembo A and Zeitouni O 1986 Stoch. Process. Applic. 23 91–113 [14] Campillo F and LeGland F 1989 Stoch. Process. Applic. 33 245–74 [15] Elliott R J and Krishnamurthy V 1997 SIAM J. Control Optim. 35 1908–23

157

[16] Charalambous C D, Elliott R J and Krishnamurthy V 2003 SIAM J. Control Optim. 42 1578–603 [17] Charalambous C D and Logothetis A 2000 IEEE Trans. Automat. Control 5 928–34 [18] Liptser R S and Shiryayev A N 1977 Statistics of Random Processes: I, II (New York: Springer-Verlag) [19] Charalambous C D and Logothetis A 1998 Proc. 37th IEEE Conf. on Decision and Control (Tampa FL) [20] Benes V 1981 Stochastics 5 65–92. [21] Haussmann U and Pardoux E 1988 Stochastics 23 241–75 [22] Charalambous C D and Elliott R J 1997 IEEE Trans. Automat. Control 42 482–97 [23] Charalambous C D and Elliott R J 1998 SIAM J. Control Optim. 36 542–78 [24] Cornford D 2006 VISDEM: Variational inference in stochastic dynamic environmental models http://www.ncrg.aston.ac.uk/˜cornfosd/VISDEM/

158

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Mathematical Structures of Loopy Belief Propagation and Cluster Variation Method Kazuyuki Tanaka Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki-aza-aoba, Aoba-ku, Sendai 980-8579, Japan E-mail: [email protected] Abstract. The mathematical structures of loopy belief propagation are reviewed for graphical models in probabilistic information processing in the stand point of cluster variation method. An extension of adaptive TAP approaches is given by introducing a generalized scheme of the cluster variation method. Moreover the practical message update rules in loopy belief propagation are summarized also for quantum systems. It is suggested that the loopy belief propagation can be reformulated for quantum electron systems by using density matrices of ideal quantum lattice gas system.

1. Introduction Advanced mean field methods are powerful applications in probabilistic information processing[1]. Many computer scientists, statistical scientists as well as physicists are interested in the mathematical structures of mean field theory. As one of advanced mean field methods, we have loopy belief propagations (LBP)[2, 3, 4] They have been applied to many problems in computer sciences. One of the successful applications is to probabilistic image processing [5, 6, 7]. Others is to construct algorithms for error correcting codes and other communication technologies[8, 9, 10]. In probabilistic inferences, LBP has been applied also to probabilistic inferences[11, 12, 13]. Recently, it is suggested that LBP can be derived from the cluster variation method (CVM)[4]. The CVM is one of statistical mechanical methods and is an extension of mean field theory[14, 15, 16, 17, 18, 19]. Cycles corrections in loop belief propagations have been discussed[20, 21, 22, 23, 24]. The convergence of algorithms in LBP have been investigated[25]. Some authors have discussed the accuracy of LBP in statistical inferences based on Gaussian graphical models by comparing results obtained through LBP with those based on exact calculation[26, 27, 28]. Averages, variance and covariances in Gaussian graphical model can be calculated by using multi-dimensional Gaussian integral formulas. Another approach of advanced mean field methods is an adaptive TAP (Thouless-AndersonPalmer) method[29, 30, 31]. The method is formulated by using the calculations of some statistical quantities in Gaussian graphical models and is applicable to the computations in nontrivial graphical models. Some authors have applied it to some practical problems in computer sciences[32, 33]. Recently, it is expected that the success of advanced mean-field approaches to probabilistic information processing is extended to quantum statistical mechanical approaches[35, 36, 37] The

159

CVM which can derive some algorithms of LBP has been formulated also in the case of quantum systems[38, 39, 40]. In the present paper, the mathematical structure of LBP and an extension of the adaptive TAP approaches is given. Moreover we give the practical message update rules in LBP. The practical message update rules in loopy belief propagation are summarized also for quantum systems. In section 2, we give brief reviews of conventional LBP. In section 3, we propose an extension of the loopy belief propagation by using CVM for Gaussian graphical models. In section 4, a new interpretation of the adaptive TAP approaches by means of the framework in section 3. In section 5, we derive message passing formulas of the conventional LBP for quantum systems in statistical-mechanical informatics. Section 6 is concluding remarks and we mention the relationship between conventional mean-field approaches to quantum electron systems and LBP. Our suggestions in section 6 is based on Refs.[39, 40]. 2. Loopy Belief Propagation In this section, we survey the conventional LBP. The message update rules are derived by using the scheme of CVM. Messages of LBP corresponds to effective fields in the statistical mechanics. In order to explain the framework of LBP, we should define some notations for hyperedges. Hyperedge is a set of nodes. When a node i belongs to a hyperedge γ, we call i an element of γ and we express it in terms of the notation i∈γ. When the node i belongs to a hyperedge γ, i can be regarded as a proper subset of γ and use the notation i < γ. When all the node i in the hyperedge γ 0 belong to a hyperedge γ, γ 0 can be regarded as a subset of γ and use the notation γ 0 ≤γ. We denote the set of all the nodes by V . First of all, we have to specify a set of hyperedges, E. Every hyperedges must not be a subset of another element in the set of hyperedges, E. Every common set of any two or more hyperedges is only a node. We denote the set of hyperedges by E. We consider such a set Vc of clusters that a hyperedge or a node is in C if and only if it is the common node of two or more hyperedges in E. A random variable xi is associated with every node i∈V . Every random variable x i takes +1 and −1. A random variable vector defined for the set of all the nodes V is denoted by ~xV ≡(x1 , x2 , · · ·, x|V | )T . Suppose that the nodes γ1 , γ2 , · · ·, γ|γ| belonging to the heperedge γ are ordered so that γ1 < γ2 < · · · < γ|γ| . Let the |γ|-dimensional vectors ~xγ be defined by ~xγ ≡(xγ1 , xγ2 , · · ·, xγ|γ| )T . We consider the following probability distribution for random variable vector ~x V : Y

fγ (~xγ )

{γ|γ∈E}

PV (~xV ) ≡ X Y

fγ (~zγ )

,

(1)

~ zV {γ|γ∈E}

Here ~zV ≡ z1 =±1 z2 =±1 · · · z|V | =±1 is the summation over all the possible configuration for the random variable vector. The purpose of the present paper is to calculate the marginal probabilities P

P

Pi (xi ) ≡

P

X

P

δxi ,zi PV (~zV ) (i∈V ),

Pγ (~xγ ) ≡

~ zV

X

δ~xγ ,~zγ PV (~zV ) (γ∈E).

(2)

~ zV

Now we consider the Kulback-Leibler divergence D[P V ||QV ] defined by D[PV ||QV ] ≡

X

QV (~zV )ln

~ zV

160

  Q (~ V z)

PV (~z )

.

(3)

By substituting Eq.(1) to Eq.(3), we have D[PV ||QV ] = −

XX

Qγ (~zγ )ln(fγ (~zγ )) +

γ∈E ~ zγ

X

QV (~zV )lnQV (~zV ) + ln

X Y

~ zV γ∈E

~ zV



fγ (~zγ ) .

(4)

We approximately restrict the trial probability distribution Q V (~xV ) to the following form: QV (xV ) =

Y

Qγ (~xγ )

γ∈E

 Y

Qi (xi )−|∂i|+1

i∈V



(5)

By using Eq.(5), Eq.(4) can be rewritten as D[PV ||QV ] = F[{Qγ |γ∈Vc ∪E}] + ln

X Y



fγ (~zγ ) ,

~ zV γ∈E

(6)

where F[{Qγ |γ∈Vc ∪E}] ≡ −

XX

Qγ (~zγ )ln(fγ (~zγ ))

γ∈E ~ zγ



X

(|∂i| − 1)

X zi

i∈V

XX

Qi (zi )lnQV (zi ) +

Qγ (~zγ )lnQγ (~zγ ).

(7)

γ∈E ~ zγ

The marginal probability distributions Q i (xi ) and Qγ (~xγ ) satisfy the following consistencies: X

zi Qi (zi ) =

~ zγ

X

zi Qγ (~zγ ) (i∈Vc , γ∈∂i)

(8)

Qγ (~zγ ) = 1 (i∈Vc , γ∈E)

(9)

~ zγ

and the normalization conditions: X

Qi (zi ) =

zi

X ~ zγ

We introduce some Lagrange multipliers λ i,γ , λi and λγ to ensure the consistency conditions (8) and the normalization conditions (9) as follows: L[{Qγ |γ∈Vc ∪E}] ≡ F[{Qγ |γ∈Vc ∪E}] −

X

λγ (

γ∈E



X

i∈Vc

λi (

X

Qi (zi ) − 1) −

zi

XX

X

Qγ (~zγ ) − 1)

~ zγ

λi,γ (

X

zi Qi (zi ) −

zi

i∈Vc γ∈∂i

X

zi Qγ (~zγ ))

(10)

~ zγ

By taking the variational calculations of L[{Q γ |γ∈Vc ∪E}] in Eq.(10) and by driving the extremum conditions, the marginal probabilities {P γ (~xγ )|γ∈E} and {Pi (xi )|i∈C\E} are approximately expressed in terms of the Lagrange multipliers as follows: fγ (~xγ )exp Qγ (~xγ ) = X





X



zi





exp λi,i zi

161



λi,γ zi

{i|i≤γ,i∈Vc }

exp λi,i xi

Qi (xi ) = X

λi,γ xi

{i|i≤γ,i∈Vc }

fγ (~zγ )exp



X





(i∈Vc ),

(γ∈E),

(11)

(12)

where λi,i (xi ) are determined so as to satisfy the following equations: X

(|∂i| − 1)λi,i =

λi,γ

(i∈Vc ).

(13)

{γ|γ>i, γ∈E}

The marginal probability distributions in Eqs.(11) and (12) can be reduced to the representations in LBP by introducing new parameters h γ 0 →i defined from the Lagrange multipliers λi,γ by means of the following linear transformation: X

λi,γ =

hγ 0 →i ,

(14)

{γ 0 |γ 0 ∈∂i\γ}

The quantities exp(hγ 0 →i xi ) corresponds to the message that hyperedge γ sends to node i in LBP. The new parameter hγ 0 →i is referred to as an effective field in the statistical mechanics. By using Eq.(14), the approximate marginal probability distributions Q γ (~xγ ) and Qi (xi ) are rewritten as follows: Qγ (~xγ ) =

 X  X 1 fγ (~xγ )exp hγ 0 →j xj Zγ {j|j∈γ}{γ 0 |γ 0 ∈∂j\γ}

Qi (xi ) =

  X 1 exp hγ 0 →i xi Zi {γ 0 |γ 0 ∈∂i}

(γ∈E).

(i∈Vc ).

(15)

(16)

From Eq.(8), it is valid that X

Qi (xi ) =

δzi ,xi QV (~zγ ).

(17)

~ zγ

By substituting Eqs.(15) and (16) to Eq.(17), we derive the simultaneous fixed point equations for effective fields as follows: exp(hγ\i→i xi ) =

  X X Zi X δzi ,xi fγ (~zγ )exp hγ 0 →j zj (i∈Vc , γ∈∂i), Zγ ~z {j|j∈γ\i}{γ 0 |γ 0 ∈∂j\γ}

(18)

γ

such that hγ→i =

X xi

xi ln

nX

δzi ,xi fγ (~zγ )exp



X

X

hγ 0 →j zj

{j|j∈γ\i}{γ 0 |γ 0 ∈∂j\γ}

~ zγ

o

(i∈Vc , γ∈∂i).

(19)

We remark that hγ→i is referred to as a effective field and exp(h γ→i xi ) corresponds to a message from the hyperedge γ to the node i. The marginal probability distributions Q i (xi ) and Qγ (~xγ ) in Eq.(8) can be expressed in terms of expectation values of products of random variables as follows: 1 1 + hxi ii xi (i∈Vc ), 2 2 Y 1 1 X Y Qγ (~xγ ) = |γ| + |γ| h xk i γ xk 2 2 γ 0 ≤γ k∈γ 0 k∈γ 0

Qi (xi ) =

(20) (γ∈E),

(21)

where hxi ii ≡

X

zi =±1

zi Qi (zi ),

h

Y

xk iγ ≡

k∈γ 0

X Y

(

~ zγ k∈γ 0

162

xk )Qγ (~zγ ).

(22)

From Eq.(21), we have X

δxi ,zi Qγ (~zγ ) =

~ zγ

1 1 + hxi iγ xi . 2 2

(23)

By comparing Eq.(23) with Eq.(20), we see that the consistency condition (8) is equivalent to the following reducibility condition: Qi (xi ) =

X

δxi ,zi Qγ (~zγ ) (xi = ±1, i∈Vc , γ∈∂i).

(24)

~ zγ

After setting mi = hxi ii = hxi iγ and mγ 0 = h k∈γ 0 xk iγ , we substitute Eqs.(20)-(21) to Eq.(25) and rewrite the approximate free energy F[{Q γ |γ∈Vc ∪E}] as follows: Q

F[{Qγ |γ∈Vc ∪E}] = −|E| −

XX

wγ 0 mγ 0

γ∈E γ 0 ≤γ



X

(|∂i| − 1)

zi

i∈V

+

X 1

X X 1

γ∈E ~ zγ

2|γ|

+

 1  1 1 + mi zi ln + mi zi 2 2 2 2

  1  Y Y 1 X 1 X 0( 0( m z ) ln + m z ) , γ k γ k 2|γ| γ 0 ≤γ 2|γ| 2|γ| γ 0 ≤γ k∈γ 0 k∈γ 0

(25)

where wγ 0 ≡

1 X Y ( zk )lnfγ (~zγ ) (γ 0 ≤γ, γ∈E). 2|γ| ~z k∈γ 0

(26)

γ

By taking the first deviation with respect to m i (i∈V ) and mγ 0 (γ 0 ≤γ, γ∈E), we can derive the simultaneous equations For the case that E is the set of edges, the present framework ha been given in Ref.[41]. Moreover, the framework has been extended to higher level approximation methods in Ref.[42]. 3. An Extension of Loopy Belief Propagation The conventional LBP has been formulated by means of a set of marginal probability distributions for hyperedges as shown in the previous section. In such formulation, it is hard to calculate some covariances between a pair of nodes which do not belong to a hyperedge. In the present section, we give an extension of LBP in the stand point of CVM. This framework is constructed by combining LBP for graphical models with discrete random variables with the one for Gaussian graphical models. In the similar way of the conventional LBP, we introduce marginal probability distributions P P Qi (xi ) ≡ ~zV δzi ,xi QV (~zV ) and Qγ (~xγ ) ≡ ~zV δ~zγ ,~xγ QV (~zV ). Here the random variable xi at each node i takes +1 and −1. Moreover we consider marginal probability density functions ρi (ξi ), ργ (ξ~γ ) and ρV (ξ~V ), where the random variable ξi at each node i takes any real number in the interval (−∞, +∞). These marginal probability density functions have the same average, variance and covariance as the ones of the marginal probability distributions Q i (zi ) (i∈Vc ) and Qγ (~zγ ) (γ = ∂i∩∂j∈E). Instead of Eq.(7), we consider the following approximate free energy: F[ρV , {Qγ , ργ |γ∈Vc ∪E}] ≡−

XX

Qγ (~zγ )ln(fγ (~zγ )) +

γ∈E ~ zγ

163

Z

ρV (ζ~V )ln(ρV (ζ~V ))dζ~V

+

X X

γ∈E



X

Qγ (~zγ )ln(Qγ (~zγ )) −

~ zγ

(|∂i| − 1)

X

Z

ργ (ζ~γ )ln(ργ (ζ~γ ))dζ~γ Z

Qi (zi )lnQi (zi ) −

zi

i∈Vc



ρi (ζi )ln(ρi (ζi ))dζi



(27)

The marginal probabilities for hyperedges and nodes are determined so as to minimize it under the consistency conditions: Z Z Z X X  ~ ~  z P (z ) = z P (~ z ) = ζ ρ (ζ )dζ = ζ ρ ( ζ )d ζ = ζi ρV (ζ~V )dζ~V (i∈Vc ), i i i i γ γ i i i i i γ γ γ     z i ~ zγZ  Z Z     ~ ~  ζi ρi (ζi )dζi = ζi ργ (ζγ )dζγ = ζi ρV (ζ~V )dζ~V (i∈V \Vc ), Z Z Z  2 2 ~ ~  ζ ρ (ζ )dζ = ζ ρ ( ζ )d ζ = ζi 2 ρV (ζ~V )dζ~V = 1 (i∈V ),  i i i i i γ γ γ   Z Z  X     zi zj Pγ (~zγ ) = ζi ζj ργ (ζ~γ )dζ~γ = ζi ζj ρV (ζ~V )dζ~V (i∈V, j∈V, γ = ∂i∩∂j∈E),  

(28)

~ zγ

and the normalization conditions: X zi

Pi (zi ) =

X

Z

Pγ (~zγ ) =

~ zγ

ρi (ζi )dζi =

Z

ργ (ζ~γ )dζ~γ =

Z

ρV (ζ~V )dζ~V = 1.

(29)

By taking the variational calculations and by driving the extremum conditions, the marginal probabilities Qγ (~xγ ), ργ (~xγ ) (γ∈E), Qi (xi ), ρi (ξi ) (i∈Vc ) and ρV (ξ~V ) are approximately expressed in terms of the Lagrange multipliers as follows: 

Tx fγ (~xγ )exp ~hγ ~xγ − 12 ~xT γ γ Dγ ~

Qγ (~xγ ) = X

fγ (~zγ )exp ~hγ ~zγ − 12 ~zγT DγT ~zγ



ρV g (ξ~V ) = Z ργg (ξ~γ ) = Z







exp ~hV g ξ~V − 12 ξ~VT DV g ξ~V 



(γ∈Vc ∪E),



(31)



exp ~hV g ζ~V + 12 ζ~VT DV g ζ~V dζ~V 

exp ~hγg ξ~γ − 12 ξ~γT Dγg ξ~γ 





exp ~hγg ζ~γ − 12 ζ~γT Dγg ζ~γ dζ~γ

(30)

(γ∈Vc ∪E),

(32)

Here hi , hig , ~hγ , ~hγg , ~hV g , Di , Dig , Dγ , Dγg , DV g are determined so as to satisfy the

164

consistencies (28), the normalizations (29) and the following linear equations:                            

(|∂i| − 1)(hig − hi ) + ~hV g |ii − (|∂i| − 1)Dig + hi|DV g |ii −

X

(~hγg |ii − ~hγ |ii) = 0 (i∈Vc )

Xγ∈∂i

hi|Dγg |ii = 0 (i∈Vc )

γ∈∂i

−hi|Dγg |ii + hi|DV g |ii = 0 (i∈V \Vc ) −~hγg |ii + ~hγ |ii + ~hV g |ii = 0 (i∈V \Vc ) .

(33)

   −hi|Dγg |ji + hi|Dγ |ji + hi|DV g |ji = 0 (i∈V, j∈V, γ = ∂i∩∂j∈E)          hi|DV g |ji = 0 (i∈V, j∈V, ∂i∩∂j ∈E) /         hi|Dγ |ii = 0 (i∈V, i∈γ, γ∈E)       

Di = 0 (i∈Vc )

The matrices Dγ , Dγg and DV g are symmetric matrices. By using Eq.(28) and Eq.(33) and by introducing a new parameter h γ→i ≡hi −~hγ |ii, we derive the following simultaneous equations for Lagrange multipliers h i , hig , ~hγ , ~hγg , ~hV g , Di , Dig , Dγ , Dγg , DV g as follows: mi =

X

zi Pi (zi ) (i∈Vc )

(34)

zi Pγ (~zγ ) (i∈V \Vc , γ∈∂i)

(35)

zi

mi =

X ~ zγ

hi|Dγ |ii = 0 (i∈V, i∈γ, γ∈E) hi|Dγ |ji =

(36)

1 X zi zj ln(Pγ (~zγ )) (i∈V, j∈V, γ = ∂i∩∂j∈E) 2|γ| ~z

(37)

γ

Dig = (1 − mi 2 )−1 (i∈V )

(38)

hi|Dγg −1 |ii = 1 − mi 2 (γ∈E, i < γ)

(39)

hi|Dγg −1 |ji =

(40)

X

zi zj PV (~zγ ) − mi mj (γ∈E, i < γ, j < γ)

~ zγ

hi|DV g |ii = −(|∂i| − 1)Dig +

X

hi|Dγg |ii (i < V, j < V )

(41)

γ∈∂i

hi|DV g |ji = hi|Dγg |ji − hi|Dγ |ji (i∈V, j∈V, ∂i∩∂j∈E)

(42)

hi|DV g |ji = 0 (i∈V, j∈V, ∂i∩∂j = φ)

(43)

~hV g |ii = m ~T V g DV g |ii (i∈V )

(44)

~hγg |ii = m ~T γg Dγg |ii (i∈γ, γ∈E)

(45)

165

hig = mi Dig

(46)

~hγ |ii = −(|∂i| − 1)hi,i − ~hV g |ii +

X

~hγ 0 g |ii +

γ 0 ∈∂i

hγ 0 →i (i∈Vc )

(47)

γ 0 ∈∂i\γ

~hγ |ii = −~hV g |ii + ~hγg |ii (i∈V \Vc , γ∈∂i) hγ→i =

X

(48)

nX Y  Y o 1X xi ln δxi ,zi fγ (~zγ ) exp(zk hk|Dγ |lizl ) exp(~hγ |kizk ) 2 xi kl≤γ ~ z k∈γ\i γ

(i∈Vc , γ∈∂i)

(49)

hi = ~hγ |ii + hγ→i (i∈Vc , γ∈∂i) hi|DV g −1 |ji + mi mj =

X

(50)

zi zj fγ (~zγ )

Y

exp(zk hk|Dγ |lizl )

kl≤γ

~ zγ

 Y

exp(~hγ |kizk )

k∈γ



(i∈V, j∈V, γ = ∂i∩∂j∈E)

(51)

4. An Interpretation of Adaptive TAP Approach from Cluster Variation Method In this section, we give an interpretation of adaptive TAP approaches by means of the framework given in section 3. Moreover, an extended framework of adaptive TAP are also proposed in the present section. We consider the following probability distribution for random variable vector ~x V : exp

X

θi xi +

i∈V

X

Jij xi xj

ij∈E



X . PV (~xV ) ≡ X X θi xi + Jij xi xj exp ~ zV

i∈V

(52)

ij∈E

This probability distribution corresponds to the one that E is only edges ij consisting of two 1 1 nodes and is set to fij (~xij )≡exp( |∂i| θi + |∂j| θj + Jij xi xj ) in Eq.(1). P We introduce marginal probability distributions Q i (xi ) ≡ ~zV δzi ,xi QV (~zV ). Here the random variable xi at each node i takes +1 and −1. Moreover we consider marginal probability density functions ρi (ξi ) and ρV (ξ~V ), where the random variable ξi at each node i takes any real number in the interval (−∞, +∞). In the similar way to Eq.(27), we consider the following approximate free energy: F[ρV , {Qi , ρi |i∈V }] ≡ −

X Z

θi ζi ρV (ζ~V )dζ~ −

i∈V

+

Z

ρV (ζ~V )ln(ρV (ζ~V ))dζ~V +

X

ij∈E

X X

Jij

Z

ζi ζj ρV (ζ~V )dζ~

Qi (zi )lnQi (zi ) −

i∈V zi =±1

Z

+∞ −∞



ρi (ζi )ln(ρi (ζi ))dζi (53)

The marginal probability distribution Q i (xi ) and the marginal probability density functions

166

ρi (ξi ) and ρV (ξ~V ) should satisfy the following consistencies:  Z +∞ Z +∞ X  ~ ~  ζ ρ ( ζ )d ζ = ζ ρ (ζ )dζ = zi Qi (zi ),  i V V i i i i   −∞ −∞  z =±1  i  Z +∞ Z +∞

ζi 2 ρi (ζi )dζi = 1, ζi 2 ρV (ζ~V )dζ~ =   −∞ −∞  Z +∞ Z +∞     ~ ~  ρV (ζV )dζ = ρi (ζi )dζi = 1. −∞

(54)

−∞

Qi (xi ), ρi (ξi ) and ρV (ξ~V ) are determined so as to minimize the above approximate free energy F[ρV , {Qi , ρi |i∈V }] under the constraint conditions (54). By introducing Lagrange multipliers 



~hV g = h1|~hV g , h2|~hV g , · · ·, h|V ||~hV g ,

DV g



h1|DV g |1i

· · · h1|DV g ||V |i · · · h2|DV g ||V |i .. .. . . h|V ||DV g |1i h|V ||DV g |2i · · · h|V ||DV g ||V |i

 h2|DV g |1i  = ..  .

h1|DV g |2i h2|DV g |2i .. .

(55) 

  . 

(56)

to ensure the constraint conditions. We remark that all off-diagonal elements of the matrix D V g are equal to zero. By taking the first variation of the approximate free energy F[ρ V , {Qi , ρi |i∈V }] with respect to the marginal, we can derive the approximate expressions of Q i (xi ), ρi (ξi ) and ρV (ξ~V ) as follows: exp(hi xi ) Qi (xi ) = X exp(hi zi )

(i∈V ),

(57)

zi

ρV (ξ~V ) = Z ρi (ξi ) = Z

~ ξ~V − 1 ξ~T (DV g − J)ξ~V ) exp((~hV g − θ) 2 V

~ ζ~V − 1 ζ~T (DV g − J)ζ~V )dζ~V exp((~hV g − θ) 2 V

exp((hi − ~hV g |ii)ξi − 12 Dig ξi 2 )

exp((hi − ~hV g |ii)ζi − 12 Dig ζi 2 )dζi

(i∈V ).

(58)

(59)

Here hi , ~hV g and DV g are determined so as to satisfy the consistencies (54). The matrix D V g are symmetric matrices. The deterministic equations of h i , ~hV g and DV g are reduced to the following simultaneous equations:  −1 ~  tanh(hi ) = (hi − ~hV g |ii)Dig −1 = (~hV g − θ)(D V g − J) |ii (i∈V ),     

1 − tanh2 (hi ) = Dig −1 = hi|(DV g − J)−1 |ii (i∈V ),

     

(60)

hi|DV g |ji = 0 (i6=j, i∈V, j∈V ),

Eq. (60) is equivalent to the deterministic equation in the adaptive TAP approach for the probabilistic model in Eq.(52). The minimization of the approximate free energy (53) with respect to the marginal under the constraint conditions (54) is one of an interpretation of the adaptive TAP approach from CVM.

167

In the standpoint of CVM, we can propose an extension of the adaptive TAP approach by introducing the following approximate free energy: F[ρV , {Qi , ρi |i∈V }, {Qij , ρij |ij∈E}, ] ≡ −

X Z

θi ζi ρV (ζ~V )dζ~ −

i∈V

+ +

Z

ij∈E

Jij

Z

ζi ζj ρV (ζ~V )dζ~

ρV (ζ~V )ln(ρV (ζ~V ))dζ~V

X X

X

Z

Qij (zi , zj )lnQij (zi , zj ) −

ij∈E zi =±1zj =±1



X

X

(|∂i| − 1)

 X

Qi (zi )lnQi (zi ) −

zi =±1

i∈V

Z

+∞ Z +∞

−∞

−∞

ρij (ζi , ζj )ln(ρij (ζi , ζj ))dζi dζj

+∞

ρi (ζi )ln(ρi (ζi ))dζi

−∞





(61)

In the present framework, the marginal probability distributions Q i (xi ), Qij (xi , xj ) and the marginal probability density functions ρ i (ξi ), ρij (ξi , ξj ) and ρV (ξ~V ) should satisfy the following consistencies: Z Z +∞ Z +∞ Z +∞   ~V )dζ~ =  ζ ρ ( ζ ζ ρ (ζ )dζ = ζi ρij (ζi , ζj )dζi dζj i V i i i i    −∞ −∞ −∞ X X X    = zi Qi (zi ) = zi Qij (zi , zj ) (i∈V, j∈∂i),     z =±1 z =±1 z =±1  i i j  Z +∞ Z +∞ Z +∞  Z

ζi 2 ρV (ζ~V )dζ~ =

ζi 2 ρij (ζi , ζj )dζi dζj =

ζi 2 ρi (ζi )dζi = 1 (i∈V, j∈∂i),

(62)

 −∞ −∞ Z Z−∞   +∞ Z +∞ X X   ~ ~  ζi ζj ρV (ζV )dζ = ζi ζj ρij (ζi , ζj )dζi dζj = zi zj Qij (zi , zj ) (ij∈E),    −∞ −∞  zi =±1zj =±1   Z +∞ Z +∞ Z +∞ Z +∞    ~ ~  ρV (ζV )dζ = ρij (ζi , ζj )dζi dζj = ρi (ζi )dζi = 1 (j∈∂i).  −∞

−∞

−∞

−∞

The marginal are determined so as to minimize the above approximate free energy F[ρV , {Qi , ρi |i∈V }, {Qij , ρij |ij∈E} under the constraint conditions (62). By using the constraint conditions (62), F[ρ V , {Qi , ρi |i∈V }, {Qij , ρij |ij∈E} can be rewritten as F[ρV , {Qi , ρi |i∈V }, {Qij , ρij |ij∈E}] ≡ −

X X

ij∈E zi =±1zj =±1

+ +

Z

ρV (ζ~V )ln(ρV (ζ~V ))dζ~V

X X

X

Qij (zi , zj )lnQij (zi , zj ) −

ij∈E zi =±1zj =±1



 1 θi zi + θj zj + Jij zi zj Qij (zi , zj ) 2 2

X 1

X

i∈V

(|∂i| − 1)

 X

zi =±1

Qi (zi )lnQi (zi ) −

Z

Z

+∞ −∞

+∞ Z +∞

−∞

−∞

ρij (ζi , ζj )ln(ρij (ζi , ζj ))dζi dζj 

ρi (ζi )ln(ρi (ζi ))dζi .



(63)

The approximate free energy (63) corresponds to the one given in the case of the probabilistic distribution (52) by Eq.(27). 5. Quantum Belief Propagation In this section, we derive the message passing rule of LBP in quantum graphical model. Quantum graphical models are usually expressed in terms of density matrices on hypergraphs. We define the vector representations of quantum states and the matrix representation of density

168

matrix and show how to construct the approximate free energy for the density matrix in CVM. The framework in the present section is an extension of the conventional LBP and is based on Ref.[38]. Each random variable xi (i∈V ) takes two possible states specified by +1 and −1. As two representations which corresponds to the two possible states +1 and −1, we introduce two vectors i + 1| ≡ (1, 0) and h − 1| ≡ (0, 1). Moreover we introduce a notation h~xV | = hx1 , x2 , · · ·, x|V | | ≡ hx1 |⊗hx2 |⊗· · ·⊗hx|V | |,

|~xV i ≡ h~xV |T , |~xγ i ≡ h~xγ |T ,

h~xγ | = hxγ1 , xγ2 , · · ·, xγ|γ| | ≡ hxγ1 |⊗hxγ1 |⊗· · ·⊗hxγ|γ| |,

(64)

(65)

so that we have h + 1, +1| = (1, 0, 0, 0), h + 1, −1| = (0, 1, 0, 0), h − 1, +1| = (0, 0, 1, 0) and h − 1, −1| = (0, 0, 0, 1). We consider the following representation R(V ) ≡

exp( − H(V )) , tr exp( − H(V ))

(66)

where  X  H(V ) ≡ H(γ),    γ∈E

 h~x|H(γ)|~y i ≡ h~xγ |E(γ)|~yγ i   

 Y

i∈V \γ



δxi ,yi .

(67)

Though one of the goals in the statistical inference by means of Bayesian networks is to compute the marginal probability at each node or at each hyperedge, the computations correspond to the one of reduced density matrices in the probabilistic information processing by using a density matrix. Reduced density matrices are defined by Y  Y  XX  ~  h~ x |R |~ x i ≡ h~ z |R(V )| ζi δ δ δ (γ∈E), γ γ γ z ,x ζ ,y z ,ζ  j j j j k k   j∈γ ~ z ζ~ k∈V \γ  Y  XX ~ z ,x δz ,y  hx |R |y i ≡ h~ z |R(V )| ζiδ δ (i∈V ),  i i i z ,ζ i i i i  k k  ~ z

(68)

k∈V \{i}

ζ~

P

P

P

P

where ~z ≡ (z1 , z2 , · · ·, z|V | ) and ~z ≡ z1 =±1 z2 =±1 · · · z|V | =±1 . In this paper, the expressions of Eq.(68) are expressed in terms of the following notations: (

R(γ) ≡ tr\γ R(V ), R(i) ≡ tr\i R(V ).

(69)

We explain the representation of Eq.(1) in terms of a density matrix. By using functions fγ (~xγ ) (γ∈E) in Eq.(1), we set the 2|γ| ×2|γ| matrices E(γ) as h~xγ |E(γ)|~yγ i = −

Y



δxi ,yi ln(fγ (~xγ )).

i∈γ

(70)

If the energy matrix E is a diagonal matrix whose all off-diagonal elements h~x|E|~y i (~x 6= ~y) are equal to zero, R(V ) are also diagonal matrix whose diagonal elements h~x|R(V )|~xi are given as

169

PV (~x). hxi |R(V )i |xi i and h~xγ |R(γ)|~yγ i correspond to marginal probabilities P γ (~xγ ) and PV (xi ) as follows: Y    h~ xγ |R(γ)|~yγ i = δxj ,yj Pγ (~xγ ),

(71)

j∈γ



hxi |R(i)|yi i = δxi ,yi PV (xi ),

respectively. We introduce four kinds of 2×2 matrices: X

(+1,+1)

≡ | + 1ih + 1| =

X (+1,−1) ≡ | + 1ih − 1| = X

(−1,+1)

≡ | − 1ih + 1| =

X (−1,−1) ≡ | − 1ih − 1| =









1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1









,

(72)

,

(73)

,

(74)

,

(75) (xy)

as well as 2×2 identity matrix I. In terms of these matrices X (xy) , 2|γ| ×2|γ| matrix Xi (xy) (i∈γ), and 2|V | ×2|V | matrix Xi (V ) (i∈V ) are defined as follows: (xy)

Xi (xy)

Xi

(γ) ≡ I⊗I⊗· · ·⊗I⊗X (xy) ⊗I⊗I⊗· · ·⊗I, (xy)

= Xi

(V ) ≡ I⊗I⊗· · ·⊗I⊗X (xy) ⊗I⊗I⊗· · ·⊗I.

(γ)

(76)

(77)

Instead of Eq.(7), we consider the following free energy expressed in terms of the reduced density matrices Q(i) (i∈Vc ) and Q(γ) (γ∈E) of a 2|V | ×2|V | trial density matrix Q: F[{Q(γ)|γ∈Vc ∪E}] ≡ −

X

tr[Q(γ)E(γ)] +

γ∈E



X

X

tr[Q(γ)ln(Q(γ)]

γ∈E

(|∂i| − 1)tr[Q(i)ln(R(i)]

(78)

i∈Vc

The reduced density matrices for hyperedges and nodes are determined so as to minimize it under the consistency conditions: Q(i) = tr\i Q(γ) (γ∈∂i, i∈Vc ),

(79)

tr[Q(i)] = 1 (i∈Vc ), tr[Q(γ)] = 1 (γ∈E).

(80)

and the normalizations

The reduced density matrices Q(i) (i∈V c ) and Q(γ) (γ∈E) which are determined so as to minimize the free energy F[{Q(γ)|γ∈V c ∪E}] under the above constraint conditions (79) and (80) are regarded as approximations of reduced density matrices R(i) ≡ tr \i R(V ) and R(γ) ≡ tr\γ R(V ) of the density matrix R(V ).

170

In the quantum belief propagation, some reduced density matrices {Q(i)|i∈V c } and {Q(γ)|γ∈E} are approximately expressed as follows: exp

X X X

(xy)

hx|L(j, γ)|yiXj

(γ)

j∈γ x=±1y=±1

Q(γ) =

h

tr exp

X X X

(xy)

hx|L(j, γ)|yiXj



(γ)

j∈γ x=±1y=±1

Q(i) =

i (γ∈E),

(81)

exp(L(i, i)) (i∈Vc ), tr[exp(L(i, i))]

(82)

where L(i, i) and L(i, γ) are 2×2 matrices and are determined so as to satisfy the following equations: X

0 = −(|∂i| − 1)L(i, i) +

L(i, γ) (i∈Vc ),

(83)

{γ|γ≥i, γ∈E}

as well as the reducibility conditions (79). Now we introduce effective fields defined by the following linear transformations: X

L(i, γ) =

hγ→i (i∈Vc , γ∈∂i).

(84)

{γ 0 |γ 0 ∈∂i\γ}

By using Eq.(84), we can rewrite the reduced density matrices Q(i) and Q(γ) as Q(γ) =

  X X X X 1 (xy) exp − E(γ) + hx|hγ 0 →j |yiXj (γ 0 ) (γ∈E), Zγ j∈γ γ 0 ∈∂j\γ x=±1 y=±1

(85)

 X 1 exp hγ 0 →i (i∈Vc ), Zi γ 0 ∈∂i

(86)

Q(i) = where h



Zγ ≡ tr exp − E(γ) +

X

X

X

X

(xy)

hx|hγ 0 →j |yiXj

j∈γ γ 0 ∈∂j\γ x=±1 y=±1

h

Zi ≡ tr exp

X

hγ 0 →i

γ 0 ∈∂i

i

(γ 0 )

i

(γ∈E),

(87)

(i∈Vc ).

(88)

By substituting Eqs.(85) and (86) to Eq.(79), we can derive the simultaneous equations to determine the effective fields hγ→i as follows: hγ→i = −

X

hγ 0 →i

γ 0 ∈∂i\γ

+ ln tr\i

hZ

i





exp − E(γ) +

X

X

X

X

0

hx |hγ 0 →j |y

0

(x0 y 0 ) 0 iXj (γ )

j∈γ γ 0 ∈∂j\γ x0 =±1 y 0 =±1

(i∈Vc , γ∈∂i),

171

i

!

(89)

Here hγ→j can be referred to as an effective field in the statistical mechanics. It is known that effective fields correspond to messages in the conventional loopy belief propagation. Eq.(89) can be regarded as a message passing rule in the quantum belief propagation. As one of the examples, we consider the reduced density matrix for the probabilistic inference system given as “Asia”. The energy matrix H(V ) can be given as the following 2 8 ×28 matrix: H(V ) ≡ H(13) + H(24) + H(25) + H(346) + H(568) + H(67),

(90)

where   h~x|H(13)|~y i ≡ hx1 , x3 |E(13)|y1 , y3 iδx2 ,y2 δx4 ,y4 δx5 ,y5 δx6 ,y6 δx7 ,y7 δx8 ,y8 ,     h~ x|H(24)|~y i ≡ hx2 , x4 |E(24)|y2 , y4 iδx1 ,y1 δx3 ,y3 δx5 ,y5 δx6 ,y6 δx7 ,y7 δx8 ,y8 ,   

h~x|H(25)|~y i ≡ hx2 , x5 |E(25)|y2 , y5 iδx1 ,y1 δx3 ,y3 δx4 ,y4 δx6 ,y6 δx7 ,y7 δx8 ,y8 ,

(91)

 h~x|H(346)|~y i ≡ hx3 , x4 , x6 |E(346)|y3 , y4 , y6 iδx1 ,y1 δx2 ,y2 δx5 ,y5 δx7 ,y7 δx8 ,y8 ,     h~x|H(568)|~y i ≡ hx5 , x6 , x8 |E(568)|y5 , y6 , y8 iδx1 ,y1 δx2 ,y2 δx3 ,y3 δx4 ,y4 δx7 ,y7 ,   

h~x|H(67)|~y i ≡ hx6 , x7 |E(67)|y6 , y7 iδx1 ,y1 δx2 ,y2 δx3 ,y3 δx4 ,y4 δx5 ,y5 δx8 ,y8 .

The density matrix R(V ) is also 28 ×28 matrix. By the context of above quantum belief propagation, reduced density matrices are approximately given as follows: 1 Z13 exp(E(13) + I⊗h346→3 ) 1 Z24 exp(E(24) + h25→2 ⊗I + I⊗h346→4 ) 1 Z25 exp(E(25) + h24→2 ⊗I + I⊗h568→5 ) 1 exp(E(346) + h13→3 ⊗I⊗I + I⊗h24→4 ⊗I Q(346) = Z346 + I⊗I⊗h568→6 + I⊗I⊗h67→6 ) 1 Q(568) = Z568 exp(E(568) + h25→5 ⊗I⊗I + I⊗h568→6 ⊗I Q(67) = Z167 exp(E(67) + h346→6 ⊗I + h568→6 ⊗I)

 Q(13) =      Q(24) =       Q(25) =           

  Q(2) =       Q(3) =

Q(4) =

   Q(5) =    

Q(6) =

1 Z2 exp(h24→2 + h25→2 ) 1 Z3 exp(h346→3 + h13→3 ) 1 Z4 exp(h346→4 + h24→4 ) 1 Z5 exp(h25→5 + h568→5 ) 1 Z6 exp(h346→6 + h568→6

,

(92)

+ I⊗h67→6 ⊗I)

.

(93)

+ h67→6 )

where Zi and Zγ are normalization constants. The above density matrices Q i and Qγ have the following reducibility conditions:  Q(2) = tr\2 Q(24) = tr\2 Q(25),       Q(3) = tr\3 Q(13) = tr\3 Q(346),

Q(4) = tr Q(24) = tr Q(346),

\4 \4    Q(5) = tr Q(25) = tr  \5 \5 Q(568),  

(94)

Q(6) = tr\6 Q(346) = tr\6 Q(568) = tr\6 Q(67).

By substituting Eqs.(92)-(93) to Eq.(94), we obtain the following recursion formula to determine

172

the effective fields hγ→i :                                           





Z2 tr exp(E(24) + h25→2 ⊗I + I⊗h346→4 )   Z24 \2 Z2 h25→2 = −h24→2 + ln Z25 tr\2 exp(E(25) + h24→2 ⊗I + I⊗h568→5 )   h13→3 = −h346→3 + ln ZZ133 tr\3 exp(E(13) + I⊗h346→3 )   3 h346→3 = −h13→3 + ln ZZ346 tr\3 exp(E(346) + h13→3 ⊗I⊗I + I⊗h24→4 ⊗I)   h24→4 = −h346→4 + ln ZZ244 tr\4 exp(E(24) + h25→2 ⊗I + I⊗h346→4 )

h24→2 = −h25→2 + ln

h346→4 =  −h24→4 4 + ln ZZ346 tr\4 exp(E(346) + h13→3 ⊗I⊗I

+ I⊗h24→4 ⊗I + I⊗I⊗(h568→6 + h67→6 )) 



(95)



h25→5 = −h568→5 + ln ZZ255 tr\5 exp(E(25) + h24→2 ⊗I + I⊗h568→5 ) h568→5 =  −h25→5  5 + ln ZZ568 tr\5 exp(E(568) + h25→5 ⊗I⊗I + I⊗(h568→6 + h67→6 )⊗I) h346→6 =  −h568→6 − h67→6 6 + ln ZZ346 tr\6 exp(E(346) + h13→3 ⊗I⊗I

                            + I⊗h ⊗I + I⊗I⊗(h + h )) 24→4 568→6 67→6      h = −h − h 568→6 346→6 67→6       Z6  + ln tr exp(E(568) + h ⊗I⊗I + I⊗(h + h )⊗I  25→5 568→6 67→6 Z568 \6      Z6

h67→6 = −h346→6 − h568→6 + ln

Z67 tr\6 exp(E(67)

+ (h346→6 + h568→6 )⊗I



6. Concluding Remarks In the present paper, the fundamental structure of conventional LBP has been reviewed and some new advanced mean field approaches has been proposed. Our proposed approaches in sections 3 and 4 are based on the exact calculations of Gaussian graphical model and are constructed by combining their exact calculation with CVM. They can be applied to some discrete probabilistic models and we have given some interpretations and general extensions in the adaptive TAP approaches. It is interesting to compare the conventional adaptive TAP approach with our proposed extensions in some numerical experiments for the probability distribution in Eq.(52). It is left to a separate paper. Cycle corrections in the present method in section 3 may be estimated by using the purtabative method based on Plefka expansion. It is also interesting problem. Also message propagation rules in LBP for quantum systems have been derived in section 5 and an explicit example for the density matrix of graphical models with eight nodes have been shown. We have to remark that the formal structures of LBP for quantum systems are different from the conventional LBP. In the quantum systems, we have to consider not only quantum spin systems but also quantum electron systems with Fermion or Boson particles. Moreover, it is known that quantum spin systems including the Heisenberg model can be expressed in terms of the Bose lattice gas. In such electron systems, the tractable models of mean-field approaches are ideal Bose or Fermi gases. As one of fundamental quantum systems, we have a Bose lattice gas system whose energy matrix is defined by H(V ) = µ

X

i∈V

a†i ai + t

X

(a†i aj + a†j ai ) + u

ij∈E

X

i∈V

173

a†i a†i ai ai

(96)

where E is a square lattice with periodic boundary conditions along x- and y-directions. a †i and ai are Bose creation and annihilation operators. We introduce reduced density matrices ρ V , ρi and Qi as follows:    ρV =     

exp(−

P P (µ+hi,V f )a†i ai −t ij∈E (a†i aj +a†j ai )) Pi∈V P , † † †

tr[exp(−

i∈V

(µ+hi,V f )ai ai −t

exp(−hi,if a†i ai )

ij∈E

(ai aj +aj ai ))]

(97)

ρi =  tr[exp(−hi,if a†i ai )]    † † †    Qi = exp(−hi,i ai †ai −uai †ai †ai ai ) , tr[exp(−hi,i ai ai −uai ai ai ai )]

Here ρV and ρi are density matrices of the free Bose particles. The energy matrices of density matrices ρV and ρi have quadratic forms of the Bose creation and annihilation operators. We consider the following approximate free energy for the Bose lattice gas in Eq.(96), which is defined by F[ρV , {Qi , ρi |i∈V }] = tr

h X

µa†i ai +

i∈V

h

X



ij∈E

i

+tr ρV lnρV +

i

t(a†i aj + a†j ai ) ρV +

X h

i

h

X

i∈V

tr Qi lnQi − tr ρif lnρif

i∈V

h

utr a†i a†i ai ai Qi

i

.

i

(98)

In the similar way to the previous section, reduced density matrices ρ V , ρi and Qi are determined so as to minimize F[ρV , {Qi , ρi |i∈V }] under the consistencies tr[a†i ai ρV ] = tr[a†i ai ρi ] = tr[a†i ai Qi ] (i∈V ),

(99)

tr[ρV ] = tr[ρi ] = tr[Qi ] = 1 (i∈V ).

(100)

and the normalizations

By taking the first deviation of the approximate free energy, reduced density matrices ρ V , ρi and Qi can be derived as the expression in Eq.(97). h i,if , hi,i , hi,V f and ui,i is determined so as to satisfy the consistencies (99) and the following simultaneous linear equations hi,if = hi,i + hi,V f (i∈V ).

(101)

From the reduced density matrices determined by means of Eqs.(99) and (101), some approximate expectation values of the density matrix R(V ) can be given as follows: tr[a†i ai R(V )] ' tr[a†i ai ρV ] (i∈V ),

(102)

tr[(a†i aj + a†i aj )R(V )] ' tr[(a†i aj + a†i aj )ρV ] (ij∈E),

(103)

tr[a†i a†i ai ai R(V

(104)

)] '

tr[a†i a†i ai ai Qi ]

(i∈V ).

The reduced density matrices ρV , and ρi corresponds to the density matrices of the ideal Bose lattice gases with V nodes and one node, respectively. The expectation value tr[(a †i aj +a†i aj )ρV ] can be computed by using the discrete Fourier transformations. This framework was proposed in Ref.[39] and was applied to analyze the low-temperature behaviour of Heisenberg model in Ref.[40]. It corresponds to an extension of spin wave theory[43]. When quantum systems have locally uniform model parameters, for example, interactions and external fields and so on, the corresponding reduced density matrix for ideal Bose gas system can be treated by using the discrete Fourier transformation. However, in the many cases of

174

probabilistic information processing, it is expected to formulate the algorithms that is applicable to density matrices with locally non-uniform model parameters, This is left for future research. Acknowledgements This work was partly supported by the Grants-In-Aid (No.18079002) and the Global COE (Center of Excellence) Program “Center of Education and Research for Information Electronics Systems” for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References [1] Opper M and Saad D (eds) 2001 Advanced Mean Field Methods — Theory and Practice — (Cambridge: MIT Press) [2] Kabashima Y and Saad D 1998 Europhysics Letters 44 668. [3] Kschischang F R, Frey R J and Loeliger H -A 2001 IEEE Transactions on Information Theory 47 498. [4] Yedidia J S, Freeman W T and Weiss Y 2005 IEEE Transactions on Information Theory 51 2282 [5] Freeman W T, Jones T R and Pasztor E C 2002 IEEE Computer Graphics and Applications 22 56. [6] Tanaka K 2002 J. Phys. A: Math. Gen. 35 R81 [7] Willsky A S 2002 Proceedings of IEEE 90 1396. [8] Frey B J 1998 Graphical Models for Machine Learning and Digital Communication (Cambridge: MIT Press) [9] MacKay D J C 2003 Information Theory, Inference, and Learning Algorithms (Cambridge University Press) [10] Kabashima Y and Saad D 2004 J. Phys. A: Math. Gen. 37 R1 [11] Pearl J 1988: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann) [12] Jensen F V 2001 Bayesian Networks and Decision Graphs Springer [13] Tanaka K 2003 IEICE Transactions on Information and Systems E86-D 1228 [14] Kikuchi R 1951 Phys. Rev. 81 988. [15] Morita T 1957, J. Phys. Soc. Jpn. 12 753 [16] Morita T 1972 J. Math. Phys. 13 115 [17] Morita T 1984 J. Stat. Phys. 34 319 [18] Morita T 1990 J. Stat. Phys. 59 819 [19] Pelizzola A 2005 J. Phys. A: Math. Gen. 38 R309 [20] Weiss Y 2000 Neural Computation 12 1 [21] Montanari A and Rizzo T 2005 J. Stat. Mech.: Theory and Experiment P10011 [22] Marinari E and Semerjian G 2006 J. Stat. Mech.: Theory and Experiment P06019 [23] Yasuda M and Tanaka K 2006 J. Phys. Soc. Jpn 75 084006 [24] Yasuda M and Tanaka K 2007 J. Phys. A: Math. Theor., 40 9993 [25] Heskes T 2004 Neural Computation 16 2379. [26] Weiss Y Freeman W T 2001 Neural Computation 13 2173. [27] Tanaka K, Shouno H, Okada M, Titterington D M 2004: J. Phys. A: Math. Theor., 37 8675. [28] Tanaka K and Titterington D M: J. Phys. A: Math. Theor., 40 11285. [29] Opper M and Winther O 2001 Phys. Rev. Letts. 86 3695 [30] Opper M and Winther O 2001 Phys. Rev. E 64 05613186 [31] Opper M and Winther O 2005 Journal of Machine Learning Research 1 1 [32] Hojen-Sorensen P A F R and Winther O 2002 Neural Computation 2002 889 [33] Csato L, Opper M and Winther O 2003 Complexity 8 64 [34] Biroli G and Cugliandolo L F 2001 Phys. Rev. B, 64 014206 [35] Tanaka K and Horiguchi T 1997 IEICE Transactions A, J80-A 2117 (in Japanese); translated in Electronics and Communications in Japan 3: Fundamental Electronic Science, 83 84. [36] Suzuki S, Nishimori H and Suzuki M 2007 Phys. Rev. E 75 051112 [37] Hastings M B 2007 Phys. Rev. B 76 201102(R) [38] Morita T 1957 J. Phys. Soc. Jpn. 12 1060 [39] Morita T 1994 Prog. Theor. Phys. 92 1081 [40] Morita T 1995 J. Phys. Soc. Jpn. 64 1211 [41] Horiguchi 1981 Physica A 107 360 [42] Yasuda M and Horiguchi T 2006 Physica A 368 83 [43] Kubo R 1952 Phys. Rev. 87 568

175

176

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Entanglement Manipulation under Non-Entangling Operations Fernando G.S.L. Brand˜ ao Institute for Mathematical Sciences, Imperial College London, London SW7 2BW, UK E-mail: [email protected]

Martin B. Plenio Institute for Mathematical Sciences, Imperial College London, London SW7 2BW, UK Abstract. We demonstrate that entanglement shared by two or more parties can be asymptotically reversibly interconverted when one considers the set of operations which asymptotically cannot generate entanglement. In this scenario we find that the entanglement of every quantum state is uniquely characterized by a single quantity: the regularized relative entropy of entanglement. The main technical tool is a generalization of quantum Stein’s Lemma, which gives optimal discrimination rates in quantum hypothesis testing, to the case in which the alternative hypothesis might vary over sets of correlated states. We analyse the connection of our approach to recent rigorous formulations of the sencond law of thermodynamics.

1. Introduction A basic feature of many physical settings is the existence of constraints on physical operations and processes that are available. These restrictions generally imply the existence of resources that can be consumed to overcome the constraints. Examples include an auxiliary heat bath in order to decrease the entropy of an isolated thermodynamical system or prior secret correlations for the establishment of secret key between two parties who can only operate locally and communicate by a public channel. In quantum information theory one often considers the scenario in which two or more distant parties want to exchange quantum information, but are restricted to act locally on their quantum systems and communicate classical bits. We find in this context that a resource of intrinsic quantum character, entanglement, allows the parties to completely overcome the limitations caused by the locality requirement on the quantum operations available. In this respect resource theories are considered to determine when a physical system, or a state thereof, contains a given resource, to characterize the possible conversions from an state to another when one has access only to a restricted class of operations which cannot create the resource for free, and to quantify the amount of such a resource contained in a given system.

177

One may carry out these investigations at the level of individual systems, which is motivated by experimental considerations. It is natural however to expect that a simplified theory will emerge when instead one looks at the bulk properties of a large number of systems. The most successful example of such a theory is arguably thermodynamics. This theory was initially envisioned to describe the physics of large systems in equilibrium, determining their properties by a very simple set of rules of universal character. This was reflected in the formulation of the defining axiom of thermodynamics, the second law, in terms of quasi-static processes and heat exchange. However, the apparently universal applicability of thermodynamics suggested a deeper mathematical and structural foundation. Indeed, there is a long history of examinations of the foundations underlying the second law, starting with Carath´eodory work in the beginning of last century (1). Of particular interest in the present context is the work of Giles (2) and notably Lieb and Yngvason (3) stating that there exists a total ordering of equilibrium thermodynamical states that determines which state transformations are possible by means of an adiabatic process. In this sense, thermodynamics can be seen as a resource theory of order, dictating which transformations are possible between systems with different amounts of (dis)order by operations which cannot order out systems (adiabatic processes). A remarkable aspect of these fountational works (2; 3) is that from simple, abstract, axioms they were able to show the existence of an entropy function S fully determining the achievable transformations by adiabatic processes: given two equilibrium states A and B, A can be converted by an adiabatic process into B if, and only if, S(A) ≤ S(B). As pointed out by Lieb and Yngvason (4), it is a strength of this abstract approach that it allows to uncover a thermodynamical structure in settings that may at first appear unrelated. Early studies in quantum information indicate that entanglement theory could be one such possible setting. Possible connections between entanglement theory and thermodynamics were noted earlier on when it was found that for bipartite pure states a very similar situation to the second law holds in the asymptotic limit of an arbitrarily large number of identical copies of the state. Given two bipartite pure states |ψAB i and |φAB i, the former can be converted into the latter by LOCC if, and only if, E(|ψAB i) ≥ E(|φAB i), where E is the entropy of entanglement (5). However, for mixed entangled states there are bound entangled states that require a non-zero rate of pure state entanglement for their creation by LOCC, but from which no pure state entanglement can be extracted at all (6). As a consequence no unique measure of entanglement exists in the general case and no unambiguous and rigorous direct connection to thermodynamics appeared possible, despite various interesting attempts. In this paper we identify a class of quantum operations which can be considered as a counterpart in entanglement theory of adiabatic processes. This, together with a new result on quantum hyothesis testing, allow us to establish a Theorem completely analogous to the Lieb and Yngvason formulation of the second law of thermodynamics for entanglement manipulation. Similar considerations may also be applied to resource theories in general, including e.g. 
theories quantifying the non-classicality, the non-Gaussian character, and the non-locality of quantum states. Indeed, the main conceptual message of this paper is an unified approach to deal with resources manipulation, by abstracting from theories where a resource appears due to the existence of some contraint on the type of operations available, to theories where the class of operations is derived from the resource considered, and chosen to be such that the latter cannot be freely generated. As it will be shown, such a shift of focus leads to a much simpler and elegant theory, which at the same time still gives relevant information about the original setting.

178

2. Definitions and Main Results A m-partite quantum state ρ ∈ D(H1 ⊗ ... ⊗ Hm ) contains only classical correlations, and is called separable (7), if there exist local density operators ρkj acting on Hk and a probability distribution {pj } such that ρ=

X

pj ρ1j ⊗ .... ⊗ ρm j .

(1)

j

Operationally, the set of separable states S is formed by all states that may be created from a pure product state |0ih0| ⊗ ... ⊗ |0ih0|, i.e. an uncorrelated state, by means of local operations and classical communicatin (LOCC). If ρ cannot be written as in Eq. (1) we say it is entangled and its generation requires, in addition to LOCC, an exchange of quantum particles (quantum communication) or a supply of pre-existing pure entangled states that are consumed in the process. A central state in this context is the two qubit maximally entangled state, 2 1 X φ+ := |i, iihj, j|, (2) 2 i,j=1

which defines the unit of entanglement (8). The observation that LOCC alone does not create entanglement has been taken further to formulate as the basic law of entanglement manipulation that entanglement cannot be increased by LOCC. A principle that is similar in spirit to the second law of thermodynamics. Our results are concerned with the asymptotic limit of a large number of identical copies of quantum states. Here it is also natural to consider transformations between states that may be approximate for any finite number of systems and are only required to become exact in the asymptotic limit. To describe this limiting process rigorously the well established trace distance D(ρ, σ) = ||ρ − σ||1 between quantum states is used. We say that a state ρ can be asymptotically converted into another state σ by operations of a given class if there is a sequence of quantum operations {Ψn } within such a class acting on n copies of the first state such that limn→∞ D(Ψn (ρ⊗n ), σ ⊗n ) = 0. To formulate the main result we need two measures that will be used to quantify entanglement. It will be satisfying that these two seemingly distinct approaches will turn out to be equivalent in the asymptotic limit, as a consequence of the technical proof of our main theorem. We consider the relative entropy of entanglement (9; 10) ER (ρ) = min S(ρ||σ), σ∈S

(3)

where S(ρ||σ) = tr{ρ log ρ − ρ log σ} is the quantum relative entropy. and S is the set of separable states. Furthermore, we consider the global robustness of entanglement (11; 12) which is defined as   ρ + sσ ∈S . (4) RG (ρ) = min s : ∃ σ s.t. s∈R 1+s Both are meaningful entanglement quantifiers and a more detailed account of their properties can be found in Refs. (8; 13). They will play a crucial role in the proof of our main theorem. We will also use the log global robustness, given by LRG (ρ) := log(1 + RG (ρ)). As we are concerned with the entanglement properties in the asymptotic limit we should not consider entanglement measures at the single copy level, but rather their asymptotic,

179

or regularized, counterpart. In the case of ER , the relevant quantity to consider is actually the regularized relative entropy of entanglement, given by ER (ρ⊗n ) . (5) n→∞ n This will turn out to be the central quantity in this work as it will emerge as the unique entanglement quantifier. The correct choice of the set of operations employed is crucial for establishing reversibility in entanglement manipulation. To motivate this choice it is instructive to note that in the context of the second law it follows both from the approach of Giles (2) as well as Lieb and Yngvason (3) that the class formed by all adiabatic processes is the largest class of operations which cannot decrease the entropy of an isolated equilibrium thermodynamical system. Following such an inside we now identify the largest set of quantum operations that obeys the basic law of the non-increase of entanglement. Then one might expect to achieve reversibility in entanglement manipulation and thus a full analogy to the second law of thermodynamics. While operationally well motivated, the set of LOCC operations is not such a class. As we are concerned with asymptotic entanglement manipulation, it is physically natural and also convenient for mathematical reasons to define the set of asymptotically non-entangling operations, composed of sequences of operations that for a finite number of copies n may generate a small amount n of entanglement which vanishes asymptotically, limn→∞ n = 0. It is important to note that we do not simply require that the entanglement per copy vanishes but actually the total amount of entanglement. The next two definitions determine precisely the class of maps employed. ∞ ER (ρ) = lim

0

0

Definition 2.1 Let Ω : D(Cd1 ⊗ ... ⊗ Cdm ) → D(Cd1 ⊗ ... ⊗ Cdm ) be a quantum operation. We say that Ω is an -non-entangling (or -separability-preserving) map if for every separable state σ ∈ D(Cd1 ⊗ ... ⊗ Cdm ), RG (Ω(σ)) ≤ .

(6)

We denote the set of -non-entangling maps by SEP P (). With this definition an asymptotically non-entangling operation is given by a sequence ⊗n ⊗n of CP trace preserving maps {Λn }n∈N , Λn : D(Hin ) → D(Hout ), such that each Λn is n -non-entangling and limn→∞ n = 0. The use of the global robustness as a measure of the amount of entanglement generated is not arbitrary. The reason for this choice is explained in Ref. (13). Having defined the class of maps we are going to use to manipulate entanglement, we can define the cost and distillation functions. Definition 2.2 We define the entanglement cost under asymptotically non-entangling maps of a state ρ ∈ D(Cd1 ⊗ ... ⊗ Cdm ) as     kn ⊗kn ⊗n ane : lim min ||ρ − Λ(φ2 )||1 = 0, lim n = 0 , EC (ρ) := inf lim sup n→∞ Λ∈SEP P (n ) n→∞ {kn ,n } n→∞ n where the infimum is taken over all sequences of integers {kn } and real numbers {n }. In n the formula above φ⊗k stands for kn copies of a two-dimensional maximally entangled 2 state shared by the first two parties and the maps Λn : D((C2 ⊗ C2 )⊗kn ) → D((Cd1 ⊗ ... ⊗ Cdm )⊗n ) are n -non-entangling operations.

180

Definition 2.3 We define the distillable entanglement under asymptotically nonentangling maps of a state ρ ∈ D(Cd1 ⊗ ... ⊗ Cdm ) as     kn ⊗kn ane ⊗n : lim ED (ρ) := sup lim inf min ||Λ(ρ ) − φ2 ||1 = 0, lim n = 0 , n→∞ n→∞ n n→∞ Λ∈SEP P (n {kn ,n } where the infimum is taken over all sequences of integers {kn } and real numbers {n }. In n stands for kn copies of a two-dimensional maximally entangled the formula above φ⊗k 2 state shared by the first two parties and the maps Λn : D((Cd1 ⊗ ... ⊗ Cdm )⊗n ) → D((C2 ⊗ C2 )⊗kn ) are n -non-entangling operations. Note that when we do not specify the state of the other parties we mean that the nonlocal parts of their state is trivial (one-dimensional). Note furthermore that the fact that initially only two parties share entanglement is not a problem as the class of operations we employ include the swap operation. We are now in position to state the main result of this chapter. Theorem 2.4 For every multipartite state ρ ∈ D(Cd1 ⊗ ... ⊗ Cdm ), ane ∞ ECane (ρ) = ED (ρ) = ER (ρ).

(7)

We thus find that under asymptotically non-entangling operations entanglement can be interconverted reversibly. The situation is analogous to the Giles-Lieb-Yngvason formulation of the second law of thermodynamics. Indeed, Theorem 2.4 readiy implies 0

Corollary 2.5 For two multipartite states ρ ∈ D(Cd1 ⊗ ... ⊗ Cdm ) and σ ∈ D(Cd1 ⊗ ... ⊗ 0 Cdm0 ), there is a sequence of quantum operations Λn such that Λn ∈ SEP P (n ),

lim n = 0,

n→∞

(8)

and1 lim ||Λn (ρ⊗n ) − σ ⊗n−o(n) ||1 = 0

n→∞

(9)

if, and only if, ∞ ∞ ER (ρ) ≥ ER (σ).

(10)

In addition we have also identified the regularized relative entropy of entanglement as the natural counterpart in entanglement theory to the entropy function in thermodynamics. As will be shown in section 3, the regularized relative entropy measures how distinguishable an entangled state is from an arbitrary sequence of separable states. Therefore, we see that under asymptotically non-entangling operations, the amount of entanglement of any multipartite state is completely determined by how distinguishable the latter is from a state that only contains classical correlations. Furthermore, in section 3 we will also see that the amount of entanglement is equivalently uniquely defined in terms of the robustness of quantum correlatons to noise in the form of mixing. In the following sections we outline the structure of the proof of Theorem 2.4. The proof is rather lengthy and hence it is out of the scope of the present paper to give it in full. The interested reader is reffered to Refs. (13; 14; 15) for a full rigorous proof of the results of this paper. 1

In Eq. (9) o(n) stands for a sublinear term in n.

181

3. A Generalization of Quantum Stein’s Lemma We now turn to the main technical tool for establishing our main result. This is not directly related to entanglement theory, but actually to quantum hypothesis testing. Hypothesis testing refers to a general set of tools in statistics and probability theory for making decisions based on experimental data from random variables. In a typical scenario, an experimentalist is faced with two possible hypothesis and must decide based on experimental observation which of them was actually realized. Suppose we are faced with a source that emits several i.i.d. copies of one of two quantum states ρ and σ, and we should decide which of them is being produced. Since the quantum setting also encompasses the classical, we will focus on the former, noting the differences between the two when necessary. In order to learn the identity of the state the observer measures a two outcome POVM {Mn , I − Mn } given n realizations of the unknown state. If he obtains the outcome associated to Mn (I − Mn ) then he concludes that the state was ρ (σ). The state ρ is seen as the null hypothesis, while σ is the alternative hypothesis. There are two types of errors here: • Type I: The observer finds that the state was ρ, when in reality it was σ. This happens with probability αn (Mn ) := tr(ρ⊗n (I − Mn )). • Type II: The observer finds that the state was σ, when it actually was ρ. This happens with probability βn (Mn ) := tr(σ ⊗n Mn ). There are several distinct settings that might be considered, depending on the importance we attribute to the two types of errors (16). In one such setting, the probability of type II error should be minimized to the extreme, while only requiring that the probability of type I error is bounded by a small parameter . The relevant error quantity in this case can be written as βn () := min {βn (Mn ) : αn (Mn ) ≤ }. 0≤Mn ≤I

Quantum Stein’s Lemma (17; 18) tell us that for every 0 ≤  ≤ 1, lim −

n→∞

log(βn ()) = S(ρ||σ). n

(11)

This fundamental result gives a rigorous operational interpretation for the relative entropy and was proven in the quantum case by Hiai and Petz (17) and Ogawa and Nagaoka (18). Different proofs have since be given in Refs. (19; 20; 16). The relative entropy is also the n→∞ asymptotic optimal exponent for the decay of βn when we require that αn −→ 0 (20). Let us now turn to describe the setting of quantum hypothesis setting that we are interest in. Given a set of states M ⊆ D(H), we define EM (ρ) := inf S(ρ||σ),

(12)

LRM (ρ) := inf Smax (ρ||σ),

(13)

Smax (ρ||σ) := min{s : ρ ≤ 2s σ}

(14)

σ∈M

and σ∈M

where is the maximum relative entropy introduced by Datta (21). Note that if we take M to be the set of separable states, then EM and LRM reduce to the relative entropy of

182

entanglement and the logarithm global robustness of entanglement. This connection is the reason for the measures nomenclature. We will also need the smooth version of LRM , defined as  (ρ) := min LRM (˜ ρ), (15) LRM ρ˜∈B (ρ)

where B (ρ) := {˜ ρ ∈ D(H) : ||ρ − ρ˜||1 ≤ }. Let us specify the set of states over which the alternative hypothesis can vary. We will consider any family of sets {Mn }n∈N , with Mn ∈ D(H⊗n ), satisfying the following properties (i) (ii) (iii) (iv) (v)

Each Mn is convex and closed. Each Mn contains the maximally mixed state I⊗n / dim(H)n . If ρ ∈ Mn+1 , then trn+1 (ρ) ∈ Mn . If ρ ∈ Mn and σ ∈ Mm , then ρ ⊗ σ ∈ Mn+m . If ρ ∈ Mn , then Pπ ρPπ ∈ Mn , for every π ∈ Sn 2 .

We define the regularized version of the quantity given by Eq. (12) as 1 EMn (ρ⊗n ). n→∞ n

∞ (ρ) := lim EM

(16)

We now turn to the main result of this section. Suppose we have one of the following two hypothesis: (i) For every n ∈ N we have ρ⊗n . (ii) For every n ∈ N we have an unknown state ωn ∈ Mn , where {Mn }n∈N is a family of sets satisfying properties i-v. The next Theorem gives the optimal rate limit for the type II error when one requires that type I error vanishes asymptotically. Theorem 3.1 Given a family of sets {Mn }n∈N satisfying properties i-v and a state ρ ∈ D(H), for every  > 0 there exists a sequence of POVMs {An , I − An } such that lim tr((I − An )ρ⊗n ) = 0

n→∞

and for all sequences of states {ωn ∈ Mn }n∈N , −

log tr(An ωn ) 1 ∞  +  ≥ EM (ρ) = lim lim sup LRM (ρ⊗n ). n →0 n→∞ n n

Conversely, if there is a  > 0 and sequence of POVMs {An , I − An } satisfying −

log(tr(An ωn )) ∞ −  ≥ EM (ρ) n

for all sequences {ωn ∈ Mn }n∈N , then lim tr((I − An )ρ⊗n ) = 1.

n→∞ 2

Pπ is the standard representation in H⊗n of an element π of the symmetric group Sn .

183

The proof of Theorem 3.1 can be found in Refs. (13; 15). On general lines, the main idea of the proof is to employ Renner’s exponential de Finetti Theorem (22) to reduce the correlated instance at hand to an almost i.i.d. setting, which can then be handled by standard techniques. Theorem 3.1 gives an operational interpretation to the regularized relative entropy of entanglement. Taking {Mn }n∈N to be the sets of separable states over H⊗n , it is a simple exercise to check that they satisfy conditions i-v. Therefore, we conclude that ∞ (ρ) gives the rate limit of the type II error when we try to decide if we have several ER realizations of ρ or a sequence of arbitrary separable states. This rigorously justify the use of the regularized relative entropy of entanglement as a measure of distinguishability of quantum correlations from classical correlations. Another noteworthy point of Theorem 3.1 is the formula 1 1  EMn (ρ⊗n ) = lim lim sup LRM (ρ⊗n ). n n→∞ n →0 n→∞ n lim

(17)

Taking once more {Mn } as the sets of separable states over H⊗n , this Equation shows that the regularized relative entropy of entanglement is a smooth asymptotic version of the log global robustness of entanglement. We hence have a connection between the robustness of quantum correlations under mixing and their distinguishability to classical correlations. For later use we denote the quantity appearing in Eq. (17) by LG when {Mn } are taken to be the sets of separable states. 4. Proof of Theorem 2.4 4.1. The Entanglement Cost under Asymptotically non-Entangling Maps Proposition 4.1 For every multipartite state ρ ∈ D(Cd1 ⊗ ... ⊗ Cd2 ), ∞ ECane (ρ) ≤ ER (ρ).

(18)

Proof To prove the Proposition we consider a specific sequence of operations achieving ∞ (ρ). We consider maps of the form: the rate LG(ρ) = ER Λn (A) = tr(AΦ(Kn ))ρn + tr(A(I − Φ(Kn )))πn ,

(19)

where (i) {ρn } is an optimal sequence of approximations for ρ⊗n achieving the infimum in LG(ρ), (ii) log(Kn ) = blog(1 + RG (ρn ))c, and (iii) πn is a state such that (ρn + (Kn − 1)πn )/Kn is separable which always exists as Kn ≥ 2log(1+RG (ρn )) = RG (ρn ) + 1. As πn and ρn are states, each Λn is completely positive and trace-preserving. We now show that each Λn is a 1/(Kn − 1)-non-entanging map. From (ii) and (iii) we find πn + (Kn − 1)−1 ρn ∈ S, (20) 1 + (Kn − 1)−1 and, thus, 1 . (21) Kn − 1 A simple calculation using the U U ∗ -symmety of the maximally entangled state, together with the convexity of RG indeed show that Λn is a 1/(Kn −1)-separability-preserving map. ∞ (ρ) > 0 for every entangled state A Corollary of Theorem 3.1 shows that LG(ρ) = ER (see Refs. (13; 15)). Therefore RG (πn ) ≤

lim (Kn − 1)−1 ≤ lim RG (ρn )−1 = 0.

n→∞

n→∞

184

Moreover, as lim ||ρ⊗n − Λn (Φ(Kn ))||1 = lim ||ρ⊗n − ρn ||1 = 0,

n→∞

n→∞

(22)

it follows that {Λn } is an admissible sequence of maps for ECane (ρ) and, thus, 1 log(Kn ) n→∞ n 1 = lim sup blog(1 + RG (ρn ))c n→∞ n = LG(ρ).

ECane (ρ) ≤ lim sup

(23) t u

4.2. The Distillable Entanglement under non-Entangling Operations Before we turn to the proof of the main Proposition of this section, we state and prove an auxiliary Lemma which will be used later on. It can be considered the analogue for non-entangling maps of Theorem 3.3 of Ref. (24), which deals with PPT maps. Its proof, which is a simple exercise in duality of convex optimization problems, is given in (14; 13). Lemma 4.2 For every multipartite state ρ ∈ D(Cd1 ⊗ ... ⊗ Cdn ) the singlet-fraction under non-entangling maps, Fsep (ρ; K) := max tr(Φ(K)Λ(ρ)), (24) Λ∈SEP P

where Φ(K) is a K-dimensional maximally entangled state shared by the first two parties, satisfies 1 tr(σ). (25) Fsep (ρ; K) = min tr(ρ − σ)+ + K σ∈cone(S) It turns out that for the distillation part we do not need to allow any generation of entanglement from the maps. In analogy to Definition 2.3, we can define the distillable entanglement under non-entangling maps as     kn ⊗kn ⊗n ne : lim min ||Λ(ρ ) − φ2 ||1 = 0 . (26) ED (ρ) := sup lim inf n→∞ Λ∈SEP P n→∞ n {kn } Proposition 4.3 For every multipartite entangled state ρ ∈ D(Cd1 ⊗ ... ⊗ Cdn ), ne ne ∞ ED (ρ) ≥ ED (ρ) = ER (ρ).

(27)

Proof On one hand, from Theorem 3.1 we have ( 0, lim min tr(ρ⊗n − 2ny σn )+ = n→∞ σn ∈S 1,

∞ (ρ) y > ER ∞ (ρ). y < ER

(28)

On the other, from Lemma 4.2 we find Fsep (ρ⊗n ; 2ny ) := min tr(ρ⊗n − 2nb σ)+ + 2−(y−b)n . σ∈S,b∈R

(29)

∞ (ρ) + , for Let us consider the asymptotic behavior of Fsep (ρ⊗n , 2ny ). Take y = ER  ∞ any  > 0. Then we can choose, for each n, b = 2n(ER (ρ)+ 2 ) , giving ∞





Fsep (ρ⊗n , 2ny ) ≤ min tr(ρ⊗n − 2n(EM (ρ)+ 2 ) σ)+ + 2−n 2 . σ∈S

185

We then see from Eq. (28) that limn→∞ Fsep (ρ⊗n , 2ny ) = 0, from which follows that ne (ρ) ≤ E ∞ (ρ) + . As  is arbitrary, we find E ne (ρ) ≤ E ∞ (ρ). ED R D R ∞ (ρ) − , for any  > 0. The optimal b for each n has Conversely, let us take y = EM to satisfy bn ≤ 2yn , otherwise Fsep (ρ⊗n , 2ny ) would be larger than one, which is not true. Therefore, ∞ Fsep (ρ⊗n , 2ny ) ≥ min tr(ρ⊗n − 2n(EM (ρ)−) σ)+ , σ∈S

ne (ρ) ≥ E ∞ (ρ) − . Again, which goes to one again by Eq. (28). This then shows that ED R ne ∞ as  is arbitrary, we find ED (ρ) ≥ ER (ρ). t u

From Propositons 4.1 and 4.3 it is clear that in order to prove Theorem it sufficies to ane (ρ) for every state ρ. This can shown using standard methods in show that ECane (ρ) ≥ ED entanglement theory and is left out of this paper (see Refs. (14; 13) for a proof). We also do not present the proof of Corrolary 2.5, which follows straighforwardly from Theorem 2.4. A full proof can be found in Refs. (14; 13). 5. Connection to the Axiomatic Formulation of the Second Law of Thermodynamics In this section we comment on the similarities and differences of entanglement manipulation under asymptotic non-entangling operations and the axiomatic approach of Giles (2) and more particularly of Lieb and Yngvason (3) for the second law of thermodynamics. Let us start by briefly recalling the axioms used by Lieb and Yngvason (3) in order to derive the second law. Their starting point is the definition of a system as a collection of points called state space and denoted by Γ. The individual points of a state space are the states of the system. The composition of two state spaces Γ1 and Γ2 is given by their Cartesian product. Furthermore, the scaled copies of a given system is defined as follows: if t > 0 is some fixed number, the state space Γ(t) consists of points denoted by tX with X ∈ Γ. Finally, a preorder ≺ on the state space satisfying the following axioms is assumed: (i) (ii) (iii) (iv) (v)

X ≺ X. X ≺ Y and Y ≺ Z implies X ≺ Z. If X ≺ Y , then tX ≺ tY for all t > 0. X ≺ (tX, (1 − t)X) and (tX, (1 − t)X) ≺ X. If, for some pair of states, X and Y , (X, Z0 ) ≺ (Y, Z1 )

(30)

holds for a sequence of ’s tending to zero and some states Z0 , Z1 , then X ≺ Y . (vi) X ≺ X 0 and Y ≺ Y 0 implies (X, Y ) ≺ (X 0 , Y 0 ). Lieb and Yngvason then show that these axioms, together with the comparison hypothesis, which states that Comparison Hypothesis: for any two states X and Y either X ≺ Y or Y ≺ X, are sufficient to prove the existence of a single valued entropy function completely determining the order induced by the relation ≺. In the context of entanglement transformations, we interpret the relation ρ ≺ σ as the possibility of asymptotically transforming ρ into σ by asymptotically non-entangling maps. Then, the composite state (ρ, σ) is nothing but the tensor product ρ⊗σ. Moreover, tρ takes

186

the form of ρ⊗t , which is a shortcut to express the fact that if ρ⊗t ≺ σ, then asymptotically t copies of ρ can be transformed into one of σ. More concretely, we say that ρ⊗t ≺ σ ⊗q ,

(31)

for positive real numbers t, q if there is a sequence of integers nt , nq and of SEP P (n ) maps Λn such that lim ||Λn (ρ⊗nt ) − σ ⊗nq −o(n) ||1 = 0, n→∞

nq nt = t, and lim = q. n→∞ n→∞ n n→∞ n With this definition it is straightforward to observe that properties 1, 3, and 4 hold true for entanglement manipulation under asymptotically non-entangling maps. Property 2 can be shown to hold, in turn, by noticing that, if Λ ∈ SEP P () and Ω ∈ SEP P (δ), then Λ◦Ω ∈ SEP P (+δ +δ). Therefore the composition of two asymptotically non-entangling maps is again asymptotically non-entangling. That property 5 is also true is proven in Refs. (13). The Compasison Hypothesis, in turn, follows from Corollary 2.5: it expresses the total order induced by the regularized relative entropy of entanglement. We cannot decide if the theory we are considering for entanglement satisfy axiom 6. This is fundamentally linked to the possibility of having entanglement catalysis (23) under asymptotically non-entangling transformations. As shown in Theorem 2.1 of Ref. (3), given that axioms 1-5 are true, axiom 6 is equivalent to lim n = 0,

lim

(i) (X, Y ) ≺ (X 0 , Y ) implies X ≺ X 0 , which is precisely the non-existence of catalysis. Interestingly, we can link such a possibility in the bipartite case to an important open problem in entanglement theory, the full additivity of the regularized relative entropy of entanglement. Lemma 5.1 The regularized relative entropy of entanglement is fully additive for bipartite 0 0 states, i.e. for every two states ρ ∈ D(Cd1 ⊗ Cd2 ) and π ∈ D(Cd1 ⊗ Cd2 ), ∞ ∞ ∞ ER (ρ ⊗ π) = ER (ρ) + ER (π),

(32)

if, and only if, there is no catalysis for entanglement manipulation under asymptotically non-entangling maps. References [1] Carath´eodory C 1909 Math. Ann. 67 355 [2] Giles R 1964 Mathematical Foundations of Thermodynamics. Pergamon (Oxford: Oxford University Press) [3] Lieb E H and Yngvason J 1999 Phys. Rept. 310 1 [4] Lieb E H and Yngvason J 2002 Current Developments in Mathematics 89 10 [5] Bennett C H, Bernstein H J, Popescu S, and Schumacher B 1996 Phys. Rev. A 53 2046 [6] Horodecki M, Horodecki P, and Horodecki R 1998 Phys. Rev. Lett. 80 5239 [7] Werner R F 1989 Phys. Rev. A 40 4277 [8] Plenio M B and Virmani S 2007 Quant. Inf. Comp. 7 1

187

[9] Vedral V, Plenio M B, Rippin M A, Knight P L 1997 Phys. Rev. Lett. 78 2275 [10] Vedral V and Plenio M B 1998 Phys. Rev. A 57 1619 [11] Vidal G and Tarrach R 1999 Phys. Rev. A 59 141 [12] Harrow A W and Nielsen M A 2003 Phys. Rev. A 68 012308 [13] Brand˜ao F G S L 2007 PhD Thesis (Imperial College London) [14] Brand˜ao F G S L and Plenio M B 2007 A reversible theory of entanglement and its connection to the second law preprint arXiv:0710.5827 [15] Brand˜ao F G S L and Plenio M B. A Generalization of Quantum Stein’s Lemma. In Preparation [16] Audenaert K M R, Nussbaum N, Szkola A and Verstraete F 2007 Asymptotic Error Rates in Quantum Hypothesis Testing preprint arXiv:0708.4282 [17] Hiai F and Petz D 1991 Comm. Math. Phys. 143 99 [18] Ogawa T and Nagaoka H 2000 IEEE Trans. Inf. Theo. 46 2428 [19] Hayashi M 2002 J. Phys. A: Math. Gen. 35 10759 [20] Ogawa T and Hayashi M 2004 IEEE Trans. Inf. Theo. 50 1368 [21] Datta N 2008 Min- and Max- Relative Entropies and a New Entanglement Measure preprint arXiv:0803.2770 [22] Renner R 2007 Nature Physics 3 645 [23] Jonathan D and Plenio M B 1999 Phys. Rev. Lett. 83 3566 [24] Rains E 2001 IEEE Trans. Inf. Theo. 47 2921

188

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Entanglement Production in Non-Equilibrium Thermodynamics Vlatko Vedral The School of Physics and Astronomy, University of Leeds, Leeds, LS2 9JT, United Kingdom Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543 Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117542 E-mail: [email protected] Abstract. We define and analyse the concept of entanglement production during the evolution of a general quantum mechanical dissipative system. While it is important to minimise entropy production in order to achieve thermodynamical efficiency, maximising the rate of change of entanglement is important in quantum information processing. Quantitative relations are obtained between entropy and entanglement productions, under specific assumptions detailed in the text. We apply these to the processes of dephasing and decay of correlations between two initially entangled qubits. Both the Master equation treatment as well as the higher Hilbert space analysis are presented. Our formalism is very general and contains as special cases many reported individual instance of entanglement dynamics, such as, for example, the recently discovered notion of the sudden death of entanglement.

1. Introduction Entanglement has been studied extensively in many body systems in the last five years or so [1]. A formalism has been developed to treat entanglement using the standard methods of statistical mechanics. Almost all of the studies have been focused on systems in thermal equilibrium, both at zero and finite temperatures. Real systems are, however, hardly ever in thermal equilibrium and at a fixed temperature. The field of non-equilibrium thermodynamics was developed in the decades between 40s and 70s to deal with such driven systems and laws of their evolution [2]. Theoretical predictions of macroscopic entanglement have also been experimentally corroborated in systems such as, for example, grains of salt [3]. This raises a very interesting possibility, which I briefly speculated about sometime ago [4], that entanglement can and does feature in living systems. Living systems, however, are composed of large macromolecules of various types and they exist at high temperatures (about 300 Kelvin plus). Can any entanglement survive under such harsh conditions? It would seem unlikely that macroscopic thermal entanglement could exist at 300K, but, it is very important to remember that living systems are not in equilibrium. They are in fact very much driven by their environments and continuously change in time. For example, chemical reactions in the cell’s mitochondria are very far from equilibrium, which is why they can produce and supply energy necessary for cell’s functioning. It is feasible, therefore, that a system’s equilibrium state is not entangled, while, for the same system, entanglement gets created when this system dynamically approaches equilibrium. Steady states of driven systems could also

189

easily be engineered to be entangled. Achieving coherence away from equilibrium is nothing unusual after all: the phenomenon of population inversion is one such instance and it has been speculated that the coherent electron transport in cell membranes (as well as photosynthesis) functions in a similar way [5]. Bio-chemical experiments are now approaching the appropriate levels of sophistication where quantum features of energy transfer processes can be addressed in greater detail. Remarkable recent results show that quantum coherence, in the form of quantum energy beats, is present in the energy transfer process in photosynthesis at T = 77K [6]. The authors compare their findings to a version of Grover’s search algorithm [7], where the incoming light excites a number of energy states in the receptors, which then coherently (and rapidly) transfers energy to the most convenient storage state. The authors also do not rule out the possibility of non-local effects, i.e. entanglement, though their experiment was not designed to reveal any such phenomenon. Studying entanglement in real systems, especially biological ones, impels us therefore to phrase the whole question of the existence of many-body entanglement within the formalism of non-equilibrium thermodynamics. The central quantity in thermodynamics is entropy. Whether a certain thermodynamical state can be transformed into another one is determined by the difference between their corresponding entropies. But, thermodynamics does not tell us how exactly the transformation is to be executed - despite its name, the details of dynamics are not part of thermodynamical description. Dynamics of macroscopic objects immersed in noisy environments belongs to the domain of non-equilibrium thermodynamics. The central quantity in non-equilibrium thermodynamics in not entropy, but the rate of change of entropy, which is known as entropy production [2, 8]. This is perhaps not surprising, since dynamics should be described by rates of change of relevant kinematic quantities and entropy is one such quantity. Here, however, our intention is to study behavior of entanglement in non-equilibrium and we therefore introduce a quantity we call entanglement production. The main aim of this work will be to compare the behavior of entropy and entanglement productions under very general dissipative evolutions. The hope of the author is that the entropy production in some way constrains the rate of entanglement production. This would be a completely novel contribution to the study of connections between entanglement and thermodynamics [9]. Our approach is important not only in the domain of studying macroscopic quantum physics, but may also be able to shed some new light on the issues related to the quantum measurement (see [10]). 2. Entropy production We begin by analysing the following question. A system, whose Hamiltonian is H, is in contact with a heat bath which interacts with the system driving it into the equilibrium thermal state ρT = e−βH /Z where Z = tre−βH is the partition function. Note that here H will be time independent; however, entropy production can be defined for driven systems as well and most of what we have to say will apply to these more general circumstances. We will point this out whenever appropriate as we proceed in our argument. If the system is at the beginning in some state ρ, we ask about the entropy change during the process in which this state thermalises. 
There are two components to the total entropy change [11, 12], ΔSt , which are usually written as ΔSt = ΔSint +ΔSext . The first term signifies the internal entropy change, which is the change entropy of system’s entropy, given by ΔSint = S(ρT ) − S(ρ), while the second term, the external   change, is ΔSext = − k rk (trρT H − rk |H|rk )/T = tr(ρT − ρ) ln ρT , where ρ = k rk |rk rk | is the eigen-decomposition of the density matrix. This term signifies the entropy increase in the environment. It is derived by calculating the (expected) heat transfer from the system to the environment divided by the temperature. The total entropy change is, therefore, conveniently expressed using the relative entropy as ΔSt = S(ρ||ρT ) = tr(ρ ln ρ − ρ ln ρT )

190

(1)

This quantity is never negative [13], which is an expression of the second law of thermodynamics. If we look at the continuous change of entropy in time, a naive way of thinking would suggest to us that the entropy production, σ(t), should be given by a derivative of the above in time, like so d d (2) σ = − ΔSt = −tr( ρ)(ln ρ − ln ρT ) dt dt The negative sign corresponds to the fact that, while the relative entropy generally decreases in time, the entropy production itself should be a positive quantity. Remarkably, upon a more rigorous investigation, this turns out to be the correct expression for entropy production [8]. Since ρ describes the most general evolution of an open system, its (continuous) dynamics is given by the Lindblad master equation which has the following general form (¯ h = 1) [14]: n 1 1 {Γ† Γk ρ + ρΓ†k Γk − 2Γk ρΓ†k }, ρ˙ = [H, ρ] − i 2 k=1 k

(3)

where the dot represents the time derivative, and Γ’s are the dissipative operators. They do not need to satisfy any special requirements, since the combination through which they enter the Master equation already guarantees the “physicality” of the whole process. Namely, for a small time interval Δt, we can describe the time evolution of the density matrix by the n † following completely positive, trace √ preserving map (CP-map), ρ(t + Δt) ≈ k=0 Wk ρ(t)Wk , ˜ and Wk = ΔtΓk (k ∈ {1 . . . n}) are called the “no-jump” and jump where W0 = 1 − iHΔt ˜ is a non-Hermitian Hamiltonian, given by: H ˜ = H − i  n Γ† Γk . operators respectively. H k=1 k 2  Note that the operators Wk fulfill the completeness relation nk=0 Wk† Wk = 1. Because of the non-increase of the relative entropy under general CP-maps [13], we can also conclude that σ ≥ 0 holds in general (thereby justifying our insertion of the minus sign in the above definition). For completeness, it should be stressed the entropy production rate can be defined even when we do not have stationary states of our evolution. The derivation here followed the assumption that the state of the system thermalises due to its interaction with the environment, but even outside of this framework - and we mentioned driven systems earlier - the concept of entropy production is still very much meaningful. Since in this paper most of our attention will be devoted to dissipative evolutions leading to thermalisation, our above considerations will be sufficient, but it is by all means worth noting that the generality of the concept of entropy production stretches far beyond what we have presented. Thermodynamically the most efficient processes are the reversible ones, where σ = 0. A state of the system must then be transformed by being in touch with (a sequence of) reservoirs whose state is only infinitesimally different to that of the system. Then, the relative entropy between the system and the reservoir is zero up to the second order, S(ρ + δρ||ρ) ≈ 0. Thermodynamical efficiency, however, may not be the desired goal when we are optimising the computational speed-up, as we do in quantum computation. Then, a far better indicator of efficiency is the rate of consuming or creating entanglement (whether we do the former or the latter depends on the model of quantum computers we use, but this distinction is of little importance here). We now proceed to define it. 3. Entanglement Production The first important point to mention is that when a system interacts with its environment, there are many different entanglements that we can consider. For instance, there is the entanglement between the system and the environment that develops as the evolution, in the form of their interaction, proceeds. Secondly, there is the entanglement within the environment itself, which could be very complex. In this paper, however, we are interested in the amount of entanglement

191

within the system only. Much as the entropy production refers to the entropy production by the system, so will the entanglement production be strictly confined to within the system. Given this, there is still a myriad different expressions for the amount of entanglement in a given quantum state ρ [15]. However, since the relative entropy is involved in quantifying the entropy production, we will also use the relative entropy to measure entanglement [16, 17]. Another advantage of this measure is that it is universal, applying as it does to any number of subsystems of any dimensionality [15]. We define entanglement production, σE , to be the temporal derivative of the relative entropy of entanglement. Thus, σE =

d S(ρ(t)||ρsep (t)) dt

(4)

This can be easily rewritten as σE = tr{ρ˙ ln ρ−exp{ln ρ−ln ρsep }ρ˙ sep }, highlighting the difficulty that, unfortunately, exists in computing the last term, ρ˙ sep . This is because the closest separable states changes in time with ρ and we need to calculate it at each instant in time. Ignoring this difficulty for the moment (for this already exists when calculating the relative entropy of entanglement itself [15]), let us first analyze the general relationship between entropy and entanglement productions. How do we expect σ and σE to be related, given our knowledge of dissipation and entanglement? We would most likely conjecture that the rate of entropy production is in absolute sense always greater than that of entanglement production. The reason is that we can easily imagine a situation where the steady state is disentangled, and therefore σE = 0, but the entropy is still produced, i.e. σ > 0 (see for example [18]). But, does this hold more generally? This simple answer is no, since we can construct a state very weekly coupled to its environment, so that the resulting entropy production is very small, but that its (state’s) internal dynamics is so strong that entanglement gets rapidly generated. Since the rate of internal dynamics is (seemingly) completely independent from the coupling rate to the environment, this difference between the entropy and entanglement productions can be made as large as we please (this is not quite correct since both the driving and dissipation may depend on the same features, but we will not go into details of this here [19]). What if, on the other hand, the system is only evolving entanglement due to the environmental coupling (i.e. without any other external driving)? To investigate this, let us now assume a special case where ρsep (t) = ρT at all times (we will encounter a concrete example of such a system later). Then, dρsep /dt = 0, and so σE (t) + σ(t) = 0. In this case entanglement decreases by exactly as much as the entropy increases at each instant in time. Since we know that entropy production is always positive (this is a restatement of the Second Law), the above equation implies that entanglement is being reduced. All the dissipation is therefore used up solely in destroying entanglement. In general, however, this is not necessarily the case. The separable state will in general evolve during the dissipative evolution and the above law no longer holds. What then is the function of the excess of entropy production over entanglement production? Before we address this issue, it is interesting to mention Prigogine [20] who showed that under some restricted conditions, a principle of the minimum entropy production can be derived (for details see [2]). We hear it frequently stated that biological processes conform to this rule, which in our above case would imply the principle of minimum entanglement destruction. It is fascinating to explore further if the efficiency of some natural processes simply derives from the maximal possible preservation of entanglement during the time of these processes. (Might the same be true for general quantum computational processes?). We argued that it is clearly feasible to have a finite entropy production while at the same time not have any entanglement dynamics. A steady state of a driven system displays this feature as we noted before, but so does the recently experimentally confirmed [21] phenomenon of the “sudden death of entanglement” [22]. In the latter case, the system dissipates toward a

192

disentangled steady state, but entanglement vanishes before this steady state is reached. Beyond this point, it is clear that σE = 0, but entropy is still produced until the steady state is reached. The opposite of this cannot happen, namely the fact that entanglement is produced but entropy is not, providing that the system is not driven. If the system is not driven, the entropy production is zero only in thermal equilibrium, but then so is entanglement production. It is therefore natural to conjecture that, if we eliminate the possibility of driven entanglement, then |σE | ≤ σ. We use the absolute value of entanglement production because we do not want to worry about whether entanglement increases or decreases. We would only like to claim that the change in entanglement is bounded by the change in entropy. This question resembles the question of Landauer’s erasure [23], namely whether the entropy increase in the environment during a measurement is greater than the information obtained during the same measurement. In order to address the question of entanglement production in a more quantitative way, we now utilise a different way of presenting dissipative evolutions. 4. Higher Hilbert Space View Any completely positive map of the above type can be represented in a fully unitary way providing we are allowed to include the environment in the treatment of the dynamics. At this global level of the system and environment, the dynamics is strictly unitary (since their aggregate is closed). Where does entropy production come from then, when unitarity strictly preserves entropy? The answer is that it is generated by neglecting the correlations between the system and the environment [24]. We will see this in our examples below, but for now let us imagine that the initial state of the system and environment is uncorrelated, ρS ρE . Suppose that after the unitary interaction the state is ρS  E  (primes will always pertain to the evolved state). It is true that due to unitarity S(ρS ρE ) = S(ρS  E  ), however, if we separate the system and environment then, due to subadditivity, S(ρS  E  ) ≤ S(ρS  ) + S(ρE  ). Therefore, if we disregard the correlations, we have that S(ρS ) + S(ρE ) ≤ S(ρS ) + S(ρE ) and so ΔSt = ΔSS + ΔSE = S(ρS ) − S(ρS ) + S(ρE ) + S(ρE ) ≥ 0. A way of explaining the entropy increase from a unitary evolution lies therefore in neglecting correlations (this is probably the most accepted view of “deriving” irreversibility from the microscopic reversibility). I would like to make a simple point here that is well known, but may be worth discussing briefly. It is this. The view that the deletion of correlations is responsible for the entropy increase is very closely related, if not identical, to the “coarse graining” method of explaining the entropy increase (that originated in classical physics). The coarse graining argument goes as follows. Any Hamiltonian evolution (be it quantum or classical) preserves entropy and is incapable of explaining its increase. However, if we calculate the entropy with respect to some underlying structure (known as the coarse grained version of phase space in classical physics), then this relative entropy between our state and its coarse grained version will always increase. Here, it is the neglect of the underlying microscopic structure that is responsible for entropy increase and the relative entropy again becomes a prominent measure of the “amount of neglect”. 
It turns out, furthermore, that the relative entropy between the quantum state ρAB and its marginals ρA ρB is just equal to the mutual information of AB, S(ρA ) + S(ρB ) − S(ρAB ) [15]. And, we have seen that the “flow” of relative entropy, unlike that of ordinary entropy, is always unidirectional when we have the most general CP evolution. It is this change in relative entropy that is used to quantify the entropy production. It is thus immaterial whether we use the mutual information or the relative entropy of coarse graining to measure the entropy increase - they should ultimately be the same. Suppose, following this logic, that our system S now contains two subsystems A and B. The initial state of the system and environment is, as before, ρAB ρE . The evolution is given by † = ρA B  E  . We can now apply the strong subadditivity [25] to the final state UABE ρAB ρE UABE to obtain SA B  E  + SB  ≤ SA B  + SB  E  . Since SA B  E  = SABE = SAB + SE , we have that

193

ΔSAB + ΔSE ≥ SB  + SE  − SB  E  = I(B  , E  ), which means that the entropy increase due to separation is bigger than the mutual information between either of the subsystems and the environment (see [26] for a more general discussion of the thermodynamics of measurement information). We have shown previously [27] that, under quite general circumstances, the change in the mutual information between environment and the system is an upper bound to the change of entanglement. (Similar considerations were presented in [28] where the authors show that the amount of work needed to erase all correlations in a state is equal to its mutual information). Following through the above inequality this would imply that ΔSAB + ΔSE ≥ ΔEAB ; this, inequality, however, still remains a conjecture, although we have presented strong “circumstantial” evidence in its favor. The issues related to entropy and entanglement productions have been very general so far, and hold for any kind of physical system independently of its size, number of degrees of freedom and such. From now on, however, we will specialize to the evolution of two qubits under the influence of dephasing and dissipation. These are two very common mechanisms leading to thermalisation which is why they are of particular importance. We hope, however, that our considerations will, in the future, be extended to many body systems, where - as we already indicated in the introduction - many interesting and fundamental questions are to be found (for a review see [1]). 5. Examples: dephasing and dissipation Suppose that systems A and B are two entangled qubits initially in an entangled state |ΨAB = a |00 + b |11. Here the relative entropy of entanglement is EAB = −a2 ln a2 − b2 ln b2 (We assume a, b to be real for simplicity and normalised a2 + b2 = 1). There are many way of destroying this entanglement, but the two most common ones are through dephasing or decay. Imagine that one of our qubits is coupled to an environment leading to a dissipative evolution. At the Master equation level, this can be written as dρ dt

= +

ω γ [σz , ρ] + ([σ − ρ, σ + ] + [σ − , ρσ + ]) 2i 2 η κ + − ([σ ρ, σ ] + [σ + , ρσ − ]) − [σz , [σz , ρ]] 2 2

(5)

where σz± are the usual Pauli matrices (instance of the Γ matrices in the general formulation) and γ, κ, η ≥ 0 are decay rates. The first term describes the free qubit evolution under H = ωσz (with ω being its natural frequency), the second and third terms are the decay ones (in both directions of exciting and de-exciting the qubit), while the last term indicates the dephasing process. Here we have assumed that only one of the qubits interacts in this way with its environment, while the other one remains stationary. This restricted assumption is immaterial to our discussion. We could easily have included the evolution of the other qubit without any fundamental modification to the forthcoming conclusions. It is straightforward to compute the overall state evolution under this Master equation, ⎛

a(t) ⎜ 0 ρ(t) = ⎜ ⎝ 0 μ(t)ab

0 0 0 0



0 μ(t)ab ⎟ 0 0 ⎟ , ⎠ 0 0 0 1 − a(t)

where μ(t) = exp{−iωt − (1/2)(γ + κ + η)t}, a(t) = δ(t)a2 + δ (t), δ(t) = exp(−(γ + κ)t), and δ (t) = (γ(1 − exp(−(γ + κ)t)))/(κ + γ). The stationary state of this evolution is obtained by letting t → ∞, while the closest separable state at any instant in time is given by deleting the off-diagonal elements, i.e. by setting μ = 0 in the above density matrix. These allow us to

194

compute both the entropy as well as entanglement productions. The expressions are simple but cumbersome to write down; the difference between entropy and entanglement productions is: ˙ ln(1 − a(t))a∞ /(a(t)(1 − a∞ )), where a∞ = limt→∞ a(t). It can be proved that σ − |σE | = a(t) this is a positive quantity (thus supporting our conjecture) that decays at the rate e−(κ+γ)t . Note that the off-diagonal decay does not play any role in this formula, although, of course, it is the reduction of the off-diagonals that contributes to thermalissation (and thereby it features in both entropy and entanglement productions individually). We can now investigate the process of dephasing on its own, but to make it clearer we will undertake this from the higher Hilbert state picture. We have that κ = γ = 0 and η > 0. Dephasing is obtained through the following interaction between (AB) and E, (|0A 0B  + |1A 1B ) |0E  → |0A 0B  |aE  + |1A 1B  |bE , where aE |bE  = 1 − ηδt > 0. Suppose that the same interaction happens in each small time step δt = t/k, where t is the total time of evolution. Then, after k such steps we obtain, |0A 0B  |aE  |aF  ... |aN  + |1A 1B  |bE  |bF  ... |bN 





k



k

But now aE |bE  aF |bF  ... aN |bN  = (1 − ηt/k)k → exp(−ηt) as k → ∞. The final state of the system AB, when the environment is traced out, is the mixture |0000| + |1111|, which is exactly the same result as that obtained from the Master equation treatment above. This is not an accident; this way of treating the dephasing is equivalent to the two assumptions used in deriving the Master equation, the so-called Born and Markov approximations. The former ensures that during each step, δt, the interaction with the environment is weak (i.e. ηδt is “small”), while the latter effectively corresponds to a memoryless environment (each new interaction is with a different environmental subsystem). In this case the state of AB becomes mixed as t → ∞ and contains no entanglement. Quantum correlations have thus been destroyed, but the classical ones still remain. Note that the closest separable state throughout the evolution is always the same and it is equal to the final state reached as k → ∞. An interesting aspect of this treatment is that the mutual information between the system and environment, whose total state is pure, is twice the value of the entropy of the state, and, twice the value of the reduction of entatanglement. This seems to represent a contradiction in the light of the previous claim, namely that when the thermal and separable state coincide, then the entanglement reduction should be equal to entropy production. However, the environment in this example was pure to start with. In order to recover the equality of entropy and entanglement rates, we need to start with an already mixed environment (many identical subsystems, all completely dephased). Then, through the interaction with the system, dephasing will be achieved whose total entropy increase will exactly match the entanglement decrease. This shows that a more appropriate way of looking at dephasing is through the process of thermalisation as presented in [29]. 6. Concluding discussion Here we have introduced the notion of entanglement production, in direct analogy with the quantity called entropy production. The latter has been used extensively in the studies of nonequilibrium thermodynamics leading to fundamental and profound insights in chemistry and biology [30]. We have shown that the two quantities are closely related, and have presented evidence for the fact that entropy production presents a bound on the rate of change of entanglement during general dissipative processes. There are many open question of general nature implied by our work. For instance, is there an entanglement “balance” equation of the type dE/dt = σE + ∇jE , where ∇jE would be the entanglement flux of entanglement through the boundary between the system and its

195

environment? It is clear that this can be answered in the affirmative in the case when σE = σ, since entropy production satisfies a balance equation, but does it hold more generally? While entropy is important in describing the directionality as well as the speed of thermal processes, entanglement and the rate of its change are important when we discuss the capacity of the system for quantum information processing. One wonders, for instance, if the efficiency of one way quantum computers relies on a delicate balance and trade-off between entropy production generated by measurements and the entanglement reduction during computation [31]. In a broader context, since we are discovering more and more rapidly that natural processes exploit quantum effects to enhance their efficiency, studying general principles behind dissipative entanglement production appears to be a very important and worthwhile venture. Acknowledgments. The author would like to thank the Royal Society, the Wolfson Trust, the Engineering and Physical Sciences Research Council (UK) and the National Research Foundation and Ministry of Education (Singapore) for their financial support. [1] L. Amico, R. Fazio, A. Osterloh and V. Vedral, Rev. Mod. Phys 80, 517 (2008). [2] S. R. De Groot and P. Mazur, Non-equilibrium thermodynamics (Dover Publications Inc.; Dover Ed edition, 1984). [3] S. Ghosh, T. F. Rosenbaum, G. Aeppli, S. N. Coppersmith, Nature 425, 48 (2003). [4] V. Vedral, Nature 425, 28 (2003). [5] H. Fr¨ ohlich, Nature 219, 743 (1968). [6] G. S. Engel et. al., Nature 446, 782 (2007). [7] L. Grover, Phys. Rev. Lett. 79, 325 (1997). [8] H. Spohn, J. Math. Phys. 19, 1227 (1978). [9] M. B. Plenio and V. Vedral, Cont. Phys. 39, 431 (1998). [10] V. Vedral, Phys. Rev. Lett. 90, 050401 (2003). [11] E. Lubkin, Int. J. Theor. Phys. 26, 523 (1987). [12] V. Vedral, Proc. Roy. Soc. London A 456, 969 (2000). [13] G. Lindblad, Comm. Math. Phys. 33, 305 (1973). [14] G. Lindblad, Comm. Math. Phys. 48, 119 (1976). [15] V. Vedral, Rev. Mod. Phys. 74, 197 (2002). [16] V. Vedral, M. B. Plenio, M. A. Rippin and P. L. Knight, Phys. Rev. Lett. 78, 2275 (1997). [17] V. Vedral and M. B. Plenio, Phys. Rev. A 57, 1619 (1998). [18] H.-P. Breuer, Phys. Rev. A 68 932105 (2003). [19] M. B. Plenio and P. L. Knight, Phys. Rev. A 53, 2986 (1996). [20] I. Prigogine, Thermodynamics of Irreversible Processes (Wiley, New York, 1961). [21] M. P. Almeida et al., Science 316, 579 (2007). [22] T. Yu and J. H. Eberly, Phys. Rev. Lett. 97, 140403 (2006). [23] R. Landauer, IBM J. Res. Develop. 5, 183 (1961). [24] A. Peres, Quantum Theory: Concepts and Methods (Fundamental Theories of Physics S.), (Kluwer Acad. Publishers, 1995). [25] E. H. Lieb and M. B. Ruskai, Phys. Rev. Lett. 30, 434 (1973). [26] G. Lindblad, J. Stat. Phys. 11, 231 (1974). [27] L. Henderson and V. Vedral, Phys. Rev. Lett. 84, 2263 (2000). [28] B. Groisman, S. Popescu and A. Winter, Phys Rev A 72, 032317 (2005). [29] V. Scarani et. al., Phys. Rev. Lett. 88, 097905 (2002). [30] P. V. Coveney, Nature 333, 409 (1988). [31] J. Anders, D. Markham, V. Vedral, M. Hajduˇsek, arXiv:quant-ph/0702020 (2007).

196

Complementarity and the algebraic structure of nite quantum systems Denes Petz

Alfred Renyi Institute of Mathematics, H-1364 Budapest, POB 127, Hungary E-mail: [email protected] Complementarity is a very old concept in quantum mechanics, however the rigorous de nition is not so old. Complementarity of orthogonal bases can be formulated in terms of maximal Abelian algebras and this may lead to avoid commutativity of the subalgebras. In some sense this means that quantum information is treated instead of classical (measurement) information. The subject is to extend to the quantum case some features from the classical case. This includes construction of complementary subalgebras. The Bell basis has also some relation. Several open questions are discussed. Abstract.

1. Introduction

The origin of complementarity is historically connected with the non-commutativity of operators describing observables in quantum theory. Although the concept was born together with quantum mechanics itself, the rigorous de nition was given much later. Complementary bases or complementary measurements give maximal information about the quantum system. Complementarity is also used, for example, in state estimation [14, 24] and in quantum cryptography [2]. When non-classical, say quantum, information is considered, then noncommutative subalgebras or subsystems of the total system should be regarded. The study of complementary non-commutative subalgebras is rather recent [16]. Complementarity appeared in the history of quantum mechanics in the early days of the theory. According to Wolfgang Pauli, the new quantum theory could have been called the theory of complementarity [13]. This fact shows the central importance of the notion of complementarity in the foundations of quantum mechanics. Unfortunately, the importance did not make clear what the concept really means. The idea of complementarity was in connection with uncertainty relation and measurement limitations. Wolfgang Pauli wrote to Werner Heisenberg in 1926: \One may view the world with the p-eye and one may view it with the q -eye but if one opens both eyes simultaneously then one gets crazy". The distinction between incompatible and complementary observables was not really discussed. This can be the reason that \complementarity" was avoided in the book [7] of John von Neumann, although the mathematical foundations of quantum theory were developed in a generally accepted way. The concept of complementarity was not clari ed for many years, but it was accepted that the pair of observables of position and momentum must be a typical and important example (when complementarity means a relation of observables). 197

The canonically conjugate position and momentum, Q and P , are basic observables satisfying the commutation relation, (QP P Q)f = if (f 2 D) which holds on a dense domain D (for example, on the Schwartz functions in L (R)). The uncertainty relation, (Q; f ) (P; f )  12 (f 2 D) holds on the same domain. (Recall that the variance of the observable A in the vector state f is de ned as (A; f ) = hf; A f i hf; Af i .) The Fourier transform F : L (R) ! L (R) is a unitary and makes a connection P = F QF between P and Q. This extends also to the spectral measures E P (  ) and E Q(  ), so that one has E P (H ) = F E Q (H )F for all Borel sets H  R. From the Fourier relation one can deduce that Tr E P (H )E Q(H ) = 21 (H )(H ) (1) for all bounded intervals H ; H  R with length (H ) and (H ). Note that the operators P and Q do not have eigenvectors and the connection (1) can be called complementarity. Since this paper concentrate on nite dimensional Hilbert spaces, P and Q are not discussed here, but we refer to the paper [4]. Herman Weyl used the nite Fourier transform to approximate the relation of P and Q in nite dimensional Hilbert spaces [25]. Let j0i; j1i; : : : ; jn 1i be an orthonormal basis in an n-dimensional Hilbert space. The transformation 2

2

2

2

2

2

1

1

1

1

2

2

1

1

n X : jii 7! p1n !ij jj i 1

Vn

2

2

(! = e  =n) 2 i

j =0

(2)

isP a unitary and it is nowadays called quantum Fourier transform. If the operator A =  i i jei ihei j is diagonal in the given basis and B = Vn AVn , then the pair (A; B ) approximates (Q; P ) when the eigenvalues are chosen properly. The complementarity of observables of a nite quantum system was emphasized by Accardi in 1983 during the Villa Mondragone conference [1]. His approach is based on conditional probabilities. If an observable is measured on a copy of a quantum system and another observables is measured on another copy (prepared in the same state), then one measurement does not help to guess the outcome of the other measurement, if all conditional probabilities are the same. If the eigenvectors of the rst observable are i's, the eigenvectors of the second one are j 's and the dimension of the Hilbert space is n, then complementarity means jhi; j ij = p1 : (3) n

It is clear that the complementarity of two observables is actually the property of the two eigenbases, so it is better to speak about complementary bases. The Fourier transform (2) moves the standard basis j0i; j1i; : : : ; jn 1i to a complementary basis Vnj0i; Vnj1i; : : : ; Vnjn 1i. The complementarity (3) is often called value complementarity and it was an important subject in the work of Schwinger [21, 22, 19].

198

2. From mutually unbiased bases to complementary subalgebras

Let H be an n-dimensional Hilbert space with an orthonormal basis e ; e ; : : : ; en. A unit vector 2 H is complementary with respect to the given basis e ; e ; : : : ; en if jh; eiij = p1 (1  i  n): (4) 1



1

2

2

n

(4) is equivalent to the formulation that the vector state jihj gives the uniform distribution when the measurement je ihe j; : : : ; jenihenj is performed: Tr jijhj jeiiheij = 1 (1  i  n): 1

1

n

When the Hilbert space H is a tensor product H H , then a unit vector complementary to a product basis is called maximally entangled state. (If a vector is complementary to a product basis, then it is complementary to any other product basis.) When dimH = dimH = 2, then the Bell basis p1 (j00i + j11i); p1 (j01i + j10i); p1 (j00i j11i); p1 (j01i + j10i) (5) 2 2 2 2 consists of maximally entangled states. The goal of state determination is to recover the state of a quantum system by measurements. If the Hilbert space is n dimensional, then the density matrix of a state contains n 1 real parameters. If a measurement is repeated on many copies of the same system, then n 1 parameters can be estimated. Therefore, at least n + 1 di erent measurement should be performed to estimate the n 1 parameters. A measurement (described by minimal orthogonal projections) can be identi ed with a basis. Wootters and Fields argued that in the optimal estimation scheme the n + 1 bases must be pairwise complementary [24]. Instead of pairwise complementary bases, Wootters and Fields used the expression \mutually unbiased bases" and this terminology has become popular. A di erent kind of optimality of the complementary bases was obtained in [15] in terms of the determinant of the average mean quadratic error matrix. In case of a 2-level system, the three Pauli observables       1 0 0 i 0 1  := 1 0 ;  := i 0 ;  := 0 1 can be used for several purposes. For example, the Bloch parametrization of the state space   1 1 1 +   i    = (I +    ) = (6) 2 2  + i 1  ; is convenient. The Pauli observables are pairwise complementary: If  is an eigenvector of i and  is an eigenvector of j with i 6= j , then jh; ij = 1=2. Let u(1); u(2) and u(3) be unit vectors in R and consider the observables A(i) = u(i)   (1  i  3) for measurement. If each of them is measured r times and the relative frequency is  (i)r for the outcome 1 of A(i), then ^ = 2T ( (1)r ;  (2)r ;  (3)r )t T 1 (7) 1

2

1

2

2

2

3

2

1

3

1

1

2

2

3

2

3

1

1

199

is an estimate, where

2 u(1)1 4 T = u(2)1

u(3)1

u(1)2 u(1)3 u(2)2 u(2)3 u(3)2 u(3)3

3 5

is the transpose of the basis transformation. The eigenbases of the Pauli matrices are mutually unbiased and the eigenbases of A(1); A(2) and A(3) are so if T is an orthogonal matrix. For the above estimate, the mean quadratic error matrix is 2 3 1 (u(1)  ) 0 0 5 (T ) 0 1 (u(2)  ) 0 V () = 4T 4 0 0 1 (u(3)  ) which can be averaged with respect to the Lebesgue measure on the Bloch ball (or any other rotation invariant measure), see [15] . 2

(1)

1

1

2

2

Theorem 1 The determinant of the average mean quadratic error matrix is the smallest, if the vectors u(1); u(2) and u(3) are orthogonal, that is, the observables A(1); A(2) and A(3) are complementary.

The content of the theorem is similar to the result of [24] , however in the approach of Wootters and Field not the mean quadratic error was minimized but the information gain was maximized. The complementary (or unbiased) measurements are optimal from both viewpoints. Similar result holds in higher dimensions, as well. Relation (4) can be reformulated in terms of the generated subalgebras. The unital subalgebra generated by jijhj consists of operators jijhj +Pjijhj? (;  2 C), while the algebra generated by the orthogonal projections jeiiheij is f i ijeiiheij : i 2 Cg. Relation (4) can be reformulated in terms of these generated subalgebras. Theorem 2 Let A and A be subalgebras of Mk (C) and let  := Tr =k be the normalized trace. 1

2

Then the following conditions are equivalent:

(

P

2A

( )= ( ) ( ) ( )

Q 2 A2 are minimal projections, then  P Q  P  Q. (ii) The subalgebras A1 and A2 are quasi-orthogonal in Mn C , that is the subspaces A1 CI and A2 CI are orthogonal. (iii)  A1 A2  A1  A2 if A1 2 A1 , A2 2 A2 . (iv) If E1 A ! A1 is the trace preserving conditional expectation, then E1 restricted to A2 is a linear functional (times I ). (i) If

1

:

and

)= ( ) ( )

This theorem was formulated in [16] and led to the concept of complementary subalgebras. Namely A and A are complementary if the conditions of the theorem hold. As we explained above complementary maximal Abelian subalgebras is a popular subject in the form of the corresponding bases. We note that complementary MASA's was studied also in von Neumann algebras [20] Let A and B be maximal Abelian subalgebras of the algebra Mn(C) of n  n matrices. Set c := sup fTr P Q : P 2 A; Q 2 B are minimal projectionsg (8) and for a density matrix ! let !A and !B be the reduced densities in A and in iB . The uncertainty relation conjectured by Krauss and proven by Maasen and Unk [6] is the inequality S (!A ) + S (!B )  2 log c : (9) 1

2

2

200

for the von Neumann entropies S (!A) and S (!B ). The lower bound is the largest if c is the smallest. Since n c  n, the smallest value of c is 1=n. This happens if and only if A and B are complementary. Similar inequality for non-commutative subalgebras is not known. Two orthonormal bases are connected by a unitary. It is quite obvious that two bases are mutually unbiased if and only if the absolute value of the elements of the transforming unitary p is the same, 1= n when n is the dimension. This implies that construction of mutually unbiased bases is strongly related (or equivalent) to the search for Hadamard matrices [23]. Let A and A be subalgebras of Mk (C) and assume that both subalgebras are isomorphic to Mm(C). Then k = mn and we can assume that A = CIn Mm(C). There exists a unitary W such that W A W = A . The next theorem characterizes W when A and A are complementary [10, 16]. (On the matrices the Hilbert-Schmidt inner product hA; Bi = Tr AB is considered.) P Theorem 3 Let Ei be an orthonormal basis in Mn (C) and let W = i Ei Wi 2 Mn (C)

Mm (C) be a unitary. The subalgebra W (CIn Mm (C))W  is complementary to CI Mm (C) if 2

2 2

1

2

2

1

1

2

1

2

mX jWk ihWk j n

and only if

k

(C). The condition in the Theorem cannot hold if m < n and in the case n = m the condition means that fWk : 1  k  n g is an orthonormal basis in Mm(C). Example 1 Consider the unitary W = Vn2 de ned in (2) as an n  n block-matrix with entries from Mn(C). Then the entries form an orthonormal basis in Mn(C) and Theorem 3 tells us that the Fourier transform can be used to construct a complementary pair. It is remarkable that the Fourier transform sends the standard basis into a complementary one but it can produce non-commutative complementary subalgebras as well.  A di erent method for the construction of complementary subalgebras is indicated in the next example. Example 2 Assume that p > 2 is prime. Let e ; e ; : : : ; ep be a basis and let X be the unitary operator permuting the basis vectors cyclically:  ei if 0  i  n 2; Xei = e if i = n 1: Let q := e =p and de ne another unitary by Zei = qiei. Their matrices are as follows. 2 3 2 3 1 0 0  0 0 0  0 1 66 1 0    0 0 77 66 0 q 0    0 77 6 7 6 0 77 : X = 6 0 1  0 0 7; Z = 6 0 0 q  64 .. .. . . . .. .. 75 64 .. .. .. . . . .. 75 . . . . . . . . 0 0  1 0 0 0 0    qp It is easy to check that ZX = qXZ or more generally the relation (X k1 Z `1 )(X k2 Z `2 ) = qk2`1 X k1 k2 Z `1 `2 : (10) is satis ed. The unitaries fX j Z k : 0  j; k  p 1g is the identity mapping on

Mm 2

0

1

1

+1

0

i2

2

1

+

201

+

are pairwise orthogonal. For 0  k ; ` ; k ; `  p 1 set  (k ; l ; k ; l ) = X k1 Z `1 X k2 Z `2 : From (10) we can compute  (u) (u0 ) = q uu  (u0 ) (u); (11) where u  u0 = k ` 0 k 0 ` + k ` 0 k 0 ` (mod p) for u = (k ; ` ; k ; ` ) and u0 = (k0 ; `0 ; k0 ; `0 ). Hence (u) and (u0) commute if and only if u  u0 equals zero. We want to de ne a homomorphism  : Mp(C) ! Mp(C) Mp(C) such that (X ) =  (k ; l ; k ; l ) and (Z uu ) =  (u0 ) when u  u0 6= 0. Since the commutation relation (11) is the same as that for0 X and Z uu ,  can be extended to an embedding of Mp(C) into Mp(C) Mp(C). Let A(u; u )  Mp(C) Mp(C) be the range. This is a method to construct subalgebras. For example, if  (u) = X X and  (u0 ) = Z Z; then the generated subalgebra A(u; u0) is obviously complementary to CI Mp(C) and Mp (C) CI . (At this point we used the condition p > 2, since this implies that X and Z do not commute.)  The idea of the above example is used by Ohno to construct p +1 complementary subalgebras in Mp(C) Mp(C) if p > 2 is prime [9]. The case p = 2 is very di erent. It was proved by that M (C) M (C) does not contain 5 complementary subalgebras isomorphic to M (C) [17]. Let A and B be subalgebras of M  Mn(C). For a state on M the conditional entropy of the algebras A and B is de ned as 1

1

2

2

1

1

2

2

0

1 1

1

1

2

2

1

1 1

1

2

2 2

2 2

2

0

1

1

2

2

0

2

2

2

2

H

(AjB) := sup

nX 

i S

i

( i jA k jA )

S

( i jB k jB )

o

(12)

wherePthe supremum is taken over all possible decomposition of into a convex combination = i i i of states and S (  jj  ) stands for the relative entropy of states. This concept was introduced by Connes and Strmer in 1975 [5] and was called relative entropy of subalgebras. Since in the case of commutative algebras, the quantity becomes the usual conditional entropy, see Chap. 10 in [8], we are convinced that conditional entropy is the proper terminology. Theorem 4 Let A and B be subalgebras of Mn (C). Assume that A is Abelian subalgebra and its minimal projections have the same trace. Then the subalgebras A and B are complementary if and only if H (AjB ) is maximal. This result was obtained in [18] and it turns out that in the general case the conditional entropy of subalgebras cannot characterize complementarity.

202

3. Two qubits

From the point of view of complementarity the algebra M : M (C)  M (C) M (C) is an interesting and important particular case. An F-subalgebra is a subalgebra isomorphic to M (C). \F" is the abbreviation of "factor", the center of such a subalgebra is minimal, CI . If our 4-level quantum system is regarded as two qubits, then an F-subalgebra may correspond to one of the qubits. When the F-subalgebra A describes a \one-qubit-subsystem", then the relative commutant A0 := fB 2 M : BA = AB for every A 2 Ag corresponds to the other qubit. If A is an F-subalgebra of M, then we may assume that M = A A0. An M-subalgebra is a maximal Abelian subalgebra, equivalently, it is isomorphic to C . (M is an abbreviation of \MASA", the center is maximal, it is the whole subalgebra.) An M-subalgebra is in relation to a von Neumann measurement, its minimal projections give a partition of unity. Both the F-subalgebras and the M-subalgebras are 4 dimensional. We de ne a P-unitary as a self-adjoint traceless unitary operator. The eigenvalues of a P-unitary from M are 1; 1; 1; 1. An F-triplet (S ; S ; S ) consists of P-unitaries such that S = iS S . An M-triplet (S ; S ; S ) consists of P-unitaries such that S = S S . One can see that if (S ; S ; S ) is an X-triplet, then the linear span of I; S ; S ; S is an X-subalgebra, X=F, M. Example 3 Example 1 in the n = 2 case gives two F-subalgebras determined by the following two triplets: 4

2

2

2

0

4

1

2

3

3

3

1

2

1

1

2

2

1

1

2

2

3

3

3

0 1 ;

and

0 2 ;

0 3

(     +   ); (     +   +   );   : The linear combinations of the triplets and the identity are complementary subalgebras.  Example 4 In the Hilbert space C = C C the standard product basis is j00i; j01i, j10i; j11i. The Bell basis (5) consists of maximal entangled vectors and so it is complementary to the standard product basis. The Bell basis has important applications, for example, the teleportation of a state of a qubit. The operators diagonal in the Bell basis form an M-subalgebra which is generated by the M-triplet (  ;   ;   ): (13) We call this standard Bell triplet. It was proved in [18] that the Bell basis can be characterized abstractly in the language of complementarity. If A is an F-subalgebra of M (C), then an M-subalgebra complementary to both A and to its commutant is given by the Bell basis (up to a unitary transformation).  It was a natural question if M (C) contains 5 complementary F-subalgebras. The answer appeared in [17]. Assume that A ; A ; : : : ; Ar are complementary F-subalgebras. It was proved that A ; : : : ; Ar intersects the commutator of A , therefore r  3. The algebra M (C) is not only two qubits but also two fermions: Example 5 Let A be the algebra generated by the operators a ; a ; a ; a satisfying the canonical anticommutation relations: fa ; ag = fa ; ag = I; fa ; a g = fa ; a g = fa ; ag = fa ; a g = 0; where fA; Bg := AB + BA. Let A be the subalgebra generated a and A be the subalgebra generated a . Then A and A are complementary. In the usual matrix representation  0 1   1 0  and a =  1 0   0 1  ; a = 0 0 0 1 0 1 0 0 1 2

2

0

2

3

3

1 2

0

2

4

2

1

0

2

3

3

0

3

3

2

1

2

2

3

3

4

4

0

1

1

0

4

1

1

2

1

1

2

1

1

2

1

2

1

1

2

2

1

2

2

1

203

1

2

2

2

2

1

0

therefore

9

82 82 39 a 0 b 0 > > > > > > ; A2 = >64 0 0 > > ; : 0 0 : 0 c 0 d >

0 0 3> > 0 0 77= :

a c

b d

5> > ;

The subalgebra A is generated by the F-triplet (  ;   ;   ); and A is spanned by the F-triplet (   ;   ;   ): Observe that the standard Bell triplet (13) is complementary to both A and A . The parity automorphism is de ned by (a ) = a and (a ) = a . It is induced by the unitary   : (x) = (  )x(  ) The operators i j , 0  i; j  3 are eigenvectors of the parity automorphism. The xed point algebra is linearly spanned by 1

1

0

2

0

3

0

3

1

3

2

0

3

2

1

1

3

1

2

2

2

3

3

and

3

3

3

0 0 ;

1 1 ;

2 2 ;

3 3

0 3 ;

1 2 ;

2 1 ;

3 0 :

The rst group linearly spans the M-subalgebra corresponding to the Bell basis. It follows that all Bell states are even, that is the parity automorphism  leaves them invariant.  4. Complementary decompositions

In this section rst we consider decompositions of M  M (C) [18]. Decomposition of M into pairwise complementary F- and M-subalgebras is known. It is really well-known that decomposition into 5 M-subalgebras is possible. (Recall that this fact is equivalent to the existence of 5 mutually unbiased bases in a 4 dimensional space.) Theorem 5 Let Ak (0  k  4) be pairwise complementary subalgebras of M such that all of them is an F-subalgebra or M-subalgebra. If ` is the number of F-subalgebras in the set fAk : 0  k  4g, then ` 2 f0; 2; 4g, and all those values are actually possible. There is an interesting result about for pairwise complementary F-subalgebras. If A ; : : : ; A are such subalgebras, then then the linear span of the orthogonal complement and identity is always an M-subalgebra which is actually generated by the Bell triplet [10]. If we consider n-fold tensor product M := M (C) : : : M (C), then there are some open questions for n > 2. What is the maximum number of pairwise complementary subalgebras of M which are isomorphic to M (C)? If we calculate the dimensions, then (4n 1)=3 is nan upper bound. In the paper [10] a conjecture is formulated: The maximum number is (4 1)=3 1. So many subalgebras are actually constructed. If the conjecture is true, then one can ask the orthogonal complement of so many subalgebras: Is it a commutative subalgebra (if the identity is added)? The case of n qubit is rather special. Consider now the n-fold tensor product M := Mp(C) : : : Mp(C) and ask the maximum number of pairwise complementary subalgebras of M which are isomorphic to Mp(C). The upper bound is pn 1 p 1: If p > 2 is a prime number, then this upper bound is accessible [9]. 4

1

2

2

2

2

2

204

4

5. Discussion

The motivation for complementary subalgebras was a certain kind of state tomography for two qubits [14] and a systematic study started in [16]. Maximal Abelian subalgebras correspond to orthogonal bases in the Hilbert space and the complementarity of two maximal Abelian subalgebras is the same as the mutually unbiased property of the corresponding two bases. Mutually unbiased bases have a huge literature and nice applications. Much less is known about complementary non-commutative subalgebras. An Abelian subalgebra (corresponding to a measurement) may give classical information about a quantum system and a non-commutative subalgebra provides quantum information about the total system. This is a very essential di erence, to handle quantum information is more sophisticated. Parts of the information coming from several reduced state of a quantum system may be redundant. Intuitively, two subsystems are complementary if the knowledge of their reduced densities is the most informative; i.e. as little redundant as possible. The construction of complementary subalgebras needs much research. For a 4-level quantum system (describing two qubits) a complete description is given in the paper. There is no noncommutative subalgebra complementary to both qubits and there is essentially one maximal Abelian subalgebra complementary to both qubits, this subalgebra is in strong relation with the Bell basis. The di erence between M (C) M (C) and Mn(C) Mn(C) is essential. The dimensional upper bound for the number of complementary subalgebras (isomorphic to Mn(C)) is n + 1. This bound is not reached for n = 2 [17] but it is reached if n > 2 is a prime [9]. Several open questions can be formulated. It is interesting that the present methods for the construction of mutually unbiased bases and complementary subalgebras are similar, typically based on nite elds. However, the relation of the two subjects is not clear. 2

2

2

References

[1] Accardi L 1984 Some trends and problems in quantum probability, in Quantum probability and applications to the quantum theory of irreversible processes, eds. L. Accardi, A. Frigerio and V. Gorini, Lecture Notes in Math. 1055, 1{19. Springer. [2] Bruss D 1998 Optimal eavesdropping in quantum cryptography with six states Physical Review Letters 81, 3018{3021 [3] Busch P and Lahti P J 1995 The complementarity of quantum observables: theory and experiment Riv. Nuovo Cimento 18 27 [4] Cassinelli G and Varadarajan V S 2002 On Accardi's notion of complementary observables In n. Dimens. Anal. Quantum Probab. Relat. Top. 5 135{144. [5] Connes A and Strmer E 1975 Entropy of II1 von Neumann algebras, Acta Math. 134 289-3006. [6] Maasen H and Unk I 1988 Generalized entropic uncertainty relations Phys. Rev. Lett. 60 1103{1106. [7] von Neumann J 1932 Mathematische Grundlagen der Quantenmechanik (Berlin: Springer) [8] Neshveyev S and Strmer E 2006 Dynamical entropy in operator algebras (Berlin: Springer) [9] Ohno H 2008 Quasi-orthogonal subalgebras of matrix algebras, preprint, arXiv:0801.1353. [10] Ohno H, Petz D and Szanto A 2007 Quasi-orthogonal subalgebras of 4  4 matrices Linear Alg. Appl. 425 109{118. [11] Ohya M and Petz D 1993 Quantum Entropy and Its Use (Heidelberg: Springer) [12] Oppenheim J, Horodecki K, Horodecki M, Horodecki P and Horodecki R 2003 A new type of complementarity between quantum and classical information, Phys. Rev. A 68 022307. [13] Pauli W 1980 General Principles of Quantum Mechanics (Berlin: Springer) (original German edition: 1933). [14] Petz D, Hangos K M, Szanto A and Szoll}osi F 2006 State tomography for two qubits using reduced densities J. Phys. A 39 10901{10907. [15] Petz D, Hangos K M and Magyar A 2007 Point estimation of states of nite quantum systems J. Phys. A. 40, 7955{7969. [16] Petz D 2007 Complementarity in quantum systems Rep. Math. Phys. 59 209{224. [17] Petz D and Kahn J 2007 Complementary reductions for two qubits J. Math. Phys. 48, 012107. [18] Petz D, Szanto A and Weiner M 2008, Complementarity and the algebraic structure of 4-level quantum systems, to be published

205

[19] Pittenger A O and Rubin M H 2004 Mutually unbiased bases, generalized spin matrices and separability Linear Algebra Appl. 390, 255{278. [20] Popa S 1983 Orthogonal pairs of -subalgebras in nite von Neumann algebras J. Operator Theory 9, 253{268. [21] Redei M 1998 Quantum Logic in Algebraic Approach (Dordrecht: Kluwer) [22] Schwinger J 1060 Unitary operator bases Proc. Nat. Acad. Sci. U.S.A. 46, 570{579. [23] Tadej W and Zyczkowski K 2006 A concise guide to complex Hadamard matrices Open Syst. Inf. Dyn. 13 133{177. [24] Wootters W K and Fields B D 1989 Optimal state determination by mutually unbiased measurements Ann. Physics 191, 363{381. [25] Weyl H 1931 Theory of groups and quantum mechanics (Methuen)

206

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Quantum spin glasses at finite connectivity: cavity method and quantum satisfiability C. Laumann1 , R. Moessner2 , A. Scardicchio1,† , S. Sondhi1 1 2

Princeton Center for Theoretical Physics and Physics Department, Princeton University, USA Max Planck Institute, Dresden, Germany

E-mail: † [email protected] Abstract. We present preliminary results in the study of quantum spin glasses with finite connectivity. We propose the generalization of the cavity method used in studies of classical spin glasses to quantum systems and use it to study a Bethe lattice spin glass with transverse field. We also outline our recent results on a quantum generalizatin of random K-satisfiability.

207

208

Proceedings of the International Workshop on Statistical-Mechanical Informatics September 14–17, 2008, Sendai, Japan

Quantum mean-field decoding algorithm for error-correcting codes Jun-ichi Inoue1 , Yohei Saika2 and Masato Okada3 1

Graduate School of Information Science and Technology, Hokkaido University, N14-W9, Kita-ku, Sapporo 060-0814, Japan 2 Department of Electrical and Computer Engineering, Wakayama National College of Technology, Nada-cho, Noshima 77, Gobo-shi, Wakayama 644-0023, Japan 3 Division of Transdisciplinary Science, Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8561, Japan E-mail: 1 j [email protected] 2 [email protected] [email protected]

3

Abstract. We numerically examine a quantum version of TAP (Thouless-Anderson-Palmer)like mean-field algorithm for the problem of error-correcting codes. For a class of the so-called Sourlas error-correcting codes, we check the usefulness to retrieve the original bit-sequence (message) with a finite length. The decoding dynamics is derived explicitly and we evaluate the average-case performance through the bit-error rate (BER). We find that the BER at the equilibrium state is very close to the value predicted by the replica symmetric theory when one controls the quantum-mechanical tunneling field appropriately.

1. Introduction Statistical mechanics of information has been applied to a lot of problems in various research fields of information science and technology [1, 2]. Among them, error-correcting code is one of the most developed subjects. In the research field of error-correcting codes, Sourlas showed that the convolutional codes can be constructed by spin glass with infinite range p-body interactions and the decoded message should be corresponded to the ground state of the Hamiltonian [3]. Ruj´an suggested that the bit error can be suppressed if one uses finite temperature equilibrium states as the decoding result, instead of the ground state [4], and the so-called Bayes-optimal decoding at some specific condition was proved by Nishimori [5] and Nishimori and Wong [6]. Kabashima and Saad succeeded in constructing more practical codes, namely, low density parity check (LDPC) codes by using the infinite range spin glass model with finite connectivities [7]. They used the so-called TAP (Thouless-Anderson-Palmer) equations to decode the original message for a given parity check. As we shall see later on, an essential key point to obtain the Bayes-optimal solution is controlling the ‘thermal fluctuation’ in order to satisfy the condition on the Nishimori line. Then, a simple question is arisen, namely, is it possible to obtain the Bayes-optimal solution by means of the ‘quantum fluctuation’ induced by tunneling effects? or what is condition for the optimal control of the fluctuation? To answer these questions, Tanaka and Horiguchi introduced a quantum fluctuation into the mean-field annealing algorithm and showed that performance of image restoration is improved

209

by controlling the quantum fluctuation appropriately during its annealing process [8, 9]. The average-case performance is evaluated analytically by one of the present authors [10]. However, there are few studies concerning such a quantum mean-field algorithm for information processing described by spin glasses. In this talk, we examine a quantum version of TAP-like mean-field algorithm for the problem of error-correcting codes. For a class of the so-called Sourlas error-correcting codes, we check the usefulness to retrieve the original message with a finite length. The decoding dynamics is derived explicitly and we evaluate the average-case performance numerically through the bit-error rate. We find that the bit-error rate at the equilibrium state is very close to the value predicted by the replica symmetric theory when one controls the tunneling field appropriately. This paper is organized as follows. In the next section, we explain our model system and comment on the Shannon’s bound. In section 3, the Bayesian approach to the problem is introduced. Then, quantum Sourlas codes and the preliminary analysis for the case of p → ∞ (p is the number of bit products in the parity check) are reported. In the next section 4, we show the bit-error rate performance at the zero temperature for finite p. In section 5, we construct the TAP-like mean-field decoding algorithm for the Sourlas codes with finite p and examine the average-case performance. The last section is a concluding remark. 2. The model system and the Shannon’s bound In this section, we introduce our model system of error-correcting codes and mention the Shannon’s bound. In our error-correcting codes, in order to transmit the original message {ξ} ≡ (ξ1 , · · · , ξN ), ξi ∈ {−1, 1} through some noisy channel, we send all possible combinations N Cp 0 of the products of p-components in the N -dimensional vector {ξ} such as Ji1,··· ,ip = ξi1 ξi2 · · · ξip as ‘parity’. Therefore, the rate of the transmission is now evaluated as p! N  p−1 (1) R = C N N p in the limit of N → ∞ keeping the p finite. On the other hand, when we assume the additive white Gaussian noise (AWGN) channel  p−1 0 )Ji1i2···ip and variance {J p!/2N p−1 }2 , that is, when the output of the with mean (J0 p!/N channel Ji1i2···ip is given by    J0 p! p! 0 +J η, η = N (0, 1), (2) Ji1i2···ip Ji1i2···ip = p−1 N 2N p−1 the channel capacity C leads to C =

  {(J0 p!/N p−1 )Ji1···ip }2 J02 p! 1 log2 1 +  2 J 2 p!/2N p−1 J 2 N p−1 log 2

(3)

in the same limit as in the derivation (1) (we also used the fact (Ji1···ip )2 = 1). The factors p−1 or p!/2N p−1 appearing in (2) are needed to take a proper thermodynamic limit (to p!/N make the energy of order 1 object) as will be explained in the next section. Then, the channel coding theorem tells us that zero-error transmission is achieved if the condition R ≤ C is satisfied. For the above case, we have R/C = (J/J0 )2 log 2 ≤ 1, that is,  J0 ≥ log 2. (4) J The above inequality means that if the signal-to-noise ratio J0 /J is greater than or equal to √ log 2, the error probability of decoding behaves as Pe  2−N (C−R) → 0 in the thermodynamic limit N → ∞. In this sense, we might say that the zero-error transmission is achieved asymptotically in the limit N → ∞, C, R → 0 keeping R/C = O(1) ≤ 1 for the above what we call Sourlas codes.

210

3. The Bayesian approach For the error-correcting codes mentioned in the previous section, Sourlas pointed out that there exists a close relationship between the error-correcting codes and an Ising spin glass model with infinite range p-body interactions [3]. In this section, we briefly show the relationship for the classical system and then we shall extend the system to the quantum version. 3.1. Classical system To decode the original message {ξ}, we construct the posterior distribution:   2

a0 p! N p−1  exp − 2a2 p! i1,··· ,ip Ji1···ip − N p−1 σi1 · · · σip (5) P ({σ}|{J}) ∝ P ({J}|{σ})P ({σ}) = 2N (a2 πp!/N p−1 )1/2 where {σ} = (σ1 , · · · , σN ) denotes an estimate of the original message {ξ} and a and a0 are the so-called hyperparameters corresponding to the J0 and J, respectively. It should be noted that we assumed that the prior P ({σ}) is uniform such as P ({σ}) = 2−N . For the above posterior distribution, the MAP (maximum a posterior) estimate is obtained as the ground state of the following Hamiltonian: H({σ}|{J}) =

 2 a0 p! N p−1 Ji1···ip − p−1 σi1 · · · σip 2a2 p! N

(6)

i1,··· ,ip

It is obvious that the system {σ} described by the above Hamiltonian is an Ising spin glass with infinite range p-body interactions. Therefore, the decoding is achieved by finding the ground state of (6) via, for instance, simulated annealing. In the context of the MPM (maximizar of the posterior marginal) estimate instead of the MAP, the Bayes-optimal solution is obtained for each bit as a simple majority vote:

ξ i = P (σi = +1|{J}) − P (σi = −1|{J}) = sgn σi P (σi |{J}) ≡ sgn( σi ), (7) σi =±1

where P (σi |{J}) is a posterior marginal calculated as P (σi |{J}) = tr{σ}=σi P ({σ}|{J}).

(8)

It might be convenient for physicists to rewrite the above estimate ξ i in terms of the local magnetization of the system described by the Hamiltonian (6) as  ξ i = sgn

tr{σ} σi exp[−H({σ}|{J})] tr{σ} exp[−H({σ}|{J})]

 .

(9)

In the classical system specified by a given finite temperature T = 1, the Bayes-optimal solution ξ i = sgn( σi ) minimizes the following BER:

1 1 1 1− ξi ξ i = (1 − [ξξ]{ξ},{J} ) (10) pB = 2 N 2 i

[· · · ]{ξ},{J} ≡ tr{ξ} tr{J} (· · · )P ({J}|{ξ})P ({ξ}) on the Nishimori line a0 /a2 = J0 /J 2 [6].

211

(11)

3.2. Quantum system Apparently, essential key point to obtain the Bayes-optimal solution is controlling the ‘thermal fluctuation’ in order to satisfy the condition on the Nishimori line T = 1, a0 /a2 = J0 /J 2 . Then, a simple question is arisen, namely, is it possible to obtain the Bayes-optimal solution by means of the ‘quantum fluctuation’ induced by tunneling effects? or what is condition for the optimal control of the fluctuation? However, in the corresponding quantum system, the condition is not yet clarified. In our preliminary study [11], we considered the quantum version of the posterior by modifying the Hamiltonian as  2 a0 p! z N p−1 z − σ ˆ · · · σ ˆ − γ σ ˆix J i1···ip i1 ip a2 p! N p−1

ˆ H({σ}|{J}) =

i1,··· ,ip

σ ˆiz,x

i

σz,x (i)

≡ I (1) ⊗ · · · ⊗ ⊗ · · · ⊗ I (N)       1 0 1 0 0 1 z x I = ,σ = ,σ = = |+ −| + |− +| 0 1 0 −1 1 0

where the subscript such as {· · · }(i) of each matrix denotes the order in the tensor product. Then, a single bit flip: |+ ≡ t (1, 0) → |− ≡ t (0, 1) or |− → |+ is caused due to the existence 1.6

P

TJ / J

1.2

0.8

0.4

0

SG 0

F

0.3

0.6

J0 / J

0.9

1.2

Figure 1. Phase diagram of the Sourlas codes for p → ∞. In the shaded area (F), zero-error transmission is achieved. The area P denotes the para-magnetic phase and the√area SG is the spin glass phase. For instance, at the ground state, the critical signal-to-noise ratio is (J0 /J)c = log 2 = 0.8326. We set TJ ≡ βJ−1 .

ˆ As the result, the Bayes-optimal solution of the second term in the Hamiltonian H. σiz ρˆ)] ξˆi = sgn[tr(ˆ ˆ

ˆ

(12)

with the density matrix ρˆ ≡ e−H({σ}|{J}) /tr e−H({σ}|{J}) could be constructed by the quantum fluctuation (which is controlled by the amplitude γ) even at zero temperature. With the assistance of the replica method combining the static approximation in the Suzuki-Trottter formula, the phase diagram for the case of p → ∞ is easily obtained within one step replica symmetry breaking scheme as shown in Figure 1. At the ground state, the √ FerromagnetSpinGlass transition takes place at the critical signal-noise ratio (J0 /J)c = log 2  0.8326.

212

As the result, we find that R ≤ C, namely, zero-error transmission pB = 0 is achieved beyond the (J0 /J)c . It should be noted that the critical behavior is independent of the amplitude γ. However, for finite p, the minimum BER state is dependent on the γ and we should control it when we construct the algorithm based on the TAP-like mean-field approximation. It is our main issue in this article. 4. Replica analysis for finite p at zero temperature Before we provide such a decoding algorithm, we show the performance of the MPM estimate at zero temperature for finite p case. By using the Suzuki-Trotter decomposition, the replicated partition function is given by ⎡ ⎤ n M M β J α α Ji1···ip σi1 (t)· · ·σip (t) + B σi (t)σi (t + 1)⎦ (13) Z n = tr{σ} exp ⎣ M α=1 t=1 t=1 i1,···,ip

i

with B ≡ (1/2) log coth(γ/M ) and βJ ≡ a0 /a2 . Using the replica symmetric and the static approximations, we have the average  ∞  ∞  ∞   ∞   dQαβ (t, t ) dλαβ (t, t ) dmα (t) dm ˆ α (t) exp [−N fRS ] (14) [Z n ]{ξ},{J} = tt ,αβ

−∞

−∞

−∞

−∞

in terms of the following order parameters. 1 α σi (t) = m, m ˆ α (t) = m ˆ mα (t) = N i   1 α   χ (α = β) λ1 (α = β) β  σi (t)σi (t ) = , λαβ (t, t ) = Qαβ (t, t ) =  β) q (α = β) λ2 (α = N

(15) (16)

i





where m ˆ α (t) and λαβ (t, t ) are the conjugate order parameters for the mα (t) and the Qαβ (t, t ), respectively. Those are defined by 

  ∞  ∞ 1 α dmα (t) dm ˆ α (t) exp im ˆ α (t) mα (t) − σi (t) = 1 (17) N −∞ −∞ i 

  ∞  ∞ 1 α    β  dQαβ (t, t ) dλαβ (t) exp iλαβ (t, t ) Qαβ (t, t ) − σi (t)σi (t ) = 1. (18) N −∞ −∞ i

Then, we obtain the free energy density f RS explicitly in terms of the above order parameters as  ∞  ∞ (p − 1) βJ J 2 (χp − q p ) − βJ−1 Dw log Dz 2 cosh Ξ (19) f RS (m, χ, q) = (p − 1)J0 mp + 4 −∞ −∞ ˆ = pβJ J0 mp−1 where we used the saddle point equations with respect to m, ˆ λ1 , λ2 , namely, m 2 p−1 2 p−1 and λ1 = p(βJ J) χ /2, λ2 = p(βJ J) q /2. Then, the saddle point equations that determine the equilibrium state are derived as follows.  

  ∞   ∞  ∞  ∞ Φ sinh Ξ Φ sinh Ξ 2 , q= Dω Dz Dω Dz (20) m = ΞΩ ΞΩ −∞ −∞ −∞ −∞       ∞ Φ 2 Dω ∞ 2 sinh Ξ Dz cosh Ξ + γ (21) χ = Ξ Ξ3 −∞ Ω −∞

213

  where ≡ ω p(βJ J)2 q p−1 /w + z p(βJ J)2 (χp−1 − q p−1 )/2 + pβJ J0 mp−1 and  we defined Φ ∞ √ 2 Ξ ≡ Φ2 + γ 2 , Ω ≡ −∞ Dz cosh Ξ with Dz ≡ (dz/ 2π) e−z /2 . For the solution of the saddle point equations, the BER leads to  ∞ Dw H(−zp ) (22) PB = −∞

  where we defined zp ≡ −(pβJ J0 mp−1 + w  p(βJ J)2 q p−1 /2)/ p(βJ J)2 (χp−1 − q p−1 )/2. The ∞ error function H(x) is defined as H(x) = x Dz. We find that the above pB depends on γ through the order parameters χ, q and m. At finite temperature, the phase diagrams obtained by solving the above saddle point equations numerically were reported in our previous article [11]. However, our interest here is rather zero temperature properties. In order to investigate ‘pure’ quantum effects on the decoding performance of the Sourlas codes for finite number of the bit-product p, we here derive the saddle point equations for quantum Sourlas codes at zero temperature, namely, TJ ≡ βJ−1 → 0 in the above replica symmetric saddle point equations. To do this, we take the limit βJ → ∞ keeping Γ = γ/βJ finite and find the relevant solution so as to satisfy χ − q → 0 and βJ (χ − q) = t = O(1). Then, we have immediately as 



m = −∞

φ Dw  , q= φ2 + Γ2



∞ −∞

φ2 Dw , t = Γ2 φ2 + Γ2





−∞

Dw + Γ2 )3/2

(φ2

(23)

  with φ = wJ pq p−1 /2 + φ2 J 2 p(p − 1)qp−2 t/2 φ2 + Γ2 + pJ0 mp−1 . Forthe solution for the above saddle point equations (23), the BER leads to pB = H(pJ0 mp−1 /J pq p−1 /2). We show the results for p = 2 and 3 cases in Figure 2. The left panel shows the behavior of magnetization 1

1

m (p = 2) m (p = 3) q (p = 2) q (p = 3)

0.8

PB (p = 2) PB (p = 3)

0.1 0.6

0.0305

0.4

PB (p = 2)

0.03 0.0295

0.01

0.029 0.0285

0.2

0.028 0.0275

Γ

0

0

0

1

Γ

2

3

0.001

0

1

2

Γ

1

3

Figure 2. The behavior of order parameters m, q (left) and the BER (right). The inset of the right panel shows the behavior around the optimal amplitude of the transverse field. We set J0 = J = 1. and spin glass order parameter for p = 2 and 3, that is, the number of spin products in the parity is 2 and 3. We find that at the critical point, the second order phase transition takes place for p = 2, whereas the first order phase transition occurs for p = 3. The right panel is showing the BER as a function of the amplitude Γ. Thus, we find that there exists a close relationship between the bit-error performance and quantum phase transitions [12, 13]. We also find that there exists an optimal amplitude Γ and the BER is minimized at the value.

214

5. TAP-like mean-field decoding In the previous section, we evaluated the performance of the Bayes-optimal decoding in the Sourlas codes with finite p. Within the replica symmetric theory, we found that there exists the optimal value of the Γ. In practice, for a given set of the parity {J}, we should calculate ˆ ˆ σiz ρˆβ )], ρˆβ = e−β H /tr e−β H for each bit. To calculate the trace the estimate ξˆi = limβ→∞ sgn[tr(ˆ effectively by sampling, the quantum Monte Carlo method (QMCM) might be applicable and useful [14]. However, unfortunately, the QMCM approach encounters several crucial difficulties. First, it takes quite long time for us to simulate the quantum states for large number of the Trotter slice. Second, in general, it is technically quite hard to simulate the quantum states at zero temperature. Thus, we are now stuck for the computational cost problem. Nevertheless, as an alternative to decode the original message practically, we here examine a TAP (Thouless-Anderson-Palmer)-like mean-field algorithm which has a lot of the variants applying to various information processing [15, 16]. In this talk, we shall provide a simple attempt to apply the mean-field equations to the Sourlas error-correcting codes for the case of p = 2. In following, the derivation of the equations is briefly explained. We shall start the Hamiltonian:   2J0 J ˆ = − Jij0 + √ η, Jij0 = ξi ξj , η = N (0, 1) (24) Jij σ ˆiz σ ˆjz − Γ σ ˆix , Jij = H N N ij i Then, we rewrite the Hamiltonian as follows. ˆ (0) + Vˆ ˆ =− (Γˆ σix + hi σ ˆiz ) + Jij (mi Iˆi )(mj Iˆj ) − Jij (ˆ σiz − mi Iˆi )(ˆ σjz − mj Iˆj ) ≡ H H i

ij

ij

(25) ˆ (0) ≡ − H



(Γˆ σix + hi σ ˆiz ) +

i



≡ −





Jij (mi Iˆi )(mj Iˆj )

ij

Jij (ˆ σiz

− mi Iˆi )(ˆ σjz − mj Iˆj ), hi ≡ 2

ij

(26)

Jij mj

(27)

j

where we defined the 2N × 2N identity matrix Iˆi , which is formally defined by Iˆi ≡ I (1) ⊗ · · · ⊗ I (i) ⊗ · · · ⊗ I (N) . mi is the local magnetization for the system described by the mean-field ˆ (0) , that is, Hamiltonian H (0)

(0)

σiz ρˆβ ), ρˆβ ≡ mi ≡ mzi = lim tr(ˆ β→∞

ˆ (0) ) exp(−β H . ˆ (0) ) tr exp(−β H

(28)

Shortly, we derive closed equations to determine mi . It is very tough problem for us to diagonalize ˆ whereas it is rather easy to diagonalize the mean-field Hamiltonian H ˆ (0) . the 2N × 2N matrix H, Actually, we immediately obtain the ground state internal energy as  1 Ei + hi mi , Ei ≡ Γ2 + h2i . (29) E (0) = − 2 i

i

(0) with respect to m and setting it to zero, namely, Then, taking the derivative of the i  E  ∂E (0) /∂mi = k (∂hk /∂mi ){hk / Γ2 + h2k − mk } = 0, we have

(∀i )

mi =



hi Γ2 + h2i

215

, hi = 2

j

Jij mj .

(30)

The above equations are nothing but the so-called naive mean-field equations for the Ising spin glass (the Sherrington-Kirkpatrick model [17]) in a transverse field. It should be noted that the equations are reduced to (∀i ) mi = hi /|hi | = sgn(hi ) = limβ→∞ tanh(βhi ) which is naive mean-field equations at the ground state for the corresponding classical system. To improve the approximation, according to [18, 19], we introduce the reaction term Ri for each pixel i and rewrite the local field hi such as 2 j Jij mj − Ri . Then, the naive mean-field equations (30) are rewritten as    

2 j Jij mj − Ri hi Γ2 Ri  2 1− 2 . (31) (∀i ) mi =   2 2 3/2 Γ + hi hi (Γ + hi ) Γ2 + (2 J m − R )2 j

ij

j

i

In the last line of the above equation, we expanded the equation with respect to Ri up to the ˆ by using the eigenvector first order. We next evaluate the expectation of the Hamiltonian H   (0) x ˆ that diagonalizes the mean-field Hamiltonian H = − i (Γˆ σi + hi σ ˆiz ) + ij Jij (mi Iˆi )(mi Iˆj ). We obtain

Jij2 (0) 4 (32) Eg = E − Γ 2 E 2 (E + E ) . 2E i j i j ij Then, (∂Eg /∂mi ) = 0 gives ⎡ ⎤ 3 1   2 2 Jij mi [2(1 − m2i )(1 − m2j ) 2 + 3(1 − m2i ) 2 (1 − m2j )2 ] hi 1 Γ ⎣1 − ⎦. mi = 2 1 1 Γ2 + h2i hi (Γ + h2i )3/2 2Γ[(1 − m2 ) 2 + (1 − m2 ) 2 ]2 j i

j

(33) By comparing (31) and (33), we might choose the reaction term Ri for each bit i consistently as Jij2 mi [2(1 − m2i )(1 − m2j ) 2 + 3(1 − m2i ) 2 (1 − m2j )2 ] 3

Ri =

1

1

j

1

2Γ[(1 − m2i ) 2 + (1 − m2j ) 2 ]2

.

Therefore, we now have a decoding dynamics described by  2 j Jij mj (t) − Ri (t) mi (t + 1) =   Γ2 + {2 j Jij mj (t) − Ri (t)}2

(35)

Jij2 mi (t)[2(1 − mi (t)2 )(1 − mj (t)2 ) 2 + 3(1 − mi (t)2 ) 2 (1 − mj (t)2 )2 ] 3

Ri (t) =

1

j

(34)

1

1

2Γ[(1 − mi (t)2 ) 2 + (1 − mj (t)2 ) 2 ]2

(36)

for each bit i. Then, the MPM estimate is given as a function of time t as ξ i (t) = sgn[mi (t)] and the BER is evaluated at each time step through the following expression

1 1 1− ξi ξ i (t) . (37) pB (t) = 2 N i

We should notice that the mean-field equations are always retrieved by setting Ri = 0 for all i. We plot several results in Figure 3. In the left panel of this figure, we plot the dynamics

216

0.6

1

J0/J = 0.8 J0/J = 2 J0/J = 1

0.5

N = 400,10-samples

0.1 0.4

PB 0.3

PB

0.01

0.2

0.1

0.001

0

-0.1

1e-04 0

10

20

t

30

40

0

0.3

0.6

0.9

J0 /J

1.2

1.5

1.8

Figure 3. The dynamics of the TAP-like mean-field decoding (left, N = 1000, the error-bars are evaluated by 10-samples). We set p = 2, Γ0 = 0.5 and J/J0 = 0.8, 2 and 1. The right panel shows the signal-to-noise ratio dependence of the BER. We set p = 2 and Γ0 = 0.5.

of mean-field decoding. We plot them for several cases of the signal-to-noise ratio. During the decoding dynamics, we control the Γ by means of   c . (38) Γ(t) = Γ0 1 + t+1 In the Figure 3, we set Γ0 = 0.5. From this figure, we find that the BER converges to the close value predicted by the replica symmetric theory. In the right panel of Figure 3, we show the BER as a function of the signal-to-noise ratio. We find that beyond the ratio J0/J  1, the BER drops. Although the above results are still at preliminary level, however, from these limited results presented here, we might confirm that TAP-like mean-field approach examined here is useful to obtain the solution which is close to the Bayes-optimal performance for the decoding. 6. Concluding remark We examined a quantum version of TAP (Thouless-Anderson-Palmer)-like mean-field algorithm at zero temperature for the problem of error-correcting codes. Although the presented results are still at preliminary level and we should be careful to conclude, the algorithm seems to work well for our decoding problem. Of course, much more extended studies are needed. For instance, we have problems to be clarify such as the structure of basin (the initial condition dependence of the decoding dynamics), studies for the case of p ≥ 3, a comparison of the results with those obtained by the QMCM, the relationship between the convergence of the algorithm and the Almeida-Thouless instability which was investigated for the case of the LDPC (Low Density Parity Check) codes [20]. Some of these issues will be discussed at the workshop. Acknowledgments One of the authors (J.I.) acknowledges Nicola Sourlas for his helpful comments on this study when we met in Greece (International Conference in ΣΦ2008). We were financially supported by Grant-in-Aid Scientific Research on Priority Areas “Deepening and Expansion of Statistical Mechanical Informatics (DEX-SMI)” of The Ministry of Education, Culture, Sports, Science and Technology (MEXT) No. 18079001.

217

References [1] Nishimori H 2001 Statistical Physics of Spin Glasses and Information Processing: An Introduction (Oxford: Oxford University Press) [2] Mezard M, Parisi G and Virasoro M A 1987 Spin Glass Theory and Beyond (Singapore: World Scientific) [3] N. Sourlas, Nature 339, 693 (1989). [4] P. Ruj´ an, Phys. Rev. Lett. 70, 2968 (1993). [5] H. Nishimori, J. Phys. Soc. Japan 62, 2973 (1993) [6] H. Nishimori and K. Y. M. Wong, Phys. Rev. E 60, 132 (1999). [7] Y. Kabashima and D. Saad, Europhys. Lett. 45, 98 (1999). [8] K. Tanaka and T. Horiguchi, IEICE J80-A-12, 2217 (1997) (in Japanese). [9] K. Tanaka, J. Phys. A: Math. Gen. 35, R81 (2002). [10] J. Inoue, Phys. Rev. E 63, 046114 (2001). [11] J. Inoue, Quantum Spin Glasses, Quantum Annealing, and Probabilistic Information Processing, in Quantum Annealing and Related Optimization Methods, A. Das and B. K. Chakrabarti (Eds.), Lecture Notes in Physics 679 (2005). [12] B. K. Chakrabarti, A. Dutta and P. Sen, Quantum Ising Phases and Transitions in Transverse Ising Models, (Springer 1995). [13] S. Sachdev, Quantum Phase Transitions, Cambridge University Press (1999). [14] M. Suzuki, Prog. Theor. Phys. 58, 1151 (1977). [15] M. Opper and D. Daad (Eds.), Advanced Mean Field Methods: Theory and Practice, The MIT Press (2001). [16] M.I. Jordan (Eds.), Learning in Graphical Models, The MIT Press (1998). [17] D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett. 35, 1792 (1975). [18] H. Ishii and T. Yamamoto, J. Phys. C, 18, 6225 (1985). [19] T. Yamamoto, J. Phys. C 21, 4377 (1988). [20] Y. Kabashima, J. Phys. Soc. Japan 72, 1645 (2003).

218