Operator Inequality and its Application to Capacity of Gaussian Channel

Kenjiro YANAGI*, Han Wu CHEN† and Ji Wen YU‡

Key Words: Operator Inequality, Gaussian Channel, Capacity, Feedback
1991 Mathematical Classification: 94A
1 INTRODUCTION
The following model for the discrete time Gaussian channel with feedback is considered:

$$Y_n = S_n + Z_n, \quad n = 1, 2, \ldots,$$

where $Z = \{Z_n; n = 1, 2, \ldots\}$ is a non-degenerate, zero-mean Gaussian process representing the noise, and $S = \{S_n; n = 1, 2, \ldots\}$ and $Y = \{Y_n; n = 1, 2, \ldots\}$ are stochastic processes representing the input signals and the output signals, respectively. The channel has noiseless feedback, so $S_n$ is a function of the message to be transmitted and of the output signals $Y_1, \ldots, Y_{n-1}$. For a code of rate $R$ and length $n$, with codewords $x^n(W, Y^{n-1})$, $W \in \{1, \ldots, 2^{nR}\}$, and a decoding function $g_n : \mathbf{R}^n \to \{1, \ldots, 2^{nR}\}$, the probability of error is

$$P_e^{(n)} = \Pr\{g_n(Y^n) \neq W;\ Y^n = x^n(W, Y^{n-1}) + Z^n\},$$

where $W$ is uniformly distributed over $\{1, \ldots, 2^{nR}\}$ and independent of $Z^n$. The signal is subject to an expected power constraint

$$\frac{1}{n} \sum_{i=1}^{n} E[S_i^2] \le P,$$
* Department of Applied Science, Faculty of Engineering, Yamaguchi University, Ube 755-8611, Japan. e-mail: [email protected]
† Graduate School of Science and Engineering, Yamaguchi University, Ube 755-8611, Japan
‡ Graduate School of Science and Engineering, Yamaguchi University, Ube 755-8611, Japan
and the feedback is causal, i.e., $S_i$ depends on $Z_1, \ldots, Z_{i-1}$ for $i = 1, 2, \ldots, n$. Similarly, when there is no feedback, $S_i$ is independent of $Z^n$. It is well known that a finite block length capacity is given by

$$C_{n,FB,Z}(P) = \max \frac{1}{2n} \ln \frac{|R_X^{(n)} + R_Z^{(n)}|}{|R_Z^{(n)}|},$$

where the maximum is taken over $R_X^{(n)}$ symmetric and nonnegative definite and $B$ strictly lower triangular such that

$$\mathrm{Tr}[(I + B)R_X^{(n)}(I + B^t) + BR_Z^{(n)}B^t] \le nP.$$

Similarly, let $C_{n,Z}(P)$ be the maximal value when $B = 0$, i.e., when there is no feedback. Under these conditions, Cover and Pombra proved the following.

Proposition 1 (Cover and Pombra [6]) For every $\varepsilon > 0$ there exist codes with block length $n$ and $2^{n(C_{n,FB,Z}(P) - \varepsilon)}$ codewords, $n = 1, 2, \ldots$, such that $P_e^{(n)} \to 0$ as $n \to \infty$. Conversely, for every $\varepsilon > 0$ and any sequence of codes with $2^{n(C_{n,FB,Z}(P) + \varepsilon)}$ codewords and block length $n$, $P_e^{(n)}$ is bounded away from zero for all $n$. The same theorem holds in the special case without feedback upon replacing $C_{n,FB,Z}(P)$ by $C_{n,Z}(P)$.

When the block length $n$ is fixed, $C_{n,Z}(P)$ is given exactly.

Proposition 2 (Gallager [11])

$$C_{n,Z}(P) = \frac{1}{2n} \sum_{i=1}^{k} \ln \frac{nP + r_1 + \cdots + r_k}{k r_i},$$

where $0 < r_1 \le r_2 \le \cdots \le r_n$ are the eigenvalues of $R_Z^{(n)}$ and $k\,(\le n)$ is the largest integer satisfying $nP + r_1 + \cdots + r_k > k r_k$.

In this paper, we first show that the Gaussian feedback capacity $C_{n,FB,Z}(P)$ is a concave function of $P$ by using the operator concavity of $\log t$. We also show that $C_{n,FB,Z}(P)$ is a convexlike function of $Z$ by using the operator convexity of $\log(1 + t^{-1})$. Finally, we state an open problem about the convexity of $C_{n,FB,\cdot}(P)$. For the sake of simplicity, we write $R_S, R_Z, R_{S+Z}, \ldots$ instead of $R_S^{(n)}, R_Z^{(n)}, R_{S+Z}^{(n)}, \ldots$ from the next section.
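Proposition 2 is a water-filling formula that can be evaluated directly from the noise eigenvalues. The following is a minimal numerical sketch (Python with NumPy; the function name `gallager_capacity` is ours, not from the paper):

```python
import numpy as np

def gallager_capacity(eigvals, P):
    """C_{n,Z}(P) of Proposition 2 (no feedback) from the eigenvalues of R_Z."""
    r = np.sort(np.asarray(eigvals, dtype=float))
    n = len(r)
    # largest k with nP + r_1 + ... + r_k > k * r_k
    k = max(i + 1 for i in range(n) if n * P + r[: i + 1].sum() > (i + 1) * r[i])
    level = (n * P + r[:k].sum()) / k   # common "water level" (nP + r_1 + ... + r_k)/k
    return np.log(level / r[:k]).sum() / (2 * n)
```

For white noise, $r_i = \sigma^2$, this reduces to the familiar $\frac{1}{2}\ln(1 + P/\sigma^2)$; e.g. `gallager_capacity([1.0, 1.0], 1.0)` equals `0.5 * np.log(2.0)`.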
2 CONCAVITY OF $C_{n,FB,Z}(\cdot)$
Before proving the concavity of $C_{n,FB,Z}(P)$ as a function of $P$, we need some known results. We denote by $\mathrm{ran}\,A$ and $\ker A$ the range and the kernel of $A$, respectively.

Proposition 3 (Douglas [8]) Let $H$ be a real Hilbert space, let $B(H)$ be the set of all bounded linear operators on $H$, and let $A, B \in B(H)$. Then the following assertions are equivalent.
(i) $\mathrm{ran}\,A \subset \mathrm{ran}\,B$.
(ii) There exists $\alpha \ge 0$ such that $AA^* \le \alpha BB^*$.
(iii) There exists $C \in B(H)$ such that $A = BC$.
Furthermore, when condition (iii) holds, $C$ is uniquely determined by the following three conditions:
(1) $\|C\|^2 = \inf\{\alpha : AA^* \le \alpha BB^*\}$,
(2) $\ker A = \ker C$,
(3) $\mathrm{ran}\,C \subset (\ker B)^{\perp}$.

Proposition 4 (Baker [1]) Let $H_1$ (resp. $H_2$) be a real and separable Hilbert space with Borel $\sigma$-field $\Gamma_1$ (resp. $\Gamma_2$). Let $\mu_X$ (resp. $\mu_Y$) be a probability measure on $(H_1, \Gamma_1)$ (resp. $(H_2, \Gamma_2)$) satisfying

$$\int_{H_1} \|x\|_1^2 \, d\mu_X(x) < \infty \quad \left(\text{resp. } \int_{H_2} \|y\|_2^2 \, d\mu_Y(y) < \infty\right).$$

Let $R_X$ and $m_X$ (resp. $R_Y$ and $m_Y$) denote the covariance operator and mean element of $\mu_X$ (resp. $\mu_Y$). Let $(H_1 \times H_2, \Gamma_1 \times \Gamma_2)$ be the product measurable space generated by the measurable rectangles. Let $\mu_{XY}$, having $R$ as covariance operator and $m$ as mean element, be a joint measure on $(H_1 \times H_2, \Gamma_1 \times \Gamma_2)$ with projections $\mu_X$ and $\mu_Y$. Then the cross-covariance operator $R_{XY}$ of $\mu_{XY}$ has the decomposition

$$R_{XY} = R_X^{1/2} V R_Y^{1/2},$$

where $V$ is a unique bounded linear operator such that $V : H_2 \to H_1$, $\|V\| \le 1$, $\ker R_Y \subset \ker V$ and $\mathrm{ran}\,V \subset \mathrm{ran}\,R_X$.

The following is given by the operator concavity of $\log t$.
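Proposition 4 has an elementary finite-dimensional analogue that is easy to check numerically. The sketch below (Python/NumPy; all names are ours) builds a joint covariance matrix, recovers $V = R_X^{-1/2} R_{XY} R_Y^{-1/2}$, and checks that $\|V\| \le 1$ and $R_{XY} = R_X^{1/2} V R_Y^{1/2}$, assuming the blocks are invertible:

```python
import numpy as np

rng = np.random.default_rng(0)

def mat_pow(A, p):
    # A^p for a symmetric positive definite matrix via eigendecomposition
    w, U = np.linalg.eigh(A)
    return U @ np.diag(w ** p) @ U.T

# finite-dimensional stand-in for Proposition 4: a joint covariance matrix
n = 4
M = rng.standard_normal((2 * n, 2 * n))
R = M @ M.T                                  # positive definite joint covariance
R_X, R_XY, R_Y = R[:n, :n], R[:n, n:], R[n:, n:]

# V realizing R_XY = R_X^{1/2} V R_Y^{1/2}
V = mat_pow(R_X, -0.5) @ R_XY @ mat_pow(R_Y, -0.5)
assert np.allclose(mat_pow(R_X, 0.5) @ V @ mat_pow(R_Y, 0.5), R_XY)
assert np.linalg.norm(V, 2) <= 1 + 1e-9      # V is a contraction
```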
Proposition 5 (Cover and Pombra [6]) Let $A$ and $B$ be nonnegative definite matrices. For any $\alpha, \beta \ge 0$ satisfying $\alpha + \beta = 1$, we have $|\alpha A + \beta B| \ge |A|^{\alpha} |B|^{\beta}$.
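Proposition 5 is the log-concavity of the determinant on nonnegative definite matrices. A quick randomized spot-check (Python/NumPy; our own sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T    # nonnegative definite (almost surely positive definite)

# spot-check Proposition 5: |alpha A + beta B| >= |A|^alpha |B|^beta
for _ in range(1000):
    A, B = rand_psd(3), rand_psd(3)
    alpha = rng.uniform()
    beta = 1 - alpha
    lhs = np.linalg.det(alpha * A + beta * B)
    rhs = np.linalg.det(A) ** alpha * np.linalg.det(B) ** beta
    assert lhs >= rhs * (1 - 1e-9)
```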
Lemma 1 Let $R_S$ be the covariance matrix of a random vector $S$. For any $\alpha, \beta \ge 0$ satisfying $\alpha + \beta = 1$, the following formulas hold.
(i) $\alpha R_{S_1} + \beta R_{S_2} = R_{\alpha S_1 + \beta S_2} + \alpha\beta R_{S_1 - S_2}$.
(ii) $\alpha R_{S_1} + \beta R_{S_2} \ge R_{\alpha S_1 + \beta S_2}$, where, if $0 < \alpha < 1$, the equality holds if and only if $S_1 = S_2$.
(iii) $\alpha R_{S_1+Z} + \beta R_{S_2+Z} = R_{\alpha S_1 + \beta S_2 + Z} + \alpha\beta R_{S_1 - S_2}$.

Proof of Lemma 1. (i) The following relations are obtained easily from the properties of nonnegative definite matrices:

$$R_{\alpha S_1 + \beta S_2} + \alpha\beta R_{S_1 - S_2} = \alpha^2 R_{S_1} + \alpha\beta R_{S_1 S_2} + \alpha\beta R_{S_2 S_1} + \beta^2 R_{S_2} + \alpha\beta R_{S_1} - \alpha\beta R_{S_1 S_2} - \alpha\beta R_{S_2 S_1} + \alpha\beta R_{S_2} = \alpha(\alpha + \beta) R_{S_1} + \beta(\alpha + \beta) R_{S_2} = \alpha R_{S_1} + \beta R_{S_2}.$$

Then we have the result (i).
(ii) We get (ii) directly from (i), because $R_{S_1 - S_2}$ is a nonnegative definite matrix.
(iii) This follows easily from (i). Let $S_1' = S_1 + Z$ and $S_2' = S_2 + Z$; then $\alpha S_1' + \beta S_2' = \alpha S_1 + \beta S_2 + Z$ and $S_1' - S_2' = S_1 - S_2$. Therefore

$$\alpha R_{S_1+Z} + \beta R_{S_2+Z} = \alpha R_{S_1'} + \beta R_{S_2'} = R_{\alpha S_1' + \beta S_2'} + \alpha\beta R_{S_1' - S_2'} = R_{\alpha S_1 + \beta S_2 + Z} + \alpha\beta R_{S_1 - S_2}.$$

Then we have the result (iii). ✷
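Lemma 1 (i) and (ii) are identities of covariance algebra and can be verified exactly. In the sketch below (Python/NumPy; our own construction) $S_1 = F_1 U$ and $S_2 = F_2 U$ for a common random vector $U$ with identity covariance, so $R_{S_1} = F_1 F_1^t$, $R_{\alpha S_1 + \beta S_2} = (\alpha F_1 + \beta F_2)(\alpha F_1 + \beta F_2)^t$, and so on:

```python
import numpy as np

rng = np.random.default_rng(2)

# represent S1, S2 as linear images of a common vector U with R_U = I
n = 4
F1, F2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
alpha, beta = 0.3, 0.7

cov = lambda F: F @ F.T
lhs = alpha * cov(F1) + beta * cov(F2)
rhs = cov(alpha * F1 + beta * F2) + alpha * beta * cov(F1 - F2)
assert np.allclose(lhs, rhs)                 # Lemma 1 (i)

# Lemma 1 (ii): alpha R_{S1} + beta R_{S2} - R_{alpha S1 + beta S2} >= 0
gap = lhs - cov(alpha * F1 + beta * F2)
assert np.linalg.eigvalsh(gap).min() >= -1e-10
```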
Theorem 1 $C_{n,FB,Z}(P)$ is a concave function of $P$. That is, for any $P_1, P_2 \ge 0$ and any $\alpha, \beta \ge 0$ satisfying $\alpha + \beta = 1$,

$$C_{n,FB,Z}(\alpha P_1 + \beta P_2) \ge \alpha C_{n,FB,Z}(P_1) + \beta C_{n,FB,Z}(P_2).$$
Proof of Theorem 1. We can write $C_{n,FB,Z}(P)$ as

$$C_{n,FB,Z}(P) = \max\left\{ \frac{1}{2n} \ln \frac{|R_{S+Z}|}{|R_Z|};\ S \in \Gamma(P) \right\},$$

where $\Gamma(P) = \{S;\ \mathrm{Tr}[R_S] \le nP\}$. By Lemma 1 (i) we have

$$\begin{aligned}
\alpha R_{S_1+Z} + \beta R_{S_2+Z}
&= R_{\alpha S_1 + \beta S_2 + Z} + \alpha\beta R_{S_1 - S_2} \\
&= R_{\alpha S_1 + \beta S_2} + R_{\alpha S_1 + \beta S_2, Z} + R_{Z, \alpha S_1 + \beta S_2} + R_Z + \alpha\beta R_{S_1 - S_2} \\
&\overset{(a)}{=} \alpha R_{S_1} + \beta R_{S_2} + R_{\alpha S_1 + \beta S_2, Z} + R_{Z, \alpha S_1 + \beta S_2} + R_Z \\
&\overset{(b)}{=} \alpha R_{S_1} + \beta R_{S_2} + R_{\alpha S_1 + \beta S_2}^{1/2} V R_Z^{1/2} + R_Z^{1/2} V^t R_{\alpha S_1 + \beta S_2}^{1/2} + R_Z \\
&\overset{(c)}{=} \alpha R_{S_1} + \beta R_{S_2} + (\alpha R_{S_1} + \beta R_{S_2})^{1/2} W V R_Z^{1/2} + R_Z^{1/2} (WV)^t (\alpha R_{S_1} + \beta R_{S_2})^{1/2} + R_Z.
\end{aligned}$$

Here (a) follows from Lemma 1 (i), (b) follows from Proposition 4, where $\|V\| \le 1$, and (c) follows from the facts that $R_{\alpha S_1 + \beta S_2} \le \alpha R_{S_1} + \beta R_{S_2}$ by Lemma 1 (ii) and that $(R_{\alpha S_1 + \beta S_2})^{1/2} = (\alpha R_{S_1} + \beta R_{S_2})^{1/2} W$ by Proposition 3 (iii), where $\|W\| \le 1$. Taking determinants on both sides of the equation above, we have

$$|\alpha R_{S_1} + \beta R_{S_2} + (\alpha R_{S_1} + \beta R_{S_2})^{1/2} W V R_Z^{1/2} + R_Z^{1/2} (WV)^t (\alpha R_{S_1} + \beta R_{S_2})^{1/2} + R_Z| = |\alpha R_{S_1+Z} + \beta R_{S_2+Z}| \overset{(e)}{\ge} |R_{S_1+Z}|^{\alpha} |R_{S_2+Z}|^{\beta}.$$

Here (e) follows from Proposition 5. Therefore

$$\frac{1}{2n} \ln \frac{|\alpha R_{S_1} + \beta R_{S_2} + (\alpha R_{S_1} + \beta R_{S_2})^{1/2} W V R_Z^{1/2} + R_Z^{1/2} (WV)^t (\alpha R_{S_1} + \beta R_{S_2})^{1/2} + R_Z|}{|R_Z|} \ge \frac{1}{2n} \ln \frac{|R_{S_1+Z}|^{\alpha} |R_{S_2+Z}|^{\beta}}{|R_Z|} = \frac{\alpha}{2n} \ln \frac{|R_{S_1+Z}|}{|R_Z|} + \frac{\beta}{2n} \ln \frac{|R_{S_2+Z}|}{|R_Z|}. \tag{1}$$

We can take $S_1 \in \Gamma(P_1)$ attaining $C_{n,FB,Z}(P_1)$ and $S_2 \in \Gamma(P_2)$ attaining $C_{n,FB,Z}(P_2)$. Then the right-hand side of (1) equals $\alpha C_{n,FB,Z}(P_1) + \beta C_{n,FB,Z}(P_2)$. Since

$$\mathrm{Tr}[\alpha R_{S_1} + \beta R_{S_2}] = \alpha \mathrm{Tr}[R_{S_1}] + \beta \mathrm{Tr}[R_{S_2}] \le \alpha n P_1 + \beta n P_2 = n(\alpha P_1 + \beta P_2)$$

and $\|WV\| \le \|W\|\|V\| \le 1$, maximizing the left-hand side of (1) over $\Gamma(\alpha P_1 + \beta P_2)$ gives $C_{n,FB,Z}(\alpha P_1 + \beta P_2) \ge \mathrm{LHS}$. Then we have

$$C_{n,FB,Z}(\alpha P_1 + \beta P_2) \ge \alpha C_{n,FB,Z}(P_1) + \beta C_{n,FB,Z}(P_2). \qquad ✷$$
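The feedback capacity is hard to evaluate in closed form, but its no-feedback specialization $C_{n,Z}(P)$ of Proposition 2 (the $B = 0$ case) can be checked for concavity in $P$ numerically. A sketch (Python/NumPy; our own code, where `cap` simply re-implements Proposition 2):

```python
import numpy as np

def cap(eigvals, P):
    # C_{n,Z}(P) by Proposition 2 (no feedback)
    r = np.sort(np.asarray(eigvals, dtype=float))
    n = len(r)
    k = max(i + 1 for i in range(n) if n * P + r[: i + 1].sum() > (i + 1) * r[i])
    level = (n * P + r[:k].sum()) / k
    return np.log(level / r[:k]).sum() / (2 * n)

r = [0.2, 1.0, 3.5, 7.0]
Ps = np.linspace(0.05, 10.0, 50)
C = np.array([cap(r, P) for P in Ps])
# midpoint concavity on the grid: C((P_i + P_j)/2) >= (C(P_i) + C(P_j))/2
for i in range(0, len(Ps), 7):
    for j in range(0, len(Ps), 7):
        mid = cap(r, (Ps[i] + Ps[j]) / 2)
        assert mid >= (C[i] + C[j]) / 2 - 1e-12
```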
3 OPERATOR INEQUALITY

Let $H$ be a Hilbert space, let $B(H)$ be the set of all bounded linear operators on $H$, and let $B(H)^+ = \{A \in B(H); A \ge 0\}$. Let $J$ be any interval of $\mathbf{R}$ and let $\sigma(A)$ be the spectrum of $A \in B(H)$.

Definition 1 Let $f : J \to \mathbf{R}$ be continuous.
(1) $f$ is called operator monotone if for any self-adjoint $A, B \in B(H)$ satisfying $\sigma(A), \sigma(B) \subset J$, $A \le B \Longrightarrow f(A) \le f(B)$.
(2) $f$ is called operator convex if for any self-adjoint $A, B \in B(H)$ satisfying $\sigma(A), \sigma(B) \subset J$,

$$f\left(\frac{A+B}{2}\right) \le \frac{f(A) + f(B)}{2}.$$

By the continuity of $f$, this is equivalent to $f(\lambda A + (1 - \lambda)B) \le \lambda f(A) + (1 - \lambda)f(B)$ for any $0 \le \lambda \le 1$.
(3) $f$ is called operator concave if $-f$ is operator convex.
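A standard cautionary example for Definition 1: a scalar-monotone function need not be operator monotone. The sketch below (Python/NumPy; our own example, the well-known one for $f(t) = t^2$) exhibits $A \le B$ with $f(A) \not\le f(B)$ already for $2 \times 2$ matrices:

```python
import numpy as np

# f(t) = t^2 is monotone on [0, infinity) but not operator monotone
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])
assert np.linalg.eigvalsh(B - A).min() >= -1e-12   # A <= B in operator order
diff = B @ B - A @ A                               # f(B) - f(A) for f(t) = t^2
print(np.linalg.eigvalsh(diff))                    # has a negative eigenvalue
```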
Proposition 6 Let $f$ be a nonnegative continuous function on $[0, \infty)$. Then $f$ is operator monotone if and only if $f$ is operator concave.

Proposition 7 $f(t) = t^{-1}$ is operator convex on $(0, \infty)$.

Definition 2 (Kubo and Ando [14]) $\sigma$ is called an operator connection if $\sigma$ is a binary operation on $B(H)^+$ satisfying the following axioms:
(1) (Monotonicity) $A \le C, B \le D \Longrightarrow A\sigma B \le C\sigma D$.
(2) (Transformer Inequality) $C(A\sigma B)C \le (CAC)\sigma(CBC)$.
(3) (Upper Continuity) $A_n \downarrow A$, $B_n \downarrow B \Longrightarrow A_n \sigma B_n \downarrow A\sigma B$, where $A_n \downarrow A$ means $A_1 \ge A_2 \ge \cdots$ and $A_n \to A$ in the strong operator topology.
$\sigma$ is called an operator mean if $\sigma$ is an operator connection satisfying $I\sigma I = I$.

Proposition 8 For any operator connection $\sigma$, there exists a unique nonnegative operator monotone function $f$ on $[0, \infty)$ such that $f(t)I = I\sigma(tI)$, $t \ge 0$. Then we have the following:
(1) $\sigma \mapsto f$ is an affine order isomorphism between the class of connections and the class of nonnegative operator monotone functions on $[0, \infty)$.
(2) For invertible $A \in B(H)^+$, $A\sigma B = A^{1/2} f(A^{-1/2} B A^{-1/2}) A^{1/2}$.
(3) $\sigma$ is an operator mean if and only if $f(1) = 1$.

Proposition 9 Let $\sigma$ be an operator connection and $A, B, C \in B(H)^+$.
(1) For any invertible $C$, $C(A\sigma B)C = (CAC)\sigma(CBC)$.
(2) For any $\alpha \ge 0$, $\alpha(A\sigma B) = (\alpha A)\sigma(\alpha B)$.

Definition 3 (Kubo and Ando [14]) For invertible $A, B \in B(H)^+$, the parallel sum is defined by

$$A : B = (A^{-1} + B^{-1})^{-1}.$$

In general, for $A, B \in B(H)^+$, it is defined by

$$A : B = \mathrm{s\text{-}}\lim_{\varepsilon \downarrow 0} (A + \varepsilon I) : (B + \varepsilon I).$$

The harmonic mean is defined by $A\,!\,B = 2(A : B)$.
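For invertible matrices the parallel sum and harmonic mean of Definition 3 can be computed directly. A sketch (Python/NumPy; the helper names are ours) checking $A\,!\,A = A$ and the operator inequality "harmonic mean $\le$ arithmetic mean":

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)   # safely positive definite

def parallel_sum(A, B):
    # A : B = (A^{-1} + B^{-1})^{-1} for invertible A, B (Definition 3)
    return np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))

def harmonic_mean(A, B):
    return 2 * parallel_sum(A, B)

A, B = rand_pd(4), rand_pd(4)
assert np.allclose(harmonic_mean(A, A), A)   # A ! A = A
# harmonic mean <= arithmetic mean in the operator order
gap = (A + B) / 2 - harmonic_mean(A, B)
assert np.linalg.eigvalsh(gap).min() >= -1e-8
```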
Proposition 10 Let $\sigma$ be an operator connection and $A, B, C, D \in B(H)^+$. Then

$$(A\sigma B) : (C\sigma D) \ge (A : C)\sigma(B : D).$$

Proposition 11 Let $f$ be a nonnegative continuous function on $[0, \infty)$. If $f$ is operator monotone, then for any $A, B \in B(H)^+$,

$$f(A\,!\,B) \le f(A)\,!\,f(B).$$

Proof of Proposition 11. By Proposition 8, there exists a unique operator connection $\sigma$ such that $f(t)I = I\sigma(tI)$. Then

$$\begin{aligned}
(I\sigma A) : (I\sigma B)
&\ge (I : I)\sigma(A : B) &&\text{(by Proposition 10)} \\
&= \left(\tfrac{1}{2}I\right)\sigma(A : B) \\
&= \left(\tfrac{1}{2}I\right)\sigma\left(\tfrac{1}{2}(2(A : B))\right) \\
&= \tfrac{1}{2}(I\sigma(2(A : B))) &&\text{(by Proposition 9 (2))} \\
&= \tfrac{1}{2}(I\sigma(A\,!\,B)).
\end{aligned}$$

Then we have $2((I\sigma A) : (I\sigma B)) \ge I\sigma(A\,!\,B)$. Hence $(I\sigma A)\,!\,(I\sigma B) \ge I\sigma(A\,!\,B)$. Thus $f(A)\,!\,f(B) \ge f(A\,!\,B)$. ✷

Proposition 12 Let $f$ be a positive continuous function on $(0, \infty)$. If $f$ is operator monotone, then $f(t^{-1})$ is operator convex.

Proof of Proposition 12. For any invertible $A, B \in B(H)^+$, we have

$$\begin{aligned}
f\left(\left(\frac{A+B}{2}\right)^{-1}\right)
&= f(A^{-1}\,!\,B^{-1}) \\
&\le f(A^{-1})\,!\,f(B^{-1}) &&\text{(by Proposition 11)} \\
&= \left\{\frac{(f(A^{-1}))^{-1} + (f(B^{-1}))^{-1}}{2}\right\}^{-1} \\
&\le \frac{1}{2}f(A^{-1}) + \frac{1}{2}f(B^{-1}). &&\text{(by Proposition 7)}
\end{aligned}$$

✷
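Proposition 12 applied to the operator monotone function $f(t) = \log(1+t)$ gives the operator convexity of $\log(1 + t^{-1})$ used in the next section. A randomized midpoint-convexity check (Python/NumPy; our own sketch):

```python
import numpy as np

rng = np.random.default_rng(4)

def mat_fun(A, f):
    # apply a scalar function to a symmetric positive definite matrix
    w, U = np.linalg.eigh(A)
    return U @ np.diag(f(w)) @ U.T

f = lambda t: np.log(1 + 1 / t)

def rand_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

# midpoint operator convexity: f((A+B)/2) <= (f(A) + f(B))/2
for _ in range(200):
    A, B = rand_pd(3), rand_pd(3)
    gap = (mat_fun(A, f) + mat_fun(B, f)) / 2 - mat_fun((A + B) / 2, f)
    assert np.linalg.eigvalsh(gap).min() >= -1e-9
```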
4 CONVEXITY OF $C_{n,\cdot}(P)$, $C_{n,FB,\cdot}(P)$

Let $R_{\tilde Z} = \alpha R_{Z_1} + \beta R_{Z_2}$, where $\alpha, \beta \ge 0$ $(\alpha + \beta = 1)$. Then we have the following convexity of $C_{n,\cdot}(P)$.

Theorem 2 For any $Z_1, Z_2$, any $P \ge 0$ and any $\alpha, \beta \ge 0$ $(\alpha + \beta = 1)$,

$$C_{n,\tilde Z}(P) \le \alpha C_{n,Z_1}(P) + \beta C_{n,Z_2}(P).$$
Proof of Theorem 2. By Proposition 12, $f(t) = \log(1 + t^{-1})$ is operator convex on $(0, \infty)$. Then

$$\frac{1}{2n} \log \frac{|R_S + R_{\tilde Z}|}{|R_{\tilde Z}|} \le \alpha \frac{1}{2n} \log \frac{|R_S + R_{Z_1}|}{|R_{Z_1}|} + \beta \frac{1}{2n} \log \frac{|R_S + R_{Z_2}|}{|R_{Z_2}|}. \tag{2}$$

We can take $S \in \Gamma(P)$ attaining $C_{n,\tilde Z}(P)$, where $\Gamma(P) = \{S;\ \mathrm{Tr}[R_S] \le nP\}$, and then maximize the right-hand side of (2). Then we have the result. ✷

Now we have the following convexlikeness of $C_{n,FB,\cdot}(P)$.

Theorem 3 For any $Z_1, Z_2$, any $P \ge 0$ and any $\alpha, \beta \ge 0$ $(\alpha + \beta = 1)$, there exist $P_1, P_2 \ge 0$ $(P = \alpha P_1 + \beta P_2)$ such that

$$C_{n,FB,\tilde Z}(P) \le \alpha C_{n,FB,Z_1}(P_1) + \beta C_{n,FB,Z_2}(P_2).$$
Proof of Theorem 3. By Proposition 12, $f(t) = \log(1 + t^{-1})$ is operator convex on $(0, \infty)$. Then

$$\frac{1}{2n} \log \frac{|R_X + R_{\tilde Z}|}{|R_{\tilde Z}|} \le \alpha \frac{1}{2n} \log \frac{|R_X + R_{Z_1}|}{|R_{Z_1}|} + \beta \frac{1}{2n} \log \frac{|R_X + R_{Z_2}|}{|R_{Z_2}|}. \tag{3}$$

We can take $(X^*, B^*) \in \Delta(P)$ attaining $C_{n,FB,\tilde Z}(P)$, where $\Delta(P) = \{(X, B);\ \mathrm{Tr}[(I + B)R_X(I + B^t) + BR_{\tilde Z}B^t] \le nP\}$. Since

$$\mathrm{Tr}[(I + B^*)R_{X^*}(I + (B^*)^t) + B^* R_{\tilde Z} (B^*)^t] = \alpha \mathrm{Tr}[(I + B^*)R_{X^*}(I + (B^*)^t) + B^* R_{Z_1} (B^*)^t] + \beta \mathrm{Tr}[(I + B^*)R_{X^*}(I + (B^*)^t) + B^* R_{Z_2} (B^*)^t],$$

we have $\alpha P_1 + \beta P_2 = P$, where

$$\mathrm{Tr}[(I + B^*)R_{X^*}(I + (B^*)^t) + B^* R_{Z_1} (B^*)^t] = nP_1$$

and

$$\mathrm{Tr}[(I + B^*)R_{X^*}(I + (B^*)^t) + B^* R_{Z_2} (B^*)^t] = nP_2.$$

We then maximize the right-hand side of (3), and we have the result. ✷

Finally, we have the following open problem.

Open Problem. For any $Z_1, Z_2$, any $P \ge 0$ and any $\alpha, \beta \ge 0$ $(\alpha + \beta = 1)$,

$$C_{n,FB,\tilde Z}(P) \le \alpha C_{n,FB,Z_1}(P) + \beta C_{n,FB,Z_2}(P).$$
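The key step in Theorems 2 and 3 is that, for fixed $R_S$ (resp. $R_X$), the log-determinant ratio is convex in the noise covariance. A randomized check of this convexity (Python/NumPy; our own sketch):

```python
import numpy as np

rng = np.random.default_rng(5)

def rand_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + 0.1 * np.eye(n)

def rate(R_S, R_Z, n):
    # (1/2n) log(|R_S + R_Z| / |R_Z|), the quantity bounded in (2) and (3)
    return (np.linalg.slogdet(R_S + R_Z)[1] - np.linalg.slogdet(R_Z)[1]) / (2 * n)

n = 4
R_S = rand_pd(n)
for _ in range(200):
    R1, R2 = rand_pd(n), rand_pd(n)
    alpha = rng.uniform()
    beta = 1 - alpha
    mix = rate(R_S, alpha * R1 + beta * R2, n)
    assert mix <= alpha * rate(R_S, R1, n) + beta * rate(R_S, R2, n) + 1e-9
```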
References

[1] C. R. Baker, Joint measures and cross covariance operators, Trans. Amer. Math. Soc., vol. 186, pp. 273-289, 1973.
[2] H. W. Chen and K. Yanagi, On the Cover's conjecture on capacity of Gaussian channel with feedback, IEICE Trans. Fundamentals, vol. E80-A, no. 11, pp. 2272-2275, November 1997.
[3] H. W. Chen and K. Yanagi, Refinements of the half-bit and factor-of-two bounds for capacity in Gaussian channels with feedback, IEEE Trans. Information Theory, vol. IT-45, no. 1, pp. 319-325, January 1999.
[4] H. W. Chen and K. Yanagi, Upper bounds on the capacity of discrete time blockwise white Gaussian channels with feedback, to appear in IEEE Trans. Information Theory.
[5] T. M. Cover, Conjecture: Feedback does not help much, in Open Problems in Communication and Computation, T. Cover and B. Gopinath (Eds.), pp. 70-71, Springer-Verlag, New York, 1987.
[6] T. M. Cover and S. Pombra, Gaussian feedback capacity, IEEE Trans. Information Theory, vol. IT-35, no. 1, pp. 37-43, January 1989.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[8] R. G. Douglas, On majorization, factorization, and range inclusion of operators on Hilbert space, Proc. Amer. Math. Soc., vol. 17, pp. 413-415, 1966.
[9] A. Dembo, On Gaussian feedback capacity, IEEE Trans. Information Theory, vol. IT-35, no. 5, pp. 1072-1089, September 1989.
[10] P. Ebert, The capacity of the Gaussian channel with feedback, Bell Syst. Tech. J., vol. 49, pp. 1705-1712, 1970.
[11] R. G. Gallager, Information Theory and Reliable Communication, John Wiley and Sons, New York, 1968.
[12] F. Hiai and K. Yanagi, Hilbert Spaces and Linear Operators (in Japanese), Makino-Shoten, 1995.
[13] S. Ihara and K. Yanagi, Capacity of discrete time Gaussian channel with and without feedback, II, Japan J. Appl. Math., vol. 6, pp. 245-258, 1989.
[14] F. Kubo and T. Ando, Means of positive linear operators, Math. Ann., vol. 240, pp. 205-224, 1980.
[15] M. Pinsker, talk delivered at the Soviet Information Theory Meeting (no abstract published), 1969.
[16] K. Yanagi, An upper bound to the capacity of discrete time Gaussian channel with feedback, Lecture Notes in Math., vol. 1299, pp. 565-570, 1988.
[17] K. Yanagi, Necessary and sufficient condition for capacity of the discrete time Gaussian channel to be increased by feedback, IEEE Trans. Information Theory, vol. IT-38, no. 6, pp. 1788-1791, November 1992.
[18] K. Yanagi, An upper bound to the capacity of discrete time Gaussian channel with feedback, II, IEEE Trans. Information Theory, vol. IT-40, no. 2, pp. 588-593, March 1994.
[19] K. Yanagi, An upper bound to the capacity of discrete time Gaussian channel with feedback, III, Bull. Kyushu Inst. Tech., Pure and Applied Mathematics, vol. 45, pp. 1-8, 1998.