49th IEEE Conference on Decision and Control December 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA
Rate Distortion Function with Causal Decoding Charalambos D. Charalambous, Christos K. Kourtellaris and Photios A. Stavrou
Abstract— This paper considers source coding of general sources with memory when causal feedback is available at the decoder. The rate distortion function is defined as the infimum of the so-called directed information over causal data compression channels which satisfy a distortion fidelity constraint. The expression of the optimal causal data compression channel is derived. Further, a tight lower bound on this rate distortion function is derived. These are variants of the classical non-causal rate distortion function and the associated Shannon lower bound. Generalization of the results to controlled sources is also discussed.
I. INTRODUCTION

Over the past few years there has been renewed interest in communication systems with feedback and in the design of causal encoders and decoders. An important application involving joint communication and control analysis and design is the control of dynamical systems over finite-rate communication channels [1], [10], [3], [4], [5]. One of the fundamental problems often encountered in communication systems and in control/communication applications is causal lossy compression, in which information must be provided causally to the encoder and/or decoder. A typical scenario is depicted in Figure II.1.

The objective of this paper is to investigate the source coding problem, when the decoder has causal feedback, for general uncontrolled and controlled sources, via the rate distortion function [6]. However, unlike classical rate distortion theory [7], which employs the mutual information between the source sequence and the reconstruction of the source sequence to represent the rate, causality of the decoder requires the use of directed information [8], a variant of mutual information. Specifically, the optimal classical data compression channel [7], [9] is non-causal, while the presence of causal feedback (see Figure II.1) implies that the quantity of interest should be the directed information.

The implications of causality on the data compression channel are understood by considering two sequences, X^n = (X_0, X_1, ..., X_n) denoting the source and Y^n = (Y_0, Y_1, ..., Y_n) denoting the reconstruction of the source, and then defining Shannon's mutual information as follows. Shannon's self-mutual information for a given realization X^n = x^n, Y^n = y^n of these sequences is defined by

  i(x^n; y^n) = \log \frac{P(dy^n; x^n)}{P(dy^n)},

where P(dy^n; x^n) denotes the conditional distribution, while its average over all realizations, called Shannon's mutual information, is defined by [6]

  I(X^n; Y^n) = E_{P(dx^n, dy^n)} [ i(x^n; y^n) ] = \int \log \frac{P(dy^n; x^n)}{P(dy^n)} \, P(dy^n, dx^n).

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. INFSO-ICT-223844 and the Cyprus Research Promotion Foundation under the grant ARTEMIS-0506/20.
C.D. Charalambous is with the Department of Electrical Engineering, University of Cyprus, Nicosia, Cyprus. E-mail: [email protected]
Christos K. Kourtellaris is with the Department of Electrical Engineering, University of Cyprus, Nicosia, Cyprus. E-mail: [email protected]
Photios A. Stavrou is with the Department of Electrical Engineering, University of Cyprus, Nicosia, Cyprus. E-mail: [email protected]
The classical source coding or lossy data compression problem is defined by introducing an average distortion or fidelity constraint associated with a distortion measure ρ_n : X_{0,n} × Y_{0,n} → [0, ∞), n ∈ N. The data compression channel which minimizes the rate of reconstructing X^n by Y^n [7] is

  q^*_{0,n}(dy^n; x^n) = \frac{ e^{s \rho_n(x^n, y^n)} \, \nu^*_{0,n}(dy^n) }{ \int_{Y_{0,n}} e^{s \rho_n(x^n, z^n)} \, \nu^*_{0,n}(dz^n) }, \quad s ≤ 0,        (I.1)

where ν^*_{0,n} ∈ M_1(Y_{0,n}) is the marginal of P^*_{0,n} = μ_{0,n} ⊗ q^*_{0,n} ∈ M_1(X_{0,n} × Y_{0,n}), and s ≤ 0 is the Lagrange multiplier associated with the fidelity constraint. Even for the single-letter distortion ρ_n(x^n, y^n) = \sum_{i=0}^{n} ρ_i(x_i, y_i), it follows from (I.1) that q^*_{0,n}(dy^n; x^n) = ⊗_{j=0}^{n} q^*(dy_j; y^{j-1}, x^n), where each kernel depends on the entire source sequence x^n rather than on x^j only; hence causality of the data compression fails. The main objective of this paper is to re-define the rate distortion function so that the optimization leads to a causal data compression channel.
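To make the form of (I.1) concrete, the following Python sketch computes its single-letter, finite-alphabet analogue for a fixed slope s ≤ 0 by the standard alternating (Blahut-Arimoto-type) iteration between the exponential tilting and its output marginal. The source distribution, distortion measure, and value of s below are illustrative choices, not quantities taken from this paper; the point is only that the resulting optimal kernel is a tilting of its own output marginal and carries no causality constraint.

    import numpy as np

    mu = np.array([0.5, 0.3, 0.2])        # illustrative source pmf on {0, 1, 2}
    rho = 1.0 - np.eye(3)                 # Hamming distortion rho(x, y)
    s = -2.0                              # Lagrange multiplier, s <= 0

    nu = np.full(3, 1.0 / 3.0)            # initial output marginal nu(y)
    for _ in range(200):
        # q(y|x) proportional to exp(s * rho(x, y)) * nu(y), normalized over y
        q = np.exp(s * rho) * nu
        q /= q.sum(axis=1, keepdims=True)
        nu = mu @ q                       # nu(y) = sum_x mu(x) q(y|x)

    rate = float(np.sum(mu[:, None] * q * np.log(q / nu)))   # I(X;Y) in nats
    distortion = float(np.sum(mu[:, None] * q * rho))        # E[rho(X, Y)]
    print(rate, distortion)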
To further understand the implications of causality and feedback, consider the following alternative expression of the self-mutual information:

  i(x^n; y^n) = \sum_{i=0}^{n} \log \frac{ P(dy_i; y^{i-1}, x^i) \, P(dx_i; x^{i-1}, y^{i-1}) }{ P(dy_i; y^{i-1}) \, P(dx_i; x^{i-1}) }.
By taking expectations, I(X^n; Y^n) = I(X^n → Y^n) + I(X^n ← Y^n), where
  I(X^n → Y^n) = \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1}) = \sum_{i=0}^{n} \int \log \frac{ P(dy_i; y^{i-1}, x^i) }{ P(dy_i; y^{i-1}) } \, P(dy^i, dx^i),
  I(X^n ← Y^n) = \sum_{i=0}^{n} I(Y^{i-1}; X_i | X^{i-1}) = \sum_{i=0}^{n} \int \log \frac{ P(dx_i; x^{i-1}, y^{i-1}) }{ P(dx_i; x^{i-1}) } \, P(dy^{i-1}, dx^i).
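As a numerical sanity check of the decomposition above, the following Python sketch evaluates I(X^1; Y^1), I(X^1 → Y^1) and I(X^1 ← Y^1) for two time steps on binary alphabets and confirms that the first equals the sum of the latter two. The joint distribution used is an arbitrary strictly positive example, chosen purely for illustration and not taken from this paper.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(1)
    p = rng.random((2, 2, 2, 2)) + 0.1        # p(x0, x1, y0, y1), strictly positive
    p /= p.sum()

    # marginals (axes: 0 = x0, 1 = x1, 2 = y0, 3 = y1)
    p_x    = p.sum(axis=(2, 3))               # p(x0, x1)
    p_y    = p.sum(axis=(0, 1))               # p(y0, y1)
    p_x0   = p.sum(axis=(1, 2, 3))            # p(x0)
    p_y0   = p.sum(axis=(0, 1, 3))            # p(y0)
    p_x0y0 = p.sum(axis=(1, 3))               # p(x0, y0)
    p_xy0  = p.sum(axis=3)                    # p(x0, x1, y0)

    I_full = I_fwd = I_bwd = 0.0
    for x0, x1, y0, y1 in product(range(2), repeat=4):
        pj = p[x0, x1, y0, y1]
        # I(X^1; Y^1)
        I_full += pj * np.log(pj / (p_x[x0, x1] * p_y[y0, y1]))
        # I(X^1 -> Y^1) = I(X0; Y0) + I(X0, X1; Y1 | Y0)
        I_fwd += pj * np.log(p_x0y0[x0, y0] / (p_x0[x0] * p_y0[y0]))
        I_fwd += pj * np.log(pj * p_y0[y0] / (p_xy0[x0, x1, y0] * p_y[y0, y1]))
        # I(X^1 <- Y^1) = I(Y0; X1 | X0)   (the i = 0 term is zero)
        I_bwd += pj * np.log(p_xy0[x0, x1, y0] * p_x0[x0] / (p_x[x0, x1] * p_x0y0[x0, y0]))

    print(I_full, I_fwd + I_bwd)              # the two numbers agree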
Note that the term I(X^n → Y^n) is the directed information from X^n to Y^n discussed in [8], and corresponds to the Shannon mutual information restricted to the degraded channel X^n → Y^n, while I(X^n ← Y^n) is the directed information from Y^n to X^n, which corresponds to the Shannon mutual information restricted to the degraded channel X^n ← Y^n. Thus, the formulation of the causal rate distortion function should involve only the term I(X^n → Y^n).

The main objectives of this paper are the following.
1) Provide the definition of the causal rate distortion function for general uncontrolled sources, and derive the optimal causal reconstruction kernel.
2) Provide the definition of the causal rate distortion function for general controlled sources, and derive the optimal causal reconstruction kernel.
3) Provide a tight lower bound on the causal rate distortion function associated with the uncontrolled and controlled source.

Previous related work on the definition of causal rate distortion is found in [2], where coding theorems are also derived. Recently, the problem has been revisited in [11], [12]. An alternative, non-information-theoretic approach is found in [13], [14], using stochastic optimization methods. The material presented in this paper complements previous work on causal data compression in the sense that we provide the formulation, as well as the optimal causal reproduction kernel, for general uncontrolled and controlled sources, together with tight lower bounds.

II. PROBLEM FORMULATION

In this section, we introduce the setup of the problem on abstract alphabets (Polish spaces) and a discrete time set N_n = {0, 1, ..., n}, n ∈ N = {0, 1, 2, ...}. All processes are defined on a complete probability space (Ω, F(Ω), P) with filtration {F_t}_{t≥0}. The source and reconstruction alphabets are sequences of Polish spaces {X_t : t = 0, 1, ..., n} and {Y_t : t = 0, 1, ..., n}, respectively (e.g., X_t, Y_t are complete separable metric spaces). Moreover, the abstract alphabets are associated with their corresponding measurable spaces (X_t, B(X_t)) and (Y_t, B(Y_t)) (e.g., B(X_t) is the Borel σ-algebra of subsets of the set X_t generated by closed sets). Thus, sequences of source and reproduction alphabets are identified with the product measurable spaces (X_{0,n}, B(X_{0,n})) = ×_{k=0}^{n} (X_k, B(X_k)) and (Y_{0,n}, B(Y_{0,n})) = ×_{k=0}^{n} (Y_k, B(Y_k)), respectively. The source is a random process denoted by X^n = {X_t : t = 0, 1, ..., n}, X : N_n × Ω → X_t, and the reconstruction of the source is another process denoted by Y^n = {Y_t : t = 0, 1, ..., n}, Y : N_n × Ω → Y_t, where the subscript denotes the time evolution of the processes. Probability measures on any measurable space (Z, B(Z)) are denoted by M_1(Z).
Fig. II.1. Control/Communication System with Feedback
Next, we introduce the definition of conditional independence.

Conditional Independence: Conditionally independent Random Variables (R.V.'s) are denoted by (X, Y) ⊥ Z or, equivalently, Y → Z → X form a Markov chain.

Definition 2.1: Given the measurable spaces (X, B(X)) and (Y, B(Y)), a stochastic kernel on (Y, B(Y)) conditioned on (X, B(X)) is a mapping q : B(Y) × X → [0, 1] satisfying the following two properties:
1) For every x ∈ X, the set function q(·; x) is a probability measure (possibly finitely additive) on B(Y);
2) For every A ∈ B(Y), the function q(A; ·) is B(X)-measurable.
The set of all stochastic kernels on (Y, B(Y)) conditioned on (X, B(X)) is denoted by Q(Y; X).

The definition of a stochastic kernel can be used to define causal and non-causal rate distortion channels (reproduction kernels) as follows.

Definition 2.2: Given the measurable spaces (X_{0,n}, B(X_{0,n})) and (Y_{0,n}, B(Y_{0,n})), n ∈ N_+, and their product spaces, data compression channels are defined as follows.
1. Causal Data Compression Channel. A causal data compression channel is a sequence of stochastic kernels {q_j(dy_j; y^{j-1}, x^j) ∈ Q(Y_j; Y_{0,j-1} × X_{0,j}) : j ∈ N_n}.
2. Non-Causal Data Compression Channel. A non-causal data compression channel is a stochastic kernel q_{0,n}(dy^n; x^n) ∈ Q(Y_{0,n}; X_{0,n}), n ∈ N.

Thus, a causal data compression channel is a sequence of conditional distributions for Y_j given Y^{j-1} = y^{j-1} and X^j = x^j, denoted by P_{Y_j | Y^{j-1}, X^j}(dy_j | Y^{j-1} = y^{j-1}, X^j = x^j), j = 0, 1, ..., n. On the other hand, a non-causal data compression channel is given by P_{Y^n | X^n}(dy^n | X^n = x^n). Since by the chain rule

  P_{Y^n | X^n}(dy^n | X^n = x^n) = ⊗_{j=0}^{n} P_{Y_j | Y^{j-1}, X^n}(dy_j | Y^{j-1} = y^{j-1}, X^n = x^n),

it is clear that in classical rate distortion theory the reconstruction Y_j = y_j depends on future values of the source sequence, namely (X_{j+1} = x_{j+1}, ..., X_n = x_n), in addition to the past reconstructions Y^{j-1} = y^{j-1} and the past and present source symbols X^j = x^j.
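To illustrate Definition 2.2 in the simplest finite-alphabet setting, the following Python sketch represents a causal data compression channel as a sequence of kernels q_j(dy_j; y^{j-1}, x^j): the reconstruction Y_j is sampled from the past reconstructions and the past and present source symbols only, never from future source symbols. The particular kernel below is an arbitrary placeholder, not a kernel derived in this paper.

    import numpy as np

    rng = np.random.default_rng(2)

    def q_j(y_past, x_past_and_present):
        """Return a pmf over Y_j = {0, 1} given y^{j-1} and x^j (illustrative rule)."""
        bias = 0.8 if x_past_and_present[-1] == 1 else 0.2   # lean towards the latest source symbol
        return np.array([1.0 - bias, bias])

    def reconstruct_causally(x_seq):
        """Sample Y^n one symbol at a time; Y_j never looks at x_{j+1}, ..., x_n."""
        y_seq = []
        for j in range(len(x_seq)):
            pmf = q_j(tuple(y_seq), tuple(x_seq[: j + 1]))
            y_seq.append(int(rng.choice(2, p=pmf)))
        return y_seq

    print(reconstruct_causally([0, 1, 1, 0, 1]))

A non-causal channel, by contrast, would have to be evaluated with the entire sequence x^n available before any Y_j is produced.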
III. RATE DISTORTION FUNCTION

The goal of this section is to formulate the rate distortion function subject to a causal constraint on the reconstruction kernels, and then derive the optimal causal reconstruction kernel.

A. Non-Causal Rate Distortion Function

Given a source probability measure μ_{0,n} ∈ M_1(X_{0,n}) (possibly finitely additive) and a non-causal reconstruction kernel q_{0,n} ∈ Q(Y_{0,n}; X_{0,n}), one can define three probability measures as follows.
(P1): The joint probability measure P_{0,n} ∈ M_1(Y_{0,n} × X_{0,n}) by P_{0,n} = μ_{0,n} ⊗ q_{0,n}, i.e.,

  P_{0,n}(A × B) = \int_{B} q_{0,n}(A; x^n) \, μ_{0,n}(dx^n),   A ∈ B(Y_{0,n}),  B ∈ B(X_{0,n}).
μ_{0,n}-almost all x^n ∈ X_{0,n}. Moreover, if P_{0,n} = μ_{0,n} ⊗ q_{0,n}