Robust Coding for Uncertain Sources: A Minimax Approach
Farzad Rezaei, Charalambos D. Charalambous†
School of Information Technology and Engineering, University of Ottawa, CANADA
† Department of Electrical and Computer Engineering, University of Cyprus, CYPRUS
E-mail: [email protected], [email protected]
International Symposium on Information Theory, September 2005, Adelaide, Australia
Table of Contents
1. Introduction
2. Problem Formulation
3. Solution to the Maximization Problem
4. Robust Shannon Coding
5. Robust Huffman Coding
6. A Numerical Example
7. Relationship to the Maximum Entropy
8. Minmax Redundancy
9. Future Research
Introduction
Given the source distribution → Find the code with the shortest average length.
1) Shannon Code
2) Huffman Code
Coding for one source with unknown statistics → use the empirical distribution [Davisson73, Barron92, Barron98, Csiszar98]
Our problem: Coding for a class of sources, described by a relative entropy constraint.
Introduction
There is a known nominal distribution w.r.t. which relative entropy is considered.
This class contains infinitely many sources.
We wish to design a code which is robust in the sense of average length,
i.e., one code for the whole class of sources that performs reasonably well for all of them.
Problem Formulation
Σ is a finite alphabet, and M(Σ) is the set of probability distributions on Σ.
Each source has a distribution denoted by ν; the nominal distribution, denoted by µ, is known.
Uncertainty description: M_R = {ν ∈ M(Σ) : H(ν|µ) ≤ R}, where R is given.
Minimax problem (D-ary code):
J(ℓ*, ν*) = inf_{(ℓ_1,...,ℓ_M)} sup_{ν ∈ M(Σ)} E_ν(ℓ)    (1)
subject to H(ν|µ) ≤ R and Σ_{i=1}^M D^{-ℓ_i} ≤ 1.
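As a concrete illustration (not part of the original slides), the sketch below checks, for a hypothetical nominal distribution µ and candidate source ν, whether ν lies in the uncertainty class M_R and whether a set of codeword lengths satisfies the Kraft inequality; all numerical values are assumptions chosen for the example.

    import numpy as np

    def rel_entropy(nu, mu):
        # H(nu|mu) = sum_i nu_i * ln(nu_i / mu_i), in nats
        nu, mu = np.asarray(nu, float), np.asarray(mu, float)
        mask = nu > 0
        return float(np.sum(nu[mask] * np.log(nu[mask] / mu[mask])))

    def kraft_holds(lengths, D=2):
        # Kraft inequality: sum_i D^{-l_i} <= 1
        return sum(D ** (-l) for l in lengths) <= 1.0

    mu = [0.7, 0.2, 0.1]   # hypothetical nominal distribution
    nu = [0.5, 0.3, 0.2]   # a candidate true source distribution
    R = 0.1                # relative entropy radius (example value)
    print(rel_entropy(nu, mu) <= R)     # True: this nu belongs to M_R
    print(kraft_holds([1, 2, 2], D=2))  # True: admissible binary codeword lengths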
Problem Formulation
Lagrangian:
L_{λ,s}(ℓ, ν) = E_ν(ℓ) − s(H(ν|µ) − R) + λ(Σ_{i=1}^M D^{-ℓ_i} − 1)
and the associated dual functional:
L_{λ,s}(ℓ, ν*) = sup_{ν ∈ M_R} L_{λ,s}(ℓ, ν)
s > 0 and λ are Lagrange Multipliers.
The supremum over ν is independent of the Kraft inequality.
Solution to the Maximization Problem
Duality relation between relative entropy and free energy → Large Deviations theory. See [Meneghini96], [Deuschel-Stroock89].
Theorem 1. Assume s > 0. Then the dual function L_{λ,s}(ℓ, ν*) is given by
L_{λ,s}(ℓ, ν*) = sR + s log(Σ_{i=1}^M e^{ℓ_i/s} µ_i) + λ(Σ_{i=1}^M D^{-ℓ_i} − 1)
Moreover, the supremum is attained at
ν_i^{*,s} = e^{ℓ_i/s} µ_i / Σ_{j=1}^M e^{ℓ_j/s} µ_j,   ∀i ∈ {1, ..., M}    (2)
The worst-case distribution occurs on the boundary of the constraint, that is,
s_0 = arg min_{s>0} L_{λ,s}(ℓ, ν^{*,s}) is such that H(ν^{*,s_0}|µ) = R.
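A minimal numerical sketch of equation (2) (my addition, with illustrative µ, lengths, and s): the worst-case law is the nominal law exponentially tilted by the codeword lengths.

    import numpy as np

    def worst_case_dist(lengths, mu, s):
        # nu_i^{*,s} = e^{l_i/s} mu_i / sum_j e^{l_j/s} mu_j   (eq. 2)
        w = np.exp(np.asarray(lengths, float) / s) * np.asarray(mu, float)
        return w / w.sum()

    mu = np.array([0.7, 0.2, 0.1])       # example nominal distribution
    lengths = np.array([1.0, 2.0, 2.0])  # example codeword lengths
    nu = worst_case_dist(lengths, mu, s=2.0)
    print(nu, nu @ lengths)              # worst-case law and its average length
    print(np.sum(nu * np.log(nu / mu)))  # H(nu^{*,s} | mu) for this s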
Robust Shannon Coding
By the Kuhn-Tucker conditions, the optimum codeword lengths are
ℓ*_j = ⌈ (s* / (1 + s* ln D)) ln(C/µ_j) ⌉,   ∀j ∈ {1, ..., M}    (3)
where C = Σ_{i=1}^M e^{ℓ*_i/s*} µ_i. The ℓ*_i's and s* are found from a double minimization with respect to the lengths and s.
There are (M + 1) unknown values (ℓ*_1, ..., ℓ*_M, s*).
(3) gives M equations, and H(ν^{*,s*}|µ) = R gives the remaining one. We call this the Robust Shannon Code.
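A small sketch of equation (3) (not from the slides): given a multiplier s* and the constant C, the robust Shannon lengths follow by rounding up. The numeric values of s_star and C below are placeholders for illustration, not a solved (ℓ*, s*) instance.

    import numpy as np

    def robust_shannon_lengths(mu, s_star, C, D=2):
        # l*_j = ceil( s*/(1 + s* ln D) * ln(C / mu_j) )   (eq. 3)
        mu = np.asarray(mu, float)
        raw = s_star / (1.0 + s_star * np.log(D)) * np.log(C / mu)
        return np.ceil(raw).astype(int)

    # placeholder values, not a consistent solution of the full system
    print(robust_shannon_lengths([0.7, 0.2, 0.1], s_star=1.5, C=1.2))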
Robust Shannon Coding
Uniform µ: all codeword lengths are equal. Then ν^{*,s*} is also uniform, and s* → ∞.
This shows that we gain nothing by using the robust coding method if the nominal distribution is itself uniform. This result holds for any R > 0.
The worst-case distribution is µ itself.
Robust Shannon Coding
Theorem 2. The optimum distribution is given by
1) ν_i^{*,s} = µ_i^α / Σ_{j=1}^M µ_j^α,  where α = s ln D / (1 + s ln D),   ∀i ∈ {1, ..., M}    (4)
2) H(ν^{*,s}|µ) is a non-increasing function of s.
3) The codeword lengths are ℓ*_i(s) = ⌈ log_D(1/ν_i^{*,s}) ⌉.
Theorem 3. A necessary condition for the existence of a solution is
R ≤ (1/M) Σ_{i=1}^M ln(1/µ_i) − ln M = H(η|µ)    (5)
where η denotes the uniform distribution on Σ.
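The following sketch (my addition, with an illustrative µ) evaluates the closed form (4) and the existence bound (5): the worst-case law is a power-tilted version of µ, and admissible radii are limited by the relative entropy between the uniform law and µ.

    import numpy as np

    def nu_star(mu, s, D=2):
        # nu_i^{*,s} = mu_i^alpha / sum_j mu_j^alpha, alpha = s ln D / (1 + s ln D)   (eq. 4)
        mu = np.asarray(mu, float)
        alpha = s * np.log(D) / (1.0 + s * np.log(D))
        w = mu ** alpha
        return w / w.sum()

    def max_radius(mu):
        # R <= (1/M) sum_i ln(1/mu_i) - ln M = H(uniform | mu)   (eq. 5)
        mu = np.asarray(mu, float)
        return float(np.mean(np.log(1.0 / mu)) - np.log(mu.size))

    mu = [0.7, 0.2, 0.1]
    print(nu_star(mu, s=1.5))   # optimum worst-case law for this s
    print(max_radius(mu))       # largest admissible R, about 0.32 nats here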
Robust Shannon Coding
Remark 1. Coding with respect to the distribution ν* leads to an average length close to the Rényi entropy. This is similar to the result given in [Merhav91].
Lemma 1. Suppose {ℓ*_1, ..., ℓ*_M} and s* correspond to the robust Shannon code. Then
ℓ*_max ≤ log_D(1/µ_min)
s* ≤ (1/R)(log_D(1/µ_min) − H_D(µ))
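A short sketch of the Lemma 1 bounds, using an illustrative µ and R with D = 2 (my numbers, not the paper's):

    import numpy as np

    mu, R, D = np.array([0.7, 0.2, 0.1]), 0.1, 2
    H_D = -np.sum(mu * np.log(mu)) / np.log(D)        # entropy of mu in base D
    l_max_bound = np.log(1.0 / mu.min()) / np.log(D)  # l*_max <= log_D(1/mu_min), about 3.32
    s_bound = (l_max_bound - H_D) / R                 # s* <= (1/R)(log_D(1/mu_min) - H_D(mu)), about 21.7
    print(l_max_bound, s_bound)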
Robust Shannon Coding
1) Initialize s at the upper bound on s*. Then
   C(s) = (Σ_{i=1}^M µ_i^α)^{1/α},  where α = s ln D / (1 + s ln D).
2) ℓ_i = (s / (1 + s ln D)) ln(C(s)/µ_i)
3) ν_i^{*,s} = e^{ℓ_i/s} µ_i / Σ_{j=1}^M e^{ℓ_j/s} µ_j
4) If H(ν^{*,s}|µ) < R, decrease s by a fixed step size ∆ and go back to step 1.
5) Continue steps (1) to (4) until |H(ν^{*,s_0}|µ) − R| ≤ δ, where δ > 0 is a prescribed tolerance.
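Below is a minimal Python sketch of this procedure, not the authors' code: it assumes natural-log relative entropy, D = 2, an illustrative nominal distribution, and example values for the step size ∆ and tolerance δ; the initial s is taken at the Lemma 1 upper bound.

    import numpy as np

    def robust_shannon(mu, R, D=2, step=0.005, delta=1e-3):
        mu = np.asarray(mu, float)
        lnD = np.log(D)
        # initial s: the Lemma 1 upper bound on s*
        H_D = -np.sum(mu * np.log(mu)) / lnD
        s = (np.log(1.0 / mu.min()) / lnD - H_D) / R
        while s > 0:
            alpha = s * lnD / (1.0 + s * lnD)
            # step 1: C(s) = (sum_i mu_i^alpha)^(1/alpha)
            C = np.sum(mu ** alpha) ** (1.0 / alpha)
            # step 2: real-valued codeword lengths
            l = s / (1.0 + s * lnD) * np.log(C / mu)
            # step 3: worst-case distribution for these lengths
            w = np.exp(l / s) * mu
            nu = w / w.sum()
            H = np.sum(nu * np.log(nu / mu))
            # step 5: stop once the constraint is met within tolerance delta
            if abs(H - R) <= delta:
                return np.ceil(l).astype(int), nu, s
            # step 4: constraint still slack, so decrease s and repeat
            s -= step
        raise ValueError("no solution found; R may violate the Theorem 3 bound")

    lengths, nu, s_star = robust_shannon([0.7, 0.2, 0.1], R=0.1)
    print(lengths, nu, s_star)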