POLITECNICO DI TORINO
Faculty of Information Engineering
Degree course in Telecommunications Engineering

Master's Thesis

Multiple Description Coding for "Large Diversity" Networks

Advisors: Prof. Gabriella Olmo, Ing. Enrico Magli
Candidate: Enrico Baccaglini

September 2003

Acknowledgements

My thanks go to all the people I met at the École Polytechnique Fédérale de Lausanne. Particular thanks for their hospitality go to Luciano, Andrea, Thibaut, Deepak and Roberto, and to Guillermo and Baltasar for their help, not only academic, during the six months spent in Lausanne. Very special thanks to my friends at the Villa San Giuseppe university residence, a workshop of unique experiences.

Table of contents

Acknowledgements

1 Summary
  1.1 Introduction
    1.1.1 The MD system model
    1.1.2 The network model
  1.2 UEP coding
    1.2.1 The two-description system
    1.2.2 The system with more descriptions
    1.2.3 Comparison with SD coding
    1.2.4 Comparison among systems with more descriptions
  1.3 MDSQ coding
    1.3.1 The two-description system
    1.3.2 Comparison with SD coding
    1.3.3 The system with more descriptions
    1.3.4 Comparison among systems with more descriptions
  1.4 Conclusions

2 Introduction
  2.1 The MD model
  2.2 MD Techniques
  2.3 Network model
  2.4 Project contribution
  2.5 Applications

3 Background
  3.1 Joint source channel coding
    3.1.1 (M,k) source channel erasure codes
  3.2 Quantization
    3.2.1 Quantizer performance
    3.2.2 Uniform quantizer
    3.2.3 Nonuniform quantizer
    3.2.4 pdf-Optimized quantization
  3.3 Rate distortion theory
    3.3.1 Rate distortion region
    3.3.2 RD region for two descriptions
    3.3.3 RD region for many descriptions

4 UEP Coding
  4.1 Equal Error Protection
  4.2 Unequal Error Protection
  4.3 The two-description case
    4.3.1 Theoretical performance for two descriptions
  4.4 The M-description case
    4.4.1 Theoretical performance for M descriptions
    4.4.2 Optimal rate allocation
    4.4.3 Theoretical constraints
    4.4.4 Rate allocation results
  4.5 Comparison with Single Description Coding
  4.6 Performance with M descriptions
  4.7 Practical rate allocation
    4.7.1 Limitations of the allocation algorithm

5 MDSQ coding
  5.1 The two-description case
    5.1.1 Theoretical performance
    5.1.2 Implementation of the index assignment
    5.1.3 The incomplete matrix
    5.1.4 System for two descriptions
    5.1.5 Comparison of different index assignments
    5.1.6 Comparison with Single Description Coding
  5.2 The M-description case
    5.2.1 Construction of the hyper-cube
    5.2.2 Complete filling for three descriptions
    5.2.3 The incomplete cube
    5.2.4 System for three descriptions
    5.2.5 Implementation for R = 2 bits
  5.3 Simulation performance

6 Comparison and conclusions
  6.1 UEP coding
  6.2 MDSQ coding
  6.3 Theoretical bounds for UEP coding
  6.4 Future work

Bibliography

Chapter 1

Summary

This chapter introduces the thesis work carried out at the École Polytechnique Fédérale de Lausanne (EPFL), in the Laboratoire de Communications Audiovisuelles (LCAV) directed by Prof. Martin Vetterli, under the guidance of Dr. Baltasar Beferull-Lozano and Ing. Guillermo Barrenechea. The following sections present a summary, required by the host university, of the research that is described in detail in the subsequent chapters.

1.1 Introduction

Many current multimedia communication systems transmit data using progressive coding. This allows the receiver to produce an estimate of the original information proportionate to the amount of data received, provided the data arrive in the same order in which they were sent. If part of the data is lost, the receiver requests its retransmission, and the quality of the representation remains that of the last correctly received in-order sequence. Retransmission, however, is not always possible. For example, if the communication is unidirectional, the receiver cannot send a message back to the transmitter. Or, if the channel is bidirectional, the receiver's feedback might generate too much traffic, as happens in broadcast communications, i.e. between one source and many receivers. Finally, for some real-time applications, the retransmitted information may become obsolete in the time it takes to reach the destination.

Figure 1.1. A two-description MD system: the encoder sends a description over each of the two channels; Decoders 1 and 2 receive one channel each, Decoder 0 receives both.

If packet losses in the communication network are unavoidable but retransmission is not possible, one can adopt coding techniques that make all received data immediately useful, regardless of their delivery order. Multiple description (MD) coding techniques realize this kind of transmission: they allow the original information to be estimated despite the absence of part of it, obtaining a representation of the source proportionate to the amount of data received. In traditional "single description" techniques, the encoder sends a single bit sequence over the network, possibly split into several consecutive packets. A multiple description encoder, instead, splits the information into several packets that are encoded so as to be complementary and independently usable.

1.1.1 The MD system model

For a two-description MD system, the reference model is shown in figure 1.1. The encoder receives a sequence of source symbols to transmit to three receivers over two channels that introduce no bit errors. The central decoder (Decoder 0) receives the information transmitted over both channels, while Decoder 1 and Decoder 2 receive only the information sent over their respective channel. Each channel has rate R_i (i = 1,2) bits per symbol. In this model we can associate each channel with one description (packet) sent over the network. Depending on the number of descriptions received, the receiver obtains a representation of the source using one of the three decoders. The quality of each sequence is measured in terms of distortion, computed as the mean squared error between the original and the estimated sequence.

Figure 1.2. A three-description MD system.

In particular, the central distortion (Dc) indicates the quality obtained by the receiver that has access to both channels (here, Decoder 0), while the side distortion (Ds) refers to each side decoder. Designing a good MD system means finding representations to send over each channel such that the quality of the decoded information increases with the number of descriptions received. From a theoretical standpoint, the admissible values of Dc and Ds as a function of the rates R_i are known only for a few particular source types. For a Gaussian source with variance σ², one obtains [14]:

$$D_s = D_i \ge \sigma^2\, 2^{-2R_i}, \quad i = 1,2 \qquad (1.1)$$

$$D_c = D_0 \ge \sigma^2\, 2^{-2(R_1+R_2)} \cdot \gamma_D(R_1,R_2,D_1,D_2) \qquad (1.2)$$

with

$$\gamma_D = \frac{1}{1 - \left(\sqrt{(1-D_1)(1-D_2)} - \sqrt{D_1 D_2 - 2^{-2(R_1+R_2)}}\right)^2} \qquad (1.3)$$

The extension of this model to M descriptions relies on M channels and 2^M − 1 receivers, each corresponding to a particular subset of received descriptions. An example is given in figure 1.2. The theoretical analysis still poses open problems, and only some of the achievable combinations of distortions and rates are known, presented in [16, 23]. Among the various techniques to produce multiple descriptions, we recall MDSQ coding, introduced in [21], which creates descriptions through quantizers, and UEP coding, which makes use of channel codes [13, 17].

The goal of this thesis is to study MD coding systems based on MDSQ and UEP with more than two descriptions. We analyze the performance of the two systems in terms of complexity and of the quality obtained at the receiver, and we compare these techniques with classical single description coding.

1.1.2 The network model

We characterize the communication network by the probability p of losing a data packet during its transmission to the receiver. We assume that the packet length (in bits) does not affect p and that there are no transmission errors on the received bits. We further assume that the loss probability p at time t does not depend on the loss probability at time t − 1 on the same channel, nor at time t on another channel. This model is particularly suited to "large diversity" networks, in which packets reach the destination through different, independent paths and the loss probability on each channel is assumed equal to p. We define the end-to-end distortion as:

$$\text{Distortion} = D_c \cdot (1-p)^M + \sum_{k=1}^{M-1} \binom{M}{k} \cdot D_{s_k} \cdot (1-p)^k \cdot p^{M-k} + \sigma^2 \cdot p^M \qquad (1.4)$$

This expression measures the quality of the information available at the receiver as a function of p and of the number of descriptions received. The goal of the MD systems implemented in this project is to minimize this distortion. The first term indicates that, with probability (1 − p)^M, all descriptions are received and the central distortion Dc is obtained. The second term gives the distortion when k out of M descriptions are received. Finally, if all descriptions are lost, the receiver can only estimate the original information through the source variance. The values of Dc and Ds_k (1 ≤ k ≤ M − 1) depend on the particular technique used.
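As a sketch of how this metric is used throughout the chapter, the following Python function (a hypothetical helper, not code from the thesis) evaluates equation 1.4 given the central distortion, the side distortions and the source variance:

from math import comb

def end_to_end_distortion(p, Dc, Ds, sigma2):
    """Evaluate the end-to-end distortion of eq. 1.4.
    Ds is the list [Ds_1, ..., Ds_{M-1}], where Ds_k is the distortion
    obtained when exactly k of the M descriptions are received."""
    M = len(Ds) + 1
    d = Dc * (1 - p)**M                          # all M descriptions arrive
    for k in range(1, M):                        # exactly k of M arrive
        d += comb(M, k) * Ds[k - 1] * (1 - p)**k * p**(M - k)
    return d + sigma2 * p**M                     # all descriptions lost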

1.2 UEP coding

With the Unequal Error Protection (UEP) coding technique, a transmitter generates several descriptions from a progressive bitstream using various channel codes. In this sequence the first bits are more important than the following ones, so it is advisable to protect them more strongly during transmission.

Figure 1.3. A two-description UEP system. The first ζR bits of description D2 are a copy of the first ζR bits of description D1.

1.2.1 The two-description system

To create two descriptions of R bits each, one can consider (2 − ζ)R bits produced by a progressive encoder. In our case, the bitstream is obtained by quantizing, with 2R bits of resolution, a source emitting Gaussian random variables with variance σ²; the first (2 − ζ)R bits of this bitstream are considered. The parameter ζ (0 ≤ ζ ≤ 1) indicates the fraction of bits to protect more strongly, which are inserted in both descriptions. An example is given in figure 1.3. If only one of the two descriptions is received, the first ζR bits of information can be reconstructed, while with both packets available, (2 − ζ)R bits are decoded. Note that 2R bits in total are transmitted over the network but, of these, only (2 − ζ)R carry information; the remaining ζR are added to protect a fraction of the most important data. The analytical expressions for the UEP technique, for a Gaussian source with variance σ², are:

$$D_{s_1} = D_{s_2} = D_s = \sigma^2 \cdot 2^{-2\zeta R}, \qquad D_c = \sigma^2 \cdot 2^{-2(2-\zeta)R} \qquad (1.5)$$

Each value of ζ implies one value of side distortion Ds and one of central distortion Dc. Figure 1.4 shows these values as ζ varies, with R = 3 bits. When ζ ≈ 0, a high side distortion and a low central distortion are obtained, because the two descriptions contain few repeated bits and, consequently, receiving both allows a long portion of the original bit sequence to be decoded.

Figure 1.4. Central and side distortion values [dB] as ζ varies, for a two-description system with R = 3 bits.

On the contrary, with ζ ≈ 1 the two descriptions contain the same bits, and there is no gain in distortion when both packets are received.
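Since for M = 2 the optimization over ζ is a one-dimensional exhaustive search, it can be sketched directly from equations 1.4 and 1.5. The snippet below is an illustrative reconstruction under those equations, not the thesis code:

import numpy as np

def best_zeta_two_descriptions(p, R=3, sigma2=1.0):
    """Exhaustive search over zeta for the two-description UEP system:
    the distortions of eq. 1.5 are plugged into eq. 1.4 with M = 2."""
    zetas = np.linspace(0.0, 1.0, 1001)
    Ds = sigma2 * 2.0**(-2 * zetas * R)            # eq. 1.5, one packet received
    Dc = sigma2 * 2.0**(-2 * (2 - zetas) * R)      # eq. 1.5, both packets received
    d = Dc*(1 - p)**2 + 2*Ds*(1 - p)*p + sigma2*p**2   # eq. 1.4 for M = 2
    i = int(np.argmin(d))
    return zetas[i], d[i]

zeta_opt, d_min = best_zeta_two_descriptions(p=0.08)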

1.2.2 The system with more descriptions

To build an MD-UEP system with M descriptions, channel codes with different rates are used. In particular, the original bitstream is divided into M sections, each transmitted with channel codes of rate 1/M, 2/M, ..., j/M, ..., 1 (1 ≤ j ≤ M). In this case ζ is a vector of M elements, and each component ζ_i (0 ≤ i < M) indicates the fraction of bits, in each description, protected with a rate-(i + 1)/M code. The bit sequence is divided into M sections, each corresponding to a particular quality level, as shown in figure 1.5. The i-th section can be decoded when at least i packets are received. The channel codes used belong to the Reed-Solomon (RS) family, which allows the i-th quality level to be decoded upon receiving any i of the M descriptions. In particular, the i-th section is divided into i portions of the same length, and each bit of these portions is inserted in a different description.

Figure 1.5. Progressive bitstream divided into M quality levels, with boundaries R_0, R_1, ..., R_{M−1}; the bits of level i are spread over descriptions 1, ..., i.

Figure 1.6. An MD-UEP system with a parity code and three descriptions D1, D2, D3.

The redundancy bits are inserted into the remaining M − i descriptions according to the corresponding RS code. An example is given in figure 1.6, which uses parity codes and three descriptions. In the example, the vector ζ is [0.25, 0.5, 0.25]: in each description, one bit out of four (0.25) is repeated in every packet (repetition code), two bits out of four (0.5) are protected with a (3,2) code, and the last bit of each description is not protected by any code. From a theoretical standpoint, expressions analogous to the two-description case are obtained. In particular, denoting by R̂_k the number of bits decoded upon receiving k descriptions:

$$\hat{R}_k = \sum_{i=0}^{k-1} (i+1)\,\zeta_i R \qquad (1.6)$$

we obtain:

$$D_{s_k} = \sigma^2 \cdot 2^{-2\hat{R}_k}, \qquad D_c = D_{s_M} = \sigma^2 \cdot 2^{-2\hat{R}_M} \qquad (1.7)$$

where Ds_k denotes the distortion obtained upon receiving k descriptions.
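The following sketch (hypothetical helper names, assuming ζ and R as defined above) evaluates equations 1.6 and 1.7 and reproduces the parity-code example of figure 1.6:

def uep_decodable_bits(zeta, R, sigma2=1.0):
    """Decodable bits R_hat_k (eq. 1.6) and distortions Ds_k (eq. 1.7)
    for an M-description UEP system; zeta[i] is the fraction of each
    description protected with a rate-(i+1)/M code."""
    M = len(zeta)
    R_hat = [sum((i + 1) * zeta[i] * R for i in range(k)) for k in range(1, M + 1)]
    Ds = [sigma2 * 2.0**(-2 * r) for r in R_hat]   # Ds_1, ..., Ds_M (Ds_M = Dc)
    return R_hat, Ds

# Parity-code example of figure 1.6: three 4-bit descriptions.
R_hat, Ds = uep_decodable_bits([0.25, 0.5, 0.25], R=4)
# R_hat == [1.0, 5.0, 8.0]: any single packet yields 1 bit, any two
# yield 5, and all three yield the full 8 information bits.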

Figure 1.7. Central and side distortion triples (Dc, Ds1, Ds2, in dB) for a three-description system with R = 3 bits.

Figure 1.7 shows the distortion triples admitted by a three-description system with R = 3 bits. As in the two-description case, we want to minimize the end-to-end distortion of the system, given by equation 1.4. With M = 2 it is possible to perform an exhaustive search for the optimal value of the parameter ζ over a set of candidate values, while for M > 2 this is not feasible because ζ is a vector of dimension M and the complexity of this problem is O(2^M) [2]. It is therefore necessary to implement an algorithm with lower complexity. The technique used is based on the one adopted in [17] and, given the loss probability of the network, yields the optimal vector ζ minimizing expression 1.4 with complexity linear in M. For the UEP technique, equation 1.4 can in fact be equivalently written as:

$$E_d = q_{-1}\,\sigma^2 + \sum_{j=0}^{M-1} q_j\, D(R_j) \qquad (1.8)$$

where:

• q_j is the probability that j + 1 out of M packets reach the receiver:

$$q_j = \binom{M}{j+1} \cdot (1-p)^{j+1} \cdot p^{M-(j+1)} \qquad (1.9)$$

• D(R_j) is the distortion-rate function evaluated at R_j, i.e. at the point separating quality level j from level j + 1 in the bitstream, as shown in figure 1.5.

The total rate R_m at the encoder output is given by:

$$R_m = \frac{R_0}{1}M + \frac{R_1 - R_0}{2}M + \ldots + \frac{R_{M-1} - R_{M-2}}{M}M \qquad (1.10)$$

or, equivalently,

$$R_m = \sum_{j=0}^{M-1} \alpha_j R_j \qquad (1.11)$$

with

$$\alpha_j = \frac{M}{(j+1)(j+2)} \ \text{ for } j = 0, \ldots, M-2, \qquad \alpha_{M-1} = 1$$

Equation 1.11 is the constraint of the MD-UEP system: it requires that the total rate at the encoder output, divided over M descriptions, be less than or equal to M · R bits, the length of the original bitstream. To obtain the values of R_j that minimize 1.8 under constraint 1.11, the Lagrange multiplier technique can be used [2]:

$$L_c(R_1, \ldots, R_{M-1}, \Lambda) = q_{-1}\sigma^2 + \sum_{j=0}^{M-1} q_j D(R_j) + \Lambda \left( \sum_{j=0}^{M-1} \alpha_j R_j - R_m \right) \qquad (1.12)$$

The optimal solution is obtained by locating the points on the D(R) curve whose slopes are proportional to α_j/q_j. It was shown in [17] that if the original sequence α_j/q_j is monotonically decreasing, the solution obtained is optimal; otherwise the sequence must be converted into an equivalent one that restores monotonicity. Implementation details are given in section 4.4.2.
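A compact sketch of this allocation, assuming the Gaussian model D(R) = σ²2^{−2R} used throughout the chapter, 0 < p < 1, and a monotonically decreasing α_j/q_j (so no sequence conversion is needed), could look as follows; the bisection on Λ is an illustrative choice, not necessarily the implementation of [17]:

import numpy as np
from math import comb, log

def uep_rate_allocation(M, R, p, sigma2=1.0):
    """Sketch of the Lagrangian rate allocation of eq. 1.12, assuming
    D(R) = sigma2 * 2**(-2R) and 0 < p < 1 (so every q_j is positive)."""
    q = np.array([comb(M, j + 1) * (1 - p)**(j + 1) * p**(M - (j + 1))
                  for j in range(M)])                       # eq. 1.9
    alpha = np.array([M / ((j + 1) * (j + 2)) for j in range(M - 1)] + [1.0])
    budget = M * R                                          # constraint of eq. 1.11

    def rates(lam):
        # Stationarity of eq. 1.12 for the exponential D(R):
        # q_j * (-2 ln 2) * sigma2 * 2**(-2 R_j) + lam * alpha_j = 0
        Rj = -0.5 * np.log2(lam * alpha / (2 * log(2) * q * sigma2))
        return np.maximum(Rj, 0.0)                          # rates cannot be negative

    lo, hi = 1e-12, 1e6               # geometric bisection on the multiplier:
    for _ in range(200):              # a larger lam gives smaller rates
        lam = np.sqrt(lo * hi)
        if np.dot(alpha, rates(lam)) > budget:
            lo = lam
        else:
            hi = lam
    return rates(lam)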

1.2.3 Comparison with SD coding

With a single description (SD) technique, suppose the original bitstream is placed in a single packet. If it reaches the destination, the distortion equals that obtained with M · R information bits; if the packet is lost, the source is estimated only through its variance. The end-to-end distortion function for this system is:

$$\text{Distortion} = \sigma^2 \cdot 2^{-2MR} \cdot (1-p) + \sigma^2 \cdot p \qquad (1.13)$$

A comparison between this technique and a two-description system (M = 2) is given in figure 4.13, with a bitstream of length M · R = 6 bits. The MD system yields a lower end-to-end distortion over a wide range of values of p; it can in fact rely on two packets into which an optimal level of redundancy has been inserted. The maximum difference is about 6.7 dB around p ≈ 0.08. Note, however, that for particularly low values of p, single description coding gives better results, gaining about 1.5 dB around p ≈ 2.4·10⁻⁴. This result can be explained analytically: the distortion obtained by an MD-UEP system with ζ = 0 (i.e. a system that adds no redundancy) is always greater than or equal to that of an SD system transmitting the same sequence. When p is very low, the MD system adds no redundancy, so its distortion is higher than that of the SD system.

1.2.4 Comparison among systems with more descriptions

We now analyze the performance of systems that use different numbers of descriptions to transmit the progressive information sequence. In figure 4.15 we compare the end-to-end distortion of several systems as the network loss probability p varies. The original sequence is 12 bits long, and the system uses from one to six descriptions to transmit the information; the rate of each description is adjusted so that the product M · R stays constant at 12 bits. As the figure shows, using the largest possible number of descriptions yields the lowest distortion over a wide range of values of p. For low values of p, the equivalent single description system is the best, up to p ≈ 7·10⁻⁸. The vector ζ obtained through the Lagrange multiplier minimization has non-integer components. To satisfy this allocation exactly, we would have to consider fractions of bits of the original sequence. This, of course, is not possible, so the vector ζ must be modified to correspond to an integer number of bits of the original bitstream.

Figure 1.8. Comparison of different UEP systems (M = 2, 3, 4, 6, 8) with a bitstream of MR = 48 bits: end-to-end distortion [dB] versus P(loss).

The mechanism that performs this operation (explained in detail in section 4.7) imposes a constraint on the maximum number of descriptions that can be used. To simulate the system faithfully, sufficiently large values of R are needed, which places a limit on the maximum value of M. A result using a 48-bit bitstream is shown in figure 1.8, where the end-to-end distortion is plotted as a function of p. Over the range of values considered, using 8 descriptions is preferable to the other configurations based on fewer packets.

1.3 MDSQ coding

The Multiple Description Scalar Quantization (MDSQ) technique creates several descriptions by means of quantizers. Each quantizer offers an R-bit representation of the information source, and each packet sent to the destination carries one of these descriptions. The quantizers should be designed to be individually good and, at the same time, when used jointly, to give a representation proportionate to the sum of their individual rates.

Figure 1.9. An MDSQ encoder with two descriptions: a quantizer α followed by the index assignment ℓ, producing the index pair (i1, i2).

Figure 1.10. An index assignment matrix with B = 10 levels.

The two-description case is treated in [21], and a graphical representation is given in figure 1.9. For every input value x ∈ ℝ, an MDSQ encoder transmits a pair of indices (i1, i2), each represented with R bits. This process can equivalently be seen as a quantization with B possible output levels (B ≤ 2^{2R}) followed by an index assignment operation (ℓ) that associates the two indices with each level. This function must be invertible so that the receiver, knowing it, can estimate the source upon receiving one or both indices. As with UEP coding, depending on the number of indices received, we evaluate the quality through the central distortion (Dc, both indices reach the destination) or the side distortion (Ds, only one index is available at the receiver). The indices (i1, i2) can be visualized as the row and column indices of the index assignment matrix. This matrix is square, since we assume each index is represented with R bits, so in total it has 2^R rows and 2^R columns. The matrix contains every possible quantizer output value, and each index combination refers uniquely to a particular cell of the matrix, hence to a particular quantizer level. Figure 1.10 shows a possible index assignment matrix with B = 10 levels. If only one index is received, the receiver must infer the transmitted value knowing only the row or the column in which it lies. This uncertainty grows with B, because the total number of occupied cells in the matrix grows. The redundancy is defined as:

$$\rho = 2R - \log_2 B \qquad (1.14)$$

The index assignment matrix can be filled with at most B = 2^{2R} numbers, which implies 0 ≤ ρ ≤ R. Different index assignments are possible. A matrix with few occupied cells corresponds to sending information with high redundancy: this solution suits the case in which only one of the indices is received, because little information would be added by also receiving the other. Conversely, if many values are inserted in the matrix, the quantizer has a large number of possible output levels and the central distortion is low: this configuration suits the case in which both indices reach the decoder. The index assignment matrix should be updated according to p, so that the least possible redundancy is inserted into the system. As with a UEP system, we want to find the optimal number of cells to fill in the index assignment matrix so as to minimize the end-to-end distortion 1.4.

1.3.1 The two-description system

Optimizing the index assignment is very complex: no optimal algorithms are known, only heuristic techniques. The main idea is to fill the cells of the matrix from the top left toward the bottom right, starting from the central diagonal. We adopt the method studied in [1], where an algorithm is implemented to place 2^{2R} numbers (from 1 to 2^{2R}) in a 2^R × 2^R matrix so as to minimize the difference (spread) between the largest and the smallest number in every row and column of the matrix. The algorithm guarantees a maximum spread of

$$\frac{N(N+1)}{2} - 1 \qquad (1.15)$$

where N denotes the side length of the matrix, which in our case is 2^R. In [1] it was shown that this algorithm is optimal, in the sense that it guarantees the minimum possible spread. An example is given in figure 1.11. An index assignment with 2^{2R} numbers is appropriate when both indices are received, because the quality of the representation at the receiver is proportionate to log₂ N² = 2R bits. When only one of the indices is received, however, the receiver has great uncertainty about the original quantized value. It is therefore necessary to be able to fill the matrix with fewer than 2^{2R} numbers as well, in order to obtain other combinations of central and side distortion that guarantee better performance for different values of p.

1  2  9 10
3  4 11 12
5  6 13 14
7  8 15 16

Figure 1.11. Arrangement for R = 2 bits that guarantees a spread of 9.

To do so, we start from the previous arrangement, remove some numbers from the outermost cells, and renumber the remaining ones. In this way we follow the heuristic techniques proposed in [21], based on algorithms that fill the matrix by diagonals, starting from the innermost one. Note that the total number of arrangements we consider equals the number of diagonals in a matrix of size 2^R × 2^R, that is:

$$D = 2^{R+1} - 1 \qquad (1.16)$$

When the network conditions change (in terms of p), the encoder modifies the number of filled diagonals in the index assignment matrix. Figure 5.12 shows the end-to-end distortion for several values of p and different numbers of filled diagonals in the case R = 2 bits.

1.3.2 Comparison with SD coding

As for UEP, we also compare a two-description MDSQ system with a single description system. In the latter, the source is quantized at the quantizer's highest resolution, i.e. with B = 2^{2R} levels. The end-to-end distortion is:

$$\text{Distortion} = D_{MR} \cdot (1-p) + \sigma^2 \cdot p \qquad (1.17)$$

where D_{MR} is the distortion due to the quantizer, i.e. the sum of the granular and overload distortion. Figure 5.14 shows that the MDSQ system outperforms the single description one, guaranteeing a lower distortion at the receiver. As in the UEP case, for very low loss probability p the single description system is better than the MDSQ system. However, as R increases, the intersection of the two curves moves to lower and lower values of p: the MDSQ system can better optimize the redundancy it inserts, widening the range of values of p for which its end-to-end distortion is lower.

1.3.3 The system with more descriptions

Generalizing this coding technique to more than two descriptions implies using M > 2 quantizers on the same information source. In this case, the index assignment function can be seen as an arrangement of B numbers in a hypercube. The critical step is the assignment of the M indices to send to the receiver. The method used is based on filling a hypercube of side 2^R in M dimensions with 2^{MR} numbers, using the technique proposed in [1]. Unlike the two-dimensional case, this arrangement is not optimal; we must also note that finding the arrangement that minimizes the spread in the hypercube is, as explained in [1], an NP-complete problem. The proposed algorithm guarantees a spread in each row of at most

$$B(K_2^M) \cdot \left( \frac{N}{2} \right)^M + \frac{N}{2} - 1 \qquad (1.18)$$

where

$$B(K_2^M) = \sum_{t=0}^{M-1} \binom{t}{\lfloor t/2 \rfloor}$$

and N is the side length of the hypercube (in our case, N = 2^R). Because of the complexity of the index assignment function in more than two dimensions, the implementation of the MDSQ system was limited to the case of three descriptions. In this case, the number of diagonals in the cube is given by

$$D = (2N - 1)^2 - N(N-1) \qquad (1.19)$$

with N = 2^R. This expression also indicates the complexity of the algorithm: for each arrangement (expressed as the number of filled diagonals), the system computes the end-to-end distortion and chooses the arrangement that guarantees the minimum value. The number of arrangements to consider grows exponentially with the rate of the individual descriptions. The spread in each row and in each plane of the index assignment cube grows as the number of filled diagonals increases.

As in the two-description case, this method of filling the cube does not allow precise tuning of the system: the innermost diagonals of the cube are composed of many cells while, moving away from the main diagonal, the diagonals contain fewer and fewer values. For the three-description system, given the probability p, the encoder adjusts the number of diagonals in the index assignment cube so as to minimize the end-to-end distortion. Once the arrangement minimizing this distortion has been found, the transmitter sends the coordinates, as indices, of the quantized original value. The receiver holds a local copy of the index assignment function and, having received a subset of the indices, estimates the original value. The details of this reconstruction are given in section 5.1.4 and also hold for the two-dimensional case.

1.3.4 Comparison among systems with more descriptions

We compared a two-description system (each description on R = 3 bits) with a three-description system (R = 2 bits), so that the product M · R is constant and equal to 6. The simulations show that the best results are obtained using two indices. In figure 5.25 the end-to-end distortion of the two systems is plotted as a function of the loss probability p. For very high or very low values of p, the two systems produce roughly the same distortion level but, over the wide range between p ≈ 7·10⁻⁴ and p ≈ 0.5, the two-description system gains up to 3 dB over the three-description one. This behavior can be explained mainly by the fact that the optimal way to fill the index assignment cube is not known. Moreover, the algorithm we use takes a completely filled cube as its initial arrangement and removes the contents of some of the outermost cells; this may not always yield arrangements suited to the different loss probabilities p. In addition, the arrangement that completely fills the cube minimizes the spread in each row without considering what happens in the planes of the cube, i.e. when a single index is received. Indeed, as figure 1.12 shows, the side distortion upon receiving a single index grows quickly as more diagonals are filled. Considering an arrangement that minimizes the spread in each plane could lead to better performance.

Figure 1.12. Central and side distortion (Dc, Ds2, Ds1, in dB) as a function of the number of filled diagonals, for a three-description MDSQ system with R = 2 bits.

1.4 Conclusions

This thesis analyzed the performance of two multiple description coding systems in terms of implementation complexity and of the quality of the information obtained at the receiver. We verified that these techniques make it possible to cope with packet losses in networks where retransmission is not always possible. In the analysis of both systems, the crucial step for good performance is inserting the right level of redundancy into the transmitted packets. For the UEP technique, the results indicate that, from a theoretical standpoint, the larger the number of descriptions used, the lower the distortion introduced by the system; this holds over a wide range of network loss probabilities. A limit on the number of descriptions may come from the fact that, to keep the transmitter output rate constant as the number of descriptions varies, it must be possible to create packets carrying few information bits. This could lead to an unfavorable ratio between total packet length and transmitted information.

Further studies could include performance comparisons of this coding with data streams of practical interest, such as SPIHT [18] or JPEG 2000 [20]. For the MDSQ technique, on the other hand, we highlighted that the main obstacle to creating more than two descriptions is the complexity of the system, which grows exponentially both with the number of descriptions and with their rate. Minimizing the distortion for a given loss probability means finding the best compromise between central and side distortion in terms of occupied diagonals in the index assignment hypercube, and this number grows very rapidly with the number of descriptions and with the rate. From a theoretical standpoint, moreover, the UEP technique was seen to yield a higher end-to-end distortion than the MDSQ technique, at least in the two-description case. Indeed, UEP coding can be regarded as a particular index assignment, and it was verified that this arrangement leads to higher distortions than others which, keeping the same spread, allow more numbers to be placed in the arrangement and thus guarantee a lower central distortion. Further work could aim at algorithms that improve the index assignment function while reducing its complexity, so that this technique can be used with an even larger number of descriptions.


Chapter 2

Introduction

Current multimedia systems typically generate content using progressive coding. Quality improves through successive refinements as the number of consecutive received packets increases. Progressive transmission is effective when packets are received in order and without losses. When a loss occurs, the decoder requests a retransmission and the reconstruction stalls until that particular packet is received. The delay in receiving a retransmitted packet may be much longer than the interarrival time between packets. In a TCP/IP-based system, if a packet needs to be retransmitted, even if all the following ones have already been received, the quality at the receiver will be proportionate to the first complete, in-order sequence: these protocols guarantee applications an ordered sequence of packets. When a retransmission is required, the receiver tells the sender either what arrived or what did not, using a protocol of the family called automatic repeat request (ARQ), and the sender retransmits the missing packet until the message is received correctly.

Unfortunately, retransmission is not always possible. The transmission may be strictly one-way, so that there is no way to get a message back from the receiver to the sender to report received or missing packets. Moreover, even if the transmission could potentially be bidirectional, the feedback could generate too much traffic: in a broadcast communication (one sender, many receivers), acknowledgement messages could congest the network.

When packet losses are sporadic, retransmission makes efficient use of network resources. When packet losses are frequent, retransmission can create an even more congested environment, and real-time services particularly suffer from this situation. Retransmissions imply an added delay of at least one round-trip time: the time for a retransmission request to reach the sender plus the time for the retransmitted packet to come back to the receiver. This can be unacceptable for interactive communications and for audio or video streaming, because the information contained in a retransmitted packet may be obsolete by the time it reaches the destination.

When retransmissions are not possible, the technique commonly used to protect data and to allow the receiver to deal with losses is forward error correction (FEC). Before sending bits on the transmission channel, the sender adds redundancy to the source bits so that, when a subset of them reaches the receiver, the original information can be recovered. Examples of FEC techniques (also called channel codes) are parity bits and Reed-Solomon codes [9]. However, reliable use of channel codes requires long block sizes, and this creates difficulties associated with delay.

In earlier days there was almost exclusive interest in reducing the effect of bit errors. In today's communication systems, queueing delays and buffer overflows seem to have more impact than bit errors, so there is great interest in mitigating the effects of lost packets. If losses are inevitable and retransmission is not possible, representations that make all of the received packets useful, and not only the consecutive ones, can be of great benefit: one can estimate the original information despite packet losses and obtain a reproduction quality proportionate to the number of packets received. Multiple Description (MD) coding applies precisely to this situation.

In conventional, "single description" (SD) source coding, a source encoder produces a single sequence of bits that is received without errors by a source decoder. An MD source encoder produces bits that are partitioned into descriptions. We can associate these descriptions with packets sent over a network. The decoder is able to compute an estimate of the source from any subset of these descriptions. The quality of the estimate depends on how many descriptions have been received but, in contrast to the single-description case, the loss of packets does not lead to a failure. In both SD and MD coding, encoder and decoder are oblivious to the transport mechanism. MD coding can be described as a generalized source coding problem.

Figure 2.1. An MD system with 2 descriptions.

With descriptions corresponding to packets, MD coding is plainly applicable to packet-based communications subject to packet losses. The growth of multimedia content on the Internet is therefore spurring great interest in MD coding.

2.1 The MD model

MD coding refers to the scenario depicted in figure 2.1, where we consider the case of two descriptions. An encoder receives a sequence of source symbols {X_k}_{k=1}^N to communicate to three receivers over two noiseless (or error-corrected) channels. One decoder (the central decoder) receives the information sent over both channels, while the remaining two decoders (the side decoders) receive information only over their respective channel. The encoder sends each description along a different channel, so we can associate each channel with a particular description. The transmission rate over channel i is denoted by R_i (i = 1,2) in bits per source sample. The reconstruction sequence produced by decoder i is denoted by {X̂_k^{(i)}}_{k=1}^N.

If an information source is described by separate descriptions, the central MD problem consists in finding the concurrent limitations on the qualities of these descriptions, taken separately and jointly. Intuitively, the sequence recreated by the central decoder will be more accurate than the one obtained by either of the two side decoders.

The natural extension to more than two descriptions is to M channels and 2^M − 1 receivers, each corresponding to a particular subset of descriptions received. This generalization is of great practical importance for situations in which the network presents many possible paths (channels) to reach the destination.

Figure 2.2. An MD system with M = 3 descriptions.

An example with M = 3 descriptions is given in figure 2.2. The source encoder generates three different descriptions and sends them along different channels. Decoder Dec_i receives description i (1 ≤ i ≤ 3) and creates a coarse version of the original source from it. Dec_ij receives descriptions i and j (1 ≤ i,j ≤ 3, i ≠ j) and can reconstruct a more accurate representation of the original information. Decoder Dec_123 receives all the descriptions. To create an MD system, we have to find good representations to send on each channel so that the quality of the estimate increases with the number of channels (descriptions) that reach the destination.

2.2 MD Techniques

Many techniques have been proposed to generate multiple descriptions; most of them, however, deal only with the two-description case. In [21], MD scalar quantization (MDSQ) was introduced to represent a single scalar random variable with two descriptions. MDSQ can be formally extended to vector quantization (MDVQ), where the MD encoder receives many samples at once, but the actual design and implementation are significantly more complicated than for MDSQ. This problem has been addressed in [12], where an optimization framework is proposed for an arbitrary number of channels. A possible approach to limiting the complexity of vector quantization is to use lattices: in [22] a scheme called Multiple Description Lattice Vector Quantization (MDLVQ) is suggested. Recently, techniques based on transform coding [7] and on coding with frames [8] have emerged.

A powerful technique to generate multiple descriptions and gain robustness to the loss of descriptions is Unequal Error Protection (UEP). It consists in inserting into each description an optimal level of redundancy, using channel codes, to combat the unreliability of the network. If some descriptions are lost, a coarse version of the original information can still be decoded from the available descriptions thanks to the properties of the channel codes. This technique can be generalized to more than two descriptions, as described in [13] and [17].

2.3 Network model

The network model we use is characterized by the probability p of losing a packet as packets move from the source to the destination. Packet loss happens mainly because the network becomes congested and routers start dropping packets at random. We suppose that the probability p does not depend on the length of the packet: big packets and small ones have the same probability of being dropped during transmission. We consider the probability of losing a packet due to bit errors negligible: if a packet reaches the destination, it contains no bit errors and can be safely decoded. We also suppose that packet losses are uncorrelated over time and across channels; that is, the probability of losing a packet on one channel at time t depends neither on the probability of losing a packet on the same channel at time t − 1, nor on the probability on another channel at time t. This network model is suitable for large diversity networks, where packets flow from the source to the destination through many different and independent paths and the probability of failure of any one of these paths is not negligible. In our model, we suppose that all these paths have the same failure probability p.

2.4 Project contribution

Our goal is to analyze coding systems based on UEP and MDSQ that use more than two descriptions to transmit the source information. We study their performance in terms of complexity and of quality at the receiver, and we compare MD coding with classical single description coding under many network conditions. In chapter 3 some theoretical background is introduced: rate-distortion theory and the MD rate-distortion region are presented, together with an overview of quantization and channel coding. Chapter 4 deals with the UEP technique: we start from the simplest two-description case and then analyze the generalization to more than two descriptions; a technique to create many representations from a single source is given, together with a possible practical implementation. Scalar quantization is the topic of chapter 5, where existing systems are introduced before dealing with the general case of many descriptions; bounds and implementation issues are presented. Finally, in chapter 6, UEP and MDSQ are compared in terms of performance (average end-to-end distortion) and complexity.

2.5 Applications

Many types of information are useful at more than one quality level. For example, where a high-quality image is useful, a lower-quality image may be sufficient for some users. If any user is satisfied by a lower-quality version of an information source, an MD representation may be suitable. MD coding generally requires some redundancy in the encoded representations to gain robustness against packet losses. Some compression efficiency is thus sacrificed, so the technique should only be applied when the loss in compression is offset by the advantage of mitigating transport failures. MD coding techniques apply naturally to packet networks. In such networks, packets are lost for many reasons; this can occur seemingly at random when an intermediate node along a packet's path becomes congested, resulting in buffer overflows. In the Internet, for example, this produces a packet loss probability that varies with the time of day and with the connection routing.

In addition, the Internet is becoming very heterogeneous as backbone capacities increase while more low-bandwidth, wireless devices are connected. Moving data from a higher-bandwidth to a lower-bandwidth link may require dropping packets to accommodate the lower capacity. Using a multiresolution or layered source coding system is a good solution if the network can treat some packets differently but, in general, networks do not look inside packets, so packets are dropped at random. In a network using Internet Protocol version 6 (IPv6) [4], a node is required to handle at least 576-byte packets without fragmentation. Sending packets with less than 536-byte payloads is wasteful and, with packets of this size, a typical image may be communicated in about ten packets; techniques that generate many descriptions are therefore important. MD techniques seem appropriate for packet networks when retransmissions are not possible, long delays are not acceptable, and more than one packet is needed to describe a source.

The distributed storage problem also matches the MD framework well. Typical users could view local image copies but, when the need for higher quality arises, one or more remote copies could be retrieved and combined with the local copy.

In many wireless systems, to provide robustness against bit errors, a transmitter can hop through a set of carrier frequencies in a manner known to the receiver, as happens for example in the GSM system. This protects against picking a single bad carrier frequency: some carriers will be good and others bad. The carrier frequencies can be considered separate channels for an MD source code, with channel codes applied separately on each carrier. On some channels all errors will be corrected; the other channels can be considered lost.

Finally, MD coding can be applied to ad-hoc networks. These networks are composed of a large number of unreliable devices, and from any particular node there are many possible paths to reach any other node; however, the probability that one of these paths fails is not negligible. Multipath routing techniques have been found to be a good strategy under these conditions to increase robustness. These networks strongly call for coding techniques capable of exploiting the path diversity present in the network, and MD coding fits this scenario very well.


Chapter 3

Background

3.1 Joint source channel coding

The source coding problem deals with removing redundancy from an information source to create a compressed data stream, while channel coding adds controlled redundancy to a data stream so that it can withstand the bit errors caused by channel noise. The so-called Separation Theorem [3] guarantees that, for stationary memoryless sources and channels, the end-to-end source transmission problem can be decomposed into a source coding part and a channel coding part. Communication channels, although subject to various physical interferences, can be well characterized statistically; the same cannot be said of sources of practical interest such as images, video and audio. Nevertheless, separation is very useful, because one can devise a method for compressing a source without any knowledge of the channel and without having to make adjustments for different channels. Similarly, a system that communicates over a particular channel can be designed simply to get bits to the destination, without regard for the significance of each bit. The Separation Theorem addresses only the possibility of communication, not the best way to achieve it. Compressing a source as much as possible and communicating bits close to the channel capacity generally require the simultaneous processing of a large amount of data; this implies a prohibitive computational complexity and increases the cost of implementing such a system. These problems have led to systems that use so-called joint source-channel codes (JSCC).

0 0 0
0 1 1
1 0 1
1 1 0

Table 3.1. A (3,2) parity code.

These techniques can achieve the same performance as separate source and channel coding, with less delay and computation. There are many ways to create a JSCC. For example, one could design mappings from source sequences to channel sequences without any intermediate representation; this, however, can be impractical because it implies a complete redesign whenever any specification changes. Alternatively, one could use source-assisted channel coding, in which the probabilities of the source encoder outputs are used at the decoder to reduce the chance of channel decoding errors. The MD problem is a particular generalized source coding problem (channel-optimized source coding), and it can be seen as a joint source-channel coding system in which the behavior of the channel (in terms of probability of error) modifies the behavior of the source coder.

3.1.1 (M,k) source channel erasure codes

Packet-switched networks can be efficiently modelled as packet erasure channels [9]. An information source is encoded into a large number of packets and transmitted over the network; the network randomly erases some of the packets and delivers the rest without errors. The decoder is expected to reconstruct the source information from the packets received. Erasure channel codes [11] enable reconstruction at the decoder from a subset of the transmitted packets and offer a solution to the problem of transmission over these channels. An (M,k,d) erasure channel code is a construction in which k user symbols belonging to a finite field are encoded into M channel symbols (belonging to the same finite field) such that, upon reception of any M − d + 1 of the M channel symbols, the original k user symbols can be recovered. Channel codes for which d = M − k + 1 (i.e., the k user symbols can be recovered whenever any k channel symbols are received) are referred to as Maximum Distance Separable (MDS) codes.

Figure 3.1. One-dimensional quantizer interval endpoints x_1, ..., x_{N−1} and levels y_1, ..., y_N.

The parity bit code is an MDS code; an example is shown in table 3.1. The first two bits of each row are source bits, and the third bit is added so that the number of 1's in each codeword is even. Each word can be decoded if at least two bits are received and their positions in the codeword are known. Over the binary field (where the symbols are 0 or 1) there are almost no other MDS codes; over larger fields, Reed-Solomon codes are a popular class belonging to the MDS family [9].
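A small sketch of the MDS property for the (3,2) parity code of table 3.1, using hypothetical helper functions: any two of the three symbols suffice to recover the two source bits.

def parity_encode(b1, b2):
    """(3,2) parity code of table 3.1: append an even-parity bit."""
    return (b1, b2, b1 ^ b2)

def parity_decode(symbols):
    """Recover (b1, b2) from any 2 of the 3 symbols (the MDS property).
    'symbols' maps the known positions (0, 1 or 2) to bit values; erased
    positions are simply absent from the dictionary."""
    if 0 in symbols and 1 in symbols:
        return symbols[0], symbols[1]
    if 0 in symbols and 2 in symbols:
        return symbols[0], symbols[0] ^ symbols[2]
    if 1 in symbols and 2 in symbols:
        return symbols[1] ^ symbols[2], symbols[1]
    raise ValueError("need at least 2 of the 3 symbols")

assert parity_encode(1, 0) == (1, 0, 1)
assert parity_decode({0: 1, 2: 1}) == (1, 0)   # middle symbol erased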

3.2 Quantization

Quantization is the heart of analog-to-digital conversion. In its simplest form, a quantizer observes a single number and selects the nearest approximating value from a predetermined finite set of allowed numerical values. Ordinarily the input value is analog and the output is digital, being uniquely specified by an integer in the set {1,2,3,...,N}, where N is the size of the set of output values. We define an N-point scalar quantizer Q as a mapping Q : ℝ → C, where ℝ is the real line and

$$\mathcal{C} \equiv \{y_1, y_2, y_3, \ldots, y_N\} \subset \mathbb{R} \qquad (3.1)$$

is the output set, or codebook, of size N. The output values y_i are called reproduction points. In many cases N is finite, so that a finite number of binary digits (⌈log₂ N⌉) suffices to specify the output value. Associated with every N-point quantizer is a partition of the real line ℝ into N cells R_i, for i = 1,2,...,N. The i-th cell is given by:

$$R_i = \{x \in \mathbb{R} : Q(x) = y_i,\ x_{i-1} < x \le x_i\} \equiv Q^{-1}(y_i) \qquad (3.2)$$

It follows that $\bigcup_i R_i = \mathbb{R}$ and $R_i \cap R_j = \emptyset$ for $i \neq j$. A cell that is unbounded is called an overload cell; each bounded cell is called a granular cell. The values x_i are called boundary points.


Usually, x_0 = −∞ and x_N = +∞, and the cells R_1 and R_N are overload cells. The range B of a quantizer is defined as the total length of the granular cells, so that for an unbounded regular quantizer, B = x_{N−1} − x_1. A quantizer is defined to be regular if each cell R_i is an interval and y_i ∈ (x_{i−1}, x_i). In most applications quantizers are regular, but irregular quantizers are of interest for MD coding techniques. There, a cell R_i may be composed of several disjoint intervals, and the reproduction point y_i may belong to one of these intervals or lie outside all of them. Every quantizer can be viewed as the combined effect of two successive operations (mappings): an encoder E and a decoder D. The encoder is a mapping E : R → I, where I = {1,2,3, . . . ,N}, and the decoder is the mapping D : I → C. Thus, if Q(x) = y_i, then E(x) = i and D(i) = y_i. In the context of a waveform communication system, the encoder transmits the index i of the selected level y_i, chosen to represent an input sample, and not the value y_i itself.
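The encoder/decoder split can be made concrete in a few lines of code. This is a minimal sketch of ours (the function names are illustrative) of a regular N-point quantizer:

```python
import numpy as np

def quantizer_encode(x, boundaries):
    """Encoder E: map a sample to the index of its cell.

    `boundaries` holds the finite boundary points x_1 < ... < x_{N-1};
    np.searchsorted returns a cell index in {0, ..., N-1}."""
    return int(np.searchsorted(boundaries, x))

def quantizer_decode(i, levels):
    """Decoder D: map an index back to its reproduction point y_i."""
    return levels[i]

# A 4-point quantizer on the real line:
boundaries = np.array([-1.0, 0.0, 1.0])
levels = np.array([-1.5, -0.5, 0.5, 1.5])
i = quantizer_encode(0.3, boundaries)   # -> 2
x_hat = quantizer_decode(i, levels)     # -> 0.5
```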

3.2.1 Quantizer performance

The purpose of quantization is to provide a limited-precision description of an input value. The input can be modelled as a random variable having some specific statistical character, usually specified by its probability density function (pdf) f_X(x). Consequently, the error introduced in quantizing this value will also be random. The difference between the input x and the output Q(x) of a quantizer is called the quantization error, and the design of a quantizer is naturally directed toward making the quantization error small. To conveniently assess the performance of a particular quantizer, we need a single number that indicates the overall quality degradation or distortion introduced by the quantization process. The most common measure of the distortion is the squared error, defined by:

d(x, Q(x)) = |x − Q(x)|²     (3.3)

so that the mean squared quantization error is given by:

D = E[d(X, Q(X))] = E[(X − Q(X))²] = Σ_{i=1}^{N} ∫_{R_i} (x − Q(x))² f_X(x) dx     (3.4)


Given a sequence of real numbers {X_n} as input of a scalar quantizer Q, the inevitable error X_n − Q(X_n) that arises in the quantization of an analog signal is often regarded as noise introduced by the quantizer. Specifically, we have:

• Granular noise D_gran, which takes into account the difference between the original value x and its quantized representation y_i = Q(x):

D_gran = Σ_{i=2}^{N−1} ∫_{x_{i−1}}^{x_i} (x − y_i)² f_X(x) dx     (3.5)

• Overload noise D_ol, which considers the error made by quantizing all the values in the range (−∞, x_1] with y_1 and those in the range (x_{N−1}, +∞) with y_N:

D_ol = ∫_{−∞}^{x_1} (x − y_1)² f_X(x) dx + ∫_{x_{N−1}}^{+∞} (x − y_N)² f_X(x) dx     (3.6)

Generally, granular noise is relatively small in amplitude and occurs to varying degrees with the quantization of each input sample, while overload noise can have very large amplitudes but, for a well-designed quantizer, occurs very rarely. The performance of a quantizer ultimately depends on all the partition boundary values and on all of the output points, as well as on the input statistics. The loading factor is a parameter that has an important influence on quantizer performance; it measures the size of the highest decision level x_{N−1} relative to the root mean squared value σ of the input sequence. The loading factor mediates the trade-off between granular and overload distortion incurred in quantization. Overload noise depends very strongly on the signal amplitude and should be kept low by tuning the loading factor.

3.2.2 Uniform quantizer

A uniform quantizer is a regular quantizer in which the boundary points are equally spaced and the output levels for granular cells are the midpoints of the quantization intervals. The first condition implies that x_i − x_{i−1} = Δ for i = 2,3, . . . ,N − 1, and the second condition implies y_i = (x_{i−1} + x_i)/2 for i = 2,3, . . . ,N − 1. For a uniform quantizer where the input is bounded, with values lying in the range (x_0, x_N), the range is divided into N equal quantization cells, each one of size Δ = (x_N − x_0)/N, and the overload cells have zero length.


In this case, the overall distortion is given by [3]:

D = Δ²/12     (3.7)

For unbounded inputs, the quantizer has overload cells (−∞, x_1] and (x_{N−1}, +∞) with y_1 = x_1 − Δ/2 and y_N = x_{N−1} + Δ/2. Most inputs of interest have rapidly decreasing tail probabilities for their pdf's, so that for suitable quantizer designs the overload region has very low probability of containing an input sample. Hence, performance results for uniform quantization with bounded inputs are approximately valid when the input is unbounded, as long as the loading factor is sufficiently large. The average distortion is the combined effect of granular and overload quantization errors. Usually the granular error is bigger than the overload one but, as N increases, the former decreases, so it becomes necessary to adjust the loading factor to keep the overload distortion smaller than the granular distortion.
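A quick numerical check of equation 3.7 (a sketch of ours, using a uniformly distributed input so that the granular model is exact and no overload occurs):

```python
import numpy as np

# Empirical check of D = Δ²/12 for a uniform quantizer with a bounded,
# uniformly distributed input on (x0, xN).
rng = np.random.default_rng(0)
x0, xN, N = -1.0, 1.0, 16
delta = (xN - x0) / N

x = rng.uniform(x0, xN, size=1_000_000)
cells = np.floor((x - x0) / delta)        # cell index in {0, ..., N-1}
y = x0 + (cells + 0.5) * delta            # midpoint reproduction levels
mse = np.mean((x - y) ** 2)

print(mse, delta**2 / 12)                 # the two values agree closely
```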

3.2.3 Nonuniform quantizer

If the input distribution has more mass near the origin, the input is more likely to fall in the inner cells of the quantizer. To decrease the average distortion introduced by the quantization process, it is possible to approximate the input better in regions with high probability, perhaps at the cost of worse approximations in regions of lower probability. We can do this by making the quantization intervals smaller in those regions that have more probability mass. For a Gaussian source, we would have smaller intervals near the origin. If we want to keep the number of intervals constant, we will have larger intervals away from the origin. A quantizer that has nonuniform intervals is called a nonuniform quantizer. If the intervals closer to zero are smaller, the maximum value that the quantization error can take on is also smaller, resulting in a better approximation. We pay for this improvement in accuracy at low input levels by incurring larger errors when the input falls in the outer intervals. However, as the probability of getting smaller input values is much higher than that of getting larger values, on average the distortion will be lower than with a uniform quantizer.


3.2.4 pdf-Optimized quantization

While a nonuniform quantizer provides lower average distortion, its design is also somewhat more complex. The basic idea is to find the decision boundaries and reconstruction levels that minimize the mean squared quantization error. If we know the probability model of the source, a direct approach for locating the best nonuniform quantizer is to find the {x_i} and {y_i} that minimize equation 3.4. Setting the derivative with respect to y_j to zero, we get:

y_j = ∫_{x_{j−1}}^{x_j} x f_X(x) dx / ∫_{x_{j−1}}^{x_j} f_X(x) dx     (3.8)

The output point for each quantization interval is the centroid of the probability mass in that interval. Moreover, taking the derivative with respect to x_j and setting it equal to zero, we get:

x_j = (y_j + y_{j+1})/2     (3.9)

The decision boundary is the midpoint of the two neighboring reconstruction levels. Solving these two equations gives the reconstruction levels and decision boundaries that minimize the mean squared quantization error. Unfortunately, to solve for y_j we need the values of x_j and x_{j−1}, and to solve for x_j we need the values of y_j and y_{j+1}. In practice, the two equations are solved with an iterative design procedure called the Lloyd-Max algorithm [10].
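The iteration can be written down directly from equations 3.8 and 3.9. The sketch below is our own illustration for a zero-mean unit-variance Gaussian source; it alternates the centroid and midpoint conditions, using the closed form for the Gaussian cell centroid, E[X | a < X < b] = (φ(a) − φ(b)) / (Φ(b) − Φ(a)), with φ and Φ the standard normal pdf and cdf:

```python
import numpy as np
from scipy.stats import norm

def lloyd_max_gaussian(n_levels, n_iter=200):
    """Lloyd-Max design of an n_levels quantizer for an N(0,1) source."""
    y = np.linspace(-3, 3, n_levels)             # initial levels
    for _ in range(n_iter):
        x = (y[:-1] + y[1:]) / 2                 # eq. 3.9: midpoints
        a = np.concatenate(([-np.inf], x))       # cell lower edges
        b = np.concatenate((x, [np.inf]))        # cell upper edges
        mass = norm.cdf(b) - norm.cdf(a)
        y = (norm.pdf(a) - norm.pdf(b)) / mass   # eq. 3.8: Gaussian centroids
    return x, y

boundaries, levels = lloyd_max_gaussian(4)
# levels -> approximately [-1.510, -0.453, 0.453, 1.510], the classic
# 4-level Lloyd-Max quantizer for a unit-variance Gaussian source.
```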

3.3 Rate distortion theory

We are often interested in representing sources, such as real numbers, that can never be represented exactly with a finite number of bits. Rate distortion theory is the branch of information theory that gives limits on how closely a source can be approximated with a particular number of bits. Rate distortion bounds give valuable intuition on how the quality of a representation should vary with the length of its description. Assuming that a source produces a sequence of independent, identically distributed, real random variables X_1, X_2, . . . , X_N, a distortion measure d gives a nonnegative numerical rating d(x, x̂) of how well a source letter x ∈ χ is approximated by a reproduction x̂ ∈ χ̂.


The distortion between sequences x^N = (x_1, x_2, . . . , x_N) and x̂^N = (x̂_1, x̂_2, . . . , x̂_N) is defined by:

d(x^N, x̂^N) = (1/N) Σ_{i=1}^{N} d(x_i, x̂_i)     (3.10)

The most common distortion measure is the squared error distortion, defined as:

d(x, x̂) = (x − x̂)²     (3.11)

This corresponds, for real sequences of length N , to the squared Euclidean norm between the two sequences divided by N .

3.3.1 Rate distortion region

Given a code of dimension N, rate R and expected distortion D_N, consisting of an encoding function α_N : χ^N → {1,2, . . . ,2^{NR}} and a decoding function β_N : {1,2, . . . ,2^{NR}} → χ̂^N that satisfy

D_N = E[d(X^N, β_N(α_N(X^N)))],

a rate-distortion pair (R,D) is called achievable if there exists a sequence of such codes of rate R for which

lim_{N→∞} D_N ≤ D

The rate distortion function R(D) is the minimum rate such that (R,D) is in the rate-distortion region. Conversely, the distortion rate function D(R) is the minimum distortion such that (R,D) is in the rate-distortion region. These all depend on the source and on the distortion measure. The boundary of an RD region is nearly impossible to determine by working directly from the definitions. By construction, the rate-distortion function represents the limit of source coding, and it can also be determined by a constrained minimization as suggested in [19]. This problem has been solved for a few sources.

Figure 3.2. Distortion-rate function for a Gaussian source with σ² = 1

For a source that emits Gaussian random variables with variance σ², the distortion-rate function subject to squared error distortion is [3]:

D(R) = σ² 2^{−2R}     (3.12)

A graphical representation with σ² = 1 is given in figure 3.2.

3.3.2 RD region for two descriptions

Referring to figure 2.1, given a distortion measure d, we have three expected average distortions:

D_i = E[(1/N) Σ_{k=1}^{N} d_i(X_k, X̂_k^{(i)})]  for i = 0,1,2     (3.13)

The main problem is to determine the set of achievable values for the quintuple (R_1, R_2, D_0, D_1, D_2). In general, the MD rate distortion region (for a particular source and distortion measure) is the closure of the simultaneously achievable rates and distortions. Side decoder i receives R_i bits per symbol and hence cannot have distortion less than D_i(R_i), where D_i is the distortion-rate function of the source for distortion measure d_i. We obtain:

D_0 ≥ D_0(R_1 + R_2)
D_1 ≥ D_1(R_1)     (3.14)
D_2 ≥ D_2(R_2)

With the term central distortion we refer to the distortion achieved when both descriptions are received, while the side distortion is the one attained when one description, out of two, is received. Achieving equality simultaneously in bounds 3.14 would imply that an optimal rate (R_1 + R_2) description can be partitioned into two optimal rate R_1 and rate R_2 descriptions. This is rarely possible, because optimal individual descriptions at rates R_1 and R_2 are similar to each other and hence redundant when combined. Making descriptions individually good and yet not too similar is the fundamental trade-off of MD coding. The set of achievable rates for a memoryless Gaussian source with variance σ² (with squared error distortion) has been obtained in [14]:

D_i ≥ σ² 2^{−2R_i}  for i = 1,2     (3.15)

D_0 ≥ σ² 2^{−2(R_1+R_2)} · γ_D(R_1, R_2, D_1, D_2)     (3.16)

where

γ_D = 1 / [1 − (√((1 − D_1)(1 − D_2)) − √(D_1 D_2 − 2^{−2(R_1+R_2)}))²]     (3.17)

The term γ_D is the factor by which the central distortion must exceed the distortion-rate bound (3.15). If the descriptions are individually very good, yielding D_1 = 2^{−2R_1} and D_2 = 2^{−2R_2}, for R_1 ≥ R_2 we have:

γ_D = 1 / [1 − (1 − D_1)(1 − D_2)] ≥ 1/(2D_2)     (3.18)

which gives, substituting into equation 3.16:

D_0 ≥ D_1 D_2 γ_D ≥ D_1/2     (3.19)


This means that, when the descriptions are individually good, the joint description is only slightly better than the best of the two. We also note that, if the joint description is as good as possible, then γ_D = 1 and thus

D_1 + D_2 = 1 + 2^{−2(R_1+R_2)} = 1 + D_0     (3.20)

A distortion value of 1 is obtained with no information at all, so this expression implies a poor reconstruction for at least one of the side decoders. When one or both side distortions are large, γ_D = 1 and the central distortion can be low. Otherwise, there is a penalty in the central distortion. Under the assumption R_1 = R_2 ≫ 1 and D_1 = D_2 = 2^{−2(1−α)R_1} with 0 ≤ α ≤ 1 (the balanced case), γ_D can be estimated as (4D_1)^{−1} [6] and, thus, D_0 ≥ 2^{−4R_1}(4D_1)^{−1}. Therefore, the product of central and side distortions is approximately lower bounded by:

D_0 · D_1 ≥ (1/4) σ² 2^{−4R_1}     (3.21)
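The bound 3.16-3.17 is easy to evaluate numerically. The helper below (a sketch of ours) returns the minimum achievable central distortion for given rates and side distortions of a unit-variance Gaussian source:

```python
import numpy as np

def central_distortion_bound(R1, R2, D1, D2):
    """Minimum central distortion D0 for a unit-variance Gaussian source
    (equations 3.16-3.17); valid when D1, D2 satisfy the side bounds."""
    d_joint = 2.0 ** (-2 * (R1 + R2))
    excess = np.sqrt((1 - D1) * (1 - D2)) - np.sqrt(max(D1 * D2 - d_joint, 0.0))
    gamma = 1.0 / (1.0 - excess ** 2)
    return d_joint * gamma

# Individually optimal descriptions (D_i = 2^{-2R_i}) pay a large penalty:
D1, D2 = 2.0 ** -6, 2.0 ** -6
print(central_distortion_bound(3, 3, D1, D2))   # ~ D1/2, as in eq. 3.19
```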

3.3.3 RD region for many descriptions

While for a Gaussian source the complete MD rate-distortion region for two descriptions is known, the general case of M descriptions is an open research problem. The first achievable rate region for general M was presented in [23], where the rate region relies on a conventional conditional successive refinement framework and is, in that sense, a generalization of [5]. A different approach to obtain bounds for many descriptions has been proposed in [16], which adopted the source coding with side information refinement framework. There, it is shown that it is possible to encode a unit-variance i.i.d. Gaussian source into M descriptions, each containing R bits/sample, such that the reconstruction fidelity with the reception of any (k + r) descriptions (for 0 ≤ r ≤ M − k) is given by:

D_{k+r} = k / (2^{2kR}(k + r) − r)     (3.22)

When r = 0, D_k = 2^{−2kR}, which exactly attains the information-theoretic optimal rate-distortion performance of the corresponding Gaussian source.


Chapter 4

Uep Coding

We turn our attention to a practical scheme for generating multiple descriptions called Unequal Error Protection (UEP) coding. This technique generates several packets from a single progressive bitstream using different channel codes. Being a technique for MD systems, the initial bitstream can be reconstructed from any subset of received packets, with a quality proportionate to their number. One of the first works to apply this technique is [13], where an iterative algorithm is used to minimize the expected distortion of an image communication system. Another algorithm with the same goal is presented in [17]. We want to study the performance of MD-UEP coding systems using different numbers of descriptions. These systems should be able to react to changes in network conditions (i.e., to variations of the probability p of losing a packet) so as to minimize the end-to-end distortion. The information source generates real random numbers with a Gaussian distribution, zero mean and variance σ². This source is quantized to generate a progressive bitstream of length M · R bits.

4.1 Equal Error Protection

The simplest way to gain robustness to the loss of descriptions is to use a channel code that, given k input symbols, outputs M symbols such that the k input symbols can be recovered from any k of the M output symbols. These channel codes are called Maximum Distance Separable (MDS) codes (see section 3.1.1 for further details).


Suppose good performance is required when k out of M descriptions are received and each description is at rate R bits. This means that we have good performance when kR bits reach the destination. The best use of a channel code is to have the source coder produce exactly kR bits and then apply to them a channel code of rate k/M. The channel coder will produce MR bits. When k or more descriptions are received, the channel code will be successfully decoded, yielding the kR bits from the source coder. The reconstruction quality is thus commensurate with kR bits. This is ideal when exactly k descriptions are received, but not the best possible when more are received, because the descriptions that reach the destination after the first k do not add new information. Moreover, when fewer than k descriptions are received, it is not possible to reliably recover any information. The performance with more or fewer than k received descriptions highlights the weakness of this approach: the so-called cliff effect of channel codes. The quality is constant for up to M − k lost descriptions and then drops very sharply with more than M − k losses.
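The cliff effect is easy to see numerically. This sketch of ours (for a unit-variance Gaussian source with D(R) = 2^{−2R}) tabulates the reconstruction distortion as a function of the number of received descriptions for an EEP system:

```python
import numpy as np

def eep_distortion(received, M=4, k=2, R=3, sigma2=1.0):
    """Distortion of an (M,k) EEP system when `received` packets arrive.

    Fewer than k packets: nothing decodes (distortion = source variance);
    k or more: always exactly k*R source bits, hence the cliff."""
    return sigma2 * 2.0 ** (-2 * k * R) if received >= k else sigma2

for m in range(5):
    print(m, 10 * np.log10(eep_distortion(m)))
# -> 0 dB, 0 dB, -36.1 dB, -36.1 dB, -36.1 dB: flat, then a sharp cliff
```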

4.2 Unequal Error Protection

The main limitation of Equal Error Protection is that the system relies entirely on a single channel code that must be well matched to the network conditions. Moreover, only the quality proportionate to kR bits can be achieved and, for that, the decoder must wait for k packets to arrive. UEP systems achieve different levels of quality proportionate to the number of received packets, and decoding of the original information can start with the arrival of any subset of packets.

4.3 The two-description case

To create an MD-UEP representation with two descriptions, each one at rate R bits, we may consider (2 − ζ)R bits of a source coder and partition them into three parts, as shown in figure 4.1. We denote by ζ (0 ≤ ζ ≤ 1) the fraction of bits to be protected the most. The initial, most important, ζR bits are repeated in each description; the second (1 − ζ)R bits are put in Description 1 and the final (1 − ζ)R bits are put in Description 2.

Figure 4.1. A graphical representation for M = 2 descriptions. The first ζR bits in D2 are the repetition of the first ζR bits of D1.

Figure 4.2. An MD-UEP system for M = 2 descriptions

Using this method, some of the information is protected by a rate-1/2 code and the rest is unprotected. When one description is received, the decoder can reconstruct the first ζR bits of the bitstream. If both descriptions are received, (2 − ζ)R bits will be decoded. We want to find the optimal amount of redundancy (the number of repeated bits in the descriptions) so that the end-to-end distortion is minimized, given the probability p of losing a packet in the network. Using this coding scheme, we transmit (2 − ζ)R bits of information using 2R bits. Therefore, adding redundancy implies sacrificing some source information. If ζ = 0, all the bits are unprotected; if both packets are received, the decoder can reconstruct 2R bits of the original bitstream. If ζ = 1, two identical packets are created and the decoder can reconstruct R bits of the bitstream from the reception of either packet; in this case, no improvement is achieved when both descriptions reach the destination. A more detailed example is given in figure 4.2. Each description has a rate of R = 4 bits and ζR = 2 bits are repeated in each description. This implies ζ = 0.5, meaning that the first half of each description contains repeated bits.


4.3.1 Theoretical performance for two descriptions

Let us consider equation 3.12 with R bits per description and a fraction ζ of repeated bits. The central distortion (Dc) and the side distortions (Ds_1, Ds_2) with an Unequal Error Protection technique, for a Gaussian source with variance σ² (with squared error distortion), are given by:

Ds_1 = Ds_2 = Ds = σ² · 2^{−2ζR}
Dc = σ² · 2^{−2(2−ζ)R}     (4.1)

When the repeated fraction of bits is low (ζ is low), the side distortion will be high and the central distortion will be low: the descriptions contain few repeated bits and, with the reception of both, a long part of the original bitstream is decoded. When ζ is high, the descriptions will be similar: if only one of the two is received, a coarse version of the source information is available, and with the reception of both of them the gain in terms of quality is low. Each value of ζ leads to a particular pair of central and side distortions. In figure 4.3 we plot the trade-off between the central and side distortion for 0 ≤ ζ ≤ 1 and R = 3 bits. When ζ = 0 we obtain Ds = 0 dB and Dc = −36 dB; when ζ = 1, Ds = −18 dB and Dc = −18 dB. We evaluate the quality of the system in terms of its end-to-end distortion:

Distortion = Dc · (1 − p)² + (Ds_1 + Ds_2) · p · (1 − p) + σ² · p²     (4.2)

With probability (1 − p)², both packets are received and the system achieves the central distortion Dc; if one description out of two is received, with probability p · (1 − p), we achieve one of the two side distortions. With probability p², all packets are lost and the end-to-end distortion equals the variance σ² of the source. In an MD-UEP system, the parameter ζ changes according to the network conditions so as to minimize the end-to-end distortion. When p is low, ζ should have a low value because, most of the time, both packets will reach the destination and the end-to-end distortion will be the central one. As p increases, ζ increases, which implies stronger protection of the most important bits; the descriptions become similar and, most of the time, only one of them reaches the destination. Figure 4.4 shows an example of the distortion achieved for R = 3 bits and different loss probabilities p.

Figure 4.3. Trade-off between central and side distortion for a two-description system with R = 3 bits

For values of p bigger than 0.02, the lowest distortion is achieved when ζ is 0.75, meaning that 3/4 of each description is identical in both packets. From p = 3 · 10^{−3} to p = 0.02, the best ζ is 0.5: there, for each of the two packets, the first part contains repeated bits and the second part contains unprotected bits. Given p, the value of ζ that guarantees the lowest end-to-end distortion increases with increasing values of R. This happens because the system can rely on bigger packets, and the coder can insert more redundancy bits to achieve better performance. A plot of ζ as a function of p and R is given in figure 4.5 for R = 3 bits and R = 6 bits. For each probability p, ζ is higher when the rate is 6 bits. For example, when p = 0.02, the system that relies on 6 bits per description introduces a redundancy of ζ ≈ 0.8, while the system with R = 3 bits has ζ ≈ 0.6. Both curves, however, have ζ = 0 when p = 0, and the gap between them decreases as p increases. As we can see in figure 4.1, when we receive the first description we could actually decode the first R bits of the original bitstream, and not only the first ζR: the (1 − ζ)R unprotected bits are contiguous to the ζR repeated bits from the beginning of the source coder output. But if we receive only the second description, only the first ζR bits are decoded.

Figure 4.4. End-to-end distortion for different values of ζ

Figure 4.5. Plot of ζ for different probabilities p of losing a packet and rates R of each description

Then, receiving only the first description would lead to a lower side distortion than receiving only the second description. Equations 4.1 would have to be modified in this way:

Ds_1 = σ² · 2^{−2R}
Ds_2 = σ² · 2^{−2ζR}     (4.3)
Dc = σ² · 2^{−2(2−ζ)R}

In an MD system, all the descriptions should have the same importance. Moreover, to compare different MD systems fairly, we have to use the same framework; so, when the first description is the only one to be received, we will decode only its first ζR bits, even if this leads to an end-to-end distortion higher than decoding its first R bits (the maximum difference, for R = 3 bits, is around 1.2 dB).
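The search for the optimal ζ in the two-description case can be done exhaustively on a grid, as described above. A minimal sketch of ours, using equations 4.1 and 4.2 for a unit-variance Gaussian source:

```python
import numpy as np

def uep2_distortion(zeta, p, R, sigma2=1.0):
    """End-to-end distortion of a two-description UEP system (eqs. 4.1-4.2)."""
    Ds = sigma2 * 2.0 ** (-2 * zeta * R)          # side distortion
    Dc = sigma2 * 2.0 ** (-2 * (2 - zeta) * R)    # central distortion
    return Dc * (1 - p) ** 2 + 2 * Ds * p * (1 - p) + sigma2 * p ** 2

def best_zeta(p, R, grid=np.linspace(0, 1, 101)):
    """Exhaustive search for the redundancy fraction minimizing eq. 4.2."""
    return grid[np.argmin([uep2_distortion(z, p, R) for z in grid])]

# The optimal redundancy grows with the loss probability:
for p in (1e-3, 1e-2, 1e-1):
    print(p, best_zeta(p, R=3))
```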

4.4 The M-description case

UEP can easily be generalized to more than two descriptions. To create M descriptions, each one of R bits, the output of the progressive coder is partitioned into chunks that are transmitted with channel codes of rates 1/M, 2/M, . . . , j/M, . . . , 1 (1 ≤ j ≤ M). With M descriptions, ζ is an array of M elements. Each component ζ_i (0 ≤ i < M) represents the fraction of bits in each description coded with a channel code of rate (i + 1)/M. The two-description system is a particular case where ζR bits are protected with a channel code of rate 1/M (a repetition code) and the remaining bits are sent with a code of rate M/M (i.e., they are unprotected). In this case, ζ can be considered an array of 2 elements: the first element specifies the fraction of repeated bits and the second one refers to the unprotected part (i.e., the second element ζ_1 is the difference 1 − ζ_0). For M > 2, the progressive bitstream is marked at M different positions, as shown in figure 4.6. Each section corresponds to the attainment of a distortion level. We adopt the terminology used in [17]: the M different sections are called resolution layers, because the resolution (quality in terms of distortion) of the received bitstream depends on the number of layers (i.e., packets) that reach the destination.

Figure 4.6. Progressive bitstream partitioned into M quality layers

The i-th layer can be decoded when the number of packet erasures over the network does not exceed (M − i) or, equivalently, when at least i packets are received. The quality is thus proportionate to the number of received packets: as this number increases, the original source is decoded more precisely and the end-to-end distortion decreases. To decode the i-th quality layer when any i descriptions out of M are received, the portions of the original bitstream are coded using the family of Reed-Solomon (RS) erasure-correction block codes, characterized by the optimal code parameters (M, i, M − i + 1). They can correct any (M − i) erasures out of M descriptions. See section 3.1.1 for further details on RS codes. The i-th quality layer is divided into i subsections of the same length, and each bit of these subsections is put in a different description, so the source bits of the i-th quality layer are spread over i different descriptions. In the remaining M − i descriptions, the encoder inserts bits that protect the information according to an (M, i) RS code. Each description is thus made of M different subsections and contains all the M quality layers. Each description has rate R, so the ζ array has to satisfy the constraint:

Σ_{i=0}^{M−1} ζ_i = 1     (4.4)

Let us consider figure 4.7 for a graphical representation of an MD-UEP system with M = 4 descriptions. The first ζ_0 R bits of the bitstream are coded at rate 1/4, i.e., included in all descriptions, so that this portion can be reconstructed from any single description. The next 2ζ_1 R bits are coded at rate 1/2, so that this information can be recovered from any two descriptions.

Figure 4.7. Graphical representation for M = 4 descriptions. The bold R denotes bits added by the RS code.

Figure 4.8. MD-UEP system with three descriptions and parity bits

Therefore, any two descriptions are decoded at rate (ζ_0 + 2ζ_1)R. The next 3ζ_2 R bits are coded at rate 3/4, so any three descriptions can be decoded at rate (ζ_0 + 2ζ_1 + 3ζ_2)R. The final 4ζ_3 R bits are sent without channel coding. An example with M = 3 descriptions using parity bits is given in figure 4.8. The original bitstream is divided into three quality layers (black, red and blue). The first bit is protected with a (3,1) code, the following 4 bits are protected with a (3,2) code, and the remaining bits are sent unprotected. If one description is received, the decoder can reconstruct the black part, that is, the first bit. If any two descriptions are received, the red part is also decoded, because the (3,2) parity code allows decoding of the original information bits from any 2 of the 3 received bits; if the decoder receives all the descriptions, the original bitstream is decoded up to the blue part. We have R = 4 bits, ζ_0 = 0.25, ζ_1 = 0.5, ζ_2 = 0.25.

4.4.1 Theoretical performance for M descriptions

As in the M = 2 case of section 4.3.1, we can write similar equations for M > 2.


Let R̂_k denote the number of source bits decoded if k packets (0 < k ≤ M) are received:

R̂_k = Σ_{i=0}^{k−1} (i + 1) ζ_i R     (4.5)

We then have:

Ds_k = σ² · 2^{−2R̂_k}
Dc = Ds_M = σ² · 2^{−2R̂_M}     (4.6)

As for the two-description case, we can plot the trade-off between central and side distortions for M = 3 descriptions. Now we have a side distortion if one packet is received and a different side distortion if two packets are received. Considering equations 4.6 with σ² = 1 and M = 3, we obtain:

Ds_1 = 2^{−2ζ_0 R}
Ds_2 = 2^{−2(ζ_0 + 2ζ_1)R}
Dc = Ds_3 = 2^{−2(ζ_0 + 2ζ_1 + 3ζ_2)R}

Recalling equation 4.4, we see that the system has two degrees of freedom. For example, we can tune ζ_0 and ζ_1, and the central distortion will then be fixed:

Ds_1 = 2^{−2ζ_0 R}
Ds_2 = Ds_1 · 2^{−4ζ_1 R}
Dc = Ds_3 = 2^{−2R[3 + log Ds_1/(R log 2) + log(Ds_2/Ds_1)/(4R log 2)]}

In figure 4.9 we show the trade-off between distortions for M = 3 descriptions and R = 3 bits. The Ds_1 axis represents the side distortion when one packet is received, the Ds_2 axis refers to the side distortion when two descriptions are received, and the Dc axis corresponds to the central distortion when all packets reach the destination. The vertices of the triangle are given by ζ = [0,0,1], ζ = [0,1,0] and ζ = [1,0,0] and represent the points where a specific distortion is minimized. For example, with ζ = [0,0,1] the central distortion achieves its lowest value (approximately −54 dB). Denoting by p the probability of losing a packet in the network, the end-to-end distortion is given by:

Distortion = Dc · (1 − p)^M + Σ_{k=1}^{M−1} C(M,k) · Ds_k · (1 − p)^k · p^{M−k} + σ² · p^M     (4.7)

Figure 4.9. Trade-off between central and side distortions for M = 3 descriptions, each one of R = 3 bits

Given a progressive coder, the design problem is to choose the ζ array that minimizes, for a given p, this expression.
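For reference, equations 4.5-4.7 translate directly into a few lines of code; this sketch of ours evaluates the end-to-end distortion of an M-description UEP system for a given ζ array:

```python
import numpy as np
from math import comb

def uep_distortion(zeta, p, R, sigma2=1.0):
    """End-to-end distortion of an M-description UEP system (eqs. 4.5-4.7).

    zeta[i] is the fraction of each description protected at rate (i+1)/M."""
    M = len(zeta)
    # R_hat[k-1]: source bits decodable from any k received packets (eq. 4.5)
    R_hat = np.cumsum([(i + 1) * zeta[i] * R for i in range(M)])
    Ds = sigma2 * 2.0 ** (-2 * R_hat)             # eq. 4.6
    dist = sigma2 * p ** M                        # all packets lost
    for k in range(1, M + 1):
        dist += comb(M, k) * Ds[k - 1] * (1 - p) ** k * p ** (M - k)
    return dist

# M = 4, with strong protection on the first layer:
print(uep_distortion([0.4, 0.3, 0.2, 0.1], p=0.1, R=3))
```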

4.4.2 Optimal rate allocation

In the two-description case we could adopt an exhaustive search for the optimal value of ζ over a finite set in order to minimize the end-to-end distortion. The same cannot be done with M > 2 descriptions, because we have to search for an optimal array ζ of M values; in general, the complexity of such a problem is O(2^M) [2]. For this reason, we have to implement a method that obtains ζ with less than exponential complexity. The technique we use is based on [17] and obtains the optimal ζ array with linear complexity. The problem is equivalent to finding the M positions R_0, . . . , R_{M−1} (see figure 4.6 for a graphical representation) at which the original bitstream of length M · R bits is divided.


In fact we have:

R_0 = ζ_0 R
R_1 − R_0 = 2ζ_1 R
. . .     (4.8)
R_{M−1} − R_{M−2} = M ζ_{M−1} R

Equation 4.7 can be equivalently written in the form:

E_d = q_{−1} σ² + Σ_{j=0}^{M−1} q_j D(R_j)     (4.9)

where:

• q_j denotes the probability that j + 1 out of M packets are delivered to the destination:

q_j = C(M, j+1) · (1 − p)^{j+1} · p^{M−(j+1)}     (4.10)

with p the probability of losing a packet (and q_{−1} = p^M the probability that no packet is delivered);

• D(R_j) is the distortion-rate function (monotonically decreasing) calculated at R_j, for 0 ≤ j ≤ M − 1. See section 3.3 for further details on distortion-rate theory.

Recalling that the first R_0 bits of the original bitstream are inserted in each description, the next R_1 − R_0 bits in two different descriptions, and so on, we can compute the total output rate R_m as:

R_m = (R_0/1) M + ((R_1 − R_0)/2) M + . . . + ((R_{M−1} − R_{M−2})/M) M     (4.11)

or equivalently

R_m = Σ_{j=0}^{M−1} α_j R_j     (4.12)

with

α_j = M / ((j + 1)(j + 2))  for j = 0, . . . , M − 2
α_{M−1} = 1

Equation 4.12 is the constraint on the MD-UEP system: the total output rate R_m created by the system and divided into M descriptions should be less than or equal to M · R bits, the length of the original source bitstream. To obtain the values of R_j that minimize (4.9) under the constraint (4.12), we can introduce a Lagrange multiplier Λ [2], obtaining:

L_c(R_0, . . . , R_{M−1}, Λ) = q_{−1} σ² + Σ_{j=0}^{M−1} q_j D(R_j) + Λ(Σ_{j=0}^{M−1} α_j R_j − R_m)     (4.13)

To find the minimum of this function, the partial derivatives of the Lagrangian function with respect to R_j (j = 0, . . . , M − 1) and Λ are set equal to zero:

(q_j/α_j) · dD(R_j)/dR_j + Λ = 0  for j = 0, . . . , M − 1     (4.14)

This means that the optimal solution is obtained by locating the points on the D(R) curve where the slopes are in proportion to α_j/q_j. For a given p, the iterative algorithm outputs the parameter Λ that satisfies equation 4.14. Figure 4.10 shows Λ as a function of the probability p; this function is continuous, which helps to improve the efficiency of the algorithm: for each p, the algorithm can also receive as input the previous value of Λ. Moreover, we also have to satisfy a constraint on the correct marking of the original bitstream, that is:

R_0 ≤ R_1 ≤ . . . ≤ R_{M−2} ≤ R_{M−1}     (4.15)

The absolute value of dD(R_j)/dR_j is a monotonically decreasing sequence in j, given that the rate-distortion curve is strictly convex. If the sequence α_j/q_j is monotonically decreasing, then it follows that (4.15) is satisfied. The monotonicity of the sequence α_j/q_j cannot be guaranteed in general, because we have no control over the channel state information q_j. So the sequence α_j/q_j has to be converted into a monotonically decreasing sequence before applying the Lagrangian method, and this conversion influences the rate allocation. The algorithm to convert the sequence α_j/q_j into a decreasing sequence is proposed in [17]: if, for any n, α_n/q_n ≤ α_{n+1}/q_{n+1}, the algorithm outputs a new sequence with α̂_n = α_n + α_{n+1}, q̂_n = q_n + q_{n+1} and R_n = R_{n+1}. It has been proven in [17] that this is the optimal solution.

Figure 4.10. Plot of Λ given by the iterative algorithm to find the optimal ζ array, as a function of p

Solving equation 4.14 with constraints 4.12 and 4.15 can lead to negative rate values: in fact, there is no constraint like R_0 ≥ 0 that should also be verified. Instead of inserting a third constraint, if negative values of rate occur, we force them to zero and then reapply the Lagrangian method to the remaining part of the array. As we will see in the results, this is a good way to deal with negative values. When the resulting array includes negative values, these are in the first positions of the ζ array, meaning that very few bits are protected with strong channel codes; this usually occurs for low values of p.
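For the Gaussian case, equation 4.14 can be inverted in closed form, and Λ can then be found by bisection on the rate constraint. The sketch below is our own simplified illustration: it clamps negative rates to zero inside the bisection, rather than reapplying the method as described above, and it omits the monotonicity conversion of α_j/q_j.

```python
import numpy as np
from math import comb, log

def allocate_rates(M, R, p, sigma2=1.0):
    """Approximate Lagrangian rate allocation (eqs. 4.10-4.14), Gaussian D(R).

    Solves (q_j/alpha_j) D'(R_j) + Lambda = 0 with D(R) = sigma2 * 2^(-2R),
    i.e. R_j = -0.5*log2(Lambda*alpha_j / (2 ln2 sigma2 q_j)), clamped at 0,
    bisecting Lambda until sum_j alpha_j R_j = M*R."""
    j = np.arange(M)
    q = np.array([comb(M, k + 1) for k in j]) * (1 - p) ** (j + 1) * p ** (M - j - 1)
    alpha = np.where(j < M - 1, M / ((j + 1) * (j + 2)), 1.0)

    def rates(lam):
        r = -0.5 * np.log2(lam * alpha / (2 * log(2) * sigma2 * q))
        return np.maximum(r, 0.0)

    lo, hi = 1e-12, 2 * log(2) * sigma2 * (q / alpha).max()  # all rates zero at hi
    for _ in range(100):                                     # bisection on Lambda
        lam = (lo + hi) / 2
        if alpha @ rates(lam) > M * R:
            lo = lam        # total rate too high -> increase Lambda
        else:
            hi = lam
    return rates(lam)

print(allocate_rates(M=3, R=4, p=0.1))   # marking positions R_0, R_1, R_2
```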

4.4.3 Theoretical constraints

From an analytical point of view, avoiding negative values of the rate would imply the following constraint on Λ. The source of our MD-UEP system is a zero-mean Gaussian source with variance σ². For solving equation 4.14, the allocation algorithm has to calculate the derivative of the distortion-rate function at different points R_j.


The derivative is:

dD(R)/dR = −2σ² ln 2 · 2^{−2R}

and so we have:

R = ln( −(dD(R)/dR) / (2σ² ln 2) ) / (−2 ln 2)     (4.16)

The rate R of each description has to be nonnegative:

R ≥ 0 ⇒ ln( −(dD(R)/dR) / (2σ² ln 2) ) ≤ 0
R ≥ 0 ⇒ 0 < −(dD(R)/dR) / (2σ² ln 2) ≤ 1

This leads to:

−2σ² ln 2 ≤ dD(R)/dR < 0

which is true for every R.

For values of p from about 0.3 on, the two descriptions contain the same bits. For these values of p, no gain in terms of quality is achieved when both descriptions reach the destination, because they are identical.

4.6 Performance with M descriptions

We now analyze the performance of MD-UEP coding in frameworks that use many descriptions. We want to see whether using many descriptions leads to lower end-to-end distortion. In figure 4.15 the length of the original source bitstream is MR = 12 bits. The MD-UEP coder uses up to six descriptions to transmit the source information, and the rate R of each description is modified for each system so that the product M · R is always 12 bits. The allocation algorithm of section 4.4.2 outputs the optimal level of redundancy given the probability p; the distortion is then calculated with equations 4.6 and 4.7.

Figure 4.14. Rate allocation for M = 2 descriptions and R = 3 bits

As we can clearly see, using the highest number of descriptions leads to the lowest distortion over a wide range of values of p: the distortion with M = 6 descriptions is 10 dB less than the distortion with M = 2. This can be explained by the fact that the system can rely on a high number of packets M to bring the information to the receiver; into these descriptions the coder inserts the optimal level of redundancy, which depends on p but also on the rate R of each description and on their number M. As seen in section 4.5, at very low probability p, using too many descriptions leads to higher distortion than using a small number, or even only one. The SD system is the best up to p ≈ 7 · 10^{−8}, even if the gain over the best MD system is only around 1 dB. At these values of p, the performance of the MD-UEP system is worse than the SD one because it relies on too many packets in which the redundancy is very low: losing any of them does not allow the decoder to reconstruct a significant portion of the original source. For bitstreams of practical use and congested networks, the values of p are very often in a range where the use of many descriptions is, from a theoretical point of view, always preferred. A comparison with many descriptions and a progressive bitstream of 24 bits is given in figure 4.16. As seen before, using the highest number of descriptions leads to the lowest end-to-end distortion; here, the best system is the one that relies on 12 descriptions. Our practical implementation of MD techniques should give bounds on the number of descriptions to be used.

4.7 Practical rate allocation

The optimal ζ array is made of non-integer values that specify how to partition the bitstream in order to minimize the end-to-end distortion. To satisfy this allocation exactly, we should be able to take fractions of bits from the original bitstream. Clearly, this is not possible in a real system, so we need to round the ζ array generated by the algorithm. We will then guarantee the theoretical rate allocation over a long sequence of samples, where each sample corresponds to a different progressive bitstream.

Figure 4.15. MD-UEP systems for a bitstream of MR = 12 bits

Figure 4.16. MD-UEP systems for a bitstream of MR = 24 bits

Figure 4.17 shows the mechanism for M = 4 descriptions. Given the optimal ζ array generated by the allocation algorithm, the first step is to round its M components to a finite number d of decimal places. After this, we have to verify that their sum still satisfies equation 4.4; if not, the array is tuned by modifying the nonzero components at the end of the array. Then each value is multiplied by R · 10^d, which guarantees that each component is now an integer (multiplying by 10^d alone would be sufficient; the multiplication by R is used to stay coherent with the notation). This new array, which we denote by Rb, contains M integer numbers that sum to R · 10^d. The practical rate allocation algorithm guarantees that the proportion given by the ζ array is satisfied over 10^d samples. With Rb as input, the algorithm generates a new array, denoted Rbpd (repeated bits per description), with M rows and 10^d columns. The array Rbpd is generated as follows. Each column refers to a sample, and the integer number in a cell denotes how many bits of each description are coded with a specific code rate. Row i (with i = 1, . . . ,M) refers to an i/M channel code. For example, if there is a 1 in the first position of a column (i = 1), for that given sample, 1 bit of information will be repeated in all the descriptions (i.e., coded at rate 1/M).

Figure 4.17. Rate allocation mechanism for M = 4 descriptions: the optimal ζ array [ζ_0, ζ_1, ζ_2, ζ_3] is rounded off to [ζ̂_0, ζ̂_1, ζ̂_2, ζ̂_3] and then multiplied by R · 10^d to obtain the Rb array [ζ̂_0 R·10^d, . . . , ζ̂_3 R·10^d]

Figure 4.18. First three columns of an Rbpd array for M = 4 and R = 3 bits (rows are code rates 1/4 . . . 4/4, columns are samples):

1 1 0
1 0 1
1 1 0
0 1 2

If there is an n in the second position of a column (i = 2), n bits in each description belong to a 2/M channel code, and so i × n bits of information are sent. Each column sums to R. Moreover, row i of Rbpd sums to (ζ_{i−1} · R · 10^d): in fact, out of R · 10^d bits, there should be (ζ_{i−1} · R · 10^d) bits to satisfy the original ζ array. Figure 4.18 shows an example of the first columns of a possible Rbpd array for M = 4 descriptions with R = 3 bits; we note that each column sums to 3. With the constraints on the sums of each column and row, the algorithm arranges the numbers in Rbpd minimizing the variance of the numbers in each column, to make the 10^d distortions as similar as possible. We simulate the delivery of j packets (with j = 0, . . . ,M), decoding the portion of the original sample up to the number of bits proportionate to j according to the Rbpd array.


For example, let us consider the first column of the Rbpd array in figure 4.18. If one description is received, we can decode the first bit. If two descriptions are available at the receiver, we can decode 1 · 1 + 1 · 2 = 3 bits of the original bitstream.
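A small sketch of ours for the bookkeeping just described: given one Rbpd column, the number of decodable source bits for j received packets is the sum of i · Rbpd[i] over the layers with i ≤ j:

```python
def decodable_bits(column, j):
    """Source bits decodable from j received packets for one Rbpd column.

    column[i] is the number of bits coded at rate (i+1)/M; a layer coded
    with an (M, i+1) RS code decodes only if at least i+1 packets arrive."""
    return sum((i + 1) * n for i, n in enumerate(column[:j]))

col = [1, 1, 1, 0]                 # first column of figure 4.18 (M = 4, R = 3)
print([decodable_bits(col, j) for j in range(5)])   # -> [0, 1, 3, 6, 6]
```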

4.7.1 Limitations of the allocation algorithm

The main limitation of the practical rate allocation algorithm is that the distortion of the simulated system and the theoretical one do not coincide exactly. This happens because, for a Gaussian source like the one we consider, the rate-distortion function is not linear in the rate but exponential. The difference becomes more significant as the values of ζR get smaller, and the value of p from which the difference becomes relevant decreases as M · R increases. A comparison between the theoretical system and the simulated one is given in figures 4.19 and 4.20 for the two-description case, where we plot the end-to-end distortion as a function of p for R = 3 and R = 6 bits. If the probability p is around 0.2, we should use at least 6 bits for each of the two descriptions to correctly simulate the MD-UEP system; lower rates imply a larger difference between the theoretical and simulated systems. To compare performance with many descriptions, we cannot use a bitstream of 12 bits: in this case, even if the two-description system performs close to the theoretical bounds, for M = 3 descriptions with R = 4 bits the gap is not negligible, as shown in figure 4.21, where the difference between the theoretical and simulated systems is significant up to p ≈ 0.25. If we use R = 8 bits and M = 3, we see in figure 4.22 that the gap is reduced from p ≈ 0.1 on. We can control this limitation of the practical implementation by using long bitstreams, which allow systems that rely on many descriptions to work with large values of R. Fixing a range of values of p, the difference between the theoretical and simulated distortions is reduced as R increases. This bound on R implies an upper bound on the number of descriptions that can be used in our simulated system. A practical result for a bitstream of 48 bits is given in figure 4.23, where the end-to-end distortion is plotted as a function of p. We see that we should not use more than 8 descriptions (each one of 6 bits) to study performance for p > 0.01. In this range of values, using 8 descriptions always outperforms the systems that rely on a smaller number of packets.

Figure 4.19. M = 2 descriptions with R = 3 bits: comparison between simulated and theoretical performance

Figure 4.20. M = 2 descriptions with R = 6 bits: comparison between simulated and theoretical performance

Figure 4.21. M = 3 descriptions, R = 4 bits: comparison between simulated and theoretical performance

Figure 4.22. M = 3 descriptions, R = 8 bits: comparison between simulated and theoretical performance

Figure 4.23. Practical comparison of different MD-UEP systems with a bitstream of MR = 48 bits

Chapter 5

MDSQ coding

The MD technique called Multiple Description Scalar Quantization (MDSQ) consists in creating multiple descriptions of a single source using M different quantizers. Each quantizer has a rate of R bits and offers a particular representation of the source; the representation given by a quantizer is sent to the receiver as a description. The quantizers should be individually good and, at the same time, mutually refining, so that the decoder can get an approximation of the original bitstream even when few descriptions are received. The quality of the representation at the decoder should improve as the number of descriptions that reach the destination increases. We want to create M quantizers whose combined representations of the source are the best possible in proportion to how many of them reach the destination. Descriptions are sent through a network characterized by a probability p of losing a packet that does not depend on the length of the packet. One of the first works to study this technique is [21], where the two-description case is addressed. Our goal is to generalize this technique to more descriptions.

5.1 The two-description case

An MDSQ system for M = 2 descriptions consists of two different quantizers that produce two different representations of the same input source. If one description is received, the decoder will obtain a coarse representation of the input source, while when both descriptions are available at the receiver, a finer representation is generated.


Figure 5.1. Graphical representation of an MDSQ encoder for 2 descriptions

An MDSQ encoder α⁰ outputs, for each input scalar sample x ∈ R, a pair of quantization indices (i_1, i_2). Each index represents a description (or output channel), as shown in figure 5.1, and is represented with R bits. We can denote the encoder mapping by:

α⁰ : R → I_1 × I_2     (5.1)

where symbols in I_i are communicated over channel i (i = 1,2). The indices offer a particular representation of the source information. We can see α⁰ as the composition of an ordinary quantizer encoder α and an index assignment ℓ:

α : R → I     (5.2)
ℓ : I → I_1 × I_2     (5.3)

so that:

α⁰ = ℓ ∘ α     (5.4)

The central decoder estimates the source from the two indices, while the other two decoders estimate x from one index only. The central decoder receives the entire representation and is thus a mapping:

β_0 : I_1 × I_2 → R     (5.5)

Side decoders receive one of the two indices produced by α⁰:

β_i : I_i → R     (5.6)

denoting by i one of the two side decoders.


Figure 5.2. A possible index assignment matrix:

1  2  ·  ·
3  4  6  ·
·  5  7  8
·  ·  9  10

Thus, we can see an MDSQ system as an ordinary quantizer plus an index assignment and two extra decoder mappings β_i (with i = 1,2) for the side decoders. The index assignment ℓ must be invertible, so that the central decoder β_0 can recover the output of α. Vaishampayan [21] introduced a convenient way to visualize the encoder: the indices (i_1, i_2) represent the row and the column of the index assignment matrix. This matrix has 2^R rows and 2^R columns, denoting by R the number of bits used by each of the two channels, and contains every possible output of the regular quantizer. The number of possible output levels is denoted by B, and each level is given by a combination of indices (i_1, i_2). We denote by the term redundancy the expression:

ρ = 2R − log_2 B     (5.7)

The matrix can be filled with up to B = 2^{2R} numbers and with no fewer than 2^R, as explained in [21]. This implies 0 ≤ ρ ≤ R. Let us consider the index assignment in figure 5.2. There, B = 10 levels, and if both indices are received, one of the ten levels is decoded. If only one index reaches the destination, the decoder has to guess which level has been sent within a subset of possible values: in fact, the decoder only knows the row or the column of the original sample. This uncertainty increases as B increases, because more numbers are inserted into the matrix. If the matrix is completely filled, the output of the encoder consists of B = 16 different levels. If both descriptions are received, the quality of the representation is that of a single quantizer with 16 levels, that is, log_2 16 = 4 bits. We note that each index can be represented with R = log_2 4 = 2 bits and thus, according to equation 5.7, ρ = 0. The central distortion Dc (the end-to-end distortion when both descriptions are received) will be very low; but if only one index is received, the side distortion Ds (the distortion when only one index is received) will be high.


This can be summarized as follows: ↑ B ⇒ ↓ ρ ⇒ ↓ Dc ⇒ ↑ Ds. As B increases, the redundancy ρ introduced by the index assignment decreases, and this implies a lower central distortion and a higher side distortion. Many index assignments are possible. An index assignment with low redundancy leads to a central decoder with high resolution, but the side decoders will obtain very coarse representations of the input sequence; this assignment can be used when, most of the time, both indices are received. An index assignment with high redundancy is useful when one of the two descriptions is often lost, that is, when p is high. The price for an individual description being good is that the overall representation has high redundancy. Vaishampayan proposed a generalized Lloyd-like algorithm for MDSQ design that uses Lagrange multipliers to create a scalar distortion criterion; one can then alternate between improving the encoder α and the decoders β_i (i = 0,1,2) until an MDSQ that is locally optimal (for the given ℓ and Lagrange multiplier) is obtained. However, the optimization of ℓ is not included and, as Vaishampayan explains, no method for optimizing the index assignment is known, so several heuristic techniques are given. The basic ideas common to these techniques are that the index assignment matrix should be populated from the main diagonal outward and that the numbering should run from the upper-left corner to the lower-right corner. The redundancy of the index assignment should be updated each time network conditions change: when the probability of losing a packet is low, many diagonals should be filled and, as p increases, their number should be reduced. Figure 5.3 shows a graphical representation of the index assignment matrix of figure 5.2. The black quantizer is the encoder α, and the red and blue quantizers, each one of rate R = 2 bits, represent the index assignment. The output representation of the central decoder has 10 levels. Quantizers created by ℓ are, in general, irregular: the cells of the red and blue quantizers may be composed of disjoint intervals. The redundancy is roughly 4 − log_2 10 ≈ 0.68 (the maximum is 2). This index assignment could be adopted when the probability p is quite low, because it leads to high side distortion and low central distortion.


q(x) 4 4 3 3 2 2 1 1

x

1 Figure 5.3.

5.1.1

2

3

4

5

6

7

8

9 10

Graphical representation of the index assignment of figure 5.2

Theoretical performance

To study the theoretical performance of a MDSQ system, we consider, in equation 3.15, R1 = R2 = R and D1 = D2 = Ds. We denote by β (0 ≤ β ≤ 1) the correlation between descriptions and we have: D1 = D2 = Ds = 2−2βR Dc = 2−4R

(5.8)

1 − (1 − Ds −

1 √

Ds2 − 2−4R )2

We study the quality of a MDSQ system in terms of end-to-end distortion: Distortion = Dc · (1 − p)2 + (2Ds) · p · (1 − p) + σ 2 · p2

(5.9)

With probability (1 − p)2 both packets are received at the decoder and the system achieves the central distortion. With probability 2p(1 − p) one packet is lost and this leads to side distortion Ds. If both packets are lost, distortion equals the source variance σ 2 . As for the UEP system, our goal is to minimize equation 5.9 for each probability of losing a packet p. 69

5 – MDSQ coding

5.1.2

Implementation of the index assignment

The critical step in the implementation of a MDSQ system is the optimization of the index assignment `. No method for this optimization is known. The problem can be reduced to fill a 2R × 2R matrix with no more than 22R numbers, starting from 1 so that the difference between the biggest and the smallest number (denoted by the term spread ) in each row is as small as possible. The method we adopt to fill the two-dimensional index assignment matrix is derived from [1]. There, an algorithm that produces an arrangement of 22R numbers into a 2R ×2R matrix (that is, a completely filled matrix) is proposed and it’s optimal in the sense that minimizes the spread in each row and column. Denoting with N the length of each row or column (N = 2R ), the arrangement consists in filling consecutively, left to right, the upper half-columns of the matrix and then fill the lower half-columns of the matrix in the same manner that is: 1. For i = 1 . . . N , fill the cells (1,i), (2,i), . . . , ( with numbers (i − 1)

N ,i) 2

N N N + 1, (i − 1) + 2, . . . , i 2 2 2

2. For i = 1 . . . N , fill the cells (

N N + 1,i), ( + 2,i), . . . , (N,i) 2 2

with numbers N2 N N2 N N2 N + (i − 1) + 1, + (i − 1) + 2, +i 2 2 2 2 2 2 The arrangement is shown schematically in figure 5.4 and guarantees that the spread is, at most: N (N + 1) −1 (5.10) 2 An example for N = 4 is given in figure 5.5. There, numbers from 1 to 8 are vertically aligned in the upper half-columns of the matrix and the remaining numbers in the lower half-columns. This arrangement guarantees a maximum spread of 9 in each row and column. 70

5 – MDSQ coding 1 1

N 1

2

...

N

...

2N

N/2 N+1 N+2

N

Figure 5.4.

Optimal arrangement to completely fill a N ×N matrix that minimizes the spread in each row

1 2 9 10 Figure 5.5.

5.1.3

3 4 11 12

5 6 13 14

7 8 15 16

Optimal arrangement for N = 4 where the maximum spread is 9

The incomplete matrix

The optimal algorithm proposed in [1] does not deal with arrangements of fewer than N² numbers in an N × N matrix. An index assignment with N² numbers can be used when both indices are received, because the quality achieved is then proportional to log₂ N² = 2R bits. When only one index is received, however, the decoder has great uncertainty about the original quantized sample, because it can only select a row or a column in which N numbers are inserted. An index assignment with N² numbers therefore leads to high side distortion, and implies high end-to-end distortion for high values of p. Thus, we need to fill the matrix with fewer than N² numbers, under the constraint of minimizing the maximum spread. This decreases the number of entries in each row and column and helps the decoder recover the original output level of the source quantizer even when a single index is received. To fill the matrix with fewer than N² numbers, we refer to the techniques proposed in [21]. There, the idea is to fill the index assignment matrix in diagonals, starting from the inner one.

Figure 5.6. A 4 × 4 matrix: order in which the diagonals are filled.

The total number of diagonals of a 2^R × 2^R matrix is:

D = 2^{R+1} − 1   (5.11)

The algorithm scans the matrix according to the method of the complete filling and inserts a number in a given cell if this cell belongs to a diagonal we want to fill. We can control the trade-off between central and side distortion by selecting a matrix with a particular number of diagonals filled. Denoting with (i,j) the cell at the intersection of the i-th row and the j-th column, the algorithm starts from position (1,1) and ends in (N,N). Diagonals are filled from the central one outward, as explained in figure 5.6. The first diagonal of the matrix is the inner one and contains the cells (i,i), with 1 ≤ i ≤ N; it therefore contains N numbers. If we want to insert only N numbers, we should put them in this central diagonal: this guarantees a spread of zero, because in each row and column only one cell is occupied. Figure 5.7 shows the matrix generated by the algorithm when three diagonals are filled. The first cell to be considered is (1,1). It belongs to the first diagonal, so the current number (i.e., 1) is inserted. The algorithm then considers cell (2,1), which belongs to the third diagonal, as figure 5.6 shows. We want to fill the matrix up to the third diagonal, so the next number (i.e., 2) is inserted in (2,1). The algorithm continues with cell (1,2) and so on, until cell (N,N). Figures 5.8 and 5.9 show the performance of different index assignments for R = 2 bits. Substituting R = 2 in equation 5.11, we see that there are 7 diagonals in the matrix, each one corresponding to a particular arrangement. Each arrangement guarantees a maximum spread in each row and column.
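This diagonal filling can be sketched as follows, under our reading of the diagonal order of figure 5.6 (main diagonal first, then even ranks above and odd ranks below it, moving outward); names are our own:

def diagonal_filling(R, n_diag):
    """Fill a 2^R x 2^R matrix using only the first n_diag diagonals:
    cells are scanned in the order of the complete filling, and a cell
    is skipped when its diagonal is not among those selected."""
    N = 2 ** R

    def rank(r, c):                 # 1-based order of the diagonal of (r, c)
        d = c - r
        return 1 if d == 0 else (2 * d if d > 0 else 2 * (-d) + 1)

    # scan order of the complete filling: upper half-columns, then lower
    scan = [(r, c) for c in range(1, N + 1) for r in range(1, N // 2 + 1)]
    scan += [(r, c) for c in range(1, N + 1) for r in range(N // 2 + 1, N + 1)]

    A = [[0] * N for _ in range(N)]  # 0 marks an empty cell
    nxt = 1
    for r, c in scan:
        if rank(r, c) <= n_diag:
            A[r - 1][c - 1] = nxt
            nxt += 1
    return A

With R = 2 and n_diag = 3, this reproduces the 10-number arrangement of figure 5.7.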

Figure 5.7. A possible index assignment for a 4 × 4 matrix with 3 diagonals filled:

1  3  ·  ·
2  4  5  ·
·  6  7  9
·  ·  8  10

Figure 5.8. M = 2 descriptions, R = 2 bits: spread vs number of occupied cells.

Filling four diagonals, for example, implies a maximum spread of 5 and allows 12 numbers to be inserted in the matrix. The spread in each row and column grows approximately linearly with the highest quantized value. The maximum spread in the matrix is the difference between the biggest and the smallest number it contains, and can be used as a bound when evaluating the corresponding spread in each row or column. Another example is given for R = 3 bits in figures 5.10 and 5.11. There, there are 15 different diagonals and the highest quantized level is 2^{2R} = 64.

Figure 5.9. M = 2 descriptions, R = 2 bits: spread vs number of diagonals filled.

Figure 5.10. M = 2 descriptions, R = 3 bits: spread vs number of occupied cells.

Figure 5.11. M = 2 descriptions, R = 3 bits: spread vs number of diagonals filled.

5.1.4 System for two descriptions

The central encoder α quantizes the source with B different levels. According to the assignment ℓ, it generates two indices to be sent as descriptions of the information source. The decoder has its own copy of the index assignment matrix, so that, given the row and column indices, it can decode the original information. If only one index is received, the decoder knows only the row or column in which the original quantized value lies. It can recover a coarse version of the original sample in several ways, for example:

• calculating the centroid of the selected interval, using the Lloyd-Max algorithm to minimize the quantization error, as seen in section 3.2;

• taking one cell as more significant than the others and decoding its value (for example, for a zero-mean Gaussian source, we could take the cell nearest to ⌈B/2⌉ + 1, which refers to the mean of the Gaussian distribution);

• taking the mean value of the numbers in the row and decoding its value.

The performance of each of these methods depends on the distribution of the source.


In general, using centroids is the most accurate method but also the slowest to run. The second and third methods are easier to implement but less accurate. Given an index assignment, the receiver calculates the decoded output value (using centroids or other techniques) for each of the 2^{R+1} rows and columns. These values can be computed before decoding starts, but as network conditions change (and the index assignment matrix is updated) they need to be recalculated. This can become problematic as R increases.
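A sketch of the third method (the names and the levels array are our assumptions: levels[k-1] holds the reconstruction level of the central quantizer cell numbered k in the matrix):

def side_decode(A, levels, index, axis='row'):
    """Coarse reconstruction from a single received index: average the
    output levels of the occupied cells in the selected row or column.
    A is the index assignment matrix, with 0 marking an empty cell."""
    if axis == 'row':
        cells = [k for k in A[index - 1] if k > 0]
    else:
        cells = [row[index - 1] for row in A if row[index - 1] > 0]
    return sum(levels[k - 1] for k in cells) / len(cells)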

5.1.5 Comparison of different index assignments

When network conditions change, the coder reacts by modifying the number of diagonals filled in the matrix. When p is low, many diagonals should be filled because, most of the time, both indices will reach the destination and the decoder will know exactly which cell contains the original sample value. When p is high, it is better to have few diagonals in the matrix: most of the time an index is lost and the decoder can select only a row or a column of the matrix, and if few numbers are inserted it is easier to recover the original level of the quantizer. The matrix, however, then contains few numbers, and the source quantizer has low resolution. In figure 5.12 we show the end-to-end distortion for different values of p and different numbers of diagonals filled, for R = 2 bits per description. The reconstruction when one description is available is based on the third method of section 5.1.4: the decoder sums all the values in the row selected by the received index and divides the result by the number of nonzero entries in that row. According to equation 5.11, the matrix contains 7 diagonals. When p is below 0.01, the matrix is completely filled and the maximum quantized value is 16. The small difference between filling the matrix with 5, 6 or 7 diagonals is due to the fact that B increases only by 1 at each step (the sixth and the seventh diagonal each contain a single element), as we can see in table 5.1. As p increases, the system performs better if the matrix is not completely filled; it reacts to the increasing probability p by reducing the number of occupied cells in the matrix. When p is bigger than 0.01, three diagonals are filled, then two; when p is bigger than 0.6 (that is, most of the time one description is lost), one could put the same index in both descriptions. This introduces high redundancy in the system, and no gain is achieved when both descriptions reach the destination.

Figure 5.12. M = 2 descriptions, R = 2 bits: end-to-end distortion for different index assignments.

diagonals filled           1   2   3   4   5   6   7
max quantized number (B)   4   7  10  12  14  15  16

Table 5.1. M = 2 descriptions, R = 2 bits: number of diagonals filled and the corresponding biggest quantized number in the matrix.

We tune the system by changing the number of diagonals in the matrix. This does not allow very fine tuning, because the first three diagonals contain more than half of the numbers that can be inserted: we can choose the number of diagonals, but we cannot control the maximum quantized value directly. Figure 5.13 shows the case of R = 3 bits. As p increases, the number of occupied cells in the matrix decreases but, around p = 5·10⁻³, the matrix is no longer completely filled as it was with R = 2 bits, and 5 diagonals out of 15 are used. Filling 5 diagonals for R = 3 bits still means that more than half of the matrix is filled (34 numbers out of 64). Adding diagonals would increase the total spread, and the system would not achieve a distortion reduction proportional to the increase of spread. As p decreases, more numbers slowly populate the matrix until it is completely filled.

5.1.6 Comparison with Single Description Coding

We compare a two-description MDSQ system with a system in which all the source information is sent to the destination in one packet (that is, a Single Description, SD, system). In the SD system the source is quantized at the highest resolution, that is with B = 2^{2R} different output levels, and sent in a single packet. If the packet is received, the quality is proportional to M·R bits. If the packet is lost, the distortion equals the source variance σ², because no bits are received.

Figure 5.13. M = 2 descriptions, R = 3 bits: end-to-end distortion for different index assignments.

The MDSQ system tunes the level of redundancy according to p, to achieve the lowest end-to-end distortion. For the SD system, the end-to-end distortion is:

Distortion = D_{MR} · (1 − p) + σ² · p   (5.12)

denoting with D_{MR} the distortion due to the source quantizer (i.e., the sum of granular distortion and overload distortion, as explained in section 3.2). Figure 5.14 shows a comparison in terms of end-to-end distortion, for different values of p, between the SD system and a MDSQ system with R = 3 bits. The MDSQ system outperforms the SD system over a wide range of values, starting from p ≈ 0.01: the SD system relies on a single packet, so that, when this packet is lost, no information is available. When p ≤ 0.01, the index assignment matrix of the MDSQ system is completely filled and no redundancy is added. As with the UEP technique, for these packet loss probabilities the SD system performs better, achieving a lower end-to-end distortion. The difference is due to the fact that MD techniques split the source information into several parts: when no redundancy is added, losing any description means losing part of the source information that cannot be recovered from the other descriptions. As the rate R of each description increases, the crossover between the end-to-end distortions of the two systems occurs at lower values of p; for R = 4 bits it is at p ≈ 7·10⁻⁴. This happens because the MDSQ system can tune the optimal level of redundancy more precisely, ensuring a lower distortion.
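The comparison can be sketched numerically as follows (the distortion values are illustrative placeholders, not the ones measured in our simulations):

import math

def sd_distortion(p, D_mr, var=1.0):
    """Single-description end-to-end distortion, eq. (5.12)."""
    return D_mr * (1 - p) + var * p

def mdsq_distortion(p, Dc, Ds, var=1.0):
    """Two-description end-to-end distortion, eq. (5.9)."""
    return Dc * (1 - p) ** 2 + 2 * Ds * p * (1 - p) + var * p ** 2

for p in (0.001, 0.01, 0.1):
    sd = sd_distortion(p, D_mr=1e-3)         # fine quantizer, MR bits
    md = mdsq_distortion(p, Dc=4e-3, Ds=6e-2)  # coarser central quantizer
    print(p, 10 * math.log10(sd), 10 * math.log10(md))

With these toy numbers the SD system wins at p = 0.001 and the MD system wins at p = 0.01 and p = 0.1, qualitatively reproducing the crossover described above.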

5.2 The M-description case

Generalizing a MDSQ system to more than two descriptions implies using M > 2 different quantizers over the same input. As in the M = 2 case, an M-description system can be seen as a central quantizer followed by an index assignment ℓ in M dimensions. The encoder α0 produces, for each source scalar input x ∈ R, M quantization indices (i1, i2, …, iM). Each of these indices is sent over one of the M channels and corresponds to a description. There are 2^M − 1 receivers, and each receiver can reconstruct an approximation of the original sample from a particular subset of received indices.

Figure 5.14. Comparison between a MDSQ system with two descriptions and a Single Description system, with an original bitstream of 6 bits.

Equations 5.1-5.6 can be generalized in the following way. The encoder α quantizes a random variable x ∈ R:

Quantizer encoder α : R → I   (5.13)

The quantized sample is an integer value Q(x) ∈ I with 1 ≤ Q(x) ≤ B. The index assignment mapping ℓ, given Q(x), outputs M indices, one for each output channel:

Index assignment ℓ : I → I1 × I2 × … × IM   (5.14)

Thus, the encoder can be seen as the composition of α followed by ℓ:

Encoder mapping α0 = ℓ ∘ α   (5.15)

α0 : R → I1 × I2 × … × IM   (5.16)

where symbols in Ii are communicated over channel i, i = 1, 2, …, M. The central decoder receives all the M indices and can estimate the original source information x̂ ∈ R using a decoding mapping β0 given by:

β0 : I1 × I2 × … × IM → R   (5.17)

While the central decoder receives all the descriptions, side decoder i (with 1 ≤ i < 2^M − 1) receives a subset of the indices produced by α0, and can be seen as a mapping βi from the subset of received indices to a coarse representation of the original value:

βi : Ir1 × … × Irn → R   (5.18)

denoting with r1 … rn the n indices, out of M, that reach side decoder i. A schematic representation with M = 3 descriptions is given in figure 5.15. The source is a sequence of values x ∈ R; for each x, the encoder outputs M = 3 indices. Decoders Dec1, Dec2 and Dec3 receive one index; decoders Dec12, Dec13 and Dec23 receive two indices, and the last decoder receives them all. For the two-description case, we represented the indices generated by the encoder as row and column indices of the index assignment matrix. With M descriptions, the M indices produced by the index assignment ℓ can be seen as the coordinates of a cell in an M-dimensional cube.

Figure 5.15. A MD system with M = 3 descriptions. Each decoder can reconstruct an approximation of the original source from the subset of indices it receives over the three channels.

Each index denotes a hyper-plane of the index assignment hyper-cube. This generalizes Vaishampayan's method [21]. The hyper-cube has M dimensions, each of size N = 2^R; in it we can place up to N^M = 2^{MR} different numbers, each corresponding to an output level of the quantizer. Thus, the quantizer has no more than B = 2^{MR} possible output levels. The main design problem is to find an assignment ℓ that guarantees, given p, a good trade-off between the distortions achieved with the reception of different subsets of indices. Our goal is to fill an M-dimensional cube with no more than 2^{MR} numbers, in ascending order starting from 1, so that the spread in every hyper-plane is as low as possible. This guarantees that, if any description is lost, the receiver can still reconstruct the best coarse version of the source value. In fact, if one index is received, the decoder knows in which (M−1)-dimensional hyper-plane the original quantized value lies. As the number of received descriptions increases, the decoder can locate more precisely the cell containing the original quantized value. With M = 3 descriptions, if one index is received, the decoder knows the plane of the cube in which the cell containing the original quantized value is located; when two descriptions are received, a row within that plane is selected; and when all the descriptions reach the destination, the correct cell in the row is identified by the decoder.

Figure 5.16. An index assignment cube for R = 1 bit and M = 3 descriptions.

Figure 5.16 shows an example with M = 3 descriptions and R = 1 bit per description. Suppose that an input x has been quantized to the value 5 by the encoder quantizer. In the index assignment cube, this value is unequivocally identified by the coordinates i1 = 1, i2 = 1, i3 = 2 (the origin of the frame of reference has coordinates i1 = 1, i2 = 1, i3 = 1). The encoder sends each of these indices in a different description. At the receiver, if one index is received, for example i2 = 1, the decoder knows that the quantized value lies in the plane that contains (3,5,7,8). If, in addition to i2, also i1 = 1 is received, the decoder can narrow the candidates down to (3,5), and the uncertainty is reduced. If the remaining index also reaches the destination, the receiver knows exactly that the quantized value was 5.
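A minimal sketch of this side decoding by intersection of the received indices (the 2×2×2 assignment below is a hypothetical one built in scan order, not the cube of figure 5.16):

from itertools import product

def candidates(cube, received):
    """Quantizer levels compatible with the received indices.
    cube: dict mapping (i1, i2, i3) to a level; received: dict mapping
    a dimension (0, 1 or 2) to the received index."""
    return sorted(v for coord, v in cube.items()
                  if all(coord[d] == idx for d, idx in received.items()))

cube = {c: n + 1 for n, c in enumerate(product((1, 2), repeat=3))}
print(candidates(cube, {1: 1}))              # one index: a whole plane
print(candidates(cube, {0: 1, 1: 1}))        # two indices: a row of that plane
print(candidates(cube, {0: 1, 1: 1, 2: 2}))  # all three: a single cell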

5.2.1 Construction of the hyper-cube

We use the technique proposed in [1] to fill the M-dimensional hyper-cube with 2^{MR} numbers, so that the hyper-cube is completely filled. The algorithm proposed in [1] deals with arrangements that completely fill a hyper-cube and has been proven to be nearly optimal; arrangements of fewer than 2^{MR} numbers are not addressed. Thus, to tune the system towards a good trade-off between central and side distortions, we can delete some numbers from the complete hyper-cube. As explained in [1], the problem of completely filling a hyper-cube while minimizing the spread in each row is equivalent to bandwidth minimization in graphs and, in general, it is NP-complete [15]. The proposed arrangement guarantees a spread in each row of at most

B(K₂^M) · (N/2)^M + N/2 − 1   (5.19)

where

B(K₂^M) = Σ_{t=0}^{M−1} C(t, ⌊t/2⌋)

with C(·,·) denoting the binomial coefficient, and N is the length of each dimension (in our case, N = 2^R). First, the algorithm divides the cube into 2^M small cubes by dividing each coordinate nt (1 ≤ t ≤ M; in our case nt = N for every t) into two halves. Then, denoting with A the function that, given a number, returns the coordinates of this number in the selected small cube, it fills the small cube that contains coordinate (1,1,…,1) as follows:

1. A(1) = (1,…,1);

2. the first coordinate of A(m) is the first coordinate of A(m−1) plus 1, modulo n1/2; if, after the modulo operation, the t-th coordinate becomes 1, then the (t+1)-th coordinate increases by 1, modulo n_{t+1}/2.

This completely fills the first small cube. The other small cubes are filled like the first one so that, at each step, the algorithm numbers a neighbor of the smallest already-numbered vertex, taking care that the maximum difference occurs between vertices along the first coordinate n1. We notice that the algorithm proposed in [1] keeps the spread in each row as low as possible, without taking into account the spread in each hyper-plane of the hyper-cube.
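The bound of equation 5.19 is easy to evaluate; a small sketch (names are ours):

from math import comb

def spread_bound(M, R):
    """Row-spread bound of eq. (5.19) for the complete filling of an
    M-dimensional hyper-cube of side N = 2^R."""
    N = 2 ** R
    b = sum(comb(t, t // 2) for t in range(M))   # B(K_2^M)
    return b * (N // 2) ** M + N // 2 - 1

print(spread_bound(2, 2))   # 9, matching eq. (5.10) for N = 4
print(spread_bound(3, 2))   # 33, the value quoted in section 5.2.2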

5.2.2 Complete filling for three descriptions

When M = 3, the problem reduces to finding a numbering for an N × N × N cube. In this case we have

B(K₂³) = Σ_{t=0}^{2} C(t, ⌊t/2⌋) = 4   (5.20)

and the algorithm is shown schematically in figure 5.17: figure 5.17(a) shows the order in which the algorithm considers each hyper-plane of the cube to guarantee the spread of equation 5.19, and figure 5.17(b) shows how each small cube is filled. An example of an index assignment cube with M = 3 descriptions and N = 4 (implying a rate of R = log₂ N = 2 bits per description) is given in figure 5.18. The cube is filled according to figure 5.17; the row index is n1, the column index is n2, and n3 refers to the third dimension of the cube. According to equation 5.19, the maximum guaranteed spread in each row is 33. We also note that the maximum spread in each plane is 51.

Figure 5.17. Schematic representation of the 3-dimensional arrangement: (a) split of the cube, (b) filling of the small cubes.

Figure 5.18. Example of a complete filling for N = 4. The row index is n1, the column index is n2, and n3 refers to the third dimension of the cube:

(a) n3 = 1:
 1   3  17  19
 2   4  18  20
25  27  49  51
26  28  50  52

(b) n3 = 2:
 5   7  21  23
 6   8  22  24
29  31  53  55
30  32  54  56

(c) n3 = 3:
 9  11  33  35
10  12  34  36
41  43  57  59
42  44  58  60

(d) n3 = 4:
13  15  37  39
14  16  38  40
45  47  61  63
46  48  62  64

5.2.3 The incomplete cube

From now on we focus on the case M = 3, so we have to find arrangements of fewer than N³ numbers in a cube of side N = 2^R. Following Vaishampayan's method, to get a good trade-off between central and side distortions, our idea is to remove numbers from the outer cells of the completely filled cube, decreasing the maximum spread in each row and plane. This is equivalent to inserting numbers starting from the main diagonal outward. As we remove numbers, the side distortions decrease and the central distortion increases. We want to tune central and side distortions to obtain a low end-to-end distortion for the given p. When p is quite high, low side distortions are required, because some indices will be lost and the end-to-end distortion will depend strongly on the side distortions. When p is low, most of the time all descriptions are received and we want a low central distortion. The idea is to consider all the diagonals of the cube and to start filling from the inner diagonal outward, as done for the two-description case. The first diagonal to be filled is always the central one, which includes the cells (i,i,i) for 1 ≤ i ≤ N.


The diagonals to be filled next are the nearest to the central one, and include the cells that differ by ±1 in one coordinate, that is (i±1,i,i), (i,i±1,i), (i,i,i±1). If we need to fill the cube with more numbers, the algorithm then selects diagonals that differ from the main one in more coordinates and by larger offsets. It is important to notice that, as we move toward the boundary of the cube, the diagonals contain fewer cells. The total number of diagonals D in the cube is given by:

D = (2N − 1)² − N(N − 1)   (5.21)

The input of the algorithm that fills the cube is the number of diagonals to be filled. The algorithm scans the cube along the path of the complete filling and inserts the current number in a given cell if this cell belongs to a diagonal that has to be filled; otherwise, the cell is skipped and the next one is selected. Numbers are always taken in ascending order starting from 1, and when a number is inserted in a cell, the next number is considered. Let us now see how the spread in each row and in each plane grows with the number of diagonals filled, for M = 3 descriptions of R = 2 bits each. In figure 5.19 the spread is plotted as a function of the number of diagonals filled; it is approximately linear in the number of diagonals. Figure 5.20 shows the maximum spread guaranteed by the algorithm as a function of the biggest number inserted in the cube. We notice that when 10 diagonals out of 37 are filled, half of the cube is already occupied: the innermost diagonals contain many cells, while the diagonals near the boundary of the cube contain few. As in the two-description case, this method does not allow very fine tuning: we can choose the number of diagonals, but we cannot control the maximum quantized value, so we cannot insert exactly as many numbers as we would like.

5.2.4 System for three descriptions

Given the probability p of losing a packet in the network, the encoder of the MDSQ system modifies the index assignment cube and sends the three indices that refer to each quantized sample. The encoder selects the index assignment ℓ that minimizes the end-to-end distortion, with an exhaustive search through all the possible fillings of the cube, considering different numbers of diagonals.

Figure 5.19. M = 3 descriptions, R = 2 bits: spread vs number of diagonals filled.

Figure 5.20. M = 3 descriptions, R = 2 bits: spread vs number of occupied cells.

In equation 5.21, the number of diagonals depends on the size of the cube N (that is, on the rate R of each description). The encoder calculates the end-to-end distortion for each of the possible fillings; as we can note, complexity grows as O(2^{2R}). The decoder has a copy of the index assignment cube, so that, when it receives a subset of indices, it can select a cell, a row or a plane in the cube and thus decode an approximate version of the original source, according to the number of received indices. We notice that the index assignment cube must be kept updated at the receiver whenever the copy at the encoder is modified. To reconstruct an approximation of the original sample given the received indices, the decoder can use centroids or the analogous techniques seen in section 5.1.4. As for M = 2 descriptions, the system computes the mean of all the possible values selected by the received indices and decodes that value. This allows the decoder to precompute the values to be decoded for any possible subset of received indices, given the index assignment cube. This can be done before the transmission starts, but it requires computing the mean of the values in 3N planes and in 3N² rows, with N = 2^R; the computation is at least exponential in the rate R of each description.

5.2.5 Implementation for R = 2 bits

According to equation 5.21, for R = 2 bits per description (N = 2^R = 4) the encoder can fill the index assignment cube in 37 different ways, each corresponding to a particular number of diagonals filled (from 1 to 37). For each filling, it calculates the average end-to-end distortion over the 3N = 12 planes and 3N² = 48 rows, and selects the index assignment that guarantees, given p, the lowest end-to-end distortion. Then, for each quantized input value, it sends the three indices that uniquely identify its coordinates in the cube. The encoder needs to know which cells should be occupied, given the number of diagonals to fill. Thus, it maps all the cells into points of a new frame of reference: each cell of the index assignment cube is represented by coordinates (i1,i2,i3), and the algorithm maps these coordinates to (i1 − i3, i2 − i3, 0) = (i1 − i3, i2 − i3). The mapping for R = 2 bits is shown in figure 5.21; each coordinate is labelled with a number from 1 to 37.

Figure 5.21. Order of the diagonals filled for an index assignment cube with R = 2 bits. The number above each point is the label assigned by the algorithm, and the coordinate in the new frame of reference is indicated below the point.

For example, after the change of frame of reference, the cells on the main diagonal are all referred to as (0,0), which is labelled 1: if 1 diagonal has to be filled, those cells are the ones occupied. Cells whose new coordinate is (0,1) contain a value if the number of diagonals to be filled is equal to or greater than 3. To fill the index assignment cube, the encoder scans it along the path of the completely filled cube of section 5.2.2, from (1,1,1) to (N,N,N), and inserts a number if the cell belongs to a diagonal that has to be considered. The decoder uses the same index assignment cube: it receives a subset of indices and estimates the original sample as seen in section 5.2.4. In our simulation, when the best end-to-end distortion (for a given p) is achieved with 3 diagonals filled, 10 numbers are inserted in the index assignment cube. The arrangement is shown in figure 5.22 and guarantees a maximum spread of 2 in each plane and in each row. The equivalent encoder α0 is shown in figure 5.23. The quantizer encoder α generates 10 output levels, sampling a Gaussian random variable with zero mean and unit variance; through the index assignment ℓ, three indices are then generated, each representing a coarse version of the source. For example, the quantized value 8 is represented by (i1,i2,i3) = (3,4,3). The index assignment can thus be seen as an arrangement of numbers in a cube (figure 5.22) or as three different quantizers over the same source information (figure 5.23).
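A minimal sketch of this cell classification (names are our own; the exact label order of figure 5.21 is produced by the encoder's scan, which we do not reproduce here):

from itertools import product

def diagonal_of(cell):
    """Reduced coordinate (i1 - i3, i2 - i3) identifying the diagonal
    to which a cell (i1, i2, i3) of the cube belongs."""
    i1, i2, i3 = cell
    return (i1 - i3, i2 - i3)

N = 4
cells = list(product(range(1, N + 1), repeat=3))
diagonals = {diagonal_of(c) for c in cells}
print(len(diagonals))   # 37 diagonals for N = 4, as given by eq. (5.21)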

Figure 5.22. Index assignment for the MDSQ system of figure 5.23. The row index is i1, the column index is i2, and i3 refers to the third dimension of the cube.

Figure 5.23. Central quantizer encoder α and the side quantizers obtained through an index assignment ℓ for M = 3 descriptions of R = 2 bits each.

We note that the 10 different levels of the central encoder can be represented with ⌈log₂ 10⌉ = 4 bits, while the total output is MR = 6 bits (each of the three quantizers uses ⌈log₂ 4⌉ = 2 bits). This shows that, in order to guarantee robustness to the loss of descriptions, some redundancy has been added. Consider again the example in which the source is quantized by α to the value 8: if the decoder receives i1 = 3 and i2 = 4, it can already decode the value 8, and the third index adds no new information. This happens because few numbers are inserted into the cube; this arrangement can thus be used when, most of the time, one description out of three is lost. In figure 5.24 we plot the central distortion Dc and the two side distortions (Ds1 when one packet is received, Ds2 when two out of three reach the destination). The behavior is coherent with our initial considerations: as the number of diagonals increases, the central distortion decreases and the side distortions increase. When only the first diagonal is filled, the three distortions coincide, because the same index is repeated in all the descriptions and one is already sufficient to decode a coarse value of the original sample; no improvement is achieved when more are received. As the number of diagonals increases, Ds1 rises to around 0 dB: it is then not possible to estimate the original source with sufficient precision from the reception of a single description.

Figure 5.24. M = 3 descriptions, R = 2 bits: central and side distortions as a function of the number of diagonals filled.

5.3 Simulation performance

Due to the exponential implementation complexity, we compare a 2-description MDSQ system with a 3-description one. The complexity of our implementation, even with M = 3 descriptions, is still exponential in the number of bits R of each description; we therefore use R = 2 bits for M = 3 descriptions and R = 3 bits for M = 2, so that the original value is quantized with MR = 6 bits in both cases. Figure 5.25 shows the end-to-end distortion for different values of p. For very low or very high probability p, the two systems lead to the same end-to-end distortion. For low values of p, the number of diagonals filled is high in both systems; most of the time all the indices are received, and the distortion is therefore proportional to the quality achieved with MR bits. When p is high, both systems introduce high redundancy, creating very similar descriptions; consequently, even if more than one description is received, no gain in terms of reduced distortion is achieved. In the range between p ≈ 7·10⁻⁴ and p ≈ 0.5, the 2-description system gains up to approximately 3 dB over the 3-description system. This may be caused by the fact that we do not know the optimal way to fill the index assignment cube. Figure 5.26 plots the number of diagonals as a function of p for the two systems. As p increases, the algorithm chooses an index assignment with a decreasing number of diagonals to get the best end-to-end distortion: as p increases, the probability of receiving a small subset of indices increases, and with few numbers inserted in the index assignment it is possible to estimate a coarse version of the original value from few indices. We also note that for a wide range of values of p (from p ≈ 0.1 to p ≈ 0.5) the number of diagonals filled is constant, while other diagonal fillings never lead to the best end-to-end distortion. In fact, for a fixed spread, some arrangements guarantee that spread while inserting more numbers than others; the former are always chosen instead of the latter. The algorithm we used to fill the index assignment cube minimizes the spread in each row and not in each plane of the cube. This leads to some sub-fillings of the cube in which the spread in some plane can be high; as seen in figure 5.24, the side distortion when one index is received (i.e., when the decoder has to select a plane) increases quickly, starting from a low number of diagonals filled. To quantify how much this problem affects the overall system, for R = 2 bits it is possible to find by hand good arrangements of numbers that guarantee low spreads in each plane of the cube; in particular, we have found by hand some arrangements that guarantee a given spread with as many numbers as possible in the cube. An example is given in figure 5.27: there, the arrangement guarantees a spread of 3 while inserting up to B = 12 numbers. Substituting these arrangements for those produced by the algorithm, the distortion of the 3-description system is reduced by approximately 0.5 dB around p ≈ 0.1, but it is still higher than in the 2-description case. The 3-description MDSQ system leads to higher end-to-end distortion not only because the index assignment is not optimal: performance is also worse because we should not compare these systems at such a small value of R. This fragments the original sample too much, and taking subsets of these fragments does not lead to a good approximation of the source.

Figure 5.25. Comparison between MDSQ systems with 2 and 3 descriptions, MR = 6 bits.

Figure 5.26. M = 2 and M = 3 descriptions: number of diagonals filled as a function of p.

Figure 5.27. An index assignment with spread 3 and B = 12 output levels.

We also compare the performance with M = 2 and M = 3 descriptions when B = 2^{MR} numbers are used, that is, when the matrix and the cube are completely filled; for each p, we then check the end-to-end distortion. The arrangement for the completely filled matrix is optimal, and the arrangement for the cube is nearly optimal, as proven in [1]. Results are shown in figure 5.28. The two systems lead to approximately the same end-to-end distortion, with the two-description case slightly better. This could be explained by the fact that, for M = 3 descriptions, the side distortions are very high, as seen in figure 5.24, even when the probability p is low (many diagonals filled), and this prevents a low end-to-end distortion.

Figure 5.28. M = 2 and M = 3 descriptions: completely filled arrangements.

Chapter 6

Comparison and conclusions

MD coding is a powerful approach to combat packet losses in networks where retransmission is not always possible; it is thus suitable for real-time applications like audio and video broadcasting. MD coding can be applied over large diversity networks, where packets can flow from the source to the destination through many different paths and the probability of losing packets over these paths is not negligible. In this project we have studied the generalization of two well-known MD techniques to handle many descriptions. We have seen that the critical step in both methods is to insert the appropriate level of redundancy in each description according to network conditions.

6.1 UEP coding

The proposed UEP technique makes it possible to produce many descriptions from a progressive bitstream. Results show that using a large number of descriptions leads to lower end-to-end distortion over a wide range of network failure probabilities, and that this coding in general outperforms techniques that rely on a single packet to deliver all the information. The algorithm that computes the optimal level of redundancy adapts its output to network conditions, so that no more than the needed redundancy is added. Moreover, its complexity is linear in the number of descriptions, which allows it to adapt quickly to new network conditions and to a different number of descriptions. As concerns our practical implementation of the system, the simulator can be improved to handle longer input bitstreams of practical use.

Figure 6.1. UEP system: Source → Quantizer → UEP coder → network → UEP decoder; the loss probability p drives the UEP coder.

This would allow the system to perform very close to the theoretical bounds. For our purposes, we have supposed that all the Reed-Solomon codes requested by the MD system exist, and we have not implemented a real RS codec; using sufficiently long input bitstreams and packets, the required Reed-Solomon codes can be created with standard software. To adopt this technique over a network that handles packets of different lengths, we should check that the probability p of dropping a small packet is the same as that of dropping a bigger one, as we have assumed throughout this project. In general, when a network becomes congested, routers run short of memory to store packets and start dropping all the packets they receive, whether small or big. To make comparisons fair, the MD system requires the total number of output bits of the UEP coder to be constant, independently of the number of descriptions we decide to use. If we want to use many descriptions, in order to keep the output rate fixed we have to put few information bits in each packet; this could lead to an unprofitable total length/payload ratio, which bounds the number of descriptions usable in our system. Moreover, each description needs an overhead containing an identifier of the description, because the decoder needs to know to which layer the received bits belong. When the ζ array is updated at the encoder, the decoder also needs to be updated to know the new configuration of layers. A buffer at the receiver would also be necessary, to store bits that have to wait for a specific number of descriptions before being decoded to ensure a particular quality layer. Our implementation supposes that each packet contains one description. In general, however, each packet could contain many descriptions, and this would allow different quality layers to be decoded even from the first packets that reach the destination.

Figure 6.2. MDSQ system: Source → Quantizer with Index Assignment (MDSQ coder) → network → Reverse Index Assignment (MDSQ decoder); the loss probability p drives the MDSQ coder.

6.2 MDSQ coding

The main limitation to increasing the number of descriptions M in a MDSQ system is the complexity, which is exponential in both the number of descriptions and the rate R. Minimizing the end-to-end distortion for a given probability p implies finding the optimal trade-off between the distortions, in terms of diagonals filled in the M-dimensional index assignment hyper-cube, and the number of diagonals to check quickly increases with both M and R. Given an arrangement of numbers in the hyper-cube, no method is known to check whether it is a good arrangement that minimizes the spread in all the hyper-planes: the index assignment algorithm should look within a set of good possible assignments and choose the best one according to network conditions. Our algorithm, both for M = 2 and M = 3 descriptions, checks different arrangements of numbers starting from a completely filled matrix (or cube) where the spread in each row and column is as low as possible (and which, for M = 2, has been proven to be optimal). For two descriptions, the constraint on the spread in each row is well suited because there are only two dimensions. For M = 3 descriptions, it would be better to start from a completely filled cube where the spread is minimized over planes, so that we can better control the side distortion when only one description is received; moreover, minimizing the spread in a plane would also imply a suitable spread in all the rows belonging to it. Even considering a hyper-cube in which the spread in each plane is the lowest possible, we do not know whether starting from the completely filled arrangement and removing cells from the outside inward would guarantee optimality for the sub-filling arrangements as well. We could also improve the index assignment algorithm so that, when network conditions change, the encoder does not have to re-scan all possible arrangements to get the new best one, but only a smaller subset.

Figure 6.3. M = 2 descriptions, R = 3 bits per description: comparison of the end-to-end distortion for MDSQ and UEP, theoretical and simulated.

6.3 Theoretical bounds for UEP coding

We now show how, from a theoretical point of view, UEP coding leads to higher end-to-end distortion than MDSQ, at least in the two-description case. Figure 6.3 shows the end-to-end distortion as a function of p for MDSQ and UEP with M = 2 descriptions and R = 3 bits; both theoretical and simulated performance are plotted. MDSQ coding outperforms the UEP method, and even the best theoretical UEP coder leads to higher end-to-end distortion than MDSQ. In figure 6.4 we compare the trade-off between central and side distortion for these systems. The MDSQ system achieves pairs of Dc and Ds that guarantee better results: for example, fixing the central distortion at Dc ≈ −30 dB, it is possible to find an index arrangement for MDSQ that leads to Ds ≈ −12 dB, while a UEP system with the same central distortion has a higher side distortion (approximately Ds ≈ −6 dB).

Figure 6.4. M = 2 descriptions, R = 3 bits per description: trade-off between central and side distortion for UEP and MDSQ coding.

The MDSQ coder receives as input the original source information and quantizes it with M different quantizers. The UEP coder, instead, receives as input a quantized representation of the source and creates descriptions from that representation. Figures 6.1 and 6.2 show a schematic representation of the two systems and of how the probability p modifies the output of the encoder. For UEP, the quantizer is outside the MD-UEP coder and always has 2^{MR} different output levels; in the MDSQ system, on the other hand, the central quantizer is included in the encoder and modifies its output levels according to p. UEP coding can be seen as a MDSQ technique with a particular index assignment. Consider, in a MD-UEP system with two descriptions, a memoryless Gaussian source with squared-error distortion, in the case where, if the description that contains the first R bits of the original bitstream is received, we can decode all the R bits and not only the first ζR. As seen in section 4.3.1, we obtain:

D1 = σ² · 2^{−2R}
D2 = σ² · 2^{−2ζR}
D0 = σ² · 2^{−2(2−ζ)R}   (6.1)


Figure 6.5. Index assignment matrix for a MD-UEP system with M = 2 descriptions and R = 2 bits, with ζ = 0.5:

     0  1  2  3
 0   0  1  ·  ·
 1   2  3  ·  ·
 2   ·  ·  4  5
 3   ·  ·  6  7

Requiring D1 and D2 to match these bounds, we can compare the resulting D0 with the bound of equation 3.16, that is:

D0 ≥ σ² · 2^{−2(R1+R2)} · γD(R1,R2,D1,D2)   (6.2)

where

γD = 1 / (1 − (√((1 − D1)(1 − D2)) − √(D1·D2 − 2^{−2(R1+R2)}))²)

The maximum, over all rates and side distortions, of the difference between the D0 of equation 6.1 and that of equation 6.2 is bounded at about 4.2 dB [6]. This can be taken as a positive result, but the gap is significant. To understand this gap better, we can look at UEP in the context of scalar quantization, considering a regular eight-level quantizer with outputs numbered consecutively from 0 to 7. We write each output as a binary number (b2 b1 b0)₂. The most significant bit b2 has the largest effect on the distortion and can hence be included in both descriptions, while each of the remaining bits is included in one description only. This gives descriptions (b2 b1)₂ and (b2 b0)₂, which can be associated with MD scalar quantizers by letting α1 = (b2 b1)₂ and α2 = (b2 b0)₂. The index assignment matrix for this quantizer is shown in figure 6.5. This index assignment is reasonable, because it fills entries close to the diagonal from top-left to bottom-right, but other index assignments with more cells along the diagonals lead to better performance. For example, in figure 6.6 we show a possible index assignment with the same spread (s = 2) but with more numbers; it leads to lower end-to-end distortion than the one obtained with the UEP technique. In the more general case with b most significant bits repeated over both channels, the index assignment matrix is block diagonal with 2^b blocks. This explains why MD-UEP techniques (at least for M = 2 descriptions) lead to higher average end-to-end distortion than MDSQ systems.
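The construction above is easy to reproduce; a small sketch that rebuilds the matrix of figure 6.5 (names are ours):

def uep_descriptions(v):
    """Split a 3-bit output v = (b2 b1 b0) into the UEP descriptions
    (b2 b1) and (b2 b0): the most significant bit is duplicated, the
    other two bits are sent once each."""
    b2, b1, b0 = (v >> 2) & 1, (v >> 1) & 1, v & 1
    return (b2 << 1) | b1, (b2 << 1) | b0

matrix = [[None] * 4 for _ in range(4)]
for v in range(8):
    r, c = uep_descriptions(v)
    matrix[r][c] = v            # block-diagonal structure of figure 6.5
for row in matrix:
    print(row)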

Figure 6.6. An index assignment matrix for M = 2 descriptions and R = 2 bits with spread 2.

6.4 Future work

Future work on the UEP technique should improve the simulator to generate packets that fit a real application environment and to assess faithfully the level of redundancy introduced by the system. These techniques should also be applied to a real progressive bitstream, such as JPEG 2000 [20] or SPIHT [18]. As concerns the MDSQ coder, future work should focus on developing good algorithms to improve the index assignment and to reduce its complexity. With such algorithms it could be possible to scale MDSQ to more than three descriptions and to rates of practical interest.


Bibliography

[1] T. Y. Berger-Wolf and M. A. Harris. Sharp bounds for bandwidth of clique products. Submitted to SIAM Journal on Discrete Mathematics, October 2002.
[2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991.
[4] S. Deering and R. Hinden. Internet Protocol, version 6 specification. Network Working Group, Request for Comments 1883, December 1995.
[5] A. A. El Gamal and T. M. Cover. Achievable rates for multiple descriptions. IEEE Transactions on Information Theory, 28:851–857, November 1982.
[6] V. K. Goyal. Multiple description coding: compression meets the network. IEEE Signal Processing Magazine, 18(5):74–93, September 2001.
[7] V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18:9–21, September 2001.
[8] V. K. Goyal, J. Kovacevic, and M. Vetterli. Multiple description transform coding: robustness to erasures using tight frame expansions. In Proc. IEEE Int. Symp. Information Theory, Cambridge, MA, page 408, 1998.
[9] S. Lin and D. J. Costello. Error Control Coding: Fundamentals and Applications. Prentice-Hall, NJ, 1983.
[10] S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, IT-28(2):129–137, March 1982. Originally an unpublished Bell Telephone Laboratories technical memo, July 31, 1957.
[11] F. J. MacWilliams and N. J. Sloane. The Theory of Error Correcting Codes. Elsevier North-Holland, 1977.
[12] M. Fleming and M. Effros. Generalized multiple description vector quantization. In IEEE Data Compression Conference, Snowbird, Utah, pages 3–12, March 1999.
[13] A. Mohr, E. Riskin, and R. Ladner. Unequal loss protection: graceful degradation of image quality over packet erasure channels through forward error correction. IEEE Journal on Selected Areas in Communications, 18(6):819–828, June 2000.
[14] L. Ozarow. On a source coding problem with two channels and three receivers. Bell System Technical Journal, 59(10):1909–1921, December 1980.
[15] C. H. Papadimitriou. The NP-completeness of the bandwidth minimization problem. Computing, (16):263–270, 1976.
[16] S. S. Pradhan, R. Puri, and K. Ramchandran. (n,k) source channel erasure codes: can parity bits also refine quality? In Proceedings Conference on Information Sciences and Systems (CISS), Baltimore, MD, March 2001.
[17] R. Puri, T. Kim, and K. Ramchandran. Multiple description source coding using forward error correction. In 33rd Asilomar Conference on Signals, Systems and Computers, volume 1, October 1999.
[18] A. Said and W. A. Pearlman. A new, fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243–250, June 1996.
[19] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, (27):379–423, July 1948. Continued 27:623–656, October 1948.
[20] D. S. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer, 2001.
[21] V. A. Vaishampayan. Design of multiple description scalar quantizers. IEEE Transactions on Information Theory, 39(3):821–834, May 1993.
[22] V. A. Vaishampayan, N. J. A. Sloane, and S. D. Servetto. Multiple description vector quantization with lattice codebooks: design and analysis. IEEE Transactions on Information Theory, 47(5):1718–1734, July 2001.
[23] R. Venkataramani, G. Kramer, and V. K. Goyal. Bounds on the achievable region for certain multiple description coding problems. In IEEE Int. Symp. Information Theory, Washington, DC, page 148, June 2001.