Convex coders and oversampled A/D conversion: theory and algorithms

Nguyen T. Thao and Martin Vetterli¹

Department of Electrical Engineering and Center for Telecommunications Research, Columbia University, New York, NY 10027-6699

December 16, 1991

¹ Work supported in part by the National Science Foundation under grant ECD-88-11111.

Abstract

Signal reconstruction in oversampled A/D conversion (ADC) is classically performed by lowpass filtering the quantized signal. This leads to a mean squared error (MSE) inversely proportional to $R^{2n+1}$, where $R$ is the oversampling rate and $n$ is the order of the converter. However, while the estimate given by this reconstruction has the same bandwidth as the original analog signal, we show that it does not necessarily lead to the same digital sequence when fed into the same A/D converter. Moreover, under some assumptions, we show analytically that an estimate having the same bandwidth and giving the same digital sequence as the original signal yields an MSE with an upper bound inversely proportional to $R^{2n+2}$ instead of $R^{2n+1}$; that is, an improvement of 3 dB per octave of oversampling, regardless of the order of the converter. We propose a structural analysis covering most currently known configurations of oversampled ADC, which enables us to identify the sets of input analog signals giving the same digital sequences. This analysis is based on the direct knowledge of the conversion mechanisms and includes circuit imperfections if they can be measured. The inherent convexity of these sets gives us theoretical means to achieve estimates which have the same bandwidth as, and reproduce the digital sequence of, a given input signal. We design computer-implementable algorithms which achieve such estimates. Using these algorithms, we performed numerical experiments on sinusoidal inputs which confirm the $R^{2n+2}$ behavior of the signal reconstruction MSE. Moreover, we observe no loss of this $R^{2n+2}$ behavior when the converters include circuit imperfections of up to 1%.

Contents

1 Introduction

2 Analysis of ADC signal partition
  2.1 Simple ADC
  2.2 Single-quantizer conversion
  2.3 Convexity of signal cells and projection
  2.4 Signal cell of a code subsequence, convexity and projection
  2.5 Multi-quantizer conversion
  2.6 Conclusion

3 MSE analysis of optimal signal reconstruction in oversampled ADC
  3.1 Modelization of bandlimited and periodic signals ($V_0$)
  3.2 Simple ADC
  3.3 nth order single-quantizer converter
  3.4 nth order multi-quantizer converter
  3.5 Conclusion

4 Algorithms for the projection on $\Gamma_I(C_0)$
  4.1 Introduction
  4.2 Specific formulation of the constrained minimization problem and properties
  4.3 Zeroth order converter (simple, dithered, predictive ADC)
  4.4 1st order converter (1st order $\Sigma\Delta$ modulation)
  4.5 Higher order converter (nth order $\Sigma\Delta$ modulation)
    4.5.1 Introduction
    4.5.2 Recursion principle and first algorithm
    4.5.3 More practical version of algorithm 3.1
    4.5.4 Computation of $W'_{p,i}$, $V'_p$
    4.5.5 Recapitulation
    4.5.6 Algorithms 3.4, 3.5 with limited buffer

5 Numerical experiments
  5.1 Validation of the $R^{-(2n+2)}$ behavior of the MSE with sinusoidal inputs
  5.2 Efficiency of projection algorithms for $n \geq 2$
  5.3 Specific tests for 1st and 2nd order $\Sigma\Delta$ modulation
  5.4 Further improvements with relaxation coefficients
  5.5 Circuit imperfections

A Appendix
  A.1 Proof of lemmas 3.2 and 3.5
    A.1.1 Preliminary notations and lemmas
    A.1.2 Proof of lemma 3.2
    A.1.3 Proof of lemma 3.5
    A.1.4 Proof of lemma A.2
    A.1.5 Proof of lemma A.5
  A.2 Proof of fact 4.5
  A.3 Proof of fact 4.7
  A.4 Proof of fact 4.12
    A.4.1 Preliminaries
    A.4.2 Proof of fact 4.12
    A.4.3 Analytical expressions of M(l) and N(l) versus l
    A.4.4 Proof of fact A.11
    A.4.5 Proof of fact A.12
  A.5 Proof of fact 4.14
  A.6 Study of algorithm 3.4 in the case n = 2
    A.6.1 Expression of $W'_{p,i}$, $V'_p$
    A.6.2 Proof of conjecture 4.6 in the case n = 2
    A.6.3 Exponential decay of $W'_{p,i}$ for increasing p
  A.7 Proof of fact 4.15

Chapter 1

Introduction

Analog-to-digital conversion (ADC) is classically viewed as the approximation of an analog signal by another signal which has a discrete representation in time and amplitude. When the sampling rate is greater than or equal to the Nyquist rate, no information is lost in the discretization in time; the loss of accuracy comes only from amplitude quantization (discretization in amplitude). The basic motivation in high-resolution ADC is to minimize the error signal when reconstructing the analog signal from the quantized signal. This minimization is often based on statistical considerations. For example, when the ADC consists of a simple uniform quantizer, it is assumed that the error signal has a uniform distribution and a flat spectral density. These assumptions have been the starting point for the development of oversampled ADC. While this model of quantization error is not completely justified [10], it leads to good predictions of practical results. In the case of simple ADC (simple quantization of the sampled signal), the error signal power is expected to be reduced in proportion to the oversampling rate $R$ by lowpass filtering the quantized signal ($R = \frac{f_s}{2 f_m}$, where $f_s$ is the sampling frequency and $f_m$ the maximum frequency of the input signal). The white noise assumption has also been the foundation for the design of noise-shaping converters (multi-loop and multi-stage $\Sigma\Delta$ modulators, interpolative modulators) [1, 4, 6, 8]. When the noise shaping is performed by purely integrating filters, the power of the error signal contained in the digital output can be reduced in proportion to $R^{2n+1}$ by a single lowpass filtering [2]. These ADC techniques have been very successful owing to their simplicity of implementation and the good reconstruction performance obtained with real-time linear processing [5].

However, in order for the assumption of white quantization noise to hold in practice, some constraints have to be imposed on the ADC operation. For example, in simple ADC, the quantization noise loses its 'whiteness' when the quantizer resolution is too coarse. In every type of ADC, a certain level of linearity and matching has to be achieved by the built-in analog circuits of the converter (quantization thresholds, D/A feedback). The sensitivity to circuit imperfections increases with the complexity of the converter and becomes a limitation to high ADC resolution.

While the white quantization noise assumption has been the driving force in the development of most oversampled A/D converters, our motivation is to study optimal signal reconstruction without any statistical criterion, directly from the knowledge of the converter's mechanisms. In our sense, optimality is achieved when the reconstructed signal meets every characteristic known of the original signal. In particular, we will require the estimate to give the same digital output sequence as the original signal when fed into the same converter.

The motivation behind this criterion originally comes from the simple situation shown in figure 1.2. We plot an example of a bandlimited analog signal generated numerically and oversampled by 4. The crosses show the result of classical reconstruction in oversampled ADC, obtained by lowpass filtering the quantized signal (black dots). The figure shows that this estimate does not yield the same digital sequence as the original; this can be seen at time indices 10 and 11. Let us now perform the following transformation: we project the two critical samples of the estimate ($k = 10$ and $k = 11$) to the nearest border of the quantization interval which actually contains the original samples. This interval can be identified from the knowledge of the original digital sequence. We end up with a new estimate which yields the same digital sequence as the original analog signal and is at the same time better than the first estimate: the distance between the estimate and the original signal has necessarily been reduced. This improvement is automatic and does not depend on any assumption about the variations of the analog signal.

Further remarks apply to this simple example. Our estimate transformation does not require that quantization be uniform: it is based on the knowledge of the thresholds' positions and is applicable whether the nonuniformity was designed or comes from circuit imperfections. In the latter case we simply require the quantization thresholds of the converter to be measurable. As a second remark, we do not even require the thresholds to be constant in time, provided that their variations are known or can be determined. This is typically the situation of dithered ADC, where the particular condition is that the dither be known in the time domain. It is also the less obvious situation of $\Sigma\Delta$ and interpolative modulation, where, as will be shown in chapter 2, the feedback loop can be seen as a computable input dither. Our last remark is the following: the reconstruction improvement of figure 1.2 can be pushed further. Indeed, the new estimate has no reason to be bandlimited like the original signal; therefore, it can be further improved by a second lowpass filtering. Again, the signal thus obtained may not yield the same digital output, so the improvement procedure can be reiterated.

This simple example is in fact the starting point of our approach to ADC. What was used in this example is the knowledge, provided by the digital output, of certain amplitude constraints that the input signal must meet. Qualitatively speaking, once these constraints are identified, we can improve any estimate which does not meet them by a projection. If the digital sequence given by the input signal is $C$, these constraints can be expressed as the set of all possible signals giving the same code sequence $C$. Conceptually, the description of an A/D converter can be reduced to a many-to-one mapping between analog signals and digital sequences. The definition of this mapping depends only on the knowledge of the conversion mechanisms (imperfections included). If this mapping can be determined, then the set of signals giving the same digital sequence $C$ is the inverse image of $C$ under the mapping. We denote this set by $\Gamma(C)$ and call it the signal cell of $C$. Since every analog signal gives one and only one digital sequence, the mapping associated with the converter defines a partition of the space of input signals $V$ (figure 1.3). We call it the signal partition associated with the A/D converter. To identify estimates which have the same digital output as an original analog signal, it is therefore essential to determine the signal partition defined by the A/D converter.

In chapter 2, we show how to derive the mapping associated with a converter, and thus the signal partition, for most A/D converters currently known: simple, dithered, predictive and noise-shaping ADC, including multi-stage modulators. We will observe that these converters naturally yield signal partitions where every cell is convex. This property justifies the strong fact that, if an estimate $X$ of an analog signal $X_0$ giving the digital sequence $C_0$ does not belong to $\Gamma(C_0)$, the projection $X'$ of $X$ on $\Gamma(C_0)$ is necessarily closer to $X_0$, wherever $X_0$ is located inside $\Gamma(C_0)$ (figure 1.4). The signal transformation shown in figure 1.2 is just the realization of this projection in the particular case of simple ADC.

[Figure 1.1: Principle of simple oversampled A/D conversion and classical reconstruction. Block diagram: the continuous-time, continuous-amplitude input $X_0[t]$ is sampled at $f_s > 2f_m$, quantized (step $q$) into $X^{(0)}(k)$, and reconstructed by a lowpass filter with cutoff frequency $f_m$, giving $X^{(1)}[t]$.]

[Figure 1.2: Numerical example of a bandlimited signal oversampled by 4. The signal $X^{(1)}$ obtained by lowpass filtering the quantized signal $X^{(0)}$ does not give the same digital sequence as $X_0$. The figure also lists the digital sequences of $X_0$, of $X^{(1)}$, and of $X^{(1)}$ after projection, which differ at time indices 10 and 11.]
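To make the transformation of figure 1.2 concrete, here is a minimal sketch of the projection step in the case of simple uniform ADC (our own illustrative code; the helper names and the uniform quantizer with step $q$ are assumptions for the example, not the paper's exact setup):

```python
import numpy as np

def quantize(x, q=1.0):
    """Uniform quantizer: the integer code floor(x/q) identifies the
    quantization interval [c*q, (c+1)*q) containing each sample."""
    return np.floor(x / q).astype(int)

def project_to_cell(x, code0, q=1.0):
    """Projection onto the (closure of the) signal cell of a simple ADC:
    every sample lying outside the interval identified by code0 is moved
    to the nearest border of that interval, as in figure 1.2."""
    return np.clip(x, code0 * q, (code0 + 1) * q)

k = np.arange(64)
x0 = 0.9 * np.sin(2 * np.pi * k / 64)            # original samples
c0 = quantize(x0, q=0.5)                         # its digital sequence C0
x = x0 + 0.2 * np.cos(2 * np.pi * 5 * k / 64)    # some estimate of x0
xp = project_to_cell(x, c0, q=0.5)

# The projection can only reduce the distance to the original signal.
assert np.linalg.norm(xp - x0) <= np.linalg.norm(x - x0)
```

Clipping each sample to the interval identified by the code is exactly the sample-wise projection described above, which is why the distance to the original signal can only decrease.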

[Figure 1.3: Signal space partition defined by an A/D converter: $\Gamma(C_i)$ is the subset of all possible signals giving the digital sequence $C_i$.]

Now comes the obvious question: is this improvement to signal reconstruction substantial? The answer is of course no when the samples are completely independent, as in Nyquist-rate sampling. But in the oversampling situation, the samples of an input signal contain a certain redundancy. In other words, another restriction is known about the sampled input signals: they are confined to the subspace $V_0$ of discrete-time signals with the limited bandwidth related to the oversampling rate. This is in fact another convex specification of the source signals, and we will look for estimates which also meet this constraint. In other words, when $X_0 \in V_0$ gives the digital code $C_0$, we will try to find an estimate in the more restricted convex set $V_0 \cap \Gamma(C_0)$. In this context, the converter can be thought of as defining a partition of the restricted space $V_0$ (figure 1.5).

[Figure 1.4: Projection of $X$ on the signal cell $\Gamma(C_0)$ containing $X_0$. When $\Gamma(C_0)$ is convex, this leads to an estimate $X'$ which is necessarily closer to $X_0$.]

[Figure 1.5: Restriction of the signal partition to the subspace $V_0$.]

This point of view gives an alternative interpretation of the work of Hein and Zakhor on $\Sigma\Delta$ modulators with constant inputs [9, 11, 12]. In that case, $V_0$ is a one-dimensional space (the amplitude of the constant), the A/D converter defines a partition of the real line, and $V_0 \cap \Gamma(C_0)$ is an interval. In [9, 11], estimates of the MSE between $X_0$ and the center of the interval were given for 1st and 2nd order $\Sigma\Delta$ modulation. In chapter 3, we use our point of view to derive MSE bounds in the more general case where signals in $V_0$ are bandlimited and periodic, for simple ADC and for multi-loop, multi-stage $\Sigma\Delta$ modulation of any order, regardless of how the estimate $X$ is chosen inside $V_0 \cap \Gamma(C_0)$. For simple ADC, we prove that if an input signal has at least a certain number of quantization threshold crossings within one period, the MSE decreases with the oversampling rate $R$ at least proportionally to $R^{-2}$, instead of $R^{-1}$ for classical reconstruction. This represents an improvement of 3 dB per octave of oversampling. The analysis of the MSE is more difficult for higher order converters such as $n$th order $\Sigma\Delta$ modulators. However, if we start from the assumption of white quantization noise, which leads to the noise reduction order of $R^{2n+1}$ in classical signal reconstruction, we prove that taking an estimate of $X_0$ in $V_0 \cap \Gamma(C_0)$ yields a noise reduction of the order of $R^{2n+2}$ [14]. This, again, represents an improvement of 3 dB per octave of oversampling, regardless of the converter's order. We then explain qualitatively how correlated quantization error can be accommodated in the derivation of $R^{2n+2}$.

These results strongly motivate the search for computational means of performing such signal reconstruction. We know from Youla's theorem [3] that alternating the projection of a first estimate onto two convex sets converges to an element of their intersection (figure 1.6). While the projection on $V_0$ is an ideal lowpass filter with the specified bandwidth of the input signals, we study in chapter 4 computer-implementable algorithms for the projection on $\Gamma(C_0)$. Performing a projection on a convex set is a classic problem of constrained quadratic minimization. But in our case, we have the extra difficulty that a computer can only see and

[Figure 1.6: Geometric representation of alternating projections. The signal $X^{(1)}$ obtained by a single lowpass filtering of the quantized signal $X^{(0)}$ can be improved by alternating projections between $V_0$ and $\Gamma(C_0)$, leading in the limit to an estimate in $V_0 \cap \Gamma(C_0)$.]

process a signal locally in time. Taking this time constraint into account, we manage to find algorithms which perform the exact projection on $\Gamma(C_0)$ for zeroth and 1st order converters. This includes simple, dithered, and predictive ADC, and 1st order $\Sigma\Delta$ modulation. For higher order conversion, such as $n$th order multi-loop and multi-stage $\Sigma\Delta$ modulation, we derive algorithms which perform the projection on a convex set larger than $\Gamma(C_0)$ but which always contains $\Gamma(C_0)$, and therefore the solution $X_0$.

In chapter 5, we test these algorithms numerically on sinusoidal signals for simple ADC and for 1st to 4th order multi-loop, multi-stage $\Sigma\Delta$ modulators. We confirm the predicted gain of 3 dB per octave over classical reconstruction, even for modulators of order greater than 2, where the projection on $\Gamma(C_0)$ performed by our algorithm is only approximate. We also show that performing a reconstruction in $V_0 \cap \Gamma(C_0)$ is particularly insensitive to circuit nonidealities (provided they are known): no significant loss in SNR is observed even with nonidealities of 1% on the quantization thresholds and on the feedback D/A outputs of a $\Sigma\Delta$ modulator.
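The alternating-projection scheme of figure 1.6 can be written down directly. A minimal sketch for $M$-point periodic sequences (our own helpers; `project_cell` stands for whichever code-consistency projection the converter at hand admits, and the lowpass projection is implemented with an FFT):

```python
import numpy as np

def project_V0(x, N):
    """Projection onto V0 for an M-point periodic sequence: ideal lowpass
    filtering that keeps only the harmonics |j| <= N."""
    X = np.fft.fft(x)
    X[N + 1 : len(x) - N] = 0.0          # zero every component above N
    return np.real(np.fft.ifft(X))

def alternating_projections(x, project_cell, N, n_iter=100):
    """Alternate projections between the two convex sets V0 and Gamma(C0)
    (figure 1.6). By Youla's theorem the iterates converge to a point of
    the intersection of V0 and Gamma(C0)."""
    for _ in range(n_iter):
        x = project_cell(x)              # enforce the digital sequence C0
        x = project_V0(x, N)             # enforce the bandwidth constraint
    return x
```

For simple ADC, `project_cell` can be the per-sample clipping shown earlier; chapter 4 develops the corresponding projections for higher order converters.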

Assumptions and terminology

Since input signals are assumed to be sampled at a frequency higher than the Nyquist rate, most of the time we will deal directly with their discrete-time versions and consider them as sequences of real numbers. Therefore, whenever we use the term "analog signal", we refer to such sequences. If $X$ is an analog signal, $X(k)$ designates its $k$th sample. Although bandlimited signals have an infinite support in time, they can only be known and processed from an initial time; sequences $X$ will therefore be defined for $k \geq 1$, and $k$ will be called the time index. The space of real sequences with positive time indices will be called $V$.

The canonical norm $\|\cdot\|_V$ of $V$ will be the mapping
$$X \in V \longmapsto \|X\|_V = \left( \sum_{k \geq 1} |X(k)|^2 \right)^{1/2}$$

When considering periodic signals sampled on $M$ points, it will be understood that $V$ is the space of $M$-point sequences and
$$\|X\|_V = \left( \sum_{k=1}^{M} |X(k)|^2 \right)^{1/2}$$

A digital output $C$ will be considered as a sequence of codewords $C(k)$; we will often call it a code sequence. We will use the term operator to denote a transformation which maps a sequence into another sequence. The image sequence need not belong to the same space as the original sequence; for example, an A/D converter is an operator which maps real sequences into code sequences. When an operator $H$ maps $X$ into $Y$, we will write $Y = H[X]$. We reserve the notation 'DAC' for the operator of D/A conversion. For a given A/D converter, if a signal $X_0$ gives a code sequence $C_0$, the signal cell $\Gamma(C_0)$ will also be denoted by $\Gamma(X_0)$, depending on whether the object we are referring to is $C_0$ or $X_0$.


Chapter 2

Analysis of ADC signal partition

In this chapter, we show how the signal partition generated by most currently known A/D converters can be determined, and that it is convex. This enables us to identify the signal cell corresponding to a given code sequence. We saw in chapter 1 that this is an important step toward generating estimates giving the same code sequence as an original analog signal. We separate A/D converters into two categories: those which include only one quantizer (single-quantizer converters) and those which include more than one (multi-quantizer converters). The first category includes, for example, simple, dithered and predictive ADC, single-loop and multi-loop $\Sigma\Delta$ modulators, and interpolative modulators. The second category includes, for example, multi-stage modulators. We start by studying signal partitioning for the first category (sections 2.1 and 2.2). We will see that the convexity of the signal partition generated by such converters is a direct consequence of the inherent convexity of a quantizer. In section 2.3, we show how this property can be used to improve signal reconstruction. In section 2.4, we show that it is in fact possible to improve an estimate using only partial knowledge of the code sequence. We finally see in section 2.5 how these properties apply to multi-quantizer converters.

2.1 Simple ADC

The key component of any A/D converter is the quantizer: the device which, at every sampling instant, receives an analog input sample and outputs a codeword (in practice, a binary number). Rigorously speaking, this codeword identifies the quantization interval, inside a predefined subdivision of the input amplitude range, to which the input sample belongs. According to our terminology, simple ADC is the type of conversion which reduces to a single quantizer (figure 2.1). Figure 2.2a shows an example of conversion of an $M$-point analog sequence $X$. The output is an $M$-point sequence of codewords, determined by the definition of the quantization subdivision. We have chosen letters for the codewords on purpose, so that one is not tempted to interpret $C$ as a quantized signal. The signal cell $\Gamma(C)$ can be determined as soon as the quantization subdivision is known, or equivalently, the quantization thresholds. For example, any signal giving the code sequence of figure 2.2a is necessarily confined between the arrows of figure 2.2b. The identification of a signal cell in simple ADC is therefore straightforward. It takes circuit imperfections into account if they are included in the definition of the quantization thresholds.

[Figure 2.1: Block diagram of simple ADC: an analog sequence $X$ enters the quantizer, which outputs the digital sequence $C$. Arrows represent discrete-time data: thin arrows for analog samples, thick arrows for digital codes.]

In the very simple case where $M = 2$, $V$ is a two-dimensional space and it is possible to represent the exact signal partition associated with an A/D converter on a sheet of paper. In figure 2.3, we show the signal partition corresponding to the simple converter using the quantizer of figure 2.2. Input signals $X$ can be represented geometrically by points of coordinates $(X(1), X(2))$ and give two-point code sequences. To each couple of codewords corresponds a signal cell, and every cell has a rectangular shape. In the general case of $M$-point input sequences, signal cells are $M$-dimensional hyper-parallelepipeds. In simple ADC, we immediately see that signal cells are convex.

2.2 Single-quantizer conversion

A very large family of ADC falls into what we call "single-quantizer conversion": it includes every converter which uses only one quantizer. We have found that, most of the time, such a converter can be modeled by the structure shown in figure 2.4, where $H$ is an invertible affine operator on analog sequences and $G$ a given operator with digital input and analog output. By affine, we mean that $H$ can be written $H[X] = L[X] + L_0$, where $L$ is a linear operator and $L_0$ is a constant signal. In practice, $G$ includes a D/A converter (DAC). Let us go through an exhaustive list of A/D converters and see how they can be reduced to figure 2.4. Simple ADC is the particular case where $H = I$ (identity operator) and $G = 0$ (null operator). Dithered ADC is modeled by taking $H = I$ and $G$ as the generator of the dither signal; in this case, $G$ does not depend on its input. Simple $\Sigma\Delta$ modulation is the case where $H = I$ and $G$ is simply a DAC. In general, any kind of predictive ADC can be reduced to figure 2.4 by taking $H = I$; the definition of $G$ depends on the predictive modulator considered. An interpolative modulator, shown in figure 2.5, can be equivalently represented by figure 2.4 by keeping the same operator $H$ and defining $G$ as the composition of a DAC, a delay and the filter $H$. The operator $H$ is invertible and affine as soon as the initial conditions of the filtering are known. As a last example, let us deal with the $n$th order $\Sigma\Delta$ modulator shown in figure 2.6: its structure is equivalent to figure 2.4 when taking the $H$ and $G$ shown in figures 2.7a-b respectively. As soon as the initial states of the successive accumulators of $H$ are known, $H$ can be expressed as $H[X] = L[X] - L_0$, where $L$ is the linear and invertible operator shown in figure 2.7c and $L_0$ is the constant signal obtained from figure 2.7d. The operator $L$ is simply an $n$th order discrete integrator.

Remark: The $n$th order $\Sigma\Delta$ modulator is often represented as in figure 2.8, but this is equivalent to figure 2.6 provided we define $C'(k) = C(k-1)$. Our modified structure of figure 2.6 will be convenient for the design of the code projection, and is always possible since the time shift on the code sequence can easily be realized in postprocessing.

Now that we know how to reduce a given A/D converter to figure 2.4, let us use this equivalent structure to identify the signal cell corresponding to a code sequence $C$.
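As an illustration of this reduction, the following sketch simulates a 1st order $\Sigma\Delta$ modulator in the form of figure 2.4 and checks the decomposition $A = H[X] - G[C]$ numerically (our own toy model, not the paper's exact circuit: single-bit codes $\pm 1$, DAC outputs $\pm\frac{1}{2}$, zero initial accumulator state):

```python
import numpy as np

def sigma_delta_1(x, a_init=0.0):
    """1st order Sigma-Delta modulator: A(k) = A(k-1) + X(k) - d(k-1),
    where d(k) = C(k)/2 is the feedback DAC output of the single-bit
    code C(k). Returns the signal A seen by the quantizer and C."""
    a, d = a_init, 0.0
    A, C = [], []
    for xk in x:
        a = a + xk - d
        c = 1 if a >= 0 else -1       # single-bit code
        d = 0.5 * c                   # feedback DAC output
        A.append(a); C.append(c)
    return np.array(A), np.array(C)

def G_of(C):
    """Operator G of figure 2.4 for this modulator: the running sum of
    the one-sample-delayed feedback DAC outputs driven by the codes."""
    d = 0.5 * np.asarray(C, dtype=float)
    d_delayed = np.concatenate(([0.0], d[:-1]))   # d(k-1), with d(0) = 0
    return np.cumsum(d_delayed)

x = 0.4 * np.sin(2 * np.pi * np.arange(256) / 256)
A, C = sigma_delta_1(x)
H = np.cumsum(x)                 # H[X] = L[X] + L0 with zero initial state
D0 = G_of(C)                     # fixed dither of the virtual converter
assert np.allclose(A, H - D0)    # A = H[X] - G[C], as claimed above
```

Once the code sequence $C_0$ is fixed, `G_of(C0)` is precisely the computable dither $D_0 = G[C_0]$ of the virtual converter introduced below.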

[Figure 2.2: (a) General principle of quantization. The amplitude axis is divided into predefined quantization intervals, each labeled by a codeword ('a' to 'e'). The quantizer maps a signal $X$ into a sequence $C$ of codewords. (b) Time representation of the signal cell $\Gamma(C)$: the code sequence $C$ indicates, at every time index $k$, the quantization interval to which the sample of a signal in $\Gamma(C)$ must belong.]

[Figure 2.3: Geometric representation of the signal partition for 2-point sequences with the quantizer of figure 2.2a.]

The first step is to see in figure 2.4 that $C$ is the simple A/D conversion of the signal $A$. From the previous section, we know how to identify the cell of signals $A$ covered by a fixed code sequence $C_0$; a time representation of this cell still has the form of figure 2.2b and depends directly on the quantization subdivision. To deduce the signal cell in the space of $X$, we first point out that, for any signal $A$ giving the code sequence $C_0$, the output of $G$ is always the fixed signal $D_0 = G[C_0]$. Therefore, the signal cell in $X$ is generated by $H^{-1}[A + D_0]$ for every $A$ giving $C_0$ through the quantizer. Several remarks can be made. If nonlinearities exist in the feedback loop of a converter, they can be taken into account in the definition of $G$ and therefore in the calculation of $D_0 = G[C_0]$. Once $D_0$ is known, we can consider the virtual converter represented in figure 2.9. This encoder is of course different from the original one, since it has no feedback but a dither $D_0$ instead; therefore, it generates a different signal partition.

[Figure 2.4: General structure of a single-quantizer converter: $H$ is an invertible affine operator and $G$ a digital-to-analog operator.]

[Figure 2.5: General structure of an interpolative modulator: filter $H$, quantizer, and feedback through a delay $z^{-1}$ and a DAC.]

[Figure 2.6: Block diagram of an $n$th order $\Sigma\Delta$ modulator: $n$ accumulator stages with initial states $A_{n-1}(0), \ldots, A_0(0)$ at $k = 0$, a quantizer, and a feedback DAC. It is equivalent to the structure of figure 2.4 with the $H$ and $G$ shown in figures 2.7a-b.]

Let us call $\Gamma'(C_0)$ the signal cell of $C_0$ for the virtual converter. However, since $D_0 = G[C_0]$, it is easy to see that, for the particular code sequence $C_0$, we have $\Gamma(C_0) = \Gamma'(C_0)$.

Fact 2.1 Consider a single-quantizer converter (figure 2.4), a fixed code sequence $C_0$, and a converter of the type of figure 2.9 with $D_0 = G[C_0]$. The signal cells of $C_0$ generated by the two converters are the same.

The second converter will be called the virtual converter associated with the first converter and the code sequence $C_0$. In the following, once a fixed code sequence $C_0$ is considered, it will be very convenient to work with the virtual converter.

2.3 Convexity of signal cells and projection

We are going to show that every converter mentioned above has convex signal cells. This is straightforward in the case of simple ADC: we saw in section 2.1 that signal cells are hyper-parallelepipeds in the space $V$; therefore, they are convex. For a single-quantizer converter of general type, we saw that $\Gamma(C_0)$ is generated by the signals of the form $H^{-1}[A + D_0]$, where $A$ yields the code sequence $C_0$ through the quantizer and $D_0 = G[C_0]$. The signals $A$ necessarily belong to a convex set (more precisely, a parallelepiped), and $\Gamma(C_0)$ is therefore the image of this convex set under the affine transform $A \mapsto H^{-1}[A + D_0]$. Thus, $\Gamma(C_0)$ is convex. Another way to prove convexity is to deal with virtual converters. To start with, it is easy to prove directly the following fact:

[Figure 2.7: (a-b) Equivalent $H$ and $G$ operators of an $n$th order $\Sigma\Delta$ modulator. $H$ is obtained by calculating the signal $A$ of figure 2.6 when the feedback DAC output is replaced by zero. $G$ is obtained by calculating the signal $-A$ of figure 2.6 when the input is replaced by zero. (c) Linear part $L$ of $H$, obtained by forcing the initial state of every accumulator to zero. (d) Constant part $L_0$ of $H$, obtained by taking the output of $H$ when the input is zero.]

[Figure 2.8: Usual representation of an $n$th order $\Sigma\Delta$ modulator. It is equivalent to figure 2.6 when taking $C'(k) = C(k-1)$.]

[Figure 2.9: Reduced coding structure: $H$ is an invertible affine operator and $D_0$ a known sequence. This yields the virtual converter associated with figure 2.4 and the code sequence $C_0$ when $D_0 = G[C_0]$.]

Fact 2.2 Signal cells generated by converters of the type of figure 2.9 are convex.

Proof: We have to show that if $X_0, X_1$ are two signals giving the same code sequence $C$, then any signal of the form $X_\lambda = (1-\lambda)X_0 + \lambda X_1$, where $\lambda \in [0, 1]$, also gives $C$. Since $H$ is affine, it can be written $H[X] = L[X] + L_0$, where $L$ is a linear operator and $L_0$ is a fixed sequence. Let $A_\lambda$, $A_0$ and $A_1$ be the signals seen by the quantizer when $X_\lambda$, $X_0$, $X_1$ are input, respectively. We have
$$A_0 = L[X_0] + (L_0 - D_0), \quad A_1 = L[X_1] + (L_0 - D_0) \quad \text{and} \quad A_\lambda = L[X_\lambda] + (L_0 - D_0).$$
Using the linearity of $L$ and the fact that $1 = (1-\lambda) + \lambda$, we have:
$$A_\lambda = L[(1-\lambda)X_0 + \lambda X_1] + \big((1-\lambda) + \lambda\big)(L_0 - D_0) = (1-\lambda)\big(L[X_0] + (L_0 - D_0)\big) + \lambda\big(L[X_1] + (L_0 - D_0)\big) = (1-\lambda)A_0 + \lambda A_1$$
By assumption, at every time index $k$, $A_0(k)$ and $A_1(k)$ belong to the same quantization interval: the one identified by $C(k)$. This is also true for $A_\lambda(k) = (1-\lambda)A_0(k) + \lambda A_1(k)$, which belongs to the interval $[A_0(k), A_1(k)]$; and this holds for every time index $k$. Therefore $X_\lambda$ gives the code sequence $C$. $\Box$

The convexity of the signal cells of any single-quantizer converter follows from their equivalence to figure 2.9 and this fact. As already mentioned in chapter 1, convexity provides us with a fundamental tool for signal reconstruction. Without convexity, it would not be obvious why it is interesting to

look for an estimate in $\Gamma(C_0)$. For example, this approach would seem inadequate if $\Gamma(C_0)$ were not connected (in one piece, roughly speaking). With convexity, we not only have the guarantee that $\Gamma(C_0)$ is connected, but we also have a tool to improve (at least theoretically) any proposed estimate $X$ which does not belong to the signal cell containing $X_0$: the projection on $\Gamma(C_0)$. By definition, the projection of $X$ on $\Gamma(C_0)$ is the signal $X' \in \Gamma(C_0)$ which minimizes the distance to $X$. Because $\Gamma(C_0)$ is convex, it is a consequence of Hilbert space properties that the distance between $X'$ and $X_0$ is necessarily smaller than that between $X$ and $X_0$.¹ In chapter 4 we will actually propose algorithms to compute the projection of $X$ on $\Gamma(C_0)$ in the cases of simple, dithered, and predictive ADC, and 1st order $\Sigma\Delta$ modulation.

(¹ To be rigorous, the projection $X'$ exists (and is unique) only if $\Gamma(C_0)$ is also topologically closed. In practice, we will always consider the projection $X'$ on $\overline{\Gamma(C_0)}$, the closure of $\Gamma(C_0)$. Such a signal $X'$ does not necessarily belong to $\Gamma(C_0)$ but, qualitatively speaking, lies at the limit of belonging to it. Nevertheless, the property we effectively need is that $X'$ is a better estimate of $X_0$ than $X$.)

2.4 Signal cell of a code subsequence, convexity and projection

What really served us in the improvement of an estimate is the fact that $\Gamma(C_0)$ is a convex set that we can identify (because we know $C_0$) and which is certain to contain $X_0$. In fact, this principle of estimate improvement works as soon as we can identify a convex set which contains $X_0$, whatever the method, provided it is based on the available information. In this section, we prove that a partial knowledge of $C_0$, more precisely the knowledge of $C_0$ on a subset $I$ of time indices, is enough to determine a convex set which necessarily contains $X_0$. The search for such convex sets will be very helpful later, especially when trying to perform convex projections. Indeed, when the order of a converter becomes larger than 2, it becomes unrealistic to compute the projection of an estimate taking into account the sequence $C_0$ on its full time range (chapter 4).

A first attempt is to define $\Gamma_I(C_0)$ as the set of all signals giving code sequences which coincide with $C_0$ on the time subrange $I$:
$$\Gamma_I(C_0) = \{ X \in V \mid \text{if } X \text{ gives code } C, \text{ then } \forall k \in I, \ C(k) = C_0(k) \}$$
It is obvious that $\Gamma_I(C_0)$ contains $\Gamma(C_0)$ and therefore includes $X_0$. More precisely, we have the relation:
$$\text{if } I_1 \subset I_2 \text{ then } \Gamma_{I_1}(C_0) \supset \Gamma_{I_2}(C_0) \supset \Gamma(C_0) \ni X_0$$
Qualitatively speaking, the closer $I$ is to the full time range, the closer $\Gamma_I(C_0)$ is to $\Gamma(C_0)$. Unfortunately, as suggested by the expression "first attempt", if $I$ does not cover the full time range, there is no reason for $\Gamma_I(C_0)$ to be convex in general.

The situation is not hopeless. Once the code sequence $C_0$ is fixed, we can always consider the associated virtual converter of figure 2.9, which has the same signal cell $\Gamma(C_0)$. Let us call $\Gamma'_I(C_0)$ the set of signals giving code sequences which coincide with $C_0$ on $I$ for the virtual converter. There is no reason to have $\Gamma_I(C_0) = \Gamma'_I(C_0)$, but we still have:
$$\text{if } I_1 \subset I_2 \text{ then } \Gamma'_{I_1}(C_0) \supset \Gamma'_{I_2}(C_0) \supset \Gamma'(C_0) = \Gamma(C_0) \ni X_0$$

[Figure 2.10: Geometric representation of the relative positions of the signal cells of the original and virtual converters, for time index subsets $I_1 \subset I_2$. $\Gamma'_{I_1}(C_0)$ and $\Gamma'_{I_2}(C_0)$ are necessarily convex. The larger $I$ is, the closer $\Gamma'_I(C_0)$ is to $\Gamma(C_0)$, and the closer the projection of $X$ on $\Gamma'_I(C_0)$ is to $X_0$.]

Then we have the following fact:

Fact 2.3 Consider a converter of the type of figure 2.9, a fixed code sequence $C$ and a fixed time index subset $I$. Then the set of signals whose digital outputs coincide with $C$ on $I$ is convex.

Proof: It is similar to that of fact 2.2. The difference is that we start with $X_0, X_1$ giving code sequences $C_0, C_1$ such that $\forall k \in I$, $C_0(k) = C_1(k) = C(k)$. Using the same notation as in fact 2.2, the equality $A_\lambda = (1-\lambda)A_0 + \lambda A_1$ is still true, since it is a direct consequence of the structure of the converter. This time, if $k \notin I$, we cannot draw any conclusion about the position of $A_\lambda(k)$. On the other hand, if $k \in I$, $A_0(k)$ and $A_1(k)$ belong to the same quantization interval, namely the interval labeled $C(k)$, and so does $A_\lambda(k) = (1-\lambda)A_0(k) + \lambda A_1(k)$. Therefore, the code sequence produced by $X_\lambda$ coincides with $C$ on $I$. $\Box$

The set $\Gamma'_I(C_0)$ no longer has any intuitive interpretation, but it has the following properties:
- $\Gamma'_I(C_0)$ includes $\Gamma(C_0)$, which itself contains $X_0$;
- $\Gamma'_I(C_0)$ is convex.
Therefore, projecting an estimate $X$ of $X_0$ on $\Gamma'_I(C_0)$ necessarily improves it. The closer $I$ is to the full time range, the closer the projection on $\Gamma'_I(C_0)$ is to the projection on $\Gamma(C_0)$, and also to $X_0$ itself. Figure 2.10 recapitulates the relative positions of the different signal cells and projections. Note that it is not quite correct to say that $\Gamma'_I(C_0)$ takes into account only a partial knowledge of $C_0$: the dither sequence $D_0$ computed for the virtual converter does depend on the full code sequence $C_0$.

From now on, as soon as we consider signal cells covered by a code subsequence, we will automatically refer to the virtual converter and use the notation $\Gamma_I(C_0)$ without the prime. In fact, we will consider the case where $I$ equals the full time range as a particular case of the convex set $\Gamma_I(C_0)$: this is the smallest convex set derivable from $C_0$ that contains $X_0$. More importantly, it coincides with $\Gamma(C_0)$, the set of all signals giving $C_0$ through the original converter. Therefore, when we deal with code projection in chapter 4, we will automatically work with the transformed virtual converter. This will enable us to design projection algorithms for converters of order higher than 2 ($n$th order multi-loop, multi-stage $\Sigma\Delta$ modulators).

[Figure 2.11: (a) General structure of a multi-stage converter: each single-quantizer converter passes its quantization error to the next stage, and an arithmetic stage combines the code sequences $C_1, \ldots, C_p$ into a single output $C$. (b) Structure considered in this paper. The arithmetic stage included in (a) is a first step of signal reconstruction: we do not consider it part of the A/D converter.]

2.5 Multi-quantizer conversion

We call a "multi-quantizer converter" any A/D converter which includes more than one quantizer. We consider only the important example of multi-stage modulators, shown in figure 2.11a. The quantization error is output from every single-quantizer converter in the way shown in figure 2.12, where the DAC operation has been extracted from the operator $G$. Multi-stage $\Sigma\Delta$ modulation is the particular case where every subconverter is a $\Sigma\Delta$ modulator of the type of figure 2.6. Note that we consider the arithmetic stage to be part of the reconstruction process; for our reconstruction purposes, we therefore consider directly the digital sequences entering the arithmetic stage. This was also done by Hein and Zakhor [12]. The arithmetic stage is designed under the assumption of white quantization noise. Here, we take the "real" digital sequence of the multi-stage modulator, which is the $p$-tuple of code sequences $(C_1, \ldots, C_p)$ (figure 2.11b). Note that, for the very common case of two-stage single-bit modulation, the digital output is the same (2 bits/sample) whether or not an output arithmetic operation is included in the converter. Unlike in the previous section, we will not try to find a general structure for multi-quantizer converters.

[Figure 2.12: Detail of the quantization error output from every single-quantizer converter of figure 2.11.]

For the analysis of signal cells, we are going to show directly that, for a given code sequence $C_0 = (C_{01}, C_{02}, \ldots, C_{0p})$, it is possible to build a virtual converter with the structure of figure 2.13 which has exactly the same signal cell $\Gamma(C_0)$. Let us show this in the simplest case of a two-stage modulator, whose detailed structure is shown in figure 2.14a. Consider a fixed output code sequence $C_0 = (C_{01}, C_{02})$. From figure 2.14a, we derive a virtual converter by forcing the input of the $i$th DAC to be equal to $C_{0i}$ (figure 2.14b). Similarly to fact 2.1, one can check that the two converters yield exactly the same signal cell $\Gamma(C_0)$, even though in general they do not define the same partition of $V$. The second step is to see that the structure of figure 2.14b is equivalent to figure 2.14c when taking $H'_1 = H_1$, $H'_2 = -H_2 \circ H_1$, and the signals $D_{01}, D_{02}$ computed from $C_{01}, C_{02}$ as shown in figures 2.14d and 2.14e respectively. This transformation principle extends to modulators with $p$ quantizers. The $i$th operator $H'_i$ of the virtual converter (figure 2.13) is
$$H'_i = (-1)^{i-1} H_i \circ H_{i-1} \circ \cdots \circ H_1 \qquad (2.1)$$
where $H_i$ is the operator of the $i$th single-stage modulator in the original converter (figure 2.11).

Now, using the structure of figure 2.13, we say that $\Gamma(C_0) = \Gamma_1(C_{01}) \cap \cdots \cap \Gamma_p(C_{0p})$, where $\Gamma_i(C_{0i})$ is the signal cell covered by $C_{0i}$ for the $i$th subconverter. From section 2.2, we know how to generate every single $\Gamma_i(C_{0i})$. Unlike the case of single-quantizer conversion in section 2.2, we cannot propose an analytical generation of the signals belonging to $\Gamma(C_0)$. However, we know how to identify from the virtual structure of figure 2.13 the signal cell $\Gamma_i(C_{0i})$ of the $i$th converter, and a signal belongs to $\Gamma(C_0)$ if and only if it belongs to every $\Gamma_i(C_{0i})$ for $i = 1, \ldots, p$. Therefore
$$\Gamma(C_0) = \Gamma_1(C_{01}) \cap \Gamma_2(C_{02}) \cap \cdots \cap \Gamma_p(C_{0p}) \qquad (2.2)$$
Since every $\Gamma_i(C_{0i})$ is convex, $\Gamma(C_0)$ is necessarily convex. If $X_0$ is an analog signal giving the code sequence $C_0$, the theoretical idea of projecting an estimate $X$ of $X_0$ on $\Gamma(C_0)$ to improve it is still valid. In practice, since we have no analytical description of $\Gamma(C_0)$, we will not propose an algorithm for the direct projection on $\Gamma(C_0)$. However, $X$ can be improved by successive and periodic projections on $\Gamma_1(C_{01}), \Gamma_2(C_{02}), \ldots, \Gamma_p(C_{0p})$: Youla [3] has shown that such an iteration of projections converges to an element of the intersection $\Gamma(C_0)$.

Coming back to the case $p = 2$, it can be seen from figures 2.14d and 2.14e that nonlinearities in the feedback DAC outputs, if they are measurable, are automatically included in the definition of $D_{01}$ and $D_{02}$.

[Figure 2.13: Structure of the virtual converter corresponding to the multi-quantizer converter of figure 2.11b: $p$ parallel branches, the $i$th applying $H'_i$ to the input $X$, subtracting the dither $D_{0i}$, and quantizing to give $C_i$.]

In general, internal feedback imperfections will be included in the dither sequences $D_{01}, \ldots, D_{0p}$ of the virtual structure (figure 2.13). In conclusion, we have shown that, by structure transformation, the analysis of a multi-quantizer converter reduces to the study of single-quantizer converters.
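Because equation (2.2) expresses $\Gamma(C_0)$ as an intersection of convex cells, the reconstruction can cycle through one projection per stage. A minimal sketch (hypothetical interface of our own: `stage_projections` is a list of callables, each performing the projection onto one cell $\Gamma_i(C_{0i})$ of the virtual structure):

```python
def project_onto_intersection(x, stage_projections, n_rounds=20):
    """Successive and periodic projections on Gamma_1(C01), ..., Gamma_p(C0p).
    Youla's theorem guarantees that the iterates converge to an element
    of the intersection Gamma(C0) of these convex cells."""
    for _ in range(n_rounds):
        for project_i in stage_projections:
            x = project_i(x)
    return x
```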

2.6 Conclusion

We have shown how the signal partition generated by most currently known A/D converters can be derived. This partitioning analysis consists in identifying the set $\Gamma(C)$ of all possible signals producing a given digital sequence $C$ for a given A/D converter. The definition of the partition depends only on the direct description of the converter's structure, and includes circuit imperfections if they can be measured. The convexity of this partition is a direct consequence of the inherent convexity of a quantizer. This analysis provides us with theoretical means to improve, using orthogonal projections, any estimate which does not reproduce the digital sequence of a given original analog signal. Practical algorithms which perform such projections will be derived in chapter 4. Fundamentally, the signal-partitioning effect of an A/D converter is a consequence of quantization, and the convexity property of a quantizer simply results from the fact that quantization intervals are themselves naturally convex. This approach of convex coding analysis can be generalized to vector quantization.

[Figure 2.14: (a) General structure of a two-quantizer converter. (b) Virtual version corresponding to the code sequence $C_0 = (C_{01}, C_{02})$; the signal cell $\Gamma(C_0)$ is the same as that of the original converter (a). (c) Virtual converter equivalent to (b). (d) Computation of the dither sequence $D_{01}$ of (c): the value of the signal $-A_1$ of (b) when $X$ is replaced by zero. (e) Computation of the dither sequence $D_{02}$ of (c): the value of the signal $-A_2$ of (b) when $X$ is replaced by zero.]

Chapter 3

MSE analysis of optimal signal reconstruction in oversampled ADC

In this chapter, we suppose that input signals belong to the subspace $V_0 \subset V$ of sequences bandlimited to a known maximum frequency $f_m$: this is the situation of oversampled ADC. Once a signal $X_0 \in V_0$ is given by a code sequence $C_0$ output by a given A/D converter, we propose to study the estimation of $X_0$ which consists in "picking" a signal inside $V_0 \cap \Gamma(C_0)$. Theoretically, such a signal can be reached by alternating projections of a first estimate on $V_0$ and $\Gamma(C_0)$ (Youla's theorem [3]); as said in chapter 1, we will propose algorithms which actually perform these projections in chapter 4. We study MSE bounds of such estimates, first for simple ADC and then for any single-quantizer converter where the operator $H$ is a cascade of integrators (figure 2.4). This includes, for example, the $n$th order multi-loop $\Sigma\Delta$ modulator. As a consequence, we will see that MSE bounds can be deduced for multi-stage, multi-loop $\Sigma\Delta$ modulators as well. Hein and Zakhor measured MSE upper bounds for the 1st and 2nd order $\Sigma\Delta$ modulators in the particular case of dc inputs (one-dimensional $V_0$) [9, 11]. In our MSE bound evaluation, we make no assumption about how the estimate $X$ of $X_0$ is chosen inside $V_0 \cap \Gamma(C_0)$; our analysis therefore applies to any reconstruction method leading to a reconstructed signal inside $V_0 \cap \Gamma(C_0)$.

3.1 Modelization of bandlimited and periodic signals ($V_0$)

Continuous-time bandlimited periodic signals of period $T_0$ can be expressed as follows:
$$X[t] = A + \sum_{j=1}^{N} B_j \sqrt{2} \cos\!\left(2\pi j \frac{t}{T_0}\right) + \sum_{j=1}^{N} C_j \sqrt{2} \sin\!\left(2\pi j \frac{t}{T_0}\right) \qquad (3.1)$$
Such signals have $2N + 1$ discrete low-frequency components, between $-\frac{N}{T_0}$ and $\frac{N}{T_0}$ in the continuous Fourier domain. They can be equivalently described by their sampled version:
$$X(k) = X\!\left[k \frac{T_0}{M}\right] = A + \sum_{j=1}^{N} B_j \sqrt{2} \cos\!\left(2\pi j \frac{k}{M}\right) + \sum_{j=1}^{N} C_j \sqrt{2} \sin\!\left(2\pi j \frac{k}{M}\right) \qquad (3.2)$$

when choosing the sampling frequency of the form $\frac{M}{T_0}$ with $M > 2N + 1$ (oversampling situation). In this section, we will consider that $V_0$ is the space of sequences of the form (3.2) for a fixed $N$, with the assumption that $M > 2N + 1$. Note that the oversampling rate $R$ is proportional to $M$. An element of $V_0$ given by (3.2) (or (3.1)) is uniquely defined by the vector $\tilde{X} \in \mathbb{R}^{2N+1}$ whose components are $(A, B_1, \ldots, B_N, C_1, \ldots, C_N)$. In other words, we have an isomorphism between $V_0$ and $\mathbb{R}^{2N+1}$; the dimension of $V_0$ is therefore $2N + 1$. Using Parseval's equality, it can be derived that
$$\frac{1}{M} \sum_{k=1}^{M} |X(k)|^2 = A^2 + \sum_{j=1}^{N} B_j^2 + \sum_{j=1}^{N} C_j^2 = \|\tilde{X}\|^2$$

where $\|\tilde{X}\|$ is the canonical norm in $\mathbb{R}^{2N+1}$. Therefore, if $X \in V_0 \cap \Gamma(X_0)$ is an estimate of $X_0 \in V_0$, the mean squared error $MSE(X_0, X)$ between $X_0$ and $X$ can be measured by the distance in $\mathbb{R}^{2N+1}$ between their associated vectors $\tilde{X}_0$ and $\tilde{X}$:
$$MSE(X_0, X) = \frac{1}{M} \sum_{k=1}^{M} |X(k) - X_0(k)|^2 = \|\tilde{X} - \tilde{X}_0\|^2 \qquad (3.3)$$

We will use this isometry between $V_0$ and $\mathbb{R}^{2N+1}$ throughout this section. In particular, we will try to measure the asymptotic evolution of this MSE bound with increasing oversampling rate. Since $M$ is proportional to $R$, we will compare the MSE with powers of $M$.
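The isomorphism between $V_0$ and $\mathbb{R}^{2N+1}$ is easy to realize numerically. The following sketch (our own construction) builds the discrete trigonometric basis of (3.2), checks the Parseval identity above, and recovers the coefficient vector $\tilde{X}$ of a sequence by least squares:

```python
import numpy as np

def trig_basis(M, N):
    """Columns: the 2N+1 basis sequences of V0, i.e. 1,
    sqrt(2)*cos(2*pi*j*k/M) and sqrt(2)*sin(2*pi*j*k/M) for j = 1..N,
    sampled at k = 1..M.  With M > 2N+1 they satisfy
    (1/M) * <col_a, col_b> = delta_{ab}."""
    k = np.arange(1, M + 1)
    cols = [np.ones(M)]
    for j in range(1, N + 1):
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * j * k / M))
    for j in range(1, N + 1):
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * j * k / M))
    return np.stack(cols, axis=1)          # shape (M, 2N+1)

M, N = 64, 5
B = trig_basis(M, N)
coeffs = np.random.randn(2 * N + 1)        # X_tilde = (A, B_j, C_j)
X = B @ coeffs                             # the sequence X in V0

# Parseval / isometry used in (3.3): (1/M)*||X||^2 = ||X_tilde||^2.
assert np.isclose(np.sum(X**2) / M, np.sum(coeffs**2))

# Recovering X_tilde from the samples (least-squares fit onto V0).
coeffs_rec = np.linalg.lstsq(B, X, rcond=None)[0]
assert np.allclose(coeffs_rec, coeffs)
```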

3.2 Simple ADC

We recall that, for simple ADC, classical signal reconstruction yields a gain in SNR of 3 dB per octave of oversampling. Increasing $R$ by an octave means doubling the resolution in time of the signal discretization. It is interesting to note that when the resolution in amplitude is doubled, the gain is not 3 dB but 6 dB (in short, 6 dB per bit). In theory, one would like to think of ADC as a homogeneous discretization of a two-dimensional graph whose dimensions are amplitude and time. One could argue that the origin of this difference is the non-homogeneity of the error measurement: the MSE measures errors only along the amplitude dimension. However, a bandlimited signal has a bounded slope; qualitatively speaking, variations in the time dimension should therefore be observable in the amplitude dimension with a bounded multiplicative coefficient, which should not affect the logarithmic dependence of the MSE. These considerations give another hint that classical signal reconstruction in simple ADC is not optimal. In the theorem we are about to present, we show that, in the context of periodic signals, an estimate $X$ of $X_0$ chosen in $V_0 \cap \Gamma(X_0)$ yields an MSE which goes to zero at least as fast as $\frac{c_0}{M^2}$ when $M$ goes to $+\infty$, where $c_0$ is a constant independent of $M$: this actually represents a gain of 6 dB per octave of oversampling. This property holds, however, under a certain condition on the source signal $X_0$: it must cross the quantization thresholds at least $2N + 1$ times. This condition on the amplitude dimension is in fact dual to the restriction in the time dimension that $M$ be larger than $2N + 1$. If the latter condition fails, aliasing results from sampling and introduces an irreversible error even before the signal is quantized in amplitude; obviously, the 6 dB/bit gain should not be expected in this case. Similarly, if the number of threshold crossings is less than $2N + 1$, the gain of 6 dB per octave of oversampling should not be expected either.

Theorem 3.1 If the continuous-time version of a signal $X_0 \in V_0$ has more than $2N + 1$ quantization threshold crossings¹, there exists a constant $c_0$ which depends only on $X_0$ such that, for $M$ large enough and any $X$ in $V_0 \cap \Gamma(X_0)$,
$$MSE(X_0, X) \leq \frac{c_0}{M^2}$$

(¹ We will not count as threshold crossings points where $X_0[t]$ reaches a threshold without crossing it. As a consequence, $X_0$ cannot be a constant.)

Proof: The set of instants where $X_0[t]$ is equal to one of the threshold levels is discrete and finite (since $X_0$ cannot be constant). Let $t_0 > 0$ be the minimum distance between these instants. We consider $M$ large enough (or $T_s = \frac{T_0}{M}$ small enough) so that $T_s < \frac{t_0}{3}$. Let us choose $2N + 1$ distinct threshold crossings of $X_0[t]$, and call $(t'_0, \ldots, t'_{2N})$ their instants and $(l_0, \ldots, l_{2N})$ their levels. We first show that any signal $X \in V_0 \cap \Gamma(X_0)$ necessarily crosses level $l_i$ at an instant $t_i$ in the same sampling interval as $t'_i$, for every $i = 0, \ldots, 2N$. Let $(k_0, \ldots, k_{2N})$ be the time indices such that the sampling interval $I_i = [(k_i - 1)T_s, k_i T_s]$ contains $t'_i$ for every $i = 0, \ldots, 2N$. For a fixed $i$, $X_0[t]$ crosses level $l_i$, which is the common boundary of two quantization intervals. Because $T_s < t_0$, $X_0(k_i - 1)$ and $X_0(k_i)$ necessarily belong to the interiors of these two respective intervals. If $X \in V_0 \cap \Gamma(X_0)$, then $X(k_i - 1)$ and $X(k_i)$ must belong to the quantization intervals which contain $X_0(k_i - 1)$ and $X_0(k_i)$ respectively. Therefore, since it is continuous, $X[t]$ must cross level $l_i$ at a time $t_i \in I_i$. As a consequence, we have
$$|t_i - t'_i| \leq T_s = \frac{T_0}{M} \qquad (3.4)$$
Since $T_s < \frac{t_0}{3}$, we also have
$$\forall i_1, i_2 \in \{0, \ldots, 2N\} \text{ such that } i_1 \neq i_2, \quad |t_{i_1} - t_{i_2}| > \frac{t_0}{3} \qquad (3.5)$$

Using (3.4), we are going to show that the norm of $\tilde{X} - \tilde{X}_0$ can be linearly bounded with $\frac{1}{M}$ as $M$ goes to $+\infty$. This is based on a linearization of the variations in time and an algebraic manipulation. Using (3.1), the fact that $X[t_i] = X_0[t'_i] = l_i$ for every $i = 0, \ldots, 2N$ can be written in vector notation as
$$\mathcal{M}(t_0, \ldots, t_{2N})\, \tilde{X} = \mathcal{M}(t'_0, \ldots, t'_{2N})\, \tilde{X}_0 = \tilde{L} \qquad (3.6)$$
where $\mathcal{M}(s_0, \ldots, s_{2N})$ is the transpose of the square matrix
$$\begin{bmatrix}
1 & \cdots & 1 \\
\sqrt{2}\cos(2\pi s_0/T_0) & \cdots & \sqrt{2}\cos(2\pi s_{2N}/T_0) \\
\vdots & & \vdots \\
\sqrt{2}\cos(2\pi N s_0/T_0) & \cdots & \sqrt{2}\cos(2\pi N s_{2N}/T_0) \\
\sqrt{2}\sin(2\pi s_0/T_0) & \cdots & \sqrt{2}\sin(2\pi s_{2N}/T_0) \\
\vdots & & \vdots \\
\sqrt{2}\sin(2\pi N s_0/T_0) & \cdots & \sqrt{2}\sin(2\pi N s_{2N}/T_0)
\end{bmatrix}$$


and $\tilde{L}$ the column vector whose components are $(l_0, \ldots, l_{2N})$. Replacing the cos and sin functions in $\mathcal{M}$ by their complex exponential expressions, it is possible to see that $\mathcal{M}(s_0, \ldots, s_{2N})$ is similar to the Vandermonde matrix
$$\left[\exp(j 2\pi k s_i / T_0)\right]_{0 \leq i \leq 2N,\ -N \leq k \leq N}$$
which is invertible as soon as $(s_0, \ldots, s_{2N})$ are distinct. Therefore, $\mathcal{M}(t_0, \ldots, t_{2N})$ and $\mathcal{M}(t'_0, \ldots, t'_{2N})$ are invertible. It can also be shown that, because of (3.5), $\left[\mathcal{M}(t_0, \ldots, t_{2N})\right]^{-1}$ is bounded. Therefore, $\tilde{X} = \left[\mathcal{M}(t_0, \ldots, t_{2N})\right]^{-1} \tilde{L}$ is bounded. Subtracting $\mathcal{M}(t'_0, \ldots, t'_{2N})\tilde{X}$ from the first two members of equation (3.6), we find
$$\mathcal{M}(t'_0, \ldots, t'_{2N})(\tilde{X} - \tilde{X}_0) = -\left[\mathcal{M}(t_0, \ldots, t_{2N}) - \mathcal{M}(t'_0, \ldots, t'_{2N})\right]\tilde{X} \qquad (3.7)$$

In the limit of $M$ going to $+\infty$, $t_i - t'_i$ goes to zero and we have
$$\mathcal{M}(t_0, \ldots, t_{2N}) - \mathcal{M}(t'_0, \ldots, t'_{2N}) \simeq \sum_{i=0}^{2N} (t_i - t'_i)\, \frac{\partial \mathcal{M}}{\partial t_i}(t'_0, \ldots, t'_{2N}) \qquad (3.8)$$
Because $\tilde{X}$ is bounded and $\mathcal{M}(t'_0, \ldots, t'_{2N})$ is invertible, (3.7) and (3.8) imply that
$$\tilde{X} - \tilde{X}_0 \simeq -\left[\mathcal{M}(t'_0, \ldots, t'_{2N})\right]^{-1} \sum_{i=0}^{2N} (t_i - t'_i)\, \frac{\partial \mathcal{M}}{\partial t_i}(t'_0, \ldots, t'_{2N})\, \tilde{X} \qquad (3.9)$$

Since $\tilde{X}$ is bounded, the right-hand side goes to zero as $M$ goes to $+\infty$. Therefore $\tilde{X}$ tends to $\tilde{X}_0$, and (3.9) remains true when $\tilde{X}$ is replaced by $\tilde{X}_0$ in the right-hand side (since $\tilde{X}_0 \neq \tilde{0}$). We obtain
$$\tilde{X} - \tilde{X}_0 \simeq \sum_{i=0}^{2N} (t_i - t'_i)\, \tilde{F}^0_i$$
where
$$\tilde{F}^0_i = -\left[\mathcal{M}(t'_0, \ldots, t'_{2N})\right]^{-1} \frac{\partial \mathcal{M}}{\partial t_i}(t'_0, \ldots, t'_{2N})\, \tilde{X}_0$$
is a vector which depends only on the signal $X_0$. Using (3.4), we find $\|\tilde{X} - \tilde{X}_0\| \leq \frac{c}{M}$, where $c = T_0 \sum_{i=0}^{2N} \|\tilde{F}^0_i\|$. The proof is completed using (3.3) and taking $c_0 = c^2$. $\Box$

Remark: The only assumption used in this theorem was the number of threshold crossings of the input signal. For example, it does not require quantization to be uniform. However, the constant $c_0$ obtained in the upper bound may depend on how the input signal crosses the thresholds. In particular, $c_0$ depends on $\tilde{F}^0_i$, which contains the term $\frac{\partial \mathcal{M}}{\partial t_i}(t'_0, \ldots, t'_{2N})\, \tilde{X}_0$. One can verify that this expression contains the slope of $X_0$ at the $i$th threshold crossing.
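The $\frac{c_0}{M^2}$ behavior can be probed numerically in the spirit of the experiments of chapter 5. A self-contained toy sketch (our own parameters and test signal, not the paper's exact experimental setup): encode a fixed periodic signal with a simple uniform ADC at increasing $M$, reconstruct by alternating projections, and watch the MSE decay:

```python
import numpy as np

def lowpass(x, N):
    """Projection onto V0: keep only the harmonics |j| <= N."""
    X = np.fft.fft(x)
    X[N + 1 : len(x) - N] = 0.0
    return np.real(np.fft.ifft(X))

def mse_after_projections(M, N=3, q=0.25, n_iter=200):
    k = np.arange(M)
    x0 = 0.43 * np.sin(2 * np.pi * k / M) \
         + 0.21 * np.cos(2 * np.pi * 3 * k / M)   # a signal in V0
    c0 = np.floor(x0 / q)                  # code sequence of the simple ADC
    x = lowpass((c0 + 0.5) * q, N)         # classical estimate
    for _ in range(n_iter):                # alternating projections
        x = np.clip(x, c0 * q, (c0 + 1) * q)   # onto Gamma(C0)
        x = lowpass(x, N)                  # onto V0
    return np.mean((x - x0) ** 2)

for M in (64, 128, 256, 512):
    print(M, mse_after_projections(M))     # expect roughly O(1/M^2) decay
```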

3.3 nth order single-quantizer converter

In theorem 3.1, we needed the condition that the source signal has at least 2N + 1 threshold crossings within its period. We saw that this is the amplitude dimension dual condition to 26

the time dimension condition which requires that M > 2N + 1. Indeed, M represents the number of \sampling time crossings" of the signal. Since signals are function of time, there is a trivial equality between the number of sampling instants and the number of sampling time crossings. Unfortunately, this is not the case in the amplitude dimension. In general, with rigid quantization thresholds, there is no way to control the number of threshold crossings of a signal, unless particular characteristics are known from it. Then, it becomes natural to think of time varying quantization thresholds. The idea is that, in simple ADC, the ecient information is located around the threshold crossings of the input signal, whereas the rest of the time is just \waiting". Making the quantization thresholds vary in time is increasing the chances to \intercept" the signal. This is what happens in single-quantizer converters of gure 2.4 when the output of G is not a constant signal. Indeed, if we forget the presence of the operator H for a moment, the code sequence gives the comparison of signal Y with the quantization thresholds shifted by the time varying output of G. This last signal can be prede ned (dithered ADC) or calculated from the digital output (general case of a single-quantizer converter). In order to evaluate the MSE between X0 and an estimate X 2 V0 \ ?(C0 ), we cannot think in terms of threshold crossings any more: since the variations of the G output can be abrupt, we have lost the notion of bounded slope and cannot use di erentiation tools. We have to study how the G output directly a ects the de nition of V0 \ ?(C0). To reduce the diculty of the problem, we are going to con ne ourselves to only consider converters where H is an nth order integrator, and the quantizer is uniform with a step size equal to 1 and a nite number of intervals. We naturally include the single-bit quantizer. Moreover, we suppose that the analog signals X to be coded verify a no-overload stability condition that we de ne as follows: the signal A seen by the quantizer when X is input, never goes farther than a distance equal to 1 from the extreme quantization thresholds. For example, in the case of rst order  modulation, any signal X where every sample belongs to [? 21 ; 21 ], will automatically verify the stability condition when the output values of the feedback DAC are  21 . In this context, we will assume that an estimate X0 of X in V0 \ ?(X0) should also verify this stability condition. We can include this constraint in the de nition of ?(X0) by considering that the two extreme quantization intervals, which are normally of in nite length, have a size equal to 1 (see gure 3.1). We will then automatically reject as estimate of X0, any signal which gives the same code as X0 but does not verify the stability condition. Finally, in our characterization of V0 \ ?(X0) for a given X0, it will be convenient to consider the virtual converter. This is always possible, since the set of signals giving the same code as X0 is the same whether we look at the original or the virtual converter (fact 2.1). The interesting feature about the virtual converter is the simple characterization of the signal seen by the quantizer. Indeed, when X is input, the signal A appearing in front of the quantizer is A = H [X ] ? G[C0] whether X belongs to ?(X0 ) or not. Since we want to study the distance or the di erence between X and X0, it will be convenient to rewrite this last relation as A ? A0 = L[X ? X0 ], where L is the linear part of H . 
If we express X = X0 + x, then A = A0 + a where a = L[x]. Let us rst try to characterize the elements of ?(X0). The action of the G feedback is re ected in the behavior of signal A0 = H [X0] ? G[C0]. But, to recognize that a signal X belongs to ?(X0), what really matters is only the relative position of A0 (k) within the quantization interval Q0 (k) it belongs to. Indeed, X = X0 + x will belong to ?(X0 ) if and only if, at every instant k, A0(k) + a(k) and A0 (k) belong to the same quantization interval. If we call d0+ (k) and d0? (k) the distance of A0 (k) to the upper and lower bounds of Q0 (k) 27

amplitude axis virtual threshold

2

1

quantization thresholds

0 e0(k

Q0(k)

A0(k)

-1

-2

d+0(k

no-overload stability range of A(k)

d-0(k)

virtual threshold

Figure 3.1: Amplidude representation of 2 bit uniform quantization, with no-overload stability range as de ned in this section. then,

X0 + x 2 ?(X0) () 8k = 1; :::; M; ?d0? (k)  a(k)  d0+(k); where a = L[x] The two distances d0+ (k) and d0? (k) are represented in gure 3.1. We always have

(3.10)

d0+ (k); d0?(k) 2 [0; 1] and d0+ (k) + d0? (k) = 1

Actually, those two numbers are directly related to the quantization error e0 (k) which is the di erence between the center of the interval Q0(k) and A0 (k). From gure 3.1, it is easy to see that: (3.11) e0 (k) = ? 21 + d0+(k) = 21 ? d0? (k) Let us now look at what we consider to be the ecient estimates of X0 : the signals of ?(X0) which belong to V0. Figure 3.2 shows a 3 dimensional representation of ?(X0) and its section with V0, symbolized by a 2 dimensional space (since V0 has to be a subspace of V ). Any element of V0 can be written X = X0 + U where   0 and U belongs to the unit 28

Γ(X0)

V S

X X0

Y

U

V0

Figure 3.2: Signal representation of V0 \ ?(X0). sphere S of V0:

. o n S = U 2 V0 kU~ k = 1 For an element U of S , let us call V = L[U ]. Then, the equivalence (3.10) applied to elements of V0 becomes X0 + U 2 ?(X0) () 8k = 1; :::; M; ?d0? (k)  V (k)  d0+(k) (3.12) Suppose we x U and V = L[U ], and we only want to look at estimates X of X0 which are located in the direction of U~ with respect to X0 (see gure 3.2), that is to say, X has the form X0 + U where   0. Let us call k the sign of V (k). We can write (3.12) as X0 + U 2 ?(X0) () 8k = 1; :::; M; jV (k)j  d0k (k) (3.13) Once we are able to recognize that X = X0 + U 2 ?(X0), MSE (X0; X ) will be directly given by 2. Basically, we have just reduced the characterization of V0 \ ?(X0 ) to the knowledge of d0+ or d0? . Unfortunately, there is no simple expression of d0+ ; d0? in terms of X0. Even with a statistical approach, Gray [7, 10] showed that, for 1st order  modulation with constant inputs, the quantization error signal is not theoretically white. In other words, d0+ (k) or d0? (k) are generally correlated in time. 29

However, without the knowledge of d0+ or d0? , the mere fact that A0 (k) is con ned in an interval of size 1 already gives a rst upper bound to MSE (X0; X ) where X 2 V0 \ ?(X0). From (3.13), an estimate X0 + U in ?(X0 ) at least veri es jV (k)j  1; 8k = 1; :::; M (3.14) We recall that V is the result of the nth order integration of U . To give an intuition of the upper bound, let us look at the particular direction U~ where U is the unit constant signal: 8k = 1; :::; M; U (k) = 1. In this case, without calculation, we will expect V (k) to grow with k in a fashion comparable to knn! . Therefore, V (k) will reach its maximum at k = M . From (3.14), we necessarily have   V (1M ) ' Mn!n . We conclude that, if X 2 V0 \ ?(X0 ) di ers from X0 by a constant signal, we necessarily have MSE (X0; X )  Mc2n where c > 0 is a constant independent of M . We are going to show that this dependence of the MSE with respect to M is the same with every other direction U of S . This time the maximum of jV (k)j is not necessarily achieved at k = M , but we still have from (3.14) the necessary condition   max 1jV (k)j (3.15) 1kM The following lemma shows that, even if U is not the constant signal, but is in general a periodic and bandlimited signal of energy 1, then max1kM jV (k)j is still proportional to M n.

Lemma 3.2 When L is the nth order integrator, there exist two constants 0 < c1 < c2 such that, for M large enough,

8U 2 S ; c1M n  1max jV (k)j  c2M n; where V = L[U ] kM This is proved in appendix A.1. As a natural consequence, we have Fact 3.3 There exists a constant c > 0 such that, for M large enough, for any input X0 2 V0 which satis es the stability condition,

8X 2 V0 \ ?(X0); MSE (X0; X )  Mc2n

Proof : Use the necessary condition (3.15), the lower bound of lemma 3.2, use the fact that

MSE (X0; X ) = 2 and take c = c121 2

Of course, this upper bound is not satisfactory, since the classical reconstruction itself has a M 21n+1 behavior. In particular, when n = 0 such as in dithered or predictive ADC, fact 3.3 does not show any expectation of MSE improvement with increasing oversampling ratio. The weakness of this upper bound comes from the fact that, in the application of (3.13), we did not do better than assuming that d0k (k) = 1, 8k; U . Even if d0k (k) is strongly correlated in time, we would expect d0k (k) to be sometimes less than 1. Think that, in (3.13), it is the smallest d0k (k) for k = 1; :::; M which will impose the strongest constraint on . In fact, this is not entirely true because jV (k)j also varies with k. But, still, this idea gives us a rst hint to a deeper analysis. Let us see what happens if we assume that for any sequence 30

of signs (1 ; :::; M ), (d01 (1); :::; d0M (M )) are independent random variables with a uniform distribution in [0; 1]. This is completely equivalent to saying that the quantization error signal e0 is white with a uniform distribution in [? 21 ; 12 ]. We know this is not theoretically true, but this approach will still give us some insight. As a rst curiosity, we would like to know what would be the expected value of the minimum of (d01 (1); :::; d0M (M )) in terms of the number M of considered random variables. We have the following result: Fact 3.4 Let (d(1); :::; d(M )) be M independent random variables with a uniform distribution in [0; 1], and dmin = min1km d(k). Then dmin is a random variable and its expected value is M1+1

Proof : Prob(dmin  ) = Prob(8k = 1; :::; M; d(k)  ) = (1 ? )M Prob(dmin 2R [;  + d]) = ? dd (Prob(dmin  ))d = M (1 ? )M ?1d E (dmin) = 01 M (1 ? )M ?1d = M1+1 2 If by chance the maximum of jV (k)j and the minimum of d0k (k) are achieved at the same time, then (3.13) will imply   max 1jV (k)j 1min d0 (k) kM k 1kM which qualitatively reduces the upper bound of  by a factor of M1 , and the upper bound of 2 by M12 . Of course this is not likely to happen, but we will see that some compromise will be found. One has to think of jV (k)j as a slowly varying function, and this for two reasons: V is the result of an nth order integration of some signal U , which itself is a slowly varying function, especially for high oversampling ratios. We just want to give the intuition that the behavior jV (k)j  c1M n is true at a substantial number of time indices other than when jV (k)j achieves its maximum. Actually, it will appear in our derivation that the upper bound of the MSE expectation is more precisely related to the average of jV (k)j. The following lemma says that the average of jV (k)j has the same behavior as its maximum value, with respect to M. Lemma 3.5 When L is the nth order integrator, there exist two constants 0 < c3 < c4 such that, for M large enough,

8U 2 S ; c3M n  M1

M X

k=1

jV (k)j  c4M n ; where V = L[U ]

This is proved in appendix A.1. As a consequence, we have the following fact: Theorem 3.6 Suppose that when the input signal X0 is taken at random in the stability range of V0 , for any M  1, the quantization error values (e0 (1); :::; e0(M )) are independent random variables with a uniform distribution in [0; 1]. Then there exists a constant c > 0 such that, for M large enough,

E (MSE (X0; X ))  M 2cn+2

where X is taken at random in V0 \ ?(X0 ).

31

Proof : We rst introduce some notations. ~ =Y~ means that X~ is parallel to Y~ with the same direction. - X= - Prob(A = B ) is the probability of event \A" given the event \B ". It is understood that the expectation will be calculated on the basis of the random variable de ned by the couple (X0; X ). The element X is correlated to X0 in the sense that, X0 is the rst random element to be drawn, then X is drawn inside V0 \ ?(X0). The probabilistic knowledge of X0 is given through the statistical assumption on the random variables d0k (k). No assumption is made about the probability distribution of X except the purely logical fact that X 2 V0 \ ?( of (X0; X ), we call Y the signal (shown in gure   every  ~ realization  ~X0).~For 3.2) such that Y ? X0 == X ? X~ 0 and Y belongs to the boundary of V0 \ ?(X0). Then, (X0; Y ) becomes a new random couple. It is obvious that

MSE (X0; X )  MSE (X0; Y ) (3.16) Let us nd an upper bound to the expectation of MSE (X0; Y ). We rst study the conditioned expectation EU (MSE (X0; Y )) of MSE (X0; Y ) given that Y is located in a xed direction U~ with regard to X0 ( gure 3.2). We de ne the conditioned probability density    . fU ()d = Prob kY~ ? X~0 k 2 [;  + d] Y~ ? X~ 0 ==U~ Then

EU (MSE (X0; Y )) =

If we de ne the cumulative probability



Z +1 =0

PU () = Prob kY~ ? X~0 k  

2fU ()d

. ~



Y ? X~ 0 ==U~

 



we will have fU () = ? dd PU (). Saying that kY~ ? X~ 0k   given that Y~ ? X~ 0 ==U~ is equivalent to saying that the boundary point Y of V0 \ ?(X0) located in the direction U~ with regard to X0 , is at a distance from X0 greater than . This is just equivalent to saying that X0 + U 2 V0 \ ?(X0) (since, it is a convex set). Then

PU () = Prob (X0 + U 2 V0 \ ?(X0 )) If we call V = L[U ] and k = sign(V (k)) for k = 1; :::; M , then we can use equivalence (3.13). Since  and U are xed, the probability that X0 + U 2 V0 \ ?(X0) completely depends on the behavior of the random variables (d01 (1); :::; d0M (M )). We have



PU () = Prob 8k = 1; :::; M; jV (k)j  d0k (k)



(3.17)

Let k0 be the time index when jV (k0)j achieves its maximum. Whenever  > jV (1k0 )j , then jV (k0)j  d0k0 (k0)  1 becomes impossible. Therefore,

8 > jV (1k )j ; PU () = 0 and fU () = 0 0

32

Performing an integration by part, we have

Z +1 h 2 i+1 Z +1 PU ()d (?2)PU ()d = 2 EU (MSE (X0; Y )) = ? PU () 0 ? 0

0

Suppose that 0    jV (1k0 )j . We have assumed that (e0 (1); ::; e0(M )) are independent random variables with a uniform distribution in [? 21 ; 12 ]. Using equation (3.11), this implies that, for the xed choice of signs (1 ; ::; M ), (d01 (1); :::; d0M (M )) are also independent with a uniform distribution in [0; 1]. Therefore, from (3.17) we have

PU () =

M Y k=1

(1 ? jV (k)j)

(3.18)

Using the inequality 1 + x  ex and the lower bound of lemma 3.5, we nd

PU ()  exp ?

M X

k=1

!

jV (k)j  e?c3 M n+1

(3.19)

Actually, this inequality is true for every   0: when  > jV (1k0 )j , (3.19) is trivially satis ed since PU () = 0 and the right hand side is always positive. Therefore Z +1 EU (MSE (X0; Y ))  2 ec3 M n+1 d = (c2)2 M 21n+2 0 3 The last term has been obtained by an integration by part. Thanks to lemma 3.5, we have managed to bound the expectation of MSE (X0; Y ) conditioned on U 2 S , independently of U . This shows that, for M large enough E (MSE (X0; X ))  E (MSE (X0; Y ))  (c2)2 M 21n+2 2 3 Even if the assumption of a \white" quantization noise is not theoretically justi ed, we still considered it for two reasons. First, starting from the \white noise" assumption usually adopted in oversampled A/D conversion, we have just shown that there is actually more information contained in the code sequence C0 about the source signal X0 then expected. We will see in chapter 5 that the practical results we obtained from numerical tests agree with our analytical evaluation based on this noise model. Second, the relatively simple equations we obtained can be the starting point to more detailed discussions. The assumption of quantization error independence and uniformity is exactly used when deriving equation (3.18) from (3.17). In fact, we don't really need (3.18) to be an equality, since we only try to nd an upper bound to PU (). To obtain the M 21n+2 behavior, it would be sucient to have, in place of (3.18) M Y PU ()  c0 (1 ? jV (k)j) (3.20) k=1

where c0 is a constant independent of M . Now, PU (), given by (3.17), is the probability that the M dimensional variable d~0 = (d01 (1); :::; d0M (M )) in [0; 1]      [0; 1] 33

belongs to the parallelepiped [jV (1)j; 1]      [jV (M )j; 1]

(3.21)

Inequality (3.20) just means that the ratio between the probability that d~0 belongs to a parallelepiped of type (3.21) and the volume of this parallelepiped, is bounded regardless of M . This does not necessarily requires that d~0 have a probability density equal to 1 on [0; 1]    [0; 1] (independence and uniformity of d0k (k)). Since we are only interested in the probability of d~0 calculated over volumes of type (3.21) which are nite, we can even allow d~0 to have a Dirac type of distribution. This is theoretically the case of  modulation for example. Indeed, d~0 is locally an ane function of X0 which itself is con ne into a space of dimension 2N + 1 < M . There is however a known case where the M 21n+2 behavior fails. Hein and Zakhor [9, 11] proved that, in the case of 1st order  modulation with constant inputs, the MSE of reconstruction is lower bounded by M 3 where is a constant. However, we found numerically (chapter 5) that a M14 behavior is recovered as soon as inputs become sinusoidal with at least an amplitude of 1=2000.

3.4 nth order multi-quantizer converter We presented in section 2.5 ( gure 2.11a) the general structure of what we called multiquantizer converter. When considering a xed analog input X0 giving C0 , we saw that ?(X0) = ?(C0) is equal to the intersection of the convex sets ?i (X0) = ?i (C0i) which is the signal cell corresponding to the ith single-quantizer converter of the virtual structure ( gure 2.13). Therefore, we have V0 \ ?(X0) = (V0 \ ?1 (X0)) \ (V0 \ ?2 (X0)) \ ::: \ (V0 \ ?p(X0)) suppose that the order of the ith single-quantizer converter in the original structure is ni . Using formula (2.1) in section 2.5, the order of Hi0 will be

n0i = ni + ni?1 +    + n1 If we apply the result of theorem 3.6, the expected value of MSE (X0; X ) when X 2 V0 \?(X0 ) P 0i +2) p ? (2 n 0 will be of the order of O(M ). Since np = 1 ni is the highest order, V0 \ ?p (X0) will be qualitatively speaking the smallest set containing X0 . We conclude that the \size" of V0 \ ?(X0) in terms of MSE has the order O(M ?(2n+2) ) where n = n0p . In practice, while looking for an estimate in V0 \ ?(X0 ), we will only work with V0 \ ?p (X0). This means we

have reduced the study of our multi-quantizer converter to the last single-quantizer converter in the virtual structure. This conclusion should not be misleading. Looking at gure 2.13, it seems as if we dropped the use of the other code components C01, C02; :::, C0(p?1). But one has to remember that this code information is actually included in the dither sequence D0p. We have shown in section 2.5 ( gure 2.14e) the relation between D0p and C01; :::; C0p in the case p = 2. Therefore, the de nition of the last single-quantizer converter of the virtual structure indeed contains the full information of the output code sequence C0 = (C01; :::; C0p). 34

3.5 Conclusion Working on periodic input signals, we have shown under certain assumptions that picking an estimate X of an original signals X0 , having the same bandwidth and producing the same digital sequence, leads to an MSE upper bounded by R?(2n+2) instead of R?(2n+1) , where R is the oversampling rate and n is the order of the converter. For simple ADC (n = 0), this is conditioned on the fact that the original signal X0 has a minimum number of threshold crossings. For single-quantizer converters of higher order (n  1), the MSE has a necessary upper bound proportional to R?(2n+2) when assuming no-overload stability and a white and uniform quantization error signal. We explained, however, that this result can be obtained with weaker assumptions. This analysis can be applied to a multi-quantizer converter since it can be decomposed into single-quantizer converters working in parallel, according to chapter 2. As a consequence, the overall performance is still R?(2n+2) where n is the total order of the multi-quantizer converter.

35

Chapter 4

Algorithms for the projection on

?I (C0)

4.1 Introduction In this chapter, we propose computer implementable algorithms to perform the projection of a signal X on ?I (C0) for a given single-quantizer converter, a code sequence C0 and a time index subset I . The set ?I (C0) is de ned in section 2.4 and is systematically derived from the virtual converter. When I is the whole time range, then ?I (C0) coincides with ?(C0), the set of all analog signals giving code sequence C0 through the original converter. We saw that ?I (C0) is always convex. Even if our study is based on the general case where I is not necessarily the full time range, whenever we can, we will try to nd a method to perform the projection on ?(C0), since ?(C0) gives the best localization of X0 based on the knowledge of C0. Otherwise, we will try to nd the projection on ?I (C0) where I is as large as possible, or, in other words, ?I (C0) is as close as possible to ?(C0). Our particular motivation will be to test the performance of the proposed algorithm in the context of oversampled ADC (chapter 5), and compare the results with the analysis of chapter 3. From section 2.3 we know that the projection of X on ?I (C0) is the signal X 0 2 ?I (C0) which minimizes the mean squared error with respect to X . We immediatly recognize the problem on a constrained minimization: X 0 minimizes the function Y 7! MSE (X; Y ) subject to the constraint Y 2 ?I (C0). We can then apply the theory of optimization in the context of quadratic minimization function with convex constraints. However, we have to take into account the fact that elements we want to optimize are signals de ned on a whole time interval. It is of course expected that a computer can only process a signal through a moving nite length window (typically, the length of the bu er). Conceptually, the total length of signals should be thought of as in nite compared to that of the window of operation. Therefore, we have to design speci c optimization algorithms, adapted to this practical constraint. We will see in section 4.3 that, in the case of zeroth order conversion (simple, dithered or predictive ADC), the minimization of MSE (X; X 0) subject to X 0 2 ?I (C0) can be performed by an independent computation of X 0(k) at time k, only function of X (k) and D0(k) at the same instant (D0 is the dither sequence derived from the virtual converter of gure 2.9). This will enable us to propose a straightforward algorithm for the projection on ?(C0 ). Unfortunately, this is no longer true with converters of higher order. For 1st order  modulation (section 4.4) we will see that X 0(k) has to be computed by blocks in time. However, the computation is independent from one block to another. We propose an 36

X X+x

H

A A+a

quantizer

C C0

D0 Figure 4.1: Coding an estimate X which does not give the code sequence C0. If a variation x is added to X , the signal seen by the quantizer will be A + a where a = L[x], where L is the linear part of H . algorithm which nds the projection on ?(C0) which we call the \thread algorithm". With converters of order higher than 1, it becomes no longer possible to split the computation of X 0(k) into independent time intervals. Indeed, the determination of X 0(k) at a certain instant k will depend on the knowledge of C0 and X on the whole time range. Even if we assume that the computer's bu er is longer than the signal, there is no single step method to nd the minimum to the MSE subject to inequality constraints (the relation X 0 2 ?I (C0) can indeed be formulated as inequality constraints). Based on a property (conjecture 4.6) we proved for the case n = 2, and which still works for higher orders as con rmed by our numerical simulations, we propose an algorithm which nds in one step the projection on ?I (C0), where I is recursively constructed in time. If we assume that the computer's bu er is longer than the signal, this algorithm gives the exact projection on ?I (C0). When the bu er is limited, we show qualitatively that the solution given by the algorithm is a \satisfactory" approximation of the projection.

4.2 Speci c formulation of the constrained minimization problem and properties

In the formulation of the constrained minimization problem, is is particularly convenient to take X , the signal we want to project, as the origin of the signal space. This is possible since X is always known. We will then express any element of V in the form X + x. The projection of X on ?I (C0) becomes the signal X + x 2 ?I (C0) such that kxkV is minimized. The function to minimize is simple but the system of constraints ?I (C0) has a complex expression. Therefore, we propose to perform a change of variable on x which will greatly simplify the expression of constraints while the minimization function will be of a quadratic type. It is possible to determine the signal A seen by the quantizer when X is fed into the virtual converter ( gure 4.1). When any signal X + x is input, the signal seen by the quantizer is of the form A + a, where a = L[x] and L is the linear part of H . By assumption, L is invertible and de nes a one-to-one mapping between x and a. According to the structure of the virtual converter, X + x belongs to ?I (C0) if and only if, for every time index k 2 I , A(k) + a(k) belongs to the quantization interval labeled C0(k), or equivalently, a(k) belongs to this interval shifted by ?A(k). Let us introduce the following notation: for every k (not necessarily in I ), we call QC0 ;X (k) the quantization interval labeled C0(k) shifted by ?A(k). For a xed k, QC0 ;X (k) does only depend on C0 and X : the identi cation of the 37

amplitude

0

0

1

2

3

4

5

6

7

8

9

10 11 12 13 14

time index k

a(k) : example which satifies the contraints boundaries of Q(k)

Figure 4.2: Time representation of C (I ). quantization interval is directly related to C0(k), and signal A depends on X and D0, where D0 itself depends on C0. With the following equivalence, we express the constraints in terms of variable a: X + x 2 ?I (C0) () a = L[x] 2 CC0 ;X (I ) where n . o CC0;X (I ) = a 2 V 8k 2 I; a(k) 2 QC0;X (k) 1 In the following, since C0 and X are considered to be xed , we will simply write QC0 ;X (k) = Q(k) and CC0 ;X (I ) = C (I ) When I covers the whole time range, we will just write C (I ) = C . Let us just keep in mind the relation of Q(k) and C (I ) with C0 and X . Figure 4.2 shows a time representation of C (I ). It is easy to see that C (I ) is a convex set. The function to minimize kxkV can also be expressed in terms of a. We de ne



2

(a) = 21

L?1 [a]

V

If a = L[x], then (a) = 12 kxk2V . We can now express the projection problem as follows: 1 Again, we will always consider the closure of C (I ). This is equivalent to considering the closure of C0 ;X

every Q

C0 ;X

(k). As a consequence, we will assume that Q

38

C0 ;X

(k) includes its bounds.

Fact 4.1 The projection of X on ?I (C0) is the signal X + x such that x = L?1[a] and a is the minimum of  subject to the constraint C (I ) = fa 2 V /8k 2 I; a(k) 2 Q(k) g Since  is a quadratic function and C (I ) a convex set, the theory of optimization directly con rms that  has a unique minimum subject to C (I ). This theory also provides a characterization of the minimum when  is quadratic, which is the case here. If we designate by @ the sequence whose value @ (k) at time k is the partial derivative of (a) with respect to @a @a a(k), we have the following theorem:

Theorem 4.2 (Euler inequality characterization) A signal a0 is a minimum of  sub-

ject to convex constraints S if and only if

a0 2 S

and

8a 2 S ;

X @ k

@a0 (k)  (a(k) ? a0 (k))  0

(4.1)

Thanks to the change of variable x $ a, the constraint speci cation C (I ) has the particular advantage to be separable. By this we mean that constraints are de ned independently of a(k) at every time index k. As a consequence of Euler characterization, the theory of optimization provides us with the following theorem:

Theorem 4.3 Let a0 be a minimum of  subject to separable convex constraints S . Then

@ (k)  (a(k) ? a (k))  0 (4.2) 8k; 8a 2 S ; @a 0 0 If for a certain index k, S does not impose any constraint on a(k), then @ (k) = 0 (4.3) @a0 Proof : Let us consider a xed time index k1, and a xed element a1 2 S . Let a01 be the sequence equal to a0 everywhere except at instant k = k1 where a01 (k1) = a1 (k1). Since a0 2 S @ (k )(a (k ) ? and S is separable, a01 also belongs to S . Applying (4.1) on a = a1 , we nd @a 0 1 1 1 a0 (k1))  0. This proves (4.2). To show (4.3), suppose that at k = k1, S does not impose any constraint on a(k1). Let a+ ; a? be the sequences equal to a0 everywhere except at k = k1 where a+ (k1) = a0 (k1) + 1 and a? (k1) = a0 (k1) ? 1. We have a+ ; a? 2 S . Then applying @ (k )  0 and ? @ (k )  0, which (4.1) on a = a+ and a = a? , we respectively nd @a @a0 1 0 1 @ implies that @a0 (k1) = 0 2 If the change of variable x $ a simpli es the constraints expression, it however complicates the function to minimize . In this chapter, we will only consider converters where L is an nth order integrator. In this case, using notation 4.4, @ @a can be obtained according to fact 4.5.

Notation 4.4(n)?nth order(n)+discrete derivative For y 2 V , y and y are the nth order discrete backward and forward derivatives of y

recursively de ned as follows:

y(0)? = y (0)+ = y 39

and for n  0 and

( (n+1)? y (1) = y (n)? (1)

y(n+1)? (k) = y (n)? (k) ? y (n)? (k ? 1) ; for k  1

y(n+1)+ (k) = y (n)+ (k + 1) ? y (n)+ (k) ; for k  0

Fact 4.5 If L is an nth order integrator ( gure 2.7) and a = L[x], then n (n)+ x = a(n)? and @ @a = (?1) x

This is shown in appendix A.2.

4.3 Zeroth order converter (simple, dithered, predictive ADC)

This is the easy case where H = L = identity. We will immediately consider the projection on ?(C0). We will derive an algorithm directly without using any of the theorems of the previous section. Signal A is just X ? D0 where D0 is the zero signal in simple ADC, the dither signal in dithered ADC and the feedback output in predictive ADC. Variables x and P a are thus identical and the function to minimize is (a) = 21 kak2V which is proportional to k ja(k)j2. Since the constraint P C (I ) is a system of conditions applied to samples a(k) taken separately, the minimization of k ja(k)j2 is equivalent to the individual minimization of each ja(k)j. If k 2 I , a0 (k) is necessarily the element of Q(k) of lowest magnitude. Figure 4.3 shows an example of minimization solution. This leads to the following algorithm:

Algorithm 1 : Given the knowledge of C0 and X , at every time index k: (1) - calculate D0 (k) the value of the feedback signal G[C0] at time k (2) - calculate A(k) = X (k) ? D0(k)

(3) - determine Q(k), the quantization interval labeled C0(k) shifted by ?A(k) (4) - if Q(k) contains 0, take a0(k) = 0 otherwise, take a0(k) equal to the closest bound of Q(k) to 0.

Then, X + a0 , where a0 results from this algorithm, is the projection of X on ?I (C0). In the particular case of simple ADC (D0 = 0), one can check that this leads to the transformation we introduced in chapter 1 ( gure 1.2).

4.4 1st order converter (1st order  modulation)

In this case L is a single discrete integrator (n = 1). According to fact 4.5, if L[x] = a, then x(k) = a(k) ? a(k ? 1) is the slope of a at time k and @ = ? fx(k + 1) ? x(k)g = ? f(a(k + 1) ? a(k)) ? (a(k) ? a(k ? 1))g @a is minus the change of slope of a about time index k. Here again, we consider that I covers

the whole time range. We propose an algorithm that we describe in a \physical" way. 40

amplitude

0

0

k0

time index k k1 k2 k3 k4 contact points a(k) solution to the constrained minimization problem

1

2

3

4

5

6

7

8

9

10 11 12 13 14

boundaries of Q(k)

Figure 4.3: Solution a0 for zeroth order conversion.

Algorithm 2 : Given the knowledge of code sequence C0 and signal X , (1) - calculate signal A = H [X ] ? G[C0] (2) - determine, for every k, Q(k) the quantization interval labeled C0 (k) shifted by ?A(k) (3) - represent C in time (like in gure 4.2) (4) - attach a \thread" at node (k = 0; a(0) = 0), stretch it under tension between the constraints (arrows in gure 4.2) (5) - take (a0 (k))k>0 on the path of the resulting thread position.

The solution a0 resulting from this algorithm is shown in gure 4.4. Is is easy to see that this signal sati es criteria (4.1) of theorem 4.2. First, it is obvious that a0 2 C . Whenever @ @a is non zero, the thread touches a constraint (in gure 4.4, contact points k = 5; 9; 12; 14; :::). But the way the thread's slope changes is related to the direction of the arrow of contact. For example, in gure 4.4, the change of slope is negative only when the arrow of contact goes up (k = 5; 9) and vice-versa (k = 12). When an arrow of contact goes up, for example at k = 5, for any other admissible signal a 2 C , then we necessarily have a(k) ? a0(k)  0, @ Similarly, if an arrow of contact goes down at k (k = 12), then a(k) ? a0 (k)  0. Since @a 0 @ (k) 6= 0, then is the opposite to the slope's variation, we have just shown that whenever @a 0 @ (k) and a(k) ? a (k) have the same sign. This implies (4.1). Therefore a is the minimum 0 0 @a0 of  subject to C . Then, once a0 is computed, the projection of X on ?I (C0) is the signal ? X + x0 where x0 = L?1 [a0] = a(1) 0 is just the slope of a0 , or the slope of the thread in gure 4.4. This algorithm is easy to translate in terms of computer operations.

41

amplitude

0

0

k0

time index k k1 k2 k3 k4 contact points a(k) solution to the constrained minimization problem

1

2

3

4

5

6

7

8

9

boundaries of Q(k)

10 11 12 13 14

thread path

Figure 4.4: Solution a0 for 1st order  modulation. This is obtained from the \thread algorithm".

4.5 Higher order converter (nth order  modulation) 4.5.1 Introduction

The principle of the \thread algorithm" was to nd the sequence which meets the constraints and, at the same time, minimizes its \slope". For an nth order converter, the principle is similar, but the function to mimimize is the energy of the nth order derivative. For example, when n = 2, we will have to minimize the \curvature" of the signal. This is similar to the physical problem of thin elastic beam under constraint. Figure 4.5 shows the solution minimizing the curvature under the same constraints as in gure 4.4. However, the resolution of this problem includes an extra diculty. Looking at rst order minimization, we can see that the solution is obtained by linear interpolation of two consecutive contact points. The thread algorithm gave an easy way to identify these contact points. With 2nd order minimization, the way the solution is interpolated between two contact points depends on the past but also on how the signal will be constrained on the future. It is not even possible to determine locally whether a constraint will be a contact point of the nal solution or not. Therefore, we will no longer try to compute the projection on ?(C0 ), but only on ?I (C0), where we will try to maximize the time density of I . First, we will assume that the computer has a bu er as long as signals to be processed. In this context, we will present an algorithm which, at every time index k, proposes a choice Ip of p time indices belonging to the past from k = 0, and compute the minimum ap of  subject to C (Ip) from the initial time index 0 to the present time index k. The construction of Ip will be such that Ip = Ip?1 [ fkp g 42

amplitude

0

0

k0

time index k k1 k2 k3 k4 contact points a(k) solution to the constrained minimization problem

1

2

3

4

5

6

7

8

9

boundaries of Q(k)

10 11 12 13 14

thin beam path

Figure 4.5: Solution a0 for 2nd order  modulation. where kp belongs to the future of Ip?1. The computation of ap will be obtained by correcting ap?1 (k) from k = 0 to k = kp?1 , previously computed and stored in a bu er, and adding to the same bu er the speci c values of ap (k) computed from k = kp?1 + 1 to k + kp . This algorithm will provide the exact projection on ?I (C0) where I = Ipm and pm is the total number of selected time indices. If the computer has a bu er limited to length l, we will use the fact that, when computing ap from ap?1 , the correction signal ap (k) ? ap?1(k) decays exponentially when k goes from kp?1 to the past. We will then limit ourselves to compute ap by a correction of the stored value of ap?1 (k) from k = kp?1 ? l + 1 to k = kp?1 . This will lead to an approximation of the projection on ?I (C0).

4.5.2 Recursion principle and rst algorithm

Let us consider a sequence (ki)i1 of increasing time indices (8i  1; 0 < ki < ki?1 ). We call Ip = fk1; :::; kpg and ap the minimum of  subject to C (Ip). We introduce the minimum bp of  subject to the equality constraints E (Ip) = fa 2 V /8i = 1; :::; p ? 1 ; a(ki) = 0 and a(kp) = 1 g Conjecture 4.6 If (ki)i1 is such that 8i  1; ki ? ki?1  n (k0 = 0 by convention), then for p  1, bp veri es

!

@ (k ) = (?1)p?i 8i = 1; :::; p ; sign @b (4.4) i p We discovered this property numerically when L is an nth order integrator. We only have to prove in the case n = 2 (see appendix A.6.2). As a consequence, we have the following fact: 43

Fact 4.7 We suppose that 8i  1; ki ? ki?1  n (k0 = 0). If for a certain p  2, ap?1 veri es   (1) 8i = 1; :::; p ? 1 ; sign @a@p?1 (ki ) = (?1)p?i where  2 f?1; 1g (2) ap?1 (kp) is below (resp. above) interval Q(kp) when  = 1 (resp. ?1) then ap veri es  @  p?i (3) 8i = 1; :::; p ; sign @a p (ki) = (?1)

(4) ap = ap?1 + bp , where  is the distance between ap?1 (kp) and Q(kp) This is a consequence of (4.4), the Euler characterization, theorem 4.3 and the linearity of a 7! @ @a (fact 4.5). The complete proof is shown in appendix A.3. This fact is still true at p = 1, when taking a0 = 0 (zero signal). Of course the criteria (1) disappears because I0 = ;. Since  has not been determined yet, we will have the freedom to choose k1 such that 0 is either below or above Q(k1). Then  will be xed to +1 or ?1 respectively. Fact 4.7 becomes:

Fact 4.8 Assuming that k1  n, if we have

(2') 0 does not belong to Q(k1) then a1 veri es  @  (3') sign @a p (ki ) = , where  = 1 (resp. ?1) if 0 is below (resp. above) Q(k1) (4') a1 = b1 , where  is the distance of 0 to Q(k1)

Proof : Like in the proof of fact 4.7 (A.3) one has to see that b1 2 C (I ) and veri es the Euler inequality with regard to C (I ) 2 These facts give us the foundation for our next algorithm. If we are able to nd a set of p ? 1 increasing time indices Ip?1 = fk1; :::; kp?1g such that the minimum ap?1 of  subject to C (Ip?1 ) can be obtained and veri es criteria (1) of fact 4.7, then we will look for an index kp  kp?1 + n such that criteria (2) is satis ed. Taking Ip = Ip?1 \ fkpg, if we can calculate the minimum bp of  subject to E (Ip), we can calculate the signal ap?1 + bp (where  and  are de ned in criteria (1) and (4)) which is for sure the minimum ap of  subject to C (Ip). According to fact 4.7, ap veri es criteria (3) which can be written:

!

@ (k ) = (?)(?1)(p+1)?i 8i = 1; :::; p ; sign @a i p and which is similar to criteria (1) when replacing p by p + 1 and  by ?. At the rst step p = 1, the initial  is determined according to the choice of k1 and criteria (3'). We formally

describe the algorithm as follows:

Algorithm 3.1 : Given the knowledge of code sequence C0 and signal X , (1) - Initialize p = 0, k0 = 0, I0 = ;, a0 = 0 (zero signal)

(2) - Increment p (3) - For increasing k starting from kp?1 + n, determine Q(k) using the knowledge of C0 and X (like in algorithm 2) (4) - Choose kp  kp?1 + n such that ap?1 (kp) is below (resp. above) Q(kp) if 0 (?1)p = 1 (resp. ?1) (0 is chosen at the same time as k1 during the rst step p = 1) (5) - Calculate p the distance between ap?1 (kp) and Q(kp ) (6) - Take Ip = Ip?1 \ fkpg (7) - Calculate the minimum bp of  subject to E (Ip) 44

(8) - Calculate ap = ap?1 + 0 (?1)pp bp (9) - Go to (2) An example of construction of Ip; ap; bp according to this algorithm is shown in gure 4.6.

4.5.3 More practical version for algorithm 3.1

One should not forget that the computation of ap is only an intermediate step in the computation of the projection X + xp of X on ?Ip (C0): we still have to perform the operation xp = L?1 [ap] = ap(n)? . If we apply the operator L?1 on the recursive relation of line (8) in algorithm 3.1, we nd xp = xp?1 + 0 (?1)pp yp where yp = L?1 [bp] = bp(n)? . However, line (4) of algorithm 3.1 still requires ap?1 (k) to be known for k > kp?1. Therefore, we will still need to determine bp?1(k) for k > kp. We propose the modi ed algorithm:

Algorithm 3.2 : In algorithm 3.1, replace lines (7) and (8) by (7) - Call bp the minimum of  subject to C (Ip)

- calculate yp = bp(n)? - calculate bp(k) for k > kp (8) - - calculate xp = xp?1 + 0 (?1)pp yp - calculate ap (k) = ap?1 (k) + 0 (?1)pp bp(k) for k > kp .

This algorithm requires xp to be memorized at every step p. However, since xp is only an intermediate variable in the computation of the nal solution xpm , it is not necessary to determine xp explicitly on its full time range. Actually, xp is completely determined from the knowledge of (n?1)+; xp(1); x(1)+ p (1); :::; xp

and x(pn)+ (ki ) for i = 1; :::; p Indeed, as a consequence of theorem 4.3 and fact 4.5,

8k 2= Ip; x(pn)+ (k) = 0 This implies that x(pn)+ is known on the full time range and xp can be obtained from an nth order integration of x(pn)+ . But, when reaching the nal step p = pm , xpm itself has to be given explicitely: it is not realistic to compute it by an nth order integration from initial conditions given at k = 1 because of round o error accumulation. Therefore, we propose an intermediate solution which consists in calculating the vectors

2 (n?2)+ 3 x (ki + 1) p 6 77 .. Wp;i = 64 . 5 x(0)+ p (ki + 1)

(4.5)

for every i = 0; :::; p, to characterize xp . Indeed, xp can be reconstructed from (Wp;i)0ip?1 using the following facts: 45

amplitude

a2

δ1 0

δ3 δ2

a0 a1 a3

k0=0

k1

k2

k3

time index k

amplitude

(a)

δ2b2

δ2 0

δ1 δ3

-δ1b1 -δ3b3

k0=0

k1

k2

k3

time index k

(b) Figure 4.6: Example of construction of Ip , ap , bp (with 1  p  3). The selected constraints are shown by dark arrows. 46

Fact 4.9 8k  kp + 1; 8q  0; x(pq)+(k) = 0 and yp(q)+(k) = 0 Proof : Suppose there exists h > kp such that xp (h) = 6 0. Let x0p the sequence equal to xp 0 everywhere except at k = h where xp (h) = 0 and call a0p = L[x0p]. We have kx0pkV < kxp kV which implies (a0p) < (ap ). Since L is causal, a0p will di er from ap only for k  h. Therefore a0p 2 C (Ip) and (a0p)  (ap ) which is impossible. We conclude that 8k > kp, xp(k) = 0. It

will be also the case for its successive forward derivatives for k > kp . The reasoning is the same for yp 2

Fact 4.10 x(pn?2)+ and yp(n?2)+ are piecewise linear on intervals fki?1 + 1; :::; ki + 1g (1  i  p, k0 = 0). For 1  i  p, 8k = ki?1 + 1; :::; ki ? 1, x(pn)+ (k) = 0, because of theorem 4.3 and fact 4.5. Since x(pn)+ (k) = (xp(n?2)+ (k + 2) ? x(pn?2)+ (k + 1)) ? (x(pn?2)+ (k + 1) ? x(pn?2)+ (k)), by de nition of the forward derivative, we conclude that

8k = ki?1 + 1; :::; ki ? 1 ; (x(pn?2)+(k + 2) ? x(pn?2)+(k + 1)) = (x(pn?2)+(k + 1) ? x(pn?2)+(k)) Therefore, x(n?2)+ is piecewise linear on intervals fki?1 + 1; :::; ki + 1g. The reasoning is similar for yp 2 On every interval fki?1 + 1; :::; ki + 1g, x(pn?2)+ (k) can be obtained by linear interpolation of the top components of Wp;i?1 and Wp;i . Then, xp(k) can be obtained by an (n ? 2)th order integration of x(pn?2)+ on this interval, with initial conditions at k = ki?1 + 1 contained in Wp;i?1 . The length of integration will be this time limited to li = ki ? ki?1 . In the particular case where i = p, note that Wp;p is the zero vector according to fact 4.9. This fact also says that xp (k) does not need to be calculated for k  kp + 1 since it is necessarily equal to zero. From line (4) of algorithm 3.2, we need to determine ap (k) at least for k  kp + n. From fact 4.9 we have ap(n)? (k) = xp (k) = 0 for k  kp + 1. This implies that ap (k) can be determined by an nth order integration of the zero signal on k  kp + 1, provided we know the initial conditions at k = kp : ap (kp); ap(1)?(kp ); :::; a(pn?1)?(kp). For this purpose, we de ne the following vector 2 (n?1)? 3 66 ap .. (kp) 77 (4.6) Vp = 4 . 5 (1) ? ap (kp) We will call Wp;i; Vp the vectors associated with ap . They all have size (n ? 1). Note that in 0 ; Vp0 the vectors associated the case n = 2, they are just scalars. Similarly, let us de ne Wp;i with bp as

2 (n?1)? 3 3 2 (n?2)+ bp (kp) y (ki + 1) p 77 6 7 66 . . 0 0 75 (0  i  p) and Vp = 64 .. .. Wp;i = 4 5 ? b(1) p (kp)

yp(0)+ (ki + 1)

Algorithm 3.2 becomes 47

(4.7)

Algorithm 3.3 : In algorithm 3.2, replace lines (1),(3),(4),(7) and (8) by (1) - Initialize p = 0, k0 = 0, I0 = ;, a0 (0) = 0, V0 = On?1 (null vector of size n ? 1)

(3) - For increasing k starting from kp?1 + 1, determine - Q(k) using the knowledge of C0 and X (like in algorithm 2) - ap?1 (k) by nth order integration of the zero signal with initial conditions at k = kp?1 given by ap?1 (kp?1) and Vp?1 (4) - Choose kp  kp?1 + n such that ap?1 (kp) is below (resp. above) Q(kp) if 0 (?1)p = 1 (resp. ?1) (h0 is chosen at the same time as k1 during the rst step p = 1) i T ( n ? 1) ? (1) ? Call Vp0?1 = ap?1 (kp)    ap?1 (kp) 0 )0ip?1 ; Vp0 the vectors associated with the minimum bp of  subject (7) - Calculate (Wp;i to C (Ip) (see formula (4.7)) (8) - Calculate 0 , for 0  i  p ? 2 Wp;i = Wp?1;i + 0 (?1)pp Wp;i 0 ?1 Wp;p?1 = 0 (?1)p p Wp;p 0 p Vp = Vp?1 + 0 (?1) p Vp0 ap(kp) = ap?1 (kp) + 0 (?1)pp

0 )0ip?1 to be memorized for the next step (p + 1). Line (8) requires (Wp;i

4.5.4 Computation of

W

0

0

p;i; Vp

0 and Vp0 can be calculated algebraically from the recursive We show in this section that Wp;i computation of a (2n ? 2)  (n ? 1) matrix Rp and a p(n ? 1)  (n ? 1) matrix Mp. We rst introduce some notation:

Notation 4.11 For any sequence b 2 V , we de ne the column vector h iT Ub(k) = y(n?2)+ (k + 1)    y (0)+(k + 1) b(n?1)? (k)    b(1)? (k) where k 2 N and y = b(n)? (by convention for any q  0, b(q)?(0) = 0).

We have the following preliminary fact:

Fact 4.12 Let k  1, l  n be two integers, and b 2 V such that 8h = 1; :::; l, y(n)+(k +h) = 0 where y = b(n)? . Then Ub (k + l) is uniquely determined by Ub (k), b(k) and b(k + l) as follows: Ub (k + l) = M (l)Ub(k) + N (l)(b(k + l) ? b(k)) where M (l) is a (2n ? 2) square matrix and N (l) a (n ? 1) column vector, both functions of the integer variable l. The expressions of M (l) and N (l) as functions of l, are given in appendix A.4.3.

For the sequence (ki )1ip , let us denote li = ki ? ki?1 with the convention k0 = 0.

Notation 4.13 P and Q are the two (n ? 1)  (2n ? 2) matrices P = [In?1;n?1jOn?1;n?1 ] and Q = [On?1;n?1 jIn?1;n?1 ], where In?1;n?1 and On?1;n?1 are respectively the (n ? 1)2 identity and null matrices.

48

Fact 4.14 Let (Ri)0ip and (Mi)0ip be the matrix sequences recursively de ned by 8 ( > < Ri = M" (li)Ri?1 # T R0 = P M (4.8) M0 = [ ] and 8i = 1; :::; p ; > : Mi = PRii??11 0 )0ip?1 and Vp0 can be expressed in terms of Rp Then the matrix PRp is invertible and (Wp;i and Mp as follows: 3 2 0 66 W..p;0 77 = ?M (PR )?1PN (l ) p p p 4 . 5 0 ?1 Wp;p   Vp0 = Q ? (QRp)(PRp)?1 P N (lp)

(4.9) (4.10)

We study in detail the case n = 2 in appendix A.6

4.5.5 Recapitulation

0 ; Vp0 , algorithm 3.3 becomes With the resolution of Wp;i

Algorithm 3.4 : In algorithm 3.3, replace lines (1) and (7) by (1) Initialize p = 0, k0 = 0, I0 = ;, a0 (0) = 0, V0 = On?1 , R0 = P T , M0 = [ ] (7.1) - Call lp = kp ? kp?1 " # M p ? 1 (7.2) - Take Rp = M (lp)Rp?1 and Mp = PR p?1 2 0 3 Wp;0 7 6 (7.3) - Calculate 64 ... 75 = ?Mp (PRp)?1 PN (lp) 0 ?1 Wp;p ?  and V 0 = Q ? (QR )(PR )?1 P N (l ) p

p

p

p

This requires the inversion of a square matrix (PRp ) of size (n ? 1)2 at every iteration p. Note that in the case n = 2, this is just a scalar inversion. We also see that line (7.3) requires the memorization of the growing matrix Mp of size p(n ? 1)  (n ? 1). We saw in the previous 0 )0ip?1 had to be memorized as well (equivalent to a column vector of size section that (Wp;i p(n ? 1). The inversion of PRp in relations (4.9) and (4.10) can lead to numerical diculties as Rp is a matrix growing with p (see relation (4.8)). This problem can be solved with the following fact: Fact 4.15 Formulas (4.9) and (4.10) are still true with the matrix sequences (Ri)0ip and (Mi)0ip recursively de ned by 8 R = M (l )R T ( < i " i i?1 #i T R0 = P and 8i = 1; :::; p ; > (4.11) M > M0 = [ ] : Mi = PRi?1 Ti i?1

where (Ti )0ip is an arbitrary sequence of invertible matrices of size (n ? 1).

49

This leads to an algorithm with more computation freedom then algorithm 3.4

Algoritm 3.5 : In algorithm 3.4, replace line (7.2) by (7.2) - Choose an (n ? 1) invertible matrix T" p # M p ? 1 Tp Take Rp = M (lp)Rp?1Tp and Mp = PR p?1 Algorithm 3.4 will simply be the particular case of 3.5 where Tp is systematically taken equal to the identity matrix. We propose a particular choice of Tp to make the inversion of PRp practical. At step p, let Up SpVpT be the singular value decomposition of M (lp)Rp?1, where Up is an (2n ? 2)  (n ? 1) matrix with orthonormal column vectors, Sp is an (n ? 1)2 invertible diagonal matrix Vp is an (n ? 1)2 orthogonal matrix. If we take Tp = Vp Sp?1 , we will have Rp = (Up SpVpT )(Vp Sp?1 ) = Up . The inversion of PRp = P Up thus becomes well conditioned. For this choice of Tp, algorithm 3.5 has the particular form:

Algorithm 3.6 : In algorithm 3.5, replace line (7.2) by T (7.2) - Perform the singular value" decomposition # Up SpVp of M (lp)Rp?1 Mp?1 V S ?1 Take Rp = Up and Mp = PR p p p?1

4.5.6 Algorithms 3.4, 3.5 with limited bu er

From the recursive relation of line (8) in algorithm 3.4 or 3.5, it can be derived that

8q  i + 1; Wq;i =

q X

p=i+2

(Wp;i ? Wp?1;i ) + Wi+1;i = 0

q X

p=i+1

0 (?1)pp Wp;i

Suppose pm is the nal number of selected indices. Then, the vectors associated with the nal solution xpm are

Wpm;i = 0

pm X

p=i+1

0 for 0  i  pm (?1)pp Wp;i

(4.12)

0 decays exponentially for a xed i We show in appendix A.6.3 that, in the case n = 2, Wp;i and increasing p. We also observed numerically this behavior for higher orders n. It can be decided that a predetermined number l0 of terms in the summation of (4.12) is enough to give a good approximation of Wpm ;i

Wpm;i ' 0

iX +l0 p=i+1

0 = Wi+l ;i (?1)pp Wp;i 0

(4.13)

In other words, every Wp;i produced h by line (8) shouldi be output as soon as p  i + l0 or i  p ? l0 . This implies that only Wp;p?(l0?1)    Wp;p?1 will be needed and kept in memory 50

for the next iteration step (p + h1). This requires ai bu er of length (l0 ? 1)(n ? 1). Another 0    Wp;p 0 ?1 are the only values needed from line (7) consequence is that, at step p, Wp;p ?l0 of algorithm 3.4 or 3.5. This means that, in line (7.3), only the last l0(n ? 1) rows of Mp need to be known. Therefore, this requires the memorization of the last (l0 ? 1)(n ? 1) rows of Mp?1 from the previous step (p ? 1) for the calculation of line (7.2) at step p. Since the rows of Mp?1 have length (n ? 1), this requires a bu er of length (l0 ? 1)(n ? 1)  (n ? 1). In total, we need a bu er of size (l0 ? 1)n(n ? 1). Conversely, if we have a bu er with a size limited to s, then the number of terms in the approximation (4.13) will be the integer part of n(ns?1) + 1.

51

Chapter 5

Numerical experiments We performed numerical tests to both verify our analytical evaluation of the MSE and validate our projection algorithms. For convenience, but also to keep the same conditions of signal coding as in chapter 3, we dealt with periodic input signals. We actually reduced the complexity of input signals to sinusoids. We draw X0 at random in the Fourier domain and then perform an inverse DFT for di erent oversampling rates (M > 2N + 1). After calculating the code sequence C0 of X0 through a xed A/D converter, we measure the MSE between X0 and the estimate X obtained by alternate projections. We repeat this operation on a certain number of drawn signals (300 to 1000) and calculate the statistical average of the MSE. We know that alternating projections on convex sets converges theoretically to their intersection. In practice, however, we have to limit the number of iterations to only obtain an approximation of an element in V0 \ ?(C0).

5.1 Validation of the R? nusoidal inputs

(2n+2)

behavior of the MSE with si-

Working with sinusoidal inputs, we con rm that the expectation of MSE (X0; X ) decreases with R proportionally to R?(2n+2) , where n is the order of the circuit. In gure 5.1 and 5.2, we show the comparison of performance in SNR between the classical reconstruction and the alternate projection method. We represent the SNR in dB, where the reference (0dB) q2 (variance of a random signal with a uniform distribution corresponds to an MSE equal to 12 in [? 21 ; 12 ]). For classical reconstruction SNR, we used the theoretical formula in [2] giving the in-band noise power for an nth order converter, assuming that the quantization error signal is uncorrelated in time: 2n  q2 (5.1) 2n + 1 R2n+1 where q2 is the variance of the quantization error power. For the alternate projection method, since we want to nd an estimate as close as possible to V0 \ ?(C0 ) to validate our theoretical evaluation of MSE, we stop the iteration process as soon as the relative increment of SNR becomes less than 0.01 dB. For the case n = 0, we considered simple ADC with uniform quantization and input signals of peak-to peak amplitude equal to 2q and random dc component, where q is the step 52

sinusoidal inputs : amplitude = 0.5 q 100 90 multi−stage sigma−delta modulation 80 4th order

SNR (dB)

70 60 3rd order 50 40

2nd order alternate projections

30 1st order

classical reconstruction

20 simple ADC 10 3

3.5

4

4.5 5 5.5 6 6.5 log2(R) (oversampling ratio in octaves)

7

7.5

8

Figure 5.1: SNR of signal reconstruction versus oversampling rate, with classical (theoretical eq.(5.1) and alternate projection method, for simple ADC and 1st to 4th order single-bit multi-stage  modulation. size. By taking this amplitude, we make sure that input sinusoids have at least 4 threshold crossings. As a rst estimate in the iteration, we systematically took the direct conversion of the code sequence, where every sample is the center of the quantization interval indicated by the corresponding code word. For the case n  1, we considered nth order single-bit multi-stage  modulators. The output values of the feedback DACs are  q2 . Input sinusoids drawn for the test have an amplitude (peak-to-peak) equal to 2q with a random phase and a random dc component such that the signal remains in the interval [? 2q ; q2 ]. We take as rst estimate the zero signal. Figure 5.1 shows that we systematically improve the signal reconstruction using alternate projections, regardless of the converter. Moreover, the corresponding curves of SNR (dotted lines) yield a slope of (2n + 2)  3 dB per octave of oversampling R, when n is the order of the converter, while classical reconstruction yield a slope of (2n + 1)  3. The 3 dB/octave slope improvement is more obvious in gure 5.2 where we show the SNR gain over classical reconstruction. We also plot results at even higher rates R to emphasize the asymptotic increase of the gain.

5.2 Eciency of projection algorithms for n  2

We recall that when n  2, due to computation constraints, we could not deal with the direct projection on ?(C0). In this numerical tests, we actually alternated projections between V0 and ?I (C0) where I does not cover the full time range. In section 4.5, we showed how the time indices of I were recursively selected according to C0 and the estimate to project, hoping 53

multi−stage sigma−delta modulation : sinusoidal inputs (amplitude=0.5q) 20

SNR gain over classical reconstruction (dB)

18 16

1st order 2nd 3rd 4th

14 12 10 8 6 4 2 3 dB/octave slope 0 3

4

5

6 7 8 9 log2(R) (oversampling ratio in octaves)

10

11

Figure 5.2: SNR gain of signal reconstruction with alternate projection over classical method, versus oversampling rate. that ?I (C0) will be close enough to ?(C0). But we did not propose any analysis of \quality" of such time index range I , nor proved that alternating projections between V0 and ?I (C0) would converge to an element of V0 \ ?(C0 ). Note that I is not necessarily the same at every iteration, since it depends on the estimate which is being projected. The fact is that those \non-perfect" algorithms still con rm the M ?(2n+2) behavior of the MSE we anticipated in chapter 3. Moreover, in gures 5.1 and 5.2, the absolute improvement of SNR looks similar whether n  2 or n = 0; 1 where we perform the exact projection on ?(C0 ). At last, as shown in table 5.1, the number of iterations needed does not dramatically increase from n = 1 to n = 2.

5.3 Specific tests for 1st and 2nd order ΣΔ modulation

We had a closer look at 1st order ΣΔ modulation with constant inputs. We know from [9, 11] that when $V_0$ is reduced to constant signals (1 dimension), the MSE admits a lower bound of order $M^{-3}$. In figures 5.1 and 5.2, we have shown that an $M^{-(2n+2)}$ behavior is recovered as soon as input signals are sinusoids with some substantial amplitude ($\frac{q}{2}$). Now, to better understand the transition between constants and sinusoids, we performed tests where $V_0$ is still the subspace of sinusoids, but input signals $X_0$ have amplitudes varying between 0 and $\frac{q}{2}$. It is important to understand that, in this test configuration, the alternate projections will in general lead to a sinusoidal estimate $X$ even if $X_0$ is constant, since $X \in V_0 \cap \Gamma(X_0)$. However, figure 5.3 shows that the alternate projections don't succeed in improving the $O(M^{-3})$ behavior, even though the computed estimates are no longer constrained to be constant themselves. On the other hand, as soon as $X_0$ has an amplitude larger than $q/2000$, after some "hesitations" at low oversampling rates, the $M^{-4}$ behavior is recovered.

  oversampling rate R     n = 1   n = 2   n = 3
   16 = 4 octaves           17      21      38
   32 = 5 octaves           18      23      43
   64 = 6 octaves           21      26      36
  128 = 7 octaves           21      30      39

Table 5.1: Number of iterations needed in alternate projections, until the SNR increment per iteration is less than 0.01 dB.

Figure 5.4 shows the results we obtained for two kinds of 2nd order ΣΔ modulation: a double loop modulator and a two-stage modulator. When considering sinusoids of amplitude $\frac{q}{2}$, we find a $\frac{1}{M^6}$ behavior of the MSE for both types of modulation. In the case of constant inputs decoded by constant estimates, Hein and Zakhor found numerically $O(M^{-6})$ for the two-stage modulator and $O(M^{-5})$ for the double loop version. Again, since the $O(M^{-5})$ does not agree with our anticipated $\frac{1}{M^6}$ behavior in the case of double loop modulation, we looked at what the alternate projections would propose when decoding constant inputs with sinusoidal estimates. This time, as shown in figure 5.4, we keep an $O(M^{-6})$ behavior.

5.4 Further improvements with relaxation coefficients

The theorem of alternate projections by Youla [3] remains valid when each projection $P$ is replaced by the following operator:

$$P_r = P + (r-1)(P - I)$$

where the relaxation coefficient $r$ belongs to $]0, 2[$, and $I$ is the identity operator. Note that $P_0 = I$ and $P_1 = P$. It is often experimentally observed that taking relaxation coefficients greater than 1 can accelerate the convergence process. In our notation, we call $\alpha$ and $\beta$ the coefficients we apply to the projections on $\Gamma(C_0)$ and $V_0$ respectively. We observed that, with simple ADC, we obtained not only the fastest convergence when $\alpha = \beta = 2$, but also the best results in terms of SNR gain. We show in figure 5.5 the improvement we obtained with regard to the regular case of projection $\alpha = \beta = 1$. With simple ADC, we even found that the iteration ends up with an estimate belonging to the interior of $V_0 \cap \Gamma(C_0)$. We also show in figure 5.5 the improvement we obtained for 1st order ΣΔ modulation with $\alpha = 2$ and $\beta = 1$.
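In code, the relaxed operator is a one-line wrapper around an existing projection; the usage comment with $\alpha = \beta = 2$ mirrors the simple-ADC experiment (a sketch, assuming the projection callables of the previous section):

```python
def relax(proj, r):
    """Relaxed projection P_r = P + (r - 1)(P - I); r = 1 recovers P."""
    def P_r(x):
        p = proj(x)
        return p + (r - 1.0) * (p - x)
    return P_r

# e.g. alpha = beta = 2 for simple ADC (projections as in the sketch above):
# proj_gamma2, proj_V02 = relax(proj_gamma, 2.0), relax(proj_V0, 2.0)
```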

5.5 Circuit imperfections

As shown in sections 2.2 and 2.5, we can include circuit imperfections in the description of $\Gamma(C_0)$, especially deviations in quantization and in the feedback DAC outputs.


Figure 5.3: SNR gain over classical reconstruction using alternate projections, with 1st order ΣΔ modulation and sinusoidal inputs of different amplitudes. Even with sinusoids of amplitude $q/2000$, the gain of 3 dB/octave is recovered for $R$ high enough.

We have performed these tests with multi-stage single-bit ΣΔ modulators, which are known to be particularly sensitive to feedback nonlinearities. We measure the level of deviation in terms of percentage of the step size $q$. In figure 5.6, we see that 1% of imperfections does not affect the performance with regard to the ideal case. This means that, if the maximum excursion of input signals is 1 V, a 10 mV deviation will not decrease the quality of reconstruction when using alternate projections. We of course assume that these deviations can be measured on the circuit which is being considered. When imperfections reach 10% (100 mV!), we still obtain results similar to those of an ideal converter decoded with classical reconstruction.
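Since each signal cell remains convex when its bounds are shifted by measured deviations, the per-sample projection keeps the same form; a minimal sketch (hypothetical helper, assuming per-sample bounds have already been decoded from $C_0$):

```python
import numpy as np

def project_cell(x, lower, upper):
    """Project each sample onto its (convex) quantization cell; 'lower'
    and 'upper' are per-sample bounds decoded from the code sequence."""
    return np.clip(x, lower, upper)

# With 1% imperfection on a step of size q, each measured bound moves by
# at most 0.01 * q; the decoder absorbs this by shifting lower/upper.
```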



Figure 5.4: SNR gain over classical reconstruction for 2nd order ΣΔ modulation with constant and sinusoidal inputs.


Figure 5.5: Improvement of signal reconstruction using relaxation coefficients in simple ADC and 1st order ΣΔ modulation.


Figure 5.6: Signal reconstruction with imperfections in quantization and DAC feedback on 2nd and 3rd order single-bit multi-stage ΣΔ modulation. With 1% of imperfections, the alternate projections are not affected. With 10%, performances are still as good as classical reconstruction with an ideal converter.


Appendix A


A.1 Proof of lemmas 3.2 and 3.5

For simplicity, we suppose that the period $T_0$ of input signals is equal to 1 (the time dimension can always be rescaled). To prove lemmas 3.2 and 3.5, the idea is to see that $\frac{V(k)}{M^n}$ can be uniformly approximated by the $n$th order integral $U^{(-n)}[t]$ of $U[t]$ taken at time $t = \frac{k}{M}$. This approximation will be expressed in lemma A.2 ($n = 1$) and lemma A.5 ($n \ge 1$). Then the maximum value and the average of $|V(k)|$ will be approximated by the infinite and the $L^1$ norms, respectively, of this integral. The upper and lower bounds will then be derived from norm equivalence (lemma A.6).

A.1.1 Preliminary notations and lemmas

Notation A.1 (norms $\|\cdot\|_\infty$ and $\|\cdot\|_1$). For a continuous function $f$ on $[0,1]$, $\|f\|_\infty = \max_{t\in[0,1]} |f[t]|$ and $\|f\|_1 = \int_0^1 |f[t]|\,dt$.

Lemma A.2. Let $f$ be a function $C^1$ on $[0,1]$. Then $\forall M \ge 1$, $\forall k = 1,\ldots,M$:
$$\left| \frac{1}{M}\sum_{j=1}^{k} f\!\left[\frac{j}{M}\right] - \int_0^{k/M} f[t]\,dt \right| \le \frac{1}{M}\,\|f'\|_\infty \qquad (A.1)$$
$$\left| \frac{1}{M}\sum_{j=1}^{k} \left|f\!\left[\frac{j}{M}\right]\right| - \int_0^{k/M} |f[t]|\,dt \right| \le \frac{1}{M}\,\|f'\|_\infty \qquad (A.2)$$
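A quick numerical check of (A.1), not in the original, with $f[t] = \sin(2\pi t)$ and $\|f'\|_\infty = 2\pi$ (the integral is approximated by a fine trapezoid rule):

```python
import math

def riemann_gap(f, M, k, N=20000):
    """|(1/M) sum_{j<=k} f(j/M) - integral_0^{k/M} f|, the quantity
    bounded in (A.1); the integral uses a fine trapezoid rule."""
    s = sum(f(j / M) for j in range(1, k + 1)) / M
    h = (k / M) / N
    integral = sum((f(i * h) + f((i + 1) * h)) / 2 * h for i in range(N))
    return abs(s - integral)

f = lambda t: math.sin(2 * math.pi * t)
bound = 2 * math.pi            # ||f'||_inf on [0, 1]
for M in (16, 64, 256):
    print(M, riemann_gap(f, M, M // 2), "<=", bound / M)
```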

This is proved in A.1.4. We are going to generalize this lemma to higher order integration. We first introduce some notation for the $n$th order continuous-time and discrete integration.

Notation A.3 ($n$th order derivative and integral). For a function $f$, $C^\infty$ on $[0,1]$, and an integer $n \ge 0$, $f^{(n)}$ designates the $n$th order derivative, and $f^{(-n)}$ is the $n$th order integral which is equal to 0 at $t = 0$. More precisely, $f^{(0)} = f$ and, for $n \ge 1$, $f^{(-n)}$ is the integral of $f^{(-n+1)}$ which is equal to 0 at $t = 0$.


Notation A.4 ($n$th order discrete sum). For a function $f$ defined on $[0,1]$ and an integer $M \ge 1$, $I^n_{f,M}$ is the $M$-point sequence defined recursively as follows:
$$\forall k = 1,\ldots,M:\quad I^0_{f,M}(k) = f\!\left[\frac{k}{M}\right] \qquad\text{and, for } n \ge 1,\qquad I^n_{f,M}(k) = \sum_{j=1}^{k} I^{n-1}_{f,M}(j)$$

Lemma A.5. Let $f$ be a function $C^1$ on $[0,1]$ and $n \ge 0$ an integer. Then $\forall M \ge 1$, $\forall k = 1,\ldots,M$:
$$\left| \frac{1}{M^{n+1}}\sum_{j=1}^{k} I^n_{f,M}(j) - \int_0^{k/M} f^{(-n)}[t]\,dt \right| \le \frac{1}{M}\sum_{m=-n+1}^{1} \|f^{(m)}\|_\infty \qquad (A.3)$$
$$\left| \frac{1}{M^{n+1}}\sum_{j=1}^{k} \left|I^n_{f,M}(j)\right| - \int_0^{k/M} \left|f^{(-n)}[t]\right| dt \right| \le \frac{1}{M}\sum_{m=-n+1}^{1} \|f^{(m)}\|_\infty \qquad (A.4)$$
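The approximation (A.3) is easy to check numerically; the sketch below (an addition, not from the source) uses $f[t] = \cos(2\pi t)$, whose second integral vanishing at 0 is $(1-\cos 2\pi t)/(2\pi)^2$, and observes the $O(1/M)$ gap in the form of the remark (A.5) below:

```python
import math

def In(f, M, n):
    """n-th order discrete sums I^n_{f,M} of notation A.4."""
    seq = [f(k / M) for k in range(1, M + 1)]
    for _ in range(n):
        acc, out = 0.0, []
        for v in seq:
            acc += v
            out.append(acc)
        seq = out
    return seq

f = lambda t: math.cos(2 * math.pi * t)
f_int2 = lambda t: (1 - math.cos(2 * math.pi * t)) / (2 * math.pi) ** 2
for M in (32, 128, 512):
    k = M // 3
    gap = abs(In(f, M, 2)[k - 1] / M**2 - f_int2(k / M))
    print(M, gap)   # shrinks like O(1/M)
```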

Lemma A.5 is proved in A.1.5.

Remark: replacing $n$ by $n-1$, (A.3) can be rewritten
$$\forall M \ge 1,\ \forall k = 1,\ldots,M:\quad \left| \frac{1}{M^n} I^n_{f,M}(k) - f^{(-n)}\!\left[\frac{k}{M}\right] \right| \le \frac{1}{M}\sum_{m=-n+2}^{1} \|f^{(m)}\|_\infty \qquad (A.5)$$

We recall that $S$ is the unit sphere of $V_0$: $S = \{U \in V_0 : \|\tilde U\| = 1\}$.

Lemma A.6. Let $N$ be a norm on $V_0$ and $n \in \mathbb{Z}$. Then
$$\exists B_+ > 0,\ \forall U \in S:\quad N(U^{(n)}) \le B_+ \qquad (A.6)$$
If moreover $n \le 0$, then
$$\exists B_- > 0,\ \forall U \in S:\quad N(U^{(n)}) \ge B_- \qquad (A.7)$$

Proof: properties of finite dimensional normed spaces □

A.1.2 Proof of lemma 3.2

By definition of $L$, we have exactly $V(k) = I^n_{U,M}(k)$ for $k = 1,\ldots,M$. If we apply (A.5) to $f = U$, we find
$$\forall M \ge 1,\ \forall k = 1,\ldots,M:\quad \left| \frac{V(k)}{M^n} - U^{(-n)}\!\left[\frac{k}{M}\right] \right| \le \frac{1}{M}\sum_{m=-n+2}^{1} \|U^{(m)}\|_\infty \qquad (A.8)$$

Let $t_0 \in [0,1]$ be the instant when $|U^{(-n)}[t]|$ achieves its maximum. We have $|U^{(-n)}[t_0]| = \|U^{(-n)}\|_\infty$. For $k \in \{1,\ldots,M\}$, using the Lagrange inequality between $\frac{k}{M}$ and $t_0$, we find
$$\left| U^{(-n)}\!\left[\frac{k}{M}\right] - U^{(-n)}[t_0] \right| \le \left|\frac{k}{M} - t_0\right|\,\|U^{(-n+1)}\|_\infty$$

Together with (A.8), we find that $\forall M \ge 1$, $\forall k = 1,\ldots,M$:
$$\left| \frac{|V(k)|}{M^n} - \|U^{(-n)}\|_\infty \right| \le \left|\frac{k}{M} - t_0\right|\,\|U^{(-n+1)}\|_\infty + \frac{1}{M}\sum_{m=-n+2}^{1} \|U^{(m)}\|_\infty$$

Since $|\frac{k}{M} - t_0| \le 1$ and $\frac{1}{M} \le 1$, this immediately gives an upper bound to $\frac{|V(k)|}{M^n}$:
$$\frac{|V(k)|}{M^n} \le \|U^{(-n)}\|_\infty + \|U^{(-n+1)}\|_\infty + \sum_{m=-n+2}^{1} \|U^{(m)}\|_\infty = \sum_{m=-n}^{1} \|U^{(m)}\|_\infty \qquad (A.9)$$

This last term can be bounded by a constant $c_2 > 0$, according to relation (A.6) of lemma A.6. This proves that
$$\max_{1\le k\le M} |V(k)| \le c_2\,M^n$$

The above two-sided bound also implies that
$$\frac{|V(k)|}{M^n} \ge \|U^{(-n)}\|_\infty - \left|\frac{k}{M} - t_0\right|\,\|U^{(-n+1)}\|_\infty - \frac{1}{M}\sum_{m=-n+2}^{1} \|U^{(m)}\|_\infty$$

For every $M \ge 1$, it is always possible to find $k_M \in \{1,\ldots,M\}$ such that $|\frac{k_M}{M} - t_0| \le \frac{1}{M}$. Then
$$\max_{1\le k\le M} \frac{|V(k)|}{M^n} \ge \frac{|V(k_M)|}{M^n} \ge \|U^{(-n)}\|_\infty - \frac{1}{M}\sum_{m=-n+1}^{1} \|U^{(m)}\|_\infty$$

From relation (A.7) of lemma A.6, we can find $B_1 > 0$ such that $\|U^{(-n)}\|_\infty \ge B_1$. From relation (A.6), we can find $B_2 > 0$ such that
$$\sum_{m=-n+1}^{1} \|U^{(m)}\|_\infty \le B_2 \qquad (A.10)$$

As soon as $M \ge \frac{2B_2}{B_1}$, then
$$\max_{1\le k\le M} \frac{|V(k)|}{M^n} \ge B_1 - \frac{B_2}{M} \ge \frac{B_1}{2}$$

Taking $c_1 = \frac{B_1}{2}$, we have shown that, when $M$ is large enough,
$$\max_{1\le k\le M} |V(k)| \ge c_1\,M^n \qquad \square$$

A.1.3 Proof of lemma 3.5

Applying (A.4) with $f = U$ and $k = M$, we find
$$\left| \frac{1}{M}\sum_{j=1}^{M} \left|\frac{V(j)}{M^n}\right| - \|U^{(-n)}\|_1 \right| \le \frac{1}{M}\sum_{m=-n+1}^{1} \|U^{(m)}\|_\infty \qquad (A.11)$$

According to (A.10), we can bound the right hand side by $\frac{B_2}{M}$. Consequently,
$$\|U^{(-n)}\|_1 - \frac{B_2}{M} \le \frac{1}{M}\sum_{j=1}^{M} \left|\frac{V(j)}{M^n}\right| \le \|U^{(-n)}\|_1 + \frac{B_2}{M} \qquad (A.12)$$

Applying properties (A.6) and (A.7) of lemma A.6, it is possible to find constants $0 < B_- < B_+$ such that
$$B_- \le \|U^{(-n)}\|_1 \le B_+$$

The right hand side of (A.12) can be bounded by $c_4 = B_+ + B_2$. Then, for any $M \ge \frac{2B_2}{B_-}$, the left hand side of (A.12) can be lower bounded by
$$c_3 = B_- - \left(\frac{2B_2}{B_-}\right)^{-1} B_2 = \frac{B_-}{2} > 0 \qquad \square$$

A.1.4 Proof of lemma A.2

Let us consider integers $M \ge 1$ and $k \in \{1,\ldots,M\}$. Since $f$ is continuous,
$$\forall j = 1,\ldots,k,\ \exists t_j \in \left[\frac{j-1}{M}, \frac{j}{M}\right]:\quad \int_{(j-1)/M}^{j/M} f[t]\,dt = \frac{1}{M}\,f[t_j]$$
Then
$$\frac{1}{M}\sum_{j=1}^{k} f\!\left[\frac{j}{M}\right] - \int_0^{k/M} f[t]\,dt = \frac{1}{M}\sum_{j=1}^{k}\left( f\!\left[\frac{j}{M}\right] - f[t_j] \right) \qquad (A.13)$$

Since $|f|$ is also continuous, we can derive in a similar way that $\forall j = 1,\ldots,k$, $\exists t'_j \in [\frac{j-1}{M}, \frac{j}{M}]$ such that
$$\frac{1}{M}\sum_{j=1}^{k} \left|f\!\left[\frac{j}{M}\right]\right| - \int_0^{k/M} |f[t]|\,dt = \frac{1}{M}\sum_{j=1}^{k}\left( \left|f\!\left[\frac{j}{M}\right]\right| - |f[t'_j]| \right) \qquad (A.14)$$

For a fixed $j \in \{1,\ldots,k\}$ and $\forall t \in [\frac{j-1}{M}, \frac{j}{M}]$:
$$\left| \left|f\!\left[\frac{j}{M}\right]\right| - |f[t]| \right| \le \left| f\!\left[\frac{j}{M}\right] - f[t] \right| \le \left|\frac{j}{M} - t\right| \max_{s\in[t,\, j/M]} |f'[s]| \le \frac{1}{M}\,\|f'\|_\infty$$

Therefore, since $k \le M$, both expressions (A.13) and (A.14) can be bounded in absolute value by $\frac{1}{M}\|f'\|_\infty$ □

A.1.5 Proof of lemma A.5

For a fixed integer $M \ge 1$, let us introduce the proposition
$$P(n):\quad \forall k = 1,\ldots,M:\ \left| \frac{1}{M^{n+1}}\sum_{j=1}^{k} I^n_{f,M}(j) - \int_0^{k/M} f^{(-n)}[t]\,dt \right| \le \frac{1}{M}\sum_{m=-n+1}^{1} \|f^{(m)}\|_\infty$$

Let us show $P(n)$ by induction. $P(0)$ is just the expression of lemma A.2. Suppose now that $P(n-1)$ is true for a certain $n \ge 1$. Using the fact that
$$I^n_{f,M}(j) = \sum_{i=1}^{j} I^{n-1}_{f,M}(i) \qquad\text{and}\qquad f^{(-n)}\!\left[\frac{j}{M}\right] = \int_0^{j/M} f^{(-n+1)}[t]\,dt \qquad (A.15)$$
we can write
$$\frac{1}{M^{n+1}}\sum_{j=1}^{k} I^n_{f,M}(j) - \int_0^{k/M} f^{(-n)}[t]\,dt = \frac{1}{M}\sum_{j=1}^{k}\left( \frac{1}{M^n}\sum_{i=1}^{j} I^{n-1}_{f,M}(i) - \int_0^{j/M} f^{(-n+1)}[t]\,dt \right) + \left( \frac{1}{M}\sum_{j=1}^{k} f^{(-n)}\!\left[\frac{j}{M}\right] - \int_0^{k/M} f^{(-n)}[t]\,dt \right) \qquad (A.16)$$

Applying the fact that $P(n-1)$ is true, we find
$$\left| \frac{1}{M}\sum_{j=1}^{k}\left( \frac{1}{M^n}\sum_{i=1}^{j} I^{n-1}_{f,M}(i) - \int_0^{j/M} f^{(-n+1)}[t]\,dt \right) \right| \le \frac{k}{M}\cdot\frac{1}{M}\sum_{m=-n+2}^{1} \|f^{(m)}\|_\infty \le \frac{1}{M}\sum_{m=-n+2}^{1} \|f^{(m)}\|_\infty \qquad (A.17)$$

Then, applying (A.1) to the function $f^{(-n)}$, which is $C^1$, we find
$$\left| \frac{1}{M}\sum_{j=1}^{k} f^{(-n)}\!\left[\frac{j}{M}\right] - \int_0^{k/M} f^{(-n)}[t]\,dt \right| \le \frac{1}{M}\,\|f^{(-n+1)}\|_\infty \qquad (A.18)$$

$P(n)$ is then a consequence of (A.16), (A.17) and (A.18). The induction is completed: $P(n)$ is true for every $n \ge 0$, which proves (A.3).

To show (A.4), we write an equation similar to (A.16), using (A.15):
$$\frac{1}{M^{n+1}}\sum_{j=1}^{k} \left|I^n_{f,M}(j)\right| - \int_0^{k/M} \left|f^{(-n)}[t]\right| dt = \frac{1}{M}\sum_{j=1}^{k}\left( \frac{1}{M^n}\left|\sum_{i=1}^{j} I^{n-1}_{f,M}(i)\right| - \left|\int_0^{j/M} f^{(-n+1)}[t]\,dt\right| \right) + \left( \frac{1}{M}\sum_{j=1}^{k} \left|f^{(-n)}\!\left[\frac{j}{M}\right]\right| - \int_0^{k/M} \left|f^{(-n)}[t]\right| dt \right)$$

Using the inequality $||a| - |b|| \le |a - b|$ and applying $P(n-1)$, the first term can be bounded in absolute value as in (A.17). The second term can be bounded by $\frac{1}{M}\|f^{(-n+1)}\|_\infty$ by applying (A.2) to $f^{(-n)}$ □

A.2 Proof of fact 4.5

Let $D_-^{(n)}$ and $D_+^{(n)}$ be the linear, time-invariant operators whose $z$-transforms are $(1-z^{-1})^n$ and $(z-1)^n$. For two given sequences $a$ and $x$, let $\bar a$ and $\bar x$ be their respective extensions to negative time indices:
$$\forall k \ge 1:\ \bar a(k) = a(k) \text{ and } \bar x(k) = x(k); \qquad \forall k \le 0:\ \bar a(k) = \bar x(k) = 0$$
It can be verified that $a^{(n)-}$ and $x^{(n)+}$ are the restrictions to $\mathbb{N}$ of $D_-^{(n)}[\bar a]$ and $D_+^{(n)}[\bar x]$. Let $d_-^{(n)}$ and $d_+^{(n)}$ be the impulse responses of $D_-^{(n)}$ and $D_+^{(n)}$. We have
$$\forall j \ge 1:\quad a^{(n)-}(j) = (d_-^{(n)} * \bar a)(j) = \sum_{k=1}^{+\infty} d_-^{(n)}(j-k)\,a(k) \qquad (A.19)$$
$$\forall k \ge 1:\quad x^{(n)+}(k) = (d_+^{(n)} * \bar x)(k) = \sum_{j=1}^{+\infty} d_+^{(n)}(k-j)\,x(j) \qquad (A.20)$$

Suppose that $x = L^{-1}[a] = a^{(n)-}$. Then $\Phi(a) = \frac{1}{2}\sum_{j\ge1} |x(j)|^2$ and
$$\frac{\partial\Phi}{\partial a}(k) = \frac{1}{2}\sum_{j=1}^{+\infty} \frac{\partial |x(j)|^2}{\partial a(k)} = \sum_{j=1}^{+\infty} \frac{\partial x(j)}{\partial a(k)}\,x(j) \qquad (A.21)$$

Using (A.19), we have
$$\frac{\partial x(j)}{\partial a(k)} = \frac{\partial a^{(n)-}(j)}{\partial a(k)} = d_-^{(n)}(j-k) \qquad (A.22)$$
Since the $z$-transforms of $D_-^{(n)}$ and $D_+^{(n)}$ verify $D_-^{(n)}(z) = (-1)^n D_+^{(n)}(z^{-1})$, then
$$\forall k \in \mathbb{Z}:\quad d_-^{(n)}(k) = (-1)^n\,d_+^{(n)}(-k) \qquad (A.23)$$
Using successively (A.21), (A.22), (A.23) and (A.20), we find
$$\frac{\partial\Phi}{\partial a}(k) = \sum_{j=1}^{+\infty} d_-^{(n)}(j-k)\,x(j) = (-1)^n \sum_{j=1}^{+\infty} d_+^{(n)}(k-j)\,x(j) = (-1)^n\,x^{(n)+}(k) \qquad \square$$
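Fact 4.5 can be verified numerically on finite windows: the gradient of $\Phi(a) = \frac12\sum_j x(j)^2$ with $x = a^{(n)-}$ (the $n$-th backward difference, operator $(1-z^{-1})^n$) matches $(-1)^n x^{(n)+}$ once the sequences are extended by zeros, as assumed above. A hedged sketch:

```python
import numpy as np

def bwd_diff(v, order):   # a^(n)- : (1 - z^-1)^n, zero initial conditions
    for _ in range(order):
        v = v - np.concatenate(([0.0], v[:-1]))
    return v

def fwd_diff(v, order):   # x^(n)+ : (z - 1)^n, zero-padded on the right
    for _ in range(order):
        v = np.concatenate((v[1:], [0.0])) - v
    return v

n, K = 2, 12
rng = np.random.default_rng(1)
a = rng.standard_normal(K)
x = bwd_diff(a, n)
grad_formula = (-1) ** n * fwd_diff(x, n)

eps, grad_num = 1e-6, np.empty(K)   # central finite differences of Phi
for k in range(K):
    ap, am = a.copy(), a.copy()
    ap[k] += eps
    am[k] -= eps
    grad_num[k] = (np.sum(bwd_diff(ap, n) ** 2)
                   - np.sum(bwd_diff(am, n) ** 2)) / (4 * eps)
print(np.allclose(grad_formula, grad_num, atol=1e-5))   # True
```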

A.3 Proof of fact 4.7

Using the Euler characterization and unicity of the minimum, we are going to show that $s_0 = a_{p-1} + \lambda b_p$ is necessarily the minimum of $\Phi$ subject to $C(I_p)$. First, we have to check that $s_0 \in C(I_p)$. We use the fact that
$$\forall i = 1,\ldots,p-1:\ b_p(k_i) = 0 \qquad\text{and}\qquad b_p(k_p) = 1$$

This implies that $s_0(k_i) = a_{p-1}(k_i) \in Q(k_i)$, since $a_{p-1} \in C(I_{p-1})$. We also find that $s_0(k_p) = a_{p-1}(k_p) + \lambda$. Using assumption (2) on $a_{p-1}$ and the definition of $\lambda$ in (4.4), we conclude that
$$s_0(k_p) \text{ is the closest bound of } Q(k_p) \text{ to } a_{p-1}(k_p) \qquad (A.24)$$
Therefore $s_0(k_p) \in Q(k_p)$.

Let us now show that the Euler inequality characterization is satisfied by $s_0$ with regard to $C(I_p)$. From fact 4.5, we can see that $a \mapsto \frac{\partial\Phi}{\partial a}$ is linear. Therefore
$$\frac{\partial\Phi}{\partial s_0} = \frac{\partial\Phi}{\partial a_{p-1}} + \lambda\,\frac{\partial\Phi}{\partial b_p} \qquad (A.25)$$
Using the fact that $C(I_{p-1})$ and $E(I_p)$ are separable constraints and using theorem 4.3, we find that
$$\forall k \notin I_{p-1}:\quad \frac{\partial\Phi}{\partial a_{p-1}}(k) = 0 \qquad (A.26)$$
$$\forall k \notin I_p:\quad \frac{\partial\Phi}{\partial b_p}(k) = 0 \qquad (A.27)$$
Since $I_p \supset I_{p-1}$, this implies that
$$\forall k \notin I_p:\quad \frac{\partial\Phi}{\partial s_0}(k) = 0 \qquad (A.28)$$

Let us now study the sign of $\frac{\partial\Phi}{\partial s_0}(k)\,(a(k) - s_0(k))$ when $k \in I_p$ and $a \in C(I_p)$. We first consider the case $k = k_p$. Applying (A.26) and (4.4) at $k = k_p$, we find $\frac{\partial\Phi}{\partial a_{p-1}}(k_p) = 0$ and $\lambda\,\frac{\partial\Phi}{\partial b_p}(k_p) \ge 0$. From (A.25), this implies that $\frac{\partial\Phi}{\partial s_0}(k_p) = \lambda\,\frac{\partial\Phi}{\partial b_p}(k_p)$ and therefore
$$\mathrm{sign}\!\left(\frac{\partial\Phi}{\partial s_0}(k_p)\right) = \sigma, \quad\text{where } \sigma = \mathrm{sign}(\lambda) \qquad (A.29)$$
For any $r \in Q(k_p)$, $(r - s_0(k_p))$ and $(r - a_{p-1}(k_p))$ have the same sign, equal to $\sigma$, according to assumption (2) and (A.24). We can apply this to $r = a(k_p) \in Q(k_p)$ and find that
$$\frac{\partial\Phi}{\partial s_0}(k_p)\,(a(k_p) - s_0(k_p)) \ge 0 \qquad (A.30)$$

Next, we consider $k = k_i$, where $1 \le i \le p-1$. Because $a \in C(I_p)$ also belongs to $C(I_{p-1})$, we can apply theorem 4.3 with $S = C(I_{p-1})$ and find that
$$\frac{\partial\Phi}{\partial a_{p-1}}(k_i)\,(a(k_i) - a_{p-1}(k_i)) \ge 0 \qquad (A.31)$$
Using assumption (2) and (4.4), the signs of $\frac{\partial\Phi}{\partial a_{p-1}}(k_i)$ and $\lambda\,\frac{\partial\Phi}{\partial b_p}(k_i)$ are both equal to $(-1)^{p-i}\sigma$. From (A.25), we derive that
$$\mathrm{sign}\!\left(\frac{\partial\Phi}{\partial s_0}(k_i)\right) = (-1)^{p-i}\sigma \qquad (A.32)$$
Since $a_{p-1}(k_i) = s_0(k_i)$, (A.31) implies
$$\frac{\partial\Phi}{\partial s_0}(k_i)\,(a(k_i) - s_0(k_i)) \ge 0 \qquad (A.33)$$

Gathering (A.28), (A.30) and (A.33), we have
$$\sum_k \frac{\partial\Phi}{\partial s_0}(k)\,(a(k) - s_0(k)) \ge 0$$
Property (3) is a consequence of (A.29) and (A.32) □

A.4 Proof of fact 4.12

Let $k \ge 1$ and $l \ge n$ be two integers, and $b \in V$ such that $\forall h = 1,\ldots,l$: $y^{(n)+}(k+h) = 0$, where $y = b^{(n)-}$.

A.4.1 Preliminaries

Definition A.7. We define $S_p(k)$ as follows:
$$S_0(k) = \begin{cases} 1 & k \ge 1 \\ 0 & k \le 0 \end{cases} \qquad\text{and, } \forall p \ge 0:\quad S_{p+1}(k) = \begin{cases} \displaystyle\sum_{l=1}^{k} S_p(l) & k \ge 1 \\ 0 & k \le 0 \end{cases} \qquad (A.34)$$

We have an analytical expression of $S_p(k)$, as follows.

Fact A.8.
$$\forall p \ge 1,\ k \ge 1:\quad S_p(k) = \frac{k(k+1)\cdots(k+p-1)}{p!} \qquad (A.35)$$
Proof: this is obtained by induction on $p$, using definition A.7 □

Fact A.9. For $p \ge 0$ and two integers $h_1, h_2$ such that $0 \le h_1 \le h_2$,
$$\sum_{r=1}^{h_2} S_p(r - h_1) = S_{p+1}(h_2 - h_1)$$
Proof: with the change of variable $r' = r - h_1$, using the fact that $1 - h_1 \le 1$ and definition A.7, we find
$$\sum_{r=1}^{h_2} S_p(r-h_1) = \sum_{r'=1-h_1}^{h_2-h_1} S_p(r') = \sum_{r'=1}^{h_2-h_1} S_p(r') = S_{p+1}(h_2-h_1) \qquad \square$$

Notation A.10. $S_p^{-1}(k) = (S_p(k))^{-1}$.

Fact A.11. $\forall q \in \{1,\ldots,n\}$, $\forall h = 1,\ldots,l+q-1$:
$$y^{(n-q)+}(k+h) = \sum_{j=0}^{q-1} S_j(h-j)\,y^{(n-q+j)+}(k+1) \qquad (A.36)$$
The proof is shown in section A.4.4.

Fact A.12. $\forall q \in \{1,\ldots,n\}$, $\forall h = 1,\ldots,l$:
$$b^{(n-q)-}(k+h) = \sum_{j=0}^{q-1} S_j(h)\,b^{(n-q+j)-}(k) + \sum_{j=0}^{n-1} S_{q+j}(h-j)\,y^{(j)+}(k+1) \qquad (A.37)$$
The proof is shown in section A.4.5.

A.4.2 Proof of fact 4.12

Considering $q = 2,\ldots,n$, relation (A.36) of fact A.11 can be written in matrix form: $\forall h = 1,\ldots,l$,
$$\begin{bmatrix} y^{(n-2)+}(k+h) \\ y^{(n-3)+}(k+h) \\ \vdots \\ y^{(0)+}(k+h) \end{bmatrix} = M_1(h)\begin{bmatrix} y^{(n-2)+}(k+1) \\ y^{(n-3)+}(k+1) \\ \vdots \\ y^{(0)+}(k+1) \end{bmatrix} + N_1(h)\,y^{(n-1)+}(k+1) \qquad (A.38)$$
where $M_1(h)$ is the $(n-1)$ square matrix given in (A.42) and $N_1(h)$ is the $(n-1)$ column vector given in (A.46). Similarly, considering $q = 1,\ldots,n-1$, relation (A.37) of fact A.12 can be written in matrix form: $\forall h = 1,\ldots,l$,
$$\begin{bmatrix} b^{(n-1)-}(k+h) \\ \vdots \\ b^{(1)-}(k+h) \end{bmatrix} = M_2(h)\begin{bmatrix} b^{(n-1)-}(k) \\ \vdots \\ b^{(1)-}(k) \end{bmatrix} + M_3(h)\begin{bmatrix} y^{(n-2)+}(k+1) \\ \vdots \\ y^{(0)+}(k+1) \end{bmatrix} + N_2(h)\,y^{(n-1)+}(k+1) \qquad (A.39)$$
where $M_2(h)$ and $M_3(h)$ are the $(n-1)$ square matrices given in (A.43) and (A.44) respectively, and $N_2(h)$ is the $(n-1)$ column vector given in (A.47). Using the notation $U_b(k)$ (notation 4.13), the relations (A.38) and (A.39) can be summarized as:
$$\forall h = 1,\ldots,l:\quad U_b(k+h) = M_0(h)\,U_b(k) + \begin{bmatrix} N_1(h) \\ N_2(h) \end{bmatrix} y^{(n-1)+}(k+1) \qquad (A.40)$$
where $M_0(h)$ is the $(2n-2)$ square matrix given in (A.45). Relation (A.37) applied at $q = n$ can be written:
$$\forall h = 1,\ldots,l:\quad b(k+h) = b(k) + [N_3(h)\ N_4(h)]\,U_b(k) + S_{2n-1}(h-(n-1))\,y^{(n-1)+}(k+1) \qquad (A.41)$$
where $N_3(h)$ and $N_4(h)$ are the $(n-1)$ row vectors given in (A.48) and (A.49) respectively. When taking $h = l$, $S_{2n-1}(l-(n-1)) \ne 0$ since $l \ge n$. We can therefore eliminate $y^{(n-1)+}(k+1)$ between (A.40) and (A.41) at $h = l$. We find
$$U_b(k+l) = M(l)\,U_b(k) + N(l)\,(b(k+l) - b(k))$$
where $M(l)$ is the $(2n-2)$ square matrix given in (A.50) and $N(l)$ is the $(2n-2)$ column vector given in (A.51) □

A.4.3 Analytical expressions of M(l) and N(l) versus l

We recapitulate the construction of matrices $M(l)$ and $N(l)$. From fact A.8, we have, $\forall p \ge 1$: $S_p(k) = 0$ for $k \le 0$ and $S_p(k) = \frac{k(k+1)\cdots(k+p-1)}{p!}$ for $k \ge 1$. In paragraph A.4.2, we have constructed

$$M_1(l) = \begin{bmatrix} S_0(l+1) & 0 & \cdots & 0 \\ S_1(l) & S_0(l+1) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ S_{n-2}(l-(n-3)) & \cdots & S_1(l) & S_0(l+1) \end{bmatrix} \qquad (A.42)$$

$$M_2(l) = \begin{bmatrix} S_0(l) & 0 & \cdots & 0 \\ S_1(l) & S_0(l) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ S_{n-2}(l) & \cdots & S_1(l) & S_0(l) \end{bmatrix} \qquad (A.43)$$

$$M_3(l) = \begin{bmatrix} S_{n-1}(l-(n-2)) & \cdots & S_2(l-1) & S_1(l) \\ S_n(l-(n-2)) & \cdots & S_3(l-1) & S_2(l) \\ \vdots & & \vdots & \vdots \\ S_{2n-3}(l-(n-2)) & \cdots & S_n(l-1) & S_{n-1}(l) \end{bmatrix} \qquad (A.44)$$

$$M_0(l) = \begin{bmatrix} M_1(l) & 0 \\ M_3(l) & M_2(l) \end{bmatrix} \qquad (A.45)$$

$$N_1(l) = \left[ S_1(l)\ \ S_2(l-1)\ \ \cdots\ \ S_{n-1}(l-(n-2)) \right]^T \qquad (A.46)$$

$$N_2(l) = \left[ S_n(l-(n-1))\ \ \cdots\ \ S_{2n-2}(l-(n-1)) \right]^T \qquad (A.47)$$

$$N_3(l) = \left[ S_{2n-2}(l-(n-2))\ \ \cdots\ \ S_{n+1}(l-1)\ \ S_n(l) \right] \qquad (A.48)$$

$$N_4(l) = \left[ S_{n-1}(l)\ \ \cdots\ \ S_1(l) \right] \qquad (A.49)$$

$$M(l) = M_0(l) - S_{2n-1}^{-1}(l-(n-1))\begin{bmatrix} N_1(l) \\ N_2(l) \end{bmatrix}\left[ N_3(l)\ N_4(l) \right] \qquad (A.50)$$

$$N(l) = S_{2n-1}^{-1}(l-(n-1))\begin{bmatrix} N_1(l) \\ N_2(l) \end{bmatrix} \qquad (A.51)$$

Once $n$ is fixed, $M(l)$ and $N(l)$ can be precalculated in terms of $l$. We give their expression for $n = 2$ in appendix A.6.1.
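Relations (A.42)-(A.51) translate directly into code; the sketch below (an illustration, with index conventions read off the displays above) assembles $M(l)$ and $N(l)$ for any $n$ and reproduces the closed form of appendix A.6.1 at $n = 2$:

```python
import numpy as np
from math import comb

def S(p, k):
    """Closed form of fact A.8, extended by 0 for k <= 0."""
    return comb(k + p - 1, p) if k >= 1 else 0

def build_M_N(n, l):
    """Assemble M(l) and N(l) from (A.42)-(A.51) (1-based indices i, j)."""
    m = n - 1
    M1 = np.array([[S(i - j, l - (i - j) + 1) if i >= j else 0
                    for j in range(1, m + 1)] for i in range(1, m + 1)], float)
    M2 = np.array([[S(i - j, l) if i >= j else 0
                    for j in range(1, m + 1)] for i in range(1, m + 1)], float)
    M3 = np.array([[S(n - j + i - 1, l - (n - 1 - j))
                    for j in range(1, m + 1)] for i in range(1, m + 1)], float)
    N1 = np.array([S(i, l - (i - 1)) for i in range(1, m + 1)], float)
    N2 = np.array([S(n - 1 + i, l - (n - 1)) for i in range(1, m + 1)], float)
    N3 = np.array([S(2*n - 1 - j, l - (n - 1 - j)) for j in range(1, m + 1)], float)
    N4 = np.array([S(n - j, l) for j in range(1, m + 1)], float)
    M0 = np.block([[M1, np.zeros((m, m))], [M3, M2]])
    col = np.concatenate((N1, N2))[:, None]
    row = np.concatenate((N3, N4))[None, :]
    s = S(2 * n - 1, l - (n - 1))
    return M0 - (col @ row) / s, col / s

M, N = build_M_N(2, 5)   # n = 2: reproduces the closed form of A.6.1
```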

A.4.4 Proof of fact A.11

We will use the fact that
$$\forall j \ge 0,\ \forall h \ge 1:\quad y^{(j)+}(k+h) = y^{(j)+}(k+1) + \sum_{r=1}^{h-1} y^{(j+1)+}(k+r) \qquad (A.52)$$
which is a consequence of the relation $y^{(j+1)+}(h) = y^{(j)+}(h+1) - y^{(j)+}(h)$. We are going to prove (A.36) by induction on $q \in \{1,\ldots,n\}$.

By assumption, $\forall h = 1,\ldots,l-1$: $y^{(n)+}(k+h) = 0$. Applying (A.52) with $j = n-1$, we find that, $\forall h = 1,\ldots,l$:
$$y^{(n-1)+}(k+h) = y^{(n-1)+}(k+1) = S_0(h)\,y^{(n-1)+}(k+1)$$
since $S_0(h) = 1$. This proves (A.36) for $q = 1$.

Now let us assume that (A.36) is true for a certain $q \in \{1,\ldots,n-1\}$. Let us consider $h \in \{1,\ldots,l+q\}$. Applying (A.52) with $j = n-q-1$, we find
$$y^{(n-q-1)+}(k+h) = y^{(n-q-1)+}(k+1) + \sum_{r=1}^{h-1} y^{(n-q)+}(k+r) \qquad (A.53)$$
Using the assumption that (A.36) is true at $q$, we find that
$$\sum_{r=1}^{h-1} y^{(n-q)+}(k+r) = \sum_{r=1}^{h-1}\left( \sum_{j=0}^{q-1} S_j(r-j)\,y^{(n-q+j)+}(k+1) \right) = \sum_{j=0}^{q-1}\left( \sum_{r=1}^{h-1} S_j(r-j) \right) y^{(n-q+j)+}(k+1)$$
Using fact A.9 and the fact that $S_0(h) = 1$, (A.53) becomes
$$y^{(n-q-1)+}(k+h) = S_0(h)\,y^{(n-q-1)+}(k+1) + \sum_{j=0}^{q-1} S_{j+1}(h-1-j)\,y^{(n-q+j)+}(k+1)$$
With the change of variable $j' = j+1$ and inserting the first term into the sum, we find that, for $h = 1,\ldots,l+(q+1)-1$:
$$y^{(n-(q+1))+}(k+h) = \sum_{j'=0}^{(q+1)-1} S_{j'}(h-j')\,y^{(n-(q+1)+j')+}(k+1)$$
which proves (A.36) for $(q+1)$. The induction is completed □

A.4.5 Proof of fact A.12

We will use the fact that
$$\forall j \ge 0,\ h \ge 1:\quad b^{(j)-}(k+h) = b^{(j)-}(k) + \sum_{r=1}^{h} b^{(j+1)-}(k+r) \qquad (A.54)$$
which is a consequence of the relation $b^{(j+1)-}(h) = b^{(j)-}(h) - b^{(j)-}(h-1)$. Note that (A.54) is still true at $k = 0$, with the convention $b^{(j)-}(0) = 0$. We are going to prove (A.37) by induction on $q \in \{1,\ldots,n\}$.

Applying formula (A.36) of fact A.11 at $q = n$, we have
$$\forall h = 1,\ldots,l:\quad y(k+h) = \sum_{j=0}^{n-1} S_j(h-j)\,y^{(j)+}(k+1)$$
Using (A.54) with $j = n-1$ and the fact that $y = b^{(n)-}$, we find that, for $h \in \{1,\ldots,l\}$:
$$b^{(n-1)-}(k+h) = b^{(n-1)-}(k) + \sum_{r=1}^{h} y(k+r) = b^{(n-1)-}(k) + \sum_{j=0}^{n-1}\left( \sum_{r=1}^{h} S_j(r-j) \right) y^{(j)+}(k+1) \qquad (A.55)$$
Using fact A.9 and the fact that $S_0(h) = 1$, we have
$$b^{(n-1)-}(k+h) = S_0(h)\,b^{(n-1)-}(k) + \sum_{j=0}^{n-1} S_{j+1}(h-j)\,y^{(j)+}(k+1)$$
This proves (A.37) for $q = 1$.

Now, let us suppose that (A.37) is true for a certain $q \in \{1,\ldots,n-1\}$. Let us consider $h \in \{1,\ldots,l\}$. Applying (A.54) with $j = n-q-1$, we find
$$b^{(n-q-1)-}(k+h) = b^{(n-q-1)-}(k) + \sum_{r=1}^{h} b^{(n-q)-}(k+r)$$
Using the assumption that fact A.12 is true at $q$, we can write
$$\sum_{r=1}^{h} b^{(n-q)-}(k+r) = \sum_{j=0}^{q-1}\left( \sum_{r=1}^{h} S_j(r) \right) b^{(n-q+j)-}(k) + \sum_{j=0}^{n-1}\left( \sum_{r=1}^{h} S_{q+j}(r-j) \right) y^{(j)+}(k+1)$$
Using fact A.9, performing the change of variable $j' = j+1$ in the first summation, and using the fact that $S_0(h) = 1$, this becomes, $\forall h \in \{1,\ldots,l\}$:
$$b^{(n-(q+1))-}(k+h) = \sum_{j'=0}^{(q+1)-1} S_{j'}(h)\,b^{(n-(q+1)+j')-}(k) + \sum_{j=0}^{n-1} S_{(q+1)+j}(h-j)\,y^{(j)+}(k+1)$$
This proves (A.37) for $(q+1)$. The induction is completed □

A.5 Proof of fact 4.14

We first have the following lemma.

Lemma A.13. A signal $b$ minimizes $\Phi$ subject to $E(I_p)$ if and only if
$$\forall i = 1,\ldots,p-1:\ b(k_i) = 0 \quad\text{and}\quad b(k_p) = 1 \qquad (A.56)$$
$$\forall q = 0,\ldots,n-1:\ y^{(q)+}(k_p+1) = 0 \qquad (A.57)$$
$$\forall k \notin I_p:\ y^{(n)+}(k) = 0 \qquad (A.58)$$
where $y = b^{(n)-}$.

Proof: ($\Rightarrow$) If $b$ minimizes $\Phi$ subject to $E(I_p)$, then $b = b_p$. Relation (A.56) is trivial and (A.58) is a consequence of theorem 4.3 applied to $S = E(I_p)$. Property (A.57) comes from fact 4.9.
($\Leftarrow$) We show that a sequence satisfying (A.56), (A.57) and (A.58) verifies the Euler inequality applied to $S = E(I_p)$. First, (A.56) directly implies that $b \in E(I_p)$. Let us take any $b' \in E(I_p)$. Using (A.56), (A.58) and the fact that $\frac{\partial\Phi}{\partial b} = (-1)^n y^{(n)+}$ where $y = b^{(n)-}$ (from fact 4.5), we have $\sum_k \frac{\partial\Phi}{\partial b}(k)\,(b'(k) - b(k)) \ge 0$ □

Since $\forall k \notin I_p$, $y_p^{(n)+}(k) = 0$, we have, for every $i = 1,\ldots,p$:
$$\forall l = 1,\ldots,l_i-1:\quad y_p^{(n)+}(k_{i-1}+l) = 0$$
Applying fact 4.12 to $b = b_p$ with $k = k_{i-1}$ and $l = l_i$, we find that
$$U_{b_p}(k_i) = M(l_i)\,U_{b_p}(k_{i-1}) + N(l_i)\,(b_p(k_i) - b_p(k_{i-1}))$$
Since $b_p(k_i) = 0$ for $i = 0,\ldots,p-1$ and $b_p(k_p) = 1$, we have
$$U_{b_p}(k_i) = M(l_i)\,U_{b_p}(k_{i-1}) \quad\text{for } i = 1,\ldots,p-1$$
If we define, for $i = 1,\ldots,p$, $\mathcal{M}_i = M(l_i)M(l_{i-1})\cdots M(l_1)$, then
$$U_{b_p}(k_i) = \mathcal{M}_i\,U_{b_p}(0) \quad\text{for } i = 1,\ldots,p-1 \qquad (A.59)$$
$$U_{b_p}(k_p) = M(l_p)\,U_{b_p}(k_{p-1}) + N(l_p) \quad\text{for } i = p \qquad (A.60)$$
For $i = 0,\ldots,p$, we have
$$W'_{p,i} = P\,U_{b_p}(k_i) \qquad (A.61)$$
and, in the particular case of $i = 0$ ($k_0 = 0$),
$$U_{b_p}(0) = P^T W'_{p,0} \qquad (A.62)$$
Relations (A.59), (A.61) and (A.62) imply that
$$\forall i = 1,\ldots,p-1:\quad W'_{p,i} = (P\mathcal{M}_i P^T)\,W'_{p,0} \qquad (A.63)$$
Using the fact that $W'_{p,p} = O_{n-1}$ (a consequence of (A.57)), relations (A.60), (A.61) and (A.62) imply that
$$0 = (P\mathcal{M}_p P^T)\,W'_{p,0} + P\,N(l_p)$$
If $(P\mathcal{M}_p P^T)$ were not invertible, it would be possible to find a vector $W''_{p,0} \ne W'_{p,0}$ such that
$$0 = (P\mathcal{M}_p P^T)\,W''_{p,0} + P\,N(l_p)$$
One can check that it would then be possible to construct a sequence $b'_p \ne b_p$ which satisfies (A.56), (A.57) and (A.58): this is absurd by unicity of the minimum. Therefore $(P\mathcal{M}_p P^T)$ is necessarily invertible. Then
$$W'_{p,0} = -(P\mathcal{M}_p P^T)^{-1} P\,N(l_p)$$
Relation (A.63) becomes
$$\forall i = 1,\ldots,p-1:\quad W'_{p,i} = -(P\mathcal{M}_i P^T)(P\mathcal{M}_p P^T)^{-1} P\,N(l_p) \qquad (A.64)$$
From the recursive definitions of $R_i$ and $M_i$ in (4.8), it is easy to verify that
$$R_p = \mathcal{M}_p P^T \qquad\text{and}\qquad M_p = \begin{bmatrix} PP^T \\ P\mathcal{M}_1 P^T \\ \vdots \\ P\mathcal{M}_{p-1} P^T \end{bmatrix}$$
We know that $PR_p = P\mathcal{M}_p P^T$ is invertible. Therefore (A.64) implies formula (4.9). Next, we have $V'_p = Q\,U_{b_p}(k_p)$. Applying (A.60), (A.62) and using the fact that $R_p = \mathcal{M}_p P^T$, we find formula (4.10) □

A.6 Study of algorithm 3.4 in the case n = 2

A.6.1 Expression of W'_{p,i} and V'_p

From appendix A.4.3, the expressions of $M(l)$ and $N(l)$ are
$$M(l) = -\frac{1}{l^2-1}\begin{bmatrix} 2l^2+3l+1 & 6l \\ \dfrac{l(l^2-1)}{2} & 2l^2-3l+1 \end{bmatrix} \qquad\text{and}\qquad N(l) = \frac{3}{l^2-1}\begin{bmatrix} 2 \\ l-1 \end{bmatrix}$$
We have $P = [1\ 0]$ and $Q = [0\ 1]$. Matrix $R_p$ is just a column vector of size 2. Let us write $R_p = \begin{bmatrix}\alpha_p \\ \beta_p\end{bmatrix}$. Then $PR_p = \alpha_p$ and $QR_p = \beta_p$. The recursive construction (4.8) becomes
$$\alpha_0 = 1,\quad \beta_0 = 0,\quad M_0 = [\,] \qquad\text{and}\qquad \forall i \ge 1:\quad \begin{bmatrix}\alpha_i \\ \beta_i\end{bmatrix} = M(l_i)\begin{bmatrix}\alpha_{i-1} \\ \beta_{i-1}\end{bmatrix},\quad M_i = \begin{bmatrix} M_{i-1} \\ \alpha_{i-1} \end{bmatrix} \qquad (A.65)$$
Consequently $M_p = \begin{bmatrix}\alpha_0 \\ \vdots \\ \alpha_{p-1}\end{bmatrix}$. From (4.9) and (4.10), we find
$$W'_{p,i} = -\frac{6}{l_p^2-1}\,\frac{\alpha_i}{\alpha_p} \quad\text{for } i = 0,\ldots,p-1 \qquad\text{and}\qquad V'_p = \frac{3}{l_p^2-1}\left( l_p - 1 - 2\,\frac{\beta_p}{\alpha_p} \right)$$
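For $n = 2$, the whole computation of $W'_{p,i}$ and $V'_p$ thus reduces to a scalar recursion; a minimal sketch (not from the source):

```python
import numpy as np

def M2x2(l):
    """M(l) for n = 2 (closed form above)."""
    return -np.array([[2*l*l + 3*l + 1, 6.0*l],
                      [l*(l*l - 1) / 2, 2*l*l - 3*l + 1]]) / (l*l - 1)

def W_V(lengths):
    """alpha/beta recursion (A.65), then W'_{p,i} (i = 0..p-1) and V'_p."""
    ab = [np.array([1.0, 0.0])]             # (alpha_0, beta_0)
    for l in lengths:
        ab.append(M2x2(l) @ ab[-1])
    (a_p, b_p), lp = ab[-1], lengths[-1]
    W = [-6 / (lp*lp - 1) * ab[i][0] / a_p for i in range(len(lengths))]
    V = 3 / (lp*lp - 1) * (lp - 1 - 2 * b_p / a_p)
    return W, V

W, V = W_V([3, 4, 5])    # p = 3 intervals of lengths l_1, ..., l_p
```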

A.6.2 Proof of conjecture 4.6 in the case n = 2

By assumption of conjecture 4.6, $\forall i = 1,\ldots,p$: $l_i \ge 2$. When $l \ge 2$, the coefficients of $M(l)$ are always negative. This implies that $\alpha_1$ and $\beta_1$ are negative, since $\begin{bmatrix}\alpha_1 \\ \beta_1\end{bmatrix} = M(l_1)\begin{bmatrix}1 \\ 0\end{bmatrix}$. From (A.65), it is easy to see by induction that $\forall i \ge 1$: $\mathrm{sign}(\alpha_i) = \mathrm{sign}(\beta_i) = (-1)^i$. This implies that $\mathrm{sign}(W'_{p,i}) = (-1)^{p+1-i}$. In the context $n = 2$, $W'_{p,i} = y_p(k_i+1)$. We know from fact 4.10 that $y_p = y_p^{(n-2)+}$ is piecewise linear on intervals of the type $\{k_{i-1}+1,\ldots,k_i+1\}$. On such an interval, the slope $(y_p(k_i+1) - y_p(k_{i-1}+1))/l_i$ is necessarily of sign $(-1)^{p+1-i}$ for $1 \le i \le p-1$. One can verify that this is also true at $i = p$, using the fact that $y_p(k_p+1) = 0$ (fact 4.9). This implies that the variation of slope of $y_p$ about time index $k_i+1$ has a sign equal to $(-1)^{p-i}$ for $i = 1,\ldots,p-1$. One can check that this is also true for $i = p$, since the slope of $y_p$ becomes 0 for $k \ge k_p+1$ (fact 4.9). Since $y_p^{(2)+}(k) = (y_p(k+2) - y_p(k+1)) - (y_p(k+1) - y_p(k))$ is the variation of slope of $y_p$ about $k+1$, we have just shown that $\forall i = 1,\ldots,p$: $\mathrm{sign}(y_p^{(2)+}(k_i)) = (-1)^{p-i}$. This leads to (4.4), since $y^{(2)+} = \frac{\partial\Phi}{\partial b_p}$ from fact 4.5 □

A.6.3 Exponential decay of W'_{p,i} for increasing p

Since $\alpha_i$ and $\beta_i$ always have the same sign and $M(l)$ has all negative coefficients, from the recursive relation (A.65) we can write
$$|\alpha_i| = \frac{1}{l_i^2-1}\left( (2l_i^2+3l_i+1)\,|\alpha_{i-1}| + 6l_i\,|\beta_{i-1}| \right) \ge \frac{2l_i^2+3l_i+1}{l_i^2-1}\,|\alpha_{i-1}|$$
From algorithm 3.1, $l_i$ is always chosen greater than 2, and for the selected values this ratio remains at least $\frac{5}{2}$. Therefore $|\alpha_i| \ge \frac{5}{2}\,|\alpha_{i-1}|$, which means that $|\alpha_i|$ increases exponentially. This implies that, for a fixed $i$,
$$|W'_{p,i}| = \frac{6}{l_p^2-1}\,\frac{|\alpha_i|}{|\alpha_p|} \le 2\,\frac{|\alpha_i|}{|\alpha_p|}$$
decays exponentially when $p$ increases.

A.7 Proof of fact 4.15

Let $(R_i)_{0\le i\le p}$ and $(M_i)_{0\le i\le p}$ be the matrix sequences recursively defined by (4.8), $(T_i)_{0\le i\le p}$ an arbitrary sequence of invertible matrices of size $(n-1)$, and $(R'_i)_{0\le i\le p}$, $(M'_i)_{0\le i\le p}$ the matrix sequences recursively defined by
$$R'_0 = P^T,\quad M'_0 = [\,] \qquad\text{and}\qquad \forall i = 1,\ldots,p:\quad R'_i = M(l_i)\,R'_{i-1}T_i,\quad M'_i = \begin{bmatrix} M'_{i-1} \\ PR'_{i-1} \end{bmatrix} T_i \qquad (A.66)$$
For every $p \ge 0$, we define $S_p = T_1T_2\cdots T_p$ ($S_0 = I_{n-1}$). Let us show by induction the proposition
$$Q(p):\quad M'_p = M_pS_p \ \text{ and }\ R'_p = R_pS_p$$
We have, $\forall p \ge 0$: $S_pT_{p+1} = S_{p+1}$. $Q(0)$ is true because $S_0 = I_{n-1}$, $M_0 = M'_0 = [\,]$ and $R_0 = R'_0 = P^T$. Now, suppose that $Q(p)$ has been proved for some $p \ge 0$. Then
$$M'_{p+1} = \begin{bmatrix} M'_p \\ PR'_p \end{bmatrix} T_{p+1} = \begin{bmatrix} M_pS_p \\ PR_pS_p \end{bmatrix} T_{p+1} = \begin{bmatrix} M_p \\ PR_p \end{bmatrix} S_pT_{p+1} = M_{p+1}S_{p+1}$$
and
$$R'_{p+1} = M(l_{p+1})\,R'_pT_{p+1} = M(l_{p+1})\,R_pS_pT_{p+1} = M(l_{p+1})\,R_pS_{p+1} = R_{p+1}S_{p+1}$$
This implies that $Q(p+1)$ is true. The induction is completed. Applying $Q(p)$ and the fact that $S_p$ is invertible, we have
$$(QR'_p)(PR'_p)^{-1} = (QR_p)(PR_p)^{-1} \qquad\text{and}\qquad M'_p(PR'_p)^{-1} = M_p(PR_p)^{-1}$$
We finally complete the proof by using these equalities in (4.9) and (4.10) respectively □


Bibliography

[1] J.C. Candy, "A use of limit cycle oscillations to obtain robust analog-to-digital converters", IEEE Trans. Commun., vol. COM-22, pp. 298-305, Mar. 1974.
[2] S.K. Tewksbury and R.W. Hallock, "Oversampled, linear predictive and noise shaping coders of order N > 1", IEEE Trans. Circuits Syst., vol. CAS-25, pp. 436-447, July 1978.
[3] D.C. Youla and H. Webb, "Image restoration by the method of convex projections: part 1 - theory", IEEE Trans. Medical Imaging, 1(2):81-94, Oct. 1982.
[4] J.C. Candy, "A use of double integration in sigma-delta modulation", IEEE Trans. Commun., vol. COM-33, pp. 249-258, Mar. 1985.
[5] J.C. Candy, "Decimation for sigma delta modulation", IEEE Trans. Commun., vol. COM-34, pp. 72-76, Jan. 1986.
[6] Y. Matsuya, K. Uchimura, A. Iwata et al., "A 16-bit oversampling A-to-D conversion technology using triple-integration noise shaping", IEEE J. Solid-State Circuits, vol. SC-22, pp. 921-929, Dec. 1987.
[7] R.M. Gray, "Spectral analysis of quantization noise in a single loop sigma-delta modulator with dc input", IEEE Trans. Commun., vol. 37, pp. 588-599, June 1989.
[8] K.C.H. Chao, S. Nadeem, W.L. Lee and C.G. Sodini, "A higher order topology for interpolative modulators for oversampling A/D conversion", IEEE Trans. Circuits Syst., vol. CAS-37, pp. 309-318, Mar. 1990.
[9] S. Hein and A. Zakhor, "Lower bounds on the MSE of the single and double loop Sigma Delta modulators", Proc. Int. Symp. on Circuits and Systems, pp. 1751-1755, May 1990.
[10] R.M. Gray, "Quantization noise spectra", IEEE Trans. Information Theory, vol. IT-36, pp. 1220-1244, Nov. 1990.
[11] S. Hein and A. Zakhor, "New properties of sigma delta modulators with dc input", to appear in IEEE Trans. Commun.
[12] S. Hein and A. Zakhor, "Optimal decoding for data acquisition applications of sigma delta modulators", submitted to IEEE Trans. Signal Processing.
[13] N.T. Thao and M. Vetterli, "Oversampled A/D conversion using alternate projections", Conf. on Information Sciences and Systems, The Johns Hopkins University, pp. 241-248, Mar. 1991.
[14] N.T. Thao and M. Vetterli, "Optimal MSE signal reconstruction in oversampled A/D conversion using convexity", Proc. Int. Conf. on Acoust., Speech and Signal Processing, Mar. 1992.
