
Audio Engineering Society

Convention Paper

Presented at the 112th Convention
2002 May 10–13, Munich, Germany

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Time-quantized frequency modulation with time dispersive codes for the generation of sigma-delta modulation

M.O.J. Hawksford
Centre for Audio Research and Engineering, University of Essex, UK CO4 3SQ
[email protected]
http://esewww.essex.ac.uk/research/audio

Abstract

Time quantization and noise shaping applied to linear frequency modulation (LFM) can form an alternative, although unconventional, means of generating 1-bit uniformly sampled code that is similar in structure to that of a feedback sigma-delta modulator (SDM). Fundamental insight into the SDM process and base-line coding spectrum emerges, where specifically linearity of signal conversion is studied and compared to that of linear pulse code modulation (LPCM). Time dispersive limiters both within and outside the noise shaper are investigated and their consequence on linearity explored. A noise averaging simulation reveals intrinsic distortion and noise modulation to be low when appropriate dither is used.

Abbreviations

DAC    digital-to-analogue converter
DSD    direct stream digital
FFT    fast Fourier transform
LFM    linear frequency modulation
LPCM   linear pulse-code modulation
MMiT   magic moments in time
NST    noise shaping transfer function
PSZC   positive-slope zero crossing
SACD   Super-Audio Compact Disc
SDM    sigma-delta modulator
TPDF   triangular probability density function

HAWKSFORD

TIME QUANTIZED FREQUENCY MODULATION

1 INTRODUCTION

This paper attempts to develop further the linkage between sigma-delta modulation (SDM) and time-quantized frequency modulation as an alternative means of code generation. The fundamentals of this study were first proposed in 1972 [1,2], although the application to high-performance audio coding in the context of direct-stream digital (DSD), as used at the core of Super Audio CD (SACD) [3], had yet to be recognized and exploited. This present study also includes a linearity comparison between high-order multi-level and binary SDM.

The subject of SDM has once more been brought into debate and there have been a number of papers that have discussed coding linearity and in particular the search for an effective dither strategy [4]. Applying analogies with conventional uniform quantization theory, it has been stated [5] by the author that the core distortion mechanism in SDM is that more than two quantization levels are required once dither with a triangular probability density function (TPDF) has been added prior to quantization. Lipshitz and Vanderkooy have since studied this topic in depth [6,7], where there is evidence that dither is incapable of complete linearization. There is also an excellent study by Risbo [8] that researches a number of critical factors relating to SDM including chaos and stability. Fortunately their results also reveal that distortion products are generally at such a low level that other system imperfections are more likely to dominate in real-world systems, especially where high-order loop filters are employed.

At the 110th AES convention [9] the author developed further the ideas of employing linear frequency modulation (LFM) at the core of the SDM process. Here a new method of including dither was proposed where the dither function was applied to time quantization rather than amplitude quantization. In this paper this technique is developed further by exploiting LFM, presenting further coding examples and exploring quantization techniques that are more appropriate to real-time code generation.

The principal motivation for exploiting LFM is that it can be used to generate a sequence of magic moments in time (MMiT) that enable a low-pass filtered binary pulse sequence to reproduce an analogue signal with virtually zero distortion products except under extreme conditions of modulation. MMiT form the optimum time co-ordinates of a SDM code and are obtained without recourse to a feedback loop. The problem of obtaining SDM code is then transformed into one of redistributing MMiT such that they are constrained in time by the SDM clock. This forms a process of time quantization. If the centre frequency of the LFM is set to one half the clock frequency of SDM and this centre frequency corresponds to an input signal of zero, then it follows that the average number of pulses generated by a SDM is equal to the number of MMiT. Consequently, time quantization only has to redistribute MMiT; it does not have to create additional pulses in the SDM output code. This is important as theoretically it implies that on average there is always a quantized time location for each MMiT.

2 LINEARITY OF CODE GENERATED BY MAGIC MOMENTS IN TIME (MMiT)

Earlier publications [1,2] have established the equivalence of single integrator delta-modulation and time-quantized phase modulation and also of first-order SDM and time-quantized frequency modulation. Hence, by including amplitude modulation to describe sampling, these two models link analogue modulation processes with quantization and noise shaping. The SDM equivalent model incorporates linear frequency modulation where the centre frequency is set normally to one half the bitstream pulse rate. The input signal then modulates the LFM where frequency is proportional to input amplitude. Reference points or MMiT are identified on the oscillator output waveform, such as the positive-slope zero crossings (PSZC), where subsequently these locations are quantized along the time axis using a grid of equally spaced time slots. Where a time-quantized PSZC occurs a 1 pulse is introduced, otherwise a –1 pulse is inserted (logic 0). A first-order model of oscillator, time dither and time quantization is illustrated in Figure 2-1.

In this section the inherent linearity of the LFM process is considered together with the computational techniques used to calculate the MMiT. At each MMiT a unit impulse is created which then forms an unconstrained SDM code where effectively the time resolution has infinite precision.

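[Figure 2-1, block diagram: the input (I/P) drives a voltage controlled oscillator; zero-crossing extraction of the VCO output yields the MMiT; temporal dither is applied and time quantization, on time slots with period equal to the bitstream period, produces the SDM bitstream. A second panel illustrates the process of time quantization: VCO output, time displacements, time slots and the quantized pulse output sequence.]

Figure 2-1 Time-quantized frequency modulation model of SDM.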
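As a rough illustration of the last step in Figure 2-1, the following Python sketch maps a list of MMiT onto a uniform grid of time slots and emits +1 where a time-quantized PSZC lands and –1 elsewhere. It is a minimal sketch only: the function name, the nearest-slot rounding and the absence of dither and noise shaping are assumptions made for illustration, and coincident pulses are simply merged.

```python
import math

def mmit_to_bitstream(mmit_times, f_sdm, n_slots):
    """Map MMiT (in seconds) onto n_slots uniform time slots of period 1/f_sdm.

    Hypothetical helper: nearest-slot rounding, no dither, no noise shaping.
    """
    bits = [-1] * n_slots                      # -1 (logic 0) where no PSZC lands
    for t in mmit_times:
        slot = math.floor(t * f_sdm + 0.5)     # nearest time slot
        if 0 <= slot < n_slots:
            bits[slot] = 1                     # +1 where a time-quantized PSZC occurs
    return bits

# With zero input the MMiT are spaced two clock periods apart, so the
# code settles into the alternating idle pattern.
bits = mmit_to_bitstream([1.6, 3.7, 5.6, 7.8], f_sdm=1.0, n_slots=10)
```

For an unmodulated carrier at half the clock rate, one pulse is produced for every two time slots, which is the property the text relies on when it states that time quantization only has to redistribute MMiT.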

AES 112TH CONVENTION, MUNICH, GERMANY, 2002 MAY 10–13


Defining a frequency modulated cosine wave of amplitude A with centre frequency $f_{sdm}/(2H)$ then,

$$ s_{lfm} = A\cos\left(\int_{t=0}^{t}\omega\,dt\right) $$

and

$$ \omega = \pi\,\frac{f_{sdm}}{H}\left(1+\frac{y(t)}{Y_{max}}\right) $$

whereby

$$ s_{lfm} = A\cos\left(\int_{t=0}^{t}\pi\,\frac{f_{sdm}}{H}\left(1+\frac{y(t)}{Y_{max}}\right)dt\right) = A\cos\left(\pi\,\frac{f_{sdm}}{H}\left(t+\int_{t=0}^{t}\frac{y(t)}{Y_{max}}\,dt\right)\right) $$

In this expression $s_{lfm}$ is the LFM signal, y(t) is the input signal, Ymax the maximum peak value of y(t) and fsdm the bit rate of the equivalent SDM. The H-factor was defined earlier [9] where normally H = 1. There it was shown that H could be chosen to produce additional signal space to enable time quantization to be applied without quantizer saturation. As such a limited regime for SDM linearity was produced.

By using PSZCs as MMiT reference points, the calculation of MMiT is achieved by solving the non-linear equation,

$$ \frac{f_{sdm}}{H}\left(t_r+\int_{t=0}^{t_r}\frac{y(t)}{Y_{max}}\,dt\right) = 2r $$

where $t_r$ represents the time of the rth MMiT in the LFM output.

In discrete signal processing terms it is a difficult task to seek precise solutions. The problem can be visualized with reference to Figure 2-2, where each MMiT is associated with a unique integer r providing the slope of the phase $\phi$ versus time t graph is greater than zero. That is, if

$$ \phi = \pi\,\frac{f_{sdm}}{H}\left(t+\int_{t=0}^{t}\frac{y(t)}{Y_{max}}\,dt\right) $$

then by differentiating, $\partial\phi/\partial t > 0$, the bound on y(t) is $1 + y(t)/Y_{max} > 0$.

[Figure 2-2, graph: phase of y(t) plotted against time t, with the instants t0, t1, t2, t3, ..., tr, ..., tN marked.]

Figure 2-2 Seeking natural sampling solutions for MMiT {tr}.

To compute MMiT a 3-stage strategy was adopted:

1. An LFM signal is computed with an oversampling factor 'of' over the bit rate fsdm.
2. Linear interpolation is performed either side of a detected PSZC to achieve a closer approximation.
3. Finally, an iterative error-driven procedure is used to converge towards the optimum solution.

To describe the process in more detail, a search for MMiT was implemented using the following signal processing techniques:

• The LFM signal is converted to a square wave using a squaring (or sign) function.

• A difference signal is computed between adjacent samples, which is non-zero only at the zero crossing transitions.

• By interrogating the sign of the inter-sample difference, the PSZCs can be identified, and by using a sort function applied to this difference sequence a vector zr(1:L) is computed that contains only the sequenced sample numbers of samples just prior to an actual PSZC.

• Knowing the time location of a sample that precedes a PSZC, a sample of the LFM signal $s_{lfm}$ each side of a PSZC is calculated and a more accurate time location $t_{ra}$ estimated using linear interpolation between the sample zr(r) and the sample that follows it (written here to agree with stage 1 of the subroutine, tr=zr-sm(zr)./(sm(zr+1)-sm(zr))) as,

$$ t_{ra} = zr(r) - \frac{s_{lfm}(zr(r))}{s_{lfm}(zr(r)+1) - s_{lfm}(zr(r))} $$

• The approximate MMiT value $t_{ra}$ is then substituted back into the LFM signal to form an error signal, since if the estimate were exact the LFM signal would be zero. A new estimate for $t_{ra}$ is then made as,

$$ t_{ra} \Rightarrow t_{ra} + 1.5\,\mathrm{error} $$

and an iteration made (say 100 times) until the error (see zerror in subroutine) has converged down to an acceptable level. In the subroutine the update is written tr=tr-1.5*zerror, with zerror the value of the LFM signal at the current estimate.

The following MATLAB (footnote 1) subroutine was used to perform an iterative search for each MMiT:

Footnote 1: MATLAB is a trade name of The MathWorks, Inc.



% sm, L, Lx, g, dt, w0, a1, a2, h1 and h2 are not defined in this fragment
% search for PZC: sd is 1 at each positive-going sign change of sm, otherwise 0
sd=.5*(1+sign([sign(sm(2:L))-sign(sm(1:L-1)) 0]-.1));

% sort approximate PZC locations to determine their time coordinates
[p q]=sort(sd.*(1:L));
[mx my]=max(q(Lx/2:L));
% zr is a vector that defines the sample number of the sample just preceding a PZC
zr=q(my+Lx/2:L).*sign(p(my+Lx/2:L));
clear p q mx my

% stage 1: linear interpolation to improve time estimate of PZC
tr=zr-sm(zr)./(sm(zr+1)-sm(zr));

% stage 2: iterative error correction for PZCs
for x=1:100
    zerror=cos(g*((tr)*dt-(a1/(h1*w0))*cos(h1*w0*(tr)*dt)-(a2/(h2*w0))*cos(h2*w0*(tr)*dt)));
    tr=tr-1.5*zerror;
end
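For readers without MATLAB, the three-stage search can be sketched in Python as follows. This is an illustrative reading of the procedure, not the author's program: the function names, the sine-wave test input, the assumed bit rate of 2.8224 MHz (the text only quotes "2.8 MHz") and the oversampling factor of 8 are all assumptions. The integral of the input is evaluated in closed form, and the fixed-gain update with a gain of 1.5 follows the subroutine above.

```python
import math

def lfm_phase(t, f_sdm, tones, h=1.0):
    """Phase pi*(f_sdm/h)*(t + integral of y/Ymax) for y = sum of a*sin(2*pi*f*t)."""
    integral = sum(a * (1.0 - math.cos(2.0 * math.pi * f * t)) / (2.0 * math.pi * f)
                   for a, f in tones)
    return math.pi * (f_sdm / h) * (t + integral)

def find_mmit(f_sdm, tones, n_bits, of=8, gain=1.5, iterations=100, h=1.0):
    """Return the MMiT (positive-slope zero crossings of the LFM signal) in seconds."""
    dt = 1.0 / (of * f_sdm)                       # stage 1: oversampled time grid
    s = [math.cos(lfm_phase(n * dt, f_sdm, tones, h)) for n in range(n_bits * of + 1)]
    mmit = []
    for n in range(len(s) - 1):
        if s[n] < 0.0 <= s[n + 1]:                # positive-slope zero crossing
            tr = n - s[n] / (s[n + 1] - s[n])     # stage 2: linear interpolation (samples)
            for _ in range(iterations):           # stage 3: error-driven refinement
                tr -= gain * math.cos(lfm_phase(tr * dt, f_sdm, tones, h))
            mmit.append(tr * dt)
    return mmit

# Two equal-amplitude tones at 19 kHz and 20 kHz, amplitude 0.25 each.
tones = [(0.25, 19.0e3), (0.25, 20.0e3)]
mmit = find_mmit(2.8224e6, tones, n_bits=2000)
residual = max(abs(math.cos(lfm_phase(t, 2.8224e6, tones))) for t in mmit)
```

The fixed gain converges only while the product of the gain and the local slope of the LFM signal (in units per sample) stays between 0 and 2, which is one reason the choice of oversampling factor and modulation depth matters in this sketch.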

[Plot: vertical axis 0 to -350, horizontal axis 10^2 to 10^8 on a logarithmic scale.]

Figure 2-3(b) Spectrum of MMiT pulse sequence: A = 0.25.

To determine the coding accuracy of the input signal the Fourier transform of a sequence of unit impulses located at each MMiT was calculated. However, because MMiT are asynchronous there is no uniformly sampled data from which to form a fast Fourier transform (FFT), so a direct calculation based upon delay elements is performed, where the output spectrum out(f) is given by the summation,

$$ out(f) = \sum_{r=1}^{N} e^{-j 2\pi f t_r} $$
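The direct summation can be sketched in a few lines of Python. The function name and the test data are hypothetical; the only thing taken from the text is the sum of delay terms, one per MMiT.

```python
import cmath
import math

def mmit_spectrum(mmit_times, freqs):
    """Evaluate out(f) = sum over r of exp(-j*2*pi*f*t_r) at each frequency in freqs."""
    return [sum(cmath.exp(-2j * math.pi * f * t) for t in mmit_times)
            for f in freqs]

# Sanity check on eight uniformly spaced impulses, 1 ms apart:
# the terms add in phase at 1 kHz and cancel in pairs at 500 Hz.
times = [k * 1.0e-3 for k in range(8)]
spec = mmit_spectrum(times, [1000.0, 500.0])
```

The cost is proportional to the number of MMiT multiplied by the number of frequency points, which is the price paid for not having uniformly sampled data.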

To illustrate any signal distortion resulting from the LFM process, example spectra are displayed in Figure 2-3(a-d), derived using two equal-amplitude sinusoids of frequencies 19 kHz and 20 kHz, where the amplitudes used in the four simulations are 0.45, 0.25, 0.1 and 0.01 respectively (1 representing 100% modulation depth). Typical error values in computing MMiT location are estimated to be in the range 10^-7 to 10^-12, the higher level of error occurring only under extreme modulation levels. Figure 2-3(a), shown in red, is the only spectrum to indicate overload, where the two sinusoidal inputs combine to give a modulation depth of about 0.9. Backing off the amplitude to 0.25 reveals that there is no significant spectral spillage within the audio band. Also, in Figure 2-3(d) spectral replication at harmonics of the sampling frequency of 2.8 MHz is evident.

[Plots: each panel has a logarithmic horizontal axis from 10^2 to 10^8.]

Figure 2-3(a) Spectrum of MMiT pulse sequence: A = 0.45.

Figure 2-3(c) Spectrum of MMiT pulse sequence: A = 0.10.

Figure 2-3(d) Spectrum of MMiT pulse sequence: A = 0.01.


These results demonstrate the spectral spread resulting from LFM, where except under extreme modulation, there is negligible overlap within the signal band. As such the computational precision and significance of the MMiT is validated. Hence, in investigating strategies for time-quantization it is important to observe that there is virtually no signal degradation in converting from input to MMiT sequence. Consequently, any distortion is due solely to time quantization in forming a uniformly sampled pulse sequence.



3 TIME QUANTIZATION

This section reviews the basic techniques of time quantization used to relocate MMiT to a uniformly sampled and thus constrained pulse sequence. Knowing the MMiT locations {tr}, quantization can be performed using a combination of time dither and temporal noise shaping, where the process takes as input the time sequence,

$$ \mathrm{MMiT} \Rightarrow \{t_r\}_{r=0}^{N} $$

The process with only time dither and time quantization is illustrated in Figure 3-1, while Figure 3-2 includes temporal noise shaping. It is significant that within the temporal noise-shaping loop no amplitude limits are imposed on the quantizer. However, the greater the order of the noise shaper, the greater becomes the temporal shifting of pulses away from their optimum MMiT, with resulting increases in quantization noise. It is instructive to consider the time dislocation of pulses as a form of jitter, so a rise in noise is inevitable. It has been shown [9] that the noise shaper must be combined with TPDF dither in order to decorrelate this deliberate jitter from the signal. As such the time quantization process is noisy but linear.

[Figure 3-1, block diagram: the time input {tr}, the time locations of the 1 pulses, is combined with time dither scaled to match the time quantization interval 1/fsdm and applied to a time quantizer of step 1/fSDM, giving the time output {tqr}.]

Figure 3-1 Time quantization with time dither.
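A minimal Python sketch of this time quantizer is given below, with an optional first-order error-feedback loop standing in for the temporal noise shaper of Figure 3-2. Several details are assumptions made for illustration rather than taken from the paper: the TPDF dither is formed as the difference of two uniform variates and spans plus or minus one quantization interval, the noise shaper is first order, and all function names are hypothetical.

```python
import math
import random

def tpdf(rng):
    """Triangular-PDF dither on (-1, 1) time slots: difference of two uniform variates."""
    return rng.random() - rng.random()

def time_quantize(mmit_times, f_sdm, rng=None, noise_shaping=False):
    """Quantize MMiT (seconds) to integer time-slot indices on the 1/f_sdm grid.

    rng=None disables dither; noise_shaping=True feeds the previous
    quantizer error back (assumed first-order temporal noise shaper).
    """
    slots = []
    err_prev = 0.0
    for t in mmit_times:
        u = t * f_sdm                               # MMiT in units of time slots
        v = u - err_prev if noise_shaping else u    # subtract previous error if shaping
        d = tpdf(rng) if rng is not None else 0.0   # time dither
        q = math.floor(v + d + 0.5)                 # time quantizer (nearest slot)
        err_prev = q - v                            # quantizer error incl. dither
        slots.append(q)
    return slots

# Slowly modulated MMiT expressed directly in slot units (f_sdm = 1).
times = [1.5 + 2 * k + 0.3 * math.sin(0.1 * k) for k in range(200)]
plain = time_quantize(times, 1.0, random.Random(1))
shaped = time_quantize(times, 1.0, random.Random(1), noise_shaping=True)
```

With error feedback the individual time errors telescope, so their running sum stays bounded by the size of a single quantizer error; without it each error is bounded but the sum is free to wander. Neither variant prevents two MMiT from landing in the same slot.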

[Figure 3-2, block diagram: as Figure 3-1, with time input {tr}, time dither scaled to match the time quantization interval 1/fsdm, a time quantizer of step 1/fSDM and time output {tqr}, the time locations of the 1 pulses, and with the addition of the temporal noise-shaping filter HTN(z).]

Figure 3-2 Time quantization with temporal noise shaping and time dither.

The noise shaping process normally introduces a redistribution of the pulse sequence, although using a short buffer correction can be achieved by re-sequencing. However, if the noise shaper assigns two or more pulses to the same time slot then a multi-amplitude pulse is formed that is in conflict with the coding requirement of binary SDM. Employing either an open-loop or closed-loop temporal error correction strategy can solve this problem.

3.1 Open-loop temporal error correction: Type 1

The occurrence of coincident and incorrectly sequenced pulses can be corrected by using a novel sort procedure (introduced in [9]) that is applied open loop after the noise shaper. This process guarantees the required monotonically increasing progression of pulse locations, seeks out both dual and multiple coincident pulses and translates coincident samples into a near-symmetric bi-directional pulse distribution. A critical requirement of this process is that the number of unit pulses remains invariant so there is no loss of pulse area. Consider a vector

$[X_r]_{r=1}^{N}$ of length N that contains the time-quantized sample locations of 1-pulses. The sorted vector $[Y_r]_{r=1}^{N}$ with non-coincident pulses is computed as follows,

$$ [Y_r]_{r=1}^{N} = \mathrm{sort}\left([X_r]_{r=1}^{N} - [1:N]\right) + [1:N] $$

where [1:N] implies a vector [1 2 3 … r … N].

To demonstrate the validity of this algorithm, three examples are considered, where the vector length is selected arbitrarily as N = 10.

Example 3.1.1

$$ [X_r]_{r=1}^{10} = [2\ 5\ 8\ 11\ 9\ 9\ 13\ 15\ 16\ 19] $$

Subtracting vector [1:10],

$$ [Z_r]_{r=1}^{10} = [1\ 3\ 5\ 7\ 4\ 3\ 6\ 7\ 7\ 9] $$

Re-ordering into a positive sequence,

$$ \mathrm{sort}\,[Z_r]_{r=1}^{10} = [1\ 3\ 3\ 4\ 5\ 6\ 7\ 7\ 7\ 9] $$

Finally adding back the vector [1:10],

$$ [Y_r]_{r=1}^{10} = [2\ 5\ 6\ 8\ 10\ 12\ 14\ 15\ 16\ 19] $$

The sum of sample co-ordinates = 107 for both vectors X and Y and the error between input and output vectors is,

$$ \mathrm{error} = [X_r]_{r=1}^{10} - [Y_r]_{r=1}^{10} = [0\ 0\ 2\ 3\ {-1}\ {-3}\ {-1}\ 0\ 0\ 0] $$

The error reveals a zero mean, showing that the area under the pulse sequence has not been altered while the coincident pulses have been dispersed in time.

Example 3.1.2

$$ [X_r]_{r=1}^{10} = [2\ 5\ 8\ 9\ 13\ 13\ 13\ 16\ 18\ 19] $$

whereby after sorting,

$$ [Y_r]_{r=1}^{10} = [2\ 5\ 8\ 9\ 11\ 13\ 15\ 16\ 18\ 19] $$

Sum of sample co-ordinates = 116 for both vectors X and Y and the error between input and output vectors is,

$$ \mathrm{error} = [X_r]_{r=1}^{10} - [Y_r]_{r=1}^{10} = [0\ 0\ 0\ 0\ 2\ 0\ {-2}\ 0\ 0\ 0] $$

Example 3.1.3

$$ [X_r]_{r=1}^{10} = [2\ 2\ 8\ 9\ 13\ 13\ 13\ 16\ 19\ 19] $$

whereby after sorting,

$$ [Y_r]_{r=1}^{10} = [1\ 3\ 8\ 9\ 11\ 13\ 15\ 16\ 18\ 20] $$

Sum of sample co-ordinates = 114 for both vectors X and Y and the error between input and output vectors is,

$$ \mathrm{error} = [X_r]_{r=1}^{10} - [Y_r]_{r=1}^{10} = [1\ {-1}\ 0\ 0\ 2\ 0\ {-2}\ 0\ 1\ {-1}] $$
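The sort procedure of the three examples can be checked with a short Python sketch. The function name is hypothetical; the computation is the subtract, sort and add-back sequence given above, with slot indices counted from 1.

```python
def disperse_coincident(x):
    """Y = sort(X - [1:N]) + [1:N], for pulse locations X counted from 1."""
    n = len(x)
    z = sorted(x[i] - (i + 1) for i in range(n))   # subtract [1:N], then re-order
    return [z[i] + (i + 1) for i in range(n)]      # add [1:N] back

x1 = [2, 5, 8, 11, 9, 9, 13, 15, 16, 19]
y1 = disperse_coincident(x1)
error1 = [a - b for a, b in zip(x1, y1)]
```

Because the sorted vector is non-decreasing and the added ramp increases by one per element, the output locations are always distinct and in ascending order, and since sorting only permutes the elements the sum of the co-ordinates is unchanged.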


Observe in each example how the sorted values are unique and arranged in increasing order. Also, the area under the input and output sequences is invariant.

3.2 Open-loop temporal error correction: Type 2

Section 3.1 showed how the problem of coincident pulses could be solved using a sort function and a uniformly sequenced offset vector. The technique achieved a redistribution of pulses where the pulse dispersion was shown to correct anomalies in the noise shaper output. An earlier paper [5] investigated a technique of pulse redistribution to account for quantizer overload where it was prudent to use symmetric pulse redistribution about each anomaly requiring correction. A symmetrical mapping function has the advantage of producing only errors in the amplitude spectrum of the function, the phase error being zero. When asymmetric functions are used there will be both amplitude and phase errors. The technique of symmetrical pulse mapping requires an iterative procedure to convert multiple coincident pulses into a binary sequence. The mapping function is defined as follows: If M > 1 is a positive integer where

$$ sdm(x) = M $$

then replace sdm(x) and adjacent samples using the symmetrical mapping,

$$ sdm(x) \Rightarrow M - 2 $$
$$ sdm(x-1) \Rightarrow sdm(x-1) + 1 $$
$$ sdm(x+1) \Rightarrow sdm(x+1) + 1 $$

Similarly, if M < -1 is a negative integer where

$$ sdm(x) = M $$

then,

$$ sdm(x) \Rightarrow M + 2 $$
$$ sdm(x-1) \Rightarrow sdm(x-1) - 1 $$
$$ sdm(x+1) \Rightarrow sdm(x+1) - 1 $$

Mapping is applied progressively to the SDM code that has been derived from the temporal noise shaper, where each sample outside the amplitude range –1 to 1 is converted. However, a single pass does not guarantee that the SDM code is constrained to binary, therefore the process is repeated until the condition is met. Simulation has confirmed that this mapping converges to a binary sequence providing the sum of the samples in a vector of length N does not exceed N. To illustrate the procedure, consider the following two examples:

Example 3.2.1 (same as Example 3.1.3, Section 3.1)

The temporal noise shaper output is,

$$ [X_r]_{r=1}^{10} = [2\ 2\ 8\ 9\ 13\ 13\ 13\ 16\ 19\ 19] $$

from which the SDM code is derived as:

SDM_0 = [0 2 0 0 0 0 0 1 1 0 0 0 3 0 0 1 0 0 2 0]

Applying symmetric pulse mapping, then

SDM_1 = [1 0 1 0 0 0 0 1 1 0 0 1 1 1 0 1 0 1 0 1]

which gives an input-output error,

error = [-1 2 -1 0 0 0 0 0 0 0 0 -1 2 -1 0 0 0 -1 2 -1]

Comparing this code with Example 3.1.3 in Section 3.1, then

SDM_p = [1 0 1 0 0 0 0 1 1 0 1 0 1 0 1 1 0 1 0 1]

where the corresponding error is,

error = [-1 2 -1 0 0 0 0 0 0 0 -1 0 2 0 -1 0 0 -1 2 -1]

Comparing the error terms shows that although there is similarity, the symmetric pulse substitution has achieved one error cluster where the time dispersion is halved, a result that should contribute to lower distortion in the recovered signal.

Example 3.2.2

This example shows an SDM code with more densely packed clusters of multi-level pulses that require more than one iteration to form a binary sequence. The temporal noise shaper output is,

$$ [X_r]_{r=1}^{12} = [2\ 2\ 4\ 5\ 5\ 6\ 8\ 8\ 8\ 11\ 13\ 13] $$

where transforming into multi-level SDM code gives the initial sequence as,

SDM_0 = [0 2 0 1 2 1 0 3 0 0 1 0 2 0]

In this more extreme example, where the sum of the vector elements is 12 out of a vector length of 14, the symmetric pulse mapping required 7 iterations to converge to a binary sequence. The intermediate results are given in the following table, where each row shows the sequence progression commencing with the source pulse sequence. The iteration number is given in the left-hand column.

Symmetric pulse mapping showing all 7 iterations to converge

Iteration   Sequence
0           0 2 0 1 2 1 0 3 0 0 1 0 2 0
1           1 0 1 2 0 2 1 1 1 0 1 1 0 1
2           1 0 2 0 2 0 2 1 1 0 1 1 0 1
3           1 1 0 2 0 2 0 2 1 0 1 1 0 1
4           1 1 1 0 2 0 2 0 2 0 1 1 0 1
5           1 1 1 1 0 2 0 2 0 1 1 1 0 1
6           1 1 1 1 1 0 2 0 1 1 1 1 0 1
7           1 1 1 1 1 1 0 1 1 1 1 1 0 1

The process converged after the 7th iteration to form the binary sequence,

SDM_7 = [1 1 1 1 1 1 0 1 1 1 1 1 0 1]

where the overall error vector is,

error = [-1 1 -1 0 1 0 0 2 -1 -1 0 -1 2 -1]

This class of pulse amplitude limitation can be applied not only to the code produced by a temporal noise shaper and LFM model, but also to the output of a conventional [10] multi-level SDM noise shaper. In this case it forms a multi-level to 2-level transformer, providing the input signal falls within the coding range of the down-converted binary code. Such a process can be used in digital-to-analogue conversion (DAC) where the DAC amplitude resolution is less than the resolution of the noise shaper output.

3.3 Closed-loop temporal error correction: Type 3

Rather than apply correction for pulse coincidence as a separate process applied to the output of the temporal noise shaper, correction can be applied directly within the feedback loop. The process is summarised as follows, where tq(n) represents the quantizer output after the nth sample:

1. The present value of tq(n) is compared against a time window of past samples and tested for any identical values.

2. If coincidence is detected, then a quantum $\delta$ is added to tq(n) as,

$$ tq(n) \Rightarrow tq(n) + \delta $$

3. The test is reapplied and if coincidence is detected once again, then a further quantum is added to tq(n) and so on, otherwise the current value is assigned to the output.
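The three steps above can be sketched in Python as follows. This is an assumption-laden illustration rather than the author's implementation: the noise shaper is taken to be a first-order error-feedback loop, the quantum is one time slot, the window length is arbitrary, dither is omitted so that the result is reproducible, and the function name is hypothetical.

```python
import math
from collections import deque

def closed_loop_time_quantize(slot_times, window=8, delta=1):
    """Time quantization with coincidence correction inside the feedback loop.

    slot_times are MMiT expressed in units of the time slot (t * f_sdm).
    """
    recent = deque(maxlen=window)        # time window of past quantizer outputs
    out = []
    err_prev = 0.0
    for u in slot_times:
        v = u - err_prev                 # assumed first-order error feedback
        q = math.floor(v + 0.5)          # time quantizer (dither omitted here)
        while q in recent:               # steps 1 and 3: test, and re-test after each shift
            q += delta                   # step 2: add one quantum
        err_prev = q - v                 # the shift is seen by the loop as quantizer error
        recent.append(q)
        out.append(q)
    return out

# Closely spaced inputs force repeated shifts; later pulses are delayed in turn.
result = closed_loop_time_quantize([2.0, 2.1, 4.0, 4.2, 7.0])
```

In this example every output slot is distinct, but the last pulse is displaced by a whole slot from its nearest grid position, which illustrates the unconstrained pulse displacement that the closed-loop scheme permits.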

Using this technique pulse coincidence is eliminated and, because the process is contained within a feedback loop, any modification to tq(n) is considered an inherent characteristic of the quantizer and is partially corrected by feedback. However, in this process there is no constraint placed upon the degree of pulse displacement that can occur, which may then contribute additional noise to the process.
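For comparison with the closed-loop scheme, the open-loop symmetric pulse mapping of Section 3.2 can be sketched in Python and checked against Examples 3.2.1 and 3.2.2. The sketch makes some assumptions: all samples with M > 1 are mapped simultaneously in each pass (which reproduces the tabulated iterations), only the positive branch of the mapping is implemented since the examples use 0/1 pulse counts, and a pulse that would be pushed beyond either end of the vector is treated as an error. The function names are hypothetical.

```python
def locations_to_code(locations, n):
    """Convert 1-based pulse locations into a multi-level code of length n."""
    code = [0] * n
    for r in locations:
        code[r - 1] += 1
    return code

def symmetric_pulse_map(sdm, max_passes=100):
    """Repeat the mapping M -> M-2 with +1 to each neighbour until no sample exceeds 1."""
    code = list(sdm)
    passes = 0
    while any(m > 1 for m in code):
        if passes == max_passes:
            raise RuntimeError("mapping did not converge")
        nxt = list(code)
        for x, m in enumerate(code):          # decisions use the code from the previous pass
            if m > 1:
                if x == 0 or x == len(code) - 1:
                    raise ValueError("pulse would be displaced outside the vector")
                nxt[x] -= 2
                nxt[x - 1] += 1
                nxt[x + 1] += 1
        code = nxt
        passes += 1
    return code, passes

sdm0 = locations_to_code([2, 2, 4, 5, 5, 6, 8, 8, 8, 11, 13, 13], 14)
sdm7, n_passes = symmetric_pulse_map(sdm0)
error = [a - b for a, b in zip(sdm0, sdm7)]
```

Each mapping step removes two pulses from one slot and adds one to each neighbour, so the total pulse count is preserved on every pass.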



3.4 Closed-loop temporal error correction with constrained time dispersion: Type 4

Because displacing samples in SDM code is similar to a controlled jitter process, it is advantageous to constrain the peak temporal pulse displacement introduced by the noise shaper. This can be achieved both by constraining the noise shaper, which inevitably becomes more energetic as its order is raised, and also by limiting the maximum pulse displacement range.

[Plot legend: Black: SDM spectrum, Red: 16 bit @ 44.1 kHz noise level, Blue: noise-shaping transfer function, Green: FM.]

To constrain the output range of the quantizer a limiter function is introduced just before the coincidence detector-corrector (described in Section 3.3). A quantizer input-output error ddt is calculated at each sample instant, where if tr(n) and tq(n) are the respective input and output of the noise shaper just after the nth sample is processed, then in MATLAB notation,

ddt=round((tr(n)-tq(n))/of);

If the limiter is set at lim, then a program fragment defines the process as,

if ddt>lim
    tq(n)=tq(n)+(ddt-lim)*of;
elseif ddt