maximal coherence rotation for stereo coding

MAXIMAL COHERENCE ROTATION FOR STEREO CODING

Shuhua Zhang¹, Weibei Dou², and Huazhong Yang
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University
E-mail: ¹[email protected]; ²[email protected]

Introduction


Stereo Audio Coding: Channel Representation Model

- Mid/Side (M/S) Stereo: “sum + difference”
- Intensity Stereo (IS): “one base channel + a few intensity ratios”
- AMR-WB+ stereo module: “one base channel + a few interchannel prediction coefficients and gains”
- Binaural Cue Coding (BCC) and MPEG-4 Parametric Stereo (PS): “one base channel + a few spatial parameters”
- Karhunen-Loève Transform (KLT): “main + minor”

Applications in Stereo Coding

Mathematical Properties of the MCR

MCR as a Stereo Coding Tool

- MCR stereo coding scheme

Y0 and Y1 have equal length after the MCR rotation:

$$\|Y_0\| = \|Y_1\| = \sqrt{\tfrac{1}{2}\left(\|X_l\|^2 + \|X_r\|^2\right)}.$$

The difference between Y0 and Y1, $Y_d = Y_0 - Y_1$, is minimized.


$$\|Y_d\|^2 = \|X_l\|^2 + \|X_r\|^2 - 2\langle Y_0, Y_1\rangle.$$


The coherence after the MCR rotation is always non-negative
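The three properties (equal length, minimized difference, non-negative coherence) can be checked numerically. The sketch below is only an illustration under assumptions: the helper names `inner`, `mcr_angle`, and `rotate` are hypothetical, the vectors are arbitrary toy data, and the MCR angle is computed with `atan2`, which is a compact restatement of the maximizer of the inner product rather than notation taken from the poster.

```python
import math

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def mcr_angle(xl, xr):
    # Assumed compact form: the angle in [-pi/2, pi/2] that maximizes <Y0, Y1>.
    return 0.5 * math.atan2(inner(xr, xr) - inner(xl, xl), 2.0 * inner(xl, xr))

def rotate(xl, xr, theta):
    """Apply the 2-D rotation to the channel pair (Xl, Xr)."""
    c, s = math.cos(theta), math.sin(theta)
    y0 = [c * a + s * b for a, b in zip(xl, xr)]
    y1 = [-s * a + c * b for a, b in zip(xl, xr)]
    return y0, y1

# Toy subband vectors (partly out of phase on purpose).
xl = [1.0, -0.5, 2.0, 0.3]
xr = [-0.8, 0.9, -1.1, 0.4]

theta = mcr_angle(xl, xr)
y0, y1 = rotate(xl, xr, theta)
yd = [a - b for a, b in zip(y0, y1)]
energy = inner(xl, xl) + inner(xr, xr)

print("squared lengths:", inner(y0, y0), inner(y1, y1), energy / 2)
print("difference energy:", inner(yd, yd), energy - 2 * inner(y0, y1))
print("non-negative inner product:", inner(y0, y1) >= 0)
```

The two squared lengths should both equal half the total energy, and the difference energy should match the identity for ‖Yd‖² up to rounding.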

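[Figure: geometric illustration (vectors Y0, Y1, Xr; angle ϕY) and MCR encoder block diagram (blocks: T/F, MCR, Mono Encode; outputs: Mono Bitstream, Parameter Bitstream)]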

Subjective listening test

Coherence: Measuring Interchannel Similarity

$$\mathrm{coh}(Y_0, Y_1) = \frac{\langle Y_0, Y_1\rangle}{\|Y_0\| \cdot \|Y_1\|}$$
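As a minimal sketch of this definition for real-valued vectors, the function below computes the coherence directly from the inner product and the two norms. The names `inner` and `coherence` are hypothetical, and returning 0.0 for an all-zero channel is an assumption made here to avoid division by zero; the poster does not address that case.

```python
import math

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def coherence(y0, y1):
    """coh(Y0, Y1) = <Y0, Y1> / (||Y0|| * ||Y1||)."""
    denom = math.sqrt(inner(y0, y0)) * math.sqrt(inner(y1, y1))
    # Assumption: define the coherence as 0.0 when either channel is silent.
    return inner(y0, y1) / denom if denom > 0.0 else 0.0

left = [1.0, 2.0, -1.0, 0.5]
print(coherence(left, left))                  # identical channels
print(coherence(left, [-v for v in left]))    # out-of-phase channels
print(coherence([1.0, 0.0], [0.0, 1.0]))      # orthogonal channels
```

Identical channels give a coherence of 1, out-of-phase channels give -1, and orthogonal channels give 0, which is the range the MCR works within.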

[Figure: Binaural Hearing Scene; Rotating Sound Sources to the Median Plane (MCR); label: Virtual Source]

Constraints

- Linear: easy computation
- Invertible: recovering the original channel pair
- Orthogonal: no amplification of quantization noise


General Form — Rotation on the 2-D plane


- Considering only intensity (level) induced lateralization
- After MCR, the Interchannel Level Difference (ILD) is 0 dB
- A source is moved to the median plane, virtually
- After MCR, the Interchannel Coherence (IC) increases
- A source becomes more compact: narrowed sound image width

[Figure: relative score (MCR vs AMR-WB+) per test sequence. Scale: −3 much worse, −2 worse, −1 slightly worse, 0 equal, 1 slightly better, 2 better, 3 much better]

Results

MCR as a Stereo Preprocessing Tool

- Downmix distortion: out-of-phase channel pairs annihilate
- After the MCR: always in phase, coh(Y0, Y1) ≥ 0
- After the MCR: positive and larger IC
- Downmixing after the MCR: no annihilation
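The annihilation effect is easy to reproduce with a perfectly out-of-phase pair. The sketch below is an illustration under assumptions: the helper names are hypothetical, the data are toy values, the downmix is taken as 0.5 × (first channel + second channel), and the MCR angle uses an `atan2` restatement of the inner-product maximizer.

```python
import math

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def mcr_angle(xl, xr):
    # Assumed compact form: the angle in [-pi/2, pi/2] that maximizes <Y0, Y1>.
    return 0.5 * math.atan2(inner(xr, xr) - inner(xl, xl), 2.0 * inner(xl, xr))

def rotate(xl, xr, theta):
    """Apply the 2-D rotation to the channel pair (Xl, Xr)."""
    c, s = math.cos(theta), math.sin(theta)
    y0 = [c * a + s * b for a, b in zip(xl, xr)]
    y1 = [-s * a + c * b for a, b in zip(xl, xr)]
    return y0, y1

xl = [1.0, -2.0, 0.5, 3.0]
xr = [-v for v in xl]                       # perfectly out-of-phase pair

# Plain downmix: the two channels cancel.
plain = [0.5 * (a + b) for a, b in zip(xl, xr)]

# Downmix after the MCR: the rotated channels are in phase.
y0, y1 = rotate(xl, xr, mcr_angle(xl, xr))
after = [0.5 * (a + b) for a, b in zip(y0, y1)]

print("plain downmix energy:", inner(plain, plain))
print("MCR downmix energy:  ", inner(after, after))
print("left channel energy: ", inner(xl, xl))
```

In this extreme case the plain downmix carries no energy at all, while the downmix of the rotated pair keeps the energy of one input channel.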

The MCR Angle and Sound Source Azimuth

Maximizing coherence ⇔ Maximizing inner product


1. Speech: essentially the same
2. Music: mostly better (MCR vs AMR-WB+)
3. Overall: 0.5 higher on average
4. Low score of si02: mainly due to pre-echo (strongly transient components), but the stereo impression is better
5. Low score of si03: mainly due to envelope undulation (MDCT processing of strongly harmonic components); the stereo impression is also better

Conditions of Coherence Maximizing

1. Test sequences: 12 stereo clips from MPEG, 10 to 20 s long, sampling rate 48 kHz
2. Test subjects: 15 young listeners with normal hearing
3. Bitrate: MCR and AMR-WB+ both at 24 kbps
4. Standard: ITU-R Rec. P.830-1, relative score (MCR vs AMR-WB+)

where
- $X_l$: left channel subband spectral vector (row)
- $X_r$: right channel subband spectral vector (row)
- $\theta$: rotation angle
- $Y_{0,1}$: subband vectors after rotation (row)


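$$\begin{bmatrix} Y_0 \\ Y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X_l \\ X_r \end{bmatrix},$$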
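A minimal sketch of this rotation and its inverse is given below, treating each channel as a plain list of real subband coefficients. The function names `rotate` and `inverse_rotate` and the sample values are hypothetical. Because the rotation matrix is orthogonal, its inverse is its transpose, which is the same as rotating by −θ.

```python
import math

def rotate(xl, xr, theta):
    """Y0 = cos(t) * Xl + sin(t) * Xr,  Y1 = -sin(t) * Xl + cos(t) * Xr."""
    c, s = math.cos(theta), math.sin(theta)
    y0 = [c * a + s * b for a, b in zip(xl, xr)]
    y1 = [-s * a + c * b for a, b in zip(xl, xr)]
    return y0, y1

def inverse_rotate(y0, y1, theta):
    """Undo rotate(): the transpose of the matrix is a rotation by -theta."""
    return rotate(y0, y1, -theta)

def energy(a, b):
    """Total energy of a channel pair."""
    return sum(v * v for v in a) + sum(v * v for v in b)

xl = [1.0, -0.5, 2.0, 0.3]
xr = [0.8, 0.9, -1.1, 0.4]
theta = 0.3

y0, y1 = rotate(xl, xr, theta)
xl_back, xr_back = inverse_rotate(y0, y1, theta)

print("energy before/after:", energy(xl, xr), energy(y0, y1))
print("max reconstruction error:",
      max(abs(a - b) for a, b in zip(xl + xr, xl_back + xr_back)))
```

The total energy of the pair is unchanged by the rotation, and the original channels are recovered up to rounding, which matches the linear, invertible, and orthogonal requirements.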


1. Time-to-Frequency (T/F) mapping: MDCT, block length 2048, 48 subbands
2. Downmix: ‘0.5×left + 0.5×right’
3. Mono coder: MPEG-2 AAC
4. θMCR coding: 48 per frame, vector quantized, 4 dimensions, 2 bits per dimension
5. Decoding: essentially the inverse of the MCR encoding
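The per-frame flow of the steps above can be sketched as follows. This is a rough illustration under several assumptions, not the authors' implementation: the MDCT, the AAC mono coder, and the vector quantizer are omitted; the subband layout (`band_edges`) is a hypothetical near-uniform split because the actual layout is not given; the downmix is applied to the rotated pair; and the decoder is assumed to set both rotated channels equal to the decoded mono signal before applying the inverse rotation. All function names are hypothetical.

```python
import math

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def mcr_angle(xl, xr):
    # Assumed compact form: the angle in [-pi/2, pi/2] that maximizes <Y0, Y1>.
    return 0.5 * math.atan2(inner(xr, xr) - inner(xl, xl), 2.0 * inner(xl, xr))

def band_edges(n_coeffs, n_bands):
    # Hypothetical near-uniform subband split (the real layout is not given).
    return [round(i * n_coeffs / n_bands) for i in range(n_bands + 1)]

def encode_frame(xl, xr, n_bands=48):
    """Per subband: find the MCR angle, rotate, and downmix the rotated pair."""
    mono, thetas = [], []
    edges = band_edges(len(xl), n_bands)
    for lo, hi in zip(edges, edges[1:]):
        bl, br = xl[lo:hi], xr[lo:hi]
        theta = mcr_angle(bl, br)
        c, s = math.cos(theta), math.sin(theta)
        # Downmix 0.5 * Y0 + 0.5 * Y1 of the rotated pair.
        mono += [0.5 * ((c * a + s * b) + (-s * a + c * b)) for a, b in zip(bl, br)]
        thetas.append(theta)
    return mono, thetas

def decode_frame(mono, thetas):
    """Assumption: set Y0 = Y1 = mono per subband, then rotate back by -theta."""
    xl_hat, xr_hat = [], []
    edges = band_edges(len(mono), len(thetas))
    for lo, hi, theta in zip(edges, edges[1:], thetas):
        c, s = math.cos(theta), math.sin(theta)
        xl_hat += [(c - s) * m for m in mono[lo:hi]]
        xr_hat += [(s + c) * m for m in mono[lo:hi]]
    return xl_hat, xr_hat

# Toy frame of 1024 coefficients; the right channel is a scaled copy of the left.
xl = [math.sin(0.01 * k * k) for k in range(1024)]
xr = [0.5 * v for v in xl]

mono, thetas = encode_frame(xl, xr)
xl_hat, xr_hat = decode_frame(mono, thetas)
print(len(thetas), "angles per frame")
print("max error:", max(abs(a - b) for a, b in zip(xl + xr, xl_hat + xr_hat)))
```

For a fully coherent pair such as this toy frame, the rotated channels coincide, so the unquantized round trip is exact up to rounding. For less coherent input the reconstruction under this decoder assumption is only approximate.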

Maximal Coherence Rotation (MCR)

$$\theta_{MCR} = \arg\max_{\theta}\, \mathrm{coh}(Y_0, Y_1),$$

where $\theta_{MCR}$ is called the MCR angle.


Geometric Illustration of the Mathematical Properties of the MCR

Could We Increase the Interchannel ‘Similarity’?

Coherence Maximizing

$$\mathrm{coh}(Y_0, Y_1)\big|_{\theta=\theta_{MCR}} \ge 0.$$

Stereo Coding Gain Positively Correlates with Interchannel ‘Similarity’

Xl Xr θ Y0,1


$$\theta_{MCR} = \arg\max_{\theta}\, \langle Y_0, Y_1\rangle.$$

$$\langle Y_0, Y_1\rangle = \frac{\|X_r\|^2 - \|X_l\|^2}{2}\sin 2\theta + \cos 2\theta\,\langle X_l, X_r\rangle.$$
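This expression follows from substituting the rotation into the inner product. A short derivation sketch, using only the symbols of the general rotation form and assuming real-valued vectors:

```latex
\begin{align*}
\langle Y_0, Y_1\rangle
  &= \langle \cos\theta\, X_l + \sin\theta\, X_r,\; -\sin\theta\, X_l + \cos\theta\, X_r\rangle \\
  &= \sin\theta\cos\theta\,\bigl(\|X_r\|^2 - \|X_l\|^2\bigr)
     + (\cos^2\theta - \sin^2\theta)\,\langle X_l, X_r\rangle \\
  &= \frac{\|X_r\|^2 - \|X_l\|^2}{2}\,\sin 2\theta + \cos 2\theta\,\langle X_l, X_r\rangle .
\end{align*}
% Setting the derivative with respect to \theta to zero:
\[
\bigl(\|X_r\|^2 - \|X_l\|^2\bigr)\cos 2\theta - 2\,\langle X_l, X_r\rangle \sin 2\theta = 0
\quad\Longrightarrow\quad
\tan 2\theta = \frac{\|X_r\|^2 - \|X_l\|^2}{2\,\langle X_l, X_r\rangle}.
\]
```

The stationarity condition is what the arctangent expression for θ0 solves; the additional ±π/2 cases select the stationary point that is a maximum rather than a minimum.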

An orthogonal transform, the MCR, that maximizes Interchannel Coherence (IC)

θMCR on [−π/2, π/2]

θMCR = −π/4

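$$\theta_{MCR} = \begin{cases} \theta_0, & \langle X_l, X_r\rangle \ge 0, \\ \theta_0 - \pi/2, & \langle X_l, X_r\rangle < 0,\ \theta_0 \ge 0, \\ \theta_0 + \pi/2, & \langle X_l, X_r\rangle < 0,\ \theta_0 < 0, \end{cases}$$

where

$$\theta_0 = \frac{1}{2}\arctan\frac{\|X_r\|^2 - \|X_l\|^2}{2\langle X_l, X_r\rangle}.$$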
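The closed-form angle can be sketched in code and compared against a brute-force search over θ. The function names below are hypothetical and the vectors are toy data. The branch for a zero inner product of the inputs is an assumption added here, because the arctangent argument is undefined in that case; it is not part of the formula above.

```python
import math

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def mcr_angle_piecewise(xl, xr):
    """Piecewise closed form for the MCR angle on [-pi/2, pi/2]."""
    p = inner(xl, xr)
    d = inner(xr, xr) - inner(xl, xl)
    if p == 0.0:
        # Assumption: <Y0, Y1> reduces to (d / 2) * sin(2 * theta),
        # which is maximized at +pi/4 (d > 0) or -pi/4 (d < 0).
        return math.copysign(math.pi / 4, d) if d != 0.0 else 0.0
    theta0 = 0.5 * math.atan(d / (2.0 * p))
    if p > 0.0:
        return theta0
    return theta0 - math.pi / 2 if theta0 >= 0.0 else theta0 + math.pi / 2

def rotated_inner(xl, xr, theta):
    """<Y0, Y1> as a function of theta."""
    p = inner(xl, xr)
    d = inner(xr, xr) - inner(xl, xl)
    return 0.5 * d * math.sin(2 * theta) + math.cos(2 * theta) * p

xl = [1.0, -0.5, 2.0, 0.3]
xr = [-0.8, 0.9, -1.1, 0.4]     # negative inner product with xl

theta = mcr_angle_piecewise(xl, xr)

# Brute-force check on a grid over [-pi/2, pi/2].
grid = [-math.pi / 2 + k * math.pi / 1000 for k in range(1001)]
best = max(grid, key=lambda t: rotated_inner(xl, xr, t))

print("closed form:", theta)
print("grid search:", best)
```

The two angles should agree to within the grid spacing, and the closed-form angle should give an inner product at least as large as any grid point.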

Conclusions

θMCR = 0

The inner product as a function of theta

Suppose ⟨Xl, Xr⟩ ≥ 0.

- The MCR angle loosely corresponds to the azimuth
- If θMCR = 0, the source is located on the median plane
- If θMCR = −π/4, the source is located to the left
- If θMCR = π/4, the source is located to the right
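The correspondence can be illustrated with a single amplitude-panned source. The sketch below assumes a sine/cosine panning law (`pan`, hypothetical: α = 0 is hard left, α = π/2 is hard right), toy source samples, and an `atan2` restatement of the MCR angle; none of these are taken from the poster.

```python
import math

def inner(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def mcr_angle(xl, xr):
    # Assumed compact form: the angle in [-pi/2, pi/2] that maximizes <Y0, Y1>.
    return 0.5 * math.atan2(inner(xr, xr) - inner(xl, xl), 2.0 * inner(xl, xr))

def pan(source, alpha):
    """Hypothetical sine/cosine amplitude panning of one source."""
    left = [math.cos(alpha) * v for v in source]
    right = [math.sin(alpha) * v for v in source]
    return left, right

source = [0.3, -1.2, 0.7, 2.0, -0.4]
for alpha in (0.0, math.pi / 8, math.pi / 4, 3 * math.pi / 8, math.pi / 2):
    xl, xr = pan(source, alpha)
    print(f"alpha = {alpha:.4f}  theta_MCR = {mcr_angle(xl, xr):.4f}")
```

Under this panning assumption the MCR angle equals α − π/4: a hard-left source gives −π/4, a centered source gives 0, and a hard-right source gives π/4.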

θMCR = π/4

- The 3 MCR properties: equal power, minimized difference, and non-negative coherence
- An approximate relation between the MCR angle and the azimuth of a sound source

Circuits and Systems Lab – Department of Electronic Engineering – Tsinghua University – Beijing, China

Future work

- Extend the real MCR to a complex MCR
- Reduce the MCR artifacts on strongly harmonic or transient signals
- Extend the 2-channel MCR to a multichannel MCR

This work is supported by the National Science Foundation of China (NSFC 60832002).