MAXIMAL COHERENCE ROTATION FOR STEREO CODING
Shuhua Zhang¹, Weibei Dou², and Huazhong Yang
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University
Introduction
Stereo Audio Coding: Channel Representation Models
- Mid/Side (M/S) stereo: “sum + difference”
- Intensity Stereo (IS): “one base channel + a few intensity ratios”
- AMR-WB+ stereo module: “one base channel + a few interchannel prediction coefficients and gains”
- Binaural Cue Coding (BCC) and MPEG-4 Parametric Stereo (PS): “one base channel + a few spatial parameters”
- Karhunen-Loève Transform (KLT): “main + minor”
Applications in Stereo Coding
Mathematical Properties of the MCR
MCR as a Stereo Coding Tool
- MCR stereo coding scheme
- $Y_0$ and $Y_1$ have equal length after the MCR rotation:
  $\|Y_0\| = \|Y_1\| = \sqrt{\tfrac{1}{2}\left(\|X_l\|^2 + \|X_r\|^2\right)}$.
- The difference between $Y_0$ and $Y_1$, $Y_d = Y_0 - Y_1$, is minimized, since
  $\|Y_d\|^2 = \|X_l\|^2 + \|X_r\|^2 - 2\langle Y_0, Y_1\rangle$.
- The coherence after the MCR rotation is always non-negative.
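The three properties can be checked numerically. The sketch below is our own illustration, not the authors' code: it assumes subband vectors are plain Python lists of real MDCT coefficients, uses hypothetical helper names (`rotate`, `mcr_angle`, `check_properties`), and computes θMCR with the piecewise closed form for θ0 that appears later on the poster.

```python
import math
import random


def dot(u, v):
    """Inner product <u, v> of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def rotate(xl, xr, theta):
    """[Y0; Y1] = [[cos t, sin t], [-sin t, cos t]] [Xl; Xr], applied per coefficient."""
    c, s = math.cos(theta), math.sin(theta)
    y0 = [c * l + s * r for l, r in zip(xl, xr)]
    y1 = [-s * l + c * r for l, r in zip(xl, xr)]
    return y0, y1


def mcr_angle(xl, xr):
    """Piecewise closed form for theta_MCR on [-pi/2, pi/2]."""
    el, er, c = dot(xl, xl), dot(xr, xr), dot(xl, xr)
    if c == 0.0:
        # arctan(+/-inf) = +/-pi/2; if el == er every angle gives the same inner product.
        theta0 = math.copysign(math.pi / 4, er - el) if er != el else 0.0
    else:
        theta0 = 0.5 * math.atan((er - el) / (2.0 * c))
    if c >= 0.0:
        return theta0
    return theta0 - math.pi / 2 if theta0 >= 0.0 else theta0 + math.pi / 2


def check_properties(xl, xr, tol=1e-9):
    """Return (equal length, ||Yd||^2 identity, non-negative coherence) after the MCR."""
    y0, y1 = rotate(xl, xr, mcr_angle(xl, xr))
    energy = dot(xl, xl) + dot(xr, xr)
    n0, n1 = math.sqrt(dot(y0, y0)), math.sqrt(dot(y1, y1))
    yd = [a - b for a, b in zip(y0, y1)]
    equal_length = abs(n0 - n1) < tol and abs(n0 - math.sqrt(energy / 2)) < tol
    yd_identity = abs(dot(yd, yd) - (energy - 2 * dot(y0, y1))) < tol
    non_negative = dot(y0, y1) >= -tol
    return equal_length, yd_identity, non_negative


random.seed(0)
xl = [random.gauss(0, 1) for _ in range(16)]
xr = [random.gauss(0, 1) for _ in range(16)]
print(check_properties(xl, xr))  # (True, True, True)
```

The tolerance only absorbs floating-point rounding; the three properties hold for any real input pair.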
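[Figure: block diagram of the MCR stereo coding scheme (T/F, Find θMCR, Apply MCR, Down Mix, Mono Encode, Quantize θMCR; outputs: Mono Bitstream and Parameter Bitstream)]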
Subjective Listening Test
Coherence: Measuring Interchannel Similarity
$\mathrm{coh}(Y_0, Y_1) = \dfrac{\langle Y_0, Y_1\rangle}{\|Y_0\| \cdot \|Y_1\|}$
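A direct transcription of this definition, as a sketch assuming real-valued subband vectors stored as Python lists; the function name `coherence` and the error raised on all-zero input are our own choices.

```python
import math


def coherence(y0, y1):
    """coh(Y0, Y1) = <Y0, Y1> / (||Y0|| * ||Y1||) for real vectors of equal length."""
    inner = sum(a * b for a, b in zip(y0, y1))
    norm0 = math.sqrt(sum(a * a for a in y0))
    norm1 = math.sqrt(sum(b * b for b in y1))
    if norm0 == 0.0 or norm1 == 0.0:
        raise ValueError("coherence is undefined when a vector is all zeros")
    return inner / (norm0 * norm1)


print(coherence([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))     # ≈ 1.0: scaled copy
print(coherence([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]))  # ≈ -1.0: out of phase
print(coherence([1.0, 0.0], [0.0, 1.0]))               # 0.0: orthogonal
```

By the Cauchy-Schwarz inequality the value always lies in [−1, 1]; the value −1 is the out-of-phase case that the MCR is designed to avoid.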
Binaural Hearing Scene: Rotating Sound Sources to the Median Plane
- Linear: easy computation
- Invertible: recovers the original channel pair
- Orthogonal: no amplification of quantization noise
General Form: Rotation on the 2-D Plane
Considering only intensity (level)-induced lateralization:
- After MCR, the Interchannel Level Difference (ILD) is 0 dB: the source is virtually moved to the median plane.
- After MCR, the Interchannel Coherence (IC) increases: the source becomes more compact, i.e. the sound image width narrows.
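A small numerical illustration of the two bullet points, assuming a hypothetical test signal: one source amplitude-panned left of centre with gains (cos α, sin α), plus a weak independent component in each channel so that the pair is not fully coherent. The panning model, the noise level 0.3 and all helper names are assumptions for illustration only.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rotate(xl, xr, theta):
    """Y0 = cos(t) Xl + sin(t) Xr, Y1 = -sin(t) Xl + cos(t) Xr."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * l + s * r for l, r in zip(xl, xr)],
            [-s * l + c * r for l, r in zip(xl, xr)])


def mcr_angle(xl, xr):
    """Piecewise closed form for theta_MCR on [-pi/2, pi/2]."""
    el, er, c = dot(xl, xl), dot(xr, xr), dot(xl, xr)
    if c == 0.0:
        theta0 = math.copysign(math.pi / 4, er - el) if er != el else 0.0
    else:
        theta0 = 0.5 * math.atan((er - el) / (2.0 * c))
    if c >= 0.0:
        return theta0
    return theta0 - math.pi / 2 if theta0 >= 0.0 else theta0 + math.pi / 2


def ild_db(u, v):
    """Interchannel level difference 10*log10(||u||^2 / ||v||^2), in dB."""
    return 10.0 * math.log10(dot(u, u) / dot(v, v))


def ic(u, v):
    """Interchannel coherence, i.e. coh(u, v)."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


random.seed(0)
alpha = math.pi / 8                       # assumed panning angle, left of centre
src = [random.gauss(0, 1) for _ in range(256)]
xl = [math.cos(alpha) * v + 0.3 * random.gauss(0, 1) for v in src]
xr = [math.sin(alpha) * v + 0.3 * random.gauss(0, 1) for v in src]
y0, y1 = rotate(xl, xr, mcr_angle(xl, xr))
print(f"before MCR: ILD = {ild_db(xl, xr):6.2f} dB, IC = {ic(xl, xr):.3f}")
print(f"after MCR:  ILD = {ild_db(y0, y1):6.2f} dB, IC = {ic(y0, y1):.3f}")
```

The ILD after rotation is 0 dB up to rounding. One can also show with the Cauchy-Schwarz inequality that the IC after the MCR is never below the magnitude of the IC before it, with equality only when the input channels already have equal power or are fully coherent.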
[Figure: relative score (MCR vs AMR-WB+) for test items es01–es03, sm01–sm03, si01–si03, sc01–sc03 and their mean]
Relative score scale: −3 much worse, −2 worse, −1 slightly worse, 0 equal, 1 slightly better, 2 better, 3 much better
Results
MCR as a Stereo Preprocessing Tool
- Downmix distortion: out-of-phase channel pairs annihilate
- After the MCR: always in phase, $\mathrm{coh}(Y_0, Y_1) \ge 0$
- After the MCR: positive and larger IC
- Downmixing after the MCR: no annihilation
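The annihilation problem and its fix can be reproduced with a worst-case input: a right channel that is the exact out-of-phase copy of the left. This sketch assumes the piecewise θMCR formula given on the poster and the poster's ‘0.5 × left + 0.5 × right’ downmix; helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rotate(xl, xr, theta):
    """Y0 = cos(t) Xl + sin(t) Xr, Y1 = -sin(t) Xl + cos(t) Xr."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * l + s * r for l, r in zip(xl, xr)],
            [-s * l + c * r for l, r in zip(xl, xr)])


def mcr_angle(xl, xr):
    """Piecewise closed form for theta_MCR on [-pi/2, pi/2]."""
    el, er, c = dot(xl, xl), dot(xr, xr), dot(xl, xr)
    if c == 0.0:
        theta0 = math.copysign(math.pi / 4, er - el) if er != el else 0.0
    else:
        theta0 = 0.5 * math.atan((er - el) / (2.0 * c))
    if c >= 0.0:
        return theta0
    return theta0 - math.pi / 2 if theta0 >= 0.0 else theta0 + math.pi / 2


def downmix(a, b):
    """'0.5 x left + 0.5 x right', applied coefficient by coefficient."""
    return [0.5 * (p + q) for p, q in zip(a, b)]


xl = [1.0, -0.5, 0.25, 2.0]
xr = [-v for v in xl]                  # right = out-of-phase copy of left

plain = downmix(xl, xr)
y0, y1 = rotate(xl, xr, mcr_angle(xl, xr))
after = downmix(y0, y1)
print(dot(plain, plain))   # 0.0: the channel pair annihilates
print(dot(after, after))   # ≈ 5.3125 (= ||Xl||^2): the signal survives
print(dot(y0, y1) >= 0)    # True: the rotated pair is in phase
```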
The MCR Angle and Sound Source Azimuth
Maximizing coherence ⇔ Maximizing inner product
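Why the two maximizations coincide is not spelled out on the poster; the derivation sketch below is our own reasoning, built on the poster's closed form for ⟨Y0, Y1⟩. Write $E = \|X_l\|^2 + \|X_r\|^2 > 0$, $A = (\|X_r\|^2 - \|X_l\|^2)/2$, $c = \langle X_l, X_r\rangle$, $R = \sqrt{A^2 + c^2}$ (assumed $> 0$), and choose $\varphi$ with $\cos\varphi = c/R$, $\sin\varphi = A/R$:

```latex
\begin{aligned}
\langle Y_0, Y_1\rangle &= A\sin 2\theta + c\cos 2\theta = R\cos\psi, \qquad \psi = 2\theta - \varphi,\\
\|Y_0\|^2 - \|Y_1\|^2 &= 2c\sin 2\theta - 2A\cos 2\theta = 2R\sin\psi,\\
\|Y_0\|^2\,\|Y_1\|^2 &= \tfrac{1}{4}\left(E^2 - 4R^2\sin^2\psi\right),\\
\mathrm{coh}(Y_0, Y_1) &= \frac{k\cos\psi}{\sqrt{1 - k^2\sin^2\psi}}, \qquad k = \frac{2R}{E}.
\end{aligned}
```

Cauchy-Schwarz gives $4R^2 = (\|X_r\|^2 - \|X_l\|^2)^2 + 4c^2 \le E^2$, so $k \le 1$. For $k < 1$ and $\cos\psi > 0$ the coherence falls as $\sin^2\psi$ grows, so it peaks at $\psi = 0$, exactly where the inner product $R\cos\psi$ peaks (for $k = 1$ the coherence equals 1 whenever $\cos\psi > 0$, so $\psi = 0$ is still a maximizer). At $\psi = 0$ the power difference $2R\sin\psi$ vanishes and $\mathrm{coh} = k \ge 0$, matching the equal-power and non-negative-coherence properties.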
1. Speech: essentially the same quality
2. Music: mostly better (MCR vs AMR-WB+)
3. Overall: 0.5 higher on average
4. Low score of si02: mainly due to pre-echo (strongly transient components), but the stereo impression is better
5. Low score of si03: mainly due to envelope undulation (MDCT processing of strongly harmonic components); the stereo impression is also better
Conditions of Coherence Maximizing
1. Test sequences: 12 stereo clips from MPEG, 10–20 s long, sampling rate 48 kHz
2. Test subjects: 15 young listeners with normal hearing
3. Bitrate: MCR and AMR-WB+ both at 24 kbps
4. Standard: ITU-R Rec. P.830-1, relative score (MCR vs AMR-WB+)
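$\begin{bmatrix} Y_0 \\ Y_1 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X_l \\ X_r \end{bmatrix}$,
where $X_l$ is the left channel subband spectral vector (row), $X_r$ is the right channel subband spectral vector (row), $\theta$ is the rotation angle, and $Y_0$, $Y_1$ are the subband vectors after rotation (row).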
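The rotation and its inverse can be sketched in a few lines, which also makes the “linear, invertible, orthogonal” properties concrete: the inverse is the transpose, and the total energy ‖Xl‖² + ‖Xr‖² is unchanged. Function names are hypothetical; vectors are assumed to be real-valued Python lists.

```python
import math


def mcr_rotate(xl, xr, theta):
    """Forward rotation: Y0 = cos(t) Xl + sin(t) Xr, Y1 = -sin(t) Xl + cos(t) Xr."""
    c, s = math.cos(theta), math.sin(theta)
    y0 = [c * l + s * r for l, r in zip(xl, xr)]
    y1 = [-s * l + c * r for l, r in zip(xl, xr)]
    return y0, y1


def mcr_unrotate(y0, y1, theta):
    """Inverse rotation: the matrix is orthogonal, so its inverse is its transpose."""
    c, s = math.cos(theta), math.sin(theta)
    xl = [c * a - s * b for a, b in zip(y0, y1)]
    xr = [s * a + c * b for a, b in zip(y0, y1)]
    return xl, xr


xl, xr = [0.5, -1.0, 2.0], [1.5, 0.25, -0.75]
theta = 0.3                                   # arbitrary test angle
y0, y1 = mcr_rotate(xl, xr, theta)
back_l, back_r = mcr_unrotate(y0, y1, theta)
energy_in = sum(v * v for v in xl + xr)
energy_out = sum(v * v for v in y0 + y1)
print(max(abs(a - b) for a, b in zip(xl + xr, back_l + back_r)))  # rounding-level error
print(energy_in, energy_out)                                       # equal up to rounding
```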
1. Time-to-frequency (T/F) mapping: MDCT, block length 2048, 48 subbands
2. Downmix: ‘0.5 × left + 0.5 × right’
3. Mono coder: MPEG-2 AAC
4. θMCR coding: 48 angles per frame, vector quantized in 4 dimensions with 2 bits per dimension
5. Decoding: essentially the inverse of the MCR encoding
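A per-frame sketch of items 1, 2, 4 and 5 under stated assumptions: the MDCT and the AAC mono coder are left out, each subband is a plain list of coefficients, the downmix is applied to the rotated pair (Y0, Y1), angles are kept unquantized (the poster's 4-dimensional vector quantizer is not reproduced), and the hypothetical decoder sets Y0 = Y1 = downmix before undoing the rotation. Item 5 only says decoding is “essentially the inverse”, so this decoder is our guess, not the authors' design. The side-information rate assumes a hop of 1024 samples, i.e. 50 % overlap of the 2048-sample MDCT blocks.

```python
import math

SUBBANDS = 48                 # poster: 48 subbands, one theta_MCR each per frame
VQ_DIM, BITS_PER_DIM = 4, 2   # poster: 4-dimensional VQ, 2 bits per dimension
BITS_PER_FRAME = (SUBBANDS // VQ_DIM) * VQ_DIM * BITS_PER_DIM   # 12 vectors x 8 bits
HOP = 1024                    # assumption: 50 % overlap of 2048-sample MDCT blocks
FS = 48000                    # poster: 48 kHz test material


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mcr_angle(xl, xr):
    """Piecewise closed form for theta_MCR on [-pi/2, pi/2]."""
    el, er, c = dot(xl, xl), dot(xr, xr), dot(xl, xr)
    if c == 0.0:
        theta0 = math.copysign(math.pi / 4, er - el) if er != el else 0.0
    else:
        theta0 = 0.5 * math.atan((er - el) / (2.0 * c))
    if c >= 0.0:
        return theta0
    return theta0 - math.pi / 2 if theta0 >= 0.0 else theta0 + math.pi / 2


def encode_frame(left_bands, right_bands):
    """Per subband: find theta_MCR, apply the MCR, then downmix 0.5*Y0 + 0.5*Y1."""
    angles, mono = [], []
    for xl, xr in zip(left_bands, right_bands):
        theta = mcr_angle(xl, xr)
        c, s = math.cos(theta), math.sin(theta)
        y0 = [c * l + s * r for l, r in zip(xl, xr)]
        y1 = [-s * l + c * r for l, r in zip(xl, xr)]
        angles.append(theta)
        mono.append([0.5 * (a + b) for a, b in zip(y0, y1)])
    return angles, mono


def decode_frame(angles, mono):
    """Hypothetical decoder: take Y0 = Y1 = downmix and apply the transposed rotation."""
    left_bands, right_bands = [], []
    for theta, m in zip(angles, mono):
        c, s = math.cos(theta), math.sin(theta)
        left_bands.append([(c - s) * v for v in m])    # Xl = c*Y0 - s*Y1
        right_bands.append([(s + c) * v for v in m])   # Xr = s*Y0 + c*Y1
    return left_bands, right_bands


print(BITS_PER_FRAME, "bits/frame ->", BITS_PER_FRAME * FS / HOP, "bit/s")  # 96 bits/frame -> 4500.0 bit/s
```

For a single amplitude-panned source per subband the rotated channels are identical, so this decoder returns the input up to rounding; for decorrelated content the discarded difference Yd is lost.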
Maximal Coherence Rotation (MCR)
$\theta_{\mathrm{MCR}} = \arg\max_{\theta} \mathrm{coh}(Y_0, Y_1)$,
where $\theta_{\mathrm{MCR}}$ is called the MCR angle.
Geometric Illustration of the Mathematical Properties of the MCR
Could We Increase the Interchannel ‘Similarity’?
Coherence Maximizing
$\mathrm{coh}(Y_0, Y_1)\big|_{\theta = \theta_{\mathrm{MCR}}} \ge 0$.
Stereo Coding Gain Positively Correlates with Interchannel ‘Similarity’
$\theta_{\mathrm{MCR}} = \arg\max_{\theta} \langle Y_0, Y_1\rangle$,
$\langle Y_0, Y_1\rangle = \dfrac{\|X_r\|^2 - \|X_l\|^2}{2}\sin 2\theta + \cos 2\theta\,\langle X_l, X_r\rangle$.
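The closed form can be checked against a direct computation of ⟨Y0, Y1⟩ over a grid of angles. This is a sanity-check sketch with random test vectors and hypothetical function names, not part of the coding scheme.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def inner_direct(xl, xr, theta):
    """Rotate, then take <Y0, Y1> explicitly."""
    c, s = math.cos(theta), math.sin(theta)
    y0 = [c * l + s * r for l, r in zip(xl, xr)]
    y1 = [-s * l + c * r for l, r in zip(xl, xr)]
    return dot(y0, y1)


def inner_closed_form(xl, xr, theta):
    """(||Xr||^2 - ||Xl||^2)/2 * sin(2 theta) + cos(2 theta) * <Xl, Xr>."""
    return (0.5 * (dot(xr, xr) - dot(xl, xl)) * math.sin(2 * theta)
            + math.cos(2 * theta) * dot(xl, xr))


random.seed(0)
xl = [random.gauss(0, 1) for _ in range(32)]
xr = [random.gauss(0, 1) for _ in range(32)]
angles = [-math.pi / 2 + k * math.pi / 50 for k in range(51)]
worst = max(abs(inner_direct(xl, xr, t) - inner_closed_form(xl, xr, t)) for t in angles)
print(worst)  # rounding-level difference only
```

Because the expression is a sinusoid in 2θ, it repeats every π, which is why θMCR only needs to be sought on an interval of length π such as [−π/2, π/2].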
An orthogonal transform, the MCR, that maximizes the Interchannel Coherence (IC)
θMCR on [−π/2, π/2]
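$\theta_{\mathrm{MCR}} = \begin{cases} \theta_0, & \langle X_l, X_r\rangle \ge 0, \\ \theta_0 - \pi/2, & \langle X_l, X_r\rangle < 0,\ \theta_0 \ge 0, \\ \theta_0 + \pi/2, & \langle X_l, X_r\rangle < 0,\ \theta_0 < 0, \end{cases}$
where $\theta_0 = \dfrac{1}{2}\arctan\dfrac{\|X_r\|^2 - \|X_l\|^2}{2\langle X_l, X_r\rangle}$.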
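A direct transcription of the piecewise rule, cross-checked against a brute-force search of the closed-form inner product over [−π/2, π/2]. The handling of ⟨Xl, Xr⟩ = 0, where the arctan argument is infinite, is our own assumption; the poster does not spell it out.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mcr_angle(xl, xr):
    """theta_MCR on [-pi/2, pi/2] following the three cases above."""
    el, er, c = dot(xl, xl), dot(xr, xr), dot(xl, xr)
    if c == 0.0:
        # Assumption: take arctan(+/-inf) = +/-pi/2, i.e. theta0 = +/-pi/4.
        theta0 = math.copysign(math.pi / 4, er - el) if er != el else 0.0
    else:
        theta0 = 0.5 * math.atan((er - el) / (2.0 * c))
    if c >= 0.0:                 # case 1: <Xl, Xr> >= 0
        return theta0
    if theta0 >= 0.0:            # case 2: <Xl, Xr> < 0, theta0 >= 0
        return theta0 - math.pi / 2
    return theta0 + math.pi / 2  # case 3: <Xl, Xr> < 0, theta0 < 0


def inner(xl, xr, theta):
    """Closed-form <Y0, Y1> as a function of theta."""
    return (0.5 * (dot(xr, xr) - dot(xl, xl)) * math.sin(2 * theta)
            + math.cos(2 * theta) * dot(xl, xr))


random.seed(1)
xl = [random.gauss(0, 1) for _ in range(32)]
xr = [-0.8 * v + 0.3 * random.gauss(0, 1) for v in xl]   # mostly out of phase
theta = mcr_angle(xl, xr)
grid = [-math.pi / 2 + k * math.pi / 20000 for k in range(20001)]
best = max(grid, key=lambda t: inner(xl, xr, t))
print(theta, best)
print(inner(xl, xr, theta) - inner(xl, xr, best))  # non-negative up to rounding
```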
Conclusions
[Figure: the inner product ⟨Y0, Y1⟩ as a function of θ]
Suppose $\langle X_l, X_r\rangle \ge 0$:
- The MCR angle loosely corresponds to the azimuth.
- If θMCR = 0, the source is located on the median plane.
- If θMCR = −π/4, the source is located to the left.
- If θMCR = π/4, the source is located to the right.
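A quick check of this correspondence, assuming the simplest level-only model: a single source panned with gains (cos α, sin α), where α = 0 is hard left, α = 45° is the centre and α = 90° is hard right. For this model the MCR angle works out to θMCR = α − 45° exactly, which reproduces the three cases above; the panning law, the test spectrum and the helper names are assumptions for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mcr_angle(xl, xr):
    """Piecewise closed form for theta_MCR on [-pi/2, pi/2]."""
    el, er, c = dot(xl, xl), dot(xr, xr), dot(xl, xr)
    if c == 0.0:
        theta0 = math.copysign(math.pi / 4, er - el) if er != el else 0.0
    else:
        theta0 = 0.5 * math.atan((er - el) / (2.0 * c))
    if c >= 0.0:
        return theta0
    return theta0 - math.pi / 2 if theta0 >= 0.0 else theta0 + math.pi / 2


src = [0.8, -1.3, 2.1, 0.4]           # hypothetical source spectrum
for deg in (0.0, 22.5, 45.0, 67.5, 90.0):
    alpha = math.radians(deg)         # panning angle: 0 = hard left, 90 = hard right
    xl = [math.cos(alpha) * v for v in src]
    xr = [math.sin(alpha) * v for v in src]
    theta = mcr_angle(xl, xr)
    print(f"alpha = {deg:5.1f} deg -> theta_MCR = {math.degrees(theta):6.2f} deg")
```

With non-negative gains ⟨Xl, Xr⟩ ≥ 0 always holds here, so only the first case of the piecewise rule is exercised.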
- The three MCR properties: equal power, minimized difference, and non-negative coherence
- An approximate relation between the MCR angle and the azimuth of a sound source
Circuits and Systems Lab – Department of Electronic Engineering – Tsinghua University – Beijing, China
Future Work
- Extend the real MCR to a complex MCR
- Reduce MCR artifacts on strongly harmonic or transient signals
- Extend the two-channel MCR to a multichannel MCR
This work is supported by the National Science Foundation of China (NSFC 60832002).