Chroma and Chords - Department of Electrical Engineering

ELEN E4896 MUSIC SIGNAL PROCESSING

Lecture 11: Chroma and Chords 1. Features for Music Audio 2. Chroma Features 3. Chord Recognition

Dan Ellis

Dept. Electrical Engineering, Columbia University [email protected] E4896 Music Signal Processing (Dan Ellis)

http://www.ee.columbia.edu/~dpwe/e4896/ 2013-04-08 - 1 /18

1. Features for Music Audio

• Challenges of large music databases how to find “what we want”...

• Euclidean metaphor music tracks as points in space

• What are the dimensions?

“sound” - timbre, instruments → MFCC melody, chords → Chroma rhythm, tempo → Rhythmic bases E4896 Music Signal Processing (Dan Ellis)

2013-04-08 - 2 /18

MFCCs

Logan 2000

• The standard feature for speech recognition 0.5

Sound

0

FFT X[k] spectra

−0.5 0.25

0.255

0.26

0.265

0.27 time / s

10

Mel scale freq. warp log |X[k]|

audspec

5 0

4

0x 10

1000

2000

3000

freq / Hz

0

5

10

15

freq / Mel

0

5

10

15

freq / Mel

0

10

20

30

quefrency

2 1 0 100

50

IFFT cepstra

0

200 0

Truncate

−200

MFCCs E4896 Music Signal Processing (Dan Ellis)

2013-04-08 - 3 /18

MFCC Example

• Resynthesize by imposing spectrum on noise MFCCs capture instruments, not notes

freq / Hz

Let It Be - log-freq specgram (LIB-1) 6000 1400 300

coefficient

MFCCs 12 10 8 6 4 2

freq / Hz

Noise excited MFCC resynthesis (LIB-2) 6000 1400 300 0

5

E4896 Music Signal Processing (Dan Ellis)

10

15

20

25 time / sec

2013-04-08 - 4 /18

MFCC Artist Classification

Ellis 2007

• 20 Artists x 6 albums each train models on 5 albums, classify tracks from last • Model as MFCC mean + covariance per artist

u2 tori_amos suzanne_vega steely_dan roxette radiohead queen prince metallica madonna led_zeppelin green_day garth_brooks fleetwood_mac depeche_mode dave_matthews_b cure creedence_c_r beatles aerosmith

u2 to su st ro ra qu pr me le ma gr fl ga de da cu cr be ae


Confusion: MFCCs (acc 55.13%)

true

“single Gaussian” model 20 (mean) + 10 x 19 (covariance) parameters 55% correct (guessing ~5%)

2013-04-08 - 5 /18

2. Chroma Features MIDI note number

• What about modeling tonal content (notes)? 75

melody spotting chord recognition cover songs...

70 65 60 55

MFCCs exclude tonal content Polyphonic transcription is too hard 45 40

MIDI note number

• •

50

e.g. sinusoidal tracking: confused by harmonics

75 70 65 60 55 50

Recognized True 40 45

22

24

26

• Chroma features as solution...


28

30

32

34

2013-04-08 - 6 /18

Chroma Features

Fujishima 1999

• Idea: Project all energy onto 12 semitones regardless of octave

maintains main “musical” distinction invariant to musical equivalence no need to worry about harmonics? G F

3 2

D C A

chroma

freq / kHz

chroma

4

G F

D C

1 50 100 150 fft bin

0

2

4

6

8

time / sec

A

50

100

150

200

250 time / frame

NM

C(b) =

B(12 log2 (k/k0 )

b)W (k)|X[k]|

k=0

W(k) is weighting, B(b) selects every ~ mod12 E4896 Music Signal Processing (Dan Ellis)

2013-04-08 - 7 /18

Better Chroma

• Problems:

blurring of bins close to edges limitation of FFT bin resolution

• Solutions: D C A

)

4

freq / kHz

chroma

(

G F

0

2000 freq / Hz

3 2 1 0

2

4

6

8

time / sec

chroma

peak picking - only keep energy at center of peaks G F D C A

50

100

150

200

time / frame

Instantaneous Frequency - high-resolution estimates adapt tuning center based on histogram of pitches E4896 Music Signal Processing (Dan Ellis)

2013-04-08 - 8 /18

Chroma Resynthesis

Ellis & Poliner 2007

• Chroma describes the notes in an octave ... but not the octave • Can resynthesize by presenting all octaves ... with a smooth envelope “Shepard tones” - octave is ambiguous M

0 -10 -20 -30 -40 -50 -60

12 Shepard tone spectra freq / kHz

level / dB

b b o+ 12 yb (t) = W (o + ) cos 2 w0 t 12 o=1

Shepard tone resynth

4 3 2 1

0

500

1000

1500

2000

2500 freq / Hz

0 2

4

6

8

10 time / sec

endless sequence illusion E4896 Music Signal Processing (Dan Ellis)

2013-04-08 - 9 /18

Chroma Example

• Simple Shepard tone resynthesis

can also reimpose broad spectrum from MFCCs

freq / Hz

Let It Be - log-freq specgram (LIB-1) 6000 1400 300

chroma bin

Chroma features B A G E D C

freq / Hz

Shepard tone resynthesis of chroma (LIB-3) 6000 1400 300

freq / Hz

MFCC-filtered shepard tones (LIB-4) 6000 1400 300 0

5


10

15

20

25 time / sec

2013-04-08 - 10/18

Beat-Synchronous Chroma

Bartsch & Wakefield 2001

• Drastically reduce data size

by recording one chroma frame per beat Let It Be - log-freq specgram (LIB-1)

freq / Hz

6000 1400 300

chroma bin

Onset envelope + beat times

Beat-synchronous chroma

B A G E D C

Beat-synchronous chroma + Shepard resynthesis (LIB-6)

freq / Hz

6000 1400 300 0

5


10

15

20

25 time / sec

2013-04-08 - 11/18

3. Chord Recognition

G

5

A-C-D-F

0

A-C-E

A

B-D-G

E D C

C-E-G

chroma bin

• Beat synchronous chroma look like chords ...

10

15

20

time / sec

can we transcribe them?

• Two approaches

manual templates (prior knowledge) learned models (from training data)


2013-04-08 - 12/18

Chord Recognition System

• Analogous to speech recognition

Sheh & Ellis 2003

Gaussian models of features for each chord Hidden Markov Models for chord transitions

Beat track

Audio

test

100-1600 Hz BPF

Chroma

25-400 Hz BPF

Chroma

beat-synchronous chroma features

C maj

B A G

train

Labels

chord labels

HMM Viterbi

Root normalize

Gaussian

Unnormalize

24 Gauss models

E D C

C

D

E

G

A

B

C

D

E

G

A

B

c min

B A G E D

Resample

C b a g

Count transitions

24x24 transition matrix

f e d c B A G F E D C


C

D

EF

G

A

B c

d

e f

g

a

b

2013-04-08 - 13/18

HMMs

• Hidden Markov Models are good for inferring hidden states

.8

.8

S

underlying Markov “generative model” each state has emission distribution

.1 .1

A .1

B .1

.1

C

p(qn+1|qn) S A qn B C E

.1

.1

.7

E

qn+1 S A B C E 0 0 0 0 0

1 .8 .1 .1 0

0 .1 .8 .1 0

0 .1 .1 .7 0

0 0 0 .1 1

SAAAAAAAABBBBBBBBBCCCCBBBBBBCE


q=B

q=C 3

0.6

xn

p(x|q)

q=A

0.8

AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC

0.4

0

2

0

0.8

q=A

q=B

q=C

3

0.6 0.4

2 1

0.2 0

Observation sequence

1

0.2

xn

infer smoothed state sequence

State sequence

Emission distributions

p(x|q)

observations tell us something about state...

0

1

2

3

4

observation x

0

0

10

20

30

time step n

2013-04-08 - 14/18

HMM Inference

p(x|q) • HMM defines emission distribution and transition probabilities p(qn |qn 1 ) • Likelihood of observed given state sequence: p({xn }|{qn }) =

Model M1 0.8

0.7

A

S

0.2

0.1

E

0.1

S • • • •

S A B E

B

0.2

1

A B E 0.9 0.1 • 0.7 0.2 0.1 • 0.8 0.2 • • 1

E

States

0.9

xn Observations x1, x2, x3 p(x|B) p(x|A)

B A S

2

3

n

n

p(x|q) q{

x1

x2

x3

A 2.5 0.2 0.1 B 0.1 2.2 2.3

0.8 0.8 0.2 0.1 0.1 0.2 0.2

• By dynamic programming,

0.7

0.9

0

1

2

3

4 time n

All possible 3-emission paths Qk from S to E q0 q1 q2 q3 q4 S S S S

A A A B

A A B B

A B B B

E E E E

1)

Observation likelihoods

Paths

0.7

p(xn |qn )p(qn |qn

p(Q | M) = Πn p(qn|qn-1)

p(X | Q,M) = Πn p(xn|qn) p(X,Q | M)

.9 x .7 x .7 x .1 = 0.0441 .9 x .7 x .2 x .2 = 0.0252 .9 x .2 x .8 x .2 = 0.0288 .1 x .8 x .8 x .2 = 0.0128 Σ = 0.1109

2.5 x 0.2 x 0.1 = 0.05 0.0022 2.5 x 0.2 x 2.3 = 1.15 0.0290 2.5 x 2.2 x 2.3 = 12.65 0.3643 0.1 x 2.2 x 2.3 = 0.506 0.0065 Σ = p(X | M) = 0.4020


we can also identify the best state sequence given just the observations

2013-04-08 - 15/18

Key Normalization

• Chord transitions depend on key of piece dominant, relative minor, etc...

• Chord transition

probabilities should be key-relative estimate main key of piece rotate all chroma features learn models

aligned chroma

Taxman

I'm Only Sleeping

G F

G F

G F

D

D

D

C

C

C

A

A

A A

Love You To

C D

F G

A

Aligned Global model G F

G F

D

D

D

C

C

C

A

A A

C D

She Said She Said

C D

F G

Good Day Sunshine

A

G F

G F

D

D

D

C

C

C

A

A

A

F G

A

C D

F G

C D

F G

And Your Bird Can Sing

G F

C D

F G

A A

F G

C D

Yellow Submarine

G F

A


Eleanor Rigby

A

C D F G aligned chroma

2013-04-08 - 16/18

Chord Recognition

freq / Hz

• Often works: Audio

Let It Be/06-Let It Be

2416 761

Beatsynchronous chroma Recognized

G

C

G

G F E D C B A 0

2

A:min

a

F 4

6

C

G

F

C

C

G

F

C

8

10

12

C

A:min

G

G 14

• But only about 60% of the time E4896 Music Signal Processing (Dan Ellis)

A:min/b7 F:maj7

C

F:maj6

Ground truth chord

A:min/b7 F:maj7

240

a 16

18

2013-04-08 - 17/18

Summary

• Music Audio Features

capture information useful for classification

• Chroma Features

12 bins to robustly summarize notes

• Chord Recognition

Sometimes easy, sometimes subtle


2013-04-08 - 18/18

References • • • • • •

B. Logan, “Mel frequency cepstral coefficients for music modeling,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000. D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340,Vienna, October 2007. T. Fujishima, “Realtime chord recognition of musical sound: A system using common lisp music,” In Proc. Int. Comp. Music Conf., pp. 464–467, Beijing, 1999. D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007. M. A. Bartsch and G. H. Wakefield, “To catch a chorus: Using chroma-based representations for audio thumbnailing,” in Proc. IEEE WASPAA, Mohonk, October 2001. A. Sheh and D. Ellis, “Chord Segmentation and Recognition using EM-Trained Hidden Markov Models,” Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003.


2013-04-08 - 19/18