Topic Models for Signal Processing
Bhiksha Raj, Carnegie Mellon University
Paris Smaragdis, University of Illinois, Urbana‐Champaign
http://topicmodelsforaudio.com
ICASSP 2011 Tutorial: Applications of Topic Models for Signal Processing
The Engineer and the Musician
Once upon a time a rich potentate discovered a previously unknown recording of a beautiful piece of music. Unfortunately it was badly damaged. He greatly wanted to find out what it would sound like if it were not. So he hired an engineer and a musician to solve the problem.
The Engineer and the Musician
The engineer worked for many years. He spent much money and published many papers. Finally he had a somewhat scratchy restoration of the music.
The musician listened to the music carefully for a day, transcribed it, broke out his trusty keyboard and replicated the music.
The Prize
Who do you think won the princess?
Sounds – an example
• A sequence of notes
• Chords from the same notes
• A piece of music from the same (and a few additional) notes
Sounds – another example
• A sequence of sounds
• A proper speech utterance from the same sounds
Template Sounds Combine to Form a Signal
• The component sounds "combine" to form complex sounds
  – Notes form music
  – Phoneme‑like structures combine in utterances
• Sound in general is composed of such "building blocks" or themes
  – Which can be simple, e.g. notes, or complex, e.g. phonemes
  – Our definition of a building block: the entire structure occurs repeatedly in the process of forming the signal
• Goal: Learn these building blocks automatically, from data
  – Learn to be the musician
  – But through automated analysis of data
The representation
[Figure: a waveform (amplitude vs. time) and its spectrogram (frequency vs. time)]
• Requirement: Building blocks must combine additively
  – Linear: The presence of two notes does not distort either note
  – Constructive: Notes do not cancel – adding one note will not diminish another
• Spectral representations: The magnitude spectra of uncorrelated signals add
  – In theory, power spectra add. In practice, this holds better for magnitude spectra
• We represent signals spectrographically, as a sequence of magnitude spectral vectors estimated from (overlapping) segments of the signal
Discovering Building Blocks
• The magnitude spectrogram is a matrix of numbers
  – The f‑th frequency component in frame t is the (t, f)‑th entry in the matrix
• Standard matrix decomposition methods may be applied to discover additive building blocks: PCA, ICA, ...
• Constraint: PCA, ICA etc. will result in bases with negative components
  – Not physically meaningful (what is a negative power spectrum?)
[Figure: spectrogram of two intermittent tones, and the PCA bases recovered from the tones sound]
Magnitude Spectra are like Histograms
• Magnitude spectrograms comprise sequences of magnitude spectra
• Magnitude spectra are sequences of non‑negative numbers
  – Which can be viewed as scaled histograms
• Histograms may be viewed as the outcome of draws from a multinomial distribution
• Analysis techniques related to multinomial processes apply
What follows
• A tutorial in two parts
• Part A:
  – Model definition
  – Equations, update rules and other squiggles
• Part B:
  – Applications, extensions and the other fun stuff
• What is covered
  – Application of multinomial and topic models to audio
• What is not covered
  – Text and text‑related models
Part A: Overview
• Multinomial distributions and topic models
• Signal representations
• Application of PLCA to signals
  – Bag of frequencies model
  – Bag of spectrograms model
• Latent variable models with priors
  – Sparsity and entropic priors
  – Cross‑entropic priors
Part A: Overview
• Convolutional models
• High‑dimensional data
  – Tensors and higher‑dimensional data
Part B: Overview
• Low‑Rank Models of Signals
• Separation of Monophonic Mixtures
• Recognition in Mixtures
• Temporal Demixing
• Pitch Tracking
• User Interfaces
• Multimodality
The Multinomial Distribution
• The outcome of a draw can take one of a finite number of discrete values
  – e.g. a man drawing coloured balls from an urn
• The probability of retrieving a ball of a given color depends on the fraction of balls in the urn that are of that color
  – P(red) = no. of red balls / total no. of balls
A probability distribution over words
[Figure: an urn of balls labeled with words ("lion", "dog", "cat") and the resulting distribution P(word)]
• The balls have words written on them
  – i.e. draws now obtain words, not colors
• P(word) = no. of balls with the given word on them / total no. of balls
The probability of a collection
[Figure: a collection of drawn words – dog × num(dog), cat × num(cat), lion × num(lion)]

P\big(n(\text{word}_1)\ \text{of}\ \text{word}_1,\ n(\text{word}_2)\ \text{of}\ \text{word}_2,\ldots\big) \;=\; C \prod_{\text{word}} P(\text{word})^{\,n(\text{word})}

\log P\big(n(\text{word}_1)\ \text{of}\ \text{word}_1,\ n(\text{word}_2)\ \text{of}\ \text{word}_2,\ldots\big) \;=\; \sum_{\text{word}} n(\text{word}) \log P(\text{word}) \;+\; C'

• Probability of drawing a specific collection of words
• The constant C accounts for permutations
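As a concrete illustration (Python, not part of the original slides), a minimal sketch of the log-probability above; the function name and the example counts are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

# Sketch: log-probability of a collection of word counts under a multinomial.
def multinomial_log_prob(counts, probs):
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # log C: the multinomial coefficient that accounts for permutations
    log_coeff = gammaln(counts.sum() + 1) - gammaln(counts + 1).sum()
    return log_coeff + np.sum(counts * np.log(probs))

# Example: counts of "lion", "dog", "cat" drawn from a known urn
print(multinomial_log_prob([3, 5, 4], [0.25, 0.40, 0.35]))
```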
Estimating probabilities
[Figure: Given a collection of drawn words (dog × num(dog), cat × num(cat), lion × num(lion)), find the underlying distribution P(word)]
• Given: A collection of words, find the underlying probability distribution
  – A problem we encounter repeatedly
• The maximum likelihood estimate:

\{\hat P(\text{word})\} \;=\; \arg\max_{\{P(\text{word})\}} \sum_{\text{word}} n(\text{word}) \log P(\text{word})

\hat P(\text{word}) \;=\; \frac{n(\text{word})}{\sum_{\text{word}'} n(\text{word}')}
The expected counts
[Figure: an urn with known word distribution P(word)]
• Given:
  – The probabilities of all words, P(word)
  – The total number of draws, N
• What is the expected number of draws of any word?
• Answer: E[n(word)] = N · P(word)
The inverse problem
[Figure: an urn with known word distribution P(word)]
• Given: the number of times "lion" was drawn, N(lion)
• How many times was "dog" drawn?

N(\text{dog}) \;=\; \frac{N(\text{lion})}{P(\text{lion})}\, P(\text{dog})
The inverse multinomial
• Given P(word) for all words
• Observed n(word_1), n(word_2), ..., n(word_k)
• What is n(word_{k+1}), n(word_{k+2}), ...?

P\big(n(\text{word}_{k+1}),\, n(\text{word}_{k+2}),\ldots\big) \;=\; \frac{\big(N_o + \sum_{i>k} n(\text{word}_i)\big)!}{N_o!\,\prod_{i>k} n(\text{word}_i)!}\; P_o^{\,N_o} \prod_{i>k} P(\text{word}_i)^{\,n(\text{word}_i)}

• N_o is the total number of observed counts: n(word_1) + n(word_2) + …
• P_o is the total probability of the observed events: P(word_1) + P(word_2) + …
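A minimal sketch (Python, assumed, not from the slides) of the corresponding expected counts: following the proportionality on the previous slide, the expected count of an unobserved word is N_o · P(word) / P_o. The numbers below are made up.

```python
# Known urn distribution and the counts we actually observed
P = {"lion": 0.2, "dog": 0.5, "cat": 0.3}
observed = {"lion": 4, "dog": 10}

N_o = sum(observed.values())          # total observed count
P_o = sum(P[w] for w in observed)     # total probability of observed words

for w in P:
    if w not in observed:
        # Expected count of an unobserved word given the observed total
        print(w, N_o * P[w] / P_o)
```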
A Modified Experiment
[Figure: a sequence of drawn words ("dog dog lion cat dog lion ...") and multiple urns, each with its own mix of "lion", "dog" and "cat" balls]
• Multiple pots with different distributions
  – The picker first selects a pot with probability P(pot)
  – Then he draws a ball from that pot
• We only see the outcomes!
• Can we estimate the pots?
A mixture multinomial
• The outcomes are a mixture of draws from many pots
  – i.e. a mixture of draws from several multinomials
[Figure: a sequence of drawn words ("dog dog lion cat dog lion ...") produced by draws from several urns]
• The process is a mixture multinomial
• Probability of a word:

P(X) \;=\; \sum_{Z} P(Z)\, P(X \mid Z)

• Z is the pot
• X are the words
Mixture multinomials are multinomials too
[Figure: several urns, each with its own mix of "lion", "dog" and "cat" balls]
• From the outside viewer's perspective, it is just a multinomial
  – The outcome is one of a countable, finite set of values
• Constraint: the probabilities are composed from component multinomials

P(X) \;=\; \sum_{Z} P(Z)\, P(X \mid Z)
Multinomials and Simplexes
• Multinomials are probability vectors
  – If the vocabulary size is V, they live in a (V−1)-dimensional subset (the probability simplex) of V-space
  – Strictly in the positive orthant
• Mixture multinomials reside within the simplex specified by the component multinomials
Estimating Mixture Multinomials
• Log-likelihood of the collection of counts:

\log P\big(n(X_1)\ \text{of}\ X_1,\ n(X_2)\ \text{of}\ X_2,\ldots\big) \;=\; \sum_{X} n(X) \log P(X) \;=\; \sum_{X} n(X) \log \sum_{Z} P(Z)\, P(X \mid Z)

• Direct optimization is not possible; use Expectation Maximization
• Introducing Q(X, Z):

\sum_{X} n(X) \log \sum_{Z} Q(X,Z)\, \frac{P(Z)\, P(X \mid Z)}{Q(X,Z)} \;\ge\; \sum_{X} n(X) \sum_{Z} Q(X,Z) \log \frac{P(Z)\, P(X \mid Z)}{Q(X,Z)}

• Optimizing with respect to Q(X, Z):

Q(X,Z) \;=\; \frac{P(Z)\, P(X \mid Z)}{\sum_{Z'} P(Z')\, P(X \mid Z')}

• Optimizing P(Z) and P(X | Z):

P(Z) \;=\; \frac{\sum_{X} n(X)\, Q(X,Z)}{\sum_{Z'} \sum_{X} n(X)\, Q(X,Z')}
\qquad
P(X \mid Z) \;=\; \frac{n(X)\, Q(X,Z)}{\sum_{X'} n(X')\, Q(X',Z)}
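A minimal sketch (Python, not from the slides) of these EM updates for a mixture of multinomials; the random initialization, the number of pots and the toy counts are assumptions for illustration.

```python
import numpy as np

def em_mixture_multinomial(counts, n_pots, n_iter=100, seed=0):
    """counts: (V,) word counts. Returns P(Z) and P(X|Z)."""
    rng = np.random.default_rng(seed)
    V = len(counts)
    p_z = np.full(n_pots, 1.0 / n_pots)                  # P(Z)
    p_x_given_z = rng.dirichlet(np.ones(V), n_pots).T    # (V, n_pots): P(X|Z)
    for _ in range(n_iter):
        # E-step: Q(X, Z) proportional to P(Z) P(X|Z)
        q = p_x_given_z * p_z
        q /= q.sum(axis=1, keepdims=True)
        # M-step: fragment the counts according to Q, then count
        frag = counts[:, None] * q                       # fragmented counts
        p_z = frag.sum(axis=0) / frag.sum()
        p_x_given_z = frag / frag.sum(axis=0, keepdims=True)
    return p_z, p_x_given_z

counts = np.array([10.0, 25.0, 15.0])                    # e.g. lion, dog, cat
print(em_mixture_multinomial(counts, n_pots=2))
```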
EM as counting
[Figure: a sequence of drawn words ("dog dog lion cat dog lion ...") and the urns they came from]
• If the exact urn for each drawn instance was known
  – Compute the distribution of each urn using urn‑specific counts
EM as counting
[Figure: the same sequence of drawn words, with each instance fragmented across urns]
• When the exact urn is unknown
  – Fragment each instance according to Q(X, Z)
    • Based on the current estimate of the probabilities
    • Every instance of X will be fragmented the same way
  – Compute the distribution of each urn by counting fragments
EM as fragmentation and counting
• Fragmentation:

Q(X,Z) \;=\; \frac{P(Z)\, P(X \mid Z)}{\sum_{Z'} P(Z')\, P(X \mid Z')}

• Counting:

P(Z) \;=\; \frac{\sum_{X} n(X)\, Q(X,Z)}{\sum_{Z'} \sum_{X} n(X)\, Q(X,Z')}
\qquad
P(X \mid Z) \;=\; \frac{n(X)\, Q(X,Z)}{\sum_{X'} n(X')\, Q(X',Z)}

• We will use the "fragmentation and counting" explanation for the most part in the remaining slides
  – Instead of deriving detailed solutions, which are probably easy to derive anyway
Modification: Given P(X | Z)
• Fragmentation:

Q(X,Z) \;=\; \frac{P(Z)\, P(X \mid Z)}{\sum_{Z'} P(Z')\, P(X \mid Z')}

• Counting:

P(Z) \;=\; \frac{\sum_{X} n(X)\, Q(X,Z)}{\sum_{Z'} \sum_{X} n(X)\, Q(X,Z')}

• Given P(X | Z) for each urn and a collection of counts n(X)
• What is the manner in which urns were selected?
• What is P(Z)?
Topics and Bags of Words
• The bag of words model:
  – Put all the words in a "document" into a "bag"
  – Only considers the occurrence of words
  – Sequence information is not used
[Figure: words from a sentence ("the", "quick", "brown", "fox", "jumped", "over", "lazy", "dog", "had", "cat", "for", "supper") placed into a bag]
• An alternate vector representation:
  [4 1 1 1 1 1 1 2 1 1 1 1]
  (The, Quick, Brown, Fox, Jumped, Over, Lazy, Dog, Had, Cat, For, Supper)
  – Each number represents the number of instances of a word
Topics and Documents
[Figure: bags of words for three topics – "sports", "politics" and "alien abduction" – each drawn from its own urn]
• Bags of words for different topics have different distributions
  – Drawn from different pots
  – Different words with different frequencies
Documents as mixtures of topics
[Figure: urns for the topics "sports", "politics" and "alien abduction"]
• Bag of words model for a newspaper: mixes topics
• Given the word distributions for the topics and the bag of words for the newspaper, find the composition?
  – Estimate the mixture weights
Documents as mixtures of topics
[Figure: the topic simplex with vertices "sports", "politics" and "alien abduction"; a newspaper document lies inside it]
• A document lies in the topic simplex
  – Mixture weights are inversely related to the distance to each topic
  – There may be multiple ways of mixing topics to obtain the same document
Documents as mixtures of topics
[Figure: the topic simplex, with corners (1,0,0) = topic3, (0,1,0) = topic2, (0,0,1) = topic1, and the corresponding mixture weight simplex]
• The mixture multinomial model maps a document from a point within the topic simplex to a point in the mixture weight simplex
  – Different locations in the mixture weight simplex may result in the same document vector
    • Depending on the arrangement of the topic vectors
Documents differ in composition
[Figure: the mixture weight simplex with corners (1,0,0) = sports, (0,1,0) = alien abduction, (0,0,1) = politics; documents such as SI, NYT and Weekly World News lie at different locations]
• The relative proportion of different topics varies
  – i.e. P(Z) has a different structure for different documents
• i.e. the location within the mixture weight and/or topic simplex differs for different documents
A priori probabilities on mixtures
[Figure: a distribution P(θ) over the mixture weight simplex, where θ = {P(Z)}]
• The location of a document in the mixture weight simplex may vary
• The distribution of locations of documents from a given category can be captured by a priori probability distributions
Dirichlet Distributions

P(\theta;\, \alpha) \;=\; \frac{1}{B(\alpha)} \prod_{i} \theta_i^{\,\alpha_i - 1},
\qquad
B(\alpha) \;=\; \frac{\prod_i \Gamma(\alpha_i)}{\Gamma\!\big(\sum_i \alpha_i\big)}

• The most common model for distributions of a priori probabilities is the Dirichlet distribution
  – Mathematical tractability
  – Conjugate prior to the multinomial
Estimation with prior

\hat\theta \;=\; \arg\max_\theta \; \log\big[ P(\{X\} \mid \theta)\, P(\theta) \big]
\;=\; \arg\max_\theta \; \sum_{\text{word}} n(\text{word}) \log P(\text{word}) + \log P(\theta)

where θ = {P(Z)} are the mixture weights
• Utilize the a priori distribution to better estimate mixture weights to explain new documents
  – Resolves the issue of multiple solutions
  – Improves the estimate in all cases
• This is the maximum a posteriori estimator
• Compare to the maximum likelihood estimator below:

\hat\theta \;=\; \arg\max_\theta \; \sum_{\text{word}} n(\text{word}) \log P(\text{word})
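A sketch of how such a prior typically enters the EM updates, assuming a Dirichlet prior Dir(α) on the mixture weights P(Z) (a standard MAP-EM result, stated here as an assumption rather than taken from the slides): the fragmented counts are augmented with pseudo-counts α_z − 1 before normalizing.

```python
import numpy as np

def map_update_pz(counts, q, alpha):
    """counts: (V,) word counts; q: (V, K) posteriors Q(X, Z); alpha: (K,) Dirichlet parameters."""
    weighted = (counts[:, None] * q).sum(axis=0) + (alpha - 1.0)
    weighted = np.maximum(weighted, 0.0)   # guard against alpha < 1
    return weighted / weighted.sum()

# Example with hypothetical numbers
counts = np.array([10.0, 25.0, 15.0])
q = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
print(map_update_pz(counts, q, alpha=np.array([2.0, 2.0])))
```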
Estimating the Dirichlets (and others)
• Estimate mixture weights on training data, then estimate the Dirichlet parameters
• Jointly estimate mixture weights and Dirichlets on training data
• Jointly estimate topics, mixture weights and Dirichlet parameters from training data
• These require complex estimation procedures that are not relevant to us
More complex models
• Other forms of priors
  – E.g. correlated topics
• Priors on priors
• Capturing temporal sequences
  – E.g. Markov chains on topics
• Etc.
• Not directly relevant here
Mixture multinomials and audio
• Audio data representations are similar to histograms
• Topic model frameworks apply
• We will generically refer to topic models and their extensions applied to audio as PLCA
  – Probabilistic Latent Component Analysis
Representing Audio
• Basic representation: the spectrogram
  – Computed using a short‑time Fourier transform
  – Segment the audio into "frames" of 20–64 ms
    • Adjacent frames overlap by 75%
  – "Window" the segments
  – Compute a Fourier spectrum from each frame
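A minimal sketch (Python, assumed, not from the slides) of such a magnitude-spectrogram computation; the frame length, hop and window choices are illustrative.

```python
import numpy as np

def magnitude_spectrogram(signal, sample_rate, frame_ms=64, overlap=0.75):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    window = np.hanning(frame_len)
    frames = [signal[s:s + frame_len] * window
              for s in range(0, len(signal) - frame_len + 1, hop)]
    # Rows: frequencies, columns: frames (time)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# Example on a synthetic 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
S = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(S.shape)   # (frequencies, frames)
```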
Spectrogram
[Figure: the spectrogram as a matrix of complex spectral values; column t contains the spectrum S_t(1), S_t(2), S_t(3), ... of frame t]
• The spectrogram is a series of complex vectors
  – It can be decomposed into magnitude and phase
• We will, however, work primarily on the magnitude
• Why: magnitude spectra of uncorrelated signals combine additively
  – In theory power spectra are additive, but in practice this holds better for magnitudes
The Spectrogram as a Histogram
• A generative model for one frame of a spectrogram
• A magnitude spectrum represents energy against discrete frequencies
• This may be viewed as a histogram of draws from a multinomial
[Figure: the magnitude spectrum of frame t viewed as a histogram over frequencies f, with P_t(f) the probability distribution underlying the t-th spectral vector]
A Mixture Multinomial Model
• The "picker" has multiple urns
  – For each draw he selects an urn and then draws a ball from it
• The overall probability of drawing f is a mixture multinomial
[Figure: multiple draws from several urns accumulate into a histogram over frequencies]
The Picker Generates a Spectrogram
• The picker has a fixed set of urns
  – Each urn has a different probability distribution over f
• He draws the spectrum for the first frame
  – In which he selects urns according to some probability P_0(z)
• Then draws the spectrum for the second frame
  – In which he selects urns according to some probability P_1(z)
• And so on, until he has constructed the entire spectrogram
  – The number of draws in each frame represents the energy in that frame
    • Actually the total magnitude spectral value
The PLCA Model
• The urns are the same for every frame
  – These are the component multinomials, or bases, for the source that generated the signal
• The only difference between frames is the probability with which he selects the urns

P_t(f) \;=\; \sum_{z} P_t(z)\, P(f \mid z)

where P_t(f) is the frame‑specific spectral distribution, P(f | z) are the source‑specific bases, and P_t(z) are the frame (time) specific mixture weights
Spectral View of Component Multinomials
• For audio, each component multinomial (urn) is actually a normalized histogram over frequencies, P(f | z)
  – i.e. a spectrum
• Component multinomials represent spectral building blocks for the given sound source
• The spectrum for every analysis frame is explained as an additive combination of these basic spectral patterns
Learning Bases
• By "learning" the mixture multinomial model for any sound source we "discover" these "basis" spectral structures for the source
• The model can be learnt from spectrograms of training audio from the source using the EM algorithm
  – May not even require large amounts of audio
Likelihood of a Spectrogram

P\big(\{S_t(f)\}_{t,f}\big) \;=\; \prod_{t} \prod_{f} \Big( \sum_{z} P_t(z)\, P(f \mid z) \Big)^{S_t(f)}

\log P\big(\{S_t(f)\}_{t,f}\big) \;=\; \sum_{t} \sum_{f} S_t(f) \log \sum_{z} P_t(z)\, P(f \mid z)

• S_t(f) is the f‑th frequency of the magnitude spectrogram in the t‑th frame
• Once again, direct maximum likelihood estimation is not possible
• EM estimation is required
Learning the Bases
• Simple EM solution
  – Except the bases are learned from all frames
• Fragmentation:

P_t(z \mid f) \;=\; \frac{P_t(z)\, P(f \mid z)}{\sum_{z'} P_t(z')\, P(f \mid z')}

• Counting:

P_t(z) \;=\; \frac{\sum_{f} P_t(z \mid f)\, S_t(f)}{\sum_{z'} \sum_{f} P_t(z' \mid f)\, S_t(f)}
\qquad
P(f \mid z) \;=\; \frac{\sum_{t} P_t(z \mid f)\, S_t(f)}{\sum_{f'} \sum_{t} P_t(z \mid f')\, S_t(f')}
\quad \text{(the "basis" distribution)}
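A minimal sketch (Python, assumed) of these fragmentation-and-counting updates applied to a magnitude spectrogram; the initialization and iteration count are arbitrary choices, not prescriptions from the tutorial.

```python
import numpy as np

def plca(S, K, n_iter=200, seed=0):
    """S: (F, T) magnitude spectrogram. Returns P(f|z) as (F, K) and P_t(z) as (K, T)."""
    rng = np.random.default_rng(seed)
    F, T = S.shape
    P_f_z = rng.random((F, K)); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
    P_t_z = rng.random((K, T)); P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Fragmentation: posterior P_t(z|f) for every (f, t)
        post = P_f_z[:, :, None] * P_t_z[None, :, :]          # (F, K, T)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # Counting: weight the posteriors by the observed "counts" S_t(f)
        frag = post * S[:, None, :]
        P_f_z = frag.sum(axis=2); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
        P_t_z = frag.sum(axis=0); P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    return P_f_z, P_t_z

# Usage (hypothetical): P_f_z, P_t_z = plca(np.abs(spectrogram), K=20)
```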
Learning Building Blocks
[Figure: a speech signal and its learned bases P(f|z), with basis‑specific spectrograms P_t(z) P(f|z); and an example from Bach's Fugue in G minor, showing P(f|z) against frequency and P_t(z) against time]
What about other data?
• Faces
  – Trained 49 multinomial components on 2500 faces
    • Each face unwrapped into a 361‑dimensional vector
  – Discovers parts of faces
Given Bases, Find Composition
[Figure: an observed spectrum S_t(f) explained as a combination of known bases]

P_t(f) \;=\; \sum_{z} P_t(z)\, P(f \mid z)

• Iterative process:
  – Compute the a posteriori probability of the z‑th topic for each frequency f in the t‑th spectrum:

P_t(z \mid f) \;=\; \frac{P_t(z)\, P(f \mid z)}{\sum_{z'} P_t(z')\, P(f \mid z')}

  – Compute the mixture weight of the z‑th basis:

P_t(z) \;=\; \frac{\sum_{f} P_t(z \mid f)\, S_t(f)}{\sum_{z'} \sum_{f} P_t(z' \mid f)\, S_t(f)}
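A minimal sketch (Python, assumed) of this fixed-bases estimation for a single spectrum; names and defaults are illustrative.

```python
import numpy as np

def estimate_weights(s, P_f_z, n_iter=100):
    """s: (F,) magnitude spectrum; P_f_z: (F, K) fixed bases. Returns P_t(z)."""
    F, K = P_f_z.shape
    P_z = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        post = P_f_z * P_z                                   # P_t(z) P(f|z)
        post /= post.sum(axis=1, keepdims=True) + 1e-12      # P_t(z|f)
        P_z = (post * s[:, None]).sum(axis=0)                # fragment and count
        P_z /= P_z.sum()
    return P_z

# Usage (hypothetical): weights = estimate_weights(np.abs(frame_spectrum), learned_bases)
```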
Bag of Frequencies vs. Bag of Spectrograms
• The PLCA model described is a "bag of frequencies" model
  – Similar to "bag of words"
• Composes the spectrogram one frame at a time
  – The contribution of bases to a frame does not affect other frames
• Random variables:
  – Frequency f
  – Possibly also the total number of draws in a frame, N_t
[Figure: graphical model with latent component Z, observed frequency f, per-frame count N_t and frame-specific weights P_t(Z)]
Bag of Frequencies PLCA model
[Figure: at each time T = 0, 1, ..., k the picker uses frame‑specific weights P_T(Z) to select among urns Z = 0 ... M, each with its own distribution P(f|z)]
• Bases are simple distributions over frequencies
• The manner of selection of urns/components varies from analysis frame to analysis frame
Bag of Spectrograms PLCA Model
[Figure: "super pots" Z = 1 ... M, each containing a time pot P(T|Z) and a frequency pot P(F|Z)]
• Compose the entire spectrogram all at once
• Complex "super pots" include two sub‑pots
  – One pot has a distribution over frequencies: these are our bases
  – The second has a distribution over time
• Each draw:
  – Select a superpot
  – Draw "F" from the frequency pot
  – Draw "T" from the time pot
  – Increment the histogram at (T, F)

P(t, f) \;=\; \sum_{z} P(z)\, P(t \mid z)\, P(f \mid z)
The bag of spectrograms
[Figure: the drawing procedure – select a superpot Z, draw F from P(F|Z), draw T from P(T|Z), increment the histogram at (T, F); repeat N times]
• Drawing procedure:

P(t, f) \;=\; \sum_{z} P(z)\, P(t \mid z)\, P(f \mid z)

• Fundamentally equivalent to the bag of frequencies model
  – With some minor differences in estimation
Estimating the bag of spectrograms
[Figure: given the observed spectrogram, estimate the parameters of the superpots]

P(z \mid t, f) \;=\; \frac{P(z)\, P(f \mid z)\, P(t \mid z)}{\sum_{z'} P(z')\, P(f \mid z')\, P(t \mid z')}

P(z) \;=\; \frac{\sum_{t} \sum_{f} P(z \mid t, f)\, S_t(f)}{\sum_{z'} \sum_{t} \sum_{f} P(z' \mid t, f)\, S_t(f)}

P(f \mid z) \;=\; \frac{\sum_{t} P(z \mid t, f)\, S_t(f)}{\sum_{f'} \sum_{t} P(z \mid t, f')\, S_t(f')}
\qquad
P(t \mid z) \;=\; \frac{\sum_{f} P(z \mid t, f)\, S_t(f)}{\sum_{t'} \sum_{f} P(z \mid t', f)\, S_{t'}(f)}

• EM update rules
  – Can learn all parameters
  – Can learn P(T|Z) and P(Z) only, given P(f|Z)
  – Can learn only P(Z)
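A minimal sketch (Python, assumed) of EM for the symmetric bag-of-spectrograms model; as with the earlier sketches, the initialization and iteration count are arbitrary.

```python
import numpy as np

def plca_sym(S, K, n_iter=200, seed=0):
    """S: (F, T) magnitude spectrogram. Returns P(z), P(f|z) as (F, K), P(t|z) as (T, K)."""
    rng = np.random.default_rng(seed)
    F, T = S.shape
    P_z = np.full(K, 1.0 / K)
    P_f_z = rng.random((F, K)); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
    P_t_z = rng.random((T, K)); P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Posterior P(z | t, f), shape (F, K, T)
        post = P_z[None, :, None] * P_f_z[:, :, None] * P_t_z.T[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        frag = post * S[:, None, :]                          # fragment the observed counts
        P_z = frag.sum(axis=(0, 2)); P_z /= P_z.sum()
        P_f_z = frag.sum(axis=2); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
        P_t_z = frag.sum(axis=0).T; P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    return P_z, P_f_z, P_t_z
```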
Bag of frequencies vs. bag of spectrograms
• Fundamentally equivalent
• Difference in estimation
  – Bag of spectrograms: for a given total N and P(Z), the total "energy" assigned to a basis is determined
    • Increasing its energy at one time will necessarily decrease its energy elsewhere
  – No such constraint for bag of frequencies
    • More unconstrained
• Bag of frequencies is more amenable to the imposition of a priori distributions
• Bag of spectrograms is a more natural fit for other models
The PLCA Tensor Model
[Figure: superpots Z, each containing sub‑pots P(A|Z), P(B|Z), ..., P(C|Z)]
• The bag of spectrograms can be extended to multivariate data

P(a, b, \ldots, c) \;=\; \sum_{z} P(z)\, P(a \mid z)\, P(b \mid z) \cdots P(c \mid z)

• The EM update rules are essentially identical to the bivariate case
How many bases?
• Previous examples assumed knowledge of the number of bases
• How do we know the correct number of bases?
  – In general we do not
  – There is no good mechanism to learn this automatically
• Must determine as many bases as the data/math will allow
  – With appropriate constraints for best discovery of bases
A Geometric View
[Figure: the probability simplex with corners (1,0,0), (0,1,0), (0,0,1); x = normalized training spectrum, o = learned component multinomial]
• Normalized spectrograms/spectral vectors live on the probability simplex
• ML estimation approximates these as linear combinations of components
  – Spectral vectors lying outside the region enclosed by the components are modeled with error
    • As measured by the KL distance between the approximation and the true vector
• ML estimation learns parameters to minimize this error
PLCA as Matrix Decompositions
• Bag of frequencies model:
  Spectrogram (D×T) ≈ Components P(F|Z) (D×K) × Weights P_t(Z) (K×T) × Energy diag(n(t)) (T×T)
• Bag of spectrograms model:
  Spectrogram (D×T) ≈ Components P(F|Z) (D×K) × Weights P(Z) (K×K diagonal) × Weights P(T|Z) (K×T)
• PLCA decomposes the spectrogram matrix as a product of matrices
  – The first matrix represents the multinomial components
  – The second represents the activations
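A small sketch (Python, assumed) of this matrix-product view, reusing factors such as those produced by the plca() sketch shown earlier; the assembly below is an assumption consistent with the diagram above, and the factorization is the probabilistic counterpart of a non-negative matrix factorization with a KL objective.

```python
import numpy as np

def reconstruct(P_f_z, P_t_z, S):
    """Assemble S_hat = P(F|Z) @ P_t(Z) @ diag(n(t)) from bag-of-frequencies factors."""
    n_t = S.sum(axis=0)                      # per-frame total "counts" (energy)
    return P_f_z @ P_t_z @ np.diag(n_t)      # (D, K) @ (K, T) @ (T, T)

# S_hat approximates S in the KL sense, as discussed on the next slide.
```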
A Mathematical Constraint
[Figure: Spectrogram (D×T) ≈ Components P(F|Z) (D×K) × Weights P_t(Z) or P(Z)P(T|Z) (K×T)]
• Estimate the RHS to minimize the KL divergence between the LHS and the RHS
• Unique estimates are possible if K = D – the bases may simply be individual frequencies
• The model imposes no constraint
  – "Natural" bases, on the other hand, are complex sounds
• Requirement: a learning procedure that permits learning more bases than the number of dimensions
An Alternate View
• Allow any arbitrary number of bases (urns)
• Specify that for any specific frame only a small number of bases may be used
  – Although there are many spectral structures, any given frame only has a few of these
• Only a small number of urns are used in any frame
  – The mixture weights must be sparse
Sparse overcomplete estimation
[Figure: Spectrogram (D×T) ≈ Bases (D×K) × Activations (K×T), with K > D]
• Learning more bases than dimensions results in overcomplete representations
• The activation (second) matrix must be sparse to accommodate this
  – i.e. sparse, overcomplete representations
• Sparse activation matrix ⇒ the number of non‑zero entries in the matrix is small
  – i.e. the columns of the matrix have small L0 norm
A Brief Segue: Compressive Sensing and Sparse Estimation
[Figure: Y (D×1) = A (D×K) · X (K×1), with K >> D]
• Given: Y, where Y = AX
  – dim(X) >> dim(Y), i.e. K >> D
• Given A, estimate X
• Underspecified: no unique solution for X
  – Unique solutions if X is sparse
Sparse Estimation and CS • Y = AX, Y = Dx1, X = Kx1, D