Topic Models for Signal Processing
Bhiksha Raj, Carnegie Mellon University
Paris Smaragdis, University of Illinois, Urbana‐Champaign
http://topicmodelsforaudio.com
ICASSP 2011 Tutorial: Applications of Topic Models for Signal Processing
The Engineer and the Musician
Once upon a time a rich potentate discovered a previously unknown recording of a beautiful piece of music. Unfortunately it was badly damaged. He greatly wanted to find out what it would sound like if it were not. So he hired an engineer and a musician to solve the problem.
The Engineer and the Musician
The engineer worked for many years. He spent much money and published many papers. Finally he had a somewhat scratchy restoration of the music.
The musician listened to the music carefully for a day, transcribed it, broke out his trusty keyboard and replicated the music.
The Prize
Who do you think won the princess?
Sounds – an example
• A sequence of notes
• Chords from the same notes
• A piece of music from the same (and a few additional) notes
Sounds – another example
• A sequence of sounds
• A proper speech utterance from the same sounds
Template Sounds Combine to Form a Signal
• The component sounds "combine" to form complex sounds
  – Notes form music
  – Phoneme‑like structures combine in utterances
• Sound in general is composed of such "building blocks" or themes
  – Which can be simple, e.g. notes, or complex, e.g. phonemes
  – Our definition of a building block: the entire structure occurs repeatedly in the process of forming the signal
• Goal: Learn these building blocks automatically, from data
  – Learn to be the musician
  – But through automated analysis of data
The representation
[Figure: a waveform (amplitude vs. time) and its spectrogram (frequency vs. time)]
• Requirement: Building blocks must combine additively
  – Linear: The presence of two notes does not distort either note
  – Constructive: Notes do not cancel – adding one note will not diminish another
• Spectral representations: The magnitude spectra of uncorrelated signals add
  – In theory, power spectra add. In practice, this holds better for magnitude spectra
• We represent signals spectrographically, as a sequence of magnitude spectral vectors estimated from (overlapping) segments of the signal
Discovering Building Blocks
• The magnitude spectrogram is a matrix of numbers
  – The f‑th frequency component in frame t is the (t, f)‑th entry in the matrix
• Standard matrix decomposition methods may be applied to discover additive building blocks: PCA, ICA, ...
• Constraint: PCA, ICA etc. will result in bases with negative components
  – Not physically meaningful (what is a negative power spectrum?)
[Figure: spectrogram of two intermittent tones, and the PCA bases recovered from the tones sound]
Magnitude Spectra are like Histograms
• Magnitude spectrograms comprise sequences of magnitude spectra
• Magnitude spectra are sequences of non‑negative numbers
  – Which can be viewed as scaled histograms
• Histograms may be viewed as the outcome of draws from a multinomial distribution
• Analysis techniques related to multinomial processes apply
What follows
• A tutorial in two parts
• Part A:
  – Model definition
  – Equations, update rules and other squiggles
• Part B:
  – Applications, extensions and the other fun stuff
• What is covered
  – Application of multinomial and topic models to audio
• What is not covered
  – Text and text‑related models
Part A: Overview
• Multinomial distributions and topic models
• Signal representations
• Application of PLCA to signals
  – Bag of frequencies model
  – Bag of spectrograms model
• Latent variable models with priors
  – Sparsity and entropic priors
  – Cross‑entropic priors
Part A: Overview
• Convolutional models
• High‑dimensional data
  – Tensors and higher‑dimensional data
Part B: Overview
• Low‑Rank Models of Signals
• Separation of Monophonic Mixtures
• Recognition in Mixtures
• Temporal Demixing
• Pitch Tracking
• User Interfaces
• Multimodality
The Multinomial Distribution
• The outcome of a draw can take one of a finite number of discrete values
  – e.g. a man drawing coloured balls from an urn
• The probability of retrieving a ball of a given color depends on the fraction of balls in the urn that are of that color
  – P(red) = no. of red balls / total no. of balls
A probability distribution over words
[Figure: an urn of balls labeled with words ("lion", "dog", "cat") and the resulting distribution P(word)]
• The balls have words written on them
  – i.e. draws now obtain words, not colors
• P(word) = no. of balls with the given word on them / total no. of balls
The probability of a collection
[Figure: a collection of drawn words – dog × num(dog), cat × num(cat), lion × num(lion)]

P\big(n(\text{word}_1)\ \text{of}\ \text{word}_1,\ n(\text{word}_2)\ \text{of}\ \text{word}_2,\ldots\big) \;=\; C \prod_{\text{word}} P(\text{word})^{\,n(\text{word})}

\log P\big(n(\text{word}_1)\ \text{of}\ \text{word}_1,\ n(\text{word}_2)\ \text{of}\ \text{word}_2,\ldots\big) \;=\; \sum_{\text{word}} n(\text{word}) \log P(\text{word}) \;+\; C'

• Probability of drawing a specific collection of words
• The constant C accounts for permutations
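As a concrete illustration (Python, not part of the original slides), a minimal sketch of the log-probability above; the function name and the example counts are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

# Sketch: log-probability of a collection of word counts under a multinomial.
def multinomial_log_prob(counts, probs):
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # log C: the multinomial coefficient that accounts for permutations
    log_coeff = gammaln(counts.sum() + 1) - gammaln(counts + 1).sum()
    return log_coeff + np.sum(counts * np.log(probs))

# Example: counts of "lion", "dog", "cat" drawn from a known urn
print(multinomial_log_prob([3, 5, 4], [0.25, 0.40, 0.35]))
```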
Estimating probabilities
[Figure: Given a collection of drawn words (dog × num(dog), cat × num(cat), lion × num(lion)), find the underlying distribution P(word)]
• Given: A collection of words, find the underlying probability distribution
  – A problem we encounter repeatedly
• The maximum likelihood estimate:

\{\hat P(\text{word})\} \;=\; \arg\max_{\{P(\text{word})\}} \sum_{\text{word}} n(\text{word}) \log P(\text{word})

\hat P(\text{word}) \;=\; \frac{n(\text{word})}{\sum_{\text{word}'} n(\text{word}')}
The expected counts
[Figure: an urn with known word distribution P(word)]
• Given:
  – The probabilities of all words, P(word)
  – The total number of draws, N
• What is the expected number of draws of any word?
• Answer: E[n(word)] = N · P(word)
The inverse problem
[Figure: an urn with known word distribution P(word)]
• Given: the number of times "lion" was drawn, N(lion)
• How many times was "dog" drawn?

N(\text{dog}) \;=\; \frac{N(\text{lion})}{P(\text{lion})}\, P(\text{dog})
The inverse multinomial
• Given P(word) for all words
• Observed n(word_1), n(word_2), ..., n(word_k)
• What is n(word_{k+1}), n(word_{k+2}), ...?

P\big(n(\text{word}_{k+1}),\, n(\text{word}_{k+2}),\ldots\big) \;=\; \frac{\big(N_o + \sum_{i>k} n(\text{word}_i)\big)!}{N_o!\,\prod_{i>k} n(\text{word}_i)!}\; P_o^{\,N_o} \prod_{i>k} P(\text{word}_i)^{\,n(\text{word}_i)}

• N_o is the total number of observed counts: n(word_1) + n(word_2) + …
• P_o is the total probability of the observed events: P(word_1) + P(word_2) + …
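A minimal sketch (Python, assumed, not from the slides) of the corresponding expected counts: following the proportionality on the previous slide, the expected count of an unobserved word is N_o · P(word) / P_o. The numbers below are made up.

```python
# Known urn distribution and the counts we actually observed
P = {"lion": 0.2, "dog": 0.5, "cat": 0.3}
observed = {"lion": 4, "dog": 10}

N_o = sum(observed.values())          # total observed count
P_o = sum(P[w] for w in observed)     # total probability of observed words

for w in P:
    if w not in observed:
        # Expected count of an unobserved word given the observed total
        print(w, N_o * P[w] / P_o)
```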
A Modified Experiment
[Figure: a sequence of drawn words ("dog dog lion cat dog lion ...") and multiple urns, each with its own mix of "lion", "dog" and "cat" balls]
• Multiple pots with different distributions
  – The picker first selects a pot with probability P(pot)
  – Then he draws a ball from that pot
• We only see the outcomes!
• Can we estimate the pots?
A mixture multinomial
• The outcomes are a mixture of draws from many pots
  – i.e. a mixture of draws from several multinomials
[Figure: a sequence of drawn words ("dog dog lion cat dog lion ...") produced by draws from several urns]
• The process is a mixture multinomial
• Probability of a word:

P(X) \;=\; \sum_{Z} P(Z)\, P(X \mid Z)

• Z is the pot
• X are the words
Mixture multinomials are multinomials too
[Figure: several urns, each with its own mix of "lion", "dog" and "cat" balls]
• From the outside viewer's perspective, it is just a multinomial
  – The outcome is one of a countable, finite set of values
• Constraint: the probabilities are composed from component multinomials

P(X) \;=\; \sum_{Z} P(Z)\, P(X \mid Z)
Multinomials and Simplexes
• Multinomials are probability vectors
  – If the vocabulary size is V, they live in a (V−1)-dimensional subset (the probability simplex) of V-space
  – Strictly in the positive orthant
• Mixture multinomials reside within the simplex specified by the component multinomials
Estimating Mixture Multinomials
• Log-likelihood of the collection of counts:

\log P\big(n(X_1)\ \text{of}\ X_1,\ n(X_2)\ \text{of}\ X_2,\ldots\big) \;=\; \sum_{X} n(X) \log P(X) \;=\; \sum_{X} n(X) \log \sum_{Z} P(Z)\, P(X \mid Z)

• Direct optimization is not possible; use Expectation Maximization
• Introducing Q(X, Z):

\sum_{X} n(X) \log \sum_{Z} Q(X,Z)\, \frac{P(Z)\, P(X \mid Z)}{Q(X,Z)} \;\ge\; \sum_{X} n(X) \sum_{Z} Q(X,Z) \log \frac{P(Z)\, P(X \mid Z)}{Q(X,Z)}

• Optimizing with respect to Q(X, Z):

Q(X,Z) \;=\; \frac{P(Z)\, P(X \mid Z)}{\sum_{Z'} P(Z')\, P(X \mid Z')}

• Optimizing P(Z) and P(X | Z):

P(Z) \;=\; \frac{\sum_{X} n(X)\, Q(X,Z)}{\sum_{Z'} \sum_{X} n(X)\, Q(X,Z')}
\qquad
P(X \mid Z) \;=\; \frac{n(X)\, Q(X,Z)}{\sum_{X'} n(X')\, Q(X',Z)}
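A minimal sketch (Python, not from the slides) of these EM updates for a mixture of multinomials; the random initialization, the number of pots and the toy counts are assumptions for illustration.

```python
import numpy as np

def em_mixture_multinomial(counts, n_pots, n_iter=100, seed=0):
    """counts: (V,) word counts. Returns P(Z) and P(X|Z)."""
    rng = np.random.default_rng(seed)
    V = len(counts)
    p_z = np.full(n_pots, 1.0 / n_pots)                  # P(Z)
    p_x_given_z = rng.dirichlet(np.ones(V), n_pots).T    # (V, n_pots): P(X|Z)
    for _ in range(n_iter):
        # E-step: Q(X, Z) proportional to P(Z) P(X|Z)
        q = p_x_given_z * p_z
        q /= q.sum(axis=1, keepdims=True)
        # M-step: fragment the counts according to Q, then count
        frag = counts[:, None] * q                       # fragmented counts
        p_z = frag.sum(axis=0) / frag.sum()
        p_x_given_z = frag / frag.sum(axis=0, keepdims=True)
    return p_z, p_x_given_z

counts = np.array([10.0, 25.0, 15.0])                    # e.g. lion, dog, cat
print(em_mixture_multinomial(counts, n_pots=2))
```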
EM as counting
[Figure: a sequence of drawn words ("dog dog lion cat dog lion ...") and the urns they came from]
• If the exact urn for each drawn instance was known
  – Compute the distribution of each urn using urn‑specific counts
EM as counting
[Figure: the same sequence of drawn words, with each instance fragmented across urns]
• When the exact urn is unknown
  – Fragment each instance according to Q(X, Z)
    • Based on the current estimate of the probabilities
    • Every instance of X will be fragmented the same way
  – Compute the distribution of each urn by counting fragments
EM as fragmentation and counting
• Fragmentation:

Q(X,Z) \;=\; \frac{P(Z)\, P(X \mid Z)}{\sum_{Z'} P(Z')\, P(X \mid Z')}

• Counting:

P(Z) \;=\; \frac{\sum_{X} n(X)\, Q(X,Z)}{\sum_{Z'} \sum_{X} n(X)\, Q(X,Z')}
\qquad
P(X \mid Z) \;=\; \frac{n(X)\, Q(X,Z)}{\sum_{X'} n(X')\, Q(X',Z)}

• We will use the "fragmentation and counting" explanation for the most part in the remaining slides
  – Instead of deriving detailed solutions, which are probably easy to derive anyway
Modification: Given P(X | Z)
• Fragmentation:

Q(X,Z) \;=\; \frac{P(Z)\, P(X \mid Z)}{\sum_{Z'} P(Z')\, P(X \mid Z')}

• Counting:

P(Z) \;=\; \frac{\sum_{X} n(X)\, Q(X,Z)}{\sum_{Z'} \sum_{X} n(X)\, Q(X,Z')}

• Given P(X | Z) for each urn and a collection of counts n(X)
• What is the manner in which urns were selected?
• What is P(Z)?
Topics and Bags of Words
• The bag of words model:
  – Put all the words in a "document" into a "bag"
  – Only considers the occurrence of words
  – Sequence information is not used
[Figure: words from a sentence ("the", "quick", "brown", "fox", "jumped", "over", "lazy", "dog", "had", "cat", "for", "supper") placed into a bag]
• An alternate vector representation:
  [4 1 1 1 1 1 1 2 1 1 1 1]
  (The, Quick, Brown, Fox, Jumped, Over, Lazy, Dog, Had, Cat, For, Supper)
  – Each number represents the number of instances of a word
Topics and Documents
[Figure: bags of words for three topics – "sports", "politics" and "alien abduction" – each drawn from its own urn]
• Bags of words for different topics have different distributions
  – Drawn from different pots
  – Different words with different frequencies
Documents as mixtures of topics
[Figure: urns for the topics "sports", "politics" and "alien abduction"]
• Bag of words model for a newspaper: mixes topics
• Given the word distributions for the topics and the bag of words for the newspaper, find the composition?
  – Estimate the mixture weights
Documents as mixtures of topics
[Figure: the topic simplex with vertices "sports", "politics" and "alien abduction"; a newspaper document lies inside it]
• A document lies in the topic simplex
  – Mixture weights are inversely related to the distance to each topic
  – There may be multiple ways of mixing topics to obtain the same document
Documents as mixtures of topics
[Figure: the topic simplex, with corners (1,0,0) = topic3, (0,1,0) = topic2, (0,0,1) = topic1, and the corresponding mixture weight simplex]
• The mixture multinomial model maps a document from a point within the topic simplex to a point in the mixture weight simplex
  – Different locations in the mixture weight simplex may result in the same document vector
    • Depending on the arrangement of the topic vectors
Documents differ in composition
[Figure: the mixture weight simplex with corners (1,0,0) = sports, (0,1,0) = alien abduction, (0,0,1) = politics; documents such as SI, NYT and Weekly World News lie at different locations]
• The relative proportion of different topics varies
  – i.e. P(Z) has a different structure for different documents
• i.e. the location within the mixture weight and/or topic simplex differs for different documents
A priori probabilities on mixtures
[Figure: a distribution P(θ) over the mixture weight simplex, where θ = {P(Z)}]
• The location of a document in the mixture weight simplex may vary
• The distribution of locations of documents from a given category can be captured by a priori probability distributions
Dirichlet Distributions

P(\theta;\, \alpha) \;=\; \frac{1}{B(\alpha)} \prod_{i} \theta_i^{\,\alpha_i - 1},
\qquad
B(\alpha) \;=\; \frac{\prod_i \Gamma(\alpha_i)}{\Gamma\!\big(\sum_i \alpha_i\big)}

• The most common model for distributions of a priori probabilities is the Dirichlet distribution
  – Mathematical tractability
  – Conjugate prior to the multinomial
Estimation with prior

\hat\theta \;=\; \arg\max_\theta \; \log\big[ P(\{X\} \mid \theta)\, P(\theta) \big]
\;=\; \arg\max_\theta \; \sum_{\text{word}} n(\text{word}) \log P(\text{word}) + \log P(\theta)

where θ = {P(Z)} are the mixture weights
• Utilize the a priori distribution to better estimate mixture weights to explain new documents
  – Resolves the issue of multiple solutions
  – Improves the estimate in all cases
• This is the maximum a posteriori estimator
• Compare to the maximum likelihood estimator below:

\hat\theta \;=\; \arg\max_\theta \; \sum_{\text{word}} n(\text{word}) \log P(\text{word})
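A sketch of how such a prior typically enters the EM updates, assuming a Dirichlet prior Dir(α) on the mixture weights P(Z) (a standard MAP-EM result, stated here as an assumption rather than taken from the slides): the fragmented counts are augmented with pseudo-counts α_z − 1 before normalizing.

```python
import numpy as np

def map_update_pz(counts, q, alpha):
    """counts: (V,) word counts; q: (V, K) posteriors Q(X, Z); alpha: (K,) Dirichlet parameters."""
    weighted = (counts[:, None] * q).sum(axis=0) + (alpha - 1.0)
    weighted = np.maximum(weighted, 0.0)   # guard against alpha < 1
    return weighted / weighted.sum()

# Example with hypothetical numbers
counts = np.array([10.0, 25.0, 15.0])
q = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
print(map_update_pz(counts, q, alpha=np.array([2.0, 2.0])))
```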
Estimating the Dirichlets (and others)
• Estimate mixture weights on training data, then estimate the Dirichlet parameters
• Jointly estimate mixture weights and Dirichlets on training data
• Jointly estimate topics, mixture weights and Dirichlet parameters from training data
• These require complex estimation procedures that are not relevant to us
More complex models
• Other forms of priors
  – E.g. correlated topics
• Priors on priors
• Capturing temporal sequences
  – E.g. Markov chains on topics
• Etc.
• Not directly relevant here
Mixture multinomials and audio
• Audio data representations are similar to histograms
• Topic model frameworks apply
• We will generically refer to topic models and their extensions applied to audio as PLCA
  – Probabilistic Latent Component Analysis
Representing Audio
• Basic representation: the spectrogram
  – Computed using a short‑time Fourier transform
  – Segment the audio into "frames" of 20–64 ms
    • Adjacent frames overlap by 75%
  – "Window" the segments
  – Compute a Fourier spectrum from each frame
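A minimal sketch (Python, assumed, not from the slides) of such a magnitude-spectrogram computation; the frame length, hop and window choices are illustrative.

```python
import numpy as np

def magnitude_spectrogram(signal, sample_rate, frame_ms=64, overlap=0.75):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    window = np.hanning(frame_len)
    frames = [signal[s:s + frame_len] * window
              for s in range(0, len(signal) - frame_len + 1, hop)]
    # Rows: frequencies, columns: frames (time)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# Example on a synthetic 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
S = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(S.shape)   # (frequencies, frames)
```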
Spectrogram
[Figure: the spectrogram as a matrix of complex spectral values; column t contains the spectrum S_t(1), S_t(2), S_t(3), ... of frame t]
• The spectrogram is a series of complex vectors
  – It can be decomposed into magnitude and phase
• We will, however, work primarily on the magnitude
• Why: magnitude spectra of uncorrelated signals combine additively
  – In theory power spectra are additive, but in practice this holds better for magnitudes
The Spectrogram as a Histogram
• A generative model for one frame of a spectrogram
• A magnitude spectrum represents energy against discrete frequencies
• This may be viewed as a histogram of draws from a multinomial
[Figure: the magnitude spectrum of frame t viewed as a histogram over frequencies f, with P_t(f) the probability distribution underlying the t-th spectral vector]
A Mixture Multinomial Model
• The "picker" has multiple urns
  – For each draw he selects an urn and then draws a ball from it
• The overall probability of drawing f is a mixture multinomial
[Figure: multiple draws from several urns accumulate into a histogram over frequencies]
The Picker Generates a Spectrogram
• The picker has a fixed set of urns
  – Each urn has a different probability distribution over f
• He draws the spectrum for the first frame
  – In which he selects urns according to some probability P_0(z)
• Then draws the spectrum for the second frame
  – In which he selects urns according to some probability P_1(z)
• And so on, until he has constructed the entire spectrogram
  – The number of draws in each frame represents the energy in that frame
    • Actually the total magnitude spectral value
The PLCA Model
• The urns are the same for every frame
  – These are the component multinomials, or bases, for the source that generated the signal
• The only difference between frames is the probability with which he selects the urns

P_t(f) \;=\; \sum_{z} P_t(z)\, P(f \mid z)

where P_t(f) is the frame‑specific spectral distribution, P(f | z) are the source‑specific bases, and P_t(z) are the frame (time) specific mixture weights
Spectral View of Component Multinomials
• For audio, each component multinomial (urn) is actually a normalized histogram over frequencies, P(f | z)
  – i.e. a spectrum
• Component multinomials represent spectral building blocks for the given sound source
• The spectrum for every analysis frame is explained as an additive combination of these basic spectral patterns
Learning Bases
• By "learning" the mixture multinomial model for any sound source we "discover" these "basis" spectral structures for the source
• The model can be learnt from spectrograms of training audio from the source using the EM algorithm
  – May not even require large amounts of audio
Likelihood of a Spectrogram

P\big(\{S_t(f)\}_{t,f}\big) \;=\; \prod_{t} \prod_{f} \Big( \sum_{z} P_t(z)\, P(f \mid z) \Big)^{S_t(f)}

\log P\big(\{S_t(f)\}_{t,f}\big) \;=\; \sum_{t} \sum_{f} S_t(f) \log \sum_{z} P_t(z)\, P(f \mid z)

• S_t(f) is the f‑th frequency of the magnitude spectrogram in the t‑th frame
• Once again, direct maximum likelihood estimation is not possible
• EM estimation is required
Learning the Bases
• Simple EM solution
  – Except the bases are learned from all frames
• Fragmentation:

P_t(z \mid f) \;=\; \frac{P_t(z)\, P(f \mid z)}{\sum_{z'} P_t(z')\, P(f \mid z')}

• Counting:

P_t(z) \;=\; \frac{\sum_{f} P_t(z \mid f)\, S_t(f)}{\sum_{z'} \sum_{f} P_t(z' \mid f)\, S_t(f)}
\qquad
P(f \mid z) \;=\; \frac{\sum_{t} P_t(z \mid f)\, S_t(f)}{\sum_{f'} \sum_{t} P_t(z \mid f')\, S_t(f')}
\quad \text{(the "basis" distribution)}
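A minimal sketch (Python, assumed) of these fragmentation-and-counting updates applied to a magnitude spectrogram; the initialization and iteration count are arbitrary choices, not prescriptions from the tutorial.

```python
import numpy as np

def plca(S, K, n_iter=200, seed=0):
    """S: (F, T) magnitude spectrogram. Returns P(f|z) as (F, K) and P_t(z) as (K, T)."""
    rng = np.random.default_rng(seed)
    F, T = S.shape
    P_f_z = rng.random((F, K)); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
    P_t_z = rng.random((K, T)); P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Fragmentation: posterior P_t(z|f) for every (f, t)
        post = P_f_z[:, :, None] * P_t_z[None, :, :]          # (F, K, T)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # Counting: weight the posteriors by the observed "counts" S_t(f)
        frag = post * S[:, None, :]
        P_f_z = frag.sum(axis=2); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
        P_t_z = frag.sum(axis=0); P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    return P_f_z, P_t_z

# Usage (hypothetical): P_f_z, P_t_z = plca(np.abs(spectrogram), K=20)
```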
Learning Building Blocks
[Figure: a speech signal and its learned bases P(f|z), with basis‑specific spectrograms P_t(z) P(f|z); and an example from Bach's Fugue in G minor, showing P(f|z) against frequency and P_t(z) against time]
What about other data?
• Faces
  – Trained 49 multinomial components on 2500 faces
    • Each face unwrapped into a 361‑dimensional vector
  – Discovers parts of faces
Given Bases, Find Composition
[Figure: an observed spectrum S_t(f) explained as a combination of known bases]

P_t(f) \;=\; \sum_{z} P_t(z)\, P(f \mid z)

• Iterative process:
  – Compute the a posteriori probability of the z‑th topic for each frequency f in the t‑th spectrum:

P_t(z \mid f) \;=\; \frac{P_t(z)\, P(f \mid z)}{\sum_{z'} P_t(z')\, P(f \mid z')}

  – Compute the mixture weight of the z‑th basis:

P_t(z) \;=\; \frac{\sum_{f} P_t(z \mid f)\, S_t(f)}{\sum_{z'} \sum_{f} P_t(z' \mid f)\, S_t(f)}
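A minimal sketch (Python, assumed) of this fixed-bases estimation for a single spectrum; names and defaults are illustrative.

```python
import numpy as np

def estimate_weights(s, P_f_z, n_iter=100):
    """s: (F,) magnitude spectrum; P_f_z: (F, K) fixed bases. Returns P_t(z)."""
    F, K = P_f_z.shape
    P_z = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        post = P_f_z * P_z                                   # P_t(z) P(f|z)
        post /= post.sum(axis=1, keepdims=True) + 1e-12      # P_t(z|f)
        P_z = (post * s[:, None]).sum(axis=0)                # fragment and count
        P_z /= P_z.sum()
    return P_z

# Usage (hypothetical): weights = estimate_weights(np.abs(frame_spectrum), learned_bases)
```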
Bag of Frequencies vs. Bag of Spectrograms
• The PLCA model described is a "bag of frequencies" model
  – Similar to "bag of words"
• Composes the spectrogram one frame at a time
  – The contribution of bases to a frame does not affect other frames
• Random variables:
  – Frequency f
  – Possibly also the total number of draws in a frame, N_t
[Figure: graphical model with latent component Z, observed frequency f, per-frame count N_t and frame-specific weights P_t(Z)]
Bag of Frequencies PLCA model
[Figure: at each time T = 0, 1, ..., k the picker uses frame‑specific weights P_T(Z) to select among urns Z = 0 ... M, each with its own distribution P(f|z)]
• Bases are simple distributions over frequencies
• The manner of selection of urns/components varies from analysis frame to analysis frame
Bag of Spectrograms PLCA Model
[Figure: "super pots" Z = 1 ... M, each containing a time pot P(T|Z) and a frequency pot P(F|Z)]
• Compose the entire spectrogram all at once
• Complex "super pots" include two sub‑pots
  – One pot has a distribution over frequencies: these are our bases
  – The second has a distribution over time
• Each draw:
  – Select a superpot
  – Draw "F" from the frequency pot
  – Draw "T" from the time pot
  – Increment the histogram at (T, F)

P(t, f) \;=\; \sum_{z} P(z)\, P(t \mid z)\, P(f \mid z)
The bag of spectrograms
[Figure: the drawing procedure – select a superpot Z, draw F from P(F|Z), draw T from P(T|Z), increment the histogram at (T, F); repeat N times]
• Drawing procedure:

P(t, f) \;=\; \sum_{z} P(z)\, P(t \mid z)\, P(f \mid z)

• Fundamentally equivalent to the bag of frequencies model
  – With some minor differences in estimation
Estimating the bag of spectrograms
[Figure: given the observed spectrogram, estimate the parameters of the superpots]

P(z \mid t, f) \;=\; \frac{P(z)\, P(f \mid z)\, P(t \mid z)}{\sum_{z'} P(z')\, P(f \mid z')\, P(t \mid z')}

P(z) \;=\; \frac{\sum_{t} \sum_{f} P(z \mid t, f)\, S_t(f)}{\sum_{z'} \sum_{t} \sum_{f} P(z' \mid t, f)\, S_t(f)}

P(f \mid z) \;=\; \frac{\sum_{t} P(z \mid t, f)\, S_t(f)}{\sum_{f'} \sum_{t} P(z \mid t, f')\, S_t(f')}
\qquad
P(t \mid z) \;=\; \frac{\sum_{f} P(z \mid t, f)\, S_t(f)}{\sum_{t'} \sum_{f} P(z \mid t', f)\, S_{t'}(f)}

• EM update rules
  – Can learn all parameters
  – Can learn P(T|Z) and P(Z) only, given P(f|Z)
  – Can learn only P(Z)
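A minimal sketch (Python, assumed) of EM for the symmetric bag-of-spectrograms model; as with the earlier sketches, the initialization and iteration count are arbitrary.

```python
import numpy as np

def plca_sym(S, K, n_iter=200, seed=0):
    """S: (F, T) magnitude spectrogram. Returns P(z), P(f|z) as (F, K), P(t|z) as (T, K)."""
    rng = np.random.default_rng(seed)
    F, T = S.shape
    P_z = np.full(K, 1.0 / K)
    P_f_z = rng.random((F, K)); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
    P_t_z = rng.random((T, K)); P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Posterior P(z | t, f), shape (F, K, T)
        post = P_z[None, :, None] * P_f_z[:, :, None] * P_t_z.T[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        frag = post * S[:, None, :]                          # fragment the observed counts
        P_z = frag.sum(axis=(0, 2)); P_z /= P_z.sum()
        P_f_z = frag.sum(axis=2); P_f_z /= P_f_z.sum(axis=0, keepdims=True)
        P_t_z = frag.sum(axis=0).T; P_t_z /= P_t_z.sum(axis=0, keepdims=True)
    return P_z, P_f_z, P_t_z
```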
Bag of frequencies vs. bag of spectrograms
• Fundamentally equivalent
• Difference in estimation
  – Bag of spectrograms: for a given total N and P(Z), the total "energy" assigned to a basis is determined
    • Increasing its energy at one time will necessarily decrease its energy elsewhere
  – No such constraint for bag of frequencies
    • More unconstrained
• Bag of frequencies is more amenable to the imposition of a priori distributions
• Bag of spectrograms is a more natural fit for other models
The PLCA Tensor Model
[Figure: superpots Z, each containing sub‑pots P(A|Z), P(B|Z), ..., P(C|Z)]
• The bag of spectrograms can be extended to multivariate data

P(a, b, \ldots, c) \;=\; \sum_{z} P(z)\, P(a \mid z)\, P(b \mid z) \cdots P(c \mid z)

• The EM update rules are essentially identical to the bivariate case
How many bases?
• Previous examples assumed knowledge of the number of bases
• How do we know the correct number of bases?
  – In general we do not
  – There is no good mechanism to learn this automatically
• Must determine as many bases as the data/math will allow
  – With appropriate constraints for best discovery of bases
A Geometric View
[Figure: the probability simplex with corners (1,0,0), (0,1,0), (0,0,1); x = normalized training spectrum, o = learned component multinomial]
• Normalized spectrograms/spectral vectors live on the probability simplex
• ML estimation approximates these as linear combinations of components
  – Spectral vectors lying outside the region enclosed by the components are modeled with error
    • As measured by the KL distance between the approximation and the true vector
• ML estimation learns parameters to minimize this error
PLCA as Matrix Decompositions
• Bag of frequencies model:
  Spectrogram (D×T) ≈ Components P(F|Z) (D×K) × Weights P_t(Z) (K×T) × Energy diag(n(t)) (T×T)
• Bag of spectrograms model:
  Spectrogram (D×T) ≈ Components P(F|Z) (D×K) × Weights P(Z) (K×K diagonal) × Weights P(T|Z) (K×T)
• PLCA decomposes the spectrogram matrix as a product of matrices
  – The first matrix represents the multinomial components
  – The second represents the activations
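A small sketch (Python, assumed) of this matrix-product view, reusing factors such as those produced by the plca() sketch shown earlier; the assembly below is an assumption consistent with the diagram above, and the factorization is the probabilistic counterpart of a non-negative matrix factorization with a KL objective.

```python
import numpy as np

def reconstruct(P_f_z, P_t_z, S):
    """Assemble S_hat = P(F|Z) @ P_t(Z) @ diag(n(t)) from bag-of-frequencies factors."""
    n_t = S.sum(axis=0)                      # per-frame total "counts" (energy)
    return P_f_z @ P_t_z @ np.diag(n_t)      # (D, K) @ (K, T) @ (T, T)

# S_hat approximates S in the KL sense, as discussed on the next slide.
```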
A Mathematical Constraint
[Figure: Spectrogram (D×T) ≈ Components P(F|Z) (D×K) × Weights P_t(Z) or P(Z)P(T|Z) (K×T)]
• Estimate the RHS to minimize the KL divergence between the LHS and the RHS
• Unique estimates are possible if K = D – the bases may simply be individual frequencies
• The model imposes no constraint
  – "Natural" bases, on the other hand, are complex sounds
• Requirement: a learning procedure that permits learning more bases than the number of dimensions
An Alternate View
• Allow any arbitrary number of bases (urns)
• Specify that for any specific frame only a small number of bases may be used
  – Although there are many spectral structures, any given frame only has a few of these
• Only a small number of urns are used in any frame
  – The mixture weights must be sparse
Sparse overcomplete estimation
[Figure: Spectrogram (D×T) ≈ Bases (D×K) × Activations (K×T), with K > D]
• Learning more bases than dimensions results in overcomplete representations
• The activation (second) matrix must be sparse to accommodate this
  – i.e. sparse, overcomplete representations
• Sparse activation matrix ⇒ the number of non‑zero entries in the matrix is small
  – i.e. the columns of the matrix have small L0 norm
A Brief Segue: Compressive Sensing and Sparse Estimation
[Figure: Y (D×1) = A (D×K) · X (K×1), with K >> D]
• Given: Y, where Y = AX
  – dim(X) >> dim(Y), i.e. K >> D
• Given A, estimate X
• Underspecified: no unique solution for X
  – Unique solutions if X is sparse
Sparse Estimation and CS • Y = AX, Y = Dx1, X = Kx1, D