Extracting Coactivated Features from Multiple Data Sets

Michael U. Gutmann, University of Helsinki ([email protected])
Aapo Hyvärinen, University of Helsinki ([email protected])
Michael U. Gutmann – University of Helsinki
ICANN 2011 - p. 1
Contents

This talk is about a new method to find related features (structure) in multiple data sets.

■ Introduction: background on the extraction of related features from multiple data sets
■ Extraction of coactivated features: the statistical model underlying our method
■ Testing our method on artificial data
■ Application to real data (here: natural images; in the paper: also brain imaging data)
■ Summary
Introduction
An example

[Figure: scatter plots of Data set 1 and Data set 2]

Goals:
1. Characterize each data set separately → eigenvalues and eigenvectors of the covariance matrices
2. Find relations between the two data sets
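Goal 1 can be sketched numerically with numpy; the data below is a synthetic stand-in, not the data from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one data set: 10000 observations of a 2-d vector
X = rng.standard_normal((10000, 2)) @ np.array([[3.0, 1.0], [0.0, 1.0]])

# Eigenvalues and eigenvectors of the covariance matrix characterize the set
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
print(eigvals)    # variances along the principal axes, in ascending order
print(eigvecs)    # columns are the corresponding eigenvectors
```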
Correlation based method (1/3)

[Figure: whitened data sets 1 and 2 (normalized representation)]

■ After a change of basis (rotation), the data is still white
■ Whitening is therefore defined only up to a rotation
■ Choose the coordinate systems which best describe the relation between the two data sets 1 and 2
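A minimal whitening sketch with numpy; the last lines illustrate the rotation ambiguity stated above (synthetic data, illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])
X = X - X.mean(axis=0)

# Whitening: decorrelate and scale to unit variance
eigvals, E = np.linalg.eigh(np.cov(X, rowvar=False))
V = E @ np.diag(eigvals ** -0.5) @ E.T      # symmetric whitening matrix
Z = X @ V.T
print(np.cov(Z, rowvar=False))              # ~ identity

# Any rotation R leaves the data white: cov(Z R^T) = R cov(Z) R^T = I
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.cov(Z @ R.T, rowvar=False))        # still ~ identity
```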
Correlation based method (2/3)

[Figure: whitened data sets 1 and 2 with coordinate systems defined by e1 and e2 (rotation angles α1, α2); plot of the correlation between the projections on ei as a function of α1 and α2]

■ The x-coordinate of a data point in the coordinate system defined by ei is given by its projection on ei.
■ Compute the correlation between the x-coordinates for different coordinate systems.
■ Choose the coordinate systems for which the x-coordinates are most strongly correlated (here: correlation coefficient ≈ 0.6).
■ The described method is called Canonical Correlation Analysis (CCA).
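CCA as described can be sketched with numpy: whiten both data sets, then obtain the best-correlated coordinate systems from an SVD of the cross-covariance of the whitened data. This is a standard derivation, not code from the paper, and the data is synthetic:

```python
import numpy as np

def whiten(X):
    X = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    return X @ (E @ np.diag(d ** -0.5) @ E.T).T

rng = np.random.default_rng(1)
s = rng.standard_normal(10000)                        # source shared by both sets
X = np.column_stack([s, rng.standard_normal(10000)])
Y = np.column_stack([0.8 * s + 0.6 * rng.standard_normal(10000),
                     rng.standard_normal(10000)])

Zx, Zy = whiten(X), whiten(Y)
# Singular vectors of the cross-covariance give the canonical directions;
# the singular values are the canonical correlations.
U, corrs, Vt = np.linalg.svd(Zx.T @ Zy / len(Zx))
print(corrs)   # first canonical correlation is large, second is near zero
```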
Correlation based method (3/3)

[Figure: data sets 1 and 2, their whitened versions with coordinate systems e1 and e2, and the correlation between the projections on ei as a function of α1 and α2 (all correlations within ≈ ±0.08)]

→ The method does not seem to work here.
What does “not work” mean?

It means
1. that we did not find meaningful features within each data set
2. that we did not find any relation between the two data sets

[Figure: whitened data sets 1 and 2 with axes 1 and 2 (blue: original data; red: data with the same marginal distributions), and the projection on axis 2 plotted against the projection on axis 1]

⇓ Identify independent components        ⇓ Identify coactivation
In this talk . . .

I will present a method where
1. the features for each data set are maximally statistically independent,
2. the features across the data sets tend to be jointly activated: they have statistically dependent variances,
3. multiple data sets can be analyzed.
Extraction of coactivated features
Statistical model underlying our method (1/2)

■ Given n data sets, we assume that each set is formed by i.i.d. observations of a random vector z_i ∈ R^d.
■ To model structure within a data set, we assume that

      z_i = Σ_{k=1}^d q_i^k s_i^k    (i = 1, . . . , n)

  where the q_i^k are orthonormal and s_i^1, . . . , s_i^d are statistically independent.
■ To model structure across the data sets, we assume that s_1^k, . . . , s_n^k are statistically dependent.

[Figure: scatter plots of the two data sets with the feature vectors q_1^1, q_1^2 and q_2^1, q_2^2, and of the sources (s_1^1, s_2^1); blue: original data, red: data with the same marginal distributions]
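The within-set part of the model can be sketched for one data set; the Laplace sources and the QR-based orthogonal matrix are my own illustrative choices. The last line shows the consequence of orthonormality used throughout the talk: projecting on a feature recovers its source.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 5000

S = rng.laplace(size=(T, d))                       # independent sources s_i^1, ..., s_i^d
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthonormal feature vectors q_i^k as columns

# z_i = sum_k q_i^k s_i^k, i.e. each observation is Q @ s
Z = S @ Q.T

# Orthonormality means projection recovers the sources: s_i^k = (q_i^k)^T z_i
print(np.allclose(Z @ Q, S))   # True
```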
Statistical model underlying our method (2/2)

■ Dependency assumptions: the k-th sources from all the data sets share a common (latent) variance variable σ_k:

      s_1^k = σ_k s̃_1^k,   s_2^k = σ_k s̃_2^k,   . . . ,   s_n^k = σ_k s̃_n^k

■ The s̃_1^k, . . . , s̃_n^k are Gaussian (zero mean, possibly correlated).
■ Choosing a prior for σ_k completes the model specification.

[Figure: bar plots of the linear correlation and the correlation of squares among s_1^1, s_1^2, s_2^1, s_2^2 (linear correlations ≈ 0, correlations of squares clearly positive); scatter plot of (s_1^1, s_2^1); blue: original data, red: data with the same marginal distributions]
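The dependency structure is easy to verify numerically: sources sharing a variance variable are linearly uncorrelated but have correlated squares. The half-normal prior for σ_k below is my own ad hoc choice:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100000
sigma = np.abs(rng.standard_normal(T))   # common variance variable (ad hoc prior)
s1 = sigma * rng.standard_normal(T)      # s_1^k = sigma_k * s~_1^k
s2 = sigma * rng.standard_normal(T)      # s_2^k = sigma_k * s~_2^k

print(np.corrcoef(s1, s2)[0, 1])         # linear correlation ~ 0
print(np.corrcoef(s1**2, s2**2)[0, 1])   # correlation of the squares is clearly positive
```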
Applying the method – learning the parameters

■ The most interesting parameters of the model are the features q_i^k. They can be learned by maximizing the log-likelihood ℓ(q_1^1, . . . , q_n^d).
■ For the special case of uncorrelated sources,

      ℓ(q_1^1, . . . , q_n^d) = Σ_{t=1}^T Σ_{k=1}^d G_k( Σ_{i=1}^n ((q_i^k)^T z_i(t))^2 ),        (1)

  where

      G_k(u) = log ∫ p_{σ_k}(σ_k) / (2π σ_k^2)^{n/2} · exp( −u / (2 σ_k^2) ) dσ_k.        (2)

■ The equations show that the presented method is related to Independent Subspace Analysis (see paper).
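Objective (1) is cheap to evaluate. A sketch, assuming for simplicity a single nonlinearity G shared by all k (here the ad hoc choice used later for the artificial data); this is an illustration, not the authors' code:

```python
import numpy as np

def G(u):
    # Ad hoc nonlinearity G(u) = -sqrt(0.1 + u)
    return -np.sqrt(0.1 + u)

def log_likelihood(Q, Z):
    """Objective (1): Q[i] is a d x d matrix with the q_i^k as columns,
    Z[i] is a T x d matrix with the observations z_i(t) as rows."""
    # u[t, k] = sum_i ((q_i^k)^T z_i(t))^2, summed over the data sets i
    u = sum((Zi @ Qi) ** 2 for Qi, Zi in zip(Q, Z))
    return G(u).sum()   # sum over t and k
```

Maximizing this over orthonormal Q[i] (e.g. by gradient ascent with re-orthogonalization) yields the feature estimates.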
Testing on artificial data
Simulation setup and goals

■ Setup
  ◆ Three data sets (n = 3) of dimension four (d = 4)
  ◆ No linear correlation in the sources s_i^k
  ◆ Randomly chosen orthonormal mixing matrices Q1, Q2, Q3
  ◆ 10000 observations
  ◆ Learning of the parameters by maximization of the log-likelihood, with the ad hoc nonlinearity G(u) = −√(0.1 + u)
■ Quantities of interest
  ◆ Error in the mixing matrices
  ◆ Identification of the coupling
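The data-generation side of this setup can be reproduced schematically; n, d, T and the orthonormal mixing follow the slide, while the half-normal prior for σ_k is my own assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 3, 4, 10000   # three data sets of dimension four, 10000 observations

# Coupled sources: the k-th sources of all data sets share a variance variable,
# so they are linearly uncorrelated but coactivated
sigma = np.abs(rng.standard_normal((T, d)))                 # assumed prior for sigma_k
S = [sigma * rng.standard_normal((T, d)) for _ in range(n)]

# Randomly chosen orthonormal mixing matrices Q1, Q2, Q3 (QR of a Gaussian matrix)
Q = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]
Z = [S[i] @ Q[i].T for i in range(n)]                       # observed data sets
```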
Results

[Figure: objective value and squared estimation error (×10⁻³) plotted against the fraction of correctly identified couplings; results for one estimation problem, optimization performed for 20 different initializations]

■ Correct coupling at the maximum of the log-likelihood
■ Presence of local maxima (not nice, but no catastrophe; see the following slides)
■ Learning the right mixing matrices without learning the right coupling seems possible.
Application to real data
Setup

■ Data set 1: 10000 image patches of size 25px × 25px, extracted at random locations from natural video data
■ Data set 2: the same image patches, 40ms later
■ For each data set separately: whitening and dimension reduction (98% of the variance retained)
■ Learning of 50 features per data set by maximization of the log-likelihood of the model (same objective function as for the artificial data)
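The patch-pair extraction can be sketched like this; the random array stands in for the natural video, and a one-frame step stands in for the 40ms offset (both placeholders, not the data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder video: (frames, height, width)
video = rng.standard_normal((100, 200, 200))

def sample_patch_pairs(video, n_patches, size=25, dt=1):
    X1, X2 = [], []
    for _ in range(n_patches):
        t = rng.integers(0, video.shape[0] - dt)          # random frame
        r = rng.integers(0, video.shape[1] - size + 1)    # random location
        c = rng.integers(0, video.shape[2] - size + 1)
        X1.append(video[t, r:r+size, c:c+size].ravel())       # data set 1
        X2.append(video[t+dt, r:r+size, c:c+size].ravel())    # same patch, later
    return np.array(X1), np.array(X2)

X1, X2 = sample_patch_pairs(video, n_patches=100)
print(X1.shape)   # (100, 625)
```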
Learned features

[Figure: features learned by our method and by canonical correlation analysis, for the first and the second data set]
Summary
Summary

■ Presented a new method to find related features (structure) in multiple data sets:
  1. The features for each data set are maximally statistically independent.
  2. The features across the data sets tend to be jointly activated: they have statistically dependent variances.
  3. Multiple data sets can be analyzed.
■ In the paper:
  ◆ more theory (in particular, more on the relation to CCA)
  ◆ more simulations with natural images
  ◆ simulations with brain imaging data