Extracting Coactivated Features from Multiple Data Sets

Michael U. Gutmann, University of Helsinki ([email protected])
Aapo Hyvärinen, University of Helsinki ([email protected])
Michael U. Gutmann – University of Helsinki
ICANN 2011 - p. 1
Contents

This talk is about a new method to find related features (structure) in multiple data sets.

■ Introduction: background on the extraction of related features from multiple data sets
■ Extraction of coactivated features: the statistical model underlying our method
■ Testing our method on artificial data
■ Application to real data (here: natural images; in the paper: also brain imaging data)
■ Summary
Introduction
An example

[Figure: scatter plots of Data set 1 and Data set 2]

Goals:
1. Characterize each data set separately → eigenvalues and eigenvectors of the covariance matrices
2. Find relations between the two data sets
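Goal 1 can be sketched numerically with numpy; the data below is a synthetic stand-in, not the data from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one data set: 10000 observations of a 2-d vector
X = rng.standard_normal((10000, 2)) @ np.array([[3.0, 1.0], [0.0, 1.0]])

# Eigenvalues and eigenvectors of the covariance matrix characterize the set
C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
print(eigvals)    # variances along the principal axes, in ascending order
print(eigvecs)    # columns are the corresponding eigenvectors
```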
Correlation based method (1/3)

[Figure: whitened data sets 1 and 2 (normalized representation)]

■ After a change of basis (rotation), the data is still white
■ Whitening is therefore defined only up to a rotation
■ Choose the coordinate systems which best describe the relation between the two data sets 1 and 2
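A minimal whitening sketch with numpy; the last lines illustrate the rotation ambiguity stated above (synthetic data, illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])
X = X - X.mean(axis=0)

# Whitening: decorrelate and scale to unit variance
eigvals, E = np.linalg.eigh(np.cov(X, rowvar=False))
V = E @ np.diag(eigvals ** -0.5) @ E.T      # symmetric whitening matrix
Z = X @ V.T
print(np.cov(Z, rowvar=False))              # ~ identity

# Any rotation R leaves the data white: cov(Z R^T) = R cov(Z) R^T = I
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.cov(Z @ R.T, rowvar=False))        # still ~ identity
```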
Correlation based method (2/3)

[Figure: whitened data sets 1 and 2 with coordinate systems defined by e1 and e2 (rotation angles α1, α2); plot of the correlation between the projections on ei as a function of α1 and α2]

■ The x-coordinate of a data point in the coordinate system defined by ei is given by its projection on ei.
■ Compute the correlation between the x-coordinates for different coordinate systems.
■ Choose the coordinate systems for which the x-coordinates are most strongly correlated (here: correlation coefficient ≈ 0.6).
■ The described method is called Canonical Correlation Analysis (CCA).
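CCA as described can be sketched with numpy: whiten both data sets, then obtain the best-correlated coordinate systems from an SVD of the cross-covariance of the whitened data. This is a standard derivation, not code from the paper, and the data is synthetic:

```python
import numpy as np

def whiten(X):
    X = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    return X @ (E @ np.diag(d ** -0.5) @ E.T).T

rng = np.random.default_rng(1)
s = rng.standard_normal(10000)                        # source shared by both sets
X = np.column_stack([s, rng.standard_normal(10000)])
Y = np.column_stack([0.8 * s + 0.6 * rng.standard_normal(10000),
                     rng.standard_normal(10000)])

Zx, Zy = whiten(X), whiten(Y)
# Singular vectors of the cross-covariance give the canonical directions;
# the singular values are the canonical correlations.
U, corrs, Vt = np.linalg.svd(Zx.T @ Zy / len(Zx))
print(corrs)   # first canonical correlation is large, second is near zero
```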
Correlation based method (3/3)

[Figure: data sets 1 and 2, their whitened versions with coordinate systems e1 and e2, and the correlation between the projections on ei as a function of α1 and α2 (all correlations within ≈ ±0.08)]

→ The method does not seem to work here.
What does “not work” mean?

It means
1. that we did not find meaningful features within each data set
2. that we did not find any relation between the two data sets

[Figure: whitened data sets 1 and 2 with axes 1 and 2 (blue: original data; red: data with the same marginal distributions), and the projection on axis 2 plotted against the projection on axis 1]

⇓ Identify independent components        ⇓ Identify coactivation
In this talk . . .

I will present a method where
1. the features for each data set are maximally statistically independent,
2. the features across the data sets tend to be jointly activated: they have statistically dependent variances,
3. multiple data sets can be analyzed.
Extraction of coactivated features
Statistical model underlying our method (1/2)

■ Given n data sets, we assume that each set is formed by i.i.d. observations of a random vector z_i ∈ R^d.
■ To model structure within a data set, we assume that

      z_i = Σ_{k=1}^d q_i^k s_i^k    (i = 1, . . . , n)

  where the q_i^k are orthonormal and s_i^1, . . . , s_i^d are statistically independent.
■ To model structure across the data sets, we assume that s_1^k, . . . , s_n^k are statistically dependent.

[Figure: scatter plots of the two data sets with the feature vectors q_1^1, q_1^2 and q_2^1, q_2^2, and of the sources (s_1^1, s_2^1); blue: original data, red: data with the same marginal distributions]
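The within-set part of the model can be sketched for one data set; the Laplace sources and the QR-based orthogonal matrix are my own illustrative choices. The last line shows the consequence of orthonormality used throughout the talk: projecting on a feature recovers its source.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 2, 5000

S = rng.laplace(size=(T, d))                       # independent sources s_i^1, ..., s_i^d
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthonormal feature vectors q_i^k as columns

# z_i = sum_k q_i^k s_i^k, i.e. each observation is Q @ s
Z = S @ Q.T

# Orthonormality means projection recovers the sources: s_i^k = (q_i^k)^T z_i
print(np.allclose(Z @ Q, S))   # True
```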
Statistical model underlying our method (2/2)

■ Dependency assumptions: the k-th sources from all the data sets share a common (latent) variance variable σ_k:

      s_1^k = σ_k s̃_1^k,   s_2^k = σ_k s̃_2^k,   . . . ,   s_n^k = σ_k s̃_n^k

■ The s̃_1^k, . . . , s̃_n^k are Gaussian (zero mean, possibly correlated).
■ Choosing a prior for σ_k completes the model specification.

[Figure: bar plots of the linear correlation and the correlation of squares among s_1^1, s_1^2, s_2^1, s_2^2 (linear correlations ≈ 0, correlations of squares clearly positive); scatter plot of (s_1^1, s_2^1); blue: original data, red: data with the same marginal distributions]
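The dependency structure is easy to verify numerically: sources sharing a variance variable are linearly uncorrelated but have correlated squares. The half-normal prior for σ_k below is my own ad hoc choice:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100000
sigma = np.abs(rng.standard_normal(T))   # common variance variable (ad hoc prior)
s1 = sigma * rng.standard_normal(T)      # s_1^k = sigma_k * s~_1^k
s2 = sigma * rng.standard_normal(T)      # s_2^k = sigma_k * s~_2^k

print(np.corrcoef(s1, s2)[0, 1])         # linear correlation ~ 0
print(np.corrcoef(s1**2, s2**2)[0, 1])   # correlation of the squares is clearly positive
```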
Applying the method – learning the parameters

■ The most interesting parameters of the model are the features q_i^k. They can be learned by maximizing the log-likelihood ℓ(q_1^1, . . . , q_n^d).
■ For the special case of uncorrelated sources,

      ℓ(q_1^1, . . . , q_n^d) = Σ_{t=1}^T Σ_{k=1}^d G_k( Σ_{i=1}^n ((q_i^k)^T z_i(t))^2 ),        (1)

  where

      G_k(u) = log ∫ p_{σ_k}(σ_k) / (2π σ_k^2)^{n/2} · exp( −u / (2 σ_k^2) ) dσ_k.        (2)

■ The equations show that the presented method is related to Independent Subspace Analysis (see paper).
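Objective (1) is cheap to evaluate. A sketch, assuming for simplicity a single nonlinearity G shared by all k (here the ad hoc choice used later for the artificial data); this is an illustration, not the authors' code:

```python
import numpy as np

def G(u):
    # Ad hoc nonlinearity G(u) = -sqrt(0.1 + u)
    return -np.sqrt(0.1 + u)

def log_likelihood(Q, Z):
    """Objective (1): Q[i] is a d x d matrix with the q_i^k as columns,
    Z[i] is a T x d matrix with the observations z_i(t) as rows."""
    # u[t, k] = sum_i ((q_i^k)^T z_i(t))^2, summed over the data sets i
    u = sum((Zi @ Qi) ** 2 for Qi, Zi in zip(Q, Z))
    return G(u).sum()   # sum over t and k
```

Maximizing this over orthonormal Q[i] (e.g. by gradient ascent with re-orthogonalization) yields the feature estimates.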
Testing on artificial data
Simulation setup and goals

■ Setup
  ◆ Three data sets (n = 3) of dimension four (d = 4)
  ◆ No linear correlation in the sources s_i^k
  ◆ Randomly chosen orthonormal mixing matrices Q1, Q2, Q3
  ◆ 10000 observations
  ◆ Learning of the parameters by maximization of the log-likelihood, with the ad hoc nonlinearity G(u) = −√(0.1 + u)
■ Quantities of interest
  ◆ Error in the mixing matrices
  ◆ Identification of the coupling
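The data-generation side of this setup can be reproduced schematically; n, d, T and the orthonormal mixing follow the slide, while the half-normal prior for σ_k is my own assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 3, 4, 10000   # three data sets of dimension four, 10000 observations

# Coupled sources: the k-th sources of all data sets share a variance variable,
# so they are linearly uncorrelated but coactivated
sigma = np.abs(rng.standard_normal((T, d)))                 # assumed prior for sigma_k
S = [sigma * rng.standard_normal((T, d)) for _ in range(n)]

# Randomly chosen orthonormal mixing matrices Q1, Q2, Q3 (QR of a Gaussian matrix)
Q = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]
Z = [S[i] @ Q[i].T for i in range(n)]                       # observed data sets
```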
Results

[Figure: objective value and squared estimation error (×10⁻³) plotted against the fraction of correctly identified couplings; results for one estimation problem, optimization performed for 20 different initializations]

■ Correct coupling at the maximum of the log-likelihood
■ Presence of local maxima (not nice, but no catastrophe; see the following slides)
■ Learning the right mixing matrices without learning the right coupling seems possible.
Application to real data
Setup

■ Data set 1: 10000 image patches of size 25px × 25px, extracted at random locations from natural video data
■ Data set 2: the same image patches, 40ms later
■ For each data set separately: whitening and dimension reduction (98% of the variance retained)
■ Learning of 50 features per data set by maximization of the log-likelihood of the model (same objective function as for the artificial data)
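The patch-pair extraction can be sketched like this; the random array stands in for the natural video, and a one-frame step stands in for the 40ms offset (both placeholders, not the data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder video: (frames, height, width)
video = rng.standard_normal((100, 200, 200))

def sample_patch_pairs(video, n_patches, size=25, dt=1):
    X1, X2 = [], []
    for _ in range(n_patches):
        t = rng.integers(0, video.shape[0] - dt)          # random frame
        r = rng.integers(0, video.shape[1] - size + 1)    # random location
        c = rng.integers(0, video.shape[2] - size + 1)
        X1.append(video[t, r:r+size, c:c+size].ravel())       # data set 1
        X2.append(video[t+dt, r:r+size, c:c+size].ravel())    # same patch, later
    return np.array(X1), np.array(X2)

X1, X2 = sample_patch_pairs(video, n_patches=100)
print(X1.shape)   # (100, 625)
```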
Learned features

[Figure: features learned by our method and by canonical correlation analysis, for the first and the second data set]
Summary
Summary

■ Presented a new method to find related features (structure) in multiple data sets:
  1. The features for each data set are maximally statistically independent.
  2. The features across the data sets tend to be jointly activated: they have statistically dependent variances.
  3. Multiple data sets can be analyzed.
■ In the paper:
  ◆ more theory (in particular, more on the relation to CCA)
  ◆ more simulations with natural images
  ◆ simulations with brain imaging data