Gaussian process regression and recurrent neural networks for fMRI image classification
Emanuele Olivetti(1,2), Diego Sona(1), Sriharsha Veeramachaneni(1)
{olivetti|sona|sriharsha}@itc.it
(1) ITC/Irst  (2) ICT International Doctorate School, University of Trento
Via Sommarive, Povo – Trento (Italy)

Or better...
Dimensionality reduction and recurrent neural networks for feature rating prediction from fMRI data
Emanuele Olivetti(1,2), Diego Sona(1), Sriharsha Veeramachaneni(1)
{olivetti|sona|sriharsha}@itc.it
(1) ITC/Irst  (2) ICT International Doctorate School, University of Trento
Via Sommarive, Povo – Trento (Italy)
Outline
● Foreword
● Pre-Processing:
  – Smoothing
  – Mutual Information
  – Clustering
● Prediction:
  – Recurrent Neural Networks (NO GPR!)
● Results & Conclusion
● Future Work

HBM 2006 - June, 15th 2006
Prologue: February 2006
● no prior domain knowledge (Wikipedia)
● no specific software
● just for fun, starting from scratch

Initial Dataset:
● 3x3 pre-processed and spatially normalized brain image data (Analyze format): INPUT
● 2x3x27(19) feature ratings data convolved with a hemodynamic filter: TARGET
Schema of our Approach
● Create a generic model to map fMRI data of a single subject to a single Target Feature Rating:
  – Pre-processing: reduce noise and dimensionality of the input dataset (curse of dimensionality): ~10^5 voxels → ~10^2 signals
  – Prediction: predict Feature Ratings using non-linear, time-dependent regression

From now on: Subject 1 and Language
Pre-processing
Pre-Processing (1): Smoothing
Pre-Processing (2): Mutual Information

I(X;Y) = ∑_{y∈Y} ∑_{x∈X} p(x,y) log [ p(x,y) / (p(x)·p(y)) ]

● Mutual Information measures the “informativeness” of a voxel with respect to a given Target Feature
● For each voxel: compute the Mutual Information between its values in time and each feature rating

[MI map: Subject 1 – Movie 1 – Language, slice 22]
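The per-voxel score above can be estimated by binning the two signals into a joint histogram. A minimal sketch; the bin count and the plug-in histogram estimator are assumptions, since the slides do not say how the continuous signals were discretized:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Estimate I(X;Y) between two time series by histogram binning."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()               # joint probability p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x), shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, bins)
    nz = p_xy > 0                            # avoid log(0) on empty cells
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```

A voxel whose time course tracks the feature rating gets a high score; an unrelated voxel scores near zero.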
Pre-processing (2): Mutual Information

[MI map: Language – Subject 1, Movie 1]
Pre-processing (2): Mutual Information – Thresholding and Merging
● Thresholding: keep the top 1.5x10^4 voxels per movie
● Merging = intersection of the Movie 1 and Movie 2 selections
● ~80% overlap between movie 1 and movie 2 → ~10^4 voxels retained

[Subject 1 – Movie 1 – Language, slice 22: Movie 1 vs. Movie 2 selections]
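The threshold-then-intersect step can be sketched directly from per-voxel MI scores. Function and argument names here are illustrative, not from the slides:

```python
import numpy as np

def select_voxels(mi_movie1, mi_movie2, top_k=15_000):
    """Keep the top-MI voxels of each movie, then intersect the two sets.

    mi_movie1 / mi_movie2: per-voxel mutual-information scores (1-D arrays
    over the same voxel indexing).
    """
    top1 = set(np.argsort(mi_movie1)[-top_k:])   # top_k voxels for movie 1
    top2 = set(np.argsort(mi_movie2)[-top_k:])   # top_k voxels for movie 2
    return sorted(top1 & top2)   # voxels informative in both movies
```

With ~80% overlap between the two per-movie selections of 1.5x10^4 voxels, the intersection lands near the ~10^4 voxels the slide reports.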
Pre-processing (2): Mutual Information – Thresholding and Merging

[MI maps after thresholding and merging: Language – Subject 1; Language – Subject 2; Music – Subject 2; Motion – Subject 2]
Pre-Processing (3): Clustering
● Idea: reduce 10^4 voxels to ~10^2 representatives
● K-Means clustering, K=200
● d(voxelA, voxelB) = f(spatial distance, temporal correlation):
  d = α·d_spatial + (1−α)·(1−r_temporal),  α = 0.5
● correlation: movie 1*+2*+3

HBM 2006 - June, 15th 2006
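The pairwise dissimilarity can be sketched as follows. The exact way the slide combines the two terms is an assumption here (a convex mix of Euclidean distance and correlation distance, with the slide's α = 0.5):

```python
import numpy as np

def voxel_distance(pos_a, pos_b, ts_a, ts_b, alpha=0.5):
    """Combined spatial/temporal dissimilarity between two voxels.

    A sketch of d = f(spatial distance, temporal correlation); the convex
    combination below is an assumed form, not stated on the slide.
    """
    d_spatial = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b))
    r = np.corrcoef(ts_a, ts_b)[0, 1]            # temporal correlation
    return alpha * d_spatial + (1 - alpha) * (1 - r)
```

Two co-located, perfectly correlated voxels get distance 0; anti-correlated time courses push the distance up even at zero spatial separation.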
Pre-processing (3): Clustering

[Cluster maps: Language – Subject 1; Language – Subject 1, cluster 21 (~300 voxels)]
Pre-processing (3): Clustering – Averaging
● For each cluster, compute the average over its voxels:
  – 1 cluster (≈10^2 voxels) → 1 average
  – 1 movie = 200 averages x 10^3 timesteps

Dimensionality: ~10^5 voxels x 10^3 timesteps → ~10^2 averages x 10^3 timesteps
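The averaging step above can be sketched in a few lines (array layout is an assumption: voxels on the rows, timesteps on the columns):

```python
import numpy as np

def cluster_averages(data, labels, k=200):
    """Collapse voxel time series to one average series per cluster.

    data: (n_voxels, n_timesteps) array; labels: cluster id per voxel
    (e.g. from K-means with K=200). Returns a (k, n_timesteps) array.
    """
    out = np.zeros((k, data.shape[1]))
    for c in range(k):
        out[c] = data[labels == c].mean(axis=0)   # average over the cluster's voxels
    return out
```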
Prediction
Prediction: mixture of Recurrent Neural Networks
● RNN pros:
  – simple/standard
  – non-linear
  – exploits time-dependence of inputs (brain and movie inertia)
  – no need for priors on the data distribution
● RNN cons:
  – RNN could overfit
Prediction: RNN – Model
● 200 input units
● 4 hidden units (tanh)
● 1 output unit (logistic)

y_t = f(w_y^T x_t)
x_t = g(w_u^T u_t + w_x x_{t−1}),  x_0 = 0

  – u_t = input vector at time t
  – x_{t−1} = internal state at (t−1)

● trained with BackPropagation Through Time
● each input normalized separately
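The two update equations above amount to the following forward pass. A minimal sketch; the bias term and weight shapes are assumptions, since the slide only shows the weight vectors:

```python
import numpy as np

def rnn_forward(U, Wu, Wx, wy, by=0.0):
    """Forward pass of the slide's recurrent model.

    x_t = tanh(Wu @ u_t + Wx @ x_{t-1}), x_0 = 0   (hidden state)
    y_t = logistic(wy @ x_t + by)                  (single output unit)

    U: (T, n_in) inputs; Wu: (n_hidden, n_in); Wx: (n_hidden, n_hidden);
    wy: (n_hidden,). The bias by is an added assumption.
    """
    x = np.zeros(Wx.shape[0])                      # x_0 = 0
    y = np.empty(U.shape[0])
    for t in range(U.shape[0]):
        x = np.tanh(Wu @ U[t] + Wx @ x)            # recurrent tanh hidden layer
        y[t] = 1.0 / (1.0 + np.exp(-(wy @ x + by)))  # logistic output
    return y
```

In the talk's setting n_in = 200 (cluster averages) and n_hidden = 4; training by BPTT is not sketched here.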
Prediction: RNN mixture

[Diagram: 15 RNNs, trained on movies 1 and 2, combined to predict movie 3]

Mixture: Y_t = ∑_{i=1}^{15} max(0, r_i)·y_t^i
(r_i = correlation of model i on the validation set)
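The mixture formula can be sketched as below. Normalizing by the sum of weights is an assumption taken from the training slide, which describes the combination as a weighted average:

```python
import numpy as np

def mixture_predict(preds, r):
    """Combine per-model predictions by validation-set correlation.

    preds: (n_models, T) array of per-model outputs y^i;
    r: (n_models,) correlations on the validation set. Models with
    negative correlation get weight 0 (they were discarded).
    """
    w = np.maximum(0.0, np.asarray(r, dtype=float))
    return (w[:, None] * preds).sum(axis=0) / w.sum()   # weighted average
```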
Prediction: Observations / Insights
● easy features (Faces, Language, etc.): 1 or 2 hidden units are enough
● difficult features (Food, Attention, etc.): accuracy (low) is independent of the number of hidden units:
  – did pre-processing remove important information?
  – or is the information not present in the fMRI data?
● NOTE: we don't exploit the correlation between features
Conclusions
● The proposed method is very general:
  – no hand tuning for features
  – no hand tuning for subjects
● score > 0.482 on base features, using movies 1 and 2 (train set) to predict movie 3
● pre-processing is more important than prediction
● averaging predictions over subjects is more accurate (objectivity vs. subjectivity?)
● our 3rd (best) submission covered only base features, for lack of CPU time
Future Work
● Exploit:
  – volumes' autocorrelation during pre-processing
  – objectivity/subjectivity per feature
  – features' inter-dependence
  – linguistic context (features' smooth evolution)
  – morphological context (volumes' smooth evolution)
  – past and future (contextual RNN)
● Explore different spatial/temporal trade-offs during clustering

Interested in details? Come to poster 677
We'd like to thank...
● the Scientific/Numerical Python project (http://www.scipy.org, http://www.numpy.org) for their amazing numerical libraries
● M. de Hoon, S. Imoto, S. Miyano, Laboratory of DNA Information Analysis, Human Genome Center – University of Tokyo, for the Pycluster library
● Francisco Pereira, for his post on the “Machine Learning (Theory)” blog: http://hunch.net/?p=166
Some extra slides

Mutual Information (continued)
Training Process: RNN mixture
Given a subject and a feature:
● join movie 1 and movie 2, remove blanks
● 5-fold cross-validation
● train each of the five models 3 times with different random initializations of the parameters
● among the 15 resulting models, remove those with negative correlation on the validation set
● evaluate the remaining models on movie 3
● average all outputs, weighted by their correlations on the validation set (weighted average)
This presentation is distributed under a Creative Commons License: Attribution 2.5

You are free:
● to copy, distribute, display, and perform the work
● to make derivative works
● to make commercial use of the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor.
For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder.