Faculty of Psychology and Neuroscience

OHBM 2017: 1536

Faithful reconstruction of imagined letters from 7T fMRI measures in early visual cortex Mario Senden1*, Thomas Emmerling1*, Martin Frost1, Rainer Goebel1,2

1 Department of Cognitive Neuroscience, Maastricht University, The Netherlands; 2 Department of Neuroimaging and Neuromodeling, Netherlands Institute for Neuroscience, The Netherlands; * Equal contributions

Introduction

Correspondence to Mario Senden ([email protected], T +31-(0)43-3882071) or Thomas Emmerling ([email protected], T +31-(0)615857111)

Supplementary Figure 1 (QR code)

In recent years, a number of studies have successfully reconstructed visually perceived images (e.g., Miyawaki et al., 2008; Schoenmakers et al., 2013; Thirion et al., 2006). However, the reconstruction of mental imagery presents a more challenging goal. Previous work has demonstrated functional retinotopic activation of early visual areas during visual mental imagery (e.g., Ganis et al., 2004; Goebel et al., 1998) as well as the possibility of decoding the abstract content of this imagery (e.g., Emmerling et al., 2016; Naselaris et al., 2014). This suggests that it might indeed be possible to reconstruct the specific, recognizable content of mental imagery. Such a reconstruction could open new frontiers for neurofeedback and brain-computer interfaces (BCIs), especially for the development of content-based BCI speller systems. Here we present a proof-of-concept study showing the feasibility of reconstructing four imagined letters from functional magnetic resonance imaging (fMRI) data measured at 7T in early visual cortex.

Results

Figure 4: Visual field coverage. Mapped visual field locations given by the x and y parameters of the isotropic Gaussian pRF model, combined across V1, V2, and V3. While coverage is good for subjects 1 and 3, subject 2 presents with extensive holes in visual field coverage.

Methods Experimental Design Three subjects (mean age 31.1 years, 1 female) underwent three training sessions to practice mental imagery (see figure 1). After training, subjects underwent fMRI measurements comprising one retinotopy run (moving bar), four imagery runs, and one perceptual run during which the four letters were visually presented in trials of 6 seconds separated by resting phases of 9 or 12 seconds. During imagery runs, previously learned tone patterns indicated which letter to imagine; there was no visual stimulation besides the fixation cross and the guiding box. Imagery trials likewise lasted 6 seconds and were separated by 9- or 12-second rest periods.
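As a concrete illustration, the run structure described above can be sketched as an event schedule. Only the 6 s trial duration and the jittered 9 or 12 s rests come from the design; the number of trials per run, the shuffling, and all names are assumptions for illustration.

```python
import random

def build_run_schedule(letters=("H", "T", "S", "C"), reps=4,
                       trial_dur=6.0, rests=(9.0, 12.0), seed=0):
    """Sketch of one run: each 6 s trial is followed by a rest of 9 or
    12 s (jittered). reps trials per letter is an assumed count."""
    rng = random.Random(seed)
    trials = [letter for letter in letters for _ in range(reps)]
    rng.shuffle(trials)
    t, events = 0.0, []
    for letter in trials:
        events.append({"letter": letter, "onset": t, "duration": trial_dur})
        t += trial_dur + rng.choice(rests)  # trial plus jittered rest
    return events

schedule = build_run_schedule()
```

With four repetitions per letter this yields 16 trials whose onsets are spaced by either 15 or 18 seconds.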

Figure 1: Training task. In the reference phase (top), four letters H, T, S & C were paired with a tone pattern. In the trial phase (bottom), the tone pattern was played and the letter shown for 5s (fading out after 3s) followed by an imagery period of 18s, a probing period of 4.5s, and a resting period of 3s or 6s.

Pre-processing Functional data were high-pass filtered using a general linear model (GLM) Fourier basis set of three cycles of sine/cosine. Functional runs were co-registered and aligned to the anatomical scan using an affine transformation (9 parameters) and z-normalized. An isotropic Gaussian population receptive field (pRF) model was fit based on the retinotopy run. Using the resulting eccentricity and polar angle maps, regions-of-interest (ROIs) for visual areas V1, V2, and V3 were defined. The resulting surface patches were projected back into volume space.
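The isotropic Gaussian pRF fit can be sketched as a grid search over candidate parameters. This is a simplified illustration, not the exact procedure used on the poster: HRF convolution is omitted, plain Pearson correlation serves as the fit metric, and all function and variable names are assumptions.

```python
import numpy as np

def fit_isotropic_prf(bold, aperture, grid_x, grid_y, candidates):
    """Grid-search fit of an isotropic Gaussian pRF (sketch).

    bold       : (T,) voxel time course
    aperture   : (T, P) binary stimulus masks of the moving bar,
                 sampled at P visual-field positions
    grid_x/y   : (P,) visual-field coordinates of those positions
    candidates : iterable of (x0, y0, sigma) parameter triples
    Returns the best-correlating (x0, y0, sigma) and its Pearson r.
    """
    z_bold = (bold - bold.mean()) / (bold.std() + 1e-12)
    best, best_r = None, -np.inf
    for x0, y0, s in candidates:
        g = np.exp(-((grid_x - x0) ** 2 + (grid_y - y0) ** 2)
                   / (2.0 * s ** 2))              # Gaussian pRF profile
        pred = aperture @ g                       # predicted response
        pred = (pred - pred.mean()) / (pred.std() + 1e-12)
        r = float(pred @ z_bold) / len(bold)      # Pearson correlation
        if r > best_r:
            best, best_r = (x0, y0, s), r
    return best, best_r
```

The x and y parameters of the winning Gaussian give the visual-field locations plotted in figure 4.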

Pre-training We used a de-noising autoencoder to learn efficient letter features (figure 2). The autoencoder was trained to reproduce letter-typical voxel co-activations (voxel patterns, VPs). Voxel patterns within each ROI were obtained from the perceptual data. First, single-trial VPs were obtained by averaging BOLD activations in the range from +2 to +4 volumes after trial onset. Grand average VPs per letter were subsequently obtained by averaging over all single-trial VPs of a letter and z-normalizing. These grand average VPs were used for training.
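The voxel-pattern extraction just described can be sketched as follows; the function names and the conversion of onsets to volume indices via the TR are assumptions for illustration.

```python
import numpy as np

def single_trial_vps(ts, onsets, tr=3.0):
    """Average volumes +2..+4 after each trial onset (sketch).

    ts     : (T, V) ROI BOLD time series (volumes x voxels)
    onsets : trial onsets in seconds, converted to volume indices via TR
    """
    vps = []
    for onset in onsets:
        v = int(round(onset / tr))
        vps.append(ts[v + 2 : v + 5].mean(axis=0))  # volumes +2, +3, +4
    return np.asarray(vps)

def grand_average_vp(vps):
    """Average the single-trial VPs of one letter, then z-normalize."""
    g = vps.mean(axis=0)
    return (g - g.mean()) / g.std()
```

One grand average VP per letter, computed this way from the perceptual run, serves as a training target for the autoencoder.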


Figure 2: De-noising autoencoder. A single-layer autoencoder was trained to reproduce VPs after adding Gaussian noise (σ = 12). The number of units in the hidden layer was 10% of the ROI's voxels. Hidden units had a sigmoid activation function while output units activated linearly. The learning rate was 10⁻⁶, momentum was 0.9, batches had a size of 100, and loss was measured by the sum of squared distances. Training lasted 2500 iterations.
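A minimal numpy sketch of this de-noising autoencoder is given below. The σ = 12 corruption noise, the 10% hidden-layer size, the 10⁻⁶ learning rate, 0.9 momentum, batch size 100, SSE loss, and 2500 iterations come from the caption; the weight initialization and the plain momentum-SGD update are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(vps, n_hidden=None, sigma=12.0, lr=1e-6, mom=0.9,
              batch=100, iters=2500, seed=0):
    """Single-layer de-noising autoencoder (sketch of figure 2):
    sigmoid hidden layer, linear output, SSE loss, momentum SGD."""
    rng = np.random.default_rng(seed)
    n_vox = vps.shape[1]
    n_hidden = n_hidden or max(1, n_vox // 10)       # 10% of ROI voxels
    W1 = rng.normal(0.0, 0.01, (n_vox, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.01, (n_hidden, n_vox)); b2 = np.zeros(n_vox)
    vel = [np.zeros_like(p) for p in (W1, b1, W2, b2)]
    for _ in range(iters):
        x = vps[rng.integers(0, len(vps), size=batch)]
        x_noisy = x + rng.normal(0.0, sigma, x.shape)  # corrupt input
        h = sigmoid(x_noisy @ W1 + b1)                 # sigmoid hidden
        out = h @ W2 + b2                              # linear output
        err = out - x                                  # SSE gradient wrt out
        dh = (err @ W2.T) * h * (1.0 - h)              # backprop to hidden
        grads = (x_noisy.T @ dh, dh.sum(0), h.T @ err, err.sum(0))
        for p, v, g in zip((W1, b1, W2, b2), vel, grads):
            v *= mom
            v -= lr * g
            p += v                                     # momentum update
    return W1, b1, W2, b2

def denoise(x, W1, b1, W2, b2):
    """Clean voxel patterns with a forward pass through the trained DAE."""
    return sigmoid(x @ W1 + b1) @ W2 + b2
```

The same `denoise` forward pass is what later cleans the noisy imagery-run voxel patterns before reconstruction.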

Figure 5: Reconstructed visual field images (VFIs). Reconstructed VFIs of one run are visualized for each subject and ROI, using voxel patterns averaged in the range from +2 to +4 volumes after trial onset. Perceptual voxel patterns were obtained from the raw BOLD time series, while imagery voxel patterns were obtained from cleaned BOLD time series after feeding the raw data through the autoencoder. For comparison, reconstructions of imagined letters without the autoencoder can be found in supplementary figure 1 (see QR code above).


Data Acquisition We recorded anatomical (voxel size = 0.7 × 0.7 × 0.7 mm³, TR = 5000 ms) and functional (voxel size = 0.8 × 0.8 × 0.8 mm³, TR = 3000 ms) images with a Siemens Magnetom 7T scanner and a 32-channel head coil. The field of view covered occipital, parietal, and temporal areas.


Figure 6: Multi-voxel pattern analysis classification accuracies. Average classification accuracies across the four folds of the leave-one-run-out procedure on imagery data are given for four ROIs in each subject. Classification was performed on letter-specific voxel patterns averaged in the range from +2 to +4 volumes after trial onset. The black horizontal line indicates the accuracy expected by chance; grey horizontal lines demarcate the 95th percentile of permutation classification accuracies (1000 permutations). Average classification accuracies exceed both the theoretical chance level and the 95th percentile in all ROIs and subjects.
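The permutation threshold in figure 6 can be sketched as below. This is a simplification: the poster presumably re-ran the classification pipeline per permutation, whereas here fixed predictions are scored against shuffled labels; the function name is an assumption.

```python
import numpy as np

def permutation_threshold(y_true, y_pred, n_perm=1000, q=95, seed=0):
    """Empirical chance threshold: the q-th percentile of accuracies
    obtained by scoring fixed predictions against shuffled labels."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accs = np.array([(rng.permutation(y_true) == y_pred).mean()
                     for _ in range(n_perm)])
    return float(np.percentile(accs, q))
```

With four balanced classes the threshold lands slightly above the theoretical chance level of 0.25, which is why figure 6 reports both.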

Reconstruction Based on the mapping from visual field to cortex given by volume pRFs, we obtained weights mapping the cortex to a visual field image (VFI): W^VFI = (W^pRF D)^T, where D is a diagonal matrix of the inverse outdegree of each pixel in the visual field. Using these weights, VFIs were calculated from grand average VPs (VFI = W^VFI · VP). Since activations for imagery runs were noisy compared to perceptual runs, they were cleaned using the pre-trained autoencoder prior to extracting grand average VPs.
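This weight construction can be sketched directly; the (voxels × pixels) orientation of W^pRF and the guard for uncovered pixels are assumptions for illustration.

```python
import numpy as np

def vfi_weights(w_prf):
    """W_VFI = (W_pRF D)^T with D = diag(1 / outdegree of each pixel).

    w_prf : (V, P) pRF weights mapping a visual-field image (P pixels)
            to cortical activity (V voxels)
    """
    outdeg = w_prf.sum(axis=0)                     # per-pixel outdegree
    d = np.where(outdeg > 0, 1.0 / outdeg, 0.0)    # guard empty pixels
    return (w_prf * d).T                           # (W_pRF D)^T, (P, V)

def reconstruct_vfi(w_vfi, vp):
    """VFI = W_VFI · VP: project a voxel pattern into the visual field."""
    return w_vfi @ vp
```

Since D is diagonal, right-multiplying W^pRF by D and transposing is the same as normalizing each pixel's row of (W^pRF)^T by that pixel's outdegree.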

Classification

Figure 3: Letter-classifier. A four-unit softmax classifier was stacked on the pre-trained hidden layer (red weights). The network was then trained to classify single-trial VPs in imagery runs. These runs were split into training and testing datasets in a leave-one-run-out procedure. The learning rate was 10⁻⁴, momentum was 0.9, batches had a size of 96, and loss was measured by cross-entropy. Training lasted 250 iterations.
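A sketch of this read-out stage follows. The learning rate 10⁻⁴, momentum 0.9, batch size 96, cross-entropy loss, and 250 iterations come from the caption; keeping the pre-trained hidden layer frozen, the initialization, and all names are simplifying assumptions.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_letter_classifier(vps, labels, W1, b1, n_classes=4,
                            lr=1e-4, mom=0.9, batch=96, iters=250, seed=0):
    """Softmax read-out stacked on the pre-trained DAE hidden layer
    (sketch of figure 3); cross-entropy loss, momentum SGD. The hidden
    layer (W1, b1) is kept frozen here for simplicity."""
    rng = np.random.default_rng(seed)
    h_all = _sigmoid(vps @ W1 + b1)                  # pre-trained features
    W = rng.normal(0.0, 0.01, (W1.shape[1], n_classes))
    b = np.zeros(n_classes)
    vW, vb = np.zeros_like(W), np.zeros_like(b)
    onehot = np.eye(n_classes)[labels]
    for _ in range(iters):
        idx = rng.integers(0, len(vps), size=batch)
        h, y = h_all[idx], onehot[idx]
        z = h @ W + b
        z -= z.max(axis=1, keepdims=True)            # numerical stability
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)            # softmax
        err = p - y                                  # d(cross-entropy)/dz
        vW = mom * vW - lr * (h.T @ err); W += vW
        vb = mom * vb - lr * err.sum(0);  b += vb
    return W, b

def classify(vps, W1, b1, W, b):
    """Predict letter indices for voxel patterns."""
    return (_sigmoid(vps @ W1 + b1) @ W + b).argmax(axis=1)
```

In a leave-one-run-out scheme, `train_letter_classifier` would be fit on three imagery runs and `classify` evaluated on the held-out run.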

Conclusions

Our proof-of-concept study shows that recognizable reconstructions of imagined letters from high-resolution fMRI data of the visual cortex are feasible. This constitutes an important first step in the development of content-based BCI speller systems. Nevertheless, two limiting factors need to be borne in mind when developing such systems. First, population receptive fields need to be estimated with a high degree of precision. This is evident for subject 2, who had the worst visual field coverage and also showed the lowest reconstruction fidelity. Second, mental imagery is cognitively very demanding. As such, the spatial distribution of brain activation in early visual cortex may be noisy and deviate from the expectations given by retinotopy. This requires subjects to be trained extensively to maintain a vivid image of each letter in mind, as well as analysis tools (such as the de-noising autoencoder used here) able to capture the essential features in the data. In a real-time setting, the problem might be further alleviated by providing subjects with feedback in the form of online reconstructions of their imagined letters.

Funding This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 720270 (HBP SGA1) as well as under ERC-2010-AdG grant 269853.

References
Emmerling, T. C., Zimmermann, J., Sorger, B., Frost, M. A., & Goebel, R. (2016). Decoding the direction of imagined visual motion using 7T ultra-high field fMRI. NeuroImage, 125, 61–73.
Ganis, G., Thompson, W. L., & Kosslyn, S. M. (2004). Brain areas underlying visual mental imagery and visual perception: an fMRI study. Cognitive Brain Research, 20(2), 226–241.
Goebel, R., Khorram-Sefat, D., Muckli, L., Hacker, H., & Singer, W. (1998). The constructive nature of vision: direct evidence from functional magnetic resonance imaging studies of apparent motion and motion imagery. European Journal of Neuroscience, 10(5), 1563–1573.
Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M., Morito, Y., Tanabe, H. C., … Kamitani, Y. (2008). Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron, 60(5), 915–929.
Naselaris, T., Olman, C. A., Stansbury, D. E., Ugurbil, K., & Gallant, J. L. (2014). A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes. NeuroImage, 105, 215–228.
Schoenmakers, S., Barth, M., Heskes, T., & van Gerven, M. (2013). Linear reconstruction of perceived images from human brain activity. NeuroImage, 83, 951–961.
Thirion, B., Duchesnay, E., Hubbard, E., Dubois, J., Poline, J.-B., Lebihan, D., & Dehaene, S. (2006). Inverse retinotopy: inferring the visual content of images from brain activation patterns. NeuroImage, 33(4), 1104–1116.