A Dictionary Learning Algorithm for Multi-Channel Neural Recordings

Tao Xiong, Yuanming Suo, Jie Zhang, Siwei Liu, Ralph Etienne-Cummings, Sang Chin, Trac D. Tran
Department of Electrical and Computer Engineering, The Johns Hopkins University
Email: {tao.xiong, ysuo1, jzhang41, sliu71, retienne, schin11, trac}@jhu.edu

Abstract— Multi-channel neural recording devices are widely used for in vivo neuroscience experiments. Driven by high signal frequencies and large channel counts, the acquisition rate can reach the order of hundreds of MB/s, which calls for compression before wireless transmission. In this paper, we adopt the Compressed Sensing framework with a simple on-chip implementation. To improve performance while reducing the number of measurements, we propose a multi-modal structured dictionary learning algorithm that enforces both group sparsity and joint sparsity to learn sparsifying dictionaries for all channels simultaneously. When the data is compressed by 50 times, our method achieves gains of 4 dB in reconstruction quality and 10 percentage units in classification accuracy over state-of-the-art approaches.

Fig. 1. A schematic of the proposed multi-channel CS approach for a Tetrode setup. The multi-modal structured dictionary learning is our key contribution in this paper.

I. INTRODUCTION

In the brain, neurons communicate with each other through the generation of neural action potentials, commonly known as spikes. These spikes are short-lasting events generated by neurons through a rapid change of membrane potential. Neuroscientists use electrodes to measure these spikes in order to study the activity of the brain. Typically, such experiments require measurements of spikes not only from one neuron, but also from a group of neurons in a small area of the brain. As a result, multi-electrode arrays (MEAs) and silicon probes containing hundreds or even tens of thousands of recording sites have been designed. The spikes are typically sampled at 20 to 30 kHz at a resolution above 8 bits. Thus, an MEA with hundreds of recording sites generates data on the order of MB/s. Transmitting this large volume of data wirelessly requires power consumption on the order of mW [1]. To meet the requirements of both high acquisition rate and energy efficiency, compression techniques must be used.

Compressed Sensing (CS) [2] has been shown to be promising for neural signal compression. Our previous work [3] shows that a signal-dependent CS approach can achieve a much better compression rate, recovery performance and spike classification accuracy than traditional methods (i.e., spike detection, on-chip wavelet transformation and wavelet-based CS methods). There are two key elements in a CS system, the sensing matrix and the sparsifying dictionary, as shown in Figure 1. The sensing matrix S is implemented on-chip to compress the signal before transmission or readout. A random Bernoulli sensing matrix can be implemented on-chip using simple digital accumulators operating at the signal Nyquist rate (a small code sketch of this compression step is given at the end of this section). The sparsifying dictionary D is a transform domain in which the signal can be represented with only a few coefficients. The design of this dictionary is the key to guaranteeing good recovery performance and classification quality. Our previous approaches [3], [4] used a signal-dependent dictionary and a data dictionary to demonstrate state-of-the-art reconstruction and classification performance for single-channel neural recordings.

In this paper, we extend our signal-dependent CS approach to a Tetrode setup, which can also be applied to MEAs due to the correlation among local electrodes, with the following contributions.

a) Improved spike classification quality using group sparsity: It is well known that spikes generated by different neurons have distinct shapes. By introducing group sparsity in the dictionary learning, we map each neuron to a specific group of dictionary atoms. Thus, in reconstruction, we choose atoms only from the same group, rather than from different groups, to recover each spike. This group sparsity constraint is shown to significantly improve the classification performance.

b) Enhanced reconstruction by joint sparsity: In a multi-electrode recording, spikes from a single neuron are often picked up by all electrodes surrounding it. To take advantage of this correlation between measurements of different electrodes, we add a joint sparsity constraint so that the recovered spikes for different channels strictly resemble the same neuron. This helps us achieve much better recovery performance with the same number of measurements.

The rest of the paper is organized as follows. In Section II, we introduce prior work and our multi-modal structured dictionary learning method. In Section III, we compare the proposed approach with other CS-based approaches using both synthetic and real datasets. We conclude the paper in Section IV.
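As an illustration of the on-chip compression stage described above, the following is a minimal NumPy sketch (our illustration, not the authors' hardware or code) of compressing one channel's spike segment with a random ±1 Bernoulli sensing matrix; the segment length N = 128 and the seed are hypothetical choices.

import numpy as np

def bernoulli_sensing_matrix(m, n, seed=0):
    # Random +/-1 Bernoulli sensing matrix S of size M x N (the paper realizes
    # this on-chip with simple digital accumulators; here it is just an array).
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(m, n))

# Hypothetical example: compress a length-128 spike segment by 50x.
n, cr = 128, 50
m = max(n // cr, 1)            # number of measurements M = N / CR
x = np.random.randn(n)         # stand-in for one channel's spike segment x_t
S = bernoulli_sensing_matrix(m, n)
y = S @ x                      # compressed measurements y_t = S x_t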


Fig. 2. An illustration of the shape differences among spikes of three different neurons (color coded) using the synthetic dataset from [6].

II. MULTI-MODAL STRUCTURED DICTIONARY LEARNING

A. Prior Work

For spike signals, there are two approaches to designing the sparsifying dictionary D in the CS framework. The first is to use a signal-agnostic dictionary, such as a wavelet or Gabor dictionary [5], since spikes can be sparsely represented in these time-frequency dictionaries. On the other hand, since each neuron registers a unique and repetitive spike signal at the recording electrode, we can instead learn a signal-dependent dictionary using prior information about the spikes. The latter approach represents the spikes more sparsely, so it outperforms signal-agnostic dictionaries in terms of compression rate, recovery quality and signal classification accuracy [3]. Moreover, we have also shown that using the data directly as the sparsifying dictionary leads to performance comparable to signal-dependent dictionaries at a much lower computational cost [4]. Nevertheless, these approaches are designed only for single-channel recordings, so none of them considers the correlation between measurements of multiple channels. Moreover, the label differences between dictionary atoms of different neurons are not taken into account.

B. Our Approach

1) Signal model: We assume there are T channels and G groups (classes) of neural signals in the data. We incorporate two key features in our signal model: group sparsity and joint sparsity. As shown in Figure 3, the neural signal x_t of each channel t (i.e., t = 1, ..., 4 in a Tetrode setup) is linearly represented using the corresponding dictionary D_t, which is a concatenation of sub-dictionaries D_{t,g} for the different types of neural signals. For example, D_{4,1} contains the dictionary atoms for class 1 in channel 4.

Fig. 3. An illustration of the group structure (color-coded blocks) and joint sparsity (red dotted lines) for multi-modal structured dictionary learning.

We let the coefficient matrix A = [a_1, a_2, ..., a_T], where a_t denotes the sparse coefficient vector of channel t and a_{t,g} is the sub-vector extracted from a_t using group structure g. For example, a_{4,1} is the sub-vector in a_4 that contains the coefficients for the sub-dictionary D_{4,1}. As shown in Figure 2, different neurons generate spikes that are unique and repetitive in shape. Therefore, it is better to represent the signal using dictionary atoms belonging to a single group g (same neuron) rather than atoms from different groups (different neurons). To achieve this goal, we enforce group sparsity by dividing our dictionary D_t into sub-dictionaries and choosing atoms only from the same sub-dictionary D_{t,g}. As a result, the non-zero coefficients in a_t belong to the same sub-vector a_{t,g}, as shown in Figure 3.

The aforementioned group sparsity works independently for each channel. However, neural spikes generated by the same neuron are recorded simultaneously by all nearby channels in similar patterns, as in Figure 4. This indicates that a high correlation exists between measurements of different channels. To capture this correlation, we implement joint sparsity using the row-l0 quasi-norm in our dictionary learning method, which forces the same group g to be chosen for all channels. As shown in Figure 3, the sparse coefficients of all channels share the same support pattern within the same group as a result of the joint sparsity constraint.

2) Multi-modal structured dictionary learning: Based on our signal model, we propose a multi-modal structured dictionary learning (MMSDL) algorithm that incorporates both the group sparsity and joint sparsity constraints. We use "multi-modal" here to emphasize that our algorithm can handle the scenario in which the dictionary is different for each individual channel, as in a Tetrode setup. Our algorithm iterates between two stages: sparse coding and codebook update. In the sparse coding stage, the objective function is

  min_A  Σ_{t=1}^{T} ||x_t − D_t a_t||_2   s.t.   ||A||_{row,0} ≤ K_row,   Σ_{g=1}^{G} I(||a_{t,g}||_2 > 0) ≤ K_block,  ∀ t.
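To make the two constraints concrete, here is a minimal NumPy sketch (our illustration, not the authors' code) that computes ||A||_{row,0} and the number of active groups of one channel; the partition of atoms into groups is hypothetical.

import numpy as np

def row_l0(A, tol=1e-12):
    # ||A||_{row,0}: number of rows of A with at least one non-zero entry.
    return int(np.sum(np.linalg.norm(A, axis=1) > tol))

def active_blocks(a_t, group_slices, tol=1e-12):
    # Number of groups g with ||a_{t,g}||_2 > 0 for one channel's coefficients.
    return sum(1 for sl in group_slices if np.linalg.norm(a_t[sl]) > tol)

# Hypothetical example: 3 groups of 4 atoms each, T = 4 channels.
group_slices = [slice(0, 4), slice(4, 8), slice(8, 12)]
A = np.zeros((12, 4))
A[5, :] = 1.0                                 # one atom of group 2, shared by all channels
print(row_l0(A))                              # 1 (joint sparsity: one active row)
print(active_blocks(A[:, 0], group_slices))   # 1 (group sparsity: one active block)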

Note that the time stamps of the training data x_t from the different channels t need to be identical to guarantee our joint sparsity constraint. Here we enforce the data fidelity term ||x_t − D_t a_t||_2 for all channels simultaneously. To embed joint sparsity, we limit ||A||_{row,0}, the number of non-zero rows [7], to be no greater than K_row. To enforce group sparsity, we require the number of blocks with non-zero coefficients in a_t to be less than or equal to a threshold K_block (which is 1 in our case). In our formulation, I is the indicator function.

We propose Algorithm 1 to solve the optimization problem. In the sparse coding step, we decouple joint sparsity and group sparsity into sub-problems and solve them sequentially. We first represent the input signals with the entire dictionary D_t using Simultaneous Orthogonal Matching Pursuit (SOMP) [7]. Then we determine the group g (type of neuron), which is chosen for all channels, via sparse representation-based classification (SRC) [8]. Finally, we decompose the signal via SOMP again, but using only the chosen sub-dictionary D_{t,g}. We denote this three-step procedure as Structured SOMP. For the codebook update stage, we adopt the same approach as K-SVD [9].

Algorithm 1 Multi-Modal Structured Dictionary Learning
Require: For each channel t, training data X_t = [x_{t,1}, x_{t,2}, ..., x_{t,n}], where x_{t,i} denotes the signal of the t-th channel at the i-th time stamp; number of groups G; maximum number of active rows K_row; maximum number of iterations maxIter.
1: Initialize each D_t by randomly selecting signals from the training data.
2: while iter ≤ maxIter do
3:   for i := 1 to n do
4:     Solve the representation problem via SOMP:
         min_A Σ_{t=1}^{T} ||x_{t,i} − D_t a_t||_2  s.t.  ||A||_{row,0} ≤ K_row.
5:     Determine the class g for the i-th signals of all channels using SRC.
6:     Solve the representation problem with the chosen sub-dictionary D_{t,g} via SOMP:
         min_A Σ_{t=1}^{T} ||x_{t,i} − D_{t,g} a_{t,g}||_2  s.t.  ||A||_{row,0} ≤ K_row.
7:   end for
8:   Codebook update: use the same method as in [9] to update each column of D_t.
9:   Set iter = iter + 1.
10: end while
11: Return D_t, t = 1, 2, ..., T.
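The following is a minimal NumPy sketch of the Structured SOMP sub-routine (Steps 4-6), written from the description above rather than taken from the authors' implementation; the summed-correlation atom-selection rule and the residual-based SRC step are our assumed readings of SOMP [7] and SRC [8], and the K-SVD codebook update of Step 8 is omitted.

import numpy as np

def structured_somp(X, D_list, group_slices, k_row):
    # X: (N, T), one signal per channel; D_list: per-channel dictionaries,
    # all sharing the same atom indexing and group partition.
    T = X.shape[1]

    def somp(cols):
        # Greedy SOMP over the allowed atom indices `cols`, shared across channels.
        support, R = [], X.copy()
        for _ in range(min(k_row, len(cols))):
            # Pick the atom index with the largest summed correlation over channels.
            scores = [sum(abs(D_list[t][:, j] @ R[:, t]) for t in range(T)) for j in cols]
            support.append(cols[int(np.argmax(scores))])
            # Re-fit the coefficients on the current support for every channel.
            A = np.zeros((D_list[0].shape[1], T))
            for t in range(T):
                Dt_s = D_list[t][:, support]
                coef, *_ = np.linalg.lstsq(Dt_s, X[:, t], rcond=None)
                A[support, t] = coef
                R[:, t] = X[:, t] - Dt_s @ coef
            cols = [j for j in cols if j not in support]
        return A

    all_cols = list(range(D_list[0].shape[1]))
    A_full = somp(all_cols)                    # Step 4: SOMP over the whole D_t
    # Step 5 (SRC-style): choose the group with the smallest total residual
    # when only that group's coefficients are kept.
    def group_residual(g):
        sl = group_slices[g]
        return sum(np.linalg.norm(X[:, t] - D_list[t][:, sl] @ A_full[sl, t])
                   for t in range(T))
    g_hat = int(np.argmin([group_residual(g) for g in range(len(group_slices))]))
    # Step 6: SOMP again, restricted to the chosen sub-dictionary D_{t,g}.
    A = somp(list(range(*group_slices[g_hat].indices(len(all_cols)))))
    return A, g_hat

A full MMSDL iteration would alternate this sparse coding step over all time stamps with a K-SVD-style update of each D_t.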

3) Reconstruction and classification approach: In our multi-channel CS framework, we use a Bernoulli sensing matrix S to compress each neural signal x_t ∈ R^N into a measurement vector y_t ∈ R^M because of its simple implementation in integrated circuits [3]. Given the compressed measurements y_t and the dictionaries D_t learned by our MMSDL method, we use Structured SOMP (Steps 4-6 of our dictionary learning procedure) to solve

  Â = arg min_A  Σ_{t=1}^{T} ||y_t − S D_t a_t||_2   s.t.   ||A||_{row,0} ≤ K_row,   Σ_{g=1}^{G} I(||a_{t,g}||_2 > 0) ≤ K_block,  ∀ t.

The estimate of the neural signal is then simply x̂_t = D_t â_t. For our method and the other methods used for comparison, we determine the group (class) g via SRC using the entire dictionary D_t.
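As a sketch of the end-to-end pipeline (compression followed by recovery), assuming the bernoulli_sensing_matrix and structured_somp helpers from the earlier sketches; this is our illustration of the procedure, not the authors' code.

import numpy as np

def compress_and_recover(X, D_list, S, group_slices, k_row):
    # X: (N, T) multi-channel spike segment. Compress each channel with the
    # sensing matrix S, then recover with Structured SOMP applied to the
    # effective dictionaries S @ D_t.
    Y = S @ X                                   # y_t = S x_t, shape (M, T)
    SD_list = [S @ D for D in D_list]           # effective dictionaries S D_t
    A_hat, g_hat = structured_somp(Y, SD_list, group_slices, k_row)
    X_hat = np.column_stack([D_list[t] @ A_hat[:, t] for t in range(len(D_list))])
    return X_hat, g_hat                         # x̂_t = D_t â_t, estimated class g

def sndr_db(x, x_hat):
    # SNDR = 20 log10(||x||_2 / ||x - x̂||_2), in dB (the metric of Section III).
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))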

III. EXPERIMENTAL RESULTS

In this section, we first compare the single-channel recovery performance of our MMSDL with the dictionary trained using K-SVD [3], the data dictionary [4] and the wavelet dictionary [5]. The dataset used is the Leicester neural signal database [6]. We then compare the multi-channel recovery and classification performance of the proposed approach with the other approaches using the publicly available dataset hc-1 d14521, which was collected with a Tetrode from rat neurons [10]. For all experiments, we randomly split the data into two halves, one for training and the other for testing. We repeat each experiment ten times and report the average results. The recovery performance is measured in terms of the Signal to Noise and Distortion Ratio (SNDR), defined as SNDR = 20 log_10(||x_t||_2 / ||x_t − x̂_t||_2) in dB [1]. The classification performance is measured in terms of classification accuracy (CA), the percentage of correctly classified test signals (a code sketch of this evaluation protocol appears later in this section).

A. Recovery performance for the single-channel case

We compare the dictionary generated by the proposed MMSDL with the signal-dependent dictionary, the data dictionary and the wavelet dictionary under different compression ratios (CR = N/M) in a CS framework. We use the same random Bernoulli matrix for all approaches to compress the neural signal. We use Structured SOMP in our framework and OMP for the other approaches, because this reflects the fundamental difference in signal models between our approach and the others. The recovery performance is shown in Table I. Note that there is much less difference when the CR is small (i.e., 10 times). However, our method performs significantly better, by about 4 dB, than either the signal-dependent dictionary or the data dictionary when the CR goes up to 50 times.

B. Recovery and classification performances for the multi-channel case

We also compare our method with the prior work for a multi-channel setup. The SNDR reported here is the average recovery performance over all four channels, and the classification method is based on SRC. The recovery and classification performances are shown in Table II and Table III, respectively. The results show that the dictionary learned by our MMSDL improves both the recovery and classification performance.
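A sketch of the evaluation protocol described above (random half split, ten repetitions, averaged SNDR and CA); train_fn and recover_fn are hypothetical stand-ins for MMSDL training and the compress-and-recover pipeline sketched earlier, and sndr_db is the helper defined above.

import numpy as np

def evaluate(spikes, labels, train_fn, recover_fn, n_trials=10, seed=0):
    # spikes: (num_spikes, N) array; labels: (num_spikes,) class indices.
    rng = np.random.default_rng(seed)
    sndrs, cas = [], []
    for _ in range(n_trials):
        idx = rng.permutation(len(spikes))
        half = len(spikes) // 2
        train, test = idx[:half], idx[half:]
        model = train_fn(spikes[train], labels[train])
        sndr_sum, correct = 0.0, 0
        for i in test:
            x_hat, g_hat = recover_fn(model, spikes[i])
            sndr_sum += sndr_db(spikes[i], x_hat)
            correct += int(g_hat == labels[i])
        sndrs.append(sndr_sum / len(test))
        cas.append(100.0 * correct / len(test))
    return np.mean(sndrs), np.mean(cas)    # average SNDR (dB) and CA (%)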

TABLE I
COMPARISON OF SINGLE-CHANNEL RECOVERY PERFORMANCE (IN SNDR, dB) BETWEEN DIFFERENT CS METHODS.

Dictionary Learning & Recovery Method   | CR = 50 | CR = 10
MMSDL & Structured SOMP                 |   8.84  |  13.61
K-SVD & OMP                             |   4.87  |  12.44
Data Dictionary & OMP                   |   4.89  |  10.48
Wavelet Dictionary & OMP                |  -0.85  |  -0.84

TABLE III
COMPARISON OF MULTI-CHANNEL CLASSIFICATION PERFORMANCE (IN CA, %) BETWEEN DIFFERENT CS METHODS.

Dictionary Learning & Recovery Method   | CR = 50 | CR = 10
MMSDL & Structured SOMP                 |  92.55  |  98.50
K-SVD & OMP                             |  82.50  |  88.10
Data Dictionary & OMP                   |  83.10  |  84.25
Wavelet Dictionary & OMP                |  63.10  |  75.75

Fig. 4. An example of multi-channel signals (red) and CS recovered signals (blue) using our approach with a CR of 50 times and SNDR = 9.17 dB.

The recovery and classification performances are improved by more than 3 dB and 10 percentage units, respectively. To illustrate the recovered neural signals, we show an example of a multi-channel recovered signal with a CR of 50 times in Figure 4. The red curves are the ground truth and the blue curves are the recovered signals for each channel using our approach. It can be seen that the main structure of the neural signals is well preserved with only two measurements per channel in this case.

TABLE II
COMPARISON OF MULTI-CHANNEL RECOVERY PERFORMANCE (IN SNDR, dB) BETWEEN DIFFERENT CS METHODS.

Dictionary Learning & Recovery Method   | CR = 50 | CR = 10
MMSDL & Structured SOMP                 |   7.96  |  10.25
K-SVD & OMP                             |   5.15  |   8.79
Data Dictionary & OMP                   |   6.06  |   8.28
Wavelet Dictionary & OMP                |  -0.10  |   0.34

IV. CONCLUSION

We propose a multi-modal structured dictionary learning approach in a CS framework. Our method combines group structure and joint sparsity to promote both the reconstruction and classification performance of multi-channel neural recordings. Our approach can be used in conjunction with a simple hardware implementation and applied to other multi-channel biological signal monitoring. In the future, we would like to extend our approach to unsupervised dictionary learning for spike sorting.

ACKNOWLEDGMENT

The authors are partially supported by NSF under Grants 1057644, CCF-1117545 and DMS-1222567, ARO under Grant 60219-MA, ONR under Grants N00014-12-1-0765 and N00014-10-1-0223, and AFOSR under Grant FA9550-12-1-0136.

REFERENCES

[1] F. Chen, A. P. Chandrakasan, and V. M. Stojanovic, "Design and analysis of a hardware-efficient compressed sensing architecture for data compression in wireless sensors," IEEE Journal of Solid-State Circuits, vol. 47, no. 3, pp. 744–756, 2012.
[2] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[3] J. Zhang, Y. Suo, S. Mitra, S. P. Chin, S. Hsiao, R. F. Yazicioglu, T. D. Tran, and R. Etienne-Cummings, "An efficient and compact compressed sensing microsystem for implantable neural recordings," IEEE Transactions on Biomedical Circuits and Systems, vol. 99, 2014.
[4] Y. Suo, J. Zhang, R. Etienne-Cummings, T. D. Tran, and S. Chin, "Energy-efficient two-stage compressed sensing method for implantable neural recordings," in IEEE International Conference on Biomedical Circuits and Systems (BioCAS), pp. 150–153, 2013.
[5] K. G. Oweiss, A. Mason, Y. Suhail, A. M. Kamboh, and K. E. Thomson, "A scalable wavelet transform VLSI architecture for real-time signal processing in high-density intra-cortical implants," IEEE Transactions on Circuits and Systems, vol. 54, no. 6, pp. 1266–1278, 2007.
[6] R. Quian Quiroga, Z. Nadasdy, and Y. Ben-Shaul, "Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering," Neural Computation, vol. 16, pp. 1661–1687, 2004.
[7] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Simultaneous sparse approximation via greedy pursuit," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5, pp. v-721, 2005.
[8] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[9] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[10] D. A. Henze, Z. Borhegyi, J. Csicsvari, A. Mamiya, K. D. Harris, and G. Buzsáki, "Intracellular features predicted by extracellular recordings in the hippocampus in vivo," Journal of Neurophysiology, vol. 84, no. 1, pp. 390–400, 2000.