A Deep Learning Framework of Quantized Compressed Sensing for Wireless Neural Recording

Biao Sun, Member, IEEE, Hui Feng, Student Member, IEEE, Kefan Chen, and Xinshan Zhu*, Member, IEEE

Asterisk indicates corresponding author. The authors are with the School of Electrical Engineering and Automation, Tianjin University, Tianjin, 300072, China. Email: [email protected]. This work was supported by the National Natural Science Foundation of China under Grants 61401303 and 51578189.
Abstract—In low-power wireless neural recording tasks, signals must be compressed before transmission to extend battery life. Recently, Compressed Sensing (CS) theory has successfully demonstrated its potential in neural recording applications. In this study, a deep learning framework of quantized CS, termed BW-NQ-DNN, is proposed, which consists of a binary measurement matrix, a non-uniform quantizer, and a non-iterative recovery solver. By training the BW-NQ-DNN, the three parts are jointly optimized. Experimental results on synthetic and real datasets reveal that BW-NQ-DNN not only drastically reduces the number of transmitted bits but also outperforms state-of-the-art CS-based methods. On challenging high-compression-ratio tasks, the proposed approach still achieves high recovery performance and spike classification accuracy. This framework is of great value to wireless neural recording devices, and many variants can be straightforwardly derived for low-power wireless telemonitoring applications.

Index Terms—Wireless neural recording, quantized compressed sensing, non-uniform quantization, deep learning
I. INTRODUCTION

The capability of wirelessly sensing brain activity is highly desired by the neuroscience research community, where chronic recording from untethered, freely behaving animal models is being investigated [1], [2]. In such applications, wireless data transmission dominates the energy consumption, and the limited telemetry bandwidth can be overwhelmed by the large amount of neural data generated from multiple recording sites. Conventional data compression or feature extraction techniques are computationally demanding; they consume large silicon area and offset the energy benefits of reduced data transmission [3], [4].

Recently, the field of Compressed Sensing (CS) [5], [6] has shown potential for achieving compression and recovery performance comparable to the previous approaches but with simpler hardware resources. In CS, a signal x ∈ R^N is compressed by a simple matrix-vector multiplication,

y = Φx,    (1)

where Φ ∈ R^{M×N} is the measurement matrix and y ∈ R^M is the vector of compressed measurements. Usually (1) is underdetermined, i.e., M < N, and the ratio M/N is called the compression ratio (CR) of CS. In this setting, the signal x cannot be uniquely retrieved from Φ and y. However, if x has a sparse representation θ in a pre-defined basis Ψ ∈ R^{N×N}, i.e., x = Ψθ, where only K ≪ N entries of θ are nonzero, it is possible to estimate θ from y if the effective matrix A = ΦΨ satisfies the restricted isometry property (RIP) [7]. Some families of random matrices, such as appropriately-dimensioned matrices with i.i.d. Gaussian elements or i.i.d. Bernoulli elements, have been demonstrated to satisfy the RIP with high probability.

In practice, y must be quantized before transmission, because wireless communication can only use digital bits, i.e.,

z = Q(y) = Q(Φx),    (2)

where z is the vector of quantized measurements and Q(·) denotes a quantization operator that maps a real value to a finite number of quantization levels [8]. The procedure (2) is called Quantized Compressed Sensing (QCS). After the quantized data stream z is received at the receiver, the reconstruction algorithm correspondingly takes the quantized measurements as input. A common method for recovering the signal from the quantized measurements is the ℓ1-minimization

θ̂ = arg min_θ ‖θ‖_{ℓ1}  subject to  ‖z − Aθ‖ ≤ ε,    (3)
where ε is the noise tolerance for the quantization operation. Other methods such as Basis Pursuit De-Quantization (BPDQ) [9], Quantized Iterative Hard Thresholding (QIHT) [10], and Quantized Variational Message Passing (QVMP) [11] are also widely used for QCS. After estimating θ̂, the original signal can be recovered by x̂ = Ψθ̂.

Based on CS theory, the typical structure of a wireless neural recording system is depicted in Fig. 1. The neural spikes are first detected and aligned, then compressed with the measurement matrix; the compressed measurements are quantized into digital bits and transmitted to a computer or a fusion center, where the spikes are finally recovered using an off-line recovery algorithm.

Fig. 1. A unified framework for wireless neural recording using quantized compressed sensing.
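As a small illustration of the compression step (1) and the resulting bit-budget, the following toy sketch compresses a length-N spike with a random ±1 Bernoulli matrix; the dimensions are assumed for demonstration only and are not taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 64, 8                                # spike length and number of measurements (assumed)
Phi = rng.choice([-1.0, 1.0], size=(M, N))  # i.i.d. random Bernoulli measurement matrix
x = rng.standard_normal(N)                  # stand-in for a detected and aligned spike

y = Phi @ x                                 # y = Phi x, Eq. (1)
print("compression ratio M/N =", M / N)
# With a B-bit quantizer Q(.), the transmitted payload per spike is M*B bits, cf. Eq. (2).
```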
A. Challenges and State-of-the-art

Porting QCS to wireless neural recording necessitates overcoming the following challenges.

Sparsity representation: One challenge in applying CS to wireless neural recording is that neural spikes are not sparse in common dictionaries such as the discrete cosine transform basis and the discrete wavelet transform basis. Recovering spikes using these dictionaries severely degrades the performance. To alleviate this issue, Zhang et al. [12] proposed learning dictionaries using K-SVD and developed a signal-dependent CS approach to compress the data. Suo et al. [13] proposed using the recorded neural data directly as the sparsity dictionary. Xiong et al. proposed an unsupervised dictionary learning algorithm for single-channel and multi-channel neural recordings [14], [15]. Although signal-dependent methods are only applicable when training data are available, the authors claimed that training data are indeed available in wireless neural recording applications.

Measurement matrix design: Performance guarantees for CS require that the measurement matrix satisfy the RIP. Therefore, the design of the measurement matrix is crucial for spike recovery. Some families of random matrices, e.g., appropriately-dimensioned matrices with i.i.d. Gaussian elements or i.i.d. Bernoulli elements, have been demonstrated to satisfy the RIP with high probability. Among them, the i.i.d. random Bernoulli matrix is the most widely used measurement matrix due to its hardware implementation efficiency [16]. However, many studies have shown that the i.i.d. random Bernoulli matrix is not optimal [17], [18], and how to design a suitable measurement matrix for neural spikes remains an open problem.

Quantization of the measurements: Quantization in CS inevitably introduces errors. However, a suitably designed quantizer can largely reduce the wireless transmission bit-budget [19]. QCS is therefore attractive for power- or bandwidth-limited applications, and various algorithms have been developed to recover signals from quantized measurements. Haboba et al. [20] studied the quantization error and its effect on reconstructing signals with fixed sparsity. Wang et al. [21], [22] used QCS in low-energy telemonitoring of EEG signals. Liu et al. [23] proposed a Bayesian de-quantization algorithm to reconstruct photoplethysmography signals. All of the above algorithms concern uniform quantization. Non-uniform quantization, which outperforms uniform quantization especially in low-rate scenarios, has also been exploited in QCS. Sun et al. [24] proposed optimal non-uniform quantization with respect to the mean-squared error of the lasso reconstruction. Kamilov et al. [25], [26] examined the optimal non-uniform quantization of CS measurements under reconstruction with message passing algorithms. However, these algorithms are not only computationally intensive but also require the signal distribution as prior knowledge, which is often unavailable in neural recording applications.

Efficient recovery solver: Convex optimization algorithms (e.g., ℓ1-minimization [27]), greedy algorithms (e.g., orthogonal matching pursuit [28] and iterative hard thresholding [29]), and Bayesian algorithms (e.g., sparse Bayesian learning [30] and approximate message passing [31]) are widely used CS recovery solvers. However, these algorithms have high computational complexity and hence can only be used off-line. There is a high demand for effective spike recovery algorithms for real-time neural recording applications.
Recently, deep architectures such as stacked denoising auto-encoders [32], convolutional neural networks [33], and deep fully connected networks [34] have been investigated for sparse signal recovery in CS. Results showed that using a deep neural network as the CS recovery solver significantly reduces the computational complexity.

B. Contributions and Paper Organization

This study aims to solve the above problems and develop an efficient QCS approach for wireless neural recording. Based on deep learning theory, a novel framework that consists of Binary Weights, a Non-uniform Quantizer, and a Deep Neural Network is proposed, denoted BW-NQ-DNN. By training the neural network, a binary measurement matrix, an optimized non-uniform quantizer, and a non-iterative recovery solver are obtained simultaneously. Compared with prior works, the main contributions are as follows:

1) Instead of generating the measurement matrix randomly or deterministically, BW-NQ-DNN directly learns a binary measurement matrix during the training phase, which outperforms the i.i.d. random Bernoulli matrix while keeping the implementation efficiency.

2) A non-uniform quantizer is optimized for QCS, outperforming uniform ones for wireless neural recording, especially at low quantization bit-depths. To the best of our knowledge, this is the first time a deep neural network has been used for the task of non-uniform quantizer optimization.

3) A non-iterative recovery solver is learned for QCS, leading to a significant advantage over the state-of-the-art in both recovery accuracy and computational complexity.

This paper is organized as follows. Section II provides basic background material on quantization. Section III presents the details of BW-NQ-DNN and the training method. Section IV shows the experimental results for neural spike compression and recovery. Section V concludes this paper.

II. QUANTIZATION

A. Uniform Quantization

A quantizer is a process that discretizes its input by performing a mapping from a continuous set to some discrete set. Specifically, consider a K-point regular scalar quantizer Q, defined by its output levels C = {c_i; i = 1, 2, . . . , K}, partition cells {(p_{i−1}, p_i) ⊂ R; i = 1, 2, . . . , K}, and a mapping c_i = Q(s) when s ∈ [p_{i−1}, p_i) [8]. Additionally, define a cell Q^{−1}(c_i) = [p_{i−1}, p_i) as the inverse image of the output level c_i under Q. For i = 1, if p_0 = −∞ the closed interval [p_0, p_1) is replaced by an open interval (p_0, p_1).

Uniform or linear quantization, where the partition cells have equal size and shape, is commonly used in practice and has interesting asymptotic properties [35]–[37]. An example of uniform quantization is illustrated in Fig. 2(a). However, uniform quantization is not always optimal for natural signals. Take speech as an example: uniform quantization provides unneeded quality for large-amplitude samples, which are least likely to occur, and pronounced truncation effects for the more frequent small-amplitude samples. Therefore, uniform quantization does not perform as well as a quantizer with wider partition cells at high amplitudes and narrower partition cells at lower amplitudes.
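As a concrete illustration of the mid-rise uniform quantizer of Fig. 2(a), the sketch below maps each input to the mid-point of the ∆-wide cell it falls in; the step size, bit-depth, and test values are assumptions chosen only for demonstration.

```python
import numpy as np

def uniform_midrise(s, delta, n_levels):
    """Mid-rise uniform quantizer: each sample is mapped to the mid-point
    of the width-delta cell containing it, with saturation at the outermost cells."""
    idx = np.floor(s / delta)                        # cell index
    idx = np.clip(idx, -n_levels // 2, n_levels // 2 - 1)
    return delta * (idx + 0.5)                       # mid-point of the cell

# Toy usage (assumed parameters): a 3-bit quantizer with step size 0.25.
delta, bits = 0.25, 3
s = np.linspace(-1.2, 1.2, 7)
print(uniform_midrise(s, delta, 2 ** bits))
```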
Fig. 2. (a) Mid-rise uniform quantization. The mid-point value within a cell is taken as the quantized value for a sample point falling in that cell. (b) The µ-law companding curve and its corresponding non-uniform quantization.
B. Non-uniform Quantization for CS

Compared with uniform quantization, non-uniform quantization can provide a significant improvement in distortion [38], [39]. The companding method is a standard way of generating non-uniform quantizers from a uniform one: the input signal is transformed using a nondecreasing, smooth companding function ϕ : R → (0, 1), then quantized using a uniform quantizer Q with K equidistant levels on (0, 1), and finally passed through the expander function ϕ^{−1}. An example of a µ-law companding function and its corresponding partition cells for a speech signal is depicted in Fig. 2(b).

Typically, a non-uniform quantizer is optimized by adapting the companding function ϕ to minimize the distortion between a random vector s ∈ R^m and its quantized representation ŝ = Q_ϕ(s), where Q_ϕ(s) ≜ Q(ϕ(s)). For example, for a given vector s and the MSE distortion metric, optimization is performed by solving

ϕ* = arg min_ϕ E‖s − Q_ϕ(s)‖²,    (4)

where the minimization is done over all K-level scalar quantizers. One standard way of optimizing ϕ is via the Lloyd algorithm, which iteratively updates the partition cells and output levels by applying necessary conditions for quantizer optimality. For the CS framework, however, searching for the quantizer that minimizes the MSE between s and ŝ is not necessarily equivalent to minimizing the MSE between the sparse vector x and its CS reconstruction x̂ from quantized measurements [24], [40]. This is due to the nonlinear effect added by any particular CS recovery solver. Hence, instead of solving (4), we aim to solve

ϕ* = arg min_ϕ E‖x − x̂‖²,    (5)

where the minimization is performed over all K-level regular scalar quantizers and x̂ is obtained through a CS recovery solver.

III. DEEP LEARNING FRAMEWORK OF QUANTIZED COMPRESSED SENSING

In this section, we propose a deep learning framework for QCS that compresses neural spikes to low-dimensional quantized measurements by learning a binary measurement matrix and a non-uniform quantizer, and recovers the measurements back to neural spikes by learning a nonlinear solver, as depicted in Fig. 3. The whole neural network consists of three parts: a compression net, a quantization net, and a recovery net. Both the compression net and the recovery net are fully connected, i.e., every neuron in one layer is connected to every neuron in the next layer. The three parts are described in detail in the following subsections.
A. Compression Net

The compression net consists of two layers: (1) an input layer with N nodes, and (2) a compression layer with M nodes that applies an activation function to the affine transformation of its input, i.e.,

y = T_c(Φx + b_c),    (6)

where Φ ∈ R^{M×N} is the real weight matrix, and b_c and T_c(·) denote the bias vector and the activation function of the sensing layer, respectively. To be compatible with the canonical CS framework, we assume that the compression layer has no bias terms, i.e., b_c = 0. In addition, we choose the identity function as the activation function, i.e., T_c(x) = x. Therefore, the compression net can be described as in (1).

To reduce the hardware implementation complexity and simplify computations in neural networks, we constrain the weights of the compression net to be ±1 during training. Specifically, we estimate Φ using a binary matrix B ∈ {−1, +1}^{M×N} and a scaling factor α ∈ R⁺ such that Φ ≈ αB. Therefore, the matrix-vector multiplication in (1) can be approximated by

y = Φx ≈ αBx.    (7)

Without loss of generality, we assume Φ and B are vectors in R^H, where H = M × N. To find an optimal estimate for Φ ≈ αB, we solve the following least-squares optimization:

α*, B* = arg min_{α,B} J(B, α),    (8)

where J(B, α) is the cost function

J(B, α) = ‖Φ − αB‖².    (9)

By expanding (9), we have

J(B, α) = α²BᵀB − 2αΦᵀB + ΦᵀΦ = −2αΦᵀB + const.    (10)

Since B ∈ {−1, +1}^H, BᵀB = H is a constant; ΦᵀΦ is also a constant because Φ is a known vector. The optimal solution for B can therefore be obtained by solving the constrained optimization

B* = arg min_B −ΦᵀB  s.t.  B ∈ {−1, +1}^H.    (11)
Fig. 3. Architecture of the BW-NQ-DNN. Dashed arrows and solid-line arrows denote connections with binary weights and real weights, respectively.

This optimization can be solved by assigning B_i = +1 if Φ_i ≥ 0 and B_i = −1 if Φ_i < 0; the optimal solution is therefore

B* = sign(Φ).    (12)

To find the optimal value of the scaling factor α, we take the derivative of J with respect to α and set it to zero, which gives

α* = (1/H) ΦᵀB*.    (13)

By replacing B* with sign(Φ), we have

α* = (1/H) Φᵀ sign(Φ) = (1/H) Σ_i |Φ_i| = (1/H) ‖Φ‖_{ℓ1}.    (14)

Therefore, the optimal binary estimate of a real matrix is obtained by taking the sign of its elements, and the optimal scaling factor is the average of the absolute values of its elements.

Fig. 4. Computational graph of the nonlinear companding function ϕ. The blue nodes denote the input and output of the network. The orange nodes denote the learnable parameters defined in (15).
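A minimal sketch of the binary-weight approximation in (12)-(14), not the authors' code, might look as follows; the matrix dimensions are arbitrary assumptions.

```python
import numpy as np

def binarize_weights(Phi):
    """Approximate a real matrix Phi by alpha * B with B in {-1,+1},
    following Eqs. (12)-(14): B = sign(Phi), alpha = mean(|Phi|)."""
    B = np.where(Phi >= 0, 1.0, -1.0)   # sign(Phi), with sign(0) mapped to +1
    alpha = np.abs(Phi).mean()          # (1/H) * ||Phi||_1
    return alpha, B

# Toy usage with an assumed 8x64 real weight matrix.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 64))
alpha, B = binarize_weights(Phi)
print(np.linalg.norm(Phi - alpha * B))  # approximation error ||Phi - alpha*B||
```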
B. Quantization Net

We use the companding method to generate the non-uniform quantizer in our neural network. The quantization net consists of a nonlinear companding function ϕ : R → (0, 1) and a uniform quantizer, as described in Section II-B. The implementation details of the two parts are described as follows.

1) Nonlinear Companding Function: In order to devise a computational approach for learning the nonlinear companding function ϕ, we adopt the following parametric representation for the nonlinearity:

ϕ(y) ≜ Σ_{k=−K}^{K} c_k ψ(y/∆ − k),    (15)

where c ≜ {c_k}, k ∈ [−K, K], are the coefficients of the representation and ψ is a basis function positioned on the grid ∆[−K, −K + 1, . . . , K] ⊆ ∆Z, where the constant ∆ > 0 denotes the distance between two grid points. The computational graph of the nonlinear companding function ϕ is depicted in Fig. 4. The parametric function ϕ can be trained using backpropagation [41] and optimized simultaneously with the other layers.

The update formulas for {c_k} are derived directly from the chain rule. The gradient of c_k is

∂C/∂c_k = (∂C/∂ϕ(y)) (∂ϕ(y)/∂c_k) = Σ_{y_i} (∂C/∂ϕ(y_i)) ψ(y_i/∆ − k),    (16)

where C represents the cost function. The term ∂C/∂ϕ(y_i) is the gradient propagated from the deeper layer (i.e., the uniform quantizer), and the summation runs over all elements y_i of y.
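A small sketch of the coefficient gradient (16) is given below; it uses a generic basis function ψ passed in as an argument, and the inputs are synthetic stand-ins rather than values from this paper.

```python
import numpy as np

def coeff_gradients(y, grad_phi, delta, K, psi):
    """Gradient of the cost w.r.t. each companding coefficient c_k, Eq. (16):
    dC/dc_k = sum_i (dC/dphi(y_i)) * psi(y_i/delta - k)."""
    ks = np.arange(-K, K + 1)
    basis = psi(y[:, None] / delta - ks[None, :])   # shape (len(y), 2K+1)
    return basis.T @ grad_phi                        # one gradient entry per coefficient

# Toy usage with a triangular (degree-1 B-spline) basis as a stand-in for psi.
tri = lambda t: np.maximum(0.0, 1.0 - np.abs(t))
y = np.array([-0.8, 0.1, 0.5])
grad_phi = np.array([0.2, -0.1, 0.3])                # gradients arriving from the quantizer
print(coeff_gradients(y, grad_phi, delta=0.5, K=3, psi=tri))
```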
In this paper, we represent the nonlinear companding function ϕ in terms of its expansion with polynomial B-splines [42], [43]. The main advantage of the B-spline representation is that it can approximate any nonlinearity with arbitrary precision for a sufficiently small ∆. Accordingly, our basis function corresponds to ψ = β^d, where β^d refers to a B-spline of degree d ≥ 0. Within the family of polynomial splines, the
following cubic B-splines

β³(x) = 2/3 − |x|² + |x|³/2,   when 0 ≤ |x| ≤ 1,
        (1/6)(2 − |x|)³,       when 1 ≤ |x| ≤ 2,
        0,                     when 2 ≤ |x|,    (17)

tend to be the most popular in applications due to their minimum curvature property [42]. The derivatives of B-splines can be computed via the formula

(d/dx) β^d(x) = β^{d−1}(x + 1/2) − β^{d−1}(x − 1/2),    (18)

which simply reduces the degree by one. By applying this formula to the expansion of ϕ, we can easily obtain a closed-form expression for ϕ′ in terms of quadratic B-splines,

β²(x) = 3/4 − |x|²,              when 0 ≤ |x| ≤ 1/2,
        9/8 − (1/2)|x|(3 − |x|), when 1/2 ≤ |x| ≤ 3/2,
        0,                       when 3/2 ≤ |x|.    (19)
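To make the parametric companding function concrete, here is a small sketch of the cubic B-spline basis (17) and the expansion (15); the grid spacing, number of coefficients, and test inputs are assumptions for illustration, not the settings used in the paper (which uses 200 basis functions).

```python
import numpy as np

def bspline3(x):
    """Centered cubic B-spline, Eq. (17)."""
    ax = np.abs(x)
    out = np.zeros_like(ax)
    m1 = ax <= 1
    m2 = (ax > 1) & (ax <= 2)
    out[m1] = 2.0 / 3.0 - ax[m1] ** 2 + ax[m1] ** 3 / 2.0
    out[m2] = (2.0 - ax[m2]) ** 3 / 6.0
    return out

def companding(y, c, delta):
    """Parametric companding function phi(y) = sum_k c_k * beta3(y/delta - k), Eq. (15)."""
    K = (len(c) - 1) // 2
    ks = np.arange(-K, K + 1)
    basis = bspline3(y[:, None] / delta - ks[None, :])   # shape (len(y), 2K+1)
    return basis @ c

# Toy usage: 2K+1 = 11 coefficients on a grid of spacing 0.5 (assumed values),
# with all coefficients initialized to 1 as in the paper's experimental setup.
delta, K = 0.5, 5
c = np.ones(2 * K + 1)
y = np.linspace(-2.0, 2.0, 9)
print(companding(y, c, delta))
```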
2) Propagating Gradients Through Uniform Quantization: The derivative of the uniform quantization function is zero everywhere except at the discontinuities, making it apparently incompatible with backward propagation, since the exact gradient of the cost with respect to the quantities before the discretization would be zero. Bengio et al. [44] studied the question of estimating or propagating gradients through discrete neurons and found that the fastest training was obtained with the "straight-through estimator," previously introduced by Hinton et al. in [45]. We follow a similar approach but use the version of the straight-through estimator that takes the saturation effect into account. Consider the uniform quantization function

q = Q(x),    (20)

and assume that the gradient g_q = ∂C/∂q has been obtained. Then our straight-through estimator of ∂C/∂x is

g_x = g_q 1_{|x|≤V_sat},    (21)

where V_sat is the saturation level of the quantizer and 1_{|x|≤V_sat} outputs 1 when |x| ≤ V_sat and 0 otherwise. Note that this preserves the gradient's information and cancels the gradient when |x| is too large; not canceling the gradient would significantly worsen the performance.
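A minimal sketch of this saturating straight-through estimator, written as an explicit forward/backward pair for clarity (framework-specific custom-gradient mechanics are omitted); the step size and saturation level are assumed values.

```python
import numpy as np

def quantize_forward(x, delta, v_sat):
    """Forward pass: uniform quantization with saturation, cf. Eq. (20)."""
    q = delta * np.round(x / delta)
    return np.clip(q, -v_sat, v_sat)

def quantize_backward(x, grad_q, v_sat):
    """Backward pass: straight-through estimator, Eq. (21).
    Pass the gradient through unchanged inside [-v_sat, v_sat], cancel it outside."""
    return grad_q * (np.abs(x) <= v_sat)

# Toy usage with assumed step size and saturation level.
x = np.array([-2.0, -0.3, 0.1, 1.7])
gq = np.ones_like(x)                       # gradient arriving from the deeper layer
print(quantize_forward(x, delta=0.25, v_sat=1.0))
print(quantize_backward(x, gq, v_sat=1.0))
```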
C. Recovery Net

The measurements z produced by the compression net and the quantization net are then mapped back to a recovered N × 1 vector through the recovery net, as depicted in Fig. 3. We adopt a Multi-Layer Perceptron (MLP) [46], [47] architecture to learn a nonlinear solver that maps the quantized compressed measurements z, via several hidden layers, back to the recovered spike x̂. Additional reasons why the MLP architecture is a reasonable choice for the compressed sensing problem can be found in [34].

The recovery net consists of two types of layers: (1) L ≥ 1 recovery layers with ρN nodes, where ρ > 1 is the redundancy factor, and (2) an output layer with N nodes. Each recovery layer is followed by a tangent sigmoid activation function defined as

T_r(x) = (1 − e^{−2x}) / (1 + e^{−2x}).    (22)

Therefore, given the weight matrix W_l and the bias vector b_l, the activation of the l-th recovery layer is given by

a_l = T_r(W_l a_{l−1} + b_l),  l = 1, . . . , L,  a_0 = z.    (23)

The last recovery layer is connected to the output layer without a nonlinearity.
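The forward pass of such a recovery net is a handful of matrix multiplications. The sketch below is an illustrative NumPy version using assumed sizes (N = 64, M = 8) together with the ρ = 1.5 and L = 3 used later in the experiments, and randomly initialized weights as stand-ins for the trained parameters; it is not the trained solver itself.

```python
import numpy as np

rng = np.random.default_rng(2)

N, M, rho, L = 64, 8, 1.5, 3
H = int(rho * N)                           # recovery-layer width, rho*N nodes

def tansig(x):
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))   # Eq. (22)

# Randomly initialized weights as stand-ins for the trained parameters.
Ws = [rng.standard_normal((H, M)) * 0.1] + \
     [rng.standard_normal((H, H)) * 0.1 for _ in range(L - 1)]
bs = [np.zeros(H) for _ in range(L)]
W_out, b_out = rng.standard_normal((N, H)) * 0.1, np.zeros(N)

def recovery_net(z):
    a = z
    for W, b in zip(Ws, bs):               # Eq. (23): a_l = T_r(W_l a_{l-1} + b_l)
        a = tansig(W @ a + b)
    return W_out @ a + b_out               # linear output layer, no nonlinearity

x_hat = recovery_net(rng.standard_normal(M))   # stand-in for quantized measurements z
print(x_hat.shape)                              # (64,)
```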
Since the recovery procedure of CS is often performed off-line, e.g., on a computer or a fusion center, recovery accuracy rather than energy efficiency is our main concern. Therefore, we use real weights for all recovery layers to improve the recovery performance.

D. Training Method

The three parts of the BW-NQ-DNN are jointly trained by learning all parameters of the model. The set of all parameters is denoted by

Ω = {α, B, W_l, b_l, c},  l = 1, . . . , L.    (24)
Therefore, the BW-NQ-DNN can be represented by a nonlinear mapping as x̂ = M(x, Ω). For a training set D_train with T spikes, i.e., D_train = {x^(1), x^(2), . . . , x^(T)}, we use the mean squared error (MSE) as the cost function,

C(Ω) = (1/T) Σ_{i=1}^{T} ‖M(x^(i), Ω) − x^(i)‖₂².    (25)
The MSE is used in this work since our goal is to optimize the Signal to Noise and Distortion Ratio (SNDR) [12], which is directly related to the MSE. We use the stochastic gradient descent (SGD) [48] algorithm to train the BW-NQ-DNN. The overall training procedure can be summarized by the following steps:

(1) First, the real weight matrix Φ is binarized into B with its corresponding scaling factor α; forward propagation is then performed using B and α in the compression net and the real weights W_l in the recovery net. The forward propagation procedure is given in Algorithm 1.

(2) Next, backward propagation is performed to compute the gradients with respect to Φ, c, W_l, and b_l.

(3) Last, parameter updates are computed using the real weights for both the compression net and the recovery net. The backward propagation and parameter updating procedures are given in Algorithm 2.

Note that we binarize the weights of the compression net only during forward propagation and backward propagation; for updating the parameters, we use the high-precision (real) weights. Because the parameter changes are tiny in gradient descent, binarizing after updating the parameters would ignore these changes and the training objective could not be improved. For the recovery net, we use real weights in all three steps.

Once training is finished, the inference procedure of the recovery net can be seen as a CS solver that reconstructs the spikes.
Algorithm 1 Forward Propagation. Q(·) denotes a uniform quantizer.
Input: a mini-batch of training data {x}, previous weights Φᵗ and W_lᵗ, l = 1, . . . , L, previous biases b_lᵗ, l = 1, . . . , L, and activation function of the recovery layers T.
1: for the compression layer do
2:   Bᵗ = sign(Φᵗ)
3:   αᵗ = (1/H) ‖Φᵗ‖_{ℓ1}
4:   y = αᵗ Bᵗ x
5: end for
6: for the quantization layer do
7:   z = Q(ϕ(y))
8: end for
9: a_0 = z
10: for each recovery layer l in range(1, L) do
11:   a_l = T(W_lᵗ a_{l−1} + b_lᵗ)
12: end for
Output: output of the sensing layer y, output of the quantization layer z, and outputs of the recovery layers a_l, l = 1, . . . , L.
At the inference stage, we only perform forward propagation with the learned weights. Therefore, the computational complexity of spike recovery is significantly reduced, as stated in Theorem 1.

Theorem 1. The computational complexity of spike recovery using BW-NQ-DNN is

O(MNL).    (26)

Proof. Note that the recovery net has L layers. Performing forward propagation on each layer costs ρMN computations. In addition, the output layer costs N computations to output the recovered spike. Accumulating the computations over all layers and taking the big-O operation completes the proof.

IV. EXPERIMENTAL RESULTS

In this section, we examine the performance of the proposed BW-NQ-DNN against state-of-the-art schemes for wireless neural recording applications.

A. Data Description

Both synthetic and real datasets were employed in various experiments. We adopted the difficult1 dataset from the University of Leicester neural signal database [49] to evaluate the recovery performance. The dataset contains 4130 spikes from 3 different neurons. All signals were sampled at 24 kHz with 16-bit resolution. We also carried out benchmarking on the publicly available dataset hc-1 [50], which consists of simultaneous intracellular and extracellular recordings of cells in the hippocampus of anesthetized rats. Recordings from an extracellular tetrode and an intracellular electrode were made simultaneously, so that the cell recorded on the intracellular electrode was also recorded extracellularly by the tetrode. In this paper, we selected the d14921 subset to evaluate the performance of all algorithms.
Algorithm 2 Backward Propagation & Parameter Updating. C is the cost function for the mini-batch, ⊙ indicates element-wise multiplication.
Input: a mini-batch of training data {x}, previous weights Φᵗ and W_lᵗ, l = 1, . . . , L, previous biases b_lᵗ, l = 1, . . . , L, previous quantization parameters cᵗ, grid distance ∆, learning rate η, activation function of the recovery layers T, output of the sensing layer y, output of the quantization layer z, and outputs of the recovery layers a_l, l = 1, . . . , L.
1: Initialize the gradient on the output layer g ← ∂C/∂a_L
2: a_0 = z
3: for each recovery layer l in range(L, 1) do
4:   g ← g ⊙ T′(a_l)
5:   ∆W_l = η g a_{l−1}ᵀ
6:   ∆b_l = η g
7:   g ← (W_lᵗ)ᵀ g
8:   Update W: W_lᵗ⁺¹ = W_lᵗ − ∆W_l
9:   Update b: b_lᵗ⁺¹ = b_lᵗ − ∆b_l
10: end for
11: for the quantization layer do
12:   g ← g ⊙ 1_{|ϕ(y)|≤V_sat}
13:   ∆c = η g ϕ(y)
14:   g ← (cᵗ)ᵀ g ⊙ ϕ′(y)
15:   Update c: cᵗ⁺¹ = cᵗ − ∆c
16: end for
17: for the compression layer do
18:   ∆Φ = η g xᵀ
19:   Update Φ: Φᵗ⁺¹ = Φᵗ − ∆Φ
20: end for
Output: updated weights Φᵗ⁺¹ and W_lᵗ⁺¹, l = 1, . . . , L, updated quantization parameters cᵗ⁺¹, and updated biases b_lᵗ⁺¹, l = 1, . . . , L.
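To make the binarize-for-forward, update-the-real-weights pattern of Algorithms 1 and 2 concrete, here is a deliberately simplified, runnable sketch of one SGD step. It uses a single linear recovery layer, omits the quantization net, ignores the dependence of α on Φ in the backward pass, and uses assumed dimensions; it illustrates the training pattern only and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, eta = 64, 8, 0.05

Phi = rng.standard_normal((M, N)) * 0.1        # real-valued compression weights (kept for updates)
W = rng.standard_normal((N, M)) * 0.1          # single linear recovery layer, for brevity

def sgd_step(x):
    # Forward pass with binarized compression weights (Algorithm 1; quantizer omitted here).
    alpha, B = np.abs(Phi).mean(), np.sign(Phi)
    y = alpha * (B @ x)
    x_hat = W @ y

    # Backward pass (Algorithm 2): gradients flow through the *binarized* weights.
    g_out = 2.0 * (x_hat - x) / N              # d(MSE)/d(x_hat)
    dW = np.outer(g_out, y)
    g_y = W.T @ g_out
    dPhi = alpha * np.outer(g_y, x)            # straight-through: treat sign() as identity
    return dW, dPhi

x = rng.standard_normal(N)
dW, dPhi = sgd_step(x)
W -= eta * dW                                   # updates are applied to the real-valued weights
Phi -= eta * dPhi
```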
The dataset contains four channels of extracellular signals sampled at 20 kHz with 16-bit resolution. For both datasets, all spikes were aligned to their absolute peaks in a 64-sample spike window.

B. Experiment Setup

The following algorithms were chosen for performance comparison.

(1) The proposed BW-NQ-DNN. All experiments use mini-batch SGD on batches of 32 spikes. The nonlinear function ϕ was defined with 200 basis functions spread uniformly over the dynamic range [−V_sat, V_sat], and all elements of c were initialized to 1. The other parameters were set to η = 0.05, ρ = 1.5, and L = 3.¹

(2) SDNCS [12], which uses a sparse representation dictionary learned from data and recovers the original spikes by OMP.

(3) BPDQ [9], which recovers sparse signals from quantized measurements by ℓ1-minimization.

¹We tested several combinations of these parameters and chose the best one.
(4) QVMP [11], a variational Bayesian de-quantization algorithm that has better performance than greedy algorithms.

Each dataset was divided into a training section and a test section, composed of 60% and 40% of the spikes, respectively. The training section was used to train the neural network of BW-NQ-DNN and to learn the sparse representation dictionary for SDNCS, whereas the test section was used to evaluate the recovery performance. For both BPDQ and QVMP, we used the same dictionary as in SDNCS.

To measure the recovery quality, we employed the Signal to Noise and Distortion Ratio (SNDR) [12] to quantify the error between the original x and the recovered x̂:

SNDR = 20 log₁₀ (‖x‖₂ / ‖x − x̂‖₂).    (27)

Classification Accuracy (CA) was also used as a performance metric, calculated as the percentage of the total number of spikes correctly classified, i.e.,

CA = (# correctly classified spikes / # total spikes) × 100%.    (28)

The wavelet decomposition method [49] was used to extract features from the recovered spikes. The first 10 features of each spike were used for classification by the superparamagnetic clustering (SPC) [49] algorithm, and the classification results were compared with the ground-truth labels contained in the datasets.
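The two evaluation metrics are straightforward to compute; the sketch below, with synthetic stand-in data, shows how (27) and (28) might be evaluated.

```python
import numpy as np

def sndr_db(x, x_hat):
    """Signal to Noise and Distortion Ratio, Eq. (27), in dB."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

def classification_accuracy(pred_labels, true_labels):
    """Classification Accuracy, Eq. (28), in percent."""
    pred_labels, true_labels = np.asarray(pred_labels), np.asarray(true_labels)
    return 100.0 * np.mean(pred_labels == true_labels)

# Toy usage with synthetic stand-ins for a spike and cluster labels.
rng = np.random.default_rng(4)
x = rng.standard_normal(64)
x_hat = x + 0.05 * rng.standard_normal(64)
print(sndr_db(x, x_hat))
print(classification_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))
```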
C. Results

1) Recovery performance versus the number of measurements: We first evaluate the SNDR and CA of all algorithms versus the number of measurements M, using both the synthetic dataset and the real dataset. The experimental results are shown in Fig. 5, where each point indicates the SNDR or CA averaged over all spikes at a specified number of measurements. From the SNDR comparison, we observe that BW-NQ-DNN consistently outperforms the other QCS methods. From the CA comparison, we observe that BW-NQ-DNN yields better classification performance than the other algorithms, especially when M is small. Even when the number of measurements is only 4 (with bit-depth B = 2), BW-NQ-DNN achieves above 65% and 79% classification accuracy for the Leicester difficult1 and hc-1 datasets, respectively.

Fig. 5. Recovery SNDR (top row) and CA (bottom row) averaged over all spikes from the Leicester difficult1 dataset (left column) and hc-1 dataset (right column), versus the number of measurements M for BW-NQ-DNN (blue trace), SDNCS (red trace), BPDQ (yellow trace), and QVMP (purple trace), respectively. The bit-depth was set to B = 2.

2) Recovery performance versus quantization bit-depth: We then evaluate the SNDR and CA of all algorithms versus the quantization bit-depth. Fig. 6 shows the results with varying bit-depth B ∈ {2, 3, 4, 5, 6, 7, 8} when M = 4. The proposed BW-NQ-DNN has the best SNDR among the four methods for both the Leicester difficult1 dataset and the hc-1 dataset. We also observed that BW-NQ-DNN had a performance gap of about 15.2 dB for the Leicester difficult1 dataset and 15.6 dB for the hc-1 dataset, respectively. This is because, for larger bit-depths such as B ∈ {6, 7, 8}, the variance of the quantization error is small, and increasing the bit-depth brings no further gain in this situation.

Fig. 6. Recovery SNDR (top row) and CA (bottom row) averaged over all spikes from the Leicester difficult1 dataset (left column) and hc-1 dataset (right column), versus the bit-depth B for BW-NQ-DNN (blue trace), SDNCS (red trace), BPDQ (yellow trace), and QVMP (purple trace), respectively. The number of measurements was set to M = 4.
3) Trade-off between CS compression and quantization bit-depth: Fig. 7(a) and Fig. 7(b) show the SNDR and CA with varying numbers of measurements M ∈ {4, 8, 12, 16} and quantization bit-depths B ∈ {2, 4, 6, 8} for the Leicester difficult1 dataset, respectively. We found that the quality of the recoveries was affected by the configuration of M and B even under the same transmission bit-budget MB. It is worth noting that smaller values of MB are always preferable since they reduce the total transmission bit-budget. In Fig. 7(a), the configuration B = 8, M = 16 had the maximum SNDR for MB < 128 bits. However, Fig. 7(b) shows that M = 8, B = 4 yielded 100% CA. This configuration has the minimal transmission bit-budget, which is preferable for QCS compression of neural spikes. With M = 8 and B = 4, we transmit 32 bits instead of 128
samples for low-power wireless neural recording. Using an energy model of 0.4 µJ/bit, which assumes Bluetooth Low-Energy (BLE) as the wireless transmission protocol [22], the energy consumption of QCS was 12.8 µJ. For neural spikes quantized with B bits, the energy consumption of the QCS data compressor can be reduced to only 1/B of that of non-compressed wireless neural recording.

Fig. 7. (a) Recovery SNDR and (b) CA versus the transmission bit-budget MB for the Leicester difficult1 dataset. We varied M ∈ {4, 8, 12, 16} and B ∈ {2, 4, 6, 8}. BW-NQ-DNN was used to recover spikes from quantized measurements with different M and B.

4) Highlights of the learned binary measurement matrix and non-uniform quantizer: To highlight the importance of the learned binary measurement matrix and the non-uniform quantizer, we considered the following three cases: 1) the complete network with the learned binary measurement matrix and the learned non-uniform quantizer (denoted BW+NQ); 2) replacing the learned binary measurement matrix with an i.i.d. random Bernoulli matrix (denoted Bern+NQ); and 3) replacing the learned non-uniform quantizer with a uniform one (denoted BW+UQ). Fig. 8 gives the recovery performance in these cases with varying M and B. The results show that replacing either of the two parts decreases the recovery performance. We note that BW+NQ outperforms Bern+NQ, especially at high compression ratios (i.e., when M is low); as M increases, the two methods have almost the same SNDR. We also note that BW+NQ outperforms BW+UQ at low bit-depths (i.e., B = 2 and 4). When the bit-depth is high, e.g., B = 8, the two cases have almost the same performance, indicating that using a non-uniform quantizer brings no gain in high bit-depth situations.

Fig. 8. Recovery SNDR versus M with varying bit-depth B ∈ {2, 4, 8} for the Leicester difficult1 dataset. Three cases were considered, i.e., using the complete BW-NQ-DNN framework (BW+NQ), replacing the learned binary measurement matrix with an i.i.d. random Bernoulli matrix (Bern+NQ), and replacing the learned non-uniform quantizer with a uniform one (BW+UQ).

5) Computation time: To evaluate the computational efficiency of BW-NQ-DNN, we compared the computation time at different M, using MATLAB implementations of the four algorithms. The results are depicted in Fig. 9. They demonstrate that BW-NQ-DNN is over 30 times faster than SDNCS and over 3000 times faster than BPDQ and QVMP.

Fig. 9. Computation time averaged over all spikes from the Leicester difficult1 dataset versus the number of measurements M for BW-NQ-DNN, SDNCS, BPDQ, and QVMP, respectively.

V. CONCLUSION

Based on deep neural networks, this paper presents a novel QCS framework for wireless neural recording, in which a binary measurement matrix, a non-uniform quantizer, and a non-iterative recovery solver are jointly optimized during the training phase. Experimental results showed that the proposed approach outperforms the state-of-the-art in terms of both recovery
quality and computation time, and is thus preferable for real-time wireless neural recording applications. The framework proposed in this paper is also suitable for low-power wireless telemonitoring of physiological signals, and it can further be used as a low-power data compressor or encoder for other types of signals such as audio and images.

REFERENCES

[1] D. A. Schwarz, M. A. Lebedev, T. L. Hanson, D. F. Dimitrov, G. Lehew, J. Meloy, S. Rajangam, V. Subramanian, P. J. Ifft, Z. Li et al., "Chronic, wireless recordings of large-scale brain activity in freely moving rhesus monkeys," Nature Methods, vol. 11, no. 6, pp. 670–676, 2014.
[2] M. Yin, D. A. Borton, J. Komar, N. Agha, Y. Lu, H. Li, J. Laurens, Y. Lang, Q. Li, C. Bull et al., "Wireless neurosensor for full-spectrum electrophysiology recordings during free behavior," Neuron, vol. 84, no. 6, pp. 1170–1182, 2014.
[3] M. Chae, W. Liu, Z. Yang, T. Chen, J. Kim, M. Sivaprakasam, and M. Yuce, "A 128-channel 6mW wireless neural recording IC with on-the-fly spike sorting and UWB transmitter," in Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International. IEEE, 2008, pp. 146–603.
[4] T. Wu and Z. Yang, "Power-efficient VLSI implementation of a feature extraction engine for spike sorting in neural recording and signal processing," in Control Automation Robotics & Vision (ICARCV), 2014 13th International Conference on. IEEE, 2014, pp. 7–12.
[5] D. L. Donoho, "Compressed sensing," Information Theory, IEEE Transactions on, vol. 52, no. 4, pp. 1289–1306, 2006.
[6] E. J. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" Information Theory, IEEE Transactions on, vol. 52, no. 12, pp. 5406–5425, 2006.
[7] E. J. Cand`es, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9, pp. 589–592, 2008. [8] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE transactions on information theory, vol. 44, no. 6, pp. 2325–2383, 1998. [9] L. Jacques, D. K. Hammond, and J. M. Fadili, “Dequantizing compressed sensing: When oversampling and non-gaussian constraints combine,” IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 559–571, 2011. [10] L. Jacques, K. Degraux, and C. De Vleeschouwer, “Quantized iterative hard thresholding: Bridging 1-bit and high-resolution quantized compressed sensing,” arXiv preprint arXiv:1305.1786, 2013. [11] Z. Yang, L. Xie, and C. Zhang, “Variational bayesian algorithm for quantized compressed sensing,” IEEE Transactions on Signal Processing, vol. 61, no. 11, pp. 2815–2824, 2013. [12] J. Zhang, Y. Suo, S. Mitra, S. P. Chin, S. Hsiao, R. F. Yazicioglu, T. D. Tran, and R. Etienne-Cummings, “An efficient and compact compressed sensing microsystem for implantable neural recordings,” Biomedical Circuits and Systems, IEEE Transactions on, vol. 8, no. 4, pp. 485– 496, 2014. [13] Y. Suo, J. Zhang, T. Xiong, P. S. Chin, R. Etienne-Cummings, and T. D. Tran, “Energy-efficient multi-mode compressed sensing system for implantable neural recordings,” Biomedical Circuits and Systems, IEEE Transactions on, vol. 8, no. 5, pp. 648–659, 2014. [14] T. Xiong, Y. Suo, J. Zhang, S. Liu, R. Etienne-Cummings, S. Chin, and T. D. Tran, “A dictionary learning algorithm for multi-channel neural recordings,” in Biomedical Circuits and Systems Conference (BioCAS), 2014 IEEE. IEEE, 2014, pp. 9–12. [15] T. Xiong, J. Zhang, Y. Suo, D. N. Tran, R. Etienne-Cummings, S. Chin, and T. D. Tran, “An unsupervised dictionary learning algorithm for neural recordings,” in Circuits and Systems (ISCAS), 2015 IEEE International Symposium on. IEEE, 2015, pp. 1010–1013. [16] F. Chen, A. P. Chandrakasan, and V. M. Stojanovi´c, “Design and analysis of a hardware-efficient compressed sensing architecture for data compression in wireless sensors,” Solid-State Circuits, IEEE Journal of, vol. 47, no. 3, pp. 744–756, 2012. [17] H. Monajemi, S. Jafarpour, M. Gavish, D. L. Donoho, S. Ambikasaran, S. Bacallado, D. Bharadia, Y. Chen, Y. Choi, M. Chowdhury et al., “Deterministic matrices matching the compressed sensing phase transitions of gaussian random matrices,” Proceedings of the National Academy of Sciences, vol. 110, no. 4, pp. 1181–1186, 2013. [18] S. Li and G. Ge, “Deterministic construction of sparse sensing matrices via finite geometry,” Signal Processing, IEEE Transactions on, vol. 62, no. 11, pp. 2850–2859, 2014. [19] J. N. Laska, P. T. Boufounos, M. A. Davenport, and R. G. Baraniuk, “Democracy in action: Quantization, saturation, and compressive sensing,” Applied and Computational Harmonic Analysis, vol. 31, no. 3, pp. 429–443, 2011. [20] J. Haboba, M. Mangia, F. Pareschi, R. Rovatti, and G. Setti, “A pragmatic look at some compressive sensing architectures with saturation and quantization,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 3, pp. 443–459, 2012. [21] A. Wang, W. Xu, Z. Jin, and F. Gong, “Quantization effects in an analogto-information front end in eeg telemonitoring,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 2, pp. 104–108, 2015. [22] A. Wang, Z. Jin, C. Song, and W. 
Xu, “Adaptive compressed sensing architecture in wireless brain-computer interface,” in Proceedings of the 52nd Annual Design Automation Conference. ACM, 2015, p. 173. [23] B. Liu and Z. Zhang, “Quantized compressive sensing for low-power data compression and wireless telemonitoring,” IEEE Sensors Journal, vol. PP, no. 99, pp. 1–1, 2016. [24] J. Z. Sun and V. K. Goyal, “Optimal quantization of random measurements in compressed sensing,” in 2009 IEEE International Symposium on Information Theory. IEEE, 2009, pp. 6–10. [25] U. Kamilov, V. K. Goyal, and S. Rangan, “Optimal quantization for compressive sensing under message passing reconstruction,” in Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on. IEEE, 2011, pp. 459–463. [26] U. S. Kamilov, V. K. Goyal, and S. Rangan, “Message-passing dequantization with applications to compressed sensing,” IEEE Transactions on Signal Processing, vol. 60, no. 12, pp. 6270–6281, 2012. [27] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM review, vol. 43, no. 1, pp. 129–159, 2001. [28] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Transactions on information theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[29] T. Blumensath and M. E. Davies, “Iterative hard thresholding for compressed sensing,” Applied and Computational Harmonic Analysis, vol. 27, no. 3, pp. 265–274, 2009. [30] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,” Journal of machine learning research, vol. 1, no. Jun, pp. 211–244, 2001. [31] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences, vol. 106, no. 45, pp. 18 914–18 919, 2009. [32] A. Mousavi, A. B. Patel, and R. G. Baraniuk, “A deep learning approach to structured signal recovery,” in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2015, pp. 1336–1343. [33] K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “Reconnet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 449–458. [34] M. Iliadis, L. Spinoulas, and A. K. Katsaggelos, “Deep fullyconnected networks for video compressive sensing,” arXiv preprint arXiv:1603.04930, 2016. [35] W. R. Bennett, “Spectra of quantized signals,” Bell System Technical Journal, vol. 27, no. 3, pp. 446–472, 1948. [36] D. Hui and D. L. Neuhoff, “Asymptotic analysis of optimal fixed-rate uniform scalar quantization,” IEEE Transactions on Information Theory, vol. 47, no. 3, pp. 957–977, 2001. [37] V. Bach and R. Seiler, “Analysis of optimal high resolution and fixed rate scalar quantization,” IEEE Transactions on Information Theory, vol. 55, no. 4, pp. 1683–1691, 2009. [38] A. Gersho and R. M. Gray, Vector quantization and signal compression. Springer Science & Business Media, 2012, vol. 159. [39] S. Graf and H. Luschgy, Foundations of quantization for probability distributions. Springer, 2000. [40] V. Misra, V. K. Goyal, and L. R. Varshney, “Distributed scalar quantization for computing: High-resolution analysis and extensions,” IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 5298–5325, 2011. [41] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” DTIC Document, Tech. Rep., 1985. [42] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE Signal processing magazine, vol. 16, no. 6, pp. 22–38, 1999. [43] ——, “Sampling-50 years after shannon,” Proceedings of the IEEE, vol. 88, no. 4, pp. 569–587, 2000. [44] Y. Bengio, N. L´eonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013. [45] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, 2012. [46] D. W. Ruck, S. K. Rogers, and M. Kabrisky, “Feature selection using a multilayer perceptron,” Journal of Neural Network Computing, vol. 2, no. 2, pp. 40–48, 1990. [47] S. S. Haykin, S. S. Haykin, S. S. Haykin, and S. S. Haykin, Neural networks and learning machines. Pearson Upper Saddle River, NJ, USA:, 2009, vol. 3. [48] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010. Springer, 2010, pp. 177– 186. [49] R. Q. Quiroga, Z. Nadasdy, and Y. Ben-Shaul, “Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering,” Neural computation, vol. 16, no. 8, pp. 1661–1687, 2004. [50] D. A. Henze, Z. Borhegyi, J. 
Csicsvari, A. Mamiya, K. D. Harris, and G. Buzs´aki, “Intracellular features predicted by extracellular recordings in the hippocampus in vivo,” Journal of neurophysiology, vol. 84, no. 1, pp. 390–400, 2000.