30th Annual International IEEE EMBS Conference Vancouver, British Columbia, Canada, August 20-24, 2008
Comparison of Spike-Sorting Algorithms for Future Hardware Implementation

Sarah Gibson, Jack W. Judy, and Dejan Marković
Department of Electrical Engineering, University of California, Los Angeles, CA, USA
[email protected]

Abstract— Applications such as brain-machine interfaces require hardware spike sorting in order to (1) obtain single-unit activity and (2) perform data reduction for wireless transmission of data. Such systems must be low-power, low-area, high-accuracy, automatic, and able to operate in real time. Several detection and feature extraction algorithms for spike sorting are described briefly and evaluated in terms of accuracy versus computational complexity. The nonlinear energy operator is chosen as the optimal spike detection method because it is the most robust to noise while remaining relatively simple. The discrete derivatives method [1] is chosen as the optimal feature extraction method, maintaining high accuracy across SNRs with a complexity orders of magnitude lower than that of traditional methods such as PCA.
I. INTRODUCTION

During a neural recording, a single electrode often receives electrical signals from multiple neurons simultaneously. For applications such as neural prosthetics and neuroscience research, spike sorting is a critical step in neuronal signal processing for two reasons. The first reason is functional: adjacent cells may encode completely different information. Neuroscientists often need to know which spikes come from which neurons in order to understand the neuronal circuitry, and brain-machine interfaces (BMIs) often depend on single-unit activity as input. The second reason is practical: data reduction. Recent advances in BMI technology allow for the recording of hundreds of channels simultaneously. At the same time, there is a growing demand for wireless transmission of data. Communication bandwidth and power limitations require us to perform data reduction on-chip before transmission, and spike sorting is one way of accomplishing this.

Hardware for spike sorting must be low-power, in order to prevent heat-related tissue damage, and low-area, in order to be implantable. The algorithms implemented in that hardware must be accurate, automatic, real-time, and computationally simple enough to stay within the power limitations.

In this paper, we evaluate several recently published spike-sorting algorithms that seem promising for hardware implementation. Each algorithm presented here is evaluated in terms of its accuracy versus its computational complexity. Whereas many independent groups (e.g., [2]–[7]) have evaluated individual algorithms using different, often biological, data sets, we have used the neural-signal simulator introduced in [7] to develop synthetic data sets
in order to obtain an accurate, unbiased comparison between algorithms. Additionally, we evaluate the algorithms over a wider range of SNRs, from very high (~15 dB) to very low (~ -10 dB).

II. ALGORITHMS

Spike sorting can be divided into three main steps: (1) spike detection and alignment, separating spikes from noise and aligning them to a common point; (2) feature extraction, transforming spikes into a certain set of features; and (3) clustering, classifying spikes into different groups (i.e., neurons) based on the extracted features. In this section, we present a representative sample of existing spike detection and feature extraction algorithms that are automatic (and potentially unsupervised), real-time, and practical for hardware implementation. Alignment and clustering algorithms will not be addressed here; they are topics of current research.

A. Spike Detection

Spike detection algorithms involve two main steps: (1) pre-emphasis of the spike and (2) application of a threshold [2]. This section describes three very different methods of pre-emphasis: absolute value, nonlinear energy operator, and stationary wavelet transform product. The method of automatically determining the threshold for each method is also stated. Spike detection using all of the methods was accomplished as follows: when a sample in the pre-emphasized signal crosses the threshold, a 3-ms window is applied to the signal and the result is saved as a spike. This window length was chosen because a spike is unlikely to last longer than 3 ms and because it ensures that we do not capture more than one spike from the same neuron in the window.

1) Absolute Value: A simple, commonly used detection method is to apply a threshold to the voltage of the waveform. This threshold can be applied either to the raw (filtered) waveform or to its absolute value. Applying a threshold to the absolute value of the signal is more intuitive, since spikes can be either positive- or negative-going. The absolute-value threshold was confirmed to outperform a simple threshold in [8]. As in [7], the threshold Thr was automatically set to

    \mathrm{Thr} = 4\sigma_N, \qquad \sigma_N = \mathrm{median}\left( \frac{|x(n)|}{0.6745} \right),    (1)

where x(n) is a sample of the waveform at time n and \sigma_N is an estimate of the standard deviation of the noise.
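For concreteness, a minimal Python/NumPy sketch of this detection scheme is shown below. The threshold follows (1) and the 3-ms capture window follows the procedure above; the function name, the sampling-rate argument fs, and the choice to skip ahead one full window after each detection are our own assumptions rather than details given in the text.

```python
import numpy as np

def detect_abs_threshold(x, fs, window_ms=3.0):
    """Absolute-value spike detection with the robust threshold of Eq. (1)."""
    x = np.asarray(x, dtype=float)
    sigma_n = np.median(np.abs(x)) / 0.6745      # noise-std estimate from Eq. (1)
    thr = 4.0 * sigma_n                          # Eq. (1)

    win = int(round(window_ms * 1e-3 * fs))      # 3-ms capture window
    spikes, n = [], 0
    while n < len(x) - win:
        if np.abs(x[n]) > thr:                   # threshold crossing on |x(n)|
            spikes.append(x[n:n + win].copy())   # save the windowed waveform as a spike
            n += win                             # assumption: skip ahead one window
        else:
            n += 1
    return thr, spikes
```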
2) Nonlinear Energy Operator: The nonlinear energy operator (NEO), also called the Teager energy operator (TEO), originally described in [9], has been proposed for use in spike detection ([2], [3], [8]). In discrete time, the NEO \psi is defined as

    \psi[x(n)] = x^2(n) - x(n+1)\,x(n-1).    (2)

The NEO is large only when the signal is both high in power (i.e., x^2(n) is large) and high in frequency (i.e., x(n) is large while x(n+1) and x(n-1) are small). Since a spike by definition is characterized by localized high frequencies and an increase in instantaneous energy [2], this method has an obvious advantage over methods that look only at an increase in signal energy or amplitude without regard to frequency. Similarly to the method in [2], the threshold Thr was automatically set to a scaled version of the mean of the NEO:

    \mathrm{Thr} = C \, \frac{1}{N} \sum_{n=1}^{N} \psi[x(n)],    (3)

where N is the number of samples in the signal. The scale factor was initially chosen to be C = 8 (by experiment, as described in Section III-C) and then used as a constant.
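A NumPy sketch of NEO-based detection under the same 3-ms windowing assumption is given below; the handling of the two boundary samples (left at zero) is our own choice, not specified in the text.

```python
import numpy as np

def detect_neo(x, fs, C=8.0, window_ms=3.0):
    """NEO pre-emphasis (Eq. (2)) with the mean-based threshold of Eq. (3)."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[2:] * x[:-2]    # Eq. (2); end points left at zero
    thr = C * psi.mean()                          # Eq. (3) with C = 8

    win = int(round(window_ms * 1e-3 * fs))
    spikes, n = [], 1
    while n < len(x) - win:
        if psi[n] > thr:                          # threshold applied to psi, not to x
            spikes.append(x[n:n + win].copy())
            n += win                              # assumption: skip ahead one window
        else:
            n += 1
    return thr, spikes
```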
3) Stationary Wavelet Transform Product (SWTP): The discrete wavelet transform, originally presented in [10], is ideally suited for the detection of signals in noise (e.g., edge detection, speech detection). Recently it has also been applied to spike detection ([4], [11], [12]). Here we use the method presented in [4]. First, the stationary wavelet transform (SWT) is calculated at 5 consecutive dyadic scales, W(2^j, n), j = 1, ..., 5. Then the scale 2^{j_max} with the largest sum of absolute values is found:

    j_{\max} = \arg\max_{j \in \{1,\ldots,5\}} \sum_{n=1}^{N} |W(2^j, n)|.    (4)

From here, we calculate the point-wise product P(n) between the SWT at this scale and the SWTs at the two previous scales:

    P(n) = \prod_{j = j_{\max}-2}^{j_{\max}} |W(2^j, n)|.    (5)

This product is then smoothed by convolving it with a Bartlett window w(n) in order to eliminate spurious peaks, and a threshold is applied. Again, the threshold Thr was automatically set to a scaled version of the mean of this result:

    \mathrm{Thr} = C \, \frac{1}{N} \sum_{n=1}^{N} w(n) * P(n),    (6)

where N is the number of samples in the signal. The scale factor was initially chosen to be C = 2 (by experiment, as described in Section III-C) and then used as a constant.
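The excerpt does not name the mother wavelet used for detection or the length of the Bartlett window, so the Haar "à trous" filter bank, the 3-ms smoothing length, and the clipping of the product in (5) at scale 1 when j_max < 3 in the sketch below are all our assumptions.

```python
import numpy as np

def haar_swt_details(x, levels=5):
    """Undecimated ('a trous') Haar SWT detail coefficients W(2**j, n), j = 1..levels.

    A simple stand-in for the SWT of [4]; circular boundary handling for brevity.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shifted = np.roll(approx, -(2 ** j))                # dilation 2**j at level j+1
        details.append((approx - shifted) / np.sqrt(2.0))   # detail (high-pass)
        approx = (approx + shifted) / np.sqrt(2.0)          # approximation (low-pass)
    return details

def swtp_threshold(x, fs, C=2.0, levels=5, smooth_ms=3.0):
    """SWTP pre-emphasis and threshold following Eqs. (4)-(6)."""
    details = haar_swt_details(x, levels)
    jmax = int(np.argmax([np.abs(d).sum() for d in details])) + 1   # Eq. (4)

    P = np.ones(len(details[0]))
    for j in range(max(jmax - 2, 1), jmax + 1):             # Eq. (5); clipped at scale 1
        P *= np.abs(details[j - 1])

    w = np.bartlett(int(round(smooth_ms * 1e-3 * fs)))      # assumed smoothing length
    w /= w.sum()                                            # scale of w affects P and Thr equally
    smoothed = np.convolve(P, w, mode='same')
    thr = C * smoothed.mean()                               # Eq. (6) with C = 2
    return smoothed, thr
```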
B. Feature Extraction

Feature extraction (FE) emphasizes the differences between waveforms and reduces the dimensionality of the data; the resulting features serve as the input to clustering. In this section, principal component analysis, the discrete wavelet transform, discrete derivatives, and the integral transform are described.

1) Principal Component Analysis (PCA): Since PCA has become a benchmark FE method in neural signal processing, we have included it in our analysis as a basis for comparison. In PCA, we find an orthogonal basis (the "principal components," or PCs) for the data that captures the directions of largest variation, and we express each spike as a series of PC coefficients c_i:

    c_i = \sum_{n=1}^{N} PC_i(n)\,s(n),    (7)

where s is a spike, N is the number of samples in a spike (and in a PC), and PC_i is the ith PC. The PCs are found by performing an eigenvalue decomposition of the covariance matrix of the data; in fact, the PCs are the eigenvectors themselves. See [13] for a more detailed description of the method. PCA yields as many coefficients as there are samples in the original spike (N). However, since most of the energy is captured by the first few components, we kept only the 3 largest PC scores for our analysis.
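A short NumPy sketch of this projection follows. Whether the mean waveform is subtracted before projecting is an implementation detail the text does not state; the sketch projects the raw spikes, exactly as written in (7).

```python
import numpy as np

def pca_features(spikes, n_keep=3):
    """Project spikes onto their principal components, as in Eq. (7).

    spikes : (n_spikes, N) array of aligned spike waveforms.
    """
    spikes = np.asarray(spikes, dtype=float)
    cov = np.cov(spikes, rowvar=False)            # N x N covariance matrix of the data
    eigvals, eigvecs = np.linalg.eigh(cov)        # the PCs are the eigenvectors
    order = np.argsort(eigvals)[::-1]             # sort by decreasing variance
    pcs = eigvecs[:, order[:n_keep]]              # keep the 3 largest components
    return spikes @ pcs                           # Eq. (7): c_i = sum_n PC_i(n) s(n)
```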
2) Discrete Wavelet Transform (DWT): The DWT was proposed for FE in [7]. The DWT should be a good method for FE because it is a multi-resolution technique that provides good time resolution at high frequencies and good frequency resolution at low frequencies. The DWT is also appealing because it can be implemented using a series of filter banks, keeping the complexity relatively low. We used the Haar wavelet because it provided one of the highest levels of accuracy of all the wavelets tested while being the simplest to implement. The DWT yields about the same number of expansion coefficients as there are samples in the original spike. We then perform dimensionality reduction using the Lilliefors test for normality, keeping the 10 coefficients whose distributions over all spikes differ most from the normal distribution.¹
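A possible realization, assuming the PyWavelets and statsmodels packages, is sketched below; columns of coefficients with near-zero variance may need to be skipped before applying the Lilliefors test in practice.

```python
import numpy as np
import pywt                                           # assumed: PyWavelets
from statsmodels.stats.diagnostic import lilliefors   # assumed: statsmodels

def dwt_features(spikes, n_keep=10, level=5):
    """Level-5 Haar DWT of each spike, then Lilliefors-based coefficient selection."""
    coeffs = np.array([np.concatenate(pywt.wavedec(s, 'haar', level=level))
                       for s in spikes])              # roughly N coefficients per spike
    # Keep the coefficients whose distribution across spikes deviates most from normal
    ks_stat = np.array([lilliefors(coeffs[:, k])[0] for k in range(coeffs.shape[1])])
    keep = np.argsort(ks_stat)[::-1][:n_keep]
    return coeffs[:, keep]
```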
3) Discrete Derivatives (DD): A method similar to the DWT, but simpler, was presented in [1]. Here we compute discrete derivatives, i.e., the slope at each sample point, over a number of different time scales:

    dd_{\delta}(n) = s(n) - s(n - \delta),    (8)

where s is a spike. We chose to use \delta = 1, 3, and 7. This yields about 3x as many "expansion coefficients" as there are samples in the original spike, so we again reduced the dimensionality to 10 using the Lilliefors test.¹
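The corresponding NumPy sketch is only a few lines; the coefficient-selection step (e.g., the Lilliefors selection shown for the DWT above) is omitted here.

```python
import numpy as np

def dd_features(spikes, deltas=(1, 3, 7)):
    """Discrete-derivative features of Eq. (8) for delta = 1, 3, and 7."""
    feats = []
    for s in np.asarray(spikes, dtype=float):
        dd = [s[d:] - s[:-d] for d in deltas]      # dd_delta(n) = s(n) - s(n - delta)
        feats.append(np.concatenate(dd))           # ~3x as many coefficients as samples
    # A coefficient-selection step would then reduce each row to 10 entries.
    return np.array(feats)
```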
4) Integral Transform (IT): In the IT method [6], spikes are classified based on the areas under the positive and
¹ The number of coefficients has not yet been fully optimized.
Fig. 2. Spike shapes used in the test data sets. Each set in column A contains 2 neurons, B 3 neurons, C 4 neurons, and D 5 neurons. For each set of spikes shown, 4 data sets were generated: 2 in which the amplitudes of all spikes were normalized and 2 in which the amplitudes differed (as shown). Within these pairs of data sets, the firing rate of each neuron was equal (40 Hz) in one and varied (from 5 Hz to 40 Hz) in the other.

and the detection rate is defined as

    P_D = 1 - P_M, \qquad P_M = \frac{\text{number of misses}}{\text{number of true positives}}.    (12)

We defined true positives as the samples within known spikes and true negatives as all other samples. False alarms are all samples within a detected spike that are not part of a true spike, and misses are all samples of true spikes that are not part of detected spikes.

As depicted in Fig. 1, the accuracy of each FE method was calculated as follows. For each signal over the range of SNRs, true spikes were "detected" using the file of true spike times. This ensured that the accuracy calculations for the FE methods were not affected by any inaccuracies of the spike detection methods. Features were then extracted from the detected spikes using the FE methods described in Section II. The results were clustered using the Matlab implementation of fuzzy c-means, the most accurate of all the clustering methods tried. The accuracy was then calculated by comparing the computed cluster assignment of each spike to its actual identity. Since fuzzy c-means is non-deterministic, this was performed N = 100 times, and the median result was reported.

C. Receiver Operating Characteristic (ROC)

ROC curves were used to evaluate the performance of the various spike detection algorithms. For a given method, the ROC curve was generated by first performing the appropriate pre-emphasis (absolute value, NEO, or SWTP) and then systematically varying the threshold on the pre-emphasized signal from very low (the minimum value of the pre-emphasized signal) to very high (the maximum value of the pre-emphasized signal). At each threshold value, spikes were detected and P_D and P_FA were calculated in order to form the ROC curve. The area under the ROC curve (also called the "choice probability") represents the probability that an ideal observer will correctly classify an event in a two-alternative forced-choice task. Thus, a higher choice probability corresponds to a better detection method.

The ROC curves were also used to choose the parameter C in (3) and (6). For each of the two methods, the ROC curve of an initial training data set was examined and the best threshold was chosen given an acceptable error (P_D > 70% and P_FA < 30%). C was then defined by dividing this threshold by the mean of the samples used to generate the ROC curves (i.e., the respective pre-emphasized signal).
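A simplified, sample-level sketch of this sweep is shown below. It thresholds the pre-emphasized signal directly rather than extracting 3-ms windows, and it normalizes P_FA by the number of non-spike samples, which this excerpt does not define explicitly; both simplifications are our own.

```python
import numpy as np

def choice_probability(pre_emph, true_spike_mask, n_thresholds=100):
    """Sample-level ROC sweep and area under the curve ('choice probability').

    pre_emph        : pre-emphasized signal (|x|, NEO output, or smoothed SWTP)
    true_spike_mask : boolean array marking samples that belong to true spikes
    """
    pre_emph = np.asarray(pre_emph, dtype=float)
    true_spike_mask = np.asarray(true_spike_mask, dtype=bool)
    n_true, n_noise = true_spike_mask.sum(), (~true_spike_mask).sum()

    pd, pfa = [], []
    for thr in np.linspace(pre_emph.min(), pre_emph.max(), n_thresholds):
        detected = pre_emph > thr                 # simplified: no 3-ms windowing here
        pd.append(np.sum(detected & true_spike_mask) / max(n_true, 1))
        pfa.append(np.sum(detected & ~true_spike_mask) / max(n_noise, 1))

    order = np.argsort(pfa)                       # integrate P_D over P_FA
    return np.trapz(np.array(pd)[order], np.array(pfa)[order])
```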
D. Complexity Calculations

The computational complexity of each algorithm was estimated using its respective equations. For the detection algorithms, complexity is estimated as the sum of the operations per sample required for pre-emphasis and the operations per sample required for threshold calculation, assuming 10 s of training for threshold calculation per 1 hr of data. For the FE algorithms, complexity is estimated per spike, assuming 72 samples (3 ms) per spike. As in [6], we define the complexity as N_additions + 10 N_multiplications.
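This cost metric is a one-line function; the small check below reproduces the per-spike totals listed for the FE methods in Table II.

```python
def complexity(n_additions, n_multiplications):
    """Cost metric from [6]: N_additions + 10 * N_multiplications."""
    return n_additions + 10 * n_multiplications

# Per-spike totals for 72-sample spikes, matching Table II:
assert complexity(5184, 5184) == 57024   # PCA
assert complexity(280, 560) == 5880      # DWT, level-5 Haar
assert complexity(205, 0) == 205         # DD
assert complexity(72, 0) == 72           # IT
```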
IV. RESULTS

A. Spike Detection

Fig. 3 shows the median ROC curve for each spike detection method. It is clear from this figure that the SWTP method is inferior to the other two methods. However, the curves corresponding to the absolute value and NEO methods are too close to draw any conclusions, so we statistically compared the underlying choice-probability distributions in order to determine which of these methods is better. Fig. 4 shows the distribution of choice probabilities for each detection method. The difference between the distributions is statistically significant (Kruskal-Wallis test, p < 0.01). The median of the NEO choice-probability distribution is the highest, indicating that its performance is the best.

Fig. 5 shows both the probability of detection versus SNR and the probability of false alarm versus SNR for each detection method when the threshold calculation techniques discussed in Section II are used. This figure shows, more dramatically than Figs. 3 and 4, that NEO performs better than the absolute value method across SNRs. This indicates either that the threshold calculation technique in (3) is more robust than the technique in (1), or that the NEO method is generally less sensitive to the choice of threshold than the absolute value method. Table I shows the complexity per input sample of each spike detection algorithm, not including smoothing of the SWTP. Note that when estimating the complexity of the threshold calculation for the absolute value method (1),
TABLE II
COMPUTATIONAL COMPLEXITY OF FEATURE EXTRACTION ALGORITHMS PER 72-SAMPLE SPIKE (COEFFICIENT SELECTION NOT INCLUDED)

Algorithm | Additions | Multiplications | Computational Complexity
PCA*      | 5184      | 5184            | 57024
DWT†      | 280       | 560             | 5880
DD        | 205       | 0               | 205
IT*       | 72        | 0               | 72

* Requires offline training
† Level-5 Haar wavelet
Fig. 7. Median classification accuracy, averaged over all data sets and noise levels (N = 1632), after fuzzy c-means clustering, versus computational complexity (number of additions required per spike) for each FE algorithm. Error bars show the standard error of the mean.

per spike, this reduces our data rate from 24 000 samples per second to 7200 samples per second, a 70% reduction. Feature extraction can further reduce the data rate. The IT method inherently reduces the dimensionality of the features that represent each spike to 2, a further reduction of 97%. Neither PCA nor DWT inherently reduces the dimensionality of the features, and DD actually increases it. In practice, however, as in this paper, only the first 3 PCs are used, effectively reducing the data by a further 96%. Furthermore, coefficient selection methods, such as the Lilliefors test, can be used to reduce the dimensionality of the DWT and DD features. In this paper, where 10 coefficients were chosen for each method, the data rates were effectively reduced by a further 86%. Again assuming 100 spikes per second, the total data reduction achieved by detection and feature extraction is 99.2% for IT, 98.8% for PCA, and 95.8% for DWT and DD (when 10 coefficients are selected).

VI. CONCLUSIONS AND FUTURE WORK

Based on the present analysis, we advocate the NEO for spike detection and DD for FE. NEO would require 504 000 add operations per second (OPS), and DD would require 20 500 OPS (assuming a firing rate of 100 Hz), for a system total of 524 000 OPS, which is feasible for ultra-low-power implementations. The system so far is estimated to have about 25 000 gates and a power dissipation of 1 µW per channel when implemented in a 90-nm technology at 0.4 V. Note that since spike detection operations must be performed on every sample of raw data, while FE operations are performed on detected spikes only (i.e., much less frequently), the complexity of the overall system is so far dominated by the complexity of detection. We estimate that once clustering is added to the system, the overall complexity will be dominated by clustering, which will be at least 3x more complex than spike detection and FE combined.

The next step is to similarly evaluate existing methods for alignment at the front end and clustering at the back end of the spike-sorting process. A challenge will be to find an unsupervised, accurate, and computationally simple algorithm for use in a hardware spike sorter. The dimensionality of the features used for clustering will also have to be optimized. Finally, verification methods, such as autocorrelograms and cross-correlograms, should be employed.

VII. ACKNOWLEDGMENTS

The authors thank Dr. Y. Ben-Shaul for the use of his neural-signal simulator.

REFERENCES
[1] Z. Nadasdy et al., "Comparison of unsupervised algorithms for on-line and off-line spike sorting," presented at the 32nd Annu. Meeting Soc. for Neurosci., 2002. [Online]. Available: http://www.vis.caltech.edu/~zoltan/
[2] S. Mukhopadhyay and G. Ray, "A new interpretation of nonlinear energy operator and its efficacy in spike detection," IEEE Trans. Biomed. Eng., vol. 45, no. 2, pp. 180–187, Feb. 1998.
[3] K. H. Kim and S. J. Kim, "Neural spike sorting under nearly 0-dB signal-to-noise ratio using nonlinear energy operator and artificial neural-network classifier," IEEE Trans. Biomed. Eng., vol. 47, no. 10, pp. 1406–1411, Oct. 2000.
[4] K. H. Kim and S. J. Kim, "A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio," IEEE Trans. Biomed. Eng., vol. 50, no. 8, pp. 999–1011, Aug. 2003.
[5] A. Zviagintsev, Y. Perelman, and R. Ginosar, "Algorithms and architectures for low power spike detection and alignment," J. Neural Eng., vol. 3, no. 1, pp. 35–42, 2006.
[6] A. Zviagintsev, Y. Perelman, and R. Ginosar, "Low-power architectures for spike sorting," in 2nd Int. IEEE EMBS Conf. Neural Eng., Arlington, VA, Mar. 2005, pp. 162–165.
[7] R. Quian Quiroga, Z. Nadasdy, and Y. Ben-Shaul, "Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering," Neural Comput., vol. 16, no. 8, pp. 1661–1687, Aug. 2004.
[8] I. Obeid and P. D. Wolf, "Evaluation of spike-detection algorithms for a brain-machine interface application," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 905–911, Jun. 2004.
[9] J. F. Kaiser, "On a simple algorithm to calculate the 'energy' of a signal," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP '90), vol. 1, Albuquerque, NM, Apr. 1990, pp. 381–384.
[10] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[11] E. Hulata, R. Segev, Y. Shapira, M. Benveniste, and E. Ben-Jacob, "Detection and sorting of neural spikes using wavelet packets," Phys. Rev. Lett., vol. 85, no. 21, pp. 4637–4640, Nov. 2000.
[12] R. J. Brychta et al., "Wavelet methods for spike detection in mouse renal sympathetic nerve activity," IEEE Trans. Biomed. Eng., vol. 54, no. 1, pp. 82–93, Jan. 2007.
[13] M. Abeles and M. H. Goldstein, Jr., "Multispike train analysis," Proc. IEEE, vol. 65, no. 5, pp. 762–773, May 1977.
[14] D. A. Henze et al., "Intracellular features predicted by extracellular recordings in the hippocampus in vivo," J. Neurophysiol., vol. 84, pp. 390–400, 2000.
[15] K. D. Harris, D. A. Henze, J. Csicsvari, H. Hirase, and G. Buzsáki, "Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements," J. Neurophysiol., vol. 84, pp. 401–414, 2000.