VERIFYING NEURONAL CODES: THE STATISTICAL PATTERN RECOGNITION APPROACH

Mark Laubach
The John B. Pierce Laboratory and Dept. of Neurobiology, Yale School of Medicine, New Haven, CT 06519 USA

Abstract – In recent years, a variety of statistical methods have been proposed for studying neuronal codes. Many methods are developed by researchers specifically for their own data and are rarely validated by outside groups. An alternative approach is to use methods that have been developed by experts in signal processing and statistics, that have passed through peer review, and that are widely available. My research group uses such methods to study neural coding in the cerebral cortex. We simultaneously record neuronal spike trains and wideband field potentials in the cerebral cortex of animals that perform psychophysical and sensorimotor behavioral tasks. This approach allows us to assess neuronal codes that are behaviorally relevant, free from the effects of anesthesia, and potentially represented across multiple scales. We define relevant features in neuronal activity using wavelet-based methods and then use pattern recognition methods to quantify the information content of the features on a trial-by-trial basis. Using these methods, we are able to interpret the network properties of neuronal codes and, using a cluster of workstations, we can now analyze neuronal ensemble data sets during the actual performance of behavioral tasks. This approach leads to a new kind of experimental neurophysiology that allows for the verification of neuronal codes.

Keywords – brain-machine interface, neuronal coding, neuroengineering, neuronal ensemble recording
I. INTRODUCTION

There has been a recent explosion of interest in recording neuronal activity from large populations of neurons distributed in multiple parts of the brain [1]. Technology for recording such activity has become available over the last 10-15 years. However, there are still few established, well-accepted methods for analyzing data sets acquired through neuronal ensemble recording. Such data sets are typically of high dimensionality, and the individual variables (the neurons) are nonstationary and noisy. These properties present a real challenge for traditional data analysis methods. Future progress towards implementing clinically relevant applications of neuronal ensemble recording will require a major effort at developing and testing procedures for data analysis. Recent efforts towards this end are described in the present manuscript.
A typical experiment in our laboratory involves making recordings from as many as 64 electrodes in the brains of rats that have been trained to perform behavioral tasks [2]. Neurophysiological methods are used to simultaneously record the electrical activity of neurons in several brain areas (e.g., motor cortex, thalamus, and basal ganglia) that are thought to be essential for a given task. Our goal is to describe and then quantify how neurons are active on different kinds of trials. Traditionally, data analysis in neurophysiology is done off-line, often after it is no longer possible to record again from the same sets of neurons. A major benefit of the procedures described here is that they can be used for on-line analyses of neuronal activity during individual behavioral sessions.

II. METHODOLOGY

Standard methods are used for implanting arrays of microwire electrodes into multiple areas of the rat brain and for recording spike trains, i.e., the sequences of action potentials fired over time, from many (up to 192) neurons simultaneously [3]. For the past ten years, I have worked on new tools for analyzing data collected with neuronal ensemble recording methods. These tools arise from applications of standard methods for signal processing and statistical pattern recognition (machine learning or data mining; see [4] for review). I have used established and accepted methods that have been developed by experts in signal processing and statistics, that have passed through peer review, and that are widely available (e.g., over the Internet). These methods are described below.

The basic tenet of the statistical pattern recognition approach is as follows. A specific experimental parameter is varied (e.g., the frequency of a tone, the location of a tactile probe, or success in performing a task). If neuronal activity can be used to predict, for each trial in the experiment, the type of trial that occurred (e.g., correct or incorrect response) or the specific parameter that was varied (e.g., tone frequency) at levels significantly higher than expected by chance, then there must be information in the neuronal activity about the experimental manipulation.
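To make this tenet concrete, the sketch below tests whether cross-validated decoding accuracy exceeds a label-shuffled chance distribution. It is a minimal illustration rather than our production code: the function name and the choice of logistic regression as the classifier are placeholders, and X and y stand for any trials-by-features matrix and per-trial labels.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def decoding_vs_chance(X, y, n_shuffles=100, seed=0):
        # X: (n_trials, n_features) neuronal activity; y: per-trial labels
        rng = np.random.default_rng(seed)
        clf = LogisticRegression(max_iter=1000)  # placeholder classifier
        observed = cross_val_score(clf, X, y, cv=5).mean()
        # chance distribution: repeat the decoding after destroying the
        # trial-to-label correspondence by shuffling the labels
        null = np.array([cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
                         for _ in range(n_shuffles)])
        # one-sided permutation p-value for above-chance decoding
        p = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
        return observed, p

If the returned p-value is small, the neuronal activity carries information about the experimental manipulation; otherwise the decoding is indistinguishable from chance.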
Neuronal ensemble data analysis is an extremely time-consuming process, with most analyses taking several hours, if not days, to run to completion. As a result, studies of neuronal coding are done "offline", often several days after the data were collected. In most cases, this makes it almost impossible to go back into the laboratory and make experimental tests of putative neuronal codes. To allow for a more real-time approach to neurophysiology and data analysis, my group has recently implemented novel computer-based methods to carry out on-line analyses of neurophysiological data. To do this, several different computer systems have been integrated to work together as a cluster. The cluster is used for data analysis and is linked to other computers that control events in the behavioral task, control the acquisition of the neuronal data, display on-going analysis results, and present patterns of electrical stimulation to selected recording sites. Such feedback through microstimulation can be used to perturb on-going patterns of neuronal activity or to "insert neural codes" by selectively stimulating key recording sites at specific times during a behavioral trial.

A. Data collection

Behavioral experiments are controlled using custom scripts written for the Matlab Data Acquisition Toolbox (The Mathworks, Natick, MA) and hardware from Tucker-Davis Technologies (Gainesville, FL). A neuronal ensemble recording system from Plexon Inc (Dallas, TX) is used for recording spike trains. These data are sent from the computer that controls the recording system to the head node of our analysis cluster via custom TCP/IP software. In addition to spike data, wideband field potentials (i.e., filtered between 3.3 Hz and 5.9 kHz) are also measured and stored directly to computer hard disk via a Single-Board Computer (SBC) (Microstar Labs, Bellevue, WA).
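As an illustration of this data path, the sketch below shows a minimal TCP/IP receiver such as might run on the head node. The actual software is custom and unpublished; the packet layout assumed here (channel, unit, and timestamp per spike event) is invented purely for the example.

    import socket
    import struct

    # Hypothetical wire format: one spike event per record, packed as
    # (channel: uint16, unit: uint16, timestamp in ms: float64), little-endian.
    EVENT = struct.Struct("<HHd")

    def receive_spikes(host="0.0.0.0", port=5000):
        # yields (channel, unit, timestamp_ms) tuples as they arrive
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((host, port))
            srv.listen(1)
            conn, _ = srv.accept()
            with conn:
                buf = b""
                while data := conn.recv(4096):
                    buf += data
                    while len(buf) >= EVENT.size:
                        record, buf = buf[:EVENT.size], buf[EVENT.size:]
                        yield EVENT.unpack(record)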
B. Data analysis: Overview

Our general strategy for data analysis is as follows (Fig. 1): (i) Spike trains are preprocessed through low-pass filtering and decimation. This serves to smooth out variations in the timing of spikes and to reduce the dimensionality of the data. (ii) In some cases, a second level of dimension reduction is carried out at the multivariate level using principal or independent component analysis. (iii) Motivated by earlier work [5], we extract features that identify time points at which a spike train varies over the set of experimental manipulations. Again, the major goal of this step of the analysis is dimension reduction. For classification problems (e.g., discriminating between different types of stimuli), we use the Discriminant Pursuit algorithm [6] for feature extraction. For regression problems (e.g., predicting a parameter of a stimulus), we use the Local Regression Basis algorithm [7]. (iv) Next, the predictive power of the features is assessed using pattern recognition methods. (v) Finally, the relative importance of each neuron in a given ensemble is evaluated using modifications of the basic pattern recognition approach: trial orders for a given variable are randomly permuted (neuron randomization) or the variable is removed from the data set (neuron dropping) (see below).

[Fig. 1. Overview of the general strategy for data analysis.]

C. Preprocessing

We routinely preprocess spike trains prior to feature extraction and quantification. Spike trains are smoothed using a combination of low-pass filtering and decimation. As depicted in Fig. 2, these procedures convert spike trains into smooth, time-varying representations of spike density. For example, with 3-fold decimation and a spike precision of 1 ms, each spike is smoothed out over a 3 ms epoch (a code sketch follows Fig. 2).

[Fig. 2. Preprocessing of spike trains by low-pass filtering and decimation.]
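A minimal sketch of this preprocessing step, using SciPy's decimate, which applies an anti-aliasing low-pass filter before downsampling. The 1 ms bin width and 3-fold decimation mirror the example in the text; the function name and default trial length are hypothetical.

    import numpy as np
    from scipy.signal import decimate

    def spike_density(spike_times_ms, trial_len_ms=1000, q=3):
        # bin spike times at 1 ms resolution
        binned = np.zeros(trial_len_ms)
        idx = np.asarray(spike_times_ms, dtype=int)
        np.add.at(binned, idx[idx < trial_len_ms], 1)
        # decimate() low-pass filters before downsampling by a factor q,
        # so each 1 ms spike is smeared over roughly a q-ms epoch
        return decimate(binned, q)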
D. Feature Extraction

For classification problems, we use the Discriminant Pursuit (DP) algorithm [6]. This method finds structure that contrasts neural activity across different types of trials. Formally, the method operates on a difference vector, δ, defined as:
δ = x̄₁ − x̄₂ = ∆ + (1/√N)·z,  z ~ N(0, 1)

where x̄₁ and x̄₂ are the average perievent histograms for the two classes, ∆ is the true distance between the classes, N is the number of samples, and z is random noise. δ is then partitioned into time-frequency components via wavelet packet analysis. Over a series of iterations, the components that account for the largest portions of δ are chosen as features and removed from δ. Coefficients (single numbers for each feature) are then computed for each trial in a given experiment.
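The sketch below captures the spirit of this procedure: decompose the difference vector with wavelet packets (here via the PyWavelets package) and keep the coefficients that carry most of its energy. It ranks coefficients at a single fixed level rather than running the full best-basis search of [6], and all names and parameter values are illustrative.

    import numpy as np
    import pywt

    def dp_like_features(trials_a, trials_b, wavelet="db4", level=4, n_features=5):
        # trials_*: (n_trials, n_samples) arrays of smoothed spike densities
        delta = trials_a.mean(axis=0) - trials_b.mean(axis=0)  # difference vector
        wp = pywt.WaveletPacket(delta, wavelet, maxlevel=level)
        coeffs = np.concatenate([node.data for node in wp.get_level(level, order="freq")])
        # keep the packet coefficients carrying the most of delta's energy
        keep = np.argsort(np.abs(coeffs))[::-1][:n_features]

        def project(trial):
            wpt = pywt.WaveletPacket(trial, wavelet, maxlevel=level)
            c = np.concatenate([n.data for n in wpt.get_level(level, order="freq")])
            return c[keep]

        # one row of feature coefficients per trial, for both classes
        return np.array([project(t) for t in np.vstack([trials_a, trials_b])])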
DP is illustrated in Fig. 3. Spike trains were simulated using a Poisson random number generator (raster plots and perievent histograms are shown in A). Two classes of signals were created that differed only in the timing of the onset of a burst of three spikes at a variable time prior to the end of the trial (time 0). It is very difficult to discriminate between these two classes of spike trains with traditional methods from neurophysiology (e.g., t-tests), as there is no difference in spike counts per trial (B). Classification using features from principal component analysis [5] was at chance levels (features in D). However, the signals were successfully classified using DP. The difference vector and features are shown in C and E, respectively. Further analysis revealed that features 2 and 3 were most critical for classification. These features accounted for the onset and offset of the temporal epoch containing the triplet of spikes.

[Fig. 3. Classification of simulated spike trains using Discriminant Pursuit.]

E. Feature Quantification

A powerful method for data mining, Random Forest (RF) [8], is used for feature quantification. This method is well suited for analyses of data sets that contain high-dimensional, noisy, and weakly predictive features. In our experience, this is exactly the kind of data that is obtained with neuronal ensemble recording. RF is a randomized and aggregated version of a standard decision-tree algorithm (i.e., CART) (Fig. 4). Methods like RF are preferred for feature quantification over more traditional methods, such as discriminant analysis, because of their nonparametric nature (e.g., non-Gaussian distributions and unequal class variances are not a problem). Other nonparametric methods, such as Learning Vector Quantization (LVQ) [9], produce results similar to those obtained with RF. However, the special features of RF, discussed below, make the method superior to LVQ when it is necessary to estimate the network properties (e.g., the relative importance of neurons) of a given data set.

RF improves on the standard decision-tree approach through the use of bootstrap aggregation [10], or "bagging" (Fig. 4). Bootstrapping is a statistical technique for replicating a data set with a distribution approximating the original: a bootstrap set is formed by drawing samples randomly, with replacement, from the original data set. A randomized decision tree is constructed for each bootstrap data set. Aggregation in RF takes place by averaging the response values over the randomized trees (for regression) or by determining the winning class from votes cast by the collection of trees (for classification). Whenever a bootstrap set is formed in Random Forest, the original set is randomly partitioned into two subsets: the sampling set (of predetermined size) and the "out-of-bag" set. The bootstrap set is drawn only from the sampling set, not from the out-of-bag set. Once the decision trees are constructed, the out-of-bag sets can be used to estimate the generalization error: each case in the original set is run through the decision trees for which it was out-of-bag, and the error rates over all out-of-bag samples are averaged to arrive at an overall estimate of classification success (a code sketch follows Fig. 4).

[Fig. 4. The Random Forest algorithm: bagging and out-of-bag error estimation.]
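A minimal sketch of feature quantification with bagging and out-of-bag error estimation, using scikit-learn's RandomForestClassifier in place of the author's own RF code; the synthetic data exist only to make the example self-contained.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 25))  # synthetic: 200 trials x 25 feature coefficients
    y = (X[:, 2] + 0.5 * rng.normal(size=200) > 0).astype(int)  # labels weakly tied to feature 2

    # each tree is grown on a bootstrap sample; the cases it never saw
    # ("out-of-bag") provide a built-in estimate of generalization error
    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(X, y)
    print(f"out-of-bag accuracy: {rf.oob_score_:.3f}")

Because the out-of-bag cases play the role of a test set, no data need be held out, which matters when trials are scarce.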
F. Variable Importance

The RF method assesses the importance of features extracted from an ensemble of simultaneously recorded neurons by comparing the performance of data sets with all features intact to that of out-of-bag samples in which a given feature is randomized across groups or over the response variable. That is, after each decision tree is constructed, the values of the i-th variable in the out-of-bag cases are randomly permuted. This effectively destroys any information conveyed by that feature. Each out-of-bag case is then run through the decision tree. This procedure is repeated for each tree, and the results are aggregated for each feature. Features with the largest effects on the success of classification or regression, gauged through several metrics (e.g., the Gini index), are thus identified (a minimal sketch of this permutation procedure follows Fig. 5). This entire process takes about 10 minutes for data sets that include 20-30 neurons, 2-8 classes of data, and 50-300 observations per class. By contrast, methods such as LVQ require several days to achieve similar results using so-called neuron-dropping analyses.

G. On-Line Implementation

The hardware for neuronal recording and control of the behavioral apparatus (described above) is linked to a cluster of workstations that consists of 10 dual-processor PCs running the Linux operating system (Fig. 5). One computer serves as the "head node" for the cluster and controls the execution of processes on eight of the computers through custom Python scripts and the C3 package (Cluster Command and Control, Oak Ridge National Laboratory). Another computer contains a single-board computer (Microstar Labs) that is used for A/D conversion of field potentials and for analog output control of microstimulation.
[Fig. 5. The on-line analysis cluster and its links to the recording and behavioral control systems.]
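Returning to section F, the sketch below illustrates permutation-based variable importance. One deliberate simplification: scikit-learn's permutation_importance permutes features on whatever data set it is given, rather than strictly on the out-of-bag cases described above, so it is close in spirit but not identical; the synthetic data are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))           # synthetic: one feature per "neuron"
    y = (X[:, 0] - X[:, 3] > 0).astype(int)  # only features 0 and 3 are informative

    rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)
    # permute one feature at a time and measure the drop in accuracy;
    # informative features produce the largest drops
    result = permutation_importance(rf, X, y, n_repeats=30, random_state=1)
    for i in np.argsort(result.importances_mean)[::-1][:4]:
        print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")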
III. CONCLUSION

A paradigm is described for evaluating neuronal codes that is based on well-established methods for statistical pattern recognition and a novel approach for real-time analyses of neurophysiological data.
Wavelet-based methods are used to define features that are embedded in spike trains. These methods greatly reduce the dimensionality of our data. Statistical pattern recognition methods are then used to quantify neuronal coding by estimating the information content of wavelet-based features on a trial-by-trial basis. These methods also allow us to interpret the network properties of putative neuronal codes. By running these analyses over a cluster of workstations that is tightly linked to our neuronal recording system, we are able to make on-line tests of the predictive relationships between neuronal activity and animal behavior. This approach leads to a new neurophysiology in which neuronal codes can be verified experimentally.

ACKNOWLEDGEMENT

Supported by DARPA N66001-02-C-8023 to ML.

REFERENCES

[1] M.A. Nicolelis and J.K. Chapin, "Controlling robots with the mind," Sci Am, Vol. 287, pp. 46-53, 2002.
[2] M. Laubach, J. Wessberg, and M.A. Nicolelis, "Cortical ensemble activity increasingly predicts behaviour outcomes during learning of a motor task," Nature, Vol. 405, pp. 567-571, 2000.
[3] M.A. Nicolelis et al., "Reconstructing the engram: simultaneous, multisite, many single neuron recordings," Neuron, Vol. 18, pp. 529-537, 1997.
[4] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, New York: Springer, 2001.
[5] B.J. Richmond and L.M. Optican, "Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. II. Quantification of response waveform," J Neurophysiol, Vol. 57, pp. 147-161, 1987.
[6] J.B. Buckheit and D.L. Donoho, "Improved linear discrimination using time-frequency dictionaries," Proc SPIE, Vol. 2569, pp. 540-551, 1995.
[7] N. Saito and R.R. Coifman, "On local orthonormal bases for classification and regression," Proc. 20th IEEE Int. Conf. on Acoustics, Speech and Signal Processing, May 1995, pp. 1529-1532.
[8] L. Breiman, "Random Forests," Machine Learning, Vol. 45, pp. 5-32, 2001.
[9] T. Kohonen, Self-Organizing Maps, 3rd edition, New York: Springer, 2000.
[10] L. Breiman, "Bagging Predictors," University of California, Berkeley, Statistics Department Technical Report No. 421, 1994.