Emergent Feature Sensitivity in Hardware and Software Models of Auditory Processing
M. Coath¹, S. Sheik², G. Indiveri², E. Chicca⁴, S.L. Denham³, T. Wennekers¹
¹ School of Computing and Mathematics, Plymouth University, UK. ² Institute of Neuroinformatics, University of Zurich and ETH, Switzerland. ³ School of Psychology, Plymouth University, UK. ⁴ University of Bielefeld, Germany.
Introduction We demonstrate the emergence of dynamic feature sensitivity through exposure to formative stimuli in simplified models of auditory processing. Results from analogue neuromorphic hardware allow us to address problems of real-time asynchronous communication in multi-chip systems, and the realization in hybrid VLSI technology of neural computational principles that, we propose, underlie plasticity in neural processing of dynamic stimuli. The availability of hardware on which such models can be implemented makes this a significant step towards the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems. By characterizing the performance in information-theoretic terms we provide a basis for the quantitative comparison of networks, connectivity patterns, and learning strategies, and also an objective basis for design decisions.
Stimuli representing the first two formants corresponding to spoken words were used and a vocabulary of seven words was chosen: and, of, yes, one, two, three, four. In each trial one of the ‘words’ was designated as the keyword and used as the exposure stimulus.
[Figure: SSI, FI and response plotted against θ (FM rate, ms−1); horizontal axis from −10 to 10]
Spike patterns representing two FM stimulus classes, linear and forked. The velocity is given as the number of channels activated per millisecond. Left: linear frequency sweep at 1.0 ms−1. Right: forked frequency sweep.
Network response after exposure to FM stimuli. Red dots: example tuning curve showing the mean rate of neurons in analogue VLSI hardware; shading indicates the rate S.D. Blue: Stimulus Specific Information (SSI). Green dashes: estimated Fisher Information. The SSI shows a profile comparable to in vivo measurements [8].
Keyword spotting Figures show the resultant connectivity patterns after training for three of the seven formant track stimuli.
Methods Network model Topology Each of the 32 A neurons in the network projects to a B1 and a B2 neuron via excitatory synapses, and the B1 neurons each project to one B2 neuron via inhibitory synapses. The B1 neurons project, via an intermediate population C, to B2 neurons in other locations. The C neurons implement the propagation delays, and the C to B2 delayed projections are mediated by excitatory plastic synapses which are the loci of the STDP.
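The topology described above can be written down as an explicit connection list. The following is a minimal sketch, not the authors' implementation: it assumes one C neuron per sub-unit, an excitatory B1 to C projection, and all-to-all lateral C to B2 projections, none of which are specified in the text. The function name and tuple layout are hypothetical.

```python
N_UNITS = 32  # number of A neurons, as stated in the text


def build_topology(n_units=N_UNITS):
    """Return (fixed, plastic) lists of (pre, post, kind) connections."""
    fixed = []
    plastic = []
    for i in range(n_units):
        # Each A neuron excites the B1 and B2 neurons of its own sub-unit.
        fixed.append((("A", i), ("B1", i), "excitatory"))
        fixed.append((("A", i), ("B2", i), "excitatory"))
        # Each B1 neuron inhibits one B2 neuron (assumed: its own sub-unit).
        fixed.append((("B1", i), ("B2", i), "inhibitory"))
        # Assumption: B1 drives a single delay neuron C in the same sub-unit.
        fixed.append((("B1", i), ("C", i), "excitatory"))
        # Delayed, plastic projections to B2 neurons at other locations
        # (assumption: every other location).
        for j in range(n_units):
            if j != i:
                plastic.append((("C", i), ("B2", j), "excitatory_stdp"))
    return fixed, plastic


fixed, plastic = build_topology()
```

Only the entries in `plastic` would change during exposure; the `fixed` list stays constant.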
[Figure: network diagram showing populations A, B1, B2 and C in three sub-units; legend: excitatory synapse, excitatory synapse with STDP]
A random ‘sentence’ used as a probe sequence in the software experiments. Each element is a smoothed and down-sampled formant track extracted from speech. The presentation rate shown is ≈ 4 stimuli per second, which is the rate used for the exposure phase.
Results Spike train distance and connectivity Since spikes produced by the A neurons are the ones that drive the membrane potential of the B2 neurons, the spike train distance D [6] between the A and B2 activity is tightly linked to the synaptic weight matrix of the B1 to B2 projections. Specifically, a small value of D implies a high probability of long-term potentiation in the corresponding synapse.
After exposure the pattern of connectivity in the network is different for each of the keywords in the set. These differences represent the spectrotemporal correlations in the stimuli. Also shown are the results of the keyword spotting trials, using a threshold on the B2 rate, for stimuli presented at the normal rate (100%, or four syllables per second) and at both compressed and expanded rates, showing robustness to temporal distortion.
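For readers unfamiliar with the distance in [6], the sketch below computes the squared van Rossum distance between two spike trains using the closed form for a causal exponential kernel. The time constant `tau` and its default value are assumptions; the text does not state the value used, nor whether D or its square is plotted.

```python
import math


def van_rossum_distance_sq(train_a, train_b, tau=10.0):
    """Squared van Rossum distance between two lists of spike times.

    Each spike is replaced by a causal exponential exp(-t / tau); the
    distance is the integral of the squared difference of the two
    filtered trains, divided by tau. For this kernel the integral
    reduces to sums over pairs of spike times.
    """
    def cross(x, y):
        return sum(math.exp(-abs(s - t) / tau) for s in x for t in y)

    return 0.5 * (cross(train_a, train_a)
                  + cross(train_b, train_b)
                  - 2.0 * cross(train_a, train_b))
```

With this normalization, adding a single spike to an otherwise empty train gives a squared distance of 1/2, and two identical trains give zero.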
[Network diagram legend: inhibitory synapse]
Neural network diagram, with one of the 32 sub-units highlighted and two neighboring sub-units in gray, included to indicate lateral connections.
Implementation. The hardware implementation consists of three multi-neuron 0.35 μm CMOS spiking chips and an Address Event Representation (AER) mapper [1] connected in a serial loop. The software model is implemented in custom C code. The synaptic update rule, implemented on the chip and in software, adjusts the synaptic weight, or efficacy, upon arrival of a pre-synaptic spike, depending on the instantaneous membrane potential and the internal state of the post-synaptic neuron [2, 3, 4].
The hardware network is based on custom PCBs which supply bias voltages to the chips. These biases can be configured via a USB interface connected to the PC workstation. The AER events are handled by dedicated FPGAs [5].
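A minimal sketch of an update of this kind is given below: on each pre-synaptic spike the weight steps up or down depending on whether the post-synaptic membrane potential is above a threshold, gated by a slow internal variable (called `calcium` here). The function name, thresholds, window bounds, and step size are all hypothetical and are not the values used on the chip or in the C code; any weight dynamics between spikes are omitted.

```python
def update_on_pre_spike(weight, v_mem, calcium, *,
                        v_thresh=0.8,            # hypothetical membrane threshold
                        up_window=(0.2, 1.0),    # hypothetical gate for potentiation
                        down_window=(0.2, 0.6),  # hypothetical gate for depression
                        step=0.1):               # hypothetical weight increment
    """Return the new weight after one pre-synaptic spike.

    v_mem   : instantaneous post-synaptic membrane potential
    calcium : slow internal state of the post-synaptic neuron
    """
    if v_mem > v_thresh and up_window[0] < calcium < up_window[1]:
        weight += step   # potentiate: post-synaptic neuron is depolarized
    elif v_mem <= v_thresh and down_window[0] < calcium < down_window[1]:
        weight -= step   # depress: post-synaptic neuron is not depolarized
    # Keep the weight inside an assumed [0, 1] range.
    return min(1.0, max(0.0, weight))
```

Because the update is triggered only by pre-synaptic spikes, no explicit record of post-synaptic spike times is needed in this sketch.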
Stimuli and Exposure Frequency modulated stimuli. For training the hardware network, spike inputs representing upward, downward and combined (‘forked’) stimuli were prepared offline. In the software network the same stimuli were presented as patterns of current injection. The network ‘learns’ in an exposure phase during which it is repeatedly exposed to the same exposure stimulus (ES). Keyword spotting. Formants in speech are peaks in the frequency response of sounds that identify vowels. In most cases the first two formants are enough to disambiguate a vowel. Simplified formant tracks were used to investigate keyword spotting in the software version of the model.
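As an illustration of how such FM stimuli might be prepared offline, the sketch below generates one spike per channel, with velocity expressed in channels per millisecond. It is not the authors' stimulus code: the function names are hypothetical, and the forked sweep is assumed to start at the middle of the channel array and diverge in both directions, which the text does not specify.

```python
def linear_sweep(velocity, n_channels=32):
    """Return a list of (time_ms, channel) events for a linear FM sweep.

    velocity is in channels per millisecond; a negative value gives a
    downward sweep starting from the highest channel.
    """
    if velocity == 0:
        raise ValueError("velocity must be non-zero")
    if velocity > 0:
        order = range(n_channels)
    else:
        order = range(n_channels - 1, -1, -1)
    dt = 1.0 / abs(velocity)  # time between successive channels, in ms
    return [(k * dt, ch) for k, ch in enumerate(order)]


def forked_sweep(speed, n_channels=32):
    """Return events for an upward and a downward branch starting mid-array."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    mid = n_channels // 2
    dt = 1.0 / speed
    events = []
    for k in range(mid):
        events.append((k * dt, mid + k))      # upward branch
        events.append((k * dt, mid - 1 - k))  # downward branch
    return events
```

For example, `linear_sweep(1.0)` activates channel 0 at 0 ms and channel 31 at 31 ms, while `linear_sweep(-5.0)` starts at channel 31 and steps down one channel every 0.2 ms.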
Left: Spike train distance D between population A and population B2 for a network exposed to a forked frequency sweep at 5.0 ms−1. Right: resulting synaptic weight, or connectivity, matrix (synapses per projection) from B1 to B2 neurons.
Emergent tuning We measured the total number of spikes from B2 neurons after exposure for different velocities of FM sweep; both linear sweeps and forked sweeps were used to measure the FM sweep tuning curve after training. The tuning curves show a Gaussian-like profile which peaks at the velocity of the ES in each case [7].
Measured FM sweep tuning curves after exposure to linear FM stimuli. Response of the network is quantified by the number of output spikes from the population of B2 neurons. These results are from hardware; comparable results were obtained in software.
Fisher Information and Stimulus Specific Information To compare networks of different sizes and topologies, and different learning rules in future experiments, and to investigate the robustness of the results to stimulus variation and noise, we need an objective measure of network performance, or acuity. We calculate the tuning curve, where the stimulus is assumed to be encoded in the response rate; the estimated Fisher Information, where it is the high-slope regions that are the most informative; and the SSI, which combines aspects of both [8].
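To make the remark about high-slope regions concrete, the sketch below estimates Fisher Information from a sampled tuning curve. It assumes Poisson spike-count variability, for which the Fisher Information is the squared slope of the tuning curve divided by the mean rate; the text does not say which estimator was actually used, and the Gaussian tuning curve in the example is synthetic, not measured data.

```python
import math


def fisher_information(thetas, rates):
    """Estimate Fisher Information at the interior points of a tuning curve.

    Assumes Poisson variability: FI(theta) = f'(theta)**2 / f(theta),
    with the slope taken by central differences. Points with zero rate
    are assigned 0.0 by convention.
    """
    fi = []
    for k in range(1, len(thetas) - 1):
        slope = (rates[k + 1] - rates[k - 1]) / (thetas[k + 1] - thetas[k - 1])
        fi.append(slope * slope / rates[k] if rates[k] > 0 else 0.0)
    return fi


# Synthetic Gaussian-like tuning curve peaking at an FM rate of 5 per ms.
thetas = [float(t) for t in range(-10, 11)]
rates = [1.0 + 50.0 * math.exp(-((t - 5.0) ** 2) / (2 * 3.0 ** 2)) for t in thetas]
fi = fisher_information(thetas, rates)
```

In this example the estimate vanishes at the peak of the tuning curve, where the slope is zero, and is largest on the flanks, which is the sense in which the tuning curve and the Fisher Information emphasize different stimulus values.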
The error is the number of non-keywords that were classified as keywords. Each point shows the mean error over 10 trials each consisting of 140 stimuli. Each series represents a different presentation rate, given as a percentage of the exposure rate, during the probe phase. Standard error bars are small and hidden by the markers.
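The error measure described above can be sketched as a count of false positives under a rate threshold. The function name, argument layout, and threshold value are hypothetical; the text states only that a threshold on the B2 rate was used.

```python
def false_positives(labels, b2_rates, keyword, threshold):
    """Count non-keyword stimuli whose B2 rate exceeds the threshold.

    labels   : the word presented on each stimulus in the probe sequence
    b2_rates : the B2 population response to each stimulus
    """
    return sum(1 for label, rate in zip(labels, b2_rates)
               if label != keyword and rate > threshold)


# Illustrative values only, not measured data.
labels = ["yes", "one", "two", "yes", "four"]
b2_rates = [42.0, 5.0, 31.0, 39.0, 2.0]
errors = false_positives(labels, b2_rates, keyword="yes", threshold=30.0)
```

Here the only error is the response to "two", which crosses the threshold although it is not the keyword.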
Discussion
• Neuromorphic auditory models to date have mainly focused on distinguishing static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones.
• The spiking network is able to adapt and selectively respond to the dynamic spectro-temporal features of stimuli. After exposure, the network can be used to distinguish similar stimuli that have parametrically different dynamic properties, e.g. direction or speed of FM sweep.
• The design makes no hardware provision for the implementation of delays. Instead, the network exploits the variation and mismatch of the components in the analog VLSI devices used in the setup.
• We have gone beyond the demonstration of emergent sensitivity to a stimulus parameter by quantifying the increase in acuity in information-theoretic terms. This will provide a basis for the quantitative comparison of networks, connectivity patterns, and learning strategies.
References
[1] Fasnacht, D. and Indiveri, G., CISS 2011 (2011) 1.
[2] Fusi, S., Rev Neurosci 14 (2003) 73.
[3] Brader, J. M., Senn, W., and Fusi, S., Neural Comput 19 (2007) 2881.
[4] Shouval, H. Z., Bear, M. F., and Cooper, L. N., Proc Natl Acad Sci USA 99 (2002) 10831.
[5] Fasnacht, D., Whatley, A., and Indiveri, G., ISCAS 2008 (2008) 648.
[6] van Rossum, M. C., Neural Comput 13 (2001) 751.
[7] Sheik, S. et al., Frontiers in Neuromorphic Engineering (2012).
[8] Butts, D. A. and Goldman, M. S., PLoS Biol 4 (2006) e92.
Acknowledgements: This work is funded by EU FP7 grants 231168-SCANDLE and 257219-neuroP, and by the Excellence Cluster 277 (CITEC, Bielefeld University).