Visual saliency on networks of neurosynaptic cores

A. Andreopoulos, B. Taba, A. S. Cassidy, R. Alvarez-Icaza, M. D. Flickner, W. P. Risk, A. Amir, P. A. Merolla, J. V. Arthur, D. J. Berg, J. A. Kusnitz, P. Datta, S. K. Esser, R. Appuswamy, D. R. Barch, D. S. Modha

Identifying interesting or salient regions in an image plays an important role for multimedia search, object tracking, active vision, segmentation, and classification. Existing saliency extraction algorithms are implemented using the conventional von Neumann computational model. We propose a bottom-up model of visual saliency, inspired by the primate visual cortex, which is compatible with TrueNorth, a low-power, brain-inspired neuromorphic substrate that runs large-scale spiking neural networks in real time. Our model uses color, motion, luminance, and shape to identify salient regions in video sequences. For a three-color-channel video with 240 × 136 pixels per frame and 30 frames per second, we demonstrate a model utilizing 3 million neurons, which achieves competitive detection performance on a publicly available dataset while consuming 200 mW.

Digital Object Identifier: 10.1147/JRD.2015.2400251

Introduction

From an engineering perspective, building a portable, energy-efficient, and real-time vision system that can identify interesting or salient regions of an image is challenging, because of the inherent ambiguity and intractability of the vision problem [1–10]. Biological visual systems, on the other hand, have evolved specialized attentional neural circuitry that is able to quickly locate and discriminate visual events important for an organism's survival [11]. Remarkably, the biological neural networks that underlie this complex visual task are extremely energy-efficient (the entire human brain consumes only 10 W), perform well in a wide range of conditions (low light, moving and cluttered environments), and react quickly (complex identification occurs in a few hundred milliseconds). It has been proposed that biological visual systems exploit the fact that salient objects in a visual image differ statistically from their background, the so-called "pop-out" effect that automatically draws an observer's attention. Based on Treisman and Gelade's Feature Integration Theory [12], Koch and Ullman [13] hypothesized a purely stimulus-driven model of primate visual attention selection. The basic elements of this hypothesis are



1) a massively parallel computation of separate feature maps (orientation, color, motion, etc.); 2) the merging of the separate feature maps into a single topographic saliency map encoding the relative locations of salient regions; and 3) a winner-take-all mechanism enabling the serial selection of conspicuous image locations. Itti et al. [14] presented an algorithmic implementation building upon [13] that has been used for diverse tasks such as predicting eye-movement fixation patterns in primates [15], target detection [14], and video compression [16]. Judd et al. [17] used a database of eye-tracking experiments to build a supervised learning model of saliency that combines bottom-up cues (characterized by little or no high-level direction) and top-down image semantics (satisfying specific goals and targets). Drawing upon this rich and diverse literature on saliency algorithms, our main contribution is a spiking, real-time, bottom-up visual saliency model implemented on a brain-inspired processor, TrueNorth [18, 19], enabling real-world vision applications (Figures 1 and 2).

Hardware and software substrates

We begin with an overview of the TrueNorth hardware, followed by an introduction to the software environment used for application development. This material is a necessary prerequisite for describing the implementation of the saliency model.

©Copyright 2015 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied by any means or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor. 0018-8646/15 © 2015 IBM


Figure 1 A TrueNorth chip consists of 4,096 interconnected cores. Each core receives its input spikes through 256 input axons and has 256 output neurons. A network of nine such cores is shown in the bottom-left inset.

The TrueNorth architecture

TrueNorth is a low-power, brain-inspired, digital chip architecture [18, 19] with one million spiking neurons and 256 million synapses organized in 4,096 neurosynaptic cores. TrueNorth is implemented in a 28-nm silicon process and has 5.4 billion transistors (Figure 1). The cores are interconnected by a two-dimensional on-chip mesh network, and multiple TrueNorth chips can be seamlessly interconnected by grid tiling. Each core consists of 256 input axons $i \in \{0, \dots, 255\}$ and 256 output neurons $j \in \{0, \dots, 255\}$, interconnected by programmable binary synapses $W_{i,j}$, implemented as a 256 × 256 binary crossbar (Figure 3). To implement weighted synapses, each axon $i$ has a type index $G_i \in \{0, 1, 2, 3\}$, and each neuron $j$ assigns a 9-bit signed weight, $S_j^{G_i}$, to axon type $G_i$. Thus, the effective weight from axon $i$ to neuron $j$ is $W_{i,j} \cdot S_j^{G_i}$. Information is communicated via spikes, generated by neurons and sent to axon inputs via the on-chip/off-chip interconnection network. A spike is a packet with a target delivery time, encoding the value 1; the absence of a spike implicitly encodes the value 0.


An axon receiving a spike transfers it to each neuron it is connected to via the binary synaptic crossbar. Spikes may be used to represent values using the rate, time, and/or place at which spikes are created (Figure 4). A core's operation is driven by a global clock with a nominal 1-ms tick, during which all spikes are delivered to their destinations. The computation performed by neuron $j$ at tick $t$ is defined by the neuron equation, described in detail in [20]. It is an extension of the leaky integrate-and-fire neuron model and comprises five operations executed in sequence:

1. Synaptic integration: The neuron state, or membrane potential, $V_j(t)$, is the sum of its value $V_j(t-1)$ at the previous tick and the weighted sum of the input spikes $A_i(t)$ that arrive at the neuron from up to 256 input axons $i$, using the neuron's weight $S_j^{G_i}$ associated with each axon's type $G_i$:

$$V_j(t) = V_j(t-1) + \sum_i A_i(t) \cdot W_{i,j} \cdot S_j^{G_i}$$


Figure 2 An overview of the basic saliency system's corelet architecture (top), along with example inputs and outputs (bottom). Each subcomponent is described in the paper.

2. Leak integration: The membrane potential $V_j(t)$ is incremented (or decremented) by a signed leak $\lambda_j$, acting as a constant bias on the neuron dynamics.

3. Threshold evaluation: After synaptic and leak integration, $V_j(t)$ is compared with a threshold $\alpha_j$.

4. Spike firing: If $V_j(t) \ge \alpha_j$, the neuron "fires," injecting a spike into the network bound for its destination axon. If $V_j(t)$ does not reach its threshold, no spike is produced.

5. Reset: If a spike is fired, the neuron resets $V_j(t)$ to a configurable reset value.

In addition to this basic operation mode, the neuron also supports several leak modes, as well as stochastic modes for synapses, leak, thresholds, and more. This neuron model has been demonstrated to implement a wide range of arithmetic, logical, and stochastic operations, and to emulate the 20 Izhikevich models of biological neurons [20]. A neuron's computational behavior is configured by setting its 23 parameters (synaptic weights, leak, thresholds, stochastic operation, etc.). Various neuron configurations were used to implement a number of different algorithms and applications [21].
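To make these steps concrete, the following minimal Python sketch simulates one tick of a simplified neuron of this kind. It follows the five operations above but omits the stochastic and alternate leak/reset modes of the full neuron model [20]; all parameter names and values are our own illustrative choices, not TrueNorth's actual configuration format.

```python
import numpy as np

def neuron_tick(V, spikes, W_col, S, G, leak, threshold, V_reset):
    """One tick of a simplified TrueNorth-style neuron (illustrative).

    V         -- membrane potential V_j(t-1) from the previous tick
    spikes    -- binary vector A_i(t) over the input axons
    W_col     -- binary crossbar column W_{i,j} for this neuron
    S         -- 4-entry signed weight table S_j^{G_i}, indexed by axon type
    G         -- axon type G_i in {0,1,2,3} for each axon
    """
    # 1. Synaptic integration: V += sum_i A_i(t) * W_{i,j} * S_j^{G_i}.
    V = V + np.sum(spikes * W_col * S[G])
    # 2. Leak integration: a constant signed bias on the dynamics.
    V = V + leak
    # 3. Threshold evaluation and 4. spike firing.
    fired = V >= threshold
    # 5. Reset to a configurable value if a spike was fired.
    if fired:
        V = V_reset
    return V, bool(fired)

# Tiny usage example with 8 axons instead of 256.
rng = np.random.default_rng(0)
G = rng.integers(0, 4, size=8)        # axon types
S = np.array([2, -1, 3, 0])           # per-type signed weights
W_col = rng.integers(0, 2, size=8)    # one crossbar column
A = rng.integers(0, 2, size=8)        # input spikes at this tick
V, fired = neuron_tick(0, A, W_col, S, G, leak=-1, threshold=4, V_reset=0)
```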


In this paper, we use several neuron configurations/types to implement the needed functionality. Each neuron can output up to one spike per tick; with a tick frequency of 1 kHz, each neuron can therefore output between 0 and 1,000 spikes per second, sent to an axon located either on the same core or on a different core. An axon may receive its input from more than one neuron (referred to as a bus-OR), such that if more than one spike arrives at an axon in the same tick, these spikes are merged into a single spike (a logical OR operation). Furthermore, each neuron has an associated delay value between 1 and 15, which is the number of ticks from the time a spike is generated by the neuron until the time the spike is consumed by the target axon. At the time of consumption, the axon distributes the spike to up to 256 neurons on the core, according to the programmed crossbar connectivity.

The corelet programming paradigm

This section provides a brief overview of the corelet programming paradigm [22] for building TrueNorth applications. This new paradigm supports the different thinking programmers must acquire in their migration from implementing sequential algorithms on von Neumann architectures toward programming brain-inspired, neurosynaptic architectures.


Figure 3 The left inset shows a TrueNorth core's crossbar, with input spikes entering via the axons at the left and spikes exiting via neurons at the bottom. Black circles at crossbar intersections denote synaptic connectivity. PRNG stands for "pseudo-random number generator." The right insets give an example of two periodically spiking neurons that send their periodic spikes to a four-way splitter, which is responsible for making four copies of each bus-OR merged input spike (see Table 1 for the actual neuron parameters). Such splitters and synchronization neurons are fundamental building blocks of the saliency system. The six neurons are indexed as six distinct neuron types A–F.


Figure 4 The value 7 encoded by three different spike-coding schemes, all using a 16-tick time window. In rate coding, the information is encoded by the number of spikes occurring over the time window. In burst coding, the spikes encoding the value are sequential and commence at the beginning of the time window. In time-to-spike coding, the spike’s time of occurrence in ticks within the time window denotes the value.
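For illustration, the sketch below encodes an integer under the three schemes of Figure 4; the helper names are ours and not part of any TrueNorth tooling.

```python
import random

WINDOW = 16  # ticks per coding window, as in Figure 4

def rate_code(value):
    """Scatter `value` spikes across the window; only the count matters."""
    ticks = [0] * WINDOW
    for t in random.sample(range(WINDOW), value):
        ticks[t] = 1
    return ticks

def burst_code(value):
    """`value` consecutive spikes starting at the first tick."""
    return [1 if t < value else 0 for t in range(WINDOW)]

def time_to_spike_code(value):
    """A single spike whose tick index encodes the value."""
    return [1 if t == value else 0 for t in range(WINDOW)]

print(rate_code(7))           # seven spikes at random ticks
print(burst_code(7))          # seven spikes at ticks 0..6
print(time_to_spike_code(7))  # one spike at tick 7
```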



The corelet programming paradigm offers:

1. An abstraction for a TrueNorth program, named a corelet, which is an object-oriented paradigm for creating and hierarchically composing networks [22], conceptually akin to compositional modularity [23].

2. A library of reusable corelets suitable for combining into larger, complex functional networks.

3. An end-to-end corelet programming environment (CPE) that integrates seamlessly with the TrueNorth simulator, Compass. Compass is a highly scalable, parallel, spike-for-spike equivalent simulator of the TrueNorth functional blueprint, which runs on Linux**, OS X**, Windows**, and IBM Blue Gene* supercomputers. Compass has been tested with networks containing more than 2 billion neurosynaptic cores, 500 billion neurons, and more than $10^{14}$ synapses [24, 25].

A TrueNorth program is a complete specification of a network of neurosynaptic cores, along with a specification of its input axons and output neurons. A corelet is an abstraction that represents a parameterized TrueNorth program, exposing only its external inputs and outputs while masking all other details of the neurosynaptic cores and their connectivity and configuration.


The internal network connectivity behind these external inputs and outputs is hidden from the corelet user through lookup tables (referred to as input connectors and output connectors, respectively). By specifying values for the corelet parameters, a TrueNorth program instantiation is generated that can specify the TrueNorth processor's behavior. The saliency system introduced in this paper is a corelet that consists of multiple subcorelets, each of which in turn consists of several subcorelets. Some of the more prominent subcorelets used within the saliency system are: convolution, Gaussian pyramid, image gradient, rate-to-burst conversion, local averaging, center-surround filter, weighted feature merge, image upsampling, splitter, delay, motion detection, and motion history corelets.
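The corelet programming environment itself is not reproduced here; purely to illustrate the compositional idea of connectors hiding internal structure, the following hypothetical Python analogue sketches how a corelet exposes only its input and output connectors.

```python
class Corelet:
    """Hypothetical analogue of a corelet: only input and output
    connectors are exposed; the internal network of neurosynaptic
    cores stays hidden from the user of the corelet."""

    def __init__(self, name, n_inputs, n_outputs):
        self.name = name
        self.input_connector = [f"{name}/in{i}" for i in range(n_inputs)]
        self.output_connector = [f"{name}/out{i}" for i in range(n_outputs)]
        self._internal_cores = []  # hidden implementation detail

    def connect(self, other):
        """Wire this corelet's output connector to another's input."""
        assert len(self.output_connector) == len(other.input_connector)
        return list(zip(self.output_connector, other.input_connector))

# Hierarchical composition: two subcorelets of a larger saliency corelet.
pyramid = Corelet("gaussian_pyramid", n_inputs=240 * 136, n_outputs=120 * 68)
motion = Corelet("motion_detection", n_inputs=120 * 68, n_outputs=120 * 68)
wiring = pyramid.connect(motion)
```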

Methods

Figure 2 illustrates the block-level organization of the components that make up the saliency system. The input consists of sequences of video frames, where each frame typically consists of three channels representing luminance and chromaticity. A spike representation of each pixel's intensity and color in each frame is provided as input to the system. The saliency system's pipeline first calculates a Gaussian pyramid [26] for each channel, which enables the processing of each input frame at multiple scales. The luminance channel is sent to the motion detection block, which detects image pixels where a significant change in luminance occurs over a short period of time; it is realized as a multiscale frame-difference algorithm combined with a stochastic motion history neuron that retains the recent frame-difference history. The user has the flexibility of specifying the actual separation of the frame pairs whose difference is calculated, as well as the spatial scales used. The Gaussian pyramid outputs for all three channels are also sent to the center-surround detector [27]. The center-surround detector responds strongly to small regions that are significantly different from their surroundings, and is realized as a 15 × 15 center-on, surround-off filter applied to the multiscale edge maps. These edge maps are extracted from the output of the Gaussian pyramid by a finite-differences algorithm. Taken together, this Gaussian pyramid with edge detection is akin to applying a difference-of-Gaussians to the input image. Subsequently, a spatial rescaling step is applied, involving upsampling and downsampling blocks that bring all the motion detection and center-surround outputs to a common resolution. Afterwards, a smoothing block is applied to the registered images; it suppresses speckle noise and enhances the centers of salient pixel patches. The final step applies a cascade of nonlinear weighted local maxima blocks, combined with weighted averaging, to produce the final saliency map.
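Read as conventional image processing, the pipeline composes as in the following schematic Python sketch. Every stage here is a crude placeholder standing in for the spiking corelets described in the following sections; only the composition structure mirrors Figure 2.

```python
import numpy as np

# Crude placeholders for the blocks of Figure 2; each function below
# stands in for a spiking corelet described later in the paper.
def gaussian_pyramid(img, levels=3):
    return [img[::2 ** i, ::2 ** i] for i in range(levels)]   # placeholder

def frame_difference(cur, prev, t_m=0.5):
    return (np.abs(cur - prev) >= t_m).astype(float)

def center_surround(img):
    return np.maximum(img - img.mean(), 0.0)                  # placeholder

def to_common_resolution(maps, shape):
    return [np.resize(m, shape) for m in maps]                # placeholder

def saliency(cur, prev):
    """Pyramid -> motion + center-surround -> rescale -> merge."""
    motion = [frame_difference(c, p) for c, p in
              zip(gaussian_pyramid(cur), gaussian_pyramid(prev))]
    cs = [center_surround(layer) for layer in gaussian_pyramid(cur)]
    maps = to_common_resolution(motion + cs, cur.shape)
    return np.mean(np.stack(maps), axis=0)

s = saliency(np.random.rand(136, 240), np.random.rand(136, 240))
```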


The next section explains how to map these conceptual blocks onto corelets for the TrueNorth architecture.

Gaussian pyramid

Input considerations: Source of input spikes
If this is a standalone corelet, the spikes arrive from the external video transduction mechanism that represents each channel of each input frame as a set of spikes. If this is a subcorelet of a larger system, the corelet that generates these input spikes must also output a spike-based representation of each frame in the video sequence.

Input considerations: Input spike format
The input consists of sequences of video frames entering at 30 frames per second, i.e., one frame every 33 ticks. Each frame typically consists of three channels, some (typically one) representing luminance and the rest chromaticity. For each channel, the pixel values are uniformly quantized into 4 bits and then rate-coded using 0 to 15 spikes distributed across the first 15 ticks of the 33-tick window (see Figure 4).

Algorithm: Functional description
The algorithm calculates the Gaussian pyramid representation of each input frame, which provides a multiscale representation of that frame. A cascade of Gaussian-filter-based convolutions followed by subsampling is applied, yielding the desired pyramidal representation of the input.

Algorithm: Corelet implementation
As shown in Figure 2, the first layer of the saliency pipeline consists of a corelet that extracts three-layer Gaussian pyramids, one pyramid per input channel. The resolutions of the three layers are 240 × 136, 120 × 68, and 60 × 34 pixels (width × height). Each downsampled pyramid layer applies a 3 × 3 convolution with a discrete Gaussian kernel to the previous layer's output and downsamples the resultant image by discarding the output of every second horizontal and vertical pixel. As shown in Figures 5 and 6, the kernel consists of three distinct non-negative integers r, s, and t. Each layer outputs a 4-bit rate-coded representation of the result. Figures 5 and 6 and Table 1 show the neuron parameters necessary to implement this convolution. In order to make efficient use of the TrueNorth architecture, each core receives input from up to 256 pixels, with one axon used per input pixel. Each input pixel/axon affects the output of multiple neurons, due to the overlapping 3 × 3 window of the Gaussian convolution. As a result, it becomes necessary to use three different neuron types per core (see Figures 5 and 6) and three axon types.
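In conventional array terms, each pyramid layer reduces to a 3 × 3 convolution followed by decimation. Below is a minimal NumPy sketch with an illustrative kernel built from the integers r, s, and t, normalized by r + 4s + 4t as the weighted-average neurons effectively do; the specific values are assumptions, not the Table 1 parameters.

```python
import numpy as np

def gaussian_pyramid(frame, levels=3, r=4, s=2, t=1):
    """3x3 Gaussian smoothing followed by dropping every second
    row/column; r, s, t are the integer kernel entries (center,
    edge, corner) and their values here are illustrative."""
    kernel = np.array([[t, s, t],
                       [s, r, s],
                       [t, s, t]], dtype=float) / (r + 4 * s + 4 * t)
    layers = [frame.astype(float)]
    for _ in range(levels - 1):
        prev = layers[-1]
        img = np.pad(prev, 1, mode="edge")
        smoothed = sum(kernel[dy, dx] *
                       img[dy:dy + prev.shape[0], dx:dx + prev.shape[1]]
                       for dy in range(3) for dx in range(3))
        layers.append(smoothed[::2, ::2])  # discard every second pixel
    return layers

layers = gaussian_pyramid(np.random.rand(136, 240))
print([l.shape for l in layers])  # (136, 240), (68, 120), (34, 60)
```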


Figure 5 Gaussian convolutions and weighted average neuron. Shown here are the neuron parameters used to calculate the 3 × 3 Gaussian convolution for each block of pixels mapped to a core, using neuron types A–E (also see Table 1). Note that the letters a, b, c, and d in square boxes denote axon types. See Figure 6 for related information.

Figure 6 Kernel and gradient. The left subfigure shows the modeling of a 5 × 5 Gaussian kernel as a cascade of two 3 × 3 kernels, where the second 3 × 3 kernel is applied over four distinct grates. The right subfigure shows the edge extraction network, with the neuron types A–D specified in Table 1. See Figure 5 for related information.

The fourth axon type of each neuron has a negative weight and is used to reset the membrane potential every frame. Note that neurons of type D calculate the weighted average value of the eight border pixels of each 3 × 3 kernel; then, using a copy of the central pixel value (neuron E), a weighted average neuron C calculates the weighted average of the two inputs, which provides the desired result.


Splitters are used to replicate any input pixels that are used at the boundaries by more than one core. Each of these neurons also uses rounding-bias axons that increment the membrane potential by a constant value at the start of each period, so that the normalized output value lies closer to the rounded result after division by $r + 4s + 4t$ or $4s + 4t$.


Table 1 The assignments for the neuron parameters of the networks discussed in this paper. These neuron parameters define the dynamic behavior of the neuron according to the neuron equation [20] with each new tick of the simulation and with each new set of input spikes entering the neuron at the corresponding tick. Variables $r$, $s$, $t$, $t_c$, $t_m$, and $k$ are described in the "Methods" section and in Figures 3 and 5 through 8. The function fl(.) stands for the floor operator.

Synchronization neurons (see Figure 3) are used to send periodic reset and rounding pulses to the control axons at the appropriate ticks within each frame. While only 3 × 3 kernels were used in this instantiation of the Gaussian pyramid, the corelet can also use kernels with larger support regions, as shown in Figures 5 and 6 for 5 × 5 kernels. The idea is to decompose an image into multiple sub-images by uniformly sampling the original image every $f = 2$ pixels horizontally and vertically, creating $f^2$ sub-images ($f$ is referred to as the grating parameter), and to apply a 3 × 3 convolution to each of these sub-images. In other words, the idea is to first apply a standard 3 × 3 Gaussian convolution, and then successively apply more convolutions as needed, with larger but sparser kernels (see Figures 5 and 6), so that the result approximates the desired kernel.
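Here is a short sketch of the grating decomposition itself, assuming f = 2; applying a dense 3 × 3 convolution to each grate then corresponds to a sparser 5 × 5 kernel on the original image. The helper names are ours.

```python
import numpy as np

def grate(img, f=2):
    """Split an image into f*f interleaved sub-images (grates) by
    sampling every f-th pixel horizontally and vertically."""
    return [img[dy::f, dx::f] for dy in range(f) for dx in range(f)]

def ungrate(grates, shape, f=2):
    """Reassemble the interleaved sub-images into the original layout."""
    out = np.empty(shape)
    i = 0
    for dy in range(f):
        for dx in range(f):
            out[dy::f, dx::f] = grates[i]
            i += 1
    return out

img = np.arange(16.0).reshape(4, 4)
subs = grate(img)                                  # four 2x2 sub-images
assert np.array_equal(ungrate(subs, img.shape), img)
```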


Output considerations: Output spike format
For each channel, output connectors are provided that represent the output at the 240 × 136, 120 × 68, and 60 × 34 pixel resolutions of the respective channel.

Output considerations: Destination of output spikes
For each output resolution of the luminance channel, one copy of the spikes is sent to a motion detection corelet, and another copy is sent to an edge extraction corelet, which is in turn used as part of the center-surround operation. Each unique output resolution of the non-luminance channels is sent to an edge extraction corelet.

Rate-to-burst conversion and edge extraction

Input considerations: Source of input spikes
The input spikes represent the result of a Gaussian pyramid layer.


Input considerations: Input spike format
The input to the edge extraction network must be burst coded (see Figure 4). If only rate-coded input is available, a rate-to-burst conversion corelet can be used first to convert the rate code to a burst code. Tests show that if a rate-coded representation is provided directly to the edge extractor, system performance is not significantly affected; the conversion mechanism can therefore be removed to decrease the system's neuron count.

Algorithm: Functional description
The absolute gradient network is shown in Figure 6. The network takes a batch of pixels as input, and for every pixel $(x, y)$ in this batch, it uses the burst representation of the pixel intensity values at coordinates $\{(x-1, y), (x, y), (x, y-1)\}$ to calculate the horizontal and vertical finite differences, thus approximating the partial derivatives of the input intensity image.

Algorithm: Corelet implementation
The output of the Gaussian pyramid network is rate-coded but not necessarily burst coded. This suggests the inclusion of an optional rate-to-burst conversion network that converts the output of the Gaussian pyramid before it is used by the edge extraction network, in order to improve performance. In other words, we need a mechanism that takes as input a sequence of potentially random spikes and outputs the same number of concatenated/juxtaposed spikes, since this is the input that the edge extraction mechanism expects. The idea is simple: by using a linear neuron with a positive threshold of 1 and a negative threshold of $-255$, initialized to a membrane potential of $-255$, we are guaranteed that the neuron will store any input spikes without producing any output spikes, so long as the total number of input spikes does not exceed 255. By using a control pulse to increase the membrane potential by 255 once all the output spikes of the corresponding Gaussian pyramid neuron have entered the conversion neuron's axons, we can force the neuron to output a sequence of juxtaposed spikes (a burst code) of length equal to the number of input spikes. After all the spikes have exited the neuron, a reset pulse along a synapse with weight $-255$ resets the membrane potential to $-255$, at which point the process repeats for the next frame. The edge-extraction corelet (Figure 6) can then process this burst-coded input. By merging the first two output neurons of the finite-differences network in Figure 6 using a single bus-OR axon, and also merging the last two neurons using another bus-OR axon, we are guaranteed that the two destination axons will receive a number of spikes equal to the absolute values of the horizontal and vertical finite differences, respectively.
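The conversion neuron's bookkeeping can be traced in a few lines. Below is a minimal sketch under the stated parameters (membrane initialized to -255, positive threshold 1, linear reset); the control-pulse tick and window length follow this system's frame format, but the scheduling details are simplified.

```python
def rate_to_burst(input_spikes, window=33):
    """Membrane starts at -255; positive threshold 1; linear reset
    subtracts the threshold on every output spike. input_spikes is a
    0/1 list covering the first 15 ticks of one frame window."""
    V, out = -255, []
    for t in range(window):
        V += input_spikes[t] if t < len(input_spikes) else 0
        if t == 15:              # control pulse: all inputs have arrived
            V += 255
        if V >= 1:               # fire and linearly reset
            out.append(1)
            V -= 1
        else:
            out.append(0)
    V = -255                     # reset pulse restores the initial state
    return out

# Seven scattered input spikes become seven consecutive output spikes.
spikes = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
burst = rate_to_burst(spikes)
print(sum(burst), burst.index(1))  # 7 spikes, starting right after tick 15
```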


Output considerations: Output spike format
The four output neurons provide a number of spikes equal to the positive and negative values of the horizontal and vertical finite differences. As a result, for any frame at most two of these neurons output spikes.

Output considerations: Destination of output spikes
As previously described, each horizontal and vertical pair of neurons can be bus-OR merged via an input axon of the local averaging corelet, providing as input to the local averaging module the absolute value of the horizontal or vertical finite differences.

Motion detection

Input considerations: Source of input spikes
The input spikes correspond to the output of a Gaussian pyramid layer, typically that of the luminance channel.

Input considerations: Input spike format
The input consists of sequences of video frames entering at 30 frames per second, i.e., one frame every 33 ticks. The pixel values may be rate-coded using 0 to 15 spikes distributed across the first 15 ticks of the 33-tick window, or burst coded.

Algorithm: Functional description
As shown in the saliency system's architecture in Figure 2, the Gaussian pyramid output is sent to a motion detection module (Figure 7). The network subtracts the current frame from the k-th previous frame by calculating the absolute value of the pixel intensity differences at each scale. If this value is at least equal to a user-specified threshold $t_m$, a single output spike is created for that pair of pixels. This spike is in turn sent to a motion history neuron, which uses the stochastic threshold parameters of the TrueNorth neuron [20] to integrate the current input spike with previous spikes and fire a spike with probability proportional to the neuron's current membrane potential. Notice that a negative leak is associated with the motion history neuron; as a result, in the absence of new inputs, its firing probability decreases with time. In other words, the motion history neuron fires with a time-weighted probability that increases as it receives more spikes over a more recent time window.

Algorithm: Corelet implementation
Every input pixel goes through a splitter corelet that creates two copies of the input pixel intensity's rate-coded representation. One copy goes through a number of delay neurons so that the total extra delay is 33 ticks (i.e., a $k = 1$ frame delay). This delayed frame and the current frame then enter the main motion detection network's axon inputs (see Figure 7). These two rate-coded inputs are subtracted from one another using two neurons H and I. One neuron's membrane potential is increased in proportion to the positive difference of the inputs, and the other neuron's membrane potential is increased in proportion to the negative difference of the inputs.


Figure 7 Motion detection network. See Figure 8 for a blocked center-surround network. See Table 1 for the neuron parameter values.

A synchronization neuron E is then responsible for sending a periodic probe signal at the appropriate tick, after all the input spikes have entered the axons, so that the two neurons output a total of at most one spike for the current frame if and only if the absolute value of the difference between the two input rate codes is at least equal to the threshold $t_m$. These two neurons are merged using a bus-OR, which causes a neuron J to output a single spike if and only if the absolute value of the difference between the two inputs is at least equal to the desired threshold. Right after the probing control spike enters, a reset spike is sent from neuron F, which resets the membrane potentials of neurons H and I in preparation for the next frame. Notice that there is also a suppression neuron G that is responsible for suppressing any possible outputs at the beginning of the simulation; this is important until the first delayed frame enters the input axon, to ensure that no unwarranted output spikes are created. Notice also that we only need one periodic probe, reset, and suppress neuron per core, since the corresponding control spikes can be shared by all neurons in the crossbar. The output of neuron J is then sent to the motion history neuron K of Figure 7. As previously described, this is a leaky neuron that fires with a time-weighted probabilistic threshold proportional to the number of spikes that entered the neuron over a finite time window. By appropriately controlling the neuron parameters, it is possible, for example, to maintain a "motion tail" of the recent paths traversed by a moving target.
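Functionally, the difference neurons plus the motion-history neuron behave like the following sketch, with rates given as integer spike counts per frame. The stochastic threshold is emulated by a uniform random draw, and the leak and gain constants are illustrative assumptions, not the actual Table 1 parameters.

```python
import random

def motion_step(rate_now, rate_prev, V_hist, t_m=3, leak=-1, V_max=32):
    """One frame of the motion pipeline (illustrative constants).

    Neurons H and I accumulate the positive and negative differences of
    the two rate codes; the bus-OR of their outputs yields one motion
    spike iff |rate_now - rate_prev| >= t_m. The motion-history neuron K
    integrates motion spikes, decays via the negative leak, and fires
    stochastically in proportion to its membrane potential."""
    motion = 1 if abs(rate_now - rate_prev) >= t_m else 0
    V_hist = max(0, min(V_max, V_hist + 8 * motion + leak))
    history = 1 if random.random() < V_hist / V_max else 0
    return motion, history, V_hist

V = 0
for frame, (now, prev) in enumerate([(9, 2), (9, 8), (2, 2), (2, 2)]):
    m, h, V = motion_step(now, prev, V)
    print(frame, m, h, V)  # firing probability decays after motion stops
```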


Output considerations: Output spike format
The output of stochastic neuron K is binarized so that a single spike is sent to the next corelet if and only if the stochastic neuron outputs at least a certain number of spikes during the frame.

Output considerations: Destination of output spikes
The binarized output spikes of each scale of the motion detection corelet are sent to a "normalization" subcorelet that uses nearest-neighbor interpolation to resample the corresponding map to a common resolution. This in effect registers the images to a common coordinate frame.

Local averaging and center surround

Input considerations: Source of input spikes
The input spikes are a function of the output of the edge extraction module.

Input considerations: Input spike format
The edge extraction module provides the positive and negative values of the horizontal and vertical partial derivatives at every pixel of an image. These four outputs are fused at the corresponding input axon of the local averaging module by a bus-OR operation, providing a simple and efficient representation of the intensity of the edge at the corresponding pixel.


Figure 8 The blocked center-surround network is implemented as a cascade of filtering operations applied on the original input. See Table 1 for the neuron parameter values.

Each of the four inputs that are bus-ORed is rate-coded using 0 to 15 spikes distributed across the first 15 ticks of the 33-tick window.

Algorithm: Functional description
Figure 8 shows the 15 × 15 center-on, surround-off operator used, where the central 5 × 5 pixels correspond to the center-on region. Compared with often-used operators such as the difference of Gaussians, this Haar-like operator with a nontrivial support region lends itself to an efficient implementation on TrueNorth. There are two main components to the operator. The first is local averaging, which estimates the 5 × 5 average pixel value at each scale and pixel output by the corresponding absolute gradient networks (see Figures 2 and 6). Then, based on the definition of image gratings given above for the Gaussian convolution corelet, each locally averaged matrix/image is decomposed into 25 grates/sub-images using an $f = 5$ grating parameter. As in the case of the 5 × 5 Gaussian, each grate is an independent, non-overlapping sub-matrix/sub-image containing all the information needed to calculate the center-surround response.


Algorithm: Corelet implementation
Figure 8 and Table 1 show the neuron tiling and neuron parameters used to apply the center-surround operator on a two-dimensional rate-coded input matrix corresponding to each one of these grates. The placement of the three axon types {a, b, c} across a grate's pixels is shown, which is similar to the placement for the Gaussian kernel. The fourth available axon type is used for the reset axon, which resets the membrane potential at the end of the frame. An output neuron of type A, B, or C is associated with each grate pixel, where each output neuron's 3 × 3 receptive field is centered at an axon/input pixel of type a, b, or c, respectively. By distributing the neuron types within each grate in the manner shown in Figure 8, it is possible to create a network that uses nine 5 × 5 pixel sub-tiles to patch together a 15 × 15 center-surround operator, which outputs a single spike if the result is at least equal to the user-specified threshold $t_c$. Neurons A and B output the thresholded center-surround response, while neuron D outputs the average response of the surround region, which is later combined with a copy of the 5 × 5 center region (neuron E) by a "positive difference" neuron of type C to give the thresholded center-surround response for the rest of the image pixels.


Similar to the previously described synchronization design pattern, the fourth axon type in each grate is dedicated to a reset axon with a synaptic weight of $-255$, and linear combinations of the other two or three positively weighted axon types of a neuron are defined so that control spikes, which arrive simultaneously and one tick before the reset spike, increase the membrane potential by a sufficient amount to cause a single output spike only when the center exceeds the average of the surround by at least $t_c$. Mathematically, this can be expressed as
$$8 \cdot \mathrm{center}(x, y) - \sum_{i=1}^{8} \mathrm{avg}_i(x, y) \ge 8 t_c,$$
where $\mathrm{center}(x, y)$ denotes the 5 × 5 local average of the patch centered at pixel $(x, y)$, and $\mathrm{avg}_i(x, y)$, for $i = 1, \dots, 8$, denotes the local averages of the eight 5 × 5 patches that have a horizontal or vertical distance of 5 pixels from pixel $(x, y)$, i.e., are centered at pixels $\{(x-5, y-5), (x-5, y), (x-5, y+5), (x, y+5), (x, y-5), (x+5, y-5), (x+5, y), (x+5, y+5)\}$ of the original input image. Notice that for uniform image regions the filter gives no response. Within each grate, the problem of center-surround operators that overlap multiple cores is addressed in the same way as for the Gaussian pyramid, namely by using splitters to replicate pixels whose values are used by multiple neighboring cores.
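In dense array form, the thresholded operator amounts to the sketch below: compute 5 × 5 local averages, then fire wherever the inequality above holds. This is a direct (and deliberately unoptimized) restatement of the arithmetic, not the grated corelet layout; the threshold value is illustrative.

```python
import numpy as np

def center_surround(img, t_c=2):
    """15x15 Haar-like center-on/surround-off operator built from 5x5
    local averages, evaluated densely at every pixel for clarity."""
    H, W = img.shape
    pad = np.pad(img.astype(float), 7, mode="edge")
    avg = np.zeros_like(pad)            # 5x5 local average at every pixel
    for dy in range(-2, 3):
        for dx in range(-2, 3):
            avg += np.roll(np.roll(pad, dy, axis=0), dx, axis=1)
    avg /= 25.0
    offsets = [(-5, -5), (-5, 0), (-5, 5), (0, 5),
               (0, -5), (5, -5), (5, 0), (5, 5)]
    out = np.zeros((H, W), dtype=int)
    for y in range(H):
        for x in range(W):
            c = avg[y + 7, x + 7]
            surround = sum(avg[y + 7 + dy, x + 7 + dx] for dy, dx in offsets)
            # Single spike iff 8*center - sum(avg_i) >= 8*t_c.
            out[y, x] = int(8 * c - surround >= 8 * t_c)
    return out

# A bright 5x5 patch on a dark background triggers the operator.
img = np.pad(np.full((5, 5), 15.0), 10)
print(center_surround(img).sum())
```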


Output considerations: Output spike format
Either one or zero spikes are output per frame for each output neuron on which a center-surround operator is centered. The output spikes might have an extra delay associated with them, to synchronize their arrival at the destination axon with the outputs from other modules.


Output considerations: Destination of output spikes
As with the output of the motion detection module, the spikes are sent to a "normalization" subcorelet that uses nearest-neighbor interpolation to resample the corresponding map to a common resolution. This in effect registers the binary images to a common coordinate frame.

Weighted multichannel merge

Input considerations: Source of input spikes
As discussed in the introduction, one of the core components of the saliency map hypothesis is the existence of a stage where the individual feature maps are merged into a single master saliency map. This paper's saliency system merges the maps through a preprocessing stage that registers and smooths the individual multiscale feature maps, followed by a network that merges the input feature maps by applying cascades of (possibly alternating) weighted max and weighted average operations (see Figure 2). The first stage of this merging phase consists of a normalization/registration step that brings all the multiscale motion and center-surround maps to a common resolution. The input for this normalization routine is provided by the motion detection and center-surround routines previously described.


Input considerations: Input spike format
The input is binary and therefore consists of 0 or 1 spikes per frame per pixel.

Algorithm: Functional description
The registration step downsamples or upsamples some of the feature maps through a nearest-neighbor interpolation corelet (implemented through a splitter network), in order to bring the feature maps to a common resolution. This is followed by a Gaussian smoothing stage that inhibits speckle noise and enhances the centroids of salient patches. For the subsequent weighted max operator, a gain value is assigned to the input rate code at the moment the code is created at the source, which effectively multiplies the number of input spikes by a weight of significance specified at corelet/network instantiation time. By merging the burst rate codes through a bus-OR, we obtain the maximum of the input rate codes. A cascade of weighted averages can then merge multiple results of the weighted max operation into a single saliency map.

Algorithm: Corelet implementation
The weighted average operation relies to a large extent on the weighted average neuron, an instantiation of which was previously used in the Gaussian pyramid corelet (see neuron C of the Gaussian convolution in Figure 5). Assume that the user wishes to use the weighted average neurons to merge n feature maps, each associated with a positive weight of importance. The first step is to normalize the weights so that their sum does not exceed 255; this constraint stems from TrueNorth's 9-bit signed (8-bit magnitude) synaptic weights. Since each weighted average neuron can find the average of two inputs, we apply a cascade of weighted pairwise merges, where at each layer of the cascade the merge weight of each input is equal to the sum of the weights of all previously merged features that the input rate code represents. As shown in Figure 2, in the basic saliency system a single weighted max operation is first applied to each triple of corresponding pixels from the motion or center-surround sequence. This results in a number of merged maps, each containing the cross-scale maximum of motion and center-surround responses. The output of this operation is then followed by a number of weighted merge operations, which merge the resulting maps across all channels into the final saliency map. In Figure 9, we show the resulting raw saliency maps, as well as their peaks, which result after thresholding the raw saliency maps. Note that the corelet is flexible in the actual sequence of operations performed; for example, with different corelet parameter values it is possible to fuse all maps using a single max operation, without any Gaussian smoothing or averaging.
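A sketch of the merge arithmetic on spike counts: the gain multiplies each map's rate code at its source, a bus-OR of burst codes realizes the max, and a cascade of pairwise weighted-average neurons realizes the weighted mean. The gains and weights below are illustrative assumptions; the weights sum to the 255 limit.

```python
import numpy as np

def weighted_max(maps, gains):
    """Bus-OR of gain-multiplied burst codes = elementwise weighted max."""
    return np.maximum.reduce([g * m for g, m in zip(gains, maps)])

def weighted_average_cascade(maps, weights):
    """Cascade of pairwise weighted-average neurons; each merge weight
    is the sum of the weights already folded into that input."""
    maps, weights = list(maps), list(weights)
    while len(maps) > 1:
        a, b = maps.pop(0), maps.pop(0)
        wa, wb = weights.pop(0), weights.pop(0)
        maps.append((wa * a + wb * b) / (wa + wb))
        weights.append(wa + wb)
    return maps[0]

# Three channels, three scales each, with 4-bit spike counts per pixel.
rng = np.random.default_rng(1)
channels = [[rng.integers(0, 16, (4, 4)) for _ in range(3)]
            for _ in range(3)]
merged = [weighted_max(scales, gains=[1, 1, 1]) for scales in channels]
saliency = weighted_average_cascade(merged, weights=[100, 80, 75])
```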


Figure 9 Examples of input frames for different camera orientations and illuminations (column 1), the corresponding saliency maps (column 2), and the thresholded saliency maps that result after suppressing any values below a certain threshold (column 3).

Output considerations: Output spike format
The output consists of a single saliency map (typically 120 × 68 or 240 × 136 pixels), where each pixel outputs between 0 and 15 spikes per frame, with more spikes denoting a more salient region. Sometimes an extra thresholding step is performed on this map by suppressing to zero the regions with fewer than a certain number of spikes, as this helps visualize the level sets.

Output considerations: Destination of output spikes
The output spikes are used by whatever module might "want" to use the saliency system, such as a router system that simulates eye movements.

Results and discussion

The system is evaluated on the publicly available NeoVision Tower dataset [28, 29], which contains annotated video sequences of pedestrians, cars, buses, trucks, and bicyclists. This provides a qualitative and quantitative measure of system performance on a real-world problem.


To the best of our knowledge, this constitutes the first attempt at constructing a saliency model that is implemented and tested on a low-power spiking neurosynaptic architecture. Our results provide an early proof of concept that TrueNorth is a viable platform for a low-power implementation of a complex application. The dataset consists of 45,000 RGB frames at a resolution of 1,920 × 1,088 pixels per frame (width × height). The video was recorded at 30 frames per second, for a total of 25 minutes of video; see Figure 2 for an example RGB frame. The transduction process converts each input frame to a 240 × 136 pixel three-channel input frame, where each channel is quantized to 4 bits from the original 8 bits. This results in a 0-to-15-spike rate-coded representation for every pixel of every input channel, and an average of 23,500,000 input spikes per second. The basic saliency system was optimized for the L log Yb space (a mix of channels from the CIE L*a*b* and YCbCr color spaces) and was tested using a model consisting of 13,727 neurosynaptic cores (about 3 million neurons). The network size depends on many of the corelet parameters, such as the number of layers and the resolutions/filters of the Gaussian pyramids, as well as the parameter k of the motion detector, which specifies the number of frames separating the frame pairs used in motion detection and in turn affects the number of delay neurons needed.


Table 2 System performance achieved for various color representations.

Under appropriate parameters, the corelet can generate saliency models that process a maximum of 52 frames per second, since each frame can be pipelined within a minimum of 19 ticks for 4-bit inputs; each extra Gaussian pyramid layer is the most costly factor decreasing the maximum frame rate. Our basic model and synchronization neuron parameters were set to expect a single frame every 33 ticks. The motion detector was applied only to the luminance channel, and the center-surround operator was applied to all three channels. The system accuracy was measured with respect to various overlap ratios between salient/non-salient pixels and foreground annotated objects versus background pixels, as follows (see also the sketch below):

(i) The true positive rate (TP) is the number of ground truth patches overlapping at least one salient pixel, divided by the number of ground truth patches.
(ii) The positive predictive value (PPV) is the number of salient-labeled pixels overlapping a ground truth patch, divided by the number of all salient-labeled pixels.
(iii) The true negative rate (TN) is the number of non-salient pixels not overlapping a ground truth patch, divided by the number of pixels not overlapping a ground truth patch.
(iv) The negative predictive value (NPV) is the number of non-salient pixels not overlapping a ground truth patch, divided by the number of non-salient pixels.

For completeness, the YCbCr and CIE L*a*b* spaces were also tested under slightly different model parameters/thresholds, and gave performance results (see Table 2) demonstrating that system performance under 4-bit quantization is only slightly affected by the choice of color representation. We also executed, on a single TrueNorth chip consisting of 4,096 cores, a smaller version of the saliency system (four chips are necessary to run the full 13,727-core system). The smaller version uses a single input channel (luminance) to calculate the Gaussian pyramid, motion detection, center surround, and merging. On a 4-second test video sequence, we measured a power consumption of 50 mW, from which we extrapolate a power consumption of 200 mW for the full system. On average, each neuron's firing rate for the single-chip version of the basic saliency system was 97 spikes/second, or about 3 spikes/frame.
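A minimal sketch of how the four measures can be computed from a binary saliency map and rectangular ground-truth patches (our own formulation of the definitions above; the box format is an assumption):

```python
import numpy as np

def evaluate(salient, boxes):
    """salient: HxW boolean saliency map; boxes: (y0, y1, x0, x1)
    ground-truth patches. Returns (TP, PPV, TN, NPV)."""
    gt = np.zeros_like(salient, dtype=bool)
    hits = 0
    for y0, y1, x0, x1 in boxes:
        hits += salient[y0:y1, x0:x1].any()  # patch touches a salient pixel
        gt[y0:y1, x0:x1] = True
    TP = hits / len(boxes)
    PPV = (salient & gt).sum() / max(salient.sum(), 1)
    TN = (~salient & ~gt).sum() / max((~gt).sum(), 1)
    NPV = (~salient & ~gt).sum() / max((~salient).sum(), 1)
    return TP, PPV, TN, NPV

sal = np.zeros((136, 240), dtype=bool)
sal[60:70, 100:120] = True
print(evaluate(sal, [(58, 72, 98, 122), (10, 20, 10, 20)]))
```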


Qualitatively, we observe that the system detects most moving objects, but has trouble detecting small moving objects with low contrast. Most of the non-annotated salient moving-object detections occurred due to splashing water near a fountain. While gusts of wind often caused foliage to move, the reduced resolution of the input stream diminished such false detections. Salient non-moving objects, such as sitting pedestrians, were usually detected by the center-surround operators. False positive detections typically occurred near objects whose structure varied significantly compared with their surroundings, such as lamp posts (see Figures 2 and 9).

Conclusion

We described an implementation of a bottom-up visual saliency algorithm on a spiking, neurosynaptic, non-von Neumann, parallel, distributed, real-time, energy-efficient architecture that supplies an alternative template to the prevailing von Neumann architecture for future research in cognitive computing [30]. Future improvements include the use of different spike-coding schemes to produce saliency maps with higher dynamic range, as well as the inclusion of more channels and more features [31].

Acknowledgments

We thank Hayley Wu and Marc Gonzalez-Tallada for suggestions that improved the presentation of this paper. This research was sponsored by DARPA (Defense Advanced Research Projects Agency) under contract No. HR0011-09-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or the U.S. Government.

*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.

**Trademark, service mark, or registered trademark of Linus Torvalds, Apple, Inc., or Microsoft Corporation in the United States, other countries, or both.


References

1. J. K. Tsotsos, A Computational Perspective on Visual Attention. Cambridge, MA, USA: MIT Press, 2011.
2. S. W. Zucker, "Stereo, shading, surfaces: Curvature constraints couple neural computations," Proc. IEEE, vol. 102, no. 5, pp. 812–829, May 2014.
3. A. Andreopoulos and J. K. Tsotsos, "50 years of object recognition: Directions forward," Comput. Vis. Image Understanding, vol. 117, no. 8, pp. 827–891, Aug. 2013.
4. G. Cauwenberghs, "Reverse engineering the cognitive brain," Proc. Nat. Academy Sci., vol. 110, no. 39, pp. 15512–15513, Sep. 2013.
5. G. Indiveri, B. Linares-Barranco, T. J. Hamilton, A. van Schaik, R. Etienne-Cummings, T. Delbruck, S. Liu, P. Dudek, P. Häfliger, S. Renaud, J. Schemmel, G. Cauwenberghs, J. Arthur, K. Hynna, F. Folowosele, S. Saighi, T. Serrano-Gotarredona, J. Wijekoon, Y. Wang, and K. Boahen, "Neuromorphic silicon neuron circuits," Frontiers Neurosci., vol. 5, no. 73, May 2011.
6. E. Neftci, J. Binas, U. Rutishauser, E. Chicca, G. Indiveri, and R. J. Douglas, "Synthesizing cognition in neuromorphic electronic systems," Proc. Nat. Academy Sci., vol. 110, no. 37, pp. E3468–E3476, Jun. 2013.
7. R. J. Vogelstein, U. Mallik, E. Culurciello, G. Cauwenberghs, and R. Etienne-Cummings, "Saliency-driven image acuity modulation on a reconfigurable array of spiking silicon neurons," in Proc. Adv. Neural Inf. Process. Syst., 2004, pp. 1457–1464.
8. R. J. Vogelstein, U. Mallik, J. T. Vogelstein, and G. Cauwenberghs, "Dynamically reconfigurable silicon array of spiking neurons with conductance-based synapses," IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 253–265, Jan. 2007.
9. S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, "The SpiNNaker project," Proc. IEEE, vol. 102, no. 5, pp. 652–665, May 2014.
10. K. A. Zaghloul and K. Boahen, "A silicon retina that reproduces signals in the optic nerve," J. Neural Eng., vol. 3, no. 4, pp. 257–267, Dec. 2006.
11. L. Itti and C. Koch, "Computational modelling of visual attention," Nat. Rev. Neurosci., vol. 2, no. 3, pp. 194–203, Mar. 2001.
12. A. Treisman and G. Gelade, "A feature integration theory of attention," Cognitive Psychol., vol. 12, no. 1, pp. 97–136, Jan. 1980.
13. C. Koch and S. Ullman, "Shifts in selective visual attention: Towards the underlying neural circuitry," Human Neurobiol., vol. 4, no. 4, pp. 219–227, 1985.
14. L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
15. D. J. Berg, S. E. Boehnke, R. A. Marino, D. P. Munoz, and L. Itti, "Free viewing of dynamic stimuli by humans and monkeys," J. Vis., vol. 9, no. 5, p. 19, May 2009.
16. L. Itti, "Automatic foveation for video compression using a neurobiological model of visual attention," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1304–1318, Oct. 2004.
17. T. Judd, K. Ehinger, F. Durand, and A. Torralba, "Learning to predict where humans look," in Proc. IEEE 12th Int. Conf. Comput. Vis., 2009, pp. 2106–2113.
18. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, vol. 345, no. 6197, pp. 668–673, Aug. 2014.
19. A. Cassidy, R. Alvarez-Icaza, F. Akopyan, J. Sawada, J. V. Arthur, P. A. Merolla, P. Datta, M. Gonzalez Tallada, B. Taba, A. Andreopoulos, A. Amir, S. K. Esser, J. Kusnitz, R. Appuswamy, C. Haymes, B. Brezzo, R. Moussalli, R. Bellofatto, C. Baks, M. Mastro, K. Schleupen, C. E. Cox, K. Inoue, S. Millman, N. Imam, E. McQuinn, Y. Y. Nakamura, I. Vo, C. Guo, D. Nguyen, S. Lekuch, S. Asaad, D. Friedmann, B. L. Jackson, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, "Real-time scalable cortical computing at 46 giga-synaptic OPS/watt with ~100× speedup in time-to-solution and ~100,000× reduction in energy-to-solution," in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal. (SC), 2014, pp. 27–38.
20. A. S. Cassidy, P. Merolla, J. V. Arthur, S. K. Esser, B. Jackson, R. Alvarez-Icaza, P. Datta, J. Sawada, T. M. Wong, V. Feldman, A. Amir, D. Ben-Dayan Rubin, F. Akopyan, E. McQuinn, W. P. Risk, and D. S. Modha, "Cognitive computing building block: A versatile and efficient digital neuron model for neurosynaptic cores," in Proc. IEEE Int. Joint Conf. Neural Netw., 2013, pp. 1–10.
21. S. K. Esser, A. Andreopoulos, R. Appuswamy, P. Datta, D. Barch, A. Amir, J. Arthur, A. Cassidy, M. Flickner, P. Merolla, S. Chandra, N. Basilico, S. Carpin, T. Zimmerman, F. Zee, R. Alvarez-Icaza, J. A. Kusnitz, T. M. Wong, W. P. Risk, E. McQuinn, T. K. Nayak, R. Singh, and D. S. Modha, "Cognitive computing systems: Algorithms and applications for networks of neurosynaptic cores," in Proc. IEEE Int. Joint Conf. Neural Netw., 2013, pp. 1–10.
22. A. Amir, P. Datta, W. P. Risk, A. S. Cassidy, J. A. Kusnitz, S. K. Esser, A. Andreopoulos, T. M. Wong, M. Flickner, R. Alvarez-Icaza, E. McQuinn, B. Shaw, N. Pass, and D. S. Modha, "Cognitive computing programming paradigm: A corelet language for composing networks of neurosynaptic cores," in Proc. IEEE Int. Joint Conf. Neural Netw., 2013, pp. 1–10.
23. G. S. Banavar, "An application framework for compositional modularity," Ph.D. dissertation, Univ. Utah, Salt Lake City, UT, USA, 1995.
24. T. M. Wong, R. Preissl, P. Datta, M. Flickner, R. Singh, S. K. Esser, E. McQuinn, R. Appuswamy, W. P. Risk, H. D. Simon, and D. S. Modha, "10^14," IBM Res. Div., Armonk, NY, USA, Res. Rep. RJ10502, 2012.
25. R. Preissl, T. M. Wong, P. Datta, M. Flickner, R. Singh, S. K. Esser, W. P. Risk, H. D. Simon, and D. S. Modha, "Compass: A scalable simulator for an architecture for cognitive computing," in Proc. IEEE Int. Conf. High Perform. Comput., Netw., Storage Anal. (SC), 2012, p. 54.
26. P. J. Burt, "Fast filter transform for image processing," Comput. Graph. Image Process., vol. 16, no. 1, pp. 20–51, May 1981.
27. D. H. Hubel, "The visual cortex of the brain," Sci. Amer., vol. 209, no. 5, pp. 54–62, 1963.
28. Neovision2 dataset, iLab, University of Southern California, Los Angeles, CA, USA. [Online]. Available: http://ilab.usc.edu/neo2/dataset/
29. R. Kasturi, D. Goldgof, R. Ekambaram, G. Pratt, E. Krotkov, D. D. Hackett, Y. Ran, Q. Zheng, R. Sharma, M. Anderson, M. Peot, M. Aguilar, D. Khosla, Y. Chen, K. Kim, L. Elazary, R. C. Voorhies, D. F. Parks, and L. Itti, "Performance evaluation of neuromorphic-vision object recognition algorithms," in Proc. Int. Conf. Pattern Recog., 2014, pp. 2401–2406.
30. B. Shaw, A. Cox, P. Besterman, J. Minyard, C. Sassano, R. Alvarez-Icaza, A. Andreopoulos, R. Appuswamy, A. Cassidy, S. Chandra, P. Datta, E. McQuinn, W. Risk, and D. S. Modha, "Cognitive computing commercialization: Boundary objects for communication," in Proc. Int. Conf. IDEMI, Porto, Portugal, Sep. 4–6, 2013, pp. 1–10.
31. J. M. Wolfe and T. S. Horowitz, "What attributes guide the deployment of visual attention and how do they do it?" Nat. Rev. Neurosci., vol. 5, no. 6, pp. 1–7, Jun. 2004.

Received May 1, 2014; accepted for publication June 5, 2014


Alexander Andreopoulos IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Andreopoulos is a Research Staff Member working on the Cognitive Computing/DARPA SyNAPSE project at IBM Research - Almaden. His research interests lie in the areas of computer vision, vision-based robotics, computational neuroscience, machine learning, and medical imaging. He has an Honors B.Sc. degree from the University of Toronto in computer science (first class honors) and mathematics, as well as M.Sc. and Ph.D. degrees in computer science from York University, Toronto, Canada.

Brian Taba IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Taba is a Software Engineer working on the DARPA SyNAPSE project in the Cloud and Synaptic Systems department at the Almaden Research Center. He received a B.S. degree in electrical engineering from the California Institute of Technology in 1999 and a Ph.D. degree in bioengineering from the University of Pennsylvania in 2005.

Andrew S. Cassidy IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Cassidy is a Research Staff Member on the SyNAPSE project at the Almaden Research Center. He received M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University and Johns Hopkins University in 2002 and 2010, respectively. He subsequently joined IBM at the Almaden Research Center, where he has worked on large-scale neural computing architecture with the SyNAPSE team. He is author or coauthor of over 20 technical papers. Dr. Cassidy is a member of the Institute of Electrical and Electronics Engineers (IEEE), Tau Beta Pi, and Eta Kappa Nu.

Rodrigo Alvarez-Icaza IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Alvarez is a Research Staff Member on the SyNAPSE project at the Almaden Research Center. He received a B.S. degree in mechanical and electrical engineering from Universidad Iberoamericana, Mexico City, an M.S. degree in bioengineering from the University of Pennsylvania, and a Ph.D. degree in bioengineering from Stanford University, in 1999, 2005, and 2010, respectively. His research focuses on brain-inspired computer architectures and spans all layers of hardware. He is author or coauthor of over 20 patents and 12 technical publications.

Myron D. Flickner IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Mr. Flickner is an engineer, manager, inventor, programmer, and author with more than 20 patents and more than 75 publications in areas of image analysis, computer virus detection, retail, neuromorphic systems, and human-computer interaction. He currently works at IBM Research - Almaden on cognitive computing, creating brain-inspired low-power computer systems. Mr. Flickner joined IBM Research - Almaden in 1982, working on automated inspection of thin film disk heads. Since then, he has held a variety of roles in IBM and Google. He received a B.S. degree (1980) and an M.S. degree (1982) in electrical engineering from Kansas State University.

William P. Risk IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Risk is a Senior Engineer and Technical Project Manager working on the DARPA SyNAPSE project at IBM Research - Almaden. He received a B.S.E. degree in electrical engineering from Arizona State University in 1982, and M.S. and Ph.D. degrees in electrical engineering from Stanford University in 1983 and 1986, respectively. He subsequently joined IBM at the Almaden Research Center, where he has worked on lasers, optics, optical storage, quantum cryptography, nanoscale devices, visualization, and neurosynaptic systems.


He is author or coauthor of 114 publications, 15 patents, and one book. Dr. Risk is a Fellow of the Optical Society of America and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Arnon Amir IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Amir is a Research Staff Member in the Cognitive Computing Group at the IBM Almaden Research Center, where he works on the DARPA SyNAPSE project. As a member of the software team, he develops the corelet programming paradigm and new algorithms for neurosynaptic computational substrates. He received a B.Sc. degree in electrical and computer engineering from Ben-Gurion University, Israel, in 1989, and M.Sc. and D.Sc. degrees in computer science from the Technion - Israel Institute of Technology in 1992 and 1997, respectively. Since joining IBM in 1997, he has worked on a number of projects, ranging from eye-gaze tracking and human-computer interaction to speech and video indexing and retrieval, video archival, and tape storage. He initiated and co-invented the Emmy Award-winning Linear Tape File System (LTFS). Dr. Amir has coauthored more than 70 technical papers and 20 issued patents. He has served as program chair and in other roles at various international conferences in computer vision and multimedia. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).

Paul A. Merolla IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Merolla received his B.S. degree with high distinction in electrical engineering from the University of Virginia, Charlottesville, Virginia, in 2000 and his Ph.D. degree in bioengineering from the University of Pennsylvania, Philadelphia, in 2006. He was a Post-Doctoral Scholar in the Brains in Silicon Lab at Stanford University (2006–2009), working as a lead chip designer on Neurogrid, an affordable supercomputer for neuroscientists. Since 2010, he has been a Research Staff Member at the IBM Almaden Research Center, where he was a lead chip designer for the first fully digital neurosynaptic core as part of the DARPA-funded SyNAPSE project and, more recently, for the TrueNorth chip with one million neurons and 256 million synapses, which consumes less than 100 mW. His research involves building more intelligent computers, drawing inspiration from neuroscience, neural networks, and machine learning. His interests include low-power neuromorphic systems, asynchronous circuit design, large-scale modeling of cortical networks, statistical mechanics, machine learning, and probabilistic computing.

John V. Arthur IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Arthur is a Research Staff Member working on the SyNAPSE project. He received a B.S.E. degree in electrical engineering from Arizona State University in 2000 and a Ph.D. degree in bioengineering from the University of Pennsylvania in 2006. He was a postdoctoral scholar in bioengineering at Stanford University. His research focuses on applying brain-inspired principles to chip design and architecture, with interests including dynamical systems, neuromorphic and neurosynaptic architecture, and hardware-aware algorithm design.

David J. Berg IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Berg is a Senior Software Engineer in the Cognitive Computing Group at the IBM Almaden Research Center. He received a B.S. degree in cognitive science from the University of California at San Diego in 2003 and a Ph.D. degree in computational neuroscience from the University of Southern California in 2013. He subsequently joined IBM at the Almaden Research Center, where his work focuses on software tools and visual processing algorithms for the DARPA SyNAPSE project.

Jeff A. Kusnitz IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Mr. Kusnitz has been with IBM for more than 26 years. He spent 15 of those years working in IBM Research and Software Group organizations on speech and telephony platforms and technologies, in roles ranging from software development to platform integration to worldwide standards as IBM’s representative to several industry standards organizations. Following that, he worked with IBM Research’s WebFountain and Semantic Super Computing organizations, focusing primarily on enterprise-scale text analytics and indexing, developing and maintaining several services used within IBM to manage and mine IBM-internal web pages. He is currently working at IBM Research - Almaden in the Cognitive Computing Group, focusing on infrastructure, tooling, and simulators.

Pallab Datta IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Datta is a Research Staff Member on the DARPA SyNAPSE project in the Cognitive Computing Group at the IBM Almaden Research Center. He received a B.E. degree in electronics engineering from the University of Allahabad, India, in 1999, and a Ph.D. degree in computer engineering from Iowa State University in 2005. Prior to joining the IBM Almaden Research Center, Dr. Datta worked at The Neurosciences Institute in San Diego, California. He was also a Technical Staff Member in the Information Sciences (CCS-3) Division at Los Alamos National Laboratory and a visiting researcher at INRIA Sophia Antipolis, France. He is currently working on large-scale simulations using the IBM Neuro-Synaptic Core Simulator (Compass) and on development of the Corelet Programming Language for programming reconfigurable neurosynaptic hardware. He is also involved in the development of algorithms and applications with networks of neurosynaptic cores for building cognitive systems. His technical interests include neuromorphic architecture and simulation, high-performance computing, machine learning, optimization techniques, and graph theory. He is author or coauthor of several patents and 20 technical papers. Dr. Datta is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM).

Steve K. Esser IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Esser is a Research Staff Member working on the DARPA SyNAPSE project in the Cloud and Synaptic Systems department at the IBM Almaden Research Center. He has B.S. and Ph.D. degrees from the University of Wisconsin-Madison, where he developed computational models of the brain during sleep and wakefulness. His current research focuses on brain-inspired algorithms and applications for operation on the TrueNorth chip.

Rathinakumar Appuswamy IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Appuswamy received his B.Tech. degree from Anna University, Chennai, India, and his M.Tech. degree from the Indian Institute of Technology, Kanpur, India, both in electrical engineering, in 2002 and 2004, respectively. He received an M.A. degree in mathematics and a Ph.D. degree in electrical and computer engineering, both from the University of California, San Diego, in 2008 and 2011, respectively. During 2011, he was a postdoctoral researcher at IBM Research - Almaden, and since 2012 he has been a Research Staff Member. His research interests include multi-modal learning, network coding, communication for computing, and network information theory.

Davis R. Barch IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Barch is a Senior Software Engineer in the Cognitive Computing Group at the IBM Almaden Research Center. He received a B.S. degree in chemistry from The George Washington University, an M.S. degree in biochemistry from the University of Pittsburgh, an M.S. degree in computer science from the University of California at Santa Barbara, and a Ph.D. degree in vision science from the University of California at Berkeley. He joined the Cognitive Computing Group at the IBM Almaden Research Center in 2010. Dr. Barch is a member of the American Association for the Advancement of Science (AAAS).

Dharmendra S. Modha IBM Research - Almaden, San Jose, CA 95120 USA ([email protected]). Dr. Modha is an IBM Fellow and IBM Chief Scientist, Brain-Inspired Computing. He is also Principal Investigator for the DARPA SyNAPSE project. He holds a B.Tech. degree in computer science and engineering from IIT Bombay (1990) and a Ph.D. degree in electrical and computer engineering from the University of California at San Diego (1995). Dr. Modha has authored more than 60 publications in international journals and conferences, holds more than 50 U.S. patents, and is an IBM Master Inventor. He is a member of the IBM Academy of Technology, the American Association for the Advancement of Science, the Association for Computing Machinery, and the Society for Neuroscience. He is also a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the World Technology Network. He has received the FAST (File and Storage Technologies) Test-of-Time Award; the Best Paper Award at IDEMI (International Conference on Integration of Design, Engineering and Management for Innovation); First Place in the 2012 Science/NSF International Science and Engineering Visualization Challenge, Illustration Category; the Best Paper Award at ASYNC (International Symposium on Asynchronous Circuits and Systems); and the ACM Gordon Bell Prize. SyNAPSE was named one of the Best Innovation Moments of 2011 by The Washington Post, and Dr. Modha was named one of the "10 Electronics Visionaries to Watch" by the EE Times on its 40th Anniversary.

