IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 21, NO. 11, NOVEMBER 2010
Recognition of Partially Occluded and Rotated Images With a Network of Spiking Neurons

Joo-Heon Shin, David Smith, Waldemar Swiercz, Kevin Staley, J. Terry Rickard, Javier Montero, Lukasz A. Kurgan, and Krzysztof J. Cios
Abstract—In this paper, we introduce a novel system for recognition of partially occluded and rotated images. The system is based on a hierarchical network of integrate-and-fire spiking neurons with random synaptic connections and a novel organization process. The network generates integrated output sequences that are used for image classification. The proposed network is shown to provide satisfactory predictive performance, given that the numbers of recognition neurons and synaptic connections are adjusted to the size of the input image. A comparison of the synaptic activity plasticity rule (SAPR) and the spike timing dependent plasticity rule, which are used to learn the connections between the spiking neurons, indicates that the former gives better results, and thus the SAPR is used. Test results show that the proposed network performs better than a recognition system based on support vector machines.

Index Terms—Image recognition, partially occluded and rotated images, spiking neurons, synaptic plasticity rule.
I. Introduction
DIGITAL image databases grow exponentially as a result of advances in imaging hardware and its proliferation in commercial, medical, and military systems. Unfortunately, the value of such data is not yet fully realized due to relatively slow advancements in automated image recognition, recall, and understanding. Organizations such as the Defense Advanced Research Projects Agency, after spending millions on
Manuscript received November 2, 2009; revised February 19, 2010; accepted April 20, 2010. Date of current version November 3, 2010. The work of K. J. Cios was supported by NIH, under Grant 1R01NS064675-01.
J.-H. Shin is with the Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284 USA (e-mail: [email protected]).
D. Smith is with the University of Colorado Denver, Denver, CO 80204 USA (e-mail: [email protected]).
W. Swiercz and K. Staley are with Massachusetts General Hospital, Boston, MA 02114 USA, and with Harvard Medical School, Boston, MA 02115 USA (e-mail: [email protected]; [email protected]).
J. T. Rickard is with Distributed Infinity, Inc., Larkspur, CO 80118 USA (e-mail: [email protected]).
J. Montero is with the Department of Statistics and Operational Research, Faculty of Mathematics, Complutense University of Madrid, Madrid 28040, Spain (e-mail: [email protected]).
L. Kurgan is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2M7, Canada (e-mail: [email protected]).
K. J. Cios is with the Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284 USA, and with IITiS Polish Academy of Sciences, Gliwice 44-100, Poland (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2010.2050600
research in this area, have come to the conclusion that image recognition performed by humans is far better (in terms of speed and accuracy) than that of the best automatic recognition systems [5], [42]. For instance, humans are much quicker in recognizing previously seen objects, even if they are partially occluded and rotated. Solving challenging image distortions, like occlusions and rotations, via automated means calls for the development of systems that combine several computational approaches [22], [49]. More specifically, one of the basic problems in image analysis is the underlying assumption of a certain mathematical decomposition hypothesis, say, assuming that different causes of distortions are independent and can be isolated. For instance, given that the assumption that occlusion and rotation are independent is correct, we can first solve the occlusion problem and later address the rotation problem, or the other way round. In practice, however, such independence cannot be proven, and therefore such sequential solutions can potentially fail. Manual analysis of large and ever-growing image databases is impractical, which motivates research in automated image recognition. Unfortunately, automating this process is difficult and poses many challenges. Recognizing the fact that computer systems will not be on par with humans for some time to come, we propose an approach that may lead to the design of systems that improve over the existing solutions. We observe that the relatively low speed of information transmission between biological neurons, when compared with the transmission speeds of computers, is compensated by a “smart” organization of the brain’s neural circuits, which enables the remarkable ability of humans to perform complex image recognition tasks.
This fact motivated us to design a system that attempts to model some brain operations with an artificial neural system, while being fully aware that this attempt is a simplistic approximation of how the brain’s neural circuits truly operate. The first attempt at mimicking the brain’s operation was the development of artificial neural networks (NNs). Although the early artificial NN modeling attempts were relatively crude, in part because of the poor understanding of brain operation, more recent designs were able to model specific regions of the brain quite accurately, and they could also solve tasks, such as image segmentation, that are outside of the modeling realm [8], [10], [11], [17], [27], [30], [32], [34], [36], [41], [44]. These advances were due in large part to the rapid progress in neuroscience, which has provided a better understanding of how
1045-9227/$26.00 © 2010 IEEE
specific regions of the brain organize, reorganize, learn, and function, and, at a lower level, how individual neurons function and operate. These discoveries have been closely followed by computational scientists, who designed systems that mimic some of those operations to solve practical problems [10], [11], [44].

In our previous work, we developed the synaptic activity plasticity rule (SAPR) based on studies of the CA3 region of the hippocampus, and later used this rule to implement a network of spiking neurons for edge detection in images [48]. We also used a similar network topology and the SAPR to successfully model the effects of synaptic depression and recovery governed by glutamate release [47]. We present here a novel image recognition system designed as a network of integrate-and-fire spiking neurons that uses both the SAPR and the spike timing dependent plasticity (STDP) [46] learning rules. The system consists of three layers of neurons: the input layer, the feature extraction layer [47], [48], and the recognition layer. The latter consists of excitatory neurons that have random synaptic connections to neurons from the feature extraction layer. We address the usage of spiking neurons in the recognition layer for the purpose of decision making, which is recognized as a challenging task [52].

In designing our system, we were inspired by the organization (formation) of neural circuits and the retrieval of neural memories in the primate hippocampus [54]. Many of the neuron connections in the brain appear during initial brain growth and are later continuously organized/reorganized. These changes are due to repeated experiences, such as seeing similar images many times [39], [51]. Visual input is transported to the visual and entorhinal cortices for processing. The results are forwarded to the hippocampus, where new memories are formed. With time, the new memories are transferred to the cerebral cortex for long-term storage [23].
Once the long term memory connections are established, the brain is able to immediately “recall” what the new input image is by comparing it with the stored information, which is exploited in our solution. As a result, relatively little processing is required to make the corresponding recognition decision. We evaluate our system using gray images of human faces. To make the recognition task more difficult for a computer (although this would not make it correspondingly more difficult for a human) we use occluded images presented at random angle/rotation. Upon presentation of the images, the system forms unique signature patterns that correspond to a person’s face in a way vaguely similar to the above described “organization” phenomenon in the brain. When a new face exemplar from a dataset is presented to the system, it “recalls” the corresponding person’s face “signature” and thus is able to correctly recognize it relatively quickly. One of the challenges is to decide whether a new image is truly similar (or dissimilar) to the image(s) seen during the organization phase. In our previous work, we explored different metrics to perform these comparisons [7]. In this paper, however, we design a new similarity measure that operates on the firing patterns of the recognition layer neurons rather than on pixels from the original image. The new
image similarity measure, named the “threshold comparator,” combines information concerning spike timing and the corresponding transmembrane potential. This hybrid design results in improved performance of the measure. Throughout this paper, we use the terms “system” and “network” interchangeably. Also, the terms “size” (of an image) and (image) “resolution” are used in the same manner.

II. Methods

In contrast to dozens of artificial neural network models that use very simple neuron models and learning rules [2], [56], our network uses a neuron model that more closely mimics a biological neuron, namely, the integrate-and-fire model [29], along with a biologically-inspired dynamic plasticity rule [47], [48]. The design of a neural network requires determining the neuron model and the learning rule used to update the weights/synapses, and designing the topology of the network, which defines how the neurons are arranged and interconnected. The first two elements are specified a priori, while the topology can either be static (defined a priori) or be dynamically modified by adding neurons and/or hidden layers as needed to solve a problem; networks designed using the latter approach are referred to as ontogenic [6], [9], [14]. All artificial neuron models resemble, to varying extents, biological neurons. The degree of this resemblance, however, is an important distinguishing factor between different neuron models. The most accurate neuron models mimic all the key characteristics of biological neurons, including their temporal spiking nature, membrane potential, sodium, potassium, and calcium channels, threshold accommodation, refractory periods, and multi-compartmental structure (dendrites, soma, axon). An example of a detailed, biologically-inspired model is the Hodgkin–Huxley neuron model [20]. At the other end of the spectrum is the McCulloch–Pitts model [31], which is very simple and preserves only a few features of a biological neuron.
Other models, like the integrate-and-fire model [29] used here (see Table I), strive to achieve a balance between their ability to represent the underlying biological features/behavior and their computational complexity. The ability of a network to learn from input data via the use of a learning rule that updates network connections is an indispensable characteristic of any artificial neural network algorithm. We developed the first dynamic synaptic activity plasticity rule (SAPR) for weight adjustments [48] (see Section II-B). In the following, we first briefly describe the modified MacGregor integrate-and-fire neuron model, and then we discuss the learning rules and the topology of the neural network.

A. Spiking Neuron Model

Integrate-and-fire (I&F) neuron models introduce a few simplifications of the neuronal spike generation process, as compared with the more detailed conductance models [20], while still implementing essential neuron properties. The modified MacGregor model used here closely represents the behavior of a biological neuron in terms of its transmembrane
TABLE I
MacGregor's Modified Neuron Model and Its Parameter Settings

Transmembrane potential, eq. (1):
dE/dt = [−E + GK · (EK − E) + Ge · (Ee − E) + Gi · (Ei − E) + SCN] / Tmem
Parameter settings: equilibrium potential of the potassium conductance EK = −12 mV, equilibrium potential of the excitatory conductance Ee = 80 mV, equilibrium potential of the inhibitory conductance Ei = −10 mV, membrane time constant Tmem = 25 ms.

Refractory properties, eq. (2):
dGK/dt = (−GK + B · S) / TGK
Parameter settings: amplitude of the postfiring potassium conductance decay B = 20, potassium conductance time constant TGK = 15 ms.

Spike generation, eq. (3):
S = 1 if E ≥ Th; S = 0 if E < Th.

Threshold accommodation, eq. (4):
dTh/dt = [−(Th − Th0) + C · E] / Tth
Parameter settings: amplitude of the threshold C = 0.5, resting threshold of the cell Th0 = 10 mV, time constant for decay of the threshold Tth = 30 ms.

SCN is an external input current injected into the neuron. GK, Ge, and Gi represent the activity of the potassium, excitatory, and inhibitory synaptic conductances, respectively.
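To make the dynamics of Table I concrete, the following sketch integrates eqs. (1)–(4) with simple Euler steps using the listed parameter settings. The integration step size, the initial conditions, and the constant-conductance drive in the usage note below are our assumptions; the paper does not specify them.

```python
import numpy as np

# Parameter settings from Table I.
E_K, E_e, E_i = -12.0, 80.0, -10.0    # equilibrium potentials (mV)
T_mem, T_GK, T_th = 25.0, 15.0, 30.0  # time constants (ms)
B, C, Th0 = 20.0, 0.5, 10.0           # K+ decay amplitude, threshold gain, resting threshold

def simulate(scn, Ge, Gi, dt=1.0):
    """Euler integration of the modified MacGregor model, eqs. (1)-(4).

    scn    -- external input current, one sample per time step
    Ge, Gi -- excitatory/inhibitory synaptic conductances (same length)
    Returns the transmembrane potential trace E and the spike train S.
    """
    n = len(scn)
    E, GK, Th = 0.0, 0.0, Th0
    E_trace = np.zeros(n)
    spikes = np.zeros(n, dtype=int)
    for t in range(n):
        S = 1 if E >= Th else 0                         # eq. (3): spike generation
        dE = (-E + GK * (E_K - E) + Ge[t] * (E_e - E)
              + Gi[t] * (E_i - E) + scn[t]) / T_mem     # eq. (1): membrane potential
        dGK = (-GK + B * S) / T_GK                      # eq. (2): refractory K+ conductance
        dTh = (-(Th - Th0) + C * E) / T_th              # eq. (4): threshold accommodation
        E, GK, Th = E + dt * dE, GK + dt * dGK, Th + dt * dTh
        E_trace[t] = E
        spikes[t] = S
    return E_trace, spikes
```

For example, a constant excitatory conductance of 0.5 with no inhibition drives the neuron to fire repeatedly, with the potassium conductance (2) and the threshold accommodation (4) spacing out the spikes.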
potential, potassium channel response, refractory properties, and threshold accommodation [6], [27], [29], [48]. The modified MacGregor I&F model (for details, see [48]) and its parameter settings used here are shown in Table I. The neuron’s membrane potential changes according to the incoming spikes and is governed by (1)–(4). The spikes enter the neuron through synaptic connections, thereby increasing the synaptic conductance. This results in postsynaptic potential changes. The synaptic connections can be either excitatory or inhibitory; the type of a connection depends on the presynaptic neuron. Excitatory neurons process the mainstream information in the network, while inhibitory neurons provide negative feedback that works as a self-regulating mechanism, e.g., preventing the network from extreme excitation [23]. The weighted sums of all excitatory and all inhibitory synaptic conductances yield the excitatory and inhibitory stimulus values, respectively. If the excitatory stimulus is too weak, or the inhibitory stimulus is too strong, then the membrane potential cannot reach the neuron’s firing threshold. If the stimulus is strong enough for the membrane potential to reach this threshold, then the neuron fires, i.e., it generates an outgoing spike train traveling along the axon. For the sake of brevity, from now on we will use the term “neuron fired” to mean that a “neuron fired a train of spikes.” The neuron is incapable of responding to any additional stimulation for a short time immediately after the spike generation; this time interval is referred to as the absolute refractory period. The absolute refractory period is followed by an interval known as the relative refractory period, during which the neuron can only respond to a relatively strong stimulation.

B. Learning/Plasticity Rules

We use synaptic plasticity rules similar to the one stated first by Konorski [24] and then by Hebb [46].
In short, the relative activity between the pre- and postsynaptic neurons is critical for the synaptic changes; adjustment of the strength of the synaptic connections between neurons takes place every time the postsynaptic neuron fires [46]. If the firing occurs, the synaptic weight values are updated according to the equation

(d/dt) wij = α+− · PSPij    (5)
where wij is the synaptic weight between postsynaptic neuron i and presynaptic neuron j; α+− is the learning rate that controls the positive and negative adjustments, respectively; and PSPij is the postsynaptic potential value of the connection between postsynaptic neuron i and presynaptic neuron j. Arrival of an action potential at the synaptic connection changes the synaptic conductance, which elicits a synaptic current alteration and thus results in a postsynaptic potential (PSP) modification. The PSP can be either excitatory or inhibitory and is proportional to the synaptic conductance change. These changes directly affect the neuron’s membrane potential. Various Konorski/Hebb-type learning rules have been extensively studied [4], [15], [16], [25], [40], [46], [48]. The prime example is the STDP rule [46], specified by (6) and illustrated in Fig. 1(b)

STDP(t) = α+ · exp(−t/τ+) if t > 0
STDP(t) = −α− · exp(t/τ−) if t ≤ 0    (6)

where t = (tpost − tpre) is the time delay between the postsynaptic spike and the presynaptic spike; α+− is the learning rate; and τ+− is the time constant. The STDP rule embodies Konorski-type plasticity using the concept of relative timing, as can be seen in (5); we used α+− = 1 and τ+− = 20 ms. In contrast to all of the above rules, the synaptic activity plasticity rule (SAPR) [48] uses the actual synaptic dynamics to decide the amount of adjustment. When modification of the synaptic weight between the pre- and postsynaptic neurons occurs, the SAPR adjusts the synaptic weight depending on the particular synapse type and its recent actual activity. There is no explicit equation or function shape for the synaptic strength adjustment in the SAPR; the adjustment only approximates a possible function using a PSP shape [48]. Fig. 1(a) shows just one example of a learning function using a general PSP shape for excitatory and inhibitory synapses. The actual shape varies depending on the particular synapse parameters, the current synaptic strength, and the learning rate used.
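As a reference point for the comparison with the SAPR, eq. (6) and the [0.1, 5] hard-limiting of synaptic strengths described in the text can be sketched as follows; the function names are ours.

```python
import math

ALPHA_PLUS = ALPHA_MINUS = 1.0  # learning rates (alpha+ = alpha- = 1, as in the text)
TAU_PLUS = TAU_MINUS = 20.0     # time constants (tau+ = tau- = 20 ms)

def stdp(t):
    """STDP weight change of eq. (6); t = t_post - t_pre in ms."""
    if t > 0:  # presynaptic spike preceded the postsynaptic spike: potentiate
        return ALPHA_PLUS * math.exp(-t / TAU_PLUS)
    return -ALPHA_MINUS * math.exp(t / TAU_MINUS)  # otherwise: depress

def clip_weight(w, lo=0.1, hi=5.0):
    """Hard-limit a synaptic strength to the [0.1, 5] interval used in the paper."""
    return min(max(w, lo), hi)
```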
In contrast to the STDP function, the SAPR function is continuous, has a finite range of values, and is dynamic (i.e., it changes from experiment to experiment while the STDP function is static) [48]. Synaptic connection strengths are bounded to be in the [0.1, 5] interval by using a hard-limiting function. Similar to
Fig. 1. Learning functions used in (a) SAPR and (b) STDP. Note that t = (tpost − tpre ) is the time delay between the postsynaptic spike and the presynaptic spike. For STDP, we used α+− = 1 and τ+− = 20 ms. (a) Synaptic strength modification when using SAPR. (b) Synaptic strength modification when using STDP.
the STDP, the polarity of the weight change in the SAPR generally matches the phase of the spike times, although it has been shown that this is not necessarily true because the polarity might exhibit frequency dependence [26]. In general, the frequency dependence of spike timing dependent plasticity is a complex phenomenon that depends on the precise frequency of stimulation; presynaptic depression and postsynaptic desensitization; the amplitude and duration of the neuronal calcium transient, determined by calcium-dependent calcium release, calcium buffering, and calcium transport capacities; and the relative sensitivities of CaMKII and calcineurin. At this point, there is not enough information concerning these processes to permit adequate modeling of the frequency dependence of the SAPR. The advantage of the SAPR is that, instead of using an artificial function as used in the STDP, it uses the actual value present in each synapse. Modification of the synaptic weights between the pre- and postsynaptic neurons takes place every time the postsynaptic neuron fires. When the firing occurs, all of the neuron’s incoming synapses are evaluated and their synaptic strengths are adjusted depending on the particular synapse type and its recent activity. The amount of the adjustment is proportional to the contribution of a particular synapse to the neuron’s firing. If a particular excitatory presynaptic neuron’s spike arrives before the postsynaptic neuron fires, then the related synapse is assumed to have a positive contribution, and thus its synaptic strength increases by an amount proportional to the current postsynaptic potential (PSP) value. When an excitatory presynaptic neuron’s spike arrives after the postsynaptic neuron fires, it has no contribution to the recent firing, and thus its strength is decreased by an amount proportional to the current PSP value. We also used learning for the inhibitory synapses because this led to improvements in the quality of image recognition; see Fig. 8.

C. Network Topology

Many approaches in computational neuroscience are based on the observation that the brain uses a hierarchical structure to perform cognitive tasks [37], [38], although details of
this structure are yet to be fully understood. Our network also uses a hierarchical organization, from the sensory input layer to the recognition layer. The network topology for the image recognition system consists of three layers: 1) the sensory/receptive layer, which consists of only excitatory neurons; 2) the feature extraction layer, which consists of both excitatory and inhibitory neurons; and 3) the recognition layer, which consists of excitatory neurons [see Fig. 2(a)]. The recognition layer uses the output of the feature extraction layer to generate sequences of firings that are applied as specific signatures to recognize new images. The sensory and feature extraction layers draw on our previous work concerning application of networks of spiking neurons in image recognition [7], [48]. The sensory layer’s dimensions are three times larger, i.e., the corresponding area is nine times larger, than the size of the processed image to allow for overlapping between neurons in the sensory layer. The number of excitatory neurons in the feature extraction layer is also three times larger than the number of pixels of the input image, while the number of inhibitory neurons in this layer is equal to the number of pixels in the image [48]. This results in the 9:1 ratio of excitatory to inhibitory neurons, which is consistent with the corresponding estimates in the hippocampal region [50]. The inhibitory neurons provide negative feedback to prevent the network from becoming extremely excited [23]. Fig. 8 demonstrates the importance of using inhibitory neurons in the feature extraction layer; the face shown there is more clearly outlined using inhibition than without using it. The connections from the sensory to the feature extraction layers are shown in Fig. 2(c). 
Excitatory neurons in the sensory and feature extraction layers have the same dimensions, 3n × 3m, which means that each excitatory neuron in the sensory layer is connected to the corresponding (i.e., located at the same position) excitatory neuron in the feature extraction layer. However, since the inhibitory neurons in the feature extraction layer are organized into an n × m matrix, each inhibitory neuron in the feature extraction layer is connected to the corresponding 3 × 3 matrix of neurons in the sensory layer [see Fig. 2(c)].
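Under the 3n × 3m / n × m layout just described, the index arithmetic for the 3 × 3 receptive fields and for the eight-neighbor recurrent connections can be sketched as below (the helper names are ours, not the paper's):

```python
def inhibitory_field(i, j):
    """Indices (in the 3n x 3m excitatory grid) of the 3 x 3 block connected
    to the inhibitory neuron at position (i, j) of the n x m inhibitory grid."""
    return [(3 * i + di, 3 * j + dj) for di in range(3) for dj in range(3)]

def recurrent_neighbors(p, q, rows, cols):
    """The eight neighboring excitatory neurons of (p, q) in the feature
    extraction layer (the recurrent connections of Fig. 2(b)), clipped at
    the grid border."""
    return [(p + dp, q + dq)
            for dp in (-1, 0, 1) for dq in (-1, 0, 1)
            if (dp, dq) != (0, 0)
            and 0 <= p + dp < rows and 0 <= q + dq < cols]
```

Border neurons simply receive fewer recurrent connections in this sketch; the paper does not state how the grid boundary is handled.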
Fig. 2. Topology of the network. (a) High-level block diagram. (b) Recurrent synaptic connections between the excitatory neurons in the feature extraction layer. (c) Synaptic connections between the excitatory neurons in the sensory/feature extraction layer and the inhibitory neurons in the feature extraction layer.
The feature extraction layer includes three types of synaptic connections. Similar to the connections from the excitatory neurons in the sensory layer to the inhibitory neurons in the feature extraction layer, each inhibitory neuron is connected to a 3 × 3 matrix of excitatory neurons within the feature extraction layer, see Fig. 2(c), and vice versa. This layer also includes recurrent connections between its excitatory neurons, where each neuron is connected to the eight neighboring excitatory neurons [48], as shown in Fig. 2(b). The recognition layer, which consists of only excitatory neurons, collects the information coming from the feature extraction layer. It is constructed by randomly (and evenly) partitioning the total number of excitatory neurons in the feature extraction layer. Thus, for r neurons in the recognition layer and n × m size of the input images, each neuron in the recognition layer is randomly connected to (3n × 3m)/r excitatory neurons in the feature extraction layer. Fig. 3 shows the relation between the number of recognition neurons, r, and the number of synaptic connections, c (the number of synaptic connections to a neuron in the recognition layer from (3n × 3m)/r neurons in the feature extraction layer). The relation is 9nm = rc, and Table II shows values of r and c used. The recognition layer, which consists of spiking neurons, is an important part of the network. It takes input from the excitatory neurons in the feature extraction layer and summarizes the features extracted by this layer. The system tracks the excitation of neurons in the recognition layer over a short period of simulation time (300 ms) to generate and store an “organization/signature” pattern that is later used to identify similar images. The latter operation constitutes the “recall.” The signature pattern of a recognition neuron is a vector of its transmembrane potentials (E) recorded whenever the neuron fired (see Table I). 
Neurons that did not fire in the recognition layer, based on their inputs from the feature extraction layer, do not form signature patterns.
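The random, even partition of the feature extraction layer that yields the relation 9nm = rc can be sketched as follows; the fixed seed and the helper name are our choices:

```python
import random

def recognition_connections(n, m, r, seed=0):
    """Randomly and evenly partition the 9nm excitatory neurons of the
    feature extraction layer among r recognition neurons, so that each
    recognition neuron receives c = 9nm / r connections (9nm = rc)."""
    total = 9 * n * m
    assert total % r == 0, "r must divide 9nm"
    c = total // r
    idx = list(range(total))
    random.Random(seed).shuffle(idx)  # fixed seed only for reproducibility
    return [idx[p * c:(p + 1) * c] for p in range(r)]
```

For a 32 × 32 image and r = 144 recognition neurons, each recognition neuron receives c = 9216/144 = 64 connections, matching the corresponding column of Table II.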
Fig. 3. Illustration of formation of the “signature” vectors in the recognition layer where r and k are the number of neurons in the recognition layer and the number of classes, respectively.
During the training (or organization) phase, the system creates an index of the organization patterns that correspond to each image class; see Table III for the pseudo-code. When presented with a new image, the system passes it through the network and compares the outcome pattern to the “recall” table in which the previously identified “organizations” are stored. The comparison is based on a similarity measure, which is defined below.

D. Image Similarity Measure

We have experimented with several image similarity measures [55], including the image similarity measure from [7], the Euclidean and Hamming distances, as well as spiking information. Since none of them performed satisfactorily, we propose a new similarity measure named the “threshold comparator,”
TABLE II
Number of Neurons in the Recognition Layer and the Corresponding Number of Synaptic Connections from the Feature Extraction Layer to the Recognition Layer for a 32 × 32 Image

No. of neurons (r):               9    18    36    72   144   288   576  1152  2304  4608  9216
No. of synaptic connections (c): 1024  512   256   128   64    32    16     8     4     2     1

The best results are obtained using the (shaded) number of recognition neurons that corresponds to the image size [see also Fig. 7(a) and (b)].
TABLE III
Pseudo-Code of the Algorithm to Compute “Signature” Patterns and to Evaluate the Performance of the System

Organization: Computing signature vectors E = [Er,k]
Input:
  - an image training dataset Ttra = {(xi, yi) | xi ∈ R^nm, i = 1, ..., Ntra} with yi ∈ {1, ..., k}, where xi stands for an image in n × m resolution
  - the number of recognition neurons r
  - a choice of synaptic plasticity rule: STDP (α+− = 1 and τ+− = 20 ms) or SAPR (α+− = 1)
Output: a signature vector for each class, E = [Er,k]

Tr.1  Initialize E = [Er,k];
Tr.2  Initialize the network based on the topology;
      - Arrange the sensory, feature extraction, and recognition layers based on the topology explained in Fig. 2
      - Initialize the synapses:
        from the sensory to the feature extraction layer, with fixed synapses of 1.4;
        in the feature extraction layer, with random synapses in [0.6, 2.5];
        from the feature extraction to the recognition layer, with random synapses in [0.6, 2.5]
Tr.3  For i ← 1 to Ntra
Tr.4    For time ← 1 to 300
          - Stimulation of the sensory layer:
            normalize the intensity of the pixels in the image from [0, 255] to [35, 61];
            get the stimulus for each sensory neuron at the given time from a positive-valued sinusoid with an amplitude equal to the normalized pixel intensity and a period of 300 ms
          - Activation of the feature extraction layer
          - Activation of the recognition layer
Tr.5      For j ← 1 to r
Tr.6        If (the j-th recognition neuron fired a spike)
Tr.7          E^i[time × r + j] = transmembrane potential value;
Tr.8        End If
Tr.9      End For
Tr.10     Apply synaptic plasticity by (d/dt) wpq = α+− · PSPpq between neurons p and q, where α+− = 1 and PSPpq is the postsynaptic potential value for the connection between neurons p and q. The synaptic connections are bounded by [0.1, 5];
Tr.11   End For
Tr.12   Save only the synapses that are over 4;
Tr.13   Update [Er,k] = average of [Er,k] and [E^i_{r,k}] for each recognition neuron r;
Tr.14 End For
Tr.15 Return E = [Er,k];

Recall: Performance of the system
Input: an image testing dataset Ttest = {(xi, yi) | xi ∈ R^nm, i = 1, ..., Ntest} and E = [Er,k]
Output: the performance of the system in terms of accuracy, precision, recall, and harmonic mean

Tt.1 For Ninput ← 1 to Ntest
Tt.2   Generate E^Ninput based on steps Tr.3–Tr.11 in Organization;
Tt.3   Identify the image with the threshold comparator:
       - calculate |Ei,j − E^Ninput_{i,j}| ≤ Eth, where Eth = 10;
       - calculate the match scores Mj = (# of matches)/(300 · r);
       - select the winning class by: winning class = arg max_{1≤j≤k} Mj;
Tt.4 End For
Tt.5 Evaluate the performance of the system based on (10);
Tt.6 Return the performance of the system;
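The sensory stimulation step of the pseudo-code (line Tr.4) can be sketched as follows. The paper specifies the [35, 61] rescaling, the 300 ms period, and a positive-valued sinusoid, but not the exact waveform; the half-wave-rectified sine below is one plausible reading, not the paper's definitive choice.

```python
import math

def sensory_stimulus(pixel, t, period=300.0):
    """Stimulus for one sensory neuron at time t (in ms), per line Tr.4:
    the pixel intensity is rescaled from [0, 255] to [35, 61] and used as
    the amplitude of a positive-valued sinusoid with a 300 ms period.
    The waveform (a half-wave-rectified sine) is assumed here."""
    amplitude = 35.0 + (pixel / 255.0) * (61.0 - 35.0)
    return amplitude * abs(math.sin(math.pi * t / period))
```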
Fig. 4. Illustration of how the network recognizes an image using the threshold comparator. Only the E values of neurons that fired are used. Eth is the threshold value used for finding the best match.
in which the vector consists of the transmembrane potential values of the neurons that fired in the recognition layer. MacGregor’s modified neuron model used here (see Table I) fires a spike of the same absolute voltage whenever the transmembrane potential (1) reaches the threshold (4). Thus, the model differs in this respect from the Hodgkin–Huxley neuron model, which can vary its spiking voltage due to rate effects and has no threshold value. Therefore, instead of spike voltages, we use transmembrane potential values in the comparator. That is, the comparator uses both the spike timing information (a discrete value) and the corresponding transmembrane potential (a continuous value) [35], [53]. The comparator “matches” a new image with the closest “organization” (signature) stored in the recall table, much like a new sensory input will influence a given brain region dedicated to processing that input; see Fig. 4. Given a new input image, Ninput, the threshold comparator finds the best match after the recognition layer generates its output. When the output for the new input image is generated, the comparator compares the transmembrane potentials of the neurons that have fired with the signature vectors stored in the recall table. If the difference between the two is less than the threshold Eth, namely

|Ei,j − E^Ninput_{i,j}| ≤ Eth    (7)

where E is the transmembrane potential, 1 ≤ i ≤ r, and 1 ≤ j ≤ k, then a match is found. After the matches are found, the matching score is calculated by dividing the number of matches by the total number of neurons in the recognition layer over the simulation time (300 ms)

Mj = (# of matches)/(300 · r)    (8)

so that Mj ∈ [0, 1]. The winning class for the input Ninput is

winning class = arg max_{1≤j≤k} Mj.    (9)
Fig. 4 illustrates the operation of the threshold comparator.
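A minimal sketch of the comparator of eqs. (7)–(9) follows. How non-firing neurons are encoded in the vectors is our assumption (zeros here); the paper only states that such neurons do not form signature entries.

```python
import numpy as np

def threshold_comparator(E_new, signatures, r, E_th=10.0, T=300):
    """Recall step of eqs. (7)-(9).

    E_new      -- vector of transmembrane potentials recorded for the new
                  image (zeros where a recognition neuron did not fire;
                  this encoding of non-firing neurons is an assumption)
    signatures -- dict mapping class label -> stored signature vector of
                  the same shape as E_new
    Returns the winning class and the match scores M_j.
    """
    scores = {}
    for cls, E_sig in signatures.items():
        matches = int(np.sum(np.abs(E_sig - E_new) <= E_th))  # eq. (7)
        scores[cls] = matches / (T * r)                       # eq. (8): M_j in [0, 1]
    return max(scores, key=scores.get), scores                # eq. (9): arg max over classes
```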
E. Datasets

Four image datasets were used to test the performance of the proposed network. Each dataset has different characteristics, illustrated via a few representative images given in Fig. 5. The Japanese female facial expression (JAFFE) dataset [28] contains 213 images of seven facial expressions, which include six basic facial expressions and one neutral expression, posed by ten Japanese models. The faces are of 256 × 256 resolution. The Olivetti Research Laboratory (ORL) (or AT&T) dataset [43] contains face images of 40 people. Each picture is taken at a different time, with slightly varied lighting, different angles, open/closed eyes, glasses/no glasses, and smiling/nonsmiling facial details. The faces are of 92 × 112 resolution and are represented in gray scale (one byte per pixel). The Carnegie Mellon University (CMU) dataset [33] contains 640 face images of 20 people. This dataset is characterized by varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and resolution. Although this set includes three different scale-resolutions for each image, we used the 128 × 120 full-resolution images. The University of Manchester Institute of Science and Technology (UMIST) dataset [18] consists of 564 face images of 20 people. It covers a wide range of different angles of poses, with variations of race, sex, and appearance. The faces are of 92 × 112 resolution. For each dataset, we prepared its occluded and rotated version by introducing occlusions into 50% of the images according to the procedure shown in Fig. 6, and by rotating 25% of the images by 90°, 180°, and 270° each in the clockwise direction. While Fig. 5 shows ten example images for one model from each dataset, the complete occlusion and rotation datasets can be found at http://www.egr.vcu.edu/cs/dmb/Projects/NSN/TNN_supplementary.html.

F. Network Simulation: Organization and Recall

Given an input image Ninput, the network organizes through the sequence of the following three steps: 1) the excitatory neu-
1704
Fig. 5.
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 21, NO. 11, NOVEMBER 2010
Examples of the original and occlusion and rotation images.
state of the recognition layer is recorded [the actual membrane potential (E)] for the neurons that fired. During the organization/training phase, each input image generates its own signature vector in the recognition layer. An average of each recognition neuron’s output is used to form a signature vector, for all image presentations from every class (see line Tr.13 in Table III). After an input image is processed, we keep only the synapses strengths that exceed a certain threshold (i.e., 4), see line Tr.12 in Table III. If synapses of some neurons did not have their values above the threshold then they are set to 0. Then, the next image in the training data is processed, etc. The formation of the signature vector is based on the spiking activity of neurons in the recognition layer. Basically, if we kept the synapses strengths after the signature vector was formed by presenting the first image to the network, then the neurons which spiked would be the only ones spiking even when other very different images were input because their synaptic strengths were set to a very high value. Thus, resetting the synaptic strength values to zero allows other neurons in the recognition layer to spike when presented with new images. This is done in order to allow all images to contribute to the formation of the signature vector; otherwise, only the first input image would form this vector. During the recall/testing phase, the output of the neurons in the recognition layer for a given input image is used to find the best match with the signature vectors stored in the recall table by using the threshold comparator (see line Tt.3 in Table III). The pseudo-code for the two phases with actual parameter values is provided in Table III. G. Evaluation
Fig. 6. Generation of an occlusion: random numbers are distributed uniformly in the range of random height/width, provided above. Note that n is # of vertical pixels, m is # of horizontal pixels, occ(x, y) is a random occlusion point, occheight is a random height of occlusion rectangle where n/3 = occheight = 2n/3, and occwidth is a random width of occlusion rectangle where m/3 = occwidth = 2 m/3.
rons in the sensory layer are stimulated; 2) the excitatory and inhibitory neurons in the feature extraction layer are activated; and 3) excitatory neurons in the recognition layer are activated. The grey image pixel values entering the sensory layer are normalized to provide proper firings of neurons, and then a sinusoidal function with amplitude of the normalized pixel intensity and a period of 300 ms is used to stimulate the sensory layer (see line Tr.4 in Table III) [3], [48]. The sensory layer receives the input image and passes the signal to the feature extraction layer to extract visual features in terms of the excitatory neurons that fired. Then, the recognition layer neurons fire in a certain pattern according to the input image. Each input image is presented to the network for 300 ms of simulation time. For each simulation, the current
We evaluated the performance of the system using ten-fold cross-validation (10-FCV) as follows. The entire dataset is randomly partitioned into ten subsets; nine subsets are used for training and the remaining one is used for testing. This procedure is repeated ten times, and the results are averaged. 10-FCV is used on both the original images (JAFFE, ORL, CMU, and UMIST) and the same sets of images after we modified them to the occlusion and rotation images. The results of 10-FCV are analyzed using performance measures specified in (10), where TP denotes true positive, TN true negative, FP false positive, and FN false negative predictions TP + TN Accuracy = . (10) TP + TN + FP + FN Evaluation of results using additional performance measures of precision, recall and harmonic mean are provided at http:// www.egr.vcu.edu/cs/dmb/Projects/NSN/TNN− supplementary. html.
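The occlusion procedure of Fig. 6 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the fill value (black), and clipping of the rectangle at the image border are our choices, not details given in the paper; only the uniform ranges for the occlusion point and the rectangle size come from Fig. 6.

```python
import numpy as np

def add_random_occlusion(img, rng=None):
    """Overlay a random occluding rectangle on a grayscale image.

    Per Fig. 6: the occlusion point occ(x, y) is uniform over the image,
    n/3 <= occ_height <= 2n/3 and m/3 <= occ_width <= 2m/3, where n and
    m are the numbers of vertical and horizontal pixels.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, m = img.shape
    occ_h = int(rng.integers(n // 3, 2 * n // 3 + 1))
    occ_w = int(rng.integers(m // 3, 2 * m // 3 + 1))
    x = int(rng.integers(0, n))   # random occlusion point occ(x, y)
    y = int(rng.integers(0, m))
    out = img.copy()
    out[x:x + occ_h, y:y + occ_w] = 0.0  # rectangle clipped at the border
    return out
```

Applying this to a randomly chosen 50% of each dataset, as described in Section II-E, yields the occluded versions.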
III. Results and Discussion

The system is tested using two different settings for the four image datasets (see Section II-E). For computational efficiency, the images are compressed to 32 × 32 pixels. The results reported below are based on two experiments. The first compares the performance of the system using different numbers
of neurons in the recognition layer, with the SAPR and STDP rules. The second compares the results obtained using only the SAPR (because it performed better) with a recognition system implemented using a support vector machine (SVM). The SVM-based solution was used because this classifier was previously shown to provide accurate results on the JAFFE and ORL datasets [1], [45].

A. Experiment 1: Comparison of Designs With Different Numbers of Neurons in the Recognition Layer Using the SAPR and STDP Rules

The network was tested with different numbers of neurons in the recognition layer for each dataset. Fig. 7 shows the resulting performance trend of the system in terms of accuracy. During the organization phase, the network processes instances of faces and monitors the “signature” vectors generated at the output of the recognition layer. Next, the network is tested using unseen face instances. As described in Section II-D, we use the threshold comparator (with E_th = 10) to measure similarity between the outputs of the recognition layer. We observe that increasing the number of recognition neurons up to a certain level improves performance for both the SAPR and STDP rules; see Fig. 7(a) and (b). For certain combinations of the number of recognition neurons and the number of synaptic connections they receive from the feature extraction layer, the system performs well. This agrees with the hypothesis that the “brain” allocates different neurons (from a fixed pool) to a given recognition task. This idea is similar to the idea of polychronization [21], or neural Darwinism, embodied in the theory of neuronal group selection [13]. Fig. 7(a) and (b) also shows that the best results are obtained with the number of synaptic connections between c = 4 and c = 16. The sharp drop in performance from r = 2304 to r = 4608 recognition neurons is caused by too few synaptic connections (c = 2 and c = 1, respectively) to the recognition layer (see Table II).
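The behavior of the two plasticity rules near coincident spikes, to which the discussion returns below, can be contrasted in a small sketch. The STDP curve here is the standard exponential form [46]; the smooth rule is only a generic continuous-around-zero stand-in, since the actual SAPR curve of [48] is PSP-based and is not reproduced here.

```python
import math

def stdp(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Standard exponential STDP: the weight update jumps between +A+
    and -A- as the pre/post spike-time difference dt crosses zero."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # pre before post: potentiate
    if dt < 0:
        return -a_minus * math.exp(dt / tau)   # post before pre: depress
    return 0.0

def smooth_rule(dt, a=0.1, tau=20.0):
    """A continuous-around-zero rule (an illustration only, not the SAPR):
    the update vanishes smoothly as dt -> 0."""
    return a * (dt / tau) * math.exp(-abs(dt) / tau)

# For near-coincident spikes (|dt| small), STDP still makes large,
# opposite-signed changes, while the smooth rule makes tiny ones.
```

The relevant contrast is at dt close to zero: stdp(0.1) and stdp(-0.1) are large and of opposite sign, while smooth_rule gives near-zero updates there.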
We also observe that the network achieves relatively good performance for r between 72 and 2304, especially for the original images processed with the SAPR rule, which would allow for computationally efficient implementations that use a smaller number of neurons. The performance comparison between the SAPR and STDP rules is shown in Fig. 7(c), separately for the original images and the occlusion and rotation images. The average and standard deviation of the difference between the SAPR and STDP results are calculated for each number of recognition neurons. In general, the SAPR performs better than the STDP, in particular on the occlusion and rotation images. The differences are between −0.84% and 36.02% for the original images and between 1.5% and 23.15% for the occlusion and rotation images. Although for r between 144 and 2304 the two plasticity rules give comparable results on the original images (differences of less than 1%), in the case of the occlusion and rotation images the network using the SAPR is always better. Although we do not fully understand why the SAPR performs better than the STDP in our application, we argued in the original paper where the rule was introduced
Fig. 7. Performance of the proposed system in terms of accuracy shown in (10) with different number of neurons in the recognition layer using 10-FCV. (a) Using SAPR. (b) Using STDP. (c) Comparison of the SAPR and STDP rules on the original (black bars) and the occlusion and rotation (gray bars) images.
[48] that it was more biologically relevant because it relied purely on the PSPs. The significant difference between the two rules is that the SAPR is continuous around zero (see Fig. 1), which was the reason for its development; we were looking for a synaptic plasticity rule that would not allow big changes (STDP(t) ≫ SAPR(t) as t → 0) for small time differences. Fig. 8 shows the quality of the extracted key face features with and without inhibitory neurons in the feature extraction layer. Adjusting the synaptic strength by a small amount (according to the SAPR, as opposed to the STDP) when pre- and postsynaptic neurons fire very close in time results
Fig. 8. Graphical comparison of the results generated on the feature extraction layer using the SAPR and STDP rules on one of the images, (a) and (b) with inhibition, and (c) without inhibition. The excitatory neuron firings are shown in red at the output of the feature extraction layer. SAPR is less prone to saturate in areas of low contrast/information than STDP.
Fig. 9. (a) and (b) Performance comparison between the network of spiking neurons using SAPR and the SVM in terms of accuracy shown in (10).
in better recognition for the occlusion and rotation images. Thus, in all subsequent tests we use only the SAPR learning rule.

B. Experiment 2: Performance Comparison With SVM

In this experiment, the proposed system using the SAPR is compared with the SVM on the (a) original and (b) occlusion and rotation datasets, without any preprocessing except for rescaling to the 32 × 32 size [see Fig. 9(a) and (b)]. The occlusion and rotation datasets were prepared as described in Section II-E. However, in order to calculate the standard deviation error bars for Fig. 9(b), we generated ten different variations of the data, which can be seen at http://www.egr.vcu.edu/cs/dmb/Projects/NSN/TNN_supplementary.html. Note that there are no error bars for the original images because the results shown in Fig. 9(a) are based on the 10-FCV.
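The SVM baseline described in the text (images flattened into intensity vectors, a 2nd-degree polynomial kernel, and complexity constant C = 1) could be reproduced along the following lines. scikit-learn is used here only for illustration, since the paper does not name its SVM implementation, and `svm_baseline` is a hypothetical name of our own.

```python
import numpy as np
from sklearn.svm import SVC

def svm_baseline(train_imgs, train_labels, test_imgs):
    """Train a 2nd-degree polynomial-kernel SVM (C = 1) on flattened
    grayscale images and predict labels for the test images."""
    # each 32 x 32 image becomes a 1024-dimensional intensity vector
    x_train = np.asarray(train_imgs).reshape(len(train_imgs), -1)
    x_test = np.asarray(test_imgs).reshape(len(test_imgs), -1)
    clf = SVC(kernel="poly", degree=2, C=1.0)
    clf.fit(x_train, train_labels)
    return clf.predict(x_test)
```

Running this inside a 10-FCV loop over each dataset would reproduce the evaluation protocol of Section II-G for the SVM baseline.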
We used 1152 neurons in the recognition layer in all experiments. To encode the images as inputs to the SVM, we converted each image into a 1024-dimensional intensity vector (corresponding to the total number of pixels). 10-FCV was performed on each dataset. We report the SVM results for the 2nd-degree polynomial kernel with the complexity constant C = 1. This kernel provided better predictive performance than the 1st- and 3rd-degree polynomial and radial basis function kernels. As shown in Fig. 9, the SVM performs slightly better on all original datasets, except CMU. However, on the more challenging occlusion and rotation datasets, the system using the SAPR outperforms the SVM in terms of accuracy. Results for other performance measures are provided at http://www.egr.vcu.edu/cs/dmb/Projects/NSN/TNN_supplementary.html. In previous studies, achieving good SVM performance on occluded datasets required several preprocessing steps [12], [19]. In contrast, our system does not require any preprocessing of the images. One of the key challenges in object recognition is achieving invariance to object transformations. The especially challenging case concerns objects transformed in a nonlinear way, which is characteristic of the occlusion and rotation images considered here.

IV. Conclusions

We developed a novel image recognition system, based on a network of spiking neurons, for the computationally challenging task of recognizing partially occluded and rotated face images. The network is hierarchical and consists of three layers: the input sensory, feature extraction, and recognition layers. During the organization/training phase, the system generates signature vectors and saves them in the
recall table. During the recall/testing phase, the new output is matched against the stored signature vectors. The matching is performed using the introduced image similarity measure, called the threshold comparator, which is based on the transmembrane potential values of the neurons that generated spikes. Although information transfer between neurons in the brain is much slower than in modern computers, the brain's organization allows for superior human performance in difficult tasks such as image recognition. Thus, we designed the network using simple clues from the still limited knowledge of how the brain processes information: a hierarchical network of spiking neurons, random synaptic connections, and biologically plausible learning rules. Using the system, we compared the performance of the SAPR and STDP synaptic plasticity rules on the original face images and on the more difficult occlusion and rotation images. The results showed that the SAPR performed better on both. We attributed this to its continuity around zero, which results in small adjustments, in contrast to the big adjustments made by the STDP. We also showed that satisfactory performance of the system can be achieved using a specific number of recognition neurons and a specific number of synaptic connections, given an image size. Next, we compared the system with the SVM classifier. The SVM performed negligibly better on three out of four original datasets. However, our system significantly outperformed the SVM, by 5% to 10% in terms of accuracy, on the challenging occlusion and rotation images. The results showed that in solving imaging problems that are difficult for computers, it is beneficial to borrow ideas/mechanisms from how brains process information. We do not claim, however, any biological plausibility of the system.
It is interesting to note that if problems cannot be assumed independent (in our case, the two problems are occlusion and rotation), then their combination may define a new, much more difficult problem, which is not simply the sum of the isolated problems. Although the presented results are good, a detailed study of all aspects of the network is planned to fully understand the fundamental principles governing its behavior.

Acknowledgment

The authors would like to thank the reviewers for helping them to improve this paper. We are also grateful to L. Keniston and C. Nguyen for their helpful comments on the paper.

References

[1] A. Amine, S. Ghouzali, M. Rziza, and D. Aboutajdine, “An improved method for face recognition based on SVM in frequency domain,” Mach. Graph. Vision, vol. 18, no. 2, pp. 187–199, 2009. [2] P. F. Baldi and K. Hornik, “Learning in linear neural networks: A survey,” IEEE Trans. Neural Netw., vol. 6, no. 4, pp. 837–858, Jul. 1995. [3] G. Barrionuevo and T. H. Brown, “Associative long-term potentiation in hippocampal slices,” Proc. Natl. Acad. Sci., vol. 80, no. 23, pp. 7347–7351, 1983. [4] G. Bi and M. Poo, “Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type,” J. Neurosci., vol. 18, no. 24, pp. 10464–10472, 1998.
[5] R. Chellappa, C. L. Wilson, and S. Sirohey, “Human and machine recognition of faces: A survey,” Proc. IEEE, vol. 83, no. 5, pp. 705–741, May 1995. [6] K. J. Cios, W. Pedrycz, R. W. Swiniarski, and L. A. Kurgan, Data Mining: A Knowledge Discovery Approach. Berlin, Germany: Springer, 2007. [7] K. J. Cios and I. Shin, “Image recognition neural network: IRNN,” Neurocomputing, vol. 7, no. 2, pp. 159–185, 1995. [8] K. J. Cios, W. Swiercz, and W. Jackson, “Networks of spiking neurons in modeling and problem solving,” Neurocomputing, vol. 61, pp. 99–119, Oct. 2004. [9] K. J. Cios and L. M. Sztandera, “Ontogenic neuro-fuzzy algorithm: FCID3,” Neurocomputing, vol. 14, no. 4, pp. 383–402, 1997. [10] G. Deco and E. Rolls, “A neurodynamical cortical model of visual attention and invariant object recognition,” Vision Res., vol. 44, no. 6, pp. 621–642, 2004. [11] A. Delorme and S. J. Thorpe, “Face identification using one spike per neuron: Resistance to image degradations,” Neural Netw., vol. 14, nos. 6–7, pp. 795–803, 2001. [12] O. Déniz, M. Castrillón, and M. Hernández, “Face recognition using independent component analysis and support vector machines,” Pattern Recognit. Lett., vol. 24, no. 13, pp. 2153–2157, 2003. [13] G. M. Edelman, Neural Darwinism: The Theory of Neuronal Group Selection. New York: Basic Books, 1987. [14] E. Fiesler and K. J. Cios, “Supervised ontogenic neural networks,” Handbook on Neural Computation: Supplement 1, E. Fiesler and R. Beale, Eds. New York: Taylor and Francis, 1997. [15] S. Fusi, “Hebbian spike-driven synaptic plasticity for learning patterns of mean firing rates,” Biol. Cybernet., vol. 87, nos. 5–6, pp. 459–470, 2002. [16] S. Fusi, M. Annunziato, D. Badoni, A. Salamon, and D. J. Amit, “Spikedriven synaptic plasticity: Theory, simulation, VLSI implementation,” Neural Comput., vol. 12, no. 10, pp. 2227–2258, 2000. [17] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. 
Cambridge, U.K.: Cambridge Univ. Press, 2002. [18] D. B. Graham and N. M. Allinson, “Characterizing virtual eigensignatures for general purpose face recognition,” in NATO ASI Series F, Computer and Systems Sciences, H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie and T. S. Huang, Eds. Berlin, Germany: Springer-Verlag, 1998. [19] B. Heisele, P. Ho, J. Wu, and T. Poggio, “Face recognition: Component-based versus global approaches,” Comput. Vision Image Understand., vol. 91, nos. 1–2, pp. 6–21, 2003. [20] A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and its application to conduction and excitation in nerve,” J. Physiol., vol. 117, no. 4, pp. 500–544, 1952. [21] E. M. Izhikevich, “Polychronization: Computation with spikes,” Neural Comput., vol. 18, no. 2, pp. 245–282, 2006. [22] H. R. Kanan, K. Faez, and Y. Gao, “Face recognition using adaptively weighted patch PZM array from a single exemplar image per person,” Pattern Recognit., vol. 41, no. 12, pp. 3799–3812, 2008. [23] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Essentials of Neural Science and Behavior. New York: McGraw-Hill, 1996. [24] J. Konorski, Conditioned Reflexes and Neuron Organization, Cambridge, U.K.: Cambridge Univ. Press, 1948. [25] M. Lengyel, J. Kwag, O. Paulsen, and P. Dayan, “Matching storage and recall: Hippocampal spike timing-dependent plasticity and phase response curves,” Nature Neurosci., vol. 8, no. 12, pp. 1677–1683, 2005. [26] J. Lisman and N. Spruston, “Postsynaptic depolarization requirements for LTP and LTD: A critique of spike timing-dependent plasticity,” Nature Neurosci., vol. 8, no. 7, pp. 839–841, 2005. [27] J. Lovelace and K. J. Cios, “A very simple spiking neuron model that allows for modeling of large, complex systems,” Neural Comput., vol. 20, no. 1, pp. 65–90, 2008. [28] M. J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, “Coding facial expressions with Gabor wavelets,” in Proc. 3rd IEEE Int. Conf. Autom.
Face Gesture Recognit., Apr. 1998, pp. 14–16. [29] R. J. MacGregor, Theoretical Mechanics of Biological Neural Networks. San Francisco, CA: Academic, 1993. [30] T. Masquelier and S. J. Thorpe, “Unsupervised learning of visual features through spike timing dependent plasticity,” PLoS Comput. Biol., vol. 3, no. 2, pp. 247–257, 2007. [31] W. S. McCulloch and W. H. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bull. Math. Biophys., vol. 5, no. 4, pp. 115–133, 1943.
[32] B. W. Mel, “SEEMORE: Combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition,” Neural Comput., vol. 9, no. 4, pp. 777–804, 1997. [33] T. Mitchell, Machine Learning. New York: McGraw-Hill, 1997. [34] J. Mutch and D. Lowe, “Multiclass object recognition with sparse, localized features,” in Proc. 2006 IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., Jun. 2006, pp. 11–18. [35] T. Ohno-Shosaku, T. Maejima, and M. Kano, “Endogenous cannabinoids mediate retrograde signals from depolarized postsynaptic neurons to presynaptic terminals,” Neuron, vol. 29, no. 3, pp. 729–738, 2001. [36] B. A. Olshausen, C. H. Anderson, and D. C. Van Essen, “A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information,” J. Neurosci., vol. 13, no. 11, pp. 4700–4719, 1993. [37] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neurosci., vol. 2, no. 11, pp. 1019–1025, 1999. [38] M. Riesenhuber and T. Poggio, “Models of object recognition,” Nature Neurosci., vol. 3, no. 11, pp. 1199–1204, 2000. [39] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychol. Rev., vol. 65, no. 6, pp. 386–408, 1958. [40] D. M. Sala and K. J. Cios, “Self-organization in networks of spiking neurons,” Aust. J. Intell. Inform. Process. Syst., vol. 5, no. 3, pp. 161–170, 1998. [41] D. M. Sala and K. J. Cios, “Solving graph algorithms with networks of spiking neurons,” IEEE Trans. Neural Netw., vol. 10, no. 4, pp. 953–957, Jul. 1999. [42] A. Samal and P. A. Lyengar, “Automatic recognition and analysis of human faces and facial expressions: A survey,” Pattern Recognit., vol. 25, no. 1, pp. 65–77, 1992. [43] F. S. Samaria and A. C. Harter, “Parameterization of a stochastic model for human face identification,” in Proc. 2nd IEEE Workshop Applicat. Comput. Vision, Dec. 1994, pp. 138–142. [44] T.
Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, “Robust object recognition with cortex-like mechanisms,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 411–426, Jan. 2007. [45] F. Y. Shih, C.-F. Chuang, and P. S. P. Wang, “Performance comparisons of facial expression recognition in JAFFE database,” Int. J. Pattern Recognit. Artif. Intell., vol. 22, no. 3, pp. 445–459, 2008. [46] S. Song, K. D. Miller, and L. F. Abbot, “Competitive Hebbian learning through spike-timing-dependent synaptic plasticity,” Nature Neurosci., vol. 3, no. 9, pp. 919–926, 2000. [47] W. Swiercz, K. J. Cios, J. Hellier, A. Yee, and K. Staley, “Effects of synaptic depression and recovery on synchronous network activity,” J. Clin. Neurophysiol., vol. 24, no. 2, pp. 165–174, 2007. [48] W. Swiercz, K. J. Cios, K. Staley, L. Kurgan, F. Accurso, and S. Sagel, “New synaptic plasticity rule for networks of spiking neurons,” IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 94–105, Jan. 2006. [49] X. Tan, S. Chen, Z-H. Zhou, and F. Zhang, “Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft k-NN ensemble,” IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 875–886, Jul. 2005. [50] R. D. Traub and R. Miles, Neuronal Networks of the Hippocampus. Cambridge, U.K.: Cambridge Univ. Press, 1991. [51] M. Tsodyks and C. Gilbert, “Neural networks and perceptual learning,” Nature, vol. 431, pp. 775–781, Oct. 2004. [52] X. J. Wang, “Probabilistic decision making by slow reverberation in cortical circuits,” Neuron, vol. 36, no. 5, pp. 955–968, 2002. [53] R. I. Wilson and R. A. Nicoll, “Endogenous cannabinoids mediate retrograde signalling at hippocampal synapses,” Nature, vol. 410, pp. 588–592, Mar. 2001. [54] M. Yanike, S. Wirth, and W. A. Suzuki, “Representation of well-learned information in the monkey hippocampus,” Neuron, vol. 42, no. 3, pp. 477–487, 2004. [55] C-Y. Yen and K. J. 
Cios, “Image recognition system based on novel measures of image similarity and cluster validity,” Neurocomputing, vol. 72, nos. 1–3, pp. 401–412, 2008. [56] G. P. Zhang, “Neural networks for classification: A survey,” IEEE Trans. Syst., Man, and Cybernet. C: Applicat. Rev., vol. 30, no. 4, pp. 451–462, Nov. 2000.
Joo Heon Shin received the B.S. and M.S. degrees in mathematics from Hankuk University of Foreign Studies, Seoul, South Korea, in 1998 and 2000, respectively, the M.S. degree in computer science from San Diego State University, San Diego, CA, in 2006, and is currently pursuing the Ph.D. degree in computer science from the Department of Computer Science, Virginia Commonwealth University, Richmond, under the supervision of Dr. Cios, in the field of mathematical brain modeling, neural networks, and machine learning. Dave Smith is currently pursuing the M.S. degree in computer science and engineering from the University of Colorado Denver, Denver. His current research interests include artificial neural networks and software engineering.
Waldemar B. Swiercz received the M.S. degree from the AGH University of Science and Technology, Krakow, Poland, and the Ph.D. degree from the University of Colorado at Boulder, Boulder. He is currently a Post-Doctoral Fellow with Massachusetts General Hospital, Boston, and Harvard Medical School, Boston. His current research interests include neuroinformatics and image processing.
Kevin J. Staley received the M.D. degree from the University of California San Diego, San Diego. He completed his post-doctoral research training at Stanford University School of Medicine, Palo Alto, CA. He is currently the Chief of Pediatric Neurology at Massachusetts General Hospital, Boston, and the Joseph P. and Rose F. Kennedy Professor of Child Neurology and Mental Retardation, Harvard Medical School, Boston. His current research interests include neuronal ion transport and neural network dysfunction in epilepsy. Dr. Staley is an Associate Editor for the Journal of Neuroscience.
Terry Rickard received the B.S.E.E. and M.S.E.E. degrees from the Florida Institute of Technology, Melbourne, in 1969 and 1971, respectively, and the Ph.D. degree in engineering physics from the University of California San Diego, San Diego, in 1975. He has 39 years of experience in technology and financial organizations, all of it in management and technology development positions. His professional career includes both executive and research positions with Harris Corporation, Melbourne, Orincon Corporation, San Diego, OptiMark Technologies, Inc., Jersey City, NJ, Lockheed Martin, Denver, CO, and Distributed Infinity, Inc., Larkspur, CO. He has also served as the director of several companies, and is currently the Director of Thorium One, Vancouver, BC, Canada, and Resource Production Advisors, LLC, Denver. He consults for companies in the defense, financial, and mining industries, and develops proprietary trading algorithms for his own investment account. He has authored numerous technical publications in several branches of engineering, and in the fields of electronic market structure, matching algorithms, and trading strategies, which have appeared in refereed technical journals, books, and conference proceedings. In addition, he has authored several issued patents and current patents pending. His engineering technical expertise includes signal processing, optimization, neural networks, fuzzy and expert systems, and graphical knowledge representation and inference for machine intelligence. He has additional expertise in several financial engineering disciplines, including transaction systems, market structures, financial analytics, data mining, derivatives pricing, risk analysis, and trading strategies. His current research interests include type-2 fuzzy logic and conceptual spaces for knowledge representation and inference.
Javier Montero received the Ph.D. degree in mathematics from Complutense University of Madrid, Madrid, Spain, in 1982, and has been leading research projects since 1987. He is currently a Professor and Dean of the Faculty of Mathematics, Complutense University of Madrid. He is the author of more than 70 research papers in refereed journals such as Approximate Reasoning, Computational Intelligent Systems, Computer and Operational Research, European Journal of Operational Research, Fuzzy Sets and Systems, General Systems, IEEE Transactions on Neural Networks, IEEE Transactions on Systems, Man and Cybernetics, Information Sciences, Intelligent Systems, Journal of Algorithms, Knowledge Based Systems, Kybernetes, Kybernetika, Mathware, Multiple Valued Logic, New Mathematics and Natural Computation, Non-Linear Analysis, Omega, Pure and Applied Geophysics, Remote Sensing, Soft Computing, Top, and Uncertainty, Fuzziness and Knowledge-Based Systems, plus more than 70 refereed papers as book chapters. His current research interests include aggregation operators, preference representation, multicriteria decision aid, group decision making, system reliability theory, and classification problems, mainly viewed as applications of fuzzy set theory. Dr. Montero is currently the President of the European Association for Fuzzy Logic and Technology.
Lukasz Kurgan received the M.S. (Honors) degree in automation and robotics from the AGH University of Science and Technology, Krakow, Poland, in 1999, and the Ph.D. degree in computer science from the University of Colorado at Boulder, Boulder, in 2003. He joined the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada, in 2003, where he has been an Associate Professor since 2007. He has published over 80 peer-reviewed articles. His current research
interests include development and application of modern data mining methods in bioinformatics, with focus on analysis of sequence, structure, and function of biologically interesting macromolecules. Dr. Kurgan is an Associate Editor of BMC Bioinformatics, Neurocomputing, Open Proteomics Journal, Journal of Biomedical Science and Engineering, Open Bioinformatics Journal, and Protein and Peptide Letters, and has served on program committees of numerous conferences and workshops related to bioinformatics and data mining.
Krzysztof J. Cios received the M.S. and Ph.D. degrees from the AGH University of Science and Technology, Krakow, Poland, the MBA degree from the University of Toledo, Toledo, OH, and the D.Sc. degree from the Polish Academy of Sciences, Warsaw, Poland. He is currently a Professor and Chair of the Department of Computer Science, Virginia Commonwealth University, Richmond. His current research interests include neuroinformatics and data mining. His research has been funded by agencies such as NIH, NASA, NSF, NATO, and U.S. Air Force. He has published three books and about 200 journal and conference articles. Dr. Cios was the recipient of the Norbert Wiener Outstanding Paper Award, the Neurocomputing Best Paper Award, and the Fulbright Senior Scholar Award. He serves on editorial boards of several journals. He is a Foreign Member of the Polish Academy of Arts and Sciences.