Sensorimotor Control Learning Using a New Adaptive Spiking Neuro-Fuzzy Machine, Spike-IDS and STDP
Mohsen Firouzi¹,²,³, Saeed Bagheri Shouraki⁴, and Jörg Conradt¹,²,³
¹ Neuroscientific System Theory, Technische Universität München, Germany
² Bernstein Center for Computational Neuroscience, München, Germany
³ Graduate School of Systemic Neurosciences, Ludwig-Maximilians-Universität, München, Germany
⁴ Research Group of Brain Simulation and Cognitive Science, ACL, Sharif University of Technology, Tehran, Iran
{mohsen.firouzi,conradt}@tum.de
Abstract. From a systems perspective, the human mind deals with a high-dimensional, complex world as an adaptive Multi-Input Multi-Output (MIMO) complex system. This view is theorized by reductionism in the philosophy of mind, where the world is represented as a logical combination of simpler sub-systems so that humans can operate with less energy. On the other hand, humans usually use linguistic rules to describe and manipulate their expert knowledge about the world, in a way that is well modeled by Fuzzy Logic. But how can such a symbolic form of knowledge be encoded and stored in plausible neural circuitry? Based on these postulates, we propose an adaptive Neuro-Fuzzy machine that models a rule-based MIMO system as a logical combination of spatially distributed Single-Input Single-Output (SISO) sub-systems. Each SISO system, acting as the sensory and processing layer of the inference system, constructs a single rule, and learning is handled by a Hebbian-like Spike-Time Dependent Plasticity (STDP). To shape concrete knowledge about the whole system, the features extracted by the SISO neural systems (or equivalently the rules associated with them) are combined. To demonstrate the applicability of the system, a single-link cart-pole balancer, a sensorimotor learning task, has been simulated. The system receives reinforcement feedback from the environment and is able to learn to become an expert and achieve a successful motor control policy.
Keywords: Sensorimotor Control Learning, Spiking Neural Networks, Neuro-Fuzzy, Spike Time Dependent Plasticity, Cart-Pole balancing.
S. Wermter et al. (Eds.): ICANN 2014, LNCS 8681, pp. 379–386, 2014. © Springer International Publishing Switzerland 2014
1 Introduction
To explore human intelligence, the brain has always been studied from two general outlooks [1]: Micro-Level studies, which have led to connectionist paradigms in AI, e.g. ANNs; and Macro-Level studies, e.g. clinical research, which have led to Symbolism in AI, where machines model the real world by creating formal symbols and try to acquire knowledge by manipulating them and discovering their relations. Expert Systems and Fuzzy Rule Bases are good examples of symbolism. One open question, which is the main motivation of this work, is how a symbolic form of knowledge and rules can emerge from a biologically realistic, connectionist style of information processing like that of the brain. Hybrid systems in Machine Learning address this question [1]. In this paper we propose a new Adaptive Spiking Neuro-Fuzzy Inference System called Spike-IDS and evaluate it on sensorimotor learning. The architecture of Spike-IDS is motivated by a recursive Fuzzy algorithm called ALM [2]. Architecturally, the main inspiration of the proposed system is the functionally distributed structure of the brain. Data is encoded by delay coding through topographically arranged first-order Spike-Response-Model (SRM) neurons [4]. From a systems viewpoint, Spike-IDS and ALM are motivated by reductionism in the philosophy of mind, by which the real world is internalized by the human mind as a combination of partial knowledge (rules), or emerges from a combination of distributed modules [3]. With this architecture, a MIMO system is broken down into a set of SISO sub-systems, so information is fetched from only part of the whole system during learning and evaluation, without any recursion. This style of information processing has been shown to be faster than neural classifiers and even ANFIS [5]. To construct and modify the rules through the synaptic distribution, Hebbian-like STDP learning is used. In fact, each SISO system provides a single rule and captures two important features of a sensorimotor sub-space through its synaptic distribution: a) the input-output characteristic, or equivalently the consequent part of the rule; and b) its degree of contribution to the whole MIMO system, which is inversely related to the output standard deviation around the consequent and to the slope of the input-output surface. Finally, a Fuzzy rule base consolidates the sub-systems (rules) to shape a concrete form of sensorimotor transformation. As a practical application, a cart-pole sensorimotor learning task is evaluated.
The results show that the system can learn to balance the pole from sensorimotor experience without an external supervisor, in an actor-critic reinforcement learning scheme in which the actor acts as a high-level model of action selection in the primary motor cortex and the critic, like the Basal Ganglia, estimates the cost of an action [6]. Spike-IDS is described in the next section. Section 3 investigates an adaptive sensorimotor learning task, and Section 4 presents some concluding remarks.
2 Spiking Neuro-Fuzzy Inference System, Spike-IDS
As shown in Fig. 1, the sensory space ($x_1$ and $x_2$) is partitioned into fuzzy sets, and Spike-IDS consists of three general layers. In the input (sensory) layer, a single input ($x_1$ or $x_2$) activates the corresponding SISO systems, hereafter Spike-IDS units, according to the fuzzy membership values of the inputs. For instance, in the double-input single-output system of Fig. 1-left, if each input domain is segmented into two intervals (big and small), there are four Spike-IDS units in total, capturing the projected SISO points of the sensorimotor sub-spaces as follows: $X_{11} = \{(x_1, y) \mid x_2 \in [0, 0.5]\}$, $X_{12} = \{(x_1, y) \mid x_2 \in [0.5, 1]\}$, $X_{21} = \{(x_2, y) \mid x_1 \in [0, 0.5]\}$ and $X_{22} = \{(x_2, y) \mid x_1 \in [0.5, 1]\}$. The second (processing) layer of the algorithm extracts the characteristic trajectory of each SISO system, hereafter the Narrow path ($\psi_{ij}$ in Fig. 1), and its effectiveness in the entire system, the Spread value ($\sigma_{ij}$ in Fig. 1). The Spread reflects the deviation of the motor output and the derivative of the sensorimotor surface around the extracted Narrow path, and shows how much the sub-system contributes to the overall decision process. Finally, the extracted partial features of the sensory space (Narrow, Spread and fuzzy membership values) are combined by the Inference Layer to reach a unified decision (motor output). The consequent part of a single rule is determined by the characteristic of the corresponding Spike-IDS unit, or equivalently its Narrow path. During the on-line learning phase, each new sensorimotor experience, accompanied by its cost of action (see Section 3), results in a local adaptation of the SISO characteristic function, or equivalently of the consequent part of the associated Fuzzy rule. In this section the network architecture, data coding, the learning algorithm of each unit, and the inference layer are discussed in detail.
Fig. 1. Left: General architecture of Spike-IDS for a 2-input, 1-output system; Right: Structure of the RBF-like spiking neural network model for a single Spike-IDS unit (processing layer)
2.1 Structure of Spike-IDS Units, SISO Sub-systems
As depicted in Fig. 1-right, each Spike-IDS unit is constructed as a single-layer feed-forward network of SRM neurons with multiple delayed synaptic terminals and overlapping Gaussian receptive fields [4]. Each sub-synapse has a constant delay ($d^{k}$) and a plastic synaptic weight ($w_{ij}^{k}$). The membrane potential of post-synaptic neuron $j$ with $m$ sub-synaptic connections can be expressed as (1):
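$$x_j(t) = \sum_{i \in \Gamma_j} \sum_{k=1}^{m} w_{ij}^{k}\, \varepsilon\!\left(t - t_i - d^{k}\right), \qquad \varepsilon(t) = \frac{t}{\tau}\, e^{\left(1 - \frac{t}{\tau}\right)} \tag{1}$$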
where $\varepsilon$ is a simplified model of the post-synaptic potential [7]; $t_i$ is the firing time of the $i$-th pre-synaptic neuron; $d^{k}$ is the fixed delay of the $k$-th sub-synapse, arranged from zero to $m-1$ ($d^{k} \in \{0, 1, \dots, m-1\}$); and $w_{ij}^{k}$ is the $k$-th sub-synaptic weight between neurons $i$ and $j$. $\Gamma_j$ is the set of pre-synaptic neurons connected to post-synaptic neuron $j$. When the internal state variable $x_j$ exceeds the threshold voltage $\vartheta$, neuron $j$ fires.
2.2 Sensor Encoding
The inputs and outputs of Spike-IDS units are encoded using spatially arranged populations of SRM neurons with overlapping Gaussian receptive fields. We use population delay coding, in which a more active neuron fires earlier, and therefore contributes earlier to post-synaptic firing, than less active neurons. Fig. 2-left shows the RFs of a population of 8 neurons distributed over the normalized interval [0, 1]; the normalized firing times of these neurons for an input of 0.3 are shown in Fig. 2-right. Neurons with a firing time greater than 0.9 ms are considered silent. The center ($C_i$) and width ($\omega_i$) of the RF of the $i$-th neuron in a population of $n$ neurons are defined as:
$$C_i = \frac{2i - 3}{2(n - 2)}, \qquad \omega_i = \frac{1}{\gamma\,(n - 2)} \tag{2}$$
The number of neurons and the width of the RFs (adjusted by $\gamma$) tune the degree of fuzziness of the sensorimotor data in Spike-IDS units. Moreover, each neuron can fire only once within a specific time window, which is set to 10 ms with regard to the typical neuron refractory time [7] ($T_w$ in Fig. 2-right).
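A minimal NumPy sketch of this population delay coding follows. It computes the RF centers and width of (2) directly; the mapping from RF activation to spike time (normalized firing time 1 − activation, scaled into the 10 ms window $T_w$) is an assumption of this sketch, as is reading the 0.9 silence threshold as a normalized time. $\gamma = 1.4$ is borrowed from the parameters listed in Section 3, and `rf_params` and `delay_encode` are hypothetical names.

```python
import numpy as np

def rf_params(n, gamma):
    """Centers C_i (i = 1..n) and common width omega of n Gaussian RFs on [0, 1], per Eq. (2)."""
    i = np.arange(1, n + 1)
    centers = (2 * i - 3) / (2 * (n - 2))
    width = 1.0 / (gamma * (n - 2))
    return centers, width

def delay_encode(x, n=8, gamma=1.4, t_window=10.0, silent_level=0.9):
    """Encode a scalar x in [0, 1] as one spike time (ms) per neuron.

    Assumption: normalized firing time = 1 - g_i(x), where g_i is the Gaussian
    RF activation, so a more active neuron fires earlier. Neurons whose
    normalized firing time exceeds `silent_level` are silent (np.inf).
    """
    centers, width = rf_params(n, gamma)
    activation = np.exp(-((x - centers) ** 2) / (2 * width ** 2))
    norm_time = 1.0 - activation
    spike_times = norm_time * t_window          # scale into the firing window T_w
    spike_times[norm_time > silent_level] = np.inf
    return spike_times

# Example: encode the input 0.3 used in Fig. 2
print(np.round(delay_encode(0.3), 2))
```

Under these assumptions only the three neurons whose centers lie nearest to 0.3 fire (the one centered at 0.25 first), and the other five stay silent; a larger $\gamma$ narrows the RFs and silences more neurons.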
Fig. 2. Left: Gaussian receptive fields for 8 neurons encoding 0.3 as input; Right: General scheme of neuron spike-time delays for 0.3 as input.
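The SRM neuron of (1) can be sketched as below, as a minimal illustration rather than the authors' implementation. It assumes the usual causal form of the kernel ($\varepsilon(s) = 0$ for $s \le 0$), represents silent inputs by the spike time `np.inf`, and finds the firing time by scanning a fixed time grid; the names `epsilon`, `membrane_potential`, `firing_time` and the grid step `dt` are hypothetical.

```python
import numpy as np

def epsilon(s, tau):
    """PSP kernel of Eq. (1): (s/tau) * exp(1 - s/tau) for s > 0, else 0 (causal form assumed)."""
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    pos = s > 0  # also False for -inf, i.e. silent inputs contribute nothing
    out[pos] = (s[pos] / tau) * np.exp(1.0 - s[pos] / tau)
    return out

def membrane_potential(t, pre_times, weights, delays, tau):
    """x_j(t) = sum_i sum_k w_ij^k * eps(t - t_i - d^k), per Eq. (1).

    pre_times: (n_pre,) pre-synaptic spike times, np.inf for silent neurons
    weights:   (n_pre, m) sub-synaptic weights w_ij^k
    delays:    (m,) fixed sub-synaptic delays d^k
    """
    pre_times = np.asarray(pre_times, dtype=float)
    s = t - pre_times[:, None] - np.asarray(delays, dtype=float)[None, :]
    return float(np.sum(np.asarray(weights, dtype=float) * epsilon(s, tau)))

def firing_time(pre_times, weights, delays, tau, theta, t_max=40.0, dt=0.01):
    """First grid time at which x_j(t) reaches the threshold theta; np.inf if it never does."""
    for t in np.arange(0.0, t_max, dt):
        if membrane_potential(t, pre_times, weights, delays, tau) >= theta:
            return float(t)
    return np.inf
```

With the values reported in Section 3 (12 sub-synapses, so `delays = np.arange(12)`, $\tau = 3$, $\vartheta = 10$), `weights` would be a 15 × 12 array per output neuron. Since $\varepsilon$ peaks at 1 when $s = \tau$, a single input can raise $x_j$ by at most its sub-synaptic weight.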
2.3 Hebbian-Like Spike Time Dependent Plasticity
The learning algorithm in Spike-IDS networks is reinforcement Hebbian STDP [7], by which pre-synaptic neurons that contribute earlier to the firing of the post-synaptic neuron are rewarded and silent neurons are penalized. A learning window $L(\Delta t)$, which determines how the weights are modified as a function of the firing-time delay between post-synaptic and pre-synaptic neurons, is therefore defined by the following equations ($\Delta t_{ij}^{k} = t_i - t_j + d^{k}$):
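$$\Delta w_{ij}^{k} = \eta\, L\!\left(\Delta t_{ij}^{k}\right), \qquad w_{\text{init}} = 0, \qquad 0 < w < 3 \tag{3}$$
$$L(\Delta t) = (1 + b)\, e^{-\frac{(\Delta t - \delta)^2}{2(\kappa - 1)}} - b, \qquad \kappa = 1 - \frac{\nu^2}{2 \ln\!\left(\frac{b}{b + 1}\right)} \tag{4}$$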
This function potentiates the synapses between neurons $i$ and $j$ with rate $\eta$ if $\Delta t_{ij}^{k}$ lies within the reward neighborhood $\nu$ around $\delta$ ($|\Delta t_{ij}^{k} - \delta| < \nu$), and depresses them otherwise (Fig. 3-left). Because of the exponential model of the EPSP with time constant $\tau$ in (1), the firing of neuron $i$ does not contribute to the firing of neuron $j$ immediately after spike initiation. The learning window is therefore shifted slightly to take this into account ($\delta$ in Fig. 3-left is set to $-\tau$). In (4), the parameters $\nu$ and $b$ indicate the reward neighborhood and the penalty depth, respectively. Besides rewarding contributing neurons, silent neurons should be strongly penalized, so $\Delta w_{ij}$ for the sub-synaptic weights between silent input neurons and a fired output neuron is set to $-\eta$. It is worth mentioning that if Spike-IDS receives a reinforcement signal during learning (see Section 3 and Fig. 3-right), and the initiated control command was destructive according to the sensory inputs and the controller policy, $\Delta w_{ij}$ in (3) should be negative to erase the effect of the wrong sensorimotor mapping from the current internal model.
Fig. 3. Left: STDP learning window of Spike-IDS units; Right: Block diagram of the sensorimotor learning controller using Spike-IDS for the cart-pole task
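The learning window of (4) and the update rule described above can be sketched as follows. The defaults $b = 0.2$, $\nu = 5$, $\delta = -3$ and $\eta = 0.3$ are the values reported in Section 3. Clipping the weights to [0, 3], applying the sign flip for destructive commands only to the (3) term, and the names `learning_window` and `stdp_update` are assumptions of this sketch.

```python
import numpy as np

def learning_window(dt, b=0.2, nu=5.0, delta=-3.0):
    """L(dt) of Eq. (4): equals 1 at dt = delta, 0 at |dt - delta| = nu, and tends to -b far away."""
    kappa = 1.0 - nu ** 2 / (2.0 * np.log(b / (b + 1.0)))
    return (1.0 + b) * np.exp(-((dt - delta) ** 2) / (2.0 * (kappa - 1.0))) - b

def stdp_update(weights, pre_times, post_time, delays, eta=0.3, b=0.2,
                nu=5.0, delta=-3.0, w_max=3.0, destructive=False):
    """One Hebbian-like STDP step for the sub-synapses of one fired output neuron j.

    weights:     (n_pre, m) sub-synaptic weights w_ij^k
    pre_times:   (n_pre,) input spike times t_i, np.inf for silent neurons
    destructive: if True, the Eq. (3) term is sign-flipped (assumed reading of
                 the rule for commands judged destructive by the critic)
    """
    pre_times = np.asarray(pre_times, dtype=float)
    silent = ~np.isfinite(pre_times)
    # dt_ij^k = t_i - t_j + d^k; silent rows get a placeholder and are overwritten below
    t_i = np.where(silent, 0.0, pre_times)
    dt = t_i[:, None] - post_time + np.asarray(delays, dtype=float)[None, :]
    dw = eta * learning_window(dt, b, nu, delta)
    if destructive:
        dw = -dw
    dw[silent, :] = -eta  # silent inputs are always strongly penalized
    # keep weights inside the range stated with Eq. (3); clipping is an assumption
    return np.clip(weights + dw, 0.0, w_max)
```

For example, an input spiking 3 ms before the output spike through a zero-delay sub-synapse gives $\Delta t = -3 = \delta$, the peak of the window, so its weight grows by exactly $\eta$.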
2.4 Sensorimotor Characteristics, Narrow and Spread Decoding
In the evaluation phase, the Narrow and Spread values in response to a sensory input $x_{in}$ are decoded from the activity of the output population. The firing times of the output neurons indicate the fuzzy-like activation degrees of the topographically arranged neurons, so the Narrow value can be calculated simply by center-of-mass decoding over the firing times of the neurons. Similarly, the difference between the RF centers of the last and first fired neurons is a simple and efficient way of decoding the Spread:
$$\psi(x_{in}) = \frac{\sum_{j=1}^{m_o} T_j\, C_j}{\sum_{j=1}^{m_o} T_j}, \qquad \sigma(x_{in}) = C_{last} - C_{first} \tag{5}$$
where $\psi$ and $\sigma$ are the extracted Narrow and Spread for $x_{in}$, $m_o$ is the number of output neurons, $T_j$ is the firing time of output neuron $j$, and $C_j$ is the center of the RF of the $j$-th neuron. $C_{last}$ and $C_{first}$ are the centers of the receptive fields of the last and first fired neurons.
2.5 Inference Layer
The inference layer of the algorithm, which contains the constructed rules, uses the Narrow and Spread values to realize the characteristic of the entire system. For $N$ sensory inputs with $m_i$ partitions of the $i$-th sensory space ($x_i$), the number of rules and associated Spike-IDS units for the $i$-th input is denoted by $l_i$, and the total number of rules $L$ is given by:
$$L = \sum_{i=1}^{N} l_i = \sum_{i=1}^{N} \prod_{k=1,\, k \neq i}^{N} m_k, \qquad l_i = \prod_{k=1,\, k \neq i}^{N} m_k \tag{6}$$
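Equation (6) and the way units are indexed can be checked with a short Python sketch. Each unit is identified by its own input $x_i$ plus one segment index for every other input, as in the four-unit example of Fig. 1; computing a unit's truth degree as the minimum of those memberships is an assumption based on the Min–COGD composition used by the inference layer. The function names are hypothetical.

```python
from itertools import product
from math import prod

def rules_per_input(m):
    """l_i = prod_{k != i} m_k for each input i, per Eq. (6); m[i] = number of partitions of x_i."""
    return [prod(mk for k, mk in enumerate(m) if k != i) for i in range(len(m))]

def enumerate_units(m):
    """List every Spike-IDS unit as (i, antecedent): the unit sees input x_i and is
    selected by one fuzzy segment index j_s of every other input s."""
    units = []
    for i in range(len(m)):
        others = [s for s in range(len(m)) if s != i]
        for segs in product(*(range(m[s]) for s in others)):
            units.append((i, dict(zip(others, segs))))
    return units

def truth_degree(antecedent, memberships):
    """Min t-norm over the antecedent memberships (assumption); 0 means the unit is inactive.
    memberships[s][j] is the degree of x_s in fuzzy segment j."""
    return min((memberships[s][j] for s, j in antecedent.items()), default=1.0)

m = [2, 2]  # two inputs, two segments each (Fig. 1 example)
print(rules_per_input(m), sum(rules_per_input(m)), len(enumerate_units(m)))  # prints: [2, 2] 4 4
```

For three inputs with 3, 4 and 5 partitions, (6) gives $l = (20, 15, 12)$ and $L = 47$ units, which `enumerate_units` reproduces.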
The $k$-th rule of the $i$-th input variable, $R_{ik}$ ($k = 1, 2, \dots, l_i$), can be described as follows:
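$$R_{ik}: \text{if } x_1 \in A_1^{j_1} \wedge x_2 \in A_2^{j_2} \wedge \dots \wedge x_{i-1} \in A_{i-1}^{j_{i-1}} \wedge x_{i+1} \in A_{i+1}^{j_{i+1}} \wedge \dots \wedge x_N \in A_N^{j_N} \text{ then } Y = \psi_{ik}(x_i) \tag{7}$$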
where $A_S^{j_S}$ is the $j_S$-th segment of the $S$-th input ($1 \le S \le N$, $S \neq i$). In contrast with the learning phase, in the modeling phase the sub-domains are treated as fuzzy segments. Rule $R_{ik}$ is valid if all antecedent terms $A_S^{j_S}$ in (7) have non-zero membership degrees; unit $X_{ik}$ is then activated with the corresponding truth degree of $R_{ik}$, so its characteristic (Narrow, Spread) contributes correspondingly to the overall sensorimotor decision surface. The overall output is obtained by Min–COGD composition:
$$\hat{y} \text{ is } \beta_{11}\psi_{11} \text{ or } \dots \text{ or } \beta_{ik}\psi_{ik} \text{ or } \dots \text{ or } \beta_{N l_N}\psi_{N l_N} \tag{8}$$
where "or" is the S-norm union operation and $\beta_{ik}$ is a normalized combination of the reciprocal of the Spread and the truth degree of rule $R_{ik}$, given by (9):
$$\beta_{ik} = \frac{\alpha_{ik}\, \gamma_{ik}}{\sum_{p=1}^{N} \sum_{q=1}^{l_p} \alpha_{pq}\, \gamma_{pq}}, \qquad \alpha_{ik} = \log\!\left(\frac{1}{\sigma_{ik}}\right) \tag{9}$$
In (9), $\alpha_{ik}$ is the reciprocal value of the Spread of Spike-IDS unit $X_{ik}$ and $\gamma_{ik}$ is the truth degree of $R_{ik}$. The logarithm smooths the sharpness of the reciprocal Spread, leading to a smoother overall decision surface [5].
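The decoding of (5) and the weighting of (9) can be combined into a small NumPy sketch. Several assumptions are flagged in the code: (5) is written with the firing times $T_j$ as weights, whereas the sketch by default uses the earliness $T_w - t_j$ so that earlier (more active) neurons pull the Narrow value harder (set `earliness=False` to weight by raw firing times as (5) is written); the S-norm/Min–COGD composition of (8) is replaced by a plain weighted sum of $\beta_{ik}\psi_{ik}$; and the absolute value in the Spread and the floor `eps` on $\sigma$ are added for safety. Function names are hypothetical.

```python
import numpy as np

def decode_unit(out_times, centers, t_window=10.0, earliness=True):
    """Narrow (psi) and Spread (sigma) of one Spike-IDS unit, after Eq. (5).

    out_times: output spike times, np.inf for silent neurons
    centers:   RF centers C_j of the output neurons
    """
    out_times = np.asarray(out_times, dtype=float)
    centers = np.asarray(centers, dtype=float)
    fired = np.isfinite(out_times)
    if not fired.any():
        raise ValueError("no output neuron fired")
    t = np.where(fired, out_times, 0.0)
    # weights T_j: earliness t_window - t_j (assumed) or raw firing time (literal Eq. (5))
    T = np.where(fired, t_window - t if earliness else t, 0.0)
    psi = float(np.sum(T * centers) / np.sum(T))
    c_fired = centers[fired][np.argsort(out_times[fired])]  # fired neurons sorted by spike time
    sigma = float(abs(c_fired[-1] - c_fired[0]))             # |C_last - C_first|, abs assumed
    return psi, sigma

def combine_units(psi, sigma, gamma, eps=1e-6):
    """Weights beta_ik of Eq. (9) and a weighted-sum output.

    Assumption: the S-norm union / Min-COGD composition of Eq. (8) is replaced
    here by sum(beta_ik * psi_ik); beta is already normalized to sum to 1.
    sigma is floored at eps so that log(1/sigma) stays finite.
    """
    psi, sigma, gamma = (np.asarray(a, dtype=float) for a in (psi, sigma, gamma))
    alpha = np.log(1.0 / np.maximum(sigma, eps))
    beta = alpha * gamma / np.sum(alpha * gamma)
    return float(np.sum(beta * psi)), beta
```

For two equally true units with Narrow values 0.2 and 0.8 and Spreads 0.1 and 0.01, $\alpha$ is $\log 10$ and $\log 100$, so $\beta = (1/3, 2/3)$ and the combined output is 0.6: the narrower unit dominates, as (9) intends. Because $\beta$ is a ratio, the base of the logarithm does not matter.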
3 Sensorimotor Learning, Single Cart-Pole Balancer
To show how the proposed Adaptive Neuro-Fuzzy machine works in a real-world sensorimotor learning scenario, a single-link inverted pendulum, a well-known benchmark for sensorimotor learning, is investigated. The general architecture of the controller is shown in Fig. 3-right. The Action Selection Network (ASN) realizes the internal model of the sensorimotor function, and the Action Evaluation Network (AEN) generates the costs and rewards associated with the generated commands, as the Basal Ganglia do in the brain for motor control [6]. In fact, the ASN suggests an appropriate action (force signal $F_k$) in accordance with the current state ($\theta_k$, $d\theta_k$). Correspondingly, the AEN models the reward value ($r_k$) and indicates how valuable or costly the action is (according to the control policy $\theta = 0$, $d\theta = 0$). If the cost is too high, the Stochastic Action Modifier (SAM) regenerates a new uniform random action ($\tilde{F}_k$) with mean $F_k$; see (10). Finally, the output of the SAM is applied to the plant. The variance of the SAM is a function of $\hat{r}_k$, so that if the critic validates the suggested action as a low-cost action ($\hat{r}_k$ near 1), the action is changed with less deviation around the ASN output, and vice versa:
$$\tilde{F}_k = F_k + \tilde{N}_F \left(e^{-\hat{r}_k \alpha} - e^{-\alpha}\right) \tag{10}$$
where $\tilde{N}_F$ is a uniform random value between the upper and lower bounds of $F$, $\alpha$ is a constant parameter and $\hat{r}_k$ is the normalized value of $r_k$. The modified action alters the state of the plant, followed by feedback signals for the evaluation. The score of the previous action is then updated in the AEN according to (11).
Fig. 4. Top: critic surfaces, evolution of the AEN through learning iterations; Bottom: actor surfaces, evolution of the ASN through learning iterations
Fig. 5. Failure and success through learning iterations (vertical axis: failure/success, from −1 to 1; horizontal axis: epochs, 0 to 5000)
Fig. 6. (a) Angle response trajectory (pole angle in degrees vs. time in s; 5% settling time marked at 0.83 s), (b) angular velocity (degree/s), (c) force trajectory (N), (d) state transition plane (angular velocity vs. angle in degrees); initial state: θ = 0.7 rad, dθ = −4 rad/s
$$r_k^{new} = \frac{\Delta + \hat{r}_k + 0.5}{1.5}, \qquad \Delta = \lambda \left(\hat{r}_{k+1} - \hat{r}_k\right) \tag{11}$$
Thus, an action that generates a good state transition into a low-cost score ($r_k$ near 1) is rewarded, and bad actions are penalized. Since successful transitions are more desirable than failures, $\lambda$ for rewards is set near 1 and greater than $\lambda$ for penalties. The failure signal, which indicates an unrecoverable fall, occurs when the angle exceeds its boundary values. The ASN is initialized with uniform random data (Fig. 4a, bottom), and the AEN is initialized to 1 around the set point and to −0.5 in the boundary regions (Fig. 4a, top). Fig. 4 shows the evolution of the ASN and AEN through the learning iterations. As depicted in Fig. 5, early experiences result in more failures and fewer successes, whereas the success rate increases over the learning iterations. In this experiment, the mass and length of the pole are 200 g and 60 cm respectively, the mass of the cart is 500 g, and friction is neglected. The ASN and AEN are implemented as Spike-IDS with 121 rules (11×11 partitions); the Spike-IDS units have 15 input neurons, 25 output neurons and 12 sub-synaptic connections. The learning parameters were set experimentally to τ = 3, b = 0.2, δ = −3, ν = 5, γ = 1.4, η = 0.3 and ϑ = 10 mV, with an epoch number of 15. After 3000 epochs, the response of the final controller is evaluated with an initial angle and angular velocity twice as large as the boundaries of the learning phase. Fig. 6 shows the angle, angular velocity, applied force signal and state transition plane over 1.5 s. The results demonstrate successful control without any overshoot or undershoot, with a set-point settling time of 0.83 s.
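The two update rules of this section, the Stochastic Action Modifier (10) and the critic score update (11), are sketched below. The force bounds, $\alpha$, and the two $\lambda$ values used as defaults are hypothetical placeholders, not values reported in the paper, and treating a transition to a higher normalized score as a "reward" (which selects the larger $\lambda$) is an assumed reading of the text above. `sam` and `critic_update` are hypothetical names.

```python
import numpy as np

def sam(F_k, r_hat_k, F_min, F_max, alpha, rng):
    """Stochastic Action Modifier, Eq. (10).

    Adds a uniform random value N_F in [F_min, F_max] scaled by
    exp(-r_hat*alpha) - exp(-alpha): the perturbation vanishes for r_hat = 1
    (low-cost action) and grows as r_hat decreases (for alpha > 0).
    With symmetric bounds the modified action has mean F_k.
    """
    n_F = rng.uniform(F_min, F_max)
    return F_k + n_F * (np.exp(-r_hat_k * alpha) - np.exp(-alpha))

def critic_update(r_hat_k, r_hat_next, lam_reward=0.9, lam_penalty=0.5):
    """Critic score update, Eq. (11).

    Assumption: a transition to an equal or better normalized score counts as a
    reward and uses the larger lambda; otherwise the penalty lambda is used.
    """
    lam = lam_reward if r_hat_next >= r_hat_k else lam_penalty
    delta = lam * (r_hat_next - r_hat_k)
    return (delta + r_hat_k + 0.5) / 1.5

rng = np.random.default_rng(0)
F_tilde = sam(F_k=2.0, r_hat_k=0.2, F_min=-10.0, F_max=10.0, alpha=3.0, rng=rng)
r_new = critic_update(r_hat_k=0.5, r_hat_next=1.0)
```

Note that (11) maps an unchanged score of 1 back to exactly 1, so a well-rated state stays at the top of the normalized scale, while a move from 0.5 to 1.0 with $\lambda = 0.9$ raises the stored score to about 0.97.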
4 Conclusions and Remarks
In this work a new Adaptive Spiking Neuro-Fuzzy Inference machine called Spike-IDS is proposed, in which rules are extracted through spatially distributed spiking neural systems. Spike-IDS is mainly inspired by the ALM algorithm, in which a MIMO system is described by a logical combination of spatially distributed SISO sub-systems. The sensory and processing layers of this algorithm are implemented using biologically realistic principles, e.g. a spiking neural substrate and Hebbian STDP learning. A real-time sensorimotor learning task, the single inverted pendulum, is also investigated, and it is demonstrated that Spike-IDS can successfully learn the internal model of the plant and discover the cost-to-go through sensorimotor experience.
References
1. Kolman, E., Margaliot, M.: Knowledge-Based Neurocomputing: A Fuzzy Logic Approach. STUDFUZZ, vol. 234, pp. 1–5. Springer, Heidelberg (2009)
2. Shouraki, S.B., Honda, N., Yuasa, G.: Fuzzy interpretation of human intelligence. International Journal of Fuzziness and Knowledge-Based Systems 7(4), 407–414 (1999)
3. Polkinghorne, J.: Belief in God in an Age of Science, pp. 25–48. Yale University Press, New Haven (1998)
4. Bohte, S.M., La Poutré, H., Kok, J.N.: Unsupervised clustering with spiking neurons by sparse temporal coding and multilayer RBF networks. IEEE Transactions on Neural Networks 13(2), 426–435 (2002)
5. Firouzi, M., Shouraki, S.B., Afrakuti, I.E.P.: Pattern Analysis by Active Learning Method Classifier. Journal of Intelligent & Fuzzy Systems 26(1), 49–62 (2014)
6. Shadmehr, R., Smith, M.A., Krakauer, J.W.: A computational neuroanatomy for motor control. Exp. Brain Res. 185(3), 359–381 (2008)
7. Gerstner, W., Kistler, W.M.: Spiking Neuron Models, 1st edn. Cambridge University Press, Cambridge (2002)