In H. Malmgren, M. Borga & L. Niklasson (Eds) Artificial Neural Networks in Medicine and Biology. London: Springer, 2000 (Series: Perspectives in Neural Computing), pp. 271-276. Proceedings of the ANNIMAB-1 Conference, Göteborg, Sweden, 13-16 May 2000

Sensorimotor Sequential Learning by a Neural Network Based on Redefined Hebbian Learning

Karl Theodor Kalveram 1

Institute for General Psychology, Section Cybernetical Psychology and Psychobiology, Heinrich-Heine-University, Universitaetsstr. 1, 40225 Duesseldorf, Germany
e-mail: [email protected]
http://www.psycho.uni-duesseldorf.de/kalveram.htm

Abstract

A two-jointed arm is used to discuss the conditions under which a neural controller can acquire a precise internal model of a plant to be controlled without the help of an external supervisor. The problem can be solved by a 'modified Hebbian rule' ensuring convergence of the synaptic strengths, a feedforward network called 'power network', and a learning algorithm called 'auto-imitation'. The modified Hebbian rule describes a neuron that, in addition to a number of inputs with plastic weights, also has a teaching input established with a fixed synaptic weight. The power network, when built from modified Hebbian synapses, can acquire accurate models even of non-linear plants like the two-jointed arm. The auto-imitation algorithm provides the power network with the values to be attained by the network after learning. The training must be able to generate arbitrary movements, first of low velocity, then of higher velocity.

1 Introduction

'Hebbian learning', an old neuro-psychological concept describing the process of accomplishing associations at the synaptic level [4], is being increasingly confirmed by neuro-biological findings, especially those referring to long-term potentiation (LTP) [1], [17], [14], [3]. The question tackled in the present paper is whether an organism using such a mechanism is in principle capable of acquiring, for instance, feedforward controllers with the complexity and the high precision necessary for goal-directed low-level motor behaviour. Recently it has been shown that, at least for a simulated one-jointed arm, such an internal model can be established [12]. However, even this seemingly simple task turns out to require a complex learning environment of a particular structure. It demands a redefinition of the synaptic update mechanism in Hebbian learning, a special type of three-layered artificial neural network called "power network" [9], and the embedding of both network and limb system into a special learning algorithm called "auto-imitation" [7], [10]. The purpose of the present paper is to demonstrate that in this learning environment an internal model of a two-jointed arm can be acquired as well. Because of the inter-limb interactions due to reactive dynamic consequences and gravitational effects, this is a complex problem that is much more difficult to solve than the simple case of a one-jointed arm.

2 Theory

2.1 Redefined Hebbian learning means adaptive linear filtering

Given a neuron behaving completely linearly, Hebbian learning can be expressed by the updating rule w'i = wi + r⋅xi⋅z, where xi represents the (momentary) presynaptic activation at a synapse i, z the postsynaptic potential generated through all n synaptic connections of the neuron with other neurons, wi the momentary synaptic strength, and w'i the resulting synaptic strength in the next time step. r controls the learning rate. In terms of continuous time, this can be written as

    wi = r ⋅ ∫₀ᵀ xi ⋅ z dt ,   with   z = Σ_{j=1}^{n} wj ⋅ xj ,    (1)

where T is the duration of the learning phase. (1) suggests that the strength of Hebbian synapse i is given by the correlation between the two variables xi and z, and, as a consequence, that z represents the required values to be attained after learning. However, this interpretation leads to a serious problem in that the synaptic strength thus defined cannot converge in those cases where the correlation between the variables xi and z differs from zero. This means that wi depends on the learning duration T, and if T approaches infinity, wi goes either into positive or negative saturation [13]. However, to process continuous data the synaptic weights of a neuron must also attain scaled values between such limits. Different normalization methods can be applied to obtain convergence [15], but if they fail, equation (1) is not applicable. Another feature of Hebbian learning is that desired values cannot be applied directly to the neuron during learning, except under the unrealistic supposition of an enforcing input that overrides the activation given by all other inputs. However, both problems can be solved if a new synapse with the fixed weight of −1 is added to the neuron. This has the function of a "teaching input" that provides a target variable y to the neuron. This can be expressed by

    wi = −r ⋅ ∫₀ᵀ xi ⋅ (z − y) dt ,   with   z = Σ_{j=1}^{n} wj ⋅ xj   and   T → ∞ ,    (2)

where −y represents that part of the postsynaptic potential that originates from the teaching input. If weights can be selected such that the weighted sum z of x1 to xn finally attains y, the problems of convergence and of teaching are both solved in a biologically plausible manner. Equation (2) also formally embodies the delta rule, formerly considered not deducible from original Hebbian learning. Obviously, (2) implies that learning and the application of that learning must be assigned to mutually exclusive phases, and that in the application phase the teaching input must be set such that y=0. In original Hebbian learning this teaching input is not provided [1], and thus (2) can be regarded as representing a modified type of Hebbian learning. Assuming that the transfer characteristic between the postsynaptic potential of the neuron and its output is linear, Fig. 1 provides an analogue visualization of (2) in terms of the neuron model of McCulloch and Pitts.

1 Supported by grants 417/18-2 and 417/24-1 from the Deutsche Forschungsgemeinschaft (DFG)


Figure 1: Hebbian synaptic learning redefined. Following (2), during learning the integrator I accumulates the products put out by multiplier i. After learning, the connection to synapse i is cut, and y is set to zero. Now I holds the value representing the synaptic strength wi. All other n synaptic units operate in the same manner. If learning ends with a→0, the output a now automatically attains the desired value of y for all combinations of values of the input variables x1, ..., xn.

Notice that the teaching input neither forces the neuron to attain a postsynaptic potential equal to the input value y, nor serves as a reinforcing input signalling back an error provided by an external supervisor. The intracellular postsynaptic potential itself mirrors the error z−y, and has to be minimized (relaxation principle). In terms of adaptive signal processing, the neuron of Fig. 1 with teaching input is an "adaptive linear combiner" capable of "adaptive linear filtering". As has been pointed out [16], the adaptation procedure, if conducted appropriately and using finite time intervals, approaches the minimum error step by step in a least-mean-squares steepest-descent manner, with r regulating the decrease of error per step.
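As a concrete illustration, the continuous rule (2) can be discretized into the familiar LMS update. The following Python sketch is not from the paper: the function name, the toy data, and the target weights are illustrative, and the update is simply wi ← wi − r⋅xi⋅(z − y) applied sample by sample.

```python
import numpy as np

def train_hebbian_neuron(x, y, r=0.05, epochs=50):
    """Discrete-time sketch of the redefined Hebbian rule (2):
    each sample moves w_i by -r * x_i * (z - y), an LMS update."""
    w = np.zeros(x.shape[1])        # synapses start at zero strength
    for _ in range(epochs):
        for xt, yt in zip(x, y):
            z = w @ xt              # postsynaptic potential from the regular inputs
            w -= r * xt * (z - yt)  # teaching input (fixed weight -1) supplies -y
    return w

# Illustrative target relation: y = 0.5*x1 - 0.25*x2
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = x @ np.array([0.5, -0.25])
w = train_hebbian_neuron(x, y)      # w should approach (0.5, -0.25)
```

Because the target is exactly realizable as a weighted sum of the inputs, the update drives z − y toward zero and the weights converge, in contrast to the unmodified rule (1).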

2.2 The power network realizes multi-dimensional power series

Much evidence exists that the central nervous system operates both additively and multiplicatively, at least at the spinal level. Processes like sensory gating or descending modulation of reflexes [2] suggest that in these cases premotor efferent activity is a product of two or more variables. To map this property of the nervous system, the "power network" [9] can be used. This is a three-layer SIGMA-PI network with feedforward architecture, fixed synaptic weights in the hidden layer, and plastic weights in the output layer. The hidden neurons multiply potentiated variables coming from the input layer, and the output neurons then compute a weighted sum of these products. With M = number of output neurons and K = number of input neurons, the power network represents M abbreviated K-dimensional power series, or even Taylor series, which are known to approximate any function to every required degree of precision. In the present investigation, such a power network was used, established with 6 input neurons, 11 hidden nodes, 9 modified Hebbian synapses for the first output neuron, and 6 such synapses for the second output neuron.
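The paper does not list the exponents chosen for its 11 hidden nodes, so the following Python sketch assumes a generic variant: every monomial of the inputs up to a chosen degree serves as one hidden (product) node, and each output neuron forms a weighted sum of these products, i.e. a truncated multi-dimensional power series.

```python
import numpy as np
from itertools import combinations_with_replacement

def power_features(u, degree=2):
    """Hidden layer of a SIGMA-PI 'power network' sketch: each hidden node
    multiplies (powers of) input variables; here, all monomials up to `degree`."""
    feats = [1.0]  # constant term of the power series
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(u)), d):
            feats.append(np.prod([u[i] for i in idx]))
    return np.array(feats)

def power_net(u, W):
    """Output layer: plastic weights W form M weighted sums of the hidden
    products -> M truncated K-dimensional power series."""
    return W @ power_features(u)
```

With K = 2 inputs and degree 2 this yields 6 hidden nodes (1, u1, u2, u1², u1·u2, u2²); the network of Section 2.2 would instead use a hand-picked subset of such monomials with fixed hidden weights.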

2.3 The auto-imitation algorithm needs no external teacher

When modelling a plant, an autonomous controller has to solve the problem of obtaining an internally produced teaching signal. An appropriate algorithm has been proposed by Kalveram [7], later also called "auto-imitation" or "self-imitation" [8].


Figure 2: Neural controller learning an inverse model of the plant (two-jointed arm of equation (3)) in the framework of auto-imitation. The plant to be controlled is given by its tool transformation, i.e. the rule governing the behaviour of the plant in the forward direction (from torques to kinematics).

In Fig. 2, the two-jointed arm movable in a plane, as given in (3), demonstrates how the principle of auto-imitation is applied. In the learning phase, the plant receives arbitrary torques from the "blind teacher". The outputs z1, z2 of the power network are not connected to the inputs Q1, Q2 of the plant. These torques produce kinematic values of angular acceleration (a1, a2), velocity (v1, v2) and position (p1, p2) at joint 1 (shoulder) and joint 2 (elbow) according to (3). The kinematic outputs are proprioceptively measured and fed back into the network's regular inputs u1 to u6. The torque values, also measured proprioceptively, are simultaneously fed to the teaching inputs te1 and te2. They function as targets to be attained after learning. This setup should enable the modified Hebbian synapses of the output layer to attain the right strengths. After training, the blind teacher's outputs are removed, the outputs (z1, z2) of the network are linked with the plant's inputs, and the connections between u1, u2 and a1, a2 are cut. The position and velocity variables remain connected with the network, because they are required as state feedback. Now, desired angular accelerations (at1, at2) can be fed into the inputs u1, u2 of the network, which computes torque outputs forcing the plant to attain the desired accelerations. It should be noticed that this auto-imitation is not only a "direct method of inverse modelling" [6], but can, as previously pointed out [7], [11], also be related immediately to the reafference principle [5].
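The learning phase of Fig. 2 can be sketched as a loop in which the blind teacher drives the plant while the network only listens: measured kinematics enter the regular inputs, measured torques enter the teaching inputs, and the output-layer weights adapt by rule (2). The Python fragment below is a hedged sketch, not the paper's implementation: the plant is replaced by a hypothetical toy system of two decoupled joints (the real plant (3) is coupled and non-linear), and `features` stands in for the power network's hidden layer.

```python
import numpy as np

def auto_imitation_training(plant_step, features, torque_teacher, w,
                            r=1e-4, steps=20000, dt=0.01):
    """Sketch of the auto-imitation learning phase: the blind teacher drives
    the plant, kinematics feed the regular inputs, torques feed the teaching
    inputs, and the output weights adapt by the redefined Hebbian rule (2)."""
    state = np.zeros(4)                        # (pos1, pos2, vel1, vel2), illustrative
    for k in range(steps):
        Q = torque_teacher(k * dt)             # arbitrary torques from the blind teacher
        acc, state = plant_step(state, Q, dt)  # plant responds; kinematics are measured
        u = features(acc, state)               # network inputs (hidden-layer products)
        z = w @ u                              # network outputs, NOT fed to the plant
        w = w - r * np.outer(z - Q, u)         # teaching inputs te1, te2 carry targets Q
    return w

# Toy stand-in plant: two decoupled joints with inertia 0.1 kg*m^2 each.
def toy_plant(state, Q, dt):
    acc = Q / 0.1
    vel = state[2:] + acc * dt
    pos = state[:2] + vel * dt
    return acc, np.concatenate([pos, vel])

teacher = lambda t: np.array([4 * np.sin(2 * np.pi * 0.1 * t),
                              2 * np.sin(2 * np.pi * 0.5 * t)])
w = auto_imitation_training(toy_plant, lambda acc, state: acc, teacher, np.zeros((2, 2)))
# w should approach the toy plant's inverse model Q = 0.1 * acc
```

After training, the learned weights map desired accelerations back to torques, which is exactly the role the network takes over in the application phase.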

3 Simulation of learning to control a two-jointed arm

The simulation experiment is based on the two-jointed arm, governed by the coupled differential equations [8]

    A⋅φ̈1 + C⋅φ̈2 − D⋅φ̇2² − 2D⋅φ̇1⋅φ̇2 − E + B1⋅φ̇1 = Q1 = K1⋅(φ01 − φ1)
    B⋅φ̈2 + C⋅φ̈1 + D⋅φ̇1² − F + B2⋅φ̇2 = Q2 = K2⋅(φ02 − φ2)            (3)

where A, B, C, D, E, F denote coefficients depending on the masses, lengths, moments of inertia, and angular positions φi of both limbs. E and F refer to gravitationally induced torques. Bi and Ki (i=1,2) represent the coefficients of viscous damping and stiffness, respectively, caused by muscular activity, which is considered to drive the limbs by displacing the mechanical equilibrium positions φ0i of the joints. For simulation, the length and mass of each arm segment were set to 0.5 m and 1 kg, respectively. Mass was assumed to be symmetrically distributed over the segment length, yielding moments of inertia of 0.1 kg⋅m². The selected values of damping and stiffness were Bi=0.5 N⋅m⋅s and Ki=5 N⋅m. Initially, all Hebbian synapses were set to zero strength. The training runs lasted 300 s. The blind teacher generated two approximately sinusoidal torque patterns, which were sent to the torque inputs of the arm and the teaching inputs of the power net. In condition 'hom', the torque frequencies were f1=0.01 Hz and f2=0.05 Hz during the whole training. In condition 'spl', the training phase was split: in the first half, frequencies of f1=0.01 Hz and f2=0.05 Hz were applied as before; in the second half, f1=0.1 Hz and f2=0.5 Hz. The amplitudes in both conditions were set to 4 and 2 N⋅m. All computations were performed using MATLAB and SIMULINK.
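The two training schedules can be written down directly from the figures above. The Python sketch below assumes a plain sine waveform with zero phase, which is an assumption: the paper only states that the torque patterns were "approximately sinusoidal".

```python
import numpy as np

def blind_teacher(t, condition='spl', total=300.0):
    """Training torques of Section 3: 'hom' keeps f1=0.01 Hz, f2=0.05 Hz
    throughout; 'spl' switches to ten-fold frequencies (0.1 Hz, 0.5 Hz)
    in the second half of the 300 s session."""
    if condition == 'spl' and t >= total / 2:
        f1, f2 = 0.1, 0.5
    else:
        f1, f2 = 0.01, 0.05
    Q1 = 4.0 * np.sin(2 * np.pi * f1 * t)  # amplitude 4 N*m, shoulder torque
    Q2 = 2.0 * np.sin(2 * np.pi * f2 * t)  # amplitude 2 N*m, elbow torque
    return Q1, Q2
```

The split schedule exposes the arm first to slow, gravity-dominated movements and then to fast movements in which the velocity-dependent interaction torques dominate.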


4 Results and discussion

Table 1 shows that very exact identifications of the inverse model could be achieved with the split training procedure 'spl'. This uses low-velocity torque trajectories to train the network in the first half of a training session, followed by high-velocity trajectories in the second half. In the low-velocity part, predominantly synaptic connections responsible for the compensation of gravitational forces were established, while in the high-velocity part the synapses necessary for balancing out dynamical interactions between the limb segments were programmed. In the homogeneous condition 'hom', however, model identification was poor. Hence, unlike the simultaneous LSQ rule described in [8], [10], the sequential algorithm applied in the present paper did not produce reasonable synaptic weights in every case. The question whether high-precision low-level sensorimotor learning is possible in principle, using a setup including auto-imitation and a power network with synapses of the redefined Hebbian type, can thus be answered with "yes", as far as the training is appropriate. Otherwise, "life-long" motor deficiencies can occur.

Table 1: "True" synaptic weights of the synapses S1-S15, and the weights finally attained in simulation runs under the training conditions 'hom' and 'spl'.

Syn.    true    hom     spl
S1       .45     .37     .45
S2       .125    .001    .126
S3       .1     -.12     .1
S4      -.125   -.16    -.124
S5      -.25     .23    -.25
S6       .125    .362    .126
S7       .5      .38     .50
S8      7.5     5.7     7.5
S9      2.5     4.1     2.5
S10      .1     -.6      .1
S11      .125    .475    .125
S12      .1     -.01     .10
S13      .125    .009    .125
S14      .5      .68     .5
S15     2.5     1.6     2.5

References

[1] Brown, T.H. and S. Chattarji, 1995. Hebbian synaptic plasticity. In: M.A. Arbib (Ed.) The handbook of brain theory and neural networks. Cambridge: The MIT Press (pp. 454-459)
[2] Gossard, J.P. and S. Rossignol, 1990. Phase-dependent modulation of dorsal root potentials evoked by peripheral nerve stimulation during fictive locomotion in the cat. Brain Research 537, 1-13
[3] Hagemann, G., Redecker, C., Neumann, D., Haefelin, T., Freund, H.J. and O.D. Witte, 1998. Increased long-term potentiation in the surround of experimentally induced focal cortical infarction. Ann. Neurol. 44, 255-258
[4] Hebb, D.O., 1949. The organization of behaviour. New York: Plenum Press
[5] Holst, E. von and H. Mittelstaedt, 1950. Das Reafferenzprinzip. Naturwissenschaften 37, 464-476
[6] Jordan, M.I., 1988. Supervised learning and systems with excess degrees of freedom. COINS Technical Report 88-27, 1-41
[7] Kalveram, K.Th., 1981. Erwerb sensumotorischer Koordinationen unter störenden Umwelteinflüssen: Ein Beitrag zum Problem des Erlernens von Werkzeuggebrauch. (English title: Acquisition of sensorimotor co-ordinations under environmental disturbances. A contribution to the problem of learning to use a tool.) In: L. Tent (Ed.) Erkennen, Wollen, Handeln. Festschrift für Heinrich Düker (pp. 336-348). Göttingen: Hogrefe
[8] Kalveram, K.Th., 1992. A neural network model rapidly learning gains and gating of reflexes necessary to adapt to an arm's dynamics. Biological Cybernetics 68, 183-191
[9] Kalveram, K.Th., 1993a. Power series and neural-net computing. Neurocomputing 5, 165-174
[10] Kalveram, K.Th., 1993b. A neural network enabling sensorimotor learning: Application to the control of arm movements and some implications for speech-motor control and stuttering. Psychological Research 55, 299-314
[11] Kalveram, K.Th., 1998. Wie das Individuum mit seiner Umwelt interagiert. Psychologische, biologische und kybernetische Betrachtungen über die Funktion von Verhalten. (English title: How the individual interacts with its environment. Psychological, biological and cybernetical considerations about the function of behaviour.) Lengerich: Pabst
[12] Kalveram, K.Th., 1999. A modified model of the Hebbian synapse and its role in motor learning. Human Movement Science 18, 185-199
[13] MacKay, D.J. and K.D. Miller, 1990. Analysis of Linsker's simulations of Hebbian rules to linear networks. Network 1, 257-297
[14] Pennartz, C.M.A., 1997. Reinforcement learning by Hebbian synapses with adaptive thresholds. Neuroscience 81, 303-319
[15] Shouval, H.Z. and M.P. Perrone, 1995. Post-Hebbian learning rules. In: M.A. Arbib (Ed.) The handbook of brain theory and neural networks. Cambridge: The MIT Press (pp. 745-748)
[16] Widrow, B. and S.D. Stearns, 1985. Adaptive signal processing. Englewood Cliffs: Prentice-Hall
[17] Wilson, R.I., J. Yanovsky, A. Gödecke, D.R. Stevens, J. Schrader, and H.L. Haas, 1997. Endothelial nitric oxide synthase and LTP. Nature 386, 338
