Int. J. Intelligent Information and Database Systems, Vol. 4, No. 1, xxxx

A Self-Organizing Multiple Model Architecture for Motor Imitation

Axel Tidemann* and Pinar Öztürk
Self-Organizing Systems Group, Department of Computer Science,
Norwegian University of Science and Technology,
Sem Sælandsvei 7-9, 7491 Trondheim, Norway
Fax: +47 7359 4466
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: Learning by imitation allows humans to easily transfer motor knowledge between individuals. Our research is aimed at equipping robots with imitative capabilities, so humans can simply show a robot what to do. This will greatly simplify how humans program robots. To achieve imitative behaviour, we have implemented a self-organizing connectionist modular architecture on a simulated robot. Motion tracking was used to gather data of human dance movements. When imitating the dance movements, the architecture self-organizes the decomposition of movements into submovements, which are controlled by different modules. The modules both collaborate and compete for control during the movement. The trajectory recorded during motion tracking was repeated, revealing recurring neural activation patterns in the inverse models (i.e. controllers), indicating that the modules specialize on specific parts of the trajectory.

Keywords: Human Robot Interaction, Agent Architectures, Neural Networks, Imitation Learning

Reference to this paper should be made as follows: Axel Tidemann and Pinar Öztürk (xxxx) 'A Self-Organizing Multiple Model Architecture for Motor Imitation', International Journal of Intelligent Information and Database Systems, Vol. 4, No. 1, pp.xxx–xxx.

Biographical Notes: Axel Tidemann is a PhD student at the Norwegian University of Science and Technology (NTNU), where he obtained his MSc in Computer Science. His research focus is on learning by imitation, and how this mechanism can be used to model human expressiveness.

Pinar Öztürk is an associate professor at the Norwegian University of Science and Technology (NTNU). She has an MSc in Computer Science from the University of Oslo and a PhD in artificial intelligence from NTNU. Her current research interests are at the intersection of artificial intelligence, cognitive psychology, neuroscience, and control theory. She is particularly interested in understanding how AI hypotheses and systems can be informed by findings from other relevant research disciplines. A special focus in her research is imitative learning and the involved internal representations. She has previously worked with knowledge modeling and case-based reasoning, and multi-agent systems.

Copyright © 200x Inderscience Enterprises Ltd.

1 Introduction

The ability to learn motor skills by imitating another human being is regarded as an important part of human behaviour. Meltzoff and Moore [1997] believe this is an innate mechanism in humans, since it is observable in neonates only minutes after birth. Piaget [1962] relates imitation to the adaptation of sensory-motor schemas to the external world. Rizzolatti et al. [1996] discovered neurons with recurring activation both when observing and when producing the same action. These neurons were dubbed "mirror neurons", and are hypothesized to be building blocks of imitation [Schaal, 1999], language [Arbib, 2002] and mind reading [Gallese and Goldman, 1998].

Within the AI community, research on imitation learning can be divided into two groups: those solving the correspondence problem, i.e. the transformation from an extrinsic to an intrinsic coordinate system [Nehaniv and Dautenhahn, 2002], and those assuming that sensory input has already been transformed into a meaningful representation for the observer, ready to be used in an action-perception system [Schaal, 1999]. Model-based learning has been regarded as the most interesting approach to implementing imitation learning [Schaal, 1999]. A model-based approach consists of pairing an inverse model (i.e. controller) with a forward model (i.e. predictor). This approach has been taken by Demiris and Hayes [2002], Demiris and Khadhouri [2006] and Wolpert et al. [2003]. Besides computational neuroscience and robotics, the use of coupled inverse and forward models is also well-known in the control literature [Jordan and Rumelhart, 1992]. Wolpert et al. [1998] argue that the cerebellum contains inverse/forward model pairings, and draw upon biological inspiration to develop an architecture that uses inverse/forward models. Our architecture is deeply influenced by the work of Demiris and Wolpert, and is implemented using Echo State Networks.

We have selected dance as the current field of study, since it clearly demonstrates the need for an imitative capability: students watch the teacher, and use nothing other than visual input to guide their own motor systems. Apart from imitating the desired movement, our architecture self-organizes the decomposition of movements into submovements. These submovements are learned by modules that become experts on that particular part of the trajectory. Since these decompositions of the movement lead to specialized modules, we regard the decompositions as motor primitives. These motor primitives form the basis of the observed complex movements, and we try to understand how the architecture carves out these niches in the motor control space.

2 The Multiple Paired Models Architecture

Our multiple paired models architecture (abbreviated MPMA) is inspired by Demiris' HAMMER [Demiris and Khadhouri, 2006] and Wolpert's MOSAIC [Wolpert et al., 2003] architectures. Both HAMMER and MOSAIC employ multiple paired inverse/forward models, and are architectures for motor learning and control used in an imitation setting. The MPMA seeks to combine the best of both architectures. The HAMMER architecture has a consistent inverse/forward pairing both when observing and when executing an action. MOSAIC orders observation and execution differently: when observing an action, the ordering is the same as that of HAMMER. The difference arises when MOSAIC executes an action; then the outputs of all the inverse models are summed and given as input to the forward models, and the prediction errors of the forward models are used to assign responsibilities to the inverse models. This subtle difference is not present in the MPMA, since it complicates the architecture by requiring two modi operandi.

Both HAMMER and MOSAIC have mechanisms to arbitrate between inverse models. The HAMMER architecture uses a confidence measure, based on the prediction error of the corresponding forward models. The MOSAIC architecture uses the same approach, but adds a responsibility predictor that uses context information to determine which inverse model is appropriate for controlling the robot prior to movement. We regard the responsibility predictor as an important part of building an architecture that can be predictive (i.e. knowing what to do before any movement is initiated), as opposed to the more reactive approach of continuously comparing the predicted state with the actual state (which can only be done after the actual state has occurred).

The use of multiple paired inverse/forward models (i.e. splitting motor control over different neural networks) is similar to Jacobs' mixture of experts [Jacobs et al., 1991]. Furthermore, by spreading motor knowledge across several models, the architecture can encode redundancy, an important part of robust intelligent systems [Pfeifer and Scheier, 2001]. This approach can also avoid the effects of catastrophic forgetting [Ans et al., 2004], where learning new concepts destroys those previously stored, since the representational capacity of the architecture can be augmented by adding more models (although this is not done in the current implementation). The modular architecture allows for the study of low-level neural signals from different models, and further insight into how and why the models divide the movement to be imitated into smaller submovements. Farrar and Zipser [1999] propose that a modular structure where different neural networks are responsible for different submovements can efficiently deal with the complexity that arises when controlling 3D movements.

The MPMA has the advantage of both local and distributed representations. The local representation corresponds to an inverse/forward model coupling, and by examining the input/output relationship of the inverse/forward models, it is possible to tell where different concepts are stored. The neural networks used to implement the inverse/forward models are distributed representations, tolerant of noise and faulty network nodes. The architecture will be explained in more detail in the following section. In the following text, the term module is used to group three models together: the inverse model, forward model and the responsibility predictor.

2.1 The Models in the Architecture

The MPMA consists of several modules, each with a paired inverse/forward model and a responsibility predictor. The architecture is shown in figure 1. Imitation of a movement involves activation of different modules at different points of the movement. The dashed arrows show the error signals for all models, demonstrating how all the neural networks start from a random state and are trained based on the self-organization of the architecture. The models will now be explained in more detail.

The inverse model is a motor controller or behaviour. It has two input signals: the current state x_t and the desired state x'_{t+1} of the system. The state x_t describes the position of the system in a certain coordinate system. The task of the inverse model is to issue the motor commands u^i_t that will achieve the desired state, given the current state. The error signal is given by the feedback motor error command, u_feedback. The u_feedback is a simple way to pull the system towards the desired state when the inverse models issue bad motor commands [Kawato, 1990], typically during the early stages of training. The corrective motor command u_feedback is based on the difference between the target state x'_{t+1} and the actual state x_{t+1}, even though motor commands and state are not in the same coordinate system (state is typically coordinates, whereas motor commands are joint velocities or torques). There are many ways to achieve a desired state in redundant robot systems [Jordan and Rumelhart, 1992]. The advantage of using the u_feedback is that it guarantees that a solution for training the inverse model can be found. Since it is based on the difference between the desired and actual state, its influence will decrease as the performance of the system increases. Another advantage is increased robustness, since it acts as an online correction controller: if there are sudden perturbations or noise that the inverse and forward models are unable to cope with, the u_feedback will pull the system towards the desired state.

The forward model is a predictor. Its inputs are the current state x_t and the motor commands u^i_t of its paired inverse model. The forward model predicts the consequences of the motor commands applied to the current state of the robot, i.e. the next state of the system x̂^i_{t+1}. The difference between the actual next state and the predicted state is used as the error signal to train the forward model.

The responsibility predictor (RP) is another predictor. Based on the context signal y_t, the RP predicts the suitability p^i_t of the module to control the robot prior to movement. How do the forward model and the RP differ? The forward model predicts the consequences of the motor commands issued by its paired inverse model. The RP uses the context signal (i.e. something other than the state of the system and the motor commands) to predict how well its own module will perform, before any motor commands are issued. Wolpert uses the following example: if there are different inverse models for lifting a cup depending on whether it is full or empty, the context (full/empty) can aid the system in picking the correct inverse model. Without the RP, the system would not be able to tell which of the inverse models was most appropriate for the task, and might have to try several before finding the best one. Such an approach will perform worse than one where the best inverse model is selected immediately. We hypothesize that this model is crucial for building a predictive architecture. The RP's error signal is taken from the λ_t vector (explained shortly).
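To make the per-module data flow concrete, here is a minimal sketch of what one module consumes and produces at a single timestep. This is our illustration, not the authors' MATLAB code; the three networks (ESNs in the paper) are abstracted as plain callables, and all names are ours:

```python
class Module:
    """One MPMA module: a paired inverse/forward model plus a
    responsibility predictor (RP). The three networks are left
    abstract here; in the paper each one is an Echo State Network."""

    def __init__(self, inverse_net, forward_net, rp_net):
        self.inverse_net = inverse_net  # (x_t, x_desired) -> motor command u
        self.forward_net = forward_net  # (x_t, u) -> predicted next state
        self.rp_net = rp_net            # context y_t -> suitability p in [0, 1]

    def step(self, x_t, x_desired, y_t):
        u = self.inverse_net(x_t, x_desired)  # motor command u^i_t
        x_pred = self.forward_net(x_t, u)     # prediction x-hat^i_{t+1}
        p = self.rp_net(y_t)                  # responsibility prior p^i_t
        return u, x_pred, p
```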


A module outputs a confidence signal λ^i_t representing how much control it should exert over the robot. This is calculated by multiplying p^i_t with the likelihood l^i_t. The likelihood is a scalar based upon the predictive performance of the forward model (which is a vector), obtained by summing the difference between the predicted next state and the actual next state. The l^i_t assumes the presence of Gaussian noise, as shown in equation (1):

$$l^i_t = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-|x_t - \hat{x}^i_t|^2 / 2\sigma^2} \qquad (1)$$

All the confidence signals are normalized into the final λ_t vector. This vector is multiplied with the corresponding inverse model outputs before all the motor commands are summed and applied to the robot. Modules that make good predictions will have more influence over the robot than modules with bad predictions. This is how the architecture implements switching of control between modules. Furthermore, it allows multiple modules to collaborate in controlling the robot. The λ signal also promotes modules with good predictions, since it gates the error signal of each module: the error signal is multiplied with the corresponding λ value, which allows good modules to further refine their models.
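A minimal sketch of this arbitration step, implementing equation (1) and the normalization/gating just described; function and variable names are ours, not the paper's:

```python
import numpy as np

def likelihood(x_actual, x_pred, sigma):
    """Equation (1): scalar likelihood of a forward model's prediction,
    assuming Gaussian noise. The prediction error vector is reduced to
    a scalar by summing the absolute differences."""
    err = np.sum(np.abs(x_actual - x_pred))
    return np.exp(-err**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def arbitrate(x_actual, predictions, p, motor_commands, sigma=3.0):
    """lambda^i_t = normalize(p^i_t * l^i_t); the final motor command is
    the lambda-weighted sum of the inverse models' outputs."""
    l = np.array([likelihood(x_actual, x_hat, sigma) for x_hat in predictions])
    lam = p * l
    lam = lam / (np.sum(lam) + 1e-12)        # normalized lambda_t vector
    u_t = np.sum(lam[:, None] * np.asarray(motor_commands), axis=0)
    return u_t, lam
```

The same λ values would also gate each module's learning signal, so that modules already predicting well are the ones refined further.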

2.2 Input/output of the MPMA

The desired state was the 3D coordinates of the elbow and wrist positions of both arms of the demonstrator. The elbow coordinates were normalized to the range [−1, 1], with the shoulder as the origin. The wrist position was normalized to the range [−1, 1], with the elbow as the origin. The state of the robot was defined in the same way to overcome the correspondence problem [Nehaniv and Dautenhahn, 2002]. Findings in neuroscience suggest a geometric stage where sensory input is transformed into postural trajectories that are meaningful to the motor system of the observer [Torres and Zipser, 2004]. The same approach has been taken in previous work [Demiris and Hayes, 2002, Tidemann and Öztürk, 2007], but with joint angles as input instead of 3D coordinates. Using coordinates as opposed to joint angles presents a bigger challenge to the inverse models, since the transformation from coordinates to joint angle velocities is harder to learn than the transformation from joint angles to joint angle velocities (which is simply the derivative).

The modules use the context signal to determine which module is most suitable to control the robot prior to movement. The context signal in the current experiment is the melody playing while dancing. The dancer must continuously listen to the music while dancing, so the melody is an appropriate context signal.

The MPMA outputs motor commands to the robot in the form of joint angle velocities. Real-world robots use forces, but the simulator uses joint angle velocities as motor commands. Since the direction and speed of the motor commands are defined, replacing the simulator with an inverse dynamics controller that calculates forces to be applied to a real-world robot should be straightforward. Using joint angle velocity as part of the state is common in many robot systems; however, this is not done in the MPMA, since the inverse/forward models are dynamic systems with memory (see the next section), capable of representing the changes in coordinates (i.e. velocity) internally.
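As an illustration of this state encoding, here is one possible normalization, assuming each marker position is a 3D NumPy array. Scaling by segment length is our assumption; the text only specifies the relative encoding and the [−1, 1] range:

```python
import numpy as np

def limb_state(shoulder, elbow, wrist):
    """Encode one arm as 6 numbers in [-1, 1]: elbow position relative
    to the shoulder and wrist position relative to the elbow, each
    scaled by the corresponding segment length (our assumption)."""
    upper = elbow - shoulder
    lower = wrist - elbow
    e = upper / (np.linalg.norm(upper) + 1e-12)  # unit vector shoulder -> elbow
    w = lower / (np.linalg.norm(lower) + 1e-12)  # unit vector elbow -> wrist
    return np.concatenate([e, w])                # 6D state for one arm
```

With both arms encoded this way, the 12-dimensional state vector matches the input sizes given in the next section.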

3 Experimental Setup

The human dance movement to be imitated was the dance to the song "YMCA" by The Village People, see figure 2. The dance was chosen for two reasons: 1) the song (and dance) is well-known and easy to explain verbally (e.g. as forming the letters YMCA with your arms) and 2) it is complex enough to make it an interesting imitation task. To track the human dance movements, a Pro Reflex tracking system was used. By placing fluorescent markers on the body of the dancer, the system is able to track the movement in three dimensions over time using five infrared cameras, with a sampling frequency f = 200Hz. Markers were put on the wrist, elbow and shoulder of both arms of the dancer. After tracking, every 10th sample was extracted from the noisy data (in other words, the models had to predict 0.05 seconds into the future) and used as the desired state for the MPMA. Since we wanted to examine to what extent the modules capture specific parts of the movement, we repeated the desired state three times (i.e. the movement to be imitated consisted of the YMCA trajectory four times) and added small amounts of noise (1%) during training. If the modules became experts on particular parts of the trajectory, their activations should exhibit a repeating pattern.

We implemented a four-degree-of-freedom (DOF) model of a human arm [Tolani and Badler, 1996] as the robot simulator, since the experiment consisted solely of imitating arm movements. Each arm had a 3DOF spherical shoulder joint and a 1DOF revolute elbow joint; the entire simulated robot was thus described with 8DOF.

All the models (inverse/forward/RP) in the architecture were implemented using Echo State Networks (ESNs) [Jaeger and Haas, 2004]. An ESN is a recurrent neural network characterized by two features: 1) a large, sparsely connected hidden layer and 2) only the output layer weights are modified during training. When the ESN is created, the input and hidden layer weights are randomly generated and are not changed during training. The output weights can then be found simply by linear regression. Traditional backpropagation networks modify all the layers during training, and therefore require far more computation to converge.

The input/output of each model will now be specified. The inverse model had 24 inputs: 12 signals defined the current state of the simulated robot (recall that both elbow and wrist positions were given as 3D coordinates in the range [−1, 1] for each arm), and the desired state was defined the same way. The inverse model had 8 outputs in the range [−1, 1], corresponding to the degrees of freedom of the robot, as explained earlier. The forward model had 20 input signals: 12 signals for the current state, and 8 signals from the paired inverse model. The forward model had 12 output signals (range [−1, 1]), describing the predicted next state. The RP had four inputs representing the context signal (defined as the melody playing while dancing, see figure 2). It had a single output in the range [0, 1], representing the predicted suitability of the module to control the robot.

We wanted to test the performance of the system with different numbers of nodes in the hidden layer; there were four different configurations for all the networks in the architecture: 50, 100, 200 and 400 nodes in the hidden layer. Common to all the networks was the spectral radius α, defining the length of the memory (range [0, 1]). All the networks in all configurations had α = 0.1, enabling fast memory. The noise level was v = 0.2 for all networks, adding 10% noise to the internal state of the network.
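A minimal ESN sketch under these parameters (α = 0.1, sparse reservoir, readout trained by linear regression). This is our illustration rather than the paper's MATLAB implementation, and reading v = 0.2 as uniform noise in [−0.1, 0.1] (i.e. 10% of the [−1, 1] range) is our assumption:

```python
import numpy as np

class ESN:
    """Minimal Echo State Network: fixed random input/reservoir weights,
    only the linear readout W_out is trained (here by ridge regression)."""

    def __init__(self, n_in, n_hidden, n_out, alpha=0.1, density=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_in = self.rng.uniform(-1, 1, (n_hidden, n_in))
        W = self.rng.uniform(-1, 1, (n_hidden, n_hidden))
        W *= self.rng.random((n_hidden, n_hidden)) < density  # sparse reservoir
        W *= alpha / np.max(np.abs(np.linalg.eigvals(W)))     # spectral radius = alpha
        self.W = W
        self.W_out = np.zeros((n_out, n_hidden))
        self.x = np.zeros(n_hidden)

    def step(self, u, v=0.2):
        noise = v * (self.rng.random(self.x.shape) - 0.5)     # internal state noise
        self.x = np.tanh(self.W_in @ u + self.W @ self.x + noise)
        return self.W_out @ self.x

    def fit(self, states, targets, ridge=1e-6):
        # Readout found by (ridge-regularized) linear regression on
        # the collected reservoir states.
        X, Y = np.asarray(states), np.asarray(targets)
        self.W_out = Y.T @ X @ np.linalg.inv(X.T @ X + ridge * np.eye(X.shape[1]))
```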


These parameters were found by experimental validation, as recommended by Jaeger [2005].

With such a high-dimensional system, it is crucial to have good error signals to ensure convergence. The arm model of Tolani and Badler allows joint angle rotations to be found analytically from the positions of the elbow and wrist. This allows differences between desired and actual state to be expressed as differences in rotational angles, which the feedback controller adds to the final motor command to pull the system towards the desired state. Recall that the u_feedback also serves as the training signal for the inverse models; since it is found analytically from differences between desired and actual state, it is an accurate error signal, guiding the inverse models towards learning the correct input/output relationship. To further exploit the good error signals and ensure quick convergence, the output gain K of the feedback error controller was stronger than the output gain L of the inverse models during the early stages of training. For the first training epoch, K = 1 and L = 0.01, which would force the system towards the desired trajectory based almost exclusively on the feedback error signal. Since all the networks are generated at random, they will not produce the desired behaviour in the first epoch. L increased and K decreased linearly as the performance of the system increased. Training would not stop until L = 1 and K < 0.15, to ensure that most of the motor control came from the inverse models of the architecture while still allowing online corrective control from the feedback controller (a sketch of one control step under this scheme is given at the end of this section).

The likelihood l^i_t quantifies prediction performance as a scalar, based on a prediction error vector. l^i_t should thus assign a high probability to predictions that have a relatively low error. Through trial and error, we found as a rule of thumb that σ should be in the range of 10–15% of the maximum error signal. The likelihood follows the Gaussian distribution; an error within one σ from µ = 0 (i.e. no error) will be rewarded fairly highly, since the density ratio N(σ, σ)/N(0, σ) = e^{−1/2} ≈ 0.6065. The forward model has 12 outputs in this experiment with range [−1, 1], yielding a maximum summed error of 24 (e.g. desired state all −1, actual state all 1). σ = 3 (12.5% of the maximum error) is thus a good choice. If the σ value is too small, the architecture becomes unstable, since only perfect predictions will achieve a high likelihood. If the σ value is too large, the performance of the forward model will not matter at all, since all predictions will be given a relatively high likelihood. It is therefore important to define an appropriate σ value; our rule of thumb should be a good starting point.

There were two stopping criteria related to the performance of the architecture: 1) the difference between the output p^i_t of the RP and the normalized λ^i_t had to be less than 3%, and 2) the trajectory produced by the system could not differ by more than 3% from the desired trajectory.

There were four modules in the architecture. We intended that the architecture would decompose the system in accordance with the context signal, i.e. one module would control the movement for each of the letters in YMCA (see figure 2), coinciding with the melody playing. In other words, the context signal represents our intention of how the movement should be decomposed in terms of motor primitives. Haruno et al. [2001] implemented an earlier version of the MOSAIC architecture using three modules corresponding to three different objects that were to be moved. In the current experiment there are four letters to be learned (i.e. one letter corresponds to the trajectory required to form that letter with the arms), thus four modules seemed an appropriate choice.

We also wanted to examine how the responsibility predictor influenced the architecture. Recall that the novelty of the MPMA is the consistent inverse/forward ordering (present in HAMMER, and in MOSAIC during action observation; this is also similar to the approach of Jordan and Rumelhart [1992]) combined with the use of the responsibility predictor. Two extra conditions were examined: 1) after training the architecture, the RP was disabled and the architecture was set to imitate the movement it had learned during training, and 2) the architecture was trained without the use of an RP. These tests show how important the RP is as part of a predictive architecture. Without the RP, the likelihood is the only way to differentiate between modules when controlling the robot; this is similar to the confidence measure used in HAMMER. The system was implemented in MATLAB.
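To tie the pieces together, here is a schematic control step under the K/L gain scheme. `plant` and `feedback_controller` are placeholders for the simulator and the analytic Tolani-Badler controller, `arbitrate` and `Module.step` are the sketches from section 2.1, and comparing predictions against the desired next state (rather than the actual one, which is only available after the step) is a simplification of ours:

```python
import numpy as np

def control_step(modules, x_t, x_des, y_t, plant, feedback_controller, K, L):
    """One schematic MPMA control step. K starts at 1 and L at 0.01;
    over training, L rises to 1 and K falls below 0.15."""
    outs = [m.step(x_t, x_des, y_t) for m in modules]  # (u, x_pred, p) per module
    u_t, lam = arbitrate(x_des,                        # desired state stands in for
                         [o[1] for o in outs],         # the not-yet-available actual
                         np.array([o[2] for o in outs]),
                         [o[0] for o in outs])
    u_fb = feedback_controller(x_des, x_t)  # analytic corrective command; also the
                                            # training signal for the inverse models
    x_next = plant(L * u_t + K * u_fb)      # gated inverse output plus feedback
    return x_next, lam, u_fb
```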

4 Results

Each of the network configurations (50, 100, 200 and 400 nodes in the hidden layer) was run 20 times. Table 1 shows how the architecture performed with the different network configurations. A hidden layer size of 200 might be best with regard to the number of epochs required to train the architecture and the Σu_feedback/Σu_t ratio; however, performance-wise the networks do not differ much. An example of the imitative performance of the architecture can be seen in figure 3; the strict stopping criteria ensured a close match between desired and actual trajectory (see figures 4 and 5).

The Σu_feedback/Σu_t ratio shows how much control the feedback motor controller exerted over the simulated robot at the last epoch. Even though the gain of the u_feedback was less than 0.15 (inverse model gain 1), it still had a significant influence (slightly less than 1/4 on average). Even though the inverse models of the architecture control most of the output, the u_feedback is needed to ensure robustness.

Table 1 also shows the average number of active modules when the simulated robot imitated the YMCA. We defined a module to be active during one of the letters when λ^i_t > 0.1 for at least 25% of the duration of that part of the context signal, excluding small bursts of activity but including persistent small contributions (a sketch of this criterion is given at the end of this section). The result is an indication of how much the modules would collaborate or dominate when controlling the robot. An example can be seen in figure 6, demonstrating collaboration and domination between modules, and how the architecture successfully self-organizes the decomposition of the target trajectory into submovements. To complement figure 6, we examined the neural activation of each module. Figure 7 shows the actions of each of the inverse models. Although the motor output activation patterns might be hard to interpret in terms of physical actions, since they show 8 rotational velocities, they present another important result: the recurring neural activation patterns confirm that modules become experts on specific parts of the trajectory.

The next step consisted of disabling the RP of the trained networks, to see how well the architecture performed without it. Table 2 shows the performance of the different networks. For all network configurations, the mean performance error is more than twice as high as in the original experiment. Even so, it is still around 6.5%, which is fairly low. The feedback error ratio reveals that the architecture performs worse without the RP; the ratios are about double those of the experiments with the RP. These high ratios explain the low performance error, since the feedback error motor controller successfully pulls the system in the right direction when the architecture issues bad motor commands. Furthermore, the number of active modules has risen for all experiments. Instead of separating control of different parts of the movement, all modules contribute to the movement at the same time. This indicates that the RP plays an important role in dividing the motor space between the modules. However, since the architecture was trained with the RP, it is not too surprising that removing it leads to worse performance. This is why we also trained the architecture without the RP.

Initially, all parameters were kept the same when training the networks without the RP. However, none of the networks would converge to the stopping criterion of a performance error below 3%. The criterion had to be tripled to 9%, and even then some of the network configurations would not converge. Table 3 shows the results of the experiments where the architecture was trained without the RP. Even with the relaxed stopping criterion, none of the networks with 50 and 100 nodes in the hidden layer converged. Only 10% of the networks with 200 nodes in the hidden layer converged; 75% converged with 400 nodes. Both the performance error and feedback error ratios are much higher than in the experiments where the RP was disabled after training; similarly, all modules were active during all parts of the movement.

The decrease in performance and increase in motor error ratio are a testament to the importance of the RP. Training with the RP clearly aids the separation and specialization of modules, since the performance error and feedback error ratios were lower when the RP was used during training but disabled afterwards, compared to when it was not used during training at all. There is also an indication that the RP allows neural resources to be used more efficiently: when trained without the RP, the networks with 50 and 100 nodes in the hidden layer did not converge, and only 10% of the networks with 200 nodes did. When trained with the RP, the networks with 50 and 100 nodes in the hidden layer performed on par with the networks with bigger hidden layers.
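The active-module criterion above can be expressed compactly. This sketch is ours (names hypothetical), and it approximates "excluding small bursts" by requiring activity for a minimum fraction of each letter's duration:

```python
import numpy as np

def active_modules(lam, letter_slices, threshold=0.1, min_fraction=0.25):
    """Count active modules per letter: a module counts as active if its
    lambda exceeds `threshold` for at least `min_fraction` of the letter's
    duration. `lam` has shape (timesteps, n_modules); `letter_slices`
    maps letter names to index slices of the context signal."""
    counts = {}
    for letter, sl in letter_slices.items():
        seg = lam[sl]                            # lambda values during this letter
        frac = np.mean(seg > threshold, axis=0)  # per-module active fraction
        counts[letter] = int(np.sum(frac >= min_fraction))
    return counts
```

With the context boundaries of figure 2, `letter_slices` would contain one slice per occurrence of each letter; averaging the counts over the four repetitions gives numbers comparable to tables 1-3.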

5 Discussion

Previous work has shown that the modules were capable of decomposing the movement to be imitated into different submovements [Tidemann and Öztürk, 2007], albeit in a much simpler environment: there were only four degrees of freedom for the robot, the state was described by four joint angles as opposed to 3D coordinates, and the neural networks were standard backpropagation recurrent neural networks with fewer nodes in the hidden layer. The previous work did not establish to what extent the modules captured different motor primitives. When the movement to be imitated includes a repetition of a certain trajectory, we have shown how each module becomes an expert on specific parts of the trajectory. This is confirmed by the repeated domination/collaboration when controlling the robot (figure 6) and the repeated neural activation patterns (figure 7).

The results show that a single module tends to dominate control during the Y and A parts of the movement. For all experiments, the control during the M and C letters tends to be collaborative, i.e. modules share control of the robot. The Y and A letters are symmetrical and are mostly controlled by the shoulder joints. On the other hand, the elbow joints become involved during control of the letters M and C. The tendency to share responsibility during these letters could then be due to the increase in the number of joints moving to form the letters (figure 7 shows how the motor outputs of the elbows become active during the M and C letters). Table 1 also shows that most modules collaborate during control of the letter C. The letter C has movements both along the shoulder joints and the right elbow joint. It is also the only letter whose movement is not symmetrical, unlike Y, M and A. This indicates that the letter C is harder to learn due to its lack of symmetry in addition to moving both shoulder and elbow joints, and that therefore more modules collaborate to control the robot.

The claim that the increase in complexity (and the reduction in symmetry) may be the reason why modules tend to collaborate is backed up by neuroscientific findings: nonsymmetric bimanual movements interfere with each other and take longer to execute than symmetrical movements [Diedrichsen et al., 2001]. Asymmetric bimanual movements also show weaker interhemispheric correlations than symmetrical ones, indicating that the reduction in coupling between areas in the brain is why asymmetric movements are harder to perform [Cardoso de Oliveira et al., 2001]. Is there an explanation for why sharing of motor control occurs? According to Cardoso de Oliveira [2002], both shared and separate motor codes exist in the brain, and these can explain why different degrees of coupling in neural activity are observed when the subject is performing symmetric (strong correlation) and asymmetric (weak correlation) movements. This model also fits nicely with the observation of active modules in the MPMA, i.e. some motor control is shared (collaboration), whereas some is separate (domination).

For all experiments where the architecture is trained and tested with the RP, the modules tend to switch in accordance with the context signal. This happens regardless of whether the modules dominate or collaborate (see figure 6). We designed the switches in the context signal according to our cognitive grouping of the movement (i.e. we set the boundaries in accordance with the letters formed), but it is the modules that determine how to represent these trajectories in the motor control space. We speculate that these trajectories on which modules become specialized can be called motor primitives. In the MPMA, these motor primitives are represented similarly to the muscle synergy notion explained by d'Avella and Bizzi [2005]. A motor primitive is not necessarily represented by a single module but is often distributed across several modules, meaning that the representation is grounded in muscle or joint synergies. The ability to spread motor primitives across modules can be seen as a desirable property of the MPMA; Davidson and Wolpert [2004] have suggested that MOSAIC should be revised to allow motor control to be shared between modules.

6 Conclusion

We have implemented a connectionist self-organizing architecture for motor learning and control that builds upon the HAMMER and MOSAIC architectures. We have shown through studies of low-level neural activation how the architecture self-organizes the decomposition of the movement into submovements, represented by specific modules. We have also shown how Wolpert's idea of introducing a responsibility predictor is crucial to make the neural resources specialize on different parts of the movement. This specialization was also shown to increase the performance of the architecture. Furthermore, the connections between theories and experimentally verified claims in psychology and neuroscience and what is observed in the MPMA are appealing. However, it is important to keep the scope of the architecture in mind: the MPMA is nowhere near the complexity of the brain. But observing these phenomena in an architecture that was inspired by how the brain works is an indication that the architecture exhibits certain desirable properties, and that it is a good starting point for further work on developing agent architectures for motor control and learning.

7 Future Work

Future work will investigate how the MPMA scales with increasing lengths and complexities of movements to be imitated, to see if there are saturation points in the architecture where more neural resources must be added. Related to this is investigating how the MPMA functions with different numbers of modules. Another focus is investigating how the MPMA captures attractors through self-organization [Kuniyoshi et al., 2003], along with a methodology for evaluating which modules capture what in the motor control space. We are also interested in how the architecture reacts to perturbations of the target state.

8 Acknowledgements

The authors wish to thank Ruud van der Weel for the use of the Pro Reflex system.

References

Bernard Ans, Stéphane Rousset, Robert M. French, and Serban Musca. Self-refreshing memory in artificial neural networks: learning temporal structures without catastrophic forgetting. Connection Science, 16(2):71–99, June 2004.

Michael Arbib. The Mirror System, Imitation, and the Evolution of Language. In Imitation in Animals and Artifacts, pages 229–280. MIT Press, Cambridge, 2002.

S. Cardoso de Oliveira. The neuronal basis of bimanual coordination: Recent neurophysiological evidence and functional models. Acta Psychologica, 110:139–159, 2002.

S. Cardoso de Oliveira, A. Gribova, O. Donchin, H. Bergman, and E. Vaadia. Neural interactions between motor cortical hemispheres during bimanual and unimanual arm movements. European Journal of Neuroscience, 14(11):1881–1896, 2001.

Andrea d'Avella and Emilio Bizzi. Shared and specific muscle synergies in natural motor behaviors. PNAS, 102(8):3076–3081, 2005. doi: 10.1073/pnas.0500199102.

P. R. Davidson and D. M. Wolpert. Internal models underlying grasp can be additively combined. Experimental Brain Research, 155(3):334–340, 2004.

Yiannis Demiris and Gillian Hayes. Imitation as a dual-route process featuring predictive and learning components: a biologically-plausible computational model. In Imitation in Animals and Artifacts, pages 327–361. MIT Press, Cambridge, 2002.

Yiannis Demiris and Bassam Khadhouri. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems, 54:361–369, 2006.

Jörn Diedrichsen, Eliot Hazeltine, Steven Kennerley, and Richard B. Ivry. Moving to directly cued locations abolishes spatial interference during bimanual actions. Psychological Science, 12(6):493–498, 2001. doi: 10.1111/1467-9280.00391.

D. S. Farrar and D. Zipser. Neural network models of bilateral coordination. Biological Cybernetics, 80(3):215–225, 1999.

Vittorio Gallese and Alvin Goldman. Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2(12), 1998.

Masahiko Haruno, Daniel M. Wolpert, and Mitsuo Kawato. MOSAIC model for sensorimotor learning and control. Neural Computation, 13(10):2201–2220, 2001.

Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79–87, 1991.

Herbert Jaeger. Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach. Technical report, German National Research Institute for Information Technology, 2005.

Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78–80, 2004. doi: 10.1126/science.1091277.

Michael I. Jordan and David E. Rumelhart. Forward models: Supervised learning with a distal teacher. Cognitive Science, 16:307–354, 1992.

Mitsuo Kawato. Feedback-error-learning neural network for supervised motor learning. In R. Eckmiller, editor, Advanced Neural Computers, pages 365–372, 1990.

Yasuo Kuniyoshi, Yasuaki Yorozu, Yoshiyuki Ohmura, Koji Terada, Takuya Otani, Akihiko Nagakubo, and Tomoyuki Yamamoto. From humanoid embodiment to theory of mind. In Samuel Pierre, Michel Barbeau, and Evangelos Kranakis, editors, Embodied Artificial Intelligence, volume 2865 of Lecture Notes in Computer Science, pages 202–218. Springer, 2003.

Andrew N. Meltzoff and M. Keith Moore. Explaining facial imitation: A theoretical model. Early Development and Parenting, 6:179–192, 1997.

Chrystopher L. Nehaniv and Kerstin Dautenhahn. The Correspondence Problem. In Imitation in Animals and Artifacts, pages 41–63. MIT Press, Cambridge, 2002.

Rolf Pfeifer and Christian Scheier. Understanding Intelligence. MIT Press, Cambridge, MA, USA, 2001.

Jean Piaget. Play, Dreams and Imitation in Childhood. W. W. Norton, New York, 1962.

Giacomo Rizzolatti, Luciano Fadiga, Vittorio Gallese, and Leonardo Fogassi. Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3:131–141, 1996.

Stefan Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242, 1999.

Axel Tidemann and Pinar Öztürk. Self-organizing multiple models for imitation: Teaching a robot to dance the YMCA. In IEA/AIE, volume 4570 of Lecture Notes in Computer Science, pages 291–302. Springer, June 2007.

Deepak Tolani and Norman I. Badler. Real-time inverse kinematics of the human arm. Presence, 5(4):393–401, 1996.

Elizabeth B. Torres and David Zipser. Simultaneous control of hand displacements and rotations in orientation-matching experiments. Journal of Applied Physiology, 96(5):1978–1987, 2004. doi: 10.1152/japplphysiol.00872.2003.

Daniel M. Wolpert, R. Chris Miall, and Mitsuo Kawato. Internal models in the cerebellum. Trends in Cognitive Sciences, 2(9), 1998.

Daniel M. Wolpert, Kenji Doya, and Mitsuo Kawato. A unifying computational framework for motor control and social interaction. Philosophical Transactions: Biological Sciences, 358(1431):593–602, 2003.

9 Figures

[Figure: block diagram of one module and the surrounding control loop, with blocks RESPONSIBILITY PREDICTOR (input y_t, output p^i_t), INVERSE MODEL (inputs x_t, x'_{t+1}, output u^i_t), FORWARD MODEL (inputs x_t, u^i_t, output x̂^i_{t+1}), LIKELIHOOD (l^i_t), NORMALIZATION (λ_t), FEEDBACK CONTROLLER (u_feedback) and PLANT (inputs u_t, output x_{t+1}).]

Figure 1 The multiple paired models architecture, inspired by Wolpert et al. [2003] and Demiris and Khadhouri [2006]. See the text for details.

[Figure: four arm postures forming the letters Y, M, C and A, annotated with timesteps and four-digit context vectors.]

Figure 2 The movement to be imitated by the simulated robot. This is the dance to the song YMCA by The Village People. The letters Y M C A are formed using arm movements. The numbers show at which timestep each letter begins, coinciding with the context signal (shown as the four-digit vector). The entire trajectory was repeated three times.

Figure 3 An example of the imitation performed by the simulated robot.


[Figure: six panels plotting target state versus actual state over 500 timesteps for Right elbow X/Y/Z and Right wrist X/Y/Z.]

Figure 4 Actual trajectory versus desired trajectory, showing the close match between actual and desired trajectory, right arm. The grey background shows when the context signal switches, along with the letter formed with the arms.


[Figure: six panels plotting target state versus actual state over 500 timesteps for Left elbow X/Y/Z and Left wrist X/Y/Z.]

Figure 5 Same as figure 4, but showing the match between actual and desired trajectory for the left arm.


[Figure: four panels (epoch 20) plotting λ values and RP output over 500 timesteps for modules 1–4.]

Figure 6 An example of collaboration and domination between modules when imitating the YMCA (400 nodes in the hidden layer, same experiment as in figures 4 and 5). The plot shows how the MPMA self-organizes the decomposition of the movement into submovements, and how these submovements are represented by different modules. The overlap in RP output and λ demonstrates stability in the architecture, since the RP made accurate predictions of how much its module would end up controlling the robot.


[Figure: eight panels plotting per-module motor output and u_feedback over 500 timesteps: right shoulder θ(1)–θ(3), right elbow θ, left shoulder θ(1)–θ(3), left elbow θ.]

Figure 7 Motor outputs of each module and u_feedback as they are sent to the robot, same experiment as in figures 4–6. The recurring neural activation patterns show how the modules become experts on specific parts of the trajectory.

10 Tables

Table 1 The performance error pe shows the difference (in percent) between the desired state and the actual state at the last epoch. The Σu_fb/Σu_t ratio shows how much the feedback error motor controller influenced the total motor command at the last epoch. The average number of active modules shows when and to what extent modules would dominate or collaborate when controlling the robot. µ and σ denote the mean and standard deviation over the 20 runs.

Nodes in hidden layer   Epochs µ/σ   pe µ/σ        Σu_fb/Σu_t µ/σ   Active modules (Y / M / C / A)
50                      18/3.78      2.76%/0.23%   0.23/0.02        1 / 1.6 / 2.7 / 1.05
100                     17/5.07      2.73%/0.27%   0.24/0.09        1 / 1.95 / 2.25 / 1.05
200                     14/3.89      2.76%/0.17%   0.20/0.02        1.05 / 1.8 / 2.25 / 1
400                     15/4.71      2.79%/0.16%   0.23/0.06        1 / 2.15 / 2.65 / 1.05

Table 2 Experiments where the RP was used to train the architecture, but testing was performed with the RP disabled. Same variables as in table 1. The architecture clearly performs worse than when the RP is also used during testing, as seen in the increased pe and Σu_fb/Σu_t ratio.

Nodes in hidden layer   pe µ/σ        Σu_fb/Σu_t µ/σ   Active modules (Y / M / C / A)
50                      6.67%/0.65%   0.44/0.03        4 / 4 / 4 / 4
100                     6.57%/1.37%   0.45/0.05        4 / 4 / 4 / 4
200                     6.46%/0.32%   0.42/0.02        4 / 4 / 4 / 4
400                     6.59%/0.64%   0.43/0.02        4 / 4 / 4 / 4

Table 3 Training and testing without the RP. The stopping criterion had to be tripled to 9%, otherwise none of the experiments would converge; even then, the networks with 50 and 100 nodes in the hidden layer did not converge. Performance of the converging networks is worse than both training and testing with RP (table 1) and training with RP but disabling it during testing (table 2).

Nodes in hidden layer   Conv. exp.   Epochs µ/σ   pe µ/σ        Σu_fb/Σu_t µ/σ   Active modules (Y / M / C / A)
50                      0%           N/A          N/A           N/A              N/A
100                     0%           N/A          N/A           N/A              N/A
200                     10%          15/12.7      8.92%/0.00%   0.58/0.04        4 / 4 / 4 / 4
400                     75%          17/9.3       8.91%/0.08%   0.55/0.01        4 / 4 / 4 / 4
