Learning Dance Movements by Imitation: A Multiple Model Approach

Axel Tidemann and Pinar Öztürk
IDI, Norwegian University of Science and Technology
{tidemann, pinar}@idi.ntnu.no

Abstract. Imitation learning is an intuitive and easy way of programming robots. Instead of specifying motor commands, you simply show the robot what to do. This paper presents a modular connectionist architecture that enables imitation learning in a simulated robot. The robot imitates human dance movements, and the architecture self-organizes the decomposition of movements into submovements, which are controlled by different modules. Modules both dominate and collaborate during control of the robot. Low-level examination of the inverse models (i.e. motor controllers) reveals a recurring pattern of neural activity during repetition of movements, indicating that the modules successfully capture specific parts of the trajectory to be imitated.

Keywords: Cognitive Modeling, Imitation Learning, Neural Networks

1 Introduction

Learning by imitation is regarded as a cornerstone of human cognition. Humans easily transfer motor knowledge by demonstration, making it a natural way to program robots. The research presented in this paper focuses on making a simulated robot learn human dance movements, using visual input to guide the motor system of the robot. A modular connectionist architecture designed for motor learning and control by imitating movements was implemented. The modules work as experts, learning specific parts of the movement to be imitated through self-organization. The research agenda is to understand how and why such decompositions occur. These decompositions can be considered motor primitives, and we examine how they are represented and combined to form complex movements. The architecture has a core of multiple paired inverse/forward models, implemented as Echo State Networks.

2 Imitation in Psychology, Neuroscience and AI

Developmental psychologists have extensively studied how infants learn by imitating adults. Piaget sees imitation as a continuing adaptation of motor and perceptual stimuli to the external world [1]. Meltzoff and Moore propose an architecture that combines production and perception of actions (including a self-correcting ability), called active intermodal mapping (AIM) [2], believing imitation to be an innate mechanism. Some neuroscientists believe that mirror neurons form a neurological basis for imitation [3]. Mirror neurons are characterized by firing both when observing and when producing the same action, and are considered to form the neural basis of imitation [4], language [5] and mind reading [6]. Research on imitation learning in AI is coarsely divided into two groups: work on the correspondence problem (i.e. the transformation from an extrinsic to an intrinsic coordinate system) and work focusing on the perception-action system (where perceptual stimuli have already been transformed to an internal representation) [4]. Schaal regards model-based learning (the focus of this brief background section) as the most interesting approach to implementing imitation learning. Model-based learning is achieved by pairing an inverse model (i.e. controller) with a forward model (i.e. predictor). Demiris [7] and Wolpert [8] use a model-based approach in their architectures to implement imitation learning. Wolpert argues that multiple paired inverse/forward models are located in the cerebellum [9]; this approach is thus suitable for architectures inspired by how the brain works.

3 The Multiple Paired Models Architecture

The architecture presented in this paper (abbreviated MPMA) is designed to equip an agent with imitative capabilities. It consists of multiple paired forward and inverse models, and is inspired by Demiris' HAMMER [7] and Wolpert's MOSAIC [8] architectures. The MPMA seeks to combine the best of both: the consistent inverse/forward model ordering of HAMMER (MOSAIC orders the models differently depending on whether it is recognizing or executing an action), and the focus on self-organization and the use of a responsibility predictor (explained shortly) from MOSAIC. The modular structure of the brain serves as inspiration for creating an architecture with multiple models. Wolpert argues that the central nervous system computes internal simulations that can be modeled using inverse/forward models [8]. By spreading motor knowledge across several models the architecture can encode redundancy, an important property of robust intelligent systems. Furthermore, it is an approach to motor control that is well understood in the literature [10]. In the following text, the term module groups three models together: the inverse model, the forward model and the responsibility predictor. The MPMA is shown in figure 1; the dashed arrow shows the error signal for all models. The forward model is a predictor: it learns to predict the next state $\hat{x}^i_{t+1}$ given the current state $x_t$ and the motor command $u^i_t$ applied to the robot. Its error signal is the difference between the predicted and the actual next state. The inverse model is a motor controller: it learns what motor commands $u^i_t$ (issued as joint angle velocities) will achieve a desired state $x'_{t+1}$ given the current state $x_t$. Its error signal is based on the difference between $x'_{t+1}$ and $x_{t+1}$, called the feedback motor error command, $u_{feedback}$. The $u_{feedback}$ is also added to the final motor output. Using the feedback controller is a simple way to pull the system in the right direction when bad motor commands are issued [11], guaranteeing a solution when training an inverse model in a redundant system. Additionally, it increases the robustness of the system by acting as an online correction controller.

[Figure 1: block diagram showing the responsibility predictor (input $y_t$, output $p^i_t$), the inverse model (inputs $x_t$, $x'_{t+1}$, output $u^i_t$), the forward model (output $\hat{x}^i_{t+1}$), the likelihood and normalization stages producing $\lambda_t$, and the feedback controller whose $u_{feedback}$ is added to $u_t$ before it reaches the plant, which returns $x_{t+1}$.]

Fig. 1. The MPMA, inspired from [8] and [7]. Details described in the text.

The responsibility predictor (RP) outputs $p^i_t$, a prediction of how well suited module $i$ is to control the robot prior to movement, based on the context signal $y_t$. The $\lambda^i_t$ signal is based on $p^i_t$ and the likelihood $l^i_t$, a measure of how well the forward model predicts the next state (see [9] for details). The final lambda vector $\lambda_t$ is normalized, and also serves as the error signal for the RP. The $\lambda_t$ vector is responsible for switching control of the robot between modules: the motor command from each module is multiplied by its corresponding $\lambda$ value before summation and application to the robot, so modules that make good predictions gain more influence over the final motor command. The $\lambda$ value also gates the learning of each model by multiplying the error signal with the corresponding $\lambda$ value, promoting modules that already predict well. The desired state was the normalized 3D coordinates of the elbow and wrist positions of both arms of the demonstrator. The elbow coordinates were in the range $[-1, 1]$, with the shoulder as the origin; the wrist position was in the range $[-1, 1]$, with the elbow as the origin. The state of the robot was defined in the same way to overcome the correspondence problem [12]. Findings in neuroscience suggest a geometric stage where sensory input is transformed into postural trajectories that are meaningful to the motor system of the observer [13]. A simpler approach was taken in previous work [14, 15], using joint angles as input instead of 3D coordinates. The modules use the context signal to determine which module is best suited to control the robot prior to movement (e.g. when lifting a cup, the context full/empty determines the appropriate inverse model). Since a dancer must continuously listen to the music while dancing, the music is chosen as the context signal in the current experiment.
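As an illustration of the gating scheme, the following is a minimal Python sketch (the original system was written in Matlab) of how the $\lambda$ values could be computed from the RP priors and the forward-model likelihoods, and how they gate the motor commands. The Gaussian width `sigma` and all function names are illustrative assumptions; see [9] for the exact likelihood formulation.

```python
import numpy as np

def responsibilities(x_next, x_pred, priors, sigma=0.1):
    """Combine RP priors p_t^i with forward-model likelihoods l_t^i into lambda_t.

    x_next: (dim,) actual next state
    x_pred: (n_modules, dim) per-module forward-model predictions
    priors: (n_modules,) responsibility predictor outputs p_t^i
    sigma (assumed): width of a Gaussian likelihood model, cf. [9]
    """
    lik = np.exp(-np.sum((x_next - x_pred) ** 2, axis=1) / (2 * sigma ** 2))
    lam = priors * lik
    return lam / lam.sum()  # normalized lambda_t; also the RP's training target

def mix_motor_commands(lam, u_modules, u_feedback):
    """Gate each module's command by its lambda value, add the feedback correction."""
    return lam @ u_modules + u_feedback  # u_modules: (n_modules, dof)
```

A module with an accurate forward model thus both contributes more to the final motor command and, because the error signals are scaled by the same $\lambda$ values, receives a larger share of the learning signal.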

4 Experimental Setup

The experiment consisted of imitating the dance to the song YMCA by The Village People (see figure 2). The dance was chosen since it is rather well-known and easy to explain verbally, yet sufficiently complex to make it an interesting imitation task. Movement data was gathered with a Pro Reflex 3D motion tracking system, with markers placed on the shoulders, elbows and wrists of the dancer. The noisy data, sampled at 20Hz, was used as the desired state for the MPMA, forcing the models to predict 0.05 seconds into the future. The recorded YMCA movement was repeated three times, with small amounts of noise (1%) added during training. Since the experiment was to imitate arm movements, a four-degree-of-freedom model of a human arm was implemented [16]; the simulated robot was thus described by 8 degrees of freedom. The inverse/forward models and the RPs were implemented as Echo State Networks (ESNs) [17], exploiting their large memory capacity and fast training algorithm.
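As an illustration of this building block, here is a minimal ESN sketch in Python. The `ESN` class, the weight initialization ranges and the noise model are assumptions for illustration; only the dimensions, spectral radius and noise level follow the description in this section and the next.

```python
import numpy as np

class ESN:
    """Minimal echo state network sketch (cf. [17]); details are illustrative."""

    def __init__(self, n_in, n_hidden, n_out, alpha=0.1, v=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_in = self.rng.uniform(-1, 1, (n_hidden, n_in))
        W = self.rng.uniform(-1, 1, (n_hidden, n_hidden))
        # rescale the fixed reservoir so its spectral radius equals alpha
        W *= alpha / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.v = v  # noise level; assumed uniform in [-v/2, v/2]
        self.W_out = np.zeros((n_out, n_hidden))  # the only trained weights
        self.state = np.zeros(n_hidden)

    def step(self, u):
        noise = self.v * (self.rng.random(self.state.shape) - 0.5)
        self.state = np.tanh(self.W_in @ u + self.W @ self.state + noise)
        return np.tanh(self.W_out @ self.state)  # outputs in [-1, 1]

# e.g. an inverse model per Sec. 4: 24 inputs (current + desired state), 8 motor outputs
inverse_model = ESN(n_in=24, n_hidden=200, n_out=8)
```

Only the output weights are trained, which is what makes ESN training fast; the reservoir provides the memory of past inputs.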

[Figure 2: four photographs of the dancer forming the letters Y, M, C and A at timesteps 0, 24, 48 and 124, each labeled with its four-digit context vector.]

Fig. 2. Dancing the YMCA, by forming the letters Y, M, C, A using arm movements. The numbers show at which timestep the next letter is formed in the sequence. The four-digit vector is the context signal. This movement was repeated three times.

Each inverse model had 24 inputs: 12 signals for the current state, corresponding to the 3D coordinates of the elbows and wrists of both arms, and 12 for the desired state. The 8 outputs, with range $[-1, 1]$, corresponded to the degrees of freedom of the robot. Each forward model had 20 inputs: 8 signals from the inverse model output (i.e. the suggested motor command) and 12 for the current state, with 12 output signals (range $[-1, 1]$) predicting the next state. The RPs had four inputs from the context signal and a single output in the range $[0, 1]$. Performance of the system was evaluated with different numbers of nodes in the hidden layer. All networks had spectral radius $\alpha = 0.1$ (range $[0, 1]$; larger $\alpha$ yields longer memory) and noise level $v = 0.2$ (effectively adding 10% noise to the internal state of the network). Good error signals are crucial to ensure convergence in such a high-dimensional system. An advantage of using the arm model of [16] is that joint angle rotations can be found analytically from the positions of the elbow and wrist. The difference between desired and actual state can thus be expressed as differences in rotational angles, which the feedback controller adds to the final motor command to pull the system in the right direction. In the early stages of training, the $u_{feedback}$ gain $K$ was stronger than the output gain $L$ of the inverse models ($K = 1$, $L = 0.01$) to force the system towards the desired trajectory. $L$ was increased and $K$ decreased linearly with increasing performance of the system, until $L = 1$ and $K < 0.15$. There were two stopping criteria: the prediction error of $p^i_t$ with respect to $\lambda^i_t$ had to be less than 3%, and the actual trajectory could not differ more than 3% from the desired trajectory. The architecture had four modules. It was the designer's intention that the modules would decompose the movement according to the context signal, coinciding with the melody playing (see figure 2). The system was implemented in Matlab.
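A sketch of how this gain scheduling could be implemented is shown below. The linear interpolation against trajectory error is an assumption, since only the endpoints are stated ($K = 1$, $L = 0.01$ initially; $L = 1$, $K < 0.15$ at the end); the function name and arguments are hypothetical.

```python
import numpy as np

def gain_schedule(error, error_init, error_target=0.03):
    """Interpolate gains from (K=1, L=0.01) toward (K=0.15, L=1) as error falls.

    error: current trajectory error; error_init: error at the first epoch;
    error_target (assumed): the 3% stopping criterion from Sec. 4.
    """
    frac = np.clip((error_init - error) / (error_init - error_target), 0.0, 1.0)
    K = 1.0 - 0.85 * frac   # feedback controller gain, ends near 0.15
    L = 0.01 + 0.99 * frac  # inverse model output gain, ends at 1.0
    return K, L

# The final motor command at each timestep (cf. figure 1) is then
# u_t = L * sum_i(lambda_t^i * u_t^i) + K * u_feedback
```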

5 Results

Four different sizes of hidden layer were examined (50, 100, 200 and 400 nodes). Each network configuration was run 20 times. Figure 4 shows the close match between the desired and actual trajectory that was typical for the experiments. Table 1 shows the performance of the different runs. The $\Sigma u_{feedback}/\Sigma u_t$ ratio indicates to what extent the feedback controller influenced the total motor command at the last epoch. Being slightly less than 1/4 on average, it shows that the modules produce most of the total motor output; however, online corrections are needed to ensure robustness. Table 1 also shows how many modules were active on average during each of the letters. Being active was defined as $\lambda^i_t > 0.1$ for at least 25% of the context signal; this eliminates small bursts of activity from the modules but includes persisting small contributions. The average number of active modules indicates to what extent the modules dominated or collaborated during control of the movement. Figure 3a gives an example of how the modules collaborate and compete when controlling the robot, and how the architecture is capable of decomposing the movement into smaller submovements. Figure 3b shows the motor output of the inverse models of each module, allowing a visual inspection of what each module does and complementing the average number of active modules. Table 1 indicates that 200 nodes might be the best configuration with respect to the number of training epochs and the $\Sigma u_{feedback}/\Sigma u_t$ ratio, although performance-wise no clear preference can be made.

Table 1. $p_e$ tells how much the actual state deviated from the desired state (in percent) at the final epoch. The $\Sigma u_{feedback}/\Sigma u_t$ ratio is an indication of how strong the feedback error motor signal was relative to the total motor command at the last epoch. The number of modules active during each letter shows how often modules would collaborate or dominate when controlling the robot. μ and σ are parameters of the normal distribution.

Nodes in      Epochs    p_e           Σu_fb/Σu_t   Average active modules
hidden layer  μ/σ       μ/σ           μ/σ          Y     M     C     A
50            18/3.78   2.76%/0.23%   0.23/0.02    1     1.6   2.7   1.05
100           17/5.07   2.73%/0.27%   0.24/0.09    1     1.95  2.25  1.05
200           14/3.89   2.76%/0.17%   0.20/0.02    1.05  1.8   2.25  1
400           15/4.71   2.79%/0.16%   0.23/0.06    1     2.15  2.65  1.05
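The active-module statistic in Table 1 can be computed directly from the $\lambda$ traces; the following is a sketch under the stated definition ($\lambda^i_t > 0.1$ for at least 25% of a letter's duration), with hypothetical variable names.

```python
import numpy as np

def count_active_modules(lam, letter_masks, thresh=0.1, min_frac=0.25):
    """Count modules active during each letter of the dance.

    lam: (T, n_modules) lambda values over one epoch
    letter_masks: dict mapping letter -> (T,) boolean mask from the context signal
    """
    active = {}
    for letter, mask in letter_masks.items():
        # fraction of the letter's timesteps in which each module exceeds thresh
        share = (lam[mask] > thresh).mean(axis=0)
        active[letter] = int(np.sum(share >= min_frac))
    return active
```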

[Figure 3: (a) λ values and RP output for modules 1-4 over 500 timesteps, with the Y/M/C/A context boundaries marked; (b) motor output of each module, plus u_fb, for the right and left shoulder joints θ(1)-θ(3) and elbow joints θ over the same timesteps.]

Fig. 3. (a): Collaboration and competition between modules controlling the robot (400 nodes), showing that the MPMA is capable of self-organizing the decomposition of a movement into smaller submovements. Overlapping RP output and λ indicate stability, since the RP correctly predicted how much it would influence the control of the robot. The gray background (stripes/fill) along with the corresponding letters shows the boundaries of the context signal. (b): The recurring patterns of motor output activity sent to the robot indicate that the modules have successfully captured parts of the movement to be imitated. The specific details of which module captures what are subordinate to the fact that the modules have repeating activation patterns.

[Figure 4: twelve panels plotting target state versus actual state for the X, Y and Z coordinates of the left and right elbows and wrists over 500 timesteps.]

Fig. 4. Desired trajectory versus actual trajectory, same experiment as figure 3. The actual trajectory differs less than 3% from the desired trajectory.

6 Discussion

The results reveal how each module becomes specialized in certain parts of the movement. When the movement to be imitated is repeated, the modules dominate and collaborate in controlling certain parts of the movement in accordance with the repetition of the movement (figure 3a). This is also the case at the level of neural activity, as low-level examination of the output of the inverse models reveals the same repeating pattern of activation (figure 3b). Note that the focus is not on what was learned where; the emphasis is on examining and understanding the self-organizing capabilities of the architecture. For all experiments, most collaboration occurs when there is a break in the symmetry of the movement, i.e. at the letter C (see table 1); Y, M and A are symmetrical. It should be noted that there is more collaboration during M than for Y and A; this could be due to the introduction of elbow joint control, whereas Y and A are controlled mostly by the shoulder joints (see figure 4). The lack of symmetry could be a reason why the letter C is harder to learn, so that more modules collaborate to control the robot. This is backed up by neuroscientific findings showing that nonsymmetric actions in bimanual movements interfere and take longer to execute than symmetrical movements [18]. [19] argues for the presence of both shared and separate motor codes within the brain, explaining the varying degrees of coupling observed in neural activity when performing symmetric (strong correlation) and asymmetric (weak correlation) movements. This model accounts for the observed numbers of active modules in the MPMA with respect to symmetry and complexity as discussed above. There tends to be a switch between modules that is fairly coincident with the context signal, regardless of domination or collaboration between modules. The design of the context signal indicates where the division should be, but it is the modules that determine how to represent these motor primitives. Regarding the representation of the motor primitives, our results exhibit an encoding resembling d'Avella's notion of muscle synergies [20], where the representation is grounded in muscle or joint synergies, not in a single module. Observing these phenomena in the MPMA, which is inspired by how the brain works, is an indication that the architecture exhibits certain desirable properties (keeping in mind its limited scope and complexity compared to the brain). We believe it is a good starting point for modeling the important cognitive function that is imitation learning.

7 Future Work

Future work will investigate how the MPMA scales with increasing lengths and complexities of movements to be imitated, to see if there are saturation points in the architecture where more neural resources must be added. Another focus is investigating how the MPMA captures attractors through self-organization [21], along with a methodology for evaluating which modules capture what in the motor control space.

References

1. Piaget, J.: Play, dreams and imitation in childhood. W. W. Norton, New York (1962)
2. Meltzoff, A.N., Moore, M.K.: Explaining facial imitation: A theoretical model. Early Development and Parenting 6 (1997) 179–192
3. Rizzolatti, G., Fadiga, L., Gallese, V., Fogassi, L.: Premotor cortex and the recognition of motor actions. Cognitive Brain Research 3 (1996) 131–141
4. Schaal, S.: Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3(6) (1999) 233–242
5. Arbib, M.: The Mirror System, Imitation, and the Evolution of Language. In: Imitation in animals and artifacts. MIT Press, Cambridge (2002) 229–280
6. Gallese, V., Goldman, A.: Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences 2(12) (1998)
7. Demiris, Y., Khadhouri, B.: Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems 54 (2006) 361–369
8. Wolpert, D.M., Doya, K., Kawato, M.: A unifying computational framework for motor control and social interaction. Philosophical Transactions: Biological Sciences 358(1431) (2003) 593–602
9. Wolpert, D.M., Miall, R.C., Kawato, M.: Internal models in the cerebellum. Trends in Cognitive Sciences 2(9) (1998)
10. Jordan, M.I., Rumelhart, D.E.: Forward models: Supervised learning with a distal teacher. Cognitive Science 16 (1992) 307–354
11. Kawato, M.: Feedback-error-learning neural network for supervised motor learning. In Eckmiller, R., ed.: Advanced neural computers. (1990) 365–372
12. Nehaniv, C.L., Dautenhahn, K.: The Correspondence Problem. In: Imitation in Animals and Artifacts. MIT Press, Cambridge (2002) 41–63
13. Torres, E.B., Zipser, D.: Simultaneous control of hand displacements and rotations in orientation-matching experiments. Journal of Applied Physiology 96(5) (2004) 1978–1987
14. Demiris, Y., Hayes, G.: Imitation as a dual-route process featuring predictive and learning components: a biologically-plausible computational model. In: Imitation in animals and artifacts. MIT Press, Cambridge (2002) 327–361
15. Tidemann, A., Öztürk, P.: Self-organizing multiple models for imitation: Teaching a robot to dance the YMCA. In: IEA/AIE 2007. Volume 4570 of Lecture Notes in Computer Science, Springer Verlag (2007) 291–302
16. Tolani, D., Badler, N.I.: Real-time inverse kinematics of the human arm. Presence 5(4) (1996) 393–401
17. Jaeger, H., Haas, H.: Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 304(5667) (2004) 78–80
18. Diedrichsen, J., Hazeltine, E., Kennerley, S., Ivry, R.B.: Moving to directly cued locations abolishes spatial interference during bimanual actions. Psychological Science 12(6) (2001) 493–498
19. Cardoso de Oliveira, S.: The neuronal basis of bimanual coordination: Recent neurophysiological evidence and functional models. Acta Psychologica 110 (2002) 139–159
20. d'Avella, A., Bizzi, E.: Shared and specific muscle synergies in natural motor behaviors. PNAS 102(8) (2005) 3076–3081
21. Kuniyoshi, Y., Yorozu, Y., Ohmura, Y., Terada, K., Otani, T., Nagakubo, A., Yamamoto, T.: From humanoid embodiment to theory of mind. In Pierre, S., Barbeau, M., Kranakis, E., eds.: Embodied Artificial Intelligence. Volume 2865 of Lecture Notes in Computer Science, Springer (2003) 202–218