An Optimal Control Approach for Hybrid Actuator System - CiteSeerX

2011 IEEE-RAS International Conference on Humanoid Robots Bled, Slovenia, October 26-28, 2011

An Optimal Control Approach for Hybrid Actuator System Takamitsu Matsubara∗† , Tomoyuki Noda† , Sang-Ho Hyon†‡ , and Jun Morimoto† ∗ Graduate

School of Information Science, Nara Institute of Science and Technology, Nara, Japan † Department of Brain Robot Interface, ATR Computational Neuroscience Laboratories, Kyoto, Japan ‡ Department of Robotics, Ritsumeikan University, Shiga, Japan

Abstract—In this study, we consider the control problem of a hybrid actuation system that is composed of more than two different types of actuators. Even though actuators are one crucial component for developing a robot with many degrees of freedom, such as humanoid and exoskeleton robots, technical difficulties and costs complicate the development of new actuators. On the other hand, since different types of actuators provide various strengths and weaknesses, an efficient approach will probably combine different types of actuators to design a novel actuation system. The crucial issue for such a hybrid actuator is a method that distributes the necessary torque/force to different actuators for generating target movements. We propose using an optimal control method to find the torque distribution strategy for hybrid actuators. We consider an optimal torque distribution method for a pneumatic-electric (P-E) hybrid actuator model that is used in our newly developed exoskeleton robot XoR. We present a control approach with a trajectory-based optimal control method with minimum energy criterion. Focusing on the robot’s ankle joint, we apply our control method in numerical simulations and demonstrate that it reasonably finds the optimal torque distribution strategy based on the nature of the given task, and the results of high control performance.

Fig. 1. The overview of a lower limb exoskeleton robot: XoR aimed at rehabilitation on biped walking or postural control for elderly persons, spinal cord injury, or stroke patients. The robot employs a pneumatic-electric (P-E) hybrid actuator, which is composed of an antagonistic pneumatic actuator and DC servo motor, on each joint for performing both compliance and performance in the controlled joint.

I. I NTRODUCTION The development of light weight actuators that can generate large torque/force is crucial for a robot can have many degrees of freedom, such as humanoid and exoskeleton robots. However, developing new actuators with prominent properties is quite difficult due to their technical difficulties and costs. On the other hand, since different types of actuators have different strengths and weaknesses, an efficient approach will probably combine different types of actuators to design a novel actuation system. The crucial issue of such a hybrid actuator is a method that distributes the necessary torque/force to different actuators for generating target movements. In this paper, we propose using an optimal control method to find the torque/force distribution strategy for the hybrid actuator. We consider an optimal torque distribution method for a pneumatic-electric (P-E) hybrid actuator model that is used in our newly developed exoskeleton robot: XoR [1] (Fig. 1). XoR employs a P-E hybrid actuator on each joint, which is composed of an antagonistic setup of two pneumatic artificial muscles and a DC servo motor, and resembles the DM2 actuation approach [2], [3]. Each component of the P-E hybrid

978-1-61284-868-6/11/$26.00 ©2011 IEEE

actuator, i.e., a pneumatic artificial muscle and an electric motor, has different strengths and weaknesses. The pneumatic artificial muscle is one of the most attractive actuators for human-friendly robots because of its high power/weight ratio and its inherent compliance provided by the material and the air inside [2], [4]–[6]. However, such natural compliance limits the actuation performance in terms of the control frequency bandwidth. The DC servo motor has complementary characteristics and provides high actuation performance in terms of the control frequency bandwidth; however, it tends to have low power/weight ratio [7]. Although the P-E hybrid actuator potentially provides the capability to exploit both advantages of these different actuators, the best torque distribution strategy can be task dependent. Based on above understanding, we address the issue of finding a torque distribution strategy by an optimal control approach with a minimum energy criterion, as inspired by several successful studies in biological modeling (e.g., [8], [9]) and robotic systems (e.g., [10]–[12]). In our approach, the best distribution strategy for minimizing the energy consumption

300

㻼㻭㻹㻌㻞

㻼㻭㻹㻌㻝

P2

P1 τ a = r (F1 − F2 )

F1

F2 㻰㻯㻌㻹㼛㼠㼛㼞

τm

θ τ = τ m +τ a

Fig. 2. The overview of the 1 DoF pneumatic-electric hybrid actuator system on the ankle joint of the XoR. The system is composed of three parts: the antagonistic-setup of pneumatic artificial muscles, the DC servo motor, and the 1 DoF rigid body dynamics to be controlled.

required for the given task is derived as the optimal control policy. For example, if the given task requires high control frequency rather than large torques, a DC servo motor might be dominantly used because of both the performance and the energy efficiency. If the task requires only huge torque, pneumatic artificial muscles might be actuated. Note that our approach attempts to find the optimal control policy in a task specific manner. On the other hand, previous studies focused on development of a task-free torque/force distribution strategy [2], [3], [13]. The advantage our approach enjoys over them is the optimality for each task. To validate our approach, we first introduce the 1 DoF rigid body system on the ankle of the XoR with the P-E hybrid actuator. Then we apply our approach to our system in numerical simulations and demonstrate that our method reasonably finds the optimal torque distribution strategy based on the nature of the given task, resulting in high control performance. The rest of our paper is organized as follows. Section II describes a model of the 1 DoF P-E hybrid actuator system on the ankle joint of the controlled XoR. The actuator dynamics of both the pneumatic and electric actuators are modeled as well as a 1 DoF rigid body system. Section III shows our control strategy for the system using an optimal control method with the minimum energy criterion. We utilize the Iterative Linear Quadratic Gaussian (ILQG) method [14], an algorithm of trajectory-based differential dynamic programming, and efficiently derive a locally optimal feedback controller. Section IV shows the numerical simulation settings and the obtained results. Section V discusses the remaining challenges and future work. II. M ODELING OF P NEUMATIC -E LECTRIC H YBRID ACTUATOR S YSTEM This section describes the modeling of the P-E hybrid actuator system on a lower limb exoskeleton XoR. We focus on the 1 DoF actuator system on the robot’s ankle joint.

The system overview is depicted in Fig. 2. The system is composed of three parts: the antagonistic-setup of pneumatic artificial muscles, the DC servo motor, and the 1 DoF rigid body dynamics to be controlled. The model of each part and its integration is presented in the following subsections. A. A pneumatic antagonistic actuator model The pneumatic antagonistic actuator consists of two sets of a rubber muscle (FESTO) connected with an air pomp. Air pressure is provided to the muscle by the air piloted proportional pressure control valve as the regulator (NORGREN, VP50). Focusing on i-th muscle (i = 1, 2), the dynamic model of the internal air pressure Pi of the muscle is described with the electric valve voltage vvi as control variable by dPi (1) = −Rp Pi + vvi dt where Lp and Rp are constant coefficients that determine the time constant of the system. Then, the force Fi generated by the muscle on the string is approximated with the internal air pressure Pi and the reduction ratio ²i of the muscle [2], [4]–[6] as ½ α(1 − ²i )2 Pi + βPi + γ (h = 1) Fi = (2) 0 (h = 0) Lp

where α, β and γ are constant coefficients. h is a conditioning flag. As shown in Fig. 2, the rubber muscle is connected with the pulley at the joint through a fixed length of the string. If the string is stretched, the h gets 1 and the rubber muscle can transmit the generating force to the joint. Otherwise, the force cannot be transmitted to the joint at all, which cases are indicated by h = 0. Note that such a condition depends on the both states of the rubber and the joint. The reduction ratio of the rubber might have a dynamic model, however, we approximately model it by ²i = k² Pi in static. Since the two muscles are placed in antagonistic on a joint, the joint torque can be derived as τa = r (F1 − F2 )

(3)

where r is the radius of the pulley at the joint. The joint torque τa is assumed to be limited as τa ≤ τa max . B. A DC servo motor

finds a locally optimal control policy in a computationally efficient manner. Its advantage is the applicability for nonlinear dynamics and non-quadratic cost. A brief summary of the method is presented in the next section.

The dynamic model of the DC servo motor is simply represented as follows [7]: di = −Rm i + vc + vm (4) dt where i and vm are the motor current and voltage. Lm is the self-inductance coefficient and Rm is the resistance coefficient. vc is the constant counter-electromotive force. The torque generated by the motor τm has a proportional relationship with the motor current i as Lm

τm = ki i

(5)

where ki is the torque constant. The joint torque τm is assumed to be limited as τm ≤ τm max and it is generally much smaller than the maximum torque of pneumatic actuators, i.e., max τm ¿ τamax .

III. ILQG

In this section, we present an approach of optimal control utilized in deriving control policies for the pneumatic-electric hybrid actuator system in the next section. A. Problem statement The optimal control problem for the system dynamics f as in Eq.(8) is to find the optimal control policy π ∗ that minimizes the cost-to-go function V π as π ∗ ← arg min V π where

= −µ

dθ − mgs sin(θ) + τ dt

= τa + τm

(6)

(7)

where m is the mass, I is the moment of inertia, µ is the viscous friction coefficient, g is the gravity coefficient, s is the position of the center of mass of the link. D. 1 DoF P-E hybrid actuator system By integrating the above three components, the dynamics of the 1 DoF P-E hybrid actuator system is modeled as x˙ = f (x, u)

T

c(x(s), π(x(s)))ds

(10)

0

and τ

Z V π = h(x(T )) +

The torques generated by the above two actuators are applied to the 1 DoF rigid body system. According to the standard Newton-Euler equations [7], its dynamics is modeled by d2 θ dt

(9)

π

C. 1 DoF rigid body system

I

METHOD FOR F INDING A LOCAL OPTIMAL CONTROL POLICY

(8)

˙ P1 , P2 , i] ∈ R5 is the state variable and u = where x = [θ, θ, 1 2 [vv , vv , vm ] ∈ R3 is the control variable of the system. Note again that the above system has an actuator redundancy due to its multiple input {vv1 , vv2 , vm } and single output {τ }. Also, it is a nonlinear dynamical system because of the model properties of the pneumatic artificial muscles and the rigid body dynamics as shown in Eqs (2) and (6). For deriving a control policy with the redundancy resolution, we address by the optimal control with a minimum energy criterion. By this approach, the one combination of inputs among all the candidates is derived so that it minimizes the energy required to perform the given task as the optimal control policy. However, deriving such an optimal policy for the system would not be straightforward because of the nonlinearlity of the system. For finding the optimal control policy, we use the ILQG method as one of the trajectory-based optimal policy findings. The ILQG

and c(x(s), π(x(s))) is the running cost and h(x(T )) is the terminal cost. The above problem can only be analytically solved if the dynamics is a linear function and the cost function is written in a quadratic form in terms of the state and control variables as well known as Linear Quadratic Regulator (LQR) [15]. In such a case, the globally effective optimal control policy is analytically obtained. On the other hand, to deal with nonlinear dynamics or non-quadratic costs, the problem cannot be solved analytically. However, we are still able to find a locally optimal policy. Several algorithms have been proposed so far, such as Differential Dynamic Programming (DDP) [16] and Iterative Liner Quadratic Gaussian (ILQG) [14]. In this paper, we utilize the ILQG because of its simplicity and recent success of applications for real robotic systems as in [10]–[12]. B. Finding a local policy by the ILQG method The ILQG method begins with the time discretization of the dynamics and the control policy by the time step ∆t . First, the ¯ 11:T = method starts from an initial (1-th) control sequence u ¯ 2, · · · , u ¯ T } and corresponding state trajectory x ¯ 11:T = {¯ u1 , u ¯2, · · · , x ¯ T } obtained by applying u ¯ 1:T to the dynamics {¯ x1 , x sequentially as follows: ¯ k+1 = x ¯ k + ∆t f (¯ ¯ k ). x xk , u

(11)

¯ 11:T is iteratively improved in Second, the control sequence u the sense of the minimization of the cost-to-go function. For the improvement, the discrete-dynamics is linearly approximated by the first order Taylor expansion at around each trajectory of the state and control variables as follows: Ã ¯ ! ¯ ∂f ¯¯ ∂f ¯¯ δxk + δut (12) δxk+1 = I + ∆t ∂x ¯x¯ k ∂u ¯u¯ k

302

= Ak δxk + Bk δuk

(13)

¯ k and δuk = uk − u ¯ k are the state and where δxk = xk − x control derivatives. Also, the cost function is discretized in the time as V π = h(x(T )) +

T −1 X

∆t c(xk , uk )

(14)

k=1

and c(xk , uk ) in the above equation is approximated in a quadratic form at around the trajectory as c(xk , uk )

¯ k + δuk ) = c(¯ xk + δxk , u 1 T = qk + δxk qk + δkT Qk δxk 2 1 + δuTk rk + δuTk Rk δuk + δuTk Pk δk (15) 2

where ¯ k ), qk = c(¯ xk , u ¯ ∂ 2 ck ¯¯ Qk = , ∂x∂x ¯x¯ k ,¯uk ¯ ∂ck ¯¯ rk = , ∂u ¯u¯ k

¯ ∂ck ¯¯ qk = , ∂x ¯x¯ k ¯ ∂ 2 ck ¯¯ Pk = , ∂u∂x ¯x¯ k ,¯uk ¯ ∂ 2 ck ¯¯ Rk = . ∂u∂u ¯u¯ k

= lk + Lk δxk

¯ k + δuk = u ¯ k ). = Lk (xk − x

(16)

(17) (18)

Since it has an optimal feedback term, it may be effective even with modeling errors and disturbances. IV. O PTIMAL C ONTROL FOR THE P-E

The cost function for the point-to-point reaching task with the minimum energy criterion can be defined in Eq.(14) with the following running and terminal costs: =

c(xt , π(xt )) =

where lk is an open-loop term and Lk xk is an feedback term. ¯ i1:T using the open-loop By updating the control sequence u i+1 i i i ¯ 1:T ← l1:T + u ¯ 1:T , a new control sequence terms l1:T , as u ¯ i+1 u is obtained. The ILQG method iteratively improves the 1:T control sequence until its convergence. ¯ 1:T Finally, with the lastly improved control sequences u ¯ 1:T , the local optimal control policy and the state trajectory x is given as uk δuk

A. Cost function for the point-to-point reaching

h(xT )

With the above approximations, a local LQR problem, in which the state and the control variables are δx and δu, can be formulated with the linear dynamics and the quadratic cost function at around the current i-th trajectory. Since the problem at each time step can be considered as a LQR problem, it is efficiently solved though a modified Ricatti-like set of equations and it yields an affine control law as δuk

and zero velocity as the initial condition in the fixed time. To perform the task efficiently, the high control frequency rather than the large torques may be significant. Thus, the DC servo motor should be dominantly activated. In the Case 2, since the initial condition is set by a far position with a large velocity, the pneumatic artificial muscles might be activated than the Case 1 because large torques are necessarily required. We demonstrate with these two settings that how such taskdependent optimal torque distribution strategies are reasonably derived by our approach.

HYBRID ACTUATOR

SYSTEM

In this section, we describe the control of the P-E hybrid actuator system by the approach of the optimal control with the minimum energy criterion. Its experimental settings in simulations and the obtained results are presented. To validate our control approach to the system, we adopt a point-to-point reaching task in the joint angle and its velocity of the system with two different initial conditions: in the Case 1, the joint is required to reach the target position from a near position

(xdes − xT )T Wx (xdes − xT ) c(ut ) = uTt Wu ut

(19)

where Wx = diag([wθ , wω , wvv1 , wvv2 , wvm ]) and Wu = diag([wP1 , wP2 , wi ]) are the weighting matrices, and xdes is the target position to be reached. T is the index that indicates the finite horizon and its product with the time step ∆t defines the motion execution time. The settings of the weighting matrices Wx and Wu control the trade-off between the reaching accuracy and the energy consumption. B. Simulation Results We applied our control approach to the 1 DoF P-E hybrid actuator system in numerical simulations. ∆t was set by ∆t = 0.01. We have observed from the real hardware system shown in Fig. 2 that the pneumatic artificial muscle has a high power potential, however, it cannot generate a high frequency torques and cannot achieve a quick response due to the nature of the slow time constant of the electric air valve and dynamics of the rubber materials. The DC servo motor is complementary capable to generate high frequency torques and achieve a quick response although its maximum power is quite low. With these understanding in the real hardware system, we determined the system parameters of the model in simulation as shown in Table I. Also, the domain of the control variables were limited as in 0.0 ≤ vv1,2 ≤ 1.0 and −1.0 ≤ vm ≤ 1.0. Two initial conditions for the task were set as follows: Case ˙ P1 , P2 , i] = [−0.7, 0.0, 0.0, 0.0, 0.0], Case ¯ 0 = [θ, θ, 1: x ¯ 2: x0 = [−1.0, 1.7, 0.0, 0.0, 0.0]. Other settings in the cost function were commonly set as shown in Table II. As a control result, the trajectory on the phase plane of the joint is plotted in Fig. 3. The result for the P-E hybrid actuator system is indicated by “P-E hybrid”. As the comparisons, we also applied the optimal control to the two other systems: the electric actuator system which employs only the DC servo motor (indicated by “Motor”), and the pneumatic actuator system which employs only the pneumatic actuator (indicated by “Pneumatic”). For both cases, the electric actuator system cannot achieve the reaching to the target properly due to the lack of the power. The pneumatic actuator system resulted in better performance, however, could not precisely reach to the

303

1.5

3 Motor Pneumatic P−E hybrid

2 ω [rad/s]

ω [rad/s]

1

0.5

0

1

Motor Pneumatic P−E hybrid

0

−0.5 −0.8

−0.6

−0.4 −0.2 θ [rad]

0

−1 −2

0.2

(a) Case 1

−1

0 θ [rad]

1

2

(b) Case 2

˙ P1 , P2 , i] = ¯ 0 = [θ, θ, Fig. 3. Resulted trajectories of the system by our control strategy for two cases on the phase plane. (a) Case 1 starts from x ¯ 0 = [−1.0, 1.7, 0.0, 0.0, 0.0]. The results for the P-E hybrid actuator system are indicated by “P-E hybrid”. [−0.7, 0.0, 0.0, 0.0, 0.0]. (b) Case 2 starts from x The results for the electric actuator system which employs only the DC servo motor and for the pneumatic actuator system which employs only the pneumatic actuator are indicated by “Motor” and “Pneumatic”, respectively, as the comparisons. “x” represents the target to be reached by the control.

TABLE I M ODEL PARAMETERS . Parameter Lp α γ r Rm ki µ s

Value 1.0 100.0 0.0 0.5 1.0 2.0 0.001 0.25

Parameter Rp β k² Lm vc I m −

given task and the pneumatic actuator was not required. For the Case 2, using the pneumatic actuator for generating large torques was more required than the Case 1 because it started from a worse initial condition. Thus, these results validated that the usefulness of the P-E hybrid actuator and our control approach could take the advantages of the actuator by deriving a suitable torque distribution strategy based on the given task along with the minimum energy criterion.

Value 1.0 1.0 0.5 0.01 0.01 1 1 -

V. D ISCUSSION TABLE II TASK PARAMETERS . Parameter T xdes Wx Wu

Value 1.5 [0.0,0.0,0.0,0.0,0.0] diag([100.0, 20.0, 0.0, 0.0, 0.0]) diag([0.0001, 0.1, 0.1])

target due to the lack of the high control frequency. The P-E hybrid actuator system resulted in the best performance among them by exploiting both advantages of the two actuators. The time course of the state and the control variables obtained from our control strategy to the P-E hybrid actuator system are drawn in Fig. 4. The shadow area indicates the areas in which the pneumatic actuator was activated. By comparing the result from Case 2 to the Case 1, it is found that the pneumatic actuator is frequently activated. To quantify this, we calculated the total control signals separately PT for the total motor and the pneumatic actuator by Um = k=1 {vk }2 PT 1 2 and Uatotal = k=1 {vv,k }2 + {vv,k }2 as shown in Table III. These results suggest that for the Case 1, the DC servo motor was dominantly used because it was enough for achieving the

In this paper, we have considered a torque/force distribution strategy for hybrid actuator systems by optimal control. As a concrete example, we considered an optimal torque distribution method for a pneumatic-electric (P-E) hybrid actuator model that is used in our newly developed exoskeleton robot XoR. We presented a control method using a trajectory-based optimal control method with the minimum energy criterion. Focusing on the robot’s ankle joint, our control method was applied in numerical simulations. We demonstrated that our method reasonably found the optimal torque distribution strategy according to the given task, and the experimental results validated our approach. We are currently executing the system identification and implementation of our control method on the real hardware system. To overcome the modeling errors, the approach of the robust control might be taken [10]. ACKNOWLEDGMENT This study is the result of ”Brain Machine Interface Development“ carried out under the SRBPS, MEXT. This study is partially supported by the Grant-in-Aid for Scientific Research from Japan Society for the Promotion of Science (WAKATE-B22700177). We would like to acknowledge Akimasa Uchikata for helpful discussions.

304

Joint Angle [rad]

0 0.5

t [s]

1

1.5

Velocity [rad/s]

−1 0 2 0

0.5

t [s]

1

1.5 Motor Voltage [V]

−2 0 1 0 −1 0

0.5

1

1.5

Valve 1 [V]

t [s]

1 0.5 0 0

0.5

t [s]

1

1.5 Valve 2 [V]

Joint Angle [rad] Velocity [rad/s] Motor Voltage [V] Valve 1 [V] Valve 2 [V]

1

1 0.5 0 0

0.5

t [s]

1

1.5

(a) Case 1

1 0 −1 0

0.5

t [s]

1

1.5

1

1.5

1

1.5

1

1.5

1

1.5

2 0 −2 0

0.5

t [s]

1 0 −1 0

0.5

t [s]

1 0.5 0 0

0.5

t [s]

1 0.5 0 0

0.5

t [s]

(b) Case 2

˙ P1 , P2 , i] = [−0.7, 0.0, 0.0, 0.0, 0.0]. (b) Case 2 starts from x ¯ 0 = [θ, θ, ¯ 0 = [−1.0, 1.7, 0.0, 0.0, 0.0]. For Fig. 4. Control Results. (a) Case 1 starts from x the Case 2, the pneumatic actuator is more frequently activated than the Case 1 because the it starts from a worse initial condition and the task achievement requires large torques. The shadow area indicates the area in which the pneumatic actuator is activated.

TABLE III T OTAL C ONTROL S IGNALS .

Case 1 Case 2

total Um 106.0 110.9

Uatotal 32.3 64.8

R EFERENCES [1] S.-H. Hyon, J. Morimoto, T. Matsubara, T. Noda, and M. Kawato, “XoR: Hybrid Drive Exoskeleton Robot That Can Balance,” in Proceedings of the IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, 2011 (Accepted). [2] I. Sardellitti, J. Park, D. Shin, and O. Khatib, “Air muscle controller design in the distributed macro-mini (DM2) actuation approach,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007, pp. 1822–1827. [3] D. Shin, I. Sardellitti, Y.-L. Park, O. Khatib, and M. Cutkosky, “Design and control of a bio-inspired human-friendly robot,” International Journal of Robotics Research, vol. 29, no. 5, pp. 571–584, 2010. [4] C.-P. Chou and B. Hannaford, “Measurement and modeling of mckibben pneumatic artificial muscles,” IEEE transactions on robotics and automation, vol. 12, no. 1, pp. 90–102, 1996. [5] F. Daerden and D. Lefeber, “Pneumatic artificial muscles: Actuators for robotics and automation,” European Journal of Mechanical and Environmental Engineering, vol. 47, no. 1, pp. 11–21, 2002. [6] R. V. Ham, B. Verrelst, F. Daerden, and D. Lefeber, “Pressure control with on-off valves of pleated pneumatic artificial muscles in a modular one-dimensional rotational joint,” in IEEE International Conference on Humanoid Robots, 2003, pp. 1–15. [7] T. Yoshikawa, Foundations of robotics: analysis and control. MIT Press, 1990. [8] T. Flash and N. Hogan, “The coordination of arm movements: an experimentally confirmed mathematical model,” Journal of Neuroscience, vol. 5, pp. 1688–1703.

[9] E. Todorov, “Optimallity principles in sensorimotor control,” Nature Neuroscience, vol. 7, no. 9, pp. 907–915, 2004. [10] J. Morimoto and C. G. Atkeson, “Minimax differential dynamic programming: An application to robust biped walking,” in NIPS, 2002, p. 1539 1546. [11] D. Mitrovic, S. Nagashima, S. Klanke, T. Matsubara, and S. Vijayakumar, “Optimal feedback control for anthropomorphic manipulators,” in IEEE International Conference on Robotics and Automation, 2010, pp. 4143–4150. [12] D. Mitrovic, S. Klanke, and S. Vijayakumar, “Learning impedance control of antagonistic systems based on stochastic optimization principles,” The International Journal of Robotics Research, vol. 30, no. 5, pp. 556– 573, 2010. [13] M. Zinn, B. Roth, O. Khatib, and J. K. Salisbury, “Design and control of a bio-inspired human-friendly robot,” The International Journal of Robotics Research, vol. 23, no. 4-5, pp. 379–398, 2004. [14] E. Todorov and W. Li, “A generalized iterative lqg method for locallyoptimal feedback control of constrained nonlinear stochastic systems,” in American Control Conference, 2005, pp. 300–306. [15] R. F. Stengel, Optimal Control and Estimation. Dover Publications, 1994. [16] D. H. Jacobson and D. Q. Mayne, Differential dynamic programming. Elsevier, 1970.

305