Policy Regularized Model Predictive Control Framework for Robust Legged Locomotion by
Gerardo Bledt
B.S., Mechanical Engineering, Virginia Tech (2015)

Submitted to the Department of Mechanical Engineering and the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degrees of Master of Science in Mechanical Engineering and Master of Science in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2018

© Massachusetts Institute of Technology 2018. All rights reserved.
Author: Department of Mechanical Engineering and Department of Electrical Engineering and Computer Science, December 15, 2017

Certified by: Sangbae Kim, Associate Professor of Mechanical Engineering, Thesis Supervisor

Certified by: Russ Tedrake, Professor of Electrical Engineering and Computer Science, Thesis Supervisor

Accepted by: Rohan Abeyaratne, Professor of Mechanical Engineering, Chairman, Department Committee on Graduate Students

Accepted by: Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair, Department Committee on Graduate Students
Policy Regularized Model Predictive Control Framework for Robust Legged Locomotion by Gerardo Bledt Submitted to the Department of Mechanical Engineering Department of Electrical Engineering and Computer Science on December 15, 2017, in partial fulfillment of the requirements for the degrees of Master of Science in Mechanical Engineering and Master of Science in Electrical Engineering and Computer Science
Abstract

A novel Policy Regularized Model Predictive Control (PR-MPC) framework is developed to allow general robust legged locomotion with the MIT Cheetah quadruped robot. The full system is approximated by a simple control model that retains the key nonlinearities characteristic of legged contact dynamics while reducing the complexity of the continuous dynamics. Nominal footstep locations and feedforward forces for controlling the robot's center of mass are designed from simple physics-based heuristics for steady state legged movement. By regularizing the predictive optimization with these policies, we can exploit the known dynamics of the system to bias the controller towards the steady state gait while remaining free to explore the cost space during transient behaviors and disturbances. The nonlinear optimization makes use of direct collocation on the simplified dynamics to pose the problem with a highly sparse structure for fast computation. The generalized controller design is independent of the specific gait pattern and reference policy, and allows stabilization of aperiodic locomotion. Simulation results show dynamic capabilities in a variety of gaits including trotting, bounding, and galloping, all without changing the set of algorithm parameters between experiments. Robustness to sensor and input noise, large push disturbances, and unstructured terrain demonstrates the ability of the predictive controller to adapt to uncertainty.

Thesis Supervisor: Sangbae Kim
Title: Associate Professor of Mechanical Engineering

Thesis Supervisor: Russ Tedrake
Title: Professor of Electrical Engineering and Computer Science
Acknowledgments

First, I want to share my appreciation for the love and encouragement from my parents Carlos and Marcela throughout my whole life. Without them, none of what I have accomplished would have been possible. I dedicate the work in this thesis to my brother Carlos, who was immensely influential in my decision to follow in his footsteps to become an engineer and pursue research at a higher level. I express my gratitude to my advisor Sangbae Kim for allowing me the opportunity to work on a project that I am passionate about, granting me the freedom to explore a topic that I enjoy, and providing a wealth of intuition in the field of robotics. As part of the Biomimetic Robotics Lab, I want to thank my labmates (including Cheetah) for making my time in the lab and on the project fun and successful. In particular, a huge thanks to Pat Wensing, who helped guide me through the research process and provided an endless amount of information that made this thesis a success. Thanks to my thesis supervisor Russ Tedrake for agreeing to work with me as part of MSRP before I came to MIT, and then again for this thesis. Also, I would like to thank everyone who has helped me with all of the aspects of graduate school other than research. Especially Leslie Regan and Janet Fischer for all of their support throughout this year and for being so accommodating to all of the changes and difficulties during my time working towards completing the degrees. They made the whole process much simpler, which I greatly appreciate. Lastly, thanks to all my friends throughout my life in Madison, Virginia Tech, and MIT who have made my time outside of the lab and academics enjoyable and filled with great memories. They have provided a much needed distraction from all of the work associated with research.
Contents

1 Introduction 15
  1.1 Impacts of Mobile Robots 15
    1.1.1 Advances and Challenges in Legged Robotics 17
  1.2 Predictive Control 19
  1.3 Contribution 20

2 Theory and System Modeling 23
  2.1 Robot Model 23
    2.1.1 Kinematics 23
    2.1.2 Dynamics 24
  2.2 Gait Scheduling 28

3 Predictive Control Formulation 31
  3.1 Simplified Discrete Dynamics 32
  3.2 Generalized Prediction Horizon 35
  3.3 Physics-Based Policy Regularizer 36
  3.4 PR-MPC Formulation 38
    3.4.1 Cost Function 39
    3.4.2 Constraints 40

4 Simulation Results 45
  4.1 From Naïve MPC to PR-MPC 45
  4.2 Basic Locomotion 48
  4.3 Robustness Testing 52
    4.3.1 Disturbance Rejection & Environment Uncertainty 52
    4.3.2 Diverse Gait Stabilization 55
  4.4 Computation Timings 57

5 Conclusion 61
  5.1 Future Work 61
    5.1.1 Implementation on the MIT Cheetah 3 Robot 62
    5.1.2 Algorithm Improvements 64
  5.2 Implications 65
List of Figures

1-1 MIT Cheetah 2 Robot. Experimental legged robotic platform developed by the Biomimetic Robotics Lab at MIT. 19
1-2 Capabilities of PR-MPC. The novel framework allows for a wide variety of dynamic capabilities across multiple gaits and under disturbances with uncertain environments. 21
2-1 Kinematic Tree Model. The robot's kinematics are defined as a kinematic tree composed of kinematic chains attached to a free floating base. 24
2-2 Physical System Definitions. Coordinate systems and definitions for the rigid body model in 3D. The vectors $r_j$ specify the position of each foot relative to the body CoM, while forces $f_j$ provide the force under each foot. 25
2-3 Trot Gait Pattern. A gait pattern map for the running trot that defines contact states for the leg over time with a red bar and swing phases otherwise. Distinct contact phases are denoted by the dashed black lines. 28
3-1 Autonomous Navigation Plan. The robot will receive a goal state and autonomously generate a locomotion plan to safely navigate its environment. 31
3-2 Overall System Framework. Proposed block diagram framework for autonomous navigation with the MIT Cheetah. Detailed descriptions of the white blocks are presented while gray blocks represent components that are outside the scope of this work. 32
3-3 Gaited Prediction Horizon Definition. An example prediction horizon definition for the running trot gait with $\{Gait\} = \{N = 4, K = 3\}$. The algorithm returns an optimized plan for all $N$ phases depicted in blue, but executes only the first planned phase outlined in green. 36
3-4 Physical Constraint Depiction. The optimization is constrained to remain physically feasible throughout the prediction horizon. 40
4-1 Case A: Naïve MPC. Without any information from the simple physics references, the optimization returns forces and foot placements at local minima that are unable to stabilize the robot even during standing. 46
4-2 Case B: Force Seeded MPC. Resultant footstep locations without any footstep placement seeding or regularization cluster under the CoM. 47
4-3 Case C: Fully Seeded MPC. Resultant footstep locations with a given footstep placement seeding, but no regularization. 47
4-4 Case D: PR-MPC. Resultant footstep locations with both a seeded initial footstep location reference, as well as regularization for the full PR-MPC case. 48
4-5 Predicted Footstep Planning. A receding prediction horizon for planning returns $N$ future footstep locations for each foot. The next predicted footstep is used for planning the swing foot trajectories. 49
4-6 Accelerating From Rest. From rest, the controller is able to accelerate to a nominal steady state trotting pace while stabilizing the transient behavior during the acceleration periods. 50
4-7 Rapid Yawing. The robot turns with a commanded 30°/s yaw rate and rolls to match the natural motion expected from the conical pendulum model. 51
4-8 Trajectory Tracking. Given a trajectory that is assumed to be feasible, the controller tracks the trajectory throughout the simulation. 51
4-9 Push Recovery Sequence. Footstep planning with MPC allows the robot to choose new footstep locations autonomously to recover from disturbances. 52
4-10 Push Recovery Resulting States. With the control framework specified, all of the robot's states are able to quickly recover to the desired trajectory after large disturbances. 53
4-11 Push Recovery Optimized Inputs. Although the reference policy suggests no lateral forces, the optimization is able to return forces that help reject the push disturbance. 54
4-12 Rough Terrain Traversal. Successful locomotion over rough, uneven terrains with no change to the controller and no prior knowledge of the environment. 55
4-13 Comparison of Multiple Gait Footfalls. Resulting foothold locations for each of the optimized gaits throughout the simulation with respect to the CoM. Travel velocity direction is in the $+x$ direction. 56
4-14 Footfall Placements for Multiple Gaits. Footsteps are distinctive for each gait even though the same reference policy is used. 57
4-15 Gait Computation Time Comparison. PR-MPC solve timing and associated average step cost from simulation. The PR-MPC controller resulted in stable simulation for all data points shown. 58
4-16 Computational Benefits of PR-MPC. Comparatively, PR-MPC has lower solve times than the optimizations with less prior knowledge. 59
5-1 MIT Cheetah 3 Robot. The latest iteration of the MIT Cheetah project with improved hardware and software design. 62
5-2 MIT Cheetah 3 Simulation Environment. A realistic simulation environment shows promising early results for PR-MPC on the new robot using the full system dynamics. 63
List of Tables

2.1 System parameters of the simplified Cheetah model 27
4.1 Simulation cases 46
4.2 $Q$ and $R$ weighting matrix values 49
Chapter 1

Introduction

Robotics has long been motivated by a desire to augment and even surpass human capabilities in disaster response scenarios. In recent years, the Fukushima nuclear plant disaster directly resulted in the creation of the DARPA Robotics Challenge as a push to advance the state of robotics to the point that robots can be sent into disastrous situations without endangering human lives. This would directly save lives and help minimize the negative impacts of such events. While the challenge resulted in a large growth in robotics research and incredible advances in the fields of perception and autonomy, the main criticism was the robots' inability to navigate quickly between challenges. In a real emergency, the robot would not only have to sense its environment and make intelligent decisions, but also be able to safely move to wherever it needs to be in a timely manner, often while encountering uncertain terrains and unexpected obstacles, an ability that humans often take for granted.
1.1
Impacts of Mobile Robots
Many terrestrial robots are limited by their locomotion ability and are rendered incapable of assisting in situations where they would be well suited to replace human responders. For example, many wheeled robots are unable to traverse extremely uneven terrain and climb stairs, while legged robots have difficulty balancing and
reacting quickly to disturbances. If researchers are able to overcome these movement problems, robotics could have enormous impacts on the world. Ideally, robots should be able to surpass the physical capabilities of humans and therefore be able to access locations that would be too dangerous otherwise. In an effort to tackle these issues, recent research has been done to improve the capabilities of mobile robotics. When most people imagine robots, the vision of having fully functioning humanoids walking around seamlessly in our world usually comes to mind. They are able to interact with the current state of the world intelligently and complete tasks in any environment without help from human operators. While this vision may one day be a reality, there is still work to be done. However, there is already a push for development of mobile robots for more immediately practical and necessary applications. A large majority of the early robotics field has focused on developing high precision, single-task machines for repetitive movements in controlled environments such as assembly lines. These are robots that avoid environmental interactions and execute preplanned, structured movements. This is in contrast to mobile robots, which are designed to navigate around the world embracing varying degrees of contact interaction with the environment. Currently, there are mobile robots already in use for disaster response, surveillance, agriculture, maintenance, and countless other applications. However, while this is exciting progress, it is often the case that the robot was specifically designed to carry out a small number of specific repetitive tasks rather than for general utility. Some may even require modification to the structure or terrain in which they are operating. In disaster relief situations such as in an earthquake or the Fukushima nuclear plant disaster, the terrain and environment would be highly unstructured, uncertain, and extremely dangerous.
This would be the perfect opportunity to deploy a mobile robot to survey the situation or begin a search and rescue mission instead of sending in a human first responder. At this point, the robot should be able to traverse these treacherous situations reliably. This is a goal that those working on robotics hope to achieve, but there are still plenty of open problems in the field in need of solutions.
1.1.1
Advances and Challenges in Legged Robotics
Compared to other forms of mobile robots, legged robots are subject to a unique set of advantages, as well as difficulties. Due to the nature of legged locomotion, the dynamics are particularly difficult to control due to the distinct modes during which foot contact with the ground is initiated or broken. It is not uncommon for a legged robot to be underactuated during one phase and overactuated during another phase and quickly switch between these modes. While recent research has shown impressive autonomous dynamic maneuvers in wheeled vehicles [19], quadcopters [6], flapping wing robots [7], and gliders [25], the same level of dynamic capabilities has yet to be achieved in legged robots. Traditional controls that yield these results for non-hybrid systems have proven to allow for some forms of legged movements [26, 17, 53, 43], but have failed to show truly robust, dynamic locomotion. This is in part due to the hybrid nature of legged contact dynamics constantly switching between actuation modes and the high complexity characteristic of legged systems. As such, optimization based controls are an attractive alternative for hybrid systems with this degree of complexity and nonlinearity when traditional linear control techniques are not appropriate. Successful implementations have been shown in both simulation and real robots [35, 34, 56, 20]. The Hybrid Zero Dynamics (HZD) approach deals directly with the hybrid dynamics of the contact modes and has shown stable walking in bipeds through the optimization of periodic orbits for a given gait [52, 5]. With Differential Dynamic Programming (DDP), impressive acrobatics have been demonstrated on a full humanoid model, but the computations are still too slow for real time application [44].
Optimizations are able to handle the nonlinearities and intricate combinations of mode switches in legged robots where traditional controllers cannot, but often suffer from large computation requirements and unwelcome optimal solutions that render them infeasible in practice. To deal with these difficulties, the use of simple models, or templates [23, 13], has proven effective in creating planned inputs to control the full dynamics, as well as
meeting real-time constraints. A common simplification to legged locomotion is to approximate the full dynamics with the Spring Loaded Inverted Pendulum (SLIP) model [3]. The inverted pendulum dynamics have been well studied and provide a greatly reduced model for controlling a large degree of freedom system with high fidelity. The benefits of embedding the SLIP model dynamics into the HZD optimization are seen in [36]. Similarly, techniques for general limit cycle generation based on the SLIP model have proven to work on unknown terrain with various systems [8]. Simple descriptions for the Center of Mass (CoM) dynamics in running have enabled graceful real-time disturbance recovery [49, 11] in simulation experiments. In addition to the SLIP model for approximating CoM dynamics, other approaches have attempted to incorporate angular dynamics into simple models for planning and control. Templates that incorporate an angular state have been proposed [24], but their general application to control high-DoF robots remains open. In similar work, the control of centroidal angular momentum [30] has gained much attention as a heuristic for balance. The evolution of the centroidal momentum is linked to the net external wrench [48]. This approach generalizes the contributions of all legs into the torques and forces on the CoM. Using this, methods have shown that it can be exploited to accelerate off-line whole-body trajectory optimization [9] or simplify linear-quadratic control of balance [16]. While much of the research in legged locomotion is currently being done on humanoids, some notable advances have occurred with quadruped robots. Quadrupeds have distinct advantages due to the larger number of legs that can be used at a time for balance and disturbance rejection, but also feature added complexity in designing or generating gaits. Recent results show that they can be utilized in a variety of different scenarios that traditional non-legged robots cannot.
For example, the Spot-Mini robot from Boston Dynamics has shown the capability to climb stairs and stabilize several gaits. The HyQ quadruped robot from IIT implements a variety of different gaits and control on high slope terrain [12, 27]. ANYmal from ETH Zurich's Robotic Systems Lab has recently shown some advanced locomotion capabilities in unstructured terrains [18]. The MIT Cheetah 2 robot shown in Figure 1-1 has
demonstrated untethered autonomous jumping over obstacles [33] and high-speed, low-power bounding [31].
Figure 1-1: MIT Cheetah 2 Robot. Experimental legged robotic platform developed by the Biomimetic Robotics Lab at MIT.
1.2
Predictive Control
Recently, the use of Model Predictive Control (MPC) in robots has become increasingly popular. Model predictive control is achieved by solving an optimization that considers the modeled dynamics of the system over a receding prediction horizon and returns the set of control inputs that minimize a cost function for the desired behavior at the predicted timesteps. Variations include linear, quadratic, or nonlinear optimization problems, each with varying benefits and drawbacks. Research into the use of MPC for legged robotics has proven not only feasible, but has been shown to surpass the capabilities of some more traditional controllers. Results from applied MPC on hoppers [41], bipeds [37], humanoids [45, 14], and quadrupeds [29] have demonstrated the power of these predictive methods to plan movements for stabilization of complex systems with hybrid dynamics. With the improvements in computation times
through both more powerful computers and more efficient optimization solvers, MPC has been growing as an area of interest in robotics. Despite this, the computational demands of MPC still limit its application in dynamic high DoF systems. Bipedal humanoid walking trajectory generation for the CoM has been achieved with known footstep orientations using linear MPC [15, 55] and with optimized orientations using nonlinear MPC [28]. Large push recovery was demonstrated with a predictive controller on the Sarcos robot [42]. In addition to handling the complex dynamics, predictive optimization techniques can naturally handle kinematic and dynamic considerations, such as joint and torque limits, self-collisions, and contact friction through the constraints and cost function. In an extension to DDP [45], simple heuristics are injected into the optimization in order to return controls that are feasible and safe. Predictive techniques have the distinct advantage of being able to anticipate future contact points that can be leveraged for stability later rather than attempting to instantaneously recover with the current contact mode. This widens the range of possible motions to include flight periods.
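To make the receding-horizon idea concrete, the following sketch evaluates candidate input sequences over a short horizon on a toy double-integrator model. The dynamics, cost, and exhaustive candidate search are illustrative stand-ins for the nonlinear program solved in this thesis, not its actual formulation.

```python
import numpy as np

def mpc_step(x0, x_ref, dynamics, cost, horizon, candidates):
    """One receding-horizon step: roll each candidate input sequence
    through the model over the horizon, accumulate the cost, and keep
    the best sequence. (A real MPC calls an NLP solver instead of
    enumerating candidates; this is only an illustration.)"""
    best_u, best_J = None, np.inf
    for u_seq in candidates:              # each u_seq holds `horizon` inputs
        x, J = x0, 0.0
        for k in range(horizon):          # simulate forward
            x = dynamics(x, u_seq[k])
            J += cost(x, x_ref, u_seq[k])
        if J < best_J:
            best_u, best_J = u_seq, J
    return best_u[0], best_J              # execute only the first input

# Toy double integrator: state [position, velocity], input = acceleration
dt = 0.1
dyn = lambda x, u: np.array([x[0] + dt * x[1], x[1] + dt * u])
cst = lambda x, xr, u: float((x - xr) @ (x - xr) + 0.01 * u * u)
x_ref = np.array([1.0, 0.0])                       # drive position to 1
candidates = [np.full(5, a) for a in np.linspace(-2.0, 2.0, 21)]
u0, J = mpc_step(np.zeros(2), x_ref, dyn, cst, 5, candidates)
```

At the next control tick the horizon recedes: the new measured state replaces `x0` and the search repeats, which is what makes the scheme reactive to disturbances.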
1.3
Contribution
The main contribution of this thesis is to develop and validate a novel control framework for generalized legged locomotion using a Policy Regularized Model Predictive Controller (PR-MPC) at its core. The approach uses a simple template model along with input regularization to deal with the complex legged locomotion dynamics that have made online nonlinear optimization difficult in robotics. Commonly used physics-based heuristics from the SLIP model and vertical impulse scaling are described as reference policies for foot positioning and corresponding feedforward ground reaction forces for a generic legged gait. This provides a good guess for the optimization solution in steady state. By regularizing the optimization inputs with these reference policies, the predictive controller biases the solution towards them without forcing a result. While respecting physical constraints, the optimization searches around the neighborhood of the policy for a solution that may yield better
state tracking of the desired trajectory. In particular, under disturbances and uncertain terrain, the nominal policy may not be optimal and may require the optimization to find a better set of inputs. Using a direct collocation approach gives the cost function, constraints, and their respective derivatives a sparse structure, which allows for a high degree of generality across multiple 3D gaits and reference policies. Various capabilities in different simulation conditions are showcased in Figure 1-2 using a single set of controller parameters for all cases.

[Figure 1-2 panels: Trotting with Disturbances, Galloping Turns, Bounding]
Figure 1-2: Capabilities of PR-MPC. The novel framework allows for a wide variety of dynamic capabilities across multiple gaits and under disturbances with uncertain environments.

Derivations, descriptions, and results throughout the thesis build on the introduction to PR-MPC as presented in previous work by the author as part of the MIT Biomimetic Robotics Lab [2]. Results from this thesis have encouraged the development of the controller for online implementation on the physical robot hardware. The proposed algorithm also provides a framework for testing various novel algorithms and techniques for gait or trajectory planning with a robust, dynamically stable locomotion controller.
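The regularization idea can be illustrated with a small sketch: a Raibert-style foot-placement heuristic stands in for the physics-based reference policy, and the stage cost softly penalizes deviation of the inputs from that reference without constraining them to it. The gains, function names, and structure below are hypothetical illustrations, not the thesis's actual policy or cost.

```python
import numpy as np

def raibert_footstep(p_hip, v, v_des, t_stance, k=0.03):
    """Raibert-style heuristic: place the nominal foothold under the hip,
    shifted by half the expected stance travel plus a velocity-error
    feedback term. Gains and structure are illustrative."""
    return p_hip + v * t_stance / 2.0 + k * (v - v_des)

def regularized_cost(x_err, u, u_ref, Q, R):
    """PR-MPC-style stage cost sketch: quadratic state tracking plus a
    soft penalty that biases the input toward the reference policy
    without forcing it there."""
    du = u - u_ref
    return float(x_err @ Q @ x_err + du @ R @ du)

# At the reference with zero state error the cost is zero; deviating from
# the policy is allowed but charged, which is the "bias, don't force" idea.
p_next = raibert_footstep(np.zeros(3), np.array([0.5, 0.0, 0.0]),
                          np.array([0.5, 0.0, 0.0]), 0.2)
```

In steady state the optimizer then sits near the heuristic solution, while disturbances make the tracking term dominate and pull the inputs away from the reference.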
Chapter 2

Theory and System Modeling

In this thesis, a control framework is designed, implemented, and tested in simulation for the MIT Cheetah 2 model. It serves as a proof of concept for the validity of the proposed method for robot locomotion. A dynamically accurate simulation environment was developed to evaluate the control performance. A successful implementation of the controller in simulation indicates that it is feasible to implement on hardware.
2.1
Robot Model
A system model for the MIT Cheetah 2 robot is defined. The kinematics and dynamics for the model are described throughout this section, both of which will be subject to certain appropriate simplifications of the actual robot. Because the MIT Cheetah is a complex, high degree of freedom system, key simplifications are made in order to simulate the robot while adhering to realistic approximate dynamics for the system.
2.1.1
Kinematics
The MIT Cheetah robot is made up of a solid base with four legs that each have three degrees of freedom. Each leg has two powerful custom motors with extremely high torque density that provide the hip and knee motions [50]. A relatively weaker abduction/adduction motor allows some lateral positioning of the feet. A four bar linkage allows the robot to be designed with two motors located at the shoulder which control both the shoulder joint and the elbow joint. This provides the benefit of having very light legs with low inertia [32]. As a result of this design having powerful, torque dense motors acting on low inertia legs, the robot is able to rapidly swing its legs in the air. The robot's kinematics can roughly be modeled as a rigid box floating base with linked rigid bodies attached to each of the hip and shoulder joints. The femur and foot rigid bodies are constrained to be parallel due to the four bar linkage that actually composes the links on the robot. By formatting the kinematics as a tree structure as shown in Figure 2-1, it is easy to use established tree traversal algorithms to relate the various joints to one another systematically.
Figure 2-1: Kinematic Tree Model. The robot’s kinematics are defined as a kinematic tree composed of kinematic chains attached to a free floating base.
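A minimal sketch of such a tree structure, assuming a single hypothetical hip-knee-foot chain hung from the floating base; the parent indices and fixed offsets below are made up for illustration (rotations are omitted) and are not the Cheetah's real geometry.

```python
import numpy as np

# Minimal kinematic-tree sketch: each body stores its parent index and a
# fixed translation from the parent frame. Index -1 marks the root (the
# floating base). Offsets are illustrative, not the robot's dimensions.
parents = [-1, 0, 1, 2]                       # base -> hip -> knee -> foot
offsets = [np.zeros(3),                       # base (tree root)
           np.array([0.35, 0.10, 0.00]),      # hip relative to base
           np.array([0.00, 0.00, -0.30]),     # knee relative to hip
           np.array([0.00, 0.00, -0.30])]     # foot relative to knee

def body_position(i):
    """Walk up the tree accumulating offsets: the standard leaf-to-root
    traversal used to relate joints to one another systematically."""
    p = np.zeros(3)
    while i != -1:
        p = p + offsets[i]
        i = parents[i]
    return p

foot = body_position(3)   # foot position expressed in the base frame
```

With joint rotations added at each node, the same traversal yields forward kinematics for every chain in the tree, which is why the tree formatting is convenient.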
2.1.2
Dynamics
Using the above kinematic definition, we can model the dynamics of the full system. However, because the robot is designed such that it has light legs with low inertia and only about 10% of the total mass, its dynamics can be simplified to only include the
base body while treating the legs as massless rigid bodies. This approach has been proven to be valid for this robot in [33], which uses the same single rigid body base assumption for bounding and autonomous jumping. Figure 2-2 depicts the dynamic coordinate system definitions as well as the system inputs. We first define an inertial coordinate system denoted by ℐ as the leading superscript. A body-fixed coordinate system is attached to the robot’s CoM and denoted by ℬ as the leading superscript. Physical quantities expressed without an explicitly defined coordinate frame, as denoted by the leading superscript, are assumed to be in the inertial, ℐ, frame.
Figure 2-2: Physical System Definitions. Coordinate systems and definitions for the rigid body model in 3D. The vectors 𝑟𝑗 specify the position of each foot relative to the body CoM, while forces 𝑓𝑗 provide the force under each foot.
Since we are considering the dynamic contribution of the legs to be small compared to the base body, the state of the free floating base is defined as

$$
x := \begin{Bmatrix} p_c \\ {}^{\mathcal{I}}q_{\mathcal{B}} \\ \dot{p}_c \\ \omega \end{Bmatrix}
\tag{2.1}
$$
where 𝑝𝑐 ∈ R3 provides the position of the body CoM, ℐ𝑞ℬ ∈ SO(3) the quaternion orientation representation of the body frame ℬ with respect to the inertial frame ℐ, and 𝜔 ∈ R3 the angular velocity of the body in the inertial world frame ℐ. These states describe the absolute state of the robot’s CoM.
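For illustration, the 13-dimensional state of Eq. (2.1) can be packed into a single vector; the ordering below follows the equation, but the concrete layout and the scalar-first quaternion convention are assumptions of this sketch.

```python
import numpy as np

# State x = [p_c (3), q_IB (4, unit quaternion), p_c_dot (3), omega (3)],
# a 13-vector matching Eq. (2.1). The quaternion is stored scalar-first;
# that convention is an assumption of this sketch.
def pack_state(p_c, q, p_dot, omega):
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)        # keep the orientation unit-norm
    return np.concatenate([p_c, q, p_dot, omega])

# Robot at the origin, identity orientation, at rest:
x = pack_state(np.zeros(3), [1.0, 0.0, 0.0, 0.0], np.zeros(3), np.zeros(3))
```

Keeping the quaternion normalized on every repack avoids the drift that accumulates when the orientation is integrated numerically.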
Forces $f_j \in \mathbb{R}^3$ acting at foot locations $r_j \in \mathbb{R}^3$ relative to the CoM provide an external input to the system. While these ground reaction forces at the foot locations are used to model the dynamics of the system, they cannot be directly commanded by the robot in practice. Instead, the motor torques at the leg joints are used to generate these forces through the relation

$$
\tau_j = J(q_j)^T f_j
\tag{2.2}
$$

where $J(q_j)$ is the Jacobian for leg $j$, which depends on the joint angles $q_j$. By design, the torques commanded to the motors can be assumed to match the actual torques produced by the motors. Foot positions relative to the inertial frame origin are denoted as $p_j = p_c + r_j$. Figure 2-2 graphically depicts the relationship between the CoM position, relative foot positions, and ground reaction forces.
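As an illustration of relation (2.2), the sketch below maps a desired foot force to joint torques through the transpose of a planar two-link leg Jacobian. The two-link geometry and link lengths are hypothetical stand-ins, not the Cheetah's actual leg kinematics.

```python
import numpy as np

def leg_jacobian(q, l1=0.3, l2=0.3):
    """Jacobian of a planar two-link leg (hip angle q1, knee angle q2,
    measured from the x-axis). Link lengths are illustrative."""
    q1, q2 = q
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

def joint_torques(q, f):
    """Eq. (2.2): tau_j = J(q_j)^T f_j maps a desired foot force to the
    joint torques that realize it (quasi-statically)."""
    return leg_jacobian(q).T @ f

# Torques needed for a 100 N downward push at an example leg configuration:
tau = joint_torques(np.array([0.5, -1.0]), np.array([0.0, -100.0]))
```

The same transpose relation holds for the real 3-DoF leg with a $3\times3$ Jacobian; only the Jacobian itself changes.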
The system input vector $u = \begin{bmatrix} r_1 & f_1 & \dots & r_4 & f_4 \end{bmatrix}$ contains the forces and relative foot position vectors for each of the legs, as these quantities are important to legged locomotion and provide the system input through contact forces. The net wrench created through this interaction of the feet with the ground includes terms which are bilinear in the forces $f_j$ and contact locations $r_j$. The net force and net moment about the CoM, $f$ and $\tau$ respectively, are given as

$$
\begin{Bmatrix} f \\ \tau \end{Bmatrix} = h(u) = \sum_{j=1}^{4} \begin{bmatrix} I \\ [r_j]_\times \end{bmatrix} f_j
\tag{2.3}
$$
where $[r_j]_\times \in so(3)$ is the skew symmetric matrix such that $[r_j]_\times b = r_j \times b$, for any $b \in \mathbb{R}^3$. Following this, the free floating base dynamics can be modeled as a 3D rigid box with dynamics standardly described by the equations of motion

$$
\dot{x} = f(x, u) = \begin{Bmatrix} \dot{p}_c \\ \dot{q} \\ \ddot{p}_c \\ \dot{\omega} \end{Bmatrix} = \begin{Bmatrix} \dot{p}_c \\ \tfrac{1}{2}\, q \otimes \mathcal{W} \\ \tfrac{1}{m} f - g \\ \bar{I}^{-1}\left(\tau - \omega \times \bar{I}\omega\right) \end{Bmatrix}
\tag{2.4}
$$
where 𝒲 = [0 𝜔]ᵀ, 𝑞 ⊗ 𝒲 denotes the quaternion product, and 𝑔 ∈ ℝ³ is the gravitational acceleration. The inertia tensor Ī ∈ ℝ³ˣ³ in inertial coordinates is given from the inertia in body coordinates, ᴮĪ, as Ī = ℐ𝑅ℬ ᴮĪ ℐ𝑅ℬᵀ.

The massless leg assumption greatly simplifies the model complexity and decreases simulation time while sacrificing little accuracy. As such, these simplified dynamics are fully described by equation 2.4 and assumed to closely approximate the true robot dynamics while needing to simulate 13 DoF rather than all 37 DoF for the full model including joint angles and velocities. With this in mind, we define the important parameters for the simplified MIT Cheetah 2. In Table 2.1, we can see the approximate parameters for the robot. Inertias are defined in the body frame, ℬ, as these are constant regardless of orientation. Body length and body width are the ᴮ𝑥- and ᴮ𝑦-axis dimensions of the robot base that provide the location of the hips. The hips are assumed to be at the same body-fixed ᴮ𝑧 location as the CoM.

Table 2.1: System parameters of the simplified Cheetah model

    Parameter      Value    Units
    m              33.0     kg
    ᴮĪ_xx          0.50     kg m²
    ᴮĪ_yy          2.90     kg m²
    ᴮĪ_zz          2.90     kg m²
    Body Width     0.20     m
    Body Length    0.70     m
    l_max          0.60     m
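A minimal sketch of the net wrench (2.3) and the free-floating base dynamics (2.4), assuming for simplicity that the inertia tensor is already expressed in inertial coordinates, and using the Table 2.1 parameters as defaults:

```python
import numpy as np

def skew(r):
    """[r]_x such that [r]_x @ b = r x b."""
    return np.array([[0, -r[2], r[1]],
                     [r[2], 0, -r[0]],
                     [-r[1], r[0], 0]])

def net_wrench(forces, rel_positions):
    """Eq. (2.3): net force and net moment about the CoM from the
    ground reaction forces at the relative foot positions."""
    f = sum(forces)
    tau = sum(skew(r) @ fj for r, fj in zip(rel_positions, forces))
    return f, tau

def quat_product(q, w):
    """Hamilton product q (x) W, with W = [0, omega]."""
    qw, qx, qy, qz = q
    ww, wx, wy, wz = w
    return np.array([
        qw*ww - qx*wx - qy*wy - qz*wz,
        qw*wx + qx*ww + qy*wz - qz*wy,
        qw*wy - qx*wz + qy*ww + qz*wx,
        qw*wz + qx*wy - qy*wx + qz*ww,
    ])

def xdot(p_dot, q, omega, forces, rel_positions,
         m=33.0, I=np.diag([0.50, 2.90, 2.90]), g=np.array([0.0, 0.0, 9.81])):
    """Eq. (2.4): free-floating base dynamics under the massless-leg
    assumption. The inertia is treated as inertial-frame constant here,
    a simplification of the sketch (the thesis rotates B_Ibar by R)."""
    f, tau = net_wrench(forces, rel_positions)
    q_dot = 0.5 * quat_product(q, np.concatenate([[0.0], omega]))
    p_ddot = f / m - g
    omega_dot = np.linalg.solve(I, tau - np.cross(omega, I @ omega))
    return p_dot, q_dot, p_ddot, omega_dot
```

As a sanity check, four symmetric feet each carrying a quarter of the robot's weight yield zero linear and angular acceleration.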
2.2 Gait Scheduling
Emergent gait generation is an open problem in robotics research that deals with automatic selection of footfall patterns and timings from some higher-level specifications such as CoM momentum [54]. However, this is outside the scope of this thesis. Therefore, the current framework uses operator-selected, fixed gait timings for choosing footfall patterns. Since we are using the massless leg assumption, we can enforce immediate touchdown and liftoff for the feet, meaning that gait timings can be enforced exactly. For example, Figure 2-3 shows the gait pattern map for a running trot.
Figure 2-3: Trot Gait Pattern. A gait pattern map for the running trot that defines contact states for each leg over time with a red bar and swing phases otherwise. Distinct contact phases are noted by the dashed black lines.
Diverse gaits can be automatically specified with these gait pattern maps, which provide a set of phases over a chosen gait period. Each phase has an associated percent duration of the total period, as well as an indication of which feet are currently in stance or in swing. A phase is more formally described as the duration in which the same set of feet are in stance or in swing, beginning and ending with either a touchdown or a liftoff event of any of the feet. These events are contact state transitions. The gait pattern map provides the boolean variable

$$s_\varphi = \begin{cases} 1, & \text{if contact} \\ 0, & \text{if swing} \end{cases} \tag{2.5}$$

where $s_\varphi$ is a phase-based state variable that gives the scheduled contact state of the leg according to the defined gait at the given time.
By defining gaits through this gait pattern map, it is possible to describe any cyclic gait. The legs do not need to share the same cycle, but for simplicity we will define the gaits as having an overall period time, 𝑇𝑝, and a stance time, 𝑇𝑠, equivalent for all of the legs. This inherently also provides the appropriate swing time to the swing leg trajectory generator. An offset between the beginning of each leg's stance period defines the overall gait pattern. Transitions between gaits are possible by interpolating between the phase offsets of each leg, as well as the timing parameters; however, we will assume that the gaits are defined and constant throughout the simulations. The majority of naturally occurring quadruped gaits follow this structure, as all legs tend to be in contact and in swing for roughly the same amount of time.
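The gait pattern map of (2.5) can be sketched as a simple phase-based lookup. The leg ordering (FR, FL, BR, BL), the phase offsets, and the stance fraction below are illustrative values for a running trot, not parameters taken from the thesis.

```python
def trot_schedule(phase, offsets=(0.0, 0.5, 0.5, 0.0), stance_fraction=0.4):
    """Eq. (2.5): scheduled contact states s_phi for each leg at a gait
    phase in [0, 1). Diagonal leg pairs share an offset, so they touch
    down together; a stance fraction below 0.5 yields flight phases."""
    phase = phase % 1.0
    s = []
    for off in offsets:
        local = (phase - off) % 1.0  # leg-local phase since its touchdown
        s.append(1 if local < stance_fraction else 0)
    return s
```

With these values, the diagonal pairs alternate and a short full-flight phase appears between them, as in the running trot of Figure 2-3.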
Chapter 3

Predictive Control Formulation

The aim of this work is to move towards a control framework that allows autonomous navigation from a starting location to a desired goal state. As depicted in Figure 3-1, the robot should be able to safely navigate its environment, avoiding obstacles and remaining on its feet throughout.
Figure 3-1: Autonomous Navigation Plan. The robot will receive a goal state and autonomously generate a locomotion plan to safely navigate its environment.
While autonomous navigation is the eventual goal of the project, this thesis focuses solely on a middle-level, predictive control algorithm for choosing foot placement locations and feed-forward forces on the stance feet based on a higher-level plan for a desired CoM trajectory. We will assume that the desired CoM path, 𝑥𝑑, is already safe and feasible. We also assume that the foot forces are adequately realized by joint torques and that those torques are correctly applied by the robot's motors. Furthermore, the state estimator is assumed to be accurate with varying degrees of zero-mean Gaussian noise. As mentioned previously in section 2.2, gaits are user specified and feature fixed desired phase timings and patterns.
Figure 3-2: Overall System Framework. Proposed block diagram framework for autonomous navigation with the MIT Cheetah. Detailed descriptions of the white blocks are presented while gray blocks represent components that are outside the scope of this work.
Each of the gray blocks represents a current open research problem that, for the purposes of developing this predictive middle-level controller, is assumed to be appropriately solved. Various solutions for accurate estimation of a robot's state have been demonstrated in [40, 4], while countless others have developed reliable trajectory planning methods [51, 22]. The outputs of each of the gray blocks are generated artificially to simulate actual outputs in practice.
3.1 Simplified Discrete Dynamics
Nonlinear MPC is generally not computationally feasible to use online in problems of this complexity, since possibly unwanted local minima create an irregular and poorly conditioned cost space; a simpler model removes some of these nonlinearities and smooths out the cost surface. Therefore, a control model is defined from a set of simplifications and approximations that strategically reduce the model complexity while still capturing the nonlinearities characteristic of legged locomotion, such as the interaction of the forces at the foothold locations with the CoM. This much simpler dynamic model is defined in order to improve computation time during the optimization for real-world applications.
The orientation of the robot is described in the control model through Euler angles as Θ = [𝜃, 𝜑, 𝜓]ᵀ, with an earth-fixed roll (𝜃), pitch (𝜑), yaw (𝜓) convention. While we consider motions with non-zero pitch and roll, their effects on the instantaneous dynamics of the system can be simplified. Consider the net external torque from (2.3) described in body coordinates as
$${}^{\mathcal{B}}\hat{\tau} = \sum_{j=1}^{4} {}^{\mathcal{I}}R_{\mathcal{B}}^{T}\, [r_j]_\times f_j \tag{3.1}$$
Under well-controlled locomotion, the pitch and roll of the base body are bounded above and below by a small value, meaning that the ᴮ𝑧 axis of the robot's body frame shown in Figure 2-2 remains close to parallel with the ℐ𝑍 axis of the inertial world frame. With this in mind, the transformation of the net moment into body coordinates can be approximated by noting that

$$\lim_{(\hat\theta,\hat\varphi)\to(0,0)} {}^{\mathcal{I}}\hat{R}_{\mathcal{B}}(\hat\Theta) = R_z(\hat\psi) \tag{3.2}$$

where $R_z(\hat\psi)$ denotes a rotation of $\hat\psi$ radians about the ℐ𝑍 axis. This provides an approximated version of (2.3) as

$$\hat{h}(\hat{x},\hat{u}) := \begin{Bmatrix} \hat{F} \\ {}^{\mathcal{B}}\hat{\tau} \end{Bmatrix} = \sum_{j=1}^{4} \begin{bmatrix} I_3 \\ R_z(\hat\psi)^T\, [\hat{r}_j]_\times \end{bmatrix} \hat{f}_j. \tag{3.3}$$
Again, such an approximation will only be valid provided that ᴮ𝑧 remains within some reasonably bounded cone along the ℐ𝑍 axis. This approximate body torque ᴮ𝜏̂ provides an approximate angular acceleration

$${}^{\mathcal{B}}\dot{\hat{\omega}} = {}^{\mathcal{B}}\bar{I}^{-1}\, {}^{\mathcal{B}}\hat{\tau} \tag{3.4}$$
where velocity product terms ${}^{\mathcal{B}}\omega \times {}^{\mathcal{B}}\bar{I}\, {}^{\mathcal{B}}\omega$ are neglected since they are small in practice. The approximate alignment of the body frame and a yaw-rotated frame can also be used to simplify the dynamics of the Euler angle rates. The body angular velocity ${}^{\mathcal{B}}\omega$ is related to the Euler angle rates $\dot\Theta$ through

$$\dot\Theta = B(\Theta)\, {}^{\mathcal{B}}\omega \tag{3.5}$$

where $B(\Theta) \in \mathbb{R}^{3\times 3}$ is an Euler-angle rate matrix [47]. Although this matrix has a complex form in general, using the previous assumption, for states where roll and pitch are small it can be approximated from

$$\lim_{(\theta,\varphi)\to(0,0)} B(\Theta) = I_3, \tag{3.6}$$
giving $\dot\Theta \approx {}^{\mathcal{B}}\omega$. Note that all of these approximations are applied to the dynamics of the system, allowing non-zero pitch and roll evolution over time, but neglecting their effects on the instantaneous dynamics. With these approximations, the transformation of torques to the body frame means that no nonlinearities in the approximate model dynamics appear outside of (3.3). The final simplified discrete control model is then described by

$$\hat{x}_{i+1} = A_i \hat{x}_i + B_i \hat{h}(\hat{x}_i, \hat{u}_i) + d_i, \tag{3.7}$$

where $A_i$ and $B_i$ are both fixed matrices and $d_i$ a fixed vector dependent on the discrete time step $\Delta t_i$,

$$A_i = \begin{bmatrix} I_6 & \Delta t_i I_6 \\ 0 & I_6 \end{bmatrix}, \quad B_i = \begin{bmatrix} \frac{\Delta t_i^2}{2}\, {}^{\mathcal{B}}I^{-1} \\ \Delta t_i\, {}^{\mathcal{B}}I^{-1} \end{bmatrix}, \quad d_i = \begin{bmatrix} \frac{\Delta t_i^2}{2}\, a_g \\ \Delta t_i\, a_g \end{bmatrix} \tag{3.8}$$

with ${}^{\mathcal{B}}I = \mathrm{diag}(m I_3, {}^{\mathcal{B}}\bar{I})$ and $a_g = [g^T, 0_{3\times 1}^T]^T$. These discrete dynamics have a simple form, and thus are a good candidate for MPC optimization. Their function evaluations will be rapid and the state transition constraints enjoy a great deal of sparsity. These benefits can collectively be exploited in planning forces and footsteps for locomotion. The results of this work show that these approximations are valid, accelerate optimization, and are applicable for a wide range of dynamic gaits.
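The fixed matrices of (3.8) can be assembled directly. This sketch assumes the state ordering [𝑝, Θ, 𝑝̇, 𝜔] and a downward gravity vector g = [0, 0, −9.81] m/s², both of which are conventions chosen here rather than stated explicitly in the thesis.

```python
import numpy as np

def discrete_dynamics_matrices(dt, m=33.0, I_body=np.diag([0.50, 2.90, 2.90]),
                               g=np.array([0.0, 0.0, -9.81])):
    """Eq. (3.8): fixed matrices of the simplified discrete model
    x_{i+1} = A_i x_i + B_i h(x_i, u_i) + d_i, state = [p, Theta, p_dot, omega].
    BI = diag(m I3, B_Ibar); a_g = [g^T, 0^T]^T with g taken as the
    (downward) gravity vector."""
    I6 = np.eye(6)
    BI = np.block([[m * np.eye(3), np.zeros((3, 3))],
                   [np.zeros((3, 3)), I_body]])
    BI_inv = np.linalg.inv(BI)
    a_g = np.concatenate([g, np.zeros(3)])
    A = np.block([[I6, dt * I6],
                  [np.zeros((6, 6)), I6]])
    B = np.vstack([dt**2 / 2 * BI_inv, dt * BI_inv])
    d = np.concatenate([dt**2 / 2 * a_g, dt * a_g])
    return A, B, d
```

Stepping a zero state with zero wrench reproduces pure ballistic fall: after one step of Δt, the height drops by gΔt²/2 and the vertical velocity by gΔt.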
3.2 Generalized Prediction Horizon
Legged locomotion is characterized by sets of stance and swing phases, in which certain combinations of legs may be on the ground or in the air at any given time. Therefore, rather than formulating the prediction horizon as the standard look-ahead in time, it is more intuitive to look forward a certain number of gait phases, 𝑁, selected by the user. This way, the prediction knows how much stance and flight time it has throughout the gait period and does not use too much or too little force by ending the horizon mid-stance or mid-flight. The prediction horizon is further discretized into a chosen number of divisions per phase, 𝐾. However, the simplified discrete dynamics model is linear during flight phases, since there are no feet in stance to exert a force on the ground and affect the CoM dynamics. Therefore, only one division per phase is needed during flight, 𝐾𝑓𝑙𝑖𝑔ℎ𝑡 = 1. This reduces the number of optimization variables without further loss of accuracy, since there can be no input forces during these ballistic flight periods anyway. The structure of the prediction for the example running trot gait described previously is shown in Figure 3-3. Here, the prediction horizon parameters of 𝑁 = 4 gait phases and 𝐾 = 3 divisions per phase have been arbitrarily selected for demonstration purposes. The gait pattern map is shown for 2 full periods, which are made of 4 phases each. Stance phases that are considered in the prediction horizon are shown in blue, while those that are not are shown in red. Although the prediction algorithm returns planned inputs for all considered phases, only the first phase, outlined in green, is actually used for the feed-forward forces. After the first planned phase is executed, a new multiple-phase prediction is calculated.
Figure 3-3: Gaited Prediction Horizon Definition. An example prediction horizon definition for the running trot gait with {𝐺𝑎𝑖𝑡} = {𝑁 = 4, 𝐾 = 3}. The algorithm returns an optimized plan for all N phases depicted in blue, but executes only the first planned phase outlined in green.

Using this definition for the generalized prediction horizon, the controller can look ahead an arbitrary number of gait phases with divisions that are arbitrarily fine. However, the algorithm has computational complexity of approximately O(NK), so each increase in 𝑁 and 𝐾 carries a heavy computational cost. This is an upper-bound estimate that does not take into account the reduction in 𝐾 during flight phases. For this reason, it is not practical to choose large values for these prediction horizon parameters.
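Building the division schedule for the generalized horizon might look like the following sketch, where flight phases (no feet in stance) receive a single division:

```python
def horizon_divisions(phase_contacts, K=3, K_flight=1):
    """Division schedule for an N-phase prediction horizon (Section 3.2).
    phase_contacts: per-phase lists of scheduled contact states; flight
    phases get K_flight divisions since the dynamics there are linear
    and input-free."""
    schedule = []
    for n, contacts in enumerate(phase_contacts, start=1):
        k_n = K if any(contacts) else K_flight
        for k in range(1, k_n + 1):
            schedule.append((n, k))  # (phase index, division index)
    return schedule
```

For the running trot of Figure 3-3 (two stance phases separated by two flight phases) with K = 3, this yields the 8 time steps shown in the figure.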
3.3 Physics-Based Policy Regularizer
This work presents a method for exploiting the known dynamics of the robot, biologically inspired observations about animal behavior during locomotion, and previous work in robotics through Policy Regularized Model Predictive Control. Rather than approaching the MPC locomotion problem as a black-box optimization, we can inject a known reasonable policy from simple, physics-based models. Regularization generally imposes structure on the optimization results to penalize complexity [46]. Due to the complex nonlinearity of legged robot contact interactions, injecting this regularization helps to smooth out the cost surface and bias the solution towards reasonable inputs without forcing the results.
These policies provide a reasonable initial guess for the foot locations and their associated ground reaction forces, but may not be the optimal solution. Furthermore, we can add feed-forward heuristics to correct for disturbances. The designed policies show up during the optimization both as an initial guess, 𝑢₀, and as a reference policy, 𝑢𝑟𝑒𝑓, which regularizes the control inputs. The nominal footstep locations are designed using a combination of the Raibert heuristic for forward walking with the inverted pendulum dynamics [39] and a feedback term from the capture point formulation to correct for deviations from the desired trajectory [38]. The step location is calculated from the CoM of the robot by

$$p_{ref,j} = P^T P \left( p_{c,0} + p_{h,j} + \frac{T_s}{2}\,\dot{p}_{c,d} + \sqrt{\frac{z_0}{\|g\|}}\left(\dot{p}_c - \dot{p}_{c,d}\right) \right) \tag{3.9}$$

where $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ is a projection matrix that gives the footstep location in the ℐ𝑋𝑌 plane, $z_0$ is the nominal height of locomotion, $p_{h,j}$ provides the position of hip $j$ relative to the CoM, and $p_{c,0}$ is the initial position of the CoM at the start of the step. For the purpose of constructing a reference control, a forward pass is calculated over the prediction horizon given the current state of the robot and the reference inputs. The CoM is approximated to follow the simplified dynamics in equation 3.7, yielding $p_{c,i}$ as the position of the CoM at division $i$. This provides a reference vector of foot $j$ relative to the CoM at division $i$ as

$$r_{ref,j,i} = s_\varphi \left( p_{ref,j} - p_{c,i} \right). \tag{3.10}$$
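A sketch of the footstep reference policy (3.9)–(3.10). The $P^T P$ product zeroes the vertical component, placing the reference footstep in the ground plane; the numerical values in the usage check are illustrative, not thesis parameters.

```python
import numpy as np

def footstep_reference(p_c0, p_hip, v_c, v_des, Ts, z0, g_mag=9.81):
    """Eq. (3.9): Raibert-heuristic footstep with a capture-point
    feedback term on the velocity error. P^T P projects onto the
    inertial XY plane (zero foot height)."""
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    step = p_c0 + p_hip + (Ts / 2.0) * v_des + np.sqrt(z0 / g_mag) * (v_c - v_des)
    return P.T @ (P @ step)

def foot_vector_reference(s_phi, p_ref, p_ci):
    """Eq. (3.10): reference foot vector relative to the CoM at
    division i, zeroed when the leg is scheduled in swing."""
    return s_phi * (p_ref - p_ci)
```

At steady state (actual velocity equal to desired), the capture-point term vanishes and the footstep lands half a stance time ahead of the hip projection.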
Reference forces are calculated equivalently for each foot using the principle of vertical impulse scaling from [31], as well as the conical pendulum model to generate feed-forward centripetal forces for turning [21]. These forces take the form

$$f_{ref} = \frac{m\, T_P}{\sum_{j=1}^{4} T_S\, s_{\varphi,j}} \begin{bmatrix} \cos(\psi_d)\, \dot{\psi}_d\, \|\dot{p}_{c,d}\| \\ \sin(\psi_d)\, \dot{\psi}_d\, \|\dot{p}_{c,d}\| \\ \|g\| \end{bmatrix} \tag{3.11}$$
where 𝑇𝑃 is the overall gait period and 𝑇𝑆 is the associated stance time for each leg. The vertical impulse scaling will allow the controller to anticipate flight periods and exert enough force during the known stance times to remain upright while in flight. A feedforward term could also be added to help with translational or rotational accelerations as well. Roughly, this PR-MPC approach says that given a certain gait and desired CoM trajectory under ideal conditions during steady state locomotion, we already know a good guess for the inputs and local minimum. However, if there are any disturbances or model uncertainties that cause it to deviate from the expected trajectory, the local minimum may change and the MPC will adjust the forces and foot placements to search for a new adequate set of inputs close to the determined policy.
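The vertical-impulse-scaled reference force (3.11) can be sketched as below; the guard against an all-swing (flight) phase is an implementation detail added here, not part of the thesis equation.

```python
import numpy as np

def force_reference(m, Tp, Ts, s_phi, yaw_des, yaw_rate_des, v_des, g_mag=9.81):
    """Eq. (3.11): reference GRF per stance leg, scaling the vertical
    force so the total impulse over the period Tp supports the weight
    during the scheduled stance time; the first two components are
    conical-pendulum centripetal feedforward for turning."""
    denom = sum(Ts * s for s in s_phi)  # total scheduled stance time
    if denom == 0.0:                    # flight phase: no reference force
        return np.zeros(3)
    speed = np.linalg.norm(v_des)
    return (m * Tp / denom) * np.array([
        np.cos(yaw_des) * yaw_rate_des * speed,
        np.sin(yaw_des) * yaw_rate_des * speed,
        g_mag,
    ])
```

For straight-line trotting with two legs in stance half the period, each stance leg's vertical reference equals the full body weight, so the average over the cycle supports the robot.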
3.4 PR-MPC Formulation
With our proposed framework, the MPC algorithm takes in the current state, as well as a desired trajectory and the gait pattern over the prediction horizon, returning optimized forces and foot locations. An iterative cost function and constraints are defined that penalize deviation from the desired states and constrain the solution to be physically feasible. The general MPC optimization framework is posed as

$$\min_{\chi} \sum_{i=0}^{T} \ell_i(\hat{x}_i, \hat{u}_i) + \ell_T(\hat{x}_T) \tag{3.12}$$

subject to

$$\hat{x}_{i+1} = f(\hat{x}_i, \hat{u}_i) \tag{3.13}$$
$$c_i(\hat{x}_i, \hat{u}_i) \le 0 \tag{3.14}$$
$$c_i'(\hat{x}_i, \hat{x}_{i+1}, \hat{u}_i, \hat{u}_{i+1}) \le 0 \tag{3.15}$$
PR-MPC is designed as a direct collocation formulation. The states and controls are both considered as decision variables, where each incremental cost depends only on the corresponding variables at the current time step. The constraints are defined to depend solely on current and adjacent time steps, as in (3.15). Due to this, the emergent banded-diagonal sparsity patterns within the cost and constraint Jacobians and Hessians can be exploited in the solvers [10]. In addition, since the solutions to nonlinear differential equations are often sensitive to small perturbations, single shooting trajectory optimization often suffers from a high degree of nonlinearity, as the dynamics constraints must be enforced throughout the considered time. In contrast, direct collocation formulations approximate the highly nonlinear dynamics constraints more appropriately by breaking them into smaller, less sensitive integration windows [10]. This requires that the dynamics hold only between adjacent time steps rather than continuously throughout the entire trajectory.
3.4.1 Cost Function
The hope is that by using this physics-based policy regularization, the cost function can be designed to follow the given trajectory in a similar manner to what we would see in nature. We define a state trajectory error term, $\tilde{x}$, and a measure of deviation from the policy, $\tilde{u}$, through the following two equations

$$\tilde{x} = x_d - \hat{x} \tag{3.16}$$
$$\tilde{u} = u_{ref} - \hat{u} \tag{3.17}$$

Each of these error terms will be used to create incremental costs at each step. By defining the following weighting matrices

$$Q \ge 0 \in \mathbb{R}^{12\times 12}, \tag{3.18}$$
$$R \ge 0 \in \mathbb{R}^{24\times 24}, \tag{3.19}$$

the cost function can be written as a sum of the weighted quadratic costs for state error and reference policy deviation,

$$\ell_i(\hat{x}_i, \hat{u}_i) = \tilde{x}_i^T Q_i \tilde{x}_i + \tilde{u}_i^T R_i \tilde{u}_i \tag{3.20}$$
$$\ell_T(\hat{x}_T) = \tilde{x}_T^T Q_T \tilde{x}_T. \tag{3.21}$$
This reference policy deviation is the regularization term which penalizes potentially undesirable solutions. By design, the policies bias the solution towards the reference, while allowing the optimization some freedom to explore the space around the initial guess for more desirable behavior. During steady state locomotion, the solution is generally expected to be close to the reference, but the optimization will take over and find a different solution during transient behavior and disturbances.
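The incremental and terminal costs (3.20)–(3.21) are plain weighted quadratics:

```python
import numpy as np

def incremental_cost(x_hat, u_hat, x_des, u_ref, Q, R):
    """Eqs. (3.16)-(3.20): weighted quadratic penalty on the state
    trajectory error and on deviation from the reference policy
    (the regularization term)."""
    x_err = x_des - x_hat
    u_err = u_ref - u_hat
    return x_err @ Q @ x_err + u_err @ R @ u_err

def terminal_cost(x_hat, x_des, Q_T):
    """Eq. (3.21): terminal state-error cost."""
    x_err = x_des - x_hat
    return x_err @ Q_T @ x_err
```

Note that with R = 0 the regularizer vanishes and the formulation reduces to the naïve, non-regularized MPC of Case A in Chapter 4.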
3.4.2 Constraints
Constraints must be added to the optimization for the results to produce physically realizable foot placements and ground reaction forces. For the duration of the prediction, the states must adhere to the simplified discrete dynamics defined for the robot's control model. Intuitively, we know the foot must be touching the ground to produce force. It is also required that this foothold on the ground must be within the kinematic limits. Furthermore, it is assumed that the foot will not slip or cause any ground deformation, which requires that the forces remain within a friction pyramid approximation for the surface. The physical representations of these constraints are seen in Figure 3-4, along with their implications.
Figure 3-4: Physical Constraint Depiction. The optimization is constrained to remain physically feasible throughout the prediction horizon.
These constraints are written as equality and inequality constraints. First, the dynamic feasibility constraint is written iteratively through the simple dynamics control model defined previously,

$$\hat{x}_{i+1} = \hat{y}_i = A_i \hat{x}_i + B_i \hat{h}(\hat{x}_i, \hat{u}_i) + d_i \tag{3.22}$$
with timestep $i$ being used since the dynamics are not gait phase dependent. Next, the constraints on the foot location are defined as follows. Throughout stance, the foot must be on the ground, meaning that the $Z$ coordinate of the foot in stance must be the same as the ground height at that point,

$$\hat{z}_g(p_{x,f}, p_{y,f}) = p_{z,f} \tag{3.23}$$
where $\hat{z}_g(p_{x,f}, p_{y,f})$ is the estimated ground height at the $(x, y)$ coordinates of the foot location. Since this thesis is primarily focused on developing the control algorithm and framework, the ground height will be uniformly estimated to be 0 m even if it is not. Future advancements will incorporate sensor information from a LiDAR or camera vision to estimate the ground height. We also know that the foot location must be reachable at each timestep, and therefore the magnitude of the vector from the hip to the foot must be less than the maximum length of the leg,

$$\left\| r_{h\to f} \right\| \le l_{f,max} \tag{3.24}$$

with $r_{h\to f}$ being the vector from the hip to the corresponding foot location. In addition, the foot is restricted during stance. Since the ground is also assumed to be solid, the foot position is constrained to be constant throughout stance by

$$s_{\varphi,f,i+1}\, s_{\varphi,f,i}\, p_{f,i+1} = s_{\varphi,f,i+1}\, s_{\varphi,f,i}\, p_{f,i} \tag{3.25}$$
where $s_{\varphi,f,i} \in \{0, 1\}$ is the gait-scheduled boolean variable. Here the constraint is only active if both the current and next foot contact states are scheduled to be in contact. Along with the constraint that the foothold location must be kinematically reachable at each time step, the foot position will be ensured to be within the kinematic limits throughout the entire stance period. This is how the dark gray area denoting kinematic feasibility throughout the foot's stance time in Figure 3-4 is implicitly enforced. The forces must also adhere to physical constraints. Since the robot's feet are designed to have a small base, in this case approximated as a point, and do not have any adhesive properties, the component of the force normal to the ground plane must be positive, as the foot cannot pull on the surface,

$$\hat{F}_f \cdot \nabla \hat{G}(\hat{p}_f) \ge 0 \tag{3.26}$$
where $\hat{G}(\hat{p}_f)$ is the estimated ground surface equation at the foot location. However, similarly to the ground height estimate in constraint equation 3.23, this implementation does not use sensor estimates for this surface. Instead, it is assumed to be a uniformly flat plane at height $\hat{z}_g(x, y) = 0, \forall(x, y)$, meaning $\nabla \hat{G}(\hat{p}_f) = [0, 0, 1]^T$. Finally, friction cone constraints are subjected to an inner pyramidal approximation for simplicity,

$$-\mu F_{z,f} \le F_{x,f} \le \mu F_{z,f} \tag{3.27}$$
$$-\mu F_{z,f} \le F_{y,f} \le \mu F_{z,f} \tag{3.28}$$
using a coefficient of friction $\mu = 0.3$. Although it is likely that most normal operating conditions will allow higher lateral forces, we do not have prior knowledge of the actual ground material properties or normal vector. Therefore, this value is chosen to purposely approximate a relatively conservative contact scenario between the footpad and the ground for the MIT Cheetah. These foot placement and ground reaction force constraints require the optimization to return physically realizable solutions. Therefore, the full PR-MPC optimization problem can be posed more functionally with the cost function and the equality and inequality constraints in the standard form as follows,
$$\min_{\chi} \sum_{n=1}^{N} \sum_{k=1}^{K_n} \left( \tilde{x}^T Q \tilde{x} + \tilde{u}^T R \tilde{u} \right)_{n,k} + \left( \tilde{x}^T Q \tilde{x} \right)_T$$

subject to

Simplified discrete dynamics: $\hat{x}_{i+1} - \left( A_i \hat{x}_i + B_i \hat{h}(\hat{x}_i, \hat{u}_i) + d_i \right) = 0$

Foot placed on ground: $\hat{z}_g(p_{x,f}, p_{y,f}) - p_{z,f} = 0$

Kinematic leg limits: $\left\| r_{h\to f} \right\| - l_{f,max} \le 0$

Foot stationary in stance: $s_{\varphi,f,i+1}\, s_{\varphi,f,i} \left\| p_{f,i+1} - p_{f,i} \right\| = 0$

Positive normal GRF: $-\hat{F}_f \cdot \nabla \hat{G}(\hat{p}_f) \le 0$

Lateral force friction cone: $|F_{x,f}| - \mu F_{z,f} \le 0$, $\quad |F_{y,f}| - \mu F_{z,f} \le 0$
It is important to note that the optimization structure is designed to be generalized to any set of desired state trajectories, reference inputs, and gait scheduled constraints. The reference policy is chosen from common heuristics and simplifications, but could easily be substituted without the need to change the controller. Similarly, any number of defined gait pattern maps could be used for contact scheduling. Another key point is to acknowledge that the model used for predictions is a simplified linear model of a complex system with many more DoF than we are considering. Because of this, it is not necessary to achieve high degrees of optimality, nor adhere to strict constraint tolerances. Solving approximate models to exact precision will still be subject to errors when executed on the robot. Therefore, part of the formulation dealt with managing a tradeoff between computation time and optimality.
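The inequality constraints for a single stance foot can be evaluated as signed violation values (non-positive when satisfied). This is a sketch of the constraint functions only, not of the solver; the flat-ground normal and the Table 2.1 leg length are used as defaults.

```python
import numpy as np

def constraint_violations(F, p_foot, p_hip, mu=0.3, l_max=0.60,
                          ground_normal=np.array([0.0, 0.0, 1.0])):
    """Inequality constraints of the PR-MPC problem for one stance foot
    (Section 3.4.2), each returned as a value that must be <= 0."""
    return np.array([
        np.linalg.norm(p_foot - p_hip) - l_max,  # ||r_hip->foot|| - l_max <= 0
        -F @ ground_normal,                       # -F . grad(G) <= 0
        abs(F[0]) - mu * F[2],                    # |Fx| - mu Fz <= 0
        abs(F[1]) - mu * F[2],                    # |Fy| - mu Fz <= 0
    ])
```

A force well inside the friction pyramid passes all checks, while one whose lateral component exceeds μFz flags the pyramid constraint.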
Chapter 4

Simulation Results

The PR-MPC algorithm for the MIT Cheetah robot was tested extensively in simulations carried out using a realistic dynamics environment in MATLAB. Simulations used a Runge-Kutta method provided by the built-in ode45 function for integrating the system dynamics, with events to detect gait phase timings. At the beginning of each gait phase, the PR-MPC algorithm was run using fmincon to provide the controller inputs over the upcoming phase. Foot locations were updated based on these inputs if a stance phase was scheduled to begin for the corresponding foot.
4.1 From Naïve MPC to PR-MPC
A set of simulations was run using the trot walk gait for several variations of MPC. The values for the initial seeded guess for the inputs, 𝑢̂𝑠𝑒𝑒𝑑, and for the input regularization weighting matrix for each of the four tested cases are shown in Table 4.1. Here the initial seeded input guess is broken into two parts, one for the force components, 𝑢̂𝑓,𝑠𝑒𝑒𝑑, and one for the foot position vector components, 𝑢̂𝑟,𝑠𝑒𝑒𝑑. The 0 entries specify all-zero entries for the respective vector or matrix as appropriate. The gradual addition of policy information into the optimization clearly shows the effects and importance of each part towards the success of the algorithm.
Table 4.1: Simulation cases

    Case    𝑢̂_f,seed    𝑢̂_r,seed    R
    A       0           0           0
    B       f_ref       0           0
    C       f_ref       r_ref       0
    D       f_ref       r_ref       R_f̂,r̂
The first and most basic variation, Case A, is referred to as naïve MPC because it does not incorporate any of the physics-based information; it is a non-policy-seeded, non-regularized optimization. The constraints are still enforced, as they are standard across all legged locomotion, but no prior knowledge about the inputs is used. Since we have an initial guess of all zeros for both foot locations and ground reaction forces, the solution converges almost immediately to an undesired local minimum with very little force and therefore does not provide a stable locomotion solution, as shown in Figure 4-1.
Figure 4-1: Case A: Naïve MPC. Without any information from the simple physics references, the optimization returns forces and foot placements at a local minimum that is unable to stabilize the robot even during standing.

Since the previous case proved infeasible for locomotion due to the inadequate ground reaction forces, the vertical impulse scaling based force reference policy was provided as an initial guess in Case B. Running the simulation with this setup was enough to counteract the effects of gravity, but the initial guess for foot locations was given as vectors of all zeros, since only the force policy is provided as an initial guess. The zero location, 𝑟𝑟𝑒𝑓 = [0, 0, 0]ᵀ, corresponds to the position at zero height directly under the CoM. Figure 4-2 shows how the poor initial guess results in all of the legs converging on average to this location, signifying that it is a local minimum.

Figure 4-2: Case B: Force Seeded MPC. Resultant footstep locations without any footstep placement seeding or regularization cluster under the CoM.
The real power of the regularization can be seen in comparison with Case C, which is the MPC version provided with the full reference policy as an initial guess, but not regularized. The optimization is free to explore the entire cost space within the constraints, without penalty for deviation from the reference. As seen in Figure 4-3, the majority of the footsteps are found around the nominal location, but on occasion a footstep deviates. While we command the robot to follow a flat orientation throughout, we know there are natural fluctuations from pitching and rolling motions. The lack of regularization means that footsteps and forces will search for a way to cancel these with no penalty and end up in local minima that may minimally reduce the cost, but at the expense of unwanted foot locations and forces.

Figure 4-3: Case C: Fully Seeded MPC. Resultant footstep locations with a given footstep placement seeding, but no regularization.
With the full algorithm, including the initial reference seed and the policy regularization, the optimization returns the expected results during forward steady state behavior without any disturbance. The footstep patterns are even and there are no major outliers, as seen in Figure 4-4. The gait looks similar to a natural animal trot, as it was designed to be, and the CoM does not experience any large fluctuations.

Figure 4-4: Case D: PR-MPC. Resultant footstep locations with both a seeded initial footstep location reference and regularization for the full PR-MPC case.
4.2 Basic Locomotion
With the PR-MPC allowing for successful standing and movement, basic locomotion capabilities were then tested. Weighting parameters were tuned by individually perturbing each state until a desired response was achieved. These were not precisely calculated, as the algorithm is designed and expected to stabilize locomotion robustly rather than depend on exact state and input weights. The values used throughout all of the simulation experiments are tabulated in Table 4.2. It is important to note that no weighting parameters needed to be changed in order to achieve the results presented for the disturbance rejection and environment uncertainty tests in section 4.3.1, nor did they need to be changed for any of the tested gaits in section 4.3.2.
Table 4.2: 𝑄 and 𝑅 weighting matrix values

    Parameter    Weight Vector          Units
    Q_p̂          [0, 0, 5000]           (1/m)²
    Q_Θ̂          [500, 1000, 2500]      (1/rad)²
    Q_ṗ̂          [2500, 2500, 2500]     (s/m)²
    Q_ᴮω̂         [2.5, 2.5, 300]        (s/rad)²
    R_r̂          [350, 350, 350]        (1/m)²
    R_F̂          [0.25, 0.25, 0.001]    (1/N)²
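Assuming the input ordering 𝑢 = [𝑟₁ 𝑓₁ … 𝑟₄ 𝑓₄] from Chapter 2 (an assumption about how the per-leg weights repeat, not stated explicitly in the table), the diagonal weighting matrices of Table 4.2 can be assembled as:

```python
import numpy as np

def weighting_matrices():
    """Diagonal Q (12x12) and R (24x24) from the per-state weight
    vectors of Table 4.2; R repeats the foot position and force
    weights for each of the four legs, ordered [r_j, f_j] per leg."""
    Q = np.diag(np.concatenate([
        [0, 0, 5000],        # CoM position
        [500, 1000, 2500],   # orientation
        [2500, 2500, 2500],  # CoM velocity
        [2.5, 2.5, 300],     # angular velocity
    ]))
    R = np.diag(np.concatenate(
        [[350, 350, 350, 0.25, 0.25, 0.001]] * 4
    ))
    return Q, R
```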
The simulation was run for 𝑁 = 3 future footstep predictions. This returned optimized forces for the current contact mode, as well as footstep locations and forces over the next two contact mode phases. The two future footstep locations can be seen in Figure 4-5. Although several future footsteps are predicted, only the current ones are used for computing the control forces. The first future footstep of the non-contact legs is used to design a nominal swing foot trajectory. For the purposes of this work, the swing trajectory is simple and arbitrary, as it has negligible effects on the CoM dynamics and can be easily replaced without modification to the controller.
Figure 4-5: Predicted Footstep Planning. A receding prediction horizon for planning returns 𝑁 future footstep locations for each foot. The next predicted footstep is used for planning the swing foot trajectories.
The reference policies are designed primarily for stable steady state locomotion during trotting. However, it is highly improbable that the references will be the optimal set of control inputs during non-steady state conditions. This is where the optimization takes over and explores the space around the initial guess for footstep locations and forces that will more closely follow the desired state trajectory. The regularization will ensure that the optimization does not settle into undesirable local minima. The state trajectories resulting from a desired velocity ramp can be seen in Figure 4-6. Transient behavior is seen as the robot first accelerates, but it eventually reaches bounded steady behavior as the desired velocity no longer changes. A constant small pitch offset is observed, which is due to the discretization assumption of forces being applied as impulses at the original footstep location.
[Figure 4-6 panels: orientation (roll, pitch, yaw) in degrees and velocity (x, y, z, desired x) in m/s versus time over 15 s.]
Figure 4-6: Accelerating From Rest. From rest, the controller is able to accelerate to a nominal steady state trotting pace while stabilizing the transient behavior during the acceleration periods.
Next, a simulation was designed involving rapid turning through a commanded forward velocity in the robot's body frame of 2 m/s and a desired yaw rate of 𝜓̇_𝑑(𝑡) = 30° sin(40𝜋𝑡) 1/s. The robot was able to follow the desired state trajectories as seen in Figure 4-7 while remaining stable, and a slight roll into the turn arises naturally from the inputs and the conical pendulum reference. Again, these basic locomotion capabilities were achieved without any need for changes to the control parameters.
[Figure 4-7 panels: orientation (roll, pitch, yaw), angular velocity (roll, pitch, yaw rates with desired yaw rate), and velocity (x, y, z with desired x and y velocities) versus time over 20 s.]
Figure 4-7: Rapid Yawing. The robot turns with a commanded 30°/s yaw rate and rolls to match the natural motion expected from the conical pendulum model.
As mentioned before, the trajectory generation is assumed to already produce a feasible and stable input to the prediction. The trajectory was designed to traverse a simulated world for the robot, depicted in Figure 4-8. In practice, this trajectory would likely be generated online based on external environment sensors.
Figure 4-8: Trajectory Tracking. Given a trajectory that is assumed to be feasible, the controller tracks the trajectory throughout the simulation.
4.3 Robustness Testing
When implemented on the real robot, situations will rarely match these idealized conditions, and therefore the controller should be able to handle a fair amount of situational uncertainty. After success with basic forward locomotion, the controller was subjected to a number of situations to test its robustness. These include impulse pushes, turning around obstacles, hard stops, uneven terrain, and added state and foot placement noise. All of these were also tested using various other gaits besides the standard trot walk.
4.3.1 Disturbance Rejection & Environment Uncertainty
As with many legged robots, a simple test is to push the robot during steady state locomotion and see if it is able to recover from the disturbance and return to the desired trajectory. The same trotting simulation was run, but the robot was subjected to an outside instantaneous impulsive push of net impulse [−23.1, 8.25, 0]ᵀ Ns occurring at 15 seconds, during the beginning of a stance phase. A full stance phase was allowed to occur before a replanned set of inputs was returned from the optimization. The sequence of events from the steady state locomotion, to the impulsive push causing a disturbance, to new input execution is shown in Figure 4-9.
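To put the severity of this push in context, an instantaneous impulse 𝐽 changes the CoM velocity by Δ𝑣 = 𝐽/𝑚. The body mass used below is an assumed placeholder, since the model mass is not restated in this section.

```python
import numpy as np

# Net impulse applied during the push test (from the simulation above).
J = np.array([-23.1, 8.25, 0.0])   # Ns
m = 33.0                            # kg -- assumed placeholder body mass

dv = J / m                          # instantaneous CoM velocity change
speed_change = np.linalg.norm(dv)   # magnitude of the velocity jump, m/s
```

With the assumed mass, the push kicks the CoM sideways by roughly three quarters of a meter per second, a substantial fraction of the nominal trotting speed, which is why replanned footsteps (not just force adjustments) are needed to recover.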
Figure 4-9: Push Recovery Sequence. Footstep planning with MPC allows the robot to choose new footstep locations autonomously to recover from disturbances.
Figure 4-10 shows state trajectories following this push disturbance, exhibiting large roll and pitch. Through these excursions, the net body torque approximation in (3.3) causes deviation between the control model and simulation. Despite these errors in the control model, stabilizing foot placements and forces are selected for closed-loop control. These results empirically support the admissibility of the approximations from Section 3.1 for generating appropriate controls. From analyzing the state trajectories during the simulation including the push, we can see that the robot successfully recovers, converging back to the original desired forward speed, the steady state trotting motion, and the constant pitch offset with an approximate time constant of 1 second.
[Figure 4-10 panels: orientation (roll, pitch, yaw), angular velocity (roll, pitch, yaw rates), position (x, y, z), and velocity (x, y, z, desired x) versus time from 14 s to 18 s around the push.]
Figure 4-10: Push Recovery Resulting States. With the control framework specified, all of the robot's states are able to quickly recover to the desired trajectory after large disturbances.

Prior to the push, the optimized forces and foot placements closely follow the reference, but when the push causes a large deviation from the steady state behavior, the optimization finds new forces and foot placements to deal with the disturbance. Figure 4-11 is the time series plot of the inputs just prior to and after the push disturbance. Since only a small amount of regularization weight was added to the forces, large spikes appear after the push to mitigate the pitch and roll effects. Foot placements were given larger regularization weighting and therefore remain closer to the nominal positions near their respective hips.
[Figure 4-11 panels: per-leg X, Y, and Z forces, total forces, and foot placements for each leg (FR, FL, BR, BL) versus time around the push.]

Figure 4-11: Push Recovery Optimized Inputs. Although the reference policy suggests no lateral forces, the optimization is able to return forces that help reject the push disturbance.
Although the controller is designed to incorporate ground information when available, the current simulations assume that the ground is flat with a constant zero height. This also means that the friction cones are calculated relative to a ground plane normal parallel to the gravity vector and that the foot will touch down at the desired location. However, real situations rarely follow these idealized environments. Therefore, the simulation world was modified as seen in Figure 4-12 to include obstacles, rough terrain with a randomly varying height of ±10 cm every 0.5 m, and random additive Gaussian noise on the foot placement coordinates, 𝑝_{𝑥,𝑦} = 𝑝̂_{𝑥,𝑦} + 𝒩(0, 1) cm. Footstep locations are projected onto the surface, resulting in persistent step placement errors that are handled through the PR-MPC approach. The robot is able to navigate its environment robustly, handling rough terrain with no knowledge of the terrain provided during planning.
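The terrain and noise models described above are simple enough to sketch directly. The grid hashing and RNG seeding below are illustrative implementation choices, not the thesis' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def rough_terrain_height(x, y, cell=0.5, amplitude=0.10, seed=0):
    """Piecewise-constant terrain: a random height in [-amplitude, amplitude]
    meters for every cell-by-cell meter patch, matching the +/-10 cm per
    0.5 m description. Hashing the cell index keeps the map consistent
    across queries within the same patch."""
    r = np.random.default_rng(hash((int(np.floor(x / cell)),
                                    int(np.floor(y / cell)), seed)) % 2**32)
    return r.uniform(-amplitude, amplitude)

def noisy_foot_placement(p_xy):
    """Additive Gaussian noise on the commanded foot x-y, sigma = 1 cm,
    mimicking the N(0, 1) cm placement error in the simulation."""
    return np.asarray(p_xy, dtype=float) + rng.normal(0.0, 0.01, size=2)
```

The controller only sees the commanded (noiseless, flat-ground) footholds; the perturbed touchdowns produced by functions like these are exactly the persistent errors the regularized optimization must absorb.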
Figure 4-12: Rough Terrain Traversal. Successful locomotion over rough, uneven terrain with no change to the controller and no prior knowledge of the environment.
4.3.2 Diverse Gait Stabilization
We have now shown that PR-MPC is capable of stabilizing the robot under the trot gait. However, since it is trivial to define multiple gait pattern maps for any number of desired nominal gaits, it is natural to test the generality of the controller with other gaits observed in nature. Gait pattern maps were generated for bounding and galloping. The same simulations were run for these different gaits, including the same controller gains as the walking trot. Throughout all testing, the same calculated reference policy for both forces and foot locations was used in each of the gait tests. Despite the same parameters and policy being used for all gaits, we can see in Figure 4-13 that the converged footfall patterns are distinct for each case. Footfalls are plotted relative to the CoM at each of the divisions per stance phase. In these tests, a prediction parameter of 𝐾 = 4 was used, which is why the footstep locations show up in clusters of 4. The simulation where speed is ramped up from 0 m/s to 1 m/s is used in each of these cases. Trotting features the longest stance time and therefore its clusters cover a longer range, while galloping has a short stance time per phase and therefore has tighter clusters.
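A gait pattern map of the kind referenced here can be encoded minimally as a boolean stance sequence per leg over one gait cycle. The phase divisions below are simplified illustrations, not the thesis' actual maps.

```python
import numpy as np

# Leg order: FR, FL, BR, BL. Each column is one phase of the gait cycle;
# True means the leg is scheduled to be in stance. These simplified maps
# are illustrative -- the thesis defines its own phase timings.
GAITS = {
    "trot":   np.array([[1, 0],          # FR with BL, then FL with BR
                        [0, 1],
                        [0, 1],
                        [1, 0]], dtype=bool),
    "bound":  np.array([[1, 0],          # front pair, then rear pair
                        [1, 0],
                        [0, 1],
                        [0, 1]], dtype=bool),
    "gallop": np.array([[1, 0, 0, 0],    # staggered single-leg stances
                        [0, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 1, 0]], dtype=bool),
}

def stance_legs(gait, phase_index):
    """Return indices of legs scheduled in stance at a given cycle phase."""
    pattern = GAITS[gait]
    return np.flatnonzero(pattern[:, phase_index % pattern.shape[1]])
```

Swapping in a different map changes only which legs the optimizer treats as contact legs in each predicted phase, which is why the rest of the controller carries over unchanged between gaits.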
Figure 4-13: Comparison of Multiple Gait Footfalls. Resulting foothold locations for each of the optimized gaits throughout the simulation with respect to the CoM. Travel velocity direction is in the +𝑥 direction.
As a clearer comparison, the first footstep input in each stance phase for all of the gaits is overlaid in Figure 4-14. Footsteps for the trot gait are steady and follow a straight line because they require little optimization, as the reference policy was chosen to roughly mirror a steady state trotting gait. Since bounding is also a gait that is symmetrical about the sagittal plane, the footsteps appear under the hips in the ℬ𝑦 direction, but are closer to the CoM in the ℬ𝑥 direction in order to reduce large pitching moments. The most drastic deviation from the reference policy is in the galloping gait. Since galloping has periods of single stance, forces are not evenly distributed about the ℬ𝑦 axis. As a result, the optimized foot locations tend to move closer to the sagittal plane in order to reduce roll. Interestingly, even though the same footstep reference policy is used for each gait, the footsteps end up converging towards more natural locations for steady locomotion.
Figure 4-14: Footfall Placements for Multiple Gaits. Footsteps are distinctive for each gait even though the same reference policy is used.
4.4 Computation Timings
In order for this approach to extend to the real world robot, the optimization must be solvable quickly in real-time. For our gait based formulation, the absolute upper bound that the solve time must stay under is the duration of the current gait phase. For example, since the stance time, 𝑇_𝑆, in the trot walk is defined to be 0.25 s, that is also the absolute upper bound on the time it takes for the optimization to converge. However, the goal is to be well below this upper bound so that the results are available as close as possible to when the most recent state estimate from the sensors is received. Several benchmark tests were run to characterize the computations for different gaits and prediction horizon parameters. Figure 4-15 shows the relative importance between prediction time (top row) and simulated optimization cost (bottom row) for the trot walk gait (left column) and bound (right column). Various choices of the gait parameters 𝑁 and 𝐾 over multiple trials show that, in general, computation time increases with more prediction steps, but the resultant normalized state deviation cost on the full simulation model using the resultant control inputs yields diminishing benefits from longer predictions. This signals that there is a tradeoff between using a longer prediction horizon to follow the desired trajectory and taking too long to compute these results.
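The hard real-time budget described above (the solve must finish within the current gait phase, e.g. 𝑇_𝑆 = 0.25 s for the trot walk) can be expressed as a simple guard around the solver call. This is an illustrative sketch; the fallback behavior on a blown budget is an assumption.

```python
import time

STANCE_TIME = 0.25  # s, trot-walk stance duration: the hard solve budget

def solve_within_budget(solver, budget=STANCE_TIME):
    """Run one MPC solve and report whether it met the real-time budget.
    If the budget is blown, discard the (now stale) plan and return None,
    signaling the caller to reuse its previous plan instead."""
    t0 = time.perf_counter()
    plan = solver()
    elapsed = time.perf_counter() - t0
    return (plan if elapsed < budget else None), elapsed
```

In practice the budget should be far below 𝑇_𝑆, since any time spent solving is time the robot executes controls computed for an outdated state.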
[Figure 4-15 panels: average prediction time (top row) and simulation cost (bottom row) versus prediction phases 𝑁 = 1–5 for 𝐾 = 1, 2, 3, shown for the trot walk (left column) and bound (right column).]
Figure 4-15: Gait Computation Time Comparison. PR-MPC solve timing and associated average step cost from simulation. The PR-MPC controller resulted in stable simulation for all data points shown.

The computational benefit of the four tested MPC variants was also explored. Identical simulation conditions and controller parameters were used for each of the trot, bound, and gallop gaits over multiple tests for each of the simulation cases. The results are shown in Figure 4-16, where we can see a common trend of decreasing computation time as more policy information is given to the controller optimization. This supports the hypothesis that the improved initial guesses from the physics-based heuristics allow for faster convergence, and that the regularization smooths out the cost space so that the solver takes fewer iterations to converge and does not explore needless local minima.
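The claim that better initial guesses speed up convergence can be illustrated with a toy iterative solver: gradient descent on a simple quadratic converges in far fewer iterations from a seed near the optimum (a stand-in for the reference-policy seed) than from a naive zero seed. Everything below is a toy illustration, not the thesis' solver.

```python
import numpy as np

def iterations_to_converge(x0, x_star, step=0.1, tol=1e-3, max_iter=10_000):
    """Gradient descent on f(x) = 0.5 * ||x - x_star||^2 from initial guess
    x0. Counts iterations until convergence -- a toy proxy for how a better
    seed (a reference-policy warm start) speeds up an iterative solver."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(x - x_star) < tol:
            return k
        x = x - step * (x - x_star)  # gradient of the quadratic is x - x_star
    return max_iter

x_star = np.array([1.0, 2.0])
cold = iterations_to_converge(np.zeros(2), x_star)    # naive zero seed
warm = iterations_to_converge(x_star + 0.01, x_star)  # policy-seeded guess
```

The gap between the two iteration counts mirrors the measured gap between the naive and policy-seeded MPC variants, though the real nonlinear solver also benefits from the smoother, regularized cost landscape.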
[Figure 4-16: normalized solve times for Case A: Naive MPC, Case B: Force Seeded, Case C: Full Seeded, and Case D: PR-MPC.]

Figure 4-16: Computational Benefits of PR-MPC. Comparatively, PR-MPC has lower solve times than the optimizations with less prior knowledge.

An important observation is that the computation time tends to decrease as more information is added to the optimization. However, in galloping the computation time actually increases when the footstep reference is added as an initial guess rather than the vectors of zeros. While this may seem to break the trend, it actually supports the underlying argument. For galloping, the naturally occurring local optimum was generally closer to the CoM than to the supplied reference policy, as seen in the footstep patterns in Figures 4-13 and 4-14. Since the reference footstep location was designed for trotting, it took longer for the galloping optimization to find the stable footstep locations characteristic of that gait. This may be a key consideration when designing the reference policy and may require an improved metric or policy for choosing better general footsteps.
Chapter 5

Conclusion

The work presented in this thesis is an important step towards a generalized 3D locomotion framework for legged robots. Through the use of simple physics-based heuristics as regularization terms in the predictive optimization, the robot is able to move around its environment without the need to specifically tune any parameters for different velocities, terrains, or even gaits. The wide range of capabilities shows promise that the controller will be able to stabilize the robot's motion under unforeseen conditions. It is important to note that while the controller in its current form performs well in simulation, no claims of guaranteed stability are made. PR-MPC performed much better than the other tested MPC variations in all locomotion and disturbance tests, in addition to converging to a solution much faster on average. This points to the power of the algorithm, both through biasing the solution towards physically meaningful results rather than a blackbox optimization result, and through reducing the need to explore the entire cost space via regularization.
5.1 Future Work
As with all theoretical and simulation work in robotics, the eventual goal is to successfully implement the results on a physical robot in the real world. The current framework and algorithm provide promising results in simulation, but work is currently underway for full integration with the experimental platform.
5.1.1 Implementation on the MIT Cheetah 3 Robot
Throughout this thesis, the MIT Cheetah 2 robot was used as the model for simulation. However, with the new MIT Cheetah 3 robot having been designed and built as seen in Figure 5-1, the plan is to implement the PR-MPC algorithm on the Cheetah 3 robot. Since both systems are at their core quadrupeds, the controller does not need any major modifications other than updating the physical parameters used in the controller, namely the base dimensions, mass, and inertia, to match the new robot. Cheetah 3's mechanical design also features many upgrades, such as a more powerful abduction/adduction motor that will allow for much greater control authority and range of motion.
Figure 5-1: MIT Cheetah 3 Robot. The latest iteration of the MIT Cheetah project with improved hardware and software design.
In simulation, we are afforded the ability to pause time and allow the optimization to return solutions instantaneously. However, on the real robot, results must be computed and returned to the robot in real time, making the current solve times of the MATLAB implementation too slow. The algorithm has been implemented in C++ using IPOPT, a faster nonlinear optimization solver, in order to be a viable option for carrying out the predictions in real time. Initial tests have shown improved computation times allowing for one second prediction horizon calculations at 60 Hz, which would allow use on a real time system. Even with faster computation for the optimization, the results are optimized for an instantaneous robot initial state, but by the time a control policy solution is found, it may no longer be the optimal one and perhaps no longer viable depending on contact state switches. Several timing issues must be worked out in the transition between paused time simulation and real-time experiments. An improved software architecture design allows simple simulation testing and integration of new controllers on the physical machine. Simulation testing can be carried out on a full dynamics model, seen in Figure 5-2, and with a simple switch, the same code can be compiled instead for testing on the actual robot. The simulation does not make the massless leg assumption, and a high degree of correlation between the simulation and real world robot behavior has been observed across all tests. Preliminary results using PR-MPC for successful locomotion have shown that it does generalize to the different robot parameters and legs with mass.
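One common way to handle the staleness issue raised here, where a solution is computed for a state that is already old when the solver returns, is to time-shift the previous plan by the compute delay and warm-start from it. This is a generic sketch, not the thesis' C++/IPOPT implementation.

```python
def shift_plan(plan, delay, dt):
    """Discard the first `delay` seconds of a plan sampled at step dt, so the
    remainder lines up with the robot's current (newer) state. The tail is
    padded by repeating the last input, a common warm-start heuristic."""
    k = min(int(round(delay / dt)), len(plan) - 1)
    shifted = plan[k:]
    return shifted + [shifted[-1]] * k
```

The shifted plan serves double duty: it is executed until the fresh solution arrives, and it seeds the next solve, which is consistent with the regularization philosophy of starting each optimization near a known-good answer.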
Figure 5-2: MIT Cheetah 3 Simulation Environment. A realistic simulation environment shows promising early results for PR-MPC on the new robot using the full system dynamics.
5.1.2 Algorithm Improvements
While the algorithm framework has resulted in very successful dynamic locomotion tests, this thesis was meant as a feasibility study into designing a predictive controller for a quadruped robot. The work presented here is the baseline iteration that contains the basic functionality, but it has not yet been fully explored. The following variations and improvements will likely be investigated as the direct next steps and are expected to increase robustness and generality. An improved general gait scheduler with leg-independent, flexible timings has been developed using a different controller; it allows individual scheduling of each leg and can be incorporated into the PR-MPC. It allows for early and late touchdown events as seen in [1]. The algorithm is written in a way that makes it simple to make use of early touchdowns and to neglect legs that are not yet in contact. A predictive CoM algorithm is also being developed that will allow for more desirable positioning depending on which legs are about to take off or touch down. By running an offline instance of the optimization on the full model and enforcing a periodicity constraint, we can generate limit cycles to incorporate into the desired trajectories. Since we know that many dynamic gaits will feature pitching and rolling motions, it may be beneficial to supply these gait specific, steady state limit cycles so that the controller is not fighting against the natural movement, but rather embracing it. The same could be done for designing improved initial touchdown footstep locations for the various gaits. The current policy is designed for steady state trotting, but we see that other gaits, such as galloping, have their own natural footstep locations. These footstep locations could be extracted or learned from the gathered simulation data. Along with the optimized states and inputs, the Lagrange multipliers for all of the constraints are also returned from the solver.
This includes the Lagrange multipliers for the robot dynamics. Since the forces appear in these constraints, we may be able to use the multipliers to generate an instantaneous feedback controller that runs at a much faster rate while a new solution is being computed in the nonlinear solver. This would provide multiple benefits. For instance, in the push recovery test, the controller would act instantly to return to the desired state rather than waiting for a replan of steps at the next available PR-MPC solution. This would decrease deviations from the nominal trajectory and require less aggressive plans. Another benefit is that we could begin to add back some of the nonlinearities in the model and allow for an increased optimization time, since the instantaneous feedback controller would take care of small perturbations without the need for a complete replanning of footsteps and forces over a large horizon. Exciting preliminary results have been shown for a method to automatically select takeoff timings and improved reference footstep locations. This method would select the gait schedule based simply on the desired CoM states rather than a user defined gait pattern map or cyclic phase-based gait. It would lend itself naturally to integration with the PR-MPC, as the selection of this gait schedule would be generated online during the initial first pass. No modifications to the current PR-MPC would be needed, as the cost function and constraints are designed to be independent of the reference policy choice and gait schedule definition.
5.2 Implications
A successful implementation of PR-MPC on the robotic platform would represent a first pass at a generalized framework for real-time legged locomotion. Being able to traverse a variety of unstructured terrains blindly while successfully rejecting large, unexpected disturbances would signal improved robustness, broadening the range of possible operating environments for the robot as compared to previous methods. It also means that the framework can be used as a basis for reliable locomotion while many improvements and modifications are implemented as described in Section 5.1.2. The PR-MPC framework is a promising step towards extending the dynamic capabilities of legged robots.
Bibliography [1] Gerardo Bledt, Patrick M. Wensing, Sam Ingersoll, and Sangbae Kim. Contact model fusion for event-based locomotion in unstructured terrains. In 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, Submitted 2018. [2] Gerardo Bledt, Patrick M. Wensing, and Sangbae Kim. A predictive control framework to stabilize diverse gaits for the MIT cheetah. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, Canada, Sept. 2017. [3] R. Blickhan. The spring-mass model for running and hopping. Journal of Biomechanics, 22(11):1217 – 1227, 1989. [4] M. Bloesch, M. Hutter, M. A. Hoepflinger, S. Leutenegger, C. Gehring, C. D. Remy, and R. Siegwart. State Estimation for Legged Robots - Consistent Fusion of Leg Kinematics and IMU. In Robotics: Science and Systems VIII, 2012. [5] B. G. Buss, K. A. Hamed, B. A. Griffin, and J. W. Grizzle. Experimental results for 3d bipedal robot walking based on systematic optimization of virtual constraints. In 2016 American Control Conference (ACC), pages 4785–4792, July 2016. [6] M. Cutler, N. Kemal Ure, B. Michini, and J. P. How. Comparison of fixed and variable pitch actuators for agile quadrotors. In AIAA Guidance, Navigation, and Control Conference (GNC), pages 1–17, Portland, OR, August 2011. [7] S. Dadashi, J. Feaster, G. Bledt, J. Bayandor, F. Battaglia, and A. Kurdila. Adaptive control for flapping wing robots with history dependent, unsteady aerodynamics. In 2015 American Control Conference (ACC), pages 144–151, July 2015. [8] H. Dai and R. Tedrake. Optimizing robust limit cycles for legged locomotion on unknown terrain. In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pages 1207–1213, Dec 2012. [9] Hongkai Dai, Andrés Valenzuela, and Russ Tedrake. Whole-body motion planning with centroidal dynamics and full kinematics. In IEEE-RAS International Conference on Humanoid Robots, pages 295–302, 2014. 67
[10] M. Diehl, H.G. Bock, H. Diedam, and P.-B. Wieber. Fast direct multiple shooting algorithms for optimal robot control. In Moritz Diehl and Katja Mombaur, editors, Fast Motions in Biomechanics and Robotics, volume 340 of Lecture Notes in Control and Information Sciences, pages 65–93. Springer Berlin Heidelberg, 2006. [11] J. Englsberger, P. Kozlowski, and C. Ott. Biologically inspired deadbeat control for running on 3d stepping stones. In IEEE-RAS Int. Conf. on Humanoid Robots, pages 1067–1074, Nov 2015. [12] Michele Focchi, Andrea del Prete, Ioannis Havoutis, Roy Featherstone, Darwin G. Caldwell, and Claudio Semini. High-slope terrain locomotion for torquecontrolled quadruped robots. Autonomous Robots, 41(1):259–272, Jan 2017. [13] R. J. Full and D. E. Koditschek. Templates and anchors: neuromechanical hypotheses of legged locomotion on land. J. Exp. Biol., 202(23):3325–3332, 1999. [14] B. Henze, C. Ott, and M. A. Roa. Posture and balance control for humanoid robots in multi-contact scenarios based on model predictive control. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3253–3258, Sept 2014. [15] Andrei Herdt, Holger Diedam, Pierre-Brice Wieber, Dimitar Dimitrov, Katja Mombaur, and Moritz Diehl. Online Walking Motion Generation with Automatic Foot Step Placement. Advanced Robotics, 24(5-6):719–737, 2010. [16] A. Herzog, N. Rotella, S. Schaal, and L. Righetti. Trajectory generation for multicontact momentum control. In IEEE-RAS Int. Conf. on Humanoid Robots, pages 874–880, Nov 2015. [17] Seokmin Hong, Yonghwan Oh, Young-Hwan Chang, and Bum-Jae You. Walking pattern generation for humanoid robots with lqr and feedforward control method. In 2008 34th Annual Conference of IEEE Industrial Electronics, pages 1698– 1703, Nov 2008. [18] M. Hutter, C. Gehring, A. Lauber, F. Gunther, C. D. Bellicoso, V. Tsounis, P. Fankhauser, R. Diethelm, S. Bachmann, M. Bloesch, H. Kolvenbach, M. Bjelonic, L. Isler, and K. Meyer. 
Anymal - toward legged robots for harsh environments. Advanced Robotics, 31(17):918–931, 2017. [19] A. Katriniok, J. P. Maschuw, F. Christen, L. Eckstein, and D. Abel. Optimal vehicle dynamics control for combined longitudinal and lateral autonomous vehicle guidance. In 2013 European Control Conference (ECC), pages 974–979, July 2013. [20] Coleman Knabe, Robert Griffin, James Burton, Graham Cantor-Cooke, Lakshitha Dantanarayana, Graham Day, Oliver Ebeling-Koning, Eric Hahn, Michael Hopkins, Jordan Neal, Jackson Newton, Chris Nogales, Viktor Orekhov, John 68
Peterson, Michael Rouleau, John Seminatore, Yoonchang Sung, Jacob Webb, Nikolaus Wittenstein, Jason Ziglar, Alexander Leonessa, Brian Lattimer, and Tomonari Furukawa. Team valor’s escher: A novel electromechanical biped for the darpa robotics challenge. Journal of Field Robotics, 34(5):912–939, 2017. [21] Darren Krasny and David Orin. Evolution of a 3D gallop in a quadrupedal model with biological characteristics. Journal of Intelligent & Robotic Systems, 60:59–82, 2010. [22] Scott Kuindersma, Robin Deits, Maurice Fallon, Andrés Valenzuela, Hongkai Dai, Frank Permenter, Twan Koolen, Pat Marion, and Russ Tedrake. Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Autonomous Robots, 40(3):429–455, Mar 2016. [23] Scott Kuindersma, Frank Permenter, and Russ Tedrake. An efficiently solvable quadratic program for stabilizing dynamic locomotion. CoRR, abs/1311.1839, 2013. [24] Sung-Hee Lee and A. Goswami. Reaction mass pendulum (RMP): An explicit model for centroidal angular momentum of humanoid robots. In IEEE Int. Conf. on Rob. and Automation, pages 4667–4672, April 2007. [25] Anirudha Majumdar and Russ Tedrake. Funnel libraries for real-time robust feedback motion planning. CoRR, abs/1601.04037, 2016. [26] Sean Mason, Nicholas Rotella, Stefan Schaal, and Ludovic Righetti. Balancing and walking using full dynamics LQR control with contact constraints. CoRR, abs/1701.08179, 2017. [27] Jouni Mattila, Janne Koivumaki, Darwin G. Caldwell, and Claudio Semini. A survey on control of hydraulic robotic manipulators with projection to future trends. IEEE/ASME Transactions on Mechatronics, PP(99):1–1, 2017. [28] M. Naveau, M. Kudruss, O. Stasse, C. Kirches, K. Mombaur, and P. Soures. A reactive walking pattern generator based on nonlinear model predictive control. IEEE Robotics and Automation Letters, 2(1):10–17, Jan 2017. [29] M. Neunert, F. Farshidian, A. W. Winkler, and J. Buchli. 
Trajectory Optimization Through Contacts and Automatic Gait Discovery for Quadrupeds. ArXiv e-prints, July 2016. [30] David E. Orin, Ambarish Goswami, and Sung-Hee Lee. Centroidal dynamics of a humanoid robot. Autonomous Robots, 35(2):161–176, 2013. [31] H. W. Park, Sangin Park, and S. Kim. Variable-speed quadrupedal bounding using impulse planning: Untethered high-speed 3d running of MIT cheetah 2. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 5163–5170, May 2015. 69
[32] Hae-Won Park and Sangbae Kim. Quadrupedal galloping control for a wide range of speed via vertical impulse scaling. Bioinspiration and Biomimetics, 10(2):025003, 2015.
[33] Hae-Won Park, Patrick Wensing, and Sangbae Kim. Online planning for autonomous running jumps over obstacles in high-speed quadrupeds. In Proceedings of Robotics: Science and Systems, Rome, Italy, July 2015.
[34] Michael Posa, Cecilia Cantu, and Russ Tedrake. A direct method for trajectory optimization of rigid bodies through contact. The International Journal of Robotics Research, 33(1):69–81, 2014.
[35] Michael Posa, Twan Koolen, and Russ Tedrake. Balancing and step recovery capturability via sums-of-squares optimization. In Proceedings of Robotics: Science and Systems, Cambridge, Massachusetts, July 2017.
[36] I. Poulakakis and J. W. Grizzle. The spring loaded inverted pendulum as the hybrid zero dynamics of an asymmetric hopper. IEEE Transactions on Automatic Control, 54(8):1779–1793, Aug 2009.
[37] M. J. Powell, E. A. Cousineau, and A. D. Ames. Model predictive control of underactuated bipedal robotic walking. In Proc. of the IEEE Int. Conf. on Robotics and Automation, pages 5121–5126, May 2015.
[38] J. Pratt, J. Carff, S. Drakunov, and A. Goswami. Capture point: A step toward humanoid push recovery. In IEEE-RAS Int. Conf. on Humanoid Robots, pages 200–207, Genova, Italy, Dec. 2006.
[39] Marc H. Raibert. Legged Robots That Balance. MIT Press, Cambridge, MA, USA, 1986.
[40] Nicholas Rotella, Michael Blösch, Ludovic Righetti, and Stefan Schaal. State estimation for a humanoid robot. CoRR, abs/1402.5450, 2014.
[41] M. Rutschmann, B. Satzinger, M. Byl, and K. Byl. Nonlinear model predictive control for rough-terrain robot hopping. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 1859–1864, Oct. 2012.
[42] B. J. Stephens and C. G. Atkeson. Push recovery by stepping for humanoid robots with force controlled joints. In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages 52–59, Dec 2010.
[43] T. Turner Topping, Vasileios Vasilopoulos, Avik De, and Daniel E. Koditschek. Towards bipedal behavior on a quadrupedal platform using optimal control, 2016.
[44] Y. Tassa, T. Erez, and E. Todorov. Synthesis and stabilization of complex behaviors through online trajectory optimization. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 4906–4913, Oct 2012.
[45] Y. Tassa, N. Mansard, and E. Todorov. Control-limited differential dynamic programming. In Proc. of the IEEE Int. Conf. on Robotics and Automation, pages 1168–1175, 2014.
[46] Martin J. Wainwright. Structured regularizers for high-dimensional problems: Statistical and computational issues. Annual Review of Statistics and Its Application, 1:233–253, 2014.
[47] Kenneth Waldron and James Schmiedeler. Chapter 1: Kinematics. In Bruno Siciliano and Oussama Khatib, editors, Springer Handbook of Robotics. Springer, New York, 2008.
[48] P. M. Wensing and D. E. Orin. Improved computation of the humanoid centroidal dynamics and application in dynamic whole-body control. International Journal of Humanoid Robotics (Special Issue on Whole-Body Control), 13:1550039:1–23, 2015.
[49] Patrick M. Wensing and David E. Orin. High-speed humanoid running through control with a 3D-SLIP model. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Rob. and Sys., pages 5134–5140, Tokyo, Japan, Nov. 2013.
[50] Patrick M. Wensing, Albert Wang, Sangok Seok, David Otten, Jeffrey Lang, and Sangbae Kim. Proprioceptive actuator design in the MIT Cheetah: Impact mitigation and high-bandwidth physical interaction for dynamic legged robots. IEEE Trans. Robotics, 33(3):509–522, 2017.
[51] M. Wermelinger, P. Fankhauser, R. Diethelm, P. Krüsi, R. Siegwart, and M. Hutter. Navigation planning for legged robots in challenging terrain. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1184–1189, Oct 2016.
[52] E. R. Westervelt, J. W. Grizzle, and D. E. Koditschek. Zero dynamics of underactuated planar biped walkers. IFAC Proceedings Volumes, 35(1):551–556, 2002. 15th IFAC World Congress.
[53] Eric R. Westervelt, Jessy W. Grizzle, Christine Chevallereau, et al. Feedback control of dynamic bipedal robot locomotion, 2007.
[54] D. Wettergreen and C. Thorpe. Gait generation for legged robots. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 2, pages 1413–1420, Jul 1992.
[55] Pierre-Brice Wieber. Trajectory free linear model predictive control for stable walking in the presence of strong perturbations. In Humanoids, 2006.
[56] Alexander W. Winkler, Farbod Farshidian, Diego Pardo, Michael Neunert, and Jonas Buchli. Fast trajectory optimization for legged robots using vertex-based ZMP constraints. CoRR, abs/1705.10313, 2017.