Intelligent Planning and Control: Bringing Together Adaptive Control and Reinforcement Learning to Guarantee Optimal Performance and Robustness

Dr. Girish Chowdhary, Lab. for Information and Decision Systems, Massachusetts Institute of Technology ([email protected])
Dr. Tansel Yucelen, School of Aerospace Engineering, Georgia Institute of Technology ([email protected])
Assoc. Prof. Eric Johnson, School of Aerospace Engineering, Georgia Institute of Technology ([email protected])
Prof. Frank Lewis, Automation and Robotics Research Institute, University of Texas Arlington ([email protected])
Dr. Alborz Geramifard, Lab. for Information and Decision Systems, Massachusetts Institute of Technology ([email protected])
Prof. Jonathan How, Lab. for Information and Decision Systems, Massachusetts Institute of Technology ([email protected])
Summary

The problem of selecting the best sequence of actions in order to maximize the expected cumulative reward is common to both biological and engineered systems. Both autonomous planning and automatic control are concerned with choosing the right set of actions or inputs to maximize the cumulative reward obtained or to minimize the cumulative cost incurred. At the heart of many control and planning algorithms are mathematical models that capture the underlying physical phenomena in various engineering applications. The challenge in solving this problem arises from uncertainties that are not captured by the available mathematical models. These uncertainties are often introduced by approximations made while deriving physical models from first principles, unforeseen increases in system complexity, time variations, nonlinearities, disturbances, measurement noise, health degradation, and environmental uncertainty.

Adaptive control is a leading methodology for guaranteeing stable, high-performance control in the presence of uncertainty. The problem of making decisions in the presence of uncertainty has also been widely studied in the planning literature. The typical approach there is to formulate the problem in the Markov decision process (MDP) framework and search for the optimal policy. Solving MDPs without knowing the underlying model, using reinforcement learning methods, has become popular within the optimization community. The diminishing boundaries between individual fields of engineering and a greater emphasis on the performance requirements of the system as a whole are bringing the problems of planning and control under uncertainty closer than ever. This has motivated several researchers to search for commonalities between adaptive control and reinforcement learning.

The purpose of this workshop is to provide a detailed review of a number of well-established and emerging methods in both adaptive control and reinforcement learning by leading experts in the field. The goal is to create a venue for opening a pathway to merging ideas from these disciplines and to allow for a unified presentation of the problem of planning and control under uncertainty in a data-rich world.

Starting with an overview of nonlinear stability theory, the workshop will build a strong foundation of adaptive control techniques and the tools used in their stability and robustness analysis. Neuroadaptive control techniques will be introduced and their applications to fixed-wing and rotary-wing unmanned aerial vehicles will be discussed. State-of-the-art approaches to adaptive control will be presented, including command governor-based adaptive control for guaranteed transient and steady-state system performance and robustness, and derivative-free adaptive control for fast adaptation with effective disturbance rejection in the presence of sudden parameter changes. Information-enabled adaptive control techniques that leverage the growing ability to record and process large amounts of data online will be covered. These include the emerging area of nonparametric adaptive control, in which the structure and parameters of the adaptive element are changed online based on measured data. Applications of these methodologies to emerging areas, including large-scale interconnected dynamical systems and heterogeneous multivehicle systems, will also be discussed.
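As a concrete, if simplified, illustration of the adaptive control setting described above, the following is a minimal sketch of model reference adaptive control for a scalar plant with one unknown parameter. The plant, reference model, gains, and reference signal are hypothetical choices made only for illustration; they are not the specific formulations presented in the workshop.

    import numpy as np

    # Minimal model reference adaptive control (MRAC) sketch for a scalar plant
    #   xdot = a*x + b*(u + theta*x), with theta unknown to the controller.
    # Reference model: xmdot = am*xm + bm*r, with am < 0.
    # Lyapunov-based adaptive law: theta_hat_dot = gamma * x * e, where e = x - xm.
    # All numerical values below are illustrative assumptions.
    a, b = 1.0, 1.0            # nominal plant parameters (assumed known)
    theta = 2.0                # true uncertain parameter (unknown to the controller)
    am, bm = -4.0, 4.0         # stable reference model
    gamma = 10.0               # adaptation gain
    dt, T = 0.001, 10.0        # integration step and horizon

    x, xm, theta_hat = 0.0, 0.0, 0.0
    for k in range(int(T / dt)):
        t = k * dt
        r = 1.0 if int(t) % 2 == 0 else -1.0     # square-wave reference command
        e = x - xm                               # model tracking error
        # Control law: invert the known dynamics, add the reference-model
        # feedforward, and subtract the current estimate of the uncertainty.
        u = (am * x + bm * r - a * x) / b - theta_hat * x
        theta_hat += gamma * x * e * dt          # Lyapunov-based adaptive law
        x += (a * x + b * (u + theta * x)) * dt  # Euler step of the true plant
        xm += (am * xm + bm * r) * dt            # Euler step of the reference model

    print(f"final tracking error {x - xm:+.5f}, theta_hat = {theta_hat:.3f} (true theta = {theta:.1f})")

Note that the adaptive law drives the tracking error to zero; the parameter estimate itself converges to the true value only under sufficiently exciting commands, which is one of the issues (parameter convergence) revisited in the outline below.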
The workshop will then cover the fundamental tools used for planning and decision making under uncertainty. The basics of probability theory and Markov decision processes (MDPs) will be covered. These will lead directly to a discussion of reinforcement learning that highlights the commonality between dynamic programming and reinforcement learning. A wide range of reinforcement learning algorithms, including policy iteration, value iteration, Q-learning, SARSA, and least-squares temporal difference (LSTD) learning, will be reviewed. Novel advances in approximate reinforcement learning will be presented, including the recently developed incremental feature dependency discovery method, which discovers and exploits feature dependencies in the environment to form an efficient approximation of the value function.
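To make the reinforcement learning discussion concrete, the following is a minimal sketch of tabular Q-learning on a small chain-world MDP; the environment, reward, and learning parameters are invented for illustration and are not taken from the workshop material.

    import numpy as np

    # Minimal tabular Q-learning sketch on a hypothetical 5-state chain MDP.
    # Actions: 0 = left, 1 = right. Reaching the right end (state 4) gives
    # reward +1 and ends the episode; every other step gives reward 0.
    rng = np.random.default_rng(0)
    n_states, n_actions = 5, 2
    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # step size, discount, exploration rate

    def step(s, a):
        """Deterministic chain transition; returns (next_state, reward, done)."""
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = (s_next == n_states - 1)
        return s_next, (1.0 if done else 0.0), done

    Q = np.zeros((n_states, n_actions))
    for episode in range(500):
        s, done = 0, False
        while not done:
            # epsilon-greedy exploration
            a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)
            # Temporal-difference update toward the Bellman optimality target
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next

    print("greedy policy (0 = left, 1 = right):", np.argmax(Q, axis=1))

The same environment could equally be solved with SARSA by bootstrapping on the action actually taken rather than on the greedy action.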
Having presented an overview of both fields, the workshop will then cover emerging methods that bring reinforcement learning and adaptive control together. These include the intelligent cooperative control architecture and direct optimal adaptive control. Topics in health-aware planning, an emerging area that seeks to use internal information from adaptive controllers to gauge system health and plan accordingly, will be covered. Moreover, Benveniste's and Ljung's ODE methods will be covered, which provide ways to use well-established results from Lyapunov theory to analyze the stability of reinforcement learning and stochastic gradient descent algorithms. The workshop will conclude with a guided discussion on ways in which the two fields could be brought together to solve the problem of efficient planning and control under uncertainty with stability and robustness guarantees.
Outline

1. Motivation and Goals (20 minutes)
   a. Adaptive control for control under uncertainty
   b. Reinforcement learning for planning (and control) under uncertainty
   c. The need for an integrated methodology for robust adaptive planning and control under uncertainty
Section 1: Adaptive Control (1.5 hours)

2. What is Adaptive Control all about?
   a. Ordinary differential equations and the Lyapunov technique
   b. Formulation of the standard adaptive control problem
   c. Robustness and stability of adaptive controllers
   d. Neural networks and neuroadaptive control
   e. Experimental validation of approximate model inversion based neuroadaptive control on unmanned aerial vehicles

3. New methods in adaptive control
   a. Command governor based adaptive control
   b. Derivative-free adaptive control
   c. Distributed control of uncertain multivehicle systems
   d. Applications to uncertain large-scale interconnected systems

4. Information enabled adaptive control: a nonparametric approach
   a. Ensuring parameter convergence in adaptive control
   b. Reproducing kernel Hilbert spaces and Gaussian processes (GPs)
   c. A transition in uncertainty parameterization from deterministic parameters to distributions over functions
   d. Online update of radial bases using budgeted kernel restructuring and GP-based adaptive control (a brief GP regression sketch follows this section of the outline)
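As a rough illustration of the nonparametric viewpoint in item 4, where the uncertainty is represented by a distribution over functions rather than a fixed parameter vector, the sketch below fits a Gaussian process regression posterior to noisy samples of an unknown function. The kernel, hyperparameters, and target function are hypothetical; the sketch does not implement the budgeted kernel restructuring scheme covered in the workshop.

    import numpy as np

    # Minimal Gaussian process regression sketch: infer an unknown scalar
    # uncertainty delta(x) from noisy samples, as a nonparametric adaptive
    # element might. Kernel and data are illustrative assumptions.

    def rbf_kernel(A, B, length_scale=0.5, signal_var=1.0):
        """Squared-exponential kernel matrix between 1-D input arrays A and B."""
        d = A.reshape(-1, 1) - B.reshape(1, -1)
        return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

    rng = np.random.default_rng(1)
    delta = lambda x: np.sin(3.0 * x) * x          # "true" uncertainty (unknown in practice)
    X_train = rng.uniform(-2.0, 2.0, size=15)      # sampled operating points
    y_train = delta(X_train) + 0.05 * rng.standard_normal(X_train.shape)
    noise_var = 0.05 ** 2

    # GP posterior mean and covariance at query points
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_inv_y = np.linalg.solve(K, y_train)
    X_query = np.linspace(-2.0, 2.0, 5)
    K_star = rbf_kernel(X_query, X_train)
    mean = K_star @ K_inv_y
    cov = rbf_kernel(X_query, X_query) - K_star @ np.linalg.solve(K, K_star.T)

    for x, m, v in zip(X_query, mean, np.diag(cov)):
        print(f"x = {x:+.2f}: predicted delta = {m:+.3f} +/- {np.sqrt(max(v, 0)):.3f}, true = {delta(x):+.3f}")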
Section 2: Planning under Uncertainty (1.5 hours)

5. What is planning and decision making all about?
   a. Probability theory preliminaries
   b. Markov decision processes and planning problems
   c. Optimal decision making, dynamic programming, and the Bellman equation (a brief value iteration sketch follows this section of the outline)

6. Reinforcement learning: planning under uncertainty
   a. A shift in perspective: learning and exploring for reward maximization
   b. Temporal difference methods: Q-learning, SARSA, TD(λ), least-squares TD
   c. Value and policy iteration algorithms
   d. Incremental feature dependency discovery
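The dynamic programming material in items 5 and 6 can be illustrated with a few lines of value iteration, which repeatedly applies the Bellman optimality backup to a known model; the toy MDP below (transition probabilities and rewards) is invented purely for illustration.

    import numpy as np

    # Value iteration sketch on a hypothetical 3-state, 2-action MDP.
    # P[a, s, s'] are transition probabilities and R[s, a] expected rewards; both invented.
    P = np.array([
        [[0.9, 0.1, 0.0],   # action 0
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8]],
        [[0.2, 0.7, 0.1],   # action 1
         [0.0, 0.3, 0.7],
         [0.0, 0.0, 1.0]],
    ])
    R = np.array([[0.0, 0.1],
                  [0.0, 0.5],
                  [1.0, 0.0]])
    gamma = 0.9

    V = np.zeros(3)
    for _ in range(1000):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            V = V_new
            break
        V = V_new

    print("optimal value function:", np.round(V, 3))
    print("greedy policy:", Q.argmax(axis=1))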
Section 3: Bridging the Gap (2 hours)

7. Intelligent Cooperative Control Architecture (iCCA)
   a. Combining RL and cooperative control for multi-agent planning under uncertainty

8. Health-aware planning
   a. Using information from the adaptive controller to improve planning performance

9. Using Lyapunov techniques for reinforcement learning stability proofs
   a. Benveniste's and Ljung's ODE methods
   b. Example: an analysis of the effects of memory in Q-learning algorithms

10. Direct optimal adaptive control
    a. Reinforcement learning and adaptive dynamic programming for optimal feedback control
    b. A policy iteration based solution (a brief sketch follows this outline)

11. Guided open discussion on how to bring reinforcement learning and adaptive control together for planning and control in the presence of uncertainty
    a. How can adaptive control frameworks be used to improve planners?
    b. How can we plan to learn better?
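Item 10's policy iteration based solution can be sketched, under the simplifying assumption of known linear dynamics, as Kleinman-style policy iteration for the continuous-time LQR problem: each step evaluates the current feedback gain by solving a Lyapunov equation and then improves the gain. The system matrices and weights below are hypothetical, and this model-based form is only a stand-in for the adaptive dynamic programming methods presented in the workshop, which avoid requiring a known model.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

    # Policy iteration (Kleinman's algorithm) for continuous-time LQR:
    #   policy evaluation:  (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
    #   policy improvement: K <- R^{-1} B^T P
    # System matrices and weights below are illustrative assumptions.
    A = np.array([[0.0, 1.0],
                  [-1.0, -0.5]])
    B = np.array([[0.0],
                  [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    K = np.array([[0.0, 1.0]])   # initial gain; A - B K must be Hurwitz
    for i in range(20):
        Acl = A - B @ K
        # Policy evaluation: solve the Lyapunov equation for the cost matrix P
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement
        K_new = np.linalg.solve(R, B.T @ P)
        if np.max(np.abs(K_new - K)) < 1e-10:
            K = K_new
            break
        K = K_new

    P_are = solve_continuous_are(A, B, Q, R)   # direct Riccati solution for comparison
    print("policy iteration gain :", K)
    print("Riccati-based LQR gain:", np.linalg.solve(R, B.T @ P_are))

The iteration converges to the gain given by the algebraic Riccati equation, which is why policy iteration is a natural bridge between reinforcement learning and optimal feedback control.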
Audience

Adaptive control holds the promise of greatly reducing development time and guaranteeing accurate tracking performance in the presence of uncertainties and disturbances. Reinforcement learning has been successfully used to solve difficult planning problems under severe uncertainty. This workshop will enable individuals from industry and academia to learn more about recent advances in adaptive control theory and reinforcement learning theory. The workshop will provide the tools needed for real-world adaptive control and reinforcement learning applications, and will be relevant to practicing professionals from the electrical, mechanical, and aerospace industries. The workshop also intends to bring together two fields with similar goals that have not interacted heavily in the past, in order to cultivate exciting new research directions. This is expected to be of great value to experts and students of both fields.
Instructor Biographies

Girish Chowdhary is currently a postdoctoral associate at the Massachusetts Institute of Technology's Laboratory for Information and Decision Systems and the School of Aeronautics and Astronautics. He received his Ph.D. from the Georgia Institute of Technology in 2010, where he was a member of the UAV Research Facility. Prior to joining Georgia Tech, Girish worked as a research engineer with the German Aerospace Center's (DLR's) Institute for Flight Systems Technology in Braunschweig, Germany. Girish received an MS degree in Aerospace Engineering from Georgia Tech in 2008, and a BE with honors from RMIT University in Melbourne, Australia, in 2003. His research interests include adaptive control, nonlinear control, fault tolerant control, neuroadaptive control, machine learning, Bayesian inference for learning, vision aided navigation, decentralized control of networked systems, and collaborative planning and learning. He is interested in applications in aerospace guidance, navigation, and control, manned/unmanned aerial vehicles, autonomous ground vehicles, mechanical systems, and automated drilling.

Tansel Yucelen received the B.S. degree in control engineering from Istanbul Technical University, Istanbul, Turkey, in 2006, and the M.S. degree in electrical and computer engineering from Southern Illinois University, Carbondale, Illinois, in 2008. He received a Ph.D. degree in aerospace engineering from the School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, Georgia. He is currently a postdoctoral researcher at the Georgia Institute of Technology. His research interests include absolute stability theory, adaptive control, computational methods, decentralized control, delay systems, discrete-time systems, dissipativity theory, disturbance rejection, fixed-architecture controller synthesis, fuzzy control, Hamilton-Jacobi-Bellman theory, identification, linear matrix inequalities, linear systems, modeling, multiobjective mixed-norm controller synthesis, neural networks, neuroadaptive control, nonlinear systems, optimal control, optimization, predictive control, robust control, saturation control, stability analysis of linear systems, stability analysis of nonlinear systems, time-varying systems, and uncertain systems. His application interests include active noise control, active vibration control, aerospace systems, automation, electrical systems, flexible structures, flight control, isolation technology, manned aerial vehicles, mechanical systems, power systems, real-time systems, robotics, smart structure control, and unmanned aerial vehicles. He is the author of over 60 papers. He is a student member of AIAA, AUVSI, AMS, CSS, IEEE, and SIAM.
Eric N. Johnson is a Professor at the School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA. He has a diverse background in guidance, navigation, and control, including applications such as airplanes, helicopters, submarines, and launch vehicles. He received a B.S. degree from the University of Washington, M.S. degrees from MIT and The George Washington University, and a Ph.D. from Georgia Tech, all in Aerospace Engineering. He also has five years of industry experience working at Lockheed Martin and Draper Laboratory. He joined the Georgia Tech faculty in 2000, and has performed research in adaptive flight control, navigation, embedded software, and autonomous systems. He is the director of the Georgia Tech UAV Research Facility (UAVRF). He was the lead system integrator for rotorcraft experiments and demonstrations for the DARPA Software Enabled Control program, which included the first air-launch of a hovering aircraft and automatic flight of a helicopter with a simulated stuck swash-plate actuator. He was the principal investigator of the Active Vision Control Systems AFOSR Multi-University Research Initiative (MURI), developing methods that utilize 2-D and 3-D imagery to enable aerial vehicles to operate in uncertain, complex 3-D environments. He has received the NSF CAREER award, and is a member of the AIAA and AHS.

Frank L. Lewis, Fellow IEEE, Fellow IFAC, Fellow U.K. Institute of Measurement & Control, PE Texas, U.K. Chartered Engineer, is Distinguished Scholar Professor, Distinguished Teaching Professor, and Moncrief-O'Donnell Chair at the University of Texas at Arlington's Automation & Robotics Research Institute. He obtained the Bachelor's degree in Physics/EE and the MSEE at Rice University, the MS in Aeronautical Engineering from the University of West Florida, and the Ph.D. at Georgia Tech. He works in feedback control, intelligent systems, distributed control systems, and sensor networks. He is the author of 6 U.S. patents, 216 journal papers, 330 conference papers, 14 books, 44 chapters, and 11 journal special issues. He received the Fulbright Research Award, the NSF Research Initiation Grant, the ASEE Terman Award, the International Neural Network Society Gabor Award (2009), and the U.K. Institute of Measurement & Control Honeywell Field Engineering Medal (2009). He received the Outstanding Service Award from the Dallas IEEE Section and was selected as Engineer of the Year by the Ft. Worth IEEE Section. He is listed in the Ft. Worth Business Press Top 200 Leaders in Manufacturing. He received the 2010 IEEE Region 5 Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean's Excellence in Doctoral Mentoring Award, and was elected to the UTA Academy of Distinguished Teachers in 2012. He served on the NAE Committee on Space Station in 1995. He is an elected Guest Consulting Professor at South China University of Technology and Shanghai Jiao Tong University, and a Founding Member of the Board of Governors of the Mediterranean Control Association. He helped win the IEEE Control Systems Society Best Chapter Award (as Founding Chairman of the DFW Chapter), the National Sigma Xi Award for Outstanding Chapter (as President of the UTA Chapter), and the US SBA Tibbets Award in 1996 (as Director of ARRI's SBIR Program).

Alborz Geramifard is currently a postdoctoral associate at MIT's Laboratory for Information and Decision Systems (LIDS). He is also affiliated with the Computer Science and Artificial Intelligence Laboratory (CSAIL). Alborz received his PhD from MIT in 2012, working with Jonathan How and Nicholas Roy on representation learning and safe exploration in large-scale, sensitive sequential decision-making problems. Previously he worked on data-efficient online reinforcement learning techniques at the University of Alberta, where he received his MSc in Computing Science under the supervision of Richard Sutton and Michael Bowling in 2008. Alborz received his BSc in Computer Engineering from Sharif University of Technology in 2003. His research interests lie in machine learning, with a focus on reinforcement learning, planning, and brain and cognitive sciences. Alborz is a recipient of the NSERC postgraduate scholarship (2010-2012) and has served on the Technical Committee on Robot Learning of the IEEE Robotics and Automation Society since 2011.

Dr. Jonathan P. How is the Richard C. Maclaurin Professor of Aeronautics and Astronautics at the Massachusetts Institute of Technology. He received a B.A.Sc. from the University of Toronto in 1987, and S.M. and Ph.D. degrees from MIT Aeronautics and Astronautics in 1990 and 1993, respectively, and then was a postdoctoral associate for two years at MIT. Prior to returning to MIT in 2000, he was an Assistant Professor in the Department of Aeronautics and Astronautics at Stanford University. His research interests span multi-agent planning, reinforcement learning and dynamic programming under uncertainty, robust and adaptive control, networked systems, GPS technology, and autonomous aerospace vehicle technologies. He is the author of over 75 journal and 260 conference publications. He received the 2002 Institute of Navigation Burka Award, a 2008 Boeing Special Invention Award, and the 2011 IFAC Automatica award for best applications paper, and is an Associate Fellow of AIAA and a senior member of IEEE.