Parameter Tuning for Prediction-based Quadcopter Trajectory Planning using Learning Automata

Peter T. Jardine ∗, Sidney Givigi ∗, Shahram Yousefi ∗∗

∗ Department of Electrical and Computer Engineering, Royal Military College of Canada, Kingston, Ontario, Canada (e-mail: [email protected], [email protected])
∗∗ Department of Electrical and Computer Engineering, Queen’s University, Kingston, Ontario, Canada (e-mail: [email protected])

Abstract: This paper presents a target tracking technique for a quadcopter based on Model Predictive Control (MPC) tuned using machine learning. Specifically, it uses learning automata to select the weighting parameters of the objective function such that they minimize tracking error. It develops an approximate linear state-space model for the quadcopter dynamics by linearizing around a hover condition. The optimum sequence of control actions is expressed as perturbations on a stabilizing feedback law expanded over a finite prediction horizon. Simulation results demonstrate the learned weighting parameters can be used to provide optimized trajectories when implemented as receding horizon MPC. Furthermore, a comparison with previous work demonstrates improved tracking performance.

Keywords: Model Predictive Control, Quadcopter, Reinforcement Learning

1. INTRODUCTION

Remotely piloted Unmanned Aerial Vehicles (UAVs) normally depend on a human operator to make real-time decisions about target tracking and trajectory planning. This poses a number of challenges, namely the need for a continuously reliable and secure command and control channel (Sparrow et al., 2015). Overcoming this limitation motivates research into autonomous UAVs capable of making decisions onboard. Due to their growing popularity in industry, a significant amount of this research has focused on quadrotor aircraft, or quadcopters (Ribeiro et al., 2015). Model Predictive Control (MPC) is a popular topic of research for control (Zhao and Wang, 2015) and stability (Choi and Choi, 2014) of semi-autonomous and autonomous vehicles. MPC-based techniques provide a sequence of optimal inputs that make the most efficient use of energy resources while also considering constraints (Ramana et al., 2016). A summary of key developments in model predictive motion planning is found in Howard et al. (2014). The primary challenge when implementing MPC is the high computational demand of the optimization step. This has led to research aimed at fast computation techniques (de la Peña et al., 2015). Given certain conditions, MPC can be formulated as a convex optimization and solved using computationally efficient solvers (Grant and Boyd, 2014). An important condition for this convexity is the use of a linear dynamic model.

This paper presents a target tracking technique for a quadcopter based on MPC tuned using machine learning. Specifically, it uses learning automata (LA) to select the weighting parameters of the objective function such that they minimize tracking error given nonlinear plant dynamics.

In Jardine and Givigi (2016) the authors use feedback linearization to simplify a constant-altitude nonlinear model. For a full 3-dimensional linear model, the authors in Hafez et al. (2015) use system identification. This paper derives a nonlinear model for a quadcopter in terms of Cartesian coordinates and Euler angles. Then, drawing on the Linear Quadratic Regulator (LQR) work in Suicmez and Kutay (2014), this model is discretized and linearized around a hover condition. In order to improve the stability characteristics of the system, the model is redefined in terms of perturbations on a stabilizing feedback law as described in Jardine et al. (2015).

LA is a machine learning scheme that has been applied to adaptive control in random, uncertain environments (Narendra and Thathachar, 1989). In dos Santos et al. (2015) the authors used LA for autonomous construction tasks using quadcopters. They demonstrated that LA could be used to select the proportional, integral, and derivative gains in a low-level PID controller. Similar to PID control, multivariable MPC-based planning techniques require the selection of a weighted objective function. These weights are normally selected by the designer to reflect the relative importance of minimizing each variable. In this paper, we use LA to select these weights offline such that they minimize tracking error over time. Since the learning is accomplished using the nonlinear plant model, these weights also compensate for some of the nonlinearities lost in the linearization of the controller.

For the purposes of this paper, ȧ and ä denote the first and second time derivatives of a; sin(x), cos(x), and tan(x) are abbreviated as s(x), c(x), and t(x), respectively; |x̄| is the norm of the vector x̄; and ⊕_{i=1}^{b} denotes the diagonal concatenation of terms. For example:

\[
\oplus_{i=1}^{3} a_i = \begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix} \tag{1}
\]
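As a small illustration of the ⊕ notation (a sketch added for this edit, not part of the original paper), the diagonal concatenation in (1) can be reproduced with SciPy; the numerical values below are arbitrary placeholders.

```python
# Illustrative sketch of the diagonal concatenation "⊕" in equation (1).
# The numerical values are placeholders, not taken from the paper.
from scipy.linalg import block_diag

a = [2.0, 5.0, 7.0]      # hypothetical terms a_1, a_2, a_3
A = block_diag(*a)       # equivalent to ⊕_{i=1}^{3} a_i
print(A)
# [[2. 0. 0.]
#  [0. 5. 0.]
#  [0. 0. 7.]]
```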

The remainder of this paper is organized as follows: Section 2 develops an approximate linear model representing the quadcopter dynamics; Section 3 uses the approximate linear model in a convex MPC formulation for target tracking; Section 4 describes the LA technique used in this paper; Section 5 provides simulation results and analysis; Section 6 provides a proof of stability; and Section 7 concludes the paper.

2. DYNAMIC MODEL

This section derives a linear state-space model that approximates the nonlinear dynamics of the quadcopter. The primary assumption is that the quadcopter operates near a hover, where all angles and rates can be approximated as zero. This model is based on the work in Suicmez and Kutay (2014), Abdolhosseini et al. (2012), and Nemati and Kumar (2014). The Cartesian position (ξ), Euler orientation (η), and body rates (ω) of the quadcopter are expressed as follows:

\[
\xi = [x \;\; y \;\; z]^T, \quad \eta = [\phi \;\; \theta \;\; \psi]^T, \quad \omega = [p \;\; q \;\; r]^T \tag{2}
\]

where x, y, and z are positions in the North, East, and Up directions with respect to a fixed origin; φ, θ, and ψ are the bank, pitch, and yaw angles; and p, q, and r are the roll, pitch, and yaw rates.

The quadcopter is controlled using four inputs. These are related to the forces generated by the front (F1), right (F2), rear (F3), and left (F4) rotors. As shown in (3), U1 is the total force generated by the rotors; U2 and U3 control roll and pitch; and U4 controls yaw:

\[
\begin{bmatrix} U_1 \\ U_2 \\ U_3 \\ U_4 \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 \\ -1 & 0 & 1 & 0 \\ -k_m & k_m & -k_m & k_m \end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \end{bmatrix} \tag{3}
\]

where k_m is a constant that relates the force (F_i) and torque (T_i) produced by the i-th rotor such that T_i = k_m F_i.
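A minimal sketch of this input mapping is given below. It is an illustration added for this edit, not the authors' code: the value of k_m and the rotor forces are hypothetical placeholders, and the matrix follows (3) as reconstructed above.

```python
# Illustrative sketch of the rotor mixing in equation (3).
# k_m and the rotor forces are hypothetical placeholders.
import numpy as np

k_m = 0.03  # force-to-torque coefficient, T_i = k_m * F_i (placeholder value)

M = np.array([[ 1.0,  1.0,  1.0,  1.0],   # U1: total thrust
              [ 0.0, -1.0,  0.0,  1.0],   # U2: roll, from the right/left rotors
              [-1.0,  0.0,  1.0,  0.0],   # U3: pitch, from the front/rear rotors
              [-k_m,  k_m, -k_m,  k_m]])  # U4: yaw, from the rotor drag torques

F = np.array([4.5, 4.3, 4.5, 4.7])        # rotor forces [F1, F2, F3, F4] in newtons
U = M @ F                                 # control inputs [U1, U2, U3, U4]
F_alloc = np.linalg.solve(M, U)           # inverse mapping, useful for control allocation
```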

Based on the equations of motion presented in Abdolhosseini et al. (2012) and Nemati and Kumar (2014), we define the translational dynamics of the quadcopter as:

\[
\ddot{\xi} = T_{EB} \begin{bmatrix} 0 \\ 0 \\ U_1/m \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ g \end{bmatrix} - \frac{K_d}{m}\,\dot{\xi} \tag{4}
\]

where m is the mass of the quadcopter; g is the gravitational constant; K_d is a diagonal matrix composed of the appropriate drag coefficients; and T_{EB} is the transformation from the body-axis reference frame to the Earth-fixed frame defined in Nemati and Kumar (2014).

Furthermore, if we assume the angles in η are very small (φ, θ, ψ ≈ 0), we can express the rotational dynamics as:

\[
\ddot{\eta} = \begin{bmatrix}
(I_y - I_z)\,\dot{\theta}\dot{\psi}/I_x + U_2\, l/I_x \\
(I_z - I_x)\,\dot{\phi}\dot{\psi}/I_y + U_3\, l/I_y \\
(I_x - I_y)\,\dot{\phi}\dot{\theta}/I_z + U_4/I_z
\end{bmatrix} \tag{5}
\]

where I_x, I_y, and I_z are the moments of inertia with respect to the subscript axis and l is the arm length (Suicmez and Kutay, 2014). Let us define vectors for the quadcopter states and inputs:

\[
X = \begin{bmatrix} \dot{\xi} \\ \dot{\eta} \\ \xi \\ \eta \end{bmatrix}, \qquad
U = \begin{bmatrix} U_1 \\ U_2 \\ U_3 \\ U_4 \end{bmatrix} \tag{6}
\]

and denote the combined expressions (4) and (5) as Ẋ = f(X, U). In order to obtain an approximate linearization, we define matrices A_c and B_c composed of the first-order partial derivatives of f(X, U) about the trim points X_e and U_e, otherwise known as the Jacobian matrices. Given δX = X − X_e and δU = U − U_e, and assuming that in the hover f(X_e, U_e) = 0, we can derive a discrete-time linear approximation for the quadcopter dynamics as:

\[
\delta X_{k+1} = \underbrace{(I_{12\times 12} + A_c T)}_{A}\,\delta X_k + \underbrace{B_c T}_{B}\,\delta U_k \tag{7}
\]

where T is the sampling time; k is the time step; and A and B are the system and input matrices of the Linear Time-Invariant (LTI) state-space model.
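As an illustration of how (7) can be obtained numerically, the following sketch (added for this edit, not the authors' implementation) assembles the nonlinear map f(X, U) from (4) and (5), computes the Jacobians A_c and B_c at the hover trim point by finite differences, and forms the discrete matrices A and B. Parameter values are placeholders consistent with Table 1, and a ZYX Euler convention is assumed for T_EB.

```python
# Illustrative sketch: nonlinear dynamics f(X, U) of (4)-(5) and the
# discrete LTI model of (7) via finite-difference Jacobians at hover.
# Parameter values are placeholders; not the authors' code.
import numpy as np

m, g, l = 1.79, 9.81, 0.3
Ix, Iy, Iz = 0.03, 0.03, 0.04
Kd = np.diag([0.1, 0.1, 0.1])
T = 0.1  # sampling time

def R_EB(phi, theta, psi):
    """Body-to-Earth rotation matrix (ZYX Euler convention assumed)."""
    c, s = np.cos, np.sin
    return np.array([
        [c(psi)*c(theta), c(psi)*s(theta)*s(phi) - s(psi)*c(phi), c(psi)*s(theta)*c(phi) + s(psi)*s(phi)],
        [s(psi)*c(theta), s(psi)*s(theta)*s(phi) + c(psi)*c(phi), s(psi)*s(theta)*c(phi) - c(psi)*s(phi)],
        [-s(theta),       c(theta)*s(phi),                        c(theta)*c(phi)]])

def f(X, U):
    """X = [xi_dot (3), eta_dot (3), xi (3), eta (3)], U = [U1..U4]."""
    xi_dot, eta_dot, eta = X[0:3], X[3:6], X[9:12]
    phi, theta, psi = eta
    phi_d, theta_d, psi_d = eta_dot
    # Translational dynamics, equation (4)
    xi_dd = R_EB(phi, theta, psi) @ np.array([0.0, 0.0, U[0]/m]) \
            - np.array([0.0, 0.0, g]) - (Kd/m) @ xi_dot
    # Rotational dynamics, equation (5) (small-angle form)
    eta_dd = np.array([(Iy - Iz)*theta_d*psi_d/Ix + U[1]*l/Ix,
                       (Iz - Ix)*phi_d*psi_d/Iy   + U[2]*l/Iy,
                       (Ix - Iy)*phi_d*theta_d/Iz + U[3]/Iz])
    return np.concatenate([xi_dd, eta_dd, xi_dot, eta_dot])

# Hover trim point: zero states, thrust balancing gravity
Xe = np.zeros(12)
Ue = np.array([m*g, 0.0, 0.0, 0.0])

def jacobians(f, Xe, Ue, eps=1e-6):
    """Finite-difference Jacobians A_c = df/dX and B_c = df/dU at the trim point."""
    n, p = len(Xe), len(Ue)
    Ac, Bc = np.zeros((n, n)), np.zeros((n, p))
    for i in range(n):
        dX = np.zeros(n); dX[i] = eps
        Ac[:, i] = (f(Xe + dX, Ue) - f(Xe - dX, Ue)) / (2*eps)
    for j in range(p):
        dU = np.zeros(p); dU[j] = eps
        Bc[:, j] = (f(Xe, Ue + dU) - f(Xe, Ue - dU)) / (2*eps)
    return Ac, Bc

Ac, Bc = jacobians(f, Xe, Ue)
A = np.eye(12) + Ac*T   # discrete system matrix of (7)
B = Bc*T                # discrete input matrix of (7)
```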

3. CONVEX MPC-BASED GUIDANCE

This section uses the linear model described above to develop a convex MPC-based guidance algorithm for target tracking. This requires a quadratic objective function, an explicit representation of the state predictions in terms of perturbations on a feedback law, and a constrained convex optimization. Let us assume the quadcopter starts in a level hover with states X_0. The goal is located at the origin. We define an objective function:

\[
J(\bar{X}, \delta\bar{U}) = \bar{X}^T Q \bar{X} + \delta\bar{U}^T R\, \delta\bar{U} + X_N^T P X_N \tag{8}
\]

where X̄ = [X_1, X_2, ..., X_N]^T and δŪ = [δU_0, δU_1, ..., δU_{N−1}]^T are stacked vectors over a finite prediction horizon (N) and X_N is the terminal state in that prediction. The block-diagonal matrices Q = ⊕_{i=1}^{N} Q_i and R = ⊕_{i=1}^{N} R_i are such that Q_i and R_i are the weighting matrices for the states and inputs at each step, and P is the weight of the terminal cost. By expanding the approximate linear model developed in (7) over the prediction horizon N, we obtain the set of future states in terms of the initial states and inputs as follows:

\[
\bar{X} = \mathcal{A}\,\delta X_0 + \mathcal{B}\,\delta\bar{U} + \bar{X}_e \tag{9}
\]

where \mathcal{A} and \mathcal{B} represent the quadcopter dynamics expanded over N as defined in Jardine et al. (2015), and X̄_e is a column vector composed of the trim points X_e stacked N times. As described in Jardine and Givigi (2016), formulating the target tracking problem in terms of closed-loop feedback predictions provides convenient stability characteristics. This is accomplished by redefining δU in terms of perturbations on an ideal stabilizing feedback law. Recalling that we defined our coordinates such that the target is at the origin, let us define this law using proportional feedback as follows:

\[
\delta U_k = -K_{cl} X_k + g_k \tag{10}
\]

where K_{cl} is a feedback gain and g_k is a perturbation on the law. Expanded over the prediction horizon, we obtain:

\[
\delta\bar{U} = A_g X_0 + B_g G + c_{ug} \tag{11}
\]

where A_g and B_g represent the inputs expanded over N as defined in Jardine and Givigi (2016), and c_{ug} is a constant term. Given the results in (9) and (11), we see that the cost function (8) can alternatively be expressed as a function of G, δX_0, and X_e. Therefore, we can define an optimum sequence of perturbations (G^*) on the ideal feedback law through the solution of the following convex optimization:

\[
\begin{aligned}
G^* = \arg\min_{G} \quad & J(\bar{X}, \delta\bar{U}) \\
\text{subject to} \quad & (9),\ (11), \\
& M_x \bar{X} \le f_x, \quad M_u \bar{U} \le f_u
\end{aligned} \tag{12}
\]

where M_x, M_u, f_x, and f_u are appropriately constructed linear inequality constraints.
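A compact sketch of this optimization is given below. It is an illustration added for this edit, not the authors' implementation (the paper's simulations use MATLAB, with CVX cited for the convex solver): it poses (8)-(12) in CVXPY with the feedback parameterization (10) written as explicit constraints rather than the condensed matrices A_g, B_g, and c_{ug}. The matrices A and B come from the linearization sketch above; K_cl, P, Q_i, and R_i are assumed given (their selection is discussed in Sections 4 and 6), and the bounds follow Table 1.

```python
# Illustrative sketch of the convex MPC problem (8)-(12) using CVXPY.
# Not the authors' code: the closed-loop parameterization (10) is kept as
# explicit constraints instead of the condensed form (11).
import numpy as np
import cvxpy as cp

def plan(A, B, K_cl, P, Q_i, R_i, X0, Xe, Ue, N=30):
    n, p = B.shape
    X = cp.Variable((n, N + 1))          # predicted states X_0 ... X_N
    G = cp.Variable((p, N))              # perturbations g_0 ... g_{N-1}
    cost, constraints = 0, [X[:, 0] == X0]
    for k in range(N):
        dU = -K_cl @ X[:, k] + G[:, k]   # feedback law with perturbation, equation (10)
        U = Ue + dU                      # actual inputs applied to the plant
        constraints += [X[:, k + 1] == Xe + A @ (X[:, k] - Xe) + B @ dU]  # model (7)
        constraints += [cp.abs(X[9:12, k + 1]) <= 0.35,   # angle limits (rad)
                        cp.abs(X[3:6, k + 1]) <= 2.10,    # angular-rate limits (rad/s)
                        U[0] >= 0, U[0] <= 40]            # thrust limits (N)
        cost += cp.quad_form(X[:, k + 1], Q_i) + cp.quad_form(dU, R_i)
    cost += cp.quad_form(X[:, N], P)     # terminal cost
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return G.value, X.value
```

Writing the dynamics and feedback law as constraints lets the solver handle the closed-loop prediction structure directly, which corresponds to substituting (11) into (8).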

4. LEARNING AUTOMATA

Fig. 1. Illustration of learning architecture used in this paper

The elements of the diagonal weighting matrices Q_i and R_i in (8) were selected offline using a form of LA called finite action-set learning automata (FALA). The training process is briefly described here; a more detailed description is found in dos Santos et al. (2015). Fig. 1 illustrates the learning architecture used in this paper. The training was characterized by successive iterations (n) during which various automata

\[
\alpha(n) = \{a_1(n), a_2(n), \ldots, a_{N_r}(n)\}
\]

were selected as weighting parameters in (8), where N_r is the combined total number of elements in the diagonal matrices Q_i and R_i. A finite number of N_a candidate automata were available, each with a probability distribution p_{ij}(n), where i ∈ {1, 2, ..., N_a} and j ∈ {1, 2, ..., N_r}. At the beginning of the procedure, all automata were given equal probability:

\[
p_{ij}(1) = \frac{1}{N_a} \quad \forall\ i, j \tag{13}
\]

The first candidate automata were selected at random and used to build the weighting matrices in the MPC formulation. A full receding horizon MPC implementation was then executed over a predetermined timeframe (N_t). This produced a set of states at each time step, for which the tracking error (e(t)) was computed using the Euclidean distance from the target. This tracking error was summed to compute the total cost for the nth iteration according to the following cost function:

\[
J_{LA}(n) = \sum_{t=1}^{N_t} e^2(t) \tag{14}
\]

The minimum recorded cost (J_{LA,min}) and the mean cost (J_{LA,med}) over all iterations were stored for the computation of a reinforcement signal constrained between 0 and 1:

\[
R_c(n) = \min\left(\max\left(0, \frac{J_{LA,med} - J_{LA}(n)}{J_{LA}(n) - J_{LA,min}}\right), 1\right) \tag{15}
\]

Only positive reinforcements were used to update the probability distributions of the automata. We define a parameter R_b = 20 as the upper limit of a term used to accelerate the training. We then define the reward for the nth iteration as in dos Santos et al. (2015):

\[
R(n) = \begin{cases} R_c(n)\, R_b & \text{if } 0 < R_c(n) \le 1 \\ 0 & \text{if } R_c(n) = 0 \end{cases} \tag{16}
\]

At the beginning of each iteration, a probability distribution vector (p̄(n)) was composed of the probability distributions for each candidate automaton. This probability distribution vector was then updated and normalized according to the following update equation:

\[
\bar{p}(n+1) = \frac{\bar{p}(n) + \lambda R(n)\,\bar{p}(n)}{\left|\bar{p}(n) + \lambda R(n)\,\bar{p}(n)\right|} \tag{17}
\]

where the learning rate λ = 0.01 is a parameter used to control the speed of convergence of the training. At the end of each iteration, the system was checked for convergence. Convergence was considered to have occurred when the probability of each automaton reached at least 0.95. If the system had not yet converged, new automata were selected at random and the procedure was repeated. Notice that even though a linear model was used in the controller, the learning was accomplished using the nonlinear plant dynamics.
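The training loop of (13)-(17) can be summarized in the following sketch, added for this edit. It is an interpretation rather than the authors' code: simulate_tracking() is a hypothetical stand-in for the full receding horizon simulation described above (a surrogate cost is used here only so the sketch executes), and the reward of (16) is applied to the probabilities of the currently selected candidates before renormalization, which is how the update (17) is read here.

```python
# Illustrative sketch of the offline FALA tuning loop of (13)-(17).
import numpy as np

candidates = 2.0 ** np.arange(10)        # N_a = 10 values: 1, 2, 4, ..., 512
N_r = 16                                 # 12 diagonal entries of Q_i plus 4 of R_i
N_a = len(candidates)
p = np.full((N_r, N_a), 1.0 / N_a)       # equation (13): uniform initial probabilities
lam, R_b = 0.01, 20.0                    # learning rate and reward scaling
J_min, J_hist = np.inf, []
rng = np.random.default_rng(0)

def simulate_tracking(weights):
    """Hypothetical stand-in for the receding horizon simulation of the text,
    which would return the summed squared tracking error of (14).
    A surrogate cost is used here only so the sketch runs."""
    target = np.log2(np.array([2, 1, 1, 256, 256, 512, 4, 1, 16, 4, 8, 64, 1, 4, 64, 4]))
    return float(np.sum((np.log2(weights) - target) ** 2))

for n in range(20000):
    idx = np.array([rng.choice(N_a, p=p[j]) for j in range(N_r)])  # one action per parameter
    weights = candidates[idx]             # diag(Q_i) is weights[:12], diag(R_i) is weights[12:]
    J = simulate_tracking(weights)
    J_hist.append(J)
    J_min, J_med = min(J_min, J), float(np.mean(J_hist))
    Rc = np.clip((J_med - J) / max(J - J_min, 1e-9), 0.0, 1.0)     # equation (15)
    R = Rc * R_b if Rc > 0 else 0.0                                # equation (16)
    for j in range(N_r):                  # reinforce the selected actions, cf. equation (17)
        p[j, idx[j]] += lam * R * p[j, idx[j]]
        p[j] /= p[j].sum()
    if np.all(p.max(axis=1) >= 0.95):     # convergence criterion from the text
        break
```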


5. RESULTS

The LA technique described above was used to select the weighting matrices in the target tracking objective function (8). These weighting matrices were then used in a single target tracking simulation to demonstrate their performance.

5.1 Learning Simulation

The architecture described in Fig. 1 was implemented offline for the purpose of learning which weighting matrices (Q_i and R_i) minimize tracking error over time. Let us define two sets of automata:

\[
\alpha_Q(n) = \{Q_{1,1}(n), Q_{2,2}(n), \ldots, Q_{12,12}(n)\} \tag{18}
\]
\[
\alpha_R(n) = \{R_{1,1}(n), R_{2,2}(n), \ldots, R_{4,4}(n)\} \tag{19}
\]

which correspond to the diagonal elements of the weighting matrices in (8). We then select the best values of α_Q and α_R from a finite set of candidate automata using the FALA technique. In order to provide a wide range of values while minimizing the learning time, a total of N_a = 10 possible automata were chosen as powers of 2 (i.e., 1, 2, 4, ..., 512). For each iteration, a full receding horizon MPC implementation was executed starting from a hover at ξ = [−5, −4.5, −6.5]^T (South, West, and below the goal state located at the origin). The quadcopter was simulated by integrating the planned inputs through the nonlinear dynamic model using MATLAB. The quadcopter parameters were based on those of the QBall 2 produced by Quanser (2015) and are summarized in Table 1.

Table 1. Quadcopter Parameters

Parameter                Value                Units
g                        9.81                 m/s^2
m                        1.79                 kg
l                        0.3                  m
Ix, Iy, Iz               0.03, 0.03, 0.04     kg·m^2
Kd                       ⊕[0.1, 0.1, 0.1]     –
T                        0.1                  s
N                        30                   –
min/max φ, θ, ψ          ±0.35                rad
min/max φ̇, θ̇, ψ̇          ±2.10                rad/s
min/max U1               0, 40                N

After 9911 iterations, the probability distributions converged on the following values:

\[
Q_i = \oplus[2, 1, 1, 256, 256, 512, 4, 1, 16, 4, 8, 64], \quad R_i = \oplus[1, 4, 64, 4] \tag{20}
\]

Fig. 2 illustrates this convergence over time for the first three parameters, corresponding to the Cartesian x, y, and z coordinates. Similar results were obtained for the other parameters.

Fig. 2. Convergence of Learned Parameters for Translational Positions

In Fig. 2 we see the probabilities fluctuate throughout the learning process. This is due to the complexity of the model, which contains 16 interdependent states and inputs.

5.2 Tracking Simulation

The convex MPC-based guidance formulation described in Section 3 and the weighting matrices learned in (20) were used to develop a sequence of planned control inputs for a simulated quadcopter. The performance of the learned parameters was compared to values found in previous work. The characteristics of the simulation were kept the same as those in the learning simulation above. The simulation was implemented as a receding horizon, which computed a new plan at each timestep. This added a degree of robustness, since each new plan compensated for disturbances not considered in the approximate linear model. The quadcopter was initially placed in a hover at ξ = [−5, −4.5, −6.5]^T with the goal at the origin. Fig. 3 provides a plot of the quadcopter tracking. We see the quadcopter converge on the target position in the x, y, and z directions.
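The receding horizon implementation described above can be sketched as follows. This is an illustration added for this edit, reusing the hypothetical plan() and f() from the earlier sketches; A, B, K_cl, P, Q_i, R_i, Xe, and Ue are assumed to be available.

```python
# Illustrative sketch of the receding horizon loop: at every step a full plan
# is recomputed, only its first input is applied, and the nonlinear model is
# integrated forward one step. plan() and f() are from the earlier sketches.
import numpy as np

X = np.zeros(12)
X[6:9] = np.array([-5.0, -4.5, -6.5])     # initial hover position, goal at the origin
T, steps = 0.1, 150

for t in range(steps):
    G, X_pred = plan(A, B, K_cl, P, Q_i, R_i, X, Xe, Ue, N=30)
    dU0 = -K_cl @ X + G[:, 0]             # first input of the new plan, equation (10)
    X = X + T * f(X, Ue + dU0)            # apply it to the nonlinear plant (Euler step)
```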

Fig. 3. Quadcopter reaches target in x-y-z directions

Fig. 4 shows the stage cost decreases with time. The small cost increase early in the simulation was necessary to drive the quadcopter towards the target. This illustrates how the formulation considers the predicted behaviour of the quadcopter over the entire horizon when determining the optimum solution.

Fig. 4. Cost decreases with time

The accumulated error for this simulation was plotted against the accumulated error from a nearly identical simulation using parameters similar to those found in previous work. Previous work typically selects these parameters through trial and error (Suicmez and Kutay, 2014):

\[
Q_i = \oplus[1, 1, 0.1, 100, 100, 1000, 50, 5, 0.1, 100, 10, 0.1], \quad R_i = \oplus[10, 0, 0, 0] \tag{21}
\]

In Fig. 5 we see the error accumulated during the simulation for the learned parameters is less than that of the parameters presented in (21).

Fig. 5. Comparison of Learned Parameters to Previous Work

6. STABILITY

Here we present a theorem to demonstrate the stability characteristics of the convex MPC formulation. The following theorem is based on the work of Mayne et al. (2000), in which stability is achieved by steering the vehicle states to a terminal set from which the stabilizing feedback law can take over. Let us define a set of admissible inputs (U), admissible states (X), and a terminal set containing the origin, 0 ∈ X_T ⊆ X. X_T is defined such that all inputs computed using the stabilizing feedback law from states in X_T are also admissible (−K_{cl} X_k ∈ U ∀ X_k ∈ X_T).

Theorem 1. (Stability) Considering the objective function (8), model (9), initial states X_0 ∈ X, inputs U_k ∈ U, feedback law (10), constraints (12), and X_N ∈ X_T, when formulated as MPC, the system will be asymptotically stable.

Proof. We can define a terminal set (X_T) such that it satisfies the following requirements:

• it contains the origin (0 ∈ X_T);
• it satisfies the state constraints (X_T ⊆ X); and
• any input resulting from a state contained in the terminal set satisfies the input constraints (−K_{cl} X_k ∈ U ∀ X_k ∈ X_T).

Furthermore, the terminal cost X_N^T P X_N is a Lyapunov function if K_{cl} and P are chosen as the stabilizing gain and positive-definite solution of the Discrete Algebraic Riccati Equation. According to Zhang et al. (1999), if X_N^T P X_N is a Lyapunov function then the terminal set is also invariant under (10). Therefore, given these conditions, it follows from the proof in Section 4.6 of Mayne et al. (2000) that the system is asymptotically stable.
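As an illustration of this choice (added for this edit, not from the paper), K_cl and P can be obtained from the discrete algebraic Riccati equation with standard tools, for example:

```python
# Illustrative sketch: terminal weight P and stabilizing gain K_cl from the
# Discrete Algebraic Riccati Equation, using the discrete model (A, B) of (7)
# and stage weights Q_i, R_i. SciPy call shown; not the authors' code.
import numpy as np
from scipy.linalg import solve_discrete_are

def terminal_ingredients(A, B, Q_i, R_i):
    P = solve_discrete_are(A, B, Q_i, R_i)                    # positive-definite DARE solution
    K_cl = np.linalg.solve(R_i + B.T @ P @ B, B.T @ P @ A)    # discrete LQR gain, u = -K_cl x
    return P, K_cl
```

With this choice, the closed loop of (10) with g_k = 0 is the discrete-time LQR, so the terminal cost decreases along closed-loop trajectories inside X_T.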

Fig. 6. Perturbations converge to zero over time

This stability is best illustrated by a plot of the control signals, or perturbations (g_k), over time. As shown in Fig. 6, these perturbations converge to zero over time. Inspection of (10) shows that when g_k is zero, the stabilizing feedback law takes over.

7. CONCLUSION

This paper presented an MPC-based target tracking technique for a quadcopter tuned using machine learning. Specifically, it used LA to select the weighting parameters of the objective function offline such that they minimized tracking error. Since the learning was accomplished using the full, nonlinear dynamics of the quadcopter, these weights also partially compensate for errors caused by the linearization required for a convex MPC formulation. Simulation results demonstrate that this technique produced optimized trajectories when implemented using a receding horizon. A comparison with previous work demonstrated improved tracking performance. Finally, when formulated in terms of perturbations on a stabilizing feedback law, the system was shown to be asymptotically stable. Future work will investigate how these subtle tuning techniques affect the flight dynamics of actual physical systems.

REFERENCES

Abdolhosseini, M., Zhang, Y., and Rabbath, C. (2012). Trajectory tracking with model predictive control for an unmanned quad-rotor helicopter: Theory and flight test results. In Proceedings of the 5th International Conference on Intelligent Robotics and Applications - Volume Part I, 411–420. Springer-Verlag, Berlin, Heidelberg.

Choi, M. and Choi, S. (2014). Model predictive control for vehicle yaw stability with practical concerns. IEEE Transactions on Vehicular Technology, 63(8), 3539–3548.

de la Peña, D.M., Limón, D., Kouzoupis, D., Quirynen, R., Frasch, J., and Diehl, M. (2015). Block condensing for fast nonlinear MPC with the dual Newton strategy. In 5th IFAC Conference on Nonlinear Model Predictive Control, 26–31.

dos Santos, S.R.B., Givigi, S.N., and Nascimento, C.L. (2015). Autonomous construction of multiple structures using learning automata: Description and experimental validation. IEEE Systems Journal, 9(4), 1376–1387. doi:10.1109/JSYST.2014.2374334.

Grant, M. and Boyd, S. (2014). CVX: Matlab software for disciplined convex programming. URL http://cvxr.com/cvx/.

Hafez, A., Marasco, A., Givigi, S., Iskandarani, M., Yousefi, S., and Rabbath, C. (2015). Solving multi-UAV dynamic encirclement via model predictive control. IEEE Transactions on Control Systems Technology, PP(99), 1–1. doi:10.1109/TCST.2015.2411632.

Howard, T., Pivtoraiko, M., Knepper, R., and Kelly, A. (2014). Model-predictive motion planning: Several key developments for autonomous mobile robots. IEEE Robotics & Automation Magazine, 21(1), 64–73. doi:10.1109/MRA.2013.2294914.

Jardine, P.T. and Givigi, S. (2016). A predictive motion planner for guidance of autonomous UAV systems. In IEEE International Systems Conference 2016.

Jardine, P.T., Givigi, S., and Noureldin, A. (2015). Incorporating feedback predictions for optimized UAV attack mission planning. In Control and Automation (MED), 2015 23rd Mediterranean Conference on, 740–746.

Mayne, D., Rawlings, J., Rao, C., and Scokaert, P. (2000). Constrained model predictive control: Stability and optimality. Automatica, 36(6), 789–814.

Narendra, K.S. and Thathachar, M.A.L. (1989). Learning Automata: An Introduction. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Nemati, A. and Kumar, M. (2014). Modeling and control of a single axis tilting quadcopter. In 2014 American Control Conference, 3077–3082. doi:10.1109/ACC.2014.6859328.

Quanser (2015). User Manual Qball 2 for QUARC: Set Up and Configuration. Quanser, Inc., Markham, ON, Canada.

Ramana, M., Varma, S.A., and Kothari, M. (2016). Motion planning for a fixed-wing UAV in urban environments. In 4th IFAC Conference on Advances in Control and Optimization of Dynamical Systems, 419–424.

Ribeiro, T.T., Conceição, A.G., Sa, I., and Corke, P. (2015). Nonlinear model predictive formation control for quadcopters. In 11th IFAC Symposium on Robot Control, 39–44.

Sparrow, R.D., Adekunle, A.A., Berry, R.J., and Farnish, R.J. (2015). Balancing throughput and latency for an aerial robot over a wireless secure communication link. In Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on, 184–189. doi:10.1109/CYBConf.2015.7175929.

Suicmez, E.C. and Kutay, A.T. (2014). Optimal path tracking control of a quadrotor UAV. In Unmanned Aircraft Systems (ICUAS), 2014 International Conference on, 115–125. doi:10.1109/ICUAS.2014.6842246.

Zhang, L., Lam, J., and Zhang, Q. (1999). Lyapunov and Riccati equations of discrete-time descriptor systems. IEEE Transactions on Automatic Control, 44(11), 2134–2139. doi:10.1109/9.802931.

Zhao, J. and Wang, J. (2015). Integrated model predictive control of hybrid electric vehicle coupled with aftertreatment systems. IEEE Transactions on Vehicular Technology, PP(99), 1–1.
