JOURNAL OF AEROSPACE INFORMATION SYSTEMS Vol. 11, No. 11, November 2014

Downloaded by INDIAN INSTITUTE OF SCIENCE on December 18, 2014 | http://arc.aiaa.org | DOI: 10.2514/1.I010165

Adaptive Flight-Control Design Using Neural-Network-Aided Optimal Nonlinear Dynamic Inversion

Geethalakshmi S. Lakshmikanth* (Wichita State University, Wichita, Kansas 67220), Radhakant Padhi† (Indian Institute of Science, Bangalore 560012, India), John M. Watkins‡ (Wichita State University, Wichita, Kansas 67220), and James E. Steck§ (Wichita State University, Wichita, Kansas 67260)

DOI: 10.2514/1.I010165

A neural-network-aided, nonlinear-dynamic-inversion-based hybrid technique of model reference adaptive control flight-control system design is presented in this paper. Here, the gains of the nonlinear dynamic inversion-based flight-control system are dynamically selected in such a manner that the resulting controller mimics a single network adaptive control, optimal nonlinear controller for state regulation. Traditional model reference adaptive control methods use a linearized reference model; the presented control design method employs a nonlinear reference model to compute the nonlinear dynamic inversion gains. This innovation of designing the gain elements after synthesizing the single network adaptive controller retains the advantages that an optimal controller offers, yet it keeps a simple closed-form control expression in state feedback form, which can easily be modified for tracking problems without demanding any a priori knowledge of the reference signals. The strength of the technique is demonstrated by considering the longitudinal motion of a nonlinear aircraft system. An extended single network adaptive control/nonlinear dynamic inversion adaptive control design architecture is also presented, which adapts online to three failure conditions, namely, a thrust failure, an elevator failure, and an inaccuracy in the estimation of $C_{M_\alpha}$. Simulation results demonstrate that the presented adaptive flight controller generates a near-optimal response when compared to a traditional nonlinear dynamic inversion controller.

Nomenclature

$C_L$, $C_M$, $C_D$ = coefficients of lift, pitching moment, and drag
$C_T$ = coefficient of thrust
$d_T$ = distance between the thrust line and the fuselage axis
$g$ = acceleration due to gravity
$I_{yy}$ = aircraft product moment of inertia
$J$ = cost function
$m_A$ = mass of the airplane, lb
$Q_A$ = pitch rate of the airplane, deg/s
$\bar{q}$ = dynamic pressure
$S$ = reference area
$T_A$ = thrust, lbf
$U$ = control vector
$U_A$, $W_A$ = forward and vertical velocities of the airplane, ft/s
$\dot{U}_A$, $\dot{W}_A$ = forward and vertical accelerations of the airplane
$X$ = continuous-time state vector
$\alpha$ = angle of attack, deg
$\Delta t$ = sampling period
$\delta e$ = elevator deflection, deg
$\eta$ = learning rate of the adaptive neural network
$\theta_A$ = pitch angle of the airplane, deg
$\upsilon_{ADD}$ = output of the adaptive neural network
$\lambda$ = costate vector
$\phi_T$ = angle between the thrust line and the fuselage axis

Received 14 August 2013; revision received 25 June 2014; accepted for publication 4 August 2014; published online 1 December 2014. Copyright © 2014 by Wichita State University, Wichita KS. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission. Copies of this paper may be made for personal or internal use, on condition that the copier pay the $10.00 per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923; include the code 2327-3097/14 and $10.00 in correspondence with the CCC. *Department of Electrical Engineering and Computer Science. † Department of Aerospace Engineering. ‡ Department of Electrical Engineering and Computer Science; [email protected]. § Department of Aerospace Engineering, Department of Electrical Engineering and Computer Science; [email protected] (Corresponding Author). 785

786

LAKSHMIKANTH ET AL.

Subscripts

$k$ = discrete time
$max$ = maximum range of the variable around the equilibrium point
$N$ = normalized value
$trim$ = trim value

I. Introduction

Dynamic programming [1] has been one of the most systematic and powerful methods of nonlinear control design. However, it is well known to suffer from the "curse of dimensionality": its storage and computational requirements make it practically impossible to implement for real-life problems. An attempt to overcome this drawback led to the development of approximate dynamic programming (ADP) [2,3] by Werbos. The nonlinear function approximation properties of neural networks have been exploited in optimal nonlinear control design using this framework, most notably in the adaptive-critic approach. The use of neural networks [4] in the design of reconfigurable control has been demonstrated very well in [5–28]. Ferrari and Stengel [5,6] devised an adaptive-critic (AC)-based weight update rule for the optimal control of a six-degree-of-freedom (6-DOF) business jet aircraft that performs very well even in the presence of failures. However, the AC method of control design involves the interactive training of the "action" and "critic" neural networks and hence takes longer to converge to the optimal solution than the single network adaptive critic (SNAC) [8,9], which improves on the dual-neural-network-based AC architecture. As the name suggests, the SNAC uses a single neural network to arrive at the optimal solution; the interactive training procedures required for the AC are eliminated, enabling much faster convergence. The ability of the SNAC in optimal state-regulation applications has been demonstrated very well in [8,9,20].
For nonregulatory command following, it becomes practically difficult to define the entire state-space region for training the SNAC neural network without knowing the command sequence in advance. A version of the SNAC, termed J-SNAC, in which the critic network learns the mapping of the state vector to the cost function, has been used for aircraft control: the error between an F-16 aircraft's response and that of a model follower is regulated by a critic network for a predefined trajectory [10]. Training can be accomplished for the limited state space resulting from a set of pilot commands, but if real pilot commands drive the states too far from the training domain, the results will be suboptimal and may even be destabilizing. The SNAC is primarily meant for state regulation; if the command sequence is known a priori, it can still be used by posing what is theoretically a tracking problem as a regulation problem. Janardhan et al. [11] demonstrated the use of the SNAC for the optimal control of an unmanned aircraft, a 30%-scaled model of a Cessna 150, in conjunction with an online neural network that accounts for deviations from the nominal model, structural or engine failures, and the like, tracking a predefined command sequence or maneuver. Nonlinear dynamic inversion (NDI), on the other hand, is a very popular control design technique [13–18] for both output regulation and command following and has been used in a variety of application areas, especially aerospace engineering, for suboptimal flight control. Its straightforward design procedure, ease of implementation, and global exponential stability (GES) of the tracking error make it a favorable control design method.
Neural networks have been used in conjunction with NDI controllers to form reconfigurable controllers capable of adapting to system inversion errors, failure conditions, etc. Fisher [13] demonstrated the augmentation of a traditional NDI controller with an adaptive neural network for the robust, adaptive control of an aircraft. Lemon et al. [14] demonstrated the application of NDI design in conjunction with a simple neural-network-based adaptive bias corrector (ABC) for the model reference adaptive control (MRAC) of a general aviation aircraft [26]. The NDI controller is versatile with respect to ease of design, implementation, and closed-loop structure; however, it lacks the interpretations, formalism, and advantages of the optimal control design approach. Most nonlinear control techniques are based on linearizing the equations of motion and using nonlinear feedback. Brinker and Wise [19] demonstrated the robustness of the dynamic inversion controller when applied to the longitudinal and lateral control of an aircraft. Macfarland and Calise [20] proposed a method of using dynamic inversion in conjunction with a neural-network-based adaptive control strategy for the flight control of a missile. A number of MRAC-based flight-control systems have been presented in the past [29], notably in the recent work of Nguyen [30] and in [27,31]. However, most of this work employs a linearized reference model: the nonlinear flight system is linearized, and the NDI gains are computed to obtain the desired performance based on a linearized reference model. The presented hybrid control design method offers a great advantage in performance by using a stable, optimal nonlinear model as its reference when computing the NDI gains. This makes the response of the controller near optimal, with a reduced cost compared to traditional MRAC methods.
The use of an optimal nonlinear reference model also offers better adaptation to model and parametric uncertainties. Having a nonlinear, stable, optimal reference model to compute the NDI gains also greatly reduces oscillations in the response, even with an incorrect initialization of the NDI gains; the simulation results demonstrate this. Artificial neural networks (ANNs) are commonly used as nonlinear function approximators requiring very little memory and computation time. Earlier, neural-network-based adaptive systems were not a reliable choice for a high-risk application such as flight control because of the empirical nature of these methods. This changed tremendously with the application of ANNs to the nonlinear inverse control of the XV-15 tilt rotor by Rysdyk et al. [22], where it was demonstrated both theoretically and by simulation that the ANN weights remain bounded during online training; the guaranteed boundedness of the signals is demonstrated in Rysdyk's Ph.D. thesis [23]. The need for a control design approach that retains the advantages of both the SNAC and NDI techniques, and that makes the SNAC fully extendable to command following (without requiring a set of predefined command sequences in advance), led to the SNAC-aided NDI design method presented here, referred to as SNAC-NDI from this point forward. The formulation of the technique in its nascent stage is presented in [12], with a scalar nonlinear system as the demonstrating example. A detailed formulation and the closed-form expressions resulting from the optimization procedures, applied to a complex nonlinear longitudinal aircraft system, are presented next. The ability of the newly developed technique in state regulation and command following is demonstrated by comparing its responses with those of a traditional NDI controller.
In addition, an extended architecture that can handle online failures, such as control failures, and adaptation to modeling errors is presented. Desktop simulation results of SNAC-NDI control applied to a small single-piston-engine Cessna aircraft are presented. Section II presents the longitudinal aircraft model used in the study. Section III contains the controller design procedure using the SNAC-NDI technique, and Sec. IV


extends this architecture to adapt online to failure conditions. Section V details the simulation results and demonstrates the ability of the newly developed technique for near-optimal regulation, command following, and adaptation to failures. The newly developed technique not only retains the inherent advantages of both the SNAC and NDI techniques, but simulation results also show that, since the selection of NDI gains is guided by a presynthesized SNAC network, a SNAC-NDI controller effectively transforms a marginally stable system into a stable one. The last section presents some concluding remarks. It should be noted that, even though a longitudinal aircraft system is used as the demonstrating nonlinear system, the formulation of the SNAC-NDI controller can be extended without loss of generality to a full six-degree-of-freedom aircraft system. The motivation behind considering a three-degree-of-freedom (3-DOF) longitudinal system is the easy extension to autolanding and takeoff scenarios. The authors present a novel, hybrid MRAC-based flight-control design considering a single trim point; the technique can easily be extended to multiple trim points. The design and implementation are simple and straightforward for the following reasons. First, the entire training process of the critic neural network of the SNAC system is performed offline, so the additional training time for more trim points does not change the execution of the online code. Second, the controller gain expressions for the SNAC-NDI control equation are analytically derived offline as closed-form expressions.


II. Longitudinal Aircraft System

A complex nonlinear system with multiple control inputs is considered to demonstrate the ability of the newly developed technique in synthesizing a near-optimal control law. A 3-DOF longitudinal aircraft model [32] is considered here. The nonlinear longitudinal dynamics of an airplane are described by the following differential equations:

$$
\begin{aligned}
\dot{U}_A &= -W_A Q_A - g\sin\theta_A + \frac{\bar{q}S}{m_A}\,(C_L\sin\alpha - C_D\cos\alpha) + \frac{T_A}{m_A}\cos\phi_T,\\
\dot{W}_A &= Q_A U_A + g\cos\theta_A + \frac{\bar{q}S}{m_A}\,(-C_L\cos\alpha - C_D\sin\alpha) - \frac{T_A}{m_A}\sin\phi_T,\\
\dot{Q}_A &= \frac{1}{I_{yy}}\,(\bar{q}S\bar{c}\,C_M - T_A d_T),\\
\dot{\theta}_A &= Q_A
\end{aligned}
\tag{1}
$$

where $U_A$ is the forward velocity in feet per second; $W_A$ is the vertical velocity in feet per second; $Q_A$ is the pitch rate in degrees per second; $\theta_A$ is the pitch angle in degrees; $g$ is the acceleration due to gravity; $\bar{q} = \frac{1}{2}\rho V_{true}^2$ is the dynamic pressure; $S$ is the reference area of the wing; $\bar{c}$ is the mean aerodynamic chord length; $m_A$ is the airplane mass; $T_A$ is the thrust in pounds of force; $\alpha$ is the angle of attack in degrees; $d_T$ is the distance between the thrust line and the line passing through the aerodynamic center of the airplane; $\phi_T$ is the thrust angle, in degrees, between the thrust line and the line passing through the aerodynamic center of the airplane, which is zero; and $I_{yy}$ is the mass moment of inertia. The coefficients of lift, drag, and pitching moment are, respectively, given by

$$
\begin{aligned}
C_L &= C_{L_0} + C_{L_\alpha}\alpha + C_{L_{\hat{Q}}}\frac{Q_A\bar{c}}{2V_{true}} + C_{L_{\dot{\alpha}}}\frac{\dot{\alpha}\bar{c}}{2U_{A_1}} + C_{L_{\delta e}}\,\delta e,\\
C_D &= C_{D_0} + C_{D_K}\,(C_{L_0} + C_{L_\alpha}\alpha)^2,\\
C_M &= C_{M_0} + C_{M_\alpha}\alpha + C_{M_{\hat{Q}}}\frac{Q_A\bar{c}}{2V_{true}} + C_{M_{\dot{\alpha}}}\frac{\dot{\alpha}\bar{c}}{2U_{A_1}} + C_{M_{\delta e}}\,\delta e
\end{aligned}
\tag{2}
$$

with $V_{true} = \sqrt{U_A^2 + W_A^2}$, $U_{A_1}$ the initial forward velocity in feet per second, and $\delta e$ the elevator deflection in degrees. Also, the angle of attack satisfies $\alpha = \theta_A - \gamma = \tan^{-1}(W_A/U_A)$, where $\gamma$ is the flight-path angle in degrees. Equations (1) are, respectively, the force equations in the forward and vertical directions, the pitching-moment equation about the $Y$ axis, and the kinematic equation. For an airplane, the variables $U_A$, $W_A$, $Q_A$, $\theta_A$, $\delta e$, and $T_A$ take values in different ranges. To train the neural network efficiently, its input and output variables must lie in the same range. To achieve this, Eqs. (1) are expressed in terms of normalized deviation variables about a steady-state operating point, so that the normalized deviations can serve as states and controls, all lying in the range $[-1, 1]$. Note that the equations of motion are not linearized in this normalization step. The normalized deviation variables are defined as

$$
\begin{aligned}
U_A &= U_{A_{trim}} + U_{A_N} U_{A_{max}}, & W_A &= W_{A_{trim}} + W_{A_N} W_{A_{max}}, & Q_A &= Q_{A_{trim}} + Q_{A_N} Q_{A_{max}},\\
\theta_A &= \theta_{A_{trim}} + \theta_{A_N} \theta_{A_{max}}, & \delta e &= \delta e_{trim} + \delta e_N \delta e_{max}, & T_A &= T_{A_{trim}} + T_{A_N} T_{A_{max}}
\end{aligned}
\tag{3}
$$

where $U_{A_{trim}}$, $W_{A_{trim}}$, $Q_{A_{trim}}$, $\theta_{A_{trim}}$, $\delta e_{trim}$, and $T_{A_{trim}}$ are the values of $U_A$, $W_A$, $Q_A$, $\theta_A$, $\delta e$, and $T_A$ at the trim condition; $U_{A_N}$, $W_{A_N}$, $Q_{A_N}$, $\theta_{A_N}$, $\delta e_N$, and $T_{A_N}$ are scalar values in the range $[-1, 1]$; and $U_{A_{max}}$, $W_{A_{max}}$, $Q_{A_{max}}$, $\theta_{A_{max}}$, $\delta e_{max}$, and $T_{A_{max}}$ are the corresponding normalization constants. These are chosen such that, e.g., $U_{A_{trim}} - U_{A_{max}} \le U_A \le U_{A_{trim}} + U_{A_{max}}$. In this work, $U_{A_{trim}} = 168.8063$ ft/s and $U_{A_{max}} = 25$ ft/s, so that $143.8063 \le U_A \le 193.8063$ ft/s. Defining

$$
X = [x_1\; x_2\; x_3\; x_4]^T = [U_{A_N}\; W_{A_N}\; Q_{A_N}\; \theta_{A_N}]^T,\qquad
U = [u_1\; u_2]^T = [\delta e_N\; T_{A_N}]^T
\tag{4}
$$

as the state and control variables, respectively, we have from Eqs. (3) and (4)

$$
\begin{aligned}
x_1 &= (U_A - U_{A_{trim}})/U_{A_{max}}, & x_2 &= (W_A - W_{A_{trim}})/W_{A_{max}}, & x_3 &= (Q_A - Q_{A_{trim}})/Q_{A_{max}},\\
x_4 &= (\theta_A - \theta_{A_{trim}})/\theta_{A_{max}}, & u_1 &= (\delta e - \delta e_{trim})/\delta e_{max}, & u_2 &= (T_A - T_{A_{trim}})/T_{A_{max}}
\end{aligned}
\tag{5}
$$

Substituting Eqs. (2), (3), and (5) into Eq. (1), we obtain


$$
\begin{aligned}
\dot{x}_1 &= \frac{1}{U_{A_{max}}}\Big[-W_{A_{trim}}Q_{A_{trim}} - x_3 W_{A_{trim}}Q_{A_{max}} - x_2 Q_{A_{trim}}W_{A_{max}} - x_2 x_3 W_{A_{max}}Q_{A_{max}} - g\sin(\theta_{A_{trim}} + x_4\theta_{A_{max}})\\
&\qquad + \Omega_1\{\Omega_5 - C_{D_K}\Omega_2\cos\Omega_{10} + C_{L_{\delta e}}(\delta e_{trim} + u_1\delta e_{max})\sin\Omega_{10}\} + \Omega_7(T_{A_{trim}} + u_2 T_{A_{max}})\Big]\\
\dot{x}_2 &= \frac{1}{W_{A_{max}}}\Big[U_{A_{trim}}Q_{A_{trim}} + x_3 U_{A_{trim}}Q_{A_{max}} + x_1 Q_{A_{trim}}U_{A_{max}} + x_1 x_3 U_{A_{max}}Q_{A_{max}} + g\cos(\theta_{A_{trim}} + x_4\theta_{A_{max}})\\
&\qquad - \Omega_1\{\Omega_6 + C_{D_K}\Omega_2\sin\Omega_{10} + C_{L_{\delta e}}(\delta e_{trim} + u_1\delta e_{max})\cos\Omega_{10}\} - \Omega_8(T_{A_{trim}} + u_2 T_{A_{max}})\Big]\\
\dot{x}_3 &= \frac{1}{Q_{A_{max}}}\Big[\Omega_3\{C_{M_0} + C_{M_\alpha}\Omega_{10} + \Omega_4 + C_{M_{\delta e}}(\delta e_{trim} + u_1\delta e_{max})\} - \Omega_9(T_{A_{trim}} + u_2 T_{A_{max}})\Big]\\
\dot{x}_4 &= \frac{1}{\theta_{A_{max}}}\,(Q_{A_{trim}} + x_3 Q_{A_{max}})
\end{aligned}
\tag{6}
$$

where

$$
\begin{aligned}
\Omega_1 &= \frac{\rho S V_{true}^2}{2m_A}, \qquad \Omega_2 = (C_{L_0} + C_{L_\alpha}\Omega_{10})^2, \qquad \Omega_3 = \frac{\rho S\bar{c}\,V_{true}^2}{2I_{yy}},\\
\Omega_4 &= C_{M_{\hat{Q}}}\frac{(Q_{A_{trim}} + x_3 Q_{A_{max}})\bar{c}}{2V_{true}} + C_{M_{\dot{\alpha}}}\frac{\dot{\alpha}\bar{c}}{2U_{A_1}},\\
\Omega_5 &= \Big\{C_{L_0} + C_{L_\alpha}\Omega_{10} + C_{L_{\hat{Q}}}\frac{(Q_{A_{trim}} + x_3 Q_{A_{max}})\bar{c}}{2V_{true}} + C_{L_{\dot{\alpha}}}\frac{\dot{\alpha}\bar{c}}{2U_{A_1}}\Big\}\sin\Omega_{10} - C_{D_0}\cos\Omega_{10},\\
\Omega_6 &= \{C_{L_0} + C_{L_\alpha}\Omega_{10}\}\cos\Omega_{10} + C_{D_0}\sin\Omega_{10},\\
\Omega_7 &= \frac{\cos\phi_T}{m_A}, \qquad \Omega_8 = \frac{\sin\phi_T}{m_A}, \qquad \Omega_9 = \frac{d_T}{I_{yy}}, \qquad
\Omega_{10} = \tan^{-1}\Big(\frac{W_{A_{trim}} + x_2 W_{A_{max}}}{U_{A_{trim}} + x_1 U_{A_{max}}}\Big)
\end{aligned}
$$

The nonlinear differential equations [Eq. (6)] can be represented in the following control affine form, under the assumption that the parameters $C_{L_0}$, $C_{L_\alpha}$, etc., are time invariant:

$$\dot{X} = F(X) + G(X)U \tag{7}$$

where $F(X) = [f_1\; f_2\; f_3\; f_4]^T$ with

$$
\begin{aligned}
f_1 &= \frac{1}{U_{A_{max}}}\Big[-W_{A_{trim}}Q_{A_{trim}} - x_3 W_{A_{trim}}Q_{A_{max}} - x_2 Q_{A_{trim}}W_{A_{max}} - x_2 x_3 W_{A_{max}}Q_{A_{max}}\\
&\qquad - g\sin(\theta_{A_{trim}} + x_4\theta_{A_{max}}) + \Omega_1\{\Omega_5 - C_{D_K}\Omega_2\cos\Omega_{10} + C_{L_{\delta e}}\delta e_{trim}\sin\Omega_{10}\} + \Omega_7 T_{A_{trim}}\Big],\\
f_2 &= \frac{1}{W_{A_{max}}}\Big[U_{A_{trim}}Q_{A_{trim}} + x_3 U_{A_{trim}}Q_{A_{max}} + x_1 Q_{A_{trim}}U_{A_{max}} + x_1 x_3 U_{A_{max}}Q_{A_{max}}\\
&\qquad + g\cos(\theta_{A_{trim}} + x_4\theta_{A_{max}}) - \Omega_1\{\Omega_6 + C_{D_K}\Omega_2\sin\Omega_{10} + C_{L_{\delta e}}\delta e_{trim}\cos\Omega_{10}\} - \Omega_8 T_{A_{trim}}\Big],\\
f_3 &= \frac{1}{Q_{A_{max}}}\Big[\Omega_3\{C_{M_0} + C_{M_\alpha}\Omega_{10} + \Omega_4 + C_{M_{\delta e}}\delta e_{trim}\} - \Omega_9 T_{A_{trim}}\Big],\\
f_4 &= \frac{1}{\theta_{A_{max}}}\,(Q_{A_{trim}} + x_3 Q_{A_{max}})
\end{aligned}
$$

and

$$
G(X) = \begin{bmatrix} g_{11} & g_{12}\\ g_{21} & g_{22}\\ g_{31} & g_{32}\\ g_{41} & g_{42} \end{bmatrix},\qquad
\begin{aligned}
g_{11} &= \frac{\Omega_1 C_{L_{\delta e}}\delta e_{max}\sin\Omega_{10}}{U_{A_{max}}}, & g_{12} &= \frac{\Omega_7 T_{A_{max}}}{U_{A_{max}}},\\
g_{21} &= \frac{-\Omega_1 C_{L_{\delta e}}\delta e_{max}\cos\Omega_{10}}{W_{A_{max}}}, & g_{22} &= \frac{-\Omega_8 T_{A_{max}}}{W_{A_{max}}},\\
g_{31} &= \frac{\Omega_3 C_{M_{\delta e}}\delta e_{max}}{Q_{A_{max}}}, & g_{32} &= \frac{-T_{A_{max}}\Omega_9}{Q_{A_{max}}},\\
g_{41} &= 0, & g_{42} &= 0
\end{aligned}
$$
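The normalization of Eqs. (3) and (5) used throughout this section amounts to a per-variable affine map between physical and normalized deviation variables. A minimal sketch, using the trim and range values quoted above for the forward velocity (the helper names are ours, not the paper's):

```python
def normalize(value, trim, vmax):
    """Eq. (5): normalized deviation, e.g. x1 = (UA - UAtrim) / UAmax."""
    return (value - trim) / vmax

def denormalize(x, trim, vmax):
    """Eq. (3): recover the physical variable, e.g. UA = UAtrim + x1 * UAmax."""
    return trim + x * vmax

UA_trim, UA_max = 168.8063, 25.0            # ft/s, forward-velocity example
x1 = normalize(193.8063, UA_trim, UA_max)   # upper edge of the range maps to +1
UA = denormalize(-1.0, UA_trim, UA_max)     # lower edge: 143.8063 ft/s
```

The same two helpers apply to each of the six variables, since every map shares the form "trim plus scaled deviation."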

III. Controller Design

The SNAC-NDI command-following controller uses a presynthesized SNAC regulator neural network to determine the NDI gains necessary for either near-optimal state regulation or command following. The first step, therefore, is to synthesize a SNAC network for optimal regulation. The SNAC [8–12] is applicable to the class of nonlinear systems for which the optimal control, derived from the optimal control equation, can be directly expressed as a function of states and costates; control affine nonlinear systems with a quadratic cost function fall under this class. Our newly developed control design technique extends the applicability of the SNAC to command-following applications and, at the same time, gives NDI a formalism connected to the optimal control design procedure. Synthesizing a SNAC controller for optimal regulation, the first step in controller design, is described next.

A. SNAC Controller Synthesis for Optimal State Regulation (Offline)

The SNAC control design follows the ADP procedure of neural network training. The system, cost function, and derivation of the optimality conditions necessary for training the neural network are now all in discrete time. The system in Eq. (7) is discretized by the Euler discretization method to obtain the discrete time dynamics. We first define


$$
X_k = [x_{1_k}\; x_{2_k}\; x_{3_k}\; x_{4_k}]^T = [U_{A_{N_k}}\; W_{A_{N_k}}\; Q_{A_{N_k}}\; \theta_{A_{N_k}}]^T,\quad
U_k = [u_{1_k}\; u_{2_k}]^T = [\delta e_{N_k}\; T_{A_{N_k}}]^T,\quad
\lambda_k = [\lambda_{1_k}\; \lambda_{2_k}\; \lambda_{3_k}\; \lambda_{4_k}]^T
\tag{8}
$$

as the discrete time state, control, and costate variables, respectively. We also define

$$
\begin{aligned}
U_{A_k} &= U_{A_{trim}} + x_{1_k}U_{A_{max}}, & W_{A_k} &= W_{A_{trim}} + x_{2_k}W_{A_{max}}, & Q_{A_k} &= Q_{A_{trim}} + x_{3_k}Q_{A_{max}},\\
\theta_{A_k} &= \theta_{A_{trim}} + x_{4_k}\theta_{A_{max}}, & \delta e_k &= \delta e_{trim} + u_{1_k}\delta e_{max}, & T_{A_k} &= T_{A_{trim}} + u_{2_k}T_{A_{max}}
\end{aligned}
\tag{9}
$$

The discrete time dynamics can now be written as

$$X_{k+1} = X_k + \Delta t\,[F(X_k) + G(X_k)U_k] \tag{10}$$

where $\Delta t$ is the sampling period and $F(X_k)$, $G(X_k)$ denote $F(X)$, $G(X)$ evaluated at $X = X_k$. The cost function for optimal state regulation in discrete time is given by

$$J_d = \frac{1}{2}\sum_{k=0}^{k_f\to\infty}\big(X_k^T Q_c X_k + U_k^T R_c U_k\big)\,\Delta t$$

where $Q_c = \mathrm{diag}(q_1, q_2, q_3, q_4)$ and $R_c = \mathrm{diag}(r_{c_1}, r_{c_2})$ define the weighting on the states and controls and are positive semidefinite and positive definite matrices, respectively. The detailed derivation of the optimality conditions can be found in [9] and will not be repeated here. On applying the stationarity condition, we obtain the optimal control equation:

$$
\frac{\partial}{\partial U_k}\Big[\frac{1}{2}\big(X_k^T Q_c X_k + U_k^T R_c U_k\big)\Delta t\Big] + \Big(\frac{\partial X_{k+1}}{\partial U_k}\Big)^T\lambda_{k+1} = 0
\;\Rightarrow\; U_k = -R_c^{-1}G(X_k)^T\lambda_{k+1}
\tag{11}
$$
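Equations (10) and (11) can be exercised together on a toy control affine system; the dynamics, weights, and the linear stand-in for the critic below are illustrative placeholders, not the aircraft model:

```python
import numpy as np

dt = 0.01
Rc = np.diag([1.0, 4.0])                  # positive definite control weights
F = lambda X: -0.5 * X                    # toy F(X), not the aircraft dynamics
G = lambda X: np.array([[1.0, 0.0],
                        [0.0, 2.0]])      # toy G(X)
critic = lambda X: 2.0 * X                # hypothetical critic: X_k -> lambda_{k+1}

Xk = np.array([0.5, -0.25])
lam = critic(Xk)                          # lambda_{k+1} from the critic
Uk = -np.linalg.solve(Rc, G(Xk).T @ lam)  # Eq. (11): U_k = -Rc^{-1} G(X_k)^T lambda_{k+1}
Xk1 = Xk + dt * (F(Xk) + G(Xk) @ Uk)      # Eq. (10): Euler-discretized propagation
```

Note that the $\Delta t$ in the stationarity condition cancels, so Eq. (11) carries no explicit sampling-period factor.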

Also, the costate equation is given by

$$
\lambda_k = \frac{\partial}{\partial X_k}\Big[\frac{1}{2}\big(X_k^T Q_c X_k + U_k^T R_c U_k\big)\Delta t\Big] + \Big(\frac{\partial X_{k+1}}{\partial X_k}\Big)^T\lambda_{k+1}
\;\Rightarrow\; \lambda_k = Q_c X_k\,\Delta t + \Big(\frac{\partial X_{k+1}}{\partial X_k}\Big)^T\lambda_{k+1}
\tag{12}
$$

Equation (12) is used for training the critic network, and Eq. (11) replaces the action network of the traditional AC architecture. The data generation and training procedure explained in [9] is used to train the critic network. The neural network weights are first initialized using a procedure called pretraining; the method presented in [9] is employed here, and a brief explanation is included for clarity of presentation. Pretraining initializes the neural network weights using linear quadratic regulator laws, i.e., to values based on the linearized system. The pretrained network is then subjected to a training process based on the optimality conditions derived using ADP: the costate equation [Eq. (12)] and the optimal control equation [Eq. (11)]. Equations (11) and (12) are derived for a nonlinear system, thus providing a closed-form expression for the optimal control law; this is a standard result used in many AC-based [5,6] and SNAC-based [7–10,12,28] nonlinear optimal control methods. Interested readers can find proofs of stability and detailed derivations in [2,3,5,6,9]. The critic neural network is trained to map the state vector $X_k$ at its input to the costate vector $\lambda_{k+1}$ at its output. The block diagram in Fig. 1 illustrates the procedure for generating the training data. A partially trained critic network gives the costate vector $\lambda^a_{k+1}$ as output for the state vector $X_k$ as input. This is substituted into the optimal control equation (11) to obtain the control vector $U_k$. The state vector $X_k$ and the control vector $U_k$ are then substituted into the state and optimal costate equations [Eqs. (7) and (12), respectively] to obtain the target values $\lambda^t_{k+1}$ for the costate vector. The critic network is then trained to map $X_k$ to $\lambda^t_{k+1}$.
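One pass of the target-generation cycle just described can be sketched as follows. The linear map `W` is a hypothetical stand-in for the trained four-subnetwork critic, and `F`, `G`, `Qc`, `Rc` are toy placeholders; for brevity the state Jacobian drops the state dependence of the $G(X)U$ term:

```python
import numpy as np

def jacobian(F, X, eps=1e-6):
    """Finite-difference Jacobian of F at X (stand-in for an analytic one)."""
    n = len(X); J = np.zeros((n, n)); f0 = F(X)
    for i in range(n):
        dX = np.zeros(n); dX[i] = eps
        J[:, i] = (F(X + dX) - f0) / eps
    return J

def snac_target(W, Xk, F, G, Qc, Rc, dt):
    """Generate the costate training target lambda^t_{k+1} for input X_k."""
    lam_a = W @ Xk                               # critic output: lambda^a_{k+1}
    Uk = -np.linalg.solve(Rc, G(Xk).T @ lam_a)   # Eq. (11)
    Xk1 = Xk + dt * (F(Xk) + G(Xk) @ Uk)         # propagate one step, Eq. (10)
    lam_k2 = W @ Xk1                             # critic at X_{k+1} gives lambda_{k+2}
    A = np.eye(len(Xk)) + dt * jacobian(F, Xk1)  # approx. of dX_{k+2}/dX_{k+1}
    return Qc @ Xk1 * dt + A.T @ lam_k2          # costate equation, Eq. (12)

W = 0.5 * np.eye(2)                              # hypothetical linear "critic"
target = snac_target(W, np.array([1.0, 0.0]),
                     F=lambda X: -X, G=lambda X: np.eye(2),
                     Qc=np.eye(2), Rc=np.eye(2), dt=0.1)
```

In the actual procedure the critic weights would then be updated to regress `target` against the input state, and the cycle repeated until the predicted and target costates agree.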
After successful training, the critic network maps the state vector $X_k$ to the optimal costate vector $\lambda_{k+1}$. In our study, the critic network is composed of four subnetworks, one for each component of the costate vector. Each subnetwork is a feedforward neural network with an input layer of four neurons, two hidden layers of six neurons each, and an output layer with one neuron. A tangent-sigmoid transfer function is used for the input and hidden layers, and a linear transfer function for the output layer. The telescopic training procedure outlined in [9] was adopted to train the SNAC neural network and is briefly outlined next for clarity. The normalized state and control vectors lie in the range $[-1, 1]$. As stated before, a discrete time approach is used for the ADP formulation and SNAC training. Keeping the time instant $k$ fixed, the set $S = \{X_k : X_k \in \text{domain of operation}\}$ is chosen such that its elements closely cover the entire subset of the state space in which the critic network is to be trained; that is, the set must be chosen so that its elements approximately cover the subset of the state space in which a typical state trajectory is expected to lie at time step $k$, starting from any initial point in the domain of interest. For regulator problems, the states move toward the origin as time increases; therefore, the set should also include points in the state space close to zero. For $i = 1, 2, \ldots, I$, the set $S_i = \{X_k : \|X_k\|_\infty \le C_i\}$, where $C_i$ is a positive constant, is generated. To begin, a small value $C_1$ is chosen and the set $S_1$ is generated. The critic network is trained for all $X_k \in S_1$. After the network is trained well, a constant $C_2 > C_1$ is chosen and the set $S_2$ is generated. The network is then trained for all $X_k \in S_2$.
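The nested-set generation just described can be sketched as follows, with $C_i = 0.05i$ and $C_I = 1$ as used below; the sample count per stage and the uniform sampling are illustrative choices, not the paper's:

```python
import numpy as np

def telescopic_sets(n_states=4, samples_per_stage=500, I=20, seed=0):
    """Nested training sets S_i = {X : ||X||_inf <= C_i}, C_1 < C_2 < ... < C_I,
    sampled uniformly; the critic is retrained on each successive S_i."""
    rng = np.random.default_rng(seed)
    C = [0.05 * i for i in range(1, I)] + [1.0]   # C_i = 0.05 i, with C_I = 1
    return C, [rng.uniform(-Ci, Ci, size=(samples_per_stage, n_states)) for Ci in C]

C, stages = telescopic_sets()
```

Training on the small innermost set first keeps the early regression targets near the origin, where the pretrained (LQR-initialized) critic is already close to correct, before the domain is gradually widened.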
This process of training is repeated until the sets, with $S_I = \{X_k : \|X_k\|_\infty \le C_I\}$ and $C_I > C_{I-1} > \cdots > C_2 > C_1$, cover the domain of interest. Many methods have been proposed to make the process of state generation straightforward; in this paper, we employ the method used in [9], with $C_i = 0.05i$ for $i = 1, 2, \ldots, I-1$ and $C_I = 1$. Even though, theoretically, a single neural network with as many outputs as costates (which happens to be the same as the number of states) is capable of approximating the underlying function, there are several numerical concerns to be careful about. As is well known, the costates have no physical meaning, and the individual components of the costate vector can vary considerably in magnitude. If there is any order-of-magnitude difference, a small error of the larger magnitude component can completely suppress the large error from the



Fig. 1 SNAC: critic network training.

smaller magnitude component. In view of this, it is always a good idea to decouple the network structure and assume individual subnetworks for each component of the output vector. It is also important to note that the training of the SNAC network is carried out for a single equilibrium point in the flight envelope, but the training can readily be extended to multiple equilibrium points to cover the entire operating flight envelope. Even though the SNAC network is trained in discrete time as explained previously, it can be used online or offline in the continuous domain to arrive at the continuous-time optimal control signal derived using the variational approach. It has been shown in [9] that the critic network in continuous time maps the state vector $X$ to the costate vector $\lambda$. The continuous time system is defined by Eq. (7), and the corresponding cost function is defined as follows:

$$J_c = \frac{1}{2}\int_0^{t_f\to\infty}\big(X^T Q_c X + U^T R_c U\big)\,\mathrm{d}t \tag{13}$$

Following the variational approach to the optimal control equation, we obtain, with $U = [u_1\; u_2]^T = [\delta e_N\; T_{A_N}]^T$,

$$U = -R_c^{-1}G(X)^T\lambda \tag{14}$$

The costate equation is given by

$$\dot{\lambda} = -Q_c X - \Big(\frac{\partial[F(X) + G(X)U]}{\partial X}\Big)^T\lambda \tag{15}$$

where $F(X)$ and $G(X)$ are defined by Eq. (7), $X$ and $U$ are defined by Eq. (5), and $\lambda = [\lambda_1\; \lambda_2\; \lambda_3\; \lambda_4]^T$ is the costate vector. A fully trained SNAC network, when used in continuous time, takes the state vector $X$ as input and gives the costate vector $\lambda$ as output, which is then used in the optimal control equation [Eq. (14)] to obtain the optimal control for state regulation. This control is denormalized using Eq. (9) before being applied to the plant dynamics given by Eq. (1). The block diagram of the SNAC control design architecture is shown in Fig. 2. This method of SNAC design was used to obtain the optimal regulator control law for a nonlinear time-varying system of a morphing aircraft in [28]. Here, we extend the SNAC to command-following applications in which the command sequence is not known a priori. The presented method does not involve online training of the SNAC neural network (critic network); the network is trained offline to provide optimal control signals around a specific trim point. The architecture of the SNAC-NDI system is shown in Fig. 3, and the extended architecture that adapts online to failure conditions is shown in Fig. 4. A detailed explanation of the extended architecture is given in Sec. IV.
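Online use of the trained critic in continuous time, per Eq. (14), reduces to one network evaluation and one small linear solve per control update. A minimal sketch, in which the linear `critic` is a hypothetical stand-in for the trained network and `G`, `Rc` are toy placeholders:

```python
import numpy as np

def snac_control(critic, G, Rc, X):
    """Eq. (14): U = -Rc^{-1} G(X)^T lambda, with lambda from the critic."""
    lam = critic(X)                            # costate vector from the network
    return -np.linalg.solve(Rc, G(X).T @ lam)

critic = lambda X: 2.0 * X                     # hypothetical trained map X -> lambda
G = lambda X: np.eye(2)                        # toy control-effectiveness matrix
Rc = np.diag([1.0, 4.0])
U = snac_control(critic, G, Rc, np.array([0.5, -0.25]))
# U would then be denormalized via Eq. (9) before being applied to the plant
```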

Fig. 2 SNAC control design architecture.



Fig. 3 SNAC-NDI control design architecture.

Fig. 4 SNAC-NDI architecture with online adaptation.

B. NDI Controller Design

This section details the NDI control design and its formulation for a nonlinear system, with the aircraft as the example. NDI has been used extensively in robotic systems, aerospace engineering, and other areas to arrive at a suboptimal closed-form nonlinear control; its ease of implementation and straightforward design method make it a very popular nonlinear control design technique. The denormalized rigid-body dynamics of an airplane over the entire flight envelope can be described by $n$ differential equations, given by Eq. (1), which can be expressed in the control affine form

$$\dot{X} = F_A(X) + G_A(X)U,\qquad Y_A = CX \tag{16}$$

where $X \in \mathbb{R}^n$ is the state vector, $U \in \mathbb{R}^v$ is the control vector, $C$ is a constant matrix of order $v \times n$, and $Y_A \in \mathbb{R}^v$ is the output vector. The total relative degree of the output vector $Y_A = [y_1\; y_2\; \ldots\; y_v]^T$ is $m = m_1 + m_2 + \cdots + m_v$, where $m_i$ is the relative degree of the $i$th component $y_i$. Therefore, we can write

$$\frac{\mathrm{d}^m Y_A}{\mathrm{d}t^m} = F_A^*(X) + G_A^*(X)U \tag{17}$$

Looking at Eq. (17), the first requirement for the existence of the inverse control is that the control effectiveness matrix $G_A^*(X)$ be nonsingular, i.e., that its inverse exist for all time; this condition is fully satisfied in the normal flight envelope of an airplane [17]. Typically, if $Y_A^{des}$ denotes the reference input commanded by the pilot, the tracking error is defined as $E_A = Y_A - Y_A^{des}$. Since $m$ is the relative degree of $Y_A$, the $m$th-order error dynamics are given by


$$\frac{d^m E_A}{dt^m} + K_1\frac{d^{m-1}E_A}{dt^{m-1}} + K_2\frac{d^{m-2}E_A}{dt^{m-2}} + \cdots + K_m E_A = 0,$$
$$\frac{d^m Y_A}{dt^m} = \frac{d^m Y_A^{des}}{dt^m} - K_1\frac{d^{m-1}E_A}{dt^{m-1}} - K_2\frac{d^{m-2}E_A}{dt^{m-2}} - \cdots - K_m E_A \tag{18}$$

where the NDI gains $K_1, K_2, \ldots, K_m$ are constant diagonal matrices of order $v$. Typically, the desired time response of the system guides the computation of the gains; this selection of gains is a crucial step in NDI control design. Equating Eqs. (17) and (18), we arrive at the closed-form expression for the NDI control, given by


$$U = \bar{G}_A(X)^{-1}\left[\frac{d^m Y_A^{des}}{dt^m} - \bar{F}_A(X) - K_1\frac{d^{m-1}E_A}{dt^{m-1}} - K_2\frac{d^{m-2}E_A}{dt^{m-2}} - \cdots - K_m E_A\right] \tag{19}$$

Expression (19) can readily be used for state regulation or command-following applications. Although the NDI control law offers a straightforward method of controlling the states in the output vector, it does not guarantee stable internal dynamics, so it is important to check the stability of the states unobservable by the NDI control law. The NDI controller generates a suboptimal response, as its formulation involves no optimization procedure and has no connection to the optimal control principles of ADP. The hybrid control design presented in this paper connects the NDI formulation with ADP and dynamically optimizes the performance of the NDI controller with the aid of the presynthesized SNAC neural network. For the aircraft, we are interested in controlling the forward velocity and pitch angle components of the state vector, $x_1$ and $x_4$, respectively. We see from Eq. (6) that the forward velocity differential equation has a relative degree of one, and the pitch angle differential equation has a relative degree of two. We therefore define two scalar outputs: $y_1 = x_1$ and $y_2 = x_4$. The effect of elevator deflection is assumed minimal compared to the effect of the thrust control input while following a forward velocity command; similarly, the effect of the thrust control input is ignored when computing the elevator deflection required to follow a pitch angle command. Let us first look at $y_1 = x_1$, and let $y_1^{des}$ denote the command or reference signal for forward velocity. Defining $e_1 = y_1 - y_1^{des}$, the first-order error dynamics are given by

$$\dot{y}_1 = \dot{y}_1^{des} - k_u\,(y_1 - y_1^{des}) \tag{20}$$

where $k_u$ is a positive scalar that exponentially drives the tracking error $e_1$, defined previously, to zero. Also, since the relative degree is one, from Eq. (6) we can write

$$\dot{y}_1 = f_{y_1} + g_{y_1}\,(u_2)_{NDI} \tag{21}$$

where

$$f_{y_1} = \frac{1}{U_{Amax}}\Big[-W_{Atrim}Q_{Atrim} - x_3 W_{Atrim}Q_{Amax} - x_2 W_{Amax}Q_{Atrim} - x_2 x_3 W_{Amax}Q_{Amax} - g\sin(\theta_{Atrim} + x_4\theta_{Amax})$$
$$\qquad\qquad +\ \Omega_1\{(\Omega_5 - C_{DK}\Omega_2)\cos\Omega_{10} + C_{L\delta_e}(\delta_{e,trim} + u_1\delta_{e,max})\sin\Omega_{10}\} + \Omega_7 T_{trim}\Big], \qquad g_{y_1} = \frac{1}{U_{Amax}}\,\Omega_7 T_{max}$$

Equating Eqs. (20) and (21), we arrive at the control expression for the thrust, given by

$$(u_2)_{NDI} = \frac{1}{g_{y_1}}\left[\dot{y}_1^{des} - k_u\,(y_1 - y_1^{des}) - f_{y_1}\right] \tag{22}$$

Equation (22) can readily be used to obtain the thrust control input necessary for either command following or state regulation (by setting $y_1^{des}$ and all its higher-order derivatives to zero). To compute the elevator deflection, we look at the output equation $y_2 = x_4$. Defining the tracking error $e_2 = y_2 - y_2^{des}$, where $y_2^{des}$ is the target (commanded) pitch angle, the second-order error dynamics are given by

$$\ddot{y}_2 = \ddot{y}_2^{des} - k_q\,(\dot{y}_2 - \dot{y}_2^{des}) - k_\theta\,(y_2 - y_2^{des}) \tag{23}$$

where $k_q$ and $k_\theta$ are positive scalars chosen such that the polynomial $p^2 + k_q p + k_\theta$ is Hurwitz, i.e., all of its roots lie strictly in the left half of the complex plane [33]. Assuming that the internal dynamics are stable, the selected gains drive the tracking error $e_2$, defined previously, to zero. Also, since the relative degree is two, from Eq. (6) we obtain

$$\ddot{y}_2 = f_{y_2} + g_{y_2}\,(u_1)_{NDI} \tag{24}$$

where

$$f_{y_2} = \frac{1}{\theta_{Amax}}\Big[\Omega_3\{C_{M0} + C_{M\alpha}(\Omega_{10} + \Omega_4) + C_{M\delta_e}\delta_{e,trim}\} - \Omega_9(T_{trim} + u_2 T_{max})\Big], \qquad g_{y_2} = \frac{1}{\theta_{Amax}}\,\Omega_3\,C_{M\delta_e}\,\delta_{e,max}$$

Equating Eqs. (23) and (24), we arrive at the elevator deflection, given by

$$(u_1)_{NDI} = \frac{1}{g_{y_2}}\left[\ddot{y}_2^{des} - k_q\,(\dot{y}_2 - \dot{y}_2^{des}) - k_\theta\,(y_2 - y_2^{des}) - f_{y_2}\right] \tag{25}$$

Equation (25) can readily be used to obtain the elevator deflection necessary for either command following or state regulation (by setting $y_2^{des}$ and all its higher-order derivatives to zero). It is assumed that $y_1^{des}$, $y_2^{des}$, and all their higher-order derivatives are bounded and known. The gains $k_u$, $k_q$, and $k_\theta$ are chosen to give the desired response dynamics of a predefined model; this selection of gains makes the response either suboptimal or optimal. The following section explains a method of computing the NDI gains with the aid of a presynthesized SNAC network, making the system response near optimal. The general formulation and its application to the airplane system are presented.
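As a concrete illustration of the structure of Eqs. (20)-(25), the two scalar inversion laws can be sketched as plain functions. This is a minimal sketch, not the paper's code: the model terms $f_{y_1}$, $g_{y_1}$, $f_{y_2}$, $g_{y_2}$ are passed in as numbers, and all names and the sample values are assumptions for illustration.

```python
# Illustrative sketch of the two scalar NDI laws of Eqs. (22) and (25).
# f_y1, g_y1, f_y2, g_y2 stand for the model terms defined in the text.

def ndi_thrust(y1, y1_des, y1dot_des, f_y1, g_y1, k_u):
    """First-order inversion for forward velocity, Eq. (22)."""
    return (y1dot_des - k_u * (y1 - y1_des) - f_y1) / g_y1

def ndi_elevator(y2, y2dot, y2_des, y2dot_des, y2ddot_des,
                 f_y2, g_y2, k_q, k_theta):
    """Second-order inversion for pitch angle, Eq. (25)."""
    return (y2ddot_des - k_q * (y2dot - y2dot_des)
            - k_theta * (y2 - y2_des) - f_y2) / g_y2

# State regulation corresponds to zero commands and command derivatives:
u2 = ndi_thrust(y1=0.5, y1_des=0.0, y1dot_des=0.0,
                f_y1=-0.1, g_y1=2.0, k_u=0.2)           # -> 0.0
u1 = ndi_elevator(y2=0.2, y2dot=0.0, y2_des=0.0, y2dot_des=0.0,
                  y2ddot_des=0.0, f_y2=0.05, g_y2=1.5,
                  k_q=1.0, k_theta=2.0)                 # -> -0.3
```

Note that the inversion divides by $g_{y_1}$ and $g_{y_2}$, which is exactly why the nonsingularity of the control effectiveness terms discussed after Eq. (17) is required.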


C. SNAC–NDI

In this technique, the presynthesized SNAC network aids the computation of the NDI gains. Typically, a reference model is used to compute the gains of the inverse controller; assuming perfect state feedback and 100% control effectiveness, the NDI-controlled response then mimics the model-follower response. The goal of the hybrid technique is to use the optimality principles of SNAC and the closed-form solution of NDI control to arrive at a near-optimal, closed-form control law that can readily be used for both regulation and command-following applications [12]. The SNAC-NDI aims to dynamically optimize the NDI controller so that the resulting controller gives a near-optimal response even in the presence of uncertainties. The gains $k_u$, $k_q$, and $k_\theta$ are now time-varying; therefore, the control expressions for the SNAC-NDI controller are given by


$$(u_1)_{S\text{-}NDI} = \frac{1}{g_{y_2}}\left[\ddot{y}_2^{des} - k_q(t)(\dot{y}_2 - \dot{y}_2^{des}) - k_\theta(t)(y_2 - y_2^{des}) - f_{y_2}\right], \qquad (u_2)_{S\text{-}NDI} = \frac{1}{g_{y_1}}\left[\dot{y}_1^{des} - k_u(t)(y_1 - y_1^{des}) - f_{y_1}\right] \tag{26}$$

The first step in computing the SNAC-NDI control is to synthesize a SNAC network for the system as a regulator, as explained in Sec. III.A; this can be done offline. Once the SNAC network is fully trained offline to deliver an optimal control law for output regulation, it is used online to compute the NDI gains. The gains $k_u(t)$, $k_q(t)$, and $k_\theta(t)$ are evaluated in an effort to make the response of the SNAC-NDI controller match the SNAC response when used as a regulator; these gains can then be used for either near-optimal command following or state regulation. It is also necessary to arrive at closed-form expressions for these gains, to eliminate the use of online optimization tools that may introduce delays and make the system response sluggish. Figure 3 shows the architecture of the SNAC-NDI control design method: the top half shows the SNAC regulator that employs the system model to compute the time-varying NDI gains, and the bottom half shows the SNAC-NDI controller that uses the computed gains for command following. To arrive at the closed-form solution for the gain $k_u(t)$, we define the output dynamics for state regulation using the NDI and SNAC methods as

$$\dot{\bar{y}}_1 = -k_u(t)\,y_1, \qquad \dot{y}_1 = f_{y_1} + g_{y_1}\,u_2 \tag{27}$$

where $\dot{\bar{y}}_1$ is the output dynamics obtained by the NDI method of control, $\dot{y}_1$ is obtained by following the SNAC method of control computation, and $u_2$ is obtained from the presynthesized SNAC neural network by applying the optimal control equation (14). We now define a cost function given by

$$J_{D1} = \frac{1}{2}\left[(\dot{y}_1 - \dot{\bar{y}}_1)^2\,r_1 + r_2\,(k_u(t) - k_u(t - \Delta t))^2\right] \tag{28}$$

where $r_1$ and $r_2$ are positive scalars, and $\Delta t$ is the integration time step. The first term in the cost function defined by Eq. (28) minimizes the difference between the output dynamics defined by Eq. (27); the second term is included so that the transition of the gains is smooth, in turn making the SNAC-NDI control signal continuous. The optimization problem

$$\min_{k_u(t)} J_{D1}$$

is solved to arrive at the following closed-form expression for $k_u(t)$:

$$k_u(t) = \frac{r_2\,k_u(t - \Delta t) - r_1\,\dot{y}_1\,y_1}{r_1\,y_1^2 + r_2} \tag{29}$$
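A single gain-update step per Eq. (29), together with the positivity and bounded-variation checks described in the text, can be sketched as follows. The function name, the $r_1 = 10$, $r_2 = 1$ defaults (quoted later in Sec. V), and the variation bound `max_step` are assumptions of this sketch.

```python
def update_ku(ku_prev, y1, y1dot_snac, r1=10.0, r2=1.0, max_step=0.5):
    """One gain update per Eq. (29).

    y1dot_snac is the SNAC-generated output dynamics of Eq. (27);
    max_step is an assumed bound on the allowed gain variation.
    """
    ku_new = (r2 * ku_prev - r1 * y1dot_snac * y1) / (r1 * y1**2 + r2)
    # Stability checks from the text: ku(t) > 0 and a small, bounded change.
    if ku_new > 0.0 and abs(ku_new - ku_prev) <= max_step:
        return ku_new
    return ku_prev  # otherwise keep the previous gain ku(t - dt)

ku = update_ku(0.2, 0.5, -0.2)   # accepted update: (0.2 + 1.0) / 3.5
```

Because $r_2 > 0$, the denominator never vanishes, even when $y_1$ passes through zero; the previous-gain term then dominates and the gain simply holds its value.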

Before updating the gain of the control law defined by Eq. (26) with the new value $k_u(t)$, the following checks are made to ensure closed-loop stability [33]:
1) The Hurwitz condition is satisfied by $k_u(t)$, which means $k_u(t) > 0$.
2) The variation in the gain, $|k_u(t) - k_u(t - \Delta t)|$, is bounded and small.
If these checks are satisfied, the control law defined by Eq. (26) is updated with the new value $k_u(t)$; otherwise, the previous value $k_u(t - \Delta t)$ is used in Eq. (26). We now move on to evaluate the closed-form expressions for the gains $k_q(t)$ and $k_\theta(t)$. Our goal is to analytically derive closed-form expressions for the gains that minimize the cost function defined by

$$J_{D2} = \frac{1}{2}\left[(\ddot{y}_2 - \ddot{\bar{y}}_2)^2\,r_1 + r_2\left\{(k_q(t) - k_q(t - \Delta t))^2 + (k_\theta(t) - k_\theta(t - \Delta t))^2\right\}\right] \tag{30}$$

where $\Delta t$ is the integration time step, and the output dynamics $\ddot{\bar{y}}_2$ and $\ddot{y}_2$ are defined by

$$\ddot{\bar{y}}_2 = -k_q(t)\,\dot{y}_2 - k_\theta(t)\,y_2, \qquad \ddot{y}_2 = f_{y_2} + g_{y_2}\,u_1 \tag{31}$$

where $u_1$ is obtained from the presynthesized SNAC neural network by applying the optimal control equation defined by Eq. (14). The unconstrained optimization problem

$$\min_{k_q(t),\,k_\theta(t)} J_{D2}$$

is solved to arrive at the following closed-form expressions for $k_q(t)$ and $k_\theta(t)$:

$$k_q(t) = \frac{r_2\,k_q(t - \Delta t) - r_1\,(\ddot{y}_2\,\dot{y}_2 + k_\theta\,\dot{y}_2\,y_2)}{r_1\,\dot{y}_2^2 + r_2}, \qquad k_\theta(t) = \frac{r_2\,k_\theta(t - \Delta t) - r_1\,(\ddot{y}_2\,y_2 + k_q\,\dot{y}_2\,y_2)}{r_1\,y_2^2 + r_2} \tag{32}$$
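The paired update of Eq. (32) can be sketched in the same way as the $k_u$ update. The two expressions are coupled through the other gain; this sketch resolves the coupling by using the other gain's previous value, which is an assumption made here, as are the function name, defaults, and the `max_step` bound.

```python
def update_kq_ktheta(kq_prev, kth_prev, y2, y2dot, y2ddot_snac,
                     r1=10.0, r2=1.0, max_step=0.5):
    """One update of (kq, ktheta) per Eq. (32).

    y2ddot_snac is the SNAC-generated output dynamics of Eq. (31).
    Each expression uses the other gain's previous value (an assumption
    of this sketch, since Eq. (32) couples the two gains).
    """
    kq_new = (r2 * kq_prev
              - r1 * (y2ddot_snac * y2dot + kth_prev * y2dot * y2)) \
             / (r1 * y2dot**2 + r2)
    kth_new = (r2 * kth_prev
               - r1 * (y2ddot_snac * y2 + kq_prev * y2dot * y2)) \
              / (r1 * y2**2 + r2)
    # Checks from the text: Hurwitz (positive gains) and small variation.
    ok = (kq_new > 0.0 and kth_new > 0.0
          and abs(kq_new - kq_prev) <= max_step
          and abs(kth_new - kth_prev) <= max_step)
    return (kq_new, kth_new) if ok else (kq_prev, kth_prev)

kq, kth = update_kq_ktheta(1.0, 2.0, 0.2, -0.1, -0.05)
```

As with Eq. (29), the $r_2$ term keeps both denominators strictly positive, so the update is well defined even when $y_2$ or $\dot{y}_2$ crosses zero.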



The Appendix consists of a Lyapunov-based stability analysis of the SNAC-NDI-controlled system; the time-varying system is analyzed, and the conditions required for stability of the closed-loop system are derived there. The analysis shows that keeping the variation of the gains appropriately bounded ensures global exponential stability of the closed-loop error dynamics. One of the drawbacks of the NDI technique is that it does not guarantee internal stability. The NDI formulation is based on the $m$th-order error dynamics of the output signal, where $m$ is the relative degree of the output, and the controller is designed to exponentially drive the tracking error to zero; the NDI method ensures global exponential stability of the tracking error only. In general, there is no explicit guarantee about the stability of the internal dynamics [15,17,19-25,33], which arise when the relative degree of the system is less than the number of states. Some local stability results can be obtained from the zero dynamics, but they are quite limited. The derivations presented in the Appendix assume that the closed-loop system is internally stable; with that assumption, the Appendix proves that keeping the variation of the NDI gains small and bounded guarantees global exponential stability of the closed-loop error dynamics. Accordingly, before updating the gains of the control law defined by Eq. (26) with the new values $k_q(t)$ and $k_\theta(t)$, the following checks are made to ensure closed-loop stability [33]:
1) The gains satisfy the Hurwitz condition, which implies that the polynomial $p^2 + k_q(t)\,p + k_\theta(t)$ has roots that lie strictly in the left half of the complex plane [33]. The gains are therefore required to satisfy $k_q(t) > 0$, $k_\theta(t) > 0$, and $k_q(t) > \sqrt{k_q(t)^2 - 4k_\theta(t)}$.
2) The variations in the gains, $|k_q(t) - k_q(t - \Delta t)|$ and $|k_\theta(t) - k_\theta(t - \Delta t)|$, are bounded and small.
If these checks are satisfied, the control law given by Eq. (26) is updated with the new values $k_q(t)$ and $k_\theta(t)$. An approximate NDI controller, based on one previously developed at Wichita State University [14,16], was used in this study. Accordingly, the inverse controller takes the commanded forward acceleration and pitch acceleration as inputs, and it outputs the elevator and thrust controls. An approximate solution can be found by assuming $C_{D\delta_e} \approx 0$ and $\dot{W}_A \approx 0$, and it is given by

$$T_A = \frac{m\,\dot{U}_A^{des} + m g \sin\gamma + \bar{q} S C_D}{\cos(\alpha + \varphi_T)}, \qquad \delta_e = \frac{I_{yy}\,\dot{Q}_A^{des}}{\bar{q} S \bar{c}\,C_{M\delta_e}} - \frac{C_{M\delta_{e0}}}{C_{M\delta_e}} + \frac{T_A\,d_T}{\bar{q} S \bar{c}\,C_{M\delta_e}} \tag{33}$$

where

$$C_D = C_{D0} + C_{DK}(C_{L0} + C_{L\alpha}\alpha)^2, \qquad C_{M\delta_{e0}} = C_{M0} + C_{M\alpha}\alpha + C_{M\hat{Q}}\,\frac{Q_A\bar{c}}{2V_{true}} + C_{M\dot{\alpha}}\,\frac{\dot{\alpha}\bar{c}}{2U_{A1}},$$
$$\dot{U}_A^{des} = U_{Amax}\left[-k_u(t)(y_1 - y_1^{des})\right], \qquad \dot{Q}_A^{des} = \theta_{Amax}\left[-k_q(t)\,x_3 - k_\theta(t)(y_2 - y_2^{des})\right] \tag{34}$$

Here, in order to ignore the effect of elevator deflection while computing the thrust control, as well as the effect of the thrust control while computing the elevator deflection, $d_T$ is set equal to zero. Equation (34) is obtained after denormalization [given by Eq. (3)]; therefore, $\dot{U}_A^{des}$ and $\dot{Q}_A^{des}$ are denormalized states. The steps in arriving at the SNAC-NDI control can be summarized as follows:
1) The SNAC network is trained offline for optimal state regulation. A fully trained SNAC neural network captures the relationship between the state vector $X_k$ and the costate vector $\lambda_{k+1}$ in the discrete-time domain, or between $X$ and $\lambda$ in the continuous domain.
2) At $t = t_0$, initial stabilizing values are chosen for the gains $k_u(t)$, $k_q(t)$, and $k_\theta(t)$.
3) Online, when $y_1^{des}$ and $y_2^{des}$ are commanded, the states of the airplane model are assigned to the commanded values, and the SNAC regulator is used to regulate the states of the model to the trim condition. The optimal trajectory that the aircraft model's states follow is used online in the closed-form expressions given by Eqs. (29) and (32) to compute the gains for the SNAC-NDI controller.
4) For every time $t < t_f$, the gains are evaluated as per Eqs. (29) and (32), and checks are made on the new gain values to ensure global exponential stability of the tracking error.
5) If the checks are satisfied, the obtained gain values are substituted in the SNAC-NDI control expression given by Eq. (26) to obtain the required thrust and elevator control inputs.
6) The state equations (1) are then used to progress forward in time.
7) The preceding process is repeated until $t = t_f$.
The simulation results shown in Sec. V demonstrate the efficiency of the newly developed controller, when applied to both state regulation and command following, in comparison to the traditional NDI controller.
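The seven steps above can be illustrated end to end with a toy scalar "aircraft" so that the loop structure is runnable. Everything in this sketch is an assumption for illustration: `snac_y1dot` is a stable stand-in for the trained SNAC regulator's output dynamics, the plant is a single first-order channel, and the numbers are arbitrary.

```python
# Toy sketch of the SNAC-NDI loop (steps 1-7) on a scalar channel.

def snac_y1dot(y1):
    """Stand-in for f_y1 + g_y1*u2 with u2 from the trained critic:
    a stable regulation trajectory toward trim (an assumption here)."""
    return -0.4 * y1

def run_snac_ndi(y1_cmd=1.0, ku0=0.2, r1=10.0, r2=1.0, dt=0.01, tf=20.0):
    y1, y1_model, ku = 0.0, y1_cmd, ku0   # plant at trim, model at command
    for _ in range(int(tf / dt)):
        # Step 3: regulate the model state with the SNAC stand-in
        y1dot_star = snac_y1dot(y1_model)
        # Step 4: gain update, Eq. (29), with the stability checks
        ku_new = (r2 * ku - r1 * y1dot_star * y1_model) \
                 / (r1 * y1_model**2 + r2)
        if ku_new > 0.0 and abs(ku_new - ku) < 0.5:
            ku = ku_new
        # Step 5: SNAC-NDI tracking law (first-order error dynamics)
        y1dot = -ku * (y1 - y1_cmd)
        # Step 6: propagate plant and model forward in time (Euler)
        y1 += y1dot * dt
        y1_model += y1dot_star * dt
    return y1, ku

y1_final, ku_final = run_snac_ndi()
```

In this toy setting the gain settles near the SNAC decay rate while the plant converges to the command, mirroring how the full controller inherits the regulator's behavior.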
It is also important to note that an ideal inverse controller assumes that full knowledge of the system is readily available and that perfect state feedback and 100% control effectiveness are present. In practice, however, one or more of these assumptions are not satisfied, and the inverse controller is very sensitive to system model uncertainties, loss of control effectiveness, and parameter inaccuracies. It is therefore essential to include an adaptive strategy to handle these deviations from the nominal flight conditions. The following section explains this phenomenon and the adaptive NDI control design procedure.

IV. SNAC–NDI with Online Adaptation

In the literature [5-7,10,13-18], a neuroadaptive class of control typically consists of two neural networks: one trained offline using available plant models or recorded data, and the other trained online to account for any modeling errors or failures. Accordingly, an extension of the architecture of Fig. 3 is shown in Fig. 4, which consists of an adaptive neural network in addition to the SNAC neural network. In the presented architecture, the SNAC neural network is trained offline based on the available plant model, while the online neural network adapts to uncertainties, modeling errors, and failures. Continuing from the general nominal model described by Eq. (17), the deviations from the nominal system can be described by

$$\frac{d^m Y_A}{dt^m} = \{\bar{F}_A(X) + \Delta\bar{F}_A(X, U)\} + \{\bar{G}_A(X) + \Delta\bar{G}_A(X, U)\}\,U \tag{35}$$

where $\Delta\bar{F}_A$ and $\Delta\bar{G}_A$ are unknown, nonlinear, additive uncertainties representing unmodeled dynamics, parameter inaccuracies, and any deviations from the nominal flight dynamics. This uncertainty shows up in the inversion equation (18) as

$$\frac{d^m Y_A}{dt^m} = \frac{d^m Y_A^{des}}{dt^m} - K_1(t)\frac{d^{m-1}E_A}{dt^{m-1}} - K_2(t)\frac{d^{m-2}E_A}{dt^{m-2}} - \cdots - K_m(t)\,E_A + \Delta(X, U) \tag{36}$$

where $\Delta(X, U)$ represents the lumped uncertainty in the system, which is a combination of the uncertainties $\Delta\bar{F}_A$ and $\Delta\bar{G}_A$. Therefore, there is a need for an online neural network that can learn to cancel this additive uncertainty effectively, so that the system dynamics remain very close to the nominal state dynamics even in the presence of inaccuracies. With the output of the online neural network, $\upsilon_{ADD}$, Eq. (36) becomes

$$\frac{d^m Y_A}{dt^m} = \frac{d^m Y_A^{des}}{dt^m} - K_1(t)\frac{d^{m-1}E_A}{dt^{m-1}} - K_2(t)\frac{d^{m-2}E_A}{dt^{m-2}} - \cdots - K_m(t)\,E_A + \Delta - \upsilon_{ADD} \tag{37}$$

Many neuroadaptive schemes can be found in the literature, employing a variety of neural network architectures. One interesting and efficient method of adaptive control used by NASA is model reference adaptive control [13-18,27-31,34], which employs a neural network to adapt to these uncertainties. The adaptive bias corrector (ABC) architecture [14,16,26] is a simple bias-updating neural-network-based adaptation. Developed at Wichita State University, it consists of a single bias neuron whose weight is updated online to compensate for any errors between the desired and actual responses of the system. The details of this adaptation strategy and its stability are discussed in [16,26] and will not be repeated here. The modeling error defined in Eq. (37) is assumed to be parameterizable by a neural network; in general, neural networks are capable of approximating continuous functions on compact (i.e., closed and bounded) sets. The output of the neural network is simply its bias weight; it is the adaptive signal, an estimate of the uncertainty, and approximately cancels $\Delta(X, U)$. The bias weight update rule is

$$\dot{\upsilon}_{ADD} = \dot{W}_{NN} = -\sigma\,W_{NN} + \eta\,E_{model} \tag{38}$$

where $W_{NN}$ is the bias weight of the adaptive neural network, $E_{model}$ is the modeling error, $\eta$ is the constant learning rate, and $\sigma > 0$ is the modification term added to the basic ABC adaptation law to keep the weight $W_{NN}$ bounded. The constant learning rate $\eta$ was chosen by a simulation study of forward velocity and pitch angle responses for typical failure scenarios, as described in [14]. For our study, $\eta$ was chosen to be 0.01 in the forward velocity adaptation loop and 0.15 in the pitch angle adaptation loop. To ensure the boundedness of the neural-network bias weight $W_{NN}$, the term $-\sigma W_{NN}$ is added; this is a common sigma modification made to the weight update rule to keep the weights bounded. Interested readers can find a detailed explanation in [35], where Yucelen et al. demonstrated that the weights remain bounded and that the modification terms improve the robustness and performance of an adaptively controlled system. For our study, a sigma value of $1 \times 10^{-4}$ was chosen. The modeling error is defined as the difference in accelerations, given by

$$E_{model} = \dot{X}^{des} - \dot{X}^{act} \tag{39}$$
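The single-bias-weight update of Eqs. (38) and (39) is small enough to sketch directly; the version below discretizes it with forward Euler. The gains mirror the values quoted in the text ($\eta = 0.01$, $\sigma = 10^{-4}$), while the decaying error signal is a made-up stand-in for a real modeling error.

```python
# Minimal sketch of the ABC (adaptive bias corrector) update of Eq. (38),
# discretized with forward Euler.  The error signal below is hypothetical.

def abc_step(w, e_model, eta=0.01, sigma=1e-4, dt=0.01):
    """Advance the single bias weight W_NN by one time step."""
    w_dot = -sigma * w + eta * e_model   # Eq. (38)
    return w + w_dot * dt

# Drive the corrector with a decaying modeling error (stand-in for Eq. (39)):
w = 0.0
for k in range(1000):
    e = 2.0 * (0.99 ** k)   # hypothetical E_model history
    w = abc_step(w, e)
```

The $-\sigma w$ leakage term is what keeps the weight from drifting when the modeling error has a persistent bias, at the cost of a small steady-state offset.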

Since we are interested in controlling the forward velocity and pitch angle components of the state vector, $E_{model} = \dot{U}_A^{des} - \dot{U}_A^{act}$ is used in the velocity control loop, and $E_{model} = \dot{Q}_A^{des} - \dot{Q}_A^{act}$ is used in the pitch angle control loop. Although the actual forward velocity and pitch accelerations are obtained directly from the measured output of the airplane system, the desired acceleration components are defined by Eq. (34), where the gains $k_u(t)$, $k_q(t)$, and $k_\theta(t)$ are the values calculated with the aid of the SNAC regulator, as discussed in Sec. III.C. The extended architecture for adaptive control of the 3-DOF longitudinal airplane is shown in Fig. 4. In the following section (Sec. V), a number of command-following scenarios are presented, with and without failure conditions. The cases were carefully selected after reviewing various flight-control papers [7,10,14,16,27-31,34] and represent the most common command-following scenarios and failure conditions. A state regulation case is also presented as part of the simulation results, to demonstrate that the SNAC-NDI controller works just as well as a SNAC controller for state regulation applications. Accordingly, three different failure scenarios are considered, and the results are included in Sec. V.B.2 to demonstrate the operation of the adaptive controller:
1) The first failure scenario is a 33% thrust reduction during forward velocity command following, which demonstrates adaptation to an engine failure.
2) The second failure scenario is a 50% elevator reduction during pitch angle command following, which demonstrates a loss in elevator deflection effectiveness.
3) The third failure scenario is an inaccuracy in the estimation of the parameter $C_{M\alpha}$ to about 50% of its actual value, which demonstrates a change in the center of gravity of the airplane.
To demonstrate the advantage of using the SNAC-NDI controller, its responses are compared with those of the traditional NDI controller. The same ABC adaptation is used for both the SNAC-NDI- and traditional-NDI-controlled systems.

V. Results of Simulation

The simulation was carried out on a representative small single-piston-engine general aviation aircraft: aircraft A in [32]. The aircraft parameters and aerodynamic coefficients defined in [32] for an initial forward velocity of 100 kt (168.9 ft/s) at 1000 ft are used. The trim values for the aircraft are tabulated in Table 1, and the normalization values used are shown in Table 2. The values of the weighting factors used in the SNAC optimal control cost function given by Eq. (13) are

$$Q_c = \begin{bmatrix} 2.5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2.5 \end{bmatrix}$$

and

$$R_c = \begin{bmatrix} 2.5 & 0 \\ 0 & 2.5 \end{bmatrix}$$

Table 1 Trim values

Parameter     Value      Unit
U_Atrim       168.8063   ft/s
W_Atrim       5.6241     ft/s
Q_Atrim       0          deg/s
θ_Atrim       1.907      deg
δ_e,trim      1          deg
T_Atrim       213.7816   lbf

Table 2 Normalization values

Parameter     Value      Unit
U_Amax        25         ft/s
W_Amax        4.17       ft/s
Q_Amax        0.154      deg/s
θ_Amax        1.49       deg
δ_e,max       0.831      deg
T_Amax        151.21     lbf

These values were chosen by trial and error so that the response of the aircraft with the SNAC controller could handle the delays and dynamics associated with the elevator actuator and the engine model. The critic neural network was implemented in MATLAB using the Neural Network Toolbox and consisted of four subnetworks, one for each component of the costate vector $\lambda_k$. Each subnetwork consisted of four neurons in the input layer (one for each component of the state vector), six neurons in the hidden layer, and one neuron in the output layer. A tangent-sigmoid transfer function was used in the input and hidden layers, and a linear transfer function was used in the output layer. In the cost functions used by the SNAC-NDI controller to calculate the controller gains, given by Eqs. (28) and (30), the scalars $r_1$ and $r_2$ are set equal to 10 and 1, respectively. This assigns a larger weight to minimizing the error between the responses generated by the SNAC-NDI and the SNAC (as a regulator) than to the transition of the SNAC-NDI gains from their previous values. A second-order elevator actuator model was used for simulation, with the dynamics defined by

$$\ddot{\delta}_e = -\omega_{n,\delta_e}^2\,\delta_e - 2\xi_{\delta_e}\,\omega_{n,\delta_e}\,\dot{\delta}_e + \omega_{n,\delta_e}^2\,\delta_{e,cmd} \tag{40}$$

where $\xi_{\delta_e}$ is the elevator servo damping ratio, equal to 0.74; $\omega_{n,\delta_e}$ is the elevator servo natural frequency, equal to 18.3 Hz; and $\delta_{e,cmd}$ denotes the commanded elevator deflection. A rate limiter block was used before feeding the elevator signal to the actuator, with the maximum slew rate of the elevator rate limiter set to 4 deg/s. A first-order engine model, with dynamics defined by Eq. (41), was used:

$$\dot{x}_{Engine} = \frac{1}{\tau_{Engine}}\,(1 - x_{Engine}) \tag{41}$$

where $\tau_{Engine}$ is the engine lag time constant, set equal to 0.25 s.

A. SNAC–NDI Controller for State Regulation

To validate the SNAC-NDI controller, the response of the system with the SNAC-NDI controller for state regulation is compared to the response generated with the SNAC regulator. We know that the SNAC regulator drives the states of the system in an optimal manner, assuming the critic network is very well trained. We can therefore compare the responses due to the SNAC-NDI and NDI controllers with the SNAC response to see how close the SNAC-NDI controller response is to the optimal. For state regulation, $y_1^{des}$, $y_2^{des}$, and all their higher-order derivatives are set to zero in the SNAC-NDI and NDI control expressions given, respectively, by Eqs. (26), (25), and (22). Also, since no failures are considered here, the adaptive neural network is turned off by setting the learning rate $\eta$ to zero. The trim values are given by $U_{Atrim} = 168.9$ ft/s, $W_{Atrim} = 5.6241$ ft/s, $Q_{Atrim} = 0$ deg/s, $\theta_{Atrim} = 1.9079$ deg, $\delta_{e,trim} = 1$ deg, and $T_{Atrim} = 213.78$ lbf.

The system is simulated with the off-trim initial conditions $U_{A0} = U_{Atrim} + 0.5\,U_{Amax} = 181.4$ ft/s, $W_{A0} = W_{Atrim} = 5.6241$ ft/s, $Q_{A0} = Q_{Atrim} = 0$ deg/s, and $\theta_{A0} = \theta_{Atrim} = 1.907$ deg, with $t_f = 45$ s. Also, the gains assigned to the NDI-controlled system are set equal to the initial gain values of the SNAC-NDI-controlled system,

$(k_u)_{NDI} = k_u(t{=}0)_{SNAC\text{-}NDI} = 0.2, \quad (k_q)_{NDI} = k_q(t{=}0)_{SNAC\text{-}NDI} = 1, \quad (k_\theta)_{NDI} = k_\theta(t{=}0)_{SNAC\text{-}NDI} = 2$

which were chosen to give the desired response dynamics of a 20 s settling time for the velocity response and a 10 s settling time for the pitch angle response. The time histories of the denormalized states, control variables, and gains are shown in Fig. 5. It can be seen from Fig. 5 that the SNAC-NDI gains quickly adjust so that the controller mimics the SNAC optimal state regulator very well. Further, the costs of the two controllers were compared using the cost function given by


Fig. 5 Comparison for state regulation: a) state trajectory, and b) control and gain trajectory.

$$J_{perf} = \frac{1}{2}\int_0^{t_f}\left(E_X^T Q_c E_X + E_U^T R_c E_U\right)dt \tag{42}$$

where

$$E_X = \left[\,U_A - U_A^{des} \quad W_A - W_A^{des} \quad Q_A - Q_A^{des} \quad \theta_A - \theta_A^{des}\,\right]^T, \qquad E_U = \left[\,\delta_e - \delta_e^{des} \quad T_A - T_A^{des}\,\right]^T$$

Table 3 Cost comparison as a state regulator

Initial condition                                                                      J_perf-SNAC    J_perf-SNAC-NDI    J_perf-NDI
U_A0 = U_Atrim + 0.5 U_Amax = 181.4 ft/s, W_A0 = W_Atrim = 5.6241 ft/s,
  Q_A0 = Q_Atrim = 0 deg/s, θ_A0 = θ_Atrim = 1.907 deg                                 2.443 × 10^4   2.52 × 10^4        7.073 × 10^4
U_A0 = U_Atrim − U_Amax = 143.9 ft/s, W_A0 = W_Atrim = 5.6241 ft/s,
  Q_A0 = Q_Atrim = 0 deg/s, θ_A0 = θ_Atrim = 1.907 deg                                 9.2 × 10^4     1 × 10^5           2.64 × 10^5
U_A0 = U_Atrim + 0.5 U_Amax = 181.4 ft/s, W_A0 = W_Atrim + 0.2 W_Amax = 6.4581 ft/s,
  Q_A0 = Q_Atrim + 0.2 Q_Amax = 0.03 deg/s, θ_A0 = θ_Atrim + 0.2 θ_Amax = 2.20 deg     2.383 × 10^4   2.392 × 10^4       5.895 × 10^4
U_A0 = U_Atrim + U_Amax = 193.81 ft/s, W_A0 = W_Atrim = 5.6241 ft/s,
  Q_A0 = Q_Atrim = 0 deg/s, θ_A0 = θ_Atrim = 1.907 deg                                 1 × 10^4       1.57 × 10^4        3.357 × 10^4


The costs were computed for various initial conditions and are tabulated in Table 3 to demonstrate that the SNAC-NDI controller performs just as well as the SNAC regulator. In the cost function given by Eq. (42), for a regulator, the desired values are equal to the trim values: $U_A^{des} = U_{Atrim}$, $W_A^{des} = W_{Atrim}$, $Q_A^{des} = Q_{Atrim}$, $\theta_A^{des} = \theta_{Atrim}$, $\delta_e^{des} = \delta_{e,trim}$, and $T_A^{des} = T_{Atrim}$.
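The performance cost of Eq. (42) is evaluated numerically along the simulated trajectories; the sketch below does this with a trapezoidal rule. The weighting matrices match the $Q_c$ and $R_c$ quoted above, while the short constant-error histories are illustrative stand-ins for real simulation output.

```python
import numpy as np

# Numerical evaluation of the performance cost of Eq. (42).

Qc = np.diag([2.5, 0.0, 0.0, 2.5])
Rc = np.diag([2.5, 2.5])

def j_perf(t, EX, EU):
    """t: (N,) times; EX: (N,4) state errors; EU: (N,2) control errors."""
    integrand = (np.einsum("ni,ij,nj->n", EX, Qc, EX)
                 + np.einsum("ni,ij,nj->n", EU, Rc, EU))
    dt = np.diff(t)
    # Trapezoidal rule, then the 1/2 factor from Eq. (42)
    return 0.5 * float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dt))

# Illustrative histories: a constant 1 ft/s velocity error, no control error
t = np.linspace(0.0, 45.0, 4501)
EX = np.tile([1.0, 0.0, 0.0, 0.0], (t.size, 1))
EU = np.zeros((t.size, 2))
cost = j_perf(t, EX, EU)   # 0.5 * 2.5 * 45 = 56.25 for this constant error
```

Because $Q_c$ weights only the first and fourth states, only the forward velocity and pitch angle errors contribute to the state part of the cost, exactly as in the comparisons of Table 3.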

The main advantage of the new hybrid technique is that it can readily be used for command following, and the target command sequence or trajectory need not be known a priori. Its performance is compared to that of the traditional NDI controller by computing the cost using the cost function defined by Eq. (42). The final (steady-state) values of the denormalized states $U_A$, $W_A$, $Q_A$, and $\theta_A$ and the controls $\delta_e$ and $T_A$ can be computed from the state equations (1) by setting $\dot{U}_A$, $\dot{W}_A$, $\dot{Q}_A$, and $\dot{\theta}_A$ to zero. The command-following ability of the newly developed technique is demonstrated next by considering two different scenarios.

1. Commanding a Forward Velocity Step


The airplane starts at the trim condition (i.e., normalized states $X_0 = [\,0\;0\;0\;0\,]^T$), and the final time is set to $t_f = 40$ s. The pilot commands a forward velocity step. The normalized command can be written as

$$Y^{des} = \left[\,y_1^{des} \quad y_2^{des}\,\right]^T = \begin{cases} [\,1 \quad 0\,]^T, & t \geq 1 \\ [\,0 \quad 0\,]^T, & \text{otherwise} \end{cases}$$

This is a 25 ft/s increase in forward velocity from the trim condition. Again, the responses generated with SNAC-NDI are compared to the NDI response, and the trajectories of the denormalized state and control variables are plotted in Fig. 6 along with the time history of the gains. The initial gain values chosen are

$(k_u)_{NDI} = k_u(t{=}0)_{SNAC\text{-}NDI} = 0.2, \quad (k_q)_{NDI} = k_q(t{=}0)_{SNAC\text{-}NDI} = 1, \quad (k_\theta)_{NDI} = k_\theta(t{=}0)_{SNAC\text{-}NDI} = 2$

To compare the performance of the two controllers, the cost function defined by Eq. (42) is used. The desired values are given by $U_A^{des} = 192.8063$ ft/s, $Q_A^{des} = 0$ deg/s, $\theta_A^{des} = 1.907$ deg, $\delta_e^{des} = 1.8$ deg, $T_A^{des} = 315$ lbf, and $W_A^{des} = 1.4$ ft/s. The cost function $J_{perf}$ is then evaluated for both the NDI- and SNAC-NDI-controlled airplanes: $J_{perf,SNAC\text{-}NDI} = 1.633 \times 10^5$ and $J_{perf,NDI} = 2.5 \times 10^5$. It can be seen from Fig. 6 that the NDI controller saturates in thrust, and the cost comparison shows that the SNAC-NDI controller performs better than the traditional NDI controller.
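The normalized step command and its denormalization can be sketched as below. The affine scaling $U_A = U_{Atrim} + x_1\,U_{Amax}$ is an assumption of this sketch (consistent with the off-trim initial conditions quoted earlier), with $U_{Amax} = 25$ ft/s taken from Table 2.

```python
# Sketch of the normalized forward-velocity step command and its
# denormalization; the affine scaling is assumed for illustration.

U_A_TRIM, U_A_MAX = 168.9, 25.0

def y1_des(t):
    """Normalized forward-velocity command: unit step at t = 1 s."""
    return 1.0 if t >= 1.0 else 0.0

def u_a_des(t):
    """Denormalized forward-velocity command in ft/s."""
    return U_A_TRIM + y1_des(t) * U_A_MAX
```

Before the step, the command equals the trim speed of 168.9 ft/s; after $t = 1$ s, it becomes 193.9 ft/s, the 25 ft/s increase described above.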

Fig. 6 Comparison with a forward velocity step command: a) state trajectory and b) control and gain trajectory.


2. Commanding a Step in Pitch Angle with Incorrect Initialization of Gains

It is important to note that an incorrect selection of gains can make the closed-loop system either marginally stable or completely unstable. This scenario is demonstrated here by deliberately choosing NDI gains that make the system marginally stable. The same initial values are chosen for both the SNAC-NDI and NDI control design methods. As expected, the system driven by the NDI controller displays a marginally stable response; however, with SNAC-NDI, the stable SNAC controller alters the gains so as to stabilize the closed-loop system. This demonstrates a major advantage of the SNAC-NDI controller over a traditional NDI controller. The airplane starts at the trim condition (normalized states $X_0 = [\,0\;0\;0\;0\,]^T$), with $t_f = 45$ s. The pilot commands a negative unit step in the pitch angle; the normalized command can be written as


$$Y^{des} = \left[\,y_1^{des} \quad y_2^{des}\,\right]^T = \begin{cases} [\,0 \quad -1\,]^T, & t \geq 1 \\ [\,0 \quad 0\,]^T, & \text{otherwise} \end{cases}$$

This is a decrease in pitch angle of 1.5 deg from its trim value. Again, the responses generated with SNAC-NDI are compared with the NDI response, and the trajectories of the actual (denormalized) state and control variables $U_A$, $W_A$, $Q_A$, $\theta_A$, $\delta_e$, and $T_A$ are plotted in Fig. 7 along with the time histories of the gains $k_u(t)$, $k_q(t)$, and $k_\theta(t)$. The initial marginally stable values are given by $(k_u)_{NDI} = k_u(t{=}0)_{SNAC\text{-}NDI} = 0.13$, $(k_q)_{NDI} = k_q(t{=}0)_{SNAC\text{-}NDI} = 5$, and $(k_\theta)_{NDI} = k_\theta(t{=}0)_{SNAC\text{-}NDI} = 8$. Further, to compare the performance of the two controllers, the cost function given by Eq. (42) is used. The desired values are given by $U_A^{des} = 168.8063$ ft/s, $Q_A^{des} = 0$ deg/s, $\theta_A^{des} = 0.4$ deg, $\delta_e^{des} = 1$ deg, $T_A^{des} = 145.7$ lbf, and $W_A^{des} = 5.67$ ft/s. The cost function $J_{perf}$ is then evaluated for both the NDI- and SNAC-aided-NDI-controlled airplanes. It is seen that $J_{perf,SNAC\text{-}NDI} = 1.654 \times 10^4$ and $J_{perf,NDI} = 4.809 \times 10^4$, with $J_{perf,NDI} \to \infty$ as $t \to \infty$.

C. SNAC–NDI Controller for Command Following in the Presence of Various Failure Conditions

The online neural network in Fig. 4 compensates for deviations of the system from its nominal state in the case of online failures, control failures, and/or parameter estimation inaccuracies. The adaptation of the adaptive neural network is now turned on. The learning rates for the adaptive neural networks were chosen to match the values presented in [14,16]; accordingly, $\eta$ is set to 0.01 in the forward velocity loop and 0.13 in the pitch angle loop. To demonstrate the effectiveness of the newly introduced technique in adapting online, we consider three different failure scenarios and present the simulation results in this section. To compare the performance of the SNAC-NDI and NDI controllers, the same adaptation scheme is used in the traditional-NDI-controlled adaptive system, and trajectories are plotted for all the failure cases described next.

Fig. 7 Trajectory comparison with an incorrect gain initialization: a) state and b) control and gain.

800

LAKSHMIKANTH ET AL.

1. Engine Failure While Following a Forward Velocity Step Command

The airplane starts at the trim condition. A forward-velocity step increase of 16.8 ft/s (10 kt) is commanded at t = 1 s. An engine failure is simulated by setting the actual thrust to 66% of the commanded thrust for t > 5 s. The state, control, gain, and adaptation trajectories generated by the SNAC-NDI and NDI controllers are compared in Figs. 8 and 9. The plots show that both controllers efficiently follow the pilot command with just 66% of the commanded thrust available; again, the control effort of the NDI controller exceeds that of the SNAC-NDI controller. To compare the two control schemes with the cost function defined in Eq. (42), the steady-state values are obtained from the system equations in Eq. (1) with U_A^des = 184.6963 ft/s, Q_A^des = 0 deg/s, and θ_A^des = 1.907 deg, assuming no failures; solving the resulting set of simultaneous equations gives δe^des = 1.547 deg, T_A^des = 284.6 lbf, and W_A^des = 2.71 ft/s. Substituting these values in Eq. (42) yields total costs of (J_perf)_SNAC-NDI = 6.468 × 10^4 and (J_perf)_NDI = 1.313 × 10^5. Both Figs. 8 and 9 and the cost comparison clearly show that the SNAC-NDI response is near optimal, even in the presence of failures, whereas the NDI controller response is suboptimal.
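The failure injection used in these scenarios can be reproduced with a simple effectiveness scaling between the controller output and the plant input. The sketch below is an assumed implementation (function and parameter names are illustrative, not from the paper): the engine failure corresponds to thrust_eff = 0.66 after t = 5 s, and the elevator failure of the next scenario to elev_eff = 0.50 after t = 3 s.

```python
def apply_failures(t, thrust_cmd, elevator_cmd,
                   thrust_eff=1.0, t_thrust_fail=float("inf"),
                   elev_eff=1.0, t_elev_fail=float("inf")):
    """Map commanded controls to actual plant inputs. After t_thrust_fail
    the engine delivers only thrust_eff of the commanded thrust; after
    t_elev_fail the elevator delivers only elev_eff of the commanded
    deflection. With the defaults, commands pass through unchanged."""
    thrust_act = thrust_cmd * (thrust_eff if t > t_thrust_fail else 1.0)
    elev_act = elevator_cmd * (elev_eff if t > t_elev_fail else 1.0)
    return thrust_act, elev_act
```

Only one failure is enabled per scenario; the adaptive network sees the resulting discrepancy between commanded and achieved response and compensates online.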

Downloaded by INDIAN INSTITUTE OF SCIENCE on December 18, 2014 | http://arc.aiaa.org | DOI: 10.2514/1.I010165

2. Elevator Failure While Following a Step Command in the Pitching Angle θ

The airplane starts at the trim condition. The pilot commands a negative pitch-angle step of −1.5 deg (−0.026 rad) at t = 1 s. An elevator failure is simulated by setting the actual elevator deflection to 50% of the commanded deflection for t > 3 s, with tf = 20 s. The state, control, gain, and adaptation trajectories generated by the SNAC-NDI and NDI controllers are compared in Figs. 10 and 11. The NDI and SNAC-NDI responses are also compared with the corresponding trajectories generated without an elevator failure, showing that the deviation of the SNAC-NDI response from its failure-free counterpart is minimal compared with that of the NDI controller. The SNAC-NDI controller also takes very little time to adapt to the failure and return to its near-optimal response. The oscillation in the SNAC-NDI elevator response reflects the controller's effort to adapt to the failure while simultaneously optimizing the system response. Since the commanded signal is the same as that presented in scenario 2 of Sec. V.B, the steady-state values used in the cost function defined in Eq. (42) remain the same. The total costs obtained are (J_perf)_SNAC-NDI = 1.541 × 10^4, whereas (J_perf)_NDI = 2.638 × 10^4, demonstrating the

Fig. 8 Trajectory comparison, adapting to engine failure: a) state and b) control and gain.



Fig. 9 Adaptation trajectory comparison, adapting to engine failure.

Fig. 10 Trajectory comparison, adapting to elevator failure: a) state and b) control and gain.



Fig. 11 Adaptation trajectory, adapting to elevator failure.

Fig. 12 Trajectory comparison, adapting to an inaccuracy in the estimation of CMα: a) state and b) control and gain.



Fig. 13 Adaptation trajectory, adapting to an inaccuracy in the estimation of parameter CMα.

efficiency of the SNAC-NDI controller. The plots also show that the SNAC-NDI controller efficiently follows the pilot command with just 50% of the available elevator authority.

3. Inaccuracy in the Estimation of Parameter CMα

An inaccuracy in the estimation of CMα, a very important airplane parameter, can cause flight instability. The adaptation capability of the SNAC-aided NDI controller is therefore put to a good test when the pilot commands a θ step in the presence of this inaccuracy. The airplane starts at the trim condition. The pilot commands a pitch-angle step of −1.5 deg (−0.026 rad) at t = 1 s. The estimated CMα is chosen to be 50% of the actual CMα. The state, control, gain, and adaptation trajectories generated by the SNAC-NDI and NDI controllers are compared in Figs. 12 and 13. To compare the two control schemes, the cost function defined in Eq. (42) was evaluated with the same steady-state values as those listed in scenario 2 of Sec. V.B. The total costs are (J_perf)_SNAC-NDI = 1.277 × 10^4 and (J_perf)_NDI = 1.656 × 10^4, demonstrating the efficiency of the SNAC-NDI controller. The plots also show that the SNAC-NDI controller efficiently follows the pilot command, even with an inaccurate estimate of CMα.
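The parameter-inaccuracy case amounts to running the inversion model with a scaled pitch-stiffness derivative while the simulated plant keeps the true value; the difference is the model error the adaptive network must cancel. A sketch of this assumed setup (function and variable names are illustrative, and only the CMα term of the moment build-up is shown):

```python
def pitch_moment_mismatch(alpha, cma_actual, estimate_scale=0.5):
    """Contribution of C_M_alpha to the pitching-moment coefficient as
    seen by the plant vs. by the inversion model; their difference is the
    residual the online network must absorb. estimate_scale = 0.5
    reproduces the 50% estimation error studied in this scenario."""
    cm_plant = cma_actual * alpha                     # true aerodynamic contribution
    cm_model = (estimate_scale * cma_actual) * alpha  # what the NDI model predicts
    return cm_plant - cm_model                        # unmodeled residual
```

With an accurate estimate (estimate_scale = 1) the residual vanishes and the NDI inversion is exact; otherwise the residual grows with the angle of attack, which is why the θ-step maneuver exercises the adaptation.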

VI. Conclusions

A hybrid, nonlinear flight-control system using the technologies of a single network adaptive critic and nonlinear dynamic inversion has been presented, and its performance has been compared to that of a traditional nonlinear dynamic inversion controller for various command-following and state-regulation applications. The technique employs a stable, optimal nonlinear system as a reference model to compute the NDI feedback gains. It also connects the formulation of optimal control theory with the ease and closed-form expression of NDI, thus retaining the advantages of both techniques. The ability to stabilize the closed-loop system using SNAC-NDI plays a very important role when adapting to system failures or parameter inaccuracies, and it aids in the removal of the unwanted dynamics usually seen in systems using NDI control alone. The extended SNAC-NDI control-system architecture adapts very well to the most common failure conditions and performs better than the traditional NDI controller. Simulation results, along with a cost-based comparison of the two techniques, have demonstrated these advantages in longitudinal flight-control applications.

Appendix: Lyapunov Stability Analysis of SNAC-NDI Controlled System

A general nonlinear system with a well-defined relative degree m driven by a SNAC-NDI controller is a nonautonomous system, as the gains K1(t), K2(t), ..., Km(t) are time varying [33]. It is therefore important to derive stability criteria that guarantee closed-loop stability by accounting for the nonautonomous nature of the closed-loop system. At every time cycle, before control equation (26) is updated with new values of the gains K1(t), K2(t), ..., Km(t), the gains are checked to verify that the resulting characteristic polynomial is Hurwitz, i.e., that all of its roots have negative real parts. This is the first criterion that the gains have to satisfy [33]. The role of the scalar weight r2 in the cost function given by Eqs. (28) and (30) is not just to make the resulting control signal smooth and continuous but also to ensure that the variation of the gain from its previous value is small and bounded. The following Lyapunov analysis proves that keeping the variation of the gains small and bounded is essential to guarantee global exponential stability (GES) of the system. The system is assumed to be internally stable. The SNAC-NDI control design is also based on feedback linearization and therefore transforms a nonlinear system into a linearlike form by state feedback. If m is the relative degree of each component of the output vector Y, then the mth derivative of Y is given by

$$\frac{d^m Y}{dt^m} = F_y(X) + G_y(X)U \tag{A1}$$

On substituting SNAC-NDI control expression (26) in Eq. (A1), we get

$$\frac{d^m E}{dt^m} = -K_1(t)\frac{d^{m-1}E}{dt^{m-1}} - K_2(t)\frac{d^{m-2}E}{dt^{m-2}} - \cdots - K_m(t)E \tag{A2}$$


where E = Y − Y^des. Clearly, Eq. (A2) is a linear time-varying (LTV) system. To arrive at the state-space form, define Z_i = d^{i−1}(Y − Y^des)/dt^{i−1} for i = 1, 2, ..., m and Z = [Z_1 Z_2 ··· Z_m]^T. Equation (A2) can now be written in the state-space form

$$\begin{bmatrix} \dot{Z}_1 \\ \dot{Z}_2 \\ \vdots \\ \dot{Z}_m \end{bmatrix} = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ -K_m(t) & -K_{m-1}(t) & -K_{m-2}(t) & \cdots & -K_1(t) \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_m \end{bmatrix} \tag{A3}$$

where I is an identity matrix of the same order as Y, namely v. Equation (A3) can be written in the form of a generalized LTV system:


$$\dot{Z} = K_Z(t)Z \tag{A4}$$

where K_Z(t) is the square matrix of order mv shown in Eq. (A3). The matrix K_Z(t) can be written as K_Z(t) = K_Z(t_0) + K̃_Z(t), where K_Z(t_0) is the constant matrix given by K_Z(t_0) = K_Z(t = t_0) and K̃_Z(t) is the deviation of K_Z(t) from its initial value K_Z(t_0). Rewriting Eq. (A4),

$$\dot{Z} = [K_Z(t_0) + \tilde{K}_Z(t)]Z \tag{A5}$$

Let us consider a quadratic Lyapunov function candidate

$$V_Z = Z^T P_Z Z \tag{A6}$$

where P_Z is a symmetric positive definite matrix, which is the solution of the Lyapunov equation [33]

$$P_Z K_Z(t_0) + K_Z(t_0)^T P_Z = -I_Z \tag{A7}$$

where I_Z is an identity matrix of order mv. Differentiating Eq. (A6) and substituting Eq. (A5), we get

$$\dot{V}_Z = 2Z^T P_Z [K_Z(t_0) + \tilde{K}_Z(t)]Z = 2Z^T P_Z K_Z(t_0)Z + 2Z^T P_Z \tilde{K}_Z(t)Z = 2Z^T \left[\tfrac{1}{2}P_Z K_Z(t_0) + \tfrac{1}{2}K_Z(t_0)^T P_Z\right]Z + 2Z^T P_Z \tilde{K}_Z(t)Z \tag{A8}$$

Substituting Eq. (A7) in Eq. (A8),

$$\dot{V}_Z = -Z^T I_Z Z + 2Z^T P_Z \tilde{K}_Z(t)Z = -\|Z\|^2 + 2Z^T \left[\tfrac{1}{2}P_Z \tilde{K}_Z(t) + \tfrac{1}{2}\tilde{K}_Z(t)^T P_Z\right]Z \tag{A9}$$

where ‖·‖ indicates the two-norm. According to the boundedness theorem in [33], if V_Z is a scalar with V_Z(Z, t) ≥ 0 for all Z and all t ≥ 0, and V̇_Z(Z, t) ≤ 0 for all Z and all t ≥ 0, then V_Z is bounded. Accordingly, from Eq. (A9), the condition for V_Z to be bounded is that the eigenvalues of P_Z K̃_Z(t) + K̃_Z(t)^T P_Z be less than one. This implies that keeping the deviation of the gain values small and bounded is important for the GES of the tracking error Z. Furthermore, it has been shown in [A1] that, if δ_K > 0 is sufficiently small, then satisfying either of the following conditions guarantees exponential stability of Eq. (A5):

$$\|\dot{K}_Z(t)\| \le \delta_K \quad \text{for all } t \ge 0$$

or

$$\|K_Z(t_2) - K_Z(t_1)\| \le \delta_K |t_2 - t_1| \quad \text{for all } t_1, t_2 \ge 0$$

The preceding Lyapunov-based stability analysis shows that keeping the deviation of the gain values of a SNAC-NDI controller small and bounded ensures GES of the tracking error.

[A1] Ilchmann, A., Owens, D. H., and Prätzel-Wolters, D., "Sufficient Conditions for Stability of Linear Time-Varying Systems," Systems and Control Letters, Vol. 9, No. 2, 1987, pp. 157–163. doi:10.1016/0167-6911(87)90022-3
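The gain-admissibility check and the Lyapunov construction of the Appendix can be sketched numerically. The code below is an illustrative sketch, not the authors' implementation: `gains` holds the diagonal entries k1, ..., km of the NDI gains for one output channel, and Eq. (A7) is solved by a Kronecker-product vectorization rather than a library Lyapunov solver.

```python
import numpy as np

def gains_are_hurwitz(gains):
    """First admissibility criterion: for one output channel, the error
    dynamics (A2) have characteristic polynomial s^m + k1 s^(m-1) + ... + km;
    accept the gains only if all of its roots have negative real parts."""
    coeffs = np.concatenate(([1.0], np.asarray(gains, dtype=float)))
    return bool(np.all(np.roots(coeffs).real < 0.0))

def build_KZ(gain_blocks, v):
    """Assemble the (mv x mv) block-companion matrix K_Z of Eq. (A3)
    from the v x v gain matrices [K1, ..., Km]."""
    m = len(gain_blocks)
    KZ = np.zeros((m * v, m * v))
    for i in range(m - 1):                         # superdiagonal identity blocks
        KZ[i * v:(i + 1) * v, (i + 1) * v:(i + 2) * v] = np.eye(v)
    for j, K in enumerate(reversed(gain_blocks)):  # last block row: [-Km, ..., -K1]
        KZ[(m - 1) * v:, j * v:(j + 1) * v] = -np.asarray(K, dtype=float)
    return KZ

def lyapunov_P(KZ0):
    """Solve Eq. (A7), P K_Z(t0) + K_Z(t0)^T P = -I, via column-major
    vectorization: (I kron K^T + K^T kron I) vec(P) = -vec(I)."""
    n = KZ0.shape[0]
    A = np.kron(np.eye(n), KZ0.T) + np.kron(KZ0.T, np.eye(n))
    P = np.linalg.solve(A, -np.eye(n).ravel(order="F")).reshape((n, n), order="F")
    return 0.5 * (P + P.T)                         # symmetrize against round-off

def deviation_bounded(P, KZ_tilde):
    """Condition inferred from Eq. (A9): Vdot_Z stays negative while the
    eigenvalues of P K~_Z + K~_Z^T P remain below one."""
    M = P @ KZ_tilde + KZ_tilde.T @ P              # symmetric when P is symmetric
    return bool(np.max(np.linalg.eigvalsh(M)) < 1.0)
```

For example, with m = 2, v = 1, K1 = 3, and K2 = 2, the characteristic polynomial s² + 3s + 2 is Hurwitz, K_Z = [[0, 1], [−2, −3]], and the resulting P_Z certifies whether a given gain deviation K̃_Z keeps V̇_Z negative.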


Acknowledgments

This work was supported by NASA under award number NNX09AP20A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NASA.


References

[1] Werbos, P. J., "Neuro Control and Supervised Learning: An Overview and Evaluation," Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, edited by White, D., and Sofge, D., Van Nostrand Reinhold, New York, 1992.
[2] Bryson, A. E., and Ho, Y. C., Applied Optimal Control, Taylor and Francis, Washington, D.C., 1975.
[3] Bellman, R. E., Dynamic Programming, Princeton Univ. Press, Princeton, NJ, 1957.
[4] Fausett, L., Fundamentals of Neural Networks: Architectures, Algorithms and Applications, Prentice–Hall, Upper Saddle River, NJ, 1994.
[5] Ferrari, S., and Stengel, R. F., "Online Adaptive Critic Flight Control," Journal of Guidance, Control, and Dynamics, Vol. 27, No. 5, 2004, pp. 777–786. doi:10.2514/1.12597
[6] Ferrari, S., and Stengel, R. F., "Classical/Neural Synthesis of Nonlinear Control Systems," Journal of Guidance, Control, and Dynamics, Vol. 25, No. 3, 2002, pp. 442–448. doi:10.2514/2.4929
[7] Unnikrishnan, N., and Balakrishnan, S. N., "Dynamic Reoptimization of a Missile Autopilot Controller in Presence of Unmodeled Dynamics," AIAA Guidance, Navigation and Control Conference, AIAA Paper 2005-6387, Aug. 2005.
[8] Padhi, R., Unnikrishnan, N., Wang, X., and Balakrishnan, S. N., "A Single Network Adaptive Critic (SNAC) Architecture for Optimal Control Synthesis for a Class of Nonlinear Systems," Neural Networks, Vol. 19, No. 10, 2006, pp. 1648–1660. doi:10.1016/j.neunet.2006.08.010
[9] Padhi, R., "Optimal Control of Distributed Parameter Systems Using Adaptive Critic Neural Networks," Ph.D. Dissertation, Aerospace Engineering Dept., Univ. of Missouri–Rolla, Rolla, MO, 2001.
[10] Ding, J., and Balakrishnan, S. N., "An Online Nonlinear Optimal Controller Synthesis for Aircraft with Model Uncertainties," AIAA Guidance, Navigation and Control Conference, AIAA Paper 2010-7738, Aug. 2010.
[11] Janardhan, V., Schmitz, D., and Balakrishnan, S. N., "Development and Implementation of New Nonlinear Control Concepts for a UA," Proceedings of the 23rd Digital Avionics Systems Conference, Vol. 2, IEEE, Piscataway, NJ, Oct. 2004, pp. 12.E.5–121-10.
[12] Lakshmikanth, G. S., Padhi, R., Watkins, J. M., and Steck, J. E., "Single Network Adaptive Critic Aided Nonlinear Dynamic Inversion for Suboptimal Command Tracking," Proceedings of the IEEE Multi Systems Conference, IEEE, Piscataway, NJ, Sept. 2011, pp. 1347–1352.
[13] Fisher, J. R., "Aircraft Control Using Nonlinear Dynamic Inversion in Conjunction with Adaptive Robust Control," M.S. Thesis, Mechanical Engineering Dept., Texas A&M Univ., College Station, TX, 2004.
[14] Lemon, K., Steck, J., Hinson, B., Nguyen, N., and Kimball, D., "Model Reference Adaptive Flight Control Adapted for General Aviation: Controller Gain Simulation and Preliminary Flight Testing on a Bonanza Fly-By-Wire Testbed," AIAA Guidance, Navigation, and Control Conference, AIAA Paper 2010-8278, Aug. 2010.
[15] Enns, D., Bugajski, D., Hendrick, R., and Stein, G., "Dynamic Inversion: An Evolving Methodology for Flight Control Design," International Journal of Control, Vol. 59, No. 1, 1994, pp. 71–91. doi:10.1080/00207179408923070
[16] Reed, S., "Demonstration of the Optimal Control Modification for General Aviation: Design and Simulation," M.S. Thesis, Dept. of Aerospace Engineering, Wichita State Univ., Wichita, KS, 2010.
[17] Shin, Y., "Neural Network Based Adaptive Control for Nonlinear Dynamic Regimes," Ph.D. Dissertation, Mechanical Engineering Dept., Georgia Inst. of Technology, Atlanta, 2005.
[18] Calise, A. J., Yucelen, T., Muse, J. A., and Yung, B.-J., "A Loop Recovery Method for Adaptive Control," AIAA Guidance, Navigation and Control Conference, AIAA Paper 2009-5967, Aug. 2009.
[19] Brinker, J., and Wise, K., "Stability and Flying Qualities Robustness of a Dynamic Inversion Aircraft Control Law," Journal of Guidance, Control, and Dynamics, Vol. 19, No. 6, Nov.–Dec. 1996, pp. 1270–1277. doi:10.2514/3.21782
[20] McFarland, M., and Calise, A., "Robust Adaptive Control of Uncertain Nonlinear Systems Using Neural Networks," Proceedings of the American Control Conference, Vol. 3, IEEE, Piscataway, NJ, July 1997, pp. 1996–2000. doi:10.1109/ACC.1997.611038
[21] McFarland, M., and Calise, A., "Adaptive Nonlinear Control of Agile Antiair Missiles Using Neural Networks," IEEE Transactions on Control Systems Technology, Vol. 8, No. 5, Sept. 2000, pp. 749–756. doi:10.1109/87.865848
[22] Rysdyk, R., Nardi, F., and Calise, A., "Robust Adaptive Nonlinear Flight Control Applications Using Neural Networks," Proceedings of the American Control Conference, Vol. 4, IEEE, Piscataway, NJ, 1999, pp. 2595–2599. doi:10.1109/ACC.1999.786533
[23] Rysdyk, R., "Adaptive Nonlinear Flight Control," Ph.D. Thesis, Georgia Inst. of Technology, Atlanta, Nov. 1998.
[24] Rysdyk, R., Calise, A., and Chen, R., "Nonlinear Adaptive Control of Tiltrotor Aircraft Using Neural Networks," Proceedings of the IEEE International Conference on Control Applications, Vol. 2, IEEE, Piscataway, NJ, 1998, pp. 980–984.
[25] Rysdyk, R., and Agarwal, R., "Nonlinear Adaptive Flight Path and Speed Control Using Energy Principles," AIAA Paper 2002-4440, Aug. 2002.
[26] Steck, J. E., Rokhsaz, K., Pesonen, U. J., Bruner, S., and Duerksen, N., "Simulation and Flight Test Assessment of Safety Benefits and Certification Aspects of Advanced Flight Control Systems," Federal Aviation Administration, Final Rept. DOT/FAA/AR-03/51, Aug. 2003.
[27] Chowdhary, G., Johnson, E. N., Chandramohan, R., Kimbrell, M. S., and Calise, A., "Guidance and Control of Airplanes Under Actuator Failures and Severe Structural Damage," Journal of Guidance, Control, and Dynamics, Vol. 36, No. 4, July–Aug. 2013, pp. 1093–1104. doi:10.2514/1.58028
[28] Bellahdid, L., Lakshmikanth, G. S., Chakravarthy, A., and Steck, J. E., "Single Network Adaptive Critic (SNAC) Design for a Morphing Aircraft," AIAA Guidance, Navigation and Control Conference, AIAA Paper 2012-4614, Aug. 2012. doi:10.2514/6.2012-4614
[29] Chamseddine, A., Zhang, Y., Rabbath, C. A., Fulford, C., and Apkarian, J., "Model Reference Adaptive Fault Tolerant Control of a Quadrotor UAV," AIAA Infotech @ Aerospace 2011 Conference, AIAA Paper 2011-1606, 2011. doi:10.2514/6.2011-1606
[30] Nguyen, N. T., "Bi-Objective Optimal Control Modification Adaptive Law for Unmatched Uncertain Systems," AIAA Guidance, Navigation and Control Conference, AIAA Paper 2013-4613, 2013. doi:10.2514/6.2013-4613
[31] Bhattacharyya, S., Krishna Kumar, K., and Nguyen, N. T., "Adaptive Autopilot Designs for Improved Tracking and Robustness," AIAA Infotech @ Aerospace 2012 Conference, AIAA Paper 2012-2494, 2012. doi:10.2514/6.2012-2494
[32] Roskam, J., Airplane Flight Dynamics and Automatic Flight Controls, Part I, DARcorporation, Lawrence, KS, 2007.
[33] Slotine, J.-J. E., and Li, W., Applied Nonlinear Control, Prentice–Hall, Upper Saddle River, NJ, 1991.


[34] Lavretsky, E., and Wise, K. A., Robust and Adaptive Control with Aerospace Applications, Springer, London, 2013. doi:10.1007/978-1-4471-4396-3
[35] Yucelen, T., and Calise, A. J., "Derivative-Free Model Reference Adaptive Control," Journal of Guidance, Control, and Dynamics, Vol. 34, No. 4, July–Aug. 2011, pp. 933–950. doi:10.2514/1.53234


J. How, Associate Editor