Reinforcement Learning Controller for Variable-speed Wind Energy ...

Proceedings of the 33rd Chinese Control Conference July 28-30, 2014, Nanjing, China

Reinforcement Learning Controller for Variable-speed Wind Energy Conversion Systems

Meng Wenchao1, Yang Qinmin1, Youxian Sun1

1. State Key Laboratory of Industrial Control Technology, Department of Control Science and Engineering, Zhejiang University, Hangzhou Zhejiang 310027, P. R. China
E-mail: [email protected]

Abstract: In this paper, a reinforcement learning based adaptive critic controller is proposed for the power capture control of variable-speed wind energy conversion systems (WECSs). The control objective is to optimize the power capture from wind by tracking the maximum power curve while minimizing a predefined long-term cost function at the same time. By minimizing the long-term cost function, both the power capture and the lifetime of the mechanical parts of a wind turbine are considered, in contrast to most of the existing literature. The developed controller consists of an action network and a critic network. The critic network is introduced to evaluate the performance of the action network and to learn the cost-to-go function in an online manner. The estimate of the cost-to-go function is then transmitted to the action network. The action network provides the optimal generator torque rate with the help of the estimate of the cost-to-go function. Here, a two-layer neural network structure is employed for both the action and critic networks. Finally, the performance of the proposed controller is evaluated on a 1.5 MW three-blade wind turbine in a simulation environment.

Key Words: Wind energy conversion systems, adaptive control, nonlinear uncertain systems, reinforcement learning.

1 Introduction

Renewable energy has attracted increasing interest from both academic and industrial communities because of growing concerns about environmental pollution and a possible energy shortage [1–3]. Among the renewable energies, wind power plays a major part in the world power market. For example, in the past decades, wind capacity has increased at rates of up to thirty percent per year in the United States and Europe, and this trend will endure for some time [4]. Although wind capacity is growing fast, wind currently still provides a very small share of the world's electricity consumption. The main factors holding back the penetration of wind power into the electricity market are its power quality and high cost [5–7]. Therefore, advanced control algorithms are considered a promising way to improve its power quality and reduce its cost at the same time.

Many control strategies have been studied and applied to wind energy conversion systems. For example, classical control strategies such as the PID regulator [8] and optimal control in the LQ and LQG form [9, 10] were investigated for maximizing power extraction. Although these control algorithms perform well in simulation, they rely on an accurate mathematical model of the WECS. Since wind energy conversion systems possess a number of uncertain factors, such as meteorological conditions (wind speed, wind direction, etc.) and a continuously varying system load, it is infeasible to obtain an accurate mathematical model [11]. In [12], by using the linear matrix inequality (LMI) technique combined with a predictive approach, the power control problem was converted to a convex optimization problem subject to various constraints. It should be noted that all these methods are based on linearization techniques that are valid only around operating points.
However, wind energy conversion systems exhibit highly nonlinear characteristics. Therefore, such linear approaches may result in poor performance and low reliability, since the operating point often changes with time due to the stochastic operating conditions and inherent system uncertainties.

A great deal of effort has been devoted to avoiding the negative effects introduced by the aforementioned linear methods. In [13], by using a neural network to approximate the unknown nonlinear dynamics, an adaptive NN torque controller was presented for power capture control. In addition, a high-order robust sliding controller was investigated in [11] by viewing the nonlinearities as a bounded disturbance, and a satisfactory power regulation performance was reported. More recently, a hierarchical control structure was developed in [14], which possesses a plug-and-play property suitable for distributed systems.

It should be noted that most existing controllers for the low speed region focus only on capturing power from the wind, while other factors such as mechanical load are often neglected for convenience of analysis. In fact, when both power capture and mechanical load are considered in a unified framework, the problem becomes an optimal control problem rather than a tracking problem. Although this optimal problem can be addressed by using nonlinear model predictive control (NMPC) [15], an accurate system model is usually required for finding the optimal control input. Therefore, in this study, we are concerned with the optimal control of variable-speed wind energy conversion systems in terms of optimizing the power capture from wind by tracking the maximum power curve and minimizing a predefined long-term cost function at the same time. Compared with most works focused on the low speed region, we consider both the power capture and the lifetime of the mechanical parts of the wind turbine with the help of a long-term cost function. The developed controller has two entities, i.e., a critic network and an action network.

This work is supported by National Natural Science Foundation (NNSF) of China under Grant 61104008, National High Technology Research and Development Program of China (863) under Grant 2012AA062201.
The action network is used to provide the control input, while the critic network is utilized to evaluate the performance [16]. Both the critic network and the action network are represented using a two-layer neural network. Furthermore, no accurate system model is needed for solving the optimal problem, as opposed to model-based methods. Finally, the tracking performance is verified on a 1.5 MW wind turbine.

We organize the rest of this paper as follows. Section 2 gives a typical model of variable-speed wind turbines. In Section 3, the problem formulation is presented, followed by the detailed reinforcement learning controller. Section 4 verifies the performance of the proposed controller by simulation on a 1.5 MW three-bladed wind turbine. Finally, Section 5 concludes this paper.

2 Wind Turbine Model

The variable-speed wind energy conversion system can be modeled as several interconnected subsystems such as the rotor, drive train and generator subsystems [4]. The rotor is designed to capture energy from wind and convert it to mechanical power, which is finally transformed into electrical power by the generator. In addition, the main task of the gearbox is to couple the rotor shaft and the generator shaft, because the optimal rotor and generator speed ranges differ. The wind power captured by the rotor is given by

$$P_a = \frac{1}{2}\rho\pi R^2 C_p(\lambda,\beta)\,v^3 \tag{1}$$

where $C_p(\lambda,\beta)$ stands for the wind turbine power conversion efficiency, which is a function of the tip speed ratio $\lambda$ and the blade pitch angle $\beta$ [17]. $\rho$, $v$, $R$ are the air density, wind speed and rotor radius, respectively. Moreover, the calculation of $C_p(\lambda,\beta)$ requires blade element theory and aerodynamic knowledge, and the computations are very complicated [18]. Thus, without loss of generality, the following numerical approximation is utilized, which can be commonly found in the WECS control literature [19] [20]:

$$C_p(\lambda,\beta) = 0.22\left(\frac{116}{m} - 0.4\beta - 5\right)e^{-12.5/m} \tag{2}$$

where

$$\frac{1}{m} = \frac{1}{\lambda + 0.08\beta} - \frac{0.035}{\beta^3 + 1} \tag{3}$$

Fig. 1: Typical variations of Cp for a wind turbine (axes: tip speed ratio λ, pitch angle β (deg), Cp(λ,β))

The tip-speed ratio $\lambda$ is defined as

$$\lambda = \frac{R\omega_r}{v} \tag{4}$$

where $\omega_r$ is the rotor angular speed. Notice that the rotor power $P_a$ can be rephrased as

$$P_a = \omega_r T_a \tag{5}$$

where $T_a$ represents the aerodynamic torque. Moreover,

$$C_q(\lambda,\beta) = \frac{C_p(\lambda,\beta)}{\lambda} \tag{6}$$

It thus follows that

$$T_a = \frac{1}{2}\rho\pi R^3 C_q(\lambda,\beta)\,v^2 \tag{7}$$

Fig. 2 shows the drive train scheme [13]. The aerodynamic torque $T_a$ drives the rotor at speed $\omega_r$, which is also braked by the low-speed shaft torque $T_{ls}$. On the other hand, the high-speed shaft torque $T_{hs}$ drives the generator at speed $\omega_g$, which is also braked by the generator electromagnetic torque $T_{em}$. The rotor and generator dynamics are given by

$$J_r\dot\omega_r = T_a - K_r\omega_r - T_{ls} \tag{8}$$

$$J_g\dot\omega_g = T_{hs} - K_g\omega_g - T_{em} \tag{9}$$

where $J_r$, $J_g$ are the rotor and generator inertias, and $K_r$, $K_g$ are the rotor and generator external damping. The gearbox ratio $n_g$ is defined by

$$n_g = \frac{\omega_g}{\omega_r} = \frac{T_{ls}}{T_{hs}} \tag{10}$$

Combining (10) and (9), one has

$$n_g^2 J_g\dot\omega_r = T_{ls} - n_g^2 K_g\omega_r - n_g T_{em} \tag{11}$$

Thus, by combining (8) with (11), we can obtain a single lumped mass model of the drive train as

$$J_t\dot\omega_r = T_a - K_t\omega_r - T_g \tag{12}$$

where

$$\begin{cases} J_t = J_r + n_g^2 J_g \\ K_t = K_r + n_g^2 K_g \\ T_g = n_g T_{em} \end{cases} \tag{13}$$

Subsequently, we will use the simplified model (12) for control purposes. Further, the generated power is given by

$$P_g = T_g\omega_r \tag{14}$$

Fig. 2: Drive train dynamics

3 Controller Methodology

3.1 Problem Formulation

The control objective depends on the region in which the wind turbine operates. We focus on the low speed region in this study. The main task in the low speed region is to acquire the maximum amount of energy available in the wind, which is given by

$$P_{a\max} = \frac{1}{2}\rho\pi R^2 C_{p\max}\,v^3 \tag{15}$$

with $C_{p\max}$ being the maximum of $C_p$. In practice, it is preferable to operate the wind turbine at an efficiency slightly lower than the maximum, with the aim of leaving an energy buffer for grid frequency control. More specifically, the desired power trajectory is given by

$$P_d = \alpha P_{a\max} \tag{16}$$

with $0 < \alpha < 1$ denoting the ratio between the desired generator power $P_d$ and the maximum available power $P_{a\max}$. Moreover, the following mild assumption is needed for controller design.

Assumption 1 The rate of the desired power trajectory is bounded in the sense that $|\dot P_d| \le B$ with $B > 0$ being a positive constant.

Define the tracking error as $e = P_g - P_d$. Using (14) and (12), its time derivative can be obtained as

$$\begin{aligned} \dot e &= \dot T_g\omega_r + T_g\dot\omega_r - \dot P_d \\ &= \dot T_g\omega_r + \frac{T_g}{J_t}\left(T_a - K_t\omega_r - T_g\right) - \dot P_d \end{aligned} \tag{17}$$

Let the generator torque rate be $\beta = \dot T_g$ (from here on, $\beta$ denotes the torque rate rather than the pitch angle of Section 2). We have

$$\dot e = \beta\omega_r + \frac{T_g}{J_t}\left(T_a - K_t\omega_r - T_g\right) - \dot P_d \tag{18}$$

Fig. 3: Online reinforcement learning neural controller structure
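As a rough numerical illustration of the model above, the following Python sketch evaluates the approximation (2)-(3), searches for the maximum of Cp at zero pitch, and computes the lumped parameters of (13). It is a minimal sketch under stated assumptions, not the authors' simulation code: the function names, the grid-search range for λ, and the choice α = 0.8 are illustrative, while ρ, R and the drive train values are those quoted in Section 4 and Table 1.

```python
import math


def power_coefficient(tip_speed_ratio, pitch_deg=0.0):
    """Power coefficient Cp(lambda, beta) from Eqs. (2)-(3)."""
    inv_m = 1.0 / (tip_speed_ratio + 0.08 * pitch_deg) - 0.035 / (pitch_deg ** 3 + 1.0)
    return 0.22 * (116.0 * inv_m - 0.4 * pitch_deg - 5.0) * math.exp(-12.5 * inv_m)


def aerodynamic_power(rho, radius, wind_speed, cp):
    """Captured power from Eq. (1); with cp = Cpmax this is Eq. (15)."""
    return 0.5 * rho * math.pi * radius ** 2 * cp * wind_speed ** 3


def max_power_coefficient(pitch_deg=0.0, lam_min=2.0, lam_max=14.0, steps=12001):
    """Grid search over lambda (hypothetical helper); returns (Cpmax, lambda_opt)."""
    best_cp, best_lam = -math.inf, lam_min
    for i in range(steps):
        lam = lam_min + (lam_max - lam_min) * i / (steps - 1)
        cp = power_coefficient(lam, pitch_deg)
        if cp > best_cp:
            best_cp, best_lam = cp, lam
    return best_cp, best_lam


def lumped_drive_train(j_r, j_g, k_r, k_g, n_g):
    """Lumped inertia and damping (Jt, Kt) from Eq. (13)."""
    return j_r + n_g ** 2 * j_g, k_r + n_g ** 2 * k_g


cp_max, lam_opt = max_power_coefficient()
# rho = 1.25 kg/m3 and R = 38.5 m as in Section 4; 9 m/s is the mean wind speed there.
p_max = aerodynamic_power(rho=1.25, radius=38.5, wind_speed=9.0, cp=cp_max)
p_desired = 0.8 * p_max  # Eq. (16), assuming alpha = 0.8
j_t, k_t = lumped_drive_train(4456761.0, 123.0, 45.52, 0.4, 104.494)
print(f"Cpmax = {cp_max:.4f} at lambda = {lam_opt:.3f}")
print(f"Pamax = {p_max / 1e6:.3f} MW, Pd = {p_desired / 1e6:.3f} MW")
print(f"Jt = {j_t:.1f} kg.m2, Kt = {k_t:.2f} N.m/rad/s")
```

At zero pitch the search returns a maximum close to the value Cpmax = 0.4382 used in Section 4, which is a convenient consistency check on (2)-(3).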

Moreover, for analysis convenience, equation (18) can be rewritten in discrete time as

$$e(k+1) = \beta(k)\,T\omega_r(k) + f(Z(k)) - T\dot P_d(k) \tag{19}$$

with

$$f(Z(k)) = \frac{T_g(k)\,T}{J_t}\left(T_a(k) - K_t\omega_r(k) - T_g(k)\right) \tag{20}$$

where $k$ denotes the time instant, $T$ is the sampling period, and $Z(k) = [\omega_r(k), v(k), T_g(k)]^T$. Thereafter, considering the tradeoff between the tracking error and the lifetime of a wind turbine, it is preferable to find the generator torque rate $\beta(k)$ such that the following long-term cost function is minimized:

$$J(k) = \sum_{i=t_0}^{\infty}\lambda^i r(k+i) \tag{21}$$

with $r(k)$ being the short-term cost, or Lagrangian, and $0 < \lambda < 1$ a discount factor for this optimal problem. It should be noted that the long-term cost function is defined as a sum of discounted Lagrangians given by

$$r(k) = q e^2(k) + \gamma\beta^2(k) \tag{22}$$

where $q, \gamma \in \mathbb{R}^+$ are positive constants.

3.2 Main Structure

Fig. 3 shows the block diagram of the proposed reinforcement learning controller, where the action NN supplies the generator torque rate to the wind energy conversion system, while the critic NN learns the long-term cost function. Both the action network and the critic network are represented by a two-layer neural network (NN). The output of this type of NN is

$$O(X) = \hat W^T\phi\left(V^T X\right) \tag{23}$$

where $X \in \mathbb{R}^{n_1}$ and $O \in \mathbb{R}^{n_3}$ stand for the NN's input and output. $\hat W \in \mathbb{R}^{n_2\times n_3}$ and $V \in \mathbb{R}^{n_1\times n_2}$ stand for the output and hidden layer weights, respectively. $n_1$, $n_2$, $n_3$ are the numbers of nodes in the input, hidden and output layers, respectively. $\phi(\cdot)$ is the vector of activation functions. Referring to the universal approximation property, there exists an ideal weight matrix $W^*$ for any smooth function $F(X)$ over an arbitrary compact set $\Omega_X$ such that [21]

$$F(X) = W^{*T}\phi\left(V^T X\right) + \varepsilon = W^{*T}\phi(X) + \varepsilon \tag{24}$$

with $W^*$, $\varepsilon$ being the target weight matrix of the output layer and the corresponding reconstruction error, respectively. It should be noted that the reconstruction error can be made as small as desired by selecting enough nodes in the hidden layer and holding the hidden layer weight matrix $V$ constant. The weights of the hidden layer $V$ are omitted for convenience, since they are not updated. In general, it is assumed that the weight matrix $W$ and the reconstruction error $\varepsilon$ are bounded above such that $\|W\| \le w_m$, $|\varepsilon| \le \varepsilon_m$ with $w_m, \varepsilon_m \in \mathbb{R}^+$ being unknown constants. Notice that both $w_m$ and $\varepsilon_m$ are artificial quantities required for analytical purposes. In other words, their real values will not appear in the ultimate control law. The results obtained in this study can also be extended to any other linearly parameterized approximators, such as radial basis functions, fuzzy logic, splines, etc. [22].
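The two-layer structure (23) can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the class name `TwoLayerNN`, the uniform initialization range [-1, 1], the random seed and the example input are hypothetical, while the tanh activation, the ten hidden nodes, the fixed random hidden weights and the zero initial output weights follow the settings reported in Section 4.

```python
import math
import random


class TwoLayerNN:
    """O(X) = W^T phi(V^T X) with a fixed random hidden layer V, as in Eq. (23)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Hidden-layer weights V (n_in x n_hidden): drawn once and never updated.
        self.V = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(n_in)]
        # Output-layer weights W (n_hidden x n_out): start at zero, tuned online.
        self.W = [[0.0] * n_out for _ in range(n_hidden)]

    def hidden(self, x):
        """Activation vector phi(V^T x) using the hyperbolic tangent."""
        n_hidden = len(self.W)
        return [math.tanh(sum(self.V[i][j] * x[i] for i in range(len(x))))
                for j in range(n_hidden)]

    def output(self, x):
        """Network output W^T phi(V^T x)."""
        phi = self.hidden(x)
        n_out = len(self.W[0])
        return [sum(self.W[j][m] * phi[j] for j in range(len(phi))) for m in range(n_out)]


# Action network: three inputs (the entries of Z(k)), ten hidden nodes, one output.
action_nn = TwoLayerNN(n_in=3, n_hidden=10, n_out=1)
z_scaled = [0.5, 0.9, 0.4]  # hypothetical, already scaled version of Z(k)
print(action_nn.output(z_scaled))
```

Because only the output layer is adapted, the network is linear in its tunable parameters, which is what the gradient-based update laws of the following subsections rely on. How the raw inputs are scaled before entering the tanh layer is not specified in the text, so the example simply assumes normalized values.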

3.3 Critic Network Design

In general, the critic network is utilized to estimate the optimal long-term cost function $J^*(k)$. Since $J^*(k)$ is unknown, with the help of NN approximation, there exists an ideal two-layer neural network such that

$$J^*(k) = W_c^{*T}\phi_c(e(k)) + \omega_c(k) \tag{25}$$

where $W_c^*$ is the target output layer weight, and $\phi_c$ is the activation function. $\omega_c(k)$ is the reconstruction error satisfying $|\omega_c(k)| \le \omega_{cm}$ with $\omega_{cm}$ being an unknown positive constant. By using the NN, we estimate the long-term cost function as

$$\hat J(k) = \hat W_c(k)^T\phi_c(e(k)) \tag{26}$$

where $\hat W_c(k)$ is the estimate of $W_c^*$. Thereafter, the prediction error is defined as

$$E_c(k) = \lambda\hat J(k) - \hat J(k-1) + r(k) \tag{27}$$

Recalling (25), (26) and introducing $\tilde W_c(k) = \hat W_c(k) - W_c^*$, we have

$$\begin{aligned} E_c(k) &= \lambda\hat J(k) - \hat J(k-1) + r(k) \\ &= \lambda\hat W_c(k)^T\phi_c(e(k)) - \hat W_c(k-1)^T\phi_c(e(k-1)) + r(k) \\ &= \lambda J^*(k) + \lambda\tilde W_c^T(k)\phi_c(e(k)) - J^*(k-1) + r(k) \\ &\quad - \tilde W_c^T(k-1)\phi_c(e(k-1)) - \lambda\omega_c(k) + \omega_c(k-1) \end{aligned} \tag{28}$$

The update law of $\hat W_c(k)$ should be designed to minimize the prediction error. Therefore, the following objective function is introduced to assist the construction of $\Delta\hat W_c(k)$:

$$\mathcal{E}_c(k) = \frac{1}{2}E_c^2(k) \tag{29}$$

We present the following weight updating algorithm for the critic network with the help of a standard gradient-based adaptation method:

$$\hat W_c(k+1) = \hat W_c(k) + \Delta\hat W_c(k) \tag{30}$$

where

$$\Delta\hat W_c(k) = l_c\left[-\frac{\partial\mathcal{E}_c(k)}{\partial\hat W_c(k)}\right] \tag{31}$$

with $l_c > 0$ being the learning rate. Recalling (26), (27), (30) and using the chain rule, the weight update law for the critic NN can be obtained as

$$\begin{aligned} \Delta\hat W_c(k) &= -l_c\frac{\partial\mathcal{E}_c(k)}{\partial\hat W_c(k)} = -l_c\frac{\partial\mathcal{E}_c(k)}{\partial E_c(k)}\frac{\partial E_c(k)}{\partial\hat J(k)}\frac{\partial\hat J(k)}{\partial\hat W_c(k)} \\ &= -l_c\lambda\phi_c(e(k))E_c(k) \end{aligned} \tag{32}$$

Thence, the update law for the critic NN is given by

$$\hat W_c(k+1) = \hat W_c(k) - l_c\lambda\phi_c(e(k))\left(\lambda\hat J(k) - \hat J(k-1) + r(k)\right) \tag{33}$$

3.4 Action Network Design

In the reinforcement learning controller for wind energy conversion systems, the action network is utilized to provide the generator torque rate $\beta(k)$. According to the error dynamics (19)–(20), a desired generator torque rate can be chosen as

$$\beta^*(k) = -\frac{k_a e(k)}{T\omega_r(k)} + F(Z_a(k)) \tag{34}$$

where $k_a > 0$ is a positive constant, $F(Z_a(k)) = -\dfrac{f(Z(k))}{T\omega_r(k)}$ and $Z_a(k) = Z(k)$. Notice that the term $F(Z_a(k))$ is traditionally obtained by using mechanism modeling methods. Hence, to be more general, the explicit form of $F(Z_a(k))$ is assumed to be unknown in this study. The desired generator torque rate $\beta^*(k)$ therefore cannot be implemented directly. Instead, by using a two-layer NN to approximate the dynamics $F(Z_a(k))$, we have

$$\beta^*(k) = -\frac{k_a e(k)}{T\omega_r(k)} + W_a^{*T}\phi_a(Z_a(k)) + \omega_a(k) \tag{35}$$

where $W_a^*$ is the target output layer weight, and $\phi_a$ is the activation function. $\omega_a(k)$ is the reconstruction error satisfying $|\omega_a(k)| \le \omega_{am}$ with $\omega_{am}$ being an unknown positive constant. In general, the target weight $W_a^*$ is unknown and needs to be estimated. Letting $\hat W_a$ be the estimate of $W_a^*$, we are ready to present

$$\beta(k) = -\frac{k_a e(k)}{T\omega_r(k)} + \hat W_a^T(k)\phi_a(Z_a(k)) \tag{36}$$

Recalling (19), (35), (36), we have

$$\begin{aligned} e(k+1) &= \beta(k)T\omega_r(k) - T\omega_r(k)F(Z_a(k)) - T\dot P_d(k) \\ &= -T\omega_r(k)F(Z_a(k)) + T\omega_r(k)\left(\beta(k) - \beta^*(k)\right) + T\omega_r(k)\beta^*(k) - T\dot P_d(k) \\ &= -k_a e(k) + T\omega_r(k)\left(\tilde W_a^T(k)\phi_a(Z_a(k)) - \omega_a(k)\right) - T\dot P_d(k) \\ &= -k_a e(k) + T\omega_r(k)\Lambda_a + d_a \end{aligned} \tag{37}$$

where

$$\begin{aligned} \tilde W_a(k) &= \hat W_a(k) - W_a^* \\ \Lambda_a &= \tilde W_a^T(k)\phi_a(Z_a(k)) \\ d_a &= -T\omega_r(k)\omega_a(k) - T\dot P_d(k) \end{aligned} \tag{38}$$

Finally, the closed-loop tracking error dynamics can be rewritten as

$$e(k+1) = -k_a e(k) + T\omega_r(k)\Lambda_a + d_a \tag{39}$$

The action NN should be designed to track the desired power trajectory and minimize the long-term cost. Hence, the error for the action NN in this study consists of the functional estimation error and the error between the nominal desired long-term cost function $J_d(k)$ and the critic signal $\hat J(k)$. More specifically, we define

$$E_a(k) = \frac{1}{\sqrt{T\omega_r(k)}}\left(T\omega_r(k)\Lambda_a + \hat J(k) - J_d(k)\right) \tag{40}$$

Since the long-term cost should be as low as possible, we set $J_d(k) = 0$. Thence, the weights of the action NN should be tuned to minimize the following error

$$\mathcal{E}_a(k) = \frac{1}{2}E_a^2(k) \tag{41}$$

Recalling (39), (40), (41) and using the chain rule, the weight update law for the action NN can be obtained as

$$\begin{aligned} \Delta\hat W_a(k) &= -l_a\frac{\partial\mathcal{E}_a(k)}{\partial\hat W_a(k)} = -l_a\frac{\partial\mathcal{E}_a(k)}{\partial E_a(k)}\frac{\partial E_a(k)}{\partial\Lambda_a}\frac{\partial\Lambda_a}{\partial\hat W_a(k)} \\ &= -l_a\phi_a(Z_a(k))\left(T\omega_r(k)\Lambda_a + \hat J(k)\right) \\ &= -l_a\phi_a(Z_a(k))\left(e(k+1) + k_a e(k) - d_a(k) + \hat J(k)\right) \end{aligned} \tag{42}$$

with $l_a > 0$ being the learning rate. Since $d_a(k)$ is unknown, we take it as zero in the ideal case. Thence, the weights of the action NN are tuned as

$$\hat W_a(k+1) = \hat W_a(k) - l_a\phi_a(Z_a(k))\left(e(k+1) + k_a e(k) + \hat J(k)\right) \tag{43}$$

4 Validation Results

In order to illustrate the performance of the proposed reinforcement learning controller, we have carried out a numerical analysis on a 1.5 MW wind turbine produced by WINDEY corporation, whose characteristics are shown in Table 1 [13].

Table 1: Wind Turbine Characteristics
Rated power                      1.5 MW
Rotor radius                     R = 38.5 m
Rotor inertia                    Jr = 4456761 kg.m2
Generator inertia                Jg = 123 kg.m2
Rotor friction coefficient       Kr = 45.52 N.m/rad/s
Generator friction coefficient   Kg = 0.4 N.m/rad/s
Gearbox ratio                    ng = 104.494

We set the other parameters used in the simulation as: air density ρ = 1.25 kg/m3, maximum power coefficient Cpmax = 0.4382, ratio np = 0.8. The effective wind speed is simulated by utilizing the Class A Kaimal turbulence spectrum, which is commonly used in the literature [4] [11]. Moreover, the wind is assumed to have a mean value of 9 m/s and a turbulence intensity of 12%, as shown in Fig. 4. The parameters of the developed controller are listed as follows: ka = 0.004, la = 0.001, lc = 10^-5, λ = 0.5, γ = q = 0.001, T = 0.01. The neural networks utilized in both the critic network and the action network have ten hidden nodes, and we select the activation function φ(·) as the hyperbolic tangent function. The initial hidden layer weights are randomly determined and held constant during the operation of the controller. The output layer weights are initialized at zero.

Fig. 5 shows the generator power output along with its desired trajectory. It can be seen that the generator output tracks the desired trajectory with good performance in the presence of system uncertainties and NN approximation errors. We have also plotted the tracking error in Fig. 6, which further verifies the system performance. Finally, we plot the resulting generator torque in Fig. 7.

Fig. 4: Wind speed (axes: Time (s), Wind speed (m/s))

Fig. 5: Generator output power (axes: Time (s), Output power P (MW); curves: Pg, Pd)

Fig. 6: Tracking error for the proposed controller (axes: Time (s), Tracking error e (MW))

Fig. 7: Generator torque Tg (axes: Time (s), Torque Tg (kN.m))
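To make the role of the tuning parameters listed above more concrete, the update laws (33) and (43), together with the short-term cost (22) and the control law (36), can be sketched as plain Python functions. This is a minimal sketch under assumptions rather than the simulation code behind Figs. 5 to 7: the function names and the list-based weight representation are hypothetical, the default gains are the values quoted in this section, and the scaling of the error signal before it enters the networks is not specified in the text.

```python
def dot(a, b):
    """Inner product of two equally sized lists."""
    return sum(x * y for x, y in zip(a, b))


def stage_cost(e, beta, q=0.001, gamma=0.001):
    """Short-term cost r(k) = q e^2(k) + gamma beta^2(k), Eq. (22)."""
    return q * e * e + gamma * beta * beta


def critic_update(w_c, phi_c, j_hat, j_hat_prev, r, l_c=1e-5, lam=0.5):
    """One step of Eq. (33); returns the new weights and the error Ec(k) of Eq. (27)."""
    e_c = lam * j_hat - j_hat_prev + r
    return [w - l_c * lam * p * e_c for w, p in zip(w_c, phi_c)], e_c


def control_law(w_a, phi_a, e, omega_r, k_a=0.004, period=0.01):
    """Generator torque rate beta(k) from Eq. (36)."""
    return -k_a * e / (period * omega_r) + dot(w_a, phi_a)


def action_update(w_a, phi_a, e_next, e, j_hat, l_a=0.001, k_a=0.004):
    """One step of Eq. (43), i.e. Eq. (42) with d_a(k) taken as zero."""
    return [w - l_a * p * (e_next + k_a * e + j_hat) for w, p in zip(w_a, phi_a)]


# Toy values (hypothetical) showing the order of operations within one sample.
w_c, w_a = [0.0, 0.0], [0.0, 0.0]
phi_c, phi_a = [0.3, -0.2], [0.5, 0.1]
e_k, e_next, omega_r = 0.05, 0.04, 1.8
beta = control_law(w_a, phi_a, e_k, omega_r)
r_k = stage_cost(e_k, beta)
j_hat = dot(w_c, phi_c)
w_c, e_c = critic_update(w_c, phi_c, j_hat, 0.0, r_k)
w_a = action_update(w_a, phi_a, e_next, e_k, j_hat)
```

Note that the action update needs e(k+1), so in an online loop it can only be applied once the next measurement is available. The sketch sidesteps this by passing `e_next` explicitly.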

5 Conclusion

We have investigated the optimal control of variable-speed wind energy conversion systems in the low speed region by using a reinforcement learning method. Taking both the power capture from wind and the lifetime of the mechanical parts into consideration, the problem in the low speed region has been formulated as an optimal control problem in terms of a cost function rather than a tracking problem. To solve the optimal problem, a reinforcement learning controller with two entities, i.e., a critic network and an action network, is introduced. The action network is used to provide the control input, while the critic network is utilized to evaluate the performance. Both the critic network and the action network are represented using a two-layer neural network. Meanwhile, the developed controller does not rely on an accurate model of the VS-WECS. Therefore, the developed controller can be applied to various wind energy conversion systems without repeating the complex controller design procedure. Finally, simulations have been conducted on a 1.5 MW wind turbine to verify the performance of the proposed controller.

References

[1] M.A. Parker, R. Li, and S.J. Finney. Distributed control of a fault-tolerant modular multilevel inverter for direct-drive wind turbine grid interfacing. IEEE Trans. Industrial Electronics, 60(2):509–522, 2013.
[2] C. Evangelista, P. Puleston, and F. Valenciaga. Lyapunov-designed super-twisting sliding mode control for wind energy conversion optimization. IEEE Trans. Industrial Electronics, 60(2):538–545, 2013.
[3] S.M. Muyeen, R. Takahashi, and T. Murata. A variable speed wind turbine control strategy to meet wind farm grid code requirements. IEEE Trans. Power Systems, 25(1):331–340, 2010.
[4] B. Beltran, T. Ahmed-Ali, and M. Benbouzid. High-order sliding-mode control of variable-speed wind turbines. IEEE Trans. Industrial Electronics, 56(9):3314–3321, 2009.
[5] Y. Xue and N. Tai. Review of contribution to frequency control through variable speed wind turbine. Renewable Energy, 36(6):1671–1677, 2011.
[6] F. Blaabjerg, M. Liserre, and K. Ma. Power electronics converters for wind turbine systems. IEEE Trans. Industry Applications, 48(2):708–719, 2012.
[7] B. Beltran and M. El Hachemi Benbouzid. Second-order sliding mode control of a doubly fed induction generator driven wind turbine. IEEE Trans. Energy Conversion, 27(2):261–269, 2012.
[8] M. Hand and M.J. Balas. Non-linear and linear model based controller design for variable-speed wind turbines. In Proc. 3rd ASME/JSME Joint Fluids Engineering Conference, 1999.
[9] T. Ekelund. Modeling and Linear Quadratic Optimal Control of Wind Turbines. PhD thesis, Chalmers University of Technology, Sweden, 1997.
[10] I. Munteanu, N.A. Cutululis, A.I. Bratcu, and E. Ceanga. Optimization of variable speed wind power systems based on a LQG approach. Control Engineering Practice, 13(7):903–912, 2005.
[11] B. Beltran, T. Ahmed-Ali, and M. El Hachemi Benbouzid. Sliding mode power control of variable-speed wind energy conversion systems. IEEE Trans. Energy Conversion, 23(2):551–558, 2008.
[12] S. Bououden, M. Chadli, S. Filali, and A.E. Hajjaji. Fuzzy model based multivariable predictive control of a variable speed wind turbine: LMI approach. Renewable Energy, 37(1):434–439, 2012.
[13] W. Meng, Q. Yang, Y. Ying, Y. Sun, Z. Yang, and Y. Sun. Adaptive power capture control of variable-speed wind energy conversion systems with guaranteed transient and steady-state performance. IEEE Trans. Energy Conversion, 28(3):716–725, 2013.
[14] Y. She, X. She, and M.E. Baran. Universal tracking control of wind conversion system for purpose of maximum power acquisition under hierarchical control structure. IEEE Trans. Energy Conversion, 26(3):766–775, 2011.
[15] D. Schlipf, D.J. Schlipf, and M. Kühn. Nonlinear model predictive control of wind turbines using lidar. Wind Energy, 16(7):1107–1129, 2013.
[16] Feiyue Wang, Huaguang Zhang, and Derong Liu. Adaptive dynamic programming: an introduction. IEEE Computational Intelligence Magazine, 4(2):39–47, 2009.
[17] T. Burton, D. Sharpe, N. Jenkins, and E. Bossanyi. Wind Energy Handbook. Wiley, 2001.
[18] S. Heier and R. Waddington. Grid Integration of Wind Energy Conversion Systems. Wiley, 1998.
[19] J.G. Slootweg, H. Polinder, and W.L. Kling. Dynamic modelling of a wind turbine with doubly fed induction generator. In Proc. IEEE Power Engineering Society Summer Meeting, pages 644–649, Vancouver, BC, Canada, 2001.
[20] L. Ran, J.R. Bumby, and P.J. Tavner. Use of turbine inertia for power smoothing of wind turbines with a DFIG. In Proc. 11th International Conference on Harmonics and Quality of Power, pages 106–111, 2004.
[21] F.L. Lewis, S. Jagannathan, and A. Yesildirek. Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor and Francis, 1999.
[22] Q. Yang and S. Jagannathan. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators. IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, 42(2):377–390, 2012.