This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON CYBERNETICS


Formation Learning Control of Multiple Autonomous Underwater Vehicles With Heterogeneous Nonlinear Uncertain Dynamics Chengzhi Yuan, Member, IEEE, Stephen Licht, and Haibo He, Senior Member, IEEE

Abstract—In this paper, a new concept of formation learning control is introduced to the field of formation control of multiple autonomous underwater vehicles (AUVs), which specifies a joint objective of distributed formation tracking control and learning/identification of nonlinear uncertain AUV dynamics. A novel two-layer distributed formation learning control scheme is proposed, which consists of an upper-layer distributed adaptive observer and a lower-layer decentralized deterministic learning controller. This new formation learning control scheme advances existing techniques in three important ways: 1) the multi-AUV system under consideration has heterogeneous nonlinear uncertain dynamics; 2) the formation learning control protocol can be designed and implemented by each local AUV agent in a fully distributed fashion without using any global information; and 3) in addition to the formation control performance, the distributed control protocol is also capable of accurately identifying the AUVs' heterogeneous nonlinear uncertain dynamics and utilizing experiences to improve formation control performance. Extensive simulations have been conducted to demonstrate the effectiveness of the proposed results.

Index Terms—Autonomous underwater vehicles (AUVs), formation learning control, multiagent systems, neural networks (NNs).

I. INTRODUCTION

AS A TYPICAL type of ocean vehicle, autonomous underwater vehicles (AUVs) play an important role in performing various marine activities, such as pipeline inspection, seafloor mapping, bathymetric surveys, and marine biology exploration [1]–[4]. For some specific activities, due to the complexity of missions and/or stringent performance requirements, it is often necessary to employ multiple relatively inexpensive and small AUVs working in a collaborative

Manuscript received March 27, 2017; revised July 17, 2017; accepted September 7, 2017. This paper was recommended by Associate Editor Y. Shi. (Corresponding author: Chengzhi Yuan.) C. Yuan is with the Department of Mechanical, Industrial and Systems Engineering, University of Rhode Island, Kingston, RI 02881 USA (e-mail: [email protected]). S. Licht is with the Department of Ocean Engineering, University of Rhode Island, Narragansett, RI 02882 USA (e-mail: [email protected]). H. He is with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCYB.2017.2752458

fashion, instead of using a single expensive specialized AUV to perform solo missions. It is expected that greater efficiency and operational capability can be realized in a more economical and safer way via coordination among multiple AUVs [5]. This has motivated considerable research effort over the past decades on multi-AUV coordination and distributed control [6]–[11]. In particular, formation control provides a fundamental paradigm for various multi-AUV coordination problems, aiming to regulate the relative positions, velocities, and orientations among multiple AUVs while maintaining a desired formation pattern [5].

The problem of multi-AUV formation control is well known to be challenging, not only because the AUVs have highly complex nonlinear dynamics and complicated interaction behaviors, but also because of the uncertain, dynamic, and adversarial underwater environment. Nevertheless, owing to the broad applications of AUVs in today's ocean industries, the problem of multi-AUV formation control has received increasing attention from both the marine technology and control engineering communities. Various formation control approaches have been reported in the literature, such as the behavioral approach [12], [13], the leader-following approach [6], [10], and the virtual structure approach [8], [14]. In the behavioral approach [12], [13], the overall formation control design problem is usually decomposed into several subproblems, and the control action for each vehicle is derived as a weighted average of the control solutions to these subproblems. This approach allows decentralized implementation, although it is generally difficult to determine the associated weighting parameters. In the leader-following framework [6], [10], one vehicle is chosen as the leader, while the remaining vehicles are designated as followers. The basic idea is that the leader tracks a desired path and the followers maintain a prescribed geometric configuration with respect to the leader. The overall formation behavior of the vehicle group can thus be determined by designing a desired motion behavior for the leader vehicle. The idea of the virtual structure/leader approach [8], [14] is similar to that of the leader-following approach, but instead of designating a physical vehicle as the leader, a virtual structure is constructed to generate the desired formation pattern. This approach inherits the merits of the leader-following approach and is more advantageous in that the virtual leader dynamics can be arbitrarily

2168-2267 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

specified for more flexible formation control designs. Other formation control approaches include the artificial potential function method [15], [16] and the vehicle-formation dynamics integration method [17], [18]. More comprehensive overviews on formation control of multivehicle systems can be found in [13] and [19]–[21].

In spite of the rich literature in the field, several critical issues have not been adequately addressed to date. In particular, virtually all of the multi-AUV formation control algorithms in the above-mentioned references are developed under the strong assumption that all individual AUV agents in the group have identical (homogeneous) system dynamics and that precise mathematical models of the AUVs are available for distributed control design. This assumption is unrealistic in practical AUV formation control. The technical difficulties lie in the complex nonlinear dynamics of real AUVs and in the networked interconnection structure of multiple AUVs. Specifically, a typical approach to multi-AUV formation control design is to first plan desired formation paths/trajectories for all the AUVs in the group, and then design local tracking controllers for each individual AUV to track its corresponding planned path. However, it is generally difficult to make an AUV track the desired formation path due to its complex nonlinear dynamics, especially when no accurate knowledge of the AUV model is available for controller design. On the other hand, it is difficult to realize the overall multi-AUV formation control design in a fully distributed and decentralized manner, while treating the overall problem as a centralized control design problem would lead to a very complex controller structure, especially when the AUV group is large. As such, it is imperative to develop new distributed control methodologies that facilitate more flexible and reliable multi-AUV formation control designs by simultaneously addressing the issues of heterogeneity, uncertainty, and nonlinearity in multi-AUV systems.

In this paper, we propose a novel formation learning control scheme for multi-AUV systems with heterogeneous nonlinear uncertain dynamics under the virtual leader-following framework. Specifically, each AUV agent under consideration is modeled as a nonlinear uncertain system, and the agents are not necessarily identical to each other. The virtual leader is constructed as a neutrally stable linear time-invariant (LTI) system that generates a periodic reference trajectory, which is assumed to be available only to the leader's neighboring AUV agents. To overcome the heterogeneity and nonlinear uncertainty issues discussed above, a novel two-layer formation control design architecture is proposed, consisting of an upper-layer distributed adaptive (DA) observer and a lower-layer decentralized deterministic learning (DDL) controller. The upper-layer DA observer serves as a guidance system that estimates the virtual leader's system information; interagent communication occurs on this layer by sharing estimated information among neighboring AUV agents. The lower-layer DDL controller regulates the local AUV's position to track the desired reference signal by feeding back only local information from the AUV itself. Since no interagent communication is involved on this layer, the associated DDL


controller can thus be designed and implemented in a fully distributed manner. The DDL control law is constructed by combining methodologies from backstepping adaptive control theory [22] and deterministic learning control theory [23]. With the proposed formation learning control scheme, both formation tracking control and locally accurate identification/learning of each AUV's nonlinear uncertain dynamics can be accomplished simultaneously during the formation control process using distributed localized radial basis function neural networks (RBF NNs). More importantly, the learned knowledge can be effectively stored/represented in a time-invariant fashion using distributed constant RBF NNs and reutilized for more flexible and efficient distributed formation control. Extensive simulations have been conducted to demonstrate the effectiveness of the proposed results.

The contributions of this paper can be summarized as follows. A new formation learning control scheme with novel distributed formation control algorithms is proposed for multi-AUV formation control. This new scheme distinguishes itself from many existing formation control techniques in the following aspects.
1) It overcomes the two challenging issues of heterogeneity and nonlinear uncertainty in multi-AUV formation control.
2) It realizes the control design and implementation in a fully distributed manner without involving any global information.
3) It renders the distributed formation control system with a distinctive capability of adapting to uncertain AUV dynamics through locally accurate learning, knowledge representation/storage, and experience reutilization.

The formation learning control concept proposed in this paper not only introduces a new research pathway to the field of multi-AUV formation control, but also lays the foundation for further development of general multiagent distributed adaptive (DA) control systems with collaborative intelligence.

The rest of this paper is organized as follows. Some preliminary reviews and the problem statement are given in Section II. The main results of this paper, including the DA observer, the DDL controller, the locally accurate learning analysis, and experience-based formation control, are presented in Sections III through VI, respectively. Simulation results are provided in Section VII, and conclusions are drawn in Section VIII.

II. PRELIMINARIES AND PROBLEM STATEMENT

A. Notation and Graph Theory

We use R and R+ to denote the sets of real numbers and positive real numbers, respectively. Rm×n is the set of real m × n matrices, and Rn represents the set of real n × 1 vectors. The identity matrix of appropriate dimension is denoted by I. 1n denotes the n-dimensional column vector with all elements equal to 1. Sn and Sn+ denote the sets of real symmetric n × n matrices and positive definite matrices, respectively. A block-diagonal matrix with matrices X1, X2, . . . , Xp on its main diagonal is denoted by diag{X1, X2, . . . , Xp}. The notation A ⊗ B stands for the Kronecker product of matrices A and B.

For a matrix A, vec(A) denotes its vectorization, obtained by stacking the columns of A on top of one another. For a series of column vectors x1, . . . , xn, col{x1, . . . , xn} represents the column vector obtained by stacking them together. For two integers k1 < k2, we denote I[k1, k2] = {k1, k1 + 1, . . . , k2}. For x ∈ Rn, its norm is defined as ‖x‖ := (xT x)1/2. For a square matrix A, λi(A) denotes its ith eigenvalue, with λmax(A) and λmin(A) representing its maximum and minimum eigenvalues, respectively.

A digraph G = (V, E) consists of a finite set of nodes V = {1, 2, . . . , N} and an edge set E ⊆ V × V. An edge of E from node i to node j is denoted by (i, j), where node i is called the parent node, node j the child node, and node i is also called a neighbor of node j. Let Ni denote the subset of V consisting of all the neighbors of node i. If the digraph G contains a sequence of edges of the form (i1, i2), (i2, i3), . . . , (ik, ik+1), then the set {(i1, i2), (i2, i3), . . . , (ik, ik+1)} is called a path of G from node i1 to node ik+1, and node ik+1 is said to be reachable from node i1. A directed tree is a digraph in which every node has exactly one parent except for one node, called the root, from which every other node is reachable. The digraph G contains a directed spanning tree if and only if G has at least one node that can reach every other node. The weighted adjacency matrix of a digraph G is a non-negative matrix A = [aij] ∈ RN×N, where aii = 0 and aij > 0 ⇒ (j, i) ∈ E. The Laplacian of a digraph G is denoted by L = [lij] ∈ RN×N, where lii = Σ_{j=1}^{N} aij and lij = −aij for i ≠ j. It is known that at least one eigenvalue of L is at the origin and that all nonzero eigenvalues of L have positive real parts. Moreover, according to [24, Lemma 3.3], L has exactly one eigenvalue at the origin and all other (N − 1) eigenvalues with positive real parts if and only if the digraph G contains a directed spanning tree.
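These graph-theoretic facts are easy to check numerically. The sketch below (a small assumed 4-node digraph, not one taken from the paper) builds the Laplacian from a weighted adjacency matrix and verifies the spanning-tree criterion through the eigenvalues of L:

```python
# Sketch: Laplacian of a digraph and the spanning-tree test via eigenvalues.
# The 4-node graph here is an illustrative assumption only.
import numpy as np

# Weighted adjacency matrix A = [a_ij]: a_ij > 0 means node i hears node j.
A = np.array([[0.0, 0.0, 0.0, 0.0],   # node 0 (root) has no parent
              [1.0, 0.0, 0.0, 0.0],   # node 1 hears node 0
              [0.0, 1.0, 0.0, 0.0],   # node 2 hears node 1
              [1.0, 0.0, 1.0, 0.0]])  # node 3 hears nodes 0 and 2

# l_ii = sum_j a_ij and l_ij = -a_ij for i != j
L = np.diag(A.sum(axis=1)) - A

eigs = np.linalg.eigvals(L)
n_zero = int(np.sum(np.abs(eigs) < 1e-9))
n_rhp = int(np.sum(eigs.real > 1e-9))
# Exactly one zero eigenvalue, all others in the open right-half plane
# <=> the digraph contains a directed spanning tree ([24, Lemma 3.3]).
print(n_zero, n_rhp)
```

Here the graph does contain a spanning tree rooted at node 0, so exactly one eigenvalue of L sits at the origin and the remaining three have positive real parts.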
B. RBF Neural Networks

RBF networks can be described by fnn(Z) = Σ_{i=1}^{Nn} wi si(Z) = WT S(Z) [25], where Z ∈ ΩZ ⊂ Rq is the input vector, W = [w1, . . . , wNn]T ∈ RNn is the weight vector, Nn is the NN node number, and S(Z) = [s1(‖Z − ξ1‖), . . . , sNn(‖Z − ξNn‖)]T, with si(·) being an RBF and ξi (∀i ∈ I[1, Nn]) distinct points in the state space. The Gaussian function si(‖Z − ξi‖) = exp[−(Z − ξi)T(Z − ξi)/γi2] is one of the most commonly used RBFs, where ξi = [ξi1, ξi2, . . . , ξiq]T is the center of the receptive field and γi is the width of the receptive field. The Gaussian function belongs to the class of localized RBFs, in the sense that si(‖Z − ξi‖) → 0 as ‖Z‖ → ∞.

It has been shown in [25] and [26] that, for any continuous function f(Z): ΩZ → R, where ΩZ ⊂ Rq is a compact set, and for an NN approximator whose node number Nn is sufficiently large, there exists an ideal constant weight vector W∗ such that for any ε∗ > 0, f(Z) = W∗T S(Z) + ε, ∀Z ∈ ΩZ, where |ε| < ε∗ is the approximation error. The ideal weight vector W∗ is an "artificial" quantity required for analysis, defined as the value of W that minimizes |ε| for all Z ∈ ΩZ, i.e., W∗ := arg min_{W∈RNn} {sup_{Z∈ΩZ} |f(Z) − WT S(Z)|}. Moreover, for any bounded trajectory Z(t) within the compact set ΩZ, f(Z) can be approximated using a limited number of neurons located in a local region along the trajectory:

f(Z) = Wζ∗T Sζ(Z) + εζ

where εζ is the approximation error, with εζ = O(ε) = O(ε∗), Sζ(Z) = [sj1(Z), . . . , sjζ(Z)]T ∈ RNζ, Wζ∗ = [w∗j1, . . . , w∗jζ]T ∈ RNζ, Nζ < Nn, and the integers ji = j1, . . . , jζ are defined by |sji(Zp)| > ι (ι > 0 a small positive constant) for some Zp ∈ Z(t). This holds if ‖Z(t) − ξji‖ < ε for some t > 0.

The following lemma, regarding the persistent excitation (PE) condition for RBF NNs, is recalled from [23].

Lemma 1: Consider any continuous recurrent trajectory¹ Z(t): [0, ∞) → Rq that remains in a bounded compact set ΩZ ⊂ Rq. Then, for the RBF NN WT S(Z) with centers placed on a regular lattice (large enough to cover the compact set ΩZ), the regressor subvector Sζ(Z), consisting of the RBFs with centers located in a small neighborhood of Z(t), is persistently exciting.

C. Problem Statement

In this paper, we consider a multiagent system consisting of N AUVs with heterogeneous nonlinear uncertain dynamics, modeled as [1]

η̇i = Ji(ηi)νi
Mi ν̇i + Ci(νi)νi + Di(νi)νi + gi(ηi) + Δi(χi) = τi    (1)

where the subscript i ∈ I[1, N] labels the ith AUV agent in the group. For each i ∈ I[1, N], ηi = [xi, yi, ψi]T ∈ R3 represents the ith AUV's position and compass heading in a global coordinate frame, and νi = [ui, vi, ri]T ∈ R3 represents its linear velocities and angular heading rate in a body-fixed reference frame. Mi, Ci(νi), and Di(νi) are the inertia matrix, the matrix of Coriolis and centripetal terms, and the damping matrix, respectively. gi(ηi) is the vector of restoring forces due to gravitational and buoyancy forces and moments, which is equal to zero under the assumption that the horizontal and vertical dynamics are fully decoupled. Δi(χi), with χi := col{ηi, νi}, is the vector of generalized deterministic unmodeled uncertain dynamics. τi ∈ R3 is the vector of control input signals, and Ji(ηi) is the rotation matrix

Ji(ηi) = [ cos(ψi)  −sin(ψi)  0
           sin(ψi)   cos(ψi)  0
           0          0       1 ]

which, together with the vehicle inertia matrix Mi = MiT ∈ S3+, is assumed to be known for control design, while the matrix coefficients Ci(νi), Di(νi), gi(ηi), and Δi(χi) are unknown nonlinear functions.

For leader-following formation tracking control, we consider the following virtual leader dynamics to generate the tracking reference signals:

χ̇0 = A0 χ0    (2)

where χ0 := col{η0, ν0}, with η0 ∈ R3 and ν0 ∈ R3, is the state of the leader, and A0 ∈ R6×6 is a constant matrix available only to the leader's neighboring AUV agents.

Given the multi-AUV system (1) and the leader dynamics (2), we can define a non-negative matrix A = [aij], i, j ∈

¹A recurrent trajectory represents a large set of periodic and periodic-like trajectories generated from linear/nonlinear dynamical systems. A detailed characterization of recurrent trajectories can be found in [23].

I[0, N], where for i ∈ I[1, N], ai0 > 0 if and only if agent i has access to the reference signals η0 and ν0, and all other elements of A are arbitrary non-negative numbers satisfying aii = 0, ∀i ∈ I[0, N]. Let G = (V, E) be the digraph associated with A, where V = {0, 1, . . . , N}, with node 0 corresponding to the leader and all other nodes corresponding to the remaining N AUV agents. We have the following assumptions.

Assumption 1: All the eigenvalues of A0 in the leader dynamics (2) are located on the imaginary axis.

Assumption 2: The digraph G contains a directed spanning tree with node 0 as its root.

Note that Assumption 1 ensures that the leader dynamics generates stable and meaningful reference trajectories for formation control. Essentially, with this assumption, all states of the leader dynamics remain uniformly bounded, i.e., χ0 = col{η0, ν0} ∈ Ω0 for all t ≥ 0, where Ω0 ⊂ R6 is a compact set. Moreover, the associated system trajectory, denoted by φ0(χ0(0)) and starting from the initial condition χ0(0), is a periodic signal. It will be shown later that this periodicity property is important in ensuring the PE condition, and thus parameter convergence, in the DA control systems. The eigenvalue constraint on A0 in Assumption 1 can be relaxed if only formation tracking control performance is of concern, as will be clarified in the sequel.

Remark 1: The leader dynamics (2) is constructed as a neutrally stable LTI system, which can generate sinusoidal reference trajectories of arbitrary frequency for formation tracking control. This type of leader dynamics is typically adopted in the multiagent leader-following distributed control literature (e.g., [27] and [28]).

On the other hand, with Assumption 2, we have the following results regarding the Laplacian matrix L of the network graph G. Specifically, let Δ be the N × N non-negative diagonal matrix whose ith diagonal element is ai0, i ∈ I[1, N]. Then the Laplacian L of G can be partitioned as

L = [ Σ_{j=1}^{N} a0j    −[a01, . . . , a0N]
      −Δ1N               H                  ]

where for all j ∈ I[1, N], a0j > 0 if (j, 0) ∈ E and a0j = 0 otherwise. Since L1_{N+1} = 0, we have H1N = Δ1N. Furthermore, according to [29], all the nonzero eigenvalues of H, if any, have positive real parts, and H is nonsingular if and only if Assumption 2 is satisfied.

The leader-following formation learning control problem considered in this paper is specified as follows.

Problem 1: Given the multi-AUV system (1) and the virtual leader dynamics (2) with a directed network topology G, our objective is to design a distributed NN learning control protocol using only local information such that the following are achieved.
1) Formation Keeping: All N AUV agents (1) follow the leader (2) while keeping a predesignated formation pattern, i.e., each AUV agent maintains a desired distance with respect to the leader's position η0.
2) Dynamics Learning: Each individual AUV's nonlinear uncertain dynamics is accurately identified/learned during the formation control process, and the learned knowledge can be further reutilized to achieve stable formation control with improved control performance.
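As a numerical illustration of Assumption 1 and Remark 1, the sketch below integrates a neutrally stable leader (2); the 6 × 6 matrix A0 (eigenvalues ±jω on the imaginary axis) and the frequency are assumed for illustration, not taken from the paper:

```python
# Sketch: a neutrally stable LTI leader chi0_dot = A0*chi0 generates a
# bounded, periodic reference (Assumption 1 / Remark 1). A0 is illustrative.
import numpy as np

w = 0.2                                  # assumed reference frequency (rad/s)
S = np.array([[0.0, w], [-w, 0.0]])      # eigenvalues +/- j*w (imaginary axis)
A0 = np.kron(np.eye(3), S)               # 6x6 neutrally stable matrix

chi0 = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])   # chi0(0)
dt = 0.01
for _ in range(1000):                    # forward-Euler integration of (2)
    chi0 = chi0 + dt * (A0 @ chi0)

# Neutral stability: ||chi0(t)|| stays (nearly) constant, i.e., the state
# remains in a compact set Omega_0, up to small Euler discretization drift.
final_norm = float(np.linalg.norm(chi0))
print(final_norm)
```

With eigenvalues strictly on the imaginary axis, the state neither decays nor diverges; its norm stays essentially at its initial value, consistent with χ0 remaining in a compact set Ω0.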

Fig. 1. Two-layer distributed controller architecture.
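Before proceeding to the observer design, the localized RBF NN approximation recalled in Section II-B, on which the DDL controller relies, can be sketched as follows (the 1-D target function, lattice, and width are illustrative assumptions, and the weights here are fitted by least squares rather than by the adaptation law developed later):

```python
# Sketch: localized Gaussian RBF approximation f(Z) ~ W^T S(Z) with centers
# on a regular lattice; only neurons near the input respond appreciably.
import numpy as np

xi = np.linspace(-2.0, 2.0, 21)          # lattice of centers over [-2, 2]
gamma = 0.4                              # receptive-field width

def S(Z):
    # regressor vector of Gaussian RBFs s_i(|Z - xi_i|)
    return np.exp(-((Z - xi) ** 2) / gamma ** 2)

f = lambda Z: np.sin(Z)                  # assumed target function
Zs = np.linspace(-2.0, 2.0, 400)
Phi = np.stack([S(z) for z in Zs])       # 400 x 21 regression matrix
W, *_ = np.linalg.lstsq(Phi, f(Zs), rcond=None)  # "ideal-like" weights

err = max(abs(float(W @ S(z)) - float(f(z))) for z in Zs)
active = int(np.sum(S(0.5) > 0.1))       # neurons responding at Z = 0.5
print(err, active)
```

The fit error is small over the whole compact set, while for any given input only the handful of RBFs centered nearby produce a non-negligible response, which is exactly the locality property exploited in the partial-PE argument of Lemma 1.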

Remark 2: We stress that the above problem formulation assumes that: 1) formation control is required only in the horizontal plane, which is appropriate if the AUVs all operate at constant depth; and 2) the vertical dynamics of the 6-DOF AUV system (see [30]) are completely decoupled from the horizontal dynamics.

To solve the above problem, we propose a two-layer hierarchical design scheme. As shown in Fig. 1, it consists of an upper-layer DA observer exchanging information among neighboring agents, and a lower-layer DDL controller feeding back only local information from the AUV agent itself. The next section presents the upper-layer DA observer design; the design of the lower-layer DDL control law is given in the subsequent sections.

III. DISTRIBUTED ADAPTIVE OBSERVER DESIGN

For leader-following formation control, the leader's information (including the tracking reference signal χ0 and the system matrix A0) might not be available to all the AUV agents. This necessitates interaction among the AUV agents to collaboratively estimate the leader's information. To this end, inspired by multiagent consensus and graph theories [5], we construct the following DA observer for the AUV systems:

χ̂̇0i(t) = Âi0(t)χ̂0i(t) + β1 Σ_{j=0}^{N} aij(χ̂0j(t) − χ̂0i(t))    (3)

where the superscript i ∈ I[1, N] denotes the observer associated with the ith AUV agent, and χ̂0i = col{η̂0i, ν̂0i} ∈ R6 is the observer state. In particular, η̂00 := η0 and ν̂00 := ν0. The time-varying system parameters Âi0(t) are updated via the following cooperative adaptation law:

Ȧ̂i0(t) = β2 Σ_{j=0}^{N} aij(Âj0(t) − Âi0(t)),  ∀i ∈ I[1, N]    (4)

where Âi0 ∈ R6×6 with Â00 := A0. The positive gains β1, β2 > 0 are design parameters.

Remark 3: Note that each AUV agent in the group is equipped with an observer of the form (3), (4), which contains two state variables, χ̂0i and Âi0. For each i ∈ I[1, N], χ̂0i is used to estimate the virtual leader's state χ0, and Âi0 is used to estimate the leader's system matrix A0. The real-time information needed to implement the ith observer includes: 1) the estimated state χ̂0i and matrix Âi0, which can be obtained

from the ith AUV itself; and 2) the estimated states χ̂0j and matrices Âj0 for all j ∈ Ni, which can be obtained from the ith AUV's neighbors. Recall that in (3) and (4), if j ∉ Ni, then aij = 0, so the ith observer does not utilize any information from the jth AUV agent. This means that the proposed distributed observer (3), (4) can be implemented by each local AUV agent using only local estimated information from the agent itself and its neighbors, without involving any global information such as the AUV group size or the network interconnection topology.

To examine the stability and convergence properties of the DA observer (3), (4), we denote χ̃0i = χ̂0i − χ0 and Ãi0 = Âi0 − A0 for all i ∈ I[1, N]. Then the following local error dynamics are obtained for agent i ∈ I[1, N]:

χ̃̇0i(t) = Âi0(t)χ̂0i(t) − A0χ̂0i(t) + A0χ̂0i(t) − A0χ0(t) + β1 Σ_{j=0}^{N} aij(χ̂0j(t) − χ0(t) + χ0(t) − χ̂0i(t))
        = A0χ̃0i(t) + Ãi0(t)χ̃0i(t) + Ãi0(t)χ0(t) + β1 Σ_{j=0}^{N} aij(χ̃0j(t) − χ̃0i(t))

Ȧ̃i0(t) = β2 Σ_{j=0}^{N} aij(Ãj0(t) − Ãi0(t)).

By introducing χ̃0 = col{χ̃01, . . . , χ̃0N}, Ã0 = col{Ã10, . . . , ÃN0}, and Ãb = diag{Ã10, . . . , ÃN0}, we obtain the error dynamics of the overall multiagent system as

χ̃̇0(t) = ((IN ⊗ A0) − β1(H ⊗ I6))χ̃0(t) + Ãb(t)χ̃0(t) + Ãb(t)(1N ⊗ χ0(t))    (5)
Ȧ̃0(t) = −β2(H ⊗ I6)Ã0(t).    (6)

We now present the main result of this section.

Theorem 1: Consider the error system (5) and (6). Under Assumptions 1 and 2, if β1, β2 > 0, then for all i ∈ I[1, N] and for any initial conditions χ0(0), χ̂0i(0), Âi0(0), we have lim_{t→∞} Ãi0(t) = 0 and lim_{t→∞} χ̃0i(t) = 0 exponentially.

Proof: We first consider the estimation error dynamics (6), which can be rewritten in the following vector form:

d/dt vec(Ã0(t)) = −β2(I6 ⊗ H ⊗ I6) vec(Ã0(t)).    (7)

According to [29], all the eigenvalues of H have positive real parts under Assumption 2. Then, for any positive number β2 > 0, the matrix −β2(I6 ⊗ H ⊗ I6) is Hurwitz, which implies exponential stability of system (7). Thus, lim_{t→∞} Ã0(t) = 0 exponentially, and hence lim_{t→∞} Ãi0(t) = 0 exponentially for all i ∈ I[1, N].

Next, we consider the error dynamics for χ̃0 in (5). Based on the above discussion, lim_{t→∞} Ãb(t) = 0 exponentially, so Ãb(t)(1N ⊗ χ0(t)) decays to zero exponentially (recall that χ0(t) is bounded under Assumption 1). According to [31, Lemma 1], if

χ̃̇0(t) = ((IN ⊗ A0) − β1(H ⊗ I6))χ̃0(t)    (8)

is exponentially stable, we can conclude that lim_{t→∞} χ̃0(t) = 0 exponentially. To prove this, with Assumption 1, we know

that all eigenvalues of matrix A0 have zero real parts; since H is nonsingular with all its eigenvalues in the open right-half plane, system (8) is exponentially stable for any positive number β1 > 0. Consequently, lim_{t→∞} χ̃0(t) = 0, i.e., lim_{t→∞} χ̃0i(t) = 0 exponentially for all i ∈ I[1, N].

It is seen that, through cooperative observation and estimation, each individual agent is able to adequately estimate both the state and the system matrix of the leader using the DA observer (3), (4). This estimated information will be further utilized by each agent to facilitate the lower-layer DDL control design, which is presented in the next section.

IV. DECENTRALIZED DETERMINISTIC LEARNING CONTROL DESIGN

In this section, we develop the lower-layer DDL control law for the multi-AUV system (1) in order to achieve the overall formation learning control objective. To this end, we use di* to denote the desired distance between the ith AUV agent's position ηi and the virtual leader's position η0. The formation control problem can thus be formulated as position tracking control for all the AUV agents, which requires each local AUV agent's position ηi to track the reference signal ηd,i := η0 + di*. However, since the leader's state information χ0 is not accessible to every AUV agent, we apply η̂d,i := η̂0i + di* instead of ηd,i as the tracking reference signal. It should be noted that, according to Theorem 1, η̂d,i is generated by each local agent itself and converges exponentially to ηd,i. This guarantees that the DDL controller is implementable and that the formation control objective is achievable for all i ∈ I[1, N] based on the use of η̂d,i.
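A minimal numerical sketch of the DA observer (3), (4) under the conditions of Theorem 1 can make this concrete; the toy 2-D leader matrix and the chain topology 0 → 1 → 2 below are assumed for brevity, not taken from the paper's simulations:

```python
# Sketch: DA observer (3)-(4) on a chain 0 -> 1 -> 2. Follower 2 never hears
# the leader directly, yet both of its estimates converge (Theorem 1).
import numpy as np

w, dt, steps = 0.5, 0.002, 8000
A0 = np.array([[0.0, w], [-w, 0.0]])     # toy 2x2 leader matrix
beta1 = beta2 = 5.0                      # any positive gains work
adj = np.array([[0, 0, 0],               # a_ij: row i hears column j
                [1, 0, 0],               # follower 1 hears the leader
                [0, 1, 0]])              # follower 2 hears follower 1

chih = [np.array([1.0, 0.0]), np.zeros(2), np.zeros(2)]  # chih[0] = chi0
Ah = [A0.copy(), np.zeros((2, 2)), np.zeros((2, 2))]     # Ah[0]  = A0

for _ in range(steps):
    new_chi, new_A = [], []
    for i in (1, 2):
        cx = sum(adj[i, j] * (chih[j] - chih[i]) for j in range(3))
        cA = sum(adj[i, j] * (Ah[j] - Ah[i]) for j in range(3))
        new_chi.append(chih[i] + dt * (Ah[i] @ chih[i] + beta1 * cx))  # (3)
        new_A.append(Ah[i] + dt * beta2 * cA)                          # (4)
    chih[0] = chih[0] + dt * (A0 @ chih[0])   # leader evolves by (2)
    chih[1], chih[2] = new_chi
    Ah[1], Ah[2] = new_A

err_chi = float(np.linalg.norm(chih[2] - chih[0]))
err_A = float(np.linalg.norm(Ah[2] - A0))
print(err_chi, err_A)                    # both estimation errors decay
```

Even the agent farthest from the leader in the chain recovers χ0 and A0 using only its neighbor's estimates, which is the fully distributed property emphasized in Remark 3.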
To design a DDL control law achieving both formation tracking control and accurate learning of the AUVs' nonlinear uncertain dynamics, we incorporate the deterministic learning methodology using RBF NNs from [23] and the well-known backstepping adaptive control design technique from [22]. Specifically, for the ith AUV agent of (1), define the position tracking error z1,i = ηi − η̂d,i for all i ∈ I[1, N]. Then we have

ż1,i = Ji(ηi)νi − η̂̇0i,  ∀i ∈ I[1, N].    (9)

Since Ji(ηi)JiT(ηi) = I (∀i ∈ I[1, N]), we treat νi as a virtual control input to the above z1,i-subsystem, and let

z2,i = νi − αi
αi = JiT(ηi)(−K1,i z1,i + η̂̇0i),  ∀i ∈ I[1, N]    (10)

where αi can be viewed as a desired virtual control input for νi in (9), and K1,i ∈ S3+ is a design parameter. Substituting (10) into (9) yields

ż1,i = Ji(ηi)z2,i − K1,i z1,i,  ∀i ∈ I[1, N].    (11)

Furthermore, evaluating the derivative of z2,i yields

ż2,i = ν̇i − α̇i = Mi−1(−Ci(νi)νi − Di(νi)νi − gi(ηi) − Δi(χi) + τi) − α̇i,  ∀i ∈ I[1, N]    (12)


where

α̇i = J̇iT(ηi)(−K1,i z1,i + η̂̇0i) + JiT(ηi)(K1,i η̂̇0i − K1,i Ji(ηi)νi + η̂̈0i).    (13)
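The kinematic half of the backstepping design, (9)–(11), can be checked in isolation: if the velocity tracks the virtual control exactly (z2,i = 0), then ż1,i = −K1,i z1,i and the position error decays exponentially. The sketch below integrates the kinematics from (1) under the virtual control (10); the circular reference and gain values are illustrative assumptions:

```python
# Sketch: kinematic backstepping step (9)-(11) with perfect velocity
# tracking (nu_i = alpha_i, i.e., z2_i = 0). Reference and gains assumed.
import numpy as np

def J(psi):
    # rotation matrix J_i(eta_i) from (1)
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

K1 = np.diag([2.0, 2.0, 2.0])            # design gain K_{1,i}
dt, steps, w = 0.001, 5000, 0.2
eta = np.array([1.0, -1.0, 0.5])         # initial position/heading

for k in range(steps):
    t = k * dt
    eta_d = np.array([np.cos(w * t), np.sin(w * t), 0.0])        # reference
    etad_dot = np.array([-w * np.sin(w * t), w * np.cos(w * t), 0.0])
    z1 = eta - eta_d                                             # tracking error
    alpha = J(eta[2]).T @ (-K1 @ z1 + etad_dot)                  # virtual control (10)
    eta = eta + dt * (J(eta[2]) @ alpha)                         # eta_dot = J*nu

T = steps * dt
ref_T = np.array([np.cos(w * T), np.sin(w * T), 0.0])
final_err = float(np.linalg.norm(eta - ref_T))
print(final_err)                         # z1 has decayed essentially to zero
```

Because Ji JiT = I, the closed kinematic loop reduces exactly to ż1,i = −K1,i z1,i regardless of the heading, which is why the error vanishes at the rate set by K1,i; the full design then uses (12), (13) to make z2,i small as well.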

Since the nonlinear functions Ci(νi), Di(νi), gi(ηi), and Δi(χi) are unknown, we denote

Fi(Zi) = Ci(νi)νi + Di(νi)νi + gi(ηi) + Δi(χi)    (14)

where Fi(Zi) = [f1,i(Zi), f2,i(Zi), f3,i(Zi)]T and Zi = col{ηi, νi} ∈ ΩZi ⊂ R6, with ΩZi a bounded compact set. We then employ the following RBF NNs to approximate fk,i (∀i ∈ I[1, N], k ∈ I[1, 3]):

fk,i(Zi) = Wk,i*T Sk,i(Zi) + εk,i(Zi)    (15)

where Wk,i* denotes the ideal constant weight vector and |εk,i(Zi)| ≤ εk,i* is the approximation error, with an arbitrarily small constant εk,i* > 0, for all i ∈ I[1, N], k ∈ I[1, 3]. Since Wk,i* is unknown, we design a self-adaptation law to estimate Wk,i* online. Let Ŵk,i be the estimate of Wk,i*; the DDL feedback control law is then constructed as

τi = −JiT(ηi)z1,i − K2,i z2,i + ŴiT SiF(Zi) + Mi α̇i    (16)

where K2,i ∈ S3+ is a design parameter, and ŴiT SiF(Zi) = [Ŵ1,iT S1,i(Zi), Ŵ2,iT S2,i(Zi), Ŵ3,iT S3,i(Zi)]T is used to approximate the unknown nonlinear function vector Fi(Zi) in (14) along the trajectory Zi within the compact set ΩZi. A robust self-adaptation law for online updating of Ŵk,i is constructed using the σ-modification technique [32]:

Ŵ̇k,i = −Γk,i(Sk,i(Zi)z2k,i + σk,i Ŵk,i)    (17)

where z2,i = [z21,i, z22,i, z23,i]T, and Γk,i = Γk,iT > 0 and σk,i > 0, for all i ∈ I[1, N], k ∈ I[1, 3], are free parameters to be designed.

Remark 4: The proposed lower-layer DDL control law consists of (16) and (17). In contrast to the upper-layer DA observer design, this lower-layer control law is fully decentralized, in the sense that it utilizes only the local agent's own information for feedback control, including χi, χ̂0i, and Ŵk,i, without involving any information exchange among neighboring agents.

Denote W̃k,i = Ŵk,i − Wk,i* for all i ∈ I[1, N], k ∈ I[1, 3]; then interconnecting the DDL control law (16), (17) with the AUV plant (1) yields the following closed-loop system:

ż1,i = −K1,i z1,i + Ji(ηi)z2,i
ż2,i = Mi⁻¹(−Jiᵀ(ηi)z1,i − K2,i z2,i + W̃iᵀ SiF(Zi) − εi(Zi))
W̃̇k,i = −Γk,i(Sk,i(Zi)z2k,i + σk,i Ŵk,i)    (18)

where for all i ∈ I[1, N], k ∈ I[1, 3], W̃iᵀ SiF(Zi) = [W̃1,iᵀ S1,i(Zi), W̃2,iᵀ S2,i(Zi), W̃3,iᵀ S3,i(Zi)]ᵀ and εi(Zi) = [ε1,i(Zi), ε2,i(Zi), ε3,i(Zi)]ᵀ.

Based on the above closed-loop dynamics (18), we first have the following theorem summarizing the results on the overall system's stability and tracking control performance.

Theorem 2: Consider the local closed-loop system (18). For each i ∈ I[1, N], if there exists a sufficiently large compact set ΩZi such that Zi ∈ ΩZi for all t ≥ 0, then for any bounded initial conditions, we have: 1) all the signals in the closed-loop system remain uniformly ultimately bounded (UUB); and 2) the position tracking error ηi − η̂d,i converges exponentially to a small neighborhood around zero in finite time Ti > 0 by choosing the design parameters with sufficiently large λmin(K1,i) > 0 and λmin(K2,i) > 2λmin(K1,i) > 0, and sufficiently small σk,i > 0 for all i ∈ I[1, N], k ∈ I[1, 3].

Proof: 1) Consider the following Lyapunov function candidate for the closed-loop system (18):

Vi = (1/2)z1,iᵀz1,i + (1/2)z2,iᵀMiz2,i + (1/2)Σ_{k=1}^3 W̃k,iᵀ Γk,i⁻¹ W̃k,i.    (19)

Evaluating the derivative of Vi along the trajectory of (18) for all i ∈ I[1, N] yields

V̇i = z1,iᵀ(−K1,iz1,i + Ji(ηi)z2,i) + z2,iᵀ(−Jiᵀ(ηi)z1,i − K2,iz2,i + W̃iᵀSiF(Zi) − εi(Zi)) − Σ_{k=1}^3 W̃k,iᵀ(Sk,i(Zi)z2k,i + σk,iŴk,i)
  = −z1,iᵀK1,iz1,i − z2,iᵀK2,iz2,i − z2,iᵀεi(Zi) − Σ_{k=1}^3 σk,iW̃k,iᵀŴk,i, ∀i ∈ I[1, N].

Choose K2,i = K1,i + K22,i such that K1,i, K22,i ∈ S3+. Through completion of squares we have

−σk,iW̃k,iᵀŴk,i ≤ −σk,i‖W̃k,i‖²/2 + σk,i‖Wk,i*‖²/2
−z2,iᵀK22,iz2,i − z2,iᵀεi(Zi) ≤ ‖εi*‖²/(4λmin(K22,i))

where εi* = [ε1,i*, ε2,i*, ε3,i*]ᵀ. Then, we obtain

V̇i ≤ −z1,iᵀK1,iz1,i − z2,iᵀK1,iz2,i + ‖εi*‖²/(4λmin(K22,i)) + Σ_{k=1}^3 (σk,i‖Wk,i*‖²/2 − σk,i‖W̃k,i‖²/2).    (20)

It follows that V̇i is negative whenever

‖z1,i‖ > ‖εi*‖/(2√(λmin(K1,i)λmin(K22,i))) + Σ_{k=1}^3 √(σk,i/(2λmin(K1,i))) ‖Wk,i*‖, or
‖z2,i‖ > ‖εi*‖/(2√(λmin(K1,i)λmin(K22,i))) + Σ_{k=1}^3 √(σk,i/(2λmin(K1,i))) ‖Wk,i*‖, or
‖W̃k,i‖ > ‖εi*‖/√(2σk,iλmin(K22,i)) + Σ_{k=1}^3 ‖Wk,i*‖ =: W̃k,i*    (21)

for all i ∈ I[1, N] and some k ∈ I[1, 3]. This leads to UUB of the signals z1,i, z2,i, and W̃k,i for all i ∈ I[1, N], k ∈ I[1, 3]. As a result, since η̂d,i = η̂0i + di* with η̂0i bounded (according to Theorem 1 and Assumption 1), ηi = z1,i + η̂d,i is bounded for all i ∈ I[1, N]. Similarly, the


boundedness of νi = z2,i + αi can be confirmed by the fact that αi in (10) is bounded. In addition, Ŵk,i = W̃k,i + Wk,i* is also bounded for all i ∈ I[1, N], k ∈ I[1, 3] because of the boundedness of W̃k,i and Wk,i*. Moreover, in light of (13), α̇i is bounded, as all the terms on the right-hand side of (13) are bounded. This leads to the boundedness of the control signal τi in (16), since the Gaussian function vector SiF(Zi) is guaranteed to be bounded for any Zi. As such, all the signals in the closed-loop system remain UUB, which completes the proof of the first part.

2) For the second part, it will be shown that ηi converges arbitrarily close to η̂d,i in some finite time Ti > 0 for all i ∈ I[1, N]. To this end, we consider the following Lyapunov function candidate for the dynamics of z1,i and z2,i in (18):

Vz,i = (1/2)z1,iᵀz1,i + (1/2)z2,iᵀMiz2,i, ∀i ∈ I[1, N].    (22)

The derivative of Vz,i is

V̇z,i = z1,iᵀ(−K1,iz1,i + Ji(ηi)z2,i) + z2,iᵀ(−Jiᵀ(ηi)z1,i − K2,iz2,i + W̃iᵀSiF(Zi) − εi(Zi))
  = −z1,iᵀK1,iz1,i − z2,iᵀK2,iz2,i + z2,iᵀW̃iᵀSiF(Zi) − z2,iᵀεi(Zi), ∀i ∈ I[1, N].

Similar to the proof of part one, we let K2,i = K1,i + 2K22,i with K1,i, K22,i ∈ S3+. According to [23], the Gaussian RBF NN regressor SiF(Zi) is bounded by ‖SiF(Zi)‖ ≤ si* for any Zi and for all i ∈ I[1, N], with some positive number si* > 0. Through completion of squares we have

z2,iᵀW̃iᵀSiF(Zi) − z2,iᵀK22,iz2,i ≤ si*²‖W̃i*‖²/(4λmin(K22,i))
−z2,iᵀK22,iz2,i − z2,iᵀεi(Zi) ≤ ‖εi*‖²/(4λmin(K22,i))

where W̃i* = [W̃1,i*, W̃2,i*, W̃3,i*]ᵀ. This leads to

V̇z,i ≤ −z1,iᵀK1,iz1,i − z2,iᵀK1,iz2,i + δi
  ≤ −2λmin(K1,i)((1/2)z1,iᵀz1,i + (1/(2λmax(Mi)))z2,iᵀMiz2,i) + δi
  ≤ −ρiVz,i + δi, ∀i ∈ I[1, N]    (23)

where ρi = min{2λmin(K1,i), 2λmin(K1,i)/λmax(Mi)} and δi = si*²‖W̃i*‖²/(4λmin(K22,i)) + ‖εi*‖²/(4λmin(K22,i)) (∀i ∈ I[1, N]). Solving the inequality (23) yields

0 ≤ Vz,i(t) ≤ Vz,i(0)exp(−ρit) + δi/ρi, ∀t ≥ 0, i ∈ I[1, N]

which together with (22) implies that

(1/2)min{1, λmin(Mi)}(‖z1,i‖² + ‖z2,i‖²) ≤ Vz,i(0)exp(−ρit) + δi/ρi

and

‖z1,i‖² + ‖z2,i‖² ≤ (2Vz,i(0)exp(−ρit))/min{1, λmin(Mi)} + (2δi)/(ρi min{1, λmin(Mi)}).


Therefore, it is straightforward that given any δ̄i > √(2δi/(ρi min{1, λmin(Mi)})), there exists a finite time Ti > 0 for all i ∈ I[1, N] such that for all t ≥ Ti, both z1,i and z2,i satisfy ‖z1,i(t)‖ ≤ δ̄i and ‖z2,i(t)‖ ≤ δ̄i (∀i ∈ I[1, N]), where δ̄i can be made arbitrarily small by choosing sufficiently large λmin(K1,i) > 0 and λmin(K2,i) > 2λmin(K1,i) > 0 for all i ∈ I[1, N]. This ends the proof.

Consequently, combining the results of Theorems 1 and 2, the following theorem can be obtained without proof.

Theorem 3: Consider the multi-AUV system (1) and the virtual leader dynamics (2) with the network communication topology G. Under Assumptions 1 and 2, objective 1) of Problem 1 (i.e., ηi converges to η0 + di* exponentially for all i ∈ I[1, N]) can be achieved by using the DA observer (3), (4) and the DDL control law (16), (17), with all the design parameters satisfying the requirements in Theorems 1 and 2, respectively.

Remark 5: With the proposed two-layer formation learning control architecture, interagent information exchange occurs only in the upper-layer DA observation, and only the observer's estimated information, rather than any physical plant state information, is shared among neighboring agents. Furthermore, since no global information is required for each local AUV control system design, the proposed formation learning control protocol can be designed and implemented in a fully distributed manner.

Remark 6: It should be noted that for DA estimation (as discussed in Section III) and formation tracking control performance (as discussed in this section), the eigenvalue constraint on A0 in Assumption 1 is not required. This means that formation tracking control is achievable for general reference trajectories (including not only periodic trajectories but also straight lines), as long as they are bounded. However, this constraint will be necessary in the next section to establish the accurate-learning property of the proposed approach.

V. LOCALLY ACCURATE LEARNING FROM CLOSED-LOOP FORMATION CONTROL

For accurate learning/identification, we need to show parameter convergence of the RBF NN weights in (16) and (17) to their ideal (optimal) values. The following theorem summarizes the main result of this section.

Theorem 4: Consider the local closed-loop system (18) with Assumptions 1 and 2. For each i ∈ I[1, N], if there exists a sufficiently large compact set ΩZi such that Zi ∈ ΩZi for all t ≥ 0, then for any bounded initial conditions with Ŵk,i(0) = 0 (∀i ∈ I[1, N], k ∈ I[1, 3]), we have: 1) a partial PE condition of internal closed-loop signals is satisfied; and 2) along the periodic reference tracking orbit ϕζ,i(Zi(t))|t≥Ti (denoting the orbit of the NN input signal Zi(t) starting from time Ti), the local estimated neural weights Ŵζ,k,i converge to small neighborhoods of their optimal values Wζ,k,i*, and locally accurate approximations of the nonlinear uncertain dynamics fk,i(Zi) (∀k ∈ I[1, 3]) in (14) are obtained by Ŵk,iᵀSk,i(Zi), as well as by W̄k,iᵀSk,i(Zi), where for all i ∈ I[1, N], k ∈ I[1, 3]

W̄k,i = mean_{t∈[ta,i, tb,i]} Ŵk,i(t)    (24)

with [ta,i, tb,i] (tb,i > ta,i > Ti) representing a time segment after the transient process.


Proof: From Theorem 3, we have shown that, for all i ∈ I[1, N], ηi closely tracks the periodic signal ηd,i = η0 + di* after finite time Ti. In addition, (10) implies that νi will also closely track the signal Jiᵀ(ηi)η̂̇0i, since both z1,i and z2,i converge to a small neighborhood around zero according to Theorem 2. Moreover, since η̂̇0i converges to η̇0 according to Theorem 1, and Ji(ηi) is a bounded rotation matrix, νi will also be a periodic signal after finite time Ti, because η̇0 is periodic under Assumption 1. Consequently, since the RBF NN input Zi(t) = col{ηi, νi} is rendered periodic for all t ≥ Ti, the PE condition of some internal closed-loop signals, i.e., of the RBF NN regression subvector Sζ,k,i(Zi) (∀t ≥ Ti), is satisfied according to Lemma 1. It should be noted that the periodicity of Zi(t) leads to PE of the regression subvector Sζ,k,i(Zi), but usually not to PE of the whole regression vector Sk,i(Zi). Thus, we term this PE condition a partial PE condition, and we will show convergence of the associated local estimated neural weights Ŵζ,k,i → Wζ,k,i*, rather than Ŵk,i → Wk,i*.

To prove accurate convergence of the local neural weights Ŵζ,k,i associated with the regression subvector Sζ,k,i(Zi) under this partial PE condition, we first rewrite the closed-loop dynamics of z1,i and z2,i along the periodic tracking orbit ϕζ,i(Zi(t))|t≥Ti by using the localization property of the Gaussian RBF NN

ż1,i = −K1,iz1,i + Ji(ηi)z2,i
ż2,i = Mi⁻¹(−Fi(Zi) + τi) − α̇i
  = Mi⁻¹(−Wζ,i*ᵀ Sζ,iF(Zi) − εζ,i − Jiᵀ(ηi)z1,i − K2,iz2,i + Ŵζ,iᵀ Sζ,iF(Zi) + Ŵζ̄,iᵀ Sζ̄,iF(Zi))
  = Mi⁻¹(−Jiᵀ(ηi)z1,i − K2,iz2,i + W̃ζ,iᵀ Sζ,iF(Zi) − ε′ζ,i)

where Fi(Zi) = Wζ,i*ᵀ Sζ,iF(Zi) + εζ,i with Wζ,i*ᵀ Sζ,iF(Zi) = [Wζ,1,i*ᵀ Sζ,1,i(Zi), Wζ,2,i*ᵀ Sζ,2,i(Zi), Wζ,3,i*ᵀ Sζ,3,i(Zi)]ᵀ and εζ,i = [εζ,1,i, εζ,2,i, εζ,3,i]ᵀ being the approximation error, and Ŵζ,iᵀ Sζ,iF(Zi) + Ŵζ̄,iᵀ Sζ̄,iF(Zi) = Ŵiᵀ SiF(Zi), with the subscripts ζ and ζ̄ denoting the regions close to and far away from the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, respectively. According to [23], the NN local approximation term Ŵζ̄,iᵀ Sζ̄,iF(Zi) is small, and ε′ζ,i = εζ,i − Ŵζ̄,iᵀ Sζ̄,iF(Zi) with ‖ε′ζ,i‖ = O(‖εζ,i‖) is also small. Thus, the overall closed-loop adaptive learning system can be described by

d/dt [z1,i; z2,i; W̃ζ,1,i; W̃ζ,2,i; W̃ζ,3,i]
  = [ −K1,i, Ji(ηi), 0;
      −Mi⁻¹Jiᵀ(ηi), −Mi⁻¹K2,i, Θi;
      0, −Γζ,1,iSζ,1,i(Zi), 0;
      0, −Γζ,2,iSζ,2,i(Zi), 0;
      0, −Γζ,3,iSζ,3,i(Zi), 0 ] [z1,i; z2,i; W̃ζ,1,i; W̃ζ,2,i; W̃ζ,3,i]
  + [0; −Mi⁻¹ε′ζ,i; −σ1,iΓζ,1,iŴζ,1,i; −σ2,iΓζ,2,iŴζ,2,i; −σ3,iΓζ,3,iŴζ,3,i]    (25)

and

W̃̇ζ̄,k,i = −Γζ̄,k,i(Sζ̄,k,i(Zi)z2k,i + σk,iŴζ̄,k,i), k ∈ I[1, 3]    (26)

where

Θi = Mi⁻¹ [Sζ,1,iᵀ(Zi), 0, 0; 0, Sζ,2,iᵀ(Zi), 0; 0, 0, Sζ,3,iᵀ(Zi)]

for all i ∈ I[1, N]. The exponential stability property of the nominal part of subsystem (25) has been well studied in [23], [33], and [34], where it is stated that PE of Sζ,k,i(Zi) guarantees exponential convergence of (z1,i, z2,i, W̃ζ,k,i) to zero for all i ∈ I[1, N] and k ∈ I[1, 3]. Based on this, since ‖ε′ζ,i‖ = O(‖εζ,i‖) = O(‖εi‖), and σk,iΓζ,k,iŴζ,k,i can be made small by choosing sufficiently small σk,i for all i ∈ I[1, N], k ∈ I[1, 3], both the state error signals (z1,i, z2,i) and the local parameter error signals W̃ζ,k,i (∀i ∈ I[1, N], k ∈ I[1, 3]) converge exponentially to small neighborhoods of zero, with the sizes of the neighborhoods determined by the RBF NN ideal approximation error εi in (15) and by σk,iΓζ,k,iŴζ,k,i.

The convergence of Ŵζ,k,i → Wζ,k,i* implies that, along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, we have

fk,i(Zi) = Wζ,k,i*ᵀ Sζ,k,i(Zi) + εζ,k,i
  = Ŵζ,k,iᵀ Sζ,k,i(Zi) − W̃ζ,k,iᵀ Sζ,k,i(Zi) + εζ,k,i
  = Ŵζ,k,iᵀ Sζ,k,i(Zi) + εζ1,k,i
  = W̄ζ,k,iᵀ Sζ,k,i(Zi) + εζ2,k,i

where for all i ∈ I[1, N], k ∈ I[1, 3], εζ1,k,i = εζ,k,i − W̃ζ,k,iᵀ Sζ,k,i(Zi) = O(εζ,i) due to the convergence of W̃ζ,k,i → 0. The last equality is obtained according to the definition (24), with W̄ζ,k,i being the corresponding subvector of W̄k,i along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, and εζ2,k,i being the approximation error associated with W̄ζ,k,iᵀ Sζ,k,i(Zi). Apparently, after the transient process, we have εζ2,k,i = O(εζ1,k,i), ∀i ∈ I[1, N], k ∈ I[1, 3].

On the other hand, for the neurons with centers far away from the trajectory ϕζ,i(Zi(t))|t≥Ti, Sζ̄,k,i(Zi) will become very small due to the localization property of Gaussian RBF NNs. From the adaptation law (17) with Ŵk,i(0) = 0, it can be seen that the small values of Sζ̄,k,i(Zi) will only slightly activate the adaptation of the associated neural weights Ŵζ̄,k,i. Thus, Ŵζ̄,k,i and Ŵζ̄,k,iᵀ Sζ̄,k,i(Zi), as well as W̄ζ̄,k,i and W̄ζ̄,k,iᵀ Sζ̄,k,i(Zi), will remain very small for all i ∈ I[1, N], k ∈ I[1, 3] along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti. This means that the entire RBF NNs Ŵk,iᵀ Sk,i(Zi) and W̄k,iᵀ Sk,i(Zi) can be used to locally accurately approximate the unknown


function fk,i(Zi) along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, that is

fk,i(Zi) = Ŵζ,k,iᵀ Sζ,k,i(Zi) + εζ1,k,i = Ŵk,iᵀ Sk,i(Zi) + ε1,k,i
  = W̄ζ,k,iᵀ Sζ,k,i(Zi) + εζ2,k,i = W̄k,iᵀ Sk,i(Zi) + ε2,k,i

with the approximation accuracy levels ε1,k,i = εζ1,k,i − Ŵζ̄,k,iᵀ Sζ̄,k,i(Zi) = O(εζ1,k,i) = O(εk,i) and ε2,k,i = εζ2,k,i − W̄ζ̄,k,iᵀ Sζ̄,k,i(Zi) = O(εζ2,k,i) = O(εk,i) for all i ∈ I[1, N], k ∈ I[1, 3]. This ends the proof.

Remark 7: The key idea in the proof of Theorem 4 is inspired by [23]. For a more detailed analysis of the learning performance, including quantitative analysis of the learning accuracy levels ε1,k,i and ε2,k,i and of the learning speed, please refer to [33]. Furthermore, the AUV nonlinear dynamics (14) to be identified does not contain any time-varying random disturbances; this is important for ensuring accurate identification/learning performance under the deterministic learning framework. To understand the effects of time-varying external disturbances on deterministic learning performance, interested readers are referred to [34] for more details.

Remark 8: Based on (24), to obtain the constant RBF NN weights W̄k,i for all i ∈ I[1, N], k ∈ I[1, 3], one needs to implement the formation learning control law (16), (17) first. Then, according to Theorem 4, after a finite-time transient process, the RBF NN weights Ŵk,i converge to constant steady-state values. Thus, one can select a time segment [ta,i, tb,i] with tb,i > ta,i > Ti for all i ∈ I[1, N] to record and store the RBF NN weights Ŵk,i(t) for t ∈ [ta,i, tb,i]. Finally, based on these recorded data, W̄k,i can be calculated off-line using (24).

Remark 9: It is shown in Theorem 4 that locally accurate learning of each individual AUV's nonlinear uncertain dynamics can be achieved using localized RBF NNs along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, and the learned knowledge can be further represented and stored in a time-invariant fashion using constant RBF NNs, i.e., W̄k,iᵀ Sk,i(Zi) for all i ∈ I[1, N], k ∈ I[1, 3].
In contrast to many existing techniques (e.g., [35] and [36]), this is the first time, to the authors' best knowledge, that locally accurate identification and knowledge representation using constant RBF NNs have been accomplished and rigorously analyzed for multi-AUV formation control.

VI. FORMATION LEARNING CONTROL USING EXPERIENCES

In this section, we further address objective 2) of Problem 1, i.e., realizing formation control without readapting to the AUVs' nonlinear uncertain dynamics. To this end, consider the multi-AUV system (1) and the virtual leader dynamics (2); we adopt the DA observer (3), (4) to cooperatively estimate the leader's state information, and replace the DDL control law (16), (17) with the following constant RBF NN controller without online adaptation of the NN weights:

τi = −Jiᵀ(ηi)z1,i − K2,iz2,i + W̄iᵀ SiF(Zi) + Mi α̇i    (27)

where W̄iᵀ SiF(Zi) = [W̄1,iᵀ S1,i(Zi), W̄2,iᵀ S2,i(Zi), W̄3,iᵀ S3,i(Zi)]ᵀ. Here, W̄k,iᵀ Sk,i(Zi) is the locally accurate RBF NN approximation of the nonlinear uncertain function fk,i(Zi) along ϕζ,i(Zi(t))|t≥Ti, and the associated constant neural weights W̄k,i are obtained from the formation learning control process as discussed in Remark 8.

Theorem 5: Consider the multi-AUV system (1) and the virtual leader dynamics (2) with the network communication topology G. Under Assumptions 1 and 2, the formation control performance (i.e., ηi converges to η0 + di* exponentially, with the same η0 and di* defined in Theorem 3 for all i ∈ I[1, N]) can be achieved by using the DA observer (3), (4) and the constant RBF NN control law (27) with the constant NN weights obtained from (24).

Proof: The closed-loop system for each local AUV agent can be formed by interconnecting the controller (27) with the AUV dynamics (1)

ż1,i = −K1,iz1,i + Ji(ηi)z2,i
ż2,i = Mi⁻¹(−Jiᵀ(ηi)z1,i − K2,iz2,i + W̄iᵀ SiF(Zi) − Fi(Zi))
  = Mi⁻¹(−Jiᵀ(ηi)z1,i − K2,iz2,i − ε2,i), ∀i ∈ I[1, N]

where ε2,i = [ε2,1,i, ε2,2,i, ε2,3,i]ᵀ. Then consider the Lyapunov function candidate Vz,i = (1/2)z1,iᵀz1,i + (1/2)z2,iᵀMiz2,i, whose derivative along the above closed-loop system is

V̇z,i = z1,iᵀ(−K1,iz1,i + Ji(ηi)z2,i) + z2,iᵀ(−Jiᵀ(ηi)z1,i − K2,iz2,i − ε2,i)
  = −z1,iᵀK1,iz1,i − z2,iᵀK2,iz2,i − z2,iᵀε2,i.

Choose K2,i = K1,i + K22,i with K1,i, K22,i ∈ S3+; then through completion of squares we have −z2,iᵀK22,iz2,i − z2,iᵀε2,i ≤ ‖ε2,i*‖²/(4λmin(K22,i)), which implies that V̇z,i ≤ −z1,iᵀK1,iz1,i − z2,iᵀK1,iz2,i + ‖ε2,i*‖²/(4λmin(K22,i)) ≤ −ρiVz,i + δi (∀i ∈ I[1, N]), where ρi = min{2λmin(K1,i), 2λmin(K1,i)/λmax(Mi)} and δi = ‖ε2,i*‖²/(4λmin(K22,i)). Following a similar argument as in the proof of Theorem 2, we can conclude from the above inequality that all the signals in the closed-loop system remain bounded and that ηi − η̂d,i converges to a small neighborhood around zero in finite time, where the size of this neighborhood can be made arbitrarily small by choosing sufficiently large λmin(K1,i) > 0 and λmin(K2,i) > λmin(K1,i) > 0 for all i ∈ I[1, N]. On the other hand, according to Theorem 1, under Assumptions 1 and 2, the DA observer (3), (4) renders exponential convergence of η̂0i → η0. This, together with the above discussion, further implies that ηi converges to ηd,i = η0 + di* exponentially, confirming the formation control objective.

Remark 10: It is interesting to see that, built upon the locally accurate learning results in Section V, the new distributed control protocol consisting of (3), (4), and (27) enables stable formation control along a repeated formation pattern.
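A minimal sketch of the experience-based controller in the spirit of (27): the stored constant weights W̄ turn the NN compensation into a static map, so no adaptation state is updated inside the control loop. The RBF grid, weight values, gains, and state values below are all hypothetical placeholders:

```python
import math

# Hypothetical stored constant weights W_bar over a tiny RBF grid for
# Z = [u, v, r]; all values are illustrative placeholders.
centers = [(i, j, k) for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)]
width = 0.8
W_bar = [[0.01 * ((7 * n + c) % 5 - 2) for n in range(len(centers))]
         for c in range(3)]

def S(Z):
    return [math.exp(-sum((Z[d] - ctr[d]) ** 2 for d in range(3))
                     / (2 * width ** 2)) for ctr in centers]

def tau_static(z1, z2, Z, JT, K2, M_alpha_dot):
    # Control law in the spirit of (27): the learned dynamics are compensated
    # through the stored W_bar only; no weight is updated inside the loop.
    s = S(Z)
    nn = [sum(W_bar[c][n] * s[n] for n in range(len(s))) for c in range(3)]
    return [-sum(JT[c][d] * z1[d] for d in range(3)) - K2 * z2[c]
            + nn[c] + M_alpha_dot[c] for c in range(3)]

JT_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # J^T at psi = 0
state = ([0.1, -0.2, 0.0], [0.05, 0.0, 0.01], [0.5, 0.1, 0.0])
tau_a = tau_static(*state, JT_id, 50.0, [0.0, 0.0, 0.0])
tau_b = tau_static(*state, JT_id, 50.0, [0.0, 0.0, 0.0])
```

Because the map is static, identical states always yield identical commands, in contrast to the adaptive law (16), (17), whose internal weight state evolves between calls.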
Compared with the formation learning control mechanism of Section IV using (3), (4) with (16) and (17), the results established in this section do not require any online RBF NN adaptation for any of the AUV agents, which significantly reduces the computational burden and thus facilitates implementation of the proposed distributed RBF NN formation control protocol. This constitutes another important novelty


TABLE I. AUV SYSTEM PARAMETERS

Fig. 2. Desired formation pattern.

Fig. 3. Network topology G (agent 0 is the virtual leader).

of the proposed formation learning control scheme, compared to many existing techniques (e.g., [35] and [36]).

VII. SIMULATION STUDIES

Consider the multiple heterogeneous AUV systems in the form of (1) with

Mi = [m11,i, 0, 0; 0, m22,i, m23,i; 0, m23,i, m33,i]
Ci = [0, 0, −m22,ivi − m23,iri; 0, 0, m11,iui; m22,ivi + m23,iri, −m11,iui, 0]
Di = [d11,i(νi), 0, 0; 0, d22,i(νi), d23,i(νi); 0, d32,i(νi), d33,i(νi)], gi = 0
Δi = [Δ1,i(χi), Δ2,i(χi), Δ3,i(χi)]ᵀ, ∀i ∈ I[1, 5]

where m11,i = mi − Xu̇,i, m22,i = mi − Yv̇,i, m23,i = mixg,i − Yṙ,i, m33,i = Iz,i − Nṙ,i, d11,i = −(Xu,i + X|u|u,i|ui|), d22,i = −(Yv,i + Y|v|v,i|vi| + Y|r|v,i|ri|), d23,i = −(Yr,i + Y|v|r,i|vi| + Y|r|r,i|ri|), d32,i = −(Nv,i + N|v|v,i|vi| + N|r|v,i|ri|), and d33,i = −(Nr,i + N|v|r,i|vi| + N|r|r,i|ri|). The coefficients {X(·), Y(·), N(·)}

Fig. 4. Distributed observer state convergence. (a) x̂0i → x0 (m). (b) ŷ0i → y0 (m). (c) ψ̂0i → ψ0 (deg).

are hydrodynamic parameters according to the notation of [30] and [37]; detailed nomenclature can be found therein. For simulation purposes, the associated system parameters are borrowed from [37] (with slight modifications for the different AUV agents) and listed in Table I, with xg,i = 0.05 and Yṙ,i = Y|r|v,i = Y|v|r,i = Y|r|r,i = N|r|v,i = N|r|r,i = N|v|v,i = N|v|r,i = Nr,i = 0 for all i ∈ I[1, 5], and the model uncertainties are given by

Δ1 = 0
Δ2 = [0.2u2² + 0.3v2, −0.95, 0.33r2]ᵀ
Δ3 = [−0.58 + cos(v3), 0.23r3³, 0.74u3²]ᵀ
Δ4 = [−0.31, 0, 0.38u4² + v4³]ᵀ
Δ5 = [sin(v5), cos(u5 + r5), −0.65]ᵀ.
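For completeness, the following sketch assembles Mi, Ci(νi), and Di(νi) of the 3-DOF simulation model with hypothetical placeholder coefficients (the Table I values are not reproduced here, and the damping is simplified to a diagonal form), and checks two structural properties used throughout the analysis: Mi is symmetric, and Ci(νi) is skew-symmetric:

```python
# Hypothetical placeholder parameters for one agent (Table I values are
# not reproduced here; numbers chosen only for illustration).
m11, m22, m23, m33 = 31.0, 66.0, 1.1, 4.9
Xu, Xuu = -2.0, -25.0     # surge damping coefficients (assumed)
Yv, Yvv = -30.0, -80.0    # sway damping coefficients (assumed)
Nr, Nrr = -1.0, -2.0      # yaw damping coefficients (assumed)

def M():
    return [[m11, 0.0, 0.0], [0.0, m22, m23], [0.0, m23, m33]]

def C(nu):
    # Coriolis/centripetal matrix of the 3-DOF model (skew-symmetric).
    u, v, r = nu
    a = -m22 * v - m23 * r
    b = m11 * u
    return [[0.0, 0.0, a], [0.0, 0.0, b], [-a, -b, 0.0]]

def D(nu):
    # Simplified diagonal damping (cross terms omitted for brevity).
    u, v, r = nu
    return [[-(Xu + Xuu * abs(u)), 0.0, 0.0],
            [0.0, -(Yv + Yvv * abs(v)), 0.0],
            [0.0, 0.0, -(Nr + Nrr * abs(r))]]

nu = [1.2, -0.3, 0.15]
Mi, Ci, Di = M(), C(nu), D(nu)
```

Skew-symmetry of Ci(νi) is what makes the cross terms z2,iᵀCi(νi)z2,i vanish in Lyapunov arguments of this kind.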


Fig. 5. Position tracking control. (a) xi → x0 (m). (b) yi → y0 (m). (c) ψi → ψ0 (deg).

Fig. 6. Multiple AUVs formation control.

Fig. 7. L2 norms of partial NN weights for AUV 3: Ŵ1,3 (red, solid); Ŵ2,3 (green, dashed); Ŵ3,3 (blue, dash-dotted).

A. Simulation for DDL Formation Learning Control

We use N = 5 AUV agents to form a desired formation pattern as depicted in Fig. 2, with AUV 1's position η1 tracking a periodic signal η0 generated from the following virtual leader dynamics:

[η̇0; ν̇0] = [0, I3; −I3, 0][η0; ν0], η0(0) = [0, 80, 0]ᵀ, ν0(0) = [80, 0, 80]ᵀ.    (28)

It is easy to verify that with the above leader dynamics, Assumption 1 is satisfied, and the position tracking reference signal will be η0 = [80 sin(t), 80 cos(t), 80 sin(t)]ᵀ. The associated desired distances of the AUV agents with respect to the leader's position are chosen as d1* = 0, d2* = [10, −10, 0]ᵀ, d3* = [10, 10, 0]ᵀ, d4* = [−10, 10, 0]ᵀ, and d5* = [−10, −10, 0]ᵀ. The underlying communication topology G among these five AUV agents is shown in Fig. 3; it contains a spanning tree with the virtual leader (agent 0) as the root, so Assumption 2 is satisfied.

We first examine the formation learning control performance using the DA observer (3), (4) with the RBF-NN-based DDL control law (16), (17). For each i ∈ I[1, 5], the nonlinear uncertain functions Fi(Zi) in (14) to be approximated via RBF NNs are functions of νi only, which implies the NN input Zi = [ui, vi, ri]ᵀ. We thus construct the Gaussian RBF NNs Ŵk,iᵀ Sk,i(Zi) using 8 × 8 × 8 = 512 neuron nodes, with the centers evenly placed over the state space [−100, 100] × [−100, 100] × [−100, 100] and the widths γk,i = 60 for all i ∈ I[1, 5], k ∈ I[1, 3]. The observer and controller parameters are selected as β1 = β2 = 5, K1,i = diag{30, 30, 30}, and K2,i = diag{50, 50, 50}, with Γk,i = 10 and σk,i = 0.0001 for all i ∈ I[1, 5], k ∈ I[1, 3]. With initial


Fig. 8. Function approximation for AUV 3. fk,3(Z3): (red) - -; Ŵk,3ᵀ Sk,3(Z3): (blue) -.-; W̄k,3ᵀ Sk,3(Z3): (green) - -. Approximation of (a) f1,3(Z3), (b) f2,3(Z3), and (c) f3,3(Z3).

conditions η1(0) = [30, 60, 0]ᵀ, η2(0) = [50, 40, 0]ᵀ, η3(0) = [50, 80, 0]ᵀ, η4(0) = [10, 70, 0]ᵀ, η5(0) = [10, 30, 0]ᵀ and zero initial conditions for all the distributed observer states (χ̂0i, Â0i) and the DDL controller states Ŵk,i (∀i ∈ I[1, 5], k ∈ I[1, 3]), we carry out the time-domain simulation using the DDL formation learning control law (16), (17) with (3) and (4). Simulation results are plotted in Figs. 4–8. The observation performance using the DA observer (3), (4) is shown in Fig. 4, which indicates perfect convergence of the observer states η̂0i (∀i ∈ I[1, 5]) to the leader's states η0. The AUV position tracking control responses are given in Fig. 5. It is

Fig. 9. Comparison on position tracking control for AUV 3 using different control laws. (a) xi → x0 (m). (b) yi → y0 (m). (c) ψi → ψ0 (deg).

clearly shown that AUV agent 1 indeed tracks the leader's position signal (see the tracking errors in Fig. 5(a) along the x-axis, Fig. 5(b) along the y-axis, and Fig. 5(c) for the vehicle heading), while AUV agents 2–5 follow AUV 1 by maintaining the desired distances along both the x- and y-axes with the same vehicle heading angle. Moreover, the real-time tracking control performance with the predesignated formation pattern (as in Fig. 2) is illustrated in Fig. 6. For simplicity of presentation, we have selected AUV agent 3 as a representative to demonstrate the RBF NN's locally accurate learning performance. Fig. 7 shows that convergence of the RBF NN weights is indeed achieved, which verifies the results of Theorem 4. We have also obtained the NN approximation results



formation control law, which is composed of the DA observer (3), (4) and the constant RBF NN controller (27). To this end, we consider the same virtual leader dynamics (28) to generate the same position tracking reference signals as in Section VII-A. For fairness of comparison, we choose the same initial conditions and the same control gains to conduct the nonlinear simulation. The simulation results for the selected representative AUV agent 3 are plotted in Figs. 9 and 10. It can be seen from Fig. 9 that, with only subtle differences compared to the adaptive control law (16), (17), satisfactory tracking control performance can also be achieved using the constant RBF NN control law (27). Note that this is accomplished without any online recalculation/readaptation of the NN weights, which can significantly reduce the computational burden in controller implementation and thus save system energy, especially when a large number of neurons is involved. Comparative control input responses using these two different control laws can be observed in Fig. 10. Note that the simulation results in Fig. 10 exhibit large AUV control input responses, which may not be desirable in practice. A promising solution to this issue is to impose input saturation constraints in the AUV formation control algorithm design. This, however, is beyond the scope of the current work and will be pursued in our future research.

VIII. CONCLUSION

Fig. 10. Comparison of control inputs for AUV 3 using different control laws. (a) τ1,3. (b) τ2,3. (c) τ3,3.

of the unknown system dynamics F3(Z3), plotted in Fig. 8 using the RBF NNs Ŵk,3ᵀ Sk,3(Z3) and W̄k,3ᵀ Sk,3(Z3) (∀k ∈ I[1, 3]), respectively. It is seen that locally accurate approximations are indeed achieved, and the learned knowledge of the AUV nonlinear dynamics can be stored/represented using localized constant RBF NNs.

B. Simulation for Formation Learning Control Using Experiences

We further examine the distributed control performance for the multi-AUV system using the experience-based distributed

This paper addressed the formation learning control problem for multiple AUVs with heterogeneous nonlinear uncertain dynamics under the virtual leader framework. A novel two-layer distributed formation learning control scheme has been proposed, which consists of an upper-layer DA observer for estimating the virtual leader's state and dynamics information through inter-AUV cooperation, and a lower-layer DDL controller for achieving formation tracking control and local RBF NN learning performance. It has been demonstrated theoretically and by extensive simulations that, in contrast to many existing techniques, the proposed formation learning control protocol can be designed and implemented by each local AUV agent in a fully distributed fashion without using any global information. Another unique feature of the proposed control scheme lies in its distinctive capability of accurately identifying/learning the nonlinear uncertain dynamics of heterogeneous AUVs from closed-loop formation control, and of enabling experience reutilization for stable formation control with improved performance. We emphasize that this paper aims to take a first step toward future research on distributed cooperative intelligent learning control of multiagent systems. Several important issues along this research line remain to be adequately addressed, including the following.

1) Considering more realistic underwater control circumstances, including GPS-denied navigation, imperfect inter-AUV communication with time delays and time-varying/switching communication topologies, actuator saturation, underactuated effects, time-varying external disturbances, and obstacle/collision avoidance.


2) Deepening the research on cooperative formation learning control of multi-AUV systems by considering online knowledge/experiences sharing among neighboring AUV agents.

3) Considering more complicated structures for the virtual leader (such as nonlinear/switching-type leader dynamics [38]) to generate more sophisticated tracking reference signals for more complex formation control tasks.

REFERENCES

[1] T. I. Fossen, Guidance and Control of Ocean Vehicles. New York, NY, USA: Wiley, 1994.
[2] G. L. Foresti, "Visual inspection of sea bottom structures by an autonomous underwater vehicle," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 691–705, Oct. 2001.
[3] J. Gao, A. A. Proctor, Y. Shi, and C. Bradley, "Hierarchical model predictive image-based visual servoing of underwater vehicles with adaptive neural network dynamic control," IEEE Trans. Cybern., vol. 46, no. 10, pp. 2323–2334, Oct. 2016.
[4] R. Cui, C. Yang, Y. Li, and S. Sharma, "Adaptive neural network control of AUVs with control input nonlinearities using reinforcement learning," IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 6, pp. 1019–1029, Jun. 2017, doi: 10.1109/TSMC.2016.2645699.
[5] W. Ren and R. W. Beard, Distributed Consensus in Multi-Vehicle Cooperative Control. London, U.K.: Springer-Verlag, 2008.
[6] R. Cui, S. S. Ge, R. V. E. How, and Y. S. Choo, "Leader–follower formation control of underactuated autonomous underwater vehicles," Ocean Eng., vol. 37, nos. 17–18, pp. 1491–1502, 2010.
[7] Z. Peng, D. Wang, Z. Chen, X. Hu, and W. Lan, "Adaptive dynamic surface control for formations of autonomous surface vehicles with uncertain dynamics," IEEE Trans. Control Syst. Technol., vol. 21, no. 2, pp. 513–520, Mar. 2013.
[8] P. Millán, L. Orihuela, I. Jurado, and F. R. Rubio, "Formation control of autonomous underwater vehicles subject to communication delays," IEEE Trans. Control Syst. Technol., vol. 22, no. 2, pp. 770–777, Mar. 2014.
[9] S. Yin, H. Yang, and O. Kaynak, "Coordination task triggered formation control algorithm for multiple marine vessels," IEEE Trans. Ind. Electron., vol. 64, no. 6, pp. 4984–4993, Jun. 2016, doi: 10.1109/TIE.2016.2574301.
[10] R. Rout and B. Subudhi, "A backstepping approach for the formation control of multiple autonomous underwater vehicles using a leader–follower strategy," J. Marine Eng. Technol., vol. 15, no. 1, pp. 38–46, 2016.
[11] R. Cui, Y. Li, and W. Yan, "Mutual information-based multi-AUV path planning for scalar field sampling using multidimensional RRT," IEEE Trans. Syst., Man, Cybern., Syst., vol. 46, no. 7, pp. 993–1004, Jul. 2016.
[12] T. Balch and R. C. Arkin, "Behavior-based formation control for multirobot teams," IEEE Trans. Robot. Autom., vol. 14, no. 6, pp. 926–939, Dec. 1998.
[13] J. R. T. Lawton, "A behavior-based approach to multiple spacecraft formation flying," Ph.D. dissertation, Dept. Elect. Comput. Eng., Brigham Young Univ., Provo, UT, USA, 2000.
[14] W. Ren and N. Sorensen, "Distributed coordination architecture for multi-robot formation control," Robot. Auton. Syst., vol. 56, no. 4, pp. 324–333, 2008.
[15] S. S. Ge and C.-H. Fua, "Queues and artificial potential trenches for multirobot formations," IEEE Trans. Robot., vol. 21, no. 4, pp. 646–656, Aug. 2004.
[16] C.-H. Fua, S. S. Ge, K. D. Do, and K.-W. Lim, "Multirobot formations based on the queue-formation scheme with limited communication," IEEE Trans. Robot., vol. 23, no. 6, pp. 1160–1169, Dec. 2007.
[17] F. Fahimi, "Full formation control for autonomous helicopter groups," Robotica, vol. 26, no. 2, pp. 143–156, 2008.
[18] H. Yang, C. Wang, and F. Zhang, "A decoupled controller design approach for formation control of autonomous underwater vehicles with time delays," IET Control Theory Appl., vol. 7, no. 15, pp. 1950–1958, Oct. 2013.
[19] L. Paull, S. Saeedi, M. Seto, and H. Li, "AUV navigation and localization: A review," IEEE J. Ocean. Eng., vol. 39, no. 1, pp. 131–149, Jan. 2014.

[20] J. W. Nicholson and A. J. Healey, "The present state of autonomous underwater vehicle (AUV) applications and technologies," Marine Technol. Soc. J., vol. 42, no. 1, pp. 44–51, 2008.
[21] B. Das, B. Subudhi, and B. B. Pati, "Cooperative formation control of autonomous underwater vehicles: An overview," Int. J. Autom. Comput., vol. 13, no. 3, pp. 199–225, 2016.
[22] M. Krstic, I. Kanellakopoulos, and P. Kokotovic, Nonlinear and Adaptive Control Design. New York, NY, USA: Wiley, 1995.
[23] C. Wang and D. J. Hill, Deterministic Learning Theory for Identification, Recognition and Control. Boca Raton, FL, USA: CRC Press, 2009.
[24] W. Ren and R. W. Beard, "Consensus seeking in multiagent systems under dynamically changing interaction topologies," IEEE Trans. Autom. Control, vol. 50, no. 5, pp. 655–661, May 2005.
[25] J. Park and I. W. Sandberg, "Universal approximation using radial-basis-function networks," Neural Comput., vol. 3, no. 2, pp. 246–257, Jun. 1991.
[26] M. J. D. Powell, The Theory of Radial Basis Function Approximation in 1990. London, U.K.: Oxford Univ. Press, 1992.
[27] Y. Su and J. Huang, "Cooperative output regulation with application to multi-agent consensus under switching network," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 3, pp. 864–875, Jun. 2012.
[28] C. Yuan, "Leader-following consensus of parameter-dependent networks via distributed gain-scheduling control," Int. J. Syst. Sci., vol. 48, no. 10, pp. 2013–2022, 2017.
[29] Y. Su and J. Huang, "Cooperative output regulation of linear multi-agent systems," IEEE Trans. Autom. Control, vol. 57, no. 4, pp. 1062–1066, Apr. 2012.
[30] T. T. J. Prestero, "Verification of a six-degree of freedom simulation model for the REMUS autonomous underwater vehicle," Ph.D. dissertation, Dept. Ocean Eng., Massachusetts Inst. Technol., Cambridge, MA, USA, 2001.
[31] H. Cai, F. L. Lewis, G. Hu, and J. Huang, "Cooperative output regulation of linear multi-agent systems by the adaptive distributed observer," in Proc. IEEE Conf. Decis. Control, Osaka, Japan, Dec. 2015, pp. 5432–5437.
[32] P. A. Ioannou and J. Sun, Robust Adaptive Control. Englewood Cliffs, NJ, USA: Prentice-Hall, 1996.
[33] C. Yuan and C. Wang, "Persistency of excitation and performance of deterministic learning," Syst. Control Lett., vol. 60, no. 12, pp. 952–959, 2011.
[34] C. Yuan and C. Wang, "Performance of deterministic learning in noisy environments," Neurocomputing, vol. 78, no. 1, pp. 72–82, 2012.
[35] Z. Peng, J. Wang, and D. Wang, "Distributed maneuvering of autonomous surface vehicles based on neurodynamic optimization and fuzzy approximation," IEEE Trans. Control Syst. Technol., to be published, doi: 10.1109/TCST.2017.2699167.
[36] Z. Peng, D. Wang, Y. Shi, H. Wang, and W. Wang, "Containment control of networked autonomous underwater vehicles with model uncertainty and ocean disturbances guided by multiple leaders," Inf. Sci., vol. 316, pp. 163–179, Sep. 2015.
[37] R. Skjetne, T. I. Fossen, and P. V. Kokotović, "Adaptive maneuvering, with experiments, for a model ship in a marine control laboratory," Automatica, vol. 41, no. 2, pp. 289–298, 2005.
[38] C. Yuan, "Distributed adaptive switching consensus control of heterogeneous multi-agent systems with switched leader dynamics," Nonlin. Anal. Hybrid Syst., vol. 26, pp. 274–283, Nov. 2017.

Chengzhi Yuan (M’14) received the B.S. and M.S. degrees in control theory and applications from the South China University of Technology, Guangzhou, China, in 2009 and 2012, respectively, and the Ph.D. degree in mechanical engineering from North Carolina State University, Raleigh, NC, USA, in 2016. He is currently an Assistant Professor with the Mechanical, Industrial, and Systems Engineering Department, University of Rhode Island, Kingston, RI, USA. He has authored and co-authored over 40 journal articles and conference papers. His current research interests include dynamic systems and control theory, with particular focuses on cooperative intelligent learning systems, multirobot distributed control, hybrid systems, switching control, and robust analysis. Dr. Yuan has served extensively as an associate editor, chair, co-chair, and program committee member for numerous international conferences.


Stephen Licht received the Ph.D. degree in oceanographic and mechanical engineering from the MIT/WHOI Joint Program, Cambridge, MA, USA, in 2008, where he created “Finnegan the RoboTurtle.” He was a Senior Research Scientist with the Maritime Research Group, iRobot, Bedford, MA, USA, and a Senior Robotics Engineer with Vecna Robotics, Cambridge, MA, USA. He is currently an Assistant Professor with the University of Rhode Island, Kingston, RI, USA, where he is the Director of the Robotics Laboratory for Complex Underwater Environments.


Haibo He (SM’11) received the B.S. and M.S. degrees in electrical engineering from the Huazhong University of Science and Technology, Wuhan, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Ohio University, Athens, OH, USA, in 2006. From 2006 to 2009, he was an Assistant Professor with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA. He is currently the Robert Haas Endowed Chair Professor with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA. He has published one sole-author research book (Wiley), edited one book (Wiley-IEEE) and six conference proceedings (Springer), and authored and co-authored over 250 peer-reviewed journal and conference papers. His current research interests include adaptive dynamic programming, computational intelligence, machine learning and data mining, and various applications. Dr. He was a recipient of the IEEE International Conference on Communications Best Paper Award in 2014, the IEEE Computational Intelligence Society Outstanding Early Career Award in 2014, the National Science Foundation CAREER Award in 2011, and the Providence Business News Rising Star Innovator Award in 2011. He served as the General Chair of the IEEE Symposium Series on Computational Intelligence in 2014. He is the Editor-in-Chief of the IEEE Transactions on Neural Networks and Learning Systems.
