
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 54, NO. 1, JANUARY 2009

Identification of Systems With Regime Switching and Unmodeled Dynamics

G. George Yin, Fellow, IEEE, Shaobai Kan, Le Yi Wang, Senior Member, IEEE, and Cheng-Zhong Xu, Senior Member, IEEE

Abstract—This paper is concerned with persistent identification of systems that involve deterministic unmodeled dynamics and stochastic observation disturbances, and whose unknown parameters switch values, possibly with large jumps, in a manner representable by a Markov chain. Two classes of problems are considered. In the first class, the switching parameters are stochastic processes modeled by irreducible and aperiodic Markov chains whose transition rates are sufficiently faster than the adaptation rates of the identification algorithms. In this case, tracking the real-time parameters from output observations becomes impossible, and we show that an averaged behavior of the parameter process can be derived from the stationary measure of the Markov chain and can be estimated with periodic inputs and least-squares type algorithms. Upper and lower error bounds are established that explicitly show the impact of unmodeled dynamics. In contrast, the second class of problems represents systems whose state transitions occur infrequently. An adaptive algorithm with variable step sizes is introduced for tracking the time-varying parameters. Convergence and error bounds are derived, and numerical results are presented to illustrate the performance of the algorithm.

Index Terms—Error bounds, exogenous noise, parameter switching, parameter tracking, persistent identification, unmodeled dynamics.

I. INTRODUCTION

THIS work is concerned with identification of systems with regime-switching parameters represented by discrete-time Markov chains [4], [9]. These may be viewed as hybrid systems, for which there are many established applications such as manufacturing processes, machine learning, wireless networks, and scientific computation, to name just a few. In such hybrid systems, the system dynamics switch among a finite number of discrete event states, and known results in system identification cannot be directly carried over.

Manuscript received April 27, 2007; revised March 24, 2008. Current version published January 14, 2009. This work was supported in part by the National Science Foundation under DMS-0603287, in part by the National Security Agency under Grant MSPF-068-029, in part by a Wayne State University Graduate Research Assistantship, in part by the National Science Foundation under ECS-0329597 and DMS-0624849, and in part by the National Science Foundation under CCF-0611750, CNS-0702488, and DMS-0624849. Recommended by Associate Editor W. X. Zheng.

G. G. Yin is with the Department of Mathematics, Wayne State University, Detroit, MI 48202 USA (e-mail: [email protected]).
S. Kan is with the Department of Mathematics and Computer Science, John Jay College of Criminal Justice, City University of New York (CUNY), New York, NY 10019 USA (e-mail: [email protected]).
L. Y. Wang and C. Z. Xu are with the Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAC.2008.2009487

Unlike time-invariant systems, identification of time-varying systems requires tracking system parameters persistently over extended time intervals, which can be captured by the concept of persistent identification introduced and treated in [32], [33]. The unique requirement of persistent identification is that, rather than establishing identification properties with reference to a fixed starting time, one must consider uniform error bounds over all possible starting times. Identification errors due to noise can be handled with similar algorithms and analysis tools, since they depend primarily on data lengths rather than the initial time; the impact of unmodeled dynamics, however, is fundamentally different in persistent identification problems than in traditional identification formulations. It was shown in [33] that, for linear time-invariant systems, unmodeled dynamics introduce irreducible identification errors that also interact with observation noise. A primary goal of this paper is to extend the results of [32], [33] to parameter switching systems, or hybrid systems. One typical approach for extending results from time-invariant systems to time-varying systems is to consider slowly time-varying systems, in which the system dynamics experience small changes over a small period of time [5], [6], [16], [41]. Similarly, stochastic algorithms are widely used to track a system whose parameters change slowly in their magnitude [1], [3], [31]. In the literature, another well-established approach is to use the Kalman filter, least mean squares, and/or least mean squares with forgetting factors to carry out the tracking tasks; see [12], [13] for example. However, when the system parameters have large jumps, they are no longer "slowly varying" in the traditional sense. Recently, in [19] and [40], Markovian jump processes with possibly large jump sizes were investigated.
Building upon these results, in this paper we concentrate on persistent identification of systems with possibly large parameter jumps and unmodeled dynamics. Time-varying system parameters are modeled by a Markov chain, whose state space and probability transition matrix are unknown. The information on the parameters is limited to the switching frequency of the Markov chain and bounds on the unmodeled dynamics. It should be emphasized that within traditional identification scenarios without unmodeled dynamics, tracking jumping parameters in a Markov chain has been studied in various settings [7], [8], [14], [15], [19], [23], [29], [36], [38]; see also [11]–[13]. However, to the best of our knowledge, investigation of persistent identification problems with unmodeled dynamics has not been pursued in such systems. The system formulation in this paper encompasses three uncertainty components: random observation noises, deterministic worst-case unmodeled dynamics, and stochastic unknown parameters. The worst-case formulation of system identification was introduced in [25],


commonly referred to as "set membership identification," and pursued by many researchers; see [26]–[28] and references therein. The worst-case treatment of uncertainty requires different methodologies from those for stochastic systems, and it is a challenge to handle both in an integrated framework. In [33], a method that employs worst-case probabilities of identification errors was introduced to treat jointly deterministic unmodeled dynamics and random noises. This paper develops new results that provide combined error bounds covering the impact of all three uncertainty components.

A. Problems and Contributions

In this paper, we consider two classes of regime-switching systems: 1) fast switching systems, in which the probability of a parameter staying at its current value is much smaller than that of jumping to a different value; and 2) slow, or infrequent, switching systems, in which the probability of a parameter staying at its current value is near one. In a companion paper [39], Wonham-type nonlinear filtering techniques are used to treat Markovian parameters with moderate variation and nonlinear observations. When parameter switching is relatively fast in comparison with the system dynamics and the estimation convergence speed, parameter tracking becomes intractable, a phenomenon for which an "uncertainty principle" was established in [42]; see also [1]. This paper treats such systems by tracking their average behavior, which is a viable and perhaps inevitable choice. It is a viable practical choice in problems of mobile agent control, distributed control, hierarchical systems, and supervisory and management systems, in which a higher level controller or decision maker utilizes only averaged performance rather than the operational details of subsystems, which may experience much more intense local activity.
It may also be inevitable, since identification with stochastic observation noise relies on some type of data averaging to reduce noise effects and achieve parameter convergence. If, within a small time window, the parameters change values many times, it becomes impossible to separate parameter variations from noise, and consequently the data can only be used to capture the average behavior of the system. For example, when one uses a least squares type procedure for system identification, the convergence of the parameter estimates essentially resembles that of a stochastic approximation algorithm with a diminishing step size. If the Markov chain switches states faster than the estimates converge, then with high probability the identification algorithm estimates the average of the system parameters. For such fast switching systems, we show that certain least squares type algorithms can be used to track the parameter average persistently, from any starting time. Since unmodeled dynamics introduce an irreducible bias in the parameter estimates, we characterize a class of inputs that are suitable for reducing identification errors due to unmodeled dynamics. In addition, we derive lower bounds on identification errors that cannot be further reduced regardless of what inputs are used. It should be emphasized that conditions on inputs stronger than the traditional persistent excitation conditions, used primarily in adaptive control systems, are needed to overcome the impact of unmodeled dynamics in persistent identification problems. Since this paper is confined to open-loop identification problems, inputs are design variables and these conditions are not practical limitations.


In the second class of problems, the states of the Markov chain switch relatively infrequently. This class of systems allows real-time tracking of the parameters with substantial accuracy. To track time-varying parameters, one often uses constant-step-size algorithms; however, selecting the constant step size is a nontrivial matter. In [1], an adaptive step size algorithm was suggested for slowly varying parameters, and the idea was further exploited in [3] with simulation studies. The main rationale is that one constructs a secondary sequence of estimates for the step size. Here we use a similar idea to construct step size adaptation sequences, and an adaptive algorithm with variable step sizes is then introduced for identifying and tracking the time-varying parameters. It is shown that while noise effects can be reduced by averaging, the time-varying nature of the parameters and the unmodeled dynamics introduce tracking biases; explicit error bounds on such biases are established. The key methodology for accommodating unmodeled dynamics is the use of Markov-switched differential inclusions, which, to the best of our knowledge, is new in the treatment of unmodeled dynamics for system identification.

B. Outline of the Paper

The rest of the paper is organized as follows. Section II begins with the formulation of identification problems for regime-switching parameter processes. Section III proceeds with the long-run average behavior of parameter estimates for fast switching systems: convergence and convergence rates are established, and upper and lower bounds on estimation errors are obtained to characterize the fundamental impact of unmodeled dynamics. Section IV considers identification of infrequently switching parameters: an effective adaptive algorithm for tuning variable step sizes is proposed, and the performance of the algorithm is analyzed and demonstrated by a couple of numerical examples. To enhance readability, the proofs of the main results are postponed to an appendix.

II. FORMULATION

Consider a single-input single-output, discrete-time, linear time-varying system

y(k) = Σ_{i=0}^∞ a_i(k) u(k − i) + d(k),  k ≥ k_0,   (1)

where k_0 is the starting time of observations, u is the probing input, d is the observation disturbance, and the a_i(k) are time-varying parameters. In what follows, || · ||_1 and || · ||_2 denote, for a vector (either finite or infinite dimensional), the ℓ1 norm and ℓ2 norm, respectively. The system is BIBO (bounded-input bounded-output) stable, with sup_k Σ_{i=0}^∞ |a_i(k)| < ∞. The probing input is deterministic and can be selected, subject only to a magnitude bound. For a selected model order m_0, the system can be represented by

y(k) = φ'(k) θ(k) + φ̃'(k) θ̃(k) + d(k),   (2)

where z' denotes the transpose of z,

φ(k) = (u(k), …, u(k − m_0 + 1))',  θ(k) = (a_0(k), …, a_{m_0 − 1}(k))'

for the modeled part, and

φ̃(k) = (u(k − m_0), u(k − m_0 − 1), …)',  θ̃(k) = (a_{m_0}(k), a_{m_0 + 1}(k), …)'.

Note that θ(k) is an m_0-dimensional vector, and θ̃(k) is an infinite-dimensional vector associated with the unmodeled dynamics.
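As a concrete illustration of representation (2), the following sketch simulates observations from a regime-switching FIR system, with a truncated decaying tail standing in for the unmodeled dynamics. All numerical values (model order, regime parameters, tail decay, noise level) are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m0 = 4        # model order (hypothetical)
tail = 20     # truncation length emulating the unmodeled-dynamics tail
eta = 0.05    # uniform bound on the unmodeled dynamics (hypothetical)
sigma = 0.1   # noise standard deviation (hypothetical)

# Two parameter regimes for theta(k) (illustrative values).
theta_states = np.array([[1.0, 0.5, -0.3, 0.2],
                         [-0.8, 0.4, 0.6, -0.1]])
# Decaying tail coefficients, scaled so the l1 norm equals eta.
theta_tilde = 0.5 ** np.arange(1, tail + 1)
theta_tilde *= eta / theta_tilde.sum()

def observe(u, regime):
    """y(k) = phi'(k) theta(k) + phi_tilde'(k) theta_tilde + d(k), cf. (2)."""
    N = len(u)
    y = np.zeros(N)
    for k in range(N):
        phi = np.array([u[k - i] if k >= i else 0.0 for i in range(m0)])
        phi_t = np.array([u[k - m0 - i] if k >= m0 + i else 0.0
                          for i in range(tail)])
        y[k] = (phi @ theta_states[regime[k]] + phi_t @ theta_tilde
                + sigma * rng.standard_normal())
    return y

u = rng.choice([-1.0, 1.0], size=500)       # bounded probing input
regime = rng.integers(0, 2, size=500)       # switching parameter process
y = observe(u, regime)
```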


The information on the unmodeled dynamics is only its uniform bound, namely ||θ̃(k)||_1 ≤ η. For systems with decaying impulse responses, η can be reduced by increasing the model order m_0; since the system is BIBO stable, η → 0 as m_0 → ∞. After applying an input sequence starting at k_0, the output is observed over an interval of observation length N. We will use the following assumptions (A1) and (A2) throughout the paper.

(A1) (1/N) Σ_{k=k_0}^{k_0+N−1} φ(k) φ'(k) → Φ as N → ∞, where Φ is positive definite.

(A2) The disturbance {d(k)} is a sequence of independent and identically distributed (i.i.d.) random variables such that E d(k) = 0, E d²(k) = σ² < ∞, and the moment generating function of d(k) exists.

Remark 2.1: In our setup, the input sequence is deterministic and a design variable. Assumption (A1) requires that the input be selected to possess a certain persistent excitation property. (A1) implies that the limit holds for any initial time k_0, a fact that will be used in subsequent derivations. There are weaker conditions on inputs that do not require Φ to be positive definite, such as those in [12], [13] for adaptive control. However, unlike system tracking in adaptive control systems, where the input signal or its excitation capability may decay to zero as control performance is achieved, persistent identification requires that the input excitation capability be shift invariant, namely, sustained from any starting time. It will become apparent that, to overcome the impact of unmodeled dynamics, Φ needs to be invertible; in this sense, the condition is unique to persistent identification with unmodeled dynamics. Since the input is a design variable in open-loop identification problems, such conditions do not impose difficulties in practical implementations. In fact, Condition (A1) is satisfied by all full-rank m_0-periodic signals. Assumption (A2) concerns the observation noise. The i.i.d. condition is for simplicity; existence of the moment generating function facilitates the derivation of the estimation error bounds in what follows.
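Condition (A1) can be checked numerically for a candidate periodic input. A minimal sketch, using an assumed full-rank 4-periodic probing signal (an arbitrary illustrative choice): form the empirical average of φ(k)φ'(k) and verify positive definiteness, regardless of the starting time.

```python
import numpy as np

m0 = 4
# One period of an m0-periodic input whose cyclic shifts span R^{m0}.
period = np.array([1.0, -1.0, 1.0, 1.0])

def excitation_matrix(u_period, N, k0=0):
    """Empirical (1/N) * sum of phi(k) phi'(k) for the periodic input."""
    mm = len(u_period)
    u = np.tile(u_period, N // mm + 2)
    S = np.zeros((mm, mm))
    for k in range(k0 + mm, k0 + mm + N):   # skip start-up so phi is full
        phi = u[k - mm + 1:k + 1][::-1]     # (u(k), ..., u(k - mm + 1))
        S += np.outer(phi, phi)
    return S / N

Phi = excitation_matrix(period, N=400)
# (A1): Phi is positive definite, independently of the start time k0.
assert np.min(np.linalg.eigvalsh(Phi)) > 0
assert np.allclose(Phi, excitation_matrix(period, N=400, k0=3))
```

For this particular period the limit matrix works out to the identity, since the four cyclic shifts of the period are orthogonal after averaging; any full-rank period yields a positive definite Φ.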
(A2) is satisfied by many random processes, such as stationary Gaussian processes, noises with uniform distributions, etc.

III. IDENTIFICATION OF FAST SWITCHING SYSTEMS: LONG-RUN AVERAGE BEHAVIOR AND ESTIMATION ERROR BOUNDS

We start with the scenario in which the parameter switching is fast relative to the convergence speed of the parameter estimates. Independent of the identification algorithm, fast varying systems cannot be accurately identified from observation data, owing to the uncertainty principle [42]; see also [1]. Instead, a more meaningful problem is to track the average behavior. This problem has important implications and practical utility. Consider, for instance, the problem of networked mobile agents. Communications among a local cluster of agents are often frequent, with fast dynamic variations due to local motions. This detailed information exchange and fast variation of dynamics are, however, of little interest to network coordinators, which monitor and manage information flows and decisions among clusters rather than individual agents; consequently, identification of averaged behavior suffices. Similarly, for a machine or a networked computer that switches frequently among its operating modes due to workload variations, such as "in full operation," "low usage," "standby,"

etc., its daily productivity is the average of its productivity in each mode.

As a motivating example, suppose that there is a continuous-time system to be identified, whose observation is given by y(t) = φ'(t) θ(t) + φ̃'(t) θ̃(t) + d(t), where φ is the modeled part of the input and φ̃ is associated with the unmodeled dynamics. The parameter process θ(·) is a continuous-time Markov chain with a finite state space and generator Q/ε, with ε > 0 a small parameter. When Q is irreducible, within a very short period of time the chain reaches its steady state, that is, the averaged system; in this case, it is impossible to track the instantaneous variation of the process from observation data. The transition matrix P^ε(t) of the Markov chain satisfies the forward equation

dP^ε(t)/dt = P^ε(t) Q/ε.

Denoting a fast-time variable by τ = t/ε and setting P(τ) = P^ε(ετ), P(τ) satisfies

dP(τ)/dτ = P(τ) Q.   (3)

Fix a small δ > 0. Discretizing (3) with step size δ yields P(τ + δ) ≈ P(τ)(I + δQ), where I + δQ becomes the one-step transition matrix of a discrete-time Markov chain, and the k-step transition probabilities are given by (I + δQ)^k. It can be demonstrated that, for the range of δ of interest, the discretized chain inherits irreducibility from Q. To illustrate, consider a two-state continuous-time Markov chain with generator Q/ε. For the discussion, we use a fixed initial state and the time interval [0, 5]; for better visualization, we magnify the interval by a factor of 100. The sample paths demonstrate that the smaller ε is, the more variations the process displays (see Fig. 1). It is impossible to track such a Markovian parameter with reasonable accuracy. Such fast varying parameter processes are the focus of asymptotic studies of time-varying systems [18], [40, Sec. 1.4, pp. 17–18].

System identification is often used for control design [24]. For such fast varying systems, although one cannot keep track of the chain's every move, it is relatively simple to track the average. Using the average, or the limit system, one can proceed with optimal or near-optimal control design: one uses optimal or near-optimal controls of the limit system in the original system and shows that such controls are nearly optimal under suitable conditions, which provides a viable alternative for intractable systems. Note that in our approach, one need know neither the transition probability matrix nor the state space.

A. Long-Run Average Behavior

For the problem treated in this section, in addition to (A1) and (A2), we also assume that the Markov chain is irreducible and aperiodic. This is given in (A3) below.

(A3) Let {θ(k)} be a discrete-time Markov chain with finite state space S = {θ_1, …, θ_s} and transition probability matrix P; the Markov chain is irreducible and aperiodic. Both S and P are unknown, and {θ(k)} and {d(k)} are independent.
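The discretization just described is easy to reproduce. The sketch below uses a hypothetical two-state generator (the paper's specific generator values are not reproduced in this extraction) and simulates the discretized chain with one-step matrix I + (δ/ε)Q for two values of ε; the chain with the smaller ε exhibits many more switches over the same horizon.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-state generator (illustrative); row sums are zero.
Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])

def count_switches(eps, T=5.0, delta=1e-3, x0=0):
    """Simulate the chain with generator Q/eps via P_step = I + (delta/eps) Q."""
    P = np.eye(2) + (delta / eps) * Q
    assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)
    x, jumps = x0, 0
    for _ in range(int(T / delta)):
        x_next = rng.choice(2, p=P[x])
        jumps += int(x_next != x)
        x = x_next
    return jumps

fast, slow = count_switches(eps=0.01), count_switches(eps=0.5)
# The smaller eps is, the more variations the sample path displays.
assert fast > slow
```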


Remark 3.1: Under (A3), as k → ∞, the Markov chain approaches its stationary behavior described by its invariant measure. Denote the stationary distribution of {θ(k)} by ν = (ν_1, …, ν_s). The average behavior of the system parameters may be tracked by using the cost function

e(θ) = lim_{k→∞} E_{k_0} |θ(k) − θ|² = Σ_{i=1}^s ν_i |θ_i − θ|²,   (4)

where E_{k_0} denotes the conditional expectation given the data up to the initial time k_0; in the last line of (4), we have used the well-known ergodicity of the Markov chain (the k-step transition probabilities converge to the stationary distribution). The objective is to choose θ so that (4) is minimized, which leads to the optimal solution

θ* = Σ_{i=1}^s ν_i θ_i.   (5)

Note that in lieu of (4), we could use a long-run average "cost function" defined by the time average of E_{k_0} |θ(k) − θ|². Using this criterion, the Markov chain need not be aperiodic, and irreducibility suffices; moreover, the minimizer (5) remains the same. Since S and P are unknown, θ* cannot be calculated directly. Here, the goal of identifying a fast-varying system is to construct an identification algorithm and select an input so that θ* can be identified. We will employ signals satisfying (A1) as the probing input and focus on the standard least squares (LS) estimation procedure

θ̂_N = (Σ_{k=k_0}^{k_0+N−1} φ(k) φ'(k))^{−1} Σ_{k=k_0}^{k_0+N−1} φ(k) y(k).   (6)

Error bounds will be established under various conditions on the parameter process. It should be emphasized that, by tracking the average behavior of the system, one endures an identification error |θ(k) − θ*| for each k. Using the idea discussed in Remark 3.1, instead of dealing with the aforementioned identification error, we examine the cost of the large-time average (4). For simplicity, we consider N = l m_0 for a positive integer l; that is, the observation length is a multiple of the model order. We denote by U the class of input signals {u : u is m_0-periodic and full rank}; see [33] and references therein for further details.

B. Convergence and Impact of Unmodeled Dynamics

Before proceeding further, we present an auxiliary result for the irreducible Markov chain; we then obtain the convergence of the estimator.

Lemma 3.2: Under (A1) and (A3), the limit relation (7) holds.

Theorem 3.3: Assume (A1)–(A3). In the absence of the unmodeled dynamics, consider the estimator θ̂_N defined by (6). Then (i) θ̂_N → θ* in probability as N → ∞, and (ii) E θ̂_N → θ* as N → ∞. Moreover, the convergence is uniform in the starting time k_0.

Remark 3.4: Assertion (i) in Theorem 3.3 is essentially a consistency result in the sense of convergence in probability. Assertion (ii) indicates that the LS estimators are asymptotically unbiased. Probability one convergence, as well as the inclusion of unmodeled dynamics, is addressed in the following corollary.

Corollary 3.5: (i) Under the conditions of Theorem 3.3, θ̂_N → θ* w.p.1 as N → ∞. (ii) Assume that the conditions of Theorem 3.3 are satisfied and that the unmodeled dynamics are included in the formulation; in addition, assume (6). Then the perturbed limit (8) holds in probability, with the limit set determined by the unmodeled dynamics bound η and the matrix Φ defined in (A1).

To obtain w.p.1 convergence, note that since the Markov chain is irreducible and aperiodic, it is a φ-mixing sequence with exponential mixing rate [2, p. 167]; thus it is ergodic [17, p. 488]. This ergodicity, together with the boundedness of our input sequence, leads to the w.p.1 convergence of the parameter estimates. For the inclusion of unmodeled dynamics, the assertion follows by virtue of (8) and (A1); the rest of the proof essentially follows from Theorem 3.3 with slight modifications, and the details are thus omitted.

In addition to the above consistency results, we can also obtain a central limit theorem. To proceed, consider the suitably scaled and centered estimation error vector defined in (9). We proceed to obtain a functional central limit theorem for it.

Lemma 3.6: Under (A3), the following assertions hold. (i) The partial-sum process associated with (9) converges weakly to a Brownian motion whose mean is zero and whose covariance function is given by (10), in which appear the diagonal matrix of stationary probabilities and the solution of an associated system of equations. (ii) For each fixed direction, the corresponding scalar process converges weakly to a one-dimensional Brownian motion with mean 0 and variance determined by (10).
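A minimal numerical check of the convergence results above, with hypothetical regimes, transition matrix, and noise level: the least squares estimate (6), driven by a persistently exciting input, converges to the ν-weighted average θ* of (5) rather than to either individual regime.

```python
import numpy as np

rng = np.random.default_rng(2)
m0 = 2
# Two parameter regimes (illustrative) and a fast-switching transition matrix.
theta_states = np.array([[1.0, -0.5], [-1.0, 0.5]])
P = np.array([[0.1, 0.9], [0.8, 0.2]])    # staying probabilities are small

# Stationary distribution nu: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
nu = np.real(V[:, np.argmin(np.abs(w - 1))])
nu /= nu.sum()
theta_star = nu @ theta_states            # optimal solution, cf. (5)

N = 100_000
u = rng.choice([-1.0, 1.0], size=N + m0)  # persistently exciting input
x = 0
S = np.zeros((m0, m0))
b = np.zeros(m0)
for k in range(m0, N + m0):
    phi = u[k - m0 + 1:k + 1][::-1]
    y = phi @ theta_states[x] + 0.1 * rng.standard_normal()
    S += np.outer(phi, phi)
    b += phi * y
    x = rng.choice(2, p=P[x])             # fast parameter switching
theta_hat = np.linalg.solve(S, b)         # least squares estimate, cf. (6)

assert np.linalg.norm(theta_hat - theta_star) < 0.05
```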


C. Upper Bounds on Estimation Errors

Continuing our asymptotic study, we aim to evaluate the quality of the estimator. The investigation focuses on the estimation errors θ̂_N − θ*. We first present an auxiliary result, proceed with the desired upper bounds on estimation errors, and end with the selection of the sample size needed to achieve a desired accuracy.

Lemma 3.7: Under assumptions (A1)–(A3), for each fixed tolerance, the error probability bound (11) holds, where the constant in (12) involves the quantities defined in Lemma 3.6 (ii).

In what follows, let g(·) denote the moment generating function of the noise, which exists by virtue of (A2), and recall that η denotes the "size" of the unmodeled dynamics. We proceed to prove the following result, which includes the impact of unmodeled dynamics, in a worst-case sense over all possible unmodeled dynamics bounded by η, on the probabilities of identification errors. Together with the lower bounds established in Theorem 3.13, these results jointly characterize the impact of the uncertainties from random noise, worst-case unmodeled dynamics, and stochastic unknown parameters.

Theorem 3.8: Assume (A1)–(A3). Suppose η > 0 and the identification algorithm is the least squares estimation (6). Let the tolerance sequence tend to 0 as N → ∞. Then for each k_0, the upper bound (13) holds as N → ∞.

In system identification, we often want to answer the following important question: how large should the sample size of the observations be so that the estimation error will be within a desired tolerance level (or confidence level)? To proceed, we introduce, for each observation length N, the worst-case error probability over all unmodeled dynamics bounded by η and over the class LS of least squares algorithms.

Corollary 3.9: Assume (A1)–(A3) and u ∈ U. For each tolerance, an upper bound on the worst-case error probability is given by (14).

1) Example 3.10: Suppose that the common distribution of the noise is normal with mean zero and variance σ², where σ² < ∞. Then its moment generating function is given by g(t) = exp(σ² t² / 2). It follows that the bound (15) holds; furthermore, we obtain (16).

Remark 3.11: In view of the above example, it is seen that the bound given in Corollary 3.9 can be further written as in (16). To proceed, similar to [33], one enlarges the class of algorithms beyond least squares and defines the corresponding worst-case error probability. Since a larger class of algorithms is allowed, the enlarged worst-case probability is no larger, and thus any upper bound for the least squares class is also an upper bound for the enlarged class. We next obtain lower probability bounds for the estimation errors.

D. Lower Bounds

Recall that the estimation error θ̂_N − θ* can be decomposed into three parts: the error due to the unmodeled dynamics, the error due to the stochastic disturbances, and the error attributable to the Markov chain. With the quantities defined similarly to [33], we obtain the following lemma; with slight modifications (owing to the presence of the Markov chain), the proof of (i) is similar to [33, Lemma 2] and the proof of (ii) is similar to [33, Lemma 3], and the details are omitted.

Lemma 3.12: (i) Under the stated conditions on the tolerances and the noise,


then the lower bound in assertion (i) holds; moreover, (ii) there exists a positive constant such that the corresponding lower bound holds with the stated constants.

Theorem 3.13: Assume (A1)–(A3). Then the lower bound (17) holds, where the constant is determined by the noise distribution and the unmodeled dynamics bound η.

Corollary 3.14: If, in addition to the conditions of Theorem 3.13, the observation noise follows a normal distribution with mean 0 and variance σ², then the explicit lower bound (18) holds.

E. A Remark on Moving Average Observation Noise

So far, the developments have been confined to independent and identically distributed observation noise, or white noise. In fact, much of the development can be extended to correlated random disturbances of moving average type. To proceed, let {w(k)} be a sequence of independent and identically distributed random variables with 0 mean and finite variance, and redefine the noise sequence as a moving average d(k) = Σ_j c_j w(k − j) for some constants c_j. Clearly, d(k) has zero mean and finite variance. Following a similar approach as in Theorem 3.8, Corollary 3.9, Theorem 3.13, and Corollary 3.14, we obtain the following results; the main steps are essentially the same, and only minor modifications are needed.

Theorem 3.15: Assume that (A1)–(A3) hold with the white noise replaced by the moving average noise. The following assertions hold: (i) Theorem 3.8 and Corollary 3.9 continue to hold; (ii) under the conditions of Theorem 3.13 and Corollary 3.14, with the noise variance replaced by that of the moving average noise, the conclusions of Theorem 3.13 and Corollary 3.14 continue to hold.

IV. MARKOVIAN PARAMETERS WITH INFREQUENT SWITCHING

In the previous sections, we considered the cases of time-varying parameters when the transition probability matrix is irreducible and aperiodic. This section is devoted to the study of tracking properties for time-varying Markovian parameters whose states jump with possibly large sizes but infrequently. The idea of "infrequent switching" in a Markov chain can be captured concretely by a transition matrix of the form

P^ε = I + ε Q,   (19)

where ε > 0 is a small number and the row sums of Q are all equal to zero. The scenario of infrequent switching is an important alternative to slowly varying systems, in which the system parameters vary slowly in terms of their sizes. In particular, we will show that the introduction of unmodeled dynamics results in an uncertain dynamic equation for the parameter estimates in the form of a differential inclusion. Error bounds for such differential inclusions are established [35]. These bounds indicate that unmodeled dynamics render a nontrivial differential inclusion, hence leading to an irreducible estimation bias described by an uncertainty set (a residue set); convergence to the residue set can be achieved when the rate of switching becomes small.

A. Unmodeled Dynamics and Differential Inclusion Formulation: Fixed Step Size Algorithms

To construct effective recursive identification algorithms for tracking parameter variations, appropriate selection of the step size is the key issue and a main challenge; it depends on the dynamics of the underlying system, the tracking algorithm, and the characteristics of the time-varying parameters. In what follows, suppose that the true parameter process {θ(k)} is a discrete-time Markov chain in which the jumps occur infrequently, in the sense of the transition matrix given in (19) with a small ε. The system is still given by (2). To track the Markov chain with a constant step size μ, we have an algorithm of the form

θ̂(k + 1) = θ̂(k) + μ φ(k) [y(k) − φ'(k) θ̂(k)].   (20)

As far as the step size selection is concerned, there are three possible choices: (i) ε ≪ μ, (ii) ε ≫ μ (for example, ε = μ^κ for some 0 < κ < 1), and (iii) ε = O(μ). In the first case, the Markov chain changes very slowly; hence, one has a case of near-constant parameters, which can be handled as in [37]. In the second case, the Markov chain is relatively fast varying, and the results in Section III can be used. As for the third case, far-reaching results can be obtained; without loss of generality, take ε = μ. Then we have the following result. Compared with the results in [36], a nonzero "bias" term is added due to the unmodeled dynamics. Furthermore, since not much is known about the unmodeled dynamics other than that its size is of the order η, the limit dynamic system is not a differential equation but a differential inclusion with switching.

Proposition 4.1: Suppose (A1)–(A3) hold and the step size is selected so that μ = ε. Suppose that (A1) is modified as follows: for any k_0, the averaged limit (21) holds, where the limit set is a set centered at 0 whose diameter (in the ℓ2 norm) is of the order of the unmodeled dynamics bound. Then the following assertions hold.


(i) Choose arbitrary initial conditions and let θ̂^ε(·) and θ^ε(·) be the piecewise constant interpolations of {θ̂(k)} and {θ(k)}, respectively, on intervals of length ε. Then, as ε → 0, θ̂^ε(·) converges weakly to a limit that is a solution of the switching differential inclusion (22), in which the modulating process is a continuous-time Markov chain with the generator Q given in (19).

(ii) There is a k_ε such that for all k ≥ k_ε, the mean squares error bound (23) holds.

(iii) When the unmodeled dynamics are absent, define the suitably scaled tracking error sequence and its piecewise constant interpolation. Then, as ε → 0, the interpolation converges weakly to a limit process that is a solution of the corresponding limit equation,

where $\sigma^2$ is the variance of the noise given in (A2), $I$ is the identity matrix, and $w(\cdot)$ is an $r$-dimensional standard Brownian motion.

Remark 4.2: It is worthwhile to mention some novelty of the above results.

(i) First, in lieu of an ordinary differential equation, (22) is a differential inclusion. The set appearing in (22) may be viewed as the set of "uncertainty" due to the unmodeled dynamics. Concerning the initial conditions, for simplicity, we choose them to be independent of $\varepsilon$. For fuller generality, we may assume $\theta^\varepsilon(0)$ and $\alpha^\varepsilon(0)$ to be $\varepsilon$-dependent; in this case, we need the condition that $(\theta^\varepsilon(0),\alpha^\varepsilon(0))$ converges weakly to $(\theta(0),\alpha(0))$.

(ii) It can be demonstrated, as in [40, Chapter 4], that as $\varepsilon\to 0$, $\alpha^\varepsilon(\cdot)$ converges weakly to a continuous-time Markov chain $\alpha(\cdot)$ whose generator is $Q$. We shall use this result in the subsequent development. A distinctive feature of (22) is that it includes the continuous-time Markov chain. In the literature on stochastic approximation (with differential inclusion limits), the limit is normally a non-random differential inclusion. The regime-switching ordinary differential equation limit was first obtained in [38]; here we obtain its differential inclusion counterpart. Thus, (22) is a regime-switching differential inclusion modulated by a continuous-time Markov chain. The Markov chain represents the time-varying and piecewise deterministic nature of the parameter process and is interesting in its own right.

(iii) Equation (23) indicates that, for $k$ sufficiently large, the mean squares tracking error consists of two parts, namely the part due to parameter variations and the part due to unmodeled dynamics. When the unmodeled dynamics are absent, we obtain a mean squares error of order $O(\varepsilon)$ for sufficiently large $k$. Similar comments apply to the assertions concerning the weak convergence of the interpolated processes.

In what follows, our main concern is developing further the tracking capability of adaptive algorithms.

B. Adaptive Step Size Algorithms

To track a time-varying parameter process, one often uses a constant step size algorithm. However, selecting the constant step size is largely an unsettled matter. It is not a priori clear that one constant step size should be preferred over another. Nevertheless, it is apparent that the step size selection significantly impacts tracking performance. An enticing alternative is to use time-varying step-size sequences and to adjust the step sizes adaptively. The origin of the adaptive step-size approach can be traced back to [1]. Such ideas were further exploited in [3] with examples from sonar signal processing. The development of the algorithms in this paper follows the approaches of [21] and [22]. Adaptive step-size algorithms have been successfully used in spreading code optimization and adaptation in CDMA wireless networks [19], and in blind interference suppression in signal processing problems [20].

The rationale of the adaptive step-size algorithm is as follows: in addition to the primary tracking algorithm, one constructs a secondary stochastic approximation algorithm to approximate the "best" step size. As the number of iterations grows, the step-size-adaptation sequence approaches the best step size, which enhances the performance of the main tracking task. Further discussions can be found in [21] and [22, Section 3.2].

To proceed, we present an adaptive-step-size algorithm. The discussion is carried out by considering the error sequence as a function of the step size fixed at a value $\varepsilon$. For motivational purposes, we first give a formal discussion; see also [1], [3], [21], [22], among others. For a fixed $\varepsilon$, define the error $e_k(\varepsilon)$. The idea in [1] and [3] is to choose $\varepsilon$ to minimize the stationary value of the mean squares error. Recall that in adaptive filtering, the sequence of outputs and regressors is available; one then wants to choose $\varepsilon$ so that the mean squares error is minimized. As a result, the minimization above is a variation of the traditional adaptive filtering approach.

Formally differentiating (20) with respect to $\varepsilon$ leads to a recursion for the derivative in the mean squares sense. Thus we can construct algorithms that simultaneously track the Markov chain, adaptively estimate the step size, and estimate the mean squares derivative. The reader is referred to [22, Section 3.2] for further details. In what follows, for our identification and tracking tasks, we propose the following algorithm. Let $\varepsilon_k$ and $\theta_k$ denote the step size and the estimate at iteration $k$, respectively. Denote the error sequence by

$e_k = y_k - \varphi_k^\top \theta_k.$    (24)

The adaptive-step-size algorithm is given as follows:

(25)

Fig. 1. Sample paths of α(t) generated with generator Q, for ε = 0.1 and ε = 0.01. (a) Sample path of α(t), ε = 0.1. (b) Sample path of α(t), ε = 0.01.
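Sample paths like those shown in Fig. 1 can be generated with a short simulation. The sketch below assumes the common two-time-scale form P = I + εQ from [40]; the two-state generator Q and the step sizes are illustrative assumptions, since the paper's (19) is not reproduced in this excerpt.

```python
import numpy as np

# Hypothetical two-state generator (rows sum to zero); not the paper's Q in (19).
Q = np.array([[-0.5, 0.5],
              [0.3, -0.3]])

def simulate_chain(eps, n_steps, seed=0):
    """Simulate alpha_k with one-step transition matrix P = I + eps*Q."""
    P = np.eye(2) + eps * Q
    rng = np.random.default_rng(seed)
    alpha = np.empty(n_steps, dtype=int)
    alpha[0] = 0
    for k in range(n_steps - 1):
        alpha[k + 1] = rng.choice(2, p=P[alpha[k]])
    return alpha

# Piecewise-constant interpolation alpha^eps(t) = alpha_k on [k*eps, (k+1)*eps)
eps = 0.01
alpha = simulate_chain(eps, 5000)

def alpha_interp(t):
    return alpha[min(int(t / eps), len(alpha) - 1)]
```

Plotting the interpolated path over a fixed horizon for ε = 0.1 and ε = 0.01 reproduces the qualitative picture of Fig. 1: as ε shrinks, the interpolation approaches a continuous-time chain with generator Q, in line with Remark 4.2 (ii).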

where Π is the projection operator onto $[\varepsilon_-, \varepsilon_+]$, defined by

Π(x) = ε₋ if x ≤ ε₋;  x if ε₋ < x < ε₊;  ε₊ if x ≥ ε₊.

The projection prevents the iterates from leaving the interval $[\varepsilon_-, \varepsilon_+]$; when an iterate escapes, the projection operator forces it back into the bounded region. In (25), the first equation is for tracking the time-varying parameter, the second is for the step size adaptation, and the third is for updating the mean squares derivative. Note that the recursion for the step size in (25) can be written as

(26)

where the additional term is known as a reflection term. That is, the reflection term is the minimal force needed to bring the iterate back to the constraint interval if it ever escapes from there.

To study convergence of the adaptive step size, we use the ODE approach and weak convergence methods detailed in [22]. Take a piecewise constant continuous-time interpolation of the step-size iterates, and define the interpolation of the reflection term analogously. Note that the noise is a sequence of i.i.d. random variables by (A1), the iterates are bounded, and for each fixed step size condition (27) holds. Then we obtain the following result. The proof is essentially a modification of that of [21]; for brevity, we omit the verbatim argument.

Theorem 4.3: Assume that there is a continuous function $\bar g(\cdot)$ such that, for each fixed argument, the averaging condition (28) holds. Then the pair of interpolated processes is tight in $D[0,\infty)$ (the space of functions that are right continuous and have left limits, endowed with the Skorohod topology [22, Chapter 7]), and as the adaptation gain tends to zero, any weakly convergent subsequence has a limit that is a solution of the projected mean ODE associated with (25).

C. Numerical Experiments

Next, we present a couple of examples, including both 1-D and 2-D cases. For simplicity, the unmodeled dynamics are assumed to be absent. These examples demonstrate the performance of the adaptive step size algorithms.

1) Example 4.4: Let the true parameter be a discrete-time Markov chain with a two-element state space, a given initial state, and the transition probability matrix given in (19). The input is taken to be periodic, and the algorithm is run with the indicated initial data.

Fig. 2 consists of two parts. Part (a) shows a typical sample path of the iterates with initial estimate θ₀ = 1, which means that the initial estimate is exactly equal to the true parameter; the figure illustrates that the tracking algorithm tracks the true parameter very well. Part (b) shows a sample path with the wrong initial estimate θ₀ = 5; as illustrated by the figure, the estimate nevertheless catches up with the true parameter very fast. There are only a few jumps among the 1000 iterations in both cases, a consequence of the infrequent switching.
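A setup in the spirit of Example 4.4 can be imitated in a short script. The sketch below pairs an adaptive-step-size LMS recursion patterned after Kushner-Yang-type schemes [21] with an infrequently switching true parameter; the exact update forms, the state space {1, 5}, the jump rate, the noise level, and the input are all illustrative assumptions, not a transcription of the paper's (25) or of Example 4.4's data.

```python
import numpy as np

def adaptive_step_lms(phi, y, theta0=0.0, mu=1e-3,
                      eps0=0.05, eps_lo=1e-4, eps_hi=0.5):
    """Adaptive-step-size LMS: a primary tracking recursion, a projected
    secondary recursion for the step size, and a recursion for the
    mean-squares-derivative estimate (assumed forms, cf. (25))."""
    n, r = phi.shape
    theta = np.zeros((n, r))
    theta[0] = theta0
    eps, V = eps0, np.zeros(r)
    for k in range(n - 1):
        e = y[k] - phi[k] @ theta[k]          # error sequence, cf. (24)
        eps = np.clip(eps + mu * e * (phi[k] @ V), eps_lo, eps_hi)  # projection
        V = V - eps * phi[k] * (phi[k] @ V) + phi[k] * e  # derivative estimate
        theta[k + 1] = theta[k] + eps * phi[k] * e        # primary tracking step
    return theta, eps

# Infrequently switching true parameter on the hypothetical state space {1, 5}
rng = np.random.default_rng(1)
n = 1000
s = np.zeros(n, dtype=int)
for k in range(1, n):
    s[k] = 1 - s[k - 1] if rng.random() < 0.005 else s[k - 1]  # rare jumps
theta_true = np.array([1.0, 5.0])[s]

phi = np.ones((n, 1))                          # illustrative scalar regressor
y = theta_true + 0.1 * rng.standard_normal(n)  # noisy observations
theta_hat, eps_final = adaptive_step_lms(phi, y, theta0=5.0)  # wrong initial guess
```

Plotting `theta_hat` against `theta_true` reproduces the qualitative behavior of Fig. 2 (b): despite a deliberately wrong initial estimate, the iterates catch up with the true parameter quickly and then follow its rare jumps.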

Fig. 2. Sample paths of θₖ generated using algorithm (25); horizontal axis: iteration number k; vertical axis: θₖ. (a) Sample path with initial estimate θ₀ = 1. (b) Sample path with initial estimate θ₀ = 5.

Fig. 3. Sample paths of θₖ generated by (25). (a) Sample paths with initial estimate θ₀ = (5, 8)′. (b) Sample paths for the two individual components with initial estimate θ₀ = (5, 8)′.

2) Example 4.5: Let the true parameter be a discrete-time Markov chain taking values in a finite set of two-dimensional vectors, with a given initial state and the transition probability matrix given by (19). Fig. 3 part (a) shows the tracking performance of the adaptive algorithm in the two-dimensional case, and part (b) gives the sample paths for each component of the estimate.

V. FURTHER REMARKS

Identification of systems with regime switching and unmodeled dynamics has been studied. The findings of this paper are of utility in the identification of hybrid systems and discrete-event systems, and are relevant to the diagnosis of faults whose occurrences are typically represented by sudden changes of system dynamics. In this paper, we consider two opposite classes. The first consists of Markov chains whose states change rapidly. Fast switching of the system parameters prevents us from tracking the instantaneous states of the chains; tracking their average states becomes a viable and tractable choice. The second class involves infrequently switching Markov chains, for which an adaptive step size algorithm is developed to track the system parameters.

In comparison with the existing identification literature, this paper presents new results on persistent identification of jump parameters with unmodeled dynamics, representation of switching systems in a Markov chain framework with an underlying differential inclusion, upper and lower bounds of identification errors with unmodeled dynamics for fast switching Markov chains, and adaptive step size algorithms for infrequently switching Markov chains.

Many problems remain open for system identification with regime switching. In this paper, we considered open-loop identification problems, so the input design is at our disposal and is non-random. One may consider the associated closed-loop system in conjunction with the approach provided in [34]. This

paper utilizes IIR model structures. It will be useful to derive similar results for systems in infinite dimensional ARMA or coprime factorization model structures. It is also possible to deal with systems that experience both slowly time-varying parameter variations in normal operating conditions and sudden switching at critical moments; in other words, slowly time-varying hybrid systems. Techniques developed in this paper for switching systems and those in [1], [3], [5], [6], [16], [41] for slowly varying systems may jointly provide a promising foundation for such a pursuit. Another worthwhile undertaking is to use nonlinear filtering techniques to track Markov chain states. Results of our findings for treating binary observations of Markov chains will be reported in a forthcoming paper.

APPENDIX
PROOFS OF RESULTS

Proof of Lemma 3.2: Note that

(29)

Since the sequence is bounded owing to our formulation, by virtue of [40, Chapter 4, p. 74],

(30)

Using (A1), the middle term converges to 0 in probability. Furthermore, noting the irreducibility and aperiodicity of the modulating chain, it is φ-mixing with exponential mixing rate [2, p. 167]. Thus it is ergodic; see [17, p. 488]. Using this ergodicity and the boundedness of our input sequence, we can then verify (32). As for the last term in (29), since the noise is a sequence of i.i.d. random variables with 0 mean and finite variance, the well-known law of large numbers together with (30) implies its convergence to 0 in probability. Combining the above estimates, we obtain the claimed convergence in probability. Taking limits then yields the uniform convergence.

To prove the second part, the first part of the result, the boundedness of the quantities involved, conditions (A1)–(A3), and the dominated convergence theorem yield the convergence of the expected values; moreover, the convergence is uniform. Thus the estimator is asymptotically unbiased. The proof is concluded.

Proof of Lemma 3.6: The proof of (ii) follows from that of (i) by multiplying with the ith standard unit vector. The proof of (i) is a slight variation of the argument in [40, pp. 74–76]. We thus omit the details.

Proof of Lemma 3.7: Note that the stated estimate holds for the leading term. Furthermore, the estimate holds if we take the constant indicated there. The desired result then follows.

Proof of Theorem 3.3: When there are no unmodeled dynamics, we have

(31)

By virtue of Lemma 3.2, the leading terms converge in probability. As a result, to obtain the desired bounds, we need only examine

(33)

with the quantities defined in (12). Recall the 1-dimensional Brownian motion given in Lemma 3.6 (ii). By virtue of the weak convergence in Lemma 3.6 and the Skorohod representation (but without changing notation), we obtain

(34)

For any given constant δ,

(35)

Now, observe that

(36)

For sufficiently large N, using (32), we can make the corresponding term small for almost all sample points. Thus, using (31), the desired result follows.

Proof of Theorem 3.8: Direct computation yields

(37)

where the second term denotes the deterministic part of the identification errors. By selecting the input to be the periodic signals with the prescribed leading components, we obtain the stated representations, where the limiting quantity is given in (12). Define the excursion events of interest. First, let us consider the case of a positive excursion. It is readily seen that

(38)

Minimizing the exponent above with respect to the free parameter yields the optimal choice, and consequently the stated bound. A negative excursion can be dealt with similarly; see [33] for details. Using (37), we have the identification error bound

(39)

Using the relations between the error and the excursion events, we obtain the estimate by the well-known Chernoff bound (see, e.g., [30, p. 326]). Then (39) holds. The theorem is thus proved.
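The Chernoff bound invoked at the end of the proof ([30, p. 326]) can be sanity-checked numerically. The illustration below uses the standard Gaussian tail bound P(Z ≥ a) ≤ exp(−a²/2) with generic values of a, not the paper's specific quantities.

```python
import math

import numpy as np

# Monte Carlo check of the Chernoff bound for a standard Gaussian tail:
# P(Z >= a) <= exp(-a^2 / 2) for a > 0.
rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)
tails = {a: float(np.mean(z >= a)) for a in (1.0, 2.0, 3.0)}   # empirical tails
bounds = {a: math.exp(-a * a / 2.0) for a in (1.0, 2.0, 3.0)}  # Chernoff bounds
```

For each threshold the empirical tail probability sits below the corresponding exponential bound, which is the mechanism that delivers the exponential identification error bounds above.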

Proof of Corollary 3.9: By virtue of Theorem 3.8, it is sufficient that (40) holds. This, in turn, implies the stated inequality. Taking the extremum over all such choices, the desired bound (14) follows.

Proof of Theorem 3.13: It is readily seen that the claimed chain of inequalities holds; from the second inequality to the third, we have used Lemma 3.7. The rest of the proof is similar to [33, Theorem 4].

Proof of Corollary 3.14: We note that the relevant quantity is also normally distributed with mean 0 and the indicated variance. The choice of the design parameter implies the required inequality. Then the desired result follows from Theorem 3.13 and [10, Lemma 2, p. 175].

Proof of Proposition 4.1: The proof of (i) consists of three steps.

Step 1: Rewrite the recursion accordingly. Noting that the modulating process is a finite state Markov chain, it follows from the boundedness of the input and the regressor, and (A2), that the moment estimate holds. Gronwall's inequality then implies the bound (41).

Step 2: Using the boundedness of the input and the regressor, the finite state Markov chain, and (41), it can be verified that the increments are suitably small. Furthermore, the interpolated sequence is tight; see Remark 4.2 (iv) (also [40]). Thus the pair of interpolations is tight (see the tightness criteria in [22, Chapter 7]).

Step 3: By Prohorov's theorem, we can extract a convergent subsequence. With a slight abuse of notation, we still index the subsequence by ε. By the Skorohod representation, we may assume that the sequence converges w.p.1. We then carry out the averaging procedure using the interpolations. Compared with [36], the term involving the unmodeled dynamics needs to be taken care of. We illustrate this by considering the following term. Choose a sequence of integers growing without bound while the corresponding time increments vanish. Then the corresponding error term is asymptotically negligible.

Using (21), detailed calculations yield the desired differential inclusion limit. (A treatment of differential inclusion limits may be found in [22, Section 8.2.5].) To prove (ii), we use a Liapunov function approach as in the proof of [36, Theorem 3.1]. The details are omitted for brevity. To prove (iii), we show that the scaled and normalized sequence is tight and then characterize the limit process through a martingale problem formulation. Since the argument is similar to [36], the verbatim proof is omitted.

Remark on the Proof of Theorem 3.15: Since the proof is similar to those of Theorem 3.8, Corollary 3.9, Theorem 3.13, and Corollary 3.14, we merely mention the modifications needed. To obtain the upper bounds, in lieu of (38), redefine the excursion quantity accordingly. As for the lower bound, consider the corresponding complementary event. The rest of the proof is similar to that of Theorem 3.8, Corollary 3.9, Theorem 3.13, and Corollary 3.14.

REFERENCES

[1] A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations. New York: Springer-Verlag, 1990.
[2] P. Billingsley, Convergence of Probability Measures. New York: Wiley, 1968.
[3] J. M. Brossier, "Egalization adaptive et estimation de phase: Application aux communications sous-marines," Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France, 1992.
[4] H.-F. Chen and L. Guo, Identification and Stochastic Adaptive Control. Boston, MA: Birkhäuser, 1991.
[5] M. Dahleh and M. A. Dahleh, "On slowly time varying systems," Automatica, vol. 27, no. 1, pp. 201–205, Jan. 1991.
[6] C. A. Desoer, "Slowly varying discrete system x_{i+1} = A_i x_i," Electron. Lett., vol. 6, pp. 339–340, 1970.
[7] S. Dey, V. Krishnamurthy, and T. Salmon-Legagneur, "Estimation of Markov modulated time-series via the EM algorithm," IEEE Signal Process. Lett., vol. 1, pp. 153–155, 1994.
[8] Y. Ephraim and N. Merhav, "Hidden Markov processes," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1518–1569, Jun. 2002.
[9] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence. New York: Wiley, 1986.
[10] W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. New York: Wiley, 1968, vol. I.
[11] S. Gunnarsson and L. Ljung, "Frequency domain tracking characteristics of adaptive algorithms," IEEE Trans. Acoust. Speech Signal Process., vol. 37, no. 7, pp. 1072–1089, Jul. 1989.
[12] L. Guo and L. Ljung, "Performance analysis of general tracking algorithms," IEEE Trans. Automat. Control, vol. 40, no. 8, pp. 1388–1402, Aug. 1995.
[13] L. Guo, L. Ljung, and G.-J. Wang, "Necessary and sufficient conditions for stability of LMS," IEEE Trans. Automat. Control, vol. 42, no. 6, pp. 761–770, Jun. 1997.
[14] J. D. Hamilton and R. Susmel, "Autoregressive conditional heteroskedasticity and changes in regime," J. Econometrics, vol. 64, pp. 307–333, 1994.
[15] U. Holst, G. Lindgren, J. Holst, and M. Thuvesholmen, "Recursive estimation in switching autoregressions with a Markov regime," J. Time Series Anal., vol. 15, pp. 489–506, 1994.
[16] E. W. Kamen, P. P. Khargonekar, and A. Tannenbaum, "Control of slowly-varying linear systems," IEEE Trans. Automat. Control, vol. 34, no. 12, pp. 1283–1285, Dec. 1989.
[17] S. Karlin and H. M. Taylor, A First Course in Stochastic Processes, 2nd ed. New York: Academic Press, 1975.
[18] A. N. Kolmogorov, "On some asymptotic characteristics of completely bounded spaces," Dokl. Akad. Nauk SSSR, vol. 108, pp. 385–389, 1956.
[19] V. Krishnamurthy, X. Wang, and G. Yin, "Spreading code optimization and adaptation in CDMA via discrete stochastic approximation," IEEE Trans. Inform. Theory, vol. 50, no. 9, pp. 1927–1949, Sep. 2004.
[20] V. Krishnamurthy, G. Yin, and S. Singh, "Adaptive step size algorithms for blind interference suppression in DS/CDMA systems," IEEE Trans. Signal Process., vol. 49, no. 1, pp. 190–201, Jan. 2001.
[21] H. J. Kushner and J. Yang, "Analysis of adaptive step-size SA algorithms for parameter tracking," IEEE Trans. Automat. Control, vol. 40, no. 8, pp. 1403–1410, Aug. 1995.
[22] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed. New York: Springer-Verlag, 2003.
[23] B. G. Leroux, "Maximum-likelihood estimation for hidden Markov models," Stochastic Process. Appl., vol. 40, pp. 127–143, 1992.
[24] L. Ljung, System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[25] M. Milanese and G. Belforte, "Estimation theory and uncertainty intervals evaluation in the presence of unknown but bounded errors: Linear families of models and estimators," IEEE Trans. Automat. Control, vol. 27, no. 2, pp. 408–414, Apr. 1982.
[26] M. Milanese and R. Tempo, "Optimal algorithms theory for robust estimation and prediction," IEEE Trans. Automat. Control, vol. 30, no. 8, pp. 730–738, Aug. 1985.
[27] M. Milanese and A. Vicino, "Optimal estimation theory for dynamic systems with set membership uncertainty: An overview," Automatica, vol. 27, pp. 997–1009, 1991.
[28] M. Milanese and A. Vicino, "Information-based complexity and nonparametric worst-case system identification," J. Complexity, vol. 9, no. 4, pp. 427–446, Dec. 1993.
[29] T. Rydén, "On recursive estimation for hidden Markov models," Stochastic Process. Appl., vol. 66, pp. 79–96, 1997.
[30] R. J. Serfling, Approximation Theorems of Mathematical Statistics. New York: Wiley, 1980.
[31] V. Solo and X. Kong, Adaptive Signal Processing Algorithms: Stability and Performance. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[32] L. Y. Wang, "Persistent identification of time varying systems," IEEE Trans. Automat. Control, vol. 42, no. 1, pp. 66–82, Jan. 1997.
[33] L. Y. Wang and G. Yin, "Persistent identification of systems with unmodeled dynamics and exogenous disturbances," IEEE Trans. Automat. Control, vol. 45, no. 7, pp. 1246–1256, Jul. 2000.
[34] L. Y. Wang and G. Yin, "Closed-loop persistent identification of linear systems with unmodeled dynamics and stochastic disturbances," Automatica, vol. 38, pp. 1463–1474, 2002.
[35] G. Yin, S. Kan, and L. Y. Wang, "Identification error bounds and asymptotic distributions for systems with structural uncertainties," J. Syst. Sci. Complexity, vol. 19, pp. 22–35, 2006.
[36] G. Yin and V. Krishnamurthy, "Least mean square algorithms with Markov regime switching limit," IEEE Trans. Automat. Control, vol. 50, no. 5, pp. 577–593, May 2005.
[37] G. Yin and V. Krishnamurthy, "LMS algorithms for tracking slow Markov chains with applications to hidden Markov estimation and adaptive multiuser detection," IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2475–2490, Jul. 2005.
[38] G. Yin, V. Krishnamurthy, and C. Ion, "Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization," SIAM J. Optim., vol. 14, pp. 1187–1215, 2004.
[39] G. Yin, L. Y. Wang, and S. B. Kan, "Identification of regime-switching systems with unmodeled dynamics using binary sensors," Automatica, to be published.
[40] G. Yin and Q. Zhang, Discrete-Time Markov Chains: Two-Time-Scale Methods and Applications. New York: Springer, 2005.
[41] G. Zames and L. Y. Wang, "Local-global double algebras for slow H∞ adaptation, Part I: Inversion and stability; Part II: Optimization for stable plants," IEEE Trans. Automat. Control, vol. 36, no. 2, pp. 130–142, Feb. 1991.
[42] G. Zames, L. Lin, and L. Y. Wang, "Fast identification n-widths and uncertainty principles for LTI and slowly varying systems," IEEE Trans. Automat. Control, vol. 39, no. 9, pp. 1827–1838, Sep. 1994.

G. George Yin (F'02) received the B.S. degree in mathematics from the University of Delaware in 1983, and the M.S. degree in electrical engineering and the Ph.D. degree in applied mathematics from Brown University in 1987. He joined the Department of Mathematics, Wayne State University, in 1987, and became a Professor in 1996. He served on the Mathematical Reviews Database Committee, the IFAC Technical Committee on Modeling, Identification and Signal Processing, and various conference program committees. He was the Editor of the SIAM Activity Group on Control and Systems Theory Newsletter, the SIAM Representative to the 34th CDC, Co-Chair of the 1996 AMS-SIAM Summer Seminar in Applied Mathematics, Co-Chair of the 2003 AMS-IMS-SIAM Summer Research Conference on the Mathematics of Finance, Co-Organizer of the 2005 IMA Workshop on Wireless Communications, and Co-Organizer of the 2006 IMA PI Conference. He is an Associate Editor of the SIAM Journal on Control and Optimization and Automatica, was an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL from 1994 to 1998, and serves or has served on the editorial boards of eight other journals.

Shaobai Kan received the B.S. degree in applied mathematics and the M.S. degree in applied statistics from Tongji University, Shanghai, China, in 1999 and 2002, respectively, and the M.A. degree in mathematical statistics and the Ph.D. degree in applied mathematics from Wayne State University, Detroit, MI, in 2007 and 2008, respectively. He joined the Department of Mathematics and Computer Science, John Jay College of Criminal Justice, City University of New York (CUNY), New York, NY, in 2008. His current research interests include estimation and system identification, mathematical statistics, stochastic approximation and optimization, stochastic control, applied probability and stochastic processes, as well as actuarial science and finance.

Le Yi Wang (SM'03) received the Ph.D. degree in electrical engineering from McGill University, Montreal, QC, Canada, in 1990. Since 1990, he has been with Wayne State University, Detroit, MI, where he is currently a Professor in the Department of Electrical and Computer Engineering. He is currently an Editor of the Journal of System Sciences and Complexity and an Associate Editor of the Journal of Control Theory and Applications. His research interests are in the areas of complexity and information, system identification, robust control, H-infinity optimization, time-varying systems, adaptive systems, hybrid and nonlinear systems, and information processing and learning, as well as medical, automotive, communications, and computer applications of control methodologies. Dr. Wang was an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL. He was a keynote speaker at several international conferences. He serves on the IFAC Technical Committee on Modeling, Identification and Signal Processing.

Cheng-Zhong Xu (SM'02) received the B.S. and M.S. degrees from Nanjing University, Nanjing, China, in 1986 and 1989, respectively, and the Ph.D. degree from the University of Hong Kong in 1993. He is a Professor in the Department of Electrical and Computer Engineering, Wayne State University (WSU), Detroit, MI. He is also the Director of the Center for Networked Computing Systems, WSU. His research interests include networked computing systems and applications, in particular scalable and secure Internet services and architecture, scheduling and resource management in distributed, parallel, and embedded systems, and autonomic systems management for highly reliable computing. He has published more than 140 peer-reviewed scientific papers in archival journals and conferences in these areas. He is the author of Scalable and Secure Internet Services and Architecture (London, U.K.: Chapman & Hall/CRC Press, 2005) and co-author of Load Balancing in Parallel Computers: Theory and Practice (Norwell, MA: Kluwer Academic, 1996). Dr. Xu received the President's Award for Excellence in Teaching, WSU, in 2002 and the Career Development Chair Award in 2003. He is a member of the ACM. He serves on the editorial boards of the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, the Journal of Parallel and Distributed Computing, the Journal of Parallel, Emergent, and Distributed Systems, the Journal of Computers and Applications, and the Journal of High Performance Computing and Networking. He has also guest edited special issues for several other journals on network services and security in distributed systems. He has served a number of international conferences and workshops in various capacities, including program chair, general chair, and plenary speaker.
