Information Based Control for State and Parameter Estimation

Luca Scardovi

Thesis for the PhD degree in Electronic and Computer Engineering
March 2005

Supervisors: Prof. R. Zoppoli, Prof. M. Baglietto

University of Genoa
Faculty of Engineering
Department of Communication, Computer and System Sciences
Contents

1 Statement of the problem
  1.1 Basic concepts
    1.1.1 Finite horizon stochastic optimal control
    1.1.2 Infinite Horizon stochastic optimal control
    1.1.3 Optimal Estimation
    1.1.4 The separation principle and the probing action of the control
  1.2 Information measures
    1.2.1 Information theoretic measures
    1.2.2 Fisher Information and Cramer-Rao bound
    1.2.3 On the choice of the information measure
  1.3 The Optimal Probing Control Problem (OPC)
    1.3.1 Finite horizon optimal probing control
    1.3.2 Active Estimation over an a priori unknown time horizon
    1.3.3 Active System Identification
    1.3.4 Deterministic Approaches

2 Active parameter identification
  2.1 Active identification of linear systems
    2.1.1 Problem formulation
    2.1.2 Identifiability analysis
    2.1.3 Entropy Based Active Identification
    2.1.4 Extension to non-linear systems
  2.2 Active identification of switching systems
    2.2.1 Problem formulation
    2.2.2 Observability in the absence of noises
    2.2.3 Observability in the presence of bounded noises
    2.2.4 Simulation Results

3 Active state estimation of nonlinear systems
  3.1 Problem formulation
  3.2 Information measures and sufficient statistic
    3.2.1 Gaussian Sum Filter
    3.2.2 Information content of a Gaussian Sum
  3.3 Approximated problem solution
  3.4 Simulation results
    3.4.1 Localization
    3.4.2 Bearing Only Motion Planning
  3.5 A geometric interpretation

4 Specializations in discrete environments
  4.1 Active Identification of unknown graphs
    4.1.1 Statement of the exploration problem on stochastic graphs
    4.1.2 Applying dynamic programming
    4.1.3 An alternative formulation of the problem
    4.1.4 Exact and approximate value iteration
    4.1.5 Numerical results
  4.2 The robotic exploration problem
    4.2.1 Problem formulation
    4.2.2 Problem solution
    4.2.3 Simulation results

A Renyi entropy of a Normal random variable
B Proofs of the theorems of Chapter 2
C Gaussian Sum Filter Derivatives
  C.1 Weights derivatives
  C.2 Covariance derivatives
  C.3 Mean values derivatives
D Derivatives of the Quadratic Renyi Entropy
Introduction
In this thesis a unified framework for decision making under uncertainty is presented, addressing problems where the objective is "to gather information." The proposed formulation allows the treatment of several important classes of problems, such as optimal control problems where the objective is to acquire information about some variables of interest (see Chapters 2 and 3), as well as decisional problems modelled as finite state Markov chains (see Chapter 4). This is accomplished by considering general dynamic systems defined over arbitrary state and control sets (see Chapter 1). The topic addressed in this dissertation finds its theoretical basis in rather well-consolidated subjects such as Optimal Control Theory, Estimation and Identification Theory, Information Theory, Optimal Experiment Design, and Learning Theory; hence, the first chapter presents a brief review of these subjects. Stochastic Optimal Control Theory and Estimation Theory remain two of the most important and difficult research topics in fields as disparate as statistics, mathematics, economics, robotics, and control engineering. However, to the knowledge of the author, the specific topic addressed in this dissertation has not yet been directly addressed in the literature. Some related concepts arise in Optimal Experiment Design (OED) [1] (a branch of mathematical statistics), where the problem is to select a suitable experiment so as to force the measurements to be maximally informative. Such a problem arises, for example, in physics, biology, and chemistry, where complicated and expensive experiments must be set up. The related theory has offered us a source of inspiration, but it is not directly applicable to our class of problems: we deal with dynamic systems, while OED deals with static environments.
Information Theory is a fascinating subject that has grown considerably since Shannon's seminal paper [2]. While it is clear that Shannon was motivated by problems in communication theory, the subject is much more than a subset of communication theory: it has provided fundamental contributions to statistical physics (thermodynamics), computer science (Kolmogorov complexity and algorithmic complexity), and statistical inference (Occam's razor). In Control Engineering, over the past two decades some researchers have tried to link concepts from Information Theory to Control Theory [3, 4]; however, even if some interesting results have been presented, the connection between the two subjects is not yet well established. We deem that the problem faced in this dissertation combines open questions related to various important old and recent works in the literature. Among such issues, we cite Feldbaum's adaptive dual control [5] (see also the survey in [6]), entropy-based optimal and approximate adaptive control (see for example [3] and [4]), active learning in control (see, for instance, [7], [8] and [9]), and approximate techniques for solving stochastic functional optimization problems (e.g., the Extended Ritz Method [10]). In this thesis we propose a new formulation of the Stochastic Optimal Control Problem that we shall call the Optimal Probing Control Problem (OPC). It is a stochastic optimal control problem in which a suitable measure of uncertainty is added to "weight" the probing effect of the control law: the more the measure of uncertainty is weighted, the more informative the resulting control law. We introduce the problem in a very general way (see Chapter 1) and specialize it in the following chapters to treat particular cases. To introduce the reader to the subject and to underline the broad class of problems that can be faced with the proposed techniques, we present some examples.

Robotic exploration problem. Consider an autonomous Decision Maker (DM) that must explore an unknown (or partially unknown) environment by means of its sensors. To rapidly obtain a map of the surroundings, the DM must move towards the unknown parts of the environment. The problem of finding a feedback control law that drives the DM to effectively build a map of the surroundings is called the exploration problem (see Section 4.2).

Network exploration. Let us consider a telecommunication network in which some links can be broken. An interesting problem consists in updating the map of the network by exploring
it (by means of "intelligent" tokens) (see Section 4.1).

Bearing only motion estimation. The problem is to estimate a target position by means of a sequence of noisy bearing measurements, typically acquired by a sensor mounted on board a moving observer. This is a typical example where the control strategy of the observer influences the observability of the system: the inherent nonlinearity of the estimation problem makes the observer maneuvers fundamental for properly estimating the target position (see Chapter 3).

Active identification of a robot arm. The problem is to estimate some parameters of interest (for example, those describing the dynamics) of a robot arm. It is worth noting that exploring by generating random motor torques cannot be expected to give good coverage of the domain; generating control actions such that the measurements are maximally informative is preferable.

All the above examples have roughly the same objective: to make decisions such that the maximum information about a variable (or a set of variables) of interest is gained. Even if every single problem has its own particularities, a general framework for a broad class of problems can be obtained by adding a suitable uncertainty measure to the cost to be minimized, generalizing "standard" stochastic optimal control problems. The dissertation is organized as follows. In Chapter 1, the basic concepts of stochastic optimal control theory and information theory are recalled, our approach is detailed, and the general OPC problem formulation is given in quite a general framework; some particular cases of this problem are given as well. In Chapter 2, the problem of parameter identification is considered. In Section 2.1 the attention is focused on LTI systems with an unknown measurement channel; this restriction allows us to carry out an identifiability study of the system, proving that a system is identifiable from the null state if and only if it is completely reachable. Moreover, the structure of the system allows us to express the information measure in a simple form. The problem of finding the most informative control sequence is then addressed. A finite horizon control setting is
considered. In Section 2.2 switching linear systems are considered. In these systems, switching forces a change of configuration among different pairs of matrices describing the dynamic and measurement equations. The current mode of the system can be regarded as an unknown, discrete, time-varying parameter to be estimated. We give a condition for identifiability under the presence of a control action and characterize the set of identifying control sequences; we then extend the result to deal with the presence of bounded disturbances. In Chapter 3, we address the problem of actively estimating the state of a general stochastic dynamic system over a finite horizon (FH). We formulate the problem in an information theoretic setting by using the Renyi entropy as a measure of information about the state of the system. This choice is motivated by the possibility of deriving a closed-form expression for the amount of information, thus avoiding the need to resort to nonlinear programming techniques. Unfortunately, the recursive computation of the conditional density function can be accomplished analytically in very few cases, typically under the classical Linear, Quadratic and Gaussian (LQG) hypotheses. Since the conditional probability function is needed to calculate the measure of uncertainty, we resort to an approximation of the conditional probability; to this end, a Gaussian Sum Filtering approach is proposed. The technique adopted to solve the resulting optimal control problem consists in assigning the control law a given structure, with a certain number of parameters to be determined in order to minimize the cost function; as the fixed structure, multilayer feedforward neural networks are chosen. Finally, simulation results are given to show the effectiveness of the proposed approach (in particular, the Bearing Only Motion Problem is addressed). In Chapter 4, the treatment is particularized to discrete environments. In Section 4.1 the problem of exploring an a priori partially unknown graph is addressed. The problem is formulated in a very general setting, where the DM sensors are affected by noise. Moreover, under a particular assumption, it is possible to consider a model that is discrete in time and space and, by using the concept of entropy, to reformulate the problem as a stochastic shortest path problem. By exploiting the concept of "frontier nodes," a formulation equivalent to the original problem is given, for which every policy is "proper" and the Dynamic Programming value iteration algorithm converges in a finite number of steps. The complexity of the problem leads us to consider techniques similar to Neuro-Dynamic Programming, whereby the original functional
optimization problem is reduced to a nonlinear programming problem, consisting in selecting the optimal values of the "free" parameters of the neural approximators. In Section 4.2 we address the robotic exploration problem, i.e., the problem of exploring an unknown environment with one or more Decision Makers. We devise an exploration strategy (theoretically founded on the general setting of the thesis) in which entropy is used to quantify the information gain obtained during the exploration process, making the DMs move toward the places where information is most uncertain. Simulation results show the effectiveness of the approach.
Chapter 1
Statement of the problem

1.1 Basic concepts
This chapter sets the stage for the remainder of this dissertation. In the Introduction, some examples of control problems were presented in which the objective of the Decision Maker is to gather information about some quantity; in this chapter a formalization of this class of problems is presented. To give an intuitive idea of the problem we want to address, we first state it informally.

Problem: Given a stochastic dynamic system observed through a noisy measurement channel, find a control law such that the expected information gain about some variables of interest is maximized.

In the first part of this chapter we recall some concepts from optimal control theory and information theory. With these concepts in mind we shall arrive at a generalization of the stochastic optimal control problem that allows us to treat problems where the objective (or part of it) is to gather information. We shall focus on discrete-time control problems. Let us consider a dynamic system described by the state equation

\[
x_{t+1} = f_t(x_t, u_t, \xi_t), \qquad t = 0, 1, \ldots \tag{1.1}
\]
where $x_t \in X_t$ denotes the state of the system, $u_t \in U_t$ the control input, and $\xi_t \in W_t$ a process disturbance. The state vector is observed through noisy measurements given by

\[
y_t = h_t(x_t, \eta_t), \qquad t = 0, 1, \ldots \tag{1.2}
\]

[Figure 1.1: The closed loop structure of the stochastic control problem.]
where $y_t \in Y_t$ is the observation and $\eta_t \in V_t$ is a disturbance on the measurement equation. The above characterization of a dynamic system is extremely general; the structure of the sets to which the variables belong is not specified for the moment. In Chapters 2 and 3 we shall consider finite-dimensional vector spaces (e.g., $X_t = \mathbb{R}^n$), while in Chapter 4 countable sets will be adopted (this is needed to model, for example, Markov Decision Processes in discrete state spaces). The noises $\xi_t$ and $\eta_t$ can be random system and observation disturbances characterized by given probability measures, or unknown deterministic variables whose statistics are not available. In the former case they are characterized by $p_{\xi_t}(\cdot \mid x_t, u_t)$, $t = 0, 1, \ldots$, which may depend explicitly on the current state $x_t$ and control $u_t$ but not on prior observation and system disturbances, and by $p_{\eta_t}(\cdot \mid x_t, u_{t-1})$, $t = 1, 2, \ldots$, which may depend explicitly on $x_t$ and $u_{t-1}$ but not on prior observation and system disturbances. In the latter case, a typical situation is when the sets $W_t$ and $V_t$ are compact. The initial state $x_0$ is often considered a random variable, in this case characterized by a given probability measure $P_{x_0}(\cdot)$. The information vector consists of the collection of all past measurements and control vectors, i.e.,

\[
I_t = [y_0, y_1, \ldots, y_t, u_0, u_1, \ldots, u_{t-1}], \qquad t = 1, 2, \ldots \tag{1.3}
\]
\[
I_0 = y_0. \tag{1.4}
\]
At each stage $t = 0, 1, \ldots$, the control vector is chosen by the DM on the basis of its information vector $I_t$, i.e., by means of a control function $\gamma_t$ that maps the information vector $I_t$ into the control set $U_t$:

\[
u_t = \gamma_t(I_t), \qquad t = 0, 1, \ldots
\]

We define a control law (or policy) as a sequence of control functions, i.e., $\gamma \triangleq \{\gamma_0, \gamma_1, \ldots\}$. The control scheme is presented in Fig. 1.1. The system can operate over a finite or an infinite number of stages; in the present dissertation both the finite horizon (FH) and the infinite horizon (IH) cases are considered. In the next section we recall some concepts from optimal control theory.
1.1.1 Finite horizon stochastic optimal control
Let us suppose the objective of the DM is to control the dynamic system over a finite horizon $T$. A classical formulation of the stochastic optimal control problem (see for example [11]) is the following:

Problem 1.1 Find a control law $\gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_{T-1}\}$ that minimizes the cost functional

\[
J_\gamma = \mathop{\mathbb{E}}_{\substack{x_0,\, \xi_k,\, \eta_k \\ k = 0, 1, \ldots, T-1}} \left\{ \sum_{t=0}^{T-1} g_t\bigl(x_t, \gamma_t(I_t), \xi_t\bigr) + g_T(x_T) \right\} \tag{1.5}
\]

subject to the system equation (1.1) and the measurement equation (1.2). The real-valued functions

\[
g_T : X_T \to \mathbb{R}, \qquad g_t : X_t \times U_t \times W_t \to \mathbb{R}, \quad t = 0, 1, \ldots, T-1,
\]

are given per-stage cost functions.

Problem 1.1 is in the form of a constrained minimization problem. It is worth noting that the above problem does not allow one to model real-world problems in which the DM objective is not only to control the system state but also to gather information about it. As shall be shown in Section 1.3, a new formulation of the stochastic optimal control problem will allow us to generalize Problem 1.1 in order to deal with such problems.
Following [11], we apply the Dynamic Programming algorithm to the problem. The principle of optimality enables us to assign the information vector $I_t$ a role identical to the one played by the state vector $x_t$ when the state is perfectly accessible. The information vector evolves according to the equation

\[
I_{t+1} = \mathrm{col}(I_t, u_t, y_{t+1}), \qquad t = 0, 1, \ldots, T-2, \qquad I_0 = y_0. \tag{1.6}
\]

In (1.6), $I_t$ is the system state, $u_t$ is the control vector, and $y_{t+1}$ can be viewed as a random noise. Note that the cost per stage can be reformulated as a function of the variables of the new dynamic system (1.6). By letting

\[
\bar g_t(I_t, u_t) \triangleq \mathop{\mathbb{E}}_{x_t, \xi_t} \left\{ g_t(x_t, u_t, \xi_t) \mid I_t \right\}, \qquad t = 0, 1, \ldots, T-1,
\]
\[
\bar J_{t+1}(I_t, u_t) \triangleq \mathop{\mathbb{E}}_{y_{t+1}} \left\{ J^{\circ}_{t+1}\bigl(\mathrm{col}(I_t, u_t, y_{t+1})\bigr) \right\}, \qquad t = 0, 1, \ldots, T-2,
\]
\[
\bar g_T(I_T) \triangleq \mathop{\mathbb{E}}_{x_T} \left\{ g_T(x_T) \mid I_T \right\},
\]
\[
\bar g_T(I_{T-1}, u_{T-1}) \triangleq \mathop{\mathbb{E}}_{y_T} \left\{ \bar g_T\bigl(\mathrm{col}(I_{T-1}, u_{T-1}, y_T)\bigr) \right\},
\]

the Dynamic Programming equations take the form

\[
J^{\circ}_{T-1}(I_{T-1}) = \min_{u_{T-1}} \left[ \bar g_{T-1}(I_{T-1}, u_{T-1}) + \bar g_T(I_{T-1}, u_{T-1}) \right],
\]
\[
J^{\circ}_t(I_t) = \min_{u_t} \left[ \bar g_t(I_t, u_t) + \bar J_{t+1}(I_t, u_t) \right], \qquad t = 0, 1, \ldots, T-2.
\]
It is worth noting that it is possible to reformulate the stochastic optimal control problem by using the concept of sufficient statistic (see [11] for details). While many different functions can be shown to constitute a sufficient statistic for the problem we are considering, we shall focus attention on a particular one that is useful both from the analytical and the conceptual point of view: the conditional probability measure of the state $x_t$ given the information vector $I_t$,

\[
S_t(I_t) = p(x_t \mid I_t), \qquad t = 0, 1, \ldots, T-1.
\]

It can be easily proven that this is indeed a sufficient statistic [11]. Now $p(x_t \mid I_t)$ is generated recursively in time by the Bayes law and can be viewed as the state of a controlled discrete-time dynamic system,

\[
p(x_{t+1} \mid I_{t+1}) = \Phi_t\bigl(p(x_t \mid I_t), u_t, y_{t+1}\bigr), \qquad t = 0, 1, \ldots, T-1,
\]

where $\Phi_t$ represents the recursive Bayes updating law.
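To make the recursion $\Phi_t$ concrete, the following sketch implements one Bayes update for a finite state space, where a transition kernel and an observation likelihood stand in for $f_t$ and $h_t$; all names and numerical values are illustrative assumptions, not from the thesis.

```python
import numpy as np

def bayes_update(belief, u, y, trans_prob, obs_lik):
    """One step of p(x_{t+1}|I_{t+1}) = Phi_t(p(x_t|I_t), u_t, y_{t+1})
    for a finite state space.

    belief     : p(x_t | I_t), shape (n,)
    trans_prob : trans_prob[u][i, j] = P(x_{t+1}=j | x_t=i, u_t=u)
    obs_lik    : obs_lik(y, j) = p(y | x_{t+1}=j)
    """
    predicted = belief @ trans_prob[u]                 # Chapman-Kolmogorov prediction
    likelihood = np.array([obs_lik(y, j) for j in range(len(predicted))])
    posterior = likelihood * predicted                 # Bayes correction
    return posterior / posterior.sum()                 # normalize

# Toy usage: two states, one control, noisy binary sensor (illustrative numbers).
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
lik = lambda y, j: 0.8 if y == j else 0.2
b1 = bayes_update(np.array([0.5, 0.5]), u=0, y=1, trans_prob=P, obs_lik=lik)
```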
[Figure 1.2: The closed loop structure of the stochastic control problem in the sufficient statistic domain.]
At this point it is possible to reformulate the problem by applying Bellman's optimality principle in the sufficient statistic domain. For the sake of simplicity we assume the noise vectors $\xi_t$ and $\eta_t$ to be independent of the system state and the control action, i.e., $p(\eta_t \mid x_t, u_{t-1}) = p(\eta_t)$ and $p(\xi_t \mid x_t, u_t) = p(\xi_t)$; it can be shown that the derivation carries over under weaker assumptions. The Dynamic Programming equations become:

\[
J^{\circ}_{T-1}\bigl(p(x_{T-1} \mid I_{T-1})\bigr) = \min_{u_{T-1}} \left[ \bar g_{T-1}\bigl(p(x_{T-1} \mid I_{T-1}), u_{T-1}\bigr) + \bar g_T\bigl(p(x_{T-1} \mid I_{T-1}), u_{T-1}\bigr) \right],
\]
\[
J^{\circ}_t\bigl(p(x_t \mid I_t)\bigr) = \min_{u_t} \left[ \bar g_t\bigl(p(x_t \mid I_t), u_t\bigr) + \bar J_{t+1}\bigl(p(x_t \mid I_t), u_t\bigr) \right], \qquad t = 0, 1, \ldots, T-2.
\]
The new control scheme is shown in Fig. 1.2.
1.1.2 Infinite Horizon stochastic optimal control
Infinite horizon problems represent a reasonable approximation of problems involving a finite but very large number of stages. These problems are also interesting because their analysis is elegant and insightful, and the implementation of optimal policies is often simple. They can be divided into three main categories [12]: stochastic shortest path problems, discounted problems, and average cost per stage problems. It is worth noting that in [12] these problems are introduced for systems in which the DM knows the state of the system perfectly, i.e., the noise affects the state equation (1.1) but not the measurement equation (1.2). In this thesis the attention is focused on systems with noisy measurement channels; therefore we have to extend these problems (in particular the stochastic shortest path problem) to cover the general stochastic optimal control problem. To this end, we formulate the following problem in the sufficient statistic domain.

Stochastic shortest path problems: we try to minimize the total cost over an infinite number of stages, given by

\[
J\bigl(p(x_0 \mid I_0)\bigr) = \lim_{T \to \infty} \mathbb{E} \left[ \sum_{t=0}^{T-1} \bar g\bigl(p(x_t \mid I_t), \gamma(p(x_t \mid I_t))\bigr) \right].
\]
We assume that there is a set of final states $\mathcal{P}$ that acts as a cost-free termination set: once the system reaches it, it remains there at no further cost. The structure of the problem is assumed to be such that termination is inevitable, at least under an optimal policy. Thus the objective is to reach a termination state with minimal expected cost. The problem is in effect a FH problem, but the length of the horizon may be random and may be affected by the policy being used. It is worth noting that, since the new state of the system is infinite dimensional (the state is the probability density function $p(x_t \mid I_t)$), the problem is in general very difficult to solve. If the original state of the system $x_t$ belongs to a finite set, then the sufficient statistic $p(x_t \mid I_t)$ is no longer an infinite-dimensional object but belongs to a finite-dimensional space (in particularly simple cases, even to a finite set).
1.1.3 Optimal Estimation
The problem of estimating the state of a stochastic dynamical system is of central importance. Consider the discrete stochastic dynamical system (1.1) and the measurement equation (1.2). Given a realization of the information vector $I_s$, the discrete estimation problem consists of computing an estimate of $x_t$. If $t < s$ the problem is called a discrete smoothing problem; if $t = s$, a discrete filtering problem; if $t > s$, a discrete prediction problem. The problem of estimating a time-invariant parameter is called an identification problem. It is clear that the conditional probability density function of $x_t$ given $I_t$, $p(x_t \mid I_t)$, is the complete solution of the filtering problem. This is simply because
$p(x_t \mid I_t)$ embodies all the statistical information about $x_t$ contained in the available observations and in the initial condition $p(x_0)$ [13]. We are adopting the so-called Bayesian point of view, in that we take into account initial information about $x_t$ summarized by $p(x_0)$. With the a posteriori density at hand, there still remains the question of what the estimate of the state should be. In fact, when the a posteriori density function $p(x_t \mid I_s)$ is available, one may want to "fix" a value for the estimate $\hat x_t$ of the state; possible choices are the argument of the maximum of $p(x_t \mid I_s)$, its mean value, etc. In this thesis we do not address this question: as will be clear in the following, our problem is to find a control law that forces the conditional probability density function to be maximally informative, i.e., it must force the observations to embody the maximum amount of statistical information about $x_t$.
1.1.4 The separation principle and the probing action of the control

[Figure 1.3: The closed loop structure of the LQ stochastic control problem.]
Under the hypothesis that the system (1.1) and the measurement equation (1.2) are linear and the cost is quadratic (the LQ hypothesis), it is well known that the actions of the Decision Maker can be decomposed into the two parts shown in Fig. 1.3: an estimator, which uses the data to generate the conditional expecta-
tion $E\{x_t \mid I_t\}$, and an actuator, which multiplies $E\{x_t \mid I_t\}$ by the gain matrix $L_t$ (the solution of the Riccati equation) and applies the input $u_t = L_t E\{x_t \mid I_t\}$ to the system. Furthermore, the gain matrix is independent of the statistics of the problem and is the same one that would be used in the deterministic problem obtained by fixing the random variables at their expected values. In this case, the solution of Problem 1.1 has the following property: "the estimator portion of the Decision Maker is the minimum mean square estimator (MMSE) assuming that no control takes place, while the actuator portion is an optimal solution of the control problem assuming perfect state estimation prevails" [11]. This property is called the separation principle for linear systems and quadratic criteria. If, in addition, the system and measurement noises are Gaussian, then the MMSE estimator is the well-known Kalman filter. Under general assumptions, however, the separation principle does not hold, and the control law influences the amount of information the estimator can extract from the observations. In other words, the controller can influence the "sharpness" of the conditional probability density function and hence the amount of information about the state that the conditional pdf contains. The control objective can then be twofold: first, to minimize the control cost (for example, to stabilize the system or drive it to a desired reference); second, to maximize the information the Decision Maker gains about the state of the system or about some uncertain parameters. It is worth noting that if the separation theorem holds, the second objective cannot be pursued. The question is the following: is there a stochastic optimal control formulation that takes these two objectives into account? In general the two objectives conflict, and a reasonable choice is to search for a suitable trade-off between the attainment of the control objective and the maximization of the information about the state of the system. At the end of this chapter the generalization of the stochastic optimal control problem will be given. To carry our treatment forward, the concept of information must be formalized; to this end, some concepts from Information Theory are presented in the next section.
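The following minimal scalar sketch illustrates the separation principle under LQG assumptions: the gains $L_t$ come from a deterministic Riccati recursion that ignores the noise statistics, while the estimator is a Kalman filter producing $E\{x_t \mid I_t\}$. All numerical values are illustrative.

```python
import numpy as np

# Scalar LQG sketch: dynamics x' = a x + b u + xi, measurement y = c x + eta,
# stage cost q x^2 + r u^2. Illustrative parameters.
a, b, c = 1.0, 1.0, 1.0
q, r = 1.0, 1.0
sig_xi, sig_eta = 0.1, 0.2       # noise variances
T = 20

# Actuator part: backward Riccati recursion for the LQ gains L_t
# (independent of sig_xi, sig_eta, as the separation principle states).
S, L = q, []
for _ in range(T):
    L.append(-a * b * S / (r + b * b * S))
    S = q + a * a * S - (a * b * S) ** 2 / (r + b * b * S)
L = L[::-1]                      # L[t] is the gain at stage t

# Estimator part: Kalman filter producing E{x_t | I_t}, run in closed loop.
rng = np.random.default_rng(0)
x, xhat, P = rng.normal(0, 1), 0.0, 1.0
for t in range(T):
    u = L[t] * xhat              # actuator: u_t = L_t E{x_t | I_t}
    x = a * x + b * u + rng.normal(0, np.sqrt(sig_xi))
    y = c * x + rng.normal(0, np.sqrt(sig_eta))
    Pp = a * a * P + sig_xi      # predict
    K = Pp * c / (c * c * Pp + sig_eta)
    xhat = a * xhat + b * u + K * (y - c * (a * xhat + b * u))   # correct
    P = (1 - K * c) * Pp
```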
1.2 Information measures
In this section a brief overview of some concepts from Information Theory is given. The treatment is far from complete; the interested reader can find a systematic treatment of Information Theory in [14].
1.2.1 Information theoretic measures
We first introduce the concept of Shannon entropy [2], which is a measure of the uncertainty of a random variable [14]. Let $X$ be a discrete random variable with alphabet $\mathcal{X}$ and probability mass function $p(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$. For notational convenience, in the following we denote the probability function by $p(x)$ rather than $p_X(x)$.

Definition 1.1 The Shannon entropy $H(X)$ of a discrete random variable $X$ is defined by

\[
H(X) \triangleq - \sum_{x \in \mathcal{X}} p(x) \log p(x). \tag{1.7}
\]
If not otherwise specified, the logarithm is to base 2 and the entropy is expressed in "bits." The change of logarithmic base does not change the definition, and the following property allows the entropy to be converted from one base to another by multiplying by an appropriate factor:

\[
H_b(X) = (\log_b a)\, H_a(X).
\]
This definition of entropy is related to the definition of entropy in thermodynamics. It is possible to derive the definition of entropy axiomatically by defining certain properties that the entropy of a random variable must satisfy [2]. The entropy associated to a random variable is a measure of the amount of information required on the average to describe the random variable. An immediate consequence of the definition is the positiveness of the entropy, i.e. H(X) ≥ 0 We now extend the definition to a pair of discrete random variables.
Definition 1.2 The joint Shannon entropy $H(X, Y)$ of a pair of discrete random variables $(X, Y)$ with joint mass function $p(x, y)$ is defined by

\[
H(X, Y) \triangleq - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y). \tag{1.8}
\]
The conditional entropy of a random variable given another is defined as the expected value of the entropies of the conditional mass functions, averaged over the conditioning random variable.

Definition 1.3 If $(X, Y) \sim p(x, y)$, the conditional entropy $H(X \mid Y)$ is defined as

\[
H(X \mid Y) \triangleq \sum_{y \in \mathcal{Y}} p(y) H(X \mid Y = y) = - \mathop{\mathbb{E}}_{X,Y} \{ \log p(x \mid y) \}.
\]

The naturalness of the definitions of joint and conditional entropy is exhibited by the fact that the entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other, as the following theorem states (for a proof see for example [14]).

Theorem 1.1

\[
H(X, Y) = H(Y) + H(X \mid Y).
\]
We now introduce two related concepts: relative entropy (or Kullback-Leibler distance, KLD) and mutual information. The relative entropy between two probability mass functions $p(x)$ and $q(x)$ is a measure of the inefficiency of assuming that the distribution is $q$ when the true distribution is $p$.

Definition 1.4 The Kullback-Leibler distance between two probability mass functions $p(x)$ and $q(x)$ is defined as

\[
D(p \,\|\, q) \triangleq \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} = \mathbb{E}_p \left\{ \log \frac{p(X)}{q(X)} \right\}. \tag{1.9}
\]
The mutual information is a measure of the "amount of information" that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other.

Definition 1.5 Consider two discrete random variables $X$ and $Y$ with joint probability mass function $p(x, y)$ and marginal probability mass functions $p(x)$ and $p(y)$. The mutual information $I(X, Y)$ is the relative entropy between the joint distribution and the product distribution $p(x)p(y)$, i.e.,

\[
I(X, Y) \triangleq \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)p(y)} = D\bigl(p(x, y) \,\|\, p(x)p(y)\bigr). \tag{1.10}
\]
The following theorem establishes the relations between entropy and mutual information.

Theorem 1.2

\[
\begin{aligned}
I(X, Y) &= H(X) - H(X \mid Y),\\
I(X, Y) &= H(Y) - H(Y \mid X),\\
I(X, Y) &= H(X) + H(Y) - H(X, Y),\\
I(X, Y) &= I(Y, X),\\
I(X, X) &= H(X).
\end{aligned}
\]
Another concept of entropy, which extends the Shannon entropy, was introduced by Renyi [15].

Definition 1.6 The Renyi entropy $H_\rho(X)$ of a discrete random variable $X$ is defined by

\[
H_\rho(X) \triangleq \frac{1}{1-\rho} \log \sum_{x \in \mathcal{X}} p^{\rho}(x). \tag{1.11}
\]
It can be shown that $\lim_{\rho \to 1} H_\rho(X) = H(X)$.
We now introduce the concept of differential Shannon entropy [2], i.e., the entropy of a continuous random variable. The differential entropy is similar in many ways to the entropy of a discrete random variable, but there are some important differences, and some care is needed in using the concept [14]. Let $X$ be a continuous random variable with alphabet $\mathcal{X}$ and probability density function $p(x)$, $x \in \mathcal{X}$.

Definition 1.7 The Shannon entropy $H(X)$ of a continuous random variable $X$ is defined by

\[
H(X) \triangleq - \int_{\mathcal{X}} p(x) \log p(x)\, dx. \tag{1.12}
\]
Since the Shannon entropy of a random variable is a functional of the associated probability, in the following we shall write interchangeably $H(X)$ and $H(p)$. The Shannon entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on average to describe it. Another information measure is the Renyi entropy [15]. It is a generalized version of the Shannon entropy and depends on a given parameter $\rho \in \mathbb{R}^+ \setminus \{1\}$.

Definition 1.8 The Renyi entropy $H_\rho(X)$ of a continuous random variable $X$ is defined by

\[
H_\rho(X) \triangleq \frac{1}{1-\rho} \log \int_{\mathcal{X}} p^{\rho}(x)\, dx. \tag{1.13}
\]
It can be shown that $\lim_{\rho \to 1} H_\rho(X) = H(X)$.
We now introduce the Kullback-Leibler distance (KLD), or relative entropy, between two probability density functions $p(x)$ and $q(x)$, which is a measure of the inefficiency of assuming that the distribution is $q$ when the true distribution is $p$.
Definition 1.9 The Kullback-Leibler distance between two probability density functions $p(x)$ and $q(x)$ is defined as

\[
D(p \,\|\, q) \triangleq \int_{\mathcal{X}} p(x) \log \frac{p(x)}{q(x)}\, dx = \mathbb{E}_p \left\{ \log \frac{p(X)}{q(X)} \right\}. \tag{1.14}
\]
Another important concept is the mutual information, which is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other.

Definition 1.10 Consider two random variables $X$ and $Y$ with joint probability density function $p(x, y)$ and marginal probability density functions $p(x)$ and $p(y)$. The mutual information $I(X, Y)$ is the relative entropy between the joint distribution and the product distribution $p(x)p(y)$, i.e.,

\[
I(X, Y) \triangleq \int_{\mathcal{X}} \int_{\mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}\, dx\, dy = D\bigl(p(x, y) \,\|\, p(x)p(y)\bigr). \tag{1.15}
\]
Let us particularize the concept of information gain to our problem. What is the expected information gain the DM achieves when a certain action is performed? To answer this question we have three possible choices: the entropy, the Kullback-Leibler distance, and the mutual information. Even if these three concepts are different, they lead to the same expected value, as stated by the following theorem.

Theorem 1.3 Given two random variables $X$ and $Y$ with joint probability density function $p(x, y)$ and marginal probability density functions $p(x)$ and $p(y)$,

\[
I(Y, X) = H(X) - H(X \mid Y) = \mathop{\mathbb{E}}_{Y} \bigl\{ D\bigl(p(x \mid y) \,\|\, p(x)\bigr) \bigr\}.
\]
Proof: By writing the expected value of the relative entropy we obtain

\[
\begin{aligned}
\mathop{\mathbb{E}}_{Y} \bigl\{ D\bigl(p(x \mid y) \,\|\, p(x)\bigr) \bigr\}
&= \mathop{\mathbb{E}}_{Y} \left\{ \int_x p(x \mid y) \log \frac{p(x \mid y)}{p(x)}\, dx \right\}\\
&= \int_{y,x} p(x \mid y)\, p(y) \log p(x \mid y)\, dx\, dy - \int_{y,x} p(x \mid y)\, p(y) \log p(x)\, dx\, dy\\
&= -H(X \mid Y) + H(X) = I(Y, X).
\end{aligned}
\]
Thanks to this theorem, in the following the entropy difference will often be used as an information gain measure.
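Theorem 1.3 can be checked numerically on a small discrete joint distribution; the sketch below verifies that the entropy difference, the mutual information, and the expected Kullback-Leibler distance coincide (the joint distribution is illustrative):

```python
import numpy as np

# Check of I(Y,X) = H(X) - H(X|Y) = E_Y{ D(p(x|y) || p(x)) } on a 2x2 joint pmf.
pxy = np.array([[0.30, 0.10],      # p(x, y), rows indexed by x, columns by y
                [0.05, 0.55]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))            # entropy (bits)
KL = lambda p, q: np.sum(p[p > 0] * np.log2(p[p > 0] / q[p > 0]))

H_X_given_Y = sum(py[j] * H(pxy[:, j] / py[j]) for j in range(2))
expected_KL = sum(py[j] * KL(pxy[:, j] / py[j], px) for j in range(2))
mutual_info = KL(pxy.flatten(), np.outer(px, py).flatten())

assert np.isclose(H(px) - H_X_given_Y, expected_KL)
assert np.isclose(mutual_info, expected_KL)
```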
1.2.2 Fisher Information and Cramer-Rao bound

Another possible approach to formalizing the information concept is to use a scalar function of the Fisher information matrix as an information measure. The Fisher information matrix is defined as follows. Given a general measurement equation

\[
y = h(x) + \eta,
\]

where $y$ is the measurement vector, $x$ is the state vector, and $\eta$ is a generic noise, the Fisher Information Matrix (FIM) is defined as

\[
M(x) = - \mathop{\mathbb{E}}_{Y} \left[ \frac{\partial^2}{\partial x^2} \log p(y \mid x) \right]. \tag{1.16}
\]

If we have a priori information about the distribution of the state, the definition of the FIM can be extended to take this information into account as well:

\[
M = - \mathop{\mathbb{E}}_{Y,X} \left[ \frac{\partial^2}{\partial x^2} \log p(y, x) \right], \tag{1.17}
\]

where the expectation is taken with respect to $y$ and $x$. We refer to (1.17) as the Bayesian Fisher Information Matrix (BFIM). In the following, without loss of generality, we shall consider the Bayesian setting. A fundamental result, and the most important application of the FIM, is the Cramer-Rao theorem, which gives the lowest possible variance achievable by any unbiased estimator $\hat x(y)$ (see for example [16]).
Theorem 1.4 For every unbiased estimator $\hat x(y)$, if $M$ is not singular, the following relation holds:

\[
\mathrm{Cov}(\hat x) \geq M^{-1},
\]

where the inequality indicates that the matrix $\mathrm{Cov}(\hat x) - M^{-1}$ is nonnegative definite.

If the estimator is biased, a similar result applies.

Theorem 1.5 For every estimator $\hat x(y)$, if $M$ is not singular, assume that $\mathbb{E}\{\hat x\} = \omega(x)$; then

\[
\mathrm{Cov}(\hat x) \geq \left[ \frac{\partial \omega(x)}{\partial x} \right] M^{-1} \left[ \frac{\partial \omega(x)}{\partial x} \right]^{T}.
\]

We now use this bound to define the most efficient estimator.

Definition 1.11 An estimator $\hat x(y)$ is said to be efficient if it meets the Cramer-Rao bound with equality, i.e., if

\[
\mathrm{Cov}(\hat x) = \left[ \frac{\partial \omega(x)}{\partial x} \right] M^{-1} \left[ \frac{\partial \omega(x)}{\partial x} \right]^{T}.
\]

The Fisher information matrix is therefore a measure of the amount of "information" about $x$ that is present in $y$. In fact, it gives a lower bound on the error incurred when estimating $x$ from the data $y$. Since a scalar quantity is needed, the criterion

\[
V(x) = h\!\left( \left[ \frac{\partial \omega(x)}{\partial x} \right] M^{-1} \left[ \frac{\partial \omega(x)}{\partial x} \right]^{T} \right)
\]

is chosen, where $h(Q)$ is a scalar-valued function defined over the set of positive-definite matrices $Q$. The requirement on the function $h(Q)$ is that it must be
monotonically increasing. More specifically, let $Q$ be positive definite and $\Delta Q$ nonnegative definite. Then it is required that $h(Q + \Delta Q) \geq h(Q)$, with equality only for $\Delta Q = 0$. Two possible choices are

1. $h_1(Q) = \mathrm{tr}(SQ)$, where $S$ is a (symmetric) positive definite weighting matrix;

2. $h_2(Q) = \det(Q)$.
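As a concrete illustration, for a linear-Gaussian channel $y = Hx + \eta$, $\eta \sim \mathcal{N}(0, R)$, the FIM (1.16) reduces to $M = H'R^{-1}H$, and the BLUE attains the Cramer-Rao bound of Definition 1.11. The sketch below, with illustrative matrices, checks this empirically:

```python
import numpy as np

# For y = H x + eta, eta ~ N(0, R): M = H' R^{-1} H and the BLUE
# x_hat = M^{-1} H' R^{-1} y has Cov(x_hat) = M^{-1} (bound met with equality).
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
R = np.diag([0.1, 0.2, 0.1])

M = H.T @ np.linalg.inv(R) @ H            # Fisher information matrix
crb = np.linalg.inv(M)                    # Cramer-Rao lower bound

# Empirical covariance of the BLUE over many trials.
rng, x_true = np.random.default_rng(1), np.array([1.0, -2.0])
est = []
for _ in range(20000):
    y = H @ x_true + rng.multivariate_normal(np.zeros(3), R)
    est.append(crb @ H.T @ np.linalg.inv(R) @ y)
emp_cov = np.cov(np.array(est).T)
assert np.allclose(emp_cov, crb, atol=5e-3)   # efficiency of the BLUE
```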
1.2.3 On the choice of the information measure
In our problem we want to use information theoretic concepts to extend the optimal control problem. The choice of the information measure is a challenging subject. As analyzed in the previous subsections, many information measures have been proposed in the literature. The classical information measures, such as the Shannon and Renyi entropies and the related concepts of mutual information and Kullback divergence, have a strong theoretical foundation and an attractive physical meaning; indeed, they are strictly related to the concept of information. Let us consider Problem 1.1; the conditional probability at time $t$, associated with the system state $x_t$, is $p(x_t \mid I_t)$. The information measures associated with the conditional state probability take the following forms. The Shannon entropy is given by

\[
H\bigl(p(x_t \mid I_t)\bigr) = \mathop{\mathbb{E}}_{x_t} \left\{ \log \frac{1}{p(x_t \mid I_t)} \right\}. \tag{1.18}
\]

It is the average uncertainty on $x_t$ given the information vector $I_t$. The conditional Renyi entropy takes the form

\[
H_\rho\bigl(p(x_t \mid I_t)\bigr) = \frac{1}{1-\rho} \log \mathop{\mathbb{E}}_{x_t} \left\{ p^{\rho-1}(x_t \mid I_t) \right\}, \qquad \rho \neq 1. \tag{1.19}
\]

When the information matrix is adopted, we can choose between two scalar functions of the inverse of the Fisher Information Matrix: the determinant or the trace. When the determinant is used we have

\[
H_d\bigl(p(x_t \mid I_t)\bigr) = \left( \det \mathop{\mathbb{E}}_{x_t} \left\{ \frac{\partial^2}{\partial x_t^2} \log \frac{1}{p(x_t \mid I_t)} \right\} \right)^{-1}; \tag{1.20}
\]
when the trace is adopted we have

\[
H_{tr}\bigl(p(x_t \mid I_t)\bigr) = \mathrm{tr}\!\left( \left( \mathop{\mathbb{E}}_{x_t} \left\{ \frac{\partial^2}{\partial x_t^2} \log \frac{1}{p(x_t \mid I_t)} \right\} \right)^{-1} \right). \tag{1.21}
\]
In the special case of a normal probability density function $\mathcal{N}(x - \bar x, \Sigma)$, the quantities defined above take on a very simple form. In particular, as intuition suggests, they are functions of the state covariance matrix. Moreover, they do not depend on the measurements and hence can be computed off-line:

\[
H\bigl(p(x_t \mid I_t)\bigr) = \frac{1}{2} \log\bigl((2\pi)^n \det \Sigma_t\bigr) + \frac{n}{2}, \tag{1.22}
\]
\[
H_2\bigl(p(x_t \mid I_t)\bigr) = \frac{1}{2} \log\bigl(2^{2n} \pi^n \det \Sigma_t\bigr) = H_s - \frac{1}{2} \log\!\left( \frac{e^n}{2^n} \right), \tag{1.23}
\]
\[
H_d\bigl(p(x_t \mid I_t)\bigr) = \det \Sigma_t = \frac{e^{2H_s(x_t \mid I_t)}}{(2\pi e)^n}, \tag{1.24}
\]
\[
H_{tr}\bigl(p(x_t \mid I_t)\bigr) = \mathrm{tr}\, \Sigma_t, \tag{1.25}
\]

where $\Sigma_t$ is the covariance matrix associated with the state $x_t$. The derivation of the Shannon entropy and the Fisher information of a Gaussian random variable is a well-known result (see for example [14, 17]); the quadratic Renyi entropy of a Gaussian random variable has been derived with elementary calculus and some results from probability theory (see Appendix A).

Remark 1.2 In the Gaussian case, all the information measures are functions of the covariance matrix of the conditional probability. Furthermore, from the minimization point of view, the measures $H_s$, $H_2$ and $H_d$ are equivalent, since they are all monotonic functions of the determinant of the covariance matrix; consequently, minimizing one is equivalent to minimizing the others.
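The closed-form expressions (1.22)-(1.25) and the relations among them are easy to check numerically; a minimal sketch in nats, with an illustrative covariance matrix:

```python
import numpy as np

# Closed-form information measures (1.22)-(1.25) for a Gaussian N(xbar, Sigma).
def gaussian_measures(Sigma):
    n = Sigma.shape[0]
    det = np.linalg.det(Sigma)
    H_shannon = 0.5 * np.log((2 * np.pi) ** n * det) + n / 2          # (1.22)
    H_renyi2  = 0.5 * np.log(2 ** (2 * n) * np.pi ** n * det)         # (1.23)
    return H_shannon, H_renyi2, det, np.trace(Sigma)                  # (1.24),(1.25)

Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])
Hs, H2, Hd, Htr = gaussian_measures(Sigma)

n = Sigma.shape[0]
assert np.isclose(H2, Hs - 0.5 * np.log(np.e ** n / 2 ** n))     # relation in (1.23)
assert np.isclose(Hd, np.exp(2 * Hs) / (2 * np.pi * np.e) ** n)  # relation in (1.24)
```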
1.3 The Optimal Probing Control Problem (OPC)
This section is devoted to the general formulation of the main problems addressed in this thesis. Such formulations generalize the classical stochastic optimal control Problem 1.1, making it possible to address control problems in which the amount of information to be gained matters.
1.3.1 Finite horizon optimal probing control
Problem 1.2 (OPC) Find a control law $\gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_{T-1}\}$ that minimizes the expected value of the cost functional

\[
\sum_{t=0}^{T-1} \bigl[ g_t\bigl(x_t, \gamma_t(I_t), \xi_t\bigr) + \beta_t L(I_t) \bigr] + g_T(x_T) + \beta_T L(I_T) \tag{1.26}
\]

subject to the system equation (1.1) and the measurement equation (1.2). The real-valued functions

\[
g_T : X_T \to \mathbb{R}, \qquad g_t : X_t \times U_t \times W_t \to \mathbb{R}, \quad t = 0, 1, \ldots, T-1,
\]

and the coefficients $\beta_t \in \mathbb{R}^+$ are given. $L(I_t)$ is one of the information measures (1.18), (1.19), (1.20), and (1.21).
It is worth noting that the above problem is a FH optimal control problem with a cost penalizing the uncertainty on the state variable. This problem includes two particular but interesting special cases. If we set $\beta_t = 0$, $t = 0, \ldots, T$, we recover the "classical" stochastic optimal control problem defined in Section 1.1.1, and the problem becomes identical to Problem 1.1. If instead $g_t(\cdot,\cdot,\cdot) = 0$, $t = 0, \ldots, T-1$, and $g_T(\cdot) = 0$, we have a "pure" OPC problem in which the only objective of the DM is to gain information about the state of the system. Moreover, if $\beta_t = 0$, $\forall t = 0, 1, \ldots, T-1$, and $\beta_T = 1$, the cost assumes the simple form

\[
J\bigl(p(x_0), \gamma_0^T, I_T\bigr) = L(I_T). \tag{1.27}
\]

We shall devote particular attention to the following problem, which we shall call the Active Estimation (AE) Problem.

Problem 1.3 (AE) Find a control law $\gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_{T-1}\}$ that minimizes the expected value of the cost functional

\[
J\bigl(p(x_0), \gamma_0^T, I_T\bigr) = L(I_T) \tag{1.28}
\]

subject to the system equation (1.1) and the measurement equation (1.2). $L(I_T)$ is one of the information measures (1.18), (1.19), (1.20), and (1.21).
By solving this problem we find the control policy $\gamma_0^T$ that minimizes the uncertainty (in the sense of one of the information measures (1.18), (1.19), (1.20), (1.21)) about the final state $x_T$. Sometimes, however, it is useful to introduce a cost that prevents the control values from becoming unfeasible (e.g., if the controls belong to a vector space, their norm could be unbounded); we shall refer to such problems as AE Problems as well.
1.3.2 Active Estimation over an a priori unknown time horizon
Problem AE becomes much more difficult when the horizon of the control problem is not fixed. The objective is then to find a control horizon $T$ and a control law $\gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_{T-1}\}$ such that the cost functional is minimized and the desired quantity of information is extracted from the system.

Problem 1.4 Find a control law $\gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_{T-1}\}$ that minimizes the expected value of the cost functional

\[
\sum_{t=0}^{T-1} g_t\bigl(x_t, \gamma_t(I_t), \xi_t\bigr) + g_T(x_T) \tag{1.29}
\]

subject to the system equation (1.1), the measurement equation (1.2), and one of the following inequalities:

\[
e^{H(I_T)} \leq \epsilon \tag{1.30}
\]
\[
e^{H_\rho(I_T)} \leq \epsilon \tag{1.31}
\]
\[
H_d(I_T) \leq \epsilon \tag{1.32}
\]
\[
H_{tr}(I_T) \leq \epsilon \tag{1.33}
\]

The real-valued functions

\[
g_T : X_T \to \mathbb{R}, \qquad g_t : X_t \times U_t \times W_t \to \mathbb{R}, \quad t = 0, 1, \ldots, T-1,
\]

are given. The time horizon $T$ is a priori unknown and $\epsilon$ is an arbitrarily small constant.

Equations (1.30), (1.31), (1.32) and (1.33) constrain the final probability density function to belong to the set of functions whose residual uncertainty is below a given arbitrarily small constant. It is worth noting that in (1.30) and (1.31) the exponential function is needed because the entropy (if the random variable is
continuous) can assume decreasing, even negative, values as the density function "contracts." If the state space is discrete, the discrete Shannon and Renyi entropies can be used, and constraints (1.30) and (1.31) become $H(I_T) \leq \epsilon$ and $H_\rho(I_T) \leq \epsilon$, respectively. Formally, we call $\mathcal{P}$ the set of probability density functions satisfying the information constraint (i.e., (1.30), (1.31), (1.32), or (1.33)). Moreover, we make the following assumption.

Assumption 1.1 Given the system equation (1.1) and the measurement channel (1.2), a time horizon $T$ and a control policy $\gamma(\cdot)$ exist such that

\[
\Pr\bigl\{ p(x_T \mid I_T) \in \mathcal{P} \bigr\} = 1.
\]

Now, on the basis of the discussion of Section 1.1.2, Problem 1.4 can be reformulated as a stochastic shortest path problem in the sufficient statistic domain, by defining the set of final states. The reformulated problem reads:

Problem 1.5 Find a control law $\gamma = \{\gamma_0, \gamma_1, \ldots\}$ that minimizes the cost functional

\[
J\bigl(p(x_0 \mid I_0)\bigr) = \lim_{T \to \infty} \mathbb{E} \left[ \sum_{t=0}^{T-1} g^{*}\bigl(p(x_t \mid I_t), \gamma_t(p(x_t \mid I_t))\bigr) \right] \tag{1.34}
\]
subject to the new system equation

\[
p(x_{t+1} \mid I_{t+1}) =
\begin{cases}
p(x_t \mid I_t), & \text{if } p(x_t \mid I_t) \in \mathcal{P} \\
\Phi\bigl(p(x_t \mid I_t), u_t, y_{t+1}\bigr), & \text{otherwise}
\end{cases}
\qquad t = 0, 1, \ldots,
\]

and where

\[
g^{*}\bigl(p(x_t \mid I_t), u_t\bigr) =
\begin{cases}
0, & \text{if } p(x_t \mid I_t) \in \mathcal{P} \\
\bar g\bigl(p(x_t \mid I_t), u_t\bigr), & \text{otherwise}
\end{cases}
\qquad t = 0, 1, \ldots
\]
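To illustrate the absorbing structure of Problem 1.5, the sketch below wraps a generic Bayes update (any function with the signature of the earlier `bayes_update`) so that beliefs whose Shannon entropy is below a threshold defining $\mathcal{P}$ become cost-free termination states; the threshold and the unit stage cost are illustrative assumptions.

```python
import numpy as np

EPS = 0.1  # entropy threshold defining the termination set P (illustrative)

def entropy(belief):
    nz = belief[belief > 0]
    return -np.sum(nz * np.log2(nz))

def ssp_step(belief, u, y, bayes_update):
    """One stage of the stochastic shortest path dynamics of Problem 1.5:
    beliefs in the termination set P are absorbing and incur zero cost."""
    if entropy(belief) <= EPS:          # p(x_t|I_t) in P: stay there, g* = 0
        return belief, 0.0
    next_belief = bayes_update(belief, u, y)
    return next_belief, 1.0             # outside P: pay the stage cost (here 1)
```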
1.3.3 Active System Identification
Sometimes we have to deal with problems in which the state and the measurement equations are uncertain. This is the case when, for example, the structure of the system is partially unknown and we want to identify it. Let us consider the following system:

\[
x_{t+1} = f_t(x_t, \theta, u_t, \xi_t), \qquad t = 0, 1, \ldots \tag{1.35a}
\]
\[
y_t = h_t(x_t, \theta, \eta_t), \qquad t = 0, 1, \ldots \tag{1.35b}
\]

where $\theta$ is a time-invariant parameter vector. It can be useful to rewrite (1.35a) and (1.35b) in the form of equations (1.1) and (1.2). This can be done by extending the state with the parameters:

\[
\begin{bmatrix} x_{t+1} \\ \theta_{t+1} \end{bmatrix} =
\begin{bmatrix} f_t(x_t, \theta_t, u_t, \xi_t) \\ \theta_t \end{bmatrix}, \qquad t = 0, 1, \ldots \tag{1.36a}
\]
\[
y_t = h_t(x_t, \theta_t, \eta_t), \qquad t = 0, 1, \ldots \tag{1.36b}
\]

Defining the "new" state

\[
\tilde x_t = \begin{bmatrix} x_t \\ \theta_t \end{bmatrix}
\]

and a new system function $\bar f_t$, we can then rewrite the system in the "standard" form

\[
\tilde x_{t+1} = \bar f_t(\tilde x_t, u_t, \xi_t), \qquad t = 0, 1, \ldots \tag{1.37a}
\]
\[
y_t = h_t(\tilde x_t, \eta_t), \qquad t = 0, 1, \ldots \tag{1.37b}
\]
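As a concrete illustration of the state augmentation (1.36)-(1.37), the sketch below wraps given parameter-dependent maps into the standard form; the function names and the toy scalar system are illustrative:

```python
import numpy as np

def augment(f, h):
    """Wrap parameter-dependent maps f(x, theta, u, xi) and h(x, theta, eta)
    into the standard form (1.37): the new state is x_tilde = (x, theta) and
    the constant parameters get the trivial dynamics theta' = theta."""
    def f_bar(x_tilde, u, xi, n):
        x, theta = x_tilde[:n], x_tilde[n:]
        return np.concatenate([f(x, theta, u, xi), theta])   # (1.36a)
    def h_bar(x_tilde, eta, n):
        return h(x_tilde[:n], x_tilde[n:], eta)              # (1.36b)
    return f_bar, h_bar

# Toy scalar example: x' = theta * x + u + xi, y = x + eta (illustrative).
f = lambda x, th, u, xi: th * x + u + xi
h = lambda x, th, eta: x + eta
f_bar, h_bar = augment(f, h)
x_tilde = np.array([1.0, 0.9])            # (x0, theta)
x_next = f_bar(x_tilde, u=0.1, xi=0.0, n=1)
```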
1.3.4 Deterministic Approaches
It is often useful to consider the problem of actively estimating the state of a system under deterministic assumptions. If the separation principle does not hold, the control law can in general influence the observability of the state. This is the case, for example, in the study of observability for switching systems, where the question is how to design a control law that ensures the observability of the system. In Section 2.2 we address this problem. If the observability condition is satisfied, i.e., a control policy exists that ensures the observability of the system, it will be proven that it is possible to discern the state of the system even in the presence of bounded noise.
Chapter 2
Active parameter identification

2.1 Active identification of linear systems

2.1.1 Problem formulation
In Chapter 1 a general formulation of the active identification problem was given. In this chapter the attention is focused on LTI systems with an unknown measurement channel. In this particular case it is possible to carry out an identifiability study of the system and to express the information measure in a simple form. Then the problem of finding the most informative control sequence is addressed, in a finite horizon setting. For the reader's convenience, all the proofs are given in Appendix B. Let us consider a discrete-time linear system:

\[
x_{t+1} = A x_t + B u_t, \qquad t = 0, \ldots, T-1 \tag{2.1a}
\]
\[
y_t = C(\theta)\, x_t + \eta_t, \qquad t = 0, \ldots, T \tag{2.1b}
\]

where $x_t \in \mathbb{R}^n$ is the state vector, $y_t \in \mathbb{R}^h$ is the vector of measurements, $u_t \in U \subseteq \mathbb{R}^m$ is the control vector, $\theta \in \mathbb{R}^{nh}$ is a vector of unknown parameters, and $\eta_t$ is a disturbance vector. In the following, let us assume that $x_0 = \bar x$ is a known initial condition. The matrix $C(\theta)$ in equation (2.1b) is defined as $C(\theta) = [\theta_{ij}]$. Since equation (2.1b) is bilinear in $x_t$ and $\theta_{ij}$, system (2.1) can be rewritten as

\[
x_{t+1} = A x_t + B u_t, \qquad t = 0, \ldots, T-1 \tag{2.2a}
\]
\[
y_t = X_t' \theta + \eta_t, \qquad t = 0, \ldots, T \tag{2.2b}
\]
§ 2.1 - Active parameter identification
where
4 Xt0 =
x0t 0 . . . 0 0 x0t . . . 0 .. . . . .. . .. . . 0 0 . . . x0t
23
and 4
θ=
£
θ1,1 θ1,2 . . . θ1,n θ2,1 . . . θn,n
¤0
is the stacking of the rows (transposed) of the matrix C(θ). It is worth noting that (2.2b) is the linear regression form of (2.1b). We will adopt the following notation uT0 −1 = col [u0 , u1 , . . . , uT −1 ] . The vector θ is here considered as a vector of initially unknown parameters to be estimated. In order to do this, the Best Linear Unbiased Estimator is assumed to be used. The problem of Active Identification addressed in this chapter is to control the system (2.1a) in order to obtain an optimal regressor matrix. The term optimal is intended here with respect to the minimization of a suitable uncertainty measure LT (·). The choice of the information measure LT (·) will be the argument of Section 2.1.3. By extending the state to the parameters, we can rewrite system 2.1 in a state form (as the one considered in Chapter 1). ˜xt + Bu ˜ t, x ˜t+1 = A˜ yt = ht (˜ x t ) + ηt , where
· x ˜t = · A˜ =
xt θt A I
t = 0, 1, . . . t = 0, 1, . . . .
¸ ,
t = 0, 1, . . . , T
¸
· ,
˜= B
B 0
¸
ht (˜ xt ) = Xt0 θt , t = 0, 1, . . . , T
DIST - University of Genoa
(2.3a) (2.3b)
Hence the separation principle does not hold (in fact the measurement channel is not linear in the state), and the controller can influence the estimation performance. Following the general problem formulation given in the previous chapter, the problem can be stated as follows.

Problem 2.1 (Active identification problem) Given a suitable information measure $L_T(\cdot)$ (related to the vector $\theta$), find the optimal control sequence $(u_0^{T-1})^\circ$ such that

\[
\bigl(u_0^{T-1}\bigr)^\circ = \arg\min_{u_0^{T-1}} L_T\bigl(u_0^{T-1}\bigr)
\]

subject to (2.1a) and

\[
u_0^{T-1} \in \mathcal{U}, \qquad x_t \in X \subseteq \mathbb{R}^n,
\]

where $\mathcal{U}$ is the bounded set of admissible control sequences.

Problem 2.1 implicitly assumes the identifiability of the vector $\theta$ and intuitively requires some degree of reachability of the system (2.1). Indeed, in the next section it will be proven that the identifiability of the system is strictly related to its reachability.
2.1.2 Identifiability analysis
The following theorem follows directly from the definition of a completely observable system (see for example [13]).

Theorem 2.1 System (2.1) is completely identifiable (in $t \geq T$ stages) iff a time horizon $T$ and a control sequence $u_0^{T-1}$ exist such that

\[
\sum_{t=0}^{T} X_t X_t' > 0. \tag{2.4}
\]
By defining

\[
\Phi_T' \triangleq \begin{bmatrix} X_0 & X_1 & \cdots & X_T \end{bmatrix},
\]

equation (2.4) can be rewritten in the more concise form $\Phi_T' \Phi_T > 0$. If $\Phi_T' \Phi_T$ is singular, certain linear combinations of the elements of $\theta$ cannot be determined; in this case, no information about them can be extracted from the data $\{y_0, \ldots, y_T\}$. In the following it will be proved that complete identifiability (identifiability, for short) is a structural property. Indeed, the following result states the equivalence of reachability and identifiability.

Theorem 2.2 The system (2.1) is identifiable in $n$ stages (and consequently in $T > n$ stages) from the zero state iff it is completely reachable, i.e., iff $\mathrm{rank}(K) = n$, where

\[
K = \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix}.
\]
Proof: (see Appendix B).

Given an identifiable system, only a subset of the control sequences guarantees the identification of the parameters. These sequences will be called proper sequences. Let us consider the following constructive procedure to fix a control sequence.

Procedure 2.1 Let us define

\[
U \triangleq
\begin{bmatrix}
u_0 & u_1 & u_2 & \cdots & u_{n-1} \\
0   & u_0 & u_1 & \cdots & u_{n-2} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0   & 0   & \cdots & u_0 & u_1 \\
0   & 0   & \cdots & 0   & u_0
\end{bmatrix}
= \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}, \tag{2.5}
\]

where $v_i$, $i = 1, \ldots, n$, are the column vectors of $U$. We want to find a control sequence $u_0^{n-1}$ such that

\[
v_i \notin \ker(K), \qquad i = 1, \ldots, n. \tag{2.6}
\]

Such sequences can be constructed in the following two steps:

• take $u_0 \notin \ker(B)$; such a vector always exists (except for the trivial case of $B$ null);

• for each $i = 1, \ldots, n-1$ take $u_i$ such that the following condition holds:

\[
B u_i \neq \sum_{j=1}^{i} A^j B u_{i-j}.
\]

Note that such vectors $u_i$ always exist (except for the trivial case of $B$ null).
Corollary 2.1 If the system (2.1) is completely reachable, then $u_0^{T-1}$ guarantees the identifiability of the parameter vector $\theta$ if it satisfies the conditions expressed in (2.6) (for short, sequences satisfying (2.6) will be called proper). Moreover, such a sequence can be constructed following the lines of Procedure 2.1.
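A minimal sketch of Procedure 2.1, drawing random candidate controls and keeping those that satisfy the kernel conditions (2.6); the system matrices and the rejection-sampling loop are illustrative assumptions, not the thesis' construction:

```python
import numpy as np

def proper_sequence(A, B, rng, tries=100):
    """Construct u_0, ..., u_{n-1} following Procedure 2.1: pick u_0 outside
    ker(B), then each u_i with B u_i != sum_{j=1..i} A^j B u_{i-j}."""
    n = A.shape[0]
    us = []
    for i in range(n):
        for _ in range(tries):
            u = rng.standard_normal(B.shape[1])
            if i == 0:
                ok = np.linalg.norm(B @ u) > 1e-9            # u_0 not in ker(B)
            else:
                rhs = sum(np.linalg.matrix_power(A, j) @ B @ us[i - j]
                          for j in range(1, i + 1))
                ok = np.linalg.norm(B @ u - rhs) > 1e-9
            if ok:
                us.append(u)
                break
    return us

A = np.array([[0.0, 1.0], [-0.5, 1.0]])   # illustrative reachable pair (A, B)
B = np.array([[0.0], [1.0]])
us = proper_sequence(A, B, np.random.default_rng(0))
```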
2.1.3 Entropy Based Active Identification
Under the reachability assumption, it is possible to gain all the information about all the parameters. Hence the problem of finding the control policy that extracts the maximum amount of information can be addressed. The first step is to choose one of the uncertainty measures defined in Chapter 1. When the disturbance vectors $\eta_0, \eta_1, \ldots, \eta_T$ are Gaussian, independent and identically distributed, straightforward calculations show that the information matrix for our problem becomes

\[
M_T\bigl(u_0^{T-1}\bigr) = \Phi_T'\bigl(u_0^{T-1}\bigr) R^{-1} \Phi_T\bigl(u_0^{T-1}\bigr), \tag{2.7}
\]

where $R$ is the covariance matrix of the vector $\eta$. In this case, the BFIM turns out to be

\[
M_T\bigl(u_0^{T-1}\bigr) = \Sigma^{-1} + \Phi_T'\bigl(u_0^{T-1}\bigr) R^{-1} \Phi_T\bigl(u_0^{T-1}\bigr), \tag{2.8}
\]
where $\Sigma$ is the covariance of the prior parameter density function. It is worth noting that $M_T$ depends on the whole trajectory of the system, i.e., on the sequence of states of the controlled system and consequently on the control sequence $u_0^{T-1}$. If we consider an efficient unbiased estimator (we choose the Best Linear Unbiased Estimator, BLUE), we can take as loss function the entropy of the estimator. In the Gaussian case, the entropy of the estimator takes on a very simple form:

\[
H_T(\hat\theta) = c + \log \det\bigl(M^{-1}\bigr),
\]

where $c$ is a suitable constant (which will not be considered in the following) and $M$ is the FIM. In this case $H_T(\hat\theta)$ equals

\[
\log \det \left( \Phi_T'\bigl(u_0^{T-1}\bigr) R^{-1} \Phi_T\bigl(u_0^{T-1}\bigr) \right)^{-1}.
\]

If the BFIM is considered, $H_T(\hat\theta)$ equals

\[
\log \det \left( \Sigma^{-1} + \Phi_T'\bigl(u_0^{T-1}\bigr) R^{-1} \Phi_T\bigl(u_0^{T-1}\bigr) \right)^{-1}.
\]

Hence the entropy is related to the FIM (or BFIM), and minimizing the entropy of the estimator is equivalent to minimizing the determinant of the inverse FIM (or BFIM). It is worth noting that minimizing the mean square error is equivalent to minimizing the trace of the inverse BFIM, while minimizing the entropy of the estimator is equivalent to minimizing the determinant of the inverse BFIM. In the particular case in which the measurement is scalar, it is possible to derive a simple form for the determinant of the FIM.

Theorem 2.3 Consider the system (2.1) where $y_t \in \mathbb{R}$, $C(\theta) = \theta'$, and $\sigma^2$ is the disturbance variance. Then

\[
\arg\min_{u_0^{T-1}} \log \det\bigl(M_T^{-1}\bigr) =
\arg\min_{u_0^{T-1}} \left\{ - \sum_{t=0}^{T} \ln\left( 1 + \frac{1}{\sigma^2}\, x_t' M_{t-1}^{-1} x_t \right) \right\},
\]

where $M_{-1} = \Sigma_0^{-1}$, with $\Sigma_0$ the a priori covariance matrix of the parameters.
Proof: (see Appendix B).

The active identification problem can be stated as follows.
Problem 2.2 (Active identification problem) Find the optimal control sequence (u_0^{T-1})° such that

$$ \left(u_0^{T-1}\right)^\circ = \arg\min_{u_0^{T-1}} \left\{ -\sum_{t=0}^{T} \ln\left(1 + \frac{1}{\sigma^2}\, x_t'\, M_{t-1}^{-1}\, x_t\right) \right\} $$

subject to eq. (2.1a) and

$$ u_0^{T-1} \in U, \qquad x_t \in X \subset \mathbb{R}^n. $$
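The equivalence used here follows from the matrix determinant lemma applied to the rank-one BFIM recursion of the scalar channel, and it can be checked numerically. The following Python sketch is purely illustrative (the function name and data layout are ours, not the thesis'):

```python
import numpy as np

def identification_cost(xs, Sigma0, sigma2):
    """Cost of Problem 2.2 along a state trajectory x_0, ..., x_T (sketch).

    For the scalar channel y_t = theta' x_t + eta_t, the BFIM obeys the
    rank-one recursion M_t = M_{t-1} + x_t x_t' / sigma^2 with
    M_{-1} = Sigma_0^{-1}; by the matrix determinant lemma,
    log det M_T^{-1} = log det Sigma_0
                       - sum_t ln(1 + x_t' M_{t-1}^{-1} x_t / sigma^2),
    so minimizing the returned sum minimizes the estimator's entropy."""
    M = np.linalg.inv(Sigma0)                  # M_{-1} = Sigma_0^{-1}
    cost = 0.0
    for x in xs:
        cost -= np.log1p(x @ np.linalg.solve(M, x) / sigma2)
        M += np.outer(x, x) / sigma2           # information added at stage t
    return cost
```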
This problem can be solved by means of constrained nonlinear programming techniques. In the particular case where the set to which the state vectors belong takes on the form X = {x : x′x ≤ c}, c > 0, a greedy policy can be obtained by solving the following problem.

Problem 2.3 $\min_{u_t} \left\{ -x_{t+1}'\, M_t^{-1}\, x_{t+1} \right\}$ subject to:

$$ x_{t+1} = A x_t + B u_t, \qquad u_t \in U \subset \mathbb{R}^m, \qquad x_t \in X \subset \mathbb{R}^n, $$

where X = {x : x′x ≤ c}, c > 0, and t = 0, 1, ..., T-1.

If rank(B) = n, by using a simple geometric interpretation (see Fig. 2.1), it is possible to give an analytic solution of the problem. At every step the optimal control vector is given by

$$ u_t^\circ = (B'B)^{-1} B'\, x_{t+1}^* - (B'B)^{-1} B' A x_t, \qquad x_{t+1}^* = \sqrt{c}\, \frac{v_t^{\max}}{\|v_t^{\max}\|}, \qquad t = 0, \dots, T-1, $$

where $v_t^{\max}$ is the eigenvector associated with the maximum eigenvalue of the matrix $M_t^{-1}$ and $M_{-1} = \Sigma_0^{-1}$.
Figure 2.1: Geometrical interpretation of the state selection
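A minimal numerical sketch of the greedy state selection follows, assuming the admissible state set is the ball x′x ≤ c and that B admits the pseudo-inverse used in the analytic solution (all names are illustrative):

```python
import numpy as np

def greedy_probing_step(A, B, x_t, M_t, c):
    """One step of the greedy policy of Problem 2.3 (illustrative sketch).

    M_t is the current information matrix (initialized as Sigma_0^{-1}).
    The most informative admissible state x*_{t+1} lies on the boundary of
    X = {x : x'x <= c}, along the eigenvector of M_t^{-1} associated with
    its largest eigenvalue, i.e. the direction of maximum residual
    parameter uncertainty."""
    eigvals, eigvecs = np.linalg.eigh(np.linalg.inv(M_t))
    v_max = eigvecs[:, -1]              # eigh returns eigenvalues in ascending order
    x_star = np.sqrt(c) * v_max / np.linalg.norm(v_max)
    # Steer the state onto x*: u = B^+ (x* - A x_t), the pseudo-inverse
    # standing in for (B'B)^{-1} B' of the analytic solution.
    u_t = np.linalg.pinv(B) @ (x_star - A @ x_t)
    return u_t, x_star
```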
2.1.4 Extension to non-linear systems
In the previous sections we have considered systems in which the measurement channel is represented by a parameterized matrix; in this section the case of nonlinear systems is considered. Consider the following dynamic system:

$$ x_{t+1} = f(x_t, u_t(I_t)), \qquad t = 0, 1, \dots \tag{2.9a} $$
$$ y_t = h(x_t, \theta) + \eta_t, \qquad t = 0, 1, \dots \tag{2.9b} $$
where η_t is an additive Gaussian noise with density G(0, σ²) and θ is distributed as G(θ̄, Σ_0). The state of the system is perfectly measurable, i.e., the DM, at every time step, knows its state perfectly but not the parameters. If

$$ h(x_t, \theta) = \bar h(x_t)'\, \theta, \tag{2.10} $$

i.e., the measurement channel is linear with respect to the parameters, the proof of Theorem 2.3 still applies and we can state the following.

Theorem 2.4 Consider the system (2.9) where y_t ∈ R, h(x_t, θ) = h̄(x_t)′θ, and σ² is the measurement disturbance variance. Then

$$ \arg\min_{u_0^{T-1}} \log \det M_T^{-1} = \arg\min_{u_0^{T-1}} \left\{ -\sum_{i=0}^{T} \log\left(1 + \frac{1}{\sigma^2}\, \bar h(x_i)'\, M_{i-1}^{-1}\, \bar h(x_i)\right) \right\}, $$

where $M_{-1}^{-1} = \Sigma_0$ is the a priori covariance matrix of the parameters.
Proof: The proof follows directly from the proof of Theorem 2.3.

The assumption of linearity of the channel with respect to the parameters is a strong hypothesis. In the case of a nonlinear channel function belonging to the class of C¹ functions, we can give an approximate version of the cost in Theorem 2.4 through a linearization of the measurement function near the estimate θ̂. This procedure leads to the following minimization problem:

$$ \arg\min_{u_0^{T-1}} \left\{ -\sum_{i=0}^{T} \log\left(1 + \frac{1}{\sigma^2} \left(\nabla h(x_i, \hat\theta_i)\right)' M_{i-1}^{-1}\, \nabla h(x_i, \hat\theta_i)\right) \right\}, \tag{2.11} $$

where $M_{-1}^{-1} = \Sigma_0$ is the a priori covariance matrix of the parameters. The estimated parameters θ̂_i are random variables and are a priori unknown. However, we assume that an estimate of the parameter vector θ_t can be generated by a suitable estimator for t = 1, 2, ..., T. Let us define the informative vector as
$$ I_t = \mathrm{col}(x_0, x_1, \dots, x_t, \hat\theta_t), \qquad t = 0, \dots, T. $$
Hence, the following problem can be stated.

Problem 2.4 Find the optimal control law γ° = {γ_0(I_0), γ_1(I_1), ..., γ_{T-1}(I_{T-1})} that minimizes the expected value of

$$ -\sum_{i=0}^{T} \log\left(1 + \frac{1}{\sigma^2} \left(\nabla h(x_i, \hat\theta_i)\right)' M_{i-1}^{-1}\, \nabla h(x_i, \hat\theta_i)\right) $$

subject to eq. (2.9a) and

$$ u_0^{T-1} \in U, \qquad x_t \in X \subset \mathbb{R}^n. $$
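As an illustration of how the linearized cost could be evaluated along a simulated trajectory, consider the following sketch; grad_h is a user-supplied callable returning the gradient of h with respect to θ, and all names are ours:

```python
import numpy as np

def linearized_information_cost(xs, theta_hats, grad_h, Sigma0, sigma2):
    """Approximate cost (2.11) for a channel nonlinear in the parameters
    (sketch). Each stage contributes -log(1 + g' M_{t-1}^{-1} g / sigma^2),
    where g = grad_h(x_t, theta_hat_t) replaces the regressor of the
    linear case, i.e. the channel is linearized around the estimate."""
    M = np.linalg.inv(Sigma0)                  # M_{-1} = Sigma_0^{-1}
    cost = 0.0
    for x, th in zip(xs, theta_hats):
        g = grad_h(x, th)                      # linearization around theta_hat_t
        cost -= np.log1p(g @ np.linalg.solve(M, g) / sigma2)
        M += np.outer(g, g) / sigma2
    return cost
```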
2.2 Active identification of switching systems
2.2.1 Problem formulation
By switched linear systems, we refer to discrete-time systems that can be modelled as follows:

$$ x_{t+1} = A(\theta_t) x_t + B(\theta_t) u_t + w_t \tag{2.12a} $$
$$ y_t = C(\theta_t) x_t + v_t, \tag{2.12b} $$

where t = 1, ... is the time instant, x_t is the continuous state vector (the initial state x_1 is unknown), θ_t ∈ Θ ≜ {1, 2, ..., K} is the discrete state, w_t ∈ W ⊂ R^n is the system noise vector, y_t ∈ R^m is the vector of measures, v_t ∈ V ⊂ R^m is the measurement noise vector, and u_t ∈ R^k is the control vector. For each θ ∈ Θ, A(θ), C(θ), and B(θ) are n × n, m × n, and n × k matrices, respectively. We assume the statistics of x_1, w_t, and v_t to be unknown, as well as the law governing the evolution of the discrete state. By observability we mean the possibility of inferring the initial state x_1 and a finite portion of the sequence of switches from a finite number of measurements y_1, ..., y_T. In the first subsection we study the observability for deterministic systems, i.e., systems in which the noises w_t and v_t are absent.

We conclude by defining some notation used throughout this section. Given a generic time-varying vector v_t and two time instants t_1 ≤ t_2, let us define $v_{t_1, t_2} \triangleq \mathrm{col}(v_{t_1}, v_{t_1+1}, \dots, v_{t_2})$. Given a generic matrix M, we denote by span(M) the linear space generated by the columns of M.
2.2.2 Observability in the absence of noises
In this section, we study the observability of the discrete state of system (2.12) over a finite horizon. More specifically, we want to know whether it is possible to reconstruct the discrete state of system (2.12) given the observations vector for a given time interval [1, T]. For the sake of clarity, we first focus on the case where the system and measurement noises are absent, i.e., we shall consider the noise-free switching linear system

$$ x_{t+1} = A(\theta_t) x_t + B(\theta_t) u_t \tag{2.13a} $$
$$ y_t = C(\theta_t) x_t. \tag{2.13b} $$

In the following we shall denote by T the length of the observation horizon and by τ the length of the switching sequence we would like to observe. The objective is to determine uniquely the first τ discrete states (i.e., the switching pattern θ_{1,τ}) on the basis of the observation of the output of the system in the interval [1, T]. Of course, we shall consider situations in which τ ≤ T. By defining the quantities

$$ F(\theta_{1,T}) \triangleq \begin{bmatrix} C(\theta_1) \\ C(\theta_2) A(\theta_1) \\ \vdots \\ C(\theta_T) \Phi(\theta_{1,T-1}) \end{bmatrix}, \qquad G(\theta_{1,T}) \triangleq \begin{bmatrix} 0 & 0 & \cdots & 0 \\ C(\theta_2) B(\theta_1) & 0 & \cdots & 0 \\ C(\theta_3) A(\theta_2) B(\theta_1) & C(\theta_3) B(\theta_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ C(\theta_T) \Phi(\theta_{2,T-1}) B(\theta_1) & C(\theta_T) \Phi(\theta_{3,T-1}) B(\theta_2) & \cdots & 0 \end{bmatrix}, $$

where $\Phi(\theta_{1,T-1}) = A(\theta_{T-1}) A(\theta_{T-2}) \cdots A(\theta_1)$ is the transition matrix, the observations vector y_{1,T} can be written as

$$ y_{1,T} = F(\theta_{1,T})\, x_1 + G(\theta_{1,T})\, u_{1,T}. \tag{2.14} $$
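For small instances, the stacked matrices of (2.14) can be assembled directly. The following Python sketch is our own illustration (As, Bs, Cs map each discrete state to the corresponding matrices):

```python
import numpy as np

def build_F_G(As, Bs, Cs, pattern):
    """Assemble F(theta_{1,T}) and G(theta_{1,T}) of equation (2.14) (sketch).

    Row block t of F is C(theta_t) A(theta_{t-1}) ... A(theta_1); block
    (t, s) of G, for s < t, is C(theta_t) A(theta_{t-1}) ... A(theta_{s+1})
    B(theta_s); the first block row of G is zero."""
    T = len(pattern)
    n = As[pattern[0]].shape[0]
    m = Cs[pattern[0]].shape[0]
    p = Bs[pattern[0]].shape[1]
    F = np.zeros((m * T, n))
    G = np.zeros((m * T, p * T))
    Phi = np.eye(n)                        # A(theta_{t-1}) ... A(theta_1) so far
    for t in range(T):                     # 0-based index t encodes time t+1
        C_t = Cs[pattern[t]]
        F[m * t:m * (t + 1), :] = C_t @ Phi
        Back = np.eye(n)                   # A(theta_{t-1}) ... A(theta_{s+1})
        for s in range(t - 1, -1, -1):
            G[m * t:m * (t + 1), p * s:p * (s + 1)] = C_t @ Back @ Bs[pattern[s]]
            Back = Back @ As[pattern[s]]
        Phi = As[pattern[t]] @ Phi
    return F, G
```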
As the observations vector y_{1,T} depends on the sequence of control vectors u_{1,T}, one may think of suitably choosing the sequence u_{1,T} in order to make it possible to determine uniquely the switching pattern θ_{1,τ}. In this respect, the following definition can be introduced.

Definition 2.1 System (2.13) is said to be (τ, T) discrete-state observable [(τ, T)-DSO for short] if there exists a control sequence u_{1,T} such that, for every pair of switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ} and for every pair of initial states x_1 and x′_1, we have

$$ F(\theta_{1,T})\, x_1 + G(\theta_{1,T})\, u_{1,T} \neq F(\theta'_{1,T})\, x'_1 + G(\theta'_{1,T})\, u_{1,T}. $$

According to Definition 2.1, if system (2.13) is (τ, T)-DSO, then it is possible to find a control sequence u_{1,T} such that different switching patterns in the interval [1, τ] generate different observations vectors in the interval [1, T], regardless of the initial states. A control sequence that has such a property is called a discerning control sequence. Clearly, if u_{1,T} is a discerning control sequence, the switching pattern θ_{1,τ} can be determined uniquely from the observations vector y_{1,T} as the unique θ_{1,τ} such that y_{1,T} ∈ Y(θ_{1,τ}, u_{1,T}). Here Y(θ_{1,τ}, u_{1,T}) is the set of all the possible observations vectors associated with θ_{1,τ}, i.e.,

$$ Y(\theta_{1,\tau}, u_{1,T}) \triangleq \left\{ y_{1,T} \,\middle|\, y_{1,T} = F(\bar\theta_{1,T})\, x_1 + G(\bar\theta_{1,T})\, u_{1,T};\ x_1 \in \mathbb{R}^n,\ \bar\theta_{1,T} \in \Theta^T,\ \bar\theta_{1,\tau} = \theta_{1,\tau} \right\}. $$

It is worth noting that the set Y(θ_{1,τ}, u_{1,T}) turns out to be the union of a finite number of affine spaces. The following theorem provides a necessary and sufficient condition for the (τ, T)-DSO of system (2.13).

Theorem 2.5 System (2.13) is (τ, T)-DSO if and only if, for every pair of switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ}, the following relationship holds:

$$ \mathrm{rank}\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \mid G(\theta_{1,T}) - G(\theta'_{1,T}) \right] > \mathrm{rank}\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right]. \tag{2.15} $$
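Condition (2.15) can be verified numerically by brute force for small K and T. The sketch below (illustrative only; it reuses build_F_G from the previous sketch) enumerates all pattern pairs that differ on [1, τ] and compares the two ranks:

```python
import numpy as np
from itertools import product

def is_tau_T_DSO(As, Bs, Cs, K, tau, T, tol=1e-9):
    """Exhaustive check of the rank condition (2.15) of Theorem 2.5
    (sketch; the double enumeration over K^T patterns is only viable
    for small K and T)."""
    for th1 in product(range(K), repeat=T):
        for th2 in product(range(K), repeat=T):
            if th1[:tau] == th2[:tau]:
                continue               # only patterns differing on [1, tau]
            F1, G1 = build_F_G(As, Bs, Cs, th1)
            F2, G2 = build_F_G(As, Bs, Cs, th2)
            FF = np.hstack([F1, -F2])
            if np.linalg.matrix_rank(np.hstack([FF, G1 - G2]), tol=tol) \
                    <= np.linalg.matrix_rank(FF, tol=tol):
                return False           # condition (2.15) violated for this pair
    return True
```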
Note that in [18] a necessary and sufficient condition for the (τ, T)-DSO of system (2.13) was provided that is based on the concept of orthogonal projection. More specifically, the following theorem holds.

Theorem 2.6 [18] System (2.13) is (τ, T)-DSO if and only if, for every pair of switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ}, we have

$$ \left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right] \neq 0, \tag{2.16} $$

where $P(\theta_{1,T}, \theta'_{1,T})$ is the matrix of the orthogonal projection on $\mathrm{span}\left(\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right]\right)$.
Recall that $\left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right]$ is the matrix of the orthogonal projection on the subspace orthogonal to $\mathrm{span}\left(\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right]\right)$. Hence, it is immediate to verify the equivalence of conditions (2.15) and (2.16), in that condition (2.15) ensures that there exists at least one column of $\left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right]$ that does not belong to $\mathrm{span}\left(\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right]\right)$ and, hence, has a non-null projection on the subspace orthogonal to it.

If condition (2.15) holds, then it makes sense to search for a discerning control sequence, i.e., a control sequence that guarantees the discernibility of the switching pattern. The following theorem characterizes such a sequence.

Theorem 2.7 The sequence u_{1,T} is a discerning control sequence if and only if

$$ \left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right] u_{1,T} \neq 0 \tag{2.17} $$

for every pair of switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ}.
Provided that condition (2.15) is satisfied, i.e., system (2.13) is (τ, T)-DSO, for every pair of switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ}, the set of control sequences u_{1,T} that do not satisfy condition (2.17) (i.e., the kernel of the matrix $\left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right]$) is a linear subspace of Lebesgue measure 0. As a consequence, since the set of control sequences u_{1,T} that fail to satisfy condition (2.17) for at least one pair of switching patterns can be written as

$$ \bigcup_{\theta_{1,T}, \theta'_{1,T}} \ker\left( \left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right] \right), $$

where the union is extended to every pair of switching patterns θ_{1,T} and θ′_{1,T} such that θ_{1,τ} ≠ θ′_{1,τ}, it is also a set of Lebesgue measure 0. An elementary example should clarify the foregoing definitions and results; a numerical sketch follows.
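As such an elementary check, the following sketch tests whether a given stacked input sequence is discerning in the sense of Theorem 2.7, by projecting [G − G′]u onto the orthogonal complement of span([F | −F′]) for every admissible pattern pair (illustrative code, reusing build_F_G from above):

```python
import numpy as np
from itertools import product

def is_discerning(As, Bs, Cs, K, tau, T, u, tol=1e-9):
    """Test of condition (2.17): u = col(u_1, ..., u_T) is discerning iff
    (I - P)(G - G')u != 0 for every pattern pair differing on [1, tau]."""
    for th1 in product(range(K), repeat=T):
        for th2 in product(range(K), repeat=T):
            if th1[:tau] == th2[:tau]:
                continue
            F1, G1 = build_F_G(As, Bs, Cs, th1)
            F2, G2 = build_F_G(As, Bs, Cs, th2)
            # Orthonormal basis of span([F | -F']) via the SVD.
            U, s, _ = np.linalg.svd(np.hstack([F1, -F2]), full_matrices=False)
            Q = U[:, s > tol]
            r = (G1 - G2) @ u
            if np.linalg.norm(r - Q @ (Q.T @ r)) <= tol:   # (I - P) r = 0
                return False
    return True
```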
2.2.3 Observability in the presence of bounded noises
Let us now consider the noisy switching system (2.12) and suppose that, at any time stage, the system noise vector wt and the measurement noise vector vt belong to the known compact sets W and V , respectively. Along the lines of the
previous section, we would like to know whether it is possible to find a control sequence u_{1,T} such that different switching patterns in the interval [1, τ] generate different observations vectors in the interval [1, T], regardless of the initial state and of the noise sequences. First note that in this case the observations vector y_{1,T} can be written as

$$ y_{1,T} = F(\theta_{1,T})\, x_1 + G(\theta_{1,T})\, u_{1,T} + H(\theta_{1,T})\, w_{1,T} + v_{1,T}, $$

where

$$ H(\theta_{1,T}) \triangleq \begin{bmatrix} 0 & 0 & \cdots & 0 \\ C(\theta_2) & 0 & \cdots & 0 \\ C(\theta_3) A(\theta_2) & C(\theta_3) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ C(\theta_T) \Phi(\theta_{2,T-1}) & C(\theta_T) \Phi(\theta_{3,T-1}) & \cdots & 0 \end{bmatrix}. $$

Then, we would like to find a control sequence u_{1,T} such that, for every pair of switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ}, we have

$$ F(\theta_{1,T}) x_1 + G(\theta_{1,T}) u_{1,T} + H(\theta_{1,T}) w_{1,T} + v_{1,T} \neq F(\theta'_{1,T}) x'_1 + G(\theta'_{1,T}) u_{1,T} + H(\theta'_{1,T}) w'_{1,T} + v'_{1,T} \tag{2.18} $$

for every pair of initial states x_1, x′_1, for every pair of system noise sequences w_{1,T} ∈ W^T, w′_{1,T} ∈ W^T, and for every pair of measurement noise sequences v_{1,T} ∈ V^T, v′_{1,T} ∈ V^T. Towards this end, let us rewrite condition (2.18) in the equivalent form

$$ \left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right] \begin{bmatrix} x_1 \\ x'_1 \end{bmatrix} \neq \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right] u_{1,T} + H(\theta_{1,T}) w_{1,T} + v_{1,T} - H(\theta'_{1,T}) w'_{1,T} - v'_{1,T}. \tag{2.19} $$
Clearly, condition (2.19) is satisfied for every pair of initial states if and only if the right-hand side is not in the span of the columns of $\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right]$ or, equivalently, if and only if the projection of the right-hand side on the subspace orthogonal to $\mathrm{span}\left(\left[ F(\theta_{1,T}) \mid -F(\theta'_{1,T}) \right]\right)$ is not null, that is,

$$ \left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left\{ \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right] u_{1,T} + H(\theta_{1,T}) w_{1,T} + v_{1,T} - H(\theta'_{1,T}) w'_{1,T} - v'_{1,T} \right\} \neq 0. \tag{2.20} $$
If, for the sake of brevity, we define the quantities

$$ R(\theta_{1,T}, \theta'_{1,T}) \triangleq -\left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left[ G(\theta_{1,T}) - G(\theta'_{1,T}) \right], $$

$$ O(\theta_{1,T}, \theta'_{1,T}) \triangleq \left\{ z \,\middle|\, z = \left[ I - P(\theta_{1,T}, \theta'_{1,T}) \right] \left[ H(\theta_{1,T}) w_{1,T} + v_{1,T} - H(\theta'_{1,T}) w'_{1,T} - v'_{1,T} \right];\ w_{1,T}, w'_{1,T} \in W^T,\ v_{1,T}, v'_{1,T} \in V^T \right\}, $$

then condition (2.20) turns out to be

$$ R(\theta_{1,T}, \theta'_{1,T})\, u_{1,T} \notin O(\theta_{1,T}, \theta'_{1,T}). \tag{2.21} $$
Note that the boundedness of the sets W and V implies the boundedness of the set O(θ_{1,T}, θ′_{1,T}).

Let us now suppose that the control sequence ū_{1,T} is a discerning control sequence for the noise-free system (2.13); then R(θ_{1,T}, θ′_{1,T}) ū_{1,T} ≠ 0 (see Theorem 2.7). As a consequence, since the set O(θ_{1,T}, θ′_{1,T}) is bounded, it is always possible to stretch the vector ū_{1,T} in such a way that it goes out of the set O(θ_{1,T}, θ′_{1,T}). Formally, there always exists a suitable positive scalar ᾱ(ū_{1,T}, θ_{1,T}, θ′_{1,T}) such that R(θ_{1,T}, θ′_{1,T}) u_{1,T} ∉ O(θ_{1,T}, θ′_{1,T}) for all u_{1,T} = α ū_{1,T} with α > ᾱ(ū_{1,T}, θ_{1,T}, θ′_{1,T}) (for the reader's convenience, such quantities are depicted in Fig. 2.2). Hence, if we want to satisfy condition (2.21) [and consequently condition (2.18)] for all the switching patterns θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T such that θ_{1,τ} ≠ θ′_{1,τ}, it is sufficient to choose the scalar parameter α such that

$$ \alpha > \bar\alpha(\bar u_{1,T}) \triangleq \max_{\theta_{1,T}, \theta'_{1,T}} \bar\alpha(\bar u_{1,T}, \theta_{1,T}, \theta'_{1,T}), $$

where θ_{1,T} ∈ Θ^T and θ′_{1,T} ∈ Θ^T with θ_{1,τ} ≠ θ′_{1,τ}.

By defining Ȳ(θ_{1,τ}, u_{1,T}) as the set of all the possible observations vectors y_{1,T} associated with the switching pattern θ_{1,τ} and the control sequence u_{1,T}, for any possible initial continuous state and any possible noise sequence, i.e.,

$$ \bar Y(\theta_{1,\tau}, u_{1,T}) \triangleq \left\{ y_{1,T} \,\middle|\, y_{1,T} = F(\bar\theta_{1,T}) x_1 + G(\bar\theta_{1,T}) u_{1,T} + H(\bar\theta_{1,T}) w_{1,T} + v_{1,T};\ x_1 \in \mathbb{R}^n,\ \bar\theta_{1,T} \in \Theta^T,\ \bar\theta_{1,\tau} = \theta_{1,\tau},\ w_{1,T} \in W^T,\ v_{1,T} \in V^T \right\}, $$

the foregoing can be summarized in the following theorem.
Figure 2.2: Graphical interpretation of the condition on the control vectors in the presence of bounded noises
Theorem 2.8 Suppose that the sets W and V are bounded. Furthermore, suppose that system (2.13) is (τ, T)-DSO and let ū_{1,T} be a discerning control sequence for the noise-free system (2.13). Then there exists a positive scalar ᾱ(ū_{1,T}) such that, for every α > ᾱ(ū_{1,T}), the control sequence u_{1,T} = α ū_{1,T} is a discerning control sequence for system (2.12), that is,

$$ \bar Y(\theta_{1,\tau}, \alpha \bar u_{1,T}) \cap \bar Y(\theta'_{1,\tau}, \alpha \bar u_{1,T}) = \emptyset $$

for every θ_{1,τ} ≠ θ′_{1,τ}.
Theorem 2.8 ensures that, if the noise-free system (2.13) is (τ, T)-DSO, then it is always possible to choose a suitable discerning control sequence u_{1,T} such that, even in the presence of unknown but bounded disturbances, the switching pattern θ_{1,τ} can be determined uniquely from the observations vector y_{1,T} as the unique θ_{1,τ} such that y_{1,T} ∈ Ȳ(θ_{1,τ}, u_{1,T}).

It is important to remark that, in general, given the vector ū_{1,T}, determining the positive scalar ᾱ(ū_{1,T}) might be a formidable task. However, in the special case where the sets W and V are polytopes, such a task becomes quite simple, since the set O(θ_{1,T}, θ′_{1,T}) turns out to be a polytope as well. Therefore, it is possible to find a suitable matrix Ψ(θ_{1,T}, θ′_{1,T}) and a suitable vector ω(θ_{1,T}, θ′_{1,T}) such that

$$ O(\theta_{1,T}, \theta'_{1,T}) = \left\{ z \in \mathbb{R}^{mT} : \Psi(\theta_{1,T}, \theta'_{1,T})\, z \le \omega(\theta_{1,T}, \theta'_{1,T}) \right\}. $$

Then, each scalar ᾱ(ū_{1,T}, θ_{1,T}, θ′_{1,T}) can be found as the maximum α ≥ 0 such that

$$ \alpha\, \Psi(\theta_{1,T}, \theta'_{1,T})\, R(\theta_{1,T}, \theta'_{1,T})\, \bar u_{1,T} \le \omega(\theta_{1,T}, \theta'_{1,T}). $$
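Since the constraint is linear in α, each ᾱ(ū_{1,T}, θ_{1,T}, θ′_{1,T}) reduces to a componentwise ratio; a sketch of this computation follows (our own illustration, assuming 0 ∈ O so that ω ≥ 0 componentwise):

```python
import numpy as np

def alpha_bar(Psi, omega, R, u_bar):
    """Largest alpha >= 0 with alpha * Psi (R u_bar) <= omega, i.e. the
    last scaling for which alpha * R(theta, theta') u_bar is still inside
    the polytope O = {z : Psi z <= omega} (one pattern pair; the bound of
    Theorem 2.8 is the maximum of this value over all pairs)."""
    d = np.asarray(Psi) @ (np.asarray(R) @ np.asarray(u_bar))
    pos = d > 0                  # only positive components constrain alpha
    if not np.any(pos):
        return np.inf            # cannot occur if O is bounded and R u_bar != 0
    return float(np.min(np.asarray(omega)[pos] / d[pos]))
```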
2.2.4 Simulation Results
In this section, a statistical analysis of the proposed approach is given. The system considered is the following:

$$ x_{t+1} = A x_t + B u_t, \qquad t = 0, \dots, T-1, $$
$$ y_t = \theta' x_t + \eta_t, \qquad t = 0, \dots, T, $$

where x_t ∈ R³, θ ∈ R³, and u_t ∈ {u ∈ R² : u′u ≤ 9}. The matrices A and B are extracted randomly so as to form reachable and stable LTI systems; 1000 instances of the problem have been considered. The noise variance is set to 1. The estimation square error obtained with our approach (based on the solution of Problem 2.2) and with a randomly selected control vector is shown in the box plots of Figure 2.3. The time horizon T is set to 20. From a comparison of the two strategies, one can easily see that our approach significantly improves on the parameter estimation obtained with random inputs. In Figure 2.4 the optimal control is compared with a sinusoidal control; here the time horizon T is set to 40. Also in this case the optimal control significantly improves the parameter estimation effectiveness.

As a further example let us consider the mass-spring system:

$$ m \ddot{x} = F - k x - \omega \dot{x}, $$

where F is the applied force. The associated state form is:

$$ \dot{x} = \begin{bmatrix} 0 & 1 \\ -\frac{k}{m} & -\frac{\omega}{m} \end{bmatrix} x + \begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} u. \tag{2.23} $$
Figure 2.3: Box plots of the estimation square error (a) and their enlargement (b), optimal vs. random control policy
Figure 2.4: Box plots of the estimation square error (a) and their enlargement (b), optimal vs. sinusoidal control policy
We have considered the discrete-time version of (2.23) with a sampling time T_s = 0.1. Let us consider the following measurement equation:

$$ y_t = \left[ c\, N(\alpha_1, \gamma_1),\ c\, N(\alpha_2, \gamma_2),\ \dots,\ c\, N(\alpha_m, \gamma_m) \right]' \theta + \eta, $$

where

$$ N(\alpha_i, \gamma_i) = \frac{1}{\sqrt{2\pi\gamma_i^2}}\, e^{-\frac{1}{2\gamma_i^2}\left(x^1 - \alpha_i\right)^2} $$

and x¹ denotes the position of the mass. The system represents a controlled mass-spring system equipped with a sensor giving information about an object with a precision (signal-to-noise ratio) that increases with the alignment of the mass with the object (see Fig. 2.5). The control purpose is to gain, in a receding-horizon setting, the maximum amount of information about the objects (represented by 4 unknown parameters). Figures 2.6, 2.7, and 2.8 show the position of the system, the control, and the increasing information gain, respectively, setting T = 5, N = 1, m = 1, k = 1, ω = 1, α_1 = 5, α_2 = 2, α_3 = -5, α_4 = -2, γ_1 = γ_2 = γ_3 = γ_4 = 4, σ = 1, c = 10. We have compared our results with the information achieved by several sinusoidal input signals (the structure of the problem suggests that this functional form is close to the optimal one); in Fig. 2.8 the most informative sinusoidal input is compared with the control sequence generated by our approach.
Figure 2.5: Mass-spring system
Figure 2.6: Evolution of the position of the mass
Figure 2.7: Evolution of the control values
Figure 2.8: Information gain comparison (most informative sinusoidal control vs. most informative control)
Chapter 3
Active state estimation of nonlinear systems

In this chapter, we address the problem of actively estimating the state of a stochastic dynamic system over a finite horizon (FH). By active estimation we mean the problem of finding a feedback control law that aims at maximizing the amount of information on the state of the system. In particular, we formalize the above problem by means of a stochastic optimal control formulation, where the cost to be minimized is quantified by a suitable uncertainty measure. We assume that the classical linear quadratic (LQ) hypotheses are not met; indeed, if these hypotheses are satisfied, the well-known separation principle states that the choice of the control law does not affect the estimation process, that is, any control law is equally informative. In this chapter we formulate the problem in an information-theoretic setting by using the Renyi entropy as a measure of information about the state of the system. This choice is motivated by the possibility of deriving a closed-form expression for the amount of information, thus avoiding the need to resort to nonlinear programming techniques.

As previously said in Chapter 1, solving a FH stochastic optimal control problem requires the knowledge of the conditional probability density function p(x_t|I_t), t = 0, 1, ..., T-1, where x_t is the state vector of the controlled plant and I_t is the information vector consisting of all the measures taken by the controller up to stage t and of all the control actions performed up to stage t-1 (we assume that all stochastic vectors are mutually independent). Then, dynamic programming could be an effective tool to be applied, at least in principle.
This technique, however, entails the recursive computation of the state conditional probability. Unfortunately, the recursive computation of the conditional density function can be accomplished analytically in very few cases, typically under the classical LQG hypotheses. Since the conditional probability function is needed to calculate the measure of uncertainty, we resort to an approximation of the conditional probability. To this end, a Gaussian Sum Filtering approach is adopted [19].

The approximating technique adopted in this chapter to solve the resulting optimal control problem consists in assigning the control law a given structure, in which a certain number of parameters have to be determined in order to minimize the cost function. For such a fixed structure, a multilayer feedforward neural network is chosen. Constraining the control law to take on a fixed structure enables us to reduce the problem of finding the optimal control law (which is a functional optimization problem) to a nonlinear programming one. Such a technique has been used successfully to solve non-LQG deterministic and stochastic optimal control problems, in the finite, infinite, and receding horizon cases (see [10] and the references therein). Once the use of preassigned control structures has been decided, implementing the related control laws on multilayer feedforward neural networks appears quite a natural choice. Actually, this family of neural networks is characterized by the ability to approximate nonlinear functions (in our case, the optimal control and estimation functions) by using a number of parameters that may be surprisingly smaller than the one required by traditional expansions, like the polynomial and trigonometric ones (this applies to a class of functions to be approximated that is characterized by suitable smoothness assumptions). Such a property, proved by Barron [20], should explain the successful experimental results achieved by feedforward neural networks in solving many application problems, more or less similar to approximation problems.
3.1 Problem formulation
Let us consider a discrete-time stochastic nonlinear system given by

$$ x_{t+1} = f_{t+1}(x_t, u_t) + \xi_t, \qquad t = 0, \dots, T-1, \tag{3.1a} $$
$$ y_t = h_t(x_t) + \eta_t, \qquad t = 0, \dots, T, \tag{3.1b} $$
where xt ∈ Rn , yt ∈ Rm and ut ∈ Rp are the state vector, the measurement vector and the control vector respectively and where ξt ∈ Rn and ηt ∈ Rm are two independent white noise processes. The initial state x0 is known in probability
according to the initial density function p(x_0). Let us define the information vector, by which the DM makes its decisions, as I_t = col(y_0, ..., y_t, u_0, ..., u_{t-1}), t = 1, ..., T; I_0 = y_0. Then the DM's control functions take on the form

$$ u_t = \gamma_t(I_t), \qquad t = 0, 1, \dots, T-1. \tag{3.2} $$

While controlling the system, a process cost g_t(x_t, u_t) is incurred at any stage t. The final process cost is denoted by g_T(x_T). Our aim is to control the system over T decisional stages in order to gain the maximum amount of information on the state vector. It is possible to formalize the above statement by means of a stochastic optimal control problem, by adding to the cost function to be minimized a suitable term quantifying the uncertainty in the knowledge of the state vector. We briefly recall the AE Problem, already formalized in Section 1.3.

Problem 3.1 (AE) Find a sequence of control functions {u_0° = γ_0°(I_0), ..., u_{T-1}° = γ_{T-1}°(I_{T-1})} that minimizes the expected value of the cost functional

$$ J = \sum_{t=0}^{T-1} g_t(x_t, \gamma_t(I_t)) + g_T(x_T) + \beta L(I_T) \tag{3.3} $$

subject to the system equation (3.1a) and the measurement equation (3.1b).

The scalar β modulates the trade-off between the process cost and the term L(I_T), which denotes a suitable uncertainty measure penalizing the lack of knowledge on x_T at t = T. To formalize the problem we are dealing with, two concepts are important: the concept of information and the concept of sufficient statistic, which are the subjects of the next section.
3.2 Information measures and sufficient statistic
In this chapter we will concentrate on two information measures: the differential Shannon entropy and the differential Renyi entropy (see Chapter 1). Let us recall their respective definitions.

Definition 3.1 The Shannon entropy H(x) of a continuous random variable x ∈ X is defined by

$$ H(x) \triangleq -\int_X p(x) \log p(x)\, dx. \tag{3.4} $$
Definition 3.2 The Renyi entropy H_r(x) of a continuous random variable x ∈ X is defined by

$$ H_r(x) \triangleq \frac{1}{1-r} \log_2 \int_X p^r(x)\, dx, \tag{3.5} $$

where r ∈ R⁺ \ {1}. If we set r = 2, then (3.5) is called the quadratic Renyi entropy.

It is worth noting that the above measures are scalar functionals of the probability density function p(x). We will address the problem of choosing one among these measures later. Given a stochastic optimal control problem, the information on the state of the system is contained in the conditional probability p(x_t|I_t). It is possible to show that this is indeed a sufficient statistic for the problem, and its importance is both analytical and conceptual [11]. The conditional probability p(x_t|I_t) can be generated recursively by the Bayes law and can be viewed as the state of a controlled discrete-time dynamic system,

$$ p(x_{t+1}|I_{t+1}) = \Phi\left(p(x_t|I_t), u_t, y_{t+1}\right), \qquad t = 0, 1, \dots, T-1, $$

where Φ(·, ·, ·) represents the recursive Bayes updating law

$$ \Phi\left(p(x_t|I_t), u_t, y_{t+1}\right) = \frac{p(y_{t+1}|x_{t+1})\, p(x_{t+1}|I_t, u_t)}{p(y_{t+1}|I_t, u_t)}, \tag{3.6} $$

where

$$ p(x_{t+1}|I_t, u_t) = \int_{\mathbb{R}^n} p(x_{t+1}|x_t, u_t)\, p(x_t|I_t)\, dx_t, $$
$$ p(y_{t+1}|I_t, u_t) = \int_{\mathbb{R}^n} p(y_{t+1}|x_{t+1})\, p(x_{t+1}|I_t, u_t)\, dx_{t+1}. $$

In Fig. 3.1 a block diagram showing the evolution of the conditional probability density function according to (3.6) is presented. Unfortunately, an explicit form for (3.6) is not available in general. An important exception is when the system is linear and the random variables are normally distributed: in this case the conditional probability is also normal and (3.6) takes on the form of the well-known Kalman filter equations.
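On a finite grid the integrals of (3.6) become sums, and one update of the Bayes law can be sketched as follows (illustrative code; the callables trans and lik are placeholders for the transition and measurement densities):

```python
import numpy as np

def bayes_update(prior, trans, lik, u, y):
    """One application of the recursive Bayes law (3.6) on a finite grid.

    prior[i]       ~ p(x_t = x_i | I_t)
    trans(j, i, u) ~ p(x_{t+1} = x_j | x_t = x_i, u_t = u)
    lik(y, j)      ~ p(y_{t+1} = y | x_{t+1} = x_j)"""
    n = prior.size
    # Prediction: p(x_{t+1} | I_t, u_t) = sum_i p(x_{t+1} | x_i, u) p(x_i | I_t).
    pred = np.array([sum(trans(j, i, u) * prior[i] for i in range(n))
                     for j in range(n)])
    # Correction: multiply by the likelihood and normalize
    # (the normalization constant is p(y_{t+1} | I_t, u_t)).
    post = np.array([lik(y, j) for j in range(n)]) * pred
    return post / post.sum()
```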
Figure 3.1: Sufficient statistic propagation according to Bayes rule.
Since we are addressing the problem under general assumptions and the conditional probability function is needed to calculate the measure of uncertainty, we must resort to an approximation of the conditional probability. A possible approach is to approximate the conditional probability density functions p(x_t|I_t) by means of suitable fixed-structure functions, in which a finite-dimensional parameter vector Ŝ_t has to be fixed in order to approximate the true function. The initial p.d.f. p(x_0) can be approximated by optimizing the corresponding parameter vector Ŝ_{-1} with respect to some suitable loss function. An updating rule to propagate in time the parameters characterizing such a representation is then needed. Given the parameter vector Ŝ_t characterizing p(x_t|I_t), the applied control u_t, and the measure y_{t+1}, such a rule will take on the form

$$ \hat S_0 = \hat\Phi_0(\hat S_{-1}, y_0), \qquad \hat S_{t+1} = \hat\Phi_{t+1}(\hat S_t, u_t, y_{t+1}), \qquad t = 0, \dots, T-1. \tag{3.7} $$

One possible technique of this kind is described in the next section.
3.2.1 Gaussian Sum Filter
It can be shown [19] that any probability density p(x) can be approximated by a Gaussian sum representation of the form

$$ \hat p(x) = \sum_{i=1}^{z} \alpha_i\, N(x - \mu_i, P_i), \qquad \sum_{i=1}^{z} \alpha_i = 1, \quad \alpha_i \ge 0\ \forall i, \tag{3.8} $$

and p̂ converges uniformly to p(x) as the number of basis functions z increases [21]. Applying this technique to approximate p(x_t|I_t), we obtain a fixed-structure parametric representation p̂(x_t, Ŝ_t), in which the approximate sufficient statistics are given by Ŝ_t = {α_{ti}, μ_{ti}, P_{ti}, i = 1, ..., z_t}, t = 0, ..., T:

$$ \hat p(x_t, \hat S_t) = \sum_{i=1}^{z_t} \alpha_{ti}\, N(x_t - \mu_{ti}, P_{ti}). \tag{3.9} $$

It is possible to show that, if p(x_t|I_{t-1}) has a Gaussian sum representation, then p(x_t|I_t) and consequently p(x_{t+1}|I_t) admit the same representation, whose accuracy improves as P_{ti} → 0. In the limit as z_t → ∞, P_{ti} → 0, and P′_{ti} → 0, the Gaussian sum algorithm gives the exact evolution of the a posteriori density function p(x_t|I_t) [22]. Each Gaussian term in the sum may be propagated independently using an Extended Kalman Filter (EKF), and the result is then normalized by equating the zeroth moment of each Gaussian distribution [23]. This is the so-called Gaussian Sum Filter (GSF). Let

$$ p(x_t|I_t) = \sum_{i=1}^{z_t} \alpha_{ti}\, N(x_t - \mu_{ti}, P_{ti}) \tag{3.10} $$

and

$$ p(x_{t+1}|I_t) = \sum_{i=1}^{z_{t+1}} \alpha'_{(t+1)i}\, N(x_{t+1} - \mu'_{(t+1)i}, P'_{(t+1)i}) \tag{3.11} $$

be the Gaussian sum representations of the ex-post and ex-ante probability density functions, respectively,¹ and let p(x_0|I_{-1}) be the approximation of p(x_0) according to (3.8).

¹ The ex-post and ex-ante probabilities are the probabilities updated using information up to the current time and up to the previous time, respectively.
Two cases must be taken into account to obtain (3.11) from (3.10). The first one is when the system noise covariance Q_t is comparable to that of the terms of the Gaussian sum, P_{ti}. Then the prediction step is performed as follows:

$$ z_{t+1} = z_t, \qquad F_{(t+1)i} = \left.\frac{\partial f_{t+1}}{\partial x_t}\right|_{\mu_{ti}, u_t}, \qquad \mu'_{(t+1)i} = f_{t+1}(\mu_{ti}, u_t), $$
$$ P'_{(t+1)i} = F_{(t+1)i}\, P_{ti}\, F_{(t+1)i}' + Q_t, \qquad \alpha'_{(t+1)i} = \alpha_{ti}. $$

Following the same approach as in the EKF, the innovation step is given by:

$$ H_{(t+1)i} = \left.\frac{\partial h_{t+1}}{\partial x_{t+1}}\right|_{\mu'_{(t+1)i}}, \qquad N_{(t+1)i} = H_{(t+1)i}\, P'_{(t+1)i}\, H_{(t+1)i}' + R_{t+1}, $$
$$ K_{(t+1)i} = P'_{(t+1)i}\, H_{(t+1)i}'\, N_{(t+1)i}^{-1}, \qquad \beta_{(t+1)i} = N\left(y_{t+1} - h_{t+1}(\mu'_{(t+1)i}),\ N_{(t+1)i}\right), $$
$$ \mu_{(t+1)i} = \mu'_{(t+1)i} + K_{(t+1)i}\left[y_{t+1} - h_{t+1}(\mu'_{(t+1)i})\right], \qquad P_{(t+1)i} = P'_{(t+1)i} - K_{(t+1)i}\, H_{(t+1)i}\, P'_{(t+1)i}, $$
$$ \alpha_{(t+1)i} = \frac{\alpha'_{(t+1)i}\, \beta_{(t+1)i}}{\sum_{h=1}^{z_{t+1}} \alpha'_{(t+1)h}\, \beta_{(t+1)h}}. $$

The second case is when the covariance Q_t is large compared to the P_{ti}. In this case it may be necessary to introduce a Gaussian sum representation for p(ξ_t), to prevent all the approximating Gaussians from collapsing into a single term and the GSF from reducing to a single EKF [19]. In any case, if the elements of the approximating covariances exceed a fixed threshold depending on the initial approximation, it may be necessary to reapproximate the probability density according to (3.9) [22].
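A compact sketch of one GSF step for the first case (Q_t comparable to the P_{ti}) is given below; the system and measurement maps and their Jacobians are user-supplied callables, and the code is an illustration rather than the thesis' implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gsf_step(alphas, mus, Ps, f, F_jac, h, H_jac, Q, R, u, y):
    """One prediction + innovation step of the Gaussian Sum Filter: every
    Gaussian term is propagated by an EKF and the weights are rescaled by
    the innovation likelihoods beta_i."""
    new_mus, new_Ps, betas = [], [], []
    for mu, P in zip(mus, Ps):
        # Prediction of the single Gaussian term.
        F = F_jac(mu, u)
        mu_pred = f(mu, u)
        P_pred = F @ P @ F.T + Q
        # EKF innovation of the term.
        H = H_jac(mu_pred)
        N = H @ P_pred @ H.T + R                    # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(N)
        innov = np.atleast_1d(y - h(mu_pred))
        betas.append(multivariate_normal.pdf(innov,
                                             mean=np.zeros_like(innov), cov=N))
        new_mus.append(mu_pred + K @ innov)
        new_Ps.append(P_pred - K @ H @ P_pred)
    # Reweight: alpha_i <- alpha_i * beta_i / sum_h alpha_h * beta_h.
    w = np.asarray(alphas) * np.asarray(betas)
    return w / w.sum(), new_mus, new_Ps
```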
3.2.2 Information content of a Gaussian Sum

The Shannon entropy of a Gaussian mixture is not computable in closed form. This is not the case for the Renyi quadratic entropy, whose attractiveness is given
by the following theorem.

Theorem 3.1 Let x be a random variable such that

$$ p(x) = \sum_{i=1}^{z} \alpha_i\, N(x - \mu_i, P_i). $$

Then the quadratic Renyi entropy H_2(x) can be expressed in closed form as

$$ H_2(x) = -\log_2\left[\alpha^T C \alpha\right], $$

where α = [α_1, α_2, ..., α_z]^T and C is the symmetric matrix of elements c_{ji} = N(μ_j - μ_i, P_j + P_i).

Proof: Recalling the following result, valid for the product of two Gaussians,

$$ N(x - \mu_i, P_i)\, N(x - \mu_j, P_j) = c_{ij}\, N(x - \mu_{ij}, P_{ij}), $$

where

$$ c_{ij} = c_{ji} = N(\mu_i - \mu_j, P_i + P_j), \qquad \mu_{ij} = \left(P_i^{-1} + P_j^{-1}\right)^{-1}\left(P_i^{-1}\mu_i + P_j^{-1}\mu_j\right), \qquad P_{ij} = \left(P_i^{-1} + P_j^{-1}\right)^{-1}, $$

it is possible to develop H_2(x) as follows:

$$ H_2(x) = -\log_2 \int_{\mathbb{R}^n} \left[\sum_{i=1}^{z} \alpha_i\, N(x - \mu_i, P_i)\right]^2 dx = -\log_2 \sum_{i=1}^{z}\sum_{j=1}^{z} \alpha_i\, c_{ij}\, \alpha_j = -\log_2\left[\alpha^T C \alpha\right]. $$
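The closed form of Theorem 3.1 is straightforward to implement. The following sketch (illustrative naming) computes H_2 of a Gaussian sum; for a single Gaussian it reduces to the known value -log2 N(0, 2P), which provides a simple sanity check:

```python
import numpy as np
from scipy.stats import multivariate_normal

def renyi2_entropy(alphas, mus, Ps):
    """Quadratic Renyi entropy of a Gaussian sum (Theorem 3.1):
    H2 = -log2(alpha' C alpha), with c_ij = N(mu_i - mu_j, P_i + P_j)."""
    z = len(alphas)
    C = np.empty((z, z))
    for i in range(z):
        for j in range(z):
            diff = np.atleast_1d(mus[i] - mus[j])
            C[i, j] = multivariate_normal.pdf(
                diff, mean=np.zeros_like(diff), cov=Ps[i] + Ps[j])
    a = np.asarray(alphas)
    return -np.log2(a @ C @ a)
```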
3.3 Approximated problem solution
In the previous section we have shown how it is possible to obtain an approximate sufficient statistic Ŝ_T from a Gaussian sum representation; furthermore, it has been shown that its information content can be evaluated analytically. Consequently, (3.3) can be redefined as

$$ \hat J\left(\hat S_{-1}, x_0, \xi, \eta\right) = \sum_{t=0}^{T-1} g_t(x_t, \bar\gamma_t(\hat S_t)) + g_T(x_T) + \beta H_2(\hat S_T), \tag{3.12} $$

where the control laws take on the form u_t = γ̄_t(Ŝ_t). Of course, by addressing the cost Ĵ instead of the cost J, an approximation is introduced, as the approximate statistic Ŝ_t is used instead of the true infinite-dimensional one, p(x_t|I_t).

Even assuming the control laws to depend on the approximate statistic, the AE Problem is not easy to solve: in fact, the control laws are functions to be determined by minimizing the cost (3.12), as in a functional optimization problem. Following the lines of [10], by exploiting again (as done in approximating p(x_t|I_t)) the properties of parameterized structures, we fix the structure of the control functions, i.e., u_t = γ̂(Ŝ_t, w_t), where w_t is the set of parameters of the chosen representation and γ̂ is a nonlinear approximator. This technique takes the name of Extended Ritz Method [10] and allows us to reformulate Problem 3.1 in a more tractable form. For the sake of notational compactness, let us define

$$ \xi \triangleq \{\xi_0, \dots, \xi_{T-1}\}, \qquad \eta \triangleq \{\eta_0, \dots, \eta_T\}, \qquad w \triangleq \mathrm{col}[w_0, \dots, w_{T-1}]. $$

We define as the Active Neural Estimation Problem the following.

Problem 3.2 (ANE) Let u_t = γ̂(Ŝ_t, w_t) and let Ŝ_{-1} be the parameters of the Gaussian sum representation of p(x_0). Find the sequence of optimal parameter vectors w° = {w_0°, ..., w_t°, ..., w_{T-1}°} that minimizes the cost functional

$$ \bar J = \mathop{\mathrm{E}}_{x_0, \xi, \eta} \hat J\left(\hat S_{-1}, x_0, \xi, \eta, w\right), \tag{3.13} $$

where in (3.12) the control functions γ̄(Ŝ_t) are replaced by γ̂(Ŝ_t, w_t).
Problem 3.1 has thus been reduced to a nonlinear programming problem (NLP), which can be solved by applying the gradient method. Let h be the generic descent step of the algorithm. Then the solution of Problem 3.2 may be calculated by applying the following updating rule:

$$ w^{h+1} = w^h - t_h\, \nabla_w^h \bar J, \qquad h = 0, 1, \dots, $$

where $\nabla_w^h \bar J$ denotes $\nabla_w \bar J|_{w^h}$. The evaluation of (3.13) and of its gradient is in general difficult, if not impossible, to achieve. The application of the stochastic gradient method [24] allows us to overcome this new problem. The new updating rule becomes

$$ w^{h+1} = w^h - t_h\, \nabla_w^h J, \qquad h = 0, 1, \dots, $$

where t_h is the step-size of the algorithm and $\nabla_w^h J$ is evaluated at each step on a particular realization of x_0, ξ, and η. For the sake of simplicity, let us consider a process cost depending only on the control vectors, that is,

$$ J = G(u_0, \dots, u_{T-1}) + \beta H_2(\hat S_T), \qquad G(u_0, \dots, u_{T-1}) = \sum_{t=0}^{T-1} g_t(u_t). $$

Finally, let us briefly describe the learning mechanism, which includes the backpropagation (BP) algorithm through which $\nabla_w^h J$ is evaluated. Each training iteration consists of two main steps, which are repeated up to convergence. In the former, called the forward pass, the random variables are generated according to their distributions and the system is simulated up to the last stage. More specifically, the following steps are performed: 1) the initial state x_0^h and the random vectors ξ^h and η^h are randomly generated; 2) the optimal control is generated and the approximate sufficient statistic is propagated up to the last stage T. Then we obtain

$$ J_h \triangleq J\left(\hat S_{-1}, x_0^h, \xi^h, \eta^h, w^h\right). $$
DIST - University of Genoa
§ 3.4 - Active state estimation of nonlinear systems
53
initialization of the algorithm is given by defining the following quantities: ψT λT
∂Jh ∂H2 = βT , ∂ SˆT ∂ SˆT ∂Jh ∂G ∂H2 ∂H2 , = + βT = βT ∂xT ∂xT ∂xT ∂xT ˆ T ∂hT ˆ T ∂hT ∂Φ ∂H2 ∂ Φ = βT = ψT . ∂yT ∂xT ∂ SˆT ∂yT ∂xT ,
Then, for t = T − 1, . . . , 0, the following quantities are generated recursively: ∂Jh ∂ut ∂H2 ∂ut
µ
= =
∂Jh ∂ut
=
∇hwt J
=
∂Jh ∂ Sˆt
∂G ∂H2 + βT , ∂ut ∂ut ˆ t+1 ∂H2 ∂ Φ ∂H2 ∂ft+1 + , ˆ ∂xt+1 ∂ut ∂ St+1 ∂ut ˆ t+1 ∂G ∂Φ ∂ft+1 + ψt+1 + λt+1 , ∂ut ∂ut ∂ut ¯ ∂Jh ∂ˆ γ ¯¯ , ∂ut ∂wt ¯wh t
¶ , γ
ψt =
∂Jh ∂ˆ γ , ∂ut ∂ Sˆt ˆ t+1 ∂Jh ∂Φ = ψt+1 + ∂ Sˆt ∂ Sˆt
µ
∂Jh ∂ Sˆt
¶ , γ
ˆ t ∂ht ∂Jh ∂Jh ∂ft+1 ∂Jh ∂ Φ = + ∂xt ∂xt+1 ∂xt ∂ Sˆt ∂yt ∂xt ˆ t ∂ht ∂ft+1 ∂Φ = λt+1 + ψt . ∂xt ∂yt ∂xt
λt =
ˆ t+1 /∂ Sˆt , ∂ Φ ˆ t /∂yt , ∂ Φ ˆ t+1 /∂ut have been obtained by difThe quantities ∂ Φ ferentiating (3.7) and the final stage Renyi Entropy; calculations are reported in Appendices C and D . The quantities ∂ˆ γ /∂wt and ∂ˆ γ /∂ Sˆt are evaluated by applying the BP algorithm to each net [10].
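The overall two-pass scheme can be summarized by the following skeleton (a sketch only: the two callables stand for the thesis-specific forward simulation and the backward recursions above, or for any other way of obtaining a sample gradient):

```python
import numpy as np

def train(w, sample_scenario, simulate_cost_grad, steps, step0=0.01):
    """Stochastic-gradient training loop for the network weights w.

    sample_scenario()        draws x_0, xi, eta from their distributions;
    simulate_cost_grad(w, s) runs the forward pass (system, controls and
                             statistic up to stage T) and the backward
                             pass, returning the sample cost J_h and its
                             gradient with respect to w."""
    for h in range(steps):
        scenario = sample_scenario()                  # forward-pass draw
        J_h, grad = simulate_cost_grad(w, scenario)   # backward-pass gradient
        t_h = step0 / (1.0 + h / 100.0)               # diminishing step-size
        w = w - t_h * grad                            # w^{h+1} = w^h - t_h grad
    return w
```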
3.4 Simulation results
In this section we present two applications of the AE Problem. The proposed examples show the effectiveness of the active neural control in a context generally considered as difficult, characterized by:

- hard nonlinearities;
- non-Gaussian random variables;
- non-stationary processes.

In the following examples the process cost has been taken as the quadratic form

$$ G(u_0, \dots, u_{T-1}) = \sum_{t=0}^{T-1} u_t' \Gamma u_t, $$

where Γ > 0 is a matrix introduced to limit the magnitude of the control vectors and keep them physically feasible. In the following examples we have chosen one-hidden-layer neural networks made up of twenty neurons.
3.4.1 Localization
Let us consider an observer O who is allowed to move along a circumference of given radius r. The observer's task is to determine his angular position, not known with certainty, using a fixed point P as an absolute reference. Figure 3.2 gives a geometric representation of the problem. The angle γ is the angular position of the observer and θ is the measure available to him. The mathematical formulation of the problem is the following:

$$ \gamma_{t+1} = \gamma_t + u_t + \xi_t, \qquad t = 0, 1, \dots, $$
$$ \theta_t = \tan^{-1}\left(\frac{y_P - r \sin\gamma_t}{x_P - r \cos\gamma_t}\right) + \eta_t, \qquad \gamma_0 \sim U(0, 2\pi), $$

where x_P and y_P are the cartesian coordinates of P. The initial probability density is considered uniform, to model the complete lack of knowledge of the observer. In the training of the neural controllers we considered a time horizon of T = 18 steps and the absence of system noises.
Figure 3.2: Geometric representation of the problem.
The evolution of the Renyi quadratic entropy obtained by the active neural control is represented in Fig. 3.3, as the average at each step over 20,000 simulations, and is compared with those obtained by the application of both random and heuristic controls. The comparison is performed by taking the magnitudes of the test controls comparable with the neural ones, so as to account for the process cost. Heuristic controls have been chosen according to the idea that the most valuable states for obtaining an information gain are those for which we have a large measure variation. An example of the trajectories followed by the system is depicted in Figure 3.4 as a polar plot, where the angle is the state γ_t and the radial distance increases with time. The dotted line shows the two angles for which the measurement channel presents an extremum. An example of the evolution of the probability density function is represented in Figure 3.5: starting from a uniform probability density, we succeeded in controlling the system so as to reduce the uncertainty on the state.
Figure 3.3: Entropy mean evolution along T steps (heuristic, random, and neural controls)

Figure 3.4: Two examples of trajectories

Figure 3.5: An example of density evolution (starting density and steps k = 6, k = 12, k = T)

3.4.2 Bearing Only Motion Planning

Let us consider two generic geometric points A and B in an absolute reference frame F. Point B, whose absolute position x_B is known with certainty, leaves the origin of F according to a rectilinear uniform motion denoted by v_B, which can be directed anywhere. Point A, instead, whose absolute position x_A is known
with uncertainty, has to follow the most appropriate trajectory to establish as well as possible its relative position x = x_B - x_A. With reference to Fig. 3.6, A can only measure the angle θ between the vector x and the horizon.

Figure 3.6: Geometric representation of the problem.

The mathematical formulation of the problem is the following:

$$ x_{B(t+1)} = x_{Bt} + c v_B, \qquad x_{A(t+1)} = x_{At} + u_t - \xi_t, \qquad y_t = \tan^{-1}\left(\frac{x_{t2}}{x_{t1}}\right) + \eta_t, \qquad t = 0, 1, \dots, $$

or, equivalently, by considering as system state the relative position between the two points, x = x_B - x_A:

$$ x_{t+1} = x_t - u_t + c v_B + \xi_t, \qquad y_t = \tan^{-1}\left(\frac{x_{t2}}{x_{t1}}\right) + \eta_t, \qquad t = 0, 1, \dots, $$

where the term c v_B + ξ_t can be regarded as a WN(c v_B, Σ_ξ) acting on the system. Figures 3.7, 3.8, and 3.9 are, for the BOMP, the analogues of Figures 3.3, 3.4, and 3.5 for the localization problem, and show how satisfactory our neural control policy is in terms of information gain. In Fig. 3.8, in particular, for clarity the separate trajectories followed by the two points are represented instead of the state of the system; point B is the one that starts from the origin of F. It must be noticed that good behaviours have been obtained even if B does not move according to the rectilinear uniform motions for which the neural network has been trained.
Figure 3.7: Entropy mean evolution along T steps (heuristic, random, and neural controls)
Figure 3.8: An example of trajectories.
Figure 3.9: Example of density evolution (starting density and steps k = 4, k = 8, k = N)
3.5 A geometric interpretation
The minimization of H_2 involves the maximization of the quadratic form α_t′ C_t α_t. This leads us to a geometric interpretation of the active control based on the Renyi quadratic entropy. Let A ⊂ R^{z_t} be the admissibility region of the weights α_t and Q_t = α_t′ C_t α_t be the paraboloid in R^{z_t} with respect to the variables α_t. The maximization of Q_t in A may be obtained by increasing the eigenvalues of C_t (or, equivalently, its determinant) and by making α_t as parallel as possible to the eigenvector corresponding to the greatest eigenvalue of C_t.

Let $\lambda_t = \max_i \lambda_t^{(i)}$, where the $\lambda_t^{(i)}$ are the eigenvalues of C_t. In Figure 3.10 the evolution of λ_t is presented for a realization of the first example. Let us denote by $v_t^{(i)}$, i = 1, ..., z_t, the eigenvectors of C_t at stage t. The alignment of α_t and $v_t^{(i)}$ is evaluated through $|\alpha_t^T v_t^{(i)}|$, which is bounded above by the Cauchy-Schwarz inequality:

$$ |\alpha_t^T v_t^{(i)}| \le \|\alpha_t\| \|v_t^{(i)}\|, \qquad i = 1, \dots, z_t. $$

Fig. 3.11 represents the previous analysis for the first example. The eigenvectors are ordered with respect to the corresponding eigenvalue. It can be noted that, at the last stage, α_T is almost parallel to the last eigenvector, that is, the direction in which there is the most significant variability.
Figure 3.10: Eigenvalues analysis.
k=6
0.15
0.15 v(i) α
0.2
v α
0.2
(i)
0.1 0.05
0.05 0
50 i k = 12
0
100
0.4
0.4
0.3
0.3 v(i) α
(i)
v α
0
0.1
0.2 0.1 0
0
50 i k=T
100
0
50 i
100
0.2 0.1
0
50 i
100
0
Figure 3.11: Eigenvectors analysis.
Chapter 4

Specializations in discrete environments

4.1 Active Identification of unknown graphs
In this section we consider the problem of exploring an a-priori partially unknown graph. This general problem can model a number of interesting decisional and control problems such as extra-planetary robotic exploration, sea floor mapping, search for free channels in communication networks, etc. The graph exploration can be formulated as a problem in which a controller (or Decision Maker, DM) must acquire all the information about a finite set of unknown parameters. Considering the problem of exploration in a robotic framework, a possible way to represent an unknown environment is to model it as a graph defined by partitioning the domain into a finite number of nodes. The environment is unknown due to the possible absence of links. Thus, two distinct states, free and not free, can be associated with each link of the graph depending on the presence of obstacles on it. At each exploration step the agent must move from the node where it is currently located to another selected among the adjacent ones. The objective of the exploration is to identify the state of all the links reachable from the agent's starting point. A similar model for the robotic exploration problem can be found in [25]. One of the first exploration problems formulated on graphs was the on-line Chinese postman problem (OCPP) [26, 27]: an agent must traverse all edges of an unknown, directed, and strongly connected graph and return to the starting vertex. The agent has the map of all explored edges and vertices and recognizes
them when they are encountered again; moreover, it knows which edges are incident to the explored vertices, but it does not know the vertices they lead to. For this problem the authors introduced some heuristic algorithms and gave upper and lower bounds for them. Another exploration problem is the on-line TSP (OTSP), in which an agent must visit all the vertices of an unknown, directed, and strongly connected graph with the assumption that it knows which edges are incident to the explored vertices and, in this case, also the vertices they lead to [28, 29]. A possible variant is the piecemeal search, in which the DM must return every so often to its starting point before completing the exploration (for example, it must return to the starting point to refuel because of its limited fuel availability) [30].

We address the problem in a very general setting, where the DM sensors are affected by noise. In this case the problem can be formulated as a stochastic optimal control problem on a continuous but finite-dimensional state space. Moreover, under a particular assumption, it is possible to consider a model discrete in time and space and, by using the concept of entropy, to reformulate the problem as a stochastic shortest path problem (see [12]). When addressing directed graphs, by suitably decomposing the set of nodes, one can apply standard dynamic programming (DP) algorithms to obtain the optimal solutions (see [31] for a time-varying setting and [32] for the optimization of particular utility functions). The same does not hold true in the case of undirected graphs, where the concept of "information set" assumes major importance. By exploiting the concept of "frontier nodes", an equivalent formulation of the original problem is given, for which any policy is a "proper" one and the DP value iteration algorithm converges in a finite number of steps. The complexity of the problem dealt with leads us to consider techniques similar to Neuro-Dynamic Programming (see [12], [33]). In such a way the original functional optimization problem is reduced to a nonlinear programming one, consisting in selecting the optimal values for the "free" parameters of the neural approximators (see also [10] and the references therein). Thanks to the powerful approximating properties of the latter, the original problem can be approximated with any desired degree of accuracy. Some results by Barron suggest that the proposed method might avoid the "curse of dimensionality".
4.1.1 Statement of the exploration problem on stochastic graphs
Let us consider a Decision Maker (DM) moving on an undirected graph G = (V, E) where
- V = {1, 2, ..., N} is the set of nodes (n_0 ∈ V is the starting node);
- E is the set of undirected links (i, j) connecting the nodes i and j; let M = |E|;
- c_{i,j} is the length of link (i, j);
- E_s ⊆ E is the set of stochastic links, whose lengths can take on the values c_{i,j} = C_{i,j} or c_{i,j} = ∞, where 0 < C_{i,j} < ∞; let M_s = |E_s|.

For each (i, j) ∈ E_s, let us define a random variable θ_{i,j} that represents the existence of the corresponding link and can take on two values:

$$ \theta_{i,j} = \begin{cases} 1 & \text{if } c_{i,j} = C_{i,j} \\ 0 & \text{if } c_{i,j} = \infty. \end{cases} \tag{4.1} $$

Moreover, let θ = col[θ_{i,j}, (i, j) ∈ E_s] and let p(θ) be the related a-priori probability mass function. We shall consider graphs with self-cycles, i.e., graphs for which a link (i, i) connecting each vertex i ∈ V to itself exists. We shall assume the length of such links to be c_{i,i} = 1, i ∈ V, and their existence to be perfectly known.

Before starting, the DM perfectly knows the topology of the graph, but it has only a probabilistic knowledge of the subset E_s of links. For the sake of simplicity, in this section we consider a two-valued probability mass function for the length of the links in E_s. However, the technique proposed in the following is suitable to be applied in the case of more complex discrete probability mass functions. The length of a link (i, j) ∈ E_s is only probabilistically known to the DM until it visits i or j for the first time. When visiting a node i, the lengths of all the links departing from i become known. The DM gathers information about the other links (not departing from i) according to a given noisy measurement equation. To summarize this, let us introduce the following measurement function:

$$ g^{i,j}(n, \theta, \xi) = \begin{cases} 1 & \text{if } (n = i \text{ or } n = j) \text{ and } \theta_{i,j} = 1 \\ 0 & \text{if } (n = i \text{ or } n = j) \text{ and } \theta_{i,j} = 0 \\ \tilde g^{i,j}(n, \theta, \xi) & \text{otherwise,} \end{cases} \tag{4.2} $$

where n represents the node, θ the vector of random variables related to the existence of stochastic links, and ξ a random noise vector affecting the measure.
Moreover, let g(n, θ, ξ) = col[g^{i,j}(n, θ, ξ), (i, j) ∈ E_s]. At time t, when the DM is at node n_t, the measurement vector y_t is obtained as

$$ y_t = g(n_t, \theta, \xi_t), \tag{4.3} $$

where ξ_t is a general i.i.d. stochastic process. It is worth noting that the above measurement equation can effectively model many real exploration problems. As an example, think of a robotic exploration problem in which the DM can acquire "perfect" information only on the neighboring portion of terrain, but its sensors are affected by an uncertainty that increases with the distance. Another example is the exploration of an "unknown" telecommunication network, where the information about links not departing from the current node can be corrupted by noise. Without loss of generality, the links adjacent to the starting node n_0 will be assumed to be a-priori known. As the links departing from the visited nodes become perfectly known, the transitions on the graph always turn out to be deterministic, so the information vector for the DM can be defined as

$$ I_t \triangleq \mathrm{col}[y_0, y_1, \dots, y_t], \qquad t = 0, 1, \dots. \tag{4.4} $$
We shall assume the DM to have a perfect memory. Let p(θ|I) be the conditional probability mass function of the stochastic vector θ when the information vector I has been acquired. Such a mass function can be initialized with the a-priori mass function p(θ) and can be updated by the Bayes law. It will be useful to describe the movement of the DM on the graph by introducing the following elementary discrete-time dynamic system:

$$ n_{t+1} = u_t, \qquad t = 0, 1, \dots, T, \qquad u_t \in U(n_t, I_t), \tag{4.5} $$

where

$$ U(i, I) = \left\{ j \in V : \exists\, (i,j) \in E \setminus E_s \ \text{ or } \ \exists\, (i,j) \in E_s,\ p(\theta_{i,j} = 1 \mid I) = 1 \right\} $$

is the set of "known" neighboring nodes. In particular, U(n_t, I_t), t = 0, 1, ..., corresponds to the set of all the nodes connected to n_t by a finite-length link. At any stage t = 0, 1, ..., the DM makes its decision on the basis of the current node n_t and of the information vector I_t, that is, by the control function

$$ u_t = \gamma(n_t, I_t), \qquad t = 0, 1, \dots. \tag{4.6} $$
Let us define the process cost as the length of a path from n_0 to n_T:

$$ J = \sum_{t=0}^{T-1} h(n_t, u_t), \tag{4.7} $$

where we have defined h(n_t, u_t) = c_{n_t, u_t}. We have described how the DM moves and acquires information on the graph. Let us now define the objective the DM must achieve. Informally speaking, the goal of the DM is to gather all the possible information about the graph with the minimum path length. In order to formalize this, in the following we shall use the concepts of information and entropy as given by Shannon [2]. The entropy H(x) of a discrete random variable x over its domain X is defined as

$$ H(x) \triangleq -\sum_{\bar x \in X} p(x = \bar x) \log p(x = \bar x). $$

In particular, the entropy related to the random vector θ when the information vector I has been collected is defined as

$$ H(\theta \mid I) = -\sum_{\bar\theta \in \{0,1\}^{M_s}} p(\theta = \bar\theta \mid I) \log p(\theta = \bar\theta \mid I). $$
Let I(I′, I″) = -ΔH(I′, I″) = H(θ|I′) - H(θ|I″). I(I′, I″) represents the difference between the "quantities of knowledge" on the stochastic parameters related to the two information vectors I″ and I′. In particular, I(I_{t_1}, I_{t_2}) is the information gain acquired by the DM in the time interval [t_1, t_2]. At time t, given an information vector I_t and the current position n_t, the exploration task can be considered "completed" if the following relation holds:

$$ \lim_{\bar t \to \infty} \max_{I_{\bar t}} I(I_t, I_{t+\bar t}) \le \epsilon, \tag{4.8} $$

subject to (4.3) and (4.5), where ε is an arbitrarily small constant. The existence of the maximum is guaranteed by the boundedness of the information gain:

$$ I(I', I'') \le H(\theta) \le |E_s|, \qquad \forall I', I''. $$
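Under the independence assumption adopted below in Section 4.1.2, H(θ|I) is just the sum of the binary entropies of the per-link probabilities, so the information gain can be computed directly from the sufficient statistic. A sketch follows (illustrative names, entropy in bits):

```python
import numpy as np

def graph_entropy(p):
    """H(theta | I) for independent links: the sum of the binary entropies
    of the link-existence probabilities p[i] (each term is at most 1 bit,
    consistent with the bound H(theta) <= |Es|)."""
    q = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(np.sum(-q * np.log2(q) - (1 - q) * np.log2(1 - q)))

def information_gain(p_before, p_after):
    """I(I', I'') = H(theta | I') - H(theta | I''): the knowledge acquired
    between two information vectors."""
    return graph_entropy(p_before) - graph_entropy(p_after)
```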
The sense of condition (4.8) is quite simple: the exploration process can be considered as terminated if the DM, in the future, cannot gather an amount of information greater than a given ε. We can now state the exploration problem in the form of a usual stochastic optimal control problem.

Problem 4.1 (SGEP - Stochastic Graph Exploration Problem) Find the optimal control function γ° generating u_0° = γ°(n_0, I_0), ..., u_{T-1}° = γ°(n_{T-1}, I_{T-1}) that minimizes the expected value of the cost J subject to the constraints (4.5) and (4.8) (T is the a-priori unknown time at which the DM satisfies constraint (4.8)).
4.1.2 Applying dynamic programming
The Stochastic Graph Exploration Problem described in the previous section can ideally be solved by means of Dynamic Programming. Let us remark that the dimension of the information vector I_t, t = 0, 1, ..., grows with time. To avoid this heavy drawback, we shall adopt an equivalent formulation for the control function making use of a sufficient statistic and, in particular, of the conditional probability mass function p(θ|I_t) (see, e.g., [11]). In general, the 2^{M_s} real values p(θ = θ̄ | I_t), θ̄ ∈ {0,1}^{M_s}, have to be computed for each time instant t = 0, 1, ..., by means of the Bayes formula (the iteration is initialized by the a priori probabilities). In this case the (fixed) dimension of the sufficient statistic grows exponentially with the number M_s of stochastic links. If the variables θ_{i,j}, (i, j) ∈ E_s, remain uncorrelated, then only M_s values have to be computed, i.e.,

p_t^{i,j} ≜ p(θ_{i,j} = 1 | I_t),   (i, j) ∈ E_s,

for t = 1, 2, ...; the p_0^{i,j} are the a priori probabilities p(θ_{i,j} = 1). For the sake of simplicity and without loss of generality, we shall consider this case, which corresponds to assuming the a priori independence of the stochastic variables θ_{i,j} ∈ E_s and that g̃_{i,j}(n, θ, ξ) = g̃_{i,j}(n, θ_{i,j}, ξ^{i,j}), n ∈ V, (i, j) ∈ E_s (see (4.2)), where the ξ_t^{i,j}, (i, j) ∈ E_s, t = 0, 1, ..., are i.i.d. scalar quantities. Then, to represent the sufficient statistic concisely, let us define the following M_s-dimensional vector

p_t ≜ col[ p_t^{i,j}, (i, j) ∈ E_s ]
and simply address it as the sufficient statistic. Moreover, let us denote by p_{t+1} = P⁺(p_t, u_t, y_{t+1}) the application of the Bayes formula at stage t = 1, 2, .... In the general case, when the measurement channel is affected by noise, the vectors p_t, t = 0, 1, ..., belong to the continuous space [0, 1]^{M_s}. Let us define the "augmented state" corresponding to a node n ∈ V and a sufficient statistic p as
x ≜ col(n, p).

Then, with a little abuse of notation, the control function (4.6) can be replaced by the following (equivalent) one: u_t = γ(x_t). Similarly, we shall write U(x) instead of U(n, I). Moreover, let us define Ŝ ≜ V × [0, 1]^{M_s} as the set of all the possible augmented states. Of course, since the DM has perfect knowledge of the adjacent links, not all the nodes n ∈ V are paired with all the free values of p ∈ [0, 1]^{M_s}. Then, in general, only the set S ⊂ Ŝ of feasible augmented states has to be considered. Application of Dynamic Programming yields

J^{(k+1)}(x) = min_{u ∈ U(n)} { h(x, u) + E_{θ,ξ} J^{(k)}[ col( u, P⁺(p, u, g(u, θ, ξ)) ) ] },   k = 0, 1, ...

J^{(0)}(x) = J̃(x), ∀x ∈ S \ S_f;   J^{(0)}(x) = 0, ∀x ∈ S_f,   (4.9)

where S_f = {x : I_max(x) ≤ ε} and, for a generic augmented state x = col(n, p), I_max(x) is the maximum information gain, as defined in (4.8), achievable by the DM when n_t = n and p_t = p. The values J̃(x) are upper bounds on the optimal costs J◦(x). In order to solve such a problem, a possibility consists in resorting to approximation techniques such as the extended Ritz method or Neuro-Dynamic Programming (see, for example, [10, 11]). In the rest of this section, we shall apply Neuro-Dynamic Programming to the solution of the graph exploration problem.
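For illustration purposes, the component-wise form of P⁺ under the independence assumption can be sketched in a few lines of Python. This is only a minimal sketch, not part of the original development; the likelihood functions lik1 and lik0 are hypothetical stand-ins for the measurement channel (4.2).

def bayes_link_update(p, y, lik1, lik0):
    """One component of P+: posterior probability that a stochastic link has
    finite cost, given the prior p = p_t^{i,j} and a noisy reading y.
    lik1(y) and lik0(y) are the likelihoods of y when theta_ij = 1 and 0."""
    num = lik1(y) * p
    den = num + lik0(y) * (1.0 - p)
    return num / den if den > 0.0 else p

Applying this function component-wise to the vector p_t realizes the map P⁺ for uncorrelated links.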
For the sake of simplicity, we shall consider the case where the DM's "vision" is restricted to the adjacent links. In such a simplified framework the measurement equations can be written as

y^{i,j} = 0, if (n = i or n = j) and c_{i,j} = C_{i,j};   1, if (n = i or n = j) and c_{i,j} = ∞;   −1, otherwise,

where by −1 we mean that the DM acquires no information on the parameter, i.e., the measure is uncorrelated with the parameter. In this particular case the vector of measurement noises ξ_t plays no role (y_t = g(n_t, θ)) and will be dropped from now on. Each component of the function P⁺ can be written as

p_{t+1}^{i,j} = 0, if (n_{t+1} = i or n_{t+1} = j) and c_{i,j} = C_{i,j};   1, if (n_{t+1} = i or n_{t+1} = j) and c_{i,j} = ∞;   p_t^{i,j}, otherwise,

and each probability p_t^{i,j}, (i, j) ∈ E_s, can take on only one of the three values 0, 1, and p_0^{i,j}. Hence, the augmented state space S turns out to be a discrete set with cardinality |S| < N·3^{M_s}. As a consequence, it is possible to solve the graph exploration problem by means of exact dynamic programming: the recursive algorithm (4.9) yields the optimal control function u◦ = γ◦(x) for any state x ∈ S in a finite number of iterations. Moreover, the maximum information gain I_max(x) achievable by the DM in a state x = col(n, p) can be easily calculated as the entropy of the "reachable" part of the graph, i.e.,

I_max(x) = − Σ_{(i,j) ∈ R(x)} p^{i,j} log p^{i,j}
where R(x) is the set of all the stochastic links (i, j) such that at least one of i and j can be reached by the DM on a finite-cost path departing from node n in the best possible case, i.e., when all the unknown stochastic links are assumed to have a finite cost. It is worth noting that, given a state x, the set R(x) can be computed with a simple polynomial-time algorithm. Unfortunately, even in such a simplified framework, solving Problem SGEP via the dynamic programming algorithm (4.9) may require an unacceptable computational time unless very small instances of the problem are involved. In fact, the number of augmented states is of order O(N·3^{M_s}), so we may incur the "curse of dimensionality" when the number M_s of stochastic links increases. In order to mitigate this drawback, in the next sections we shall adopt an approximate technique. The following definition will be useful.

Definition 4.1 A control function γ which drives the DM from any x ∈ S to S_f in a finite number of steps is said to be proper.
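To make the computation of I_max(x) concrete, the following is a minimal Python sketch of the polynomial-time algorithm mentioned above. The graph representation (a dictionary of optimistic adjacencies and a dictionary of link probabilities) is a hypothetical choice, and the binary entropy of each reachable undetermined link is used, consistently with the definition of H(θ|I) for binary variables.

import math
from collections import deque

def i_max(node, optimistic_adj, p):
    """Entropy of the stochastic links reachable from `node` when every
    still-unknown link is optimistically assumed to have finite cost.
    optimistic_adj: node -> iterable of neighbours over deterministic links
    and not-yet-excluded stochastic links; p: (i, j) -> p^{i,j}."""
    seen, queue = {node}, deque([node])
    while queue:                      # breadth-first search: polynomial time
        i = queue.popleft()
        for j in optimistic_adj[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    h = 0.0
    for (i, j), pij in p.items():
        if (i in seen or j in seen) and 0.0 < pij < 1.0:
            h -= pij * math.log(pij) + (1.0 - pij) * math.log(1.0 - pij)
    return h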
Clearly, not all the possible policies are proper. This is a serious drawback when one looks for an approximate solution. In order to overcome this obstacle, we shall formulate a problem for which all the possible policies are proper.
4.1.3 An alternative formulation of the problem
Following the lines of [34], let us define, for a state x = col(n, p), the set of "frontier nodes" F̃(x) as the union of all the nodes adjacent to at least one stochastic link with unknown length, i.e.,

F̃(x) ≜ { j ∈ V : ∃(j, k) ∈ E_s, p^{j,k} = p_0^{j,k} }.

For any x = col(n, p), we shall denote by F(x) the set of frontier nodes f ∈ F̃(x) such that the shortest path sp(x, f) driving from node n to node f through deterministic links (i.e., on the deterministic graph (V, E \ {(i, j) ∈ E_s : p^{i,j} < 1})) does not cross any other frontier node f′ ∈ F̃(x), f′ ≠ f. Denote by U′(x) = {sp(x, f) : f ∈ F(x)} the set of the shortest paths driving from x to any node in F(x). This set defines the admissible control actions associated with a new problem, which will be called SGEP′. In this framework, at any stage t = 0, 1, ..., the DM chooses the next frontier node to visit or, equivalently, the path in the graph, on the basis of the state x, i.e., u′ = γ′(x), u′ ∈ U′(x). In the following, we shall denote by lf(u′) the number of links composing the path u′, and by f(u′) its last component, i.e., the frontier node associated with u′. Consequently, a new discrete-time dynamic system can be introduced (the integer T′ is a stochastic variable):

n_{τ+1} = f(u′_τ),   τ = 0, 1, ..., T′ − 1   (4.10)
x_{T′} ∈ S_f   (4.11)
u′_τ ∈ U′(x_τ).   (4.12)
Since a path u′_τ ∈ U′(x_τ) does not cross any frontier node but f(u′_τ), all the state transitions associated with such a path are deterministic except for the last one, which depends on the realization of y(f(u′_τ), θ). Hence

p_{τ+1} = P⁺[ p_τ, f(u′_τ), g(f(u′_τ), θ) ].

Moving from a node n to a frontier node f ∈ F(x), following the path u′ = γ′(x), the DM incurs the deterministic cost

h′(x, u′) = Σ_{k=1}^{lf(u′)} h[ u′(k−1), u′(k) ]

where u′(k) is the k-th node visited by path u′ and u′(0) ≜ n. We now have all the elements necessary to define an alternative formulation of the cost (4.7),

J′ = Σ_{τ=0}^{T′−1} h′(x_τ, u′_τ),

and to state Problem SGEP′.

Problem 4.2 (SGEP′) For every x ∈ S \ S_f, find the optimal control function γ′◦ that minimizes the expected value of the cost J′ subject to (4.10)-(4.12) (T′ is the a priori unknown time at which the DM reaches S_f).
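The construction of the admissible set U′(x) can be sketched as follows; this is a hypothetical implementation assuming the deterministic subgraph is given as an adjacency dictionary, with Dijkstra's algorithm not expanding past frontier nodes so that the returned paths do not cross other frontier nodes.

import heapq

def admissible_paths(n0, det_adj, frontier):
    """U'(x): shortest deterministic paths sp(x, f) from the current node n0 to
    each reachable frontier node f. det_adj[i] is a list of (j, cost) pairs;
    frontier is the set of frontier nodes F~(x)."""
    dist, prev = {n0: 0.0}, {}
    heap = [(0.0, n0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, float("inf")):
            continue
        if i in frontier and i != n0:
            continue          # paths must not cross another frontier node
        for j, c in det_adj[i]:
            if d + c < dist.get(j, float("inf")):
                dist[j], prev[j] = d + c, i
                heapq.heappush(heap, (d + c, j))
    paths = {}
    for f in frontier:
        if f in dist:
            path, k = [f], f
            while k != n0:
                k = prev[k]
                path.append(k)
            paths[f] = list(reversed(path))
    return paths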
According to the definition of the set of admissible controls U′(x), at every time step τ = 0, 1, ... the DM visits a node adjacent to at least one unknown stochastic link. Hence, after at most M_s + 1 steps, the DM has perfect knowledge of the graph. Then the following proposition holds.
Proposition 4.1 All the policies for Problem SGEP′ are proper, i.e., the DM reaches S_f in at most M_s + 1 steps.
Given a control law γ′ for Problem SGEP′, it is always possible to define an induced control law γ̄ for Problem SGEP by choosing, for every state x, γ̄(x) as the first node of the path γ′(x). Here and in the following, given a control law γ for Problem SGEP, we shall denote by J^γ(x) the expected value of the cost associated with such a control law, i.e.,

J^γ(x) ≜ E_θ { Σ_{t=0}^{T−1} h(x_t, γ(x_t)) }

under the constraints (4.3) and (4.5) with x_0 = x and u_t = γ(x_t); T is the a priori unknown time at which the DM reaches S_f. Clearly, given a control law γ′ for Problem SGEP′ and the corresponding induced control law γ̄, we have
J′^{γ′}(x) = J^{γ̄}(x).

The following theorem highlights the relation between an optimal control law for Problem SGEP′ and the induced control law for Problem SGEP.

Theorem 4.1 Suppose that γ′◦ is an optimal control law for Problem SGEP′, and let γ̄◦ be the control law for Problem SGEP derived from γ′◦. Then γ̄◦ is an optimal control law for Problem SGEP.
Proof: The proof can be given by induction. Let γ◦ be an optimal control law for Problem SGEP. In order to prove Theorem 4.1 it is sufficient to show that J^{γ◦}(x) = J^{γ̄◦}(x), ∀x ∈ S. Towards this end, let us consider the following decomposition of the state space S:

S^k = { x = col(n, p) ∈ S : |{ p^{i,j} : p^{i,j} = p_0^{i,j}, (i, j) ∈ E_s }| = k },

i.e., the set of states in which the DM does not know k of the lengths of the stochastic links. In particular, S^{M_s} corresponds to the set of states for which the DM has no information on the lengths of the links in E_s, and S^0 corresponds to all the states for which the DM has complete information on the stochastic links. The following properties hold:
1. S^h ∩ S^l = ∅, ∀h ≠ l, and ∪_{h=0}^{M_s} S^h = S;

2. for any control function γ′, f(γ′(x)) ∈ ∪_{i=0}^{k−1} S^i, ∀x ∈ S^k, k = 1, ..., M_s.
First we notice that, since for every x ∈ S^0 we have I_max(x) = 0, the set S^0 is included in the set of final states, i.e., S^0 ⊆ S_f. Hence for every x ∈ S^0 we have J^{γ◦}(x) = J^{γ̄◦}(x). Let us now suppose that for all x ∈ S^i, i = 0, 1, ..., k, we have J^{γ◦}(x) = J^{γ̄◦}(x). If we apply the optimal control law γ◦ in a generic state x̄ = col(n̄, p̄) in S^{k+1}, the DM moves on the graph until eventually it reaches a frontier node f ∈ F(x̄). Since all the state transitions associated with such a path are deterministic except the last one, the cost can be written as

J^{γ◦}(x̄) = Σ_{t=0}^{T̄−1} h(x_t, γ◦(x_t)) + E_θ { J^{γ◦}(x_{T̄}) }

under the constraints (4.3) and (4.5) with x_0 = x̄ and u_t = γ◦(x_t); T̄ is the time when the DM reaches the state x_{T̄} ≜ [f, P⁺(p̄, f, g(f, θ))]. First, we note that the summation on the right-hand side has the following lower bound:

Σ_{t=0}^{T̄−1} h(x_t, γ◦(x_t)) ≥ h′(x̄, u′_f)

where u′_f ≜ sp(x̄, f). Moreover, since x_{T̄} ∈ ∪_{i=0}^{k} S^i, by the induction hypothesis we have J^{γ◦}(x_{T̄}) = J^{γ̄◦}(x_{T̄}); hence the following inequality holds:

J^{γ◦}(x̄) ≥ h′(x̄, u′_f) + E_θ J^{γ̄◦}[ col( f, P⁺(p̄, f, g(f, θ)) ) ].

The right-hand side can be associated with a control law γ′ such that γ′(x̄) = u′_f and γ′(x) = γ′◦(x), ∀x ≠ x̄. Then, by the optimality of γ′◦, we have

h′(x̄, u′_f) + E_θ J^{γ̄◦}[ col( f, P⁺(p̄, f, g(f, θ)) ) ] ≥ J′^{γ′◦}(x̄) = J^{γ̄◦}(x̄).

Moreover, by the optimality of γ◦, we have J^{γ◦}(x̄) ≤ J^{γ̄◦}(x̄). Hence, we conclude that J^{γ◦}(x̄) = J^{γ̄◦}(x̄).

In the following we shall consider Problem SGEP′, since, in the light of Theorem 4.1, it turns out to be equivalent to Problem SGEP. While, on the one hand, this choice requires a computational overhead, on the other hand, we can look for an approximate solution without having to check whether it is proper (see Proposition 4.1).
4.1.4 Exact and approximate value iteration
We now address Problem SGEP′ and consider the following dynamic programming algorithm to solve it:

J^{(k+1)}(x) = min_{u′ ∈ U′(x)} { h′(x, u′) + E_θ J^{(k)}[ col( f(u′), P⁺(p, f(u′), g(f(u′), θ)) ) ] },   k = 0, 1, ...   (4.13)

J^{(0)}(x) = J̃(x), ∀x ∈ S \ S_f;   J^{(0)}(x) = 0, ∀x ∈ S_f.   (4.14)

The following result can be claimed.

Proposition 4.2 The recursive algorithm (4.13)-(4.14) yields the optimal control function u′◦ = γ′◦(x) for any state x ∈ S \ S_f in at most M_s iterations.

Proof: This property of the recursive algorithm descends directly from the possibility of decomposing the state space into M_s subsets, as in the proof of Theorem 4.1.

As stated previously, since the number of states in S \ S_f grows exponentially with the number M_s of stochastic links, for complex instances of the graph it is not possible to find the optimal cost function J◦(x) in a reasonable time by using the "exact" algorithm (4.13)-(4.14). Hence, we shall resort to an approximation technique that consists in assigning a given structure to the cost-to-go function. In such structures, a certain number of "free" parameters have to be determined in order to approximate the optimal cost-to-go function J◦(x) as well as possible.
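For small instances, the exact recursion (4.13)-(4.14) can be implemented directly in tabular form. The following is only a sketch under hypothetical interfaces (successors(x, u) enumerating the (probability, next state) pairs induced by the outcomes of θ):

def value_iteration(states, final, controls, h, successors, J_tilde, n_iter):
    """Tabular form of (4.13)-(4.14): J is fixed at 0 on S_f and initialized
    with the upper bounds J_tilde elsewhere; by Proposition 4.2 at most M_s
    sweeps are needed for Problem SGEP'."""
    J = {x: 0.0 if x in final else J_tilde(x) for x in states}
    for _ in range(n_iter):
        J_new = dict(J)
        for x in states:
            if x in final:
                continue
            J_new[x] = min(h(x, u) + sum(pr * J[nx] for pr, nx in successors(x, u))
                           for u in controls(x))
        J = J_new
    return J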
Following [10], we choose as fixed-structure functions the so-called "one-hidden-layer" (OHL) networks. This means that, for each node, the approximate cost-to-go takes on the form

Ĵ_n(p, w_n) = Σ_{i=1}^{ν_n} c_{n,i} φ(p, w_{n,i}),   n = 1, 2, ..., N   (4.15)

where ν_n is the number of parametrized basis functions of the n-th approximator and w_n ≜ col(w_{n,1}, c_{n,1}, w_{n,2}, c_{n,2}, ..., w_{n,ν_n}, c_{n,ν_n}) is the vector of "free" parameters to be tuned. Furthermore, if we define the vector of all the parameters as w ≜ col(w_0, w_1, ..., w_{N−1}), then we can write the approximate cost-to-go function Ĵ(·) as

Ĵ(x, w) = Ĵ_n(p, w_n), ∀x = col(n, p), x ∈ S \ S_f;   Ĵ(x, w) = 0, ∀x ∈ S_f.

Among the various possible parametrized basis functions φ, we chose sigmoidal functions σ(w̄_{n,i} · p + w_{n,0i}). Then the approximators (4.15) are given by OHL neural networks. As to the capability of OHL neural networks to approximate the optimal solutions, the reader is referred to [34]. Clearly, following the guideline of the formulation of Problem SGEP′, it is possible to associate a control function with any given cost-to-go function Ĵ by means of the dynamic programming operator, i.e.,

γ̂′(x) = arg min_{u′ ∈ U′(x)} { h′(x, u′) + E_θ Ĵ[ col( f(u′), P⁺(p, f(u′), g(f(u′), θ)) ) ] }.   (4.16)

By construction, equation (4.16) gives a proper control law independently of the values of the cost-to-go Ĵ(x), x ∈ S \ S_f. However, if one applies equation (4.16) to the optimal cost-to-go, i.e., Ĵ(x) = J◦(x), then an optimal control function is obtained. Furthermore, since equation (4.16) must be applied on line, a crucial issue concerns the required computational effort. If the graph is strongly connected and the exact computation of the expectation in (4.16) is not viable, a Monte Carlo approximation can be implemented. Even in this case the resulting policy is a proper one.
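A minimal sketch of the two building blocks just introduced follows; the function names and interfaces are hypothetical (expect_value stands for either the exact expectation over θ or its Monte Carlo estimate), and tanh is used as the sigmoidal basis.

import numpy as np

def ohl_value(p, W, b, c):
    """OHL approximator of the form (4.15):
    J_hat_n(p, w_n) = sum_i c_i * tanh(W[i] . p + b[i])."""
    return float(c @ np.tanh(W @ p + b))

def greedy_path(x, admissible, h_prime, expect_value):
    """One application of the DP operator (4.16): among the admissible paths
    u' in U'(x), pick the one minimizing the immediate cost plus the expected
    approximate cost-to-go."""
    return min(admissible(x), key=lambda u: h_prime(x, u) + expect_value(x, u))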
By construction, equation (4.16) gives a proper control law independently of the b values of the cost-to-go J(x), x ∈ S \ Sf . However if one applies equation (4.16) b to the optimal cost-to-go, i.e., J(x) = J ◦ (x), then an optimal control function is obtained. Furthermore, it is worth noting that equation (4.16) must be applied on line, a crucial issue concerns the required computational effort. If the graph is a strongly connected one and the exact computation of the expectation in (4.16) is not viable, a Montecarlo approximation can be implemented. Even in this case the resulting policy is a proper one. We are now able to formulate a mathematical programming problem that approximates the original functional Problem SGEP0 to any degree of accuracy
DIST - University of Genoa
§ 4.1 - Specializations in discrete environments
75
(the reader interested in the approximation of functional stochastic optimization problems by approximating parametrized schemes is referred to [10]). ¡ ¢ ◦ Problem 4.3 (SGEPw ) Find the optimal vector w◦ = col w0◦ , w1◦ , . . . , wN −1 such that the control function γ b0 associated cost-to-go functions P to the approximate 0 ◦ Jbn (I, wn ), n = 1, 2 . . . , N , minimizes x∈{S\Sf } J 0bγ (x). We now describe in some detail the “approximate value iteration” algorithm that can be used to determine w◦ . Such an algorithm 4.1 is similar to the incremental approximate value iteration algorithm described in [12] but, in this case, N different neural networks are trained at the same time. Algorithm 4.1 (0)
1. Choose randomly the initial weight vectors wn , n = 1, 2, . . . , N ; set k = 0; 4
2. choose randomly an admissible state x(k) = col(n, p) ∈ S \ Sf ; 3. make one step of the value iteration algorithm in the state x(k) : ½ ³ ´ ³ ´ (k) ¯ h0 x(k) , u0 J x = min u0 ∈ U 0 (x(k) ) ¾ h ³ ¡ ¢ (k) ´ i + 0 0 b + E Jf (u0 ) P p, f (u ), g(f (u ), θ) , wf (u0 ) ; θ
4. update the weight vectors of the N neural network accordingly to ½h ³ ´ ³ ´ i2 ¾ c1 (k+1) (k) (k) (k) b ¯ wn = wn − ∇wn Jn p, wn − J x c2 + k and (k+1)
wi
(k)
= wi ,
∀i 6= n, n ∈ V ;
5. if X h ¡ ¢ ¡ ¢ i2 Jb x, w(k+1) − Jb x, w(k) >δ
.
x∈S\Sf
then set k = k + 1 and return to step 2.
DIST - University of Genoa
(4.17)
§ 4.1 - Specializations in discrete environments
76
Note that, since the number of feasible states |S \ S_f| grows exponentially with the number of stochastic links, for complex instances of the graph G the computation of the summation in (4.17) requires too much computational time. Hence, instead of (4.17), we have considered the following termination criterion:

Σ_{p ∈ VS_n^{(k)}} [ Ĵ_n(p, w_n^{(k+1)}) − Ĵ_n(p, w_n^{(k)}) ]² < δ,   n = 1, ..., N,

where the n-th validation set VS_n^{(k)} is composed of a given number α ≪ |S \ S_f| of sufficient statistics p such that the state col(n, p) is feasible.
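Step 4 of Algorithm 4.1 can be written out explicitly for the tanh OHL parametrization. The following sketch (hypothetical interfaces; net = (W, b, c) as in the earlier ohl_value) computes the gradients by hand and applies the diminishing stepsize c_1/(c_2 + k):

import numpy as np

def incremental_update(net, p, target, k, c1=1.0, c2=100.0):
    """One gradient step on [J_hat_n(p, w_n) - target]^2, where target is the
    value-iteration backup J_bar(x^{(k)}) of step 3."""
    W, b, c = net
    z = np.tanh(W @ p + b)              # hidden-layer activations
    err = float(c @ z) - target         # residual of the backup
    lr = c1 / (c2 + k)
    g = 2.0 * err * c * (1.0 - z ** 2)  # back-propagation through tanh
    return (W - lr * np.outer(g, p), b - lr * g, c - lr * 2.0 * err * z)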
4.1.5 Numerical results
In this section, we apply the proposed approach for the approximate solution of Problem SGEP to a simulation example in order to show its effectiveness. Let us consider a graph with 9 nodes and 6 stochastic links (see Fig. 4.1) and let us suppose that the DM's vision is restricted to the adjacent links. Moreover, let us suppose that ε = 0, i.e., the goal of the DM is to explore all the reachable stochastic links. The a priori probabilities p_0^{i,j} have been chosen equal to 1/2. In this case, we have |Ŝ| = 6561 and, after the exclusion of all the unfeasible states, |S| = 4050. Given the simplicity of the graph and the relatively small number of states, we can apply the algorithm (4.13)-(4.14) (which ends in 6 iterations) to find the optimal control function γ′◦(x) and the optimal cost-to-go J′◦(x), ∀x ∈ S \ S_f. The optimal solution has been compared with the approximate solution obtained by following the approach of Section 4.1.4. For each node, an OHL neural network with 10 hyperbolic tangent activation functions has been used. The neural networks have been trained by means of Algorithm 4.1. Let Ĵ be the approximate cost function after the training process. Fig. 4.2 shows, for every node n, a box plot of the percentage error between the approximate cost-to-go function Ĵ and the optimal one J◦, that is,

PE(x) ≜ |Ĵ(x, w) − J◦(x)| / J◦(x),   ∀x ∈ S \ S_f.

Let γ̂′ be the proper control function derived from the approximate cost function Ĵ by equation (4.16).
[Figure 4.1: A simple stochastic graph with 9 nodes; deterministic links have finite costs (e.g., 2, 3, 8, 20), while the six stochastic links have cost pairs such as (5, ∞), (7, ∞), and (10, ∞).]
[Figure 4.2: Box plots of the percentage error PE(x); the states x are grouped by associating the information vectors p with the related nodes.]
[Figure 4.3: Box plots of the percentage error PE′(x).]
Fig. 4.3 shows, for every node n, a box plot of the percentage error between the cost function J′^{γ̂′} and the optimal one J′◦, defined as

PE′(x) ≜ |J′^{γ̂′}(x) − J′◦(x)| / J′◦(x),   ∀x ∈ S \ S_f.

From a comparison of Figs. 4.2 and 4.3, one can easily see that, even if the approximate function Ĵ does not represent a very good approximation of the optimal cost function J◦, the costs-to-go of the proper policy γ̂′, derived from Ĵ, turn out to be very close to the optimal ones.
4.2 The robotic exploration problem
In this section we address the robotic exploration problem, i.e., the problem of exploring an unknown environment with one or more Decision Makers. We shall show how this problem can be formulated as an active identification one. Exploration and mapping are fundamental tasks that a single mobile decision maker (DM) or a "team" of DMs must perform whenever they operate in unknown domains on which only partial information, or no information at all, is available (examples are extra-planetary robotic exploration, sea floor mapping, the search for free channels in communication networks, etc.). A promising strategy to rapidly obtain the map of an unknown environment is to use a team of autonomous DMs. According to such a strategy, each single agent performs a local exploration, and a team-coordination policy is adopted in
order to make the agents cooperate in attaining the global exploration goal more efficiently. If several DMs cooperate, a team optimal control problem arises (see, for instance, [35]). In both cases, suboptimal solutions are proposed. The unknown environment is modelled as a grid defined by partitioning the domain to explore into a finite number of regular squares (cells). The environment is unknown due to the possible presence of obstacles located in unknown positions. Two distinct states, free and not free, are then associated with the cells of the grid, depending on the presence of obstacles on them. The DMs must explore the environment, moving from a free cell to an adjacent free one, for the purpose of identifying the states of all the cells reachable from the starting one. At each exploration step, a DM improves its knowledge of the model by updating on its private map the states of the cells it has encountered along its path, and communicates such pieces of information to the other agents in order to speed up the whole exploration task. Entropy is used to quantify the information gain obtained during the exploration process, making the DMs move toward the places where information is less certain. Such a strategy has been used by MacKay [7] for data selection and analysis. Similar techniques adopted in exploration frameworks have been presented in [36] and [37]. In this section we will use the active identification framework described in Chapter 1 to devise an exploration technique which allows one or more Decision Makers, acting on a partially unknown environment, to rapidly build a map of their surroundings.
4.2.1 Problem formulation
In this section we shall formulate the robotic exploration problem as an active identification one. Let us consider a decision maker (DM) that is given the task of exploring a two-dimensional environment. The mapping problem consists in constructing a map of the ground by identifying its obstacle-free parts and the parts occupied by obstacles. In order to model the environment, we choose a discrete formalization, dividing the ground into regular squares or cells. For the sake of simplicity, and without loss of generality, we assume the portion of terrain that the DM must explore to be "square"; without any further complication, a general compact set of cells could be considered instead. We shall assume the portion of ground to explore to be described by the grid X = {0, 1, ..., n}².
In order to model the presence of an obstacle in a cell, we introduce the following mapping function:

θ(x) = 1, if there is an obstacle on x;  0, otherwise,   x ∈ X.   (4.18)

In order to make the notation more compact, where useful, we shall denote by the vector θ the collection of the values of θ(x) for all possible x ∈ X (ordered, e.g., by rows); we shall continue to write θ(x) to denote the components of θ corresponding to each x ∈ X. Let us define as x_t the position of the DM on X at time t = 0, 1, ... (we adopt a discrete-time setting). We describe the movement of the DM on X by the following simple state equation:

x_{t+1} = x_t + u_t,   t = 0, 1, ...   (4.19)

where x_0 = x̂ is the known initial position of the DM, u_t ∈ U is the DM's two-dimensional control vector, and U = {−1, 0, 1}² is the set of admissible controls. Moreover, we shall assume the DM to acquire information on the environment by means of its sensors, which are modelled by a suitable measurement equation:

y_t = h(x_t, θ) + η_t,   t = 0, 1, ...   (4.20)

where h(·, ·) is the measurement function (representing the specific DM sensor; it is not specified here for the sake of generality) and η_t is an additive white noise. Let us then define the information vector, by which the DM makes its decisions, as I_t ≜ col(y_0, ..., y_t, x_t). Then the DM's control functions take on the form

u_t = γ_t(I_t),   t = 0, 1, ....   (4.21)
Remark 4.1 As the state equation is not affected by noise and the initial position is known, the DM knows its position perfectly at every instant t = 0, 1, .... For this reason the information vector I_t defined here is different from the information vector defined in Chapter 1.

[Figure 4.4: Dynamic decision scheme: the DM applies u_t, its position evolves as x_{t+1} = x_t + u_t, and the sensors return y_t = h(x_t, θ) + η_t, with θ representing the environment.]

It is worth noting that θ plays here the role of an a priori unknown parameter vector. Because of the noisy measurement equation, only a probabilistic knowledge of such a parameter vector is possible. We consider a Bayesian framework, and we update the corresponding subjective probability on the basis of a prior probability p(θ) that we assume to be known. Then we have the following probability update based on the Bayes rule:

P(θ | I_t) = P(y_t | θ, x_t) P(θ | I_{t−1}) / P(y_t | x_t),   t = 0, 1, ...,   (4.22)

where we have used the independence of consecutive measurements (since η_t is a white noise). Moreover, let us consider a cost function g(x, u, θ) associated with a decision (movement) of the DM when it is in a given position x. Given T exploration stages, the total "process" cost is given by

J(x_0, u_0^{T−1}, θ) = Σ_{t=0}^{T} g(x_t, u_t, θ).   (4.23)
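Under Assumption 4.1 below (mutually independent cell measurements), the update (4.22) factorizes over the cells, and a sketch of the resulting per-cell update is easy to give. This is only an illustrative Python fragment under assumed interfaces; lik(y, occ) is a hypothetical stand-in for the sensor likelihood.

import math

def update_map(P, pos, readings, visible, lik):
    """Cell-wise form of (4.22) combined with the sensor model (4.24): cells
    within distance sqrt(2) of the DM become perfectly known, the other visible
    cells are updated by the Bayes rule, and non-visible cells are left
    untouched (their readings are pure noise and carry no information).
    P[z] = P(theta(z) = 1 | I_t)."""
    for z in visible:
        y = readings[z]
        if math.dist(z, pos) <= math.sqrt(2):
            P[z] = float(y)                  # exact measurement: 0 or 1
        else:
            num = lik(y, 1) * P[z]
            den = num + lik(y, 0) * (1.0 - P[z])
            if den > 0.0:
                P[z] = num / den
    return P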
As said before, the objective of the DM is to build a map of the environment or, equivalently, to acquire all the possible information on it while minimizing the process cost (4.23). To formalize this, we need to introduce an information measure and to formulate the exploration problem as an active identification one; in particular, we formalize it as an active identification problem over an a priori unknown time horizon (see Section 1.3.2 for a general formulation). To fix ideas, and to relate the present situation to Fig. 1.1, see Fig. 4.4.
Given a decision law γ = {u_0 = γ_0(I_0), ..., u_{T−1} = γ_{T−1}(I_{T−1})}, a grid X, and a starting state x̂, the grid entropy is

H_T(θ) = − Σ_{x ∈ X} Σ_{i=0}^{1} p(θ(x) = i | I_T) log p(θ(x) = i | I_T).

It is worth noting that the above quantity is a random variable, because it depends on the realization of the measurements y_0, y_1, ..., y_T. For this reason, different definitions of grid explorability can be given. We shall give two of them (the least conservative): explorability with probability 1 and explorability in mean square.

Definition 4.2 Given a grid X and a starting state x̂, X_p ⊆ X is called explorable with probability 1 if a decision law γ = {u_0 = γ_0(I_0), u_1 = γ_1(I_1), ...} exists such that the entropy of X_p converges to zero almost surely, i.e.,

P( lim_{T→∞} H_T(θ_p) = 0 ) = 1.

Definition 4.3 Given a grid X and a starting state x̂, X_ms ⊆ X is called explorable in mean square if a decision law γ = {u_0 = γ_0(I_0), u_1 = γ_1(I_1), ...} exists such that the entropy of X_ms converges to zero in mean square, i.e.,

lim_{T→∞} E{ H_T²(θ_ms) } = 0.

Definition 4.4 We define X̄_p and X̄_ms as the largest (i.e., the maximum-cardinality) grids explorable with probability 1 and in mean square, respectively.

For the sake of simplicity, we denote the entropy associated with the largest explorable grid by H(θ̄_e), implicitly assuming that one of the previous definitions has been chosen. We can now formalize the exploration problem.

Problem 4.4 Given a grid X, an initial state x̂ with θ(x̂) = 0 and an arbitrarily small ε > 0, find an optimal decision law γ◦ = {u◦_0 = γ◦_0(I_0), ..., u◦_{T−1} = γ◦_{T−1}(I_{T−1})} such that the expected value of (4.23) is minimized and H_T(θ̄_e) ≤ ε.
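The grid entropy H_T(θ), which governs the termination test of Problem 4.4, reduces to a sum of binary entropies of the cell posteriors. A minimal sketch (P is a hypothetical dictionary of cell occupancy posteriors):

import math

def grid_entropy(P):
    """H_T(theta): sum of the binary entropies of the cell posteriors
    P[z] = P(theta(z) = 1 | I_T); cells known exactly contribute zero."""
    h = 0.0
    for p in P.values():
        if 0.0 < p < 1.0:
            h -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return h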
It is worth noting that the above problem is a particular Active Identification Problem where the time horizon T is unknown; hence we are dealing with a particular version of Problem 1.4. In fact, the instant at which the amount of information gathered by the DM reaches the bound ε is a priori unknown and depends not only on the control law but also on the measurements y_0, y_1, ..., y_T. As previously said in Chapter 1, the difficulty of the stated problem is considerable, and we must abandon the idea of solving it optimally. Two approaches are then possible: the first consists in trying to find an approximate control law (for example, by using the Neuro-Dynamic Programming approach); the second is to make some assumptions on the measurement channel and to resort to an ad hoc technique. In this section we shall adopt the latter approach; the former has been used in the previous section, where the problem of exploring an unknown graph was addressed.
4.2.2 Problem solution
It is worth noting that the class of measurement equations (4.20) introduced in the previous section is very broad; in this section we shall specialize it. From now on we shall operate under the following assumptions.

Assumption 4.1 The DM, at every exploration step t, acquires mutually independent measurements y(z) on every cell z ∈ X.

Given a map configuration, described by the parameter vector θ̄, and a "sensor radius" r, we define the set of cells "visible" from a given position x̄ as Ω(x̄, θ̄, r). Then, given a map represented by the parameters θ̄, when the DM occupies the position x̄ it can acquire information only about the cells belonging to the set Ω(x̄, θ̄, r) ⊂ X. We can now characterize the measurement equation.

Assumption 4.2 The measurement equation is defined as

y_t(z) = θ̄(z) + η_t, if ||z − x_t|| > √2 and z ∈ Ω(x_t, θ̄, r);
y_t(z) = θ̄(z), if ||z − x_t|| ≤ √2 and z ∈ Ω(x_t, θ̄, r);   (4.24)
y_t(z) = η_t, if z ∉ Ω(x_t, θ̄, r),

where x_t is the DM position and θ̄ is the map instance.
The last assumption is that the cost the DM incurs when it does not move is null, i.e., g(x_t, 0, θ) = 0. Let us now consider the following cost function:

J̃_t = lim_{T→∞} ( Σ_{i=t}^{T} g(x_i, u_i, θ) + β_i H(p(θ|I_i)) )   (4.25)

where β_t is a suitable scalar parameter weighting the information gain. It is worth noting that, thanks to Assumptions 4.1 and 4.2, it is possible to decompose the expected value of the cost (4.25) as follows:

E{J̃_t} = ( Σ_{i=t}^{t̄} g(x_i, u_i, θ) + β_{t̄} H(θ|I_{t̄}) ) + lim_{T→∞} Σ_{j=t̄+1}^{T} E{ g(x_j, u_j, θ) } + β_j H(θ|I_j)   (4.26)

where t̄ > t is the first instant at which the DM acquires information, i.e., the first instant such that H(θ|I_{t̄}) ≠ H(θ|I_{t̄−1}). We shall call x_{t̄} a frontier state. The approach proposed here is based on the minimization of the first part of the cost (4.26),

J̄_t = Σ_{i=t}^{t̄} g(x_i, u_i, θ) + β_{t̄} H(θ|I_{t̄}).   (4.27)
If we fix β_{t̄} = 0, minimizing (4.27) leads to an algorithm similar to the well-known frontier-based exploration algorithm, justifying the reasonableness of the frontier-based heuristic presented in [38]. Thanks to the particular assumptions made, it is possible to derive a simple procedure based on the minimization of (4.27). It is evident that discarding the stochastic part of the cost allows the policy to be generated in real time. Consider the DM in a given position x_t; the problem is to find the control sequence that minimizes (4.27) for t = 1, .... This can be done by the following algorithm (a code sketch follows Remark 4.2 below):

1. Find the frontier set X_t^f.
2. Compute the minimum deterministic cost from x_t to x, for all x ∈ X_t^f.
3. Choose x_f◦ such that (4.27) is minimized.

It is worth noting that every step of the above procedure requires a polynomial-time algorithm, such as the value iteration algorithm, with a complexity of O(n³) (where n is the cardinality of X). The procedure terminates when the set X_T^f is empty.

Remark 4.2 We have not considered explicitly the obstacle avoidance problem, although it is worth noting that it can be treated in two possible ways. The first is to associate an infinite cost with every position containing an obstacle, i.e., g(x_i, u_i, 1) = ∞, ∀u_i. The second is to minimize (4.27) subject to the constraint P(θ(x_i)) = 0.
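The promised sketch of the three-step procedure follows. It is a hypothetical implementation for the noiseless case, in which posteriors collapse to exactly 0 or 1 and the frontier is represented, by assumption, as the set of free cells adjacent to uncertain ones; Dijkstra's algorithm is used in place of value iteration for step 2.

import heapq
import math

def exploration_step(pos, P, beta):
    """One decision of the procedure: returns the frontier cell minimizing
    path cost minus beta times the local entropy (the realized part of (4.27)),
    or None when the frontier is empty and the exploration ends.
    P[z] = P(theta(z) = 1 | I_t) over the grid cells z."""
    def nbrs(c):
        i, j = c
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                z = (i + di, j + dj)
                if (di or dj) and z in P:
                    yield z
    free = lambda c: P[c] == 0.0          # exact posteriors in the noiseless case
    uncertain = lambda c: 0.0 < P[c] < 1.0
    frontier = [c for c in P if free(c) and any(uncertain(z) for z in nbrs(c))]
    dist, heap = {pos: 0.0}, [(0.0, pos)]
    while heap:                            # step 2: Dijkstra over free cells
        d, c = heapq.heappop(heap)
        if d > dist.get(c, math.inf):
            continue
        for z in nbrs(c):
            if free(z) and d + 1.0 < dist.get(z, math.inf):
                dist[z] = d + 1.0
                heapq.heappush(heap, (d + 1.0, z))
    def gain(c):                           # entropy of the uncertain cells seen from c
        h = 0.0
        for z in nbrs(c):
            if uncertain(z):
                p = P[z]
                h -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        return h
    reachable = [c for c in frontier if c in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda c: dist[c] - beta * gain(c))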
4.2.3 Simulation results
In this section, some examples are given to illustrate the effectiveness of the methodology described in the previous sections. The method has been tested considering both a noisy and a deterministic (i.e., η_t = 0, ∀t) measurement channel. Different kinds of maps have been considered, in order to show how the efficiency in solving the exploration task improves when the proposed entropy-based technique is applied. To this end, our algorithm has been compared with one that uses only the frontier definition [38] (i.e., with β_{t̄} = 0 in (4.27)). Both structured and unstructured maps have been considered. In Fig. 4.5, numerical results are presented for 10 randomly extracted maps. The map shown in Fig. 4.8 has been considered as representative of an artificial structured environment. Elements such as rooms, doors and passages are present, as well as an unexplorable part (which must be identified in order to end the exploration task). As we have seen in Section 4.2.2, the entropy concept is sufficient to establish whether the exploration task is terminated, i.e., whether all the explorable parts of the environment have been identified. With small modifications we have extended our approach to a collaborative exploration problem, in which several DMs cooperate to identify the unknown environment. We have considered a vision range r = 4 and a communication capability R_c = 10. We have selected four starting locations, one for each room.
[Figure 4.5: Comparison of the entropy-based strategy and the frontier-based one: number of exploration steps for 10 randomly extracted maps.]
[Figure 4.6: Evaluation of the entropy-based strategy varying the coefficient ρ (number of exploration steps).]
[Figure 4.7: Evaluation of the entropy-based strategy varying the coefficient ρ (number of exploration steps).]
[Figure 4.8: Map of a deterministic environment.]
In Fig. 4.9, the mean value of the number of steps required to complete the exploration task is given as a function of the number N of agents (the continuous and the dashed lines correspond to the entropy-based exploration and the frontier-based exploration, respectively). Note that, for small N, a significant improvement in the performance is obtained by increasing the number of exploring agents. The best performance is achieved with N = 6, when the entropy-based exploration strategy is used. When N is further increased, the increase in time is due to "congestion", as too many agents are moving in this particular environment. To show how the exploration task evolves, the entropy of the map has been computed over time: the minimum map entropy is reached by using either of the two procedures, but our technique significantly increases the speed of the information acquisition (see Fig. 4.10). Figures 4.11(a)-(f) show how the exploration task is executed for a particular realization: snapshots are provided for different time instants, where the grey cells are the unexplored ones. In Fig. 4.11(a), at t = 0, when the robots are at their starting points, every cell but the ones within the vision range is mapped as unknown. Note how different "rooms" are explored by different robots, thus showing an effective cooperative behavior. The set of grey cells in Fig. 4.11(f) represents the unexplorable part of the map. To study the behavior of our technique in a "natural" unstructured environment, we have considered 30 scattered maps generated as instances of a random map with a random number of obstacles, between 0 and 300, uniformly distributed on it. As a performance index, we have considered the mean value of the number of steps required to complete the exploration task for N = 1, ..., 7 (see Fig. 5). In this situation, too, a performance improvement is obtained when the proposed procedure is used.
[Figure 4.9: Mean of the number of exploration steps as a function of the number of agents, for the map in Fig. 4.8; frontier-based vs. entropy-based exploration.]
[Figure 4.10: Entropy evolution over time for N = 1, ..., 7 agents, comparing entropy-based and frontier-based exploration.]
[Figure 4.11: Evolution of a simulation run: snapshots (a)-(f) of the explored grid at successive time instants.]
Appendix A
Renyi entropy of a Normal random variable

First, we recall a well-known result from probability theory.

Lemma A.1 Given two normal probability density functions N_i(x − x̄_i, Σ_i) and N_j(x − x̄_j, Σ_j), then

N_i(x − x̄_i, Σ_i) N_j(x − x̄_j, Σ_j) = c_{ij} N(x − x̄_{ij}, Σ_{ij})

where

c_{ij} = c_{ji} = N(x̄_i − x̄_j, Σ_i + Σ_j),
x̄_{ij} = (Σ_i^{−1} + Σ_j^{−1})^{−1} (Σ_i^{−1} x̄_i + Σ_j^{−1} x̄_j),
Σ_{ij} = (Σ_i^{−1} + Σ_j^{−1})^{−1}.
Proposition A.1 The quadratic Renyi entropy of a normal pdf N(x − x̄, Σ) is

H_2(I_t) = (1/2) log( 2^{2n} π^n det Σ ) = H − (1/2) log( e^n / 2^n ),

where H denotes the corresponding Shannon entropy.
Proof:

H_2(I_t) = − log ∫ p²(x_t | I_t) dx_t = − log ∫ N²(x_t − x̄_t, Σ_t) dx_t.

By using Lemma A.1 with i = j,

H_2(I_t) = − log ∫ N(0, 2Σ_t) N(x_t − x̄_t, (1/2)Σ_t) dx_t = − log N(0, 2Σ_t) = (1/2) log( 2^{2n} π^n det Σ_t ) = H − (1/2) log( e^n / 2^n ).
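Proposition A.1 is easy to check numerically, using the identity ∫ p² dx = E_{x∼p}[p(x)]. The following is only an illustrative sketch (numpy assumed):

import numpy as np

rng = np.random.default_rng(0)
n = 2
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)         # a generic SPD covariance
mean = rng.normal(size=n)

# Monte Carlo estimate of -log int p^2 = -log E_{x~p}[p(x)]
xs = rng.multivariate_normal(mean, Sigma, size=200_000)
d = xs - mean
quad = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)
pdf = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
print(-np.log(pdf.mean()))                                             # Monte Carlo
print(0.5 * np.log(2 ** (2 * n) * np.pi ** n * np.linalg.det(Sigma)))  # closed form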
Appendix B
Proofs of the theorems of Chapter 2

The following results are required to prove Theorems 2.2 and 2.3. We recall two propositions from linear algebra (see, for example, [39]).

Proposition B.1 Given two real matrices A ∈ R^{m×n} and B ∈ R^{n×p}, the following relations hold:

rank(AB) ≤ min( rank(A), rank(B) )   (B.1a)
rank(AB) ≥ rank(A) + rank(B) − n.   (B.1b)

Proposition B.2 Given a matrix A ∈ R^{n×n} and two vectors x, y ∈ R^n, the following relation holds:

det(A + xy′) = det(A) (1 + y′A^{−1}x).   (B.2)
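Relation (B.2) is the matrix determinant lemma; note that the correction factor 1 + y′A^{−1}x coincides with 1 + x′A^{−1}y in the symmetric rank-one case x = y used below in the proof of Theorem 2.3. A quick numerical check (numpy assumed, purely illustrative):

import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)   # a well-conditioned test matrix
x = rng.normal(size=(n, 1))
y = rng.normal(size=(n, 1))
lhs = np.linalg.det(A + x @ y.T)
rhs = np.linalg.det(A) * (1.0 + float(y.T @ np.linalg.inv(A) @ x))
print(lhs, rhs)   # the two values agree up to floating-point error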
Lemma B.1

Σ_{t=1}^{T} X_t X_t′ > 0  ⇔  Σ_{t=1}^{T} x_t x_t′ > 0.   (B.3)
Lemma B.1 follows from the block-diagonal structure of (2.4).

Lemma B.2 n vectors {x_1, x_2, ..., x_n}, x_i ∈ R^n, are linearly independent if and only if

rank( x_1 x_1′ + x_2 x_2′ + ... + x_n x_n′ ) = n.
Proof: (⇒) Consider

Z = [x_1 x_2 ... x_n] col[x_1′, x_2′, ..., x_n′] = x_1 x_1′ + ... + x_n x_n′.

From (B.1a) and (B.1b) we have that n ≤ rank(Z) ≤ n.

(⇐) If rank(Z) = n then, by (B.1a), we have rank([x_1 x_2 ... x_n]) ≥ n, which completes the proof.

Proof of Theorem 2.2

(⇐) Identifiability implies reachability (from the null state). Let us fix T = n. Thanks to Property (B.3), it is sufficient to study the rank of the matrix

rank( F(1, n) ) = rank( Σ_{t=1}^{n} x_t x_t′ ).   (B.4)
By Lemma B.2, rank(F(1, n)) is equal to the number of linearly independent vectors x_i, i = 1, ..., n; then the rank expressed in (B.4) is equal to

rank[ x_1 x_2 ... x_n ],   (B.5)

which, by substitution, can be written as

rank col[ u_0′B′,  u_0′B′A′ + u_1′B′,  ...,  u_0′B′A′^{n−1} + u_1′B′A′^{n−2} + ... + u_{n−1}′B′ ]
or, equivalently, rank(KU), where U is defined in (2.5). Suppose, by contradiction, that rank(K) < n; then, using relation (B.1a), we obtain rank(KU) ≤ min(rank(K), rank(U)), and then rank(KU) < n for all sequences u_0^{n−1}, contradicting the hypothesis.

(⇒) Reachability implies identifiability (from the null state). We have to prove that, if the system is reachable, then there exist a time T and a sequence u_0^{T−1} such that the quantity in (B.4) is equal to n. Let us fix T = n and let us define the column vectors of U as v_i, i = 1, ..., n. Choose a control sequence u_0^{n−1} so that the vectors v_i satisfy condition (2.6); this can always be done by following the construction reported in Procedure 2.1. Note that

x_i = K v_i,   i = 1, ..., n,
and assume, by contradiction, that the x_i are linearly dependent; then there exist α_1, ..., α_n, with α_i ≠ 0 for some i, such that

Σ_{i=1}^{n} α_i x_i = 0

and, by substitution,

Σ_{i=1}^{n} α_i K v_i = K Σ_{i=1}^{n} α_i v_i = 0.

Then we have a non-null linear combination of vectors belonging to both span(K′) and Ker(K), which contradicts the hypothesis.
Proof of Theorem 2.3

log det( M_T^{−1} ) = − log det( M_{T−1} + (1/σ²) x_T x_T′ ).   (B.6)

By Proposition B.2, we can rewrite (B.6) as

− log det(M_{T−1}) − log( 1 + (1/σ²) x_T′ M_{T−1}^{−1} x_T )

and, iterating recursively, we obtain

log det( M_T^{−1} ) = log det( M_{−1}^{−1} ) − Σ_{i=0}^{T} log( 1 + (1/σ²) x_i′ M_{i−1}^{−1} x_i ).
Appendix C
Gaussian Sum Filter Derivatives

The training algorithm adopted in this chapter needs the evaluation of the derivatives of the Gaussian Sum Filter. Since Ŝ_t = Φ̂_t(Ŝ_{t−1}, u_{t−1}, y_t), the derivatives ∂Φ̂_{t+1}/∂Ŝ_t, ∂Φ̂_t/∂y_t and ∂Φ̂_{t+1}/∂u_t are needed. We indicate with e_p the generic vector with all null elements except the p-th component. The structure of Ŝ_t is the following:

Ŝ_t = col[ α_t, μ_{t1}, ..., μ_{tz_t}, P_{t1}e_1, ..., P_{t1}e_n, ..., P_{tz_t}e_1, ..., P_{tz_t}e_n ].

Let us decompose the Gaussian Sum Filter Φ̂_t as follows:

α_{ti} = φ̂^α_{ti}(Ŝ_{t−1}, u_{t−1}, y_t) = α′_{ti} β_{ti} / Σ_{j=1}^{z_t} α′_{tj} β_{tj},
μ_{ti} = φ̂^μ_{ti}(Ŝ_{t−1}, u_{t−1}, y_t) = a_{ti} + K_{ti} [ y_t − h_t(a_{ti}) ],
P_{ti} = φ̂^P_{ti}(Ŝ_{t−1}, u_{t−1}, y_t) = P′_{ti} − K_{ti} H_{ti} P′_{ti},

where

α′_{ti} = α_{(t−1)i},
β_{ti} = N( y_t − h_t(a_{ti}), H_{ti} P′_{ti} H_{ti}^T + R_t ),
K_{ti} = P′_{ti} H_{ti}^T [ H_{ti} P′_{ti} H_{ti}^T + R_t ]^{−1},
a_{ti} = f_t( μ_{(t−1)i}, u_{t−1} ),
P′_{ti} = F_{ti} P_{(t−1)i} F_{ti}^T + Q_{t−1},
F_{ti} = ∂f_t/∂x_{t−1} evaluated at (μ_{(t−1)i}, u_{t−1}),
H_{ti} = ∂h_t/∂x_t evaluated at a_{ti},

and

φ̂^α_t = col[ φ̂^α_{t1} ... φ̂^α_{tz_t} ],   φ̂^μ_t = col[ φ̂^μ_{t1} ... φ̂^μ_{tz_t} ],
φ̂^P_t = col[ φ̂^P_{t1}e_1 ... φ̂^P_{t1}e_n ... φ̂^P_{tz_t}e_1 ... φ̂^P_{tz_t}e_n ],
α_t = col[ α_{t1} ... α_{tz_t} ],   μ_t = col[ μ_{t1} ... μ_{tz_t} ],
σ_t = col[ P_{t1}e_1 ... P_{t1}e_n ... P_{tz_t}e_1 ... P_{tz_t}e_n ].
98
§ C.0 - Gaussian Sum Filter Derivatives
The three derivatives are given by the following block matrices:

∂Φ̂_t/∂Ŝ_{t−1} = [ ∂φ̂^α_t/∂α_{t−1}, ∂φ̂^α_t/∂μ_{t−1}, ∂φ̂^α_t/∂σ_{t−1} ;
                  ∂φ̂^μ_t/∂α_{t−1}, ∂φ̂^μ_t/∂μ_{t−1}, ∂φ̂^μ_t/∂σ_{t−1} ;
                  ∂φ̂^P_t/∂α_{t−1}, ∂φ̂^P_t/∂μ_{t−1}, ∂φ̂^P_t/∂σ_{t−1} ] ∈ R^{D×D},

whose blocks are the matrices of generic elements

∂φ̂^α_t/∂α_{t−1} = [ ∂α_{ti}/∂α_{(t−1)j} ] ∈ R^{z_t × z_t},
∂φ̂^α_t/∂μ_{t−1} = [ ∂α_{ti}/∂μ_{(t−1)jp} ] ∈ R^{z_t × z_t n},
∂φ̂^α_t/∂σ_{t−1} = [ ∂α_{ti}/∂P_{(t−1)jrs} ] ∈ R^{z_t × z_t n²},
∂φ̂^μ_t/∂μ_{t−1} = [ ∂μ_{tip}/∂μ_{(t−1)jq} ] ∈ R^{z_t n × z_t n},
∂φ̂^μ_t/∂σ_{t−1} = [ ∂μ_{tip}/∂P_{(t−1)jrs} ] ∈ R^{z_t n × z_t n²},
∂φ̂^P_t/∂μ_{t−1} = [ ∂P_{tirs}/∂μ_{(t−1)jp} ] ∈ R^{z_t n² × z_t n},
∂φ̂^P_t/∂σ_{t−1} = [ ∂P_{tipq}/∂P_{(t−1)jrs} ] ∈ R^{z_t n² × z_t n²}.

Similarly,

∂Φ̂_t/∂y_t = col[ ∂φ̂^α_t/∂y_t, ∂φ̂^μ_t/∂y_t, ∂φ̂^P_t/∂y_t ] ∈ R^{D×m},

with ∂φ̂^α_t/∂y_t = [ ∂α_{ti}/∂y_{tp} ] ∈ R^{z_t × m}, ∂φ̂^μ_t/∂y_t = [ ∂μ_{tip}/∂y_{tq} ] ∈ R^{z_t n × m}, ∂φ̂^P_t/∂y_t = [ ∂P_{tirs}/∂y_{tp} ] ∈ R^{z_t n² × m}, and

∂Φ̂_t/∂u_{t−1} = col[ ∂φ̂^α_t/∂u_{t−1}, ∂φ̂^μ_t/∂u_{t−1}, ∂φ̂^P_t/∂u_{t−1} ] ∈ R^{D×p},

with ∂φ̂^α_t/∂u_{t−1} = [ ∂α_{ti}/∂u_{(t−1)q} ] ∈ R^{z_t × p}, ∂φ̂^μ_t/∂u_{t−1} = [ ∂μ_{tip}/∂u_{(t−1)q} ] ∈ R^{z_t n × p}, ∂φ̂^P_t/∂u_{t−1} = [ ∂P_{tirs}/∂u_{(t−1)q} ] ∈ R^{z_t n² × p}.
For the reader's convenience, we report here some notation adopted in this appendix.

1. Gaussian weights (α): first subscript, time instant; second subscript, index of the Gaussian.
2. Mean values (μ): first subscript, time instant; second subscript, index of the Gaussian; third subscript, vector element.
3. Covariances (P): first subscript, time instant; second subscript, index of the Gaussian; remaining subscripts, matrix element.
4. Measurements (y): first subscript, time instant; second subscript, vector element.
5. Controls (u): first subscript, time instant; second subscript, vector element.

In the following, for notational convenience, the Einstein summation convention will be used. Let us define the following quantities:

• M_{ti} ≜ H_{ti} P′_{ti} H_{ti}^T + R_t;
• J^{rs} denotes a matrix whose elements are all null except the element (r, s).

Fig. C.1 presents the input-output scheme of the Gaussian Sum Filter. The aim of this appendix is to calculate the derivatives of the output variables with respect to the input variables.
[Figure C.1: Gaussian Sum Filter scheme: inputs u_{k−1}, y_k and the previous statistics {μ_{(k−1)i}}, {P_{(k−1)i}}, {α_{(k−1)i}}; outputs {μ_{ki}}, {P_{ki}}, {α_{ki}}.]
C.1 Weights derivatives
With respect to the previous weights:

∂α_{ti}/∂α_{(t−1)j} = ∂/∂α_{(t−1)j} [ α_{(t−1)i} β_{ti} / (α_{t−1}^T β_t) ]
 = (1/(α_{t−1}^T β_t)²) [ (∂(α_{(t−1)i} β_{ti})/∂α_{(t−1)j}) (α_{t−1}^T β_t) − α_{(t−1)i} β_{ti} (∂(α_{t−1}^T β_t)/∂α_{(t−1)j}) ],   (C.1)

where

∂(α_{(t−1)i} β_{ti})/∂α_{(t−1)j} = δ_{ij} β_{ti}   (C.2)

∂(α_{t−1}^T β_t)/∂α_{(t−1)j} = Σ_h (∂α_{(t−1)h}/∂α_{(t−1)j}) β_{th} = β_{th} δ_{hj} = β_{tj}.   (C.3)

Substituting (C.2) and (C.3) in (C.1), we obtain

∂α_{ti}/∂α_{(t−1)j} = (β_{ti}/(α_{t−1}^T β_t)²) [ (α_{t−1}^T β_t) δ_{ij} − α_{(t−1)i} β_{tj} ].
With respect to the mean values:

∂α_{ti}/∂μ_{(t−1)jn} = (1/(α_{t−1}^T β_t)²) [ α_{(t−1)i} (∂β_{ti}/∂μ_{(t−1)jn}) (α_{t−1}^T β_t) − α_{(t−1)i} β_{ti} α_{(t−1)h} (∂β_{th}/∂μ_{(t−1)jn}) ].   (C.4)

Thanks to the relation ∂β_{ti}/∂μ_{(t−1)jn} = δ_{ij} ∂β_{tj}/∂μ_{(t−1)jn} = δ_{ij} ∂β_{ti}/∂μ_{(t−1)in}, equation (C.4) can be rewritten as

∂α_{ti}/∂μ_{(t−1)jn} = (α_{(t−1)i}/(α_{t−1}^T β_t)²) (∂β_{tj}/∂μ_{(t−1)jn}) [ (α_{t−1}^T β_t) δ_{ij} − α_{(t−1)j} β_{ti} ]
 = (α_{(t−1)i}/(α_{t−1}^T β_t)²) (∂β_{tj}/∂μ_{(t−1)jn}) [ (α_{t−1}^T β_t) I − β_t α_{t−1}^T ]_{ij}.   (C.5)
We now give a result that will be used in the following calculations. Given a matrix A depending on a scalar x, the following identity holds:

∂|A|/∂x = |A| Tr( A^{−1} ∂A/∂x ).   (C.6)
We can now take on our calculations, ∂βti ∂µ(t−1)in
∂
=
[N (yt − ht (ati ), Mti )] · 1 ∂ · = n/2 ∂µ(t−1)in (2π) |Mti |1/2 µ ¶¸ 1 T −1 · exp − (yt − ht (ati )) Mti (yt − ht (ati )) 2 · ¸ ∂ 1 = · ∂µ(t−1)in (2π)n/2 |Mti |1/2 · ¸ 1 T −1 · exp − (yt − ht (ati )) Mti (yt − ht (ati )) + 2 1 + · n/2 (2π) |Mti |1/2 ¶¸ · µ ∂ 1 T −1 · exp − (yt − ht (ati )) Mti (yt − ht (ati )) (C.7) ∂µ(t−1)in 2
∂ ∂µ(t−1)in
∂µ(t−1)in
·
1 (2π)n/2 |Mti |1/2
¸
1 1 ∂|Mti | |Mti |−3/2 2 (2π)n/2 ∂µ(t−1)in 1 1 = − |Mti |−3/2 |Mti | · 2 (2π)n/2 µ ¶ −1 ∂Mti ·T r Mti ∂µ(t−1)in µ ¶ 1 |Mti |−1/2 −1 ∂Mti = − T r Mti 2 (2π)n/2 ∂µ(t−1)in = −
(C.8)
where ∂Mti ∂µ(t−1)in
=
∂ ∂µ(t−1)in
i h 0 Hti Pti HtiT + Rt 0
=
∂Hti ∂Pti ∂HtiT 0 0 Pti HtiT + Hti HtiT + Hti Pti (C.9) ∂µ(t−1)in ∂µ(t−1)in ∂µ(t−1)in
=
∂Hti ∂ftip ∂Hti = Ftipn ∂atip ∂µ(t−1)in ∂atip
and ∂Hti ∂µ(t−1)in
DIST - University of Genoa
(C.10)
§ C.1 - Gaussian Sum Filter Derivatives
0
∂Pti ∂µ(t−1)in
= =
∂ ∂µ(t−1)in
105
£ ¤ Fti P(t−1)i FtiT + Qt−1
∂P(t−1)i T ∂Fti P(t−1)i FtiT + Fti F + ∂µ(t−1)in ∂µ(t−1)in ti +Fti P(t−1)i
∂FtiT ∂µ(t−1)in
(C.11)
∂P(t−1)i =0 ∂µ(t−1)in Indeed ∂P(t−1)i ∂µ(t−1)in
=
∂ ∂µ(t−1)in
h 0 i 0 P(t−1)i − T(t−1)i H(t−1)i P(t−1)i
0
=
∂P(t−1)i ∂µ(t−1)in
−
∂T(t−1)i 0 H P + ∂µ(t−1)in (t−1)i (t−1)i 0
∂P(t−1)i ∂H(t−1)i 0 −T(t−1)i P(t−1)i − T(t−1)i H(t−1)i ∂µ(t−1)in ∂µ(t−1)in 0
∂P(t−1)i ∂µ(t−1)in ∂T(t−1)i ∂µ(t−1)in
= 0,
=
∂H(t−1)i =0 ∂µ(t−1)in ∂ ∂µ(t−1)in
h 0 i −1 T M(t−1)i P(t−1)i H(t−1)i
0
=
∂P(t−1)i ∂µ(t−1)in 0
−1 T H(t−1)i M(t−1)i
T +P(t−1)i H(t−1)i
−1 ∂M(t−1)i
∂µ(t−1)jn
−1 = −M(t−1)i
T ∂H(t−1)i
0
+ P(t−1)i
∂µ(t−1)in
−1 ∂M(t−1)i
∂µ(t−1)in
∂M(t−1)i −1 M ∂µ(t−1)jn (t−1)i
h i ∂M(t−1)i ∂ T = H(t−1)i P(t−1)i 0 H(t−1)i + Rt−1 = 0 ∂µ(t−1)in ∂µ(t−1)in
DIST - University of Genoa
−1 M(t−1)i +
§ C.1 - Gaussian Sum Filter Derivatives
106
We now calculate the second part of (C.7) µ ¶ 1 ∂ T −1 exp − (yt − ht (ati )) Mti (yt − ht (ati )) = ∂µ(t−1)in 2 · ¸ 1 = exp − (yt − ht (ati ))T Mti−1 (yt − ht (ati )) · 2 µ ¶ h i 1 ∂ (yt − ht (ati ))T Mti−1 (yt − ht (ati )) · − 2 ∂µ(t−1)in ∂ ∂µ(t−1)in
(C.12)
h i (yt − ht (ati ))T Mti−1 (yt − ht (ati )) =
=
∂ ∂µ(t−1)in
[yt − ht (ati )]T Mti−1 (yt − ht (ati )) +
∂Mti−1 (yt − ht (ati )) + ∂µ(t−1)in ∂ + (yt − ht (ati ))T Mti−1 [yt − ht (ati )] ∂µ(t−1)in + (yt − ht (ati ))T
= −
∂ht (ati )T −1 M (yt − ht (ati )) + ∂µ(t−1)in ti
+ (yt − ht (ati ))T
∂Mti−1 (yt − ht (ati )) + ∂µ(t−1)in
− (yt − ht (ati ))T Mti−1
∂ht (ati ) ∂µ(t−1)in
(C.13)
Let us consider now the s-th component of the measurement channel, we have that ∂hts (ati ) ∂hts (ati ) ∂ftir = = Htisr Ftirn = (Hti Fti )sn ∂µ(t−1)in ∂atir ∂µ(t−1)in
(C.14)
Recalling that ∂Mti−1 ∂µ(t−1)in
= −Mti−1
∂Mti M −1 ∂µ(t−1)in ti
DIST - University of Genoa
(C.15)
§ C.1 - Gaussian Sum Filter Derivatives
107
We can now rewrite the second part of Eq. (C.7) (i.e Eq. (C.12)) as µ ¶ ∂ 1 exp − (yt − ht (ati ))T Mti−1 (yt − ht (ati )) = ∂µ(t−1)in 2 · ¸ 1 1 T −1 = − exp − (yt − ht (ati )) Mti (yt − ht (ati )) · 2 2 £ · −(HF en )Tti Mti−1 (yt − ht (ati )) + ∂Mti − (yt − ht (ati ))T Mti−1 M −1 (yt − ht (ati )) + ∂µ(t−1)in ti i − (yt − ht (ati ))T Mti (HF en )ti
(C.16)
On the base of the previous results and by defining kti , Mti−1 (yt − ht (ati )) we rewrite Eq. (C.7) as µ ¶ ∂βti βti −1 ∂Mti T r Mti = − + ∂µ(t−1)in 2 ∂µ(t−1)in · ¸ ∂Mti βti T T T (Hti Fti en ) kti + kti kti + kti (Hti Fti en ) + 2 ∂µ(t−1)in where ∂Mti /∂µ(t−1)in is given by (C.9).
DIST - University of Genoa
§ C.1 - Gaussian Sum Filter Derivatives
With respect to the covariances " # α(t−1)i βti ∂αti ∂ = T ∂P(t−1)jrs ∂P(t−1)jrs α(t−1) βt · ¡ ¢ T ∂ 1 = ³ α(t−1)i βti α(t−1) βt + ´2 ∂P(t−1)jrs T α(t−1) βt ¸ ∂βth − α(t−1)i βti α(t−1)h ∂P(t−1)jrs · α(t−1)i ∂βti T = ³ α(t−1) βt + ´2 ∂P T (t−1)jrs α(t−1) βt ¸ ∂βth − βti α(t−1)h ∂P(t−1)jrs ∂βtj ∂βti = δij ∂P(t−1)jrs ∂P(t−1)jrs h i α(t−1)i ∂βtj ∂αti T = ³ δij α(t−1) βt − βti α(t−1)j ´2 ∂P(t−1)jrs ∂P(t−1)jrs T α(t−1) βt ∂βtj ∂P(t−1)jrs
∂
·
1
108
(C.17)
· ∂P(t−1)jrs µ ¶¸ 1 T −1 · exp − (yt − ht (atj )) Mtj (yt − ht (atj )) 2 ¸ · · 1 ∂ 1 exp − (yt − ht (atj ))T · = n/2 1/2 ∂P(t−1)jrs (2π) |Mti | 2 i 1 · Mtj−1 (yt − ht (atj )) + · n/2 (2π) |Mti |1/2 · ¸ 1 T −1 · exp − (yt − ht (atj )) · Mtj (yt − ht (atj )) · 2 · ¸ ∂ 1 T −1 · − (yt − ht (atj )) · Mtj (yt − ht (atj )) (C.18) ∂P(t−1)jrs 2
=
(2π)n/2 |Mti |1/2
DIST - University of Genoa
§ C.1 - Gaussian Sum Filter Derivatives
"
∂ ∂P(t−1)jrs
|Mtj |−1/2 (2π)n/2
# = −
109
1 −3/2 ∂|Mtj | |M | tj ∂P(t−1)jrs 2(2π)n/2
1 |Mtj |−3/2 |Mtj | · 2(2π)n/2 µ ¶ ∂Mtj −1 ·T r Mtj ∂P(t−1)jrs ¶ µ ∂Mtj 1 −1 −1/2 = − |Mtj | T r Mtj (C.19) ∂P(t−1)jrs 2(2π)n/2
= −
∂ ∂P(t−1)jrs
· ¸ 1 − (yt − ht (atj ))T · Mtj−1 (yt − ht (atj )) = 2
∂Mtj−1 1 = − (yt − ht (atj ))T (yt − ht (atj )) 2 ∂P(t−1)jrs
(C.20)
where ∂Mtj−1 ∂P(t−1)jrs
= −Mtj−1
∂Mtj M −1 ∂P(t−1)jrs tj
Substituting (C.19) and (C.20) in (C.18) we obtain µ ¶ ∂βtj ∂Mtj 1 = − βtj T r Mtj−1 + ∂P(t−1)jrs 2 ∂P(t−1)jrs ∂Mtj 1 + βtj (yt − ht (atj ))T Mtj−1 M −1 (yt − ht (atj )) 2 ∂P(t−1)jrs tj µ ¶ ∂Mtj ∂Mtj 1 1 −1 T = − βtj T r Mtj + βtj kti kti 2 ∂P(t−1)jrs 2 ∂P(t−1)jrs where 0 ´ ³ ∂Ptj ∂Mtj ∂ 0 T = Htj Ptj Htj + Rt = Htj HT ∂P(t−1)jrs ∂P(t−1)jrs ∂P(t−1)jrs tj
By making explicit the components we obtain 0
∂Ptjpq ¡ ¢ ∂P(t−1)jzw T T = Ftjpz Ftjwq = Ftj J rs FtjT pq = Ftjpr Ftjsq ∂P(t−1)jrs ∂P(t−1)jrs
DIST - University of Genoa
(C.21)
§ C.1 - Gaussian Sum Filter Derivatives
∂Mtjpq ∂P(t−1)jrs
110
0
∂Ptjzw T T = Htjpz H T = Htjpz Ftjzr Ftjsw Htjwq ∂P(t−1)jrs tjwq ¢ ¡ T = (Htj Ftj )pr FtjT Htj sq
and then Eq. (C.21) becomes ¡ ¢ T ∂Mtj T = Htj Ftj J rs FtjT Htj = Htj Ftj J rs FtjT Htj ∂P(t−1)jrs With respect to the measures # " α(t−1)i βti ∂αti ∂ = T ∂ytm ∂y(tm α(t−1) βt · ´ ´¸ α(t−1)i ∂ ³ T ∂βti ³ T = ³ α(t−1) βt − βti α(t−1) βt ´2 ∂ytm ∂ytm T α(t−1) βt ∂βti ∂ytm
=
= = =
(C.22)
· ∂ 1 (C.23) · n/2 ∂ytm (2π) |Mti |1/2 µ ¶¸ 1 · exp − (yt − ht (ati ))T Mti−1 (yt − ht (ati )) 2 h i ∂ −1 βti (yt − ht (ati ))p Mtipq (yt − ht (ati ))q ∂ytm i βti h −1 −1 δmp Mtipq (yt − ht (ati ))q + (yt − ht (ati ))p Mtipq δqm − 2 −1 −βti Mmq (yt − ht (ati ))q
= −βti ttim
(C.24)
´ ∂ ³ T ∂βth α(t−1) βt = α(t−1)h = −α(t−1)h βth kthm ∂ytm ∂ytm Substituting (C.24) and (C.25) in (C.22) we obtain h 1 ∂αti T = −³ ´2 (α(t−1) βt )α(t−1)i βti ttim + ∂ytm T α(t−1) βt ¡ ¢¤ − α(t−1)i βti α(t−1)h βth kthm i α(t−1)i βti h T = −³ ´2 (α(t−1) βt )ttim − α(t−1)h βth kthm T α(t−1) βt
DIST - University of Genoa
(C.25)
§ C.1 - Gaussian Sum Filter Derivatives
111
With respect to the controls " # α(t−1)i βti ∂αti ∂ ¡ T ¢ = ∂u(t−1)p ∂u(t−1)p αt−1 βt ¸ · ¡ T ¢ α(t−1)i ∂ ∂βti T . (C.26) = ¡ α β − β α β ¢ t ti t T β 2 ∂u(t−1)p t−1 ∂u(t−1)p t−1 αt−1 t Since ∂ ∂u(t−1)p
¡ ¢ α(t−1)j βtj =
we obtain ∂αti ∂u(t−1)p ∂βti ∂u(t−1)p
∂ ∂u(t−1)p
·
=
∂ ∂u(t−1)p
¡ ¢ α(t−1)j βtj = α(t−1)j
∂βtj , ∂u(t−1)p
· ¸ ¢ α(t−1)i ∂βtj ∂βti ¡ T α βt − βti α(t−1)j ¢2 ¡ T ∂u(t−1)p t−1 ∂u(t−1)p αt−1 βt ∂
·
(C.27)
\[
\frac{\partial\beta_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ \frac{1}{(2\pi)^{n/2}|M_{ti}|^{1/2}}
\exp\!\left( -\tfrac{1}{2}\,(y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right) \right]
\]
\[
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ \frac{|M_{ti}|^{-1/2}}{(2\pi)^{n/2}} \right]
\exp\!\left( -\tfrac{1}{2}\,(y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right)
+ \frac{|M_{ti}|^{-1/2}}{(2\pi)^{n/2}}\,
\frac{\partial}{\partial u_{(t-1)p}}
\left[ \exp\!\left( -\tfrac{1}{2}\,(y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right) \right].
\tag{C.28}
\]

The first term follows as in (C.19):

\[
\frac{\partial}{\partial u_{(t-1)p}}
\left[ \frac{|M_{ti}|^{-1/2}}{(2\pi)^{n/2}} \right]
= -\frac{1}{2(2\pi)^{n/2}}\,|M_{ti}|^{-3/2}\,
\frac{\partial |M_{ti}|}{\partial u_{(t-1)p}}
= -\frac{1}{2(2\pi)^{n/2}}\,|M_{ti}|^{-1/2}\,
\mathrm{Tr}\!\left( M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial u_{(t-1)p}} \right).
\tag{C.29}
\]
For the second term,

\[
\frac{\partial}{\partial u_{(t-1)p}}
\exp\!\left( -\tfrac{1}{2}\,(y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right)
= -\tfrac{1}{2}\,
\exp\!\left( -\tfrac{1}{2}\,(y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right)
\frac{\partial}{\partial u_{(t-1)p}}
\left[ (y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right],
\tag{C.30}
\]

where

\[
\frac{\partial}{\partial u_{(t-1)p}}
\left[ (y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}(y_t - h_t(a_{ti})) \right]
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ (y_t - h_t(a_{ti}))^{T} \right] M_{ti}^{-1}(y_t - h_t(a_{ti}))
+ (y_t - h_t(a_{ti}))^{T}\,
\frac{\partial M_{ti}^{-1}}{\partial u_{(t-1)p}}\,(y_t - h_t(a_{ti}))
+ (y_t - h_t(a_{ti}))^{T} M_{ti}^{-1}\,
\frac{\partial}{\partial u_{(t-1)p}}
\left[ y_t - h_t(a_{ti}) \right].
\tag{C.31}
\]
By recalling the following result

\[
\frac{\partial M_{ti}^{-1}}{\partial u_{(t-1)p}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial u_{(t-1)p}}\,M_{ti}^{-1},
\]

we have that

\[
\frac{\partial M_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ H_{ti} P_{ti}' H_{ti}^{T} \right]
= \frac{\partial H_{ti}}{\partial u_{(t-1)p}} P_{ti}' H_{ti}^{T}
+ H_{ti}\,\frac{\partial P_{ti}'}{\partial u_{(t-1)p}}\,H_{ti}^{T}
+ H_{ti} P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial u_{(t-1)p}}
\tag{C.32}
\]

and

\[
\frac{\partial}{\partial u_{(t-1)p}}
\left[ y_t - h_t(a_{ti}) \right]
= -\frac{\partial h_t(a_{ti})}{\partial u_{(t-1)p}}
= -\frac{\partial h_{ti}}{\partial a_{tiz}}\,
\frac{\partial f_{tiz}}{\partial u_{(t-1)p}}
= -H_{ti} B_{ti}\, e_p.
\tag{C.33}
\]

By using the above calculations we finally obtain the following relation

\[
\frac{\partial\beta_{ti}}{\partial u_{(t-1)p}}
= -\frac{\beta_{ti}}{2}\,
\mathrm{Tr}\!\left( M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial u_{(t-1)p}} \right)
+ \frac{\beta_{ti}}{2}
\left[ (H_{ti} B_{ti} e_p)^{T} k_{ti}
+ k_{ti}^{T}\,\frac{\partial M_{ti}}{\partial u_{(t-1)p}}\, k_{ti}
+ k_{ti}^{T} (H_{ti} B_{ti} e_p) \right],
\]

where $\partial M_{ti}/\partial u_{(t-1)p}$ is given by (C.32).
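In the special case of linear $f_t$ and $h_t$, the matrices $F_{ti}$, $H_{ti}$ and hence $M_{ti}$ do not depend on the control, so the trace term and the middle term vanish and the relation above reduces to $\partial\beta_{ti}/\partial u_{(t-1)p} = \beta_{ti}\, k_{ti}^{T} H_{ti} B\, e_p$. A Python sketch of a finite-difference check under this simplifying assumption (placeholder matrices):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 3, 2, 2                     # state, measurement, control dims
F = rng.normal(size=(n, n)); B = rng.normal(size=(n, q))
H = rng.normal(size=(m, n))
M = np.array([[1.0, 0.1], [0.1, 0.7]])  # constant: F, H independent of u
xprev = rng.normal(size=n)
y = rng.normal(size=m)

def beta(u):
    a = F @ xprev + B @ u             # predicted mean a_ti (linear f)
    d = y - H @ a                     # linear h
    return np.exp(-0.5 * d @ np.linalg.solve(M, d)) / \
           np.sqrt((2 * np.pi) ** m * np.linalg.det(M))

u = np.array([0.3, -0.5]); eps = 1e-6
a = F @ xprev + B @ u
k = np.linalg.solve(M, y - H @ a)
analytic = beta(u) * (k @ H @ B)      # beta * k^T H B, one entry per p
fd = np.array([(beta(u + eps * e) - beta(u - eps * e)) / (2 * eps)
               for e in np.eye(q)])
print(np.abs(fd - analytic).max())    # ~1e-10
\end{verbatim}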
C.2 Covariance derivatives
With respect to the weights

\[
\frac{\partial P_{ti}}{\partial \alpha_{(t-1)j}} = 0.
\]

With respect to the mean values

\[
\frac{\partial P_{ti}}{\partial \mu_{(t-1)jn}}
= \delta_{ij}\,\frac{\partial P_{ti}}{\partial \mu_{(t-1)in}},
\]
\[
\frac{\partial P_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ P_{ti}' - K_{ti} H_{ti} P_{ti}' \right]
= \frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}
- \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ K_{ti} H_{ti} P_{ti}' \right].
\tag{C.34}
\]

We now develop the first part of (C.34):

\[
\frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}
= \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ F_{ti} P_{(t-1)i} F_{ti}^{T} + Q_{t-1} \right]
= \frac{\partial F_{ti}}{\partial \mu_{(t-1)in}} P_{(t-1)i} F_{ti}^{T}
+ F_{ti}\,\frac{\partial P_{(t-1)i}}{\partial \mu_{(t-1)in}}\,F_{ti}^{T}
+ F_{ti} P_{(t-1)i}\,\frac{\partial F_{ti}^{T}}{\partial \mu_{(t-1)in}}.
\tag{C.35}
\]

It is worth noting that the second derivative in (C.35) is null, i.e.

\[
\frac{\partial P_{(t-1)i}}{\partial \mu_{(t-1)in}} = 0.
\]

We now develop the second part of (C.34):

\[
\frac{\partial}{\partial \mu_{(t-1)in}}
\left[ K_{ti} H_{ti} P_{ti}' \right]
= \frac{\partial K_{ti}}{\partial \mu_{(t-1)in}} H_{ti} P_{ti}'
+ K_{ti}\,\frac{\partial H_{ti}}{\partial \mu_{(t-1)in}}\,P_{ti}'
+ K_{ti} H_{ti}\,\frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}.
\tag{C.36}
\]
The first derivative in (C.36) can be rewritten as

\[
\frac{\partial K_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ P_{ti}' H_{ti}^{T} M_{ti}^{-1} \right]
= \frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}} H_{ti}^{T} M_{ti}^{-1}
+ P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial \mu_{(t-1)in}}\,M_{ti}^{-1}
+ P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial \mu_{(t-1)in}}.
\tag{C.37}
\]
We now make explicit the third derivative of relation (C.37),

\[
\frac{\partial M_{ti}^{-1}}{\partial \mu_{(t-1)in}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial \mu_{(t-1)in}}\,M_{ti}^{-1},
\]

where

\[
\frac{\partial M_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ H_{ti} P_{ti}' H_{ti}^{T} + R_t \right]
= \frac{\partial H_{ti}}{\partial \mu_{(t-1)in}} P_{ti}' H_{ti}^{T}
+ H_{ti}\,\frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}\,H_{ti}^{T}
+ H_{ti} P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial \mu_{(t-1)in}}
\]

and where

\[
\frac{\partial H_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial H_{ti}}{\partial a_{tip}}\,
\frac{\partial f_{tip}}{\partial \mu_{(t-1)in}}
= \frac{\partial H_{ti}}{\partial a_{tip}}\, F_{tipn}.
\]

Hence we finally rewrite (C.34) as

\[
\frac{\partial P_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}
- \left[ \frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}} H_{ti}^{T} M_{ti}^{-1}
+ P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial \mu_{(t-1)in}}\,M_{ti}^{-1}
+ P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial \mu_{(t-1)in}} \right]
H_{ti} P_{ti}'
- K_{ti}\,\frac{\partial H_{ti}}{\partial \mu_{(t-1)in}}\,P_{ti}'
- K_{ti} H_{ti}\,\frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}.
\]
With respect to the covariances

\[
\frac{\partial P_{ti}}{\partial P_{(t-1)jrs}}
= \delta_{ij}\,\frac{\partial}{\partial P_{(t-1)irs}}
\left[ P_{ti}' - K_{ti} H_{ti} P_{ti}' \right]
= \delta_{ij}\,\frac{\partial}{\partial P_{(t-1)irs}}
\left[ P_{ti}' - P_{ti}' H_{ti}^{T} M_{ti}^{-1} H_{ti} P_{ti}' \right]
\]
\[
= \delta_{ij}
\left[ \frac{\partial P_{ti}'}{\partial P_{(t-1)irs}}
- \frac{\partial P_{ti}'}{\partial P_{(t-1)irs}} H_{ti}^{T} M_{ti}^{-1} H_{ti} P_{ti}'
- P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial P_{(t-1)irs}}\,H_{ti} P_{ti}'
- P_{ti}' H_{ti}^{T} M_{ti}^{-1} H_{ti}\,
\frac{\partial P_{ti}'}{\partial P_{(t-1)irs}} \right].
\tag{C.38}
\]

We can now develop the derivatives of relation (C.38):

\[
\frac{\partial P_{ti}'}{\partial P_{(t-1)irs}}
= \frac{\partial}{\partial P_{(t-1)irs}}
\left[ F_{ti} P_{(t-1)i} F_{ti}^{T} \right]
= F_{ti} J^{rs} F_{ti}^{T},
\]
\[
\frac{\partial M_{ti}^{-1}}{\partial P_{(t-1)irs}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial P_{(t-1)irs}}\,M_{ti}^{-1},
\quad\text{where}\quad
\frac{\partial M_{ti}}{\partial P_{(t-1)irs}}
= H_{ti}\,\frac{\partial P_{ti}'}{\partial P_{(t-1)irs}}\,H_{ti}^{T}
= H_{ti} F_{ti} J^{rs} F_{ti}^{T} H_{ti}^{T}.
\]
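When $F_{ti}$ and $H_{ti}$ are constant (e.g., a linear system, so that the linearization points do not move with the statistic), the whole chain (C.38) can be checked against a finite difference on a single covariance update. A minimal Python sketch with placeholder matrices:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
F = rng.normal(size=(n, n)); H = rng.normal(size=(m, n))
Q, R = np.eye(n), np.eye(m)
P0 = rng.normal(size=(n, n)); P0 = P0 @ P0.T + n * np.eye(n)
r, s = 0, 2
J = np.zeros((n, n)); J[r, s] = 1.0

def P_update(P):                       # one Gaussian-term covariance update
    Pp = F @ P @ F.T + Q               # P'_ti
    M = H @ Pp @ H.T + R               # M_ti
    return Pp - Pp @ H.T @ np.linalg.solve(M, H @ Pp)  # P'_ti - K H P'_ti

Pp = F @ P0 @ F.T + Q
M = H @ Pp @ H.T + R
Minv = np.linalg.inv(M)
dPp = F @ J @ F.T                      # dP'/dP_rs
dM = H @ dPp @ H.T                     # dM/dP_rs
dMinv = -Minv @ dM @ Minv
# (C.38): chain rule on P' - P' H^T M^{-1} H P'
an = (dPp - dPp @ H.T @ Minv @ H @ Pp
          - Pp @ H.T @ dMinv @ H @ Pp
          - Pp @ H.T @ Minv @ H @ dPp)

eps = 1e-6
dP = np.zeros((n, n)); dP[r, s] = eps  # entries treated as independent
fd = (P_update(P0 + dP) - P_update(P0 - dP)) / (2 * eps)
print(np.abs(fd - an).max())           # ~1e-8
\end{verbatim}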
With respect to the measurements

\[
\frac{\partial P_{ti}}{\partial y_{tm}} = 0.
\]

With respect to the controls

\[
\frac{\partial P_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ P_{ti}' - K_{ti} H_{ti} P_{ti}' \right]
= \frac{\partial P_{ti}'}{\partial u_{(t-1)p}}
- \frac{\partial K_{ti}}{\partial u_{(t-1)p}}\,H_{ti} P_{ti}'
- K_{ti}\,\frac{\partial H_{ti}}{\partial u_{(t-1)p}}\,P_{ti}'
- K_{ti} H_{ti}\,\frac{\partial P_{ti}'}{\partial u_{(t-1)p}}.
\tag{C.39}
\]
Let us consider equation (C.39),

\[
\frac{\partial P_{ti}'}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ F_{ti} P_{(t-1)i} F_{ti}^{T} \right]
= \frac{\partial F_{ti}}{\partial u_{(t-1)p}} P_{(t-1)i} F_{ti}^{T}
+ F_{ti}\,\frac{\partial P_{(t-1)i}}{\partial u_{(t-1)p}}\,F_{ti}^{T}
+ F_{ti} P_{(t-1)i}\,\frac{\partial F_{ti}^{T}}{\partial u_{(t-1)p}},
\]
\[
\frac{\partial K_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ P_{ti}' H_{ti}^{T} M_{ti}^{-1} \right]
= \frac{\partial P_{ti}'}{\partial u_{(t-1)p}} H_{ti}^{T} M_{ti}^{-1}
+ P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial u_{(t-1)p}}\,M_{ti}^{-1}
+ P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial u_{(t-1)p}},
\]
where

\[
\frac{\partial M_{ti}^{-1}}{\partial u_{(t-1)p}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial u_{(t-1)p}}\,M_{ti}^{-1}
\]

and where

\[
\frac{\partial M_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ H_{ti} P_{ti}' H_{ti}^{T} \right]
= \frac{\partial H_{ti}}{\partial u_{(t-1)p}} P_{ti}' H_{ti}^{T}
+ H_{ti}\,\frac{\partial P_{ti}'}{\partial u_{(t-1)p}}\,H_{ti}^{T}
+ H_{ti} P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial u_{(t-1)p}}.
\]

Finally we obtain

\[
\frac{\partial H_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial H_{ti}}{\partial a_{tiz}}\,
\frac{\partial f_{tiz}}{\partial u_{(t-1)p}}
= \frac{\partial H_{ti}}{\partial a_{tiz}}\, B_{zp}.
\]
C.3 Mean values derivatives
With respect to the weights

\[
\frac{\partial \mu_{ti}}{\partial \alpha_{(t-1)j}} = 0.
\]

With respect to the mean values

\[
\frac{\partial \mu_{ti}}{\partial \mu_{(t-1)jn}}
= \delta_{ij}\,\frac{\partial \mu_{ti}}{\partial \mu_{(t-1)in}},
\]
\[
\frac{\partial \mu_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ a_{ti} + K_{ti}\,(y_t - h_t(a_{ti})) \right]
= \frac{\partial a_{ti}}{\partial \mu_{(t-1)in}}
+ \frac{\partial K_{ti}}{\partial \mu_{(t-1)in}}\,(y_t - h_t(a_{ti}))
- K_{ti}\,\frac{\partial h_t(a_{ti})}{\partial \mu_{(t-1)in}}.
\tag{C.40}
\]
Let us consider relation (C.40),

\[
\frac{\partial a_{ti}}{\partial \mu_{(t-1)in}} = F_{ti}\, e_n,
\qquad
\frac{\partial h_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial h_{ti}}{\partial a_{tiz}}\,
\frac{\partial f_{tiz}}{\partial \mu_{(t-1)in}}
= H_{ti} F_{ti}\, e_n,
\]
\[
\frac{\partial K_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}} H_{ti}^{T} M_{ti}^{-1}
+ P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial \mu_{(t-1)in}}\,M_{ti}^{-1}
+ P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial \mu_{(t-1)in}},
\]

where

\[
\frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}
= \frac{\partial F_{ti}}{\partial \mu_{(t-1)in}} P_{(t-1)i} F_{ti}^{T}
+ F_{ti} P_{(t-1)i}\,\frac{\partial F_{ti}^{T}}{\partial \mu_{(t-1)in}},
\qquad
\frac{\partial H_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial H_{ti}}{\partial a_{tip}}\, F_{tipn},
\]
\[
\frac{\partial M_{ti}^{-1}}{\partial \mu_{(t-1)in}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial \mu_{(t-1)in}}\,M_{ti}^{-1},
\]

and where

\[
\frac{\partial M_{ti}}{\partial \mu_{(t-1)in}}
= \frac{\partial}{\partial \mu_{(t-1)in}}
\left[ H_{ti} P_{ti}' H_{ti}^{T} + R_t \right]
= \frac{\partial H_{ti}}{\partial \mu_{(t-1)in}} P_{ti}' H_{ti}^{T}
+ H_{ti}\,\frac{\partial P_{ti}'}{\partial \mu_{(t-1)in}}\,H_{ti}^{T}
+ H_{ti} P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial \mu_{(t-1)in}}.
\]
With respect to the covariances

\[
\frac{\partial \mu_{ti}}{\partial P_{(t-1)jrs}}
= \delta_{ij}\,\frac{\partial \mu_{ti}}{\partial P_{(t-1)irs}},
\]
\[
\frac{\partial \mu_{ti}}{\partial P_{(t-1)irs}}
= \frac{\partial}{\partial P_{(t-1)irs}}
\left[ a_{ti} + K_{ti}\,(y_t - h_t(a_{ti})) \right]
= \frac{\partial K_{ti}}{\partial P_{(t-1)irs}}\,(y_t - h_t(a_{ti})),
\tag{C.41}
\]

\[
\frac{\partial K_{ti}}{\partial P_{(t-1)irs}}
= \frac{\partial P_{ti}'}{\partial P_{(t-1)irs}} H_{ti}^{T} M_{ti}^{-1}
+ P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial P_{(t-1)irs}}.
\tag{C.42}
\]

Let us first consider the first derivative of relation (C.42),

\[
\frac{\partial P_{ti}'}{\partial P_{(t-1)irs}}
= \frac{\partial}{\partial P_{(t-1)irs}}
\left[ F_{ti} P_{(t-1)i} F_{ti}^{T} \right]
= F_{ti} J^{rs} F_{ti}^{T};
\]

the second derivative is

\[
\frac{\partial M_{ti}^{-1}}{\partial P_{(t-1)irs}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial P_{(t-1)irs}}\,M_{ti}^{-1},
\quad\text{where}\quad
\frac{\partial M_{ti}}{\partial P_{(t-1)irs}}
= H_{ti}\,\frac{\partial P_{ti}'}{\partial P_{(t-1)irs}}\,H_{ti}^{T}
= H_{ti} F_{ti} J^{rs} F_{ti}^{T} H_{ti}^{T}.
\]

Then relation (C.42) can be rewritten as

\[
\frac{\partial K_{ti}}{\partial P_{(t-1)irs}}
= F_{ti} J^{rs} F_{ti}^{T} H_{ti}^{T} M_{ti}^{-1}
- P_{ti}' H_{ti}^{T} M_{ti}^{-1} H_{ti} F_{ti} J^{rs} F_{ti}^{T} H_{ti}^{T} M_{ti}^{-1}.
\]

Finally relation (C.41) can be rewritten as

\[
\frac{\partial \mu_{ti}}{\partial P_{(t-1)irs}}
= \left( I - P_{ti}' H_{ti}^{T} M_{ti}^{-1} H_{ti} \right)
F_{ti} J^{rs} F_{ti}^{T} H_{ti}^{T} M_{ti}^{-1}\,(y_t - h_t(a_{ti})).
\]
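The same kind of check applies to the mean update: with constant $F_{ti}$, $H_{ti}$ and a linear measurement map, the closed form just obtained should match a numerical derivative of $\mu_{ti}$ with respect to a single entry of $P_{(t-1)i}$. A Python sketch (placeholder data, control term omitted for brevity):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
F = rng.normal(size=(n, n)); H = rng.normal(size=(m, n))
Q, R = np.eye(n), np.eye(m)
xprev = rng.normal(size=n); y = rng.normal(size=m)
P0 = rng.normal(size=(n, n)); P0 = P0 @ P0.T + n * np.eye(n)
r, s = 1, 0
J = np.zeros((n, n)); J[r, s] = 1.0

def mu_update(P):                       # one Gaussian-term mean update
    a = F @ xprev                       # a_ti (linear f, no control term)
    Pp = F @ P @ F.T + Q
    M = H @ Pp @ H.T + R
    K = Pp @ H.T @ np.linalg.inv(M)
    return a + K @ (y - H @ a)

Pp = F @ P0 @ F.T + Q
M = H @ Pp @ H.T + R
Minv = np.linalg.inv(M)
a = F @ xprev
an = (np.eye(n) - Pp @ H.T @ Minv @ H) @ F @ J @ F.T @ H.T @ Minv @ (y - H @ a)

eps = 1e-6
dP = np.zeros((n, n)); dP[r, s] = eps
fd = (mu_update(P0 + dP) - mu_update(P0 - dP)) / (2 * eps)
print(np.abs(fd - an).max())            # ~1e-9
\end{verbatim}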
With respect to the measurements

\[
\frac{\partial \mu_{ti}}{\partial y_{tm}} = K_{ti}\, e_m.
\]
With respect to the controls

\[
\frac{\partial \mu_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ a_{ti} + K_{ti}\,(y_t - h_t(a_{ti})) \right]
= \frac{\partial a_{ti}}{\partial u_{(t-1)p}}
+ \frac{\partial K_{ti}}{\partial u_{(t-1)p}}\,(y_t - h_t(a_{ti}))
+ K_{ti}\,\frac{\partial}{\partial u_{(t-1)p}}
\left[ y_t - h_t(a_{ti}) \right].
\tag{C.43}
\]

Let us consider the second derivative of (C.43),

\[
\frac{\partial K_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ P_{ti}' H_{ti}^{T} M_{ti}^{-1} \right]
= \frac{\partial P_{ti}'}{\partial u_{(t-1)p}} H_{ti}^{T} M_{ti}^{-1}
+ P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial u_{(t-1)p}}\,M_{ti}^{-1}
+ P_{ti}' H_{ti}^{T}\,\frac{\partial M_{ti}^{-1}}{\partial u_{(t-1)p}},
\]

where

\[
\frac{\partial P_{ti}'}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ F_{ti} P_{(t-1)i} F_{ti}^{T} \right]
= \frac{\partial F_{ti}}{\partial u_{(t-1)p}} P_{(t-1)i} F_{ti}^{T}
+ F_{ti}\,\frac{\partial P_{(t-1)i}}{\partial u_{(t-1)p}}\,F_{ti}^{T}
+ F_{ti} P_{(t-1)i}\,\frac{\partial F_{ti}^{T}}{\partial u_{(t-1)p}},
\]
\[
\frac{\partial H_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial H_{ti}}{\partial a_{tiz}}\,
\frac{\partial f_{tiz}}{\partial u_{(t-1)p}}
= \frac{\partial H_{ti}}{\partial a_{tiz}}\, B_{zp},
\qquad
\frac{\partial M_{ti}^{-1}}{\partial u_{(t-1)p}}
= -M_{ti}^{-1}\,\frac{\partial M_{ti}}{\partial u_{(t-1)p}}\,M_{ti}^{-1},
\]
\[
\frac{\partial M_{ti}}{\partial u_{(t-1)p}}
= \frac{\partial}{\partial u_{(t-1)p}}
\left[ H_{ti} P_{ti}' H_{ti}^{T} \right]
= \frac{\partial H_{ti}}{\partial u_{(t-1)p}} P_{ti}' H_{ti}^{T}
+ H_{ti}\,\frac{\partial P_{ti}'}{\partial u_{(t-1)p}}\,H_{ti}^{T}
+ H_{ti} P_{ti}'\,\frac{\partial H_{ti}^{T}}{\partial u_{(t-1)p}}.
\]

The third derivative in (C.43) can be calculated as follows:

\[
\frac{\partial}{\partial u_{(t-1)p}}
\left[ y_t - h_t(a_{ti}) \right]
= -\frac{\partial h_t(a_{ti})}{\partial u_{(t-1)p}}
= -\frac{\partial h_{ti}}{\partial a_{tiz}}\,
\frac{\partial f_{tiz}}{\partial u_{(t-1)p}}
= -H_{ti} B_{ti}\, e_p.
\]
Appendix D

Derivatives of the Quadratic Renyi Entropy

In this appendix we develop the derivatives of the Quadratic Renyi Entropy with respect to the sufficient statistic. Let us first recall the Renyi entropy definition

\[
H_r(p) = \frac{1}{1-r}\,\log \int p^r\, dX, \qquad r \in \mathbb{R}^{+},
\]

where

\[
p = \sum_{h=1}^{z} \alpha_h\, N_x(\mu_h, P_h).
\]

Derivatives with respect to the weights:

\[
\frac{\partial}{\partial \alpha_i} H_r(p)
= \frac{1}{1-r}\,\frac{\partial}{\partial \alpha_i}
\left[ \log \int p^r\, dX \right]
= \frac{1}{1-r}\,\frac{1}{\int p^r\, dX}\,
\frac{\partial}{\partial \alpha_i}
\left[ \int p^r\, dX \right]
= \frac{1}{1-r}\,\frac{1}{\int p^r\, dX}
\left[ \int r\, p^{r-1}\,\frac{\partial p}{\partial \alpha_i}\, dX \right].
\]

In the case of the Quadratic Renyi Entropy ($r = 2$) we obtain

\[
\frac{\partial}{\partial \alpha_i} H_2(p)
= -2\,\frac{1}{\int p^2\, dX}
\left[ \int p\,\frac{\partial p}{\partial \alpha_i}\, dX \right].
\tag{D.1}
\]
But

\[
\frac{\partial p}{\partial \alpha_i} = N_x(\mu_i, P_i)
\]

and then

\[
\int p\,\frac{\partial p}{\partial \alpha_i}\, dX
= \int \left[ \sum_{h=1}^{z} \alpha_h N_X(\mu_h, P_h) \right] N_X(\mu_i, P_i)\, dX
= \sum_{h=1}^{z} \alpha_h \int N_X(\mu_h, P_h)\, N_X(\mu_i, P_i)\, dX
= \sum_{h=1}^{z} \alpha_h c_{hi} \int N_X(\mu_{hi}, P_{hi})\, dX
= \sum_{h=1}^{z} \alpha_h c_{hi}.
\]
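The constants $c_{hi}$ are the usual product-of-Gaussians normalizers, $c_{hi} = N(\mu_h;\, \mu_i,\, P_h + P_i)$ (this is assumed to coincide with the definition of $c_{hi}$ used in the thesis). For $r = 2$ one therefore has $\int p^2\, dX = \sum_h \sum_i \alpha_h \alpha_i c_{hi}$ in closed form, so (D.1) can be validated numerically; a Python sketch with placeholder parameters:

\begin{verbatim}
import numpy as np
from scipy.stats import multivariate_normal as mvn

mus = [np.array([0.0, 0.0]), np.array([2.0, -1.0])]
Ps = [np.eye(2), np.array([[1.5, 0.2], [0.2, 0.8]])]

def c(h, i):                       # c_hi = N(mu_h; mu_i, P_h + P_i)
    return mvn.pdf(mus[h], mean=mus[i], cov=Ps[h] + Ps[i])

def H2(alpha):                     # H_2 = -log int p^2 dX (closed form)
    S = sum(alpha[h] * alpha[i] * c(h, i)
            for h in range(2) for i in range(2))
    return -np.log(S)

alpha = np.array([0.3, 0.7])
S = np.exp(-H2(alpha))
# (D.1) combined with int p dp/dalpha_i dX = sum_h alpha_h c_hi
analytic = np.array([-2.0 / S * sum(alpha[h] * c(h, i) for h in range(2))
                     for i in range(2)])

eps = 1e-6
fd = np.array([(H2(alpha + eps * e) - H2(alpha - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.abs(fd - analytic).max())             # ~1e-9
\end{verbatim}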
With respect to the mean values

\[
\frac{\partial}{\partial \mu_i} H_r(p)
= \frac{1}{1-r}\,\frac{\partial}{\partial \mu_i}
\left[ \log \int p^r\, dX \right]
= \frac{1}{1-r}\,\frac{1}{\int p^r\, dX}\,
\frac{\partial}{\partial \mu_i}
\left[ \int p^r\, dX \right]
= \frac{1}{1-r}\,\frac{1}{\int p^r\, dX}
\left[ \int r\, p^{r-1}\,\frac{\partial p}{\partial \mu_i}\, dX \right].
\]

If $r = 2$ we have

\[
\frac{\partial}{\partial \mu_i} H_2(p)
= -2\,\frac{1}{\int p^2\, dX}
\left[ \int p\,\frac{\partial p}{\partial \mu_i}\, dX \right].
\tag{D.2}
\]
But

\[
\frac{\partial p}{\partial \mu_i}
= \alpha_i\,\frac{\partial N_x(\mu_i, P_i)}{\partial \mu_i}
= \alpha_i\,\frac{\partial}{\partial \mu_i}
\left[ \frac{1}{(2\pi)^{n/2}|P_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2}\,(x-\mu_i)^{T} P_i^{-1} (x-\mu_i) \right) \right]
= \alpha_i\, N_x(\mu_i, P_i)\,
\frac{\partial}{\partial \mu_i}
\left[ -\tfrac{1}{2}\,(x-\mu_i)^{T} P_i^{-1} (x-\mu_i) \right]
= \alpha_i\, N_x(\mu_i, P_i)\, P_i^{-1} (x - \mu_i)
\]

and then

\[
\int p\,\frac{\partial p}{\partial \mu_i}\, dX
= \alpha_i P_i^{-1} \int \left[ \sum_{h=1}^{z} \alpha_h N_X(\mu_h, P_h) \right]
N_X(\mu_i, P_i)\,(X - \mu_i)\, dX
\]
\[
= \alpha_i P_i^{-1} \int \left[ \sum_{h=1}^{z} \alpha_h N_X(\mu_h, P_h) \right]
N_X(\mu_i, P_i)\, X\, dX
- \alpha_i P_i^{-1} \mu_i \int \left[ \sum_{h=1}^{z} \alpha_h N_X(\mu_h, P_h) \right]
N_X(\mu_i, P_i)\, dX
\]
\[
= \alpha_i P_i^{-1} \int \left[ \sum_{h=1}^{z} \alpha_h N_X(\mu_h, P_h) \right]
N_X(\mu_i, P_i)\, X\, dX
- \alpha_i P_i^{-1} \mu_i \sum_{h=1}^{z} \alpha_h c_{hi},
\tag{D.3}
\]

where the last integral has already been computed above.
We develop now the integral part of (D.3):

\[
\int \left[ \sum_{h=1}^{z} \alpha_h N_X(\mu_h, P_h) \right] N_X(\mu_i, P_i)\, X\, dX
= \sum_{h=1}^{z} \alpha_h \int N_X(\mu_h, P_h)\, N_X(\mu_i, P_i)\, X\, dX
= \sum_{h=1}^{z} \alpha_h c_{hi} \int N_X(\mu_{hi}, P_{hi})\, X\, dX
= \sum_{h=1}^{z} \alpha_h c_{hi}\, \mu_{hi}.
\tag{D.4}
\]
By substituting (D.4) in (D.3) we obtain

\[
\int p\,\frac{\partial p}{\partial \mu_i}\, dX
= \alpha_i P_i^{-1} \sum_{h=1}^{z} \alpha_h c_{hi}\, \mu_{hi}
- \alpha_i P_i^{-1} \mu_i \sum_{h=1}^{z} \alpha_h c_{hi}
= \alpha_i P_i^{-1} \sum_{h=1}^{z} \alpha_h c_{hi} \left( \mu_{hi} - \mu_i \right).
\]
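A numerical sanity check of this expression for a two-component sum, using the standard product-of-Gaussians quantities $P_{hi} = (P_h^{-1} + P_i^{-1})^{-1}$ and $\mu_{hi} = P_{hi}(P_h^{-1}\mu_h + P_i^{-1}\mu_i)$ (assumed to coincide with the thesis's definitions), is sketched below.

\begin{verbatim}
import numpy as np
from scipy.stats import multivariate_normal as mvn

alpha = np.array([0.4, 0.6])
Ps = [np.eye(2), np.array([[1.2, 0.3], [0.3, 0.9]])]

def H2(mus):
    S = sum(alpha[h] * alpha[i] * mvn.pdf(mus[h], mean=mus[i], cov=Ps[h] + Ps[i])
            for h in range(2) for i in range(2))
    return -np.log(S)

mus = [np.array([0.0, 0.5]), np.array([1.5, -0.5])]
i = 0
Pinv = [np.linalg.inv(P) for P in Ps]

# (D.2): dH2/dmu_i = -2/S * alpha_i P_i^{-1} sum_h alpha_h c_hi (mu_hi - mu_i)
S = np.exp(-H2(mus))
g = np.zeros(2)
for h in range(2):
    chi = mvn.pdf(mus[h], mean=mus[i], cov=Ps[h] + Ps[i])
    Phi = np.linalg.inv(Pinv[h] + Pinv[i])                    # P_hi
    muhi = Phi @ (Pinv[h] @ mus[h] + Pinv[i] @ mus[i])        # mu_hi
    g += alpha[h] * chi * (muhi - mus[i])
analytic = -2.0 / S * alpha[i] * Pinv[i] @ g

eps = 1e-6
fd = np.zeros(2)
for m_ in range(2):
    up = [mu.copy() for mu in mus]; up[i][m_] += eps
    dn = [mu.copy() for mu in mus]; dn[i][m_] -= eps
    fd[m_] = (H2(up) - H2(dn)) / (2 * eps)
print(np.abs(fd - analytic).max())    # ~1e-9
\end{verbatim}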
With respect to the covariances

\[
\frac{\partial}{\partial P_i} H_r(p)
= \frac{1}{1-r}\,\frac{\partial}{\partial P_i}
\left[ \log \int p^r\, dX \right]
= \frac{1}{1-r}\,\frac{1}{\int p^r\, dX}\,
\frac{\partial}{\partial P_i}
\left[ \int p^r\, dX \right]
= \frac{1}{1-r}\,\frac{1}{\int p^r\, dX}
\left[ \int r\, p^{r-1}\,\frac{\partial p}{\partial P_i}\, dX \right].
\]

If $r = 2$ we have

\[
\frac{\partial}{\partial P_i} H_2(p)
= -2\,\frac{1}{\int p^2\, dX}
\left[ \int p\,\frac{\partial p}{\partial P_i}\, dX \right].
\tag{D.5}
\]

Let us now recall some preliminary results. For a symmetric matrix $A$ and two generic vectors $b$ and $c$,

\[
\frac{\partial |A|}{\partial A} = |A|\,\bigl(A^{-1}\bigr)^{T},
\]
\[
\frac{\partial\, b^{T} A^{-1} c}{\partial A}
= -\bigl(A^{-1}\bigr)^{T} b\, c^{T} A^{-T}.
\]

Let us develop the first part of (D.5):

\[
\frac{\partial p}{\partial P_i}
= \alpha_i\,\frac{\partial N_x(\mu_i, P_i)}{\partial P_i}
= \alpha_i \left\{
\frac{\partial}{\partial P_i}
\left[ \frac{1}{(2\pi)^{n/2}|P_i|^{1/2}} \right]
\exp\!\left( -\tfrac{1}{2}\,(x-\mu_i)^{T} P_i^{-1}(x-\mu_i) \right)
+ \frac{1}{(2\pi)^{n/2}|P_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2}\,(x-\mu_i)^{T} P_i^{-1}(x-\mu_i) \right)
\frac{\partial}{\partial P_i}
\left[ -\tfrac{1}{2}\,(x-\mu_i)^{T} P_i^{-1}(x-\mu_i) \right]
\right\}
\]
\[
= -\frac{\alpha_i}{2}\, N_x(\mu_i, P_i)\, P_i^{-1}
+ \frac{\alpha_i}{2}\, N_x(\mu_i, P_i)\,
P_i^{-1} (x-\mu_i)(x-\mu_i)^{T} P_i^{-1}
= \frac{\alpha_i}{2}\, N_x(\mu_i, P_i)\,
P_i^{-1} \left[ (x-\mu_i)(x-\mu_i)^{T} - P_i \right] P_i^{-1}.
\]
Then

\[
\int p\,\frac{\partial p}{\partial P_i}\, dX
= \int \left[ \sum_{j=1}^{z} \alpha_j N_X(\mu_j, P_j) \right]
\frac{\alpha_i}{2}\, N_X(\mu_i, P_i)\,
P_i^{-1} \left[ (X-\mu_i)(X-\mu_i)^{T} - P_i \right] P_i^{-1}\, dX
\]
\[
= \frac{\alpha_i}{2}\, P_i^{-1}
\left[ \int \sum_{j=1}^{z} \alpha_j N_X(\mu_j, P_j)\, N_X(\mu_i, P_i)\,
(X-\mu_i)(X-\mu_i)^{T}\, dX
- \int \sum_{j=1}^{z} \alpha_j N_X(\mu_j, P_j)\, N_X(\mu_i, P_i)\, P_i\, dX
\right] P_i^{-1}.
\tag{D.6}
\]
Moreover we have that

\[
\int \sum_{j=1}^{z} \alpha_j N_X(\mu_j, P_j)\, N_X(\mu_i, P_i)\, P_i\, dX
= \sum_{j=1}^{z} \alpha_j c_{ji}\, P_i
\]

and that

\[
\int \sum_{j=1}^{z} \alpha_j N_X(\mu_j, P_j)\, N_X(\mu_i, P_i)\,
(X-\mu_i)(X-\mu_i)^{T}\, dX
= \sum_{j=1}^{z} \alpha_j c_{ji}
\int N_X(\mu_{ji}, P_{ji})\,(X-\mu_i)(X-\mu_i)^{T}\, dX
\]
\[
= \sum_{j=1}^{z} \alpha_j c_{ji}
\int N_X(\mu_{ji}, P_{ji})
\left[ (X-\mu_{ji}) - (\mu_i-\mu_{ji}) \right]
\left[ (X-\mu_{ji}) - (\mu_i-\mu_{ji}) \right]^{T} dX
\]
\[
= \sum_{j=1}^{z} \alpha_j c_{ji}
\left[ \int N_X(\mu_{ji}, P_{ji})\,(X-\mu_{ji})(X-\mu_{ji})^{T}\, dX
+ \int N_X(\mu_{ji}, P_{ji})\, dX\;
(\mu_i-\mu_{ji})(\mu_i-\mu_{ji})^{T}
- \int N_X(\mu_{ji}, P_{ji})\,(X-\mu_{ji})\, dX\;
(\mu_i-\mu_{ji})^{T}
- (\mu_i-\mu_{ji}) \int (X-\mu_{ji})^{T}\, N_X(\mu_{ji}, P_{ji})\, dX
\right]
\]
\[
= \sum_{j=1}^{z} \alpha_j c_{ji}
\left[ P_{ji} + (\mu_i-\mu_{ji})(\mu_i-\mu_{ji})^{T} \right].
\]
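Combining (D.5), (D.6) and the two integrals above gives $\int p\,\partial p/\partial P_i\, dX = \tfrac{\alpha_i}{2} P_i^{-1} \bigl[ \sum_j \alpha_j c_{ji} \bigl( P_{ji} + (\mu_i - \mu_{ji})(\mu_i - \mu_{ji})^{T} - P_i \bigr) \bigr] P_i^{-1}$. In the scalar case, where no symmetric-matrix derivative conventions intervene, this is easy to validate against the closed form $\int p^2\, dX = \sum_h \sum_l \alpha_h \alpha_l c_{hl}$; a Python sketch with placeholder values:

\begin{verbatim}
import numpy as np

alpha = np.array([0.4, 0.6])
mu = np.array([0.0, 1.8])

def c(h, i, P):                    # scalar c_hi = N(mu_h; mu_i, P_h + P_i)
    v = P[h] + P[i]
    return np.exp(-0.5 * (mu[h] - mu[i]) ** 2 / v) / np.sqrt(2 * np.pi * v)

def H2(P):
    S = sum(alpha[h] * alpha[i] * c(h, i, P)
            for h in range(2) for i in range(2))
    return -np.log(S)

P = np.array([1.0, 0.7]); i = 0
S = np.exp(-H2(P))

acc = 0.0
for j in range(2):
    Pji = 1.0 / (1.0 / P[j] + 1.0 / P[i])                 # P_ji
    muji = Pji * (mu[j] / P[j] + mu[i] / P[i])            # mu_ji
    acc += alpha[j] * c(j, i, P) * (Pji + (mu[i] - muji) ** 2 - P[i])
int_p_dp = 0.5 * alpha[i] * acc / P[i] ** 2   # (alpha_i/2) P^{-1} [...] P^{-1}
analytic = -2.0 / S * int_p_dp                # (D.5)

eps = 1e-7
Pp_, Pm_ = P.copy(), P.copy(); Pp_[i] += eps; Pm_[i] -= eps
fd = (H2(Pp_) - H2(Pm_)) / (2 * eps)
print(abs(fd - analytic))          # ~1e-8
\end{verbatim}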
Bibliography

[1] V. V. Fedorov, Theory of Optimal Experiments. Academic Press, 1972.
[2] C. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 1948.
[3] G. N. Saridis, "Entropy formulation of optimal and adaptive control," IEEE Transactions on Automatic Control, vol. 33, pp. 713-721, 1988.
[4] K. A. Loparo, X. Feng, and Y. Fang, "Optimal state estimation for stochastic systems: an information theoretic approach," IEEE Transactions on Automatic Control, vol. 42, pp. 771-785, 1997.
[5] A. A. Feldbaum, Optimal Control Systems. Academic Press, New York, 1965.
[6] N. M. Filatov and H. Unbehauen, "Survey of adaptive dual control methods," in IEE Proc. Control Theory Appl., (Monterey, CA), pp. 118-128, 2000.
[7] D. MacKay, "Information-based objective functions for active data selection," Neural Computation, vol. 4, pp. 590-604, 1992.
[8] Z. Ghahramani, D. A. Cohn, and M. I. Jordan, "Active learning with statistical models," J. of Artificial Intelligence Research, vol. 4, pp. 129-145, 1996.
[9] K. Fukumizu, "Statistical active learning in multilayer perceptrons," IEEE Transactions on Neural Networks, vol. 11, pp. 17-26, 2000.
[10] R. Zoppoli, M. Sanguineti, and T. Parisini, "Approximating networks and the extended Ritz method for the solution of functional optimization problems," Journal of Optimization Theory and Applications, vol. 112, pp. 403-439, 2002.
[11] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2001.
[12] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Athena Scientific, 1996.
[13] A. H. Jazwinski, Stochastic Processes and Filtering Theory. Academic Press, 1970.
[14] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 1991.
[15] A. Renyi, "On measures of entropy and information," in Fourth Berkeley Symposium on Mathematical Statistics and Probability, (Berkeley), pp. 547-561, 1961.
[16] T. Soderstrom and P. Stoica, System Identification. Prentice-Hall, 1988.
[17] I. Shunsuke, Information Theory for Continuous Systems. World Scientific Publishing, 1993.
[18] M. Babaali and M. Egerstedt, "Observability of switched linear systems," in Hybrid Systems: Computation and Control 2004, pp. 48-63, 2004.
[19] D. L. Alspach and H. W. Sorenson, "Nonlinear Bayesian estimation using Gaussian sum approximations," IEEE Transactions on Automatic Control, vol. 17, pp. 439-447, 1972.
[20] A. R. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Transactions on Information Theory, vol. 39, no. 3, 1993.
[21] F. Girosi, "Regularization theory, radial basis functions and networks," in From Statistics to Neural Networks. Theory and Pattern Recognition Applications (J. H. Friedman, V. Cherkassky, and H. Wechsler, eds.), Springer-Verlag, Berlin, 1993.
[22] B. Anderson and J. Moore, Optimal Filtering. Prentice-Hall, 1991.
[23] K. Ito and K. Xiong, "Gaussian filters for nonlinear filtering problems," IEEE Transactions on Automatic Control, vol. 45, pp. 910-927, 2000.
[24] H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York, 1997.
[25] M. Baglietto, M. Paolucci, L. Scardovi, and R. Zoppoli, "Information based multi-agent exploration," in IEEE Third International Workshop on Robot Motion and Control, (Bukowy Dworek, Poland), pp. 173-179, 2002.
[26] X. Deng and C. Papadimitriou, "Exploring an unknown graph," Journal of Graph Theory, vol. 32, pp. 265-297, 1999.
[27] S. Albers and M. R. Henzinger, "Exploring unknown environments," in Proceedings of the Twenty-Ninth Annual Symposium on Theory of Computing, pp. 416-425, 1997.
[28] S. Koenig, C. Tovey, and W. Halliburton, "Greedy mapping of terrain," in Proceedings of the International Conference on Robotics and Automation, pp. 3594-3599, 2001.
[29] B. Kalyanasundaram and K. R. Pruhs, "Constructing competitive tours from local information," in Proceedings of the International Conference on Automata, Languages and Programming, pp. 102-113, 1993.
[30] B. Awerbuch, M. Betke, R. Rivest, and M. Singh, "Piecemeal graph learning by a mobile robot," in Proceedings of the 10th Conference on Computational Learning Theory, (New York), pp. 321-328, 1997.
[31] R. K. Cheung, "Iterative methods for dynamic shortest path problems," Naval Research Logistics, vol. 45, 1998.
[32] I. Murthy and S. Sarkar, "Exact algorithms for the stochastic shortest path problem with a decreasing deadline utility function," European Journal of Operational Research, 1997.
[33] N. Secomandi, "Comparing neuro-dynamic programming algorithms for the vehicle routing problem with stochastic demands," Computers & Operations Research, vol. 27, pp. 1201-1225, 2000.
[34] M. Baglietto, G. Battistelli, F. Vitali, and R. Zoppoli, "Shortest path problems on stochastic graphs: a neuro dynamic programming approach," in 42nd IEEE Conference on Decision and Control, (Maui, Hawaii), pp. 6187-6193, 2003.
[35] Y. C. Ho and K. C. Chu, "Team decision theory and information structures in optimal control problems," IEEE Transactions on Automatic Control, vol. 17, pp. 15-28, 1972.
[36] P. Whaite and F. P. Ferrie, "Autonomous exploration: driven by uncertainty," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 193-205, 1997.
[37] S. Moorehead, R. Simmons, and L. Whittaker, "A multiple information source planner for autonomous planetary exploration," in International Symposium on Artificial Intelligence, Robotics and Automation in Space, (Montreal, Canada), 2001.
[38] B. Yamauchi, "A frontier based approach for autonomous exploration," in IEEE International Symposium on Computational Intelligence in Robotics and Automation, (Monterey, CA), pp. 146-151, 1997.
[39] F. Zhang, Matrix Theory. Springer-Verlag, 1999.