Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference Shanghai, P.R. China, December 16-18, 2009

ThB17.4

Reduced Complexity Models in the Identification of Dynamical Networks: links with sparsification problems

Donatello Materassi†, Giacomo Innocenti∗ and Laura Giarré††

† Department of Electrical and Computer Engineering, University of Minnesota, 200 Union St SE, 55455, Minneapolis (MN). [email protected]

∗ Dipartimento di Sistemi e Informatica, Università di Firenze, via di S. Marta 3, 50139, Firenze, Italy. [email protected]

†† Dipartimento di Ingegneria dell’Automazione e dei Sistemi, Università di Palermo, viale delle Scienze, 90128, Palermo (I). [email protected]

Abstract— In many application scenarios it is important to derive information about the topology and the internal connections of several interacting dynamical systems. Examples can be found in fields as diverse as Economics, Neuroscience and Biochemistry. The paper deals with the problem of deriving a descriptive model of a network by collecting the node outputs as time series, with no use of a priori insight on the topology. We cast the problem as the optimization of a cost function that trades off accuracy against complexity in the final model. We address the problem of reducing the complexity by fixing a certain degree of sparsity and trying to find the solution that “better” satisfies the constraints according to the criterion of approximation.

978-1-4244-3872-3/09/$25.00 ©2009 IEEE

I. INTRODUCTION

In many application scenarios it is important to derive information about the topology and the internal connections of a network of dynamical systems. Even though all the node signals can be mutually related, in most cases it is important to find which connections are the direct ones. For example, in Economics, quantifying the strongest interconnections among a set of historical stock data can suggest good strategies to balance a given portfolio [1]; in DNA microarray experiments it is possible to measure the expression of thousands of genes, expecting at most one hundred to be involved in a specific biochemical reaction [2]; in Cognitive Sciences it is fundamental to understand how neurons stimulate each other according to their intricate interconnection [3]; in Signal Processing, grouping several signals together according to their interdependence can be a first step in developing strategies to compress data for efficient transmission [4].

In this paper we address the problem of deriving a topology from a set of $N$ time series $\{X_i\}_{i=1,...,N}$ generated by a network of linear dynamical systems. Some approaches have recently been proposed in the literature, but a systematic formulation of the problem is still missing. For example, in [5] a method to identify a network of dynamical systems is described, but the main assumption of the technique is the possibility of manipulating the input of every single node and performing many experiments to detect the adjacent links. In this respect it is worth remarking that such a hypothesis is not feasible in most situations and is not practical for large networks. We would also like to stress that in the present paper our approach is completely blind: no a priori insight on the topology is known or used.

We operate in a different way from the literature on decentralized networks, such as sensor networks, or in general estimation over networks (e.g. [6], [7] or [8]). In those frameworks the issue is to locally estimate some variables, e.g. by means of consensus algorithms where each node exchanges information with its neighbors and the estimate of the variable converges over time, even if the topology is switching. Here there is no intelligence at the node level, everything is centralized and the goal is mainly the estimation of a topology.

The main idea of the paper is to cast the problem in terms of an identification procedure, according to the black box approach. We consider each time series $X_j$ as the output of an unknown dynamical system the input of which is

given by a set $\{X_{\alpha_{j1}}, ..., X_{\alpha_{jm_j}}\}$ of $m_j$ other time series. The identification of the dynamical system leads to the definition of a modeling error, which is a natural measure of how well the time series $\{X_{\alpha_{j1}}, ..., X_{\alpha_{jm_j}}\}$ can “describe” the output $X_j$ in terms of predictive/smoothing capability. Then, the set $\{X_{\alpha_{j1}}, ..., X_{\alpha_{jm_j}}\}$ is chosen according to a criterion that takes into account the mean square of the modeling error and at the same time forces $m_j$ to be small. Once this step has been performed, the arcs linking each $X_{\alpha_{jk}}$ to $X_j$ are introduced in the graph. Once this procedure has been carried out for every node $X_j$, a graph has been identified.

We will show that this way of casting the problem has strong similarities with $\ell_0$-minimization problems, which have been a very active topic of research in Signal Processing during the last few years. Indeed, an $\ell_0$-minimization problem amounts to finding the “sparsest” solution of a set of linear equations. Unfortunately, with no additional assumptions on the solution, the problem is combinatorially intractable [4]. This has propelled the study of relaxed problems involving, for example, the minimization of the 1-norm, which is a convex problem and is known to provide solutions with at least a certain order of sparsity [9]. In this paper, we exploit the similarity of the two problems by borrowing some algorithmic tools developed in the area of Signal Processing and adapting them to our needs.

Since we are interested in modeling a network, we consider a slightly different formulation of the problem: for our needs, it makes sense to fix a certain degree of sparsity and to find the solution that “better” satisfies such a constraint according to an approximate criterion. This problem has many practical motivations. In many cases, we are dealing with measured data affected by noise, thus we cannot expect the constraints to be satisfied exactly. At the same time, in this scenario, if we are considering only an approximation of the constraints, we have to specify explicitly how sparse we want the solution to be. Thus, in order to address our problem we resort to small modifications of greedy algorithms, which only provide suboptimal solutions but can be executed quickly [10].

Notation:
$E[\cdot]$: mean operator;
$R_{XY}(\tau) \doteq E[X(t)Y(t+\tau)]$: cross-covariance function of stationary processes;
$R_X(\tau) \doteq R_{XX}(\tau)$: autocovariance;
$\rho_{XY} \doteq \frac{R_{XY}}{\sqrt{R_X R_Y}}$: correlation index;
$\mathcal{Z}(\cdot)$: Z-transform of a signal;
$\Phi_{XY}(z) \doteq \mathcal{Z}(R_{XY}(\tau))$: cross-power spectral density;
$\Phi_X(z) \doteq \Phi_{XX}(z)$: power spectral density; with abuse of notation, $\Phi_X(\omega) = \Phi_X(e^{i\omega})$;
$(\cdot)^*$: complex conjugate.

II. PROBLEM SET UP

Let us consider a set of $N$ scalar time series $\{S_i\}_{i=1,...,N}$. Assume that it is possible to remove any deterministic component from them in order to obtain $N$ stochastic processes

$\{X_i\}_{i=1,...,N}$ which are wide sense stationary and with zero mean [11]. We intend to derive a mathematical model describing, in a quantitative way, the possible connections and the mutual influences among these time series. We decide to model each stochastic process $X_j$ as the superposition of linear dynamic transformations of the other processes’ outputs:

$$X_j(t) = e_j(t) + \sum_{i=1,\, i \neq j}^{N} W_{ji}(z) X_i(t) \tag{1}$$

where $W_{ji}(z)$ is a suitable transfer function and $e_j$ is the model error. In this framework, it is of interest to find the set $\{W_{ji}(z)\}_{i,j=1,...,N,\, i \neq j}$ which allows us to best describe the time series according to the weighted least squares criterion

$$\min_{W_{ji}(z)} \sum_{j=1}^{N} E\left\{\left[ Q_j(z)\left( X_j - \sum_{i=1,\, i \neq j}^{N} W_{ji}(z) X_i \right)\right]^2\right\} \tag{2}$$

where $Q_j(z)$ are dynamic weighting transfer functions. The description provided by (1) according to (2) could be exploited in order to make predictions (if the $W_{ji}$ are chosen to be causal) or, more generally, to detect dynamical relations between random processes (in this case the $W_{ji}$ can also be chosen non-causal). It is immediate to verify that, without any loss of generality, the cost (2) can be minimized by considering every node $X_j$ separately. Thus, in the following we focus only on the problem, for a fixed $j$,

$$\min_{W_{ji}(z)} E\left\{\left[ Q_j(z)\left( X_j - \sum_{i=1,\, i \neq j}^{N} W_{ji}(z) X_i \right)\right]^2\right\}. \tag{3}$$

This problem has a well-known solution which is given by the Wiener Filter and relies on the knowledge (or the estimation) of the cross-spectral density functions of the signals involved [12], [13]. The most important drawback of such an approach is the complexity of the final model, since it may contain a large number of non-zero transfer functions, up to $N(N-1)$. Even in the case of relatively small networks, the number of transfer functions would be too large, especially if many of them do not actually provide a significant reduction of the cost. Hence, it is quite natural to develop a strategy to obtain a model of reduced complexity that keeps only the most important relations.

The reduction of complexity also leads to an interpretation in terms of graph theory. In fact, if we consider only the nodes $X_i$ which are most “useful” to model $X_j$, we can give an immediate graphical representation. Every time series $X_j$ is represented by a node in a graph and, if $X_i$ is “helpful” to model $X_j$, a directed arc from $X_i$ to $X_j$ can be depicted (see Figure 1 and Figure 2). A very intuitive approach to perform this complexity reduction is to consider a hard constraint that limits the number of arcs pointing at $X_j$ to $m_j > 0$. In other words, we have to determine the set of $m_j$ transfer functions $W_{ji}$ providing the most significant reduction of the cost.
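As a rough illustration of how the modeling error of a single node depends on the chosen inputs, the following sketch fits a causal FIR model by ordinary least squares. This is only a finite-order, time-domain stand-in for the Wiener Filter (which the paper obtains from spectral densities), with $Q_j = 1$; the function names, the filter order and the toy data are assumptions made for the example.

```python
import numpy as np

def lagged_block(x, order):
    """Columns x(t), x(t-1), ..., x(t-order), for t = order, ..., T-1."""
    T = len(x)
    return np.column_stack([x[order - d:T - d] for d in range(order + 1)])

def modeling_error(X, j, inputs, order=3):
    """Mean square error of the best causal FIR model of X[:, j]
    driven by the columns listed in `inputs` (hypothetical helper)."""
    y = X[order:, j]
    if len(inputs) == 0:
        return float(np.mean(y ** 2))
    A = np.hstack([lagged_block(X[:, i], order) for i in inputs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ coef) ** 2))

# Toy data (assumed): series 0 is a delayed copy of series 1 plus small noise,
# series 2 is unrelated white noise.
rng = np.random.default_rng(0)
T = 2000
X = rng.standard_normal((T, 3))
X[1:, 0] = 0.8 * X[:-1, 1] + 0.1 * rng.standard_normal(T - 1)

err_none = modeling_error(X, 0, [])         # no input at all
err_useful = modeling_error(X, 0, [1])      # the relevant input only
err_useless = modeling_error(X, 0, [2])     # an irrelevant input only
err_full = modeling_error(X, 0, [1, 2])     # all the other nodes
```

In this toy case the full model barely improves on the single relevant input, which is the situation where keeping all $N - 1$ transfer functions for a node is not justified by the reduction of the cost.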

Then the problem can be cast as follows

$$\min_{\alpha_{jk}} \min_{W_{ji}(z)} E\left\{\left[ Q_j(z)\left( X_j - \sum_{k=1}^{m_j} W_{j\alpha_{jk}}(z) X_{\alpha_{jk}} \right)\right]^2\right\} \tag{4}$$

where $\alpha_{jk} \in \{1, ..., N\} \setminus \{j\}$ with $1 \le k \le m_j$.

Fig. 1. A complete graph. Every directed edge is present, leading to an explosion of the model complexity.

Fig. 2. A reduced graph. Note that the property of connectivity is not imposed in our theoretical framework.

III. LINKS BETWEEN COMPLEXITY REDUCTION AND $\ell_0$ MINIMIZATION

In this section, we describe how the problem introduced in the previous section is related to the problem of the sparse representation of a signal in terms of a redundant basis. We intend to draw simple theoretical and abstract analogies in order to justify our adoption, in the next section, of modified greedy algorithms from the context of $\ell_0$ minimization. Indeed, these analogies make it possible to borrow a set of theoretical, analytical and algorithmic tools from an area which has been extensively studied (even though many problems still remain open or are not tractable).

In recent years sparsity problems have attracted the attention of researchers in the field of Signal Processing. The reason is mainly the possibility of representing a signal using only a few elements (words) of a redundant basis (dictionary). In a formal way, one would like to recover a signal $x_0$ using a sparse representation $w$ which is the solution of an optimization problem (see [4])

$$\min_{w} \|w\|_{\ell_0} \quad \text{subject to} \quad x_0 = \Psi w \tag{5}$$

where $\Psi$ is given by the redundant basis employed and

$$\|w\|_{\ell_0} := |\{i : w_i \neq 0\}|. \tag{6}$$

It can be said that $w$ is the “simplest” way to express $x_0$ as a linear combination of the columns of $\Psi$, where the concept of simplicity is given by the number of null entries of $w$. If we neglect, for the sake of argument, that in our optimization space we are not considering a standard scalar product, but transformations associated with linear dynamical systems $W_{ji}(z)$, then the formulation of our problem is equivalent to

$$\min_{w} \|x_0 - \Psi w\|_2 \quad \text{subject to} \quad \|w\|_{\ell_0} \le m. \tag{7}$$

The parallelism of (7) with the problem stated in the previous section is intuitive. Indeed, we are approximating a vector $x_0$ as a linear combination of $m$ columns of $\Psi$. Problems (7) and (5) are closely related in many senses. If there exists a sparse solution $w^*$ for (5) and its $\ell_0$ norm is not greater than $m$, then $w^*$ is an optimal solution for (7) as well. However, it is also possible that the solution $w^*$ exists, but $\|w^*\|_{\ell_0} > m$. In the more general case where the relation $\Psi w = x_0$ can be satisfied exactly by a certain $w$, the problem (7) can be rewritten in the form

$$\min_{\delta w} \|\Psi \delta w\|_2 \quad \text{subject to} \quad \|w + \delta w\|_{\ell_0} \le m. \tag{8}$$

This transformation does not change the combinatorial nature of the problem, but it can be useful to find an upper bound for the optimum of (7). In such a case, $\delta w$ can be chosen in order to cancel $\|w\|_{\ell_0} - m$ components of $w$, and $\|\Psi \delta w\|_2$ provides the desired upper bound in a straightforward manner. This is also related to an important difference between (5) and (7), namely that (7) always admits a solution for any $x_0$, while this is not always the case for (5). Intuitively, if a solution $w$ of the equation $\Psi w = x_0$ is available and $w$ is “very sparse” (even though $\|w\|_{\ell_0} \ge m$), then it is possible to eliminate only a few components to find a suboptimal solution for (7). This consideration has practical importance, since a standard approach to $\ell_0$ minimization is to relax the problem by minimizing the 1-norm of the vector $w$. Under the property of restricted isometry, the optimum of the 1-norm problem matches the optimum of the $\ell_0$ problem [14]. But a feature that makes the 1-norm optimization even more attractive is that its optimum always guarantees at least a certain order of sparsity [9]. To extend the theory

of $\ell_1$ minimization to the case of complexity reduction of a network goes beyond the scope of this paper, even though we believe that this line of research deserves to be investigated.

IV. A GREEDY ALGORITHM FOR SUBOPTIMAL SOLUTIONS

The problem of network complexity reduction we have formulated in this paper is closely related to the problem of sparsification, as intuitively explained in the previous section. This shows that a network complexity reduction problem is quite challenging to solve but, at the same time, inherits a set of practical tools already developed which can be employed to tackle it. Here we present, as an illustrative example, a modification of a greedy algorithm well known in the Signal Processing community, which can be adopted to obtain suboptimal solutions. We focus on greedy algorithms for two main reasons. First, in order to find a model of a network with reduced complexity, it is necessary to solve an optimization problem of the form (4) for every single node, so the speed of the algorithm is a fundamental factor. Moreover, since the complexity of the network model is the final goal, greedy algorithms allow one to specify explicitly the connection degree $m_j$ of every node $X_j$, while this is not possible, for example, using 1-norm minimization.

A modified Orthogonal Least Squares (Cycling Orthogonal Least Squares)

Orthogonal Least Squares (OLS) is a greedy algorithm proposed for the first time in [15], and in many ways it resembles the Matching Pursuit algorithm developed in [16]. It basically consists of iterated orthogonal projections on a (possibly redundant) basis to approximate a given vector. At the $n$-th iteration step, OLS calculates an approximation $\hat{X}_j$ of the vector $X_j$ based on a set of vectors $\Gamma_n = \{X_{\alpha_{jk}}\}_{k=1,...,n}$. Then the approximation error $r_n = X_j - \hat{X}_j$ is computed and used to find a new vector $X_{\alpha_{j(n+1)}}$ according to the criterion

$$\alpha_{j(n+1)} = \arg\max_{k \neq j} |\langle r_n, X_k \rangle| \tag{9}$$

and $\Gamma_{n+1} := \Gamma_n \cup \{X_{\alpha_{j(n+1)}}\}$. The standard OLS goes on introducing a new vector at every step until a stopping condition is met (usually on the norm of the residual $r_n$). We propose an algorithm which derives directly from OLS but does not increase the number of vectors $X_{\alpha_{jk}}$ approximating $X_j$ above $m_j$. The rationale is very simple. At any iteration, given the set of vectors $\{X_{\alpha_{jk}}\}_{k=1,...,m_j}$, the algorithm chooses a vector $X_{\alpha_{jk}}$ to be removed and tries to replace it with another vector in order to improve the quality of the approximation. If such an improvement is not possible for any vector, the algorithm stops. It can be formally represented using the following pseudocode.

Cycling Orthogonal Least Squares:
0. define $X_0 := 0$ (null time series)
1. initialize the variables $\alpha_{ji} = 0$ for $i = 1, ..., m_j$, $r := X_j$, $m = 1$, $c = 1$
2. while $m \le m_j$
   2a. for $k = 1, ..., N$, $k \neq j$, define $r_k$ as the projection of $r$ onto $\mathrm{span}\{\{X_{\alpha_{ji}}\}_{i \neq c} \cup \{X_k\}\}$
   2b. $\alpha = \arg\max_{k \neq j} \|r_k\|$
   2c. if $\alpha_{jc} = \alpha$ then $m := m + 1$
   2d. if $\alpha_{jc} \neq \alpha$ then $\alpha_{jc} = \alpha$, $m := 1$
   2e. $c := (c \bmod m_j) + 1$
3. return $\{X_{\alpha_{ji}}\}_{i=1,...,m_j}$

Remark 1: When dealing with vectors in finite dimensional spaces, the concept of projection is quite intuitive. It is important to explain what we mean by “orthogonal projection” in our space of stochastic processes $X_j$. By orthogonal projection of a stochastic process $X_j$ on a set of stochastic processes $\{X_{\alpha_{ji}}\}_{i=1,...,m_j}$, we mean the best estimate of $X_j$ which can be given, in the least squares sense, as a linear dynamic combination of $\{X_{\alpha_{ji}}\}_{i=1,...,m_j}$. Such an estimate is given by the well-known Wiener Filter, for which we refer the reader to [12].

V. SIMULATION AND APPLICATIVE RESULTS

In this section we report some numerical results in order to describe how the Cycling OLS we have introduced performs in terms of modeling a network. More specifically, we start by considering a simulation where the structure of the dynamical network is known and evaluate the quality of the suboptimal solution in terms of reconstruction of the underlying topology. As a second, more concrete example we apply our technique to infer information about historical series of economic data. We have considered the exchange rates of 22 currencies in a time span of about 7 years and we show how the application of our method can lead to a qualitative and quantitative understanding of the mutual relations between the economies of different countries.

A. A simulated network

We considered a randomly generated network of 23 nodes whose actual structure is represented in Figure 3(a). The transfer functions were randomly generated third order FIR models and we computed 1000 simulation steps considering white, additive and mutually uncorrelated noises acting on every single node according to the model (1).
From these data, we tackled the problem of deriving a model for the network with the specific aim of reconstructing its topology. Even though the network contains a relatively small number of nodes, a complete search to minimize the cost (4) is not practically viable, even for small values of $m_j$. Thus we decided to use our technique to infer information about the network topology according to (1), considering the three constraints $m_j \le 1$, $m_j \le 2$ and $m_j \le 3$, applied uniformly to every node in each case. We considered the cost (4) with $Q_j = 1$, adopting non-causal Wiener Filters (for the sake of simplicity). This means that we are considering a smoothing scenario for our modeling. If the Wiener Filters had been chosen to be causal, then, of course, we would have been in a predictive scenario.
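The experiment can be mimicked on a small scale with the sketch below. It is not the authors' implementation: the projection of Remark 1 is replaced by a causal FIR least-squares fit (the paper uses non-causal Wiener Filters), a slot is swapped only when the residual strictly decreases (which guarantees termination), the network is drawn as an acyclic graph so that the simulation is trivially stable, and all names, sizes and tap ranges are assumptions made for the example.

```python
import numpy as np

def lagged_block(x, order):
    """Columns x(t), x(t-1), ..., x(t-order), for t = order, ..., T-1."""
    T = len(x)
    return np.column_stack([x[order - d:T - d] for d in range(order + 1)])

def residual_energy(X, j, inputs, order):
    """Squared norm of the least-squares residual of X[:, j] given `inputs`."""
    y = X[order:, j]
    if not inputs:
        return float(y @ y)
    A = np.hstack([lagged_block(X[:, i], order) for i in inputs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def cycling_ols(X, j, m_j, order=3):
    """Pick at most m_j regressors for node j by cycling over the m_j slots."""
    N = X.shape[1]
    slots = [None] * m_j      # None stands for the null series X_0
    unchanged, c = 0, 0       # slots visited in a row without a swap; current slot
    while unchanged < m_j:
        others = [s for i, s in enumerate(slots) if i != c and s is not None]
        candidates = [k for k in range(N) if k != j and k not in others]
        # A larger projection norm is equivalent to a smaller residual energy.
        energy = {k: residual_energy(X, j, others + [k], order) for k in candidates}
        best = min(energy, key=energy.get)
        current = (energy[slots[c]] if slots[c] is not None
                   else residual_energy(X, j, others, order))
        if energy[best] < current:   # swap only on a strict improvement
            slots[c], unchanged = best, 0
        else:
            unchanged += 1
        c = (c + 1) % m_j
    return sorted(s for s in slots if s is not None)

def simulate(N=8, T=1000, order=3, seed=0):
    """Acyclic network of FIR links driven by white, mutually uncorrelated noises."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, N))              # the noises e_j of model (1)
    parents = {0: []}
    for j in range(1, N):
        n_par = min(j, int(rng.integers(1, 3)))  # one or two incoming arcs
        parents[j] = sorted(rng.choice(j, size=n_par, replace=False).tolist())
        for i in parents[j]:
            taps = rng.uniform(0.5, 1.0, order + 1) * rng.choice([-1.0, 1.0], order + 1)
            X[:, j] += np.convolve(X[:, i], taps)[:T]
    return X, parents

X, parents = simulate()
estimated = {j: cycling_ols(X, j, m_j=2) for j in range(X.shape[1])}
```

Comparing `estimated[j]` with `parents[j]` node by node gives a quick view of which arcs are recovered. Since a node is also informative about its own parents, arcs pointing in the reverse direction can be selected as well, so exact recovery should not be expected from this sketch.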

Fig. 3. The actual structure of the network of Example V-A (a) and the reconstruction obtained using the Cycling OLS algorithm with $m_j = 1$ (b), $m_j = 2$ (c) and $m_j = 3$ (d).
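A reconstruction such as those in panels (b) to (d) can be compared with the actual structure in panel (a) by counting correct, spurious and missed arcs. The helper below is a minimal sketch of such a comparison; the function name and the toy edge lists are hypothetical and do not correspond to the network of Fig. 3.

```python
def compare_topologies(true_edges, estimated_edges):
    """Split directed arcs (i, j) into correct, spurious and missed ones."""
    true_edges, estimated_edges = set(true_edges), set(estimated_edges)
    return {
        "correct": sorted(true_edges & estimated_edges),
        "spurious": sorted(estimated_edges - true_edges),
        "missed": sorted(true_edges - estimated_edges),
    }

# Toy example (assumed): three true arcs, three estimated arcs.
true_edges = [(1, 2), (1, 3), (2, 4)]
estimated_edges = [(1, 2), (3, 4), (2, 4)]
report = compare_topologies(true_edges, estimated_edges)
```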

The Wiener filters have been computed by estimating the cross-spectral densities of the signals $X_i$, under the assumption of ergodicity. The results of the Cycling OLS algorithm are shown in Figure 3. Note that, as the value of $m_j$ increases, more correct links are introduced in the reconstructed topology, but also spurious ones.

B. An application to real data: currency exchange rates

In this section we present the results obtained by applying our technique to real data. We have considered the daily exchange rates of 22 selected currencies (reported in Table I) from the last 7 years, providing 1715 samples for each of the time series. The missing data (the exchange rates on Saturdays and Sundays) have been interpolated (by cubic splines) such that a total of 2400 daily points have been obtained for our analysis. The Cycling OLS algorithm has been applied to the logarithmic returns of the time series (a standard procedure in Finance) with order 1, 2 and 3, and the estimated topologies are depicted in Figure 4(a), Figure 4(b) and Figure 4(c).

TABLE I
LIST OF THE CURRENCIES CONSIDERED IN THE ANALYSIS

Name                  Code   Country
Australian Dollar     AUD    Australia
Brazil Real           BRL    Brazil
Canadian Dollar       CAD    Canada
Chinese Renminbi      CNY    China
Danish Krone          DKK    Denmark
Euro                  EU     European Union
British Pound         GBP    Great Britain
Hong Kong Dollar      HKD    Hong Kong
Indian Rupee          INR    India
Japanese Yen          JPY    Japan
South Korean Won      KRW    South Korea
Sri Lankan Rupee      LKR    Sri Lanka
Mexican Peso          MXN    Mexico
Malaysian Ringgit     MYR    Malaysia
Norwegian Krone       NOK    Norway
New Zealand Dollar    NZD    New Zealand
Swedish Krona         SEK    Sweden
Singapore Dollar      SGD    Singapore
Thai Baht             THB    Thailand
Taiwanese Dollar      TWD    Taiwan
American Dollar       USD    United States of America
South African Rand    ZAR    South Africa

Fig. 4. The reconstructed topologies obtained by applying the Cycling OLS to the exchange rate time series of the 22 selected currencies, panels (a), (b) and (c).

VI. CONCLUSIONS

We have formulated the problem of deriving a topological structure from a set of time series. Every time series is represented as a node in a graph. The approach we follow in determining the arcs relies on identification techniques. If a time series $X_i$ is “useful” to model the time series $X_j$, then the directed arc $(i, j)$ is introduced in the graph. In order to modulate the complexity of the final graph, a maximum number $m_j$ of arcs pointing at $X_j$ is assumed and a cost function must be minimized to find the arcs. The problem has a similar formulation to, and strong connections with, the problem of compressive sensing, which has been widely studied in the last few years. However, since an optimization problem must be solved for every single node, we consider the application of suboptimal greedy algorithms. A modification of the Orthogonal Least Squares is illustrated in the paper. Numerical simulations comparing both the optimal and the suboptimal solutions are provided. A real data case study has been considered: the currencies (daily data of the past 10 years normalized to the Swiss franc).

VII. ACKNOWLEDGMENTS

The authors want to thank Dr. Bernardetta Addis for useful discussions and suggestions.

REFERENCES

[1] R. Mantegna and H. Stanley, An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge UK: Cambridge University Press, 2000.
[2] D. Marinazzo, M. Pellicoro, and S. Stramaglia, “Kernel Granger causality and the analysis of dynamical networks,” Phys. Rev. E, vol. 77, p. 056215, 2008.
[3] A. Brovelli, M. Ding, A. Ledberg, Y. Chen, R. Nakamura, and S. L. Bressler, “Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality,” Proc Natl Acad Sci USA, vol. 101, no. 26, pp. 9849–9854, June 2004.
[4] E. Candès, M. Wakin, and S. Boyd, “Enhancing sparsity by reweighted l1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, pp. 877–905, 2008.
[5] M. Timme, “Revealing network connectivity from response dynamics,” Phys. Rev. Lett., vol. 98, no. 22, p. 224101, 2008.
[6] R. Olfati-Saber, “Distributed Kalman filtering for sensor networks,” in Proc. of IEEE CDC, New Orleans, 2007, pp. 5492–5498.
[7] L. Schenato, M. Franceschetti, K. Poolla, and S. Sastry, “Foundations of control and estimation over lossy networks,” Proceedings of the IEEE, Special Issue on Networked Control Systems, vol. 95, no. 1, January 2007.
[8] I. Schizas, A. Ribeiro, and G. Giannakis, “Consensus in ad hoc WSNs with noisy links-part I: Distributed estimation of deterministic signals,” IEEE Trans. on Signal Processing, vol. 56, no. 1, 2008.
[9] D. Napoletani and T. Sauer, “Reconstructing the topology of sparsely connected dynamical networks,” Phys. Rev. E, vol. 77, p. 026103, 2008.
[10] J. A. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. Inform. Theory, vol. 50, pp. 2231–2242, 2004.
[11] A. Shiryaev, Probability. New York: Springer-Verlag, 1995.
[12] A. Sayed and T. Kailath, “A survey of spectral factorization methods,” Numerical Linear Algebra with Applications, vol. 8, pp. 467–469, 2001.
[13] P. E. Caines, Linear stochastic systems. New York, NY, USA: John Wiley & Sons, Inc., 1987.
[14] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[15] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods and their application to non-linear system identification,” Intl. J. Control, vol. 50, no. 5, pp. 1873–1896, 1989.
[16] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, 1993.
