Proceedings of the American Control Conference, Anchorage, AK, May 8-10, 2002

Parallel and Distributed Computational Multivariate Time Series Modeling in the State Space

Celso Pascoli Bottura(1) [email protected]
Gilmar Barreto(2) [email protected]
Mauricio Jose Bordon(3) [email protected]
Annabell Del Real Tamariz(3) [email protected]

(1) Prof. Dr., School of Electrical and Computer Engineering, State University of Campinas, UNICAMP, Campinas, Brazil
(2) Prof. Ms., School of Electrical and Computer Engineering, State University of Campinas, UNICAMP, Campinas, Brazil
(3) PhD. Student, School of Electrical and Computer Engineering, State University of Campinas, UNICAMP, Campinas, Brazil

ABSTRACT - In this paper a parallel and distributed computational procedure, using a subspace method developed by Masanao Aoki, for state space modeling of multivariate time series is proposed and implemented. Due to the large computational effort it requires, the parallel solution of the Riccati equation receives special attention. For model evaluation, short-time predictions, in which a Kalman filtering approach plays a central role, are tested and some results are presented.

1 INTRODUCTION

The field of multivariate time series modeling and its applications is certainly very wide and still in its beginning years, in spite of all its advances, realizations and application possibilities in engineering, economics, medicine and ecology. Modeling and prediction of time series benefited from the ideas and work of (Wiener 1949), among others, but for multivariate analysis of signals and systems the developments that the theory of linear systems underwent in the second half of the 20th century are very important and fundamental. From the end of the 1950's, Kalman's state space approach had a great impact on it (Kailath 1980), (Caines 1988). Time series modeling, identification and systems analysis could and did benefit from it, but at different paces, in some sense. The state space approach to time series modeling has benefited from the works of (Akaike 1974) and (Faurre 1973) in the 1970's, and from the work of (Aoki 1990) in the 1980's. In the 1990's, Hamilton's book on time series analysis (Hamilton 1994), for instance, devoted special attention to state space modeling of time series, and this tendency is certainly a growing one, for instance in the area of economics.

High performance computing is more recent, and state space modeling and computing of multivariate time series in all the areas cited above, and particularly in digital communications and intelligent control, can benefit from it. This paper presents a parallel and distributed algorithm for multivariate time series modeling. A linear, time-invariant, finite-dimensional dynamic system can represent a family of vector time series (Willems 1987). If the covariances of this family of vectors y_t are placed in a convenient matrix form, a Hankel

0-7803-7298-0/02/$17.00 © 2002 AACC

matrix will be obtained. The treatment of the Hankel matrix by Masanao Aoki's algorithm (Aoki 1990), through adequate numerical techniques, leads us to the obtention of one of the realizations of the dynamic system as a model for the time series under study. The focus of this work is to present a contribution to data processing time reduction for multivariate time series modeling through the parallel and distributed computational treatment of these data. For doing this, an original procedure and its implementation are here presented. The parallel solution of the Riccati equation receives special attention due to the large computational effort it requires; this is a consequence of our earlier work (Bottura 1997), where we detected that the solution of the Riccati equation is a bottleneck to be overcome in order to turn the algorithm faster. For model evaluation, short-time predictions, in which a Kalman filtering approach plays a central role, are tested and some results are presented.

2 PARALLEL MASANAO AOKI's ALGORITHM

Modeling of multivariate time series in the state space requires multiple experiments with sequences of multivariate discrete-time random signals. Such a model can be represented in many cases, as are the ones we address in this paper, by a linear stochastic time-invariant discrete-time system with a white noise input e_t:

x_{t+1} = A x_t + B e_t
y_t     = C x_t + e_t                                        (1)

where x_t ∈ R^n is the weakly stationary state vector; e_t ∈ R^p the weakly stationary, serially uncorrelated, zero mean noise vector; y_t ∈ R^p the observation vector (output); A ∈ R^{n×n} the system's dynamic matrix; B ∈ R^{n×p} the system's input matrix; and C ∈ R^{p×n} the output matrix. It is supposed that the input noise has a covariance matrix given by:

E(e_t e_s^T) = Λ δ_{t,s},   Λ > 0 (regularity condition)     (2)

where δ_{t,s} is the Kronecker delta and Λ is a constant symmetric matrix, and that the state vector covariance matrix is given by E(x_t x_t^T) = Π, where the matrix Π is constant.
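As a concrete illustration of model (1)-(2), the short Python/NumPy sketch below simulates an output sequence y_t. The matrices A, B, C, the dimensions and the sample size are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, N = 2, 1, 500                      # state dim, output dim, sample size (assumed)
A = np.array([[0.8, 0.2], [0.0, 0.5]])   # stable dynamic matrix (assumed)
B = np.array([[0.7], [0.3]])             # input (noise) matrix (assumed)
C = np.array([[1.0, 0.4]])               # output matrix (assumed)

x = np.zeros(n)
y = np.empty((N, p))
for t in range(N):
    e = rng.standard_normal(p)           # white noise e_t with Lambda = I
    y[t] = C @ x + e                     # y_t = C x_t + e_t
    x = A @ x + B @ e                    # x_{t+1} = A x_t + B e_t

print(y.shape)  # (500, 1)
```

The resulting sequence {y_t} is the kind of weakly stationary multivariate output data the identification steps below start from.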


For the obtention of the triple (A, B, C) from the output time series, the pair of matrices {A, C} is supposed observable and the pair of matrices {A, B} is supposed reachable; in addition, {A, B, C} is assumed to be a balanced minimal realization for the system given by (1), that is to say G_c = G_o, where G_o is the observability grammian and G_c is the reachability grammian:

G_o = Σ_{k=0}^{∞} (A^T)^k C^T C A^k ,   G_c = Σ_{k=0}^{∞} A^k B B^T (A^T)^k     (3)

Following the same reasoning as in (Aoki 1990), we define the observability matrix for the system (1) by

O_J = [ C^T  (CA)^T  (CA^2)^T  ...  (CA^{J-1})^T ]^T

and a matrix with the same structure of the reachability matrix,

Ω_K = [ M  AM  A^2M  ...  A^{K-1}M ],

where the matrix M is defined by

M = E(x_{k+1} y_k^T) = A Π C^T + B Λ.

By a detailed analysis and partitioning of Masanao Aoki's algorithm, a procedure with the following seven parallelization steps is structured, Fig. 1, and executed as follows:

Step 1 - Simultaneous generation of the matrices H_A, H_M, H_C, H, Y^- and Y^+. From the system's output samples the matrices Y^- ∈ R^{(Kp)×(N-1)} and Y^+ ∈ R^{(Jp)×(N-1)}, given below, can be generated:

Y^- = [ y_1   y_2   y_3   ...   y_{N-1}
        0     y_1   y_2   ...   y_{N-2}
        ...   ...   ...   ...   ...
        0     0     0     ...   y_{N-K} ]                    (9)

Y^+ = [ y_2       y_3       ...   y_N
        y_3       y_4       ...   0
        ...       ...       ...   ...
        y_{J+1}   y_{J+2}   ...   0   ]                      (10)

The product of the matrices Y^+ and Y^{-T} allows the generation of the Hankel matrix H without requiring the explicit calculation of the covariances:

H_{J,K} = Y^+ Y^{-T}                                         (5)

which, up to normalization by the number of samples, estimates the block Hankel matrix of output autocovariances. With Λ_i = E(y_{t+i} y_t^T) = C A^{i-1} M denoting the output autocovariance at lag i (Λ_0 is the output vector autocovariance):

H_{J,K} = [ Λ_1  Λ_2  ...  Λ_K          [ CM          CAM      ...  CA^{K-1}M
            Λ_2  Λ_3  ...  Λ_{K+1}   =    CAM         CA^2M    ...  CA^K M
            ...  ...  ...  ...            ...         ...      ...  ...
            Λ_J  ...  ...  Λ_{J+K-1} ]    CA^{J-1}M   CA^J M   ...  CA^{J+K-2}M ] = O_J Ω_K .

Furthermore, from the system's output data three new matrices can be generated:

- H_A and H_M, given by

H_A = O_J A Ω_K = [ Λ_2      Λ_3  ...  Λ_{K+1}
                    Λ_3      Λ_4  ...  Λ_{K+2}
                    ...      ...  ...  ...
                    Λ_{J+1}  ...  ...  Λ_{J+K} ]             (4)

and

H_M = O_J M = [ Λ_1  Λ_2  ...  Λ_J ]^T

- H_C:

H_C = C Ω_K = [ Λ_1  Λ_2  ...  Λ_K ].

Step 2 - The singular value decomposition (SVD) of the matrix H_{J,K} = U Σ^{1/2} Σ^{1/2} V^T is obtained.

Step 3 - Here the parallel calculations of some of the system (1) matrices are made. Writing O_J = U Σ^{1/2} and Ω_K = Σ^{1/2} V^T, we obtain

A = Σ^{-1/2} U^T H_A V Σ^{-1/2}   and   C = H_C V Σ^{-1/2} ,

as well as of the matrix M = A Π C^T + B Λ:

M = Σ^{-1/2} U^T H_M .
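Steps 1-3 can be checked numerically. The sketch below (Python with NumPy; the triple A, C, M and the dimensions are illustrative assumptions, not data from the paper) builds the covariance Hankel matrices H_{J,K} and H_A from a known model and recovers, via the SVD formulas of Step 3, a matrix similar to A, so its eigenvalues match those of A.

```python
import numpy as np

# Assumed illustrative model: n = 2 states, scalar output (p = 1)
A = np.array([[0.8, 0.2], [0.0, 0.5]])
C = np.array([[1.0, 0.4]])
M = np.array([[0.7], [0.3]])
J = K = 4
n = 2

def lam(i):
    # Output autocovariance block Lambda_i = C A^(i-1) M
    return C @ np.linalg.matrix_power(A, i - 1) @ M

# Covariance Hankel matrix H_{J,K} and its shifted version H_A
H = np.block([[lam(i + j - 1) for j in range(1, K + 1)] for i in range(1, J + 1)])
HA = np.block([[lam(i + j) for j in range(1, K + 1)] for i in range(1, J + 1)])

# Step 2: SVD of H; keep the n dominant singular triplets
U, s, Vt = np.linalg.svd(H)
Un, Vn = U[:, :n], Vt[:n, :].T
Si = np.diag(s[:n] ** -0.5)

# Step 3: A_hat = Sigma^{-1/2} U^T H_A V Sigma^{-1/2} is similar to A
A_hat = Si @ Un.T @ HA @ Vn @ Si
print(np.sort(np.linalg.eigvals(A_hat).real))  # ~ [0.5, 0.8], the eigenvalues of A
```

Since A_hat is related to A by a similarity transformation (the balancing coordinate change), the recovered eigenvalues agree with the true ones even though the state basis differs.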

Step 4 - The symplectic matrix

Θ = [ Ψ + Q Ψ^{-T} D    -Q Ψ^{-T}
      -Ψ^{-T} D          Ψ^{-T}   ]                          (17)

is built, where the auxiliary matrices are given by

Ψ = A^T - C^T Λ_0^{-1} M^T,                                  (18)

Q = C^T Λ_0^{-1} C,                                          (19)

D = M Λ_0^{-1} M^T.                                          (20)

Step 5 - The matrix Π, solution of the Riccati equation, is determined from

Π = A Π A^T + K(Π),                                          (21)

where

K(Π) = (M - A Π C^T)(Λ_0 - C Π C^T)^{-1}(M - A Π C^T)^T.     (22)

The solution Π can be obtained from the real Schur decomposition of the matrix Θ = W T W^T. Partitioning the unitary matrix W as

W = [ W_{11}  W_{12}
      W_{21}  W_{22} ]

the following proposition is enunciated (Vaughan 1970):

Proposition 2.1 The matrix Π = W_{21} W_{11}^{-1} is the solution of the Riccati equation.

Step 6 - Calculation of the matrix Λ, where Λ = Λ_0 - C Π C^T.

Step 7 - Obtention of the matrix B = (M - A Π C^T) Λ^{-1}.

In a deeper and more detailed analysis of our partition of Masanao Aoki's algorithm as structured in Fig. 1, the procedure for computational modeling of multivariate time series can be restructured and improved with some more parallelization steps, such as the solution of the discrete algebraic Riccati equation in parallel, for which an original procedure was also proposed (Bottura 1999), (Tamariz 1999); a brief discussion is presented in the next section.

3 Parallel Solution of the Discrete Algebraic Riccati Equation

The proposed parallel procedure to solve the discrete algebraic Riccati equation (DARE) is divided into two parts. The first one concerns the reduction to an upper Hessenberg matrix via Elementary Stabilized Similarity Transformations (ESST), using the message passing parallel programming paradigm, specifically a Message Passing Interface (MPI) implementation (Snir 1996), with distributed memory and asynchronous communication; the message sending and receiving semantics are blocking, with single-program multiple-data (SPMD) programming mode. The second part concerns the Real Schur Form (RSF) determination, which is implemented using the ScaLAPACK libraries (Blackford 1997).

The first part, the ESST reduction to a Hessenberg matrix, is fundamentally based on matrix-matrix multiplication operations, which represent the higher computational cost; these operations are made by row accessing of the matrix elements, so that blocks distributed among the different participating processors are formed. In this case it is possible to parallelize the multiplication operation in such a way that each processor computes the values of different elements of the resulting matrix. Each processor stores at most NB rows, where NB = ⌈N/P⌉, and row k is stored at processor ⌊k/NB⌋. The distribution assigns row blocks of size NB to successive processors. If N is divisible by P, then each processor receives blocks of the same size, which guarantees the best work loading.

For the SPMD programming paradigm, each participating processor performs a multiplication operation (C = A * B) with the data sent by the principal processor; that is to say, the data are divided and each processor computes at most NB blocks of the resulting matrix for the matrix-matrix multiplication operation it realizes. The principal processor, in charge of data distribution and of the realization of the calculations, sends at most NB blocks of matrix A (NB, N) and of matrix B (N, N) to each participating processor and receives a C (NB, N) matrix with the final result from each processor. Figure 2 shows the described procedure for a (6 x 6) system.

The algorithm's second part, concerned with the RSF determination, is based on the utilization of the ScaLAPACK library. In this phase the basic mechanisms of the library are present: grid creation, inter-task communication, and data input-output. ScaLAPACK uses a 2-dimensional block-cyclic data distribution, in which each (MB x NB) block of a matrix is assigned to a processor. The cyclic data distribution is parameterized by four numbers: P, Q, r and c, where (P x Q) defines the grid and (r x c) is the block size. The processors are logically arranged as a grid of P rows and Q columns; hence the number of processors used is R = P x Q. More details about the utilization of this library can be obtained in (Blackford 1997).

We assume the matrix is partitioned in (MB x NB) blocks and that the first block is given to the process of coordinates (RSRC, CSRC), where RSRC is the process row over which the first row of the matrix is distributed and CSRC is the process column over which the first column of the matrix is distributed. The matrix entry (I, J) is thus found in the process of coordinates (pr, pc), within the local (l, m) block, at the position (x, y) given by:

(pr, pc) = ((RSRC + ⌊(I - 1)/MB⌋) mod P, (CSRC + ⌊(J - 1)/NB⌋) mod Q),
(x, y) = ((I - 1) mod MB + 1, (J - 1) mod NB + 1).

We present in Figure 3 the mapping of a matrix (6 x 6) onto a process grid (2 x 2).

Figure 3: Mapping of Matrix 6 x 6 onto process grid 2 x 2

For this parallelization of Aoki's algorithm the load balancing among the utilized processors has not been considered. The processing system uses a network of UNIX workstations (Stevens 1990) and programming environments: languages
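The fixed-point structure of the Riccati equation of Step 5 can be illustrated with a naive serial iteration in Python/NumPy, shown below; this is only a sketch for an assumed innovations-form model (all numbers illustrative), not the paper's parallel symplectic/Schur procedure. For such a model, Λ_0 and M computed from the state covariance make that covariance a solution of the equation, and iterating Π ← AΠA^T + K(Π) from Π = 0 recovers it.

```python
import numpy as np

# Assumed illustrative innovations-form model (Lambda = 1)
A = np.array([[0.8, 0.2], [0.0, 0.5]])
B = np.array([[0.7], [0.3]])
C = np.array([[1.0, 0.4]])

# State covariance Pi_true solves the Lyapunov equation Pi = A Pi A^T + B B^T;
# Lam0 and M then follow from their definitions.
Pi_true = np.zeros((2, 2))
for _ in range(2000):
    Pi_true = A @ Pi_true @ A.T + B @ B.T
Lam0 = C @ Pi_true @ C.T + 1.0
M = A @ Pi_true @ C.T + B

# Riccati iteration: Pi <- A Pi A^T + K(Pi), with
# K(Pi) = (M - A Pi C^T)(Lam0 - C Pi C^T)^{-1}(M - A Pi C^T)^T
Pi = np.zeros((2, 2))
for _ in range(500):
    G = M - A @ Pi @ C.T
    Pi = A @ Pi @ A.T + G @ np.linalg.inv(Lam0 - C @ Pi @ C.T) @ G.T

G = M - A @ Pi @ C.T
residual = np.linalg.norm(
    Pi - (A @ Pi @ A.T + G @ np.linalg.inv(Lam0 - C @ Pi @ C.T) @ G.T))
print(residual)  # ~0: Pi satisfies the discrete Riccati equation
```

The symplectic/Schur route of Steps 4-5 computes the same solution non-iteratively, which is what makes it attractive for parallelization.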
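The 2-dimensional block-cyclic mapping formula above can be checked with a few lines of Python. The grid mirrors the (6 x 6) matrix on the (2 x 2) process grid of Figure 3; the block size MB = NB = 2 is an assumption, since the figure's block size is not stated in the text.

```python
def owner(I, J, MB, NB, P, Q, RSRC=0, CSRC=0):
    """Return ((pr, pc), (x, y)): the process coordinates holding matrix
    entry (I, J) and its 1-based position inside the local block."""
    pr = (RSRC + (I - 1) // MB) % P
    pc = (CSRC + (J - 1) // NB) % Q
    x = (I - 1) % MB + 1
    y = (J - 1) % NB + 1
    return (pr, pc), (x, y)

# 6 x 6 matrix, 2 x 2 blocks, on a 2 x 2 process grid: print which
# process owns each entry, showing the cyclic wrap-around of blocks
for I in range(1, 7):
    print([owner(I, J, 2, 2, 2, 2)[0] for J in range(1, 7)])
```

Each 2 x 2 block is assigned to one process, and blocks wrap around the grid cyclically in both dimensions, which is what balances the work when the matrix is much larger than the grid.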

Figure 1: A Parallel Masanao Aoki's Algorithm Structure

4 Experimental Results

[Figure 2: Send and Receive block distribution of the matrices A, B and C between the principal and the participating processors for a (6 x 6) system]

The proposed algorithm is applied to the parallel and distributed computational modeling of the multivariate time series samples given by {y_1, y_2, ..., y_N}, y_t ∈ R^2. The obtained model with these N samples is the triple:

A_N = [ -0.2497   0.7490   0.0045
        -0.6753   0.3505   0.0649
          ...       ...      ...  ]

B_N = [ -0.4679     ...
        -2.6917  -0.2622
         1.6029  -1.0398 ]

C_N = [ -0.2225   0.3285   0.0597
         0.2349   0.6982   0.1756 ]

From the sequence {y_1, y_2, ..., y_N} we take the first q vectors {y_1, y_2, ..., y_q} with q