Optimization Algorithms on Homogeneous Spaces
with Applications in Linear Systems Theory

Robert Mahony

March 1994

Presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy at the Australian National University

Department of Systems Engineering
Research School of Information Sciences and Engineering
Australian National University

Acknowledgements

I would like to thank my supervisors John Moore and Iven Mareels for their support, insight, technical help and for teaching me to enjoy research. Thanks to Uwe Helmke for his enthusiasm and support and Wei-Yong Yan for many useful suggestions. I would also like to thank the other staff and students of the department for providing an enjoyable and exciting environment for work, especially the students from Lakeview for not working too hard. I reserve a special thanks for Peter Kootsookos because I owe him one. I have been lucky enough to visit Unite Auto, Catholic University of Leuven, Louvain-la-Neuve, and the Department of Mathematics, University of Regensburg, for extended periods during my studies and thank the staff and students of both institutions for their support. A number of people have made helpful comments and contributions to the results contained in this thesis. In particular, I would like to thank George Bastin, Guy Campion, Kenneth Driessel, Ed Henrich, Ian Hiskens, David Hill and David Stewart as well as several anonymous reviewers. Apart from the support of the Australian National University I have also received additional financial support from the following sources:

- The Cooperative Research Centre for Robust and Adaptive Systems, funded by the Australian Commonwealth Government under the Cooperative Research Centres Program.
- Grant I-0184-078.06/91 from the G.I.F., the German-Israeli Foundation for Scientific Research and Development.
- Boeing Commercial Aircraft Corporation.

Lastly I thank Pauline Allingham for her support and care throughout my doctorate.


Statement of Originality

The work presented in this thesis is the result of original research done by myself, in collaboration with others, while enrolled in the Department of Systems Engineering as a Doctor of Philosophy student. It has not been submitted for any other degree or award in any other university or educational institution. Following is a list of publications in refereed journals and conference proceedings completed while I was a Doctor of Philosophy student. Much of the technical discussion given in this thesis is based on work described in papers [1,2,5,6,10,11] from the list below. The remaining papers cover material I chose not to include in this thesis.

Journal Papers:

1. R. E. Mahony and U. Helmke. System assignment and pole placement for symmetric realisations. Submitted to Journal of Mathematical Systems, Estimation and Control, 1994.

2. R. E. Mahony, U. Helmke, and J. B. Moore. Gradient algorithms for principal component analysis. Submitted to Journal of the Australian Mathematical Society, 1994.

3. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and implications for Lyapunov direct stability methods. To appear, Journal of Mathematical Systems, Estimation and Control, 1994.

4. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Non-linear feedback laws for output regulation. Draft version, 1994.

5. J. B. Moore, R. E. Mahony, and U. Helmke. Numerical gradient algorithms for eigenvalue and singular value calculations. SIAM Journal of Matrix Analysis, 15(3), 1994.


Conference Papers:

6. R. E. Mahony, U. Helmke, and J. B. Moore. Pole placement algorithms for symmetric realisations. In Proceedings of IEEE Conference on Decision and Control, San Antonio, U.S.A., 1993.

7. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and implications for Lyapunov stability methods. In Proceedings of the 12th World Congress of the International Federation of Automatic Control, Sydney, Australia, 1993.

8. R. E. Mahony and I. M. Mareels. Non-linear feedback laws for output stabilization. Submitted to the IEEE Conference on Decision and Control, 1994.

9. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Output regulation for systems linear in the input. In Conference on Mathematical Theory of Networks and Systems, Regensburg, Germany, 1993.

10. R. E. Mahony and J. B. Moore. Recursive interior-point linear programming algorithm based on Lie-Brockett flows. In Proceedings of the International Conference on Optimisation: Techniques and Applications, Singapore, 1992.

11. J. B. Moore, R. E. Mahony, and U. Helmke. Recursive gradient algorithms for eigenvalue and singular value decompositions. In Proceedings of the American Control Conference, Chicago, U.S.A., 1992.

Robert Mahony


Abstract

Constrained optimization problems are commonplace in linear systems theory. In many cases the constraint set is a homogeneous space and the additional geometric insight provided by the Lie-group structure provides a framework in which to tackle the numerical optimization task. The fundamental advantage of this approach is that algorithms designed and implemented using the geometry of the homogeneous space explicitly preserve the constraint set. In this thesis the numerical solution of a number of optimization problems constrained to homogeneous spaces is considered. The first example studied is the task of determining the eigenvalues of a symmetric matrix (or the singular values of an arbitrary matrix) by interpolating known gradient flow solutions using matrix exponentials. Next the related problem of determining principal components of a symmetric matrix is discussed. A continuous-time gradient flow is derived that leads to a discrete exponential interpolation of the continuous-time flow which converges to the desired limit. A comparison to classical algorithms for the same task is given. The third example discussed, this time drawn from the field of linear systems theory, is the task of arbitrary pole placement using static feedback for a structured class of linear systems. The remainder of the thesis provides a review of the underlying theory relevant to the three examples considered and develops a mathematical framework in which the proposed numerical algorithms can be understood. This framework leads to a general form for a solution to any optimization problem on a homogeneous space. An important consequence of the theoretical review is that it develops the mathematical tools necessary to understand more sophisticated numerical algorithms. The thesis concludes by proposing a quadratically convergent numerical optimization method, based on the Newton-Raphson algorithm, which evolves explicitly on a Lie-group.


Contents

Acknowledgements
Statement of Originality
Abstract
Glossary of Symbols

1 Introduction
1.1 Historical Perspective
1.1.1 Dynamical Systems as Numerical Methods
1.1.2 Optimization Techniques and Numerical Solutions to Differential Equations
1.1.3 Linear Systems Theory and Pole Placement Results
1.2 Summary of Results

2 Numerical Gradient Algorithms for Eigenvalue Calculations
2.1 The Double-Bracket Algorithm
2.2 Step-Size Selection
2.3 Stability Analysis
2.4 Singular Value Computations
2.5 Associated Orthogonal Algorithms
2.6 Computational Considerations
2.7 Open Questions and Further Work
2.7.1 Time-Varying Double-Bracket Algorithms

3 Gradient Algorithms for Principal Component Analysis
3.1 Continuous-Time Gradient Flow
3.2 A Gradient Descent Algorithm
3.3 Computational Considerations
3.3.1 An Equivalent Formulation
3.3.2 Padé Approximations of the Exponential
3.4 Comparison with Classical Algorithms
3.4.1 The Power Method
3.4.2 The Steepest Ascent Algorithm
3.4.3 The Generalised Power Method
3.5 Open Questions and Further Work

4 Pole Placement for Symmetric Realisations
4.1 Statement of the Problem
4.2 Geometry of Output Feedback Orbits
4.3 Least Squares System Assignment
4.4 Least Squares Pole Placement and Simultaneous System Assignment
4.5 Simulations
4.6 Numerical Methods for Symmetric Pole Placement
4.7 Open Questions and Further Work

5 Gradient Flows on Lie-Groups and Homogeneous Spaces
5.1 Lie-Groups and Homogeneous Spaces
5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits
5.3 Riemannian Metrics on Lie-Groups and Homogeneous Spaces
5.4 Gradient Flows
5.5 Convergence of Gradient Flows
5.6 Lie-Algebras, the Exponential Map and the General Linear Group
5.7 Affine Connections and Covariant Differentiation
5.8 Right Invariant Affine Connections on Lie-Groups
5.9 Geodesics

6 Numerical Optimization on Lie-Groups and Homogeneous Spaces
6.1 Gradient Descent Algorithms on Homogeneous Spaces
6.2 Newton-Raphson Algorithm on Lie-Groups
6.3 Coordinate Free Newton-Raphson Methods
6.4 Symmetric Eigenvalue Problem
6.5 Open Questions and Further Work

7 Conclusion
7.1 Overview
7.2 Conclusion

Glossary of Symbols

Linear Algebra, Sets and Spaces:

$\mathbb{R}$: The real numbers.
$\mathbb{C}$: The complex numbers.
$\mathbb{N}$: The natural numbers.
$\mathbb{R}^N$: $N$-dimensional Euclidean space.
$\mathbb{R}^{N \times M}$: The set of all real $N \times M$ matrices, an $MN$-dimensional Euclidean space.
$\mathbb{C}^{N \times M}$: The set of all complex $N \times M$ matrices, a $2MN$-dimensional (real) Euclidean space.
$B_\epsilon(x)$, $B_\epsilon$: The ball of radius $\epsilon > 0$ around a point $x \in \mathbb{R}^N$ or the origin, $B_\epsilon(x) = \{ y \in \mathbb{R}^N \mid \|x - y\| < \epsilon \}$, $B_\epsilon = B_\epsilon(0)$.
$Sk(N)$: The set of all skew-symmetric matrices $\{ \Lambda \in \mathbb{R}^{N \times N} \mid \Lambda^T = -\Lambda \}$.
$S(N)$: The set of all symmetric matrices $\{ \Lambda \in \mathbb{R}^{N \times N} \mid \Lambda^T = \Lambda \}$.
$GL(N, \mathbb{R})$, $GL(N)$: The general linear group of all real invertible $N \times N$ matrices.
$GL(N, \mathbb{C})$: The general linear group of all complex invertible $N \times N$ matrices.
$O(N)$: The set of $N \times N$ orthogonal matrices $\{ U \in \mathbb{R}^{N \times N} \mid U^T U = I_N \}$.
$St(p, n)$: The Stiefel manifold of $n \times p$ orthogonal matrices, $\{ X \in \mathbb{R}^{n \times p} \mid X^T X = I_p \}$.

Differential Geometry, Sets and Spaces:

$C^k(M)$: The set of all at least $k$ times differentiable functions from a manifold $M$ to the real numbers.
$C^\infty(G)$: The set of all smooth functions from a manifold $G$ to the real numbers.
$C^k$, $C^\infty$: The set of at least $k$ times differentiable (respectively smooth) functions from an understood set (usually $\mathbb{R}^n$) to the real numbers.
$T_x M$: The tangent space of a manifold $M$ at the point $x \in M$.
$TM$: The tangent bundle of $M$, the union over all $x \in M$ of each $T_x M$.
$T_x^* M$: The cotangent space of a manifold $M$ at the point $x \in M$. The cotangent space is the dual space of linear functionals on the vector space $T_x M$.
$T^* M$: The cotangent bundle of $M$, the union over all $x \in M$ of each $T_x^* M$.
$\mathcal{D}(G)$: The algebra of all smooth vector fields on a smooth manifold $G$.
$\mathcal{D}^*(G)$: The set of all smooth 1-form fields on a smooth manifold $G$.
$S^n$: The $n$-dimensional sphere in $\mathbb{R}^{n+1}$, $\{ x \in \mathbb{R}^{n+1} \mid x^T x = 1 \}$.
$\mathbb{RP}^n$: $n$-dimensional real projective space, the set of all vector directions in $\mathbb{R}^{n+1}$.
$G/H$: The quotient space of a group $G$ by a normal subgroup $H$.
$\mathrm{stab}(X)$: The subgroup associated with a group action that leaves $X$ fixed.
$M(H_0)$: The set of matrices orthogonally similar to $H_0$.
$\mathrm{Grass}(p, n)$: The Grassmannian manifold of $p$-dimensional subspaces in $\mathbb{R}^n$.
$\mathfrak{gl}(N, \mathbb{R})$: The Lie algebra associated with $GL(N, \mathbb{R})$. One has that $\mathfrak{gl}(N, \mathbb{R}) = \mathbb{R}^{N \times N}$ equipped with the matrix Lie-bracket $[A, B] = AB - BA$.
$\mathfrak{g}$, $\mathfrak{h}$: The Lie-algebras associated with arbitrary Lie-groups $G$ or $H$ respectively.
$\mathfrak{gl}(N, \mathbb{C})$: The Lie algebra associated with $GL(N, \mathbb{C})$, the set of all complex $N \times N$ matrices.

Linear Algebra, Notation and Operators:

$I_N$: The $N \times N$ identity matrix.
$0_{N \times M}$: The $N \times M$ matrix with each element zero.
$\lambda_i$: Eigenvalues.
$\sigma_i$: Singular values.
$\mathrm{Re}(a)$: The real part of $a \in \mathbb{C}$.
$\langle x, y \rangle$: The Euclidean inner product of $x$ and $y$ in $\mathbb{R}^n$.
$\delta_{ij}$: The Kronecker delta function, $\delta_{ij} = 0$ if $i \neq j$, $\delta_{ij} = 1$ if $i = j$.
$A^T$: The transpose of a matrix $A$.
$\|A\|$, $\|A\|_F$: The Frobenius norm of $A \in \mathbb{R}^{n \times m}$. One has $\|A\|_F^2 = \sum_{i=1}^n \sum_{j=1}^m A_{ij}^2$.
$\|x\|_2$: The Euclidean two-norm of $x \in \mathbb{R}^n$.
$\|A\|_2$: The matrix two-norm, the supremum of $\|Ax\|$ for $\|x\|_2 = 1$.
$[A, B]$: The Lie-bracket of two matrices, $[A, B] = AB - BA$.
$\mathrm{ad}_A^i B$: The adjoint operator on matrices, $\mathrm{ad}_A^i B = [\mathrm{ad}_A^{i-1} B, A]$ where $\mathrm{ad}_A^0 B = B$.
$\{A, B\}$: The generalised Lie-bracket of two matrices, $\{A, B\} = A^T B - B^T A$.
$\mathrm{tr}(A)$: The trace of $A \in \mathbb{R}^{n \times n}$.
$\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$: The matrix with diagonal elements $(\lambda_1, \ldots, \lambda_n)$ and all other elements zero.
$\mathrm{vec}(A)$: The vector operation that stacks the columns of a matrix $A \in \mathbb{R}^{m \times n}$ into a vector in $\mathbb{R}^{nm}$.
$\ker T$: The kernel of a linear operator $T$.
$\mathrm{dom}\, T$: The domain of a linear operator $T$, defined as the subspace orthogonal to the kernel of $T$ with respect to a given inner product.
$\mathrm{sp}\{v_1, \ldots, v_n\}$: The subspace generated by the span of the vectors $v_1, \ldots, v_n$.
$\mathrm{sp}(A)$: The subspace generated by the span of the columns of $A \in \mathbb{R}^{n \times n}$.
$\dim V$: The dimension of a subspace $V \subseteq \mathbb{R}^n$.
$\mathrm{dist}$: The distance between two subspaces, the Frobenius norm of the residual projection operator.
$O(h)$: Big O notation. A function $f(h)$ is order $O(h)$ when there exist $B > 0$ and $\delta > 0$ such that $\frac{|f(h)|}{h} \leq B$ for all $0 < h \leq \delta$.
$o(h)$: Little o notation. A function $f(h)$ is order $o(h)$ when $f(h)$ is order $O(h)$ and $\lim_{h \to 0} \frac{|f(h)|}{h} = 0$.

Differential Geometry, Notation and Operators:

$D\phi|_x(\xi)$: The Fréchet derivative of a scalar function $\phi \in C^1(M)$ evaluated at $x \in M$, a smooth manifold, in direction $\xi$.
$T_p f$: The tangent map associated with a function $f : M \to N$ between two manifolds at the point $p \in M$. One has $T_p f : T_p M \to T_{f(p)} N$, $T_p f(X) := Df|_p(X)$, the Fréchet derivative of $f$.
$df$: The differential of a function $f : M \to N$ between two manifolds. One has $df : TM \to TN$, $df(X_p) = T_p f(X_p)$ where $X_p \in T_p M$.
$\phi|_J$: The restriction of a map $\phi : M \to N$ between two sets $M$ and $N$ to a map $\phi|_J : J \to N$ where $J \subseteq M$.
$\bar{J}$: The closure of $J \subseteq M$ in some topology on $M$.
$g(\xi, \eta)$: General notation for a Riemannian metric operating on tangent vectors $\xi$ and $\eta$.
$\langle\langle \xi, \eta \rangle\rangle$: An explicit Riemannian metric operating on tangent vectors $\xi$ and $\eta$.
$\mathrm{grad}\, \phi$: The gradient of a potential $\phi \in C^\infty(G)$ on a Riemannian manifold $G$.
$Z\phi$: The derivation of $\phi \in C^\infty(M)$ with respect to a smooth vector field $Z \in \mathcal{D}(M)$ on a manifold $M$. One has that $Z\phi(x) = D\phi|_x(Z(x))$ for $x \in M$.
$Z(x)$: The value of a smooth vector field $Z \in \mathcal{D}(M)$ at $x \in M$; $Z(x) \in T_x M$.
$[Y, Z]$: The Lie-bracket of two smooth vector fields $Y, Z \in \mathcal{D}(M)$ for $M$ a manifold.
$L_Y Z$: The Lie-derivative of two vector fields $Y, Z \in \mathcal{D}(M)$ for $M$ a manifold. One has that $L_Y Z = [Y, Z]$.
$\nabla$: An affine connection.
$\nabla_Y Z$: The action of an affine connection on (or covariant derivation of) a smooth vector field $Z \in \mathcal{D}(M)$ with respect to $Y \in \mathcal{D}(M)$ on a manifold $M$.
$J\phi_x$: The Jacobi matrix of $\phi \in C^2(\mathbb{R}^n)$ evaluated at a point $x \in \mathbb{R}^n$.
$\mathcal{H}\phi_\theta$: The Hessian of $\phi \in C^2(M)$ on a manifold $M$ evaluated at a critical point $\theta$ of $\phi$.
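Several of the matrix operators defined above can be checked numerically. A minimal sketch (assuming Python with NumPy, purely for illustration; the matrices and names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Lie-bracket [A, B] = AB - BA
bracket = A @ B - B @ A

# Generalised Lie-bracket {A, B} = A^T B - B^T A, always skew-symmetric
gen_bracket = A.T @ B - B.T @ A
assert np.allclose(gen_bracket, -gen_bracket.T)

# vec(A) stacks the columns of A into a vector
vec_A = A.flatten(order="F")

# Frobenius norm: ||A||_F^2 equals the sum of squared entries
assert np.isclose(np.linalg.norm(A, "fro") ** 2, np.sum(A ** 2))
```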

Chapter 1

Introduction

The aim of the present work is to investigate a particular class of constrained optimization problems, those where the constraint set is a smooth homogeneous manifold (embedded in Euclidean space). Rather than rely on standard numerical techniques the approach taken is to exploit the geometry of Lie-groups and homogeneous spaces. The advantages of this approach are considerable, especially in the areas of stability, robustness and flexibility of the algorithms developed. Optimization problems of the class considered are important in many fields of study. The two areas from which the principal examples in the body of the thesis are drawn are the fields of numerical linear algebra and linear systems theory. An advantage of considering questions drawn from the field of numerical linear algebra is the degree of expertise in solving such problems using classical techniques. This provides an excellent foundation on which to develop new results as well as ensuring that there is a battery of existing numerical methods to which proposed algorithms may be compared. In contrast, the field of linear systems theory contains many important optimization problems for which no satisfactory solution is known. Presently accepted solution methods tend to be awkward adaptations of numerical linear algebra methods which do not exploit the natural structure of the problem. Many of these optimization problems are of a form for which the methods developed in this work are applicable. As a consequence of the ad-hoc development of many of the existing algorithms in linear systems theory there has been little or no effort to understand the particular requirements of


numerical methods for engineering problems. The neglect of this aspect of a proper numerical treatment of optimization problems in engineering is especially important when on-line and adaptive processes are considered. In such processes, conventional numerical methods must be augmented with additional check procedures and tests to guarantee robustness of the process. Indeed, one should even consider whether the principal goals of classical numerical methods are appropriate for numerical methods in an adaptive or on-line engineering application. Following this line of reasoning further, it is instructive to consider a set of priorities suitable for numerical methods which solve on-line and adaptive engineering applications. I believe the following properties in some sense describe the characteristics desirable for such algorithms. It should be mentioned that the algorithms considered are all recursive iterations whose limiting value yields the solution of the problem considered and the properties mentioned below are phrased with this in mind.

Simplicity. The algorithm should be simple in concept and flexible in implementation. The relationship between the task performed and the computational method employed should be easily understood.

Global Convergence. The method should converge to the desired solution for almost any initial condition. In a sense this can be considered as a robustness and stability requirement on the algorithm. Thus, algorithms should be highly robust with respect to noisy data sequences and large deviations. Interestingly, this point is often an argument against using iterative numerical methods that converge too quickly.

Constraint Stability. The method should explicitly preserve the constraint set on which the optimization problem is posed. If a numerical algorithm is implemented on-line it will be running for a considerable period of time and it is imperative that the constraint be maintained exactly to preserve the qualitative properties of the system.

Classical numerical optimization methods do not necessarily have these properties as primary goals. Indeed, most classical algorithms are designed primarily to obtain the best absolute accuracy with the least computational cost (implemented on a digital computer). For example, standard error analysis of a numerical method ensures that the numerical solution obtained satisfies some absolute error bound. In contrast, the numerical algorithms considered in later chapters are designed to exactly preserve the constraint while solving an optimization problem robustly. The questions of computational cost and absolute accuracy are of secondary importance to the properties mentioned above. An important aspect of the properties outlined above is that they do not demand fast convergence or high accuracy (other than to preserve the constraint set). Certainly, constraint stability requires that only certain errors may occur (those that preserve the structure of the problem); however, high accuracy within the constraint set is often not a necessity. For example, if a computed linear feedback gain stabilises a given plant, then probably any nearby feedback will also stabilise the plant. However, if that computation is then used to initialise a further iterate and introduces modelling errors into a process, these modelling errors could accumulate and eventually cause more significant problems. The bursting phenomena observed in early adaptive feedback algorithms provide a useful analogy. Fast convergence properties within the constraint set are not necessarily desirable either. Indeed, if the algorithm converges too quickly then it will tend to track input noise disturbances, whereas a scheme that converges slowly will act somewhat like a low pass filter. In practice one would like to have a "knob" which adjusts the rate of convergence of a given algorithm (analogous to adjusting the pass band of a filter). This is impossible for most classical algorithms; however, the algorithms proposed in this thesis can all be sped up and slowed down to a certain degree. In this thesis I propose recursive algorithms for solving a number of constrained optimization problems which satisfy the properties outlined above. The principal algorithms considered are based on the classical gradient descent algorithm but modified so that they explicitly preserve the constraint set. This is achieved by exploiting the geometry of Lie-groups and homogeneous spaces, though the algorithms proposed can be understood without resorting to deep theoretical results. The algorithms are closely related to certain continuous-time gradient flows and dynamical systems that have been proposed recently by a number of authors as potential numerical methods for engineering problems (cf. the recent monograph (Helmke & Moore 1994b) for an excellent review of these developments). By designing numerical algorithms based on these methods one brings dynamical systems solutions to engineering problems one step closer to applications. The modified gradient descent algorithms proposed display all the properties mentioned above. In the case where there is a unique local (and global) minimum the gradient descent algorithms will always converge to the desired minimum. Gradient flows (and


also gradient algorithms) are robust to variations in initial conditions (Smale 1961). Since the algorithms proposed are designed to explicitly preserve the constraint then by definition they satisfy the property of constraint stability. Finally, the basic gradient descent algorithm is the simplest optimization method available and the modifications considered are relatively straightforward changes. Of course, there are applications where the linear convergence rate associated with gradient descent algorithms is not sufficient for the problem considered. In this case one must look at more sophisticated methods. The most direct quadratically convergent method is the Newton-Raphson algorithm for determining the zeros of a gradient vector field. Exploiting the geometry of Lie-groups again one can formulate a Newton-Raphson algorithm directly on the constraint set. Unfortunately the region in which the Newton-Raphson method will converge to the desired minimum is only a local neighbourhood of that point. Thus, the Newton-Raphson method by itself does not satisfy the global convergence property. Nevertheless, the method is useful in certain situations and can be implemented in parallel with a modified gradient descent algorithm to guarantee robustness. The potential applications of the theory expounded in this work are far reaching and varied. Originating from recent dynamical systems studies of eigenvalue problems (Brockett 1988, Brockett 1991b, Chu & Driessel 1990, Helmke & Moore 1990) one may design iterative gradient descent algorithms for the symmetric eigenvalue and singular value problems (cf. Moore et al. (1994) and Chapter 2). These new algorithms do not aim to compete with state of the art solutions to these problems. Rather, the symmetric eigenvalue and singular value problems provide an environment in which to understand the new approach used in the context of a well understood problem and compare the algorithms generated to classical methods. Having understood and developed the theory necessary to implement these methods in a simple case one is confident in tackling problems in linear systems theory which have proved amenable to dynamical systems solutions. For example, Helmke (1993) provided a variational characterisation of several different classes of balanced realisations for linear systems. Dynamical systems which compute balanced realisations were proposed by several authors (Perkins, Helmke & Moore 1990, Helmke, Moore & Perkins 1994). The problem of computing balanced realizations can be numerically ill-conditioned, especially when a plant is nearly uncontrollable/unobservable, and calls for special numerical methods (Laub, Heath, Paige & Ward 1987, Safanov & Chiang 1989, Tombs &


Postlethwaite 1987) with good numerical properties. Gradient descent methods have good numerical properties for dealing with ill-conditioned problems and offer an attractive alternative to modifications of existing numerical linear algebra methods. Yan and Moore (1991) and later Helmke and Moore (1994a) developed gradient dynamical-systems solutions to computing balanced realizations and minimising the $L_2$-sensitivity of a state-space realization of a given transfer function. Yan et al. (1994) have developed a number of recursive algorithms based on these dynamical systems for $L_2$-sensitivity optimization as well as Euclidean norm balancing. A generalisation of balancing and sensitivity minimisation for time varying plants, termed "-balancing", is discussed by Imae, Perkins and Moore (1992). An important application of the dynamical systems approach for balancing and sensitivity minimisation is in the design of finite word length implementations of controllers. Recent work on designing digital state-space systems which draws from these ideas is outlined in the articles (Li, Anderson, Gevers & Perkins 1992, Madievski, Anderson & Gevers 1994). A state of the art discussion of many of these issues is contained in the recent monograph by Gevers and Li (1993). The area of balanced realizations and sensitivity minimisation is only one facet of the potential applications of dynamical systems concepts to control theory. Brockett's (1988) original work led him to consider a number of applications related to analogue computing. Brockett (1989b) went on to show that dynamical systems can be used to realize general arithmetical and logical operations. Least squares matching problems (Brockett 1989a, Brockett 1991a) are also a natural application of the original development with practical relevance to computer vision and statistical principal component analysis. The geometry of least squares and principal component analysis was developed by a number of authors in the mid eighties (Bloch 1985a, Bloch 1985b, Byrnes & Willems 1986). An interesting application of these ideas to the dynamical theory of learning in neural-networks was discussed by Brockett (1991a). This work was based on Brockett's own research along with recent developments in using the singular value decomposition to understand learning procedures (Bouland & Karp 1989, Baldi & Hornik 1989). The work ties in closely with Oja's (1982, 1989) results on neural-network learning. Recently, Yan, Helmke and Moore (1994) have provided a rigorous analysis of Oja's learning equations. Numerical methods related to these problems are presented in Mahony, Helmke and Moore (1994) (cf. Chapter 3). More generally, Faybusovich (1992a) has developed dynamical-system solutions for com-


puting Pisarenko frequencies, used in certain signal processing applications. Similar techniques provide new approaches to realization theory (Brockett & Faybusovich 1991). Another potential application in signal processing is the digital quantization of continuous-time signals considered in the articles (Brockett 1989b, Brockett & Wong 1991). Yan, Teo and Moore (n.d.) have also investigated using dynamical systems for computing LQ optimal output feedback gains. This has motivated a number of authors (Sreeram, Teo, Yan & Li 1994, Mahony & Helmke 1993) to use similar methods for difficult simultaneous stabilization problems that have no classical solution. Preliminary results by Ghosh (1988) have tackled the simultaneous stabilization problem using algebraic geometric methods, though more recent results (Blondel 1992, Blondel, Campion & Gevers 1993) indicate that the problem cannot be solved exactly using algebraic operations and consequently, recursive methods offer one of the better numerical approaches to obtain an approximation.
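To make the constraint-stability idea concrete before turning to the historical background, the following is a minimal sketch (assuming Python with NumPy; the cost function and update rule are illustrative choices of mine, not algorithms taken from later chapters) of a gradient ascent step for the Rayleigh quotient $x^T A x$ that evolves exactly on the sphere by moving along a great circle, so the constraint is preserved to machine precision:

```python
import numpy as np

def rayleigh_ascent_step(A, x, step):
    """One gradient ascent step for x^T A x constrained to the unit sphere.

    The Euclidean gradient is projected onto the tangent space at x and the
    update follows the great circle (geodesic) in that direction, so the
    iterate never leaves the constraint set.
    """
    grad = 2 * A @ x
    tangent = grad - (x @ grad) * x          # tangent space projection
    norm = np.linalg.norm(tangent)
    if norm < 1e-14:
        return x                              # critical point: an eigenvector
    return np.cos(step * norm) * x + np.sin(step * norm) * (tangent / norm)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                             # symmetric test matrix
x = rng.standard_normal(5)
x /= np.linalg.norm(x)

for _ in range(200):
    x = rayleigh_ascent_step(A, x, step=0.05)

print(np.linalg.norm(x))                      # remains 1 to machine precision
print(x @ A @ x, np.linalg.eigvalsh(A)[-1])   # approaches the largest eigenvalue
```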

1.1 Historical Perspective

The material presented in this thesis builds primarily on the recent development of dynamical systems solutions to certain linear algebraic problems. There is also dependence on classical optimization theory from the last fifty years or so and more recent concepts of numerical stability for computational integration methods. The pole placement results presented in Chapter 4 relate to a considerable body of knowledge in linear systems theory developed since the seventies. To provide a historical background for the work presented in this thesis the present section is split into three subsections covering the fields of dynamical systems theory, numerical optimization theory and linear systems theory. There is some overlap between these topics, especially since the focus is on those developments which led to the new results presented in the body of this thesis.

1.1.1 Dynamical Systems as Numerical Methods

Much of the work covered in the following subsection is relatively recent and I know of only one book (Helmke & Moore 1994b) that is devoted to the study of this topic with applications to engineering. Nevertheless, there are several good review articles available which cover the


early application of the Toda lattice to eigenvalue problems (Chu 1984a, Watkins 1984) and an overview of continuous realization methods for traditionally numerical linear algebra problems (Chu 1988). Historically, the idea of solving a numerical problem by computing the limiting solution of a continuous-time differential equation is not new. The accelerating development of digital computers in the mid twentieth century tended to obscure the potential of such methods though the study of analogue circuit design was still of interest for practical applications. In the cases where dynamical solutions to certain problems were considered (for example Rutishauser (1954, 1958) proposed dynamical systems for solving the symmetric eigenvalue problem) the classical algorithms known today developed so quickly that the dynamical systems approach was forgotten. More recently, digital techniques have improved to the point where many traditionally analogue tasks are being performed digitally. Interestingly, there is a renewed interest recently in analogue techniques, brought about perhaps by a feeling that the limits of digital technology may be approaching. The particular historical development of dynamical system solutions for numerical linear algebra problems on which my work is based began with the study of a differential equation proposed by Toda (1970). Toda's original idea was to study the evolution of point masses in one dimension related by an exponential attractive force. The differential equation that he proposed became known as the Toda lattice and was extensively studied by a number of authors (Hénon 1974, Flashka 1974, Flashka 1975, Moser 1975, Kostant 1979, Symes 1980a, Symes 1980b). In Flashka (1974) a representation of the Toda lattice as an isospectral differential equation on the set of tridiagonal[1] symmetric matrices was developed. By isospectral it is understood that the eigenvalues of the matrix solution to the Toda lattice remain constant for all time. Moser (1975) extended this to show that a solution of the Toda lattice converges to a diagonal matrix and thus provides a way in which to compute the eigenvalues of a tridiagonal symmetric

[1] Tridiagonal symmetric matrices are matrices of the form
$$\begin{pmatrix}
\alpha_1 & \beta_1 & 0 & \cdots & 0 \\
\beta_1 & \alpha_2 & \beta_2 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \beta_{n-2} & \alpha_{n-1} & \beta_{n-1} \\
0 & \cdots & 0 & \beta_{n-1} & \alpha_n
\end{pmatrix}$$
for real numbers $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_{n-1}$.


matrix. Symes (1982) showed that the Toda lattice was in fact related to the classical QR

algorithm for the symmetric eigenvalue problem. This paper generated considerable interest in dynamical systems solutions of numerical linear algebra problems and was followed by several papers (Deift, Nanda & Tomei 1983, Watkins 1984, Chu 1984a, Chu 1984b, Nanda 1985, Shub & Vasquez 1987, Watkins & Elsner 1988) which generalise the initial connection seen by Symes. Present day interest in the Toda flow is considerable, with recent work into developing a VLSI (very large scale integrated circuit) type implementation of the Toda flow by a nonlinear lossless electrical network (Paul, Hueper & Nossek 1992) as well as its close connection to the double bracket equation discussed below. Prompted in part by the potential of the Toda lattice as a theoretical (and potentially practical) tool in numerical linear algebra several authors undertook to investigate more general numerical methods in the context of dynamical systems. Ammar and Martin (1986) investigated other standard matrix eigenvalue methods and showed a strong connection to both the discrete-time and the continuous-time Riccati equations. Their results were based in part on a Lie-theoretic interpretation of the Riccati flow developed by Hermann and Martin (1982). A complete phase portrait of the Riccati equation was given by Shayman (1986) while Helmke (1991) has related the Riccati flow to Brockett's double-bracket flow (Brockett 1991b). Articles by Riddell (1984) on minimax problems for sums of eigenvalues and by Duistermaat, Kolk and Varadarajan (1983) on flows constrained to evolve on flag manifolds should be mentioned since both articles have proved useful references for many of the works mentioned below. The double bracket equation

$$\dot{H}(t) = [H(t), [H(t), N]], \qquad H(0) = H_0, \tag{1.1.1}$$

and its properties were first studied by Brockett (1988, 1991b) (see also independent work by Chu and Driessel (1990) and Chu (1991b)). Here $H = H^T \in \mathbb{R}^{n \times n}$ and $N = N^T \in \mathbb{R}^{n \times n}$ are

symmetric matrices. When H is tridiagonal and N is diagonal then (1.1.1) reduces to the Toda

lattice. Brockett showed that (1.1.1) defines an isospectral flow whose solution H (t), under suitable conditions on N , converges to a diagonal matrix. Brockett spoke of using (1.1.1) to

solve various combinatorial optimization tasks such as linear programming problems and the sorting of lists of real numbers.
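The qualitative behaviour of (1.1.1) is easy to observe numerically. A minimal sketch (assuming Python with NumPy, and using a crude Euler discretisation chosen purely for illustration, not the interpolation schemes analysed in Chapter 2):

```python
import numpy as np

def bracket(X, Y):
    return X @ Y - Y @ X

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
H = (M + M.T) / 2                        # symmetric initial condition H(0)
N = np.diag([4.0, 3.0, 2.0, 1.0])        # distinct, decreasing diagonal target

eigs0 = np.sort(np.linalg.eigvalsh(H))

dt = 0.001
for _ in range(20000):                   # Euler step of (1.1.1)
    H = H + dt * bracket(H, bracket(H, N))

# The flow is isospectral: eigenvalue drift below is only Euler error.
print(np.max(np.abs(np.sort(np.linalg.eigvalsh(H)) - eigs0)))
# H(t) approaches a diagonal matrix whose entries are ordered like N.
print(np.round(H, 4))
```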


The double bracket equation was seen to be a fundamental generalisation of the Toda lattice with many practical applications. Among the diverse fields to which the double-bracket equation appears to be relevant one finds applications to the travelling salesman problem and quantization of continuous signals (Brockett 1989b, Brockett & Wong 1991). Least squares matching and applications in computer vision are discussed in the paper (Brockett 1989a). Chu and Driessel (1990) considered continuous-time solutions to structured inverse eigenvalue problems along with matrix least squares problems. For applications in subspace learning see the papers (Brockett 1991a, Yan, Helmke & Moore 1994). Stochastic versions of the double bracket equation are studied in the report (Colonius & Klieman 1990). An important connection has been recognised between the double-bracket flow and the modern geometric approach to linear programming pioneered by Khachian (1979) and Karmarkar (1984, 1990). Fundamental work in this area has been carried out by a number of authors (Bayer & Lagarias 1989, Lagarias & Todd 1990, Bloch 1990b, Faybusovich 1991a, Faybusovich 1991b, Helmke 1992). A deep understanding of the double-bracket equation has been developed over the last few years. The connection between the Toda lattice and the double-bracket flow is thoroughly described by a series of papers (Bloch 1990a, Bloch, Brockett & Ratiu 1990, Bloch, Flaschka & Ratiu 1990). Lagarias (1991) shows certain monotonicity properties of sums of eigenvalues of solutions of the Toda lattice. It is not surprising to find that the double-bracket equation can be interpreted as a gradient flow on adjoint orbits of a compact Lie-group (Bloch, Brockett & Ratiu 1990, Bloch, Brockett & Ratiu 1992). Indeed there is now an emerging theory of completely integrable gradient and Hamiltonian flows associated with the double-bracket equation (Faybusovich 1989, Bloch et al. 1992, Bloch 1990b). The paper by Faybusovich (1992b) gives a complete phase portrait of the Toda flow and

QR algorithm including a

discussion of structural stability. The development of the double-bracket equation has been parallelled by a number of papers which investigate the potential of dynamical systems solutions to numerical linear algebra problems. Watkins and Elsner (1989a, 1989b) considered both the generalised eigenvalue problem and the singular value decomposition. The symmetric eigenvalue problem is discussed in the articles (Brockett 1988, Chu & Driessel 1990, Brockett 1991b). The singular value decomposition has also been studied in detail (Smith 1991, Helmke & Moore 1990, Helmke et al. 1994). The Jacobi method for minimising the norm of off-diagonal entries of a matrix is discussed by


Chu (1991a) with application to simultaneous diagonalization of multiple matrices. Chu and Driessel (1991) have also looked at inverse eigenvalue problems which are related to recent work in pole placement for classes of structured linear systems by Mahony and Helmke (1993) (cf. Chapter 4). Numerical methods based on the double-bracket equation have been discussed by Brockett (1993) and Smith (1993). Numerical methods with connections to the dynamical systems solutions for inverse singular value problems are discussed by Chu (1992) while numerical methods for feedback pole placement within a class of symmetric state-space systems are discussed in the conference paper (Mahony, Helmke & Moore 1993) (cf. Section 4.6).

1.1.2 Optimization Techniques and Numerical Solutions to Differential Equations

An early reference for optimization techniques is the monograph (Aoki 1971) or the book by Luenburger (1973). More recent material can be obtained in Dennis and Schnabel (1983) and the recent review of state of the art methods (Kumar 1991). For recent developments in numerical Hamiltonian integration methods see the review article (Sanz-Serna 1991). Relationships of these developments to classical numerical integration techniques are contained in the review (Stuart & Humphries 1994). The problems considered in this thesis are constrained scalar optimization problems on smooth manifolds without boundary. That is, the problem of minimising (or maximising) a function

$f : M \to \mathbb{R}$ from the constraint set $M$ to the real numbers. There are strong connections, however, with classical numerical linear algebra problems such as that of computing the eigenvalues of a symmetric matrix. The tools employed are derived from a geometric understanding of the problems considered combined with methods from classical unconstrained optimization theory and Lie-theory. Of course, the geometry of most problems drawn from the field of numerical linear algebra is well understood. For example, a geometric understanding of the symmetric eigenvalue problem is not new. Parlett and Poole (1973) first rigorously analysed the classical

QR, LU and power iterations in a geometric framework, though Buurema (1970) had done preliminary work and the geometric structure of the problem must have been known to many. A recent survey article is Watkins (1982). A geometric understanding of


the problem of determining a single eigenvector of a symmetric matrix was known long before the general QR algorithm was understood. Indeed, steepest descent optimization techniques

for dominant eigenvector determination were proposed by Hestenes and Karush (1951). An excellent discussion of the early optimization techniques for such problems is contained in Faddeev and Faddeeva (1963). Far from being a closed field there is still much interest in methods similar in scope, though far advanced in technique (Auchmity 1991, Batterson & Smillie 1989, Batterson & Smillie 1990). For more general numerical linear algebraic techniques such as the QR algorithm it is necessary to use the language of Grassmannians and Flag manifolds to further develop the early work of Parlett and Poole (1973). A lot was done to understand these problems in the early eighties in connection with studying the Toda flow (Symes 1982, Deift et al. 1983, Watkins 1984, Chu 1984a). Later Ammar and Martin (1986) analysed a number of matrix eigenvalue methods using flows on Grassmannians and Flag manifolds and showed strong connections to both the discrete-time and the continuous-time Riccati equations. The developing geometric understanding of classical numerical linear algebra techniques led to minimax style results for the eigenvalues of matrices (Riddell 1984). These developments have resulted in a number of elegant new proofs of matrix eigenvalue inequalities, for example the Wielandt-Hoffman inequality (Chu & Driessel 1990), the Courant-Fischer minimax principle (Helmke & Moore 1994b, pg. 14) and the Eckart-Young theorem (Helmke & Shayman 1992). Applications of the double-bracket equation and dynamical systems theory to numerical linear algebraic problems (Brockett 1988, Watkins & Elsner 1989a, Watkins & Elsner 1989b, Smith 1991, Helmke & Moore 1990, Brockett 1991b, Helmke et al. 1994) have led to the design of numerical algorithms based explicitly on the dynamical systems developed. Recent advances in such techniques are discussed in the articles (Chu 1992, Brockett 1993, Moore, Mahony & Helmke 1994). These methods are essentially based on classical unconstrained optimization methodologies reformulated on the constraint set. Unconstrained scalar optimization techniques fall into roughly three categories (Aoki 1971):

i) Methods that use only the cost-function values.

ii) Methods that use first order derivatives of the cost function.

iii) Methods that use second (and higher) order derivatives of the cost function.


Methods of the first type tend not to be useful for other than linear search and non-smooth optimization problems due to computational cost. An excellent survey of early techniques such as pattern searches, relaxation methods, Rosenbrock and Powell’s methods as well as random search methods and some other variations of these ideas is contained in Aoki (1971, section 4.7). Other good references for these methods are the books (Luenburger 1973, Minoux 1986). Recent developments are discussed in the collection of articles (Kumar 1991). The fundamental method of type ii) is the gradient descent method. For a potential

$f : \mathbb{R}^n \to \mathbb{R}$ with the gradient denoted $Df = (\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n})^T$, the method of gradient descent is
$$x_{k+1} = x_k - s_k Df(x_k),$$
where $s_k > 0$ is some pre-specified sequence of real positive numbers known as step-sizes. Here the integer $k$ indexes the iterations of the numerical algorithm, acting like a discrete-time variable for the solution sequence $\{x_k\}_{k=0}^{\infty}$. A suitable choice of step-size $s_k$ is any sequence such that $s_k \to 0$ as $k \to \infty$ and $\sum_{k=1}^{\infty} s_k = \infty$. Polyak (1966) showed that provided $f$ satisfies certain convexity assumptions then the solution sequence of the gradient descent algorithm converges to the minimum of $f$. The optimal gradient descent method is known as the method of steepest descent (Cauchy 1847, Curry 1944) where the step-size is chosen at each step by

$$s_k = \arg\min_{s \geq 0} f(x_k - s\, Df(x_k)).$$
Here "arg min" means to find the value of $s$ that minimises $f(x_k - s\, Df(x_k))$. The method of steepest descent has the advantage of being associated with strong global convergence theory (Minoux 1986, Theorem 4.4, pg. 86). The step-size selection procedure is usually completed using a linear search algorithm or using some estimation technique based on approximations of $f(x_k - s_k Df(x_k))$. Using a linear search technique generally provides a faster but less reliable algorithm while a good approximation technique will inherit the strong global convergence theory of the optimal method. The disadvantage of the overall approach is the linear rate of convergence of the solution sequence $\{x_k\}$ to the desired limit (even for optimal step-size selection). Nevertheless, when the reliability and not the rate of convergence of an optimization problem is at issue the steepest descent method or an approximate suboptimal gradient descent method remains a preferred numerical algorithm.
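A minimal sketch of the two step-size rules just described (assuming Python with NumPy; the quadratic cost is an illustrative stand-in, for which the steepest descent line search happens to have the closed form used below):

```python
import numpy as np

# Strictly convex test potential f(x) = 0.5 x^T Q x with gradient Df(x) = Q x.
Q = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ Q @ x
Df = lambda x: Q @ x

# Gradient descent with a pre-specified diminishing step-size:
# s_k -> 0 and the step-sizes sum to infinity (here s_k = 1/(k+1)).
x = np.array([4.0, 1.0])
for k in range(500):
    x = x - (1.0 / (k + 1)) * Df(x)
print(f(x))   # tends to the minimum

# Steepest descent: s_k minimises f(x_k - s Df(x_k)) at each step.
# For this quadratic the one-dimensional minimiser is s = (g.g)/(g.Qg).
x = np.array([4.0, 1.0])
for k in range(50):
    g = Df(x)
    s = (g @ g) / (g @ Q @ g)
    x = x - s * g
print(f(x))   # linear convergence, but far fewer iterations needed
```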

f (xk ; sk Df (xk )). Using a linear search technique generally provides a faster but less reliable algorithm while a good approximation technique will inherit the strong global convergence theory of the optimal method. The disadvantage of the overall approach is the linear rate of convergence of the solution sequence fxk g to the desired limit (even for optimal step-size selection). Nevertheless, when the reliability and not the rate of convergence of an optimization problem is at issue the steepest descent method or an approximate suboptimal gradient descent method remains a preferred numerical algorithm.

x1-1

Historical Perspective

13

There are a number of algorithm which improve on the convergence properties of the steepest descent method. Of these only the Newton-Raphson method is important for the sequel, however, it is worth mentioning that multi-step methods, combining a series of estimates

xk+1  : : : xk+p and derivatives Df (xk+1 ) : : : Df (xk+p ) can be devised which converge with superlinear, quadratic and higher orders of convergence, but which have much weaker convergence results associated with them than the steepest descent methods. The most prominent of these methods are the accelerated steepest descent methods (Forsythe 1968) and the method of conjugate gradients (Fletcher & Reeves 1964). The Newton-Raphson method falls into the third category and relies on the idea of approximating the scalar function f (x) by its truncated Taylor series

f (x)  f (xk ) + (x ; xk )T Df (xk ) + (x ; xk )T D2 f (xk )(x ; xk ) where

D2f (xk ) is the square matrix with ij ’th entry

@ 2f (xk ) . If @xi@xj

f (x) is quadratic then this

approximation is exact and the optimal minimum can be found in a single step

$$x = x_k - (D^2 f(x_k))^{-1} Df(x_k).$$
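A minimal sketch of this update (assuming Python with NumPy; the strictly convex potential is an illustrative choice of mine, and the linear system is solved rather than forming the inverse explicitly):

```python
import numpy as np

# A strictly convex, non-quadratic potential:
# f(x) = sum_i cosh(x_i) + 0.5 ||x - b||^2, so Df(x) = sinh(x) + (x - b)
# and D2f(x) = diag(cosh(x) + 1), positive definite everywhere.
b = np.array([1.0, -2.0, 0.5])
Df = lambda x: np.sinh(x) + (x - b)
D2f = lambda x: np.diag(np.cosh(x) + 1.0)

x = np.zeros(3)
for k in range(8):
    # Newton-Raphson step: solve D2f(x) s = Df(x) for the step s.
    x = x - np.linalg.solve(D2f(x), Df(x))
    print(k, np.linalg.norm(Df(x)))   # gradient norm decays quadratically
```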

D2f (xk ));1 and a number of methods have been devised to reduce the computational cost of

(

this calculation. The most common of these are the Davidon-Fletcher-Powell approach (Davidon 1959, Fletcher & Powell 1963) and a rank-2 correction formula independently derived by 2

The big O order notation,

B > 0 and  > 0 such that

jj

x xk+1 ;

jj

is of order O(jj(x ; xk )jj3 ) means that there exists real numbers

x xk+1 ) (x xk ) 3

jj(

;

jj

jj

;

jj



B

for all jj(x ; xk )jj  . If jjx ; xk+1 jj is of order O(jj(x ; xk )jj3 ) then it follows using the little o order notation that jjx ; xk+1 jj is of order o(jj(x ; xk )jj2 ),

x xk+1 ) k!1 (x xk ) 2 lim

jj( jj

;

;

jj

jj

=

0:

Thus the error bound at each step decreases like a quadratic function around the limit point. Methods with this convergence behaviour are known as quadratically convergent.

14

Introduction

Chapter 1

Broyden (1970), Fletcher (1970), Goldfarb (1970) and Shanno (1970). An excellent review of these methods is provided by Minoux (1986, Chapter 4). The approach to optimization described above is closely related to the task of numerically approximating the solution of an ordinary differential equation. Indeed, the gradient descent method is just the Euler method (Butcher 1987, Section 20) applied to determine the solution of the gradient differential equation

x_ = ;Df (x) x(0) = x0  where f : Rn

:1:2)

(1

! R (Euler’s original work is republished in the monograph (Euler 1913)). The

Euler method is rarely used in modern numerical analysis since it is only a first order method. That is, the error between xk+1 and x(h; xk ) (the solution of (1.1.2) with x(0) = xk evaluated

at time h) is o(h),

lim

h!0

jjxk+1 ; x(h; xk )jj = 0: h

(Naturally, this translates to a linear convergence rate for the gradient descent method.) More advanced numerical integration methods exist, the most common of which in engineering applications are the Runge-Kutta methods (Butcher 1987, Section 22) or linear multi-step methods (Butcher 1987, Section 23). The idea of stability for a numerical approximation of the solution to an initial value problem is usually described in terms of the ability of the numerical method to accurately reproduce the behaviour of the continuous-time solution. Thus, if one is considering the scalar linear differential equation

x_ = qx x(0) = x0 2 C

:1:3)

(1

where q

2C

as

0. A numerical approximation to this problem is loosely said to be stable if the

t!

is a fixed complex number with real part Re(q ) < 0, then the solution x(t) ! 0

approximation also converges to zero. A Runge-Kutta method, with step size h, is said to be Astable if the numerical solution of the scalar linear differential equation given above converges to zero for any z

hq lying in the complex left half plane. Thus, for any real positive step-size selection h > 0 and any linear system with Re(q ) < 0 an A-stable Runge-Kutta method solution of (1.1.3) will converge to zero. The concept of AN -stability captures the same =

qualitative behaviour for non-autonomous linear systems (Burrage 1978). A strengthening of

x1-1

Historical Perspective

15

A-stability for contractive numerical problems (cf. the review article (Stuart & Humphries 1994)) termed B -stability was proposed by Butcher (1975) which can also be generalised to non-autonomous systems (BN -stability) (Burrage & Butcher 1979). In this the concept of

paper Burrage and Butcher also introduced the important concept of “algebraic stability” which they showed implied both B - and BN -stability. Algebraic stability is a condition on the parameters that define a Runge-Kutta method which has relevance to many different stability problems (Stuart & Humphries 1994) and even to question of existence and uniqueness of solutions to implicit Runge-Kutta methods (Cooper 1986). For systems with

Re(q) 0 is a small real number) and “stiff stability” (Gear 1968) (the method must be stable for all z 2 fz 2 C j j arg(;z )j  g for some small real number

> 0).

like this behaviour to be replicated in the numerical solution. A definition of

The unifying idea behind each of these stability definitions is the ability of the numerical method to replicate the properties of the continuous-time solution that is being approximated. The classical definitions of stability discussed above consider only simple convergence behaviour of systems (A- and convergence rates,

AN -stability for linear decay problems, L-stability for fast

B - and BN -stability for contractive problems).

Another important class

of differential equations are those which preserve certain quantities, for example energy or a Hamiltonian. Numerical methods for these two classes of problems (conservative and Hamiltonian systems) have been the subject of considerable research recently. Methods for conservative systems are discussed in the articles (Greenspan 1974, Greenspan 1984). Methods for Hamiltonian systems are of more relevance to the present work. These methods can be divided roughly into two types (Sanz-Serna 1991), firstly methods that are classical numerical differential equation solvers which happen also to preserve a Hamiltonian, and secondly methods which are constructed explicitly from generating functions for solving Hamiltonian systems. The earlier methods were based on generating functions (Ruth 1983, Channell 1983, Menyuk 1984, Feng 1985, Zhong & Marsden 1988). When it was observed that expressing these methods could

16

Introduction

Chapter 1

often be interpreted as numerical Runge-Kutta methods with particular properties people became interested in exactly which Runge-Kutta methods would have the property of preserving a Hamiltonian. This question was answered independently by a number of authors (Sanz-Serna 1988, Suris 1989, Lasagni 1988). Application of these ideas to engineering problems associated with equations of motion of a rigid body has been undertaken by Crouch, Grossman and Yan (1992). Crouch, Grossman and Yan are also working on related integration techniques for engineering problems (Crouch & Grossman 1994, Crouch, Grossman & Yan 1994). A recent review article for Hamiltonian integration methods is Sanz-Serna (1991). Interestingly, the characterisation of Runge-Kutta methods that preserve Hamiltonians is related to the algebraic construction first described when defining algebraic stability (Burrage & Butcher 1979). Indeed, Stuart and Humphries (1994), describe a number of connections between early stability theory and modern numerical methods for Hamiltonian and conservative systems. In Stuart and Humphries (1994) the concept of numerical stability, the question of whether, and in what sense the dynamical properties of a continuous-time flow are inherited by a discrete numerical approximation, is defined. This concept is sometimes termed “practical stability” and is closely related to the definition of constraint stability given on page 2. I have opted not to use the term numerical stability to describe the algorithms proposed in the sequel since the optimization tasks considered require two types of numerical stability, preservation of a constraint and convergence to a limit. In certain cases the Toda lattice, double-bracket flow and related dynamical systems can be interpreted as a completely integrable Hamiltonian flow (Bloch 1985a, Bloch et al. 1992, Bloch 1990b). In these cases one could think to apply the modern Hamiltonian integration techniques discussed by Sanz-Serna (1991). To do this however, one would have to consider the various differential equations as Hamiltonian flows on Rn and the insight gained by considering the solution in matrix space would be lost. Several authors have looked directly at discretizing flows on Lie-groups and homogeneous spaces. Moser and Veselov (1991) considered discrete versions of classical mechanical systems. Chu (1992) considered discrete methods for inverse singular problems based on dynamical systems insights while Brockett (1993), Smith (1993) and Moore et al. (1994) have studied more deliberate discretizations of gradient flows on Lie-groups and homogeneous spaces.

x1-1 1.1.3

Historical Perspective

17

Linear Systems Theory and Pole Placement Results

Textbooks on feedback control and linear systems theory are those of Kailath (1980), Wonham (1985) and Sontag (1990). An excellent reference for classical linear quadratic methods is the book (Anderson & Moore 1971) or the more recent book (Anderson & Moore 1990). A recent review article on developments in pole placement theory is Byrnes (1989). The field of systems engineering during the mid seventies was the scene of a developing understanding of the mathematical and geometric foundation of linear systems theory. Seminal work by Kalman (1963) among others set a foundation of mathematical systems theory which lead people naturally to use algebraic geometric tools to solve some of the fundamental questions that arose. This lead to a strong geometric framework for linear systems theory being developed in the late seventies and early eighties (Bitmead & Anderson 1977, Martin & Herman 1979, Hazewinkel 1979, Byrnes, Hazewinkel, Martin & Rouchaleau 1980, Helmke 1984, Falb 1990). See also the conference proceedings (Martin & Hermann 1977b, Byrnes & Martin 1980). The development of the Toda lattice was of considerable interest to researchers working in linear systems theory in the late seventies and lead to several new developments in scaling actions on spaces of rational functions in system theory (Byrnes 1978, Krishnaprasad 1979, Brockett & Krishnaprasad 1980). More recently Nakamura (1989) has showed a connection between the Toda lattice and the study of moduli spaces of controllable linear systems. Also Brockett and Faybusovich (1991) have made connections with realization theory. One of the principal questions in linear systems theory that remained unanswered until recently was the question of how the natural frequencies or poles of a multi-input multi-output system are effected by changing feedback gain. In the case where the full state of a multi-input multi-output state space system is available as output, Wonham (1967) showed that arbitrary pole placement is equivalent to complete controllability of the system. The case for output feedback (when only part of the state is available directly from the output) was found to be far more difficult. Indeed, even after the theory of optimal linear quadratic methods was far advanced (Anderson & Moore 1971) an understanding of the output feedback pole placement problem remained elusive. A few preliminary results on pole shifting were obtained in the early seventies (for example Davison and Wang (1973)) which lead to the first important result, obtained independently by Davison and Wang (1975) and Kimura (1975). Given a linear

18

Introduction

system with n states,

Chapter 1

m inputs and p outputs, the result stated that for almost all controllable

and observable linear state-space systems for which

m + p ; 1 n the poles of that system could be almost arbitrarily changed using output feedback. In 1977 Herman and Martin published a pair of articles (Hermann & Martin 1977, Martin & Hermann 1977a) which used the dominant morphism theorem to show that

mp n is a

necessary and sufficient condition for output feedback pole placement if one allows complex gain matrices. Observe that if m,

p 1 then mp m + p ; 1 and thus the results obtained

by Hermann and Martin are stronger than those obtained earlier apart from the disadvantage of requiring complex feedback. Unfortunately, their results don’t generalise to real feedback gains though it was hoped that the condition mp n would also be a necessary and sufficient for real output feedback pole placement. However, Willems and Hesselink (1978) soon gave a counter example (m

=

2, p

=

2, n

=

4) showing that the strict inequality mp

=

n could not

be achieved for arbitrary pole placement using real feedback. The case

mp

=

n was studied by Brockett and Byrnes (1979, 1981) using tools from

algebraic geometry and constructions on Grassmannian manifolds. By using these ideas Brockett and Byrnes generalised Nyquist and Root locus plots to multi-input and multi-output systems, however though useful, their results only applied in the case

mp = n and fell short

of completely characterising the pole placement map in this case also. In Byrnes (1983) ˘ the Ljusternik-Snivel’mann category of real Grassmannians is used to improve on Kimura’s original result. There have been no other significant advances in dealing with this problem during the mid eighties. A recent review article (Byrnes 1989) outlines the early results as well as describing the state of the art towards the end of the eighties. Recently Wang and Rosenthal have made new contributions to the problem of output feedback pole placement (Wang 1989, Rosenthal 1989, Rosenthal 1992). Most recently Wang (1992) has given a necessary and sufficient condition for pole placement using the central projection model. Given a linear system with n states, m inputs and p outputs Wang has shown that arbitrary output feedback pole placement is possible for any strictly proper controllable and observable plant with

mp > n.

If the plant is only proper then almost arbitrary pole

x1-2

Summary of Results

19

placement is possible. The case mp = n is still not fully understood. Little has been done to study classes of linear systems and the pole placement map. In Martin and Herman (1977a) pole placement for linear Hamiltonian systems was considered. More recently Mahony et al. (1993) (cf. Chapter 4) studied pole placement for symmetric state-space systems. Simultaneous pole placement for multiple systems is also a problem that has had little study. Ghosh (1988) has written a paper on this topic using algebro-geometric techniques and recently Blondel (1992) and Blondel, Campion and Gevers (1993) have also contributed. Such problems can also be tackled using the ideas outlined by Mahony and Helmke (1993) (cf. Chapter 4). The development of efficient numerical methods for pole placement by output feedback is a challenge. Methods from matrix calculus have been applied by Godbout and Jordan (1989) and more recently, gradient descent methods have been proposed (Mahony et al. 1993) (cf. Section 4.6).

1.2 Summary of Results The thesis is divided into seven chapters. Chapter 1 provides an overview of the subject matter considered. Chapters 2 to 4 consider three example optimization problems in detail. The first problem discussed is a smooth optimization problem which can be used to solve the symmetric eigenvalue problem. A considerable amount is known about the continuous-time gradient dynamical systems associated with this optimization problem and the development builds on this knowledge to generate a recursive numerical algorithm. The next problem considered is an optimization problem related to principal component analysis. A discussion of the continuoustime gradient flow is given before a numerical algorithm is developed. The connection of the numerical method proposed and classical numerical linear algebraic algorithms for the same task is investigated. The third example, drawn from the field of linear systems theory, is the task of pole placement for the class of symmetric linear systems. A discussion of the geometry of the task is undertaken yielding results with the flavour of traditional pole placement results. Continuous-time gradient flows are derived and used to investigate the structure of the optimization problem. A numerical method is also proposed based on the continuous-time gradient flow. The latter chapters approach the subject from a theoretical perspective. In Chapter 5

20

Introduction

Chapter 1

a theoretical foundation is laid in which the algorithms proposed in Chapters 2 to 4 may be understood. Chapter 6 goes on to consider the particular numerical algorithms proposed in detail and provides a template for designing numerical optimization algorithms for any constrained optimization problem on a homogeneous space. Later in Chapter 6 a more sophisticated numerical algorithm based on the Newton-Raphson algorithm is developed in a general context. The algorithm is applied to a specific problem (the symmetric eigenvalue problem) to provide an example of how to use the theory in practise. Concluding remarks are contained in Chapter 7. The principal results contained in Chapters 2 to 6 are summarised below. Chapter 2: In this chapter a numerical algorithm, termed the the double-bracket algorithm

Hk+1 = e;k Hk N ]Hk ek Hk N ] is proposed for computing the eigenvalues of an arbitrary symmetric matrix. For suitably small

k , termed time-steps, the algorithm is an approximation of the solution to the continuoustime double-bracket equation. Since the matrix exponential of a skew symmetric matrix is orthogonal it follows that this iteration has the important property of preserving the spectrum of the iterates. That is the eigenvalues of Hk remain constant for all k. By choosing a suitable

diagonal target matrix N the sequence Hk will converge to a diagonal matrix from which the

eigenvalues of H0 can be directly determined. To ensure that the algorithm converges a suitable step-size k must be chosen at each step. Two possible choices of schemes are presented along with analysis showing that the algorithm converges to the desired matrix for almost all initial conditions. A related algorithm for determining the singular values of an arbitrary (not necessarily square) matrix is proposed and is shown to be equivalent to the double-bracket equation applied to an augmented symmetric system. An analysis of convergence behaviour showing linear convergence to the desired limit points is presented. Associated with the main algorithms presented for the computation of the eigenvalues or singular values of matrices are algorithms evolving on Lie-groups of orthogonal matrices which compute the full eigenspace decompositions of given matrices. The material presented in this chapter was first published in the conference article (Moore, Mahony & Helmke 1992). A journal paper based on an expanded version of the conference

x1-2

Summary of Results

21

paper is to appear this year (Moore et al. 1994). Chapter 3: In this chapter an investigation is undertaken of the properties of Oja’s learning equation

X_ = XX T NX ; NX N = N T 2 Rn n  evolving on the set of matrices

fX 2 Rn

n m matrices where n m are integers.

m

j XT X

=

Img, the Stiefel manifold of real

This differential equation was proposed by Oja

(1982, 1989) as a model for learning in certain neural networks. Explicit proofs of convergence for the flow are presented which extend the results in Yan et al. (1994) so that no genericity assumption is required on the eigenvalues of

N.

The homogeneous nature of the Stiefel

manifold allows one to develop an explicit numerical method (a discrete-time system evolving on the Stiefel manifold) for principal component analysis. The method is based on a modified gradient ascent algorithm for maximising the scalar potential

RN (X ) = tr(X T NX ) known as the generalised Rayleigh quotient. Proofs of convergence for the numerical algorithm proposed are given as well as some modifications and observations aimed at reducing the computational cost of implementing the algorithm on a digital computer. The discrete method proposed is similar to the classical power method and steepest ascent methods for determining the dominant p-eigenspace of a matrix N . Indeed, in the case where

p = 1 (for a particular

choice of time-step) the discretization is shown to be equivalent to the power method. When

p>

1, however, there are subtle differences between the power method and the proposed

method. The chapter is based on the journal paper (Mahony, Helmke & Moore 1994). Applications of the same ideas have also been considered in the field of linear programming (Mahony & Moore 1992). Chapter 4: In this chapter, the task of pole placement is considered for a structured class of systems (those with symmetric state space realisations) for which, to my knowledge, no previous pole placement results are available. The assumption of symmetry of the realisation, besides having a natural network theoretic interpretation, simplifies the geometric analysis considerably. It is shown that a symmetric state space realisation can be assigned arbitrary

22

Introduction

Chapter 1

(real) poles via symmetric output feedback if and only if there are at least as many system inputs as states. This result is surprising since a naive counting argument (comparing the number of

m(m + 1) of symmetric output feedback gain to the number of poles n of a symmetric realization having m inputs and n states) would suggest that 12 m(m + 1) n is free variables

1 2

sufficient for pole placement. To investigate the problem further gradient flows of least squares cost criteria (functions of the matrix entries of realisations) are derived on smooth manifolds of output feedback equivalent symmetric realisations. Limiting solutions to these flows occur at minima of the cost criteria and relate directly to finding optimal feedback gains for system assignment and pole placement problems. Cost criteria are proposed for solving the tasks of system assignment, pole placement, and simultaneous multiple system assignment. The theoretical material contained in Sections 4.1 to 4.4 along with the simulations in Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the numerical method proposed in Section 4.6 was presented at the 1993 Conference on Decision and Control (Mahony et al. 1993). Much of the material presented in this chapter was developed in conjunction with the results contained the monograph (Helmke & Moore 1994b, Section 5.3), which focusses on general linear systems. Chapter 5: In this chapter a brief review of the relevant theory associated with developing numerical methods on homogeneous spaces is presented. The focus of the development is on classes of homogeneous spaces encountered in engineering applications and the simplest theoretical constructions which provide a mathematical foundation for the numerical methods proposed. A discussion is given of the relationship between gradient flows on Lie-groups and homogeneous spaces (related by a group action) which motivates the choice of a particular Riemannian structure for a homogeneous space. Convergence behaviour of gradient flows is also considered. The curves used in constructing numerical methods in Chapters 2 to 4 are all based on matrix exponentials and the theory of the exponential map as a Lie-group homomorphism is reviewed to provide a theoretical foundation for this choice. Moreover, a characterisation of the geodesics associated with the Levi-Civita connection (derived from a given Riemannian metric) is discussed and conditions are given on when the matrix exponential maps to a geodesic curve on a Lie-group. Finally, an explicit discussion of the relationship between geodesics on Lie-groups and homogeneous spaces is given. Much of the material presented is standard or at least easily accessible to people working

x1-2

Summary of Results

23

in the fields of Riemannian geometry and Lie-groups. However, this material is not standard knowledge for researchers in the field of systems engineering. Moreover, the development strongly emphasizes the aspects of the general theory that is relevant to problems in linear systems theory. Chapter 6: In this chapter the gradient descent methods developed in Chapters 2 to 4 are reviewed in the context of the theoretical developments of Chapter 5. The conclusion is that the proposed algorithms are modified gradient descent algorithms where geodesics are used to replace the straight line interpolation of the classical gradient descent method. This provides a template for a simple numerical approach suitable for solving any scalar optimization problem on a homogeneous space. Later in Chapter 6 a coordinate free Newton-Raphson method is proposed which evolves explicitly on a Lie-group. This method is proposed in a general form with convergence analysis and then used to generate a quadratically convergent numerical method for the symmetric eigenvalue problem. A comparison is made to the

QR algorithm

applied to an example taken from Golub and Van Loan (1989, pg. 424) which shows that the Newton-Raphson method proposed converges in the same number of iterations as the classical

QR method.

Chapter 2

Numerical Gradient Algorithms for Eigenvalue Calculations A traditional algebraic approach to determining the eigenvalue and eigenvector structure of an arbitrary matrix is the QR algorithm. In the early eighties it was observed that the QR algorithm is closely related to a continuous-time differential equation which had become known through study of the Toda lattice. Symes (1982), and Deift et al. (1983) showed that for tridiagonal real symmetric matrices, the

QR algorithm is a discrete-time sampling of the solution to a

continuous-time differential equation. This result was generalised to full complex matrices by Chu (1984a), and Watkins and Elsner (1989b) provided further insight in the late eighties. Brockett (1988) studied dynamical matrix flows generated by the double Lie-bracket1 equation,

H_ = H H N ]] H (0) = H0  for constant symmetric matrices

N and H0 .

This differential equation is termed the double-

bracket equation, and solutions of this equation are termed double-bracket flows. Similar matrix differential equations appear earlier than those references given above in Physics literature. An 1

The Lie-bracket of two square matrices X , Y

2

Rnn is

XY ] = XY Y X: If X = X T and Y = Y T are symmetric matrices then XY ]T = XY ] is a skew symmetric matrix. 

;

;

24

x2-0

Introduction

25

example, is the Landau-Lifschitz-Gilbert equation of micromagnetics

dm^ =  (m^ H ; m^ dt 1 + 2 as

! 1 and = ! k, a constant.

m H )) jm^ j2 = 1

(^

In this equation

m ^ H 2

R3 and the cross-product

is equivalent to a Lie-bracket operation. The relationship between this type of differential equation and certain problems in linear algebra, however, has only recently been investigated. An important property of the double-bracket equation is that its solutions have constant spectrum (i.e. the eigenvalues of a solution remain the same for all time) (Chu & Driessel 1990, Helmke & Moore 1994b). By suitable choice of the matrix parameter N Brockett (1988) showed that the double-bracket flow can be used to diagonalise real symmetric matrices (and hence compute their eigenvalues), sort lists, and even to solve linear programming problems. In independent work by Driessel (1986), Chu and Driessel (1990), Smith (1991) and Helmke and Moore (1990), a similar gradient flow approach was developed for the task of computing the singular values of a general non-symmetric, non-square matrix. The differential equation obtained in these approaches is almost identical to the double-bracket equation. In Helmke and Moore (1990), it is shown thatthese flows can also be derived as special cases of the double-bracket equation for a non-symmetric matrix, suitably augmented to be symmetric. When the double-bracket equation is viewed as a dynamical solution to linear algebra problems (Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b) one is lead naturally to consider numerical methods based on the insight provided by the double-bracket flow. In particular, the double-bracket flow evolves on a smooth submanifold of matrix space, the set of all symmetric matrices with a given spectrum (Helmke & Moore 1994b, pg. 50). A numerical method with such a property is termed constraint stable (cf. page 2) Such methods are particularly of interest when accuracy or robustness of a given computation is an important consideration. Robustness is of particular interest for engineering applications where input data will usually come with added noise and uncertainty. As a consequence when one considers numerical approximation of solutions to the double-bracket equation it is important to study those methods which preserve the important structure of the double-bracket flow. For the particular problem of determining the eigenvalues of a symmetric matrix, there are many well tested and fast numerical methods available. It is not so much to challenge

26

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

established algorithms in speed or efficiency that one would study numerical methods based on the double-bracket equation. Rather, with the developing theoretical understanding of a number of related differential matrix equation (many of which with important applications in linear systems theory, for example the area of balanced realizations (Imae, Perkins & Moore 1992, Perkins et al. 1990) one may look upon a detailed study of numerical methods based on the double-bracket flow as providing a stepping stone to a new set of robust and adaptive computational methods in linear systems theory. The material presented in this chapter was first published in the conference article (Moore et al. 1992). A journal paper based on an expanded version of the conference paper is to appear this year (Moore et al. 1994). In this Chapter, I propose a numerical algorithm, termed the the double-bracket algorithm, for computing the eigenvalues of an arbitrary symmetric matrix,

Hk+1 = e;k Hk N ]Hk ek Hk N ]: For suitably small k, termed time-steps, the algorithm is an approximation of the solution to the continuous-time double-bracket equation. Since the matrix exponential of a skew symmetric matrix is orthogonal it seen that this iteration has the important property of preserving the spectrum of the iterates. It is shown that for suitable choices of time-steps the double-bracket algorithm inherits the same equilibria and limit points as the double-bracket flow and displays linear convergence to its limit. A related algorithm for determining the singular values of an arbitrary (not necessarily square) matrix is proposed and is shown to be equivalent to the doublebracket equation applied to an augmented symmetric system. An analysis of convergence behaviour showing linear convergence to the desired limit points is presented. Associated with the main algorithms presented for the computation of the eigenvalues or singular values of matrices are algorithms which compute the full eigenspace decompositions of given matrices. These algorithms also display linear convergence to the desired limit points. The chapter is divided into seven sections. In Section 2.1 the double-bracket algorithm is introduced and the basic convergence results are presented. Section 2.2 deals with choosing step-size selection schemes, and proposes two valid methods for generating the time-steps

k . Section 2.3 discusses the question of stability and proves that the double-bracket equation

x2-1

The Double-Bracket Algorithm

27

has a unique attractive fixed point under assumptions that both the step-size selection schemes proposed in Section 2.2 satisfy. The remainder of the chapter deals with computing the singular values of an arbitrary matrix, Section 2.4 and computing the full spectral decomposition of symmetric (or arbitrary) matrices Section 2.5. A number of computational issues are briefly mentioned in Section 2.6 and Section 2.7 considers some remaining open issues.

2.1 The Double-Bracket Algorithm In this section a brief review of the continuous-time double-bracket equation is given with emphasis on its interpretation as a gradient flow. The double-bracket algorithm is introduced and conditions are given which guarantee convergence of the algorithm to the desired limit point. Let N and H be real symmetric matrices, and consider the potential function

(H )

:= =

jjH ; N jj2 jjH jj2 + jjN jj2 ; 2tr(NH )

(2.1.1)

where the norm used is the Frobenius norm

jjX jj2 := tr(X T X ) =

X

x2ij 

with xij the elements of X . Note that (H ) measures the least squares difference between the elements of H and the elements of N . Let M (H0 ) be the set of orthogonally similar matrices,

generated by some symmetric initial condition H 0

=

H0T 2 Rn

n . Then

M (H0) = fU T H0 U j U 2 O(n)g where O(n) denotes the group of all n

:1:2)

(2

n real orthogonal matrices. It is shown in Helmke and

Moore (1994b, pg. 48) that M (H0) is a smooth compact Riemannian manifold with explicit

forms given for its tangent space and Riemannian metric. Furthermore, in the articles (Bloch, Brockett & Ratiu 1990, Chu & Driessel 1990) the gradient of

(H ), with the respect to the

28

Numerical Gradient Algorithms for Eigenvalue Calculations

normal Riemannian metric2 on

M (H0) (Helmke & Moore 1994b, pg.

Chapter 2 50), is shown to be

grad (H ) = ;H H N ]]. Consider the gradient flow given by the solution of

H_

=

;grad (H )

=



H H N ]]

(2.1.3) with H (0) = H0

which is termed the double-bracket flow (Brockett 1988, Chu & Driessel 1990). Thus, the double-bracket flow is a gradient flow which acts to decrease, or minimise, the least squares potential , on the manifold M (H0 ). Note that from (2.1.1), this is equivalent to increasing, or maximising, tr(NH ). The matrix H0 is termed the initial condition, and the matrix N is referred to as the target matrix. The double-bracket algorithm proposed in this chapter is,

Hk+1 = e;k Hk N ] Hk ek Hk N ] 

:1:4)

(2

for arbitrary symmetric n

n matrices H0 and N , and some suitably small scalars k , termed time-steps. Consider the curve Hk+1 (t) = e;tHk N ] Hk etHk N ] where Hk+1 (0) = Hk and Hk+1 = Hk+1 ( k ), the (k + 1)’th iteration of (2.1.4). Observe that d (e;tHk N ]H etHk N ]) = H  H  N ]] k k k t=0 dt

and thus, e;tHk N ] Hk etHk N ] is a first approximation of the double-bracket flow at

Hk 2

M (H0 ). It follows that for small k , the solution to (2.1.3) evaluated at time t = k with H (0) = Hk , is approximately Hk+1 = Hk+1 ( k ). It is easily seen from above that stationary points of (2.1.3) will be fixed points of (2.1.4). In general, (2.1.4) may have more fixed points than just the stationary points of (2.1.3), however, Proposition 2.1.5 shows that this is not the case for suitable choice of time-step k . The term equilibrium point is used to refer to fixed points of the algorithm which are also stationary points of (2.1.3). To implement (2.1.4) it is necessary to specify the time-steps k . This is accomplished by 2

A brief discussion of the derivation of gradient flows on Riemannian manifolds is given in Sections 5.3 and 5.4.

x2-1

The Double-Bracket Algorithm

considering functions N :

M (H0) ! R+ and setting k := N (Hk ).

29

The function N is

termed the step-size selection scheme.

M (H0 ) ! R+ be a step-size selection scheme for the doublebracket algorithm on M (H0). Then N is well defined and continuous on all of M (H 0 ), except possibly those points H 2 M (H0 ) where HN = NH . Furthermore, there exist real numbers B  > 0, such that B > N (H )  for all H 2 M (H0) where N is well defined. Condition 2.1.1 Let N :

Remark 2.1.2 The variable step-size selection scheme proposed in this chapter is discontinuous at all the points H

2 M (H0), such that H N ] = 0.

2

Remark 2.1.3 Observe that the definition of a step-size selection scheme depends implicitly on the matrix parameter N . Indeed, N may be thought of as a function in two matrix variables

2

N and H .

Condition 2.1.4 Let N be a diagonal n

: : : > n .

n matrix with distinct diagonal entries 1 > 2 >

Let 1

> 2 > : : : > r be the eigenvalues of H0 with associated algebraic multiplicities n1  : : : nr satisfying Pri=1 ni = n. Since H0 is symmetric, the eigenvalues of H0 are all real and the diagonalisation of H0 is

2 66 1I. n 0. := 6 64 .. . . . .. 0 r Inr 1

where Ini is the ni

ni identity matrix.

3 77 77  5

:1:5)

(2

For generic initial condition H0 and a target matrix

N that satisfies Condition 2.1.4, the continuous-time equation (2.1.3) converges exponentially fast to (Brockett 1988, Helmke & Moore 1994b). Thus, the eigenvalues of H0 are the diagonal entries of the limiting value of the solution to (2.1.3). The double-bracket algorithm behaves similarly to (2.1.3) for small k and, given a suitable step-size selection scheme, should converge to the same equilibrium as the continuous-time equation.

30

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

Proposition 2.1.5 Let H0 and N be n

n real symmetric matrices where N satisfies Condition 2.1.4. Let (H ) be given by (2.1.1) and let N : M (H0 ) ! R+ be a step-size selection scheme that satisfies Condition 2.1.1. For Hk 2 M (H0), let k = N (Hk ) and define

4 (Hk k ) := (Hk+1) ; (Hk)

:1:6)

(2

where Hk+1 is given by (2.1.4). Suppose

4 (Hk k) < 0

when Hk  N ] 6= 0:

(2.1.7)

Then:

a) The iterative equation (2.1.4) defines an isospectral (eigenvalue preserving) recursion on the manifold M (H0). b) The fixed points of (2.1.4) are characterised by matrices H

2 M (H0) satisfying

H N ] = 0:

:1:8)



c) Every solution Hk , for k

=

(2

1 2 : : :, of (2.1.4), converges as

M (H0) where H1 N ] = 0.

Proof To prove part a), note that the Lie-bracket H N ]T

=

k ! 1, to some H1 2

;H N ] is skew-symmetric. As

the exponential of a skew-symmetric matrix is orthogonal, (2.1.4) is an orthogonal similarity transformation of Hk and hence is isospectral. For part b) note that if Hk N ] = 0, then by direct substitution into (2.1.4) then Hk+1 = Hk .

Hk+l = Hk for l 1, and Hk is a fixed point of (2.1.4). Conversely if Hk  N ] 6= 0, then from (2.1.7), 4 (Hk k ) 6= 0, and thus Hk+1 6= Hk . By inspection, points satisfying Thus

(2.1.8) are stationary points of (2.1.3), and indeed are known to be the only stationary points of (2.1.3) (Helmke & Moore 1994b, pg. 50). Thus, the fixed points of (2.1.4) are equilibrium points, in the sense that they are all stationary points of (2.1.3). In order to prove part c) the following lemma is required.

x2-1

The Double-Bracket Algorithm

Lemma 2.1.6 Let

N

31

satisfy Condition 2.1.4 and N satisfy Condition 2.1.1 such that the

double-bracket algorithm satisfies (2.1.7). The double-bracket algorithm, (2.1.4), has exactly

n!= Qri=1 (ni!) distinct equilibrium points in M (H 0). These equilibrium points are characterised by the matrices  T  , where  is an n n permutation matrix, a rearrangement of the rows of the identity matrix, and is given by (2.1.5).

Proof Note that part b) of Proposition 2.1.5 characterises equilibrium points of (2.1.4) as

H 2 M (H0) such that H N ] = 0. Evaluating this condition component wise, for H = fhij g, gives

hij (j ; i) = 0 and hence by Condition 2.1.4, hij

=

0 for i

6= j .

Using the fact that (2.1.4) is isospectral, it

follows that equilibrium points are diagonal matrices which have the same eigenvalues as H0. Such matrices are distinct, and can be written in the form T  , for  an n

n permutation

matrix. A simple counting argument yields the number of matrices which satisfy this condition to be n!=

Qr (n !). i=1 i

H0, the sequence fHk g generated by the doublebracket algorithm. Observe that condition (2.1.7) implies that (Hk ) is strictly monotonic decreasing for all k where Hk  N ] 6= 0. Also, since is a continuous function on the compact set M (H0), then is bounded from below, and (Hk ) will converge to some non-negative value 1 . As (Hk ) ! 1 then 4 (Hk  k ) ! 0. Consider for a fixed initial condition

For an arbitrary positive number , define the open set

D M (H0), consisting of all

M (H0), within an  neighbourhood of some equilibrium point of (2.1.4). The set M (H0) ; D is a closed, compact subset of M (H0) on which the matrix function H 7! H N ] points of

does not vanish. As a consequence, the difference function (2.1.6) is continuous and strictly

M (H0) ; D, and thus, can be over bounded by some strictly negative number 1 < 0. Moreover, as 4 (Hk k ) ! 0 then there exists a K = K (1) such that for all k > K then 0 4 (Hk k ) > 1 . This ensures that Hk 2 D for all k > K . In other words, Hk is negative on

converging to some subset of possible equilibrium points. Imposing the upper bound

B

on the step-size selection scheme

N ,

Condition 2.1.4,

N (Hk )Hk N ] ! 0 as k ! 1. Thus, eN (Hk )Hk N ] ! I , the identity matrix, and hence, e;N (Hk )Hk N ]Hk eN (Hk )Hk N ] ! Hk as k ! 1. As a consequence

it follows that

32

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

jjHk+1 ; Hk jj ! 0 for k ! 1 and this combined with the distinct nature of the fixed points, Lemma 2.1.6, and the partial convergence already shown, completes the proof.

2.2 Step-Size Selection The double-bracket algorithm (2.1.4) requires a suitable step-size selection scheme before it can be implemented. To generate such a scheme, one can use the potential (2.1.1) as a measure of the convergence of (2.1.4) at each iteration. Thus, one chooses each time-step to maximise the absolute change in potential j4 j of (2.1.6), such that 4 < 0. Optimal time-steps can be determined at each step of the iteration by completing a line search to maximise the absolute change in potential as the time-step is increased. Line search methods, however, involve high computational overheads and it is preferable to to obtain a step-size selection scheme in the form of a scalar relation depending on known values. Using the Taylor expansion, 4 (Hk   ) is expressed as a linear term plus a higher order

error term in a general time-step  . By estimating the error term one obtains a mathe-

4 U (Hk   ) which is an upper bound to 4 (Hk  ) for all  . Choosing a suitable time-step, k , based on minimising 4 U the actual change in potential, 4 (Hk k)  4 U (Hk  k ) < 0, will satisfies (2.1.7). Due to the simple nature of the function 4 U , there is an explicit form for the time-step k depending only on Hk and N .

matically simple function

Lemma 2.2.1 For the

k’th

step of the recursion (2.1.4) the change in potential

4 (Hk  ) of (2.1.6), for a time-step  is 4 (Hk  ) = ;2 jjHk N ]jj2 ; 2 2tr(N R2( )) with

R2 ( ) :=

Z1 0

(1

; s)Hk00+1 (s )ds

:2:1)

(2

:2:2)

(2

where Hk00+1 ( ) is the second derivative of Hk+1 ( ) with respect to  .

Hk+1 ( ) be the (k + 1)’th recursive estimate for an arbitrary time-step  . Thus Hk+1 ( ) = e; HkN ]Hk e HkN ]. It is easy to verify that the first and second time derivatives

Proof Let

x2-2

Step-Size Selection

33

of Hk+1 are exactly

Hk0 +1 ( ) Hk00+1 ( )

= =

Hk+1 ( ) Hk N ]] Hk+1 ( ) Hk  N ]] Hk N ]]: 

Applying Taylor’s theorem, then

Hk+1 ( )

= =

Z1 Hk+1 (0) +  dd Hk+1 (0) +  2 (1 ; s)Hk00+1 (s )ds 0 2 Hk +  Hk  Hk  N ]] +  R2( ):

(2.2.3)

Consider the change in the potential (H ) between the points Hk and Hk+1 ( ),

4 (Hk  )

= = = =

Observe that for



=

0 then

(Hk+1( )) ; (Hk) ;2tr(N (Hk+1( ) ; Hk )) ;2tr(N ( Hk Hk N ]] +  2 R2( ))) ;2 jjHk N ]jj2 ; 2 2tr(N R2( ))

4 (Hk 0)

=

(2.2.4)



d 4 (Hk  ) 0, and also that d  =0

=

;2jjHk N ]jj2. Thus, for sufficiently small  the error term  2tr(N R2( )) becomes negligible, and 4 (Hk  ) is strictly negative. Let opt > 0 be the first time for which  d 4 (Hk  ) = 0, then 4 (Hk  opt ) < 4 (Hk  ) < 0 for all strictly positive d  = opt

 < opt. It is not possible, however, to estimate opt directly from (2.2.4) due to the transcendental nature of the error term R2( ). Approximating the error term by a quadratic function in  allows one to compute an explicit step-size selection scheme based on this estimate.

Lemma 2.2.2 (Constant Step-Size Selection Scheme) The constant time-step

cN = 4jjH jj1 jjN jj 0

:2:5)

(2

satisfies Condition 2.1.1. Furthermore, the double-bracket algorithm, equipped with the stepsize selection scheme cN , satisfies (2.1.7).

34

Numerical Gradient Algorithms for Eigenvalue Calculations

αk

Chapter 2

α

∆ψU(Η,α) < 0 ∆ψU(Η,α) ∆ψU (Η,α) > ∆ψ(Η,α)

Figure 2.2.1: The upper bound on

4 (Hk  ) viz 4 U (Hk )

Proof Recall that for the Frobenius norm jtr(XY )j  jjX jj jjY jj (follows from the Schwartz inequality). Then

4 (Hk  )  ;2 jjHk N ]jj2 + 2 2jtr(N R2( ))j  ;2 jjHk N ]jj2 + 2 2jjN jj jjR2( )jj  ;2 jjHk N ]jj2 + 2 2jjN jj Z1 (1 ; s)jjHk+1(s ) Hk  N ]] Hk N ]]jjds 0  ;2 jjHk N ]jj2 + 4 2jjN jj jjH0jj jjHk N ]jj2 =: 4 U (Hk   ): (2.2.6) Thus 4 U (Hk   ) is an upper bound for 4 (Hk  ) and has the property that for sufficiently

small  , it is strictly negative, see Figure 2.2.1. Due to the quadratic form of 4 U (Hk   ) in

 , it is immediately clear that ck = cN (Hk ) = 1=(4jjH0jjjjN jj) is the minimum of (2.2.6).

A direct norm bound of the integral error term is not likely to be a tight estimate of the error and the function 4 U is a fairly crude bound for 4 . The following more sophisticated estimate results in a step-size selection scheme. Lemma 2.2.3 (An Improved Bound for 4 (Hk  )) The difference function 4 (Hk  ) can be over bounded by

4 (Hk  )  ;2 jjHk  N ]jj2 + jjH0jj jjN Hk N ]]jj e2 jjHkN ]jj ; 1 ; 2 jjH  N ]jj : k jjHk N ]jj  =: 4 U (Hk   ): (2.2.7)

x2-2

Step-Size Selection

35

Proof Consider the infinite series expansion for the matrix exponential

eA = I + A + 12 A2 + 3!1 A3 + : It is easily verified that

eA Be;A

=

B + A B ] + 2!1 A A B]] + 3!1 A A A B ]]] +

=

adA B: i ! i=0

1 1 X

(2.2.8)

i

adA (adiA;1 B ) ad0A B

B where adA : Rn n ! Rn n is the linear map X 7! AX ; XA. Substituting ; Hk  N ] and Hk for A and B in (2.2.8) and comparing with Here adiA B

=

=

(2.2.3), gives

 2 R2( ) =

1 1 X

j

ad; Hk N ] (Hk ): j =2 j !

j

Considering jtr(N R2( ))j and using the readily established identity tr(N ad;A B )

=

tr((adjA N )B ) gives

  1  X   j 2tr(N R2( ))j =  j1! tr adj HkN ](N )Hk   j=2 1 X1 j  j ! jjad Hk N ] (N )jj jjH0jj  =

=

j =2 1 1 X

j ;1jjad (2jj Hk N ]jj)  Hk N ] (N )jj jjH0jj j ! j =2 1 1 jjH0jj jjad Hk N ](N )jj X j (2 jjHk N ]jj) 2 jjHk N ]jj j ! j =2 jjH0jj jjN Hk N ]]jj e2 jjHkN ]jj ; 1 ; 2 jjH  N ]jj : k 2jjHk  N ]jj

Thus combining this with the first line of (2.2.6) gives (2.2.7). The variable step-size selection scheme is derived from this estimate of the error term in the same manner the constant step-size selection scheme was derived in Lemma 2.2.2.

36

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

Lemma 2.2.4 ( Variable Step-Size Selection Scheme) The step-size selection scheme N :

M (H0 ) ! R+

N ]jj2 + 1)

N (H ) = 2jjH1 N ]jj log( jjH jjjjjjH N H N ]]jj

:2:9)

(2

0

satisfies Condition 2.1.1. Furthermore, the double-bracket algorithm, equipped with the stepsize selection scheme N , satisfies (2.1.7). Proof I first show that N satisfies the requirements of Condition 2.1.1. As the Frobenius norm is a continuous function then N is well defined and continuous at all points H

H N ] 6= 0.

2 M (H0)

H N ] = 0 then N is not well defined. To show that there exists a positive constant  , such that N (H ) >  , consider the following lower bound,

for which



LN

which is just

N .

When



jjHk N ]jj + 1) 2jjHk  N ]jj 2jjH0jj jjN jj 2  2jjH1 N ]jj log( 2jjH jjjjjjHNkjjNjj]jjH  N ]jj + 1) k 0 k 2 k  N ]jj  2jjH1 N ]jj log( jjH jjjjjjHN + 1) Hk N ]]jj k 0 1

:=

log(

Using L’Hˆopital’s rule it can be seen that the limit of

(2.2.10)

LN

at a point

H 2 M (H0) where H N ] = 0 is 1=(4jjH0jj jjN jj). Including these points in the definition L of L N , gives that N is a continuous, strictly positive, well defined function for all H 2 M (H0). Thus, since M (H0) is compact, there exists a real number  > 0 such that

N LN  > 0 on M (H0 ) ; fH1

j H1 N ] = 0g.

> 0, such that N (H ) < B, H 2 M (H0), set P ( ;  )2x2 , H N ] = X = fxij g. For N given by Condition 2.1.4, then jjN X ]jj = j ij i6=j i where xii = 0 since H N ] is skew symmetric. Observe that To show that there exists a real number B

P x2 i6=j ij jjN X ]jj = Pi6=j (i ; j )2 x2ij  maxi6=j (i ; j );2 =: b jjX jj

x2-3

Stability Analysis

for all choices of X

=

37

;X T . It follows that

N (H )

jjX jj2 + 1) 2jjX jj jjH0jj jjN X ]jj 1 jj jjb + 1)  2jjX jj log( jjX H0jj  2jjHb jj =: B

=

1

log(

0

since log(x + 1)  x for x > 0.

Hk 2 M (H0 ), Hk  N ] 6=

N (Hk ) = k > 0 minimises (2.2.7), and from Lemma 2.2.3 it follows that 0 4 U (Hk   ) 4 (Hk  ). Thus, the double-bracket algorithm, equipped with the step-size selection scheme N , satisfies Finally, for a matrix

0, the time-step

(2.1.7) and the proof is complete.

2.3 Stability Analysis In this section the stability properties of equilibria of the double-bracket algorithm (2.1.4) are investigated. It is shown that for generic initial conditions, and any step-size selection scheme that satisfies Condition 2.1.1 and (2.1.7), a solution converges to the unique equilibrium point

,

fHk g of the double-bracket algorithm

given by (2.1.5). The algorithm is shown to

converge at least linearly in a neighbourhood of . Lemma 2.3.1 Let N satisfy Condition (2.1.4) and N be some selection scheme that satisfies Condition 2.1.1 and (2.1.7). The double-bracket algorithm (2.1.4) has a unique locally asymptotically stable equilibrium point , given by (2.1.5). All other equilibrium points of (2.1.4) are unstable. Proof It has been shown that the Hessian of the potential function (at an equilibrium point

in M (H0)) is always non-singular and is only negative definite at the point (cf. Duistermaat

M (H0) is compact then the By assumptions on N and N , (Hk ) is

et al. (1983) or Helmke and Moore (1994b, pg. 53)). Since local minimum at



is also a global minimum.

monotonically decreasing. Thus the domain of attraction of contains an open neighbourhood of , and hence, is a locally asymptotically stable equilibrium point of (2.1.4).

38

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

All other equilibrium points H 1 are either saddle points or maxima of (Helmke & Moore

1994b, pg. 53). Thus for any neighbourhood D of an equilibrium point H1

some H0

2 D such that (H0) < (H1).

6= , there exists

It follows that the solution to the double-bracket

algorithm, with initial condition H 0 , will not converge to H1 and thus H1 is unstable.

Lemma 2.3.1 is sufficient to conclude that for generic initial conditions the double-bracket algorithm will converge to the unique matrix . It is difficult to characterise the set of initial conditions for which the algorithm converges to some unstable equilibrium point

H 1 6= .

For the continuous-time double-bracket flow, however, it is known that the unstable basins of attraction of such points are of zero measure in M (H0) (Helmke & Moore 1994b, pg. 53).

d 2 R+ be a constant such that 0 < d < 1=2jjH0jj2jjN jj2 and consider the constant step-size selection scheme, dN : M (H0) ! R+ ,

Lemma 2.3.2 Let

N satisfy Condition 2.1.4.

Let

dN (H ) = d: The double-bracket algorithm (2.1.4), equipped with the step-size selection scheme dN , has a unique locally asymptotically stable equilibrium point

,

given by (2.1.5). The rate of

convergence of the double-bracket algorithm converges at least linear in a neighbourhood . Proof Since dN is a constant function, the time-step dk

=

dN (Hk ) = d is constant.

Thus,

the map

Hk 7! e;dHk N ]Hk edHk N ] M (H0), and one may consider the linearisation of this map at the equilibrium point , given by (2.1.5). The tangent space T M (H0) at consists of those matrices =   ] where  2 Skew(n), the class of skew symmetric matrices (Helmke & Moore 1994b, pg. 49). It is easily verified that the matrices 2 T M (H0) are independently parameterised by their components ij , where i < j , and i 6= j . Thus, computing the is a differentiable map on all

linearization of the double-bracket algorithm at the point one obtains

k+1 = k

; d( k N ; N k ) ; ( k N ; N k )]

:3:1)

(2

x2-3

Stability Analysis

39

2 TM (H0 ). Expressing this in terms of the linearly independent parameters ij , where i < j , and i = 6 j one has

for k

ij )k+1 = 1 ; d(i ; j )(i ; j )]( ij )k 

(

for

i j = 1  n:

:3:2)

(2

The eigenvalues of the linearisation (2.3.1) can be read directly from (2.3.2) as 1 ; d(i ;

j )(i ; j ), for i < j and i 6= j . Since i j when i < j then if d < 1=2jjH0jj2jjN jj2 (where jjX jj2 is the induced matrix 2-norm, the maximum singular value of X ) it is easily verified that j1 ; d(i ; j )(i ; j )j < 1. It follows that is asymptotically stable with rate of convergence at least linear. The linear scaling factor for the convergence error is maxi 2 > : : : > r 0 are the distinct singular values of H0 , occurring with P multiplicities n 1  : : : nr such that ri=1 ni = n. By convention the singular values of a matrix

Here

are chosen to be non-negative. It should be noted that though such a decomposition always exists and is unique, there is no unique choice of orthogonal matrices V and U . Let S (H0) be the set of all orthogonally congruent matrices to H0 ,

S (H0) = f V T H0 U 2 Rm n j V 2 O(m) U 2 O(n)g:

:4:3)

(2

It is shown Helmke and Moore (1994b, pg. 89) that S (H0) is a smooth compact Riemannian manifold with explicit forms given for its tangent space and Riemannian metric. Following the articles (Chu 1986, Chu & Driessel 1990, Helmke & Moore 1990, Helmke & Moore 1994b, Smith 1991) consider the task of calculating the singular values of a matrix

H0 , by

: S (H 0) ! R+ , (H ) = jjH ; N jj2. It is shown Helmke and Moore (1990, 1994b) that achieves a unique local and global minimum at the point 2 S (H0). Moreover, in the articles (Helmke & Moore 1990, Helmke & Moore 1994b, Smith 1991) the explicit form for the gradient grad is calculated. The minimizing minimising the least squares cost function

gradient flow is

H_

= =

;grad (H ) H fH N g ; fH T  N T gH

with H (0) = H0 the initial condition. Here the generalised Lie-bracket

fX Y g := X T Y ; Y T X = ;fX Y gT  is used.

(2.4.4)

x2-4

Singular Value Computations

Condition 2.4.1 Let N be an m

n matrix, with m n,

2 66 .1 6 .. N = 666 64 0

where 1

43

..

3 77 77 7 n 775 0 .. .

.

0(m;n) n

> 2 > : : : > n > 0 are strictly positive, distinct real numbers.

N that satisfies Condition 2.4.1, it is known that (2.4.4) converges exponentially fast to 2 S (H0) (Helmke & Moore 1990, Smith 1991). For H0 and N constant m n matrices, the singular value algorithm proposed is For generic initial conditions, and a target matrix

Hk+1 = e;k fHkT N T gHk ek fHk N g:

:4:5)

(2

This algorithm is analogous to the double-bracket algorithm eqLB:eq:DB3.

Lemma 2.4.2 Let

H0, N be m n matrices.

R(m+n) (m+n) , where

For any H

0 H 0 Hb = B @ mT m H

For any sequence of real numbers k , k

=

0n n

2 Rm

n define a map H

1 CA :

7! Hb 2 :4:6)

(2

1 : : : 1 the iterations,

Hk+1 = e;k fHkT N T gHk ek fHk N g

:4:7)

(2

with initial condition H 0 and

Hb k+1 = e;k Hbk Nb] Hb k ek Hbk Nb] 

:4:8)

(2

b 0 are equivalent. with initial condition H Proof Consider the iterative solution to (2.4.8), and evaluate the multiplication in the block form of (2.4.6). This gives two equivalent iterative solutions, one the transpose of the other, both of which are equivalent to the iterative solution to (2.4.7).

44

Numerical Gradient Algorithms for Eigenvalue Calculations

b 0 and Nb are symmetric (m + n) Remark 2.4.3 Note that H

Chapter 2

m + n) matrices, and that as a result, the iteration (2.4.8) is just the double-bracket algorithm (2.1.4). 2 (

Remark 2.4.4 The equivalence given by this lemma is complete in every way. In particular,

H1 is an equilibrium point of (2.4.7) if and only if Hb 1 is an equilibrium point of (2.4.8). b k ! Hb 1 as k ! 1. Similarly, Hk ! H1 if and only if H 2

This leads one to consider step-size selection schemes for the singular value algorithm induced by selection schemes which were derived in Section 2.2 for the double-bracket algo-

c0) ! R+ is a step-size selection scheme for (2.1.4), on M (Hb 0), rithm. Indeed if Nb : M (H and Hk 2 S (H0), then one can define a time-step k for the singular value algorithm by

k = Nb (Hb k ): Thus, if (2.4.8) equipped with a step-size selection scheme

:4:9)

(2

Nb ,

satisfies Condition 2.1.1

and (2.1.7), then from Lemma 2.4.2, (2.4.7) will satisfy similar conditions. For the sake of simplicity the following development considers only the constant step-size selection scheme (2.2.5) and the variable step-size selection (2.2.9). Theorem 2.4.5 Let H0, N be m Let

n matrices where m n and N satisfies Condition 2.4.1.

Nb : M (Hc0) ! R+ be either the constant step-size selection (2.2.5),

or the variable

step-size selection (2.2.9). The singular value algorithm

Hk+1

k

= =

e;k fHkT N T g Hk ek fHk N g

Nb (Hb k )

with initial condition H 0, has the following properties:

i) The singular value algorithm is a self-equivalent (singular value preserving) recursion on the manifold S (H0). ii) If Hk is a solution of the singular value algorithm, then (H k ) = jjHk ; N jj2 is strictly monotonically decreasing for every k

2 N where fHk  N g 6= 0 and fHkT  N T g 6= 0.

x2-4

Singular Value Computations

iii) Fixed points of the recursive equation are characterised by matrices H

45

2 S (H0) such

that

fHk  N g = 0 and fHkT  N T g = 0:

:4:10)

(2

Fixed points of the recursion are exactly the stationary points of the singular value gradient flow (2.4.4) and are termed equilibrium points. iv) Let Hk , k

1 2 : : :, be a solution to the singular value algorithm, then H k converges

=

to a matrix H1

2 S (H0), an equilibrium point of the recursion.

v) All equilibrium points of the double-bracket algorithm are strictly unstable except

,

given by (2.4.2), which is locally asymptotically stable with at least linear rate of convergence.

Proof To prove part i) note that the generalised Lie-bracket,

fX Y g = ;fX Y gT , is skew

symmetric, and thus (2.4.5) is an orthogonal congruence transformation and preserves the singular values of

Hk .

Also note that the potential

2.4.2 shows that the sequence from Proposition 2.1.5,

1 2

Hb k

(Hk ) = 12 (Hb k ).

Moreover, Lemma

is a solution to the double-bracket algorithm and thus,

(Hb k) must be monotonically decreasing for all k 2 N such that

Hb k  Nb ] 6= 0, which is equivalent to (2.4.10). This proves part ii), and part iii) follows by noting that if fHkT  N T g = 0 and fHk  N g = 0, then Hk+l = Hk for l = 1 2 : : :, and Hk is a fixed point of (2.4.5). Moreover, since (Hk ) is strictly monotonic decreasing for all fHk  N g 6= 0 and fHkT  N T g 6= 0, then these points can be the only fixed points. It is known that these are 

the only stationary points of (2.4.4) (Helmke & Moore 1990, Helmke & Moore 1994b, Smith 1991). In order to prove iv) one needs the following characterisation of equilibria of the singular value algorithm.

Lemma 2.4.6 Let

N satisfy Condition 2.4.1 and Nb

be either the constant step-size selec-

tion (2.2.5), or the variable step-size selection (2.2.9). The singular value algorithm (2.4.5) equipped with time-steps

k

=

Nb (Hb k ), has exactly 2n n!= Qri=1 (ni!) distinct equilibrium

46

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

points in S (H0). Furthermore, these equilibrium points are characterised by the matrices

0 B@ T

0n (m;n)

0(m;n) n 0(m;n) (m;n)

where  is an n

1 CA S

n permutation matrix, and S = diag(1 : : : 1) a sign matrix.

Proof Equilibrium points of (2.4.5) are characterised by the two conditions (2.4.10). Since N

satisfies Condition 2.4.1, then setting H

=(

j hji ; i hij = 0

hij ) one has fH N g = 0 is equivalent to

for i = 1 : : :n

j = 1 : : :n:

Similarly, the condition fH T  N T g = 0 is equivalent to

j hij ; ihji hij j

=

0 for i = 1 : : :n

j = 1 : : :n for i = n + 1 : : :m j = 1 : : :n:

=

0

By manipulating the relationships, and using the distinct, positive nature of the easily shown that hij

=

0 for

i 6= j .

i ,

it is

Using the fact that (2.4.5) is self equivalent, the only

possible matrices of this form which have the same singular values as

H0 are characterised

as above. A simple counting argument shows that the number distinct equilibrium points is 2n n!=

Qr (n !). i=1 i

The proof of part iv) is now directly analogous to the proof of part c) Proposition 2.1.5. It remains only to prove part v), which involves the stability analysis of the equilibrium points characterised by (2.4.10). It is not possible to directly apply the results obtained in Section 2.3

b k , since the Nb does not satisfy Condition 2.1.4. However, for to the double-bracket algorithm H the constant step-size selection scheme induced by (2.2.5), and using analogous arguments to those used in Lemma 2.3.1 and 2.3.2, it follows that is the unique locally attractive equilibrium point of the singular value algorithm. Similarly, by linearizing (2.4.4) for continuous step-size selection schemes at the point , it can be shown that the rate of convergence is at least linear in

b is the unique exponentially a neighbourhood of . Thus, using Lemma 2.4.2 it follows that attractive equilibrium point of the double-bracket algorithm on

M (Hc0).

To obtain the same

results for the variable step-size selection scheme (2.2.9) one applies Proposition 2.3.5 to the

x2-5

Associated Orthogonal Algorithms

47

c0) and uses the equivalence given by Lemma 2.4.2 to obtain double-bracket algorithm on M (H the same result for the singular value algorithm (2.4.4). This completes the proof. Remark 2.4.7 The above theorem holds true for any time-steps k

=

Nb (Hb k ) induced by a

step-size selection scheme, Nb , which satisfies Condition 2.1.1, such that Theorem 2.3.6 holds.

2

2.5 Associated Orthogonal Algorithms In addition to finding eigenvalues or singular values of a matrix it is often desired to determine the full eigen-decomposition of a matrix, i.e. the eigenvectors related to each eigenvalue. Associated with the double-bracket algorithm and singular value algorithm there are algorithms evolving on the set of orthogonal matrices which converge to the matrix of orthonormal eigenvectors (for the double-bracket algorithm) and separate matrices of left and right orthonormal singular direction (for the singular value algorithm). To simplify the subsequent analysis one imposes a genericity condition on the initial condition H 0 .

H0 = H0T 2 Rn n is a real symmetric matrix then assume that H0 has distinct eigenvalues 1 > : : : > n . If H0 2 Rm n , where m n, then assume that H0 has distinct singular values  1 > : : : > n > 0.

Condition 2.5.1 If

For a sequence of positive real numbers k , for

k = 1 2 : : :, the associated orthogonal

double-bracket algorithm is

Uk+1 = Uk ek UkT H Uk N ] U0 2 O(n) 0

where

H0 = H0T 2 Rn

n is symmetric. For an arbitrary initial condition H0

:5:1)

(2

2 Rm

n the

associated orthogonal singular value algorithm is

Vk+1 Uk+1

= =

Vk ek fUkT H T Vk N T g V0 2 O(m) Uk ek fVkT H Uk N g U0 2 O(n): 0

0

(2.5.2)

48

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

Note that in each case the exponents of the exponential terms are skew symmetric and thus, the recursions will remain orthogonal. Let H0

=

H0T 2 Rn

n and consider the map g : O(n) ! M (H0), U

is a smooth surjection. If Uk is a solution to (2.5.1) observe that

7! U T H0U , which

g (Uk+1 ) = e;k g(Uk )N ]g (Uk)ek g(Uk)N ] is the double-bracket algorithm (2.1.4). Thus, g maps the associated orthogonal double-bracket

U 0 , to the double-bracket algorithm with initial condition U0T H0U0 , on M (U0T H0 U0) = M (H0).

algorithm with initial condition

Remark 2.5.2 Consider the potential function on the set of orthogonal n

: O(n) ! R+ , (U ) = jjU T H0U ; N jj2

n matrices. Using the standard induced Riemannian metric from

Rn n on O(n), the associated orthogonal gradient flow is (Brockett 1988, Chu 1984a, Chu & Driessel 1990, Helmke & Moore 1994b)

U_ = ;grad (U ) = U U T H0U N ]:

2 H0T be a real symmetric n n matrix that satisfies Condition 2.5.1. n satisfy Condition 2.1.4, and let N be either the constant step-size selection

Theorem 2.5.3 Let H0 Let

N 2 Rn

=

(2.2.5) or the variable step-size selection (2.2.9). The recursion

Uk+1

k

= =

Uk ek UkT H Uk N ] U0 2 O(n)

N (Hk ) 0

referred to as the associated orthogonal double-bracket algorithm, has the following properties: i) A solution

Uk , k

=

1 2 : : :, to the associated orthogonal double-bracket algorithm

remains orthogonal. ii) Let

(U )

=

R+ . Let Uk ,

jjU T H0 U ; N jj2 be a map from O(n) to the set of non-negative reals

k = 1 2 : : :, be a solution to the associated orthogonal double-bracket

x2-5

Associated Orthogonal Algorithms algorithm. Then

(Uk ) is strictly monotonically decreasing for every k 2

49

N where

UkT H0Uk  N ] 6= 0.



iii) Fixed points of the algorithm are characterised by matrices U

2 O(n) such that

U T H0 U N ] = 0:



There are exactly 2n n! distinct fixed points. iv) Let Uk ,

k = 1 2 : : :, be a solution to the associated orthogonal double-bracket algorithm, then Uk converges to an orthogonal matrix U1 , a fixed point of the algorithm.

v) All fixed points of the associated orthogonal double-bracket algorithm are strictly unstable, except those 2n points U

2 O(n) such that UT H0U = 

where

=

diag(1 : : : n). Such points U are locally asymptotically stable with at

least linear rate of convergence and H0

H0.

=

U UT is an eigenspace decomposition of

T Proof Part i) follows directly from the orthogonal nature of ek Uk H0 Uk N ]. Note that in part

can be expressed in terms of the map g(U ) = U T H0U from O(n) to M (H0) and the double-bracket potential (H ) = jjH ; N jj2 of (2.1.1), i.e. ii) the definition of

(Uk ) = (g (Uk)): Observe that g (U0) = U0T H0U0 , and thus, g (Uk ) is the solution of the double-bracket algorithm

with initial condition U 0T H0 U0. As the step-size selection scheme N is either (2.2.5) or (2.2.9), then g (Uk ) satisfies (2.1.7). This ensures that part ii) holds.

Uk is a fixed point of the associated orthogonal double-bracket algorithm with initial condition U0T H0U0 , then g (Uk ) is a fixed point of the double-bracket algorithm. Thus, from Proposition 2.1.5, g (Uk ) N ] = UkT H0 Uk  N ] = 0. Moreover, if UkT H0Uk  N ] = 0 for some given k 2 N, then by inspection Uk+l = Uk for l = 1 2 : : :, and Uk is a fixed If

point of the associated orthogonal double-bracket algorithm. From Lemma 2.1.6 it follows

50

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

U is a fixed point of the algorithm then U T H0 U =  T  for some permutation matrix  . By inspection any orthogonal matrix W = SU T , where S is a sign matrix S = diag(1 : : : 1), is also a fixed point of the recursion, and indeed, any two fixed points are related in this manner. A simple counting argument shows that there are exactly 2n n! that if

distinct matrices of this form. To prove iv), note that since g (Uk ) is a solution to the double-bracket algorithm, it converges

H 1 2 M (H0), H1  N ] = 0 (Proposition 2.1.5). Thus Uk must converge to the preimage set of H1 via the map g . Condition 2.5.1 ensures that set generated by the 1 2 preimage of H1 is a finite distinct set, any two elements U1 and U1 of which, are related by U11 = U12 S , S = diag(1 : : : 1). Convergence to a particular element of this preimage follows since k UkT H0 Uk  N ] ! 0 as in Proposition 2.1.5. to a limit point

O(n) is the same as the dimension of Thus g is locally a diffeomorphism on O(n),

To prove part v), observe that the dimension of

M (H0 ), due to the genericity Condition 2.5.1.

which forms an exact equivalence between the double-bracket algorithm and the associated orthogonal double-bracket algorithm. Restricting g to a local region the stability structure of

equilibria are preserved under the map g;1. Thus, all fixed points of the associated orthogonal

double-bracket algorithm are locally unstable except those that map via g to the unique locally asymptotically stable equilibrium of the double-bracket algorithm recursion. Observe that due to the monotonicity of (U k ) a locally unstable equilibrium is also globally unstable. The proof of the equivalent result for the singular value algorithm is completely analogous to the above proof.

m n satisfies Condition 2.5.1. satisfy Condition 2.4.1. Let the time-step k be given by

Theorem 2.5.4 Let

H0 2

Rm n where

Let

N2

Rm n

k = Nb (Hb ) where Nb is either the constant step-size selection (2.2.5), or the variable step-size selection b 0). The recursion scheme (2.2.9), on M (H

Vk+1 Uk+1

= =

Vkek fUkT H T Vk N T g  V0 2 O(m) Uk ek fVkT H Uk N g U0 2 O(n) 0

0

x2-6

Computational Considerations

51

referred to as the associated orthogonal singular value algorithm, has the following properties:

Vk  Uk ) be a solution to the associated orthogonal singular value algorithm, then both Vk and Uk remain orthogonal.

i) Let

(

ii) Let (V U ) = jjV T H0U ; N jj2 be a map from O(m)

O(n) to the set of non-negative reals R+ , then (Vk  Uk ) is strictly monotonically decreasing for every k 2 N where fVkT H0Uk  N g 6= 0 and fUkT H0T Vk  N T g 6= 0. Moreover, fixed points of the algorithm are characterised by matrix pairs (V U ) 2 O(m) O(n) such that

fV T H0 U N g = 0 and fU T H0T V N T g = 0: Vk  Uk ), k = 1 2 : : :, be a solution to the associated orthogonal singular value algorithm, then (Vk  Uk ) converges to a pair of orthogonal matrices (V 1 U1 ), a fixed

iii) Let

(

point of the algorithm. iv) All fixed points of the associated orthogonal singular value algorithm are strictly unstable, except those points (V U) 2 O(m)

O(n) such that

VT H0U =  where = diag(1  : : : n ) 2 Rm n . Each such point (V U) is locally exponentially

asymptotically stable and H 0

=

VT U is a singular value decomposition of H 0.

2.6 Computational Considerations There are several issues involved in the implementation of the double-bracket algorithm as a numerical tool which have not been dealt with in the body of this chapter. Design and implementation of efficient code has not been considered and would depend heavily on the nature of the hardware on which such a recursion would be run. As each iteration requires the calculation of a time-step, an exponential and a

k + 1 estimate it is likely that it would

be best to consider applications in parallel processing environments. Certainly in a standard computational environment the exponential calculation would limit the possible areas of useful application of the algorithms proposed.

52

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

It is also possible to consider approximations of the double-bracket algorithm which have good computational properties. For example, consider a (1,1) Pad´e approximation to the matrix exponential

k Hk  N ] : ek Hk N ]  22IIn + ; H  N ] n

k

k

Such an approach has the advantage that, as Hk  N ] is skew symmetric, the Pad´e approximation will be orthogonal, and will preserve the isospectral nature of the double-bracket algorithm. Similarly an (n n) Pad´e approximation of the exponential for any

n will also be orthogonal.

There are difficulties involved in obtaining direct step-size selection schemes based on Pad´e approximate double-bracket algorithms. Trying to guarantee that the potential is monotonic decreasing for such schemes by choosing step-size selection schemes based on the estimation techniques developed in Section 2.2 yields time-steps which are prohibitively small. A good heuristic choice of step-size selection scheme, however, can be made based on the selection schemes given in this chapter and simulations indicate that the Pad´e approximate double-bracket algorithm is viable when this is done.

2.7 Open Questions and Further Work One of the fundamental problems tackled in this chapter is the task of step-size selection. The best step-size selection scheme developed (2.2.9) is unsatisfactory in several ways; it is not continuous at critical points of the cost function and it is computationally expensive to evaluate. A better general understanding of the step-size selection task would be desirable. In particular, it may be possible to develop linear search techniques that are guaranteed to converge to the optimal step-size, obviating the need for approximations. One of the primary motivations for studying the symmetric eigenvalue problem from a dynamical systems perspective is the potential for applications to on-line and adaptive processes. It is instructive to consider how the double-bracket algorithm can be modified to deal with time-varying data. Subsection 2.7.1 is by no means a comprehensive treatment of this issue, nevertheless, it provides an indication of how such a task may be approached. To go beyond the treatment of Subsection 2.7.1 it would be desirable to consider a particular application and refine the algorithm to provide a useful numerical technique.

x2-7 2.7.1

Open Questions and Further Work

53

Time-Varying Double-Bracket Algorithms

Consider a sequence of ‘input’ matrices

Ak

=

ATk

for which an estimate of the eigenvalues

Ak at each time k is required. One assumes that the spectrum of each Ak is related, for example the sequence Ak is slowly varying with k. If the sequence Ak is a noisy observation of

of some time-varying process or contains occasional large deviations then a sensible algorithm for estimating the spectrum of

Ak+1 would exploit the full data sequence A0  : : : Ak along

with the new data Ak+1 to generate a new estimate. A gradient descent algorithm achieves this in a fundamental manner since each new estimate is based on a small change in the previous estimate which in turn is based on the data sequence up to time k. The issue of constraint stability is of importance in such situations since the presence of small errors in the constraint at each step may eventually lead the estimates to stray some distance from the true spectrum. Given a symmetric matrix

H0

=

H0T

and a diagonal matrix

N

=

diag(1  : : : N )

satisfying Condition 2.1.4 consider the potential

(U ) = jjU T H0 U ; N jj2: In Section 2.5 the relationship

U T H0 U 2 M (H0 ) was exploited to display the connections

between the double-bracket algorithm and the associated orthogonal algorithm. However, it is also possible to rewrite the potential as

(U ) = jjUNU T ; H0 jj2F : Similarly, the associated orthogonal algorithm itself can be rewritten with the matrix N modified by an orthogonal congruency transformation

Uk+1

= =

Uk ek UkT H Uk N ] ek H Uk NUkT ]Uk : 0

0

The advantage of this formulation is the fact that matrix H0 appears explicitly in the algorithm. The time-varying associated orthogonal double-bracket algorithm is defined to be

Uk+1 = ek Ak Uk Nk UkT ] Uk  U0 = I:

:7:1)

(2

54

Numerical Gradient Algorithms for Eigenvalue Calculations

Chapter 2

Ak+1 is given by Hk+1 = UkT+1 Ak+1 Uk+1 . Observe that the eigenvector decomposition of Ak+1 is derived from the data sequence up to time k and is applied to Ak+1 to approximate a spectral decomposition Ak+1 = Uk+1 Hk+1 UkT+1 where it is hoped that Hk+1 is nearly diagonal. An estimate of the spectral decomposition of

If Ak

=

H0 is constant it is easily seen that the time-varying associated orthogonal algorithm

reduces to the standard associated orthogonal algorithm. Also each step of the time-varying algorithm will reduce the potential jjAk ; Uk+1 NUkT+1 jj 

jjAk ; Uk NUkT jj. Thus, as long

Ak does not vary too quickly with time the proposed algorithm should converge to and track the spectral decomposition of Ak .

as the sequence of matrices

Remark 2.7.1 A time-varying dual singular value decomposition algorithm is fully analogous

2

to the development given above.

Remark 2.7.2 If the sequence

Ak

is a stationary stochastic process it may be sensible to

replace the driving term Ak in (2.7.1) by Bk

=

1

n

Pn A . k =1 k

2

Chapter 3

Gradient Algorithms for Principal Component Analysis N T is that of finding an eigenspace of specified dimension p 1 which corresponds to the maximal p eigenvalues of N . There are a number of classical algorithms available for computing dominant eigenspaces The problem of principal component analysis of a symmetric matrix N

=

(principal components) of a symmetric matrix. A good reference for standard numerical methods is Golub and Van Loan (1989). There has been considerable interest in the last decade in using dynamical systems to solve linear algebra problems (cf. the review (Chu 1988) and the recent monograph (Helmke & Moore 1994b)). It is desirable to consider the relationship between such methods and classical algebraic methods. For example, Deift et al. (1983) investigated a matrix differential equation based on the Toda flow, the solution of which (evaluated at integer times) is exactly the sequence of iterates generated by the standard QR algorithm. In general, dynamical system solutions of linear algebra problems do not interpolate classical methods exactly. Discrete computational methods based on dynamical system solutions to a given problem provide a way of comparing classical algorithms with dynamical system methods. Recent work on developing numerical methods based on dynamical systems insight is contained Brockett (1993) and Moore et al. (1994). Concentrating on the problem of principal component analysis, Ammar and Martin (1986) 55

56

Gradient Algorithms for Principal Component Analysis

Chapter 3

have studied the power method (for determining the dominant p-dimensional eigenspace of

a symmetric matrix) as a recursion on the Grassmannian manifold Gp (Rn ), the set of all pdimensional subspaces of Rn . Using local coordinate charts on

Gp(Rn ) Ammar and Martin

(1986) show that the power method is closely related to the solution of a matrix Riccati differential equation. Unfortunately, the solution to a matrix Riccati equation may diverge to infinity in finite time. Such solutions correspond to solutions that do not remain in the original local coordinate chart. Principal component analysis has also been studied by Oja (1982, 1989) in relation to understanding the learning performance of a single layer neural network with n inputs and

p neurons.

Oja’s analysis involves computing the limiting solution of an explicit

matrix differential equation evolving on Rn p (there is no requirement for local coordinate representations). The evolution of the system corresponds to the ‘learning’ procedure of the neural network while the columns of the limiting solution span the principal component of the

E fuk uTk g (where E fuk uTk g is the expectation of uk uTk ) of the vector random process uk 2 Rn , k = 1 2 : : :, with which the network was ‘trained’. Recent work by

covariance matrix N

=

Yan et al. (1994) has provided a rigourous analysis of the learning equation proposed by Oja. Not surprisingly, it is seen that the solution to Oja’s learning equations is closely related to the solution of a Riccati differential equation. In this chapter I investigate the properties of Oja’s learning equation restricted to the Stiefel manifold (the set of all

n p real matrices with orthonormal columns).

Explicit proofs of

convergence for the flow are presented which extend the results of Yan et al. (1994) and Helmke and Moore (1994b, pg. 26) so that no genericity assumption is required on the eigenvalues of N . The homogeneous nature of the Stiefel manifold is exploited to develop an explicit numerical method (a discrete-time system evolving on the Stiefel manifold) for principal component analysis. The method proposed is a gradient descent algorithm modified to evolve explicitly on St(p n). A step-size must be selected for each iteration and a suitable selection scheme is proposed. Proofs of convergence for the proposed algorithm are given as well as modifications and observations aimed at reducing the computational cost of implementing the algorithm on a digital computer. The discrete method proposed is similar to the classical power method and steepest ascent methods for determining the dominant p-eigenspace of a matrix N . Indeed, in

the case where p = 1 (for a particular choice of time-step) the discretization is shown to be the

power method. When p > 1, however, there are subtle differences between the methods.

x3-1

Continuous-Time Gradient Flow

57

This chapter is based on the journal paper (Mahony et al. 1994). Applications of the same ideas have also been considered in the field of linear programming (Mahony & Moore 1992). The chapter is organised into five sections including the introduction. Section 3.1 reviews the derivation of the continuous-time matrix differential equation considered and gives a general proof of convergence. In Section 3.2 a discrete-time iteration based on the results in Section 3.1 is proposed along with a suitable choice of time-step. Section 3.3 considers two modifications of the scheme to reduce the computational cost of implementing the proposed numerical algorithm. Section 3.4 considers the relationship of the proposed algorithm to classical methods while Section 3.5 indicates further possibilities arising from this development.

3.1 Continuous-Time Gradient Flow In this section a dynamical systems solution to the problem of finding the principal component of a matrix is developed. The approach is based on computing the gradient flow associated with a generalised Rayleigh quotient function. The reader is referred to Warner (1983) for technical details on Lie-groups and homogeneous spaces.

N T be a real symmetric n n matrix with eigenvalues 1 2 : : : n and an associated set of orthonormal eigenvectors v1  : : : vn . A maximal p-dimensional eigenspace, or maximal p-eigenspace of N is spfv1  : : : vpg the subspace of Rn spanned by fv1  : : : vpg. If p > p+1 then the maximal p-eigenspace of N is unique. If p = p+1 = = p+r , for some r > 0, then any subspace spfv1  : : : vp;1  wg where w 2 spfvp  vp+1 : : : vp+r g is a maximal p-eigenspace of N . Let N

=

For p an integer with 1  p  n, let St(p n) = fX

2 Rn p j X T X = Ipg

:1:1)

(3

where Ip is the p

p identity matrix, denote the Stiefel manifold of real orthogonal n p matrices. For X 2 St(p n), the columns of X are orthonormal basis vectors for a p-dimensional subspace of Rn .

Lemma 3.1.1 The Stiefel manifold St(p n) is a smooth compact np ; 12 p(p + 1)-dimensional

58

Gradient Algorithms for Principal Component Analysis

Chapter 3

submanifold of Rn p . The tangent space of St(p n), at a point X is given by

TX St(p n) = fX ; X j  2 Sk(n) 2 Sk(p)g where Sk(n), Sk(p) are the set of n

A 2 Rn

n.

:1:2)

(3

n (respectively p p) skew symmetric matrices A = ;AT ,

Proof It can be shown that St(p n) is a regular1 level set of the function X

7! X T X ; Ip

(Helmke & Moore 1994b, pg. 25). In this chapter, however, the homogeneous structure of St(p n) is important and it is best to introduce this structure here. Define G

O(n) O(p) to be the topological product of the set of n n and p p real orthogonal matrices O(n) = fU 2 Rn j U T U = UU T = Ing. Observe that G is a compact Lie-group (Helmke & Moore 1994b, pg. 348) with group multiplication given by matrix multiplication (U 1 V1) (U2 V2) = (U1 U2  V1 V2 ). Define a map  : G St(p n) ! St(p n)  ((U V ) X ) := UXV T :

=

:1:3)

(3

It is easily verified that  is a smooth, transitive, group action on St(p n). Since G is compact

it follows that St(p n) is a compact embedded submanifold of Rn p (Helmke & Moore 1994b, pg. 352). The tangent space of St(p n) at a point X

2

St(p n) is given by the image of the

linearization of X : G ! St(p n), X (U ) :=  (U X ), at the identity element of G (Gibson 1979, pg. 75). Recall that the tangent space of

O(n) at the identity is TIn O(n) = Sk(n)

(Helmke & Moore 1994b, pg. 349) and consequently that the tangent space at the identity of

G is T(In Ip) G = Sk(n) Sk(p). Computing the linearization of X gives DX j(In Ip )( ) = X ; X  DX j(In Ip)( ) is the Fr´echet derivative of X T(In Ip)G.

where

at

In Ip) in direction ( ) 2

(

Given a function f : M ! N between two smooth manifolds, a regular point p 2 M is a point where the tangent map Tp f : Tp M ! Tf(p) N is surjective. Given q 2 N let U = fp 2 M j f (p) = qg then U is known as a regular level set if for each p 2 U then T p f is surjective. It can be shown (using the inverse function theorem) that regular level sets are embedded submanifolds of M (Hirsch 1976, pg. 22). 1

x3-1

Continuous-Time Gradient Flow

The Euclidean inner product on Rn n

59

Rp p is

h(1 1) (2 2)i = tr(T1 2) + tr( T1 2 ): This induces a non-degenerate inner product on T(In Ip ) G. Given

linearization T(In Ip ) X decomposes the identity tangent space into

T(InIp )G = ker T(In Ip)X

X2

St(p n) then the

dom T(In Ip ) X 

where ker T(InIp )X is the kernel of T(In Ip ) X and dom T(In Ip ) X

f 

= (1 1 )

2 T(InIp )G j h(1 1) ( )i = 0 ( ) 2 ker T(InIp)X g

is the domain of T(In Ip ) X (the subspace orthogonal to ker T(In Ip ) X using the Euclidean

inner product provided on T(In Ip ) G). By construction, T(In Ip ) X restricts to a vector space

isomorphism T (?In Ip ) X ,

T(?InIp) X T(?InIp )X (X )

:

dom T(In Ip ) X

:=

T(In Ip ) X (X ):

! TX St(p n)

The normal Riemannian metric (cf. Section 5.3 or Helmke and Moore (1994b, pg. 52)) on St(p n) is the non-degenerate bilinear map on each tangent space

hh1X ; X 1 2X ; X 2ii = tr((?1 )T ?2 ) + tr(( ?1 )T ?2 ) where i X ; X i

:1:4)

(3

2 TX St(p n) for i = 1 2 and 



(i i) = ((i )? ( i)? )

is the decomposition of



( i i)

 (?i  ?i )

into components in ker T(In Ip ) X and dom T(InIp )X re-

spectively. It is easily verified that hh  ii varies smoothly with X and defines a Riemannian metric.

fx 2 Rn j jjxjj = 1g = S n;1 , the (n ; 1)-dimensional sphere in Rn . The tangent space of S n;1 is TxS n;1 = f 2 Rn j T x =

Remark 3.1.2 It can be shown that for p

=

1, St(1 n)

=

60

Gradient Algorithms for Principal Component Analysis

Chapter 3

0g, and the normal metric is hh  ii = T , for , in Tx S n;1 (Helmke & Moore 1994b, pg.

2

25). The classical Rayleigh quotient is the map rN : Rn ; f0g ! R,

T

rN (x) = xxTNx x: This is generalised to the Stiefel manifold St(p n) as a function termed the generalised Rayleigh quotient

RN :

St(p n) ! R

RN (X ) = tr(X T NX ):

:1:5)

(3

The Ky-Fan minimax Principle states (Horn & Johnson 1985, pg. 191) max

RN (X )

=

1 + : : : + p

min

RN (X )

=

n+1;p + : : : + n:

X 2 St(p n) X 2 St(p n) Moreover, if X

2

St(p n), such that RN (X ) =

Pp  , then the columns of X will generate j =1 i

a basis for a maximal p-dimensional eigenspace of N .

N

NT

n n matrix and p be an integer with 1  p  n. Denote the eigenvalues of N by 1 : : : q with algebraic multiplicities n1  : : : nq such that Pqi=1 ni = n. For X 2 St(p n), define the generalised Rayleigh quotient RN : St(p n) ! R, RN (X ) = tr(X T NX ). Then,

Theorem 3.1.3 Given

i) The gradient of

=

a real symmetric

RN (X ), on the Stiefel manifold

St(p n), with respect to the normal

Riemannian metric (3.1.4), is gradRN (X ) = (In ; XX T )NX

=

XX T  N ]X:

:1:6)

(3

ii) The critical points of RN (X ) on St(p n) are characterised by

XX T  N ] = 0



and correspond to points X

eigenspace of N .

2

St(p n), such that the columns of X span a p-dimensional

x3-1

Continuous-Time Gradient Flow

iii) For all initial conditions X 0

2

dX dt

61

St(p n), the solution X (t) 2 St(p n) of

=

gradRN (X )

=

(

In ; XX T )NX X (0) = X0

exists for all t 2 R and converges to some matrix X1

2

(3.1.7)

St(p n) as t ! 1. For almost

all initial conditions the solution X (t) of (3.1.7) converges exponentially fast to a matrix

whose columns form a basis for the maximal p-eigenspace of N . iv) When p = 1 the exact solution to (3.1.7) is given by

tN x(t) = jjeetN xx0 jj 0

where x0

:1:8)

(3

2 S n;1 = St(1 n).

Proof The gradient of RN is computed using the identities

i ii)]  )]

DRN jX ( ) = hhgradRN (X ) ii 2 TX St(p n) gradRN (X ) 2 TX St(p n)

DRN jX ( ) is the Fr´echet derivative of RN (X ) in direction 2 TX St(p n) evaluated at the point X 2 St(p n). Computing the Fr´echet derivative of RN in direction X ; X 2 TX St(p n) gives

where

DRN jX (X ; X )

=

2tr(X T N (X ; X ))

=

2tr(XX T N ) ; 2tr(X T NX ):

Observe that tr(X T NX ) = 0 since X T NX is symmetric and is skew symmetric. Similarly, only the skew symmetric part of XX T N contributes to tr(XX T N ). Thus,

DRN jX (X ; X )

using the Riemannian metric (3.1.4).

=

tr(XX T  N ])

=

hhN XX T ]X X ; X )ii

The second line follows since any component of

62

Gradient Algorithms for Principal Component Analysis

N XX T ] that



Chapter 3

lies in ker T(InIp )X does not contribute to the value of tr(XX T  N ])

since one may choose 

2 dom T(InIp)X and of course N XX T ] 2 Sk(n) which ensures

N XX T ]X 2 TX St(p n). This proves part i).



At critical points of RN the gradient gradRN (X ) is zero, N XX T ]X

orthogonal change of coordinates U

2 O(n)

0 X  = UX = B @

Ip 0(n;pp)

=

0. Consider the

1 CA

0 1   N N N  = UNU T = B @ 11 12 CA 

and

N21 N22

 where N11

2 Rp p, N12 = (N21 )T 2 Rp U N XX T ]X

= = =

since N XX T ]X

=

(n;p) and N 

22

2 R(n;p)

  T  0N  X (X ) ]X1 0  0 ; N B@ B 12 C  X =@ A 

(n;p) . Observe that



N21

0

0

N21

1 CA

0

 = 0 and thus N  X (X )T ] = 0. It follows that 0. Consequently, N21

N XX T ] = 0. Observe that gradRN (X ) = 0 () NX = X (X T NX ) and the columns of X form a basis for a p-eigenspace of N . 

Infinite time existence of solutions to (3.1.7) follows from the compact nature of St(p n).

By applying La’Salles invariance principle, it is easily verified that X (t) converges to a level set of

Lr .

RN for which gradRN (X ) = 0.

These sets are termed critical level sets and denoted

Lemma 3.1.4 Given N and RN as above. The critical level sets of RN in St(p n) are the sets

Lr = fX 2

St(p n) j RN (X ) =

q X i=1

rii

gradRN (X ) = 0g

which are indexed by vectors r = (r1 r2 : : : rq ), such that each ri is an integer between zero and ni inclusive (0

 ri  ni) and the sum Pqi=1 ri = p. For any X 2 Lr then the columns

x3-1

Continuous-Time Gradient Flow

63

of X span an eigenspace of N associated with the union of r i eigenvalues i for i = 1 : : : q .

Each Lr is an embedded submanifold of St(p n). The tangent space of Lr is given by

TX Lr = fX ; X j  2 Sk(n) 2 Sk(p) and N  XX T ]] = 0g: Proof The columns of a critical point

X2

St(p n) of

RN

span a p-eigenspace of

:1:9)

(3

N

and

X T NX are a subset of p eigenvalues of N . Index this subset by a vector r = (r1 r2 : : : rq ), such that each ri is an integer between zero and ni inclusive Pq r = p and each r represents the algebraic multiplicity of  as (0  ri  ni ), the sum i i i=1 i P q an eigenvalue of X T NX . It follows directly that RN (X ) = i=1 rii and thus the collection of sets Lr are the critical level sets of RN .

thus, the eigenvalues of

2 O(n) such that N = UT U and 1 1In 0 0 C C .. . 0 0 C CA 

Since N is symmetric there exists U

0 BB =B B@

1

0

q Inq

0

ni identity matrix. To show that the critical level sets Lr are embedded submanifolds of St(p n) it is convenient to consider the problem where N is replaced by directly. In this case the critical level sets Lr  of R on St(p n) are exactly Lr  = UT Lr . The map X 7! UX is a diffeomorphism of St(p n) into itself which preserves submanifold with Ini the ni

structure.

O(n1 ) O(n2 ) O(nq ) O(p) and observe that H is a compact Lie group. Given an arbitrary index r consider the map  : H Lr  ! Lr , Let H

=

0 BB U1 ((U1 U2 : : :Uq  V ) X ) = UXV T  where U = B B@ 0 0

Observe that U T U

=

0 ..

0 .

0

0

Uq

1 CC CC : A

In and consequently R( ((U V ) X )) = R (X ). Moreover,

gradR ( ((U V ) X ))

= =

 UXX T U T ]UXV T U T   XX T ]XV T = 0 

64

Gradient Algorithms for Principal Component Analysis

since it is assumed that gradR (X ) =   XX T ]X

=

Chapter 3

0. It follows that  is a group action of H

on Lr  . If X and Y are both elements of Lr  then X T X and Y T Y have the same eigenvalues and are orthogonally similar, i.e. there exists V

inspection one can find

U

2 O(p) such that V T X T XV = Y T Y . By

U  U2 : : :Uq ) such that UXV

= ( 1

=

Y

which shows that

 is a

transitive group action on Lr  . It follows that Lr  is itself a homogeneous space (with compact

Lie transformation group) and hence is an embedded submanifold of St(p n) (Helmke &

X 7! U X is a diffeomorphism of St(p n) into itself this shows that Lr = U  Lr  is also an embedded submanifold of St(p n). Moore 1994b, pg. 352). Since

Observe that any curve

Y (t)

U (t)XV T (t), Y (0)

=

=

X,

lying in

Lr

will satisfy

N Y (t)Y (t)T ]Y (t) = 0. Similarly, it is easily verified that any curve (passing through Y (0) = X 2 Lr ) satisfying this equality must lie in L r . Thus, the tangent space TX Lr is given by the equivalence classes of the derivatives at time t = 0, of curves Y (t) such that N Y (t)Y (t)T ]Y (t) = 0. Let U_ (0) =  2 Sk(n) and V_ (0) = 2 Sk(p) then 

d N Y (t)Y (t)T ]Y (t) t=o dt

=

=

N XX T ; XX T ]X + N X X T ; X X T ]X T T + N XX ]X + N XX ]X T N  XX ]] 

since N XX T ] = 0 (cf. part ii) Theorem 3.1.3). But this is just the definition (3.1.9) and the result is proved. Now at a critical point of

RN

the Hessian

HRN

is a well defined bilinear map from

TX St(p n) to the reals (Helmke & Moore 1994b, pg. 344). Let (1 X ; X 1) 2 TX St(p n) and (2X ; X 2) 2 TX St(p n) be arbitrary then

HRN (1X ; X 1 2X ; X 2)

= = =

;

D X ;X  D X ;X  RN (X ) D X ;X  tr(XX T  N ]2) tr(1 XX T ] N ]2): 1

1

1

1

2

2

 XX T ] is skew symmetric since XX T is symmetric and 1 is skew symmetric. Similarly, 1 XX T ] N ] is skew symmetric. Since 1 and 2 are arbitrary then HRN is degenerate in exactly those tangent direction (X ; X ) 2 TX St(p n) for which Observe that

1

x3-2

A Gradient Descent Algorithm

65

 XX T ] N ] = 0. But this corresponds exactly to (3.1.9) and one concludes that the Hessian HRN degenerates only on the tangent space of Lr. It is now possible to apply Lemma 5.5.2 to 

complete the proof of part iii). Part iv) of the theorem is verified by explicitly evaluating the derivative of (3.1.8). Remark 3.1.5 In the case 1 < p  n no exact solution to (3.1.7) is known, however, for X (t) a solution to (3.1.7) the solution for H (t) = X (t)X (t)T is known since

H_ (t)

= = =

H (0) = X0X0T

_ T + XX _T XX NXX T + XX T N ; 2XX T NXX T NH (t) + H (t)N ; 2H (t)NH (t)

(3.1.10)

and this equation is a Riccati differential equation (Yan, Helmke & Moore

2

1994).

3.2 A Gradient Descent Algorithm In this section a numerical algorithm for solving (3.1.7) is proposed. The algorithm is based on a gradient descent algorithm modified to ensure that each iteration lies in St(p n). Let X0

2

St(p n) and consider the recursive algorithm generated by

Xk+1 = e;k Xk XkT N ]Xk 

:2:1)

(3

for a sequence of positive real numbers k , termed time-steps. The algorithm generated by

(3.2.1) is referred to as the Rayleigh gradient algorithm. The Lie-bracket Xk XkT  N ] is skew T symmetric and consequently e;k Xk Xk N ] is orthogonal and Xk+1 2 St(p n). Observe also

d e; Xk XkT N ]X  = (I ; X X T )NX = gradR (X ) n k k k N k k d  =0 T the gradient of RN at Xk . Thus, e; Xk Xk N ]Xk represents a curve in St(p n), passing through Xk at time  = 0, and with first derivative equal to gradRN (X ). The linearization of

that

66

Gradient Algorithms for Principal Component Analysis

Chapter 3

Xk+1 ( ) = e; Xk XkT N ]Xk around  = 0 is Xk+1 ( ) = Xk +  gradRN (Xk ) + (higher order terms): The higher order terms modify the basic gradient descent algorithm on Rn p to ensure that the interpolation occurs along curves in St(p n). For suitably small time-steps k , it is clear that

(3.2.1) will closely approximate the gradient descent algorithm on Rn p .

To implement the Rayleigh gradient algorithm it is necessary to choose a time-step k , for each step of the recursion. A convenient criteria for determining suitable time-steps is to maximise the change in potential

4RN (Xk k ) = RN (Xk+1 ) ; RN (Xk ):

:2:2)

(3

It is possible to use line search techniques to determine the optimal time-step for each iteration of the algorithm. Completing a line search at each step of the iteration, however, is computationally expensive and often results in worse stability properties for the overall algorithm. Instead, a simple deterministic formulae for the time-step based on maximising a lower bound

4RlN (Xk   ) for (3.2.2) is provided. Lemma 3.2.1 For any Xk 2 St(p n) such that gradRN (Xk ) Xk+1 = e;k Xk XkT N ]Xk , where

6= 0, the recursive estimate

2 T

k = 2ppjjjjXNkXXk X NT ]jjN ]2jj 

:2:3)

(3

k k

satisfies 4RN (Xk  k ) = RN (Xk+1) ; RN (Xk ) > 0. Proof Denote

Xk+1 ( ) = e; Xk XkT N ]Xk for an arbitrary time-step  .

Direct calculations

show

4R0N (Xk   ) 4R00N (Xk   )

=

;2tr(XkT+1 ( )N XkXkT  N ]Xk+1( ))

=

+4tr(

XkT+1 ( )N XkXkT  N ]2Xk+1 ( )):

x3-2

A Gradient Descent Algorithm

67

Taylor’s formula for 4RN (Xk   ) gives

4RN (Xk  )

=

;2 tr(XkT N XkXkT  N ]Xk) Z1 

+4 2

0

tr(XkT+1 ( )N XkXkT  N ]2Xk+1 ( ))(1 ; s)ds

2 jjXkXkT  N ]jj2 Z1 ; 4 2 jjXk+1( )XkT+1 ( )jjjjN XkXkT  N ]2jj(1 ; s)ds 0 T 2 2p = 2 jjXkXk  N ]jj ; 2 pjjN XkXkT  N ]2jj =: 4RlN (Xk   ) RlN (Xk   ) yields a unique maximum occurring at  = k given by (3.2.3). Observe that if gradRN (Xk ) 6= 0 then jjXk XkT  N ]jj2 6= 0 and thus RlN (Xk   ) > 0. The result follows since 4RN (Xk   ) 4RlN (Xk   ) > 0. The quadratic nature of

Theorem 3.2.2 Given

N

=

NT

n n matrix and p be an integer with of N by 1 : : : n . For a given estimate

a real symmetric

 p  n. Denote the eigenvalues Xk 2 St(p n), let k be given by (3.2.3). The Rayleigh gradient algorithm 1

Xk+1 = e;k Xk XkT N ]Xk  has the following properties. i) The algorithm defines an iteration on St(p n). ii) Fixed points of the algorithm are critical points of

RN, X 2

St(p n) such that

XX T  N ] = 0. The columns of a fixed point of (3.2.1) form a basis for a p-dimensional eigenspace of N .



Xk , for k = 1 2 : : :, is a solution to the algorithm then the real sequence R N (Xk ) is strictly monotonic increasing unless there is some k 2 N with X k a fixed point of the

iii) If

algorithm. iv) Let Xk , for k

=

1 2 : : :, be a solution to the algorithm, then X k converges to a critical

level set of RN on St(p n).

v) All critical level sets of RN are unstable except the set for which the Rayleigh quotient is maximised. The columns of an element of the maximal critical level set form a basis for the maximal eigenspace of N .

68

Gradient Algorithms for Principal Component Analysis

Chapter 3

T Proof Part i) follows from the observation that e;k Xk Xk N ] is orthogonal. Part ii) is a direct

consequence of Lemma 3.2.1 (since 4RN (Xk  k ) = 0 if and only if Xk is a fixed point) and Theorem 3.1.3. Part iii) also follows directly from Lemma 3.2.1. To prove part iv) observe that since St(p n) is a compact set,

RN (Xk ) is a bounded monotonically increasing sequence which must converge. As a consequence Xk converges to some level set of RN such that for any X in this set 4RN (X (X )) = 0. Lemma 3.2.1 ensures that any X in this set is a fixed point of the recursion. If X is a fixed point of the recursion whose columns do not span the maximal p-dimensional

subspace of N then it is clear that there exists an orthogonal matrix U

arbitrarily small and such that RN (UX )

> RN (X ).

2 O(n), with jjU ; In jj

As a consequence, the initial condition

X0 = UX (jjX0 ; X jj small) will give rise to a sequence of matrices Xk that diverges from the level set containing X , Lemma 3.2.1. This proves the first statement of v) while the attractive nature of the remaining fixed points follows from La’Salle’s principle of invariance along with the Lyapunov function V (X ) =

;Pp  ; R (X ). N i=1 i

Remark 3.2.3 It is difficult to characterise the exact basin of attraction for the set of matrices whose columns span the maximal p-eigenspace of N . It is conjectured that the attractive basin

2

for this set is all of St(p n) except for other critical points. Remark 3.2.4 For a fixed initial condition X0

Define Hk

=

Xk XkT and observe

2

St(p n) let

Xk be the solution to (3.2.1).

Hk+1 = e;k Hk N ]Hk ek Hk N ]: Thus,

Hk

can be written as a recursion on the set of symmetric rank

fH 2 Rn n j H = H T  H 2 = H rank H = pg.

:2:4)

(3

p projection matrices

The algorithm generated in this manner is

known as the double-bracket algorithm (cf. Chapter 2), a discretization of the continuous-time double-bracket equation (3.1.10)

2

x3-3

Computational Considerations

69

3.3 Computational Considerations In this section two issues related to implementing (3.2.1) is a digital environment are discussed. Results in both the following subsections are aimed at reducing the computational cost asso0 ciated with estimating the matrix exponential e ;k Xk Xk N ], a transcendental n n matrix function. The result presented in Subsection 3.3.1 is also important in Section 3.4.

3.3.1

An Equivalent Formulation

To implement (3.2.1) on conventional computer architecture the main computational cost for 0 each step of the algorithm lies in computing the n n matrix exponential e ;k Xk Xk N ] . The following result provides an equivalent formulation of the algorithm which involves the related

p p transcendental matrix functions “cos” and “sinc”. Define the matrix function sinc : Rp p sinc(A) = Ip ; Observe that Asinc(A)

=

! Rp

p by the convergent infinite sum

A2 + A4 ; A6 + : 3!

5!

7!

sin(A) and thus, if A is invertible, sinc(A)

=

A;1 sin(A).

Define

the matrix function cos(A) by an analogous power series expansion. The matrix functions cos

and sinc are related by cos2 (A) = Ip ; A2 sinc2(A).

N T a real symmetric n n matrix with eigenvalues 1 : : : n , let k , for k = 1 2 : : :, be a sequence of real positive numbers. For X0 2 St(p n) an initial condition that is not a critical point of R N (X ), then, Lemma 3.3.1 Given N

Xk+1

= =

=

e;k Xk XkT N ]Xk   Xk cos( k Yk ) ; k XkT NXk sinc( k Yk ) + k NXk sinc( k Yk ) (3.3.1)

where the power expansions for cos( k Yk ) and sinc( k Yk ) are determined by the positive semi-definite matrix Yk2

2 Rp

p

Yk2 = XkT N 2 Xk ; (XkT NXk )2:

:3:2)

(3

70

Gradient Algorithms for Principal Component Analysis

Chapter 3

Remark 3.3.2 The matrix Yk need not be explicitly calculated as the power series expansions

2

of sinc and cos depend only on Yk2 .

T Proof The proof follows from a power series expansion of e;k Xk Xk N ] Xk ,

Xk+1 =

X 1

!

1 T l (; k Xk Xk  N ]) l ! l=0

Xk

:3:3)

(3

Simple algebraic manipulations lead to the relation,

Xk XkT  N ]2Xk = ;Xk Yk2

:3:4)

(3



where Yk2 is defined by (3.3.2). Pre-multiplying (3.3.4) by ;XkT provides an alternative formula for Yk2

Yk2 = XkT XkXkT  N ]T Xk XkT  N ]Xk 

which is positive semi-definite. Using (3.3.4) it is possible to rewrite (3.3.3) as a power series in (;Yk2 )

Xk+1 =

1  (; )2m X k

m=0

m)! Xk (;Yk

(2

2

m ) +

! ; k )2m+1 X (X T NX ) ; NX  (;Y 2)m  k k k k k (2m + 1)!

(

:3:5)

(3

where the first and second terms in the summation follow from the odd and the even power powers of Xk XkT  N ]lXk respectively. Rewriting this as two separate power series in (;Yk2 )

Xk+1

= =

1 (; )2m X k

m=0

(2

m ) ;

T k Xk (Xk NXk ) ; NXk

1 (; )2m X k 2 m (;Y )

m)! (;Yk m=0 (2m + 1)!  Xk cos( k Yk ) ; k Xk (XkT NXk ) ; NXk sinc( k Yk )

Xk

2

k

and the result follows by rearranging terms.

3.3.2

Pad´e Approximations of the Exponential

It is also of interest to consider approximate methods for calculating matrix exponentials. In particular, one is interested in methods that will not violate the constraint X k+1

2

St(p n). A

standard approximation used for calculating the exponential function is a Pad´e approximation

x3-4

Comparison with Classical Algorithms

71

of order (n m) where n > 0 and m > 0 are integers (Golub & Van Loan 1989, pg. 557). For example, a (1,1) Pad´e approximation of the exponential is

e;k Xk Xk0 N ] = (In + 2k Xk Xk0  N ]);1(In ; 2k Xk Xk0  N ]): A key observation is that when n = m and the exponent is skew symmetric the resulting Pad´e approximate is orthogonal. Thus,

Xk+1 = (In + 2k Xk Xk0  N ]);1(In ; 2k Xk Xk0  N ])Xk with initial condition X 0

2

:3:6)

(3

St(p n), defines an iteration on St(p n) which approximates the

Rayleigh gradient algorithm (3.2.1). Of course, in practise one would use an algorithm such as Gaussian elimination (Golub & Van Loan 1989, pg. 92) to solve the linear system equations

In + 2k Xk Xk0  N ])Xk+1 = (In ; 2k Xk Xk0  N ])Xk

(

for Xk+1 rather than computing the inverse explicitly. The algorithm defined by (3.3.6) can also be rewritten in a similar form to that obtained in Lemma 3.3.1. Consider the power series expansion

i 1 X

k k 0 ; 1 0 ; 2 XkXk  N ] : (In + Xk Xk  N ]) = 2 i=0

From here it is easily shown that

;

Xk+1 = ;Xk + 2Xk ; k (Xk(Xk0 NXk) ; NXk ) (Ip + 4k Yk2 );1  where Yk2

2 Rp

2

:3:7)

(3

p is given by (3.3.2).

3.4 Comparison with Classical Algorithms In this section the relationship between the Rayleigh gradient algorithm (3.2.1) and some classical algorithms for determining the maximal eigenspace of a symmetric matrix are investigated. A good discussion of the power method and the steepest ascent method for determining a single

72

Gradient Algorithms for Principal Component Analysis

Chapter 3

maximal eigenvalue of a symmetric matrix is given by Faddeev and Faddeeva (1963). Practical issues arising in implementing these algorithms along with direct generalizations to eigenspace methods are covered by Golub and Van Loan (1989).

3.4.1

The Power Method

In this subsection the algorithm (3.2.1) in the case where p = 1 is considered. It is shown that

for a certain choice of time-step k the algorithm (3.2.1) is the classical power method. Recall that St(1 n) = S n;1 the (n ; 1)-dimensional sphere in Rn . Theorem 3.4.1 Given N For xk

=

NT a real symmetric n n matrix with eigenvalues 1 : : : n.

2 S n;1 let k be given by

y2  2jjN xxT  N ]2jj

k = p 2

where yk

:4:1)

(3

2 R is given by 

yk = xTk N 2xk ; (xTk Nxk )2 : For x0

1 2

:4:2)

(3

2 St(1 n) = S n;1 an arbitrary initial condition then:

i) The formulae

xk+1 = e;k xk xTk N ]xk  defines a recursive algorithm on S n;1 . ii) Fixed points of the rank-1 Rayleigh gradient algorithm are the critical points of r N on

S n;1 , and are exactly the eigenvectors of N .

iii) If xk , for

k = 1 2 : : : is a solution to the Rayleigh gradient algorithm, then the real sequence rN (xk ) is strictly monotonic increasing, unless x k is an eigenvector of N .

2 S n;1 which is not an eigenvector of N , then yk 6= 0 and   sin( k yk ) T xk+1 = cos( k yk ) ; xk Nxk y xk + sin(y k yk ) Nxk :

iv) For a given xk

k

k

:4:3)

(3

x3-4

Comparison with Classical Algorithms

v) Let xk , for k

=

73

1 2 : : : be a solution to the rank-1 Rayleigh gradient algorithm, then x k

converges to an eigenvector of N .

vi) All eigenvectors of

N,

considered as fixed points of (3.4.3) are unstable, except the

eigenvector corresponding to the maximal eigenvalue 1, which is exponentially stable. Proof Parts i)-iii) follow directly from Theorem 3.2.2. To see part iv) observe that yk

jjgradrN (xk )jj and yk

=

0 if and only if gradrN (xk )

=

0 and xk is an eigenvector of

=

N.

The recursive iteration (3.4.3) now follows directly from Lemma 3.3.1, with the substitution sinc( k yk ) = sin(k ykkyk ) . Parts v) and vi) again follow directly from Theorem 3.2.2. Remark 3.4.2 Equation (3.4.3) involves only Nxk , xTk Nxk and (Nxk )T (Nxk ) vector computations. This structure is especially of interest when sparse or structured matrices

N

considered.

are

2

A geodesic (or great circle) on Sn;1 , passing through x at time t = 0, can be written

 (t) = cos(t)x ; sin(t)V where

V



= _ (0)

and evaluating

:4:4)

(3

is a unit vector orthogonal to x. Choosing V (xk )

 (t) at time t = k jjgradrN (xk )jj gives (3.4.3).

=

gradrN (xk ) jjgradrN (xk )jj , x = xk

Thus, (3.4.3) is a geodesic

interpolation of (3.1.8) the solution to the rank-1 Rayleigh gradient flow (3.1.7). For a symmetric n

n matrix N = N T the classical power method is computed using the

recursive formula (Golub & Van Loan 1989, pg. 351)

zk

=

xk+1

=

Nxk zk : jjzkjj

(3.4.5)

The renormalisation operation is necessary if the algorithm is to be numerically stable. The following lemma shows that for

N

positive semi-definite and a particular choice of k the

rank-1 Rayleigh gradient algorithm (3.4.3) is exactly the power method (3.4.5). Lemma 3.4.3 Given N

=

N T a positive semi-definite n n matrix. For xk 2 S n;1 (not an

74

Gradient Algorithms for Principal Component Analysis

Chapter 3

xk sp {x k , N x k } grad r N ( x k ) 0 x k+1 S

N xk

n-1

Figure 3.4.1: The geometric relationship between the power method iterate and the the iterate generated by (3.4.3).

eigenvector of N ) then jjgradrN (xk )jj  jjNxjj. Let k be given by

k = jjgradr (x )jj N k 1



jjgradrN (xk )jj where sin;1 jjNxk jj



sin;1

 jjgradrN (xk )jj  jjNxkjj

:4:6)

(3

2 (0 =2). Then

Nxk = cos( y ) ; xT Nx sin( k yk )  x + sin( k yk ) Nx  k k k k k k jjNxkjj yk yk where yk is given by (3.4.2). Proof Observe that jjgradrN (xk )jj2 = yk2

jjNxkjj2;(xTk Nx2)2 0 and thus jjgradrN (xk )jj  jjNxkjj. Consider the 2-dimensional linear subspace spfxk  Nxkg of Rn . The new estimate xk+1 generated using either (3.4.3) or (3.4.5) will lie in spfxk  Nxkg (cf. Figure 3.4.1). Setting Nxk = cos(y ) ; xT Nx sin(yk )  x + sin(yk ) Nx  k k k k k y jjNxkjj yk k for 

=

> 0 and observing that xk and Nxk are linearly independent then cos(yk ) ; xTk Nxk

and

sin(yk )

yk

=

sin(yk )

yk

=

0

jjNxkjj : 1

N > 0 is positive definite then a real solution to the first relation exists for which yk 2 (0 =2). The time-step value is now obtained by computing the smallest positive root Since

of the second relation.

x3-4

Comparison with Classical Algorithms

75

N > 0 positive definite in Lemma 3.4.3 ensures that (3.4.3) and (3.4.5) converge ‘generically’ to the same eigenvector. Conversely, if N is symmetric with eigenvalues 1 > > n, 0 > n and jnj > jij then the power method will converge to the eigenvector associated with n while (3.4.3) (equipped with time-step (3.4.1) ) will converge to the eigenvector associated with 1 . Nevertheless, one may still choose k using (3.4.6), with the Choosing

inverse sin operation chosen to lie in the interval sin;1

 jjgradrN (xk )jj  2 (=2 ) jjNxkjj

such that (3.4.3) and (3.4.5) are equivalent. In this case the geodesics corresponding to each iteration of (3.4.3) are describing great circles travelling almost from pole to pole of the sphere.

3.4.2

The Steepest Ascent Algorithm

The gradient ascent algorithm for the Rayleigh quotient rN is the recursion (Faddeev & Faddeeva 1963, pg. 430)

zk

=

xk+1

=

xk + sk gradrN (xk ) zk jjzkjj

(3.4.7)

where sk

> 0 is a real number termed the step-size. It is easily verified that the k + 1’th iterate of (3.4.7) will also lie on the 2-dimensional linear subspace spfxk  Nxk g of Rn . Indeed for xk not an eigenvector of N , (3.4.3) and (3.4.7) are equivalent when sk = y 2 1



1

k cos( k yk )



;1 :

:4:8)

(3

The optimal step-size for the steepest ascent algorithm (i.e. rN (xk+1 (sk

opt

for any sk

2 R) is (Faddeev & Faddeeva 1963, pg. 433)

))

rN (xk+1 (sk ))

2 sopt k = 2 rN (xk ) ; rN (gradrN (xk )) + frN (xk ) ; rN (gradrN (xk ))] + 4jjrN (xk )jjg

1 2

;1

:

:4:9)

(3

76

Gradient Algorithms for Principal Component Analysis

Chapter 3

It follows directly that the optimal time-step selection for (3.4.3) is given by

k = y

1

k

cos;1



1

1 + (sk

! yk

opt 2 2 )

:

Substituting directly into (3.4.3) and analytically computing the composition of cos and sin with cos;1 gives

xk+1 =

1

1 + (sk

 yk

opt 2 2 )

q

opt 1 ; sk xTk Nxk 2 + (sk )2yk2 opt



 q opt

xk + sk

2

s





y Nxk (3:4:10)

opt + ( k )2 k2

with sk given by (3.4.9). This recursion provides an optimal steepest ascent algorithm with opt

scaling factor

3.4.3

1 2 which converges to one as xk converges to an eigenvector of N . +(sopt k ) 2 yk

1

The Generalised Power Method

In both the power method and the steepest ascent algorithm the rescaling operation preserves the computational stability of the calculation. To generalise classical methods to the case where

p > 1, (i.e. Xk 2 St(p n)), one must decide on a procedure to renormalise new estimates to lie on St(p n). Thus, a generalised power method may be written abstractly Zk

=

NXk

Xk+1

=

rescale(Zk ):

(3.4.11)

Since the span of the columns of Xk (denoted sp(Xk )) is the quantity in which one is interested

the rescaling operation is usually computed by generating an orthonormal basis for sp(Zk ) (i.e.

using the Gram-Schmidt algorithm (Golub & Van Loan 1989, pg. 218))) Thus, Xk+1

=

Zk G,

Ip and where G 2 Rp p contains the coefficients which orthonormalise the columns of Zk . When Zk is full rank then G is invertible and the factorization Zk = Xk+1 G;1 can be computed as a QR factorisation of Zk (Golub & Van Loan 1989, pg. 211). The matrix G depends on the particular algorithm employed in computing an orthonormal basis for Zk . XkT+1 Xk+1

=

N > 0 is positive definite the power method will act to maximise the generalised Rayleigh quotient RN (3.1.5). Different choices of G in the rescaling operation, however, will affect the performance of the power method with respect to the relative change in RN at each When

x3-4

Comparison with Classical Algorithms

77

iteration. The optimal choice of G (for maximising the increase in Rayleigh quotient) for the

k’th step of (3.4.11) is given by a solution of the optimization problem max

fG 2 Rp p j GT ZkT Zk G = Ipg where

Zk

=

NXk .

The cost criterion tr

tr



GT ZkT NZk G 



GT ZkT NZk G

=

RZkT NZk (G) is

a Rayleigh

quotient while the constraint set is similar in structure to St(p n). Indeed, it appears that this optimization problem is qualitatively the same as explicitly solving for the principal components of N . One may still hope to obtain a similar result to Lemma 3.4.3 relating the generalised power method to the Rayleigh gradient algorithm (3.2.1). Unfortunately, this not the case except in non-generic cases.

N T a symmetric n n matrix. For any Xk 2 St(p n) let Yk be the unique symmetric, positive semi-definite square root of Yk2 = XkT N 2Xk ; (XkT NXk )2. There exists a matrix G 2 Rp p and scalar k > 0 such that Lemma 3.4.4 Given N

=

NXk G = e;k Xk XkT N ]Xk 

:4:12)

(3

if and only if one can solve sin2 ( k Yk )XkT NXk

=

cos( k Yk ) sin( k Yk )Yk 

:4:13)

(3

for k . Proof Assume that there exists a matrix G and a scalar k > 0 such that (3.4.12) holds. T Observe that rank(e;k Xk Xk N ]Xk ) = p and thus rank(NXk ) = p. Similarly G 2 Rp p is non-singular. Pre-multiplying (3.4.12) by GT XkT N and using the constraint relation GT XkT N 2 Xk G =

Ip gives

Ip = GT XkT Ne;k Xk XkT N ]Xk :

78

Gradient Algorithms for Principal Component Analysis

Chapter 3

Since one need only consider the case where G is invertible it follows that

G;1 = XkT ek Xk XkT N ] NXk: Lengthy matrix manipulations yield

XkT Xk XkT  N ]2lNXk = (;1)lYk2l XkT NXk 

for l = 0 1 : : :

and

XkT Xk XkT  N ]2l+1NXk = (;1)l Yk2l+2

for l = 0 1 : : :

:

T Expanding ek Xk Xk N ] as a power series in Yk2 and then grouping terms suitably (cf. Subsection 3.3.1) one obtains

G;1 = cos( k Yk )XkT NXk + sin( k Yk )Yk : T Using (3.3.1) for e;k Xk Xk N ]Xk then (3.4.12) becomes

NXk

= =

e;k XkXkT N ] Xk G;1  Xk cos( Yk ) ; k XkXkT  N ]Xksinc( k Yk )

cos( k Yk )XkT NXk + sin( k Yk )Yk

Pre-multiplying this by X kT yields

XkT NXk = cos2 ( k Yk )XkT NXkT + cos( k Yk ) sin( k Yk )Yk  and thus sin2 ( k Yk )XkT NXkT

=

cos( k Yk ) sin( k Yk )Yk :

This shows that (3.4.12) implies (3.4.13). If k solves (3.4.13) then defining XkT ek Xk XkT N ]NXk ensures (3.4.12) also holds which completes the proof.

G;1

=

Pp  y yT where fy  : : : y g is a set of orthonormal eigenvectors for Y , p k 1 i=1 i i i whose eigenvalues are denoted i 0 for i = 1 : : : p, then (3.4.13) becomes Writing Yk

=

p X i=1 Fixing

sin2 ( k i )yi yiT XkT NXk

i and pre-multiplying by yiT

=

p X i=1

i cos( k i ) sin( k i)yiyiT :

while also post-multiplying by y i gives the following

p



:

x3-5

Open Questions and Further Work

79

equations for k either

sin( k i ) = 0 or cot( k i ) =

1 T T y X NX

i

i

for i = 1 : : : p. It follows that either from the first relation k i

k

=

k yi 

m for some integer m or

from the second relation cot( k ) =

i + cot(i)yiT XkT NXk yi  i cot(i) + yiT XkT NXk yi

for each i = 1 : : : p. One can easily confirm from this that the p equations will fail to have a

consistent solution for arbitrary choices of Xk and N . Thus, generically the Rayleigh gradient

algorithm (3.2.1) does not correspond to the power generalised method (3.4.11) for any choice of rescaling operation or time-step selection.

3.5 Open Questions and Further Work There remains the issue of characterising the basin of attraction for the Rayleigh gradient algorithm. Simulations indicate that the only points which are not contained in this set are the non-minimal critical point of the generalised Rayleigh quotient, however, proving this is likely to be very difficult. Another area where further insight would be desirable is in the implementation of the (1,1) Pad´e approximate algorithm (3.3.6). It would seem likely that for the time-steps generated by (3.2.3) the (1,1) Pad´e approximate algorithm would inherit all the properties of the gradient descent algorithm. This appears to be the case in the simulation studies undertaken. In the earlier comparison between the Rayleigh gradient algorithm and classical numerical linear algebra algorithms no account was taken of the various inverse shift algorithms which tend to be the accepted computational methods. Incorporating the idea of origin shifts into dynamical systems solutions of such linear algebra problems is an important question that has not yet been satisfactorily understood. In Subsection 3.4.1 it was shown that the rank-1 Rayleigh gradient algorithm is closely related to the power method. Also related to the power method is an inverse shift algorithm

80

Gradient Algorithms for Principal Component Analysis

Chapter 3

N T 2 Rn n be a symmetric matrix and xk 2 Rn be some vector which is not an eigenvector of N then a single step of the Rayleigh iteration is known as the Rayleigh iteration. Let N

k zk+1 x k +1

=

= = =

xTk Nxk xTk ck ; (N ; k In ) 1 xk zk+1 : jjz jj k+1

The Rayleigh iteration converges cubically in a local neighbourhood of any eigenvector of N (Parlett 1974). By comparing the Rayleigh iteration to the power method and the Rayleigh gradient algorithm, one is lead to consider an ordinary differential equation of the form

x_ = (N ; rN (x)In);1 x ; xT (N ; rN (x)In);1 x x where rN (x)

=

xT Nx is the Rayleigh quotient. In the vicinity of an eigenvector of N this xx

differential equation becomes singular and displays finite time convergence to the eigenvector

N corresponding to the eigenvalue i which is the smallest eigenvalue of N such that i > rN (x0 ). The connection between singularly perturbed dynamical systems and shifted of

numerical linear algebra methods is of considerable interest. There is also a connection to the theory of differential/algebraic systems. For example the ordinary differential equation mentioned above is equivalent to the differential/algebraic system

x_

=

0

=

z ; (xT z)x x ; (N ; rN (x)In)z:

Chapter 4

Pole Placement for Symmetric Realisations A classical problem in systems theory is that of pole placement or eigenvalue assignment of linear systems via constant gain output feedback. This is clearly a difficult task and despite a number of important results, (cf. Byrnes (1989) for an excellent survey), a complete solution giving necessary and sufficient conditions for a solution to exist has not been developed. It has recently been shown that (strictly proper) linear systems with

mp > n can be assigned

arbitrary poles using real output feedback (Wang 1992). Here n denotes the McMillan degree of a system having m inputs and p outputs. Of course if mp < n for a given linear system then

generic pole assignment is impossible, even when complex feedback gain is allowed (Hermann & Martin 1977). The case mp = n remains unresolved, though a number of interesting results are available (Hermann & Martin 1977, Willems & Hesselink 1978, Brockett & Byrnes 1981). Present results do not apply to output feedback systems with symmetries or structured feedback systems. More generally, one is also interested in situations where an optimal state feedback gain is sought such that the closed loop response of the system is a best approximation of a desired response, though the exact response may be unobtainable. In such cases one would still hope to find a constructive method to compute the optimal feedback that achieves the best approximation. The problem appears to be too difficult to tackle directly, however, and algorithmic solutions are an attractive alternative.

81

82

Pole Placement for Symmetric Realisations

Chapter 4

The development given in this chapter is loosely related to a number of recent articles. In particular, Brockett (1989a) considers a least squares matching task, motivated by problems in computer vision algorithms, that is related to the system approximation problem, though his work does not include the effects of feedback. There is also an article by Chu (1992) in which dynamical system methods are developed for solving inverse singular value problems, a topic that is closely related to the pole placement question. The simultaneous multiple system assignment problem considered is a generalisation of the single system problem and is reminiscent of Chu’s (1991a) approach to simultaneous reduction of several real matrices. In this chapter I consider a structured class of systems (those with symmetric state space realisations) for which, to my knowledge, no previous pole placement results are available. The assumption of symmetry of the realisation, besides having a natural network theoretic interpretation, simplifies the geometric analysis considerably. It is shown that a symmetric state space realisation can be assigned arbitrary (real) poles via output feedback if and only if there are at least as many system inputs as states. This result is surprising since a naive counting argument (comparing the number of free variables 12 m(m + 1) of symmetric output feedback

gain to the number of poles n of a symmetric realization having m inputs and n states) would suggest that

1 2

m(m + 1) n is sufficient for pole placement. To investigate the problem further

gradient flows of least squares cost criteria (functions of the matrix entries of realisations) are derived on smooth manifolds of output feedback equivalent symmetric realisations. Limiting solutions to these flows occur at minima of the cost criteria and relate directly to finding optimal feedback gains for system assignment and pole placement problems. Cost criteria are proposed for solving the tasks of system assignment, pole placement, and simultaneous multiple system assignment. The material presented in this chapter is based on the articles (Mahony & Helmke 1993, Mahony et al. 1993). The theoretical material contained in Sections 4.1 to 4.4 along with the simulations in Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the numerical method proposed in Section 4.6 was presented at the 1993 Conference on Decision and Control (Mahony et al. 1993). Much of the material presented in this chapter was developed in conjunction with the results in the monograph (Helmke & Moore 1994b, Section 5.3), which focusses on general linear systems. The chapter is divided into seven sections. In Section 4.1 the specific problems considered

x4-1

Statement of the Problem

83

in the sequel are formulated and necessary conditions for generic pole placement and system assignment are given. Section 4.2 develops the geometry of the set of symmetric state space systems necessary for the development in later sections. In Section 4.3, a dynamical systems approach to computing systems assignment problems for the class of symmetric state space realizations is proposed while Section 4.4 applies the previous results to the pole placement and the simultaneous multiple system assignment problems. A number of numerical investigations are given in Section 4.4 which substantiate the theory presented in Sections 4.1 to 4.4. In Section 4.6 a numerical algorithm for computing feedback gains for the pole placement problem is presented. The chapter concludes with a discussion of open questions and future work in Section 4.7.

4.1 Statement of the Problem In this section a brief review of symmetric systems is presented before the precise formulations of the problems considered in the sequel are given and a pole placement result for symmetric state space realizations is proved. The reader is referred to Anderson and Vongpanitlerd (1973) for background material network theory. A symmetric transfer function is a proper rational matrix function G(s) 2 Rm m such that

G(s) = G(s)T : For any such transfer function there exists a minimal signature symmetric realisation (Anderson & Vongpanitlerd 1973, pg. 324)

x_ y

= =

Ax + Bu Cx

G(s) such that (AIpq )T = AIpq and C T = Ipq B , with Ipq = diag(Ip ;Iq ), a diagonal matrix with its first p diagonal entries 1 and the remaining diagonal entries -1. A signature symmetric realisation is a dynamical model of an electrical network constructed from p capacitors and q inductors and any number of resistors. of

84

Pole Placement for Symmetric Realisations

Chapter 4

Static linear symmetric output feedback is introduced to a state space model via a feedback law

u = Ky + v K = K T  leading to the “closed loop” system

x_ y

= =

A + BKC )x + Bv B T x: (

In particular, symmetric output feedback, where K

=

KT 2 Rm

:1:1)

(4

m , preserves the structure of

signature symmetric realisations and is the only output feedback transformation that has this property. A symmetric state space system (also symmetric realisation) is a linear dynamical system

x_ y

= =

Ax + Bu A = AT BT x

(4.1.2) (4.1.3)

x 2 Rn , u y 2 Rm, A 2 Rn n , B 2 Rn m . Without loss of generality assume that m  n, B is full rank and B T B = Im the m m identity matrix. Symmetric state with

space systems correspond to linear models of electrical RC-networks, constructed entirely of capacitors and resistors. The networks are characterised by the property that the CauchyMaslov1 index coincides with the McMillan degree. The matrix pair (A B ) 2 S (n) where S (n) = fX

fY 2 Rn

2 Rn

O(n m), n j X = X T g the set of symmetric n n matrices and O(n m) =

j X T X = Img is used to represent a linear system of the form (4.1.2) and (4.1.3). The set O(n m) is the Stiefel manifold (a smooth nm ; 12 m(m + 1) dimensional submanifold m

of Rn m ) of n

m matrices with orthonormal columns (cf. Lemma 3.1.1).

Two symmetric state space systems

A  B1) and (A2 B2 ) are

( 1

called output feedback

equivalent if

A  B2 ) = ( (A1 + B1 KB1T ) T  B1 )

( 2

:1:4)

(4

The Cauchy-Maslov index for a real rational transfer function z (s) is defined as the number of jumps of z (s) from ;1 to +1 less the number from +1 to ;1. Bitmead and Anderson (1977) generalise the Cauchy-Maslov index to real symmetric rational matrix transfer functions and show that it is equal to p ; q, the signature of Ipq (Bitmead & Anderson 1977, Corollary 3.3). 1

x4-1

Statement of the Problem

holds for

85

2 O(n) = fU 2 Rn n j U T U = In g the set of n n orthogonal matrices and

K 2 S (m) the set of symmetric m m matrices. Thus the system (A2 B2 ) is obtained from n (A1  B1 ) using an orthogonal change of basis 2 O(n) in the state space R and a symmetric feedback transformation K 2 S (m). It is easily verified that output feedback equivalence is an equivalence relation on the set of symmetric state space systems. Consider the following problem for the class of symmetric state space systems. Problem A Given

(

A B ) 2 S (n) O(n m) a symmetric state space system let (F G) 2

S (n) O(n m) be a symmetric state space system which possesses the desired system structure. Consider the potential  (

A B )

: :=

O(n m) ! R jjA ; F jj2 + 2jjB ; Gjj2 Rn n

where jjX jj2 = tr(X T X ) is the Frobenius matrix norm. Find a symmetric state space system

A  Bmin) which minimises  over the set of all output feedback equivalent systems to (A B ). Equivalently, find a pair of matrices ( min  Kmin ) 2 O(n) S (m) such that ( min

(  K ) := jj (A + BKB T ) T ; F jj2 + 2jj B ; Gjj2 is minimised over O(n)

2

S (m).

Such a formulation is particularly of interest when structural properties of the desired realisations are specified. For example, one may wish to choose the “target system”

F G)

(

with certain structural zeros. If an exact solution to the system assignment problem exists (i.e.

A  Bmin) = 0) it is easily seen that (Amin  Bmin) will have the same structural zeros as (F G). For general linear systems it is known that the system assignment problem (for general ( min

feedback) is generically solvable only if there are as many inputs and as many outputs as states. It is not surprising that this is the case for symmetric systems also. Lemma 4.1.1 Let n and m be integers, n

matrix pairs (A B ) 2 S (n) a) If

m

=

n

O(n m).

m, and let (F G) 2 S (n) O(n m). Consider

then for any matrix pair

A B) of the above form,

(

there exist matrices

86

Pole Placement for Symmetric Realisations

Chapter 4

2 O(n) and K 2 S (m) such that A + BKBT ) T = F

(

B = G:

m < n then the set of (A B) 2 S (n) O(n m) for which an exact solution to the system assignment problem exists is measure zero in S (n) O(n m). (I.e. for almost all systems (A B ) 2 S (n) O(n m) no exact solution to the system assignment problem

b) If

exists.)

m = n then O(n m) = O(n) and BT = B ;1 . choose (  K ) = (GB T  GT FG ; B T AB ). Thus,

Proof If

For any

A B) 2 S (n) O(n)

(

A + BKB T ) T = GBT ABGT + GB T B(GT FG ; B T AB)B T BGT = F

(

and B

=

GBT B = G.

To prove part b) observe that since output feedback equivalence is an equivalence relation the set of systems for which the system assignment problem is solvable are exactly those systems which are output feedback equivalent to (F G). Consider the set

F (F G) = f( (F + BGBT ) T  G) j (  K ) 2 O(n) S (m)g  S (n) O(n m): It is shown in Section 4.2 (Lemma 4.2.1) that

F (F G) is a smooth submanifold of S (n)

O(n m). But F (F G) is the image of O(n) S (m) via the continuous map (  K ) 7! ( (F + BGB T ) T  G) and necessarily has dimension at most dim O(n) S (m) = 12 n(n ; 1)+ 12 m(m + 1). The dimension of S (n) O(n m) however is 12 n(n + 1)+(nm ; 12 m(m + 1)) (Helmke & Moore 1994b, pg. 24). Thus, dim S (n)

O(n m) ; dim O(n) S (m) = (n ; m)(m + 1)

which is strictly positive for 0 of S (n)

 m < n. Thus, for m < n the set F (F G) is a submanifold

O(n m) of non-zero co-dimension and therefore has measure zero.

A similar task to Problem A is that of pole placement for symmetric state space realizations. The pole placement task for symmetric systems is; given an arbitrary set of numbers

s1

x4-1 : : : sn

Statement of the Problem

87

m m symmetric transfer function G(s) = GT (s) with a symmetric realisation, find a symmetric matrix K 2 S (m) such that the poles of GK (s) = (Im ; G(s)K );1 G(s) are exactly s1  : : : sn . Rather than tackle this problem directly, consider in R and an initial

the following variant of the problem. Problem B Given (A B )

2 S (n) O(n m) a symmetric state space system let F 2 S (n)

be a symmetric matrix. Define

A B ) (  K ) (

:= :=

jjA ; F jj2  jj (A ; BKB T ) T ; F jj2 :

Find a symmetric state space system (Amin  Bmin) which minimises  over the set of all output

feedback equivalent systems to (A B ). Respectively, find a pair of matrices

O(n) S (m) which minimises over O(n) S (m).

( min

 Kmin) 2 2

Problem B minimises a cost criterion that assigns the full eigenstructure of the closed loop system. Two symmetric matrices have the same eigenstructure (up to orthogonal similarity transformation) if and only if they have the same eigenvalues (since any symmetric matrix may be diagonalised via an orthogonal similarity transformation). Thus, Problem B is equivalent to solving the pole placement problem for symmetric systems (assigning the eigenvalues of the closed loop system). The advantage of considering Problem B rather than a standard formulation of the pole placement task lies in the smooth nature of the optimization problem obtained. It is of interest to consider generic conditions on symmetric state space systems for the existence of an exact solution to Problem B (i.e. the existence of

( min  Kmin) = 0).

( min

 Kmin) such that

This is exactly the classical pole placement question about which much

is known for general linear systems (Byrnes 1989, Wang 1992). The following result answers (at least in part) this question for symmetric state space systems. It is interesting to note that the necessary conditions for “generic” pole placement for symmetric state space systems are much stronger than those for general linear systems. Lemma 4.1.2 Let n and m be integers, n m, and let F Consider matrix pairs (A B ) 2 S (n)

O(n m).

2 S (n) be a real symmetric matrix.

88

Pole Placement for Symmetric Realisations

n then for any matrix pair (A B ) of the above form, m m such that

2 O(n) and K = K T 2 R

a) If

m

Chapter 4

=

there exist matrices

A + BKBT ) T = F:

:1:5)

(

(4

n then there exists an open set of matrix pairs (A B) 2 S (n) O(n m) of the above form such that eigenstructure assignment (to the matrix F ) is impossible.

b) If m
jjF jj2 and no solution to the eigenstructure assignment problem exists for the system Moreover, the map (A B )

A 0  B 0 ).

(

7! jj(A ; BBT ABBT )jj2 is continuous and it follows that there

is an open neighbourhood of systems around (A0  B 0) for which the eigenstructure assignment task cannot be solved.

Remark 4.1.3 It follows directly from the proof of Lemma 4.1.2 that eigenstructure assignment of a symmetric state space system (A B ) 2 S (n)

F 2 S (n) is possible only if

O(n m) to an arbitrary closed loop matrix

jjF jj2 jjA ; BBT ABBT jj2: 2 Remark 4.1.4 One may weaken the hypothesis of Lemma 4.1.2 considerably to deal with

A B ) 2 S (n) Rn and for which m may be greater matrix pairs

(

m , for which B is not constrained to satisfy B T B

=

Im

than n. The analogous statement is that eigenstructure

assignment is generically possible if and only if rankB

n. The proof is similar to that given

above observing that the projection operator BB T (for

BT B = Im) is related to the general

B (BT B)y BT , where y represents the pseudo-inverse of a matrix. example, the feedback matrix yielding exact system assignment for rankB n is

projection operator

For

K = (BT B )y BT (F ; A)B(BT B )y :

2 A further problem considered is that of simultaneous multiple system assignment. This is a difficult problem about which very little is presently known. The approach taken is to consider a generalisation of the cost criterion for a single system.

90

Pole Placement for Symmetric Realisations

Problem C For any integer N

Chapter 4

2 N let (A1  B1) : : : (AN  BN ) and (F1  G1) : : : (FN  GN )

be two sets of N symmetric state space systems. Define

N (  K ) :=

N X i=1

jj (Ai + BiKBiT ) T

Find a pair of matrices ( min  Kmin) 2 O(n)

2

; Fijj

2

+2

N X i=1

jj Bi ; Gijj2:

S (m) which minimises N over O(n) S (m).

4.2 Geometry of Output Feedback Orbits It is necessary to briefly review the Riemannian geometry of the spaces on which the optimization problems stated in Section 4.1 are posed. The reader is referred to Helgason (1978) and the development in Chapter 5 for technical details on Lie-groups and homogeneous spaces and, Helmke and Moore (1994b) for a development of dynamical systems methods for optimization along with applications in linear systems theory. The set O(n)

S (m) forms a Lie group under the group operation ( 1  K1) ( 2  K2) =

 K1 + K2 ). It is known as the output feedback group for symmetric state space systems. The tangent spaces of O(n) S (m) are

( 1 2

T( K)(O(n) S (m)) = f(  ) j  2 Sk(n) 2 S (m)g where

Sk(n) = f 2 Rn

n

j  = ;T g the set of n n skew symmetric matrices.

Euclidean inner product on Rn n

The

Rn m is given by

h(A B) (X Y )i = tr(AT X ) + tr(BT Y ):

:2:1)

(4

By restriction, this induces a non-degenerate inner product on the tangent space T(In 0)(O(n)

S (m)) = Sk(n) S (m).

The Riemannian metric considered on

O(n) S (m) is the right

invariant group metric

h(1  1) (2  2 )i = 2tr(T1 2 ) + 2tr( 1T 2 ):

x4-2

Geometry of Output Feedback Orbits

91

The right invariant group metric is generated by the induced inner product on T(In 0)(O(n)

S (m)), mapped to each tangent space by the linearization of the diffeomorphism ( k) 7! (  k + K ) for (  K ) 2 (O(n) S (m)). It is readily verified that this defines a Riemannian metric which corresponds, up to a scaling factor, to the induced Riemannian metric on O(n) S (m) considered as a submanifold of Rn n Rn m . The scaling factor 2 serves to simplify the algebraic expressions obtained in the sequel. Let (A B )

2 S (n) O(n m) be a symmetric state space system. The symmetric output

feedback orbit of (A B ) is the set

F (A B) = f( (A + BT KB) T  B) j 2 O(n) K 2 S (m)g of all symmetric realisations that are output feedback equivalent to assumption on the controllability of the matrix pair (A B ) is made.

Lemma 4.2.1 The symmetric output feedback orbit

(

A B ).

:2:2)

(4

Observe that no

F (A B) is a smooth submanifold of

S (n) O(n m) with tangent space at a point (A B) given by

T(AB)F (A B) = f( A] + B BT  B) j  2 Sk(n) 2 S (m)g:

:2:3)

(4

Proof The set F (A B ) is an orbit of the smooth semi-algebraic group action

 ((  K ) (A B)) It follows that

: :=

O(n) S (m)) (S (n) O(n m)) ! (S (n) O(n m)) T T ( (A + B KB )  B ): (4.2.4) (

F (A B) is a smooth submanifold of S (n)

Rn m (cf. Proposition 5.2.2 or

Gibson (1979, Appendix B)). For an arbitrary matrix pair (A B ) the map

f (  K ) := ( (A + BKB T ) T  B) S (m) onto F (A B) (Gibson 1979, pg. space of F (A B ) at (A B ) is the range of the linearization of f at (In  0) is a smooth submersion of O(n)

74). The tangent

92

Pole Placement for Symmetric Realisations

T(In 0)f

Chapter 4

T(In0)(O(n) S (m)) ! T(AB )F (A B) ( ) 7! ( A] + B B T  B ) ( ) 2 Sk(n) S (m):

:

The space F (A B ) is also a Riemannian manifold when equipped with the so-called normal

A B ) 2 S (n) O(n m) a symmetric state space system and consider the map f (  K ) := ( (A + BKB T ) T  B ). The tangent map T(In0) f induces metric (cf. Section 5.3). Fix

(

a decomposition

T(In 0)(O(n) S (m)) = ker T(In0) f  dom T(In 0)f where ker T(In 0)f

=

f(? ?) 2 Sk(n) S (m) j A ?] = B ? BT  ?B = 0g

is the kernel of T(In 0)f and dom T(In 0)f

=

f(? ?) 2 Sk(n) S (m) j tr((?)T ? ) = 0 tr(( ?)T ? ) = 0 for all (?  ? ) 2 ker T(In0) f g

is the orthogonal complement of the kernel with respect to the Euclidean inner product (4.2.1). Formally, the normal Riemannian metric on F (A B ) is the inner product (4.2.1) on

T(In 0)(O(n) S (m)) restricted to dom T(In0) f and induced on T(AB)F (A B ) via the isomorphism T(In 0)f ? : dom T(In 0)f ! T(AB )F (A B ), the restriction of T(In 0)f to dom T(In 0)f . Thus, for two tangent vectors (i A] + B i B T  iB ) 2 T(AB )F (A B ), i = 1 2, the normal Riemannian metric is computed as

hh(1 A] + B 1 BT  1B) (2 A] + B 2 BT  2B)ii = 2tr((?1 )T ?2 ) + 2tr(( 1? )T ?2 ):  (?i  ?i ) 2 ker T(In0) f  dom T(In0)f for i = 1 2. It is readily verified that this construction defines a Riemannian metric on F (A B ).

Here





(i i ) = ((i)? ( i )? )

x4-3

Least Squares System Assignment

93

4.3 Least Squares System Assignment In this section Problem A is considered, i.e. the question of computing a symmetric state space linear system in a given orbit F (A B ) that most closely approximates a given “target” system in a least squares sense. A brief analysis of the cost functions



and

is given leading

to existence results for global minima. Gradient flows of the cost functions are derived and existence results for their solutions are given. Lemma 4.3.1 Let (F G) (A B ) 2 S (n) a) The function : O(n)

O(n m) be symmetric state space linear systems.

S (m) ! R,

(  K ) := jj (A + BKB T ) T ; F jj2 + 2jj B ; Gjj2 has compact sublevel sets. I.e. the sets f(  K ) any 0, are compact subsets of O(n)

2 O(n) S (m) j (  K )  g for

S (m).

b) The function  : F (A B ) ! R,

A B ) := jjA ; F jj2 + 2jjB ; Gjj2

(

has compact sublevel sets.

Proof The triangle inequality yields both

jjK jj2 = jjBKBT jj2  2(jjA + BKBT jj2 + jjAjj2) and

jjA + BKBT jj2 = jj (A + BKBT ) T jj2  2(jj (A + BKBT ) T ; F jj2 + jjF jj2): Thus, for (  K ) 2 O(n)

jjK jj2  

S (m) one has

2(2(jj (A + BKB T ) T

; F jj2 + jjF jj2) + jjAjj2) 4(jj (A + BKB T ) T ; F jj2 + 2jj B ; Gjj2) + 4jjF jj2 + 2jjAjj2

94

Pole Placement for Symmetric Realisations =

Chapter 4

4 (  K ) + 4jjF jj2 + 2jjAjj2

where a factor of 8jj B ; Gjj2 is added to the middle line to give the correct terms for the cost

. Thus, for (  K ) 2 O(n) S (m), satisfying (  K )  , one has

jj(  K )jj2

=

jj jj2 + jjK jj2

 tr( T ) + 4 (  K ) + 4jjF jj2 + 2jjAjj2  n + 4 + 4jjF jj2 + 2jjAjj2 and the sublevel sets of

are bounded.

Since

is continuous the sublevel sets are closed

and compactness follows directly (Munkres 1975, pg. 174). Part b) follows by observing that

=   f , where f (  K ) = ( (A + BKBT ) T  B ) for given (A B). Thus, the sublevel sets of  are exactly the images of the corresponding sublevel sets of via the continuous map f . Since continuous images of compact sets are themselves compact (Munkres 1975, pg. 167) the proof is complete.

Corollary 4.3.2 Let

(

F G) (A B) 2 S (n) O(n m) be symmetric state space linear sys-

tems. a) There exists a global minimum ( min  Kmin) 2 O(n)

S (m) of ,

( min Kmin) = inff (  K ) j (  K ) 2 O(n) S (m)g: b) There exists a global minimum (Amin Bmin ) 2 F (A B ) of ,

A  Bmin ) = inff(A B) j (A B) 2 F (A B)g:

( min

c) The submanifold F (A B )  S (n)

O(n m) is closed in S (n)

Proof To prove part a), choose 0 such that the sublevel set J is non empty. Then

f(  K ) j (  K )  g

jJ : J ! 0 ] is a continuous map2 from a compact space into the

Let f : M ! N be a map between two sets M and N . Let U the restriction of f to the set U . 2

=

Rn m .



M be a subset of M , then f

j

U :U

!

N is

x4-3

Least Squares System Assignment

95

reals and the minimum value theorem (Munkres 1975, pg. 175) ensures the existence of ( min

 Kmin). The proof of part b) is analogous to that for part a).

F (A B) is not closed. Choose a boundary point (F G) 2 F (A B);F (A B) in the closure3 of F (A B). By part b) there exists a minimum (Amin Bmin) 2 F (A B) such that To prove c) Assume that

A  Bmin)

( min

=

inf f(A B ) j (A B ) 2 F (A B )g:

=

0

since (F G) is in the closure of F (A B ). But this implies jjAmin;F jj2 +2jjBmin ;Gjj2

=

0 and

consequently (Amin Bmin ) = (F G). This contradicts the assumption that (F G) 62 F (A B ).

Having determined the existence of a solution to the system assignment problem one may consider the problem of computing the global minima of the cost functions  and . Theorem 4.3.3 Let (A B ) (F G) 2 S (n) 

: F (A B ) ! R

(

O(n m) be symmetric state space systems. Let

A B) := jjA ; F jj2 + 2jjB ; Gjj2

:3:1)

(4

measure the Euclidean distance between two symmetric realisations. Then a) The gradient of  with respect to the normal metric is

0 1 T ; GB T )] + BB T (A ; F )BB T ;  A ( A F ] + BG B CA : grad(A B ) = @ (A F ] + BGT ; GB T )B

:3:2)

(4

b) The critical points of  are characterised by

A F ]

=

0

=



GBT ; BGT  B T (A ; F )B:

:3:3)

(4

Let U  M be a subset of a topological space M . The closure of U , denoted U , is the intersection of all closed sets in the topology which contain the set U . 3

96

Pole Placement for Symmetric Realisations

Chapter 4

_ B _ ) = ;grad(A B ), c) Solutions of the gradient flow (A

A_ B_

= =

A (A F ] + BGT ; GBT )] ; BBT (A ; F )BBT ;(A F ] + BGT ; GBT )B 

:3:4)

(4

exist for all time t 0 and remain in F (A B ). d) Any solution to (4.3.4) converges as t ! 1 to a connected set of matrix pairs (A B ) 2

F (A B) which satisfy (4.3.3) and lie in a single level set of .

Proof The gradient is computed using the identities4

Dj(AB) ( ) = hhgrad(A B ) ii = ( A] + B BT  B) 2 T(AB )F (A B) grad(A B ) 2 T(AB )F (A B )

i ii]  ]

Computing the Fr´echet derivative of  in direction ( A] + B BT  B ) gives

Dj(AB ) ( A] + B BT  B ) T T T = 2tr((A ; F ) ( A] + B B )) + 4tr((B ; G) B ) T T (4.3.5) = 2tr((;A ; F A] + 2B (B ; G) )) + 2tr(B (A ; F )B )  T T T T T T = hh (A F ] + BG ; GB ) A] + BB (A ; F )BB  (A F ] + BG ; GB )B  T ( A] + B B  B )ii: When deriving the above relations it is useful to recall that (

 A] + B BT  B ) = (? A] + B ? BT  ?B)

 (? ?), (cf. the discussion of normal metrics at the end of Section 4.2). Observing that (A F ] + BGT ; GB T ) 2 Sk(n) while B T (A ; F )B 2 S (m) where





( ) = (? ? )

completes the proof of part a). 4

D (A B) () is the Fr´echet derivative of  in direction  (Helmke & Moore 1994b, pg. 334). j

x4-3

Least Squares System Assignment

97

To prove b), observe that the first identity ensures that the Fr´echet derivative at a critical point (grad = 0) is zero in all tangent directions. Setting (4.3.5) to zero, yields 2tr((F A] + GB T for arbitrary ( ) 2 Sk(n)

; BGT )) + 2tr(BT (A ; F )B ) = 0

S (m) and (A B) a critical point of .

For given initial conditions (A(0) B (0)) solutions of (4.3.4) will remain in the sublevel

set

f(A B) 2 F (A B) j (A B)  (A(0) B(0))g.

Since this set is compact, Lemma

4.3.1, infinite time existence of the solution follows. This proves c) while d) follows from an application of La’Salle’s invariance principle. Remark 4.3.4 Let N (s)D(s);1 be a coprime factorisation of the symmetric transfer function

G(s) = BT (sIn ; A);1 B. Then the coefficients of the polynomial matrix N (s) 2 Rs]m m are invariants of the flow (4.3.4). In particular the zeros of the system (A B BT ) are invariant under the flow (4.3.4) (Kailath 1980, Exercise 6.5-4, pg. 464). 2 Remark 4.3.5 It would be desirable to interpret the equilibrium condition (4.3.3) in terms of the properties of the linear system (A B ). Unfortunately, this appears to be a difficult task.

2

The above theorem provides a method of investigating best approximations to a given “target system” within a symmetric output feedback orbit. However, it does not provide

t  K (t)). To obtain group O(n) S (m) is proposed.

any explicit information on the changing feedback transformations such information a related flow on the output feedback

( ( )

The following result generalises work by Brockett (1989a) on matching problems. Brockett considers similar cost functions but only allows state space transformations rather than output feedback transformations. Theorem 4.3.6 Let (A B ) (F G) 2 S (n)

O(n m) be symmetric state space linear systems.

Define

(  K ) then:

: :=

O(n) S (m) ! R jj (A + BKB T ) T ; F jj2 + 2jj B ; Gjj2

(4.3.6)

98

Pole Placement for Symmetric Realisations

Chapter 4

a) The gradient of with respect to the right invariant group metric is

0 1 T ) T  F ] + ( BGT ; GB T T )  ( A + BKB B CA : grad (  K ) = @ T T T B (A + BKB ; F )B

:3:7)

(4

b) The critical points of are characterised by

F (A + BKB T ) T ] K



= =

BGT ; GB T T ) BT ( T F ; A)B: (

:3:8)

(4

_ K _ ) = ;grad (  K ) c) Solutions of the gradient flow ( _

=

K_

=

; (A + BKBT ) T  F ] ; ( BGT ; GBT T )  ;BT (A + BKBT ; T F )B

exist for all time

t 0 and remain in a bounded subset of O(n) S (m).

:3:9)

(4

Moreover,

t ! 1 any solution of (4.3.9) converges to a connected subset of critical points in O(n) S (m) which are contained in a single level set of .

as

d) If ( (t) K (t)) is a solution to (4.3.9) then (

A(t) B(t)) = ( (t)(A + BK (t)BT ) (t)T  (t)T B)

is a solution of (4.3.4).

Proof The computation of the gradient is analogous to that undertaken in the proof of Theorem 4.3.3 while the characterisation of the critical points follows directly from setting (4.3.7) to zero. The proof of c) is also analogous to the proof of parts c) and d) in Theorem 4.3.3. The linearization of f (  K ) := ( (A + BKB T ) T  T B ) is readily computed to be

T( K )f (  ) = ( A] + B BT  B) where A = (A + BKBT ) T and B

B. The image of ( _  K_ ) via this linearization is

=

x4-3

Least Squares System Assignment

T( K)f ( _  K_ )

Consequently

=

99

A A F ] + (BGT ; GB)] + BBT (A ; F )BBT   ; (A F ] + BGT ; GBT )B :



A B_ ) = ;grad(A B).

( _

Classical O.D.E. uniqueness results complete the

proof. The following lemma provides an alternative approach (to that given in Lemma 4.3.1) for determining a bound on the feedback gain K (t). The method of proof for the following result is of interest and the result obtained is somewhat tighter than that obtained in Lemma 4.3.1. Lemma 4.3.7 Let ( (t) K (t)) be a solution of (4.3.9). Then the bound

jjK (t) ; K (0)jj2  12 (T (0) K (0)) holds for all time. Proof Integrating out (4.3.9) for initial conditions ( 0 K0) and then taking norms gives the integral bound

jj (t) ; 0 jj2 + jjK (t) ; K0jj2

Zt jj grad ( ( ) K ( ))d jj2 Z t0  jjgrad ( ( ) K ( ))jj2d 0 Z 1 t hgrad ( ( ) K ( )) grad ( ( ) K ( ))id: = 2 =

0

Also

d ( (t) K (t)) = ;hgrad ( (t) K (t)) grad ( (t) K (t))i dt and thus, integrating between 0 and t and recalling that 0  ( (t) K (t))  ( (0) K (0)) for all t 0 one obtains

R thgrad ( ( ) K ( )) grad ( ( ) K ( ))id 0

=

( (0) K (0)) ; ( (t) K (t))  ( (0) K (0))

100

Pole Placement for Symmetric Realisations

Chapter 4

and consequently

jj (t) ; 0 jj2 + jjK (t) ; K0jj2  12 ( (0) K (0)): The result follows directly. It is advantageous to consider a closely related flow that evolves only on than the full output feedback group

O(n) S (m).

O(n) rather

The following development uses similar

techniques to those proposed by Chu (1992). Let (A B ) 2 S (n)

O(n m) be a given symmetric state space system and define

L = fBKBT j K 2 S (m)g to be the linear subspace of S (n) corresponding to the range of the linear map K

7! BKBT .

Define L? to be the orthogonal complement of L with respect to the Euclidean inner product

on Rn n . The projection operators

P : S (n) ! L

P(X ) := BB T XBB T

:3:10)

(4

and

Q : S (n) ! L?  Q(X ) := (I; P)(X ) = X ; BB T XBB T are well defined. Here Irepresents the identity operator and BT B

=

:3:11)

(4

Im by assumption. The

tangent space of O(n) at a point is T O(n) = f j  2 Sk(n)g with Riemannian metric

h1  2 i = 2tr(1T 2 ), corresponding to the right invariant group metric on O(n). Theorem 4.3.8 Let

A B) (F G) 2 S (n) O(n m) be

(

symmetric state space systems.

Define

  then,

: :=

O(n) ! R jjQ(A ; T F )jj2 + 2jj B ; Gjj2

(4.3.12)

x4-3

Least Squares System Assignment

101

a) The gradient of  with respect to the right invariant group metric is grad ( ) =  Q(A ; T F ) T  F ] + ( BGT

; GBT T ) :

b) The critical points 2 O(n) of  are characterised by

F Q(A ; T F ) T ] = ( BGT ; GB T T ):



and correspond exactly to the orthogonal matrix component of the critical points (4.3.8) of . c) The negative gradient flow minimising  is

F Q(A ; T F ) T ] ; ( BGT ; GB T T ) 

_ =

Solutions to this flow exist for all time t

(0) = 0

:

:3:13)

(4

0 and converge as t ! 1 to a connected set

of critical points contained in a level set of .

Proof The gradient and the critical point characterisation are proved as for Theorem 4.3.3. The equivalence of the critical points is easily seen by solving (4.3.8) for of

K.

independently

Part c) follows from the observation that (4.3.13) is a gradient flow on a compact

manifold. Fixing

constant in the second line of (4.3.9) yields a linear differential equation in

with solution

K (t) = e;t (K (0) + B T (A ; T F )B) ; B T (A ; T F )B: It follows that K (t) ! ;B T (A ; T F )B as t ! 1. Observe that

( )

= = =

jjQ(A ; T F )jj2 + 2jj B ; Gjj2 jj (A + B(BT (A ; T F )B)BT ) T ; F jj2 + 2jj B ; Gjj2 (  ;B T (A ; T F )B):

K

102

Pole Placement for Symmetric Realisations

Recall also that for exact system assignment it has been shown that

A)B , Lemma 4.1.2.

Chapter 4

K

=

B T ( F T ;

Thus, it is reasonable to consider solutions (t) of (4.3.13) along with

continuously changing feedback gain

K (t) = B T ( (t)T F (t) ; A)B

:3:14)

(4

as an approach to solving least squares system assignment problems. A numerical scheme based on this approach is presented in Section 4.6.

4.4 Least Squares Pole placement and Simultaneous System Assignment Having developed the necessary tools it is a simple matter to derive gradient flow solutions to Problem B and Problem C described in Section 4.1. Corollary 4.4.1 Pole Placement Let (A B )

system and let F

2 S (n) O(n m) be a symmetric state space

2 S (n) be a given symmetric matrix. Define 

: F (A B ) ! R

: O(n)

A B) 7! jjA ; F jj2 S (m) ! R (  K ) 7! jj (A + BKB T ) T ; F jj2  (

then a) The gradient of  and with respect to the normal and the right invariant group metric respectively are

0 1 T (A ; F )BB T ;  A  A F ]] + BB B CA  grad(A B ) = @ A F ]B

:4:1)

(4



and

0 B grad (  K ) = @

1

A + BKBT ) T  F ] C A: BT (A + BKB T ; T F )B  (

:4:2)

(4

x4-4

Least Squares Pole placement and Simultaneous System Assignment

103

b) The critical points of  and are characterised by

A F ] BT (A ; F )B 

=

0

=

0

(4.4.3)

and  (

A + BKBT ) T  F ] BT ( T F ; A)B

=

0

=

K

(4.4.4)

respectively. _ B _ ) = ;grad(A B ) c) Solutions of the gradient flows ( A

A_ B_ exist for all time converges as t

t

= =

A A F ]] ; BBT (A ; F )BB T ;A F ]B 

0 and remain in

F (A B).

:4:5)

(4

Moreover, any solution of (4.4.5)

! 1 to a connected set of matrix pairs (A B) 2 F (A B) which satisfy

(4.4.3) and lie in a single level set of . _ K _ ) = ;grad (  K ) d) Solutions of the gradient flow (

exist for all time as t

_

=

K_

=

; (A + BKBT ) T  F ] ;BT (A + BKBT ; T F )B

t 0 and remain in a bounded subset of O(n) S (m).

:4:6)

(4

Moreover,

! 1 any solution of (4.4.6) converges to a connected subset of critical points in

O(n) S (m) which are contained in a single level set of .

e) If ( (t) K (t)) is a solution to (4.4.6) then

t A + BK (t)B T ) T (t) T (t)B

( )(



is a

solution of (4.4.5). Proof Consider the symmetric state space system (A B )

2 S (n) O(n m) and the matrix

F G0) 2 S (n) Rn m where G0 is the n m zero matrix. Observe that (A B ) = (A B ) + 2jjB jj2 and similarly (  K ) = (  K ) + 2jjB jj2 , where  and are given by

pair

(

104

Pole Placement for Symmetric Realisations

(4.3.1) and (4.3.6) respectively. Since the norm

Chapter 4

jjBjj2 is constant on F (A B) the structure

of the above optimization problems is exactly that considered in Theorem 4.3.3 and Theorem 4.3.6. The results follow as direct corollaries. Similar to the discussion at the end of Section 4.3 the pole placement problem can be solved by a gradient flow evolving on the orthogonal group O(n) alone. Corollary 4.4.2 Let

A B ) 2 S (n) O(n m) be a symmetric state space system and let

(

F 2 S (n) be a symmetric matrix. Define ' '

: :=

O(n) ! R jjQ(A ; T F )jj2

where Q(X ) = (I; P)(X ) = X ; BBT XBB T (4.3.11). Then, a) The gradient of ' with respect to the right invariant group metric is grad'( ) =  Q(A ; T F ) T  F ] : b) The critical points 2 O(n) of ' are characterised by

F Q(A ; T F ) T ] = 0:



and correspond exactly to the orthogonal matrix component of the critical points (4.4.4) of '. c) The negative gradient flow minimising ' is

F Q(A ; T F ) T ] 

_ =

Solutions to this flow exist for all time t

(0) = 0

:

:4:7)

(4

0 and converge as t ! 1 to a connected set

of critical points contained in a level set of '  .

Proof Consider the matrix pair (F G0) 2 S (n)

Rn m where G0 is the n

m zero matrix.

It is easily verified that ( ) = '( ) + 2jjB jj2 where  is given by (4.3.12). The corollary

follows as a direct consequence of Theorem 4.3.8.

x4-4

Least Squares Pole placement and Simultaneous System Assignment

105

Simultaneous system assignment is known to be a hard problem which generically does not have an exact solution. The best that can be hoped for is an approximate solution provided by a suitable numerical technique. The following discussion is a direct generalisation of the development given in Section 4.3. The generalisation is similar to that employed by Chu (1991a) when considering the simultaneous reduction of real matrices. For any integer N

2 N let (A1 B1 ) : : : (AN  BN ) 2 S (n) O(n m) be given symmetric

state space systems. The output feedback orbit for the multiple system case is

F ((A1 B1) : : : (AN  BN )) := f( (A1 + B1 KB1T ) T  B1) : : : ( (AN + BN KBNT ) T  BN ) j

2 O(n) K 2 S (m)g: An analogous argument to Lemma 4.2.1 shows that F ((A1 B1 ) : : : (AN  BN )) is a smooth manifold. Moreover, the tangent space is given by

T((A B ):::(AN BN ))F ((A1 B1) : : : (AN  BN )) f( A1] + B1 B1T  B1) : : : ( AN ] + BN BNT  BN ) j  2 Sk(n) 2 S (m)g: 1

1

Indeed, F ((A1 B1) : : : (AN  BN )) is a Riemannian manifold when equipped with the normal metric, defined analogously to the normal metric on F (A B ).

Corollary 4.4.3 For any integer N

2 N let (A1 B1) : : : (AN  BN ) and (F1 G1) : : :, (FN  GN )

be two sets of N symmetric state space systems. Define N

=

A  B1 ) : : : (AN  BN ))

:=

N (( 1

F ((A1 B1) : : : (AN  BN )) ! R N  X jjAi ; Fijj2 + 2jjBi ; Gijj2 i =1

and

Then,

N

=

N (  K )

:=

O(n) S (m) ! R

N X i=1

 jj (Ai + Bi KBiT ) T ; Fijj2 + 2jj Bi ; Gijj2 :

106

Pole Placement for Symmetric Realisations

Chapter 4

N

with respect to the normal and the right

N

a) The negative gradient flows of

and

invariant group metric are

A_ i

=



Ai 

B_ i

=

;

N X

j =1 N X j =1



N  X

Aj  Fj ] + Bj GTj ; Gj BjT ] ;

j =1

Bi BjT (Aj ; Fj )BjT Bi 

Aj  Fj ] + Bj GTj ; Gj BjT )Bi

(

(4.4.8)

for i = 1 : : :N , and _

=

K_

=

N X

j =1 N X

;

Aj  Fj ] + Bj GTj ; Gj BjT



j =1



BjT (Aj + Bj KBjT ; Fj T )Bj 

(4.4.9)

respectively. b) The critical points of N and N are characterised by

N X N X j =1

Aj  Fj ]

=

BjT (Aj ; Fj )Bj

=

j =1



N X j =1

Gj BjT ; Bj GTj



0

(4.4.10)

and

N X j =1

T T  (Aj + Bj KBj )  Fj ]

=

K

=

N X j =1 N X j =1

Bj GTj ; Gj BjT T



BjT ( Fj T ; Aj )Bj 

(4.4.11)

respectively. c) Solutions of the gradient flow (4.4.8) exist for all time t 0 and remain in F ((A 1 B1),

: : : (AN  BN )). Moreover, any solution of (4.4.8) converges as t ! 1 to a connected set of matrix pairs ((A 1 B1), : : :, (AN  BN )) 2 F ((A1 B1), : : :, (AN  BN )) which satisfy (4.4.10) and lie in a single level set of N .

x4-5

Simulations

d) Solutions of the gradient flow (4.4.9) exist for all time

107

t 0 and remain in a bounded

t ! 1 any solution of (4.4.9) converges to a connected subset of critical points in O(n) S (m) which are contained in a single level set of N . subset of

O(n) S (m).

Moreover, as

e) If ( (t) K (t)) is a solution to (4.4.6) then (Ai(t) Bi(t)) = ( (Ai +Bi KBiT ) T  Bi ), for i = 1 : : : N , is a solution of (4.4.8).

Proof Observe that the potentials N and N are linear sums of potentials of the form  and

considered in Theorem 4.3.6 and Theorem 4.3.3.

The proof is then a simple generalisation

of the arguments employed in the proofs of these theorems.

4.5 Simulations A number of simulations studies have been completed to investigate the properties of the gradient flows presented and obtain general information about the system assignment and pole placement problems5. In the following simulations the solutions of the ordinary differential equations considered were numerically estimated using the MATLAB function ODE45. This function integrates ordinary differential equations using the Runge-Kutta-Fehlberg method with an automatic step size selection. Numerical integration is undertaken using fourth order Runge-Kutta method while the accuracy of each iteration over the step length is checked against a fifth order method. At each step of the interpolation the step length is reduced until the error between the fourth and fifth order method estimates is less than a pre-specified constant E undertaken the error bound was set to

E=1

> 0. In the simulations

10;7, this allowed for reasonable accuracy

without excessive computational cost. Due to Lemma 4.1.1 one does not expect to see convergence of the solution of (4.3.4) to an exact solution of the System Assignment problem for arbitrary initial condition (unless n = m in which case a solution can be computed algebraically). The typical behaviour of solutions to 5 Indeed, computing the gradient flows (4.3.4) and (4.4.1) has already helped in understanding of the pole placement and system assignment tasks since it was the non-convergence of the original simulations that lead to a further investigation of the existence of exact solutions to the problems, and eventually to Lemmas 4.1.1 and 4.1.2.

Pole Placement for Symmetric Realisations

Chapter 4

Potential Ψ

108

Time

Figure 4.5.1: Plot of (A(t) B (t)) verses t for (A(t) B (t)) a typical solution to (4.3.4).

(4.3.4) is shown in Figure 4.5.1, where the potential, (A(t) B (t)), for (A(t) B (t)) a solution to (4.3.4), is plotted verses time. The potential is plotted on log10 scaled axis for all the plots presented to display the linear convergence properties of the solution. The initial conditions

A  B0) 2 S (5) O(5 4) and the target system (F G) 2 S (5) O(5 4) were randomly generated apart from symmetry and orthogonality requirements. The state dimension, n = 5, and the input and output dimension, m = 4, were arbitrarily chosen. Similar behaviour is obtained for all simulations for any choice of n and m for which m < n. In Figure 4.5.1, observe that the potential converges to a non-zero constant limt!1 (A(t) B (t)) = 9:3. For ( 0

the limiting value of the solution to be an exact solution to the system assignment problem one would require limt!1 (A(t) B (t)) = 0. In contrast, Lemma 4.1.2 ensures only that the pole placement task is not solvable on some open set of symmetric state space systems but leaves open the question of whether other open sets of systems exists for which the pole placement problem is solvable. Simulations show that the pole placement problem is indeed solvable for some open sets of symmetric state space systems. Figure 4.5.2 shows a plot of the potential (A(t) B (t)) (cf. Corollary 4.4.1) verses

A t  B (t)) a solution to (4.4.5). The initial conditions and target matrix here were the initial conditions (A 0 B0 ) and the state matrix F , from (F G), used to generate Figure time for

( ( )

4.5.1. The plot clearly shows that the potential converges exponentially (linearly in the log10 verses unscaled plot) to zero. Consequently, the solution (A(t) B (t)) converges to an exact

solution the pole placement problem, limt!1 A(t)

=

F.

Comparing Figures 4.5.1 and 4.5.2

and recalling that they were generated using the same initial conditions, one sees explicitly that

x4-5

Simulations

A

( (40)

Simulation 1 2 3 5 6 7 8 9 10 11

2:63 2:09 5:65 3:35 3:16 1:62 1:05 3:68 1:20 2:72

109

 B(40)) 10;10 10;9 10;9 10;10 10;11 10;11 10;10 10;10 10;8 10;8

Table 4.5.1: Potentials (Ai(40) Bi (40)) for experiments i = 1 : : : 10 where (Ai (t) Bi (t)) is a solution to (4.3.4) with initial conditions (A i (0) Bi (0)) = (A0 + Ni  Ui B0 ) S (n) O(n m). Here Ni = NiT is a randomly generated symmetric matrix with Ni 0:25 and Ui O(n) is an randomly generated orthogonal matrix with Ui In 0:25.

2

jj ; jj 

2 jj jj 



the system assignment problem is strictly more difficult than the pole placement problem. One may ask does the particular initial condition

A  B0) lie in an open set of initial

( 0

conditions for which the pole placement problem can be exactly solved. A series of ten simulations was completed, integrating (4.4.5) for initial conditions (A i  Bi ) close to (A0 B0 ),

jjA0 ; Aijj + jjB0 ; Bijj  0:5. Each integration was carried out over a time interval of forty seconds and the final potential (A(40) B (40)) for each simulation is given in Table 4.5. The

plot of  verses time for each simulation was qualitatively the same as Figure 4.5.2. It is my conclusion from this that the pole placement problem could be exactly solved for all initial conditions in a neighbourhood of (A0 B0 ).

Remark 4.5.1 It may appear reasonable that the pole placement problem could be solved for all initial conditions with state matrix A 0 in a neighbourhood of the desired structure

F.

In

fact simulations have shown this to be false.

2 O(n n ; m) be a matrix orthogonal to B, (i.e. BT C = 0). Observe that a solution to the pole placement problem requires T F ; A = BKB T and thus Let C

Since

T F C ; AC

=

0 =)

F C ; AC = 0:

A and C are specified by the initial condition (the span of C is the important object)

Pole Placement for Symmetric Realisations

Chapter 4

Potential Φ

110

Time

Figure 4.5.2: Plot of (A(t) B (t)) verses t for (A(t) B (t)) a solution to (4.4.5) with initial conditions (A 0  B0 ) for which the solution (A(t) B (t)) converges to a global minimum of .

then



2

Rn n must lie in the linear subspace defined by the kernel of the linear map

7! F C ; AC .

Of course

intersection of the kernel of

must also lie in the set of orthogonal matrices and the

7! F C ; AC with the set of orthogonal matrices provides

an exact criterion for the existence of a solution to the pole placement problem. The difficulty for initial conditions where jjA0 ; F jj is small is related to the fact that the

solution to the pole placement problem for initial conditions (A 0 B0 ) = (F B0 ), (i.e. the state

matrix already has the desired structure), is given by the matrix pair (In  0) in the output feedback group. The matrix In lies at an extremity of

2 O(n) S (m)

O(n) in Rn

n and

A  B0) may shift the kernel of the linear map

7! F C ; A0 C such that it no longer intersects with O(n). 2 it is reasonable that small perturbations of

( 0

An advantage, mentioned in Section 4.3, in computing the limiting solution of (4.4.7) (Figure 4.5.3) compared to computing the full gradient flow (4.4.5) (Figure 4.5.2) is the associated reduction in the order of the O.D.E. that must be solved. Interestingly, it appears that the solutions of the projected flow (4.4.7) will also converge more quickly than those of (4.4.5). Figure 4.5.3 shows the potential '( (t)) (cf. Corollary 4.4.2) verses time for

t

In while the specified symmetric state space system used for computing the norm ' was (A0  B0) the initial conditions for Figures 4.5.1 and 4.5.2. Observe that from time t = 1:2 to t = 2, Figure

( )

a solution to (4.4.7). The initial conditions for this simulation were

0 =

4.5.3 displays unexpected behaviour which I interpret to be numerical error. The presence of this error is not surprising since the potential (and consequently the gradient) is of order

x4-5

Simulations

Simulation 1 2 3 4 5





=

2.05 1.73 2.03 0.52 1.6

53 43.5 27.75 20 44

25.85 25.14 13.66 38.46 27.5

111

Table 4.5.2: Linear rate of convergence for the solution of (4.4.5), given by , and (4.4.7) given by  . The final column shows the ratio between the rates of convergence for the two differential equations.

10;12

 E 2, where E is the error bound chosen for the ODE routine in MATLAB. The

relationship of numerical error to order of the potential was checked by adjusting the error bound E in a number of early simulations. The exponential (linear) convergence rates of the solution to (4.4.7) and to (4.4.5) are computed by reading off the slope of the linear sections of plots 4.5.2 and 4.5.3. For the example shown in Figures 4.5.2 and 4.5.3 convergence of the solutions is characterised by

A t  B(t))  e;t '( (t))  e; t 

( ( )

where

 = 2:05  = 53

A t  B(t)) is a solution to (4.4.5) and (t) is a solution to (4.4.7).

( ( )

Five separate

experiments were completed in which the two flows were computed for randomly generated target matrices and initial conditions with

n = 5 and m = 4.

The linear convergence rates

computed from these five experiments are given in Table 4.5. I deduce that solutions of (4.4.7) converge around twenty times faster than solutions to (4.4.5) when the systems considered have five states and four inputs and outputs. A brief study of the behaviour of systems with other numbers of states and inputs indicate that the ratio between convergence rates is around an order of magnetude. In the system assignment problem Lemma 4.1.1 ensures that an exact solution to the system assignment problem does not generically exist. The gradient flow (4.3.4), however, will certainly converge to a connected set of local minima of the potential , Theorem 4.3.3. An important question to ask concerns the structure the critical level set associated with the local minima of  may have. In particular, one may ask whether the level set is a single point or is

Pole Placement for Symmetric Realisations

Chapter 4

Potential φ∗

112

Time

Figure 4.5.3: Plot of ' ((t)) verses t for (t) a solution to (4.4.5) with initial conditions (0) = In the identity matrix. The potential ' () := (A0 T F ) B0 B0T (A0 2 T T  F )B0 B0 is computed with respect to the initial conditions (A 0  B0 ) used in Figures 4.5.1 and 4.5.2.

jj

jj

;

;

;

it a submanifold (at least locally) of F (A B ). Remark 4.5.2 Observe that critical level sets of



are given by two algebraic conditions

jjgrad(A B)jj = 0 and (A B) = 0 , for some fixed 0 , thus they are algebraic varieties of the closed submanifold F (A B ) Rn n Rn m . It follows, apart from a set of measure zero in F (A B ) (singularities of the algebraic conditions), that the critical sets will locally have submanifold structure in F (A B ). 2 Rather than consider the computationally huge task of mapping out the local minima of 

by integrating out (4.3.4) for many different initial conditions in

F (A B), it is possible

to obtain some qualitative information in the vicinity of a given local minima). Choosing any initial condition and integrating (4.3.4) for a suitable time interval an estimate of a local minima

A1 B 1 ) is obtained.

(

If this point is an isolated minima then it should be locally

A1  B 1 ) 1 and integrating (4.3.4) a second time one obtains new estimates of local minima (A1 i  Bi ). If (A1  B 1 ) approximates an isolated local minima then the ratio attractive. By choosing a number of initial conditions (Ai  Bi) in the vicinity of

1 1 1 1 ri = jjjj(A(Ai  BBi) );;(A(A1  B B1)jj)jj i

:5:1)

(4

A1 B 1 ) is not isolated then one expects the ratio ri to Of course ri should be less than one on average since the flow is

should be approximately zero. If be significantly non-zero.

i

(

(

x4-6

113

Frequency

Numerical Methods for Symmetric Pole Placement

Ratio r i

Figure 4.5.4: Plot of frequency distribution of r i given by (4.5.1) computed for the limiting values of 100 simulations with initial conditions close to (A 1  B 1 ).

convergent. The difficulty in this approach is deciding on suitable time intervals for the various integrations. The first time interval was determined by repeatedly integrating over longer and longer time intervals (for the same initial conditions) until the norm difference between the final values was less than 1

10;8 . An initial time interval of two hundred seconds was

found to be suitable. Each subsequent simulation was integrated over a time interval of fifty seconds. The results of one hundred measurements of the ratio ri for a given estimated local

minima (A1  B 1 ) are plotted as a frequency plot, Figure 4.5.4. The frequency divisions for

this plot are 0:05, thus in the one hundred experiments undertaken eleven experiments yielded

an estimate of ri between 0:325 and 0:375. It is obvious from Figure 4.5.4 that the probability of ri being zero is small and one concludes that the critical sublevel sets of



have a local

submanifold structure of non-zero dimension. In particular, the local minima of



are not

isolated.

4.6 Numerical Methods for Symmetric Pole Placement In this section a numerical algorithm, based on the continuous-time flow (4.4.7) coupled with the feedback gain (4.3.14) is proposed. The algorithm is analogous to those discussed in Chapters 2 and 3. Let (A B ) be a symmetric output feedback system and let F

2 S (n) posses the desired

114

Pole Placement for Symmetric Realisations

closed loop eigenstructure. For 0

Chapter 4

2 O(n) consider the iterative algorithm generated by

i+1

=

Ki

=

e;i  Ti F i Q( Ti F i ;A)]  BT ( Ti F i ; A)B

i

(4.6.1) (4.6.2)

i2

N and i a sequence of positive real numbers termed time-steps. Observe that the T T Lie-bracket  Ti F i , Q( Ti F i ; A)] is skew symmetric, hence, e;i  i F i Q( i F i ;A)] for

is orthogonal and i+1 lies in O(n).

To motivate the algorithm observe that

d e;  Ti F i Q( Ti F i ;A)]   =0 d i

= =

the negative gradient of

' at i

T F  Q( T F ; A)] i i i T F iQ(A ; T i F i ) i ] i

i  i

(cf. Corollary 4.4.2). Thus,

represents a curve in O(n), passing through i at time 

=

i

e;  Ti F i Q( Ti F i ;A)]

0, and with first derivative equal to

;grad'( i). Indeed, the algorithm proposed can be thought of as a modified gradient descent

T T algorithm where instead of straight line interpolation the curves i e;  i F i Q( i F i ;A)] are used.

To implement (4.6.1) it is necessary to choose a time-step i for each step of the recursion. A convenient criteria for determining a suitable time-step is to minimise the smooth function

4'( i i) = '( i+1 ) ; '( i): In particular, one would like to ensure that

' .

equilibrium point of

:6:3)

(4

4'( i  ) is strictly negative unless i is a

The following argument is analogous to the derivation of step-size

selection schemes given in Section 2.2.

Lemma 4.6.1 Let

A B ) be

(

a controllable6 symmetric output feedback system and

S (m), F 6= 0, posses the desired closed loop eigenstructure. 6

I.e. the controllability matrix B

A B ) ensures that Q(A) = 0.

(

6

AB A2B

An;1 B ] is full rank.

For any

i

F 2

2 O(n) such that

It is easily shown that controllability of

x4-6

Numerical Methods for Symmetric Pole Placement

grad' ( i ) 6= 0, the recursive estimate i+1

= i

e;i  Ti F i Q( Ti F i ;A)] , where

i = 4jjF jj(jjP( T F 1 )jj + jjQ(A)jj) 

:6:4)

(4

i

i

115

satisfies 4'( i  i ) = '( i+1 ) ; '( i ) < 0.

 e;  Ti F i Q( Ti F i ;A)] for an arbitrary time-step  and define Xi = Ti F i and Xi+1 ( ) = Ti+1 ( )F i+1 ( ). The Taylor expansion for Xi+1( ) is

Proof Let

i+1 ( ) = i

Xi+1 ( ) = Xi ;  Xi Xi Q(Xi ; A)]] +  2 R2( ) where

R2 ( ) =

Z1 0

Xi+1(s ) Xi Q(Xi ; A)]] Xi Q(Xi ; A)]](1 ; s)ds:



Substituting into (4.6.3) and bounding yields (after some algebraic manipulations)

4'( i  )

jjQ(Xi+1 ; Xi)jj2 ; 2 tr(Q(Xi ; A)Xi Xi Q(Xi ; A)]]) + 2tr(Q(Xi ; A)R2 ( ))  ;2 jjXi Q(Xi ; A)]jj2 + 4 2 (jjP(Xi)jj + jjQ(A)jj) jjXi Q(Xi ; A)]jj2jjF jj := 4'u( i   ) =

The controllability of (A B ) along with the assumptions grad' ( i ) 6= 0 and F

6= 0 ensures

that the quadratic coefficient of 4'u ( i   ) does not vanish and it is easily seen that its unique

minimum is strictly negative and occurs for 0 > 4'u( i  i ) 4'( i  i).



=

i

of (4.6.4). The result follows since

A B ) be a controllable symmetric output feedback system and let F 2 6 0, posses the desired eigenstructure. For a given estimate i 2 O(n), let i be S (n), F =

Theorem 4.6.2 Let

(

given by (4.6.4). The algorithm (4.6.1)

i+1 = i

e;i  Ti F i Q( Ti F i ;A)] 

:6:5)

(4

116

Pole Placement for Symmetric Realisations

Chapter 4

has the following properties. a) The algorithm defines an iteration on O(n). b) Fixed points of the algorithm are the equilibrium points of (4.4.7). c) If i is a solution to (4.6.5) then the real sequence '( i ) is strictly monotonic decreasing unless there is some i 2 N with i a fixed point of the algorithm.

d) Any solution i to (4.6.5) converges as i ! 1 to a set of equilibrium points contained in a level set of ' .

T T Proof Part a) follows from the observation that e;i  i F i Q( i F i ;A)] is orthogonal. Fixed

_ i+1 ( ) vanishes (Lemma points of the recursion are those for which the first derivative of

4.6.1) and correspond exactly to the equilibrium points of (4.4.7). This proves part b) while part c) is a corollary of Lemma 4.6.1. To prove part d) observe that O(n) is a compact set, and thus ' ( i ), a bounded monoton-

ically decreasing sequence, must converge. This implies that 4'( i  i) ! 0 as i !

1. It

follows that i converges to a level set of ' such that for any in this set 4'(  ( )) = 0. Lemma 4.6.1 ensures that any point in this set is an equilibrium point of (4.4.7).

Remark 4.6.3 Observe that there is an associated sequence of realisations

Ai Bi

= =

A + BKi B T ) Ti

i B

i (

2

for any solution ( i  Ki) of (4.6.1) and (4.6.2).

A primary aim in developing the algorithm (4.6.1) is to provide a reliable numerical tool with which to investigate the structure of the pole placement (system assignment) problem for symmetric realisations. Figure 4.6.1 is a simulation for a fifth order symmetric state space system with four inputs. The initial condition is 0

=

In , the identity matrix, and the algorithm

is run for 1000 steps. The linear convergence properties of the algorithm are shown by the linear appearance of the log verses iteration plot, Figure 4.6.1. The time-step selection for this simulation is displayed in Figure 4.6.2 and indicates both the non-linear nature of the selection

x4-6

117

Log of potential

Numerical Methods for Symmetric Pole Placement

Iteration

i showing linear convergence properties.

( )

Time–step selection

Figure 4.6.1: Iteration verses '

Iteration

Figure 4.6.2: Iteration verses time-step selection i.

scheme as well as its limiting behaviour. The existence of a limit to the time-step selection scheme (4.6.4) as i ! 1, ensures that the linearization of (4.6.1) around a critical point exists. By computing this linearization the linear convergence properties displayed in Figure 4.6.1 can be confirmed theoretically. Simulation studies have shown the presence of many local minima in the cost potential ' .

Figure 4.6.3 is a plot of both the cost ' and the norm of the gradient jjgrad'( i )jj2 for a simulation of a seventh order symmetric state space system with four inputs. The system was chosen such that an exact solution to the pole placement problem existed. Thus, the global minimum of

' was known to be zero, however, Figure 4.6.3 shows the cost ' converging

to a constant while the gradient converges to zero. The algorithm (4.6.1) provides a reliable numerical method to investigate the presence and position of such local minima.

Pole Placement for Symmetric Realisations

Potential and norm of gradient

118

Chapter 4

Potential

Norm of gradient

Iteration

Figure 4.6.3: Iteration verses both potential ' T F )T ] 2.

jj

i

jj F Q(A ;

( ) and the norm of the gradient 

4.7 Open Questions and Further Work An important question that has not been addressed in this chapter is that of understanding the equilibrium conditions for the various dynamical systems in the context of classical systems theory. It would be nice to relate conditions such as (4.4.5) to properties such as the frequency response of the achieved system. Unfortunately, even finding a relationship between the desired and the achieved pole positions appears to be difficult. The discussion of Problem C, multiple systems assignment, is another area that would benefit from further work. The results presented in this chapter are far from comprehensive. A natural extension of the theory presented in this chapter is to consider more general systems. For example a class of systems (A B C ) with a given Cauchy-Maslov index (i.e.

AIpq )T = AIpq and C T

(

=

Ipq B where Ipq = diag(Ip ;Iq )) could be approached using the

same techniques developed earlier. The Lie transformation group associated with the set of such systems is

G = fT 2 Rn n j T T Ipq T = Ipq  det(T ) 6= 0g which has identity tangent space (or Lie-algebra)

g = f 2 Rn n

j (Ipq)T = ;Ipq g

the set of signature skew symmetric matrices.

x4-7

Open Questions and Further Work

119

Related to the general construction for systems with an arbitrary Cauchy-Maslov index is the problem for Hamiltonian linear systems. These are systems (A B C ) where (AJ )T and C T

=

JB where

0 J =B @

1 ;In C A:

0

In

=

AJ

0

The set of Hamiltonian linear systems is a homogeneous space with Lie transformation group

Sp(n R) = fT 2 R2n

2n

j T T JT = J det(T ) 6= 0g

termed the symplectic group. The Lie-algebra associated with the symplectic group is the set of 2n

2n Hamiltonian matrices Ham(n R) = f 2 R2n 2n

j (J )T + J = 0g:

Hamiltonian systems are important for modelling mechanical systems. One may also consider pole placement problems on the set of general linear systems. A discussion of some basic results is contained in the monograph (Helmke & Moore 1994b, Section 5.3). One area in which these results could be extended is to consider dynamic output feedback. Assume that one knows the degree

d of a dynamic compensator applied to a given

linear state space system. The dynamics of the closed loop system can be modelled by the differential equation

x_ y w_ u

= = = =

Ax + Bu Cx Gw + Cx Fw + Ky

where the feedback law u is allowed to depend both on the dynamic compensator state w and

the direct output y . This system can be rewritten as an augmented system with static feedback

0 1 xC dB @ A dt w

=

0 B@ A

0

C G

10 1 0 CA B@ x CA + B@ B w

0

1 CA u

120

Pole Placement for Symmetric Realisations

0 1 y B @ C A

=

u

=

w

Chapter 4

0 B@ C

10 1 0 CB x C A@ A 0 Id w 0 1  B y C K F @ A: w

Once the system is written in this form it is amenable to analysis via the general linear theory presented in Helmke and Moore (1994b, Section 5.3). Of course, one could also exploit the structure of the augmented problem itself to reduce computational cost and ensure that the roles of system and compensator states do not become confused. Gradient descent methods could also be used to compute canonical forms for system realizations. For example, to compute the companion form of a given state matrix A consider the smooth cost function

(A) =

n X X i=2 j 6=i;1

A2ij +

n X i=2

(

Ai(i;1) ; 1)2

on the homogeneous space

S (A) = fTAT ;1 j T 2 Rn n  det(T ) 6= 0g: Given that computating canonical forms is often an ill conditioned numerical problem, dynamical system techniques and related numerical gradient descent algorithms with their strong stability properties may prove to be an important numerical tool in certain situations.

Chapter 5

Gradient Flows on Lie-Groups and Homogeneous Spaces The optimization problems considered in Chapters 2, 3 and 4 are all problems where the constraint set is a homogeneous space. In each case the approach taken is to consider a suitable Riemannian metric on the homogeneous space and compute the maximising (or minimising) gradient flow. The limiting value of a solution of the gradient flow (for arbitrary initial condition) then provides an estimate of the maximum (or minimum). The numerical methods discussed in Chapters 2 to 4 are closely related to each other. They each rely on using a ‘standard’ curve lying within the homogeneous space, which can be assigned an arbitrary initial condition and arbitrary initial tangent vector, to interpolate the solution of the continuous-time gradient flow. Thus, for an arbitrary point in the constraint set one estimates the solution of the gradient flow by travelling a short distance along the ‘standard’ curve starting from the present estimate with initial tangent vector equal to the gradient at that point. It is natural to ask whether there is an underlying structure on which the numerical solutions proposed in Chapters 2 to 4 are based and, if there is, to what degree can such an approach be applied to any generic optimization problem on a homogeneous space. With the developing interest in dynamical system solutions to linear algebraic problems (Symes 1982, Deift et al. 1983, Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b) during the eighties there came an interest in the potential of continuous realizations of classical

121

122

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

problems as efficient numerical methods (Chu 1988). Interestingly, it has taken several years before the connection between dynamical systems and linear algebraic problems is examined in the other direction, namely, can one use insights and understanding developed by studying problems using the dynamical systems approach to design efficient numerical algorithms for problems in linear algebraic. Recently Chu (1992) has shown that by using the insight provided by a geometric understanding of a structured inverse eigenvalue problem a better understanding of a quadratically convergent algorithm first proposed by Friedland et. al. (1987) is obtained. Perhaps more directly based on the dynamical systems literature is the work by Brockett (1993) that looks at the design of gradient algorithms on the adjoint orbits of compact Lie-groups. The methods proposed in Chapters 2 to 4 are gradient descent algorithms constructed explicitly on the homogeneous space (Moore et al. 1992, Mahony et al. 1993, Moore et al. 1994, Mahony et al. 1994). Certainly the numerical methods proposed satisfy the broad requirements of simplicity, global convergence and constraint stability discussed on page 2. Moreover, the numerical methods described in each chapter have strong similarities, for example the Riemannian metrics used are all of a similar form and the ‘standard’ curves used to generate the numerical methods are all based on matrix exponentials. To develop a general understanding of these methods, however, it is apparent that one must develop a better understanding of the Riemannian geometry of the homogeneous constraint sets on which the algorithms are constructed. In this chapter I attempt to provide a rigourous but brief review of the relevant theory associated with developing numerical methods on homogeneous spaces. The focus of the development is on the classes of homogeneous spaces encountered in engineering applications and the simplest theoretical constructions which provide a rigourous basis for the numerical methods developed. A careful development is given of the relationship between gradient flows on Lie-groups and homogeneous space (related by a group action) which motivates the choice of a particular Riemannian structure for a homogeneous space. Convergence behaviour of gradient flows is also considered. The curves used in constructing numerical methods in Chapters 2 to 4 were all based on matrix exponentials and the well understood theory of the exponential map as a Lie-group homomorphism is reviewed to provide a basis for this choice. Moreover, the geodesic structure of the spaces considered (following from Levi-Civita connection) is developed and conditions are given on when the matrix exponential maps to a geodesic curve

x5-1

Lie-groups and Homogeneous Spaces

123

on a Lie-group. Finally, an explicit discussion of the relationship between geodesics on Liegroups and homogeneous spaces is given. The conclusion is that the algorithms proposed in Chapters 2 to 4 are modified gradient descent algorithms with geodesic curves used to replace the straight line interpolation of the classical gradient descent algorithm. Much of the material presented is standard or at least accessible to people working in the fields of Riemannian geometry and Lie-groups, however, this material would not be standard knowledge for researchers in an engineering field. Moreover, the development strongly emphasizes the aspects of the general theory that is relevant to problems in linear systems theory. Due to to the focus of the work, explicit proofs are given for a number of results which do not appear to be standard in the literature. In particular, I have not seen the results concerning the interrelation of gradient flows on Lie-groups and homogeneous spaces nor a careful presentation of the relationship between geodesics on Lie-groups and homogeneous spaces in any existing reference. The chapter is divided into nine sections. Section 5.1 presents a brief review of Lie-groups and homogeneous spaces while Section 5.2 considers a certain class of homogeneous space, orbits of semi-algebraic Lie-groups, which includes all the constraint sets considered in this thesis. Section 5.3 describes a natural choice of Riemannian metric for a given homogeneous space while Section 5.4 discusses the derivation of gradient flows on Lie-groups and homogeneous spaces and shows why the choice of Riemannian metric made in Section 5.3 is the most natural. Section 5.5 discusses the convergence properties of gradient flows. Sections 5.6 to 5.9 develop the geometry of Lie-groups and homogeneous spaces concentrating on providing a basis for understanding the exponential map and geodesics.

5.1 Lie-groups and Homogeneous Spaces In this section a brief review of Lie-groups and homogeneous spaces is presented. The reader is referred to Helgason (1978) and Warner (1983) for further technical details. A Lie-group G is an abstract group which is also a smooth manifold on which the operations

7!  , for ,  2 G) and taking the inverse ( 7!  ;1 , for  2 G) are smooth diffeomorphisms of G onto G. For  2 G one defines automorphisms of G associated

of group multiplication (

124

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

with right and left multiplications by a constant

r : G ! G l : G ! G

r ( ) :=  l ( ) := :

(5.1.1)

Observe that r and l are diffeomorphisms of G with smooth inverse given by r;1 and l;1 respectively. Let M be a manifold and G be a Lie-group. A smooth group action of G on M is a smooth mapping

:G M !M which satisfies

( q)

(e q)

= =

( ( q)) q 2 M   2 G q q 2 M e is the identity of G:

is known as transitive if for any q and r in M there exists  2 G such that

( q) = r. Observe that ( ) : M ! M is a diffeomorphism of M into M since

( ) is smooth, surjective (let  2 G, then for any q 2 M , ( (;1 q)) = q ) and has smooth inverse ;1 ( ) = ( ;1 ). A smooth manifold M with a transitive group of diffeomorphisms ( ( ) : M ! M ) is known as a smooth homogeneous space. The action

Let p 2 M and define the stabiliser of p by stab(p) = f

2 G j ( p) = pg:

By construction stab(p) G is an abstract subgroup of G. By inspection the map

p

p ()

: :=

is a smooth map which is onto if and only if smooth transitive group action of stab(p)

=

G onto M

G!M

( p)

is transitive.

(5.1.2) As a consequence, if

one has that dim G

dim M .

is a

The stabilizer,

;p 1 (p), is the inverse image of a single point under a continuous map and is a

x5-2

Semi-Algebraic Lie-Groups, Actions and their Orbits

125

closed set in the manifold topology on G. Consequently, stab(p) is a closed abstract subgroup

G and is a Lie-subgroup of G with the relative topology inherited from G (Warner 1983, pg. 110). The left coset space G=stab(p) = f stab(p) j  2 Gg has a natural topology such that the surjective mapping  : G ! G=stab(p),  7! stab(p) , is a continuous, open mapping. Similarly, equipping G=stab(p) with the unique differential structure that makes  smooth (Warner 1983, pg. 120), it is easily verified that  is a submersion. of

The right coset space is itself a homogeneous space under the group action

  (  stab(p))

G G=stab(p) ! G=stab(p)  stab(p):

: :=

Consider the smooth map

?p

?p ( stab(p)) It is a standard result that

: :=

G=stab(p) ! M

( p):

?p : G=stab(p) ! M

(5.1.3)

is a diffeomorphism (Helgason 1978,

Proposition 4.3, pg. 124). By construction, the following diagram commutes

G

HHH HH



p

-

G=stab(p)

?p HH HHHj ? M

In particular, p

=

?p   is the composition of a submersion and a diffeomorphism and is

itself a submersion.

5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits A set G Rs is known as semi-algebraic when it can be obtained by finitely many applications of the operations of intersection, union and set difference starting from sets of the form

126

Gradient Flows on Lie-Groups and Homogeneous Spaces

fx 2 Rs j f (x) = 0g with f a polynomial function on Rs .

Chapter 5

A semi-algebraic Lie-group is a

Lie-group which is also a semi-algebraic subset of Rs . The following two sets are examples of semi-algebraic Lie-groups. Example 5.2.1

a) The general linear group

GL(N R) = fT 2 RN

N j det(T ) 6= 0g

b) The orthogonal group

O(n) = O(n R) = fT 2 RN where IN is the N

N

j TT 0 = IN g

N identity matrix.

2 Let G be a Lie-group and be a smooth group action : G

define the orbit of the action to be the set

Rr

! Rr . Fix p 2 Rr and

O(p) = f ( p) j  2 Gg: The set O(p) is an immersed1 submanifold of Rr in the sense that it is a subset of Rr with a

G=stab(p) ! O(p), The map is a smooth transitive group action of G acting on O (p) and thus,

differential structure given by that induced by the diffeomorphism ? p : (cf. (5.1.3)).

O(p) is given the structure of a homogeneous space. It is certainly not clear that the differential structure induced by the immersion is compatible with the Euclidean differential structure on

Rr . In the case where the two differential structures are compatible then O(p) is an embedded submanifold of Rr .

G be a subset of Rs . f(x f (x)) j x 2 Gg Rs Let

A map

f :G!

Rr is semi-algebraic when the graph of f ,

Rr is semi-algebraic. In particular, if

G is a semi-algebraic

1 An immersion is a one-to-one map f : M ! N between two manifolds M and N which for which the differential df is full rank at all points. An immersed submanifold is a subset U  N such that U = f (M ) is the image of some manifold M via an immersion f . The set U  N inherits the differential structure on M via the map f , however, this need not correspond to the differential structure associated with the manifold N . An embedding is an immersion f : M ! N such that the image U = f (M ) is a manifold with subspace differential structure inherited from N (Warner 1983, pg. 22).

x5-3

Riemannian Metrics on Lie-groups and Homogeneous Spaces

127

! Rr is a rational map (i.e. the i’th component of f is a ratio of two polynomial maps) then the map f : G ! Rr is semi-algebraic (Gibson 1979, pg. 223). subset of Rs and f : Rs

The following result shows that for semi-algebraic Lie-groups and semi-algebraic group actions the orbit of a point p standard where

2 Rr is always an embedded submanifold of Rr .

G is a compact Lie-group (Varadarajan 1984, pg.

81). For

The result is

G semi-algebraic

the reader is referred to Gibson (1979, pg. 224).

! Rr be a smooth group action of G on Rr . Let p 2 Rr be an arbitrary point and denote the orbit of p by O (p) = f ( p) j  2 Gg. Then, O (p) is an embedded submanifold of Rr with the embedding RN

Proposition 5.2.2 Let G be a Lie-group and : G

?p : G=stab(p) ! O (p) given by (5.1.3), if either: a)

G is a compact Lie-group.

b)

G is a semi-algebraic Lie-group and : G

RN

! RN

is a semi-algebraic group

action.

5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces

M ! M be a smooth transitive group action of G on M . Denote the tangent space of G at the identity e by Te G. Let ge : TeG TeG be an inner product on Te G, i.e. a positive definite, bilinear map. The inner Let G be a Lie-group, M be a smooth manifold and : G

product chosen in the sequel is always a Euclidean inner product computed by choosing an arbitrary fixed basis fE1 : : : Eng for Te G, expressing given tangent vectors X

and Y

=

Pn y E in terms of this basis and setting i= 1 i i ge (X Y ) =

n X i=1

=

Pn x E i=1 i i

x i yi :

Of course, this construction depends on the basis vectors used. One could also consider other inner products, for example when G is semi-simple the negative of the killing form (Helgason

128

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

1978, pg. 131) is a positive definite inner product. A number of authors have used the killing form in related work (Faybusovich 1989, Bloch et al. 1992, Brockett 1993), however, the choice of a particular inner product is immaterial to the following development. Let ge be an inner product on Te G. Let r of (5.1.1), be right translation by 

2 G. Since

r is a diffeomorphism, the differential2 at the identity e of G, Te r : Te G ! T G, is a vector space isomorphism. Using Te r one can define an inner product on each tangent space of G g g (  ) where

T  G T G ! R  ge Te r;1 ( ) Ter;1() 

: :=

and  are elements of T G.

It is easily verified that g varies smoothly on

G and

consequently defines a Riemannian metric,

g(  ) := g (  ) for 

:3:1)

(5

2 G, ,  in T G. This Riemannian metric is termed the right invariant group metric for

G. Observe that for any two smooth vector fields X and Y

on G one has

g (drX drY ) = g (X Y ):

:3:2)

(5

2 M be arbitrary and recall that p of (5.1.2) is a submersion of G onto M (since the group action is transitive). Thus, the differential of p at the identity Te p : Te G ! TpM Let p

is a linear surjection of vector spaces. Decompose Te G into the topological product

TeG = ker Te p

dom Te p 

2 Let  : M ! N be a smooth map between smooth manifolds M and N . Let p 2 M be an arbitrary point then the differential of  at p (or the tangent map of  at p) is the linear map

Tp  Tp (X )

: :=

Tp M T(p) N D p (X ) !

j

where Djp (X ) is the Fr´echet derivative (Helmke & Moore 1994b, pg. 334) of  in direction X 2 Tp M . The full differential of  is a map from the tangent bundle of M , TM = p2M Tp M to the tangent bundle of N

d : TM TN d(Xp) := Tp (Xp ) where Xp is an element of the tangent space T p M for arbitrary p M . !

2

x5-3

Riemannian Metrics on Lie-groups and Homogeneous Spaces

129

where ker Te p is the kernel of Te p and dom Te p

=

fX 2 TeG j ge(X Y ) = 0 Y 2 ker Te pg

:3:3)

(5

is the domain (the subspace orthogonal to ker Te p using the inner product provided on Te G).

By construction, Te p restricts to a vector space isomorphism Te? p ,

Te? p Te? p (X )

:

dom Te p

:=

Te p (X ):

! TpM

Thus, one may define an inner product on the tangent space Tp M by



gpM (X Y ) = ge Te? ;p 1 (X ) Te? p;1 (Y ) : where Te? p;1 (X ) 2 Te G via the natural inclusion dom Te p

 TeG. It is easily verified that

this construction defines a smooth inner product on the tangent bundle TM . Thus, one defines

a Riemannian metric,

g M (X Y ) := gqM (  ) for q

:3:4)

(5

2 M and X , Y in Tq M . This is termed the normal metric on M .

2 M be arbitrary then the normal Riemannian metric on M and the right invariant group metric on G are related by the differential of p : G ! M . To see this observe that for any p 2 M there exists  2 G such that ( p) = q . Thus, Let q

q ( )

= =

( ( p)) = ( p)

p  r ( ):

Differentiating at the identity gives the following commutating diagram of vector space homomorphisms.

130

Gradient Flows on Lie-Groups and Homogeneous Spaces

TeG

dr

HHH H d q HHH H

-

Chapter 5

T G d p

Hj ?

Tq M

In particular, the normal Riemannian metric can also be defined by

gM (X Y ) := g(T? ;p 1 ( ) T? ;p 1 ()) g (  ) is the right invariant group metric on G, X restriction of T p to where

dom T p

=

and

Y

:3:5)

(5

in Tq M and T? p is the

fY 2 T G j g(Y X ) = 0 for all X 2 ker T pg

the domain of T p . Observe that dom T p

=

:3:6)

(5

dr (dom Te q ).

5.4 Gradient Flows Let M be a Riemannian manifold (with Riemannian metric gM ) and let : M

! R be a smooth

potential function. The gradient of on M is defined pointwise on M by the relationships

;

D jp ( ) = gM grad (p)  2 TpM grad (p) 2 TpM where

D jp ( ) is the Fr´echet derivative of in direction at the point p 2 M

(5.4.1) (5.4.2) (Helmke &

Moore 1994b, pg. 334). Existence follows from the positive definiteness and bilinearity of the inner product along with linearity of the Fr´echet derivative. Observe that grad is a smooth vector field on

M

which vanishes at local maxima and

minima of . Consider the ordinary differential equation on M , termed the gradient flow of ,

p_ = grad (p)

x5-4

Gradient Flows

131

p0 2 M be some initial condition then solutions of the gradient flow with initial condition p 0 exists and are unique (apply classical

whose solutions are integral curves3 of grad . Let

O.D.E. theory to the local co-ordinate representation of the differential equation).

G M ! M be a smooth transitive group action of G on M . Fix p 2 M and consider the ‘lifted’ potential : G ! R, Let G be a Lie-group and :

( ) :=  p ()

:4:3)

(5

where p is given by (5.1.2). Let ge (  ) be an inner product on Te G and define the right

invariant group metric g on G and the normal metric g M on M as described in Section 5.3. The

smooth potential : M

! R and the ‘lifted’ potential give rise to the two gradient flows q_ _

=

grad (q )

=

grad ( )

q 2 M  2 G

(5.4.4) (5.4.5)

defined with respect to the normal metric and the right invariant group metric respectively.

p 2 M be some fixed element of M . Let q0 = p (0) (where p is given by (5.1.2)) be an arbitrary initial condition in M . Let q (t) denote the solution of (5.4.4) with initial condition q 0 and let  (t) denote a solution of (5.4.5) with initial condition  0 then

Lemma 5.4.1 Let

q (t) = p ((t)): Proof By construction q0

=

p (0 ). Consider the time derivative of p ((t))

d ((t)) dt p

d

d p dt (t) T p  grad ((t)):

= =

3



Let X be a smooth vector field on a manifold M . An integral curve of X is a smooth map

 _ (t)

: =

R M !

X ( (t))

; d  and d  denotes the tangent vector to R at the point R. where _ ( ) := d dt dt   2

132

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

Thus, it is sufficient to show that

T p  grad () = grad ( p()) and use the uniqueness of solutions to (5.4.4) and (5.4.5) to complete the proof. Let grad

=

grad 0

ker T p and grad ?

+

grad ? be the unique decomposition of grad into grad 0

2 dom T p (cf. (5.3.6)). Observe that

g (grad 0() grad 0())

= = =

2

g (grad ( ) grad 0( )) D j (grad 0( )) D jp () (T p  grad 0()) = 0

 grad 0 () = 0. Since the metric g is positive definite then grad 0 = 0 and it follows that grad 2 dom (T p ).

since T p

Let

q 2 M be arbitrary and choose  2 G such that p () = q .

Let

X 2 Tq M be an

arbitrary tangent vector and observe

 g M (T p  grad () X ) = g T? ;p 1  T p  grad ( ) T? p;1 (X ) 1 using (5.3.5). Of course T? ; p  T p (grad ( )) = grad ( ) since grad

2 dom (T p).

It follows that

;

gM T p  grad () X

= = = =

Since



g grad ( ) T? p;1 (X ) D j (T? ;p 1 (X )) D jq (T p  T? ;p 1 (X )) ;

D jq (X ) = gM grad (q) X :

X is arbitrary and gM is positive definite then T p  grad ( ) = grad ( p( )) and

the proof is completed.

x5-5

Convergence of Gradient Flows

133

5.5 Convergence of Gradient Flows

M ! R be smooth function. Let grad denote the gradient vector field with respect to the Riemannian metric on M . The critical points of : M ! R coincide with the equilibria of the gradient flow on M . Let M be a Riemannian manifold and let :

q_(t) = ;grad (q(t)):

:5:1)

(5

For any solution x(t) of the gradient flow

d dt (q (t))

= =

and therefore

g (grad (q(t)) q_(t)) ;g(grad (q(t)) grad (q(t)))  0

(q (t)) is a monotonically decreasing function of t.

The following proposition

is discussed in Helmke and Moore (Helmke & Moore 1994b, pg. 360). Proposition 5.5.1 Let

:M!

R be a smooth function on a Riemannian manifold with

compact sublevel sets4 . Then every solution q (t) 2 M of the gradient flow (5.5.1) on M exists

t 0. Furthermore, x(t) converges to a connected component of the set of critical points of as t ! +1. for all

M

Note that the condition of the proposition is automatically satisfied if

is compact.

Solutions of a gradient flow (5.5.1) display no periodic solutions or strange attractors and there is no chaotic behaviour. If has isolated critical points in any level set fq

2 M j (q) = cg,

c 2 R, then every solution of the gradient flow (5.5.1) converges to one of these critical points as t ! +1. This is also the case where the critical level sets are smooth submanifolds of M . In general, however, it is possible that the solution of a gradient flow converges to a connected level set of critical points of the function . Such ‘non-generic’ behaviour is undesirable when gradient flows are being used as a numerical tool. For problems of the type considered in this thesis the following lemma is generally applicable. Lemma 5.5.2 Let : M

R

! R be a smooth function compact sublevel sets, such that

4 Let c 2 then the sublevel set of  associated with the value c is fq sublevel sets then each such set (possibly empty) is a compact subset of M .

2

M (q) c j



. If

g

 has compact

134

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

(i) The set of critical points of is the union of closed, disjoint, critical level sets, each of which is a submanifold of M .

H at a critical point degenerates exactly on the tangent space of the critical level sets of . Thus, for q 2 M a critical point of and 2 T q M then H jq (  ) = 0 if and only if is in the tangent space of the critical level set of .

(ii) The Hessian5

Then every solution of the gradient flow

q_ = ;grad (q) converges exponentially fast to a critical point of .

Proof Denote the separate connected components of the critical level sets of

by Ni for

i = 1 2 : : : K where K is the number of disjoint critical level sets. Thus, the limit set of a solution to the gradient flow x_ = grad is fully contained in some N j for j 2 1 2 : : : K ]. Let a 2 Nj be an element of this limit set. Condition ii) ensures that each Nj is a non-degenerate critical set. It may be assumed without loss of generality that the value of constrained to Nj is zero. The generalised Morse lemma (Hirsch 1976, pg. 149) gives a open neighbourhood Ua of a in M and a diffeomorphism f : Ua ! Rn , n = dim M , nj = dim Nj , such that (i) (ii)

f (Ua \ Nj ) = Rnj f0g  f ;1 (x1 x2 x3) = 12 (jjx2jj2 ; jjx3jj2)

2 Rnj , x2 2 Rn; , x3 2 Rn+ and nj + n; + n+ = n. Let W = f (Ua) Rn then the gradient flow of  f ;1 on W is

with x1

x_ 1 = 0 x_ 2 = ;x2  x_ 3 = x3:

:5:2)

(5

Let M be a smooth manifold. The Hessian of a smooth function  : M R is a symmetric bilinear map  q : Tq M Tq M R given by 2~  ~j   q ( ) = ~i @x@i @x j n 1 where x = x  : : :  x is a local coordinate chart on M and ~,

~ are the local coordinate representations of  and

Tq M while ~ is a local coordinate representation of . 5

H

!

j



!

H

f

2

g

j

x5-6

Lie-Algebras, The Exponential Map and the General Linear Group

135

x3

W

-

W W

+

x

2

+

W

-

Figure 5.5.1: Flow around a saddle point

Let W + :=

f(x1 x2 x3) j jjx2jj jjx3jjg and W ; := f(x1 x2 x3) j jjx2jj < jjx3jjg. Using

the convergence properties of (5.5.2) it follows that every solution of original gradient flow starting in f ;1 (W + ;f(x1 x2 x3) j x3

=

0g) will enter the region f ;1 (W ; ) for which < 0

(cf. Figure 5.5.1). On the other hand, every solution starting in ff ;1 (x1 x2 x3) j x3

=

0g will

2 Nj  x1 2 Rnj . As is strictly negative on f ;1 (W ; ),

; S all solutions starting in f ;1 W ; W + ; ff ;1 (x  x  x ) j x = 0g will eventually and

converge to the point f ;1 (x1  0 0)

1

converge to some

Ni 6= Nj .

2

3

3

By repeating this analysis for each

solution must converge to a connected subset of some

Ni and recalling that any

Ni (Proposition 5.5.1) the proof is

completed.

5.6 Lie-Algebras, The Exponential Map and the General Linear Group Let G be a Lie-group. The Lie-algebra of G, denoted g, is the set of all left invariant smooth vector fields on G, i.e. smooth vector fields X ( ) 2 T G such that

X  l ( ) = dl X ( ) where l ( ) :=  is left multiplication by  . In particular,

X ( ) = dl X (e):

:6:1)

(5

136

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

X ( ) and Y ( ) be two smooth vector fields on G and think of them as derivations6 of C 1(G) (the set of smooth functions which map G ! R). The Lie-bracket of X ( ) and Y ( ) is defined with respect to the action of X and Y as derivations

Let

X ( ) Y ( )]f = X ( )Y ( )f ; Y ( )X ( )f



f 2 C 1(G).

where

By checking the linearity of this map it follows that the Lie-bracket of

two vector fields is itself a derivation and corresponds to a vector field, denoted



X Y ]().

The set of smooth vector fields on G, denoted D(G), is a vector space over R under pointwise addition of vector fields and scalar multiplication. Considering the Lie-bracket operation as a multiplication rule,

Y ( ) = Y

D(G) is given the structure of an algebra.

Assume that

X ( ) = X and

are left invariant vector fields on G, then X Y ] = X Y ]( ) is also a left invariant

vector field on G, since

dl X Y ])f

(

= = = =

X Y ](f  l ) (5.6.3) X D(f  l )j (Y ) ; Y D(f  l )j (X ) X Df jl ( ) (dl Y ) ; Y Df jl( ) (dl X ) X Df jl ( ) (Y  l ) ; Y Df jl ( ) (X  l ) = (X Y ]  l )f: 

Thus g forms a subalgebra of the algebra of derivations. Note that there is a one-to-one correspondence between g and Te G the tangent space of

G at the identity given by (5.6.1). Thus, g is a finite dimensional algebra of the same dimension as the Lie-group G. Indeed, an alternative way of thinking about g is as the tangent space Te G equipped with the bracket operation

X (e) Y (e)] = dl (X (e)) dl(Y (e))](e):



R

6 Let C 1 (G) be the set of all smooth maps from G into . The set C 1 (G) acquires a vector space structure under scalar multiplication and pointwise addition of functions. A derivation of C 1 (G) is a linear map C 1 (G) ! C 1 (G). The set of all derivations of C 1 (G), denoted D(G), itself forms a vector space under scalar multiplication and pointwise addition of functions. A smooth vector field is a smooth map X : G ! TG which assigns a vector X ( ) 2 T G to each element 2 G. Any smooth vector field X defines a derivation X (f ) 7! Xf ( ) := Df j  (X ( )), the Fr´echet derivative of f in direction X at the point . Indeed, this correspondence is an isomorphism

G)

D(

the set of smooth vector fields on Gg

f

between D(G) and the vector space of smooth vector fields on G (Varadarajan 1984, pg. 5).

:6:2)

(5

x5-6

Lie-Algebras, The Exponential Map and the General Linear Group

Example 5.6.1 Let N be a positive integer and consider the set of all real non-singular N

137

N

matrices

GL(N R) = f 2 RN where det( ) is the determinant of

.

N j det( ) 6= 0g

GL(N R) is known as the general linear group and is a Lie-group under the group operation of matrix multiplication. Since GL(N R) The set

is an open subset of RN N it inherits the relative Euclidean topology and differential struc-

ture. The tangent space at the identity IN (the

N

N

identity matrix) of

GL(N R) is

TIN GL(N R) = RN N the set of all real N N matrices. Consequently, the dimension of the Lie-group GL(N R) is n = N 2. The tangent space of GL(N R) at a point  2 GL(N R) can be represented by the image of TIN GL(N R) = RN N via the linearization of the diffeomorphism generated by left multiplication l , T GL(N R)

TIN l (TIN GL(N R)) fA j A 2 RN N g:

= =

The Lie-algebra of GL(N R), denoted gl(N R), is the set of all left invariant vector fields of

GL(N R). From (5.6.1) it follows that

gl(N R) = fX ( ) = A j  Let f

2 G A 2 RN N g:

2 C 1(GL(N R)) be any smooth real function then the Lie-bracket of two elements A,

B 2 gl(N R) acting on f is A B ]f



= = =

A)(B )f ; (B )(A)f (A) Df j (B ) ; (B ) Df j (B ): ;

;

D Df j (B) j (A) ; D Df j (A) j (B) (

GL(N R) inherits the Euclidean differential structure from RN derivative Df j (X ), X 2 RN N , can be written

Now since

Df j (X )

=

N df X d (X )ij

ij =1

ij

N the Fr´echet

138

Gradient Flows on Lie-Groups and Homogeneous Spaces

2 RN

where Xij is the (i j )’th entry of X

N . Writing (B )ij

=

Chapter 5

PN  B and applying s=1 is sj

the product rule of differentiation gives

;

D Df j (B ) j (A)

=

=

since ddfpk

N X N X

d2 f (B) (A) ij pk ij =1 pk=1 dij dpk X ! N N N df X X d is Bsj (A)pk + ij =1 dij pk=1 dpk s=1 N N X N df X N X d2 f (B) (A) + X Bkj (A)ik ij pk ij =1 pk=1 dij dpk ij =1 dij k=1

PN  B  = 0 unless p = i and s = k. It follows that s=1 is sj ! N N df X N X X A B ]f = (A)ik Bkj ; (B )ik Akj d =

ij k=1

ij =1 N X

k =1

df ((AB ; BA)) = ((AB ; BA))f ij ij =1 dij

 (AB ; BA) is a smooth left invariant vector field on GL(N R). For any two matrices A B 2 RN N define the matrix Lie-bracket by A B ] = AB ; BA. The bracket where

operation on the Lie-algebra can now be written in terms of the matrix Lie-bracket operation on Te GL(N R) = RN N 

A B ] = A B]:

:6:4)

(5

Indeed, it is usual to think of gl(N R) as the set

gl(N R) = fA j A 2 RN N g with the matrix Lie-bracket operation A B ] = AB ; BA.

:6:5)

(5

2

G and H be two Lie-groups and let g and h be their associated Lie-algebras. A map : G ! H is called a Lie-group homomorphism (or just homomorphism) from G to H if is smooth and is a group homomorphism (i.e. (g1 g2;1) = (g1) (g2);1 ). A map : g ! h is called a Lie-algebra homomorphism (or just homomorphism) from g to h if is linear and preserves the bracket operation (X Y ]) =  (X ) (Y )]. The tangent map Te : TeG ! Te H induces a map : g ! h, (X ) = dlTe (X (e)), which is a Lie-algebra Let

homomorphism (Warner 1983, pg. 90). Abusing notation slightly it is standard to identify g

x5-6

Lie-Algebras, The Exponential Map and the General Linear Group

with Te G, h with Te H (cf. (5.6.1)) and write Te : g

induced by a Lie-group homomorphism :

G ! H.

139

! h as the Lie-algebra homomorphism The following result is fundamental in

the theory of Lie-groups. A typical proof is given in Warner (1983, Theorem 3.27).

Proposition 5.6.2 Let

G and H

be Lie-groups with Lie-algebras g and h respectively and

assume that G is simply connected. Let : g ! h be a Lie-algebra homomorphism, then there

exists a unique Lie-group homomorphism : G ! H such that T e = .

Let G be any Lie-group and denote its Lie-algebra by g. Denote the identity component

of

G by Ge , the set of all points in G path connected to the identity e.

Observe that R is a

Lie-group under addition. The Lie-algebra of R is a one-dimensional vector space r

 drd ,

=

 2 R, where drd denotes the derivative in R. Let X 2 g be arbitrary and consider the map

:

d) ( dr

:=

r!g

X:

It is easily seen that is a Lie-algebra homomorphism and using Proposition 5.6.2 there exists a unique Lie-group homomorphism expX : R ! Ge such that

Te

=

.

Since expX is a Lie-group homomorphism then expX (t1

expX (t1 ) expX (t2 ) and the set

:6:6)

(5

t

+ 2) =

t2R expX (t) G

is known as a one-parameter subgroup of G. One may define the full exponential by exp

:

g!G

(5.6.7)

exp(X ) := expX (1): The exponential map is a local diffeomorphism from an open neighbourhood N0 of 0 2 g into an open neighbourhood Me of e 2 G (Helgason 1978, Proposition 1.6).

140

Gradient Flows on Lie-Groups and Homogeneous Spaces

Lemma 5.6.3 Let exp : gl(N R) !

Chapter 5

GL(N R) denote the exponential map (5.6.7) and let

eX = IN + X + X2!

2

be the standard matrix exponential. Let X

+

X3 3!

2 gl(N R) = RN

N , then

exp(X ) = eX :

X 2 RN N (Horn & Johnson 1985, pg. 300). Let X 2 gl(N R) (thought of as the set of N N matrices equipped with the matrix Lie-bracket) then define X : R ! GL(N R) by Proof Recall that the matrix exponential eX is well defined for all

X (t) = etX : Observe that X is well defined smooth map since the matrix exponential is itself smooth and always non-singular (det(eX ) = etr(X ) 6= 0). Indeed, is a group homomorphism (R is a Lie-group under addition) since X (t1 + t2 )

d tangent space of R at 0 is the set f dr

X

=

X (t1) X (t2 ) and X (;t) = X (t);1 .

The

j  2 Rg where drd denotes normal derivation. Observe

that

d) Te X ( dr

= =

d) D X j0 ( dr   dtd etX  = X: t=0

But this is exactly the Lie-algebra homomorphism that induces the Lie-group homomorphism expX (5.6.6). Since expX is the unique Lie-group homomorphism that has the property

Te expX ( drd ) = X then it follows that X (t) = expX (t). definition of exp (5.6.7).

The full result follows from the

x5-7

Affine Connections and Covariant Differentiation

141

5.7 Affine Connections and Covariant Differentiation Let G be a smooth manifold. An affine connection is a rule r which assigns to each smooth

vector field X

2 D(G) a linear mapping rX : D(G) ! D(G), rX (Y ) := rX Y , satisfying rfX +gY rX (fY )

where f , g

= =

f rX + g rY f rX Y + (Xf )Y

(5.7.1) (5.7.2)

2 C1(G) and X , Y 2 D(G).

An affine connection naturally defines parallel transport on a manifold. Let

X and Y 2

D(G) be smooth vector fields and let  (t) be a smooth integral curve of X on some time interval (0  ),  > 0, then the family of tangent vectors t 7! Y ( (t)) is said to be transported

parallel to  (t) if

(

rX Y )( (t)) = 0:

:7:3)

(5

Expressing (5.7.3) in local coordinates one can show that the relationship depends only on the

 (t) (Helgason 1978, pg. 29). Thus, given a curve  (t) on (0  ) and a smooth assignment (0  ) ! Y (t) 2 T (t)G then Y (t) is transported parallel to  if and only if any smooth extensions X , Y 2 D(G), X ( (t)) = _ (t), Y ( (t)) = Y (t) satisfy (5.7.3). values of the vector fields

A geodesic is a curve parallel to the curve

 : R ! G as

 (t).

X and Y

along the curve

 (t) for which the family of tangent vectors _ (t) is transported It is usual to write the parallel transport equation for a geodesic

r _ _ = 0

:7:4)

(5

2 D(G) of _ satisfies rX X ( (t)) = 0. Given a point  2 G and a tangent vector X 2 T G there exists a maximal open interval I R containing zero and a unique geodesic X : I ! G with  (0) =  and _ (0) = X (Helgason where by this one means that any smooth extension X

1978, pg. 30).

2 G, ( : (0 ) ! G,  (0) = ,  () =  ) there exists a set of n linearly independent smooth assignments t 7! Y i (t) 2 T (t)G, i = 1 : : : n, t 2 (0  ) (where each Yi (t) is transported parallel to  ) and which span the set of all Given a fixed curve between two points  , 

142

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

smooth assignments t 7! Y (t) (Y (t) transported parallel to  ) (Helgason 1978, pg. 30). These solutions correspond to choosing n linearly independent vectors in T  G as initial conditions and solving (5.7.3) for Yi (t). The construction induces an isomorphism

P(0 ) : T G ! T G P(0 )(Z ) = Z = Pni=1 zi Yi (0) 2 T G. on the curve  .

where

n X i=1

ziYi ( )

:7:5)

(5

Of course, this isomorphism will normally depend on the

Parallel transport of a smooth covector field w : G ! T  G, is defined in terms of its action

on an arbitrary vector field X

2 D(G)

P(0 )w)(X ) = w(P( 0)X )

(

where P( 0) is parallel transport from

 () backwards to  (0) along the curve  (t). Parallel transport of an arbitrary tensor field T : G ! T  G   T  G  TG   TG of type (r s) is given by its action on arbitrary covector and vector fields P(0 )T )(w1 : : : wr  W1 : : : Ws) = T (P( 0)w1 : : : P( 0)wr  P( 0)W1 : : : P( 0)Ws ):

(

Parallel transport of a function f

2 C1(G) is just P(0 )f = f ( ()):

An affine connection on a manifold

G induces a unique differentiation on tensor fields

known as covariant differentiation (Helgason 1978, pg. 40). It is usual to denote the covariant differentiation associated with a given affine connection by the same symbol r. One may think

of covariant differentiation of a tensor T (with respect to a vector field X

at a point 

2 G, as the limit

1 rX T )() = slim (P T ( (s)) ; T ( )) !0 s ( 0)

(

2 D(G)) evaluated :7:6)

(5

 (t) is the integral curve associated with X ,  (0) =  , (Helgason 1978, pg. 42). In particular, if T is a tensor of type (r s) then rX T is also a tensor of type (r s). Considering where

x5-7

Affine Connections and Covariant Differentiation

the above definition applied to a function f

rX f )()

(

= =

143

2 C 1(G) one has

f ( (s)) ; f ( ) Df j (X ( )) = (Xf )(): 1

s!0 s lim

Thus as expected, covariant differentiation on C1 (G) corresponds to derivation with respect to the vector field X .

It is easily seen that covariant differentiation inherits property (5.7.1) from the affine connection. To see that it satisfies the Leibniz formulae (rZ (T

 R) = rZ T  R + R rZ R)

one observes that any operation defined by a limit of the form (5.7.6) has the properties of a classical derivative. A rigourous proof is given in Mishchenko and Fomenko (1980, pg. 329). In particular, given a (0 2) tensor g (  ) contracted with two vector fields X and Y then

rZ (g(X Y )) = rZ g(X Y ) + g(rZ X Y ) + g(X rZY ): Given G a Riemannian manifold (with Riemannian metric g : TG

TG ! R) there exists

a unique covariant differentiation satisfying

rX Y ; rY X rZ g for any smooth vector fields X , Y , Z

= =

X Y ] 0



(5.7.7) (5.7.8)

2 D(G) (Helgason 1978, pg. 48). The affine connection

associated with this covariant differentiation is known as the Levi-Civita connection. Consider the action of the Levi-Civita connection on g (X Y ) for arbitrary vector fields X , Y , Z

rZ g(X Y )

= =

2 D(G),

rZ g(X Y ) + g(rZ X Y ) + g(X rZY ) g (rX Z Y ) + g (X rZ Y ) + g(Z X ] Y ):

By permuting the vector fields X , Y and Z , and then eliminating rX and rY from the resulting equations one obtains 2g (X rZ Y ) = Zg (X Y )+g (Z X Y ])+Y g (X Z )+g (Y X Z ]);Xg(Y Z );g(X Y Z ]):

:7:9)

(5

144

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

Since X , Y and Z are arbitrary, this equation uniquely determines the Levi-Civita connection in terms of the metric g .

5.8 Right Invariant Affine Connections on Lie-Groups Let G be a smooth manifold and let : G ! G be a smooth map from G into itself. An affine connection r on G is invariant under if

d rX Y

=

rd X d Y:

If G is a Lie-group then r is termed right invariant if r is invariant under each map r ( ) :=  ,

 2 G.

G be a Lie-group. There is a one-to-one correspondence between right invariant affine connections on G and bilinear maps

Lemma 5.8.1 Let

! : Te G Te G ! Te G given by

!(Y Z ) = (rdr Y dr Z )(e)

:8:1)

(5

for Y , Z

2 TeG.

Proof If

r is an affine connection then (5.8.1) certainly defines a bilinear map from TeG

TeG ! TeG.

TeG ! Te G, let fE 1 : : : E ng be a linearly ~ i = dr E i , independent basis for Te G. Define the n smooth right invariant vector fields E i = 1 : : : n. Thus, for arbitrary vector fields Y , Z 2 D(G) there exist functions yi 2 C 1(G), P ~ i and Z = for i = 1 : : : n and zj 2 C 1 (G), for j = 1 : : : n such that Y = ni=1 yi E Pn z E~ j . One defines r : D(G) ! D(G) Y j =1 j Conversely, given a bilinear map ! : Te G

rY Z =

n X n X i=1

yi

j =1

zj dr!(E i E j ) + (E~ izj )E~ j :

:8:2)

(5

x5-8

Right Invariant Affine Connections on Lie-Groups

145

r is well defined observe that both (E~ izj )E~ j and ! are bilinear in E~ i and E~ j and thus the definition is independent of the choice of fE 1 : : : E ng. To see that r is an

To see that

affine connection one observes that linearity in

f 2 C 1 (G )

rY (fZ )

= =

E~ i ensures that (5.7.1) holds; while for any

1 n n 0n n X X @ yi fzj dr !(E i E j) + f (E~ izj )E~ j A + X yi X zj (E~ if )E~ j i=1 j =1 f rY Z + (Y f )Z

i=1

j =1

and (5.7.2) also holds. Consider two arbitrary vector fields Y and Z and observe that

rdrY drZ

since for any 

=

n X n X i=1

yi

zj dr !(E i E j ) + ((dr E~ i)zj )dr E~ j 

=

0j=n 1 n 1 X X dr @ yi zj !(E i E j ) + (E~ izj )E~ j A 

=

dr rY Z

i=1

j =1

2G drE~ i)zj ( )

(

= = =

Thus,

Dzj j (drE~ i) D(zj  r )j; (E~ i) Dzj j (E~ i) = E~ i zj ( ):

r is a right invariant affine connection.

(5.8.3)

1

Moreover, for any two right invariant vector

fields Y and Z

rY Z (e) = rdr Ye drZe (e) = dre!(Ye  Ze) = !(Ye Ze) and thus r satisfies (5.8.1). This completes the proof. The following result provides an important relationship between the exponential map on

G (5.6.6), and geodesics with respect to right invariant affine connections. invariant connections is given in Helgason (1978, pg. 102).

A proof for left

146

Gradient Flows on Lie-Groups and Homogeneous Spaces

Proposition 5.8.2 Let then for any X

2 Te G

Chapter 5

r be a right invariant affine connection and let ! be given by (5.8.1) !(X X ) = 0

if and only if the geodesic X : R ! G with _ (0) = X is an analytic Lie-group homomorphism of R into G.

In particular, if X is a group homomorphism then then  X must be the unique group homomorphism with geodesic X is just

d X (1) = X (cf. Proposition 5.6.2). X = expX 

Thus, if

! (X X ) = 0 then the :8:4)

(5

the exponential map (5.6.6). Let G be a Lie-group with an inner product ge : Te G

Te G ! R on the tangent space at the

identity. Let g be the right invariant group metric (cf. (5.3.1)), then the Levi-Civita connection

defined by g is also right invariant. To see this one computes rdr Z dr Y for arbitrary vector fields X , Y , Z

2 D(G). Using (5.7.9) it follows that

2g (dr X rdrZ dr Y )

since

=

dr Zg (X Y ) + g (drZ drX Y ]) + dr Y g (X Z ) + g (drY dr X Z ]) ; dr Xg (Y Z ) ; g (dr X drY Z ])

g is right invariant (cf. (5.3.2)) and d X Y ] = d X d Y ] (Helgason 1978, pg.

24).

Recalling (5.8.3) one obtains 2g (dr X rdr Z dr Y )

=

=

Zg (X Y ) + g(Z X Y ]) + Y g(X Z ) + g(Y X Z ]) ; Xg(Y Z ) ; g(X Y Z ]) 2g (X rZ Y )

But g is right invariant, and thus 2g (drX rdr Z dr Y ) = 2g (drX drrZ Y ) which shows

that dr rZ Y

=

rdr Z dr Y .

GL(N R) (cf. example Section 5.6). The tangent space of GL(N R) at the identity is TIN GL(N R) = RN N since GL(N R) is an Example 5.8.3 Consider the general linear group

x5-8

Right Invariant Affine Connections on Lie-Groups

147

open subset of RN N . Consider the Euclidean inner product on TIN GL(N R)

hX Y i = tr(X T Y ): The tangent space of GL(N R) at a point 

2 G is represented as T GL(N R) = fX j X 2

RN N g the image of TIN GL(N R) via dr . The right invariant metric for GL(N R) generated

by h  i is just

g

:

T GL(N R) T GL(N R) ! R g(Y Z ) = tr((;1)T Y T X ;1):

The Levi-Civita connection r associated with g can be explicitly computed on the set of

Y , Z 2 RN N then X , Y  and Z are the unique right invariant vector fields associated with X , Y and Z . Using (5.7.9) one has

right invariant vector fields on

2g (X rZY  )

=

GL(N R).

Let X ,

Zg (X Y ) + g(Z X Y ]) + Y g(X Z ) + g (Y X Z ]) ; Xg(Y Z ) ; g(X Y  Z]):

Now Y  Z ] is certainly right invariant (since d X Y ] = d X d Y ] (Helgason 1978, pg. 24)). In particular, observe that Zg (X Y )

=

0

=

Y g (X Z ) = Xg (Y Z ) since in each

case the metric computation is independent of  . Parallelling the argument leading to (5.6.4) given in the example Section 5.6 for right invariant vector fields one obtains

A B ] = (BA ; AB) = ;A B]:



Using this it follows that 2g (X rZY  )

=

;g(Z X Y ]) ; g(Y X Z ]) + g(X Y Z ])

=

tr(Z T Y X ] + Y T Z X ] + X T Y Z ]):

148

Gradient Flows on Lie-Groups and Homogeneous Spaces

Evaluating the left hand side of this equation at 

=

IN

Chapter 5

and writing rZ Y  (e)

=

! (Z Y )

one obtains7 tr(X T ! (Z Y ))

= =

1 tr(Z T  Y ]X + Y T  Z ]X + X T Y Z ]) 2 1 tr(X T Y T  Z ] + X T Z T  Y ] + X T Y Z ]): 2

Consequently the bilinear map ! for the Levi-Civita connection is given by



!(Y Z ) = 12 Y T  Z ] + Z T  Y ] + Y Z ] :

:8:5)

(5

Note that ! (X X ) = X T  X ] is zero if and only if X is a normal matrix, (i.e. commutes with

its transpose). Consequently, the exponential exp(tX ) and only if X is normal.

=

etX on GL(N R) is a geodesic if 2

5.9 Geodesics In this section the relationship between geodesics on a Lie-group to geodesics on a homogeneous space equipped with the normal metric is outlined. Though intuitively natural the result is difficult to prove. The approach taken is to construct a coordinate basis which block decomposes the Riemannian metric on the Lie-group into two parts, one of which is related to the homogeneous space and the other of which lies in the kernel of the group action. This construction is of interest in itself and justifies the somewhat long proof.

G be a Riemannian Lie-group with metric denoted g . Let r denote the Levi-Civita connection. If  2 G is arbitrary then the geodesics through  are just the curves  (t) := r  X (t) = X (t) where X is a geodesic of G passing through e, the identity of G. To see Let

this one computes (cf. (5.7.4))

r _ _

= =

rdr _ dr_ dr r _ _ = 0:

One also needs the easily verified results tr(ABC ]) = tr(AB ]C ), A T  B ]T = B T  A] and tr(A) = tr(AT ) for arbitrary matrices A, B , C 2 N N . 7

R

x5-9

Geodesics

149

When dealing with a Riemannian manifold (equipped with the Levi-Civita connection) there is an equivalent characterisation of geodesics using variational arguments. Loosely, geodesics are curves of minimal (or maximal) length between two given points on a manifold. The following result is proved in Mishchenko and Fomenko (1980, pg. 417). Proposition 5.9.1 Let G be a Riemannian manifold with metric denoted g . Consider the cost functional

E ( ) =

Z1 0

on the set of all smooth curves  : (0 1) !

M.

The cost functional

g (_ ( ) _ ( ))d G.

Then the extremals of E ( ) are geodesics on

E ( ) measures the action of a curve  .

The length of a curve

 is

measured by the related cost functional

L( ) =

Z 1q 0

g (_ ( ) _ ( ))d:

Extremals of L( ) correspond to curves that minimise (or maximise) the curve length between

 (0) and  (1) on M . Extremals of E ( ) are also extremals of L( ) (Mishchenko & Fomenko 1980, Theorem 2, pg. 417), however, the converse is not true. The reason for this is that the

 : (0 1) ! G is uniquely parametrized by t 2 (0 1) whereas any curve (t) =   T , for T : (0 1) ! (0 1) a smooth map, will have the same length and consequently is an extremal of L( ).

uniqueness of geodesics ensures that a geodesic

G be a Lie-group, M be a smooth manifold and : G M ! M be a smooth transitive group action of G on M . Let g denote a right invariant Riemannian metric on G and g M denote the induced normal metric on M . If  : R ! G is a geodesic (with respect to the Levi-Civita connection) on G then the curve  : R ! M Theorem 5.9.2 Let

 (t) := ( (t) p) is a geodesic (with respect to the Levi-Civita connection generated by the induced normal metric) on M . Proof It is necessary to develop some preliminary theory before proving the main result.

150

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

2 M be arbitrary and recall that the stabilizer of p, H = stab(p) = f 2 G j ( p) = pg is a Lie-subgroup of of dimension n ; m of G. In particular, h the Lie-algebra of H is a Lie-subalgebra of g the Lie-algebra of G. Let X 2 h and let exp be the exponential map on G, then t 7! exp(tX ) is a smooth curve lying in H (Warner 1983, pg. 104). Moreover, let fEm+1  : : : En g be a basis Denote the dimension of G by n and the dimension of M by m. Let p

for Te H then one can choose local coordinates for H around e

xm+1  : : : xn ) = exp(xm+1 Em+1 ) exp(xm+2 Em+2 ) exp(xn En ):

(

These coordinates are known as canonical coordinates of the second kind for

H

and are

described in Varadarajan (Varadarajan 1984, pg. 89).

fEm+1  : : : Eng of TeG to a full basis of TeG choosing the remaining tangent vectors fE1 : : : Emg to satisfy Extend the partial basis

g(Ei Ej ) = 0 i = 1 : : : m j = m + 1 : : : n: Let 

2 G be an arbitrary point in G and define canonical coordinates of the second kind on

G, centred at ,

x x(x1 : : : xn ) Identify Rn

=

Rm

:

Rn

! G

:= exp(x1E1) exp(x2E2 ) exp(xn En ):

Rn;m as a canonical decomposition into the first m coordinates and the

remaining n ; m coordinates. Define the two inclusion maps

i1 : Rm ! Rn  i2 : Rn;m ! Rn 

i1(x1  : : : xm) := (x1 : : : xm 0 : : : 0) i2(xm+1  : : : xn) := (0 : : : 0 xm+1 : : : xn ):

One now has maps

x1 : Rm ! G x1 := x  i1 = exp(x1 E1) exp(xmEm )

x5-9

Geodesics

151

and

x2 : Rn;m ! G x2 := x  i2 = exp(xm+1 Em+1 ) exp(xnEn ) The map x2 is just the canonical coordinates of the second kind for the embedded submanifold

r (H ) = H.

The relationship of these maps is shown in the commutative diagram, Figure

5.9. Observe that the range of dx1 is exactly dr (spfE1 : : : Emg) since the map xi

exp(xi Ei ), which is exactly x(0 : : : 0 xi 0 : : : 0) has differential

dx( @x@ i ) = drEi = E~i

:9:1)

(5

~i is the unique right invariant vector field on G associated with Ei where E ~i one has d p E

7! r 

2 TeG. In addition,

i = m + 1 : : : n since H is a coset of the stabilizer stab(p). Recall ~  : : : E ~ g. the definition of dom d p (cf. (5.3.6)). It follows directly that dom d p = spfE m 1 =

0 for

Consider the map

y y(y 1 : : : y m)

: :=

! M

p  x  i1(y 1 : : : ym ): Rm

dy = d p  dx1 is bijection. Thus, the map y forms local coordinates for the manifold M centred at p ( ). This completes the

Observe from the above discussion that the differential

construction of the local coordinate charts shown in the commutative diagram, Figure 5.9.

y -M  Rm  * 6     

p i   - ) xx ? R Rn G  i PP P 6i PPPxP PPPP n;m

R





R

1

1

2

2

R

Figure 5.9.1: A commutative diagram showing the various coordinate charts and smooth curves on G and M constructed in the proof of Lemma 5.9.2.

152

x.

Rn

Gradient Flows on Lie-Groups and Homogeneous Spaces

Chapter 5

Consider the local coordinate representation of the Riemannian metric g in the coordinates

Canonically associating tangent vectors of Rn at a point x,

Z=

Z 2 TxRn with the full space

n X zi @x@ i ! 7 (z1  : : : zn) i=1

then the local coordinates representation of the metric g , denoted g0 can be written in matrix form

g0(Y Z ) = Y T G(x)Z G(x) 2 Rn n is a positive definite, symmetric matrix. Now consider arbitrary vector fields Y = (y1 : : : ym  0 : : : 0) and Z = (0 : : : zm+1 : : : zn ) then

where

g0(Y Z )

= =

g (dxY dxZ ) n m X X yi zj g (Ei Ej ) = 0: i=1 j =m+1

Thus, the matrix G(x) is block diagonal of form

0 G (x) G(x) = B @ 11 0

0

G22(x)

1 CA :

Moreover, since the maps shown in Figure 5.9 are commutative and the metric gM on

M

is

g on dom d p = spfE~1 : : : E~mg it is easily shown that the local coordinate representation of gM on Rm is

induced by the action of

gM0 (Y Z ) = Y T G11(i1(y ))Z = (di1Y )T G(i1(y))(di1Z ): I proceed now to prove the main result. Let  : R !

  (t)

: :=

G be a geodesic and define

R!

M

p   (t):

2 R be a parameter and consider any one parameter smooth variation  of the curve  on M . Assume that 0 =  and  (t) is a smooth map from R ! M . Both  and  have Let 

local coordinate representations on Rm in the coordinates described above. Denote the local

x5-9

Geodesics

coordinate representations by 0 := y ;1

  and 0

:= y ;1   . Let

153



: R ! Rm be the



and

smooth curve



:=  0

; 0

since subtraction of vectors is well defined in Rm . The curves  ,  ,



are shown

on the commutative diagram, Figure 5.9. Denote the local coordinate representation of  by

 0 = x;1   .

Observe that since  is a geodesic of

G then 0 is a geodesic of Rn equipped

with the metric g 0 (Mishchenko & Fomenko 1980, Lemma 3, pg. 345). Consider the following

one parameter smooth variation of  0

 0 :=  0 + i1 

0

=( 1+(

)1  : : : m0 + ( )m  m0 +1 : : : n0 ):

The action E ( 0 ) on Rn with respect to the Riemannian metric g 0 is

E ( 0 )

= =

Z1 Z01

g 0(_ 0 ( ) _ 0 ( ))d ( _ 0 ( )1

   : : : _ 0 ( )m  0 : : : 0)T G(x)(_ 0 ( )1 : : : _ 0 ( )m  0 : : : 0) 0 0 0 T 0 0 +2(_ ( )1  : : :  _ ( )m  0 : : : 0) G(x)(0 : : : 0  _ ( )m+1  : : : _ ( )n ) 0 0 T 0 0 _ ( )m+1  : : : _ ( )n ) G(x)(0 : : : 0  _ ( )m+1  : : :  _ ( )n )d: + (0 : : : 0 

The middle term of this expansion is zero due to the block diagonal structure of G(x) while the

last term is independent of  since the perturbation i1  only enters in the first m coordinates.

Thus, recalling the construction of  0 , one has

d E ( 0 )  =0 d

=

=

=

d Z 1 (_ 0( )  : : : _ 0( )  0 : : : 0)T G(x)(_ 0( )  : : : _ 0( )  0 : : : 0) 1 m 1 m d 0  T +( _ ( )1  : : : _ ( )m  0 : : : 0) G(x)( _ ( )1  : : : _ ( )m  0 : : : 0)d  =0  d Z 1 (_ 0 ( )  : : : _ 0 ( ) )T G (x)(_ 0 ( )  : : : _ 0 ( ) )d   1 m 11 1 m  d 0 =0   Z  1 d g 0 (_ 0 ( ) _ 0 ( ))d  = d E ( 0 )d   =0  =0 d d 0 M

However, since  0 is a geodesic it follows that  0 is extremal and

d E ( 0) = 0  =0 d

154

Gradient Flows on Lie-Groups and Homogeneous Spaces



 which means that the derivative dd E ( 0 )

Chapter 5



E ( ) =0 = 0 on M and since =0  is an arbitrary smooth one parameter perturbation it follows that  is an extremal of the action E on M . From Proposition 5.9.1 one now concludes that  is a geodesic and the proof =

0. Thus, dd

is complete.

G from geodesics on M . Let  : R ! M be a geodesic and define  : R ! G, by  := x  i1  y ;1  . As above, define 0 := y ;1   and  0 := x;1   = i1   0. Then let  : R ! Rn be any one parameter perturbation of  , 0 := x;1   and define :  0 ;  0. This construction induces a ;

;

perturbation in  0 given by  0 :=  + ( )1 : : : ( )m = ( )1  : : : ( )m . Furthermore, one has by construction that the i’th component of _ is zero for i = m + 1 : : : n. It follows Remark 5.9.3 It is also possible to construct geodesics on

that

d E ( )  =0 d

= =

d E ( 0 )  =0 d Z d 1(_ 0 ( )  : : : _ 0 ( )  0 : : : 0)T G(x)(_ 0 ( )  : : : _ 0 ( )  0 : : : 0) m m 1 1 d +(0

=

m components zero and the remaining n ; m components given by the corresponding components of . Observe that 02 = 0 since 0 = 0 and thus  = 0 is a local minimum of E ( 2) since E is positive definite. It follows that   d E ( 2) d E ( 0 ) = 0 while the first relationship = 0 since  0 is a geodesic. This d d =0 =0 Here



 : : : 0 _ ( )m+1  : : : _ ( )n )T G(x)(0 : : : 0 _ ( )m+1  : : : _ ( )n)d  =0 d E (0 ) + d E ( 2) :  =0  =0 d d 0

2 denotes the curve in Rn

with first

d E ( ) = 0  =0 d for any one-parameter perturbation  of  which proves that  is a geodesic.

shows that

2

Chapter 6

Numerical Optimization on Lie-Groups and Homogeneous Spaces The numerical algorithms proposed in Chapters 2 to 4 are based on a single idea, that of interpolating the integral solution of a gradient flow via a series of curves lying wholly within the constraint set. For each iteration, the particular curve chosen is tangent to the gradient flow at the present estimate and the next estimate is evaluated using a time-step chosen to ensure the cost function is monotonic decreasing (for minimisation problems) on the sequence of estimates generated. Algorithms of this type are related to the classical gradient descent algorithms on Euclidean space, for which the interpolation curves are straight lines. Consequently, the algorithms proposed in the preceding chapters are termed modified gradient descent algorithms where the modification is the use of a curve rather that a straight line to interpolate the gradient flow. The property of preserving the constraint while solving the optimization problem is a fundamental property of the algorithms proposed. This property, termed constraint stability (cf. page 2) is conceptually related to recent work in developing numerical integration schemes that preserve invariants of the solution of an ordinary differential equation. Results on numerical integration methods for Hamiltonian systems are particularly relevant to the general class of problems considered in this thesis. Early work in this area is contained in the articles (Ruth 1983, Channell 1983, Menyuk 1984, Feng 1985, Zhong & Marsden 1988). A recent review

155

156

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

article of this work is Sanz-Serna (1991). Following from this approach is a general concept of numerical stability (Stuart & Humphries 1994) which is loosely defined as the ability of a numerical integration method to reproduce the qualitative behaviour of the continuous-time solution of a differential equation. The development given in Stuart and Humphries (1994) is not directly applicable to the solution of optimization problems since it is primarily focussed on integration methods and considers only a single qualitative behaviour at any one time, either the preservation of an invariant of a flow (Hamiltonian systems) or the convergence of the solution to a limit point (contractive problems). In contrast, the optimization problems discussed in this thesis require two simultaneous forms of numerical stability, namely preservation of the constraint relation and convergence to a limit point within the constraint set. This leads one to consider what properties a numerical method for optimization on a homogeneous space should display. In Chapter 1 the three properties of simplicity, global convergence and constraint stability were defined (page 2) in the context of numerical methods for on-line and adaptive processes. The modified gradient descent algorithms proposed in the early part of this thesis all displayed these properties. It is natural to ask whether the proposed algorithms are in fact closely related. In particular, since the the only difference between the proposed algorithms is in the curves used to interpolate the gradient flow it is important to investigate the properties of these curves more carefully. Indeed, one may ask whether the choice of curves can be justified or whether there may be more suitable choices available. In this chapter I begin by reviewing the gradient descent algorithms proposed in Chapters 2 to 4 and using the theoretical results of Chapter 5 to develop a mathematical framework which explains each algorithm as an example of the same concept. This provides a design procedure for a deriving numerical methods suitable for solving any constrained optimization problem on a homogeneous space. The remainder of the chapter is devoted to developing a more sophisticated constrained optimization algorithm exploiting the general theoretical framework provided by Chapter 5. The method considered is based on the Newton-Raphson method reformulated (in coordinate free form) to evolve explicitly on a Lie-group. Local quadratic convergence behaviour is proved though the method is not globally convergent. To provide an interesting example the symmetric eigenvalue problem is considered (first discussed in Chapter 2) and a Newton-Raphson method derived for this case. It is interesting to compare the behaviour of this example with the classical

x6-1

Gradient Descent Algorithms on Homogeneous Spaces

157

shifted QR algorithm, however, it is not envisaged that the proposed method is competitive for solving traditional problems. The interest in such methods is for solving numerical problems for on-line and adaptive processes. The chapter is divided into five sections. Section 6.1 discusses the theoretical foundation of the modified gradient descent algorithms proposed in Chapters 2 to 4 and develops a general template for generating such methods. Section 6.2 develops the general form of the NewtonRaphson iteration on a Lie-group and proves quadratic convergence of the algorithm in a neighbourhood of a given critical point. Section 6.3 provides a coordinate free formulation of the Newton-Raphson algorithm. The theory is applied to the symmetric eigenvalue problem in Section 6.4 and a comparison is made to the performance of the QR algorithm.

6.1 Gradient Descent Algorithms on Homogeneous Spaces In this section the numerical algorithms proposed in Chapters 2 to 4 are discussed in the context of the theoretical discussion of Chapter 5. Recall the constrained optimization problem posed in Chapter 2 for computing the spectral decomposition of a matrix H0. The algorithm proposed for this task was the double-bracket algorithm (2.1.4)

Hk+1 = e;k Hk D]Hk ek Hk D]  where1

D = diag(1  : : : N ).

The algorithm has the property of explicitly evolving on the

set

M (H0) = fU T H0U j U 2 O(N )g of all orthogonal congruency transformations of H0 . The set of orthogonal matrices O(N ) is

certainly an abstract group and indeed is a Lie-subgroup of GL(N R) (Warner 1983, pg. 107). The orthogonal group O(N ) features in all of the numerical algorithms considered and it seems a good opportunity to review its geometric structure. 1. The identity tangent space of O(N ) is the set of skew symmetric matrices (Warner 1983,

1 In Chapter 2 the diagonal target matrix was denoted N , however, to avoid confusion with the notation of Chapter 5 the target matrix is now denoted D and the dimension of the matrices is denoted N .

158

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

pg. 107)

TIN O(N ) = Sk(N ) = f 2 RN

N

j  = ;T g:

U 2 O(N ) is given by the image TIN O(N ) via the linearization of rU : O(N ) ! O(N ), rU (W ) := WU (right translation by U )

2. The tangent space at a point

TU O(N ) = fU 2 RN 3. By inclusion

Sk(N )

N

j  2 Sk(N )g:

:1:1)

(6

RN N is a Lie-subalgebra of the Lie-algebra gl(N R) of

GL(N R). In particular, Sk(N ) is closed under X Y ] 2 Sk(N ) if X and Y are skew symmetric.

the matrix Lie-bracket operation

4. The scaled Euclidean inner product on Sk(N )

h1 2i = 2tr(1T 2) generates a right invariant group metric on O(N ),

g (1U 2U ) = 2tr(T1 2): Observe that g (1U 2U ) = 2tr(U T 1T 2U )

:1:2)

(6

h U 2U i since U T U = IN . Thus

= 1

the right invariant group metric on O(N ) is the scaled Euclidean inner product restricted to each individual tangent space.

5. The Levi-Civita connection generated by the right invariant group metric (6.1.2) (cf. Example 5.8.3) is associated with the bilinear map ! : Sk(N )

Sk(N ) ! Sk(N ),

!(1  2) = 1  2]: This follows directly from (5.8.5) while observing that  2 Sk(N ) implies T  ] = 0.

The extra factor of 2 in(6.1.2) cancels the factor of 1=2 in (5.8.5).

6. The value of ! ( ) = 0 for any  2 Sk(N ) and thus all curves

 (t) = exp(t)

x6-1

Gradient Descent Algorithms on Homogeneous Spaces are geodesics on O(N ) passing through IN at time t

=

159

0. By uniqueness this includes

all the possible geodesics on O(N ) passing through I N . 7. Geodesics on O(N ) passing through U

2 O(N ) and with tangent vector _ (0) = U 2

TU O(N ) at time t = 0 are given by (cf. Section 5.9)

 (t) = exp(t)U: Recall once more the double-bracket algorithm (2.1.4)

Hk+1 = e;k Hk D] Hk ek Hk D],

mentioned above. In Section 2.5 the associated orthogonal algorithm

Uk+1 = Uk ek UkT H Uk D] 0

was discussed and shown to be related to the double-bracket equation via the algebraic relationship

Hk = UkT H0 Uk : Uk ek UkT H Uk D] does not appear to be in the correct form for a geodesic exp(t)U on O(N ). The reason for this lies in the characterisation of M (H0) = fU T H0 U j U 2 O(N )g. In particular, (U H ) 7! U T HU is not a group action of O(N ) on M (H0). The use of this awkward definition for M (H0) is historical (cf. Brockett (1988) and the development Unfortunately,

0

in Helmke and Moore (1994b, Chapter 2)). By considering the related characterisation

M (H0) = fWH0 W T j W 2 O(N )g M (H0) is seen to be a homogeneous space with transformation group O(N ) and group action (W H ) 7! WHW T . Of course, all that has been done is to take the transpose of the orthogonal matrices. It is easily shown that the associated orthogonal iteration for the new characterisation of M (H0) is

Wk+1 = e;k Wk H WkT D] Wk : 0

Observe that this iteration is constructed from geodesics on

O(N ).

Thus, the associated

orthogonal iteration for the double-bracket algorithm is a geodesic interpolation of the flow

W_ = WH0W T  D]W . Using Lemma 5.9.2 geodesics on O(N ) will map to geodesics on M (H0) and one concludes that the the double-bracket algorithm itself is a geodesic interpolation

160

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

of the double-bracket flow

H_ = H H D]]: Recall that geodesics are curves of minimum length between two points on a curved surface and are the natural generalization of straight lines to non-Euclidean geometry. Then, at least for the double-bracket algorithm, the question posed in the introduction to this chapter, whether the choice of interpolating curves in the proposed numerical algorithms is justified, is answered in the affirmative. It should not come as a surprise that the other algorithms proposed in Chapters 2 to 4 are also geodesic interpolations of continuous-time flows. The algorithm proposed in Section 2.4 is based directly on the double-bracket equation and can be analysed in exactly the same manner. In Chapter 3 the Rayleigh gradient algorithm (3.2.1) is immediately in the correct form to observe its geodesic nature. Indeed, for the rank-1 case (cf. Subsection 3.4.1) the geodesic nature of the recursion has already been observed explicitly. Finally the pole placement algorithm (4.6.1) proposed in Chapter 4

i+1

e;i  Ti F i Q( Ti F i ;A)]  e;i F i Q( Ti F i ;A) Ti ] i

i

= =

is explicitly a geodesic interpolation of the gradient flow (4.4.7)

F Q(A ; T F ) T ] 

_ =

evolving directly on the Lie-group O(N ). Thus, the algorithms proposed in Chapters 2 to 4 form a template for a generic numerical approach to solving optimization problems on homogeneous spaces associated with the orthogonal group. In every case considered exponential interpolation of the relevant continuous-time flow is equivalent to geodesic interpolation of the flow due to the specific structure of O(N ). Care should be taken before the same approach is used for more abstract Lie-groups (the easily constructed exponential interpolation curves may no longer be geodesics), nevertheless, the basic structure of the algorithms presented is extremely simple and could be applied to almost any optimization problem on a homogeneous space. Of course, step-size selection schemes

x6-2

Newton-Raphson Algorithm on Lie-Groups

161

must be determined for each new situation and the stability analysis depends on the step-size selection. The basic properties of the algorithms will remain consistent, however, and provide a useful technique for practical problems where the properties of constraint stability and global convergence (cf. page 1) are more important than those of computation cost.

6.2 Newton-Raphson Algorithm on Lie-Groups In this section a general formulation of the Newton-Raphson algorithm is proposed which evolves explicitly on a Lie-group. Interestingly, the iteration can be expressed in terms of Lie-derivatives and the exponential map. In practise, one still has to solve a linear system of equations to determine the regression vector. The Newton-Raphson algorithm is a classical (quadratically convergent) optimization technique for determining the stationary points of a smooth vector field (Kincaid & Cheney 1991,

Z : Rn ! Rn a smooth vector field2 on Rn , let p 2 Rn be a stationary point of Z (i.e. Z (p) = 0) and let q 2 Rn be an estimate of the stationary point p. Let k = (k1 k2 : : : kn), with k1 : : : kn non-negative integers be a multi-index and denote its size by jkj = k1 + k2 + + kn . Expanding Z as a Taylor series around q one obtains for each element of Z = (Z 1 Z 2 : : : Z n ), pg. 64). Given

Z i (x) = Z i(q) +

1 (h1 )k (hn )kn X @ jkjZ i (q ) 1 k k ! kn ! @ (x ) @ (xn )kn 1 jkj=1 1

1

x ; q 2 Rn and hj is the j ’th element of h, and the sum is taken over all multi-indices k with jkj = j for j = 1 2 : : :. The Taylor series of an analytic3 function is uniformly and absolutely convergent in a neighbourhood of q (Fleming 1977, pg. 97). Indeed, if q is a good estimate of p one expects that only the first few terms of the Taylor series are sufficient to provide a good approximation of Z i . Assume that p is known and consider setting where

h

=

R R

When dealing with Euclidean space one naturally associates the element @=@x i of the basis of Tx n with the basis element e i of n (the unit vector with a 1 in the i’th position). This induces an isomorphism Tx n n (Warner 1983, pg. 86) and one writes a vector field as map Z : n ! n rather than the technically more correct Z : n ! T n , Z (x) 2 Tx n . 3 In fact a smooth function f 2 C (M ) on a smooth manifold M is defined to be analytic at a point p 2 M if the Taylor series of f 0 , the expression of f in local coordinates centred at p, is uniformly and absolutely convergent in a neighbourhood of 0. 2

R

R

R

R

R

R

R

162

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

x = p = h + q. Ignoring all terms with jkj 2 one obtains the approximation 0 = Z i (p)  Z i (q ) +

The Jacobi matrix is defined as the

n @Z i X

j

j (q )h : j =1 @x

n n matrix with (i j )’th element (Jq Z )ij

=

@Zji (q ) @x

(Mishchenko & Fomenko 1980, pg. 16). Thus, the above equation can be rewritten in matrix =

form as 0

Z (q) + Jq Z h.

When Jq Z is non-singular one can solve this relation uniquely

for h, an estimate of the residule error between q and p. Thus, one obtains a new estimate q0 of

p based on the previous estimate q and the correction h q0 = q + h:

This estimate is the next estimate of the Newton-Raphson algorithm. Given an initial estimate

q0 2 Rn, the Newton-Raphson algorithm is: Algorithm 6.2.1 [Newton-Raphson Algorithm on Rn ] Given qk

2 Rn compute Z (qk ).

Compute the Jacobi matrix Jqk Z given by (Jqk Z )ij Set h = ;(Jqk Z );1 Z (qk ).

Set qk+1 Set k

=

=

=

@Z i (qk ). @xj

qk + h.

k + 1 and repeat.

2

The convergence properties of the Newton-Raphson algorithm are given by the following proposition (Kincaid & Cheney 1991, pg. 68).

Proposition 6.2.2 Let

Z

: Rn

! Rn be an analytic vector field on Rn and p 2 Rn be a

stationary point of Z . Then there is a neighbourhood U of p and a constant C such that the Newton-Raphson method (Algorithm 6.2.1) converges to p for any initial estimate q0

2 U and

the error decreases quadratically

jjqk+1 ; pjj  C jjqk ; pjj2: It is not clear how best to go about reformulating the Newton-Raphson algorithm on an

x6-2

Newton-Raphson Algorithm on Lie-Groups

arbitrary Lie-group

G.

163

One could use the Euclidean Newton-Raphson algorithm in separate

local coordinate charts on G. Care must be taken, however, since local coordinate charts may display extreme sensitivity to perturbation in the Euclidean coordinates, leading to numerically ill conditioned algorithms. Given a Lie-group G, let 2 C 1 (G) be an analytic real function on G. Denote the identity

G by e and associate the tangent space Te G with the Lie-algebra g of G in the canonical manner (cf. Section 5.6). For X 2 Te G arbitrary define a right invariant vector ~ 2 D(G) by X ~ = dr X , for i = 1 : : : n, where r () :=  (cf. (5.1.1) and the field X   analogous definition for left invariant vector fields (5.6.1)). Recall that the map t 7! exp(tX ) element of

(where the exponential is the unique Lie-group homomorphism associated with the Lie-algebra

7! X , cf. (5.6.7)) is an integral curve of X~ passing through e at time zero. Given  2 G arbitrary, the map t 7! exp(tX ) is an integral curve of the right ~ passing through the point  2 G at time zero. It follows directly from invariant vector field X homomorphism (d=dt)

this observation that

X

  d X )) =t = dt (exp(X )) :

( ~ )(exp(

 =t

Indeed, there is a natural extension of this idea which generalizes to higher order derivatives. These derivatives can be combined into a Taylor theorem for analytic real functions on a Liegroup. Proposition 6.2.3 is proved in Varadarajan (Varadarajan 1984, pg. 96) and formalises this concept. Before this result can be stated it is necessary to introduce some notation. Notation: Let k

k  k2 : : : kn), with k1 , k2  : : : non-negative integers, represent a multiindex and denote its size by jkj = k1 + k2 + + kn . Let Z1  : : : Zn be n objects and let t = (t1  : : : tn ) be any set of n real numbers. The set of objects (in Proposition 6.2.3 the objects will be vector fields) of the form t1 Z1 + + tn Zn forms a vector space under = ( 1

addition and scalar multiplication. One also considers formal products of elements, for example

tZ tZ tZ

t21 t2 (Z1Z2 Z1), where the scalar multiplication is commutative but multiplication between elements Z1 and Z2 is non-commutative. One defines an additional element 1 = Z0 which acts as a multiplicative identity, Z0(t1 Z1 ) = (t1 Z1 ) = (t1 Z1)Z0 . Given a multi-index k = (k1 k2 : : : kn) consider a second multi-index (i1 : : : ijkj) with jkj entries ip 2 0 1 : : : n] such that the number of occurrences where ip = j for 1  j  n is exactly ( 1 1 )( 2 2 )( 1 1 ) =

164

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

kj . Let Z = t1 Z1 + + tn Zn then the formal power Z k is defined by

+ tn Zn)k = jk1j!

tZ

( 1 1+

X

tk tknn )(Zi Zi Zijkj ):

( 11

(i1i2 :::ijkj )

1

2

In other words, the sum is taken over all permutations of elements of the form (ti1 Zi1 )(ti2 Zi2 )

(tijkj Zijkj ) such that there are exactly k1 occurrences of t1 Z1 , k2 occurrences of t2 Z2 etc. Of course, if the size of jkj is equal to either zero or one then the situation is particularly simple t Z + tn Zn )k k (t1 Z1 + + tn Zn ) ( 1 1+

= =

1 for jkj = 0

tkj Zkj  for jkj = 1 where kj = 1 is the only nonzero element of k:

Proposition 6.2.3 Given G a Lie-group and 2 C1 (G) an analytic real function in a neigh-

bourhood of a point 

G.

2 G, let X1  : : : Xn 2 TeG be a basis for the identity tangent space of

~ Define the associated right invariant vector fields X i

=

dr Xi , for i = 1 : : : n and let k

represent a multi-index with n entries. The asymptotic expansion



;exp(t X 1

1+

+ tn Xn) =

1 tk tkn X n ((X 1 ~ + +X ~ n )k )( ) 1 k ! k ! 1 n jkj=0

:2:1)

1

(6

converges absolutely and uniformly in a neighbourhood of  .

G be a Lie-group and 2 C1(G) be an analytic map on G. Choose a basis X1  : : : Xn 2 TeG for the identity tangent space of G and define the associated right in~ i = dr Xi , for i = 1 : : : n. Expressing as a Taylor series around a variant vector fields X point  2 G one has Let



;exp(t X 1

1+

+ tnXn ) = () +

n X j =1

tj (X~ j )() + O(jjtjj2)

:2:2)

(6

where O(jjtjj2) represents the remainder of the Taylor expansion, all terms for which jkj 2.

~ i and discarding the By taking the derivative of this relation with respect to the vector fields X

higher order terms one obtains the approximation

n X

~ ~ X (X 1 + + tn Xn )  Xi ( ) + i ~ j )( )tj :

; X exp(t X ~

i

1

j =1

:2:3)

(6

x6-2

Newton-Raphson Algorithm on Lie-Groups

Define the Jacobi matrix of to be the n

165

n matrix with (i j )’th element

J )ij = (X~ iX~ j )()

:2:4)

(

(6

which is dependent on the choice of basis X1  : : : Xn for Te G. Define the two column vectors

t

t  : : : tn)T

= ( 1

and

X   : : : X~ n ())T .



 ( ) = ( ~1 ( )

Recalling the discussion of the

Newton-Raphson method on Rn it is natural to consider the following iteration defined for

2G

t 0

= =

;(J );1  () exp(t1 X1 + + tn Xn ):

The motivation for considering this algorithm parallels that given above for the NewtonRaphson method on Rn . If

 is a critical point of then X~i () = 0 for each X~ i . Thus assuming that exp(t1 X1 + + tn Xn ) =  and then solving the approximate relation for (t1  : : : tn ) gives a new estimate  0 = exp(t1 X1 + + tn Xn ) . It follows that if  was a good estimate of  then the difference between and the approximate Taylor expansion should be of order O(jjtjj2) and consequently the new estimate  0 will be a correspondingly better estimate of . Given an initial point 0 2 G and a choice of n basis elements fX1 : : : Xng for Te G the Newton-Raphson algorithm on G is: Algorithm 6.2.4 [Newton-Raphson Algorithm on a Lie-group G] Given k

2 G compute  (k ).

Compute the Jacobi matrix (Jk ) given by (Jk )ij

Set t = ;(Jk );1  (k ). Set k+1 Set k

=

=

X X k ).

= ( ~ i ~ j )(

exp(t1 X1 + + tn Xn )k .

k + 1 and repeat.

2

2 C1(G) an analytic real function with a critical point  2 G, let  2 G be arbitrary and define f : Rn ! G, f (t) := exp(t1X1 + + tn Xn ) Lemma 6.2.5 Given G a Lie-group and

G centred at  (Varadarajan 1984, pg. 88). ~ 1 (f (t)) : : : X ~ n (f (t))). An iteration Define a smooth vector field Z on Rn by Z (t) = (X of the Newton-Raphson algorithm (Algorithm 6.2.4) on G with initial condition  is the image to be canonical coordinates of the first kind on

166

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

of a single iteration of the Newton-Raphson algorithm (Algorithm 6.2.1) on Rn with initial condition 0 via the canonical coordinates f .

Proof Observe that

Z (0)

X f  ( ):

=

Also for 1  i j

 n one finds @ Z j  @ti t=0

= = =



d g (exp(rX )) since dr

matrices J0 Z

r=0

=

Xg e

= ~ ( )

 : : : X~ n (f (0)))

( ~ 1 ( (0))

=

@ X~ (f (t)) t=0 @ti j @ (X~  r )(exp(t X )) i i  @t j  i

ti =0

XX   ~i ~j ( )

for any

g 2 C 1 (G) and X 2 g.

Thus, the two Jacobi

J are equal. The Newton-Raphson algorithm on Rn is just t = 0 ; (J0 Z );1 Z (0) = ;(J );1 ():

and the image of t is exactly  0

on G.

=

exp(t1 X1 + + tn Xn ) the Newton-Raphson algorithm

It is desirable to prove a similar result to Proposition 6.2.2 for the Newton-Raphson method on a Lie-group. To compute the rate of convergence one needs to define a measure of distance in a neighbourhood of the critical point considered. Let  G be a neighbourhood of a critical

2 G of an analytic function 2 C 1(G) and let fX1 : : : Xng be a basis for TeG as above. There exists a subset U   such that the canonical coordinates of the first kind on G centred at , (t1  : : : tn) 7! exp(t1 X1 + + tn Xn ) are a local diffeomorphism onto U point 

(Helgason 1978, pg. 104). One defines distance within U by the distance induced on canonical coordinates centred at  by the Euclidean norm in Rn ,

jj exp(t1X1 + + tn Xn)jj =

X n i=1

( i )2

t

!

1 2

:

x6-2

Newton-Raphson Algorithm on Lie-Groups

167

2 C1(G) an analytic real function on a Lie-group G let  2 G be a critical point of . There exists a neighbourhood W G of p and a constant C > 0 such that the Newton-Raphson algorithm on G (Algorithm 6.2.4) converges to  for any initial estimate 0 2 W and the error, measured with respect to distance induced by canonical coordinates of Lemma 6.2.6 Given

the first kind, decreases quadratically

jjk+1 ; jj  C jjk ; jj2: Rn be an open neighbourhood of 0 in Rn and define a smooth vector by Z (x) = ~ (f (x)) : : : X ~ (f (x))), where f : Rn ! G, f (x) := exp(x1 X + + xn X ) are (X n n 1 1

Proof Let U1

canonical coordinates of the first kind. Since point of

Z , i.e. Z (0) = 0.

 is a critical point of then 0 is a stationary

Applying Proposition 6.2.2 one obtains an open neighbourhood

U2 U1 of 0 and a constant C1 such that the Newton-Raphson algorithm on Rn (Algorithm 6.2.1) converges quadratically to zero for any initial condition in U2 . A standard result concerning the exponential of the sum of two elements of a Lie-algebra is (Helgason 1978, pg. 106) exp(X ) exp(Y ) = exp((X + Y ) + O(jjX jj jjY jj))

g containing 0 and a real number C2 > 0 such that for X , Y 2  g there exists Z  2 g such that exp(Z ) = exp(X ) exp(Y ) and jjZ jj  C2jjX jj jjY jj. Of course Rn  g via the isomorphism x 7! x1 X1 + + xnXn and  corresponds to an open set U3 Rn . Let r > 0 such that the open ball Br = fx 2 Rn j jjxjj < rg is fully contained in U3 \ U2 . for X and Y sufficiently small. By this one means there exists an open set 



Let

U = x 2 Rn j x 2 B   = minfr=2 4(C W centred at . and define

=

 g  +C )

1 1

2

f (U ) G as the image of U via the canonical coordinates of the first kind

The proof proceeds by induction. Assume k

2 W and qk 2 U such that f (qk ) = k . Let

tk+1 denote the next iterate of the Newton-Raphson algorithm on Rn (Algorithm 6.2.1) with

168

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

initial condition q k . From above one has

jjtk+1 jj  C1jjqk jj2  12 jjqk jj U U1 and the second follows from the fact that jjqk jj < 4(C 1+C ) . Define tk+1 = tk+1 ; qk and observe that the affine change of basis x 7! x ; qk = x0 preserves the form of the Newton-Raphson algorithm (Algorithm 6.2.1) applied to the transformed vector field Z0 (x0) = Z (x0 + qk ) = Z (x). Thus, tk+1 is the next iterate of the Newton-Raphson algorithm (Algorithm 6.2.1) for the vector field Z0 and the where the first inequality follows from the fact that 1

2

initial condition 0. Moreover

x0 7! exp((x0)1X1 + + (x0)nXn )k  are the canonical coordinates of the first kind centred at k . In particular, applying Lemma 6.2.5, it follows that the next iterate of the Newton-Raphson algorithm on G (Algorithm 6.2.4)

k+1 is

Substituting  k

k+1 = exp(t1k+1 X1 + + tnk+1 Xn)k : =

exp(qk1 X1 + + qkn Xn ) one has

n X

k+1 = exp(

i=1

n X

tik+1 Xi) exp(

i=1

qki Xi):

But

jjtk+1 jj  jjtk+1 jj + jjqk jj  2jjqkjj 2 2Br=2 U3 and thus there exists qk+1 such that

n X

qki +1 Xi)

=

jjqk+1 ; (tk+1 + qk )jj

=

exp(

By construction k+1

=

i=1

n X

exp(

i=1

n X

tik+1 Xi) exp(

i=1

qki Xi)

jjqk+1 ; tk+1 jj  C2 jjqkjjjjtk+1 jj  2C2jjqkjj2:

P exp( ni=1 qki +1 Xi ) and

x6-3

Coordinate Free Newton-Raphson Methods

169

jjqk+1 jj  jjqk+1 ; tk+1 jj + jjtk+1 jj = 2C2 jjqk jj2 + C1 jjqk jj2 : To see that the sequence qk+1 does in fact converge to zero one observes that jjqk+1 jj 

1 2

jjqk jj

since jjqk jj < 4(C11+C2 ) . Observing that qk+1 is just the representation of the next iterate k+1 of the Newton-Raphson algorithm on G (Algorithm 6.2.4) in local coordinates, one has

jjk+1 ; jj  C jjk ; jj2: where C

=

2C2 + C1 and the proof is complete.

Remark 6.2.7 An interesting observation is that though each single iteration of the NewtonRaphson algorithm (Algorithm 6.2.4) on

G is equivalent to an

iteration of the Euclidean

Newton-Raphson algorithm (Algorithm 6.2.1) in a certain set of local coordinates this is not

2

true of multiple iterations of the algorithm and the same coordinate chart.

6.3 Coordinate Free Newton-Raphson Methods The construction presented in the previous section for computing the Newton-Raphson method on a Lie-group G depends on the construction of the Jacobi matrix J (cf. (6.2.4)) which is explicitly defined in terms of an arbitrary choice of n basis vectors fX1 : : : Xng for Te G. In

this section the Newton-Raphson algorithm on an arbitrary Lie-group equipped with a right invariant Riemannian metric is formulated as a coordinate free manner iteration. Let G be a Lie-group with an inner product ge (  ) defined on Te G. Denote the right invari-

ant group metric that ge generates on G by g (cf. Section 5.3). Choose a basis fX1 : : : Xng for Te G which is orthonormal with respect to the inner product ge (  ), (i.e. ge (Xi Xj ) = ij , where ij is the Kronecker delta function, ij

=

0 unless i = j in which case ij

the right invariant vector fields

X~ i = drXi 

=

1). Define

170

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

associated with the basis vectors fX1 : : : Xn g. Since the basis fX1 : : : Xn g was chosen to

be orthonormal it follows that the decomposition of an arbitrary smooth vector field Z can be written

Z= In particular, let

n X j =1

zj X~j =

n X j =1

2 D(G),

g (X~ j  Z )X~ j :

2 C1(G) be an analytic real map on G and grad be defined with respect

to the metric g (cf. Section 5.4)

grad =

n X j =1

g(X~ j  grad )X~ j =

n X j =1

X X:

( ~j ) ~j

:3:1)

(6

t = (t1  : : : tn ) 2 Rn and define the vector field X~ 2 D(G) by X~ = Pnj=1 tj X~ j which P is the right invariant vector field associated with the unique element X = nj=1 tj Xj 2 Te G. P ~ )()t = (X ~ )( ) and consequently post-multiplying (6.2.3) by X ~ Observe that nj=1 (X j j i and summing over i = 1 : : : n one obtains the approximation Let

n X i=1

X

( ~ i )(exp(

X ))X~ i  =

n X i=1

X  X

( ~ i ( )) ~ i +

grad



1 ~ j )( )A X ~i (tj X

0 n n X @~ X i=1

Xi

X  :

j =1

( ) + grad( ~ )( )

Now assuming that exp(X ) is a critical point of then computing the regression vector for the Newton-Raphson algorithm is equivalent to solving the coordinate free equation ~ )( ) 0 = grad ( ) + grad(X ~ (or equivalently the tangent vector X for the vector field X

:3:2)

(6

2 TeG that uniquely defines X~ ). In

Algorithm 6.2.4 the choice of fX1 : : : Xng was arbitrary and it follows that solving directly

t X1 + + tn Xn, where t = (t1 : : : tn ) is the error estimate t = ;(J ) ( ). Given an initial point 0 2 G the Newton-Raphson algorithm on a Lie-group G can be written in a coordinate free form as: ~ using (6.3.2) is equivalent to setting X for X

= 1

x6-3

Coordinate Free Newton-Raphson Methods

171

Algorithm 6.3.1 [Coordinate Free Newton-Raphson Algorithm]

2 TeG such that X~ k = dr X k solves

Find X k

~ k )( ): 0 = grad (k ) + grad(X k

Set k+1

Set k

=

=

exp(X k )k .

k + 1 and repeat.

2

~ ) one may use the identity To compute grad(X ~ ) Y ~ )( ) g(grad(X

= =

~ ( ) Y~ X  @ 2 (exp(t Y )(exp(t X ) ) 1 2 t t =0  @t1 @t2 1

2

where Y~ is an arbitrary right invariant vector field. Explicitly computing the derivatives on the

~ ) since the metric g is positive right hand side for arbitrary Y~ completely determines grad(X

definite. An example of the nature of this computation is given in the next section.

Remark 6.3.2 Without the insight provided by the Taylor expansion (Proposition 6.2.3) one may guess the Newton-Raphson algorithm would be given by solving 0 = grad + rX~ grad

r is the Levi-Civita connection. However, ~ 2 D(G) be an arbitrary right invariant ~ ) except in particular cases. Let Y rX~ grad 6= grad(X vector field, then from (5.7.9), one has rX~ g (grad  Y~ ) = g (rX~ grad  Y~ ) + g (grad  rX~ Y~ ). ~Y ~ while r Y ~ ~ ~ ~ Now rX~ g (grad  Y~ ) = rX~ Y~ = X X~ = rY~ X + X Y ] since the Levi-Civita for the right invariant vector field

X~ ,

where

connection is symmetric. Thus, one obtains 0

= =

~ Y ~ ]) ;X~ Y~ + g(rX~ grad  Y~ ) + g(grad  rY~ X~ ) + g(grad  X ~ + g (r grad  Y ~ ) + g (grad  r X ~ ;Y~ X X~ Y~ )

172

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

and consequently ~ ) Y ~ ) = g (r ~ grad  Y ~ ) + g (grad  r ~ X ~ g(grad(X X Y ):

The value of

rY~ X~

is given by the unique bilinear map associated with the right invari-

ant affine connection

r (cf. Section 5.8).

One has

rX~ grad

=

~ ) if and only if grad(X

g (grad  rY~ X~ ) = 0 for all Y~ . The most likely situation for this to occur is when the bilinear

map associated with r is identically zero. For the examples considered in this thesis this will

2

not be true.

6.4 Symmetric Eigenvalue Problem In this section the general structure developed in the previous two sections is used to derive a coordinate free Newton-Raphson method for the symmetric eigenvalue problem. An advantage of considering the symmetric eigenvalue problem is that one can compare the Newton-Raphson algorithm to classical methods such as the shifted

QR

algorithm. This provides a good

perspective on the performance of the Newton-Raphson algorithm, however, I stress that the method is not proposed as competition to state of the art numerical linear algebra methods for solving the classical symmetric eigenvalue problem. Rather, the focus is still on adaptive and on-line applications. Recall the constrained optimization problem that was posed in Chapter 2 for computing the spectral decomposition of a matrix H . It was shown that minimising the functional

(H )

:= =

jjH ; Djj2 jjH jj2 + jjDjj2 ; 2tr(DH )

on the set4

M (H0) = fUH0U T j U 2 O(N )g H0 = H0T  where

:4:1)

(6

D = diag(1  : : : N ) (a diagonal matrix with independent eigenvalues) is equivalent

4 The original definition (2.1.2) is slightly different M (H0 ) = fU T H0 U j U 2 O(N )g to the definition used here. The map U 7! U T H0 U , however, is not a group action and the definition given above is equivalent to (2.1.2).

x6-4

Symmetric Eigenvalue Problem

to computing the eigenvalues of

H (Brockett 1988, Helmke & Moore 1994b).

173

To apply the

theory developed in the previous section one must reformulate this optimization problem on

O(N ) the Lie-group associated with the homogeneous space M (H0).

The new optimization

problem that considered is:

H0T 2 S (N ) be a symmetric matrix and let D = diag(1  : : : N ) be a diagonal matrix with real eigenvalues 1 > : : : > N . Consider the potential Problem A Let H0

=

(U )

: :=

O(N ) ! R ;tr(DUH0U T ):

Find an orthogonal matrix which minimises over O(N ). It is easily seen that if one computes a minimum minimum of

.

U of Problem A then U H0UT

2 is a

Recalling Section 5.4 one easily verifies that the minimising gradient flow

solutions to Problem A will map via the group action to the minimising gradient flow associated with (Helmke & Moore 1994b, pg. 50). Computing a single iteration of the Newton-Raphson method (Algorithm 6.3.1) relies on ~ ) for an arbitrary right invariant vector field X ~ . Recall the computing both grad and grad(X

discussion in Section 6.1 regarding the Riemannian geometry of O(N ).

H0 = H0T 2 S (N ) be a symmetric matrix and let D = diag(1  : : : N ) be a diagonal matrix with real eigenvalues 1 > : : : > N and let

Lemma 6.4.1 Let

(U )

: :=

O(N ) ! R ;tr(DUH0U T ):

Express the tangent spaces of O(N ) by (6.1.1) and consider the right invariant group metric (6.1.2). a) The gradient of on O(N ) is grad = ;UH0 U T  D]U:

174

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

2 Sk(N ) be arbitrary and set X~ = XU = drU X the right invariant vector field

b) Let X

~ on O(N ) is on O(N ) generated by X . The gradient of X

~ ) = X D] UH U T ]U: grad(X 0

Proof Recall the definition of gradient (5.4.1) and (5.4.2). The Fr´echet derivative of direction U

2 TU O(N ) is

D jU (U )

= =

in a

;tr(DUH0U T ) ; tr(DUH0(U )T ) tr(;D UH0U T ]T ) = g (;D UH0U T ]U U ):

Observing that ;D UH0U T ]U

2 TU O(N ) completes the proof of part a).

For part b) observe that ~ X

= =

D jU (XU ) tr(X D]UH0U T ):

Taking a second derivative of this in an arbitrary direction U one obtains



Dtr(X D]UH0U T )U (U )

=

tr(UH0U T  X D]])

=

g (X D] UH0U T ]U U ):

~ ) = X D] UH0U T ]U . and thus grad(X

Recall the equation for the coordinate free Newton-Raphson method (6.3.2). Rewriting this in terms of the expression derived in Lemma 6.4.1 gives the algebraic equation 0 = ;UH0U T  D]U + X D] UH0U T ]U which one wishes to solve for X

2 Sk(N ).

Remark 6.4.2 To see that a solution to this equation exists observe that given a general linear solution X

2 Rn

n (which always exists since the equation is a linear systems of N 2 equations

x6-4

Symmetric Eigenvalue Problem

175

in N 2 unknowns) then (

;X T ) D] UH0U T ]

= = =

;D X ]T  UH0U T ] ;X D] UH0U T ]T ;UH0U T  D]T = UH0U T  D]:

T Thus, ;X T is also a solution and by linearity so is (X ;2X ) . The question of uniqueness for

X 2 Sk(N ) obtained is unclear. In the case where UH0U T = is diagonal with distinct eigenvalues it is clear that X D] ] = 0 =) X D] = 0 =) X = 0 and the solution is unique. As a consequence a genericity assumption on the eigenvalues of H0 would

the solution

need to be made to obtain a general uniqueness result. I expect that once such an assumption

H0 the skew solution of the linear system would be unique. Unfortunately I have no proof for this result at the present time. 2 is made on the eigenvalues of

Given an initial matrix

H0 and choosing U0 = In then the Newton-Raphson solution to

Problem A is: Algorithm 6.4.3 [Newton-Raphson Algorithm for Spectral Decomposition]

2 Sk(N ) such that

Find Xk

Xk  D] UkH0 UkT ] = Uk H0UkT  D]:

:4:2)



Set Uk+1 Set k

=

=

(6

eXk Uk , where eXk is the matrix exponential of Xk .

k + 1 and repeat.

2

Remark 6.4.4 To solve (6.4.2) one can reformulate the matrix system of linear equations as a constrained vector linear system. Denote by vec(A) the vector generated by taking the columns

of A

2 Rl

m (for l and m arbitrary integers) one on top of the other. Taking the vec of both

sides of (6.4.2) gives5 (

5



DUH0U T )T  IN ; (UH0U T )  D ; D  (UH0U T ) + IN  (UH0U T D) = vec(UH0U T  D]):

Let A,

B and C be real N N matrices and let Aij denote the ij ’th entry of the matrix A.

vec(Xk ) (6.4.3)

The Kronecker

176

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

Distance || Hk – D ||

Newton–Raphson

Gradient Descent

Iteration

jj ; jj

Figure 6.4.1: Plot of H k D where Hk = Uk H0 UkT and Uk is a solution to both (6.4.4) and Algorithm 6.4.2. The eigenvalues of H0 are chosen to be (1  : : : N ) the eigenvalues of D though H0 is not diagonal. Thus, the minimum Euclidean distance between Hk M (H0) and D is zero. By plotting the Euclidean norm distance Hk D on a logarithmic scale the quadratic convergence characteristics of Algorithm 6.4.2 are displayed.

2

jj ; jj

The constraint Xk

2 Sk(N ) can be written as a vector equation IN

(

where P is the N 2

2

+

P )vec(Xk ) = 0

N 2 permutation matrix such that vec(A) = P vec(AT ), A 2 RN

N.

In practice, it is known that a skew symmetric solution to (6.4.3) exists and one proceeds

N (N ; 1) submatrix of the N 2 N 2 Kronecker product and using Gaussian elimination to solve for the free variables Xij , i > j . 2

by extracting the

1 2

N (N ; 1)

1 2

Of course a Newton-Raphson algorithm cannot be expected to converge globally in O(N ) and for arbitrary choice of

H0 one must couple the Newton-Raphson algorithm with some

other globally convergent method to obtain a practical numerical method. In the following simulations the associated orthogonal iteration described in Section 2.5 is used. In fact the product of two matrices is defined by

0AB A B =B @ ...

A1nB

11



An1 B

.. .

Ann B

1 CA RN N : 2

2

2

A readily verified identity relating the vec operation and Kronecker products is (Helmke & Moore 1994b, pg. 314) vec(ABC ) = (C T



A)vecB:

x6-4

Symmetric Eigenvalue Problem

177

algorithm implemented is a slight variation of (2.5.1)

Uk+1 = e;k Uk H UkT D]Uk 

:4:4)

(6

0

where the modification is due to the new definition (6.4.1) of M (H0). The step size selection method used is that given in Lemma 2.2.4

k  D]jj2

k = 2jjH1 D]jj log( jjH jjjjjjHD + 1) H  D]]jj k

where

k

0

Hk = Uk H0 UkT and Uk is a solution to (6.4.4).

The minor difference between (6.4.4)

and the associated orthogonal double-bracket algorithm (2.5.1) does not affect the convergence results proved in Chapter 1. It follows that (6.4.4) is globally convergent to an orthogonal matrix U such that U H0UT is a diagonal matrix with diagonal entries in descending order. Figure 6.4.1 is an example of (6.4.4) combined with the Newton-Raphson algorithm 6.4.3. The aim of the simulation is to display the quadratic convergence behaviour of the NewtonRaphson algorithm. The initial condition used was generated via a random orthogonal congruency transformation of the matrix D = diag(1 2 3),

0 1 2 : 1974 ; 0 : 8465 ; 0 : 2401 BB CC B H0 = B ;0:8465 2:0890 ;0:4016 C CA : @ ;0:2401 ;0:4016 1:7136

Thus, the eigenvalues of H0 are 1, 2 and 3 and the minimum distance between D and M (H0) is

zero. In Figure 6.4.1 the distance jjHk ; Djj is plotted for Hk

=

Uk H0UkT and Uk is a solution

to both (6.4.4) and Algorithm 6.4.2. In this example the modified gradient descent method (6.4.4) was used for the first six iterations and the Newton-Raphson algorithm was used for the remaining three iterations. The plot of

jjHk ; Djj measures the absolute Euclidean distance

between Hk and D. Naturally, there is some distortion involved in measuring distance along the

surface of M (H0), however for limiting behaviour, jjHk ; Djj is a reasonable approximation of distance measured along

M (H0).

The distance

jjHk ; Djj is expressed on a log scale to

show the linear and quadratic convergence behaviour. In particular, the quadratic convergence behaviour of the Newton-Raphson algorithm is displayed by iterations seven, eight and nine in Figure 6.4.1.

178

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Iteration 0 1 2 3 4 5 6 7 8

Hk )21

(

2 1.6817 1.6142 1.6245 1.6245 1.5117 1.1195 0.7071 converg.

Hk )31

(

zero

Hk )41

(

zero

Hk )32

(

4 3.2344 2.5755 1.6965 0.0150 10;9 converg.

Hk )42

(

zero

Chapter 6

(

Hk )43

6 0.8649 0.0006 10;13 converg.

Table 6.4.1: The evolution of the lower off-diagonal entries of the shifted QR method described by Golub and Van Loan (1989, Algorithm 8.2.3., pg. 423). The initial condition used is H 0 (6.4.5).

To provide a comparison of the coordinate free Newton-Raphson method to classical algorithms the following simulation is completed for both the Newton-Raphson algorithm and the shifted

QR algorithm (Golub & Van Loan 1989, Section 8.2).

The example chosen is

taken from page 424 of Golub and Van Loan (1989) and rather than simulate the symmetric

QR algorithm again the results used are taken directly from the book.

The initial condition

considered is the tridiagonal matrix

0 B B B H0 = B B B B @

1 CC 0 C CC : 6 C CA

1 2 0 0 2 3 4 0 4 5

:4:5)

(6

0 0 6 7

To display the convergence properties of the QR algorithm (Golub & Van Loan 1989) give a table in which they list the the values of the off-diagonal elements of each iterate generated for the example considered. This table is included (in a slightly modified format) as Table 6.4.

Hk)ij is said to have converged when it has norm of order 10;12 or smaller. The initial condition H 0 is tridiagonal and the QR algorithm preserves tridiagonal structure and consequently the elements (Hk )31 , (Hk )41 and (Hk )42 remain zero for all iterates. The convergence behaviour of the symmetric QR algorithm is cubic in successive off-diagonal entries. Thus, (Hk )43 converges cubically to zero, then (Hk )32 converges cubically and so on Each element

(

(Wilkinson 1968). The algorithm as a whole, however, does not converge cubically since each

x6-4

Symmetric Eigenvalue Problem

179

off-diagonal entry must converge in turn. It is interesting to also display the results in a graphical format (Figure 6.4.2). Here the norm jjHk ; diag(Hk )jj

 jjHk ; diag(Hk )jj = (Hk )221 + (Hk )231 + (Hk )241 + (Hk )232 + (Hk)242 + (Hk )243 : 1 2

is plotted verses iteration. This would seem to be an important quantity which indicates robustness and stability margins of the numerical methods considered when the values of Hk are uncertain or subject to noise in an on-line or adaptive environment. The dotted line show the behaviour of the QR algorithm. The plot displays the property of the QR algorithm that it must be run to completion to obtain a solution. Figure 6.4.2 also shows the plot of jjHk ; diag(Hk )jj for a sequence generated initially by the modified gradient descent algorithm (6.4.4) (the first five iterations) and then the NewtonRaphson algorithm (for the remaining three iterations). Since the aim of this simulation is to show the potential of Newton-Raphson algorithm the parameters were optimized to provide good convergence properties. The step-size for (6.4.4) was chosen as a constant k

=

0:1

which is somewhat larger than the variable step-size used in the first simulation. This ensures slightly faster convergence in this example, although in general there are initial conditions H0 for which the modified gradient descent algorithm is unstable with step-size selection fixed at 0.1. The point at which the modified gradient descent algorithm was halted and the NewtonRaphson algorithm was begun was also chosen by experiment. Note that the Newton-Raphson method acts directly to decrease the cost jjHk ; diag(Hk )jj, at least in a local neighbourhood of the critical point. It is this aspect of the algorithm that suggests it would be useful in an on-line or adaptive environment.

Remark 6.4.5 It is interesting to note that in this example the combination of the modified gradient descent algorithm (6.4.4) and the Newton-Raphson method (Algorithm 6.4.2) converges in the same number of iterations as the QR algorithm.

2

180

Numerical Optimization on Lie-Groups and Homogeneous Spaces

Chapter 6

|| H k – diag ( Hk ) ||

QR algortihm

Gradient descent Newton–Raphson

Iteration

jj ;

jj

Figure 6.4.2: A comparison of Hk diag(Hk ) where Hk is a solution to the symmetric QR algorithm (dotted line) and H k = Uk H0 UkT for Uk a solution to both (6.4.4) and Algorithm 6.4.2 (solid line). The initial condition is H 0 (6.4.5).

Iteration 0 1 2 3 4 5 6 7 8

(

Hk)21

2 2.5709 3.7163 4.7566 1.1572 -0.0690 0.0011 converg.

Hk )31

(

0 -0.0117 -0.2994 -0.7252 -0.2222 -0.0362 10;6 10;9 converg.

Hk )41

(

0 -0.0233 0.2498 -0.1088 -0.8584 0.0199 10;5 10;10 converg.

Hk )32

(

4 4.9252 4.3369 2.5257 1.1514 -0.1112 10;5 10;10 converg.

Table 6.4.2: The evolution of the lower off-diagonal entries of H k solution to Algorithm 6.4.2. The initial condition is H 0 (6.4.5).

Hk)42

Hk )43

(

(

0 -0.4733 -0.2838 -0.0176 -0.1216 0.0649 10;6 10;9 converg.

6 4.0717 1.4798 0.8643 0.2822 0.0075 0.0011 10;11 converg.

=

Uk H0UkT where Uk is a

6.5 Open Questions and Further Work There are several issues that have not been resolved in the present chapter. In Section 6.1 it is concluded that the modified gradient descent algorithms proposed in Chapters 2 to 4 can be interpreted as geodesic interpolations of gradient algorithms. This provides a template for generating numerical algorithms that solve optimization problems on homogeneous spaces which have Lie transformation group O(N ), however things are somewhat more complicated if one considers general matrix Lie-groups. Certainly, it is a simple matter to derive exponential interpolation algorithms based on the same ideas and it would be interesting to investigate the relationship between exponential and geodesic interpolations for GL(N R). The full Newton-Raphson algorithm could also benefit from further study. In particular,

x6-5

Open Questions and Further Work

181

issues relating to rank degeneracy in the Jacobi matrix need to be addressed. These issues are important since many relevant optimization problems are defined on a homogeneous space of lower dimension than its Lie transformation group. In this situation the lifted potential on the Lie-group will certainly have level sets of non-zero dimension and there will be directions in which the Jacobi matrix is degenerate. This issue is related to the difficulties encountered in determining whether a unique solution exists (6.4.2). Once the Newton-Raphson method on a Lie-group is fully understood it should be a simple matter to generalize the theory to an arbitrary homogeneous space. If there is an associated drop in dimension this may result in computational advantages and the development of algorithms that do not suffer from the degeneracy problems discussed above. It is also interesting to consider the computational cost of the Newton-Raphson method relative to classical algorithms such as the

QR

method. One would hope that the total

computational cost of a single step of the Newton-Raphson method would be comparable to taht of a step of the QR method, especially if the matrix linear systems can be solved using parallel algorithms. The relationship between the modified gradient descent algorithms, the Newton-Raphson algorithm and modern integration techniques that preserve a Hamiltonian function (Sanz-Serna 1991, Stuart & Humphries 1994) is worth investigating. It is hoped that the insights provided by Hamiltonian integration techniques along with the perspective given by the present work can be combined to design efficient optimization methods that preserve homogeneous constraints.

Chapter 7

Conclusion

7.1 Overview The following summary outlines the contributions of this thesis. Chapter 2: Two numerical algorithms are proposed for the related tasks of estimating the eigenvalues of a symmetric matrix and estimating the singular values of an arbitrary matrix. Associated algorithms which compute the eigenvectors and singular vectors associated with the spectral decomposition of a matrix are also presented. The algorithms are based on gradient descent methods and evolve explicitly on a homogeneous constraint set. Step-size selection criteria are developed which ensure good numerical properties and strong stability results are proved for the proposed algorithms. To reduce computational cost on conventional machines a Pad´e approximation of the matrix exponential is proposed which also explicitly preserves the homogeneous constraint. An indication is given of the manner in which a time-varying symmetric eigenvalue problem could be solved using the proposed algorithms. Chapter 3: The problem of principal component analysis of a symmetric matrix is considered as a smooth optimization problem on a homogeneous space. A solution in terms of the limiting solution of a gradient dynamical system is proposed. It is shown that solutions to the dynamical system considered do indeed converge to the desired limit for almost all initial conditions. A modified gradient descent algorithm, based on the gradient dynamical system solution, 182

x7-1

Overview

183

is proposed which explicitly preserves the homogeneous constraint set. A step-size selection scheme is given along with a stability analysis that shows the numerical algorithm proposed converges for almost all initial conditions. Comparisons are made between the proposed algorithm and classical methods. It is shown that in the rank-1 case the modified gradient descent algorithm is equivalent to the classical power method and steepest ascent method for computing a single dominant eigenvector of a symmetric matrix. However this equivalence does not hold for higher dimension power methods and orthogonal iterations. Chapter 4: The problems of system assignment and pole placement are considered for the set of symmetric linear state space systems. A major contribution of the chapter is the observation that the additional structure inherent in symmetric linear systems forces the solution to the “classical” pole placement question to be considerably different to that expected based on intuition obtained from the general linear case. In particular, generic pole placement can not be achieved unless the system considered has as many inputs (and outputs) as states. To compute feedback gains which assign poles as close as possible to desired poles (in a least squares sense) a number of ordinary differential equations are proposed. By computing the limiting solution to these equations for arbitrary initial conditions, estimates of locally optimal feedback gains are obtained. A gradient descent numerical method, based on the dynamical systems developed, is presented along with a step-size selection scheme and full stability analysis. Chapter 5: A review of the mathematical theory underlying the numerical methods proposed in Chapters 2 to 4 is given. A brief review of Lie-groups and homogeneous spaces is given, especially the class of homogeneous space which is most common in linear systems theory, orbits of semi-algebraic Lie-groups. A detailed discussion of Riemannian metrics on Liegroups and homogeneous spaces is provided along with the motivation for the choice of the metrics used elsewhere in this thesis. The derivation of gradient flows and the relationship between gradient flows on a homogeneous space and its Lie transformation group is covered. The convergence properties of gradient flows are discussed and a theorem is proved which is useful for proving convergence of gradient flows in many practical situations. The remainder of the chapter works towards developing a practical understanding of geo-

184

Conclusion

Chapter 7

desics on Lie-groups and homogeneous spaces. The theory of Lie-algebras is discussed and the exponential map is introduced. Affine connections are discussed and the the Levi-Civita connection associated with a given Riemannian metric is introduced. Geodesics are defined and the theory of right invariant affine connections is used to derive conditions under which the exponential map generates a geodesic curve on a Lie-group. Finally, it shown that geodesics on a Lie-group G are related to geodesics on a homogeneous space (with Lie transformation group G) via the group action.

Chapter 6: The numerical algorithms proposed in Chapters 2 to 4 are reconsidered in the context of the theory developed in Chapter 5. The proposed algorithms are seen to be specific examples of general gradient descent methods using geodesic interpolation. The remainder of the chapter is devoted to developing a Newton-Raphson algorithm which evolves explicitly on an arbitrary Lie-group. The iteration is derived in canonical coordinates of the first kind and then generalised into a coordinate free form. A theorem is given proving quadratic convergence in a local neighbourhood of a critical point. An explicit NewtonRaphson algorithm, based on the general theory developed, is derived for the symmetric eigenvalue problem.

7.2 Conclusion A primary motivation for considering the problems posed in this thesis is the recognition of the advantages of numerical algorithms that exploit the natural geometry of the constrained optimization problem that they attempt to solve. This idea is especially important for on-line and adaptive engineering applications where the properties of simplicity, global convergence and constraint stability (cf. page 2) become the principal goals. The starting point for the new results proposed in this work is the consideration of constrained optimization problems on homogeneous spaces and Lie-groups. The regular geometry associated with these sets is suitable for the constructions necessary to develop practical numerical methods. Moreover, there are numerous examples of constrained optimization problems arising in linear systems theory where the constraint set is a homogeneous space or Lie-group. The early results presented in Chapters 2 to 4 of this thesis do not rely heavily on abstract

x7-2

Conclusion

185

Lie theory. Nevertheless, the algorithms proposed are specific examples of a more general construction outlined in Chapter 6. For any smooth optimization problem on an orbit (cf. Section 5.2) of the orthogonal group O(N ) this construction can be summarised as follows: 1. Given a smooth homogeneous space M embedded in Euclidean space with transitive Lie transformation group O(N ) and group action : O(N )

M ! M , let : M ! R be

a smooth cost function.

2. Equip $O(N)$ with the right invariant group metric induced by the Euclidean metric acting on the identity tangent space, i.e. for $\Omega_1, \Omega_2 \in \mathrm{Sk}(N)$ and $U \in O(N)$ one has $\Omega_1 U, \Omega_2 U \in T_U O(N)$ and
\[ \langle \Omega_1 U, \Omega_2 U \rangle = \operatorname{tr}(\Omega_1^T \Omega_2). \]

3. Fix $q \in M$ and define the lifted potential
\[ \tilde{\phi} : O(N) \to \mathbb{R}, \qquad \tilde{\phi}(U) := \phi(\alpha(U, q)). \]

4. Compute the gradient descent flow
\[ \dot{U} = -\operatorname{grad} \tilde{\phi}(U) \]
on $O(N)$ with respect to the metric $\langle \cdot, \cdot \rangle$.

5. The modified gradient descent algorithm for $\tilde{\phi}$ using geodesic interpolation on $O(N)$ is
\[ U_{k+1} = e^{-s_k \operatorname{grad} \tilde{\phi}(U_k) U_k^T} U_k, \]
where $s_k > 0$ is a small positive number.

6. The modified gradient descent algorithm using geodesic interpolation for $\phi$ on $M$ is
\[ p_k = \alpha(U_k, q). \]


7. Determine a step-size selection scheme $f : M \to \mathbb{R}$, $s_k := f(p_k)$, which guarantees $\phi(p_{k+1}) < \phi(p_k)$ except in the case where $p_k$ is a critical point of $\phi$. (A numerical sketch of the full construction is given at the end of this section.)

In Chapter 6 a general construction is outlined for computing a Newton-Raphson algorithm on a Lie-group (Algorithm 6.3.1). The advantage of this construction is that the algorithm generated converges quadratically in a neighbourhood of the desired equilibrium. There are two main disadvantages: firstly, convergence can only be guaranteed in a local neighbourhood of the equilibrium; secondly, the theoretical construction is complicated and relies on abstract geometric constructions.

A comparison between the gradient descent algorithm and the Newton-Raphson algorithm nicely displays the trade-off between a simple, linearly convergent numerical method with strong convergence theory and a numerical method designed to converge quadratically (or better) but with weaker convergence theory. The stability and robustness of the first approach suggest that it would be of use in on-line and adaptive engineering applications where reliability is more important than the computational cost of implementation. The second approach may also have applications in adaptive processes where accurate estimates are needed and the uncertainties are small.
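As a concrete instance of the seven-step construction above, consider the symmetric eigenvalue problem that recurs throughout the thesis. The sketch below specialises the construction under stated assumptions rather than reproducing the thesis's code: take $M$ to be the isospectral orbit $\{UQU^T : U \in O(n)\}$ of a symmetric matrix $Q$ and the cost $\phi(H) = \|H - N\|^2$ for a diagonal target $N$, so that the descent direction at $H_k$ is generated by the skew-symmetric commutator $\Omega_k = [H_k, N]$ and the geodesic step is the orthogonal matrix $e^{-s_k \Omega_k}$. The constant step size used here is a crude conservative choice standing in for the step-size selection scheme of step 7.

```python
import numpy as np
from scipy.linalg import expm

def bracket(A, B):
    # Matrix commutator [A, B] = AB - BA.
    return A @ B - B @ A

def geodesic_gradient_descent(Q, N, steps=1000):
    # Double-bracket gradient descent on the isospectral orbit of Q,
    # minimising ||H - N||^2.  Each update conjugates H by the orthogonal
    # matrix expm(-s*[H, N]), so every iterate is exactly isospectral to Q.
    H = Q.copy()
    U = np.eye(Q.shape[0])
    # Crude conservative constant step size (a stand-in for step 7).
    s = 1.0 / (4.0 * np.linalg.norm(Q, 2) * np.linalg.norm(N, 2))
    for _ in range(steps):
        Omega = bracket(H, N)        # skew-symmetric descent direction
        G = expm(-s * Omega)         # geodesic step on O(n)
        U = G @ U
        H = G @ H @ G.T              # H_{k+1} = e^{-s Omega} H_k e^{s Omega}
    return H, U

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
Q = A + A.T                          # random symmetric matrix
N = np.diag(np.arange(1.0, 6.0))     # diagonal target with distinct entries
H, U = geodesic_gradient_descent(Q, N)
print(np.round(H, 3))                # (generically) nearly diagonal
print(np.sort(np.diag(H)))           # approximates ...
print(np.linalg.eigvalsh(Q))         # ... the eigenvalues of Q
```

Because each update conjugates by an orthogonal matrix, the constraint is preserved exactly (in exact arithmetic), which is precisely the constraint-stability property that motivates the construction; the simplicity and linear convergence rate of this method sit at the opposite end of the trade-off from the locally quadratic Newton-Raphson iteration.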

