Optimal Control Approach For Large Scale Unconstrained Optimization Problems

Li-Zhi Liao
July 18, 1995

Abstract. We propose an optimal control approach to tackle large scale unconstrained optimization problems. Our discussion begins with the introduction of a new framework to describe and define the structure of a general function with a large number of variables. Based on the new concepts, a sufficient condition is established so that any large scale unconstrained optimization problem satisfying this sufficient condition can be solved in O(n) flops per iteration by optimal control algorithms with local second-order convergence, where n is the number of variables. This optimal control approach can reduce the computational cost by a factor of n compared with standard superlinearly or quadratically convergent optimization methods. Applying this sufficient condition to a collection of large scale unconstrained optimization problems from the literature indicates that the majority of these problems are amenable to this approach.

Key words: large-scale problems, optimal control, nonlinear optimization, numerical algorithms.

AMS subject classification: 90C06, 90C30, 65K10, 49N55.
Abbreviated title: Large scale optimization.

1 This work has been supported partially through a grant from the Advanced Scientific Computing Division of the National Science Foundation (ASC-9211109), and by the Cornell Theory Center, which receives funding from members of its Corporate Research Institutes, the National Science Foundation (NSF), the Advanced Research Projects Agency (ARPA), the National Institute of Health (NIH), New York State, and IBM Corporation.
2 This research was also supported partially by grant FRG/94-95/II-32 from Hong Kong Baptist University.
3 Department of Mathematics, Hong Kong Baptist University, 224 Waterloo Rd., Kowloon, Hong Kong. [email protected]


1. Introduction

We consider the following unconstrained nonlinear programming (NLP) problem

    \min_{x \in R^n} f(x)        (NLP)

where f ∈ C^2(R^n). In this paper, we are only interested in large scale problems, i.e., n is large (for example n ≥ 10,000). For unconstrained nonlinear programming problems in the form of (NLP), the application of standard optimization methods with at least superlinear convergence would require O(n^2) flops. As the problem size n increases, these methods become less and less attractive, and sometimes even impossible to implement for very large n. As a result, exploiting structure is not only necessary for efficiency but may also be the only possible way.

Since the 1980's, much research has been conducted to study and explore the structure of large scale optimization problems. Coleman [2] has provided a very good introduction to both theory and algorithms in the area of large sparse optimization. An overview and discussion of recent research developments for large scale optimization can be found in Coleman [3] and Conn et al. [6], [8]. Most large scale optimization studies associated with structure exploitation can be divided into two categories. The first category focuses on the structure or sparsity of the Hessian matrix ∇^2_{xx} f (see Gill and Murray [11], Nocedal [20] and Fletcher [10]). The article by Griewank and Toint [12] falls into this category. They discuss a partial separability approach to solve certain kinds of problems efficiently, and establish a very nice link between the depth of partial separability and the partial derivatives of f(x). They apply their approach to the class of Quasi-Newton updates in a partitioned fashion. The second category pays attention directly to the structure in the function f(x). The research in this second category is relatively new. A recent contribution here is the introduction of group partially separable functions by Conn et al. [6].

There are certain structured problems in the form of (NLP) that do not fit into either category. For example, in the following problem

    \min_{x \in R^n} f(x) = \sum_{i=1}^{n} (x_i - 2)^2 + \prod_{i=1}^{n} (x_i - 1)^2        (1.1)

where x_i is the i-th component of x, f(x) demonstrates some obvious structure. But this structure is not well represented in the Hessian matrix of f(x); as a matter of fact, in general the Hessian matrix is dense. In addition, this function f(x) is not group partially separable (see §3.3.1 of Conn et al. [7]) and the dimension of its invariant subspace is small (see §1.2 of Conn et al. [7]). Therefore, the structure demonstrated in (1.1) cannot be exploited by previous approaches.
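To make this concrete, here is a minimal Python sketch (our own illustration, not part of the original paper) that evaluates f in (1.1) in O(n) flops by accumulating the two terms in a single pass. The structure is plainly exploitable even though the Hessian coupling between every pair x_i, x_j through the product term makes the Hessian dense:

```python
import numpy as np

def f(x):
    """Evaluate f(x) = sum_i (x_i - 2)^2 + prod_i (x_i - 1)^2 from (1.1).

    Both terms are accumulated in one O(n) pass, even though the Hessian
    of the product term is dense (every pair x_i, x_j interacts)."""
    s = 0.0   # running value of the summation term
    p = 1.0   # running value of the product term
    for xi in x:
        s += (xi - 2.0) ** 2
        p *= (xi - 1.0) ** 2
    return s + p

x = np.full(10_000, 1.5)
print(f(x))   # cheap in O(n) despite the dense Hessian
```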

In this paper, we propose an optimal control approach to solve certain large scale unconstrained optimization problems, including problem (1.1), in O(n) flops and O(n) storage per iteration. Furthermore, based on a sufficient condition, a special class of unconstrained optimization problems can be defined so that any problem in this class can be solved in O(n) flops per iteration by optimal control algorithms which are locally quadratically convergent. Compared with standard superlinear or quadratic optimization methods, our new approach can reduce the computational cost and storage by a factor of n for many large scale unconstrained optimization problems.

This paper is organized as follows. Section 2 contains the motivation for our approach. In Section 3, we introduce a new framework, called Pattern Structure, to describe and define the structure of a general function with a large number of variables. In Section 4, we briefly discuss two optimal control algorithms along with their advantages over standard optimization methods for optimal control problems. In Section 5, we provide a sufficient condition so that any large scale unconstrained optimization problem satisfying this sufficient condition can be solved in O(n) flops by optimal control algorithms. In Section 6, we apply this sufficient condition to various published large scale unconstrained optimization problems. Finally, we close our discussion by addressing some future research directions and areas.

2. Motivation

As we discuss later in Section 4, if problem (NLP) fits into the format of discrete time optimal control problems, problem (NLP) can be solved by optimal control algorithms in O(n) flops per iteration with locally quadratic convergence. However, if the same problem (NLP) is solved by any standard Quasi-Newton method, O(n^2) flops would be required per iteration. The reduction of order n is extremely significant in the large scale case. The attractive O(n) flops and fast local convergence of optimal control algorithms motivate this research and the desire to investigate the conditions under which a function f(x) would fit into the format of discrete time optimal control problems.

The first report on the application of an optimal control approach to nonlinear programming problems was due to Murray and Yakowitz [19]. They apply the Differential Dynamic Programming algorithm (an optimal control algorithm) to several nonlinear programming problems and notice that the computational burden grows linearly with the number of variables. Our research was initially inspired by their study.

The research in this study can be viewed as two main parts. The first part (Section 3) introduces some concepts and definitions related to the structure in any function f(x). This part paves the way for our later discussion and analysis in Section 5. The second part establishes a sufficient condition so that problem (NLP) can be converted into a discrete time optimal control problem.

3. Pattern Structure

In this section, we initiate some Pattern concepts so that the structure in any function f(x) can be characterized and described. We will illustrate these concepts and definitions with example (1.1) throughout this section.

Function f(x) in (1.1) consists of two terms. The first term is the summation of the n elements (x_i - 2)^2, i = 1, ..., n; the second term is the product of the n elements (x_i - 1)^2, i = 1, ..., n. In general, we can say that each term consists of two components: elements

(such as (x_i - 2)^2 and (x_i - 1)^2) and operations (such as addition and multiplication) among those elements. The operations can be viewed as the forms of connection among the elements, and we define them as operators in the following:

a) An operator ⊙ : R^2 → R^1 is defined as follows: for any x, y ∈ R^1 there exists a z ∈ R^1 such that z = x ⊙ y, and the operator ⊙ satisfies the associative law. The operator set Ω is defined as

    Ω = {⊙ | ⊙ is defined above}.        (3.1)

Obviously, scalar addition and multiplication are two elements in Ω. In example (1.1), the operator in the first term is scalar addition, and the operator in the second term is scalar multiplication.

Now, let's turn our attention to the elements. In example (1.1), the elements (x_i - 2)^2, i = 1, ..., n and (x_i - 1)^2, i = 1, ..., n can be viewed as the basic units of the corresponding terms. In the following, we define these basic units along with the operators connecting them as patterns and abstract patterns.

b) For any x ∈ R^n, a pattern of x, say p(x), is defined as follows: there exist an integer l > 0, l functions φ_i : R^1 → R^1, i = 1, ..., l and l - 1 operators ⊙_i ∈ Ω, i = 1, ..., l - 1 such that

    p(x) = φ_1(x_{[1]}) ⊙_1 φ_2(x_{[2]}) ⊙_2 ... ⊙_{l-1} φ_l(x_{[l]})        (3.2)

where [i] ⊆ N = {1, 2, ..., n}, x_{[i]} = {x_j | j ∈ [i]} for i = 1, ..., l, and the order of execution for the operators ⊙_i, i = 1, ..., l - 1 is from left to right (this rule will be followed in the rest of the paper unless specified). For simplicity, from now on we will omit x in p(x) whenever no confusion can occur.

c) An abstract pattern of x, say p, is a pattern of x such that there exists a set I ⊆ N and the pattern p can be represented as

    p = \sum_{i ∈ I ⊆ N} a_i · φ_1(x_{I_1(i)}) ⊙_1 φ_2(x_{I_2(i)}) ⊙_2 ... ⊙_{l_I - 1} φ_{l_I}(x_{I_{l_I}(i)})        (3.3)

where I_k(i) ⊆ N, x_{I_k(i)} = {x_j | j ∈ I_k(i)} for k = 1, ..., l_I, i ∈ I; I_k(i) ≠ I_j(i) if k ≠ j for any k, j = 1, ..., l_I, i ∈ I; a_i ∈ R^1, i ∈ I; and l_I + |I| = Θ(n)† (|I| denotes the total number of elements in I).

† Θ(n) denotes a quantity for which there exist c_1, c_2 with 0 < c_1 < c_2 such that c_1 n ≤ Θ(n) ≤ c_2 n.

According to the above definitions, the first term of example (1.1) can be defined as an abstract pattern, say p_1, in the following way:

    p_1 = (x_1 - 2)^2 + (x_2 - 2)^2 + ... + (x_n - 2)^2

with I = N, l_I = 1, I_1(i) = {i}, a_i = 1, i ∈ I and φ(x) = (x - 2)^2. The second term of example (1.1) can be defined as an abstract pattern, say p_2, in the following way:

    p_2 = (x_1 - 1)^2 · (x_2 - 1)^2 · ... · (x_n - 1)^2

with I = {1}, l_I = n, I_k(i) = {k}, a_i = 1 for i ∈ I, every operator ⊙_k being scalar multiplication, and φ(x) = (x - 1)^2.

The following definitions are extensions of patterns and abstract patterns.

d) For any function f(x) ∈ R^1, x ∈ R^n, let us denote by P(f) the pattern set of f, which is defined as follows: i) P(f) is the set of different patterns and abstract patterns of x appearing in f(x); ii) f(x) can be represented by the elements of P(f), i.e., f(x) is a function of the elements of P(f).

Similarly, we can define the following:

e) The abstract pattern set of f, P_A(f), is defined as the subset of P(f) such that each element of P_A(f) is an abstract pattern of x.

f) The pattern number of f, N(f), is defined as the total number of patterns in P(f). (Note: N(f) depends on the definition of P(f). But for simplicity, we do not include P in the notation; this simplification is carried through the following definitions, which also depend on P(f).)

g) The abstract pattern number of f, N_A(f), is defined as the total number of abstract patterns in P_A(f).

The abstract pattern set P_A(f) can be further divided into two sets: P_{A1}(f), which contains the abstract patterns in P_A(f) such that for each p ∈ P_{A1}(f) its set I defined in (3.3) satisfies |I| = 1, and P_{A2}(f), which contains the remaining abstract patterns in P_A(f). Obviously, P_{A1}(f) ∪ P_{A2}(f) = P_A(f) and P_{A1}(f) ∩ P_{A2}(f) = ∅. Let N_{A1}(f) and N_{A2}(f) denote the number of patterns in P_{A1}(f) and P_{A2}(f), respectively.

In example (1.1), let p_1 and p_2 be the corresponding abstract patterns of the first and second terms as defined earlier. Then it is easy to verify that for example (1.1)

    P(f) = P_A(f) = {p_1, p_2},   P_{A1}(f) = {p_2},   P_{A2}(f) = {p_1}.

It is important to know that for any given function f(x), its set P_A(f) (and therefore the set P(f)) is not unique. This can be observed in the function \sum_{i=1}^{n} x_i^2, which can have the two different abstract pattern sets P_A(f) = {\sum_{i=1}^{n} x_i^2} and P_A(f) = {\sum_{i=1}^{n/2} x_i^2, \sum_{i=n/2+1}^{n} x_i^2}.

In this paper, we focus our attention on functions f(x) with N(f) independent of n, i.e., N(f) = O(1) (this is usually true for most large scale structured problems). Under this assumption, function f(x) can be simplified significantly in terms of its patterns. For any function f(x), we define

    F(p) = f(x)  where p ∈ P(f).        (3.4)

In f(x), the function f is represented by the components of x, while in F(p), the function F is represented by the patterns of f(x), i.e., the elements of P(f). Under the earlier definition of P(f) for example (1.1), example (1.1) becomes min p_1 + p_2 with some restrictions on p_1 and p_2. These restrictions will be discussed in Section 5.
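The following Python sketch is our own illustration of these definitions, not part of the original development; the class name and fields are hypothetical, and only single-index blocks I_k(i) = {k} are modeled, which suffices for example (1.1):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class AbstractPattern:
    """One abstract pattern p in the sense of (3.3): element functions phi
    applied to single components of x and chained by one associative operator."""
    phi: Callable[[float], float]        # element function
    op: Callable[[float, float], float]  # associative operator from the set Omega
    identity: float                      # e with e op z = z, used to start the fold

    def value(self, x: Sequence[float]) -> float:
        acc = self.identity
        for xi in x:
            acc = self.op(acc, self.phi(xi))
        return acc

# P(f) = P_A(f) = {p1, p2} for example (1.1)
p1 = AbstractPattern(phi=lambda t: (t - 2.0) ** 2, op=lambda a, b: a + b, identity=0.0)
p2 = AbstractPattern(phi=lambda t: (t - 1.0) ** 2, op=lambda a, b: a * b, identity=1.0)

def F(x):  # F(p) = f(x) as in (3.4)
    return p1.value(x) + p2.value(x)
```

The identity element attached to each operator reappears later in the construction (5.19), where the initial state [y_1]_i is taken to be the identity of the first operator in the pattern.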

With the establishment of Pattern Structure, all patterns and abstract patterns of any function can be well identified and defined. In the following sections, we explore the application of Pattern Structure in solving problem (NLP). But first, let's look at two optimal control algorithms.

4. Optimal Control Methods

The discrete time optimal control problem is a very popular class of optimization problem (see [23]). In general, these problems can be represented in the following format:

    \min_{(u_1, ..., u_N)} \sum_{t=1}^{N} g(x_t, u_t, t) + g(x_{N+1}, N+1)        (DOCP)
    where x_{t+1} = T(x_t, u_t, t),  t = 1, ..., N,
          x_1 ≡ \bar{x}_1 given and fixed,

where x_t ∈ R^n is the state variable; u_t ∈ R^m is the control variable; g : R^n × R^m × R^1 → R^1 is the loss function; T : R^n × R^m × R^1 → R^n is the transition function; t is the time step or time period; and N is the total number of time periods (in general, N is large).

In problem (DOCP), the transition functions T look like constraints, while each state variable x_t behaves like an intermediate parameter. But after a careful study of problem (DOCP), it can be seen that both the constraints T and the intermediate parameters x_t can be removed by replacing x_{t+1} with T(x_t, u_t, t) recursively backward in time from t = N to t = 1. Therefore, the resulting problem becomes an unconstrained nonlinear programming problem in the form of (NLP). The transformation of problem (DOCP) into (NLP) confirms that the optimal control problem in the format of (DOCP) is a subclass of unconstrained optimization problems (obviously, problem (DOCP) can also be viewed as a constrained optimization problem).

Although problem (DOCP) can be solved by any standard optimization method, it would be wasteful to ignore the special structure in problem (DOCP). By recognizing and taking advantage of this special structure, many optimal control algorithms have been proposed to solve problem (DOCP) efficiently (Mitter [17], Mayne [16], Jacobson [13], Ohno [21], Pantoja [22], Dunn and Bertsekas [9]). Many algorithms are discussed in the book by Bryson and Ho [1] and the article by Yakowitz [24]. The most significant advantage of these optimal control algorithms is that the computational complexity per iteration is a linear function of the total number of time periods N (see Liao and Shoemaker [15] for details), while the direct implementation of any standard optimization algorithm with at least superlinear convergence for problem

(DOCP) would usually require a number of flops growing at least quadratically in the total number of time periods N.

To date, the most popular optimal control algorithm is Differential Dynamic Programming (DDP), which was proposed by Mayne [16] and developed further by Jacobson and Mayne [14]. A detailed DDP algorithm can be found in [25]. Basically, the DDP algorithm is obtained through the following 3 steps:

i) First, problem (DOCP) is decomposed into a sequence of subproblems, where each subproblem corresponds to one time period, according to the Dynamic Programming scheme.

ii) Then each subproblem is solved recursively backward in time by Newton's method to compute a sequence of linear functions, called feedback functions.

iii) Finally, the sequences of state and control variables are updated forward in time based on the sequences of transition functions T(x_t, u_t, t) and feedback functions.

The backward and forward procedures constitute one iteration of the DDP algorithm. Pantoja [22] and Dunn and Bertsekas [9] discovered that Newton's method can be applied to problem (DOCP) efficiently. We refer to this efficient Newton's method for problem (DOCP) as the stagewise Newton's method. The stagewise Newton's method shares the same computational complexity as DDP, and has locally quadratic convergence. The following are the main properties of both the DDP algorithm and the stagewise Newton's method.

a) The computational complexity per iteration (see [15]) is N (2n^3 + (7/2) n^2 m + 2 n m^2 + (1/3) m^3) + O(N (n + m)^2).

b) The storage is (8Nnm + lower order terms) bytes in double precision (see [15]).

c) Both algorithms are quadratically convergent.

Even though Liao and Shoemaker [15] provide a globalization strategy for the convergence of DDP in nonconvex situations, we feel that this globalization strategy is not very robust and efficient. A robust, efficient and economical globalization strategy for DDP still remains undiscovered. Coleman and Liao [4] recently suggested a globalization strategy based on a trust region method for the stagewise Newton's method. Their globalization strategy appears to be quite successful numerically.

As we mentioned earlier in this section, problem (DOCP) can be converted to an unconstrained optimization problem. Therefore, any standard optimization technique can be applied to problem (DOCP). However, finding the solution would require O(N^3) flops per iteration for the direct Newton's method and O(N^2) flops per iteration for Quasi-Newton methods. If N is large (which is common in practice), properties a) and b) clearly indicate the advantages of optimal control algorithms over standard optimization methods for optimal control problems.
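To illustrate the backward-forward pattern shared by DDP and the stagewise Newton's method, the following Python sketch performs one iteration for scalar state and control. It is a schematic rendering under our own simplifying assumptions (the derivative callbacks `derivs` and `terminal` are hypothetical user-supplied routines, and no globalization or regularization is included), not the exact algorithms of [14] or [22]:

```python
import numpy as np

def ddp_iteration(x1, u, T, derivs, terminal):
    """One backward-forward sweep of a DDP-type iteration for (DOCP),
    with scalar state and control to keep the indexing readable.

    T(x, u, t)      -- transition function
    derivs(x, u, t) -- (gx, gu, gxx, gxu, guu, Tx, Tu, Txx, Txu, Tuu),
                       derivatives of the stage cost g and of T at (x, u, t)
    terminal(x)     -- (Vx, Vxx), derivatives of the terminal cost g(., N+1)

    Both sweeps are single passes over t = 1..N, so one iteration costs
    O(N) flops -- the property this paper exploits."""
    N = len(u)
    x = np.empty(N + 1)
    x[0] = x1
    for t in range(N):                       # forward rollout
        x[t + 1] = T(x[t], u[t], t)

    Vx, Vxx = terminal(x[N])                 # backward sweep
    alpha = np.empty(N)
    beta = np.empty(N)                       # feedback law: du = alpha + beta * dx
    for t in reversed(range(N)):
        gx, gu, gxx, gxu, guu, Tx, Tu, Txx, Txu, Tuu = derivs(x[t], u[t], t)
        Qx = gx + Tx * Vx
        Qu = gu + Tu * Vx
        Qxx = gxx + Tx * Vxx * Tx + Vx * Txx
        Qxu = gxu + Tx * Vxx * Tu + Vx * Txu
        Quu = guu + Tu * Vxx * Tu + Vx * Tuu
        alpha[t] = -Qu / Quu
        beta[t] = -Qxu / Quu
        Vx = Qx + Qxu * alpha[t]
        Vxx = Qxx + Qxu * beta[t]

    xt, u_new = x1, np.empty(N)              # forward sweep with feedback
    for t in range(N):
        u_new[t] = u[t] + alpha[t] + beta[t] * (xt - x[t])
        xt = T(xt, u_new[t], t)
    return u_new
```

In the full algorithms the state and control are vectors, Q_{uu} must be kept positive definite, and a step-size safeguard is added; see [15] for the precise flop counts quoted in property a).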

5. A Sufficient Condition

In this section, we establish a sufficient condition for problem (NLP) so that it can be transformed into a discrete time optimal control problem in the form of (DOCP). The sufficient condition requires three conditions and one important result, which will be presented one by one in the following. The definitions and notation introduced in Section 3 will be used extensively. Let's first look at the three conditions.

Condition 1 (Finite Patterns): N(f) is O(1).

Condition 1 indicates that the number of patterns of the function f(x) in (NLP) is finite and independent of the total number of variables n. In order to simplify the following discussion, we define the following notation:

i) The base set of p ∈ P_A(f), denoted by B(p), is defined as

    B(p) = {(I_1(i), ..., I_{l_I}(i)) | i ∈ I}        (5.1)

where I and I_1(i), ..., I_{l_I}(i) are the same as in (3.3). For any two base sets, say B(p) and B(q), and two elements, say e_1 and e_2, with e_1 = (I_1(i), ..., I_{l_1}(i)) ∈ B(p) and e_2 = (I_1(j), ..., I_{l_2}(j)) ∈ B(q), we define e_1 ⊆ e_2 if for any k with 1 ≤ k ≤ l_1 there exists an l with 1 ≤ l ≤ l_2 such that I_k(i) ⊆ I_l(j). We define B(p) ⊆ B(q) if for any e_p ∈ B(p) there exists an e_q ∈ B(q) such that e_p ⊆ e_q.

ii) The index set of p, denoted by B*(p), is defined as

    B*(p) = {k ∈ I_j(i) | j = 1, ..., l_I and i ∈ I}.        (5.2)

(Note: We require that any two elements in B*(p) be different.)

iii) The base size of p is defined as |p| = l_I for every p ∈ P_A(f), where l_I is defined in (3.3).

Condition 2 (Partial Separability): For any p_i, p_j ∈ P_A(f), i, j ∈ {1, ..., N_A(f)} with i ≠ j, if ({p_i} ∪ {p_j}) ∩ P_{A2}(f) ≠ ∅, then

    ∂^2 F / (∂p_i ∂p_j) = 0,  where F is defined in (3.4),        (5.3)

except that if p_i, p_j ∈ P_{A2}(f), then there must exist an element p ∈ P_{A2}(f) such that B(p_i) ⊆ B(p) and B(p_j) ⊆ B(p).

Condition 2 imposes restrictions on the patterns in P_{A2}(f) so that they can be separated from each other and from the patterns in P_{A1}(f). It is easy to observe that the terms in the objective function of (DOCP) are separated from each other according to each time period; Condition 2 is just a similar requirement of this separated objective function. An observation here is that the patterns in P_{A1}(f) are not required to be separated from each other.

Condition 3 (Restriction on P_{A1}(f)): For every p ∈ P_{A1}(f) with base size s (|p| = s), let (i_1, i_2, ..., i_s) be any ordering of (1, 2, ..., s); then there always exist s functions φ_1, ..., φ_s such that

    p = φ_s(x_{I_{i_s}}, φ_{s-1}(x_{I_{i_{s-1}}}, φ_{s-2}(x_{I_{i_{s-2}}}, ..., φ_2(x_{I_{i_2}}, φ_1(x_{I_{i_1}})) ...)        (5.4)

provided that the operators ⊙_j, j = 1, ..., s - 1 as in (3.3) have been chosen and embedded into (5.4). The sets I_{i_s}, ..., I_{i_1} are similar to the sets I_{l_I}(i), ..., I_1(i) in (3.3).

In problem (DOCP), if the state variable x_t is replaced by the transition functions T(x_τ, u_τ, τ) for τ = 1, ..., t - 1 backward in time, the resulting x_t, namely T(... T(x_1, u_1, 1), ..., u_{t-1}, t - 1), has the same format as equation (5.4). Therefore, Condition 3 guarantees that every pattern in P_{A1}(f) shares this 'nested' structure.

Now, let's look at one important result, summarized in Lemma 1 below. But first, we need the following notation. Consider any p ∈ P_{A2}(f) with |p| = q; its base set can be represented as

    B(p) = {(I_1(i), ..., I_q(i)) | i ∈ I ⊆ N}.        (5.5)

Define

    M_p = max_{i ∈ I}  max_{1 ≤ j,k ≤ q}  ( max_{s ∈ I_j(i), σ ∈ I_k(i)} |s - σ| ).        (5.6)

Before the discussion of Lemma 1, let's first demonstrate its idea. Consider the function

    h(x) = \sum_{i=1}^{3n-3} h_i(x_i, x_{i+1}, x_{i+2}, x_{i+3}),   x ∈ R^{3n}.        (5.7)

In the function h(x), assuming that p has been defined and B(p) = {(i, i+1, i+2, i+3) | i = 1, ..., 3n-3}, we have M_p = 3. The essence of Lemma 1 is to define the vector

    y_i = (x_{3i-2}, x_{3i-1}, x_{3i})^T,   i = 1, ..., n,        (5.8)

so that M_{p̂} = 1, where p̂, B(p̂), M_{p̂} and ĥ are the corresponding notations for p, B(p), M_p and h, respectively, under the variable y, and

    ĥ(y) = \sum_{i=1}^{n-1} ĥ_i(y_i, y_{i+1}),   B(p̂) = {(i, i+1) | i = 1, ..., n-1}.        (5.9)
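As a quick illustration (our own helper, not notation from the paper), M_p can be computed directly from a base set; for each element the inner maxima in (5.6) reduce to the largest index spread:

```python
def M_p(B):
    """M_p from (5.6): for each element (I_1(i), ..., I_q(i)) of the base
    set B(p), take the largest |s - sigma| over all index pairs drawn from
    its blocks; then maximize over the elements."""
    spreads = []
    for elem in B:                       # elem = (I_1(i), ..., I_q(i))
        idx = [j for block in elem for j in block]
        spreads.append(max(idx) - min(idx))
    return max(spreads)

n = 5                                    # x in R^{3n} for the example (5.7)
B_p = [tuple({j} for j in (i, i + 1, i + 2, i + 3)) for i in range(1, 3 * n - 2)]
print(M_p(B_p))                          # 3, as in the text

# after the regrouping (5.8), y_i = (x_{3i-2}, x_{3i-1}, x_{3i}):
B_phat = [({i}, {i + 1}) for i in range(1, n)]
print(M_p(B_phat))                       # 1
```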

Lemma 1: For any p ∈ P_{A2}(f) with M_p ≥ 2, assume that B(p) in (5.5) is its base set and that M_p in (5.6) is O(1). Then there exists a vector y, where each component of y is either a component of x or a subvector of x containing at most M_p components of x, such that M_{p̂} ≤ 1, where p̂ is the corresponding pattern of p under the vector y and M_{p̂} is the same as (5.6) except using the indices of the vector y.

Proof: We provide a constructive proof so that the vector y can be generated.

Without loss of generality, assume I = {1, 2, ..., K}, and define the set S to be the index set of p as in (5.2). The order of the elements in S follows the rule that if s ∈ I_i(j) and σ ∈ I_k(l), then the element s precedes the element σ if j < l, or if j = l and i < k, for i, k = 1, ..., q and j, l = 1, ..., K. Now, we define each component of the vector y as follows:

a) y_1 is defined as the subvector of x whose indices are the first M_p elements in S; remove these M_p elements from S and set i = 2.

b) y_i is defined as the subvector of x whose indices are the first M_p elements in S; remove these M_p elements from S, increase i by 1, and repeat b) whenever the condition of c) is not satisfied.

c) Whenever there are fewer than M_p elements left in S (if the set S is empty, terminate the process of defining y), define the last component of y as the subvector of x whose indices are the remaining elements.

Obviously, each component of y contains at most M_p components of x. Let's denote by p̂ the new pattern in P_{A2}(f) in terms of the vector y, which replaces the vector x in the pattern p ∈ P_{A2}(f). As a result, B(p) will become B̂_y(p̂), which can be expressed as

    B̂_y(p̂) = {(Î_1(i), ..., Î_q(i)) | i ∈ I ⊆ N}        (5.10)

where Î_j(i) is the corresponding set for I_j(i) in (5.5) under the vector y for j = 1, ..., q.

Now we claim that no element in B̂_y(p̂) can contain more than two different indices of the components of y. Suppose not; then there exists an element, say e ∈ B̂_y(p̂), such that e contains at least 3 different indices of the components of y, say s_1, s_2 and s_3 with s_1 < s_2 < s_3. From the definition of B̂_y(p̂), we know that there exists an element of B(p), say (I_1(l), I_2(l), ..., I_q(l)), whose corresponding element in B̂_y(p̂) is e. Therefore, there exist a component of y_{s_1}, say (y_{s_1})_α, and an index l_1 ∈ {k | k ∈ I_i(l), i = 1, ..., q} such that (y_{s_1})_α = x_{l_1}. Similarly, there exist a component of y_{s_3}, say (y_{s_3})_β, and an index l_2 ∈ {k | k ∈ I_i(l), i = 1, ..., q} such that (y_{s_3})_β = x_{l_2}. From the definitions of the set S and of y, we know that if (y_{s_1})_α is a component of y_{s_1} and (y_{s_3})_β is a component of y_{s_3}, then there are at least M_p distinct elements of {k | k ∈ I_i(l), i = 1, ..., q} between the locations of l_1 and l_2. This implies

    1 + M_p ≤ max_{1 ≤ i,j ≤ q} ( max_{s ∈ I_i(l), σ ∈ I_j(l)} |s - σ| )

and therefore contradicts the definition of M_p in (5.6).

Since each element of B̂_y(p̂) contains at most two different indices of the components of y, say s_1 and s_2 with s_1 < s_2, each element of B̂_y(p̂) can be reduced to either ({s_1}, {s_2}) or ({s_1, s_2}) according to the definition of an abstract pattern in (3.3). Therefore, the set B̂_y(p̂) can be simplified to the base set B_y(p̂) of p̂, whose base size is at most 2. From the definition of the vector y and the above discussion, it is straightforward to see that s_2 = s_1 + 1. Therefore, M_{p̂} ≤ 1, where the definition of M_{p̂} is the same as (5.6) except using the indices of the vector y.

Note: From equations (3.3), (5.1) and (5.6), it is straightforward to see that if M_p ≤ 1, then |p| ≤ 2.
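The construction in steps a)-c) amounts to chopping the ordered index set S into consecutive blocks of M_p indices, each block becoming one component of y. A small Python sketch of this step (our own illustration, under the assumption that S has already been ordered as in the proof):

```python
def group_indices(S, Mp):
    """Constructive step from the proof of Lemma 1: split the ordered index
    set S into consecutive blocks of Mp indices; each block becomes one
    component of the new vector y (the last block may be shorter)."""
    blocks = []
    for k in range(0, len(S), Mp):
        blocks.append(S[k:k + Mp])   # y_i gathers at most Mp components of x
    return blocks

# the example around (5.7)-(5.8) with Mp = 3: x_1..x_{3n} grouped in threes
n = 4
S = list(range(1, 3 * n + 1))
print(group_indices(S, 3))           # [[1, 2, 3], [4, 5, 6], ...] -> y_1, y_2, ...
```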

Lemma 1 indicates that under certain conditions and transformations, the M_p defined in (5.6) for each pattern in P_{A2}(f) is at most 1. This result is important in defining the objective function of problem (DOCP), since each term in the objective function of problem (DOCP) contains two variables, the state variable and the control variable. Lemma 2 below is the generalization of Lemma 1 to all patterns in P_{A2}(f). But first we introduce the following definitions to ease the later discussion. For positive integers M and K with K ≥ 2, we define the following:

1) A vector x is called an M × 1 layer vector if a) x has at most M components; b) each component of x is a scalar.

2) A vector x is called an M × K layer vector if a) x has at most M components; b) each component of x is an M × (K - 1) layer vector.

Note: A regular n-dimensional vector is an n × 1 layer vector.

Again, let's demonstrate the idea of Lemma 2 with an example. Suppose we add a second term to the function h(x) of (5.7); then h(x) becomes

    h(x) = \sum_{i=1}^{3n-3} h_i(x_i, x_{i+1}, x_{i+2}, x_{i+3}) + \sum_{j=1}^{3n-4} q_j(x_j, x_{j+1}, x_{j+2}, x_{j+3}, x_{j+4}).        (5.11)

With the definition of the vector y through (5.8), equation (5.11) becomes

    ĥ(y) = \sum_{i=1}^{n-1} ĥ_i(y_i, y_{i+1}) + \sum_{j=1}^{n-2} q̂_j(y_j, y_{j+1}, y_{j+2}).        (5.12)

For the second term in (5.12), obviously each function q̂_j contains three components of the variable y. However, we can define

    z_j = (y_{2j-1}, y_{2j})^T,   j = 1, ..., n/2.        (5.13)

The vector z is a 3 × 2 layer vector. Then, equation (5.12) becomes

    h̃(z) = \sum_{i=1}^{n/2-1} h̃_i(z_i, z_{i+1}) + \sum_{j=1}^{n/2-1} q̃_j(z_j, z_{j+1}).        (5.14)

In equation (5.14), both functions h̃_i and q̃_j contain only two components of the variable z. Equations (5.8) and (5.13) also demonstrate the mechanics of the layer vectors defined earlier.

Lemma 2: For any given function f(x), let P_A(f), P_{A1}(f) and P_{A2}(f) be the corresponding sets associated with f(x). Assume that Condition 1 is true and that for any p ∈ P_{A2}(f), the M_p defined in equation (5.6) is O(1). Then there exist a positive integer K ≤ N(f) and a vector y, each component of which is an M × K layer vector, where M = max_{p ∈ P_{A2}(f)} M_p, such that after the replacement of x by y, every p̂ ∈ P_{A2}(f) satisfies M_{p̂} ≤ 1, where p̂ is the corresponding element of p ∈ P_{A2}(f) under the vector y and M_{p̂} is the same as (5.6) except using the indices of the vector y.

Proof: If M_p ≤ 1 for every p ∈ P_{A2}(f), define y = x; otherwise, we assume that there is at least one p ∈ P_{A2}(f) such that M_p ≥ 2. Then Lemma 1 indicates that there exists a vector y, each component of which is an M_p × 1 layer vector, such that M_{p̂} ≤ 1, where p̂ is the corresponding pattern of p under the vector y and M_{p̂} is the same as (5.6) except using the indices of the vector y. From the definition of y in the proof of Lemma 1, we can see that for any pattern q ∈ P_{A2}(f), M_{q̂} ≤ M_q (this is due to the order rule in defining the set S in the proof of Lemma 1). The definition of M and the assumptions that M_q is O(1) for any q ∈ P_{A2}(f) and that N(f) is O(1) indicate that M is also O(1). The above discussion implies that we can repeat the procedure of defining a new vector y as in Lemma 1 so that eventually, for every q ∈ P_{A2}(f), M_q is at most 1. Therefore, there exists an integer K with K ≤ N(f) such that the final vector y can be defined, and each component of y is an M × K layer vector.

With the establishment of Lemma 2 and Conditions 1-3, we are able to provide a sufficient condition in the following Theorem so that problem (NLP) can be transformed into problem (DOCP). The proof of this Theorem is constructive in the sense that the state variables, control variables, transition functions and objective functions of problem (DOCP) are defined during the proof.

Theorem (Sufficient Condition): For any given problem (NLP), let the sets P(f), P_A(f), P_{A1}(f) and P_{A2}(f) be the corresponding sets defined in Section 3 associated with the function f(x), and let them satisfy Conditions 1-3. Assume that P(f) = P_A(f) and that for any p ∈ P_{A2}(f), the M_p defined in (5.6) is O(1). Then problem (NLP) can be transformed into the format of problem (DOCP) with O(1) dimensions for both the state and control variables at each time period.

Proof: Lemma 2 guarantees that a vector y can be found so that M_{p̂} ≤ 1, where p̂ is the corresponding pattern of p under the vector y and M_{p̂} is the same as (5.6) except using the indices of the vector y. In addition, there exist positive integers M and K, both of order O(1), such that y is an M × K layer vector. From the definition of the vector y in the proofs of Lemma 1 and Lemma 2, it is easy to check that Conditions 1-3 still hold under the vector y, since this x-to-y transformation does not change the sets P_{A1}(f) and P_{A2}(f) or the function F(p) in (3.4), apart from some index rearrangements under the vector y. Therefore, we can assume in the rest of the proof that M_p ≤ 1 for every p ∈ P_{A2}(f).

After identifying the sets P_A(f), P_{A1}(f) and P_{A2}(f), the function f(x) can be represented by its abstract patterns (since P(f) = P_A(f)) as

    F(p) = f(x)  where p ∈ P_A(f).        (5.15)

The Partial Separability assumption (Condition 2) assures that there exist functions h_1 and h_2 such that

    F(p) = h_1(q_1, ..., q_{N_{A1}(f)}) + h_2(p_1, ..., p_{N_{A2}(f)})        (5.16)

where q_i ∈ P_{A1}(f), i = 1, ..., N_{A1}(f) and p_j ∈ P_{A2}(f), j = 1, ..., N_{A2}(f). Condition 2 and the fact that |p| ≤ 2 for any p ∈ P_{A2}(f) (see the Note at the end of Lemma 1) imply that the function h_2 in (5.16) can be further decomposed as

    h_2(p_1, ..., p_{N_{A2}(f)}) = \sum_{k ∈ K} ψ_k(p_{k,1}, ..., p_{k,I(k)})        (5.17)

where K = {k | p_{k,i} ∈ P_{A2}(f), i = 1, ..., I(k), B(p_{k,i}) ⊆ B(p_{k,1}), i = 2, ..., I(k)}, I(k) is a function of k, and the ψ_k are some functions for k ∈ K. Condition 3 indicates that for any q_i ∈ P_{A1}(f) with |q_i| = s_i, i = 1, ..., N_{A1}(f), there exists a sequence of functions φ_{i,1}, ..., φ_{i,s_i}, i = 1, ..., N_{A1}(f), satisfying (5.4). Now, we can define the control variable u_t ∈ R^1, the state variable y_t ∈ R^{1+N_{A1}(f)} and the transition function T(y_t, u_t, t) as follows:

    u_t = x_t,   t = 1, ..., N,  where N = n,        (5.18)

    [y_{t+1}]_i = [T(y_t, u_t, t)]_i =
        { φ_{i,t}(u_t, [y_t]_i)   if 1 ≤ i ≤ N_{A1}(f) and t ∈ B*(p_i),
        { [y_t]_i                 if 1 ≤ i ≤ N_{A1}(f) and t ∉ B*(p_i),
        { u_t                     if i = 1 + N_{A1}(f),
      t = 1, ..., N,        (5.19)

where p_i ∈ P_{A1}(f) and the functions φ_{i,t}, i = 1, ..., N_{A1}(f), t = 1, ..., N are defined earlier; if i > N_{A1}(f), or 1 ≤ i ≤ N_{A1}(f) but 1 ∉ B*(p_i), then [y_1]_i = 0; otherwise, [y_1]_i = e_i, where e_i satisfies e_i ⊙_i z = z for any z ∈ R^1 and ⊙_i is the first operator in p_i (see (3.3)). Condition 1 and the assumption that the M_p defined in (5.6) is at most 1 for every p ∈ P_{A2}(f) guarantee that the dimension of each y_t, t = 1, ..., N+1 is O(1). Equation (5.18) implies that the dimension of each u_t, t = 1, ..., N is O(1).

Under the definitions of u_t and y_t in (5.18) and (5.19), the function h_1(q_1, ..., q_{N_{A1}(f)}) becomes a function of y_{N+1}, and the function h_2(p_1, ..., p_{N_{A2}(f)}) becomes

    h_2(p_1, ..., p_{N_{A2}(f)}) = \sum_{k ∈ K} ψ̃_k(y_k, u_k)        (5.20)

where ψ̃_k is the corresponding function of ψ_k for each k ∈ K after the substitution of (5.18) and (5.19) into (5.17). This substitution follows the rule that for any element of p_k (k ∈ K) which contains two adjacent indices of x (because M_p ≤ 1), the control variable u replaces the x variable with the larger index and the state variable y replaces the x variable with the smaller index, according to (5.18) and (5.19). By rearranging the terms in F(p) or f(x), it is straightforward to see that F(p) or f(x) can be represented as

    f(x) = F(p) = \sum_{t=1}^{N} g(y_t, u_t, t) + g(y_{N+1}, N+1)        (5.21)

for some functions g(·, ·, t), t = 1, ..., N and g(·, N+1). Equations (5.18), (5.19) and (5.21) conclude our proof.

Observations:

a) The assumption P(f) = P_A(f) in the Theorem is made just for the simplicity of the proof. Usually, this assumption can be removed with some minor changes in defining the control variables and transition functions to cover those p ∈ P(f) with p ∉ P_A(f).

b) In the proof of the Theorem, the definitions of the control variables in (5.18) and of the transition functions in (5.19) are for the general case. Usually, the definitions of the control variables and transition functions can be made much more effective for each individual problem.

c) It is important to know that the dimension of the state variables can be different at different time periods. This is also true for the control variables.

d) The Theorem not only establishes a sufficient condition for the conversion of problem (NLP) into problem (DOCP), but also provides a procedure showing how this conversion can be done.

Since the above Theorem provides a sufficient condition, those problems in the form of (NLP) satisfying the requirements of the Theorem form a special class of unconstrained optimization problems. With the engagement of optimal control algorithms, any problem in this special class can be solved in O(n) flops with O(n) storage per iteration. The following two corollaries consider two common classes of unconstrained optimization problems.

Corollary 1: If the Hessian matrix of the function f(x) in problem (NLP) is banded with bandwidth k and k is O(1), then problem (NLP) can be converted into the format of problem (DOCP).

Proof: Define

    y_t = (x_{(t-1)k+1}, ..., x_{(t-1)k+k})^T,   t = 1, ..., N-1, and t = N if mod(n,k) = 0,
    y_N = (x_{⌊n/k⌋k+1}, ..., x_n)^T  if mod(n,k) ≠ 0,

where

    N = n/k  if mod(n,k) = 0,   N = ⌊n/k⌋ + 1  otherwise,

and F(y) = f(x). Then it is straightforward to check that the Hessian matrix of F(y) is tridiagonal with respect to the vector y. Now, we define

    u_t = y_t,   z_{t+1} = u_t,   t = 1, ..., N,   z_1 ≡ 0.

Thus, the new objective function takes the form

    f(x) = \sum_{t=1}^{N} g(z_t, u_t, t) + g(z_{N+1}, N+1)

where g(·, ·, t), t = 1, ..., N and g(·, N+1) are the functions resulting from this replacement in F(y). The assumption k = O(1) indicates that the dimensions of u_t and z_{t+1}, t = 1, ..., N are also O(1).

Now, let's look at another type of problem (NLP), as follows:

    f(x) = g( f_1(x_{[1]}) ⊙_1 f_2(x_{[2]}) ⊙_2 ... ⊙_{l-1} f_l(x_{[l]}) )        (5.22)

where ⊙_i ∈ Ω (defined in (3.1)), i = 1, ..., l-1; l is O(n); [i] ⊆ N; [i] ∩ [j] = ∅ if i ≠ j for i, j = 1, ..., l; and x_{[i]} = {x_k | k ∈ [i]}, with the number of elements in each x_{[i]} being O(1) for i = 1, ..., l. Then we have the following result.

Corollary 2: If the function f(x) in problem (NLP) can be expressed in the form of (5.22), then problem (NLP) can be converted into the format of (DOCP).

Proof: For simplicity, we only prove the case where f_i(x_{[i]}) ∈ R^1 for i = 1, ..., l. Since [i] ∩ [j] = ∅ if i ≠ j for i, j = 1, ..., l, by rearranging the indices we can assume that if i < j, then k < κ for every k ∈ [i], κ ∈ [j]. Then we can define

    u_t = x_{[t]},   t = 1, ..., l,
    y_{t+1} = y_t ⊙_{t-1} f_t(u_t),   t = 1, ..., l,

where y_1 ≡ 1 and ⊙_0 is scalar multiplication. Obviously, f(x) = g(y_{l+1}).
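A minimal Python sketch of the recursion in the proof of Corollary 2, specialized to scalar multiplication (our own illustration; the function names are hypothetical). Applied to the product pattern of example (1.1), it evaluates the pattern in one O(l) pass:

```python
import numpy as np

def f_via_docp(blocks, f_t, g=lambda y: y):
    """Corollary 2 as a recursion: u_t = x_[t], y_{t+1} = y_t * f_t(u_t)
    with y_1 = 1 (all operators taken as scalar multiplication here),
    and f(x) = g(y_{l+1}).  A single O(l) pass, with one f_t used for
    every block for brevity."""
    y = 1.0                              # y_1 = 1, the identity of the operator
    for u in blocks:
        y = y * f_t(u)                   # y_{t+1} = y_t (.) f_t(u_t)
    return g(y)

# the product pattern of example (1.1): f_t(u) = (u - 1)^2, g = identity
x = np.full(10_000, 1.5)
print(f_via_docp(x, lambda u: (u - 1.0) ** 2))   # equals prod_i (x_i - 1)^2
```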

Group Partial Separability

Since the introduction of partial separability by Griewank and Toint [12], Conn et al. [6] have proposed a new way to describe the structure of a nonlinear function under the notion of group partial separability. They introduce both partially separable functions and group partially separable functions to recognize certain structures in nonlinear functions. However, their group partially separable functions are limited to a specific form (see equation (20) in [6]), even though this form covers a wide range of functions. In our study, by contrast, there are no restrictions on the form of the function; our restrictions are on the structure of a function, and these restrictions can be seen from the assumptions of the above Theorem.

Let's look at a group partially separable function f(x) as defined in §3.3.1 of [7]. From the definition, it is straightforward to obtain

    ∇_x f(x) = \sum_{i=1}^{n_g} (dg_i/dα_i) ∇_x α_i = \sum_{i=1}^{n_g} (dg_i/dα_i) ( \sum_{j ∈ J_i} w_{i,j} ∇_x f_j + a_i ),        (5.23)

    ∇^2_{xx} f(x) = \sum_{i=1}^{n_g} (d^2 g_i/dα_i^2) ( \sum_{j ∈ J_i} w_{i,j} ∇_x f_j + a_i ) ( \sum_{j ∈ J_i} w_{i,j} ∇_x f_j + a_i )^T + \sum_{i=1}^{n_g} (dg_i/dα_i) \sum_{j ∈ J_i} w_{i,j} ∇^2_{xx} f_j.        (5.24)

It is obvious that both ∇_x f(x) and ∇^2_{xx} f(x) are sums of sparse vectors and matrices, respectively. Under this circumstance, it seems difficult to impose any assumptions so that minimizing f(x) can be done in O(n) flops per iteration. It is very interesting to notice that, in general, a function f(x) satisfying the conditions of Corollary 2 with the operators ⊙_i, i = 1, ..., l-1 in (5.22) being scalar multiplication is not a group partially separable function according to the definition in §3.3.1 of [7]. On the other hand, under some additional assumptions a group partially separable function can have the following property. But first, let's define

    K_i = {k | (a_i)_k ≠ 0, k = 1, ..., n},   i = 1, ..., n_g,        (5.25)

where n_g and a_i, i = 1, ..., n_g are defined in §3.3.1 of [7], and

    S_i = ( ∪_{j ∈ J_i} [j] ) ∪ K_i,   i = 1, ..., n_g,        (5.26)

where J_i, i = 1, ..., n_g and the [j]'s are defined in §3.3.1 of [7].

Corollary 3: Let f(x) be a group partially separable function as defined in §3.3.1 of [7] with n variables, and in addition let f(x) satisfy the following conditions: a) n_g is O(n); b) for each i = 1, ..., n_g, the number of elements in the set S_i is O(1); c) S_i ∩ S_j = ∅ if i ≠ j for i, j = 1, ..., n_g. Then f(x) can be minimized in O(n) flops per iteration.

Proof: With each operator ⊙_i, i = 1, ..., l-1 in (5.22) taken to be scalar addition and the function g in (5.22) linear, the conditions of Corollary 2 are met.

6. Application in Large Scale Unconstrained Optimization Problems

In this section, we apply the sufficient condition obtained in Section 5 to a collection of unconstrained optimization problems published in [18], [5], [10] with variable problem size. For each problem listed below, a brief examination of the sufficient condition is provided along with the definitions of the control variables, state variables and transition functions.

Problem 1: Extended Rosenbrock function (problem (21) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n/2,
    y_{t+1} = T(y_t, u_t, t) = u_t^2, t = 1, ..., N-1, and y_1 ≡ 0.

Problem 2: Extended Powell singular function (problem (22) in [18])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = (x_{2t-1}, x_{2t})^T, t = 1, ..., N where N = n/2,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 3: Penalty function I (problem (23) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = y_t + u_t^2, t = 1, ..., N, and y_1 ≡ 0.
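As an illustration of this conversion, the following Python sketch evaluates Penalty function I both directly and through the state recursion above. The split into stage costs g(y_t, u_t, t) = a(u_t - 1)^2 and terminal cost (y_{N+1} - 1/4)^2, and the parameter a = 10^{-5}, are our own reading of the standard test problem in [18], not definitions from this paper:

```python
import numpy as np

A = 1.0e-5   # penalty parameter of problem (23) in [18]

def penalty1_direct(x):
    return A * np.sum((x - 1.0) ** 2) + (np.sum(x ** 2) - 0.25) ** 2

def penalty1_docp(x):
    """Penalty function I via the conversion above: u_t = x_t,
    y_{t+1} = y_t + u_t^2, y_1 = 0; the state accumulates sum(x_j^2),
    the stage costs accumulate the separable part."""
    y = 0.0
    total = 0.0
    for u in x:                    # one O(n) sweep over the stages
        total += A * (u - 1.0) ** 2
        y = y + u * u              # transition y_{t+1} = y_t + u_t^2
    return total + (y - 0.25) ** 2

x = np.linspace(0.0, 1.0, 1000)
print(np.isclose(penalty1_direct(x), penalty1_docp(x)))   # True
```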

Problem 4: Penalty function II (problem (24) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = ([y_t]_1 + (N - t + 1)([y_t]_2)^2, u_t)^T, t = 1, ..., N, and y_1 ≡ 0.

Problem 5: Variable dimensioned function (problem (25) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t - 1, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = y_t + t·u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 6: Trigonometric function (problem (26) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = y_t + (sin(u_t), cos(u_t), t·cos(u_t))^T, t = 1, ..., N, and y_1 ≡ 0.

Problem 7: Brown almost linear function (problem (27) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = ([y_t]_1 + u_t, [y_t]_2·u_t)^T, t = 1, ..., N, and y_1 ≡ (0, 1)^T.

Problem 8: Discrete boundary value function (problem (28) in [18])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = (x_{2t-1}, x_{2t})^T, t = 1, ..., N-1, and t = N if n is even; u_N = x_n if n is odd,
    where N = n/2 if n is even, N = (n+1)/2 if n is odd,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 9: Discrete integral equation function (problem (29) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = ( [y_t]_1 + p_t(u_t + p_t + 1)^3,
                                 [y_t]_2 + (1 - p_t)(u_t + p_t + 1)^3,
                                 [y_t]_3 + p_{t-1}·[y_t]_2,
                                 [y_t]_4 + p_{t-1}(1 - p_{t-1})·[y_t]_1,
                                 [y_t]_5 + p_t·u_t,
                                 u_t )^T,  t = 1, ..., N, and y_1 ≡ 0.

Problem 10: Broyden tridiagonal function (problem (30) in [18])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 8.

Problem 11: Broyden banded function (problem (31) in [18])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = (x_{6t-5}, x_{6t-4}, x_{6t-3}, x_{6t-2}, x_{6t-1}, x_{6t})^T, t = 1, ..., N-1,
    where N = n/6 if n/6 is an integer, N = ⌊n/6⌋ + 1 otherwise;
    u_N = (x_{n-5}, ..., x_n)^T if n/6 is an integer, u_N = (x_{6⌊n/6⌋+1}, ..., x_n)^T otherwise,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 12: Linear function - full rank (problem (32) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:

    u_t = x_t, t = 1, ..., N where N = m = n,
    y_{t+1} = T(y_t, u_t, t) = y_t + u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 13: Linear function - rank 1 (problem (33) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = m = n,
    y_{t+1} = T(y_t, u_t, t) = y_t + t·u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 14: Linear function - rank 1 with zero columns and rows (problem (34) in [18])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = m = n,
    y_{t+1} = T(y_t, u_t, t) = y_t + t·u_t if 2 ≤ t ≤ N-1, and y_{t+1} = y_t if t = 1 or t = N, with y_1 ≡ 0.

Problem 15: Chebyquad function (problem (35) in [18])
This problem cannot be solved in O(n) flops. However, we can solve it in O(n^2) flops by optimal control algorithms due to the sparsity. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = m = n,
    [y_{t+1}]_i = [T(y_t, u_t, t)]_i = [y_t]_i + C_i(u_t), i = 1, ..., N, t = 1, ..., N, and y_1 ≡ 0,
where C_i is the i-th Chebyshev polynomial shifted to the interval [0, 1].

Problem 16: Boundary value problem (problem (6.4) in [10])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = (1/2)(y_t - u_t)^2, t = 1, ..., N, and y_1 ≡ 0.

Problem 17: The Chained Singular function (problem (5) in [5])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = (x_{2t-1}, x_{2t})^T, t = 1, ..., N where N = n/2,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 18: The Generalized Wood function (problem (7) in [5])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 2.

Problem 19: The Chained Wood function (problem (8) in [5])

The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 17.

Problem 20: A generalization of Hock and Schittkowski's 45th problem (problem (9) in [5])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = y_t·u_t/t, t = 1, ..., N, and y_1 ≡ 1.

Problem 21: A generalization of the Broyden Tridiagonal function (problem (10) in [5])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 8.

Problem 22: A generalization of the Broyden Banded function (problem (12) in [5])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 11.

Problem 23: Toint's 7-diagonal generalization of the Broyden Tridiagonal function (problem (14) in [5])
After careful selection of the pattern set P(f), the conditions of the Theorem can be satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = (x_{2t-1}, x_{2t}, x_{m+2t-1}, x_{m+2t})^T, t = 1, ..., N-1, and t = N if m is even;
    u_N = (x_m, x_{2m})^T if m is odd,
    where m = n/2 and N = m/2 if m is even, N = (m+1)/2 if m is odd,
    [y_2]_i = [T(y_1, u_1, 1)]_i = [u_1]_i if 1 ≤ i ≤ 4, and [u_1]_{i-2} if i = 5, 6, with y_1 ≡ 0,
    [y_{t+1}]_i = [T(y_t, u_t, t)]_i = [u_t]_i if 1 ≤ i ≤ 4, and [y_t]_i if i = 5, 6, for t = 2, ..., N-1.
(Note: This problem is a case of P_A(f) ≠ P(f).)


Problem 24: A trigonometric function (problem (15) in [5])

The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = ([y_t]_1 + sin(u_t), [y_t]_2 + cos(u_t))^T, t = 1, ..., N, and y_1 ≡ 0.

Problem 25: Another trigonometric function (problem (16) in [5])
No progress can be made for this problem.

Problem 26: A generalization of the Cragg and Levy function (problem (17) in [5])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 2.

Problem 27: A penalty function (problem (18) in [5])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = ([y_t]_1 + 1/u_t, [y_t]_2 + t/u_t)^T, t = 1, ..., N, and y_1 ≡ 0.

Problem 28: An Augmented Lagrangian function for a generalization of Hock and Schittkowski's 80th problem (problem (19) in [5])
The conditions of Corollary 1 are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = (x_{5t-4}, x_{5t-3}, x_{5t-2}, x_{5t-1}, x_{5t})^T, t = 1, ..., N where N = n/5,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 29: A generalization of a function due to A. Brown (problem (20) in [5])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:
    u_t = x_t, t = 1, ..., N where N = n/2,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 30: A generalization of another function due to A. Brown (problem (21) in [5])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined as follows:

    u_t = x_t, t = 1, ..., N where N = n,
    y_{t+1} = T(y_t, u_t, t) = u_t, t = 1, ..., N, and y_1 ≡ 0.

Problem 31: The discretization of a variational problem (problem (23) in [5])
The conditions of the Theorem are satisfied. The control variables, state variables and transition functions can be defined exactly as the ones in Problem 30.

The results for the above thirty-one problems indicate that the proposed approach fails for only two problems (Problem 25 and Problem 15). The remaining problems can be converted into optimal control problems and solved in O(n) flops. It is very important to notice that among the twenty-nine solvable problems, some of the Hessian matrices are not sparse at all. This is expected, since we are not looking at the sparsity structure of the Hessian matrix but rather focusing on the structure in f(x). We would like to point out that, in order to convert problem (NLP) into problem (DOCP), it is crucial to select the appropriate pattern set P(f) in certain circumstances. Sometimes, some rearrangements of the objective function are necessary (see Problem 9 and Problem 23). The case with P(f) ≠ P_A(f) can be found in Problem 23.

7. Conclusions and Future Research

In this paper, we have proposed an optimal control approach to solve certain large scale unconstrained optimization problems. Rather than following the traditional sparsity study of the Hessian matrix or the recent separable function approach, we introduce the Pattern Structure to identify and describe the structure of any function. With the establishment of a pattern set for any function, we are able to provide a sufficient condition for the conversion of an optimization problem into a discrete time optimal control problem. An automatic procedure is also presented to make such a conversion practical. The most important and attractive result of such a conversion is that the converted optimization problems can be solved, with superlinear and quadratic convergence, in only O(n) flops and O(n) storage per iteration by optimal control algorithms, where n is the number of variables. Compared with standard optimization methods that are at least superlinearly convergent, this optimal control approach for large scale unconstrained optimization problems can result in the saving of a factor of n in both computation and storage.

Following the sufficient condition as in the Theorem, a special class of unconstrained optimization problems can be identified and defined so that these problems can be solved efficiently by optimal control algorithms. One example in this special class is the function defined in (1.1). It is straightforward to see that, in general, the Hessian matrix of this function f(x) is not sparse, and this function f(x) is not partially separable either, in terms of the definition in [7].

Thirty-one large scale unconstrained optimization problems collected from the literature have been examined against the sufficient condition in Section 6. Only two problems fail to satisfy the sufficient condition (one of them, Problem 15, can still be solved in O(n^2) flops). The remaining twenty-nine problems can be solved in O(n) flops. One important observation for these

twenty-nine problems is that some of the Hessian matrices are dense and unstructured (Problems 3-7, 9, 20, 24 and 27).

As we mentioned earlier, sometimes it is crucial to select the appropriate pattern set P(f) and/or make some rearrangements of the objective function f(x) (see Problem 9) so that the structure of f(x) can be recognized and identified. So far, there is no general rule for making such rearrangements or adjustments; this question remains to be answered. Another challenging but very practical task is how to construct a unified procedure so that the determination of the pattern set can be implemented automatically by computers.

Acknowledgement

The author would like to thank Professor Thomas F. Coleman for many illuminating discussions and constructive comments.


References

[1] A. E. Bryson, Jr. and Y. Ho, Applied Optimal Control, Hemisphere Publ. Corp., Washington D.C., 1975.
[2] T. F. Coleman, Large Sparse Numerical Optimization, Lecture Notes in Computer Science 165, Springer-Verlag, New York, 1984.
[3] T. F. Coleman, Large-scale numerical optimization: introduction and overview, in Encyclopedia of Computer Science and Technology 28, A. Kent and J. Williams, eds., Marcel Dekker, Inc., New York, 1993, pp. 167-195.
[4] T. F. Coleman and A. Liao, An efficient trust region method for unconstrained discrete-time optimal control problems, Computational Optimization and Applications, 6 (1995), pp. 47-66.
[5] A. R. Conn, N. I. M. Gould and Ph. L. Toint, Testing a class of methods for solving minimization problems with simple bounds on the variables, Mathematics of Computation, 50 (1988), pp. 399-430.
[6] A. R. Conn, N. I. M. Gould and Ph. L. Toint, An introduction to the structure of large scale nonlinear optimization problems and the LANCELOT project, in Computing Methods in Applied Sciences and Engineering, R. Glowinski and A. Lichnewsky, eds., SIAM, Philadelphia, PA, 1990, pp. 42-54.
[7] A. R. Conn, N. I. M. Gould and Ph. L. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer Series in Computational Mathematics 17, Springer-Verlag, 1992.
[8] A. R. Conn, N. I. M. Gould and Ph. L. Toint, Large-scale nonlinear constrained optimization: a current survey, in Algorithms for Continuous Optimization: the State of the Art, E. Spedicato, ed., Kluwer Academic Publishers, 1994, pp. 287-332.
[9] J. Dunn and D. P. Bertsekas, Efficient dynamic programming implementations of Newton's method for unconstrained optimal control problems, JOTA, 63 (1989), pp. 23-38.
[10] R. Fletcher, An optimal positive definite update for sparse Hessian matrices, SIAM J. Optim., 5 (1995), pp. 192-215.
[11] P. E. Gill and W. Murray, Conjugate gradient methods for large scale nonlinear optimization, Technical Report SOL 79-15, Stanford University, 1979.
[12] A. Griewank and Ph. L. Toint, On the unconstrained optimization of partially separable functions, in Nonlinear Optimization 1981, M. J. D. Powell, ed., Academic Press, London and New York, 1982, pp. 301-312.
[13] D. Jacobson, Second-order and second-variation methods for determining optimal control: a comparative study using differential dynamic programming, Int. J. Control, 7 (1968), pp. 175-196.
[14] D. Jacobson and D. Mayne, Differential Dynamic Programming, Elsevier Sci. Publ., New York, 1970.

[15] L.-Z. Liao and C. A. Shoemaker, Convergence in unconstrained discrete-time differential dynamic programming, IEEE Trans. Automat. Contr., 36 (1991), pp. 692-706.
[16] D. Mayne, A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems, Intnl. J. Control, 3 (1966), pp. 85-95.
[17] S. K. Mitter, Successive approximation methods for the solution of optimal control problems, Automatica, 3 (1966), pp. 135-149.
[18] J. Moré, B. Garbow and K. Hillstrom, Testing unconstrained optimization software, ACM Transactions on Mathematical Software, 7 (1981), pp. 17-41.
[19] D. Murray and S. Yakowitz, The application of optimal control methodology to nonlinear programming problems, Mathematical Programming, 21 (1981), pp. 331-347.
[20] J. Nocedal, The performance of several algorithms for large scale unconstrained optimization, in Large-Scale Numerical Optimization, T. F. Coleman and Y. Li, eds., SIAM, Philadelphia, PA, 1992, pp. 138-151.
[21] K. Ohno, A new approach of differential dynamic programming for discrete time systems, IEEE Trans. Automat. Contr., 23 (1978), pp. 37-47.
[22] J. F. A. De O. Pantoja, Differential dynamic programming and Newton's method, Int. J. Control, 47 (1988), pp. 1539-1553.
[23] C. A. Shoemaker, L.-Z. Liao, H. Caffey and L.-C. Chang, Optimal Control of Nonlinear Engineering Systems, in Computer Assisted Modeling on the IBM 3090: The 1989 IBM Contest Prize Papers, K. R. Billingsley, H. U. Brown and E. Derohanes, eds., The Baldwin Press, 1992, pp. 403-445.
[24] S. Yakowitz, Theoretical and computational advances in differential dynamic programming, Control and Cybernetics, 17 (1988), pp. 173-189.
[25] S. Yakowitz and B. Rutherford, Computational aspects of discrete-time optimal control, Appl. Math. Comput., 15 (1984), pp. 29-45.
