Coordinate Search Algorithms in Multilevel ... - Optimization Online

Coordinate Search Algorithms in Multilevel Optimization

Emanuele Frandi (1) and Alessandra Papini (2)

(1) Dipartimento di Scienza e Alta Tecnologia, Università degli Studi dell’Insubria, [email protected]
(2) Dipartimento di Energetica Sergio Stecco, Università degli Studi di Firenze, [email protected]

Abstract Many optimization problems of practical interest arise from the discretization of continuous problems. Classical examples can be found in calculus of variations, optimal control and image processing. In recent years a number of strategies have been proposed for the solution of such problems, broadly known as multilevel methods. Inspired by classical multigrid schemes for linear systems, they exploit the possibility of solving the problem on coarser discretization levels to accelerate the computation of a finest-level solution. In this paper, we study the applicability of coordinate search algorithms in a multilevel optimization paradigm. We develop a multilevel derivative-free coordinate search method, where coarse-level objective functions are defined by suitable surrogate models. We employ a recursive v-cycle correction scheme, which exhibits multigrid-like error smoothing properties. On a practical level, the algorithm is implemented in tandem with a full-multilevel initialization. A suitable strategy to manage the coordinate search stepsize on different levels is also proposed, which gives a substantial contribution to the overall speed of the algorithm. Numerical experiments on several examples show promising results. The presented algorithm can solve large problems in a reasonable time, thus overcoming size and convergence speed limitations typical of coordinate search methods.

1 Introduction

Multigrid methods are a class of algorithms originally developed as a powerful tool for solving large linear systems arising from the discretization of elliptic PDEs [3, 18]. Broadly speaking, they consist of applying classical stationary iterative schemes to a sequence of linear systems corresponding to different levels of discretization of the original equation. Hence the term “multigrid”.

By solving coarser, low-dimensional problems, and using the solutions as correctors for higher-level estimates, multigrid methods greatly boost the rate of convergence of the underlying iterative scheme.

In recent years there have been numerous efforts to extend the multigrid framework to the field of optimization. Many large optimization problems in fact arise from the discretization of a corresponding infinite-dimensional problem (i.e. the minimization of some functional). Typical examples can be found in the contexts of calculus of variations, PDE-constrained optimization and optimal control [2, 10, 19], signal denoising and image deblurring [4, 5], and image reconstruction [15, 16, 20]. For such situations several algorithms have been proposed, more or less directly inspired by traditional multigrid techniques for nonlinear systems. These methods tend to perform remarkably well in practice [6–8, 13, 14].

In the present paper, we devise a multigrid strategy based on a classical coordinate search algorithm, which falls within the MG/Opt framework introduced by Nash in [13]. Several reasons make this idea attractive: besides being simple and easy to implement, coordinate search is a derivative-free method, a feature which can be important in applications where gradient information is unavailable or difficult to compute. Other motivations are more strictly related to the multilevel optimization context. By themselves, coordinate search methods are not suitable for large-scale problems, as their convergence is generally slow and tends to deteriorate as the size of the problem increases. We argue, however, that a multilevel approach can effectively overcome such drawbacks, substantially improving the convergence speed of the underlying optimization method. Moreover, we show that such an algorithm preserves the error smoothing properties typical of linear multigrid methods.

The paper is organized as follows. In Section 2 we give a brief overview of the ideas behind multilevel optimization. In Section 3 we motivate our proposal and describe a derivative-free multilevel optimization model. Section 4 describes in detail a multilevel algorithm based on a coordinate search routine. In Section 5 we present experimental results on several test problems. Finally, some concluding remarks are drawn in Section 6.

2 Overview of multilevel optimization

Consider a hierarchy of optimization problems
\[ \min_{x_h \in X_h} f_h(x_h) \tag{1} \]

arising from the discretization of some continuous problem. The parameter $h$, typically corresponding to some grid spacing, denotes the discretization level. Consider a pair of successive discretizations. Denote by $h$ the finer level and by $H$ the coarser one, and by $X_h$ and $X_H$ the corresponding variable spaces, of dimension $n_h$ and $n_H$ respectively. Here, we mostly focus on the typical situation where the problem arises from a discretization on a regular grid and $H = 2h$. Supposing e.g. we have $l$ levels and $h$ is the parameter corresponding to the finest level, the hierarchy consists of the discretizations defined by grid spacings $h, 2h, \dots, ph$, where $p = 2^{l-1}$.

In order to define and run a multilevel algorithm, a number of ingredients are needed.

(a) A pair of linear restriction and prolongation operators, $R : X_h \to X_H$ and $P : X_H \to X_h$, to transfer variables between grids. Such operators are usually required to satisfy the variational property
\[ P = cR^T, \qquad c > 0 \ \text{constant}, \tag{2} \]

which is imposed in most multigrid algorithms. Though not mandatory from a theoretical point of view, (2) facilitates the analysis of multilevel methods as well as the derivation of some useful properties, in both the linear case and the optimization context [3, 13, 14]. We do not discuss the issue of selecting the operators, and adopt as a default choice the linear interpolation and full-weighting operators classically used in multigrid methods [3, 18].

(b) An underlying globally convergent optimization method.

(c) A multigrid-like correction scheme. An effective correction scheme, called MG/Opt, has been proposed in [13]; it is directly inspired by classical nonlinear multigrid methods. Schematically, given an initial point $x_h^0$ at some level $h$, the function $f_h(x)$ is minimized by performing the following steps.

• Perform some optimization iterations, obtaining $\tilde{x}_h$ (pre-smoothing).
• Define a coarse initial guess $\tilde{x}_H = R\tilde{x}_h$ (restriction).
• Starting from $\tilde{x}_H$, obtain $x_H^+$ by minimizing the surrogate model
\[ f_H^s(z) = f_H(z) - v_H^T z, \tag{3} \]
where $v_H$ is a suitable correction vector.
• Define a search direction $e_h = P(x_H^+ - \tilde{x}_H)$ (prolongation).
• Find (e.g. via line-search) $x_h^+ = \tilde{x}_h + \alpha e_h$ s.t. $f_h(x_h^+) \le f_h(\tilde{x}_h)$ (correction).
• Refine $x_h^+$ by applying some more optimization iterations (post-smoothing).

The minimization of the coarse-level surrogate function is usually performed recursively, in which case it is said that the above scheme has a v-cycle structure. The procedure is iterated on the finest level until a stopping criterion for the underlying optimization algorithm is satisfied.
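To make ingredient (a) concrete, the following minimal sketch builds the linear interpolation and full-weighting operators for a one-dimensional grid with $n_H$ interior coarse points and $n_h = 2n_H + 1$ interior fine points, and checks the variational property (2). The one-dimensional setting, the function names and the dense list-of-lists storage are assumptions made here for illustration; for these particular operators the constant is $c = 2$.

```python
def prolongation(n_coarse):
    """Linear interpolation matrix P of shape (2*n_coarse + 1) x n_coarse."""
    n_fine = 2 * n_coarse + 1
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        i = 2 * j + 1              # fine node coinciding with coarse node j
        P[i - 1][j] = 0.5
        P[i][j] = 1.0
        P[i + 1][j] = 0.5
    return P


def restriction(n_coarse):
    """Full-weighting matrix R with stencil [1/4, 1/2, 1/4]."""
    n_fine = 2 * n_coarse + 1
    R = [[0.0] * n_fine for _ in range(n_coarse)]
    for j in range(n_coarse):
        i = 2 * j + 1
        R[j][i - 1] = 0.25
        R[j][i] = 0.5
        R[j][i + 1] = 0.25
    return R


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


n_H = 7
P, R = prolongation(n_H), restriction(n_H)
c = 2.0
# Variational property (2): P = c * R^T, checked entry by entry.
variational_ok = all(P[i][j] == c * R[j][i]
                     for i in range(2 * n_H + 1) for j in range(n_H))
```

With these definitions, `matvec(R, x_h)` plays the role of the restriction step and `matvec(P, d_H)` that of the prolongation step in the scheme above.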

This scheme is close in spirit to the classical full approximation scheme (FAS) for nonlinear equations [3]. An important exception is that FAS does not include a line-search step and always takes a full correction step. The inclusion of the line-search step is essentially a safeguard for convergence. See [13] for a case of failure to converge when a full correction step is taken after each recursive call.

The correction vector in (3) plays an important role. In many multilevel models, such as [13] and [7, 8], it is defined as
\[ v_H = \nabla f_H(\tilde{x}_H) - R\nabla f_h(\tilde{x}_h), \tag{4} \]
so that the following first-order relationship between the coarse and fine-level objective functions holds:
\[ \nabla f_H^s(\tilde{x}_H) = R\nabla f_h(\tilde{x}_h). \tag{5} \]
This definition has its roots in nonlinear multigrid. Specifically, it is an adaptation of the tau-correction used in the FAS scheme [3, 18]. From equation (5) a number of useful properties follow, which ensure that under suitable conditions the fine level search direction $e_h$ is a descent direction for $f_h$ at $\tilde{x}_h$. For example, it is easy to see that if $e_H = x_H^+ - \tilde{x}_H$ is a descent direction for $f_H^s$ at $\tilde{x}_H$, i.e. $\nabla f_H^s(\tilde{x}_H)^T e_H < 0$, then using (2) and (5)
\[ \nabla f_h(\tilde{x}_h)^T e_h = \nabla f_h(\tilde{x}_h)^T P e_H = c\,(R\nabla f_h(\tilde{x}_h))^T e_H = c\,\nabla f_H^s(\tilde{x}_H)^T e_H < 0, \]
that is, $e_h = P e_H$ is a descent direction for $f_h$ at $\tilde{x}_h$. Descent at $\tilde{x}_h$ can also be guaranteed under some additional conditions, e.g. by optimizing $f_H^s$ in a sufficiently small neighborhood of $\tilde{x}_H$ [14].

The terms pre and post-smoothing are retained from the linear multigrid terminology, and recall the error smoothing properties which a good underlying iterative scheme (often referred to as a smoother rather than a solver) should possess. We briefly expand on this crucial point by recalling the mechanism lying at the core of multigrid methods. An error vector for a linear system can be seen as a linear combination of high-frequency and low-frequency sinusoidal waves, or Fourier modes. Iterative schemes commonly chosen in multigrid such as the damped Jacobi or Gauss-Seidel methods have a peculiar behavior when applied to modes of different frequencies. They quickly reduce oscillatory modes, and have little or no effect on smoother modes, which is a cause of their tendency to stall after a few iterations. Low-frequency modes, however, can be reinterpreted as oscillatory modes on a coarser grid, where they can be damped effectively by applying the iterative scheme to a residual equation which provides an error correction for the fine-grid solution. This idea is applied recursively, leading to a scheme which effectively corrects all the frequencies of the error in a very small number of iterations [3, 18].
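The different treatment of smooth and oscillatory modes can be observed with a few lines of code. The sketch below applies Gauss-Seidel sweeps to the one-dimensional discrete Poisson system with zero right-hand side, so that the iterate coincides with the error, starting once from a low-frequency and once from a high-frequency sine mode. The grid size, the number of sweeps and the chosen wavenumbers are illustrative assumptions, not values prescribed by the scheme.

```python
import math


def gauss_seidel_sweep(x):
    """One in-place Gauss-Seidel sweep for the 1D Poisson system with b = 0.

    Each unknown is replaced by the mean of its neighbours; the boundary
    values are zero (homogeneous Dirichlet conditions).
    """
    n = len(x)
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = 0.5 * (left + right)
    return x


def norm2(v):
    return math.sqrt(sum(t * t for t in v))


def damping(k, n=63, sweeps=10):
    """Ratio between final and initial error norm for the k-th sine mode."""
    x = [math.sin(k * math.pi * (i + 1) / (n + 1)) for i in range(n)]
    start = norm2(x)
    for _ in range(sweeps):
        gauss_seidel_sweep(x)
    return norm2(x) / start


low = damping(k=1)    # smooth mode: the error norm barely changes
high = damping(k=40)  # oscillatory mode: the error norm drops sharply
```

Comparing `low` and `high` shows why the smoother alone stalls: after a few sweeps what remains of the error is essentially its smooth part, which is exactly the component that the coarse-grid correction is designed to remove.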

2.1 Motivations for coordinate search

Our aim is to derive a multilevel algorithm using coordinate search as the underlying optimization method. To motivate our proposal, we start by recalling how the idea of error smoothing carries over to the optimization context.

Consider the classical cyclical coordinate descent method for the general problem $\min_{x \in \mathbb{R}^n} f(x)$. Starting from an initial point $x^0$, a coordinate descent iteration can be written as
\[ x_i^{k+1} = \operatorname*{argmin}_{\xi \in \mathbb{R}} f(x_1^{k+1}, \dots, x_{i-1}^{k+1}, \xi, x_{i+1}^{k}, \dots, x_n^{k}), \qquad i = 1, \dots, n, \]
where the index $k$ represents the total number of $n$-coordinate cycles performed. It is well-known that the classical Gauss-Seidel method for solving $Ax = b$, with $A$ symmetric positive definite, is equivalent to cyclical coordinate descent for minimizing the quadratic function
\[ f(x) := \tfrac{1}{2} x^T A x - b^T x. \tag{6} \]
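The equivalence can be checked numerically. In the sketch below, one Gauss-Seidel sweep for $Ax = b$ is compared with one cycle of exact coordinate-wise minimization of (6). Since (6) restricted to a coordinate axis is a parabola, its minimizer along that axis is recovered from three function values, so the second routine never touches $A$ or $b$ directly. The test matrix, the right-hand side and the helper names are assumptions chosen for illustration.

```python
def quadratic(A, b, x):
    """f(x) = 1/2 x^T A x - b^T x, as in (6)."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return 0.5 * sum(x[i] * Ax[i] for i in range(n)) - sum(b[i] * x[i] for i in range(n))


def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep for A x = b, returning a new vector."""
    x = x[:]
    n = len(x)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def coordinate_descent_cycle(f, x):
    """One cycle of exact minimization along each axis, for quadratic f.

    Along axis i the function is a parabola g(t) = f(x + t e_i), whose
    minimizer is t* = -g'(0) / g''(0); both derivatives are obtained
    exactly from the samples g(-1), g(0), g(1).
    """
    x = x[:]
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += 1.0
        xm[i] -= 1.0
        fp, f0, fm = f(xp), f(x), f(xm)
        x[i] -= (fp - fm) / (2.0 * (fp - 2.0 * f0 + fm))
    return x


A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 0.0, 0.0, 1.0]
x_gs = gauss_seidel_sweep(A, b, [0.0] * 4)
x_cd = coordinate_descent_cycle(lambda z: quadratic(A, b, z), [0.0] * 4)
max_diff = max(abs(u - v) for u, v in zip(x_gs, x_cd))
```

The two vectors agree up to rounding, which is the sense in which coordinate descent inherits the behavior of Gauss-Seidel on quadratic problems.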

With this equivalence in mind, it is natural to expect the coordinate descent algorithm to retain the smoothing properties of Gauss-Seidel iterations, thus making it a suitable choice as a multilevel smoother. This idea is in fact well known in the field of multilevel optimization and has been explored in a number of works: variations of coordinate descent have been tested in a multilevel framework on inverse problems arising in emission tomography [15, 20], in signal or image denoising, and in image restoration [4, 5]. Results in the cited papers show that such methods yield improved performance compared with more traditional approaches. The use of coordinate descent as a multilevel smoother has also been recently considered in the context of the trust-region oriented RMTR framework [7, 8].

In spite of its conceptual simplicity, exact one-dimensional minimization can be cumbersome to realize in practice. This is the case e.g. when directional derivatives are unavailable or difficult to compute. However, smoothing steps in multigrid schemes do not need to be very accurate: their purpose is to smooth out oscillatory error components, and this is enough to obtain an effective correction step. This idea found its way into the context of nonlinear multigrid, where nonlinear Gauss-Seidel iterations with scalar Newton steps to update a variable are used at each level: since the Gauss-Seidel smoother is not intended for exact system solving, the number of Newton steps in each iteration can be kept very small [3]. On the basis of these considerations, a coordinate search method possibly constitutes a simpler and more straightforward way to deal with the aforementioned difficulties. This method works in a similar fashion to coordinate descent, but makes no use of explicit gradient information.
Instead of performing exact coordinate-wise updates, it exploits function value samplings around the current iterate along the coordinate axes, in order to find a point satisfying a suitable decrease condition [9]. It seems sensible to conjecture that, in a multilevel setting, this type of inexact derivative-free minimization could replace exact coordinate-wise minimization and still produce good practical results. We shall see in Section 5 that this is indeed the case.

3 A derivative-free multilevel framework

The way in which the correction vector (4) is defined relies on the availability of gradients at any level and any point. Since we want to use a derivative-free method, we need surrogate functions which do not rely on gradient computations. Now, it is clear from the discussion in [14] that the convergence properties of the multilevel v-cycle depend only on the global convergence of the underlying algorithm, and on the fact that
\[ f_h(x_h^+) \le f_h(\tilde{x}_h). \tag{7} \]
In other words, global convergence depends entirely on the structure of the multilevel procedure, and not on the surrogate model used in the recursion. In principle, it is not even necessary to perform a recursion step: if $e_h$ is not a good direction, then the scheme simply reduces to the underlying algorithm.¹ In fact, we can regard the multilevel v-cycle as a purely theoretical framework, which guarantees convergence while placing minimal requirements on how the correction scheme is designed. On the other hand, the practical improvement in using a multilevel procedure entirely depends on how the algorithm is actually implemented [14].

We now describe how an alternative surrogate model can be constructed without explicitly resorting to gradient computations. For reasons which will be clarified in Section 4, we seek:

• a suitable approximation for the fine-level gradient, $g_h \approx \nabla f_h(\tilde{x}_h)$;
• a formula for the surrogate model which enforces a relation analogous to (5) but does not require knowledge of $\nabla f_H(\tilde{x}_H)$.

Supposing we have obtained $g_h$, we start by defining an approximate surrogate
\[ \hat{f}_H^s(z) := f_H(z) - \hat{v}_H^T z, \tag{8} \]
where $\hat{v}_H = \nabla f_H(\tilde{x}_H) - R g_h$. Note in passing that $\nabla \hat{f}_H^s(\tilde{x}_H) = R g_h$, which is essentially (5), with $f_H^s$ replaced by $\hat{f}_H^s$.

Instead of $\hat{f}_H^s$, however, we want to minimize an objective which does not require knowledge of the coarse-level gradient. We propose to search for the coarse level correction by optimizing the following objective function:
\[ \hat{q}_H^s(z) := \frac{f_H(z) + f_H(2\tilde{x}_H - z)}{2} + (R g_h)^T z. \tag{9} \]

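Proposition 1. The function $\hat{q}_H^s(z)$ is such that
\[ \hat{q}_H^s(z) = \hat{f}_H^s(z) + \nabla f_H(\tilde{x}_H)^T \tilde{x}_H + O(\|z - \tilde{x}_H\|^3) \tag{10a} \]
\[ \phantom{\hat{q}_H^s(z)} = f_H^s(z) + \nabla f_H(\tilde{x}_H)^T \tilde{x}_H + (R(g_h - \nabla f_h(\tilde{x}_h)))^T z + O(\|z - \tilde{x}_H\|^3). \tag{10b} \]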

¹ As discussed before, however, if we do not check for (7) and instead perform a full correction step, convergence is not guaranteed in general.

Proof. We rewrite $\hat{f}_H^s(z)$ with the change of variables $z = \tilde{x}_H + d$ as
\[ \hat{f}_H^s(z) = \hat{f}_H^s(\tilde{x}_H + d) = q_H^s(\tilde{x}_H + d) - \nabla f_H(\tilde{x}_H)^T \tilde{x}_H, \tag{11} \]
where
\[ q_H^s(\tilde{x}_H + d) = f_H(\tilde{x}_H + d) - \nabla f_H(\tilde{x}_H)^T d + (R g_h)^T (\tilde{x}_H + d). \tag{12} \]
Supposing that $f_H$ is smooth enough and using Taylor expansions, we get
\[ f_H(\tilde{x}_H + d) - \nabla f_H(\tilde{x}_H)^T d = f_H(\tilde{x}_H) + \tfrac{1}{2} d^T H_{f_H}(\tilde{x}_H) d + O(\|d\|^3), \]
where $H_{f_H}$ is the Hessian matrix of $f_H$, and
\[ d^T H_{f_H}(\tilde{x}_H) d = f_H(\tilde{x}_H + d) - 2 f_H(\tilde{x}_H) + f_H(\tilde{x}_H - d) + O(\|d\|^3). \]
Substituting into (12), we have
\[ q_H^s(\tilde{x}_H + d) = \underbrace{\frac{f_H(\tilde{x}_H + d) + f_H(\tilde{x}_H - d)}{2} + (R g_h)^T (\tilde{x}_H + d)}_{\hat{q}_H^s(\tilde{x}_H + d)} + O(\|d\|^3). \]
Changing back variables by $d = z - \tilde{x}_H$ and using (11), we obtain (10a). Equation (10b) follows immediately from (8).

The above result clarifies in which sense the function (9) is a suitable surrogate model. Indeed, equation (10a) immediately yields
\[ \operatorname*{argmin}_z \hat{q}_H^s(z) = \operatorname*{argmin}_z \left( \hat{f}_H^s(z) + O(\|z - \tilde{x}_H\|^3) \right), \]
and the first order condition
\[ \nabla \hat{q}_H^s(\tilde{x}_H) = \nabla \hat{f}_H^s(\tilde{x}_H) = R g_h \approx R \nabla f_h(\tilde{x}_h). \]
The latter equation shows that $\hat{q}_H^s$ and $\hat{f}_H^s$ have the same first-order behaviour in a neighborhood of $\tilde{x}_H$, and that an approximate version of (5) is enforced where $g_h$ appears in place of the exact fine-level gradient. From a computational point of view, this definition of the surrogate model requires two function evaluations at the coarser level instead of one. However, in Section 5 we show that the overall performance is not impacted much, as coarse-level computations contribute to only a small portion of the overall cost of the algorithm.
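The first-order condition can be verified numerically on a small example. The sketch below builds the surrogate (9) from function values only and compares its gradient at $\tilde{x}_H$, estimated by central differences, with the vector standing in for $R g_h$. The coarse objective `f_H`, the point `x_tilde` and the vector `w` are hypothetical data chosen purely for illustration.

```python
import math


def make_surrogate(f_H, x_tilde, w):
    """Return q(z) = (f_H(z) + f_H(2*x_tilde - z)) / 2 + w^T z, as in (9).

    The vector w plays the role of R g_h; no gradient of f_H is needed.
    """
    def q(z):
        mirrored = [2.0 * xt - zi for xt, zi in zip(x_tilde, z)]
        linear = sum(wi * zi for wi, zi in zip(w, z))
        return 0.5 * (f_H(z) + f_H(mirrored)) + linear
    return q


def central_gradient(f, x, h=1e-5):
    """Central-difference gradient, used here only to check the result."""
    g = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


# Hypothetical smooth, non-quadratic coarse-level objective.
def f_H(z):
    return math.exp(z[0]) + z[0] * z[1] ** 2 + math.sin(z[1])


x_tilde = [0.3, -0.7]
w = [1.5, -2.0]                       # stands in for R g_h

q = make_surrogate(f_H, x_tilde, w)
grad_q = central_gradient(q, x_tilde)  # should match w up to rounding
```

The symmetrized term $(f_H(z) + f_H(2\tilde{x}_H - z))/2$ is even about $\tilde{x}_H$, so its gradient vanishes there and only the linear term contributes, whatever the gradient of `f_H` itself may be.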

4 A coordinate search multilevel algorithm

The coordinate search method, also known as compass search, belongs to a broad class of derivative-free algorithms known in the literature as generating set search (GSS) methods [9]. A GSS method samples the objective function $f$, along a suitable set of search directions, around the current iterate $x_k$, at a given distance depending on a stepsize parameter $\Delta_k$. In a classical coordinate search method, the search directions consist of the coordinate axes, and the new iterate $x_{k+1}$ is sought in the set of trial points
\[ D := \{x_k \pm \Delta_k e_1, \ x_k \pm \Delta_k e_2, \ \dots, \ x_k \pm \Delta_k e_n\}. \]
If $D$ contains a point satisfying a suitable decrease condition, such a point is accepted as the new iterate and the iteration is marked as successful. Otherwise, the stepsize is reduced and the iteration is marked as unsuccessful. The algorithm stops when the stepsize $\Delta_k$ falls below a prescribed threshold.

There is considerable freedom in deciding how to implement a coordinate search procedure. Algorithm 1 describes the implementation used in the experiments of this paper. Symbols S and U denote the sets of successful and unsuccessful iterations, respectively.

Algorithm 1  [x_k, Δ_k] ← CS(f, x_0 ∈ R^n, Δ_0, τ, θ, it_max)
Given γ ∈ (0, 1) and ρ(t) = γ t², let k = 0, i⋆ = 0, succ = FALSE.
for it = 0, ..., it_max do
    j ← (i⋆ + 1) mod n; succ ← FALSE;
    while succ = FALSE do
        for i = j, ..., n, 1, ..., j − 1 do
            x⋆ ← argmin over x_k ± Δ_k e_i of {f(x_k + Δ_k e_i), f(x_k − Δ_k e_i)};
            if f(x⋆) < f(x_k) − ρ(Δ_k) then    (successful step, k ∈ S)
                x_{k+1} ← x⋆; succ ← TRUE; i⋆ ← i; break
            end if
        end for
        if succ = FALSE then    (unsuccessful step, k ∈ U)
            x_{k+1} ← x_k; Δ_{k+1} ← θ Δ_k; k ← k + 1;
        else
            Δ_{k+1} ← Δ_k; k ← k + 1;
        end if
        if Δ_k < τ then return; end if
    end while
end for

We remark the following key points of the algorithm:

• the counter it stores the number of successful updates, and the index i⋆ denotes the last successful direction found;
• the while cycle checks, after each iteration, whether k ∈ S or k ∈ U. In the former case, the algorithm exits the cycle and the counter it is increased by one. In the latter, a new cyclical sampling is performed around $x_k$ with a smaller stepsize;
• the internal for cycle explores the coordinate directions cyclically, starting from the coordinate following i⋆;
• $\rho(t) = \gamma t^2$ is a classical forcing function which ensures sufficient decrease; we set $\gamma = 10^{-4}$ in our experiments.

Algorithm 1 enjoys global convergence properties, which are stated in the theorem below. A comprehensive review on the theory and practice of GSS methods can be found in the survey paper [9], to which we refer for proofs and further details.

Theorem 1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable, and suppose the set $L_f^0 := \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is compact. Then, Algorithm 1 produces a sequence of iterates $\{x_k\}$ satisfying
\[ \lim_{k \to +\infty} \Delta_k = 0 \quad \text{and} \quad \lim_{k \to +\infty} \|\nabla f(x_k)\| = 0. \]

It is also possible to get an r-linear local convergence result for the subsequence $\{x_k\}_{k \in U}$.

Theorem 2. Suppose that $f \in C^2$ and that $x^*$ is a local minimizer s.t. $\nabla^2 f(x^*)$ is positive definite. If $x_0$ is sufficiently near to $x^*$ and $\Delta_0$ is sufficiently small, the sequence $\{x_k\}$ produced by Algorithm 1 is such that
\[ \lim_{k \to +\infty} x_k = x^*, \qquad \|x_k - x^*\| \le \eta \Delta_k \ \ \text{for } k \in U, \]
where $\eta$ is a constant independent of $k$.

Theorem 2 also tells us that it is reasonable to interpret the steplength tolerance $\tau$ as a rough measure of the error we expect to obtain when Algorithm 1 stops. We cannot, however, estimate a priori the constant $\eta$, nor know how small $\Delta_0$ should be in order to observe r-linear convergence.
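As a complement to the pseudocode of Algorithm 1, the following is a minimal Python sketch of the same procedure. It is one possible reading of the algorithm rather than the authors' implementation: the function name, the 0-based indexing, the choice of starting the first scan at index 0 and the caching of the current function value are assumptions made here.

```python
def coordinate_search(f, x0, delta0, tau, theta, it_max, gamma=1e-4):
    """Sketch of Algorithm 1 (CS).

    Returns (x, delta) after it_max successful updates, or as soon as the
    stepsize delta falls below the tolerance tau.
    """
    x = list(x0)
    n = len(x)
    fx = f(x)
    delta = delta0
    last = n - 1                      # index of the last successful direction
    for _ in range(it_max):
        success = False
        while not success:
            # scan the axes cyclically, starting after the last successful one
            for offset in range(1, n + 1):
                i = (last + offset) % n
                trials = []
                for sign in (1.0, -1.0):
                    t = x[:]
                    t[i] += sign * delta
                    trials.append((f(t), t))
                f_best, x_best = min(trials, key=lambda p: p[0])
                # sufficient decrease with forcing function rho(t) = gamma*t^2
                if f_best < fx - gamma * delta ** 2:
                    x, fx, last, success = x_best, f_best, i, True
                    break
            if not success:
                delta *= theta        # unsuccessful iteration: shrink the step
                if delta < tau:
                    return x, delta
    return x, delta


# Usage on a hypothetical convex quadratic with minimizer (1, -0.5).
def example(z):
    return (z[0] - 1.0) ** 2 + 2.0 * (z[1] + 0.5) ** 2


x_min, delta_final = coordinate_search(example, [0.0, 0.0], delta0=1.0,
                                       tau=1e-6, theta=0.5, it_max=10**6)
```

In line with Theorem 2, the run stops once the stepsize is below `tau`, at which point the distance of `x_min` from the minimizer is of the order of the final stepsize.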

4.1 Description of the multilevel algorithm

Building on the previous discussion, we describe in detail our multilevel coordinate search algorithm (ML/CS) in Algorithm 2. As before, $n_h$ denotes the dimension of the finer-grid problem. A single cycle of Algorithm 2 is represented graphically in Figure 1. Starting from an initial finest-level guess $x_h^0$, the function ML/CS is repeatedly called until $\Delta_h < \tau$ on the finest level.

Algorithm 2  [x_h, Δ_h] ← ML/CS(f_h, x_h^0, Δ_h^0, τ, θ, it_max, ν1, ν2)
if h is the coarsest level then
    [x_h, Δ_h] ← CS(f_h, x_h^0, Δ_h^0, τ, θ, it_max); return
else
    [x̃_h, Δ̃_h] ← CS(f_h, x_h^0, Δ_h^0, τ, θ, n_h ν1);
    if Δ̃_h < τ then
        return
    end if
    Compute an approximation g_h ≈ ∇f_h(x̃_h) and define q̂_H^s according to (9).
    x̃_H ← R x̃_h;
    [x_H^+, Δ_H] ← ML/CS(q̂_H^s, x̃_H, Δ_h^0, τ, θ, it_max, ν1, ν2);
    e_h ← P(x_H^+ − x̃_H);
    Find x_h^+ = x̃_h + α e_h, α ∈ [0, 1], s.t. f_h(x_h^+) ≤ f_h(x̃_h) via line-search;
    [x_h, Δ_h] ← CS(f_h, x_h^+, Δ̃_h, τ, θ, n_h ν2);
end if

Note that the pre and post-smoothing phases require Algorithm 1 to find $\nu_1 n_h$ and $\nu_2 n_h$ iterations in S, respectively. The only exception occurs when the stopping criterion $\tilde{\Delta}_h < \tau$ is satisfied, in which case the v-cycle is halted. The intent is to (approximately) mimic a coordinate descent iteration, which is composed of a cyclical sequence of $n_h$ updates. Of course, in the case of CS the coordinate indexes need be neither ordered nor distinct. Various choices are possible to implement the line-search step. In our experiments, a simple backtracking strategy was chosen.

From a theoretical point of view, from Theorem 1 and the analysis in [14] it is immediate to get the following global convergence result.

Theorem 3. Suppose $f_h$ and $x_h^0$ satisfy the assumptions of Theorem 1. Denote as $\{x_h^k\}$ the sequence of iterates produced by Algorithm 2 on the finest level. Then
\[ \lim_{k \to +\infty} \|\nabla f_h(x_h^k)\| = 0. \]
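The line-search step of Algorithm 2 only has to return a point that does not increase the fine-level objective. A minimal backtracking sketch is given below; the halving factor, the cap on the number of trials and the fallback to $\alpha = 0$ are assumptions made here, since the text only states that a simple backtracking strategy was used.

```python
def backtracking_correction(f, x, e, max_halvings=30):
    """Try alpha = 1, 1/2, 1/4, ... along the direction e.

    Returns (x_plus, alpha) for the first alpha with f(x + alpha*e) <= f(x).
    If no such alpha is found, the correction is discarded (alpha = 0), so
    the cycle falls back to the underlying smoother.
    """
    fx = f(x)
    alpha = 1.0
    for _ in range(max_halvings):
        trial = [xi + alpha * ei for xi, ei in zip(x, e)]
        if f(trial) <= fx:
            return trial, alpha
        alpha *= 0.5
    return list(x), 0.0


# Hypothetical fine-level objective and two candidate corrections.
def sphere(z):
    return sum(t * t for t in z)


# A direction that overshoots the minimizer: the full step is rejected.
x_plus, alpha = backtracking_correction(sphere, [1.0, 1.0], [-3.0, -3.0])
# An ascent direction: every trial is rejected and x is returned unchanged.
x_same, alpha_zero = backtracking_correction(sphere, [1.0, 1.0], [1.0, 1.0])
```

Whatever the quality of the coarse-level direction, the returned point satisfies (7), which is the only property of the correction step that the convergence argument relies on.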

Figure 1: Graphical depiction of a single ML/CS cycle. (Levels $h, 2h, 4h, \dots, ph$ from top to bottom; on each intermediate level the iterate moves from $x^0$ to $\tilde{x}$ with $\nu_1$ pre-smoothing steps and from $x^+$ to $x$ with $\nu_2$ post-smoothing steps; levels are connected by the restriction $R$ on the way down and the prolongation $P$ on the way up; the coarsest problem is solved until $\Delta_{ph} < \tau$.)

As regards the computation of an approximate gradient required by Algorithm 2, a standard way to obtain $g_h$ is to use a classical finite-difference scheme centered in $\tilde{x}_h$. That is,
\[ g_{h,i} \leftarrow \frac{f_h(\tilde{x}_h + \Delta e_i) - f_h(\tilde{x}_h - \Delta e_i)}{2\Delta}, \qquad i = 1, \dots, n_h, \tag{13} \]
where $\Delta$ indicates some given stepsize. However, computing $g_h$ in this way requires $2n_h$ additional function evaluations, a non-negligible cost when the problem size is large. Therefore, in order to minimize the computational overhead, we want to obtain $g_h$ by making use of already computed function values. This is accomplished by exploiting the structure of CS iterations. Specifically, we initialize $g_h \leftarrow (0, \dots, 0)^T$ and update it dynamically during the pre-smoothing step. Supposing that at iteration $k$ the CS algorithm is sampling $f_h$ around $x_h^k$ along the $i$-th axis with stepsize $\Delta_h^k$, we update the $i$-th coordinate of $g_h$ by
\[ g_{h,i} \leftarrow \frac{f_h(x_h^k + \Delta_h^k e_i) - f_h(x_h^k - \Delta_h^k e_i)}{2\Delta_h^k}. \tag{14} \]
Since the algorithm is already required to evaluate $f_h(x_h^k + \Delta_h^k e_i)$ and $f_h(x_h^k - \Delta_h^k e_i)$, no additional work is needed.

It is worthwhile to notice that this procedure cannot be applied to approximate the gradient $\nabla f_H(\tilde{x}_H)$ required in (4), as the gradient would have to be available before starting the coarse-level optimization. This motivated the definition of the alternative surrogate model $\hat{q}_H^s$ in Section 3.

As there are no theoretical guarantees on the accuracy of the finite differences (14) as approximations of the finite differences (13) centered in $\tilde{x}_h$, it is legitimate to wonder whether (14) is adequate for the multilevel scheme to retain its practical efficiency. More generally, one may ask what the impact of using inexact gradient information in the algorithm is. This issue has been recently discussed in [11], where the authors argue that errors in difference approximations of gradients have little impact on performance. In other words, there is reason to believe that a crude approximation will suffice in most situations. Numerical experiments presented in Section 5 show that this is indeed the case.
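The bookkeeping behind update (14) can be sketched as follows: the routine evaluates the two trial points that a CS iteration needs anyway along one axis, and reuses the two values to refresh the corresponding entry of the gradient estimate. The function name and the way the trial points are returned are assumptions made for illustration; in a full implementation this logic would live inside the inner loop of Algorithm 1.

```python
def sample_axis(f, x, i, delta, g):
    """Evaluate f at x + delta*e_i and x - delta*e_i and update g[i] via (14).

    Returns the two (value, point) pairs so that the caller can carry on
    with the usual sufficient-decrease test without new evaluations.
    """
    xp, xm = x[:], x[:]
    xp[i] += delta
    xm[i] -= delta
    fp, fm = f(xp), f(xm)
    g[i] = (fp - fm) / (2.0 * delta)
    return (fp, xp), (fm, xm)


# Hypothetical objective with gradient (2*z0, 3).
def example(z):
    return z[0] ** 2 + 3.0 * z[1]


x = [1.0, 2.0]
g = [0.0, 0.0]                 # g_h is initialized to zero
for axis in range(len(x)):
    sample_axis(example, x, axis, 0.5, g)
```

Entries of `g` are refreshed at different iterates and with different stepsizes as the smoothing proceeds, which is why (14) is in general only a crude approximation of (13).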

4.2 Error smoothing properties of ML/CS

In Section 2, we mentioned the relationship between coordinate search and exact coordinate-wise minimization with regard to the possibility of using coordinate 11

search as a smoother in a multilevel context. We now conduct a few preliminary tests on the one-dimensional Poisson problem, in which we show graphically how the CS algorithm exhibits a smoothing behavior which is effectively exploited by ML/CS. Note that the discrete Poisson equation is often used as a model to verify the effectiveness of multilevel correction schemes. We are indeed presenting elementary experiments inspired by those found in classical multigrid textbooks such as [3]. Nevertheless, we believe it is important to show intuitively how coordinate-wise optimization methods, particularly coordinate search, exhibit error smoothing properties similar to those of linear iterative schemes. We consider the positive definite linear system Ah xh = bh ,

(15)

and the equivalent minimization problem min 12 xTh Ah xh − bTh xh , xh

(16)

obtained by discretizing the Poisson equation ( −x′′ (t) = b(t), 0 < t < 1, x(0) = x(1) = 0, with a classical second-order central difference scheme and grid spacing h. We set the right-hand side to bh = 0 (so that the current guess coincides with the error), and the initial guess to   16πi 40πi 1 sin , i = 1, . . . , nh . (17) + sin x0i,h = 2 nh + 1 nh + 1 In the first experiment, depicted in Figure 2, we analyze the effect of a twogrid scheme (i.e. a multigrid with only 2 levels of discretization) on the error obtained by using as smoothers the Gauss-Seidel method for problem (15), and coordinate search for problem (16). In the latter case, one smoothing iteration consists of nh CS updates. The size of the finer and coarser problems are nh = 63 and nH = 31, respectively. From top to bottom and from left to right, we depict: the initial fine-grid error; the fine-grid error after 2 pre-smoothing iterations; the fine-grid error after 2 iterations on the coarser grid and correction; and finally, the error after 2 post-smoothing iterations on the fine grid. Dashed lines refer to Gauss-Seidel iterations, and continuous lines to coordinate search. In both cases, it can be clearly seen how the smoothing phases on the fine grid play the role of damping high-frequency oscillations in the error, but have relatively little effect on the magnitude of the error itself, since low-frequency components remain untouched. The coarse-grid correction, on the opposite, damps smooth components, and the combination of both effects achieves a tangible reduction of the error norm. The second experiment describes in more detail how does the correction step work. The problem is the same as before, with an initial guess consisting of a 12

Initial error

Pre−smoothing

1

1

0.5

0.5

0

0

−0.5

−0.5

−1 0

0.2

0.4

0.6

0.8

−1 0

1

0.2

Correction 1

0.5

0.5

0

0

−0.5

−0.5

0.2

0.4

0.6

0.6

0.8

1

0.8

1

Post−smoothing

1

−1 0

0.4

0.8

−1 0

1

0.2

0.4

0.6

Figure 2: Behavior of the error in a two-grid scheme. The dashed line corresponds to multigrid with Gauss-Seidel, and the continuous line to ML/CS.

combination of smooth and oscillatory modes:   5πi 10πi 20πi 40πi 1 0 sin , i = 1, . . . , nh . + sin + sin + sin xi,h = 4 nh + 1 nh + 1 nh + 1 nh + 1 We perform a single v-cycle with ν1 = ν2 = 1. The algorithm works on a total of 4 discretization levels, the finest corresponding to nh = 63, the coarsest to nh = 7. Figure 3(a) refers to linear multigrid with Gauss-Seidel, and Figure 3(b) to ML/CS. In the top row we depict the initial fine-grid error and the error after one pre-smoothing iteration. At the bottom-left corner, error after presmoothing (dashed line), correction vector (dash-dotted line), and error before post-smoothing (continuous line) are drawn. It can be seen that the correction vector approximates the opposite of the error vector, leaving a very small error but re-introducing some oscillatory components in the interpolation process. A post-smoothing iteration, pictured at the bottom-right corner, completes the process by damping such components to obtain a small and smooth final error. The similarity between the two smoothers is again apparent, supporting the idea that the interplay between smoothing and correction steps in ML/CS works exactly as in the case of multigrid with Gauss-Seidel iterations. As a last test, we consider a non-homogeneous one-dimensional Poisson model problem, where the right-hand side is chosen so that the solution is given by   1 2πi 4πi xh,i = sin , i = 1, . . . , nh . + sin 2 nh + 1 nh + 1 We solve the corresponding optimization problem on a finest grid with nh = 63 13

Initial error

Pre−smoothing

1

1

0.5

0.5

0

0

−0.5

−0.5

−1 0

0.2

0.4

0.6

0.8

−1 0

1

0.2

Correction 1

0.5

0.5

0

0

−0.5

−0.5

0.2

0.4

0.6

0.6

0.8

1

0.8

1

0.8

1

0.8

1

Post−smoothing

1

−1 0

0.4

0.8

−1 0

1

0.2

0.4

0.6

(a) Initial error

Pre−smoothing

1

1

0.5

0.5

0

0

−0.5

−0.5

−1 0

0.2

0.4

0.6

0.8

−1 0

1

0.2

Correction 1

0.5

0.5

0

0

−0.5

−0.5

0.2

0.4

0.6

0.6

Post−smoothing

1

−1 0

0.4

0.8

−1 0

1

0.2

0.4

0.6

(b)

Figure 3: Error correction in a single v-cycle with Gauss-Seidel multigrid (a) and ML/CS (b). Continuous line represent the current error.

14

by a sequence of ML/CS cycles with 4 levels, starting from a zero initial guess. We choose $\nu_1 = \nu_2 = 2$ and set the stepsize control parameter to $\theta = 0.5$. The tolerance parameter is set to $\tau = 10^{-5}$. The algorithm reached the stopping condition after 3 cycles. In Figure 4, we illustrate how the error vector looks in each of the 3 cycles, after pre-smoothing (left), recursion (middle), and post-smoothing (right). From the picture, it is again apparent how the smoothing steps significantly reduce oscillations in the error, but are unable to damp smooth components, and are therefore not sufficient by themselves to reduce the error norm. The correction step then eliminates the remaining components of the error, reducing the error norm by 1 or 2 orders of magnitude. It does excite a certain degree of oscillation, but such oscillations are immediately damped by the post-smoothing step. It may be worth noting that the high-frequency error in the bottom-right picture is not completely smoothed out. This can happen when, during the last cycle, the steplength in the CS algorithm falls below the stopping tolerance $\tau$ and post-smoothing is thus halted before completion. However, this has no major impact on the desired accuracy for the solution, as the magnitude of the error is already very small.

Figure 4: Detail of smoothing during consecutive cycles of ML/CS. Columns: Pre-smoothing, Correction, Post-smoothing; one row per cycle.
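The smoothing effect discussed above can be reproduced numerically in a few lines. The following is only a sketch under stated assumptions: it uses a plain lexicographic Gauss-Seidel sweep on the homogeneous one-dimensional Poisson system as a stand-in smoother (not the CS iteration itself), starts from the four-mode error of the Figure 3 experiment with $n_h = 63$, and measures oscillation by the norm of the second differences. All function names are hypothetical.

```python
import math

def gauss_seidel_sweep(e):
    """One lexicographic Gauss-Seidel sweep for tridiag(-1, 2, -1) e = 0
    with zero boundary values; the error vector e is updated in place."""
    n = len(e)
    for i in range(n):
        left = e[i - 1] if i > 0 else 0.0
        right = e[i + 1] if i < n - 1 else 0.0
        e[i] = 0.5 * (left + right)
    return e

def second_differences(e):
    """Residual of the homogeneous system, used here as a roughness measure."""
    n = len(e)
    return [2.0 * e[i]
            - (e[i - 1] if i > 0 else 0.0)
            - (e[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

nh = 63
modes = (5, 10, 20, 40)  # smooth and oscillatory components of the initial error
error0 = [0.25 * sum(math.sin(k * math.pi * i / (nh + 1)) for k in modes)
          for i in range(1, nh + 1)]

error1 = gauss_seidel_sweep(list(error0))

# Reduction factors after a single smoothing sweep.
error_ratio = norm(error1) / norm(error0)
roughness_ratio = norm(second_differences(error1)) / norm(second_differences(error0))
print(f"error norm ratio: {error_ratio:.3f}, roughness ratio: {roughness_ratio:.3f}")
```

In this sketch the roughness drops by a clearly larger factor than the error norm, which is the behaviour the text attributes to the smoothing steps: oscillations are damped quickly, while the smooth part of the error is left for the coarse-level correction.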

4.3 The full-multilevel initialization strategy

The performance of ML/CS can be considerably improved by implementing a full-multilevel scheme analogous to the full-multigrid (FMG) scheme used in linear multigrid. That is, before invoking Algorithm 2, we refine an initial approximation $x_h^0$ by solving a sequence of problems on increasingly finer grids. Algorithm 3 describes our implementation of the full-multilevel initialization. As in the linear multigrid case, the parameter $\nu_0$ is usually set to 1 in practice [3].

Once the initialization has been executed, we call Algorithm 2 with the value of $x_h$ returned by FM/CS as the initial point.

Algorithm 3  $[x_h, \Delta_h] \leftarrow$ FM/CS$(f_h, x_h^0, \Delta_h^0, \tau, \theta, it_{max}, \nu_0, \nu_1, \nu_2)$
    if h is the coarsest grid then
        $[x_h, \Delta_h] \leftarrow$ CS$(f_h, x_h^0, \Delta_h^0, \tau, \theta, it_{max})$; return
    else
        $x_H^0 \leftarrow R x_h^0$; $\Delta_H^0 \leftarrow \Delta_h^0$;
        $[x_H, \Delta_H] \leftarrow$ FM/CS$(f_H, x_H^0, \Delta_H^0, \tau, \theta, it_{max}, \nu_0, \nu_1, \nu_2)$;
        $x_h \leftarrow P x_H$;
        Define a fine-level stepsize $\Delta_h$ appropriately;
        $[x_h, \Delta_h] \leftarrow$ ML/CS$(f_h, x_h, \Delta_h, \tau, \theta, it_{max}, \nu_1, \nu_2)$ $\nu_0$ times;
    end if

Note that, depending on the convergence criterion adopted, the initial approximation obtained by FM/CS may already satisfy the stopping condition. In this case, no additional v-cycles are needed and the multilevel scheme behaves as shown in Figure 5. In general, we expect the number of v-cycles performed after a full-multilevel initialization to be small and independent of the finest grid size.

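Figure 5: FM/CS initialization followed by a ML/CS cycle. The diagram shows $x_h^0$ being restricted (R) through $x_{2h}^0$ and $x_{4h}^0$ down to the coarsest level $x_{ph}^0$, and then prolongated (P) back up, with ML/CS cycles applied on the finer levels.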
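The recursive structure of the full-multilevel initialization can be sketched in code as follows. This is only a structural skeleton under stated assumptions: levels are numbered from 0 (coarsest) upwards, and the callables `cs`, `ml_cs`, `restrict`, `prolong` and `fine_stepsize` are hypothetical placeholders for CS, ML/CS, the operators R and P, and the fine-level stepsize rule. The stubs in the usage part do no optimization; they only record the order in which the components are invoked.

```python
def fm_cs(level, x0, delta0, cs, ml_cs, restrict, prolong, fine_stepsize, nu0=1):
    """Skeleton of the FM/CS recursion; level 0 denotes the coarsest grid."""
    if level == 0:
        # Coarsest grid: plain coordinate search until its stopping condition.
        return cs(level, x0, delta0)
    # Restrict the initial guess and recurse with the same initial stepsize.
    x_coarse0 = restrict(x0)
    x_coarse, delta_coarse = fm_cs(level - 1, x_coarse0, delta0, cs, ml_cs,
                                   restrict, prolong, fine_stepsize, nu0)
    # Prolongate the coarse solution and choose a fine-level stepsize.
    x = prolong(x_coarse)
    delta = fine_stepsize(delta0, delta_coarse)
    # Apply nu0 ML/CS cycles on the current level.
    for _ in range(nu0):
        x, delta = ml_cs(level, x, delta)
    return x, delta

# --- Usage with recording stubs (illustrative assumptions only) ---
calls = []

def cs(level, x, delta):
    calls.append(f"CS@{level}")
    return x, delta

def ml_cs(level, x, delta):
    calls.append(f"ML/CS@{level}")
    return x, delta

def restrict(x):
    calls.append("R")
    return x[1::2]  # injection on nested 1D grids: 7 -> 3 -> 1 points

def prolong(xc):
    calls.append("P")
    padded = [0.0] + list(xc) + [0.0]
    xf = [0.0] * (2 * len(xc) + 1)
    for j, value in enumerate(xc):
        xf[2 * j + 1] = value                      # coincident nodes
    for j in range(len(xc) + 1):
        xf[2 * j] = 0.5 * (padded[j] + padded[j + 1])  # linear interpolation
    return xf

x_final, delta_final = fm_cs(2, [0.0] * 7, 1.0, cs, ml_cs, restrict, prolong,
                             lambda d0, dH: min(d0, 4.0 * dH))
print(calls)
```

With three levels the recorded sequence is two restrictions, one coarsest-level CS solve, and then alternating prolongations and ML/CS cycles, which matches the shape of the scheme in Figure 5.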

4.3.1 Warm-start strategy for the stepsize parameter

Proper management of the stepsize used for calling CS in FM/CS is essential in order to obtain the best possible efficiency from the algorithm. Since the full-multilevel initialization provides in many cases a very good initial guess, starting CS with a default initial stepsize is rarely an optimal strategy: the algorithm consumes valuable time in reducing the stepsize to a suitably small value before it is able to make further progress towards the solution. We therefore seek a good starting stepsize for the fine-level optimization.

In the FM/CS initialization, we call CS with a given initial stepsize on the coarsest grid (e.g. $\Delta_{ph}^0 = 1$ or $\Delta_{ph}^0 = \|x_{ph}^0\|_\infty$), where optimization is carried out until the CS stopping condition is satisfied. Then, at each level, we call the internal v-cycles with a stepsize depending on the discretization level and on the value $\Delta_H$ obtained after the $\nu_2$ post-smoothing iterations on the previous (coarser) level. Specifically, we set

\[ \bar\Delta = \min\{\Delta_h^0,\, C\Delta_H\}, \qquad \Delta_h = \max\left\{\bar\Delta,\, \frac{\Delta_H}{\theta}\right\}, \tag{18} \]

where $C > 1$ is a constant and the latter safeguard serves to ensure that $\Delta_h > \tau$. The only exception is the second-coarsest level, where $\Delta_H < \tau$, in which case we set $\Delta_h = 1$ to avoid taking too small steps.

Intuitively, by taking an initial stepsize smaller than $\Delta_h^0$ we are trying to reflect the fact that CS iterations at finer levels start with improved initial guesses. In particular, Theorem 2 tells us that $\Delta_H$ is related to the error of the coarse-grid solution, while the constant $C$ ideally compensates for the amount of error possibly reintroduced by the prolongation. Good values of $C$ are obviously dependent on the problem at hand, and have to be determined experimentally.
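As a small illustration, the warm-start rule can be written as a helper function. This is a sketch of one reading of rule (18), namely $\bar\Delta = \min\{\Delta_h^0, C\Delta_H\}$ followed by $\Delta_h = \max\{\bar\Delta, \Delta_H/\theta\}$; the function name, the argument names and the `coarse_converged` flag (used to model the second-coarsest-level exception) are assumptions made for this example.

```python
def warm_start_stepsize(delta0_h, delta_H, C, theta, coarse_converged=False):
    """Fine-level starting stepsize from the coarse-level stepsize delta_H.

    delta0_h         : default initial stepsize on the fine level
    delta_H          : stepsize returned by the coarser level
    C                : warm-start constant, C > 1
    theta            : step contraction parameter, 0 < theta < 1
    coarse_converged : True when the coarser level was solved to convergence
                       (delta_H below the tolerance), as on the coarsest grid
    """
    if coarse_converged:
        # Exception described in the text: restart from a unit stepsize.
        return 1.0
    delta_bar = min(delta0_h, C * delta_H)
    # Safeguard: never start below the last coarse stepsize divided by theta.
    return max(delta_bar, delta_H / theta)

# Example values (illustrative only).
print(warm_start_stepsize(1.0, 0.01, C=4.0, theta=0.25))     # capped by C * delta_H
print(warm_start_stepsize(1.0, 0.01, C=256.0, theta=0.25))   # capped by delta0_h
```

A larger C keeps the fine-level stepsize closer to the default value, while a smaller C trusts the prolongated coarse solution more; this is the trade-off that has to be tuned experimentally.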

5 Numerical experiments

5.1 Test problems

We give here a description of the test problems used in our experiments. Most of them belong to the classical Minpack-2 collection [1]. Exceptions are the Poisson multigrid model problem (explained in detail in any multigrid textbook such as [3]) and the MOREBV problem, which is derived from a problem described in [12]. All of these tests were also used in [6].

5.1.1 P2D: a two-dimensional Poisson model problem

We optimize the functional $\min_x \frac{1}{2} x^T A x - b^T x$ corresponding to the linear system obtained by discretizing the two-dimensional Poisson equation

\[ \begin{cases} -\Delta x(t,s) = b(t,s), & (t,s) \in S, \\ x(t,s) = 0, & (t,s) \in \partial S, \end{cases} \]

on a uniformly spaced grid on the unit square $S$ with the classical 5-point Laplacian. As in [6], we set the right-hand side to $b(t,s) = 2t(1-t) + 2s(1-s)$, so that the exact solution is $x(t,s) = t(1-t)s(1-s)$.

Figure 6: Triangulation used for discretizing the test problems. Each grid cell, with vertices $x_{i,j}$, $x_{i,j+1}$, $x_{i+1,j}$, $x_{i+1,j+1}$ and edges labelled a, b, c, d, is split into an upper triangle $T_U$ and a lower triangle $T_L$.

5.1.2 MINS: a nonlinear minimum surface problem

This is a classical nonlinear problem on the unit square $S$, arising in the calculus of variations: among all the surfaces with a given profile on $\partial S$, find the one having the smallest area. The problem can be formulated as follows:

\[ \min_{x \in V} \int_S \sqrt{1 + \|\nabla x(t,s)\|_2^2}\; dt\, ds, \tag{19} \]

where $V = \{x \in H^1(S) \mid x(t,s) = x_0(t,s) \text{ on } \partial S\}$. We choose the boundary condition proposed in [6]:

\[ x_0(t,s) = \begin{cases} t(1-t) & \text{if } s = 0 \text{ or } s = 1, \\ 0 & \text{elsewhere.} \end{cases} \tag{20} \]

The problem is discretized by a finite element approximation on a uniform triangulation of $S$ [1]. The domain $S$ is partitioned into $2n^2$ triangles as in Figure 6. The squared norm of the gradient is computed by approximating partial derivatives by finite differences along the catheti of the triangles. We obtain the following convex objective function:

\[ f_h(x_h) := \frac{h^2}{2} \sum_{i,j=0}^{n-1} \left( \sqrt{1 + a^2 + b^2} + \sqrt{1 + c^2 + d^2} \right), \tag{21} \]

where

\[ a = \frac{x_{i+1,j+1} - x_{i,j+1}}{h}, \quad b = \frac{x_{i+1,j+1} - x_{i+1,j}}{h}, \quad c = \frac{x_{i+1,j} - x_{i,j}}{h}, \quad d = \frac{x_{i,j+1} - x_{i,j}}{h}. \]

The $(n-1)^2$-dimensional vector $x_h$ corresponds to the internal components $x_{i,j}$, $i, j = 1, \dots, n-1$, while the boundary components are set according to (20).

5.1.3 DSSC: a steady-state combustion problem

We consider the following variational formulation of a steady-state problem for a solid fuel ignition model:

\[ \min_{x \in H_0^1(S)} \int_S \left( \frac{1}{2}\|\nabla x(t,s)\|_2^2 - \lambda e^{x(t,s)} \right) dt\, ds. \tag{22} \]

We set $\lambda = 5$. By discretizing (22) with the same triangulation as in the case of MINS, we get the following convex objective function:

\[ f_h(x_h) := \frac{h^2}{4} \sum_{i,j=0}^{n} \left( a^2 + b^2 + c^2 + d^2 - \lambda\left(\alpha^U_{i,j} + \alpha^L_{i,j}\right) \right), \]

where the quantities $a$, $b$, $c$ and $d$ are defined as above and

\[ \alpha^U_{i,j} = \frac{2}{3}\left( e^{x_{i,j}} + e^{x_{i,j+1}} + e^{x_{i+1,j+1}} \right), \qquad \alpha^L_{i,j} = \frac{2}{3}\left( e^{x_{i,j}} + e^{x_{i+1,j}} + e^{x_{i+1,j+1}} \right). \]

Only internal components $x_{i,j}$, $i, j = 1, \dots, n-1$ are optimized, while boundary components are set to zero.

5.1.4 MOREBV: a nonconvex problem

The following two-dimensional formulation of a one-dimensional problem first described in [12],

\[ \min_{x \in H_0^1} \int_S \left( -\Delta x(t,s) + \frac{1}{2}\left(x(t,s) + t + s + 1\right)^3 \right)^2 dt\, ds, \tag{23} \]

is discretized on the triangulation introduced above to obtain the following nonconvex objective function:

\[ f_h(x_h) := \sum_{i,j=1}^{n-1} \left( 4x_{ij} - x_{i-1,j} - x_{i+1,j} - x_{i,j-1} - x_{i,j+1} + \frac{h^2}{2}\left( 1 + \frac{i}{n} + \frac{j}{n} + x_{ij} \right)^3 \right)^2. \]

Again, internal components $x_{i,j}$, $i, j = 1, \dots, n-1$ are optimized and boundary components are set to zero.
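To make the discretized MOREBV objective concrete, here is a minimal evaluation sketch. It assumes that the grid values are stored as an $(n+1) \times (n+1)$ nested list including the zero boundary and that the grid spacing is $h = 1/n$; the function name and the data layout are choices made for this example only.

```python
def morebv_objective(X):
    """Evaluate the discretized MOREBV objective.

    X is an (n+1) x (n+1) nested list of grid values; boundary entries are
    expected to be zero and only interior nodes contribute to the sum.
    """
    n = len(X) - 1
    h = 1.0 / n  # assumed uniform grid spacing on the unit square
    total = 0.0
    for i in range(1, n):
        for j in range(1, n):
            # 5-point stencil term: 4 x_ij minus the four neighbours.
            stencil = (4.0 * X[i][j] - X[i - 1][j] - X[i + 1][j]
                       - X[i][j - 1] - X[i][j + 1])
            # Cubic term scaled by h^2 / 2.
            cubic = 0.5 * h * h * (1.0 + i / n + j / n + X[i][j]) ** 3
            total += (stencil + cubic) ** 2
    return total

# Smallest possible instance: n = 2, a single interior node.
X = [[0.0] * 3 for _ in range(3)]
f_zero = morebv_objective(X)

X[1][1] = -0.25  # lowering the interior value reduces the residual here
f_moved = morebv_objective(X)
print(f_zero, f_moved)
```

Because each squared term involves only a node and its four neighbours, changing one variable affects at most five terms of the sum, which is what makes coordinate-wise updates cheap for this problem.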

5.1.5 DPJB: a quadratic problem with positivity constraints

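The following problem models the pressure distribution of a film of lubricant between two cylinders:

\[ \min_{x \in V} \int_D \frac{1}{2}\phi(t)\|\nabla x(t,s)\|_2^2 - \psi(t)x(t,s), \qquad D := [0, 2\pi] \times [0, 2r], \tag{24} \]

where $V = \{x \in H_0^1(D) \mid x \ge 0 \text{ on } D\}$, $\phi(t) = (1 + \epsilon\cos(t))^3$ and $\psi(t) = \epsilon\sin(t)$. We set $r = 10$ and $\epsilon = \frac{1}{10}$.

The domain is discretized by triangulating as described before, with grid spacings $h_t = \frac{2\pi}{n}$ and $h_s = \frac{20}{n}$. The subscript $h$ stands here for the pair $(h_t, h_s)$. We obtain the objective function

\[ f_h(x_h) := h_t h_s \left( \frac{1}{2} \sum_{i,j=0}^{n} \left[ \alpha^U_{i,j}\left(a^2 + b^2\right) + \alpha^L_{i,j}\left(c^2 + d^2\right) \right] - \sum_{i,j=1}^{n-1} \psi(i h_t)\, x_{i,j} \right), \]

where

\[ a = \frac{x_{i,j+1} - x_{i,j}}{h_s}, \quad b = \frac{x_{i+1,j+1} - x_{i,j+1}}{h_t}, \quad c = \frac{x_{i+1,j+1} - x_{i+1,j}}{h_s}, \quad d = \frac{x_{i+1,j} - x_{i,j}}{h_t}, \]

\[ \alpha^U_{i,j} = \frac{1}{6}\left( 2\phi(i h_t) + \phi((i+1)h_t) \right), \qquad \alpha^L_{i,j} = \frac{1}{6}\left( \phi(i h_t) + 2\phi((i+1)h_t) \right). \]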

We optimize internal components $x_{i,j}$, $i, j = 1, \dots, n-1$, and set boundary components to zero. Note that this problem has the form

\[ \min_{x \in F} f(x), \qquad F = \{x \mid x \ge 0\}, \]

therefore a slight modification of the algorithm is necessary to handle the nonnegativity constraint. As described in [9], coordinate search can be readily extended to bound-constrained problems, where the feasible set has the form $F = \{x \mid -\infty \le l \le x \le u \le +\infty\}$. Every time a trial step falls outside $F$, it suffices to project such step onto the boundary of the feasible region. That is to say, the body of the for loop in Algorithm 1 is modified as follows:

    $\Delta^+ = \min\{\Delta_k, u_i - (x_k)_i\}$, $\Delta^- = \min\{\Delta_k, (x_k)_i - l_i\}$;
    $x^\star \leftarrow \operatorname{argmin}\{ f(x) : x \in \{x_k + \Delta^+ e_i,\; x_k - \Delta^- e_i\} \}$;
    if $f(x^\star) < f(x_k) - \rho(\Delta_k)$ (Successful step, $k \in S$) then
        $x_{k+1} \leftarrow x^\star$; succ $\leftarrow$ TRUE; $i^\star \leftarrow i$; break
    end if

Formula (14) for gradient approximation is replaced in this case by

\[ g_{h,i} \leftarrow \frac{f_h(x_h^k + \Delta_h^+ e_i) - f_h(x_h^k - \Delta_h^- e_i)}{\Delta_h^+ + \Delta_h^-}. \]

That is to say, $g_{h,i}$ is a centered difference approximation of $(\nabla f_h(x_h^k + \xi e_i))_i$, where $\xi = \frac{\Delta^+ - \Delta^-}{2}$.

In a multilevel framework a coarse-level feasible set $F_H$ can be defined, which ensures that the correction $\tilde{x}_h + \alpha P e_H$, $\alpha \in [0, 1]$, remains feasible [7]. In our case, where $l = 0$, $u = +\infty$ on the finest level and $P$ is the full-weighting operator, we have $F_H = \{x_H \mid x_H \ge l_H\}$, where

\[ l_{H,j} = \tilde{x}_{H,j} + \max_{i=1,\dots,n_h} (l_h - \tilde{x}_h)_i, \qquad j = 1, \dots, n_H. \]

This definition guarantees that for any $y_H \in F_H$ we have $\tilde{x}_h + P y_H \in F_h$ [7].
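The projected trial steps and the modified difference quotient can be illustrated with a short sketch. It is not the paper's Algorithm 1: the outer loop, the contraction $\Delta \leftarrow \theta\Delta$ after an unsuccessful sweep, the stopping test $\Delta \le \tau$, and the default forcing function $\rho \equiv 0$ are assumptions made to obtain a runnable example, and all names are hypothetical.

```python
import math

def bounded_coordinate_search(f, x0, lower, upper, delta0=1.0, theta=0.25,
                              tau=1e-4, rho=lambda delta: 0.0, max_iter=10_000):
    """Coordinate search with trial steps projected onto the bounds."""
    x = list(x0)
    fx = f(x)
    delta = delta0
    for _ in range(max_iter):
        if delta <= tau:
            break
        success = False
        for i in range(len(x)):
            # Shorten the steps so that both trial points stay feasible.
            step_plus = min(delta, upper[i] - x[i])
            step_minus = min(delta, x[i] - lower[i])
            trials = []
            for step in (step_plus, -step_minus):
                trial = list(x)
                trial[i] += step
                trials.append((f(trial), trial))
            f_star, x_star = min(trials, key=lambda t: t[0])
            if f_star < fx - rho(delta):  # successful step
                x, fx, success = x_star, f_star, True
                break
        if not success:
            delta *= theta  # assumed contraction after an unsuccessful sweep
    return x, fx, delta

def projected_difference(f, x, i, delta, lower, upper):
    """Difference quotient with projected steps (assumes lower[i] < upper[i])."""
    step_plus = min(delta, upper[i] - x[i])
    step_minus = min(delta, x[i] - lower[i])
    x_plus = list(x)
    x_plus[i] += step_plus
    x_minus = list(x)
    x_minus[i] -= step_minus
    return (f(x_plus) - f(x_minus)) / (step_plus + step_minus)

# Unconstrained minimizer is (1, -2); with x >= 0 the solution is (1, 0).
quadratic = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x_opt, f_opt, delta_final = bounded_coordinate_search(
    quadratic, [3.0, 3.0], [0.0, 0.0], [math.inf, math.inf])

# For f(x) = x^2 at x = 0.5 with delta = 1: steps are 1 and 0.5, xi = 0.25,
# and the quotient equals f'(0.5 + 0.25) = 1.5 exactly for a quadratic.
g = projected_difference(lambda x: x[0] ** 2, [0.5], 0, 1.0, [0.0], [math.inf])
print(x_opt, f_opt, g)
```

The second coordinate stops at the bound because the downward step is shortened to zero there, so no infeasible point is ever evaluated.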

5.2 Numerical results

Our experiments were executed under a Linux OS on a machine with a 3.40 GHz 4-core Intel(R) processor and 16 GB of main memory. All the algorithms have been coded in MATLAB. Our choice for the step contraction parameter in Algorithm 1 is $\theta = 0.25$. We set the stopping criterion tolerance to $\tau = 10^{-4}$. The multigrid cycle parameters are $\nu_0 = 1$ and $\nu_1 = \nu_2 = 2$ unless specified otherwise. In all the experiments, we start with the initial guess $x_h^0 = (1, \dots, 1)^T$ and initial stepsize $\Delta_h^0 = \|x_h^0\|_\infty = 1$. The problem size was selected as $n_h = 1046529$ for all the test problems, corresponding to a finest-grid spacing $h = \frac{1}{2^{10}}$ for the first four problems and $(h_t, h_s) = \left(\frac{2\pi}{2^{10}}, \frac{2r}{2^{10}}\right)$ for DPJB.

The stepsize warm-start parameter $C$ was selected on the basis of some preliminary experiments. Specifically, with the above mentioned implementation choices, we tested the algorithm with $C \in \{2^1, 2^2, \dots, 2^{12}\}$ and for each problem we selected the best performing value with regard to the total number of function value updates: $C = 2^8$ for problem MINS, and $C = 2^2$ for the other problems.

In the following tables we report the total CPU time employed by FM/CS to solve the problem, measured in seconds. As a machine-independent measure of the computational cost, we report the total number of function value updates on all levels:

\[ \text{Total cost} = \sum_{i=1}^{l} \text{FEVALS}_i, \]

where $\text{FEVALS}_i$ is the total number of function value updates on the $i$-th level. This choice appears the most adequate, since in all the considered problems the objective function consists of a sum of terms depending only on a small subset of neighbouring variables, thus allowing for cheap coordinate-wise function value updates [17]. Such updates can be performed in a small and constant number of operations, independent of the problem size.

In Table 1, we report results obtained by running FM/CS, with and without using the warm-start strategy (18) for the stepsize during the full-multilevel initialization. The last column reports the observed speed-up, calculated as the ratio between the total number of function value updates in the two cases.
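The idea of a cheap coordinate-wise function value update can be illustrated on the discretized MINS objective (21). The sketch below is not the authors' MATLAB implementation: it assumes the grid is stored as an $(n+1) \times (n+1)$ nested list including boundary values, takes the first index to run along $t$, uses the differences $a, b, c, d$ as given in Section 5.1.2, and all function names are hypothetical. Changing one node only affects the (at most four) cells that contain it, so the new objective value is obtained by re-evaluating those cells alone.

```python
import math

def cell_term(X, i, j, h):
    """Contribution of grid cell (i, j) to the discretized MINS objective."""
    a = (X[i + 1][j + 1] - X[i][j + 1]) / h
    b = (X[i + 1][j + 1] - X[i + 1][j]) / h
    c = (X[i + 1][j] - X[i][j]) / h
    d = (X[i][j + 1] - X[i][j]) / h
    return 0.5 * h * h * (math.sqrt(1 + a * a + b * b)
                          + math.sqrt(1 + c * c + d * d))

def full_objective(X, h):
    """Objective evaluated from scratch: a sum over all n^2 cells."""
    n = len(X) - 1
    return sum(cell_term(X, i, j, h) for i in range(n) for j in range(n))

def updated_objective(X, f_old, p, q, new_value, h):
    """Objective after setting X[p][q] = new_value (X is modified in place).

    Only the cells containing node (p, q) are re-evaluated, so the cost does
    not depend on the grid size.
    """
    n = len(X) - 1
    cells = [(i, j) for i in (p - 1, p) for j in (q - 1, q)
             if 0 <= i < n and 0 <= j < n]
    before = sum(cell_term(X, i, j, h) for i, j in cells)
    X[p][q] = new_value
    after = sum(cell_term(X, i, j, h) for i, j in cells)
    return f_old - before + after

n = 8
h = 1.0 / n

# A flat surface over the unit square has area 1.
flat_area = full_objective([[0.0] * (n + 1) for _ in range(n + 1)], h)

# Boundary profile t(1 - t) on the sides s = 0 and s = 1, zero elsewhere.
X = [[(i * h) * (1 - i * h) if j in (0, n) else 0.0 for j in range(n + 1)]
     for i in range(n + 1)]
f0 = full_objective(X, h)
f1 = updated_objective(X, f0, 3, 4, 0.1, h)   # local update of one node
f1_check = full_objective(X, h)               # full recomputation, for comparison
print(flat_area, f1, f1_check)
```

The local update and the full recomputation agree up to rounding, while the former touches four cells instead of $n^2$.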

| Problem | Time (s), Δh = 1 | Fevals, Δh = 1 | Time (s), Δh = min{1, CΔH} | Fevals, Δh = min{1, CΔH} | Speed-up |
|---|---|---|---|---|---|
| P2D | 7.42E+01 | 1.95E+07 | 1.08E+01 | 2.80E+06 | x7.0 |
| MINS | 1.82E+02 | 1.96E+07 | 1.05E+02 | 1.12E+07 | x1.8 |
| DSSC | 8.51E+01 | 2.01E+07 | 1.43E+01 | 3.34E+06 | x6.0 |
| MOREBV | 1.03E+02 | 2.14E+07 | 2.32E+01 | 4.79E+06 | x4.5 |
| DPJB | 1.14E+02 | 2.41E+07 | 2.73E+01 | 5.75E+06 | x4.2 |

Table 1: Performance of FM/CS on the test problems.

First of all, we see that the algorithm solves all of the problems in a reasonable time. Furthermore, the warm-start strategy for the stepsize provides a considerable speed-up factor on all problems but MINS, where the gain is smaller. In general, the effectiveness of the stepsize strategy depends on how well the initial guess $P x_H$ provided by the coarse-level initialization approximates the finer-level solution on a given problem. The closer $P x_H$ is to the solution, the smaller the step needed in CS in order to make progress, and the shorter the time required to reach the stopping condition. Note finally that the total number of function evaluations needed by FM/CS is generally high, but as explained above such evaluations are never done from scratch. They are performed through inexpensive coordinate-wise updates.

| h | Size nh | Time (s) | Fevals | Ratio |
|---|---|---|---|---|
| 1/2^6 | 3969 | 8.74E-01 | 8.70E+04 | – |
| 1/2^7 | 16129 | 2.11E+00 | 2.16E+05 | x2.5 |
| 1/2^8 | 65025 | 6.98E+00 | 7.36E+05 | x3.4 |
| 1/2^9 | 261121 | 2.68E+01 | 2.83E+06 | x3.8 |
| 1/2^10 | 1046529 | 1.05E+02 | 1.12E+07 | x4.0 |

Table 2: Scalability of FM/CS on problem MINS.

In Table 2, we give some results obtained by running FM/CS on instances of increasing size of problem MINS. The first column indicates the grid spacing corresponding to the finest level. The last column reports the relative increase in the total number of objective value updates when the problem is solved on the next finer grid. As the problems stem from two-dimensional discretizations, each time the grid spacing is halved the number of variables increases by roughly a factor of four. From the results in the table, we conclude that the algorithm scales well when the size of the problem increases (the results in Table 2 are representative of the behaviour of FM/CS on all the considered test problems). The increase in computational cost does not exceed a factor of 4, thus the overall performance does not deteriorate even when dealing with larger instances of the problems. This is in contrast with the typical behavior of direct search methods, whose performance is most often severely penalized as the problem size increases. Specifically, direct search algorithms are usually limited to sizes of at most a few hundred variables, or a few thousand in case the objective has a structure which allows for fast evaluations along the search directions [9, 17].
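As a quick arithmetic check of the scalability figures, the ratios reported for problem MINS can be recomputed from the tabulated sizes and function value update counts. The snippet only reuses the numbers of Table 2; the variable names are arbitrary.

```python
# Values copied from Table 2 (problem MINS).
sizes = [3969, 16129, 65025, 261121, 1046529]
fevals = [8.70e4, 2.16e5, 7.36e5, 2.83e6, 1.12e7]

# Growth of the cost and of the problem size between consecutive grids.
cost_ratios = [round(fevals[k + 1] / fevals[k], 1) for k in range(len(fevals) - 1)]
size_ratios = [sizes[k + 1] / sizes[k] for k in range(len(sizes) - 1)]

for cost, size in zip(cost_ratios, size_ratios):
    print(f"cost x{cost:.1f}  size x{size:.2f}")
```

The cost ratios stay at or below the corresponding growth in the number of variables, which is close to 4 at every refinement.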

| Problem | Time (s), classical surrogate | Fevals, classical surrogate | Time (s), derivative-free | Fevals, derivative-free | Ratio |
|---|---|---|---|---|---|
| P2D | 1.09E+01 | 2.80E+06 | 1.08E+01 | 2.80E+06 | x1.00 |
| MINS | 1.06E+02 | 1.12E+07 | 1.05E+02 | 1.12E+07 | x1.00 |
| DSSC | 1.38E+00 | 3.21E+06 | 1.43E+01 | 3.34E+06 | x1.04 |
| MOREBV | 2.38E+01 | 4.86E+06 | 2.32E+01 | 4.79E+06 | x0.98 |
| DPJB | 2.74E+01 | 5.80E+06 | 2.73E+01 | 5.75E+06 | x0.99 |

Table 3: Effect of using derivative-free surrogate models.

Finally, in Table 3 we show the impact of the derivative-free surrogate model (9) in comparison with the classical surrogate (4), which requires the computation of fine and coarse-level gradients. The results were obtained by running FM/CS, endowed with the stepsize strategy (18). From the results in the table, we conclude that, limited to our experiment, usage of the derivative-free model (9) yields results which are just as good as those obtained with the classical surrogate. This confirms the validity of the model as well as previous intuitions about the effect of using approximate gradient information in the coarse-level optimization.

6 Conclusions and perspectives

In this paper, we have described a multilevel derivative-free optimization method based on a coordinate search algorithm. Experimental results revealed that the algorithm retains the error smoothing properties of multigrid algorithms, and that it substantially improves the performance of classical coordinate search methods. In particular, a carefully designed full-multilevel strategy is able to boost convergence speed considerably, while at the same time overcoming traditional limitations on the problem size. We believe that the proposed algorithm deserves a deeper study. In particular, further experiments on applications where coordinate descent techniques are used to solve discretized optimization problems such as emission tomography, signal denoising or image restoration are currently underway, as well as a detailed analysis of the bound-constrained case. The implementation of direct search methods different from coordinate search and the topic of surrogate model approximations will also be the subject of further research.


References

[1] B.M. Averick, R.G. Carter, J.J. Moré, and G.-L. Xue, The Minpack-2 test problem collection, Tech. Rep., Argonne National Laboratory, Argonne, Illinois, USA, 1992.
[2] A. Borzi and K. Kunisch, A globalization strategy for the multigrid solution of elliptic optimal control problems, Opt. Meth. and Soft. 21(3) (2006), pp. 445–459.
[3] W.L. Briggs, V.E. Henson, and S.F. McCormick, A Multigrid Tutorial, 2nd ed., SIAM, Philadelphia, 1999.
[4] R.H. Chan and K. Chen, A multilevel algorithm for simultaneously denoising and deblurring images, SIAM J. Sci. Comput. 32(2) (2010), pp. 1043–1063.
[5] T.F. Chan and K. Chen, An optimization-based multilevel algorithm for total variation image denoising, SIAM J. Multiscale Model. Simul. 5(2) (2006), pp. 615–645.
[6] S. Gratton, M. Mouffe, A. Sartenaer, Ph.L. Toint, and D. Tomanos, Numerical experience with a recursive trust-region method for multilevel nonlinear bound-constrained optimization, Opt. Meth. and Soft. 25(3) (2010), pp. 359–386.
[7] S. Gratton, M. Mouffe, Ph.L. Toint, and M. Weber-Mendonça, A recursive l∞-trust-region method for bound-constrained nonlinear optimization, IMA J. Numer. Anal. 28(4) (2008), pp. 827–861.
[8] S. Gratton, A. Sartenaer, and Ph.L. Toint, Recursive trust-region methods for multiscale nonlinear optimization, SIAM J. Opt. 19(1) (2008), pp. 414–444.
[9] T.G. Kolda, R.M. Lewis, and V. Torczon, Optimization by direct search: new perspectives on some classical and modern methods, SIAM Review 45(3) (2003), pp. 385–482.
[10] R.M. Lewis and S.G. Nash, Model problems for the multigrid optimization of systems governed by differential equations, SIAM J. Sci. Comput. 26(6) (2005), pp. 1811–1837.
[11] R.M. Lewis and S.G. Nash, Using Inexact Gradients in a Multilevel Optimization Algorithm, Tech. Rep., Systems Engineering & Operations Research Dept., George Mason University, Fairfax VA, 2012.
[12] J.J. Moré, B.S. Garbow, and K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software 7(1) (1981), pp. 17–41.
[13] S.G. Nash, A multigrid approach to discretized optimization problems, Opt. Meth. and Soft. 14 (2000), pp. 99–116.
[14] S.G. Nash, Convergence and descent properties for a class of multilevel optimization algorithms, Tech. Rep., Systems Engineering and Operations Research Dept., George Mason University, Fairfax VA, 2010.
[15] S. Oh, C.A. Bouman, and K.J. Webb, Multigrid tomographic inversion with variable resolution data and image spaces, IEEE Trans. Image Proc. 15(9) (2006), pp. 2805–2819.
[16] S. Oh, A. Milstein, Ch. Bouman, and K. Webb, A general framework for nonlinear multigrid inversion, IEEE Trans. Image Proc. 14(1) (2005), pp. 125–140.
[17] C.J. Price and Ph.L. Toint, Exploiting problem structure in pattern search methods for unconstrained optimization, Opt. Meth. and Soft. 3 (2006), pp. 479–491.
[18] U. Trottenberg, C. Oosterlee, and A. Schüller, Multigrid, Academic Press, London, 2001.
[19] M. Vallejos and A. Borzi, Multigrid optimization methods for linear and bilinear elliptic optimal control problems, Computing 82 (2008), pp. 31–52.
[20] J.C. Ye, C.A. Bouman, K.J. Webb, and R.P. Millane, Nonlinear multigrid algorithms for Bayesian optical diffusion tomography, IEEE Trans. Image Proc. 10(5) (2001), pp. 909–922.
