Lipschitz gradients for global optimization in a one-point-based partitioning scheme

Dmitri E. Kvasov (a,b), Yaroslav D. Sergeyev (a,b,*)

(a) DEIS – University of Calabria, Via P. Bucci 42C, 87036 – Rende (CS), Italy
(b) Software Department, N.I. Lobachevsky State University, Nizhni Novgorod, Russia
Abstract

A global optimization problem is studied where the objective function f(x) is a multidimensional black-box function and its gradient f′(x) satisfies the Lipschitz condition over a hyperinterval with an unknown Lipschitz constant K. Different methods for solving this problem by using an a priori given estimate of K, its adaptive estimates, and adaptive estimates of local Lipschitz constants are known in the literature. Recently, the authors have proposed a one-dimensional algorithm working with multiple estimates of the Lipschitz constant for f′(x) (the existence of such an algorithm was a challenge for 15 years). In this paper, a new multidimensional geometric method evolving the ideas of this one-dimensional scheme and using an efficient one-point-based partitioning strategy is proposed. Numerical experiments executed on 800 multidimensional test functions demonstrate quite a promising performance in comparison with popular DIRECT-based methods.

Keywords: Global optimization, Lipschitz gradients, set of Lipschitz constants, geometric algorithms
2000 MSC: 65K05, 90C26, 90C56
This work was supported by the grants 1960.2012.9 and MK-3473.2010.1 awarded by the President of the Russian Federation for supporting the leading research groups and young researchers, respectively, as well as by the grant 11-01-00682-a awarded by the Russian Foundation for Fundamental Research. The authors also thank the Italian Inter-University Consortium for the Application of Super-Computing for Universities and Research (CASPUR), project “Advanced techniques for global optimization: Numerical methods and applications”, in the framework of the “HPC Grant 2011 on GPU cluster”.

* Corresponding author. Fax: +39 0984 494855.
Email addresses: [email protected] (Dmitri E. Kvasov), [email protected] (Yaroslav D. Sergeyev)
1. Introduction

Global optimization is an important part of numerical analysis (see, e.g., [33, 43, 44, 46]). It considers problems derived from complex industrial applications where the objective function f(x) to be minimized is defined over a hyperinterval D ⊂ R^N, N ≥ 1, and can be black-box, multiextremal, and computationally expensive to evaluate (see, e.g., [1, 5, 27, 28, 33, 34, 37, 43, 46]). Efficiently solving this type of problem is a great challenge, since such problems have a large number of local minimizers (only a few of which can be global ones), often with extremely different values, and do not admit a simple mathematical description of the global optima. One of the natural and powerful (from both the theoretical and the applied points of view) assumptions on these problems is that the objective function has bounded slopes, i.e.,

    |f(x′) − f(x′′)| ≤ L ∥x′ − x′′∥,  x′, x′′ ∈ D,  0 < L < ∞,    (1)
where ∥·∥ denotes, usually, the Euclidean norm (other norms can also be used, see, e.g., [13, 30]) and L is the (unknown) Lipschitz constant. In this case, Lipschitz global optimization methods can be applied (see, e.g., [8, 17, 18, 34, 37, 43, 46] and the references given therein). They can be distinguished, for example, by the way in which information about the Lipschitz constant is obtained and by the strategy of exploration of the admissible region. In the literature, there exist at least four approaches to specifying the Lipschitz constant L from (1): (i) it can be given a priori (see, e.g., [6, 8, 17]); (ii) its global estimate over the whole domain can be used (see, e.g., [18, 22, 34, 43]); (iii) local Lipschitz constants can be estimated (see, e.g., [21, 37, 39, 43]); (iv) several estimates of L can be chosen from a set of possible values (see, e.g., [9, 10, 16, 19, 26, 36, 37]). Global optimization methods that use multiple estimates of the Lipschitz constant in their work have proved to be particularly attractive for studying applied problems (see, e.g., [4, 15, 16, 29, 32]; other references can be found, e.g., in [23, 36]).

In exploring the multidimensional search domain, various adaptive partitioning strategies can be applied. For example, one-point-based algorithms sequentially subdivide the search region into smaller subregions and evaluate the objective function at one point within each subregion (see, e.g., [7, 8, 10, 19,
42]). Partitions of the search domain into hyperintervals based on evaluating the objective function at the two vertices corresponding to the main diagonal of hyperintervals, called diagonal partitioning strategies, can also be successfully used (see, e.g., [14, 21, 22, 34, 36, 37]). More complex partitions, based on simplices, auxiliary functions of various nature, and so on, have also been proposed (see, e.g., [18, 25, 45, 46]; many other references can be found in [38]). The choice of the regions to be partitioned is based on information about the objective function obtained during the search. It can be either of the probabilistic type (e.g., the Bayesian approach applying the theory of random functions to a mathematical representation of available (certain or uncertain) a priori information on the objective function behavior, see, e.g., [20, 24, 28, 43, 46]) or of the deterministic one (e.g., the geometric approach making use of different auxiliary functions to estimate the behavior of f(x) over the search region, see, e.g., [2, 17, 18, 34, 37, 39, 40, 43, 46]).

In this paper, a particular class of Lipschitz global optimization problems is considered, namely, the class of problems with differentiable objective functions having Lipschitz gradients f′(x), i.e.,

    f* = f(x*) = min_{x∈D} f(x),    (2)

where

    ∥f′(x′) − f′(x′′)∥ ≤ K ∥x′ − x′′∥,  x′, x′′ ∈ D,  0 < K < ∞,    (3)

    D = [a, b] = {x ∈ R^N : a(j) ≤ x(j) ≤ b(j), j = 1, ..., N}.    (4)

It is supposed in this formulation that the objective function f(x) can be black-box and multiextremal, that its gradient f′(x) = (∂f(x)/∂x(1), ∂f(x)/∂x(2), ..., ∂f(x)/∂x(N))^T (which could itself be a costly multiextremal black-box vector function) can be calculated during the search, and that f′(x) is Lipschitz-continuous with some fixed, but unknown, constant K, 0 < K < ∞, over D. These problems are often encountered in engineering applications (see, e.g., [34, 37, 43]), particularly in electrical engineering optimization problems (see, e.g., [35, 37, 43]).

In the literature, several methods for solving this problem have been proposed. They can also be distinguished, for instance, with respect to the way the Lipschitz constant K is estimated in their work. There exist algorithms using an a priori given estimate of K (see, e.g., [2, 3, 40]), its
adaptive estimates (see, e.g., [14, 37, 40]), and adaptive estimates of local Lipschitz constants (see, e.g., [37, 40]). Algorithms working with a number of Lipschitz constants for f′(x) chosen from a set of possible values varying from zero to infinity were not known until 2009, when such an algorithm for solving the one-dimensional problem (2)–(4) was proposed in [23]. Its extension to the multidimensional case is not a trivial task, in contrast to the DIRECT method (see [19]) proposed in 1993 for solving problems with a Lipschitz objective function. The present paper solves this more than 15-year open problem of constructing multidimensional global optimization methods working with multiple estimates of the Lipschitz constant for f′(x).

A new multidimensional geometric method for finding solutions to the problem (2)–(4) is introduced and studied here. It uses a new one-point-based partitioning strategy (see [37, 42]) and works with a number of estimates of the Lipschitz constant K for f′(x). Such multiple (from zero to infinity) estimates of K from (3) are used to calculate the lower bounds of the objective function over the hyperintervals of a current partition of the search domain and to produce new trial points (i.e., points at which both the objective function f(x) and its gradient f′(x) are evaluated). In the framework of geometric algorithms, this kind of estimating the Lipschitz constant can be interpreted as an examination of all admissible minorant functions during the current iteration of the algorithm without constructing a specific one. Particular attention in the new algorithm is given to the improvement of the current minimal function value (the so-called record value) in order to provide a faster convergence to a global minimizer. As demonstrated by extensive numerical experiments executed on 800 test functions from the differentiable GKLS test classes (see [11]), the usage of gradients allows one to obtain, as expected, an acceleration in comparison with the DIRECT-based methods.

The paper is organized as follows. In Section 2, a theoretical background of the new algorithm is presented. Section 3 is dedicated to the description of the algorithm and to its convergence analysis. Finally, Section 4 contains results of numerical experiments executed on 800 test functions.

2. Theoretical background

In this section, the main theoretical results necessary for introducing the new algorithm are obtained. First, a new partitioning strategy developed in the framework of the one-point-based partition approach is described.
The second part presents a technique for estimating the lower bounds of the objective function over hyperintervals. The third part is dedicated to the introduction of a procedure for determining nondominated hyperintervals, i.e., hyperintervals having the smallest lower bound for some particular estimate of the Lipschitz constant for f′(x). These are the candidates for partitioning at each iteration of the new method.

2.1. One-point-based partitioning strategy

In this section, a new efficient one-point-based partitioning scheme proposed in [42] (see also [37]) is considered, which is based on a diagonal partitioning strategy from [37, 41]. In this scheme, the function f(x) and its gradient f′(x) are evaluated only at one vertex (either ai or bi) of the main diagonal of each hyperinterval Di = [ai, bi] of the current partition, independently of the problem dimension (recall that performing each trial is a time-consuming operation).

Let us start the description of this scheme with a two-dimensional example shown in Fig. 1. In this figure, partitions of the admissible region D produced by the algorithm at several initial iterations are presented, starting from the first trial at the point a (it is supposed here that a single iteration consists of the subdivision of only one hyperinterval). Black dots represent the trial points, and the numbers around these dots indicate the iterations at which these trial points have been generated. The terms ‘interval’ and ‘subinterval’ will be used to denote two-dimensional rectangular domains.

In Fig. 1a, the situation after the first two iterations is presented. Particularly, at the second iteration, the interval D is partitioned into three subintervals of equal area (equal volume in the general case). This subdivision is performed by two lines (hyperplanes) orthogonal to the longest edge of D (see Fig. 1a). The trial (evaluation of the objective function and, as we propose in this paper, of its gradient) is performed only at the point denoted by the number 2. Let us suppose that the interval shown in light grey in Fig. 1a is chosen for further partitioning. Thus, at the third iteration, three smaller subintervals are generated (see Fig. 1b). As one can see from Fig. 1c, the trial point of the fourth iteration coincides with the point 3 at which the trial has already been executed. Therefore, there is no need to perform a new (costly) evaluation of f(x) and f′(x) at this point, since the values obtained at the previous iteration can be used. These values can be stored in a specially designed vertex database and simply retrieved on demand without re-evaluations of the functions.
Figure 1: An example of subdivisions by a new one-point-based partitioning strategy
For example, Fig. 1d illustrates the situation after 12 iterations. It can be seen from this figure that 23 intervals have been generated by only 9 trial points.

Now we can describe the general scheme of a hyperinterval partitioning by assuming (without loss of generality) that the search hyperinterval D in (4) is an N-dimensional hypercube and that the first trial is performed at the vertex a (the scheme starting from the vertex b is obtained analogously). Let a hyperinterval Dt = [at, bt] of a current partition {D^k} of D = [a, b] be chosen for partitioning at an iteration k ≥ 1 of the algorithm. The operation of partitioning the selected hyperinterval Dt is performed as follows.
Step 1. Determine points u and v by the following formulae:

    u = (a(1), ..., a(i−1), a(i) + (2/3)(b(i) − a(i)), a(i+1), ..., a(N)),    (5)
    v = (b(1), ..., b(i−1), b(i) + (2/3)(a(i) − b(i)), b(i+1), ..., b(N)),    (6)

where a(j) = at(j), b(j) = bt(j), 1 ≤ j ≤ N, and i is given by the equation

    i = min{ arg max_{1≤j≤N} |b(j) − a(j)| }.    (7)

Get (evaluate or read from the vertex database) the values of the objective function f(x) and its gradient f′(x) only at the point u.

Step 2. Divide the hyperinterval Dt into three hyperintervals of equal volume by two parallel hyperplanes that are perpendicular to the longest edge i of Dt and pass through the points u and v. The hyperinterval Dt is thus substituted by three new hyperintervals with indices t′ = t, m + 1, and m + 2 (where m = m(k) is the number of hyperintervals at the beginning of the iteration k) determined by the vertices of their main diagonals:

    at′ = am+2 = u,  bt′ = bm+1 = v,    (8)
    am+1 = at,  bm+1 = v,    (9)
    am+2 = u,  bm+2 = bt.    (10)

Augment the current number of hyperintervals m by 2.
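To illustrate the scheme, a minimal sketch of this subdivision step is given below (a hypothetical Python implementation, not taken from the paper; a hyperinterval is represented simply by the pair of vertices of its main diagonal, and ties in (7) are broken by the smallest coordinate index).

```python
import numpy as np

def subdivide(a_t, b_t):
    """One subdivision of D_t = [a_t, b_t] by the one-point-based scheme (5)-(10).

    Returns the new trial vertex u and the three child hyperintervals,
    each given by the pair of vertices of its main diagonal."""
    a_t, b_t = np.asarray(a_t, float), np.asarray(b_t, float)
    i = int(np.argmax(np.abs(b_t - a_t)))            # longest edge; smallest index on ties, cf. (7)

    u, v = a_t.copy(), b_t.copy()
    u[i] = a_t[i] + 2.0 / 3.0 * (b_t[i] - a_t[i])    # (5)
    v[i] = b_t[i] + 2.0 / 3.0 * (a_t[i] - b_t[i])    # (6)

    # Three hyperintervals of equal volume, cf. (8)-(10); note that the middle
    # one has its main diagonal oriented from u to v.
    d_middle = (u, v)              # replaces D_t, (8)
    d_low    = (a_t.copy(), v)     # (9)
    d_high   = (u, b_t.copy())     # (10)
    return u, [d_middle, d_low, d_high]
```

In an actual implementation, f(x) and f′(x) would be evaluated (or read from the vertex database) at the returned trial vertex u before the three children are inserted into the current partition.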
From the partitioning scheme described above it can be observed that, contrary to many traditional partitioning strategies (see, e.g., [7, 14, 19, 21, 34]), the condition ai(j) < bi(j), j = 1, ..., N, does not have to be satisfied for all hyperintervals Di ⊂ D, and their main diagonals determined by the vertices ai and bi can be oriented in different ways. However, as theoretically shown in [37, 41], the orientations of the hyperintervals are not arbitrary, and a special linking of hyperintervals generated at different iterations can be established with some effort.
This smart linking allows us to store information about vertices and the corresponding values of f(x) and f′(x) in a special database, thereby avoiding redundant function evaluations. The objective function and its gradient are calculated at a vertex only once, stored in the database, and read when required. The new partitioning strategy generates trial points in such a regular way that one vertex where the functions are evaluated can belong to several (up to 2^N) hyperintervals (see, for example, the trial point generated at the 8-th iteration in Fig. 1d). Therefore, the time-consuming operation of evaluating the functions is replaced by a significantly faster operation of reading (up to 2^N times) the function values from the database. In this way, the new partitioning strategy considerably speeds up the search, especially when problems of high dimensions are considered (see [22, 41, 42]). Note also that the possibility to choose the sequence of trial points among either the points ai or the points bi (or among the other 2^N − 2 vertices) of the hyperintervals Di (see Step 1 of the scheme) offers an important tool for accelerating the global search when some additional information about the objective function is known (we will see an example of this situation in Section 4). Note that the center-sampling partitioning strategies (see, e.g., [7, 10, 19]) do not have this property.

2.2. Lower bounding

Let us consider an iteration k ≥ 1 of the new algorithm and a current partition {D^k} of the search hyperinterval D = [a, b] into hyperintervals Di = [ai, bi], 1 ≤ i ≤ m(k); over these hyperintervals the values of both the function and its gradient are obtained (evaluated or read from the vertex database) at trial points x^{j(k)} = ai, j(k) ≥ 1. In order to choose some hyperintervals for further partitioning, the goodness (expressed by the so-called characteristic, see, e.g., [34, 37, 38, 43]) of the hyperintervals with respect to the global search is estimated by the algorithm. The better the characteristic of a hyperinterval (in some predetermined sense), the higher the possibility of finding the global minimizer within this hyperinterval. Such a hyperinterval is, therefore, a good candidate for subdivision at the next iteration of the algorithm. An estimate of the lower bound of f(x) over a hyperinterval is one of the possible characteristics of this hyperinterval. The following result holds.

Theorem 1. Let K̃ be an estimate of the Lipschitz constant K for f′(x) from (3), K̃ ≥ K, and let Di = [ai, bi] be a hyperinterval of a current partition {D^k} with a trial point ai. Then, a value Ri(K̃) of the characteristic of Di can be found such that it is a lower bound of f(x) over Di, i.e., Ri(K̃) ≤ f(x), x ∈ Di.

Proof. Let us prove the theorem in a constructive way. It is known (see, e.g., [7, 30, 31]) that for a differentiable function f(x) over a hyperinterval Di = [ai, bi] the following inequality is satisfied:

    f(x) ≥ Q(x, K̃),  x ∈ Di,    (11)
where the quadratic minorant function Q(x, K̃) is defined over Di as

    Q(x, K̃) = f(ai) + ⟨f′(ai), x − ai⟩ − 0.5 K̃ ∥x − ai∥²,  x ∈ Di.    (12)
Here ⟨·, ·⟩ is the scalar product, ∥·∥ is the Euclidean norm in R^N, and g(x) = f(ai) + ⟨f′(ai), x − ai⟩ is the linear approximation of f(x) over Di. From inequality (11) the following estimates can be obtained:

    f(x) ≥ f(ai) + ⟨f′(ai), x − ai⟩ − 0.5 K̃ ∥bi − ai∥² ≥ Fi − 0.5 K̃ ∥bi − ai∥²,  x ∈ Di,

where Fi is the minimum value of the linear approximation g(x) over Di, i.e.,

    Fi = f(ai) + min_{x∈Di} ⟨f′(ai), x − ai⟩.    (13)
Since the function g(x) is linear, its minimum (13) is attained at the vertex zi of the hyperinterval Di = [ai, bi] whose coordinates zi(j), j = 1, ..., N, can be calculated as follows:

    zi(j) = ai(j),  if either bi(j) > ai(j) and ∂f(ai)/∂x(j) ≥ 0,
                        or bi(j) < ai(j) and ∂f(ai)/∂x(j) < 0;
    zi(j) = bi(j),  if either bi(j) > ai(j) and ∂f(ai)/∂x(j) < 0,                    (14)
                        or bi(j) < ai(j) and ∂f(ai)/∂x(j) ≥ 0.

The corresponding value Fi from (13) is therefore equal to

    Fi = f(ai) + ⟨f′(ai), zi − ai⟩.    (15)
Figure 2: A quadratic minorant function Q(x, K̃) for f(x) over a hyperinterval Di = [ai, bi]
It is clear now that the value

    Ri = Ri(K̃) = Fi − 0.5 K̃ ∥bi − ai∥²    (16)
satisfies the inequality Ri ≤ f(x), x ∈ Di, and, therefore, it can be taken as the characteristic value of Di that estimates the lower bound of f(x) over Di. The theorem has been proved.

Note that analogous results can be obtained in the case of hyperintervals Di with trial points bi rather than ai.

In Fig. 2, a quadratic minorant function Q(x, K̃) from (12) is illustrated for f(x) over a hyperinterval Di. Here, the characteristic value Ri coincides with the minimum value of Q(x, K̃) obtained at the point bi of the main diagonal of Di. In general, as can be seen from (12), the value Ri is smaller than or equal to the minimum value of Q(x, K̃) over Di.
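To make the construction concrete, the vertex zi from (14), the value Fi from (15), and the characteristic Ri(K̃) from (16) might be computed as in the following sketch (hypothetical Python code, not part of the paper; the trial vertex of Di is assumed to be ai, with f(ai) and f′(ai) already available).

```python
import numpy as np

def characteristic(a_i, b_i, f_ai, grad_ai, K_tilde):
    """Lower bound R_i(K_tilde) of f over D_i = [a_i, b_i], cf. (13)-(16)."""
    a_i, b_i, grad_ai = (np.asarray(t, float) for t in (a_i, b_i, grad_ai))

    # Vertex z_i minimizing the linear approximation g(x), cf. (14):
    # take b_i(j) whenever moving from a_i(j) towards b_i(j) decreases g
    # (a zero partial derivative gives the same value of g either way).
    z_i = np.where(grad_ai * (b_i - a_i) < 0.0, b_i, a_i)

    F_i = f_ai + np.dot(grad_ai, z_i - a_i)                # (15)
    R_i = F_i - 0.5 * K_tilde * np.sum((b_i - a_i) ** 2)   # (16)
    return R_i
```

In the diagram of Section 2.3 below, each hyperinterval corresponds to the dot (di, Fi) with di = 0.5∥bi − ai∥², and Ri(K̃) = Fi − K̃ di is the intercept of the line of slope K̃ passing through that dot.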
2.3. Nondominated hyperintervals and their graphical representation

By using the obtained characteristics of hyperintervals, the relation of domination can be established between every two hyperintervals of a current partition {D^k} of D, and a set of nondominated hyperintervals can be identified for a possible subdivision at the current iteration of the new algorithm (see [23, 36]).

Definition 1. Given an estimate K̃ > 0 of the Lipschitz constant K from (3), a hyperinterval Di = [ai, bi] dominates a hyperinterval Dj = [aj, bj] with respect to K̃ if Ri(K̃) < Rj(K̃).

Definition 2. A hyperinterval Dt = [at, bt] is said to be nondominated with respect to K̃ > 0 if for the chosen value K̃ there is no other hyperinterval in {D^k} which dominates Dt.

Figure 3: Graphical representation of hyperintervals

Let us now generalize the approach proposed by the authors in [23] for the one-dimensional prototype and show that both a multidimensional interval
Di = [ai, bi] of a current partition {D^k} and the respective characteristic Ri using the gradient can be represented in a two-dimensional diagram similar to those proposed in [19, 36] for derivative-free methods. Difficulties in the construction of such a diagram were among the main reasons that prevented people from proposing methods using several estimates of K in their work. So, we take for the dot corresponding to Di the vertical coordinate Fi from (13)–(15) and the horizontal coordinate di equal to half of the squared length of the main diagonal of Di, i.e., di = 0.5∥bi − ai∥².

For example, in Fig. 3, a partition of the search domain D consisting of three hyperintervals is represented by the dots D1, D2, and D3. Let us suppose that the Lipschitz constant K for the gradient f′(x) is estimated by K̃, K̃ ≥ K. The characteristic Ri of a hyperinterval Di, i = 1, 2, 3, can be graphically obtained as the vertical coordinate of the intersection point of the line passing through the point Di with the slope K̃ and the vertical coordinate axis (see Fig. 3). It is easy to see that with respect to the estimate K̃ the hyperinterval D2 dominates both hyperintervals D1 and D3, and the hyperinterval D3 dominates D1.

If a higher estimate K̄ > K̃ of the Lipschitz constant K is considered (see Fig. 3), the hyperinterval D2 still dominates D1 with respect to K̄, because R2(K̄) < R1(K̄). But D2 is, in its turn, dominated by the hyperinterval D3 with respect to K̄, because R2(K̄) > R3(K̄) (see Fig. 3).

Since the exact Lipschitz constant K for f′(x) (or its valid overestimate) is unknown in the stated problem, the following definition can be useful.

Definition 3. A hyperinterval Dt ∈ {D^k} is called nondominated if there exists an estimate 0 < K̃ < ∞ of the Lipschitz constant K such that Dt is nondominated with respect to K̃.

This means that nondominated hyperintervals are those with the smallest characteristics (16) for some particular estimate of the Lipschitz constant for the gradient f′(x). For example, in Fig. 3 the hyperintervals D2 and D3 are nondominated.

It can be demonstrated, following the reasoning used in [23, 36], that nondominated hyperintervals (in the sense of Def. 3) are located on the lower-right convex hull of the set of dots representing the hyperintervals of the current partition of D and can be efficiently found by applying an algorithm for identifying the convex hull of the dots (see, e.g., [16, 19, 37]).
Figure 4: The two-dimensional diagram representing dominated (white dots) and nondominated (black dots) hyperintervals of a current partition of the search domain
In Fig. 4, the hyperintervals represented by the dots D1 (the largest hyperinterval), D3, D4, and D5 are nondominated hyperintervals.

It has been shown in [36] that the hyperintervals of a current partition of D form several groups characterized by the length of their main diagonals. The hyperintervals from a group are represented graphically by dots with the same horizontal coordinate. For example, in Fig. 4 there are seven different groups of hyperintervals with the horizontal coordinates equal to di, i = 1, ..., 7, and one empty group (with the horizontal coordinate between d7 and d6 in Fig. 4). Empty groups correspond to hyperintervals which are not present in the current partition but can be generated (or were generated) at the successive (previous) iterations of the method.

As demonstrated in [36, 41], a correspondence between the length of the main diagonal of a hyperinterval Di and a non-negative integer number can be established, which indicates the number of subdivisions of the initial domain D necessary to obtain the hyperinterval Di. At each iteration k ≥ 1 this number can be considered as an index s(k) of a group of equal hyperintervals, where

    0 ≤ q∞(k) ≤ s(k) ≤ q0(k) < +∞    (17)
and q∞(k) and q0(k) are indices corresponding to the groups of the largest and smallest hyperintervals of the current partition of D, respectively (for example, in Fig. 4, q0(k) = q∞(k) + 7). During partitioning, the diagonals of hyperintervals become smaller, while the corresponding group indices grow consecutively, starting from q∞(1) = 0 (see [36] for details).

Once a nondominated hyperinterval Dt = [at, bt] is determined (with respect to some estimate K̃ of the Lipschitz constant K), it can be subdivided at the next iteration of the algorithm if the following condition is satisfied:

    Rt(K̃) ≤ fmin(k) − ξ,    (18)
where Rt is calculated by (16), fmin(k) is the record value, i.e., the current minimal function value (attained at the record point xmin(k)), and ξ is a parameter of the algorithm, ξ ≥ 0 (it can be set in different ways, see Section 4). Notice that both the record value and the record point can change after performing some better trial during partitioning, but the record value always remains greater than or equal to the vertical coordinate of the lowest dot (dot D5 in Fig. 4).

Condition (18) prevents the algorithm from subdividing already well-explored small hyperintervals. For example, among the nondominated hyperintervals in Fig. 4 (black dots), the hyperinterval D5 does not satisfy this condition and therefore is excluded from being partitioned at the next iteration of the method.

It should be mentioned on this occasion that, together with nondominated hyperintervals, a hyperinterval Dmin(k) = [amin, bmin] containing the record point (called hereafter the record hyperinterval) is also considered for a possible partition during the work of the algorithm, as will be explained in the next section. Among the different hyperintervals the record point xmin(k) can belong to (up to 2^N of them), the record hyperinterval is the one with the smallest characteristic, and it can change during subdivisions. In Fig. 4, the record hyperinterval is represented by the dot Dmin (note that this dot can be different from the lowest one, as in Fig. 4). Hereafter, the index of the group the hyperinterval Dmin(k) belongs to will be indicated as p(k) (during the work of the algorithm the satisfaction of inequalities (17) is ensured for this index, which can eventually be updated together with q0(k) and q∞(k); see [36] for details). In Fig. 4, p(k) = q0(k) and, therefore, the hyperinterval Dmin(k) is among the smallest hyperintervals of the illustrated partition of D.
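A minimal sketch of how the nondominated hyperintervals (in the sense of Def. 3) satisfying (18) could be selected is given below (hypothetical Python code, not from the paper). It exploits the property mentioned above: the nondominated dots lie on the lower-right convex hull of the set of dots (di, Fi), and condition (18) is checked with the largest slope for which a dot remains nondominated.

```python
import math

def select_for_partition(dots, f_min, xi):
    """dots: list of (d_i, F_i, i) with d_i = 0.5*||b_i - a_i||**2 and F_i from (15).
    Returns the indices of nondominated hyperintervals (Def. 3) whose characteristic
    (16) can satisfy condition (18) for at least one admissible estimate of K."""
    # within a group (equal d) only the lowest dot can be nondominated
    best = {}
    for d, F, idx in dots:
        if d not in best or F < best[d][1]:
            best[d] = (d, F, idx)
    pts = sorted(best.values())

    # lower convex hull of the dots (Andrew's monotone chain)
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (d1, f1, _), (d2, f2, _) = hull[-2], hull[-1]
            if (f2 - f1) * (p[0] - d1) >= (p[1] - f1) * (d2 - d1):
                hull.pop()          # hull[-1] is not strictly below the segment hull[-2]--p
            else:
                break
        hull.append(p)

    # keep its lower-right part: from the dot with the smallest F
    # (where the supporting slopes become positive) up to the largest d
    k0 = min(range(len(hull)), key=lambda j: hull[j][1])
    hull = hull[k0:]

    selected = []
    for j, (d, F, idx) in enumerate(hull):
        if j + 1 < len(hull):       # largest slope for which this dot stays nondominated
            d_next, F_next, _ = hull[j + 1]
            k_hi = (F_next - F) / (d_next - d)
        else:
            k_hi = math.inf         # the largest hyperintervals admit any slope
        if F - k_hi * d <= f_min - xi:   # condition (18)
            selected.append(idx)
    return selected
```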
3. New Algorithm

In this section, the new algorithm for solving problem (2)–(4) is described. First, the new method is presented and its computational scheme is given; then its convergence properties are analyzed.

The new algorithm consists of the following explicitly defined phases: (1) an exploration phase, during which an examination of large hyperintervals (possibly located far away from the record point) is performed in order to capture new subregions with better function values; (2) a record improvement phase, during which the algorithm tries to better inspect the subregion around the record point. Several subdivisions of different hyperintervals can be performed at a single iteration of the new method (this more general notion of an iteration with respect to that of Section 2.1 is often used in Lipschitz global optimization algorithms with multiple estimates of Lipschitz constants, see, e.g., [10, 19, 23, 36]).

The exploration phase consists of several iterations (namely, N + 1, where N is the problem dimension), each of which serves for determining nondominated hyperintervals and partitioning them. Since each subdivision of a hyperinterval by the scheme (5)–(10) is performed perpendicularly to only one side of the hyperinterval (the longest side from (7)), the number of iterations within a phase of the algorithm should be correlated with the dimension of the hyperintervals. This phase is interrupted after finishing an iteration if an improvement of at least 1% of the minimal function value is reached, i.e., if

    fmin(k) ≤ fmin^prec − 0.01 |fmin^prec|,    (19)
where fmin^prec is the record value memorized at the start of the exploration phase. Condition (19) is verified after each iteration of the exploration phase and is used to switch the algorithm to the record improvement phase. This local phase is also launched when the exploration phase finishes without having improved the record value, but only if the record hyperinterval Dmin(k) is not among the smallest ones within the current partition of hyperintervals (for example, in Fig. 4, the record hyperinterval is among the smallest hyperintervals). Otherwise, the algorithm re-initiates another global exploration phase without forcing the local one.

The record improvement phase reflects the already well-established fact in global optimization affirming the benefits of the record improvement during
the global search (see, e.g., the references given in [17, 23, 24, 37]). At a single iteration, it performs several subdivisions (namely, N) of the record hyperinterval, trying to improve the record value. During this process a new record value can appear. In this case, a new record hyperinterval can be considered for the remaining subdivisions.

The record hyperinterval subdivisions are performed by means of the one-point-based strategy described in Section 2.1. Of course, other possible local improvement techniques can be used for this scope (see, e.g., [12, 30, 31]), but in this case the resulting trial points cannot be managed within the vertex database mentioned in Section 2.1. It is important that the available gradient information allows us to automatically terminate the record improvement phase. In fact, the record hyperinterval is not further subdivided when the gradient projection on the directions parallel to the record hyperinterval sides becomes non-negative, i.e., when the following condition is satisfied:

    ∂f(amin)/∂x(j) · (bmin(j) − amin(j)) ≥ 0,  j = 1, ..., N.    (20)
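A check of condition (20) is immediate; a hypothetical helper (not from the paper), assuming the gradient at the record vertex amin is available, might look as follows.

```python
import numpy as np

def record_phase_finished(grad_amin, a_min, b_min):
    """True if condition (20) holds, i.e. no coordinate direction pointing from
    a_min into the record hyperinterval is a first-order descent direction."""
    grad_amin, a_min, b_min = map(np.asarray, (grad_amin, a_min, b_min))
    return bool(np.all(grad_amin * (b_min - a_min) >= 0.0))
```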
Either in this case or when the prefixed number N of subdivisions has been performed (without meeting condition (20)), the algorithm is switched again to the global exploration phase and continues its work.

The algorithm stops when the number of generated trial points reaches the maximal allowed number Pmax. The satisfaction of this termination criterion is verified after every subdivision of a hyperinterval. The current record value fmin and the current record point xmin can be taken as approximations of the global minimum value f* and the global minimizer x* from (2), respectively.

A formal description of the new algorithm follows below (we assume without loss of generality that the admissible region D = [a, b] in (4) is an N-dimensional hypercube).

Step 0 (Initialization). Set the iteration counter k := 1. Let the first evaluation of f(x) and f′(x) be performed at the vertex a of the initial hyperinterval D = [a, b], i.e., x1 := a. Set the current partition of the search interval as D^1 := {[a1, b1]}, where a1 = a, b1 = b, and the current number of hyperintervals m(1) := 1. Set fmin(1) := f(x1), xmin(1) := a, and Dmin(1) := [a1, b1]. Set the group indices q∞(1) := q0(1) := p(1) := 0.
Suppose now that k ≥ 1 iterations of the algorithm have already been executed. The next iterations of the algorithm consist of the following steps.

Step 1 (Exploration Phase). Memorize the current record fmin^prec := fmin(k), set the counter of iterations during the exploration phase kg := 1, and perform the following steps:
Step 1.1. Identify the set of nondominated hyperintervals, considering only the groups of large hyperintervals (namely, those with the current indices from q∞(k) up to ⌈(q∞(k) + p(k))/2⌉). Subdivide those nondominated hyperintervals which satisfy inequality (18) and produce new trial points (or read the existing ones from the vertex database) according to Section 2.1. Set k := k + 1 and update the hyperinterval indices if necessary (see [36] for details).

Step 1.2. If condition (19) is satisfied, then go to Step 2 and execute the record improvement phase. Otherwise, go to Step 1.3.

Step 1.3. Increase the counter kg := kg + 1 and check whether kg ≤ N. If this is the case, then go to Step 1.1 (continue the exploration of large hyperintervals). Otherwise, go to Step 1.4 (perform the final iteration of the exploration phase by considering more groups of hyperintervals).

Step 1.4. Identify the set of nondominated hyperintervals considering the current groups of hyperintervals from q∞(k) up to p(k). Subdivide those nondominated hyperintervals which satisfy inequality (18) and produce new trial points (or read the existing ones from the vertex database) according to Section 2.1. Set k := k + 1 and update all necessary indices.

Step 1.5. If the record hyperinterval is not among the smallest ones, i.e., if p(k) < q0(k), then go to Step 2 and execute the record improvement phase. Otherwise, go to Step 1 and repeat the exploration phase, updating the value fmin^prec.

Step 2 (Record Improvement Phase). Set k := k + 1. Set the counter of iterations during the record improvement phase kl := 1 and perform the following steps:

Step 2.1. Subdivide the record hyperinterval Dmin(k) and produce a new trial point (or read the existing one from the vertex database)
according to Section 2.1. Update the hyperinterval indices and the record hyperinterval index if necessary.

Step 2.2. Increase the counter kl := kl + 1 and check whether kl ≤ N. If this is the case, then go to Step 2.1 (continue the local exploration of the subregion near the record point). Otherwise, go to Step 1 (perform a new exploration of large hyperintervals).
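To summarize how the two phases interact, a deliberately simplified driver is sketched below (hypothetical Python code, not part of the paper). It reuses the hypothetical helpers subdivide, select_for_partition, and record_phase_finished from the sketches above, and it omits the vertex database, the hyperinterval groups, and the index bookkeeping, so it should be read only as an illustration of the control flow of Steps 0–2.

```python
import numpy as np

def minimize_sketch(f, grad, a, b, P_max=1000, xi=0.0):
    """Simplified driver for the two-phase method; f and grad take a numpy array."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    N = a.size

    def entry(av, bv, fv, gv):                       # one hyperinterval record
        F = fv + gv @ (np.where(gv * (bv - av) < 0.0, bv, av) - av)   # (14)-(15)
        return {"a": av, "b": bv, "f": fv, "g": gv, "F": F,
                "d": 0.5 * float(np.sum((bv - av) ** 2))}

    hyps = [entry(a, b, f(a), np.asarray(grad(a), float))]
    trials = 1

    def f_min():
        return min(h["f"] for h in hyps)

    def split(idx):                                  # one subdivision, Section 2.1
        nonlocal trials
        h = hyps[idx]
        u, (mid, low, high) = subdivide(h["a"], h["b"])
        fu, gu = f(u), np.asarray(grad(u), float)    # single new trial at u
        trials += 1
        hyps[idx] = entry(*mid, fu, gu)              # (8)
        hyps.append(entry(*low, h["f"], h["g"]))     # (9): old trial vertex reused
        hyps.append(entry(*high, fu, gu))            # (10)

    while trials < P_max:
        f_prec = f_min()
        for _ in range(N + 1):                       # exploration phase
            dots = [(h["d"], h["F"], i) for i, h in enumerate(hyps)]
            for i in select_for_partition(dots, f_min(), xi):
                if trials >= P_max:
                    break
                split(i)
            if f_min() <= f_prec - 0.01 * abs(f_prec):   # condition (19)
                break
        rec = min(range(len(hyps)), key=lambda i: hyps[i]["f"])
        for _ in range(N):                           # record improvement phase
            h = hyps[rec]
            if trials >= P_max or record_phase_finished(h["g"], h["a"], h["b"]):
                break                                # condition (20)
            split(rec)
            rec = min(range(len(hyps)), key=lambda i: hyps[i]["f"])

    best = min(hyps, key=lambda h: h["f"])
    return best["a"], best["f"]
```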
Let us now study the convergence properties of the new method during minimization of the function f(x) from (2)–(4) when the maximal allowed number of generated trial points Pmax is equal to infinity. In this case, the algorithm does not stop (the number of iterations k goes to infinity) and an infinite sequence of trial points {x^{j(k)}} is generated.

Definition 4. The convergence of an infinite sequence of trial points {x^{j(k)}} generated by a global optimization method is called everywhere dense if for any point x ∈ D and any δ > 0 there exist an iteration number k(δ) ≥ 1 and a point x′ ∈ {x^{j(k)}}, k > k(δ), such that ∥x − x′∥ < δ.

Theorem 2. The new algorithm manifests everywhere dense convergence.

Proof. Every subdivision of a hyperinterval by the partitioning scheme from Section 2.1 produces three new hyperintervals with the same volume, equal to one third of the volume of the subdivided hyperinterval, and with smaller main diagonals. Trial points generated by the new algorithm are at one of the vertices of each generated hyperinterval. Therefore, for a fixed positive value of δ, it is sufficient to demonstrate that after a finite number of iterations k(δ) the largest hyperinterval of the current partition of the search domain D will have the length of its main diagonal smaller than δ. In this case, in the δ-neighborhood of any point of D there will exist at least one trial point generated by the algorithm.

Let us fix an iteration k′ of the method and consider the group q∞(k′) of the largest hyperintervals of the partition {D^{k′}} in its two-dimensional graphical representation. This group is always taken into account when nondominated hyperintervals are looked for at the exploration phase of the algorithm. As follows from Def. 3, a hyperinterval Dt ∈ {D^{k′}} from this group with the corresponding smallest value Ft from (13)–(15) must be partitioned and substituted by three smaller hyperintervals at the current iteration of the algorithm, because it is a nondominated hyperinterval and condition (18) is satisfied for it. Since each group consists of a finite number of hyperintervals, after a sufficiently large number of iterations k > k′ all hyperintervals of the group q∞(k) of the largest hyperintervals will be subdivided. The group index q∞(k) will increase, and the same procedure will be repeated with a new group of the largest hyperintervals, thus making the largest hyperintervals smaller and smaller.

It can also be noted that the record hyperinterval Dmin is itself represented by a dot in the two-dimensional diagram of the current partition. It can be subdivided either separately during the record improvement phase, or as a nondominated hyperinterval during the exploration phase, at which the satisfaction of condition (20) is not taken into consideration. Thus, there exists a finite number k(δ) such that after executing k(δ) iterations of the algorithm the largest hyperinterval of the current partition {D^{k(δ)}} will have the main diagonal smaller than δ.

To conclude the theoretical study of the new algorithm, we would like to highlight that the usage of all possible estimates of the Lipschitz constant in its work leads to the convergence of the everywhere dense type. If the Lipschitz constant L (or its valid estimate) of the objective function f(x) or the Lipschitz constant K (or its valid estimate) of the gradient f′(x) can be used by a global optimization method, other types of convergence can be established for such an algorithm (see, e.g., methods from [18, 34, 37, 38, 43]).

4. Numerical results

In this section, we present the results of numerical experiments performed to compare the new algorithm with two methods belonging to the same class of one-point-based partitioning methods: the DIRECT algorithm from [19] and its locally-biased modification DIRECTl from [10]. Both of them use the center-sampling partitioning strategy and work with a set of Lipschitz constants for the objective function f(x) from (2). The implementation of these two methods (downloadable from http://www4.ncsu.edu/~ctk/SOFTWARE/DIRECTv204.tar.gz) has been used in all the experiments, following the way of the multicriteria comparison proposed in [36].
In order to make the numerical comparison with the DIRECT-based algorithms easier, the value ξ from (18) was set as in the DIRECT method, i.e.,

    ξ = ϵ |fmin(k)|,  ϵ ≥ 0.    (21)
The recommended value of ϵ = 10⁻⁴ (see [19, 36]) was used in (21). In accordance with [36], the global minimizer x* ∈ D was considered to be found when a method generated a trial point x′ inside a hyperinterval with a vertex x* and with a volume smaller than the volume of the initial hyperinterval D = [a, b] multiplied by an accuracy coefficient ∆, 0 < ∆ ≤ 1, i.e.,

    |x′(i) − x*(i)| ≤ ∆^{1/N} (b(i) − a(i)),  1 ≤ i ≤ N,    (22)

where N is from (4). The algorithm was stopped either when the maximal number of trials Pmax, equal to 1 000 000, was reached, or when condition (22) was satisfied (see [36] for a discussion of different stopping criteria in global optimization methods).

In our numerical experiments we used the same test classes, each of 100 continuously differentiable functions, produced by the GKLS-generator (see [11]), as in [36]. Particularly, eight GKLS D-type classes of dimensions N = 2, 3, 4, and 5 have been considered. For each particular problem dimension N, a ‘simple’ and a ‘hard’ class have been taken for the comparison (see [36] for a detailed description of the classes).

For the convenience of the reader, we report here the four criteria introduced in [36, 37] that were used to compare the methods. The following designations are required:

Ps – the number of trials performed by the method under consideration to solve the problem number s, 1 ≤ s ≤ 100, of a fixed test class;

ms – the number of hyperintervals generated to solve the problem s.

Criterion C1. Number of trials Ps* required for a method to satisfy condition (22) for all 100 functions of a particular test class, i.e.,

    Ps* = max_{1≤s≤100} Ps,   s* = arg max_{1≤s≤100} Ps.    (23)
Criterion C2. The corresponding number of hyperintervals, ms*, generated by the method, where s* is from (23).

Criterion C3. Average number of trials Pavg performed by the method during minimization of all 100 functions from a particular test class, i.e.,

    Pavg = (1/100) Σ_{s=1}^{100} Ps.    (24)
Criterion C4. Number p (number q) of functions from a class for which DIRECT or DIRECTl executed less (more) function evaluations than the new algorithm. If Ps is the number of trials performed by the new algorithm and P′s is the corresponding number of trials performed by a competing method, p and q are evaluated as follows:

    p = Σ_{s=1}^{100} σ′s,   σ′s = { 1, if P′s < Ps;  0, otherwise },    (25)

    q = Σ_{s=1}^{100} σs,    σs  = { 1, if Ps < P′s;  0, otherwise }.    (26)
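As a small illustration, the per-class values of Criteria C1, C3, and C4 could be computed from the recorded numbers of trials as follows (hypothetical Python snippet, not from the paper; ps holds the trial counts of the new algorithm and ps_cmp those of a competing method over the 100 functions of one class).

```python
def class_criteria(ps, ps_cmp):
    """Criteria C1 (23), C3 (24) and C4 (25)-(26) for one GKLS test class."""
    c1 = max(ps)                                        # (23): worst-case number of trials
    c3 = sum(ps) / len(ps)                              # (24): average number of trials
    p = sum(1 for x, y in zip(ps, ps_cmp) if y < x)     # (25)
    q = sum(1 for x, y in zip(ps, ps_cmp) if x < y)     # (26)
    return c1, c3, (p, q)
```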
Results based on Criteria C1 and C2 are mainly influenced by the minimization of the most difficult functions of a class. Criteria C3 and C4 deal with the average data of a class. The number of generated hyperintervals (Criterion C2) provides an important characteristic of any partition algorithm for solving the problem (2)–(4). In some way, it corresponds to the qualitative examination of the search domain D during the work of the method. The greater this number, the more information about the behavior of the objective function is available and, therefore, the smaller the risk of missing its global minimizer. Of course, algorithms should not generate many redundant hyperintervals, since this slows down the search and is therefore a disadvantage of the method (see [36] for more details).

Results of the numerical comparison of the methods with respect to Criteria C1 and C2 on the eight GKLS test classes are shown in Tables 1–3. The accuracy coefficient ∆ from (22) is given in the second column of the tables. Table 1 reports the maximal number of trials required for satisfying condition (22) for half of the functions of a particular class (columns “50%”) and for all 100 functions of the class (columns “100%”). The notation ‘>1 000 000 (j)’ in Tables 1 and 3 means that after 1 000 000 function evaluations the method under consideration was not able to solve j problems. Table 2 represents the ratio between the maximal number of trials performed by DIRECT and DIRECTl with respect to the corresponding number of trials performed by the new algorithm. The numbers of generated hyperintervals (Criterion C2) are indicated in Table 3.

According to Tables 1 and 3, the new multidimensional algorithm requires many fewer trials than the other two methods to ensure a thorough examination of the search domain. Moreover, the advantage of the new method becomes even more pronounced as the problem dimension grows or the problem complexity increases.
Table 1: Number of trial points for GKLS test functions (Criterion C1).

                                  50%                                100%
 N   ∆       Class     DIRECT   DIRECTl     New      DIRECT          DIRECTl        New
 2   10⁻⁴    simple       111       152      59        1159             2318        335
 2   10⁻⁴    hard         1062      1328     182       3201             3414        1075
 3   10⁻⁶    simple       386       591      362       12507            13309       2043
 3   10⁻⁶    hard         1749      1967     416       >1000000 (4)     29233       2352
 4   10⁻⁶    simple       4805      7194     2574      >1000000 (4)     118744      16976
 4   10⁻⁶    hard         16114     33147    3773      >1000000 (7)     287857      20866
 5   10⁻⁷    simple       1660      9246     1757      >1000000 (1)     178217      16300
 5   10⁻⁷    hard         55092     126304   13662     >1000000 (16)    >1000000 (4) 88459
Table 2: Improvement obtained by the new algorithm in terms of Criterion C1.

 N   ∆       Class     DIRECT/New   DIRECTl/New
 2   10⁻⁴    simple         3.46          6.92
 2   10⁻⁴    hard           2.98          3.18
 3   10⁻⁶    simple         6.12          6.51
 3   10⁻⁶    hard        >425.17         12.43
 4   10⁻⁶    simple       >58.91          6.99
 4   10⁻⁶    hard         >47.92         13.80
 5   10⁻⁷    simple       >61.35         10.93
 5   10⁻⁷    hard         >11.30        >11.30
In fact, on half of the test functions from each class (which were the simplest for each method with respect to the other functions of the class), the new algorithm already manifested a very good performance with respect to DIRECT and DIRECTl in terms of the number of generated trial points (see columns “50%” in Table 1). When all the functions were taken into consideration (and, consequently, the difficult functions of the class were considered too), the number of trials produced by the new algorithm was much smaller in comparison with the two other methods (see columns “100%” in Table 1), ensuring at the same time a substantial examination of the admissible domain (see Table 3).

Note also that the maximal number of trials, equal to 88459 (see Table 1), required by the new method to solve all problems of the hard five-dimensional class is obtained on function 5 of this class. If we use the new method with the one-point-based strategy starting from the point b rather than from the point a (see Section 2.1), the number of trials required by the new algorithm to solve this particular problem becomes equal to 15238. Thus, some a priori knowledge of the objective function behavior can allow us to better select the vertex of the initial hyperinterval D at which the first trial will be executed and, therefore, to accelerate the search even more.
Table 3: Number of hyperintervals for GKLS test functions (Criterion C2).

                                  50%                                100%
 N   ∆       Class     DIRECT   DIRECTl     New      DIRECT          DIRECTl        New
 2   10⁻⁴    simple       111       152      185       1159             2318        1137
 2   10⁻⁴    hard         1062      1328     607       3201             3414        3993
 3   10⁻⁶    simple       386       591      1867      12507            13309       12149
 3   10⁻⁶    hard         1749      1967     2061      >1000000 (4)     29233       14357
 4   10⁻⁶    simple       4805      7194     21635     >1000000 (4)     118744      186295
 4   10⁻⁶    hard         16114     33147    33173     >1000000 (7)     287857      223263
 5   10⁻⁷    simple       1660      9246     19823     >1000000 (1)     178217      255059
 5   10⁻⁷    hard         55092     126304   169413    >1000000 (16)    >1000000 (4) 1592969
Table 4 reports the average number of trials performed during minimization of all 100 functions from the same GKLS classes (Criterion C3). The “Improvement” columns in this table represent the ratios between the average numbers of trials performed by DIRECT and DIRECTl with respect to the corresponding numbers of trials performed by the new algorithm. The symbol ‘>’ reflects the situation when not all functions of a class were successfully minimized by the method under consideration in the sense of condition (22). This means that the method stopped when Pmax trials had been executed during minimization of several functions of this particular test class. In these cases, the value of Pmax equal to 1 000 000 was used in the calculations of the average value in (24), thus providing a lower estimate of the average. As can be seen from Table 4, the new method outperforms DIRECT and DIRECTl also on Criterion C3.

Finally, results of the comparison between the new algorithm and its two competitors in terms of Criterion C4 are reported in Table 5. This table shows how often the new algorithm was able to minimize each of the 100 functions of a class with a smaller number of trials with respect to DIRECT or DIRECTl. The notation ‘p : q’ means that among the 100 functions of a particular test class there are p functions for which DIRECT (or DIRECTl) spent fewer function trials than the new algorithm and q functions for which the new algorithm generated fewer trial points with respect to DIRECT (or DIRECTl) (p and q are from (25) and (26), respectively).
Table 4: Average number of trial points for GKLS test functions (Criterion C3).

                                                                    Improvement
 N   ∆       Class       DIRECT       DIRECTl         New      DIRECT/New  DIRECTl/New
 2   10⁻⁴    simple        198.89        292.79       97.22         2.06         3.01
 2   10⁻⁴    hard         1063.78       1267.07      192.00         5.54         6.60
 3   10⁻⁶    simple       1117.70       1785.73      491.28         2.28         3.63
 3   10⁻⁶    hard       >42322.65       4858.93      618.32       >68.45         7.86
 4   10⁻⁶    simple     >47282.89      18983.55     3675.84       >12.87         5.16
 4   10⁻⁶    hard       >95708.25      68754.02     5524.77       >17.32        12.44
 5   10⁻⁷    simple     >16057.46      16758.44     3759.05        >4.27         4.46
 5   10⁻⁷    hard      >217215.58    >269064.35    22189.47        >9.79       >12.13
Table 5: Comparison between the new algorithm and DIRECT and DIRECTl in terms of Criterion C4.

 N   ∆       Class     DIRECT : New    DIRECTl : New
 2   10⁻⁴    simple        28 : 72          21 : 79
 2   10⁻⁴    hard          15 : 85          16 : 84
 3   10⁻⁶    simple        36 : 64          30 : 70
 3   10⁻⁶    hard          19 : 81          17 : 83
 4   10⁻⁶    simple        39 : 61          25 : 75
 4   10⁻⁶    hard          14 : 86          16 : 84
 5   10⁻⁷    simple        55 : 45          17 : 83
 5   10⁻⁷    hard          26 : 74          20 : 80
As a rule, the harder the objective functions presented in a test class, the more pronounced the advantage of the new algorithm becomes with respect to Criterion C4 as well.

As demonstrated by the results of the extensive numerical experiments performed, the usage of the gradient information together with the efficient partitioning strategy allows one to obtain a significant acceleration in comparison with the DIRECT-based methods on the studied classes of test problems.

References

[1] Audet, C., Hansen, P., Savard, G. (Eds.), 2005. Essays and Surveys in Global Optimization. GERAD 25th Anniversary, Springer–Verlag, New York.

[2] Baritompa, W., 1993. Customizing methods for global optimization – A geometric viewpoint. J. Global Optim. 3, 193–212.
[3] Breiman, L., Cutler, A., 1993. A deterministic algorithm for global optimization. Math. Program. 58, 179–199.

[4] Di Serafino, D., Liuzzi, G., Piccialli, V., Riccio, F., Toraldo, G., 2011. A modified DIviding RECTangles algorithm for a problem in astrophysics. J. Optim. Theory Appl. 151, 175–190.

[5] Dumas, L., Druez, B., Lecerf, N., 2009. A fully adaptive hybrid optimization of aircraft engine blades. J. Comput. Appl. Math. 232, 54–60.

[6] Evtushenko, Y.G., Malkova, V.U., Stanevichyus, A.A., 2009. Parallel global optimization of functions of several variables. Comput. Math. Math. Phys. 49, 246–260.

[7] Evtushenko, Y.G., Posypkin, M.A., 2011. An application of the nonuniform covering method to global optimization of mixed integer nonlinear problems. Comput. Math. Math. Phys. 51, 1286–1298.

[8] Evtushenko, Y.G., 1985. Numerical Optimization Techniques. Translations Series in Mathematics and Engineering, Springer–Verlag, Berlin.

[9] Finkel, D.E., Kelley, C.T., 2006. Additive scaling and the DIRECT algorithm. J. Global Optim. 36, 597–608.

[10] Gablonsky, J.M., Kelley, C.T., 2001. A locally-biased form of the DIRECT algorithm. J. Global Optim. 21, 27–37.

[11] Gaviano, M., Lera, D., Kvasov, D.E., Sergeyev, Y.D., 2003. Algorithm 829: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Software 29, 469–480.

[12] Gaviano, M., Lera, D., 2002. A complexity analysis of local search algorithms in global optimization. Optim. Methods Softw. 17, 113–127.

[13] Gaviano, M., Lera, D., 2008. A global minimization algorithm for Lipschitz functions. Optim. Lett. 2, 1–13.

[14] Gergel, V.P., 1997. A global optimization algorithm for multivariate functions with Lipschitzian first derivatives. J. Global Optim. 10, 257–281.
[15] Graf, P.A., Kim, K., Jones, W.B., Wang, L., 2007. Surface passivation optimization using DIRECT. J. Comput. Phys. 224, 824–835.

[16] He, J., Watson, L.T., Ramakrishnan, N., Shaffer, C.A., Verstak, A., Jiang, J., Bae, K., Tranter, W.H., 2002. Dynamic data structures for a direct search algorithm. Comput. Optim. Appl. 23, 5–25.

[17] Horst, R., Pardalos, P.M. (Eds.), 1995. Handbook of Global Optimization, volume 1. Kluwer Academic Publishers, Dordrecht.

[18] Horst, R., Tuy, H., 1996. Global Optimization – Deterministic Approaches. Springer–Verlag, Berlin.

[19] Jones, D.R., Perttunen, C.D., Stuckman, B.E., 1993. Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79, 157–181.

[20] Jones, D.R., Schonlau, M., Welch, W.J., 1998. Efficient global optimization of expensive black-box functions. J. Global Optim. 13, 455–492.

[21] Kvasov, D.E., Pizzuti, C., Sergeyev, Y.D., 2003. Local tuning and partition strategies for diagonal GO methods. Numer. Math. 94, 93–106.

[22] Kvasov, D.E., Sergeyev, Y.D., 2003. Multidimensional global optimization algorithm based on adaptive diagonal curves. Comput. Math. Math. Phys. 43, 42–59.

[23] Kvasov, D.E., Sergeyev, Y.D., 2009. A univariate global search working with a set of Lipschitz constants for the first derivative. Optim. Lett. 3, 303–318.

[24] Lera, D., Sergeyev, Y.D., 2010a. An information global minimization algorithm using the local improvement technique. J. Global Optim. 48, 99–112.

[25] Lera, D., Sergeyev, Y.D., 2010b. Lipschitz and Hölder global optimization using space-filling curves. Appl. Numer. Math. 60, 115–129.

[26] Liuzzi, G., Lucidi, S., Piccialli, V., 2010. A partition-based global optimization algorithm. J. Global Optim. 48, 113–128.
[27] Luo, C., Zhang, S., Wang, C., Jiang, Z., 2011. A metamodel-assisted evolutionary algorithm for expensive optimization. J. Comput. Appl. Math. 236, 759–764.

[28] Mockus, J., 2000. A Set of Examples of Global and Discrete Optimization: Applications of Bayesian Heuristic Approach. Kluwer Academic Publishers, Dordrecht.

[29] Moles, C.G., Mendes, P., Banga, J.R., 2003. Parameter estimation in biochemical pathways: A comparison of global optimization methods. Genome Res. 13, 2467–2474.

[30] Nesterov, Y., 2004. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Dordrecht.

[31] Nocedal, J., Wright, S.J., 1999. Numerical Optimization. Springer–Verlag, Dordrecht.

[32] Panning, T.D., Watson, L.T., Allen, N.A., Chen, K.C., Shaffer, C.A., Tyson, J.J., 2008. Deterministic parallel global parameter estimation for a model of the budding yeast cell cycle. J. Global Optim. 40, 719–738.

[33] Pardalos, P.M., Romeijn, H.E., Tuy, H., 2000. Recent developments and trends in global optimization. J. Comput. Appl. Math. 124, 209–228.

[34] Pintér, J., 1996. Global Optimization in Action (Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications). Kluwer Academic Publishers, Dordrecht.

[35] Sergeyev, Y.D., Daponte, P., Grimaldi, D., Molinaro, A., 1999. Two methods for solving optimization problems arising in electronic measurements and electrical engineering. SIAM J. Optim. 10, 1–21.

[36] Sergeyev, Y.D., Kvasov, D.E., 2006. Global search based on efficient diagonal partitions and a set of Lipschitz constants. SIAM J. Optim. 16, 910–937.

[37] Sergeyev, Y.D., Kvasov, D.E., 2008. Diagonal Global Optimization Methods. FizMatLit, Moscow. In Russian.
[38] Sergeyev, Y.D., Kvasov, D.E., 2011. Lipschitz global optimization, in: Cochran, J.J. (Ed.), Wiley Encyclopedia of Operations Research and Management Science, volume 4. Wiley, New York, pp. 2812–2828.

[39] Sergeyev, Y.D., 1995. An information global optimization algorithm with local tuning. SIAM J. Optim. 5, 858–870.

[40] Sergeyev, Y.D., 1998. Global one-dimensional optimization using smooth auxiliary functions. Math. Program. 81, 127–146.

[41] Sergeyev, Y.D., 2000. An efficient strategy for adaptive partition of N-dimensional intervals in the framework of diagonal algorithms. J. Optim. Theory Appl. 107, 145–168.

[42] Sergeyev, Y.D., 2005. Efficient partition of N-dimensional intervals in the framework of one-point-based algorithms. J. Optim. Theory Appl. 124, 503–510.

[43] Strongin, R.G., Sergeyev, Y.D., 2000. Global Optimization with Non-Convex Constraints: Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht.

[44] Trigiante, D. (Ed.), 2000. Recent Trends in Numerical Analysis. Nova Science Publishers, Inc., New York.

[45] Wu, Y., Ozdamar, L., Kumar, A., 2005. TRIOPT: A triangulation-based partitioning algorithm for global optimization. J. Comput. Appl. Math. 177, 35–53.

[46] Zhigljavsky, A.A., Žilinskas, A., 2008. Stochastic Global Optimization. Springer, New York.