Ann Oper Res (2007) 156: 25–44 DOI 10.1007/s10479-007-0232-y
Solving fractional problems with dynamic multistart improving hit-and-run Mirjam Dür · Charoenchai Khompatraporn · Zelda B. Zabinsky
Published online: 4 August 2007 © Springer Science+Business Media, LLC 2007
Abstract  Fractional programming has numerous applications in economics and engineering. While some fractional problems are easy in the sense that they are equivalent to an ordinary linear program, other problems, like maximizing a sum or product of several ratios, are known to be hard, as these functions are highly nonconvex and multimodal. In contrast to the standard Branch-and-Bound type algorithms proposed for specific types of fractional problems, we treat general fractional problems with stochastic algorithms developed for multimodal global optimization. Specifically, we propose Improving Hit-and-Run with restarts, based on a theoretical analysis of Multistart Pure Adaptive Search (cf. the dissertation of Khompatraporn (2004)), which prescribes a way to utilize problem-specific information to sample until a certain level α of confidence is achieved. For this purpose, we analyze the Lipschitz properties of fractional functions, and then utilize a unified method to solve general fractional problems. The paper ends with a report on numerical experiments.

Keywords  Fractional programming · Stochastic algorithms · Global optimization · Improving hit-and-run · Lipschitz properties · Multistart · Pure adaptive search
This work was initiated while Mirjam Dür was spending a three-month research visit at the University of Washington. She would like to thank the Fulbright Commission for financial support and the optimization group at UW for their warm hospitality. The work of C. Khompatraporn and Z.B. Zabinsky was partially supported by the NSF grant DMI-0244286.

M. Dür, Department of Mathematics, Darmstadt University of Technology, 64289 Darmstadt, Germany, e-mail: [email protected]

C. Khompatraporn, Department of Production Engineering, King Mongkut's University of Technology Thonburi, 91 Pracha-Utit Rd., Thungkru, Bangkok, 10140, Thailand, e-mail: [email protected]

Z.B. Zabinsky, Industrial Engineering, University of Washington, Box 352650, Seattle, WA 98195, USA, e-mail: [email protected]
1 Introduction and motivation

In this paper, we consider optimization problems whose objective function involves ratios of the form g(x)/h(x), where g and h are linear, quadratic, or more general functions. We aim at minimizing various functions of these ratios, for example the sum, the product or the maximum of several ratios. Optimization problems of this kind arise in many applications in economics and engineering, whenever some kind of efficiency of a system is to be maximized. A good survey on theoretical and applications oriented aspects of fractional programming can be found in Schaible (1995).

Fractional problems have been studied since the 1960s, when Charnes and Cooper (1962) proposed their famous transformation to rewrite the problem of maximizing a ratio of linear functions over linear constraints as an ordinary linear program. It was soon discovered that the sum of two linear ratios is a quasiconcave function, a property that could be algorithmically exploited, cf. Cambini et al. (1989) or Falk and Palocsay (1992). The same is true for a ratio of a concave function divided by a convex function (Freund and Jarre 2001). However, the sum of more than two linear fractions does not enjoy generalized convexity properties and is generally nonconvex with possibly numerous local optima. Several algorithms such as those in Benson (2002a, 2002b), Dür et al. (2001), Konno and Abe (1999) have been proposed to maximize such a function over a polytope. A recent survey article on this problem is Schaible and Shi (2003). Other new contributions to the field are Chadha's (2002) simplex-type procedure for fractions of functions involving absolute-value functions and an approach exploiting monotonicity in the objective by Phuong and Tuy (2003), which can also handle products of ratios and max–min problems.

We thus find the following situation: Many different types of fractional problems have been studied, and different algorithms have been proposed for each class. Most of the algorithms involve sophisticated techniques like Branch-and-Bound which require a certain effort to implement. At this time there is no unified approach to the different fractional problem types (except for, to a certain extent, Phuong and Tuy 2003).

Our approach is conceptually different from the many existing deterministic methods. We propose to use a stochastic algorithm called Improving Hit-and-Run (Zabinsky et al. 1993) with restarts, which is easy to implement and applicable to all existing classes of fractional problems. The only parameters that need to be specified are the length (number of function evaluations) of a restart and the total number of restarts. As we show, the length of a restart and the total number of restarts can be chosen based on a theoretical analysis of Multistart Pure Adaptive Search (Khompatraporn 2004), which yields an α-confidence of being within ε of the global optimum. In a later section, numerical experiments demonstrate that Improving Hit-and-Run with restarts is very reliable in terms of accuracy of the solution.

The remainder of the paper is organized as follows. We first describe the stochastic algorithms Pure Adaptive Search (PAS), Improving Hit-and-Run (IHR), and Multistart Pure Adaptive Search (MPAS) on which our work is based. In Sect. 3, we develop a new algorithm that we term Dynamic Multistart Improving Hit-and-Run (DMIHR). This algorithm utilizes Lipschitz properties of the objective function. We therefore show how to derive Lipschitz constants for fractional problems.
The last section demonstrates by means of several examples from the fractional programming literature the versatility of our approach.
2 Background on stochastic algorithms

In this section, we briefly describe the stochastic algorithms used in this paper. The description and analysis, however, are not restricted to fractional programs, but apply to the general minimization program

min f(x)   s.t. x ∈ S,

where f: R^n → R is a continuous function and S is a compact set in R^n. The solution set of this program, x_∗ = arg min_{x∈S} f(x), is assumed to be a nonempty subset of S. The corresponding optimal objective function value is denoted by y_∗ = f(x_∗). We also denote the maximal objective function value by y^∗ = max_{x∈S} f(x).

While a stochastic algorithm cannot guarantee the exact global optimum, it can (under the assumption that the probability of sampling in level sets is known) ensure the quality of the solution by guaranteeing that the probability of ending up with an estimated optimal value ŷ differing by more than ε from the true optimal value y_∗ is less than a given level of confidence α. Hence, we are content if the following is satisfied:

P(ŷ ≤ y_∗ + ε) ≥ 1 − α,   (1)
where ŷ denotes the estimate of the optimal objective function value (using the best objective function value found), ε > 0 is a prescribed precision, and α ∈ (0, 1).

2.1 Pure adaptive search (PAS)

Pure Adaptive Search (PAS) is an idealized algorithm that cannot be implemented directly, but it has desirable theoretical properties. PAS (see Zabinsky 2003 or Zabinsky and Smith 1992) works as follows:

Pure Adaptive Search
Step 0. Generate a uniformly distributed starting point X_0 ∈ S. Set Y_0 = f(X_0) and a counter k = 0.
Step 1. Generate X_{k+1} according to a uniform distribution on the improving level set S(Y_k) = {x ∈ S: f(x) ≤ Y_k}. Set Y_{k+1} = f(X_{k+1}).
Step 2. If a stopping criterion is met, stop. Otherwise, increment k and return to Step 1.

The crucial point clearly lies in the improving level sets S(y) = {x ∈ S: f(x) ≤ y}. For a general continuous function, these improving level sets are nested, and sampling uniformly from them still poses a challenge. The reason that PAS is nonetheless of interest is the fact that its expected number of iterations to obtain an ε-optimal solution is linear in the problem dimension, holding certain parameters constant (Zabinsky and Smith 1992). This property is desirable for practical purposes.

Let p(y) denote the probability that a point X which is sampled uniformly from S lies in a level set S(y), i.e. p(y) = P(X ∈ S(y)). Then PAS has the following property, which is characterized by p(y).
Theorem 1 (Zabinsky and Smith 1992) Let Y_k^PAS denote the objective function value obtained through PAS in iteration k, and let y_∗ ≤ y ≤ y^∗. Then

P(Y_k^PAS ≤ y) = Σ_{i=0}^{k} p(y)(ln(1/p(y)))^i / i!   (2)
for k = 1, 2, . . . , where p(y) = P (X ∈ S(y)). Proof See Zabinsky and Smith (1992, Theorem 4.3).
This property is needed to derive a theoretical property of the MPAS algorithm below, as well as for later analysis.

2.2 Multistart pure adaptive search (MPAS)

Multistart Pure Adaptive Search was recently introduced in Khompatraporn (2004). It is an attempt to formulate a strategy of when to stop a run of a stochastic algorithm and how often to restart when the algorithm gets trapped in a local optimum. Like PAS, MPAS is a theoretical algorithm, but we use its analysis to motivate a stopping and restarting strategy for IHR. The MPAS algorithm can be described as follows:

Multistart Pure Adaptive Search
Step 0. Specify the number of independent restarts r ∈ N+ and the PAS sequence length s ∈ N. Set the iteration counter i = 1, and let Ŷ_{r,s} = +∞.
Step 1. Generate a uniformly distributed starting point X_{i,0} ∈ S. Set Y_{i,0} = f(X_{i,0}). Set Ŷ_{r,s} = min{Y_{i,0}, Ŷ_{r,s}}. Set j = 0.
Step 2. If j = s, go to Step 4. Otherwise, go to Step 3.
Step 3. Generate X_{i,j+1} ∈ S(Y_{i,j}) = {x ∈ S: f(x) ≤ Y_{i,j}} according to a uniform distribution. Set Y_{i,j+1} = f(X_{i,j+1}) and update Ŷ_{r,s} = min{Y_{i,j+1}, Ŷ_{r,s}}. Increment j and return to Step 2.
Step 4. If i = r, stop. Otherwise, increment i and return to Step 1.

Let Y_{r,s}^MPAS be the best objective function value found by MPAS after r independent restarts, each with s PAS iterations. Utilizing the independence of the restarts and the result from Theorem 1, it can be shown (see Khompatraporn 2004 for more details) that

P(Y_{r,s}^MPAS ≤ y) = 1 − (1 − P(Y_s^PAS ≤ y))^r = 1 − (1 − Σ_{i=0}^{s} p(y)(ln(1/p(y)))^i / i!)^r.   (3)

Similarly, if different lengths of the PAS sequences are used, say run k performs s_k PAS iterations where k ∈ {1, 2, . . . , r}, then

P(Y_{r,{s_1,...,s_r}}^MPAS ≤ y) = 1 − Π_{k=1}^{r} (1 − Σ_{i=0}^{s_k} p(y)(ln(1/p(y)))^i / i!).   (4)
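To make these bounds concrete, the following small sketch (not part of the original paper; plain Python, standard library only) evaluates the PAS bound (2) and the MPAS bound (4) for a given value of p(y) and a list of run lengths s_1, . . . , s_r.

```python
import math

def pas_cdf(p, k):
    """P(Y_k^PAS <= y) from (2): sum_{i=0}^{k} p (ln(1/p))^i / i!, for 0 < p < 1."""
    log_term = math.log(1.0 / p)
    term, total = p, p                     # the i = 0 term is p itself
    for i in range(1, k + 1):
        term *= log_term / i               # next term of the truncated sum
        total += term
    return total

def mpas_cdf(p, lengths):
    """P(Y^MPAS <= y) from (4) for independent runs of lengths s_1, ..., s_r."""
    prob_all_miss = 1.0
    for s in lengths:
        prob_all_miss *= 1.0 - pas_cdf(p, s)
    return 1.0 - prob_all_miss

# Illustration with arbitrary numbers: probability of reaching the level set S(y)
# with p(y) = 1e-3 after one run of 20 PAS iterations, and after three such runs.
print(pas_cdf(1e-3, 20), mpas_cdf(1e-3, [20, 20, 20]))
```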
2.3 Improving hit-and-run (IHR)

Improving Hit-and-Run is an implementable approximation to Pure Adaptive Search. It was first introduced by Zabinsky et al. (1993). A thorough overview of its development since that time can be found in Zabinsky (2003). IHR uses the Hit-and-Run generator introduced by Smith (1984) to approximate uniformly generated points from the improving level sets. The formal description of the algorithm is as follows:

Improving Hit-and-Run
Step 0. Initialize X_0 ∈ S, Y_0 = f(X_0), and set k = 0.
Step 1. Generate a random direction D_k uniformly distributed on the boundary of the unit hypersphere.
Step 2. Generate a candidate point W_{k+1} = X_k + λD_k by sampling uniformly over the line set L_k = {x ∈ S: x = X_k + λD_k, λ a real scalar}. If L_k = ∅, go to Step 1.
Step 3. Update the current point X_{k+1} with the candidate point if it is improving, i.e. set
X_{k+1} = W_{k+1} if f(W_{k+1}) < Y_k, and X_{k+1} = X_k otherwise,
and set Y_{k+1} = f(X_{k+1}).
Step 4. If a stopping criterion is met, stop. Otherwise, increment k and return to Step 1.

It has been observed that when the approximation of IHR to PAS deteriorates, it is useful to restart IHR. Up to now, however, we have been lacking a theoretical structure to motivate when to restart IHR and the number of restarts to perform. Dynamic Multistart Improving Hit-and-Run (DMIHR), detailed in the next section, is an attempt to address this issue.
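For concreteness, the following sketch implements the IHR steps above for the special case of a hyperrectangle feasible set (box constraints), where the feasible segment of the line set L_k is available in closed form. It assumes numpy and is an illustration, not the authors' original code; for a general polytope, Step 2 would instead intersect the line with the defining inequalities.

```python
import numpy as np

def improving_hit_and_run(f, lower, upper, x0, max_evals, rng=None):
    """One IHR run on the box S = [lower, upper]; returns the best point found,
    its objective value, and the number of improving points (used as s_k later)."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = np.asarray(x0, float)
    y = f(x)
    improving = 0
    for _ in range(max_evals):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)                              # Step 1: uniform direction
        with np.errstate(divide="ignore", invalid="ignore"):
            t1, t2 = (lower - x) / d, (upper - x) / d       # box crossings along the line
        mask = d != 0
        lam_lo = np.max(np.minimum(t1, t2)[mask])
        lam_hi = np.min(np.maximum(t1, t2)[mask])
        if not lam_hi > lam_lo:
            continue                                        # degenerate line set, new direction
        w = x + rng.uniform(lam_lo, lam_hi) * d             # Step 2: uniform point on L_k
        fw = f(w)
        if fw < y:                                          # Step 3: keep only improving points
            x, y = w, fw
            improving += 1
    return x, y, improving
```

A single call such as improving_hit_and_run(obj, lb, ub, x_start, theta) then corresponds to one restart of length θ in the algorithm of the next section.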
3 Main results

We first define DMIHR in Sect. 3.1, and then apply it to general fractional programs. In order to adapt the theoretical analysis of MPAS that motivates DMIHR, we need the Lipschitz properties of fractional functions, which are derived in Sect. 3.2. Then in Sect. 4, we apply DMIHR to several fractional programs of various types to demonstrate the versatility of this new method.

3.1 Dynamic multistart improving hit-and-run

We first discuss how the probability constraint (1) is used to motivate our definition of DMIHR. The probability p(y) in (3) is generally difficult to obtain, but it can be bounded when the objective function satisfies the Lipschitz condition. Similarly, the probability constraint described by (4) motivates a practical stopping and restarting strategy using IHR when p(y) can be bounded.

Recall that a function f: R^n → R is said to fulfill the Lipschitz condition on S ⊆ R^n if there exists a Lipschitz constant K ≥ 0 such that |f(x) − f(y)| ≤ K‖x − y‖ for all x, y ∈ S.
The next lemma provides a means to bound p(y), knowing only the diameter of the feasible region and the Lipschitz constant for the original problem.

Lemma 1 (Zabinsky and Smith 1992) For a global optimization problem over a convex feasible region S in n dimensions with diameter D and Lipschitz constant K for the objective function, and for y_∗ ≤ y ≤ y^∗,

p(y) ≥ ((y − y_∗)/(KD))^n.   (5)
Proof See Zabinsky and Smith (1992, Lemma 5.2).
The Lipschitz constant K of the objective function is usually not known, but for fractional functions we can find an upper bound K̄ ≥ K using Theorem 2 in the next section. Substituting K̄ into (5) gives a lower bound p̄(y) on p(y):

p(y) ≥ ((y − y_∗)/(KD))^n ≥ ((y − y_∗)/(K̄D))^n =: p̄(y).   (6)

Returning to Multistart Pure Adaptive Search, to satisfy the probability constraint of being within ε of the global optimum with 1 − α certainty, we essentially want to satisfy the following probability constraint,

P(Y_{r,s}^MPAS ≤ y_∗ + ε) ≥ 1 − (1 − Σ_{i=0}^{s} p̄(y_∗ + ε)(ln(1/p̄(y_∗ + ε)))^i / i!)^r ≥ 1 − α.   (7)

Observe that p̄(y_∗ + ε) = (ε/(K̄D))^n. Hence, it is not necessary to know y_∗ to determine p̄(y_∗ + ε). In the case that not every sequence of PAS carried out in MPAS has the same length s, but run k of PAS has a sequence length s_k, we arrive at the constraint

1 − Π_{k=1}^{r} (1 − Σ_{i=0}^{s_k} p̄(y_∗ + ε)(ln(1/p̄(y_∗ + ε)))^i / i!) ≥ 1 − α.   (8)
We imitate PAS behavior by supposing that the improving points obtained in a sequence of IHR are sampled approximately uniformly on the improving level sets, and we treat the number of improving points in restart k as an approximation to the number of PAS iterations s_k. Then we determine the number of restarts of IHR so that the probability constraint (8) is satisfied. In this way the theory of MPAS motivates the algorithm DMIHR, as formally described below.

Dynamic Multistart Improving Hit-and-Run
Step 0. Select α, ε, and the maximum number θ of function evaluations for a single run of IHR. Obtain the parameters K̄, D, n of the problem. Set j = 1.
Step 1. Execute θ iterations of IHR. Record the number of improving points s_j, as well as the best objective function value of run j, Ŷ_j = f(X̂_j), and its associated solution X̂_j.
Step 2. Update the best objective function value Ŷ^DMIHR found so far by
Ŷ^DMIHR = min_{i=1,2,...,j} Ŷ_i   and   X̂^DMIHR = arg min_{i=1,2,...,j} f(X̂_i).
Step 3. Calculate the probability bound P_ε of being close to the ε-optimal region from
P_ε = 1 − Π_{k=1}^{j} (1 − Σ_{i=0}^{s_k} (ε/(K̄D))^n (ln((K̄D/ε)^n))^i / i!).
Step 4. If P_ε ≥ 1 − α, stop. Otherwise, increment j and go to Step 1.

Note that in practice s_k is likely to be different for each restart. The total number of restarts in DMIHR is not predetermined at the beginning of the algorithm, but is the smallest j for which

1 − Π_{k=1}^{j} (1 − Σ_{i=0}^{s_k} p̄(y_∗ + ε)(ln(1/p̄(y_∗ + ε)))^i / i!) ≥ 1 − α,   (9)

and hence the name dynamic MIHR. The total number of restarts is not predetermined by the user, but is determined based on the sequence of improving points achieved during execution. Notice that DMIHR is a finite algorithm. Even in the worst case when s_k = 1 for each of the restarts, the total number of restarts to satisfy the probability constraint could be large but finite.

One advantage of DMIHR is that it is flexible in the sense that it allows the user to stop sampling at any time, yet is still able to approximate the probability of being close to the optimum at any stopping time. The user can see the tradeoff between computational effort and the desired probability of being close to the optimum. The method potentially saves some sampling when some starting points lead to a better-than-expected acquisition of improving points within a restart. Moreover, it is possible that the predetermined number of function evaluations for each of the restarts may be too few to obtain exactly s improving points if the original MPAS were to be implemented. While DMIHR has a means to determine the number of restarts, the maximum number of function evaluations for a single restart is still arbitrarily set by the user.

Section 3.2 provides a methodology to compute K̄. The dimension n is known from the problem, ε is chosen by the user to reflect the desired level of accuracy, and D (the remaining ingredient for p̄(y_∗ + ε)) can be computed as follows. If the feasible set S is a hyperrectangle S = [x^ℓ, x^u], then its diameter is given by D = ‖x^u − x^ℓ‖_2. If S is a polytope with vertices v^1, . . . , v^N, then D can be computed from

D = max_{(i,j)∈{1,...,N}^2} ‖v^i − v^j‖_2.
If the vertices are unknown or too costly to compute, then the polytope can be embedded in a hyperrectangle to derive a bound on D.
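As a concrete illustration of Steps 3 and 4 above, the following sketch (not part of the original paper; plain Python) evaluates P_ε from the improving-point counts s_1, . . . , s_j recorded so far, using p̄(y_∗ + ε) = (ε/(K̄D))^n, and reports whether the restart loop may stop.

```python
import math

def dmihr_stop(improving_counts, eps, K_bar, D, n, alpha):
    """Return (P_eps, stop?) for the counts s_1, ..., s_j observed so far, cf. (9)."""
    p_bar = (eps / (K_bar * D)) ** n          # p_bar(y_* + eps)
    log_term = math.log(1.0 / p_bar)          # = ln((K_bar * D / eps)^n)
    prob_all_miss = 1.0
    for s in improving_counts:
        term, hit = p_bar, p_bar              # truncated sum over i = 0, ..., s
        for i in range(1, s + 1):
            term *= log_term / i
            hit += term
        prob_all_miss *= 1.0 - hit
    P_eps = 1.0 - prob_all_miss
    return P_eps, P_eps >= 1.0 - alpha

# Hypothetical illustration with the constants derived for Example 1 below
# (K_bar = 14.9263, D = 2.8369, n = 3, eps = alpha = 0.01):
print(dmihr_stop([16, 29, 28], 0.01, 14.9263, 2.8369, 3, 0.01))
```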
3.2 Lipschitz properties of fractional functions

In order to apply DMIHR to general fractional programs, including sums, products, max or min, and absolute values of ratios, we discuss the Lipschitz property of the objective function. It is well known that the Lipschitz property is related to differentiability of the objective function through the following lemma.

Lemma 2 Let S be a convex and compact subset of R^n, and let f be continuously differentiable on an open set containing S. Then f is Lipschitzian on S with the constant

K = max{‖∇f(x)‖: x ∈ S}.   (10)

Proof See Horst et al. (2002, Proposition 5.1).
In the next lemma we recall some useful properties of Lipschitz functions.

Lemma 3 Let f_i (i = 1, . . . , p) be Lipschitz functions on a compact set S ⊆ R^n with Lipschitz constants K_i, and let γ ∈ R. Then the following functions also fulfill the Lipschitz condition on S:
(a) γ f_i, with Lipschitz constant equal to |γ|K_i;
(b) Σ_{i=1}^{p} f_i, with Lipschitz constant less than or equal to Σ_{i=1}^{p} K_i;
(c) both min_{i=1,...,p} f_i and max_{i=1,...,p} f_i, with Lipschitz constant less than or equal to max_{i=1,...,p} K_i;
(d) |f_i|, with Lipschitz constant equal to K_i;
(e) f_1 · f_2, with Lipschitz constant less than or equal to M_2 K_1 + M_1 K_2, where M_i stands for M_i = max{|f_i(x)|: x ∈ S} for i = 1, 2;
(f) f_1/f_2, with Lipschitz constant less than or equal to K_1/m_2 + M_1 K_2/(m_2)^2 if f_2 > 0 on S, where m_2 = min{f_2(x): x ∈ S} and M_1 = max{|f_1(x)|: x ∈ S}.

Proof We show here only the proof of (e) and (f). The other statements are straightforward from the definition. In (e), we have

|f_1(x)f_2(x) − f_1(y)f_2(y)| = |f_1(x)f_2(x) − f_1(y)f_2(x) + f_1(y)f_2(x) − f_1(y)f_2(y)|
  ≤ |[f_1(x) − f_1(y)]f_2(x)| + |f_1(y)[f_2(x) − f_2(y)]|
  ≤ |f_1(x) − f_1(y)||f_2(x)| + |f_1(y)||f_2(x) − f_2(y)|
  ≤ (M_2 K_1 + M_1 K_2)‖x − y‖.

To show (f), observe that

|f_1(x)/f_2(x) − f_1(y)/f_2(y)| = |f_1(x)f_2(y) − f_1(y)f_2(x)| / (f_2(x)f_2(y))
  ≤ (1/(f_2(x)f_2(y))) |f_1(x)f_2(y) − f_1(y)f_2(y) + f_1(y)f_2(y) − f_1(y)f_2(x)|
  ≤ (1/(f_2(x)f_2(y))) (|f_1(x) − f_1(y)||f_2(y)| + |f_1(y)||f_2(y) − f_2(x)|)
  ≤ (K_1/m_2 + M_1 K_2/(m_2)^2)‖x − y‖.
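The composition rules of Lemma 3 are easy to encode; the small helpers below (an illustration added here, not part of the original text) return the corresponding bounds, given the constants K_i and, where needed, the bounds M_i = max |f_i| and m_2 = min f_2 > 0.

```python
def lip_scale(gamma, K):          # Lemma 3(a): gamma * f
    return abs(gamma) * K

def lip_sum(Ks):                  # Lemma 3(b): f_1 + ... + f_p
    return sum(Ks)

def lip_min_max(Ks):              # Lemma 3(c): min or max of f_1, ..., f_p
    return max(Ks)

def lip_abs(K):                   # Lemma 3(d): |f|
    return K

def lip_product(K1, K2, M1, M2):  # Lemma 3(e): f_1 * f_2
    return M2 * K1 + M1 * K2

def lip_quotient(K1, K2, M1, m2): # Lemma 3(f): f_1 / f_2 with f_2 >= m_2 > 0
    return K1 / m2 + M1 * K2 / m2 ** 2
```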
For differentiable functions, the above estimate on the Lipschitz constant in part (f) of Lemma 3 can be sharpened. Consider a fractional function of the form f(x) = g(x)/h(x) with g, h differentiable. We also assume that h(x) > 0 on the feasible set. The assumption on h(x) imposes no restriction on generality, since a change of sign in the numerator can accommodate the sign of the denominator and turn the maximization problem into a minimization one. The next theorem shows that a ratio g/h fulfilling the above properties is Lipschitzian, and we derive a bound on the corresponding Lipschitz constant.

To aid the proof, it is convenient to define the following constants:

η̄ := min_{x∈S} h(x)   and   η̄_u := max_{x∈S} h(x),   (11)

the minimum and maximum values of the denominator on the feasible set. Note that, by assumption, we have η̄_u ≥ η̄ > 0. Similarly, it will be necessary to have bounds on the absolute values of the numerator:

ᾱ_g := max_{x∈S} |g(x)| = max{|min_{x∈S} g(x)|, |max_{x∈S} g(x)|},   (12)

and on the partial derivatives of both numerator and denominator:

α_j^g := max_{x∈S} |∂g(x)/∂x_j| = max{|min_{x∈S} ∂g(x)/∂x_j|, |max_{x∈S} ∂g(x)/∂x_j|},   (13)

α_j^h := max_{x∈S} |∂h(x)/∂x_j| = max{|min_{x∈S} ∂h(x)/∂x_j|, |max_{x∈S} ∂h(x)/∂x_j|}.   (14)

Theorem 2 A fractional function f(x) = g(x)/h(x): R^n → R with differentiable functions g(x) and h(x), and h(x) > 0 on a compact convex set S, satisfies the Lipschitz condition on S. Its Lipschitz constant K is bounded by

K ≤ K̄ := ‖M‖_2,   (15)

where the components M_j (j = 1, 2, . . . , n) of M are given by

M_j = (α_j^g η̄_u + α_j^h ᾱ_g) / (η̄)^2,   (16)

and the constants involved are defined in (11–14).

Proof It follows from Lemma 2 that f is Lipschitzian. To estimate the Lipschitz constant given in (10), observe that

∂f(x)/∂x_j = (∂g(x)/∂x_j · h(x) − ∂h(x)/∂x_j · g(x)) / (h(x))^2.
Since

|∂g(x)/∂x_j · h(x) − ∂h(x)/∂x_j · g(x)| ≤ |∂g(x)/∂x_j| h(x) + |∂h(x)/∂x_j| |g(x)| ≤ α_j^g η̄_u + α_j^h ᾱ_g,

we have

|∂f(x)/∂x_j| ≤ (α_j^g η̄_u + α_j^h ᾱ_g) / (η̄)^2 = M_j

on the feasible set, whence

K = max_{x∈S} ‖∇f(x)‖ ≤ ‖M‖,

with M_j defined in (16).
If both numerator and denominator are linear functions, i.e. f(x) = (⟨c, x⟩ + a)/(⟨d, x⟩ + b) with c, d ∈ R^n and a, b ∈ R, then the estimate becomes simpler. If the feasible set is a polytope, then the constants η̄ and η̄_u from (11) and ᾱ_g from (12) are easily computed using standard linear programming techniques (cf. Charnes and Cooper 1962). In this case, the constants α_j^g and α_j^h from (13) and (14) simplify to

α_j^g = |c_j|   and   α_j^h = |d_j|.
Therefore, for linear fractional functions we have the following result:

Corollary 1 A fractional function f(x) = (⟨c, x⟩ + a)/(⟨d, x⟩ + b) which fulfills ⟨d, x⟩ + b > 0 on a compact and convex set S satisfies the Lipschitz condition on S. Its Lipschitz constant K is bounded by

K ≤ K̄ := ‖M‖,

where the components M_j (j = 1, . . . , n) of M are given by

M_j = (|c_j| η̄_u + |d_j| ᾱ_g) / (η̄)^2,

with η̄ and η̄_u from (11) and ᾱ_g as in (12).

Combining Theorem 2 or Corollary 1 with Lemma 3, it is easy to derive bounds on the Lipschitz constants of a broad class of fractional functions, including sums, products, min and max of fractional forms.
4 Numerical results

We applied the DMIHR algorithm to different types of fractional optimization problems taken from the literature. This section presents the numerical results that we obtained. The parameters were chosen to be α = 0.01 and ε = 0.01. We tried different values for the number θ of function evaluations allowed for each restart of IHR. For each value of θ, 100 sets of DMIHR were performed. The starting point was chosen randomly in the feasible set for each of the 100 sets of DMIHR. Since, in contrast to many other approaches, the number of restarts is governed automatically by our algorithm, it is listed as one of the performance criteria. The others are the number of improving points found by the algorithm and the best value found, so that it is possible to compare the quality of our solution with the exact optimal value, which is known in most of our test examples.

4.1 Sum-of-ratio problems

Since the seminal paper by Charnes and Cooper (1962) it is known that maximizing a single ratio of linear functions over a polytope is equivalent to solving an ordinary linear program. The multi-ratio problem remained intractable up to the 1990s, when the sum-of-ratio problem received considerable attention in the fractional programming community, and several solution methods were proposed. Falk and Palocsay (1992) were the first to consider the two-ratio case. Several Branch-and-Bound algorithms were then proposed for the case of more than two fractions, for example by Konno and Abe (1999), Dür et al. (2001), Benson (2002a, 2002b), and Kuno (2002). An interior point method was proposed by Freund and Jarre (2001). Phuong and Tuy (2003) developed an algorithm which exploits monotonicity properties of the objective function and is therefore applicable not only to the sum-of-ratio problem, but to other problem classes as well.

Example 1 In our first example the objective is a sum of two ratios of linear functions. The example is taken from Falk and Palocsay (1992), and was also considered by Phuong and Tuy (2003):
max  (3x1 + x2 − 2x3 + 0.8)/(2x1 − x2 + x3) + (4x1 − 2x2 + x3)/(7x1 + 3x2 − x3)
s.t.  x1 + x2 − x3 ≤ 1,
      −x1 + x2 − x3 ≤ −1,
      12x1 + 5x2 + 12x3 ≤ 34.8,
      12x1 + 12x2 + 7x3 ≤ 29.1,
      −6x1 + x2 + x3 ≤ −4.1,
      x1, x2, x3 ≥ 0.
The optimal solution of this problem is known to be (x1, x2, x3) = (1, 0, 0) with an objective value of 2.4714.

To compute K̄, we need K̄_1 and K̄_2 for the two ratios. For the first ratio, we have the constants η̄ = 1.7286, η̄_u = 4.7, and ᾱ_g = 4.7. Using Corollary 1, we thus arrive at K̄_1 = 9.6962. For the second ratio, the respective values are η̄ = 4.8286, η̄_u = 12.4, and ᾱ_g = 8.5, which yields K̄_2 = 5.2301. Hence, the Lipschitz constant bound of the objective is K̄ = K̄_1 + K̄_2 = 14.9263.
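These constants come from small linear programs over the feasible polytope. The following sketch (not part of the paper; it assumes scipy and numpy are available) illustrates the computation of K̄_1, K̄_2 and K̄ = K̄_1 + K̄_2 for Example 1 via Corollary 1 and Lemma 3(b).

```python
import numpy as np
from scipy.optimize import linprog

# Feasible set of Example 1: A x <= b together with x >= 0
# (linprog uses bounds (0, None) for each variable by default).
A = np.array([[ 1.,  1., -1.],
              [-1.,  1., -1.],
              [12.,  5., 12.],
              [12., 12.,  7.],
              [-6.,  1.,  1.]])
b = np.array([1., -1., 34.8, 29.1, -4.1])

def lp_range(coef, const=0.0):
    """Minimum and maximum of <coef, x> + const over the feasible polytope."""
    lo = linprog(coef, A_ub=A, b_ub=b).fun + const
    hi = -linprog(-np.asarray(coef), A_ub=A, b_ub=b).fun + const
    return lo, hi

def lipschitz_bound(c, a, d, e):
    """Corollary 1 bound K_bar for f(x) = (<c,x> + a)/(<d,x> + e), with <d,x> + e > 0 on S."""
    eta, eta_u = lp_range(d, e)                 # denominator bounds, cf. (11)
    g_lo, g_hi = lp_range(c, a)
    alpha_g = max(abs(g_lo), abs(g_hi))         # numerator bound, cf. (12)
    M = (np.abs(c) * eta_u + np.abs(d) * alpha_g) / eta ** 2
    return np.linalg.norm(M)                    # K_bar = ||M||_2

K1 = lipschitz_bound(np.array([3., 1., -2.]), 0.8, np.array([2., -1., 1.]), 0.0)
K2 = lipschitz_bound(np.array([4., -2., 1.]), 0.0, np.array([7., 3., -1.]), 0.0)
print(K1, K2, K1 + K2)   # should reproduce roughly 9.6962, 5.2301 and 14.9263
```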
Table 1  Results for Example 1

θ      Number of restarts       Number of improving        Best value found
                                points per restart
       Min    Average  Max      Min    Average  Max        Min      Average  Max
50     27     61.98    91       4      15.84    28         2.4375   2.4552   2.4697
60     12     24.94    40       9      18.44    29         2.4038   2.4536   2.4706
70     5      11.76    21       10     21.13    34         2.4131   2.4523   2.4695
80     3      6.69     12       11     23.85    36         2.3916   2.4498   2.4698
90     1      4.29     8        15     26.63    39         2.2838   2.4524   2.4700
100    1      3.15     6        16     29.01    40         2.3598   2.4507   2.4713
150    1      1.16     2        30     42.43    60         2.0070   2.4354   2.4712
200    1      1.00     1        39     55.64    71         2.1315   2.4239   2.4713
Fig. 1 Behavior of DMIHR for Example 1
The feasible set is contained in the box B = [0, 1.9] × [0, 0.91] × [0, 1.9] which gives us the bound D = 2.8369 for the diameter. Table 1 shows the results of different runs of DMIHR for this problem, whereas Fig. 1 shows the behavior of the algorithm as θ varies. As is to be expected, the number of restarts highly depends on the value θ of function evaluations performed in a single restart of IHR. For small values of θ , the algorithm finds only a small number of improving points in each restart. Therefore, the total number of restarts required to satisfy the probability constraint has to be large. In contrast, for large values of θ , the number of improving points found in a restart of IHR is sufficiently large, and restarts become more and more unnecessary. For large enough θ , new restarts are no longer required, and DMIHR behaves like ordinary IHR. It is interesting to note that the number of improving points obtained in an IHR run seems to be a linear function of θ . Example 2 In this example, the objective is a sum of three ratios of quadratic functions. This problem is taken from Benson (2002a). Using the three ratios
r1(x1, x2, x3, x4) = (−x1² + 16x1 − x2² + 16x2 − x3² + 16x3 − x4² + 16x4 − 214) / (2x1 − x2 − x3 + x4 + 2),

r2(x1, x2, x3, x4) = (−x1² + 16x1 − 2x2² + 20x2 − 3x3² + 60x3 − 4x4² + 56x4 − 586) / (−x1 + x2 + x3 − x4 + 10),

r3(x1, x2, x3, x4) = (−x1² + 20x1 − x2² + 20x2 − x3² + 20x3 − x4² + 20x4 − 324) / (x1² − 4x4),

the maximization problem is

max  Σ_{i=1}^{3} ri(x1, x2, x3, x4)
s.t. 6 ≤ x1 ≤ 10, 4 ≤ x2 ≤ 6, 8 ≤ x3 ≤ 12, 6 ≤ x4 ≤ 8, x1 + x2 + x3 + x4 ≤ 34.
The optimal solution is (x1, x2, x3, x4) = (6.00, 6.00, 10.06, 8.00) with an objective function value of 16.17.

Following the methodology outlined in Sect. 3.2, we get the following bounds for the Lipschitz constants of the three ratios: K̄_1 = 40.0388 for r1, K̄_2 = 19.9706 for r2, and K̄_3 = 60.9571 for r3. In total, the Lipschitz constant bound of this objective is K̄ = K̄_1 + K̄_2 + K̄_3 = 120.9665. It is easy to obtain D = 5.2915.

The results from 100 sets of DMIHR are listed in Table 2. Figure 2 shows the plots of the number of restarts and the number of improving points, respectively, with varying θ. We observe that the number of restarts decreases to one very quickly as θ increases. The maximum number of improving points per restart is always equal to θ. This suggests that IHR rarely gets stalled in a local optimum, hence one restart is efficient. Similar to Example 1, the number of improving points seems to be linear in θ. For this example the algorithm seems to obtain better objective values when a larger θ is used.
Table 2  Results for Example 2

θ      Number of restarts       Number of improving        Best value found
                                points per restart
       Min    Average  Max      Min    Average  Max        Min       Average   Max
50     3      4.77     8        15     45.35    50         11.2456   14.0138   15.7699
60     1      2.00     4        33     55.14    60         9.6651    13.4231   15.7194
70     1      1.14     2        40     65.20    70         9.8270    12.8807   15.7036
80     1      1.04     2        49     74.38    80         9.7991    13.3481   15.8799
90     1      1.01     2        40     84.57    90         10.3190   13.7232   15.8481
100    1      1.00     1        70     94.50    100        10.2351   13.8097   15.9685
150    1      1.00     1        94     142.91   150        10.7655   14.7013   16.0408
250    1      1.00     1        216    243.06   250        12.0561   15.3356   16.1221
500    1      1.00     1        450    492.31   500        14.0531   15.5885   16.1594
Fig. 2 Behavior of DMIHR for Example 2
4.2 Multiplicative problems

There has been a large amount of research on general multiplicative programming without the additional fractional structure, see for example Konno and Kuno (1995). For fractional multiplicative programs, an algorithm has been proposed by Konno and Abe (1999). The general monotonic programming approach by Phuong and Tuy (2003) may also be applied to these problems.

Example 3 We consider an objective function which is the product of two ratios of linear functions. The example is taken from Phuong and Tuy (2003):

max  (3x1 + x2 − 2x3 + 0.8)/(2x1 − x2 + x3) · (4x1 − 2x2 + x3)/(7x1 + 3x2 − x3)
s.t.  x1 + x2 − x3 ≤ 1,
      −x1 + x2 − x3 ≤ −1,
      12x1 + 5x2 + 12x3 ≤ 34.8,
      12x1 + 12x2 + 7x3 ≤ 29.1,
      −6x1 + x2 + x3 ≤ −4.1,
      x1, x2, x3 ≥ 0.

The optimal solution is (x1, x2, x3) = (1, 0, 0) with an objective function value of 1.0857. The two ratios are the same as in Example 1, so we know that K̄_1 = 9.6962 and K̄_2 = 5.2301. In order to use Lemma 3(e), we compute the bounds on both ratios over the feasible set via the Charnes–Cooper transformation from Charnes and Cooper (1962):

M1 = max_{x∈S} |r1(x)| = max{|min_{x∈S} r1(x)|, |max_{x∈S} r1(x)|} = max{|0|, |1.9|} = 1.9,

M2 = max_{x∈S} |r2(x)| = max{|min_{x∈S} r2(x)|, |max_{x∈S} r2(x)|} = max{|0.3513|, |1.1569|} = 1.1569.
39
Table 3 Results for Example 3 Number of restarts
θ
Number of improving
Best value found
points per restart Min
Average
Max
Min
Average
Max
Min
Average
Max
50
71
110.97
146
4
14.96
29
1.0720
1.0788
1.0841
60
18
43.60
68
4
17.19
30
1.0695
1.0785
1.0852
70
7
21.92
39
7
19.26
33
1.0642
1.0787
1.0852
80
6
13.41
24
6
21.07
35
1.0679
1.0790
1.0846
90
2
8.72
17
11
23.12
38
1.0657
1.0797
1.0849
100
2
6.36
13
8
24.59
42
1.0612
1.0801
1.0851
150
1
2.32
5
11
33.44
52
1.0716
1.0823
1.0856
200
1
1.41
4
22
41.49
61
1.0639
1.0829
1.0856
300
1
1.05
2
28
55.59
86
1.0723
1.0847
1.0857
500
1
1.00
1
59
85.18
128
1.0848
1.0857
1.0857
Fig. 3 Behavior of DMIHR for Example 3
With Lemma 3(e), we find K ≤ M2 K̄_1 + M1 K̄_2 = 21.1547. For a bound on the diameter, we proceed as in Example 1 to obtain D = 2.8369. Table 3 and Fig. 3 illustrate the behavior of the algorithm. The figures are similar to those of the previous examples, and can be interpreted analogously.

4.3 Min–max problems

The problem of minimizing the largest of several ratios has received considerable attention. A good reference for this is Schaible (1995).

Example 4 The particular problem we consider here was taken from Phuong and Tuy (2003):

min  max{(3x1 + x2 − 2x3 + 0.8)/(2x1 − x2 + x3), (4x1 − 2x2 + x3)/(7x1 + 3x2 − x3)}
s.t.  x1 + x2 − x3 ≤ 1,
      −x1 + x2 − x3 ≤ −1,
      12x1 + 5x2 + 12x3 ≤ 34.8,
      12x1 + 12x2 + 7x3 ≤ 29.1,
      −6x1 + x2 + x3 ≤ −4.1,
      x1, x2, x3 ≥ 0.

The optimal solution given in Phuong and Tuy (2003) is (x1, x2, x3) = (1.015, 0.590, 1.403), with an objective value of 0.573. Again, the ratios are the same as in Example 1. We use Lemma 3(c) to find K̄ = max{K̄_1, K̄_2} = 9.6962, while the value D = 2.8369 remains the same.

The numbers in Table 4 suggest that this is a harder example, and it is only tractable if θ is chosen large enough. However, since ε was chosen to be 0.01, the precision of DMIHR is only expected to be good to the second decimal place. The number of restarts decreases quickly as θ grows, but as depicted in Fig. 4 we do not observe the linear dependence of the number of improving points on θ.

4.4 Problems involving absolute values

Chadha (2002) introduced an algorithm to maximize a ratio of functions involving absolute values subject to linear constraints. He showed that under certain assumptions a simplex-type algorithm (i.e. an algorithm that searches adjacent extremal points) can solve this type of problem. In the absence of these assumptions, however, the algorithm fails.
Table 4  Results for Example 4

θ       Number of restarts       Number of improving        Best value found
                                 points per restart
        Min    Average  Max      Min    Average  Max        Min      Average  Max
90      53     130.17   186      1      12.27    28         0.5734   0.5778   0.5849
100     35     95.59    146      2      12.91    28         0.5739   0.5784   0.5840
110     23     73.42    126      2      13.55    28         0.5742   0.5780   0.5829
120     32     61.03    89       2      14.08    31         0.5737   0.5782   0.5841
130     22     47.28    82       2      14.65    31         0.5742   0.5776   0.5812
150     13     33.91    53       3      15.65    29         0.5743   0.5779   0.5829
200     4      17.34    32       3      17.83    34         0.5742   0.5778   0.5836
250     4      10.98    20       6      19.53    35         0.5744   0.5776   0.5868
300     1      8.34     18       8      20.73    37         0.5738   0.5771   0.5871
500     1      4.02     8        11     25.32    41         0.5738   0.5766   0.5866
1000    1      2.21     5        14     30.77    48         0.5733   0.5749   0.5786
5000    1      1.11     3        22     44.54    63         0.5731   0.5735   0.5745
10000   1      1.04     2        31     49.96    80         0.5731   0.5733   0.5738
15000   1      1.04     2        31     53.83    85         0.5731   0.5732   0.5735
25000   1      1.00     1        36     56.60    86         0.5731   0.5732   0.5733
Fig. 4 Behavior of DMIHR for Example 4
Fig. 5 Behavior of DMIHR for Example 5
Example 5 Our example is taken from Chadha (2002) and serves to illustrate an instance where his simplex-type algorithm fails to detect the solution. DMIHR easily solves the problem with high precision. The problem is

max  (|x1| + |x2| + 8)/(−|x1| − |x2| + 4)
s.t.  x1 + x2 ≤ 2,
      −x1 − x2 ≤ 1,
      x1 − 2x2 ≤ 2,
      −2x1 + x2 ≤ 2.
The optimum value of this problem is 5, which is attained at all feasible points fulfilling |x1| + |x2| = 2, e.g. at (x1, x2) = (2, 0) and (x1, x2) = (0, 2). The feasible polytope has the vertices v^1 = (2, 0), v^2 = (0, 2), v^3 = (−1, 0), and v^4 = (0, −1), which yields D = 3. A bound on the Lipschitz constant can be computed using Lemma 3(f), which yields K̄ = 6 (numerator and denominator each have Lipschitz constant at most 2 by Lemma 3(b) and (d), and with m2 = 2 and M1 = 10, part (f) gives 2/2 + 10 · 2/2² = 6).
Table 5  Results for Example 5

θ      Number of restarts       Number of improving        Best value found
                                points per restart
       Min    Average  Max      Min    Average  Max        Min      Average  Max
40     1      1.51     6        2      27.29    40         4.7037   4.9470   4.9997
50     1      1.49     4        3      30.10    50         4.6641   4.9573   4.9996
60     1      1.39     3        3      35.83    60         4.4244   4.9489   4.9995
70     1      1.16     3        3      44.78    70         4.8447   4.9739   4.9998
80     1      1.16     3        4      47.22    80         4.7521   4.9682   4.9998
90     1      1.12     3        5      57.29    90         4.8812   4.9774   4.9999
100    1      1.07     3        7      60.97    100        4.8112   4.9787   4.9999
150    1      1.13     3        6      88.46    148        4.9321   4.9885   4.9999
200    1      1.04     2        3      113.54   197        4.9622   4.9920   4.9998
250    1      1.00     1        28     142.95   246        4.9614   4.9946   4.9998
Because of the low dimensionality, already very small values of θ yield very good objective values, cf. Table 5. The algorithm finds sufficiently many improving points of good quality with small θ values, and we again observe a linear dependence of the number of improving points on θ, as shown in Fig. 5.

4.5 Problems with higher dimensions

The previous low-dimensional examples were taken from the literature. We next present an example with twenty variables to illustrate the performance of our algorithm in higher dimensions. To our knowledge, no existing methods have been applied to fractional problems in twenty dimensions, possibly due to time and memory limitations.

Example 6 Consider the following functions with 20 variables:

r1(x1, . . . , x20) = (Σ_{i=1}^{20} i·xi) / (Σ_{i=1}^{20} xi),
r2(x1, . . . , x20) = (Σ_{i=1}^{10} i·x_{2i}) / (Σ_{i=1}^{10} x_{2i−1}),
r3(x1, . . . , x20) = (Σ_{i=1}^{10} x_{2i−1}) / (Σ_{i=1}^{10} i·x_{2i}),

and the maximization problem

max  Σ_{i=1}^{3} ri(x1, . . . , x20)
s.t. 1 ≤ xi ≤ 5,  i = 1, . . . , 20.

We have D = 17.885. For the Lipschitz constants of the three ratios we get K̄_1 = 24.0245, K̄_2 = 13.1101, and K̄_3 = 0.4334.
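For reference, the objective of Example 6 can be written down directly; the following numpy sketch (not from the paper) evaluates the sum of the three ratios at a point in the box [1, 5]^20 and could be handed, with a sign change, to a minimization routine such as the IHR sketch of Sect. 2.3.

```python
import numpy as np

def example6_objective(x):
    """r1 + r2 + r3 of Example 6; the paper's 1-based indices map to 0-based numpy slices."""
    x = np.asarray(x, float)
    w20 = np.arange(1, 21)            # weights 1, ..., 20
    w10 = np.arange(1, 11)            # weights 1, ..., 10
    odd = x[0::2]                     # x_1, x_3, ..., x_19
    even = x[1::2]                    # x_2, x_4, ..., x_20
    r1 = np.dot(w20, x) / np.sum(x)
    r2 = np.dot(w10, even) / np.sum(odd)
    r3 = np.sum(odd) / np.dot(w10, even)
    return r1 + r2 + r3

# For maximization with an algorithm written for minimization:
neg_obj = lambda x: -example6_objective(x)
print(example6_objective(np.full(20, 1.0)))   # value at the corner x = (1, ..., 1)
```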
Table 6  Results for Example 6

θ       Number of restarts       Number of improving        Best value found
                                 points per restart
        Min    Average  Max      Min    Average  Max        Min       Average   Max
500     11     35.17    55       152    198.63   252        29.9427   33.0959   36.0323
600     1      2.78     5        199    237.46   281        24.8406   30.4769   35.7557
700     1      1.07     2        247    276.18   319        23.0598   29.2734   34.6261
800     1      1.00     1        280    315.02   375        22.4040   28.9303   34.9401
900     1      1.00     1        310    354.02   397        23.4054   29.1442   35.1009
1000    1      1.00     1        352    390.95   452        22.8878   29.9385   37.3206
Fig. 6 Behavior of DMIHR for Example 6
Table 6 shows the results of our numerical experiments with this problem. Unfortunately, the true solution is unknown (unlike in the previous examples which were taken from the literature), so we cannot compare our solution to the “true” optimal solution. Figure 6 shows that the number of restarts decreases quickly as θ grows, and again we see that the number of improving points seems to depend linearly on θ .
5 Conclusions

We have introduced an algorithm called Dynamic Multistart Improving Hit-and-Run (DMIHR) and applied it to the class of fractional optimization problems. DMIHR combines IHR, a well-established stochastic search algorithm, with restarts. The development of this algorithm is based on a theoretical analysis of Multistart Pure Adaptive Search, which relies on the Lipschitz constant of the optimization problem. We presented a method to compute bounds on the Lipschitz constants of fractional functions, and then applied DMIHR to various types of fractional optimization problems. In contrast to the existing methods, which are problem-type specific, our method provides a unified approach to solve general fractional optimization problems.
References

Benson, H. P. (2002a). Using concave envelopes to globally solve the nonlinear sum of ratios problem. Journal of Global Optimization, 22, 343–367.
Benson, H. P. (2002b). Global optimization algorithm for the nonlinear sum of ratios problem. Journal of Optimization Theory and Applications, 112, 1–29.
Cambini, A., Martein, L., & Schaible, S. (1989). On maximizing a sum of ratios. Journal of Information and Optimization Sciences, 10, 65–79.
Chadha, S. S. (2002). Fractional programming with absolute-value functions. European Journal of Operational Research, 141, 233–238.
Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9, 181–186.
Dür, M., Horst, R., & Thoai, N. V. (2001). Solving sum-of-ratios fractional programs using efficient points. Optimization, 49, 447–466.
Falk, J. E., & Palocsay, S. W. (1992). Optimizing the sum of linear fractional functions. In C. Floudas & P. M. Pardalos (Eds.), Recent advances in global optimization (pp. 221–258). Princeton: Princeton University Press.
Freund, R. W., & Jarre, F. (2001). Solving the sum-of-ratios problem by an interior point method. Journal of Global Optimization, 19, 83–102.
Horst, R., Pardalos, P. M., & Thoai, N. V. (2002). Introduction to global optimization (2nd ed.). Dordrecht: Kluwer Academic.
Khompatraporn, C. (2004). Analysis and development of stopping criteria for stochastic global optimization algorithms. Ph.D. Dissertation, University of Washington, Seattle, WA.
Konno, H., & Abe, N. (1999). Minimization of the sum of three linear fractional functions. Journal of Global Optimization, 15, 419–432.
Konno, H., & Kuno, T. (1995). Multiplicative programming problems. In R. Horst & P. M. Pardalos (Eds.), Handbook of global optimization (pp. 369–406). Dordrecht: Kluwer Academic.
Kuno, T. (2002). A branch-and-bound algorithm for maximizing the sum of several linear ratios. Journal of Global Optimization, 22, 155–174.
Phuong, N. T. H., & Tuy, H. (2003). A unified monotonic approach to generalized linear fractional programming. Journal of Global Optimization, 26, 229–259.
Schaible, S. (1995). Fractional programming. In R. Horst & P. M. Pardalos (Eds.), Handbook of global optimization (pp. 495–608). Dordrecht: Kluwer Academic.
Schaible, S., & Shi, J. (2003). Fractional programming: the sum-of-ratios case. Optimization Methods and Software, 18, 219–229.
Smith, R. L. (1984). Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Operations Research, 32, 1296–1308.
Zabinsky, Z. B. (2003). Stochastic adaptive search for global optimization. Dordrecht: Kluwer Academic.
Zabinsky, Z. B., & Smith, R. L. (1992). Pure adaptive search in global optimization. Mathematical Programming, 53, 323–338.
Zabinsky, Z. B., Smith, R. L., McDonald, J. F., Romeijn, H. E., & Kaufman, D. E. (1993). Improving hit-and-run for global optimization. Journal of Global Optimization, 3, 171–192.