Ann Oper Res (2007) 156: 25–44 DOI 10.1007/s10479-007-0232-y

Solving fractional problems with dynamic multistart improving hit-and-run

Mirjam Dür · Charoenchai Khompatraporn · Zelda B. Zabinsky

Published online: 4 August 2007 © Springer Science+Business Media, LLC 2007

Abstract Fractional programming has numerous applications in economics and engineering. While some fractional problems are easy in the sense that they are equivalent to an ordinary linear program, other problems, like maximizing a sum or product of several ratios, are known to be hard, as these functions are highly nonconvex and multimodal. In contrast to the standard Branch-and-Bound type algorithms proposed for specific types of fractional problems, we treat general fractional problems with stochastic algorithms developed for multimodal global optimization. Specifically, we propose Improving Hit-and-Run with restarts, based on a theoretical analysis of Multistart Pure Adaptive Search (cf. the dissertation of Khompatraporn (2004)), which prescribes a way to utilize problem-specific information to sample until a certain level α of confidence is achieved. For this purpose, we analyze the Lipschitz properties of fractional functions, and then utilize a unified method to solve general fractional problems. The paper ends with a report on numerical experiments.

Keywords Fractional programming · Stochastic algorithms · Global optimization · Improving hit-and-run · Lipschitz properties · Multistart · Pure adaptive search

This work was initiated while Mirjam Dür was spending a three-month research visit at the University of Washington. She would like to thank the Fulbright Commission for financial support and the optimization group at UW for their warm hospitality. The work of C. Khompatraporn and Z.B. Zabinsky was partially supported by the NSF grant DMI-0244286.

M. Dür, Department of Mathematics, Darmstadt University of Technology, 64289 Darmstadt, Germany, e-mail: [email protected]

C. Khompatraporn, Department of Production Engineering, King Mongkut's University of Technology Thonburi, 91 Pracha-Utit Rd., Thungkru, Bangkok, 10140, Thailand, e-mail: [email protected]

Z.B. Zabinsky, Industrial Engineering, University of Washington, Box 352650, Seattle, WA 98195, USA, e-mail: [email protected]


1 Introduction and motivation

In this paper, we consider optimization problems whose objective function involves ratios of the form g(x)/h(x), where g and h are linear, quadratic, or more general functions. We aim at minimizing various functions of these ratios, for example the sum, the product, or the maximum of several ratios. Optimization problems of this kind arise in many applications in economics and engineering, whenever some kind of efficiency of a system is to be maximized. A good survey on theoretical and application-oriented aspects of fractional programming can be found in Schaible (1995).

Fractional problems have been studied since the 1960s, when Charnes and Cooper (1962) proposed their famous transformation to rewrite the problem of maximizing a ratio of linear functions over linear constraints as an ordinary linear program. It was soon discovered that the sum of two linear ratios is a quasiconcave function, a property that could be algorithmically exploited, cf. Cambini et al. (1989) or Falk and Palocsay (1992). The same is true for a ratio of a concave function divided by a convex function (Freund and Jarre 2001). However, the sum of more than two linear fractions does not enjoy generalized convexity properties and is generally nonconvex with possibly numerous local optima. Several algorithms, such as those in Benson (2002a, 2002b), Dür et al. (2001), and Konno and Abe (1999), have been proposed to maximize such a function over a polytope. A recent survey article on this problem is Schaible and Shi (2003). Other new contributions to the field are Chadha's (2002) simplex-type procedure for fractions of functions involving absolute-value functions and an approach exploiting monotonicity in the objective by Phuong and Tuy (2003), which can also handle products of ratios and max–min problems.

We thus find the following situation: many different types of fractional problems have been studied, and different algorithms have been proposed for each class. Most of the algorithms involve sophisticated techniques like Branch-and-Bound which require a certain effort to implement. At this time there is no unified approach to the different fractional problem types (except for, to a certain extent, Phuong and Tuy 2003).

Our approach is conceptually different from the many existing deterministic methods. We propose to use a stochastic algorithm called Improving Hit-and-Run (Zabinsky et al. 1993) with restarts, which is easy to implement and applicable to all existing classes of fractional problems. The only parameters that need to be specified are the length (number of function evaluations) of a restart and the total number of restarts. As we show, both can be chosen based on a theoretical analysis of Multistart Pure Adaptive Search (Khompatraporn 2004), which yields an α-confidence of being within ε of the global optimum. In a later section, numerical experiments demonstrate that Improving Hit-and-Run with restarts is very reliable in terms of accuracy of the solution.

The remainder of the paper is organized as follows. We first describe the stochastic algorithms Pure Adaptive Search (PAS), Improving Hit-and-Run (IHR), and Multistart Pure Adaptive Search (MPAS) on which our work is based. In Sect. 3, we develop a new algorithm that we term Dynamic Multistart Improving Hit-and-Run (DMIHR). This algorithm utilizes Lipschitz properties of the objective function. We therefore show how to derive Lipschitz constants for fractional problems.
The last section demonstrates the versatility of our approach by means of several examples from the fractional programming literature.


2 Background on stochastic algorithms

In this section, we briefly describe the stochastic algorithms used in this paper. The description and analysis, however, are not restricted to fractional programs, but apply to the general minimization program

\[ \min\ f(x) \quad \text{s.t.}\quad x \in S, \]

where f : R^n → R is a continuous function and S is a compact set in R^n. The solution set to this program, x_* = arg min_{x∈S} f(x), is assumed to be a nonempty subset of S. The corresponding optimal objective function value is denoted by y_* = f(x_*). We also denote the maximal objective function value by y^* = max_{x∈S} f(x).

While a stochastic algorithm cannot guarantee the exact global optimum, it can (under the assumption that the probability of sampling in level sets is known) ensure the quality of the solution by guaranteeing that the probability of ending up with an estimated optimal value ŷ differing by more than ε from the true optimal value y_* is less than a given level α. Hence, we are content if the following is satisfied:

\[ P(\hat{y} \le y_* + \varepsilon) \ge 1 - \alpha, \tag{1} \]

where ŷ denotes the estimate of the optimal objective function value (using the best objective function value found), ε > 0 is a prescribed precision, and α ∈ (0, 1).

2.1 Pure adaptive search (PAS)

Pure Adaptive Search (PAS) is an idealized algorithm, but it has desirable theoretical properties. PAS (see Zabinsky 2003 or Zabinsky and Smith 1992) works as follows:

Pure Adaptive Search
Step 0. Generate a uniformly distributed starting point X_0 ∈ S. Set Y_0 = f(X_0) and a counter k = 0.
Step 1. Generate X_{k+1} according to a uniform distribution on the improving level set S(Y_k) = {x ∈ S : f(x) ≤ Y_k}. Set Y_{k+1} = f(X_{k+1}).
Step 2. If a stopping criterion is met, stop. Otherwise, increment k and return to Step 1.

The crucial point clearly lies in the improving level sets S(y) = {x ∈ S : f(x) ≤ y}. For a general continuous function, these improving level sets are nested, and sampling uniformly from them still poses a challenge. The reason that PAS is nonetheless of interest is the fact that its expected number of iterations to obtain an ε-optimal solution is linear in the problem dimension, holding certain parameters constant (Zabinsky and Smith 1992). This property is desirable for practical purposes.

Let p(y) denote the probability that a point X which is sampled uniformly from S lies in a level set S(y), i.e. p(y) = P(X ∈ S(y)). Then PAS has the following property, which is characterized by p(y).


Theorem 1 (Zabinsky and Smith 1992) Let Y_k^PAS denote the objective function value obtained through PAS in iteration k, and let y_* ≤ y ≤ y^*. Then

\[ P\bigl(Y_k^{\mathrm{PAS}} \le y\bigr) = \sum_{i=0}^{k} \frac{p(y)\,\bigl(\ln(1/p(y))\bigr)^{i}}{i!} \tag{2} \]

for k = 1, 2, . . . , where p(y) = P(X ∈ S(y)).

Proof See Zabinsky and Smith (1992, Theorem 4.3).
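The right-hand side of (2) is straightforward to evaluate numerically. The following Python sketch (our own hypothetical helper, not code from the paper) accumulates the terms recursively so that long runs do not overflow; it is reused below when the restart criterion is evaluated.

```python
import math

def pas_level_probability(p, k):
    """Right-hand side of (2): P(Y_k^PAS <= y) for p = p(y) = P(X in S(y)).
    The terms p * ln(1/p)**i / i! are accumulated recursively so that large k
    and small p do not overflow."""
    rate = math.log(1.0 / p)
    term, total = p, p              # the i = 0 term
    for i in range(1, k + 1):
        term *= rate / i            # term_i = term_{i-1} * rate / i
        total += term
    return total
```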

This property is needed to derive a theoretical property of the next MPAS algorithm as well as for later analysis.

2.2 Multistart pure adaptive search (MPAS)

Multistart Pure Adaptive Search was recently introduced in Khompatraporn (2004). It is an attempt to formulate a strategy of when to stop a run of a stochastic algorithm and how often to restart when the algorithm gets trapped in a local optimum. Like PAS, MPAS is a theoretical algorithm, but we use its analysis to motivate a stopping and restarting strategy for IHR. The MPAS algorithm can be described as follows:

Multistart Pure Adaptive Search
Step 0. Specify the number of independent restarts r ∈ N+ and the PAS sequence length s ∈ N. Set iteration counter i = 1, and let Ŷ_{r,s} = +∞.
Step 1. Generate a uniformly distributed starting point X_{i,0} ∈ S. Set Y_{i,0} = f(X_{i,0}). Set Ŷ_{r,s} = min{Y_{i,0}, Ŷ_{r,s}}. Set j = 0.
Step 2. If j = s, go to Step 4. Otherwise, go to Step 3.
Step 3. Generate X_{i,j+1} ∈ S(Y_{i,j}) = {x ∈ S : f(x) ≤ Y_{i,j}} according to a uniform distribution. Set Y_{i,j+1} = f(X_{i,j+1}) and update Ŷ_{r,s} = min{Y_{i,j+1}, Ŷ_{r,s}}. Increment j and return to Step 2.
Step 4. If i = r, stop. Otherwise, increment i and return to Step 1.

Let Y_{r,s}^MPAS be the best objective function value found by MPAS after r independent restarts, each with s PAS iterations. Utilizing the independence of the restarts and the result from Theorem 1 it can be shown (see Khompatraporn 2004 for more details) that

\[ P\bigl(Y_{r,s}^{\mathrm{MPAS}} \le y\bigr) = 1 - \Bigl(1 - P\bigl(Y_s^{\mathrm{PAS}} \le y\bigr)\Bigr)^{r} = 1 - \left(1 - \sum_{i=0}^{s} \frac{p(y)\,\bigl(\ln(1/p(y))\bigr)^{i}}{i!}\right)^{\!r}. \tag{3} \]

Similarly, if different lengths of the PAS sequences are used, say run k performs s_k PAS iterations where k ∈ {1, 2, . . . , r}, then

\[ P\bigl(Y_{r,\{s_1,s_2,\ldots,s_r\}}^{\mathrm{MPAS}} \le y\bigr) = 1 - \prod_{k=1}^{r} \left(1 - \sum_{i=0}^{s_k} \frac{p(y)\,\bigl(\ln(1/p(y))\bigr)^{i}}{i!}\right). \tag{4} \]
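Neither PAS nor MPAS is implementable in general, because Step 3 requires exact uniform samples from the improving level sets. Purely for intuition, the following sketch realizes that step by rejection sampling, which has the right distribution but becomes prohibitively expensive as the level sets shrink; the function names and the toy problem are our own.

```python
import numpy as np

def pure_adaptive_search(f, sample_S, s, rng):
    """Conceptual PAS run of length s: the uniform draw on the improving
    level set S(Y_k) is realized by rejection sampling, which is exact in
    distribution but slows down dramatically as the level set shrinks."""
    x = sample_S(rng)                      # Step 0: uniform start in S
    y = f(x)
    for _ in range(s):                     # Steps 1-2
        while True:
            cand = sample_S(rng)
            fc = f(cand)
            if fc <= y:                    # uniform on S(Y_k) by rejection
                x, y = cand, fc
                break
    return x, y

def multistart_pas(f, sample_S, r, s, seed=0):
    """MPAS as described above: r independent PAS runs of length s each,
    keeping the best value found."""
    rng = np.random.default_rng(seed)
    best_x, best_y = None, np.inf
    for _ in range(r):
        x, y = pure_adaptive_search(f, sample_S, s, rng)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy illustration: minimize ||x||^2 over the box [-1, 1]^2.
f = lambda x: float(np.sum(x**2))
sample_box = lambda rng: rng.uniform(-1.0, 1.0, size=2)
x_hat, y_hat = multistart_pas(f, sample_box, r=3, s=10)
```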


2.3 Improving hit-and-run (IHR)

Improving Hit-and-Run is an implementable approximation to Pure Adaptive Search. It was first introduced by Zabinsky et al. (1993). A thorough overview of its development since that time can be found in Zabinsky (2003). IHR uses the Hit-and-Run generator introduced by Smith (1984) to approximate uniformly generated points from the improving level sets. The formal description of the algorithm is as follows:

Improving Hit-and-Run
Step 0. Initialize X_0 ∈ S, Y_0 = f(X_0), and set k = 0.
Step 1. Generate a random direction D_k uniformly distributed on the boundary of the unit hypersphere.
Step 2. Generate a candidate point W_{k+1} = X_k + λD_k by sampling uniformly over the line set L_k = {x ∈ S : x = X_k + λD_k, λ a real scalar}. If L_k = ∅, go to Step 1.
Step 3. Update the current point X_{k+1} with the candidate point if it is improving, i.e. set

\[ X_{k+1} = \begin{cases} W_{k+1} & \text{if } f(W_{k+1}) < Y_k, \\ X_k & \text{otherwise,} \end{cases} \]

and set Y_{k+1} = f(X_{k+1}).
Step 4. If a stopping criterion is met, stop. Otherwise, increment k and return to Step 1.

It has been observed that when the approximation of IHR to PAS deteriorates, it is useful to restart IHR. Up to now, however, we have been lacking a theoretical structure to motivate when to restart IHR and the number of restarts to perform. Dynamic Multistart Improving Hit-and-Run (DMIHR), detailed in the next section, is an attempt to address this issue.
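Before turning to DMIHR, a minimal sketch of IHR for a convex feasible set contained in a box is given below. It is our own illustration under simplifying assumptions (the line set L_k is intersected with the box analytically, and membership in S is checked by a user-supplied predicate); it is not claimed to be the implementation used for the experiments in Sect. 4.

```python
import numpy as np

def improving_hit_and_run(f, lower, upper, feasible, max_evals, x0, rng):
    """Sketch of IHR on a convex set S inside the box [lower, upper].
    Step 2 computes the segment of the random line inside the box exactly
    and then rejection-samples a point on it until it lies in S; for convex
    S this keeps the draw uniform on L_k.  Returns the incumbent, its value,
    and the number of improving points found."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = np.asarray(x0, float)
    y = f(x)
    improving = 0
    for _ in range(max_evals):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)                              # Step 1: random direction
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.concatenate(((lower - x) / d, (upper - x) / d))
        t = t[np.isfinite(t)]
        lam_lo, lam_hi = t[t <= 0.0].max(), t[t >= 0.0].min()
        while True:                                         # Step 2: uniform point on L_k
            w = x + rng.uniform(lam_lo, lam_hi) * d
            if feasible(w):
                break
        fw = f(w)
        if fw < y:                                          # Step 3: keep improving points
            x, y, improving = w, fw, improving + 1
    return x, y, improving
```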

3 Main results

We first define DMIHR in Sect. 3.1, and then apply it to general fractional programs. In order to adapt the theoretical analysis of MPAS that motivates DMIHR, we need the Lipschitz properties of fractional functions, which are derived in Sect. 3.2. Then in Sect. 4, we apply DMIHR to several fractional programs of various types to demonstrate the versatility of this new method.

3.1 Dynamic multistart improving hit-and-run

We first discuss how the probability constraint (1) is used to motivate our definition of DMIHR. The probability p(y) in (3) is generally difficult to obtain, but can be bounded when the objective function satisfies the Lipschitz condition. Similarly, the probability constraint described by (4) motivates a practical stopping and restarting strategy using IHR when p(y) can be bounded. Recall that a function f : R^n → R is said to fulfill the Lipschitz condition on S ⊆ R^n if there exists a Lipschitz constant K ≥ 0 such that |f(x) − f(y)| ≤ K‖x − y‖ for all x, y ∈ S.


The next lemma provides a means to bound p(y), knowing only the diameter of the feasible region and the Lipschitz constant for the original problem.

Lemma 1 (Zabinsky and Smith 1992) For a global optimization problem over a convex feasible region S in n dimensions with diameter D and Lipschitz constant K for the objective function, and for y_* ≤ y ≤ y^*,

\[ p(y) \ge \left(\frac{y - y_*}{KD}\right)^{\!n}. \tag{5} \]

Proof See Zabinsky and Smith (1992, Lemma 5.2).

The Lipschitz constant K of the objective function is usually not known, but for fractional functions we can find an upper bound K̄ ≥ K using Theorem 2 in the next section. Substituting K̄ into (5) gives a lower bound p̄(y) on p(y):

\[ p(y) \ge \left(\frac{y - y_*}{KD}\right)^{\!n} \ge \left(\frac{y - y_*}{\bar{K}D}\right)^{\!n} =: \bar{p}(y). \tag{6} \]

Returning to Multistart Pure Adaptive Search, to satisfy the probability constraint of being within ε of the global optimum with 1 − α certainty, we essentially want to satisfy the following probability constraint,

\[ P\bigl(Y_{r,s}^{\mathrm{MPAS}} \le y_* + \varepsilon\bigr) \ge 1 - \left(1 - \sum_{i=0}^{s} \frac{\bar{p}(y_*+\varepsilon)\,\bigl(\ln(1/\bar{p}(y_*+\varepsilon))\bigr)^{i}}{i!}\right)^{\!r} \ge 1 - \alpha. \tag{7} \]

Observe that p̄(y_* + ε) = (ε/K̄D)^n. Hence, it is not necessary to know y_* to determine p̄(y_* + ε). In the case that not every sequence of PAS carried out in MPAS has the same length s, but run k of PAS has a sequence length s_k, we arrive at the constraint

\[ 1 - \prod_{k=1}^{r} \left(1 - \sum_{i=0}^{s_k} \frac{\bar{p}(y_*+\varepsilon)\,\bigl(\ln(1/\bar{p}(y_*+\varepsilon))\bigr)^{i}}{i!}\right) \ge 1 - \alpha. \tag{8} \]
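The left-hand side of (8) can be evaluated directly from p̄(y_* + ε) and the observed run lengths. A minimal sketch (a hypothetical helper of ours, reusing pas_level_probability from the sketch after Theorem 1):

```python
def multistart_confidence(p_bar, run_lengths):
    """Left-hand side of (8): a lower bound on the probability that the best
    value over all restarts lies within eps of the optimum, where p_bar is
    p̄(y* + eps) = (eps / (K̄ D))**n and run_lengths holds s_1, ..., s_r."""
    prod = 1.0
    for s in run_lengths:
        prod *= 1.0 - pas_level_probability(p_bar, s)
    return 1.0 - prod
```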

We imitate PAS behavior by supposing that the improving points obtained in a sequence of IHR are sampled approximately uniformly on the improving level sets, and we treat the number of improving points in restart k as an approximation to the number of PAS iterations s_k. Then we determine the number of restarts of IHR so that the probability constraint (8) is satisfied. In this way the theory of MPAS motivates the algorithm DMIHR as formally described below.

Dynamic Multistart Improving Hit-and-Run
Step 0. Select α, ε, and the maximum number θ of function evaluations for a single run of IHR. Obtain the parameters K̄, D, n of the problem. Set j = 1.
Step 1. Execute θ iterations of IHR. Record the number of improving points s_j, as well as the best objective function value of run j, Ŷ_j = f(X̂_j), and its associated solution X̂_j.
Step 2. Update the best objective function value Ŷ^DMIHR found so far by

\[ \hat{Y}^{\mathrm{DMIHR}} = \min_{i=1,2,\ldots,j} \hat{Y}_i \quad \text{and} \quad \hat{X}^{\mathrm{DMIHR}} = \arg\min_{i=1,2,\ldots,j} f(\hat{X}_i). \]

Step 3. Calculate the probability bound P_ε of being close to the ε-optimal region from

\[ P_\varepsilon = 1 - \prod_{k=1}^{j} \left(1 - \sum_{i=0}^{s_k} \frac{(\varepsilon/\bar{K}D)^{n}\,\bigl(\ln((\bar{K}D/\varepsilon)^{n})\bigr)^{i}}{i!}\right). \]

Step 4. If P_ε ≥ 1 − α, stop. Otherwise, increment j and go to Step 1.

Note that in practice s_k is likely to be different for each restart. The total number of restarts in DMIHR is not predetermined at the beginning of the algorithm, but is the smallest j for which

\[ 1 - \prod_{k=1}^{j} \left(1 - \sum_{i=0}^{s_k} \frac{\bar{p}(y_*+\varepsilon)\,\bigl(\ln(1/\bar{p}(y_*+\varepsilon))\bigr)^{i}}{i!}\right) \ge 1 - \alpha, \tag{9} \]

hence the name dynamic MIHR. The total number of restarts is not predetermined by the user, but is determined based on the sequence of improving points achieved during execution. Notice that DMIHR is a finite algorithm: even in the worst case when s_k = 1 for each of the restarts, the total number of restarts needed to satisfy the probability constraint may be large but is finite.

One advantage of DMIHR is that it is flexible in the sense that it allows the user to stop sampling at any time, and yet it is able to approximate the probability of being close to the optimum at any stopping time. The user can see the tradeoff between computational effort and the desired probability of being close to the optimum. The method potentially saves some sampling when some starting points lead to a better-than-expected acquisition of improving points within a restart. Moreover, it is possible that the predetermined number of function evaluations for each of the restarts is too few to obtain exactly s improving points if the original MPAS were to be implemented. While DMIHR has a means to determine the number of restarts, the maximum number of function evaluations for a single restart is still arbitrarily set by the user.

Section 3.2 provides a methodology to compute K̄. The dimension n is known from the problem, ε is chosen by the user to reflect the desired level of accuracy, and D (the remaining ingredient for p̄(y_* + ε)) can be computed as follows. If the feasible set S is a hyperrectangle S = [x^ℓ, x^u], then its diameter is given by D = ‖x^u − x^ℓ‖_2. If S is a polytope with vertices v^1, . . . , v^N, then D can be computed from

\[ D = \max_{(i,j)\in\{1,\ldots,N\}^{2}} \|v^{i} - v^{j}\|_{2}. \]

If the vertices are unknown or too costly to compute, then the polytope can be embedded in a hyperrectangle to derive a bound on D.
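Putting these pieces together, the following sketch drives the IHR sketch from Sect. 2.3 with the stopping rule of Steps 3–4. It is an illustration only: improving_hit_and_run and multistart_confidence are the hypothetical helpers introduced earlier, D is bounded by the diameter of the enclosing box (for a polytope one could instead use the vertex formula above), and no claim is made that this matches the implementation used in Sect. 4.

```python
import numpy as np

def dmihr(f, lower, upper, feasible, K_bar, eps=0.01, alpha=0.01, theta=100, seed=0):
    """Sketch of DMIHR: repeat theta-evaluation IHR runs from fresh uniform
    starting points, record the number of improving points of each run, and
    stop once the Step 3 bound reaches 1 - alpha."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n = lower.size
    D = float(np.linalg.norm(upper - lower))     # diameter of the enclosing box
    p_bar = (eps / (K_bar * D)) ** n             # p̄(y* + eps) from (6)
    run_lengths, best_x, best_y = [], None, np.inf
    while True:
        x0 = rng.uniform(lower, upper)           # Step 1: fresh uniform start in S
        while not feasible(x0):
            x0 = rng.uniform(lower, upper)
        x, y, s_j = improving_hit_and_run(f, lower, upper, feasible, theta, x0, rng)
        run_lengths.append(s_j)
        if y < best_y:                           # Step 2: update the incumbent
            best_x, best_y = x, y
        if multistart_confidence(p_bar, run_lengths) >= 1.0 - alpha:   # Steps 3-4
            return best_x, best_y, len(run_lengths)
```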


3.2 Lipschitz properties of fractional functions

In order to apply DMIHR to general fractional programs, including sums, products, max or min, and absolute values of ratios, we discuss the Lipschitz property of the objective function. It is well known that the Lipschitz property is related to differentiability of the objective function through the following lemma.

Lemma 2 Let S be a convex and compact subset of R^n, and let f be continuously differentiable on an open set containing S. Then f is Lipschitzian on S with the constant

\[ K = \max\{\|\nabla f(x)\| : x \in S\}. \tag{10} \]

Proof See Horst et al. (2002, Proposition 5.1).

In the next lemma we recall some useful properties of Lipschitz functions.

Lemma 3 Let f_i (i = 1, . . . , p) be Lipschitz functions on a compact set S ⊆ R^n with Lipschitz constants K_i, and let γ ∈ R. Then the following functions also fulfill the Lipschitz condition on S:
(a) γf_i, with Lipschitz constant equal to |γ|K_i;
(b) ∑_{i=1}^{p} f_i, with Lipschitz constant less than or equal to ∑_{i=1}^{p} K_i;
(c) both min_{i=1,...,p} f_i and max_{i=1,...,p} f_i, with Lipschitz constant less than or equal to max_{i=1,...,p} K_i;
(d) |f_i|, with Lipschitz constant equal to K_i;
(e) f_1 · f_2, with Lipschitz constant less than or equal to M_2 K_1 + M_1 K_2, where M_i stands for M_i = max{|f_i(x)| : x ∈ S} for i = 1, 2;
(f) f_1/f_2, with Lipschitz constant less than or equal to K_1/m_2 + M_1 K_2/(m_2)^2 if f_2 > 0 on S, where m_2 = min{f_2(x) : x ∈ S} and M_1 = max{|f_1(x)| : x ∈ S}.

Proof We show here only the proofs of (e) and (f). The other statements are straightforward from the definition. In (e), we have

\[ \begin{aligned} |f_1(x)f_2(x) - f_1(y)f_2(y)| &= |f_1(x)f_2(x) - f_1(y)f_2(x) + f_1(y)f_2(x) - f_1(y)f_2(y)| \\ &\le |[f_1(x) - f_1(y)]f_2(x)| + |f_1(y)[f_2(x) - f_2(y)]| \\ &\le |f_1(x) - f_1(y)|\,|f_2(x)| + |f_1(y)|\,|f_2(x) - f_2(y)| \\ &\le (M_2 K_1 + M_1 K_2)\,\|x - y\|. \end{aligned} \]

To show (f), observe that

\[ \begin{aligned} \left|\frac{f_1(x)}{f_2(x)} - \frac{f_1(y)}{f_2(y)}\right| &= \left|\frac{f_1(x)f_2(y) - f_1(y)f_2(x)}{f_2(x)f_2(y)}\right| \\ &\le \frac{1}{f_2(x)f_2(y)}\,|f_1(x)f_2(y) - f_1(y)f_2(y) + f_1(y)f_2(y) - f_1(y)f_2(x)| \\ &\le \frac{1}{f_2(x)f_2(y)}\bigl(|f_1(x) - f_1(y)|\,|f_2(y)| + |f_1(y)|\,|f_2(y) - f_2(x)|\bigr) \\ &\le \left(\frac{K_1}{m_2} + \frac{M_1 K_2}{(m_2)^2}\right)\|x - y\|. \end{aligned} \]

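These combination rules are mechanical to apply. A small sketch with hypothetical helper names, assuming the individual constants K_i and the bounds M_i and m_2 are already available:

```python
def lipschitz_scaled(gamma, K):          # Lemma 3(a): gamma * f_i
    return abs(gamma) * K

def lipschitz_sum(Ks):                   # Lemma 3(b): f_1 + ... + f_p
    return sum(Ks)

def lipschitz_minmax(Ks):                # Lemma 3(c): min or max of the f_i
    return max(Ks)

def lipschitz_product(K1, K2, M1, M2):   # Lemma 3(e): f_1 * f_2, M_i = max |f_i|
    return M2 * K1 + M1 * K2

def lipschitz_ratio(K1, K2, M1, m2):     # Lemma 3(f): f_1 / f_2, m_2 = min f_2 > 0
    return K1 / m2 + M1 * K2 / m2**2
```

For instance, the bounds used in Sect. 4 for the sum-of-ratios, product, and min–max examples correspond to lipschitz_sum, lipschitz_product, and lipschitz_minmax, respectively.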

For differentiable functions, the above estimate on the Lipschitz constant in part (f) of Lemma 3 can be sharpened. Consider a fractional function of the form f(x) = g(x)/h(x) with g, h differentiable. We also assume that h(x) > 0 on the feasible set. The assumption on h(x) imposes no restriction on generality, since a change of the sign in the numerator can accommodate the sign of the denominator and turn the maximization problem into a minimization one. The next theorem shows that a function g/h that fulfills the above properties is Lipschitzian, and we derive a bound on the corresponding Lipschitz constant. To aid the proof, it is convenient to define the following constants:

\[ \bar{\eta}^{\ell} := \min_{x\in S} h(x) \quad \text{and} \quad \bar{\eta}^{u} := \max_{x\in S} h(x), \tag{11} \]

the minimum and maximum values of the denominator on the feasible set. Note that, by assumption, we have η̄^u ≥ η̄^ℓ > 0. Similarly, it will be necessary to have bounds on the absolute value of the numerator:

\[ \bar{\alpha}^{g} := \max_{x\in S} |g(x)| = \max\Bigl\{\bigl|\min_{x\in S} g(x)\bigr|,\ \bigl|\max_{x\in S} g(x)\bigr|\Bigr\} \tag{12} \]

and on the partial derivatives of both numerator and denominator:

\[ \alpha_j^{g} := \max_{x\in S}\left|\frac{\partial g(x)}{\partial x_j}\right| = \max\left\{\left|\min_{x\in S}\frac{\partial g(x)}{\partial x_j}\right|,\ \left|\max_{x\in S}\frac{\partial g(x)}{\partial x_j}\right|\right\} \tag{13} \]

and

\[ \alpha_j^{h} := \max_{x\in S}\left|\frac{\partial h(x)}{\partial x_j}\right| = \max\left\{\left|\min_{x\in S}\frac{\partial h(x)}{\partial x_j}\right|,\ \left|\max_{x\in S}\frac{\partial h(x)}{\partial x_j}\right|\right\}. \tag{14} \]

Theorem 2 A fractional function f(x) = g(x)/h(x) : R^n → R with differentiable functions g(x) and h(x), and h(x) > 0 on a compact convex set S, satisfies the Lipschitz condition on S. Its Lipschitz constant K is bounded by

\[ K \le \bar{K} := \|M\|_2, \tag{15} \]

where the components M_j (j = 1, 2, . . . , n) of M are given by

\[ M_j = \frac{\alpha_j^{g}\,\bar{\eta}^{u} + \alpha_j^{h}\,\bar{\alpha}^{g}}{(\bar{\eta}^{\ell})^{2}}, \tag{16} \]

and the constants involved are defined in (11–14).

Proof It follows from Lemma 2 that f is Lipschitzian. To estimate the Lipschitz constant given in (10), observe that

\[ \frac{\partial f(x)}{\partial x_j} = \frac{\frac{\partial g(x)}{\partial x_j}\,h(x) - \frac{\partial h(x)}{\partial x_j}\,g(x)}{(h(x))^{2}}. \]

Since

\[ \left|\frac{\partial g(x)}{\partial x_j}\,h(x) - \frac{\partial h(x)}{\partial x_j}\,g(x)\right| \le \left|\frac{\partial g(x)}{\partial x_j}\right| h(x) + \left|\frac{\partial h(x)}{\partial x_j}\right| |g(x)| \le \alpha_j^{g}\,\bar{\eta}^{u} + \alpha_j^{h}\,\bar{\alpha}^{g}, \]

we have

\[ \left|\frac{\partial f(x)}{\partial x_j}\right| \le \frac{\alpha_j^{g}\,\bar{\eta}^{u} + \alpha_j^{h}\,\bar{\alpha}^{g}}{(\bar{\eta}^{\ell})^{2}} = M_j \]

on the feasible set, whence

\[ K = \max_{x\in S}\|\nabla f(x)\| \le \|M\|, \]

with M_j defined in (16).
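A minimal sketch of the bound (15)–(16), assuming the constants (11)–(14) have already been computed (for a linear ratio, Corollary 1 below simply sets α_j^g = |c_j| and α_j^h = |d_j|); the function name is our own:

```python
import numpy as np

def theorem2_lipschitz_bound(alpha_g, alpha_h, alpha_bar_g, eta_lo, eta_up):
    """Bound (15)-(16): K̄ = ||M||_2 with
    M_j = (alpha_g[j] * eta_up + alpha_h[j] * alpha_bar_g) / eta_lo**2."""
    alpha_g = np.asarray(alpha_g, float)
    alpha_h = np.asarray(alpha_h, float)
    M = (alpha_g * eta_up + alpha_h * alpha_bar_g) / eta_lo**2
    return float(np.linalg.norm(M))
```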

If both numerator and denominator are linear functions, i.e. f(x) = (⟨c, x⟩ + a)/(⟨d, x⟩ + b) with c, d ∈ R^n and a, b ∈ R, then the estimate becomes simpler. If the feasible set is a polytope, then the constants η̄^ℓ and η̄^u from (11) and ᾱ^g from (12) are easily computed using standard linear programming techniques (cf. Charnes and Cooper 1962). In this case, the constants α_j^g and α_j^h from (13) and (14) simplify to

\[ \alpha_j^{g} = |c_j| \quad \text{and} \quad \alpha_j^{h} = |d_j|. \]

Therefore, for linear fractional functions we have the following result:

Corollary 1 A fractional function f(x) = (⟨c, x⟩ + a)/(⟨d, x⟩ + b) which fulfills ⟨d, x⟩ + b > 0 on a compact and convex set S satisfies the Lipschitz condition on S. Its Lipschitz constant K is bounded by

\[ K \le \bar{K} := \|M\|, \]

where the components M_j (j = 1, . . . , n) of M are given by

\[ M_j = \frac{|c_j|\,\bar{\eta}^{u} + |d_j|\,\bar{\alpha}^{g}}{(\bar{\eta}^{\ell})^{2}}, \]

with η̄^ℓ and η̄^u from (11) and ᾱ^g as in (12).

Combining Theorem 2 or Corollary 1 with Lemma 3, it is easy to derive bounds on the Lipschitz constants of a broad class of fractional functions, including sums, products, min and max of fractional forms.


4 Numerical results

We applied the DMIHR algorithm to different types of fractional optimization problems taken from the literature. This section presents the numerical results that we obtained. The parameters were chosen to be α = 0.01 and ε = 0.01. We tried different values for the number θ of function evaluations allowed for each restart of IHR. For each value of θ, 100 sets of DMIHR were performed. The starting point was chosen randomly in the feasible set for each of the 100 sets of DMIHR. Since, in contrast to many other approaches, the number of restarts is governed automatically by our algorithm, it is listed as one of the performance criteria. The others are the number of improving points found by the algorithm and the best value found, so that it is possible to compare the quality of our solution with the exact optimal value, which is known in most of our test examples.

4.1 Sum-of-ratios problems

Since the seminal paper by Charnes and Cooper (1962) it is known that maximizing a single ratio of linear functions over a polytope is equivalent to solving an ordinary linear problem. The multi-ratio problem remained intractable up to the 1990s, when the sum-of-ratios problem received considerable attention in the fractional programming community, and several solution methods were proposed. Falk and Palocsay (1992) were the first to consider the two-ratio case. Several Branch-and-Bound algorithms were then proposed for the case of more than two fractions, for example by Konno and Abe (1999), Dür et al. (2001), Benson (2002a, 2002b), and Kuno (2002). An interior point method was proposed by Freund and Jarre (2001). Phuong and Tuy (2003) developed an algorithm which exploits monotonicity properties of the objective function and is therefore applicable not only to the sum-of-ratios problem, but to other problem classes as well.

Example 1 In our first example the objective is a sum of two fractions of linear functions. The example is taken from Falk and Palocsay (1992), and was also considered by Phuong and Tuy (2003):

\[ \begin{aligned} \max\ & \frac{3x_1 + x_2 - 2x_3 + 0.8}{2x_1 - x_2 + x_3} + \frac{4x_1 - 2x_2 + x_3}{7x_1 + 3x_2 - x_3} \\ \text{s.t.}\ & x_1 + x_2 - x_3 \le 1, \\ & -x_1 + x_2 - x_3 \le -1, \\ & 12x_1 + 5x_2 + 12x_3 \le 34.8, \\ & 12x_1 + 12x_2 + 7x_3 \le 29.1, \\ & -6x_1 + x_2 + x_3 \le -4.1, \\ & x_1, x_2, x_3 \ge 0. \end{aligned} \]

The optimal solution of this problem is known to be (x_1, x_2, x_3) = (1, 0, 0) with an objective value of 2.4714. To compute K̄, we need K̄_1 and K̄_2 for the two ratios. For the first ratio, we have the constants η̄^ℓ = 1.7286, η̄^u = 4.7, and ᾱ^g = 4.7. Using Corollary 1, we thus arrive at K̄_1 = 9.6962. For the second ratio, the respective values are η̄^ℓ = 4.8286, η̄^u = 12.4, and ᾱ^g = 8.5, which yields K̄_2 = 5.2301. Hence, the Lipschitz constant bound of the objective is K̄ = K̄_1 + K̄_2 = 14.9263.
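As a sanity check, these two bounds can be reproduced with the hypothetical theorem2_lipschitz_bound sketch from Sect. 3.2; the η̄ and ᾱ^g values are the ones reported above and would in practice come from the linear programs mentioned after Theorem 2.

```python
import numpy as np

# Ratio 1: (3x1 + x2 - 2x3 + 0.8) / (2x1 - x2 + x3); constants from the text.
K1 = theorem2_lipschitz_bound(alpha_g=np.abs([3, 1, -2]), alpha_h=np.abs([2, -1, 1]),
                              alpha_bar_g=4.7, eta_lo=1.7286, eta_up=4.7)
# Ratio 2: (4x1 - 2x2 + x3) / (7x1 + 3x2 - x3).
K2 = theorem2_lipschitz_bound(alpha_g=np.abs([4, -2, 1]), alpha_h=np.abs([7, 3, -1]),
                              alpha_bar_g=8.5, eta_lo=4.8286, eta_up=12.4)
K_bar = K1 + K2      # Lemma 3(b): the bounds for a sum of ratios add up
# K1 ≈ 9.696, K2 ≈ 5.230, K_bar ≈ 14.926, matching the values above.
```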


Table 1  Results for Example 1

θ      Number of restarts          Number of improving          Best value found
                                   points per restart
       Min     Average   Max       Min     Average   Max        Min       Average   Max
50     27      61.98     91        4       15.84     28         2.4375    2.4552    2.4697
60     12      24.94     40        9       18.44     29         2.4038    2.4536    2.4706
70     5       11.76     21        10      21.13     34         2.4131    2.4523    2.4695
80     3       6.69      12        11      23.85     36         2.3916    2.4498    2.4698
90     1       4.29      8         15      26.63     39         2.2838    2.4524    2.4700
100    1       3.15      6         16      29.01     40         2.3598    2.4507    2.4713
150    1       1.16      2         30      42.43     60         2.0070    2.4354    2.4712
200    1       1.00      1         39      55.64     71         2.1315    2.4239    2.4713

Fig. 1 Behavior of DMIHR for Example 1

The feasible set is contained in the box B = [0, 1.9] × [0, 0.91] × [0, 1.9], which gives us the bound D = 2.8369 for the diameter.

Table 1 shows the results of different runs of DMIHR for this problem, whereas Fig. 1 shows the behavior of the algorithm as θ varies. As is to be expected, the number of restarts depends strongly on the number θ of function evaluations performed in a single restart of IHR. For small values of θ, the algorithm finds only a small number of improving points in each restart. Therefore, the total number of restarts required to satisfy the probability constraint has to be large. In contrast, for large values of θ, the number of improving points found in a restart of IHR is sufficiently large, and restarts become less and less necessary. For large enough θ, new restarts are no longer required, and DMIHR behaves like ordinary IHR. It is interesting to note that the number of improving points obtained in an IHR run seems to be a linear function of θ.

Example 2 In this example, the objective is a sum of three ratios of quadratic functions. This problem is taken from Benson (2002a). Using the three ratios


\[ \begin{aligned} r_1(x_1, x_2, x_3, x_4) &= \frac{-x_1^2 + 16x_1 - x_2^2 + 16x_2 - x_3^2 + 16x_3 - x_4^2 + 16x_4 - 214}{2x_1 - x_2 - x_3 + x_4 + 2}, \\ r_2(x_1, x_2, x_3, x_4) &= \frac{-x_1^2 + 16x_1 - 2x_2^2 + 20x_2 - 3x_3^2 + 60x_3 - 4x_4^2 + 56x_4 - 586}{-x_1 + x_2 + x_3 - x_4 + 10}, \\ r_3(x_1, x_2, x_3, x_4) &= \frac{-x_1^2 + 20x_1 - x_2^2 + 20x_2 - x_3^2 + 20x_3 - x_4^2 + 20x_4 - 324}{x_1^2 - 4x_4}, \end{aligned} \]

the maximization problem is

\[ \begin{aligned} \max\ & \sum_{i=1}^{3} r_i(x_1, x_2, x_3, x_4) \\ \text{s.t.}\ & 6 \le x_1 \le 10,\quad 4 \le x_2 \le 6,\quad 8 \le x_3 \le 12,\quad 6 \le x_4 \le 8, \\ & x_1 + x_2 + x_3 + x_4 \le 34. \end{aligned} \]
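For illustration, this instance could be hooked up to the IHR/DMIHR sketches from Sect. 3 roughly as follows (our own hypothetical code, with the objective negated because the sketches minimize).

```python
import numpy as np

def example2_objective(x):
    """Negated sum of the three ratios, so that a minimizing sketch maximizes it."""
    x1, x2, x3, x4 = x
    r1 = (-x1**2 + 16*x1 - x2**2 + 16*x2 - x3**2 + 16*x3 - x4**2 + 16*x4 - 214) \
         / (2*x1 - x2 - x3 + x4 + 2)
    r2 = (-x1**2 + 16*x1 - 2*x2**2 + 20*x2 - 3*x3**2 + 60*x3 - 4*x4**2 + 56*x4 - 586) \
         / (-x1 + x2 + x3 - x4 + 10)
    r3 = (-x1**2 + 20*x1 - x2**2 + 20*x2 - x3**2 + 20*x3 - x4**2 + 20*x4 - 324) \
         / (x1**2 - 4*x4)
    return -(r1 + r2 + r3)

lower = np.array([6.0, 4.0, 8.0, 6.0])
upper = np.array([10.0, 6.0, 12.0, 8.0])
feasible = lambda x: bool(np.all(x >= lower) and np.all(x <= upper) and np.sum(x) <= 34.0)
```

Together with the bound K̄ derived below, these could then be passed to the dmihr sketch from Sect. 3.1.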

The optimal solution is (x_1, x_2, x_3, x_4) = (6.00, 6.00, 10.06, 8.00) with an objective function value of 16.17. Following the methodology outlined in Sect. 3.2, we get the following bounds for the Lipschitz constants of the three ratios: K̄_1 = 40.0388 for r_1, K̄_2 = 19.9706 for r_2, and K̄_3 = 60.9571 for r_3. In total, the Lipschitz constant bound of this objective is K̄ = K̄_1 + K̄_2 + K̄_3 = 120.9665. It is easy to obtain D = 5.2915.

The results from 100 sets of DMIHR are listed in Table 2. Figure 2 shows the plots of the number of restarts and the number of improving points, respectively, with varying θ. We observe that the number of restarts decreases to one very quickly as θ increases. The maximum number of improving points per restart is always equal to θ. This suggests that IHR rarely gets stalled in a local optimum, hence one restart is efficient. Similar to Example 1, the number of improving points seems to be linear in θ. For this example the algorithm seems to obtain better objective values when a larger θ is used.

Table 2  Results for Example 2

θ      Number of restarts          Number of improving          Best value found
                                   points per restart
       Min     Average   Max       Min     Average   Max        Min        Average    Max
50     3       4.77      8         15      45.35     50         11.2456    14.0138    15.7699
60     1       2.00      4         33      55.14     60         9.6651     13.4231    15.7194
70     1       1.14      2         40      65.20     70         9.8270     12.8807    15.7036
80     1       1.04      2         49      74.38     80         9.7991     13.3481    15.8799
90     1       1.01      2         40      84.57     90         10.3190    13.7232    15.8481
100    1       1.00      1         70      94.50     100        10.2351    13.8097    15.9685
150    1       1.00      1         94      142.91    150        10.7655    14.7013    16.0408
250    1       1.00      1         216     243.06    250        12.0561    15.3356    16.1221
500    1       1.00      1         450     492.31    500        14.0531    15.5885    16.1594


Fig. 2 Behavior of DMIHR for Example 2

4.2 Multiplicative problems

There has been a large amount of research on general multiplicative programming without the additional fractional structure, see for example Konno and Kuno (1995). For fractional multiplicative programs, an algorithm has been proposed by Konno and Abe (1999). The general monotonic programming approach by Phuong and Tuy (2003) may also be applied to these problems.

Example 3 We consider an objective function which is the product of two ratios of linear functions. The example is taken from Phuong and Tuy (2003):

\[ \begin{aligned} \max\ & \frac{3x_1 + x_2 - 2x_3 + 0.8}{2x_1 - x_2 + x_3} \cdot \frac{4x_1 - 2x_2 + x_3}{7x_1 + 3x_2 - x_3} \\ \text{s.t.}\ & x_1 + x_2 - x_3 \le 1, \\ & -x_1 + x_2 - x_3 \le -1, \\ & 12x_1 + 5x_2 + 12x_3 \le 34.8, \\ & 12x_1 + 12x_2 + 7x_3 \le 29.1, \\ & -6x_1 + x_2 + x_3 \le -4.1, \\ & x_1, x_2, x_3 \ge 0. \end{aligned} \]

The optimal solution is (x_1, x_2, x_3) = (1, 0, 0) with an objective function value of 1.0857. The two ratios are the same as in Example 1, so we know that K̄_1 = 9.6962 and K̄_2 = 5.2301. In order to use Lemma 3(e), we compute the bounds on both ratios on the feasible set via the Charnes–Cooper transformation from Charnes and Cooper (1962):

\[ M_1 = \max_{x\in S} |r_1(x)| = \max\Bigl\{\bigl|\min_{x\in S} r_1(x)\bigr|,\ \bigl|\max_{x\in S} r_1(x)\bigr|\Bigr\} = \max\{|0|, |1.9|\} = 1.9 \]

and

\[ M_2 = \max_{x\in S} |r_2(x)| = \max\Bigl\{\bigl|\min_{x\in S} r_2(x)\bigr|,\ \bigl|\max_{x\in S} r_2(x)\bigr|\Bigr\} = \max\{|0.3513|, |1.1569|\} = 1.1569. \]


Table 3  Results for Example 3

θ      Number of restarts          Number of improving          Best value found
                                   points per restart
       Min     Average   Max       Min     Average   Max        Min       Average   Max
50     71      110.97    146       4       14.96     29         1.0720    1.0788    1.0841
60     18      43.60     68        4       17.19     30         1.0695    1.0785    1.0852
70     7       21.92     39        7       19.26     33         1.0642    1.0787    1.0852
80     6       13.41     24        6       21.07     35         1.0679    1.0790    1.0846
90     2       8.72      17        11      23.12     38         1.0657    1.0797    1.0849
100    2       6.36      13        8       24.59     42         1.0612    1.0801    1.0851
150    1       2.32      5         11      33.44     52         1.0716    1.0823    1.0856
200    1       1.41      4         22      41.49     61         1.0639    1.0829    1.0856
300    1       1.05      2         28      55.59     86         1.0723    1.0847    1.0857
500    1       1.00      1         59      85.18     128        1.0848    1.0857    1.0857

Fig. 3 Behavior of DMIHR for Example 3

With Lemma 3(e), we find K̄ ≤ M_2 K̄_1 + M_1 K̄_2 = 21.1547. For a bound on the diameter, we proceed as in Example 1 to obtain D = 2.8369.

Table 3 and Fig. 3 illustrate the behavior of the algorithm. The figures are similar to those of the previous examples, and can be interpreted analogously.

4.3 Min–max problems

The problem of minimizing the largest of several ratios has received considerable attention. A good reference for this is Schaible (1995).

Example 4 The particular problem we consider here was taken from Phuong and Tuy (2003):

\[ \begin{aligned} \min\ & \max\left\{ \frac{3x_1 + x_2 - 2x_3 + 0.8}{2x_1 - x_2 + x_3},\ \frac{4x_1 - 2x_2 + x_3}{7x_1 + 3x_2 - x_3} \right\} \\ \text{s.t.}\ & x_1 + x_2 - x_3 \le 1, \\ & -x_1 + x_2 - x_3 \le -1, \\ & 12x_1 + 5x_2 + 12x_3 \le 34.8, \\ & 12x_1 + 12x_2 + 7x_3 \le 29.1, \\ & -6x_1 + x_2 + x_3 \le -4.1, \\ & x_1, x_2, x_3 \ge 0. \end{aligned} \]

The optimal solution given in Phuong and Tuy (2003) is (x_1, x_2, x_3) = (1.015, 0.590, 1.403), with an objective value of 0.573. Again, the ratios are the same as in Example 1. We use Lemma 3(c) to find K̄ = max{K̄_1, K̄_2} = 9.6962, while the value D = 2.8369 remains the same.

The numbers in Table 4 suggest that this is a harder example, and it is only tractable if θ is chosen large enough. However, since ε was chosen to be 0.01, the precision of DMIHR is only expected to be good to the second decimal point. The number of restarts decreases quickly as θ grows, but as depicted in Fig. 4 we do not observe the linear dependence of the number of improving points on θ.

4.4 Problems involving absolute values

Chadha (2002) introduced an algorithm to maximize a ratio of functions involving absolute values subject to linear constraints. He showed that under certain assumptions a simplex-type algorithm (i.e. an algorithm that searches adjacent extremal points) can solve this type of problem. In the absence of these assumptions, however, the algorithm fails.

Table 4  Results for Example 4

θ       Number of restarts          Number of improving          Best value found
                                    points per restart
        Min     Average   Max       Min     Average   Max        Min       Average   Max
90      53      130.17    186       1       12.27     28         0.5734    0.5778    0.5849
100     35      95.59     146       2       12.91     28         0.5739    0.5784    0.5840
110     23      73.42     126       2       13.55     28         0.5742    0.5780    0.5829
120     32      61.03     89        2       14.08     31         0.5737    0.5782    0.5841
130     22      47.28     82        2       14.65     31         0.5742    0.5776    0.5812
150     13      33.91     53        3       15.65     29         0.5743    0.5779    0.5829
200     4       17.34     32        3       17.83     34         0.5742    0.5778    0.5836
250     4       10.98     20        6       19.53     35         0.5744    0.5776    0.5868
300     1       8.34      18        8       20.73     37         0.5738    0.5771    0.5871
500     1       4.02      8         11      25.32     41         0.5738    0.5766    0.5866
1000    1       2.21      5         14      30.77     48         0.5733    0.5749    0.5786
5000    1       1.11      3         22      44.54     63         0.5731    0.5735    0.5745
10000   1       1.04      2         31      49.96     80         0.5731    0.5733    0.5738
15000   1       1.04      2         31      53.83     85         0.5731    0.5732    0.5735
25000   1       1.00      1         36      56.60     86         0.5731    0.5732    0.5733


Fig. 4 Behavior of DMIHR for Example 4

Fig. 5 Behavior of DMIHR for Example 5

Example 5 Our example is taken from Chadha (2002) and serves to illustrate an instance where his simplex-type algorithm fails to detect the solution. DMIHR easily solves the problem with high precision. The problem is

\[ \begin{aligned} \max\ & \frac{|x_1| + |x_2| + 8}{-|x_1| - |x_2| + 4} \\ \text{s.t.}\ & x_1 + x_2 \le 2, \\ & -x_1 - x_2 \le 1, \\ & x_1 - 2x_2 \le 2, \\ & -2x_1 + x_2 \le 2. \end{aligned} \]

The optimum value of this problem is 5, which is attained for all feasible points fulfilling |x_1| + |x_2| = 2, e.g. for (x_1, x_2) = (2, 0) and (x_1, x_2) = (0, 2). The feasible polytope has the vertices v^1 = (2, 0), v^2 = (0, 2), v^3 = (−1, 0), and v^4 = (0, −1), which yields D = 3. A bound on the Lipschitz constant can be computed using Lemma 3(f), which yields K̄ = 6.


Table 5  Results for Example 5

θ      Number of restarts          Number of improving          Best value found
                                   points per restart
       Min     Average   Max       Min     Average   Max        Min       Average   Max
40     1       1.51      6         2       27.29     40         4.7037    4.9470    4.9997
50     1       1.49      4         3       30.10     50         4.6641    4.9573    4.9996
60     1       1.39      3         3       35.83     60         4.4244    4.9489    4.9995
70     1       1.16      3         3       44.78     70         4.8447    4.9739    4.9998
80     1       1.16      3         4       47.22     80         4.7521    4.9682    4.9998
90     1       1.12      3         5       57.29     90         4.8812    4.9774    4.9999
100    1       1.07      3         7       60.97     100        4.8112    4.9787    4.9999
150    1       1.13      3         6       88.46     148        4.9321    4.9885    4.9999
200    1       1.04      2         3       113.54    197        4.9622    4.9920    4.9998
250    1       1.00      1         28      142.95    246        4.9614    4.9946    4.9998

Because of the low dimensionality, even very small values of θ yield very good objective values, cf. Table 5. The algorithm finds sufficiently many improving points of good quality with small θ values, and we again observe a linear dependence of the number of improving points on θ, as shown in Fig. 5.

4.5 Problems with higher dimensions

The previous low-dimensional examples were taken from the literature. We next present an example with twenty variables to illustrate the performance of our algorithm in higher dimensions. To our knowledge, no existing methods have been applied to fractional problems in twenty dimensions, possibly due to time and memory limitations.

Example 6 Consider the following functions with 20 variables:

\[ r_1(x_1, \ldots, x_{20}) = \frac{\sum_{i=1}^{20} i\,x_i}{\sum_{i=1}^{20} x_i}, \qquad r_2(x_1, \ldots, x_{20}) = \frac{\sum_{i=1}^{10} i\,x_{2i}}{\sum_{i=1}^{10} x_{2i-1}}, \qquad r_3(x_1, \ldots, x_{20}) = \frac{\sum_{i=1}^{10} x_{2i-1}}{\sum_{i=1}^{10} i\,x_{2i}}, \]

and the maximization problem

\[ \max\ \sum_{i=1}^{3} r_i(x_1, \ldots, x_{20}) \quad \text{s.t.}\quad 1 \le x_i \le 5,\ \ i = 1, \ldots, 20. \]

We have D = 17.885. For the Lipschitz constants of the three ratios we get K̄_1 = 24.0245, K̄_2 = 13.1101, and K̄_3 = 0.4334.
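In the same hedged style as before, the Example 6 objective can be coded compactly and handed to the dmihr sketch from Sect. 3.1 (negated, since the sketch minimizes); all names are our own.

```python
import numpy as np

i1, i2 = np.arange(1, 21), np.arange(1, 11)       # index vectors 1..20 and 1..10

def example6_objective(x):
    """Negated sum of the three ratios (the dmihr sketch minimizes)."""
    odd, even = x[0::2], x[1::2]                  # x1, x3, ..., x19 and x2, x4, ..., x20
    r1 = np.dot(i1, x) / np.sum(x)
    r2 = np.dot(i2, even) / np.sum(odd)
    r3 = np.sum(odd) / np.dot(i2, even)
    return -(r1 + r2 + r3)

lower, upper = np.ones(20), 5.0 * np.ones(20)
feasible = lambda x: True                         # only box constraints here
D = float(np.linalg.norm(upper - lower))          # box diameter 4*sqrt(20)
```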


Table 6  Results for Example 6

θ      Number of restarts          Number of improving          Best value found
                                   points per restart
       Min     Average   Max       Min     Average   Max        Min        Average    Max
500    11      35.17     55        152     198.63    252        29.9427    33.0959    36.0323
600    1       2.78      5         199     237.46    281        24.8406    30.4769    35.7557
700    1       1.07      2         247     276.18    319        23.0598    29.2734    34.6261
800    1       1.00      1         280     315.02    375        22.4040    28.9303    34.9401
900    1       1.00      1         310     354.02    397        23.4054    29.1442    35.1009
1000   1       1.00      1         352     390.95    452        22.8878    29.9385    37.3206

Fig. 6 Behavior of DMIHR for Example 6

Table 6 shows the results of our numerical experiments with this problem. Unfortunately, the true optimal solution is unknown (unlike in the previous examples, which were taken from the literature), so we cannot compare our solution against it. Figure 6 shows that the number of restarts decreases quickly as θ grows, and again we see that the number of improving points seems to depend linearly on θ.

5 Conclusions

We have introduced an algorithm called Dynamic Multistart Improving Hit-and-Run (DMIHR) and applied it to the class of fractional optimization problems. DMIHR combines IHR, a well-established stochastic search algorithm, with restarts. The development of this algorithm is based on a theoretical analysis of Multistart Pure Adaptive Search, which relies on the Lipschitz constant of the optimization problem. We presented a method to compute bounds on the Lipschitz constants of fractional functions, and then applied DMIHR to various types of fractional optimization problems. In contrast to the existing methods, which are problem-type specific, our method provides a unified approach to solving general fractional optimization problems.


References

Benson, H. P. (2002a). Using concave envelopes to globally solve the nonlinear sum of ratios problem. Journal of Global Optimization, 22, 343–367.
Benson, H. P. (2002b). Global optimization algorithm for the nonlinear sum of ratios problem. Journal of Optimization Theory and Applications, 112, 1–29.
Cambini, A., Martein, L., & Schaible, S. (1989). On maximizing a sum of ratios. Journal of Information and Optimization Science, 10, 65–79.
Chadha, S. S. (2002). Fractional programming with absolute-value functions. European Journal of Operational Research, 141, 233–238.
Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9, 181–186.
Dür, M., Horst, R., & Thoai, N. V. (2001). Solving sum-of-ratios fractional programs using efficient points. Optimization, 49, 447–466.
Falk, J. F., & Palocsay, S. W. (1992). Optimizing the sum of linear fractional functions. In C. Floudas & P. M. Pardalos (Eds.), Recent advances in global optimization (pp. 221–258). Princeton: Princeton University Press.
Freund, R. W., & Jarre, F. (2001). Solving the sum-of-ratios problem by an interior point method. Journal of Global Optimization, 19, 83–102.
Horst, R., Pardalos, P. M., & Thoai, N. V. (2002). Introduction to global optimization (2nd ed.). Dordrecht: Kluwer Academic.
Khompatraporn, C. (2004). Analysis and development of stopping criteria for stochastic global optimization algorithms. Ph.D. Dissertation, University of Washington, Seattle, WA.
Konno, H., & Abe, N. (1999). Minimization of the sum of three linear fractional functions. Journal of Global Optimization, 15, 419–432.
Konno, H., & Kuno, T. (1995). Multiplicative programming problems. In R. Horst & P. M. Pardalos (Eds.), Handbook of global optimization (pp. 369–406). Dordrecht: Kluwer Academic.
Kuno, T. (2002). A branch-and-bound algorithm for maximizing the sum of several linear ratios. Journal of Global Optimization, 22, 155–174.
Phuong, N. T. H., & Tuy, H. (2003). A unified monotonic approach to generalized linear fractional programming. Journal of Global Optimization, 26, 229–259.
Schaible, S. (1995). Fractional programming. In R. Horst & P. M. Pardalos (Eds.), Handbook of global optimization (pp. 495–608). Dordrecht: Kluwer Academic.
Schaible, S., & Shi, J. (2003). Fractional programming: the sum-of-ratios case. Optimization Methods and Software, 18, 219–229.
Smith, R. L. (1984). Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Operations Research, 32, 1296–1308.
Zabinsky, Z. B. (2003). Stochastic adaptive search for global optimization. Dordrecht: Kluwer Academic.
Zabinsky, Z. B., & Smith, R. L. (1992). Pure adaptive search in global optimization. Mathematical Programming, 53, 323–338.
Zabinsky, Z. B., Smith, R. L., McDonald, J. F., Romeijn, H. E., & Kaufman, D. E. (1993). Improving hit-and-run for global optimization. Journal of Global Optimization, 3, 171–192.