Discrete Event Dyn Syst (2006) 16: 405–411 DOI 10.1007/s10626-006-9329-8
The Equivalence between Ordinal Optimization in Deterministic Complex Problems and in Stochastic Simulation Problems

Yu-Chi Ho · Qing-Shan Jia · Qian-Chuan Zhao
© Springer Science + Business Media, LLC 2006
Abstract  In the last decade ordinal optimization (OO) has been successfully applied to many stochastic simulation-based optimization problems (SP) and deterministic complex problems (DCP). Although the application of OO to SP has been justified theoretically, the application to DCP lacks a similar analysis. In this paper, we show the equivalence between OO in DCP and in SP, which justifies the application of OO in DCP.

Keywords  Ordinal optimization · Equivalence · Deterministic complex problems · Stochastic simulation problems
Acknowledgment of Financial Support: This work was supported by ARO contract DAAD19-01-10610, AFOSR contract F49620-01-1-0288, NSF grant ECS-0323685, NSFC Grant No. 60274011, NSFC Grant No. 60574067, and the NCET (No. NCET-04-0094) program of China.

Y.-C. Ho
Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
e-mail: [email protected]

Y.-C. Ho · Q.-S. Jia (*) · Q.-C. Zhao
Center for Intelligent and Networked Systems (CFINS), Department of Automation, Tsinghua University, Beijing 100084, People's Republic of China
e-mail: [email protected]

Q.-C. Zhao
e-mail: [email protected]

1 Introduction

It is a challenging problem to optimize the performance of real-world complex systems, due to the large design space, the lack of structural information, and the time-consuming performance evaluation in terms of computational burden. We face many such challenges in practice; to name a few: the performance optimization of manufacturing systems, communication systems, large-scale electric power grids, and the Internet. As a technique to deal with this challenge, ordinal optimization (OO)
was developed in 1992 (Ho et al., 1992). The basic idea of OO rests on two tenets: ordinal comparison and goal softening. Ordinal comparison says that it is much easier to find out which design is better than to answer how much better. Goal softening says that when it is computationally infeasible to find the optimal design, a top-g% design (i.e., a good enough design) is usually satisfactory in practice. Both tenets are intuitively reasonable and can be proven mathematically (Dai, 1996; Xie, 1997; Lee et al., 1999). Compared with other heuristic and experience-based methods, one attractive advantage of OO is that it supplies a quantifiable result: OO ensures that some good enough designs can be found with a specified probability. Over a hundred successful applications show the advantage of OO (a complete list of the applications can be found in Ordinal Optimization References List (2005)). The performance optimization of complex systems can be classified into two classes. One class is

$$ \min_{q \in Q} J_{\mathrm{true}}(q) = \min_{q \in Q} \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} L(q, \xi_i), \qquad (1) $$
where for any design q from the design space Q, the true performance J_true(q) can in principle be obtained as the average of an infinite number (in practice, a sufficiently large number) of Monte Carlo replications. In the i-th replication, ξ_i represents all the randomness in the sample path, and L(q, ξ_i) represents the observed performance. This class will be called the stochastic simulation optimization problem, or SP for short. To apply OO, we randomly select N designs from Q to compose Q_N, which is called the initial sampling. Then we use a crude model to roughly evaluate the designs in Q_N and screen out some observed good designs, i.e., the selected set S. The crude model can be the average of a smaller number, or even a single replication, of Monte Carlo simulations of the complex model, i.e.,

$$ J_{\mathrm{est}}(q) = \frac{1}{n} \sum_{i=1}^{n} L(q, \xi_i), \qquad (2) $$

where n is much less than the large number of replications that is computationally equivalent to infinity in Eq. (1). The relation between the detailed and the crude model is

$$ J_{\mathrm{est}}(q) = J_{\mathrm{true}}(q) + \mathrm{noise}(q), \qquad (3) $$
where noise(q) is distributed according to some distribution. The property of this noise term is well studied in the simulation and statistics literature. It is usually independent of q, and its variance depends on the number of replications, n, used in Eq. (2). Let G(Q_N) be the set of truly top-g% designs in Q_N, where g is specified by the user; G(Q_N) is called the good enough set of Q_N. To ensure that we find some truly good enough designs with high probability, i.e., Prob(|G(Q_N) ∩ S| ≥ k) ≥ α, say α = 0.95, we use a table in Lau and Ho (1997) to calculate the selection size |S| to use. This probability is called the alignment probability (AP), and the table in Lau and Ho (1997) is known as the universal alignment probability (UAP) table for OO in SP. When the initial sampling contains a large enough number of designs, say N = 1,000, the two probabilities Prob(|G(Q_N) ∩ S| ≥ k) and Prob(|G(Q) ∩ S| ≥ k) are very close,¹ where G(Q) is the set of truly top-g% designs in the entire design space Q (Lin and Ho, 2002). Once S is selected from Q_N using the crude and computationally fast model, we only need to evaluate accurately the designs in S. Comparing the sizes of S and Q_N with Q, OO usually saves the computing budget by at least one order of magnitude. The other class of performance optimization is

$$ \min_{q \in Q} J_{\mathrm{complex}}(q), \qquad (4) $$
where for each design q the true performance J_complex(q) is obtained through a deterministic but complex calculation, for example a finite element calculation. This class will be called the deterministic complex optimization problem, or DCP for short. OO has also been applied to solving DCP. The procedure is almost the same as in SP. The crude model is usually a deterministic but simplified model, which gives a rough performance estimate quickly. The relation between the crude model and the detailed model is

$$ J_{\mathrm{simple}}(q) = J_{\mathrm{complex}}(q) + \mathrm{error}(q), \qquad (5) $$
where the error is deterministic but complex. The error is regarded as a random noise, and the UAP table in Lau and Ho (1997) is then used again to calculate the selection size. Although many successful applications (to name a few: Yang, 1998; Guan et al., 2001; Lin et al., 2004) show that the above application procedure of OO in DCP is reasonable, one critical question remains: in what sense are OO in DCP and OO in SP equivalent, such that the UAP table in Lau and Ho (1997) can be used in both cases? This will be referred to as the equivalence problem. The purpose of this technical note is to answer this question. To see why this problem is worthy of consideration, let us consider two possibly confusing questions below. First, note that in SP the crude model J_est(q) is a stochastic function of q. When the initial sampling Q_N is fixed, the set of observed top-|S| designs is random. So the event (|G(Q_N) ∩ S| ≥ k) is stochastic, and the AP Prob(|G(Q_N) ∩ S| ≥ k) is well defined. But in DCP the crude model J_simple(q) is a deterministic function of q. When the initial sampling Q_N is fixed, the observed top-|S| designs are fixed, and the event (|G(Q_N) ∩ S| ≥ k) is deterministic. (Q1)
What does AP mean in DCP?
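To make question (Q1) concrete, the following toy sketch (all function names, formulas, and parameters here are our own illustrative assumptions, not part of the original procedure) estimates an alignment probability for a DCP as a frequency over the random initial sampling:

```python
import random

# Hypothetical stand-ins: a deterministic "complex" model and a
# deterministic crude model (toy formulas chosen only for illustration).
def J_complex(q):
    return (q * 0.754321) % 1.0                        # complex but deterministic

def J_simple(q):
    return J_complex(q) + 0.3 * ((q * 7919) % 101 / 101 - 0.5)   # crude model

Q = range(100_000)                                     # design space
N, S_size, g, k = 1000, 50, 0.05, 1                    # |Q_N|, |S|, top-g%, k

def one_trial(rng):
    QN = rng.sample(Q, N)                              # random initial sampling Q_N
    good = set(sorted(QN, key=J_complex)[: int(g * N)])    # truly top-g% of Q_N
    selected = set(sorted(QN, key=J_simple)[:S_size])      # observed top-|S| designs
    return len(good & selected) >= k                   # the alignment event

rng = random.Random(0)
trials = 200
ap = sum(one_trial(rng) for _ in range(trials)) / trials
print(f"empirical alignment probability ~ {ap:.2f}")
```

Because Q_N is drawn at random, the alignment event varies from trial to trial even though both models are deterministic; the empirical frequency is what the AP measures in this setting.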
Second, the error between Jcomplex and Jsimple is deterministic. (Q2)
How can we regard the error as a random noise?
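The flavor of question (Q2) can be previewed with a toy construction (entirely our own, and no substitute for the Kolmogorov-complexity argument of Section 2): a purely deterministic "error" function whose sample statistics are, at the level of simple checks, indistinguishable from those of an i.i.d. noise.

```python
# A deterministic "error" built from a bit-mixing function (the splitmix64
# finalizer); the choice of function is our illustrative assumption.
MASK = 0xFFFFFFFFFFFFFFFF

def error(q):
    x = (q + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    x ^= x >> 31
    return x / 2**64 - 0.5                 # deterministic value in [-0.5, 0.5)

errs = [error(q) for q in range(10_000)]
n = len(errs)
mean = sum(errs) / n
var = sum((e - mean) ** 2 for e in errs) / n
lag1 = sum((errs[i] - mean) * (errs[i + 1] - mean)
           for i in range(n - 1)) / (n - 1) / var      # lag-1 autocorrelation
print(f"mean = {mean:.4f}, lag-1 autocorrelation = {lag1:.4f}")  # both near 0
```

Any fixed battery of statistical tests can of course be fooled in the other direction as well; the point is only that determinism per se does not preclude noise-like behavior.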
There are some previous studies related to the equivalence problem. The concept of Kolmogorov complexity (Li and Vitányi, 1997) was used in Yang (1998) to explain why the error in Eq. (5) can be regarded as a random noise. The true performance J_complex was then supposed to be random according to some distribution around the observed performance J_simple. This is known as the statistical inference viewpoint, and the resulting meaning of alignment probability was different from the definition in most other OO literature. We supply a better answer to the equivalence problem in the next section.

¹ Please note that G(Q) and G(Q_N) are the sets of truly top-g% designs in Q and Q_N, respectively. Thus it is not necessarily true that G(Q_N) ⊆ G(Q).

2 Equivalence

First we adopt a new viewpoint of DCP and SP. Define q̃ = {q, q′}, where q is a design and the meaning of q′ is as follows. For the SP, q′ represents the random seed we use for each simulation experiment. Then the exact and the crude models are J_true(q̃) and J_est(q̃), respectively. For the DCP, q′ represents the sample realization of the random sampling device we use to sample the search space Q. Then the exact and crude models are J_complex(q̃) and J_simple(q̃), respectively. As long as q̃ is fixed, the values J_true(q̃), J_est(q̃), J_complex(q̃), and J_simple(q̃) are all deterministic. There is no conceptual difference between DCP and SP. By this definition, the two problems are totally equivalent. We shall now show the equivalence between OO in DCP and OO in SP in more detail through four steps.

1. The initial sampling. Suppose the design space Q is extremely large. To apply OO, we usually randomly sample N designs to compose the initial sampling Q_N. Then the APs Prob(|G(Q_N) ∩ S| ≥ k) and Prob(|G(Q) ∩ S| ≥ k) are well defined, and the difference between these two APs is minimal (Lin and Ho, 2002). In DCP, because these N designs are randomly sampled, Q_N is a random set, the observed top-|S| designs are random, and Prob(|G(Q) ∩ S| ≥ k) is well defined. The meaning of this AP is: when applying OO in a DCP many times, how many times on average we succeed (i.e., find at least k truly top-g% designs of Q among the observed top-|S| designs of Q_N). This answers question (Q1) in Section 1.

2. The complexity of the true performance sequence of the initially sampled designs. Sampling is a matter of uniformly and probabilistically choosing q′.
As long as the initial sampling process of q_1, q_2, ..., q_N is independent, and in particular independent of the values of J_complex thus sampled, the true performance sequence J_complex(q_1), J_complex(q_2), ..., J_complex(q_N) forms an independent and identically distributed (i.i.d.) sequence, since the mapping q → J_complex(q) is a well-defined though complicated function. The probability density function of J_complex(q_i) has the same form as the performance density function of J_complex(q), q ∈ Q. This performance density function f(x) shows how many q's within Q satisfy J_complex(q) = x (Ho, 1999). Since the mappings q → J_simple(q) and q → error(q) are also well defined, the estimated performance sequence J_simple(q_1), J_simple(q_2), ..., J_simple(q_N) and the error sequence error(q_1), error(q_2), ..., error(q_N) are i.i.d. sequences, respectively. Just note that the three sequences are related: given one sequence, the other two sequences are determined in principle.

3. The assumption that J_complex is complex in the sense of Kolmogorov. For the SP, Eqs. (1) and (2) show that J_true and J_est are the same except that the number of replications used in J_est is finite and usually very small. A tremendous amount of simulation literature (Yakowitz, 1977; Fishman, 1996; Gentle, 1998; Landau and Binder, 2000) has established the fact that the difference J_est(q) − J_true(q) = noise(q) (which will be called error type 1) is for all practical purposes
a random variable with zero mean and a well-defined standard deviation which varies as 1/√n. For the DCP, however, we do not have this luxury, since before OO we never associated J_complex with J_simple. As aforementioned, J_complex can be, for example, a large-scale finite element calculation, while J_simple can be a crude analytical formula. We have few studies on the property of the difference between J_complex and J_simple. There is no reason to expect a priori that the difference is zero mean or uncorrelated, since J_simple bears no obvious relationship to J_complex as J_est does to J_true in the SP. The only theoretical base we have to rely on is the concept of Kolmogorov complexity (Li and Vitányi, 1997), which quantifies unpredictability. True random sequences and pseudo-random sequences derived from complex but deterministic computation are difficult to predict, and so are Kolmogorov complex sequences. This is also known as the Kolmogorov equivalence between a random variable and a pseudo-random number generator (PRNG), which is the foundation of Monte Carlo simulation.² The concept of Kolmogorov complexity conceptually and heuristically assures us that we may regard the difference J_simple(q) − J_complex(q) = error(q) (which will be called error type 2) as random if J_complex is truly complex in the sense of Kolmogorov. But we do not have the literature to back us up. Conceptually we must do the same amount of experimentation as in the simulation literature for every pair of J_complex and J_simple we wish to use. To apply OO in DCP, we need to assume that J_complex is Kolmogorov complex, because OO assumes that the true performance can be accurately evaluated only through time-consuming calculation. When J_complex is assumed to be Kolmogorov complex, it is reasonable to apply OO.
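For the SP side of this contrast, the 1/√n behavior of the type-1 error is easy to reproduce in a toy simulation (the model L(q, ξ) = q + ξ below is a hypothetical stand-in, not a model from this paper):

```python
import random

rng = random.Random(1)

def L(q, xi):
    return q + xi                          # one replication; J_true(q) = q here

def J_est(q, n):
    # crude model: average of n Monte Carlo replications, as in Eq. (2)
    return sum(L(q, rng.gauss(0.0, 1.0)) for _ in range(n)) / n

def noise_std(q, n, trials=2000):
    # sample standard deviation of the type-1 error J_est(q) - J_true(q)
    samples = [J_est(q, n) - q for _ in range(trials)]
    m = sum(samples) / trials
    return (sum((s - m) ** 2 for s in samples) / trials) ** 0.5

for n in (1, 4, 16, 64):
    print(n, round(noise_std(0.0, n), 3))  # std shrinks roughly as 1/sqrt(n)
```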
However, it is usually difficult to check this assumption in practice, because the concept of Kolmogorov complexity is problem dependent and user dependent.³ Under this assumption, the error between J_complex and J_simple is also Kolmogorov complex, and is Kolmogorov-equivalent to a random noise. This answers question (Q2) in Section 1.

4. The re-ordered error sequence. When applying OO in DCP, we re-order the i.i.d. error sequence by the rough estimate J_simple. Note that the re-ordered error sequence may not be i.i.d., because in some cases approximation in J_simple causes correlation among the approximation errors. If the re-ordered error sequence is still i.i.d., then we can reasonably regard it as an i.i.d. noise sequence and use the UAP table in Lau and Ho (1997) to decide the selection size. If the re-ordered error sequence is not i.i.d., it can be regarded as a correlated noise sequence. In Deng et al. (1992), it has already been shown by numerical experiments and theoretical explanations that correlation among the noise seldom hurts and actually helps most of the time. If we want to take advantage of the correlation and further reduce the selection size, we can use the method in Yang (1998), which divides Q into several regions such that the errors of designs within each region are positively correlated and the errors of designs across regions are almost independent. By modifying the crude model J_simple, we can make the error sequence more like an i.i.d. sequence. Then for the new crude model, we can use the UAP table in Lau and Ho (1997) to decide the selection size.

² Note that a Monte Carlo simulation is in reality a complex deterministic function from the random initial seed and q to L. So long as the seed used can be considered as random, so will be the output of the simulation replication.

³ In the case of Monte Carlo simulation, the PRNG has passed all statistical tests.
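A quick numerical check of the re-ordering effect in step 4 can be sketched as follows (the Gaussian and uniform toy models are our own assumptions). Note that even an error sequence that is i.i.d. over the designs can acquire rank-induced correlation once sorted by J_simple:

```python
import random

rng = random.Random(2)
N = 5000
Jc = [rng.random() for _ in range(N)]             # stand-in for J_complex values
err = [rng.gauss(0.0, 0.1) for _ in range(N)]     # stand-in for error(q), i.i.d.
Js = [c + e for c, e in zip(Jc, err)]             # crude model, as in Eq. (5)

order = sorted(range(N), key=lambda i: Js[i])     # ranking by the crude model
reordered = [err[i] for i in order]               # the re-ordered error sequence

def lag1_corr(x):
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    cov = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / (n - 1)
    return cov / var

print(f"lag-1 corr, original order:   {lag1_corr(err):+.3f}")
print(f"lag-1 corr, re-ordered by Js: {lag1_corr(reordered):+.3f}")
```

Deng et al. (1992) argue that such correlation seldom hurts OO; a check of this kind merely shows how to detect it before consulting the UAP table.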
3 Conclusions

In short, as long as J_complex can be assumed to be Kolmogorov complex, we can apply OO to deal with the optimization problem

$$ \min_{q \in Q} J_{\mathrm{complex}}(q). \qquad (6) $$

Given the simple model

$$ J_{\mathrm{simple}}(q) = J_{\mathrm{complex}}(q) + \mathrm{error}(q), \qquad (7) $$
we estimate the noise level of the error term and the ordered performance curve of J_complex. As long as the design space Q is extremely large, we can use the UAP table in Lau and Ho (1997) to decide the appropriate selection size. In a sense the problem is trivial and simple once it is stated in the above way, but we thought it nevertheless worthwhile to point out the obvious.
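The recipe just stated can be sketched end-to-end on a toy pair of models (both formulas and all parameters below are hypothetical placeholders for a real J_complex and J_simple pair):

```python
import random

def J_complex(q):
    return (q ** 2) % 997 / 997.0                     # toy "expensive" model

def J_simple(q):
    return J_complex(q) + 0.05 * ((q * 31) % 17 / 17 - 0.5)   # toy crude model

rng = random.Random(3)
pilot = rng.sample(range(100_000), 50)                # small pilot set: run BOTH models
errors = [J_simple(q) - J_complex(q) for q in pilot]
m = sum(errors) / len(errors)
noise_std = (sum((e - m) ** 2 for e in errors) / len(errors)) ** 0.5

opc = sorted(J_complex(q) for q in pilot)             # rough ordered performance curve
print(f"estimated noise std of error(q): {noise_std:.3f}")
print("OPC, best five values:", [round(v, 3) for v in opc[:5]])
```

With the noise level and the shape of the ordered performance curve in hand, the UAP table in Lau and Ho (1997) then yields the selection size |S|.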
Acknowledgments The authors would like to thank Professor L.H. Lee, Professor L. Shi, and Professor X.R. Cao for their careful reading of a few earlier drafts of this paper. Their insightful comments and suggestions have helped greatly in improving the presentation of this paper. Of course, all remaining errors are solely the responsibility of the authors.
References

Dai L (1996). Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems. J Optim Theory Appl 91:363–388.
Deng M, Ho YC, Hu JQ (1992). Effect of correlated estimation errors in ordinal optimization. In: Swain JJ, Goldsman D, Crain RC, Wilson JR (eds) Proceedings of the 1992 Winter Simulation Conference, pp. 466–474.
Fishman GS (1996). Monte Carlo: Concepts, Algorithms, and Applications. Springer, Berlin Heidelberg New York.
Gentle JE (1998). Random Number Generation and Monte Carlo Methods. Springer, Berlin Heidelberg New York.
Guan X, Ho Y-C, Lai F (2001). An ordinal optimization based bidding strategy for electric power suppliers in the daily energy market. IEEE Trans Power Syst 16(4):788–797.
Ho YC (1999). An explanation of ordinal optimization: soft computing for hard problems. Inf Sci 113(3–4):169–192.
Ho YC, Sreenivas R, Vakili P (1992). Ordinal optimization of discrete event dynamic systems. Journal of Discrete Event Dynamic Systems 2(2):61–88.
Landau DP, Binder K (2000). A Guide to Monte Carlo Simulations in Statistical Physics. Cambridge University Press, Cambridge.
Lau TWE, Ho YC (1997). Universal alignment probabilities and subset selection for ordinal optimization. J Optim Theory Appl 93(3):455–489.
Lee LH, Lau TWE, Ho YC (1999). Explanation of goal softening in ordinal optimization. IEEE Trans Automat Contr 44(1):94–99.
Li M, Vitányi P (1997). An Introduction to Kolmogorov Complexity and Its Applications, 2nd edition. Springer, Berlin Heidelberg New York.
Lin SY, Ho YC (2002). Universal alignment probability revisited. J Optim Theory Appl 113(2):399–407.
Lin S-Y, Ho Y-C, Lin C-H (2004). An ordinal optimization theory-based algorithm for solving the optimal power flow problem with discrete control variables. IEEE Trans Power Syst 19(1):276–286.
Ordinal Optimization References List (2005). http://www.cfins.au.tsinghua.edu.cn/uploads/Resources/Complete_Ordinal_Optimization_Reference_List_v6.doc.
Xie X (1997). Dynamics and convergence rate of ordinal comparison of stochastic discrete-event systems. IEEE Trans Automat Contr 42(4):586–590.
Yakowitz SJ (1977). Computational Probability and Simulation. Addison-Wesley, Reading, Massachusetts.
Yang MS (1998). Ordinal optimization and its application to complex deterministic problems. Ph.D. dissertation, Harvard University.