Metamodeling for Large-Scale Optimization Tasks Based on Object Networks

Ludmilla Werbos, Robert Kozma, Rodrigo Silva-Lugo, Giovanni E. Pazienza, and Paul Werbos

Abstract—Optimization in large-scale networks – such as large logistical networks and electric power grids involving many thousands of variables – is a very challenging task. In this paper, we present the theoretical basis and related experiments involving the development and use of visualization tools and improvements to existing best practices in managing optimization software. This work prepares for the use of "metamodeling" – the insertion of complex neural networks or other universal nonlinear function approximators into key parts of these complicated and expensive computations. This novel approach has been developed by the new Center for Large-Scale Integrated Optimization and Networks (CLION) at the University of Memphis, TN.

I. INTRODUCTION

This paper presents a combined method of solving and understanding scheduling and optimization tasks using both Linear Programming and Neural Networks. The problem of transportation network configuration and fleet assignment in realistic cases is multidimensional (hundreds of thousands of variables and constraints) and computationally challenging. The best approach at the present time is Mixed Integer Programming (MIP) [1]. This approach has led to great success stories, such as [2], [3], especially recently with the arrival of the Gurobi software [4] and faster multicore computers. However, the more we succeed, the more we want. The scope of real-life problems is growing faster than software and hardware capabilities. If a few years ago we would have been satisfied to predict a typical day for manufacturing, distribution, or logistics optimization purposes, now we want to see an image of a typical weekend or week. That would give planners more power to consider all the opportunities and their combinations on a larger scale, instead of limiting themselves to chunks of time of limited duration modeled independently.

Because the goal here is to minimize the amount of time required to reach a desired level of accuracy, the best approach will include some use of metamodeling – the use of fast universal approximators to approximate slower, more expensive models or computations. In this paper, we summarize the theory of inverse metamodeling and apply it to a practical problem related to large-scale optimization tasks. The paper is structured as follows: in Sec. II, we introduce the theoretical background of metamodeling; in Sec. III, we present a practical problem and the numerical simulations obtained by using the principles of metamodeling; in Sec. IV, we discuss the importance of using proper tools to represent the knowledge obtained from our simulations; in Sec. V, we draw conclusions and discuss future developments of our project.

Manuscript submitted February 10, 2011. This work has been supported in part by a FedEx Institute of Technology (FIT) Research Grant. Ludmilla Werbos is with IntControl LLC and CLION, The University of Memphis. R. Kozma, R. Silva-Lugo, and G. E. Pazienza are with CLION, The University of Memphis. G. E. Pazienza is also with Pazmany University, Budapest ([email protected]). Paul Werbos is with CLION, The University of Memphis, and the National Science Foundation (NSF).

II. PRINCIPLES OF METAMODELING

A. Forward Metamodeling

In the world of stochastic optimization and operations research, "metamodeling" usually means forward metamodeling. In forward metamodeling, a neural network or some other function approximator is trained to approximate the function we are trying to minimize. We can represent a neural network as a function f(x, W), where x are the inputs to the network and W are the weights or parameters of the network. In the case of forward metamodeling, the output f is just a scalar.

The task is to minimize the cost function C(u, α) subject to some constraints, where:
• C is the total cost;
• α is the set of all the inputs, including those in the constraints;
• u is the set of all the things we are trying to optimize, including the fleet assignments.

There are three steps to forward metamodeling here:
1. Pick a sample {ui} of possible fleet assignments ui – either all based on the same input vector α, or with different α. Let us assume that we have m sample points; in other words, i = 1 to m.
2. Calculate C(ui, α) for each of these assignments.
3. Train the neural network f(x, W) to approximate C over this sample; in other words, find the weights W which minimize the error between f and C for our sample of m cases, where x(i) is set to ui or to ui||αi. Another way to say this is that we train the neural network f over the set of sample pairs {(x(i), C(ui, αi))}.

Forward metamodeling is especially useful for stochastic optimization problems where it is very expensive to calculate the function C, or where the only way to get its value is to do a physical experiment. A minimal sketch of these three steps is given below.
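As a concrete illustration, the following Python sketch runs the three steps on a toy problem. The cost function, the dimensions, and the choice of scikit-learn's MLPRegressor as the approximator are our illustrative assumptions, not part of the actual logistics model:

```python
# Minimal sketch of forward metamodeling (steps 1-3 above).
# total_cost is a made-up stand-in for the expensive cost C(u, alpha).
import numpy as np
from sklearn.neural_network import MLPRegressor

def total_cost(u, alpha):
    """Stand-in for the expensive cost C(u, alpha)."""
    return np.sum((u - alpha.mean()) ** 2) + np.sum(np.abs(u))

rng = np.random.default_rng(0)
n_u, n_alpha, m = 20, 5, 500          # sizes of u, alpha, and the sample

# Step 1: sample m candidate assignments u_i (here with varying alpha_i)
U = rng.uniform(0.0, 1.0, size=(m, n_u))
A = rng.uniform(0.0, 1.0, size=(m, n_alpha))

# Step 2: evaluate the true cost at each sample point
C = np.array([total_cost(U[i], A[i]) for i in range(m)])

# Step 3: train f(x, W) on pairs (x_i, C_i), with x_i = u_i || alpha_i
X = np.hstack([U, A])
f = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, C)

# The trained metamodel now gives cheap approximate costs for new points
x_new = np.hstack([rng.uniform(size=n_u), rng.uniform(size=n_alpha)])
print(f.predict(x_new.reshape(1, -1)))
```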

However, in the practical logistics problems we have been looking at, it is generally very easy to calculate the cost C after u and α are already known. Therefore, the usual forward metamodeling is not so useful here.

B. Inverse Metamodeling

The term "inverse metamodeling" is fairly new; it is due to Russell Barton of Penn State [5]. It is possible to generalize the idea to include the following kind of inverse metamodeling, described in [6]:
1. Pick a sample of possible α, {αi}, with i = 1 to m.
2. Run the optimizer to calculate the optimal u, u*(α), for each α, subject to the constraints.
3. Train the neural network f(x, W) to approximate the function u*(α) over the training set {(αi, u*(αi))}. In other words, pick W so as to minimize the error between f and u*, when α is used as the input to the neural network.

If the neural network were an exact approximator, and extremely fast, this would be extremely useful in saving computation time; however, it is not likely to be quite so good in the beginning. A very complex neural network will be required to get a decent approximation in this application. This kind of research is the priority area for CLION, which specializes in complex neural networks. If the neural network is designed to match the capabilities of a dedicated hardware implementation as much as possible, it should be very fast in real time. Even if the approximation is not perfect, it may provide a fast warm start for the optimizer program. The value of warm starts varies a lot from problem to problem, but a warm start generally provides faster convergence than a random or randomized starting point. A minimal sketch of the procedure follows.
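The sketch below mirrors the forward example above, but now the network learns the map α → u*(α). The toy quadratic cost and the use of scipy's generic minimizer as a stand-in for the real MIP solver are our illustrative assumptions:

```python
# Minimal sketch of inverse metamodeling (steps 1-3 above).
import numpy as np
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n_u, n_alpha, m = 10, 5, 300

def cost(u, alpha):
    """Toy differentiable surrogate for the scheduling cost C(u, alpha)."""
    return np.sum((u[:n_alpha] - alpha) ** 2) + 0.1 * np.sum(u ** 2)

# Step 1: sample m input vectors alpha_i
A = rng.uniform(0.0, 1.0, size=(m, n_alpha))

# Step 2: run the (slow) optimizer to get u*(alpha_i) for each sample
U_star = np.array([
    minimize(cost, x0=np.zeros(n_u), args=(A[i],)).x for i in range(m)
])

# Step 3: train f(alpha, W) to approximate alpha -> u*(alpha)
f = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=3000).fit(A, U_star)

# Even an imperfect f(alpha) is useful: its output can be fed to the
# real optimizer as a warm start instead of starting from scratch.
u_warm = f.predict(rng.uniform(size=(1, n_alpha)))
```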

C. Variations and Extensions

1) Gradient-assisted learning (GAL): With forward metamodeling, we can usually calculate ∇uC for each sample point ui whenever we have a differentiable algorithm to calculate C itself, at a computational cost about the same as that of computing C itself. This comes from the use of the chain rule for ordered derivatives; see [7]. We can use Gradient-Assisted Learning to train the neural network to match not only f to C, but ∇f to ∇C as well. This can be extended to inverse metamodeling too. If the vector u has, say, 200 components, GAL makes a sample of N cases "worth" 200 times as much – as if 200×N examples had been collected. For inverse metamodeling of logistics tasks, making this possible requires getting sensitivity information from the existing optimizer and building some new code.

2) Metaexploration: This is when we train a neural network to output a "good" new sample point, um+1 or αm+1, for whatever use. In traditional stochastic search optimization, without metamodeling, the choice of a new sample point to explore is central to the power of the methods. In very complex systems, where an inverse metamodel can only provide a warm start, selection of a new point to follow up is also very important. In fact, the key idea in [6] is that a layer of the cerebral cortex of the mammal brain performs metaexploration, and that this is what makes the mammal brain different from, and more powerful than, the reptile brain. We may or may not be ready to get into this kind of advanced capability, but it certainly is fundamental and important.

3) Robust optimization: Most of the US power grid is now managed by large Independent System Operators (ISOs) and Regional Transmission Organizations (RTOs), which use large optimization packages to make decisions ranging from commitment of generation a day ahead, to actual dispatch of generation 15 minutes ahead, to planning many years ahead. The optimization problems now solved by these systems [8] are not so different from those in logistics. When the utilities shift from day-ahead commitment to actual operations, they always have to make some (costly) adjustments, because of things which cannot be predicted ahead of time. Therefore, they can get better results by reformulating the optimization problem itself, by injecting noise. If one accounts for uncertainties in the vector u, the problem becomes more complicated in theory – but the actual surface of expected error becomes more convex, which would typically increase the value of warm starts and possibly allow better use of new methods.

III. OPTIMIZATION OF SCHEDULING FOR LARGE-SCALE LOGISTICS TASKS

A. Efficient Scheduling Process

Efficient scheduling is a crucial issue in many areas belonging to both the defense and civilian domains, including disaster response, power distribution, transportation, manufacturing job scheduling, etc. [9-11]. Throughout the years, there have been countless efforts to tackle this problem; nevertheless, the appearance on the market of manycore platforms and advanced FPGAs offers unexplored possibilities, which require novel and more efficient approaches. The ultimate goal is to develop efficient nonlinear optimizers, such as approximate dynamic programming [12], though we are far from this goal: today's state-of-the-art optimization tools exploit mixed integer programming techniques.

In the present work, we used the Gurobi software to explore and better understand the problem of large-scale optimization tasks, which may include hundreds of thousands of variables. In particular, we focused on decision support to optimize the assignment of resources for the domestic market operation of a large distribution company, whose logistics operation serves about 200 delivery points with a fleet. Our Efficient Scheduling Process (ESP) is composed of three main steps, each of them formulated as a Mixed Integer Programming (MIP) problem: 1) determine point-to-point flights and volumes; 2) assign dispatch locations and volumes; 3) assign fleet vehicles to move containers between dispatch locations. The last step is by far the most computationally complex of the three, and for this reason our study refers only to the optimization of this particular part of the ESP. The ESP must accomplish the following tasks: maximize profit; assign available resources while minimizing costs and respecting operational constraints; find the optimal-cost resource assignment to move the forecasted volume of products, adhering to operational constraints; and globally reflow all forecasted volume to minimize costs according to capacity and operational constraints. A toy sketch of the third step is given below.
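To make the third ESP step concrete, here is a deliberately tiny MIP of the same flavor written against Gurobi's Python interface. The actual formulation used by the company is far larger and is not reproduced here; the data, the coverage rule, and the capacity numbers below are made up for illustration (and we use the modern gurobipy API, whereas our experiments ran on Gurobi 3.0):

```python
# Toy sketch of the third (hardest) ESP step: assigning fleet vehicles to
# move containers between dispatch locations.
import gurobipy as gp
from gurobipy import GRB

vehicles = range(4)          # fleet vehicles
legs = range(6)              # dispatch-location pairs with volume to move
cost = {(v, l): 10 + 3 * v + 2 * l for v in vehicles for l in legs}
volume = {l: 40 + 5 * l for l in legs}       # containers to move on each leg
capacity = 120                               # per-vehicle capacity

m = gp.Model("esp_step3")
x = m.addVars(vehicles, legs, vtype=GRB.BINARY, name="assign")

# Each leg's volume must be covered by exactly one vehicle (toy rule)
m.addConstrs((x.sum("*", l) == 1 for l in legs), name="cover")
# A vehicle cannot carry more than its capacity across its assigned legs
m.addConstrs(
    (gp.quicksum(volume[l] * x[v, l] for l in legs) <= capacity
     for v in vehicles), name="cap")

m.setObjective(
    gp.quicksum(cost[v, l] * x[v, l] for v in vehicles for l in legs),
    GRB.MINIMIZE)
m.optimize()
```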

B. Numerical Experiments

The optimization software used in our simulations is the well-known Gurobi Optimizer. The Gurobi parameters varied through the simulations are BarOrder, Crossover, CrossoverBasis, Cuts, and MIPFocus (a thorough description of their functionality can be found in [13]). When considering all possible combinations of the values these five parameters can assume, we obtain 720 cases.

Before performing the simulations, we decomposed the original problem P into three sub-problems: P1, P2, and P3. P1 is the optimization problem for the first stage of the original problem P, solved independently of the next stage. P2 optimizes the second stage of the original problem P under the assumption that the first-stage results obtained by solving P1 are not to be changed, and are treated just like the other constraints. P3 optimizes the first stage of the original problem while at the same time taking into account the processes that need to be optimized at the second stage of the original problem P. It is important to emphasize that P1 is less complex than P2 and P3: for this reason, we ran all 720 possible combinations of the parameters for the sub-problem P1, and then selected the 'best' combinations of the five parameters to perform the simulations of P2 and P3. The structure of this parameter sweep is sketched below.
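The sweep itself is a plain grid enumeration. The sketch below shows its structure in Python; the value lists are taken from current Gurobi documentation (and, as a check, multiply out to the 720 cases above), while the model file name "p1.mps" is a hypothetical placeholder for sub-problem P1:

```python
# Sketch of the parameter sweep over P1: enumerate combinations of the five
# Gurobi parameters and record time-to-1%-gap and branch-and-bound nodes.
import itertools
import gurobipy as gp

grid = {
    "BarOrder":  [-1, 0, 1],            # auto, approx. min. degree, nested dissection
    "Crossover": [-1, 0, 1, 2, 3, 4],
    "CrossoverBasis": [0, 1],           # quick vs. slow/stable basis
    "Cuts":      [-1, 0, 1, 2, 3],      # auto ... very aggressive
    "MIPFocus":  [0, 1, 2, 3],
}
# 3 * 6 * 2 * 5 * 4 = 720 combinations

results = []
for combo in itertools.product(*grid.values()):
    m = gp.read("p1.mps")               # hypothetical file holding sub-problem P1
    m.Params.MIPGap = 0.01              # stop at 1% MIP gap
    m.Params.TimeLimit = 24 * 3600      # 24-hour cap, as in our experiments
    for name, value in zip(grid.keys(), combo):
        m.setParam(name, value)
    m.optimize()
    results.append((combo, m.Runtime, m.NodeCount, m.MIPGap))

# STO criterion: sort by runtime; LNE criterion: sort by nodes explored
best_sto = sorted(results, key=lambda r: r[1])[:5]
best_lne = sorted(results, key=lambda r: r[2])[:5]
```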

We used two different criteria to rank the results obtained on P1: first, the shortest time required to reach a 1% MIP gap (STO), and second, the least number of nodes explored during the branch-and-bound phase (LNE). The result of this process can be found in Table I for STO and in Table II for LNE: we selected the best five parameter sets for each of the two criteria. These five sets were then used for the optimization of P2 and P3. For the sake of conciseness, in this paper we only present the numerical results for P2; the results for P3 are qualitatively similar. We also implemented two different versions, based on different forecasts of future traffic. The results are shown in Table III and Table IV for STO and LNE, respectively. All our simulations were performed on a Dell PowerEdge server with 24 cores and 48 GB of RAM, running the Red Hat Linux operating system.

We observe that the greatest improvement is achieved for version 2 with STO, with an average time of around 9 hours, much better than the 24 hours usually required for this step. However, the quality of this result is not uniform across the other configurations: for example, for version 1 with LNE, all simulations ran for 24 hours without reaching the desired MIP gap of 1%. It is also worth mentioning that the custom hardware and software configurations may have had an effect on the performance improvement [14].

TABLE I
MIP PARAMETERS FOUND FOR PROBLEM P1 WHEN USING THE FASTER RUNS (STO) CRITERION

        BarOrder       Crossover  Cross. Basis  Cuts          MIPFocus
Set 1   A. M. Degree   Auto       Quick         Very aggr.    Feasibility
Set 2   Auto           Auto       Quick         Very aggr.    Feasibility
Set 3   Auto           Disabled   Quick         Very aggr.    Feasibility
Set 4   A. M. Degree   Disabled   Quick         Very aggr.    Feasibility
Set 5   N. dissection  Primal     Quick         Aggressive    Feasibility

TABLE II
MIP PARAMETERS FOUND FOR PROBLEM P1 WHEN USING THE LEAST NODES EXPLORED (LNE) CRITERION

        BarOrder       Crossover  Cross. Basis  Cuts          MIPFocus
Set 1   Auto           D/P-P      Slow/Stable   Auto          BOB
Set 2   Auto           D/P-P      Slow/Stable   Conservative  BOB
Set 3   A. M. Degree   D/P-P      Slow/Stable   Very aggr.    BOB
Set 4   A. M. Degree   D/P-P      Slow/Stable   Auto          BOB
Set 5   N. dissection  D/P-D      Slow/Stable   Very aggr.    BOB

TABLE III
PROBLEM P2 – SIMULATION TIME AND PERFORMANCE WHEN USING THE FASTER RUNS (STO) CRITERION

           Version 1               Version 2
           Time (h)  MIP Gap (%)   Time (h)  MIP Gap (%)
Set 1      24.0      1.16           5.07     0.99
Set 2      10.3      0.95           5.30     0.99
Set 3      10.6      0.95           7.91     0.98
Set 4      24.0      1.16           7.96     0.98
Set 5      24.0      1.01          18.4      1.00
Avg.       18.58     1.05           8.93     0.99
Std. Dev.   7.42     0.11           5.47     0.01

TABLE IV
PROBLEM P2 – SIMULATION TIME AND PERFORMANCE WHEN USING THE LEAST NODES EXPLORED (LNE) CRITERION

           Version 1               Version 2
           Time (h)  MIP Gap (%)   Time (h)  MIP Gap (%)
Set 1      24.0      1.05          12.92     0.99
Set 2      24.0      1.05          24.00     1.66
Set 3      24.0      1.14          13.20     0.99
Set 4      24.0      1.14          24.00     1.66
Set 5      24.0      1.90          10.78     1.00
Avg.       24.0      1.26          16.98     1.26
Std. Dev.   0        0.36           6.48     0.37

The results of the numerical experiments provide a training set for use in metamodeling, with the various models described in Section II. Our primary intention is to use Recurrent Object Networks trained with Extended Kalman Filtering (EKF) [15] or more advanced methods under development.

IV. KNOWLEDGE REPRESENTATION AND VISUALIZATION

When dealing with very complex problems, it is crucial to have an effective way to represent knowledge and hence visualize information that would otherwise be very hard to retrieve from the raw data. Proper visualization helps in formulating the optimization problems and in assigning initial values and weights in a sequence of optimizers. For our experiments with data visualization, we used a network composed of 141 sites located in the 48 contiguous US states, chosen on the basis of their importance in the cargo market [16]. To avoid clutter in the figures, we did not consider the sites with very low traffic among the 200 sites used in the experiments. We assigned to each site a realistic volume of traffic, which we calculated from the available data of the last few years. Finally, we created specific software, programmed in Wolfram Mathematica, to generate the figures presented in this section. In the following, 'volume' always refers to the number of pieces received/delivered, but we would obtain similar results using other reference parameters, such as the weight of the freight or the number of vehicles employed. The incoming and outgoing traffic for the 141 sites is shown in Figs. 1 and 2, respectively.

Fig. 1. US sites classified according to the volume of the incoming freight traffic; the size of the circle is proportional to the number of pieces delivered in our model.

We can easily identify that a few key sites – e.g., Memphis, Dallas, Chicago, New York, Los Angeles, Atlanta – generate most of the volume. This is in accordance with the model of freight traffic, particularly air transport, as a scale-free network [17] in which a few nodes, called hubs, are connected to most of the remaining nodes and contribute a large share of the traffic [18-20].

Fig. 2. US sites classified according to the volume of the outgoing freight traffic; the size of the circle is proportional to the number of pieces received in our model.

Fig. 4. Connections among the 22 busiest US sites in our model: the width of the line is proportional to the traffic between the two sites.

Indeed, this phenomenon is represented even better in Fig. 3, in which we compare the cumulative percentage of the freight traffic (i.e., the sum of the traffic, incoming plus outgoing, generated by the n busiest sites) with a linear model, in which each site contributes 1/141 of the total volume. Clearly, our model is highly nonlinear: the 7 busiest sites (5% of 141) account for 25% of the total traffic, the 22 busiest sites (16% of 141) account for 50% of the total traffic, and the 48 busiest sites (34% of the total) account for 75% of the total traffic. At n = 48 we also find the maximum discrepancy between the two models. A short sketch of this computation follows.
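For reference, the cumulative curve and the thresholds quoted above are straightforward to compute. The sketch below does so in Python (our actual figures were produced in Mathematica); the input file "site_volumes.txt" is a hypothetical placeholder for the per-site traffic totals of our model:

```python
# Sketch of the cumulative-share computation behind Fig. 3.
import numpy as np

volumes = np.loadtxt("site_volumes.txt")         # hypothetical input, 141 sites
shares = np.sort(volumes)[::-1] / volumes.sum()  # descending traffic shares
cumulative = np.cumsum(shares)                   # blue curve in Fig. 3
linear = np.arange(1, len(shares) + 1) / len(shares)  # green 1/141-per-site line

for target in (0.25, 0.50, 0.75):
    n = int(np.searchsorted(cumulative, target)) + 1
    print(f"{n} busiest sites carry {target:.0%} of the traffic")

# Largest gap between the model and the uniform baseline (n = 48 in our data)
print("max discrepancy at n =", int(np.argmax(cumulative - linear)) + 1)
```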

Fig. 5. Connections among the 7 busiest US sites in our model: the width of the line is proportional to the traffic between the two sites.

We can then conclude that an effective visualization method is a fundamental tool for representing the knowledge coming from the data, which in this kind of problem can be particularly cumbersome.

Fig. 3. Comparison between the cumulative percentage of the freight traffic (in blue) and a linear model (in green): it is possible to observe that the 7 busiest sites account for 25% of the total traffic, the 22 busiest sites account for 50% of the total traffic, and the 48 busiest sites account for 75% of the total traffic.

These considerations suggest that we can optimize the transportation among just a few sites and thus obtain a dramatic impact on the final result. In particular, in Figs. 4 and 5, we show the networks generated by the 22 and the 7 busiest sites, respectively: the network is now simple enough to reveal the finer details of both the site locations and the connections among them, while still representing 50% of the traffic in the first case and 25% of the traffic in the second.

V. CONCLUSION AND FURTHER DEVELOPMENT

By evaluating 720 sets of control parameters for Gurobi 3.0, we were able to determine parameter sets that significantly increased the performance of the system. Important progress was also made in understanding the details of the data used for the generation of the MIP models by the distribution company. Future directions of research will include expanding the problem both vertically (up to several days) and horizontally (combining separate solution stages into an integrated model). However, the complexity of such an integrated model might approach the limits of traditional Mixed Integer Programming methods, calling for alternative methods of solution, such as approximate dynamic programming [21], [22], which will

provide the further increase in performance necessary for the daily operation of the distribution company.

It is not possible for the neural network to come up with a better solution than the existing optimizer, if the optimization problem is formulated exactly as it is now and the existing optimizer is allowed to run to perfect completion. However, if the existing optimizer is only allowed to run for some time T, the combination of a good warm start and a new run of the existing optimizer for the full time T should lead to a better solution. This new combination could be used to create a new training set for the inverse metamodeling, yielding a better metamodel and iteratively improving the quality of the solution. It seems likely that some neural networks within the brain also serve as metamodels, approximating and anticipating more complex calculations by other neural networks; for example, simple, fast feedforward networks may be trained to anticipate and then initialize more complicated recurrent networks, and value function networks may be viewed as approximators of a future value calculation.

It is also important to remark on the important role played by visualization tools that properly represent the information in our model. In the near future, we plan to improve such tools and make them publicly available.
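The iterative warm-start scheme described two paragraphs above can be summarized in a short pseudocode-style sketch. Here `optimizer` and `metamodel` are placeholders for the time-budgeted MIP solver and the inverse metamodel f(α, W), not actual interfaces from our software:

```python
# Pseudocode-style sketch of the iterative improvement loop described above.
def improve(metamodel, optimizer, alphas, T, n_rounds=5):
    for _ in range(n_rounds):
        # Warm-start the solver with the metamodel's guess for each instance
        solutions = [optimizer.solve(a, warm_start=metamodel.predict(a),
                                     time_limit=T) for a in alphas]
        # Solutions found within the same budget T are at least as good as
        # cold-started ones, so they form a better training set for the
        # next metamodel
        metamodel.fit(alphas, solutions)
    return metamodel
```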

REFERENCES

[1] G. B. Dantzig and M. N. Thapa, Linear Programming 2: Theory and Extensions. Springer-Verlag, 2003.
[2] C. Barnhart and R. R. Schneur, "Air Network Design for Express Shipment Service," Operations Research, vol. 44, no. 6, pp. 852-863, Nov.-Dec. 1996.
[3] S.-C. Ting and G.-H. Tzeng, "An optimal containership slot allocation for liner shipping revenue management," Maritime Policy & Management, vol. 31, pp. 199-211, 2004.
[4] "Gurobi announces the release of Gurobi Optimizer 4.0 software," http://www.prnewswire.com/news-releases/gurobi-announces-the-release-of-gurobi-optimizer-40-software-106821458.html
[5] R. Barton and M. Meckesheimer, "Metamodel-based simulation optimization," Chapter 18 in Handbooks in Operations Research and Management Science: Simulation, S. G. Henderson and B. L. Nelson, eds. New York: Elsevier Science, 2006.
[6] P. J. Werbos, "Brain-Like Stochastic Search: A Research Challenge and Funding Opportunity," http://arxiv.org/abs/1006.0385, submitted 1 Jun 2010.
[7] P. J. Werbos, "Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities," in M. Bücker, G. Corliss, P. Hovland, U. Naumann, and B. Norris, eds., Automatic Differentiation: Applications, Theory and Implementations. New York: Springer (LNCS), 2005.
[8] http://www.ferc.gov/industries/electric/indus-act/market-planning.asp
[9] A. S. Ko and N. B. Chang, Journal of Environmental Management, vol. 88, no. 1, p. 11, 2008.
[10] L. Contesse, J. Ferrer, and S. Maturana, Annals of Operations Research, vol. 139, p. 39, 2005.
[11] S. Kesen, S. Das, and Z. Gungor, The International Journal of Advanced Manufacturing Technology, vol. 47, p. 665, 2010.
[12] L. Werbos and P. Werbos, "Self-organization in CNN-Based Object Nets," 12th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA), 2010.
[13] Gurobi parameters, http://gurobi.com/doc/40/refman/node572.html, retrieved February 1, 2011.
[14] R. Silva-Lugo, R. Kozma, and L. Werbos, "Optimization of Scheduling for Large-Scale Logistics Tasks," 5th Multidisciplinary International Scheduling Conference, 2011 (submitted).
[15] R. Ilin, R. Kozma, and P. J. Werbos, "Beyond backpropagation and feedforward models: A practical training tool for a more efficient universal approximator," IEEE Trans. Neural Netw., vol. 19, no. 3, pp. 929-937, 2008.
[16] 2009 Annual World Airport Traffic Report (WATR), Airports Council International, 2009.
[17] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, 1999.
[18] S. Conway, "Scale-free networks and commercial air carrier transportation in the United States," in 24th Congress of the International Council of the Aeronautical Sciences, 2004.
[19] W. Li and X. Cai, "Statistical analysis of airport network of China," Physical Review E, vol. 69, no. 4, 2004.
[20] D. DeLaurentis, E. Han, and T. Kotegawa, "Network-theoretic approach for analyzing connectivity in air transportation networks," Journal of Aircraft, vol. 45, no. 4, pp. 1669-1679, 2004.
[21] W. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley and Sons, 2007.
[22] H. P. Simão, J. Day, A. P. George, T. Gifford, J. Nienow, and W. B. Powell, Transportation Science, vol. 43, p. 178, 2009.
