Handling Large Datasets in Parallel Metaheuristics: A Spares Management and Optimization Case Study

Chee Shin Yeo, Institute of High Performance Computing, Singapore ([email protected])
Elaine Wong Kay Li, EADS Innovation Works, Singapore ([email protected])
Yong Siang Foo, Institute of High Performance Computing, Singapore ([email protected])

Abstract—Parallel metaheuristics based on Multiple Independent Runs (MIR) and cooperative search algorithms are widely used to solve difficult optimization problems in diverse domains. A key step in assessing and improving the speed of global convergence of parallel metaheuristics is tracing the solutions explored by the MIR-based algorithm. However, this generates large amounts of data, posing execution problems. This problem can be resolved by using a flow control workflow to govern the execution of the MIR-based parallel metaheuristics. Using a Spares Management and Optimization case study for the logistics industry, this paper analyzes the performance of the flow control workflow with different problem sizes. We show that the performance of the algorithm can be improved by appropriately setting two workflow parameters: (1) the stop criterion, to limit the amount of data cached and exchanged, and (2) the clustering policy, to distribute/aggregate parallel processes to compute nodes selectively.

I. INTRODUCTION

Metaheuristics have become exceedingly popular for solving problems in a wide range of domains. Unlike exact methods and heuristics, metaheuristics exhibit two distinctive features. Firstly, random modifications (either from a population of possible solutions or around the neighbourhood of current solutions) are involved in deriving an approximate optimal solution. Secondly, heuristics are applied to the solution space itself, based on assumptions about the topology of that space, as opposed to applying domain-specific heuristics. Because of this domain independence, metaheuristics have been successfully applied to many different types of difficult optimization problems. These works have been described and compared in [1], [2], [3], [4], [5], laying the groundwork for more advanced applications and performance improvements.

A significant development is the parallelization of metaheuristics. Parallelization approaches have been broadly classified according to whether the initial solutions and parallel search strategies are the same or different [6], [7]. Although different initial solutions add complexity to the algorithm, parallelization has been motivated by performance improvements. For example, initiating searches from different points avoids being trapped in local optimal solutions [8],
and decomposing the data into smaller sets can improve the speed of convergence (assuming that decomposition is possible in the first place). The use of different initial solutions, search strategies, and random modifications gives rise to potentially very different intermediate solutions. For this reason, cooperative methods, where parallel searches exchange intermediate results, have been applied with notable success.

Another exciting development is the hybridization of metaheuristics. Hybridization combines (typically two) metaheuristics to leverage their respective strengths [9], [10]. A popular hybrid applies tabu search [11], for example with simulated annealing [12] or ant colony searches [13]. The tabu search method avoids repeated searches by keeping track of past searches, which we term the solution trace. The use of solution traces is not unique to the tabu search method; traces are also commonly used to assess the convergence of problems visually.

For problems involving large datasets, generating solution traces poses execution problems. For one, the memory required to support such an effort is very large, and often insufficient. Furthermore, the communication between parallel processes deployed on different compute nodes is very high, leading to network bottlenecks. As a result, problems involving large datasets either require very long execution times or are limited in the extent of performance analysis supported. To our knowledge, the efficient handling of large datasets generated by and shared between metaheuristics to improve execution time has not been studied. To address this problem, we propose using a flow control workflow to govern the execution of the metaheuristics without needing to change the metaheuristics themselves. We incorporate two workflow settings: (1) a stop criterion to control the exchange of intermediate results, and (2) a clustering policy to control the assignment of jobs onto compute nodes.
To assess the performance of the flow control workflow, the Spares Management and Optimization (SMO) case study is used. The SMO problem is a stochastic, integer programming problem, which belongs to a class of difficult optimization problems. Since the pioneering works [14], [15], there has been extensive research into including additional conditions and
constraints into the model, such as emergency supplies [16], service level agreements [17], [18], and pooling [19], [20]. Due to the nature of service levels (measured across all parts) and pooling (applicable across all airports), it is not possible to decompose the data for SMO problems without making scenario-specific assumptions. In this work, we consider two SMO problem sizes involving 59 inventory stocking locations with 10 and 60 parts, equating to 590 and 3540 decision variables respectively. We implement the flow control workflow for parallel metaheuristics based on Multiple Independent Runs (MIR) [21], executing parallel search strategies with the cooperative exchange of results at predefined intervals.

The rest of the paper is organized as follows: Section II discusses related work. Section III explains the proposed flow control workflow and its settings in terms of stop criterion and clustering policy. Section IV describes the implementation of SMO as a case study to apply the flow control workflow for parallel metaheuristics. Section V presents and analyzes the results from the SMO case study. Section VI concludes and describes future work.

II. RELATED WORK

To our knowledge, there is no previous work that proposes using a flow control workflow to efficiently handle the large datasets generated by and shared between parallel metaheuristics in order to satisfy the memory constraint and improve execution time. A number of frameworks have been proposed for executing parallel evolutionary algorithms, such as DREAM [22] and Distributed BEAGLE [23]. DREAM only implements the cooperative island model using threads and sockets, while Distributed BEAGLE enables the Master-Slave parallel evaluation of the population model and the synchronous migration-based island model. There are other, more advanced frameworks that can execute not only parallel evolutionary algorithms but also parallel local searches.
For instance, MALLBA [24] and ParadisEO [25] support different hybridization mechanisms in addition to the two previous models.

III. FLOW CONTROL WORKFLOW FOR PARALLEL METAHEURISTICS

Figure 1 shows how a metaheuristic algorithm can be implemented as a flow control workflow. Instead of processing the entire sequence of computations at once in the metaheuristic algorithm, an alternative strategy is to break the computations down into chunks (which we term states) and describe them as a workflow comprising n states. The output solution Ok-1 of each state Sk-1 is used as the initial solution for the next state Sk. The optimal solution is the output On of the last state Sn. To speed up the search, the metaheuristic algorithm can be parallelized, deploying Multiple Independent Runs (MIR) [21] in each state to generate different randomized
[Figure 1. Implementing a metaheuristic algorithm as a flow control workflow comprising n states with Multiple Independent Runs (MIR) in each state: states S1 ... Sn produce outputs O1 ... On; within state Sk, the initial solution is distributed to MIRka, MIRkb, ..., MIRkx and their results are aggregated.]
results simultaneously. With a single processor, MIR can only be executed as multiple threads. With multiple processors, MIR can be executed as multiple jobs, each running independently on a single processor. In both cases, data is generated by the MIR and communicated during distribution/aggregation. The initial solution of a state is first distributed as input to each of its MIR before execution. The output solution of the state is then aggregated after all its MIR complete. Hence, the cooperative search process is established by the parallel MIR of search strategies through the exchange of intermediate results between states at predefined intervals.

A. Stop Criterion

As the stop criterion to limit the optimization process within each MIR, we define a fixed number of iterations based on the number of solutions visited by the metaheuristic algorithm. Since fewer iterations generate less data, the stop criterion controls the exchange of intermediate results by limiting the amount of data cached and exchanged. However, the stop criterion also determines the precision of the optimization, that is, the quality of the solutions reached. It is therefore essential to ensure that a smaller number of iterations for the stop criterion does not cause extremely poor solution quality even though it reduces the amount of data. This issue is resolved by increasing the number of states in the flow control workflow: for the same stop criterion, a workflow comprising a larger number of states achieves a higher precision of the overall optimization, with better solution quality, than a workflow comprising a smaller number of states.

B. Clustering Policy

Figure 2 shows three different clustering policies for a flow control workflow comprising 3 states with 2 MIR.
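To make this structure concrete, the following minimal Python sketch chains states whose output solutions seed the next state, runs several MIR per state under a fixed-iteration stop criterion, and aggregates by keeping the best MIR result. This is illustrative only, not the SMO implementation: the quadratic objective, the random-search stand-in for the metaheuristic, and all function names are hypothetical.

```python
import random

def run_mir(initial, iterations, seed):
    # One Multiple Independent Run (MIR). The metaheuristic is stubbed as a
    # random local search over a toy quadratic objective; the fixed iteration
    # count is the stop criterion that bounds the data each MIR generates.
    rng = random.Random(seed)
    best = list(initial)
    best_cost = sum(x * x for x in best)
    for _ in range(iterations):
        candidate = [x + rng.uniform(-1.0, 1.0) for x in best]
        cost = sum(x * x for x in candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

def run_state(initial, n_mir, iterations, state_idx):
    # Distribute the state's initial solution to each MIR, then aggregate:
    # the best MIR result becomes the state's output solution Ok.
    results = [run_mir(initial, iterations, seed=state_idx * 1000 + m)
               for m in range(n_mir)]
    return min(results, key=lambda r: r[1])

def run_workflow(initial, n_states, n_mir, iterations):
    # Chain the states: the output Ok-1 of state Sk-1 is the initial
    # solution of state Sk; the output On of the last state is the result.
    solution, cost = list(initial), sum(x * x for x in initial)
    for k in range(n_states):
        solution, cost = run_state(solution, n_mir, iterations, k)
    return solution, cost
```

Under this structure, the three clustering policies described below correspond to the granularity at which calls are submitted as jobs: Workflow submits one `run_workflow` call, State submits each `run_state` call, and Job submits each `run_mir` call.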
Depending on the clustering policy (Workflow, State, or Job), the total execution time of a job includes the time to complete the job itself (the entire workflow, a state, or a MIR respectively), the time to distribute the jobs to the processors (Td), and the time to aggregate results from them (Ta). The first policy, Workflow, executes the entire workflow (with all its states) as a single job on a single processor. The second policy, State, executes each state of the workflow as a single job on a single processor. The third policy, Job, executes each MIR of a state as a single job on a single processor. In this way, the Job clustering policy executes the parallel metaheuristic with the smallest problem size on a single processor, followed by State with a larger problem size and Workflow with the largest problem size.

[Figure 2. Clustering policy at the level of Workflow, State, or Job: the Workflow policy executes the entire workflow as a single job; the State policy executes each state of the workflow as a single job; the Job policy executes each MIR in a state as a single job. Each job incurs distribution time Td and aggregation time Ta.]

IV. CASE STUDY: IMPLEMENTATION OF SMO

As a case study for handling large datasets in parallel metaheuristics, we have implemented the middleware to manage the parallel processing of SMO problems over a cluster of desktops. Designed based on a Service-Oriented Architecture (SOA) [26], the entire system consists of three tiers: (1) Application, (2) Middleware, and (3) Resource. The Application tier contains an analysis service for users to perform high-level business analysis of optimization problems. The Middleware tier consists of three services: (1) the Optimization Service, which configures domain-specific optimization requirements, (2) the Workflow Management (WM) Service, which monitors the execution progress of the flow control workflow, and (3) the Job Management (JM) Service,
which is a wrapper to the Torque Resource Manager [27] and interacts via RESTful web services [28]. We use an Optimization Service for SMO as a case study; our generic system architecture can support many types of Optimization Service specific to different optimization problems. The Resource tier comprises a compute cluster with a head node and multiple compute nodes to execute the optimization problems. The cluster uses Linux NFS [29] for shared storage and the Torque Resource Manager for executing jobs across multiple nodes. Torque provides control over distributed compute nodes and batch jobs, thus enabling more efficient resource management of the cluster.

The complete end-to-end system flow is incorporated between the Analysis Service and the JM Service via a Web API. The system flow begins with the Analysis Service initiating an optimization request and ends with it receiving the results of the optimization. The Optimization Service creates the flow control workflow for the optimization problem to be executed. When the flow control workflow is started, the WM Service submits workflow tasks to the JM Service based on the selected clustering policy (Workflow, State, or Job). It also uploads the optimization program (e.g. the SMO program in Java) and the initial solution data (e.g. SMO data in XML) to the JM Service. The JM Service acts as an interface for the user to submit jobs to the cluster. It provides a job submission interface that supports all three clustering policies, as well as a job cancellation interface so that submitted jobs can be cancelled if they have not started or completed. The Torque Executor at each compute node sends the latest status updates of job execution to the JM Service, which in turn notifies the WM Service of any workflow task completion. In particular, for the Job clustering policy, the JM Service waits for all the jobs constituting the MIR of a state to complete before notifying the WM Service.

V. EXPERIMENTAL RESULTS

We evaluate the performance of the flow control workflow using SMO as a case study to understand the effect of the stop criterion and clustering policy on handling large datasets in MIR-based parallel metaheuristics. Table I lists the SMO, flow control workflow, and execution environment configurations for our experimental setup. We use the stop criterion of 44000 iterations and the Workflow clustering policy as the base case for the comparison of results. The complexity of SMO problems mainly depends on two factors: (1) the number of parts, and (2) the number of inventory stocking locations. We consider two SMO problem sizes involving 59 inventory stocking locations with 10 and 60 parts, equating to 590 and 3540 decision variables respectively. We also consider 7 cases of service levels. A service level specifies two details: (1) the time deadline within which the requested spare parts have to be delivered to the required locations, and (2) the number of airport locations that need to be delivered to. For example, “24 hrs KUL
Table I
EXPERIMENTAL SETUP

Settings: SMO
  Decision Variables: 1) 590 (10 Parts × 59 Locations); 2) 3540 (60 Parts × 59 Locations)
  Service Levels: 1) 1 hr KUL; 2) 1 hr KUL BKK; 3) 4 hrs KUL BKK; 4) 24 hrs KUL BKK; 5) 1 hr KUL BKK CGK HKG SIN; 6) 4 hrs KUL BKK CGK HKG SIN; 7) 24 hrs KUL BKK CGK HKG SIN
Settings: Flow Control Workflow
  MIR: 1) 2; 2) 4
  Iterations for Stop Criterion: 1) 44000; 2) 116000
  Clustering Policies: 1) Workflow; 2) State; 3) Job
Settings: Execution Environment
  Systems: 1) IBM (IBM ThinkStation D20 multi-core workstation with 32GB of memory and 16 Intel Xeon E5520 2.26GHz processors); 2) DELL (cluster of Dell OptiPlex 360 desktops, each node with 4GB of memory and 2 Intel Duo E8400 3.0GHz processors)
BKK CGK HKG SIN” means that the spare parts have to be delivered within 24 hours to all 5 airport locations of Kuala Lumpur, Bangkok, Jakarta, Hong Kong, and Singapore. We consider two execution environments: (1) IBM (2.26GHz, 32GB), which is less memory-constrained, and (2) DELL (3.0GHz, 4GB), which is more memory-constrained. We use three metrics to measure performance: (1) solution quality, (2) memory utilization, and (3) execution time. We calculate average values of these three metrics over the 7 service levels, since their results are about the same and deriving average values for them is therefore sufficient. To facilitate easier comparison across these metrics, the average values are then normalized to standardized values between 0 and 1.

A. Effect of Stop Criterion

We first examine the effect of the stop criterion on the flow control workflow in limiting the amount of data cached and exchanged, by executing not more than 44000 and 116000 iterations on the IBM multi-core workstation (2.26GHz processors, 32GB of memory). Memory utilization increases with more MIR and decision variables (Figure 3(a)), as they generate a larger amount of data that has to be stored in memory. This increasing memory utilization in turn requires increasing execution time (Figure 3(b)). Hence, memory availability is a critical factor in enabling the faster execution of parallel metaheuristics based on MIR and cooperative search algorithms, in particular with more MIR and decision variables.
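The metric aggregation just described (averaging each configuration over its 7 service levels, then normalizing to values between 0 and 1) can be sketched as follows; min-max scaling across configurations is an assumption for illustration, since the exact normalization formula is not spelled out here.

```python
def normalized_averages(metric_by_config):
    # Average each configuration's metric over its service levels, then
    # min-max scale the averages into [0, 1] (min-max scaling is assumed).
    avgs = {cfg: sum(vals) / len(vals) for cfg, vals in metric_by_config.items()}
    lo, hi = min(avgs.values()), max(avgs.values())
    span = (hi - lo) or 1.0  # guard against all averages being identical
    return {cfg: (avg - lo) / span for cfg, avg in avgs.items()}
```

For example, memory utilization readings over the 7 service levels of each (decision variables, MIR) configuration would map to directly comparable values between 0 and 1.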
[Figure 3. Effect of stop criterion on IBM (2.26GHz, 32GB): (a) normalized average memory utilization and (b) normalized average execution time for 44000 and 116000 iterations across the decision variable and MIR combinations.]
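To see why the iteration budget dominates the data volume, consider a rough back-of-envelope model. All of its assumptions are hypothetical, not taken from the SMO implementation: one recorded candidate solution per MIR per iteration, and 8 bytes per decision-variable value.

```python
def trace_size_bytes(iterations, n_mir, n_vars, bytes_per_value=8):
    # Assumed model: each MIR records one candidate solution (n_vars values)
    # per iteration, so the cached trace grows linearly in iterations,
    # MIR count, and decision variables.
    return iterations * n_mir * n_vars * bytes_per_value
```

Under these assumptions, the largest configuration (3540 variables, 4 MIR) would accumulate roughly 13 GB of trace data at 116000 iterations versus roughly 5 GB at 44000 iterations, illustrating how quickly a smaller stop criterion relieves pressure on memory.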
However, the stop criterion is able to limit the exchange of data with a smaller number of iterations, thereby reducing the execution time of the entire workflow. Figure 3(a) shows that the memory utilization for 44000 iterations increases by only 10%, compared with 93% for 116000 iterations. Likewise, Figure 3(b) shows that the execution time for 44000 iterations increases by only 6%, instead of 91% for 116000 iterations.

B. Effect of Clustering Policy

We now analyze the effect of the clustering policy (described in Section III-B) on the flow control workflow to understand how the performance of MIR-based parallel metaheuristics is improved by controlling the assignment of parallel jobs onto compute nodes. We run this experiment using 44000 iterations in the more memory-constrained DELL cluster so as to more effectively highlight the significance of the memory constraint. Figure 4(a) shows that the Job clustering policy maintains the
lowest memory usage even as the problem size increases with more MIR and decision variables: it incurs only 6% more memory (from 590 variables with 2 MIR to 3540 variables with 4 MIR), whereas Workflow and State incur 65% and 66% more respectively. Hence, instead of using the Workflow clustering policy to execute the entire workflow as-is in a more memory-constrained execution environment, the Job clustering policy is a better alternative than State for improving the performance of MIR-based parallel metaheuristics. This shows that the Job clustering policy effectively reduces memory utilization by executing MIR on multiple processors simultaneously, whereas both Workflow and State execute MIR in parallel on a single processor. However, Figure 4(b) shows that the Job clustering policy requires more execution time than Workflow (except for 590 variables with 4 MIR) as the problem size increases. This is because the time taken to distribute and aggregate data across multiple processors negates the time improvements achieved from the lower memory utilization. This highlights the trade-off between memory utilization (space constraint) and execution time (time constraint); the impact is greater for larger datasets because the bigger problem size makes it inefficient to distribute the data.

[Figure 4. Effect of clustering policy on DELL (3.0GHz, 4GB): (a) normalized average memory utilization and (b) normalized average execution time for the Workflow, State, and Job clustering policies across the decision variable and MIR combinations.]

VI. CONCLUSION

This paper investigates whether large datasets generated by and shared between parallel metaheuristics can be handled efficiently by using a flow control workflow for execution. Through the actual implementation of a Spares Management and Optimization (SMO) case study for the logistics industry, we are able to understand the performance impact of appropriately setting two configuration parameters: (a) the stop criterion, to control the exchange of data, and (b) the clustering policy, to control the assignment of the execution process onto compute nodes. Experimental results show that memory availability is a key factor in improving the performance of parallel metaheuristics involving Multiple Independent Runs (MIR) and cooperative search algorithms. Using a smaller number of iterations for the stop criterion of a metaheuristic reduces both the amount of memory utilized and the execution time. Memory utilization can also be reduced by using the Job clustering policy, in particular with more MIR, in two situations: (1) a more memory-constrained execution environment, such as a cluster of desktops, and (2) when a larger number of iterations is required for the stop criterion of the metaheuristic. However, the improvement in execution time depends on the problem size. For bigger problem sizes (with larger datasets), the time taken to distribute and aggregate data in the cluster negates the time improvements achieved from lower memory utilization. In such cases, the Workflow clustering policy achieves lower execution time than the Job clustering policy by executing the entire workflow without any communication overheads between the parallel MIR of each state deployed on different compute nodes.

Our future work will focus on more intelligent optimization of parallel metaheuristics, such as self-configuring the stop criterion depending on the progress of the optimization. We will also conduct performance analysis of parallel metaheuristics in a multi-user execution environment, where resource contention can be an issue. For instance, under the Job clustering policy, the execution of the flow control workflow belonging to one user may be delayed by waiting for the jobs of other users to complete first. Hence, an effective scheduling mechanism is essential to resolve this issue.
REFERENCES

[1] I. H. Osman and G. Laporte, “Metaheuristics: A bibliography,” Annals of Operations Research, vol. 63, no. 5, pp. 511–623, 1996.
[2] A. Törn, M. M. Ali, and S. Viitanen, “Stochastic global optimization: Problem classes and solution techniques,” Journal of Global Optimization, vol. 14, pp. 437–447, 1999.
[3] C. Blum and A. Roli, “Metaheuristics in combinatorial optimization: Overview and conceptual comparison,” ACM Comput. Surv., vol. 35, no. 3, pp. 268–308, 2003.
[4] M. Gendreau and J.-Y. Potvin, “Metaheuristics in combinatorial optimization,” Annals of Operations Research, vol. 140, pp. 189–213, 2005.
[5] T. Weise, Global Optimization Algorithms: Theory and Application, 2008.
[6] T. G. Crainic and M. Toulouse, Parallel Strategies for Meta-Heuristics. Springer New York, 2003, ch. 17, pp. 475–513.
[7] E. Alba, E.-G. Talbi, G. Luque, and N. Melab, Metaheuristics and Parallelism. Wiley, 2005, ch. 4, pp. 79–103.
[8] M. G. C. Resende and C. C. Ribeiro, Greedy Randomized Adaptive Search Procedures. Springer New York, 2003, ch. 8, pp. 219–249.
[9] E.-G. Talbi, “A taxonomy of hybrid metaheuristics,” Journal of Heuristics, vol. 8, no. 5, pp. 541–564, 1999.
[10] C. Blum and A. Roli, Hybrid Metaheuristics: An Introduction. Springer Berlin/Heidelberg, 2008, vol. 114, ch. 1, pp. 1–30.
[11] T. G. Crainic, Parallel Computation, Co-operation, Tabu Search. Springer US, 2005, vol. 30, ch. 13, pp. 283–302.
[12] S.-Y. Lee and K. G. Lee, “Synchronous and asynchronous parallel simulated annealing with multiple Markov chains,” IEEE Trans. Parallel Distrib. Syst., vol. 7, no. 10, pp. 993–1008, 1996.
[13] M. Dorigo and C. Blum, “Ant colony optimization theory: A survey,” Theoretical Computer Science, vol. 344, pp. 243–278, 2005.
[14] C. C. Sherbrooke, “METRIC: A multi-echelon technique for recoverable item control,” Operations Research, vol. 16, pp. 122–141, 1968.
[15] S. Graves, “A multi-echelon inventory model for a repairable item with one-for-one replenishment,” Management Science, vol. 31, pp. 1247–1256, 1985.
[16] P. Alfredsson and J. Verrijdt, “Modeling emergency supply flexibility in a two-echelon inventory system,” Management Science, vol. 45, pp. 1416–1431, 1999.
[17] M. A. Cohen, P. R. Kleindorfer, and H. L. Lee, “Near-optimal service constrained stocking policies for spare parts,” Operations Research, vol. 37, no. 1, pp. 104–117, 1989.
[18] K. E. Caggiano, P. L. Jackson, J. A. Muckstadt, and J. A. Rappold, “Optimizing service parts inventory in a multi-echelon, multi-item supply chain with time-based customer service-level agreements,” Operations Research, vol. 55, no. 2, pp. 303–318, 2007.
[19] J. Kilpi and A. P. J. Vepsäläinen, “Pooling of spare components between airlines,” Journal of Air Transport Management, vol. 10, pp. 137–146, 2004.
[20] H. Wong, G.-J. Van Houtum, D. Cattrysse, and D. Van Oudheusden, “Multi-item spare parts systems with lateral transshipments and waiting time constraints,” European Journal of Operational Research, vol. 171, pp. 1071–1093, 2006.
[21] F.-H. A. Lee, “Parallel Simulated Annealing on a Message-passing Multi-computer,” Ph.D. dissertation, Utah State University, 1995.
[22] M. G. Arenas, P. Collet, A. E. Eiben et al., “A framework for distributed evolutionary algorithms,” in 7th PPSN, 2002.
[23] C. Gagné, M. Parizeau, and M. Dubreuil, “Distributed BEAGLE: An environment for parallel and distributed evolutionary computations,” in 17th HPCS, 2003.
[24] E. Alba, F. Almeida, M. J. Blesa et al., “MALLBA: A library of skeletons for combinatorial optimisation,” in 8th Euro-Par, 2002.
[25] S. Cahon, N. Melab, and E.-G. Talbi, “ParadisEO: A framework for the reusable design of parallel and distributed metaheuristics,” Journal of Heuristics, vol. 10, no. 3, 2004.
[26] T. Erl, Service-Oriented Architecture: Concepts, Technology, and Design. Upper Saddle River, NJ, USA: Prentice Hall, 2005.
[27] Torque Resource Manager, http://www.clusterresources.com/products/torque/, Apr. 2010.
[28] R. T. Fielding, “Architectural Styles and the Design of Network-based Software Architectures,” Ph.D. dissertation, UCI, 2000.
[29] Linux NFS, http://nfs.sourceforge.net/, Apr. 2010.