Computational Experience with Branch, Cut and Price Algorithms in ...

4 downloads 239 Views 2MB Size Report
frameworks using a parallel approach to solve optimization problems, while the potential of computing ... schema [4]. SWI-Portal is a web portal that manages users and jobs. ... tree structure called search tree, and a bounding one, where they are evaluated .... Toolkit and the local Sun Grid Engine (SGE) scheduler. The Sun ...
Computational Experience with Branch, Cut and Price Algorithms in Grid Environments Sonya Marcarelli, Emilio Pasquale Mancini, and Umberto Villano Universit` a del Sannio, Dipartimento di Ingegneria, RCOST, Benevento, Italy {sonya.marcarelli, epmancini, villano}@unisannio.it

Abstract. This paper presents our computational experience with parallel Branch, Cut and Price algorithms in a geographically-distributed grid environment. After a brief description of our framework for solving large-scale optimization problems, we describe the experimental grid environment and the tests performed, presenting the obtained performance results.

1

Introduction

In the field of integer optimization, Branch and Bound is one of the most common methods used to solve hard optimization problems. It uses a divide-and-conquer strategy to explore the set of feasible solutions and takes trace of them using a search tree. Unfortunately, many real-world problems are NP-Hard and may require search trees of exponential size. Then, it is natural to try to parallelize the search in order to make the solution more practical. Currently, there are several frameworks using a parallel approach to solve optimization problems, while the potential of computing grids seems to have been only partially exploited. This paper aims to explore this field, since it describes our computational experience in using the Branch, Cut and Price platform, described in [1], for solving largescale optimization problems in a grid environment. In a previous paper [1], we have shown the tests performed in a cluster of Globus nodes on a single LAN. Here, the target computing environment is a grid made up of two clusters belonging to two different LANs, where the front-ends have public addresses and the compute nodes have private addresses. The software system developed for our test is composed of two framework, BCP-G and Meta-PBC, and a web portal, SWI-Portal. BCP-G is a customized version of COIN/BCP, an open source framework developed within the IBM COIN-OR project. The original COIN/BCP framework, based on the use of PVM libraries, has been provided with a new MPI communication API able to exploit the MPICH-G2 system, a grid-enabled MPI implementation [2, 3]. Meta-PBC is instead a brand new framework, implementing a master-worker schema [4]. SWI-Portal is a web portal that manages users and jobs. In the next section, we introduce the Branch, Cut and Price algorithms and the architecture of our grid-enabled system. Then, we describe our experimental grid environment with the interconnection network used and we present G. Min et al. (Eds.): ISPA 2006 Ws, LNCS 4331, pp. 125–134, 2006. c Springer-Verlag Berlin Heidelberg 2006 

126

S. Marcarelli, E.P. Mancini, and U. Villano

several case studies and the obtained performance results. The paper closes with a discussion on future work and our conclusions.

2

The Branch, Cut and Price Algorithms

Branch and Bound algorithms are among the most widely used methods for solving complex optimization problems [5]. An optimization problem is the task of minimizing (maximizing) an objective function, a function that associates a cost to each solution. Branch and Bound uses a divide-and-conquer strategy that partitions the solution space into subsets. As is well known, it is made up of two phases: a branching one, where the subsets of solutions are examined forming a tree structure called search tree, and a bounding one, where they are evaluated finding upper and lower bounds to the optimal solution. LP-based Branch and Bound is a Branch and Bound where the lower bound is computed solving the LP-relaxation of the problem. For example, in a generic MILP (Mixed Integer Linear Programming) problem, there is a finite number of variables subject to the integrality constraint. By relaxing this constraint, we have a LP-relaxation, whose optimal value is a lower bound to the original problem. A typical branching operation is to select a variable with fractional value in the LP solution. Branch and Cut algorithms are a type of Branch and Bound where a finite number of cuts, that is, valid inequalities, are dynamically added to the search tree, in order to improve the lower bound to the LP-relaxation [6, 7]. Branch and Price algorithms are instead based on column generation in order to solve problems with a very large number of variables. They use initially only a small subset of the problem variables and of the respective columns in the constraints matrix, thus defining a reduced problem. In fact, in the original problem, there are too many columns and great part of them will have the respective variables equal to zero in an optimal solution. Branch, Cut and Price joins the two methods used by Branch and Cut and Branch and Price, producing dynamically both cutting planes and variables [5].

3

The Architecture of the Solver System

The architecture of our solver [1] is shown in Fig. 1. In the figure, the upper layer is the portal interface. In the middle, there are the two framework BCP-G and MetaPBC, all of which rely on the lower layer (Globus and MPICH-G2). BCP-G is an optimization framework based on the Branch, Cut And Price method, which we have implemented from COIN/BCP, adding to it a new communication interface written in MPI. COIN/BCP is an open-source framework based on Branch, Cut and Price for solving mixed integer programming problems [8]. It offers a parallel implementation of the algorithm based on the message-passing environment PVM (Parallel Virtual Machine). COIN/BCP implements a single-node pool algorithm, where there is a single central list

Computational Experience with Branch, Cut and Price Algorithms

127

of candidate sub-problems to be processed, owned by the tree manager. The modules communicate with each other by exchanging messages through a messagepassing protocol defined in a separate communications API.

Fig. 1. Solver architecture

Our communication interface in MPI is implemented by the two classes BCP mpi environment and BCP mpi id, which manage the communications between computational modules and the process ids, respectively. This new interface, which is now integrated in the COIN-OR framework (http://www.coin-or.org/ download.html), allows the use of this framework in a Globus grid environment through the grid-enabled implementation of MPI, MPICH-G2. The user has simply to write a Globus rsl script and, through the globusrun command, he/she can launch the solver execution [9]. Meta-PBC is a parallel solver for solving optimization problems using the Branch and Cut algorithm. It consists of three modules: manager, worker and tree monitor [4]. The manager is responsible for the initialization of the problem and manages the message handling between the workers. The worker is a sequential Branch and Cut solver, with additional functionality to communicate in the parallel layer. The workers communicate with each other through the parallel API to know the state of the overall solution process. The parallel interaction between modules is achieved by a separate communication API. The current version is implemented in MPI. In this way, the processes can be executed on a grid using MPICH-G2. The tree monitor collects information about the search tree. SWI-Portal is a web portal that allows users to submit jobs and hence to solve optimization problems, to monitor their job, to view their output and to download the results. Users interact with the portal, and, therefore, with the solvers and the grid, through this interface. SWI-Portal is implemented using the Java Server Pages technology (JSP). It consists of an user interface and of a set of Java classes, wrapping the most important and useful Globus functions. Furthermore, it uses a database to collect information on users, jobs and resources.

128

S. Marcarelli, E.P. Mancini, and U. Villano

All the services and the functions supplied by the SWI-Portal are grouped in four subsystems. The first, the account subsystem, is responsible for managing user access in conjunction with the users DB. It allows a user to register in the system and to enter the portal, giving his login and password. The second one is the scheduling subsystem. It invokes the Globus gatekeeper and the associated job-manager to start the run with the parameters supplied by the user. The subsystem also records information about the runs in the database. From the pages of the Monitoring subsystem, a user can check the status and any other information about all the started processes (such as output, error, rsl, and search tree). Through the Download Subsystem, a user can download all information regarding his jobs and/or cancel this from the server. More details about the solver system architecture can be found in [1].

4

The Grid Testbed

To test the system on slow extra-LAN connections, we have configured an experimental grid environment made up of two Rocks clusters (fab4 and e-science) at two different sites at the University of Sannio [10]. Fig. 2 shows the architecture of our grid environment. Each cluster has front-end with public IP and compute nodes with hidden IPs. We used 4 workstation on fab4, each equipped with Pentium Xeon, 2.8 GHz CPU and 1 GB of RAM, and 16 workstations on e-science, each equipped with Pentium Xeon, 2.8 GHz CPU and 1 GB of RAM. The intra-cluster connection is 100 GigaEthernet. The two clusters are not on the same campus LAN, and are actually connected by a very slow connection (the details are provided later). We installed on each front-end the Globus Toolkit and the local Sun Grid Engine (SGE) scheduler. The Sun Grid Engine is a distributed resource management (DRM) software and it provides functions to utilize effectively the resources within the cluster as submitting, monitoring and managing jobs. In particular, the Globus gatekeeper uses the SGE scheduler as its job-manager. On each head node, we installed MPICH-G2, which allows intercluster and intra-cluster communication. MPICH-G2, based on Globus Toolkit services, allows to run MPI application on a grid environment. It uses TCP for inter-machine messaging and a vendor-supplied MPI (where available) for intramachine messaging. MPICH-G2 requires point-to-point communication between the nodes where the jobs are running. Unfortunately, this requires that all compute nodes have public IP addresses, but this is in contrast with a classical cluster configuration like ours, where the compute nodes have private IPs. In order to solve this problem, and to use all of the processors available, we have chosen a solution based on the Realm Specific IP (RSIP) framework and protocol. RSIP is a network address translation technology that performs a function similar to NAT. It allows the communication between two hosts belonging to different address spaces. In our solution, we installed on each head node an RSIP server and on each compute node an RSIP client. When a compute node of e-science (RSIP client) wishes to contact a node of fab4, it queries the RSIP server for a port number and a public IP address. The client then tunnels the packets to the

Computational Experience with Branch, Cut and Price Algorithms

129

Fig. 2. The Grid Testbed

Fig. 3. Bandwidth in log scale

RSIP server, which strips off the tunnel headers and sends the packets to the target node. On incoming packets, the RSIP server looks up the client IP, based on port number, adds the tunnel header and sends them to the RSIP client. We measured the bandwidth of the extra-LAN connection between the two clusters, including the overhead of the RSIP protocol, through a simple MPI ping-pong program, which calculates the communication time between two processes using blocking send and receive (Fig. 3). The figure shows the bandwidth of the intra-cluster network, on e-science and fab4, and of the inter-cluster

130

S. Marcarelli, E.P. Mancini, and U. Villano Table 1. Transmission latency

inter-cluster e-science fab4

Latency (µs) 650158.2 2067.8 1994.08

network. Moreover, Table 1 shows the transmission latency measured between two nodes in a single cluster, and in different clusters (inter-cluster).

5

Case Studies

We present here the performance results obtained by our solver for the solution of an optimization problem in the above-described grid environment. In particular, we have implemented a generic MIP solver to solve mixed integer linear programming problems. A MIP problem has the following form: min cT x s.t. Ax ≥ b xz ∈ Z n , xc ∈ Rn where c ∈ Rn , A ∈ Rm × n, b ∈ Rm . In the computational experiments of BCPG with the MIP solver we take advantage of the MIPLIB library [11], which, since its introduction, has become a standard test set, and is commonly used to compare the performance of mixed integer optimizers. As LP solver, we use CLP, an open source solver of the COIN-OR project. Table 2 shows the details of the tested problem instances. Column name is the name of the problem instance, rows the number of constraints, cols the number of variables, ints the number of integer variables and nonzeros the number of nonzero elements in the constraints matrix. Table 2. Tested instances name Stein45 Misc07 10teams

rows 331 212 230

cols 45 260 2025

ints 45 259 ——

nonzeros 1034 8619 12150

Figures 4(a), 4(b), 4(c) compare the response times of BCP-G using different scheduling strategies. Using the first strategy, s1, slave processes first are spawned on e-science, and, only when all the nodes of this cluster have been used, on fab4. In the second one, s2, we spawn immediately slave processes alternatively on both clusters. Of course, in the second case the response time is higher, especially for low number of hosts, due to the latency introduced by the inter-cluster connection (its max measured bandwidth is 0,2 MB/s). All the tests presented show that from 16 hosts onward the use of additional hosts does not involve any significant gain in the solver performance. The reason is the used

Computational Experience with Branch, Cut and Price Algorithms

(a) stein45

131

(b) misc07

(c) 10teams Fig. 4. Response times for the stein45, misc07 and 10teams problem, using two scheduling strategies on a variable number of hosts

Fig. 5. The centralized approach

parallelization strategy along with the architecture of our grid environment. The solver system uses a classical centralized approach where all the communications are between the master and the slave processes. During the execution, there is a large amount of data exchanged between the master and the slave processes. In our tests, the master process, the tree manager, is on a compute node on e-science and the slave processes are on all the other compute nodes. The use of the additional compute nodes of fab4 increases the time spent in communication and hence the parallel overhead because of the high transmission latency of the extra-LAN connection. The alternative is to use a decentralized approach where each cluster has its local pool of problems to solve in order to reduce the communication on slow connections. Figure 5 and 6 show a centralized approach,

132

S. Marcarelli, E.P. Mancini, and U. Villano

Fig. 6. The decentralized approach

like our solver, and a decentralized approach in a grid environment made up of three clusters on three LANs.

6

Related Work

Many software packages implementing parallel branch and bound have been developed. SYMPHONY [5] is a parallel framework, similar to COIN/BCP, for solving mixed integer linear programs; PICO [12]. PARINO [13] and FATCOP [14, 15] are generic parallel MIP solvers. Some other parallel solver are PUBB [16] and PPBB-Lib [17]. ALPS [18] is a framework for implementing parallel graph search algorithms and MW [19] is a framework for making master-worker application in grid-environment using Condor. The literature dealing with the management and the performance of MPI applications in grid environments made up of private IP clusters is relatively limited. The paper [20] presents MPICH-GP, an extension of MPICH-G2 for supporting Private IP, whereas [21] describes a solution based on IMPI standard with Network Address Translation mechanism and [22] proposes a solution based on RSIP. Papers on similar topics are [23] and [24]. The paper [25] presents a performance analysis on hierarchical grid system with different bandwidths between clusters.

7

Conclusions and Future Work

In this paper, we have described the configuration of our experimental grid environment, which is made up of two clusters located in two different LAN. We have described the problems encountered using MPICH-G2 in such environment, where the compute nodes have hidden IPs, showing the solution based on RSIP and its performance evaluation. The computational tests performed and presented here led to unsatisfactory results, in that the high latency of the extra-LAN connection minimizes the performance gains due to the use of a high number of compute nodes. A grid is a set of resources of heterogeneous nature with different computational power connected by network with different performance characteristics. In this context, it is necessary to make grid-aware the

Computational Experience with Branch, Cut and Price Algorithms

133

application in order to achieve good performance. A simple master-worker approach, as our tests prove, is not a good solution because does not take into account the topology of the grid environment. In our future work, we wish to change the architecture of the solver system using a decentralized approach. We would divide the search tree in many sub-trees and assign one of them to each cluster, which will solve it individually (Fig. 6). In this way, we think that the overhead introduced by slow networks, as in the case described in our tests, should be reduced.

References 1. Mancini, E., Marcarelli, S., Ritrovato, P., Vasil’ev, I., Villano, U.: A grid-aware branch, cut and price implementation. Lecture Notes in Computer Science 3666 (2005) 38–47 2. Ferreira, L., Jacob, B., Slevin, S., Brown, M., Sundararajan, S., Lepesant, J., Bank, J.: Globus Toolkit 3.0 Quick Start. IBM. (2003) 3. Karonis, N., Toonen, B., Foster, I.: MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface. J. of Parallel and Dist. Comp. 63 (2003) 551–563 4. Vasil’ev, I., Avella, P.: PBC: A parallel branch-and-cut framework. In: Proc. of 35th Conference of the Italian Operations Res. Society, Lecce, Italy (2004) 138 5. Ralphs, T., Ladanyi, L., Saltzman, M.: Parallel Branch, Cut, and Price for LargeScale Discrete Optmization. Mathematical Programming 98 (2003) 253–280 6. Margot, F.: BAC: A BCP Based Branch-and-Cut Example. (2003) 7. Cordiery, C., Marchandz, H., Laundyx, R., Wolsey, L.: bc-opt: a Branch-and-Cut Code for Mixed Integer Programs. Mathematical Programming (86) (1999) 335– 354 8. Ralphs, T., Ladanyi, L.: COIN/BCP User’s Manual. (2001) http://www.coin-or. org/Presentations/bcp-man.pdf. 9. Globus Alliance: WS GRAM: Developer’s Guide. (2005) http://www-unix. globus.org/toolkit/docs/3.2/gram/ws/developer. 10. Papadopoulos, P.M., Katz, M.J., Bruno, G.: NPACI Rocks: tools and techniques for easily deploying manageable Linux clusters. Concurrency and Computation: Practice and Experience 15 (2003) 707–728 11. Bixby, R.E., Ceria, S., McZeal, C.M., Savelsbergh, M.W.P.: An updated mixed integer programming library MIPLIB 3.0. Optima (58) (1998) 12–15 12. Eckstein, J., Phillips, C., Hart, W.: Pico: An object-oriented framework for parallel branch and bound. Technical report, Rutgers University, Piscataway, NJ (2000) 13. Linderoth, J.: Topics in Parallel Integer Optimization. PhD thesis, School of Industrial and Systems Engineering, Georgia Inst. of Tech., Atlanta, GA (1998) 14. Chen, Q., Ferris, M.C.: Fatcop: A fault tolerant condor-pvm mixed integer programming solver. Technical report, University of Wisconsin CS Department Technical Report 99-05, Madison, WI (1999) 15. Chen, Q., Ferris, M., Linderoth, J.: Fatcop 2.0: Advanced features in an opportunistic mixed integer programming solver. Annals of Op. Res. (103) (2001) 17–32 16. Shinano, Y., Higaki, M., Hirabayashi, R.: Control schemas in a generalized utility for parallel branch and bound. In: Proc. of the 1997 Eleventh International Parallel Processing Symposium, Los Alamitos, CA, IEEE Computer Society Press (1997) 17. Tschoke, S., Polzer, T.: Portable Parallel Branch-And-Bound Library PPBB-Lib User Manual. Department of computer science Univ. of Paderborn. (1996)

134

S. Marcarelli, E.P. Mancini, and U. Villano

18. Ralphs, T.K., Ladanyi, L., Saltzman, M.J.: A library hierarchy for implementing scalable parallel search algorithms. J. Supercomput. 28(2) (2004) 215–234 19. Goux, J., Kulkarni, S., Yoder, M., Linderoth, J.: An enabling framework for masterworker applications on the computational grid. In: HPDC ’00: Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing (HPDC’00), Washington, DC, USA, IEEE Computer Society (2000) 43 20. Park, K., Park, S., Kwon, O., Park, H.: Mpich-gp: A private-ip-enabled mpi over grid environments. In: Lecture Notes in Computer Science, Proc. of Parallel and Distributed Processing and Applications, Second International Symposium (ISPA04). Volume 3358., Berlin, DE, Springer (2004) 469–473 21. Velusamy, V., Bangalore, P., Raman, P.: Communication strategies for privateip-enabled interoperable message passing across grid environments. In: Proc. of First International Workshop on Networks for Grid Applications. (2004) http://www.broadnets.org/2004/workshop-papers/Gridnets/Velusamy V.pdf. 22. Das, D., Sabharwal, R., Saraswati, S., Anantharaman, P.N., Oh, J.: A network architecture for enabling execution of mpi applications on the grid. International Journal of Information Technology 11(4) (2004) 74–83 23. Heymann, E., Senar, M.A., Fern´ andez, E., Fern´ andez, A., Salt, J.: Managing mpi applications in grid environments. In Dikaiakos, M.D., ed.: Grid Computing: Second European AcrossGrids Conference, Lecture Notes in Computer Science. Volume 3165. (2004) 42–50 24. Choi, S., Park, K., Han, S., Park, S., Kwon, O., Kim, Y., Park, H.: An nat-based communication relay scheme for private-ip-enabled mpi over grid environments. In: International Conference on Computational Science. (2004) 499–502 25. Chen, C., Schmidt, B.: Performance analysis of computational biology applications on hierarchical grid systems. In: CCGRID, IEEE Computer Society (2004) 426–433

Suggest Documents