Using Constraint Programming to Plan Efficient Data Movement on the Grid

Michal Zerola and Michal Šumbera
Nuclear Physics Institute, ASCR
Czech Republic

Roman Barták
Charles University
Czech Republic

Jérôme Lauret
Brookhaven National Laboratory
USA

Abstract

Efficient data transfers and placements are paramount to optimizing the use of geographically distributed resources and minimizing the time the processing tasks of data-intensive experiments would take. We present a technique for planning data transfers to a single destination using a Constraint Programming approach. We study enhancements of the model using symmetry breaking, branch cutting, well-studied principles from the scheduling field, and several heuristics. Real-life wide-area network characteristics are discussed, and the realization of the computed formal schedule is proposed with an emphasis on bandwidth saturation. The results include a comparison of the performance of, and the trade-off between, the CP techniques and a Peer-2-Peer model.

1. Introduction and related work

The needs of large-scale data-intensive projects arising in several fields, such as bio-informatics (BIRN, BLAST), astronomy (SDSS), or the High Energy and Nuclear Physics (HENP) communities, have challenged computer scientists for years. One such HENP experiment is the STAR experiment (http://www.star.bnl.gov). As in all running experiments, a valuable set of new data is taken every year and then reduced to physics-ready data formats. The ever-growing samples call for large-scale storage management and an efficient scheme to distribute and move the data, while end-users may need to access data sets from any point in time. Coordination is hence needed to prevent uncoordinated access from destroying efficiency, since common infrastructure such as storage and network is shared; this applies equally to the efficient use of Data Grids.

The decoupling of job scheduling from data movement was studied by Ranganathan and Foster in [6]. The authors discussed combinations of replication strategies and scheduling algorithms, but did not consider the performance of the network. The nature of HENP experiments, where data are centrally acquired, implies that replication to geographically spread sites is a must in order to process the data distributively.


Accessing large-scale data remotely over a wide-area network has turned out to be highly ineffective and a cause of troubles that are often hard to trace. The authors of [8] proposed and implemented improvements to Condor, a popular batch system. Its data management architecture is based on exploiting workflows and utilizing data dependencies between jobs through the study of the related DAGs. Since the workflow in HENP data analysis is typically simple and embarrassingly parallel, without dependencies between jobs, these techniques do not lead to a fundamental optimization in this field. Sato et al. in [7] and the authors of [5] tackled the question of replica placement strategies via mathematical constraints, modeling an optimization problem in a Grid environment. The solving approach in [7] is based on integer linear programming, while [5] uses a Lagrangian relaxation method. The limitation of both models is a characterization of data transfers that neglects the possible transfer paths and the option of fetching data from a site in parallel via multiple links, which can lead to better network utilization.

The purpose of our work is to design and develop an automated system that would efficiently use all network resources in order to bring the desired dataset to the required destination, without sacrificing fairness. The initial idea of our model originates from Simonis [9]; the constraints proposed there for the traffic placement problem were extended, primarily with link throughputs and, consequently, with the follow-up allocation of transfers in time. The solving approach is based on Constraint Programming techniques. One of the major advantages of the constraint-based approach is that the model can be gently augmented with additional real-life rules. This work expands the initial study in [11].

2. Problem formalization

In this section we present a formal description of the problem using mathematical constraints that capture the restrictions arising from reality. The network is modeled as a directed weighted graph G = (N, E), where the weight of an edge corresponds to the link bandwidth. The information about the origins of a demand d ∈ D, orig(d), is a mapping of that file to the set of sites where the file is available. The goal of the solver is to produce a transfer path for each file from one of its origins to the destination and, for each file and its selected transfer path, to allocate the particular link transfers in time such that the resulting plan has minimum makespan.

The solving process is composed of two iterative stages; we describe each of them separately and then explain their interaction and a possible reduction of the search-space exploration.
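For concreteness, below is a minimal Java sketch (Java being the implementation language of our solver, cf. section 4) of how the network and demand data could be represented; the class and field names are illustrative and not part of the formal model.

import java.util.List;
import java.util.Set;

// A directed link of the network graph; the weight (bandwidth) is the
// edge weight of G = (N, E).
final class Link {
    final String from;
    final String to;
    final double bandwidth;   // weight of the edge
    Link(String from, String to, double bandwidth) {
        this.from = from; this.to = to; this.bandwidth = bandwidth;
    }
}

// A file demand d in D: the set orig(d) of sites holding the file,
// and the common destination the plan must deliver it to.
final class Demand {
    final String fileName;
    final Set<String> origins;   // orig(d)
    final String destination;    // single destination shared by all demands
    List<Link> transferPath;     // output of the planning stage
    Demand(String fileName, Set<String> origins, String destination) {
        this.fileName = fileName; this.origins = origins; this.destination = destination;
    }
}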


Figure 1. The transfer paths of demands d1 and d2 share link e3, and consequently resource Re3 as well.


Figure 2. The job J = ⟨A3, A1, A2⟩ defines a transfer path to dest using links e3, e1, e2, with the order given by the precedences of the actions.

2.1. Planning and scheduling stage

The aim of the planning stage is to generate a valid transfer path for each requested file from one of its origins to the destination node. We have studied a link-based and a path-based approach. The essential idea of the first one is to use one decision variable (Xd,e) for each demand and link, while in the second one the variable is coupled with a demand and a path. A detailed description of the constraints for the link-based principle can be found in [11], while the constraint model for the path-based one is explained in [9].

The goal of the scheduling stage is to evaluate the path configuration in terms of the required makespan. Essentially, it works as an objective function, because the realization of the schedule will not depend on the particular transfer times calculated in this phase, as we will show in section 5. We use the notions of tasks and resources from the area of scheduling. For each link e of the graph that will be used by at least one file demand, we introduce a unique unary resource Re. Similarly, for each demand and its transfer path we introduce a set of tasks (depicted in Figure 1). We assume that all requested files are of identical unit size. This assumption is not far from reality, because data acquired from detector systems are usually stored in files of a similar size. Instead of defining the link speed as a function of file size over a period of time, we introduce the term slowdown factor, which quantifies how many units of time are needed for transferring one unit of size. This notion allows us to represent the constraints in a simpler way.
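A hedged sketch of how the scheduling-stage objects could be laid out: each used link becomes a unary resource and each demand contributes one task per link of its path, with duration equal to the link's slowdown factor (files have unit size). The types below are illustrative placeholders, not the actual Choco constructs used by the solver.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One task = one demand crossing one link; duration = slowdown * unit size.
final class TransferTask {
    final String demand;
    final String link;
    final int duration;   // slowdown factor, since the file size is 1 unit
    int start = -1;       // start time, decided by the scheduler
    TransferTask(String demand, String link, int duration) {
        this.demand = demand; this.link = link; this.duration = duration;
    }
}

final class SchedulingModel {
    // Unary resource per link: tasks mapped to the same link must not overlap in time.
    final Map<String, List<TransferTask>> resourceTasks = new HashMap<>();

    // Build the task chain for one demand; the path order (precedence constraints)
    // between consecutive tasks is enforced by the CP scheduler.
    void addDemandPath(String demand, List<String> pathLinks, Map<String, Integer> slowdown) {
        for (String link : pathLinks) {
            TransferTask t = new TransferTask(demand, link, slowdown.get(link));
            resourceTasks.computeIfAbsent(link, k -> new ArrayList<>()).add(t);
        }
    }
}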


2.2. Scale and complexity of the problem

We will show that the computational complexity of the problem, in particular of the scheduling stage, is strongly NP-hard. The input of the scheduling phase consists of a directed weighted graph (loops are allowed), where the weight of an edge determines its throughput, and a set of loop-free paths, all of them leading to the same vertex. Each path needs the time defined by the throughput of a particular edge for crossing it (for a given edge, this time is identical for all paths). Crossing an edge cannot be interrupted, and only one path can cross an edge at a given time. Each path must cross its edges in the order defined by the path itself. The goal is to allocate the edge crossings for every path in such a way that the time when the last path reaches the destination vertex is minimized.

We use the well-known three-field classification α|β|γ for scheduling problems, as described in [3]. We demonstrate a polynomial-time reduction from J3|pij = 1|Cmax, the 3-machine unit-time Job-shop scheduling problem (JSSP). The transformation of a JSSP instance is as follows: for every machine M1, M2, and M3 a unique link e1, e2, and e3 is created, together with a single destination site dest. Every job, with its ordered actions, defines a transfer path with alternating dummy edges, as depicted in Fig. 2. The slowdown of each dummy edge is set to 0, i.e., the edge does not cause delays to any transfer; the role of the dummy edges is to provide correct paths for each job. The transformation can be realized in polynomial time, as the size of the transformed instance increases only linearly, by a factor of 3, due to the dummy edges. Since the J3|pij = 1|Cmax problem belongs to the strongly NP-hard class [10], it follows directly that the computational complexity of our problem is at least the same.
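To illustrate the reduction, the sketch below (assuming jobs are encoded as ordered lists of machine indices) builds one transfer path per job: the machine links e1, e2, e3 carry slowdown 1, and zero-slowdown dummy edges are inserted between them and towards dest; the exact naming and placement of the dummy edges is our own choice for illustration.

import java.util.ArrayList;
import java.util.List;

final class JsspReduction {
    // Real machine edges e1, e2, e3 each take one time unit to cross.
    static final int[] MACHINE_SLOWDOWN = {1, 1, 1};

    // job = ordered list of machine indices (0..2), e.g. <A3, A1, A2> -> [2, 0, 1]
    static List<String> buildTransferPath(List<Integer> job, int jobId) {
        List<String> path = new ArrayList<>();
        for (int i = 0; i < job.size(); i++) {
            path.add("e" + (job.get(i) + 1));          // real edge, slowdown 1
            if (i < job.size() - 1) {
                path.add("dummy_" + jobId + "_" + i);  // dummy connector edge, slowdown 0
            }
        }
        path.add("dummy_" + jobId + "_dest");          // final dummy edge into dest
        return path;
    }
}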



3. Constraint model and solving strategy

The principle of the search procedure and the iteration of the described two-stage model is outlined in Algorithm 1. The current best makespan is used as a bound for the scheduling phase (branch-and-bound strategy). Moreover, the current best makespan can also be used effectively for pruning the search space during the first (planning) stage, as described in [11].

Algorithm 1 Pseudocode for a search procedure.


makespan ← sup
plan ← Planner.getFirstPlan()
while plan != null do
  schedule ← Scheduler.getSchedule(plan, makespan)   {B-a-B on makespan}
  if schedule.getMakespan() < makespan then
    makespan ← schedule.getMakespan()                 {better schedule found}
  end if
  plan ← Planner.getNextPlan(makespan)                {next feasible plan with cut constraint}
end while
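A possible Java rendering of Algorithm 1; the Planner and Scheduler interfaces stand in for the actual Choco-based components, and the explicit time budget mirrors the cumulative 30-second limit used in the experiments of section 4.

interface Plan {}
interface Schedule { int getMakespan(); }
interface Planner {
    Plan getFirstPlan();
    Plan getNextPlan(int makespanBound);              // next feasible plan with cut constraint
}
interface Scheduler {
    Schedule getSchedule(Plan plan, int makespanBound); // branch-and-bound on makespan
}

final class TwoStageSearch {
    static int solve(Planner planner, Scheduler scheduler, long timeLimitMillis) {
        int makespan = Integer.MAX_VALUE;             // "sup" in the pseudocode
        long deadline = System.currentTimeMillis() + timeLimitMillis;
        Plan plan = planner.getFirstPlan();
        while (plan != null && System.currentTimeMillis() < deadline) {
            Schedule schedule = scheduler.getSchedule(plan, makespan);
            if (schedule != null && schedule.getMakespan() < makespan) {
                makespan = schedule.getMakespan();    // better schedule found
            }
            plan = planner.getNextPlan(makespan);     // prune plans using the current bound
        }
        return makespan;
    }
}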


3.1. Symmetry breaking and implied constraints

One of the common techniques for reducing the search space is detecting and breaking variable symmetries. This is frequently done by adding variable symmetry-breaking constraints that can be expressed easily and propagated efficiently using an ordering. One idea that can be applied in the planning stage is the following. Let two file demands d1 and d2 have overlapping sets of origins O = orig(d1) ∩ orig(d2), such that |O| ≥ 2. The set of all outgoing edges from the common origins is denoted by L (L = ∪n∈O OUT(n)) and is equipped with a total order ≤. Then, if both file demands d1 and d2 leave their origins via links from the set L, we require an ordering between the particular links (∀a, b ∈ L : b < a ⟹ post the constraint Xd1,a = 0 ∨ Xd2,b = 0), as illustrated in Figure 3.

A similar idea can be applied in the scheduling stage as well. In this case, suppose that several file demands are going to be scheduled for transfer to the destination via exactly the same path. Since the file demands have identical size, we can fix the order in which the files are allocated to each link of their common path. More precisely, among the corresponding tasks assigned to each unary resource representing a link of the path, we introduce additional precedences fixing their order. Both presented methods contribute significantly to search-space reduction, because a real network topology and distribution of files tend to produce a vast number of symmetries.

The usual purpose of filtering techniques is to remove some local inconsistencies and thus delete some regions of the search space that do not contain any solution. By removing inconsistent values from variable domains, the efficiency of the search algorithms is improved. A filtering principle we can apply to the decision X variables in the link-based planning model is the following. Let S1 and S2 be two sets of decision X variables such that S2 ⊂ S1. Then, if there exists a constraint stating ∑X∈S1 X = 1 and similarly ∑X∈S2 X = 1 for the second set, we can reduce the domains of the variables in S1 \ S2 to the value 0 (depicted in Fig. 4). The filtering procedure is simple: while the constraints are posted into the model, each set of variables whose summation must equal 1 is stored; then we search for pairs S1 and S2 using a quadratic nested loop over the stored sets.
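A sketch of this filtering procedure in Java, with decision variables represented abstractly by string identifiers (the real implementation operates on Choco variables, so the names here are illustrative):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SumToOneFiltering {
    // Each element of 'sumToOneSets' is a set of 0/1 decision-variable ids whose sum must equal 1.
    // If S2 is a proper subset of S1, every variable in S1 \ S2 can be fixed to 0.
    static Set<String> variablesFixedToZero(List<Set<String>> sumToOneSets) {
        Set<String> fixedToZero = new HashSet<>();
        for (Set<String> s1 : sumToOneSets) {          // quadratic nested loop over stored sets
            for (Set<String> s2 : sumToOneSets) {
                if (s2 != s1 && s1.containsAll(s2) && s1.size() > s2.size()) {
                    for (String var : s1) {
                        if (!s2.contains(var)) {
                            fixedToZero.add(var);      // domain reduced to {0}
                        }
                    }
                }
            }
        }
        return fixedToZero;
    }
}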


Figure 4. The filtering procedure can reduce values, as can be seen in the right graph.


Figure 3. If demands d1 and d2 are assigned to links l1 and l3 respectively, the relation l1 ≤ l3 must hold.

3.2. Search heuristics

We have tested several combinations of variable-selection and value-iteration heuristics. Initially, in the planning phase, the well-known dom strategy [2] for labeling the decision X variables was tested. We also proposed a FastestLink selection for the link-based planning approach and, similarly, FastestPath for the path-based one. According to the measurements shown in section 4, the majority of the time is spent in the planning phase, hence we proposed an improved variable-selection heuristic, called MinPath, that better exploits the actual transfer times. All of the mentioned heuristics are described in [11].

In the scheduling phase two approaches were considered. The first one, called SetTimes, is based on determining Pareto-optimal trade-offs between makespan and resource peak capacity; a detailed description and explanation can be found in [4]. The second one is a texture-based heuristic called SumHeight, which orders tasks on unary resources. The implementation used originates from [1] and supports Centroid sequencing of the most critical activities.
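The precise definitions of these heuristics are given in [11]; purely as an illustration of the FastestLink idea, a variable-selection rule might pick an undecided X variable whose link has the smallest slowdown factor, i.e., the highest bandwidth. The sketch below is our own simplification, not the published heuristic.

import java.util.List;
import java.util.Map;

final class FastestLinkSelection {
    // varLink maps a decision-variable id X_{d,e} to its link e;
    // slowdown maps a link to its slowdown factor.
    static String selectVariable(List<String> undecidedVars,
                                 Map<String, String> varLink,
                                 Map<String, Integer> slowdown) {
        String best = null;
        int bestSlowdown = Integer.MAX_VALUE;
        for (String var : undecidedVars) {
            int s = slowdown.get(varLink.get(var));
            if (s < bestSlowdown) {        // prefer the fastest (lowest-slowdown) link
                bestSlowdown = s;
                best = var;
            }
        }
        return best;
    }
}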


4. Comparative studies

We have implemented and compared the performance of both alternatives of the model, namely the link-based and the path-based approach. Several combinations of heuristics were tried and, in addition, a comparison with a simulated Peer-2-Peer (P2P) method is shown. The P2P approach is based on a torrent-like principle, where files are pulled directly from their origins. For the implementation of the solver we use Choco (http://choco.sourceforge.net), a Java-based library for constraint programming. The Java-based platform allows easier integration with already existing tools in the STAR environment.

Table 1. Comparison of makespans (in general time units). ↗/↘ denote increasing/decreasing value selection; ST = SetTimes; SH = SumHeight heuristic.

                       Link-based                     Path-based
Files   P2P    dom↗ST  fast↘ST  fast↘SH     dom↗ST  fast↘ST  fast↘SH
  1       1       1       1        1            1       1        1
 20      24      12      12       12           77      15       15
 40      40      22      22       22          149      33       33
 60      56      42      42       42          237      48       48
 80      72      57      57       57            ?      64       64
100      88      73      73       73            ?      81       81
120     104      89      89       89            ?     104      103
140     120     103     103      103            ?     128      127
160     144     117     117      117            ?     150      149
180     152     137     134      132            ?     172      172
200     168     165     165      146            ?     193      192

Table 2. Results for the link-based approach with FastestLink selection in ↘ (decreasing) order for the planning phase and the SumHeight (SH) heuristic for the scheduling phase. Files = number of requested files to transfer; RunTime = time in seconds taken by the solver to generate the result; % in 1st / % in 2nd = percentage of the overall time spent in the planning / scheduling phase; # of 2nd = number of scheduling calls; P2P-M = time in general units to execute the plan from the P2P simulator; Makespan = time in general units to execute the plan from the CSP solver.

Files  RunTime  % in 1st  % in 2nd  # of 2nd  P2P-M  Makespan
  1      0.353      70%       30%        1       8        1
 20      3.548      77%       23%       18      24       12
 40         30      91%        9%      169      32       22
 60         30      98%        2%       42      56       43
 80         30      98%        2%       23      72       58
100         30      98%        2%       28      88       73
120         30      80%       20%      119     104       89
140         30      95%        5%       40     120      103
160         30      94%        6%       45     144      117
180         31      92%        8%       47     152      134
200         32      89%       11%       57     168      146

Regarding the data input, the realistic network graph consists of 5 sites, denoted BNL, LBNL, MIT, KISTI, and Prague, and all requested files are to be transferred to Prague. The distribution of file origins is as follows: the central repository is at BNL, where 100% of the files are available; LBNL holds 60%, MIT 1%, and KISTI 20% of all files. All presented experiments were performed on an Intel Core2 Duo CPU with 2GB of RAM, running Debian GNU/Linux. A CPU time limit was used for both phases; more precisely, if the top-level search loop detects that the time cumulatively spent in the planning and scheduling phases has exceeded 30 seconds, the search is terminated.

Table 1 compares six combinations of search heuristics for the link-based and path-based models with the Peer-2-Peer approach, with an emphasis on the makespan as the quality of the result. We can see that the link-based model generally gives better results than the path-based one, and the most efficient combination of heuristics is the FastestLink variable selection in decreasing value order with the SumHeight heuristic used in the scheduling phase. In this case the solver produced schedules that were better than P2P for all input instances, while the most significant benefits (≈ 50% gain) are for instances of up to 50 files. For instances of 40 and more files, all combinations of heuristics reached the time limit of 30 seconds; for P2P, all makespans were obtained in less than 1 second. Makespans are in general time units, where 1 unit in reality depends on the real link speeds, and we can roughly estimate one unit to be a couple of seconds. Hence, the time taken to compute a schedule is paid off by the savings resulting from a better makespan.

In reality the network characteristics are dynamic and fluctuate in time. Hence, trying to create a plan for 1000 or more files that will take several hours to execute is pointless, as by the time it has elapsed the computed plan may no longer be valid. Our intended approach is to work with batches, which gives us the additional benefit of implementing a fair-share mechanism in a multi-user environment. In particular, the requests coming from users are queued and differ in size and in the priorities of the users.


[Figure 5: two plots of makespan (units) versus solver time (sec), for 50 requested files and for 150 requested files, each comparing the FastestLink, MinPath, and Peer-2-Peer approaches.]

Figure 5. Convergence of the makespan during the search process for FastestLink and MinPath.

The possibility of picking demands from the waiting requests into a batch within reasonably short intervals is very convenient for achieving fair-shareness, and the experiments give us an estimate of the number of files per batch. For a better decomposition and an estimate of the time spent in the two phases, we studied heuristic combinations such as FastestLink + SumHeight (see Table 2). On the basis of the detailed measurements, we can see that the majority of the time spent in the solving process falls into the planning stage. Given this fact and the average number of scheduling calls (5th column in Table 2), the implementation of the scheduling stage seems to be fairly efficient. Therefore, we focused on improvements to the planning-stage heuristic, the MinPath heuristic proposed in section 3, and compared its performance with FastestLink. Figure 5 shows that the convergence of the new MinPath heuristic is faster than that of FastestLink, and both heuristics achieve a better makespan than the P2P approach.

5. Schedule execution

To evaluate the applicability of our model to the real world, let us briefly describe the TCP/IP protocol principle underlying the transfers. Transferring a single file consists of splitting the data content into packets of a given size (defined by the window scaling parameter of the protocol) and sending them one-by-one from the source to the destination.



Since the reception of each packet has to be acknowledged by the receiver to achieve a delivery guarantee, the time for the acknowledged packet to travel from source to destination and back, the so-called round-trip time (RTT), plays an important role. In a wide-area network the RTT is significant. To overcome such delays, data transfer tools use threads to handle several TCP streams in parallel in an attempt to smooth or minimize the intrinsic delays associated with TCP. Another standard approach is the use of multiple sender nodes per link to increase the bandwidth usage; with this last approach, downtime of any one instance is compensated by the transfers of the other active senders. Both approaches are typically combined for best results, and it has been experimentally shown that a flat transfer rate can be achieved across long distances.

Our modeling assumes a single file transfer at any time on a link, using unary resources. Trying to model the real network and packet behavior explicitly in the solver would over-complicate the problem. Our implementation will consider: (a) every link is supplied by one LinkManager responsible for transferring files over the link if the link is part of their computed transfer path; (b) as soon as a file becomes available, the corresponding LinkManager initiates another instance of the transfer, respecting a maximum allowed number of simultaneous parallel instances. The implementation considers just the plan, i.e., the computed transfer paths, because there is no due-time limitation, and it executes the transfers in a greedy manner. However, to allow this, one has to be sure that the computed time to complete the schedule does not differ substantially from the real execution of the transfers. Consequently, we developed a realistic network simulator along the above lines, and a comparison of the makespans showed results consistent with each other within a 3% margin, which is negligible. Hence the experiment confirmed that the presented model provides a fairly accurate estimate of the real makespan.
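A hedged sketch of this execution logic: one LinkManager per link that starts a transfer as soon as a file becomes available and caps the number of simultaneous transfer instances. The thread-pool realization below is our own simplification and not the actual transfer tool.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

final class LinkManager implements Runnable {
    private final String link;                          // the link this manager is responsible for
    private final BlockingQueue<String> readyFiles = new LinkedBlockingQueue<>();
    private final ExecutorService transfers;            // bounded pool = max parallel instances

    LinkManager(String link, int maxParallelTransfers) {
        this.link = link;
        this.transfers = Executors.newFixedThreadPool(maxParallelTransfers);
    }

    // Called when a file whose transfer path contains this link becomes available at the link's source.
    void fileAvailable(String file) {
        readyFiles.add(file);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String file = readyFiles.take();         // wait until some file is ready
                transfers.submit(() -> transfer(file));  // greedy start, bounded parallelism
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void transfer(String file) {
        // placeholder for the real transfer tool invocation over 'link'
        System.out.println("Transferring " + file + " over " + link);
    }
}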

6. Conclusions

In this paper we tackle the complex problem of efficient data movement on the network within a distributed environment. The problem itself arises from the real-life needs of the running nuclear physics experiment STAR and its peta-scale requirements for data storage and computational power. We presented a two-stage constraint model, coupling a path-planning and a transfer-scheduling phase, for data transfers to a single destination. The complexity of the problem is analyzed, and several techniques for pruning the search space, such as symmetry breaking, domain filtering, and implied constraints, are shown. We proposed and implemented several search heuristics for both stages and performed experiments to evaluate their applicability. The comparison of the results and the trade-off between the schedule of the constraint solver and a Peer-2-Peer simulator indicate that it is promising to continue this work and thus bring improvements over the current techniques used in the community. In the near future we will concentrate on the integration of the solver with real data transfer back-ends and, consequently, on executing tests in the real environment.

Acknowledgment

The investigations have been partially supported by the IRP AVOZ 10480505, by the Grant Agency of the Czech Republic under Contract No. 202/07/0079 and 20513/201457, by grant LC07048 of the Ministry of Education of the Czech Republic, and by the U.S. Department of Energy.

References

[1] J. C. Beck, A. J. Davenport, E. M. Sitarski, and M. S. Fox. Texture-based heuristics for scheduling revisited. In AAAI, pages 241–248. AAAI Press, 1997.
[2] S. W. Golomb and L. D. Baumert. Backtrack programming. J. ACM, 12(4):516–524, 1965.
[3] R. Graham, E. Lawler, J. Lenstra, and A. R. Kan. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. Discrete Math., 5:169–231, 1979.
[4] C. Le Pape, P. Couronne, D. Vergamini, and V. Gosselin. Time-versus-capacity compromises in project scheduling. In Proceedings of the Thirteenth Workshop of the U.K. Planning Special Interest Group, 1994.
[5] R. M. Rahman, K. Barker, and R. Alhajj. Study of Different Replica Placement and Maintenance Strategies in Data Grid. In CCGRID, pages 171–178. IEEE Computer Society, 2007.
[6] K. Ranganathan and I. Foster. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. Volume 0, pages 352–258. IEEE Computer Society, 2002.
[7] H. Sato, S. Matsuoka, T. Endo, and N. Maruyama. Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment. In GRID, pages 250–257. IEEE, 2008.
[8] S. Shankar and D. J. DeWitt. Data Driven Workflow Planning in Cluster Management Systems. In HPDC '07, pages 127–136, New York, NY, USA, 2007. ACM.
[9] H. Simonis. Constraint applications in networks. In F. Rossi, P. van Beek, and T. Walsh, editors, Handbook of Constraint Programming, chapter 25, pages 875–903. Elsevier, 2006.
[10] V. G. Timkovsky. Is a unit-time job shop not easier than identical parallel machines? Discrete Applied Mathematics, 85(2):149–162, 1998.
[11] M. Zerola, R. Barták, J. Lauret, and M. Šumbera. Planning Heuristics for Efficient Data Movement on the Grid. In Proceedings of the 4th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA), pages 768–771, 2009.

