Reliability-aware Optimal K-Node Allocation of ...

Reliability-aware Optimal K-Node Allocation of Parallel Applications in Large Scale HPC Systems

Narasimha R. Gottumukkala (1, 2), Chokchai Box Leangsuksun (2), Raja Nassar (2), Mihaela Paun (2, 3), Dileep Sule (2), Stephen L. Scott (4)

(1) Centre for Business and Information Technologies, University of Louisiana at Lafayette, LA
(2) College of Engineering & Science, Louisiana Tech University, LA
(3) Faculty of Mathematics and Informatics, Spiru Haret University, Bucharest, Romania
(4) Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN

1

This research is supported by the Department of Energy Grant no: DE-FG02-05ER25659, and 4 by the Mathematics, Information and Computational Sciences Office, Office of Advanced Scientific Computing Research, Office of Science


Introduction
• Present and future computational applications require massively parallel processors
  - See the Top500.org reports
• Unexpected failures and downtime are a major performance hindrance for large-scale parallel applications
  - A single node failure interrupts the entire parallel application running on all the nodes
• For MPI jobs, if one processor fails, the whole job running on all processors is aborted
  - Increasing the processor count increases the number of failures and the downtime, which degrades performance

Motivation
• Typical fault tolerance mechanisms for parallel jobs
  - Duplicate tasks/resources: too expensive
  - Failure prediction: not 100% accurate
  - Checkpoint/restart: complex and difficult in HPC
• Reliability-aware resource management
  - Allocates nodes such that the performance loss due to failures is minimal
  - Exploits the observation that individual nodes have time-varying failure rates

Scalability vs Reliability

• Increasing the number of nodes decreases reliability

Job run length

• Increasing job completion time decreases reliability

Scalability vs Reliability

[Figure: plot annotated with MTTR]

• Increasing the number of nodes decreases completion time

Problem Statement
• Given a parallel job that takes T hours to run on a single node, what is the optimal number of nodes that minimizes the completion time?
  - Minimize the failure probability and the waste time
  - Minimize the total completion time

4/4/08

Outline
• Related Work
• Resource Allocation Algorithms
• Expected Completion Time on K Nodes with Reliability
• Reliability Aware Optimal k-Node Allocation Algorithm
• Simulation Results
• Conclusion and Future Work

Related Work
• Current schedulers do not consider reliability as an important performance metric
  - FIFO (First In First Out)
  - Backfilling (move short jobs ahead in the queue as long as the long jobs ahead of them are not delayed)
• Scalability of parallel applications
  - Amdahl's Law
    • The gain in speedup diminishes and saturates at a limit set by the serial fraction, regardless of the problem size
  - Gustafson's Law
    • Any sufficiently large problem can be efficiently parallelized
• The importance of the number of processors has been studied for checkpointed applications [Plank et al 99]
• Reliability-aware resource allocation uses the observed reliability of nodes to minimize the waste time due to failures
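The two scalability laws named above can be compared numerically. The following is a minimal sketch using the standard textbook forms of both laws; the function names and the 5% serial fraction are illustrative assumptions, not values taken from the slides.

```python
def amdahl_speedup(serial_fraction: float, k: int) -> float:
    """Speedup on k processors for a fixed-size problem (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / k)


def gustafson_speedup(serial_fraction: float, k: int) -> float:
    """Scaled speedup on k processors when the problem grows with k (Gustafson's law)."""
    return k - serial_fraction * (k - 1)


if __name__ == "__main__":
    # Hypothetical serial fraction of 5%, chosen only for illustration.
    for k in (1, 16, 256, 4096):
        print(k, round(amdahl_speedup(0.05, k), 2), round(gustafson_speedup(0.05, k), 2))
```

Under Amdahl's law the speedup stays below 1 / serial_fraction however many processors are added, while the scaled speedup of Gustafson's law keeps growing linearly in k.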

Reliability aware optimal k-node allocation
• Idea:
  - Select k nodes out of n such that the expected completion time is minimal
• The expected completion time is based on
  - The reliability of k nodes
  - The completion time on k nodes
  - The expected repair time
  - The expected waste time in the presence of failures

Expected Completion Time of Parallel Program on k-Nodes

• The expected completion time is given by

$$E(T_c(k)) = T_c(k) + (W + R)\left(\frac{F_k}{1 - F_k}\right)$$

  where
  - T_c(k) is the actual running time of the job on k nodes
  - W is the expected waste time
  - R is the expected repair time
  - F_k is the system failure probability
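As a worked illustration, the expected completion time can be evaluated directly from the running time, the expected waste time, the expected repair time, and the failure probability. This is a minimal sketch of the formula that appears on the algorithm slide, E = T + (W + R) * F / (1 - F); the function and parameter names are hypothetical. One way to read the factor F / (1 - F), offered here as an interpretation rather than something the slides state, is as the expected number of failed attempts before the first successful run when attempts fail independently with probability F.

```python
def expected_completion_time(t_run: float, waste: float, repair: float,
                             fail_prob: float) -> float:
    """Evaluate E = T + (W + R) * F / (1 - F).

    t_run     -- running time of the job on k nodes (T)
    waste     -- expected waste time per failure (W)
    repair    -- expected repair time per failure (R)
    fail_prob -- probability that the k-node system fails during the run (F)
    """
    if not 0.0 <= fail_prob < 1.0:
        raise ValueError("fail_prob must lie in [0, 1)")
    expected_failures = fail_prob / (1.0 - fail_prob)
    return t_run + (waste + repair) * expected_failures


if __name__ == "__main__":
    # Hypothetical numbers: a 10-hour run, 5 hours wasted and 1 hour of
    # repair per failure, and a 20% chance of failure during the run.
    print(expected_completion_time(10.0, 5.0, 1.0, 0.2))
```

With a failure probability of zero the expression reduces to the plain running time, and it grows without bound as the failure probability approaches one.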

Expected completion time: with scalability models

[Figure panels: Amdahl's Law; Gustafson's Law]

• Beyond a certain number of nodes the expected completion time stops improving, because the higher failure probability requires resubmission

Reliability Aware Optimal K-Node Allocation Algorithm

$$R_k = \prod_{i=1}^{k} R_i(t_i + x \mid t_i)$$

$$E(T_c(k)) = T_c(k) + (W + R)\left(\frac{F_k}{1 - F_k}\right)$$

• Optimal k-node algorithm:
  - Select k out of N nodes such that the expected completion time is minimal
• Each node has a different reliability, and increasing the number of nodes decreases reliability
• Increasing the number of nodes also decreases completion time
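The selection step can be sketched as a search over k. This is a minimal sketch under stated assumptions, not the authors' implementation: each node's conditional reliability for the job is supplied as a number in (0, 1]; the system failure probability is taken as one minus the product of the selected nodes' reliabilities; the running time on k nodes follows Amdahl's law; and the waste and repair times are constants. For each k the sketch picks the k most reliable nodes, which minimizes the failure probability for that k. All names are hypothetical.

```python
from math import inf
from typing import List, Tuple


def select_optimal_k(node_reliabilities: List[float], t_single: float,
                     serial_fraction: float, waste: float,
                     repair: float) -> Tuple[int, float]:
    """Return (k, expected completion time) minimizing E over k = 1..N."""
    ranked = sorted(node_reliabilities, reverse=True)  # most reliable first
    best_k, best_time = 0, inf
    r_k = 1.0  # running product of the k best node reliabilities
    for k, r in enumerate(ranked, start=1):
        r_k *= r
        if r_k <= 0.0:
            break  # certain failure: E is unbounded for this and larger k
        f_k = 1.0 - r_k  # assumed: system fails if any selected node fails
        # Assumed scalability model: Amdahl's law.
        t_k = t_single * (serial_fraction + (1.0 - serial_fraction) / k)
        e_k = t_k + (waste + repair) * f_k / (1.0 - f_k)
        if e_k < best_time:
            best_k, best_time = k, e_k
    return best_k, best_time


if __name__ == "__main__":
    # Hypothetical system: 100 nodes, each 99% likely to survive the job.
    print(select_optimal_k([0.99] * 100, 100.0, 0.0, 40.0, 10.0))
```

When every node is perfectly reliable the search returns all N nodes, and when failures are likely and costly it falls back to few nodes, which reflects the trade-off between shorter running time and lower reliability described above.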

Reliability aware optimal k-node algorithm: An Example Case

[Figure: optimal k-node allocation; surviving labels: 620, x8]

• The figure shows how an optimal k is selected out of N nodes
• The algorithm selects k nodes such that the expected completion time is minimal

Simulation study
• Failure Data
  - Used the failure properties of ASC White
  - ASC White:
    • LLNL system, 4-year failure data from 7/1/2000 to 10/1/2004
    • 512 nodes, 8192 processors
• Parallel Job Workloads
  - Generated a synthetic workload based on the distribution of job run-lengths and the distribution of the number of processors [Lublin01]
  - Used the log-uniform distribution to generate the number of nodes, and a two-stage hyper-exponential distribution to generate job run-lengths
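A workload generator of this kind can be sketched as follows. This is only an illustration of the two distribution families named above; every parameter value (node range, branch probability, mean run-lengths) is a hypothetical placeholder and not a value from the cited workload model or from the simulation study.

```python
import math
import random
from typing import List, Tuple


def sample_num_nodes(rng: random.Random, min_nodes: int = 1,
                     max_nodes: int = 512) -> int:
    """Draw a node count whose base-2 logarithm is uniformly distributed."""
    log_n = rng.uniform(math.log2(min_nodes), math.log2(max_nodes))
    return min(max_nodes, max(min_nodes, round(2.0 ** log_n)))


def sample_run_length(rng: random.Random, p_short: float = 0.7,
                      mean_short: float = 0.5, mean_long: float = 24.0) -> float:
    """Draw a run-length in hours from a two-stage hyper-exponential distribution.

    With probability p_short the job comes from the short-job exponential
    branch, otherwise from the long-job exponential branch.
    """
    mean = mean_short if rng.random() < p_short else mean_long
    return rng.expovariate(1.0 / mean)


def generate_workload(n_jobs: int, seed: int = 0) -> List[Tuple[int, float]]:
    """Return a list of (number of nodes, run-length in hours) pairs."""
    rng = random.Random(seed)
    return [(sample_num_nodes(rng), sample_run_length(rng)) for _ in range(n_jobs)]


if __name__ == "__main__":
    for nodes, hours in generate_workload(5, seed=42):
        print(nodes, round(hours, 2))
```

Seeding the generator makes the synthetic workload reproducible, so that different allocation policies can be compared on the same job stream.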

Simulation Framework
• Performance Metrics
  - Job completion time
    • MCT (Mean Completion Time) = total completion time / unit job run-length
  - Waste time
    • MWT (Mean Waste Time) = total waste time / unit job run-length
    • where unit job run-length = job run-length / number of processors
  - Relative Percentage Difference (RPD) = [equation missing]

Resource Allocation Algorithms
• RR (Round Robin)
  - Allocates the job to k adjacent nodes based on the rotation policy of node ids
• All (select all available nodes)
  - Selects all the available nodes in the system
• RAS (select m nodes out of N, m fixed)
  - The number of processors m for a job is given by the user (fixed). The algorithm selects the m most reliable processors available for every job
• RA-Opt (select k nodes out of N, k not fixed)
  - Selects k nodes out of N such that the expected completion time is minimal
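The three baseline policies can be sketched as simple selection functions. This is one possible reading of the descriptions above, not the simulator's code: the rotation in round robin is modelled as a start offset into the sorted node ids, and node reliabilities are supplied as a dictionary from node id to a reliability value. All names are hypothetical.

```python
from typing import Dict, List


def round_robin(available: List[int], k: int, start: int) -> List[int]:
    """RR: take k adjacent node ids beginning at offset `start`, wrapping around."""
    ids = sorted(available)
    if k > len(ids):
        raise ValueError("not enough nodes available")
    return [ids[(start + i) % len(ids)] for i in range(k)]


def select_all(available: List[int]) -> List[int]:
    """All: use every available node."""
    return sorted(available)


def ras(reliability: Dict[int, float], m: int) -> List[int]:
    """RAS: pick the m most reliable nodes; m is fixed by the user."""
    if m > len(reliability):
        raise ValueError("not enough nodes available")
    return sorted(reliability, key=reliability.get, reverse=True)[:m]


if __name__ == "__main__":
    nodes = {0: 0.90, 1: 0.99, 2: 0.50, 3: 0.95}
    print(round_robin(list(nodes), 3, start=2))
    print(select_all(list(nodes)))
    print(ras(nodes, 2))
```

RA-Opt differs from RAS only in that the number of nodes is not given by the user but chosen to minimize the expected completion time.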

Experimental Results (MCT)

• Mean Completion Time
  - MCT is lower for RA-Opt than for all other techniques
  - The RPD (Relative Percentage Difference) shows by what percentage RA-Opt outperforms each technique

Experimental Results (MWT)

• Mean Waste Time
  - MWT is lower for RA-Opt than for all other techniques
  - MWT is higher for All (selecting all available nodes) than for the other techniques

Experimental Results (MWT with different job run-lengths)

• The MWT increases as job run-lengths increase
  - Short and medium jobs do not fail very often, whereas longer jobs have a higher chance of failure
• RA-Opt has the lowest MWT of all the techniques, especially for very long jobs

Conclusions
• Several factors affect the completion time of a parallel program as the number of nodes is scaled up
  - Reliability becomes a major factor in deciding the optimal number of nodes to minimize completion time
• Developed a reliability-aware optimal k-node allocation algorithm
  - Based on the expected completion time function on k nodes
• Simulation Results
  - Long jobs in particular can benefit from the reliability-aware optimal k-node allocation algorithm

Future Work
• The reliability-aware optimal k-node allocation can be combined with various scheduling algorithms
  - Investigate whether further improvement is possible
• Developing reliability-aware optimal k-node allocation for checkpointed jobs
  - Select the optimal k nodes by considering the checkpoint overhead
  - The importance of selecting the number of processors for checkpointing has been noted by Plank et al.

Future Work
• Different scalability models for expected completion times
• This work can be extended to malleable jobs
  - Jobs whose requirements change during runtime
• Reliability-aware resource management/allocation for time-sharing applications
  - More than one job is affected by a failure

References
[Plank et al 99] James S. Plank and Michael G. Thomason, "The Average Availability of Parallel Checkpointing Systems and Its Importance in Selecting Runtime Parameters," 29th International Symposium on Fault-Tolerant Computing, Madison, WI, June 1999, pp. 250-259.
[Kumar et al 91] V. Kumar and A. Gupta, "Analysis of scalability of parallel algorithms and architectures: a survey," Proceedings of the 5th International Conference on Supercomputing (ICS '91), Cologne, West Germany, June 17-21, 1991, E. S. Davidson and F. Hossfield, Eds., ACM, New York, NY, pp. 396-405.
[Gottumukkala et al 07] Narasimha Raju Gottumukkala, Chokchai Leangsuksun, Raja Nassar, and Stephen L. Scott, "Reliability-Aware Resource Allocation in HPC Systems," Proceedings of the IEEE International Conference on Cluster Computing 2007, Austin, Texas.
[Lublin01] U. Lublin and D. G. Feitelson, "The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs," Journal of Parallel & Distributed Computing, vol. 63, no. 11, pp. 1105-1122, November 2003.

The End! Thank you
