Performability Modeling for Scheduling and Fault Tolerance Strategies for Scientific Workflows
Lavanya Ramakrishnan (Indiana University, Bloomington, IN) and Daniel A. Reed (Microsoft Research, Redmond, WA)
Proceedings of HPDC 2008

Presenter: Sean

Outline

1. Introduction
2. Reliability Specification
3. Performability Analysis
4. Evaluation


Lavanya Ramakrishnan: Brief CV

Ph.D. student at the time of this work; has since graduated and is now at MCNC.

Ph.D., Indiana University, 2008 (expected), advisor: Dennis Gannon. M.Sc., Indiana University, 2002. B.Sc., University of Mumbai, 2000.

Research interests: distributed systems, including grid computing, high performance computing and utility computing, workflow tools, resource management, and monitoring and adaptation for performance and fault tolerance.

Publications:
- Lavanya Ramakrishnan and Daniel A. Reed. "Performability Modeling for Scheduling and Fault Tolerance Strategies for Scientific Workflows", HPDC 2008.
- Lavanya Ramakrishnan, Laura Grit, Adriana Iamnitchi, David Irwin, Aydan Yumerefendi, and Jeff Chase. "Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control", SC 2006.

Projects: Linked Environments for Atmospheric Discovery (LEAD), Virtual Grid Application Development Software (VGrADS), Open Resource Control Architecture (ORCA)


Daniel A. Reed: Brief CV

Director of scalable computing and multicore at Microsoft Research (since Nov. 2007). Ph.D. and M.Sc. in Computer Science, Purdue University. B.Sc. in Computer Science, University of Missouri, Rolla.

Research interests: design of very high-speed computers that provide new computing capabilities for scholars in science, medicine, engineering, and the humanities; tools and techniques for capturing and analyzing the performance of parallel systems; and collaborative virtual environments for real-time performance analysis. He argues that two great forces are reshaping computing: multicore processors with unprecedented power, and the explosive growth of software services hosted on megascale data centers.

Professional experience:
- 2005: The North Carolina General Assembly appropriated $5.9M in state FY06 and $11.8M in FY07 and beyond to expand the Renaissance Computing Institute (RENCI).
- 2005: The President's Information Technology Advisory Committee (PITAC) and its subcommittee on computational science, which he chaired, produced a report on the future of computational science, entitled "Computational Science: Ensuring America's Competitiveness."
- 2001: Reed led the effort to launch the National Science Foundation's TeraGrid, the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research, and then served as TeraGrid chief architect through 2003.


INTRODUCTION

Core Concept: Performability

Performability: a composite measure of a system's performance and its dependability

Performance: the "quality of service (QoS), provided the system is correct"

Dependability: an all-encompassing definition for reliability, availability, safety and security

INTRODUCTION

Problem Statement

- Grid/cloud computing needs to be degradable.
- Resource availability varies significantly, in both hardware and software.
- Performance (QoS) fluctuates with resource availability.
- Degradable: a resource is not limited to two states, "fully operational" or "failed".

How to be degradable:

- Resource provider: provide an assured level of service under a cost model.
- Software: provide an interface for users to express their performance and reliability requirements.
- Execution models: the characteristics of program execution need to be understood.

Approach: using performability, present a quantitative model to capture and analyze the effect of resource reliability on application performance.

INTRODUCTION

Background: Virtual Grids in the VGrADS Project

Detailed approach:
- Allow users to express availability requirements based on the existing resource structures of the Virtual Grid framework in the VGrADS project (e.g. LooseBagOf(Clusters)).
- Understand the applications' reliability requirements via three common programming models.

The virtual grid vision is realized as part of the Virtual Grid Execution System (vgES); this work builds on and is informed by a four-year effort to build development tools. Each class of virtual grid (e.g. a bag, a cluster) may in fact have a different specialized implementation, but these implementations share a set of technologies that include scheduling, performance monitoring, information services, resource selection, and checkpointing.

[Figure 1: vgES overall architecture. Applications use vgDL and the vgES APIs; vgFAB, vgMON, and vgLAUNCH, together with the information services (vgID), mediate between virtual grids (VG) and the resource managers.]

INTRODUCTION

Three Common Programming Models

[Figure 1: Three common programming models: (a) Master Worker, (b) Divide and Conquer, (c) SPMD]

INTRODUCTION

Description of vgDL: BNF grammar for Redline

The description of the Virtual Grid Description Language (vgDL) is given by the grammars in Figures 2-1 and 2-2, which we describe hereafter.

Redline-expression ::= Identifier "=" Arithmetic_expr | Logic_expr | Predicate
Arithmetic_expr ::= A_operand [A_op A_operand]*
A_operand ::= Integer | Real
A_op ::= "+" | "-" | "*" | "/" | "^"
Logic_expr ::= L_operand [L_op L_operand]*
L_operand ::= Integer | Real | Boolean | ...

Figure 2-1: BNF grammar for Redline

INTRODUCTION

Description of vgDL: BNF for vgDL (from "Virtual Grids: Resource Abstractions for Grid Applications", 8/9/2004)

Vgrid ::= Identifier "=" Rdl-expression [ at time/event ]
Rdl-expression ::= Rdl-subexpression | [ "(" Rdl-expression ")" op "(" Rdl-expression ")" ]*
Rdl-subexpression ::= Associator-expression | Node-expression
Associator-expression ::= Bag-of-expression | Cluster-of-expression
Bag-of-expression ::= LooseBagof "" "[" MinNode ":" MaxNode "]" [ "[" Number [ "su" | "sec" ] "]" ] ";" Node-expression
                    | TightBagof "" "[" MinNode ":" MaxNode "]" [ "[" Number [ "su" | "sec" ] "]" ] ";" Node-expression
Identifier ::= String
Min ::= Integer
Max ::= Integer
Node-expression ::= Identifier "=" Node-constraint
Node-constraint ::= "{" Attribute-constraint | Rdl-expression "}" | Rdl-expression
Attribute-constraint ::= Redline expression for attribute and constraint [see Figure 3-2]
Cluster-of-expression ::= Clusterof "" "[" MinNode ":" MaxNode [ "," MaxTime ":" MinTime ] "]" ";" Node-expression
op ::= close | far | highBW | lowBW

Figure 2-2: BNF for the Virtual Grid Description Language (vgDL)

INTRODUCTION

Example 1: mpiBLAST (vgDL)

mpiBLAST follows the master-worker execution model. Consider an mpiBLAST resource request for a master node connected to a set of worker nodes, each with at least 4 GB of memory. In the virtual grid description language (vgDL), this would be specified as follows:

mpiBLAST1 = MasterNode = {memory 4GB, disk > 20GB} highBW LooseBagOf [4:32]; WorkerNode = {memory >= 4GB}

One fault tolerance strategy might require the network link between the master and the workers to have "good" reliability (section 3). The modified vgDL might look like the following:

mpiBLAST2 = MasterNode = {memory 4GB, disk > 20GB} (goodReliability AND highBW) LooseBagOf [4:32]; WorkerNode = {memory >= 4GB}

In addition to the network being reliable, the request could also specify that the master node be highly reliable:

mpiBLAST3 = HighReliabilityBag = {memory 4GB, disk > 20GB} (goodReliability AND highBW) LooseBagOf [4:32]; WorkerNode = {memory >= 4GB}; MasterNode = {memory 4GB, disk > 20GB}

[Paper excerpt: we define a reliability scale that maps the qualitative reliability levels in the virtual grid specifications to quantitative availability, as follows: Excellent (90-100%), Good (80-89%), Satisfactory (70-79%), Fair (60-69%), Poor; the exact definitions can be tuned to deployment contexts. Resource associators (e.g. HighReliabilityBag) and link operators (e.g. highReliability) expose these levels in vgDL.]

INTRODUCTION

Example 2: Weather Research and Forecasting (WRF) Model

The Weather Research and Forecasting (WRF) model [17] is a mesoscale numerical weather prediction system. The WRF model is an SPMD computation in which geographic regions are modeled in parallel. For a simple WRF execution, the request might be for a cluster with 8 to 32 nodes, each with at least 4 GB of memory:

wrf1 = WRFBag = TightBagOf [8:32]; CNode = {memory >= 4GB}

Since this is an SPMD computation, we might require all the nodes and the network connecting them to be highly reliable. A modified request to request a HighReliabilityBag is shown below:

wrf2 = WRFBag = HighReliabilityBag [1:1]; ManyNodes = TightBagOf [8:32]; CNode = {memory >= 4GB}
From these examples we see that applications can have varied reliability requirements based on their characteristics. Workflow planning components need higher-level interfaces to describe collective qualitative reliability requirements in the resource selection process. These requirements are based on application characteristics and other real-time constraints such as deadlines or budget.


RELIABILITY SPECIFICATION


Reliability Specification for vgDL Extension

Quantitative Reliability Level: Excellent (90-100%), Good (80-89%), Satisfactory(70-79%), Fair (60-69%), Poor (0-59%)

Node: HighReliabilityBag, GoodReliabilityBag, MediumReliabilityBag, LowReliabilityBag, PoorReliabilityBag

Link: highReliability, goodReliability, mediumReliability, lowReliability, poorReliability
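As a quick illustration, the percentage bands above can be mapped to level names with a small lookup; the function name and signature are illustrative, not part of vgDL:

```python
def reliability_level(availability: float) -> str:
    """Map an availability fraction (0.0-1.0) to the qualitative
    reliability levels listed above (band boundaries from the slide)."""
    pct = availability * 100
    if pct >= 90:
        return "Excellent"
    if pct >= 80:
        return "Good"
    if pct >= 70:
        return "Satisfactory"
    if pct >= 60:
        return "Fair"
    return "Poor"
```

For example, a resource measured at 85% availability falls in the Good band and would plausibly satisfy a goodReliability link constraint.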

PERFORMABILITY ANALYSIS


Example System

The most commonly used performability model today is the Markov Reward Model (MRM). To illustrate the technique, applied to a cyclic case, a 3-CPU multiprocessor system is used that begins running in fully operational mode. Jobs arrive at the buffer and are stored until a processor (CPU) becomes available; then the job at the head of the buffer is sent to this CPU to be processed. In this manner, jobs are shared equally between the processors.

Fig 1: Model of the multi-processor system

Some assumptions in our model are worth noting. It is assumed that no more than one processor can fail at a time; there are no simultaneous failures of CPUs. This is reflected in the transition arrows (only one possible transition to and from each state). Another assumption is that the buffers are ultra-reliable, so buffer failure is not considered, although such a failure might result in a complete system breakdown. There are no limits on buffer size.
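The failure/repair behaviour above can be sketched as a small continuous-time Markov chain. The rates below are illustrative (the slides give none); we assume each working CPU fails independently at rate lam and a single repair facility restores one CPU at a time at rate mu:

```python
import numpy as np

lam, mu = 0.001, 0.1  # illustrative per-hour failure and repair rates

# States indexed by the number of working CPUs: 3, 2, 1, 0.
# Only single failures/repairs occur, matching the assumptions above.
ups = [3, 2, 1, 0]
n = len(ups)
Q = np.zeros((n, n))  # CTMC generator matrix
for i, up in enumerate(ups):
    if up > 0:
        Q[i, i + 1] = up * lam  # one of the `up` working CPUs fails
    if up < 3:
        Q[i, i - 1] = mu        # repair facility fixes one CPU
    Q[i, i] = -Q[i].sum()       # diagonal: each row of Q sums to zero

# Steady state: solve pi @ Q = 0 subject to sum(pi) = 1.
A = np.vstack([Q.T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
```

With lam much smaller than mu, nearly all steady-state probability mass sits in the all-processors-up state, so the degraded states contribute little to the accumulated reward.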

PERFORMABILITY ANALYSIS

Markov Reward Model

The behaviour model and the reward model together describe the MRM:

Fig 2: The Markov Reward Model

In Figure 2 there are four states describing the system:

1: 3 processors up, 0 processors down
2: 2 processors up, 1 processor down
3: 1 processor up, 2 processors down
4: 0 processors up, 3 processors down

PERFORMABILITY ANALYSIS

Accumulative Reward Y(t)

Fig 3: Sample paths of the Z(t), X(t) and Y(t) processes

PERFORMABILITY ANALYSIS

The Probability Distribution Function of Y(t)

Fig 4: The Probability Distribution Function of Y(t)

PERFORMABILITY ANALYSIS

Definition of Performability

Performability is defined as "the probability that a system reaches an accomplishment level y over a utilization interval (0, t)": y(x, t) = Prob[Y(t) <= x]
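This distribution function can be estimated by simulation. A minimal sketch, assuming an illustrative two-state up/down resource (reward rate 1 while up, 0 while down) with made-up failure and repair rates; none of these names or values come from the paper:

```python
import random

def sample_Y(t_end, lam=0.5, mu=2.0, rng=random):
    """One sample path: accumulated reward Y(t_end) for a 2-state chain.
    State 0 = up (reward rate 1), state 1 = down (reward rate 0);
    lam = failure rate, mu = repair rate (illustrative values)."""
    t, state, y = 0.0, 0, 0.0
    while t < t_end:
        rate = lam if state == 0 else mu
        dwell = min(rng.expovariate(rate), t_end - t)
        if state == 0:
            y += dwell          # reward accrues only while up
        t += dwell
        state = 1 - state       # failure or repair event
    return y

def prob_Y_le(x, t_end=10.0, n=5000, seed=42):
    """Monte Carlo estimate of y(x, t) = Prob[Y(t) <= x]."""
    rng = random.Random(seed)
    return sum(sample_Y(t_end, rng=rng) <= x for _ in range(n)) / n
```

Because the reward rate never exceeds 1, Y(t) <= t always holds, so the estimated distribution reaches 1 at x = t_end.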

[Figure 2: Markov chain for the resource performance and reliability states (High, Good, Medium, Low, Poor, Fail; transitions labeled with failure rate λ)]


PERFORMABILITY ANALYSIS

Resource State Reliability Model

MTBF = MTTF + MTTR
λ = 1/MTTF, µ = 1/MTTR
The steady-state probability of occupancy in each state: π_n = ρ^n · π_0, with π_0 = 1 − ρ and ρ = λ/µ, the failure-to-repair ratio.
Normally ρ < 1; otherwise, the system tends toward complete failure.
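The occupancy formula is a direct transcription into code; this is a sketch of the geometric distribution above, with the function name chosen for illustration:

```python
def occupancy(n: int, rho: float) -> float:
    """Steady-state probability pi_n = rho**n * pi_0, with pi_0 = 1 - rho."""
    assert 0 < rho < 1, "rho >= 1: the system drifts toward complete failure"
    return (1 - rho) * rho ** n

# The probabilities over states n = 0, 1, 2, ... sum to 1 (geometric series).
total = sum(occupancy(n, 0.1) for n in range(50))
```

For ρ = 0.1 the mass concentrates in the fully-up state: π_0 = 0.9, π_1 = 0.09, and so on.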

PERFORMABILITY ANALYSIS

Performability Modeling

T: the running time on a highly available resource.
Running time in the other states: T + n_i·x, i = 1, 2, 3, 4 (Fail is not counted).
Reward rate: the inverse of the running time, 1/(T + n_i·x).
Performability is measured as the accumulated reward rate over a specified time interval: E[Z(t)] = Σ r_i π_i(t)

PERFORMABILITY ANALYSIS

Performability Example

Parameter                    | Machine A | Machine B | Machine C | Machine D
Application running time T   |  30 min   |  30 min   |  25 min   |  15 min
Failure-to-repair ratio ρ    |   0.1     |   0.4     |   0.4     |   0.6
Performability, x = 2        |  0.033    |  0.032    |  0.038    |  0.055
Performability, x = 100      |  0.031    |  0.022    |  0.027    |  0.029

Table 1: Performability for different performance model numbers and reliability characteristics, where n_1 = 1, n_2 = 2, n_3 = 3, n_4 = 4
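The entries in Table 1 can be reproduced by combining the occupancies π_i = (1 − ρ)ρ^i with the reward rates 1/(T + n_i·x), where the Fail state carries zero reward and is omitted. (Under this model, machine B at x = 100 computes to 0.022.) A sketch, with the function name chosen for illustration:

```python
def performability(T: float, rho: float, x: float) -> float:
    """E[Z] = sum_i pi_i / (T + n_i * x) over states n_i = 0..4
    (High through Poor); the Fail state has reward 0 and is dropped."""
    pi0 = 1 - rho
    return sum(pi0 * rho ** i / (T + i * x) for i in range(5))

# Machine A from Table 1: T = 30 min, rho = 0.1, x = 2
e_a = performability(30, 0.1, 2)   # ~0.033, matching the table entry
```

The same call with the other columns' parameters recovers the remaining table values to three decimal places.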


PERFORMABILITY ANALYSIS

Performability of Different Programming Models

Master-worker application: E(M-W) = Min(E_Master, E_Worker, E_Network), when T_Master >> T_Worker and T_Master >> T_Network

Divide and conquer: the performability of the root (the tree root runs longest)

SPMD: E_SPMD = Min(E_system_components)

PERFORMABILITY ANALYSIS

Workflow Planning for Performability

Workflow scheduling can be based on the projected application running time: T_projected = 1/E[Z] (computation). Following the same performability modeling procedure yields the network performability.

Based on the computation performability and the network performability, traditional scheduling algorithms can then be applied to the workflow.

PERFORMABILITY ANALYSIS

Fault Tolerance Strategies

Two common strategies: replication (good performance and reliability, but high cost) and checkpoint-restart (good reliability but lower performance).
Cost of replication: C_R = T_projected × n, where n is the number of replicas.
Cost of checkpoint-restart: C_CR = C_checkpoint + C_restart-on-failure, where C_checkpoint = C_per-checkpoint × T_projected / T_interval and T_interval is the optimal checkpoint interval needed to meet the performability level. The strategy with the lower cost that still meets the performability level can then be selected.
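A back-of-the-envelope comparison of the two strategies using the cost formulas above; every parameter value here is made up for illustration and does not come from the paper:

```python
def replication_cost(t_projected: float, n_replicas: int) -> float:
    """C_R = T_projected * n."""
    return t_projected * n_replicas

def checkpoint_restart_cost(t_projected: float, t_interval: float,
                            c_per_checkpoint: float,
                            c_restart_on_failure: float) -> float:
    """C_CR = C_checkpoint + C_restart-on-failure, with
    C_checkpoint = C_per-checkpoint * T_projected / T_interval."""
    c_checkpoint = c_per_checkpoint * t_projected / t_interval
    return c_checkpoint + c_restart_on_failure

# Illustrative scenario: a 60-minute projected run.
cr = replication_cost(60.0, 3)                        # 3 replicas
ccr = checkpoint_restart_cost(60.0, 10.0, 2.0, 15.0)  # checkpoint every 10 min
cheaper = "replication" if cr < ccr else "checkpoint-restart"
```

With these numbers checkpoint-restart is far cheaper; replication only wins when per-checkpoint or restart costs dominate the cost of running extra copies.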
