
DYNAMIC CLOUD PROVISIONING FOR SCIENTIFIC GRID WORKFLOWS
Simon Ostermann, Radu Prodan and Thomas Fahringer
Institute of Computer Science, University of Innsbruck
Technikerstrasse 21a, Innsbruck, Austria

OVERVIEW
• Introduction
• Optimized Cloud Provisioning
  • Cloud Start
  • Instance Size
  • Grid Scheduling
  • Cloud Stop
• Evaluation using 3 scientific workflows
  • Wien2k
  • Invmod
  • Meteoag
• Conclusion

INTRODUCTION
• Infrastructure as a Service, a branch of Cloud computing
• On-demand resources, e.g. Amazon EC2, GoGrid, ...
• Other common Cloud computing areas not covered:
  • Platform as a Service
  • Software as a Service
  • Specialized solutions for Storage, Web hosting, ...

CLOUD COMPUTING FOR SCIENTIFIC COMPUTING?
• Rent resources instead of buying own hardware
• Eliminates permanent operation, maintenance, and depreciation costs
• Scale an infrastructure up or down based on temporary, immediate needs
• Significantly reduced over-provisioning
• Virtualised resources enable scalable deployment and provisioning of application software
• Reliability through business SLA relationships that bind actors to offering higher QoS guarantees

[Figure: life cycle of a Cloud instance with the states Unallocated, Requested, Starting, Running, Accessible, Shutting down, Terminated, Unallocated]

CLOUD MODELS

• Cloud computing is mostly available on an hourly basis
• Some research papers assume finer granularity
• Interesting
[Figure: billing timeline with the labels "Next billing interval", "Accessible" and "Usable interval"]
[Figure: simulation setup with the labels "Simulation Engine", "Generate failure", "Distribution", "Failure Generator" and "Grid and Cloud Entities"]
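The effect of billing granularity can be illustrated with a minimal sketch. It assumes that every started billing interval is charged in full, which is how hourly billing is commonly understood; the function name `billed_cost`, its parameters, and the example price are hypothetical and not taken from the slides.

```python
import math


def billed_cost(runtime_seconds: float,
                price_per_hour: float,
                interval_seconds: int = 3600) -> float:
    """Cost of one instance when each started billing interval is charged in full.

    Assumption (not from the slides): the provider rounds the usage time up
    to a whole number of billing intervals.
    """
    if runtime_seconds <= 0:
        return 0.0
    # Number of billing intervals that were at least partially used.
    intervals = math.ceil(runtime_seconds / interval_seconds)
    # Price of a single interval, derived from the hourly price.
    price_per_interval = price_per_hour * (interval_seconds / 3600)
    return intervals * price_per_interval


# Hypothetical example: a 61-minute run at 0.10 per hour.
hourly = billed_cost(61 * 60, 0.10)                          # two full hours are charged
per_second = billed_cost(61 * 60, 0.10, interval_seconds=1)  # only the time actually used
```

Under hourly billing the 61-minute run pays for two full hours, while the finer granularity assumed in some research papers charges only slightly more than one hour. This gap is why the start and stop times of instances matter for the cost.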

OPTIMIZED CLOUD PROVISIONING
• Analysis of regular executions and the resulting costs
• The analysis identified multiple parts that need optimization
• Choices have to be made about the start and stop of resources and the number of instances requested
• Four optimizations found, defined as algorithms (in the paper) and exploited in the evaluation

CLOUD START
• Parallel regions with more tasks than available cores
• Depending on Cloud and Grid speed, Serialization and Imbalance overheads are analyzed
• When minimization of the runtime of the parallel section is possible, Cloud resources are started
[Figure: two task schedules on Grid core 1, Grid core 2, Grid core 3 and Cloud core 1 with tasks of length 120, the values 250 and 300, and the overheads labelled Serialization and Imbalance]
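The Cloud start decision described on this slide can be sketched as follows. This is not the algorithm from the paper; it is a simplified illustration that assumes identical tasks, Grid cores of speed 1.0, and a greedy assignment of each task to the core that would finish it earliest. The names `makespan`, `should_start_cloud` and `cloud_speed` are hypothetical.

```python
def makespan(num_tasks: int, task_time: float, core_speeds: list[float]) -> float:
    """Runtime of a parallel region under greedy list scheduling.

    Each of the identical tasks is placed on the core that would complete
    it earliest; a core with speed s needs task_time / s per task.
    """
    finish = [0.0] * len(core_speeds)
    for _ in range(num_tasks):
        # Core on which the next task would finish first.
        best = min(range(len(core_speeds)),
                   key=lambda i: finish[i] + task_time / core_speeds[i])
        finish[best] += task_time / core_speeds[best]
    return max(finish) if finish else 0.0


def should_start_cloud(num_tasks: int, task_time: float, grid_cores: int,
                       cloud_cores: int, cloud_speed: float = 1.0):
    """Return (start_cloud, grid_only_runtime, combined_runtime).

    Cloud cores are worth starting only if they shorten the parallel region.
    """
    grid_only = makespan(num_tasks, task_time, [1.0] * grid_cores)
    combined = makespan(num_tasks, task_time,
                        [1.0] * grid_cores + [cloud_speed] * cloud_cores)
    return combined < grid_only, grid_only, combined


# Hypothetical example: 4 tasks of length 120 on 3 Grid cores, 1 Cloud core offered.
decision_fast = should_start_cloud(4, 120, grid_cores=3, cloud_cores=1)
decision_slow = should_start_cloud(4, 120, grid_cores=3, cloud_cores=1, cloud_speed=0.4)
```

With an equally fast Cloud core, the fourth task no longer has to wait for a Grid core (the serialization overhead), so the runtime drops from 240 to 120 and the Cloud core is started. With a Cloud core at 0.4 of the Grid speed, the task would take 300 on the Cloud, which is longer than waiting for a Grid core, so no Cloud resource is started.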