Managed by ... Integration of Grid Cost Model into ... Network performance. + .... service). ⢠simulation of the real situation with UNICORE middeware and the.
Managed by Managed by
Integration of Grid Cost Model into ISS/VIOLA Meta-Scheduler environment
Ralf Gruber*, Vincent Keller*, Michela Thiémard*, EPFL Oliver Wäldrich*, Wolfgang Ziegler*, SCAI Philipp Wieder*, Forschungszentrum Jülich Pierre Manneback*, CETIC *CoreGRID members
Managed by
Plan • • • • • •
HPC Situation Application structure Gamma Model ISS Concept Cost Model UNICORE-VIOLA-ISS Testbed
Managed by
HPC : TODAY A3 A1 A2
B3
A4 R1
B1
A5
B2 C1 C2
B4 R2
B5
C3 C4 R3
C5
Resources chosen to satisfy needs of all local applications
Managed by
HPC : TOMORROW A3 A1
A4
A2
A5
C1
C3 C4
C2
C5
B3 B1
B4
B2
B5
UNICORE/MSS + ISS
R1
R3
R2
How to BEST distribute applications to improve overall performance ?
Managed by
HPC : TOMORROW EVENING A3 A1
A4
A2
A5
C1
C3 C4
C2
C5
B3 B1
B4
B2
B5
UNICORE/MSS + ISS
?
?
?
What resources needed to satisfy applications components WITHIN LOWEST COSTS ?
Managed by
HPC : AFTER TOMORROW A3 A1
A2
UNICORE/MSS + ISS
R1
R3
R2
What to do to BEST distribute application COMPONENTS to improve overall performance ?
Managed by
Application structure
Component 1: Embarassingly parallel
Component 2: Application
Point-to-point
Component 3: Multicast
Component 4: Client-server
Machine m NOW Machine l PC cluster
Machine i MPP Machine k NUMA
New feature to VIOLA/Meta-Scheduler environment: Submit to well-suited machines for application components
Γ model
Managed by
Characterizes application components and parallel machines
Γ=
E = CPU time over communication time 1− E
Γ parameters of application components for one machine: CPU time Communication time Size of messages
+ Parameters on one machine: Memory bandwidth Processor performance Network performance
+
Parameters on other machine: Memory bandwidth Processor performance Network performance
Γ parameters of application component on other machine
Managed by
Framework
Managed by
ISS concept
Managed by
Pre-execution
Resource discovery Prologue Decision Queing
Managed by
Prologue
Find eligible machines: Yes on Machine up? Access rights? Program exists? Enough memory?
Managed by
Decision
Collect data on their availabilities Evaluate cost function Submit the job
Cost function time costs
Managed by
HW+SW costs n
min( z ) = βK w ( C k , Ri , p k ) + ∑ ℑC ( Ri , p k ) k =1
such that ∀1 ≤ n
∑( K ( C k =1
e
k
k
k≤n
, Ri , p k ) + K l ( C k , Ri , pk )
+ K eco ( C k , Ri , p k ) + K d ( C k , Ri , p k )) ≤ KMAX max( t kd,i ) − min( t k0 ) ≤ TMAX i ,k
k
( Ri , pk ) ∈ ℜ( C k )
Managed by
Cost function
ℑCk ( Ri , p k ) = α k ( K e ( C k , Ri , p k ) + K l ( C k , Ri , p k )) + γ k ( K eco ( C k , Ri , p k )) + δ k ( K d ( C k , Ri , p k ))
α k , β ,γ k ,δ k ≥ 0 αk + β + γ k + δk > 0
Managed by
Free parameters αk, β, γk, δk
Minimize turn-around time: β= 1, KMAX=∞, αk γk, δk = 0 ∀k
Minimize hardware costs: β= 0, TMAX=∞, αk γk, δk ≥ 0
Managed by
CPU costs Ke t ie,k
K e ( C k , Ri , pk ) = ∫ k e ( C k , Ri , pk ,ϕ ,t )dt t is,k
priority
t ks,i
t ke,i
Managed by
License fees Kl
t ie,k
K l ( C k , Ri , p k ) = ∫ k l ( C k , Ri , p k ,t )dt t is,k
Managed by
Costs of turn-around time Kw max t kd,i
K w (Ck , Ri , pk ) =
k
∫k
w
(t )dt
min t k0 k
salary
t k0
time-to-market
t kd,i
t k0
t kd,i
Managed by
Energy costs t ke ,i
K eco (Ck , Ri , pk ) = ∫ keco (Ck , Ri , pk , t )dt t ks ,i
no frequency adaptation
t ks,i
t ke,i
with frequency adaptation
t ks,i
t ke,i
Managed by
Epilogue
Ganglia
VAMOS
Broker
SI dataWarehouse
Accounting
Managed by
Pleiades: Ganglia data
Γ50% =0.64 =0.82
I/O or MPI ?
To machine with faster communication
E
Managed by
VAMOS: single job profiles
CFD
Plasma physics
Managed by
First VIOLA/UNICORE/Meta-Scheduler/ISS testbeds
UNICORE VIOLA MSS
LIN-EPFL: Pleiades1 Pleiades2 Pleiades3
EPFL: SMP/NUMA High γm cluster
UNICORE VIOLA
Switch
ISS DIT-EPFL: Mizar Condor NOW BlueBrain
ETHZ: SMP/NUMA High γm cluster
MSS ISS
CERN: egee Grid EIF: NoW
CSCS: SMP/vector Low γm cluster
12.2006: EPFL HPCN installations 12.2007: Swiss HPCN Grid initiative
Managed by
Outlook: Simulator
Understanding the behavior of the simulated grid depending of the value of the free parameters.
• Usage of old executions data on 2 departemental clusters – each job was monitored. Data stored in a mysqlDB – Mapping of ganglia and localscheduler (torque) info (VAMOS service) • simulation of the real situation with UNICORE middeware and the VIOLA MSS simulated on top of it • Using the real job traces. • training of the system (broker service) . • Metric used to show the improvement of the grid is the utilization of the machines.
Managed by
Outlook: Component dependency • Each component of a workflow is executed on the well suited machine
• Hard problem : need new ideas • Co-allocation is now made manually. • With ISS in the future : automatically.
Managed by
Publications Pierre Manneback, Guy Bergère, Nahid Emad, Ralf Gruber, Vincent Keller, Pierre Kuonen, Sébastien Noël, and Serge Petiton (2005), "Towards a scheduling policy for hybrid methods on computational Grids", CoreGRID Meeting, Pisa (28-30 November, 2005), to appear in Lecture Notes in Computer Sciences (Springer) Ralf Gruber, Vincent Keller, Pierre Kuonen, Marie-Christine Sawley, Basile Schaeli, Ali Tolou, Marc Torruella, and Trach-Minh Tran (2005), "Intelligent GRID Scheduling System", PPAM 2005, Poznan, Poland, Lecture Notes in Computer Sciences (Springer) 3911, p. 751-757 Vincent Keller, Kevin Christiano, Ralf Gruber, Pierre Kuonen, Sergio Maffioletti, Nello Nellari, MarieChristine Sawley, Trach-Minh Tran, Philipp Wieder, and Wolfgang Ziegler, "Integration of ISS into the VIOLA Meta-Scheduling Environment", CoreGRID Meeting, Pisa (28-30 November, 2005), to appear in Lecture Notes in Computer Sciences (Springer) Ralf Gruber, Pieter Volgers, Alessandro De Vita, Massimiliano Stengel, and Trach-Minh Tran, "Parameterisation to tailor commodity clusters to applications", Future Generation Computer Systems 19 (2003) 111-120 Ralf Gruber, Vincent Keller, Michela Thiémard, Oliver Wäldrich, Philipp Wieder, Wolfgang Ziegler, and Pierre Manneback, "Integration of Grid Cost Model into ISS/VIOLA Meta-Scheduler environment", (2006) to appear.
Managed by
THANK YOU