JaceP2P: an Environment for Asynchronous Computations on Peer-to-Peer Networks J. Bahi, R. Couturier, P. Vuillemin
AND team (Distributed Numerical Algorithms) Laboratoire d’Informatique de l’université de Franche-Comté (LIFC) 28 September 2006
J. Bahi, R. Couturier, P. Vuillemin
HeteroPar’06, Barcelona (Spain)
28 September 2006
1 / 21
Introduction
Motivations
Scientific context: iterative methods
⇒ approximate results at each iteration
⇒ communications/synchronizations after each iteration
Execution context: Peer-to-Peer (P2P) computing
⇒ used for file sharing, but can also support distributed computing
⇒ dynamic infrastructures
⇒ decentralized organization
⇒ heterogeneity of processors and networks
⇒ communications between computing nodes
Because of synchronizations, disconnections cause a lot of idle time
Our solution
JaceP2P: the P2P version of JACE (Java Asynchronous Computation Environment)
⇒ programming and execution environment for iterative applications
⇒ based on the asynchronous iteration model
⇒ based on cycle stealing
⇒ tolerates node disconnections
⇒ supports communications between peers
⇒ decentralized organization
Outline
1. Parallel iterative algorithms
2. The JaceP2P environment
3. Experimentations with JaceP2P
Conclusion and future works
1. Parallel iterative algorithms
1.1. Classification
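Synchronous Iterations, Synchronous Communications (SISC)
[Figure: execution timelines of Processor 1 and Processor 2 over time]
Synchronous Iterations, Asynchronous Communications (SIAC)
[Figure: execution timelines of Processor 1 and Processor 2 over time]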
Asynchronous Iterations, Asynchronous Communications (AIAC)
[Figure: execution timelines of Processor 1 and Processor 2 over time]
Processors can compute different iterations at a given time t
No synchronization between two iterations ⇒ no idle time
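The AIAC behaviour can be shown with a toy simulation. This is a minimal sketch under stated assumptions. It uses a hypothetical 2×2 diagonally dominant system split between two simulated processors. It also assumes a fixed message delay and a second processor three times slower than the first. Each processor iterates with the latest value that has reached it and never waits for the other.

```python
# Toy AIAC simulation (hypothetical system, speeds and delay).
# System: 4x + y = 9, x + 3y = 7, solved as the fixed point
#   x = (9 - y) / 4   (owned by processor 1)
#   y = (7 - x) / 3   (owned by processor 2)


def latest_arrived(history, now, delay):
    """Most recent value whose message has had time to arrive by `now`."""
    value = history[0][1]
    for sent_at, v in history:
        if sent_at + delay <= now:
            value = v
    return value


def aiac_simulation(steps=300, slow_period=3, delay=2):
    x_hist = [(0, 0.0)]  # (time sent, value) messages from processor 1
    y_hist = [(0, 0.0)]  # (time sent, value) messages from processor 2
    iters = [0, 0]       # iteration counters drift apart: no barrier
    for t in range(1, steps + 1):
        # Processor 1 iterates at every step with whatever y has arrived.
        x = (9.0 - latest_arrived(y_hist, t, delay)) / 4.0
        x_hist.append((t, x))
        iters[0] += 1
        # Processor 2 is slower: it completes an iteration every few steps.
        if t % slow_period == 0:
            y = (7.0 - latest_arrived(x_hist, t, delay)) / 3.0
            y_hist.append((t, y))
            iters[1] += 1
    return x_hist[-1][1], y_hist[-1][1], iters


print(aiac_simulation())
```

Processor 1 completes 300 iterations while processor 2 completes only 100. Both values still approach the exact solution x = 20/11, y = 19/11, because this fixed-point map is a contraction.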
1.2. Conclusions about asynchronism
Number of iterations generally greater
Warning: convergence must be ensured!
BUT
All idle times suppressed
Communications overlapped by computations
Execution time considerably reduced, especially when processors are geographically distant
Tolerant to long message delays
Tolerant to message loss
Tolerant to processor heterogeneity
Neighbors do not stop when disconnections occur
⇒ Well suited to the P2P computing context
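For linear problems, the convergence warning can be made concrete. Take a fixed-point iteration x = Tx + c. A classical result on asynchronous (chaotic) relaxation says that asynchronous iterations converge for every admissible delay pattern when the spectral radius of |T| is below 1. The sketch below only checks a cheaper, merely sufficient condition for the Jacobi splitting T = I − D⁻¹A: the largest row sum of |T| must be below 1. This holds for strictly diagonally dominant matrices. The function name is hypothetical.

```python
def jacobi_async_sufficient(a):
    """Return True if max_i sum_{j != i} |a_ij| / |a_ii| < 1.

    That value is the infinity norm of |T| with T = I - D^-1 A. Being < 1
    implies rho(|T|) < 1, so asynchronous Jacobi iterations on A x = b
    converge. False only means this simple test is inconclusive.
    """
    n = len(a)
    worst = 0.0
    for i in range(n):
        diag = abs(a[i][i])
        if diag == 0.0:
            return False  # the Jacobi splitting is not even defined
        off = sum(abs(a[i][j]) for j in range(n) if j != i)
        worst = max(worst, off / diag)
    return worst < 1.0


print(jacobi_async_sufficient([[4, -1, 0], [-1, 4, -1], [0, -1, 4]]))  # True
print(jacobi_async_sufficient([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))  # False
```

The second matrix is a 1D Poisson matrix. Its interior rows give exactly 1, so the test is inconclusive. Convergence there follows from a sharper argument: the matrix is irreducibly diagonally dominant.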
2. The JaceP2P environment
2.1. General presentation
Programming and execution environment on P2P networks
Decentralized platform ("hybrid P2P" topology)
Designed for asynchronous iterative applications
Multithreaded environment: communications overlapped by computations
Developed in Java (portability), RMI for communications (message passing paradigm)
Fault-tolerant environment (checkpoint mechanisms)
Direct communications between peers
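In an asynchronous setting, overlapping communications with computations roughly means the computing thread never blocks on a receive. It simply reads whatever dependency data arrived last. Here is a minimal sketch of that idea in Python. JaceP2P itself is written in Java and uses RMI, and the class below is hypothetical; it does not reflect the JaceP2P API.

```python
import threading


class LatestValueMailbox:
    """Keeps only the most recent dependency data from each neighbor.

    Hypothetical sketch: a communication thread calls deliver() when a
    message arrives. The computing thread calls snapshot() at the start
    of each iteration and never blocks waiting for a given iteration.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}  # neighbor id -> (iteration number, data)

    def deliver(self, sender, iteration, data):
        with self._lock:
            current = self._latest.get(sender)
            # Messages may be delayed or reordered: keep only the newest.
            if current is None or iteration > current[0]:
                self._latest[sender] = (iteration, data)

    def snapshot(self):
        with self._lock:
            return dict(self._latest)


box = LatestValueMailbox()


def communication_thread():
    # Simulated arrivals from neighbor "N2"; iteration 4 arrives after 5.
    for it in (1, 2, 3, 5, 4):
        box.deliver("N2", it, [float(it)])


receiver = threading.Thread(target=communication_thread)
receiver.start()
early = box.snapshot()  # may be empty or partial: the reader does not wait
receiver.join()
print(box.snapshot())   # {'N2': (5, [5.0])}
```

A lost message is tolerated in the same way: the next one from the same neighbor simply supersedes it.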
2.2. The JaceP2P architecture
3 types of entities: Daemons, Super-Nodes, Spawners
Daemons (the computing peers)
⇒ execute computation tasks in parallel
⇒ tolerate neighbor disconnections
⇒ exchange dependencies through asynchronous communications
⇒ store the checkpoints (task clones) of their neighbors
Super-Nodes (the entry points)
⇒ register the available Daemons of the system
⇒ allocate Daemons when applications are launched
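The Super-Node role can be pictured as a registry of idle Daemons. The sketch below assumes that a Super-Node that cannot satisfy a reservation alone borrows Daemons from the other Super-Nodes. It also assumes that a reservation either fully succeeds or fails. The class and method names are hypothetical, not the JaceP2P API.

```python
class SuperNode:
    """Hypothetical registry of the idle Daemons known to one Super-Node."""

    def __init__(self, name):
        self.name = name
        self.idle = []   # Daemon ids, in registration order
        self.peers = []  # the other Super-Nodes

    def register(self, daemon):
        """A Daemon announces that its machine is available."""
        if daemon not in self.idle:
            self.idle.append(daemon)

    def _take_local(self, k):
        taken, self.idle = self.idle[:k], self.idle[k:]
        return taken

    def reserve(self, k):
        """Reserve k Daemons for a Spawner, borrowing from peers if needed."""
        available = len(self.idle) + sum(len(p.idle) for p in self.peers)
        if available < k:
            raise RuntimeError(f"only {available} idle Daemons, {k} requested")
        reserved = self._take_local(k)
        for peer in self.peers:
            if len(reserved) == k:
                break
            reserved += peer._take_local(k - len(reserved))
        return reserved


sn1, sn2 = SuperNode("Super-node1"), SuperNode("Super-node2")
sn1.peers, sn2.peers = [sn2], [sn1]
for d in ("N1", "N2", "N3"):
    sn1.register(d)
for d in ("N4", "N5"):
    sn2.register(d)
print(sn1.reserve(4))  # ['N1', 'N2', 'N3', 'N4']
print(sn2.idle)        # ['N5']
```

Checking availability up front avoids leaving a partial reservation behind when too few Daemons are idle.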
Spawners (the application launchers)
⇒ launch a given application by specifying the URL of the application class file, the number of nodes and the parameters
⇒ reserve computing nodes on the Super-Nodes
⇒ distribute the computation tasks over the Daemons
⇒ detect Daemon disconnections
⇒ detect global convergence
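The last two Spawner duties can be sketched as simple bookkeeping. This sketch assumes that each Daemon periodically sends a heartbeat carrying its local convergence state, and that a Daemon silent for longer than a timeout counts as disconnected. These mechanisms and all names are hypothetical. The global test is deliberately naive: real convergence detection in an asynchronous context must also cope with tasks losing local convergence when new data arrives, and with messages still in transit.

```python
class ConvergenceMonitor:
    """Hypothetical Spawner-side bookkeeping, not the JaceP2P API."""

    def __init__(self, tasks, timeout, start=0.0):
        self.timeout = timeout
        self.last_seen = {t: start for t in tasks}  # last report time
        self.converged = {t: False for t in tasks}  # last reported state

    def report(self, task, now, locally_converged):
        """Heartbeat from a Daemon with its local convergence state.
        The state may go back to False when new neighbor data arrives."""
        self.last_seen[task] = now
        self.converged[task] = locally_converged

    def disconnected(self, now):
        """Tasks whose Daemon has been silent for longer than the timeout."""
        return [t for t, seen in self.last_seen.items()
                if now - seen > self.timeout]

    def replace(self, task, now):
        """Called once the task has been restarted on a new Daemon."""
        self.last_seen[task] = now
        self.converged[task] = False

    def globally_converged(self, now):
        """Naive test: no silent Daemon and every task locally converged."""
        return not self.disconnected(now) and all(self.converged.values())


m = ConvergenceMonitor(["N1", "N2", "N3", "N4"], timeout=5.0)
for t in ("N1", "N2", "N3", "N4"):
    m.report(t, 1.0, True)
print(m.globally_converged(2.0))  # True
m.report("N2", 3.0, False)        # N2 received new data and moved again
print(m.globally_converged(4.0))  # False
```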
2 types of JaceP2P users
The resource provider
⇒ runs a Daemon on his computer when it is unused ("cycle stealing")
⇒ contributes to executing applications launched on the JaceP2P infrastructure
The application programmer
⇒ implements his own specific application using the JaceP2P API
⇒ executes a Spawner on his computer
The Spawner is the only entity that must be stable
2.3. Interaction between peers
[Figure, built up over several slides: interaction between the Spawner, Super-node1, Super-node2 and the Daemons N1 to N5]
⇒ Daemons N1 to N5 register with Super-node1 and Super-node2 (Registration)
⇒ The Spawner requests 4 processors and Daemons N1, N2, N3 and N4 are reserved
⇒ The RegApli (N1 N2 N3 N4) is sent to the reserved Daemons
⇒ The Daemons iterate (ite1, ite2, ...) and send checkpoints to their neighbors
⇒ When N3 disconnects, the Spawner requests 1 processor and obtains N5
⇒ The RegApli is updated (N1 N2 N5 N4) and N5 reloads the last checkpoint
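The disconnection scenario of this section can be condensed into a small simulation. It makes three hypothetical assumptions. First, each task's checkpoint is kept by the Daemon running the next task in the RegApli. Second, the Spawner takes one idle Daemon to replace a lost one. Third, the replacement restarts from the last stored checkpoint, not from the lost Daemon's current iteration. The names are illustrative, not the JaceP2P API. A single clone per task is also a simplification: if its holder disconnects too, the checkpoint is lost.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Daemon:
    name: str
    state: dict = field(default_factory=dict)   # state of the task it runs
    clones: dict = field(default_factory=dict)  # task index -> checkpoint


def holder_of(reg_apli, task):
    """Assumption: task i is checkpointed on the Daemon running task i+1."""
    return reg_apli[(task + 1) % len(reg_apli)]


def checkpoint(reg_apli, daemons, task):
    """Store a clone of the task's current state on its neighbor."""
    owner = daemons[reg_apli[task]]
    daemons[holder_of(reg_apli, task)].clones[task] = copy.deepcopy(owner.state)


def replace_lost_task(reg_apli, daemons, task, idle_pool):
    """Spawner side: take an idle Daemon, update RegApli, reload checkpoint."""
    new_daemon = daemons[idle_pool.pop(0)]
    updated = list(reg_apli)
    updated[task] = new_daemon.name
    clone = daemons[holder_of(updated, task)].clones[task]
    new_daemon.state = copy.deepcopy(clone)
    return updated


daemons = {n: Daemon(n) for n in ("N1", "N2", "N3", "N4", "N5")}
reg_apli = ["N1", "N2", "N3", "N4"]
idle = ["N5"]

daemons["N3"].state = {"iteration": 5, "x": [0.5]}
checkpoint(reg_apli, daemons, 2)                    # clone kept by N4
daemons["N3"].state = {"iteration": 7, "x": [0.7]}  # progress after it

# N3 disconnects: the work done after its last checkpoint is lost.
reg_apli = replace_lost_task(reg_apli, daemons, 2, idle)
print(reg_apli)             # ['N1', 'N2', 'N5', 'N4']
print(daemons["N5"].state)  # {'iteration': 5, 'x': [0.5]}
```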
3. Experimentations with JaceP2P
3.1. Problem description
The Poisson equation: −∆u = f
Linear problem (PDE) in 2D
Finite Difference Method: space discretization in n² meshes (problem size: n²)
Block Jacobi-like decomposition
Each block solved by a sequential sparse Conjugate Gradient method
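The decomposition described above can be sketched on a small grid. This is a synchronous Python sketch under several assumptions: horizontal strips as blocks, a zero Dirichlet boundary, h² folded into f, and a matrix-free, unpreconditioned CG. The block shape, tolerances and function names are illustrative choices, not the authors' implementation. In JaceP2P the blocks would run on different Daemons. With asynchronous iterations, each block would use whatever neighbor boundary rows arrived last.

```python
def stencil(u, rows, cols):
    """y = A u for the 5-point Laplacian on a rows x cols grid (flat list)."""
    y = [0.0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            k = i * cols + j
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - cols]
            if i < rows - 1:
                s -= u[k + cols]
            if j > 0:
                s -= u[k - 1]
            if j < cols - 1:
                s -= u[k + 1]
            y[k] = s
    return y


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg(rhs, rows, cols, rel_tol=1e-12, max_iter=500):
    """Unpreconditioned Conjugate Gradient for A x = rhs on one block."""
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rs = dot(r, r)
    stop = rel_tol * rel_tol * rs  # stop when ||r|| <= rel_tol * ||rhs||
    for _ in range(max_iter):
        if rs <= stop:
            break
        ap = stencil(p, rows, cols)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def block_jacobi(f, n, n_blocks, tol=1e-8, max_outer=1000):
    """Synchronous block Jacobi: each block uses the previous iterate of
    the boundary rows owned by its neighbors."""
    bounds = [(b * n // n_blocks, (b + 1) * n // n_blocks)
              for b in range(n_blocks)]
    u = [0.0] * (n * n)
    for outer in range(1, max_outer + 1):
        new_u = list(u)
        for r0, r1 in bounds:
            rows = r1 - r0
            rhs = f[r0 * n:r1 * n]  # slicing copies the list
            if r0 > 0:              # row above, owned by a neighbor
                for j in range(n):
                    rhs[j] += u[(r0 - 1) * n + j]
            if r1 < n:              # row below, owned by a neighbor
                for j in range(n):
                    rhs[(rows - 1) * n + j] += u[r1 * n + j]
            new_u[r0 * n:r1 * n] = cg(rhs, rows, n)
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            return u, outer
    return u, max_outer


n = 8
u_true = [((k * 37) % 11) / 11.0 for k in range(n * n)]
f = stencil(u_true, n, n)
u, outer = block_jacobi(f, n, n_blocks=2)
print(outer, max(abs(a - b) for a, b in zip(u, u_true)))
```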
3.2. Execution context
n from 2000 up to 5000 (problem size = n²)
⇒ matrices from 4,000,000 × 4,000,000 up to 25,000,000 × 25,000,000
Heterogeneous processors and networks:
⇒ 3 Super-nodes (2.40 GHz CPU)
⇒ 100 Daemons (1266 MHz up to 3.00 GHz CPU)
⇒ 1 Spawner (2.40 GHz CPU)
⇒ Ethernet 100 Mbps up to 1 Gbps
Application launched on 80 Daemons, randomly disconnected/reconnected (from 0 up to 50 disconnections per execution)
Tasks checkpointed every 5 iterations
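A quick back-of-the-envelope check of the execution context above. Three assumptions are not stated on the slides: unknowns are split evenly over the 80 Daemons, a checkpoint stores only the local solution vector, and values are 8-byte double-precision floats.

```python
def execution_figures(n, daemons=80, bytes_per_value=8):
    """Global unknowns, unknowns per Daemon and checkpoint size in MB.

    Assumptions: even split over Daemons, a checkpoint holds only the
    local solution vector, 8-byte doubles.
    """
    unknowns = n * n  # the matrix is n^2 x n^2
    per_daemon = unknowns / daemons
    checkpoint_mb = per_daemon * bytes_per_value / 1e6
    return unknowns, per_daemon, checkpoint_mb


for n in (2000, 5000):
    print(n, execution_figures(n))
# 2000 (4000000, 50000.0, 0.4)
# 5000 (25000000, 312500.0, 2.5)
```

Under these assumptions, each checkpoint taken every 5 iterations moves between about 0.4 MB and 2.5 MB per task.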
3.3. Results
[Figure: execution time (in s) according to n (problem size = n × n, n from 2000 to 5000) with 0, 10, 20, 30, 40 and 50 disconnections]
Maximum slowdown (i.e. with 50 disconnections)
⇒ ≈ 2 for n = 2000
⇒ ≈ 2.5 for n = 5000
JaceP2P is well suited to highly dynamic infrastructures
Conclusion / Future works
Conclusion
Presentation of JaceP2P: programming and execution environment for iterative applications on P2P infrastructures
Based on a hybrid P2P topology and the asynchronous iteration model
Checkpoint mechanism for fault tolerance
Experiments with a real scientific application (linear problem)
JaceP2P is suited to iterative algorithms on dynamic and heterogeneous processors/networks
⇒ with 50 disconnections, slowdown ≤ 2.5
Future works
Study the scalability of JaceP2P (thousands of peers: EGEE, Grid'5000...)
Implementation and experiments with other kinds of iterative problems (nonlinear, non-stationary, eigenvalue problems, ...)
Make the Spawner fault tolerant (decentralize convergence detection and register...)