Using Loosely Coupled Clusters for HPC Software Components
Dominik Jürgens, Dr. Rainer Niekamp, Prof. Hermann G. Matthies
Institute of Scientific Computing, TU Braunschweig, Germany
Author: Dominik Jürgens
Last change: October 26, 2007
– queries get routed through the connected RM structure
– possible cycles in the database tree are resolved automatically
We propose a loosely coupled cluster (LCC) architecture for late-binding component models that places very few and weak requirements on its nodes. The solution is based on a software component model for high-performance computing and a resource discovery tool. This tool is a component registry that implements a distributed database for the efficient search of available software components.
Introduction
Classical Cluster Computing
Cluster computing is a widely used technique that combines the resources of a set of individual computers to solve a computationally complex problem. In most HPC applications such clusters are homogeneous and strongly coupled, meaning that the nodes are of the same kind, are dedicated to the cluster, and use common services.
[Figure residue, Picture 2: abstract application, remote component communication, parallel MPI component with local message passing, local and native components]
• openness (extendable, free)
• a platform-independent high-performance component model
• a focus on simple usage
• support for a wide range of platforms
• usage of abstract interface definitions (comparable to SIDL [2]) to encapsulate foreign code into components
These encapsulations enable different software applications to communicate with foreign software artefacts over a communication channel.
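To illustrate what encapsulating foreign code behind an abstract interface buys the client, here is a minimal Python sketch. It is not the CTL API: the class names `ComplexOpInterface`, `PythonImpl` and `WrappedForeignImpl` are hypothetical, and a plain Python callable stands in for a foreign (e.g. Fortran or C) routine. Only the operation name mirrors the poster's `complexOP(param):retT`.

```python
from abc import ABC, abstractmethod


class ComplexOpInterface(ABC):
    """Hypothetical abstract interface, playing the role of an IDL/SIDL-style
    declaration of complexOP(param):retT. Clients depend only on this."""

    @abstractmethod
    def complex_op(self, param: float) -> float:
        ...


class PythonImpl(ComplexOpInterface):
    """A native implementation written directly in Python."""

    def complex_op(self, param: float) -> float:
        return param * param


class WrappedForeignImpl(ComplexOpInterface):
    """Wraps foreign code (here any callable standing in for, e.g., a
    Fortran or C routine) behind the same abstract interface."""

    def __init__(self, foreign_routine):
        self._routine = foreign_routine

    def complex_op(self, param: float) -> float:
        # Adapt the call to the foreign routine and normalise the result type.
        return float(self._routine(param))


def client(component: ComplexOpInterface, value: float) -> float:
    # The client never needs to know which language implements the component.
    return component.complex_op(value)


if __name__ == "__main__":
    legacy_square = lambda x: x ** 2  # stand-in for foreign code
    print(client(PythonImpl(), 3.0))
    print(client(WrappedForeignImpl(legacy_square), 3.0))
```

Both calls go through the same interface, so either implementation can be swapped in without changing `client`.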
– components on managed nodes that are down are ignored
– arbitrary redundancy can be introduced
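The fault-tolerance behaviour listed above (components on nodes that are down are ignored, and redundancy can be added freely) can be sketched as a selection over replicas. This is an illustrative assumption, not the RM implementation: `ComponentRef` and `select_live` are hypothetical names, and the set of down nodes is assumed to be known to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class ComponentRef:
    """Hypothetical reference to one component instance on a named node."""
    interface: str
    node: str


def select_live(replicas: Iterable[ComponentRef],
                down_nodes: Set[str]) -> Optional[ComponentRef]:
    """Return the first replica whose node is not known to be down.

    Replicas on nodes that are down are skipped, so registering more
    replicas (redundancy) raises the chance that a lookup succeeds.
    Returns None if every replica is on a node that is down.
    """
    for ref in replicas:
        if ref.node not in down_nodes:
            return ref
    return None


if __name__ == "__main__":
    replicas = [ComponentRef("Solver", "A2"), ComponentRef("Solver", "B3")]
    print(select_live(replicas, down_nodes={"A2"}))
```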
[Figure residue, Pictures 2 and 4: remote tasks and subtasks, parallel OpenMP component with shared memory on an SMP machine, PCIe-local coprocessor program, native component, component model; cluster R1 with subclusters A (A1 to A4) and B (B1 to B4)]
Picture 2: Multi-paradigm integration. A distributed application may be composed of (intrinsically parallel) distributed components.
Loosely Coupled Cluster
We define a loosely coupled cluster (LCC) as a set of machines interconnected by a network of arbitrary topology. A machine in an LCC is called a node. LCCs can be structured in a hierarchy of subclusters. Any two nodes in a subcluster can communicate with each other, every node knows every other node in its subcluster by a unique name, and from every node programs can be started on every other node of the same subcluster.
Example: a collection of workstations with the Domain Name System (DNS) and a remote shell such as RSH or Secure Shell (SSH) installed as the remote program execution environment is an instance of an LCC.
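In the workstation example, "starting a program on another node" amounts to invoking the remote shell with the node's DNS name. The sketch below only builds the argument vector; the helper name `remote_start_command`, the host `node07` and the program `./solver` are hypothetical. `shlex.join` (Python 3.8+) quotes each argument so that it survives word splitting by the remote shell.

```python
import shlex
from typing import List, Sequence


def remote_start_command(node: str, program: Sequence[str],
                         shell: str = "ssh") -> List[str]:
    """Build the argv that starts `program` on `node` via a remote shell.

    The remote command line is assembled with shlex.join, so arguments
    containing spaces are quoted for the remote shell.
    """
    return [shell, node, shlex.join(program)]


if __name__ == "__main__":
    argv = remote_start_command("node07", ["./solver", "--input", "mesh 1.dat"])
    print(argv)
    # Launching would then be e.g. subprocess.run(argv, check=True).
```

The same helper works for RSH by passing `shell="rsh"`, which reflects the minimal requirement an LCC places on its nodes: name resolution plus some remote execution mechanism.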
Picture 4: The topology of the database is an arbitrary subgraph of the physical network. In the picture, R1, A1 and B1 are on the same level of the cluster hierarchy. B1 can instantiate components on R1, A1, B2, B3 and B4, but not on A2, A3 and A4, since B1 is not a member of subcluster A.
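The reachability rule in Picture 4 follows directly from the subcluster definition: a node may start programs on every other member of every subcluster it belongs to, and the relation is not transitive. A minimal sketch, assuming that the top level consists of R1, A1 and B1 and that A1 and B1 are also members of subclusters A and B respectively; the dictionary keys `"R"`, `"A"` and `"B"` are hypothetical labels.

```python
from typing import Dict, Set

# Assumed membership, read off Picture 4: the top level {R1, A1, B1}
# plus subclusters A and B. A node may belong to several subclusters.
SUBCLUSTERS: Dict[str, Set[str]] = {
    "R": {"R1", "A1", "B1"},
    "A": {"A1", "A2", "A3", "A4"},
    "B": {"B1", "B2", "B3", "B4"},
}


def startable_from(node: str, subclusters: Dict[str, Set[str]]) -> Set[str]:
    """Nodes on which `node` may start programs: every other member of
    every subcluster that `node` belongs to (no transitive closure)."""
    result: Set[str] = set()
    for members in subclusters.values():
        if node in members:
            result |= members
    result.discard(node)
    return result


if __name__ == "__main__":
    print(sorted(startable_from("B1", SUBCLUSTERS)))
```

For B1 this yields R1, A1, B2, B3 and B4, and excludes A2 to A4, matching the caption.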
The ResourceManager provides a yellow-pages-like infrastructure for component services. Given a requirement in the form of a component interface name, the manager queries the distributed database for an implementation of that interface. If the manager finds an adequate component, it returns a reference to it. The application can then connect directly to the component, so the manager adds no overhead to subsequent communication. Because the infrastructure is not centralised, it is flexible and adaptable.
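At the level of a single node, the yellow-pages lookup is a mapping from interface names to component references. The sketch below shows only that local part; `LocalRegistry`, its methods and the reference strings are hypothetical and do not reproduce the RM's actual data layout or reference format.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class LocalRegistry:
    """Hypothetical local component database of one ResourceManager:
    maps an interface name to references of implementing components."""

    def __init__(self) -> None:
        self._entries: DefaultDict[str, List[str]] = defaultdict(list)

    def register(self, interface: str, reference: str) -> None:
        # Several implementations of the same interface may be registered.
        self._entries[interface].append(reference)

    def lookup(self, interface: str) -> Optional[str]:
        # .get() does not insert a key, so unknown interfaces return None.
        refs = self._entries.get(interface)
        return refs[0] if refs else None


if __name__ == "__main__":
    reg = LocalRegistry()
    reg.register("complexOP", "A1/solver")
    print(reg.lookup("complexOP"))
```

Once the client holds the returned reference, it talks to the component directly; the registry is consulted only during lookup.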
[Figure residue, Picture 5: ResourceManager and CTL-Environment lanes with activities SLD, SDD and CS, guards such as [RR∨¬LA], terminal states CRDC and Error, interface complexOP(param):retT]
• fits very well with component models
Glossary
Transitions: RC = request component; RR = remote component recommended; LA = local component available; RA = local component available; RS = reference selected; NNV = node not visited; CRDC = component ready for direct communication; RN = [RNA∧NNV]
Activities: SDD = recursive search in the distributed DB; SLD = searching the local DB; CS = component selection; RNA = remote nodes available
Picture 5: Activity diagram of the recursive search for an adequate component implementation.
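The activity diagram can be read as a depth-first search over linked ResourceManagers: search the local DB (SLD); if a local component is available and no remote one is recommended, select it (CS); otherwise recurse into remote RMs that have not been visited yet (RN = RNA ∧ NNV, then SDD). The sketch below is an interpretation, not the code of [1]. Two points are assumptions: when a remote component is recommended but none is found, it falls back to the local one, and a `None` result stands for the Error state. The visited set is what lets cycles in the RM links resolve without looping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ResourceManager:
    """Hypothetical RM: a local component DB plus entries for other RMs."""
    name: str
    local_db: Dict[str, str] = field(default_factory=dict)
    peers: List["ResourceManager"] = field(default_factory=list)

    def search(self, interface: str, remote_recommended: bool = False,
               visited: Optional[Set[str]] = None) -> Optional[str]:
        visited = set() if visited is None else visited
        visited.add(self.name)                  # this node is now visited
        local = self.local_db.get(interface)    # SLD: search local DB
        if local is not None and not remote_recommended:
            return local                        # CS: select local component
        for peer in self.peers:                 # RN: remote nodes available...
            if peer.name in visited:            # ...and not visited yet
                continue                        # skipping breaks cycles
            found = peer.search(interface, False, visited)  # SDD (recursive)
            if found is not None:
                return found
        return local                            # assumed fallback; None = Error


if __name__ == "__main__":
    a, b, r = ResourceManager("A1"), ResourceManager("B1"), ResourceManager("R1")
    a.peers.append(b)
    b.peers.extend([a, r])                      # a -> b -> a forms a cycle
    r.local_db["complexOP"] = "R1/complexOP"
    print(a.search("complexOP"))
```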
Components on top of LCC
• LCC provides a component execution environment
• An infrastructure to resolve component locations is necessary
• hierarchical cluster management
Benefits of the LCC
• easy to implement or already at hand
[Figure residue: implementations of complexOP(param):retT in Python, Java, Matlab, Fortran and C/C++; cluster categorisation into central managed and hierarchical managed LCC; example scenario comparing database topology with network/system topology over nodes A1 to A4 and B1 to B4; abstraction, syntax, semantics; SMP machine with SIMD coprocessor]
Component Template Library
The Component Template Library (CTL) [3] is a generative component framework.
• Fault tolerance
Domains of Application
[Figure residue, Picture 1: application with main():int, CI object, CTL component, complexOP(param):retT, implementation language ∈ {C, C++, Fortran, Matlab, Python, Java}]
Picture 1: A scientific component model provides services that enable the composition of a highly inhomogeneous set of software components into a distributed application.
[Figure residue, Picture 3: application, remote execution, CTL environment providing a service, two LCC nodes]
Picture 3: A component is instantiated remotely; the distributed system constructs itself recursively from a central point.
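The recursive self-construction in Picture 3 can be pictured as a tree walk: the central application instantiates its components, and each of them in turn instantiates the sub-components it needs. In this sketch `ComponentSpec`, `instantiate` and all component and node names are hypothetical, and "instantiation" merely records a `(node, name)` pair in order; a real system would start each component remotely and let it continue the recursion on its own node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ComponentSpec:
    """Hypothetical description of a component: its name, the node it
    should run on, and the sub-components it instantiates itself."""
    name: str
    node: str
    children: List["ComponentSpec"] = field(default_factory=list)


def instantiate(spec: ComponentSpec, log: List[Tuple[str, str]]) -> None:
    """Record the instantiation of `spec`, then recursively instantiate
    its sub-components (depth-first, in declaration order)."""
    log.append((spec.node, spec.name))
    for child in spec.children:
        instantiate(child, log)


if __name__ == "__main__":
    root = ComponentSpec("Application", "R1", [
        ComponentSpec("Solver", "A1", [ComponentSpec("LinearAlgebra", "A2")]),
        ComponentSpec("Mesh", "B1"),
    ])
    log: List[Tuple[str, str]] = []
    instantiate(root, log)
    print(log)
```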
Components and HPC
Scientific applications need to be designed with respect to:
• managing complex systems
• many different styles of programming
  – massively parallel hardware programmed in the SIMD paradigm
  – multithreaded and message-passing programs (SPMD paradigm)
  – computational grids (MPMD paradigm)
• separation of concerns
• direct peer communication
Component-based software design is meant to achieve these goals.
Discovering Resources in an LCC
Our solution is based on a software component named ResourceManager (RM) [1].
Meta Computing
• Coupling of clusters
• Resource discovery with respect to available components
• Coupling of subsystems implemented in different paradigms/languages
Conclusion
The loosely coupled cluster is an approach that combines the possibilities of component-based software and cluster computing. The definition helps to identify configurations of arbitrarily interconnected machines that can be coupled using a component-based approach.
• an RM holds a database with all available components
  – on its node (LCC Managed Node)
  – on local database subtrees
  – on its passive peer nodes (LCC Passive Nodes)
• an RM can have entries for other RMs in its database
  – RMs browse their RM entries recursively
  – connected RMs provide spanning-tree semantics
References
[1] Dominik Jürgens. Implementierung einer verteilten Ressourcenverwaltung als Komponente für die Component Template Library [Implementation of a distributed resource management as a component for the Component Template Library]. Project work, 2005.
[2] Scott R. Kohn, Gary Kumfert, Jeffrey F. Painter, and Calvin J. Ribbens. Divorcing language dependencies from a scientific software library. In PPSC, 2001.
[3] Rainer Niekamp. CTL Manual for Linux/Unix, 2005.
Contact Information
Person in Support: Dominik Jürgens
Phone: +49 531 391-3013
E-Mail: [email protected]
Postal Address: Hans-Sommer-Straße 65, 38106 Braunschweig
Project manager: Dominik Jürgens
E-Mail: [email protected]
Phone: +49 531 391-3013
Fax: +49 531 391-3003
www.wire.tu-bs.de