Using Loosely Coupled Clusters for HPC Software Components

Dominik Jürgens, Dr. Rainer Niekamp, Prof. Hermann G. Matthies
Institute of Scientific Computing, TU Braunschweig, Germany
Author: Dominik Jürgens

Last change: October 26, 2007

Abstract

We propose a loosely coupled cluster (LCC) architecture for late-binding component models with very few and weak requirements on its nodes. The solution is based on a software component model for high-performance computing and a resource discovery tool. This tool is a component registry implementing a distributed database for the efficient search of available software components.

Introduction

Classical Cluster Computing

Cluster computing is a widely used technique to combine the resources of a set of individual computers in order to solve a computationally complex problem. In most HPC applications such clusters are homogeneous and strongly coupled, meaning the nodes are of the same kind, work explicitly for the cluster and use common services.

(Diagram residue from Pictures 2 and 4: an abstract application using remote component communication to reach remote tasks; a parallel MPI component on cluster R1 with local components and local message passing; a parallel OpenMP component with shared memory and a native component on an SMP machine; a PCIe-local coprocessor program on an SMP machine with SIMD coprocessor; native components within a component model; and the node labels R1, A1-A4, B1-B4 of Picture 4.)

Picture 2: Multi-paradigm integration — A distributed application may be composed from (intrinsically parallel) distributed components.

Loosely Coupled Cluster

We define a loosely coupled cluster (LCC) as a set of machines which are interconnected by a network with an arbitrary topology. LCCs can be structured into a hierarchy of subclusters. Any two nodes in a subcluster can communicate pairwise. A machine in an LCC is called a node. Every node knows every other node in its subcluster by a unique name, and from every node programs can be started on every other node of the same subcluster. Example: a collection of workstations with a Domain Name System (DNS) and a remote shell, e.g. Secure Shell (SSH), installed as remote program execution environment is an instance of an LCC.
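Only a remote program start facility is required of the nodes; for such a DNS/SSH workstation pool this can be a plain shell call, roughly as in the following sketch (the node name and the started command are hypothetical placeholders):

    #include <cstdlib>
    #include <iostream>
    #include <string>

    // Start a program on another node of the same subcluster via SSH
    // (node name and command are hypothetical placeholders).
    bool start_remote(const std::string& node, const std::string& command) {
        // -f lets ssh go to the background once the remote program is running,
        // so the local process does not block.
        const std::string cmd = "ssh -f " + node + " '" + command + "'";
        return std::system(cmd.c_str()) == 0;
    }

    int main() {
        if (!start_remote("a1.cluster.example", "./component_server"))
            std::cerr << "remote start failed\n";
        return 0;
    }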

Picture 4: The topology of the database is an arbitrary subgraph of the physical network. In the picture, R1, A1 and B1 are at the same level of the cluster hierarchy. B1 can instantiate components on R1, A1, B2, B3 and B4, but not on A2, A3 and A4, since B is not in cluster A.

The ResourceManager (RM) provides a yellow-pages-like infrastructure for component services. Given a requirement in the form of a component interface name, the manager queries the distributed database to find an implementation of that interface. If the manager finds an adequate component, it returns a reference to it. The application can then connect directly to the component without any overhead. The infrastructure is not centralised and is therefore extremely flexible and adaptable.
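The interplay can be pictured with the following sketch; the ResourceManager class, its methods and the reference type are illustrative placeholders and not the actual CTL or ResourceManager API. The point is that the lookup happens once, by interface name, and afterwards the application talks to the component directly.

    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>

    // Illustrative placeholders only; this is not the actual CTL/ResourceManager API.
    struct ComponentRef {
        std::string node;      // where the implementation lives
        std::string endpoint;  // how to reach it for direct communication
    };

    class ResourceManager {
        std::map<std::string, ComponentRef> db_;  // stands in for the distributed database
    public:
        void publish(const std::string& iface, ComponentRef ref) { db_[iface] = ref; }

        // Yellow-pages lookup: given an interface name, return a reference to an
        // implementation; throws if no adequate component is known.
        ComponentRef find(const std::string& iface) const {
            auto it = db_.find(iface);
            if (it == db_.end()) throw std::runtime_error("no implementation for " + iface);
            return it->second;
        }
    };

    int main() {
        ResourceManager rm;
        rm.publish("complexOP", {"a1.cluster.example", "tcp://a1:4711"});  // hypothetical entry
        ComponentRef ref = rm.find("complexOP");
        // After the lookup the application connects to ref.endpoint directly;
        // the ResourceManager is no longer involved (no per-call overhead).
        std::cout << "use component on " << ref.node << " via " << ref.endpoint << "\n";
        return 0;
    }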

(Diagram residue from Pictures 3 and 5: CTL environments in the local and remote process spaces, component interfaces compiled against a component repository, LCC Managed Nodes with their HPC component databases (CDB), component allocation over the network, and the ResourceManager states and guarded transitions of the recursive component search, including an error path; see the glossary and Picture 5 below.)

Glossary

Transitions: RC = request component; RR = remote comp. recommended; LA = local comp. available; RA = local comp. available; RS = reference selected; NNV = node not visited; CRDC = component ready for direct communication; RN = [RNA ∧ NNV]

Activities: SDD = recursive search of the distributed DB; SLD = searching the local DB; CS = component selection; RNA = remote nodes available

Picture 5: Activity diagram of the recursive search for an adequate component implementation.
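The search the diagram describes can be sketched roughly as follows, using a deliberately simplified, hypothetical model of an RM node (not the actual implementation): every RM first searches its local database and, if nothing adequate is found, recurses into the RMs it knows; the visited set is what resolves possible circles in the database tree.

    #include <map>
    #include <optional>
    #include <set>
    #include <string>
    #include <vector>

    // Simplified, hypothetical model of a ResourceManager node; not the actual CTL types.
    struct RM {
        std::string name;                              // unique node name in the LCC
        std::map<std::string, std::string> local_db;   // interface name -> component reference
        std::vector<RM*> peers;                        // other RMs known to this RM
    };

    // Recursive search of the distributed database (activities SLD and SDD):
    // search the local DB first, then recurse into the known remote RMs.
    // The visited set resolves possible circles in the database tree.
    std::optional<std::string> find(RM& rm, const std::string& iface,
                                    std::set<std::string>& visited) {
        if (!visited.insert(rm.name).second)            // node already visited
            return std::nullopt;
        auto it = rm.local_db.find(iface);              // SLD: searching the local DB
        if (it != rm.local_db.end())
            return it->second;                          // local component available
        for (RM* peer : rm.peers)                       // SDD: recurse into remote RMs
            if (auto ref = find(*peer, iface, visited))
                return ref;
        return std::nullopt;                            // error: no adequate component found
    }

    int main() {
        RM a1{"A1", {{"complexOP", "component@A1"}}, {}};
        RM r1{"R1", {}, {&a1}};                         // R1 knows the RM on A1
        std::set<std::string> visited;
        return find(r1, "complexOP", visited) ? 0 : 1;  // succeeds via the RM entry on A1
    }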

Components on top of LCC
• LCC provides a component execution environment
• an infrastructure to resolve component locations is necessary
• hierarchical cluster management

Cluster Categorisation
• centrally managed LCC
• hierarchically managed LCC

Benefits of the LCC
• easy to implement or already at hand
• fits very well with component models
• fault tolerance

Example Scenario: the cluster hierarchy of Picture 4, with R1, A1 and B1 at the top level and subclusters A (A1-A4) and B (B1-B4).

(Diagram residue from Pictures 1 and 4: client implementations in C/C++, Fortran, Matlab, Python and Java behind the common interface complexOP(param):retT; abstraction of syntax and semantics; LCC Managed Nodes A1 and B1 with their HPC component databases (CDB), an LCC Passive Node, and the guarded transitions of the component search; database topology versus network/system topology; SMP machine with SIMD coprocessor.)

Component Template Library

The Component Template Library (CTL) [3] is a generative component framework providing:
• openness (extendable, free)
• a platform-independent high-performance component model
• a focus on simple usage
• support for a wide range of platforms
• usage of abstract interface definitions (comparable to SIDL [2]) to encapsulate foreign code into components (see the sketch below)

These encapsulations enable different software applications to communicate with foreign software artefacts over some communication channel.
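As an illustration of such an encapsulation, the abstract interface from the pictures (complexOP(param):retT) might be rendered in plain C++ as below; the parameter and return types are assumed for the example, and the syntax is only an analogy, not actual CTL or SIDL notation.

    #include <vector>

    // Purely illustrative analogy to an abstract component interface
    // (complexOP(param):retT from the figures); not CTL or SIDL syntax.
    using param = std::vector<double>;   // hypothetical parameter type
    using retT  = double;                // hypothetical return type

    class ComplexOpInterface {
    public:
        virtual ~ComplexOpInterface() = default;
        // The single operation the component promises to provide; concrete
        // implementations may live in C, C++, Fortran, Matlab, Python or Java
        // and may themselves be parallel (MPI, OpenMP, coprocessor code).
        virtual retT complexOP(const param& p) = 0;
    };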

(Diagram residue from Picture 1: an application with main():int holds a CI object for the interface complexOP(param):retT and connects to a CTL component whose implementation language ∈ {C, C++, Fortran, Matlab, Python, Java}.)

Picture 1: A scientific component model provides services to enable the composition of a highly inhomogeneous set of software components into a distributed application.

(Diagram residue from Picture 3: an application on one LCC node triggers remote execution; the CTL environment on another LCC node provides the service.)

Picture 3: A component gets instantiated remotely; the distributed system constructs itself recursively from a central point.
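The recursive construction can be paraphrased by the following sketch; the instantiate() helper and the node names are hypothetical stand-ins for whatever remote instantiation mechanism the environment provides (e.g. the CTL environment started via SSH).

    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical sketch of recursive construction: each component instantiates
    // its sub-components (possibly on remote LCC nodes), which do the same, so the
    // distributed system builds itself from one central starting point.
    struct Component {
        std::string node;
        std::vector<std::unique_ptr<Component>> children;
    };

    // Stand-in for remote instantiation; here it only records where the
    // component would have been created.
    std::unique_ptr<Component> instantiate(const std::string& node,
                                           const std::vector<std::string>& sub_nodes) {
        auto c = std::make_unique<Component>();
        c->node = node;
        std::cout << "instantiating component on " << node << "\n";
        for (const auto& n : sub_nodes)
            c->children.push_back(instantiate(n, {}));  // children build their own subtrees
        return c;
    }

    int main() {
        // Central point R1 creates components on A1 and B1 (hypothetical node names).
        auto root = instantiate("r1.example", {"a1.example", "b1.example"});
        return 0;
    }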

Components and HPC

Scientific applications need to be designed with respect to:
• managing complex systems
• many different styles of programming
  – massively parallel hardware programmed in the SIMD paradigm
  – multithreaded and message-passing programs (SPMD paradigm)
  – computational grids (MPMD paradigm)
• separation of concerns
• direct peer communication

Component-based design of software is meant to achieve these goals.

Discovering Resources in a LCC

Our solution is based on a software component named ResourceManager (RM) [1].
• an RM holds a database with all available components (see the data-model sketch below)
  – on its node (LCC Managed Node)
  – on local database subtrees
  – on its passive peer nodes (LCC Passive Nodes)
• an RM can have entries for other RMs in its database
  – RMs browse their RM entries recursively
  – connected RMs provide a spanning-tree semantic
  – queries get routed through the connected RM structure
  – possible circles in the database tree get resolved automatically
  – components on managed nodes which are down are ignored
  – arbitrary redundancy can be introduced
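A minimal data model matching these bullets might look like the following sketch; the type and field names are hypothetical illustrations, not the actual CTL data structures.

    #include <string>
    #include <vector>

    // Hypothetical data model of a ResourceManager database;
    // the field names are illustrative, not the actual CTL data structures.
    struct ComponentEntry {
        std::string interface_name;   // e.g. "complexOP"
        std::string location;         // node hosting the implementation
    };

    struct RMDatabase {
        std::vector<ComponentEntry> managed_node_components;  // components on this LCC Managed Node
        std::vector<ComponentEntry> passive_node_components;  // components on LCC Passive peer Nodes
        std::vector<std::string>    remote_rms;               // entries for other RMs, browsed
                                                              // recursively (spanning-tree semantic)
    };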

Domains of Application

Meta Computing
• coupling of clusters
• resource discovery with respect to available components
• coupling of subsystems implemented in different paradigms/languages

Conclusion

The loosely coupled cluster is an approach to combine the possibilities of component-based software and cluster computing. The definition helps to find a configuration of arbitrarily interconnected machines and to couple them with a component-based approach.

References

[1] Dominik Jürgens. Implementierung einer verteilten Ressourcenverwaltung als Komponente für die Component Template Library (implementation of a distributed resource management as a component for the Component Template Library). Project work, 2005.
[2] Scott R. Kohn, Gary Kumfert, Jeffrey F. Painter, and Calvin J. Ribbens. Divorcing language dependencies from a scientific software library. In PPSC, 2001.
[3] Rainer Niekamp. CTL Manual for Linux/Unix, 2005.

Contact Information:
Person in Support / Project Manager: Dominik Jürgens
Phone: +49 531 391-3013
Fax: +49 531 391-3003
E-Mail: [email protected]
Postal Address: Hans-Sommer-Straße 65, 38106 Braunschweig
Web: www.wire.tu-bs.de
