A Platform of Components for Large Scale Parallel

A Platform of Components for Large Scale Parallel Programming
Francisco Heron de Carvalho Junior
Departamento de Computação
Universidade Federal do Ceará
Fortaleza, Brazil
[email protected]

Abstract Component-based programming has been applied to large scale applications from the sciences and engineering, which have high performance computing requirements. However, parallelism has been poorly supported in the usual component infrastructures. This paper presents the design and implementation of a parallel components platform based on the HASH component model, targeting clusters and well suited for programming-in-the-large.

1. Introduction Due to the dissemination of low cost parallel architectures and the new possibilities for cooperative work in multidisciplinary environments arising from distributed computing over the internet, a new class of large scale applications from computational sciences and engineering has emerged. They pose new demands on development platforms, which must meet their productivity and high performance requirements [27]. Parallel programming is a critical concern, since it is the most important high performance computing (HPC) technique, but it is still hard to incorporate into widespread development platforms. Research efforts have been started to advance this technological context. Regarding programming languages, the DARPA/HPCS languages deserve special attention [23]. Also, new component models and frameworks have been proposed, such as CCA [4], Fractal/ProActive [11], and GCM [6], trying to reproduce the success of component technologies in corporate applications [29]. However, they lack comprehensive notions of parallel components and connectors for efficient parallel synchronization [13]. The efforts on the programming language and component sides are complementary, since programming languages take the perspective of programming-in-the-small whereas components take the perspective

of programming-in-the-large [18]. Unlike programming-in-the-small, experiences with parallel programming tools and techniques that prioritize programming-in-the-large requirements, where software engineering and software architecture techniques are applied with stronger emphasis, are rare. This is because, historically, software from the sciences and engineering has been structurally simple compared to business software, mostly programmed in Fortran and C by groups of one to three programmers at research institutions. The HASH component model, also referred to by the symbol #, has been proposed to meet the requirements of large scale parallel software from the HPC domain. HPE (Hash Programming Environment) is a general purpose platform for controlling the life cycle stages of kinds of #-components targeting clusters [14]. The HPE project is hosted at http://hash-programmingenvironment.googlecode.com/. This paper discusses the design, implementation, and performance of the BACK-END of HPE, the module responsible for deploying and executing components on the parallel computing platform. Section 2 gives an overview of parallelism support in existing component models and infrastructures. Section 3 overviews the design of HPE. Section 4 focuses on the BACK-END of HPE on top of the CLI/Mono platform, targeting clusters of multiprocessors. Section 5 presents a case study of a parallel program built in HPE, aiming to demonstrate some performance characteristics of HPE. Section 6 presents the final considerations.

2. Parallelism in Component Models Approaches to parallelism support in component models tend to extend existing component models and infrastructures. This section describes the most important approaches proposed for parallelism support on component platforms, from the first attempts to extend CORBA infrastructures to CCA and GCM.

Figure 1. Overlapping Composition

PARDIS [22] and PaCO [28] introduced SPMD CORBA objects, where a set of identical objects cooperate in parallel. In PaCO, MPI is mandatory for interaction among them; for that, the MPI runtime was integrated into the CORBA BOA (Basic Object Adapter) to ensure a scalable communication layer between CORBA objects. PaCO++ [17] added portability to PaCO: instead of extending the CORBA IDL, it proposed an auxiliary XML file to configure parallelism. GridCCM [26] is an extension of CCM (CORBA Component Model) where a parallel component is a collection of SPMD CORBA components. The Data Parallel CORBA [25] specification has been proposed by the OMG to support data parallel programming in CORBA. In a recent paper, a generic approach to incorporating the master-worker paradigm into component models has been proposed and applied to CCM [10]. Its main innovation is the use of abstract assemblies whose instantiation into concrete ones is automatic and adaptive, following one of a set of pre-defined request transport patterns for coordinating the interaction between manager and workers. The design of CCA (Common Component Architecture) was inspired by CORBA, adapted to the requirements of HPC scientific applications [4]. However, the first CCA specification does not define how compliant frameworks must address parallelism, freeing researchers to investigate several possibilities. The SCMD (Single Component Multiple Data) approach has been adopted by CCAffeine [2], where a cohort is a set of identical components that interact through message passing. An application is formed by sets of cohorts, each one residing in a node of a cluster. Components that reside in the same node bind their ports directly for communication through the framework. If cohorts reside in different clusters, they may have different numbers of components (say N and M), requiring redistribution of data between the N clients and the M servers, leading to the M×N problem.
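The M×N problem above amounts to computing, for each sender/receiver pair, which index range must be transferred when the same array is block-distributed over N processes on one side and M on the other. The following is a minimal sketch of such a redistribution plan; the function names are illustrative and not part of any CCA framework.

```python
def block_intervals(total, parts):
    """Split `total` elements into `parts` contiguous blocks: rank -> [lo, hi)."""
    base, rem = divmod(total, parts)
    intervals, lo = [], 0
    for r in range(parts):
        hi = lo + base + (1 if r < rem else 0)  # first `rem` ranks get one extra
        intervals.append((lo, hi))
        lo = hi
    return intervals

def redistribution_plan(total, n_senders, m_receivers):
    """For each (sender, receiver) pair, the overlapping index range to move."""
    send = block_intervals(total, n_senders)
    recv = block_intervals(total, m_receivers)
    plan = []
    for s, (slo, shi) in enumerate(send):
        for r, (rlo, rhi) in enumerate(recv):
            lo, hi = max(slo, rlo), min(shi, rhi)
            if lo < hi:  # non-empty intersection: a message must be sent
                plan.append((s, r, lo, hi))
    return plan
```

For 10 elements redistributed from 2 clients to 3 servers, the plan contains four messages, and the ranges together cover the whole array exactly once.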
For that, the CCA community has also worked on PRMI (Parallel RMI) [9]. Fractal [8] is a hierarchical component model, where new components may be built by connecting existing inner components, whose ports may be exposed at the boundaries

of the new component in order to form its interface. Indeed, more than one inner server port may correspond to a single port at the interface of the enclosing component, by means of group proxies, in order to support collective dispatching of method calls to a set of inner components. A recent proposal has extended Fractal with collective interfaces of the kinds multicast (1-N) and gathercast (M-1), with the promise of introducing M-N interfaces in the near future [7]. Another proposal has approached high level and transparent group communication for object-oriented parallel and distributed programming environments, currently applied to the ProActive/Fractal library [5]. Since 2006, the Fractal and CCA communities have cooperated, organizing joint conferences and converging their efforts in the design of GCM (Grid Component Model) [6], mostly based on Fractal and incorporating many innovations, such as collective interfaces.
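The collective interfaces mentioned above can be sketched in a few lines: a multicast (1-N) proxy fans one call out to every member of a group, while a gathercast (M-1) proxy collects one contribution per caller before dispatching a single call. All names below are hypothetical; this is not the ProActive/Fractal API.

```python
class MulticastProxy:
    """1-N interface: dispatch one call to every group member, collect results."""
    def __init__(self, members):
        self.members = members

    def invoke(self, method, *args):
        # Call the same method on each member; a real framework would do this
        # in parallel and may apply parameter dispatch policies (e.g. scatter).
        return [getattr(m, method)(*args) for m in self.members]

class GathercastProxy:
    """M-1 interface: gather `expected` contributions, then dispatch once."""
    def __init__(self, target, expected, reduce=list):
        self.target, self.expected, self.reduce = target, expected, reduce
        self.pending = []

    def contribute(self, method, value):
        self.pending.append(value)
        if len(self.pending) == self.expected:
            gathered, self.pending = self.reduce(self.pending), []
            return getattr(self.target, method)(gathered)
        return None  # still waiting for the remaining callers
```

A promised M-N interface would compose the two: gather on one side, redistribute and multicast on the other.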

3. The HASH Programming Environment HPE (Hash Programming Environment) is a platform for the development, management, deployment, and execution of parallel components. It aims to improve development productivity in the domain of large scale high performance computing applications.

3.1. The HASH Component Model (#) HPE complies with the HASH component model (#), which proposes a notion of parallel components and of how they can be combined to build new components and applications. It is based on the premise that the mixing of processes and concerns in the same dimension of software decomposition makes it difficult to reconcile software engineering and parallel programming practices [15]. Thus, the # component model promotes concern-oriented parallel programming, which is closer to software engineering practices, moving away from the process-based perspective adopted by the usual parallel programming artifacts. #-components may be deployed on a set of nodes of a parallel architecture. For that, a #-component is composed of a set of parts, called units, each one placed on a node. A unit defines the role of a process with respect to the concern addressed by the #-component. #-components are combined to form new ones by overlapping composition, illustrated in Figure 1, where #-components are represented by ellipses and their units by rectangles. The units of the new #-component are formed by importing units from its inner components, represented by arrows. The imported units are called slices in the context of the enclosing unit, belonging to the same process as that unit in the resulting parallel program. Sharing of inner components is supported (see #-component E, shared between B and C).
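Overlapping composition can be sketched with a minimal data structure: a #-component holds a set of units, and composing imports an inner component's units as slices of the enclosing units. The class and method names below are illustrative, not HPE's actual representation.

```python
class Unit:
    """A part of a #-component placed on one node; `slices` are units imported
    from inner #-components that run in the same process as this unit."""
    def __init__(self, name, node):
        self.name, self.node = name, node
        self.slices = []

class HashComponent:
    """A #-component: a named set of units plus the inner components it composes."""
    def __init__(self, name, units):
        self.name = name
        self.units = {u.name: u for u in units}
        self.inner = []

    def overlap(self, inner, wiring):
        """Overlapping composition: import units of `inner` as slices of this
        component's units, per the map {enclosing_unit: inner_unit}."""
        self.inner.append(inner)  # the same inner component may be shared
        for outer_name, inner_name in wiring.items():
            self.units[outer_name].slices.append(inner.units[inner_name])
```

Mirroring Figure 1, the same component E can be overlapped by both B and C, so the slice held by B's unit and the slice held by C's unit are the very same object.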

3.2. HASH Programming Systems

A HASH programming system is a component-based programming platform that complies with the # component model. For that, it must support: distributed deployment and execution of component parts (units); hierarchical composition by overlapping; and a finite set of component kinds. Component kinds define sets of #-components that share relevant traits, such as their deployment models, the kind of program modules associated with their units, and their restrictions on composition with #-components that inhabit distinct component kinds. The component kinds supported by a # programming system define the domain of applications it supports. For instance, the component kinds supported by HPE are: architectures, representing parallel platforms, whose units represent their nodes; environments, representing parallelism enabling environments, such as message passing libraries, grid middleware, computational frameworks, and so on; qualifiers, denoting ad-hoc non-functional concerns; computations, representing parallel computations, whose units represent the role of a process with respect to the computation; synchronizers, denoting patterns of synchronization and communication among the processes that have their units as slices; enumerators, to support scalable configurations; and applications, which are computations that may be launched on a parallel computer. We are working on a services kind, to allow applications that reside in distinct clusters to interact through CCA bindings.

3.4. HTS: Type System for #-Components

HTS (Hash Type System) is a type system designed for the #-components of HPE. It makes it possible for programmers to make assumptions about specific features of parallel computing platforms at some desired level of abstraction. Let us introduce the notions of abstract component and #-component with an example. Let CHANNEL be an abstract component inhabited by #-components that implement unidirectional communication channels under different architectural assumptions. For instance, one may abstract away from the type of values that transit through the channel and from the underlying message-passing environment used to implement it, as in CHANNEL [X
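The idea of an abstract component such as CHANNEL, inhabited by concrete #-components for different architectural assumptions, can be sketched as an abstract interface with interchangeable implementations. This is only an illustration of the idea; the names below are hypothetical and this is not HTS syntax.

```python
from abc import ABC, abstractmethod
from collections import deque

class Channel(ABC):
    """Abstract component: a unidirectional channel, independent of the data
    type transmitted and of the underlying message-passing environment."""
    @abstractmethod
    def send(self, value): ...

    @abstractmethod
    def recv(self): ...

class InProcessChannel(Channel):
    """One concrete inhabitant of CHANNEL: a FIFO over a local queue, standing
    in for an MPI- or socket-backed implementation on a real platform."""
    def __init__(self):
        self._q = deque()

    def send(self, value):
        self._q.append(value)

    def recv(self):
        return self._q.popleft()
```

Client code programs against Channel only, so a #-component written for one architectural assumption can be replaced by another inhabitant of the same abstract component without change.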
