
Design and Implementation of an Environment for Component-Based Parallel Programming

Francisco Heron de Carvalho Junior (1), Rafael Dueire Lins (2), Ricardo Cordeiro Corrêa (1), Gisele Araújo (1), and Chanderlie Freire de Santiago (1)

(1) Departamento de Computação, Universidade Federal do Ceará, Campus do Pici, Bloco 910, Fortaleza, Brazil. {heron,correa,gisele,cfreire}@lia.ufc.br
(2) Departamento de Eletrônica e Sistemas, Universidade Federal de Pernambuco, Av. Acadêmico Hélio Ramos s/n, Recife, Brazil. [email protected]

Abstract. The # component model was proposed in response to the inadequacy of current parallel programming artifacts for the new complexity of high performance computing (HPC) software. It has solid formal foundations, built on top of category theory and Petri nets, and it may also be viewed as a general theory of parallel components. This paper presents design and architectural issues regarding the implementation of a new parallel programming environment based on the # model.

1 Introduction

Easier access to parallel processing power, brought by the dissemination of clusters and grids, has posed new challenges to computer scientists. Contemporary parallel programming techniques, such as message passing libraries [12, 13], provide poor abstraction, requiring an amount of knowledge of architectural details and parallelism strategies that is out of reach of non-specialists [11]. Higher-level approaches, such as parallel functional languages and scientific computing libraries [3, 5], do not combine efficiency with generality. The scientific community still looks for parallel programming paradigms that reconcile portability and efficiency with generality and a high level of abstraction.

Several research groups have tried to bring component technology to HPC software development [2, 16, 4]. Components are viewed as a promising alternative for leveraging multidisciplinary HPC environments, where large-scale scientific computations are built by integrating software parts that implement different models of interest, are probably coded by different groups, and are certainly deployed dynamically. This is a typical scenario in grid computing. However, component models for HPC still have limitations. They do not address non-functional and cross-cutting concerns and give poor support for parallel components. Few alternatives for combining components are provided; in most cases, they are limited to nested composition. At the heart of such limitations lies their close relation to the process-based style of parallel programming.

The # component model moves parallel programming from a process-based perspective to a concern-oriented one, where computation and coordination concerns are orthogonal. The usual notion of component is generalized by bringing connectors [21] into the same domain as components. Thus, the # model goes far beyond the idea of treating connectors as first-class citizens, advocated by researchers on coordination models. #-components are essentially parallel and may capture functional and non-functional concerns, which may themselves be cross-cutting. Abstraction in # programming may be achieved through skeletal programming [10], using the notion of component classes. In the next sections, this extended abstract outlines issues regarding the implementation of a new extensible environment for parallel programming based on the # component model, implemented on top of the Eclipse platform [18]. The structure of the final version of this paper is sketched along the text, with special mention in the concluding section.

2 The # Component Model: Premises and Intuitions

The # component model attempts to move parallel programming from a process-based perspective to a concern-oriented one, where concerns [17] are addressed by #-components. It intends to provide suitable support for the increasing scale and complexity of HPC software. Current component models, even those proposed for HPC [2, 16, 4], are not parallel in a general sense. The most common approach for supporting parallel components has been hierarchical composition in its nested form [4], which does not differ in essence from process-based parallel programming, since the executing components are processes. Concerns are orthogonal to processes. The # model hypothesis states that a concern-oriented perspective may improve the practice of parallel software development, because application concerns, the essential building blocks of software [17], are commonly found cross-cutting processes in parallel programs [7]. In sequential programming, where only one process (the program itself) exists, cross-cutting concerns are in general exceptions in software design. Thus, it is reasonable to approach the problem of separating cross-cutting concerns in sequential software through orthogonal programming language extensions [14, 19]. The # component model adopts overlapping composition, a more general notion of hierarchical composition of components. Also, coordination concerns are orthogonally segregated from computational ones by using first-class exogenous connectors [21, 15].

2.1 Coordination Medium

A #-component comprises a set of units, some of which are defined as observable. #-components may be inductively composed from other #-components by overlapping composition. A unit of the new #-component is defined by the unification of observable units of the #-components being overlapped; these are called slices of the resulting unit. A # program is defined by a #-component, called the application component, which does not define observable units because application components cannot be overlapped.

Fig. 1. Component Perspective versus Process Perspective

The processes of a # program are defined as the units of the respective application component. Thus, a unit is intuitively a process slice that states the role of the process with respect to the concern addressed by its owner component. Figures 1 and 2(a) illustrate the orthogonality between processes and concerns, a key premise of the # model design. The interface of a unit is defined by its set of slices and a protocol that specifies a partial order for activating slices. Intuitively, slice activation corresponds to the execution of the actions that define the role of the process in a concern. Protocols are specified using synchronized regular expressions, chosen for their equivalence with Petri nets [20]. We intend to integrate Petri net tools into the # programming environment, enabling verification and analysis of formal properties and performance evaluation of parallel programs. The protocol of a unit must obey the restrictions imposed by the protocols of its slices (behavioral preservation principle).
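As a purely illustrative sketch, the Java fragment below mimics how such a protocol could be written down for a unit with three slices, recv_halo, compute and send_halo, activated repeatedly in sequence. The combinator names (activate, seq, repeat) are ours and do not belong to HPE's actual protocol notation.

public class ProtocolSketch {
    // Hypothetical protocol combinators, for illustration only; HPE's actual
    // protocol editor works with its own fixed set of canonical operations.
    static final class Protocol {
        final String term;
        private Protocol(String term) { this.term = term; }
        static Protocol activate(String slice) { return new Protocol(slice); }
        static Protocol seq(Protocol a, Protocol b) {
            return new Protocol("(" + a.term + " ; " + b.term + ")");
        }
        static Protocol repeat(Protocol p) { return new Protocol(p.term + "*"); }
    }

    public static void main(String[] args) {
        // One unit of a mesh-exchange concern: receive, compute, send, repeatedly.
        Protocol unit = Protocol.repeat(
            Protocol.seq(Protocol.activate("recv_halo"),
                Protocol.seq(Protocol.activate("compute"),
                             Protocol.activate("send_halo"))));
        System.out.println(unit.term); // (recv_halo ; (compute ; send_halo))*
    }
}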

Fig. 2. Concerns/Processes Orthogonality (a) and Hierarchy of Components (b)

A category theory formalization for the # coordination model has been proposed [8], giving us the required foundations for proposing the # model as a general theory of parallel components. We intend to use the # component model as an interoperation layer for existing HPC component frameworks, by means of #-frameworks [6]. Essentially, #-frameworks give meaning to #-components, by allowing some kind of parallel programming artifact to be mapped onto coordination-level abstractions of the # model, such as units, slices and protocols (Figure 2(b)). The coordination medium abstracts from the implementation of concerns, leaving that responsibility to #-frameworks. As a side effect, the coordination medium is independent of synchronization mechanisms, which become a concern of #-frameworks. For instance, a #-framework could support the existence of a primitive Channel component, with two units, called sender and receiver. Channel-based synchronization is known to be a general technique for distributed synchronization; any other synchronization technique, such as remote procedure calls or collective communication, may be defined in terms of it [1].
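As a minimal sketch of that intuition, the Java fragment below models such a Channel component; the types are hypothetical (HPE defines no such classes) and the in-memory queue merely stands in for whatever transport a concrete #-framework would actually supply.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a primitive Channel #-component with units sender
// and receiver; the queue stands in for the real transport (sockets, MPI
// point-to-point, ...) that a concrete #-framework would use.
public class ChannelSketch<T> {
    // Each interface stands for the slice that the Channel concern
    // contributes to one of the two processes involved.
    public interface Sender<V>   { void send(V value) throws InterruptedException; }
    public interface Receiver<V> { V receive() throws InterruptedException; }

    private final BlockingQueue<T> buffer = new LinkedBlockingQueue<>();

    public Sender<T>   sender()   { return buffer::put;  }
    public Receiver<T> receiver() { return buffer::take; }

    public static void main(String[] args) throws InterruptedException {
        ChannelSketch<String> ch = new ChannelSketch<>();
        ch.sender().send("halo data");
        System.out.println(ch.receiver().receive()); // halo data
    }
}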

2.2 A #-Framework for Defining the Implementation of Concerns

We now propose a general purpose #-framework for the # programming environment. It is general because it may support virtually any programming language for specifying computations or synchronization mechanisms. It defines how programming language modules may be mapped onto #-components, which are called primitive because they are not overlapping compositions of other #-components. A module is defined by a partial abstract data type (PADT) and a computation over it. A computation is defined by an initial PADT value and a control flow for executing a set of step-procedures that map PADT values to PADT values. Side-effects may occur in step-procedures due to synchronization actions. A primitive #-component has only one unit, whose slices are defined by the step-procedures and whose protocol is defined by the control flow.
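The sketch below, with names of our own choosing, illustrates this reading in Java: a PADT value type, two step-procedures that map values to values, and a control flow whose ordering of the steps is what the framework would expose as the unit's protocol.

import java.util.function.UnaryOperator;

// A sketch, under our own naming assumptions, of a module in the general
// purpose #-framework: a PADT value type, step-procedures mapping values to
// values, and a control flow that orders the steps.
public class PrimitiveModuleSketch {
    // The PADT value manipulated by the computation (one solver state).
    record State(double[] mesh, int iteration) { }

    // Step-procedures: each one corresponds to a slice of the single unit of
    // the primitive #-component; synchronization side-effects (e.g. calls to
    // a Channel slice) would happen inside them.
    static final UnaryOperator<State> exchangeHalos = s -> s;
    static final UnaryOperator<State> relax =
        s -> new State(s.mesh(), s.iteration() + 1);

    public static void main(String[] args) {
        State s = new State(new double[1024], 0);
        // The control flow below (a bounded loop over the two steps) is what
        // the framework would read off as the unit's protocol.
        for (int i = 0; i < 100; i++) {
            s = relax.apply(exchangeHalos.apply(s));
        }
        System.out.println("iterations = " + s.iteration()); // iterations = 100
    }
}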

Fig. 3. Unification of Units

The concept of partial abstract data type (PADT) generalizes the well-known concept of abstract data type. PADTs are defined by a set of provided signatures and a set of required ones. Also, PADTs may be extended by the specification of additional signatures to be provided (sub-typing semantics), which must be defined in terms of the existing ones. PADTs are parameterized by their required signatures, which must be supplied by other PADTs. Like ADTs, PADTs may be supplied by imperative, functional and object-oriented languages. The use of PADTs in the general purpose #-framework is explained by the example in Figure 3, which presents a unit U composed from the unification of units S1, S2, S3, and S4. The PADTs implemented by slices S1 and S3 are extended (S'1 and S'3) in order to provide the required signatures (I1, I2, I3, I4, I5, I6) of the PADTs of S2 and S4. The PADT of unit U is defined by multiple inheritance from the PADTs of its slices S1, S2, S3, and S4. Signatures that are not supplied by any slice, such as I2 and I6, become required signatures of U. Programmers only deal with the programming of primitive components. The implementation of the PADT that results from the overlapping of components is transparent; the programmer is only responsible for defining signature extensions for PADTs whenever necessary. Such transparency makes it possible to hide the fact that modern object-oriented languages, such as Java and C++, do not support multiple inheritance, by emulating it. The use of abstract data types for the specification of concerns makes it possible to attach algebraic specification techniques to the specification of #-components. Virtually any algebraic specification formalism could be used.
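A minimal sketch of how such an emulation could look in Java follows; the interface and slice names only loosely mirror Figure 3, and none of this is HPE's generated code, which the environment keeps transparent to the programmer.

// A sketch of emulating, in Java, the multiple inheritance behind a unified
// unit's PADT: the unit implements the provided signatures of all its slices
// and delegates each one to the slice that supplies it. All names below are
// hypothetical.
public class UnitUnificationSketch {
    interface I1 { double norm();    }   // provided signature of slice S1
    interface I4 { void factorize(); }   // provided signature of slice S3

    static class S1 implements I1 { public double norm() { return 0.0; } }
    static class S3 implements I4 { public void factorize() { /* numeric work */ } }

    // The unified unit U: its PADT "inherits" from both slices by delegation.
    static class U implements I1, I4 {
        private final S1 s1 = new S1();
        private final S3 s3 = new S3();
        public double norm()    { return s1.norm(); }
        public void factorize() { s3.factorize(); }
        // Signatures supplied by no slice (I2 and I6 in Figure 3) would stay
        // unimplemented here, i.e. they remain required signatures of U.
    }

    public static void main(String[] args) {
        U u = new U();
        u.factorize();
        System.out.println(u.norm()); // 0.0
    }
}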

2.3 Skeletal Programming Through Component Classes

Abstraction in # programming is supported through the notion of component classes. A component class comprises a set of interchangeable #-components, called component instances. Intuitively, a component instance corresponds to a specific implementation of a component class that is appropriate to a given execution environment. For that reason, component classes and their instances resemble skeletal programming [9], a promising approach to supporting architectural abstraction in parallel programming that has gained little acceptance outside academic settings, for the reasons pointed out in [10]. A categorical characterization of component classes and instances at the coordination level was also introduced in [8]. Since the coordination level only differentiates components by their behavior, in terms of partial orders for the occurrence of coordination events (that is, slice activations in protocols), the #-equivalence of components does not take concern semantics into account. However, using the ADT-based approach described in the previous section, it is now possible to extend the original categorical semantics by introducing information regarding the implementation of concerns.
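As an illustration only (the Broadcast class and its two instances are invented for this sketch), a component class and its interchangeable instances could be pictured in Java as follows; in HPE the choice among instances is made by the back-end, not by the application programmer.

import java.util.Map;
import java.util.function.Supplier;

// Illustrative only: a hypothetical Broadcast component class with two
// interchangeable instances tuned to different execution environments.
public class ComponentClassSketch {
    // The component class: what every instance must provide.
    interface Broadcast { void broadcast(double[] data, int root); }

    // Two component instances of the same class.
    static class BinomialTreeBroadcast implements Broadcast {      // e.g. clusters
        public void broadcast(double[] data, int root) { /* tree of sends */ }
    }
    static class SharedMemoryBroadcast implements Broadcast {      // e.g. SMP nodes
        public void broadcast(double[] data, int root) { /* shared buffer copy */ }
    }

    // A registry from execution environment to instance, consulted at
    // deployment time (a plain map here, just to make the idea concrete).
    static final Map<String, Supplier<Broadcast>> INSTANCES = Map.of(
        "cluster", BinomialTreeBroadcast::new,
        "smp",     SharedMemoryBroadcast::new);

    public static void main(String[] args) {
        Broadcast b = INSTANCES.get("smp").get();
        b.broadcast(new double[8], 0);
        System.out.println(b.getClass().getSimpleName()); // SharedMemoryBroadcast
    }
}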

Fig. 4. The HPE Extensible Architecture (Front-End and Back-Ends)

3 A Programming Environment for #-Components

An environment for parallel programming based on #-components has been prototyped using Eclipse, an extensible programming platform developed by IBM [18]. The main goal of such an environment is to make available a proof-of-concept testbed for empirical studies intended to evaluate the suitability of the # component model for increasing productivity in the design and implementation of high performance computing applications. The environment has been called HPE (the # Programming Environment). HPE is structured in two layers: the front-end layer and the back-end layers. The former implements aspects related to the coordination medium of #-components, while the latter relate directly to the computation medium, dealing with the implementation of their concerns. The front-end may have several back-ends attached to it. For that, the front-end has been implemented as an Eclipse perspective, an extensible class framework. Back-ends may be developed by extending the class framework of the front-end, or of other existing back-ends, with their specific functionality. Figure 4 depicts this extensible architecture. The front-end has two main concerns: to support a graphical configuration editor for composing #-components by overlapping existing ones, and to provide a library of #-components from which programmers may pick #-components to be overlapped. GEF, the Graphical Editing Framework supported by Eclipse [18], has been used to build the configuration editor. The configuration editor must also be able to deal with the specification of protocols, which is the purpose of its protocol editor. In HPE, the protocol editor combines the protocols of unified units by applying a fixed set of canonical combining operations whose application produces protocols that satisfy the aforementioned behavioral preservation principle.

In the current implementation of HPE, only one back-end has been developed, as depicted in Figure 4. It implements the general purpose #-framework described in Section 2.2. We think that such a general purpose back-end may be used as a substratum on top of which specific purpose back-ends may sit in order to interoperate.

For that, we suggest, but do not impose, that any back-end be implemented by extending the proposed general purpose #-framework (a sketch of this extension pattern appears after the list below). Examples of #-frameworks that may be implemented by specific purpose back-ends are ports to BLAS-based linear algebra HPC libraries, such as ScaLAPACK [5] and PETSc [3], and to CCA frameworks, whose functionalities might be exposed to # programmers through #-components. It is important to emphasize that the use of components and component classes is transparent: the back-ends are responsible for choosing an appropriate instance of a component class according to the execution environment. Besides providing some examples of application designs using HPE, the final version of this paper will address three hot topics in the design of HPE:

– how to interface modules developed using different programming languages;
– how to put a program to execute in a given execution environment;
– how to instantiate component classes for enabling skeletal programming.
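The fragment below sketches, under our own naming assumptions, the extension pattern suggested above: the front-end programs against an abstract back-end, the general purpose back-end of Section 2.2 provides the default behavior, and a specific purpose back-end (a hypothetical ScaLAPACK port here) extends it, also taking on the responsibility of resolving a component class to the instance that suits the execution environment.

// Illustrative only: hypothetical classes loosely mirroring Figure 4, not
// HPE's actual API.
public class BackendSketch {
    interface ComponentRef { String describe(); }   // a handle to a #-component

    // What the front-end sees of any back-end.
    static abstract class Backend {
        abstract ComponentRef resolve(String componentClass, String environment);
    }

    // The general purpose back-end of Section 2.2: maps modules written in
    // ordinary programming languages onto primitive #-components and picks a
    // default instance for the given execution environment.
    static class GeneralPurposeBackend extends Backend {
        ComponentRef resolve(String componentClass, String environment) {
            return () -> componentClass + " [general purpose instance for " + environment + "]";
        }
    }

    // A specific purpose back-end built by extension, as Figure 4 suggests;
    // it could expose library routines (ScaLAPACK here, hypothetically) as
    // #-components and fall back to the general purpose mapping otherwise.
    static class ScaLapackBackend extends GeneralPurposeBackend {
        @Override ComponentRef resolve(String componentClass, String environment) {
            if (componentClass.equals("LinearSolver")) {
                return () -> "LinearSolver [ScaLAPACK instance for " + environment + "]";
            }
            return super.resolve(componentClass, environment);
        }
    }

    public static void main(String[] args) {
        Backend backend = new ScaLapackBackend();
        System.out.println(backend.resolve("LinearSolver", "cluster").describe());
        System.out.println(backend.resolve("MeshExchange", "cluster").describe());
    }
}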

4 Conclusions and Lines for Further Work

This extended abstract has outlined the main aspects of the design and architecture of a parallel programming environment based on #-components on top of Eclipse. After some years of conceptual design, its implementation is now an ongoing effort that started at the beginning of September 2005. It has progressed very rapidly due to the highly productive programming environment provided by Eclipse and GEF. We intend to finish a fully functional implementation by the end of this year. Until then, interested readers may follow the progress of the implementation at www.lia.ufc.br/~fhcj/hpe. The final version of this paper will provide more details about the design, architecture and implementation of the # programming environment. It will also present the design of an application in the area of image processing on grid architectures to illustrate our ideas.

References

1. F. Arbab. Reo: A Channel-Based Coordination Model for Component Composition. Mathematical Structures in Computer Science, 14(3):329–366, 2004.
2. R. Armstrong, D. Gannon, A. Geist, K. Keahey, S. Kohn, L. McInnes, S. Parker, and B. Smolinski. Towards a Common Component Architecture for High-Performance Scientific Computing. In The Eighth IEEE International Symposium on High Performance Distributed Computing. IEEE Computer Society, 1999.
3. S. Balay, K. Buschelman, W. Gropp, D. Kaushik, M. Knepley, L. C. McInnes, B. Smith, and H. Zhang. PETSc Users Manual. Technical Report ANL-95/11 Revision 2.1.3, Argonne National Laboratory, Argonne, Illinois, 1996. http://www.mcs.anl.gov/petsc.
4. F. Baude, D. Caromel, and M. Morel. From Distributed Objects to Hierarchical Grid Components. In International Symposium on Distributed Objects and Applications. Springer-Verlag, 2003.
5. L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK User's Guide. Society for Industrial and Applied Mathematics (SIAM), 1997.
6. F. H. Carvalho Junior and R. D. Lins. The # Model for Parallel Programming: From Processes to Components with Insignificant Performance Overheads. In Workshop on Components and Frameworks for High Performance Computing (CompFrame 2005), June 2005.
7. F. H. Carvalho Junior and R. D. Lins. The # Model: Separation of Concerns for Reconciling Modularity, Abstraction and Efficiency in Distributed Parallel Programming. In ACM Symposium on Applied Computing, Special Track on Separation of Concerns, pages 1367–1375, March 2005.
8. F. H. Carvalho Junior and R. D. Lins. A Categorical Characterization for the Compositional Features of the # Component Model. In Workshop on Specification and Verification of Component-Based Systems, September 2005.
9. M. Cole. Algorithmic Skeletons: Structured Management of Parallel Computation. Pitman, 1989.
10. M. Cole. Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming. Parallel Computing, 30:389–406, 2004.
11. J. Dongarra, I. Foster, G. Fox, W. Gropp, K. Kennedy, L. Torczon, and A. White. Sourcebook of Parallel Computing. Morgan Kaufmann Publishers, 2003.
12. J. Dongarra, S. W. Otto, M. Snir, and D. Walker. A Message Passing Standard for MPP and Workstations. Communications of the ACM, 39(7):84–90, 1996.
13. G. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam. PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, 1994.
14. G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J.-M. Loingtier, and J. Irwin. Aspect-Oriented Programming. In Lecture Notes in Computer Science (Object-Oriented Programming, 11th European Conference - ECOOP '97), pages 220–242. Springer-Verlag, November 1997.
15. K. Lau, P. V. Elizondo, and Z. Wang. Exogenous Connectors for Software Components. In Proceedings of ..., pages 1–. ACM Press, 2005.
16. N. Mahmood, G. Deng, and J. C. Browne. Compositional Development of Parallel Programs. In 16th International Workshop on Languages and Compilers for Parallel Computing, October 2003.
17. H. Mili, A. Elkharraz, and H. Mcheick. Understanding Separation of Concerns. In Workshop on Early Aspects - Aspect Oriented Software Development (AOSD'04), pages 411–428, March 2004.
18. B. Moore, D. Dean, A. Gerber, G. Wagenknecht, and P. Vanderheyden. Eclipse Development Using the Graphical Editing Framework and Eclipse Modelling Framework. IBM International Technical Support Organization, February 2004. http://www.ibm.com/redbooks.
19. H. Ossher and P. Tarr. Multi-Dimensional Separation of Concerns and the Hyperspace Approach. In Proceedings of the Symposium on Software Architectures and Component Technology: The State of the Art in Software Development. Kluwer Academics, June 2000. University of Twente, Enschede, The Netherlands.
20. C. A. Petri. Kommunikation mit Automaten. Technical Report RADC-TR-65-377, Griffiss Air Force Base, New York, 1966.
21. M. Shaw. Procedure Calls are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status. In International Workshop on Studies of Software Design, Lecture Notes in Computer Science. Springer-Verlag, 1994.

5. L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK User’s Guide. Society for Industrial and Applied Mathematics (SIAM), 1997. 6. F. H. Carvalho Junior and R. D. Lins. The # Model for Parallel Programming: From Processes to Components with Insignificant Performance Overheads. In Workshop on Components and Frameworks for High Performance Computing (CompFrame 2005), June 2005. 7. F. H. Carvalho Junior and R. D. Lins. The # Model: Separation of Concerns for Reconciling Modularity, Abstraction and Efficiency in Distributed Parallel Programming. In ACM Symposium on Applied Computing, Special Track on Separation of Concerns, pages 1367–1375, March 2005. 8. F. H. Carvalho Junior and R.D Lins. A Categorical Characterization for the Compositional Features of the # Component Model. In Workshop on Specification and Verification of Component-Based Systems, September 2005. 9. M. Cole. Algorithm Skeletons: Structured Management of Paralell Computation. Pitman, 1989. 10. M. Cole. Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming. Parallel Computing, 30:389–406, 2004. 11. J. Dongarra, I. Foster, G. Fox, W. Gropp, K. Kennedy, L. Torczon, and A. White. Sourcebook of Parallel Computing. Morgan Kauffman Publishers, 2003. 12. J. Dongarra, S. W. Otto, M. Snir, and D. Walker. A Message Passing Standard for MPP and Workstation. Communications of ACM, 39(7):84–90, 1996. 13. G.A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam. PVM: Parallel Virtual Machine - A User’s Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, 1994. 14. G. Kiczales, J. Lamping, Menhdhekar A., Maeda C., C. Lopes, J. Loingtier, and J. Irwin. Aspect-Oriented Programming. In Lecture Notes in Computer Science (Object-Oriented Programming 11th European Conference – ECOOP ’97), pages 220–242. Springer-Verlag, November 1997. 15. K. Lau, P. V. Elizondo, and Z. Wang. Exogeneous Connectors for Software Components. In Proceedings of ..., pages 1–. ACM Press, 2005. 16. N. Mahmood, G. Deng, and J. C. Browne. Compositional Development of Parallel Programs. In 16th International Workshop on Languages and Compilers for Parallel Computing, October 2003. 17. H. Milli, A. Elkharraz, and H. Mcheick. Understanding Separation of Concerns. In Workshop on Early Aspects - Aspect Oriented Software Development (AOSD’04), pages 411–428, March 2004. 18. B. Moore, D. Dean, A. Gerber, G. Wagenknecht, and P. Vanderheyden. Eclipse Development Using the Graphical Editing Framework and Eclipse Modelling Framework. IBM International Technical Support Organization, February 2004. http://www.ibm.com/redbooks. 19. H. Ossher and P. Tarr. Multi-Dimensional Separation of Concerns and the Hyperspace Approach. In Proceedings of the Symposium on Software Architectures and Component Technology: The State of the Art in Software Development. Kluwer Academics, June 2000. University of Twente, Enschede, The Netherlands. 20. C. A. Petri. Kommunikation mit Automaten. Technical Report RADC-TR-65-377, Griffiths Air Force Base, New York, 1(1), 1966. 21. M. Shaw. Procedure Calls are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status. In International Workshop on Studies of Software Design, Lecture Notes in Computer Science. Springer-Verlag, 1994.