2004 1st International Conference on Electrical and Electronics Engineering
A comparative study of intrinsic parallel programming methodologies

Horacio González-Vélez, School of Informatics, University of Edinburgh, United Kingdom
Adriano de Luca, Sección de Computación, CINVESTAV, México
Virginia González-Vélez, Ciencias Básicas e Ingeniería, UAM - Azcapotzalco, México
[email protected]
dla@delta.cs.cinvestav.mx
vgv@correo.azc.uam.mx
Abstract
This work provides a comparative report on intrinsic parallel programming (IPP) methodologies: Structured, Descriptive, and Component-based. Its main objective is to develop an aid for programmers that will allow them to select suitable tools which will enable them to construct hardware-independent application programs. Admittedly, not every problem addressed using parallel methods necessarily employs IPP methodologies. Notwithstanding, the hardware-independence of IPP methods makes them particularly suitable for software development in heterogeneous parallel and distributed systems. This comparison ranks the three programming paradigms using a pairwise method. The rating criteria are maturity, adoption, implicitness, and standardization, all having equal weights. It is concluded that structured parallelism is the highest ranked methodology.

1. Introduction

Computer programming has long been regarded as an art. Despite the major breakthroughs made, it is still a highly demanding activity. It is widely acknowledged that parallel programming is more difficult than its sequential counterpart. Hence, easing parallel programming has long been studied through several assorted approaches. Parallel programming intends to increase the overall performance of a particular program, and eventually of the whole system. This is achieved by solving a given instance of a problem more quickly, by increasing its size, or both. As a result, researchers can increase the resolution of their experiments and engineers may be able to develop larger and more reliable prototypes. This paper describes and compares methods which exhibit intrinsic parallelism. These methods can be classified into three related, but nevertheless distinct, families as presented in Figure 1.

The objective of the intrinsic parallel programming (IPP) methodologies is to produce the best possible abstraction of the underlying parallelism while maximizing the performance of the resulting application program. In order to synthesize a program, these techniques rely on the programmer to correctly describe and map an algorithm and its intrinsic dependencies onto the abstract building blocks available from a given framework. The bulk of the programming effort is then placed on this description-mapping phase rather than on programming the communication and synchronization of data and processes. This hardware-independence makes IPP methods particularly suitable for the development of parallel applications for heterogeneous systems. In order to develop a useful way to compare the different approaches, four criteria are employed: maturity, adoption, implicitness, and standardization. This work does not intend to be comprehensive, but aims to provide a broad overview and a ranking of the different approaches employed in this subject. In this way, an application programmer can greatly benefit from this insight when selecting a development tool. The remainder of this paper is organized as follows: the following three sections describe each IPP methodology, next a simplified pairwise ranking is constructed based on the four aforementioned criteria and, finally, some closing remarks are presented.
Figure 1. High-level parallel programming (IPP) methodologies:
  Structured Parallelism: Skeletons, Higher-order functions, Functors
  Descriptive Parallelism: Patterns, Templates, Archetypes, Generic programming
  Component-based Parallelism: Agents, Objects, Generative programming

2. Structured Parallelism

The algorithmic skeleton concept was originally introduced as higher-order functions corresponding to parallel algorithmic techniques [1]. Over time, skeletal programming has evolved into structured parallelism, understood as the composition of skeletons for the development of programs where control is inherited through the structure and the programmer must adhere to top-down design and construction. Skeletons possess a high degree of intrinsic parallelism. Skeletal programming requires the description of the algorithm rather than its implementation; therefore it may well be argued that the corresponding algorithm formulation and implementation are simpler. Hence, the actual implementation of parallel algorithms using skeletal programming is not more complicated than with other parallel programming methodologies.

Structured parallelism methodologies present a top-down transformational approach where programs are derived, rather than constructed, from the application of algorithmic transformations. Formal research on how to properly write, evaluate and transform different skeleton composites has been widely explored [2]. Associated cost estimation models have also been developed to improve the overall performance of the resulting programs, placing the performance of parallel skeletal systems in a preponderant position.

The skeleton concept and its inherent parallelism are not tied to any particular implementation or architecture. Skeletons do not rely on specific hardware or software for their portability, but benefit entirely from any performance improvements in the systems infrastructure. Hence, application programmers need not be deeply concerned with the low-level parallel execution of a program. This hardware independence has been successfully illustrated in large-scale parallel systems using activity graphs and skeletons targeting both MPI and BSP [3].

Although the notion of higher-order functions is immediately associated with the use of functional languages, the skeletal paradigm has been successfully implemented not only in functional programming [4] but also through structured parallel languages [5, 6], libraries [7, 8], commercial frameworks [9], and object-oriented environments [10]. Indeed, the actual implementation ranges from macro-processing in compilers and MPI calls in libraries to classes in object-oriented programming and lazy-evaluation functions in declarative programming. Different implementations provide class-problem skeletons such as divide-and-conquer, branch-and-bound, and dynamic programming; computation and communication skeletons such as scan, reduce, or map; and control structures such as pipelines, master-slave and processor farms [11]. Alternative ways of expressing structured parallelism constructs are standard higher-order functions in functional programming, or functors in imperative or object-oriented programming.

Despite its increasing adoption, the structured parallelism paradigm has not yet been subjected to enough critical study, and still lacks the critical mass, to qualify as a mainstream programming technique. A further shortcoming of this methodology is its constrained application space, since it can only address well-defined algorithmic solutions. Furthermore, skeletal programming does not possess an architecture nor a standard way of exchanging skeletons between different implementations. Some consideration has been devoted to the matter [12], and future research may lead to a standard architecture.
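To make the notion of skeletons as higher-order functions more concrete, the following C++ sketch expresses a map skeleton and a two-stage pipeline as ordinary function templates. The names map_skel and pipe are illustrative only and do not correspond to any of the languages, libraries or frameworks cited above; the sketch merely shows how an application is written by composing and parameterizing such building blocks while the decomposition into tasks stays inside the skeleton.

```cpp
// Minimal sketch of two algorithmic skeletons expressed as higher-order
// functions in C++. The names map_skel and pipe are illustrative only and
// are not taken from any of the libraries cited in this paper.
#include <future>
#include <iostream>
#include <vector>

// map skeleton: apply f to every element of the input, one task per element.
template <typename T, typename F>
auto map_skel(const std::vector<T>& in, F f) {
    using R = decltype(f(in.front()));
    std::vector<std::future<R>> tasks;
    for (const auto& x : in)
        tasks.push_back(std::async(std::launch::async, f, x));
    std::vector<R> out;
    for (auto& t : tasks) out.push_back(t.get());
    return out;
}

// pipeline skeleton: compose two stages; the parallel structure (the
// task-per-element decomposition above) is inherited from the skeleton,
// not re-programmed by the application code.
template <typename F, typename G>
auto pipe(F f, G g) {
    return [=](const auto& x) { return g(f(x)); };
}

int main() {
    std::vector<int> data{1, 2, 3, 4, 5};
    auto stage = pipe([](int x) { return x * x; },   // stage 1: square
                      [](int x) { return x + 1; });  // stage 2: increment
    for (int y : map_skel(data, stage)) std::cout << y << ' ';
    std::cout << '\n';   // prints: 2 5 10 17 26
}
```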
3. Descriptive parallelism

We define the descriptive parallelism methods as those depicting a parametric solution strategy for recurring classes of problems. The structure of the resulting application derives from the assembly of patterns, archetypes or templates. Having been conceived as abstract identifiers of common themes in object-oriented programming [13], design patterns have been incorporated into concurrent computing. CO2P3S [14] is an automated pattern-based parallel programming tool, where an application programmer can generate parallel code by parameterizing a framework and adding the sequential parts. While supporting the creation of patterns and templates, it features multiple levels of abstraction, including native Java code. PASM [15] implements architectural code-skeletons as a set of generic attributes associated with a pattern. In this context, the concept of code-skeleton is oriented toward C++/MPI code generation rather than the abstraction of functions discussed in Section 2. DIP [16] is a pattern-based coordination language for High-Performance Fortran (HPF). It abstracts the inter-process communication and synchronization among concurrent HPF domains and tasks. An archetype generalizes the concept of a pattern by abstracting into a single structure the parallel computation and the communication of a generic problem class [17]. Although restricted in scope, it is a well-rounded tool for
initial parallel software development. An initial archetypal implementation has been employed to exploit parallel computations based on n-dimensional meshes. Coincidentally, a design methodology to work with archetype-like patterns has been devised under the umbrella of a pattern language [18]. Unlike a programming language, this pattern language presents rules to design parallel codes with archetypes. Type parametrization is commonly referred to in object-oriented programming as a template. Templates resemble higher-order functions in functional programming, because instantiation is performed through the types received. TACO [19], Pooma [20], and C++2MPI [21] are complete template libraries. TACO supports data-parallel programming through object and collective operations such as join, map, and reduce. Pooma, currently part of ROSE [22], includes a collection of HPC-oriented templates including multidimensional arrays and objects to model particle physics experiments. C++2MPI can generate MPI code from instantiated C++ classes. Generic programming takes templates one step further by providing a methodology for the automatic creation of complete template-based libraries. Janus [23] provides a framework for the implementation of scientific libraries using generic programming. Although re-usability is the main claim of the descriptive paradigm, different implementations do not seem to exchange information at the moment: no evidence was found of interaction or exchange of elements between different implementations, nor of a widely-accepted architecture or common specification.
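As a small illustration of type parametrization, the following C++ sketch defines a reduction pattern whose element type and combining operation are fixed only at instantiation time, mirroring the way a higher-order function receives its function argument. The code is a hypothetical example rather than an excerpt from TACO, Pooma, C++2MPI or Janus; a real descriptive-parallelism tool would supply the concurrent traversal behind the same parametric interface.

```cpp
// Minimal sketch of type parametrization: a reusable reduction "pattern"
// whose element type and combining operation are supplied at instantiation
// time. Illustrative only; not drawn from any tool discussed in this section.
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

// The pattern: a descriptive-parallelism tool would hide the concurrent
// traversal here once, while the programmer only provides T and Op.
template <typename T, typename Op>
T reduce_pattern(const std::vector<T>& data, T init, Op combine) {
    // A sequential fold stands in for the generated concurrent traversal.
    return std::accumulate(data.begin(), data.end(), init, combine);
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    std::vector<std::string> w{"a", "b", "c"};

    // Two instantiations of the same pattern, driven purely by the types
    // and operations received.
    std::cout << reduce_pattern(v, 0, [](int a, int b) { return a + b; })
              << '\n';                                          // 10
    std::cout << reduce_pattern<std::string>(
                     w, "", [](std::string a, const std::string& b) { return a + b; })
              << '\n';                                          // abc
}
```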
4. Component-based parallelism

A component is a set of objects with published interfaces which comply with a set of rules defined by a specific component model. A component model augmented with a set of system components defines a component architecture. Parallel programs are assembled using independent components from a given architecture. Components can then be conceived as meta-objects for concurrent environments. The interaction between components and the diverse forms of composition yield the different strains of component-based parallelism. CartaBlanca [24], CB-PSE [25], Concerto [26], Ensemble [27], and HiMM [28] are complete component-based frameworks for concurrent programming. Their common objective is the synthesis of concurrent programs by component assembly; the application programmer is only requested to provide parameters and sequential code inserts. Generative programming [29] is the automatic selection and assembly of components on demand. The programmer specifies the application in a domain-specific language; a generator then synthesizes it from component abstractions. Generative programming encompasses an engineering methodology for the reuse of system families, rather than single components, and has started to find its own niche in concurrent programming [30]. Objects have also been adumbrated as agents interacting concurrently by message passing, and have been used to program systems in a parallel programming fashion. Visper [31] implements an agent-based system for the composition, execution and testing of concurrent programs. It allows the inclusion of C and Java code, generates MPI calls, and includes fault-tolerance capabilities. IAP [32] is an agent-based middleware for concurrent programming on clusters. It has successfully scheduled large calculations for telecommunications virtual private networks. PaCMAn [33] and JOPI [34] are Java-based frameworks that use agents to program distributed systems. Component-based systems pose a challenge to system complexity and compatibility due to the multiplicity of sources and formats of components [35]. As far as high-performance computing is concerned, a major standardization effort is underway with the creation of the technical specification of the Common Component Architecture (CCA) [36]. As part of the Advanced Computational Testing and Simulation (ACTS) project [37] funded by the US government, the CCA is a major initiative undertaken to explore the use of components for scientific software and will therefore be of great influence within the parallel programming arena.
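The following C++ sketch illustrates the component idea in miniature: a published interface (a port), a component implementing it, and a toy framework that wires them together by name so the application depends only on the interface. All class and method names are invented for illustration and do not reflect the actual CCA specification or any of the frameworks listed above.

```cpp
// Minimal sketch of components as objects with published interfaces that a
// small "framework" assembles. Names are hypothetical, not the CCA API.
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Published interface (a "port" in component terminology).
struct ComputePort {
    virtual ~ComputePort() = default;
    virtual double compute(const std::vector<double>& data) = 0;
};

// A component providing that interface; only the interface is visible
// to the rest of the assembly.
class MeanComponent : public ComputePort {
public:
    double compute(const std::vector<double>& data) override {
        double sum = 0.0;
        for (double x : data) sum += x;
        return data.empty() ? 0.0 : sum / data.size();
    }
};

// A toy component framework: components are registered under a name and
// looked up by the interface they publish, so the application is assembled
// rather than hard-wired.
class Framework {
public:
    void provide(const std::string& name, std::shared_ptr<ComputePort> port) {
        ports_[name] = std::move(port);
    }
    std::shared_ptr<ComputePort> use(const std::string& name) {
        return ports_.at(name);
    }
private:
    std::map<std::string, std::shared_ptr<ComputePort>> ports_;
};

int main() {
    Framework fw;
    fw.provide("stats.mean", std::make_shared<MeanComponent>());

    // The "application" only knows the published interface, not the class.
    std::vector<double> data{1.0, 2.0, 3.0, 4.0};
    std::cout << fw.use("stats.mean")->compute(data) << '\n';   // 2.5
}
```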
5. Comparison

In this section, we draw a comparison using a simplified pairwise ranking method. The pairwise ranking method is employed due to its widespread use to calculate preferences, taking into account that all individual ranks in this study are based on subjective assessment. For each of the four criteria, we present a set of orderings rating the three methodologies. This set of orderings provides a compact alternative to the standard pairwise matrix for comparisons where there are few criteria and items to be ranked [38]. We use Criterion ▷ A ≻ B ≻ C to denote that methodology A outranks B and B outranks C under the given criterion; that is to say, A is ranked in place 1, B in 2 and C in 3. SP, DP, and CP stand for the Structured Parallelism, Descriptive Parallelism, and Component-based Parallelism methodologies respectively.

5.1. Maturity

Maturity is defined as the level of development the methodology has reached. It is expressed in terms of its evolution as well as the theory and studies devised around it. As shown in Section 2, the formalism devised around SP has greatly improved the composition and cost estimation of skeletal parallel programming. Owing to its clear and concise modelling, it has brought about a new research area and is therefore ranked in first place, followed by CP, which has witnessed increasing interest. In the last position is DP, which still requires further improvement in this respect.

Maturity ▷ SP ≻ CP ≻ DP    (1)

5.2. Adoption

Adoption is quantified in terms of the number of software development tools available to the application programmer. It is probably the most straightforward criterion, as it relies solely on the number of projects described in the literature. The ACTS initiative is per se enough reason for CP to outrank the rest, not to mention the increasing interest in CP due to the global grid and metacomputing initiatives. Having been around for longer and with more active research groups, SP ranks second whilst DP remains last.

Adoption ▷ CP ≻ SP ≻ DP    (2)

5.3. Implicitness

Implicitness in this context is understood as the set of development features provided to an application programmer to express parallelism in a given algorithm. Alternatively, one can picture this as how much effort a programmer has to devote when writing an application in parallel. We can illustrate this with an example. Let us suppose we need to implement a certain non-trivial program, e.g. a generic all-pairs computation [39], which is roughly a pipeline controlled by a master with a computational function on the internal elements of each node.

SP: The program will be the result of nesting and parameterizing ready-made skeletons. The optimal nesting and sequencing will be implicitly defined by the framework. As an integral part of the process, some cost models will probably be available to estimate the performance of the resulting program. In this case, the parallelism and program generation are totally implicit to the model.

DP: It will be necessary to have the all-pairs template or pattern. Although the code creation is inherent to the system, the creation of the pattern or template will be the result of explicitly describing a sequence of pattern constructs. Hence, the final pattern/template is explicitly constructed from basic elements while the parallelism is implicitly generated.

CP: This approach is illustrated in [39]. Special components for the pipeline, the master and the all-pairs pipeline need to be created before distributing them to the executing architecture. The parallelism and its corresponding distribution of components on the underlying architecture need to be explicitly defined.

Implicitness ▷ SP ≻ DP ≻ CP    (3)

5.4. Standardization

Standardization refers to the availability of common formats for the exchange of structures among different implementations, as well as of mutual architectures; in this context, standardization can also be understood as interoperability. This is of particular interest for global software projects where several research and industrial groups are involved. Indeed, only CP has a defined standard architecture [36]. SP is placed second as per some initial efforts in this respect [12].

Standardization ▷ CP ≻ SP ≻ DP    (4)

Table 1. Pairwise ranking of intrinsic parallel programming methodologies

              Maturity  Adoption  Implicitness  Standardization  OVERALL
Structured       1         2           1               2            1
Descriptive      3         3           2               3            3
Components       2         1           3               1            2
The summary of the ranks obtained from the sets of orderings (1), (2), (3), and (4) is presented in Table 1. Entry i, j in the table represents the rank of methodology i under criterion j. To calculate the overall rank in a simplified pairwise fashion, we simply add all assigned ranks for each methodology (row), since all four criteria have been assigned the same weight. The row sums (6, 11, 7) place SP, DP, and CP in ranks {1, 3, 2} respectively. This overall ranking is conveyed in the last column of Table 1. Hence, it is concluded that the structured parallelism methodology outranks the others.
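For reference, the simplified pairwise aggregation described above amounts to a few lines of code. The following C++ sketch recomputes the row sums and overall ranks directly from the criterion ranks of Table 1; it adds nothing beyond the arithmetic already stated.

```cpp
// Small sketch reproducing the rank aggregation described above: the four
// criterion ranks of each methodology (Table 1) are summed with equal
// weights and the row sums are converted into the overall ranking.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

int main() {
    struct Row { std::string name; std::vector<int> ranks; };
    std::vector<Row> table{
        // maturity, adoption, implicitness, standardization
        {"Structured",  {1, 2, 1, 2}},
        {"Descriptive", {3, 3, 2, 3}},
        {"Components",  {2, 1, 3, 1}},
    };

    // Sum the ranks per methodology (all criteria carry equal weight).
    std::vector<int> sums;
    for (const auto& r : table)
        sums.push_back(std::accumulate(r.ranks.begin(), r.ranks.end(), 0));

    // The overall rank of a methodology is 1 plus the number of strictly
    // smaller row sums.
    for (std::size_t i = 0; i < table.size(); ++i) {
        int overall = 1 + std::count_if(sums.begin(), sums.end(),
                                        [&](int s) { return s < sums[i]; });
        std::cout << table[i].name << ": sum = " << sums[i]
                  << ", overall rank = " << overall << '\n';
    }
    // Prints: Structured: sum = 6, overall rank = 1
    //         Descriptive: sum = 11, overall rank = 3
    //         Components: sum = 7, overall rank = 2
}
```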
6. Closing Remarks

It is argued that structured parallelism methodologies tend to express domain-level knowledge in a superior way and, therefore, exploit more efficiently the implicit parallel structure of a given problem. Several approaches have been successfully implemented in heterogeneous parallel and distributed systems. Nevertheless, the full potential of its application within parallel programs for computational grids is yet to be fully explored. As previously stated, this work is by no means comprehensive and its main objective is to provide a selection aid to researchers and practitioners in the field. Its main contribution is its novelty, since scant research has been devoted to similar comparative work. This investigation can be extended or modified by adding or deleting criteria, by modifying the ranking methodology (either by altering the weights of the criteria or by replacing it), or by introducing a consensus process to determine individual rankings or weights. Furthermore, updates to this work will arise owing to the fact that the number of parallel programming methods and tools is increasing on a regular basis. Additionally, it may be interesting to evaluate the aforesaid paradigms using a base problem and an empirical approach.

Acknowledgement

This work was partly supported by the British Foreign & Commonwealth Office and the Consejo Nacional de Ciencia y Tecnología, México, under grant number 81887.

References

[1] M. Cole, Algorithmic Skeletons: Structured Management of Parallel Computation. London, UK: MIT Press, 1989.
[2] M. Aldinucci, S. Gorlatch, C. Lengauer, and S. Pelagatti, "Toward parallel programming by transformation: The FAN skeleton framework," Parallel Algorithms Appl., vol. 16, pp. 87-121, 2001.
[3] M. Cole and A. Zavanella, "Coordinating heterogeneous parallel systems with skeletons and activity graphs," J. Sys. Integration, vol. 10, no. 2, pp. 127-143, 2001.
[4] H.-W. Loidl, F. Rubio, N. Scaife, K. Hammond, S. Horiguchi, U. Klusik, R. Loogen, G. J. Michaelson, R. Pena, S. Priebe, A. Rebon, and P. W. Trinder, "Comparing parallel functional languages: Programming and performance," Higher-Order and Symb. Comput., vol. 16, no. 3, pp. 203-251, 2003.
[5] S. Pelagatti, "Task and data parallelism in P3L," in Patterns and skeletons for parallel and distributed computing, F. A. Rabhi and S. Gorlatch, Eds. London, UK: Springer-Verlag, 2003, pp. 155-186.
[6] C. A. Herrmann and C. Lengauer, "HDC: A higher-order language for divide-and-conquer," Parallel Process. Lett., vol. 10, no. 2-3, pp. 239-250, 2000.
[7] E. Alba, F. Almeida, M. Blesa, J. Cabeza, C. Cotta, M. Diaz, I. Dorta, J. Gabarro, C. Leon, J. Luna, L. Moreno, C. Pablos, J. Petit, A. Rojas, and F. Xhafa, "MALLBA: A library of skeletons for combinatorial optimisation," in Euro-Par 2002, 8th Int Conf on Parallel Processing, ser. LNCS, B. Monien and R. Feldmann, Eds., vol. 2400. Paderborn, Germany: Springer-Verlag, 2002, pp. 927-932.
[8] H. Kuchen, "A skeleton library," in Euro-Par 2002, 8th Int Conf on Parallel Processing, ser. LNCS, B. Monien and R. Feldmann, Eds., vol. 2400. Paderborn, Germany: Springer-Verlag, 2002, pp. 620-629.
[9] B. Bacci, M. Danelutto, S. Pelagatti, and M. Vanneschi, "SkIE: A heterogeneous environment for HPC applications," Parallel Comput., vol. 25, no. 13, pp. 1827-1852, 1999.
[10] M. Aldinucci, M. Danelutto, and P. Teti, "An advanced environment supporting structured parallel programming in Java," Future Gener. Comput. Syst., vol. 19, no. 5, pp. 611-626, 2003.
[11] F. A. Rabhi and S. Gorlatch, Eds., Patterns and skeletons for parallel and distributed computing. London, UK: Springer-Verlag, 2003.
[12] M. Cole, "Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming," Parallel Comput., vol. 30, no. 3, pp. 389-406, 2004.
[13] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, "Design patterns: Abstraction and reuse of object-oriented design," in ECOOP'93 - 7th European Conf on Object-Oriented Programming, ser. LNCS, O. Nierstrasz, Ed., vol. 707. Kaiserslautern, Germany: Springer-Verlag, 1993, pp. 406-431.
[14] S. MacDonald, J. Anvik, S. Bromling, J. Schaeffer, D. Szafron, and K. Tan, "From patterns to frameworks to parallel programs," Parallel Comput., vol. 28, pp. 1663-1683, 2002.
[15] D. Goswami, A. Singh, and B. R. Preiss, "From design patterns to parallel architectural skeletons," J. Parallel Distrib. Comput., vol. 62, no. 4, pp. 669-695, 2002.
[16] M. Diaz, B. Rubio, E. Soler, and J. M. Troya, "Domain interaction patterns to coordinate HPF tasks," Parallel Comput., vol. 29, no. 7, pp. 925-951, 2003.
[17] B. L. Massingill and K. M. Chandy, "Parallel program archetypes," in IPPS/SPDP'99 13th Int Symp on Parallel Processing & 10th Symp on Parallel and Distributed Processing. San Juan, Puerto Rico: IEEE CS, 1999, pp. 290-296.
[18] B. L. Massingill, T. G. Mattson, and B. A. Sanders, "A pattern language for parallel application programs," in Euro-Par 2000 6th Int Euro-Par Conf on Parallel Processing, ser. LNCS, A. Bode, T. Ludwig, W. Karl, and R. Wismuller, Eds., vol. 1900. Munich, Germany: Springer-Verlag, 2000, pp. 678-681.
[19] J. Nolte, M. Sato, and Y. Ishikawa, "TACO - dynamic distributed collections with templates and topologies," in Euro-Par 2000 6th Int Euro-Par Conf on Parallel Processing, ser. LNCS, A. Bode, T. Ludwig, W. Karl, and R. Wismuller, Eds., vol. 1900. Munich, Germany: Springer-Verlag, 2000, pp. 1071-1080.
[20] S. Haney and J. Crotinger, "How templates enable high-performance scientific computing in C++," Computing in Science & Eng., vol. 1, no. 4, pp. 66-72, 1999.
[21] R. Hillson and M. Iglewski, "C++2MPI: A software tool for automatically generating MPI datatypes from C++ classes," in PARELEC'00 Int Conf on Parallel Computing in Electrical Eng. Quebec, Canada: IEEE CS, 2000, pp. 13-17.
[22] D. J. Quinlan, M. Schordan, B. Miller, and M. Kowarschik, "Parallel object-oriented framework optimization," Concurrency Computat. Pract. Exper., vol. 16, no. 2-3, pp. 293-302, 2004.
[23] J. Gerlach and J. Kneis, "Generic programming for scientific computing in C++, Java, and C#," in APPT 2003 5th Int Wksp on Advanced Parallel Processing Technologies, ser. LNCS, X. Zhou, S. Jahnichen, M. Xu, and J. Cao, Eds., vol. 2834. Xiamen, China: Springer-Verlag, 2003, pp. 301-310.
[24] W. B. VanderHeyden, E. D. Dendy, and N. T. Padial-Collins, "CartaBlanca - a pure-Java, component-based systems simulation tool for coupled non-linear physics on unstructured grids - an update," Concurrency Computat. Pract. Exper., vol. 15, no. 3-5, pp. 431-448, 2003.
[25] M. Li, O. F. Rana, and D. W. Walker, "Wrapping MPI-based legacy codes as Java/CORBA components," Future Gener. Comput. Syst., vol. 18, pp. 213-223, 2001.
[26] L. Courtrai, F. Guidec, N. Le Sommer, and Y. Maheo, "Resource management for parallel adaptive components," in IPDPS'03 17th Int Symp on Parallel and Distributed Processing. Nice, France: IEEE CS, 2003, p. 134.2.
[27] J. Y. Cotronis, "Reusable message passing components," in 8th Euromicro Wksp on Parallel and Distributed Processing. Rhodes, Greece: IEEE CS, 2000, pp. 398-405.
[28] M. Di Santo, F. Frattolillo, W. Russo, and E. Zimeo, "A component-based approach to build a portable and flexible middleware for metacomputing," Parallel Comput., vol. 28, no. 12, pp. 1789-1810, 2002.
[29] K. Czarnecki and U. W. Eisenecker, "Components and generative programming," in ESEC/FSE '99, 7th European Software Engineering Conf, ser. LNCS, O. Nierstrasz and M. Lemoine, Eds., vol. 1687. Toulouse, France: Springer-Verlag, 1999, pp. 2-19.
[30] I. McRitchie, T. J. Brown, and I. T. A. Spence, "A Java framework for the static reflection, composition and synthesis of software components," in 2nd Int Conf on Principles and Practice of Programming in Java. Kilkenny City, Ireland: ACM Press, 2003, pp. 19-20.
[31] N. Stankovic and K. Zhang, "A distributed parallel programming framework," IEEE Trans. Softw. Eng., vol. 28, no. 5, pp. 478-493, 2002.
[32] F. De Turck, S. Vanhastel, B. Volckaert, and P. Demeester, "A generic middleware-based platform for scalable cluster computing," Future Gener. Comput. Syst., vol. 18, no. 4, pp. 549-560, 2002.
[33] P. Evripidou, G. Samaras, C. Panayiotou, and E. Pitoura, "The PaCMAn metacomputer: parallel computing with Java mobile agents," Future Gener. Comput. Syst., vol. 18, no. 2, pp. 265-280, 2001.
[34] J. Al-Jaroodi, N. Mohamed, H. Jiang, and D. Swanson, "Middleware infrastructure for parallel and distributed programming models in heterogeneous systems," IEEE Trans. Parall. Distrib. Sys., vol. 14, no. 11, pp. 1100-1111, 2003.
[35] I. Crnkovic and M. Larsson, "Challenges of component-based development," J. Syst. Softw., vol. 61, no. 3, pp. 201-212, 2002.
[36] Common Component Architecture Forum, "Common Component Architecture technical specification," www.cca-forum.org, 2004.
[37] L. Drummond and O. Marques, "The advanced computational testing and simulation toolkit (ACTS)," in Scaling to New Heights Wksp. Berkeley, USA: NERSC, Lawrence Berkeley National Laboratory, 2002, p. 1, available as Tech. Rep. LBNL-50414.
[38] J. C. Hsu, Multiple Comparisons: Theory and Methods. London, UK: Chapman and Hall, 1996.
[39] A. Radenski, B. Norris, and W. Chen, "A generic all-pairs cluster-computing pipeline and its applications," in ParCo99 Int Conf on Parallel Computing - Fundamentals and Applications, E. H. D'Hollander, G. R. Joubert, F. Peters, and H. Sips, Eds. Delft, The Netherlands: Imperial College Press, 1999, pp. 366-374.
[40] B. Monien and R. Feldmann, Eds., Euro-Par 2002, 8th Int Conf on Parallel Processing, ser. LNCS, vol. 2400. Paderborn, Germany: Springer-Verlag, 2002.
[41] A. Bode, T. Ludwig, W. Karl, and R. Wismuller, Eds., Euro-Par 2000 6th Int Euro-Par Conf on Parallel Processing, ser. LNCS, vol. 1900. Munich, Germany: Springer-Verlag, 2000.