Parallel Computing with X10

PVR Murthy
Corporate Technology, Siemens
Bangalore, India
[email protected]

ABSTRACT
Many problems require parallel solutions and implementations, and how to extract and specify parallelism has been a focus of research during the last few decades. There has been significant progress in terms of (a) automatically deriving implicit parallelism from functional and logic programs, (b) using parallelizing compilers to extract parallelism from serial programs written in Fortran or C, mainly by parallelizing loop constructs, and (c) the evolution of standards such as the Message Passing Interface (MPI) that allow a Fortran or C programmer to decompose a problem into a parallel solution. Nevertheless, the parallel computing problem is still not completely solved. With the emergence of parallel computing architectures based on multicore chips, there is a need to rewrite existing software, and to develop future software, so that the parallelism available at the hardware level is fully exploited.
The X10 programming model uses the serial subset of Java and introduces new features to ensure that a suitable expression of parallelism is the basis for exploiting modern computer architectures. X10 introduces a Partitioned Global Address Space (PGAS) that materializes as locality in the form of places. To provide a foundation for concurrency constructs in the language, X10 introduces dynamic, asynchronous activities. To support dense and sparse distributed multi-dimensional arrays, X10 provides a rich array sub-language.
The Java programming model uses the notion of a single uniform heap, and this is a limitation when using the language on non-uniform cluster computing systems. Scalability problems have been reported when attempting to map a uniform heap automatically onto a non-uniform cluster. Places in X10 address this scalability issue by letting the programmer decide which objects and activities are co-located. To create lightweight threads locally or remotely, X10 introduces the notion of asynchronous activities; the corresponding mechanisms in Java are heavyweight. The language constructs async, future, foreach, ateach, finish, clocks and atomic blocks are designed to coordinate asynchronous activities in an X10 program.
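As a rough illustration of how a few of these constructs fit together, consider the following sketch. It is not one of the paper's examples; it assumes the Java-based X10 syntax of this period, and the class and identifier names (Fib, fib, f1, f2) are purely illustrative. Two sub-computations are spawned as futures, and the caller blocks on force() to collect their results; the finish in main waits for all spawned activities.

    // Illustrative sketch only: async, future and finish in the
    // Java-based X10 syntax of this period.
    public class Fib {
        public static int fib(final int n) {
            if (n < 2) return n;
            // spawn two asynchronous activities at the current place
            final future<int> f1 = future (here) { fib(n - 1) };
            final future<int> f2 = future (here) { fib(n - 2) };
            // force() blocks until the corresponding activity has produced its value
            return f1.force() + f2.force();
        }

        public static void main(String[] args) {
            finish {   // wait for all activities transitively spawned inside
                async { System.out.println("fib(10) = " + fib(10)); }
            }
        }
    }

The force() calls ensure that both sub-results are available before they are combined, while finish guarantees that main does not complete until the spawned activity, and everything it transitively spawns, has terminated.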
Executing concurrent or distributed programs written in modern object-oriented languages such as Java and C# is possible on two kinds of platforms: (1) a uniprocessor or shared-memory multiprocessor system, on which one or more threads execute against a single shared heap in a single virtual machine, and (2) a loosely coupled distributed computing system, in which each node has its own virtual machine and communicates with other nodes using protocols such as RMI. Computer systems already consist of, and will increasingly be built from, multicore SMP nodes with non-uniform memory hierarchies, interconnected in horizontally scalable cluster configurations. Since current High Performance Computing programming models support neither non-uniform data access nor tight coupling of distributed nodes, they are ineffective in addressing the needs of such systems. This is the context in which X10 has been proposed [1, 2].
The elements of an array are distributed across multiple places in the partitioned global address space according to the array's distribution specification, and the distribution remains unchanged throughout the program's execution. The issues of locality and distribution cannot be hidden from a programmer of high-performance code, and X10 reflects this in its design choices. Sample programs are discussed to illustrate X10's features for implementing concurrent and distributed computations.
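To make the distribution model concrete, here is a minimal sketch (not one of the paper's sample programs; the identifiers R, D and a are illustrative, and the 0.41-era X10 syntax is assumed). A one-dimensional array is block-distributed over the available places, and each element is initialized by an activity running at the place that owns it:

    // Illustrative sketch only: a block-distributed array whose elements
    // are initialized in place, using ateach to respect locality.
    public class DistArrayInit {
        public static void main(String[] args) {
            final region R = [0:99];               // index space 0..99
            final dist D = dist.factory.block(R);  // block the region over all places
            final int[.] a = new int[D];           // array distributed according to D

            // one activity per point, executed at the place that owns a[p];
            // finish waits until every element has been written
            finish ateach (point p : D) {
                a[p] = here.id;
            }

            // a[0] is owned by the first place, where main runs, so this access is local
            System.out.println("a[0] = " + a[0]);
        }
    }

Because ateach runs each iteration at the place that owns the corresponding element, every write a[p] = here.id is place-local, in keeping with X10's locality rule for distributed data.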
The target machine for the execution of an X10 program may range from a uniprocessor machine to a large cluster of parallel processors supporting millions of concurrent operations. The design goals of X10 are to achieve a balance among Safety, Analyzability, Scalability and Flexibility.

Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Concurrent Programming – Distributed programming, Parallel programming; D.3.2 [Programming Languages]: Language Classifications – Concurrent, distributed, and parallel languages; Object-oriented languages
General Terms
Languages, Performance, Design
Keywords
X10, Java, Multithreading, Non-uniform Cluster Computing (NUCC), Partitioned Global Address Space (PGAS), Places, Data Distribution, Atomic Blocks, Clocks, Scalability
Copyright is held by the author/owner(s). IWMSE’08, May 11, 2008, Leipzig, Germany. ACM 978-1-60558-031-9/08/05.
REFERENCES
[1] Philippe Charles, Christopher Donawa, Kemal Ebcioglu, Christian Grothoff, Allan Kielstra, Christoph von Praun, Vijay Saraswat, Vivek Sarkar. 2005. X10: An Object-Oriented Approach to Non-Uniform Cluster Computing. ACM OOPSLA.
[2] Kemal Ebcioglu, Vijay Saraswat, Vivek Sarkar. 2004. X10: Programming for Hierarchical Parallelism and Non-Uniform Data Access. 3rd International Workshop on Language Runtimes: Impact of Next Generation Processor Architectures on Virtual Machine Technologies, co-located with ACM OOPSLA.