A Composable Array Function Interface for ...

Introduction

Array API

Runtime Code Generation

Preliminary Results

Conclusions

A Composable Array Function Interface for Heterogeneous Computing in Java

Juan José Fumero (λ), Michel Steuwer (π), Christophe Dubach (λ)

λ University of Edinburgh, UK
π University of Münster, Germany

ARRAY'14, 13.06.2014

Programming for Heterogeneous Computing

Previous Work

Embedded DSLs in high-level languages:
    PyCUDA, PyOpenCL, ...
    JOCL, JavaCL, JCuda, ...

Stream programming:
    IBM Liquid Metal: new operators for task and data parallelism
    Sumatra: Stream API based on JDK 8 for GPU array programming

API Description

Array Programming Interface

[Figure: diagram of the interface, showing Function, ArrayFunction, Map, Reduce and Zip]
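To make the idea of composition concrete, here is a minimal sketch of an array function whose `map` and `reduce` steps chain into a single function. It is not the API from the talk: the name `FloatArrayFunction`, the restriction to `Float[]`, and the choice to return a one-element array from `reduce` are all assumptions made for illustration.

```java
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a composable array function over Float[] (not the talk's API). */
interface FloatArrayFunction {
    Float[] apply(Float[] input);

    /** Returns a new function: run this one, then apply f to every element. */
    default FloatArrayFunction map(UnaryOperator<Float> f) {
        return input -> {
            Float[] in = this.apply(input);
            Float[] out = new Float[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = f.apply(in[i]);
            }
            return out;
        };
    }

    /** Returns a new function: run this one, then fold the result into a one-element array. */
    default FloatArrayFunction reduce(BinaryOperator<Float> op) {
        return input -> {
            Float[] in = this.apply(input);
            if (in.length == 0) {
                return new Float[0]; // assumption: an empty input reduces to an empty result
            }
            Float acc = in[0];
            for (int i = 1; i < in.length; i++) {
                acc = op.apply(acc, in[i]);
            }
            return new Float[] { acc };
        };
    }

    /** Starting point for a chain: passes the input through unchanged. */
    static FloatArrayFunction identity() {
        return input -> input;
    }
}

final class ArrayFunctionSketch {
    /** Composes map and reduce once, then applies the composed function. */
    static Float[] sumOfSquares(Float[] xs) {
        FloatArrayFunction f = FloatArrayFunction.identity()
                .map(x -> x * x)
                .reduce((x, y) -> x + y);
        return f.apply(xs);
    }

    public static void main(String[] args) {
        Float[] r = sumOfSquares(new Float[] { 1f, 2f, 3f });
        System.out.println(r[0]);
    }
}
```

Because the composed function is built before any data is supplied, a runtime is free to inspect the whole chain and decide how and where to execute it, which is the property the rest of the talk relies on.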

Example - dotProduct

$\sum_{i=0}^{n-1} a_i * b_i$

    f = zip().map(x -> x._1 * x._2).reduce((x, y) -> x + y);
    Float[] a = new Float[N];
    Float[] b = new Float[N];
    Float[] result = f.apply(new Tuple(a, b));
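The three stages of the dot product (zip, map, reduce) can be spelled out in plain Java as a sequential reference. This is a sketch under assumptions: the `Tuple` class below is hypothetical and only mirrors the `_1` and `_2` field names used in the slide's lambda, and the stages run eagerly here rather than being composed and compiled at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical pair type; the field names follow the slide's lambda (x._1, x._2). */
final class Tuple<A, B> {
    final A _1;
    final B _2;

    Tuple(A first, B second) {
        this._1 = first;
        this._2 = second;
    }
}

final class DotProductSketch {
    /** Sequential zip -> map -> reduce; returns a one-element array as in the slide. */
    static Float[] dotProduct(Float[] a, Float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }

        // zip: pair a[i] with b[i]
        List<Tuple<Float, Float>> zipped = new ArrayList<>(a.length);
        for (int i = 0; i < a.length; i++) {
            zipped.add(new Tuple<>(a[i], b[i]));
        }

        // map: multiply the two components of each pair
        Function<Tuple<Float, Float>, Float> multiply = x -> x._1 * x._2;

        // reduce: sum the products, starting from 0
        BinaryOperator<Float> add = (x, y) -> x + y;
        Float acc = 0f;
        for (Tuple<Float, Float> t : zipped) {
            acc = add.apply(acc, multiply.apply(t));
        }
        return new Float[] { acc };
    }

    public static void main(String[] args) {
        Float[] a = { 1f, 2f, 3f };
        Float[] b = { 4f, 5f, 6f };
        System.out.println(dotProduct(a, b)[0]);
    }
}
```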

Deoptimisation Process

Vision in the Future

Opportunities for Specialisation

Setup

Workstation with AMD SDK OpenCL driver

Black-Scholes problem

Comparison with:
    Java Sequential: primitive data types
    Java Objects: using Float and Tuples
    Array Function: our API
    Java threads
    OpenCL GPU
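The difference between the "primitive data types" and "Float and Tuples" baselines can be illustrated with a small Black-Scholes sketch. It assumes the standard closed-form price of a European call and the Abramowitz-Stegun polynomial approximation of the cumulative normal distribution. The class and method names are hypothetical, and the benchmark code used for the measurements is not shown in the slides.

```java
/** Hypothetical sketch: Black-Scholes call price over primitives and over boxed objects. */
final class BlackScholesSketch {
    /** Cumulative standard normal distribution (Abramowitz-Stegun polynomial approximation). */
    static float cnd(float x) {
        final float a1 = 0.319381530f, a2 = -0.356563782f, a3 = 1.781477937f;
        final float a4 = -1.821255978f, a5 = 1.330274429f;
        float l = Math.abs(x);
        float k = 1.0f / (1.0f + 0.2316419f * l);
        float poly = k * (a1 + k * (a2 + k * (a3 + k * (a4 + k * a5))));
        float density = (float) (Math.exp(-l * l / 2.0) / Math.sqrt(2.0 * Math.PI));
        float w = 1.0f - density * poly;
        return x < 0 ? 1.0f - w : w;
    }

    /** European call price: spot s, strike k, rate r, volatility sigma, maturity t in years. */
    static float callPrice(float s, float k, float r, float sigma, float t) {
        float sqrtT = (float) Math.sqrt(t);
        float d1 = (float) ((Math.log(s / k) + (r + 0.5f * sigma * sigma) * t) / (sigma * sqrtT));
        float d2 = d1 - sigma * sqrtT;
        return s * cnd(d1) - k * (float) Math.exp(-r * t) * cnd(d2);
    }

    /** Primitive variant: float[] in, float[] out, no boxing. */
    static float[] pricePrimitive(float[] spot, float k, float r, float sigma, float t) {
        float[] out = new float[spot.length];
        for (int i = 0; i < spot.length; i++) {
            out[i] = callPrice(spot[i], k, r, sigma, t);
        }
        return out;
    }

    /** Object variant: every read unboxes a Float and every store boxes the result. */
    static Float[] priceObjects(Float[] spot, Float k, Float r, Float sigma, Float t) {
        Float[] out = new Float[spot.length];
        for (int i = 0; i < spot.length; i++) {
            out[i] = callPrice(spot[i], k, r, sigma, t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(callPrice(100f, 100f, 0.05f, 0.2f, 1f));
    }
}
```

Both variants compute the same values; they differ only in how the data is stored, which is what separates the two sequential baselines in the comparison.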

Sequential Version

Black-Scholes (AMD version)

[Figure: runtime in milliseconds (axis up to 1000) against input size (512 to 1M) for three series: ArrayFunction API, Sequential Java Objects, Sequential Java Primitive]


Parallel Executions

Black-Scholes on AMD GPU and Intel 16 cores

[Figure: speedup (axis 0 to 50) against input size (512 to 1M) for two series: #32 Java Threads, OpenCL GPU]
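The slides do not show how the Java threads configuration distributes work. One plausible scheme, sketched here purely as an assumption, splits the input array into contiguous chunks and lets each thread apply the `map` function to its own chunk. The class and method names are hypothetical.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a map executed by a fixed number of Java threads. */
final class ThreadedMapSketch {
    static Float[] parallelMap(Float[] in, UnaryOperator<Float> f, int nThreads) {
        if (nThreads < 1) {
            throw new IllegalArgumentException("nThreads must be at least 1");
        }
        Float[] out = new Float[in.length];
        Thread[] workers = new Thread[nThreads];
        // Ceiling division so that all elements are covered by some chunk.
        int chunk = (in.length + nThreads - 1) / nThreads;

        for (int t = 0; t < nThreads; t++) {
            final int start = Math.min(in.length, t * chunk);
            final int end = Math.min(in.length, start + chunk);
            // Each worker writes only to its own index range, so no locking is needed.
            workers[t] = new Thread(() -> {
                for (int i = start; i < end; i++) {
                    out[i] = f.apply(in[i]);
                }
            });
            workers[t].start();
        }

        try {
            // join() makes every worker's writes visible to the calling thread.
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return out;
    }

    public static void main(String[] args) {
        Float[] r = parallelMap(new Float[] { 1f, 2f, 3f, 4f, 5f }, x -> x * 2f, 3);
        System.out.println(r[4]);
    }
}
```

A speedup figure for such a version is the sequential runtime divided by the parallel runtime on the same input size.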

GPU execution time breakdown

Black-Scholes on AMD Tahiti, 1M elements

[Figure: kernel execution workflow]

.zip(Conclusions).map(Future)

Present
    Java Array Programming API: a very high-level approach to using parallel patterns in heterogeneous systems
    We have presented an early prototype of Map/Reduce using Graal, JDK 8 and OpenCL

Future
    Runtime scheduling (where is the best place to run the code?)
    Code generation for multiple devices
    Specialised code generation at runtime can improve performance and portability

Thanks so much for your attention

This work was supported by a grant from:

Juan José Fumero
