Domain-Specific Languages for Stencil Computations - Mathematics ...
Domain-Specific Languages for Stencil Computations
Azamat Mametjanov, Boyana Norris
Mathematics and Computer Science Division, Argonne National Laboratory
CACHE-2012 Annual Meeting, December 6-7, 2012
Motivation
• Finite-difference stencils are very common in numerical modeling. They exhibit a high degree of data parallelism and regular structure. However, their memory requirements hinder performance.
• Our approach consists of
  – Exploitation of a stencil's data access pattern
  – Automatic conversion of C loops to CUDA C host+kernel code
  – Automatic tuning of CUDA C performance parameters
  – Raising the programming model to domain abstractions
Outline
• Introduction
• Stencil data structures
• Transformation and tuning framework of Orio
• Our approach
• Results
• DSLs for stencils
Stencils
• Sets of neighboring discrete points in a structured grid
• Stencil pattern determines the interaction among points
  – Domain dimension: 1D, 2D, 3D
  – Stencil shape: star, box
  – Stencil width: distance from stencil center
  – Boundary condition: Dirichlet, periodic, etc.
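To make the stencil parameters concrete, here is a minimal Python sketch of one sweep of a 2D star stencil with Dirichlet (fixed-value) boundaries. The function name `star_stencil_2d`, the averaging weights, and the choice to hold every point within `width` of the edge fixed are assumptions made for illustration; they are not taken from the slides.

```python
def star_stencil_2d(grid, width=1):
    """One sweep of a 2D star stencil with Dirichlet boundaries.

    Each interior point becomes the average of its 4 * width star
    neighbors (the points up to `width` steps away along each axis).
    Points closer than `width` to the edge keep their input value.
    """
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for y in range(width, ny - width):
        for x in range(width, nx - width):
            total = 0.0
            for d in range(1, width + 1):
                total += grid[y - d][x] + grid[y + d][x]  # vertical arms
                total += grid[y][x - d] + grid[y][x + d]  # horizontal arms
            out[y][x] = total / (4 * width)
    return out


# Usage: a 3x3 grid with boundary value 1.0 and a single interior point.
grid = [[1.0] * 3 for _ in range(3)]
grid[1][1] = 0.0
result = star_stencil_2d(grid)
```

A box stencil would additionally read the diagonal neighbors, and a periodic boundary would wrap the indices with a modulo instead of skipping the edge points.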
Grid, adjacency matrix and its compression
[Figure: grid, its adjacency matrix, and the compressed matrix; panels (a), (b), (c)]
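The relation between a grid, its adjacency matrix, and a compressed form can be sketched in a few lines of Python. The sketch assumes a 5-point star stencil and uses diagonal storage (one array per nonzero diagonal, keyed by its offset) as the compression; the slide text does not state which scheme the figure shows, so treat that choice, along with the names `grid_adjacency` and `to_diagonals`, as assumptions.

```python
def grid_adjacency(nx, ny):
    """Dense adjacency matrix (with self-loops) of an nx-by-ny grid
    whose points interact through a 5-point star stencil."""
    n = nx * ny
    A = [[0] * n for _ in range(n)]
    for y in range(ny):
        for x in range(nx):
            i = y * nx + x  # row-major index of grid point (x, y)
            A[i][i] = 1
            if x > 0:
                A[i][i - 1] = 1   # west neighbor
            if x < nx - 1:
                A[i][i + 1] = 1   # east neighbor
            if y > 0:
                A[i][i - nx] = 1  # neighbor in the previous row
            if y < ny - 1:
                A[i][i + nx] = 1  # neighbor in the next row
    return A


def to_diagonals(A):
    """Compress a banded matrix into (offsets, data).

    data[k][i] holds A[i][i + offsets[k]]; positions that fall outside
    the matrix or hold a zero are stored as 0.
    """
    n = len(A)
    diags = {}
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0:
                diags.setdefault(j - i, [0] * n)[i] = A[i][j]
    offsets = sorted(diags)
    return offsets, [diags[k] for k in offsets]


# Usage: a 3x3 grid gives a 9x9 matrix with five nonzero diagonals.
offsets, data = to_diagonals(grid_adjacency(3, 3))
```

For the 3x3 grid the dense matrix has 81 entries, while the diagonal form stores 5 arrays of 9 entries plus the 5 offsets. The regular structure of the stencil is what makes the nonzeros line up on a few diagonals.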
Outline
• Introduction
• Stencil data structures
• Transformation and tuning framework of Orio
• Overview of the approach
• Results
• DSLs for stencils
Method: Code Transformation
• Motivation
  – Compilation: high-level (HL) source code into low-level (LL) portable executable code
  – Optimization: performance, energy
  – Refactoring: resiliency, maintainability, readability
• Workflow
  – Parse: any structured source text into an abstract syntax tree
  – Analyze: common intermediate representation
  – Transform: compositions of reusable transforms
  – Generate: any structured target text
• Challenges
  – Create source and target domains
  – Create analysis and transformation rules
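The four workflow stages can be illustrated with a toy Python pipeline that turns one simple C loop into CUDA C-style kernel text. This is a sketch under strong assumptions, not Orio's implementation: the `Loop` record, the regular-expression parser, and the functions `parse`, `analyze`, `transform`, and `generate` are all hypothetical, and only loops of the exact form `for (i = lo; i < hi; i++) stmt;` are accepted.

```python
import re
from dataclasses import dataclass


@dataclass
class Loop:
    """Tiny stand-in for an abstract syntax tree: one counted loop."""
    var: str
    lower: str
    upper: str
    body: str


LOOP_RE = re.compile(
    r"for\s*\(\s*(\w+)\s*=\s*([^;]+?)\s*;\s*\1\s*<\s*([^;]+?)\s*;"
    r"\s*\1\+\+\s*\)\s*(.+;)\s*$"
)


def parse(source):
    """Parse: source text into the Loop record."""
    m = LOOP_RE.match(source.strip())
    if m is None:
        raise ValueError("unsupported loop form")
    return Loop(*m.groups())


def analyze(loop):
    """Analyze: collect the index offsets of array accesses,
    for example a[i-1] contributes -1 and out[i] contributes 0."""
    pattern = re.compile(r"\[\s*%s\s*([+-]\s*\d+)?\s*\]" % re.escape(loop.var))
    offsets = set()
    for m in pattern.finditer(loop.body):
        offsets.add(int(m.group(1).replace(" ", "")) if m.group(1) else 0)
    return sorted(offsets)


def transform(loop):
    """Transform: replace the loop by a guarded single iteration,
    so that each thread can execute one value of the loop variable."""
    guard = f"{loop.var} >= {loop.lower} && {loop.var} < {loop.upper}"
    return {"index": loop.var, "guard": guard, "body": loop.body}


def generate(kernel, name, params):
    """Generate: emit kernel text; the caller supplies the parameter list."""
    v = kernel["index"]
    return "\n".join([
        f"__global__ void {name}({params}) {{",
        f"  int {v} = blockIdx.x * blockDim.x + threadIdx.x;",
        f"  if ({kernel['guard']}) {{",
        f"    {kernel['body']}",
        "  }",
        "}",
    ])


# Usage: a 1D three-point stencil loop.
src = "for (i = 1; i < n - 1; i++) out[i] = a[i-1] + a[i+1];"
loop = parse(src)
access_offsets = analyze(loop)
cuda_text = generate(transform(loop), "stencil_kernel",
                     "const double *a, double *out, int n")
```

Keeping the stages separate is what allows transforms to be composed and reused: a different `transform` can be swapped in without touching the parser or the generator.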
Method: Code Tuning
• Motivation
  – Deep component stacks
  – Each component is adjustable
• Workflow
  – System model: pre-specified, learned
  – Application profile: memory-/compute-bound
  – Configure: create a valid configuration of parameters
  – Select: the best performing parameter configuration
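The configure and select steps amount to a search over a constrained parameter space. The Python sketch below does an exhaustive search; the parameter names, the validity rule, and above all the `measure` function are assumptions for illustration. A real tuner would time the generated code on the target device where this sketch uses a made-up cost formula.

```python
import itertools


def configurations(space, is_valid):
    """Configure: yield every parameter assignment that passes is_valid."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        if is_valid(cfg):
            yield cfg


def select_best(space, is_valid, measure):
    """Select: return the valid configuration with the lowest measured cost."""
    return min(configurations(space, is_valid), key=measure)


# Hypothetical search space for a generated kernel.
space = {
    "threads_per_block": [32, 64, 128, 256, 512, 1024, 2048],
    "unroll": [1, 2, 4],
}


def is_valid(cfg):
    # Assumed device limit on the number of threads in one block.
    return cfg["threads_per_block"] <= 1024


def measure(cfg):
    # Made-up cost model standing in for a real timing run.
    return abs(cfg["threads_per_block"] - 256) / 256 + 1 / cfg["unroll"]


best = select_best(space, is_valid, measure)
```

Exhaustive search is only practical for small spaces. With deep component stacks the space grows multiplicatively, which is where a system model or an application profile can prune candidates before anything is measured.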