Support Operator Rupture Dynamics on GPU


Shenyi Song1,2,3, Yichen Zhou1,4, Tingxing Dong1,2,3, David A. Yuen1,5

1 (Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, 55455, USA)
2 (Graduate School of China Academy of Science, Beijing, 100190, China)
3 (Computer Network Information Center, China Academy of Science, Beijing, 100190, China)
4 (Department of Computer Science, University of Minnesota, Minneapolis, 55455, USA)
5 (Department of Geology & Geophysics, University of Minnesota, Minneapolis, 55455, USA)

Abstract

The Support Operator Method (SOM) is a numerical method for simulating seismic wave propagation by solving the three-dimensional viscoelastic equations. Its implementation, Support Operator Rupture Dynamics (SORD), has been shown to be highly scalable in large-scale multiprocessor calculations. This paper discusses accelerating SORD on GPUs using NVIDIA CUDA C. Compared with the original CPU version, we have achieved a maximum speed-up of 12.8X.

Key words: Support Operator; CUDA; Seismic wave propagation.

Introduction

The method of Support Operator

The Support Operator Method is a generalized finite difference method introduced by Samarskii et al. and Shashkov. SOM is a general scheme for discretizing the differential form of partial differential equations. Support Operator Rupture Dynamics (SORD), an application of this method to the simulation of earthquake rupture dynamics, was developed by Ely et al. This application implements the Support Operator Method in single-precision floating point. SORD can be used to investigate idealized wave propagation and rupture dynamics problems and to simulate potential future earthquakes with realistic fault and basin models. One example is the simulation of Mw 7.6 earthquake scenarios on the southern San Andreas fault.

Solving Partial Differential Equations on GPU

A variety of applications require solving partial differential equations (PDEs), such as the Laplace equation in image denoising, the Poisson equation in image editing and mesh editing, and the Navier-Stokes equations in fluid simulation. Numerical simulation of PDEs usually requires intensive computation and consumes large amounts of computational resources.

As a multiple SIMD processing unit, the GPU has inherent parallelism, which suits explicit and lattice-based computations. Solid-earth geophysics remains one of the last bastions to have resisted the use of GPUs, especially in geodynamics. GPU programming on NVIDIA graphics cards became significantly easier with the introduction, at the end of 2006, of the CUDA C programming language, which is relatively easy to learn because its syntax is similar to C.

Support Operator Rupture Dynamics

Theoretical Formulation

The governing equations of wave propagation in a 3D isotropic viscoelastic medium are

$$ g_{ij} = \partial_j (u_i + \gamma v_i), \tag{1} $$

$$ \sigma_{ij} = \lambda \delta_{ij} g_{kk} + \mu ( g_{ij} + g_{ji} ), \tag{2} $$

$$ a_i = (1/\rho)\, \partial_j \sigma_{ij}, \tag{3} $$

$$ \dot{v}_i = a_i, \tag{4} $$

$$ \dot{u}_i = v_i, \tag{5} $$

where $\sigma$ is the stress tensor, $u$ and $v$ are the displacement and velocity vectors, $\rho$ is density, $\lambda$ and $\mu$ are elastic moduli, and $\gamma$ is viscosity.
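As a small worked illustration of Eq. (2), the following C sketch evaluates the stress tensor from a given gradient tensor g. The function name `stress_from_gradient` and the flat row-major 3x3 layout are assumptions made here for illustration; they are not taken from SORD.

```c
#include <stddef.h>

/* Illustrative only: sigma_ij = lambda * delta_ij * g_kk + mu * (g_ij + g_ji).
 * Tensors are stored row-major in arrays of 9 values (an assumed layout). */
void stress_from_gradient(const double *g, double lambda, double mu,
                          double *sigma)
{
    /* Trace g_kk: summation over the repeated index k. */
    double trace = g[0] + g[4] + g[8];

    for (size_t i = 0; i < 3; ++i) {
        for (size_t j = 0; j < 3; ++j) {
            double delta = (i == j) ? 1.0 : 0.0;  /* Kronecker delta */
            sigma[3 * i + j] = lambda * delta * trace
                             + mu * (g[3 * i + j] + g[3 * j + i]);
        }
    }
}
```

For g equal to the identity, λ = 1 and μ = 2, every diagonal entry of σ is 3λ + 2μ = 7 and the off-diagonal entries vanish.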

Numerical Method

The finite difference method (FDM) is widely used in modeling three-dimensional seismic wave propagation and rupture dynamics problems [9]. We apply the Support Operator Method (SOM), of which many simple FDMs are special cases. The approach constructs discrete analogs of continuum derivative operators that satisfy important integral identities, such as the adjoint relation between gradient and divergence. SOM brings to an FDM-type formulation the FEM advantage that energy is conserved in the semi-discrete equations. The scheme is explicit in time and is discretized on a hexahedral, logically rectangular mesh. On the mesh we define the space of nodal functions $H_N$, associated with the hexahedra vertices, and the space of cell functions $H_C$, associated with the hexahedra volumes. Differencing a variable in $H_N$ yields a variable in $H_C$, and differencing a variable in $H_C$ yields a variable in $H_N$. We therefore define two discrete difference operators:

$$ D_i : H_N \to H_C \quad \text{and} \quad \mathcal{D}_i : H_C \to H_N. \tag{6} $$
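The adjoint relation between the two operators can be illustrated in one dimension. The sketch below is a simplified analog, not the 3D hexahedral operators of SORD: it assumes unit spacing, a node-to-cell difference, and a cell-to-node operator defined as its negative transpose with cell values treated as zero outside the mesh. All function names are hypothetical.

```c
#include <stddef.h>

/* 1D analog of a node-to-cell difference: df[c] = f[c+1] - f[c].
 * f has ncell + 1 nodal values, df has ncell cell values. */
void natural_diff(const double *f, double *df, size_t ncell)
{
    for (size_t c = 0; c < ncell; ++c)
        df[c] = f[c + 1] - f[c];
}

/* 1D analog of a cell-to-node difference, built as the negative
 * transpose of natural_diff. Cell values outside the mesh count as zero. */
void support_diff(const double *g, double *dg, size_t ncell)
{
    for (size_t n = 0; n <= ncell; ++n) {
        double right = (n < ncell) ? g[n] : 0.0;
        double left  = (n > 0) ? g[n - 1] : 0.0;
        dg[n] = right - left;
    }
}

/* Plain dot product, used to compare the two sides of the identity. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; ++k)
        sum += a[k] * b[k];
    return sum;
}
```

With these definitions, the sum over cells of g times the node-to-cell difference of f equals minus the sum over nodes of f times the cell-to-node difference of g, which is the discrete counterpart of integration by parts.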

On the nodes we have $(\rho, \gamma, \beta, u, v, a) \in H_N$, and on the cells we have $(\lambda, \mu, y, \sigma, g) \in H_C$. Using the two operators we can obtain a variable in $H_N$ from variables in $H_C$ and vice versa. The operator mapping $H_N$ to $H_C$, $D_i$, is called the natural operator, and the operator mapping $H_C$ to $H_N$, $\mathcal{D}_i$, is called the support operator. In time, we adopt a centered difference with second-order accuracy. The discretized difference equations are:

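$$ g_{ij} = D_j \left( u_i^{n} + \gamma v_i^{n-1/2} \right), \tag{7} $$

$$ \sigma_{ij} = \Lambda \delta_{ij} g_{kk} + M ( g_{ij} + g_{ji} ), \tag{8} $$

$$ a_i = R \mathcal{D}_j \sigma_{ij} - \mathcal{Q}_k\, y\, Q_k \left( u_i^{n} + \beta v_i^{n-1/2} \right), \tag{9} $$

$$ v_i^{n+1/2} = v_i^{n-1/2} + \Delta t\, a_i, \tag{10} $$

$$ u_i^{n+1} = u_i^{n} + \Delta t\, v_i^{n+1/2}. \tag{11} $$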
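The time update in Eqs. (10) and (11) is a staggered (leapfrog) step: velocity is advanced by the acceleration, and displacement is then advanced by the new velocity, in line with Eq. (5). A minimal C sketch for one vector component stored in flat arrays follows; the function name and array layout are assumptions for illustration, and single precision is used because the text states that SORD works in single precision.

```c
#include <stddef.h>

/* Illustrative leapfrog update for n nodal values of one component:
 *   v^{n+1/2} = v^{n-1/2} + dt * a          (Eq. 10)
 *   u^{n+1}   = u^{n}     + dt * v^{n+1/2}  (Eq. 11)
 * u and v are updated in place. */
void leapfrog_step(float *u, float *v, const float *a, size_t n, float dt)
{
    for (size_t k = 0; k < n; ++k) {
        v[k] += dt * a[k];  /* advance velocity by half-step-centered acceleration */
        u[k] += dt * v[k];  /* advance displacement with the updated velocity */
    }
}
```

Each node is updated independently of the others, which is one reason an explicit scheme of this kind maps naturally onto one GPU thread per node.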

The material variables incorporate the cell volumes $V_C$ and the node volumes $V_N$:

$$ \Lambda = \lambda / V_C, \tag{12} $$

$$ M = \mu / V_C, \tag{13} $$

$$ R = 1 / (\rho V_N). \tag{14} $$

Viscous as well as stiffness hourglass control may be used, for which we define the viscosity $\beta$ and the stiffness

$$ y = \frac{\mu (\lambda + \mu)}{6 (\lambda + 2\mu)}. \tag{15} $$
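The material scaling and the hourglass stiffness can be precomputed once per cell or node before time stepping. The C sketch below reads Eqs. (12) to (14) as divisions by the cell or node volume and Eq. (15) as a single fraction; the function names are hypothetical and not part of SORD.

```c
#include <assert.h>

/* Eqs. (12)-(13): an elastic modulus divided by the cell volume. */
double scale_by_cell_volume(double modulus, double cell_volume)
{
    assert(cell_volume > 0.0);
    return modulus / cell_volume;
}

/* Eq. (14): R = 1 / (rho * V_N), the reciprocal of the nodal mass. */
double inverse_nodal_mass(double rho, double node_volume)
{
    assert(rho > 0.0 && node_volume > 0.0);
    return 1.0 / (rho * node_volume);
}

/* Eq. (15): y = mu (lambda + mu) / [6 (lambda + 2 mu)]. */
double hourglass_stiffness(double lambda, double mu)
{
    assert(lambda + 2.0 * mu != 0.0);
    return mu * (lambda + mu) / (6.0 * (lambda + 2.0 * mu));
}
```

For example, λ = 0 and μ = 3 give y = 9/36 = 0.25.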

The form we choose for the hourglass stiffness $y$ is based on the approximate analysis of Kosloff. Instabilities in the numerical method due to non-uniform stress modes are corrected for by hourglass operators:

$$ Q_k : H_N \to H_C \quad \text{and} \quad \mathcal{Q}_k : H_C \to H_N. \tag{16} $$

Implementation on GPU using CUDA

Algorithm – thread strategy

As a finite difference method on a structured grid, SORD scales well on the GPU. We develop the GPU code based on David Wang’s work, using CUDA instead of Fortran to implement the program on NVIDIA GPU devices. The program flow is shown in Fig. 1.

Fig. 1. Program flowchart of SORD on GPU.

[Flowchart. Initialization: set up GPU device and allocate data arrays; read input data and initialize output files; generate compute grid; initialize material, PML and source. Time-step loop (timestep++): compute velocity and displacement; I/O files; compute stress; compute acceleration; CUDA memory copy; loop test on the time step.]


    ... >= j_start && threadIdx.y < j_end)
        && (blockIdx.x >= k_start && blockIdx.x < k_end)
        && (blockIdx.y >= l_start && blockIdx.y < l_end))
    {
        // switch statement is unfolded - no branch - due to compile time template
        switch (iq) {
        case 0:
            val = f[th_id + (                           (off_3d_f * i))]
                + f[th_id + (1 + off_1d_f + off_2d_f + (off_3d_f * i))]
                + f[th_id + (    off_1d_f + off_2d_f + (off_3d_f * i))]
                + f[th_id + (1 +                       (off_3d_f * i))]
                - f[th_id + (1 +            off_2d_f + (off_3d_f * i))]
                - f[th_id + (    off_1d_f +            (off_3d_f * i))]
                - f[th_id + (1 + off_1d_f +            (off_3d_f * i))]
                - f[th_id + (               off_2d_f + (off_3d_f * i))];
            break;
        case 1:
            val = f[th_id + (                           (off_3d_f * i))]
                + f[th_id + (1 + off_1d_f + off_2d_f + (off_3d_f * i))]
                - f[th_id + (    off_1d_f + off_2d_f + (off_3d_f * i))]
                - f[th_id + (1 +                       (off_3d_f * i))]
                + f[th_id + (1 +            off_2d_f + (off_3d_f * i))]
                + f[th_id + (    off_1d_f +            (off_3d_f * i))]
                - f[th_id + (1 + off_1d_f +            (off_3d_f * i))]
                - f[th_id + (               off_2d_f + (off_3d_f * i))];
            break;
        case 2:
            val = f[th_id + (                           (off_3d_f * i))]
                + f[th_id + (1 + off_1d_f + off_2d_f + (off_3d_f * i))]
                - f[th_id + (    off_1d_f + off_2d_f + (off_3d_f * i))]
                - f[th_id + (1 +                       (off_3d_f * i))]
                - f[th_id + (1 +            off_2d_f + (off_3d_f * i))]
                - f[th_id + (    off_1d_f +            (off_3d_f * i))]
                + f[th_id + (1 + off_1d_f +            (off_3d_f * i))]
                + f[th_id + (               off_2d_f + (off_3d_f * i))];
            break;
        case 3:
            val = f[th_id + (                           (off_3d_f * i))]
                - f[th_id + (1 + off_1d_f + off_2d_f + (off_3d_f * i))]
                + f[th_id + (    off_1d_f + off_2d_f + (off_3d_f * i))]
                - f[th_id + (1 +                       (off_3d_f * i))]
                + f[th_id + (1 +            off_2d_f + (off_3d_f * i))]
                - f[th_id + (    off_1d_f +            (off_3d_f * i))]
                + f[th_id + (1 + off_1d_f +            (off_3d_f * i))]
                - f[th_id + (               off_2d_f + (off_3d_f * i))];
            break;
        }
        df_output[th_id] = val;
    }
    else
    {
        return;
    }
    }
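A CPU reference is a convenient way to check a kernel that combines the eight nodal values surrounding a cell, selected by `iq`. The C sketch below assumes the four standard hexahedral hourglass sign patterns, indexed by the local corner coordinates (a, b, c); this is an assumption about what the kernel computes, not a transcription of it. The function names and stride arguments are hypothetical; the strides would play the role of the node offsets used in the kernel's indexing.

```c
#include <stddef.h>

/* Sign (+1 or -1) of hourglass mode iq at the cell corner with local
 * coordinates (a, b, c), each 0 or 1. Assumed patterns:
 *   iq 0: (-1)^(b+c)   iq 1: (-1)^(a+c)
 *   iq 2: (-1)^(a+b)   iq 3: (-1)^(a+b+c) */
static int hourglass_sign(int iq, int a, int b, int c)
{
    int exponent;
    switch (iq) {
    case 0:  exponent = b + c;     break;
    case 1:  exponent = a + c;     break;
    case 2:  exponent = a + b;     break;
    default: exponent = a + b + c; break;
    }
    return (exponent % 2 == 0) ? 1 : -1;
}

/* Signed sum of the eight nodal values surrounding one cell.
 * base is the index of corner (0,0,0); each stride steps to the
 * neighbouring node along one mesh direction. */
float hourglass_sum(const float *f, size_t base, size_t stride_a,
                    size_t stride_b, size_t stride_c, int iq)
{
    float val = 0.0f;
    for (int c = 0; c < 2; ++c)
        for (int b = 0; b < 2; ++b)
            for (int a = 0; a < 2; ++a)
                val += (float)hourglass_sign(iq, a, b, c)
                     * f[base + a * stride_a + b * stride_b + c * stride_c];
    return val;
}
```

Each pattern has four positive and four negative corners, so a constant or linearly varying field sums to zero in every mode; only the non-uniform deformation that hourglass control is meant to suppress gives a nonzero result.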