GPU Accelerated Fast FEM Deformation Simulation

Youquan Liu1,3, Shaohui Jiao3, Wen Wu2,5

1 Faculty of Science and Technology, University of Macau, Macau, China [email protected]

2 Faculty of Information Technology, Macau University of Science and Technology, Macau, China [email protected]

3 State Key Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China [email protected]

5 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China [email protected]

Abstract—In this paper we present a general FEM (Finite Element Method) solution that enables fast dynamic deformation simulation on newly available GPU (Graphics Processing Unit) hardware using the Compute Unified Device Architecture (CUDA) from NVIDIA. CUDA-enabled GPUs harness the power of 128 processors, which allow data-parallel computations. Compared to previous GPGPU approaches, CUDA is significantly more flexible and offers a C language interface. We not only implement FEM deformation computation algorithms with CUDA but also analyze their performance in detail. Our test results indicate that the GPU with CUDA enables about a 4 times speedup for FEM deformation computation on an Intel(R) Core 2 Quad 2.0GHz machine with a GeForce 8800 GTX.

I. INTRODUCTION

In the graphics community, pioneers such as [1] began work on physically based deformation simulation in the 1980s. After many years this area is still active, since some problems remain open despite excellent progress. The tradeoff between performance and precision is a persistent problem. For a recent survey of deformation methods in computer graphics, readers can refer to [2, 3].

The introduction of the Graphics Processing Unit (GPU) provided a means for massive data-parallel computation on the PC. Besides traditional graphics rendering, it became possible to use GPUs for general-purpose computation (GPGPU) in a variety of data-intensive applications [4]. For deformation problems, James et al. [5] used the vertex processor to calculate modal synthesis, and Ranzuglia et al. [6, 7] used the pixel processor to accelerate a mass-spring deformation framework. However, harnessing the power of the GPU remained tricky, since the GPU could only be programmed through a graphics API such as OpenGL or D3D, which adds the overhead of an ill-suited API to floating-point applications. While GPU programs could gather information from any part of the DRAM, they could not scatter information to arbitrary locations as freely, making the GPU less flexible than the CPU.

To overcome these problems, NVIDIA unveiled the Compute Unified Device Architecture (CUDA) [8] in November 2006, which allows algorithms that execute on the GPU to be coded in the C programming language. CUDA-enabled GPUs include a data-parallel cache and, in the GeForce 8 Series GPUs, 128 processor cores. By opening up the GPU architecture, CUDA provides an ideal environment for developing computation-intensive tasks that can take advantage of the massively parallel nature of the G8X series GPUs.

This paper presents a very general solution to the FEM deformation algorithm, implemented with CUDA to obtain performance gains on PCs, and analyzes the bottleneck of the whole simulation in detail. Compared to another popular deformation method, the mass-spring system, FEM (Finite Element Method) is more sophisticated and closer to the underlying physics, but it is also much slower. Moreover, FEM can provide more precise results for engineering problems such as structural analysis.

In Section II the deformation implementation details are given. Comparisons and analysis between the CPU and the GPU are presented in Section III. Lastly, we present our conclusion and future work in Section IV.

II. GPU-ACCELERATED FEM DEFORMATION

A. Dynamic FEM Deformation

For dynamic problems, the motion of an object obeys the following law:

$$M\ddot{u} + D\dot{u} + Ku = F \qquad (1)$$

where u is the 3n-dimensional nodal displacement vector and n is the total number of nodes in the object; M is the mass matrix; D is the damping matrix, for which we apply Rayleigh damping

Support was provided by NIH R01 EB005807 & the National Grant Fundamental Research of Science and Technology (973 Project: 2002CB312102)

978-1-4244-2342-2/08/$25.00 ©2008 IEEE.

Suvranu De4

4 Department of Mechanical, Aerospace and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA [email protected]


D = αM + βK, with α and β used to control the damping; K is the stiffness matrix; F is the external force vector, which comes from collision or gravity. Here x is the current configuration position and x0 is the rest configuration position of the input object, so that u = x − x0. The original stiffness matrix K is a banded sparse matrix because each node has only a small number of neighbor nodes. Stored in full, it would be a 3n × 3n matrix with many zero elements. To avoid the resulting performance loss, we store the non-zero entries associated with each node in block fashion, as a list of 3 × 3 matrices, similar to [9]. In this paper we focus only on linear elastic problems. The constitutive relationship of the linear material follows the Hookean material law σ = Cε, where C is the symmetric elastic matrix that encodes the material's physical properties. The Cauchy strain tensor is used:

$$\varepsilon = \frac{1}{2}\left(\nabla u + (\nabla u)^T\right) \qquad (2)$$
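The block-fashion storage of K described above can be illustrated with a minimal Python sketch. It assumes that each node keeps a list of (neighbor index, 3 × 3 block) pairs, including its own diagonal block. The names `block_matvec` and `identity_block`, the list layout, and the toy matrix are illustrative assumptions, not the data structure actually used in the paper.

```python
# Hypothetical block-sparse layout: rows[i] is a list of (j, block) pairs,
# where block is the 3x3 sub-matrix K_ij coupling node i to node j.


def block_matvec(rows, u):
    """Return y = K u for K stored as per-node lists of 3x3 blocks.

    rows: list over nodes; rows[i] = [(j, K_ij), ...], K_ij a 3x3 nested list.
    u:    flat list of length 3n (x, y, z displacement per node).
    """
    n = len(rows)
    y = [0.0] * (3 * n)
    for i, row in enumerate(rows):      # one node per outer iteration
        for j, block in row:            # only the stored (non-zero) blocks
            for r in range(3):
                for c in range(3):
                    y[3 * i + r] += block[r][c] * u[3 * j + c]
    return y


def identity_block(scale):
    """3x3 block equal to scale * I, used only to build the toy matrix."""
    return [[scale if r == c else 0.0 for c in range(3)] for r in range(3)]


# Toy example (assumed values): two nodes coupled to each other.
toy_rows = [
    [(0, identity_block(2.0)), (1, identity_block(-1.0))],
    [(0, identity_block(-1.0)), (1, identity_block(2.0))],
]
toy_u = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
toy_y = block_matvec(toy_rows, toy_u)
```

Each outer iteration reads only the blocks of a single node and writes only that node's three output entries, which is the kind of per-node independence a data-parallel implementation can exploit.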

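To remain stable during interaction with large time steps, implicit integration [9, 10] is used to compute the discrete approximations by (3) and (4):

$$u^{t+\Delta t} = u^t + \Delta t\, v^{t+\Delta t} \qquad (3)$$

$$M v^{t+\Delta t} = M v^t + \Delta t\left(-D v^{t+\Delta t} - K u^{t+\Delta t} + F^{t+\Delta t}\right) \qquad (4)$$

After substituting (3) into (4), we can solve for the velocity:

$$M v^{t+\Delta t} = M v^t + \Delta t\left(-D v^{t+\Delta t} - K\left(u^t + \Delta t\, v^{t+\Delta t}\right) + F^{t+\Delta t}\right)$$

Assuming that K does not change during the iteration, the entire simulation algorithm can be written as follows (see Figure 1):

1. Compute K and M on the host side.
2. Compute K′ on the host side.
3. For each node $i \in (1 \ldots n)$: $b_i = \Delta t\, f_i^{t+\Delta t} + m_i v_i^t - \Delta t\, K_{ii} u_i^t$ ... neighbors $j \in (1 \ldots l$ ...
4. ...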
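Starting from equation (1) with the implicit update $u^{t+\Delta t} = u^t + \Delta t\, v^{t+\Delta t}$, collecting the unknown velocity on the left-hand side gives a linear system. A plausible reading, which the text here does not state explicitly, is that K′ denotes its system matrix and $b_i$ the rows of its right-hand side:

$$\left(M + \Delta t\, D + \Delta t^2 K\right) v^{t+\Delta t} = M v^t + \Delta t\left(F^{t+\Delta t} - K u^t\right)$$

The following Python sketch performs one such implicit step for a small dense system. The function names, the dense list-of-lists storage, and the use of a plain conjugate gradient solver are all illustrative assumptions rather than the paper's implementation. Conjugate gradient is applicable here under the assumption that M is symmetric positive definite and K is symmetric positive semi-definite.

```python
def matvec(A, x):
    """Dense matrix-vector product for list-of-lists A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (assumed)."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x with x = 0
    p = list(r)                 # initial search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:     # residual norm below tolerance
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def implicit_step(M, K, u, v, F, dt, alpha, beta):
    """One implicit step: returns (u_new, v_new).

    Assumes Rayleigh damping D = alpha*M + beta*K and a constant K.
    """
    n = len(u)
    D = [[alpha * M[i][j] + beta * K[i][j] for j in range(n)]
         for i in range(n)]
    # System matrix M + dt*D + dt^2*K (assumed meaning of K').
    A = [[M[i][j] + dt * D[i][j] + dt * dt * K[i][j] for j in range(n)]
         for i in range(n)]
    Ku = matvec(K, u)
    Mv = matvec(M, v)
    # Right-hand side b = M v + dt * (F - K u).
    b = [Mv[i] + dt * (F[i] - Ku[i]) for i in range(n)]
    v_new = conjugate_gradient(A, b)
    u_new = [u[i] + dt * v_new[i] for i in range(n)]
    return u_new, v_new


# Single degree of freedom, no damping: A = 2 + 0.25 * 8 = 4, b = -4,
# so the new velocity is -1 and the new displacement is 1 + 0.5 * (-1) = 0.5.
u_next, v_next = implicit_step([[2.0]], [[8.0]], [1.0], [0.0], [0.0],
                               0.5, 0.0, 0.0)
```

Because K is assumed constant, the system matrix only needs to be assembled once, which is consistent with computing K′ on the host side before the per-node work begins.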
