IEEE International Conference on Advances in Engineering & Technology Research (ICAETR - 2014), August 01-02, 2014, Dr. Virendra Swarup Group of Institutions, Unnao, India
Development of 3D-CFD Code for Heat Conduction Process using CUDA
Yogesh D. Bhadke, Manik R. Kawale, Dr. Vandana Inamdar
Department of Computer Engineering and IT, College of Engineering, Pune, India
Abstract— Heat conduction is a natural phenomenon governed by a three-dimensional, transient partial differential equation. This partial differential equation can be solved by many numerical methods, such as the finite difference, finite volume, and finite element methods. These methods require heavy computation to solve the resulting system of algebraic equations. A graphics processing unit (GPU) can take over this computation from the CPU as a co-processor and thereby save considerable computation time. The dominant proprietary framework for GPU computing is CUDA, provided by NVIDIA, which can be used for computation-intensive tasks. The objective of this work is to develop a heat conduction code on the CUDA platform that solves the system of algebraic equations on the GPU.
Keywords—Heat conduction; CFD; CUDA; GPU
I. INTRODUCTION
Heat is the form of energy that can be transferred from one system to another as a result of a temperature difference. A thermodynamic analysis is concerned with the amount of heat transferred as a system undergoes a process from one equilibrium state to another. Heat transfer is the branch of thermal engineering concerned with the generation, use, conversion, and exchange of thermal energy and heat between physical systems. Heat transfer is classified into various mechanisms, such as thermal conduction, thermal convection, thermal radiation, and transfer of energy by phase change. Although these mechanisms have different characteristics, they often occur simultaneously in the same system. The basic condition for heat transfer is a temperature difference: there can be no net heat transfer between two media that are at the same temperature. The rate of heat transfer in a particular direction depends on the magnitude of the temperature gradient (the temperature difference per unit length, or the rate of change of temperature) in that direction; the larger the temperature gradient, the higher the rate of heat transfer. Heat transfer by conduction is the flow of thermal energy inside solids and non-flowing fluids, driven by thermal non-equilibrium, and is usually measured as a heat flux (a vector), i.e., the heat flow per unit time per unit area of a surface.
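The dependence of the heat transfer rate on the temperature gradient is Fourier's law of conduction, q = −κ dT/dx, where κ is the thermal conductivity. The short C sketch below evaluates it with a two-point finite difference. The function name `conduction_flux` and the 0.01 m spacing in the usage example are illustrative assumptions; κ = 43 W/(m·K) is the conductivity listed in Table 2.

```c
#include <assert.h>

/* One-dimensional Fourier's law, q = -kappa * dT/dx, with the gradient
 * approximated by a two-point finite difference over a distance dx.
 * Units: kappa in W/(m K), temperatures in degC (or K), dx in m.
 * The result is a heat flux in W/m^2, positive when heat flows toward +x. */
double conduction_flux(double kappa, double t_left, double t_right, double dx)
{
    assert(kappa >= 0.0 && dx > 0.0);
    double gradient = (t_right - t_left) / dx;  /* temperature gradient, K/m */
    return -kappa * gradient;
}
```

For example, `conduction_flux(43.0, 80.0, 20.0, 0.01)` gives about 258 000 W/m²: the flux is positive because heat flows from the 80 °C side toward the 20 °C side, and doubling the temperature difference doubles the flux.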
In solids, conduction is due to the combination of vibrations of the molecules and the energy transported by free electrons. The rate of heat conduction through a medium depends on the geometry of the medium, its thickness, and the material of the medium, as well as on the temperature difference across it. The heat conduction equation is a parabolic partial differential equation (PDE). Heat transfer has direction as well as magnitude, and the rate of heat conduction in a particular direction is proportional to the temperature gradient in that direction. In general, heat conduction in a medium is three-dimensional and time-dependent, i.e., transient [8]. That is, u = u(x, y, z, t), where u is the temperature as a function of the space variables x, y, z and time t. Transient conduction occurs when the temperature inside an object changes with time. The transient, three-dimensional heat conduction equation is

$$\rho C \frac{\partial u}{\partial t} = \kappa \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right) \tag{1}$$

where
κ is the thermal conductivity, ρ is the density of the material, and C is the specific heat capacity. The differential equation does not include any information about the conditions at the surfaces, such as the external temperature. Yet the temperature distribution in a medium depends on the conditions at its surfaces, and the description of a heat transfer problem in a medium is not complete unless the thermal conditions at the bounding surfaces are fully specified [8]. To describe a heat transfer problem completely, two boundary conditions must be given for each direction of the coordinate system along which heat transfer is significant. Therefore, we need two boundary conditions for a one-dimensional problem, four for a two-dimensional problem, and six for a three-dimensional problem. Analysis of the multidimensional transient heat conduction PDE is complicated and requires numerical solution on high-end computers with great computational power. Because the clock rate (GHz) of workstation processors is no longer increasing substantially, one option is to use the Graphics Processing Unit (GPU) to offload the heavy computation [8].
978-1-4799-6393-5/14/$31.00 ©2014 IEEE
II. ALTERNATING DIRECTION IMPLICIT METHOD
The alternating direction implicit (ADI) method is a splitting method. For the heat equation (1), the computation of each time step is split into three steps, which we call the X-sweep, Y-sweep, and Z-sweep. Each sweep yields equations with a simple structure that can be solved efficiently by the tridiagonal matrix algorithm. The difference equation for (1) is given as
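$$\frac{u^{n+1}_{i,j,k} - u^{n}_{i,j,k}}{\Delta t} = \frac{\kappa}{\rho C}\left[ \frac{u^{n+1}_{i+1,j,k} - 2u^{n+1}_{i,j,k} + u^{n+1}_{i-1,j,k}}{(\Delta x)^2} + \frac{u^{n+1}_{i,j+1,k} - 2u^{n+1}_{i,j,k} + u^{n+1}_{i,j-1,k}}{(\Delta y)^2} + \frac{u^{n+1}_{i,j,k+1} - 2u^{n+1}_{i,j,k} + u^{n+1}_{i,j,k-1}}{(\Delta z)^2} \right] \tag{2}$$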
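Multiplying (2) by Δt shows that every neighbour difference in the x-direction is weighted by the dimensionless diffusion number r = κΔt/(ρC(Δx)²), and likewise for y and z. The C sketch below evaluates this number for the material data and time step of Table 2. The paper does not state the mesh spacing, so the Δx = 0.01 m used in the example is a hypothetical value, and the function name `diffusion_number` is illustrative.

```c
#include <assert.h>

/* Dimensionless diffusion number r = kappa * dt / (rho * c * dx^2): the
 * weight of the neighbour differences once (2) is multiplied by dt.
 * kappa in W/(m K), rho in kg/m^3, c in J/(kg K), dt in s, dx in m. */
double diffusion_number(double kappa, double rho, double c, double dt, double dx)
{
    assert(rho > 0.0 && c > 0.0 && dx > 0.0);
    double alpha = kappa / (rho * c);  /* thermal diffusivity, m^2/s */
    return alpha * dt / (dx * dx);
}
```

With κ = 43, ρ = 7800, C = 473 and Δt = 0.001 s, the thermal diffusivity κ/(ρC) is about 1.17 × 10⁻⁵ m²/s, and `diffusion_number(43.0, 7800.0, 473.0, 0.001, 0.01)` is about 1.17 × 10⁻⁴. Halving Δx multiplies r by four.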
In the three-dimensional case, the first step applies the implicit method in the X-direction and the explicit method in the Y- and Z-directions, producing an intermediate solution. In the second step, the implicit method is applied in the Y-direction and the explicit method in the X- and Z-directions. In the third step, the implicit method is applied in the Z-direction and the explicit method in the X- and Y-directions. The finite difference equations of the model equation in the ADI formulation are

$$\frac{u^{n+1/3}_{i,j,k} - u^{n}_{i,j,k}}{\Delta t} = \frac{\kappa}{\rho C}\left[ \frac{u^{n+1/3}_{i+1,j,k} - 2u^{n+1/3}_{i,j,k} + u^{n+1/3}_{i-1,j,k}}{(\Delta x)^2} + \frac{u^{n}_{i,j+1,k} - 2u^{n}_{i,j,k} + u^{n}_{i,j-1,k}}{(\Delta y)^2} + \frac{u^{n}_{i,j,k+1} - 2u^{n}_{i,j,k} + u^{n}_{i,j,k-1}}{(\Delta z)^2} \right] \tag{3}$$

$$\frac{u^{n+2/3}_{i,j,k} - u^{n+1/3}_{i,j,k}}{\Delta t} = \frac{\kappa}{\rho C}\left[ \frac{u^{n+1/3}_{i+1,j,k} - 2u^{n+1/3}_{i,j,k} + u^{n+1/3}_{i-1,j,k}}{(\Delta x)^2} + \frac{u^{n+2/3}_{i,j+1,k} - 2u^{n+2/3}_{i,j,k} + u^{n+2/3}_{i,j-1,k}}{(\Delta y)^2} + \frac{u^{n+1/3}_{i,j,k+1} - 2u^{n+1/3}_{i,j,k} + u^{n+1/3}_{i,j,k-1}}{(\Delta z)^2} \right] \tag{4}$$

$$\frac{u^{n+1}_{i,j,k} - u^{n+2/3}_{i,j,k}}{\Delta t} = \frac{\kappa}{\rho C}\left[ \frac{u^{n+2/3}_{i+1,j,k} - 2u^{n+2/3}_{i,j,k} + u^{n+2/3}_{i-1,j,k}}{(\Delta x)^2} + \frac{u^{n+2/3}_{i,j+1,k} - 2u^{n+2/3}_{i,j,k} + u^{n+2/3}_{i,j-1,k}}{(\Delta y)^2} + \frac{u^{n+1}_{i,j,k+1} - 2u^{n+1}_{i,j,k} + u^{n+1}_{i,j,k-1}}{(\Delta z)^2} \right] \tag{5}$$

Boundary conditions are applied on the domain, and for any typical node (i, j, k) the algebraic equations are

$$(1 + 2r_1)\,u^{n+1/3}_{i,j,k} - r_1\left(u^{n+1/3}_{i+1,j,k} + u^{n+1/3}_{i-1,j,k}\right) = (1 - 2r_2 - 2r_3)\,u^{n}_{i,j,k} + r_2\left(u^{n}_{i,j+1,k} + u^{n}_{i,j-1,k}\right) + r_3\left(u^{n}_{i,j,k+1} + u^{n}_{i,j,k-1}\right) \tag{6}$$

$$(1 + 2r_2)\,u^{n+2/3}_{i,j,k} - r_2\left(u^{n+2/3}_{i,j+1,k} + u^{n+2/3}_{i,j-1,k}\right) = (1 - 2r_1 - 2r_3)\,u^{n+1/3}_{i,j,k} + r_1\left(u^{n+1/3}_{i+1,j,k} + u^{n+1/3}_{i-1,j,k}\right) + r_3\left(u^{n+1/3}_{i,j,k+1} + u^{n+1/3}_{i,j,k-1}\right) \tag{7}$$

$$(1 + 2r_3)\,u^{n+1}_{i,j,k} - r_3\left(u^{n+1}_{i,j,k+1} + u^{n+1}_{i,j,k-1}\right) = (1 - 2r_1 - 2r_2)\,u^{n+2/3}_{i,j,k} + r_1\left(u^{n+2/3}_{i+1,j,k} + u^{n+2/3}_{i-1,j,k}\right) + r_2\left(u^{n+2/3}_{i,j+1,k} + u^{n+2/3}_{i,j-1,k}\right) \tag{8}$$

where

$$r_1 = \frac{\kappa}{\rho C}\cdot\frac{\Delta t}{(\Delta x)^2}, \qquad r_2 = \frac{\kappa}{\rho C}\cdot\frac{\Delta t}{(\Delta y)^2}, \qquad r_3 = \frac{\kappa}{\rho C}\cdot\frac{\Delta t}{(\Delta z)^2}$$

ADI is an implicit approach: the unknowns must be obtained by a simultaneous solution of the difference equations applied at all grid points at a given time level. Each of (6), (7), and (8) couples only three unknowns along one grid line in the implicit direction, so when the system of algebraic equations is written in matrix form it is tridiagonal, i.e., it has nonzero elements only along three diagonals, with one independent system per grid line. Such tridiagonal systems can be solved using the Thomas algorithm, which is almost the standard for the treatment of tridiagonal systems of equations.

III. ADI-CUDA IMPLEMENTATION
For the ADI-CUDA implementation, we consider only regular or non-regular cubes. Since CUDA involves a large number of threads, the threads have to be arranged to suit the problem. Consider a cube as the physical domain; after it is divided into discrete cells, the resulting cube looks similar to the cube shown in Fig. 1.

Figure 1. Physical domain after meshing, with boundaries

To map CUDA threads, we consider the cube as a set of XY-planes chopped across the Z-direction, as shown in Fig. 2. For the parallel implementation, we use a set of 1D blocks, each containing a set of 1D threads. A CUDA block can be regarded as one XY-plane of the physical domain; one XY-plane consists of a number of rows and columns, and the set of XY-planes is stacked along the Z-direction. With 1D threads, one thread handles one row or one column when calculating the temperature values.

Figure 2. Set of XY-planes chopped across the Z-direction.

In the CUDA environment, there is no effective way to synchronize threads across blocks, so a new kernel call to the GPU is required whenever threads from different blocks must be synchronized. Therefore, in our design we issue separate GPU calls for the X-sweep, Y-sweep, and Z-sweep, for checking the convergence of the solution, and for copying temperature values between successive time steps.
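The sweep structure described above can be illustrated with a serial C sketch; it is not the authors' CUDA code. The hypothetical function `adi_sweep` performs one of the sub-steps (6)–(8): it loops over the grid lines along the implicit direction, and in the X- and Y-sweeps the body of that loop is the work one CUDA thread owns in the design above, namely building one tridiagonal system and solving it with the Thomas algorithm (`thomas`). The hypothetical `adi_step` chains the three sweeps, returns the maximum temperature change used as the convergence measure, and copies the new values back, mirroring the separate GPU calls. Assumptions: a uniform n × n × n grid stored with linear index i + n(j + nk), Dirichlet temperatures held fixed on the outermost layer of nodes, and r[d] = κΔt/(ρC(Δh_d)²) for axis d.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Serial sketch of the ADI sweeps (6)-(8); not the authors' CUDA code.
 * Grid: n x n x n nodes stored with linear index i + n*(j + n*k); the
 * outermost layer of nodes holds fixed (Dirichlet) temperatures.
 * r[d] = kappa*dt/(rho*C*h_d^2) for axis d (0 = x, 1 = y, 2 = z). */

/* Thomas algorithm for a[m]*x[m-1] + b[m]*x[m] + c[m]*x[m+1] = d[m],
 * m = 0..len-1, with a[0] = c[len-1] = 0.  Overwrites c and d; the
 * solution is returned in d. */
static void thomas(int len, const double *a, const double *b, double *c, double *d)
{
    c[0] /= b[0];
    d[0] /= b[0];
    for (int m = 1; m < len; ++m) {            /* forward elimination */
        double denom = b[m] - a[m] * c[m - 1];
        c[m] /= denom;
        d[m] = (d[m] - a[m] * d[m - 1]) / denom;
    }
    for (int m = len - 2; m >= 0; --m)         /* back substitution */
        d[m] -= c[m] * d[m + 1];
}

/* One ADI sub-step: implicit along axis dir, explicit along the other two.
 * Reads src, writes dst.  Each (p, q) iteration builds and solves the
 * tridiagonal system of one grid line. */
static void adi_sweep(int n, int dir, const double r[3], const double *src, double *dst)
{
    assert(n >= 3);
    const int stride[3] = {1, n, n * n};
    const int o1 = (dir + 1) % 3, o2 = (dir + 2) % 3; /* explicit axes */
    const int len = n - 2;                            /* interior nodes per line */
    double *a = malloc(4 * (size_t)len * sizeof *a);
    assert(a != NULL);
    double *b = a + len, *c = b + len, *d = c + len;
    memcpy(dst, src, (size_t)n * n * n * sizeof *dst); /* keeps boundary values */
    for (int q = 1; q < n - 1; ++q) {
        for (int p = 1; p < n - 1; ++p) {
            /* first interior node of the line: index 1 along dir */
            int base = stride[dir] + p * stride[o1] + q * stride[o2];
            for (int m = 0; m < len; ++m) {
                int id = base + m * stride[dir];
                a[m] = (m == 0) ? 0.0 : -r[dir];
                c[m] = (m == len - 1) ? 0.0 : -r[dir];
                b[m] = 1.0 + 2.0 * r[dir];
                d[m] = (1.0 - 2.0 * r[o1] - 2.0 * r[o2]) * src[id]
                     + r[o1] * (src[id + stride[o1]] + src[id - stride[o1]])
                     + r[o2] * (src[id + stride[o2]] + src[id - stride[o2]]);
            }
            /* the fixed boundary nodes at both ends move to the right-hand side */
            d[0] += r[dir] * src[base - stride[dir]];
            d[len - 1] += r[dir] * src[base + len * stride[dir]];
            thomas(len, a, b, c, d);
            for (int m = 0; m < len; ++m)
                dst[base + m * stride[dir]] = d[m];
        }
    }
    free(a);
}

/* One time step: X-, Y- and Z-sweep, then the convergence measure (maximum
 * |u_new - u_old|) and the copy u <- u_new.  w1 and w2 are scratch grids. */
static double adi_step(int n, const double r[3], double *u, double *w1, double *w2)
{
    adi_sweep(n, 0, r, u, w1);  /* u^n       -> u^{n+1/3} */
    adi_sweep(n, 1, r, w1, w2); /* u^{n+1/3} -> u^{n+2/3} */
    adi_sweep(n, 2, r, w2, w1); /* u^{n+2/3} -> u^{n+1}   */
    double max_diff = 0.0;
    for (long p = 0; p < (long)n * n * n; ++p) {
        double diff = w1[p] > u[p] ? w1[p] - u[p] : u[p] - w1[p];
        if (diff > max_diff)
            max_diff = diff;
        u[p] = w1[p];
    }
    return max_diff;
}
```

For example, with every boundary node at 50 °C, the interior initially at 0 °C, and r1 = r2 = r3 = 0.1 (so the explicit coefficient 1 − 2r − 2r stays positive), repeated calls to `adi_step` drive the returned maximum change toward zero and the interior toward 50 °C. Because each sweep reads one array and writes another, the lines within a sweep are independent, which is what permits one thread per line; the dependence of each sweep on the complete result of the previous one is what forces the separate kernel launches.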
Convergence of the solution is considered achieved when the maximum temperature difference at the discrete points of the domain between successive time steps is almost zero or constant.

Fig. 3. Computation of the tridiagonal system for a row is assigned to one thread in the X-sweep

In the X-sweep, the number of threads generated for the kernel launch equals the number of rows across every plane of the discretized domain. One thread handles one row completely: it generates the coefficient matrix of the tridiagonal system, calls the tridiagonal system solver, and writes back the results returned by the solver. Each thread stores the coefficients of its tridiagonal matrix in its private memory, which in turn resides in local memory.

In the Y-sweep, the number of threads generated for the kernel launch equals the number of columns across every plane of the discretized domain, as shown in Fig. 4. Here also, one thread handles one column completely, calls the tridiagonal solver, and writes back the results.

Fig. 4. Computation of the tridiagonal system for a column is assigned to one thread in the Y-sweep

In the Z-sweep, the number of threads generated for the kernel launch equals the number of rows in one plane only. Because in the Z-sweep the temperature values along a line lie in different blocks of the grid, there is no use in generating threads across the blocks.

IV. RESULTS
The serial ADI and Douglas codes, written in C, and the ADI-CUDA code, written in CUDA-C, were tested on the system specified in Table 1 with the test data given in Table 2.

Table 1. System specification
Processor                 Intel Xeon E5-1607, 3.00 GHz × 4
RAM                       16 GB
OS                        Ubuntu 12.04 LTS, 64-bit
GPU                       GeForce GTX 480
Compute capability        2.0
CUDA cores                480
GPU global memory         1.5 GB
GPU clock rate            1.4 GHz
CUDA version              5.5

Table 2. Test data
Temperature at left end (°C)             80
Temperature at right end (°C)            20
Temperature at top end (°C)              60
Temperature at bottom end (°C)           30
Temperature at front end (°C)            70
Temperature at back end (°C)             20
Density ρ (kg/m³)                        7800
Specific heat C (J/(kg·K))               473
Thermal conductivity κ (W/(m·K))         43
Time step (s)                            0.001

[Fig. 5 plot: execution time (s) versus grid size for ADI and ADI-CUDA]
Fig. 5. Time comparison between ADI and ADI-CUDA
Fig. 5 shows that, for every grid size except the smallest one, the execution time of the ADI method is greater than that of the ADI-CUDA method.
[Fig. 6 plot: speedup versus grid size, for grids from 10×10×10 to 50×50×50]
Fig. 6. Speedup

Fig. 6 shows that the speedup increases as the grid size increases. The heat map for a sample grid size of 10 × 10 × 10 at the converged state is given in Fig. 7.

Fig. 7. Heat map at convergence state

Validation of the ADI-CUDA implementation is shown in Fig. 8.

Fig. 8. Validation of ADI-CUDA code

Fig. 8 shows that the results of the parallel implementation (blue) are in complete agreement with the reference line (red).

V. CONCLUSION
The alternating direction implicit method is an alternative to the Crank–Nicolson method, which is computationally expensive when extended to multiple dimensions. The ADI method exhibits a high level of data parallelism, which can be exploited using a high-end GPU.

A speedup of up to 11× is obtained, and it increases with the grid size. In the parallel implementation, local memory, which is the private memory of a CUDA thread, is used extensively. As no effective inter-block synchronization is available in CUDA, we were forced to split the kernel calls for the X-sweep, Y-sweep, and Z-sweep, which is a significant overhead for the parallel implementation. If effective inter-block synchronization becomes available in the future, it may be possible to improve the speedup with the same or another strategy. The 3D heat conduction PDE is similar to some of the PDEs used in computational finance, so with some changes this work can be extended to computational finance applications. The speedup could also be increased by tuning the memory layout of the data stored on the GPU to the memory access pattern of the X-, Y-, and Z-sweeps in the parallel implementation, since the memory access in these sweeps is widely scattered.

ACKNOWLEDGMENT
We are thankful to Dr. Vikas Kumar for giving us the opportunity to work on this project. We are deeply indebted to our guide, Dr. Supriyo Paul, whose guidance, stimulating suggestions, and encouragement helped us throughout our project discussions and coding.

REFERENCES
[1] Xiaohua Meng, Dasheng Qin, Yuhui Deng, "Designing a GPU based Heat conduction algorithm by leveraging CUDA", Fourth International Conference on Emerging Intelligent Data and Web Technologies, 2013.
[2] Tomasz P. Stefański, Timothy D. Drysdale, "Acceleration of the 3D ADI-FDTD Method Using Graphics Processor Units", IEEE IMS 2009, 2009.
[3] Jonathan Cohen, M. Molemaker, "A Fast Double Precision code using CUDA", Proceedings of Parallel CFD, 2009.
[4] Dana Jacobsen, Julien Thibault, Inanc Senocak, "An MPI CUDA Implementation of massively parallel Incompressible Flow Computations on Multi-GPU Cluster", Aerospace Sciences Meeting, 2010.
[5] Duy Minh Dang, "Pricing of Cross-Currency Interest Rate Derivatives on Graphics Processing Units", IEEE, 2010.
[6] Olaf Schenk, Matthias Christen, Helmar Burkhart, "Algorithmic performance studies on graphics processing units", J. Parallel Distrib. Comput., vol. 68, pp. 1360–1369, 2008.
[7] Paulius Micikevicius, "3D Finite Difference Computation on GPUs using CUDA", GPGPU-2, Washington, D.C., US, March 8, 2009.
[8] Yunus A. Cengel, "Heat and Mass Transfer", 2nd Ed., McGraw-Hill Education, ch. 01–05, 2007.
[9] John D. Anderson, "Computational Fluid Dynamics: The Basics with Applications", Springer, ch. 04, 1992.
[10] Klaus Hoffmann, Steve Chiang, "Computational Fluid Dynamics Volume I", 4th Ed., EES, ch. 02, 03, 2000.
[11] Jason Sanders, Edward Kandrot, "CUDA by Example: An Introduction to General Purpose Computing", Addison-Wesley, 2010.
[12] D. B. Kirk, Wen-mei W. Hwu, "Programming Massively Parallel Processors: A Hands-on Approach", Morgan Kaufmann, 2010.