Apply Cluster and Grid Computing on Parallel 3D Rendering Chao-Tung Yang Chuan-Lin Lai Department of Computer Science and Information Engineering Tunghai University Taichung, 407 Taiwan, R.O.C. Tel: +886-4-23590121 Ext. 3279 Fax: +886-4-23591567 E-mail:
[email protected]

Abstract

A cluster is a collection of independent and inexpensive machines used together as a supercomputer to provide a solution. In this paper, a PC cluster consisting of one master node and nine diskless slave nodes (10 processors) is proposed and built for parallel rendering. The system architecture and benchmark performance of this cluster are presented. Internet computing and Grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data, and other resources across institutional boundaries, and harnessing these new technologies effectively will transform scientific disciplines ranging from high-energy physics to the life sciences. In this paper, we also construct two heterogeneous PC clusters for parallel rendering and install Red Hat Linux 9 on each cluster. These clusters are placed on different subnets, so we use the Globus Toolkit grid middleware to connect the two clusters into a grid computing environment spanning multiple Linux PC clusters. We also install the Sun Grid Engine to manage, monitor, and schedule incoming computing jobs in order to achieve high-performance computing and high CPU utilization. The system architecture and benchmark performance of this grid environment are also presented.

Keywords: PC cluster, Parallel rendering, Cluster computing, Grid Computing, Speedup.
1. Introduction
Extraordinary technological improvements over the past few years in areas such as microprocessors, memory, buses, networks, and software have made it possible to assemble groups of inexpensive personal computers and/or workstations into a cost-effective system that functions in concert and possesses tremendous processing power. Cluster computing is not new, but in company with other technical capabilities, particularly in the area of networking, this class of machines is becoming a high-performance platform for parallel and distributed applications [4, 5, 9, 10, 11, 12, 13, 14].
Grid computing, most simply stated, is distributed computing taken to the next evolutionary level. The goal is to create the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources. The standardization of communications between heterogeneous systems created the Internet explosion. The emerging standardization for sharing resources, along with the availability of higher bandwidth, is driving a possibly equally large evolutionary step in grid computing [1, 2, 3, 4, 5, 6, 14].

Rendering is a technique for generating a graphical image from a mathematical model of a two- or three-dimensional object or scene. A common method of rendering is ray tracing. Ray tracing is a technique used in computer graphics to create realistic images by calculating the paths taken by rays of light entering the observer's eye at different angles. Ray tracing is an ideal application for parallel processing, since there are many pixels, each of whose values is independent and can be calculated in parallel. The Persistence of Vision Ray Tracer (POV-Ray) is an all-round three-dimensional ray-tracing software package [7, 8]. It takes input information and simulates the way light interacts with the objects defined to create 3D pictures and animations. In addition to the ray-tracing process, newer versions of POV-Ray can also use a variant of the process known as radiosity (sophisticated lighting) to add greater realism to scenes, particularly those that use diffuse light. POV-Ray can also simulate many atmospheric and volumetric effects (such as smoke and haze).

In this paper, a cluster with diskless clients was built. The system architecture and benchmark performance of the cluster are presented. In order to measure the performance of our cluster, the parallel ray-tracing problem is illustrated and the experimental results are demonstrated on our Linux PC cluster. The experimental results show that the highest speedup is 6.55 for PVMPOV when the total number of processors on the cluster is 10. Then, we use MPIPOV parallel ray-tracing techniques to examine the Grid Computing system. The experimental results show that the highest speedups are obtained for MPIPOV when the total number of processors is 8, by creating 8 tasks on the Grid system. The results of this study will make theoretical and technical contributions to the design of message-passing programs on Grid Computing systems.
2. Background

2.1. Cluster Computing
A Beowulf cluster uses a multi-computer architecture, as depicted in Figure 1. It features a parallel computing system that usually consists of one or more master nodes and one or more compute nodes, or cluster nodes, interconnected via widely available network interconnects. All of the nodes in a typical Beowulf cluster are commodity systems (PCs, workstations, or servers) running commodity software such as Linux. From a user's perspective, a Beowulf cluster appears as a Massively Parallel Processor (MPP) system. The most common method of using the system is to access the master node either directly or through Telnet or remote login from personal workstations. Once on the master node, users can prepare and compile their parallel applications, and also spawn jobs on a desired number of compute nodes in the cluster. Applications must be written in parallel style and use the message-passing programming model. Jobs of a parallel application are spawned on compute nodes, which work collaboratively until finishing the application. During the execution, compute nodes use standard message-passing middleware, such as the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), to exchange information [4, 5, 6, 9, 10, 11, 12, 13].
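A minimal sketch of this workflow on the master node, assuming an MPICH-style installation (the hostname, source file, and machine file below are illustrative, not our actual configuration):

$ ssh master.cluster.example                    # log in to the master node
$ mpicc -O2 -o myapp myapp.c                    # compile the MPI application
$ mpirun -np 8 -machinefile machines ./myapp    # spawn 8 tasks on the listed compute nodes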
Figure 1: Logical view of a Beowulf cluster.
2.2. Grid Computing
Grid computing requires the use of software that can divide and farm out pieces of a program to as many as several thousand computers. Grid computing can be thought of as distributed and large-scale cluster computing and as a form of network-distributed parallel processing. It can be confined to the network of computer workstations within a corporation, or it can be a public collaboration. The establishment, management, and exploitation of dynamic, cross-organizational sharing relationships require new technology. This technology is Grid architecture and the supporting software protocols and middleware [1, 2, 4, 5, 14]. Another key technology in the development of grid networks is the set of middleware applications that allows resources to communicate across organizations using a wide variety of hardware and operating systems. The Globus Toolkit [1, 2] is a set of tools useful for building a grid. Its strength is a good security model, with a provision for hierarchically collecting data about the grid, as well as the basic facilities for implementing a simple, yet world-spanning grid.

2.2.1. Globus Toolkit

The Globus Project [2] provides software tools that make it easier to build computational grids and grid-based applications. These tools are collectively called the Globus Toolkit. The Globus Toolkit is used by many organizations to build computational grids that can support their applications. Globus has a layered architecture (see Figure 2) that includes the fabric, connectivity, resource, collective, and application layers.
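As a concrete illustration of the resource layer, a job can be submitted by hand to a cluster's GRAM gatekeeper with the globusrun client and a short RSL description. This is only a minimal sketch; the contact string below is a placeholder, not one of our actual gatekeepers:

$ grid-proxy-init                                # obtain a short-lived GSI proxy credential
$ globusrun -o -r cluster1.example.org/jobmanager '&(executable=/bin/hostname)(count=2)'
# -r names the GRAM resource contact; -o streams the job output back to the client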
Figure 2: The Globus Grid architecture, where protocols, services, and APIs occur at each level.
2.2.2. MPICH-G2

MPICH-G2 [6] is a grid-enabled implementation of the MPI v1.1 standard. That is, using services from the Globus Toolkit (e.g., job startup, security), MPICH-G2 allows you to couple multiple machines, potentially of different architectures, to run MPI applications. MPICH-G2 automatically converts data in messages sent between machines of different architectures, and supports multi-protocol communication by automatically selecting TCP for inter-machine messaging and (where available) vendor-supplied MPI for intra-machine messaging. Existing parallel programs written for MPI can be executed over the Globus infrastructure just after recompilation [6].

2.2.3. SUN Grid Engine

Sun Grid Engine is a new-generation distributed resource management software package that dynamically matches users' hardware and software requirements to the available (heterogeneous) resources in the network, according to policies usually defined by management [3]. Sun Grid Engine acts as the central nervous system of a cluster of networked computers. Via so-called daemons, the Grid Engine master supervises all resources in the network to allow full control and achieve optimum utilization of the resources available. Sun Grid Engine aggregates the compute power available in dedicated compute farms, networked servers, and desktop workstations, and presents a single access point to users needing compute cycles. This is accomplished by distributing computational workload to available systems, simultaneously increasing the productivity of machines and application licenses while maximizing the number of jobs that can be completed.
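As an illustration of how a rendering run can be handed to Sun Grid Engine, the sketch below submits a batch job and monitors it. The script name, paths, and rendering command are placeholders rather than our actual configuration:

$ cat render.sh
#!/bin/sh
#$ -N skyvase              # job name reported by qstat
#$ -cwd                    # run the job in the submission directory
./povray +iskyvase.pov +w800 +h600
$ qsub render.sh           # hand the script to the Grid Engine master for scheduling
$ qstat                    # monitor queued and running jobs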
3. Performance Evaluation

We have run PVMPOV [7, 8] on our 10-processor testbed and obtained very good results. With the cluster configured, run the following commands to begin ray tracing and generate the image files shown in Figure 3.
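Before these commands are invoked, the PVM virtual machine must already be running on every node. A minimal sketch of starting it from the PVM console (host names are illustrative):

$ pvm                      # start the console; this also starts pvmd on the master
pvm> add node01 node02     # add slave hosts to the virtual machine
pvm> conf                  # list the hosts now in the virtual machine
pvm> quit                  # leave the console; the daemons keep running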
$./pvmpov +iskyvase.pov +w800 +h600 +nt10 -L/home/ct/pvmpov3_1g_2/povray31/include
$./pvmpov +ipawns.pov +w800 +h600 +nt10 -L/home/ct/pvmpov3_1g_2/povray31/include
$./pvmpov +ifish13.pov +w800 +h600 +nt10 -L/home/ct/pvmpov3_1g_2/povray31/include
These are the standard benchmark command-line options, with the exception of the +nw and +nh switches, which are specific to PVMPOV and define the size of the image blocks each of the slaves will work on. The +nt switch specifies the number of tasks that will be running; for example, +nt10 starts 10 tasks, one for each processor. The messages on the screen should show that the slaves were started successfully. When rendering completes, PVMPOV displays the slave statistics as well as the total render time. For the Skyvase model at 1600×1280, the render time in single-processor mode was 88 seconds; using the diskless cluster (10 processors) reduced it to 17 seconds. The execution times for the different POV-Ray models (Skyvase, Pawns, and Fish13) on the cluster are shown in Figure 4. The corresponding speedups for the different problem sizes, obtained by varying the number of tasks (+nt option), are shown in Figure 5. The highest speedup was obtained at the largest image size (1600×1280) for the Pawns model using all 10 processors of the cluster.
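As a quick check of these figures, the speedup is simply the single-processor render time divided by the parallel render time; for the Skyvase case above this gives speedup = T(1) / T(10) = 88 s / 17 s ≈ 5.2, while the 6.55 figure quoted in the introduction corresponds to the best case, the Pawns model at 1600×1280.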
Figure 3: The images of three pov models
Figure 4: The processing times of the three pov models (Skyvase, Pawns, Fish13) at 800×600, 1024×768, and 1600×1280 on 1 and 10 processors.
Figure 5: The speedup of the three pov models (Skyvase, Pawns, Fish13) at 800×600, 1024×768, and 1600×1280.
MPIPOV has the ability to distribute a rendering across multiple heterogeneous systems. Parallel execution is only active if the user gives the "+N" option to MPIPOV; otherwise, MPIPOV behaves the same as regular POV-Ray and runs a single task only on the local machine. Using the MPI code, there is one master and many slave tasks. The master has the responsibility of dividing the image up into small blocks, which are assigned to the slaves. When the slaves have finished rendering the blocks, they are sent back to the master, which combines them to form the final image. The code is designed to keep the available slaves busy, regardless of system loading and network bandwidth. We have run MPIPOV on our Grid Computing testbed and obtained very good results. With the Grid configured, run the following commands to begin ray tracing and generate the image files shown in Figure 6.
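Before these commands can start tasks on both clusters, MPICH-G2 needs a valid GSI proxy on the submitting host (and a machines file listing the GRAM contacts of the two clusters, whose exact contents depend on the local installation). A minimal sketch of the authentication step:

$ grid-proxy-init          # create a short-lived proxy certificate for GSI authentication
$ grid-proxy-info          # confirm the proxy subject and its remaining lifetime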
$./mpirun -np 8 mpi-x-povray +iskyvase.pov +w800 +h600 +nt10 -L/home/cll/pvmpov3_1g_2/povray31/include
$./mpirun -np 8 mpi-x-povray +ipawns.pov +w800 +h600 +nt10 -L/home/cll/pvmpov3_1g_2/povray31/include
$./mpirun -np 8 mpi-x-povray +ichess13.pov +w800 +h600 +nt10 -L/home/cll/pvmpov3_1g_2/povray31/include
For the Skyvase model at 1600×1280, the render time in single-processor mode was 4652 seconds; using the Grid Computing system (8 processors) reduced it to 647 seconds. The execution times for the different POV-Ray models (Chess2, Skyvase, and Pawns) on the Grid are shown in Figures 7, 8, and 9, respectively. The highest speedup was obtained at the largest image size for the Chess2 model using our Grid Computing system with 8 processors.
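Dividing the reported times gives the implied speedup on the Grid for this case: speedup = T(1) / T(8) = 4652 s / 647 s ≈ 7.2, which is close to linear scaling on 8 processors.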
Figure 6: The images of the three pov models rendered with MPIPOV.
Figure 7: The processing times of the chess2.pov model (800×600, 1024×768, 1600×1200) on 2, 4, and 8 processors of the Grid.
Figure 8: The processing times of the skyvase.pov model (800×600, 1024×768, 1600×1200) on 2, 4, and 8 processors of the Grid.
Figure 9: The processing times of the pawns.pov model (800×600, 1024×768, 1600×1200) on 2, 4, and 8 processors of the Grid.

5. Conclusions

In this paper, a cluster with diskless clients was built. The system architecture and benchmark performance of the cluster were presented. In order to measure the performance of our cluster, the parallel ray-tracing problem was illustrated and the experimental results were demonstrated on our Linux PC cluster. The experimental results show that the highest speedup is 6.55 for PVMPOV when the total number of processors on the cluster is 10. We then used MPIPOV parallel ray-tracing techniques to examine the Grid Computing system. The experimental results show that the highest speedups are obtained for MPIPOV when the total number of processors is 8, by creating 8 tasks on the Grid system. The results of this study make theoretical and technical contributions to the design of message-passing programs on Grid Computing systems.
References
[1] Global Grid Forum, http://www.ggf.org
[2] The Globus Project, http://www.globus.org/
[3] Sun ONE Grid Engine, http://wwws.sun.com/software/gridware/
[4] R. Buyya, High Performance Cluster Computing: System and Architectures, Vol. 1, Prentice Hall PTR, NJ, 1999.
[5] R. Buyya, High Performance Cluster Computing: Programming and Applications, Vol. 2, Prentice Hall PTR, NJ, 1999.
[6] MPICH-G2, http://www.hpclab.niu.edu/mpi/
[7] POVBENCH: The Official Home Page, http://www.haveland.com/povbench
[8] POV-Ray: The Persistence of Vision Raytracer, http://www.povray.org/
[9] T. L. Sterling, J. Salmon, D. J. Backer, and D. F. Savarese, How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, 2nd Printing, MIT Press, Cambridge, Massachusetts, USA, 1999.
[10] B. Wilkinson and M. Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice Hall PTR, NJ, 1999.
[11] M. Wolfe, High-Performance Compilers for Parallel Computing, Addison-Wesley Publishing, NY, 1996.
[12] C. T. Yang, S. S. Tseng, M. C. Hsiao, and S. H. Kao, "A Portable Parallelizing Compiler with Loop Partitioning," Proc. of the NSC ROC(A), Vol. 23, No. 6, 1999, pp. 751-765.
[13] Chao-Tung Yang, Shian-Shyong Tseng, Yun-Woei Fan, Ting-Ku Tsai, Ming-Hui Hsieh, and Cheng-Tien Wu, "Using Knowledge-based Systems for Research on Portable Parallelizing Compilers," Concurrency and Computation: Practice and Experience, Vol. 13, pp. 181-208, 2001.
[14] Chuan-Lin Lai and Chao-Tung Yang, "Construct a Grid Computing Environment on Multiple Linux PC Clusters," International Conference on Open Source, 2003.