WSEAS TRANSACTIONS on COMPUTERS


Issue 2, Volume 4, February 2005

ISSN

1109-2750

http://www.wseas.org

Mobile Robot Global Localization Using an Artificial Landmark with Geometric-Projective Properties Armida Gonzalez Lorence, Mayra P. Garduno Gaffare, J. Armando Segovia de los Rios

77

Extracting Reusable Knowledge from Portal Activity Christopher John Hogger, Frank R. Kriwaczek

83

A self-adaptable Inference Engine Chinh Phan Cong, Axel Hunger

90

Efficient Sequential Pattern Mining Algorithms Renata Ivancsy, Istvan Vajk

96

Complexity Analysis To Software Defect System Zhang Kai

102

Design Security for Internet-Based Workflow Management Systems Adopting Security Agents Myeonggil Choi

108

A Fast Evolutionary Algorithm in Codebook Design Abdolali Momenai, Siamak Talebi

117

Unification of Sorts Among Local Ontologies for Semantic Web Applications Nwe Ni Tun, Satoshi Tojo

123

Analytic-based Estimation of Query Result Sizes Carlo dell'Aquila, Ezio Lefons, Filippo Tangorra

130

Designing an Understanding and Debugging Tool (UDT) for Object-oriented Programming Language Nor Fazlida Mohd Sani, Abdullah Mohd Zin, Sufian Idris, Zarina Shukur

137

Process Monitoring Using Bond Graph Approach Ismail Dif, Mohammed Mostefai, Mabrouk Khemliche

143

Solving 3-SAT Using Constraint Programming and Fail Detection Gary Yat-Chung Wong, Kin-Yeung Wong, Kai-Hau Yeung

148

Incremental Verification Methodology for DEVS Models Wan Bok Lee, Chang Hyun Roh

154

On a versatile and costless OMR system Iosif Androulidakis, Nikolaos Androulidakis

160

Constraint-based Fuzzy Models for an Environment with Heterogeneous Information Granules Robert Lai, David Chiang

166

Extracting synchronization-free threads in perfectly nested loops using the Omega project software Wlodzimierz Bielecki, Krzysztof Siedlecki

172

Investigating factors influencing the response time in J2EE web applications Agnes Bogardi-Meszoly, Gabor Imre, Hassan Charaf

179

Formula of Software Defect Number Zhang Kai

184

A UML Class Diagram-Based Pattern Language for Model Transformation Systems Tihamér Levendovszky, László Lengyel, Hassan Charaf

190

An Efficient Algorithm for Computing all Program Static Slices Jehad Al Dallal

196

A Framework for Differential Quality of Service Infrastructure for a Java Application Server Rajeshwari Ganesan; Varun Chhabra

201

The Design of Algorithm Translation Package Using UML Noraida Haji Ali, Masita Masila Abdul Jalil and Mustafa Mat Deris

207

Model Checking for Aspect-Oriented Software Evolution Wuttipong Ruanthong and Pornsiri Muenchaisri

216

Component Integration for Web Based Applications Richard Wasniowski

222

Measuring Maintainability in Early Phase using Aesthetic Metrics Matinee Kiewkanya, Pornsiri Muenchaisri

227

Automated Test Case Generation from IFAD VDM++ Specifications Aamer Nadeem, Muhammad Jaffar-ur-Rehman

233

Study of the Precision of a 3-D Image Registration Technique Using 3-D PET Brain Simulated Images Antoine Abche , Georges Tzanakos and E. Micheli-Tzanakou

240

Towards Real-Time Simulation of the Sidescan Sonar Imaging Process James Riordan, Daniel Toal, Colin Flanagan

247

FPGA Based Data Coding Ali M. Al-Haj

253

On Accelerating the Neighbours Lists Generation Process Using Field Programmable Gate Arrays Ali M. Al-Haj

258

A Novel Approach to Compress Image Set Shih-Chieh Shie, Shinfeng D. Lin

263

Image Blending for Virtual Environment Construction Based on TIP Model Chang Hyun Roh, Wan Bok Lee

267

On Accelerating the Neighbours Lists Generation Process Using Field Programmable Gate Arrays

ALI M. Al-HAJ Department of Computer Engineering, College of Electrical Engineering, Princess Sumaya University for Technology, Al-Jubeiha P.O.Box 1438, Amman 11941, JORDAN

Abstract: Smoothed particle hydrodynamics (SPH) is an efficient method for simulating astrophysical systems. The most computation-intensive part of SPH is generating the neighbours lists for all particles in the simulated system. In this paper, we propose two computation-efficient approaches to generating neighbours lists, and we outline their implementation on a hardware platform based on field programmable gate arrays (FPGAs). The FPGA platform is intended to accelerate neighbours list generation and, in turn, the simulation of astrophysical systems as a whole.

Key words: Neighbours list generation, SPH, FPGA implementation, Astrophysical system simulation.

1 Introduction

Efficient N-body simulation of astrophysical systems requires the treatment of both gravitational and hydrodynamic forces [1]. The Smoothed Particle Hydrodynamics (SPH) method is widely used to compute the hydrodynamic forces [2,3]. However, it is a computation-intensive method, and thus requires implementations capable of exploiting its inherent parallelism [4,5]. The most computation-intensive part of SPH is the generation of a nearest neighbours list for each particle in the simulation space. In fact, generating such lists has been reported to take roughly one third of the total simulation time [6]. Accelerating neighbours list generation requires efficient search algorithms that process mutual nearest neighbour trees in minimum time. It also requires high-speed implementations of such algorithms on customized hardware accelerators.

In this paper, we propose two approaches that efficiently generate the neighbours lists on a field programmable gate array (FPGA) platform [7]. Both the clump approximation approach and the group neighbourhood approach are based on mutual nearest neighbour trees, and both are implemented using a linked list structure [8]. The FPGA platform is a multi-purpose reconfigurable accelerator based on Xilinx Virtex technology [9,10]. Two types of memory are connected on the board: high-performance static RAM (SRAM) and high-density dynamic RAM (SDRAM).

This paper is organized as follows. Section two describes the construction and search of mutual nearest neighbour trees. Sections three and four describe the two neighbours list generation approaches. Some concluding remarks are given in section five.

2 Mutual Nearest Neighbours Trees

The overall speed of an SPH simulation depends strongly on the ability to rapidly identify the nearest neighbouring particles. In a unified gravitational and hydrodynamical implementation, the search for the nearest neighbours takes roughly one third of the total time. To accelerate the process, a good solution has been to reorganize the original particles into a mutual nearest neighbour tree data structure [11]. The basic idea behind such a tree is that a large set of particles contained in a clump can be eliminated with a single distance (opening criterion) calculation. In what follows, the tree building process and its transformation into a linked list are described.

2.1 Tree Construction

Two particles are said to be a mutually nearest pair if each of them is the nearest neighbour of the other. The tree is built bottom up by combining only mutually nearest pairs of particles at their centre of mass. That is, when two particles are mutual nearest neighbours, they are replaced by a second level node located at their centre of mass. Then the set of second level nodes, plus any yet-unpaired single particles, is considered, and a search is made for mutual nearest neighbours, which become third level nodes, and so on. Each time a pair of mutually nearest neighbours is joined and replaced by a node at its centre of mass, the number of nodes still to be paired is reduced by one. This process is repeated until the original N particles are joined into a binary tree with N − 1 non-leaf nodes [12].
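The bottom-up pairing described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used on the host workstation: the function name `build_mnn_tree`, the dictionary layout of a node, and the brute-force O(N²) nearest-neighbour search per pass are all assumptions made for the example.

```python
import math

def build_mnn_tree(positions, masses):
    """Build a mutual-nearest-neighbour tree bottom up (illustrative sketch).

    Returns a list of nodes. Leaves come first; every merged node is
    appended, so the root is the last entry. 'children' is None for a leaf.
    """
    nodes = [{"pos": tuple(p), "mass": m, "children": None}
             for p, m in zip(positions, masses)]
    active = list(range(len(nodes)))          # nodes not yet paired
    while len(active) > 1:
        # Brute-force nearest active neighbour of every active node.
        nearest = {
            a: min((b for b in active if b != a),
                   key=lambda b: math.dist(nodes[a]["pos"], nodes[b]["pos"]))
            for a in active
        }
        paired, next_active = set(), []
        for a in active:
            if a in paired:
                continue
            b = nearest[a]
            if nearest[b] == a:               # a and b are mutually nearest
                ma, mb = nodes[a]["mass"], nodes[b]["mass"]
                com = tuple((ma * xa + mb * xb) / (ma + mb)
                            for xa, xb in zip(nodes[a]["pos"], nodes[b]["pos"]))
                nodes.append({"pos": com, "mass": ma + mb, "children": (a, b)})
                paired.update((a, b))
                next_active.append(len(nodes) - 1)
            else:
                next_active.append(a)         # stays single for this pass
        active = next_active
    return nodes
```

For four particles on a line at x = 0, 1, 10 and 11, the first pass merges (0, 1) and (2, 3), and the second pass merges the two resulting clumps, giving three non-leaf nodes for four particles.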

2.2 Tree Elements

The particle space (around one million particles) is generated randomly on the host workstation. Each leaf node in the tree must contain the particle's quantities: mass, position, velocity, acceleration, force, and density. In addition, the following quantities are needed to traverse the tree: child pointer, sibling pointer, and smoothing length. The non-leaf nodes (clumps and sub-clumps), whose number equals the number of leaf nodes minus one, store the following quantities: mass, centre of mass, child pointer, distance to the centre of mass of the parent clump, and smoothing length.

2.3 Linked List Recursive Processing

The normal way of searching a tree is by successive recursive calls down the tree [13]. Recursive calls carry a considerable overhead, and because the tree search is called many times at each time step, it is best to eliminate recursion from this function. A non-recursive tree search can be accomplished by arranging the tree nodes in a linked list. Each linked list node contains two pointers: one to its first child and one to its adjacent sibling. If a node is the last child in a level, its sibling pointer is set to its parent's sibling. A node that contains a single particle has only an adjacent sibling and no children.
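As a rough illustration of this arrangement, the following Python sketch threads a tree into child and sibling pointers without recursion. The function name `thread_tree`, the dictionary-based tree representation, and the use of `None` as the end-of-list marker are assumptions made for the example, not details taken from the implementation.

```python
def thread_tree(children, root):
    """Derive child/sibling pointers for a tree (illustrative sketch).

    children: dict mapping a node id to a tuple of its child ids
              (leaves may be absent or map to an empty tuple).
    Returns (child_ptr, sibling_ptr); None means 'no child' or 'end of list'.
    """
    child_ptr, sibling_ptr = {}, {}
    stack = [(root, None)]                  # (node, sibling it should point to)
    while stack:
        node, sibling = stack.pop()
        sibling_ptr[node] = sibling
        kids = tuple(children.get(node, ()))
        child_ptr[node] = kids[0] if kids else None
        # Each child points to the next child; the last child
        # inherits the sibling pointer of its parent.
        for kid, nxt in zip(kids, kids[1:] + (sibling,)):
            stack.append((kid, nxt))
    return child_ptr, sibling_ptr


# Small example: root 6 has clumps 4 and 5; their leaves are 0,1 and 2,3.
tree = {6: (4, 5), 4: (0, 1), 5: (2, 3)}
child_ptr, sibling_ptr = thread_tree(tree, 6)
```

In this example, leaf 1 is the last child of clump 4, so its sibling pointer leads to clump 5, the sibling of its parent. Leaf 3 and clump 5 have no following node, so their sibling pointers mark the end of the list.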

2.4 Non-recursive Processing of the Linked List

In the process of generating the nearest neighbours list of a given particle, the nodes of the linked list are visited one at a time, starting from the root (first) node. The opening criterion is computed for each non-leaf node (a clump or a sub-clump node). There are two scenarios, depending on the outcome. If the criterion fails, the node and its contents are in the neighbourhood of the particle, so its children must be visited (the node must be opened). This is accomplished by moving to the child nodes using the 'child pointer' of the node. If the criterion passes, the node and its contents are not in the neighbourhood of the particle, so the node and the particles it contains are ignored and not considered as neighbours. This is accomplished by moving to the sibling node using the 'sibling pointer' of the node.
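The walk described above might look as follows in Python. This is a sketch under stated assumptions: the function `find_neighbours` and the callables `is_far` and `is_neighbour` are hypothetical names, and the one-dimensional positions, clump extents and the common smoothing length `h` are example data. The two tests are simple sphere-overlap checks chosen for illustration.

```python
def find_neighbours(root, child_ptr, sibling_ptr, is_far, is_neighbour):
    """Walk the child/sibling linked list without recursion (sketch)."""
    neighbours = []
    node = root
    while node is not None:
        if child_ptr[node] is None:            # leaf: test the particle itself
            if is_neighbour(node):
                neighbours.append(node)
            node = sibling_ptr[node]
        elif is_far(node):                     # criterion passes: skip the clump
            node = sibling_ptr[node]
        else:                                  # criterion fails: open the clump
            node = child_ptr[node]
    return neighbours


# Hypothetical 1-D example: leaves 0..3, clumps 4 and 5, root 6.
child_ptr = {6: 4, 4: 0, 5: 2, 0: None, 1: None, 2: None, 3: None}
sibling_ptr = {6: None, 4: 5, 5: None, 0: 1, 1: 5, 2: 3, 3: None}
centre = {0: 0.0, 1: 1.0, 2: 10.0, 3: 11.0, 4: 0.5, 5: 10.5, 6: 5.5}
extent = {4: 0.5, 5: 0.5, 6: 5.5}              # clump radius about its centre
h = 0.75                                       # assumed common smoothing length
i = 0                                          # particle under consideration

def is_far(j):
    return (centre[j] - centre[i]) ** 2 > (h + extent[j] + h) ** 2

def is_neighbour(j):
    return j != i and (centre[j] - centre[i]) ** 2 < (h + h) ** 2

result = find_neighbours(6, child_ptr, sibling_ptr, is_far, is_neighbour)
```

Here the root and clump 4 are opened, particle 1 is accepted, and clump 5 is skipped with a single test, so particles 2 and 3 are never examined individually.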

3 Clumps Approximation Approach

The clumps approximation approach examines the clump (non-leaf) nodes of the linked list one at a time. A clump opening criterion is used to decide whether a clump may contain nearest neighbours of the particle under consideration. If it cannot, the clump is ignored and none of its particles are considered neighbours of the particle, which saves a great amount of computation. Otherwise, the clump may contain neighbours of the particle under consideration, and its particles are checked one at a time to see whether they are actual neighbours.

3.1 FPGA Memory and Communications Requirements

The initial linked list is stored in the SRAM. Each element in the list represents a particle (or a clump/sub-clump) and consists of fields representing its properties. The nearest neighbours lists generated at run time for all particles are stored in the SDRAM. Each list consists of pointers to the actual neighbour particles stored in the initial linked list. As for communications, during neighbours list generation, data is read from the SRAM, and when a neighbour is found, its positional pointer is written into the corresponding nearest neighbours list in the SDRAM. Also, when the force of a given particle is updated, its neighbours list is retrieved from the SDRAM into the FPGA. The FPGA uses the list to fetch the corresponding particles and update the force and other relevant properties. Finally, the updated linked list must be sent back to the host to generate a new tree and a new linked list.

3.2 FPGA Processing Requirements

Every leaf particle is retrieved from the linked list and compared with all remaining elements of the linked list. The elements representing clumps are compared before the leaf nodes, since they are at a higher level in the original tree. If the compared clump may contain neighbour particles, its child nodes are also compared one at a time. If the compared element is a particle rather than a clump node, a different comparison is made, and the leaf node is either registered as a neighbour or ignored. If the retrieved linked list element, j, is a clump node and i is the particle under consideration, then we evaluate the following opening criterion:

$$ r_{ij}^2 > \left( h_i + b_j^{\max} + \max(h_j) \right)^2 \tag{1} $$

where $r_{ij}$ is the distance between the particle under consideration and the clump node, $h_i$ is the smoothing length of the particle, $h_j$ is the smoothing length of the clump, and $b_j^{\max}$ is the maximum extent of the clump's particles from its centroid. If the opening criterion is satisfied, the clump node is far away from the particle: its particles are ignored and a jump is made to the next clump node using the sibling pointer, where the criterion is evaluated again. If the opening criterion is not satisfied, the clump node may contain neighbour particles, and a jump is made to its child node using the child pointer. If the child is a sub-clump node, the above processing is repeated. If, however, the child is a particle (leaf) node, the particle is examined for a possible neighbourhood relationship with the particle under consideration. In this case, the neighbourhood criterion becomes:

$$ r_{ij}^2 < \left( h_i + h_j \right)^2 \tag{2} $$

4 Group Nearest Neighbours Approach

The group nearest neighbourhood approach exploits the fact that particles in a clump may share a subset of common neighbours. This is because mutual nearest neighbour trees preserve the neighbourhood relationships of the original particle space. The approach works as follows. A nearest neighbour clumps list is first generated for each clump in the tree. For a given clump, this list contains pointers to the clumps that are found to be its neighbours. Generating such a list takes little computation, since it is done at the clump level rather than the particle level. Next, the nearest neighbour particles list for each particle in a given clump is generated by considering all particles of the clump, in addition to the clump's nearest neighbour clumps list. The clump's own particles are considered neighbours of each particle in the clump. The remaining neighbours are found by examining each particle in every clump appearing in the clump's nearest neighbour clumps list.

4.1 FPGA Memory and Communications Requirements

The initial linked list of particles and the initial linked list of clump nodes are stored in the SRAM. The nearest neighbour clumps lists and nearest neighbour particles lists generated at run time are stored in the SDRAM. Neighbours list generation results in the following forms of communication. During clump neighbours list generation, data is read from the clumps linked list stored in the SRAM. Nearest neighbour clumps lists are generated (one for each clump) and stored in the SDRAM. During nearest neighbour particles list generation, data is read from both the nearest neighbour clumps lists stored in the SDRAM and the initial particles linked list stored in the SRAM. Also, when the force of any particle is updated, its nearest neighbour particles list is retrieved from the SDRAM into the FPGA. The FPGA uses the list to fetch the neighbour particles and update the force and other relevant properties of the particle under consideration. After all forces have been updated, the updated linked list must be reorganized, so it is sent back to the host, which in turn generates a new tree and a new linked list.

4.2 FPGA Processing Requirements

The following processing is done for every particle in the initial particle space, which has been reorganized into two linked list data structures: the clumps linked list and the particles linked list. Processing starts by generating a nearest neighbour clumps list for each clump node, and then uses these lists to generate a nearest neighbour particles list for each particle in the particle space. The FPGA therefore reads the clump nodes linked list stored initially in the SRAM, processes it, and finally writes a nearest neighbour clumps list for each clump to the SDRAM. If we let the current clump be i and the retrieved clump node be j, then we compute the following neighbourhood criterion:

$$ r_{ijc}^2 < \left( h_{ic} + h_{jc} \right)^2 \tag{3} $$

where $r_{ijc}$ is the distance between the two clump nodes, $h_{ic}$ is the smoothing length of clump i, and $h_{jc}$ is the smoothing length of clump j. If the neighbourhood criterion is not satisfied, a jump is made to the clump node specified by the sibling pointer, and the criterion is evaluated again. If the neighbourhood criterion is satisfied, the clump is a neighbour clump, and its pointer is added to the current clump's nearest neighbour clumps list, which is stored in the SDRAM. A jump is then made to the clump node specified by the sibling pointer, and the criterion is evaluated again.

The neighbours of a given particle are of two types: the particles of its own clump, which are considered neighbours of all particles in the clump, and particles in the neighbouring clumps specified in the nearest neighbour clumps list. The nearest neighbours list generation for each particle in the current clump starts by creating an empty nearest neighbour particles list for the current particle, and then appending to it pointers to all leaf particles of the clump to which this particle belongs. This is followed by processing the nearest neighbour clumps list of the clump to which this particle belongs: the neighbour clumps appearing in the list are opened and processed one at a time. If the retrieved particle node is j and the particle under consideration is i, then the following particle neighbourhood criterion is evaluated to examine particle j for a possible neighbourhood relationship with particle i:

$$ r_{ijp}^2 < \left( h_{ip} + h_{jp} \right)^2 \tag{4} $$

where $r_{ijp}$ is the distance between the two particles, $h_{ip}$ is the smoothing length of particle i, and $h_{jp}$ is the smoothing length of particle j. If the particle neighbourhood criterion is not satisfied, a jump is made to the next particle of the clump specified by the sibling pointer, and the criterion is evaluated for the new particle. If the particle neighbourhood criterion is satisfied, the particle is a neighbour particle, and its pointer is appended to the current particle's nearest neighbours list, which has been created in the SDRAM. A jump is then made to the next leaf particle specified by the sibling pointer, and the criterion is evaluated again.
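The two stages of the group approach can be sketched in Python as follows. This is a software illustration only, not the FPGA design: the function names, the dictionary layout of `particles` and `clumps`, and the choice to leave a particle out of its own neighbours list are assumptions made for the example. Both stages use a squared-distance test of the form (distance)² < (sum of smoothing lengths)².

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def group_neighbour_lists(particles, clumps):
    """Two-stage neighbour search (illustrative sketch).

    particles: dict id -> (position, smoothing_length)
    clumps:    dict id -> (centre, smoothing_length, [member particle ids])
    """
    # Stage 1: clump-level lists using the clump neighbourhood test.
    clump_nbrs = {
        i: [j for j, (cj, hj, _) in clumps.items()
            if j != i and sq_dist(ci, cj) < (hi + hj) ** 2]
        for i, (ci, hi, _) in clumps.items()
    }
    # Stage 2: particle-level lists.
    particle_nbrs = {}
    for c, (_, _, members) in clumps.items():
        for i in members:
            pos_i, h_i = particles[i]
            # Particles of the same clump are taken as neighbours directly
            # (assumption: the particle itself is left out of its own list).
            nbrs = [j for j in members if j != i]
            # Particles of neighbouring clumps are tested one at a time.
            for nc in clump_nbrs[c]:
                for j in clumps[nc][2]:
                    pos_j, h_j = particles[j]
                    if sq_dist(pos_i, pos_j) < (h_i + h_j) ** 2:
                        nbrs.append(j)
            particle_nbrs[i] = nbrs
    return clump_nbrs, particle_nbrs


# Hypothetical example data: three clumps of two particles each.
particles = {k: ((float(x), 0.0), 0.6)
             for k, x in enumerate([0, 1, 2, 3, 10, 11])}
clumps = {"A": ((0.5, 0.0), 1.5, [0, 1]),
          "B": ((2.5, 0.0), 1.5, [2, 3]),
          "C": ((10.5, 0.0), 1.5, [4, 5])}
clump_nbrs, particle_nbrs = group_neighbour_lists(particles, clumps)
```

In this example clumps A and B are mutual neighbours while C is isolated, so the particles of C are never compared against any particle outside their own clump.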

5 Conclusions and Ongoing Research

In this paper, we described two SPH neighbours list generation approaches, with reference to their memory, communication and processing requirements on a target FPGA implementation platform. We are now working on implementing the two approaches to evaluate their performance. We expect the group nearest neighbourhood approach to outperform the clump approximation approach in terms of execution speed.

Acknowledgements

The author would like to thank the Deutsche Forschungsgemeinschaft (DFG) for sponsoring this research during the summer of 2004. He would also like to thank the head and staff of the computer engineering department of Mannheim University, Germany, for their support during his research visit.

References

[1]. J. Barnes and P. Hut, "A Hierarchical O(N log N) force-calculation algorithm", Nature, vol. 324, 1986, pp. 446-449.
[2]. J. Monaghan, "Smooth Particle Hydrodynamics", Annu. Rev. Astron. Astrophys., vol. 30, 1992, pp. 543-574.
[3]. W. Benz, "Smooth Particle Hydrodynamics: A Review", in J. Buchler (ed.), The Numerical Modelling of Nonlinear Stellar Pulsations, Kluwer Academic Publishers, 1990, pp. 269-288.
[4]. J. Dubinski, "A parallel tree code", New Astronomy, 1, 1996, pp. 133-147.
[5]. C. Lia and G. Carraro, "A parallel tree-SPH code for galaxy formation", MNRAS, 1999.
[6]. L. Hernquist and N. Katz, "TREESPH: A Unification of SPH with the hierarchical tree method", Astrophysical Journal Supplement Series, 70, 1989, pp. 419-446.
[7]. P. Diniz and J. Park, "Data Search and Reorganization Using FPGAs: Application to Spatial Pointer-based Data Structures", Proc. of the 2003 IEEE Symp. on FPGAs for Custom Computing Machines (FCCM'03), Napa, CA, April 2003.
[8]. R. Anderson, "Tree Data Structures for N-Body Simulation", SIAM J. Comput., vol. 28, no. 6, pp. 1923-1940.
[9]. Gerhard Lienhard, PhD Thesis, Mannheim University, Germany.
[10]. A. Kugel, "RACE – 1 A PCI-64 based High Performance FPGA Co-Processor", http://www-li5.ti.uni-mannheim.de/fpga/race
[11]. M. Wetzstein, "WINE – A new code for astrophysical particle simulation", Diploma thesis, Heidelberg University, Germany, 2000.
[12]. J. Waltz, G. Page, S. Midler, J. Wallin, and A. Antunes, "A Performance Comparison of Tree Data Structures for N-Body Simulation", Journal of Computational Physics, 178, 2003, pp. 1-14.
[13]. J. Makino, "Comparison of Two Different Tree Algorithms", Journal of Computational Physics, 88, 1990, pp. 339-408.