Parallel Implementation of Bi-cubic Interpolation Algorithm using MPI on Multi-core Systems
Lokendra Singh Umrao, Ravi Shankar Singh, Srijan Misra and S Sujana
Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi–221 005, India
Objectives
Speedup Computation
The main objectives of the paper are:
• Study parallel computing and the use of additional processing resources to accelerate computations.
• Show that a massively parallel computer can solve scientific problems faster than a single processor.
• Parallelize the Bi-cubic Interpolation algorithm using the MPI API.
• Show that substantial speedup is possible if sufficiently many processors are available.
Introduction
Multi-core chips are an important new trend in computer architecture. Several microprocessor manufacturers have entered the era of multi-core processors, in which multiple processors are added to the same chip instead of simply increasing the frequency of a single processor. The focus has now shifted towards putting multiple cores on the same chip, thereby allowing applications to run independently on different cores at the same time. Not only can different applications run in parallel, but a single application can also be threaded so that each thread gets its own core and they all run in parallel.
Basics
When a single application runs on an underlying multi-core processor, it can itself be threaded and thereby utilize all of the available computing power. An algorithm has to be studied carefully to identify the regions in which it can be parallelized, and various synchronization issues have to be taken care of while converting a sequential algorithm into a parallel one. Once an algorithm has been analysed for parallelization, it can be threaded so that the different tasks run in parallel.
Figure 1: Speedup performance based on Amdahl’s law
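As a reference for the speedup curves in Figure 1, the standard definitions (textbook formulas, not taken from the poster itself) are: the speedup on p processors is S_p = T_1 / T_p, and Amdahl's law bounds it by

S(N) \le \frac{1}{(1 - P) + P/N}

where T_1 is the serial run time, T_p the parallel run time on p processors, P the fraction of the program that can be parallelized, and N the number of processors. With P = 0.95 and N = 4, for example, the bound is about 3.48.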
Software Environment
• Operating System: Linux
• Development tools: gcc compiler with MPI
Methods
Conclusion
Automatic Parallelization: Many compilers provide a flag or option for automatic program parallelization. When this is selected, the compiler analyses the program, searching for independent sets of instructions, and in particular for loops whose iterations are independent of one another. It then uses this information to generate explicitly parallel code.
Message Passing Interface: MPI is an application programming interface for distributed-memory parallel programming that makes it easy to create processes and assign them independent tasks. With a few library calls one can spawn processes, divide the work among them, and synchronize them.
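The following is a minimal sketch, not the authors' actual code, of how bi-cubic interpolation can be parallelized with MPI in C: the source image is broadcast to every process, each process interpolates one block of destination rows, and the root gathers the stripes. The image size, the 2x scale factor, and the Keys cubic convolution kernel (a = -0.5) are assumptions made purely for illustration.

/*
 * Sketch: row-wise MPI parallelization of bi-cubic interpolation.
 * Assumptions (not from the poster): grayscale image stored as doubles,
 * 2x upscaling, Keys cubic convolution kernel with a = -0.5.
 */
#include <mpi.h>
#include <stdlib.h>
#include <math.h>

#define SRC_W 512
#define SRC_H 512
#define SCALE 2
#define DST_W (SRC_W * SCALE)
#define DST_H (SRC_H * SCALE)

/* Keys cubic convolution kernel with a = -0.5 */
static double cubic(double x) {
    const double a = -0.5;
    x = fabs(x);
    if (x <= 1.0) return (a + 2.0) * x * x * x - (a + 3.0) * x * x + 1.0;
    if (x <  2.0) return a * x * x * x - 5.0 * a * x * x + 8.0 * a * x - 4.0 * a;
    return 0.0;
}

/* Fetch a source pixel, clamping coordinates at the image borders. */
static double src_at(const double *src, int x, int y) {
    if (x < 0) x = 0; if (x >= SRC_W) x = SRC_W - 1;
    if (y < 0) y = 0; if (y >= SRC_H) y = SRC_H - 1;
    return src[y * SRC_W + x];
}

/* Interpolate one destination pixel from its 4x4 source neighbourhood. */
static double bicubic(const double *src, double sx, double sy) {
    int ix = (int)floor(sx), iy = (int)floor(sy);
    double sum = 0.0;
    for (int m = -1; m <= 2; m++)
        for (int n = -1; n <= 2; n++)
            sum += src_at(src, ix + n, iy + m) *
                   cubic(sx - (ix + n)) * cubic(sy - (iy + m));
    return sum;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every process holds the (small) source image; the root fills it. */
    double *src = malloc(SRC_W * SRC_H * sizeof(double));
    if (rank == 0)
        for (int i = 0; i < SRC_W * SRC_H; i++) src[i] = rand() % 256;
    MPI_Bcast(src, SRC_W * SRC_H, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Static block decomposition of destination rows among processes. */
    int rows = DST_H / size;            /* assumes DST_H divisible by size */
    int row0 = rank * rows;
    double *part = malloc(rows * DST_W * sizeof(double));

    for (int y = 0; y < rows; y++)
        for (int x = 0; x < DST_W; x++)
            part[y * DST_W + x] =
                bicubic(src, (double)x / SCALE, (double)(row0 + y) / SCALE);

    /* Collect the interpolated stripes on the root process. */
    double *dst = (rank == 0) ? malloc(DST_H * DST_W * sizeof(double)) : NULL;
    MPI_Gather(part, rows * DST_W, MPI_DOUBLE,
               dst,  rows * DST_W, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(part); free(src); if (rank == 0) free(dst);
    MPI_Finalize();
    return 0;
}

Each process works only on its own stripe of destination rows, so no communication is needed during the interpolation itself; the only collective steps are the initial broadcast of the source image and the final gather of the results.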
Increasing the number of processes beyond four does not lead to a proportional increase in speedup because the machine is quad-core, although hyper-threading still gives good performance up to eight processes. Hence the full computing power of these next-generation multi-core processors can be exploited, to the benefit of the end user, if algorithms are designed for distributed-memory parallel programming.
Future Work
• Further improve the parallel algorithm.
• Apply different approaches such as OpenMP or CUDA.
Platform
Model name: Intel Core i7-3770
Clock rate: 3.40 GHz
Architecture: i686
CPU op-mode(s): 32-bit, 64-bit
Byte order: Little Endian
CPU(s): 8
Socket(s): 1
Cores per socket: 4
Threads per core: 2
On-line CPU(s) list: 0-7
CPU family: 6
Model: 58
CPU MHz: 1600
L1 cache per core: 32 KB
L2 cache per core: 256 KB
L3 cache per socket: 8192 KB
Results
Figure 2 shows that, as the number of processes used for the computation is increased up to four, the real time for interpolating the image decreases continuously. The speedup is about 200% with two processes and nearly 350% with four.
Figure 2: Parallelization overhead for Bi-cubic Interpolation

Contact Information
• Web: http://www.iitbhu.ac.in/cse
• Email: [email protected]
• Phone: +91 9415772305