Sciknow Publications Ltd.
Internet Technologies and Applications Research
ITAR 2013, 1(2):11-17 DOI: 10.12966/itar.09.01.2013
©Attribution 3.0 Unported (CC BY 3.0)
An Overview on High Performance Issues of Parallel Architectures
Koushik Chatterjee1,*, Sumit Joshi2
1 Computer Science & Engineering, Pacific Academy of Higher Education and Research University, Udaipur, India
2 Computer Science & Engineering, Sir Padampat Singhania University, Udaipur, India
*Corresponding author (Email: [email protected])
Abstract – In recent years, as microprocessors have become cheaper and the technology for interconnecting them has improved, it has become both possible and practical to build general-purpose parallel computers containing a very large number of processors. Processors in a parallel computer need to communicate in order to solve a problem, so some kind of communication highway or interconnection network is required, that is, a pattern in which the processors are connected. Performance in multiprocessor systems depends heavily on the communication between processors and memory, I/O devices, and other processors, so choosing the right interconnection network is important for efficiency. The present work studies the issues that determine high performance in parallel computing, at both the hardware and the software level of a parallel architecture.

Keywords – Parallel Architecture, Parallel Computing, Interconnection Network, Performance in Multiprocessor Systems
1. Introduction
High-performance computers are highly desired in areas such as structural analysis, weather forecasting, aerodynamic simulation, and artificial intelligence. Achieving high performance depends not only on using faster and more reliable hardware devices but also on major improvements in computer architecture and processing techniques, such as parallelism. Parallelism can be applied at the hardware or software level, at the algorithmic level, at the programming level, and in the design of interconnection networks in multiprocessor systems. The programming level of parallelization covers many programming models and trends. Several parallel programming models are in common use:
1) Shared Memory (without threads)
2) Threads
3) Distributed Memory / Message Passing
4) Data Parallel
5) Hybrid
6) Single Program Multiple Data (SPMD)
7) Multiple Program Multiple Data (MPMD)
This paper gives a brief study of the elements and issues that are responsible for the performance of a parallel architecture.

1.1. Concepts and Terminology
Concepts: Traditionally, software has been written for serial computation. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. It is the future of computing, and multiprocessor architectures are distinguished by how they handle instructions and data.
Terminology:
1) SISD – Single Instruction, Single Data
2) SIMD – Single Instruction, Multiple Data. Each processing unit operates on a different data element.
3) MISD – Multiple Instruction, Single Data. Example: multiple cryptography algorithms attempting to crack a single coded message.
4) MIMD – Multiple Instruction, Multiple Data. The most common type of parallel computer.
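To make the SISD/SIMD distinction concrete, the sketch below (illustrative only, not from the paper; it assumes single-precision arrays and an x86 processor with SSE support, using the xmmintrin.h intrinsics) contrasts a scalar loop, where each instruction operates on one data element, with a vector loop, where one instruction operates on four elements at once.

    #include <stddef.h>
    #include <xmmintrin.h>                                /* x86 SSE intrinsics */

    /* SISD style: each add instruction works on a single pair of elements. */
    void add_scalar(float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = a[i] + b[i];
    }

    /* SIMD style: one vector add instruction works on four elements at once. */
    void add_simd(float *a, const float *b, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);              /* load 4 floats from a    */
            __m128 vb = _mm_loadu_ps(b + i);              /* load 4 floats from b    */
            _mm_storeu_ps(a + i, _mm_add_ps(va, vb));     /* add and store 4 results */
        }
        for (; i < n; i++)                                /* leftover elements, done serially */
            a[i] = a[i] + b[i];
    }

Both functions compute the same result; the second simply issues fewer, wider instructions, which is the SIMD idea illustrated in Fig.1.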
Fig.1. SIMD – Single Instruction, Multiple Data Architecture
Fig.2. MIMD – Multiple Instruction, Multiple Data Architecture

1.2. General Terminology
1) Task – A logically discrete section of computational work
2) Parallel Task – A task that can be executed by multiple processors safely
3) Communications – Data exchange between parallel tasks
4) Synchronization – The coordination of parallel tasks in real time

1.3. More Terminology
1) Granularity – The ratio of computation to communication
2) Coarse – High computation, low communication
3) Fine – Low computation, high communication

1.4. Parallel Overhead
1) Synchronizations
2) Data communications
3) Overhead imposed by compilers, libraries, tools, operating systems, etc.

2. Parallel Computer Memory Architectures

2.1. Shared Memory Architecture
1) All processors access all memory as a single global address space.
2) Data sharing is fast.
3) Lack of scalability between memory and CPUs.

Fig.3. Shared Memory Architecture

Fig.4. Uniform Memory Access (UMA) – Architecture

2.2. Distributed Memory
1) Each processor has its own memory.
2) It is scalable, with no overhead for cache coherency.
3) The programmer is responsible for many details of communication between processors.

Fig.5. Distributed Memory Architecture

Fig.6. Non-Uniform Memory Access (NUMA) – Architecture

3. Parallel Programming Models

Parallel programming models exist as an abstraction above hardware and memory architectures. Examples:
1) Shared Memory
2) Threads
3) Message Passing
4) Data Parallel
3.1. Shared Memory Model
The program appears to the user as a single shared memory, regardless of the hardware implementation. Locks and semaphores may be used to control access to shared memory. Program development can be simplified since there is no need to explicitly specify communication between tasks. (A small threads-and-locks sketch is given at the end of this section.)

3.2. Threads Model
1) A single process may have multiple, concurrent execution paths.
2) Typically used with a shared memory architecture.
3) The programmer is responsible for determining all parallelism.

3.3. Message Passing Model
1) Tasks exchange data by sending and receiving messages.
2) Typically used with distributed memory architectures.
3) Data transfer requires cooperative operations to be performed by each process. Example: a send operation must have a matching receive operation. (A small message passing sketch is given at the end of this section.)
MPI (Message Passing Interface) is the standard interface for message passing.

Fig.7. Data Parallel Model

Parallelizing a Program: Given a sequential program or algorithm, how does one go about producing a parallel version? There are four steps in program parallelization (the first sketch below illustrates the idea):
1) Decomposition – Identifying parallel tasks with a large extent of possible parallel activity
2) Assignment – Grouping the tasks into processes with the best load balancing
3) Orchestration – Reducing synchronization and communication costs
4) Mapping – Mapping of processes to processors
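As a small, hedged illustration of decomposition, assignment, and mapping (the array size N and the process count NPROCS below are arbitrary assumptions, not values from the paper), the iterations of a loop over N elements can be decomposed into contiguous blocks, one block assigned to each process, and each process mapped to its own index range:

    #include <stdio.h>

    #define N 1024                 /* total amount of work (loop iterations)    */
    #define NPROCS 4               /* assumed number of processes/processors    */

    int main(void)
    {
        int chunk = (N + NPROCS - 1) / NPROCS;        /* block size, rounded up */
        for (int p = 0; p < NPROCS; p++) {
            int lo = p * chunk;                       /* first iteration of this block */
            int hi = (lo + chunk < N) ? lo + chunk : N;
            /* In an SPMD program, process p would execute only its own
               [lo, hi) range of the original loop (the mapping step).   */
            printf("process %d handles iterations %d..%d\n", p, lo, hi - 1);
        }
        return 0;
    }

Orchestration is then concerned with keeping the synchronization and communication between these blocks as cheap as possible.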
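For the shared memory and threads models of Sections 3.1 and 3.2, the following sketch (illustrative only; the counter variable and the thread count are arbitrary) creates several POSIX threads inside one process, lets them all update a variable in the shared address space, and uses a mutex lock to control access to it:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long counter = 0;                                    /* shared data        */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;    /* lock protecting it */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);          /* serialize access to the shared counter */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)      /* multiple concurrent execution paths */
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);     /* 400000 when the lock is used correctly */
        return 0;
    }

No explicit communication is specified: the threads simply read and write the same memory, which is what simplifies program development in this model. (Compile with -pthread.)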
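For the message passing model of Section 3.3, a minimal MPI sketch (illustrative; it assumes the program is run with at least two processes, e.g. mpirun -np 2) in which a send posted by one process is paired with a matching receive on another:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?                   */

        if (rank == 0) {
            value = 42;                         /* data living in process 0's own memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* send to rank 1   */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);        /* cooperative, matching receive         */
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Here the data transfer happens only because both sides cooperate, which is the defining property of the model.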
4. Models of Parallel Computation

Most models of computation represent the computer as a general-purpose, deterministic, random access machine (a von Neumann machine). Algorithms that can be executed by von Neumann type machines are called sequential algorithms (sometimes also called serial algorithms). We are about to examine models of computation that present a much different machine, one in which several instructions can be executed simultaneously. Generally referred to as parallel machines or parallel computers, these are computers that have more than one processor operating in parallel. Over the years, many different models of parallel computation have been developed. As with sequential machines, parallel machines are best suited to certain classes of problems, and to take advantage of a parallel architecture, algorithms must be developed specifically for it. Several parallel models will be discussed along with their relative merits and weaknesses.

MIPS Floating Point Code Example:

    double A[1024], B[1024];
    for (i = 0; i < 1024; i++) ...
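The body of the loop is cut off in this copy. Assuming the intended example is the usual element-wise floating point update A[i] = A[i] + B[i], the sketch below (an assumption for illustration, not the paper's code) shows the serial loop next to a thread-parallel version using an OpenMP pragma, in which the 1024 iterations are divided among the available processors:

    /* Serial (von Neumann) version: one processor walks the whole array. */
    void add_serial(double *A, const double *B, int n)
    {
        for (int i = 0; i < n; i++)
            A[i] = A[i] + B[i];          /* assumed loop body */
    }

    /* Parallel version: iterations are split among threads, each thread
       operating on a different block of elements (compile with -fopenmp). */
    void add_parallel(double *A, const double *B, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            A[i] = A[i] + B[i];
    }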