Sciknow Publications Ltd.

Internet Technologies and Applications Research

ITAR 2013, 1(2):11-17 DOI: 10.12966/itar.09.01.2013

©Attribution 3.0 Unported (CC BY 3.0)

An Overview on High Performance Issues of Parallel Architectures

Koushik Chatterjee 1,*, Sumit Joshi 2

1 Computer Science & Engineering, Pacific Academy of Higher Education and Research University, Udaipur, India
2 Computer Science & Engineering, Sir Padampat Singhania University, Udaipur, India

*Corresponding author (Email: [email protected])

Abstract – In recent years, as microprocessors have become cheaper and the technology for interconnecting them has improved, it has become both possible and practical to build general-purpose parallel computers containing a very large number of processors. Processors in a parallel computer need to communicate in order to solve a problem, so some kind of communication highway or interconnection network is needed; that is, the processors must be connected in some pattern. Performance in multiprocessor systems is highly dependent on the communication between processors and memory, I/O devices, and other processors, so choosing the right interconnection network is important for efficiency. The present work studies the issues behind high performance in parallel computing, considering parallel architecture at both the hardware and the software level.

Keywords – Parallel Architecture, Parallel Computing, Interconnection Network, Performance in Multiprocessor Systems

1. Introduction

High-performance computers are highly desired in areas such as structural analysis, weather forecasting, aerodynamic simulation and artificial intelligence. Achieving high performance depends not only on using faster and more reliable hardware devices but also on major improvements in computer architecture and processing techniques, such as parallelism. Parallelism can be applied at the hardware/software level, at the algorithmic level, at the programming level, and in the design of interconnection networks in multiprocessor systems. The programming level covers a number of programming models and trends; the models in common use include Shared Memory (without threads), Threads, Distributed Memory / Message Passing, Data Parallel, Hybrid, Single Program Multiple Data (SPMD) and Multiple Program Multiple Data (MPMD). This paper briefly surveys the elements and issues that are responsible for the performance of parallel architectures.

1.1. Concepts and Terminology

Concepts: Traditionally, software has been written for serial computation. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem. It is widely regarded as the future of computing, and multiprocessor architectures are classified by their instruction and data streams.

Terminology:
1) SISD – Single Instruction, Single Data.
2) SIMD – Single Instruction, Multiple Data. Each processing unit operates on a different data element.
3) MISD – Multiple Instruction, Single Data. Example: multiple cryptography algorithms attempting to crack a single coded message.
4) MIMD – Multiple Instruction, Multiple Data. The most common type of parallel computer.

Fig.1. SIMD – Single Instruction, Multiple Data Architecture
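To make the SIMD idea concrete, the short C sketch below (an illustration added here, not taken from the paper; the array size and function name are assumptions) shows the kind of loop in which one instruction stream is applied uniformly to many data elements, which is exactly what a SIMD unit or a vectorizing compiler exploits.

    /* simd_style.c - a data-parallel loop: same operation, different data elements */
    #include <stdio.h>

    #define N 1024

    /* Each iteration performs the same operation on a different element,
     * so a SIMD machine can process several elements per instruction. */
    static void vec_add(double *a, const double *b, const double *c, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }

    int main(void)
    {
        static double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }
        vec_add(a, b, c, N);
        printf("a[10] = %f\n", a[10]);
        return 0;
    }

A MIMD machine, by contrast, runs several such independent instruction streams at once, each on its own data.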


Fig.2. MIMD – Multiple Instruction, Multiple Data Architecture

1.2. General Terminology
1) Task – A logically discrete section of computational work.
2) Parallel Task – A task that can be executed safely by multiple processors.
3) Communications – Data exchange between parallel tasks.
4) Synchronization – The coordination of parallel tasks in real time.

1.3. More Terminology
1) Granularity – The ratio of computation to communication.
2) Coarse – High computation, low communication.
3) Fine – Low computation, high communication.
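To illustrate the granularity trade-off (an added sketch, not from the paper; the POSIX-threads setup, array size and function names are assumptions), the program below sums an array in two ways: a coarse-grained version in which each thread computes a large private partial sum and synchronizes once, and a fine-grained version in which every element update goes through a shared lock, so synchronization dominates the tiny amount of computation per step.

    /* granularity.c - coarse- vs fine-grained summation with POSIX threads
     * build: cc granularity.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double data[N];
    static double total;                                  /* shared result */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Coarse-grained: much computation per synchronization. */
    static void *sum_coarse(void *arg)
    {
        long id = (long)arg, chunk = N / NTHREADS;
        double local = 0.0;
        for (long i = id * chunk; i < (id + 1) * chunk; i++)
            local += data[i];                             /* pure computation */
        pthread_mutex_lock(&lock);
        total += local;                                   /* one lock per thread */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    /* Fine-grained: one synchronization per element. */
    static void *sum_fine(void *arg)
    {
        long id = (long)arg, chunk = N / NTHREADS;
        for (long i = id * chunk; i < (id + 1) * chunk; i++) {
            pthread_mutex_lock(&lock);
            total += data[i];                             /* locking dominates the work */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, sum_coarse, (void *)i);  /* or sum_fine */
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("total = %f\n", total);
        return 0;
    }

Both versions compute the same result, but the fine-grained one performs a million lock operations instead of four, which is the sense in which a low computation-to-communication ratio hurts performance.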

1.4. Parallel Overhead
1) Synchronization
2) Data communications
3) Overhead imposed by compilers, libraries, tools, operating systems, etc.

2. Parallel Computer Memory Architectures

2.1. Shared Memory Architecture
1) All processors access all memory as a single global address space.
2) Data sharing is fast.
3) Lack of scalability between memory and CPUs.

Fig.3. Shared Memory Architecture

Fig.4. Uniform Memory Access (UMA) – Architecture

2.2. Distributed Memory
1) Each processor has its own memory.
2) It is scalable, with no overhead for cache coherency.
3) The programmer is responsible for many details of communication between processors.

Fig.5. Distributed Memory Architecture

Fig.6. Non-Uniform Memory Access (NUMA) – Architecture

3. Parallel Programming Models

Parallel programming models exist as an abstraction above the hardware and memory architectures. Examples:
1) Shared Memory
2) Threads
3) Message Passing
4) Data Parallel

Koushik Chatterjee & Sumit Joshi: An Overview on High Performance Issues of Parallel Architectures

13

3.1. Shared Memory Model
The program appears to the user as a single shared memory, regardless of the underlying hardware implementation. Locks and semaphores may be used to control access to shared memory. Program development can be simplified, since there is no need to explicitly specify communication between tasks.
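As a minimal sketch of how a lock or semaphore controls access to shared memory (added here for illustration, not taken from the paper; the counter variable and thread count are assumptions), the program below uses a POSIX semaphore as a lock so that concurrent increments of a shared variable do not interfere:

    /* shared_counter.c - a semaphore guarding a variable in shared memory
     * build: cc shared_counter.c -lpthread */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static long counter;        /* lives in the single global address space */
    static sem_t lock;          /* binary semaphore used as a lock */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            sem_wait(&lock);    /* enter critical section */
            counter++;          /* safe: only one thread updates at a time */
            sem_post(&lock);    /* leave critical section */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        sem_init(&lock, 0, 1);  /* initial value 1, so it behaves as a mutex */
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* expected: 400000 */
        return 0;
    }

Note that no messages are sent anywhere: the tasks communicate simply by reading and writing the same memory, which is the essence of this model.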

3.2. Threads Model
1) A single process may have multiple, concurrent execution paths.
2) Typically used with a shared memory architecture.
3) The programmer is responsible for determining all parallelism.
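The sketch below (illustrative only; the thread count, function name and per-thread work are assumptions) shows the basic pattern these three points describe: one process creates several concurrent execution paths with pthread_create, every path shares the process's memory, and the programmer explicitly decides what runs in parallel and where the paths join:

    /* threads_model.c - one process, several concurrent execution paths
     * build: cc threads_model.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static double results[N];   /* shared by all threads of the process */

    static void *path(void *arg)
    {
        long i = (long)arg;
        results[i] = (double)(i * i);   /* disjoint slices, so no lock is needed */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[N];
        /* The programmer chooses what to run in parallel ... */
        for (long i = 0; i < N; i++) pthread_create(&t[i], NULL, path, (void *)i);
        /* ... and where the concurrent paths join back together. */
        for (long i = 0; i < N; i++) pthread_join(t[i], NULL);
        for (long i = 0; i < N; i++) printf("%.0f ", results[i]);
        printf("\n");
        return 0;
    }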

3.3. Message Passing Model
1) Tasks exchange data by sending and receiving messages.
2) Typically used with distributed memory architectures.
3) Data transfer requires cooperative operations to be performed by each process; for example, a send operation must have a matching receive operation.

MPI (Message Passing Interface) is the interface standard for message passing.
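As a minimal illustration of the cooperative send/receive pairing described above (a sketch, not the paper's code; the message content and tag are assumptions), the MPI program below has rank 0 send a value that rank 1 explicitly receives:

    /* mpi_sendrecv.c - a matching MPI_Send / MPI_Recv pair
     * run with, e.g.: mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double value = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 3.14;
            /* The transfer only happens because rank 1 posts the matching receive. */
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }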

Fig.7. Data Parallel Model

Parallelizing a Program
Given a sequential program or algorithm, how does one go about producing a parallel version? There are four steps in program parallelization:
1) Decomposition – identifying parallel tasks with a large extent of possible parallel activity.
2) Assignment – grouping the tasks into processes with the best load balancing.
3) Orchestration – reducing synchronization and communication costs.
4) Mapping – mapping of processes to processors.

MIPS Floating Point Code Example:

double A[1024], B[1024];
for (i = 0; i < 1024; i++) ...
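The body of this loop does not survive in the text; a typical completion of such a textbook example (the element-wise addition below is an assumption, not the paper's original code) would be:

    /* Assumed completion: element-wise double-precision addition.
     * Each iteration loads A[i] and B[i], adds them and stores the result,
     * which maps directly onto MIPS floating-point loads, add.d and stores. */
    double A[1024], B[1024];
    int i;
    for (i = 0; i < 1024; i++)
        A[i] = A[i] + B[i];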

4. Models of Parallel Computation

Most models of computation represent the computer as a general-purpose, deterministic, random-access machine (a von Neumann machine). Algorithms that can be executed by von Neumann-type machines are called sequential algorithms (sometimes also called serial algorithms). We now examine models of computation that present a very different machine, one in which several instructions can be executed simultaneously. Generally referred to as parallel machines or parallel computers, these are computers that have more than one processor operating in parallel. Over the years, many different models of parallel computation have been developed. As with sequential machines, parallel machines are best suited to certain classes of problems, and to take advantage of a parallel architecture, algorithms must be developed specifically for it. Several parallel models will be discussed in what follows, along with their relative merits and weaknesses.
