An FPGA Based SIMD Processor With A Vector Memory Unit

Junho Cho, Hoseok Chang and Wonyong Sung
School of Electrical Engineering, Seoul National University, Kwanak-gu, Seoul 151-744, Korea
E-mail: {juno, chs, wysung}@dsp.snu.ac.kr

Abstract—A SIMD processor that contains a 16-way partitioned data-path is designed for efficient multimedia data processing. In order to automatically align the data needed for SIMD processing, the architecture adopts a vector memory unit that consists of 17 memory banks. The vector memory unit also contains address generation and rearrangement units for eliminating bank conflicts. MicroBlaze, an FPGA-based RISC soft processor, is used for program control and scalar data processing. The architecture has been implemented on a Xilinx FPGA, and its performance for several multimedia kernels is reported.
I. INTRODUCTION

The SIMD architecture with a very long partitioned data-path is found in various CPUs nowadays. The architecture is very efficient for multimedia data processing because it can process multiple samples or pixels with one vector instruction. However, it frequently suffers from the overhead of aligning data. If the required data are not stored in order, reordering operations such as pack, unpack, rotate and shuffle are needed, which obviously reduce the performance gain. As the functional unit becomes wider, this alignment overhead becomes more critical [1]. In order to overcome the alignment problem, vector memory structures that use multiple banks of memory connected in an interleaved fashion have been developed [2][3]. However, the conventional multi-bank memory architecture is not free from the bank-conflict problem. The SIMD architecture presented here is equipped with a vector memory unit designed to eliminate most of the bank conflicts.
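To make the bank-conflict issue concrete, the short C sketch below (illustrative only, not code from the paper) counts how many of the 16 elements of a strided vector access fall into the same memory bank. With 16 banks, any stride that shares a factor with 16 piles several elements onto one bank; with 17 banks, a prime number, any stride that is not a multiple of 17 spreads the 16 elements over 16 distinct banks.

    /* Illustration of the bank-conflict problem (not from the paper):
     * an element at address a is stored in bank (a % NUM_BANKS).
     * With 16 banks, a stride such as 4 maps several of the 16 accessed
     * elements to the same bank; with 17 banks (prime), any stride that
     * is not a multiple of 17 hits 16 distinct banks.
     */
    #include <stdio.h>

    static int max_bank_load(int num_banks, int start, int stride)
    {
        int load[32] = {0};
        int max = 0;
        for (int k = 0; k < 16; k++) {          /* one 16-way SIMD access */
            int bank = (start + k * stride) % num_banks;
            if (++load[bank] > max)
                max = load[bank];
        }
        return max;                              /* 1 means conflict-free */
    }

    int main(void)
    {
        printf("16 banks, stride 4: worst bank load = %d\n",
               max_bank_load(16, 0, 4));         /* prints 4: conflicts     */
        printf("17 banks, stride 4: worst bank load = %d\n",
               max_bank_load(17, 0, 4));         /* prints 1: conflict-free */
        return 0;
    }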
In this paper, the architecture, the application development methods, and the achieved performance improvement are explained. The rest of this paper is organized as follows. Section II presents the architecture of the developed system, and Section III explains the supported instruction set. The vectorizing procedure and the performance improvement are shown in Section IV. Finally, concluding remarks are made in Section V.
II. ARCHITECTURE

The system consists of a host processor, a 16-way partitioned data-path SIMD coprocessor, a 17-bank vector memory unit for parallel data memory accesses, and an instruction memory, as illustrated in Fig. 1. The host processor takes the role of scalar operations and program control, while the coprocessor is in charge of vector operations. When an instruction is fetched, the instruction distributor classifies it as either a scalar or a vector type and supplies it to the corresponding unit. If the fetched instruction is a scalar type, it is transmitted directly to the host processor, while a no-operation (NOP) instruction is supplied to the SIMD coprocessor, which stays idle in that cycle. On the contrary, if the instruction is a vector type, it is sent to the coprocessor. The programming model is thus similar to that of conventional vector processors: the scalar portion of the program is executed by the host processor, while the array-related operations in a loop are done by the coprocessor.
Figure 1. Architecture of the overall system
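The instruction-distribution behavior described above can be modelled in a few lines of C. The encoding below is purely hypothetical (the paper does not specify how scalar and vector instructions are distinguished, and the MicroBlaze/FSL interface is not modelled); the sketch only mirrors the described behavior: scalar instructions go to the host, vector instructions go to the coprocessor, and the other unit receives a NOP for that cycle.

    #include <stdint.h>

    #define NOP 0x00000000u   /* hypothetical NOP encoding */

    /* Hypothetical classifier: assume bit 31 marks vector instructions.
     * This is only a behavioral model of the instruction distributor
     * described in the text, not the actual instruction format.
     */
    static int is_vector(uint32_t insn) { return (insn >> 31) & 1u; }

    void distribute(uint32_t insn, uint32_t *to_host, uint32_t *to_simd)
    {
        if (is_vector(insn)) {
            *to_host = NOP;      /* host idles this cycle            */
            *to_simd = insn;     /* 16-way SIMD coprocessor executes */
        } else {
            *to_host = insn;     /* MicroBlaze host executes         */
            *to_simd = NOP;      /* coprocessor idles                */
        }
    }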
[Figure: functional units built from Xilinx DSP48 blocks (DSP48_1 - DSP48_4); OPMODE-controlled add/subtract (±) and multiply (×) paths implement the MAC1/MAC2 (ABD1/ABD2) and ADD/SUB/ACC/MUL operations]

TABLE I. ADDRESS / DATA GENERATION AND REARRANGEMENT

Unit   Function
AGU    Addr for (ALU#K) = (start address + K*stride) / 17
ARU    Bank for (ALU#K) = (start address + K*stride) % 17 = R
DRU    ALU for (Bank#R) = K
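Reading Table I: element K of a strided vector access resides in bank (start address + K*stride) mod 17, at word address (start address + K*stride) div 17 inside that bank, and the data rearrangement unit (DRU) routes the word coming out of bank R back to ALU K. The following is a minimal C model of this mapping, assuming exactly the formulas of Table I (the function and variable names are ours, not from the paper):

    #include <stdint.h>

    #define NUM_ALUS  16
    #define NUM_BANKS 17

    /* Software model of the vector memory unit addressing (Table I).
     * AGU: local address inside the bank = (start + K*stride) / 17
     * ARU: bank holding element K        = (start + K*stride) % 17
     * DRU: inverse map, bank R -> ALU K, used to route read data
     *      back into lane order for the 16 ALUs.
     */
    void vector_access_map(uint32_t start, uint32_t stride,
                           uint32_t addr[NUM_ALUS],      /* AGU output  */
                           uint32_t bank[NUM_ALUS],      /* ARU output  */
                           int alu_for_bank[NUM_BANKS])  /* DRU routing */
    {
        for (int r = 0; r < NUM_BANKS; r++)
            alu_for_bank[r] = -1;           /* bank unused in this access */

        for (int k = 0; k < NUM_ALUS; k++) {
            uint32_t a = start + (uint32_t)k * stride;
            addr[k] = a / NUM_BANKS;        /* AGU */
            bank[k] = a % NUM_BANKS;        /* ARU */
            alu_for_bank[bank[k]] = k;      /* DRU: bank R feeds ALU K */
        }
    }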