An Introduction to Commercial Reconfigurable Processing Architectures*

Rui F. L. Marcelino¹ and João M. P. Cardoso²

¹ Universidade do Algarve, Escola Superior de Tecnologia, Campus da Penha, 8005-139, Faro, Portugal
² Universidade do Algarve, Faculdade de Ciências e Tecnologia, Campus de Gambelas, 8000-117, Faro, Portugal; INESC-ID, 1000-029, Lisboa, Portugal

Abstract. This paper presents an overview of a representative set of commercial devices implementing reconfigurable computing. It aims to provide a landscape of the different kinds of such devices, exposing their main characteristics and discussing their target applications and architectural details.

1 Introduction

Reconfigurable computing adds spatial computing to conventional computer architectures. High-end reconfigurable computing systems may provide higher performance than microprocessor- and DSP-based systems with similar power consumption and cost. Several academic and industrial efforts have demonstrated the capabilities of reconfigurable computing systems, and there have been many examples of reconfigurable architectures (see, for instance, [1]). Reconfigurable computing devices play an important role in satisfying the demanding performance requirements of today’s and future embedded systems. Configurable components can be found in a large number of systems (e.g., game devices such as the Sony PSP [2]).

Figure 1 places typical processing platforms on a Y-chart according to their configurability and performance. Configurability ranges from program instructions to logic functions and interconnections. Typical microprocessors are configurable only through program instructions. NoCs (Networks-on-Chip) include microprocessors and add configurability at the interconnection level. Coarse-grain arrays employ configurability at the datapath (ALU) and interconnection levels. FPGAs are the most flexible, since they employ configurability at the logic-function and interconnection levels. Performance may increase toward higher configurability levels, since more specialization is possible. Note, however, that the relative performance of the devices strongly depends on the target application.

Applications for reconfigurable computing architectures include communications, digital consumer products, multimedia, automotive electronics, etc. In communications, the emergence of different communication standards may justify the flexibility of reconfigurable devices. In addition, accelerating cryptography can also be an important issue. The parallel processing capabilities achievable by reconfigurable computing architectures are important for multimedia applications. Reconfigurable hardware offers nearly the processing performance of dedicated hardware plus the design flexibility of software, and therefore supports the development of products targeted at a wide range of customers.

[Figure 1: Y-chart plotting Performance against Configurability. Microprocessors: instructions (program); NoCs: instructions (program) + interconnect; Coarse-Grain Arrays: datapaths (ALUs + interconnections); FPGAs: logic functions + interconnections.]

Figure 1. Y-chart landscape of different devices with respect to Performance and Configurability.

* Any opinions are those of the authors and do not necessarily reflect those of their institutions.

Although some papers have attempted to provide an overview of the state of the art [1][3], to categorize the reconfigurable computing landscape [4][5], etc., none has focused specifically on commercial reconfigurable architectures. We think an overview of commercial reconfigurable architectures is important, since it helps to better understand the current landscape and the kinds of applications these devices target. This paper is organized as follows. The next section describes some of the most representative commercial reconfigurable devices. The last section presents some overall comments.

2 Reconfigurable Platforms

Advanced FPGA Platforms

FPGAs have become truly reconfigurable SoCs, with hardware/software reprogrammability. Large numbers of gates, several on-chip memories, embedded arithmetic units (e.g., multipliers), general-purpose processor cores, high-speed I/Os, etc., are now part of the most advanced FPGAs. These devices are now called “platform FPGAs”, since they can implement complete platforms. This new category of programmable devices presents huge opportunities and opens new application possibilities, but it also brings new challenges to the design and verification processes. In FPGA-based architectures, CPUs can be implemented as softcore or hardcore processors. CPUs may handle control tasks and tasks with low to medium processing demands. The reconfigurable logic allows massively parallel processing, since it can provide the resources needed to perform several operations in parallel. Specialization is another property enabled by reconfigurable logic, and it may help to fulfill application requirements that could not be met otherwise. The fine granularity of FPGAs is one of the features that makes them highly flexible, but it is also what may make them difficult to program and less suitable for mapping certain algorithms.

Configurable Processor Cores and System-on-Chip

A configurable processor core is a complete, fully functional processor design that can be tailored or expanded to meet the performance, power dissipation or energy consumption requirements of a single application or a set of applications [6][7][8][9]. These processors are delivered as synthesizable RTL (Register-Transfer Level) code, ready to be implemented in an FPGA or as an ASIC. Companies that have developed configurable and extensible processors include Tensilica [6][7][8], Improv Systems [9], ARC [10] and 3DSP [11]. These companies usually also provide add-on packages such as encoders/decoders (e.g., MP3, MPEG-2/4).
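As an illustration of how a core can be expanded, the Python sketch below models a hypothetical “SAD4” custom instruction (sum of absolute differences of four packed bytes) and compares it with an emulation built from simple base operations. The instruction name, its semantics and the way operations are counted are our own assumptions for illustration; they do not describe any vendor’s instruction set or toolchain.

```python
def bytes_of(word: int) -> list[int]:
    """Split a 32-bit word into four unsigned bytes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def sad4_base_isa(a: int, b: int) -> tuple[int, int]:
    """Emulate SAD4 with simple RISC-like steps; return (result, ops executed).

    The operation count is a rough model (assumption), not a real pipeline.
    """
    total, ops = 0, 0
    for i in range(4):
        x = (a >> (8 * i)) & 0xFF  # shift + mask
        y = (b >> (8 * i)) & 0xFF  # shift + mask
        d = x - y                  # subtract
        ops += 5
        ops += 1                   # compare / branch on sign
        if d < 0:
            d = -d                 # negate
            ops += 1
        total += d                 # accumulate
        ops += 1
    return total, ops


def sad4_custom(a: int, b: int) -> tuple[int, int]:
    """Same result, issued as one (hypothetical) custom instruction."""
    diff = sum(abs(x - y) for x, y in zip(bytes_of(a), bytes_of(b)))
    return diff, 1


if __name__ == "__main__":
    a, b = 0x01020304, 0x04030201
    print(sad4_base_isa(a, b))  # (8, 30)
    print(sad4_custom(a, b))    # (8, 1)
```

With these operands both versions return 8, but in this model the base-ISA emulation executes 30 simple operations where the extended core issues one instruction; real savings depend on the pipeline and on how a vendor’s tools implement the extension.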
The configurability of these platforms allows designers to add or remove features according to the application by selecting from standard configuration options, such as additional instructions, register file type and size, number of interrupts, reset state, bus widths, memory and I/O interfaces, etc. Exploiting these options requires software tools that analyze the source code of an application and help derive a suitable processor configuration and extensions. At the system level, configurable processors can be part of customizable SoCs. Two examples of such systems follow.

MeP: The Media Embedded Processor from Toshiba [21] is more than a processor, as it defines a hierarchical architecture in which several MeP Modules are connected through a proprietary dedicated bus. Each MeP Module may include one or more IP cores provided as softcores:
− Processor core (referred to as the MeP core): a 32-bit RISC-based processor with various configuration options, such as instructions, local memory, interrupts, bus interfaces, etc.

− Extension Unit IP: dedicated hardware implemented as a coprocessor. Examples of these extensions are floating-point and audio decoding units.
− Peripheral IPs: such as bus interfaces, DMA controllers, JTAG debug interface, etc.

SPEAr: ST Microelectronics [22] recently presented SPEAr, a customizable SoC. The device integrates an ARM core with a set of IP blocks and an embedded configurable logic block. The chip provides interfaces for different kinds of connectivity, such as Ethernet and USB, a memory controller interface, and common embedded peripherals such as UART, GPIO and I2C. According to the company, the customizable logic has about 400K equivalent gates. We were unable to collect any information about its dynamic reconfiguration features.

Microprocessors coupled with Reconfigurable Logic

Chameleon RCP: Chameleon Systems was one of the first companies to develop a product combining, in a single chip, a microprocessor and a reconfigurable architecture [12][13][14]. Figure 2 shows a block diagram of the Chameleon RCP architecture. The architecture uses an ARC microprocessor (RISC-based) and a reconfigurable processing fabric, among other components. The reconfigurable fabric is organized in slices, each with a number of tiles. Each tile contains a matrix of 32-bit datapath elements (multipliers operate on 16×24-bit operands), memories, and a control logic unit.

Figure 2. Chameleon RCP architecture (source: [13]).

Despite the advantages of the approach, Chameleon did not survive the lack of customers, and the company closed in 2004. Some commentators point to the difficulty of mapping applications to the architecture, possibly due to the early state of the design tools, as one of the reasons for the product’s limited success.

Morphotech M-rDSP: M-rDSP [15] is an IP-core architecture that includes a 32-bit RISC microprocessor (mRISC) and a Reconfigurable Cell (RC) array. The RC array may have from 8 to 64 cells. Each RC cell has an ALU, a MAC, logic units and specialized functional units. The company mainly targets wireless applications.
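Each M-rDSP RC cell includes a MAC unit, and FIR filtering is a typical wireless kernel. The following Python sketch is our own illustration, not Morphotech’s mapping: a transposed-form FIR filter in which each hypothetical cell holds one coefficient and one partial-sum register, and all cells perform one multiply-accumulate per input sample in parallel. A direct convolution is included to check the result.

```python
def fir_reference(coeffs: list[int], samples: list[int]) -> list[int]:
    """Direct-form FIR: y[n] = sum_k coeffs[k] * samples[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


def fir_mac_array(coeffs: list[int], samples: list[int]) -> list[int]:
    """Transposed-form FIR on a 1-D chain of MAC cells (hypothetical mapping).

    Cell k stores coeffs[k] and a partial-sum register. For every input
    sample x, all cells perform one multiply-accumulate in parallel:
        new_reg[k] = coeffs[k] * x + reg[k + 1]   (reg beyond the end is 0)
    Cell 0 produces the filter output.
    """
    n = len(coeffs)
    regs = [0] * n
    outputs = []
    for x in samples:
        # The right-hand side reads only the old registers (parallel update).
        regs = [coeffs[k] * x + (regs[k + 1] if k + 1 < n else 0) for k in range(n)]
        outputs.append(regs[0])
    return outputs


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 0, 0, 0, 2]
    print(fir_mac_array(h, x))  # [1, 2, 3, 0, 2]
    assert fir_mac_array(h, x) == fir_reference(h, x)
```

The transposed form is used here because each cell only needs the broadcast input sample and its neighbour’s register, which keeps all connections local.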

Morphotech has its roots in the MorphoSys architecture [16][17]. MorphoSys was developed at the University of California, Irvine (USA), within a sponsored research project [18]. The architecture combines a RISC processor with an 8×8 array of reconfigurable cells. The core processor is a 32-bit microprocessor based on the MIPS architecture, and each reconfigurable cell comprises an ALU-multiplier, a shift unit, two input multiplexers and a register file with four registers. In addition to typical RISC instructions, the processor has been augmented with special instructions, called MorphoSys instructions, for controlling the behavior of the reconfigurable array.

Atmel FPSLIC: Atmel developed FPSLIC [19], a dynamically reconfigurable SoC that allows multiple interfaces, peripherals and/or operators to share the same silicon at different times. The reconfigurable area is intended to implement multiple, interchangeable peripherals, computational operators and bus interfaces, including UART, SDIO, PCI, PCMCIA, HDLC and Ethernet. The FPSLIC II integrates an 8-bit AVR processor with 36 KB of program/data SRAM, a hardware multiplier, peripherals, and a dynamically reconfigurable FPGA with 256 to 2300 cells. The device also includes a configuration controller, two DMA controllers and a dedicated FPGA-to-AVR interface. The on-chip AVR and the configuration controller manage the reconfiguration process; for example, the configuration controller signals the AVR when it is time to reconfigure the FPGA.

DAPDNA-2: IPFlex [20] developed the DAPDNA-2. The device includes a Digital Application Processor (DAP), a 32-bit IPFlex-proprietary RISC processor, and a Distributed Network Architecture (DNA), a dynamically reconfigurable 2-D array with 376 processing elements. The RISC processor is mainly used to control the DNA array, but it can also be used for data processing.
The DAPDNA-2 also includes a variety of external interfaces, such as DDR-SRAM, direct I/O and PCI interfaces.

Reconfigurable Coarse-Grain Arrays

XPP: The XPP (eXtreme Processing Platform) [23][24] is a data-driven processing architecture based on an array of coarse-grain, adaptive processing elements and interconnection resources. The architecture was developed by PACT XPP Technologies, Inc., a German company. Figure 3 shows a block diagram of the architecture. The XPP core contains a 2-D array of two types of elements: ALU elements in the center and RAM elements on both sides of the array. Horizontal buses interconnect all these elements, and vertical interconnections are also present. The data bit width can be configured from 8 to 32 bits. The array is configured through a configuration manager, and it also includes a configuration cache that stores one or several configurations. The array can be partially reconfigured while neighboring computing elements are processing data. Both academic and industrial efforts to develop a SoC with a RISC microprocessor and an XPP array can be found in [25].

Figure 3. XPP architecture (source: [24]).
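XPP is described as a data-driven architecture: elements compute when their operands arrive rather than under a central instruction stream. The Python sketch below illustrates that general firing rule with two elements computing (a + b) * c on a stream of values. It is a generic dataflow model under our own assumptions (unbounded queues, no handshaking, no configuration manager), and the class and function names are hypothetical; it should not be read as PACT’s actual mechanism.

```python
from collections import deque
from typing import Callable


class Element:
    """Generic data-driven processing element (hypothetical model).

    It fires only when every input queue holds at least one token; firing
    consumes one token per input and pushes the result to every output.
    """

    def __init__(self, fn: Callable[..., int], inputs: list[deque], outputs: list[deque]):
        self.fn = fn
        self.inputs = inputs
        self.outputs = outputs

    def ready(self) -> bool:
        return all(len(q) > 0 for q in self.inputs)

    def fire(self) -> None:
        args = [q.popleft() for q in self.inputs]
        result = self.fn(*args)
        for q in self.outputs:
            q.append(result)


def run(elements: list[Element]) -> None:
    """Keep firing ready elements until no element can fire."""
    progress = True
    while progress:
        progress = False
        for e in elements:
            if e.ready():
                e.fire()
                progress = True


def demo() -> list[int]:
    """Stream (a + b) * c through two elements connected by a queue."""
    a, b, c = deque([1, 2, 3]), deque([10, 20, 30]), deque([2, 2, 2])
    ab, out = deque(), deque()
    add = Element(lambda x, y: x + y, [a, b], [ab])
    mul = Element(lambda x, y: x * y, [ab, c], [out])
    run([mul, add])  # scan order does not matter: firing depends only on data
    return list(out)


if __name__ == "__main__":
    print(demo())  # [22, 44, 66]
```

In this model, the order in which `run` scans the elements does not change the results, because each element waits only for its own operands.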

RAP: Reconfigurable Algorithm Processing (RAP) [26] is the name Elixent uses for its dynamically reconfigurable product. A technology called “D-Fabrix processing array” is the platform that realizes the RAP concept. RAP was developed by Elixent, a UK company, and originated in the CHESS architecture developed at HP Labs in Bristol, UK [27]. RAP provides an array of 4-bit ALUs and register/buffer blocks that may be cascaded to suit different data widths. Elixent argues that, since the arithmetic of common multimedia tasks is typically divisible into 4-bit units, the RAP granularity provides efficient arithmetic for the 8-24-bit data widths prevalent in multimedia signal processing.

PicoChip: PicoChip has developed the picoArray [28], a massively parallel system conceived as an alternative to ASICs for applications in the wireless communications domain. The picoArray consists of an array of 340 heterogeneous processors connected by an interconnect network (see Figure 4). The processors are organized in a two-dimensional grid. The interconnect network consists of switches and special buses named PicoBus [28], forming a network of 32-bit unidirectional buses and programmable bus switches. All processors in the picoArray are 16-bit, and there are four RISC processor variants that share a common core instruction set and use stream-based processing. Typically, data is passed at a high rate through a chain of processors, each performing relatively simple operations on the data before passing it to the next stage.

DRP: The Dynamically Reconfigurable Processor (DRP) [30] was developed by NEC Electronics [31]. It is a coarse-grain reconfigurable processor consisting of a two-dimensional array of Processing Elements (PEs), a State Transition Controller (STC), and dual-port distributed memory modules. The number of PEs can vary.
Each PE has an 8-bit ALU, an 8-bit DMU (Data Manipulation Unit, for shifts and masks), a register file with 16 8-bit registers, etc. These units are connected by programmable switches and wires, in a manner similar to common FPGAs. Each PE has a 16-deep instruction memory and supports multiple-context operation.
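The multiple-context operation of DRP can be pictured behaviorally: each PE stores up to 16 instructions, and the State Transition Controller decides which one is active. In the Python sketch below, a single PE stands in for the array, and the transition-table format, the condition signal and the chosen operations are our own assumptions for illustration; this is not NEC’s programming model.

```python
from typing import Callable

MASK8 = 0xFF  # DRP PEs have 8-bit ALUs; results wrap to 8 bits in this model


class PE:
    """Hypothetical PE with a 16-entry instruction (context) memory."""

    DEPTH = 16

    def __init__(self, contexts: list[Callable[[int, int], int]]):
        if len(contexts) > self.DEPTH:
            raise ValueError("a PE holds at most 16 instructions")
        self.contexts = contexts

    def execute(self, ctx: int, a: int, b: int) -> int:
        # Run the instruction selected by the current context; 8-bit result.
        return self.contexts[ctx](a, b) & MASK8


class STC:
    """Hypothetical state transition controller.

    The current state number selects the active context; the next state is
    looked up from (state, condition) in a transition table.
    """

    def __init__(self, table: dict[tuple[int, bool], int], start: int = 0):
        self.table = table
        self.state = start

    def advance(self, cond: bool) -> None:
        self.state = self.table[(self.state, cond)]


def run(pe: PE, stc: STC, operands: list[tuple[int, int]]) -> list[int]:
    results = []
    for a, b in operands:
        r = pe.execute(stc.state, a, b)
        stc.advance(r > 0x80)  # example condition fed back to the STC
        results.append(r)
    return results


if __name__ == "__main__":
    pe = PE([lambda a, b: a + b, lambda a, b: a - b])  # context 0: add, 1: sub
    table = {(0, False): 0, (0, True): 1, (1, False): 0, (1, True): 1}
    print(run(pe, STC(table), [(100, 50), (200, 10), (5, 3)]))  # [150, 190, 2]
```

In this model, switching contexts only changes an index into the on-chip instruction memory, so the PE changes its function from one step to the next without loading a new configuration.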

Figure 4. PicoArray architecture (source: [29]).

3 Overall Comments

Table I summarizes some of the most representative commercially available configurable platforms. They target a wide variety of applications, including network and wireless communications, multimedia and digital signal processing. They also cover a wide spectrum of architectural properties, differing in bit-widths, interconnection resources, processing-element operations, memory elements, etc. Each architecture has been designed in the belief that it executes certain kernels, tasks or applications efficiently. It is far from clear which architecture is most suitable for a given set of target applications. One of the reasons is the lack of comparative results (e.g., performance, energy savings) obtained with the same benchmarks.

The platforms presented are changing the way high-performance embedded applications are developed. Using reconfigurable hardware, we may obtain specialized, quasi-optimized processing, and, instead of requiring hardware for every function at once, a smaller reconfigurable device can be used to meet the performance requirements. Manufacturers typically need to furnish not only the hardware but also suitable software tools. A software-level abstraction is usually sought, since it would allow existing software programs to be migrated easily and would enable software programmers to use reconfigurable computing architectures. Software tools are therefore fundamental for compiling software programming languages to the target platforms. However, further effort on developing efficient compilers for these architectures is still needed. Most companies also provide libraries of IP cores that represent optimized implementations of relevant tasks on their architectures (e.g., MPEG decoders). This seems mandatory in order to show performance improvements over, e.g., DSP solutions, and to assist designers with specialized processing that is very hard to develop and requires full mastery of both the target architecture and the application algorithms.

Reconfigurable computing platforms are now part of the computing landscape. There is strong evidence to justify an increasing role for them in future embedded systems and in the opportunities that will certainly arise with nanotechnology.

Acknowledgments

This work has been partially supported by the project “Architecture and Compilation Exploration for a Dynamically Reconfigurable System-on-Chip (ACER)”, financed by CRUP/DAAD (Portugal/Germany bilateral cooperation).

References

[1] R. Hartenstein, “A Decade of Reconfigurable Computing: a Visionary Retrospective,” in Int’l Conf. on Design, Automation and Test in Europe (DATE’01), Munich, Germany, March 12-15, 2001, pp. 642-649.
[2] Yoshikazu Kurose, et al., “A 90nm embedded DRAM single chip LSI with a 3D graphics, H.264 codec engine, and a reconfigurable processor,” in HOT CHIPS 16, A Symposium on High Performance Chips, Memorial Auditorium, Stanford, CA, USA, August 22-24, 2004. http://www.hotchips.org/hc16/program/
[3] Francisco Barat, Rudy Lauwereins, and Geert Deconinck, “Reconfigurable Instruction Set Processors from a Hardware/Software Perspective,” in IEEE Transactions on Software Engineering, Vol. 28, No. 9, September 2002, pp. 847-862.
[4] M. Sima, et al., “Field-Programmable Custom Computing Machines - A Taxonomy,” in 12th Int’l Conference on Field-Programmable Logic and Applications (FPL’2002), Montpellier, France, Sept. 2002, Springer-Verlag, Lecture Notes in Computer Science (LNCS), Vol. 2438, pp. 79-88.
[5] Patrick Schaumont, et al., “A Quick Safari Through the Reconfiguration Jungle,” in 38th Design Automation Conference (DAC’2001), Las Vegas, USA, June 2001, pp. 172-177.
[6] Ricardo E. Gonzalez, “Xtensa: A configurable and extensible processor,” in IEEE Micro, March-April 2000.
[7] Steve Leibson, “Configurable Processors: What, Why, How?,” SoCcentral - ASIC, FPGA, EDA, and IP news and design information, June 2005. Available at: http://www.soccentral.com/
[8] Chris Rowen and Steve Leibson, “Flexible Architectures for Engineering Successful SOCs,” in 41st Design Automation Conference (DAC’04), 2004, pp. 692-697.
[9] Tom R. Halfhill, “Best Processor Cores of 2004,” in Microprocessor Report, January 2005.
[10] ARC Configurable Processor. http://www.arc.com/
[11] 3DSP. http://www.3dsp.com/
[12] X. Tang, M. Aalsma, and R. Jou, “A Compiler Directed Approach to Hiding Configuration Latency in Chameleon Processors,” in Int’l Conference on Field-Programmable Logic and Applications (FPL’2000), Apr. 2000.
[13] Bill Salefski and Levent Caglar, “Re-Configurable Computing in Wireless,” in 38th ACM/IEEE Design Automation Conference (DAC’2001), Las Vegas, Nevada, USA, 2001, pp. 178-183.
[14] Chameleon Systems Corp. http://www.chameleonsystems.com/
[15] Morphotech. http://www.morphotech.com/
[16] Guangming Lu, et al., “The MorphoSys Dynamically Reconfigurable System-On-Chip,” in Proceedings of the First NASA/DoD Workshop on Evolvable Hardware, Pasadena, CA, USA, 19-21 July 1999.
[17] Hartej Singh, et al., “Design and Implementation of the MorphoSys Reconfigurable Computing Processor,” Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, March 2000.
[18] MorphoSys, University of California, Irvine. http://www.eng.uci.edu/morphosys/
[19] FPSLIC (AVR with FPGA) from Atmel. http://www.atmel.com/
[20] IPFlex Inc., “DataSheet Dynamically Reconfigurable Processor DAPDNA-2,” November 2005.
[21] Shigeaki Takaki, et al., “Hardware/Software Partitioning Methodology for Systems on Chip (SoCs) with RISC Host and Configurable Microprocessors,” IP Based Design 2003, Nov. 13-14, 2003.
[22] ST Microelectronics, SPEAr - Structured Processor Enhanced Architecture Family. Available at: http://www.st.com/
[23] V. Baumgarte, et al., “PACT XPP - A Self-Reconfigurable Data Processing Architecture,” Journal of Supercomputing, Kluwer Academic Publishers, vol. 26, no. 2, September 2003, pp. 167-184.
[24] XPP Technologies, “XPP-IIb Core Overview: White Paper,” Version 1.0.0, September 9, 2005. Available at: http://www.pactcorp.com/
[25] Jürgen Becker and Martin Vorbach, “Architecture, Memory and Interface Technology Integration of an Industrial/Academic Configurable System-on-Chip (CSoC),” in IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2003), Tampa, FL, USA, 20-21 February 2003, IEEE Computer Society, pp. 107-112.
[26] D-Fabrix Reconfigurable Algorithm Processing (RAP) from Elixent. Available at: http://www.elixent.com/
[27] Alan Marshall, et al., “A Reconfigurable Arithmetic Array for Multimedia Applications,” in ACM/SIGDA 7th Int’l Symposium on Field Programmable Gate Arrays (FPGA’99), Monterey, CA, USA, Feb. 21-23, 1999, pp. 135-143.
[28] picoChip. http://www.picochip.com/
[29] Datasheet of picoChip PC102 - Wireless Communications Processors.
[30] N. Suzuki, et al., “Implementing and Evaluating Stream Applications on the Dynamically Reconfigurable Processor,” in 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’04), 2004.
[31] DRP - Dynamically Reconfigurable Processor. Available at: http://www.necel.com/

Table I. Representative configurable computing platforms and their main characteristics¹.

| Platform² | Product family | CPU core³ | Custom instructions⁴ | Word bits⁵ | Frequency (MHz)⁶ | Reconfigurable array granularity | URL |
|---|---|---|---|---|---|---|---|
| Advanced FPGA Platforms | Virtex-4 FX | PowerPC 405 | N | 32 | 450 | Fine | http://www.xilinx.com/ |
| Advanced FPGA Platforms | Spartan and Virtex series | MicroBlaze | N | 32 | 200 | Fine | http://www.xilinx.com/ |
| Advanced FPGA Platforms | Stratix | NIOS II | Y | 32 | 205 | Fine | http://www.altera.com/ |
| Advanced FPGA Platforms | ProASIC3 | ARM7 | Y | 32/8 | 350 | Fine | http://www.actel.com/ |
| Configurable Processors⁷ | Xtensa | Xtensa LX | Y | 32 | 400 | NA | http://www.tensilica.com/ |
| Configurable Processors⁷ | Arc | ARC700 | Y | 32 | 533 | NA | http://www.arc.com/ |
| Configurable Processors⁷ | Jazz DSP | JazzDSP | Y | 16, 32 | 100 | NA | http://www.improvsys.com/ |
| Configurable Processors⁷ | SuperSIMD SP-5 | proprietary | N | 32 | 320 | NA | http://www.3dsp.com/ |
| Configurable SoCs | MeP⁸ | proprietary | Y | 32 | 300 | NA | http://www.mepcore.com/ |
| Configurable SoCs | SPEAr | ARM | Y | 32 | 266 | Coarse | http://www.st.com/ |
| Configurable SoCs | QuickMIPS | MIPS | Y | 32 | 200 | Coarse | http://www.quicklogic.com/ |
| Microprocessors coupled with Reconfigurable Logic | Chameleon RCP (CS2112) | ARC | N | 32 | 125 | Coarse | http://www.chameleonsystems.com/ |
| Microprocessors coupled with Reconfigurable Logic | Morphotech M-rDSP | TinyRISC | Y | 32 | | Coarse | http://www.morphotech.com/ |
| Microprocessors coupled with Reconfigurable Logic | FPSLIC | AVR | N | 8 | 25 | Fine | http://www.atmel.com/ |
| Microprocessors coupled with Reconfigurable Logic | DAPDNA-2 | proprietary | N | 32 | 166 | Coarse | http://www.ipflex.com/ |
| Reconfigurable Coarse-Grain Arrays | RAP - Reconfigurable Algorithm Processing | NA | NA | 4 | | Medium | http://www.elixent.com/ |
| Reconfigurable Coarse-Grain Arrays | XPP - eXtreme Processing Platform | NA | NA | 8..32 | | Coarse | http://www.pactcorp.com/ |
| Reconfigurable Coarse-Grain Arrays | picoChip PC-102 | Array of CPUs | NA | 16 | 160 | Coarse | http://www.picochip.com/ |
| Reconfigurable Coarse-Grain Arrays | DRP - Dynamically Reconfigurable Processor (DRP1) | NA | NA | 8 | 133 | Medium | http://www.necel.com/en/techhighlights/drp/ |

¹ Most data in this table were collected from the corresponding URLs in October 2005.
² With respect to our previous classification.
³ Best-performance core.
⁴ CPU core customization or extensibility.
⁵ ALU word bit width.
⁶ Maximum frequency.
⁷ Usually, they can be included as hardcores or softcores when considering their implementation with ASICs or FPGAs, respectively.
⁸ ET1 combines MeP with the Elixent D-Fabrix processing array (RAP - Reconfigurable Algorithm Processing).