Classification of Application Development for FPGA-Based Systems
Ivan Gonzalez, Esam El-Araby, Proshanta Saha, Tarek El-Ghazawi, Harald Simmler NSF Center for High-Performance Reconfigurable Computing (CHREC), The George Washington University {ivangm, esam, sahap, tarek, simmler}@gwu.edu
Saumil G. Merchant, Brian M. Holland, Casey Reardon, Alan D. George, Herman Lam, Greg Stitt NSF Center for High-Performance Reconfigurable Computing (CHREC), University of Florida {merchant, holland, reardon, george, lam, stitt}@chrec.org
Nahid Alam, Melissa C. Smith Clemson University {nalam, smithmc}@clemson.edu
Abstract—Field-Programmable Gate Arrays (FPGAs) have been used to accelerate DoD-related applications with promising performance. However, current development tools require significant hardware knowledge and are not amenable to the increasing complexity of FPGA-based systems. The application requirements are expected to change dramatically for future use cases, and require a well-defined development methodology. This paper presents the results obtained after conducting an extensive survey and study of current FPGA tools. A classification for DoD use cases and FPGA tools is provided. This classification provides the current status of the available tools and identifies current tool limitations for DoD use cases.
I. INTRODUCTION
Reconfigurable computing with high-speed processing on FPGA technologies has rapidly become a computational paradigm of prime interest throughout the DoD community. These interests range from mission-critical space systems to high-end supercomputing, where inherent advantages in performance, power, cost, size, and versatility are highly attractive as compared to traditional microprocessor and ASIC technologies. However, many technical challenges must be overcome before the potential of FPGA-based systems can be fully harnessed. Development of applications on FPGAs is today a highly laborious and specialized activity performed by electrical and computer engineers using low-level languages and vendor-specific tool flows and device interfaces. Attaining close to peak efficiency on a single FPGA device with existing tools requires a potent set of design and analysis skills, while designs spanning multiple devices are beyond the capabilities of most. This paper presents the results obtained after conducting an extensive survey and study of current FPGA tools and use cases.
This material is based on research sponsored by DARPA under agreement number FA8650-07-1-7742. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.
II. DOD USE CASE SCENARIOS
As a starting point, a survey of literature and DoD-related contacts was conducted to identify potential DoD use cases. Over 60 members of the DoD, industry, and academia were invited to take part in a survey and provide feedback. Survey results show that there is a broad range of applications, mission scenarios, and needs in the DoD-related community. Based on the survey findings, we concluded that the classical FPGA design flow can be divided into four major phases: (1) Formulation, (2) Design, (3) Translation, and (4) Execution (FDTE), as shown in Figure 1.
In the Formulation phase, concepts and tools are used to enable the designer to explore and prune the design tree by studying at a high level a variety of algorithmic and architectural options and mappings and predicting the behavior of each viable combination. In this phase, no software or hardware is formally designed, but instead a variety of mathematical, simulative, anecdotal, and even ad-hoc analyses are used to make decisions. Among the important pre-design issues here are parallelization and mapping strategies, which define and bound performance, and numerical analyses, since numerical format and resource utilization on the FPGA are inextricably linked.
In the Design phase, concepts and tools are used to craft the actual design for the hardware and software components in the system. Depending upon the class of use cases, the designer is primarily interested in some particular mix of goals in performance, power efficiency, size, weight, cost, etc. Although design for FPGAs is inherently oriented towards hardware design, there are often many control-flow components and interfaces where software design is also involved, be it with on-chip resources (i.e. hard- or soft-core processors) or off-chip microprocessors, and the design balance between hardware and software is often critical in attaining design goals. Tools in this phase tend to be either linguistic (e.g. an HDL or HLL) or graphical (e.g. CAD schematic) in nature, depending upon the class of use and preference of the designers. Many challenging issues are found in the Design phase, such as the nature, semantics, and syntax of the languages, environments, and tools used to provide the designer with the expressiveness necessary to render an effective design. Other increasingly important issues include design portability between FPGA devices and systems, a major weakness in most existing applications, as well as interoperability among languages, environments, and tools. A relatively new issue here is design scalability, where an integrated suite of tools is needed to take a designer from device-level design to system-level design for cases involving many devices.
The Translation phase is comprised of the concepts and tools required to convert higher-level design structures, be it source code or CAD schematic diagrams, into an executable form. Hardware components go through a complex series of compilation steps in moving from HLL or HDL codes to bitstream configurations for the FPGA device. Software components do as well, as they are translated into machine language and linked with object modules from libraries. Core libraries for hardware are comparatively still in their infancy yet vital to exploit hardware design reuse. Issues in this phase are increasingly important for bridging the semantic gap from high-level abstraction, where application design is most productive, to the device level, where technology mapping occurs with synthesis, place, and route, and doing so for large devices with reasonable compilation times.
Once an application has been formulated, designed, and translated, new issues arise in the Execution phase. Whereas unit and integration testing and debug are relatively well understood for software and supported by a broad range of tools, such steps are more challenging for systems involving new hardware and software components. FPGA debug tools are currently limited to on-chip logic analysis mechanisms for intra-device interactions, with little or no support at the system level. Beyond debug, after an application has been verified to operate correctly, the next step is performance analysis and optimization. For some use cases, such as a deterministic SoC design, this step may be trivial. However, for complex systems comprised of multiple FPGAs and other processing devices (e.g. microprocessors), and concomitant interconnect, memory, and storage resources, performance analysis, while challenging, is vital in achieving required behavior. In support of these same goals, a variety of run-time system services may be associated with the application, such as schedulers for load balancing, heartbeat and recovery services for fault tolerance, etc.
Figure 1. Application development phases
Based on information gathered through the surveys, several FPGA mission scenarios were identified, ranging from data mining on high-performance computing systems to space-based high-performance embedded computing applications. Significant interest in multi-FPGA computing systems among the DoD community was evident from the surveys. An important conclusion drawn from the user tool flows identified was the existence of a common baseline tool chain to target either single- or multi-FPGA systems irrespective of the specific mission scenarios. The baseline flows are augmented by additional tools and features needed for mission-specific requirements. Several key FPGA mission requirements were identified, such as (i) real-time, (ii) fault tolerance, (iii) partial reconfiguration, and (iv) system-on-chip. FPGA use cases are some combination of these mission scenarios, such as a single-FPGA real-time scenario or a multi-FPGA fault-tolerant, partial-reconfiguration scenario, as shown in Figure 2.
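The combination of a baseline tool chain with mission requirements can be sketched as a small data model. The following Python fragment is only an illustration under stated assumptions: the class and function names (`Baseline`, `Requirement`, `UseCase`, `tool_flow`, `all_use_cases`) are hypothetical, and the count of combinations assumes every subset of the four requirements is meaningful, which the survey does not claim.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class Baseline(Enum):
    """Baseline tool chain target (single- or multi-FPGA), as named in the text."""
    SINGLE_FPGA = "single-FPGA"
    MULTI_FPGA = "multi-FPGA"


class Requirement(Enum):
    """The four mission requirements listed in the text."""
    REAL_TIME = "real-time"
    FAULT_TOLERANCE = "fault tolerance"
    PARTIAL_RECONFIGURATION = "partial reconfiguration"
    SYSTEM_ON_CHIP = "system-on-chip"


@dataclass(frozen=True)
class UseCase:
    """Hypothetical model: one baseline plus any set of mission requirements."""
    baseline: Baseline
    requirements: frozenset = frozenset()

    def label(self) -> str:
        # Iterate the enum in definition order so the label is deterministic.
        parts = [r.value for r in Requirement if r in self.requirements]
        return f"{self.baseline.value}: " + (", ".join(parts) or "baseline only")


def tool_flow(use_case: UseCase) -> list:
    """Baseline flow augmented by one (hypothetical) tool group per requirement."""
    flow = [f"{use_case.baseline.value} baseline tool chain"]
    flow += [f"{r.value} tools" for r in Requirement if r in use_case.requirements]
    return flow


def all_use_cases() -> list:
    """Enumerate every baseline/requirement-subset pair (assumes all are valid)."""
    cases = []
    for baseline in Baseline:
        for k in range(len(Requirement) + 1):
            for combo in combinations(Requirement, k):
                cases.append(UseCase(baseline, frozenset(combo)))
    return cases


rt = UseCase(Baseline.SINGLE_FPGA, frozenset({Requirement.REAL_TIME}))
ft_pr = UseCase(
    Baseline.MULTI_FPGA,
    frozenset({Requirement.FAULT_TOLERANCE, Requirement.PARTIAL_RECONFIGURATION}),
)
print(rt.label())
print(ft_pr.label())
print(tool_flow(ft_pr))
print(len(all_use_cases()))
```

Under these assumptions the model yields 2 x 2^4 = 32 candidate use cases; in practice only a subset is likely to correspond to real mission scenarios.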
Figure 2. Use case taxonomy
Figure 3. Tool taxonomy
Use cases for the purpose of this study are general-purpose computing (GPC), high-performance computing (HPC) or high-performance reconfigurable computing (HPRC), and high-performance embedded computing (HPEC). In the GPC use case, the user perceives the underlying architecture as a standard von Neumann machine, such as a desktop. In the HPC/HPRC usage scenario, the user perceives the underlying machine as a parallel computer. Finally, the HPEC usage scenario uses the underlying machine as an embedded parallel computer.
III. TOOL CLASSIFICATION
The FDTE development flow in FPGA applications can be applied at the device level (i.e. design issues within the FPGA) as well as the system level (i.e. design issues outside the FPGA for single- and multi-device scenarios). Most FPGA tools cover, at least, one of these development phases so there is a direct relation between tools and development phases. Therefore, a preliminary tool classification could be based on
these phases. Additionally, tools can be sub-categorized according to their functionality. Each development phase requires three main tasks:
1) Performance analysis task to identify the application requirements (power, speed, area, etc.)
2) Development task to apply the available tools for that specific phase (formulation tools, design tools, etc.)
3) Validation task to verify that the application requirements for that phase are met.
The resulting classification divides each development phase into these three tasks and classifies the tools as a function of the development phase and the number of required tasks supported. Figure 3 shows this new classification. Performance analysis and simulation are not independent of the development flow. These tasks are part of the development process, and a good tool supports not only the development itself but also the performance analysis and the validation.
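The phase-by-task classification described above can be sketched in a few lines of Python. This is an illustrative model only: the `Tool` record, the `classify` and `missing_tasks` helpers, and the example tool are all hypothetical and do not describe any surveyed product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The four FDTE development phases."""
    FORMULATION = "Formulation"
    DESIGN = "Design"
    TRANSLATION = "Translation"
    EXECUTION = "Execution"


class Task(Enum):
    """The three tasks required in each phase."""
    PERFORMANCE_ANALYSIS = "performance analysis"
    DEVELOPMENT = "development"
    VALIDATION = "validation"


@dataclass
class Tool:
    """Hypothetical tool record: maps each covered phase to its supported tasks."""
    name: str
    coverage: dict = field(default_factory=dict)


def classify(tool: Tool) -> dict:
    """Return {phase: number of supported tasks} for every phase the tool covers."""
    return {phase: len(tasks) for phase, tasks in tool.coverage.items() if tasks}


def missing_tasks(tool: Tool, phase: Phase) -> set:
    """Tasks the tool does not support in the given phase."""
    return set(Task) - set(tool.coverage.get(phase, ()))


# Invented example: a tool that spans two phases but supports only some tasks.
hll_compiler = Tool(
    "hypothetical HLL-to-HDL compiler",
    {
        Phase.DESIGN: {Task.DEVELOPMENT, Task.VALIDATION},
        Phase.TRANSLATION: {Task.DEVELOPMENT},
    },
)

for phase, count in classify(hll_compiler).items():
    gaps = sorted(t.value for t in missing_tasks(hll_compiler, phase))
    print(f"{phase.value}: {count}/{len(Task)} tasks supported, missing {gaps}")
```

A tool that reaches three of three tasks in a phase would correspond to the "good tool" described in the text; the gaps reported per phase indicate where a complementary tool would be needed.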
Figure 4. Detailed classification of development stages
Figure 5. FPGA-based system development space
After an intensive survey, including published research and vendor literature, an additional sub-categorization of the development tools was included to complete the tool taxonomy. Figure 4 shows these new sub-categories for each development phase.
IV. USER TYPE
In addition to the DoD use cases and tools, there is a third component that is necessary to consider: the user. The tool flow employed during application development is highly dependent upon the end-user type. Application developers, for example, use a different set of tools than system designers. While application developers are satisfied using tools that abstract away the hardware, system designers require low-level tools to optimize a solution based on the target hardware. Users are classified into two main categories, System Designers and Application Designers. Application Designers are further broken down into three subcategories: Domain Scientists, RC-Aware Domain Scientists, and RC Engineers. The category of Domain Scientists encompasses users with little to no knowledge of the underlying FPGA-based system. The category of RC-Aware Domain Scientists is comprised of users with enough knowledge of the hardware to make explicit references to hardware-optimized functions and tools. Finally, the category of RC Engineers encompasses users knowledgeable of the entire tool chain or flow. By contrast, System Designers includes users who require advanced tools to optimize the hardware for specific target architectures.
Figure 5 shows a three-dimensional representation of the three taxonomies: tool, use case, and user. The rings represent the level of hardware knowledge required, where the ring farthest away from the center represents an in-depth knowledge of hardware.
Analysis of the user survey results revealed the fundamental limitations that exist in the current tools and design methodologies for FPGA-based development. These limitations are described and categorized below based upon the four application development phases: Formulation, Design, Translation, and Execution (FDTE).
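The user categories of this section and the knowledge rings of Figure 5 can be sketched as an ordered enumeration. The fragment below is a rough illustration: the numeric ring values, the placement of System Designers on the outermost ring, and the `can_use` rule are assumptions made for the example rather than results of the survey.

```python
from enum import IntEnum


class UserType(IntEnum):
    """User categories, ordered by assumed hardware-knowledge ring (0 = center)."""
    DOMAIN_SCIENTIST = 0
    RC_AWARE_DOMAIN_SCIENTIST = 1
    RC_ENGINEER = 2
    SYSTEM_DESIGNER = 3  # assumption: outermost ring, in-depth hardware knowledge


# The three subcategories of Application Designers named in the text.
APPLICATION_DESIGNERS = frozenset({
    UserType.DOMAIN_SCIENTIST,
    UserType.RC_AWARE_DOMAIN_SCIENTIST,
    UserType.RC_ENGINEER,
})


def category(user: UserType) -> str:
    """Return the main category (Application Designer or System Designer)."""
    return "Application Designer" if user in APPLICATION_DESIGNERS else "System Designer"


def can_use(user: UserType, required_knowledge: UserType) -> bool:
    """Hypothetical rule: a tool suits a user whose ring is at or beyond the tool's."""
    return user >= required_knowledge


for user in UserType:
    suited = can_use(user, UserType.RC_ENGINEER)
    print(f"{user.name}: {category(user)}, suited to an RC-Engineer-level tool: {suited}")
```

Ordering the categories numerically makes it easy to ask which users a given tool excludes, which is the kind of question the three-dimensional development space is intended to support.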
V. TOOL LIMITATIONS AND THEIR ROOT CAUSES
Current FPGA tools are mainly design-centric [1], and users often skip the Formulation step altogether. Formulation and design tools are not interoperable but disjoint, requiring the user to start over manually in the design stage. Additionally, there is no fluid formulation/design development flow [2], so users are unable to explore the architecture of the target platform [3, 4] and therefore cannot describe hardware and software interactions or perform heterogeneous system-level simulative prediction [5, 6]. As a result, developers are currently limited to an iterative cycle of design-translate-execute.
At the Design level, HLL tools offer tangible productivity gains [7, 8], but hardware design knowledge is still essential for RC-aware domain scientists. Although tools are either graphical or text-based, they are not always suitable for all applications. The lack of polymorphism does not allow single designs to be used for multiple problem sizes. There is limited support for reuse and little support for multiple devices. Current tools also have limited support for portability and scalability [9], and code often must be completely rewritten for different architectures. The integration of cores is difficult due to the lack of standards: different standards for input/output, data bandwidth sizes, memory bank access types, and so on create challenges for design portability. Another important limitation is that existing tools cannot scale to large systems, making it difficult to program in an HPC environment addressing multiple FPGAs and multiple nodes. For example, the focus of tools on mapping C to HDLs [7, 8, 10, 11] prevents the expression of important computational application features such as locality, parallelism, and heterogeneity. Moreover, there is a lack of tools supporting HW/SW co-design [12, 13], which means that the effectiveness of manual co-design depends heavily on the user's application partitioning experience.
In addition, design tools have evolved upward from low-level design and translation technologies, where hardware features added to existing programming paradigms lack control and data abstractions. Furthermore, features such as floating-point support, dynamic allocation, recursion, and variable loops are missing from HLL tools [7, 8].
Figure 6. Technological Evolution of FPGAs
Translation tools are mainly compilation/synthesis and place-and-route (PAR) tools. PAR times are unpredictable [14, 15] and sometimes exceed development time [16]. There is currently no alternative to proprietary tools because FPGA chip vendors provide FPGA chips as black boxes [17]. Translation tools are extremely sensitive to coding constructs/practices [18]. Moreover, changes in the tool chain can cause unpredictable results, and vendor tool upgrades can render working cores infeasible without any design modifications. Also, different approaches exist for the output of this phase: "single binary combining hardware and software" versus "separate software binary and hardware bitstream". Finally, another challenge is the security of intellectual property, because bitstream encryption techniques are not integrated into design flows for secure reconfigurability.
For Execution tools, the main limitation is the lack of a standard for FPGA subsystem architecture [9]. The number of local memory banks, data width and depth, and the communication between FPGA and microprocessor vary from one platform to another. Moreover, runtime services for HW/SW system integration do not presently exist, nor is there operating system support for virtualization. Reverse engineering is also a problem because of the lack of security tools. Additionally, there is no tool support for multi-FPGA/multi-node systems, which is necessary to support online debugging, tuning, and verification for heterogeneous and/or multi-FPGA nodes [19, 20]. Tools do not support synchronization and fault tolerance capabilities, and the same problem exists for dynamic load balancing, work sharing [21], and multi-user mode support. Finally, the configuration time is long and limits the potential benefits from run-time reconfiguration [22].
VI. CAUSES OF THE FPGA PRODUCTIVITY PROBLEM
The current limitations are symptoms of the FPGA productivity problem and not the root cause. The problem can
be traced back to the incremental and evolutionary growth of tools trailing the advances in hardware capacities and innovations in FPGA devices. Figure 6 shows the evolution of FPGAs along with the concurrent evolution of design tools over the same period. FPGAs in their primordial state were mainly used for glue logic on circuit boards with RTL-level design tools. Advances in device architectures necessitated higher levels of design abstractions such as HDL, HDL+SW, and the present-day HDL/HLL. Each evolution of the design tools was stacked ad hoc on top of lower layers by device vendors merely to enable the use of underlying hardware resources. Tools were never designed for a larger customer base and were built bottom-up from the Translation and Design phases outward on a need-specific basis. Technological evolution and commercial growth of this industry over the past decade have largely been aligned to support wire-line applications of the rising communications and networking industry, the largest customer of FPGA technologies. Market survivability and proprietary closed-source technology have been the chief impediments to broader growth and widespread acceptance in other fields.
VII. CONCLUSIONS
In this work we conducted an extensive survey of literature, vendors, and DoD users, which identified a broad range of applications, missions, and needs in the DoD-related community. The results have shown that FPGA technology is becoming very important for the DoD community. Based on those results, taxonomies for both DoD usages and FPGA tools were formulated. The classification was validated and extended to establish a relation between tools and design flow, where many tools cover different development steps. The formulated taxonomies provide a foundation for the characterization of limitations of existing tools based on the FDTE flow.
REFERENCES
[1] S. Edwards, L. Lavagno, E.A. Lee, and A. Sangiovanni-Vincentelli, "Design of Embedded Systems: Formal Models, Validation, and Synthesis", Proceedings of the IEEE, Volume 85, Issue 3, March 1997.
[2] D. Densmore, A. Sangiovanni-Vincentelli, and R. Passerone, "A Platform-Based Taxonomy for ESL Design", IEEE Design & Test of Computers, Volume 23, Issue 5, May 2006, pp. 359-374.
[3] W. Fornaciari, D. Sciuto, C. Silvano, V. Zaccaria, "A sensitivity-based design space exploration methodology for embedded systems", Design Automation for Embedded Systems, Kluwer Academic Publishers 7 (12), 2002, pp. 7-33.
[4] M. Gries, "Methods for Evaluating and Covering the Design Space during Early Design Development", Technical report, Electronics Research Lab, University of California at Berkeley, UCB/ERL M03/32, August 2003, pp. 189-194.
[5] A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, A. Nicolau, "EXPRESSION: A language for architecture exploration through compiler/simulator retargetability", in: Design, Automation and Test in Europe (DATE), 1999, pp. 485-490.
[6] A. Pinto, A. Bonivento, A. L. Sangiovanni-Vincentelli, R. Passerone and M. Sgroi, "System level design paradigms: Platform-based design and communication synthesis", 2004, pp. 537-563.
[7] E. El-Araby, P. Nosum, and T. El-Ghazawi, "Productivity of High-Level Languages on Reconfigurable Computers: An HPC Perspective", IEEE International Conference on Field-Programmable Technology (FPT 2007), Japan, December 2007.
[8] E. El-Araby, M. Taher, M. Abouellail, T. El-Ghazawi, and G. B. Newby, "Comparative Analysis of High Level Programming for Reconfigurable Computers: Methodology and Empirical Study", III Southern Conference on Programmable Logic (SPL2007), Mar del Plata, Argentina, February 2007.
[9] Miaoqing Huang, Ivan Gonzalez, and Tarek El-Ghazawi, "A Portable Memory Access Framework for High-Performance Reconfigurable Computers", Proc. IEEE International Conference on Field-Programmable Technology (ICFPT'07), Kokurakita, Kitakyushu, Japan, Dec. 12-14, 2007.
[10] Jan Frigo, Maya Gokhale, Dominique Lavenier, "Evaluation of the Streams-C C-to-FPGA Compiler: An Applications Perspective", FPGA 2001, 2001, pp. 1-7.
[11] S. A. Edwards, "The Challenges of Synthesizing Hardware from C-Like Languages", IEEE Design & Test of Computers, vol. 23, 2006, pp. 375-386.
[12] J. Plantin, "Aspects on system-level design", Hardware/Software Codesign, 1999. (CODES '99) Proceedings of the Seventh International Workshop on, 1999, pp. 209-210.
[13] A. Antola, "A Novel Hardware/Software Codesign Methodology Based on Dynamic Reconfiguration with Impulse C and Codeveloper", Programmable Logic, 2007. SPL '07. 2007 3rd Southern Conference on, 2007, pp. 221-224.
[14] J. Li, C.-K. Cheng, "Routability improvement using dynamic interconnect architecture", IEEE Symposium on FPGAs for Custom Computing Machines, 19-21 April 1995, pp. 61-67.
[15] Ping-Tsung Wang, Kun-Nen Chen, "A simultaneous placement and global routing algorithm for an FPGA with hierarchical interconnection structure", IEEE International Symposium on Circuits and Systems, 1996. ISCAS '96, 12-15 May 1996, vol. 4, pp. 659-662.
[16] P. Kannan, D. Bhatia, "Interconnect estimation for FPGAs", Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 25, no. 8, Aug. 2006, pp. 1523-1534.
[17] Xingzheng Li, Haigang Yang, Hua Zhong, "Use of VPR in Design of FPGA Architecture", 8th International Conference on Solid-State and Integrated Circuit Technology, 2006. ICSICT '06. 2006, pp. 1880-1882.
[18] S. Sivaswamy, G. Wang, C. Ababei, K. Bazargan, R. Kastner and E. Bozorgzadeh, "HARP: hard-wired routing pattern FPGAs", 2005, pp. 21-29.
[19] K. Camera, H. K. So and R. W. Brodersen, "An integrated debugging environment for reprogrammble hardware systems", in Proc. of the 6th International Symposium on Automated Analysis-Driven Debugging (AADEBUG), 2005, pp. 111-116.
[20] J. G. Tong, "A Comparison of Profiling Tools for FPGA-Based Embedded Systems", Electrical and Computer Engineering, 2007. CCECE 2007. Canadian Conference on, 2007, pp. 1687-1690.
[21] R. Deville, I. Troxel and A. D. George, "Performance monitoring for run-time management of reconfigurable devices", ERSA, June 2005, pp. 175-181.
[22] E. El-Araby, I. Gonzalez, and T. El-Ghazawi, "Performance Bounds of Partial Run-Time Reconfiguration in High-Performance Reconfigurable Computing", First International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA'07), held in conjunction with SC'07, Reno, NV, USA, November 2007.