Operating System Support for Online Partial Dynamic Reconfiguration Management

Marco D. Santambrogio, Vincenzo Rana, Donatella Sciuto
Politecnico di Milano, Dipartimento di Elettronica e Informazione
Via Ponzio 34/5, 20133 Milano, Italy
{santambr,rana,sciuto}@elet.polimi.it

ABSTRACT

One of the main characteristics of reconfigurable embedded systems is their ability to be dynamically modified at run-time in order to adapt to the current environment. This feature, which makes it possible to change the functionality of a system while it is up and running, requires a software application that is able to handle the reconfiguration process. The software for the management of reconfiguration can be developed either as a standalone application, which has to be specifically designed for each given system, or within an Operating System, in order to fully exploit both code reuse and code portability. This paper proposes a novel methodology for the design of dynamically reconfigurable systems in which the reconfiguration management is completely assigned to an Operating System reconfiguration support. Finally, a prototype implementation is presented, where a standard Linux operating system has been extended with the proposed Operating System support in order to handle dynamically reconfigurable hardware resources.

1.

INTRODUCTION

Nowadays, the most commonly used reconfigurable devices are Field Programmable Gate Arrays (FPGAs), employed both as a component of a more complex system (playing the role of a co-processor) and as a System-on-Programmable-Chip (SoPC) integrating all system components. Therefore, the possibility of hardware reconfiguration has to be added to the design flow as a new relevant degree of freedom. This enables the designer to create systems that autonomously modify their functionalities according to varying requirements. Although there are several techniques to exploit partial reconfiguration, there are only a few approaches that deal with frameworks and tools (e.g. [1–4]) to design dynamically reconfigurable SoPCs (e.g. [5,6]). Examples of such frameworks are the operating systems for reconfigurable embedded platforms, which have been analyzed in [7]. In [8] the authors have presented a run-time system for dynamical on-demand reconfiguration. Several research groups [9–11] have built reconfigurable computing engines to obtain high application performance at low cost by specializing the computing engine to the computation task; some preliminary results can be found in the literature [9, 12–14], but, to the best of our knowledge, no general framework and no publicly available tools exist.

Due to the capabilities described above, FPGAs can be used to create hardware/software platforms that keep their flexibility at runtime, allowing the development of SoPCs. Modern FPGAs can also contain a general-purpose processor, which can be either a physical CPU embedded in the FPGA fabric or a soft core mapped to a part of the FPGA. Recent years have seen a growing interest in the introduction and adaptation of General Purpose Operating Systems (GPOS), such as the GNU/Linux kernel, to embedded systems, as well as in the use of specific micro-kernels. The use of a fully-featured operating system introduces some fundamental advantages and enhancements, but also increases the software complexity of the system, presenting new issues in resource management.

One of the most important features an OS can provide is the exploitation of reconfigurable resources from different processes through multitasking and multiuser capabilities. Modern FPGAs have a reconfigurable area large enough to allow the mapping of a considerable number of functionalities, which might be made available to different processes at the same time, exploiting the intrinsic hardware parallelism. Additionally, the OS must provide a completely free task-to-resource mapping, similarly to what happens with the normal hardware resources (memory, I/O interfaces) of the system. In [15] Wigley et al. have presented a discussion on the components of an operating system for a reconfigurable computer. These components are equivalent to the ones of a standard operating system, but instead of managing processes they handle hardware tasks, which are mapped on the reconfigurable hardware architecture. The discussed operating system, although not yet fully implemented, is nevertheless an important starting point for the development of a complete run-time system for the management of hardware tasks, capable of exploiting the multitasking and multiprocessing capabilities offered by the parallel allocation of different tasks on a single FPGA reconfigurable area.
The novelty introduced by the proposed work lies in:
• extensive and complete support for the self-reconfiguration of SoC architectures via an extension of a standard operating system such as GNU/Linux, which allows software processes to exploit a rich set of features by simply using the developed Linux modules;
• the design of an Operating System solution able to support and manage reconfiguration both in a SoC [5] and in a Multi-FPGA scenario [16]. Because of the Multi-FPGA scenario, it is mandatory to design a solution able to support both internal and external reconfiguration.

This work is organized as follows: Section 2 describes the proposed methodology, presenting the Linux operating system support extension. A set of experimental results is shown in Section 3. Finally, our conclusions are summarized in Section 4.

2.

RECONFIGURATION MANAGEMENT

The proposed methodology is based on the assumption that it is desirable to implement a flow that outputs a set of configuration bitstreams used to configure and, if necessary, partially reconfigure a standard FPGA to realize the desired system. One of the main strengths of the proposed methodology is its independence from the low-level architecture. The proposed approach is based on the Linux OS, which has been extended with both a reconfiguration support, introduced in Section 2.1, and a centralized reconfiguration manager, presented in Section 2.2, that is able to handle dynamic reconfigurability. Finally, Section 2.3 shows the proposed communication channel to access the dynamically configured hardware modules.

2.1

Reconfiguration support

Since a direct interaction between user applications (which need reconfigurable modules in order to speed up their execution) and reconfiguration processes can affect both the reliability and the performance of the reconfigurable system, one of the main requirements and characteristics of the proposed OS reconfiguration support is to decouple the user-side applications from the system processes that have to be executed to perform a single reconfiguration. This makes it possible to achieve the following benefits in the design of a reconfigurable system:
• simplification of the software calls that are necessary to perform reconfiguration: module request, module release, module removal and modules list; these function calls can be used by user applications either to request or to remove a module from the reconfigurable system, or to get the list of the modules that are currently configured on the reconfigurable side of the system;
• increased code reuse and portability, since high-level reconfiguration calls do not contain any information about their low-level implementation. In this way it is possible both to reuse the same code (or the same portion of code) in different situations (for instance in a system with or without a DMA controller) and to port it to different hardware platforms, without the need to change its implementation;
• support for different low-level implementations of the OS reconfiguration support, since it is possible to implement the same reconfiguration task in several different ways, for example by following different cache policies or different allocation mechanisms, and to choose at runtime the most suitable solution for each particular scenario. The only constraint is that each implementation has to satisfy the standard interface defined for each reconfiguration function.
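The decoupling described above can be illustrated with a minimal Python sketch. It is not the interface of the prototype: the class names, method signatures and the toy in-memory backend are assumptions made for illustration. The only point it shows is that user-side code depends on the four calls alone, so any backend satisfying the same interface can be swapped in.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ReconfigurationBackend(ABC):
    """Hypothetical standard interface that every low-level implementation satisfies."""

    @abstractmethod
    def module_request(self, family: str) -> int: ...

    @abstractmethod
    def module_release(self, module_id: int) -> None: ...

    @abstractmethod
    def module_removal(self, module_id: int) -> None: ...

    @abstractmethod
    def modules_list(self) -> List[str]: ...


class InMemoryBackend(ReconfigurationBackend):
    """Toy backend: it only keeps bookkeeping and never touches hardware."""

    def __init__(self) -> None:
        self._modules: Dict[int, str] = {}
        self._next_id = 0

    def module_request(self, family: str) -> int:
        module_id = self._next_id
        self._next_id += 1
        self._modules[module_id] = family
        return module_id

    def module_release(self, module_id: int) -> None:
        # The toy backend keeps a released module listed; it only checks the id.
        if module_id not in self._modules:
            raise KeyError(module_id)

    def module_removal(self, module_id: int) -> None:
        del self._modules[module_id]

    def modules_list(self) -> List[str]:
        return [self._modules[key] for key in sorted(self._modules)]


def run_application(backend: ReconfigurationBackend) -> List[str]:
    """User-side code: written against the interface, not against a backend."""
    fir = backend.module_request("fir_filter")  # hypothetical family names
    backend.module_request("aes")
    backend.module_removal(fir)
    return backend.modules_list()
```

Calling `run_application(InMemoryBackend())` returns `["aes"]`; a backend with a different cache or allocation strategy could be passed in without changing `run_application`.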

2.2

Reconfiguration Manager

The Linux OS has been extended with a centralized reconfiguration manager in order to support and manage both external and internal reconfigurations. The choice of a centralized manager makes it possible to exploit the potential of several policies, such as a cache policy and an allocation policy [17].

The cache policy defines the way in which cached modules are managed. When a module is no longer in use, it is possible to perform either a hard-removal or a soft-removal in order to delete it (module removal). The hard-removal, which sets the reconfigurable process state to removed, configures the slots occupied by the unused IP-Core (a single reconfigurable module) with blank modules, physically removing all the logic of the deleted module. The soft-removal, on the other hand, which brings the reconfigurable process to the configured state, leaves the FPGA configuration unaltered but performs a logical removal by deleting all the information associated with the deleted module. An alternative way to manage a module that is no longer in use is to keep both the module configured on the reprogrammable device and its information, while setting its status to cached (module release). In this way the cached module can be assigned to other applications that require an IP-Core of the same kind as the released one. This approach leads to a remarkable improvement in performance, since it makes it possible to satisfy a module request without performing any physical reconfiguration.

The allocation policy defines how to place a module requested with a module request function call. Implementing this kind of policy allows the application of well-known algorithms in order to maximize the number of IP-Cores that can be configured at the same time on the reprogrammable device. This can also be seen as a reduction of the number of refused modules, that is, modules that cannot be placed on the device because there is no more space available for them.
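The removal and release semantics of the cache policy can be summarized with a small bookkeeping model. The following Python sketch is an illustration under stated assumptions, not the manager's implementation: the `CacheManager` class, the `in_use` state name, the caller-supplied slot list and the write counter are all hypothetical.

```python
from typing import Dict, List

BLANK = "blank"


class CacheManager:
    """Toy model: `fabric` is what is physically configured, `records` is the bookkeeping."""

    def __init__(self, num_slots: int) -> None:
        self.fabric: List[str] = [BLANK] * num_slots
        self.records: Dict[int, dict] = {}
        self.physical_writes = 0  # counts physical (re)configurations of the fabric
        self._next_id = 0

    def module_request(self, family: str, slots: List[int]) -> int:
        # Cache hit: a cached module of the same family is reassigned as-is.
        for module_id, record in self.records.items():
            if record["state"] == "cached" and record["family"] == family:
                record["state"] = "in_use"
                return module_id
        # Cache miss: the module has to be physically configured.
        for slot in slots:
            self.fabric[slot] = family
        self.physical_writes += 1
        module_id = self._next_id
        self._next_id += 1
        self.records[module_id] = {"family": family, "slots": list(slots), "state": "in_use"}
        return module_id

    def module_release(self, module_id: int) -> None:
        # Keep both the configured logic and its information; mark it as cached.
        self.records[module_id]["state"] = "cached"

    def soft_removal(self, module_id: int) -> None:
        # Logical removal only: the fabric is left unaltered.
        del self.records[module_id]

    def hard_removal(self, module_id: int) -> None:
        # Physical removal: overwrite the occupied slots with blank modules.
        for slot in self.records[module_id]["slots"]:
            self.fabric[slot] = BLANK
        self.physical_writes += 1
        del self.records[module_id]
```

In this model, requesting a family right after one of its modules has been released returns the cached module and leaves `physical_writes` unchanged, which is the saving the release mechanism is meant to provide.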
The main principle to follow when implementing allocation policies is the minimization of device fragmentation. Hence, for each requested module, it is necessary to find the smallest set of consecutive free slots in which the module can be configured, in order to avoid breaking larger groups of free slots into several smaller groups. Once the location has been identified, the bitstream that performs the desired reconfiguration has to be selected. There are two possible ways in which this selection can be carried out. The first one is suitable for a scenario in which there is a different configuration bitstream for each module position. In this case the allocation layer searches for the collection of bitstreams of the desired IP-Core family and then selects the bitstream that corresponds to the position chosen during the allocation phase. The second way is suitable when the system contains a component (the relocation component) that is able to modify a bitstream in order to shift its position within the FPGA [18]. In this case the allocation layer has to select the base bitstream, that is, the only bitstream that represents the whole IP-Core family. This information is then used to set up the relocation component, which shifts the base bitstream to the desired position [18]. In this way a new bitstream is obtained, with which the desired module can be configured in the selected position.
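A best-fit search is one straightforward way to realize the "smallest set of consecutive free slots" rule; the text does not name the algorithm used in the prototype, so the Python sketch below is only an assumption-laden illustration. The function names, the bitstream file names and the `relocate` callback are hypothetical, and the callback merely stands in for the relocation component.

```python
from typing import Callable, Dict, List, Optional, Tuple


def free_runs(occupied: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, length) for every maximal run of consecutive free slots."""
    runs = []
    start = None
    for index, used in enumerate(occupied):
        if not used and start is None:
            start = index
        elif used and start is not None:
            runs.append((start, index - start))
            start = None
    if start is not None:
        runs.append((start, len(occupied) - start))
    return runs


def best_fit(occupied: List[bool], width: int) -> Optional[int]:
    """Pick the smallest free run that fits, so larger runs are not broken up."""
    candidates = [(length, start) for start, length in free_runs(occupied) if length >= width]
    if not candidates:
        return None  # the module is refused: no group of free slots is large enough
    return min(candidates)[1]


def select_bitstream(
    family: str,
    start: int,
    per_position: Optional[Dict[str, Dict[int, str]]] = None,
    relocate: Optional[Callable[[str, int], str]] = None,
) -> str:
    """First way: one bitstream per position. Second way: relocate a base bitstream."""
    if per_position is not None:
        return per_position[family][start]
    if relocate is not None:
        return relocate(f"{family}_base.bit", start)  # hypothetical naming scheme
    raise ValueError("no bitstream source available")


# Slots 1-2, 4-7 and 9 are free (True means occupied).
slots = [True, False, False, True, False, False, False, False, True, False]
position = best_fit(slots, 2)  # the 2-slot gap at slot 1 is preferred over the 4-slot gap
```

With a width of 2, `best_fit` returns slot 1 rather than slot 4, leaving the four-slot group intact for a wider module.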

2.3

IP-Cores devices access

Once an IP-Core has been configured on the reprogrammable device, a communication channel has to be established between the OS and the module itself. This channel is used by the OS to serve reads from and writes to an IP-Core, since software applications cannot directly access the configurable hardware. The best way to achieve this goal is to follow the standard Linux philosophy, which proposes the implementation of device drivers. Each IP-Core family is managed by the same device driver, so the number of device drivers loaded by the OS at any time corresponds to the number of families of IP-Cores that are currently configured. Each device driver is able to distinguish a hardware module from another one of the same family by its memory address space, since it is unique for each module.

The development of a centralized and automatic reconfiguration manager implies the development of a mechanism to dynamically manage this kind of driver. The device drivers needed to handle the configured IP-Cores have to be dynamically loaded, and when no more modules of a given family are present on the FPGA, the corresponding device driver has to be unloaded. In order to allow user-side applications to access IP-Cores, the OS provides them with a collection of devices, located in the /dev/ directory. Each device corresponds to a different IP-Core, so all the devices of the same family refer to the same device driver. A device is characterized by its major number and its minor number. Each IP-Core family is represented by the same major number, which corresponds to a specific device driver, while the minor number distinguishes between different IP-Cores of the same family. Finally, to avoid direct calls to these devices, it is useful to develop a collection of user-side drivers that allow user applications to access devices indirectly.
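The relationship between families, drivers and major/minor numbers can be modelled in a few lines. The Python sketch below only tracks the bookkeeping and loads no real driver; the `DeviceRegistry` class, the `/dev/<family><minor>` naming scheme, the starting major number 240 and the base addresses are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


class DeviceRegistry:
    """Toy model: one driver (major number) per family, one minor number per IP-Core."""

    def __init__(self, first_major: int = 240) -> None:  # 240 is an arbitrary choice
        self._next_major = first_major
        self.majors: Dict[str, int] = {}              # family -> major number
        self.devices: Dict[str, Dict[int, int]] = {}  # family -> {minor: base address}

    def add_core(self, family: str, base_address: int) -> Tuple[str, int, int]:
        if family not in self.majors:
            # First IP-Core of this family: this is where its driver would be loaded.
            self.majors[family] = self._next_major
            self._next_major += 1
            self.devices[family] = {}
        minors = self.devices[family]
        minor = 0
        while minor in minors:  # lowest unused minor number
            minor += 1
        minors[minor] = base_address  # the address space identifies the module
        return f"/dev/{family}{minor}", self.majors[family], minor

    def remove_core(self, family: str, minor: int) -> None:
        del self.devices[family][minor]
        if not self.devices[family]:
            # Last IP-Core of the family is gone: the driver would be unloaded here.
            del self.devices[family]
            del self.majors[family]

    def loaded_drivers(self) -> List[str]:
        return sorted(self.majors)


registry = DeviceRegistry()
first_fir = registry.add_core("fir", 0x40000000)   # hypothetical base addresses
second_fir = registry.add_core("fir", 0x40010000)
first_aes = registry.add_core("aes", 0x40020000)
```

Here the two `fir` cores share major number 240 and differ only in their minor numbers, while `aes` gets its own major number; removing both `fir` cores drops that family's driver from `loaded_drivers()`.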

3.

EXPERIMENTAL RESULTS

This section presents a set of experimental results that show both the effectiveness and the quality of the proposed methodology. The proposed operating system support, built upon GNU/Linux, has been deployed on the RAPTOR2000 board [19] in order to test a Multi-FPGA scenario.

3.1

Operating System Support Evaluation

The performance of the whole architecture is mainly affected by both the latency introduced by the partial reconfiguration and the overhead caused by the Operating System support. The latency introduced by the partial reconfiguration consists of two parts. The first is a static time that is required to initiate the DMA transfer of the partial configuration bitstreams from the SDRAM to the configuration interface, plus the time required to initialize the configuration interface of the FPGA and to flush the configuration buffer at the end of the configuration. The second is the time needed to download the bitstream onto the FPGA, which depends on the size of the reconfigurable hardware module. The static time is 158 clock cycles before reconfiguration and 824 clock cycles for buffer flushing after reconfiguration. Moreover, the number of clock cycles needed to reconfigure one CLB column of the Xilinx Virtex-II FPGA used (XC2V4000) is 18,128. Table 1 shows the reconfiguration time introduced by the hardware for typical module sizes, with a reconfiguration clock period of 20 ns. These modules only use CLB columns. The download time changes insignificantly if embedded multipliers or BlockRAMs are used. If the BlockRAM contents also have to be written during reconfiguration, an additional 1054.72 µs apply per BlockRAM column. This scenario assumes that no data compression is used for the partial bitstreams and thus gives worst-case times.

Table 1: Hardware reconfiguration latency

Columns    Latency (µs)
4          1469.88
8          2920.12
12         4370.36
16         5820.60

On the other hand, there is the time overhead caused by the Operating System configuration support. The first task to be executed is the startup, which initializes all data structures and prepares the Daemon to accept configuration requests; it takes around 500 µs, but it has to be performed just once, when the Daemon starts. The second task is the device drivers setup, which loads the correct driver and initializes all necessary devices for a specific module; it takes around 650 µs and it is executed once for each kind of module. The module loading time depends on whether the requested module is cached or not; in the first case it takes around 2500 µs, otherwise it takes around 3450 µs. The module used to obtain these results is 4 columns wide. Finally, accessing a configured module takes around 3.6 µs to read 4 bytes and 2.7 µs to write 4 bytes. Figure 1 shows a comparison between the software overhead (introduced by the operating system support) and the hardware overhead (due to the underlying hardware architecture) for a single read (A) or write (B) operation.

[Figure 1: Comparison between the software and the hardware overhead for a single read (A) or write (B) operation]

In the worst case, using a module that is just 4 columns wide, the hardware reconfiguration latency is around 1500 µs, while the same reconfiguration performed through the Operating System takes less than 4000 µs (including the delay introduced by socket communication). Figure 2 shows the percentage of the software overhead, introduced by the operating system support, when a 4 (A), 8 (B) or 12 (C) columns wide module is loaded on the reconfigurable device. As shown in Figure 2, the software overhead rapidly decreases as the width of the reconfigurable module that has to be loaded in the system increases. Furthermore, in the scenario where the requested module is cached, the performance can be considerably improved independently of the module size, since the whole reconfiguration process takes a constant time of 2500 µs.

[Figure 2: Percentage of the software overhead when a 4 (A), 8 (B) or 12 (C) columns wide module is loaded on the reconfigurable device. Chart labels: 57 % / 43 %, 40 % / 60 %, 33 % / 67 %]

A significant improvement can be achieved by pre-fetching the execution of the Daemon, in order to perform the allocation of free resources for a new module before the actual request of the module itself. Figure 3 shows how the software overhead varies when a 4 (A), 8 (B) or 12 (C) columns wide module is loaded on the reconfigurable system using the pre-fetching technique. In particular, Figure 3 shows that the reduction of the software overhead obtained with the pre-fetching technique grows with the size of the reconfigurable module.

[Figure 3: Percentage of the software overhead when a 4 (A), 8 (B) or 12 (C) columns wide module is loaded on the reconfigurable device, using the pre-fetching technique. Chart labels: 43 % / 57 %, 30 % / 70 %, 22 % / 78 %]

The pre-fetching technique allows the immediate start of the hardware reconfiguration each time a module is requested through the socket communication. Using wider modules, it is also possible to completely hide the software overhead due to the Daemon, since the hardware reconfiguration latency grows linearly with the module size, while the Daemon overhead remains constant. For these reasons, the overall reconfiguration latency is mainly affected by just the following two parameters: a static socket communication latency (which is fixed for a given Operating System) and a dynamic hardware reconfiguration latency (which depends on the size of the module that has to be reconfigured).
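The hardware latencies in Table 1 follow directly from the cycle counts quoted in Section 3.1, which the short Python sketch below reproduces. It reads the per-column figure as 18,128 clock cycles, the value that matches Table 1, and it assumes that the quoted 3450 µs for an uncached 4-column module is the end-to-end loading time; the function and constant names are chosen for illustration.

```python
CLOCK_PERIOD_NS = 20            # reconfiguration clock period
STATIC_CYCLES = 158 + 824       # setup before plus buffer flushing after reconfiguration
CYCLES_PER_CLB_COLUMN = 18128   # per-column cost on the XC2V4000


def hw_latency_us(columns: int) -> float:
    """Hardware reconfiguration latency in microseconds for a module of `columns` CLB columns."""
    cycles = STATIC_CYCLES + columns * CYCLES_PER_CLB_COLUMN
    return cycles * CLOCK_PERIOD_NS / 1000


def software_share(total_us: float, columns: int) -> float:
    """Fraction of an end-to-end loading time that is not hardware reconfiguration."""
    return (total_us - hw_latency_us(columns)) / total_us


table_1 = [round(hw_latency_us(c), 2) for c in (4, 8, 12, 16)]
# table_1 == [1469.88, 2920.12, 4370.36, 5820.6]

share_4_columns = round(software_share(3450, 4) * 100)
# share_4_columns == 57
```

Under the stated assumption, the software share for a 4-column module comes out at about 57 %, in line with one of the labels recovered from Figure 2, and the linear growth of `hw_latency_us` is what lets a constant Daemon overhead shrink in relative terms for wider modules.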

4.

CONCLUDING REMARKS

This paper proposed a novel methodology where the reconfiguration management is completely assigned to an Operating System reconfiguration support. Finally, a prototype implementation is presented, where a standard Linux operating system has been extended with the proposed Operating System support in order to handle dynamically reconfigurable hardware resources. Future developments will consider the operating system as a centralized manager to handle requests from different applications, which can exploit enhanced features.

5.

REFERENCES

[1] Xilinx Inc. Two Flows of Partial Reconfiguration: Module Based or Difference Based. Technical Report XAPP290, Xilinx Inc., November 2003.
[2] Xilinx Inc. Early Access Partial Reconfiguration Guide. Xilinx Inc., 2006.
[3] Philippe Butel, Gerard Habay, and Alain Rachet. Managing partial dynamic reconfiguration in Virtex-II Pro FPGAs. Xcell Journal Online, 2004.
[4] P. Sedcole, B. Blodget, T. Becker, J. Anderson, and P. Lysaght. Modular dynamic reconfiguration in Virtex FPGAs. Computers and Digital Techniques, IEE Proceedings-, 153(3):157–164, 2006.
[5] Alberto Donato, Fabrizio Ferrandi, Marco D. Santambrogio, and Donatella Sciuto. Operating system support for dynamically reconfigurable SoC architectures. In IEEE-SOCC, 2005.
[6] Ryan J. Fong, Scott J. Harper, and Peter M. Athanas. A versatile framework for FPGA field updates: An application of partial self-reconfiguration. rsp, 00:117, 2003.
[7] Christoph Steiger, Herbert Walder, and Marco Platzner. Operating systems for reconfigurable embedded platforms: Online scheduling of real-time tasks. IEEE Trans. Computers, 53(11):1393–1407, 2004.
[8] Michael Ullmann, Michael Hübner, Björn Grimm, and Jürgen Becker. An FPGA run-time system for dynamical on-demand reconfiguration. In Proc. of the 18th International Parallel and Distributed Processing Symposium, 2004.
[9] Seth Copen Goldstein, Herman Schmit, Mihai Budiu, Srihari Cadambi, Matt Moe, and R. Reed Taylor. PipeRench: A reconfigurable architecture and compiler. In Computer.
[10] Ming-Hau Lee, Hartej Singh, Guangming Lu, Nader Bagherzadeh, Fadi J. Kurdahi, Eliseu M. C. Filho, and Vladimir Castro Alves. Design and implementation of the MorphoSys reconfigurable computing processor. In J. VLSI Signal Process. Syst.
[11] Heiko Kalte and Mario Porrmann. REPLICA2Pro: Task relocation by bitstream manipulation in Virtex-II/Pro FPGAs. In Proc. of the ACM International Conference on Computing Frontiers, 2006.
[12] Edson L. Horta, John W. Lockwood, and David Parlour. Dynamic hardware plugins in an FPGA with partial run-time reconfiguration. pages 844–848, 1993.
[13] Hartej Singh, Guangming Lu, Eliseu Filho, Rafael Maestre, Ming-Hau Lee, Fadi Kurdahi, and Nader Bagherzadeh. MorphoSys: case study of a reconfigurable computing system targeting multimedia applications. In Proceedings of the 37th conference on Design automation (DAC00). ACM Press.
[14] Edson Horta and John W. Lockwood. PARBIT: A tool to transform bitfiles to implement partial reconfiguration of field programmable gate arrays (FPGAs). Washington University, Department of Computer Science, Technical Report WUCS-01-13, July 2001.
[15] Grant Wigley and David Kearney. The first real operating system for reconfigurable computers. austcsac, 00, 2001.
[16] Omitted for blind review.
[17] Markus Koester, Heiko Kalte, and Mario Porrmann. Task placement for heterogeneous reconfigurable architectures. In Proceedings of the IEEE 2005 Conference on Field-Programmable Technology (FPT'05), 2005.
[18] Fabrizio Ferrandi, Massimo Morandi, Marco Novati, Marco D. Santambrogio, and Donatella Sciuto. Dynamic reconfiguration: Core relocation via partial bitstreams filtering with minimal overhead. In International Symposium on System-on-Chip 06, 2006.
[19] Heiko Kalte, Mario Porrmann, and Ulrich Rückert. A prototyping platform for dynamically reconfigurable system on chip designs.
