Improving Flexibility in On-line Evolvable Systems by Reconfigurable Computing

Jim Torresen and Kyrre Glette
Department of Informatics, University of Oslo
P.O. Box 1080 Blindern, N-0316 Oslo, Norway
http://www.ifi.uio.no/∼jimtoer

Abstract. Reconfigurable logic is a promising technology for adaptable systems; its use for this purpose is often called reconfigurable computing. However, flexibility remains one of the main challenges for autonomous adaptable systems. The paper begins with an overview of reconfigurable computing and of different approaches to implementing it. We then outline how these can be applied in on-line evolvable systems to improve flexibility in the hardware. The challenge in the latter is to include flexibility without re-synthesis while avoiding too large a logic gate overhead. An architecture based on system-on-chip and partial reconfiguration is proposed in the paper.

1 Introduction

Evolvable systems have the potential to become important in future autonomous adaptable systems, in which both software and hardware may be adaptable. So far, however, commercial dynamic computer systems have mainly relied on context switching in software, i.e. switching software processes on a processor. With the introduction of Field Programmable Gate Arrays (FPGAs), hardware can also be modified at run-time. Few embedded systems are designed today without one or more FPGAs. The technology has progressed from being used only as glue logic to being widely applied for fast data processing. Nevertheless, substituting the configuration at run-time has so far seen little use. As with software many years ago, the same code/configuration remains static in the device once a system is put into operation; swapping processes has not yet reached FPGA application designers. There are naturally reasons for this, including long reconfiguration time and reliability issues. Recent progress in the technology has, however, reduced these problems, and dynamic FPGAs are now more applicable than before, as described in this paper. Much research on reconfigurable computing is not related to evolvable systems. In this paper we therefore emphasize the reconfigurable computing alternatives, in order to improve their applicability in evolvable systems. Since much of the FPGA configuration bit string coding (in particular the signal routing) is secret, an evolvable system cannot directly change the FPGA content unless re-synthesis is undertaken. Re-synthesis is not normally applicable, so evolvable hardware architectures often end up being regular and of limited flexibility. This paper addresses the challenge by introducing some architectures with inherent reconfigurability. Section 2 gives an overview of the status of alternatives for run-time reconfigurable systems. Section 3 introduces how this technology can be applied in evolvable systems. Finally, Section 4 concludes the paper.

2 Reconfigurable Computing

Processing in run-time reconfigurable hardware systems is often called Reconfigurable Computing (RC). There is no single, commonly accepted definition of this term, since researchers define it somewhat differently. Below we give an overview of the scope of reconfigurable computing as most researchers seem to perceive it: systems incorporating some form of hardware re-programmability. There seem to be three main degrees of reconfiguration:
– Static: The configuration within the FPGA remains the same throughout the lifetime of the system. This means no adaptivity at run-time.
– Upgrade: The configuration is changed from time to time for bug fixes or functional upgrades. This represents rare adaptation.
– Run-time: A set of configurations (multi-context) is available, and the FPGA switches between them at run-time. This could provide several benefits, as described below.
Most applications are implemented using the static approach, i.e. without adaptivity. However, upgrading of systems has recently become more common. This allows the configuration to be upgraded when bugs are found or when the functionality of the system is to be changed. Such a system, however, has several requirements [1]:
– Fallback to the old configuration must be possible.
– Switching should ideally take one clock cycle.
– On-line upgrade over the Internet.
– The configuration bitstream must be encrypted to avoid reverse engineering.
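The fallback requirement above can be illustrated with a minimal sketch, assuming a hypothetical two-slot configuration memory (the class `ConfigStore` and its methods are illustrative names, not part of any vendor tool). A CRC-32 check stands in for bitstream verification; encryption is not modelled.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConfigStore:
    """Hypothetical two-slot configuration memory for an upgradable FPGA system.

    The running bitstream lives in the active slot. An upgrade is written to the
    other slot, so the old configuration is kept and fallback stays possible.
    The CRC check only guards against corrupted downloads; encryption is not modelled.
    """
    slots: List[Optional[bytes]] = field(default_factory=lambda: [None, None])
    active: int = 0

    def upgrade(self, bitstream: bytes, expected_crc: int) -> bool:
        """Store a new bitstream in the spare slot and switch to it only if it verifies."""
        if zlib.crc32(bitstream) != expected_crc:
            return False  # corrupted download: keep running the current configuration
        spare = 1 - self.active
        self.slots[spare] = bitstream
        self.active = spare  # the switch itself is just a change of slot index
        return True

    def fallback(self) -> None:
        """Return to the previously active configuration."""
        previous = 1 - self.active
        if self.slots[previous] is None:
            raise RuntimeError("no previous configuration stored")
        self.active = previous

    def current(self) -> Optional[bytes]:
        return self.slots[self.active]


store = ConfigStore(slots=[b"v1", None])
store.upgrade(b"v2", zlib.crc32(b"v2"))  # verified: v2 becomes active
store.fallback()                         # problem found: back to v1
print(store.current())                   # b'v1'
```

Keeping the previous configuration in a second slot is what makes fallback possible; this sketch keeps only one earlier version.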

Automatic dynamic products will probably appear in the future. These could autonomously upgrade the hardware as the environment (or data) changes or when bugs are detected in the system. One promising approach based on this idea is evolvable hardware [2].

The application areas for run-time reconfigurable systems are:
– Space/cost/power reduction
– Speeding up computation
– Incorporating new data/patterns realized in reconfigurable logic
If not all functions in a system are needed at the same time, part of the configuration can be substituted at run-time, as shown in Figure 1. Function A contains the parts of the system that always need to be present. Parts B and C, however, are not needed concurrently and can be assigned to the same resources (location) in the FPGA. An example of such an application is a multifunctional handheld device combining, e.g., a mobile phone, MP3 player, radio and camera. For most purposes, a user would not use more than one of these functions at a time. Thus, instead of having custom hardware for each function, it could be more efficient to have a reconfigurable system where only the active function is configured. This would allow a smaller hardware device, which leads to reduced cost and, for some systems, reduced power consumption. Such benefits are important in a competitive market. One of the benefits of FPGAs is their parallel structure, which allows parallel matching/searching. This is applied in the evolvable classification architecture presented in Section 3.
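The saving from letting B and C share one region can be sketched as simple area bookkeeping. The function name and the logic-cell figures are hypothetical, and the sketch ignores practical overheads such as region alignment and the interface logic between static and reconfigurable parts.

```python
from typing import Dict


def required_area(static_area: int, swappable: Dict[str, int], time_multiplexed: bool) -> int:
    """Logic area for a design with an always-present part and mutually exclusive functions.

    static_area:      area of the part that must always be configured (function A).
    swappable:        area of each function that is never needed concurrently (B, C, ...).
    time_multiplexed: if True, the swappable functions share one reconfigurable region,
                      which must be as large as the largest of them.
    """
    if time_multiplexed:
        return static_area + max(swappable.values(), default=0)
    return static_area + sum(swappable.values())


# Hypothetical sizes in logic cells, for illustration only.
sizes = {"B": 3000, "C": 2000}
print(required_area(1000, sizes, time_multiplexed=False))  # 6000
print(required_area(1000, sizes, time_multiplexed=True))   # 4000
```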

Fig. 1. Illustration of run-time reconfiguration of FPGA.

The use of run-time reconfiguration for computational speedup is depicted in Figure 2. Rapid swapping between successive configurations can give an RC-based system considerably higher throughput than a general static FPGA configuration.

[Figure: Task A in a static configuration versus Tasks A.1, A.2 and A.3 separated by context switches (CSW); horizontal axis: time]

Fig. 2. Illustration of a run-time reconfigurable FPGA compared to a static FPGA.

If a task A can be partitioned into a set of separate tasks (A.1, A.2 and A.3 in the example in the figure) to be executed one after the other, an FPGA configuration can be designed for each of them. Each configuration is thus optimised for one part of the computation. At run-time, context switching (CSW) is undertaken, and the total execution time for the task in the given example is reduced. The context switching time must be short to keep the overhead of switching between the different configurations low. A run-time reconfigurable device has the advantage that its configuration can be modified according to the application at hand. In this way, it has the potential to achieve even higher performance than an ASIC [3]. The goals for run-time reconfiguration differ:
– Space/power/cost optimisation:
  • Reconfigure for a change in function, protocol, standard, etc.
  • Infrequent reconfiguration
– Speed optimisation:
  • Reconfigure within a function or task
  • Frequent reconfiguration
Infrequent reconfiguration is normally easier to undertake than frequent reconfiguration. Configurations of systems can be further classified into [4]:
– Deterministic configurations: Allocations in the FPGA can be pre-planned.
– Non-deterministic configurations: An operating system is needed to schedule tasks and control reconfigurable logic fragmentation.
An operating system for run-time context switching is thus necessary for non-deterministic configurations. Although some work has been published on such operating systems, they are far from ready for the commercial market. The main classes of reconfigurable devices available are [3]:
– Single context (most commercial FPGAs)
– Partially reconfigurable (e.g. Xilinx Virtex FPGAs)
– Multi-context (no commercial FPGAs)
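The trade-off in Figure 2 can be made concrete with a simple timing model, assuming each of the k sub-tasks has a known execution time and every context switch costs the same time t_csw. The function names and the timings in the example are hypothetical.

```python
from typing import Sequence


def reconfigurable_time(stage_times: Sequence[float], t_csw: float) -> float:
    """Time to run stages A.1 ... A.k back to back with a context switch (CSW) between
    consecutive stages, as in Figure 2 (k stages -> k - 1 switches)."""
    if not stage_times:
        return 0.0
    return sum(stage_times) + (len(stage_times) - 1) * t_csw


def break_even_csw(t_static: float, stage_times: Sequence[float]) -> float:
    """Largest context switch time for which the reconfigurable version is still
    faster than running task A in one static configuration."""
    switches = len(stage_times) - 1
    if switches < 1:
        raise ValueError("at least two stages are needed for context switching")
    return (t_static - sum(stage_times)) / switches


# Hypothetical timings in microseconds.
stages = [20.0, 20.0, 20.0]
print(reconfigurable_time(stages, t_csw=5.0))             # 70.0
print(break_even_csw(t_static=90.0, stage_times=stages))  # 15.0
```

With these numbers, context switching only pays off if each switch takes less than 15 µs, which is why the switching time has to be short.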

In single context devices, full re-programming is required for any change in the configuration. In partially reconfigurable devices, only the part of the configuration that is changed is written to the FPGA. The undisturbed portion of the device may continue executing, allowing computation to overlap with reconfiguration [3]. A new configuration should be placed on the FPGA where it causes minimum conflict with configurations already present in the device. De-fragmentation may be necessary to consolidate the unused area by moving valid configurations to new locations. In multi-context devices, there are multiple memory bits for each programming bit location [3], providing context switching in one or a few clock cycles. No major FPGA vendor provides such devices yet; however, some IP cores are available [5–7]. These are typically based on an array of processing elements (PEs) with multiple context layers for switching the routing between PEs and the program code of the PEs. Context switching is typically undertaken in a single clock cycle.

2.1 Approaches to Reconfigurable Computing with FPGAs

General challenges of run-time reconfiguration in FPGAs are:
– Reducing the long time required for reconfiguration
– Keeping the system from being inactive during reconfiguration (safe and robust reconfiguration)
– Interfacing between modules belonging to different configurations
– Predictability (reliability and testability) of system operation
Since the configuration bit string is loaded serially into the device, the main problem with switching configurations is the long reconfiguration time. At the moment there seem to be three possible approaches: smaller devices, virtual FPGAs and partial reconfiguration.

Smaller Devices. Since the full reconfiguration time is shorter for smaller devices, reconfiguration time can be reduced by applying smaller devices. Moreover, by applying context switching, we may be able to implement a full system in a smaller device, with the benefit of reduced cost and power consumption. The drawback is that the system would be inactive during reconfiguration.

Virtual FPGAs. A virtual FPGA is based on designing a “virtual” FPGA inside an ordinary FPGA [8]. We have previously introduced an architecture for context switching based on a multi-context “virtual” FPGA [9, 10]. The architecture provides switching between 16 different configurations in a single clock cycle. Such a system will never reach the clock frequency of a leading-edge processor. However, by applying massively parallel processing, its execution time can still be shorter [11]. Although fast processing can be achieved, the context switching architecture requires considerable reconfigurable resources; this architecture thus prioritises speed over cost and power consumption.
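The single-cycle context switch of a multi-context virtual FPGA can be illustrated with a toy processing element. This is a sketch under assumptions: the element is modelled as a k-input look-up table with 16 preloaded context layers, matching the 16 configurations mentioned above, but it is not the architecture of [9, 10], and the class name is hypothetical.

```python
from typing import List, Sequence


class MultiContextLUT:
    """Toy model of one processing element in a multi-context "virtual" FPGA.

    Every context layer holds a full truth table for a k-input LUT. All layers are
    loaded in advance, so a context switch only changes a selector, which hardware
    can do in a single clock cycle, instead of reloading configuration bits.
    """

    def __init__(self, k: int, contexts: int = 16):
        self.k = k
        self.tables: List[List[int]] = [[0] * (2 ** k) for _ in range(contexts)]
        self.active = 0

    def load(self, context: int, truth_table: Sequence[int]) -> None:
        """Write one context layer (done in the background, off the critical path)."""
        if len(truth_table) != 2 ** self.k:
            raise ValueError("truth table must have 2**k entries")
        self.tables[context] = [bit & 1 for bit in truth_table]

    def switch(self, context: int) -> None:
        """Select another context layer: the single-cycle context switch."""
        if not 0 <= context < len(self.tables):
            raise IndexError("no such context")
        self.active = context

    def evaluate(self, *inputs: int) -> int:
        """Look up the output for the input bits (first bit is the MSB of the address)."""
        if len(inputs) != self.k:
            raise ValueError("expected k input bits")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.tables[self.active][address]


pe = MultiContextLUT(k=2)
pe.load(0, [0, 0, 0, 1])  # context 0: AND
pe.load(1, [0, 1, 1, 0])  # context 1: XOR
print(pe.evaluate(1, 1))  # 1
pe.switch(1)
print(pe.evaluate(1, 1))  # 0
```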

Partial Reconfiguration. As FPGA devices grow larger, the configuration bitstream becomes longer and programming time increases. Run-time reconfigurable designs would thus benefit from context switching only a limited part of the FPGA by partial reconfiguration. This feature is available in some FPGAs, where a selected number of neighbouring columns are programmed. Avoiding interruption at context switching requires careful design considerations [12]. One approach that we have started to look at is pipelined downloading of the configuration bitstream [13].

[Figure: device divided into rows; a configuration frame is 1312 bits high, 1 bit wide (20 CLBs high)]

Fig. 3. Illustration of partial reconfiguration in Virtex-4 and Virtex-5 devices.

Another challenge is to limit inter-partition data transfer, i.e. to provide efficient communication between context-switched tasks. While the first FPGAs offering partial reconfiguration required complete columns of the device to be programmed, more recent ones, including Virtex-4/5, require only part of each column to be programmed (see Figure 3). This makes interfacing between tasks and uninterrupted operation easier, since some rows can be used for permanent configurations. The smallest Virtex-5 device (LX30) consists of 4 rows, while the largest (LX330) consists of 12 rows. Further, tools such as PlanAhead have been introduced that make partial reconfiguration easier. The Virtex devices can be reconfigured internally using the Internal Configuration Access Port (ICAP). This will be applied in the architecture presented in the next section. Some work has been undertaken on real-time partially reconfigurable systems, e.g. [14–16]. In the following section, we propose how some of the RC principles can improve the flexibility of evolvable hardware systems, targeting a previously proposed classification architecture.
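Because configuration data is written serially, reconfiguration time grows linearly with the number of frames written. The sketch below estimates this using the 1312-bit frame size from Figure 3; the 32-bit port width, the 100 MHz clock and the frame counts in the example are assumptions chosen for illustration, not device data.

```python
import math

FRAME_BITS = 1312  # one configuration frame (Figure 3): 1312 bits high, 1 bit wide


def config_time_us(frames: int, port_width_bits: int = 32, clock_mhz: float = 100.0) -> float:
    """Rough lower bound on the time to write `frames` configuration frames.

    Assumes one port-width word is written per clock cycle and ignores command
    words, pad frames and read-back, so real figures will be somewhat higher.
    """
    words = math.ceil(frames * FRAME_BITS / port_width_bits)
    return words / clock_mhz  # cycles / (cycles per microsecond)


# Hypothetical region of 100 frames versus a hypothetical full device of 10,000 frames.
print(config_time_us(100))     # 41.0
print(config_time_us(10_000))  # 4100.0
```

The ratio, not the absolute numbers, is the point: reconfiguring a region one hundredth the size of the device takes roughly one hundredth of the time.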

3 A Flexible Classifier Architecture

To provide run-time adaptation, some scheme for automatic design is necessary. So far, evolution has been a much explored method. However, to keep evolution from being too slow, many systems have been based on a virtual FPGA implementation rather than on FPGA reconfiguration. One of the problems related to virtual FPGAs is the large gate overhead, especially for routing signals. A common way to implement routing is by using multiplexers. However, these become large as the signal resolution and the size of the architecture increase. To reduce this problem and make systems more scalable, we introduce alternative architectures in this paper.

3.1 The Original Classification Module

We have previously developed an architecture that has been shown to give high classification performance both for an image application [17] and for a signal processing application [18]. The system consists of three main parts: a classification module, an evaluation module, and a processor. The complete system is implemented in a single FPGA. The processor runs the evolutionary algorithm and configures the other modules. The evaluation module is used for fitness computation and evaluates only a small part of the classifier at a time, since incremental evolution is applied. The classification module, however, has to be complete unless time multiplexing is applied. This module, with its typically large structure, is therefore difficult to make on-line adaptable without a large logic gate overhead (mainly for routing). Thus, in this paper the focus is on the classification module rather than the evaluation module.

[Figure: classification system top-level module; the input pattern is fed to CDM1, CDM2, ..., CDMK, whose outputs go to a max. detector that outputs the category classification]

Fig. 4. EHW classification module.

The classification module operates stand-alone except for its reconfiguration which is carried out by the processor. The classification module consists of one category detection module (CDM) for each category to be classified – see figure 4. The input data to be classified is presented to each CDM concurrently on a common input bus. The CDM with the highest output value will be detected by a maximum detector, and the identifying number of this category will be output from the system.
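The behaviour of the classification module in Figure 4, together with the CDM structure of Figure 5 described below, can be sketched in software. The comparison performed by each FU (element >= threshold) and the tie-breaking rule of the maximum detector are assumptions made for illustration; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FU:
    """Functional unit: selects one data element of the input pattern and outputs one bit.

    The test used here (element >= threshold) is an illustrative assumption.
    """
    element: int    # which data element to use (set by the configuration lines)
    threshold: int

    def __call__(self, pattern: Sequence[int]) -> int:
        return 1 if pattern[self.element] >= self.threshold else 0


Row = List[FU]   # one rule: the outputs of its N FUs are ANDed
CDM = List[Row]  # M rows per category detection module


def cdm_output(cdm: CDM, pattern: Sequence[int]) -> int:
    """AND the FU outputs of each row, then count the activated rows (the counter)."""
    return sum(all(fu(pattern) for fu in row) for row in cdm)


def classify(cdms: List[CDM], pattern: Sequence[int]) -> int:
    """Maximum detector: index of the CDM with the highest count.
    Ties go to the lowest index (an assumption; hardware tie-breaking is not specified)."""
    counts = [cdm_output(cdm, pattern) for cdm in cdms]
    return max(range(len(counts)), key=counts.__getitem__)


# Two categories, N = 2 FUs per row, M = 2 rows, two-element pattern (hypothetical values).
cdm0 = [[FU(0, 50), FU(1, 100)], [FU(0, 0), FU(1, 250)]]
cdm1 = [[FU(0, 5), FU(1, 100)], [FU(0, 0), FU(1, 150)]]
pattern = [10, 200]
print(cdm_output(cdm1, pattern), classify([cdm0, cdm1], pattern))  # 2 1
```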

[Figure: category detection module; the input pattern is fed to M rows of FUs (FU11 ... FU1N, FU21 ... FU2N, ..., FUM1 ... FUMN), each row feeding an N-input AND gate, and the AND gate outputs go to a counter that produces the output]

Fig. 5. Category detection module (CDM).

Each CDM consists of M “rules”, or functional unit (FU) rows (see Figure 5). Each FU row consists of N FUs. The inputs to the circuit are passed on to the inputs of each FU. The 1-bit outputs from the FUs in a row are fed into an N-input AND gate. This means that all outputs from the FUs must be 1 in order for a rule to be activated. The outputs from the AND gates are connected to an input counter, which counts the number of activated FU rows. The FUs are the reconfigurable elements of the architecture. The behaviour of each FU is controlled by connected configuration lines (not shown in Figure 5). Each FU has all input bits of the system available at its inputs, but only one data element (e.g. one byte) of these bits is selected, depending on the configuration lines.

3.2 A Flexible Classification Module

Since evolution is undertaken for one or a few FUs at a time, flexibility is only needed in the classification module, which is applied after evolution. Flexibility could be included either in the design itself (virtual FPGA) or by changing the design, through partial re-synthesis or by having a number of pre-synthesized configurations. Both of the latter approaches could be realized with partial reconfiguration. For most applications, re-synthesis would not be applicable due to the long time needed.

Furthermore, resource planning could become difficult as flexibility increases. We would therefore like to investigate how the flexibility of an architecture can be increased with pre-synthesized configurations. The classification architecture introduced above uses predefined values for N and M; changing them requires re-synthesis. Keeping N and M fixed leads to the most efficient hardware architecture, since flexibility often cannot be implemented effectively. However, in a system-on-chip implementation, there will typically be a limit on the total number of FUs that can be implemented in the device. On the other hand, since the data set changes over time, it would often be impossible to predict the optimal selection of N and M at design time. The following flexibilities could be explored:
– Variance in the number of FUs in each row.
– The number of FUs in each row (N) versus the number of FU rows (M).
– A different total number of units assigned to the CDM of each category.
Below, we first present how variance in the number of FUs in each row could be implemented. Then, we present an architecture able to select the combination of N and M that maximizes performance. The latter approach could be combined with the first to obtain variance in the number of FUs among different rows.

[Figure: FU1 to FU6, fed by the input pattern, connected to three N-input AND gates]

Fig. 6. Example of flexible connection of FUs to AND gates.

Flexible AND Gate Connections. To allow variance in the number of FUs in each row, the architecture in the example in Figure 6 is proposed. Here, some of the FUs are connected to several AND gates, so some AND gates may connect to more FUs than others. This would have to be controlled by an input control in each AND gate that allows or blocks shared FUs. That is, an FU connected to several AND gates could be an input to one unique AND gate or to several. If evolution is undertaken on one AND gate at a time, a unique AND gate connection would be the most appropriate. The more globally FU sharing is implemented, the greater the flexibility in the number of FUs connected to each AND gate. However, this comes at the cost of increased routing overhead (increased use of logic gates). In any case, the processor would have to coordinate the assignment of FUs during evolution.
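The enable-controlled AND gates can be sketched as follows. The wiring of six FUs to three gates is a hypothetical example rather than the exact connections of Figure 6, the convention that a blocked input counts as 1 is an assumption, and the function names are illustrative.

```python
from typing import List, Sequence


def and_gate(fu_bits: Sequence[int], wired: Sequence[int], enabled: Sequence[bool]) -> int:
    """One AND gate with a per-input enable (the 'input control' described above).

    wired:   indices of the FUs physically connected to this gate (fixed at synthesis).
    enabled: one configuration bit per wired input. A blocked input is treated as 1,
             so it no longer influences the AND. A gate with every input blocked
             outputs 1, which the processor should avoid configuring.
    """
    return int(all(fu_bits[i] or not en for i, en in zip(wired, enabled)))


def is_unique_assignment(wiring: List[Sequence[int]], enables: List[Sequence[bool]]) -> bool:
    """True if every FU is enabled on at most one AND gate (one-gate-at-a-time evolution)."""
    used = set()
    for wired, enabled in zip(wiring, enables):
        for i, en in zip(wired, enabled):
            if en:
                if i in used:
                    return False
                used.add(i)
    return True


# Hypothetical wiring of six FUs (0..5) to three AND gates; FUs 2 and 4 are shared.
wiring = [[0, 1, 2], [2, 3, 4], [4, 5]]
enables = [[True, True, False], [True, True, True], [False, True]]
fu_bits = [1, 1, 0, 1, 1, 1]
print([and_gate(fu_bits, w, e) for w, e in zip(wiring, enables)])  # [1, 0, 1]
print(is_unique_assignment(wiring, enables))                       # True
```

In the example, the three rules use 2, 3 and 1 FUs respectively, which is the variance in row length that a fixed N cannot provide.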

[Figure: online evolvable system top-level view; the CPU exchanges configuration, training patterns and fitness with the evaluation module and configures the classification module through ICAP; in the classification module, the input pattern passes through an input interface to CDM1, CDM2, ..., CDMK and on through an output interface to a max. detector giving the category classification]

Fig. 7. The complete evolvable system with partial reconfiguration of the CDMs in the classifier module.

Pre-synthesized Configurations. The most intuitive approach to applying pre-synthesized configurations would probably be to store a number of configurations with different values for N and M, all having the same total number of FUs (N × M), determined by the amount of available logic gate resources in the given device. This could be implemented by the architecture in Figure 7.
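The set of candidate configurations for a fixed FU budget per CDM can be enumerated as below. This is a sketch assuming that every divisor pair of the budget is a usable (N, M) choice, with optional lower limits on N and M; the function name and limits are hypothetical.

```python
from typing import List, Tuple


def nm_configurations(fus_per_cdm: int, min_n: int = 1, min_m: int = 1) -> List[Tuple[int, int]]:
    """All (N, M) pairs with N * M == fus_per_cdm.

    N is the number of FUs per row and M the number of rows in each CDM; each pair
    would correspond to one pre-synthesized partial bitstream for the CDM region.
    """
    return [
        (n, fus_per_cdm // n)
        for n in range(min_n, fus_per_cdm + 1)
        if fus_per_cdm % n == 0 and fus_per_cdm // n >= min_m
    ]


print(nm_configurations(24, min_n=2, min_m=2))
# [(2, 12), (3, 8), (4, 6), (6, 4), (8, 3), (12, 2)]
```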

In this architecture, the CDMs are programmed by partial reconfiguration through the ICAP interface. After the processor (CPU) has evolved a new classifier, the CDMs are configured with the configuration corresponding to the best combination of N and M found. The input and output interfaces (as well as the max. detector) are static but can be updated with data such as the chosen values of N and M. A more direct variant would be to change the LUT (look-up table) content in the partial configuration bit string. However, this could require a low-level study of the design and low-level configuration. We will look more closely at this as part of our future work.
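The processor's role after evolution can be outlined as: score each stored (N, M) configuration, load the partial bitstream of the best one through ICAP, and report the choice to the static interfaces. In this sketch, `evaluate` and `write_icap` are hypothetical callbacks (no real driver API is implied), and the scores in the example are made up.

```python
from typing import Callable, Dict, Tuple

NM = Tuple[int, int]


def select_and_load(bitstreams: Dict[NM, bytes],
                    evaluate: Callable[[int, int], float],
                    write_icap: Callable[[bytes], None]) -> NM:
    """Pick the (N, M) configuration with the best score and load its CDM bitstream.

    evaluate:   hypothetical callback returning, e.g., the accuracy of the classifier
                evolved for that (N, M); higher is better.
    write_icap: hypothetical driver that writes a partial bitstream through ICAP.
    Returns the chosen (N, M) so the static input/output interfaces can be updated.
    """
    best = max(bitstreams, key=lambda nm: evaluate(*nm))
    write_icap(bitstreams[best])
    return best


# Example with made-up scores; the stand-in "ICAP" just records what was written.
scores = {(4, 6): 0.81, (6, 4): 0.87, (8, 3): 0.84}
written = []
chosen = select_and_load({nm: bytes(nm) for nm in scores},
                         lambda n, m: scores[(n, m)], written.append)
print(chosen)  # (6, 4)
```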

4 Conclusions

This paper has given an introduction to reconfigurable computing and to different approaches to implementing it. Further, it has outlined how this technology can be applied in on-line evolvable systems to improve flexibility in the hardware. The challenge in the latter is to include flexibility without re-synthesis while avoiding too large a logic gate overhead. An architecture based on system-on-chip and partial reconfiguration has been proposed, which allows for increased flexibility.

References

1. G. Prophet. FPGAs + the Internet = upgradable product. EDN Europe, pages 28–38, November 2000.
2. J. Torresen. An evolvable hardware tutorial. In Proc. of the 14th International Conference on Field Programmable Logic and Applications (FPL 2004), pages 821–830. Springer-Verlag, LNCS 3203, 2004.
3. K. Compton and S. Hauck. Reconfigurable computing: A survey of systems and software. ACM Computing Surveys, 34(2):171–210, 2002.
4. C. Steiger, H. Walder, and M. Platzner. Operating systems for reconfigurable embedded platforms: Online scheduling of real-time tasks. IEEE Trans. on Computers, 53(11):1393–1407, Nov 2004.
5. http://www.ipflex.com.
6. http://www.elixent.com.
7. http://www.pactcorp.com.
8. L. Sekanina and R. Ruzicka. Design of the special fast reconfigurable chip using common FPGA. In Proc. of Design and Diagnostics of Electronic Circuits and Systems (IEEE DDECS’2000), pages 161–168, 2000.
9. J. Torresen and K. A. Vinger. High performance computing by context switching reconfigurable logic. In Proc. of the 16th European Simulation Multiconference (ESM2002), pages 207–210. SCS Europe, June 2002.
10. K. A. Vinger and J. Torresen. Implementing evolution of FIR-filters efficiently in an FPGA. In Proc. of the 2003 NASA/DoD Workshop on Evolvable Hardware, 2003.
11. J. Torresen and J. Jakobsen. An FPGA implemented processor architecture with adaptive resolution. In Proc. of the 1st NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2006). IEEE, 2006.
12. Two flows for partial reconfiguration: module based or small bit manipulation. Application Note 290, Xilinx, 2004.
13. J. Torresen. Reconfigurable logic applied for designing adaptive hardware systems. In Proc. of the International Conference on Advances in Infrastructure for e-Business, e-Education, e-Science, and e-Medicine on the Internet (SSGRR’2002W). Scuola Superiore G. Reiss Romoli, 2002.
14. M. Hubner et al. New 2-dimensional partial dynamic reconfiguration techniques for real-time adaptive microelectronic circuits. In Proc. of IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI’06), pages 97–102. IEEE Computer Society, 2006.
15. A. Upegui and E. Sanchez. Evolving hardware by dynamically reconfiguring Xilinx FPGAs. In J. M. Moreno et al., editors, Evolvable Systems: From Biology to Hardware, volume 3637 of LNCS, pages 56–65. Springer-Verlag, Berlin Heidelberg, 2005.
16. A. Upegui and E. Sanchez. Evolving hardware with self-reconfigurable connectivity in Xilinx FPGAs. In A. Stoica et al., editors, Proc. of the 1st NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2006), pages 153–160. IEEE Computer Society, Los Alamitos, CA, USA, 2006.
17. K. Glette, J. Torresen, and M. Yasunaga. An online EHW pattern recognition system applied to face image recognition. In M. Giacobini et al., editors, Applications of Evolutionary Computing, EvoWorkshops2007: EvoCOMNET, EvoFIN, EvoIASP, EvoInteraction, EvoMUSART, EvoSTOC, EvoTransLog, volume 4448 of Lecture Notes in Computer Science, pages 271–280. Springer-Verlag, 2007.
18. K. Glette, J. Torresen, and M. Yasunaga. An online EHW pattern recognition system applied to sonar spectrum classification. To be published at ICES’2007.
