An Efficient Algorithm for Online Soft Real-Time Task Placement on Reconfigurable Hardware Devices ∗
Jin Cui1, Zonghua Gu2, Weichen Liu2 and Qingxu Deng1
1 Dept of Computer Science and Engineering, Northeastern University, Shenyang, China
2 Dept of Computer Science and Engineering, Hong Kong University of Science and Technology, China
Abstract

Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), are very popular in today's embedded systems design due to their low cost, high performance and flexibility. Partially Runtime-Reconfigurable (PRTR) FPGAs allow hardware tasks to be placed and removed dynamically at runtime. Hardware task scheduling on PRTR FPGAs brings many challenging issues to traditional real-time scheduling theory, which have not been adequately addressed by the real-time research community compared to software task scheduling on CPUs. In this paper, we present an efficient online task placement algorithm for minimizing fragmentation on PRTR FPGAs. First, we present a novel 2D area fragmentation metric that takes into account the probability distribution of the sizes of future task arrivals; second, we take into account the time axis to obtain a 3D fragmentation metric; third, we use a look-ahead heuristic to find a task placement in the 3D space/time coordinate system in order to minimize fragmentation. Simulation experiments indicate that our techniques result in a lower task rejection ratio and higher FPGA utilization than existing techniques.

1 Introduction

A reconfigurable device, such as an FPGA, consists of a rectangular grid of Configurable Logic Blocks (CLBs) (referred to as cells in this paper) and the interconnects between them. FPGAs are inherently parallel, that is, two or more HW tasks can execute on an FPGA concurrently as long as they can both fit on it. FPGAs can be 1D reconfigurable, where each task occupies a contiguous set of columns, or 2D reconfigurable, where each task occupies a rectangular area. Partially Runtime-Reconfigurable (PRTR) FPGAs, such as the Virtex family of FPGAs from Xilinx [1], allow part of the FPGA area to be reconfigured while the remainder continues to operate without interruption. In other words, HW tasks can be placed and removed dynamically at runtime. In this paper, we use the word FPGA to refer to PRTR FPGA.

HW task scheduling on FPGAs shares many similarities with global SW task scheduling on identical multi-processors, where all processors in the system have identical processing speed and different task invocation instances may run on different processors. But it is actually a more general and challenging problem, since each HW task may occupy an area of a different size on the FPGA, while a SW task always occupies one and only one CPU. Similar to CPU scheduling, we can identify several approaches to FPGA scheduling:

1. For soft real-time tasks with unknown arrival times and execution times, online scheduling with optimization goals such as minimizing the task rejection ratio while guaranteeing that all accepted tasks meet their deadlines [2], or minimizing the deadline miss ratio if no task is rejected.
2. For hard real-time periodic tasks:
• Priority-driven scheduling with well-known algorithms such as Rate Monotonic (RM) or Earliest Deadline First (EDF).
• Static offline scheduling in the time interval with length equal to the hyper-period (least common multiple of all task periods).
We have addressed the hard real-time case in our previous work [3, 4]. In this paper, we focus on the soft real-time case, i.e., online scheduling with the optimization goal of minimizing the task rejection ratio.

The FPGA reconfigurable area forms a rectangle of width W and height H. When a task arrives at runtime, the operating system needs to find empty space on the FPGA to accommodate the newly arrived task. There are three main approaches for managing the empty space on an FPGA device: maintaining a list of Non-Overlapping Rectangles, a list of Maximal Empty Rectangles (MER), or a Vertex List, each with its pros and cons. A MER is defined as an empty rectangle that cannot be fully covered by any other empty rectangle. For example, the FPGA in Figure 1 contains three MERs: A with size 3 × 6, B with size 7 × 3, and C with size 10 × 2. When a task arrives at runtime, if it is accepted, one MER among all MERs is chosen, and one of the
∗ This work was partially supported by National Basic Research Program of China (973 Program) under Grant No.2006CB303000 and the Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China, under Grant No.706016.
four corners of the chosen MER is in turn chosen to place the task, provided the MER is large enough to accommodate the task. Each task Ti is characterized by a tuple of five parameters, (Wi, Hi, Ei, Di, Ai), where Wi and Hi stand for its width and height, and Ei, Di and Ai stand for its execution time, deadline and arrival time, respectively. Tasks are non-preemptive, i.e., a task runs to completion once placed on the FPGA. The scheduler, placer and FPGA area manager are part of an OS running on a CPU. When a new task arrives, the OS acts as a dispatcher and decides whether the task is accepted or rejected based on its deadline and the occupancy of the FPGA area, and if accepted, when and where to place it on the FPGA. In this paper, we address the problem of choosing the MER for task placement to minimize the FPGA area fragmentation, and in turn, minimize the task rejection ratio for online scheduling.

Unlike CPU scheduling, where task context switch overhead is often small enough to be ignored, task reconfiguration on FPGAs carries a significant overhead in the range of hundreds to thousands of milliseconds that is proportional to the size of the area being reconfigured. We assume that no gaps are allowed between a task's reconfiguration stage and its execution stage, so we can treat the reconfiguration overhead as part of the task execution time. It would be interesting to consider separate reconfiguration and execution stages in the placement algorithm, which is left as part of our future work.

We further assume that the entire FPGA area is uniformly reconfigurable without any pre-configured cells, and each task can be flexibly placed anywhere on the FPGA area as long as there is enough empty area to contain it. In practice it is common to pre-configure some cells of the FPGA area for dedicated purposes such as memory, and application tasks cannot be placed on these cells. This situation can be easily handled in our approach by denoting these cells as always in use.
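The task model and the corner-placement rule described above can be illustrated with a minimal Python sketch. The names `Task`, `Rect` and `corner_placements` are hypothetical and do not come from the paper, and the sketch assumes that a task fits in a MER exactly when both its width and its height fit. Coordinates follow the lower-left-origin convention, and the MER position in the usage note is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """The five-parameter task tuple (Wi, Hi, Ei, Di, Ai)."""
    width: int      # Wi, in cells
    height: int     # Hi, in cells
    exec_time: int  # Ei, with reconfiguration overhead folded in
    deadline: int   # Di
    arrival: int    # Ai


@dataclass(frozen=True)
class Rect:
    """A rectangle of cells, identified by its lower-left cell."""
    x: int  # column of the lower-left cell
    y: int  # row of the lower-left cell
    w: int  # width in cells
    h: int  # height in cells


def corner_placements(mer: Rect, task: Task) -> List[Rect]:
    """Return the distinct placements of `task` in the corners of `mer`.

    Returns an empty list if the task does not fit. Fewer than four
    placements are returned when the task spans the full width or
    height of the MER, because corners then coincide.
    """
    if task.width > mer.w or task.height > mer.h:
        return []
    xs = sorted({mer.x, mer.x + mer.w - task.width})
    ys = sorted({mer.y, mer.y + mer.h - task.height})
    return [Rect(x, y, task.width, task.height) for x in xs for y in ys]
```

For a 3 × 6 MER whose lower-left cell is assumed to be (1, 1), a 2 × 2 task yields four candidate placements, while a 3 × 6 task yields a single one. Choosing among the candidates is the job of the fragmentation metric, which this sketch does not cover.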
This paper is structured as follows. We first discuss related work in Section 2. We then present the details of our task placement algorithm in Section 3. We conduct extensive simulation experiments to evaluate the performance of our algorithm compared to existing techniques in Section 4, and finally we discuss conclusions and future work in Section 5.
2 Related Work
Online scheduling algorithms of tasks on multiprocessors or packets in networks have been well-studied. See [5] for a comprehensive survey. Yoo [6] studied efficient online algorithms for task allocation to 2D mesh architectures. This problem bears some similarities to the online scheduling and space management problem for FPGAs, but real-time constraints are not considered. Steiger [2] proposed heuristic online scheduling algorithms with ahead-of-time planning for minimizing task rejection ratio of a taskset with random arrival times while guaranteeing all accepted tasks meet their deadlines. Handa [7] proposed an integrated online scheduling and placement methodology by maintaining empty area as a list of maximal empty rectangles, and delaying scheduling decisions as much as possible
to accommodate dynamically changing task priorities. None of these authors considered the 2D area fragmentation explicitly.

Handa [8] proposed a fragmentation metric to quantify the degree of fragmentation for each MER by summing up the contribution of each empty cell. The Fragmentation Contribution of a Cell (FCC) and the Total Fragmentation Contribution of a Cell (TFCC) are defined as:

$$FCC_{Cx} = \begin{cases} 1 - \dfrac{v_x}{L_x - 1} & \text{if } v_x \le L_x \\ 0 & \text{otherwise} \end{cases}$$

$$FCC_{Cy} = \begin{cases} 1 - \dfrac{v_y}{L_y - 1} & \text{if } v_y \le L_y \\ 0 & \text{otherwise} \end{cases}$$

$$TFCC_C = FCC_{Cx} + FCC_{Cy}$$

where $L_x$ ($L_y$) is twice the average width (height) of the tasks being placed, and $v_x$ ($v_y$) is the number of consecutive empty cells in the horizontal (vertical) dimension of the current cell. Intuitively, TFCC is small if there are a large number of empty cells around the current cell, and the average size of tasks being placed is small, and vice versa. The Total Fragmentation (TF) of a MER is defined as the average value of TFCC for all the cells in the MER. At runtime, a task is placed in one of the four corners of the MER with the largest TF, and the corner is chosen to maximize the TF of the task-sized rectangle.

Tabero [9] proposed another fragmentation metric for the case where a Vertex List is used to manage free space. It measures the fragmentation of each "hole" of empty cells after task placement, which may have more than four corners, instead of each MER:

$$F = 1 - \prod_i \left[ \frac{4}{V_i} \cdot \frac{A_i}{A_F} \right]$$

where $V_i$ is the number of vertices of the hole, $A_i$ is the hole's area size, and $A_F$ is the total size of the free area. This metric penalizes the proliferation of holes as well as holes with irregular shapes and small sizes, e.g., it is perhaps a bad idea to place a task at a location where the remaining free area is cut into two separate holes, compared to placing it at another location where the remaining free area stays as a single hole.

Tabero [10] later developed a task placement heuristic based on 3D adjacency and look-ahead, where all possible task placement locations are ranked to prefer the location where the task sits next to the borders of the free area for as long as possible. The 3D adjacency metric is calculated for all candidate locations at both the current time and the next time point when another task finishes, and the task is placed at the location and time that minimize the metric. This technique is shown to minimize the fragmentation metric in [9] during each simulation run.
However, the fragmentation metric itself is not used for task placement. In this paper, we propose a novel fragmentation metric based on MERs that takes into account the probability distribution of the width and height of future task arrivals instead of just the average value as in [8]. In addition, we take into account the time axis to minimize the Time-Averaged Area Fragmentation (TAAF) during the execution time of the task being placed, and use look-ahead to pick the placement location that minimizes the TAAF among all candidate locations at both the current time and the next time point when one or more tasks finish.
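To make the hole-based metric of [9] discussed above concrete, the following Python sketch evaluates F = 1 − Π_i [(4/V_i)(A_i/A_F)] for a list of holes. It is an illustration only: the function name `hole_fragmentation` and the input format, one `(vertices, area)` pair per hole, are assumptions, and the total free area A_F is taken to be the sum of the hole areas.

```python
from math import prod
from typing import Iterable, Tuple


def hole_fragmentation(holes: Iterable[Tuple[int, int]]) -> float:
    """Fragmentation F = 1 - prod_i[(4 / V_i) * (A_i / A_F)].

    `holes` holds one (V_i, A_i) pair per hole of free cells: its
    number of vertices and its area. A_F is assumed to be the sum
    of all hole areas.
    """
    holes = list(holes)
    free_area = sum(area for _, area in holes)
    if free_area == 0:
        raise ValueError("no free area: the metric is undefined")
    return 1.0 - prod((4.0 / v) * (a / free_area) for v, a in holes)
```

Under these assumptions, a single rectangular hole (four vertices) gives F = 0, and splitting the same free area into two equal rectangular holes raises F to 0.75. This matches the observation that the metric penalizes the proliferation of holes.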
3 Task Placement Algorithm
In this section, we present our heuristic algorithm for task placement. In Section 3.1, we introduce the concept of Cell Fragmentation (CF), which measures each cell's contribution to the overall fragmentation of the FPGA based on its location on the FPGA and the probability distribution of the width and height of the next arriving task. In Section 3.2, we introduce the Area Fragmentation (AF) of a MER, derived from the CF of all cells in the MER, and in Section 3.3 the Time-Averaged Area Fragmentation (TAAF) of a MER, defined as the average value of its AF over a time interval. We introduce an optimization based on look-ahead task placement to further reduce fragmentation in Section 3.4. We present the overall task placement algorithm in Section 3.5, and the algorithm complexity analysis in Section 3.6.
3.1 Cell Fragmentation (CF)

Figure 1. The FPGA at time t.

Figure 2. Updating the Fragmentation Matrix. Panels (a), (b), (c).

Fig. 1 shows a snapshot of the FPGA area with size 10 × 6 at a time instant t. Each cell C is labeled by a tuple (vx, vy, tf). For an empty cell, vx and vy denote the number of contiguous empty cells in the horizontal and vertical directions, respectively, and tf is always 0. For an occupied cell, vx and vy denote the number of contiguous occupied cells in the horizontal and vertical directions that will become empty at the same time, and tf denotes finish time of the task that is occupying it. Instead of formally presenting the algorithm for updating the Fragmentation Matrix upon each task addition and deletion, we use an example to illustrate the procedure. Fig. 2 shows the sequence of steps taken to update the Fragmentation Matrix as tasks are placed on a hypothetical 5 × 1 FPGA. As we can see, the update procedure is incremental and efficient.
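The cell labeling can be illustrated with a short Python sketch for a single row, such as the hypothetical 5 × 1 FPGA mentioned above. This is not the incremental update procedure of the paper: it recomputes the labels from scratch, which is simpler but less efficient. The function name `label_row` and the example finish times are assumptions, and a finish time of 0 is assumed to mean an empty cell.

```python
from itertools import groupby
from typing import List, Tuple


def label_row(finish_times: List[int]) -> List[Tuple[int, int]]:
    """Label every cell of one row with (vx, tf).

    `finish_times[i]` is the finish time tf of the task occupying
    cell i, or 0 if the cell is empty. vx is the length of the
    maximal run of adjacent cells sharing the same tf: contiguous
    empty cells for tf == 0, otherwise contiguous occupied cells
    that become empty at the same time.
    """
    labels: List[Tuple[int, int]] = []
    for tf, run in groupby(finish_times):
        n = len(list(run))
        labels.extend([(n, tf)] * n)
    return labels
```

For example, after a width-2 task finishing at time 7 and a width-1 task finishing at time 9 are placed side by side at the left edge, `label_row([7, 7, 9, 0, 0])` returns `[(2, 7), (2, 7), (1, 9), (2, 0), (2, 0)]`. On a 2D FPGA, vy could be obtained the same way by applying the function to each column.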
We use two arrays of probability values to represent the possible width and height of the next arriving task. Consider an FPGA with width W and height H. The lower left corner of the FPGA area has coordinate (1, 1), and the upper right corner has coordinate (W, H). In other words, the column number grows from 1 to W from left to right, and the row number grows from 1 to H from bottom to top. We divide W into M segments with possibly unequal lengths, and each segment $S_w(m)$ is a set of consecutive column numbers such that:

$$1 \le m \le M \le W, \quad 1 \in S_w(1), \quad W \in S_w(M)$$

We define an array of probability values:

$$P_W = \{ p_w(m) \mid 1 \le m \le M \}$$

where $p_w(m)$ denotes the probability that the next arriving task $T_i$ has width $W_i$ such that $W_i \in S_w(m)$, i.e., its width falls within the m-th width segment. (In this paper, all probability values are limited to two digit precision and multiplied by 100 to avoid floating point arithmetic. A division by 100 (x/100) is implemented with a bit shift operator x