scheduling and placement. These services have to be fulfilled in real time according to the application constraints. Moreover, such an operating system also ...
Exploring RTOS issues with a high-level model of a reconfigurable SoC platform François Verdier, Jean-Christophe Prévotet, Amine Benkhelifa
Daniel Chillet, Sébastien Pillement
ETIS – UMR CNRS 8051 ENSEA / Université de Cergy-Pontoise
IRISA ENSSAT / Université de Rennes I
Abstract— Reconfigurable resources are more and more envisaged inside System-on-Chip designs for facing with embedded computing power constraints. Moreover, a high level of flexibility of such platforms is required and can only be achieved with embedded software and real-time operating systems (RTOS) services. Unfortunately, this leads to very complex and heterogeneous platforms that cannot be easily validated nor modeled with classical design tools. This paper presents the objective and the proposed methodology of the OVERSOC project. This project aims at developing a global methodology and the associated tools for a design space exploration of interactions between embedded real-time operating systems and reconfigurable SoC platforms.
I. I NTRODUCTION Nowadays, algorithmic complexity tends to increase in many domains such as signal, image processing or control. In parallel, embedded applications require a significant power of calculation in order to satisfy demanding real time constraints. This leads designers to opt for hardware architectures composed of heterogeneous, optimized computation units operating in parallel. Hardware components in SOC (System on Chip) may implement programmable computation units, dedicated or reconfigurable units, or even dedicated data-paths. In particular, reconfigurable units (e.g FPGA, DART [1] . . . ), denoted here as Dynamically Reconfigurable Accelerators (DRA), allow an architecture to adapt to various incoming tasks. The Platform-Based Design methodology [2] may thus be fully respected, adding flexibility to an architecture towards modifications of its functionality or even the environment (changing standards, mission statements, etc.). This flexibility is provided by the dynamic allocation of different and dedicated processing operators within the DRA. On the other hand, such heterogeneous architectures need even more complex management and control. In this context, the utilization of an RTOS (Real Time Operating System) is more and more required to furnish services such as communications, memory management, task scheduling and placement. These services have to be fulfilled in real time according to the application constraints. Moreover, such an operating system also provides a complete design framework which is independent of the technology and of the hardware architecture, drastically reducing the time to market. Embedded RTOS for SOCs has been of great interest and has led to several significant studies. In the context of reconfigurable architectures, a study of [3] has determined and classified the different services that operating systems
should provide to handle reconfigurability. Today, two different approaches have emerged. The first consists in utilizing an existing standard RTOS (RTAI [4], RTLinux, VxWorks, . . . ) and in adding functionalities dedicated to the management of the reconfigurable resources [5], [6]. The second is to develop a specific RTOS from scratch by implementing the necessary functionalities devoted to the management of the reconfigurable part [7], [8], [9]. Successful design of such complex and heterogeneous reconfigurable systems cannot be achieved when starting from scratch. A number of design choices have to be done prior any implementation. We advocate for utilizing a high level model of Reconfigurable SoCs (RSoC) in order to explore critical design choices. Among these important choices are the software architecture of an embedded RTOS (centralized, distributed, OS services organization), the interactions between OS service functions and underlying resources (reconfigurable parts, memory, interconnect busses) and the software programming model. The OVERSOC project presented in this paper aims at developing an open framework for an early exploration of these issues. An UML model of a reconfigurable SoC has been first proposed. The OVERSOC project consists in developing a SystemC-based modeling and exploration environment for RTOS vs. RSoC assessments. Remaining sections of this paper are organized as follows. Section II gives an overview of academic works in modeling SoC designs, reconfigurable hardware and embedded software. Section III briefly introduces the architecture of the concerned RSoC platform and Section IV gives the previously developed corresponding abstract model. Section V discusses the OVERSOC project and methodology. Section VI finally gives directions of future work. II. R ELATED WORKS Some recent works have described modeling tools for SoC design space exploration. Synopsys contribution to communication modeling in SystemC has led to the development of an interesting Transaction Layer Model (TLM) class library. TLM communication channels help refining communication in SoC models by inserting, in a progressive way, timing delays and information. [10] uses these models and describes a methodology for SoC architectures exploration. A real-time scheduling and OS architecture simulation framework is described in [11]. This project mainly extends
the SystemC 2.0 class library for its usage in real-time scheduling simulation. Classes are described for modeling dynamic process creation, process control, task scheduling and preemption. This environment has been used for the simulation of a mobile robot application [12]. Since then, some useful dynamic process creation primitives have been added to SystemC (v2.1) [13] that can considerably help software modeling. Modeling the behavior of a DRA is still very complicated due to the lack of dynamic hardware module creation primitives in the SystemC simulation kernel. The ADRIATIC project [14] addresses this problem and proposes a solution for modeling dynamically reconfigurable parts. This project uses a static approach for modeling dynamic IPs: all possible hardware tasks are statically created and encapsulated in a single SystemC module representing a reconfigurable fabric. Dedicated methods are used for managing input and output channels and hardware context switching. However, this project does not address the problem of how dedicated OS services could handle these reconfigurable fabrics. Some recent works have also explored the problem of managing data and control communications between dynamically allocated IPs within a reconfigurable device. Dedicated functionalities and topologies of dynamic interconnection networks are presented in [15] and [16]. These works are based on a two-dimensional allocation of dynamically reconfigurable resources. One can notice that in the case of a 2D mapping approach, an additional complex defragmentation should be envisaged and has also been studied [17]. III. A RCHITECTURE OF A RS O C PLATFORM It is assumed that SoC architectures are heterogeneous. The wide variety of hardware elements of such architectures is depicted figure 1. GPP
....
GPP
DSP
µCont.
Interconnection network
MEM
Dedicated accelerator
I/O Proc. Core
configuration
Fig. 1.
IP#1
IP#2
DRA1 (FPGA)
process #1
process #2
DRA2 (Data−Path)
Example of a highly heterogeneous reconfigurable SoC platform
The hardware elements composing such RSoC can be summarized as follows: • One or several general purpose processors (GPP). One processor is dedicated to executing the main part of the operating system. • One or several specialized processors (RISC, DSP, microcontroller). • Hardware accelerators or processing IPs such as filters, convolvers, VITERBI decoders, etc.
Memory resources and input / output interfaces. One or several DRAs as described later. • A set of communication resources (busses, hierarchical busses, communication network). A specific network for the transport of configuration information may also be necessary for parallelization between data and configuration transfers. It is also assumed that the processing operators have different execution models (sequential, parallel, pipeline, data-flow, etc). The main difference between a classical SoC architecture and a reconfigurable SoC platform resides on the existence of reconfigurable areas. Recent evolutions of both reconfigurable technologies and platform-based design concepts have ensured the availability of such synthesizable reconfigurable IPs. Basic structures and grain of these IPs depend on the application: • Fine-grain reconfigurable areas (FPGA-like) for control and bit oriented processes (ATMEL FPSLIC structures [18] for example). • Coarse-grain areas for multimedia oriented applications mainly implemented with arithmetic computations (PACT XPP [19] architecture). • Mixed solutions associating FPGA areas and reconfigurable data-paths (DART architecture [1] for example). The support of dynamic allocation of multiple dedicated processes on these reconfigurable accelerators is the common characteristic of the considered architectures. In this context, dynamic allocation means that the starting time of processes is unknown at compile time. This typically will appends in ambiant intelligent applications for example. • •
A. Application partitioning Due to their complexity (multimedia, telecommunication, networking), applications are decomposed into multiple elementary tasks. These applicative tasks represent some computation without any knowledge about the effective execution targets. In order to take into account the heterogeneous structure of the target architecture we have proposed a general model based on the implementation of each applicative tasks into software and/or hardware tasks (see figure 2). A software task is a portion of code (instructions) that can be executed by one of the RSoC processors. This processor may be one of the physically available processing nodes (GPP, RISC, DSP, etc) or a soft-processor core previously configured within a DRA. A hardware task is a dedicated hardwired process directly supported by a physical IP or a hardware module configured within a DRA. B. Application mapping on the RSoC The global software architecture is organized around the operating system. One of the OS services is responsible for mapping (i.e. allocating and placing) the different software and hardware tasks onto the available targets. Following the previous architectural and software descriptions, different tasks mapping levels can be defined on the RSoC platform. In the
Application
multi target compilation
•
T1 Synthesis target 1 target 2 target 3
T2
IP
T3 hard/soft partitioning
T4
Architecture exploration
HW SW or
T5
IV. M ODELING THE PLATFORM
HW
Task graph
SW
longing to this level (dedicated synthesized IPs, processor cores, filters). Configuring a hardware task within the DRA requires a specific download of configuration data (the H arrow in figure 3). The model features multiple configurations for a unique hardware task offering this task different forms and performances. The level 2 contains the set of software tasks to deploy on the GPP or on the processing cores within the DRA. Allocating a task on such processor is considered as a context switching (the S arrow). When the target is a processor within the DRA, an additional hardware configuration step may be necessary (it corresponds to the configuration of the processor core itself). As for the hardware tasks, a software task may come in a variety of compiled codes according to the different processor targets. By doing so, heterogeneous migration of tasks is feasible.
Fig. 2. The RSoC model supports different implementation schemes for applicative tasks (synthesis of hardware modules, hard/soft partitioning, multiple software compilations and different implementations solutions).
example illustrated in figure 3, a minimal RSoC platform composed of a GPP associated with a DRA is under concern.
A graphical model based on UML has been chosen for a sake of clarity in the description of all the elements in the RSoC architecture: the applicative tasks, the hardware and software elements and the interactions between them. The corresponding UML class diagram is depicted in figure 4. The presented model guarantees the independence between the application specification and its hardware implementation. The model is divided into three layers. A. The application layer
Task B SW 1
S Task D HW
Task E HW
Task C SW 1/2
Level 2 (Software Tasks)
S
Task A
S
Proc. Core
Level 1 (Hardware Tasks)
H H H
Processor Level 0 (Physical)
H Form 1 Form 2
Memory
Dunamically Reconfigurable Accelerator
Fig. 3. entities
The different mapping levels depend on the different processing
The different mapping levels are decomposed as follows: • The physical level (level 0) corresponds to the physical architecture of the RSoC. The set of targets contains the DRA, the general purpose processor, a communication channel (not shown in Fig. 3) and memories. This level is composed of resources (memory, data bandwidth, reconfigurable and CPU time) that can be allocated to the applicative tasks. • The level 1 corresponds to the hardware tasks utilizing the physical level resources. In particular, all of the hardware objects to be configured on the DRA constitute tasks be-
The application layer describes the applicative tasks (ApplicativeTask class). In this layer, software designers represent applications without taking into considerations the hardware implementation. This representation implies that the designer has previously broken up the application into tasks and is able to describe each one of them along with their dependencies (cf. figure 2). This problem is out of the scope of this paper and we assume in the following that this decomposition has already been performed [20], [21]. An applicative task is specified through the typical parameters used in the real-time context: identification, priority, period, termination time, ready-time and whether the task is preemptible or not. Control and data dependencies are also specified in this layer. In particular, tasks can produce and consume data to share information and to store internal temporary variables as well. These tasks properties are explicitly described in this model (DataAccess class). The amount of memory needed by each task is indicated in the Data class. In this layer the designer may also specify the Quality of Services (QoS class) associated with each tasks. QoS can be defined as the purpose of matching some constraints (realtime, bandwidth or power consumption) and must be adapted by the designer to a particular application. Obviously, these constraints can be respected only if lower layers elements can support particular execution schemes. QoS specification is out of the scope of this paper.
, i P it id , Com for UML ommun nity Edit idon for Edition n Pose for UML seidon UML, C n Pose Commu y Editio oseidon y Editio ition Po r UML, idon for , Comm mmunit P it d e fo o n E s n u C n o y , o io m P it L it n id m u M e for UML o Ed Edition seidon , Comm on for U UML, C ion Pos munity o L it r id m P d M e fo o E s U n C n o y r , o io P it it n id n n fo L, Com for UML n Pose unity Ed Commu y Editio Poseido n for UM oseidon r UML, , Comm mmunit ity Editio Edition Poseido dition P ML, Co eidon fo for UML ommun E s U n C n o y r , o io P it L it fo n id d n u M e n E s U m Com y itio ido L, Com ition Po r UML, idon for mmunit n Pose unity Ed n Pose unity Ed ML, Co eidon fo n for UM y Editio s , Comm o U io m o L it r id m P d M e fo o E s U n C n o y r , o itio nit id n P idon fo L, Co for UML n Pose unity Ed Commu y Editio n Pose n for UM oseidon r UML, , Comm mmunit ity Editio ty Editio Poseido dition P ML, Co eidon fo for UML ommun E s U n C n o y r , o io P it L it fo n id d n u M e n s io m Co yE ido n for U L, Com ition Po r UML, nity Edit mmunit n Pose Poseido n* SucceededBy unity Ed ML, Co eidon fo Commu n for UM s , o U io m ity Editio o L it r id m P d M e fo o E s U n C n o * ity r io P ido ML, ,C eidon fo ommun nity Edit Edition n Pose on for U for UML UML, C ion Pos munity Commu ApplictiveTask Poseid ity Editio oseidon L, Com ition r UML, Data DataAccess nity Edit idon for mmunQoS P d M e fo o E s U n C n o y r , o io P it L it fo n id d n E UM Pose eidon −Size:int 1..* ommu −Id:int y Editio −RealTime:float dition UML, C munity idon for unit−Pipeline:boolean E −Priority:int UML, C ion Pos L, Com idon for * n Pose Comm1..* n for (read munity −AverPower:float / write) nity Edit 1..*om −Period:int n Pose C n for UM oseidoTransfer y Editio r UML, , o io P +AccessCost():float it L it fo n id d n u M e n E s o U io m o − InstantPower:float y it r id , P fo om −Deadline:int nit se ity Ed for UML Edition seidon UML, C mun+ReadData():void Commu ition Po −AppQuality:float −ReadyTime:int munity oseidon L, Com ition Po r UML, +WriteData():void idon for m P d unity Ed M e fo o E s U n C n o y r , o io 1..* P it L it fo n id n u e n +ComputeQoS():void nity Ed * n for UM , Comm −−Type:int UML, ion Pos Poseido ity Editio Preemption:boolean Commu layer ,Application Poseido 1..* for UML ommun nity Edit idon for Edition for UML Edition seidon UML, C munity n Pose Commu n o y r , o io P it L it fo n id d n u M e n E s o U io m o it r P ity Poseid L, Com unity Ed eidon fo for UML ommun Edition Edition n for UM seidon , Comm UML, C ion Pos munity munity Poseido L, Com ition Po for UML nity Edit idon for d n u M e n ControleExecutionOf E s o U io m o y it r id m P it d e L fo o n E s n u C Po ity n for UM oseidon r UML, , Comm ity Editio ommun Edition Poseido dition P eidon fo for UML ommun E s UML, C munity n C n o y r , o io P it L it fo n id d n u M e n E s o U * y id Po Editio , Comm idon for * mmunit n Pose for UM Edition munity for UML n Pose ML, Co y Editio seidon , ComTransport munity seidon on for U RTOS Layer mmunit PoSoftwareTask ity Editio L, Com ition Po for UML n id d n u M e n E s VirtualMachine HardwareTask o U io m o y it r id m P it d e * fo o n n u Execute Store ity E L, C n Pos seidon n for UM y Editio r UM , Comm ommun −LoadUsage:float −Load:float −ResourceUsage:int y Editio ition Po mmunit Poseido eidon fo for UML UML, C mmunit * unity Ed ML, Co Edition seidon U−Form:float[] ion Pos m o y it r idon for m P it d e fo o n E s n u C n +AllocLoad():void o y , o io m P it it rU n id n Com for UML n Pose unity Ed eidon fo +ChangeForm():void Commu +FreeLoad():void y Editio r UML, OS o* seidon y Editio* r UML, , Comm on fo−AnArchi:RSoC ion Pos mmunit P it L it fo n id d n u M e n E s o U io m o y it r id m P it d fo o n E n n Pose for U seidon UML, C munity Commu * y Editio −TheTasks:ApplictiveTask ity Editio oseidon L, Com ition Po r UML, idon for mmunit ommun dition P n Pose unity Ed ML, Co eidon fo n for UM E s o U io m o y it r id m +Schedule():void P it d e fo o n E s n u C n o y , U m itio ido Execute unit UML ion P +CompactFormServices():void L, Com idon for Task n Pose unity Ed CommProtocol nity Edit idon for n Pose n for UM oseConfigurationContext r UML, , Comm ity Editio Commu Poseido +MemoryServices():void ity Editio dition P eidon fo for UML ommu−nState:int n E s n u C n o y , o io m P it L it r n id +InterconnectServices():void m d n u M e fo o U io m n yE ,C −Priority:int −CfgTime:float ion Pos L, Com nity Edit idon for mmunitOSTask Poseido for UML nity Edit * 1..* n Pose−CfgSize:int u−InterMigration:boolean ML, Co eidon+PowerEstimationServices():void Commu n for UM +Send():void Edition +PerfEstimationServices():void , Comm −IntraMigration:boolean Poseidon for U ion Pos munity Poseido ity Editio L it n m d n u M o E +Receive():void U io m C − Localisation:void y it r , o m it d L fo o ComposedOf +QoSServices():void id n for un ion nity E 1..* −NbCycles:int seidon UML, C n Pose Comm+xxxServices():void n for UM nity Edit Commu oseido+DVSServices():void y Editio ition Po +SaveContext():void L, Comm−uBreakPoints:int[] r UML, idon for P it d e fo n E s n u n o y 1..* o io m P it it n id m d +StartTaskService():void fo E se UM ion mu +RestoreContext():void L, Co −attribute_4:int ition Po Execute seidon , Com mun*ity Execute nity Edit idon for n for UM Support L, Com ition Po for UML nity Ed ido+StopTaskService():void n Pose Commu n Pose +ContinueTaskService():void unity Ed Commu n for UM oseidon y Editio , o io m P it L it n id m d n u M e o E s U io m C n o y it r , fo om nit ido f +OtherServices():void n P unity Ed for UML seidon UML, C n Pose Commu y Editio Execute , Comm oseidon y Editio ition Po r UML, idon for mmunit P it d e fo o n E s n u C n o y , o io m P it L it n id u e om Edition unity Ed n for UM seidon , Comm UML, C ion Pos munity , Comm Poseido ition Po for UML nity Edit idon for L, Com for UML Edition seidon n Pose unity Ed Commu n o y , o io m P it L it n id m d n u M e o E s U io m C n o r , P ity om * y Edit Poseido eidon fo for UML ommun Edition UML, C mmunit seidonArchitecture layer munity Edition UML, C ion Pos munity o it r idon for m P d ML, Co e fo o E s n C n o y , o io P it it n id n SoftwareTarget ProcessingTarget UML L, Com n Pose HardwareTarget unity Ed eidon Commu y Editio idon for n for UM ose1..* y Editio r UML, , Comm ion Pos mmunit −Load:float −Vdd:float[] −GlobalForm:float[] * mmunit Poseido dition P ML, Co eidon fo for UML o nity Edit E s n u C n o y , o io m P it L it n id m d n u M e − Frequency:float[] o E s U m y ,C n Po +AllocForm():void +AllocLoad():void y Editio L, Com idon for mmunit Poseido for UML −Power:float[] ity Editio mmunit n Pose ML, Co n for UM +FreeLoad():void +FreeForm():void Edition seidon ML, Co ommun on for U munity Poseido ity Editio L, C−Preemption:boolean ition Po n id m d n u M e o E s U io m C o y it r , m P it d L fo Connect o n ido n u C ity E n Pose n for UM * 1..* oseidon Include r UML, , Comm ity Editio ommun Poseido ity Editio dition P eidon fo for UML ommun n E s UML, C n u C n o y , o io m P it L it n id m d n u M e o U C yE omm n Pos ity Editio r UML, idon for mmunit Poseid UML, C y Editio ommun n Pose ML, Co eidon fo Edition idon for mmunit UML, C , CoTarget on for U ion Pos munity PoseMemoryTarget ity Editio L it n id m d n u M e o E s U io m C o y it r , m P it ExecuteTasksOn d fo i , Co ion mun ity E −Hierarchy:int for UML seidon n Pose * L L, Com for UM ommun nity Edit InterconnectTarget ition Po oseidon ity Editio seidon UML, C Commu n for UM dition P n Po−Size:int unity Ed ML,+Alloc():void eidoInclude ommun E s U io m C o y it r , m P it d −o Size:int L fo Connect n E n u M e C n y s , U m itio nit ido Po +AllocMemory():void +Free():void L, Com for UML +AllocBandwidth():void idon for n Pose unity Ed Commu Edition n Pose n for UM oseidon r UML, , Comm munity P+FreeMemory():void ity Editio Connect Poseido ity Editio L, Com for UML ommun n +FreeBandwidth():void Edition n u M C n y , o U io m it L it r n id m d u M e fo o s n Pose C yE for U , Comm oseidon y Editio ition Po r UML, idon 1..* mmunit or UML mmunit dition P n Pose unity Ed ML, Co eidon fo o 1..* E s U io m C o y it r , m P it d L fo o n E n u M C n s , U io ido unity , Comm ition Po for UML nity Edit idon for n Pose , Comm for UML seidon n Pose unity Ed Commu y Editio or UML oseidon y Editio ition Po r UML, , Comm mmunit P it RSoC d L fo o n E n u M C n y , o U io m it o L it r n id m P d u e fo o E UM n Include seidon , Comm UML, C ion Pos munity y Editio idon for L, Com ition Po for UML nity Edit idon for mmunit n Pose n Pose unity Ed ML, Co Commu n for UM y Editio , o U io m it L it Include r n id m d u M e fo o E s U m C n o y r , ion Po fo nit ido n P , Com for UML nity Edit seidon n Pose Commu y Editio for UML Commu oseidon y Editio ition Po r UML, , mmunit P it d L fo o n E n u M C n y , o U io m it L it r n id P e om UM mu y Ed eidon fo Edition UML, C ion Pos L, Com idon for mmunit ion Pos munity nity Edit idon for n Pose ML, Co n for UM Posemodel L, Com for Udiagram nity Edit n u M Commu n y Editio , o U io m it L it r n id m P Fig. 4. Class of the general RSoC d u M e fo o E s U m n C n r , m Po ity Poseido ity Editio ML, Co eidon fo for UML ommun Edition ommun Edition seidon on for U UML, C ion Pos munity C o y it r , m P it d L fo o n E n u M C n y , o U io m it L it r id mmun L, Com n Pose unity Ed eidon fo n for UM Edition ML, Co n for UM , Comm ion Pos munity Poseido ity Editio on for U Poseido L, Com for UML ommun nity Edit Edition n u M C n y , o U io m it L it r n id m d u M e fo o E s n C Po for U omm unity of the RSoCn fo seidon r UML, ity Editio target and isPophysically linked eidonlayer omm Edition UML, C B. The architecture eido operating system ommun Edition UML, C ion Pos munity don for UML, C ion Pos munity L, Com it r nity Edit idon for m d u M e fo o E s U m C n o y r , o io m P it L it fo o id It also supports n n Ed n UM munall elements in othe UML, C n Pose the execution Comto n forplatform. munity Poseidolayer contains ity Editio The architecture don for r UML, Poseid ity Editio L, Com foobjects ommunall the hardware n Edition n u M C n y , o U io m it L it r n id m d u M e fo o s U of software tasks. The auxiliary processorsPosgroup C yE ion omm eidon contains ition Po (level 0 of r UML, idon for mmunit nity Edit UML, Cthat are physically inunthe ity Edplatform n Pose implemented ML, Co eidon fo Edition Commu idon for , Comm on for U UML, or all other software targets ion Pos processing munity (for exampleongeneral ity Editio L it r n id m d u M e fo o E s U m C o y r , m P it itio fo n Co includes memory and Poseid for UML Edition seidon the processing unity Ed Commu y targets, r UML, figure 3). It Edition oseidon such as DSPs, y micro-controllers, ition Po r UML, , Comm dedicated architectures etc.). mmunit P it d eidon fo L fo o n E n u M C n y , o U io m it L it r n id m d fo UM yE resources. n Pose ML, Co Commu , communication oseidon y Editio idon for mmunit y Editio on for U software tasks ion Pmay itand for UML support sometimes mmunit n Pose ML, CoThese targetsn also mmunit Poseid seidon unity Ed ML, Co Co(ProcessingTarget n for Uare y Editio , o U io m it L it r n id m d u M e fo o E s 1) Processing Targets: They class) U m C n o y r , L it fo ido n P , Com r UM mmunit a micro-kernel fo require for multitasking. n Pose unity Ed Co oseidon y Editio for UML ion Pparts, oseidon r UML, , Comm mmunit elements (Softwareity Editio seidon divided the processor-like ity Edittwo dition P uninto ML, Co eidon fo for UML ommun E s U m C n o y r , o m P it L fo o n id d n u M e C n s , U io yE i oseido , Comm ition Po for UMLTarget class) nity Edit idon forclass: The hardware mmunit ion P the hardware targets (HardwareTarget for UML b) HardwareTarget processing tarn Pose unity Ed ML, Co Commu nclass). oseidon y Editand , o U io m it L it r n id m d u M e fo o E s U m C n o for unity seido , Com UML, ion P ommare nity Ed Editsupply n Pothe n for groups: Poseid for UMLThese two parts beondefined , by different voltages, gets divided into the DRA and dedicated seidotwo UML, C ionmay munity Commu y Editio oseidon L Com ition Po r UML, nity Edit idon for mmunit d u M e fo o E s U m C n o y r , o m P it L fo o n id Ed L, C ition idon frequencies. The Pose of , modules. Commu n for UM A DRA consists in a dynamically ion reconfigurable munity Poseand ity Edrepresentation n for UMpower consumptions Poseido L, Com for UML ommun nity Edit Edition n u M C n y , o U io m Poseido it L it r n id m d u M e fo o E s U m C n r unitygrain of which n fotake these into account such L,area, L, Comparameters allows ition Po to the structure r UML, Poseido of the eidoto focorresponds ommthe nity E y Edtechniques n for UM Edition seidon UM C ion Pos Commu mmunit munity Poseido L, Covoltage ition Po The proposed r UML, nity Editscaling of the idon for mmodel d u M e fo o E s U m C n o y as the dynamic supply (Dynamic Voltage r , o m P it L fo available operators. allows to consider o n id n u n L, C r UM n Pose unity E , Comm Poseido ity Editio eidon fo n for UM , Comm ity Editio for UMLall forms of ommun Edition ion Pos Poseido Scaling). for UML and furnishes ommun a generic seoverview Editgranularity seidon UML, C munity C n o y r , o m P it L fo o n id n u M C n , o U io m o L id ity for y Edit L, Com dition P n Poseclass: In theCocurrent n for UM ommun oseidonEach DRA m mmunit model, the id unity E n for UMof reconfigurable a) SoftwareTarget RSoC may implement several , UML, C ion Pareas. Poseido ity Editio Pose o L, Com for UML ommun nity Edit idon for n u M e C n s , o U io m o L it r id m P d M e fo o n E s U n u ity Po L, C nity The other proatmuleast. tasks according The dedoseidon to its available y Editio r UMhardware idon for architecturenitfeatures , Comm itresources. om Edition a main processor dition P n Pose eidon fo for UML ommun E s UML, C mu y C n o y r , o m P it L fo o n id n u M e C n s , o U m Po ditio in the plateid not play a particular r UML ones) icated have ML, Communit L, Commodules are Poptimized idon for for a given n Posdo unity Erole Edition ityfunctionality, eidon fo cessors (auxiliary n ose n for UM , Comm ity Editio ommun on Pos on for U Poseido ity Editio L, Cwithin for UML ommun n id n u M e C n s , o U io m o L it r id m P d form management. The main processor is the main processing access to communication channels the chip but cannot M e fo o s n uni C yE rU oseidon ition Po r UML, , Comm mmunit ity Editio eidon fo dition P unity Ed ML, Co eidon fo for UML ommun E s on Pos U m C n o y r , o m P it L fo o n id n u M e C n s U io ido r UML, , Comm ition Po nity Edit idon for mmun n Pose eidon fo for UML n Pose unity Ed ML, Co Commu y Editio ion Pos oseidon y Editio r UML, , Comm on for U mmunit P it L fo o n id n u M e C n s , o U io m o L it r id m P d fo o n E se n for UM seidon UML, C munity Commu y Editio ition Po seidon L, Com ition Po r UML, idon for mmunit unity Ed tion Po n Pose unity Ed ML, Co eidon fo n for UM s , Comm o U io m o L it r id m P d M e fo o E s U n u C n o y r , m itio fo nit ido n P L, Com for UML n Pose unity Ed Commu oseidon y Editio n for UM oseidon r UML, , Comm mmunit ity Editio dition P Poseido dition P ML, Co eidon fo for UML ommun E s U n C n o y r , o io m P it L it fo n id m d n u M e n o E n for U oseido , Comm UML, C ion Pos munity ity Editio Poseido dition P L, Com for UML ommun nity Edit idon for
be reconfigured. Examples of dedicated modules are typically predefined IP cores. 2) Memory resources: The architecture layer model also takes into account the memory organization (MemoryTarget class). Notably, memory hierarchy constitutes a key challenge in RSoCs [22]. Several studies are currently made in order to handle these aspects and should lead to a significant improvement of the corresponding part in the model, in a near future. 3) Communication resources: In the architecture layer, the role of communication resources is to define the hardware support for transactions between the different elements within the platform. A first type of interconnect deals with the communication between the main processor, the dedicated modules and the DRA. This type of interconnect is static and optimized during the design phase. The second type concerns interconnections between hardware modules within a DRA. It constitutes a real challenge and one of the most delicate aspects of the model. Indeed, the interconnections must structurally and functionally adapt according to the different configurations and consequently guarantee flexibility. This key point has motivated several studies on the implementation of NoC (Networks on Chip) for Reconfigurable SoCs [23], [24], that provide flexibility and quality of services. C. RTOS layer The RTOS layer contains all the objects which are scheduled by the OS i.e. all the tasks and their characteristics related to the hardware. At this level, the interactions between an application and an implementation on a reconfigurable platform are specified. This level also describes the different configuration tasks, the communication protocols that could be used and the OS which manages all these objects. 1) Tasks and their execution supports: Tasks (Task class) are the main elements of the RTOS layer. In this level a task may be considered in three different ways: (SoftwareTask class, HardwareTask class) and a third type to define the OS tasks (OSTask class). Software tasks (SoftwareTask class) run on RSoC processors (SoftwareTarget class). They are classical tasks scheduled by the OS on these targets. Hardware tasks (HardwareTask class) can either run on the DRA (HardwareTarget class) or on the dedicated modules that are present in the platform. These tasks are mainly efficient implementations of applicative tasks. In addition, some particular hardware tasks can execute software tasks. Examples are processor cores configured within the DRA (Altera NIOS, Xilinx MicroBlaze. . . ) or programmable data-paths. They are denoted as virtual machines (VirtualMachine class). The virtual machine, at least, contains a micro-kernel devoted to manage the machine, to start the execution of the loaded tasks and to indicate their state. This micro-kernel may be straightforward (a state machine for small virtual machines) or complex (a local RTOS layer for software processors). In the second case, this local kernel is also in charge of the scheduling and the management of the local resources.
Moreover, each hardware or software tasks require a configuration context step (ConfigurationContext class) in order to be executed on a target. When a task executes on a processor, this context is similar to the context of a classical software task (program counter, stack pointer. . . ). In the case of a hardware task, the context can be more complex and contains the whole configuration sequence of the hardware target (e.g. a FPGA bitstream). 2) Communication protocols: The RTOS layer exhibits the link between the data transfers and the hardware target (Protocol class). The data are transferred via memory accesses which are managed by the protocol. 3) OS: The operating system (OS) with its set of services belongs to this layer. Each service is represented by a particular system task (OSTask class). The OS services related to the management of the platform have been identified. Some of them are slightly modified due to the presence of DRA. Others are exclusively dedicated to the management of the reconfigurable areas. In the following, we outline the main services of an OS for RSoCs: a) Tasks selection: The proposed model permits to specify the execution support (hardware or software) of some applicative tasks. The OS scheduler dynamically selects an execution support according to the RSoC operating state (CPU charge, timing constraints, amount of free resources on a DRA, etc). b) Tasks migration: A task may be authorized to migrate from a software or a hardware target to an equivalent target (IntraMigration) or to a target of a different type (InterMigration). In this case, it is assumed that preemption of software and hardware tasks is possible. Software preemption is straightforward. However, preemption of hardware tasks involves defining behavioral breakpoints (with the corresponding context storage) [6]. c) Tasks placement and configuration: If the choice of the OS has led to the placement of a task on a DRA, a specific service is invoked. For example, if a software task has to be executed on a virtual machine, then the configuration phase of this machine may be avoided if it is already present on the DRA. d) Resource management: As for a classical processor, an OS for a RSoC must maintain a resources occupation table. Resources are dynamic in a RSoC due to the fact that they evolve according to the performed reconfigurations. Additional details have to be foreseen on the amount, the shape and the location of the occupied DRA resources. e) Tasks communication: This major service is essential. Two different types of communications must be handled: global RSoC communications through the static interconnection network and local reconfigurable communications within the DRAs. Indeed, the dynamic reconfiguration of operators within the DRA implies both the utilization of a flexible interconnect and the presence of dedicated interfaces. Moreover, some of these interfaces are already standardized and all the communication mechanisms are rigorously specified in several
protocols (c.f. VCI [25], OCP [26], etc). V. T OWARDS A GLOBAL RS O C SIMULATION
Absract application description − tasks graph − latencies, periods, priorities
(a)
Embedded software modeling − tasks services, scheduling − TLM communication modeling
(b)
Specific RTOS services modeling − hardware mapping − task migration − reconf. ress. management
(c)
METHODOLOGY
When summarizing the two previous descriptions of both highly heterogenous hardware architecture and a complex set of software services needed to manage the platform, it becomes very clear that designing such systems with classical specification and development tools will be infeasible. The need for very high abstraction levels, very early validation methodologies and complete hardware/software development suites is evident. In particular, the numerous interactions between RTOS services, applications tasks and hardware resources must be explored and evaluated prior to any deployment process. Developing a global hardware/software/middleware design space exploration environment becomes a critical challenge. Moreover, such environment must fulfill additional constraints: • Design reuse: time-to-market issues imply that previously designed and verified hardware IP blocks, software parts, processor cores and busses could be rapidly inserted without recoding. Commonly adopted formats and description languages must be privileged. SystemC has been identified as a good candidate for a modeling language as well as for its underlying high level simulation kernel. • Software and hardware exploration: RTOS issues and software evaluation must be possible within this environment. This advocates for a methodology based on multiple modeling views of a system (Fig. 6). Among the possible views are: (i) embedded software and RTOS services modeling, (ii) reconfigurable hardware modeling and (iii) flexible interconnect modeling. • Real-time constraints: timing issues should be modeled during the earliest stages of the exploration. This could be done by refining timing information simultaneously with hardware refinement. We present hereafter our project aiming at developing a global RSoC modeling and exploration environment. Objective of this work is to propose to system developers a high level modeling environment for early exploration of RTOS for RSoC issues. Central to our methodology are the interactions between reconfigurable fabrics, interconnect resources and OS services. The proposed modeling methodology is based on the SystemC methodology and is inspired by the UML model described previously. The methodology described Fig. 5 follows a top-down approach from an early abstract application modeling down to a final architectural description of the platform. Several successive refinement steps have been already identified in the OVERSOC methodology: abstract multi-task real-time application modeling (step (a)), embedded software modeling in step (b), reconfigurable hardware dedicated RTOS modeling in step (c), reconfigurable hardware and interconnect modeling in step (d) and platform dependent modeling in step (e). Objects related to the architecture layer are modeled with SystemC primitives. Memory, interconnect and hardware
Reconf. hardware modeling − hardware tasks − virtual machines
(d)
Flexible interconnect modeling − bandwidth allocation − dynamic IP mapping
Target architecture modeling − processing targets − hardware targets, ...
(e)
Fig. 5. Refinement process from an abstract software modeling view down to a hardware implementation
targets can be first described at a relatively high abstraction level. However, the description level of processor cores should be adapted for taking into account the execution of C code. For example, a cycle-accurate model of a MIPS processor could be envisaged during the last refinements steps. We argue that the main originality of this project resides in including the whole aspects of reconfigurable platform modeling in the same environment. Merging steps (b) and (c) that are currently studied separately constitutes the key challenge of our proposed approach. Figure 6 illustrates some of the interactions between a task model, the RTOS model and reconfigurable and interconnect targets. Labels indicated (left side) refer to the successive refinement steps described previously. The following discusses two specific points of this project: RTOS services and DRA modeling and refinement. A. Exploring RTOS services issues Supports for dynamic creation of processes have been recently added to the SystemC simulation kernel [13]. It is thus now possible to simulate user-defined or application-dependent scheduling methods within a complete hardware/software SystemC model. OS services that we propose to explore using this modeling environment can be classified in two categories. The first category corresponds to classical existing services that need to be redesigned for taking into account the reconfigurable targets. An example of such services is the task management function responsible in the tasks migrations. At a very high abstraction level (i.e. before having any knowledge about the architecture
HardwareTarget::DRA
Task (a)
Task context switch
Task
Arbitration context load/save
Task Abstract channels OS service I/F HardwareTask
HardwareTask
HardwareTask
OS service methods
(b) Scheduling
(c)
Mapping
Communication
Refinement process
Drivers
Fig. 7. Modeling dynamically reconfigurable tasks with static modules and context switch methods
OS model
(d)
Fig. 6.
DRA model
Interconnect Busses Protocols
RSoC modeling architecture within OVERSOC
partitioning), this OS service can be modeled as a classical scheduling function. New services dedicated exclusively to the management of DRAs belong to the second category. Services responsible in the management of communication resources within the DRAs strongly depend on architectural choices. They are good examples of services that can be modeled during subsequent architecture refinement phases. The modeling environment must then favor this progressive refinement methodology. B. Modeling dynamically reconfigurable fabrics Modeling the dynamically reconfigurable accelerators is a key point of our methodology. Configuring and reconfiguring a given hardware task within a DRA implies several memory and bus transfer operations. These overhead depend on the type of DRA (i.e. fine grain or coarse grain). An early evaluation of such operations is an essential factor in the design space exploration. Among these operations are obviously: • Context transfer between the context memory and the DRA, • Context switch between different hardware tasks within the DRA, • Context save and restore if hardware preemption is available for a given technology, • Bus, interfaces and ports arbitration within the DRA. Unfortunately there is a severe lack of industrial and academic languages and simulation tools adapted to the modeling of dynamic creation of hardware modules. A way to cope with this problem could be a static encapsulation of all hardware tasks inside a single DRA model as used in the ADRIATIC project [14] (Fig. 7). This modeling can be done with SystemC without the need of any modification in the language constructs neither in the simulation kernel. Moreover, this leads to a valuable modular modeling flow. All technology-dependent and DRA-specific information are stored as member variables and handled by local methods.
VI. F UTURE WORK The project presented in this paper is in its first state. We have first defined the global organization of a reconfigurable SoC platform and identified the major critical services a dedicated RTOS should offer. We have then proposed a formal description of software and hardware elements constituting this platform. This UML class diagram still needs to be refined and completed with precise interactions between all of the heterogeneous elements. We have also studied several available academic works in the field of RTOS and reconfigurable hardware modeling that could, in our opinion, participate in the development of a global RSoC exploration project. Reconfigurable technologies are now available and make this RSoC exploration framework a reality. Based on an ObjectOriented methodology and using a C++-based hardware description langage, this framework will be open and will permit high level optimisations and refinements on both RTOS and reconfigurable architectures. R EFERENCES [1] R. David, S. Pillement, and O. Sentieys, “Energy-Efficient Reconfigurable Processors,” in Low Power Electronics Design, ser. Computer Engineering, Vol 1, C. Piguet, Ed. CRC Press, August 2004, ch. 20. [2] K. Keutzer, S. Malik, A. R. Newton, J. M. Rabaey, and A. SangiovanniVincentelli, “System-Level Design: Orthogonalization of Concerns and Platform-Based Design,” IEEE Trans. on Computer-Aided Design of Intergrated Circuits and Systems, vol. 19, no. 12, pp. 1523–1543, December 2000. [3] O. Diessel and G. Wigley, “Opportunities for Operating Systems Research in Reconfigurable Computing,” Advance Computing Research Center, School of Computer and Information Science, Univ. of South Australia, Tech. Rep. ACRC-99-018, August 1999. [Online]. Available: citeseer.nj.nec.com/diessel99opportunities.html [4] RTAI, http://www.rtai.org/. [5] V. Nollet, J.-Y. Mignolet, T. Bartic, D. Verkest, S. Vernalde, and R. Lauwereins, “Hierarchical Run-Time Reconfiguration Managed by an Operating System for Reconfigurable Systems,” in International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, USA, June 2003, pp. 81–87. [6] V. Nollet, P. Coene, D. Verkest, S. Vernalde, and R. Lauwereins, “Designing an Operating System for a Heterogeneous Reconfigurable SoC,” in Reconfigurable Architectures Workshop, Nice, France, April 2003. [7] H. Walder and M. Platzner, “Reconfigurable Hardware Operating Systems: From Design Concepts to Realizations,” in Proceedings of the 3rd International Conference on Engineering of Reconfigurable Systems and Architectures (ERSA). CSREA Press, June 2003, pp. 284–287.
[8] C. Steiger, H. Walder, and M. Platzner, “Operating Systems for Reconfigurable Embedded Platforms: Online Scheduling of Real-time Tasks,” IEEE Transaction on Computers, vol. 53, no. 11, pp. 1392–1407, November 2004. [9] M. Ullmann, M. Hübne, B. Grimm, and J. Becker, “On-Demand FPGA Run-Time System for Dynamical Reconfiguration with Adaptive Priorities,” in Field Programmable Logic and Application: 14th International Conference, FPL. Springer-Verlag Heidelberg, August 2004, pp. 454– 463. [10] T. Kogel, M. Doerper, T. Kempf, A. Wieferink, R. Leupers, G. Ascheid, and H. Meyr, “Virtual Architecture Mapping: A SystemC based Methodology for Architectural Exploration of System-on-Chip Designs.” in SAMOS, 2004, pp. 138–148. [11] P. Hastono, S. Klaus, and S. A. Huss, “An Integrated SystemC Framework for Real-Time Scheduling Assesments on System Level,” in 25th IEEE International Real-Time Systems Symposium (RTSS 2004), Lisbon, Portugal, December 2004. [12] ——, “Real-Time Operating System Services for Realistic SystemC Simulation Models of Embedded Systems,” in Forum on Specification & Design Languages (FDL’04), Lille, France, September 2004, pp. 380– 391. [13] Open SystemC Initiative, “Standard SystemC Language Reference Manual - v2.1,” Open SystemC Initiative, available at http://www.systemc.org/, April 2005. [14] A. Pelkonen, K. Masselos, and M. Cupák, “System-Level Modeling of Dynamically Reconfigurable Hardware with SystemC.” in 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France, April 2003, pp. 174–181. [15] A. Thomas and J. Becker, “Dynamic Adaptive Runtime Routing Techniques in Multigrain Reconfigurable Hardware Architectures,” in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL), ser. Lecture Notes in Computer Science (LNCS), vol. 3203. Antwerp, Belgium: Springer, August 2004, pp. 115–124. [16] C. Bobda, M. Majer, D. Koch, A. Ahmadinia, and J. Teich, “A Dynamic NoC Approach for Communication in Reconfigurable Devices,” in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL), ser. Lecture Notes in Computer Science (LNCS), vol. 3203. Antwerp, Belgium: Springer, August 2004, pp. 1032–1036. [17] J. van der Veen, S. Fekete, M. Majer, A. Ahmadinia, C. Bobda, F. Hannig, and J. Teich, “Defragmenting the Module Layout of a Partially Reconfigurable Device,” in Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, USA, June 2005. [18] ATMEL, “FPSLIC (AVR with FPGA),” http://www.atmel.com/products/FPSLIC/. [19] H. Schueler, “Smart media processing with XPP, white paper,” http://www.pactcorp.com/xneu/px_SMeXPP.html, April 2003. [20] H. Gomaa, Software Design Methods for Concurrent and Real-Time Systems. Addison Wesley, 1993. [21] H. Kopetz and H. Obermaisser, “Temporal composability,” Computing and Control Engineering Journal, pp. 156–162, August 2002. [22] I. Ouaiss, “Hierarchical Memory Synthesis in Reconfigurable Computers,” Ph.D. dissertation, University of Cincinnati / OhioLINK, 2002. [23] T. Marescaux, J.-Y. Mignolet, A. Bartic, W. Moffat, D. Verkest, S. Vernalde, and R. Lauwereins, “Networks on Chip as Hardware Components of an OS for Reconfigurable Systems,” in Field Programmable Logic and Application: 13th International Conference, FPL, September 2003, pp. 595–605. [24] V. Nollet, T. Marescaux, and D. Verkest, “Operating-System Controlled Network on Chip,” in Design Automation Conference, DAC, June 2004, pp. 256–259. [25] VSIA, http://www.vsia.org/. [26] Open Core Protocol International Partnership, “Ocp-ip,” http://www.ocpip.org/.