Backend for Virtual Platforms with Hardware Scheduler in the MAPS Framework

Jeronimo Castrillon, Aamer Shah, Luis Gabriel Murillo, Rainer Leupers, Gerd Ascheid
Institute for Communication Technologies and Embedded Systems (ICE)
RWTH Aachen University, Aachen, Germany 52074
{castrill,shaha,murillo,leupers,ascheid}@ice.rwth-aachen.de

Abstract—Advances in process integration, the power wall and end-user application demands have made Multi-Processor Systems on Chip (MPSoCs) a reality. In mobile embedded devices, these systems are heterogeneous in order to cope with stringent real time and energy constraints, which makes them difficult to program, debug and verify. Therefore, a lot of research in industry and academia has focused on providing solutions to this MPSoC programming problem. In this paper we study and extend one such framework, namely the MPSoC Application Programming Studio (MAPS) [1]. We analyze the retargetability of MAPS by adding a new backend for a heterogeneous MPSoC with the OSIP hardware scheduler [2]. The new backend exports high level debugging information that is used by an environment for application debugging based on virtual platforms. The extensions are demonstrated on a heterogeneous virtual platform running the JPEG application.

Index Terms—MPSoC programming, MPSoC debugging, virtual platforms, hardware scheduler, code generation
978-1-4244-9485-9/11/$26.00 © 2011 IEEE

I. INTRODUCTION

Today’s mobile embedded systems have to support several communication and multimedia standards under a tight energy constraint. Additionally, embedded standards have stringent real time constraints and continue to grow in complexity. To cope with all this, embedded devices feature heterogeneous Multi-Processor Systems on Chip (MPSoC) [3]. Programming such systems is a big challenge that has motivated a lot of research on new methodologies and tools [1], [4]–[8].

The problem of MPSoC programming covers several aspects, including: (1) which programming model to use, (2) how to expose/extract parallelism and (3) how to map the application to the MPSoC. Mapping refers to the process of binding tasks to processing elements (spatial mapping), scheduling tasks on shared resources (temporal mapping) as well as assigning data and communication to memories and communication channels. Despite the efforts in academia, the common programming practice in industry is still a manual process, known to be slow and error prone. As a consequence, software productivity cannot keep pace with the rate at which software requirements are changing, which has given rise to the so-called productivity gap [9].

The MAPS framework [1] strives to provide solutions to close this gap. It does so by providing: (1) a high level platform-independent dataflow programming language, (2) means for explicit parallelism complemented
by tools for parallelism extraction and (3) an interactive framework for application mapping. When programming with MAPS, the programmer does not have to care about low level details of the underlying MPSoC, such as target specific communication primitives or Application Programming Interfaces (APIs) for multi-tasking. This is accomplished by an extensible code generator that has been retargeted for several platforms. In this paper we present an extension to this code generator that targets a virtual platform equipped with the OSIP hardware scheduler [2].

Along with programming, MPSoC debugging is an issue that directly impacts software productivity. Apart from failing due to traditional bugs in sequential programming, MPSoC applications suffer from the difficulty of predicting concurrency effects. Moreover, if the level of abstraction of the programming model is raised, the debugging interface has to be raised accordingly. Traditional low level debuggers (e.g., gdb) are not aware of high level programming constructs (e.g., logical communication channels). As a consequence, it takes longer for a programmer to spot software errors in automatically generated code. We therefore added functionality to MAPS that exports high level debugging information. We developed an environment for virtual platforms that, provided with this information, allows the user to debug the application more comfortably.

The rest of this paper is organized as follows. Section II discusses related work. The tools and technologies involved in this paper are introduced in Section III, followed by a description of the methodology in Section IV. Section V presents the target virtual platform and the extensions made to the MAPS framework. Results are given in Section VI and conclusions in Section VII.

II. RELATED WORK

Several works relate to the MAPS framework. (Semi)Automatic parallelism extraction from a sequential specification is addressed in [5], [6], [10].
Frameworks with parallel dataflow input specifications are treated in [4], [7], [11]–[13]. A lot of effort has been invested in improving the quality of code generated from dataflow languages. Initial works, e.g., [14], only supported synchronous dataflow languages and did not take heterogeneity into account. More recently, the authors of [15] address the issue for heterogeneous systems.

MPSoC debugging has recently been addressed by introducing hardware components that capture system-level information at run-time [16], [17]. However, this approach not only needs hardware to be available, but it also lacks knowledge of high-level constructs. More related to virtual prototyping, the authors of [18] have added debugging capabilities for system verification. These enhancements focus on detecting problems in the simulated hardware itself. To the best of our knowledge, no major contributions for simulation-based application debugging on MPSoCs can be found in the literature yet.

Fig. 1: Extended MAPS flow overview

Fig. 2: Target platform in SNPS-PA

III. BACKGROUND

A. MPSoC Application Programming Studio (MAPS)

The work presented in this paper is an extension of the MAPS framework [1], which consists of a set of tools for MPSoC programming. MAPS provides support for sequential and parallel programming. Techniques for analysis and partitioning of sequential C code were presented in [8]. Techniques for handling parallel specifications were presented in [19]. The parallel specification is based on the Kahn Process Networks (KPN) Model of Computation (MoC) [20], and is implemented as a small set of extensions to the C language.

B. Virtual Platforms

A virtual platform is a software model of a hardware platform. Being a software model, a virtual platform provides more controllability and observability than real hardware. For this reason we believe that virtual platforms will play an important role in MPSoC programming, debugging and verification.

C. Operating System asIP (OSIP)

OSIP [2] is an Application Specific Instruction Set Processor (ASIP) tailored for task management in MPSoCs. OSIP uses interrupt signals to notify the processors of a scheduling event.
Upon an interrupt, the processors in an MPSoC fetch the information of the task to be executed from the OSIP’s interface. A lightweight Application Programming Interface (OSIP-API) is provided with OSIP that enables low latency task dispatching in a heterogeneous platform.
IV. TOOL FLOW

Figure 1 shows an overview of the tool flow used in this paper. The flow corresponds to the traditional MAPS flow with additional outputs for debugging on virtual platforms. The inputs of the flow are: a KPN specification of the application, a tool configuration file with user options, a set of constraints for the application and a model of the target MPSoC. The application is processed by the MAPS framework, which computes a mapping of the application to the target platform. Details on how MAPS computes a mapping were given in previous publications. The mapping, in the form of a mapping descriptor, is passed to the code generator, which produces code that can be compiled with the target specific toolchains.

The MAPS code generator was extended to support execution and debugging on OSIP-based MPSoCs. For high level debugging, the code generator exports a text file that contains information about the KPN application (e.g., process names, channel names, deadlines). A set of Tcl scripts uses this information, the memory map and the binary images to enable high level debugging. Details on the new code generation and the debugging environment are provided in the next section.

V. TARGET PLATFORM AND MAPS EXTENSIONS

A. Target Virtual Platform

The target virtual platform, shown in Figure 2, was modeled in Synopsys Platform Architect (SNPS-PA) [21]. It includes: (1) an instruction accurate Instruction Set Simulator (ISS) of the ARM926EJ-S architecture, (2) two cycle accurate ISSs of a 4-slot VLIW processor, (3) local memories for all processors and a shared memory for communication, (4) a time annotated behavioral transaction level model of OSIP, (5) an AMBA AHB bus as interconnect, and (6) a behavioral transaction level model of a proxy that enables the VLIW processors to communicate with OSIP. The latter is needed since the VLIW processor does not support interrupts. The ARM and the bus models are taken from the SNPS IP libraries.
The OSIP model was extended from the one presented in [2] with debugging support. The VLIW processor was modeled with SNPS Processor Designer [21].

B. Code Generation

The code generator’s main outputs are: OSIP configuration code, application’s logic code and a text file with debugging information. The generated C code uses the OSIP-APIs for scheduler configuration, task creation, task scheduling and synchronization [2].
Fig. 3: Sample mapping descriptor

1) OSIP Configuration: The mapping component of the MAPS framework generates a mapping descriptor of the application to the platform, as shown in Figure 1. A sample configuration is shown in Figure 3, where two tasks are scheduled with a priority based scheduler for the ARM processor and three tasks are scheduled in FIFO fashion for the two VLIW processors. Excerpts of the generated code for this configuration are shown in Listing 1. The code in Lines 3-8 tells OSIP about the available processors for this application and their interrupt signal identifiers. In the example, CLASS_1 represents the VLIW class with two processing instances. The code in Lines 10-13 configures two task queues with their corresponding scheduling policies and assigns them to the processing classes. Finally, the code in Lines 15-18 shows how tasks are created. In the example, Task1 is assigned to the queue attached to ARM with a priority of 1.

 1  /* Platform description (CLASS_0 = 0 (ARM)
 2     CLASS_1 = 1 (VLIWs)) */
 3  ProcClasses[CLASS_0].processing = 1;
 4  ProcClasses[CLASS_0].InterID = pID0; //*pID0 = [0]
 5  ProcClasses[CLASS_1].processing = 2;
 6  ProcClasses[CLASS_1].InterID = pID1; //*pID1 = [1,2]
 7  pParams->numProcClass = 2;
 8  pParams->numProcResInstances = 3;
 9  /*... Schedule configuration */
10  TaskQueues[CLASS_0].procClass = CLASS_0;
11  TaskQueues[CLASS_0].entryNodeAlgo = OSIPSchedPriority;
12  TaskQueues[CLASS_1].procClass = CLASS_1;
13  TaskQueues[CLASS_1].entryNodeAlgo = OSIPSchedFIFO;
14  /*... Task creation */
15  NewTaskParams.fpTask = (int (*)(void*))Task1;
16  NewTaskParams.Priority = 1;
17  NewTaskParams.EntryNodeRef = TaskQueueRefs[CLASS_0];
18  OSIPMTAPITaskNew(&NewTaskParams);
Listing 1: Generated configuration code example

2) Application Code: To generate code for the application logic, the existing code generator that supported pthreads and HVP-API was extended (see details in [19]). Multi-tasking for ARM was implemented with the OSIP-API. In the case of the VLIW processor, multi-tasking was implemented by using proto-threads [22]. Synchronization for channel accesses was implemented with OSIP’s hardware semaphores.

Fig. 4: VPA debugging interface. On the right hand side, graphical representation of OSIP internal state.

3) Debug Information: The high level debug information exported by the code generator includes:
• Process names: Processes are typically implemented as C-functions. This information allows the programmer to list all the processes of an application and set breakpoints.
• Channel information: Processes in a KPN communicate via FIFO channels. For every channel, its name, its producer(s) and its consumer(s) are recorded. With this information, the user can perform consistency checks (e.g., no one else but the producers can write to a channel), and keep track of channel utilizations.
• Communication functions: Functions for channel access may differ greatly from processor to processor, or even within a processor. With this information and the channel names, the user can set breakpoints on channel access.
C. Debugging Environment

Virtual platforms offer standard interfaces for debuggers, so that application programmers can debug their code on a running virtual platform with traditional tools (e.g., gdb). As mentioned before, these debuggers do not have access to high level application information. To account for this, an extra debugging layer was implemented as a set of scripts on top of the Tcl layer provided by SNPS-PA. Graphical controls that can be loaded from the Virtual Platform Analyzer were also added to further ease the debugging process (see Figure 4). The scripts can be classified into two categories: system and application debugging.

1) System Debugging: These scripts serve to debug application-independent functionality, which includes interrupts, context switches, sleep modes and OSIP-API calls. The latter include task creation, task wait, task preemption and task yield. The debugging environment allows the user to set breakpoints at each of these events. To enable this, the debugging scripts were provided with information about OSIP’s software stack and the layout of its main data structures (e.g., OS task descriptors). Additionally, the OSIP abstract model was extended so that its internal state can be dumped to graphical files during simulation. When debugging an application, the user can graphically see the status of every queue inside OSIP (see right hand side of Figure 4).

2) Application Debugging: The scripts for application debugging allow the user to set breakpoints at every channel access and every iteration of a process.
To illustrate how the debugging process is eased with our approach, consider what a programmer would have to do to debug an access to a specific channel with a traditional debugger: (1) find the name of the function that performs the access, (2) find the data structure that represents the channel, (3) understand the signature of the channel access function, (4) extract the memory location of the channel passed to the function, and (5) compare the address with the address assigned (probably dynamically) to the desired channel. In contrast to this, the debugging environment accomplishes the same via the command set_bp_channel_read, even if the application was compiled without debug information. Internally, the scripts perform the same steps as described above, using the information exported by the code generator (e.g., function names and signatures) and knowledge of the calling conventions.

VI. CASE STUDY

In this case study we analyze the correctness of the generated code and the functionality of the code generator by executing different mapping configurations of the JPEG (encoder and decoder) application on the target platform. To achieve this, the mapper component of the MAPS flow was run with 10 different configuration parameters. The KPN version of the JPEG application is the same as the one used in [19]. Since the application contains both JPEG encoding and decoding, the correctness of the generated code is assessed by simply comparing the input and output JPEG image files. None of the tests reported a difference in the output image.

The results obtained with the flow are presented in Figure 5. Figure 5a shows the performance of the 10 different configurations measured on the virtual platform. Out of these configurations, two samples are shown in Figure 5b, where each process of the JPEG application is marked with a color corresponding to the processor it was mapped to.

Fig. 5: JPEG example. (a) Performance of the generated code for 10 different configurations. (b) Two sample configurations

This case study shows the potential productivity increase provided by the MAPS framework. First, by extending the code generator, the same application specification can be used for different target platforms. Second, the code generator allows different configurations to be tested faster than if each of them were coded by hand using the OSIP-APIs. Finally, although not covered in this case study, the debugging environment greatly simplifies application development cycles.

VII. CONCLUSION

In this paper we presented extensions to the MAPS framework that allow KPN applications to be executed on an OSIP-based MPSoC. We described a debugging environment and its integration with the MAPS framework via backend extensions. This work shows that MAPS can reduce the portability and debugging effort, thereby improving software productivity. The code generated for OSIP-based MPSoCs differs considerably from that of other backends (e.g., pthreads), and yet the application specification remains unchanged. We tested the flow with the JPEG application, for which different configurations were tested for correctness. Future work will focus on more case studies and backends for the MAPS framework, and on a generalization of the concepts presented here for debugging on virtual platforms.

ACKNOWLEDGMENT
This work has been supported by the UMIC Research Centre, RWTH Aachen University.

REFERENCES

[1] R. Leupers and J. Castrillon, “MPSoC Programming using the MAPS Compiler,” in ASP-DAC ’10: Proc. of the 15th Asia and South Pacific Design Automation Conference, 2010.
[2] J. Castrillon et al., “Task Management in MPSoCs: an ASIP Approach,” in ICCAD ’09: Proc. of the 2009 Intr. Conf. on Computer-Aided Design. ACM, 2009.
[3] P. Kollig, C. Osborne, and T. Henriksson, “Heterogeneous Multi-Core Platform for Consumer Multimedia Applications,” in DATE 2009: Proc. of the Conf. on Design, Automation and Test in Europe.
[4] S. Kwon et al., “A Retargetable Parallel-Programming Framework for MPSoC,” ACM Trans. Des. Autom. Electron. Syst., vol. 13, 2008.
[5] J.-Y. Mignolet et al., “MPA: Parallelizing an Application onto a Multicore Platform Made Easy,” IEEE Micro, vol. 29, no. 3, 2009.
[6] P. Chandraiah and R. Domer, “Code and Data Structure Partitioning for Parallel and Flexible MPSoC Specification Using Designer-Controlled Recoding,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 27, no. 6, June 2008.
[7] L. Thiele et al., “Mapping Applications to Tiled Multiprocessor Embedded Systems,” in ACSD ’07: Proc. of the Seventh Intr. Conf. on Application of Concurrency to System Design. IEEE Computer Society.
[8] J. Ceng et al., “MAPS: An Integrated Framework for MPSoC Application Parallelization,” in DAC ’08: Proc. of the 45th annual Conf. on Design automation. ACM, 2008.
[9] W. Ecker, W. Mueller, and R. Doemer, Hardware-dependent Software - Principles and Practice. Springer, 2008, ch. Hardware-dependent Software - Introduction and Overview.
[10] S. Verdoolaege, H. Nikolov, and T. Stefanov, “PN: a Tool for Improved Derivation of Process Networks,” EURASIP J. Embedded Syst., 2007.
[11] H. Nikolov, “System-Level Design Methodology for Streaming Multiprocessor Embedded Systems,” Ph.D. dissertation, Univ. Leiden, 2009.
[12] C. Haubelt et al., “A systemc-based design methodology for digital signal processing systems,” EURASIP J. Embedded Syst., 2007.
[13] A. Hansson et al., “CoMPSoC: A Template for Composable and Predictable Multi-Processor System on Chips,” ACM Trans. Des. Autom. Electron. Syst., vol. 14, no. 1, 2009.
[14] S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee, “Synthesis of Embedded Software from Synchronous Dataflow Specifications,” J. VLSI Signal Process. Syst., vol. 21, no. 2, 1999.
[15] G. Schirner, A. Gerstlauer, and R. Dömer, “Automatic Generation of Hardware Dependent Software for MPSoCs from Abstract System Specifications,” in ASP-DAC ’08: Proc. of the 2008 Asia and South Pacific Design Automation Conference, 2008.
[16] A. Hopkins and K. McDonald-Maier, “Debug support strategy for systems-on-chips with multiple processor cores,” IEEE Transactions on Computers, vol. 55, no. 2, Feb. 2006.
[17] A. Mayer, H. Siebert, and K. McDonald-Maier, “Boosting debugging support for complex systems on chip,” Computer, vol. 40, no. 4, 2007.
[18] K. Tomasena et al., “A transaction level assertion verification framework in systemc: An application study,” in CENICS ’09: Proc. of the 2009 Second Intr. Conf. on Advances in Circuits, Electronics and Micro-electronics. IEEE Computer Society, 2009.
[19] J. Castrillon et al., “Trace-based KPN Composability Analysis for Mapping Simultaneous Applications to MPSoC Platforms,” in Proc. of the Design, Automation and Test in Europe Conference and Exhibition, Dresden, Germany, 2010.
[20] G. Kahn, “The Semantics of a Simple Language for Parallel Programming,” in IFIP Congress’74, J. L. Rosenfeld, Ed.
[21] Synopsys, “System Level Design: Platform Architect and Processor Designer.” [Online]. Available: http://www.synopsys.com/Tools/SLD
[22] A. Dunkels et al., “Protothreads: Simplifying Event-driven Programming of Memory-constrained Embedded Systems,” in SenSys ’06: Proc. of the 4th Intr. Conf. on Embedded networked sensor systems. ACM.