Model-driven development of multi-core embedded software

Pao-Ann Hsiung1†, Shang-Wei Lin1, Yean-Ru Chen2, Nien-Lin Hsueh3, Chih-Hung Chang4, Chih-Hsiong Shih5, Chorng-Shiuh Koong6, Chao-Sheng Lin1, Chun-Hsien Lu1, Sheng-Ya Tong1, Wan-Ting Su1, William C. Chu5

1 National Chung Cheng University, 2 National Taiwan University, 3 Feng Chia University, 4 Hsiuping Institute of Technology, 5 Tunghai University, 6 National Taichung University
† [email protected]

IWMSE’09, May 18, 2009, Vancouver, Canada 978-1-4244-3718-4/09/$25.00 © 2009 IEEE

ICSE’09 Workshop

Abstract

Model-driven development is worthy of further research because of its proven capabilities in increasing productivity and ensuring correctness. However, it has not yet been explored for multi-core processor-based embedded systems, whose programming is even more complex and difficult than that for conventional uni-processor systems. We propose a new VERTAF/Multi-Core (VMC) framework to bridge this gap. In this work, we mainly show how VMC generates code automatically from user-specified SysML models for multi-core embedded systems. We illustrate how model-driven design based on SysML can be seamlessly integrated with Intel’s Threading Building Blocks (TBB) and the Quantum Framework (QF) middleware. We use a digital video recording system to illustrate the benefits of VMC. Our experiments show how SysML/QF/TBB make multi-core embedded system programming easy, efficient, and effortless.

1 Introduction

With the proliferation of multi-core architectures [1] for embedded processors, multi-core programming for embedded systems is no longer a luxury. We need embedded software engineers to be adept in programming such processors; however, the reality is that very few engineers know how to program them. The current state-of-the-art technology in multi-core programming is based on the use of language extensions such as OpenMP [8] or libraries such as Intel Threading Building Blocks (TBB) [9]. Both OpenMP and TBB are very useful when programmers are already experts in multithreading and multi-core programming; however, there still exists a tremendous challenge in this urgent transition from uni-core systems to multi-core systems. To aid embedded software designers in a smoother transition, we are in the process of extending our tool, the Verifiable Embedded Real-Time Application Framework (VERTAF) [4], for multi-core embedded software design and verification. Our primary goal is model-driven architecture (MDA) development for such software. In this article, we focus on the code generation for multi-core embedded software using TBB. We use an example to illustrate the transition.

TBB is a C++ library for expressing parallelism that helps us to leverage multi-core processor performance without having to be a threading expert. It represents a higher-level, task-based parallelism that abstracts platform details and threading mechanisms for performance and scalability. It also supports writing efficient, scalable programs, that is, programs that benefit from an increasing number of processor cores. TBB tasks are the basic logical units of computation. The library provides a task scheduler, which is the engine that drives the algorithm templates. The scheduler maps the TBB tasks onto physical threads. Nevertheless, a software engineer needs expertise in parallel programming to correctly apply the different parallel programming interfaces provided by TBB.

Several issues crop up when developing a model-driven architecture for multi-core embedded software. First, how much and what kinds of explicit parallelism must be specified by a software engineer through system modeling? Second, how can we automatically and correctly realize the user-specified models as multi-core embedded software code? Third, how do we test and validate the generated code? Finally, how do we apply a software engineering process to the development of multi-core embedded software? We will try to provide partial solutions to the above issues, which are still open to more research work.

The proposed VERTAF/Multi-Core (VMC) framework takes SysML models as input, which contain user-specified model-level explicit parallelism, and generates corresponding multi-core embedded software code in C++, which is scheduled and tested for a particular platform such as ARM11 MPCore and Linux OS. The code architecture consists of an OS, the TBB library, the Quantum Framework (QF) for executing concurrent state machines, and the application code. In summary, VMC alleviates the tedious task of multi-core parallel programming through model-based automatic code generation targeted at a specific embedded system platform.

The article is organized as follows. Section 2 describes existing related work. Section 3 describes the proposed VMC framework. Section 4 describes the code generation process in VMC. Section 5 uses a digital video recording system example to illustrate how VMC achieves automatic multi-core programming using TBB and QF. Finally, Section 6 gives the conclusions with some future work.

2 Previous Work

VERTAF [4] is a UML-based application framework for embedded real-time software design and verification. The original VERTAF is an integration of software component-based reuse, formal synthesis, and formal verification. It takes three types of extended UML models [10], namely class diagrams with deployments, timed statecharts, and extended sequence diagrams. The sequence diagrams are translated into Power-Aware Real-Time Petri Nets and then scheduled for low-power design along with satisfaction of memory constraints. The timed statecharts are translated into Extended Timed Automata (ETA) and model checked using the SGM (State Graph Manipulators) model checker. The class diagram and the statecharts are used for code generation.

Commercial tools such as IBM Rational Rose and the I-Logix Rhapsody series of tools generate code automatically but do not perform any scheduling or verification, both of which are required for guaranteeing non-functional properties such as time and performance. In contrast to commercial tools, the code generated by VERTAF is both scheduled and verified formally. The MoBIES (Model-Based Integration of Embedded Systems) project [3, 5, 12] funded by USA’s DARPA and the DESS project funded by Europe’s EUREKA-ITEA are both very large, long-term, five-year research projects. Nevertheless, what VERTAF has achieved has already surpassed the achievements of both MoBIES and DESS, because the MoBIES results are scattered across several small projects at different universities and DESS did not result in any practical implementation of the proposed theories (guidelines). In contrast, VERTAF has successfully proposed the theory and implemented it as a useful application framework.

The main purpose of model-driven software development is to alleviate the problem of inherently high complexity in software. In our target embedded systems, multi-core processor architectures not only drastically increase the complexity of embedded software, but multithreading also aggravates the whole issue of complexity due to the incomprehensible interactions among multiple threads [6]. Hence, the model-driven development process is even more essential for multi-core embedded software design. However, this requires adaptation of the VERTAF flow to support multi-core embedded software design, as illustrated in Figure 1.

VERTAF uses the Quantum Framework (QF) [11] for software code generation. QF is a framework for rapidly implementing software in an object-oriented fashion. A UML state machine is implemented by a QF active object. Based on the programming principles and APIs provided by QF, VERTAF translates a system modeled by a user with UML state machines into C/C++ embedded software code. In this work, we are extending the code generation of VERTAF using TBB, which, as discussed earlier, is a C++ library that offers parallelism at higher levels. At the highest level, parallelism exists either in the form of data on which to operate in parallel, or in the form of tasks to execute concurrently. TBB tasks that take advantage of both data parallelism and task parallelism are most useful for programming multi-core embedded systems. We will use a digital video recording system as an example to illustrate how data and task parallelism are integrated into the embedded software code that is generated from SysML models.
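To make the distinction between data parallelism and task parallelism concrete, the following minimal sketch contrasts the two. It is written against the C++ standard library only: std::thread and std::async stand in for TBB here, and the names parallel_sum, MinMax, and concurrent_min_max are hypothetical, chosen for illustration; they are not part of VERTAF, QF, or TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Data parallelism: one operation (summation) is applied to disjoint
// slices of the same data set, one slice per worker thread.
long parallel_sum(const std::vector<int>& data, std::size_t workers) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<long> partial(workers, 0L);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&data, &partial, chunk, w] {
            // Clamp the slice so that surplus workers get an empty range.
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0L);
        });
    }
    for (std::thread& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}

struct MinMax {
    int min;
    int max;
};

// Task parallelism: two different jobs run concurrently on the same input.
// Precondition: data is not empty.
MinMax concurrent_min_max(const std::vector<int>& data) {
    std::future<int> lo = std::async(std::launch::async, [&data] {
        return *std::min_element(data.begin(), data.end());
    });
    std::future<int> hi = std::async(std::launch::async, [&data] {
        return *std::max_element(data.begin(), data.end());
    });
    return MinMax{lo.get(), hi.get()};
}
```

In the first function every worker runs the same code on a different slice; in the second, each asynchronous job runs different code. A TBB-based implementation would express both as tasks and leave the mapping onto threads to the TBB scheduler.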

3 VERTAF/Multi-Core Framework

VERTAF is being extended to support multi-core programming. This project is now called VERTAF/Multi-Core (VMC). The control and data flows of VMC are represented by solid and dotted arrows, respectively, in Figure 1. Software synthesis is defined as a two-phase process: a machine-independent software construction phase and a machine-dependent software implementation phase. This separation helps us to plug in different target languages, middleware, real-time operating systems, and hardware device configurations. We call the two phases the front-end and back-end phases. The front-end phase is further divided into three sub-phases, namely the SysML modeling phase, the real-time embedded software scheduling phase, and the formal verification phase. There are three sub-phases in the back-end phase, namely architecture mapping, code generation, and testing.

Figure 1. VMC Design Flow

SysML Modeling: VMC requires four diagrams as input system specification models, namely the requirements diagram, block definition diagram, interaction diagram, and state machine diagram. SysML is a generic language, and it must be specialized to target any specific application domain. The multi-core architecture of a processor along with OS features is modeled using SysML. The models are also enhanced using real-time parallel programming design patterns such as parallel pipeline models.

Scheduling: In VMC, the Pthreads are scheduled by the Linux OS, and the TBB threads are scheduled by the TBB library along with thread migration among different cores.

Formal Verification: In VMC, formal timed automata models are generated automatically from user-specified SysML models by a flattening scheme that transforms each state machine into a set of one or more timed automata, which are then merged into a state graph. We have modeled real-time task scheduling, task migration between processor cores, and several load balancing policies in the SGM model checker, which is used in VMC to formally verify the automata models.

Architecture Mapping: All hardware classes specified in the deployments of the class diagram are those supported by VMC and thus belong to some existing class libraries. The architecture mapping phase then becomes simply the configuration of the hardware system and operating system through the automatic generation of configuration files, make files, header files, and dependency files. Multi-core processor architecture configurations can also be set in this phase. For example, the number of processor cores available, the number of cores to be used, the number of TBB threads, the amount of buffer space, the number of network connections, the amount of hard disk space available, the number and type of I/O devices available, the security mechanisms, and the allowed level of processor core loading are some of the configurations to be set in this phase.

Code Generation: As shown in Figure 2, we adopt a multi-tier approach for code generation: an operating system layer (Linux), a middleware layer (QF), a multi-core threading library layer (TBB), and an application layer. Since both QF and TBB are very small in size and very efficient in performance, they are quite suitable for real-time embedded system software implementation. Section 4.1 discusses the TBB task model and how we mapped it into VMC.

Figure 2. VMC Code Architecture

Testing: After the multi-core embedded software code is generated, the code needs to be tested for several issues, such as functional validation, non-functional evaluation, deadlock detection, and so on. A remote debugging environment is used to perform testing, monitor test results, and check whether the cross-compiled code running on the target system works as expected and satisfies all the user-specified requirements.
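The configuration items listed for the architecture mapping phase can be pictured as a small record that is checked and then written out as a header file. The sketch below is only an illustration under that assumption: the struct PlatformConfig, its field names, the functions is_consistent and to_header, and the VMC_* macro names are all hypothetical and are not taken from the VMC implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical platform settings chosen during architecture mapping.
struct PlatformConfig {
    unsigned cores_available;    // processor cores present on the target
    unsigned cores_used;         // cores the application may occupy
    unsigned tbb_threads;        // number of TBB worker threads
    unsigned buffer_kib;         // buffer space in KiB
    unsigned max_connections;    // network connections allowed
    unsigned max_core_load_pct;  // allowed load per core, in percent
};

// Returns true when the settings do not contradict each other.
bool is_consistent(const PlatformConfig& c) {
    return c.cores_available > 0 &&
           c.cores_used >= 1 && c.cores_used <= c.cores_available &&
           c.tbb_threads >= 1 &&
           c.max_core_load_pct >= 1 && c.max_core_load_pct <= 100;
}

// Renders the settings as preprocessor definitions for a generated header.
std::string to_header(const PlatformConfig& c) {
    std::ostringstream out;
    out << "#ifndef VMC_PLATFORM_CONFIG_H\n"
        << "#define VMC_PLATFORM_CONFIG_H\n"
        << "#define VMC_CORES_USED " << c.cores_used << "\n"
        << "#define VMC_TBB_THREADS " << c.tbb_threads << "\n"
        << "#define VMC_BUFFER_KIB " << c.buffer_kib << "\n"
        << "#define VMC_MAX_CONNECTIONS " << c.max_connections << "\n"
        << "#define VMC_MAX_CORE_LOAD_PCT " << c.max_core_load_pct << "\n"
        << "#endif\n";
    return out.str();
}
```

Checking the record before emitting any file keeps an inconsistent request, such as using more cores than the target provides, from reaching the generated code.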

4 Multi-Core Code Generation

A typical embedded system consists of input units such as sensors or devices, computation units such as encoders, transformers, or decoders, and output units such as actuators or network devices. Multi-core embedded systems are typically computation and/or communication intensive, because otherwise there is no need for powerful multi-core processors. In VMC, a multi-core embedded system application is specified by a set of SysML models. To alleviate the burden of application designers, VMC supports model-driven development in two ways. First, VMC provides abstract architecture models of multi-core computing along with real-time task scheduling policies such as rate-monotonic and earliest-deadline-first scheduling [7] and load balancing mechanisms such as task migration among cores. Second, VMC also supports parallel design patterns such as the parallel pipeline to hide latency, the parallel loop to reduce latency, and parallel tasks to increase throughput. These design patterns correspond exactly to the three real-world concurrency issues [2].

4.1 Task and Thread Models

In operating system terminology, the terms task and thread have been used interchangeably. For example, the basic unit of computation in the Linux OS is a task, while it is a thread in the Windows OS. However, in VMC, we adopt the TBB terminology, that is, a task is a computation job that obeys the run-to-completion semantics, while a thread is a basic unit of computation that can be assigned a task to execute. In TBB, an application is represented by a task graph. The tasks that are ready are assigned by a task scheduler for execution by threads from a thread pool. The scheduler performs breadth-first execution to increase parallelism and depth-first execution to reduce resource usage. TBB thus hides the complexity of native threads by addressing common performance issues of parallel programming such as load balancing and task migration.

The task and thread models of TBB are quite generic and are suitable for general-purpose computing. However, for embedded systems we need to satisfy real-time constraints; thus, we need threads that are devoted to specific tasks such as input sensing, computation, and actuator output. The task/thread model in VMC consists of the following three parts:

• User-Level Pthreads: The POSIX threads are devoted threads, that is, unlike TBB threads, they are never reused for executing other tasks. There are two uses of such Pthreads in VMC: (a) to execute the user-specified state machines (represented by QF active objects), and (b) to execute conventional legacy parallel tasks such as the tasks forked by a concurrent TCP server. The Pthreads are scheduled by the POSIX thread library scheduler using default scheduling algorithms such as FIFO or round robin, or user-defined scheduling algorithms.

• User-Level TBB threads: These are the threads maintained by the TBB scheduler. They can be reused and migrated across different cores. The TBB threads are scheduled by the TBB threading library using a non-preemptive, unfair scheduling approach that trades off between depth-first and breadth-first execution of the ready tasks in the task graph.

• Kernel-Level OS threads: Each of the above user-level threads, both POSIX and TBB, is mapped to a kernel thread of the underlying OS, such as Linux in VMC. The kernel threads are scheduled by the OS scheduler using a preemptive priority-based scheduling algorithm.

4.2 Model and Code-Level Parallelism

As introduced at the beginning of Section 4, the three real-world concurrency issues [2] are latency hiding, latency reduction, and throughput increase. The corresponding solutions are the parallel pipeline, the parallel loop, and parallel tasks, respectively. TBB supports all of these solutions with certain restrictions; for example, the parallel loop is supported in only four forms, namely parallel_for, parallel_reduce, parallel_scan, and parallel_while. Since VMC generates code based on the QF and TBB APIs, all three corresponding solutions are supported at the code level. However, the main issue to be addressed here is how and what kinds of parallelism application designers should be allowed to specify at the model level. The following approach is adopted in VMC. VMC provides a UML profile to support parallel design patterns such that users can apply stereotype tags to SysML models. Currently, VMC users can apply the following sets of stereotype tags:

1. <<...>> to a transition in the state machine model,
2. <<...>>, <<...>> to a function invocation on a state transition, where a filter is the TBB terminology for a pipeline stage,
3. <<...>>, <<...>>, <<...>>, <<...>> to a method or a part of a method, and
4. <<...>> to a method, and <<...>> to a part of a method.

Using this parallel design pattern profile, VMC bridges the gap between model-level and code-level parallelism. Application designers are required to explicitly specify parallelism at the model level because the designers know best what to parallelize and what not to. VMC alleviates the burden of parallel programming through automatic code generation. Designers only have to tag the models with the above stereotypes, and VMC takes care of the rest.
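The difference between a devoted thread and pooled, reusable threads in the task/thread model of Section 4.1 can be sketched as follows. This is a minimal illustration using only the C++ standard library: std::thread stands in for a Pthread, and std::async stands in for handing run-to-completion tasks to the TBB scheduler (it performs no work stealing and no thread reuse). The names DevotedWorker and run_pooled_tasks are hypothetical and do not come from QF, TBB, or VMC.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Devoted thread: it serves one job queue for its whole lifetime and is
// never reused for anything else (the role of a user-level Pthread).
class DevotedWorker {
public:
    DevotedWorker() : thread_([this] { run(); }) {}
    ~DevotedWorker() {
        post(std::function<void()>{});  // an empty job asks the loop to stop
        thread_.join();
    }
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !jobs_.empty(); });
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            lock.unlock();
            if (!job) return;
            job();  // runs to completion before the next job is taken
        }
    }
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    std::thread thread_;  // declared last: the other members must exist first
};

// Pooled execution: short run-to-completion tasks are handed to the
// runtime, which decides which thread executes each of them.
int run_pooled_tasks(const std::vector<std::function<int()>>& tasks) {
    std::vector<std::future<int>> results;
    for (const std::function<int()>& task : tasks) {
        results.push_back(std::async(std::launch::async, task));
    }
    int total = 0;
    for (std::future<int>& r : results) total += r.get();
    return total;
}
```

Because the devoted worker takes one job at a time from a FIFO queue, two jobs posted to it never overlap, which is the property a state machine needs. The pooled tasks carry no such ordering guarantee and must therefore be independent of each other.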

4.3 Code Generation

VMC generates multi-core embedded software code automatically from the user-specified SysML state machine models. As introduced in Section 3, the code leverages two existing open-source software packages, namely the Quantum Framework (QF) and the Intel Threading Building Blocks (TBB) library. QF is a set of application programming interfaces implemented in C++ for executing hierarchical state machines. TBB is a user-level thread library that helps programmers avoid the tedious job of thread management across multiple processor cores. QF has a very small footprint and TBB is a very lightweight library; thus, they are both quite suitable for embedded systems that have constrained physical resources such as memory space and computation power.

VMC realizes each SysML state machine as a QF active object by generating code that invokes the QF APIs for states, transitions, and communication events. Each active object is executed by a user-level Pthread that maps to a kernel thread in the Linux OS. Within an active object, each do method that is executed in a state is encapsulated as a TBB task or a TBB task graph, depending on the complexity of the method and its ability to be parallelized. Thus, there are basically two sets of user-level threads, namely Pthreads and TBB threads. The distinction between these two sets of threads is mainly due to the requirement of UML state machines to satisfy the run-to-completion (RTC) semantics. The RTC semantics is required by both the do methods in a QF active object and a TBB task. A QF active object cannot be modeled as a TBB task because the active object never terminates execution and thus would violate the RTC semantics if it were a TBB task. Hence, a devoted user-level Pthread is used instead.

Another effect of the RTC semantics is that whenever there is an indefinite polling of some I/O device such as a remote controller, that is, the polling task never terminates, the polling task can be neither a QF do method nor a TBB task, because otherwise the RTC semantics would be violated. VMC addresses this issue by modeling such a polling task as an independent state machine with a single state, a self-looping transition, and a single triggering event such as data input. Since this state machine need not do anything else, it waits on the single event, and thus there is no need to follow the RTC semantics, which is required only if there is more than one type of event incoming to a state machine.

In Section 5, we will use a real-world application example to illustrate the various strategies employed in VMC as described in this section.

5 Digital Video Recording: A Case Study

We use a real-world example called the Digital Video Recording (DVR) system to illustrate how VMC works and the benefits of applying VMC to multi-core embedded software development. DVR is a real-time multimedia system that is typically used in concurrent remote monitoring of multiple sites. The DVR server can perform both real-time and on-demand streaming of videos to multiple clients simultaneously. Several digital video cameras provide the input for real-time video streaming, and previously recorded videos are stored for on-demand streaming. We chose DVR as an illustrative example because the system exhibits not only task parallelism, but also data parallelism and data flow parallelism.

The overall architecture of DVR is illustrated in Figure 3, which shows that DVR has two subsystems, namely the Parallel Video Encoder (PVE) and the Video Streaming Server (VSS). PVE is responsible for collecting videos from multiple cameras and encoding them into a more compressed data format such as MPEG. VSS is responsible for allowing connections from multiple Remote Monitor Clients (RMC), for servicing the clients with status information, real-time video streams, and on-demand video streams, and for storing the encoded video streams in large video databases.

In the rest of this section, we will describe how task parallelism, data parallelism, and data flow parallelism, i.e., the parallel pipeline, are automatically realized in the embedded software code generated from user-specified models of the PVE. We will also describe how conventional thread parallelism is integrated into the embedded software code generated from user-specified models of the VSS. Finally, we will summarize the amount of software components generated by VMC for the DVR system.

5.1 Parallel Video Encoder

The PVE subsystem has three functions: capturing raw video data from all digital cameras, encoding the raw video from each camera into a more compressed data format for efficient network transmission and smaller storage space requirements, and transmitting the encoded video data to the buffer manager in the VSS subsystem. PVE is a very good illustrative example for all three issues of real-world concurrency [2], as described in the rest of this section.

5.1.1 Task Parallelism

Capturing and processing video from each camera is an independent task. However, due to the requirement of RTC semantics in UML and QF, as described in Section 4.3, we need to segregate the capture and the processing of the video

into two different state machines as illustrated in Figure 4. The video capture state machine is devoted to capturing video from a camera, while the video encoding state machine performs the real-time encoding of video. Thus, for a set of n cameras, there are 2n QF active objects that are executed by 2n Pthreads.

Figure 3. Architecture of Digital Video Recording System

Figure 4. State Machines of the Parallel Video Encoder

5.1.2 Data Parallelism

Since video data is composed of a large number of frames and the encoding process is iteratively applied to a data block of 8 × 8 pixels in a frame, there is a high degree of data parallelism in video encoding. Further, since the color model of the video in DVR is RGB, with 8 bits per pixel color, the encoding process can be parallelized into a multiple of 3, that is, one set of threads for each of the three colors. For example, a frame size of 640 × 480 pixels consists of 80 × 60 × 3 = 14,400 data blocks of 8 × 8 pixels. Thus, the maximum data parallelism in encoding this video will be 14,400. However, this might consume too many system resources and cause more timing overhead than the time saved through parallelization (latency reduction). The minimum data parallelism could be 3 for this video, as one thread can be used for each color. A tradeoff between parallelism and resource usage is required to achieve high system efficiency. In the method for encoding, the stereotypes <<...>>, <<...>>, <<...>>, <<...>> can all be used for parallelizing the encoding method.

5.1.3 Data Flow Parallelism

Besides the task parallelism for multiple camera video inputs and the data parallelism for multiple data blocks within each frame, we can also apply data flow parallelism to PVE because the video encoding process applied to each data block is itself a sequence of functions. For most multimedia standards such as MPEG, the sequence of functions consists of Discrete Cosine Transform (DCT), Quantization (Q), and Huffman Encoding (HE), which result in lossy compression of data. This sequence of functions can be parallelized as a pipeline to hide latency such that more than one data block is processed at any time instant. In Figure 4, note how data flow parallelism is specified by a designer through the three stereotypes: <<...>>, <<...>>, and <<...>>. Tagged values such as num tokens and num buffers are specified, respectively, in the <<...>> and <<...>> stereotypes to represent the number of tokens (the TBB terminology for the degree of parallelism in a parallel filter) and the maximum number of buffers (the TBB terminology for the maximum degree of parallelism in a system). The encoding pipeline in PVE has 2 serial filters, namely GetRF, which gets and decomposes a raw frame for parallel processing by the parallel filters, and PutEF, which collects all encoded data blocks and composes an encoded frame for transmission to the video buffer. The 3 parallel filters in the PVE pipeline are responsible for computing in parallel the functions DCT, quantization, and Huffman encoding.
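The structure of such an encoding pipeline can be sketched as follows: a serial input stage hands out data blocks, a bounded number of blocks (tokens) is in flight at any time, each token is carried through the three parallel stages, and a serial output stage collects the results in their original order. This is a minimal illustration under stated assumptions, not the code generated by VMC. It uses std::async from the C++ standard library in place of the TBB pipeline, the three stage functions are arithmetic stand-ins rather than real DCT, quantization, or Huffman coding, and the names run_pipeline and max_tokens are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

using Block = std::vector<int>;

// Stand-in for the DCT stage: doubles every sample (not a real transform).
Block dct_stand_in(Block b) {
    for (int& v : b) v *= 2;
    return b;
}

// Stand-in for quantization: integer division by a fixed step of 4.
Block quantize_stand_in(Block b) {
    for (int& v : b) v /= 4;
    return b;
}

// Stand-in for entropy coding: delta-codes the samples (not Huffman coding).
Block entropy_stand_in(Block b) {
    for (std::size_t i = b.size(); i > 1; --i) b[i - 1] -= b[i - 2];
    return b;
}

// Runs all blocks of one raw frame through the three stages while keeping
// at most max_tokens blocks in flight, and returns them in input order.
std::vector<Block> run_pipeline(const std::vector<Block>& raw_blocks,
                                std::size_t max_tokens) {
    if (max_tokens == 0) max_tokens = 1;
    std::deque<std::future<Block>> in_flight;
    std::vector<Block> encoded;
    encoded.reserve(raw_blocks.size());
    for (const Block& block : raw_blocks) {  // serial input stage (cf. GetRF)
        if (in_flight.size() == max_tokens) {
            // Token limit reached: retire the oldest block first.
            encoded.push_back(in_flight.front().get());
            in_flight.pop_front();
        }
        in_flight.push_back(std::async(std::launch::async, [block] {
            return entropy_stand_in(quantize_stand_in(dct_stand_in(block)));
        }));
    }
    while (!in_flight.empty()) {             // serial output stage (cf. PutEF)
        encoded.push_back(in_flight.front().get());
        in_flight.pop_front();
    }
    return encoded;
}
```

The token limit plays the role described for the tagged values above: a larger value keeps more blocks in flight at once at the cost of more threads and buffers, while a value of 1 degenerates to purely sequential encoding.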

5.2 Video Streaming Server

We use the Video Streaming Server (VSS) subsystem as a typical example of how conventional or legacy multithreaded software can be integrated into the VMC framework such that the threads in legacy multithreaded software, the POSIX threads for executing QF active objects, and the TBB threads work together seamlessly. The main functions of the VSS include (1) accepting multiple connections from remote clients, (2) streaming multiple real-time videos and/or on-demand videos to the remote clients, (3) providing requested server status information to the remote clients, and (4) recording the encoded videos into storage devices. The architecture of the VSS subsystem is shown in Figure 5, and the functionalities of each component in VSS are described as follows.

Figure 5. Architecture of Video Streaming Server

5.2.1 Legacy Threads

Legacy threads are simply multiple threads that exist in legacy software. This is illustrated in the Connection Server (CS) and the Video Streaming Manager (VSM). The connection server is responsible for handling connections and invoking services corresponding to multiple client requests. Traditionally, this has almost always been implemented as an iterative or concurrent TCP server using either the select or the fork mechanism. The state machine for the connection server is shown in Figure 6. In the DISPATCH state, the server simply forks a new thread for servicing a new request from a client. The threads that are forked from the concurrent TCP server are what we call legacy threads. It is simply unreasonable to forsake well-established, proven concurrent artifacts such as a concurrent TCP server. This example shows that the VMC framework does not force one to model everything for TBB or QF. Another reason for not applying the TBB principle here is that the parallelism is explicitly designed into the system and it is required for providing real-time services to the clients.

Figure 6. State Machine Model of the Connection Server

The video streaming manager is also a typical concurrent manager that creates new streams at run time to serve client requests. Due to quality-of-service (QoS) requirements, the manager simply forks new threads to serve new requests. A thread pool is managed for efficiency so that thread creation and destruction are avoided at run time. In DVR, because a minimum QoS of 15 frames per second (fps) is required for video streaming, VSM manages a pool of legacy threads. The state machine of VSM is illustrated in Figure 7, where a new thread is used for servicing each new request, either for real-time video streaming or for on-demand video streaming.

Figure 7. State Machine Model of the Video Streaming Manager

5.2.2 TBB Tasks/Threads

The VMC framework uses TBB tasks mainly for two reasons: (a) a job is parallelizable, but there are no real-time constraints, or (b) a job is parallelizable, but the underlying hardware device is not. The first case is illustrated by the Status Manager (SM) and the second case by the Database Server (DS) and the Encoded Data Buffer Manager (EDBM). Note that the EDBM also utilizes multiple QF threads for executing concurrent states. Due to page limits, the EDBM description is omitted.

The status manager retrieves the list of recorded video files and the list of on-line digital video cameras from the database server and passes the information to remote clients. Though there are multiple incoming requests, there are no real-time constraints and the workload is very light; thus, there is no need for devoted threads. Instead, VMC realizes these jobs as TBB tasks, which can be executed by the TBB scheduler using a set of TBB threads. The <<...>> stereotype is used to specify request servicing as a set of parallel TBB tasks.

The database server provides recorded video files to VSM upon client requests and allows storing of real-time video data from EDBM. Multiple client requests and multiple camera video inputs require the database server to be a concurrent one. However, since DVR considers a single hard disk for database storage, allowing multiple devoted threads for each read and write request is unnecessary because ultimately all the requests must be serialized by the OS disk scheduler. Instead, VMC maps such read and write jobs to TBB tasks. Parallelism is still needed so that the disk accesses can be made efficient through the OS disk scheduler.

5.3 Remote Monitor Client

The remote monitor client (RMC) allows users to interact with the DVR server through a graphical user interface in the following ways: (1) acquiring the status information of the DVR server, (2) real-time video streaming, (3) on-demand video streaming, and (4) debugging and testing. The RMC can also be used to gather the server performance statistics for improving the video streaming QoS guarantees.

6 Conclusions

VERTAF/Multi-Core (VMC) is an application framework for developing multi-core embedded software. It adopts a model-driven approach with automatic code generation from SysML models. The code generated by VMC uses the Quantum Framework APIs and Intel’s Threading Building Blocks library along with an operating system that supports multi-core processors, such as Linux. VMC shows how easy it is to develop embedded software for multi-core processors. We used a real-world example, namely a digital video recording (DVR) system, to illustrate how VMC solves several of the issues related to model-driven development for multi-core embedded systems.

References

[1] S. Akhter. Multi-core Programming: Increasing Performance Through Software Multi-threading. Intel Press, 2006.
[2] B. Cantrill and J. Bonwick. Real-world concurrency. ACM Queue, 6(5):16–25, September 2008.
[3] D. de Niz and R. Rajkumar. Time Weaver: A software-through-models framework for embedded real-time systems. In Proceedings of the International Workshop on Languages, Embedded Systems, pages 133–143, June 2003.
[4] P. A. Hsiung, S. W. Lin, C. H. Tseng, T. Y. Lee, J. M. Fu, and W. B. See. VERTAF: An application framework for the design and verification of embedded real-time software. IEEE Transactions on Software Engineering, 30(10):656–674, October 2004.
[5] S. Kodase, S. Wang, and K. G. Shin. Transforming structural model to runtime model of embedded real-time systems. In Proceedings of the Design Automation and Test in Europe Conference, pages 170–175, March 2003.
[6] E. A. Lee. The problem with threads. IEEE Computer, 39(5):33–42, May 2006.
[7] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard-real time environment. Journal of the Association for Computing Machinery, 20:46–61, 1973.
[8] OpenMP. http://www.openmp.org/. 2008.
[9] J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly Media, Inc., 2007.
[10] J. Rumbaugh, G. Booch, and I. Jacobson. The UML Reference Guide. Addison Wesley Longman, 1999.
[11] M. Samek. Practical StateCharts in C/C++. CMP, 2002.
[12] S. Wang, S. Kodase, and K. G. Shin. Automating embedded software construction and analysis with design models. In Proceedings of the International Conference of EurouRapid, December 2002.