
Safe Dynamic Reshaping of Reconfigurable MPSoC Embedded Systems for Self-Healing and Self-Adaption Purposes

ALEXANDER BIEDERMANN and SORIN A. HUSS, Technische Universität Darmstadt
ADEEL ISRAR, COMSATS Institute of Information Technology, Islamabad

Multiprocessor system-on-chip (MPSoC) architectures pose a major challenge in embedded system design, because available MPSoCs and the related design flows are not tailored to the specific needs of embedded systems. This work demonstrates how to provide self-healing properties in embedded MPSoC design. This is achieved by combining a generic approach for creating virtualizable MPSoCs from off-the-shelf embedded processors with a methodology for deriving system configurations, such as task-processor bindings, that are optimal in terms of safety and execution time. The virtualization properties enable a reshaping of the MPSoC at runtime, so system configurations may be exchanged rapidly and dynamically. As a main result of this work, we introduce embedded multiprocessor systems that dynamically adapt to changing operating conditions, possible module defects, and internal state changes. We demonstrate the figures of merit of such reconfigurable MPSoC embedded systems by means of a complex automotive application scenario mapped to an FPGA featuring a virtualizable array of eight soft-core processors.

Categories and Subject Descriptors: C.4 [Performance of Systems]: Reliability, Availability, and Serviceability; C.0 [General]: System Architectures

General Terms: Design, Algorithms, Reliability

Additional Key Words and Phrases: Embedded system design, runtime reconfiguration, virtualization

ACM Reference Format:
Alexander Biedermann, Sorin A. Huss, and Adeel Israr. 2015. Safe dynamic reshaping of reconfigurable MPSoC embedded systems for self-healing and self-adaption purposes. ACM Trans. Reconfigurable Technol. Syst. 8, 4, Article 26 (September 2015), 22 pages.
DOI: http://dx.doi.org/10.1145/2700416

Authors' addresses: A. Biedermann and S. A. Huss, Integrated Circuits and Systems Lab, TU Darmstadt, Hochschulstr. 10, 64289 Darmstadt, Germany; emails: {biedermann, huss}@iss.tu-darmstadt.de; A. Israr, Department of Electrical Engineering, COMSATS Institute of Information Technology, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan; email: [email protected].
© 2015 ACM 1936-7406/2015/09-ART26 $15.00
DOI: http://dx.doi.org/10.1145/2700416
ACM Transactions on Reconfigurable Technology and Systems, Vol. 8, No. 4, Article 26, Publication date: September 2015.

1. INTRODUCTION

The paradigm shift in embedded system design from traditional single-processor architectures toward multiprocessor systems-on-chip (MPSoCs) has not only brought a distinct increase in flexibility and performance but has also aggravated existing design challenges. Binding and scheduling of tasks in a multiprocessor system have to account for several optimization goals, such as execution time, power consumption, safety, and reliability. The reliability of a system is viewed as the probability of completing the assigned task without any fault occurring during task execution. The safety of a system is a broader concept: it denotes the probability of either completing a task without any fault in the executing mechanism or of preventing the effect of an occurring fault from propagating to the rest of the system.

The main contribution of this work is the combination of a hardware-based virtualization approach for tasks with a binding optimization mechanism, which maps tasks to a variable set of processor instances with regard to both execution times and overall system safety. Because the virtualization approach provides a seamless shift of task execution within a processor array at runtime, applying the results of the binding optimization keeps the system in an optimal state in terms of safety and execution time even when processors in the array fail (e.g., due to module defects). In doing so, a comprehensive design flow for the creation of a self-healing MPSoC is derived. This design flow consists of two phases: in the first phase, an initial optimal task-processor binding is generated; the second phase is dedicated to reshaping the system behavior. The proposed design flow is demonstrated by means of a complex automotive application example.

The structure of the article is as follows. Section 2 discusses previous work on MPSoC architectures and related design flows. Section 3 presents a generic virtualizable MPSoC architecture that can be reconfigured at runtime. Section 4 outlines a dedicated methodology for designing reshapable MPSoC embedded systems that exploits the virtualization features of the architecture detailed in Section 3. In Section 5, a case study from the automotive domain demonstrates how the advocated MPSoC architecture and the proposed design methodology allow for a secure reshaping of the system functionality. Section 6 and an appendix conclude the article.

2. RELATED WORK

Most high-level approaches targeting reliable system design restrict the analysis to either hard errors or soft errors. Lee et al. [2010] have proposed a static migration of the task graph from a resource with a hard error to a healthy one. The migration schedule calculated for various fault configurations helps to maintain the throughput of the streaming application in case of faults. Meyer et al. [2010] presented a cost-effective slack allocation approach for improving the reliability of MPSoCs. They proposed a design space exploration approach that generates a set of Pareto-optimal alternatives representing various combinations of slack utilization in the event of a hard error. Das et al. [2014] presented a heuristic-based design-time multicriterion optimization technique for application mapping on MPSoCs with energy consumption and execution time as optimization criteria. Like the previously mentioned approaches, this technique is restricted to the consideration of hard errors. Izosimov et al. [2009] presented a design technique aiming at the mitigation of soft errors by means of radiation-hardened resources and task re-execution. Calvert et al. [2011] introduced an integrated framework for finding the set of components best suited to the system requirements; the selection process is assisted by a simulated annealing algorithm and a greedy approach. In contrast to these approaches, Glaß et al. [2007] presented a scheme for reliable system-level design that considers both hard and soft errors. Each process of the functional specification is bound to multiple resources, each of which executes a copy of this process simultaneously. One major drawback of this method is the need for costly fail-silent resources. Two recent high-level design approaches by Israr and Huss [2012, 2014] also consider both hard and soft errors when conceiving a system whose main characteristic is reliability.
These authors introduce a complex data structure, which models both hard and soft errors, and present genetic algorithms


for the optimization of system reliability alongside execution time and power consumption, while considering non–fail-silent resources for the subsequent implementation. However, they rely on metaheuristic algorithms to select the best binding, which is computationally much more expensive than the solution presented in the sequel.

In the area of multiprocessor architectures, especially in the field of personal computers, virtualization is an established mechanism for both transparent resource sharing and load distribution. Virtualization may be seen as a very strict form of multithreading/multitasking. However, in such systems, a common underlying operating system has to ensure that no memory access violations occur. This work focuses on software tasks dynamically scheduled on an FPGA platform. An early approach featuring the scheduling of hardware tasks is described in Brebner [1996]. Later, partially reconfigurable platforms were targeted in Brebner and Diessel [2001]. These proposals rely on a virtual hardware operating system. Although such approaches introduce the virtualization concept for embedded systems, their methods cannot be directly transferred to the envisaged multiprocessor virtualization. Simmler et al. [2000] present a task-switching procedure on an FPGA, in which the complete context of the device is read out and replaced by the context of another hardware implementation. The virtualization approach presented in this work relies on context extraction as well, but at a much finer granularity. In the work of Huang et al. [2009], hardware tasks are scheduled by means of techniques usually applied to software. In contrast, we tackle the scheduling of tasks with the support of a hardware-based virtualization layer (VL). An evaluation of several scheduling schemes for task distribution in MPSoCs may be found in Ventroux and Blanc [2005].
Other known virtualization approaches for embedded systems aim at different goals. Cohen and Rohou [2010] demonstrate runtime compilation to enable dynamic reuse of software on heterogeneous processor architectures. However, kernels or operating systems imply a significant overhead in resource-restricted embedded systems. A proposal to reduce such overhead is presented in Heiser [2008]; in contrast to Heiser's work, we target multiprocessor systems. Another virtualizable MPSoC is detailed in Hansson et al. [2011]. There, an operating system is again used to enable the shift of applications to other processors. Thus, a fair amount of complexity is added to the system to exploit its virtualization property, which makes debugging and validating such designs much harder. Note that in conventional multiprocessor systems, the number of processor resources does not vary during runtime; in the advocated MPSoC architecture, however, the number of processors may easily be changed during runtime. In the work of Lübbers and Platzner [2009], a hybrid MPSoC is highlighted, in which a CPU is assisted by FPGA-based accelerators. The authors provide a corresponding programming model to handle this heterogeneous architecture. In contrast, the presented work primarily targets a homogeneous processor array. In the work of Göhringer et al. [2013], several reliability measures are highlighted for an MPSoC built on top of a network-on-chip. Despite offering several means to increase a system's reliability, no comprehensive methodology was provided to fully exploit the proposed reliable network-on-chip. To maintain system stability at any time, Varanasi and Heiser [2011] postulate five principles for module isolation. We adopt these principles and provide a strong encapsulation of software tasks as well as well-defined interfaces without unwanted side effects. Based on a proposal introduced in Biedermann et al. [2011] and refined in Biedermann and Huss [2012] and Biedermann [2014], we discuss in the next section a novel virtualizable MPSoC architecture.



Fig. 1. A virtualizable MPSoC.

3. A VIRTUALIZABLE MPSoC ARCHITECTURE

The safety of a multiprocessor system is mainly defined by the mapping of tasks to processors (i.e., the set of task-processor bindings). Thus, to provide a multiprocessor system that is optimal in terms of safety at any point in time, an architecture is needed that can reshape its configuration upon request. The challenges here are fast and transparent switches between bindings as well as a varying number of processor instances and tasks at runtime. The foundation of the proposed architecture is the VL module (cf. Figure 1(a)). This module is able to monitor and manipulate both data and instructions being transferred between memories and processor cores. It therefore provides a variety of routines for task execution and communication, as detailed in the sequel. Moreover, a strict separation of task memories is always ensured, thus contributing to a safe execution environment.

3.1. Transparent Task Suspension and Resuming

To temporarily suspend the execution of a task, the VL “hijacks” the instruction interface of the processor core. A dedicated portion of machine code is fed into the processor by the Code Injection Logic (cf. Figure 1(b)). This code portion consists of generic processor instructions, such as save word and branch commands, which are present in every general-purpose processor instruction set. At the same time, the VL detaches the data memory and reroutes the data memory interface of the processor toward a small dedicated memory located inside the VL, the so-called Task Context Memory. Besides the register content, the current program counter address is preserved in this memory as well. In doing so, the context of the task at the point in time of its suspension is extracted from the processor. This procedure is fast: for a Xilinx 32-bit MicroBlaze soft-core processor, for example, it takes just 44 clock cycles. The duration is mainly determined by the size of the processor's register set, as the registers are read out sequentially within one clock cycle each.
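The suspension step, and the reverse step described next, can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the names `Processor`, `TaskContextMemory`, `suspend`, and `resume` are hypothetical, the register file is modeled as 32 general-purpose registers, and the returned count only tallies register transfers (one per clock cycle, as stated above), so it does not reproduce the 44 clock cycles reported for the MicroBlaze.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

NUM_REGS = 32  # assumption: size of the modeled register file


@dataclass
class Processor:
    """Hypothetical processor state: register file and program counter."""
    registers: List[int] = field(default_factory=lambda: [0] * NUM_REGS)
    pc: int = 0


@dataclass
class TaskContextMemory:
    """Hypothetical model of the Task Context Memory inside the VL."""
    contexts: Dict[str, Tuple[List[int], int]] = field(default_factory=dict)

    def suspend(self, task: str, cpu: Processor) -> int:
        """Extract the task context; return the number of register transfers."""
        saved = []
        for value in cpu.registers:  # one injected "save word" per register
            saved.append(value)
        self.contexts[task] = (saved, cpu.pc)
        return len(saved)

    def resume(self, task: str, cpu: Processor) -> int:
        """Restore the context on any processor, then jump to the saved PC."""
        saved, pc = self.contexts.pop(task)
        for index, value in enumerate(saved):  # reverse direction: "load word"
            cpu.registers[index] = value
        cpu.pc = pc  # models the final unconditional jump
        return len(saved)
```

Because the context is held by the VL rather than by the processor, `resume` may be called with a different `Processor` object than `suspend`, which is the property the shift of task execution relies on.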



Fig. 2. Task-processor interconnection network.

To resume a task's execution, the steps of the procedure are performed in reverse order. As a last step, the program memory is reattached to the processor, and an unconditional jump to the previously saved program counter address is performed. This part takes 45 clock cycles in the reference implementation detailed in Section 5. Transparently suspending and resuming a task's execution as outlined above is a prerequisite for moving its execution within the processor array.

3.2. Shift of Task Execution

The VL features a sorting multistage interconnection network between task memories and processor instances (cf. Figure 2). The memory interfaces of the tasks are the inputs, and the memory interfaces of the processors are the outputs of the network. Exploiting the sorting property of the network allows for very fast routing. To this end, all processor instances are numbered in ascending order, and each task is labeled with the number of the processor to which it is assigned. The network then sorts this input sequence by treating every crossbar switch as a comparator. For instance, to realize the binding denoted by the binding vector BV = ((E → 1), (B → 2), (D → 3), (A → 4), (H → 5), (C → 7)), the network is configured as depicted in Figure 2. The multistage network is able to realize every required input-output configuration. As soon as a task is to be shifted within the processor array, the virtualization procedure halts the task's execution, and a new configuration of the network is computed in a top-down manner in parallel to the virtualization procedure. For an 8 × 8 network, a new configuration is computed in about 10 clock cycles; the duration depends on the depth of the network. Note that tasks whose processor binding is not updated are not interrupted during this process.

A benefit of this concept is its intrinsic support for multitasking, because tasks may transparently share a processor. A binding that maps two or more tasks to the same processor forms a so-called task group. The VL manages the multitasking context switch within a task group autonomously. To this end, a so-called self-scheduling instruction is added to the code of the tasks by the Code Injection Logic depicted in Figure 1(b) (cf. Biedermann and Huss [2012]).

Synthesis results have shown that an array size of eight processor instances leads to a significant drop in the clock frequency.
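The routing-by-sorting principle of the interconnection network can be sketched in a few lines. The sketch assumes a Batcher bitonic sorting network as a stand-in for the VL's multistage network (the exact topology of Figure 2 is not specified in the text), eight task inputs named A to H, and that inputs without a binding (here F and G) carry the unused processor numbers so that the keys form a permutation of 1 to 8; such inputs are reported as idle (`None`). The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def bitonic_comparators(n: int) -> List[Tuple[int, int]]:
    """Comparators (lo, hi) of a bitonic sorter for n = 2**k inputs.

    After comparator (lo, hi) fires, position lo holds the smaller key.
    """
    comparators = []
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    comparators.append((i, partner) if ascending else (partner, i))
            j //= 2
        k *= 2
    return comparators


def route(binding: Dict[str, int], inputs: str = "ABCDEFGH") -> List[Optional[str]]:
    """Return the task seen by each processor output 1..n (None = idle)."""
    n = len(inputs)
    spare = iter(sorted(set(range(1, n + 1)) - set(binding.values())))
    # Each input carries (target processor number, task name or None).
    items = [(binding[t], t) if t in binding else (next(spare), None) for t in inputs]
    for lo, hi in bitonic_comparators(n):  # every switch acts as a comparator
        if items[lo][0] > items[hi][0]:
            items[lo], items[hi] = items[hi], items[lo]
    return [task for _, task in items]
```

For the binding vector BV above, `route({"E": 1, "B": 2, "D": 3, "A": 4, "H": 5, "C": 7})` yields `['E', 'B', 'D', 'A', 'H', None, 'C', None]`, that is, processors 6 and 8 stay idle. A bitonic sorter for eight inputs has six comparator stages, which hints at why the configuration time depends on the depth of the network.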
This frequency drop is caused by propagation delays within the internal communication paths of the purely combinatorial interconnection network. A feasible solution to this drawback is clustering smaller networks instead of exploiting a single larger one (cf. Biedermann [2014]). Moreover, as FPGA platforms are not well suited to routing entire memory interfaces across the chip, selecting another target architecture, such as an ASIC, may also mitigate the clock frequency drop.¹ When exploiting the multitasking feature, a context switch between two tasks is accomplished in less than 100 clock cycles on the reference implementation. This outperforms common kernel-based solutions for embedded processors, such as the Xilkernel provided by Xilinx, Inc. for the MicroBlaze soft core [Xilinx, Inc. 2006], by a factor of 15.

3.3. Task Communication

Conventionally, tasks communicate via point-to-point interfaces provided by the actual processor architecture. Within the VL, however, the static binding between tasks and processors is resolved. Consequently, conventional point-to-point interfaces can no longer be exploited. Therefore, the concept for virtualized task communication introduced in Biedermann and Huss [2013] is exploited as follows. To denote task communication in software, the designer uses the “send” and “receive” commands provided by one of the default point-to-point communication interfaces. However, instead of specifying an interface identifier in this command, she sets the ID of the communication partner. At runtime, a task's ISM inside the VL traps these instructions and modifies them into save word or load word commands, which access a so-called task data matrix residing in the VL. This matrix features a send row and a receive column for each communication participant. The IDs of the sending and receiving tasks are used to address the corresponding cell within the matrix. In doing so, tasks may still communicate via a point-to-point paradigm without knowing on which processor instance the communication partner is actually being executed.

The outlined MPSoC architecture is thus able to transform any array of off-the-shelf embedded processors into a virtualizable multiprocessor array. By exploiting code injection methods, any task execution can be stopped and resumed at any point in time. Furthermore, the task-processor bindings of the entire array can be updated at runtime, which also provides a transparent multitasking service.

4. DESIGN METHODOLOGY

To take advantage of the ability to reshape the system's configuration at runtime and to apply arbitrary task-processor bindings to the virtualizable MPSoC architecture presented earlier, a design methodology is needed that provides a dynamic task-to-processor mapping strategy. Such a methodology is detailed in this section.

4.1. Safety Schedule Decision Diagram

We first introduce the terminology needed to understand the safety schedule decision diagram (SSDD) introduced in Definition 4.1.8, which forms the foundation of the advocated design methodology.

¹ As multistage networks feature a symmetrical layout (i.e., the same number of inputs and outputs), the number of tasks is initially limited as well. For FPGA technologies, however, further tasks may be mapped to the system, for example by means of partial dynamic reconfiguration, which may overwrite the content of program memories.

Fig. 3. System specification example.

4.1.1. System Specification Graph. A system specification graph (SSG) consists of a set of task nodes ($t \in V_t$) and a set of resource nodes ($r \in V_r$). The nodes are connected via a set of mapping edges $m \in E_m \subseteq V_t \times V_r$, which represents all feasible mappings of tasks onto resources. Figure 3(a) depicts an example of such a system specification. The round nodes denote tasks, the square nodes represent resources, and the dashed arrows depict mapping edges. Note that the bold boundary of task node “A” in this figure implies that this task is to be scheduled exclusively on a resource; such tasks are referred to as schedule alone tasks. Every resource $r_i$ in $V_r$ may be affected by both hard and soft errors. A mapping edge $m_i = (t, r)$ is attributed with a specific safety value $S_{sp}(m_i)$ and a specific time value $T_{sp}(m_i)$. These values denote the probability of a safe execution of task t on the mapped resource r in the presence of soft errors and the worst-case time required for t to execute on r, respectively. The table in Figure 3(b) provides example safety and time values for all mapping edges of the SSG given in Figure 3(a).

4.1.2. Task Instance. A task instance ti is the logical representation of a task on the resource set. It is denoted as a tuple (t, r), where t represents a task and r a resource that can execute t. A task instance $ti_i$ is attributed with a specific safety value $S_{sp}(ti_i)$, which denotes the probability of a safe execution of t on r in a soft-error-prone environment, and with $T_{sp}(ti_i)$, which represents the worst-case execution time of t executed on r. The task instances reflect the mapping edges introduced earlier; therefore, their specific safety and time values are the same as those of the corresponding mapping edges. In the following, a task instance is denoted by the name of the task indexed by the number of the resource (e.g., A1 denotes task A executed on resource r1). Column 2 of Table I shows all possible task instances for the example SSG of Figure 3(a).

4.1.3. Resource State Vector. Given a resource set, a resource state vector (RSV) represents the state of a system with respect to hard errors.
This vector is composed of literals $r_1 r_2 \ldots r_{|V_r|}$, where $|V_r|$ represents the cardinality of the resource set $V_r$. A literal with a bar indicates a possible hard error in the corresponding resource. Table I shows all possible RSVs for the example SSG. The vector $r_1 r_2 r_3$, for instance, indicates that no resource is assumed to suffer from a hard error. In contrast, $\bar{r}_1 \bar{r}_2 \bar{r}_3$ represents the case where all resources may feature such errors.

4.1.4. Resource State Tree. A resource state tree is a data structure representing all RSVs of an SSG. This tree has the following properties:

(1) The resource state tree is a binary tree.
(2) The root and each internal node represent a resource.
(3) The root and each internal node have two outgoing edges: a left one indicating that the corresponding resource has a hard error and a right one indicating that the corresponding resource is error free.

Table I. SI Sets of Example SSG

RSV | Task instance sets (A; B; C) | Valid and invalid system instances | System instance set (SIS) | OSI | S(SI) | T(SI)
r1 r2 r3 | {A1, A2}; {B2, B3}; {C1, C3} | {(A1, B2, C1), (A1, B2, C3), (A1, B3, C1), (A1, B3, C3), (A2, B2, C1), (A2, B2, C3), (A2, B3, C1), (A2, B3, C3)} | {(A1, B2, C3), (A1, B3, C3), (A2, B3, C1), (A2, B3, C3)} | (A2, B3, C3) | 0.884 | 0.23
r1 r2 r̄3 | {A1, A2}; {B2}; {C1} | {(A1, B2, C1), (A2, B2, C1)} | ∅ | Nil | Nil | Nil
r1 r̄2 r3 | {A1}; {B3}; {C1, C3} | {(A1, B3, C1), (A1, B3, C3)} | {(A1, B3, C3)} | (A1, B3, C3) | 0.830 | 0.23
r1 r̄2 r̄3 | {A1}; ∅; {C1} | ∅ | ∅ | Nil | Nil | Nil
r̄1 r2 r3 | {A2}; {B2, B3}; {C3} | {(A2, B2, C3), (A2, B3, C3)} | {(A2, B3, C3)} | (A2, B3, C3) | 0.884 | 0.23
r̄1 r2 r̄3 | {A2}; {B2}; ∅ | ∅ | ∅ | Nil | Nil | Nil
r̄1 r̄2 r3 | ∅; {B3}; {C3} | ∅ | ∅ | Nil | Nil | Nil
r̄1 r̄2 r̄3 | ∅; ∅; ∅ | ∅ | ∅ | Nil | Nil | Nil

Fig. 4. Example resource state tree.

(4) A path from the root node to a terminal node (i.e., a leaf) represents an RSV, which is also written as the label of this terminal node.

Figure 4 depicts the resource state tree for the resources of the SSG given in Figure 3(a).

4.1.5. System Instance. For each RSV, none, one, or several valid system instances (SIs) may be found. An SI is a logical representation of the task set on the resource set. A valid SI is defined as a combination of n task instances, where

(1) n is the number of tasks in the SSG,
(2) each task of the SSG is represented by one and only one task instance in the tuple, and
(3) no other task instance is mapped to a resource that executes a schedule alone task.

An n-tuple that does not fulfill the last condition is called an invalid system instance. The concept of invalid SIs is needed because our algorithms inherently produce such tuples during SSDD generation; these tuples, however, are removed in the process (see Appendix A). Note that the definition of invalid SIs may be extended to instances whose time attribute is above a certain threshold. Table I shows all possible SIs for the SSG depicted in Figure 3(a).

Let us denote the reliability of a system by R and its safety by S. In the case of soft errors, we may assume without loss of generality that these errors are exponentially distributed. Then, for a resource $r_i$ within an SMR scheme,

$$S = R = e^{-\sigma T} \qquad (1)$$



Table II. Safety and Time Attributes for the SIs of the Example SSG

SI | Safety | Time (seconds)
(A1, B2, C3) | 0.839 | 0.2
(A1, B3, C3) | 0.830 | 0.23
(A2, B3, C1) | 0.829 | 0.3
(A2, B3, C3) | 0.884 | 0.3

Fig. 5. Time schedule diagrams for all SIs of the example SSG.

In Equation (1), σ denotes the soft error rate as defined in Israr [2012], and T denotes the worst-case execution time of the task on $r_i$. Assuming that all tasks in the task graph are equally important for the system functionality, the safety value S of an SI with respect to soft errors is defined as

$$S(SI) = \prod_{ti_i \in SI} S_{sp}(ti_i). \qquad (2)$$
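To make the safety objective more tangible, assume additionally (this is not stated in the text) that the specific safety of each task instance follows Equation (1) with its own soft error rate $\sigma(ti_i)$ and worst-case execution time $T_{sp}(ti_i)$. Equation (2) then becomes

$$S(SI) = \prod_{ti_i \in SI} e^{-\sigma(ti_i)\, T_{sp}(ti_i)} = \exp\Big(-\sum_{ti_i \in SI} \sigma(ti_i)\, T_{sp}(ti_i)\Big),$$

so that, under this assumption, maximizing the safety of an SI is equivalent to minimizing the sum of the products $\sigma(ti_i) \cdot T_{sp}(ti_i)$ over its task instances: long-running task instances on resources with high soft error rates weigh most heavily.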

The resulting execution time of an SI is then given by

$$T(SI) = \max_{r_i \in V_r} T(r_i, SI), \qquad (3)$$

where $T(r_i, SI)$ denotes the execution time of the task instances mapped to resource $r_i$. This value is expressed as

$$T(r_i, SI) = \sum_{ti_i \in SI} \begin{cases} T_{sp}[ti_i] & : r_i \in res(ti_i) \\ 0 & : \text{otherwise,} \end{cases} \qquad (4)$$

where res(ti) denotes the set of resources on which the task corresponding to ti is instantiated. Let us consider the example SSG again. Table I shows that the SIS for the RSV $r_1 r_2 r_3$ is a superset of the SISs of all other RSVs. Therefore, only the SIs of this SIS need to be considered. Table II lists the safety and time attributes of all SIs in the SIS of the RSV $r_1 r_2 r_3$, calculated using Equations (2) and (3), respectively, from the values given in the table of Figure 3(b). The time values of the various SIs may also be read off their execution schedules, as depicted in Figure 5.

4.1.6. System Instance Set. The set of all valid SIs belonging to an RSV is denoted as the system instance set (SIS). Such a set is considered a combination set, where each combination represents an SI. Due to the high degree of sparseness, a combination set is represented by a zero-suppressed binary decision diagram (ZBDD) instead of a binary decision diagram (BDD) (see Israr et al. [2009] for details). Column 4 of Table I shows the SISs for all RSVs of the example SSG. The SIS is empty for more than half of the system states (five of the eight RSVs); the corresponding RSVs represent a system failure due to hard errors in some resources.
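The definitions of Sections 4.1.3 to 4.1.6 can be checked by brute force on the example SSG. The sketch below encodes the mapping edges that can be read off Table I (A on r1 or r2, B on r2 or r3, C on r1 or r3, with A as the schedule alone task), enumerates all RSVs, and builds each SIS by explicit enumeration rather than with a ZBDD. The safety and time values per edge are hypothetical placeholders, because the table of Figure 3(b) is not reproduced in the text, and all function names are illustrative.

```python
from itertools import product
from math import prod

RESOURCES = ["r1", "r2", "r3"]
# (task, resource) -> (S_sp, T_sp); edges as in Table I, values HYPOTHETICAL.
EDGES = {
    ("A", "r1"): (0.95, 0.10), ("A", "r2"): (0.97, 0.12),
    ("B", "r2"): (0.93, 0.10), ("B", "r3"): (0.96, 0.11),
    ("C", "r1"): (0.94, 0.09), ("C", "r3"): (0.95, 0.12),
}
SCHEDULE_ALONE = {"A"}
TASKS = sorted({task for task, _ in EDGES})


def all_rsvs():
    """Yield every RSV as a dict: resource -> True if free of hard errors."""
    for states in product([False, True], repeat=len(RESOURCES)):
        yield dict(zip(RESOURCES, states))


def is_valid(si):
    """Condition (3): a schedule alone task must be alone on its resource."""
    for task, res in si:
        if task in SCHEDULE_ALONE and sum(1 for _, r in si if r == res) > 1:
            return False
    return True


def system_instance_set(rsv):
    """All valid SIs that use only resources without hard errors."""
    options = [[(t, r) for (t2, r) in EDGES if t2 == t and rsv[r]] for t in TASKS]
    return [si for si in product(*options) if is_valid(si)]


def safety(si):
    """Equation (2): product of the specific safety values."""
    return prod(EDGES[ti][0] for ti in si)


def exec_time(si):
    """Equations (3) and (4): the most loaded resource determines T(SI)."""
    return max(sum(EDGES[ti][1] for ti in si if ti[1] == r) for r in RESOURCES)


for rsv in all_rsvs():
    label = " ".join(r if ok else "~" + r for r, ok in rsv.items())  # "~" marks a bar
    print(label, ["".join(t + r[1:] for t, r in si) for si in system_instance_set(rsv)])
```

The printed SISs reproduce column 4 of Table I, including the five empty SISs; only `safety` and `exec_time` depend on the placeholder values.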



The safety S of an SIS regarding soft errors is defined as the maximal safety of all SIs included in this SIS:

$$S(SIS) = \max_{SI_i \in SIS} \{S(SI_i)\}. \qquad (5)$$

The execution time of an SIS is defined as the minimal execution time of all SIs included in this SIS:

$$T(SIS) = \min_{SI_i \in SIS} \{T(SI_i)\}. \qquad (6)$$

Given an SSG, the number of possible SIs can be expressed as

$$|SIS| = \prod_{t \in V_t} Count_{ti}(t), \qquad (7)$$

where $Count_{ti}(t)$ denotes the number of task instances of task t. In the worst case (i.e., if every task in $V_t$ is mapped to every resource in $V_r$ and only SMR task instances are allowed (see Section 4.1.9)), the preceding expression simplifies to $|SIS| = |V_r|^{|V_t|}$.

4.1.7. Operational System Instance. Given the SIS for an RSV, an operational system instance (OSI) must be determined from the SIs in that SIS. The OSI is defined as the SI with the maximum overall safety and, among all SIs of maximum safety, the minimum overall execution time. Let $SIS_{RSV}$ be the SIS for an RSV; then the OSI $\in SIS_{RSV}$ may be expressed as

$$\mathrm{OSI} \in SIS_a : T(\mathrm{OSI}) = T(SIS_a), \quad \text{where } SIS_a \subseteq SIS_{RSV} \,\wedge\, \forall SI_i \in SIS_a : S(SI_i) = S(SIS_{RSV}).$$
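Applied to the attributes of Table II, this definition amounts to a two-step selection: keep the SIs of maximal safety (Equation (5)), then take the one with minimal execution time among them (Equation (6) applied to $SIS_a$). The sketch below is a brute-force illustration of the definition only, not the SSDD-based procedure of Section 4.2; the function name `select_osi` and the tolerance used when comparing safety values are assumptions.

```python
from math import isclose
from typing import Dict, Optional, Tuple

SI = Tuple[str, ...]
# Safety and time attributes of the SIs for RSV r1 r2 r3 (Table II).
TABLE_II: Dict[SI, Tuple[float, float]] = {
    ("A1", "B2", "C3"): (0.839, 0.2),
    ("A1", "B3", "C3"): (0.830, 0.23),
    ("A2", "B3", "C1"): (0.829, 0.3),
    ("A2", "B3", "C3"): (0.884, 0.3),
}


def select_osi(attrs: Dict[SI, Tuple[float, float]], tol: float = 1e-9) -> Optional[SI]:
    """Return the OSI of an SIS given {SI: (safety, time)}; None for an empty SIS."""
    if not attrs:
        return None  # empty SIS: the system has failed ("Nil" in Table I)
    s_sis = max(s for s, _ in attrs.values())  # S(SIS), Equation (5)
    sis_a = [si for si, (s, _) in attrs.items() if isclose(s, s_sis, abs_tol=tol)]
    return min(sis_a, key=lambda si: attrs[si][1])  # minimal time within SIS_a


print(select_osi(TABLE_II))
equal_safety = {si: (0.9, t) for si, (_, t) in TABLE_II.items()}
print(select_osi(equal_safety))
```

The first call selects (A2, B3, C3); with equal safety values, the selection falls back to execution time alone and yields (A1, B2, C3), matching the two cases discussed for the example SSG.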

Consider the example SSG again. The SISs of the RSVs $\bar{r}_1 r_2 r_3$ and $r_1 \bar{r}_2 r_3$ each contain exactly one SI; therefore, the corresponding entries in column 5 of Table I are the OSIs for these RSVs. Columns 6 and 7 list the safety and time attributes, respectively, of the corresponding OSIs. The OSI for $r_1 r_2 r_3$ is (A2, B3, C3) if the specific safety and time values for the mapping edges given in the table of Figure 3(b) are applied. If, on the other hand, all specific safety values are assumed to be equal, then the OSI for this RSV is (A1, B2, C3), the SI with the shortest execution time. The computational complexity of any naive algorithm that explores the OSI for an RSV is $O(|SIS| \times |V_t|)$, where $|SIS|$ is an exponential function of $|V_t|$ and $|V_r|$. The complete procedure to explore the OSI for every RSV is presented in Section 4.2.

4.1.8. Safety Schedule Decision Diagram. A safety schedule decision diagram (SSDD) captures the conditions and configurations for a correct and safe operation of the entire system, considering both hard errors on resources and soft errors that may occur during the execution of task instances. In an SSDD, the task instance combination sets consist only of valid SIs. The SSDD has three types of nodes, as visualized in Figure 13(a) in Appendix A:

(1) Resource nodes: These nodes are located in the upper portion of the SSDD and are represented as BDD nodes. The outgoing right edge of a node (continuous line in Figure 13) indicates that the corresponding resource is free from hard errors, whereas the left edge (dotted line) indicates a hard error. Each path in this portion represents one or more RSVs.



Fig. 6. Design flow for generating OSIs.

(2) Task instance nodes: These nodes are located in the lower portion of the SSDD and model soft errors. They belong to one or more ZBDDs, which represent the SISs of the system. Note that the ZBDD portion of the SSDD includes as many ZBDDs as there are SISs. Each of these ZBDDs features a root node, which in turn is a child of a resource node.
(3) Two terminal nodes: a right one labeled 1, which represents a functional system, and a left one labeled 0, which represents a system that has failed due to hard and/or soft errors.

Figure 13(d) depicts the resulting SSDD for the SSG given in Figure 3(a), and Figures 13(a) through 13(c) show the SSDD in intermediate stages, as discussed in the next section. For clarity, the safety attributes are not shown in these figures.

4.1.9. Dual Modular Redundancy. For simplicity, the preceding definitions and examples assume single modular redundancy (SMR) only. Our solution, however, supports two-resource task instances as well, if the SSG allows it. A task assigned to dual modular redundancy (DMR) is instantiated on two resources, and the result comparison is performed on one of them. If a soft error occurs in either resource during task execution, the comparison result requests either a rescheduling or a safe termination of this task. If safety rather than reliability is the major concern, then DMR generally is a better choice than triple modular redundancy (TMR): TMR is less safe than DMR, but safer than SMR (cf. Reimann et al. [2008]). When one or both resources suffer from hard errors, the affected task may be rescheduled as a DMR on other resources, if spare resources meeting the requirement are available. If, however, it is no longer possible to reschedule the task as a DMR, the SMR scheme is enforced.

4.2. Exploration for Operational System Instances

Figure 6 depicts the flow graph for generating OSIs for every system state. First, the SSDD is generated from the input SSG by Algorithm 4, using the procedure detailed in Appendix A. Then the OSIs have to be identified. To search the OSIs for every RSV, each node of the SSDD must be traversed. Israr et al. [2009] observed that the number of SIs in an SIS grows exponentially with the number of task instances. Hence, the computational complexity of a naive approach that enumerates all SIs is exponential (see Section 4.1.7). A heuristic search procedure (e.g., a genetic algorithm) could be constructed to search for the best OSI. Such an algorithm may identify the correct OSI with a high success probability, but it may also fail. Algorithm 1, on the other hand, always determines the correct OSI. In addition, it is much faster than the corresponding genetic search algorithm: it traverses each node only once and is therefore of order O(N), where N denotes the number of nodes in the SSDD. Each node is traversed only once because of the dynamic programming method (cf. Kang and Kim [2011]) implemented in lines 10 and 11 of Algorithm 1: a node that has already been visited is not expanded again. Since the SSDD is a mixed BDD-ZBDD diagram, different operations have to be performed on resource and task instance nodes. The resource nodes are simply recorded, as their working state defines the state of the system. The first task instance child node of a resource node is the root node of the ZBDD of the SIS. As defined previously, the OSI is the SI from the SIS that features both the best overall safety value and the minimum execution time. The safety of a task instance node is determined according to Equation (11) of Theorem B.3, whereas its execution time is determined from Equation (12) of Theorem B.4. The proofs of both theorems are given in Appendix B. The work in Israr [2012] presents a solution for selecting the best binding with reliability and execution time as optimization parameters. There, however, the SSG consists of tasks that are data dependent on each other, and therefore an expression like Equation (12) in Appendix B cannot be proven. This makes it impossible to construct an efficient algorithm with O(N) complexity. Genetic algorithms are therefore used to explore the best binding for the general (i.e., data-dependent) case, which is computationally more complex than the novel dynamic programming-based algorithm presented in this section. A comparison of the resulting execution times is outlined in Section 5. Note that the optimal binding is defined as the one that has the best safety and execution time attributes among all possible bindings. Algorithm 1 takes the root node of the SSDD and recursively searches for the OSI for every system state. It sets the Dir attribute of all task instance nodes of an SSDD.
A node whose Dir attribute is set to right indicates that the corresponding task instance is part of the OSI and that the remaining task instances are found by following the right edge of the node. If the attribute is set to left, the task instance of the node is not part of the OSI, and the remaining task instances result from following the left edge. When a traversal that reads the Dir attributes reaches a terminal node, the sought OSI is complete. The correctness of lines 18 through 22 of Algorithm 1, which implement the max function of the equation S(N) := max{S(ti) × S(NR), S(NL)}, is proven in Theorem B.3 in Appendix B. Here we determine whether or not the task instance of the node is part of the OSI, based on the effect of its safety on the overall safety value of the OSI. When a task instance node N is traversed, its task instance ti is marked as part (or not part) of the OSI by setting the Dir attribute of the node to right (or left). If the value Ssp(ti) × S(NR) is greater than S(NL), then ti is chosen to be part of the OSI; if it is smaller, ti is excluded from the OSI. If both values are equal, the selection is decided by the overall execution time of the OSI. Theorem B.4 proves the correctness of the min function of the equation T(N) := min{T(ti) T(NR), T(NL)}, which is implemented in lines 23 through 32 of the algorithm. In this case, ti is selected only if the value Tsp(ti) × T(NR) is less than T(NL). Upon completion of the algorithm, we obtain the OSI for every system state. Figure 7 depicts two versions of the ZBDD representation of the SIS of RSV r1r2r3, whose nodes have their Dir attributes set differently after being processed by Algorithm 1.
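To illustrate why caching visited nodes matters, the following Python sketch builds a hypothetical ZBDD-like diagram for k tasks with two instances each (the node names and the dictionary encoding are our own assumptions, not the data structures of the paper). Every path to the 1-terminal is one SI, so there are 2^k SIs, yet the diagram has only 2k non-terminal nodes because sub-diagrams are shared. Enumerating paths visits a number of nodes that grows exponentially with k, whereas evaluating each node once, the principle behind lines 10 and 11 of Algorithm 1, touches every node exactly once.

```python
from typing import Dict, List, Tuple

ONE, ZERO = "1", "0"                  # terminal nodes
Diagram = Dict[str, Tuple[str, str]]  # node -> (right child, left child)


def sis_diagram(k: int) -> Diagram:
    """Hypothetical diagram: k tasks, two instances each, with shared sub-diagrams."""
    g: Diagram = {}
    for j in range(k):
        nxt = f"t{j + 1}_1" if j + 1 < k else ONE
        g[f"t{j}_1"] = (nxt, f"t{j}_2")  # right: take instance 1; left: try instance 2
        g[f"t{j}_2"] = (nxt, ZERO)       # right: take instance 2; left: no instance left
    return g


def count_naive(g: Diagram, n: str, visits: List[int]) -> int:
    """Counts SIs by enumerating every path; shared nodes are visited again and again."""
    visits[0] += 1
    if n in (ONE, ZERO):
        return 1 if n == ONE else 0
    right, left = g[n]
    return count_naive(g, right, visits) + count_naive(g, left, visits)


def count_memo(g: Diagram, n: str, memo: Dict[str, int]) -> int:
    """Same count, but each non-terminal node is evaluated only once."""
    if n in (ONE, ZERO):
        return 1 if n == ONE else 0
    if n not in memo:
        right, left = g[n]
        memo[n] = count_memo(g, right, memo) + count_memo(g, left, memo)
    return memo[n]


g = sis_diagram(16)
visits: List[int] = [0]
memo: Dict[str, int] = {}
print(count_naive(g, "t0_1", visits), visits[0])  # 65536 262141
print(count_memo(g, "t0_1", memo), len(memo))     # 65536 32
```

Both functions return the same number of SIs; only the amount of work differs, which is the gap between the naive approach and the O(N) traversal described above.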

ALGORITHM 1: FindOSI
Require: SSDD
Ensure: OSI for every system state (RSV)
1: N is root node of SSDD
2: function FINDOSI(N)
3: if N is the left terminal node of SSDD then
4: S(N) ⇐ 0.0
5: T(N) ⇐ −∞
6: else if N is the right terminal node then
7: S(N) ⇐ 1.0
8: T(N) ⇐ 0.0
9: else
10: if Visited(N) == true then
11: return
12: else
13: Visited(N) = true
14: end if
15: FindOSI(NR)
16: FindOSI(NL)
17: if N represents a ti then
▷ Locating SI with Max-Safety: Implementing Equation (11)
18: if Ssp(ti)×S(NR) > S(NL) then
19: Dir(N) ← right ▷ Case greater
20: else if Ssp(ti)×S(NR) …
…
25: … T(NL) then ▷ Case greater
▷ Lines 20 through 25 are only partly legible in the source. Per the text, ti is excluded (Dir(N) ← left) when Ssp(ti)×S(NR) < S(NL); on equality, ti is selected only if its execution time combined with T(NR) is less than T(NL) (Equation (12)).
26: Dir(N) ← left
▷ Setting S(N) & T(N) values according to Dir(N) attribute
27: if Dir(N) = right then
28: S(N) ⇐ Ssp(ti)×S(NR)
29: T(N) ⇐ T(ti) T(NR)
30: else
31: S(N) ⇐ S(NL)
32: T(N) ⇐ T(NL)
33: end if
34: end if
35: end if
36: end if
37: end if
38: end function
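Algorithm 1 can be sketched in Python as follows, under explicit assumptions. The Node layout and its kind strings are hypothetical. The operator that combines execution times in Equation (12) is not legible in the extracted text, so the sketch takes a pluggable combine function and uses addition purely as a placeholder. The comparison steps follow the textual description of lines 18 through 32 (select on higher safety, exclude on lower safety, break ties by execution time). The visited set mirrors lines 10 and 11, and the terminal values mirror lines 3 through 8.

```python
import math
import operator
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be stored in a set
class Node:
    kind: str                       # hypothetical encoding: "zero", "one", "resource", "task"
    label: str = ""
    right: Optional["Node"] = None  # N_R
    left: Optional["Node"] = None   # N_L
    s_sp: float = 1.0               # specific safety Ssp(ti) of a task instance
    t: float = 0.0                  # execution time T(ti) of a task instance
    S: float = 0.0                  # S(N), set by find_osi
    T: float = 0.0                  # T(N), set by find_osi
    dir: str = ""                   # Dir(N): "right" or "left"


def find_osi(n: Node, visited: Set[Node], combine: Callable[[float, float], float]) -> None:
    if n.kind == "zero":            # left terminal: failed system (lines 3-5)
        n.S, n.T = 0.0, -math.inf
        return
    if n.kind == "one":             # right terminal: functional system (lines 6-8)
        n.S, n.T = 1.0, 0.0
        return
    if n in visited:                # lines 10-11: each node is expanded only once
        return
    visited.add(n)
    find_osi(n.right, visited, combine)
    find_osi(n.left, visited, combine)
    if n.kind != "task":            # resource (BDD) nodes are only traversed
        return
    take = n.s_sp * n.right.S       # safety if ti becomes part of the OSI
    if take > n.left.S:
        n.dir = "right"
    elif take < n.left.S:
        n.dir = "left"
    else:                           # equal safety: the shorter execution time decides
        n.dir = "right" if combine(n.t, n.right.T) < n.left.T else "left"
    if n.dir == "right":
        n.S, n.T = take, combine(n.t, n.right.T)
    else:
        n.S, n.T = n.left.S, n.left.T


def read_osi(zbdd_root: Node) -> List[str]:
    """Follows the Dir attributes from a ZBDD root and collects the selected task instances."""
    osi, n = [], zbdd_root
    while n.kind == "task":
        if n.dir == "right":
            osi.append(n.label)
            n = n.right
        else:
            n = n.left
    return osi


def example(s_a1: float, t_a1: float, s_a2: float, t_a2: float) -> Node:
    """Hypothetical SIS: tasks A and B with two instances each; A1 and A2 share the B part."""
    zero, one = Node("zero"), Node("one")
    b2 = Node("task", "B2", right=one, left=zero, s_sp=0.98, t=1.0)
    b1 = Node("task", "B1", right=one, left=b2, s_sp=0.99, t=1.0)
    a2 = Node("task", "A2", right=b1, left=zero, s_sp=s_a2, t=t_a2)
    a1 = Node("task", "A1", right=b1, left=a2, s_sp=s_a1, t=t_a1)
    r1 = Node("resource", "r1", right=a1, left=zero)  # r1 free of hard errors -> this SIS
    find_osi(r1, set(), operator.add)                 # '+' is only a placeholder combination
    return a1


print(read_osi(example(0.90, 3.0, 0.95, 2.0)))  # ['A2', 'B1']: higher safety of A2 wins
print(read_osi(example(0.90, 2.0, 0.90, 3.0)))  # ['A1', 'B1']: equal safety, A1 is faster
```

The two calls loosely mirror the situation of Figure 7: with distinct specific safety values the safety comparison decides, whereas with equal values the execution time breaks the tie. The numbers are invented and are not those of Figure 3(b).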

The Dir attributes are set as in Figure 7(a) when the specific safety and time values of the mapping edges are those given in the table of Figure 3(b). If, on the other hand, the specific safety values are assumed to be equal, the Dir attributes are set as in Figure 7(b). As Figure 7(a) shows, the selected OSI is (A2, B3, C3), whereas the one in Figure 7(b) is (A1, B2, C3). The new method of determining task-processor bindings presented in Algorithm 1 is efficient, as its complexity is O(N), where N denotes the number of nodes in the SSDD. Unlike heuristic exploration algorithms, it always identifies the correct OSIs for all RSVs. The proof of correctness of the algorithm is given in Appendix B. Please note that such a proof can only be established for system specifications whose tasks are not data dependent on each other. In case of such dependencies, a genetic algorithm as elaborated in Israr [2012] may be applied. However, there is then no longer a guarantee that existing correct OSIs are always found, and the complexity of the genetic algorithm is significantly greater than O(N).

Fig. 7. Dir attributes for the nodes of the ZBDD of the SIS for r1r2r3.

4.3. OSI Optimization by Consideration of Secondary Constraints

An OSI resulting from the outlined approach, which forms the first phase of the advocated design methodology, has been optimized in terms of safety and execution time only. However, secondary constraints, such as deadlines, may render this OSI invalid and thus require further optimization steps. If an OSI violates secondary constraints, two solutions are possible. In the first one, the current OSI is discarded from the list of the SIS, and another entry of the SIS becomes the new OSI. Note, however, that system safety decreases with each discarded OSI. The second solution therefore takes task priorities into consideration: in case of a violation of secondary constraints, the task with the lowest priority is removed from the task graph. Tasks are successively removed in this way until an OSI is found that complies with the constraints. Although this method generally decreases the system's quality of service by discarding the least important tasks, the generated OSIs are guaranteed to still feature the highest possible safety values. As the trade-off between safety and quality of service is highly application specific, it is up to the designer to select the appropriate procedure for meeting secondary constraints. The resulting optimized OSI features a valid, but still static, task-processor binding.

ALGORITHM 2: ScheduleTaskGroup
Require: A scheduling event within a task group TG, a task t1 being executed.
Ensure: Suspension of t1, activation of the next task t2 of TG.
1: if time budget of t1 elapsed or t1 emits self-scheduling instruction then
2: Suspend t1 as described in Section 3.1
3: Select t2, which is the next task listed in TG
4: Compute new network configuration, which routes t2 to the corresponding processor as
5: detailed in Section 3.2
6: ▷ Line 2 is executed in parallel to line 4
7: Evoke t2 as detailed in Section 3.1
8: end if
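The priority-based strategy of Section 4.3 amounts to a simple loop: derive a binding, check the secondary constraints, and drop the least important remaining task until the check passes. The Python sketch below shows only that loop. The greedy load balancing is a hypothetical stand-in for deriving an OSI with FindOSI, the per-processor time budget is a hypothetical example of a secondary constraint, and all task names, times, and priorities are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Binding = Dict[str, int]  # task -> processor index


def greedy_binding(tasks: List[str], t_exec: Dict[str, float], n_proc: int) -> Binding:
    """Hypothetical stand-in for deriving an OSI: place each task on the least-loaded processor."""
    load = [0.0] * n_proc
    binding: Binding = {}
    for task in tasks:
        p = min(range(n_proc), key=lambda i: load[i])  # first least-loaded processor
        binding[task] = p
        load[p] += t_exec[task]
    return binding


def within_budget(binding: Binding, t_exec: Dict[str, float], budget: float) -> bool:
    """Hypothetical secondary constraint: per-processor execution time must not exceed budget."""
    load: Dict[int, float] = {}
    for task, p in binding.items():
        load[p] = load.get(p, 0.0) + t_exec[task]
    return all(v <= budget for v in load.values())


def drop_until_feasible(priority: Dict[str, int],
                        derive: Callable[[List[str]], Binding],
                        feasible: Callable[[Binding], bool]) -> Tuple[List[str], Optional[Binding]]:
    """Removes the lowest-priority task (largest number) until the derived binding is feasible."""
    tasks = sorted(priority, key=lambda t: priority[t])  # most important task first
    while tasks:
        binding = derive(tasks)
        if feasible(binding):
            return tasks, binding
        tasks.pop()  # discard the least important remaining task
    return [], None


t_exec = {"t1": 4, "t2": 3, "t3": 3, "t4": 2, "t5": 2}  # hypothetical normalized times
priority = {"t1": 1, "t2": 2, "t3": 3, "t4": 4, "t5": 5}  # low value = high priority
kept, binding = drop_until_feasible(
    priority,
    derive=lambda ts: greedy_binding(ts, t_exec, n_proc=2),
    feasible=lambda b: within_budget(b, t_exec, budget=7),
)
print(kept)     # ['t1', 't2', 't3', 't4']
print(binding)  # {'t1': 0, 't2': 1, 't3': 1, 't4': 0}
```

With all five tasks, one processor would need 8 time units against a budget of 7, so the lowest-priority task t5 is dropped and the remaining four fit.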

4.4. Static MPSoC Configuration

By exploiting the first phase of the outlined design methodology, an optimal OSI can be derived. This phase is denoted as Design Space Exploration in Figure 8. The resulting OSI may be applied as the binding configuration of any multiprocessor system (e.g., of a virtualizable system as depicted in Figure 1(a)). This is achieved by setting up the interconnection network (cf. Figure 2) accordingly. In doing so, the system realizes a task-processor binding with optimal safety and minimum execution time. When two or more tasks are bound to the same processor (i.e., form a task group), the virtualizable MPSoC realizes transparent multitasking. Tasks may also trigger their own suspension and the activation of the next task in the task group by means of a dedicated self-scheduling instruction. Algorithm 2 illustrates this special handling of task group scheduling.

Fig. 8. Design flow for virtualizable and reconfigurable MPSoCs.

4.5. Dynamic Configuration Reshaping

Although the OSI concept may be applied to any multiprocessor architecture, the virtualizable MPSoC introduced in Section 3 provides an intrinsic means of preserving optimal safety in case of processor faults. Moreover, the system may dynamically change its configuration according to current requirements. This is achieved by a so-called reshaping process, during which another OSI is selected from the binding database and then applied to the virtualizable MPSoC according to Algorithm 3. Algorithms 2 and 3 thus form the second phase of the design methodology depicted in Figure 8. Reshaping the entire system is fast: it takes fewer than 100 clock cycles on the FPGA prototype platform depicted in Figure 10, which is about the same time as a task switch within a task group (cf. Section 3.2). If the detection of an error triggers the reshaping procedure, the task originally executed on the now faulty processor is restarted. Its current context is discarded, as the integrity of its register set content can no longer be guaranteed.

5. CASE STUDY

In an automotive sensor environment, optimal bindings are created for different scenarios, such as different road types or module faults. The underlying architecture is then able to undergo a functional reshaping at runtime whenever the current scenario changes. Furthermore, if faults are detected, a safety-induced reshaping of the architecture ensures a return to a safe system state. The foundation of this case study is the Proreta project [Darms and Winner 2006], in which a car is equipped with a set of sensors. Raw sensor data are first accumulated by a so-called sensor fusion server and then passed to an application layer. In this layer, several independent tasks analyze the sensor data to facilitate, for example, collision detection or parking assistance. The results are finally forwarded via CAN either to control the corresponding actuators in the car or to display assistance messages on the driver's cockpit interface. The layered structure of the system is depicted in Figure 9.


Fig. 9. Layers of the automotive assistance system example.
Table III. Scenario-Task Assignment with Priority Numbers

ALGORITHM 3: ReshapeByVirtualization
Require: A system realizing an OSI, OSI1, and a trigger for the reshaping procedure.
Ensure: A reshaping of the system to realize OSI2.
1: if Trigger requests a reshaping of the system's configuration then ▷ application-specific
2: Control processor (cf. Figure 10) fetches a new OSI, OSI2
3: ▷ The storage location of OSI2 is application specific
4: Control processor forwards OSI2 into VL
5: Suspend all tasks of OSI1 currently being executed according to Section 3.1
6: Compute new network configuration according to Section 3.2
7: for all tasks ti denoted in OSI2, where ti is either the first task listed in a task group or
8: is not part of a task group do
9: Activate ti as described in Section 3.1
10: end for
11: end if
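The order of the steps in Algorithm 3 can be made explicit with a small Python model. The sketch is hypothetical throughout: the binding database is a dictionary keyed by a scenario name and the set of fault-free processors (standing in for an RSV element), an OSI is represented as an ordered task group per processor, and suspension, routing, and activation are only logged rather than performed by the VL. Task names are placeholders and do not reproduce the bindings of Tables V and VI.

```python
from typing import Dict, FrozenSet, List, Tuple

OSI = Dict[int, List[str]]        # processor -> ordered task group (hypothetical representation)
Key = Tuple[str, FrozenSet[int]]  # (scenario, fault-free processors), standing in for an RSV element


class ReshapeSketch:
    """Hypothetical model of Algorithm 3; it logs actions instead of driving real hardware."""

    def __init__(self, database: Dict[Key, OSI]) -> None:
        self.database = database        # precomputed OSIs, e.g., held in off-chip memory
        self.running: Dict[int, str] = {}
        self.log: List[str] = []

    def reshape(self, scenario: str, working: FrozenSet[int]) -> None:
        osi2 = self.database[(scenario, working)]  # lines 2-4: fetch OSI2 and forward it
        for p, task in self.running.items():       # line 5: suspend running tasks of OSI1
            self.log.append(f"suspend {task} on P{p}")
        routes = {p: group[0] for p, group in osi2.items() if group}  # line 6: new routing
        for p, task in routes.items():             # lines 7-10: start first task of each group
            self.log.append(f"activate {task} on P{p}")
        self.running = routes


db: Dict[Key, OSI] = {
    ("Road", frozenset({0, 1})): {0: ["tA"], 1: ["tB", "tC"]},
    ("Town", frozenset({0, 1})): {0: ["tA", "tD"], 1: ["tB"]},
    ("Town", frozenset({1})): {1: ["tA", "tB"]},  # after a hard error on P0
}
vl = ReshapeSketch(db)
vl.reshape("Road", frozenset({0, 1}))  # functional reshaping: scenario change
vl.reshape("Town", frozenset({0, 1}))
vl.reshape("Town", frozenset({1}))     # safety reshaping: P0 is no longer available
print(vl.running)  # {1: 'tA'}
print(vl.log[-1])  # activate tA on P1
```

Because every OSI is precomputed, a reshaping step is a lookup followed by suspend, route, and activate actions; no binding is computed at runtime, which is what keeps the reshaping cheap.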

For the case study, the sections highlighted in Figure 9 are addressed by the proposed methodology. Accordingly, the virtualizable MPSoC detailed in Section 3 realizes the application layer. A dedicated hardware IP-core acts as the sensor fusion server, which forwards sensor data to the application layer. Processing results of tasks are passed to another hardware IP-core, which acts as a CAN bridge and simulates the communication with the car actuators and with the driver's interface. The following safety-critical applications and comfort functions are considered in this case study: Collision Detection (CD), Blind Spot Detection (BSD), Lane Change Support (LCS), Lane Keeping Support (LKS), High Beam Assistance (HBA), Fog Light Assistant (FLA), Traffic Sign Detection (TSD), and Parking Assistant (PA). From this set of tasks, three driving scenarios are composed: Freeway, Road, and Town. Each scenario comprises a subset of these tasks (cf. Table III), where each task is listed together with its priority; a lower value denotes a higher priority.

5.1. Prerequisites

The main purpose of the case study in the context of this work is to demonstrate both the functional and the self-healing-related reshaping properties of the outlined virtualizable and reconfigurable MPSoC architecture as well as the advantages of the advocated design methodology. We therefore restrict ourselves to an SMR scheme as the foundation of the processor array and assign all reshaping requests to the control processor shown in Figure 10. The modules that accomplish the context extraction are subsumed as “VB” in this figure. The code running on the control processor thus acts as a simulation environment to perform and control the envisaged experiments. Related redundancy issues, such as error detection, arbitration of results from a DMR scheme, or rescheduling of tasks, will be addressed in a follow-up paper. Due to industrial nondisclosure requirements, neither the full task code nor the absolute execution times were available; only their approximate ratios were given. We therefore take the execution times shown in Table IV and denote them in normalized time units.² Despite these restrictions, this approach still allows for a high-level assessment of the overall dynamic behavior of the system. The execution time is defined as the amount of time needed for interpreting a sensor input and delivering an output via the CAN interface. Besides the execution time texec, a reactivation time tr is introduced for each scenario and task. This value denotes the maximum amount of time between the end of a task's execution and its next invocation. If the reactivation time is violated, the task will not produce reasonable results. Some tasks, such as CD, LKS, and LCS, feature hard timing constraints: beyond their execution time, continuous execution is essential to ensure their effectiveness. Scheduling one of these tasks together with any other task on the same processor resource will violate its timing constraints. Therefore, each of these tasks is a schedule-alone task (cf. Section 4.1.1) and is bound to a dedicated processor in the array.

Fig. 10. Architecture of the prototyped automotive assistance system.
Table IV. Normalized Task Execution and Reactivation Times Related to Scenarios

² Task execution times depend on factors that are highly implementation specific, such as the processor type employed or the system clock frequency.
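The reasoning behind schedule-alone tasks can be made concrete with a small check. Assuming that the tasks of a task group run round-robin in list order, each running to completion before self-scheduling the next one (as in Algorithm 2), and that switching overhead is negligible (a switch takes fewer than 100 clock cycles), the gap between two invocations of a task equals the sum of the execution times of the other tasks in its group. A task that tolerates no gap can therefore share its processor with no other task. The function names and values below are hypothetical and are not taken from Table IV.

```python
from typing import Dict, List


def reactivation_gaps(group: List[str], t_exec: Dict[str, float]) -> Dict[str, float]:
    """Gap between the end of each task's run and its next start under round-robin.

    Assumes every task runs to completion and then self-schedules the next task of the
    group, and that switching overhead is negligible.
    """
    cycle = sum(t_exec[t] for t in group)
    return {t: cycle - t_exec[t] for t in group}


def group_is_feasible(group: List[str], t_exec: Dict[str, float], t_r: Dict[str, float]) -> bool:
    """True if no task in the group waits longer than its reactivation time t_r."""
    gaps = reactivation_gaps(group, t_exec)
    return all(gaps[t] <= t_r[t] for t in group)


# Hypothetical normalized values; "z" models a schedule-alone task (no gap tolerated).
t_exec = {"x": 2, "y": 3, "z": 1}
t_r = {"x": 4, "y": 6, "z": 0}
print(reactivation_gaps(["x", "y", "z"], t_exec))  # {'x': 4, 'y': 3, 'z': 5}
print(group_is_feasible(["x", "y"], t_exec, t_r))  # True
print(group_is_feasible(["x", "z"], t_exec, t_r))  # False
print(group_is_feasible(["z"], t_exec, t_r))       # True
```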




Fig. 11. Functional reshaping.

LKS and LCS, however, are never active at the same time; otherwise, LKS would interfere with the driver's intention to change lanes. Therefore, LKS and LCS are regarded as a single unified task, LKS_LCS. We run the following reshaping experiments on an instance of the eight-processor array depicted in Figure 10.

5.2. Optimal Bindings

To derive an optimal binding and task schedule, the reliability data of the processors, the soft error rate, the actual task execution times, and the task priorities must be known in advance. The task priorities are given in Table III. By applying the design flow of Figure 8 to each scenario and each number of active processors in the system, an optimal binding can be derived for every such combination. These bindings are then stored in an off-chip memory, to be activated at runtime depending on the current driving situation. The code of the assistance tasks resides in the memory blocks of the FPGA device. An online binding calculation may be feasible as well, but this is outside the scope of this article. A control processor triggers reshaping events and applies the appropriate bindings fetched from the off-chip memory. The resulting architecture is depicted in Figure 10.

5.3. Reshaping Experiments

5.3.1. Functional Reshaping. In this scenario, a car leaves the freeway for a rural road and subsequently enters a town. As a consequence, the set of active tasks changes at runtime from the Freeway scenario via the Road scenario to the Town scenario. As soon as the car detects the transition from one scenario to another, a binding update of the underlying MPSoC is triggered. The control processor fetches the OSI for the current RSV element from the off-chip memory, and the reshaping procedure of Algorithm 3 is then applied. In doing so, the tasks needed for driving on a rural road, and later for driving in town, are activated, whereas tasks no longer needed are deactivated (cf. Figure 11). Scheduling of the tasks mapped to the same processor resource is managed intrinsically by the VL, which interprets the self-scheduling instructions (cf. Section 3.2) inserted at the end of the code of each assistance task. For this experiment, a fault-free system is assumed. We ran the functional reshaping experiment on a set of four array processors. For this set, constructing an SSDD and identifying valid OSIs results in the optimal bindings listed in Table V. The binding update, including the simultaneous setup of the task-processor interconnection network, takes fewer than 100 clock cycles.³ This transition time is about 15 times shorter than a conventional thread switch in common kernels for embedded processors, such as the Xilkernel provided by the device vendor Xilinx, Inc.

³ This time mainly depends on the number of processor registers that have to be read out and restored during a binding update.




Table V. Optimal Bindings for the Four-Processor Variant

Table VI. Subset of OSIs for the Eight-Processor Variant

Fig. 12. Safety reshaping after the detection of faulty processor resources.

5.3.2. Safety Reshaping. If the result of a task lies outside the expected range of values, either a hard or a soft error in the processor may be assumed. The reliability of the employed MicroBlaze soft-core processors is estimated as 0.99, and the soft error rate as 0.01. These data were used to determine the bindings for the eight-processor array architecture. Because of the SMR scheme employed, the affected processor should be disabled in this case, and a binding update must be performed, either by using a spare resource in the array and dynamically allocating the now unhosted tasks to it, or by rescheduling them on the remaining set of processors. If timing constraints can no longer be met, the most dispensable tasks may be discarded until they are met; a heuristic method removes the tasks with the lowest priority first. Table VI lists the OSIs for some RSV elements featuring up to five hard errors, and Figure 12 depicts the graceful degradation of the system when the reshaping procedure of Algorithm 3 is applied as five processors fail one by one. Table VII compares the execution times, given in seconds, of the proposed task mapping algorithm, denoted as tFindOSI, with those of the one presented in Israr [2012], denoted as tGenOpt, for the system specification graphs used in this case study. As this table shows, the genetic optimization-based algorithm needed up to three orders of magnitude more computation time to determine the correct task-processor bindings. One may argue that GenOpt's few seconds do not matter much for an offline calculation as used in this application example. We therefore generated a set of artificial SSGs to study how the size of a graph affects the computation times of both methods. For example, an SSG featuring 16 tasks results in an SSDD of 15,870 nodes, which must be processed to find the correct task-processor bindings. The GenOpt method required 5 minutes and 37 seconds (337 s) to complete, whereas FindOSI needed just 3.5 milliseconds, a difference of roughly five orders of magnitude. Thus, only the FindOSI algorithm opens the door to the adequate online processing envisaged in our future work. The application examples have been successfully synthesized for a Xilinx Virtex-6 LX240T FPGA and employ MicroBlaze soft-core processors. Table VIII lists the resulting synthesis figures for the prototyped architecture. The MicroBlaze processors are tailored to, and thus highly optimized for, Virtex architectures. As a consequence, they account for only about 8,000 registers and 8,000 look-up tables (LUTs). Nevertheless, the overhead introduced by the VL, about 13% of the registers and 53% of the LUTs of the medium-sized LX240T FPGA, is still negligible considering the amount of logic resources available in today's reconfigurable devices.

A. Biedermann et al.
Table VII. Comparison of Computation Time of Task-Processor Binding Algorithms
Table VIII. Synthesis Results for a Xilinx Virtex-6 LX240T FPGA

6. CONCLUSIONS

This work has introduced a comprehensive methodology for designing embedded MPSoCs featuring self-healing properties. This novel methodology is based on a hardware-based virtualization approach as well as on an iterative task-processor assignment procedure, which produces the required bindings. These bindings are proven to be operational and, at the same time, optimal for the selected optimization goals. The virtualization features enable a dynamic reshaping of the embedded MPSoC, which makes it possible to meet important design goals, such as safety and execution time limits, even in the event of changing runtime environments or module defects. This is accomplished by selecting an appropriate task-processor binding from the previously computed set at runtime. Dynamic binding updates can thus be applied to virtualizable multiprocessor arrays featuring the proposed VL, because reshaping the underlying reconfigurable fabric is more than one order of magnitude faster than task switching in common OS kernel-based systems. Such a reconfigurable embedded MPSoC yields a self-healing system, as triggers for runtime reshaping are generally defective resources, changes in functional requirements, or energy constraints. Both the feasibility and the efficiency of this approach have been demonstrated by means of a high-level assessment of a complex automotive application example. In future work, we plan to investigate the efficiency of the outlined methodology when applied to heterogeneous multiprocessor/multicoprocessor system-on-chip architectures, as well as the characteristics of an online task-processor binding calculation method.

ACKNOWLEDGMENTS
The authors extend their thanks to the anonymous reviewers for their suggestions and comments, which have helped to improve the presentation of this work.

REFERENCES

A. Biedermann. 2014. Design Concepts for a Virtualizable, Embedded MPSoC Architecture. PhD Dissertation. TU Darmstadt.
A. Biedermann and S. A. Huss. 2012. Hardware virtualization-driven software task switching in reconfigurable multi-processor system-on-chip architectures. In Proceedings of the Workshop on Mapping of Applications to MPSoCs. ACM, New York, NY, 32–41.
A. Biedermann and S. A. Huss. 2013. A methodology for invasive programming on virtualizable embedded MPSoC architectures. In Proceedings of the International Conference on Computational Science. 359–368.
A. Biedermann, M. Stoettinger, L. Chen, and S. A. Huss. 2011. Secure virtualization within a multi-processor soft-core system-on-chip architecture. In Proceedings of the International Symposium on Applied Reconfigurable Computing. 385–396.
G. J. Brebner. 1996. A virtual hardware operating system for the Xilinx XC6200. In Proceedings of the Conference on Field-Programmable Logic and Applications. 327–336.
G. J. Brebner and O. Diessel. 2001. Chip-based reconfigurable task management. In Proceedings of the Conference on Field-Programmable Logic and Applications. IEEE, Los Alamitos, CA, 182–191.
R. E. Bryant. 1986. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers C-35, 8, 677–691.
C. Calvert, G. L. Hamza-Lup, A. Agarwal, and B. Alhalabi. 2011. An integrated component selection framework for system-level design. In Proceedings of the International Systems Conference. IEEE, Los Alamitos, CA, 261–266.
A. Cohen and E. Rohou. 2010. Processor virtualization and split compilation for heterogeneous multicore embedded systems. In Proceedings of the Design Automation Conference. 102–107.
M. Darms and H. Winner. 2006. Umfelderfassung für ein Fahrerassistenzsystem zur Unfallvermeidung. In VDI Berichte, Vol. 1931. Düsseldorf, Germany, 207.
A. Das, A. Kumar, and B. Veeravalli. 2014. Energy-aware task mapping and scheduling for reliable embedded computing systems. ACM Transactions on Embedded Computing Systems 13, 72.
M. Glaß, M. Lukasiewycz, T. Streichert, C. Haubelt, and J. Teich. 2007. Reliability-aware system synthesis. In Proceedings of the Conference on Design, Automation, and Test in Europe. 409–414.
D. Göhringer, L. Meder, O. Oey, and J. Becker. 2013. Reliable and adaptive network-on-chip architectures for cyber physical systems. ACM Transactions on Embedded Computing Systems 12, 51.
A. Hansson, M. Ekerhult, A. Molnos, A. Milutinovic, A. Nelson, J. Ambrose, and K. Goossens. 2011. Design and implementation of an operating system for composable processor sharing. Microprocessors and Microsystems 35, 2, 246–260.
G. Heiser. 2008. The role of virtualization in embedded systems. In Proceedings of the Workshop on Isolation and Integration in Embedded Systems. ACM, New York, NY, 11–16.
M. Huang, H. Simmler, O. Serres, and T. A. El-Ghazawi. 2009. RDMS: A hardware task scheduling algorithm for reconfigurable computing. In Proceedings of the Conference on Parallel and Distributed Processing. IEEE, Los Alamitos, CA, 1–8.
A. Israr. 2012. Reliability Aware High-Level Embedded System Design in Presence of Hard and Soft Errors. PhD Dissertation. TU Darmstadt.
A. Israr and S. A. Huss. 2012. Memory efficient reliability assessment for system-level design of embedded systems. In Proceedings of the Asia Symposium on Quality of Electronic Design. 238–246.
A. Israr and S. A. Huss. 2014. Reliable system design using decision diagrams in presence of hard and soft errors. In Proceedings of the Bhurban Conference on Applied Sciences and Technology. IEEE, Los Alamitos, CA, 136–144.
A. Israr, A. Shoufan, and S. A. Huss. 2009. A compact error model for reliable system design. In Proceedings of the Conference on High Performance Computing and Simulation. 60–66.
V. Izosimov, I. Polian, P. Pop, P. Eles, and Z. Peng. 2009. Analysis and optimization of fault-tolerant embedded systems with hardened processors. In Proceedings of the Conference on Design, Automation, and Test in Europe. 682–687.
S. Kang and T. J. Kim. 2011. Adaptive dynamic programming approach to a multi-purpose location-based concierge service model. Intelligent Transport Systems 5, 4, 277–285.
C. Lee, H. Kim, H. W. Park, S. Kim, H. Oh, and S. Ha. 2010. A task remapping technique for reliable multicore embedded systems. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis. 307–316.
E. Lübbers and M. Platzner. 2009. ReconOS: Multithreaded programming for reconfigurable computers. ACM Transactions on Embedded Computing Systems 9, 1, 8.
B. H. Meyer, A. S. Hartman, and D. E. Thomas. 2010. Cost-effective slack allocation for lifetime improvement in NoC-based MPSoCs. In Proceedings of the Conference on Design, Automation, and Test in Europe. 1596–1601.
S. Minato. 1993. Zero-suppressed BDDs for set manipulation in combinatorial problems. In Proceedings of the Design Automation Conference. 272–277.
F. Reimann, M. Glaß, M. Lukasiewycz, C. Haubelt, J. Keinert, and J. Teich. 2008. Symbolic voter placement for dependability-aware system synthesis. In Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis. 237–242.
H. Simmler, L. Levinson, and R. Männer. 2000. Multitasking on FPGA coprocessors. In Proceedings of the Conference on Field-Programmable Logic and Applications. IEEE, Los Alamitos, CA, 121–130.
P. Varanasi and G. Heiser. 2011. Hardware-supported virtualization on ARM. In Proceedings of the 2nd Asia-Pacific Workshop on Systems. ACM, New York, NY, 11.
N. Ventroux and F. Blanc. 2005. A low complex scheduling algorithm for multi-processor system-on-chip. In Proceedings of the Conference on Parallel and Distributed Computing and Networks. 540–545.
Xilinx, Inc. 2006. Xilkernel. Retrieved August 25, 2015, from http://www.xilinx.com/ise/embedded/edk91i_docs/xilkernel_v3_00_a.pdf.

Received March 2014; revised September 2014; accepted November 2014

