COMPOSITIONAL FRAMEWORK FOR REAL-TIME EMBEDDED SYSTEMS
Insik Shin
A DISSERTATION in Computer and Information Science
Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
2006
Insup Lee, Supervisor of Dissertation
Rajeev Alur, Graduate Group Chair
COPYRIGHT
Insik Shin
2006
This dissertation is dedicated to my parents.
Acknowledgements

First and foremost, I deeply appreciate my advisor, Professor Insup Lee. He has infused me with enthusiasm for research, guided me through different research stages, left me enough freedom to do things the way I thought they should be done, trusted me when I doubted myself, and has always made himself available to me no matter what kind of problems I have. I just cannot imagine a better advisor. Special thanks go to my dissertation committee members, Professor Rajeev Alur, Professor Sanjeev Khanna, Professor C.J. Taylor and Professor Raj Rajkumar. Each devoted significant time and effort to my thesis, and their suggestions and comments led to substantial improvement in the final product. I would also like to thank Professor Sudipto Guha, Professor Sampath Kannan, Professor Saswati Sarkar, and Professor Oleg Sokolsky for helpful discussions and valuable comments on my research topics and questions. I would like to express my special gratitude to Professor Sang Lyul Min at Seoul National University. He always impressed me with his enthusiasm for research and his personality. Collaboration with him was a fruitful, instructive, and wonderful experience. I am also greatly indebted to Dr. Sheayun Lee and Dr. Woonseok Kim for their valuable contributions in collaboration. Thanks also go to past and present members of the Real-Time Systems Research Group at the University of Pennsylvania. In particular, I owe many thanks to Maria Adamou,
Madhukar Anand, Dave Arney, Arvind Easwaran, Aaron Evans, Dr. Sebastian Fischmeister, Yerang Hur, Dr. Jesung Kim, Dr. Moojoo Kim, Wonhong Nam, and Dr. Jangwoo Shin for helpful discussions and for being great friends of mine. I was so fortunate to belong to the Hanwoori tennis club. It significantly helped me to lead a physically and mentally healthy life during my long, lonely graduate years at Penn. A close friendship with Hanwoori members is another valuable harvest from Philadelphia. I cannot imagine being in this position without the unwavering support and love from my parents. They taught me the value of education, ensured that I had every opportunity, encouraged me to make the most of each, and showed their pride in all my accomplishments.
ABSTRACT

COMPOSITIONAL FRAMEWORK FOR REAL-TIME EMBEDDED SYSTEMS

Insik Shin

Supervisor: Insup Lee
An embedded system consists of a collection of components that interact with each other and with their environment through sensors and actuators. Two key characteristics of embedded systems are that they are real-time and resource-constrained. As embedded systems become more complex due to increased functionality, it is desirable to achieve compositional design and analysis of resource-constrained real-time systems, i.e., system-level design and analysis of the timing and resource aspects should be achievable by composing independently obtained component-level design and analysis results. In this dissertation, we propose a framework for this problem. In the proposed framework, we develop techniques for the compositional schedulability analysis of real-time systems through real-time component interfaces. We also develop techniques for supporting the compositional design and analysis of resource-constrained real-time systems by determining the resource use of each task within individual components such that a total cost on collective resource use is minimized subject to the system's timing and resource constraints. In the real-time systems community, compositional schedulability analysis has not been adequately addressed except in trivial cases. In this dissertation, we extend the results of traditional real-time scheduling theories by including a notion of a real-time resource model in schedulability analysis. We propose a periodic resource model that can specify periodic behavior in resource allocations and develop exact schedulability conditions with the worst-case resource supply scenario of the proposed periodic resource model. Based on this result, we derive a periodic component interface that specifies the minimum periodic resource requirements necessary to guarantee the schedulability of individual components.
We then achieve the compositional schedulability analysis of real-time systems through the periodic component interface. Typical scarce resources for real-time embedded systems include energy for battery-operated systems and memory for cost-sensitive systems. Many techniques have been proposed to reduce energy consumption and program code size, respectively. These techniques often produce tradeoffs between reducing resource consumption and increasing program execution time. Given such tradeoffs for resource-constrained real-time systems, we consider a multidimensional optimization problem that is to determine the resource use of individual workloads such that a total cost on the resource uses, in terms of processor utilization, code size, and processor energy consumption, is minimized subject to the system's real-time and resource constraints. After showing the NP-hardness of this problem, we develop a framework for exploring the tradeoff space to find sub-optimal solutions efficiently, and we extend the framework to address the problem compositionally. In this dissertation, we propose a framework for supporting component-based design and analysis of timing and resource aspects. Our proposed framework lays the groundwork for future advances in component-based design and analysis techniques for real-time embedded systems.
Contents

Acknowledgements
Abstract
Contents
List of Tables
List of Figures

1 Introduction
  1.1 Motivation and Problem
    1.1.1 Hierarchical Schedulability Analysis
    1.1.2 Real-time Component Interface
    1.1.3 Compositional Design for Real-time Embedded Systems
  1.2 Approach
    1.2.1 Compositional Schedulability Analysis
    1.2.2 Design optimization for real-time embedded systems
  1.3 Contributions
  1.4 Organization

2 Background
  2.1 Real-Time Systems
    2.1.1 Real-time Scheduling Algorithms
    2.1.2 Schedulability Analysis
    2.1.3 Real-time Workload Models
    2.1.4 Hierarchical Real-Time Scheduling Framework
  2.2 Real-Time Embedded Systems
    2.2.1 Code Size Reduction Technique
    2.2.2 Energy Reduction Techniques

3 System Model
  3.1 System Model
    3.1.1 Assumptions
    3.1.2 Models, Terms and Notations

4 Schedulability Analysis with Resource Models
  4.1 Introduction
  4.2 Problem Description
  4.3 Workload Models
  4.4 Resource Models
    4.4.1 Periodic Resource Model
    4.4.2 Bounded-Delay Resource Model
  4.5 Schedulability Analysis
    4.5.1 Schedulability Analysis under EDF Scheduling
    4.5.2 Schedulability Analysis under RM Scheduling
  4.6 Schedulable Workload Utilization Bounds
    4.6.1 Periodic Resource Model
    4.6.2 Bounded-Delay Resource Model
  4.7 Schedulable Resource Capacity Bounds
    4.7.1 Periodic Resource Model
    4.7.2 Bounded-Delay Resource Model
  4.8 Summary

5 Real-Time Component Interfaces
  5.1 Introduction
  5.2 Related Work
  5.3 System Model and Problem Statement
  5.4 Real-time Component Interfaces
    5.4.1 Periodic Interface Model
    5.4.2 Bounded-delay Interface Model
  5.5 Abstraction Overheads: Analytical Results
    5.5.1 Periodic Interface Model
    5.5.2 Bounded-Delay Interface Model
  5.6 Abstraction Overheads: Simulation Results
    5.6.1 Periodic Interface Model
    5.6.2 Bounded-Delay Interface Model
  5.7 Summary

6 Compositional Scheduling Framework for Distributed Real-time Systems
  6.1 Introduction
  6.2 System Model and Problem Statement
    6.2.1 System Model
    6.2.2 Problem Statement
  6.3 Analysis for Dual Resource Scheduling with Dependency
    6.3.1 Earliest-Deadline-First (EDF) Scheduling
    6.3.2 Rate-Monotonic (RM) Scheduling
  6.4 Compositional Analysis for Network Channel
    6.4.1 Periodic Abstraction for Messages under EDF
    6.4.2 Periodic Abstraction for Node Level Messages under RM
    6.4.3 Schedulability Analysis for Network Channel
    6.4.4 Compositional Analysis for Network Schedulability
  6.5 Related Work
  6.6 Summary

7 Design Framework for Real-Time Embedded Systems
  7.1 Introduction
  7.2 System Model and Problem Definition
  7.3 Design Optimization Problem: SETO
  7.4 Algorithms for SETO Problem
    7.4.1 Phase 1: Satisfying the timing and energy constraints
    7.4.2 Phase 2: Minimizing the system cost function
    7.4.3 The assignment algorithm
    7.4.4 Reverse-direction version
  7.5 Results
    7.5.1 Simulation setup
    7.5.2 Simulation results
  7.6 Discussion
    7.6.1 Discrete-level CPU frequency settings
    7.6.2 Non-convex property of size/cycle tradeoff list
  7.7 Extension I: Component Abstraction
    7.7.1 Component and Interface Models
    7.7.2 Component Abstraction Problem: CAP-USE
    7.7.3 Algorithm to Component Abstraction Problem
  7.8 Extension II: Compositional Approach to the SETO Problem
    7.8.1 Component and Interface Models
    7.8.2 Component Abstraction Problem: CAP-SCF
  7.9 Summary

8 Conclusions and Future Work
  8.1 Outstanding Problems
  8.2 Future Work
  8.3 Summary

Bibliography
List of Tables

3.1 Glossary
7.1 Benchmark Programs Used in the Experiments
7.2 Impact of the Number of Tasks on the Solution Quality
7.3 Impact of the Tightness of the Code Size Constraint on the Solution Quality
7.4 Impact of the Tightness of the Energy Constraint on the Solution Quality
7.5 Impact of the System Utilization on the Solution Quality
7.6 Impact of the Relative Importance of the System Code Size and the System Energy Consumption on the Solution Quality
List of Figures

1.1 Hierarchical scheduling framework: a resource is scheduled by a scheduler and each share of the resource is subsequently scheduled by another scheduler.
1.2 Resource types: (a) a dedicated resource is allocated all the time, and (b) a time-shared resource is allocated at some times but not at other times.
4.1 The worst-case resource supply of periodic resource model Γ⟨Π, Θ⟩ for k = 3.
4.2 Bounded-delay resource model: example.
4.3 Four cases to consider in deriving a utilization bound under EDF scheduling.
4.4 Utilization bound as a function of resource capacity: (a) under EDF scheduling and (b) under RM scheduling.
4.5 Utilization bounds of a bounded-delay resource model Φ⟨α, ∆⟩, where α = 0.5, as a function of k, where k = Pmin/∆, under EDF and RM scheduling.
4.6 Capacity bound as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
4.7 Capacity bound as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
5.1 Hierarchical scheduling framework.
5.2 Schedulable region of a periodic resource Γ⟨Π, Θ⟩: (a) under EDF scheduling and (b) under RM scheduling.
5.3 Example of the solution space of a bounded-delay scheduling interface model Φ⟨α, ∆⟩ for a workload set W = {T1⟨100, 11⟩, T2⟨150, 22⟩} under EDF and RM scheduling.
5.4 Analytical bound of the component abstraction overhead of a periodic interface as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
5.5 Component abstraction overheads of periodic interfaces as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
5.6 Component abstraction overheads of periodic interfaces as a function of interface period: (a) under EDF scheduling and (b) under RM scheduling.
5.7 Component abstraction overheads of periodic interfaces as a function of the number of tasks: (a) under EDF scheduling and (b) under RM scheduling.
5.8 Component abstraction overheads of bounded-delay interfaces as a function of k, where k = Pmin/D: (a) under EDF scheduling and (b) under RM scheduling.
5.9 Component abstraction overheads of bounded-delay interfaces as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
5.10 Component abstraction overheads of bounded-delay component interfaces as a function of the number of tasks: (a) under EDF scheduling and (b) under RM scheduling.
6.1 Overview of our compositional distributed real-time scheduling framework.
6.2 An example of an EDF schedule.
6.3 Worst-case scenario for the response time of Message 2 under RM scheduling.
7.1 Overall structure of the proposed design framework.
7.2 The algorithm for assigning a size/cycle descriptor to each task.
7.3 Composing task-level tradeoffs into a component-level tradeoff.
7.4 The size/cycle tradeoff list SC: (a) EDF and (b) RM scheduling.
Chapter 1 Introduction

Embedded systems are computer systems, including hardware and software, that are specially designed for a particular kind of application or device, in contrast to general-purpose computer systems. Current embedded systems control a range of devices, including cellular phones and PDAs, household appliances, automobiles, industrial machines, medical equipment, and airplanes. One key characteristic of embedded systems is that they are resource-constrained; they often operate with scarce resources, such as processing power, memory, energy, and communication bandwidth. Another key characteristic is that they are real-time; they are often required to react to events or complete tasks within a specific time. A real-time system is a system in which correctness depends not only on logical correctness, but also on timeliness. The schedulability analysis problem, which is to determine whether or not a system's timing constraints can be satisfied, is one of the principal problems for real-time systems. For resource-constrained real-time systems, an interesting
problem is a design problem, called the resource optimization problem, which is to determine the resource utilizations of scarce resources, such as processor, memory, and power, with a goal of minimizing global criteria on multiple resource utilizations subject to various real-time and resource constraints. As real-time embedded systems become more complex due to increased functionalities, it is necessary to develop techniques and methods that facilitate the design of large complex systems from subsystems. Component-based design has been widely accepted as a methodology for designing systems through systematic abstraction and composition. Component-based design provides a means both for decomposing a system into components, allowing the reduction of a single complex design problem into multiple simpler problems, and for composing components into a system through component interfaces that abstract and hide their internal complexity. Component-based design also facilitates the reuse of components that may have been developed in different environments. Therefore, it is desirable to address the design problems of real-time embedded systems through component-based design. In this dissertation, we develop a framework for supporting component-based design techniques for real-time embedded systems. Our approach focuses on developing component interfaces, which are key to supporting component-based design, that accommodate two key aspects of real-time embedded systems: timeliness and resource utilization. Through these component interfaces, we achieve compositional schedulability analysis and present a compositional approach to the resource optimization problem of real-time
embedded systems. Our framework is based on an interdisciplinary approach that contributes to the state of the art in three key areas:

• Real-time Scheduling: We extend the results of traditional real-time scheduling theories by including the notion of a resource model.

• Component-based Design: We develop component-based design techniques that support two para-functional aspects: timeliness and resource utilization.

• Real-time Embedded Systems: We develop a framework that can be used for addressing design optimization problems, which are to minimize the total cost on several different resources subject to various real-time and resource constraints.

In the remainder of this introductory chapter, we motivate our work, introduce our problem, present a high-level overview of our approach to the problem, identify research contributions, and outline the rest of the dissertation.
1.1 Motivation and Problem

1.1.1 Hierarchical Schedulability Analysis

A hierarchical scheduling framework has been introduced for supporting hierarchical resource sharing among applications under different scheduling services. The hierarchical scheduling framework can generally be represented as a tree, or a hierarchy, of nodes, where each node represents an application with its own scheduler for scheduling internal workloads (threads), and resources are allocated from a parent node to its children
Figure 1.1: Hierarchical scheduling framework: a resource is scheduled by a scheduler and each share of the resource is subsequently scheduled by another scheduler.
nodes, as illustrated in Figure 1.1. Goyal et al. [28] proposed a hierarchical scheduling framework for supporting different scheduling algorithms for different application classes in a multimedia system. The hierarchical scheduling framework can be effectively used for supporting multiple applications while allowing their independent execution. This can be achieved correctly when the system provides partitioning, i.e., the applications are functionally separated, usually for fault containment (preventing any partitioned function from causing a failure of another partitioned function) and for ease of verification, validation, and certification. ARINC-653 [3] is a good example of a standard specification that employs the hierarchical scheduling framework for supporting partitioning. ARINC-653 defines a standard APEX (APplication/EXecutive) interface between the operating system and the application software of Integrated Modular Avionics (IMA) systems and other safety-critical systems.
This standard specifies a set of facilities through which the system provides space and time partitioning for embedded avionics software by controlling the scheduling, communication, and status information of its internal processing elements. To achieve time partitioning of applications, ARINC-653 specifies a two-level scheduling framework. The hierarchical scheduling framework is particularly useful in the domain of open systems [16], where applications may be developed and validated independently in different environments. The hierarchical scheduling framework allows applications to be developed with their own internal scheduling algorithms for their own purposes and then to be imported into systems that have different OS scheduling algorithms for scheduling applications. The Java environment [26, 53] is a good example of an environment that employs a hierarchical scheduling framework for supporting open systems. The Java programming language [26] is a platform-independent language: its compiler does not produce native code for a particular platform but rather intermediate code, called bytecode, consisting of instructions for the Java Virtual Machine (JVM) [53]. The JVM is an application, executed and scheduled by an operating system, that interprets and schedules Java bytecode programs (threads). In this way, the platform-independent execution of Java programs is achieved through a two-level scheduling framework. In addition to CPU scheduling, the hierarchical scheduling framework can be used for scheduling other resources. For example, the network is a typical hierarchical resource, i.e., the network is shared by nodes, and a share of the network is subsequently shared by applications within a node. Therefore, the hierarchical scheduling framework can be
effectively used for supporting network scheduling of distributed systems. In this dissertation, we address the compositional schedulability analysis problem, which is to achieve schedulability analysis of hierarchical scheduling frameworks in a compositional way, i.e., to achieve the system-level schedulability analysis through the results of component-level schedulability analysis.
1.1.2 Real-time Component Interface

In component-based real-time systems, it is desirable to develop component interfaces that can specify the resource requirements necessary to guarantee the schedulability of individual components without revealing the internal information of the components, such as their internal workload sets and scheduling algorithms. A parent component (or a larger component) can then provide resource allocations to its child components (or smaller components) such that the schedulability of each child component is guaranteed, as long as the parent component satisfies the resource requirements imposed by the child components' interfaces. In this scheme, a parent component does not have to control (or even understand) how its child components schedule resources for their own tasks. Once a child component develops a component interface that specifies its timing properties, it exports the interface to its parent component. The parent component can then treat the child component as a single workload that has the same timing properties as those specified by the child component's interface. This scheme allows the timing properties of a component to be independently analyzed as well as the system-level timing properties to be
established through component interfaces, i.e., by composing the component-level timing properties. In this dissertation, we consider the problem of developing component interfaces that can specify the timing properties of components.
1.1.3 Compositional Design for Real-time Embedded Systems

Embedded systems are often subject to timing constraints as well as resource constraints. Typical scarce resources for embedded systems include energy and memory. Energy is a limited resource for battery-operated embedded systems such as digital cellular phones and personal digital assistants. Since battery operation time is a primary performance measure for such systems, energy consumption is one of the most important constraints. Memory is another limited resource for embedded systems, particularly those targeting systems-on-a-chip (SOC). Since the cost of a chip is proportional to the fourth (or higher) power of its die size [34], program code size is a key design factor that determines the memory size of a chip and thus affects the die size and the chip cost. Many techniques have been introduced to reduce energy consumption and program code size, respectively. For energy-constrained embedded systems, recent trends in embedded architecture provide support for dynamic voltage scaling (DVS) techniques at the processor level. The CPU clock speed (and its corresponding supply voltage) can be dynamically adjusted on several commercial variable-voltage processors such as Intel's Xscale, AMD's K6-2+ and Transmeta's Crusoe processors. This DVS technique produces a voltage vs. CPU clock speed
tradeoff. Various DVS scheduling algorithms [83, 36, 78, 8, 68, 69] have been proposed to employ this tradeoff for scheduling hard real-time systems. A typical scheduling goal is to reduce the total energy consumption of the system while satisfying the system's real-time requirements. For memory-constrained embedded systems, a popular technique is to reduce program code size by employing a “dual instruction set” with a processor capable of executing two different Instruction-Sets (IS) [31, 49]. Compilers can reduce code size by encoding a subset of the normal 32-bit instructions into a 16-bit format, as in ARM Thumb [27] and MIPS16 [80]. These 16-bit instructions can be dynamically decompressed by hardware into their 32-bit equivalents before execution. This approach can substantially reduce the code size; however, it increases the number of instructions to be executed and thus increases the execution time of the program. For typical programs, the compressed code may require nearly 70% of the space of the original code while executing 40% more instructions [23]. Our previous study [77] employed this code size vs. time tradeoff to address the problem of minimizing the total code size of the system subject to the system's real-time requirements. Given such tradeoffs, we naturally consider the problem of determining the resource utilizations of individual tasks while minimizing a total cost on resource uses subject to the system's real-time and resource constraints. This leads to the need for a design methodology that can efficiently explore the tradeoff space to find optimal solutions.
Figure 1.2: Resource types: (a) a dedicated resource is allocated all the time, and (b) a time-shared resource is allocated at some times but not at other times.
1.2 Approach

In this dissertation, we develop component interfaces for addressing two main problems. One is to achieve compositional schedulability analysis. The other is to address a design optimization problem of resource-constrained real-time embedded systems in a compositional way. We now outline these two main problems and their sub-problems.
1.2.1 Compositional Schedulability Analysis

Schedulability analysis with resource models. In the real-time systems community, the schedulability analysis problem has been extensively studied [56, 50, 9, 6]. These previous studies have developed real-time scheduling theories for various real-time workload models and scheduling algorithms. However, they do not adequately accommodate the characteristics of the resource, which we find important for developing techniques for hierarchical schedulability analysis.
Figure 1.2 illustrates different types of resources. A resource is said to be dedicated to a workload set if it is exclusively assigned to that workload set, or shared if it is shared with other workload sets. For example, when a workload set is allowed to exclusively utilize the processor, the processor is a dedicated resource for that workload set. In a distributed system where nodes share the network, the network is a shared resource for a workload set within a node. A shared resource is said to be time-shared for a workload set if it is available to the workload set at some times, but not available at other times. One of the assumptions shared by most previous studies on schedulability analysis is that a real-time system has dedicated resources, i.e., the resources are exclusively used by the system and are not shared with any other system. However, a real-time system can be developed with a shared resource. For example, a real-time system may have to share the network with other systems, even without knowing which other systems share the network. Unlike workload models and scheduling algorithms, which have been extensively studied, resource models have received little study. One known resource model for specifying time-shared resources is the bounded-delay resource model [60]. In this dissertation, we propose another resource model, called the periodic resource model, for specifying the periodic behavior of a time-shared resource more accurately. Extending the results of traditional real-time scheduling theories, which have focused on the characteristics of various workload models and scheduling algorithms, we develop a framework that allows the schedulability analysis of a real-time system to be easily
achieved even though the system has time-shared resources.
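To make the notion of a time-shared resource concrete, the sketch below simulates one hypothetical allocation pattern, Θ units of allocation at the start of every Π-length period, and reports the least amount of resource that any window of a given length receives under that particular pattern. It is only an illustration; the names Pi, Theta, and window are assumptions of this sketch, and the dissertation's formal worst-case supply analysis (which considers all possible allocation placements) is developed in Chapter 4.

```python
def periodic_allocation(Pi, Theta, horizon):
    """One hypothetical allocation pattern for a time-shared resource:
    available for the first Theta time units of every Pi-length period,
    unavailable for the rest."""
    return [1 if (t % Pi) < Theta else 0 for t in range(horizon)]

def least_supply(pattern, window):
    """Smallest amount of resource any window of the given length receives
    under this particular pattern (not the model's true worst case)."""
    return min(sum(pattern[s:s + window])
               for s in range(len(pattern) - window + 1))

if __name__ == "__main__":
    # A resource that supplies 2 time units out of every 5 (40% capacity).
    pattern = periodic_allocation(Pi=5, Theta=2, horizon=200)
    for w in (3, 5, 8, 10):
        print(f"window length {w:2d}: least supply = {least_supply(pattern, w)}")
```

The point of the exercise is that, unlike a dedicated resource, a time-shared resource can leave some windows with little or no supply even when its long-run capacity is fixed.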
Real-time component interface. There has been a lack of studies on component interfaces that can specify the timing properties of components, except in trivial cases. We define the component timing abstraction problem as the problem of abstracting the collective real-time requirements of a component, which its internal workload set demands under a specific scheduling algorithm, as a component interface. It is desirable that the component interface represent the “minimum” resource requirements for guaranteeing the schedulability of the component, where the minimality can be determined with respect to various criteria. For this problem, in this dissertation, we consider two component interface models: the periodic and bounded-delay interface models, which are based on the periodic and bounded-delay resource models, respectively. We then develop a framework for addressing the component timing abstraction problem, extending our proposed framework for compositional schedulability analysis involving the periodic and bounded-delay resource models.
Compositional schedulability analysis for distributed real-time systems. A distributed real-time system consists of a set of computation nodes connected by a shared communication medium. In general, such systems have multiple hierarchical resources with dependencies. For example, the processor and the network can have a sequential dependency, i.e., applications can require the processor to complete computation and then the network to transmit the computation results prior to their deadlines. The network is a good example of a hierarchical resource. We consider the compositional schedulability analysis problem for distributed real-time systems, where resources can be shared hierarchically under dependencies. Our approach is to develop a real-time component interface for each node such that the interface abstracts the sequential dependency between the processor and the network as well as the collective network demand within the node. We then accomplish the schedulability analysis of distributed real-time systems through the real-time component interfaces.
1.2.2 Design optimization for real-time embedded systems

With tradeoff relationships such as the voltage vs. CPU clock speed tradeoff and the code size vs. program execution time tradeoff, we formulate a multidimensional optimization problem and show the NP-hardness of this problem. We then provide an algorithm that finds solutions close to optimal ones. We also develop an approach to compositionally address the design optimization problem. Our approach is (1) to decompose the system-level design optimization problem into multiple component-level design problems and (2) to address the system-level problem through solutions to the component-level problems. We address a component-level problem by deriving a component interface that specifies the timing and resource properties of a component. This component interface includes a component-level code size vs. program execution time tradeoff, which is obtained by combining all task-level code size vs. program execution time tradeoffs within the component. This component interface allows us to view a component as a single workload such that the component has its own code size vs. program execution time tradeoff, just as a single workload does. This enables us to address the design optimization problem for a collection of components just as we address the problem for a set of workloads.
1.3 Contributions

Our contributions advance the state of the art in component-based real-time embedded systems, especially in the context of real-time scheduling, as follows:

• Periodic resource model. We propose the periodic resource model that can specify the periodic behavior of resource allocations. We present a method that computes the worst-case resource allocations of the periodic resource model.

• Real-time component interface. We develop methods to compute the minimum real-time resource requirements necessary for satisfying timing constraints. These methods allow us to develop “optimal” real-time component interfaces, which abstract the collective real-time resource requirements of a component with the “minimum” resource requirements. The real-time component interface is key to achieving compositional schedulability analysis.

• Compositional schedulability analysis. With real-time component interfaces, we develop a framework for achieving compositional schedulability analysis. We show that this framework can be used for analyzing the schedulability of distributed real-time systems in a compositional way.
• Real-time embedded system design. We identify the tradeoff relationship involving program code size, program execution time, and processor energy consumption. We propose a design optimization framework that flexibly balances this tradeoff.

• Compositional approach to real-time embedded system design. We present an approach to explore the aforementioned tradeoff space for finding optimal solutions in a compositional way. We provide techniques for combining multiple task-level tradeoffs into a single component-level tradeoff for this compositional approach.
1.4 Organization

The rest of this dissertation is organized as follows. In Chapter 2, we survey related work in the fields of real-time scheduling and resource reduction techniques for real-time embedded systems. Chapter 3 presents the system model and assumptions that we consider in this dissertation. Chapter 4 describes our proposed resource model and extends the results of traditional real-time scheduling theories by providing schedulability conditions and utilization bounds involving resource models. Chapter 5 develops real-time component interfaces, which specify the minimum real-time resource requirements necessary to satisfy the timing constraints of a component. We present both analytical results and simulation results for the overhead that the real-time component interfaces incur in terms of increased resource utilization.
Chapter 6 presents compositional schedulability analysis for distributed real-time systems, where resources can be hierarchically shared with dependencies. We describe how to abstract the node-level network demands and how to achieve the system-level schedulability analysis through the node-level abstractions. Chapter 7 presents a design framework for real-time embedded systems. We describe a design optimization problem, which captures two key aspects of real-time embedded systems: timeliness and scarce resources. We present a heuristic solution to the problem. We also provide an approach to address the problem in a compositional way. Finally, in Chapter 8, we identify a number of remaining challenges with our approach, present plans for future work, and conclude.
Chapter 2 Background
2.1 Real-Time Systems

A real-time system is a system in which correctness depends not only on logical correctness, but also on timeliness. Depending on their characteristics, timing constraints are generally divided into two types: hard and soft. A timing constraint is said to be hard if missing it is considered a fatal system fault, or soft if missing it contributes to the degradation of system performance. In hard real-time systems, one of the primary concerns is whether or not their timing constraints can be satisfied, particularly when their components (workloads) share resources. Scheduling is the assignment of resources to service workloads according to a scheduling algorithm. A real-time system is said to be schedulable according to a scheduling algorithm if the timing constraints of its workloads can always be met with the given available resources under that scheduling algorithm. Schedulability analysis, which determines whether or not a system is schedulable, has been the most
fundamental research problem in the real-time systems community. There has been extensive research on real-time scheduling over the past thirty years. This chapter presents background on scheduling algorithms, schedulability analysis, real-time workload models, and hierarchical scheduling frameworks.
2.1.1 Real-time Scheduling Algorithms

Real-time scheduling algorithms can be classified into two types: clock-driven and event-driven. In clock-driven algorithms, a schedule, i.e., an assignment of all the workloads in the system to the available resources, is generated off-line; decisions on which workloads to execute at what times are made a priori, before the system begins execution. The cyclic executive approach, one of the most widely used hard real-time scheduling approaches, is a good example of a clock-driven approach. It generates a static periodic schedule. In event-driven approaches, scheduling decisions are made when events such as releases and completions of workloads occur. At any scheduling decision time, event-driven algorithms, which are also known as priority-driven algorithms, schedule the tasks with the highest priorities for execution. Priority-driven algorithms differ from each other in how priorities are assigned to tasks. Priority-driven algorithms are generally classified into two types: fixed-priority and dynamic-priority. A fixed-priority algorithm assigns the same priority to all the jobs in each task (in other words, the priority of each task is fixed relative to other tasks) and does
not change the priority of a job at run-time. In contrast, a dynamic-priority algorithm assigns different priorities to the individual jobs in each task and permits the priority of a job to change at run-time. Here, we consider tasks that are independent and preemptable. A well-known fixed-priority scheduling algorithm is the rate-monotonic (RM) algorithm [56]. This algorithm assigns priorities to tasks based on their periods: the shorter the period, the higher the priority. The rate (of job releases) of a task is the inverse of its period. Liu and Layland [56] showed that it is an optimal fixed-priority algorithm for a periodic task set with per-period deadlines (relative deadlines equal to periods). Another well-known fixed-priority algorithm is the deadline-monotonic (DM) algorithm [6]. This algorithm assigns priorities to tasks according to their relative deadlines: the shorter the relative deadline, the higher the priority. Audsley et al. [6] showed that it is an optimal fixed-priority algorithm for a periodic task set with pre-period deadlines (relative deadlines smaller than or equal to periods). A well-known dynamic-priority algorithm is the earliest-deadline-first (EDF) algorithm. This algorithm assigns priorities to the individual jobs in the tasks according to their absolute deadlines. Another well-known dynamic-priority algorithm is the least-slack-first (LST) algorithm (also called the minimum-laxity-first (MLF) algorithm) [51, 58]. At any time t, the slack (or laxity) of a job with deadline d is equal to d − t minus the time required to complete the remaining portion of the job. This algorithm assigns priorities to jobs based on their slacks: the smaller the slack, the higher the priority. It has been known that the EDF and LST algorithms are both optimal for scheduling independent, preemptive jobs with
arbitrary release times and deadlines on a single processor. A criterion we will use to measure the performance of algorithms for scheduling periodic tasks is the schedulable utilization. The schedulable utilization of a scheduling algorithm is defined as follows: a scheduling algorithm can feasibly schedule any set of periodic tasks on a processor if the total utilization of the tasks is equal to or less than the schedulable utilization of the algorithm. Clearly, the higher the schedulable utilization of an algorithm, the better the algorithm. While, by the definition of schedulable utilization, optimal dynamic-priority scheduling algorithms outperform fixed-priority algorithms, an advantage of fixed-priority algorithms is predictability. The timing behavior of a system scheduled according to a fixed-priority algorithm is more predictable. When tasks have fixed priorities, overruns of jobs in a task can never affect higher-priority tasks, and it is possible to predict which tasks will miss their deadlines during an overload.
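As a concrete illustration of how these priority-assignment rules differ, the sketch below orders a hypothetical task set under RM (by period) and DM (by relative deadline), and orders individual jobs under EDF (by absolute deadline). The task parameters and field names are made up for illustration; they do not come from this dissertation.

```python
from collections import namedtuple

Task = namedtuple("Task", "name period deadline wcet")
Job = namedtuple("Job", "name release abs_deadline")

# Hypothetical task set: (name, period, relative deadline, worst-case execution time).
tasks = [Task("T1", 100, 90, 10), Task("T2", 50, 50, 15), Task("T3", 75, 60, 20)]

# Rate-monotonic: the shorter the period, the higher the priority (fixed per task).
rm_order = sorted(tasks, key=lambda t: t.period)

# Deadline-monotonic: the shorter the relative deadline, the higher the priority.
dm_order = sorted(tasks, key=lambda t: t.deadline)

# EDF: priorities belong to jobs, ordered by absolute deadline, and may change at run time.
jobs = [Job(t.name, r, r + t.deadline) for t in tasks for r in range(0, 200, t.period)]
edf_order = sorted(jobs, key=lambda j: j.abs_deadline)

print("RM :", [t.name for t in rm_order])
print("DM :", [t.name for t in dm_order])
print("EDF:", [(j.name, j.abs_deadline) for j in edf_order[:5]])
```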
2.1.2 Schedulability Analysis

EDF Scheduling. Consider the standard periodic workload model T⟨p, c⟩ with per-period deadline, where p is a period and c is the worst-case execution time requirement. Liu and Layland [56] showed that a periodic workload set W = {T1, . . . , Tn} is schedulable under EDF if and only if

$$U = \sum_{i=1}^{n} \frac{c_i}{p_i} \le 1. \qquad (2.1)$$

Its (schedulable) utilization bound is simply 100%.
Consider another periodic workload model T′⟨p, c, D⟩ with pre-period deadline, where D is a relative deadline (D ≤ p). In general, the processor demand in an interval [t1, t2], denoted by demand(t1, t2), is the amount of processing time requested by those jobs that are released in [t1, t2] and must be completed in [t1, t2]. Therefore, the feasibility of a task set is guaranteed if and only if in any time interval the total processor demand does not exceed the available time, that is, if and only if

$$\forall t_1, t_2: \quad \mathrm{demand}(t_1, t_2) \le t_2 - t_1. \qquad (2.2)$$
Baruah et al. [9] showed that a periodic workload set W = {T′1, . . . , T′n} is schedulable under EDF scheduling if and only if U < 1 and

$$\forall L > 0: \quad \sum_{i=1}^{n} \left\lfloor \frac{L + p_i - D_i}{p_i} \right\rfloor \cdot c_i \le L. \qquad (2.3)$$
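The two EDF conditions above translate directly into code. The sketch below checks the utilization test (2.1) for per-period deadlines and a processor-demand check in the spirit of (2.3) for pre-period deadlines; for simplicity it examines L only at absolute-deadline points up to a caller-supplied bound, whereas an exact implementation would derive that bound from the busy period or hyperperiod. The task tuples are illustrative.

```python
from math import floor

def edf_utilization_test(tasks):
    """Eq. (2.1): tasks are (period, wcet) pairs with deadline = period.
    Schedulable under EDF iff total utilization U <= 1."""
    return sum(c / p for (p, c) in tasks) <= 1.0

def edf_demand_test(tasks, check_limit):
    """Eq. (2.3)-style check: tasks are (period, wcet, deadline) with deadline <= period.
    L is checked only at absolute-deadline points up to check_limit (a simplification)."""
    if sum(c / p for (p, c, d) in tasks) >= 1.0:
        return False
    deadlines = sorted({d + k * p
                        for (p, c, d) in tasks
                        for k in range(int((check_limit - d) // p) + 1)
                        if d + k * p <= check_limit})
    for L in deadlines:
        demand = sum(floor((L + p - d) / p) * c for (p, c, d) in tasks)
        if demand > L:
            return False
    return True

print(edf_utilization_test([(100, 25), (150, 50), (350, 80)]))            # True
print(edf_demand_test([(100, 25, 80), (150, 50, 120)], check_limit=600))  # True
```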
RM Scheduling. Consider the standard periodic workload model T⟨p, c⟩ with per-period deadline, where p is a period and c is the worst-case execution time requirement. Liu and Layland [56] showed that a periodic workload set W = {T1, . . . , Tn} is schedulable under RM if

$$U = \sum_{i=1}^{n} \frac{c_i}{p_i} \le n(2^{1/n} - 1). \qquad (2.4)$$

As n approaches infinity, its (schedulable) utilization bound, which is n(2^{1/n} − 1), approaches ln 2 (approximately 69.31%). Lehoczky et al. [50] showed that the average schedulable utilization, for large randomly chosen task sets, is approximately 88%. If task periods are harmonic (that is, each task period is an exact integer multiple of the next shorter period), then the utilization bound is 100%. Bini and Buttazzo [12] derived another utilization-based, sufficient schedulability condition, which they showed is tighter than Liu and Layland's condition: a periodic workload set W = {T1, . . . , Tn} is schedulable under RM if

$$\prod_{i=1}^{n} \left( \frac{c_i}{p_i} + 1 \right) \le 2. \qquad (2.5)$$
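The two sufficient RM tests just stated, the Liu and Layland bound (2.4) and the hyperbolic bound (2.5), can be checked in a few lines; the task parameters below are illustrative.

```python
def rm_liu_layland_test(tasks):
    """Eq. (2.4): sufficient test -- U <= n(2^(1/n) - 1). Tasks are (period, wcet) pairs."""
    n = len(tasks)
    utilization = sum(c / p for (p, c) in tasks)
    return utilization <= n * (2 ** (1.0 / n) - 1)

def rm_hyperbolic_test(tasks):
    """Eq. (2.5): sufficient test of Bini and Buttazzo -- prod(U_i + 1) <= 2."""
    product = 1.0
    for (p, c) in tasks:
        product *= (c / p + 1.0)
    return product <= 2.0

tasks = [(100, 20), (150, 30), (350, 70)]   # hypothetical (period, wcet) pairs
print(rm_liu_layland_test(tasks), rm_hyperbolic_test(tasks))
```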
Lehoczky et al. [50] showed that a periodic workload set W = {T1, . . . , Tn} is schedulable under RM if and only if

$$\forall\, 1 \le i \le n \ \ \exists\, 0 < t \le p_i: \quad c_i + \sum_{T_m \in HP(i)} \left\lceil \frac{t}{p_m} \right\rceil \cdot c_m \le t, \quad \text{where } T_m = \langle p_m, c_m \rangle, \qquad (2.6)$$
where HP(i) is the set of tasks with priorities higher than that of task Ti. Audsley et al. [5] presented the following recursive method for computing the worst-case response time Ri of task Ti under RM:

$$R_i^{k+1} = c_i + \sum_{T_m \in HP(i)} \left\lceil \frac{R_i^k}{p_m} \right\rceil \cdot c_m, \qquad (2.7)$$

where R_i^0 = c_i and the recursion continues until R_i^{k+1} = R_i^k.
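The recurrence (2.7) can be coded directly as a fixed-point iteration. The sketch below computes the worst-case response time of each task in a hypothetical implicit-deadline task set under RM and stops early once the response time exceeds the period, which in that setting signals a deadline miss.

```python
from math import ceil

def rm_response_time(tasks, i):
    """Eq. (2.7): worst-case response time of task i under RM, where tasks are
    (period, wcet) pairs sorted by priority (index 0 = highest). Returns None
    if the iteration exceeds the task's period (deadline miss for D = p)."""
    p_i, c_i = tasks[i]
    higher = tasks[:i]
    r = c_i
    while True:
        r_next = c_i + sum(ceil(r / p_m) * c_m for (p_m, c_m) in higher)
        if r_next == r:
            return r
        if r_next > p_i:
            return None
        r = r_next

tasks = sorted([(100, 20), (150, 30), (350, 70)])   # RM priority order: sort by period
for i in range(len(tasks)):
    print(f"task {i}: worst-case response time = {rm_response_time(tasks, i)}")
```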
2.1.3 Real-time Workload Models

Thus far, our focus has been on scheduling based on Liu and Layland's classical periodic task model. Here, we consider other workload models. Specifically, we describe flexible applications. A flexible application contains tasks that can trade off at run-time the quality of their results (services) for their time and resource demands. That is, the flexible application can reduce its time and resource demands at the expense of the quality of its result. The flexible workload model approach has been proposed as a means for handling overload and increasing availability. We describe several flexible workload models. Hamdaoui and Ramanathan [32] introduced the (m, k)-firm deadline model for representing less stringent guarantees for temporal constraints. In this model, a periodic task is said to have an (m, k)-firm deadline requirement if it is adequate to meet the deadlines of m out of any k consecutive instances of the task, where m and k are two positive integers with m ≤ k. Ramanathan [66] proposed an algorithm for scheduling periodic tasks with (m, k)-firm deadlines. Liu and colleagues [52] introduced the imprecise computation model to specify the acceptable quality of service. In this model, a task is logically decomposed into two subtasks: a mandatory and an optional subtask. The mandatory subtask should be completed before its deadline to produce a result of acceptable quality, and the optional subtask can be completed before its deadline to enhance the quality of the result. The execution of an optional subtask is associated with an error, which basically represents the amount of optional work left uncompleted. As a task's optional part executes, the error decreases, according to an error
function. A typical scheduling objective is to ensure that the mandatory parts of all tasks are completed by their deadlines while the total error in the system is minimized. In [74], optimal preemptive algorithms for the imprecise computation model were first proposed using a network-flow formulation. Faster algorithms were devised later in [73] to optimally schedule n independent tasks with identical linear error functions (in time O(n log n)) and linear error functions with different weights (in time O(n²)). Dey et al. [18] introduced the increased-reward-with-increased-service (IRIS) model that allows a task to get increasing reward with increasing service, without an upper bound on the execution times of the tasks and without the separation between mandatory and optional parts. A task executes for as long as the scheduler allows. Typically, a nondecreasing concave reward function is associated with each task's execution time. In [18, 17], the problem of maximizing the total reward in a system of independent aperiodic tasks is explored. An optimal polynomial-time solution for static task sets (identical ready times) is presented, as well as two extensions that include mandatory parts and on-line policies for dynamic task arrivals. Rajkumar et al. [64, 65] introduced the QoS resource allocation model (Q-RAM) that uses continuous benefit functions to specify application benefit as a function of resource allocations. In this model, several applications are simultaneously managed, while each application has benefit functions on several different resources (such as CPU, memory, and network bandwidth). In these studies, the problem is to optimally allocate multiple resources to the various applications such that they simultaneously meet their minimum
requirements along multiple QoS dimensions and the total system utility is maximized. They have developed an algorithm for the case where the benefit functions are continuous and convex [64] and an approximation for the case where the benefit functions have discontinuities and are almost convex [65]. Lee et al. extended this work to address applications with discrete benefit functions [46].
2.1.4 Hierarchical Real-Time Scheduling Framework

A hierarchical scheduling framework has been proposed for supporting hierarchical resource sharing among applications under different scheduling services. The hierarchical scheduling framework can generally be represented as a tree, or a hierarchy, of nodes, in which each node represents an application with its own scheduling algorithm for scheduling internal workloads (threads); resources are allocated from a parent node to its children nodes. The concept of this hierarchical scheduling framework is particularly useful in the domain of open systems [16], where applications may be developed and validated independently, even in different environments. The hierarchical scheduling framework allows applications to be independently developed with their own internal scheduling algorithms and to be transported to systems that may have different OS scheduling algorithms for scheduling applications. For real-time systems, there has been growing attention to hierarchical scheduling frameworks that support hierarchical resource sharing under different scheduling algorithms [16, 43, 54, 22, 67, 70, 55, 75, 76, 4].
Deng and Liu [16] introduced open systems where applications are developed and validated independently. To support such open systems, they proposed a two-level hierarchical scheduling framework, where the system scheduler schedules independently developed components (applications) and each component can have its own component scheduler for scheduling its internal tasks. For such a two-level hierarchical scheduling framework, Kuo and Li [43] and Lipari and Baruah [54] presented exact schedulability conditions for the RM and EDF system schedulers. Kuo and Li [43] considered the RM system scheduler assuming that all periodic tasks across components are harmonic, and Lipari and Baruah [54] considered the EDF system scheduler assuming that the EDF system scheduler has knowledge of the task-level deadlines of each component. The common assumption shared by these initial approaches is that the parent node's scheduler has a (schedulable) utilization bound of 100%. Feng and Mok [22] proposed the bounded-delay resource partition model for a hierarchical scheduling framework. Their model can specify the real-time guarantees that a parent component provides to its child components, where the schedulers for the parent and child components can be different. This allows the schedulability of a child node to be analyzed independently in a sufficient manner, removing the assumption that the parent component's scheduler has a utilization bound of 100%. They showed that a parent component and its child components can be clearly separated with their model in a hierarchical framework, and presented a sufficient schedulability condition for their framework. There have been studies that considered the scheduling-component abstraction problem
for a component that has a fixed-priority component scheduler and receives periodic resource allocations from its parent component. Saewong et al. [70] introduced a worst-case response time analysis and a utilization bound. Lipari and Bini [55] presented an exact schedulability condition based on time-demand calculations (they presented it as a sufficient condition; we consider it an exact condition based on a different notion of schedulability) and addressed the component timing abstraction problem. Almeida and Pedreiras [4] considered the issue of efficiently solving the component timing abstraction problem for the same type of component. Regehr and Stankovic [67] introduced another hierarchical scheduling framework that considers various kinds of real-time guarantees. Their work focused on converting one kind of guarantee into another such that whenever the former is satisfied, the latter is also satisfied. With their conversion rules, the schedulability of a child component is analyzed in a sufficient manner: it is schedulable if its parent component provides real-time guarantees that can be converted to the real-time guarantee that the child component demands. However, they did not consider the scheduling-component timing abstraction and composition problems.

2.2 Real-Time Embedded Systems

Embedded systems often have resource usage constraints with limited resources. Energy and memory are typical limited resources for embedded systems. For example, energy is a limited resource for battery-operated embedded systems such as digital cellular phones and personal digital assistants. Since the battery operation time is a primary
2.2.1 Code Size Reduction Technique
For many embedded systems, program code size is a critical design factor. We present a brief overview of a compiler technique [31, 49] for code size reduction that works for a processor capable of executing dual bit-width instructions. A good example of such a processor is the ARM family of microprocessors, with a 32-bit instruction set (IS) for the normal mode and a 16-bit reduced bit-width IS for the Thumb mode [27]. A reduction in code size comes
from encoding a subset of the 32-bit normal-mode IS into the 16-bit Thumb-mode IS. At execution time, a decompression engine converts a Thumb-mode instruction into an equivalent normal-mode instruction during the decode stage. The Thumb IS can access only 8 general-purpose registers (out of 16 general-purpose registers in the normal mode) and can encode only a small immediate value. These limitations increase the number of execution cycles and thus increase the program execution time. For typical programs, this technique can reduce the code size by around 30%, while the number of execution cycles increases by about 40% [23]. The dual bit-width ISA allows a program to contain both 32-bit normal-mode instructions and 16-bit reduced bit-width instructions, where the mode change between the two can be performed by executing a single mode-change instruction. This capability allows for a tradeoff between code size and the number of execution cycles when compiling a program. For example, by progressively transforming program units such as functions or basic blocks in the normal mode into equivalent ones in the reduced bit-width mode while adding patch-up code to maintain the correct semantics, we can obtain a table of possible (code size, number of execution cycles) pairs [49]. The order in which the transformation is performed considers both the reduction in code size and the increase in the number of execution cycles, i.e., it favors program units that give a large reduction in code size with only a small increase in the number of execution cycles. Earlier work by Shin et al. [77] proposed a design framework for a design problem that takes advantage of this code size vs. time tradeoff.
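The following Python sketch illustrates this greedy construction of a (code size, cycles) table. It is an illustration of the idea described above, not the actual algorithm of [49]; the per-unit sizes and cycle counts are made up.

```python
# Illustrative sketch of the greedy selective code transformation described above.
# Each program unit has a 32-bit (normal-mode) and a 16-bit (Thumb-mode) version;
# units are converted in decreasing order of code-size saving per extra execution
# cycle, and each step yields one (total code size, total execution cycles) pair.

def size_cycle_table(units):
    # units: list of (size32, cycles32, size16, cycles16) per program unit
    size = sum(u[0] for u in units)
    cycles = sum(u[1] for u in units)
    table = [(size, cycles)]
    # favor units with the largest size saving per additional execution cycle
    order = sorted(units,
                   key=lambda u: (u[0] - u[2]) / max(u[3] - u[1], 1),
                   reverse=True)
    for s32, c32, s16, c16 in order:
        size -= s32 - s16
        cycles += c16 - c32
        table.append((size, cycles))
    return table

if __name__ == "__main__":
    units = [(100, 50, 70, 65), (80, 40, 60, 44), (120, 90, 70, 130)]
    for size, cycles in size_cycle_table(units):
        print(size, cycles)
```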
2.2.2 Energy Reduction Techniques
Energy consumption is one of the most important design factors for battery-operated embedded systems. The dynamic power dissipation PD, which is the dominant component of energy consumption in the widely popular CMOS technology, is given by PD ∝ CL · VDD² · Fclock, where CL is the load capacitance, VDD is the supply voltage, and Fclock is the clock frequency. Because the dynamic power dissipation PD is quadratically dependent on the supply voltage VDD, lowering VDD is an effective technique for reducing energy consumption. However, lowering the supply voltage also decreases the clock speed, because the circuit delay TD of CMOS circuits is given by TD ∝ VDD/(VDD − VT)^α, where VT is the threshold voltage and α is a technology-dependent constant. Hence, lowering VDD yields a tradeoff between reducing the energy consumption and increasing the circuit delay. The dynamic voltage scaling (DVS) technique, which involves dynamically adjusting the supply voltage and the CPU clock speed, has been widely accepted as a key technique to reduce the energy consumption of embedded systems. This technique has been increasingly employed on commercial variable-voltage processors such as Intel's XScale, AMD's K6-2+ and Transmeta's Crusoe processors. Targeting these processors, various DVS algorithms [83, 36, 78, 63, 29, 8, 39, 68, 69, 42, 57, 45] have been proposed to exploit this tradeoff for hard real-time systems. Their main goal is to reduce the total system energy consumption while satisfying the system's real-time requirements. A seminal work in the area of DVS algorithms for hard real-time systems has been reported by Yao et al. [83]. The authors proposed an optimal voltage setting algorithm for
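To make the tradeoff concrete, the short Python sketch below estimates how dynamic power and circuit delay change as the supply voltage is lowered, using the two proportionalities above. The threshold voltage, the exponent α, and the assumption that the clock frequency scales with the voltage are illustrative choices of mine, not values from this dissertation.

```python
# Illustrative sketch of the CMOS voltage/speed/energy tradeoff described above.
# V_T and alpha are assumed values; real processors expose only a few discrete
# frequency/voltage operating points.

def relative_power(v, f, v_ref=1.0, f_ref=1.0):
    """Dynamic power relative to (v_ref, f_ref), from P_D ~ C_L * V_DD^2 * F_clock."""
    return (v / v_ref) ** 2 * (f / f_ref)

def relative_delay(v, v_ref=1.0, v_t=0.3, alpha=1.5):
    """Circuit delay relative to v_ref, from T_D ~ V_DD / (V_DD - V_T)^alpha."""
    def d(x):
        return x / (x - v_t) ** alpha
    return d(v) / d(v_ref)

if __name__ == "__main__":
    for v in (1.0, 0.8, 0.6):
        f = v  # assume the clock frequency is scaled down with the voltage
        print(f"V_DD={v:.1f}: power x{relative_power(v, f):.2f}, "
              f"delay x{relative_delay(v):.2f}")
```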
aperiodic tasks under the EDF scheduling algorithm. The algorithm estimates the workload density for each execution interval based on tasks' arrival times, deadlines, and execution cycles, according to which a frequency/voltage pair is assigned to each task instance. Quan and Hu [63] observed that this algorithm is not directly applicable to other scheduling algorithms because the schedulability of a task set largely depends on the underlying scheduling policy. Based on this observation, they proposed an extension of Yao et al.'s algorithm that can be applied to aperiodic task sets scheduled by a fixed-priority scheduling scheme and that is optimal in terms of energy consumption. For a periodic task set, Shin et al. [78] proposed a power optimization method for computing the lowest possible processor speed that guarantees the schedulability of the task set under EDF and RM scheduling. Pillai and Shin [62] described heuristics for computing the processor speed dynamically, dealing with situations where a task uses less than its WCEC. They also presented an implementation of their heuristics for EDF scheduling, which is known as the first implementation that supports DVS for real-time systems. Gruian [29] developed a stochastic DVS approach based on the execution time distributions of tasks. This algorithm assigns several different voltage levels to each task based on the given distribution function, so that a task starts execution at a low speed and later accelerates. He also proved that his stochastic DVS algorithm is probabilistically optimal. While most studies have focused on the problem of minimizing the processor energy consumption subject to the schedulability of a task set, Rusu et al. [68] proposed a framework for maximizing the system value (reward) subject to timing and energy constraints.
Chapter 3 System Model
3.1 System Model
In this section, we describe the assumptions, terms and notations that hold throughout this dissertation, unless noted otherwise. A component (or an application) consists of a workload set and a scheduling algorithm for scheduling the workload set. A workload may require the use of multiple resources in order to provide some system function. For example, a workload can require the processor to complete a computation (task) and the network to transmit the computation results (message).
3.1.1 Assumptions
• All tasks are periodic. In Chapters 4 and 5, we consider an additional task model, which is the bounded-delay task model.
• All periodic tasks are synchronous, i.e., they are released at the beginning of each period.
• All periodic tasks have a deadline equal to their period.
• All tasks are independent, that is, they have no resource or precedence relationships. We note that in Chapter 6, resources have precedence relationships, i.e., workloads should use the processor and then the network.
• All tasks have a fixed computation time, or at least a fixed upper bound on their computation times, which is less than or equal to their period. In Chapter 7, all tasks have a set of candidates for the fixed computation time.
• No task may voluntarily suspend itself.
• All tasks are fully preemptable.
• All overheads are assumed to be 0.
• There is just one processor.
• In Chapter 6 only, all tasks in a single component have harmonic periods. Tasks across different components need not have harmonic periods.
3.1.2 Models, Terms and Notations We describe our workload models, resource models, and scheduling algorithms that are used in this dissertation.
Workload Model. In Chapters 4 and 5, a workload is simply a task that requires the processor for completing its computation. We define a workload model as the standard periodic task model T(p, c), where
• Period p: the fixed time interval between the arrival times of two consecutive requests of T. We assume each task has a relative deadline equal to its period.
• WCET (Worst-Case Execution Time) c: the amount of time needed to complete the execution of T. We assume that 0 < c ≤ p.
The resource utilization UT of task T represents the percentage of time a resource is requested by T and is defined as c/p. In Chapter 6, a workload consists of a task, which requires the processor for completing its computation, and a message, which requires the network for completing its transmission. We consider that when the task of a workload finishes its computation, the message of the same workload is generated to contain the computation results. We define a workload model as a triple ⟨T⟨p, c⟩, M⟨p, x⟩, R(T, M)⟩ to characterize the periodic task model T⟨p, c⟩, the periodic message model M⟨p, x⟩, and the relationship R(T, M) between T and M, where
• Period p represents the fixed interval between two consecutive jobs of T. Its relative deadline is the same as p.
• Computation time c represents the worst-case execution time (WCET) for the completion of the task T on the processor.
• Transmission time x represents the worst-case transmission time (WCTT) for the completion of the transmission of the message M over the network.
• Dependency relationship R(T, M) represents that the message M is generated by the task T on completion of its execution on the processor.
In Chapter 7, a workload is simply a periodic task that requires the processor for completing its computation. We consider that the processor supports DVS techniques and dual instruction sets. We define a workload model as a periodic task model with some additional parameters as follows:
• Period p: the fixed time interval between the arrival times of two consecutive requests of T. The relative deadline of task T is p.
• WCEC (Worst-Case Execution Cycles) g: the number of execution cycles needed in the worst case to complete T. Rather than the worst-case execution time, we use the notion of worst-case execution cycles (WCEC), since the execution time of programs can vary depending on run-time settings such as the voltage/frequency setting of the processor.
• Code Size s: the size of T's executable code.
• CPU Frequency f: the CPU frequency at which T executes. We assume that the processor supports the dynamic voltage scaling (DVS) technique. We assume that the processor supports a finite number of variable voltage and clock frequency settings, where the operating frequency of the clock is proportional to the supply voltage. In
Sections 7.3, 7.4, and 7.5, we assume that the operating frequency can be set at a continuous level in (0, fmax], where fmax denotes the maximum clock frequency at which the processor can run, and we normalize it to 1.0. In Sections 7.7 and 7.8, we assume that the processor provides 4 pairs of clock frequency and voltage, where the clock frequencies are 1.0, 0.8, 0.6, and 0.4. We assume that all the tasks in the same component run at the same CPU frequency.
• Execution Time c: the amount of time to complete the execution of T. We assume that the execution time of a program is inversely proportional to the current frequency setting. That is, we define c = g/f.
• Energy Consumption e: the amount of energy that the processor consumes to complete the execution of Ti. The energy consumption of the processor is assumed to be proportional to the supply voltage squared [15], and the supply voltage and the clock frequency are in a directly proportional relationship. Thus, it holds that e ∝ f². For each task Ti, we now define its energy consumption e as proportional to its WCEC and its CPU frequency squared [15] as follows:

e = κ · c · f² · (PLCM / p),

where κ is the energy consumption constant and PLCM denotes the hyperperiod, i.e., the least common multiple of the periods of all the tasks.
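As a quick illustration of this energy model (a sketch of mine; the task parameters and κ are made up, and the helper name is hypothetical), the per-task energy over one hyperperiod can be computed directly from g, f, and p:

```python
# Sketch of the per-task energy model above: c = g / f and
# e = kappa * c * f**2 * (P_LCM / p). Task values and kappa are illustrative.
from math import lcm

def task_energy(g, f, p, p_lcm, kappa=1.0):
    c = g / f                       # execution time at frequency f
    return kappa * c * f ** 2 * (p_lcm / p)

if __name__ == "__main__":
    tasks = [(50, 1.0, 100), (30, 0.8, 150)]    # (WCEC g, frequency f, period p)
    p_lcm = lcm(*(p for _, _, p in tasks))
    total = sum(task_energy(g, f, p, p_lcm) for g, f, p in tasks)
    print(f"hyperperiod = {p_lcm}, total energy = {total:.2f} (arbitrary units)")
```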
Size/Cycle Tradeoff List. In Chapter 7, we assume that each task Ti has a size/cycle tradeoff list SGi that is defined as follows: SGi enumerates the possible values of the task parameters ⟨si, gi⟩, under the assumption that each task has multiple versions of its executable code and that each version can have a different code size and WCEC, i.e.,

SGi = {⟨si,j, gi,j⟩ | j = 1, 2, . . . , NiSG}  and  ⟨si, gi⟩ ∈ SGi,

where NiSG denotes the number of elements of SGi. Now, we describe our assumptions on the size/cycle tradeoff list SGi. We assume that each task has multiple versions of its executable code, obtained with the selective code transformation technique [49], which utilizes a dual instruction set processor. This technique generates each code version with a different WCEC and code size. The greedy nature of this technique ensures that the size/cycle tradeoff list SGi satisfies the following two properties:
• The code size si,j increases while the WCEC gi,j decreases as the index j increases. That is, Δsi,j = si,j − si,j−1 > 0 and Δgi,j = gi,j − gi,j−1 < 0, ∀i ∈ [1, n] and ∀j ∈ [2, NiSG].
• The marginal gain in the WCEC reduction for a unit increase in the code size is monotonically non-increasing, i.e.,
∀i ∈ [1, n], ∀j ∈ [2, NiSG − 1]:   |Δgi,j+1| / Δsi,j+1  ≤  |Δgi,j| / Δsi,j.
Resource Model. A resource is said to be dedicated to a workload set if it is exclusively assigned to the workload set, or shared if it is shared with other workload sets. For example, when a workload set is allowed to exclusively utilize the processor, the processor is a dedicated resource to the workload set. In a distributed system where nodes share the network, the network is a shared resource to a workload set within a node. A shared resource is said to be time-shared with respect to a workload set if it is available to the workload set at some times, but not available at other times. In this dissertation, we consider two time-shared resource models. One is our proposed periodic resource model, and the other is the bounded-delay resource model [60]. We propose a time-shared resource model, called the periodic resource model, that can characterize time-shared resources with periodic behavior. This periodic resource model is defined as Γ(Π, Θ), where Π is a period (Π > 0) and Θ is a periodic allocation time (0 < Θ ≤ Π). The resource capacity UΓ of a periodic resource Γ(Π, Θ) is Θ/Π. Let us define the supply function supplyR(m, m + d) of a resource model R such that it calculates the processor (resource) supply of R during a time interval [m, m + d). The periodic resource model Γ(Π, Θ) specifies resources that have the following property:

supplyΓ(kΠ, (k + 1)Π) = Θ,   where k = 0, 1, 2, . . . .

Mok et al. [60] introduced a bounded-delay resource partition model Φ⟨α, Δ⟩, where α is an availability factor (resource capacity) (0 < α ≤ 1) and Δ is a partition delay bound (0 ≤ Δ). This bounded-delay model Φ⟨α, Δ⟩ is defined to characterize the following
property:

∀t1, ∀t2 ≥ t1:   (t2 − t1 − Δ)α ≤ supplyΦ(t1, t2) ≤ (t2 − t1 + Δ)α.

Scheduling Algorithm. As a scheduling algorithm, we consider the earliest deadline first
(EDF) algorithm, which is an optimal dynamic scheduling algorithm [56], and the rate monotonic (RM) algorithm, which is an optimal fixed-priority scheduling algorithm [56]. A good reference for the comparisons between EDF and RM is given in [14].
Component. A component (or an application) is defined as C⟨W, A⟩, where W describes a set of workloads (applications) supported in the component and A is a scheduling algorithm which describes how the workloads share the resources at all times. A scheduling unit is a unit of scheduling, defined as SU⟨W, R, A⟩, that includes a workload set W, a resource model R available to the scheduling unit, and a scheduling algorithm A.
Table 3.1: Glossary
Ti⟨pi, ci⟩: Periodic task i. (Ch. 4-7)
pi: The period of task Ti. (Ch. 4-7)
ci: The worst-case computation time of task Ti on each release. (Ch. 4-7)
gi: The worst-case execution cycles, i.e., the number of execution cycles necessary in the worst case to complete Ti. (Ch. 7)
si: The code size of Ti's executable code. (Ch. 7)
fi: The CPU frequency at which Ti executes. (Ch. 7)
ei: The amount of energy that the processor consumes to complete the execution of Ti. (Ch. 7)
UTi: The processor utilization of task Ti, which represents the percentage of time a resource is requested by task Ti. (Ch. 4-7)
rsTi: The worst-case response time of task Ti. (Ch. 6)
Mi⟨pi, xi⟩: Periodic message i. (Ch. 6)
xi: The worst-case transmission time of message Mi on each release. (Ch. 6)
Ri⟨Ti, Mi⟩: The dependency relationship between task Ti and message Mi, representing that the message Mi is generated by the task Ti on completion of its execution on the processor. (Ch. 6)
rsMi: The worst-case response time of message Mi. (Ch. 6)
MSj: The message set of node j in a distributed system. (Ch. 6)
Γ⟨Π, Θ⟩: The periodic resource model. (Ch. 4-6)
Π: The period of periodic resource model Γ. (Ch. 4-6)
Θ: The resource supply of periodic resource model Γ during period Π. (Ch. 4-6)
Φ⟨α, Δ⟩: The bounded-delay resource model. (Ch. 4-6)
α: The resource capacity of bounded-delay resource model Φ. (Ch. 4-6)
Δ: The delay bound of bounded-delay resource model Φ. (Ch. 4-6)
PI⟨P, C⟩: The periodic interface model. (Ch. 5-7)
P: The period of periodic interface model PI. (Ch. 5-7)
C: The execution time requirement of PI during interface period P. (Ch. 5-7)
BI⟨A, D⟩: The bounded-delay interface model. (Ch. 5)
A: The resource capacity requirement of BI. (Ch. 5)
D: The bounded delay of BI. (Ch. 5)
SGi: The size/cycle tradeoff list of task Ti. (Ch. 7)
NiSG: The number of elements of the size/cycle tradeoff list SGi. (Ch. 7)
Chapter 4 Schedulability Analysis with Resource Models
4.1 Introduction
Scheduling assigns resources, according to scheduling algorithms, in order to service workloads. Schedulability analysis determines whether or not the timing constraints of workloads can be satisfied with the given resources under the given scheduling algorithms. In the real-time scheduling community, schedulability analysis has been substantially studied, accommodating various workload models and scheduling algorithms [56, 50, 9, 6]. However, little attention has been paid to schedulability analysis with various resource types. Previous work on schedulability analysis has focused on a workload set with dedicated resources, but has not adequately considered a workload set with shared resources¹.
¹ We refer readers to Section 3.1.2 for the definitions of "dedicated" and "shared" resources.
In this chapter, our primary goal is to provide a fundamental understanding of schedulability analysis for real-time systems. To achieve this goal, we propose a real-time scheduling framework that provides systematic approaches for accommodating the characteristics of various shared resources into schedulability analysis. In the proposed framework, we provide a general concept of schedulability analysis, which can be applied to every scheduling situation, as follows: the system is schedulable if its collective real-time resource requirements, which its internal workloads require under its scheduling algorithm, can be satisfied with the resource supplies available to the system. Traditional real-time scheduling theories have been extensively developed for calculating the resource demands of various workload models and scheduling algorithms. The proposed framework extends traditional theories by developing techniques for calculating the resource supplies of shared resources. Our contributions in this proposed framework can be summarized as follows:
• we extend a general concept of schedulability analysis with a notion of shared resource.
• we propose a shared resource model, called the periodic resource model, that can specify the periodic behavior of resource supplies.
• we propose the notion of schedulability with the worst-case resource supply and develop exact schedulability conditions, according to this schedulability notion, with shared resources.
• we extend the notion of (schedulable workload) utilization bounds to shared resources and derive the utilization bounds of shared resources.
• we propose a notion of (schedulable resource) capacity bounds and derive the capacity bounds of shared resources.
In our proposed framework, we consider two shared resources: one is our proposed periodic resource model and the other is the bounded-delay resource model [60]. With these two shared resources, we develop schedulability conditions, utilization bounds, and capacity bounds for the standard real-time workload model and scheduling algorithms, which are the Liu and Layland periodic task model [56] and the EDF and RM scheduling algorithms, respectively. Our results on the utilization bounds can be compared with previous results as follows. For dedicated resources, Liu and Layland [56] presented seminal results on the utilization bounds of a periodic task set under EDF and RM scheduling. For periodic resources, no utilization bound has been introduced for a periodic task set under EDF scheduling, while there has been a previous study [70] on deriving the utilization bound under RM scheduling. Our results include a new utilization bound that refines this previous result [70]. In particular, our utilization bounds are generalizations of Liu and Layland's results [56], in the sense that our results reduce to Liu and Layland's utilization bounds under EDF and RM scheduling when the periodic resource essentially behaves like a dedicated resource. For bounded-delay resources, there have been no known results on utilization bounds for a periodic task set under EDF and RM scheduling. Mok and Feng [59] introduced utilization bounds for a time-shared resource that is similar to, but not the same as, the bounded-delay resource. In this chapter, we present new utilization bounds for the bounded-delay resource model under EDF and RM scheduling. We propose a notion of schedulable resource capacity bound. The capacity bound of a resource with respect to a periodic task set is a number such that if the capacity of the resource is no smaller than the number, the task set is schedulable with the resource. We derive the capacity bounds of periodic and bounded-delay resource models for a periodic task set under EDF and RM scheduling. The rest of this chapter is organized as follows: Section 4.2 describes the problems that we address. Section 4.3 provides background on the standard real-time workload model and scheduling algorithms in terms of resource demand calculation, and Section 4.4 describes the calculation of the resource supplies of two shared resource models. Section 4.5 presents schedulability conditions, Section 4.6 presents utilization bounds, and Section 4.7 presents capacity bounds. Finally, we present a summary in Section 4.8.
4.2 Problem Description In this section, we present the definition of terms that are used in the proposed framework and then provide a problem statement identifying the problems that are addressed in the proposed framework. Scheduling is to assign resources according to a scheduling algorithm in order to service workloads. We use the term scheduling unit to mean the basic unit of scheduling and
define a scheduling unit SU as a triple ⟨W, R, A⟩, where W describes the workloads (of applications) supported in the scheduling unit, R is a resource model that describes the resource allocations available to the scheduling unit, and A is a scheduling algorithm which describes how the workloads share the resources at all times. The resource demand of a scheduling unit SU⟨W, R, A⟩ represents the collective resource requirements that its workload set W requests under its scheduling algorithm A. The demand bound function dbfA(W, t) of a scheduling unit SU⟨W, R, A⟩ calculates the maximum possible resource demands that W requests to satisfy the collective timing requirements of W under A within a time interval of length t. The resource supply of a resource R represents the amount of resource allocations that R provides. The supply bound function sbfR(t) of a resource R calculates the minimum possible resource supplies that R provides during a time interval of length t. A scheduling unit SU⟨W, R, A⟩ is said to be schedulable with the worst-case resource supply if a set of timing requirements imposed by the workload set W can be satisfied with the worst-case minimum resource supply of the resource R under the scheduling algorithm A, i.e.,

∀L > 0:   dbfA(W, L) ≤ sbfR(L).   (4.1)
Hereafter, we consider SU as schedulable if SU is schedulable with the worst-case resource supply. In this chapter, we mainly address the following problems:
• Schedulability condition. We develop exact conditions for analyzing the schedulability of SU⟨W, R, A⟩, where W is a periodic workload set, R is a periodic resource or a bounded-delay resource, and A is EDF or RM.
• Schedulable workload utilization bound. For a scheduling unit SU⟨W, R, A⟩, we define a (schedulable workload) utilization bound UBR,A as a number such that SU is schedulable if UW ≤ UBR,A, where UW denotes the utilization of a workload set W. We derive the utilization bound of SU⟨W, R, A⟩, where W is a periodic workload set, R is a periodic resource or a bounded-delay resource, and A is EDF or RM.
• Schedulable resource capacity bound. For a scheduling unit SU⟨W, R, A⟩, we define a (schedulable resource) capacity bound CBR(W, A) as a number such that the scheduling unit SU is schedulable if

UR ≥ CBR(W, A),

where UR denotes the capacity of resource R. We derive the capacity bound of SU⟨W, R, A⟩, where W is a periodic workload set, R is a periodic resource or a bounded-delay resource, and A is EDF or RM.
4.3 Workload Models
As a workload model of the proposed framework, we consider the standard real-time workload model, which is the Liu and Layland periodic model [56]. This model is defined as T(p, c), where p is a period and c is an execution time requirement (c ≤ p). A task utilization UT is defined as c/p. For a workload set W = {Ti}, a workload utilization UW is defined as Σ_{Ti ∈ W} UTi. In this chapter, let Pmin denote the smallest task period in the workload set W. For the periodic workload model Ti(pi, ci), its demand bound function dbf(Ti, L) can be defined as follows:

dbf(Ti, L) = ⌊L/pi⌋ · ci.

EDF Scheduling.
For a periodic task set W under EDF scheduling, Baruah et al. [10]
proposed a demand bound function that computes the total resource demand dbfEDF(W, L) of W for every interval length L:

dbfEDF(W, L) = Σ_{Ti ∈ W} dbf(Ti, L).

For an easier mathematical analysis, we can use a linear upper-bound function ldbfEDF(W, L) of the demand bound function dbfEDF(W, L), defined as follows:

ldbfEDF(W, L) = UW · L ≥ dbfEDF(W, L),   (4.2)
where UW is the utilization of the workload set W .
RM Scheduling.
For a periodic task set W under RM scheduling, Lehoczky et al. [50]
proposed a demand bound function dbfRM(W, L, i) that computes the worst-case cumulative resource demand of a task Ti for an interval of length L:

dbfRM(W, L, i) = Σ_{Tk ∈ HPW(i)} dbf(Tk, Lk*),   where Lk* = ⌈L/pk⌉ · pk,

and HPW(i) is the set of tasks in W with priority higher than or equal to that of Ti (including Ti itself). For a task Ti over a resource R, the worst-case response time ri(R) of Ti can be computed as follows:

ri(R) = min{ L | dbfRM(W, L, i) ≤ sbfR(L) }.
4.4 Resource Models In the proposed framework, we consider two resource models for specifying the characteristics of time-shared resource allocations provided to a single scheduling unit. We propose the periodic resource model that can describe a periodic behavior of resource allocations. Mok and his colleagues [60] proposed the bounded-delay resource model that can specify other timing behavior of resource allocations. For a resource model R, we define its supply function supplyR (t1 , t2 ) that computes the resource supply of R during the interval [t1 , t2 ).
[Figure 4.1: The worst-case resource supply of the periodic resource model Γ⟨Π, Θ⟩ for k = 3.]
4.4.1 Periodic Resource Model
We propose the periodic resource model Γ⟨Π, Θ⟩, where Π is a period (Π > 0) and Θ is a periodic allocation time (0 < Θ ≤ Π). A resource capacity UΓ of a periodic resource Γ⟨Π, Θ⟩ is defined as Θ/Π. The periodic model Γ⟨Π, Θ⟩ is defined to characterize the following property:

supplyΓ(kΠ, (k + 1)Π) = Θ,   where k = 0, 1, 2, . . . .   (4.3)
For schedulability analysis, it is important to calculate the minimum resource supply of a resource model accurately. For a periodic model Γ, its supply bound function sbfΓ (t) is defined to compute the minimum resource supply for every interval length t as follows:
sbfΓ(t) =
  t − (k + 1)(Π − Θ)   if t ∈ [(k + 1)Π − 2Θ, (k + 1)Π − Θ],
  (k − 1)Θ              otherwise,                                   (4.4)

where k = max( ⌈(t − (Π − Θ)) / Π⌉, 1 ). Figure 4.1 illustrates how the supply bound function sbfΓ(t) is defined for k = 3.
Since the supply bound function sbfΓ(t) is a discrete function, its linear lower bound function lsbfΓ(t) is defined as follows:

lsbfΓ(t) =
  (Θ/Π) · (t − 2(Π − Θ))   if t ≥ 2(Π − Θ),
  0                          otherwise.                              (4.5)
We define the service time of a resource as the duration that it takes for the resource to provide a given resource supply. For a periodic resource Γ⟨Π, Θ⟩, we define a service time bound function tbfΓ(t) that calculates the maximum service time of Γ required for a t-time-unit resource supply as follows:

tbfΓ(t) = (Π − Θ) + Π · ⌊t/Θ⌋ + εt,   (4.6)

where

εt =
  Π − Θ + t − Θ · ⌊t/Θ⌋   if t − Θ · ⌊t/Θ⌋ > 0,
  0                         otherwise.                               (4.7)
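For completeness, here is a direct transcription of Eq. (4.6) and Eq. (4.7) into Python (a sketch of mine; the sample values in the usage line are illustrative):

```python
# Sketch of the service time bound function of Gamma(Pi, Theta), Eq. (4.6)-(4.7):
# the maximum time the resource may need in order to deliver t units of supply.
import math

def tbf_gamma(t, Pi, Theta):
    full = math.floor(t / Theta)
    rem = t - Theta * full
    eps = (Pi - Theta) + rem if rem > 0 else 0
    return (Pi - Theta) + Pi * full + eps

if __name__ == "__main__":
    # with Pi=10, Theta=4: 4 units may take up to 16 time units, 5 units up to 23
    print(tbf_gamma(4, 10, 4), tbf_gamma(5, 10, 4))
```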
4.4.2 Bounded-Delay Resource Model
Mok et al. [60] introduced a bounded-delay resource partition model Φ⟨α, Δ⟩, where α is an availability factor (resource capacity) (0 < α ≤ 1) and Δ is a partition delay bound (0 ≤ Δ). This bounded-delay model Φ⟨α, Δ⟩ is defined to characterize the following property:

∀t1, ∀t2 ≥ t1, ∀d ≤ Δ:   (t2 − t1 − d)α ≤ supplyΦ(t1, t2) ≤ (t2 − t1 + d)α.

[Figure 4.2: Bounded-delay resource model: example.]

Figure 4.2 shows an example of the resource supply of a bounded-delay resource model. For a bounded-delay model Φ, its supply bound function sbfΦ(t) is defined to compute the minimum resource supply for every interval length t as follows:

sbfΦ(t) =
  α(t − Δ)   if t ≥ Δ,
  0           otherwise.                                             (4.8)
4.5 Schedulability Analysis In this section, we present conditions that can be used to determine whether or not the timing constraints of a periodic workload set W can be satisfied with the worst-case resource supply of shared resource R under EDF and RM scheduling, where R is either a periodic resource or a bounded-delay resource.
4.5.1 Schedulability Analysis under EDF Scheduling
The following theorem presents an exact schedulability condition for a scheduling unit SU⟨W, R, EDF⟩, where W is a periodic workload set and R is a periodic resource.
Theorem 4.5.1 A scheduling unit SU⟨W, R, A⟩, where W is a periodic workload set, R is a periodic resource Γ⟨Π, Θ⟩, and A is the EDF scheduling algorithm, is schedulable (with the worst-case resource supply of R) if, and only if,

∀t > 0:   dbfEDF(W, t) ≤ sbfΓ(t).   (4.9)
Proof. To show the necessity, we prove the contrapositive, i.e., if Eq. (4.9) is false, there are some workload members of W that are not schedulable by EDF. Suppose that there exists t* > 0 such that dbfEDF(W, t*) > sbfΓ(t*). That is, the maximum possible resource demand of W during an interval of length t* is larger than the minimum possible resource supply of Γ during the same interval. In this case, the resource demand of W cannot be satisfied by the resource supply of Γ. Therefore, there will be a workload member of W that misses a deadline during such an interval.
To show the sufficiency, we prove the contrapositive, i.e., if some workload member of W is not schedulable by EDF, then Eq. (4.9) is false. Let t2 be the first instant at which a job of some workload member Ti of W misses its deadline. Let t1 be the latest instant at which the resource supplied to W was idle or was executing a job whose deadline is after t2. By the definition of t1, a job whose deadline is no later than t2 was released at t1. Let t = t2 − t1. Since Ti misses its deadline at t2, the total demand placed by W in the time interval [t1, t2) is greater than the total supply provided by Γ in an interval of the same length t, i.e., dbfEDF(W, t) > sbfΓ(t), so Eq. (4.9) is false. □
Example 4.5.1 Let us consider a workload set W = {T1⟨50, 7⟩, T2⟨75, 9⟩} and a periodic resource model Γ⟨Π, Θ⟩, where Π = 10 and Θ = 2.8. Then, according to Theorem 4.5.1, the workload set W is schedulable under EDF scheduling with the periodic resource model Γ.
The following theorem presents an exact schedulability condition for a scheduling unit SU⟨W, R, EDF⟩, where W is a periodic workload set and R is a bounded-delay resource.
Theorem 4.5.2 A scheduling unit SU⟨W, R, A⟩, where W is a periodic workload set, R is a bounded-delay resource Φ⟨α, Δ⟩, and A is the EDF scheduling algorithm, is schedulable (with the worst-case resource supply of R) if, and only if,
∀t > 0:   dbfEDF(W, t) ≤ sbfΦ(t).   (4.10)
Proof. The proof of this theorem can be obtained if Γ is replaced with Φ in the proof of Theorem 4.5.1. □
Example 4.5.2 Let us consider a workload set W = {T1 h100, 11i, T2h150, 22i} and a bounded-delay resource Φhα, ∆i, where α = 0.4 and ∆ = 60. Then, according to Theorem 52
4.5.2, the workload set W is schedulable under EDF scheduling with this bounded-delay resource Φ.
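The two examples above can be checked mechanically with the exact conditions of Theorems 4.5.1 and 4.5.2. The sketch below is mine: it scans only the task deadlines up to one hyperperiod, which suffices here because dbfEDF changes only at deadlines and both supply bound functions are non-decreasing; the function names are illustrative.

```python
# Sketch of the exact EDF tests (Theorems 4.5.1 and 4.5.2) on Examples 4.5.1
# and 4.5.2. sbf_gamma repeats Eq. (4.4) so this block runs standalone.
import math

def dbf_edf(tasks, t):                           # tasks: list of (period p, WCET c)
    return sum(math.floor(t / p) * c for p, c in tasks)

def sbf_gamma(t, Pi, Theta):                     # Eq. (4.4)
    k = max(math.ceil((t - (Pi - Theta)) / Pi), 1)
    if (k + 1) * Pi - 2 * Theta <= t <= (k + 1) * Pi - Theta:
        return t - (k + 1) * (Pi - Theta)
    return (k - 1) * Theta

def sbf_phi(t, alpha, delta):                    # Eq. (4.8)
    return alpha * (t - delta) if t >= delta else 0.0

def edf_schedulable(tasks, sbf):
    horizon = math.lcm(*(p for p, _ in tasks))   # finite check horizon
    deadlines = sorted({k * p for p, _ in tasks for k in range(1, horizon // p + 1)})
    return all(dbf_edf(tasks, t) <= sbf(t) for t in deadlines)

if __name__ == "__main__":
    print(edf_schedulable([(50, 7), (75, 9)],
                          lambda t: sbf_gamma(t, 10, 2.8)))      # Example 4.5.1: True
    print(edf_schedulable([(100, 11), (150, 22)],
                          lambda t: sbf_phi(t, 0.4, 60)))        # Example 4.5.2: True
```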
4.5.2 Schedulability Analysis under RM Scheduling
The following theorem presents an exact schedulability condition for a scheduling unit SU⟨W, R, RM⟩, where W is a periodic workload set and R is a periodic resource.
Theorem 4.5.3 A scheduling unit SU⟨W, R, A⟩, where W is a periodic workload set, R is a periodic resource Γ⟨Π, Θ⟩, and A is the RM scheduling algorithm, is schedulable (with the worst-case resource supply of R) if, and only if,

∀Ti ∈ W, ∃ti ∈ [0, pi]:   dbfRM(W, ti, i) ≤ sbfΓ(ti).

Proof. Task Ti completes its execution requirement at time ti ∈ [0, pi] if and only if all the execution requirements from all the jobs of tasks with higher priority than Ti, together with ci, the execution requirement of Ti, are completed at time ti. The total of such requirements is given by dbfRM(W, ti, i), and they are completed at ti if and only if dbfRM(W, ti, i) = sbfΓ(ti) and dbfRM(W, ti', i) > sbfΓ(ti') for 0 ≤ ti' < ti. It follows that a necessary and sufficient condition for Ti to meet its deadline is the existence of a ti ∈ [0, pi] such that dbfRM(W, ti, i) = sbfΓ(ti). The entire task set is schedulable if and only if each of the tasks is schedulable. This means that there exists a ti ∈ [0, pi] such that dbfRM(W, ti, i) = sbfΓ(ti) for each task Ti ∈ W. □
Example 4.5.3 Let us consider a workload set W = {T1⟨50, 7⟩, T2⟨75, 9⟩} and a periodic resource model Γ⟨Π, Θ⟩, where Π = 10 and Θ = 3.5. Then, according to Theorem 4.5.3, the workload set W is schedulable under RM scheduling with the periodic resource model Γ.
The following theorem presents an exact schedulability condition for a scheduling unit SU⟨W, R, RM⟩, where W is a periodic workload set and R is a bounded-delay resource.
Theorem 4.5.4 A scheduling unit SU⟨W, R, A⟩, where W is a periodic workload set, R is a bounded-delay resource Φ⟨α, Δ⟩, and A is the RM scheduling algorithm, is schedulable (with the worst-case resource supply of R) if, and only if,

∀Ti ∈ W, ∃ti ∈ [0, pi]:   dbfRM(W, ti, i) ≤ sbfΦ(ti).

Proof. The proof of this theorem can be obtained if Γ is replaced with Φ in the proof of Theorem 4.5.3. □
Example 4.5.4 Let us consider a workload set W = {T1⟨100, 11⟩, T2⟨150, 22⟩} and a bounded-delay resource Φ⟨α, Δ⟩, where α = 0.4 and Δ = 30. Then, according to Theorem 4.5.4, the workload set W is schedulable under RM scheduling with this bounded-delay resource Φ.
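Examples 4.5.3 and 4.5.4 can likewise be checked with the exact RM condition. The sketch below is mine; it assumes the tasks are listed in RM priority order (shortest period first), scans integer candidate points ti in [1, pi], and repeats the supply bound functions so that the block runs standalone.

```python
# Sketch of the exact RM tests (Theorems 4.5.3 and 4.5.4) on Examples 4.5.3
# and 4.5.4, using dbf_RM from Section 4.3.
import math

def sbf_gamma(t, Pi, Theta):                     # Eq. (4.4)
    k = max(math.ceil((t - (Pi - Theta)) / Pi), 1)
    if (k + 1) * Pi - 2 * Theta <= t <= (k + 1) * Pi - Theta:
        return t - (k + 1) * (Pi - Theta)
    return (k - 1) * Theta

def sbf_phi(t, alpha, delta):                    # Eq. (4.8)
    return alpha * (t - delta) if t >= delta else 0.0

def dbf_rm(tasks, t, i):                         # demand of T_i and its higher-priority tasks
    return sum(math.ceil(t / p) * c for p, c in tasks[: i + 1])

def rm_schedulable(tasks, sbf):
    return all(
        any(dbf_rm(tasks, t, i) <= sbf(t) for t in range(1, p + 1))
        for i, (p, _) in enumerate(tasks)
    )

if __name__ == "__main__":
    print(rm_schedulable([(50, 7), (75, 9)],
                         lambda t: sbf_gamma(t, 10, 3.5)))    # Example 4.5.3: True
    print(rm_schedulable([(100, 11), (150, 22)],
                         lambda t: sbf_phi(t, 0.4, 30)))      # Example 4.5.4: True
```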
4.6 Schedulable Workload Utilization Bounds
For a scheduling unit SU⟨W, R, A⟩, we define a (schedulable workload) utilization bound UBR,A as a number such that SU is schedulable if

UW ≤ UBR,A.

The utilization bound is particularly suited for on-line acceptance tests. When checking whether a new periodic task can be scheduled together with existing tasks, comparing the workload utilization against the bound takes a constant amount of time, which is much faster than performing an exact schedulability analysis based on a demand bound function. In this section, we address the problems of deriving the utilization bounds of scheduling units SU⟨W, R, A⟩, where W is a periodic workload set, R is a periodic or bounded-delay resource, and A is EDF or RM. For the RM scheduling algorithm, we present new utilization bounds that refine previous results [70, 59].
4.6.1 Periodic Resource Model
Utilization Bound under EDF Scheduling
For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = EDF, we represent its utilization bound UBR,A as a function of Pmin, that is, UBΓ,EDF(Pmin), where Pmin is the smallest task period in W. We present the following theorem to introduce the utilization bound UBΓ,EDF(Pmin).
[Figure 4.3: Four cases to consider in deriving a utilization bound under EDF scheduling.]
Theorem 4.6.1 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = EDF, its utilization bound UBΓ,EDF(Pmin) as a function of Pmin is

UBΓ,EDF(Pmin) = k · UΓ / (k + 2(1 − UΓ)),

where k is the largest integer that satisfies (k + 1)Π − 2Θ + ε ≤ Pmin and ε = 2Θ/(k + 2).
Proof. The constant k basically represents the relationship between Pmin and Π, i.e., roughly speaking, k indicates how many times bigger Pmin is than Π. Let PEDF* denote (k + 1)Π − 2Θ, where k ≥ 1. It is shown in Figure 4.3 that PEDF* is the largest time instant of the form (k + 1)Π − 2Θ, for integer k, that is smaller than Pmin. In this proof, we consider four cases in terms of the interval length t: (1) 0 ≤ t < PEDF* + ε, (2) PEDF* + ε ≤ t < PEDF* + Θ, (3) PEDF* + Θ ≤ t < PEDF* + Π, and (4) t ≥ PEDF* + Π. For each of these four cases, we want to show that

dbfEDF(W, t) ≤ sbfΓ(t).
• Case 1. We first consider the case where the interval length t is shorter than Pmin. That is, from the assumption given in the theorem, which is PEDF* + ε ≤ Pmin,

0 ≤ t < PEDF* + ε.

From the definition of dbfEDF(W, t), it is easy to see that

∀ 0 ≤ t < PEDF* + ε:   dbfEDF(W, t) = 0.

Since sbfΓ(t) ≥ 0 for all t ≥ 0, it follows that

∀ 0 ≤ t < PEDF* + ε:   dbfEDF(W, t) ≤ sbfΓ(t).
• Case 2. We consider the case where the interval length t can be longer than Pmin, but is shorter than PEDF* + Θ. That is,

PEDF* + ε ≤ t < PEDF* + Θ.

From the definition of sbfΓ(t), it is easy to see that

∀ PEDF* + ε ≤ t < PEDF* + Θ:   sbfΓ(t) = (k − 1)Θ + t − PEDF*.

When t = PEDF* + ε, it follows that

sbfΓ(PEDF* + ε) = (k − 1)Θ + ε = kΘ − (k/(k + 2)) · Θ.   (4.11)

Suppose that UW is no greater than UBΓ,EDF(Pmin). That is,

UW ≤ k · UΓ / (k + 2(1 − UΓ)).

When t = PEDF* + ε, it follows that

ldbfEDF(W, t) = UW · t
  ≤ [k · UΓ / (k + 2(1 − UΓ))] · (PEDF* + ε)
  = [kΘ / ((k + 2)Π − 2Θ)] · ((k + 2)Π − 2Θ + ε − Π)
  = kΘ + [kΘ / ((k + 2)Π − 2Θ)] · (ε − Π)
  = kΘ + [kΘ / ((k + 2)Π − 2Θ)] · (−((k + 2)Π − 2Θ)/(k + 2))
  = kΘ − (k/(k + 2)) · Θ.   (4.12)

From Eq. (4.11) and (4.12), we can see that when t = PEDF* + ε, ldbfEDF(W, t) = sbfΓ(t). In this case, sbfΓ(t) increases linearly in t with a slope of 1, while ldbfEDF(W, t) increases with a slope of UW, which is smaller than 1. Thus, considering dbfEDF(W, t) ≤ ldbfEDF(W, t), it is easy to see that

∀ PEDF* + ε ≤ t < PEDF* + Θ:   dbfEDF(W, t) ≤ sbfΓ(t).
• Case 3. We consider the case where the interval length t can be longer than Pmin, in particular longer than PEDF* + Θ, but is shorter than PEDF* + Π. That is,

PEDF* + Θ ≤ t < PEDF* + Π.

From the definition of dbfEDF(W, t), it is easy to see that

∀ PEDF* + Θ ≤ t < PEDF* + Π:   dbfEDF(W, t) < UW · (PEDF* + Π).

Suppose that UW is no greater than UBΓ,EDF(Pmin). In this case, we then have

dbfEDF(W, t) < UW · (PEDF* + Π)
  ≤ [k · UΓ / (k + 2(1 − UΓ))] · ((k + 2)Π − 2Θ)
  = k · Θ   (since UΓ = Θ/Π)
  = sbfΓ(t).
• Case 4. We consider the case where the interval length t is at least PEDF* + Π, which is longer than Pmin. That is,

PEDF* + Π ≤ t.

Suppose that UW is no greater than UBΓ,EDF(Pmin). In this case, we then have

dbfEDF(W, t) ≤ UW · t
  ≤ [k · UΓ / (k + 2(1 − UΓ))] · t
  ≤ UΓ · (t − 2(Π − Θ))   (since t ≥ PEDF* + Π = (k + 2)Π − 2Θ)
  = lsbfΓ(t) ≤ sbfΓ(t).

For each of the four cases, we showed that when UW ≤ UBΓ,EDF(Pmin), the scheduling component SU⟨W, Γ, EDF⟩ is schedulable according to Theorem 4.5.1. □
It should be noted that the utilization bound UBΓ,EDF(Pmin) becomes 1, without regard to Pmin, if the capacity UΓ of a periodic resource is 1, i.e., the periodic resource is essentially a dedicated resource. Thus, UBΓ,EDF(Pmin) is a generalization of the result of Liu and Layland [56]. As an example, we consider a scheduling unit SU⟨W, Γ⟨Π, Θ⟩, EDF⟩, where Π = 10 and Θ = 4. The resource capacity UΓ of the periodic resource Γ is 0.4. We assume that the smallest task period Pmin of a workload set W is greater than 50, i.e., Pmin ≥ 50. Then, k = 4 and the EDF utilization bound UBΓ,EDF(50) is 0.323. That is, if UW is no greater
than 0.323, then the scheduling unit SU is schedulable.

[Figure 4.4: Utilization bound as a function of resource capacity: (a) under EDF scheduling and (b) under RM scheduling.]

Figure 4.4 (a) shows the effect of the resource period, in terms of k, on the utilization bound as a function of resource capacity under EDF scheduling. The solid line, labeled "limit", shows the limit of the utilization bound of a periodic resource, which is obtained when k = ∞. The other curves show the utilization bounds of a periodic resource when k is given as shown in the corresponding labels. It is shown that as k increases, the utilization bound of a periodic resource converges to its limit.
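Theorem 4.6.1 is straightforward to evaluate numerically. The sketch below is mine and uses illustrative parameters of my own choosing (Π = 10, Θ = 4, Pmin = 60) rather than the example above; the helper name is hypothetical.

```python
# Sketch computing UB_{Gamma,EDF}(Pmin) from Theorem 4.6.1: find the largest k
# with (k+1)*Pi - 2*Theta + 2*Theta/(k+2) <= Pmin, then evaluate the bound.
# Assumes k = 1 already satisfies the condition for the given parameters.

def ub_gamma_edf(Pmin, Pi, Theta):
    U = Theta / Pi
    k = 1
    while (k + 2) * Pi - 2 * Theta + 2 * Theta / (k + 3) <= Pmin:
        k += 1                       # k + 1 still satisfies the condition
    return k * U / (k + 2 * (1 - U))

if __name__ == "__main__":
    print(round(ub_gamma_edf(Pmin=60, Pi=10, Theta=4), 3))
```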
Utilization Bound under RM Scheduling
For a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM, Saewong et al. [70] represented its utilization bound UBR,A as a function of n, that is, UBΓ,RM(n). They presented the following result to introduce the utilization bound UBΓ,RM(n), derived from Liu and Layland's utilization bound [56].
Lemma 4.6.2 (Theorem 7 in [70]) For a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, A = RM, and pi ≥ 2Π − Θ for 1 ≤ i ≤ n, its utilization bound UBΓ,RM(n) is

UBΓ,RM(n) = n · [((3 − UΓ)/(3 − 2UΓ))^(1/n) − 1].

We now present another utilization bound that refines the above previous result. For a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM, we represent its utilization bound UBR,A differently, as a function of n and Pmin, that is, UBΓ,RM(n, Pmin), where Pmin is the smallest task period in W. We then present the following theorem to introduce the utilization bound UBΓ,RM(n, Pmin).
Theorem 4.6.3 For a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, A = RM, and pi ≥ 2Π − Θ for 1 ≤ i ≤ n, its utilization bound UBΓ,RM(n, Pmin) is

UBΓ,RM(n, Pmin) = UΓ · n · [((2k + 2(1 − UΓ))/(k + 2(1 − UΓ)))^(1/n) − 1],

where Pmin is the smallest task period in W and k is the largest integer that satisfies (k + 1)Π − Θ ≤ Pmin.
We present the proof of Theorem 4.6.3 as follows. We first explain the notion of schedulable workload utilization. We consider a task period restriction that the largest ratio between task periods is less than 2. Lemmas 4.6.4 and 4.6.5 show the properties of a workload W when its workload utilization UW is the least upper bound on the schedulable workload utilization. Using the results of the two lemmas, Theorem 4.6.6 derives a utilization bound. Theorem 4.6.7 removes the task period restriction.
We consider a scheduling unit SU⟨W, R, A⟩ such that W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = RM. A workload set W is said to fully utilize a resource R if the scheduling unit SU is schedulable and SU is no longer schedulable if the execution time ci of any task Ti is increased. If a scheduling unit SU is schedulable, the workload utilization UW of a workload set W is said to be a schedulable workload utilization. The least upper bound on the schedulable workload utilization is said to be a utilization bound. Let us first restrict our discussion to the case in which the ratio between any two task periods is less than 2. We present the following lemma to introduce a property of a workload set W in terms of the execution time requirement ci of each task Ti, when the workload utilization UW is the least upper bound on the schedulable workload utilization.
Lemma 4.6.4 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = RM, under the restriction that the ratio between any two task periods of W is less than 2, if the workload set W fully utilizes the resource Γ under RM scheduling with the smallest possible workload utilization, it follows that

Σ_{Ti ∈ W} ci = sbfΓ(Pmin),

where Pmin is the smallest task period of the workload set W.
Proof. For the workload set W, we assume that p1 < p2 < . . . < pn−1 < pn. We now assume that the workload set W fully utilizes the resource Γ under RM scheduling with the smallest possible workload utilization UW*. Let c1*, c2*, · · · , cn* be the execution times of the tasks T1, T2, · · · , Tn that determine UW*. Then, we first need to show that
c1* = sbfΓ(p1, p2).

Suppose that c1* = sbfΓ(p1, p2) + Δ, with Δ > 0. Let

c1' = sbfΓ(p1, p2),   c2' = c2* + Δ,   c3' = c3*,   . . . ,   cn' = cn*.

Given that c1*, c2*, · · · , cn* guarantee the schedulability of the scheduling component C and that any increase in any ci* will make C unschedulable, it is clear that a workload set with c1', c2', · · · , cn' is schedulable over Γ and that any increase in any ci' will violate the schedulability of the task set over Γ. Let UW' denote the corresponding utilization. We have

UW* − UW' = (Δ/p1) − (Δ/p2) > 0.
Hence, this assumption is false. Alternatively, suppose that c1* = sbfΓ(p1, p2) − Δ, with Δ > 0. Let

c1'' = sbfΓ(p1, p2),   c2'' = c2* − 2Δ,   c3'' = c3*,   . . . ,   cn'' = cn*.

Again, a workload set with c1'', c2'', · · · , cn−1'', cn'' is also schedulable over Γ, and any increase in any ci'' will violate the schedulability of the task set. Let UW'' denote the corresponding utilization. We have

UW* − UW'' = −(Δ/p1) + (2Δ/p2) > 0   (since p2 < 2p1).
Again, this assumption is also false. Therefore, if indeed UW* is the least upper bound of the workload utilization, then

c1* = sbfΓ(p1, p2).

In a similar way, we can show that

c2* = sbfΓ(p2, p3),   c3* = sbfΓ(p3, p4),   . . . ,   cn−1* = sbfΓ(pn−1, pn).

Consequently,

cn* = sbfΓ(0, p1) − (c1* + c2* + · · · + cn−1*) = sbfΓ(0, p1) − sbfΓ(p1, pn).

Finally, we have

Σ_{Ti ∈ W} ci = sbfΓ(0, p1).   □
We present the following lemma to show another property of a workload set W, in terms of the smallest task period in W, when the workload utilization UW is the least upper bound on the schedulable workload utilization. Let us here restrict our discussion to the case in which the smallest task period Pmin of a workload set W is in the range Pmin ∈ [(k + 1)Π − Θ, (k + 2)Π − Θ) for a positive integer k.
Lemma 4.6.5 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = RM, under the restriction that (k + 1)Π − Θ ≤ Pmin < (k + 2)Π − Θ, where Pmin is the smallest task period of W and k is any positive integer, if the workload set W fully utilizes the resource Γ under RM scheduling with the smallest possible workload utilization, then Pmin = (k + 2)Π − 2Θ.
Proof. Suppose that the workload set W fully utilizes the resource Γ under RM scheduling. Then, according to Lemma 4.6.4, the following holds:

Σ_{Ti ∈ W} ci = sbfΓ(Pmin).

Let Pk* denote (k + 2)Π − 2Θ. We first consider three cases in terms of Pmin: (1) Pmin ∈ [Pk* − (Π − Θ), Pk*), (2) Pmin ∈ [Pk*, Pk* + Θ), and (3) Pmin ≥ Pk* + Θ.
1. For the first case, where Pmin ∈ [Pk* − (Π − Θ), Pk*), it is clear that sbfΓ(Pmin) = kΘ, since there is no resource supply during the interval [Pk* − (Π − Θ), Pk*) under the worst-case resource supply. We transform W = {Ti(pi, ci)} to W' = {Ti'(pi', ci')} such that
Ti'(pi', ci') =
  Ti(pi, ci)    if pi ≥ Pk*,
  Ti(Pk*, ci)   otherwise.
during the interval Pk∗ , Pk∗ + Θ at the worst-case resource supply. Let us consider
Pmin = Pk∗ + δ, where 0 ≤ δ < Θ. In this case, sbfΓ (Pmin ) = kΘ + δ. We wish to show that sbfΓ (Pk∗ ) ≤
Pk∗ sbfΓ (Pmin ). Pmin
(4.13)
It follows that
sbfΓ (Pk∗) sbfΓ (Pmin ) − Pk∗ Pmin
=
kΘ kΘ + δ − ∗ Pk∗ Pk + δ
≤
0.
We transform W = {Ti hpi , ci i} to W ′ = {Ti′ hp′i , c′i i} such that p′i = pi · q and c′i = ci · q, where q = Pk∗ /Pmin . With this transformation, UW ′ = UW . According to Lemma 4.6.4, if W ′ fully utilizes the resource Γ under RM scheduling with the smallest possible workload utilization, it should satisfy
X
c′i = sbfΓ (Pmin ).
Ti′ ∈W ′
67
However, it follows
X
Ti′ ∈W ′
c′i = q ·
X
Ti ∈W
ci = q · sbfΓ (Pmin ) ≥ sbfΓ (Pk∗ ).
Thus, we may need to decrease some c′i to make SU ′ hW ′, R, Ai schedulable, and this consequently decreases the workload utilization UW ′ , which leads to UW ′ < UW . 3. For the third case where Pmin ≥ Pk∗ + Θ, we can easily see from the above two cases that when Pmin ∈ [Pk∗ + Θ + (m − 1)Π, Pk∗ + Θ + mΠ) for m = 1, 2, . . ., the least upper-bound of a schedulable workload utilization can be minimized only if Pmin = Pk∗ + mΠ. Thus, we consider Pmin as Pk∗ + mΠ. We now wish to show that
sbfΓ(Pk*) ≤ (Pk*/Pmin) · sbfΓ(Pmin).   (4.14)

It follows that

sbfΓ(Pk*)/Pk* − sbfΓ(Pmin)/Pmin = kΘ/Pk* − (kΘ + mΘ)/(Pk* + mΠ) ≤ 0.

As in the second case above, we transform W = {Ti(pi, ci)} to W' = {Ti'(pi', ci')} such that pi' = pi · q and ci' = ci · q, where q = Pk*/Pmin. It is shown in the second case above that when Eq. (4.14) holds, we may need to decrease some ci' to make SU'⟨W', R, A⟩ schedulable, and this consequently decreases the workload utilization UW', which leads to UW' < UW. □

Using the properties shown in Lemmas 4.6.4 and 4.6.5, we present the following theorem to introduce a utilization bound UBΓ,RM(n, Pmin) for a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM.
Theorem 4.6.6 A scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM, is schedulable under the restriction that the ratio between any two task periods of W is less than 2, if

UW ≤ UΓ · n · [(((2k + 2) − 2UΓ)/((k + 2) − 2UΓ))^(1/n) − 1],

where k is the largest integer that satisfies (k + 1)Π − Θ ≤ Pmin, and Pmin is the smallest task period in W.
Proof. For a workload set W = {T1(p1, c1), . . . , Tn(pn, cn)}, we assume that pn > pn−1 > · · · > p2 > p1. Let e1*, e2*, · · · , en* be the execution times of the tasks T1, T2, · · · , Tn that determine the least upper bound of UW subject to the schedulability guarantee of the scheduling component C. Let UW* denote the least schedulable utilization bound for C. To achieve UW*, Lemma 4.6.4 shows that the execution times e1*, e2*, · · · , en* should be determined as follows:

e1* = sbfΓ(p1, p2),   . . . ,   en−1* = sbfΓ(pn−1, pn),   en* = sbfΓ(0, p1) − sbfΓ(p1, pn).

Then, we can derive UW* as follows:

UW* = e1*/p1 + · · · + en−1*/pn−1 + en*/pn
    = sbfΓ(p1, p2)/p1 + · · · + sbfΓ(pn−1, pn)/pn−1 + [sbfΓ(0, p1) − sbfΓ(p1, pn)]/pn.   (4.15)
Now, we want to find the minimum value of UW*. According to Lemma 4.6.5, the smallest task period p1 should be Pk*, where Pk* = (k + 2)Π − 2Θ, to minimize UW*. Thus, we rewrite Eq. (4.15) as follows:

UW* = sbfΓ(Pk*, p2)/Pk* + · · · + sbfΓ(pn−1, pn)/pn−1 + [sbfΓ(0, Pk*) − sbfΓ(Pk*, pn)]/pn
    = sbfΓ(Pk*, p2)/Pk* + · · · + sbfΓ(pn−1, pn)/pn−1 + [kΘ − sbfΓ(Pk*, pn)]/pn.   (4.16)

To find the minimum value of UW*, Eq. (4.16) must be minimized over the pi's. Since the supply bound function sbfΓ(t1, t2) is a discrete function, however, it is difficult to obtain the minimum value of UW* through a numerical analysis. Thus, we replace sbfΓ(t1, t2) with its linear lower-bound function lsbfΓ(t1, t2) to obtain the minimum value of UW* through the numerical analysis. Now, we rewrite Eq. (4.16) as follows:

UW* = lsbfΓ(Pk*, p2)/Pk* + · · · + lsbfΓ(pn−1, pn)/pn−1 + [kΘ − lsbfΓ(Pk*, pn)]/pn
    = UΓ(p2 − Pk*)/Pk* + · · · + UΓ(pn − pn−1)/pn−1 + [kΘ − UΓ(pn − Pk*)]/pn
    = UΓ · [p2/Pk* + · · · + pn/pn−1 + (kΠ + Pk*)/pn − n].   (4.17)
We can now find the minimum value of UW* by minimizing Eq. (4.17) over the pi's. This can be done by setting the first derivative of UW* with respect to each of the pi's equal to zero and solving the resultant difference equations:

∂UW*/∂pi = (pi² − pi−1 · pi+1)/(pi−1 · pi²) = 0,   i = 2, 3, · · · , n.   (4.18)
Here, the definition pn+1 = kΠ + Pk* has been adopted for convenience. The general solution to Eq. (4.18) can be shown to be

pi = ((k + 2)Π − 2Θ) · [((2k + 2)Π − 2Θ)/((k + 2)Π − 2Θ)]^((i−1)/n),   i = 1, 2, · · · , n.   (4.19)
It follows from Eq. (4.17) and (4.19) that

UW* = UΓ · n · [(((2k + 2)Π − 2Θ)/((k + 2)Π − 2Θ))^(1/n) − 1]
    = UΓ · n · [(((2k + 2) − 2UΓ)/((k + 2) − 2UΓ))^(1/n) − 1].

This completes the proof. □
The restriction in Theorem 4.6.6 that the largest ratio between task periods is less than 2 can actually be removed, which we state as:
Theorem 4.6.7 A scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM, is schedulable if

UW ≤ UΓ · n · [(((2k + 2) − 2UΓ)/((k + 2) − 2UΓ))^(1/n) − 1],

where k is the largest integer that satisfies (k + 1)Π − Θ ≤ Pmin, and Pmin is the smallest task period in W.
Proof. Consider a task set W = {T1, · · · , Tn}. Without loss of generality, we assume that p1 ≤ p2 ≤ · · · ≤ pn. Assume that c1, c2, · · · , cn guarantee the schedulability of W over Γ and that any increase in ci for any task Ti ∈ W will violate the schedulability of W over Γ. Let UW denote the corresponding utilization. Suppose that for some task Tk ∈ W, ⌊pn/pk⌋ > 1. To be specific, let pn = q · pk + r, with q > 1 and 0 ≤ r < pk. Let us replace the task Tk by a task Tk' such that pk' = q · pk and ck' = ck, and increase cn by the amount needed to maximize the utilization subject to the schedulability guarantee over Γ according to Theorem 4.5.3. This increase is at most ck(q − 1), the time within the critical window of Tn occupied by Tk but not by Tk'. Let UW' denote the utilization factor of such a set of tasks. We have
UW' = UW + ck(q − 1)/pn + ck/pk' − ck/pk
     = UW + ck(q − 1)/(q · pk + r) + ck/(q · pk) − q · ck/(q · pk)
     = UW + ck(q − 1) · [1/(q · pk + r) − 1/(q · pk)].

Since q − 1 > 0 and [1/(q · pk + r)] − [1/(q · pk)] ≤ 0, UW' < UW. Therefore we conclude that in determining the least upper bound of the processor utilization, we need only consider task sets in which the ratio between any two request periods is less than 2. The theorem thus follows directly from Theorem 4.6.6. □
It should be noted that the utilization bound UBΓ,RM(n, Pmin) becomes Liu and Layland's RM utilization bound [56], without regard to Pmin, if the capacity UΓ of the periodic resource is 1, i.e., the periodic resource is essentially a dedicated resource. Thus, UBΓ,RM(n, Pmin) is a generalization of Liu and Layland's result [56]. As an example, we consider a scheduling unit SU⟨W, Γ⟨Π, Θ⟩, RM⟩, where Π = 10 and Θ = 4. The resource capacity UΓ of the periodic resource Γ is 0.4. We assume that the smallest task period Pmin of a workload set W is greater than 50, i.e., Pmin ≥ 50, and that there are 2 tasks in W. Then, the RM utilization bound UBΓ,RM(2, 50) is 0.268, where k = 4. That is, if UW is no greater than 0.268, then the scheduling unit SU is schedulable. Figure 4.4 (b) shows the effect of the resource period, in terms of k, on the utilization bound as a function of resource capacity under RM scheduling. The solid line, labeled "limit", shows the limit of the utilization bound, which is obtained when k = ∞. The other curves show the utilization bound when k is given as shown in their labels. It is shown in the graph that as k increases, the utilization bound of a periodic resource converges to its limit.
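Theorem 4.6.3 can be evaluated in the same way. The sketch below is mine; as in the EDF sketch, the parameters (Π = 10, Θ = 4, Pmin = 60, n = 2) are illustrative values of my own choosing, and the helper name is hypothetical.

```python
# Sketch computing UB_{Gamma,RM}(n, Pmin) from Theorem 4.6.3: find the largest k
# with (k+1)*Pi - Theta <= Pmin, then evaluate the bound.

def ub_gamma_rm(n, Pmin, Pi, Theta):
    U = Theta / Pi
    k = int((Pmin + Theta) // Pi) - 1        # largest k with (k+1)*Pi - Theta <= Pmin
    ratio = ((2 * k + 2) - 2 * U) / ((k + 2) - 2 * U)
    return U * n * (ratio ** (1.0 / n) - 1)

if __name__ == "__main__":
    print(round(ub_gamma_rm(n=2, Pmin=60, Pi=10, Theta=4), 3))
```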
4.6.2 Bounded-Delay Resource Model
In this section, we present utilization bounds for the bounded-delay resource model Φ⟨α, Δ⟩, which have not been introduced before. We note that Mok and Feng [59] presented utilization bounds of a partitioned resource that is characterized by a tuple (α, k), where α is a capacity and k is a temporal irregularity². For instance, they provided the following theorem for an EDF utilization bound of a partitioned resource specified by (α, k).
Theorem 6 in [59] A scheduling unit SU⟨W, R, A⟩ is schedulable, where W = {Ti⟨pi, ci⟩}, A = EDF, and R is a partitioned resource with a capacity of α and a temporal irregularity of k, if

Σ_{Ti ∈ W} ci/(pi − k) ≤ α.

We note that the utilization bounds presented in [59], including the above one, are not for the bounded-delay resource model Φ⟨α, Δ⟩, since the temporal irregularity k is not equal to the partition delay bound Δ. In this section, we derive utilization bounds of a bounded-delay resource model Φ⟨α, Δ⟩ under EDF and RM scheduling. We present the following theorem to introduce a utilization bound of a bounded-delay resource model Φ⟨α, Δ⟩ for a set of periodic tasks under EDF scheduling.
² We refer interested readers to [59] for the definition of the temporal irregularity k.
Theorem 4.6.8 A scheduling unit SU⟨W, R, A⟩ is schedulable, where W = {Ti⟨pi, ci⟩}, R = Φ⟨α, Δ⟩, and A = EDF, if

UW ≤ α(1 − Δ/Pmin),   where Pmin = min_{Ti ∈ W} {pi}.

Proof. Here, we consider two cases in terms of the time interval length t: (1) 0 < t < Pmin and (2) Pmin ≤ t.
• For the first case, where 0 < t < Pmin, from the definition of dbfEDF(W, t), we can see that ∀ 0 < t < Pmin: dbfEDF(W, t) = 0. Then, it is obvious that
∀0 < t < Pmin : dbfEDF (W, t) ≤ sbfΦ (t).
(4.20)
• For the second case where Pmin ≤ t, suppose that UW is no greater than UBΦ,EDF , i.e., UW
∆ . ≤α 1− Pmin
75
Figure 4.5: Utilization bounds of a bounded-delay resource model Φ⟨α, ∆⟩, where α = 0.5, as a function of k, where k = Pmin/∆, under EDF and RM scheduling.
In this case, we then have

    dbfEDF(W, t) ≤ UW · t
                 ≤ α (1 − ∆/Pmin) · t
                 ≤ α · (t − ∆)          (since Pmin ≤ t)
                 = sbfΦ(t).

In each case, we have shown that if UW ≤ UBΦ,EDF, then the scheduling unit SU⟨W, Φ, EDF⟩ is schedulable according to Theorem 4.5.2. □
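As a hedged illustration (ours, not the dissertation's code), the EDF utilization-bound test of Theorem 4.6.8 amounts to one comparison; the task-set values below are made up for the example.

```python
# Illustrative sketch (ours): the EDF utilization-bound test of Theorem 4.6.8
# for a bounded-delay resource Phi<alpha, delta>.

def schedulable_edf_bounded_delay(tasks, alpha, delta):
    """tasks: list of (period, wcet) pairs; True if the bound of Theorem 4.6.8 holds."""
    u_w = sum(c / p for (p, c) in tasks)
    p_min = min(p for (p, _) in tasks)
    return u_w <= alpha * (1.0 - delta / p_min)

# e.g. two tasks on half of the resource with delay bound 10
print(schedulable_edf_bounded_delay([(100, 11), (150, 22)], alpha=0.5, delta=10))
```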
Figure 4.5 shows how the utilization bound of the bounded-delay resource model grows with respect to k under EDF scheduling, where k represents the relationship between the delay bound ∆ and the smallest period in the task set Pmin, k = Pmin/∆. The figure shows that, as k increases, the utilization bound converges to its limit, which is α under EDF scheduling (Theorem 4.6.8).

We present another theorem to introduce a utilization bound of a bounded-delay resource model Φ⟨α, ∆⟩ for a set of periodic tasks under RM scheduling.

Theorem 4.6.9 A scheduling unit SU⟨W, R, A⟩ is schedulable, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Φ⟨α, ∆⟩, and A = RM, if

    UW ≤ α [ n(2^{1/n} − 1) − ∆ / (2^{(n−1)/n} · Pmin) ],

where Pmin = min_{Ti∈W}{pi}.

We present the proof of Theorem 4.6.9 as follows. We first consider a task period restriction that the largest ratio between task periods is less than 2. Lemma 4.6.10 shows the properties of a workload set W when its workload utilization UW is the least upper bound on the schedulable workload utilization. Using this lemma, Lemma 4.6.11 derives a utilization bound, and Lemma 4.6.12 removes the task period restriction.

Let us first restrict our discussion to the case in which the ratio between any two task periods is less than 2. We present the following lemma to introduce a property of a workload set W, in terms of the execution time requirement ci of each task Ti, when the workload utilization UW is the least upper bound on the schedulable workload utilization.

Lemma 4.6.10 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Φ⟨α, ∆⟩, and A = RM, under the restriction that the ratio between any two task periods of W is less than 2, if the workload set W fully utilizes the resource Φ under RM scheduling with the
smallest possible workload utilization, it follows that
    Σ_{Ti∈W} ci = sbfΦ(Pmin),
where Pmin is the smallest task period of the workload set W.

Proof. For the workload set W, we assume that p1 < p2 < · · · < pn−1 < pn. We now assume that the workload set W fully utilizes the resource Φ under RM scheduling with the smallest possible workload utilization U∗W. Let c∗1, c∗2, · · · , c∗n be the execution times of the tasks T1, T2, · · · , Tn that determine U∗W. Then, we first need to show that

    c∗1 = sbfΦ(p1, p2).

Suppose that

    c∗1 = sbfΦ(p1, p2) + ε,   ε > 0.

Let

    c′1 = sbfΦ(p1, p2),  c′2 = c∗2 + ε,  c′3 = c∗3,  . . . ,  c′n = c∗n.

Given that c∗1, c∗2, · · · , c∗n guarantee the schedulability of the scheduling component C and that any increase in c∗i will make C unschedulable, it is clear that a workload set with c′1, c′2, · · · , c′n is schedulable over Φ and that any increase in c′i will violate the schedulability of the task set over Φ. Let U′W denote the corresponding utilization. We have

    U∗W − U′W = (ε/p1) − (ε/p2) > 0.
Hence, this assumption is false. Alternatively, suppose that

    c∗1 = sbfΦ(p1, p2) − ε,   ε > 0.

Let

    c″1 = sbfΦ(p1, p2),  c″2 = c∗2 − 2ε,  c″3 = c∗3,  . . . ,  c″n = c∗n.

Again, a workload set with c″1, c″2, · · · , c″n is also schedulable over Φ, and any increase in c″i will violate the schedulability of the task set. Let U″W denote the corresponding utilization. We have

    U∗W − U″W = −(ε/p1) + (2ε/p2) > 0.
Again, this assumption is also false.

Therefore, if indeed U∗W is the least upper bound of the workload utilization, then

    c∗1 = sbfΦ(p1, p2).

In a similar way, we can show that

    c∗2 = sbfΦ(p2, p3),  c∗3 = sbfΦ(p3, p4),  . . . ,  c∗n−1 = sbfΦ(pn−1, pn).

Consequently,

    c∗n = sbfΦ(0, p1) − (c∗1 + c∗2 + · · · + c∗n−1) = sbfΦ(0, p1) − sbfΦ(p1, pn).

Finally, we have

    Σ_{Ti∈W} c∗i = sbfΦ(0, p1).
□

Using the properties shown in Lemma 4.6.10, we present the following lemma to introduce a utilization bound UBΦ,RM(n, Pmin) for a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Φ⟨α, ∆⟩, and A = RM.

Lemma 4.6.11 A scheduling unit SU⟨W, R, A⟩ is schedulable, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Φ⟨α, ∆⟩, and A = RM, under the restriction that the ratio between any two task periods of W is less than 2, if

    UW ≤ α · n [ ((2 · Pmin − ∆)/Pmin)^{1/n} − 1 ],
where Pmin is the smallest task period in W.

Proof. For a workload set W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, we assume that pn > pn−1 > · · · > p2 > p1. Let c∗1, c∗2, · · · , c∗n be the execution times of the tasks T1, T2, · · · , Tn that determine the least upper bound of UW subject to the schedulability guarantee of the scheduling component C, and let U∗W denote this least schedulable utilization bound for C. To achieve U∗W, Lemma 4.6.10 shows that the execution times c∗1, c∗2, · · · , c∗n should be determined as follows:

    c∗1 = sbfΦ(p1, p2), . . . , c∗n−1 = sbfΦ(pn−1, pn), c∗n = sbfΦ(0, p1) − sbfΦ(p1, pn).

Then, we can derive U∗W as follows:

    U∗W = c∗1/p1 + · · · + c∗n−1/pn−1 + c∗n/pn
        = sbfΦ(p1, p2)/p1 + · · · + sbfΦ(pn−1, pn)/pn−1 + (sbfΦ(0, p1) − sbfΦ(p1, pn))/pn
        = α(p2 − p1)/p1 + · · · + α(pn − pn−1)/pn−1 + α(2p1 − ∆ − pn)/pn
        = α [ p2/p1 + · · · + pn/pn−1 + (2p1 − ∆)/pn − n ].    (4.21)

We can now find the minimum value of U∗W by minimizing Eq. (4.21) over the pi's. This can be done by setting the first derivative of U∗W with respect to each of the pi's equal to zero and solving the resulting difference equations:

    ∂U∗W/∂pi = (pi² − pi−1 · pi+1) / (pi−1 · pi²) = 0,   i = 2, 3, · · · , n.    (4.22)
The definition pn+1 = 2p1 − ∆ has been adopted for convenience. The general solution to Eq. (4.22) can be shown to be

    pi = p1 · ((2p1 − ∆)/p1)^{(i−1)/n},   i = 1, 2, · · · , n.    (4.23)

It follows from Eqs. (4.21) and (4.23) that

    U∗W = α · n [ ((2p1 − ∆)/p1)^{1/n} − 1 ].

This completes the proof. □
The restriction in Lemma 4.6.11 that the largest ratio between task periods is less than 2 can actually be removed, which we state as:

Lemma 4.6.12 A scheduling unit SU⟨W, R, A⟩ is schedulable, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Φ⟨α, ∆⟩, and A = RM, if

    UW ≤ α · n [ ((2 · Pmin − ∆)/Pmin)^{1/n} − 1 ],

where Pmin is the smallest task period in W.

Proof. Consider a task set W = {T1, · · · , Tn}. Without loss of generality, we assume that p1 ≤ p2 ≤ · · · ≤ pn. Assume that c1, c2, · · · , cn guarantee the schedulability of W over Φ and that any increase in ci for any task Ti ∈ W will violate the schedulability of W over Φ. Let UW denote the corresponding utilization.

Suppose that for some task Tk ∈ W, ⌊pn/pk⌋ > 1. To be specific, let pn = q · pk + r, q > 1 and 0 ≤ r < pk. Let us replace the task Tk by a task T′k such that p′k = q · pk and c′k = ck, and increase cn by the amount needed to maximize the utilization subject to the schedulability guarantee over Φ according to Theorem 4.5.4. This increase is at most ck(q − 1), the time within the critical window of Tn occupied by Tk but not by T′k. Let UW′ denote the utilization of such a task set. We have

    UW′ = UW + ck(q − 1)/pn + ck/p′k − ck/pk
        = UW + ck(q − 1)/(q · pk + r) + (ck/(q · pk) − q · ck/(q · pk))
        = UW + ck(q − 1) [ 1/(q · pk + r) − 1/(q · pk) ].

Since q − 1 > 0 and 1/(q · pk + r) − 1/(q · pk) ≤ 0, we have UW′ ≤ UW. Therefore we conclude that, in determining the least upper bound of the processor utilization, we need only consider task sets in which the ratio between any two request periods is less than 2. The lemma thus follows directly from Lemma 4.6.11. □
Figure 4.5 shows how the utilization bound of the bounded-delay resource model grows with respect to k under RM scheduling, where k represents the relationship between the delay bound ∆ and the smallest period in the task set Pmin, k = Pmin/∆. The figure shows that, as k increases, the utilization bound converges to its limit, which is α · log 2 under RM scheduling (Theorem 4.6.9).
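As a hedged illustration (ours, not the dissertation's code), the RM bound of Theorem 4.6.9 can be checked directly; the workload values below are made up for the example.

```python
# Illustrative sketch (ours): the RM utilization bound of Theorem 4.6.9 for a
# bounded-delay resource Phi<alpha, delta>.

def ub_phi_rm(n, alpha, delta, p_min):
    """alpha * [n(2^(1/n) - 1) - delta / (2^((n-1)/n) * Pmin)]."""
    return alpha * (n * (2 ** (1.0 / n) - 1.0) - delta / (2 ** ((n - 1.0) / n) * p_min))

def schedulable_rm_bounded_delay(tasks, alpha, delta):
    u_w = sum(c / p for (p, c) in tasks)
    p_min = min(p for (p, _) in tasks)
    return u_w <= ub_phi_rm(len(tasks), alpha, delta, p_min)

# As delta -> 0 (k -> infinity) the bound approaches alpha * n(2^(1/n) - 1),
# i.e., alpha * ln 2 for large n, which is the "limit" curve of Figure 4.5.
print(schedulable_rm_bounded_delay([(100, 11), (150, 22)], alpha=0.5, delta=10))
```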
4.7 Schedulable Resource Capacity Bounds

For a scheduling unit SU⟨W, R, A⟩, we propose a notion of (schedulable resource) capacity bound, CBR(W, A), defined as a number such that the scheduling unit SU⟨W, R, A⟩ is schedulable if UR ≥ CBR(W, A). Like the utilization bound, the capacity bound can be particularly useful for on-line acceptance tests. When checking whether a periodic task set is schedulable with a shared resource, computing the capacity bound takes a constant amount of time, which is much faster than performing an exact schedulability analysis. Sections 4.7.1 and 4.7.2 address the problem of deriving the capacity bounds of periodic and bounded-delay resources, respectively, for a periodic task set under EDF and RM scheduling.
4.7.1 Periodic Resource Model

Capacity Bound under EDF Scheduling

For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = EDF, we present the following theorem to introduce its capacity bound CBΓ(W, EDF), derived from Theorem 4.6.1.

Theorem 4.7.1 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩}, R = Γ⟨Π, Θ⟩, and A = EDF, its capacity bound CBΓ(W, EDF, k) as a function of k is

    CBΓ(W, EDF, k) = (k + 2) · UW / (k + 2UW),

where k represents the relationship assumption between Pmin and Γ⟨Π, Θ⟩ such that k is the largest integer that satisfies (k + 1)Π − 2Θ + ε ≤ Pmin, where ε = 2Θ/(k + 2).

Proof. Theorem 4.6.1 states that the scheduling unit SU⟨W, Γ, EDF⟩ is schedulable if UW ≤ UBΓ,EDF(Pmin), i.e.,

    UW ≤ k · UΓ / (k + 2(1 − UΓ)).

Then, it follows that

    UΓ ≥ (k + 2) · UW / (k + 2UW).
□

As an example, we consider a scheduling unit SU⟨W, Γ⟨Π, Θ⟩, EDF⟩, where W = {T1⟨33, 5⟩, T2⟨75, 7⟩, T3⟨100, 10⟩}. The workload utilization UW is 0.318. Assume that the relationship between Pmin and Γ⟨Π, Θ⟩ is given by k = 2. Then, the capacity bound CBΓ(W, EDF, 2) is 0.483. That is, the scheduling unit SU is schedulable if UΓ ≥ 0.483 and the constraint between Pmin and Γ⟨Π, Θ⟩ represented by k = 2 is satisfied.
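As a hedged illustration (ours, not the dissertation's code), the capacity bound of Theorem 4.7.1 is a single closed-form expression in the workload utilization and k.

```python
# Illustrative sketch (ours): the EDF capacity bound of Theorem 4.7.1, usable as a
# constant-time on-line acceptance test for a periodic resource Gamma<Pi, Theta>.

def cb_gamma_edf(u_w, k):
    """CB_Gamma(W, EDF, k) = (k + 2) * U_W / (k + 2 * U_W)."""
    return (k + 2) * u_w / (k + 2 * u_w)

# Example from the text: U_W = 0.318 and k = 2 give a bound of about 0.483, so any
# periodic resource with U_Gamma >= 0.483 (and the period constraint captured by
# k = 2) suffices under EDF.
print(round(cb_gamma_edf(0.318, 2), 3))   # 0.483
```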
Figure 4.6: Capacity bound as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
Figure 4.6(a) shows the effect of resource period, in terms of k, on the capacity bound as a function of workload utilization under EDF scheduling. The solid line, labeled “limit”, shows the limit of the capacity bound, which is obtained when k = ∞. The other curves show the capacity bounds when k is given as shown in their labels. It is shown that as k increases, the capacity bound of a periodic resource converges to its limit.
Capacity Bound under RM Scheduling

For a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM, we present the following theorem to introduce its capacity bound CBΓ(W, RM), derived from Theorem 4.6.3.

Theorem 4.7.2 For a scheduling unit SU⟨W, R, A⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩}, R = Γ⟨Π, Θ⟩, and A = RM, its capacity bound CBΓ(W, RM, k) as a function of k is

    CBΓ(W, RM, k) = UW / log( (2k + 2(1 − UW)) / (k + 2(1 − UW)) ),

where k represents the relationship assumption between Pmin and Γ⟨Π, Θ⟩ such that k is the largest integer that satisfies (k + 1)Π − Θ ≤ Pmin.

Proof. Theorem 4.6.3 states that the scheduling unit SU⟨W, Γ, RM⟩ is schedulable if UW ≤ UBΓ,RM(n, Pmin), i.e.,

    UW ≤ UΓ · n [ ((2k + 2(1 − UΓ)) / (k + 2(1 − UΓ)))^{1/n} − 1 ].    (4.24)

When n is large, we have

    n [ ((2k + 2(1 − UΓ)) / (k + 2(1 − UΓ)))^{1/n} − 1 ] ≃ log( (2k + 2(1 − UΓ)) / (k + 2(1 − UΓ)) ).    (4.25)

From Eqs. (4.24) and (4.25), it follows that

    UΓ ≥ UW / log( (2k + 2(1 − UΓ)) / (k + 2(1 − UΓ)) ).

Since UW ≤ UΓ, we have

    log( (2k + 2(1 − UW)) / (k + 2(1 − UW)) ) ≤ log( (2k + 2(1 − UΓ)) / (k + 2(1 − UΓ)) ).    (4.26)

From Eq. (4.26), the scheduling unit SU⟨W, Γ, RM⟩ is schedulable if

    UΓ ≥ UW / log( (2k + 2(1 − UW)) / (k + 2(1 − UW)) ).
□

As an example, we consider a scheduling unit SU⟨W, Γ⟨Π, Θ⟩, RM⟩, where W = {T1⟨33, 5⟩, T2⟨75, 7⟩, T3⟨100, 10⟩}. The workload utilization UW is 0.318. Assume that the relationship between Pmin and Γ⟨Π, Θ⟩ is given by k = 2. Then, the capacity bound CBΓ(W, RM, 2) is 0.554. That is, the scheduling unit SU is schedulable if UΓ ≥ 0.554 and the constraint between Pmin and Γ⟨Π, Θ⟩ represented by k = 2 is satisfied.

Figure 4.6(b) shows the effect of the resource period, in terms of k, on the capacity bound as a function of workload utilization under RM scheduling. The solid line, labeled "limit", shows the limit of the capacity bound, which is obtained when k = ∞. The other curves show the capacity bounds for the values of k given in their labels. It is shown that, as k increases, the capacity bound of a periodic resource converges to its limit.
4.7.2 Bounded-Delay Resource Model

Capacity Bound under EDF Scheduling

For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩} and A = EDF, we present the following theorem to introduce the capacity bound CBΦ(W, EDF) of a bounded-delay resource, derived from Theorem 4.6.8.

Theorem 4.7.3 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩} and A = EDF, the capacity bound CBΦ(W, EDF, k) of a bounded-delay resource Φ⟨α, ∆⟩ as a function of k is

    CBΦ(W, EDF, k) = k · UW / (k − 1),

where k represents the relationship assumption between Pmin and Φ⟨α, ∆⟩ such that k = Pmin/∆.

Proof. Theorem 4.6.8 states that a scheduling unit SU⟨W, Φ⟨α, ∆⟩, EDF⟩ is schedulable if UW ≤ UBΦ,EDF(Pmin), i.e.,

    UW ≤ α (1 − ∆/Pmin).

Then, it follows that

    UΦ = α ≥ k · UW / (k − 1). □
Figure 4.7(a) shows the effect of k, where k = Pmin /∆, on the capacity bound as a function of workload utilization under EDF scheduling. The solid line, labeled “limit”, shows the limit of the capacity bound, which is obtained when k = ∞. The other curves show the capacity bounds when k is given as shown in their labels. It is shown that as k increases, the capacity bound of a bounded-delay resource rapidly converges to its limit.
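As a hedged illustration (ours, not the dissertation's code), the EDF capacity bound of Theorem 4.7.3 is again constant time to evaluate; the sample numbers below are hypothetical.

```python
# Illustrative sketch (ours): the EDF capacity bound of Theorem 4.7.3 for a
# bounded-delay resource Phi<alpha, delta>, with k = Pmin / delta.

def cb_phi_edf(u_w, k):
    """CB_Phi(W, EDF, k) = k * U_W / (k - 1); valid for k > 1."""
    return k * u_w / (k - 1)

# Any bounded-delay resource whose capacity alpha meets or exceeds this value
# (with delay bound delta = Pmin / k) schedules the workload under EDF.
print(round(cb_phi_edf(0.26, 4), 3))   # hypothetical U_W = 0.26, k = 4
```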
Figure 4.7: Capacity bound as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.

Capacity Bound under RM Scheduling

For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩} and A = RM, we present the following theorem to introduce the capacity bound CBΦ(W, RM) of a bounded-delay resource, derived from Theorem 4.6.9.
Theorem 4.7.4 For a scheduling unit SU⟨W, R, A⟩, where W = {Ti⟨pi, ci⟩} and A = RM, the capacity bound CBΦ(W, RM, n, k) of a bounded-delay resource Φ⟨α, ∆⟩ as a function of k is

    CBΦ(W, RM, n, k) = k · UW / ( k · n(2^{1/n} − 1) − 2^{(1−n)/n} ),

where k = Pmin/∆.

Proof. Theorem 4.6.9 states that a scheduling unit SU⟨W, Φ⟨α, ∆⟩, RM⟩ is schedulable if UW ≤ UBΦ,RM(n, Pmin), i.e.,

    UW ≤ α [ n(2^{1/n} − 1) − ∆ / (2^{(n−1)/n} · Pmin) ].    (4.27)

It follows that the scheduling unit SU⟨W, Φ, RM⟩ is schedulable if

    UΦ ≥ k · UW / ( k · n(2^{1/n} − 1) − 2^{(1−n)/n} ).    (4.28)  □

As n approaches infinity, we have
    CBΦ(W, RM, n, k) = k · UW / (k · ln 2 − 2^{−1}).    (4.29)
Figure 4.7(b) shows the effect of k, where k = Pmin /∆, on the capacity bound as a function of workload utilization under RM scheduling. The solid line, labeled “limit”, shows the limit of the capacity bound, which is obtained when k = ∞. The other curves show the capacity bounds when k is given as shown in their labels. It is shown that as k increases, the capacity bound of a bounded-delay resource rapidly converges to its limit.
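As a hedged illustration (ours, not the dissertation's code), the RM capacity bound of Theorem 4.7.4 and its large-n limit, Eq. (4.29), can be compared directly; the sample values are hypothetical.

```python
import math

# Illustrative sketch (ours): the RM capacity bound of Theorem 4.7.4 and its
# large-n limit, Eq. (4.29), for a bounded-delay resource with k = Pmin / delta.

def cb_phi_rm(u_w, n, k):
    """CB_Phi(W, RM, n, k) = k * U_W / (k * n * (2^(1/n) - 1) - 2^((1 - n)/n))."""
    return k * u_w / (k * n * (2 ** (1.0 / n) - 1.0) - 2 ** ((1.0 - n) / n))

def cb_phi_rm_limit(u_w, k):
    """Large-n limit of Eq. (4.29): k * U_W / (k * ln 2 - 1/2)."""
    return k * u_w / (k * math.log(2) - 0.5)

# As k grows, the two values approach each other, matching Figure 4.7(b).
print(round(cb_phi_rm(0.3, 8, 16), 3), round(cb_phi_rm_limit(0.3, 16), 3))
```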
4.8 Summary

In this chapter, we proposed a real-time scheduling framework that considers the characteristics of various shared resources. Specifically, we developed exact schedulability conditions, utilization bounds, and capacity bounds for scheduling units that consist of a periodic task set under EDF or RM scheduling and a periodic or bounded-delay resource.

Component-based design has been widely accepted as a methodology for designing large complex systems through systematic abstraction and composition. For real-time systems to benefit from component-based design, it is crucial to develop techniques for analyzing the schedulability of real-time systems in a compositional way. In the next chapter, we propose a compositional real-time scheduling framework that develops techniques for deriving real-time component interfaces, which is key to achieving component-based real-time system design. The results presented in this chapter will serve as the basis for developing the techniques of the compositional real-time scheduling framework in the next chapter.
Chapter 5

Real-Time Component Interfaces
5.1 Introduction

A hierarchical scheduling framework has been proposed for supporting hierarchical resource sharing among applications under different scheduling algorithms. A hierarchical scheduling framework can generally be represented as a tree, or a hierarchy, of nodes, in which each node represents an application (a component) with its own scheduling algorithm to schedule internal workloads (threads); resources are allocated from a parent node to its children nodes, as illustrated in Figure 5.1.

The concept of a hierarchical scheduling framework is particularly useful in the domain of open systems [16], where applications may be developed and validated independently, even in different environments. The hierarchical scheduling framework allows applications to be developed independently with their own internal scheduling algorithms and then transported to systems that may have different OS scheduling algorithms for application scheduling. The hierarchical scheduling framework can also support partitioning of applications such that the applications are separated, usually for fault containment, which prevents any partitioned application from causing a failure of another partitioned application, and for ease of verification, validation, and certification.

In real-time systems research, there has been growing attention to hierarchical scheduling frameworks [16, 43, 54, 22, 67, 70, 55, 75, 76, 4]. One of the important problems for a hierarchical scheduling framework is analyzing the schedulability of the framework, and many approaches have been proposed for this problem. In this chapter, we propose a new scheduling framework, called the compositional scheduling framework, that can address the schedulability analysis of a hierarchical scheduling framework in a "compositional" manner. That is, our goal is to develop a framework in which system-level schedulability analysis can be achieved by composing component-level schedulability analysis results.

In our proposed compositional scheduling framework, as in a hierarchical scheduling framework, a parent component provides resource allocations to its child components. In the proposed framework, each child component C develops a "real-time" component interface I that specifies its collective real-time resource requirements and then exports the interface I to its parent component. As long as the parent component provides resource allocations to the child component C that satisfy the real-time resource requirements imposed by the interface I, the parent component is able to guarantee the schedulability of the child component C. This scheme makes it possible for a parent component to supply resources to its child components without controlling (or even understanding) how
the child components schedule resources for their own internal workloads. This scheme also makes it possible for the timing requirements of a hierarchy of components to be established through the real-time component interfaces of the components, without knowing the timing requirements of individual workloads within the components.

Our contributions in this proposed framework are as follows:

• Real-time component interface. We propose a notion of real-time component interface, which summarizes the temporal properties of a component while hiding its internal complexity.

• Component timing abstraction. We address the problem of finding the minimum resource requirements necessary to guarantee the schedulability of a component. The result becomes the real-time component interface of the component.

• Abstraction overhead evaluation. We evaluate the overheads that real-time component interfaces incur, in terms of the increase in resource capacity requirements, in solving the component timing abstraction problem. We derive analytical bounds on the overheads and evaluate the overheads through simulations.

In the proposed framework, it is an important issue to define a real-time component interface model. One of our approaches is to use a standard real-time workload model, i.e., the Liu and Layland periodic task model [56], as the real-time component interface model. That is, this approach abstracts a component with collective real-time resource requirements as a single standard workload, i.e., a single task with a periodic resource requirement. In addition to the standard periodic interface model, in this chapter, we consider another
Figure 5.1: Hierarchical scheduling framework.
interface model, which is identical to the bounded-delay resource model [60], to show that the proposed framework can be developed without regard to a particular real-time component interface model. With these two interface models, we address the component abstraction problem for components that consist of a standard periodic workload set and the EDF or RM scheduling algorithm.

The remainder of the work is structured as follows: Section 5.2 presents related work. Section 5.3 describes the system model and the problem statement. Section 5.4 addresses the component timing abstraction problem with the two interface models, Section 5.5 presents analytical bounds on the overhead incurred in addressing the component timing abstraction problem, and Section 5.6 evaluates the overhead through simulations. Section 5.7 summarizes this chapter.
5.2 Related Work

In real-time systems research, there has been growing attention to hierarchical scheduling frameworks [16, 43, 54, 22, 67, 70, 55, 75, 76, 4]. Many approaches have been introduced to address the schedulability analysis problem in a hierarchical scheduling framework.

Deng and Liu [16] first proposed a two-level hierarchical scheduling framework for open systems. In their framework, each application can use any scheduler for its internal tasks; the system, however, is constrained to a single scheduling algorithm for application scheduling. For such a framework, Kuo and Li [43] considered the RM system scheduler and presented an exact schedulability condition under the assumption that all periodic tasks across components are harmonic. Lipari and Baruah [54] considered the EDF system scheduler and presented an exact schedulability condition under the assumption that the system scheduler has knowledge of the deadlines of all tasks in each application. In open systems where applications are developed independently, however, it is desirable to remove such assumptions in order to achieve a clear separation between the system and its applications.

Mok et al. [60] proposed the bounded-delay resource partition model for a hierarchical scheduling framework. Their model can be used to specify the real-time guarantees that a parent component provides to its child components, without regard to the scheduler of the parent component. They showed that the schedulability of a hierarchical scheduling framework can be analyzed with their model, while a parent component and its child components can have any schedulers. They provided sufficient schedulability conditions but not necessary conditions.

Three recent studies [70, 55, 4] considered a hierarchical scheduling framework in which a parent component provides periodic resource allocations to its child components, with the child components using the RM scheduler. They all use the Liu and Layland periodic model [56] to specify the characteristics of the periodic resource allocations provided to a child component. Saewong et al. [70] introduced an exact schedulability condition for a child component based on worst-case response time analysis and provided a utilization bound. Lipari and Bini [55] presented an exact schedulability condition based on time demand calculations while addressing the component timing abstraction problem. Almeida and Pedreiras [4] considered the issue of efficiently solving the component timing abstraction problem. (Saewong et al. [70] and Lipari and Bini [55] presented their schedulability conditions as sufficient conditions; however, we consider them exact conditions based on a different notion of schedulability.)

Regehr and Stankovic [67] introduced another hierarchical scheduling framework that considers various kinds of real-time guarantees. Their work focused on converting one kind of guarantee into another such that whenever the former is satisfied, the latter is also satisfied. With their conversion rules, the schedulability of a child component is sufficiently analyzed: it is schedulable if its parent component provides real-time guarantees that can be converted to the real-time guarantee that the child component demands. However, they did not consider the component timing abstraction and composition problems.
5.3 System Model and Problem Statement

In this chapter, we consider a component model C⟨W, A⟩, where W is a workload set and A is a scheduling algorithm. As the workload model, we consider the Liu and Layland periodic task model T⟨p, c⟩, where p is a period and c is the worst-case execution time requirement. As scheduling algorithms, we consider the EDF and RM algorithms.

Given a component C⟨W, A⟩ and a real-time component interface I, the interface I is said to abstract (the collective real-time requirements of) the component C, denoted as I |= C, if a scheduling unit SU⟨W, R, A⟩ is schedulable, where R = I. When an interface I is said to be the interface of a component C, it means that I |= C. In this chapter, we consider two interface models: the periodic model and the bounded-delay model.

In this chapter, we address the following problems:

• Component timing abstraction. We define a problem, called the component timing abstraction problem, as abstracting the collective real-time requirements of a component as a single real-time requirement, called the real-time component interface, without revealing the internal structure of the component, e.g., the number of tasks or its scheduling algorithm. We formulate the problem as follows: given a component C⟨W, A⟩, find an "optimal" real-time interface I such that the interface I abstracts the component C. The optimality of the real-time interface can be determined with respect to various criteria, such as minimizing the resource capacity requirement imposed by the interface or minimizing context switch overheads. In this chapter, we consider only the resource capacity requirements of real-time component interfaces as the criterion of optimality.

• Component abstraction overhead. Given a component C⟨W, A⟩ and its component interface I, we define the component abstraction overhead OC,I of a real-time component interface I for the component C as

    OC,I = (UI − UW) / UW,    (5.1)

where UI is the resource utilization of I. For the periodic and bounded-delay component interfaces, we derive analytical bounds on their component abstraction overheads and evaluate the overheads through simulation.
5.4 Real-time Component Interfaces

In this section, we address the component timing abstraction problem with two real-time component interface models: the periodic and bounded-delay interface models. We first define the component timing abstraction problem for a real-time component interface model I as follows:

Given a component C⟨W, A⟩, where W = {T1⟨p1, e1⟩, . . . , Tn⟨pn, en⟩}, the problem is to find an interface I such that the interface I abstracts the component C, i.e., I |= C.
Figure 5.2: Schedulable region of a periodic resource Γ⟨Π, Θ⟩: (a) under EDF scheduling and (b) under RM scheduling.
Sections 5.4.1 and 5.4.2 show how to address this problem with the two real-time interface models.
5.4.1 Periodic Interface Model

In this section, we consider the standard periodic task model as a real-time component interface model. That is, we consider the periodic interface model PI⟨P, C⟩, where P is an interface period and C is an execution time requirement. The resource capacity requirement UPI of a periodic interface PI⟨P, C⟩ is defined as C/P.

We are now interested in finding a periodic interface solution PI⟨P, C⟩ to the component timing abstraction problem. By definition, finding a periodic interface PI that abstracts a component C⟨W, A⟩ is equivalent to finding a periodic resource Γ⟨Π, Θ⟩ that makes the scheduling unit SU⟨W, Γ, A⟩ schedulable. We explain this with examples. As an example, let us consider a component C⟨W, A⟩, where W = {T1⟨50, 7⟩, T2⟨75, 9⟩}. The workload utilization UW is 0.26. We consider this example in two cases: A = EDF and A = RM.

• (1) When A = EDF, we can obtain a solution space of PI⟨P, C⟩ to the component timing abstraction problem by finding periodic resources Γ⟨Π, Θ⟩ that make the scheduling unit SU⟨W, Γ, EDF⟩ schedulable according to Theorem 4.5.1. Figure 5.2(a) shows such a solution space as the gray area for each integer interface period P = 1, 2, . . . , 75. For instance, periodic interface PI⟨P, C⟩ abstracts the example component C when P = 10 and 2.8 ≤ C ≤ 10.

• (2) When A = RM, we can obtain a solution space of PI⟨P, C⟩ to the component timing abstraction problem by finding periodic resources Γ⟨Π, Θ⟩ that make the scheduling unit SU⟨W, Γ, RM⟩ schedulable according to Theorem 4.5.3. Figure 5.2(b) shows such a solution space as the gray area for each integer interface period P = 1, 2, . . . , 75. For instance, periodic interface PI⟨P, C⟩ abstracts the example component C when P = 10 and 3.5 ≤ C ≤ 10.

As shown in the solution spaces of Figure 5.2, the resource capacity requirements of periodic interfaces vary as their interface periods vary, even for the same component. For periodic interface solutions to the component timing abstraction problem, we therefore consider it reasonable to define their optimality as follows:

Given a component C⟨W, A⟩, where W = {T1⟨p1, e1⟩, . . . , Tn⟨pn, en⟩}, and a constant P∗ for the interface period, a periodic interface PI⟨P, C⟩ is said to be the optimal periodic interface of the component C if P = P∗, PI |= C, and UPI is minimized.
Figure 5.3: Example of the solution space of a bounded-delay scheduling interface model Φ⟨α, ∆⟩ for a workload set W = {T1⟨100, 11⟩, T2⟨150, 22⟩} under EDF and RM scheduling.
Then, we define the periodic component abstraction method, denoted PCA(P, ⟨W, A⟩), that takes as input P and ⟨W, A⟩ and returns the execution time requirement C such that PI⟨P, C⟩ |= C⟨W, A⟩ and UPI is minimized. In the above example, when the interface period P is fixed to 10, PCA(10, ⟨W, EDF⟩) returns 2.8. That is, PI⟨10, 2.8⟩ is the optimal periodic interface of the example component, where A = EDF.
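A minimal sketch (ours, not the dissertation's implementation) of how PCA can be realized: for a fixed interface period P, search for the smallest budget C for which the exact schedulability condition of Chapter 4 holds (Theorem 4.5.1 under EDF, Theorem 4.5.3 under RM); that test is left abstract below, and the search granularity is our own choice.

```python
# Minimal sketch (ours) of the periodic component abstraction method PCA(P, <W, A>).
# The exact schedulability test of Theorem 4.5.1 / 4.5.3 is abstracted as a predicate.

def pca(period, workload, is_schedulable, step=0.1):
    """Return the smallest budget C (at granularity `step`) such that the periodic
    resource Gamma<period, C> makes the workload schedulable, or None."""
    c = step
    while c <= period:
        if is_schedulable(workload, period, c):
            return round(c, 10)
        c += step
    return None

# Usage: pca(10, tasks, is_schedulable_edf) would return roughly 2.8 for the
# example component above, assuming is_schedulable_edf implements Theorem 4.5.1.
```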
5.4.2 Bounded-delay Interface Model

We consider another real-time component interface model, the bounded-delay interface model BI⟨A, D⟩, where A is a resource capacity (rate) requirement and D is a bounded delay. The resource capacity requirement UBI of a bounded-delay interface BI⟨A, D⟩ is simply A.

We now consider finding a bounded-delay interface solution BI⟨A, D⟩ to the component timing abstraction problem. By definition, finding a bounded-delay interface BI that abstracts a component C⟨W, A⟩ is equivalent to finding a bounded-delay resource Φ⟨α, ∆⟩ that makes the scheduling unit SU⟨W, Φ, A⟩ schedulable. We explain this with examples. As an example, let us consider a workload set W = {T1⟨100, 11⟩, T2⟨150, 22⟩}. The workload utilization UW is 0.26. We consider this example in two cases: A = EDF and A = RM.

• (1) When A = EDF, we can obtain a solution space of BI⟨A, D⟩ to the component timing abstraction problem by finding bounded-delay resources Φ⟨α, ∆⟩ that make the scheduling unit SU⟨W, Φ, EDF⟩ schedulable according to Theorem 4.5.2. A solution space of Φ⟨α, ∆⟩ is shown as the gray area in Figure 5.3(a) for each integer bounded delay D = 1, 2, . . . , 100. For instance, bounded-delay interface BI⟨A, D⟩ abstracts the example component C when D = 60 and 0.36 ≤ A ≤ 1.

• (2) When A = RM, we can obtain a solution space of BI⟨A, D⟩ to the component timing abstraction problem by finding bounded-delay resources Φ⟨α, ∆⟩ that make the scheduling unit SU⟨W, Φ, RM⟩ schedulable according to Theorem 4.5.4. A solution space of Φ⟨α, ∆⟩ is shown as the gray area in Figure 5.3(b) for each integer bounded delay D = 1, 2, . . . , 100. For instance, bounded-delay interface BI⟨A, D⟩ abstracts the example component C when D = 60 and 0.50 ≤ A ≤ 1.

As shown in the solution spaces of Figure 5.3, the resource capacity requirements of bounded-delay interfaces vary as their bounded delays vary, even for the same component. For bounded-delay interface solutions to the component timing abstraction problem, we therefore consider it reasonable to define their optimality as follows:

Given a component C⟨W, A⟩, where W = {T1⟨p1, e1⟩, . . . , Tn⟨pn, en⟩}, and a constant D∗ for the bounded delay, a bounded-delay interface BI⟨A, D⟩ is said to be the optimal bounded-delay interface of the component C if D = D∗, BI |= C, and UBI is minimized.

In the above example, when the bounded delay D is fixed to 60, BI⟨0.36, 60⟩ is the optimal bounded-delay interface of the example component C, where A = EDF.
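Analogously to PCA, a minimal sketch (ours) of bounded-delay abstraction: for a fixed delay bound D, search for the smallest capacity A for which the exact condition holds (Theorem 4.5.2 under EDF, Theorem 4.5.4 under RM); the test and the function name are our own placeholders.

```python
# Minimal sketch (ours): bounded-delay component abstraction for a fixed delay D.
def bda(delay, workload, is_schedulable, step=0.01):
    """Smallest capacity A (at granularity `step`) with Phi<A, delay> abstracting the workload."""
    a = step
    while a <= 1.0:
        if is_schedulable(workload, a, delay):
            return round(a, 10)
        a += step
    return None
```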
5.5 Abstraction Overheads: Analytical Results

Given a component C⟨W, A⟩ and a real-time component interface I that abstracts C, we define their component abstraction overhead OC,I as follows:

    OC,I = (UI − UW) / UW,    (5.2)

where UI is the resource capacity requirement of the real-time component interface I. In this section, we derive analytical bounds on the component abstraction overheads of periodic and bounded-delay interfaces.
5.5.1 Periodic Interface Model

For a component C⟨W, A⟩ and its optimal periodic interface PI, let O∗C,PI denote an upper bound on their component abstraction overhead, i.e., OC,PI ≤ O∗C,PI.

Theorem 5.5.1 Given a component C⟨W, A⟩ and its optimal periodic interface PI⟨P, C⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩} and A = EDF, an upper bound on their component abstraction overhead can be represented as a function of k as follows:

    O∗C,PI(k) ≤ 2(1 − UW) / (k + 2UW),    (5.3)

where k is the largest integer that satisfies (k + 1)P − 2C + ε ≤ Pmin, where ε = 2C/(k + 2).

Proof. According to Theorem 4.7.1, for any periodic resource Γ∗⟨Π∗, Θ∗⟩, the scheduling unit SU⟨W, Γ∗, A⟩ is schedulable if

    UΓ∗ ≥ (k + 2) · UW / (k + 2UW).

Then, the periodic interface PI∗⟨P∗, C∗⟩ is guaranteed to abstract the component C, i.e., PI∗ |= C, when P∗ = Π∗ and C∗ = Θ∗. The component abstraction overhead OC,PI∗ of PI∗ is then given by

    OC,PI∗ = UPI∗ / UW − 1
           ≥ (k + 2) · UW / ((k + 2UW) · UW) − 1
           = 2(1 − UW) / (k + 2UW).

When we have a periodic interface PI∗ such that

    UPI∗ = (k + 2) · UW / (k + 2UW),
Figure 5.4: Analytical bound of the component abstraction overhead of the periodic interface as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
it is guaranteed that PI∗ |= C, and we obtain a component abstraction overhead as follows:

    OC,PI∗(k) = 2(1 − UW) / (k + 2UW).    (5.4)

The component abstraction overhead in Eq. (5.4) is an upper bound on the component abstraction overhead OC,PI of the optimal periodic interface PI. □
Theorem 5.5.2 Given a component C⟨W, A⟩ and its optimal periodic interface PI⟨P, C⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩} and A = RM, an upper bound on their component abstraction overhead can be represented as a function of k as follows:

    O∗C,PI(k) = 1 / log( (2k + 2(1 − UW)) / (k + 2(1 − UW)) ) − 1,    (5.5)
where k is the largest integer that satisfies (k + 1)P − C ≤ Pmin.

Proof. The argument parallels that of Theorem 5.5.1, with the RM capacity bound of Theorem 4.7.2 in place of the EDF capacity bound of Theorem 4.7.1: choosing UPI∗ = CBΓ(W, RM, k) gives OC,PI∗ = UPI∗/UW − 1, which equals the right-hand side of Eq. (5.5). □
Figure 5.4 shows the effect of the interface period, in terms of k, on the bound of the component abstraction overhead as a function of workload utilization under EDF and RM scheduling. The solid line, labeled "limit", shows the limit of the bound, which is obtained when k = ∞. The limit is 0 under EDF scheduling and (1/log 2) − 1, which is approximately 0.443, under RM scheduling. The other curves show the bounds for the values of k given in their labels. It is shown that, as k increases, the analytical bound converges to its limit.
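As a hedged illustration (ours, not the dissertation's code), the analytical overhead bounds of Eqs. (5.3) and (5.5) can be evaluated directly; the sample arguments are hypothetical.

```python
import math

# Illustrative sketch (ours): the analytical overhead bounds of Theorems 5.5.1
# and 5.5.2 for the periodic interface model, plotted in Figure 5.4.

def overhead_bound_edf(u_w, k):
    """Eq. (5.3): 2(1 - U_W) / (k + 2 U_W); tends to 0 as k grows."""
    return 2 * (1 - u_w) / (k + 2 * u_w)

def overhead_bound_rm(u_w, k):
    """Eq. (5.5): 1 / ln((2k + 2(1 - U_W)) / (k + 2(1 - U_W))) - 1;
    tends to 1/ln 2 - 1 (about 0.443) as k grows."""
    ratio = (2 * k + 2 * (1 - u_w)) / (k + 2 * (1 - u_w))
    return 1 / math.log(ratio) - 1

print(round(overhead_bound_edf(0.4, 8), 3), round(overhead_bound_rm(0.4, 8), 3))
```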
5.5.2 Bounded-Delay Interface Model

For a component C⟨W, A⟩ and a bounded-delay component interface BI, let O∗C,BI denote an upper bound on their component abstraction overhead OC,BI such that if OC,BI ≤ O∗C,BI, then BI abstracts C, i.e., BI |= C.

Theorem 5.5.3 Given a component C⟨W, A⟩ and its optimal bounded-delay interface BI⟨A, D⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩} and A = EDF, an upper bound on their component abstraction overhead can be represented as a function of k as follows:

    O∗C,BI(k) = 1 / (k − 1),    (5.6)

where k = Pmin/D.

Theorem 5.5.4 Given a component C⟨W, A⟩ and its optimal bounded-delay interface BI⟨A, D⟩, where W = {T1⟨p1, c1⟩, . . . , Tn⟨pn, cn⟩} and A = RM, an upper bound on their component abstraction overhead can be represented as a function of k as follows:

    O∗C,BI(k) = k / ( k · n(2^{1/n} − 1) − 2^{(1−n)/n} ) − 1,    (5.7)

where k = Pmin/D.

Figure 5.5: Component abstraction overheads of periodic interfaces as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.
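As a hedged illustration (ours, not the dissertation's code), the bounds of Eqs. (5.6) and (5.7) are again closed-form; the sample arguments are hypothetical.

```python
# Illustrative sketch (ours): the analytical overhead bounds of Theorems 5.5.3
# and 5.5.4 for the bounded-delay interface model, with k = Pmin / D.

def bd_overhead_bound_edf(k):
    """Eq. (5.6): 1 / (k - 1), valid for k > 1."""
    return 1.0 / (k - 1)

def bd_overhead_bound_rm(n, k):
    """Eq. (5.7): k / (k * n(2^(1/n) - 1) - 2^((1-n)/n)) - 1."""
    return k / (k * n * (2 ** (1.0 / n) - 1.0) - 2 ** ((1.0 - n) / n)) - 1.0

print(round(bd_overhead_bound_edf(4), 3), round(bd_overhead_bound_rm(8, 4), 3))
```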
5.6 Abstraction Overheads: Simulation Results

In this section, we evaluate the component abstraction overheads of periodic and bounded-delay interfaces through simulations.
5.6.1 Periodic Interface Model

For our simulation study, we used the following parameters:

• The number of tasks (n) in the workload set W is 2, 4, 8, 16, 32, or 64.

• The workload utilization (UW) of the workload set W is in the interval [0.1, 0.7].
Figure 5.6: Component abstraction overheads of periodic interfaces as a function of interface period: (a) under EDF scheduling and (b) under RM scheduling.
• Each periodic task T⟨p, e⟩ has a period p randomly generated in the range [5, 100] and an execution time requirement e generated in the range [1, 40].

• The scheduling algorithm (A) is EDF or RM.

• The interface period (P) of the periodic timing interface PI⟨P, C⟩ is determined such that k is 1, 2, 4, 8, 16, 32, or 64, where k is the largest integer that satisfies (k + 1)P − 2C + ε ≤ Pmin, where ε = 2C/(k + 2), under EDF scheduling, or (k + 1)P − C ≤ Pmin under RM scheduling, while Pmin is the smallest task period in the workload set W.

Each point shown in Figures 5.5, 5.6, and 5.7 represents the mean of 1000 simulation results unless specified otherwise. The 99% confidence intervals for the data are within 1–4% of the means shown in the graphs.

In Figures 5.5, 5.6, and 5.7, component abstraction overheads are plotted under EDF and RM scheduling as a function of workload utilization, interface period, and the number of tasks, respectively. In the graphs, the solid curve, labeled "analytical bound", shows the maximum analytical bound on the component timing abstraction overhead, which is
Figure 5.7: Component abstraction overheads of periodic interfaces as a function of the number of tasks: (a) under EDF scheduling and (b) under RM scheduling.
obtained through Theorem 4.7.1 or 4.7.2, depending on the scheduling algorithm. The dotted curve, labeled "simulation result", shows the mean component abstraction overheads obtained through simulations.

Figure 5.5 shows the effect of workload utilization on the component abstraction overheads under EDF and RM scheduling, where k = 2 and n = 8. It is shown that the component abstraction overheads generally decrease as the workload utilization increases. It is also shown that the component abstraction overheads are clearly lower under EDF scheduling than under RM scheduling, which is consistently shown in Figures 5.6 and 5.7.

Figure 5.6 shows the effect of the interface period, in terms of k, on the component abstraction overheads under EDF and RM scheduling, where UW = 0.4 and n = 8. It is shown that the component abstraction overheads decrease as the interface period increases.

Figure 5.7 shows the effect of the number of tasks under EDF and RM scheduling, where UW = 0.4 and k = 2. Under EDF scheduling, we performed 500 simulation runs when n = 32 and 250 runs when n = 64. It is shown that the component timing abstraction overheads decrease as the number of tasks increases under EDF scheduling.
Figure 5.8: Component abstraction overheads of bounded-delay interfaces as a function of k, where k = Pmin/D: (a) under EDF scheduling and (b) under RM scheduling.
Under RM scheduling, however, the component timing abstraction overheads are relatively independent of the number of tasks. The implications of our simulation results can be summarized as follows: the component abstraction overhead of a periodic interface (1) is significantly affected by the interface period, but not much by the workload utilization, and (2) is scalable in terms of the number of tasks.
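For reference, a minimal sketch (ours, not the dissertation's simulation code) of the random workload generation described by the parameter lists above; the scaling step used to hit a target utilization is our own assumption, since the text does not specify how target utilizations were enforced.

```python
import random

# Minimal sketch (ours): random workload generation for the simulations above,
# with periods in [5, 100] and execution times in [1, 40], then scaled to a
# target utilization (the scaling is our assumption).

def random_workload(n, target_utilization):
    tasks = [(random.randint(5, 100), random.randint(1, 40)) for _ in range(n)]
    u = sum(c / p for (p, c) in tasks)
    scale = target_utilization / u
    return [(p, c * scale) for (p, c) in tasks]

workload = random_workload(8, 0.4)
print(round(sum(c / p for (p, c) in workload), 3))   # 0.4
```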
5.6.2 Bounded-Delay Interface Model

In this section, we evaluate the component abstraction overhead of the bounded-delay component interface BI⟨A, D⟩ through simulations; the analytical bounds on this overhead were derived in Section 5.5.2. For the simulation runs, we used the following settings:

• Workload size (|W|): The number of tasks in the workload W is 2, 4, 8, 16, 32, 64, or 128.

• Workload utilization (UW): The utilization of the workload W is 0.1, 0.2, . . . , 0.7.

• Task model T⟨p, e⟩: Each task T has a period p randomly generated in the range [5, 100] and an execution time e generated in the range [1, 40].

• Scheduling algorithm (A): A is EDF or RM.

• Delay bound (D): The delay bound D is determined such that k = 2, 4, 8, 16, 32, or 64, where k = Pmin/D and Pmin is the smallest task period of the workload set W.

Figure 5.9: Component abstraction overheads of bounded-delay interfaces as a function of workload utilization: (a) under EDF scheduling and (b) under RM scheduling.

Each point shown in Figures 5.8, 5.9, and 5.10 represents the mean of 500 simulation results unless specified otherwise. The 95% confidence intervals for the data are within 1–5% of the means shown in the graphs.

In Figures 5.8, 5.9, and 5.10, component abstraction overheads are plotted under EDF and RM scheduling as a function of k (where k = Pmin/D), workload utilization, and the number of tasks, respectively. In the graphs, the solid curve, labeled "analytical bound", shows
Figure 5.10: Component abstraction overheads of bounded-delay component interfaces as a function of the number of tasks: (a) under EDF scheduling and (b) under RM scheduling.
the maximum analytical bound on the component timing abstraction overhead, which is obtained through Theorem 5.5.3 or 5.5.4, depending on the scheduling algorithm. The dotted curve, labeled "simulation result", shows the mean component abstraction overheads obtained through simulations.

Figure 5.8 plots the component abstraction overheads as a function of k, which represents the relationship between D and Pmin, i.e., k = Pmin/D, under EDF and RM scheduling, where n = 8 and UW = 0.4. We can see that the component abstraction overheads depend significantly on k.

Figure 5.9 plots the component abstraction overheads as a function of workload utilization under EDF and RM scheduling, where k = 4 and n = 8. The figure shows that the abstraction overhead is lower under EDF scheduling than under RM scheduling. We can also see that the workload utilization is not a significant factor affecting the abstraction overheads.

We also evaluate the component abstraction overheads with respect to the number of tasks n. Figure 5.10 shows the component abstraction overheads as a function of the number of tasks under EDF and RM scheduling, where k = 4 and UW = 0.4. As stated earlier, each point in the graph is the result of 500 simulation runs under EDF scheduling. We can see that the component abstraction overheads do not increase as the number of tasks increases, but rather begin to decrease at some point. Under RM scheduling, even though our analytical bound implies that the abstraction overhead of the bounded-delay interface model would increase with the number of workloads, our simulation results show that its average abstraction overhead instead decreases as the number of workloads increases.

The implications of our simulation results are that, for the bounded-delay interface model, the bounded delay is the most critical factor for the component abstraction overhead, while the workload utilization has a relatively small impact on the overhead. In addition, the bounded-delay interface model is scalable in terms of the number of tasks.
5.7 Summary

In this chapter, we proposed a compositional real-time scheduling framework in which (1) the timing properties of individual components can be analyzed independently, (2) real-time component interfaces can specify the timing properties of components, and (3) system-level timing properties can be established by composing the real-time interfaces of components. We developed such a framework using two real-time component interface models, the periodic and bounded-delay interface models. With these two interface models, we addressed the component timing abstraction problem for components that consist of a periodic workload set and the EDF or RM scheduling algorithm. In addition, we evaluated the abstraction overheads that these two real-time interfaces incur in addressing the component timing abstraction problem. We derived analytical bounds on the overheads and presented average values obtained through simulations.

We consider that the techniques for compositional schedulability analysis developed in this framework can be useful in many other settings. For example, such techniques can be effective in addressing the compositional schedulability analysis of distributed real-time systems, where resources can be shared hierarchically with dependencies. We will show the application of such techniques to the schedulability analysis of distributed real-time systems in Chapter 6. Furthermore, while real-time embedded systems are often subject to resource constraints, many resource reduction techniques generate close relationships between resource and timing properties. Our proposed techniques can be effective in developing component interfaces that specify such relationships for individual components. We will explain this in Chapter 7.
Chapter 6

Compositional Scheduling Framework for Distributed Real-time Systems¹

6.1 Introduction

A distributed real-time system consists of a set of computation nodes connected by a shared communication medium. Such distributed real-time systems offer many advantages over a centralized real-time system, such as dependability and scalability [41]. Implementing a dependable real-time service requires distribution of functions to achieve effective fault containment and fault tolerance so that the service can continue despite the occurrence of faults. As requirements on processing power grow, new nodes can be added incrementally.

¹ This chapter is joint work with Arvind Easwaran, Sebastian Fischmeister, and Oleg Sokolsky.
Figure 6.1: Overview of our compositional distributed real-time scheduling framework.
Distributed real-time systems often have multiple hierarchical resources with dependencies. For example, the processor and the network can have a sequence dependency. In clustered real-time sensor networks, a typical role of a cluster-head node is to gather raw data from its cluster-member nodes, perform computation on the gathered raw data, and send a computation result to a commander node prior to a deadline. In this case, the cluster-head node requires the processor and the network in sequence: it requires the processor to complete its computation and then subsequently requires the network to transmit the computation result. The network is a typical hierarchical resource, i.e., the network is shared by nodes, and a share of the network is subsequently shared by applications within a node. It is challenging to analyze the schedulability of distributed real-time systems where resources can be shared hierarchically under dependencies.

In this chapter, we propose a compositional scheduling framework for distributed real-time systems, which supports compositional analysis for hierarchical resource scheduling with dependencies. As shown in Figure 6.1, we consider the scheduling of three resources in our framework. Within each node, task scheduling determines which application is assigned the processor to complete its computation, and message scheduling determines
which application transmits its message when its node is permitted to access the network, under the sequence dependency that applications transmit messages only after completing computation. Across nodes, node scheduling determines which node is assigned the network. Our goal is to develop techniques for compositionally analyzing the schedulability of a system consisting of these three resources. That is, we aim at abstracting the sequence dependency and the network demand of each node in order to be compositional.

A body of research has been devoted to real-time scheduling for the processor and the network. In [13] and [33], the authors provide overviews of real-time processor scheduling and real-time communication protocols, respectively. However, there has been little work on compositional schedulability analysis for processor and network scheduling in the presence of a sequence dependency between the two resources. There have been a few approaches that support distributed real-time systems. One approach is the time-triggered architecture (TTA) [41], with the TTOS for processor scheduling and the TTP for network scheduling. Two other recent approaches are Giotto [35] and TDL [21], with machines for computation and scheduling, and TDMA for communication. However, none of these approaches provides compositional analysis for combined processor and network scheduling.

This work contributes in two ways: (1) it provides schedulability conditions for the analysis of multiple, dependent resources within a node under EDF and RM scheduling, and (2) it provides techniques for abstracting the network demand of a node, which guarantees the network schedulability of the node while hiding its internal resource dependency.
The remainder of the work is structured as follows: Section 6.2 describes the system model and the problem statement. Section 6.3 provides schedulability conditions for processor and network scheduling under EDF and RM scheduling within a node. Section 6.4 describes the abstraction mechanism for the network demand of a node and provides schedulability conditions for the network. Section 6.5 presents related work, and Section 6.6 summarizes this chapter.
6.2 System Model and Problem Statement

In this section, we define the distributed real-time system architecture and the associated task model that we consider for schedulability analysis. We also give a problem statement identifying the different problems that this chapter addresses.

6.2.1 System Model

We consider a distributed real-time framework in which real-time nodes communicate with each other using a shared network channel, as shown in Figure 6.1. Each node of the distributed system consists of a task queue, a message queue, a node queue, a processor, and a partitioned network. The task queue holds computation tasks to be scheduled on the processor, and the message queue holds messages that need to be transmitted over the shared network channel. The messages generated at a node share a dependency relationship with computation tasks: a message can be scheduled for transmission only after it has been generated by the corresponding computation task. In general, each message will
be released after the execution of the corresponding computation task is complete. The message schedule generated for these messages is for a partitioned network because of the shared nature of the network. The node queue holds messages to be transmitted over the shared network in the order determined by the message schedule generated for the partitioned network. Let $T\langle p, c\rangle$ denote the task model and $M\langle p, x\rangle$ denote the message model for the distributed system we consider. Then,
• Period $p$ represents the fixed interval between two consecutive instances of $T$ or between two consecutive instances of $M$. Its relative deadline is the same as $p$.
• Computation time $c$ represents the worst-case execution time (WCET) for the completion of the task $T$ on the processor.
• Transmission time $x$ represents the worst-case transmission time (WCTT) for the completion of the transmission of the message $M$ over the network.
We now define a sequence dependency relationship between the task $T$ and the message $M$ generated on completion of the execution of $T$.

Definition 6.2.1 (Sequence Dependency Relationship) A task $T = \langle p, c\rangle$ and a message $M = \langle p, x\rangle$ belong to the dependency relation $R$, denoted by $R\langle T, M\rangle$, if and only if the message $M$ is generated by the task $T$ on completion of its execution on the processor.

A workload $w$ consists of a task and a message under the sequence dependency relationship and is defined as $\{T\langle p, c\rangle, M\langle p, x\rangle, R\langle T, M\rangle\}$. When $c = 0$, workload $w$ represents a periodic message $M\langle p, x\rangle$ without any sequence dependency. Similarly, when $x = 0$, workload $w$ represents a periodic task $T\langle p, c\rangle$ only. A component is represented as $C\langle W, A\rangle$, where $W$ is a workload set $W = \{w\}$ and $A$ is a scheduling algorithm for $W$. In this chapter, we make the following assumptions about the task and message model:
• $p$, $c$, and $x$ are all positive integers.
• Each node in the distributed system consists of one component.
• All the tasks and messages at one node have harmonic periods. However, tasks and messages across different nodes need not have harmonic periods.
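For concreteness, the task, message, and workload model above can be written down as simple record types. The following Python sketch only illustrates the notation $T\langle p, c\rangle$, $M\langle p, x\rangle$, and $w = \{T, M, R\langle T, M\rangle\}$; the class names are illustrative and not part of the thesis.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:            # T<p, c>
    p: int             # period (= relative deadline)
    c: int             # worst-case execution time (WCET) on the processor

@dataclass
class Message:         # M<p, x>
    p: int             # period (= relative deadline), same as the task's period
    x: int             # worst-case transmission time (WCTT) on the network

@dataclass
class Workload:        # w = {T<p, c>, M<p, x>, R<T, M>}
    task: Task
    message: Message   # released only after the task instance completes (sequence dependency)

@dataclass
class Component:       # C<W, A>
    workloads: List[Workload]
    scheduler: str     # scheduling algorithm A, e.g. "EDF" or "RM"
```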
6.2.2 Problem Statement Our goal is to develop a compositional framework for distributed real-time systems consisting of multiple hierarchical real-time resources with dependency. In this paper, we mainly address the following problems:
• Analysis for multiple resource scheduling with dependency. In a node, each workload has a periodic real-time requirement to complete its computation (for the task T) and its transmission (for the message M). This raises the issue of analyzing the schedulability of a workload set, where each task should receive the processor allocations necessary to complete its computation and the message should then receive the network allocations necessary to complete its transmission. Therefore, we have
a schedulability analysis problem for the processor and the partitioned network. The analysis has to be done in the presence of the dependency relationship between T and M as given by Definition 6.2.1. • Compositional analysis for hierarchical resource scheduling. A distributed real-time system has a hierarchy of real-time resources with the processor at one level and the network channel at another level. The message set generated at a node requires use of the network channel but the generation of this message also depends on the corresponding task as given by Definition 6.2.1. Compositional analysis would entail determining efficient abstractions for the messages that can mask this dependency. Using these abstractions we then need to derive schedulability conditions for the shared network channel.
6.3 Analysis for Dual Resource Scheduling with Dependency
A workload can require the use of multiple resources in some sequence to provide some system function. In our distributed real-time framework, a workload requires the processor to complete computation and then the network to transmit computation results prior to a deadline. To support such workloads, a node should be able to determine the order of workloads for the use of the processor and the network. There have been extensive studies on scheduling real-time workloads for a single resource; however, there has been little work on scheduling real-time workloads for multiple resources where each workload has constraints on resource use, such as the sequence dependency relationship. In this section, we address the problem of scheduling a set of periodic tasks on the processor and a set of messages on the network within a node, where a periodic task and a message have the sequence dependency relationship. We present schedulability analysis for this problem under EDF and RM scheduling.

Figure 6.2: An example of an EDF schedule (tasks C1, ..., Cn on the processor and messages X1, ..., Xn on the network over an interval of length P*, with time instants t0 and t* marked).
6.3.1 Earliest-Deadline-First (EDF) Scheduling
The EDF scheduling algorithm can be used for scheduling tasks and messages at a node. In this section, we present an exact schedulability analysis for this case. We first provide the intuition behind our exact EDF schedulability conditions and then present a proof of those conditions. Let $W$ be a set of $n$ workloads such that $W = \{w_i \mid p_i = P^* - (n - i)\}$ for some integer $P^* > n$. The set $W$ is assumed to be sorted in increasing order of deadline. We now consider scheduling the workload set $W$ under EDF scheduling. According to EDF scheduling, the execution order of the workloads is $w_1, \cdots, w_n$. As shown in Figure 6.2, suppose we schedule the tasks of the workloads from time instant 0 and the messages of the workloads from time instant $t_0$, where $t_0 = P^* - \sum_{i=1}^{n} x_i$. Then, the schedule shown in Figure 6.2 is schedulable if, and only if, there is no time instant at which the task $T_i$ and the message $M_i$ of workload $w_i$ are scheduled together. In Figure 6.2, $T_2$ and $M_2$ are scheduled together at $t^*$, which means the workload set $W$ is not schedulable under EDF scheduling. For $w_i$, we can check whether or not $T_i$ and $M_i$ are scheduled together at the same time instant by checking the following inequality for every $1 \le i \le n$:
$$ \sum_{k=1}^{i} c_k + \sum_{k=i}^{n} x_k \le P^*. $$
If the above inequality is true, then $T_i$ and $M_i$ can be scheduled separately. A job is said to be an instance of a workload. Let $O_i$ be the release time of job $Job_i$ and $D_i$ be the deadline of job $Job_i$. The execution window of job $Job_i$ is defined as $\mathcal{W}[O_i, D_i)$, where $\mathcal{W}[W_s, W_f)$ denotes a time interval from $W_s$ to $W_f$. For a set of $n$ periodic workloads, $w_1, \cdots, w_n$, and a time interval $\mathcal{W}[W_s, W_f)$, let $J_W$ denote the list of jobs of the periodic workload set whose execution windows are included in $\mathcal{W}$, i.e.,
$$ J_W = \{Job_i \mid W_s \le O_i < D_i \le W_f\}. $$
We assume $J_W$ is sorted in increasing order of deadline. We now present the following theorem for an exact schedulability analysis of the CPU and packet priority queue scheduling under EDF.

Theorem 6.3.1 (EDF Schedulability Analysis) A set of workloads, $w_1, \cdots, w_n$, is schedulable under EDF scheduling if, and only if,
$$ \forall m \in [1, N_W]: \quad \sum_{i=1}^{m} c_i + \sum_{k=m}^{N_W} x_k \le W_f - W_s, \qquad (6.1) $$
where $N_W = |J_W|$, for all $0 \le W_s < W_f \le p_{LCM}$, and $p_{LCM}$ is the hyperperiod of the workload set, i.e., the least common multiple of all periods.

Proof. To show the necessity, we prove the contrapositive, i.e., if Eq. (6.1) is false, the workload set is not schedulable. Suppose that for some interval $\mathcal{W}^*[W_s^*, W_f^*)$,
$$ \exists m^* \in [1, N_{W^*}]: \quad \sum_{i=1}^{m^*} c_i + \sum_{k=m^*}^{N_{W^*}} x_k > W_f^* - W_s^*. \qquad (6.2) $$
Let $O^*$ be the smallest instant at which $Job_i$ is released, where $i \in [1, N_{W^*}]$. Then, by definition, $W_s^* \le O^*$. Let $D^*$ be the largest deadline of $Job_i$, where $i \in [1, N_{W^*}]$. Then, $D^* \le W_f^*$. Therefore, we have
$$ \exists m^* \in [1, N_{W^*}]: \quad \sum_{i=1}^{m^*} c_i + \sum_{k=m^*}^{N_{W^*}} x_k > W_f^* - W_s^* \ge D^* - O^*. \qquad (6.3) $$
Since the tasks of $Job_1, \cdots, Job_{m^*}$ finish at time $t^*$, where $t^* = O^* + \sum_{i=1}^{m^*} c_i$, the messages of $Job_{m^*}, \cdots, Job_{N_{W^*}}$ can start at $t^*$ at the earliest, demanding $X$ time units of network allocation during the interval $[t^*, D^*)$, where $X = \sum_{k=m^*}^{N_{W^*}} x_k$. Since $X > D^* - t^*$, it is clear that $Job_{N_{W^*}}$ will miss its deadline.
To show the sufficiency, we prove the contrapositive, i.e., if the workload set is not schedulable by EDF, then Eq. (6.1) is false. Let $t_2$ be the first instant at which a job $Job_m$ misses its deadline. Let $t_1$ be the latest instant prior to $t_2$ such that at time instant $t_1 - 1$, the network is idle or used by jobs with deadlines later than $t_2$. Suppose job $Job_l$ starts its transmission at $t_1$. By the definition of $t_1$, the deadline $D_l$ of $Job_l$ is the smallest among the jobs ready for transmission, and surely $D_l \le t_2$. Since the job of $w_m$ misses its deadline at $t_2$, we have
$$ \sum_{i=l}^{m} x_i > t_2 - t_1. \qquad (6.4) $$
The message of job $Job_l$ can start its transmission at $t_1$ only (1) when the task of $Job_l$ finishes at $t_1 - 1$, or (2) when the task of $Job_l$ has finished earlier, the message of a higher-priority job finishes at $t_1 - 1$, and no other message with higher priority is ready for transmission at $t_1$. By the definition of $t_1$, the network was not used by a higher-priority message at $t_1 - 1$. Therefore, the task of $Job_l$ finishes at $t_1$. Let $t_0$ be the latest instant prior to $t_1$ such that at time instant $t_0 - 1$, the processor is idle or used by jobs with deadlines later than $D_l$. Suppose job $Job_k$ starts its task at $t_0$. By the definition of $Job_k$, $D_k \le D_l$. Then, we have
$$ \sum_{i=k}^{l} c_i = t_1 - t_0. \qquad (6.5) $$
Combining Eqs. (6.4) and (6.5), we have
$$ \sum_{i=k}^{l} c_i + \sum_{i=l}^{m} x_i > t_2 - t_0. $$
We then see that Eq. (6.1) is false for the interval $\mathcal{W}[t_0, t_2)$. □
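The condition of Theorem 6.3.1 can be evaluated by enumerating job windows up to the hyperperiod. The following Python sketch illustrates the check of Eq. (6.1); the job-list construction, the brute-force window enumeration, and the helper names are assumptions of this sketch, not part of the thesis.

```python
from math import lcm
from dataclasses import dataclass

@dataclass
class Workload:
    p: int   # period (= relative deadline)
    c: int   # worst-case execution time of the task
    x: int   # worst-case transmission time of the message

def jobs_in_window(workloads, ws, wf):
    """Jobs whose execution windows [O_i, D_i) lie inside [ws, wf), sorted by deadline."""
    jobs = []
    for w in workloads:
        k = 0
        while (k + 1) * w.p <= wf:
            release, deadline = k * w.p, (k + 1) * w.p
            if release >= ws:
                jobs.append((deadline, w.c, w.x))
            k += 1
    return sorted(jobs)

def edf_window_ok(workloads, ws, wf):
    """Check Eq. (6.1) for one window [ws, wf)."""
    jobs = jobs_in_window(workloads, ws, wf)
    for m in range(1, len(jobs) + 1):
        demand = sum(c for (_, c, _) in jobs[:m]) + sum(x for (_, _, x) in jobs[m - 1:])
        if demand > wf - ws:
            return False
    return True

def edf_schedulable(workloads):
    """Exhaustively check all windows up to the hyperperiod (Theorem 6.3.1)."""
    H = lcm(*(w.p for w in workloads))
    return all(edf_window_ok(workloads, ws, wf)
               for ws in range(H) for wf in range(ws + 1, H + 1))
```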
Let $rs_{T_i}$ denote the worst-case response time of task $T_i$. There have been previous studies [79, 61] on techniques to compute $rs_{T_i}$ under EDF scheduling. Considering the dependency between task and message, we can see that message $M_i$ can be released as late as $rs_{T_i}$ after the arrival of $T_i$. Hiding this dependency, we can transform $M_i$ into a periodic message $M_i'$ with jitter, i.e., $M_i' = \langle p_i, x_i, J_i = rs_{T_i} \rangle$. This transformed message model can be used in abstracting the EDF schedulability within a node, which will be described in Section 6.4.1.
6.3.2 Rate-Monotonic (RM) Scheduling
Given a node using the RM scheduling algorithm for CPU and message scheduling, we provide a schedulability analysis for the node, which is based on worst-case response time (WCRT) analysis. There has been previous work on the WCRT $rs_{T_i}$ of task $T_i$. In this section, we present techniques to compute the WCRT $rs_{M_i}$ of message $M_i$. The worst-case response time $rs_{T_i}$ of task $T_i$ can be computed through the following iterative method [5]:
$$ rs_{T_i}(k) = c_i + \sum_{j=1}^{i-1} \left\lceil \frac{rs_{T_i}(k-1)}{p_j} \right\rceil \cdot c_j, \qquad (6.6) $$
The iteration continues until $rs_{T_i}(k) = rs_{T_i}(k-1)$, where $rs_{T_i}(0) = c_i$. Each message at a node can be released only after the corresponding task has finished its execution. Any task with WCET $c_i$ and worst-case response time $rs_{T_i}$ will finish its execution at some time $t$, where $c_i \le t \le rs_{T_i}$. That is, any message $M_i$ can be released at a time $t$, where $c_i \le t \le rs_{T_i}$. Hiding the dependency of message $M_i$ on $T_i$, therefore, every message $M_i$ at a node can be represented as a periodic message $M_i'$ with offset and jitter, i.e., $M_i' = \langle p_i, x_i, O_i = c_i, J_i = rs_{T_i} - c_i \rangle$, $\forall 1 \le i \le n$. There have been previous studies on computing the WCRT under RM scheduling for tasks with offsets [81] and jitter [82]. We combine these two approaches and modify the iterative procedure for our message model with offset and jitter. Consider a set of messages $M = \{M_i' = \langle p_i, x_i, O_i, J_i \rangle \mid \forall 1 \le i \le n\}$. We first describe the worst-case scenario for the response time of a message and then give the corresponding WCRT equations.

Theorem 6.3.2 (Worst-case Scenario for Response Time) The worst-case response time of message $M_k'$ occurs when
• the message $M_k'$ is released as late as possible ($O_k + J_k$ time units after its arrival) at time $t^*$,
• the higher-priority messages $M_i'$ of $M_k'$ are released as late as possible ($O_i + J_i$ time units after their arrival) at time $t^*$, and
• all the subsequent instances of the higher-priority messages are released at the earliest ($O_i$ time units after their arrival).
Proof. Consider the first instance $Job_1^i$ of some higher-priority message $M_i'$. Assume it is released at time $t_2 > t_1$, where $t_1$ is the release time of message $M_k'$. Moving the release time of $Job_1^i$ earlier only leads to a further increase in the response time of $M_k'$, and hence the worst case occurs when the release time of $Job_1^i$ coincides with the release time of $M_k'$. The same argument applies to all future jobs of message $M_i'$, and hence the worst-case response time for message $M_k'$ occurs when all the future jobs are released at the earliest, i.e., with $J_i = 0$ and offset $O_i = c_i$. Further, by assuming that the arrival time of $Job_1^i$ was the earliest ($J_i + O_i$ time units prior to $t_1$), the release times of all future jobs can be further decreased in absolute time, thereby leading to an increase in the response time of $M_k'$. The above argument can be applied to all messages with priority higher than that of message $M_k'$. Also, increasing the release time of message $M_k'$ itself further increases its response time. Hence message $M_k'$ must be released at the latest possible time, i.e., $J_k + O_k$ time units after its arrival. Thus the critical instant for a message $M_2'$ occurs, at time $t^*$, for the scenario given in Figure 6.3, as specified by Theorem 6.3.2. □

Figure 6.3: Worst-case scenario for the response time of Message 2 under RM scheduling.
We now derive the worst-case response time equations for these messages when the messages are released according to the worst-case scenario of Theorem 6.3.2.

Theorem 6.3.3 (WCRT under RM) For each message $M_i'$ at the node, its worst-case response time is given by
$$ rs_{M_i} = wi_i + O_i + J_i, \qquad (6.7) $$
where $wi_i$ is given by the following iteration:
$$ wi_i^{k+1} = x_i + \sum_{m=1}^{i-1} x_m \cdot \left\lceil \frac{wi_i^{k} + J_m}{p_m} \right\rceil, $$
such that the iteration continues until $wi_i^{k} = wi_i^{k-1}$, where $wi_i^{0} = x_i$.

Proof. For any iteration $k+1$, $wi_i^{k}$ represents the maximum interference from the previous iteration, $wi_i^{k} + J_m + O_m$ represents the total time elapsed since the arrival of the first job $Job_1^m$ of message $M_m'$, and $wi_i^{k} + J_m + O_m - O_m$ represents that elapsed time minus the offset of $M_m'$. Hence $wi_i$ represents the maximum interference to message $M_i'$ from messages with priority higher than or equal to that of $M_i'$ (i.e., with message period smaller than or equal to that of $M_i'$). Termination of the computation of $wi_i^{k}$ is guaranteed since the value of $wi_i^{k}$ increases in each iteration and we need to evaluate it only until $wi_i^{k} \le p_i - J_i - c_i$. The worst-case response time of message $M_i'$, i.e., the time from its arrival to the completion of its transmission, is then given by $rs_{M_i} = wi_i + J_i + O_i$, where $O_i = c_i$. □
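The two fixed-point iterations, Eq. (6.6) for tasks and Theorem 6.3.3 for messages, can be coded directly. The following Python sketch is an illustration under the assumptions of this section (priority order by period, integer parameters); the cut-off used to declare unschedulability and the function names are assumptions of the sketch.

```python
from math import ceil

def task_wcrt(i, tasks):
    """Worst-case response time of task i (Eq. (6.6)).
    tasks = [(p, c), ...] sorted by priority (smallest period first)."""
    p_i, c_i = tasks[i]
    rs = c_i
    while True:
        nxt = c_i + sum(ceil(rs / p_j) * c_j for p_j, c_j in tasks[:i])
        if nxt > p_i:
            return None          # exceeds the period; treated as unschedulable here
        if nxt == rs:
            return rs
        rs = nxt

def message_wcrt(i, msgs):
    """Worst-case response time of message i (Theorem 6.3.3).
    msgs = [(p, x, O, J), ...] sorted by priority (smallest period first)."""
    p_i, x_i, O_i, J_i = msgs[i]
    wi = x_i
    while True:
        nxt = x_i + sum(x_m * ceil((wi + J_m) / p_m) for p_m, x_m, _, J_m in msgs[:i])
        if nxt + O_i + J_i > p_i:
            return None          # violates rs_Mi <= p_i (cf. Theorem 6.3.4)
        if nxt == wi:
            return wi + O_i + J_i
        wi = nxt
```

The sufficient RM condition of Theorem 6.3.4 below then amounts to checking that message_wcrt returns a value (at most the period) for every message of the node.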
The following theorem presents a sufficient schedulability condition for a set of workloads under RM scheduling, based on the WCRT analysis of messages.

Theorem 6.3.4 (RM Schedulability Analysis) A set of workloads, $w_1, \cdots, w_n$, within a node is schedulable under RM scheduling if $rs_{M_i} \le p_i$, $\forall 1 \le i \le n$.

This schedulability condition is only sufficient, because the worst-case scenario described in Theorem 6.3.2 may never occur for the messages due to their dependence on the corresponding tasks. Depending on the initial arrival times of the tasks, the messages might never align as required by the worst-case scenario of Theorem 6.3.2.
6.4 Compositional Analysis for Network Channel Schedulability analysis of the network channel involves determining schedulability of all the messages across all nodes under EDF or RM scheduling of the channel. Each message at a node depends on a corresponding task at that node as given by Definition 6.2.1. Since the CPU is a local resource of the node, for network schedulability analysis a reasonable approach is to abstract only this dependency. In this section, we represent each message at a node as a periodic message with offset where the offset is determined by the task on which the message depends. We then generate a single periodic message abstraction with offset to represent this set of messages at a node. The network channel can then be considered as a real-time resource with the message set consisting of all the abstracted messages (one for each node). Sufficient schedulability conditions are then derived for such abstracted messages. For this section, we assume that the periods of all the messages within a node are harmonic. We also assume that all the messages within a node together with the periodic message abstraction we generate are synchronized with each other. We believe this does not restrict the distributed system too much, because synchronization and harmonicity are node level requirements.
6.4.1 Periodic Abstraction for Messages under EDF
Each message at a node can be released only after the corresponding task has finished its execution. Any task with worst-case execution time $c_i$ and WCRT $rs_{T_i}$ will finish its execution at some time $t$ where $c_i \le t \le rs_{T_i}$. Consider a task set $T_j = \{T\langle p_i, c_i\rangle \mid 1 \le i \le n\}$ at some node $j$ in the distributed system with the message set $MS_j = \{M\langle p_i, x_i\rangle \mid 1 \le i \le n\}$. The message set with offsets can then be given as $MS'_j = \{M_i' = \langle p_i, x_i, O_i = rs_{T_i}\rangle \mid \forall 1 \le i \le n\}$. In the rest of this section we will refer to the message set at a node $j$ as $MS'_j$. We will now generate the periodic abstraction for this message set under EDF scheduling.
Supply Bound Function for a Periodic Resource with Offset
The message set $MS'_j$ of node $j$ can be abstracted as a single periodic message with offset such that the abstracted periodic message represents the collective network demand of $MS'_j$. We give an algorithmic approach for the computation of this periodic message abstraction. The periodic message abstraction can be regarded as a periodic resource that guarantees the schedulability of the message set. The supply bound function of a periodic resource with offset can then be used to compute a sufficient abstraction for the message set.

Definition 6.4.1 (Periodic Resource with Offset) A periodic resource with offset is a resource model $N = \langle P, E, O\rangle$, where $P$ is the resource period, $E$ is the resource supply time, and $O$ is the initial offset.
We now prove that the worst-case supply (minimum overlap with demand) from the periodic resource occurs when the resource is supplied at the earliest.

Lemma 6.4.1 The worst-case resource supply (minimum resource overlap with demand) from the periodic resource abstraction $N = \langle P, E, O\rangle$ for the message set $MS'_j$ occurs when the resource is supplied at the earliest (release time of the resource $= O$) in each period.

Proof. Let $p_1$ denote the smallest period among all message periods in the message set. Consider any interval of time $[t_1, t_2)$ such that $t_2 - t_1 = p_1$ and $t_1$ coincides with the arrival of some message. In this interval, let the periodic resource abstraction $N$ supply $E$ transmission time units at the earliest, i.e., starting from $t = O$. Since all the messages in the message set are of the form $M_i' = \langle p_i, x_i, O_i = rs_{T_i}\rangle$, supplying the resource any later than $t = O$ will only further increase the resource available to the messages. Supplying the resource at its earliest results in smaller deadlines for the messages to ensure schedulability. Hence supplying the resource at the earliest results in the worst-case resource supply for the message set under the assumptions of harmonicity and synchronous arrival. □

Given a periodic resource $N = \langle P, E, O\rangle$, its supply bound function is given by the following definition.

Definition 6.4.2
$$ sbf_N(t_1, t_2) = \Big[\big(\lfloor (t_2 - O)/P \rfloor + 1\big)E - Y\Big] + \Big[Z - \big(\lfloor (t_1 - O)/P \rfloor + 1\big)E\Big], $$
where $Y$ and $Z$ are given by
$$ Y = \begin{cases} kP + O + E - t_2 & \text{if } \exists k,\ t_2 \in [kP + O,\ kP + O + E], \\ 0 & \text{otherwise}, \end{cases} \qquad
Z = \begin{cases} kP + O + E - t_1 & \text{if } \exists k,\ t_1 \in [kP + O,\ kP + O + E], \\ 0 & \text{otherwise}. \end{cases} $$
Since the periodic resource is synchronized with the message set, the supply bound function for any interval $[t_1, t_2)$ is equal to the exact resource supply in that interval, assuming that the resource arrives at $t = 0$ along with all the other messages and is released at the earliest at $t = O$.
Periodic Abstraction for Message Set
To determine the periodic abstraction for a message set, we also need to determine the demand bound function for that set. The demand bound function for the message set $MS'_j = \{M_i' = \langle p_i, x_i, O_i = rs_{T_i}\rangle \mid \forall 1 \le i \le n\}$ under EDF scheduling is given in [9]. Assuming $O = \max_i \{O_i\}$ and $P' = \mathrm{lcm}\{p_1, \cdots, p_n\} = \max_i \{p_i\}$ (the periods within a node are harmonic), the demand bound function is given by the following definition.

Definition 6.4.3
$$ dbf_{EDF}(MS_j, t_1, t_2) = \sum_{i=1}^{n} \eta_i(t_1, t_2)\, x_i, \quad \forall\, 0 \le t_1 < t_2 \le O + 2P', $$
where $\eta_i(t_1, t_2)$ is given by
$$ \eta_i(t_1, t_2) = \max\{0,\ \lfloor (t_2 - O_i - p_i)/p_i \rfloor - \lceil (t_1 - O_i)/p_i \rceil + 1\}. $$
We now give sufficient conditions for a periodic resource/message abstraction to be valid for a message set.
Theorem 6.4.2 A periodic abstraction $N = \langle P, E, O\rangle$ is a sufficient abstraction for a message set $MS'_j$ if
$$ dbf_{EDF}(MS_j, t_1, t_2) \le sbf_N(t_1, t_2), \quad \forall\, 0 \le t_1 < t_2 \le O + 2P', \qquad (6.8) $$
where $O = \max_i \{O_i\}$, $P' = \mathrm{lcm}\{p_1, \cdots, p_n\} = \max_i \{p_i\}$, and $P = p_1$, the smallest period among all message periods.

Proof. To prove necessity, we prove the contrapositive, i.e., if Eq. (6.8) is false then the message set $MS'_j$ is not schedulable. If the total demand in some interval $[t_1, t_2)$ exceeds the total resource provided during that interval, then clearly there is no feasible schedule for all the jobs in this interval. To prove sufficiency of Theorem 6.4.2, we prove the contrapositive, i.e., if the message set $MS'_j$ is not schedulable on a dedicated network, then Eq. (6.8) is false. Let $t_2$ be the first instant of time when a message misses its deadline. Let $t_1$ be an instant of time prior to $t_2$ such that no message in the message set has a released job, and let $t_1$ be the maximum of all such time instants. Then in the interval $[t_1, t_2)$ the dedicated network is never idle. Since the message set misses its deadline at $t_2$, the demand bound function for the interval $[t_1, t_2)$ satisfies $dbf_{EDF}(MS_j, t_1, t_2) > t_2 - t_1$. Since the supply bound function $sbf_N(t_1, t_2)$ of any periodic resource cannot be greater than $t_2 - t_1$, we have $dbf_{EDF}(MS_j, t_1, t_2) > sbf_N(t_1, t_2)$. Hence, Eq. (6.8) is false. □
Baruah et al. [9] provide an algorithmic approach for evaluating the condition given in Theorem 6.4.2. The algorithm to generate the periodic message abstraction performs an exhaustive search over the entire range of values for $E$ and $O$. For each candidate abstraction $N = \langle P, E, O\rangle$, the algorithm checks whether it satisfies the conditions given in Theorem 6.4.2. In general, there will be multiple periodic abstractions that satisfy the theorem. One such abstraction can be picked based on certain optimality criteria such as minimum utilization, largest execution window, etc.
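To make the search concrete, the following Python sketch enumerates candidate pairs $(E, O)$ for a fixed period $P = p_1$ and keeps those satisfying Theorem 6.4.2. The dbf/sbf helpers follow Definitions 6.4.2 and 6.4.3; the function names, the integer time granularity, and the brute-force enumeration are assumptions of this sketch rather than the thesis's exact procedure.

```python
from math import floor, ceil

def sbf(P, E, O, t1, t2):
    """Supply bound function of N = <P, E, O> over [t1, t2) (Definition 6.4.2)."""
    def tail(t):  # kP + O + E - t if t falls inside a supply window, else 0
        k = (t - O) // P
        if k >= 0 and k * P + O <= t <= k * P + O + E:
            return k * P + O + E - t
        return 0
    Y, Z = tail(t2), tail(t1)
    return max(0, ((t2 - O) // P + 1) * E - Y + Z - ((t1 - O) // P + 1) * E)

def dbf_edf(msgs, t1, t2):
    """Demand bound function of messages [(p, x, O), ...] over [t1, t2) (Definition 6.4.3)."""
    total = 0
    for p, x, O in msgs:
        eta = max(0, floor((t2 - O - p) / p) - ceil((t1 - O) / p) + 1)
        total += eta * x
    return total

def find_abstractions(msgs):
    """Enumerate N = <P, E, O> satisfying Theorem 6.4.2, with P = smallest message period."""
    P = min(p for p, _, _ in msgs)
    horizon = max(O for _, _, O in msgs) + 2 * max(p for p, _, _ in msgs)  # O + 2P'
    found = []
    for E in range(1, P + 1):
        for O in range(0, P):
            ok = all(dbf_edf(msgs, t1, t2) <= sbf(P, E, O, t1, t2)
                     for t1 in range(horizon) for t2 in range(t1 + 1, horizon + 1))
            if ok:
                found.append((P, E, O))
    return found
```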
Example 6.4.1 Consider a distributed system node with tasks $T_j = \{T_1\langle 10, 2\rangle, T_2\langle 20, 5\rangle\}$ and message set $MS_j = \{M_1\langle 10, 1\rangle, M_2\langle 20, 6\rangle\}$. The messages with appropriate offsets are then $MS'_j = \{M_1'\langle 10, 1, 2\rangle, M_2'\langle 20, 6, 7\rangle\}$, where the WCRT of $T_1$ is 2 and the WCRT of $T_2$ is 7. The periodic abstraction for this message set can then be either $N = \langle 10, 5, 4\rangle$ or $N = \langle 10, 4, 6\rangle$, among many others. We pick the abstraction $N = \langle 10, 5, 4\rangle$ because it has a larger execution window, thereby giving more flexibility at the network-level scheduling.
6.4.2 Periodic Abstraction for Node-Level Messages under RM
Each message at a node can be released only after the corresponding task has finished its execution. Any task with WCET $c_i$ and WCRT $rs_{T_i}$ will finish its execution in the interval of time specified by $c_i \le t \le rs_{T_i}$. Consider a task set $T_j$ along with the message set $MS_j$. In this section, we generate the periodic abstraction for this message set under RM scheduling by using the WCRT computation for this set. As will be shown in this section, the WCRT computation for the message set under RM scheduling is sensitive to message jitter. Hence the message set with offset and jitter is given as $MS'_j = \{M_i' = \langle p_i, x_i, O_i = c_i, J_i = rs_{T_i} - c_i\rangle \mid \forall 1 \le i \le n\}$.

Periodic Abstraction for Message Set
We give the demand bound function for the message set using the iterative equations for $wi_i$ in the WCRT computation of Section 6.3.2. Given a message set $MS'_j = \{M_i' = \langle p_i, x_i, O_i = c_i, J_i = rs_{T_i} - c_i\rangle \mid \forall 1 \le i \le n\}$, the demand bound function under the RM scheduler is given by
$$ dbf_{RM}(MS_j, i, t) = x_i + \sum_{k \in HP(i)} x_k \left\lceil \frac{t + J_k + O_k - O_k}{p_k} \right\rceil = x_i + \sum_{k \in HP(i)} x_k \left\lceil \frac{t + J_k}{p_k} \right\rceil. \qquad (6.9) $$
The supply bound function for a periodic resource abstraction is as given in Definition 6.4.2 in Section 6.4.1. We now give sufficient conditions for a periodic abstraction to be valid for a message set under RM scheduling.

Theorem 6.4.3 A periodic abstraction $N = \langle P, E, O\rangle$ is a sufficient abstraction for a message set $MS'_j = \{M_i' = \langle p_i, x_i, O_i = c_i, J_i = rs_{T_i} - c_i\rangle \mid \forall 1 \le i \le n\}$ if
• for each message $M_i'$, $1 \le i \le n$, there exists a time instant $t$, $0 < t \le p_i$, such that $dbf_{RM}(MS_j, i, t) \le sbf_N(0, t)$, and
• $P = p_1$ is the smallest period among all message periods.

Proof. By definition, if for every message $M_i'$ there is a time instant $0 < t \le p_i$ at which the worst-case network demand of $M_i'$ can be satisfied by the network supply of $N$, the message set is schedulable. □
The algorithm to generate the periodic abstraction is then given as in Section 6.4.1.
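As a concrete illustration of the RM check in Theorem 6.4.3, the sketch below computes $dbf_{RM}$ for messages with offset and jitter and compares it against the supply bound function of a candidate abstraction. The sbf helper is assumed to be the one of Definition 6.4.2 (for instance, as sketched in Section 6.4.1), and all function names are illustrative.

```python
from math import ceil

def dbf_rm(msgs, i, t):
    """Eq. (6.9): demand of message i plus interference from higher-priority messages.
    msgs = [(p, x, O, J), ...] sorted by priority (smallest period first)."""
    _, x_i, _, _ = msgs[i]
    return x_i + sum(x_k * ceil((t + J_k) / p_k) for p_k, x_k, _, J_k in msgs[:i])

def rm_abstraction_sufficient(msgs, sbf, P, E, O):
    """Theorem 6.4.3: for every message i there is some t in (0, p_i]
    with dbf_RM(i, t) <= sbf_N(0, t), and P is the smallest message period."""
    if P != min(p for p, _, _, _ in msgs):
        return False
    return all(any(dbf_rm(msgs, i, t) <= sbf(P, E, O, 0, t) for t in range(1, p_i + 1))
               for i, (p_i, _, _, _) in enumerate(msgs))
```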
Example 6.4.2 Consider a distributed system node with tasks $T_j = \{T_1\langle 16, 1\rangle, T_2\langle 32, 2\rangle\}$ and message set $MS_j = \{M_1\langle 16, 1\rangle, M_2\langle 32, 2\rangle\}$. The messages with appropriate offsets and jitters are then $MS'_j = \{M_1'\langle 16, 1, 1, 0\rangle, M_2'\langle 32, 2, 2, 1\rangle\}$, where the WCRT of $T_1$ is 1 and the WCRT of $T_2$ is 3. The periodic abstraction for this message set can then be $N = \langle 16, 2, 3\rangle$.
6.4.3 Schedulability Analysis for Network Channel
Using the equations and algorithms given in Sections 6.4.1 and 6.4.2, we can generate a periodic message abstraction for each node in the distributed system. The schedulability analysis problem of the network channel can then be stated as follows:

Definition 6.4.4 Given a set of abstracted messages $N_j = \langle P_j, E_j, O_j\rangle$, $\forall 1 \le j \le m$ ($m$ is the number of nodes in the system), derive appropriate conditions for the schedulability of this message set over the single network channel under a given scheduling strategy.
Network Schedulability Analysis under EDF
We derive sufficient schedulability conditions for the abstracted messages $N_j = \langle P_j, E_j, O_j\rangle$, $\forall 1 \le j \le m$, when EDF scheduling is used at the network channel. Baruah et al. [9] give the following theorem for the schedulability analysis of message sets with offset under EDF scheduling.

Theorem 6.4.4 Given a message set $N_j = \langle P_j, E_j, O_j\rangle$, this set is schedulable under EDF scheduling if, and only if,
$$ \sum_{j=1}^{n} E_j / P_j \le 1 \quad \text{and} \quad \sum_{j=1}^{n} \eta_j(t_1, t_2)\, E_j \le t_2 - t_1, \quad \forall\, 0 \le t_1 < t_2 \le O + 2P', $$
where $O = \max_j \{O_j\}$, $P' = \mathrm{lcm}\{P_1, \cdots, P_n\}$, and
$$ \eta_j(t_1, t_2) = \max\{0,\ \lfloor (t_2 - O_j - P_j)/P_j \rfloor - \lceil (t_1 - O_j)/P_j \rceil + 1\}. $$
Theorem 6.4.4 can be used to generate sufficient schedulability conditions for scheduling our abstracted message set over the network channel. It only provides sufficient conditions for schedulability because the message abstraction itself is only sufficient. Baruah et al. [9] also give an algorithmic approach for evaluating the schedulability condition given in Theorem 6.4.4.
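A direct evaluation of Theorem 6.4.4 can be sketched as follows in Python; the exhaustive window enumeration up to $O + 2P'$ and the helper names are assumptions of the sketch, not the algorithm of [9].

```python
from math import floor, ceil, lcm

def eta(P, O, t1, t2):
    """Number of complete jobs of a periodic message <P, E, O> inside [t1, t2)."""
    return max(0, floor((t2 - O - P) / P) - ceil((t1 - O) / P) + 1)

def network_edf_schedulable(abstractions):
    """Check Theorem 6.4.4 for abstracted messages [(P, E, O), ...]."""
    if sum(E / P for P, E, _ in abstractions) > 1:
        return False
    O = max(o for _, _, o in abstractions)
    horizon = O + 2 * lcm(*(P for P, _, _ in abstractions))
    for t1 in range(horizon):
        for t2 in range(t1 + 1, horizon + 1):
            demand = sum(eta(P, o, t1, t2) * E for P, E, o in abstractions)
            if demand > t2 - t1:
                return False
    return True

# Example 6.4.3 would be checked as:
# network_edf_schedulable([(10, 5, 4), (16, 2, 3)])
```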
Example 6.4.3 Consider a distributed system consisting of two nodes with abstracted messages given by $N_1 = \langle 10, 5, 4\rangle$ and $N_2 = \langle 16, 2, 3\rangle$. This message set is schedulable over the network because it satisfies the demand bound conditions given in Theorem 6.4.4.
Network Schedulability Analysis under RM
We derive sufficient schedulability conditions for the abstracted messages $N_j = \langle P_j, E_j, O_j\rangle$, $\forall 1 \le j \le n$, when RM scheduling is used at the network channel. The demand bound function for this set of messages under RM scheduling is given by
$$ dbf_{RM}(N_j, t) = E_j + \sum_{k \in HP(j)} E_k \left\lceil \frac{t}{P_k} \right\rceil. \qquad (6.10) $$
This demand bound function can be derived from the iterative equations for $wi_i$ in the WCRT computation of messages with offset and jitter under RM scheduling, as given in Section 6.3.2, if the jitter is assumed to be zero. A sufficient schedulability condition for the abstracted messages is then given by the following theorem.

Theorem 6.4.5 The message set $N_j = \langle P_j, E_j, O_j\rangle$, $\forall 1 \le j \le n$, is schedulable under RM scheduling over the network channel if, for each $j$, there exists a time instant $t$, $0 < t \le P_j$, such that $dbf_{RM}(N_j, t) \le t$.

Example 6.4.4 Consider a distributed system consisting of two nodes with abstracted messages given by $N_1 = \langle 10, 5, 4\rangle$ and $N_2 = \langle 16, 2, 3\rangle$. This message set is schedulable over the network because each message abstraction satisfies the demand bound condition given in Theorem 6.4.5.
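A minimal sketch of the RM check of Theorem 6.4.5, under the assumptions above (priorities assigned by period and the demand bound of Eq. (6.10)); the function name is illustrative.

```python
from math import ceil

def network_rm_schedulable(abstractions):
    """Theorem 6.4.5 for abstracted messages [(P, E, O), ...]:
    each message must find some t in (0, P_j] with dbf_RM(N_j, t) <= t."""
    msgs = sorted(abstractions)                  # smallest period = highest priority
    for j, (P_j, E_j, _) in enumerate(msgs):
        def demand(t):
            return E_j + sum(E_k * ceil(t / P_k) for P_k, E_k, _ in msgs[:j])
        if not any(demand(t) <= t for t in range(1, P_j + 1)):
            return False
    return True
```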
6.4.4 Compositional Analysis for Network Schedulability
When a new node is added to the existing distributed real-time system, the processor and partitioned-network schedulability analysis can be done as given in Section 6.3. For the network schedulability analysis, we generate the abstracted message for this new node, using the equations of Section 6.4.1 or 6.4.2 depending on the scheduler used at that node. The new abstracted message set for the network then consists of all the previous abstracted messages and the abstracted message for the new node. We can then check the schedulability of the network channel for this combined abstracted message set using the conditions given in Section 6.4.3.
The periodic abstraction generation technique given in Sections 6.4.1 and 6.4.2 can be used to build a hierarchical compositional framework for the network channel if we assume that the message periods across all the nodes are harmonic. Under this assumption, the periodic abstractions generated for all the nodes can be represented as a single periodic abstraction using the techniques given in Section 6.4.1 or 6.4.2, depending on whether EDF or RM scheduling is used at the network level, respectively. The final periodic abstraction can then be used for the schedulability analysis of the network channel. A sufficient schedulability condition for the final periodic abstraction is given by the following theorem.

Theorem 6.4.6 The periodic abstraction $N = \langle P, E, O\rangle$ is schedulable on the network channel if $E/P \le 1$ and $P - E \ge O$.

When a new node is added to the existing system, the periodic abstraction generated for the messages of this new node can be combined with the final abstraction generated earlier to create a new final abstraction. Schedulability analysis of the network can then be done by applying Theorem 6.4.6 to this new final abstraction.
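The final check of Theorem 6.4.6 is a constant-time test on the composed abstraction. A minimal sketch follows, assuming a hypothetical combine() that re-runs the abstraction generation of Section 6.4.1 or 6.4.2 on the merged message set; both function names are illustrative.

```python
def final_abstraction_schedulable(P, E, O):
    """Theorem 6.4.6: the composed abstraction <P, E, O> fits on the network channel."""
    return E / P <= 1 and P - E >= O

def add_node(final_abstraction, new_node_abstraction, combine):
    """Incrementally admit a node: merge its abstraction into the current final
    abstraction (combine is assumed to implement Section 6.4.1 or 6.4.2)
    and re-check Theorem 6.4.6."""
    P, E, O = combine(final_abstraction, new_node_abstraction)
    return (P, E, O) if final_abstraction_schedulable(P, E, O) else None
```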
6.5 Related Work
A body of research has been devoted to real-time communication on networks and to real-time scheduling. In [33] and [13], the authors provide an overview of network protocols and scheduling, respectively. In networking, compositionality is often discussed in terms of membership and of nodes being added to or removed from the system. In scheduling, research often discusses the schedulability of individual sets of tasks. However, to analyze a distributed real-time system, we need a holistic approach that looks at task and communication scheduling together.
The time-triggered architecture (TTA) and its time-triggered protocols TTP/A and TTP/C [20] provide proven technology for distributed hard real-time systems. The main aims of the protocols are clock synchronization, membership services, error detection, and redundancy management, and each protocol provides different solutions for these issues. The TTA stores task schedules in the task execution list (TADL), which is essentially a table stating start and finish times for each task. The dispatcher releases tasks according to this list. The TTA stores the network schedule in the message descriptor list (MEDL), which is similar to the TADL but is tailored to TDMA communication. The MEDL states, for each slot, which node is allowed to communicate, and for fault-tolerance reasons dedicated hardware enforces the MEDL. Regarding compositionality, the TTA provides means for composing multiple nodes into one distributed real-time system; however, it does not provide an abstraction for compositionality. That means that, for each integration of a new component, all resource utilization (i.e., all TADLs and MEDLs) needs to be revisited. This is a disadvantage especially when new components are integrated at a later point in time or when manufacturers do not want to disclose all details about their components.
In [35], the authors mention that the Giotto system is compositional with respect to value- and time-determinism. The Giotto system is designed to be ignorant of the scheduling policy and supports any policy as long as the deadlines are met. However, compositionality with respect to value- and time-determinism requires compositionality in scheduling; otherwise all compositionality is in vain. In real-time systems research, there is growing attention to hierarchical scheduling frameworks [16, 43, 54, 22, 67, 70, 55, 75, 4, 76] that support hierarchical resource sharing under different scheduling algorithms. These studies mainly address hierarchical resource scheduling for a single resource. In particular, our compositional scheduling framework described in Chapter 5 presents compositional schedulability analysis techniques that support the abstraction of schedulability analysis for single-resource scheduling, while the framework proposed in this chapter presents techniques for supporting the abstraction of schedulability analysis for hierarchical dual-resource scheduling with dependency.
6.6 Summary
In this chapter, we have developed a compositional framework for the schedulability analysis of hierarchical, dependent resources in a distributed real-time system. For the CPU and the partitioned network with dependency, we provide exact schedulability conditions under EDF scheduling and sufficient conditions under RM scheduling. We also provide a mechanism to abstract the dependency of the messages on the corresponding tasks by using the WCRT of the tasks. We have then described an abstraction mechanism to generate a single periodic message with offset as an abstraction for the set of messages within a node. Sufficient schedulability conditions are then given to check the schedulability of the abstracted messages over the network channel under EDF and RM scheduling.
Chapter 7 Design Framework for Real-Time Embedded Systems 1
1 This chapter includes joint work with Sheayun Lee, Woonseok Kim, and Sang Lyul Min.

7.1 Introduction
An embedded system consists of a collection of components that interact with each other and with their environment through sensors and actuators. Embedded systems are often resource-constrained as well as timing-constrained. One typical scarce resource is memory in cost-sensitive systems. For such systems, it may be desirable to generate code that is as small as possible, since the amount of memory used by application programs affects the component cost for memory modules. One promising technique for reducing code size is a code generation technique that uses a dual instruction set [48]. Another typical scarce resource is power in battery-operated systems. For such systems, one popular approach is to reduce the energy consumption of the processor using dynamic voltage scaling (DVS) techniques. These techniques can effectively reduce resource use at the expense of increased program execution time, which may cause timing constraints to be violated. Motivated by this observation, we propose an embedded system design framework that can be used to balance the tradeoff among code size, execution time, and energy consumption. The proposed framework aims at providing a system designer with a parameterized view of the system development process, where the system can be flexibly fine-tuned with regard to the different performance criteria.
A code generation technique has been proposed for a dual instruction set processor that exploits the tradeoff between a program's code size and its WCET (worst-case execution time) [49]. This technique exposes the tradeoff between code size and execution time by providing different versions of code for the same application program, which have different characteristics in terms of code size and execution time. Specifically, a dual instruction set processor [48] supports a reduced (compressed) instruction set in addition to a full (normal) instruction set. By using the reduced instruction set in code generation, an application program's code size can be significantly reduced, while its execution time is increased. Based on this property, a flexible tradeoff between code size and execution time can be obtained by selectively using the two instruction sets for different sections of the program [49].
Figure 7.1: Overall structure of the proposed design framework. (The framework takes a system specification, consisting of a task set, a scheduling policy, and a hardware model, together with optimization criteria, consisting of resource constraints and an objective function; the design optimization step, which satisfies the resource constraints and minimizes the objective function, produces design parameters: code generation parameters for the code generator and voltage setting parameters for the operating system.)
The technique of dynamically adjusting the supply voltage and the clock frequency, called dynamic voltage scaling (DVS), exposes the tradeoff between execution time and energy consumption. A processor with DVS support provides a mechanism to reduce energy consumption by lowering the processor's supply voltage at run time. However, when the supply voltage is lowered, the operating clock frequency must be lowered accordingly, which leads to an increased execution time for programs. Therefore, by setting an appropriate voltage/frequency level, we can exploit the tradeoff relationship between the processor's execution speed and its energy consumption.
Figure 7.1 shows the overall structure of the proposed design framework. The design framework receives two sets of inputs: a system specification and optimization criteria. In the system specification, the task set defines the functionality of the system, along with the tradeoff data between the code size and execution time of each program. The system specification also includes the scheduling policy used to schedule the tasks, which must meet the timing requirements associated with each of them. Finally, the hardware model describes the hardware platform on which the applications are executed; this includes the tradeoff data between the execution speed, i.e., the clock frequency, and the energy consumption of the processor. The optimization criteria are assumed to be specified by the system developer and describe the resource constraints and the goal of the design optimization. The resource constraints are specified in terms of space, time, and energy; in other words, they consist of the code size constraint, the timing constraint, and the energy constraint imposed on the system. The goal of the design optimization is given in the form of an objective function that captures the system cost to be minimized. Given these inputs, the design framework generates a set of design parameters in such a way that the resulting system satisfies the constraints imposed on the different aspects of the system resources and, at the same time, minimizes the system cost function. The design parameters derived are: (1) the code generation parameters that are given to the code generator to select a version of code for each application program, and (2) the voltage setting parameters that are given to the operating system so that it can adjust the
voltage/frequency of the processor at run time. By deriving, at the same time, these design parameters that are used at different levels of system development, the proposed design framework can effectively exploit the tradeoff involving space, time, and energy. As embedded systems become more complex due to increased functionalities, it is necessary to develop techniques and methods supporting component-based design for embedded systems. Motivated by this direction, we also provide extensions to the proposed design framework for addressing our design problems in a compositional way.
The main contributions of the proposed framework can be summarized as follows. First, we identify the tradeoff relationship involving code size, execution time, and energy consumption, and propose a design optimization framework that flexibly balances this tradeoff. Second, the proposed technique mathematically formulates a multi-directional optimization problem, shows the NP-hardness of the problem, and provides an algorithmic solution to it. Third, the proposed technique automatically derives system design parameters, driven by an abstract specification of system requirements given by the system developer. Finally, we provide a systematic approach to address our proposed design optimization problem compositionally.
The rest of this chapter is organized as follows. In Section 7.2, we describe the system model and assumptions. Section 7.3 presents a formal description of the optimization problem that we address. Section 7.4 details the algorithm for deriving the system design parameters. We give the results from simulations in Section 7.5. In Section 7.6, we discuss possible extensions when we relax some assumptions. In Section 7.7 and Section 7.8, we describe component-based design techniques for our proposed framework. Finally, Section 7.9 concludes this chapter.
7.2 System Model and Problem Definition In this section, we describe the system model and assumptions of our proposed framework.
Workload Model. A workload is a task $T_i$ that requires the processor for completing its computation. In this section, we also consider that task $T_i$ requires memory for its code and power for processor execution. Task $T_i$ is characterized by the following parameters:
• Period $p_i$: the fixed time interval between the arrival times of two consecutive requests of $T_i$. We assume each task has a relative deadline equal to its period.
• WCEC (Worst-Case Execution Cycles) $g_i$: the worst-case number of execution cycles required for the completion of $T_i$. In addition to the worst-case execution time, we use the notion of worst-case execution cycles (WCEC), since the execution time of a program can vary depending on run-time settings such as the voltage/frequency setting of the processor.
• Code Size $s_i$: the code size of $T_i$'s executable code.
• CPU Frequency $f_i$: the CPU frequency at which $T_i$ executes. We assume that the processor supports the dynamic voltage scaling (DVS) technique, where the operating frequency of the clock is proportional to the supply voltage. By lowering the supply
voltage and thus the clock frequency, the processor's energy consumption can be reduced while execution performance is degraded to a certain extent. We assume that the operating frequency can be set at a continuous level in $(0, f_{max}]$, where $f_{max}$ denotes the maximum clock frequency at which the processor can run. We assume that $f_{max} = 1.0$.
• Execution Time $c_i$: the amount of time to complete the execution of $T_i$. We assume that the execution time of a program is inversely proportional to the current frequency setting. That is, we define $c_i = g_i / f_i$.
• Energy Consumption $e_i$: the amount of energy that the processor consumes to complete the execution of $T_i$. The energy consumption of the processor is assumed to be proportional to the supply voltage squared [15], and the supply voltage and the clock frequency are in a directly proportional relationship; thus, it holds that $e_i \propto f_i^2$. For each task $T_i$, we now define its energy consumption $e_i$ over a hyperperiod as follows [15]:
$$ e_i = \kappa\, \frac{P_{LCM}}{p_i}\, c_i\, f_i^2, \qquad (7.1) $$
where $\kappa$ is an energy consumption constant and $P_{LCM}$ denotes the hyperperiod, i.e., the least common multiple of the periods of all the tasks.
We define the system code size $S$ as the sum of the code sizes of all the tasks, i.e.,
$$ S = \sum_{i=1}^{n} s_i. \qquad (7.2) $$
We define the system energy consumption $E$ as the total energy consumption of all the tasks, i.e.,
$$ E = \sum_{i=1}^{n} e_i. \qquad (7.3) $$
Size/Cycle Tradeoff List. In this chapter, we assume that each task $T_i$ has a size/cycle tradeoff list $SG_i$ that is defined as follows: $SG_i$ enumerates the possible values of the task parameters $\langle s_i, g_i\rangle$, under the assumption that each task has multiple versions of its executable code and that each version can have a different code size and WCEC, i.e.,
$$ SG_i = \{\langle s_{i,j}, g_{i,j}\rangle \mid j = 1, 2, \ldots, N_i^{SG}\} \quad \text{and} \quad \langle s_i, g_i\rangle \in SG_i, $$
where $N_i^{SG}$ denotes the number of elements of $SG_i$. We now describe our assumptions on the size/cycle tradeoff list $SG_i$. We assume that each task has multiple versions of its executable code generated by the selective code transformation technique [49], which utilizes a dual instruction set processor. This technique generates each code version with a different WCEC and code size. The greedy nature of this technique ensures that the size/cycle tradeoff list $SG_i$ satisfies the following two properties:
• The code size $s_{i,j}$ increases while the WCEC $g_{i,j}$ decreases as the index $j$ increases. That is, $\Delta s_{i,j} = s_{i,j} - s_{i,j-1} > 0$ and $\Delta g_{i,j} = g_{i,j} - g_{i,j-1} < 0$, $\forall i \in [1, n]$ and $\forall j \in [2, N_i^{SG}]$.
• The marginal gain in WCEC reduction per unit increase in code size is monotonically non-increasing, i.e.,
$$ \forall i \in [1, n],\ \forall j \in [2, N_i^{SG} - 1]: \quad \frac{|\Delta g_{i,j+1}|}{\Delta s_{i,j+1}} \le \frac{|\Delta g_{i,j}|}{\Delta s_{i,j}}. $$
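As an illustration of this data structure, the following Python sketch represents a size/cycle tradeoff list and checks the two properties above; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TaskSpec:
    period: int
    tradeoff: List[Tuple[int, int]]   # SG_i: list of (code size s_ij, WCEC g_ij), j = 1..N_i

def valid_tradeoff_list(sg):
    """Check the two assumed properties of a size/cycle tradeoff list SG_i."""
    sizes  = [s for s, _ in sg]
    cycles = [g for _, g in sg]
    # Property 1: code size strictly increases and WCEC strictly decreases with j.
    monotone = all(sizes[j] > sizes[j - 1] and cycles[j] < cycles[j - 1]
                   for j in range(1, len(sg)))
    # Property 2: marginal WCEC reduction per unit of extra code size is non-increasing.
    gains = [abs(cycles[j] - cycles[j - 1]) / (sizes[j] - sizes[j - 1])
             for j in range(1, len(sg))]
    diminishing = all(gains[j] <= gains[j - 1] for j in range(1, len(gains)))
    return monotone and diminishing
```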
7.3 Design Optimization Problem: SETO
With the assumptions described in the previous section, we present a formal statement of our design optimization problem, called the SETO (size-energy-time optimization) problem. Given a system specification that consists of a set of workloads under EDF scheduling, the SETO problem is to determine the code size and WCEC of each workload and the CPU frequency of the system such that a cost function is minimized subject to the system's real-time and resource constraints. The input and output of the SETO problem are as follows:
• Input. The input of the SETO problem is given as $\langle W\rangle$, where $W$ is a workload set $\{T_1, \ldots, T_m\}$ under EDF scheduling. Each task $T_i$ has its own size/cycle tradeoff list $SG_i$.
• Output. The output of the SETO problem is represented as $\langle V, F\rangle$, where
– The size/cycle assignment vector $V$ determines the code size and WCEC of each task, i.e., $V = \langle v_1, \ldots, v_n\rangle$, where $v_i$ represents which element of $SG_i$ is selected as the code size and WCEC of task $T_i$. That is,
$$ \forall T_i \in W: \quad s_i = s_{i,k^*} \ \text{ and } \ g_i = g_{i,k^*}, \quad \text{where } k^* = v_i. $$
– The CPU frequency assignment vector $F$ determines the CPU frequency of each task, i.e., $F = \langle f_1, f_2, \ldots, f_n\rangle$, where $f_i$ denotes the frequency setting for task $T_i$.
As a constrained optimization problem, the SETO problem has the following objective function and constraints:
• Objective. We define a cost function to capture the resource costs in terms of code size and processor energy consumption, considering their relative importance. We define the cost function as a linear combination of the system code size and energy consumption, i.e.,
$$ f(S, E) = \alpha S + \beta E, \qquad (7.4) $$
where the coefficients $\alpha$ and $\beta$ are constants, which we assume the system designer provides to indicate the relative importance of code size and energy consumption, respectively. Note that the system cost function does not take the timing behavior into account: in a hard real-time system, the timing requirement only serves as a constraint and does not affect the value or cost of the system, so the objective function does not include the notion of time.
• Constraints. The component's resource requirements are defined in terms of constraints imposed on size, energy, and time, as follows:
– Resource constraints. The system code size $S$ and energy consumption $E$ must not exceed certain upper bounds $\bar{S}$ and $\bar{E}$, respectively, i.e.,
$$ S \le \bar{S} \quad \text{and} \quad E \le \bar{E}. \qquad (7.5) $$
We assume $\bar{S}$ and $\bar{E}$ are given by the system designer.
– Real-time constraint. The system must be schedulable under the given scheduling policy, i.e., all the task instances of the component must finish execution before their respective relative deadlines. We can check whether the timing constraint on the task set is satisfied using a simple schedulability test, since the tasks are scheduled by the EDF scheduling algorithm. The schedulability test calculates the system utilization and determines that the whole task set is schedulable if and only if the utilization is at most 1.0 [56].
Definition 7.3.1 The SETO problem is, given a workload set $W$ under EDF scheduling, to find the size/cycle assignment vector $V$ and the CPU frequency assignment vector $F$ such that the system cost function
$$ f(S, E) = \alpha S + \beta E \qquad (7.6) $$
is minimized subject to
$$ U = \sum_{i=1}^{n} \frac{c_i}{p_i} \le 1.0, \qquad (7.7) $$
$$ S = \sum_{i=1}^{n} s_i \le \bar{S}, \qquad (7.8) $$
$$ E = \kappa \sum_{i=1}^{n} I_i\, g_i\, f_i^2 \le \bar{E}, \qquad (7.9) $$
where $c_i = g_i / f_i$ denotes the (worst-case) execution time of task $T_i$, and $I_i$ gives the number of invocations of $T_i$ in a hyperperiod, which can be calculated as $I_i = P_{LCM} / p_i$. Inequalities (7.7), (7.8), and (7.9) guarantee the timing, size, and energy constraints, respectively.
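For a given assignment $\langle V, F\rangle$, checking feasibility against the constraints (7.7)-(7.9) is straightforward. The following Python sketch illustrates the check; the data layout and helper names are assumptions of this sketch.

```python
from math import lcm

def seto_feasible(tasks, V, F, S_bar, E_bar, kappa=1.0):
    """Check constraints (7.7)-(7.9) for an assignment.
    tasks: list of (period p_i, tradeoff list SG_i = [(s, g), ...])
    V: 1-based indices into each SG_i; F: per-task frequencies in (0, 1]."""
    P = lcm(*(p for p, _ in tasks))
    U = S = E = 0.0
    for (p, sg), v, f in zip(tasks, V, F):
        s, g = sg[v - 1]
        c = g / f                             # execution time at frequency f
        U += c / p                            # utilization term, Eq. (7.7)
        S += s                                # code size term, Eq. (7.8)
        E += kappa * (P // p) * g * f ** 2    # energy term, Eq. (7.9)
    return U <= 1.0 and S <= S_bar and E <= E_bar

def seto_cost(S, E, alpha, beta):
    """System cost function, Eqs. (7.4)/(7.6)."""
    return alpha * S + beta * E
```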
We present the following theorem to show that the SETO problem is intractable.

Theorem 7.3.1 The SETO problem is NP-hard.

Proof. The proof is via a polynomial-time reduction from the subset sum problem, which is known to be NP-complete [24]. Let a set of positive integers $A = \{a_1, \ldots, a_n\}$ and $k$ represent an instance of the subset sum problem. Assume that for each $i$, $1 \le i \le n$, $a_i \ge 1$, and that $\sum_{i=1}^{n} a_i = M$.
For the reduction, for each $i$, $1 \le i \le n$, we first construct the size/cycle tradeoff list $SG_i$ of $T_i$ such that $SG_i = \{\langle \epsilon, a_i - \epsilon\rangle, \langle a_i - \epsilon, \epsilon\rangle\}$, where $\epsilon < 1/n$. We pick the coefficients of the system cost function, $\alpha$ and $\beta$, such that $\alpha = 1$ and $\beta = 0$. We also pick the system code size upper bound $\bar{S}$ and the system energy consumption upper bound $\bar{E}$ such that $\bar{S} = M$ and $\bar{E} = \kappa M$. We pick the system-level CPU frequency $F_{sys}$ such that $F_{sys} = 1.0$, and thus the CPU frequency $f_i$ of each $T_i$ is constructed as $f_i = 1.0$. We finally construct the period $p_i$ of $T_i$ such that $p_i = k$. The energy consumption $e_i$ of $T_i$ is then $\kappa c_i$.
In any schedule, $\sum_{i=1}^{n} (c_i + s_i) = M$. The system code size and energy consumption constraints are therefore always met: $\sum_{i=1}^{n} s_i < \bar{S}$ and $\sum_{i=1}^{n} e_i < \bar{E}$. In any feasible schedule, the timing constraint must be met: $\sum_{i=1}^{n} c_i \le k$. Then, in any feasible schedule, $\sum_{i=1}^{n} s_i \ge M - k$. Considering $\epsilon < 1/n$, we know that the minimum system code size is between $M - k$ and $M - k + 1$. Considering $\alpha = 1$ and $\beta = 0$, the system cost function is minimized if and only if the system code size is minimized. The minimum system code size is achieved if and only if there is a subset $A'$ of $A$ such that the sum of the elements of $A'$ is $k$. □
7.4 Algorithms for SETO Problem We propose an algorithm (ALG) that assigns to each task a pair of code size and WCEC and the processor’s operating frequency in such a way that the assignment meets the design goals described in the previous section. The algorithm consists of two distinct phases. In the first phase, it tries to satisfy the constraints imposed on the system’s code size, timing behavior, and energy consumption, while the second phase is aimed at minimizing the system cost function. The first phase begins with the smallest code size possible for each task by using an initial size/cycle assignment vector V = h1, 1, . . . , 1i. Then, aiming at 157
meeting the timing constraint and the energy constraint, the algorithm iteratively transforms the assignment vector. In each iteration of the algorithm, it selects a task whose code size is to be increased while the system code size does not exceed the given upper bound. In return for the increased code size of the selected task, the workload generated by the task is reduced, which not only makes the task set more likely to be schedulable, but also reduces the system’s energy consumption. Specifically, assuming the current size/cycle assignment vector is V = hv1 , v2 , . . . , vn i, each iteration of the algorithm selects a size/cycle descriptor from the candidate set {sgi,j |i ∈ [1, n] and j ∈ [vi + 1, NiSG ]} 2 and assigns vi = j. The selection is made in such a way that the system workload is reduced as much as possible for the unit increase in the code size. Once both the timing constraint and the energy constraint are met, the first phase is terminated and the second phase begins. In the second phase, the algorithm continues transforming the size/cycle assignment vector by selecting a size/cycle descriptor and assigning it to the corresponding task, where the selection is geared towards minimizing the system cost function. Note that the resource constraints remain satisfied during the second phase. Since the transformation reduces the system workload, the timing and energy constraints cannot be violated, while the code size constraint is checked in every iteration of the transformation. The algorithm terminates when (1) selecting any of the size/cycle descriptors remaining in the candidate set would violate the system code size constraint, or (2) no further reduction of the system cost function can be made by the transformation, i.e., the system cost function has been minimized. 2
If vi = NiSG for a task Ti , then the candidate set does not contain any size/cycle descriptor for that task.
158
After the algorithm finishes, the clock frequencies for tasks F = hf1 , f2 , . . . , fn i are assigned with a single value for all the tasks, i.e., fi = f , ∀i ∈ [1, n], which is shown to be optimal in terms of energy consumption under the EDF scheduling policy [69]. Calculating the optimal setting for the clock frequency f will be discussed later in Section 7.4.1. In Section 7.4.1 and Section 7.4.2, we show that throughout the transformation process, a single selection criteria can be used that favors the pair of code size and WCEC with the maximum ratio of reduction in the system workload to the increase in the code size. In Section 7.4.3, we will show that a simple greedy heuristic can be employed for this purpose, which has a low complexity but yet produces near-optimal results.
7.4.1 Phase 1: Satisfying the timing and energy constraints The first phase of the assignment algorithm tries to satisfy the timing constraint and the energy constraint by reducing the system workload by selecting a task and increase its code size. The selection is made in such a way that both the constraints are met with the increase in the system code size as small as possible. Therefore, in selecting a size/cycle descriptor, the algorithm favors the one with the maximum reduction of the system workload for the unit increase in code size, as will be explained in the following. As explained earlier in Section 7.3, we can check whether the timing constraint is met by using a simple schedulability test that compares the system utilization against the upper bound of 1.0. To be more specific, the deadlines of all the tasks are met if and only if the processor utilization is less than or equal to 1.0 assuming that all the task instances are
159
executed using the maximum frequency of the processor. If we use the same operating frequency f for all the task instances, the processor utilization can be represented as a function of the frequency. That is, U(f ) =
Pn
i=1 ti /pi
= (1/f )
Pn
i=1 ci /pi ,
and the schedulability
condition states that U(fmax ) ≤ 1.0. Furthermore, under the EDF scheduling algorithm, the optimal clock frequency setting is to use the same clock frequency for all the task instances in such a way that makes the processor utilization equal to 1.0 [69]. That is, if U(fmax ) is less than or equal to 1.0, setting fi = fmax × U(fmax ), ∀i ∈ [1, n] guarantees that the timing constraints of all the tasks are met, and at the same time minimizes the energy consumption of the task set. Based on this observation, if the task set is not schedulable with the initial size/cycle assignment vector V = h1, 1, . . . , 1i, the algorithm tries to lower the task set’s processor utilization by selecting a task and reduce its WCEC while increasing a certain amount of code size. For this purpose, the algorithm selects a size/cycle descriptor from the candidate set that has the largest ratio of the reduction of utilization to the increase in the code size. In order to estimate the reduction of the processor utilization by selecting a size/cycle descriptor, we rewrite the equation for the utilization as
U(f) = \frac{1}{f}\sum_{i=1}^{n}\frac{c_i}{p_i} = \frac{1}{fP}\sum_{i=1}^{n} I_i c_i = \frac{1}{fP}\sum_{i=1}^{n} w_i,    (7.10)

where w_i = I_i c_i gives the number of clock cycles required by all the invocations of task T_i during a hyperperiod P, and I_i = P/p_i is the number of such invocations. Based on this, the algorithm selects the size/cycle descriptor sg_{i,j} with the maximum value of |Δw_{i,j}|/Δs_{i,j}, where Δw_{i,j} = w_{i,j} − w_i and Δs_{i,j} = s_{i,j} − s_i.
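To make the selection metric concrete, the following Python sketch computes the hyperperiod, the per-task workload w_i = I_i c_i, and the workload reduction factor |Δw_{i,j}|/Δs_{i,j} for a candidate descriptor. It is an illustration only: the dictionary-based task representation and the helper names are our own assumptions, not code from the dissertation.

```python
from math import gcd
from functools import reduce

def hyperperiod(periods):
    # P: least common multiple of all (integer) task periods
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def workload(period, wcec, P):
    # w_i = I_i * c_i, where I_i = P / p_i is the number of invocations
    # of the task during one hyperperiod P
    return (P // period) * wcec

def reduction_factor(task, candidate, P):
    # |delta w| / delta s for moving the task from its current
    # (size, WCEC) pair to the candidate descriptor (size, WCEC)
    new_size, new_wcec = candidate
    dw = workload(task["period"], new_wcec, P) - workload(task["period"], task["wcec"], P)
    ds = new_size - task["size"]
    return abs(dw) / ds

# Hypothetical numbers: period 10, current descriptor (size 2.0, WCEC 1.5),
# candidate descriptor (size 2.2, WCEC 1.3).
tasks = [{"period": 10, "size": 2.0, "wcec": 1.5},
         {"period": 15, "size": 3.0, "wcec": 2.0}]
P = hyperperiod([t["period"] for t in tasks])
print(reduction_factor(tasks[0], (2.2, 1.3), P))
```

The greedy steps in both phases rank candidate descriptors by exactly this quantity.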
We call this ratio |Δw_{i,j}|/Δs_{i,j} the workload reduction factor of the size/cycle descriptor sg_{i,j}, and it is used throughout the entire transformation process. On the other hand, the energy consumption of the processor in a hyperperiod is calculated by E = \kappa \sum_{i=1}^{n} I_i c_i f_i^2, as in Eq. (7.9) presented earlier in Section 7.3. If we set the operating frequency of all the tasks to f_i = f, the energy consumption becomes E = \kappa f^2 \sum_{i=1}^{n} w_i. Letting f = f_max × U(f_max) = \frac{1}{P}\sum_{i=1}^{n} w_i gives

E = \frac{\kappa}{P^2}\left(\sum_{i=1}^{n} w_i\right)^3.    (7.11)
Therefore, in order to satisfy the energy constraint with the smallest increase in the total code size, the algorithm selects the size/cycle descriptor sg_{i,j} with the maximum workload reduction factor, which is the same selection criterion used in satisfying the timing constraint. This means that the first phase of the algorithm can be driven by a single selection criterion for the size/cycle descriptor to achieve its goal of satisfying both the timing and energy constraints with the smallest increase in code size. That is, the algorithm incrementally transforms the size/cycle descriptor assignment vector by selecting, in each iteration, a size/cycle descriptor sg_{i,j} with the maximum workload reduction factor among those in the candidate set. This selection is repeated until both constraints are met, after which the algorithm moves on to its second phase, where it tries to minimize the system cost function. In the case where either of the two constraints cannot be satisfied by the incremental transformation, the algorithm terminates and determines that there is no assignment of size/cycle descriptors that makes the task set feasible under the given set of
constraints.
7.4.2 Phase 2: Minimizing the system cost function Once the timing constraint and the energy constraint are met, the algorithm tries to minimize the system cost function by balancing the tradeoff between the system code size and the energy consumption. As mentioned earlier in Section 7.3, the system cost function is represented as a linear combination of the system code size and the system energy consumption, which is given by
f(S, E) = \alpha S + \beta E = \alpha\sum_{i=1}^{n} s_i + \frac{\beta\kappa}{P^2}\left(\sum_{i=1}^{n} w_i\right)^3,    (7.12)
where the constants α and β reflect the relative importance of the system code size and the system energy consumption, respectively. The second phase of the algorithm uses the same strategy in transforming the size/cycle assignment vector: it selects a task and increases its code size in exchange for reduced energy consumption. In doing this, the value of the system cost function may increase or decrease depending on the amount of code size increase and energy consumption reduction. Therefore, the algorithm must pay attention to the change in the system cost function, as explained in the following. Suppose that the algorithm has selected a size/cycle descriptor sg_{i,j} to be used in transforming the size/cycle assignment vector. Then the system code size increases, whereas the total energy consumption is reduced because (1) the total workload generated by the task set is reduced, and (2) the operating frequency can be lowered accordingly. That is, the
value of the system cost function after the transformation is f′ = αS′ + βE′, where

S' = S + \Delta S \;(\Delta S > 0) \quad\text{and}\quad E' = E - \Delta E \;(\Delta E > 0).    (7.13)

The reduction of the system cost function can be calculated as Δf = f − f′ = βΔE − αΔS. If we let ΔE = γΔS, we have Δf = (γβ − α)ΔS. Therefore, for the reduction Δf to be positive, the selection made by the algorithm must have γ > α/β. Furthermore, in order to achieve the maximum reduction of the cost function, the algorithm should select the size/cycle descriptor that corresponds to the largest value of γ, i.e., the largest ratio of ΔE to ΔS. Since the largest reduction of the energy consumption is achieved by the largest reduction of the system workload, this selection is equivalent to selecting the descriptor with the maximum workload reduction factor, which is the same selection criterion used in the first phase. Therefore, the second phase of the algorithm selects the size/cycle descriptor sg_{i,j} with the maximum workload reduction factor, provided that ΔE/ΔS > α/β and Δs_{i,j} ≤ S̄ − S. If assigning the size/cycle descriptor with the maximum workload reduction factor would increase the system cost function, i.e., if ΔE/ΔS ≤ α/β, selecting any other size/cycle descriptor would degrade the system cost function as well. In such a case, the optimization process is terminated and the final solution is generated. On the other hand, if assigning the size/cycle descriptor with the maximum workload reduction factor would violate the code size constraint, i.e., if Δs_{i,j} > S̄ − S, the algorithm checks the one with the next
largest workload reduction factor by resuming the selection procedure after eliminating the size/cycle descriptor with the maximum workload reduction factor from the candidate set. If there remains no size/cycle descriptor in the candidate set, this means that no task’s workload can be further reduced, in which case the algorithm finishes with the current assignment. The next section presents a formal description of the assignment algorithm and discusses its properties and complexity.
7.4.3 The assignment algorithm Figure 7.2 shows the algorithm that assigns size/cycle descriptors to tasks. The algorithm, in both the first and the second phase, selects the size/cycle descriptor with the maximum workload reduction factor in a greedy fashion. Also note that in the case where either the timing or the energy constraint cannot be satisfied within the code size limit, the first phase of the algorithm determines that no assignment exists under which the task set meets those requirements. In the second phase, the algorithm continues the transformation of the size/cycle assignment vector either (1) until the system cost function is minimized, or (2) until the transformation has used up all the remaining code space. After the algorithm has finished, V contains the assignment of size/cycle descriptors for all the tasks, from which the code generation decisions can be made. The algorithm does not examine all the size/cycle descriptors of the tasks in each iteration. Instead, it checks only one size/cycle descriptor per task when selecting the one with the maximum workload reduction factor.
Algorithm Asn_Exec_Desc: Assign a size/cycle descriptor sg_i = ⟨s_i, c_i⟩ to each task T_i, i = 1, 2, ..., n.
Input: SG_i = {sg_{i,j} | j = 1, 2, ..., N_i^{SG}}, i = 1, 2, ..., n
Output: V = {v_i | i = 1, 2, ..., n}

Initialization:
  V = ⟨v_1, v_2, ..., v_n⟩ ← ⟨1, 1, ..., 1⟩
  SG* ← {sg_{1,2}, sg_{2,2}, ..., sg_{n,2}}

First phase:
  while ( ( (1/(f_max P)) Σ_{i=1}^{n} w_i > 1.0  or  (κ/P²) (Σ_{i=1}^{n} w_i)³ > Ē )  and  SG* ≠ ∅ ) {
      select sg_{i,j} ∈ SG* with maximum |Δw_{i,j}|/Δs_{i,j}
      if (Δs_{i,j} ≤ S̄ − Σ_{i=1}^{n} s_i) {
          s_i ← s_{i,j}; c_i ← c_{i,j}; v_i ← j
      }
      SG* ← SG* − {sg_{i,j}}
      if (j < N_i^{SG})
          SG* ← SG* ∪ {sg_{i,j+1}}
  }
  if (SG* = ∅) terminate (fail)

Second phase:
  while (SG* ≠ ∅) {
      select sg_{i,j} ∈ SG* with maximum |Δw_{i,j}|/Δs_{i,j}
      if (ΔE/ΔS > α/β) {
          if (Δs_{i,j} ≤ S̄ − Σ_{i=1}^{n} s_i) {
              s_i ← s_{i,j}; c_i ← c_{i,j}; v_i ← j
          }
          SG* ← SG* − {sg_{i,j}}
          if (j < N_i^{SG})
              SG* ← SG* ∪ {sg_{i,j+1}}
      } else
          SG* ← ∅
  }

Figure 7.2: The algorithm for assigning a size/cycle descriptor to each task.
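For readers who prefer an executable form, the following Python sketch mirrors the structure of Figure 7.2. It is a simplified restatement under our own assumptions (a dictionary-based task representation, integer periods, strictly increasing code sizes within each descriptor list, a per-task "next candidate" pointer instead of an explicit set SG*, and β > 0); it is not the dissertation's implementation.

```python
def assign_descriptors(tasks, f_max, P, S_bar, E_bar, kappa, alpha, beta):
    """Greedy two-phase assignment in the spirit of Figure 7.2.
    tasks[i] = {"period": p_i, "sg": [(size, wcec), ...]} with each sg list
    ordered by increasing code size (and decreasing WCEC).  Returns the
    1-based assignment vector V, or None if no feasible assignment exists."""
    n = len(tasks)
    V = [1] * n                                    # start with the smallest code sizes
    nxt = [2 if len(t["sg"]) > 1 else None for t in tasks]   # next candidate per task (SG*)
    inv = [P // t["period"] for t in tasks]        # I_i = P / p_i (integer periods assumed)

    size = lambda i: tasks[i]["sg"][V[i] - 1][0]
    wcec = lambda i: tasks[i]["sg"][V[i] - 1][1]
    S = lambda: sum(size(i) for i in range(n))
    W = lambda: sum(inv[i] * wcec(i) for i in range(n))
    E = lambda: kappa / P ** 2 * W() ** 3          # Eq. (7.11)
    U = lambda: W() / (f_max * P)                  # U(f_max), Eq. (7.10)

    def deltas(i):                                 # (ds, dw) for task i's next descriptor
        s_new, c_new = tasks[i]["sg"][nxt[i] - 1]
        return s_new - size(i), inv[i] * (c_new - wcec(i))

    def best():                                    # task with max workload reduction factor
        cand = [i for i in range(n) if nxt[i] is not None]
        return max(cand, key=lambda i: abs(deltas(i)[1]) / deltas(i)[0]) if cand else None

    def apply(i):                                  # assign if it fits, then advance SG*
        ds, _ = deltas(i)
        if ds <= S_bar - S():
            V[i] = nxt[i]
        nxt[i] = nxt[i] + 1 if nxt[i] < len(tasks[i]["sg"]) else None

    while U() > 1.0 or E() > E_bar:                # Phase 1: meet timing and energy
        i = best()
        if i is None:
            return None
        apply(i)

    while True:                                    # Phase 2: minimize alpha*S + beta*E
        i = best()
        if i is None:
            return V
        ds, dw = deltas(i)
        dE = E() - kappa / P ** 2 * (W() + dw) ** 3
        if dE / ds <= alpha / beta:                # cost would not decrease any further
            return V
        apply(i)
```

With the convexity property of Section 7.2, examining only each task's next descriptor, as done here, is sufficient; Section 7.6.2 discusses the general case.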
Specifically, since each task's size/cycle descriptor list has the property that the marginal gain, i.e., the ratio of the reduction of the WCEC to the increase of the code size (|Δc_{i,j}|/Δs_{i,j}), is monotonically non-increasing, as previously mentioned in Section 7.2, the workload reduction factors (|Δw_{i,j}|/Δs_{i,j} = I_i|Δc_{i,j}|/Δs_{i,j}) are also in non-increasing order. That is, the size/cycle descriptors sg_{i,j+1}, sg_{i,j+2}, ..., sg_{i,N_i^{SG}} for task T_i cannot have a greater workload reduction factor than the size/cycle descriptor sg_{i,j} for the same task. Therefore, assuming that the current size/cycle assignment vector is ⟨v_1, v_2, ..., v_n⟩, the algorithm only needs to examine SG* = {sg_{i,j} | i ∈ [1, n], j = v_i + 1} when selecting a size/cycle descriptor among the remaining ones for all the tasks. (As before, if v_i = N_i^{SG} for a task T_i, this candidate set does not contain any size/cycle descriptor for that task.)
The algorithm has a substantially lower time complexity than an exhaustive search. The number of iterations of the proposed algorithm is at most Σ_{i=1}^{n} (N_i^{SG} − 1) in the worst case, which is linear in the number of tasks. An exhaustive search, in contrast, would require Π_{i=1}^{n} N_i^{SG} evaluations, which is exponential in the number of tasks and thus impractical. Finally, the algorithm can be further improved by maintaining an ordered list for the candidate set SG*, with the workload reduction factor as the key. With such an ordered list, selecting the size/cycle descriptor with the maximum workload reduction factor takes constant time and adding a size/cycle descriptor to the set requires O(log(|SG*|)) time, which would otherwise require O(|SG*|) time and constant time, respectively.
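The ordered candidate set can be kept in a binary heap. The following small Python illustration (a sketch with made-up identifiers, not code from the dissertation) stores the negated workload reduction factor because heapq implements a min-heap; peeking at the maximum-factor descriptor then costs O(1), and each insertion costs O(log |SG*|).

```python
import heapq

class CandidateSet:
    """Candidate set SG*, ordered by workload reduction factor."""
    def __init__(self):
        self._heap = []

    def add(self, factor, task_id, descriptor_idx):
        # O(log |SG*|) insertion; factor is negated for max-heap behaviour
        heapq.heappush(self._heap, (-factor, task_id, descriptor_idx))

    def peek_max(self):
        # O(1): the descriptor with the maximum workload reduction factor
        neg_factor, task_id, idx = self._heap[0]
        return -neg_factor, task_id, idx

    def pop_max(self):
        # O(log |SG*|) removal of that descriptor
        neg_factor, task_id, idx = heapq.heappop(self._heap)
        return -neg_factor, task_id, idx

sg_star = CandidateSet()
sg_star.add(1.8, task_id=0, descriptor_idx=2)
sg_star.add(2.4, task_id=1, descriptor_idx=2)
print(sg_star.peek_max())   # (2.4, 1, 2)
```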
7.4.4 Reverse-direction version So far, we have described the proposed ALG algorithm; we now present its reverse-direction version (ALG-R). While the ALG algorithm works from the smallest possible code size of each task toward the largest possible code size, the ALG-R algorithm works the other way around, i.e., from the largest possible code size of each task toward the smallest possible code size. The ALG-R algorithm works as follows:
• Phase 1. The first phase begins with the largest code size possible for each task by using the last size/cycle tradeoff for each task, i.e., V = ⟨N_1^{SG}, N_2^{SG}, ..., N_n^{SG}⟩. In this phase, the algorithm tries to reach a point where all constraints are met. It selects the size/cycle tradeoff sg_{i,j} with the minimum value of Δw_{i,j}/|Δs_{i,j}| (the workload increase factor), where Δw_{i,j} = w_{i,j} − w_i and Δs_{i,j} = s_{i,j} − s_i. (Recall that the ALG algorithm selects a task with the maximum workload reduction factor.)
• Phase 2. Given that all constraints are met, we try to decrease the system cost function as much as possible by choosing a task with the minimum workload increase factor, as long as the chosen task does not increase the system cost function.
• Assignment. The assignment step is the same as in the ALG algorithm.
7.5 Results We performed a set of experiments to assess the effectiveness of the proposed algorithm. The analysis is done by simulations, which accurately emulate the behavior of the algorithm described in Section 7.4.
Table 7.1: Benchmark Programs Used in the Experiments

Name              Source          Description                                                        Number of tradeoff data
crc               SNU-RT          32-bit CRC checksum computation                                    7
fir               SNU-RT          finite impulse response filter                                     10
jfdctint          SNU-RT          integer discrete cosine transform for the JPEG image compression   5
ludcmp            SNU-RT          solution to 10 simultaneous linear equations by LU decomposition   5
matmul            SNU-RT          multiplication of two 5 × 5 integer matrices                       7
adpcm.rawcaudio   MiBench         adaptive differential pulse code modulation speech encoding        3
G.721.encode      MediaBench      CCITT G.721 voice compression                                      7
blowfish          public domain   symmetric block cipher used for data encryption                    8
Section 7.5.1 describes the experimental setup used for the performance analysis, and Section 7.5.2 presents the results of the simulations.
7.5.1 Simulation setup For the simulations, we obtained a set of benchmark programs from three different benchmark suites: the SNU-RT real-time benchmark suite [1], MiBench [30], and MediaBench [47]. One exception is the blowfish program, which was obtained from public domain sources. Table 7.1 lists the programs used in the simulations. In the table, the first column shows the name of each benchmark, while the second one denotes the source of each
program. The third column gives a brief description of each benchmark program, and the final column shows the number of elements in each benchmark program's execution descriptor list (K_i). Note that the number of execution descriptors differs from one program to another, as it depends on the characteristics of the specific benchmark program.
For the simulations, we have five simulation parameters, as follows:
• The number of tasks (n). We set n equal to 2, 3, ..., and 8.
• The tightness of the code size constraint (r_S). We set r_S equal to 0.2, 0.4, ..., and 1.0 to determine the upper bound on the system code size S̄ as follows:

S̄ = r_S · (S_max − S_min) + S_min,

where S_max and S_min represent the maximum and minimum possible system code sizes, respectively. In other words, a smaller value of r_S means a tighter system code size constraint, whereas a larger value means a looser constraint, with the extreme case r_S = 1.0 corresponding to the code size not being constrained at all.
• The tightness of the energy constraint (r_E). Similarly to r_S, we set r_E equal to 0.2, 0.4, ..., and 1.0 to determine the upper bound on the system energy consumption Ē as follows:

Ē = r_E · (E_max − E_min) + E_min,

where E_max and E_min denote the maximum and minimum possible system energy consumption, respectively.
• The initial system utilization (r_U). We define the initial system utilization r_U as the utilization of the task set when the execution descriptor assignment vector is V = ⟨1, 1, ..., 1⟩ and the frequency assignment vector is F = ⟨f_max, f_max, ..., f_max⟩. The initial system utilization is adjusted by setting the period p_i of each task τ_i to p_i = c_{i,1} · n/r_U, and we set r_U equal to 0.2, 0.4, ..., and 1.0.
• The relative importance of the system code size and the system energy consumption in the cost function (α and β). To vary the relative importance of the two optimization criteria, we let α = 0.0, 0.2, ..., 1.0 and β = 1.0 − α.
For each value of the number of tasks, n = 2, 3, ..., 8, we generated 30 different task sets in a random manner from the set of benchmark programs. (To obtain 30 different task sets, we added a modified crc program when n = 2, 6, and 7, and modified crc and matmul programs when n = 8.) For each task set, the other four simulation parameters, combined together, give a total of 750 different simulation cases: since r_S, r_E, and r_U each take five different values, and the combination of α and β takes six distinct values, the combination of all these values results in 5 × 5 × 5 × 6 = 750 distinct cases.
Three performance metrics are used to measure the effectiveness of the proposed algorithm. One is the feasible solution percentage, which is the percentage of simulation cases
where an algorithm finds a feasible solution. Another is the optimal solution percentage, which is the percentage of simulation cases where an algorithm finds an optimal solution. Here, the optimal solution is defined as an assignment that gives the minimum value of the objective function, found by an exhaustive search over all possible combinations of the assignment. The final metric is the closeness, which represents how close the solution of an algorithm is to an optimal solution. The closeness of a solution z to the optimal solution z* is defined as follows:

\mathrm{closeness}(z) = \frac{z}{z^*}.    (7.14)
We computed the mean closeness of an algorithm's solutions over its feasible solutions only. We simulated the two heuristic algorithms over all the simulation cases for each number of tasks, and also performed an exhaustive search to find an optimal solution.
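As a rough sketch of how the 750 cases per task set arise, and of the derived quantities used above, the following Python fragment enumerates the parameter grid, computes a constraint bound of the form r·(max − min) + min, and evaluates the closeness metric of Eq. (7.14). The names are ours; this is not the authors' simulator.

```python
from itertools import product

r_values = [0.2, 0.4, 0.6, 0.8, 1.0]        # values taken by r_S, r_E and r_U
alphas   = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # beta = 1.0 - alpha

def bound(r, lo, hi):
    # e.g. S_bar = r_S * (S_max - S_min) + S_min (same form for E_bar)
    return r * (hi - lo) + lo

def closeness(z, z_opt):
    # Eq. (7.14): objective value of a solution relative to the optimum
    return z / z_opt

cases = list(product(r_values, r_values, r_values, alphas))
print(len(cases))                 # 5 * 5 * 5 * 6 = 750 cases per task set
print(bound(0.4, lo=10.0, hi=30.0), closeness(10.2, 10.0))
```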
7.5.2 Simulation results Table 7.2 presents the results showing the impact of the number of tasks (n) on the effectiveness of the two algorithms, our proposed algorithm (ALG) and its reverse-direction version (ALG-R). Both algorithms generate near-optimal solutions, with an average closeness no greater than 1.001, while ALG-R performs slightly better than ALG in terms of the worst-case closeness. The worst-case closeness of ALG ranges from 1.014 to 1.236 and that of ALG-R ranges from 1.009 to 1.122.

Table 7.2: Impact of the Number of Tasks on the Solution Quality

                    Closeness Mean (Std. Dev.)    Worst-Case Closeness    Optimal Solution (%)    Feasible Solution (%)
Number of Tasks     ALG          ALG-R            ALG       ALG-R         ALG      ALG-R          ALG      ALG-R
2                   1.000 (4%)   1.000 (3%)       1.236     1.185         90.9     88.0           95.8     95.7
3                   1.001 (5%)   1.001 (3%)       1.156     1.122         88.1     84.2           98.2     97.9
4                   1.001 (4%)   1.001 (3%)       1.089     1.057         87.8     83.0           99.5     99.1
5                   1.001 (3%)   1.000 (2%)       1.056     1.032         87.4     81.1           99.7     99.5
6                   1.000 (2%)   1.000 (1%)       1.029     1.015         87.4     81.9           99.9     99.9
7                   1.000 (1%)   1.000 (1%)       1.017     1.011         87.3     83.0           100.0    100.0
8                   1.000 (1%)   1.000 (1%)       1.014     1.009         86.8     81.5           100.0    100.0

The two
algorithms perform similarly in finding feasible solutions, but ALG performs slightly better than ALG-R in finding optimal solutions. The optimal solution percentage of ALG ranges from 86% to 90%, while that of ALG-R ranges from 81% to 88%.
As the number of tasks increases, the algorithms generate worst-case solutions closer to the optimal one. With a large number of tasks, the algorithms have many different possible selections for the execution descriptor assignment. Therefore, even if the algorithms make sub-optimal decisions, the solutions are more likely to be close to the optimal one, since the algorithms have many choices of tasks whose code size can be increased in return for reduced workload. On the other hand, the optimal solution percentage of the two algorithms decreases as the number of tasks increases. This can be explained as follows. When there are a large number of tasks, the solution space is large, i.e., there exist many different feasible solutions.

Table 7.3: Impact of the Tightness of the Code Size Constraint on the Solution Quality

        Closeness Mean (Std. Dev.)        Worst-Case Closeness    Optimal Solution (%)    Feasible Solution (%)
r_S     ALG             ALG-R             ALG       ALG-R         ALG      ALG-R          ALG      ALG-R
0.20    1.000 (0.002)   1.001 (0.002)     1.022     1.016         83.7     73.8           98.7     97.3
0.40    1.001 (0.003)   1.001 (0.002)     1.056     1.032         85.2     77.5           100.0    100.0
0.60    1.001 (0.003)   1.000 (0.002)     1.056     1.032         86.7     80.1           100.0    100.0
0.80    1.001 (0.003)   1.000 (0.002)     1.056     1.032         89.1     81.1           100.0    100.0
1.00    1.001 (0.003)   1.000 (0.002)     1.056     1.032         91.5     93.1           100.0    100.0

Table 7.4: Impact of the Tightness of the Energy Constraint on the Solution Quality

        Closeness Mean (Std. Dev.)        Worst-Case Closeness    Optimal Solution (%)    Feasible Solution (%)
r_E     ALG             ALG-R             ALG       ALG-R         ALG      ALG-R          ALG      ALG-R
0.2     1.002 (0.005)   1.001 (0.004)     1.056     1.032         75.3     73.2           99.0     97.7
0.4     1.001 (0.002)   1.000 (0.001)     1.023     1.013         86.1     79.5           99.7     99.7
0.6     1.001 (0.003)   1.000 (0.002)     1.021     1.016         89.9     83.1           100.0    100.0
0.8     1.000 (0.001)   1.000 (0.001)     1.010     1.012         95.1     87.7           100.0    100.0
1.0     1.000 (0.001)   1.000 (0.001)     1.010     1.012         89.8     82.2           100.0    100.0

Therefore, many execution descriptor assignments are possible that are different from the optimal one and yet very close to it, which leads to situations in which
both algorithms generate solutions different from the optimal one. However, even in such cases, the magnitude of the closeness is negligibly small, which indicates that the algorithms consistently derive near-optimal solutions.
To assess the impact of the other parameters, we summarize the simulation results in a number of different ways. In doing this, we fixed the number of tasks to n = 5, while all the parameters other than the one being considered are varied in the way explained in Section 7.5.1. The simulation results are averaged over all the combinations of the varying parameters.

Table 7.5: Impact of the System Utilization on the Solution Quality

        Closeness Mean (Std. Dev.)        Worst-Case Closeness    Optimal Solution (%)    Feasible Solution (%)
r_U     ALG             ALG-R             ALG       ALG-R         ALG      ALG-R          ALG      ALG-R
0.2     1.001 (0.003)   1.000 (0.002)     1.056     1.032         87.3     81.2           99.7     99.5
0.4     1.001 (0.003)   1.000 (0.002)     1.056     1.032         87.2     81.1           99.7     99.5
0.6     1.001 (0.003)   1.000 (0.002)     1.056     1.032         87.3     81.1           99.7     99.5
0.8     1.001 (0.003)   1.000 (0.002)     1.056     1.032         87.2     81.1           99.7     99.5
1.0     1.001 (0.003)   1.000 (0.002)     1.056     1.032         87.2     81.1           99.7     99.5

First, we analyzed the impact of the tightness of the system code size constraint on the
solution quality by varying r_S. The results are shown in Table 7.3. When the constraint on the system code size is tight, i.e., when the value of r_S is small, the algorithms must efficiently reduce the workload of the task set by distributing the small additional code space to the appropriate tasks. On the other hand, when the constraint is loose, i.e., when the value of r_S is large, the algorithms are free to select a task whose code size is to be increased for reduced workload. Therefore, the optimal solution percentage of the proposed algorithm becomes higher as the value of r_S increases.
Table 7.4 shows the simulation results according to the variation in the tightness of the system energy consumption constraint. The results show that the closeness of both algorithms is slightly larger when the energy constraint is tight than when it is loose, but overall remains largely insensitive to the tightness of the energy consumption constraint. On the other hand, the worst-case closeness of the algorithms gets smaller and their optimal solution percentage becomes higher as the value of r_E increases.

Table 7.6: Impact of the Relative Importance of the System Code Size and the System Energy Consumption on the Solution Quality

        Closeness Mean (Std. Dev.)        Worst-Case Closeness    Optimal Solution (%)    Feasible Solution (%)
α       ALG             ALG-R             ALG       ALG-R         ALG      ALG-R          ALG      ALG-R
0.0     1.000 (0.001)   1.001 (0.002)     1.007     1.012         80.5     37.5           99.7     99.5
0.2     1.000 (0.000)   1.000 (0.001)     1.005     1.006         97.7     94.6           99.7     99.5
0.4     1.000 (0.001)   1.000 (0.001)     1.015     1.011         97.3     98.1           99.7     99.5
0.6     1.000 (0.002)   1.000 (0.001)     1.028     1.018         92.7     95.2           99.7     99.5
0.8     1.001 (0.003)   1.000 (0.002)     1.042     1.025         83.6     87.0           99.7     99.5
1.0     1.002 (0.006)   1.001 (0.004)     1.056     1.032         71.6     74.5           99.7     99.5

Table 7.5 summarizes the results according to the varying initial system utilization. The system utilization constrains the optimization problem in the form of the tightness of the timing
constraint for each task. Since the EDF scheduling algorithm is able to meet the deadlines of all the tasks provided that the system utilization (under the maximum frequency setting) is under 1.0, the system utilization has a direct relationship with the schedulability of the task set and thus the timing constraints. The results do not differ significantly depending on the initial system utilization. That is, the timing constraint does not have a great influence on the solution quality, since it does not affect the procedure of minimizing the objective function once all the tasks are guaranteed to meet their respective deadlines.
Finally, Table 7.6 shows the results according to the relative importance of the system code size and the system energy consumption. When α = 0.0 and β = 1.0 − α = 1.0, the objective function consists only of the system energy consumption, and the optimization goal is to minimize the system energy consumption. On the other hand, when α = 1.0 and β = 0.0, the objective function denotes only the system code size, and the optimization goal is solely to minimize the system code size. As shown in the table, the relative importance given by the system developer does not have a great impact on the solution
quality, in terms of both the worst-case closeness and the optimal solution percentage. Two cases notably differ from the others, namely α = 1.0 and α = 0.0, where both algorithms show a smaller optimal solution percentage. These correspond to the cases where the optimization goal is merely the system code size or the system energy consumption alone. Since the algorithms use the workload reduction (increase) factor as the criterion for selecting the task whose code size is increased (reduced) for a reduced (increased) workload in order to reduce a combination of the system code size and energy consumption, such performance degradation is possible when the objective function captures only the system code size or only the system energy consumption. That is, in selecting a task and increasing its code size, the algorithms give priority to the one with the maximum workload reduction factor (minimum workload increase factor), instead of the one with the smallest increase in code size or in energy consumption. Even in such cases, both algorithms' average closeness was no greater than 1.002.
7.6 Discussion So far, we have addressed the proposed optimization problem under the assumptions that (1) the operating frequency of the processor can be continuously scaled and (2) the execution descriptor list of each task has the property of monotonically non-increasing workload reduction factors. This section discusses the issues involved in extending our design framework by relaxing these two assumptions.
7.6.1 Discrete-level CPU frequency settings
In Section 2.1, we assumed that the clock frequency of the processor can be set at a continuous level. However, using continuously variable clock frequencies is infeasible, because it requires significant power and hardware cost [37]. In other words, real processors usually provide a finite number of operating frequencies, and clock frequencies cannot be continuously scaled. To sustain acceptable performance and timeliness guarantees, such processors have to operate at the next higher energy-efficient operating frequency f′ if a desired frequency is not available [70]. That is, when the processor has Q levels of clock frequency (i.e., f′ ∈ {F_1, F_2, ..., F_Q} and f_max = F_Q), the next higher clock frequency f′ can be determined as
f' = \mathrm{MIN}\,\{\, F_i \mid U(F_Q) \cdot F_Q \le F_i,\ 1 \le i \le Q \,\}.    (7.15)
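Equation (7.15) simply rounds the ideal utilization-1.0 operating point up to the nearest available frequency level. A minimal sketch, assuming a plain list of available frequencies:

```python
def next_higher_frequency(freq_levels, U_at_fmax):
    """Eq. (7.15): smallest available level F_i with U(F_Q) * F_Q <= F_i,
    where F_Q = max(freq_levels) is the maximum frequency."""
    f_ideal = U_at_fmax * max(freq_levels)
    return min(f for f in freq_levels if f >= f_ideal)

print(next_higher_frequency([0.4, 0.6, 0.8, 1.0], U_at_fmax=0.55))  # -> 0.6
```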
Although the schedule of the task set is feasible under f′, it may not result in optimal energy efficiency. When the resulting processor utilization is less than 1.0 (i.e., U(f′) < 1.0), the processor is not fully utilized and there are idle times in the task schedule. In this case, some tasks may have a chance to lower their clock frequencies by one more step, so that tasks may have different clock frequencies in order to improve the energy efficiency. Several DVS algorithms have been proposed for this problem [37, 44]. However, these algorithms cannot be directly used for the periodic task model. Here, we formulate the problem as a well-known 0/1 knapsack problem, which can be solved using a traditional search algorithm.
Where the clock frequency for task τ_i is

f_i = { F_{q_i} | F_{q_i} ∈ {F_1, F_2, ..., F_Q}, 1 ≤ q_i ≤ Q },

the set of current clock frequencies for the tasks is

F = {f_1, ..., f_n} = {F_{q_1}, ..., F_{q_n}}.

Our goal is to find a set of tasks that can be scheduled with the next lower clock frequency (F_{q_i − 1}) while a feasible schedule of the tasks is still guaranteed (U_{F′} ≤ 1, where U_{F′} is the processor utilization when the tasks are scheduled with F′). Let the solution be L = {l_i | l_i ∈ {0, 1}, 1 ≤ i ≤ n}. If τ_i can be scheduled with the next lower clock frequency, then l_i is set to 1; otherwise, to 0. The problem can then be formulated as follows.
• Find L = {l_1, ..., l_n}
  – (Objective) which maximizes
    U' = \sum_{i=1}^{n} \frac{t_i}{p_i} = \sum_{i=1}^{n} \frac{c_i}{p_i \cdot F_{q_i - l_i}}
  – (Constraints) while satisfying
    U′ ≤ 1 and q_i − l_i > 0, where ∀i ∈ [1, n], l_i ∈ {0, 1}
  – (Assignment) When the solution L is found, the clock frequency for each task is adjusted as
    ∀i ∈ [1, n], q_i ← q_i − l_i and f_i = F_{q_i}.
Even if the tasks are scheduled with these lowered clock frequencies, the task set may not fully utilize the processor (i.e., U_{F′} < 1). In this case, we may iterate the above procedure until no task can be scheduled with a further lowered clock frequency.
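For small task counts, the 0/1 selection above can be solved directly. The brute-force Python sketch below (our own illustration with made-up data structures, not the dissertation's solver) enumerates the vectors L, keeps those satisfying U′ ≤ 1 and q_i − l_i ≥ 1, and returns the one that maximizes U′:

```python
from itertools import product

def lower_frequencies(tasks, levels):
    """tasks[i] = {"c": c_i, "p": p_i, "q": q_i}, where q_i is a 1-based index
    into the sorted frequency list `levels`.  Returns (best L, resulting U')."""
    def util(L):
        return sum(t["c"] / (t["p"] * levels[t["q"] - l - 1])
                   for t, l in zip(tasks, L))

    best_L, best_U = None, -1.0
    for L in product((0, 1), repeat=len(tasks)):       # exhaustive 0/1 search
        if any(t["q"] - l < 1 for t, l in zip(tasks, L)):
            continue                                   # no lower level available
        U = util(L)
        if U <= 1.0 and U > best_U:
            best_L, best_U = L, U
    return best_L, best_U

levels = [0.4, 0.6, 0.8, 1.0]                          # F_1 .. F_Q
tasks = [{"c": 2.0, "p": 10.0, "q": 4},                # currently at 1.0
         {"c": 1.0, "p": 8.0,  "q": 3}]                # currently at 0.8
print(lower_frequencies(tasks, levels))                # ((1, 1), ~0.458)
```

A dynamic-programming knapsack formulation would replace the exhaustive loop for larger n.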
7.6.2 Non-convex property of size/cycle tradeoff list
So far, we have assumed that the size/cycle tradeoff list of each task has the property of monotonically non-increasing workload reduction factors. The proposed algorithm presented in Section 7.4 needs to examine only one size/cycle tradeoff per task because of this property. That is, the candidate set X = {x_{i,j} | i ∈ [1, n], j = v_i + 1} contains only one size/cycle tradeoff per task, namely the one immediately following the tradeoff x_{i,v_i} that is currently assigned in the iteration of the proposed algorithm. However, when we relax this assumption on the size/cycle tradeoff list, the algorithm must explore an extended candidate set X* = {x_{i,j} | i ∈ [1, n], j ∈ [v_i + 1, K_i]} that lists all the size/cycle tradeoffs of a task that have larger code sizes than the one currently assigned for that task
(i.e., x_{i,v_i}). That is, by switching X with X* in the proposed algorithm, we can solve the same problem of finding the size/cycle tradeoff assignment vector without assuming any property of the size/cycle tradeoff list of each task. This modified algorithm still has a significantly lower time complexity than an exhaustive search. If we let K denote the sum of the numbers of size/cycle tradeoffs over all the tasks, i.e., K = Σ_{i=1}^{n} K_i, then the worst-case number of evaluations of size/cycle tradeoffs is given by Σ_{i=1}^{K} (K − i) = (K² − K)/2. Since K is proportional to the number of tasks n, the time complexity of the modified algorithm is O(n²), while that of the algorithm presented in Section 7.4 is linear in the number of tasks. Note that this complexity is still polynomial in the number of tasks, whereas that of an exhaustive search, Π_{i=1}^{n} K_i, is exponential in the number of tasks.
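The only change required is the candidate set scanned by each greedy step. A small sketch (a hypothetical helper of ours, not the dissertation's code) contrasting the two cases:

```python
def candidate_set(v, K, convex=True):
    """v[i]: 1-based index of the tradeoff currently assigned to task i;
    K[i]: number of tradeoffs of task i.  Returns the (task, tradeoff index)
    pairs the greedy step must examine."""
    if convex:
        # monotone reduction factors: only the immediately following tradeoff
        return [(i, v[i] + 1) for i in range(len(v)) if v[i] < K[i]]
    # general case: every tradeoff with a larger code size than the current one
    return [(i, j) for i in range(len(v)) for j in range(v[i] + 1, K[i] + 1)]

print(candidate_set([1, 2], [4, 4], convex=True))    # [(0, 2), (1, 3)]
print(candidate_set([1, 2], [4, 4], convex=False))   # [(0, 2), (0, 3), (0, 4), (1, 3), (1, 4)]
```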
7.7 Extension I: Component Abstraction In the open environment [16], a component can be developed independently. In this environment, the resource utilizations of the individual workloads within a component can be determined independently, i.e., without regard to other components. In this section, we consider a component-level version of the SETO problem, which decides the size/cycle descriptors and CPU frequencies of the individual workloads of a component while minimizing a component-level cost function subject to component-level real-time and resource constraints. For this problem, we first present a component interface model, which serves as a solution model, and then propose an algorithm to solve the problem.
In this section, we assume that the processor supports a finite number of variable voltage and clock frequency settings, where the operating frequency of the clock is proportional to the supply voltage. We assume that the processor provides 4 pairs of clock frequency and voltage, i.e., the CPU clock frequency is one of 1.0, 0.8, 0.6, or 0.4.
7.7.1 Component and Interface Models We define a component as C⟨W, A⟩, where W is a set of workloads, i.e., W = {T_1, ..., T_n}, and A is a scheduling algorithm for W. The scheduling algorithm is assumed to be either EDF or RM. We define a component interface model that can abstract the timing and resource properties of components. Our proposed component interface is defined as CI = ⟨P, C, S, F⟩, where
• P is the component period,
• C is the component execution time requirement,
• S is the component code size, and
• F is the component CPU frequency.
For a component C⟨W, A⟩, we explain how to derive its component interface CI. We assume that for each task T_i ∈ W, its size/cycle descriptor sg_i is decided from the size/cycle tradeoff list SG_i such that s_i = s_{i,j} and g_i = g_{i,j} for some j, and its CPU frequency f_i is decided such that f_i = f* for some f*. Assuming the value of P is given, we can derive
the component interface CI as follows (the PCA method is defined in Section 5.4.1):

C = \mathrm{PCA}(P, \langle W, A\rangle), \quad S = \sum_{i=1}^{n} s_i, \quad\text{and}\quad F = f^*.

For a component interface CI, let U and E denote its component processor utilization and energy consumption, respectively, defined as

U = C/P \quad\text{and}\quad E = \frac{\kappa C F^2}{P}.    (7.16)
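A component interface and the two derived quantities of Eq. (7.16) fit naturally into a small data type. The sketch below is ours (the PCA computation of Section 5.4.1 is outside its scope and is represented by a pre-computed value of C):

```python
from dataclasses import dataclass

@dataclass
class ComponentInterface:
    P: float   # component period
    C: float   # component execution time requirement, C = PCA(P, <W, A>)
    S: float   # component code size (sum of the tasks' code sizes)
    F: float   # component CPU frequency f*

    def utilization(self) -> float:
        return self.C / self.P                          # U = C / P

    def energy(self, kappa: float) -> float:
        return kappa * self.C * self.F ** 2 / self.P    # E = kappa * C * F^2 / P

ci = ComponentInterface(P=10.0, C=4.2, S=7.3, F=0.8)    # hypothetical values
print(ci.utilization(), ci.energy(kappa=1.0))
```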
7.7.2 Component Abstraction Problem: CAP-USE We define a problem, called the CAP-USE (component abstraction problem - utilization, size, and energy) optimization problem, that decides the processor utilization, code size, and processor energy consumption of a component while minimizing a cost function subject to the component's real-time and resource constraints. We define this problem more formally as follows: given a component C⟨W, A⟩ and a component period P, the problem is to determine the size/cycle assignment vector V and the CPU frequency assignment vector F and then derive the component interface CI such that the cost function

f(U, S, E) = \alpha S + \beta E + \gamma U    (7.17)

is minimized subject to

U \le \bar{U}, \quad S \le \bar{S}, \quad\text{and}\quad E \le \bar{E},    (7.18)
where the coefficients α, β, and γ are constants, which we assume the system designer provides to indicate the relative importance of processor utilization, code size, and energy consumption, respectively. Inequalities (7.18) express the processor utilization, size, and energy constraints, respectively. We note that as long as the component receives processor allocations according to the periodic requirement imposed by ⟨P, C⟩, the component's timing requirements are satisfied. That is, U is the minimum processor utilization requirement necessary to guarantee the schedulability of the component. When U ≤ Ū, both the real-time and the processor utilization constraints are satisfied. Compared with the SETO problem, defined in Section 7.3.1, the CAP-USE optimization problem takes the processor utilization into account in its cost function and constraints. When the processor is not dedicated to a component, it is meaningful to place an upper bound on the processor utilization of the component, just as for other resources such as code size and energy. In this sense, it is also meaningful to add the resource cost of utilizing the processor to the cost function. We note that in the SETO problem, which is a system-level design optimization problem, the processor is dedicated to the workload set of the system.
7.7.3 Algorithm for the Component Abstraction Problem We propose an algorithm that addresses the CAP-USE optimization problem efficiently. The proposed algorithm is a greedy algorithm which iteratively reduces the cost function
by choosing a path in such a way that the chosen path reduces it the most among all possible candidate paths. The proposed algorithm consists of two distinct phases. In the first phase, it tries to satisfy the constraints imposed on timing behavior, processor utilization, code size, and energy consumption, while the second phase is aimed at minimizing the cost function.
Phase 1: Satisfying the timing and energy constraints. The goal of the first phase is to find V and F such that all the constraints are met. Initially, the algorithm begins with the smallest code size, the largest WCEC, and the fastest CPU frequency for each task. That is, the size/cycle assignment vector V and the CPU frequency assignment vector F are initialized to V = ⟨1, 1, ..., 1⟩ and F = ⟨f_max, f_max, ..., f_max⟩, where f_max = 1.0. These initial settings give the component the smallest possible code size and the largest energy consumption. From these initial settings, aiming at meeting the timing, energy, and processor utilization constraints, the algorithm iteratively transforms either the size/cycle assignment vector V or the CPU frequency assignment vector F as follows:
• In transforming the assignment vector V, it selects a task whose code size is to be increased, provided the component code size does not exceed the given upper bound. In return for the increased code size of the selected task, the WCEC of the task is reduced, which not only makes the component more likely to be schedulable but also reduces the component energy consumption. Specifically, assuming the current size/cycle assignment vector is V = ⟨v_1, v_2, ..., v_n⟩, each iteration of the algorithm selects a task T_i from the task set W, increases v_i by 1 to j, where j ≤ N_i^{SG}, and assigns to T_i its code size and WCEC such that s_i = s_{i,j} and g_i = g_{i,j}. The selection is made in such a way that the cost function is reduced as much as possible by the transformation.
• In transforming the CPU frequency assignment vector F = ⟨f_1, f_2, ..., f_n⟩, if f_i is not equal to the minimum possible CPU frequency, the algorithm makes f_i one level slower by decreasing it by 0.2, for all i = 1, 2, ..., n.
In each iteration, the algorithm applies whichever of the two transformations (of V or of F) reduces the cost function more. This iteration continues until all the constraints are met or no more transformations can be made.
Phase 2: Minimizing the cost function. Once the timing, energy, and processor utilization constraints are met, the first phase is terminated and the second phase begins. In the second phase, the algorithm continues transforming either the size/cycle assignment vector V or the CPU frequency assignment vector F in the same way as in the first phase. The algorithm terminates when any further transformation would violate one of the constraints or when no further transformation can reduce the cost function.
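The distinguishing step of this greedy procedure is that each iteration evaluates two candidate moves and applies whichever lowers the cost f(U, S, E) more. The Python sketch below shows only that decision; the transformation hooks and the toy state are stand-ins we introduce for illustration, not the dissertation's code.

```python
def greedy_step(state, cost, descriptor_move, frequency_move):
    """One iteration: try the best size/cycle move and the one-level frequency
    reduction, keep whichever yields the lower cost, or stop if neither helps.
    Each move hook returns a transformed copy of the state, or None."""
    candidates = [s for s in (descriptor_move(state), frequency_move(state)) if s is not None]
    if not candidates:
        return None                          # no transformation left
    best = min(candidates, key=cost)
    return best if cost(best) < cost(state) else None

# toy demonstration with trivial stand-in hooks
state0 = {"cost": 10.0}
cost = lambda s: s["cost"]
descriptor_move = lambda s: {"cost": s["cost"] - 2.0}   # pretend the V-move saves 2
frequency_move  = lambda s: {"cost": s["cost"] - 3.0}   # pretend the F-move saves 3
print(greedy_step(state0, cost, descriptor_move, frequency_move))   # {'cost': 7.0}
```

In the full algorithm, the constraint checks of Phase 1 and Phase 2 wrap around this step.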
Time complexity of the algorithm. The algorithm has a substantially lower time complexity than an exhaustive search. The number of iterations of the proposed algorithm is at most Σ_{i=1}^{n} (N_i^{SG} − 1) in the worst case, which is linear in the number of tasks. During each iteration, there are at most n + 1 evaluations of the cost function. Therefore, the algorithm requires O(n²) time to find a solution. An exhaustive search, in contrast, would require Π_{i=1}^{n} N_i^{SG} evaluations, which is exponential in the number of tasks and thus impractical.

Figure 7.3: Composing task-level code size vs. execution time tradeoffs into a component-level tradeoff. (Figure omitted; both the task-level and the component-level panels plot code size against execution time.)
7.8 Extension II: Compositional Approach to the SETO Problem Component-based real-time embedded systems may include a hierarchical scheduling framework, where components include their own schedulers and form a scheduling hierarchy. In such component-based systems, it is desirable to address design problems compositionally, i.e., a system-level design problem can be decomposed into component-level design problems, and a solution to the system-level design problem can be obtained through the solutions to the component-level design problems. As defined in Section 7.3, the SETO
problem is one of the design problems for real-time embedded systems; it determines the code size, WCEC, and CPU frequency of each task from its own candidate sets while optimizing a total cost on resource use. In this section, we present an approach to compositionally address the SETO problem. Our approach focuses on obtaining a component-level code size vs. execution time tradeoff list by combining task-level code size vs. execution time tradeoff lists, as illustrated in Figure 7.3.
7.8.1 Component and Interface Models A component is defined as C⟨W, A⟩, where W is a workload set, i.e., W = {T_1, ..., T_n}, and A is a scheduling algorithm, assumed to be either EDF or RM. A component interface model is defined as CI′⟨P, SCF⟩, where
• P is the component period, and
• SCF is the component size/time/frequency candidate set.
We first define the component size/time tradeoff list SC as a list of pairs ⟨S, C⟩, i.e., SC = {⟨S, C⟩}, where S is the component code size and C is the component execution time requirement. We then define SCF as a set of pairs of SC and F, i.e., SCF = {⟨SC, F⟩}, where SC is a component size/time tradeoff list and F is a component CPU frequency.
7.8.2 Component Abstraction Problem: CAP-SCF We define a problem called the CAP-SCF (component abstraction problem - size, execution time, CPU frequency) problem. Intuitively, this problem is, for each combination of code size s_i, WCEC g_i, and CPU frequency f_i of every task T_i, to combine the collective task-level code sizes and execution times into a component-level code size S and execution time C, as shown in Figure 7.3. We can obtain the component-level execution time C, assuming the value of the component period P is given, through the PCA method (defined in Section 5.4.1).

Figure 7.4: The size/cycle tradeoff list SC: (a) EDF and (b) RM scheduling. (Figure omitted; both panels plot code size (K) against WCEC (K), with elements marked as convex, inbetween, or dominated.)

We define the problem more formally as follows: given a component C⟨W, A⟩ and a component period P, the problem is to develop a component interface CI′⟨P, SCF⟩, where SCF = {⟨SC_k, F_k⟩}, that abstracts the timing and resource properties of the component C such that
\forall \langle SC_k, F_k \rangle \in SCF,\ \forall \langle S, C \rangle \in SC_k,\ \forall T_i \in W,\ \exists \langle s_i, g_i \rangle \in SG_i:

S = \sum_{T_i \in W} s_i \quad\text{and}\quad C = \mathrm{PCA}(P, \langle W', A \rangle), \text{ where } W' = \{\, T_i'\langle p_i', c_i' \rangle \mid p_i' = p_i \wedge c_i' = g_i/F_k \,\}.    (7.19)
We classify the elements of SC_k into three categories. An element ⟨S, C⟩ of SC_k is said
to be dominated if there exists another element ⟨S′, C′⟩ ∈ SC_k with S′ ≤ S and C′ ≤ C.
An element ⟨S_m, C_m⟩ of SC_k is said to be convex, where 1 < m < N^{SC_k}, if

\forall i \in [1, m-1]\ \forall j \in [m+1, N^{SC_k}]: \quad \frac{|S_m - S_i|}{C_m - C_i} \le \frac{|S_j - S_m|}{C_j - C_m}.
The first element ⟨S_1, C_1⟩ and the last element ⟨S_K, C_K⟩ are convex if they are not dominated. An element ⟨S_m, C_m⟩ of SC_k is said to be inbetween if it is neither dominated nor convex. Figure 7.4 illustrates this classification of the elements of a component size/time tradeoff list. As an example, we consider a component C⟨W, A⟩ with a workload set W = {T_1, T_2} such that p_1 = 35, p_2 = 45, and
SG_1 = {⟨2.97, 0.64⟩, ⟨2.84, 0.69⟩, ⟨2.80, 0.78⟩, ⟨2.79, 0.84⟩},
SG_2 = {⟨4.96, 1.55⟩, ⟨4.52, 1.64⟩, ⟨4.42, 1.73⟩, ⟨4.36, 1.80⟩},
and a scheduling algorithm A = EDF. Then, with P assumed to be given as 10, we can develop an interface CI′⟨P, SCF⟩ of the component C⟨W, A⟩ as follows:
SCF = {⟨SC_1, 1.0⟩, ⟨SC_2, 0.8⟩, ⟨SC_3, 0.6⟩, ⟨SC_4, 0.4⟩}.
Figure 7.4 shows all the elements of SC_1, for A = EDF and RM, with the classification explained above. We can trim a solution to the CAP-SCF problem for the above example by keeping only the five convex elements of each component size/time tradeoff list SC_k, as follows:

SC_1 = {⟨2.74, 2.19⟩, ⟨2.54, 2.28⟩, ⟨2.50, 2.33⟩, ⟨2.48, 2.40⟩, ⟨2.45, 2.58⟩},
SC_2 = {⟨3.32, 2.19⟩, ⟨3.07, 2.35⟩, ⟨3.03, 2.40⟩, ⟨3.01, 2.49⟩, ⟨2.99, 2.64⟩},
SC_3 = {⟨4.44, 2.19⟩, ⟨4.17, 2.28⟩, ⟨4.11, 2.33⟩, ⟨4.07, 2.40⟩, ⟨4.03, 2.58⟩},
SC_4 = {⟨6.66, 2.19⟩, ⟨6.12, 2.33⟩, ⟨6.07, 2.40⟩, ⟨6.01, 2.58⟩, ⟨6.00, 2.64⟩}.
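The classification above can be checked mechanically. The Python sketch below is our own helper, not the dissertation's code; domination is taken in the usual Pareto sense (another element has both a code size and an execution time that are no larger), and the convexity test is the slope-ratio inequality given earlier. The sample points are illustrative, loosely based on SC_1 with one extra made-up element.

```python
def is_dominated(k, elems):
    # another element is no larger in both code size S and execution time C
    S, C = elems[k]
    return any(Sp <= S and Cp <= C and (Sp, Cp) != (S, C)
               for j, (Sp, Cp) in enumerate(elems) if j != k)

def is_convex(m, elems):
    # elems are sorted by increasing C; first/last are convex if not dominated
    if m == 0 or m == len(elems) - 1:
        return not is_dominated(m, elems)
    Sm, Cm = elems[m]
    left  = [abs(Sm - Si) / (Cm - Ci) for Si, Ci in elems[:m]]
    right = [abs(Sj - Sm) / (Cj - Cm) for Sj, Cj in elems[m + 1:]]
    return max(left) <= min(right)

def classify(elems):
    return ["dominated" if is_dominated(k, elems)
            else "convex" if is_convex(k, elems)
            else "inbetween"
            for k in range(len(elems))]

elems = [(2.74, 2.19), (2.54, 2.28), (2.50, 2.33), (2.52, 2.36), (2.48, 2.40)]
print(classify(elems))   # ['convex', 'inbetween', 'inbetween', 'dominated', 'convex']
```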
The component interface CI′ enables us to view a component as a task, in the sense that the component has its own code size vs. execution time tradeoff just as a task does. We can then treat a component interface CI′, which is a solution to the CAP-SCF problem, as a single task. That is, we can apply the same technique to the CAP-SCF problem for an assembly of components as the one used for a single component. This allows us to address the CAP-SCF problem compositionally.
7.9 Summary In this chapter, we proposed a framework for addressing a design optimization problem, which involves the code size vs. WCEC tradeoff and the voltage vs. CPU clock speed tradeoff. We proposed an efficient algorithm for finding solutions close to optimal solutions
to the problem and evaluated the performance of the proposed algorithm. In addition, we described an approach to address the problem in a compositional way, mainly with the component interface that can abstract a component-level code size vs. WCET tradeoff from collective task-level code size vs. WCET tradeoffs.
Chapter 8 Conclusions and Future Work We close this dissertation with an enumeration of several remaining challenges to our proposed approach. We then present research problems that may be addressed in future work. Finally, we conclude.
8.1 Outstanding Problems Although our proposed framework provides a promising foundation for the design and analysis of component-based real-time embedded systems, a number of outstanding problems must be addressed to deploy our framework over the current and future component-based real-time embedded systems.
• Task and Component Dependencies. In our proposed framework, we assume that real-time workloads are independent. However, control and data dependencies impose precedence constraints among tasks, and timing constraints of tasks are not
independent. In some cases, system-level (end-to-end) timing requirements can be given for a task set, and the individual tasks in the task set work collectively to meet the system-level timing requirements. The PCM (Period Calibration Method) framework [25] has been proposed to transform the end-to-end timing requirements of a task set into timing parameters of the individual tasks such that the individual tasks collectively satisfy the system-level timing requirements while minimizing the system resource utilization. Component interfaces, which specify the collective timing requirements of components, can be developed for workloads with precedence constraints by employing the techniques proposed in the PCM framework. We also assume that components are independent. In some cases, however, components can have precedence constraints according to their control and data dependencies. Our proposed framework can be extended to support precedence constraints among components by addressing the following problems: (1) developing component interfaces that can specify the precedence constraints among components, and (2) achieving schedulability analysis through such component interfaces.
• Resource Access Control. We mainly focus on processor scheduling. In addition to a processor, however, each workload may require other resources in order to execute. Resource contention can affect the execution behavior and schedulability of real-time workloads. Various resource access-control protocols have been proposed to reduce the undesirable effects of resource contention. For example, the priority-ceiling protocol [71, 72] has been proposed to prevent deadlocks and to reduce the
blocking time. The priority-ceiling protocol can be employed to address intra- and inter-component resource contention, and component interfaces can be developed to describe resource contention behavior within and across components.
• Preemption Overhead. In this dissertation, we assume that preemption incurs no additional cost in processor utilization and energy consumption. However, preemption overheads should be considered in practical settings. For example, the optimality of solutions to the component timing abstraction problem, described in Section 5.4, is defined as minimizing the resource utilization of the solutions. This notion of optimality can be extended to include minimizing preemption overheads. The issues of modeling and bounding preemption overheads remain for future research. In DVS (dynamic voltage scaling) techniques, when the CPU clock speed is lowered with the aim of reducing the energy consumption, task execution times are extended accordingly. Therefore, a lower-priority task might be preempted more often by higher-priority tasks than when it is executed at a higher frequency. If preemptions are frequent due to the lengthened task execution times, cache-related preemption costs can amount to a significant portion of the energy consumption in memory subsystems. Our proposed framework can be extended to handle such a problem. One approach is to employ preemption-aware DVS algorithms [40]. These algorithms control task preemptions by accelerating task execution or postponing the activation of higher-priority tasks, so that unnecessary preemptions are reduced while the energy efficiency of the DVS algorithm is retained.
• Soft Real-time Component Interface Models. Our proposed compositional real-time scheduling framework considers hard real-time workload models. However, there are many soft real-time models that can be used for describing the real-time constraints of workloads. For example, the (m, k)-firm deadline model [32], the imprecise computation model [52], the reward-based computation models [18, 7], and the weakly hard workload model [11] can be used effectively for specifying soft real-time constraints. In addition, a stochastic model [38] can be used to specify soft real-time workload models. For components that consist of soft real-time workloads, it is necessary to develop soft real-time component interface models for specifying their collective real-time requirements.
• Optimal Voltage Setting for Fixed-priority Scheduling. In our design framework for real-time embedded systems, described in Chapter 7, we addressed the SETO problem under the assumption that the workload set is scheduled by the EDF scheduling algorithm. However, a different scheduling algorithm, e.g., a fixed-priority scheduling policy such as RM (Rate Monotonic) scheduling, can be used to schedule the workload set. An optimal approach to setting the system-level CPU clock speed for all individual tasks, in terms of reducing the total energy consumption under EDF scheduling, is known [69]. Examples of approaches aiming at assigning optimal voltage settings for fixed-priority task sets are described in [69, 63, 29]. However, optimally determining the system-level CPU clock speed for all individual tasks under fixed-priority scheduling remains an open research area.
8.2 Future Work In this section, we identify several new research areas to which our proposed framework could be potentially applied.
• Schedulability Analysis for Ad-Hoc Wireless Networks. Schedulability analysis in multihop wireless networks has been an area of open research. Schedulability analysis in distributed systems is, in general, an NP-hard problem. Hence, there is no closed-form formula to quantify the exact ability of the network to guarantee the schedulability of real-time packets. Abdelzaher et al. [2] presented an initial step toward schedulability analysis in ad-hoc wireless environments. They proposed the notion of real-time capacity for quantifying the total byte-meters that can be delivered in time and derived a sufficient schedulability condition for fixed-priority packet scheduling policies. Our future work includes developing techniques for schedulability analysis in ad-hoc wireless networks. Our compositional schedulability analysis techniques could be employed for such NP-hard schedulability analysis problems as an approach to alleviating their complexity.
• Multiprocessor Scheduling. Our proposed framework supports uni-processor real-time scheduling. However, many future real-time systems are expected to deploy multiprocessor environments. Two major approaches for multiprocessor real-time scheduling are global scheduling and partitioning [19]. In a global scheduling approach, all workload instances are stored in a single queue and, at any given time,
the processors run the instances with the highest priority. A partitioning approach involves two steps: one assigns workloads to processors and the other schedules workloads on each processor. Our future work includes supporting compositional multiprocessor scheduling such that we can address global scheduling and partitioning hierarchically.
8.3 Summary In this dissertation, we proposed a framework for supporting the compositional design and analysis of real-time embedded systems, focusing on real-time scheduling and optimal resource assignment. We developed compositional schedulability analysis techniques, involving a periodic workload set under EDF/RM scheduling and a periodic/bounded-delay resource model. In addition, we provided compositional design optimization techniques for finding sub-optimal solutions in determining the usages of various scarce resources at design time. Even though our proposed framework lays the groundwork for component-based design and analysis of timing and resource aspects, we identify several outstanding problems to be addressed to put the proposed framework into practical use and suggest research areas to which our techniques can be applied for further development.
Bibliography
[1] SNU real-time benchmark suite. Technical report, http://archi.snu.ac.kr/realtime/benchmark.
[2] T. Abdelzaher, S. Prabh, and R. Kiran. On real-time capacity limits of multihop wireless sensor networks. In Proc. of IEEE Real-Time Systems Symposium, pages 359–370, December 2004.
[3] Airlines Electronic Engineering Committee. ARINC Specification 653-1. Aeronautical Radio, INC., Annapolis, MD, 2003.
[4] L. Almeida and P. Pedreiras. Scheduling within temporal partitions: response-time analysis and server design. In Proc. of the Fourth ACM International Conference on Embedded Software, September 2004.
[5] N. Audsley, A. Burns, M. Richardson, and A. Wellings. Applying new scheduling theory to static priority pre-emptive scheduling. Software Engineering Journal, 8(5):284–292, 1993.
[6] N. Audsley, A. Burns, and A. Wellings. Deadline monotonic scheduling theory and application. Control Engineering Practice, 1(1):71–78, 1993.
[7] H. Aydin, R. Melhem, D. Mosse, and P. Alvarez. Optimal reward-based scheduling for periodic real-time tasks. In Proc. of IEEE Real-Time Systems Symposium, December 1999.
[8] H. Aydin, R. Melhem, D. Mosse, and P. Alvarez. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In Proc. of IEEE Real-Time Systems Symposium, December 2001.
[9] S. Baruah, R. Howell, and L. Rosier. Algorithms and complexity concerning the preemptive scheduling of periodic, real-time tasks on one processor. Journal of Real-Time Systems, 2:301–324, 1990.
[10] S. Baruah, A. Mok, and L. Rosier. Preemptively scheduling hard-real-time sporadic tasks on one processor. In Proc. of IEEE Real-Time Systems Symposium, pages 182–190, December 1990.
[11] G. Bernat, A. Burns, and A. Llamosi. Weakly hard real-time systems. IEEE Transactions on Computers, 50(4):308–321, 2001.
[12] E. Bini and G. C. Buttazzo. The space of rate monotonic schedulability. In Proc. of IEEE Real-Time Systems Symposium, December 2002.
[13] G. Buttazzo. Hard Real-Time Computing Systems. Kluwer Academic Publishers, 2000.
[14] G. Buttazzo. Rate Monotonic vs. EDF: Judgment day. Journal of Real-Time Systems, 29(1):5–26, 2005.
[15] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-power digital CMOS design. IEEE Journal of Solid State Circuits, 27(4):473–484, April 1992.
[16] Z. Deng and J. W.-S. Liu. Scheduling real-time applications in an open environment. In Proc. of IEEE Real-Time Systems Symposium, pages 308–319, December 1997.
[17] J. K. Dey, J. Kurose, and D. Towsley. On-line scheduling policies for a class of IRIS (Increasing Reward with Increasing Service) real-time tasks. IEEE Transactions on Computers, 45(7):802–813, 1996.
[18] J. K. Dey, J. Kurose, D. Towsley, and M. Girkar. Efficient on-line processor scheduling for a class of IRIS (increasing reward with increasing service) real-time tasks. In Proceedings of ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1993.
[19] S. Dhall and C. Liu. On a real-time scheduling problem. Operations Research, 26(1):127–140, 1978.
[20] S. Eberle, C. Ebner, W. Elmenreich, G. Färber, P. Göhner, W. Haidinger, M. Holzmann, R. Huber, R. Schlatterbeck, H. Kopetz, and A. Stothert. Specification of the TTP/A protocol v2.00. Research Report 61/2001, Technische Universität Wien, Institut für Technische Informatik, Treitlstr. 1-3/182-1, 1040 Vienna, Austria, 2001.
[21] E. Farcas, C. Farcas, W. Pree, and J. Templ. Transparent distribution of real-time components based on logical execution time. In Proc. of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'05), 2005.
[22] X. Feng and A. Mok. A model of hierarchical real-time virtual resources. In Proc. of IEEE Real-Time Systems Symposium, pages 26–35, December 2002.
[23] S. Furber. ARM System Architecture. Addison Wesley, New York, NY, 1996.
[24] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[25] R. Gerber, S. Hong, and M. Saksena. Guaranteeing real-time requirements with resource-based calibration of periodic processes. IEEE Transactions on Software Engineering, 21(7):579–592, July 1995.
[26] J. Gosling, B. Joy, and G. Steele. The Java Language Specification. Addison-Wesley, Reading, MA, 1996.
[27] L. Goudge and S. Segars. Thumb: Reducing the cost of 32-bit RISC performance in portable and consumer applications. In Proc. of the 1996 COMPCON, September 1996.
[28] P. Goyal, X. Guo, and H. M. Vin. A Hierarchical CPU Scheduler for Multimedia Operating Systems. In Usenix Association Second Symposium on Operating Systems Design and Implementation (OSDI), pages 107–121, 1996.
[29] F. Gruian. Hard real-time scheduling using stochastic data and DVS processors. In Proceedings of the International Symposium on Low-Power Electronics and Design (ISLPED), pages 46–51, Huntington Beach, CA, August 2001.
[30] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the 4th IEEE Annual Workshop on Workload Characterization, December 2001.
[31] A. Halambi, A. Shrivastava, P. Biswas, N. Dutt, and A. Nicolau. An efficient compiler technique for code size reduction using reduced bit-width ISAs. In Proceedings of Design Automation and Test in Europe (DATE ’02), 2002.
[32] M. Hamdaoui and P. Ramanathan. A dynamic priority assignment technique for streams with (m, k)-firm deadlines. IEEE Transactions on Computers, 44(12):1443–1451, 1995.
[33] F. Hanssen and P. Jansen. Real-time communication protocols: an overview. Technical report, Centre for Telematics and Information Technology, 2003.
[34] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2nd edition, 1996.
[35] T. Henzinger, C. Kirsch, and S. Matic. Composable code generation for distributed Giotto. In Proc. of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’05), 2005.
[36] I. Hong, G. Qu, M. Potkonjak, and M. Srivastava. Synthesis techniques for low-power hard real-time systems on variable voltage processors. In Proc. of IEEE Real-Time Systems Symposium, pages 178–187, 1998.
[37] T. Ishihara and H. Yasuura. Voltage scheduling problem for dynamically variable voltage processors. In Proceedings of the International Symposium on Low Power Electronics and Design, pages 197–202, August 1998.
[38] K. Kim, J. Diaz, L. Bello, J. Lopez, C.-G. Lee, and S. Min. An exact stochastic analysis of priority-driven periodic real-time systems and its approximations. IEEE Transactions on Computers, 54(11):1460–1466, November 2005.
[39] W. Kim, J. Kim, and S. Min. A dynamic voltage scaling algorithm for dynamic-priority hard real-time systems using slack time analysis. In Proceedings of Design Automation and Test in Europe (DATE), 2002.
[40] W. Kim, J. Kim, and S. L. Min. Preemption-aware dynamic voltage scaling in hard real-time systems. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), pages 393–398, August 2004.
[41] H. Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Kluwer Academic Publishers, 1997.
[42] C. M. Krishna and Y.-H. Lee. Voltage-clock-scaling adaptive scheduling techniques for low power in hard real-time systems. IEEE Transactions on Computers, 52(12):1586–1593, 2003.
[43] T.-W. Kuo and C. Li. A fixed-priority-driven open environment for real-time applications. In Proc. of IEEE Real-Time Systems Symposium, pages 256–267, December 1999.
[44] W. Kwon and T. Kim. Optimal voltage allocation techniques for dynamically variable voltage processors. In Proceedings of the IEEE/ACM Design Automation Conference (DAC), pages 125–130, 2003.
[45] W.-C. Kwon and T. Kim. Optimal voltage allocation techniques for dynamically variable voltage processors. ACM Transactions on Embedded Computing Systems, 4(1):211–230, 2005.
[46] C. Lee, J. Lehoczky, R. Rajkumar, and D. Siewiorek. On quality of service optimization with discrete QoS options. In Proc. of IEEE Real-Time Technology and Applications Symposium, June 1998.
[47] C. Lee, M. Potkonjak, and W. H. Mangione-Smith. MediaBench: A tool for evaluating and synthesizing multimedia and communications systems. In Proceedings of the 30th Annual International Symposium on Microarchitecture, pages 330–335, December 1997.
[48] S. Lee, J. Lee, S. L. Min, J. Hiser, and J. W. Davidson. Code generation for a dual instruction set processor based on selective code transformation. In Proceedings of the 7th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 33–48, Vienna, Austria, September 2003.
[49] S. Lee, J. Lee, C. Y. Park, and S. L. Min. A flexible tradeoff between code size and WCET using a dual instruction set processor. In Proceedings of the 8th International Workshop on Software and Compilers for Embedded Systems (SCOPES), pages 244–258, Amsterdam, The Netherlands, September 2004.
[50] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: exact characterization and average case behavior. In Proc. of IEEE Real-Time Systems Symposium, pages 166–171, 1989.
[51] J. Leung and J. Whitehead. On the complexity of fixed-priority scheduling of periodic real-time tasks. Performance Evaluation, 2:237–250, 1982.
[52] K.-J. Lin, S. Natarajan, and J. W.-S. Liu. Imprecise results: utilizing partial computations in real-time systems. In Proc. of IEEE Real-Time Systems Symposium, December 1987.
[53] T. Lindholm and F. Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, 1996.
[54] G. Lipari and S. Baruah. A hierarchical extension to the constant bandwidth server framework. In Proc. of IEEE Real-Time Technology and Applications Symposium, pages 26–35, May 2001.
[55] G. Lipari and E. Bini. Resource partitioning among real-time applications. In Proc. of Euromicro Conference on Real-Time Systems, July 2003.
[56] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46–61, 1973.
[57] P. Mejia-Alvarez, E. Levner, and D. Mosse. Adaptive scheduling server for power-aware real-time tasks. ACM Transactions on Embedded Computing Systems, 3(2):284–306, 2004.
[58] A. Mok. Fundamental design problems of distributed systems for the hard-real-time environment. Ph.D. dissertation, MIT, 1983.
[59] A. Mok and X. Feng. Towards compositionality in real-time resource partitioning based on regularity bounds. In Proc. of IEEE Real-Time Systems Symposium, pages 129–138, December 2001.
[60] A. Mok, X. Feng, and D. Chen. Resource partition for real-time systems. In Proc. of IEEE Real-Time Technology and Applications Symposium, pages 75–84, May 2001.
[61] J. C. Palencia and M. G. Harbour. Response time analysis of EDF distributed real-time systems. Journal of Embedded Computing, Cambridge International Science Publishing, December 2003.
[62] P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP ’01), October 2001.
[63] G. Quan and X. S. Hu. Energy efficient fixed-priority scheduling for real-time systems on variable voltage processors. In Proceedings of the Design Automation Conference (DAC), pages 823–833, Las Vegas, NV, June 2001.
[64] R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek. A resource allocation model for QoS management. In Proc. of IEEE Real-Time Systems Symposium, December 1997.
[65] R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek. Practical solutions for QoS-based resource allocations. In Proc. of IEEE Real-Time Systems Symposium, December 1998.
[66] P. Ramanathan. Overload management in real-time control applications using (m, k)-firm guarantees. IEEE Transactions on Parallel and Distributed Systems, 10(6), June 1999.
[67] J. Regehr and J. Stankovic. HLS: A framework for composing soft real-time schedulers. In Proc. of IEEE Real-Time Systems Symposium, pages 3–14, December 2001.
[68] C. Rusu, R. Melhem, and D. Mosse. Maximizing the system value while satisfying time and energy constraints. In Proc. of IEEE Real-Time Systems Symposium, pages 246–255, December 2002.
[69] S. Saewong and R. Rajkumar. Practical voltage-scaling for fixed-priority RT-systems. In Proc. of IEEE Real-Time Technology and Applications Symposium, pages 106–115, May 2003.
[70] S. Saewong, R. Rajkumar, J. Lehoczky, and M. Klein. Analysis of hierarchical fixed-priority scheduling. In Proc. of Euromicro Conference on Real-Time Systems, June 2002.
[71] L. Sha, R. Rajkumar, and J. P. Lehoczky. Real-time synchronization protocol for multiprocessors. In Proc. of IEEE Real-Time Systems Symposium, 1988.
[72] L. Sha, R. Rajkumar, and J. P. Lehoczky. Priority inheritance protocols: An approach to real-time synchronization. IEEE Transactions on Computers, pages 1175–1185, 1990.
[73] W.-K. Shih, J. W.-S. Liu, and J.-Y. Chung. Algorithms for scheduling imprecise computations to minimize total error. IEEE Computer, 24(5):58–68, May 1991.
[74] W.-K. Shih, J. W.-S. Liu, J.-Y. Chung, and D. Gillies. Scheduling tasks with ready times and deadlines to minimize average error. ACM Operating Systems Review, 23(3):14–28, July 1989.
[75] I. Shin and I. Lee. Periodic resource model for compositional real-time guarantees. In Proc. of IEEE Real-Time Systems Symposium, pages 2–13, December 2003.
[76] I. Shin and I. Lee. Compositional real-time scheduling framework. In Proc. of IEEE Real-Time Systems Symposium, December 2004.
[77] I. Shin, I. Lee, and S. Min. Embedded system design framework for minimizing code size and guaranteeing real-time requirements. In Proc. of IEEE Real-Time Systems Symposium, pages 201–211, December 2002.
[78] Y. Shin, K. Choi, and T. Sakurai. Power optimization of real-time embedded systems on variable speed processors. In Proc. of International Conference on Computer-Aided Design, pages 365–368, 2000.
[79] M. Spuri. Analysis of deadline scheduled real-time systems. Technical Report RR-2772, INRIA, 1996.
[80] D. Sweetman. See MIPS Run. Morgan Kaufmann, San Francisco, CA, 1999.
[81] K. Tindell. Adding time-offsets to schedulability analysis. Technical Report YCS 221, Dept. of Computer Science, University of York, York, England, January 1994.
[82] K. Tindell, A. Burns, and A. Wellings. An extendible approach for analysing fixed priority hard real-time tasks. Real-Time Systems, 6(2):133–151, 1994.
[83] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In Proc. of IEEE Annual Symposium on Foundations of Computer Science, pages 374–382, 1995.