Power-Aware Scheduling for Embedded Real-Time ...

Pisa, December 11th 2007 PhD Thesis

Dynamic Voltage Scaling for Energy-Constrained Real-Time Systems
Claudio Scordino

Supervisors:
• Prof. Giuseppe Lipari, Scuola Superiore Sant'Anna
• Prof. Susanna Pelagatti, Università di Pisa

Summary        

• Real-time systems
• Dynamic Voltage Scaling
• Resource Reservation technique
• Algorithm GRUB-PA
• Simulation results
• Implementation on a test-bed
• Future work
• Publications

Real-time systems 

Computing systems in which computational activities must be performed within predefined timing constraints



Implemented as Real-Time Operating Systems (RTOSs) running a set of concurrent tasks



The RTOS guarantees the timely execution of the tasks through some hypothesis about their behaviour

Real-time systems (2)

[Figure: a real-time system running Task1 and Task2, each connected to a sensor and an actuator of the controlled system, with a timing constraint marked.]

Usage of real-time systems 

Widely used in:
• Laboratory experiments
• Car engines
• Nuclear power plants
• Chemical stations
• Flight systems
• Space shuttle
• Robotics
• Telecommunications
• Multimedia

Real-time vs fast computing 

Formal approaches are seldom used



A faster system seems the right solution, but...





It does not ensure that timing constraints will be met



The greater the computational power, the greater the consumption of resources (e.g. cost, size, energy)!

Real-time does not mean “fast computing” 

Having the response of the system within the timing requirements is sufficient



Main goal: predictability

Real-time tasks



Cyclic structure



• They execute some code and then block, waiting for a timer or for a particular event

Can be periodic, sporadic or aperiodic



Modelled as a sequential stream of “jobs”





Jobs of the same task executed in FIFO order

Real-time jobs 

Characterized by a timing constraint (i.e. deadline)



Must finish execution before its deadline, but...



... it has variable execution time    

• The actual execution time is unknown
• Variations up to 87%
• It can only be “discovered” by executing the job to completion
• We only know that it cannot be greater than the task worst-case execution time (WCET)

Hard vs Soft real-time tasks 

Hard real-time tasks:
• Critical activities whose deadline can never be missed
• A failure in meeting timing constraints leads to catastrophic consequences
• Typically used to control or monitor physical devices
• Air traffic, industrial, chemical, nuclear, safety-critical and military controls

Soft real-time tasks:
• Timing constraints are important but not rigorous
• A miss does not compromise system integrity
• QoS in telecommunications and multimedia applications

Many real systems have both kinds of tasks!

Real-time scheduling 

Goal of the scheduling algorithm: 



Assign system resources to jobs so that no job will miss its deadline

Our model:   

• Preemptive uniprocessor systems
• Processing fully preemptible at any point
• The processor is the only resource used by tasks

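Parameters of a job

[Figure: timeline of job k of task i along the time axis t, marking rik, sik, fik and dik, together with the computation time cik, the relative deadline Dik and the response time.]

• rik: release time
• sik: start time
• fik: completion time
• dik: absolute deadline
• Dik: relative deadline
• cik: computation time

WCETi ≥ cik ∀ k ≥ 0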
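The job parameters can be tied together in a small sketch. It assumes the usual relations, which the slide implies but does not spell out: the absolute deadline is the release time plus the relative deadline, and the response time runs from release to completion. The class name `Job` and the sample numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One job of a task, using the parameter names from the slide."""
    release: float       # r_ik: release time
    start: float         # s_ik: start time
    finish: float        # f_ik: completion time
    rel_deadline: float  # D_ik: relative deadline
    computation: float   # c_ik: computation time actually consumed

    @property
    def abs_deadline(self) -> float:
        # Assumed relation: d_ik = r_ik + D_ik
        return self.release + self.rel_deadline

    @property
    def response_time(self) -> float:
        # Assumed relation: response time = f_ik - r_ik
        return self.finish - self.release

    def meets_deadline(self) -> bool:
        return self.finish <= self.abs_deadline

    def respects_wcet(self, wcet: float) -> bool:
        # WCET_i >= c_ik must hold for every job of the task
        return self.computation <= wcet


# Illustrative values, not taken from the slides.
job = Job(release=10.0, start=11.0, finish=14.0, rel_deadline=5.0, computation=2.5)
```

Here the job is released at 10, finishes at 14 and has an absolute deadline of 15, so it meets its deadline with a response time of 4.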

Energy-constrained systems 

Embedded systems 

  



• Examples: PDAs, autonomous robots, smart phones, sensor networks
• Most of them are battery-operated devices
• Battery technology is improving rather slowly
• Energy consumption affects autonomy, cost, size, weight, packaging, fault robustness, etc.

Real-time servers and clusters: 

Energy-consumption affects heat produced, cooling mechanisms, packaging, and cost

Dynamic Voltage Scaling (DVS) 

Technique to reduce the energy consumed by the CPU in CMOS circuits
• P = Pstatic + Pdynamic, with Pdynamic ∝ f · VDD²

Change of CPU voltage and frequency at runtime
• It allows computational speed to be traded against energy consumption
• Jobs take more time to be executed

On real-time systems it must be used carefully, otherwise some job may miss its deadline
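A short worked example of this trade-off, as a sketch only. It models dynamic power as k · f · VDD² with an arbitrary constant k and ignores static power. The two operating points (400 MHz at 1.3 V, 200 MHz at 1.0 V) are hypothetical values chosen for illustration, not datasheet figures.

```python
def dynamic_power(freq_hz, vdd, k=1.0):
    """Dynamic power, proportional to f * VDD^2 (k is an arbitrary constant)."""
    return k * freq_hz * vdd ** 2


def job_cost(cycles, freq_hz, vdd):
    """Return (execution time, dynamic energy) for a job of `cycles` CPU cycles."""
    exec_time = cycles / freq_hz
    return exec_time, dynamic_power(freq_hz, vdd) * exec_time


# Hypothetical operating points, NOT taken from any datasheet.
t_fast, e_fast = job_cost(4e6, 400e6, 1.3)
t_slow, e_slow = job_cost(4e6, 200e6, 1.0)

print(f"time ratio   slow/fast: {t_slow / t_fast:.2f}")
print(f"energy ratio slow/fast: {e_slow / e_fast:.2f}")
```

Under these assumptions the job takes twice as long at the lower operating point but consumes less dynamic energy, because the energy per cycle scales with VDD². The longer execution time is exactly why a real-time scheduler has to check deadlines before slowing down.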

Energy-aware scheduling



Select, at each instant, both the task to be scheduled and the CPU frequency



Goal: minimize energy consumption while meeting real-time constraints

Open issues 

Typical energy-aware algorithms 



Address only one kind of task
• Typically, periodic hard real-time tasks
• What about sporadic, aperiodic and soft real-time tasks?
• Many real-world systems consist of a mixture of different types of real-time tasks!

Weakness with respect to WCET evaluation

• With most algorithms, if a task misbehaves, then other tasks may miss their deadlines

Open issues (2) 

Not much comparison between existing algorithms



Often the model does not account for 

Energy and time overheads for voltage transition



Energy consumption when CPU is idle

Resource Reservations 

Standard technique for scheduling hard and soft real-time tasks on the same system 

Each task is assigned a server (Ui, Pi)



• Ui: minimum guaranteed CPU bandwidth, with Σ Ui ≤ 1
• Pi: period of the server (≠ task period)

Server Si receives the same amount of CPU as if it were running on a dedicated “virtual” processor of speed Ui times the speed of the actual processor

Resource Reservations (2) 

Temporal protection property  



• A task cannot affect the performance guaranteed to another task
• If a task misbehaves, then it is stopped by postponing its server's deadline

Hard real-time guarantees 

If a hard real-time task τi = (Ci, Ti) is assigned a reservation with Pi ≤ Ti and Ui ≥ Ci / Ti, then the task will meet all its deadlines
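The two conditions can be checked mechanically. The sketch below is an illustration, not part of the original algorithm description. The function names are hypothetical, and the admission test simply restates the constraint Σ Ui ≤ 1 on the reserved bandwidths.

```python
def reservation_guarantees_deadlines(C, T, U, P):
    """Check the hard real-time condition for task (C, T) on server (U, P).

    C: worst-case execution time, T: period or minimum interarrival time,
    U: server bandwidth, P: server period.
    """
    return P <= T and U >= C / T


def admissible(bandwidths):
    """The sum of all reserved bandwidths must not exceed 1."""
    return sum(bandwidths) <= 1.0


# A periodic task with C = 5 and T = 10 on a server with U = 0.5, P = 10.
ok = reservation_guarantees_deadlines(C=5, T=10, U=0.5, P=10)

# The same task on an under-provisioned server.
too_small = reservation_guarantees_deadlines(C=5, T=10, U=0.4, P=10)
```

In the first case both conditions hold. In the second, U = 0.4 is below C / T = 0.5, so the guarantee no longer applies.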

Reclaiming unused bandwidth 

If the total bandwidth reserved to tasks is less than 1, it is possible to reclaim the unused bandwidth



Two possibilities 

Distribute unused bandwidth to execute active servers for a longer time
• GRUB (Lipari 2000)

Slow down the processor to save energy
• DVSST (Qadi 2003)
• GRUB-PA (Scordino 2004)

GRUB



Aperiodic server with dynamic priorities (EDF based) 



Improvement of CBS (Constant Bandwidth Server)

Can schedule any mixture of hard and soft, and periodic, sporadic or even aperiodic tasks

GRUB (2) 

Each server Si maintains three internal variables: 



• A state: inactive, activeContending, or activeNonContending
• A virtual time Vi: measure of how much bandwidth Si has consumed
• A deadline di: priority given to Si (EDF ordering)

GRUB: Server states 

activeContending: 





One or more pending jobs ready to execute

activeNonContending: 

No pending jobs



The bandwidth cannot be reclaimed because the server used part of its future reserved bandwidth (i.e. Vi > t)

Inactive: 

Initial state



No pending jobs



The bandwidth can be reclaimed

GRUB (3)



Global variable U (total system utilization): the sum of the bandwidths Ui of all servers that are not inactive



The amount of reclaimed bandwidth is (1-U)

GRUB: State Transition diagram

[Figure: state diagram with the states inactive, activeContending and activeNonContending, connected by transitions 1, 2a, 2b, 3 and 4; the bandwidth not covered by U is marked as reclaimed bandwidth.]

Transition   Event/Condition                            Update
1            τik arrival                                Vi = rik, di = Vi + Pi, U += Ui
2a           τik termination, τi,k+1 already arrived    di = Vi + Pi
2b           τik termination, τi,k+1 not arrived        (none)
3            τi,k+1 arrival                             di = Vi + Pi
4            Vi ≤ t                                     U -= Ui
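The transitions can be sketched as a small state machine. This is a minimal illustration rather than the authors' implementation. The source and target state of each transition are inferred from the state definitions given earlier, the virtual-time rule dVi = (U / Ui) dt is the one that appears in the scheduling example, and the class and method names are hypothetical. Deadline postponement on budget exhaustion and EDF dispatching are deliberately left out.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE_CONTENDING = "activeContending"
    ACTIVE_NON_CONTENDING = "activeNonContending"


class System:
    """Holds the global variable U (total system utilization)."""
    def __init__(self):
        self.U = 0.0


class Server:
    def __init__(self, system, Ui, Pi):
        self.system, self.Ui, self.Pi = system, Ui, Pi
        self.state = State.INACTIVE
        self.V = 0.0  # virtual time Vi
        self.d = 0.0  # deadline di

    def job_arrival(self, t):
        if self.state is State.INACTIVE:                 # transition 1
            self.V = t                                   # Vi = r_ik
            self.d = self.V + self.Pi
            self.system.U += self.Ui
            self.state = State.ACTIVE_CONTENDING
        elif self.state is State.ACTIVE_NON_CONTENDING:  # transition 3
            self.d = self.V + self.Pi
            self.state = State.ACTIVE_CONTENDING
        # If already activeContending, the job just stays pending.

    def execute(self, dt):
        # While the server runs, Vi advances at rate U / Ui.
        self.V += self.system.U / self.Ui * dt

    def job_termination(self, next_job_pending):
        if next_job_pending:                             # transition 2a
            self.d = self.V + self.Pi
        else:                                            # transition 2b
            self.state = State.ACTIVE_NON_CONTENDING

    def check_inactive(self, t):
        # Transition 4: the bandwidth can be released once Vi <= t.
        if self.state is State.ACTIVE_NON_CONTENDING and self.V <= t:
            self.system.U -= self.Ui
            self.state = State.INACTIVE


system = System()
server = Server(system, Ui=0.5, Pi=8)
server.job_arrival(t=0.0)          # transition 1
server.execute(dt=2.0)             # Vi advances by (U / Ui) * dt
server.job_termination(False)      # transition 2b
server.check_inactive(t=1.0)       # Vi > t: still activeNonContending
state_at_1 = server.state
server.check_inactive(t=2.0)       # Vi <= t: transition 4
```

With a single server, U equals Ui while it is active, so the virtual time advances at the same rate as real time.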

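GRUB: Example of Scheduling

[Figure: virtual time Vi and deadline di of server Si plotted against time t, together with U and Ui, as Si moves between the Inactive, Active Contending and Active Non Contending states. While Si executes, Vi advances as dVi = (U / Ui) dt; the point Vi = t is marked.]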

GRUB: features



To guarantee a minimum performance, it is only necessary to properly set Ui and Pi



Soft real-time tasks (periodic or aperiodic) 

The server parameters can be set depending on the desired level of performance

GRUB: reclaiming features 

Two kinds of spare bandwidth:
• Static: due to the fact that the sum of the bandwidths of all the servers is strictly less than 1
• Dynamic: due to hard and soft real-time tasks that execute less than expected

GRUB is able to automatically reclaim all the spare bandwidth
• The amount of reclaimed bandwidth is (1-U)
• It takes into account both static and dynamic reclaiming
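A small numeric illustration of the two kinds of spare bandwidth. It assumes that U is the sum of the bandwidths of the servers that are not inactive, as the U += Ui and U -= Ui updates in the transition table suggest. Treating the bandwidth of currently inactive servers as the dynamic part is a simplification made for this sketch, and the server set is made up.

```python
def total_utilization(servers):
    """servers: list of (Ui, is_inactive) pairs. Returns U."""
    return sum(u for u, inactive in servers if not inactive)


# Three hypothetical servers of bandwidth 0.25 each; the second is inactive.
servers = [(0.25, False), (0.25, True), (0.25, False)]

U = total_utilization(servers)
reclaimed = 1.0 - U                                   # total spare bandwidth
static_spare = 1.0 - sum(u for u, _ in servers)       # never reserved
dynamic_spare = reclaimed - static_spare              # reserved but unused now
```

Here a quarter of the processor was never reserved (static) and another quarter is released by the inactive server (dynamic), so half of the bandwidth can be reclaimed.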

GRUB-PA 

Key idea: if we set the speed to be equal to U, no server will miss its deadline 

Each server will execute for a longer time, but at a slower speed

[Figure: total utilization U over time t mapped to Intel PXA250 clock frequencies: U = 1 at 400 MHz, U = 0.5 at 200 MHz, U = 0.25 at 100 MHz.]
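On a processor with discrete frequency levels, the key idea can be sketched as picking the lowest level whose relative speed is at least U. The rounding-up rule and the level table are assumptions for illustration: the slides state that the PXA250 model has 4 speed levels and show 100, 200 and 400 MHz, so the 300 MHz entry here is assumed.

```python
# Assumed PXA250 frequency levels in MHz (the 300 MHz level is an assumption).
PXA250_LEVELS_MHZ = (100, 200, 300, 400)


def select_frequency(U, levels=PXA250_LEVELS_MHZ):
    """Return the lowest frequency whose speed, relative to the maximum, is >= U."""
    f_max = max(levels)
    required = U * f_max
    for f in sorted(levels):
        if f >= required:
            return f
    return f_max  # U above 1 should not happen; fall back to full speed


for u in (0.25, 0.5, 0.6, 1.0):
    print(u, select_frequency(u))
```

Rounding up rather than down keeps the processor speed at or above U, which is the condition under which no server misses its deadline.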

GRUB-PA: Example 

Two tasks
• Task 1: sporadic, exec. time = 2-4, minimum interarrival = 8 (server: U1 = 0.5, P1 = 8)
• Task 2: periodic, exec. time = 5, period = 10 (server: U2 = 0.5, P2 = 10)

[Figure: schedules of the two servers on a time axis from 0 to 20, with active and inactive intervals marked, and the resulting value of U (levels 0.5 and 1) under DVSST and GRUB-PA.]

GRUB-PA: accounting for overhead 

Switching frequency is not for free
• On the PXA250 we tested, up to 500 μsec!
• We must avoid “over-switching”

Proposed solution
• When U goes up, we increase frequency immediately and set a timer Δ
• We cannot lower the frequency before the timer Δ has expired

Time overhead    

• γ: time needed to change frequency (500 μsec on PXA250)
• Bandwidth reduction: 2γ / Δ
• We can admit new servers up to a total bandwidth of 1 - (2γ / Δ)
• We set the processor speed to U + (2γ / Δ)
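The overhead accounting and the timer rule can be put together in a short sketch. The value γ = 500 μsec comes from the slide; Δ = 10 msec is a hypothetical choice made only to get concrete numbers, and the `SwitchGuard` class is an illustrative name, not part of the original implementation.

```python
def bandwidth_reduction(gamma, delta):
    """Bandwidth lost to frequency switches: 2 * gamma / delta."""
    return 2.0 * gamma / delta


def admission_bound(gamma, delta):
    """Maximum total bandwidth that can be admitted."""
    return 1.0 - bandwidth_reduction(gamma, delta)


def processor_speed(U, gamma, delta):
    """Speed to set for a total utilization U, including the overhead margin."""
    return U + bandwidth_reduction(gamma, delta)


class SwitchGuard:
    """Raise the speed immediately; lower it only after the timer has expired."""

    def __init__(self, delta):
        self.delta = delta
        self.speed = 0.0
        self.locked_until = 0.0

    def update(self, t, target):
        if target > self.speed:
            self.speed = target
            self.locked_until = t + self.delta   # (re)arm the timer
        elif target < self.speed and t >= self.locked_until:
            self.speed = target
        return self.speed


GAMMA = 500e-6   # from the slide: switch time on the PXA250
DELTA = 10e-3    # hypothetical timer length

guard = SwitchGuard(DELTA)
s0 = guard.update(0.000, processor_speed(0.50, GAMMA, DELTA))  # raised at once
s1 = guard.update(0.005, processor_speed(0.25, GAMMA, DELTA))  # timer running
s2 = guard.update(0.020, processor_speed(0.25, GAMMA, DELTA))  # timer expired
```

With these numbers the overhead margin is about 0.1, so servers can be admitted up to a total bandwidth of about 0.9. A longer Δ shrinks the margin but makes the system slower to scale down.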

Simulation methodology 

Simulation environment
• RTSim (http://rtsim.sourceforge.net)
• GPL license

CPUs modelled:
• Intel XScale PXA250 with 4 levels of speed
• Transmeta Crusoe TM5800 with 7 levels of speed

Several simulations with
• Variable average load
• Variable WCET/BCET ratio

Simulations: GRUB vs DVSST
[Plot. CPU: Intel PXA250]

Simulations: GRUB vs DVSST (2)
[Plot. CPU: Transmeta TM5800. Annotation: -40%]

Simulation results
[Plot. Average Utilization: 0.5. CPU: Intel PXA250]

Simulation results (2)
[Plot. Average Utilization: 0.5. CPU: Transmeta TM5800]

Simulation results (3)
[Plot. WCET/BCET Ratio: 2. CPU: Intel PXA250]

Simulation results (4)
[Plot. WCET/BCET Ratio: 2. CPU: Transmeta TM5800]

Simulation results (5)
[Plot. WCET/BCET Ratio: 4. CPU: Intel PXA250]

Simulation results (6)
[Plot. WCET/BCET Ratio: 4. CPU: Transmeta TM5800]

Implementation: the OCERA project 

Financially supported by the European Commission (IST-35102)



Objectives:
• Design and implementation of a library of open source software for the design of embedded real-time systems based on Linux
• http://www.ocera.org

Participants:
• Universidad Politecnica de Valencia (Coordinators)
• Scuola Superiore Sant’Anna, Italy
• Czech Technical University, Czech Rep.
• Centre pour l’Energie Atomique, France
• Unicontrols, Czech Rep.
• MNIS, France
• Visual Tools, Spain

Implementation (2)

[Figure: software stack with layers labelled Linux tasks, Hard real-time tasks, Linux, RTLinux and Hardware.]

Support soft real-time tasks in user space
• Generic Scheduler Patch



Generic Scheduler Patch
• Small patch applied to the Linux Kernel
• Intercepts job arrival/termination
• Exports these events using some hooks

[Figure: the Generic Scheduler Patch shown next to the Scheduler inside Linux Kernel 2.4.]

Generic Scheduler Patch (2) 

Exported events:
• block_hook: task blocked (job finished)
• unblock_hook: task unblocked (new job)
• fork_hook: new task created
• cleanup_hook: task terminated
• setsched_hook: user calls sched_setscheduler() or sched_setparam()

New field in task_struct: void* private_data
• pointer to the real-time private data of the task

New scheduling policy (SCHED_CBS)

The real-time scheduler  

Implemented as a Loadable Kernel Module

“Communicates” the scheduling decision to the original Linux scheduler
• It sets the field rt_priority of the dispatched task to the maximum real-time priority (99) + 1

Transparent to the application

[Figure: the Real-time scheduler module attached to the Generic Scheduler Patch, next to the Scheduler inside Linux Kernel 2.4.]

Our approach: advantages

1. It is not intrusive
2. It allows advanced security policies to be implemented
   • Memory protection
3. It does not assume any periodic behaviour
   • It works with both periodic and aperiodic tasks
4. Linux services can be directly accessed by real-time tasks

Test-Bed

Intrinsyc CerfCube250:
• 32 MB Flash ROM
• 64 MB SDRAM
• Intel PXA250 processor

Modified OS:
• Linux Kernel 2.4.18

Test application 

Multimedia application:
• Audio stream decoder from Xiph.org Foundation
• Compressed audio format: Ogg
   • from 8 to 48 bits
   • fixed or variable bitrate (16-128 kbps per channel)
   • 44100 Hz / 2 channels

The application code was not modified
• Before start time, we assign a server to the process, with bandwidth U and period P

Task speed vs CPU frequency

Experimental results 

Measured power consumption: -38%

Evaluation of GRUB-PA 

Simple but effective and robust algorithm



Performance similar to other well-known energy-aware algorithms

Clever at maintaining a constant speed, even for high values of the WCET/BCET ratio



Up to 40% improvement with respect to DVSST

Evaluation of GRUB-PA (2) 

In addition... 



Temporal protection property
• A task cannot affect the performance of other tasks

Can schedule any mixture of hard and soft, periodic, sporadic or even aperiodic tasks at the same time

Future work 

Extension of GRUB-PA to multiprocessors and multicores 





Resource Reservations for Webservers (e.g. Apache) 

QoS to each server request



Power management

Energy-aware extension of the IRIS algorithm 



Comparison with GRUB-PA

Virtualization (e.g. Xen) and task migration 



Implementation of GRUB-PA in the Condor workload management

Power management in clusters

Implementation of GRUB-PA exploiting the new modular Linux schedulers

Future work (2) 

Exploiting Dynamic Ticks of Linux 



Based on the current real-time workload

Extension of the two-speed model (Chapter 3) 

Algorithm for discrete processor frequencies



Multiprocessor and multicores



Multiple speeds 

PACE algorithm with idle power and time/energy overheads

Publications

Journals:

2007

E.Bini, C.Scordino, Optimal Two-Level Speed Assignment for Real-Time Systems, to appear in International Journal of Embedded Systems (IJES), Special issue on Low-Power Real-Time Embedded Computing.

2006

C.Scordino, G.Lipari, A Resource Reservation Algorithm for Power-Aware Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions on Computers, December 2006.

Conferences and workshops:

2007

L.Abeni, C.Scordino, G.Lipari, L.Palopoli, Serving Non Real-Time Tasks in a Reservation Environment, 9th Real-Time Linux Workshop (RTLW), Linz, November 2007.

2006

G.Lipari, C.Scordino, Linux and Real-Time: Current Approaches and Future Opportunities, International Congress ANIPLA 2006, Rome, Italy, November 2006.

2005

C.Scordino, E.Bini, Optimal Speed Assignment for Probabilistic Execution Times, 2nd Power-Aware Real-Time Workshop (PARC'05), Jersey City, NJ, September 2005

2004

C.Scordino, G.Lipari, Using Resource Reservation Techniques for Power-Aware Scheduling, Proceedings of the 4th ACM International Conference on Embedded Software (EMSOFT'04), Pisa, Italy, September 2004.

2003

C.Scordino, G.Lipari, Energy Saving Scheduling for Embedded Real-Time Linux Applications, 5th Real-Time Linux Workshop (RTLW), Valencia, Spain, November 2003.
