Event-Driven Processor Power Management

Jan H. Schönherr, Jan Richling, Matthias Werner, Gero Mühl

Communication and Operating Systems, Technische Universität Berlin
Operating Systems Group, Chemnitz University of Technology
Architecture of Application Systems, University of Rostock
Overview
> State of the art
> Current problems
> Proposed solution
> Evaluation
> Summary
Data source: BMWi
Jan H. Schönherr
e-Energy 2010
2
Saving Energy: State of the Art
> When nothing has to be done
  > Switch off components
  > ACPI defines C-states for processors
    > C0: working
    > C1..Cn: sleep states
  > Managed by OS idle routine
> When only a little bit has to be done
  > Reduce performance of components
  > ACPI defines P-states for processors
    > Frequency/voltage combinations
    > P0: high performance
    > P1..Pn: lower performance
  > Managed by OS frequency governor

[Figure: ACPI processor power states]
P-state   MHz    TDP
P0        2600   115.0 W
P1        2100    93.2 W
P2        1700    80.8 W
P3        1400    71.5 W
P4         800    53.2 W

P-states of an AMD Opteron 8435
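The P-state table can be read as a small lookup structure. The following is a minimal sketch, not part of the original slides, that expresses each P-state relative to P0 using only the values from the table; the names `P_STATES` and `relative_to_p0` are hypothetical.

```python
# P-state table copied from the slide: (name, frequency in MHz, rated power in W).
P_STATES = [
    ("P0", 2600, 115.0),
    ("P1", 2100, 93.2),
    ("P2", 1700, 80.8),
    ("P3", 1400, 71.5),
    ("P4", 800, 53.2),
]


def relative_to_p0(states):
    """Return (name, frequency ratio, power ratio) relative to the first entry."""
    _, base_mhz, base_watts = states[0]
    return [(name, mhz / base_mhz, watts / base_watts) for name, mhz, watts in states]


for name, f_ratio, w_ratio in relative_to_p0(P_STATES):
    print(f"{name}: {f_ratio:.0%} of P0 frequency, {w_ratio:.0%} of P0 power")
```

For this table, P4 runs at about 31% of the P0 frequency while still being rated at about 46% of the P0 power, so lowering the frequency does not reduce power proportionally.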
Current Approach: Time-Driven Governor
> Example: Linux kernel compilation

Governor            Energy
Performance         329 Wh
Ondemand (10 ms)    368 Wh
Ondemand (200 ms)   475 Wh

Energy consumption for a parallelism degree of 1

> Severe performance degradation
> Increased energy consumption
Analyzing Problems: Problem 1
> Governor determines load by polling
> Incorrect results in case of
  > small activity phases/short running processes, and
  > a low overall degree of parallelism

[Figure: frequency (low/high) of Core 1 and Core 2 over time; the measured load is 50% in every polling interval on both cores. Annotations: no increase of frequency, performance degradation.]
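Problem 1 can be reproduced with a few lines of simulation. This is a sketch under stated assumptions, not the governor's actual algorithm: a single always-runnable task alternates between two cores, and a polling governor raises the frequency only when the sampled load exceeds a threshold. The 80% threshold and all function names are assumptions chosen for illustration.

```python
UP_THRESHOLD = 0.80  # assumption: frequency is raised only above 80% sampled load


def sampled_load(busy_slices):
    """Fraction of time slices in one polling window in which the core was busy."""
    return sum(busy_slices) / len(busy_slices)


def polling_decision(busy_slices, threshold=UP_THRESHOLD):
    """Return 'high' or 'low' as a polling governor would after one window."""
    return "high" if sampled_load(busy_slices) > threshold else "low"


# One task that is always runnable but hops between the two cores every slice.
WINDOW = 10
core1 = [i % 2 == 0 for i in range(WINDOW)]
core2 = [i % 2 == 1 for i in range(WINDOW)]

for name, slices in (("Core 1", core1), ("Core 2", core2)):
    print(name, f"load={sampled_load(slices):.0%}", polling_decision(slices))
```

Both cores report 50% load, so neither crosses the threshold and both stay at low frequency, although the task itself never idles.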
Analyzing Problems: Problem 2
> Governor reduces frequency of idling cores
> Scheduler favors idle cores
  > New load is scheduled on cores with low frequency
  > in case of a low overall degree of parallelism

[Figure: frequency (low/high) of Core 1 and Core 2 over time, with two series of load per polling interval: 80%, 0%, 20%, 100%, 80%, 0% and 20%, 100%, 80%, 0%, 20%, 100%. Annotation: performance degradation.]
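The effect of Problem 2 can be illustrated with the load percentages shown on this slide. The sketch below assumes a simplified governor model that is not taken from the slides: the frequency in effect during an interval is decided from the load measured in the previous interval, with an assumed 80% threshold. Which load series belongs to which core is also an assumption.

```python
UP_THRESHOLD = 0.80  # assumption, not stated on the slides


def lagging_frequencies(loads, threshold=UP_THRESHOLD, initial="low"):
    """Frequency in effect during each interval when the governor decides
    from the load measured in the previous interval."""
    freqs = [initial]
    for load in loads[:-1]:
        freqs.append("high" if load >= threshold else "low")
    return freqs


def full_load_at_low(loads, freqs):
    """Indices of intervals that carry 100% load but run at low frequency."""
    return [i for i, (l, f) in enumerate(zip(loads, freqs)) if l == 1.0 and f == "low"]


# Load per polling interval as printed on the slide (core assignment assumed).
core1_loads = [0.80, 0.00, 0.20, 1.00, 0.80, 0.00]
core2_loads = [0.20, 1.00, 0.80, 0.00, 0.20, 1.00]

for name, loads in (("Core 1", core1_loads), ("Core 2", core2_loads)):
    freqs = lagging_frequencies(loads)
    print(name, freqs, "full load at low frequency in intervals", full_load_at_low(loads, freqs))
```

Under this model every fully loaded interval lands on a core that was idle or nearly idle just before, so it runs at low frequency, while the high frequency is applied to intervals in which the core has already become idle again.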
Problem: Separation of Concerns
> Clear separation between
  > Scheduling (which task when and where): scheduler
  > Energy control (clock frequencies of cores): governor
> But
  > In reality, these two functions depend on each other
  > Overhead: the governor reconstructs information already available to the scheduler
> Energy consumption is a non-functional property
  > Separation of concerns may raise problems
Approach: Integration
> Non-functional interdependency between scheduling and frequency control
  > Integration instead of separation
  > Combination of scheduler and governor
> Two-step approach
  1. Event-driven frequency control by scheduler
  2. Adaptation of scheduler (future work)
     > Scheduling in space according to power states of cores
     > Concentration of load
Event-driven Frequency Control
> Load changes trigger frequency transitions
  > Purely event-driven: no polling, no consideration of load
> Input events: state change from idle to load or vice versa
> Output events: state change from low to high frequency or vice versa
> Optimization
  > Delays for transitions
  > Minimum time to stay in high or low frequency

[Figure: state diagram with input states idle/load and output states low/high]
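The event-driven control described on this slide can be sketched as a two-state machine. This is an illustrative model only, not the kernel implementation: the class name, method names, and the single `min_stay` parameter (standing in for the "minimum time to stay in high or low frequency" optimization) are hypothetical.

```python
class EventDrivenGovernor:
    """Two-state frequency control driven by idle/load events (illustrative)."""

    def __init__(self, min_stay=0.0):
        self.frequency = "low"            # current output state
        self.wanted = "low"               # state requested by the last event
        self.min_stay = min_stay          # minimum time to stay in a frequency
        self.last_change = float("-inf")  # time of the last transition

    def on_event(self, event, now):
        """Handle an input event ('load' or 'idle') at time `now`."""
        if event not in ("load", "idle"):
            raise ValueError(f"unknown event: {event!r}")
        self.wanted = "high" if event == "load" else "low"
        return self.settle(now)

    def settle(self, now):
        """Apply the wanted frequency once the minimum stay time has elapsed.

        A timer would call this again for transitions that were deferred.
        """
        if self.wanted != self.frequency and now - self.last_change >= self.min_stay:
            self.frequency = self.wanted
            self.last_change = now
        return self.frequency


gov = EventDrivenGovernor(min_stay=5.0)
print(gov.on_event("load", now=0.0))  # core becomes busy
print(gov.on_event("idle", now=2.0))  # core goes idle before min_stay elapsed
print(gov.settle(now=5.0))            # deferred transition is applied
```

No load percentage is ever measured here: the only inputs are the idle/load transitions themselves, which is what distinguishes this scheme from a polling governor.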
Implementation
> Based on current Linux kernel (2.6.32-rc5)
> Scheduler interface
  > Scheduler notifies CPUFreq framework whenever load changes
> New scheduler governor
  > Initiates P-state transitions based on load changes
> Modified CPUFreq driver
  > Driver is now used in atomic contexts
  > Must be (sleep-)lock-free
Summary
> Integration of scheduler and frequency governor
  > Event-driven frequency control
> Implementation based on Linux
  > Reduced energy consumption
  > No performance degradation
  > Improved interactive behavior
> Promising foundation for future extensions
Future Work
> Event-driven governor
  > Further evaluation of parameters
> Adaptation of scheduling in space
  > Concentration of load
  > Energy efficiency by considering application behavior
  > Coordinated control of P-states and C-states
> Further approaches
  > Consideration of hardware-driven frequency scaling (Intel's Turbo Boost Technology, AMD's expected Core Performance Boost)
  > Consideration of SMT
Evaluation: Intel Core 2 Quad Q9400
> Intel Core 2 Quad Q9400
  > Previous generation desktop
  > 45 nm quad-core processor (codename Yorkfield)
  > Frequencies: 2.0, 2.33 and 2.67 GHz
Evaluation: Dual Intel Xeon X5570
> Dual Intel Xeon X5570
  > Latest generation server (Nehalem architecture)
  > Two sockets
  > 45 nm quad-core processors (codename Gainestown)
  > Frequencies: 1.6 to 2.93 GHz in 133 MHz steps, 2/2/3/3 turbo
Evaluation: Quad AMD Opteron 8435

Governor            Run-Time    Power   Energy   Relative Energy   Relative EDP
Performance         1:03:33 h   310 W   329 Wh   32 Wh             34 Whh
Scheduler           1:05:12 h   298 W   324 Wh   20 Wh             21 Whh
Ondemand (10 ms)    1:14:58 h   294 W   368 Wh   18 Wh             23 Whh
Ondemand (200 ms)   1:37:32 h   292 W   475 Wh   20 Wh             32 Whh
Powersave           3:09:06 h   284 W   895 Wh   13 Wh             40 Whh
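The columns of the Opteron table can be cross-checked arithmetically. The slides do not define "Relative Energy" or "Relative EDP"; the sketch below only tests the assumption that Energy is approximately Power times Run-Time and that Relative EDP is approximately Relative Energy times Run-Time (hence the unit Whh). All function and variable names are hypothetical.

```python
def hours(hms):
    """Convert a run-time string 'H:MM:SS' to hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600


# Rows copied from the Quad AMD Opteron 8435 table:
# (governor, run-time, power W, energy Wh, relative energy Wh, relative EDP Whh)
OPTERON_ROWS = [
    ("Performance",       "1:03:33", 310, 329, 32, 34),
    ("Scheduler",         "1:05:12", 298, 324, 20, 21),
    ("Ondemand (10 ms)",  "1:14:58", 294, 368, 18, 23),
    ("Ondemand (200 ms)", "1:37:32", 292, 475, 20, 32),
    ("Powersave",         "3:09:06", 284, 895, 13, 40),
]


def recompute(rows):
    """Return (governor, power * run-time, relative energy * run-time) per row."""
    result = []
    for name, run_time, watts, _energy, rel_energy, _rel_edp in rows:
        t = hours(run_time)
        result.append((name, watts * t, rel_energy * t))
    return result


for (name, energy, edp), row in zip(recompute(OPTERON_ROWS), OPTERON_ROWS):
    print(f"{name}: energy {energy:.1f} Wh (table {row[3]}), "
          f"relative EDP {edp:.1f} Whh (table {row[5]})")
```

The recomputed values agree with the table to within about 1 Wh and 1 Whh, which is compatible with the table entries having been rounded from more precise measurements.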
Evaluation: AMD Phenom 9950

Governor            Run-Time    Power   Energy   Relative Energy   Relative EDP
Performance         1:13:39 h   132 W   162 Wh   59 Wh             72 Whh
Scheduler           1:14:35 h   120 W   149 Wh   45 Wh             55 Whh
Ondemand (10 ms)    1:19:43 h   117 W   156 Wh   44 Wh             59 Whh
Ondemand (200 ms)   1:26:26 h   116 W   167 Wh   46 Wh             66 Whh
Powersave           2:18:43 h   107 W   246 Wh   52 Wh             120 Whh