Achieving Ultra Low Power in Embedded Systems

Understand where your power goes and what you can do to make things better

Herman Roebbers, 28.02.2018

AGENDA
• 1 Introduction
• 2 Where does my energy go?
• 3 Factors influencing energy consumption
• 4 Hardware mechanisms
• 5 Effect of compiler settings
• 6 Software development strategy
• 7 Conclusions

Embedded World Conference, February 28, 2018 © H. W. Roebbers

1. Introduction

WE WANT TO REDUCE OUR ENERGY CONSUMPTION

Why?
• It’s the law (EnergyStar and successors)
• Customers require it (e.g. automotive)
• Replacing batteries is expensive / impossible
• Batteries are an environmental hazard
• We want the battery to last 10 years
• We want to be able to operate without batteries (e.g. IoT edge devices)



How?
• Understand where energy is consumed
• Understand general energy reduction mechanisms
• Understand our particular hardware
• Let the CPU and unused parts of the system sleep as long as possible
• Make the software use available energy reduction mechanisms
• Make the software event driven
• Understand battery technologies and properties
• Use energy harvesting to replace the battery or extend battery life (optional)

2. Where does my energy go?

SLIGHTLY DETAILED HARDWARE VIEW OF A TYPICAL SYSTEM

Component specifics will be examined hereafter.

[Figure: block diagram of a typical system: an energy source (VBAT), power management IC(s) generating VCORE, VIO and VMEM, the MCU / SoC with its I/O, and external memory / devices]

REGULATORS ALSO CONSUME ENERGY.

Regulator efficiency varies widely. The fewer output voltages the better: every extra rail adds another converter with its own losses. Switching regulators can exceed 97 % efficiency, but are more expensive.

[Figure: the power management IC(s) converting VBAT into the VCORE, VIO and VMEM supply rails of the system]

MORE DETAILED HARDWARE VIEW OF THE SYSTEM.

Component specifics will be examined hereafter

[Figure: detailed block diagram: power management IC(s) supply VCORE, VIO and VMEM from VBAT; the MCU / SoC contains the CPU core, clocking, I/O, SRAM, flash, peripherals, RTC and external bus interface (EBI); external memory / devices connect via I/O and VMEM]

CORE ENERGY CONSUMPTION

Power = VCORE * ICORE
• VCORE depends on FCORE
• ICORE is not linear with VCORE

From the MSP430FR5867 data sheet:

[Figure: ILPM0 (µA) vs. FSMCLK (MHz) at Vcc = 2.2 V and Vcc = 3.0 V]

MCU MAY HAVE “OPERATING POINTS” (CF. MSP430F5438A)

For a given FCORE a minimum VCORE is required.
Power = VCORE * FCORE * (µA/MHz figure for FCORE)

[Figure: MSP430F5438A active current (µA) at 16 MHz and at 25 MHz, for execution from flash and execution from RAM]

Not linear! Power does not scale linearly with FCORE, because the minimum VCORE steps up with FCORE.

CLOCKS

• The higher the clock frequency, the higher the power consumption
• So: use minimum CPU clock frequencies? It depends!
  - Many times running faster (and then sleeping sooner) is better
  - If ICORE is non-linear with VCORE, the sweet spot could be anywhere between Fmin and Fmax!
• Peripheral clock settings may be interrelated
• PLLs tend to consume a lot (EFM32 has none)
• See what is possible with just the RTC crystal
• Some MCUs can switch clocks on and off based on events (e.g. EFM32 under DMA)
• USB tends to require a 48 MHz clock, limiting options
• Clocks can be enabled/disabled by switching power modes
• Clocking can be quite elaborate, so study the clock tree to get the best results in power reduction

OSCILLATORS

• Internal RC oscillators
  - No external components
  - Tend to use less power than an external crystal or oscillator
  - Very short start-up time
  - Generally not as stable as an external crystal
    o Drift (temperature, aging)
    o Precision
  - May be calibrated in the factory
  - May use factory calibration data
  - Lately some are good enough to satisfy the USB tolerance spec


• External oscillator/crystal
  - External components add cost
    o Crystal + 2 capacitors + PCB space
    o Crystal oscillator (more board space, cost)
  - Tends to use more power than an internal RC oscillator
  - More startup time
  - More precise and stable than an internal oscillator
    o Drift (temperature, aging)
    o Precision
  - Sometimes required to satisfy USB timing requirements

PLL

• Multiply frequency
  - Usually followed by dividers
• Allows many high frequencies to be generated from a single crystal
• PLLs tend to consume a lot (EFM32 has none)
• PLLs may require a long startup time
  - Wakeup time becomes significant (msecs)

I/O

• I/O pin settings can be complex (many options affecting power consumption)
• Distinguish between active and sleep mode. Some MCUs have shadow registers.
• Beware of unused pins!
  - They can use a lot of power when floating
  - They can cause a spontaneous device reset
  - Default settings differ per device
    o Usually input with pull-up
    o Sometimes input without pull-up or pull-down (floating!)
  - Usually best configured as output with value 0, without pull-up or pull-down
• Drive strength
  - Reduce to the required setting (also good for EMI performance)
  - But not lower, or spurious errors can occur (wrong data transfer)

MEMORY

• Flash
  - Uses a lot of power compared to SRAM
  - Requires an extra internal power regulator, so requires some startup time from power-on
  - Sometimes has “sort of” or real cache HW
  - May require wait states at higher frequencies
  - Try to locate often used code in SRAM
  - Sometimes can be switched off to save power
  - Typical access time 12.5-125 ns (on-chip)
• SRAM
  - Uses very little energy (< 1 µA Iretention on MSP430)
  - Typical access time 2-6 ns. Much faster than flash.
• F(e)RAM (non-volatile)
  - Read speed 125 ns. Comparable to flash.
  - Write speed 125 ns. Much faster than flash.


MSP430F5438A figures:
• Active mode @ 8 MHz
  - 230 µA/MHz when running from flash
  - 110 µA/MHz when running from SRAM
• Off mode (LPM4)
  - 1.2 µA @ 3.0 V for full RAM retention, fast wakeup, supply supervisor operational

MSP430FR5{8,9}xx figures for FRAM:
• Active mode
  - 100 µA/MHz when running from FRAM
  - 60 µA/MHz when running from SRAM

PERIPHERALS

• On EFM32: Energy Management Unit
  - Controls energy modes, clocks, RAM retention
• On MSP430: Power Management Unit
• On EFM32 use PRS (Peripheral Reflex System)
• Timers
  - Run on minimum necessary clock frequency
  - Switch off unused timers
  - On EFM32 use LETIMER
• Direct Memory Access
  - Let DMA transfer data instead of CPU
• UART
  - Use minimum necessary clock frequency
  - On EFM32 use LEUART (Low Energy UART) on the 32.768 kHz RTC clock, up to 9600 bps. It can operate in EM2.

REAL-TIME CLOCK

• Usually at 32.768 kHz (2^15 Hz)
• Divided down to generate 1 interrupt/sec
• Can have alarm at certain day/time
• Usually has own crystal
• Uses very little power
• May have some non-volatile RAM bytes
• Usually has separate power domain
  - Can remain active when rest of MCU is off
• Can generate clock for MCU/peripheral operation
  - LE UART
  - LE TIMER
  - LE SENSE
  - RAM access

EXTERNAL BUS INTERFACE

• Not always present
  - Not on Cortex-M0(+)
  - Not on MSP430
  - Present on (some) Cortex-M{3,4}
• Access to external memory
• Access to external devices
• Selectable number of address / data lines
• Selectable control signals (strobes) for
  - Dynamic memory (SDRAM)
  - Static memory (Flash, SRAM)
• Drive strength for address / data lines determines power
  - If multiple devices are connected in parallel or PCB traces are longer, more drive strength is needed
  - If the device load is more capacitive, more strength is required (cf. terminology slide on how to determine)

[Figure: external memory on the VMEM supply rail]

EXTERNAL MEMORY

• Dynamic memory (SDRAM/DDR)
• http://www.samsung.com/global/business/semiconductor/product/consumer-dram/overview
• http://en.wikipedia.org/wiki/Mobile_DDR
• http://en.wikipedia.org/wiki/DDR4_SDRAM
• Every generation uses less power at faster speeds.

Technology  Standard voltage  Mobile / low-power voltage  Data rate
SDRAM       3.3 V             1.8 V                       133-200 Mbps
DDR         2.5 V             1.8 V                       266-400 Mbps
DDR2        1.8 V             1.8 / 1.2 V                 400-1066 Mbps
DDR3        1.5 V / 1.35 V    1.8 / 1.2 V                 800-2133 Mbps
DDR4        1.2 V             1.1 V                       2133-4266 Mbps

3. Factors influencing energy consumption

FACTORS INFLUENCING ENERGY CONSUMPTION
There are many factors and then some…
• Application SW
• OS configuration
• Compiler & compiler settings
• Sensor application
• Low Power Modes
• Processor
• IP blocks
• Process Technology
• HW accelerators
• Printed Circuit Board
• Radio
• Radio Technology
• Protocol (Zigbee / BLE / Zwave / WiFi / LoRa, …)
• Radio Frequency (2.4 GHz, 868 MHz, 433 MHz, …)
• Battery technology

The figure groups these factors into Software, HW+SW and Hardware.

POWER MANAGEMENT MECHANISMS BY LEVEL

Power Management works at all these levels:

Level             Mechanism
Application       Event driven, uses DMA, HW event mechanisms, Low Power Modes, …
Operating system  Power API, Operating Performance Points API
Driver            Suspend / resume API
Board             Dynamic Voltage and Frequency Scaling, Power Gating via I/O pin, Controlling Voltage Regulator via I/O / I2C
Chip              Power Gating, (Automatic) Clock Gating, Clock Frequency management, Dynamic Power Switching, Adaptive Voltage Scaling, Static Leakage Management
IP block / chip   Power Gating, State Retention
IP block / RTL    Automatic power / clock gating
Transistor        Body Bias, FinFET, Sub-Threshold
Substrate         SOI, FD-SOI

The slide groups these levels into Software, Hardware / Software and Hardware domains.

4. Hardware mechanisms

REDUCING ENERGY CONSUMPTION: HARDWARE-ONLY MECHANISMS

[Figure: system levels, from board, chip, IP block / chip and IP block / RTL down to transistor and substrate]

[Figure: power gating, where gate control switches V+ to the peripheral's gated supply (V+gated), and (automatic) clock gating, where gate control switches the clock to the peripheral's gated clock (Clockgated)]

[Figure: energy modes: Active Mode 110 µA/MHz, Sleep Mode 60 µA/MHz, Mode … 500 nA, Shutoff Mode 20 nA. Image source: http://www.mdpi.com/jlpea/jlpea-01-00261/article_deploy/html/images/jlpea-01-00261f1-1024.png]

Level  Mechanism
Chip   Power gating
       Offer Low Energy modes
       (Automatic) clock gating
       Clock frequency management
       Dynamic Power Switching
       Adaptive Voltage Scaling
       Static Leakage Management
       Hardware Event System/Router

REDUCING ENERGY CONSUMPTION: HARDWARE-ONLY MECHANISMS

Level            Mechanism
IP block / chip  Power Gating / State Retention
IP block / RTL   Automatic power / clock gating
Transistor       Select optimum transistor geometry per use case: FinFET, TriGate FET, Sub-threshold operation, Body Bias
Substrate        Silicon-on-Insulator (SOI), Fully Depleted Silicon-on-Insulator (FD-SOI)

REDUCING ENERGY CONSUMPTION: HARDWARE-SOFTWARE MECHANISMS

Level  Mechanism
Board  Dynamic Voltage and Frequency Scaling
       Power Gating via I/O pin
       Controlling Voltage regulator via I/O pin
       Clock Frequency Management
       Controlling device shutdown pins via I/O pin

REDUCING ENERGY CONSUMPTION: SOFTWARE MECHANISMS

Level             Mechanism
Coding            Coding for minimum energy, Coding for speed, Cache friendly coding
Operating System  Power API, Operating point API, Tickless operation
Driver            Use DMA, Use HW event mechanism, Suspend / resume

5. EFFECT OF COMPILER SETTINGS


REDUCING ENERGY CONSUMPTION: SOFTWARE MECHANISMS
The ones you didn’t think mattered that much
• Compiler
  - Can make 10’s of % difference
• Compiler settings
  - Can make 100’s of % difference
• Data and code location

Problem:
• You cannot predict what settings give best results
  - So measure!

REDUCING ENERGY CONSUMPTION: SOFTWARE MECHANISMS (GCC 4.8.3)

Routine         Settings  Run-time (ms)  Code size  Current (mA)  Energy (µJ)
mat_mul_simple  -O2       16.75          192        2.99          165.45
mat_mul_faster  -O2       13.00          224        3.06          129.35
mat_mul_simple  -O1       17.50          188        3.08          165.45
mat_mul_faster  -O1       15.13          200        3.07          152.14
mat_mul_simple  -O3       16.25          192        3.07          165.13
mat_mul_faster  -O3       15.25          244        3.05          152.52
mat_mul_simple  -Os       25.13          140        3.07          253.15
mat_mul_faster  -Os       29.88          168        3.12          307.59?
mat_mul_simple  -O0       69.00          264        3.07          695.20
mat_mul_faster  -O0       64.75          284        3.11          661.35

6. Software development strategy: Matching Software to Hardware Capabilities

MATCHING SOFTWARE TO HARDWARE CAPABILITIES: SOFTWARE DEVELOPMENT STRATEGY

Strategy for the software development:
• Follow the hardware developments.
• Go through a typical use case in which the implementation makes a significant difference to energy consumption.

EARLY DAYS

[Figure: early MCU / SoC: the CPU core, as the only bus master, moves all data over the bus between memory and peripherals (bus slaves)]

POLLING

Standard (naive) behavior when waiting for a hardware event:

    while (!event_occurred()) {
        /* polling, busy waiting */
    }

This keeps the CPU active, as well as the code memory. These are both significant contributors to energy consumption, especially if the code resides in flash memory (leaving aside possible instruction caching).


DATA MOVED BY DMA. COMPLETION INTERRUPT SIGNALS CPU.

[Figure: DMA added as a second bus master next to the CPU core; DMA moves data between peripherals and memory (bus slaves), and a completion interrupt (Compl Int) signals the CPU]


Main program:

    #include <stdbool.h>

    volatile bool done = false;   /* set by the DMA completion interrupt */

    setup_peripherals_and_DMA();
    while (!done) {
        /* Check periodically, only core-internal access.           */
        /* Checking peripheral registers would delay bus access by   */
        /* the DMA, which can significantly delay peripheral/DMA     */
        /* operation, and consumes much more energy.                 */
        __delay_cycles(CHECK_INTERVAL);   /* MSP430 compiler intrinsic */
    }

Interrupt handler:

    void ISR_DMA_done(void)
    {
        /* Clear interrupt source */
        done = true;
    }

LET CPU SLEEP UNTIL INTERRUPT

[Figure: bus masters (CPU core, DMA) and bus slaves (memory, peripherals); the CPU sleeps while DMA moves the data and is woken by the completion interrupt (Compl Int)]


Using an interrupt and a CPU with sleep support:

    volatile bool done = false;

    setup_peripherals_and_DMA();
    while (!done) {
        /* Special instruction, CPU sleeps. Beware: if the interrupt fires
           after done is tested but before the CPU sleeps, the CPU may sleep
           until the next interrupt. Many MCUs offer an atomic
           "enable interrupts and sleep" or masked-wait idiom for this. */
        wait_for_interrupt();
    }

    void ISR_event_occurred(void)
    {
        clear_interrupt_source();
        done = true;
    }

This stops the CPU until an interrupt occurs, saving energy by
• Stopping the CPU clock
• Stopping accesses to code memory

HW GENERATES EVENTS, COMPLETION EVENT SIGNALS CPU

[Figure: a HW event “handler” passes events between peripherals and DMA without CPU involvement; only the completion event signals the CPU core. Bus masters: CPU core, DMA; bus slaves: memory, peripherals]

HW SUPPORT FOR EVENT PRODUCER/CONSUMER CAN DO THIS:

The event “handler” is like a switchboard for signals. A peripheral can be an event provider, an event consumer or both.

[Figure: event matrix connecting producers to consumer(s): GPIO goes high → ADC start conversion; ADC conversion done → DMA trigger; DMA done → CPU wakeup]


    setup_hardware_for_event_generation();
    wait_for_event();   /* CPU instruction, CPU sleeps */

This stops the CPU until a HW event occurs, saving energy by
• Stopping the CPU clock
• Stopping memory accesses to retrieve CPU code
• Avoiding interrupt overhead

ENERGY CONTROL, MANAGING POWER STATE TRANSITIONS

[Figure: system with CPU core, DMA, HW event “handler”, memory and peripherals, extended with an energy control unit]


Using the event mechanism, a power/energy control unit and a CPU with event sleep support:

    setup_hardware_for_event_generation();
    select_energymode_while_waiting();
    wait_for_event();   /* CPU sleeps, lower power mode */

This stops the CPU until a HW event occurs, saving energy by
• Stopping the CPU clock
• Stopping memory accesses to retrieve CPU code
• Obviating interrupt overhead
• Allowing the system to go into deeper sleep, saving more energy

The attainable sleep mode depends on the actual peripheral(s) used.

7. CONCLUSIONS


CONCLUSIONS

• Ultra Low Power is a system issue. It is also multi-disciplinary (hardware / software)
• Many power reduction mechanisms exist at different system levels
• There is a stepwise software approach to attaining Ultra-Low Power / energy consumption
• It requires thorough understanding of both application and hardware
• It involves making tradeoffs
• For best results
  - Understand available hardware, software and mechanisms
  - Use them wisely

WANT TO KNOW MORE?
Email me: [email protected]
Tomorrow afternoon, March 1st 2018, there is a 3-hour workshop (Class 13) with more detail.
Follow my 2-day hands-on workshop: http://www.hightechinstitute.nl/en/training/software/ultra_low_power_for_internet_of_things/
It is possible to run the workshop on-site.

Altran offers various services related to ULP.
Some icons by Freepik from www.flaticon.com

QUESTIONS?
