Achieving Ultra Low Power in Embedded Systems
Understand where your power goes and what you can do to make things better
Herman Roebbers, 28.02.2018
AGENDA
• 1 Introduction
• 2 Where does my energy go?
• 3 Factors influencing energy consumption
• 4 Hardware mechanisms
• 5 Effect of compiler settings
• 6 Software development strategy
• 7 Conclusions
Embedded World Conference, February 28, 2018 © H. W. Roebbers
1. Introduction
WE WANT TO REDUCE OUR ENERGY CONSUMPTION
Why?
• It’s the law (EnergyStar and successors)
• Customers require it (e.g. automotive)
• Replacing batteries is expensive / impossible
• Batteries are an environmental hazard
• We want the battery to last 10 years
• We want to be able to operate without batteries (e.g. IoT edge devices)
WE WANT TO REDUCE OUR ENERGY CONSUMPTION
How?
• Understand where energy is consumed
• Understand general energy reduction mechanisms
• Understand our particular hardware
• Let CPU and unused parts of the system sleep as long as possible
• Make the software use available energy reduction mechanisms
• Make software event driven
• Understand battery technologies and properties
• Use energy harvesting to replace battery/extend battery life (optional)
2. Where does my energy go?
SLIGHTLY DETAILED HARDWARE VIEW OF A TYPICAL SYSTEM.
Component specifics will be examined hereafter.
[Block diagram: Energy (VBAT), Power Mgt IC(s) with outputs VCORE, VIO and VMEM, System (MCU / SoC, I/O), External Memory / Devices]
REGULATORS ALSO CONSUME ENERGY.
There are big differences in regulator efficiency. The fewer output voltages the better! Efficiency > 97 % is possible for switching regulators (more expensive).
[Block diagram: Energy (VBAT), Power Mgt IC(s) with outputs VCORE, VIO and VMEM, System]
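As a rough illustration of why regulator efficiency matters, the sketch below computes the power drawn from the battery for a given load. The function names and the example numbers are illustrative assumptions; only the 97 % figure comes from the slide.

```c
/* Power drawn from the battery for a given load power and regulator
 * efficiency (0 < efficiency <= 1). Hypothetical helper for illustration. */
static double battery_power_mw(double load_mw, double efficiency)
{
    return load_mw / efficiency;
}

/* Power dissipated in the regulator itself. */
static double regulator_loss_mw(double load_mw, double efficiency)
{
    return battery_power_mw(load_mw, efficiency) - load_mw;
}
```

For a 9.7 mW load, a 97 % efficient regulator draws about 10 mW from the battery; at an assumed 80 % efficiency the same load would draw a little over 12 mW.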
MORE DETAILED HARDWARE VIEW OF THE SYSTEM.
Component specifics will be examined hereafter
[Block diagram: Energy (VBAT), Power Mgt IC(s) with outputs VCORE, VIO and VMEM, System with MCU / SoC (CPU Core, Clocking, I/O, SRAM, Flash, Peripherals, RTC, EBI), External Memory / Devices]
CORE ENERGY CONSUMPTION
Power = VCORE * ICORE
VCORE depends on FCORE
ICORE is not linear with VCORE
From MSP430FR58671 data sheet:
[Figure: ILPM0 (µA, 0 to 300) vs. FSMCLK (MHz, 0 to 20), one curve @ Vcc = 2.2 V and one @ Vcc = 3.0 V]
MCU MAY HAVE “OPERATING POINTS” (CF. MSP430F5438A)
For a given FCORE a minimum VCORE is required.
Power = VCORE * FCORE * (µA / MHz) figure for FCORE
MCU MAY HAVE “OPERATING POINTS” (CF. MSP430F5438A)
[Figure: supply current (µA, 0 to 10000) for execution from Flash and execution from RAM, with points marked @ 16 MHz and @ 25 MHz]
For a given FCORE a minimum VCORE is required.
Power = VCORE * FCORE * (µA / MHz) figure for FCORE. Not linear!
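The operating-point formula can be sketched as a table lookup. The table values, type and function names below are placeholders invented for illustration and are not taken from any data sheet; a real design would use the figures for its own MCU.

```c
#include <stddef.h>

/* One operating point: the minimum core voltage that supports clock
 * frequencies up to f_max_mhz, and the current-per-MHz figure there. */
typedef struct {
    double f_max_mhz;
    double vcore_v;
    double ua_per_mhz;
} operating_point_t;

/* Placeholder values for illustration only, NOT data sheet numbers. */
static const operating_point_t op_table[] = {
    {  8.0, 1.4, 200.0 },
    { 16.0, 1.6, 230.0 },
    { 25.0, 1.8, 270.0 },
};

/* Lowest operating point that still supports f_mhz, or NULL if none does. */
static const operating_point_t *lowest_point_for(double f_mhz)
{
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (f_mhz <= op_table[i].f_max_mhz)
            return &op_table[i];
    }
    return NULL;
}

/* Power = VCORE * FCORE * (uA/MHz), in microwatts; negative if unsupported. */
static double active_power_uw(double f_mhz)
{
    const operating_point_t *op = lowest_point_for(f_mhz);
    if (op == NULL)
        return -1.0;
    return op->vcore_v * f_mhz * op->ua_per_mhz;
}
```

With these placeholder numbers, doubling the clock from 8 MHz to 16 MHz raises power from 2240 µW to 5888 µW, more than a factor of two, which is the non-linearity the slide points at.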
CLOCKS
• The higher the clock frequency, the more energy is spent
• So: use minimum CPU clock frequencies?
  - It depends! Many times faster is better
  - If Icore is non-linear with Vcore: sweet spot could be anywhere between Fmin and Fmax!
• Peripheral clock settings may be interrelated
• PLLs tend to consume a lot (EFM32 has none)
• See what is possible with just RTC crystal
• Some MCUs can switch clocks on and off event based (e.g. EFM32 under DMA)
• USB tends to require 48 MHz clock, limiting options
• Clocks could be enabled/disabled by switching Power Modes
• Clocking can be quite elaborate, so study the clock tree to get best results in power reduction.
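One way to see why "faster is better" can hold is to compare the energy of a fixed amount of work at two clock frequencies. The model below is a simplifying assumption, not the author's method: active current is a frequency-independent base current plus a per-MHz term, the supply voltage is the same at both frequencies, and the task fits within the period. All names and numbers are illustrative.

```c
/* Energy in microjoules to run `cycles` CPU cycles once per period and
 * sleep for the rest of the period.
 *   active current = base_ua + ua_per_mhz * f_mhz   (assumed model)
 * Assumes cycles / f fits within period_s. */
static double task_energy_uj(double cycles, double f_mhz, double vcc_v,
                             double ua_per_mhz, double base_ua,
                             double sleep_ua, double period_s)
{
    double t_active_s  = cycles / (f_mhz * 1e6);
    double i_active_ua = base_ua + ua_per_mhz * f_mhz;
    double e_active_uj = vcc_v * i_active_ua * t_active_s;
    double e_sleep_uj  = vcc_v * sleep_ua * (period_s - t_active_s);
    return e_active_uj + e_sleep_uj;
}
```

With 1,000,000 cycles every 2 s at 3 V, 200 µA/MHz, 100 µA base current and 1 µA sleep current, the model gives about 903 µJ at 1 MHz and about 636 µJ at 10 MHz: the base current is paid for a shorter time. If the higher frequency also needed a higher core voltage, the balance could tip the other way, so measuring remains necessary.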
OSCILLATORS
• Internal RC oscillators
  - No external components
  - Tend to use less power than external crystal or oscillator
  - Very low start-up time
  - Generally not as stable as external crystal
    o Drift (temperature, aging)
    o Precision
  - May be calibrated in factory
  - May use factory calibration data
  - Lately some are good enough to satisfy USB tolerance spec.
OSCILLATORS
• External oscillator/crystal
  - External components add cost
    o Crystal + 2 capacitors + PCB space
    o Crystal oscillator (more board space, cost)
  - Tends to use more power than internal oscillator
  - More startup time
  - More precise and stable than internal oscillator
    o Drift (temperature, aging)
    o Precision
  - Sometimes required to satisfy USB timing requirements.
PLL
• Multiply frequency
  - Usually followed by dividers
• Allows many high frequencies to be generated from single crystal
• PLLs tend to consume a lot (EFM32 has none)
• PLLs may require long startup time
  - Wakeup time becomes significant (msecs)
I/O
• I/O pin settings can be complex (many options affecting power consumption)
• Make distinction between active and sleep mode. Some MCUs have shadow registers.
• Beware of unused pins!
  - Can use a lot of power when floating!
  - Can cause spontaneous device reset!
  - Default settings are different per device
    o Usually input with pull-up
    o Sometimes input without pull-up or down (floating!!!)
  - Usually best as output with value 0, without pull-up or pull-down
• Drive strength
  - Reduce to required settings (also good for EMI performance)
  - But not lower or spurious errors can occur (wrong data transfer)
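The advice for unused pins can be sketched against a generic register layout. The `gpio_port_t` structure and `park_unused_pins` function are hypothetical; real MCUs use different register names and layouts, so treat this as a pattern rather than a driver.

```c
#include <stdint.h>

/* Hypothetical GPIO port registers, one bit per pin. */
typedef struct {
    uint32_t dir;      /* 1 = output, 0 = input */
    uint32_t out;      /* output level */
    uint32_t pull_en;  /* 1 = pull-up/pull-down enabled */
} gpio_port_t;

/* Configure every pin NOT in used_mask as output, driven low, no pull. */
static void park_unused_pins(volatile gpio_port_t *port, uint32_t used_mask)
{
    uint32_t unused = ~used_mask;

    port->out     &= ~unused;  /* set level low before enabling the driver */
    port->pull_en &= ~unused;  /* no pull resistor on parked pins */
    port->dir     |=  unused;  /* switch parked pins to output */
}
```

Pins listed in `used_mask` keep whatever configuration the application gave them; only the remaining pins are parked.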
MEMORY
• Flash
  - Uses a lot of power compared to SRAM
  - Requires extra internal power regulator
  - So requires some startup time from Power On
  - Sometimes has “sort of” or real cache HW
  - May require wait states at higher frequencies
  - Try to locate often used code in SRAM
  - Sometimes can be switched off to save power
  - Typical access time 12.5 – 125 ns (on-chip)
• SRAM
  - Uses very little energy (< 1 µA Iretention on MSP430)
  - Typical access time 2-6 ns. Much faster than flash.
• F(e)RAM. Non-volatile
  - Read speed 125 ns. Comparable to flash.
  - Write speed 125 ns. Much faster than flash
MEMORY
MSP430F5438A figure:
• Active mode @ 8 MHz
  - 230 µA/MHz when running from flash
  - 110 µA/MHz when running from SRAM
• Off mode (LPM4)
  - 1.2 µA @ 3.0 V for full RAM retention, Fast Wakeup, Supply Supervisor operational
MSP430FR5{8,9}xx figure for FRAM:
• Active mode
  - 100 µA/MHz when running from FRAM
  - 60 µA/MHz when running from SRAM
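To put such figures in perspective, the sketch below estimates average current and battery life for a duty-cycled system. It takes the slide's numbers as 230 µA/MHz at 8 MHz (1840 µA active) and 1.2 µA in LPM4; the 0.1 % duty cycle and the 220 mAh battery capacity are assumptions for illustration, and battery self-discharge and regulator losses are ignored.

```c
/* Average current for a system that is active for a fraction `duty`
 * of the time (0..1) and asleep for the rest. */
static double average_current_ua(double active_ua, double sleep_ua, double duty)
{
    return active_ua * duty + sleep_ua * (1.0 - duty);
}

/* Idealized battery life in years: capacity divided by average current.
 * Ignores self-discharge, temperature and regulator losses. */
static double battery_life_years(double capacity_mah, double avg_ua)
{
    double hours = capacity_mah * 1000.0 / avg_ua;
    return hours / (24.0 * 365.0);
}
```

With these inputs the average current is about 3.04 µA and the idealized life is a little over 8 years; the sleep current already accounts for roughly 40 % of the average, which is why both the active and the sleep figures matter.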
PERIPHERALS
• On EFM32: Energy Management Unit
  - Controls energy modes, clocks, RAM retention
• On MSP430: Power Management Unit
• On EFM32 use PRS (Peripheral Reflex System)
• Timers
  - Run on minimum necessary clock frequency
  - Switch off unused timers
  - On EFM32 use LETIMER
• Direct Memory Access
  - Let DMA transfer data instead of CPU
• UART
  - Use minimum necessary clock frequency
  - On EFM32 use LEUART (Low Energy UART) on 32.768 kHz RTC clock up to 9600 bps. It can operate in EM2
REAL-TIME CLOCK
• Usually at 32.768 kHz (2^15 Hz)
• Divided down to generate 1 interrupt/sec
• Can have alarm at certain day/time
• Usually has own crystal
• Uses very little power
• May have some non-volatile RAM bytes
• Usually has separate power domain
  - Can remain active when rest of MCU is off
• Can generate clock for MCU/peripheral operation
  - LE UART
  - LE TIMER
  - LE SENSE
  - RAM access
EXTERNAL BUS INTERFACE
• Not always present
  - Not on Cortex-M0(+)
  - Not on MSP430
  - Present on Cortex-M{3,4}
• Access to external memory
• Access to external devices
• Selectable number of address / data lines
• Selectable control signals (strobes) for
  - Dynamic memory (SDRAM)
  - Static memory (Flash, SRAM)
• Drive strength for address / data lines determines power
  - If multiple devices in parallel or longer PCB traces then more drive strength is needed.
  - If device load is more capacitive then more strength is required (cf. terminology slide on how to determine)
[Block diagram: Energy (VBAT), External Memory (VMEM)]
EXTERNAL MEMORY
• Dynamic memory (SDRAM/DDRAM)
• http://www.samsung.com/global/business/semiconductor/product/consumer-dram/overview
• http://en.wikipedia.org/wiki/Mobile_DDR
• http://en.wikipedia.org/wiki/DDR4_SDRAM
• Every generation uses less power at faster speeds.

Technology | Standard       | Mobile / Low Power | Speed
SDRAM      | 3.3 V          | 1.8 V              | 133-200 Mbps
DDR        | 2.5 V          | 1.8 V              | 266-400 Mbps
DDR2       | 1.8 V          | 1.8 / 1.2 V        | 400-1066 Mbps
DDR3       | 1.5 V / 1.35 V | 1.8 / 1.2 V        | 800-2133 Mbps
DDR4       | 1.2 V          | 1.1 V              | 2133-4266 Mbps
3. Factors influencing energy consumption
FACTORS INFLUENCING ENERGY CONSUMPTION
There are many factors and then some…
[Diagram: factors arranged around “Energy consumption”; successive builds of the slide label groups of them as Hardware, Software and HW+SW]
• Application SW
• OS configuration
• Compiler & compiler settings
• Sensor application
• Low Power Modes
• Printed Circuit Board
• Processor
• IP blocks
• Process Technology
• HW accelerators
• Radio
• Radio Technology
• Protocol (Zigbee / BLE / Z-Wave / WiFi / LoRa, …)
• Radio Frequency (2.4 GHz, 868 MHz, 433 MHz, …)
• Battery technology
POWER MANAGEMENT MECHANISMS BY LEVEL
Power Management works at all these levels

Level            | Mechanism                                                                                                                                         | Domain
Application      | Event driven, uses DMA, HW event mechanisms, Low Power Modes, …                                                                                   | Software
Operating system | Power API, Operating Performance Points API                                                                                                       | Software
Driver           | Suspend / resume API                                                                                                                              | Software
Board            | Dynamic Voltage and Frequency Scaling, Power Gating via I/O pin, Controlling Voltage Regulator via I/O / I2C                                      | Hardware / Software
Chip             | Power Gating, (Automatic) Clock Gating, Clock Frequency management, Dynamic Power Switching, Adaptive Voltage Scaling, Static Leakage Management | Hardware
IP block / chip  | Power Gating, State Retention                                                                                                                     | Hardware
IP block / RTL   | Automatic power / clock gating                                                                                                                    | Hardware
Transistor       | Body Bias, FinFET, Sub-Threshold                                                                                                                  | Hardware
Substrate        | SOI, FD-SOI                                                                                                                                       | Hardware
4. Hardware mechanisms
REDUCING ENERGY CONSUMPTION: HARDWARE-ONLY MECHANISMS
Levels: Board, Chip, IP block / chip, IP block / RTL, Transistor, Substrate
REDUCING ENERGY CONSUMPTION: HARDWARE-ONLY MECHANISMS
Level: Chip
Mechanisms:
• Power gating
• Offer Low Energy modes
• (Automatic) clock gating
• Clock frequency management
• Dynamic Power Switching
• Adaptive Voltage Scaling
• Static Leakage Management
• Hardware Event System/Router
[Figure: Power Gating. Gate Control switches V+ to V+gated for a peripheral]
[Figure: (Automatic) Clock Gating. Gate Control switches Clock to Clockgated for a peripheral]
Energy Modes:
• Active Mode: 110 µA/MHz
• Sleep Mode: 60 µA/MHz
• Mode …: 500 nA
• Shutoff Mode: 20 nA
http://www.mdpi.com/jlpea/jlpea-01-00261/article_deploy/html/images/jlpea-01-00261f1-1024.png
REDUCING ENERGY CONSUMPTION: HARDWARE-ONLY MECHANISMS

Level           | Mechanism
IP block / chip | Power Gating / State Retention
IP block / RTL  | Automatic power / clock gating
Transistor      | Select optimum transistor geometry per use case, FinFET, TriGate FET, Sub-threshold operation, Body Bias
Substrate       | Silicon-on-Insulator (SOI), Fully Depleted Silicon-on-Insulator (FD-SOI)
REDUCING ENERGY CONSUMPTION: HARDWARE-SOFTWARE MECHANISMS

Level | Mechanism
Board | Dynamic Voltage and Frequency Scaling, Power Gating via I/O pin, Controlling Voltage regulator via I/O pin, Clock Frequency Management, Controlling device shutdown pins via I/O pin
REDUCING ENERGY CONSUMPTION: SOFTWARE MECHANISMS

Level            | Mechanism
Coding           | Coding for minimum energy, Coding for speed, Cache friendly coding
Operating System | Power API, Operating point API, Tickless operation
Driver           | Use DMA, Use HW event mechanism, Suspend / resume
5. EFFECT OF COMPILER SETTINGS
REDUCING ENERGY CONSUMPTION: SOFTWARE MECHANISMS
The ones you didn’t think mattered that much
• Compiler
  - Can make 10s of % difference
• Compiler settings
  - Can make 100s of % difference
• Data and code location
Problem:
• You cannot predict what settings give best results
  - So measure!
REDUCING ENERGY CONSUMPTION: SOFTWARE MECHANISMS
GCC 4.8.3

Routine        | Compiler settings | Run-time (ms) | Code size | Current (mA) | Energy (µJ)
mat_mul_simple | -O2               | 16.75         | 192       | 2.99         | 165.45
mat_mul_faster | -O2               | 13.00         | 224       | 3.06         | 129.35
mat_mul_simple | -O1               | 17.50         | 188       | 3.08         | 165.45
mat_mul_faster | -O1               | 15.13         | 200       | 3.07         | 152.14
mat_mul_simple | -O3               | 16.25         | 192       | 3.07         | 165.13
mat_mul_faster | -O3               | 15.25         | 244       | 3.05         | 152.52
mat_mul_simple | -Os               | 25.13         | 140       | 3.07         | 253.15
mat_mul_faster | -Os               | 29.88         | 168       | 3.12         | 307.59?
mat_mul_simple | -O0               | 69.00         | 264       | 3.07         | 695.20
mat_mul_faster | -O0               | 64.75         | 284       | 3.11         | 661.35
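Energy per run is supply voltage × current × run-time, which is how a table like this can be derived from measurements. The slide does not state the supply voltage; the 3.3 V used below is an assumption that approximately reproduces several rows (for example the first one). The helper names are illustrative.

```c
/* Energy in microjoules: volts * milliamps * milliseconds = microjoules. */
static double energy_uj(double supply_v, double current_ma, double runtime_ms)
{
    return supply_v * current_ma * runtime_ms;
}

/* How much more energy `a` costs than `b`, in percent. */
static double percent_more(double a_uj, double b_uj)
{
    return (a_uj - b_uj) / b_uj * 100.0;
}
```

For mat_mul_simple at -O2, `energy_uj(3.3, 2.99, 16.75)` gives about 165.3 µJ. Comparing the tabulated energies for mat_mul_faster, -Os costs roughly 138 % more than -O2, so compiler settings alone can more than double the energy for the same routine.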
6. Software development strategy: Matching Software to Hardware Capabilities
MATCHING SOFTWARE TO HARDWARE CAPABILITIES: SOFTWARE DEVELOPMENT STRATEGY
Strategy for the software development:
• Follow the hardware developments.
• Go through a typical use case, where implementation makes quite a difference to energy consumption.
EARLY DAYS
[Block diagram: MCU / SoC with CPU Core (bus master), Bus, Memory and Peripherals (bus slaves); all data passes through the CPU Core]
POLLING
Standard (naive) behavior when waiting for hardware event:
    while (! event_occurred()) {
        /* polling, busy waiting */
    }
This keeps the CPU active, as well as the code memory. These are both significant contributors to energy consumption, especially if the code resides in flash memory (leaving aside possible instruction caching).
DATA MOVED BY DMA. COMPLETION INTERRUPT SIGNALS CPU.
[Block diagram: MCU / SoC with DMA and CPU Core (bus masters), Bus, Memory and Peripherals (bus slaves); a completion interrupt (Compl Int) goes to the CPU Core]
DATA MOVED BY DMA. COMPLETION INTERRUPT SIGNALS CPU.
Main program:
    #include <stdbool.h>

    volatile bool done = false;   /* set by the interrupt handler */

    setup_peripherals_and_DMA();
    while ( ! done ) {
        /* Check periodically, only core internal access */
        /* Checking periph. registers delays bus access by DMA */
        /* This can significantly delay peripheral/DMA operation! */
        /* And consumes much more energy. */
        __delay_cycles(CHECK_INTERVAL);
    }

Interrupt handler:
    void ISR_DMA_done( void ) {
        /* Clear int. source */
        done = true;
    }
LET CPU SLEEP UNTIL INTERRUPT
[Block diagram: MCU / SoC with DMA and CPU Core (bus masters), Bus, Memory and Peripherals (bus slaves); a completion interrupt (Compl Int) goes to the CPU Core]
LET CPU SLEEP UNTIL INTERRUPT
Using interrupt and CPU with sleep support:
    volatile bool done = false;
    setup_peripherals_and_DMA();
    while (! done) {
        /* special instruction, CPU sleeps */
        wait_for_interrupt();
    }

    void ISR_event_occurred(void) {
        clear_interrupt_source();
        done = true;
    }
Stops the CPU until interrupt occurs, saving energy by
• Stopping CPU clock
• Stopping accesses to code memory
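A practical detail with the sleep loop: if the interrupt fires after `done` is tested but before the sleep instruction executes, the CPU may sleep until some later interrupt. A common way to close that window is to test the flag with interrupts masked. The sketch below uses hypothetical platform hooks (`cpu_ops_t`); on a Cortex-M part they would typically map to the CMSIS intrinsics `__disable_irq()`, `__enable_irq()` and `__WFI()`, relying on WFI waking up for a pending interrupt even while interrupts are masked. Other architectures differ, so check the core's documentation. The `host_*` functions are stand-ins so the logic can be exercised off-target.

```c
#include <stdbool.h>

/* Hypothetical platform hooks, supplied per target. */
typedef struct {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
    void (*wait_for_interrupt)(void);
} cpu_ops_t;

static volatile bool done = false;   /* set by the interrupt handler */

static void sleep_until_done(const cpu_ops_t *cpu)
{
    for (;;) {
        cpu->irq_disable();          /* close the window between test and sleep */
        if (done) {
            cpu->irq_enable();
            return;
        }
        cpu->wait_for_interrupt();   /* assumed to wake on a pending interrupt */
        cpu->irq_enable();           /* the pending handler runs here */
    }
}

/* Host-side stand-ins, for illustration and testing only. */
static int host_sleep_count = 0;
static void host_irq_disable(void) {}
static void host_irq_enable(void) {}
static void host_wait_for_interrupt(void)
{
    host_sleep_count++;
    done = true;                     /* pretend the completion interrupt fired */
}
static const cpu_ops_t host_cpu = {
    host_irq_disable, host_irq_enable, host_wait_for_interrupt
};
```

On the host, `sleep_until_done(&host_cpu)` sleeps exactly once and returns; a second call returns immediately because the flag is already set.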
HW GENERATES EVENTS, COMPLETION EVENT SIGNALS CPU
[Block diagram: MCU / SoC with DMA and CPU Core (bus masters), Bus, Memory and Peripherals (bus slaves), plus a HW Event “handler” carrying events alongside the interrupt line]
HW SUPPORT FOR EVENT PRODUCER/CONSUMER CAN DO THIS:
Event “Handler” is like a switch board for signals
Peripheral can be event provider, event consumer or both
[Figure: Event matrix, connects producer to consumer(s). Blocks shown: GPIO, ADC, DMA, Memory, CPU. Signals shown: GPIO goes high, Start conversion, Conversion done, DMA trigger, DMA done, Wakeup]
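The switch-board idea can be modelled in software as a routing table from producer events to consumer actions. This is a host-side model with invented names, not a real peripheral API (on EFM32 the corresponding hardware is the PRS). The chain configured in `setup_adc_chain` is one plausible reading of the signal names on the slide: GPIO goes high, the ADC starts converting, conversion done triggers the DMA, DMA done wakes the CPU.

```c
#include <stdint.h>

/* Producer events (hypothetical names). */
typedef enum { EV_GPIO_HIGH, EV_ADC_DONE, EV_DMA_DONE, EV_COUNT } event_t;

/* Consumer actions, one bit each, so one event can feed several consumers. */
enum {
    ACT_ADC_START   = 1u << 0,
    ACT_DMA_TRIGGER = 1u << 1,
    ACT_CPU_WAKEUP  = 1u << 2
};

/* The "event matrix": for each producer, the set of consumers it drives. */
static uint32_t event_matrix[EV_COUNT];

static void route(event_t producer, uint32_t consumers)
{
    event_matrix[producer] |= consumers;
}

static uint32_t consumers_of(event_t producer)
{
    return event_matrix[producer];
}

/* Only the last link in the chain involves the CPU. */
static void setup_adc_chain(void)
{
    route(EV_GPIO_HIGH, ACT_ADC_START);
    route(EV_ADC_DONE,  ACT_DMA_TRIGGER);
    route(EV_DMA_DONE,  ACT_CPU_WAKEUP);
}
```

In this model the CPU is a consumer of `EV_DMA_DONE` only, so it can stay asleep while the first two links of the chain run in hardware.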
HW GENERATES EVENTS, COMPLETION EVENT SIGNALS CPU
    setup_hardware_for_event_generation();
    wait_for_event(); /* CPU instruction, CPU sleeps */

Stops the CPU until HW event occurs, saving energy by
• Stopping CPU clock
• Stopping memory accesses to retrieve CPU code
• No interrupt overhead
ENERGY CONTROL, MANAGING POWER STATE TRANSITIONS
[Block diagram: MCU / SoC with DMA and CPU Core (bus masters), Bus, Memory and Peripherals (bus slaves), HW Event “handler”, and an Energy Control block]
ENERGY CONTROL, MANAGING POWER STATE TRANSITIONS
Using event mechanism, power/energy control unit and CPU with event sleep support:
    setup_hardware_for_event_generation();
    select_energymode_while_waiting();
    wait_for_event(); /* CPU sleeps, lower power mode */

Stops the CPU until HW event occurs, saving energy by
• Stopping CPU clock
• Stopping memory accesses to retrieve CPU code
• Obviating interrupt overhead
• Allowing system to go into deeper sleep, saving more energy
Attainable sleep mode depends on actual peripheral(s) used
7. CONCLUSIONS
CONCLUSIONS
• Ultra Low Power is a system issue. It is also multi-disciplinary (hardware / software)
• Many power reduction mechanisms exist at different system levels
• There is a stepwise software approach to attaining Ultra-Low Power / energy consumption.
• It requires thorough understanding of both application and hardware.
• It involves making tradeoffs.
• For best results
  - Understand available hardware, software and mechanisms
  - Use them wisely
WANT TO KNOW MORE? Email me:
[email protected]
Tomorrow afternoon, March 1st 2018, there is a 3-hour workshop (Class 13) with more detail.
Follow my 2-day hands-on workshop: http://www.hightechinstitute.nl/en/training/software/ultra_low_power_for_internet_of_things/
It is possible to run the workshop on-site
Altran offers various services related to ULP
Some icons by Freepik from www.flaticon.com
QUESTIONS?