Embedded Systems Conference - ESC Brazil 2011 (UBM), Sao Paulo, May 24, 2011
INTRODUCTION TO LOW POWER RF WITH ARM CORTEX-M0
Dirceu Rodrigues Jr.
I. INTRODUCTION
Given the increasing complexity of applications, embedded systems designers tend to replace 8/16-bit processors with 32-bit ones. Innovative architectures and advances in the manufacturing process have resulted in devices with high performance, greater memory capacity, low cost and good energy efficiency. The Cortex-M0 (CM0) is the smallest 32-bit ARM processor ever conceived and the one with the lowest power consumption. The core reaches a performance comparable to the ARM7TDMI with only one-third of the size and power requirements. These features make it ideal for applications requiring mobility and low cost. The main objective of this work is to show how to leverage its features in battery-operated applications that require connectivity through radio frequency (RF), particularly in the medical, sports and healthcare fields.
II. POWER DISSIPATION AND LOW CONSUMPTION
The demand for low-consumption products was driven by the invention of the bipolar transistor in the late 1940s and especially by the advent of battery-operated radio receivers. Modern digital integrated circuits, including microcontrollers, are manufactured in CMOS (Complementary Metal-Oxide Semiconductor) technology, mainly because of its very low power dissipation and high integration density. The basic building block in this technology is the CMOS inverter, consisting of two MOSFETs (P-channel and N-channel) connected as shown in Figure 1. Different logic gates can be derived by replacing Qp and Qn with an appropriate combination of MOSFETs connected in series or in parallel.
Figure 1. The dynamic power associated with an ideal CMOS inverter.
Here, the most relevant power dissipation is known as dynamic power and is associated with the switched-mode behavior. It depends on the supply voltage, the operating frequency and the total capacitance present in the arrangement, including those of the transistors, the interconnections and the input section of the circuit to which the output of the inverter is connected. Note the quadratic dependency on the supply voltage, which causes, for example, a 70% reduction in dynamic power when a 3.3 V supply is replaced by 1.8 V. In fact:
$$\frac{(1.8)^2 - (3.3)^2}{(3.3)^2} \cong -70.25\%$$
The SIA (Semiconductor Industry Association) predicts that, in the next few years, the use of 0.7 V or 0.5 V supplies will become widespread. Even with such low values, there are still two ways to reduce consumption (or a combination of both):
• In continuous operation, we can reduce the operating frequency to the lowest possible value.
• In periodic operation, we can "turn off" the circuit or put it in standby (sleep mode) at specific moments. In this mode only the static power dissipation counts. Since leakage current is proportional to the number of gates, any economy in logic will have great impact.
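The voltage and frequency dependence discussed above can be checked numerically. The sketch below assumes the usual ideal-inverter relation P = C·V²·f for dynamic power; the function names and the load values (10 pF, 12 MHz) are illustrative assumptions, not figures from this paper.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Dynamic power of an ideal CMOS stage, assuming P = C * V^2 * f (watts)."""
    return c_farads * v_volts ** 2 * f_hertz


def relative_change(new, old):
    """Relative change from old to new as a fraction (-0.5 means -50 %)."""
    return (new - old) / old


# Hypothetical switched load and clock, chosen only for illustration.
C_LOAD = 10e-12   # 10 pF
F_CLK = 12e6      # 12 MHz

p_33 = dynamic_power(C_LOAD, 3.3, F_CLK)
p_18 = dynamic_power(C_LOAD, 1.8, F_CLK)
p_half_clock = dynamic_power(C_LOAD, 3.3, F_CLK / 2)

# Supply scaling acts quadratically, clock scaling only linearly.
print(f"3.3 V -> 1.8 V : {100 * relative_change(p_18, p_33):.2f} %")
print(f"12 MHz -> 6 MHz: {100 * relative_change(p_half_clock, p_33):.2f} %")
```

The capacitance and the clock cancel out of the voltage comparison, which is why the 70.25% figure holds for any load.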
III. CORTEX-M0 PROCESSOR
The Cortex-M0 RISC processor, referred to here simply as CM0, was introduced in 2009 and implements the ARMv6-M architecture. It is based on the Von Neumann architecture and has a three-stage pipeline. Like the other ARM Cortex-M processors, it can be classified as a LOAD/STORE architecture. This means that, in general, all operations are performed using a set of 32-bit registers (Figure 2) and the memory is used only as source and destination for data. It is also possible to write the initialization code directly in C. The processor includes a 24-bit timer (SysTick), particularly useful in multitasking/RTOS systems. Debugging is done through the SWD protocol, alongside standard JTAG. A set of standardized definitions and functions, called CMSIS (Cortex Microcontroller Software Interface Standard), gives access to the core resources from C.
Figure 2. Simplified block diagram and register set.
The 4 GB address space is accessed linearly, without any added mechanism such as paging. The byte ordering can be either little-endian or big-endian, but it is set by the manufacturer and cannot be changed by the user. Current implementations use the little-endian format. The CM0 core consists of approximately 12000 logic gates in its minimum configuration, with a power consumption of 85 µW/MHz (180 nm process) in active mode [1]. It supports low-power modes (sleep modes), in which the processor dissipates minimal power while waiting for an event, typically an interrupt, to return to active mode. The Cortex-M0 has a lower average current consumption than 8/16-bit microcontrollers running similar interrupt routines, i.e. for the same task, the duty cycle associated with the CM0 will be lower (Figure 3).
Figure 3. Power consumption comparison when running the same ISR.
The impact on energy efficiency will be justified when taking into account the following features:
IV. INSTRUCTION SET
The size, in bytes, that an instruction occupies in program memory depends heavily on the architecture. Some 8-bit processors have 16 or 24-bit instructions. In a 16-bit processor such as the MSP430, instructions range from 16 to 48 bits in length. The Cortex-M0 supports 56 instructions: fifty 16-bit Thumb instructions and a few 32-bit instructions from the Thumb-2 set. An important advantage of supporting only Thumb instructions, rather than the classic ARM instructions as well, is that only one instruction decoder is required, resulting in lower consumption. Most instructions normally used by a program are 16 bits long, with the exception of the branch instruction BL (Branch and Link). Table 1 also highlights the instructions WFE and WFI, both relevant when using low-power modes.
Table 1. Instructions by size.
Also, the higher code density obtained with Thumb instructions has an impact on power consumption, since a smaller Flash memory is required. Table 2 briefly shows the instruction timing:
Instruction                                          Clock cycles
Data processing (addition, shifting, logic, etc.)    1
Data transfer (LOAD/STORE)                           2
Jumps (when performed)                               3, or 4 (BL)
Table 2. Instruction timing.
Thumb instructions, even though they are 16 bits long, work with 32-bit registers and memory locations. Many microcontrollers have 10, 12 or 16-bit AD converters. The Cortex-M0 is able to manipulate the data from these peripherals (or from a 32-bit timer) in less time and using fewer instructions than 8/16-bit processors. The CM0 delivers 0.9 DMIPS/MHz (Dhrystone 2.1). For comparison, the original 8051 and the 80486DX2 deliver 0.0094 DMIPS/MHz and 0.81 DMIPS/MHz, respectively. As already mentioned, a higher speed can lead to a reduction in the average consumption when using the RUN/SLEEP technique; that is, the processor can perform the task more quickly and return to low-power mode sooner. Alternatively, the task can be run in a similar time but at a lower clock frequency, also with a reduction in consumption.
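The RUN/SLEEP argument can be made concrete with a time-weighted average of the active and sleep currents. The following is a minimal sketch; the function name and all current and timing values are hypothetical, chosen only to show how a shorter active time can outweigh a higher active current.

```python
def average_current(i_active, t_active, i_sleep, period):
    """Average current (A) over one period of a RUN/SLEEP duty cycle."""
    if not 0 <= t_active <= period:
        raise ValueError("t_active must lie within the period")
    t_sleep = period - t_active
    return (i_active * t_active + i_sleep * t_sleep) / period


# Hypothetical scenario: the same task is repeated every 100 ms.
PERIOD = 0.100    # s
I_SLEEP = 2e-6    # A, assumed identical for both cores

# Faster core: higher active current, but only 1 ms of activity.
fast = average_current(3e-3, 1e-3, I_SLEEP, PERIOD)
# Slower core: lower active current, but 10 ms of activity.
slow = average_current(2e-3, 10e-3, I_SLEEP, PERIOD)

print(f"fast core: {fast * 1e6:.1f} uA average")
print(f"slow core: {slow * 1e6:.1f} uA average")
```

With these assumed numbers the faster core wins despite drawing more current while active, because the duty cycle dominates the average.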
V. MULTIPLICATION – 32 BIT
The present work deals with remote sensors, where mathematical operations, signal conditioning and filtering are common tasks. Let us describe in more detail the multiplication operation available in the CM0. The MUL instruction, or MULS in UAL (Unified Assembler Language) syntax, multiplies two 32-bit values in one clock cycle, avoiding multi-byte operations or the need for dedicated peripherals such as a hardware multiplier. This instruction allows calculations to be performed in a shorter time, thus reducing consumption in active mode. The support for the multiplication instruction is one of the main justifications for stating that the CM0 is an ideal substitute for 8 and 16-bit processors. In general, an n x n bit multiplication leads to a 2n-bit product, so 64 bits would be required to represent the result; MULS, however, provides only the 32 least significant bits of the product. It is important to note that the result of MULS does not depend on whether the multiplication is signed or unsigned, since the 32 least significant bits are identical in both cases. Figure 4 shows how an application with 16-bit variables can benefit from this instruction. The values are represented in Q15 fixed-point format with sign extension to 32 bits. So, the result can be obtained with 32-bit resolution or smaller.
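Both properties of MULS described above can be modeled on a host machine. This is a sketch in Python rather than Cortex-M0 code: `muls32` is a hypothetical helper that keeps only the 32 least significant bits of a product, and `q15_mul` is one possible way to use it for sign-extended Q15 operands.

```python
MASK32 = 0xFFFFFFFF


def muls32(a, b):
    """Model of MULS: keep only the 32 least significant bits of a * b."""
    return (a * b) & MASK32


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def float_to_q15(x):
    """Convert a float in [-1, 1) to a Q15 integer, saturating at the limits."""
    return max(-32768, min(32767, int(round(x * 32768))))


def q15_mul(a_q15, b_q15):
    """Multiply two Q15 values held sign-extended in 32-bit registers.

    The 32-bit product is in Q30; shifting right by 15 returns to Q15.
    Note: (-1.0) * (-1.0) yields 32768, which needs saturation in real code.
    """
    prod_q30 = to_signed32(muls32(a_q15 & MASK32, b_q15 & MASK32))
    return prod_q30 >> 15


# The low 32 bits are the same for the signed and unsigned interpretations:
# 0xFFFFFFFD is -3 when signed, 4294967293 when unsigned.
same_bits = muls32(0xFFFFFFFD, 7) == (-3 * 7) & MASK32

result = q15_mul(float_to_q15(-0.5), float_to_q15(0.25))
print(same_bits, result, result / 32768)
```

Because a Q15 by Q15 product never exceeds 2^30 in magnitude, the truncated 32-bit result is exact for this use.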
Figure 4. An example of signed 16x16 product.
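A 64-bit product requires a few multiplications, sums and shifts. Let $A_H$, $A_L$ and $B_H$, $B_L$ represent the upper and lower half-words (16 bits) of the 32-bit operands $A$ and $B$. The unsigned product is obtained in the following manner (Figure 5):
$$A \times B = (A_H 2^{16} + A_L)(B_H 2^{16} + B_L) = A_H B_H 2^{32} + (A_H B_L + A_L B_H) 2^{16} + A_L B_L$$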
Figure 5. Doing a 32 x 32 bit multiplication.
Even with these additional operations, the CM0 provides the result in a shorter time than a typical 8/16-bit processor.
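The half-word decomposition can be sketched as follows. This is a host-side model, not the author's routine: `muls32` is a hypothetical stand-in for the 32-bit truncating multiply, and Python's unbounded integers hide the carry handling that a real two-register implementation must perform when summing the partial products.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def muls32(a, b):
    """Model of MULS: keep only the 32 least significant bits of a * b."""
    return (a * b) & MASK32


def umul32x32_64(a, b):
    """Unsigned 32 x 32 -> 64 bit product built from four 16 x 16 products.

    Each 16 x 16 product is at most (2^16 - 1)^2 < 2^32, so the truncating
    32-bit multiply returns it exactly.
    """
    a_hi, a_lo = (a >> 16) & MASK16, a & MASK16
    b_hi, b_lo = (b >> 16) & MASK16, b & MASK16

    lo_lo = muls32(a_lo, b_lo)
    lo_hi = muls32(a_lo, b_hi)
    hi_lo = muls32(a_hi, b_lo)
    hi_hi = muls32(a_hi, b_hi)

    # Weighted sum of the partial products: 2^32, 2^16 and 2^0 terms.
    return (hi_hi << 32) + ((lo_hi + hi_lo) << 16) + lo_lo


product = umul32x32_64(0x12345678, 0x9ABCDEF0)
high_word, low_word = product >> 32, product & MASK32
print(f"{high_word:08X} {low_word:08X}")
```

The low word matches what a single MULS would return for the same operands, so only the high word costs the extra multiplications.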
VI. INTERRUPT SYSTEM
The NVIC module (Nested Vectored Interrupt Controller) handles up to 32 interrupt sources plus the NMI. There are four programmable priority levels. The interrupt (exception) handler is entered automatically; there is no need to determine the vector in software. If an interrupt is not enabled, or if the processor is already running an interrupt of equal or higher priority, the request remains pending. A pending request is serviced when the priority allows it. If, during an interrupt, another one with higher priority occurs, the current one is suspended. This is called nesting. All interrupts, except the NMI and the hard fault, can be blocked by setting the PRIMASK register to 1. The interrupt latency is defined as the time (measured in clock cycles) between the occurrence of the event and the execution of the first instruction of the ISR. In the ARM7TDMI the latency can vary from 24 to 42 cycles. The worst case for the 68000 processor is 108 cycles, and on x86 it can reach hundreds of cycles. For the CM0, the latency is around 16 cycles, assuming no memory wait states. Most instructions execute in one cycle, including the multiplication. On other processors, instructions that require several cycles to run, mainly multiplication, division and shifting, can raise the latency, because an interrupt may have to wait for one of these long instructions to finish.
VII. LOW POWER MODES
The main goal of the low-power modes is to reduce the average current drawn by the processor. This makes it possible to:
• Extend battery life.
• Reduce interference in sensors and analog applications.
• Reduce interference in wireless applications.
• Simplify the power supply (specifications, size, cost and so on).
To take advantage of these benefits, the Cortex-M architecture includes support for two low-power modes: Sleep and Deep-Sleep. The exact behavior of these modes depends on the implementation. For example, in Sleep the core clock can be removed and the operating frequency/voltage of some processor blocks can be reduced. The Deep-Sleep mode can take the concept of low consumption further, removing all clock signals and shutting down the power of certain sections, such as the PLL and the Flash memory. Through the SLEEPDEEP bit in the SCR (System Control Register), the designer can select the desired mode (Figure 6) [2].
Figure 6. SCR register, highlighting the SLEEPDEEP bit.
The processor can be placed in a low-power mode (dictated by the SLEEPDEEP bit), through the following instructions:
WFE (Wait for Event): The processor returns to active mode, or "wakes up", on the occurrence of an event. The following are considered events:
• An interrupt with sufficient priority for execution.
• An external signal (implementation-dependent).
• The execution of the SEV (Send Event) instruction.
• A request from the debugger.
Also, if the SEVONPEND bit is set, any pending request, regardless of priority, becomes an event able to "wake up" the processor. If an event has already occurred when WFE is called, the processor does not enter sleep mode and continues execution from the instruction following WFE. Not all manufacturers implement this instruction. CMSIS library syntax: __WFE()
WFI (Wait for Interrupt): The processor enters low-power mode unconditionally. When an interrupt with sufficient priority to run occurs, the processor returns to active mode; however, the ISR will only be executed if PRIMASK is equal to 0 (interrupts enabled). Also, the nested interrupt system allows WFI to be called from an ISR. CMSIS library syntax: __WFI()
The Sleep-On-Exit feature allows the designer to further reduce the time in active mode and, consequently, the power consumption. In this case, when an interrupt finishes and no other is pending, the processor immediately returns to sleep mode. The processor also does not waste time stacking and unstacking registers between interrupts. This feature is enabled by setting the SLEEPONEXIT bit in the SCR register. The operation of the WFI instruction and of the Sleep-On-Exit feature is illustrated in Figure 7.
Figure 7. WFI instruction (a) and Sleep-On-Exit feature (b).
The use of Sleep-On-Exit is recommended for applications entirely based on interrupts. The processor can then spend most of the time "sleeping", as exemplified in Figure 8 (S and U mean stacking and unstacking). The blocks after the ISRs, labeled Active Mode, represent the instructions following WFI.
Figure 8. Benefits of Sleep-On-Exit.
Note that, when PRIMASK is equal to 1, the occurrence of an interrupt can "wake up" the processor without executing the ISR. This allows some initialization tasks to be performed before entering the ISR. In Cortex-M processors, the presence of the optional WIC (Wakeup Interrupt Controller) module reduces the power consumption even further. The WIC contains interrupt detection logic and plays its role when the processor is placed in Deep-Sleep, allowing the processor core and the NVIC to be shut down. This decreases the static power (leakage), since the processor state is saved in a set of flip-flops. When an interrupt is detected, the WIC reports it to the PMU (Power Management Unit), which turns the system back on so that the NVIC and the processor core can continue the interrupt processing.
When the processor goes into WIC Deep-Sleep mode, the internal SysTick timer stops. So, if an OS uses SysTick to run a scheduler (for example), a separate timer with interrupt capability should be started to "wake up" the processor at periodic intervals.
VIII. LOW POWER RF
Here we are interested in very low power systems (Low Power RF), aimed especially at sensor monitoring, telemetry and battery-operated devices. The ISM (Industrial, Scientific and Medical) frequency bands are defined by international organizations such as the ITU (International Telecommunication Union). They are intended for the operation of low-power (and short-range) systems that do not require a license. An important consequence is that these systems must tolerate interference from each other. In Europe, transmissions approved by ETSI (European Telecommunications Standards Institute) can occur in the 868 MHz range. In Brazil, the ANATEL agency, through Resolution 506 of 2008, regulates the ISM bands and describes such systems as restricted-radiation radiocommunication equipment [3]. The 315 MHz band is not listed as an ISM band. On the other hand, transmissions in the 433 MHz band are allowed with radiated power limited to a maximum of 10 mW (e.i.r.p.). Table 3 lists the ISM bands, highlighting the bands belonging to the most popular standards.
Table 3. ISM bands in Brazil.
Note that the popularity of the 2.4 GHz band is related to its high bandwidth, but the band is congested. The following is a summary of some concepts that are relevant to our work.
IX. ANTENNA GAIN
A hypothetical isotropic antenna is a point source that radiates equally in all directions. A real antenna radiates more energy in some directions than in others. The gain is the amount of energy radiated in one direction compared with the energy radiated by an isotropic antenna in the same direction and with the same transmission power. Usually we are interested in the maximum gain, i.e. in the direction in which the antenna radiates most of its energy. The gain can be expressed in dBi ("i" stands for isotropic). Figure 9 shows the outline of an antenna radiation pattern, highlighting the gain, which should not be confused with signal amplification (something an antenna cannot provide).
Figure 9. Antenna gain in a given direction.
Thus, an antenna with a gain of 5 (power or energy ratio) shows a 7 dBi gain ($10 \log_{10} 5$). When radio waves pass through obstacles, the signal weakens. The attenuation increases with frequency. So, for the same transmission power, lower-frequency signals reach greater distances. The simplified Friis equation provides, under idealized conditions, the power available at a receiving antenna:
$$P_{RX} = P_{TX} G_{TX} G_{RX} \left( \frac{\lambda}{4 \pi d} \right)^2$$
Where: $P_{TX}$ is the transmitted power, $G_{TX}$ and $G_{RX}$ are the transmitting and receiving antenna gains (not in decibels), $\lambda$ is the wavelength and $d$ is the distance between the antennas. The product $P_{TX} G_{TX}$ is called the effective isotropic radiated power (e.i.r.p.). The inverse of the squared ratio in brackets is called path loss (propagation loss in free space) and is valid for the far field. In dB:
$$L = 20 \log_{10} \left( \frac{4 \pi d}{\lambda} \right)$$
So, for the same distance $d$, the difference in loss between the 2.4 GHz and 900 MHz frequencies is:
$$\Delta L = 20 \log_{10} \left( \frac{2400}{900} \right) \cong 8.5 \text{ dB}$$
To experience the same loss as a 2.4 GHz link, a 900 MHz receiver can be placed at a distance 2.67 times greater ($20 \log_{10} 2.67 \cong 8.5$).
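The free-space relations above are easy to evaluate numerically. The sketch below implements the simplified Friis equation and the path loss in dB; the function names and the 10 m test distance are illustrative assumptions, and the model ignores obstacles, reflections and antenna mismatch.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20 * log10(4 * pi * d / wavelength)."""
    wavelength = C_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


def friis_received_power_w(p_tx_w, g_tx, g_rx, distance_m, freq_hz):
    """Received power in watts from the simplified Friis equation.

    Gains are plain power ratios, not decibels.
    """
    wavelength = C_LIGHT / freq_hz
    return p_tx_w * g_tx * g_rx * (wavelength / (4.0 * math.pi * distance_m)) ** 2


D = 10.0  # m, arbitrary test distance
loss_2g4 = free_space_path_loss_db(D, 2.4e9)
loss_900 = free_space_path_loss_db(D, 900e6)
delta_db = loss_2g4 - loss_900

# Distance at which a 900 MHz link sees the same loss as 2.4 GHz at D.
d_equal_loss = D * 2400.0 / 900.0

print(f"loss difference: {delta_db:.2f} dB")
print(f"equal-loss distance at 900 MHz: {d_equal_loss:.2f} m")
```

The loss difference does not depend on the chosen distance, since the distance cancels in the ratio of the two wavelengths.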
X. RF INTEGRATED CIRCUIT
Manufacturers currently offer all the elements of an ISM radio in the form of a single integrated circuit or SoC (System on Chip). Table 4 lists the main features of the nRF24L01 [4]. Configuration (register set) and data transfer take place over the SPI protocol. Figure 10 shows the other signals. For example, the IRQ signal interrupts the microcontroller, indicating that the data written to the TX buffer has been transmitted or that data is available to be read from the RX buffer (when the addresses match and the CRC is valid).
Figure 10. Typical MCU-Radio connection.
Parameter                              Type/Value
Modulation                             GFSK
Data rate (RF)                         1 or 2 Mbps
Channels                               125
Address length                         Up to 40 bits
CRC length                             Up to 2 bytes
Supply voltage                         1.9 to 3.6 V
Supply current in Standby-I mode       32 µA
Supply current in Standby-II mode      320 µA
Supply current in Power Down           900 nA
Output power                           Up to 0 dBm
Current in TX (0 dBm)                  11.3 mA
Current in TX (-18 dBm)                7 mA
Current in RX (2 Mbps)                 12.3 mA
Current in RX (1 Mbps)                 11.8 mA
Sensitivity at 0.1% BER (2 Mbps)       -82 dBm
Sensitivity at 0.1% BER (1 Mbps)       -85 dBm
Table 4. Main data from the nRF24L01 transceiver.
The powers are given in dBm:
$$P(\text{dBm}) = 10 \log_{10} \left( \frac{P(\text{mW})}{1 \text{ mW}} \right)$$
The sensitivity represents the lowest received signal power for a given BER (Bit Error Rate). The BER reports the rate at which decoding errors occur (e.g. 1 in 100000). A more negative value in dBm corresponds to a more sensitive receiver. The receiver becomes less sensitive as the data rate increases. If N is the packet size in bits, the ideal PER (Packet Error Rate) is:
$$PER = 1 - (1 - BER)^{N}$$
Figure 11 shows the packet fields:
Figure 11. nRF24L01 RF packet.
Preamble: Bit pattern required for synchronization between transceivers. Automatically included in TX and removed in RX.
Address: Target identification. Automatically removed in RX.
Flags: Packet ID (PID). Generated and incremented with each new transmission. In RX, if the PID and CRC are equal to those of the last packet, then the packet was resent.
Payload: Application data.
CRC: Automatically included in TX and removed in RX. The IRQ pin only signals the MCU if the addresses match and the CRC is valid.
This transceiver enables the designer to implement a simple form of Frequency Hopping, the so-called Frequency Agility. This method makes it possible to operate in environments with noise or interference from other transmissions. The frequency (channel) is changed only if the packets are not acknowledged by the destination or if the receiver does not detect the synchronization packet (beacon) from the transmitter. Both then look for a valid channel (scan), which is a software procedure. Unlike Bluetooth, the channel change is not continuous. An example of a RUN/SLEEP program that takes advantage of the low-power modes in an RF application has the structure shown in Figure 12. It is entirely based on interrupts, with the WFI instruction being issued only at the beginning.
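The dBm definition and the ideal PER relation given earlier in this section can be evaluated with a few lines. In this sketch the function names and the 256-bit packet length are assumptions made for illustration; only the -85 dBm sensitivity and the 0.1% BER come from Table 4.

```python
import math


def mw_to_dbm(p_mw):
    """Convert a power in milliwatts to dBm (referenced to 1 mW)."""
    return 10.0 * math.log10(p_mw / 1.0)


def dbm_to_mw(p_dbm):
    """Convert a power in dBm back to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def packet_error_rate(ber, n_bits):
    """Ideal PER for independent bit errors: 1 - (1 - BER)^N."""
    return 1.0 - (1.0 - ber) ** n_bits


# Sensitivity at 1 Mbps from Table 4, expressed in milliwatts.
sensitivity_mw = dbm_to_mw(-85.0)

# Hypothetical 256-bit packet received right at the 0.1 % BER limit.
per_at_limit = packet_error_rate(1e-3, 256)

print(f"-85 dBm = {sensitivity_mw:.3e} mW")
print(f"PER for 256 bits at 0.1 % BER: {100 * per_at_limit:.1f} %")
```

The PER grows quickly with packet length, which is one reason short payloads are attractive when operating close to the sensitivity limit.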
Figure 12. Example of RF application.
If the currents in these modes are unsuitable for a given application, or if the radio does not have a Power Down mode, external circuits can be used, as suggested in Figure 13. The same approach can also be applied to other components such as ADCs, amplifiers, etc. The resistor between gate and source determines the MOSFET turn-off time. Using Power Down has the disadvantage that the radio must be reconfigured when returning from this mode.
Figure 13. Auxiliary circuits for Power Down.
XI. LOW CONSUMPTION ESTIMATES
From the application (Cortex-M0, radio, etc.), it is possible to estimate some relevant values involving battery power supply: (1) As the charge (in mAh) is drained from the battery, its internal resistance increases. Figure 14 illustrates this fact for a typical 3 V CR2032 battery [5]. Consider the :;