Hardware/Software Co-design for A Wireless Sensor Network Platform Chih-Ming Hsieh, Farzad Samie, M. Sammer Srouji, Manyi Wang, Zhonglei Wang and Jörg Henkel Chair for Embedded Systems (CES) Karlsruhe Institute of Technology (KIT), Germany
{hsieh, samie, srouji, manyi.wang, zhonglei.wang, henkel}@kit.edu ABSTRACT Wireless sensor networks have become shared resources providing sensing services to monitor ambient environment. The tasks performed by the sensor nodes and the network structure are becoming more and more complex so that they cannot be handled efficiently by traditional sensor nodes any more. The traditional sensor node architecture, which has software implementation running on a fixed hardware design, is no longer fit to the changing requirements when new applications with complex computation are added to this shared infrastructure due to several reasons. First, the operation behavior changes because of the application requirements and the environmental conditions which makes a fixed architecture not efficient all the time. Second, to collaborate with other already deployed sensor networks and to maintain an efficient network structure, the sensor nodes require flexible communication capabilities. Furthermore, the information required to determine an efficient hardware/software co-design under the system constraints cannot be known a priori. Therefore a platform which can adapt to run-time situations will play an important role in wireless sensor networks. In this paper, we present a hardware/software codesign framework for a wireless sensor platform, which can adaptively change its hardware/software configuration to accelerate complex operations and provides a flexible communication mechanism to deal with complex network structures. We perform real-world measurements on our prototype to analyze its capabilities. In addition, our case studies with prototype implementation and network simulations show the energy savings of the sensor network application by using the proposed design with run-time adaptivity.
Keywords reconfiguration, sensor networks, hardware accelerator, FPGA, low power, multi-radio
1. INTRODUCTION A Wireless Sensor Network (WSN) basically consists of a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from
[email protected]. ESWEEK ’14, October 12 - 17 2014, New Delhi, India Copyright 2014 ACM 978-1-4503-3051-0/14/10 ...$15.00. http://dx.doi.org/10.1145/2656075.2656086.
group of energy-limited, multi-functional sensors that are distributed in a geographical area to collect information. A WSN can be used in a large number of various applications, such as agricultural, structural monitoring, and different industrial and domestic applications [23]. Developing and deploying such kind of large scale systems using resource constrained sensor nodes involves considerable efforts. Therefore, to reduce the development and deployment cost, it is a common strategy to share the deployed WSNs as an infrastructure to support multiple applications [3, 8, 14] which can be added into the system at run-time and be assigned to certain nodes. While WSNs are interconnected and supporting an increasing number of applications, they are performing more and more complex tasks. For example, some advanced computations for image/audio processing applications including 2-D convolution, Finite Impulse Response(FIR) filter, Fast Fourier Transform (FFT), etc. are widely used in wireless multimedia sensor networks (WMSNs) [4]. Therefore, to meet the requirements introduced by these new applications, a new sensor platform architecture, which is able to perform complex tasks efficiently and adapt to the change of the applications as well as the environment, is necessary. However, designing a sensor node which is able to adapt to the run-time conditions and provides high performance and energy efficient operations accordingly is a very challenging task. Existing work failed to solve this issue due to the following reasons: • Performance constraints: The traditional sensor nodes, which have fixed hardware configuration [1, 12], suffer from performance issues because their softwareimplemented features often take very long time to execute the tasks and cannot exploit efficient duty cycling. • Energy consumption: The existing SRAM-based reconfigurable WSN architectures [20, 21] consume considerable energy in sleep mode and during the transition between active and sleep mode which is not acceptable in WSNs. • Hardware flexibility: The architecture with separated microcontroller and ultra-low power FPGA [19] incurs inflexibility and communication bottleneck because the fixed inter-chip communication is not adaptable and has very limited speed. • Legacy issues: Shared WSNs are becoming an essential part of the Internet of Things (IOT) where all the embedded devices can be accessed from the Internet and provide sensing services [17]. The existing WSNs, however, may not be able to collaborate even when they are geographically close to each other if they use
different wireless communication schemes, such as radio frequencies and modulations. Since these networks should operate for years and provide services anywhere pervasively with the help of energy harvesting devices [6], an inefficient design will lead to significant system lifetime reduction. In this paper, we propose a run-time adaptive WSN platform to address the above issues. Under the resource constraints of the adaptive sensor node, the node can adjust its hardware/software configuration and communication scheme to enhance the performance, improve the energy efficiency, and increase the communication reliability and throughput according to the run-time situation. The key contributions of this work include: 1. We propose HARP, a Highly Adaptive Reconfigurable Platform, with a hardware/software co-design framework to address the key requirements of a run-time adaptive system for WSNs, such as hardware acceleration, low power consumption, run-time reconfiguration and multiple communication schemes. 2. We observe that the run-time adaptivity can provide performance enhancement as well as energy efficiency in WSNs. In fact, depending on the changing requirements of newly added applications, the frequency of event occurrence, or the wireless channel condition, the efficiency of a sensor node can vary significantly. We employ a run-time adaptive scheme on each node which monitors the run-time situation and changes hardware/software configuration dynamically to adapt to the changing situation. 3. We implement the complete hardware and software stack of HARP and analyze the features of HARP with real-world measurement that provides the guidelines for the further investigation of adaptive sensor nodes. To evaluate HARP in a real scenario, we choose two application scenarios based on the distributed acoustic analysis [9, 22] as case studies. We evaluate an acoustic event recognition system in our office on our HARP prototype to demonstrate that the run-time configuration management reduces the average current consumption by 31.27% on average. Then we extend the experiment to an acoustic object tracking application with network simulations to demonstrate that the adaptivity on configuration and communication increases the network lifetime and throughput up to 101% and 104%, respectively. The rest of the paper is structured as follows. In Section 2 an overview of the related work is presented. The problem statement is given in Section 3. In Section 4, an overview of proposed design is described before showing the detail implementation in Section 5. The performance analysis of the platform characteristics and two case studies are presented in Section 6 and 7, respectively. Finally, the paper is concluded in Section 8.
2.
RELATED WORK
Traditional wireless sensor nodes, a.k.a. “motes”, are developed to execute simple monitoring tasks. MICAz [1], for instance, is equipped with a 8-bit Atmel ATmega128L microcontroller to monitor the temperature, humidity, etc. These low-end microcontrollers have been proven to be inefficient on the computation intensive tasks such as 2-D convolution and consume 2.6 times higher energy than 32-bit microcontrollers [7]. More advanced platforms are introduced
recently to cope with more complex operations using ARM Cortex-M3 microcontroller [12]. However, those motes are not efficient or even cannot meet the timing constraints on some critical operations such as acoustic and image processing. This motivates us to investigate a more efficient platform in this work. A modular FPGA-based sensor platform, Cookie, has been presented in [20] which includes four layers (processing, communication, power supply and sensing). The processing layer includes a microcontroller and a Spartan-3 FPGA. An enhanced version with Spartan-6, HiReCookie, was proposed in [21] to meet the requirements of complex processing. The SRAM-based FPGAs they used on the platform have high power consumption and standby/wake-up transition energy. Another reconfigurable sensor platform, called HaLOEWEn, has been introduced in [19]. The platform consists of a System-on-Chip (SoC) and an ultra-low-power IGLOO FPGA. Since the communication interfaces between the microcontroller, the FPGA, and the radio module are not reconfigurable in these platforms, the hardware design needs to be modified to support different applications. These studies focus on performance enhancement on single application and do not discuss the run-time adaptation and its challenges in real world WSNs. In [13], a multi-radio WSN platform has been proposed. It employs four parallel radio transceivers to increase the data transfer rate and decrease communication latency by sending/receiving data simultaneously. Having multiple independent radios, the node can act as a gateway in order to interconnect different WSNs. Although the sensor node is based on an FPGA, they do not support run-time reconfiguration. Since our proposed sensor platform makes use of microcontroller along with reconfigurable FPGA, it supports run-time reconfiguration. The variety of the communicating mechanism (frequency bands, modulation schemes, etc.) makes our platform flexible and reliable for different network structures and channel conditions. In addition, none of the aforementioned studies considers the run-time adaptation on both computation and communication mechanisms as a means of achieving high performance and energy efficiency. In particular, there is no existing work which has an overall consideration of an efficient run-time adaptive sensor node based on a practical platform and applications. As we foresee that an adaptive platform can improve the efficiency for the WSNs which will last for decades, we propose a highly adaptive system and investigate its impact.
3. 3.1
PROBLEM STATEMENT AND DESIGN PRINCIPLES Challenges on Efficient Sensor Nodes
In WSNs, the applications are deployed to perform advanced sensing (e.g. image/video, audio, etc.) and information processing tasks (en-/decoding, recognition, etc.) to accomplish the mission. To prolong the network lifetime, the nodes apply a low duty-cycle policy to stay in low power mode when they are idle. To reduce the duty-cycle, hardware accelerators can be exploited to perform certain operations more quickly and efficiently than general purpose processors/microcontrollers. To demonstrate this effect, we perform an experiment with a real hardware implementation.
0.14
HW-FFT SW-FFT
0.12
C
Current (A)
0.1
S
0.08 0.06
S S
0.04
C
0.02 0 0
5
Time (s)
10
15
Figure 1: Current consumption of software and hardware FFT implementation recorded by Agilent DC Power Analyzer. Software implementation takes additional 0.5 second and 7.7mA (21%) on average. Figure 1 depicts the measured current consumption of the acoustic event detection with hardware and software FFT implementation. Every 5 seconds the node records 16k samples (2 seconds) from the microphone, performs FFT to analyze the data and sends the result using the radio transceiver. As we can see that with the clock gating technique the accelerator only draws higher current when it is operating (it lasts only several micro-seconds after the audio acquisition finishes). The software FFT requires additional 0.5 second to finish the task before entering low power mode. Because of this effect, the software implementation consumes 21% more current on average. Although hardware accelerators gain the benefits on energy efficiency, it is not possible to accommodate all the accelerators at the same time since the computation resource of an FPGA is limited that directly influences the cost of the node. Therefore several challenges need to be addressed: • A fixed hardware configuration cannot fit all situations when new applications are deployed at run-time, because new operations might be needed for the new applications. A node should decide which tasks must be performed on the hardware and the others must be done on software. • In real scenarios under the constraints of the resources (e.g. FPGA size, residual energy, on-board sensors or transceivers, etc.), the time at which the reconfiguration should take place can vary and might result in different levels of efficiency. • The behavior of a sensor node is not only determined by the schedule of applications, but also the events triggered by the sensors or the requests coming from the collaborating nodes. Therefore the efficiency of a configuration is not known at design time. To deal with these challenges, an adaptive platform should be able to monitor the factors which affect the efficiency of the node at run-time and provide the capabilities to decide a hardware/software combination in order to improve the efficiency.
3.2 Motivational Scenario Figure 2 depicts a situation of unbalanced event occurrence for random deployed (e.g. airborne) nodes. In this scenario, an acoustic object tracking application [22] is deployed to the existing network which uses acoustic sensors to detect the object types. It is not always possible to load all the accelerators required by the application as it has to compete with other applications. However, due to the spatial cor-
C
Figure 2: Unbalanced event occurrence. The events triggering the operations depend on the location. The node should adapt its configuration to the events. relation of the object occurrence rate and the location, the tasks that should be performed by the nodes are different according to their locations. For instance, the nodes deployed near the road (node S) should perform FFT to determine the object types and send related information to the node (node C) which is away from the object and has capacity to perform the beamforming technique for location extraction. An adaptive sensor node can monitor the utilization and the efficiency of the tasks running on it and determine which accelerator should be loaded on the FPGA and which channel should be used to communicate.
3.3
Design Principles
Motivated by the challenges described in the previous section, we can summarize the design principles of an efficient adaptive hardware/software co-design for WSNs. • Hardware accelerations which can handle complex application within reasonable time and enter sleep mode to preserve energy. • Low power consumption in active and sleep mode is essential to prolong the lifetime using low-duty-cycle policy. • Run-time reconfiguration is required to adapt to the changing requirements subject to the newly deployed applications, role exchanging in network structure, unbalanced event occurrence, unbalanced energy harvesting rate, or other environmental factors. • Multiple communication schemes should be provided to communicate with existing sensor networks operating in different frequency band and to improve the communication efficiency. To achieve these goals, we propose a run-time adaptive sensor node, HARP, which will be described in details in the following sections.
4.
SYSTEM ARCHITECTURE OF HARP
In this section we give an overview of proposed HARP and describe how the design goals can be achieved. Figure 3 shows the HARP architecture which consists of hardware and software subsystems. Application software utilizes the platform through system service and drivers. We begin with the software components followed by the hardware architecture underneath.
4.1
Software Subsystem
The main components of the software subsystem have been
Figure 3: System overview of hardware/software subsystems depicted in Figure 3. They provide the interfaces between the application software and the hardware. The accelerators which can not be loaded on FPGA should perform their software counterpart on the driver (Acc. Driver). In addition, they have to monitor the environment and the node itself to adjust the configuration and operating modes for better efficiency.
4.1.1
Hardware Adaptation Layer
Hardware adaptation layer provides applications the accessibility to the hardware including peripherals and accelerators. The accelerator drivers implement the corresponding functions with the software while integrating the hardware accelerators with a wrapper. A set of unified programming interfaces is provided to the application software, so that the operations will be performed by the hardware accelerator automatically when it is loaded on the FPGA. In addition, the drivers have the access to the in-situ power meter to profile the energy usage of the functions. This information will be used by the Configuration Manager to determine an efficient configuration to be loaded on the FPGA.
4.1.2
Communication Manager
The condition of the wireless channels changes all the time. Therefore the platform should provide necessary information to the applications in order to select a suitable communication channel. The Communication Manager has the access to the radio drivers to obtain the quality of each link and keeps track of the error rate of the communication. The applications or the network protocols can determine which radio link can be used in a certain condition to improve the reliability and the throughput.
4.1.3
Power Manager
The Power Manager is integrated with a scheduler whose mission is to keep track of the tasks to be executed in the queue. When the queue is empty, the Power Manager can switch the node into corresponding low power state. In addition, if an accelerator is not required by the current operation, the Power Manager can apply clock gating mechanism to that accelerator to reduce the power.
4.1.4
Configuration Manager
The core component to provide run-time adaptivity is the Configuration Manager. In this section, we analyze the energy consumed on a sensor node under typical workloads of
WSNs and explain the design goal of a run-time adaptive system. Typical sense-store-send-sleep applications periodically acquire the samples from the on-board sensors, process them and store the required features, then deliver the results to the gateway for further processing. On the other hand, the event-triggered applications are activated when certain events trigger the on-board sensors or the requests coming from other nodes asking for collaboration. Therefore the number of executions for each function is actually the sum of both types of applications. We formulate the energy consumption as follows: • Let T denote the time period in which the sensor node provides the service. T consists of two parts, Tactive and Tsleep , in which the sensor node is in active and sleep state, respectively. In active state the node performs the tasks using the functions implemented on either software or hardware components and consumes various power (Pactive ) accordingly while in sleep state the node turns off all unnecessary components and draws constant power (Psleep ). • Let F = {f1 , f2 , · · · , fn } denote a set of functions implemented on the node. Each function requires time t and consumes power p to finish it task. Some of the functions, f 0 , can be accelerated by the hardware which requires time t0 , consumes power p0 andPoccupies area s0 on the FPGA under the constraint of s0 ≤ S, where S is the total size of the FPGA. • Assuming a sequential operation on the microcontroller, the energy consumed on the sensor node in time period T can be denoted by ET =
n X i=1
ki · Ti · Pi + T −
n X
ki · Ti · Psleep
(1)
i=1
where ki is the number of executions of fi and ( t 0 · p0 if f 0 is loaded on FPGA Ti · Pi = i i t i · pi otherwise • On a node which can adapt to the environment with run-time reconfiguration, for a time period T + 1 when a new configuration (FT +1 ) takes place, the energy consumption of the reconfiguration should be amortized by the energy saving of the new set of accelerated functions in order to gain the profit, which means ET +1 (FT +1 ) + Ereconf + Eδ ≤ ET +1 (FT )
(2)
where Ereconf is a constant energy consumption of the reconfiguration and Eδ is a threshold of the minimal energy saving above which we can trade higher efficiency of the next time window with the overhead introduced by the reconfiguration. As shown in Equation 2, the reconfiguration energy should be amortized by the energy saving of the new configuration to yield a profitable reconfiguration. Configuration Manager should collect the necessary information to infer a profitable configuration in the future. We should note that the event related parameter k in Equation 1 cannot be known a priori. Nonetheless, the event occurrence tends to have spatial and temporal correlations which might be able to be estimated through historical data [5]. We develop a heuristic strategy to determine a set of accelerators to be loaded on the FPGA based on the information collected by the drivers. This standalone strategy does not consider the workload which can
PC
where si is the size of the accelerator i. The goal of the Configuration Manager is to maximize the energy gain under the size constraint. If the Configuration Manager finds a new configuration which is more efficient than the current one and can amortize the reconfiguration energy, the reconfiguration will take place in the next time window. We highlight the strategy in Algorithm 1. Due to the sorting operation, the time complexity of this approximation is O(n log n) with a worst-case performance ratio equal to 12 , which is suitable for the sensor nodes with limited resources like the energy and the memory.
4.2
Hardware Subsystem
To provide efficient run-time adaptation, the hardware subsystem is built around a coprocessor-based computation unit. Figure 4 shows the hardware architecture of HARP, which is described in the following sections. 1
http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc324.htm
Sensor
ADC
ADC
UART
Memory APB AHB Bus
...
Flash
Accn
Flash*Freeze
Acc1
Bridge
Bus
Amplifier
Power Meter
I2C
SPI
Radio controller
IAP
Cortex-M3 processor
ADC
be mapped and scheduled among different nodes. The application assignment and configuration management involving multiple nodes is more complex and deserves further investigations [11]. The service time of a sensor node is divided into equally spanned time windows, Tw . At the beginning of each Tw , the Configuration Manager determines whether to perform the reconfiguration with a new set of accelerators. As shown in Equation 1, the energy gain, gi , of an accelerator i for each operation is ti · pi − t0i · p0i . Therefore the total energy gain for this time windows, Gi , can be calculated by ki · gi . To smooth the historical event occurrence out, we use exponential moving average (EMA)1 to calculate KT w = α × k + (1 − α) × KT w−1 , {α ∈ R|0 ≤ α ≤ 1} where α > 0.5 is chosen in favor of the recent observation. The size of the FPGA constrains the number of the accelerators to be included in the configuration, therefore we define the efficiency function Ki × gi εi = (3) si
Sensor
Control
Algorithm 1: Configuration Management Initialize: Initialize vector Accel[] with K,g,s collected by the drivers and calculate ε according to equation 3 Initialize: Initialize vector NewConfig[] with 0 Initialize: UsedArea ← 0 sort Accel[] based on its ε; for i = 1 to sizeof( Accel[]) do if UsedArea + Accel[i].s < MAX-FPGA-SIZE then NewConfig[i] ← 1 ; UsedArea ← UsedArea + Accel[i].s; else break; end end OldGain ← calculateEnergyGain(OldConfig[], Accel[]) ; NewGain ← calculateEnergyGain(NewConfig[], Accel[]) ; if NewGain > OldGain + CONF-ENERGY + MIN-GAIN then OldConfig[] ← NewConfig[]; perform reconfiguration ; end
SPI Reconfigurable Computation Unit
Radio Module
Radio Module
Vcc (Power Supply)
Figure 4: Hardware architecture of HARP sensor platform
4.2.1
Reconfigurable Computation Unit
The computation unit consists of a microcontroller and an FPGA. The microcontroller accommodates the software subsystem and provides access to other hardware components. Hardware accelerators are implemented on the FPGA, each of which has an input buffer and an output buffer. The buffers are accessible from microcontroller via a high-speed bus. The software code on microcontroller feeds the input data to the input buffer of hardware accelerator and receives the output results from the output buffer. Hardware accelerators can run parallel to each other. New hardware accelerators can be added to the FPGA after deployment by means of run-time reconfiguration. The FPGA is also in charge of the radio interface management. Radio modules are connected to the SPI cores provided by the FPGA. The controller sends/receives the data to/from the SPIs through the bus. Implementing the controller on the FPGA provides this possibility to exploit both the radio modules at the same time. It is also possible to preprocess the data received from the radio modules on the FPGA simultaneously, so that the microcontroller can be relieved from handling masses of data for some applications. The interconnections between major components such as the FPGA, the microcontroller, the interfaces and the peripherals are reconfigurable. Online reconfiguration provides this possibility to change the architecture based on new conditions, if needed. The system controller provides the interface for the Configuration Manager and the Power Manager to perform run-time reconfiguration and control the low-power mode of HARP, so that the adaptation can be achieved with energy efficiency.
4.2.2
2.4GHz Transceiver
Radio Control
Power Meter
Debugging Interface
4.2.3
Sensing Unit
Sensors are connected to the ADCs which is controlled by the computation unit. Depending on the application requirements, the samples can be acquired on demand, stored on the memory and analyzed either by the microcontroller or the FPGA.
5.
IMPLEMENTATION
In this section, we introduce the hardware and software implementation of HARP in details. Figure 5 shows the complete HARP platform.
5.1
Reconfigurable System-on-Chip (SoC)
Microsemi offers SmartFusion2 which is a low-power, flashbased customizable SoC [18]. This SoC solution integrates an FPGA, a full-feature microcontroller subsystem (MSS) and several analog functions on the same chip. SmartFusion2 has very low static power by providing low power modes for the FPGA, the microcontroller and the peripherals which make it suitable for low duty cycle applications. SmartFusion2 devices use Flash*Freeze (F*F) technology which provides an ultra-low power (< 1mW) static mode for FPGA fabric. In F*F mode, all the SRAM and register information are retained. The FPGA fabric can be dynamically reprogrammed using In-Application Programming (IAP). The embedded Cortex-M3 processor obtains new programming bitstream file and passes it to the system controller. Then, the system controller runs the IAP service to reconfigure the FPGA. The MSS contains a hard 32-Bit Cortex-M3 processor, running at up to 166 MHz. The Coretex-M3 is in charge of controlling peripherals, sensor modules, hardware accelerators and performing dynamic reconfiguration.
5.2
USB/UART
Power Profiler
Power profiling is a key-component for the run-time adaptation. Since the Configuration Manager requires the information about the efficiency of the operations in terms of the energy consumption, the in-situ power meter samples the current consumption of the node and provides the necessary information.
4.2.4
Ethernet
Due to the flexible interconnection provided by the computation unit, it is possible to support multiple radio transceivers. The transceivers can be controlled either by the microcontroller or a customized controller implemented on the FPGA. This also opens the possibility to process the received data on-the-fly in parallel to the microcontroller.
Radio Modules
The wireless communication unit is composed of two low power radio modules, one for short distance transmission and the other for communicating over longer ranges. Two radio transceivers transmit data on two different, noninterfering channels simultaneously. We choose CC2420 and CC1100 radio modules which are widely used in WSN platforms such as Telos, MICAz, MICA, etc. They operate on different frequency bands (2.4 GHz and Sub-1GHz). Hence, they have different transmission range and fading characteristics. The radio transceivers are connected to a SPI interface to communicate and exchange data with the FPGA. A hard-
Reconfigurable SoC
Sub-1GHz Transceiver
USB Host
Sensor Board
Figure 5: The HARP sensor platform. Major parts including reconfigurable SoC, two radio modules, power metering circuit and sensor board are labeled. ware module is implemented on FPGA to get the packets from radio modules and send the data to the microcontroller, and vice versa.
5.3
Power Meter
The power meter is composed of an adjustable series of shunt resistors, a differential amplifier and a low power 12bit ADC, Texas Instruments (TI) ADC121C021. Since the current consumption of sensor platform varies significantly over time and states (sleep mode, active mode, idle mode), the power meter should be able to capture this wide dynamic range. The microcontroller can control this range dynamically by adjusting the resistance of the circuit.
5.4
Sensor Modules
To connect to the most of the commercial sensor modules, we adopt the widely used 51-pin sensor board connector. Since the interconnection is reconfigurable, it provides more flexibility than traditional motes. In this work, we integrate a sensor board with an acoustic sensor, a motion sensor and a magnetic sensor connecting to a low power 4-channel 16-bit ADC, TI ADS8341, for high resolution sensing requirements.
5.5
Software System
We integrate TinyOS [2] to provide the system services of HARP. One advantage of TinyOS is that there are numerous applications and network protocols developed under it. Our software architecture allows the application developers integrating they TinyOS applications without knowing the hardware details. The run-time adaptation occurs spontaneously under the control of Configuration Manager. We implement the hardware drivers and IAP to provide the adaptivity based on the decision of Configuration Manager. Cortex-M3 processor requests and receives the configuration through the wireless transceivers and stores it on an external flash memory for run-time reconfiguration.
6.
EVALUATION
To evaluate the performance and the efficiency of HARP, we conducted a series of experiments by profiling the operations performed by HARP with different configurations. We use Agilent N6705B DC Power Analyzer to supply the power to the prototype and record the current consumption of each operation. To measure the execution time of each op-
Table 2: Comparison of hardware accelerations Implementation FFT AES Enc RS Dec FIR
Figure 6: The experimental setup including DC Power Analyzer, Agilent oscilloscope and HARP
HW SW HW SW HW SW HW SW HW SW
Duration (ms) 0.051 138.333 0.303 2.460 0.005 2.444 0.105 5.665 0.021 35.55
Current (mA) 84.997 65.242 82.373 65.339 70.760 66.883 74.849 67.427 84.997 66.592
Energy (mJ) 0.014 29.783 0.082 0.530 0.001 0.539 0.026 1.261 0.006 7.812
0.09 0.08
Table 1: Power consumption of SAM3U and Smartfusion2
[email protected] (mW) Active Sleep 8.964 1.782 54.342 21.096 86.4 39.582 -
[email protected] (mW) Active Sleep 17.04 15.492 45.396 32.64 64.452 48.552 78.3 62.64
eration, we use Agilent MSO9104A oscilloscope to mark the duration. This experiment setup allows us to catch the event lasting for less then a micro-second and analyze the current consumption of several micro-amperes. We first measure the power consumption on the Smartfusion2 SoC and compare it with the Cortex-M3 based SAM3U used in [12] at various clock frequency as listed in Table 1. We can see that the power consumption of the Smartfusion2 SoC is more efficient at higher frequency despite slightly higher consumption at sleep mode. The features listed in Section 3.3 are evaluated to see whether the requirements are fulfilled. All the measurements are performed at the 96 MHz clock (maximum possible frequency on SAM3U) supplied to both Cortex-M3 and the FPGA. The voltage supplied is 3.3V and the consumption of the whole node is measured. The experimental setup is shown in Figure 6.
6.1
0.05 0.04 Transition Mode
0.03 0.02
Stable Low Power
0.01
1.254ms
0 −1
−0.5
Low Power Mode
HARP provides several levels of low power mode. The platform can request the microcontroller and the FPGA to enter low power mode individually. Table 3 shows the current consumption of different low power modes. It is worth noting that it takes time to reach the stable current consumption
0
0.5
1 Time (ms)
1.5
2
2.5
3
Figure 7: Entering low power mode after a certain component enters certain low power mode. Figure 7 shows the current consumption of HARP in different low power modes. The FPGA reaches low power state 0.043 ms after the process starts. Then the microcontroller enters the sleep mode. However, it takes 1.254 ms to reach stable low power state after the chip indicates the low power mode is entered. The energy consumed for this transition is 0.37 mJ. Table 3: Low power mode
Accelerations
The applications running on HARP can perform the task using either software or hardware implementation. The software implementation consumes less current than its hardware counterpart but incurs longer execution time. We implement several operations which are widely used in sensor network applications such as FFT for acoustic event recognition, Advanced Encryption Standard (AES) for secure communication [15], Reed-Solomon (RS) coding for reliable communication, and FIR for noise filtering. Table 2 shows the current consumption, the execution time, and the energy consumed. We can see that the energy saving from the hardware accelerations varies from several to 103 folds. In a real deployment, the access rate of these operations depends on the applications running on the node and the event occurrence rate. This emphasizes the importance of the run-time profiling and reconfiguration for an efficient system.
6.2
0.06
Current (A)
Frequency 4Mhz 48Mhz 96Mhz 144Mhz
Enter FPGA Low Power Mode: −0.045ms Enter MCU Low Power Mode : −0.002ms Low Power Mode: 0ms
0.07
LPM1 LPM2 LPM3 LPM4 LPM5 LPM6
6.3
Operation Mode MSS SLEEP FPGA OFF (50MHz MSS clock) FPGA, eNVM, PLL OFF (50MHz MSS clock) FPGA OFF (1MHz MSS clock) FPGA, eNVM, PLL OFF (1MHz MSS clock) LPM1+LPM5+ Peripheral Power Down
Current (mA) 66.785 63.6 50.86 46.681 41.163 9.878
Run-time Reconfiguration
The duration of the reconfiguration depends on the size of the configuration data (bitstream) which is proportional to the size of the FPGA. On our prototype we use M2S010 which has 12084 reconfigurable tiles resulting 611 KB of configuration data. Table 4 shows the duration and energy consumption of the reconfiguration. We can see that the reconfiguration is an expensive operation in terms of energy consumption. This cost should be considered when assigning a new application to a sensor node or determining whether to change the configuration. Even though the energy needed for the run-time reconfigu-
Table 4: Run-time Reconfiguration Duration (s) 11.776 41.111 15.556
Current (mA) 101.146 96.074 93.295
ration looks high on a Flash-based device, it is still a better solution for a sensor network which follows the sensestore-send-sleep operation pattern because there is no additional overhead to configure the FPGA while switching to active mode like on a SRAM-based device. For instance, a SPARTAN-6 used in [21] consumes 50mA at 1.2V for configuration each time when the FPGA wakes up. Utilizing the partial reconfiguration feature, it takes 0.5 second to load the configuration. It means that for an application which wakes up every 5 seconds, 518.4 Joule will be consumed in one day solely for loading the configuration at wake-up.
6.4
Communication
We characterize the radio modules integrated in HARP by measuring the duration and the current consumption at each stage of communication operations. The operation can be mainly divided into two parts, namely microcontroller related and transceiver related. Microcontroller should transfer data between RAM and transceiver’s FIFO and then the transceiver should transfer/receive the data in FIFO to/from the antenna. Table 5 shows the characteristics of both CC2420 and CC1100. The data size is 64 bytes for both transceivers. CC2420 performs standard IEEE802.15.4 operations due to its fixed configuration, while CC1100 is configured to run at 433 MHz with MSK (Minimum-shift keying) modulation. As we can see in the table, both transceivers share similar characteristics in terms of the current consumption and the duration. We observe that the time needed for the communication between the microcontroller and the transceiver dominates the total duration required. Depending on the complexity of the protocol, the microcontroller should spend corresponding time to process the data. This fact justifies the choice of connecting multiple transceivers using dedicated interfaces connected to the FPGA because this enables the possibility of acceleration and parallelism. Table 5: Communication Components chip
CC1100
CC2420
6.5
Operation Idle Receive TX MSS + Transceiver TX Transceiver (0dBm) RX MSS + Transceiver RX Transceiver Idle Receive TX MSS + Transceiver TX Transceiver (0dBm) RX MSS + Transceiver RX Transceiver
Current(mA) 4.216 22.666 5.199 12.134 5.149 19.699 4.334 24.071 4.829 15.035 22.555 21.689
0.085
Energy (J) 3.931 13.034 4.789
Duration (s) 0.051657 0.003064 0.034613 0.002114 0.051532 0.002939 0.113689 0.002055
Power Profiling
To evaluate the built-in power meter, we output the current readings to the UART port and record them into a file with markers of specific events. The built-in power meter is calibrated using Agilent N6705B DC Power Analyzer and the timing of each reading is measured using the oscilloscope. Due to the overhead of UART output, the minimum interval of each reading is 1.78 ms which is around 562 samples
0.08 0.075
Current (A)
Operation Authenticate Program Verify
0.07 0.065 0.06 DC Power Analyzer Built-in Power Meter
0.055 0.05 0.045 -2.5
-2
-1.5
-1
-0.5
Time (s)
0
0.5
1
1.5
Figure 8: Profile from in-situ Power Meter per second. With the output turned off, the built-in power meter can achieve more than 8 ksps. We compare the current consumption measured using our power meter with the measurements on the DC Power Analyzer and plot the results on Figure 8. As we can see, although the built-in power meter has lower resolution than the DC Power Analyzer, it can correctly capture the current consumption of the operations at real-time. By comparing the constant current draw on the DC Power Analyzer, the output from the built-in power meter has less than 1% error. This high accuracy is important for the power profiling to manage the applications and configurations efficiently.
7.
CASE STUDY
In this section we use two applications to demonstrate the advantages of HARP based on the prototype implementation and simulation. An experiment based on real world sensing data is performed to investigate the behavior of HARP in a real world scenario.
7.1
Indoor Monitoring System
In this experiment, an indoor monitoring system in an office environment with 7 sensor nodes is deployed as depicted in Figure 9. The sensor nodes detect the moving objects using Passive Infrared (PIR) sensors and record the sound samples to recognize the event. The log data must be transmitted through a secure and reliable channel periodically. Therefore, 4 kinds of accelerators can be used on the sensor nodes which are FFT, FIR, RS and AES for sound classification and filtering, channel coding and encryption, respectively. The Configuration Manager should decide an efficient configuration based on the run-time observation. FIR and FFT accelerators are installed by default since they have higher speed-up ratio. We record the event trace in a duration of 4 days and use it as the test data input. Then we record the energy consumption of the node using Agilent N6705B DC Power Analyzer. Figure 10 shows the PIR event distribution at each location during the office hour. As we can see in the figure, the nodes can be divided into three groups. Node 1 and 6 are not often triggered by the PIR events, while node 2, 4 and 5 are PIR event dominated. Node 3 and 7 are somewhere in between. In our implementation, we record 8k audio samples when the PIR sensor is triggered, then use FIR to filter the noise and FFT to classify the sound. On the other hand, the node should transmit data encrypted by AES and encoded with RS to increase the reliability.
CM1
Audio recording
CM2
FFT
Audio recording
RS enc TX FFT
Audio CM3 Sleep mode recording
3
5 7 6
1
CMn CH
Figure 9: Deployment of the Indoor Monitoring System 200
Node‐07 Node‐06 Node‐05 Node‐04 Node‐03 Node‐02 Node‐01
150
100
50
0
Figure 10: Event Distribution The average current consumption of selected nodes is shown in Figure 11. We compare the node with only software implementation, the fixed hardware configuration, and the run-time adaptation. We can see that even though the PIR events occur more frequently on node 4, the default configuration does not yield the best efficiency. The Configuration Manager decides to load the configuration based on FIR and AES during the experiment and save 7.77% and 22.07% of current consumption compared with the default configuration and the one without acceleration, respectively. However, since the PIR event rarely occurs on node 1, the Configuration Manager chooses AES and RS based configuration and has 10.22% and 11.42% savings compared to the default configuration and software implementation, respectively. To see the effect of frequent PIR events, we manually trigger the PIR operations every 10 seconds. The results of this imaginary node (Node-X) show that default configuration is actually the best choice in this case and yields 63.05% of current saving compared to the software implementation.
80 No HW acceleration
HW (no reconf.)
HW (with reconf.)
Average Current (mA)
70 60 50 40 30 20 10 0 Node‐1
Node‐4
Node‐7
RS enc TX FFT
Sleep mode
RS enc TX
Sleep mode
. . .
4
2
Sleep mode
Node‐X
Node Number
Figure 11: Current reduction with run-time adaptivity
· · · Sleep mode Sleep mode
Audio recording
FFT
RS enc TX
RX
Sleep mode RS dec Beamforming AES TX
One frame
Figure 12: Clustered acoustic object tracking operations for cluster head (CH) and cluster members (CMs)
7.2
Acoustic Object Tracking Application
To investigate all the features of HARP at network level, we use network simulator to simulate an acoustic object tracking application. In this application, Network clustering technique [10] is implemented, which organizes the network into clusters, each has a cluster head (CH) and cluster members (CM). CMs record audio samples using their microphones and recognize the object type. They forward the result to CHs, which track the object using beamforming algorithms [22] to locate its position. In order to balance the workload, every certain period of time (called ”round”), clusters are reformed and different nodes are elected to become CHs. In our case study, we deploy 96 nodes randomly distributed in a 250m×250m area using NS2 network simulator. Each node has initial energy of 18720 Joules which is equivalent to the energy of two AA batteries. A member node performs the sensing task in every 10-second frame as depicted in Figure 12. The task of a member consists of audio recording, audio clip analyzing (FFT based), encoding the data with RS error correcting code [16], sending the data to the CH via CC2420 transceiver, then entering sleep mode. A CH collects packets from its members, decodes the data, performs the beamforming algorithm (FIR based) to estimate the location, then sends it to the base station via CC1100 transceiver. When clusters are formed, the reconfiguration takes place to accelerate the cluster head. We compare HARP in terms of energy consumption and the network lifetime with three other cases: 1) no acceleration available. 2) fixed hardware accelerators. 3) SRAM-based FPGA with reconfiguration. Figure. 13 shows the residual energy of the whole network (96×18720 Joules) during the simulation. We can see that the case without hardware accelerations is very inefficient and drain out the energy very quickly. Comparing to the fixed hardware implementation, the run-time reconfiguration shows the benefit when round time increases because the longer the round time is, the less reconfiguration overhead is introduced. When the round time is set to 1, 2, and 3 hours, HARP obtains 5%, 9% and 12% energy saving, respectively. Comparing with the SRAM-based FPGA, the energy savings are 0.8%, 4.1% and 7.5%, respectively. In Figure. 14 we compare the network lifetime with different round time. Comparing with the case without hardware accelerations, HARP with 1, 2, and 3 hours round time increases the lifetime by 96%, 99% and 101%, respectively. Comparing with fixed hardware implementation, the lifetime is extended by 5.6%, 7.7% and 9.5%, respectively. Com-
Residual Energy (J)
2,000 k
References
1,750 k
[1] Memsic, micaz, http://www.memsic.com/wireless-sensornetworks/mpr2400cb. [2] Tinyos, http://www.tinyos.net. [3] A. H. Akbar, A. A. Iqbal, and K.-H. Kim. Binding multiple applications on wireless sensor networks. In Advances in Grid and Pervasive Computing, pages 250–258. Springer, 2006. [4] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury. A survey on wireless multimedia sensor networks. Computer Networks, 51(4):921–960, Mar. 2007. [5] P. Bogdan and R. Marculescu. Towards a science of cyberphysical systems design. In 2011 IEEE/ACM International Conference on Cyber-Physical Systems (ICCPS), pages 99–108, Apr. 2011. [6] P. Corke, T. Wark, R. Jurdak, W. Hu, P. Valencia, and D. Moore. Environmental wireless sensor networks. Proceedings of the IEEE, 98(11):1903–1917, Nov. 2010. [7] I. Downes, L. B. Rad, and H. Aghajan. Development of a mote for wireless image sensor networks. Proceedings of COGnitive systems with Interactive Sensors (COGIS), Paris, France, 2006. [8] W. Grosky, A. Kansal, S. Nath, J. Liu, and F. Zhao. SenseWeb: an infrastructure for shared sensing. IEEE MultiMedia, 14(4):8 –13, Dec. 2007. [9] Y. Guo and M. Hazas. Localising speech, footsteps and other sounds using resource-constrained devices. In 10th International Conference on Information Processing in Sensor Networks (IPSN), pages 330–341. IEEE, Apr. 2011. [10] W. R. Heinzelman, A. Ch, and H. Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. In Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, pages 3005–3014, 2000. [11] C.-M. Hsieh, Z. Wang, and J. Henkel. Dance: distributed application-aware node configuration engine in shared reconfigurable sensor networks. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 839–842, 2013. [12] J. Ko, K. Klues, C. Richter, W. Hofer, B. Kusy, M. Bruenig, T. Schmid, Q. Wang, P. Dutta, and A. Terzis. Low power or high performance? a tradeoff whose time has come (and nearly gone). In Proceedings of the 9th European Conference on Wireless Sensor Networks, EWSN’12, pages 98–114, Berlin, Heidelberg, 2012. Springer-Verlag. [13] M. Kohvakka, T. Arpinen, M. H¨ annik¨ ainen, and T. D. H¨ am¨ al¨ ainen. High-performance multi-radio wsn platform. In Proceedings of the 2nd international workshop on Multi-hop ad hoc networks: from theory to reality, pages 95–97, 2006. [14] I. Leontiadis, C. Efstratiou, C. Mascolo, and J. Crowcroft. Senshare: transforming sensor networks into multi-application sensing infrastructures. In Wireless Sensor Networks, pages 65–81. 2012. [15] Y. Li, Z. Jia, S. Xie, and F. Liu. Dynamically reconfigurable hardware with a novel scheduling strategy in energy-harvesting sensor networks. IEEE Sensors Journal, 13(5):2032–2038, 2013. [16] C.-J. M. Liang, N. B. Priyantha, J. Liu, and A. Terzis. Surviving wi-fi interference in low power zigbee networks. In Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, pages 309–322, 2010. [17] L. Mainetti, L. Patrono, and A. Vilei. Evolution of wireless sensor networks towards the internet of things: A survey. In 19th International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pages 1–6, 2011. [18] Microsemi. Smartfusion2 system-on-chip fpga, http://www.microsemi.com/products/fpga-soc/socfpga/smartfusion2. [19] F. Philipp, F. A. Samman, and M. Glesner. Design of an autonomous platform for distributed sensing-actuating systems. In 22nd IEEE International Symposium on Rapid System Prototyping (RSP), pages 85–90, 2011. [20] J. Portilla, T. Riesgo, and A. De Castro. A reconfigurable fpgabased architecture for modular nodes in wireless sensor networks. In 3rd Southern Conference on Programmable Logic, pages 203–206, 2007. [21] J. Valverde, A. Otero, M. Lopez, J. Portilla, E. d. l. Torre, and T. Riesgo. Using SRAM based FPGAs for power-aware high performance wireless sensor networks. Sensors, 12(3):2667–2692, Feb. 2012. [22] A. Wang and A. Chandrakasan. Energy-efficient DSPs for wireless sensor networks. IEEE Signal Processing Magazine, 19(4):68 –78, July 2002. [23] J. Yick, B. Mukherjee, and D. Ghosal. Wireless sensor network survey. Computer Networks, 52(12):2292–2330, Aug. 2008.
1,500 k 1,250 k No HW acceleration
1,000 k
HW (no reconf.) 750 k
Reconfiguration(1 hr)
500 k
Reconfiguration(2 hrs)
250 k
Reconfiguration(3 hrs) SRAM (with reconf.)
0 k 0
2
4
6
8
10 12 14 16 18 20 22 24 26
Network Lifetime (hours)
Simulation time (hours) Figure 13: Energy consumption 80 75 70 65 60 55 50 45 40 35 30 25 20 15 10 5 0
No HW acceleration HW (with reconf.)
1 hour
HW (no reconf.) SRAM (with reconf.)
2 hours
3 hours
Round Time Figure 14: Network lifetime paring with the SRAM-based FPGA the increase is 1.1%, 3.2% and 5.3%, respectively. In addition, we measure the throughput of the system with and without multiple radio transceivers. When the node utilizes both radios for data transmission, it can reach 335.69 kbps while using a single transceiver, the throughput is only 164.53 kbps.
8. CONCLUSION In this paper we presented HARP - a hardware/software codesign framework for a reconfigurable wireless sensor platform which can adapt to the dynamic application requirements and the run-time situations in wireless sensor networks. HARP provides high performance and efficiency for applications by using hardware accelerators. It employs multiple radio transceivers with dedicate communication interfaces to achieve flexible communication and high throughput. HARP is capable of run-time reconfiguration which can remotely request and change the hardware/software combination to perform new tasks or improve the efficiency of applications. We provided a detailed description and a comprehensive analysis of HARP in terms of energy consumption and execution time with real-world measurements. The case studies show that the current consumption drops 31.27% on average and the network lifetime increases up to 101%. In the future, this framework can be integrated with energy harvesting devices and software-defined radios to further improve the energy efficiency and communication flexibility.
Acknowledgment This work was supported by German Research Foundation (DFG), Research Training Group GRK 1194 (Self-organizing Sensor-Actuator-Networks).