Reconfigurable Miniature Sensor Nodes for Condition Monitoring

Teemu Nyländen, Jani Boutellier, ∗Karri Nikunen, Jari Hannuksela, Olli Silvén
Computer Science and Engineering Laboratory, ∗Electronics Laboratory, University of Oulu, Finland
{teemu.nylanden, jani.boutellier, karri.nikunen, jari.hannuksela, olli.silven}@ee.oulu.fi

Abstract—Wireless sensor networks are being deployed at an escalating rate in various application fields. The ever growing number of application areas requires a diverse set of algorithms with disparate processing needs. Wireless sensor networks also need to adapt to the prevailing energy conditions and processing requirements. These reasons rule out the use of a single fixed design. Instead, a general purpose design that can rapidly adapt to different conditions and requirements is desired. In lieu of the traditional inflexible wireless sensor node consisting of a micro-controller, radio transceiver, sensor array and energy storage, we propose a rapidly reconfigurable miniature sensor node, implemented with a transport triggered architecture processor on a low-power Flash FPGA. A comparison of power consumption and silicon area usage between 16-bit fixed point, 16-bit floating point and 32-bit floating point implementations is also presented in this paper. The implemented processors and algorithms are intended for rolling bearing condition monitoring, but can be extended to other applications as well.
I. INTRODUCTION

Monitoring of industrial plants and environmental conditions has ascended to a new level through the introduction of wireless sensor networks (WSNs) consisting of wireless sensor nodes, referred to as motes from now on. Self-powered wireless sensor nodes have huge potential in these fields as well as in various other application fields. The idea of a mote that scavenges the energy it needs from the surrounding environment and operates without a bulky external power supply, such as a battery, is more than intriguing.

In sophisticated applications, sensor nodes are required to perform complex data processing before transmitting their observations to a base station. This is mostly due to the high energy cost of radio transmission. Processing the data locally and transmitting only the processed data can result in better energy efficiency compared to applications that send all the collected data to be processed elsewhere. Complex data processing in conjunction with the low energy resources demands the use of application-specific solutions. Unfortunately, the design time and price prohibit the use of application specific integrated circuits (ASICs) in all but the highest-volume applications. However, with the combination of application-specific instruction-set processors, programs written in a high level language and low-power Flash field-programmable gate array (FPGA) circuits, it is possible to design rapidly deployable, low-cost and high-efficiency sensor nodes even in low volumes.
Condition monitoring is justifiable when the financial benefits outweigh the costs. The financial benefits can be substantial if, for example, a machine breakdown can be avoided by condition monitoring. Since manual machine condition monitoring is very time consuming and therefore expensive, deploying a low-cost WSN to perform the same task is a very attractive choice. In addition, since the machine can be active while the monitoring is performed, no valuable production or function time is lost.

However, there are a few fundamental obstacles to implementing a self-powered WSN. First of all, there is typically a huge number of applications that each require different measurement modalities and processing algorithms, which is why no single fixed design appears useful. Designing a dedicated design for each application makes the cost high, which is counterproductive if the nodes should be cheap and even disposable.

Due to the ambient conditions, the scavenged power tends to be unregulated, erratic and weak. Often the scavenging provides at most milliwatts, as presented in Table I [1] [2] [3]. For example, the Atmel ATmega128(L) microcontroller, which is used in the MICAz platform, has a power consumption of about 16.5 mW at a clock rate of 4 MHz in the active state [4]. The numbers indicate that the scavenged energy alone is not enough to power the mote; instead, an energy storage such as a super-capacitor is needed. It has to be noted, however, that a typical mote is in the active state for only a short portion of its duty cycle and hibernates the rest of the time.

Furthermore, recovery from power outages and sleep states can be very expensive and consume most of the available energy. For example, in [5], the boot time for the Neutron operating system intended for sensor nodes was between 10 and 16 ms.
TABLE I
ENERGY HARVESTING ESTIMATES FOR INDUSTRIAL ENVIRONMENT

Energy Source            Harvested Power
Electromagnetic          1-25 mW/cm³
Vibration/Motion         0.8 mW/cm³
Temperature Difference   1-10 mW/cm²
Light, Indoor            0.01 mW/cm²
Light, Outdoor           10 mW/cm²
RF, GSM                  0.0001 mW/cm²
RF, WiFi                 0.001 mW/cm²

In addition, in [6] the sleep mode latency on a wake-up event is approximately 20000-35000 cycles for the DSP/BIOS real-time operating system, which corresponds to about 20-35 ms at a 1 MHz clock rate. As these two examples show, the boot and reboot times of lightweight operating systems are typically measured in milliseconds, which diminishes the energy efficiency of a mote. Every clock cycle that is used to power up or boot the mote is taken from the mote's energy budget. Long boot times are often caused by instantiation of objects and initialization of the system resources. In addition, in case the mote powers down, the
random access memory (RAM) contents are lost and have to be re-initialized. Re-initializing the RAM prolongs the boot time even more.

The Texas Instruments CC2420, which is the transceiver used in both the Tmote Sky and MICAz platforms, has a power consumption of about 35-65 mW in transmitting and receiving modes and about 1 mW in sleep mode. As communications consume a substantial amount of energy, most of the signal processing should be carried out within the sensor node to minimize the costly data transmission.

Finally, a general design challenge is that low power operation has usually been understood to require fixed-point arithmetic and in-line assembly or programming language intrinsics. Since programming fixed-point designs can be very time consuming, fixed-point arithmetic is a poor fit when different designs need to be created rapidly with minimal programming and testing effort.

This paper investigates three different arithmetic formats, 16-bit and 32-bit floating point and 16-bit fixed point, from the perspective of energy efficiency and ease of use. The latter factor cannot be regarded as unimportant, as the design of algorithms with fixed point arithmetic may take weeks of design time, which might become prohibitively expensive when tailored sensor nodes are to be implemented rapidly and in low volumes.

The remainder of the paper is organized as follows. Section II reviews the related work on energy efficient motes. In Section III, our reconfigurable sensor node implementation is presented and the tool chain and arithmetics used are discussed. Section IV describes the rolling bearing monitoring practices and algorithms. In Section V, the performance of our mote implementations is presented. Sections VI and VII conclude the paper.

II. RELATED WORK

Current motes store the scavenged energy in external energy storages such as batteries or super-capacitors.
Motes are often deployed in hard-to-reach places, which makes their maintenance burdensome. The majority of current motes, like the Heliomote presented in [7], use batteries as their main energy storage. However, since the lifetime of a battery is very limited and its replacement often infeasible, other solutions are needed. Alternatively, a super-capacitor can be used as an energy storage independently or in conjunction with a battery. In [8], a mote called Prometheus is presented that uses a super-capacitor as the primary energy storage and a Li-ion battery as a back-up energy storage. In [9], a mote named Everlast, which uses only a super-capacitor as its energy storage, is presented.

In some cases it is also possible that the ambient energy itself is enough to power the mote, making the energy storage unnecessary. This is possible in the rare cases where a continuous, stable and fully predictable ambient energy source exists. Unfortunately, in most cases the amount of scavenged energy is not constant or even continuous.

Current sensor nodes also require a lot of software if general purpose functionality is pursued. In addition, some sort of power management is usually needed so that the WSN can adapt to the prevailing energy and processing needs, which increases the overall complexity and energy consumption of the design.

III. RECONFIGURABLE SENSOR NODE
We aim at motes that can be rapidly adapted from a few standard designs to most of the condition monitoring applications. Our mote consists of a reconfigurable processing module that can be attached to a multi-channel measurement module and an energy scavenging module. The choice of measurement and energy scavenging modules is application dependent. A combination of multiple energy sources, measurement sources or even both might be needed to gain sufficient energy levels and measurement accuracy. Figure 1 demonstrates one possible composition of our sensor node.
Fig. 1. Harvest-Store-Use Architecture of a mote. (Blocks shown: Energy Scavenging Unit, Energy Management Unit, Energy Storage, Multi-channel Measurement Module, Reconfigurable Processing Module, Reconfiguration Interface.)
A potential application for our mote is, for example, monitoring the mechanical parts of a moving machine. The phenomena typically monitored in this application field require sampling rates of 5-40 kHz [10][11]. The often very noisy measurement conditions and the need to compensate for the performance of low-cost sensors require the use of multiple measurement channels and adaptive filtering techniques. Figure 2 illustrates the typical design of a complete WSN.
Fig. 2. An example of typical WSN design. (Blocks shown: Sensor array; DSP unit with A/D conversion, analysis and processing; Wireless transceiver unit; Base station; PC, database etc.)

A. Tool chain

Our aim was to implement a rapidly reconfigurable and energy efficient mote with as little run-time computational overhead and design effort as possible. We chose to implement the mote using the TTA-based Co-design Environment (TCE), a toolset for designing application-specific instruction-set processors (ASIPs) based on the very long instruction word (VLIW) flavored transport triggered architecture (TTA). The TCE toolset provides a fast and simple way to design and implement ASIPs for FPGAs [12]. The amount of system software can be made minimal, contributing to rapid bootstrap, unlike in WSNs with operating systems. The TTA processor can be programmed in standard C, so there is no need to learn a new programming language. The disadvantage of TTA is its relatively poor code density, which mainly results from the minimal instruction encoding that is used to simplify decoding [13]. However, the poor code density can be improved using compression. The use of dictionary based compression decreased the instruction word length of our implementations from about 130 bits all the way down to 9 bits.

We implemented our TTA processor on a low-power Actel Igloo AGL1000 Flash FPGA. The operating voltage of the Igloo was chosen to be 1.2 V instead of 1.5 V to achieve better energy efficiency. The Igloo has 1 kbit of Flash read only memory (ROM) and 144 kbit of static random access memory (SRAM) as on-chip memories. The Igloo is also live at power-up, and by using the Flash*Freeze technology it can enter and exit ultra-low power modes in microseconds [14] [15].

B. Buffering and scheduling

In signal processing the processing chains are usually very deterministic, which is why there is no need for priority based scheduling. However, the number of measurement channels used should often be selected dynamically based on signal quality and energy conservation, so a scheduling technique that adapts to these needs is required. Multichannel data acquisition and processing in real time can be a major testing challenge if the system performance is sacrificed to achieve low power consumption, for example with rate-monotonic scheduling. Extensive buffering of samples is almost a necessity with software oriented solutions. However, with hardware functionality based designs the buffering needs are substantially lower. For example, a 50 kHz sampling rate for four separate channels with 256-sample buffers per channel results in almost 800 interrupts per second (4 × 50000 / 256 ≈ 781), which means that interrupts arrive at roughly 1.25 ms intervals. The whole processing chain therefore has to keep up with this rate, which leaves only about 1250 cycles to perform the necessary data processing, presuming a 1 MHz processor clock rate.

C. Arithmetics
Typically, fixed-point implementations are considered more efficient in terms of silicon area and power than floating point implementations. This is because the comparison is generally performed between single precision 32-bit floating point and 16-bit fixed point arithmetic. By using 16-bit floating point arithmetic instead of 32-bit arithmetic, the energy savings can be notable [16]. In application areas like WSNs, 16-bit floating point offers more than sufficient accuracy and dynamic range in most cases. In addition, the long development and testing times required by fixed point implementations also favor floating point implementations.

In fixed point arithmetic, multiplication and division by powers of two can be replaced by simple shift operations, providing a much more area efficient and faster design. It is often forgotten that, by choosing suitable parameters, there are similar ways to make floating point operations faster and more area efficient. For example, a square root estimate can be calculated rapidly using the inverse square root presented in [17]. In addition, algorithms like FFT and RMS require division by the sample count or FFT length. If these lengths are chosen to be powers of two, the floating point division can be performed by a simple exponent subtraction. By exploiting such floating point tricks, the performance gap between fixed point and floating point implementations decreases significantly.

D. Implementation comparison

We implemented three different general purpose TTA processors, one for each of the following arithmetics: 16-bit fixed point, 16-bit floating point and 32-bit floating point. The general purpose processors were designed so that they could handle the typical algorithms for bearing fault monitoring. Although the different arithmetics require slightly different processor compositions, the three processors were designed to be as similar as possible for a fair comparison.
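The exponent-subtraction division described in Section III-C can be illustrated with a small Python sketch. It is an illustration only, not the authors' hardware implementation: it assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits), the helper names `half_bits`, `bits_to_half` and `div_pow2` are hypothetical, and zero, subnormal, infinite, NaN and underflowing values simply fall back to ordinary division.

```python
import struct

EXP_MASK = 0x7C00   # 5-bit exponent field of IEEE 754 binary16
EXP_SHIFT = 10      # exponent field starts above the 10 fraction bits


def half_bits(x: float) -> int:
    """Return the binary16 bit pattern of x as an unsigned integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bits_to_half(bits: int) -> float:
    """Interpret a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def div_pow2(x: float, k: int) -> float:
    """Divide a binary16 value by 2**k by subtracting k from its exponent.

    Assumption for this sketch: only normal inputs whose result is also
    normal take the fast path; everything else uses ordinary division.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    bits = half_bits(x)
    exponent = (bits & EXP_MASK) >> EXP_SHIFT
    if exponent == 0 or exponent == 0x1F or exponent - k < 1:
        # zero/subnormal, inf/NaN, or result would leave the normal range
        return bits_to_half(half_bits(x / 2 ** k))
    # Fast path: sign and fraction bits are untouched
    return bits_to_half(bits - (k << EXP_SHIFT))


quotient = div_pow2(1024.0, 8)   # division by 256 = 2**8
```

For a 256-sample RMS or a 256-point FFT, k = 8, so the division reduces to subtracting 8 from the 5-bit exponent field while the sign and fraction bits pass through unchanged.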
The general purpose nature of the design, together with the strict silicon area requirements, also forced us to use only basic function units (FUs) in our processor. One small special function unit (SFU) was, however, implemented for dividing floating point numbers by powers of two. In our floating point implementations we used the IEEE 754-2008 floating point formats. A non-standard composition for the 16-bit floating point implementation was also considered, but because of the rather high dynamic range requirements of some algorithms, the number of exponent bits could not be decreased to achieve better accuracy.

Table II presents a comparison between the three different arithmetics in terms of silicon area and power dissipation. Surprisingly, the 16-bit floating point implementation achieves the best results in both silicon area and power efficiency. This is because our 16-bit fixed point implementation actually needs a 32-bit bus and register file, as well as a 32-bit arithmetic unit, for storing and processing the intermediate results of some algorithms. By using wider word lengths for the intermediate results, lower power consumption was achieved compared to a pure 16-bit implementation. It has to be noted that an SFU could also have been implemented for this purpose, but it would have increased the design and testing time of the fixed point implementation even further.

TABLE II
COMPARISON OF SILICON AREA USAGE AND POWER CONSUMPTION BETWEEN FIXED AND FLOATING POINT ARITHMETIC

Arithmetic              Silicon area (Actel VersaTiles)   Power dissipation (mW)
32-bit floating point   15553                             18.474
16-bit floating point   8523                              12.949
16-bit fixed point      11693                             15.435
In fact, the FUs implemented with 16-bit floating point arithmetic are larger and slower than the corresponding 16-bit fixed-point ones. A comparison of the silicon usage and power consumption of a single 16-bit floating point and a single 16-bit fixed point arithmetic unit is presented in Table III. Although the floating point arithmetic unit is almost twice the size of the fixed point one and its power consumption is more than doubled, the arithmetic benefits of the floating point implementation outweigh these costs when the total energy consumption is analyzed. This is due to the lower operation count and therefore substantially lower number of total clock cycles required for processing. Moreover, the FUs themselves take only about 40-50 percent of the total area of our TTA processor, while the TTA overhead accounts for the rest. This means that the silicon area of the FUs does not play a major role when the whole design is examined.

TABLE III
COMPARISON OF SILICON AREA USAGE AND POWER CONSUMPTION BETWEEN A SINGLE FLOATING POINT AND FIXED POINT ARITHMETIC UNIT

Arithmetic              Silicon area (Actel VersaTiles)   Power dissipation (mW)
16-bit floating point   1350                              0.164
16-bit fixed point      800                               0.064
Since the processor is composed of general purpose FUs and only about one third of the total silicon area is used, it might even be possible to implement the radio communications with the same processor by adding FUs and SFUs for this purpose. However, radio implementation was outside the scope of this study.
IV. BEARING FAULT MONITORING
Although bearings are critical parts of a machine, their failure rarely causes catastrophic consequences. However, a bearing failure is typically a sign of a more significant problem that can cause severe machine failures and even safety risks if not fixed in time. Typically, bearing failure is caused by ineffective lubrication, contaminated lubrication, heavier loading than anticipated, improper handling or installation, etc. [18]. Since bearings are often located in hard-to-reach places, their manual monitoring requires shutting down and dismantling the machine. By deploying WSNs to do the monitoring, no unnecessary shutdowns would be needed.

A damaged rolling bearing or damaged outer cage typically generates vibration and acoustic noise. The vibration can be measured by accelerometers and the noise by acoustic emission sensors. Often the damage starts with only a tiny crack, which then gradually progresses to a bigger and bigger fracture. The gradual progression of rolling bearing damage offers a long time frame for detection: even incipient damage can be detected, and an estimate of the bearing lifetime can be made based on it.

Monitoring bearing condition implies that something rotates. The rotating motion therefore provides a natural source for energy scavenging in the form of a changing magnetic field and vibration. Depending on the application, the rotating motion provides a continuous energy source, provided that the machine is active. In addition, since the ambient energy most likely provides much more energy than the mote actually requires, it would be feasible to store this excess energy so that the mote could perform the computation and analysis even after the machine has halted. However, in some cases, such as in the paper industry, where machines operate for relatively long periods at high and constant speeds, the scavenged energy can be enough to power the node without an external energy storage.
In our approach, the signals from the bearings are collected with piezoelectric elements. They are able to operate in the harshest environments and are unaffected by the metal dust that is most likely present during measurement.

A. Algorithms

The typical environment where the bearing condition measurement takes place is very noisy, which is why the measured signals need to be heavily filtered to recover the data of interest. Typically, adaptive filtering is used to cancel the noise unrelated to the measurement. The measurement data can be analyzed in the time or frequency domain. Typical time domain algorithms used in rolling bearing monitoring are mean, maximum, minimum, peak, root mean square (RMS) and kurtosis. The most commonly used frequency domain algorithms involve Fourier spectrum estimation. The frequency domain analysis can even distinguish whether the defect is on the inner or outer cage or in the ball, since they all have specific defect frequencies [19] [18]. A simple time domain algorithm that can be efficiently used to detect defects in the bearings is the RMS algorithm. A
damaged bearing will result in a higher RMS value compared to a reference RMS value of an undamaged bearing. Although the RMS algorithm is fairly simple, it may have quite high dynamic range requirements depending on the sample length the RMS is calculated over.

Alternatively, the condition monitoring can also be performed in the frequency domain. Spectrum analysis reveals damaged bearings efficiently. Since the defect frequencies are higher than the ones indicating loose or unbalanced bearings, it is fairly easy to classify the source of the problem by using frequency domain analysis. Figure 3 presents a power spectrum for a damaged and an undamaged bearing. The outer cage damage is easy to see: it is indicated by the high peak at about 12 MHz.

Fig. 3. Power spectrum of the measured signal. (Panels: intact bearing, damaged bearing.)

The RMS algorithm is straightforward,

$$ R(x) = \sqrt{\frac{1}{N} \sum_{k=0}^{N-1} x_k^2}, \tag{1} $$

where x_k is the kth sample and N is the total number of samples. However, depending on the RMS length it can require a relatively large dynamic range. A sample length of 256 was chosen for this study, resulting in 256 multiplication operations, 255 addition operations and one division operation. To maximize energy efficiency, the square root of the final sum was not calculated in this study. Since taking the square root only rescales the final sum, it is not a mandatory operation. By using 32 bits for the intermediate results in the fixed point implementation, the accuracy remained good. The accuracy of the 16-bit floating point implementation started to suffer with the RMS calculation, but it was still able to efficiently point out the damaged bearings.

The power spectrum can be calculated as

$$ P(k) = |X(k)|^2 = \frac{1}{N^2} \left| \sum_{n=0}^{N-1} x_n e^{i 2\pi \frac{nk}{N}} \right|^2. \tag{2} $$

We used a 256-point radix-2 fast Fourier transform (FFT) in our implementation. Together with the preprocessing, the implementation for calculating the power spectrum required about 5120 multiplications, 3584 additions and 3072 subtractions. Although the power spectrum computation was much more complex than the RMS calculation, it did not require as high a dynamic range as the RMS computation did.

B. Node design

By using only basic FUs, the design is easily adaptable to various algorithms, and the same design can also be used, for example, for filtering and radio transceiver purposes. Since a general purpose, rapidly reconfigurable design was pursued, and due to the silicon area restriction, no SFUs were used in the design apart from the floating point division SFU. Figure 4 illustrates the composition of our TTA processor for the mote. Only the operations that are mandatory for performing calculations in the required algorithms are supported, keeping the processor as small and therefore as energy efficient as possible. The processor consists of only three FUs. The LOGIC unit performs the logical operations. The DIV unit is only used in the floating point designs. It is a simple bit manipulation SFU that performs division by powers of two. The ARITH unit performs all other arithmetic operations. The processor also has a load/store unit (LSU), four register files (RFs) and a global control unit (GCU). The TTA processor implementations for all three arithmetics were designed to be as similar as possible to allow a fair comparison.
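Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' TTA code: the function names are hypothetical, the test signal is synthetic, and the spectrum is evaluated as a direct sum exactly as Eq. (2) is written rather than with the radix-2 FFT used in the study. As in the study, the final square root of Eq. (1) is omitted.

```python
import cmath
import math


def mean_square(samples):
    """Eq. (1) without the final square root: (1/N) * sum of x_k^2."""
    n = len(samples)
    return sum(x * x for x in samples) / n


def power_spectrum(samples):
    """Eq. (2) evaluated directly: P(k) = |sum_n x_n e^{i 2 pi n k / N}|^2 / N^2.

    A direct O(N^2) sum is used for clarity; an FFT gives the same values.
    """
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = sum(x * cmath.exp(2j * math.pi * i * k / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2 / n ** 2)
    return spectrum


# Hypothetical test signal: unit-amplitude sine with 12 periods in 256 samples
N = 256
signal = [math.sin(2 * math.pi * 12 * i / N) for i in range(N)]

ms = mean_square(signal)
ps = power_spectrum(signal)
peak_bin = max(range(N // 2), key=lambda k: ps[k])  # strongest bin below N/2
```

Because N = 256 is a power of two, both the division by N in Eq. (1) and the division by N² in Eq. (2) are divisions by powers of two, which is the case the DIV unit is intended for.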
Fig. 4. TTA implementation of the node.
The 16-bit fixed-point implementation, however, required one extra function unit to perform the 32-bit operations needed for filtering and for the RMS algorithm to avoid overflow. By employing one 32-bit arithmetic unit for the intermediate results, instead of dividing the intermediate results into two 16-bit parts, a lower operation count and a notably lower number of total clock cycles required for the computation were achieved. The 32-bit unit adds only about 0.5 mW to the total dynamic power consumption of the processor compared to a pure 16-bit implementation, excluding the increase in the dynamic power consumption of the interconnection network. The use of 32-bit arithmetic, however, allows a significant reduction in the total operation count. The total operation count has a much greater effect on the total energy consumption, because the higher the operation count, the more clock cycles are needed for the computation. In other words, the number of clock cycles required for the total execution has a much greater effect on the total energy consumption than the higher power consumption of the 32-bit FU. The effect on the total number of lines of code is even greater.

The same effect can also be observed in Table IV, which presents the energy consumption at the 12 MHz clock rate for the whole 256-sample power spectrum calculation chain including preprocessing. Although in Table III a single floating point arithmetic unit had more than twice the power consumption of an equivalent fixed point unit, the 16-bit floating point design achieves better energy efficiency when the whole design is analyzed. This is due to the lower operation count.

TABLE IV
ENERGY CONSUMPTION COMPARISON

Arithmetic              Total energy consumption (µJ)
32-bit floating point   84.80
16-bit floating point   59.43
16-bit fixed point      70.84
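Energy is power multiplied by execution time, so the figures in Table IV can be related to the power figures in Table II. The sketch below works backwards from the two tables to the execution time and cycle count they imply at 12 MHz. These are derived values, not figures reported in the text, and the calculation assumes that Table IV was obtained as total power dissipation times execution time; the function names are hypothetical.

```python
CLOCK_HZ = 12e6  # clock rate stated for Table IV

# arithmetic: (total energy in µJ from Table IV, power in mW from Table II)
MEASUREMENTS = {
    "32-bit floating point": (84.80, 18.474),
    "16-bit floating point": (59.43, 12.949),
    "16-bit fixed point": (70.84, 15.435),
}


def implied_runtime_ms(energy_uj, power_mw):
    """t = E / P; microjoules divided by milliwatts gives milliseconds."""
    return energy_uj / power_mw


def implied_cycles(energy_uj, power_mw, clock_hz=CLOCK_HZ):
    """Clock cycles corresponding to the implied execution time."""
    return implied_runtime_ms(energy_uj, power_mw) * 1e-3 * clock_hz


runtimes_ms = {name: implied_runtime_ms(e, p)
               for name, (e, p) in MEASUREMENTS.items()}
cycles = {name: implied_cycles(e, p)
          for name, (e, p) in MEASUREMENTS.items()}
```

The same relation can be used in the forward direction: given a measured cycle count and the power dissipation at a given clock rate, the energy per processing run is the power multiplied by the cycle count divided by the clock frequency.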
V. NODE PERFORMANCE

The power consumption of our implementations with an operating voltage of 1.2 V and clock rates of 12 MHz and 1 MHz is presented in Tables V and VI. The maximum clock rates of the three implementations did not differ much. The fixed-point implementation achieves the highest clock rate of 18.3 MHz. The maximum clock rate for the 16-bit floating point implementation is only 0.3 MHz lower. The maximum clock rate for the 32-bit floating point implementation is 16.6 MHz.

TABLE V
POWER CONSUMPTION COMPARISON FOR 12 MHZ CLOCK RATE

Arithmetic              Dynamic power dissipation (mW)   Total power dissipation (mW)
32-bit floating point   18.418                           18.474
16-bit floating point   12.893                           12.949
16-bit fixed point      15.378                           15.435

TABLE VI
POWER CONSUMPTION COMPARISON FOR 1 MHZ CLOCK RATE

Arithmetic              Dynamic power dissipation (mW)   Total power dissipation (mW)
32-bit floating point   1.411                            1.467
16-bit floating point   1.078                            1.134
16-bit fixed point      1.269                            1.325

The 1 MHz clock rate is sufficient if the sampling rates are low or only one or two sensors are used. The 12 MHz clock rate is better suited for cases with higher sampling rates and/or a higher number of sensors.

Memory usage is also an important factor in WSN implementations. The contribution of memory to the total power consumption cannot be disregarded. By using 16-bit floating point arithmetic, the data memory requirements are almost halved compared to the 32-bit floating point implementation. The total instruction memory size for each of the three implementations is about 5 kbit. Since the Actel Igloo only has a 1 kbit embedded Flash ROM, an external instruction memory was needed. Alternatively, the core cells could have been used as instruction ROM, but the reconfigurability of the TTA processor would then have been lost.

A power consumption comparison between our node and two popular nodes, Tmote Sky and MICAz [20], [4], is presented in Table VII. Only the microprocessor power consumption of the Tmote Sky and MICAz is considered in this table. Our platform achieves almost three times better power efficiency than Telos at the same clock rate and over seven times better power efficiency than MICAz at a 4 MHz clock rate. The radio power consumption is 35-65 mW for both the Telos and MICAz platforms.

TABLE VII
POWER CONSUMPTION COMPARISON

Platform              Clock rate (MHz)   Total power consumption (mW)
Telos                 1                  3
MICAz                 4                  33
16-bit FLP TTA mote   1                  1.13
16-bit FLP TTA mote   4                  4.37
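The "almost three times" and "over seven times" figures follow directly from Table VII. A minimal sketch of the arithmetic, with hypothetical variable and function names:

```python
# (platform, clock rate in MHz, total power consumption in mW) from Table VII
TABLE_VII = [
    ("Telos", 1, 3.0),
    ("MICAz", 4, 33.0),
    ("16-bit FLP TTA mote", 1, 1.13),
    ("16-bit FLP TTA mote", 4, 4.37),
]


def power_at(platform, clock_mhz):
    """Look up the power consumption of a platform at a given clock rate."""
    for name, clock, power_mw in TABLE_VII:
        if name == platform and clock == clock_mhz:
            return power_mw
    raise KeyError((platform, clock_mhz))


def power_ratio(reference, ours, clock_mhz):
    """How many times more power the reference platform draws at the same clock."""
    return power_at(reference, clock_mhz) / power_at(ours, clock_mhz)


telos_ratio = power_ratio("Telos", "16-bit FLP TTA mote", 1)
micaz_ratio = power_ratio("MICAz", "16-bit FLP TTA mote", 4)
```

Both ratios compare processor power only at matching clock rates; the radio, which dominates the total at 35-65 mW, is excluded on both sides.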
VI. DISCUSSION

Figure 1 presented a harvest-store-use architecture version of our mote. However, another type of mote could also be implemented if the ambient energy source is continuous, stable and predictable [21]. The paper and steel industries are good examples of such environments. Since the run times are long and the rolling speed relatively constant, the scavenged energy also remains constant and predictable. In environments like these, a harvest-use type mote architecture can be used, in which case the energy storage becomes unnecessary. However, if an energy storage is used together with the energy scavenging unit, the mote is better able to adapt to the prevailing conditions. For example, a higher sampling rate or more complex algorithms might be needed to improve the performance and accuracy of the analysis, and the extra energy from the energy storage can be utilized for these purposes. The harvest-use type architecture simply is not capable of such scaling of sampling rate and computation algorithms. The ambient energy source should provide a power level that is both predictable and preferably as stable as possible to be
suitable for energy self-sufficient solutions without an energy storage.

Table VIII presents the total power consumption of our reconfigurable sensor node. The power consumption estimates presume an active state for all the components. The worst case maximum power consumption is about 70 mW, which defines the momentary output requirements for the energy storage. In addition, the average power need, which is about 10 mW, defines the energy scavenging needs.

TABLE VIII
POWER CONSUMPTION COMPARISON

Component                    Maximum power consumption   Approximated duty cycle   Average power consumption
TTA processor                4.37 mW                     100 %                     4.37 mW
Transceiver (TI CC2420)      35-65 mW                    10 %                      3.5-6.5 mW
Sensor (Kionix KXTE9-2050)   0.1 mW                      100 %                     0.1 mW
Total                        40-70 mW                                              8-11 mW
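The totals in Table VIII follow from weighting each component's maximum power by its duty cycle. The sketch below reproduces that budget; the data structure and function name are hypothetical, and the values are taken directly from the table.

```python
# component: ((min, max) of maximum power in mW, duty cycle) from Table VIII
COMPONENTS = {
    "TTA processor": ((4.37, 4.37), 1.00),
    "Transceiver (TI CC2420)": ((35.0, 65.0), 0.10),
    "Sensor (Kionix KXTE9-2050)": ((0.1, 0.1), 1.00),
}


def power_budget(components):
    """Return ((peak_lo, peak_hi), (avg_lo, avg_hi)) in mW.

    Peak power sums the maximum consumption of all components (all active);
    average power weights each component by its duty cycle.
    """
    peak_lo = sum(rng[0] for rng, _ in components.values())
    peak_hi = sum(rng[1] for rng, _ in components.values())
    avg_lo = sum(rng[0] * duty for rng, duty in components.values())
    avg_hi = sum(rng[1] * duty for rng, duty in components.values())
    return (peak_lo, peak_hi), (avg_lo, avg_hi)


peak, average = power_budget(COMPONENTS)
```

The peak range sizes the momentary output the energy storage must deliver, while the average range sizes the energy scavenging unit; the table rounds both to whole milliwatts.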
It has to be noted that the instruction memory size problem needs to be solved before the mote is useful as such. At present, the instruction memory does not fit into the embedded ROM of the Actel Igloo FPGA. For now, this limitation of the FPGA can be worked around by using an off-chip memory, which would raise the worst case maximum total power consumption to almost 100 mW. However, the problem could be avoided by designing a Flash FPGA that has a sufficient amount of ROM to be used as instruction memory.

VII. SUMMARY

Using ASIC-based fixed mote designs that are optimized for a single algorithm or a few similar algorithms is reasonable as long as the longer development times can be accepted. However, when a design needs to be created rapidly and must support a variety of algorithms with different requirements, the ASIP solution becomes the natural choice. A rapidly reconfigurable WSN implemented with the TCE tools on a Flash FPGA can be deployed fast and with only a little effort, presuming floating point arithmetic is used. The development and testing time and effort can be substantially lower compared to a fixed point implementation. As this study shows, the benefits of floating point arithmetic in WSN implementation do not come at the high price typically expected, at least for low clock rate, energy efficient WSN designs.

As future work, the instruction memory issues need to be solved. Since our signal processing module used only one third of the total silicon area of the Actel Igloo FPGA, additional FUs and SFUs could be added to speed up the calculation, to widen the range of supported algorithms or even to implement the radio communications on the mote.

ACKNOWLEDGEMENTS

This study was carried out in the InterSync project. The project is a part of the Finnish Metals and Engineering Competence Cluster (FIMECC) research program EFFIMA. The project is also financially supported by the Finnish Funding Agency for Technology and Innovation (TEKES) and industrial companies. Their support is gratefully acknowledged.

REFERENCES

[1] M. Raju and M. Grazier, “Ultra Low Power Meets Energy Harvesting,” 2010. [Online]. Available: www.ti.com/lit/wp/slyy018a/slyy018a.pdf
[2] J. A. Paradiso and T. Starner, “Energy Scavenging for Mobile and Wireless Electronics,” IEEE Pervasive Computing, vol. 4, no. 1, pp. 18–27, January 2005.
[3] K. Najafi, “Micro Energy Harvesters - An Alternative Source of Renewable Energy,” 2010. [Online]. Available: http://www.azonano.com/article.aspx?ArticleID=2613
[4] J. Polastre, R. Szewczyk, C. Sharp, and D. Culler, “The mote revolution: Low power wireless sensor network devices,” in Hot Chips 16: A Symposium on High Performance Chips, 2004.
[5] Y. Chen, O. Gnawali, M. Kazandjieva, P. Levis and J. Regehr, “Surviving sensor network software faults,” in Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, ser. SOSP ’09, 2009, pp. 235–246.
[6] V. Wan and E. Young, “Power Management in an RF5 Audio Streaming Application Using DSP/BIOS,” Texas Instruments, Tech. Rep., 2005.
[7] A. Kansal, J. Hsu, S. Zahedi and M. B. Srivastava, “Power management in energy harvesting sensor networks,” ACM Trans. Embed. Comput. Syst., vol. 6, September 2007.
[8] X. Jiang, J. Polastre and D. Culler, “Perpetual environmentally powered sensor networks,” in Information Processing in Sensor Networks, 2005. IPSN 2005. Fourth International Symposium on, April 2005, pp. 463–468.
[9] F. Simjee and P. H. Chou, “Everlast: Long-life, Supercapacitor-operated Wireless Sensor Node,” in Low Power Electronics and Design, 2006. ISLPED’06. Proceedings of the 2006 International Symposium on, Oct. 2006, pp. 197–202.
[10] F. Li, L. Ye, G. Zhang and G. Meng, “Bearing Fault Detection Using Higher-Order Statistics Based ARMA Model,” Key Engineering Materials, vol. Damage Assessment of Structures VII, pp. 271–276, September 2007.
[11] I. S. Bozchalooi and M. Liang, “Parameter-free bearing fault detection based on maximum likelihood estimation and differentiation,” Measurement Science and Technology, vol. 20, no. 6, 2009.
[12] P. Jääskeläinen, V. Guzma, A. Cilio and J. Takala, “Codesign Toolset for Application-Specific Instruction-Set Processors,” in Proc. SPIE Multimedia on Mobile Devices, Jan. 29–30 2007, pp. 65070X-1–65070X-11.
[13] Low-Power, High-Performance TTA Processor for 1024-Point Fast Fourier Transform, 2006.
[14] Microsemi SoC Products Group, “Flash FPGAs in the value-based market,” Tech. Rep., 2005. [Online]. Available: http://www.actel.com/documents/ValueFPGA WP.pdf
[15] Microsemi SoC Products Group, “Igloo low-power Flash FPGAs with Flash*Freeze technology,” 2011. [Online]. Available: http://www.actel.com/documents/IGLOO DS.pdf
[16] J. Janhunen, “Programmable MIMO Detectors,” Ph.D. dissertation, University of Oulu, 2011.
[17] C. Lomont, “Fast inverse square root,” Tech. Rep., 2003.
[18] J. Mais, “Spectrum analysis: The key features of analyzing spectra,” 2002. [Online]. Available: http://www.scribd.com/doc/30396415/Spectrum-Analysis
[19] B. Li, M.-Y. Chow, Y. Tipsuwan and J. C. Hung, “Neural-network-based motor rolling bearing fault diagnosis,” Industrial Electronics, IEEE Transactions on, vol. 47, no. 5, pp. 1060–1069, Oct. 2000.
[20] “Project: Tmote Sky.” [Online]. Available: http://www.snm.ethz.ch/Projects/TmoteSky
[21] S. Sudevalayam and P. Kulkarni, “Energy Harvesting Sensor Nodes: Survey and Implications,” Communications Surveys Tutorials, IEEE, vol. 13, no. 3, pp. 443–461, 2011.