3rd Mediterranean Conference on Embedded Computing
MECO - 2014
Budva, Montenegro
FPGA Low-Power Implementation of QRS Detectors
Jovan Kovačević1, Radovan Stojanović1, Dejan Karadaglić2, Bogdan Ašanin1, Živorad Kovačević1, Zlatko Bundalo3 and Ferid Softić3
1 University of Montenegro, Podgorica, Montenegro
2 Glasgow Caledonian University, UK
3 University of Banja Luka, Banja Luka, Bosnia and Herzegovina
The relation between the level of project/algorithm complexity and the necessity for power optimisation is analysed, and some recommendations are proposed.
Abstract: This paper presents a low-power implementation of algorithms for QRS complex detection in FPGA technology. The Balda and Pan-Tompkins algorithms are used as case studies. The optimization methodology is based on the use of heterogeneous logic blocks, pipelining, variable code word lengths, on-chip reorganization of logic blocks and control of the clocks. By applying the proposed techniques, power consumption is reduced by 71% and chip occupancy by approximately 91%. The proposed optimization methodology and techniques are also applicable to other applications. The cases in which the optimization can be justified in terms of project complexity are analysed and discussed.
II. FPGA STRUCTURE AND LOW-POWER DESIGN
A. FPGA structure and sources of dissipation
FPGAs belong to a group of reconfigurable digital integrated circuits. Their structure usually contains four units: configurable logic blocks (CLBs), input/output (I/O) blocks, configurable routes for connections between blocks, and blocks with predefined functions, such as memory and arithmetic circuits (adders, multipliers, etc.), Figure 1.
Keywords: low-power design; FPGA; QRS detection.
CLBs, I/Os and memory units affect static power dissipation. The parasitic capacitances of the FPGA routes cause dynamic power dissipation, which occurs during signal transitions due to the charging and discharging cycle. This dissipation is closely related to the transition density, which is equal to the ratio of the number of signal transitions to the total number of clock signal intervals. The transition density can take values from 0 to 1; at 1, the signal has the same number of transitions as the clock signal.
I. INTRODUCTION
Electronic medical equipment and devices are used for the diagnosis, prevention and treatment of various diseases and conditions. Portability is often one of their most appreciated properties [1]. Usually they are battery powered, and sometimes they feature extremely low power consumption. The choices for the designers of such devices are frequently restricted to a few families of low-power microcontrollers. However, FPGA (Field Programmable Gate Array) chips have recently become increasingly attractive because of their high degree of parallelism, low cost, in-system reconfigurability and the possibility of rapid prototyping. FPGA chips are by their nature high energy consumers, which significantly reduces their applicability in energy-critical applications such as biomedical ones. Nevertheless, there is an increasing demand to apply FPGA chips in various biomedical applications such as multichannel physiological signal processing (ECG, EEG, etc.), medical imaging, minimally invasive surgery platforms, radio-frequency identification, etc.
Figure 1. FPGA structure, showing logic blocks, routes and blocks of programmable switches.
Therefore, an additional effort is required to optimise the power consumption of FPGA circuits for biomedical signal processing, primarily to identify the sources of power consumption [2] and then to employ adequate power optimisation techniques.
This paper addresses the question of how to optimise this power consumption. Several optimisation strategies are demonstrated on the example of physiological signal processing, namely ECG QRS complex detection. The Balda and Pan-Tompkins algorithms are considered here.
The total dynamic power can be represented as [3]:

$$P_{dyn} = \sum_{y} \frac{1}{2} \cdot C_y \cdot V_{supply} \cdot V_{swing} \cdot D(y) \cdot f_{clk}$$

where $C_y$ is the capacitance of node „y“, $V_{supply}$ the main power supply voltage, $V_{swing}$ the maximum voltage swing of the signal, $D(y)$ the transition density of node „y“, and $f_{clk}$ the clock frequency. The expression is calculated as the sum over the nodes (hubs) of the integrated circuit, and the product $\frac{1}{2} C_y V_{supply} V_{swing} D(y)$ determines the dynamic power dissipation per clock pulse. The transition density of node „y“ is calculated based on the static probability, using the expression [4]:
8) The reconfiguration of the system while changing the working conditions.
In this section, two case studies of FPGA low power design are presented related to the implementations of QRS detection using Balda and Pan-Tompkins algorithms.
$$D(y) = 2 \cdot P(y) \cdot \left(1 - P(y)\right)$$
III. LOW-POWER OPTIMISATION IN QRS DETECTION
where $P(y)$ is the probability that the signal at node „y“ is in the logical-one state. The expression is evaluated separately for each route and logic block, using the estimation model [3].
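The dynamic power model of Section II.A can be sketched numerically. The snippet below is only an illustration: it assumes the per-node form 0.5 · C · V_supply · V_swing · D · f_clk from the flexible power model [3], and the node capacitances, static probabilities and voltages are hypothetical values chosen for the example, not measurements from this work.

```python
def transition_density(p_one):
    """Transition density D = 2*P*(1-P) for a node with static probability P."""
    if not 0.0 <= p_one <= 1.0:
        raise ValueError("static probability must lie in [0, 1]")
    return 2.0 * p_one * (1.0 - p_one)


def dynamic_power(nodes, v_supply, v_swing, f_clk):
    """Sum 0.5*C*Vsupply*Vswing*D*f over nodes given as (capacitance_F, p_one)."""
    return sum(
        0.5 * c * v_supply * v_swing * transition_density(p) * f_clk
        for c, p in nodes
    )


# Hypothetical nodes: (capacitance in farads, probability of logical one).
nodes = [(10e-15, 0.5), (20e-15, 0.1), (5e-15, 0.9)]

# Assumed 1.2 V supply and full-rail swing, evaluated at two clock frequencies.
p_50mhz = dynamic_power(nodes, v_supply=1.2, v_swing=1.2, f_clk=50e6)
p_25mhz = dynamic_power(nodes, v_supply=1.2, v_swing=1.2, f_clk=25e6)
print(f"{p_50mhz * 1e6:.3f} uW at 50 MHz, {p_25mhz * 1e6:.3f} uW at 25 MHz")
```

Under this expression the density estimate peaks at 0.5 for P = 0.5, and the modelled dynamic power scales linearly with the clock frequency, which is why halving the clock halves the dynamic term while leaving the static term untouched.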
A. Balda algorithm
The algorithm consists of several steps and starts by calculating the first derivative $y_0(n)$, based on the input signal $x(n)$:
Static power is dissipated when the FPGA is working in the stationary regime, i.e. when its signals are not switching. Leakage currents are the main cause of this dissipation. There are two types of leakage current: inverse (reverse-bias) current and subthreshold current, which appear at voltage values in the region of the threshold voltage. The impact of the inverse current is negligible, so the static power dissipation, at transistor level, is calculated as:
$$P_{static} = V_{supply} \cdot I_{drain}$$
$$y_0(n) = |x(n) - x(n-2)|.$$
The second derivative is calculated as:
$$y_1(n) = |x(n) - 2x(n-2) + x(n-4)|.$$
The resulting signal is calculated by combining the first and second derivatives:
where $V_{supply}$ is the main supply voltage and $I_{drain}$ the drain current. For the calculation of total static power dissipation, it is necessary to use one of the previously developed estimation models given in [3].
$$y_2(n) = 1.3\,y_0(n) + 1.1\,y_1(n).$$
The resulting signal is compared with the normalized threshold. If the threshold is exceeded, at least 6 of the next 8 values must also cross the threshold for that segment to be declared part of a QRS complex. The result of the algorithm is a rectangular pulse whose width is proportional to the width of the QRS complex. It should be noted that the method is sensitive to noise.
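The steps of the Balda algorithm can be sketched in software as follows. This is a minimal sketch, not the VHDL design evaluated in this paper: it assumes the difference operators |x(n) − x(n−2)| and |x(n) − 2x(n−2) + x(n−4)|, normalizes the combined signal by its own maximum, and uses a hypothetical threshold of 0.3. The function name and the rule for extending the output pulse are likewise illustrative assumptions.

```python
def balda_detect(x, threshold=0.3, window=8, required=6):
    """Mark samples judged to be part of a QRS complex with 1, others with 0."""
    n = len(x)
    y2 = [0.0] * n
    for i in range(4, n):
        y0 = abs(x[i] - x[i - 2])                 # first derivative
        y1 = abs(x[i] - 2 * x[i - 2] + x[i - 4])  # second derivative
        y2[i] = 1.3 * y0 + 1.1 * y1               # weighted combination
    peak = max(y2, default=0.0)
    out = [0] * n
    if peak == 0.0:
        return out  # flat input: nothing to detect
    # Assumption: "normalized threshold" means relative to the signal maximum.
    above = [v / peak > threshold for v in y2]
    for i in range(n):
        # A crossing counts only if enough of the next `window` samples also cross.
        if above[i] and sum(above[i + 1:i + 1 + window]) >= required:
            for k in range(i, min(n, i + 1 + window)):
                out[k] = 1
    return out


# Synthetic test input: a single triangular pulse on a flat baseline.
x = [0] * 20 + list(range(1, 11)) + list(range(9, -1, -1)) + [0] * 20
pulse = balda_detect(x)
print("".join(str(v) for v in pulse))
```

The output is a rectangular pulse that starts at the first qualifying threshold crossing and lasts while the 6-of-8 condition keeps holding, so its width follows the width of the detected complex.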
B. Techniques of Low-Power design
The literature lists several techniques for the design of low-power FPGA systems [5]:
1) The use of heterogeneous integrated blocks instead of configurable logic blocks (CLBs), where possible.
2) Pipelining: one instruction is divided into several smaller operations that are carried out simultaneously. In this way, the energy consumption per operation is reduced by between 40% and 90%, for example in the multiplication of integers.
B. Pan-Tompkins algorithm
The Pan-Tompkins algorithm is used for detection of the cardiac impulse (QRS complex) in the ECG signal. The algorithm was proposed by Jiapu Pan and Willis Tompkins in 1985 [6]. It reliably detects the QRS complex, based on digital analysis of the amplitude, width and slope of the signal. In the first stage of the algorithm, the signal is passed through lowpass, highpass and derivative filters, in order to reduce the impact of electromyographic noise, electrical interference and the "baseline drift" artifact. The lowpass filter of the Mth order is described by the transfer function:
3) Word length optimization: the use of fixed-point numbers. Studies have shown that this method can reduce energy consumption by as much as 87% in the designed adaptive filters.
4) Clock gating: a technique that reduces dynamic consumption by disabling the propagation of clock signals towards inactive regions.
5) Dynamic scaling of the supply voltage: the supply voltage of the FPGA chip is adapted as the temperature changes. The reduction in consumption achieved with this method can vary from 4% to 54%.
$$H_{LP}(z) = \sum_{k=0}^{M} a_k z^{-k}$$
where $a_k$ stands for the filter coefficient. The highpass filter of the Mth order is described by the transfer function:
6) Floorplanning (planning of the chip): elements on the chip that communicate with each other are grouped together so as to reduce the dynamic dissipation.
$$H_{HP}(z) = \sum_{k=0}^{M} b_k z^{-k}$$
where $b_k$ is the filter coefficient. After that, the signal is differentiated using a filter of the Mth order described by the transfer function:
7) The combination of the concept of Runtime Reconfiguration (RTR) and optimization of the word length.
$$H_{D}(z) = \sum_{k=0}^{M} c_k z^{-k}$$
where $c_k$ is the filter coefficient.
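Because each of the three filters is a sum of coefficient-weighted samples, it is a natural place to see what the word length optimization listed in Section II.B trades away. The sketch below is an illustration under stated assumptions: the five coefficients are hypothetical and not the ones used in this design, and `quantize` and `fir` are helper names introduced only for the example. It rounds the coefficients to a fixed-point grid and measures the worst-case deviation of the impulse response.

```python
def quantize(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fir(coeffs, samples):
    """Direct-form FIR: y(n) = sum_k coeffs[k] * x(n-k), zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


# Hypothetical lowpass coefficients, for illustration only.
coeffs = [0.1, 0.2, 0.4, 0.2, 0.1]
impulse = [1.0] + [0.0] * 7
reference = fir(coeffs, impulse)

errors = {}
for frac_bits in (4, 8, 12):
    quantized = [quantize(c, frac_bits) for c in coeffs]
    response = fir(quantized, impulse)
    # With an impulse input the output error equals the coefficient error.
    errors[frac_bits] = max(abs(a - b) for a, b in zip(reference, response))
    print(f"{frac_bits:2d} fractional bits: max error {errors[frac_bits]:.6f}")
```

Each additional fractional bit halves the rounding bound on a coefficient, so a designer can pick the shortest word that keeps the detection result unchanged rather than defaulting to 32 bits.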
C. Optimization scenarios
The following optimisation techniques from Section II.B are applied in the optimisation of the Balda and Pan-Tompkins algorithms: the use of heterogeneous logic blocks, pipelining, word length optimization, clock gating and floorplanning. These techniques are clustered in four optimisation scenarios:
• Scenario 1, non-optimized,
• Scenario 2, optimized by pipelining and the use of heterogeneous logic blocks,
• Scenario 3, optimized by pipelining, the use of heterogeneous logic blocks and word length optimization,
• Scenario 4, optimized by pipelining, the use of heterogeneous logic blocks, word length optimization, clock gating and floorplanning,
which are tested on the power consumption as a function of clock frequency.

IV. TESTING, RESULTS AND DISCUSSION
The testing procedure is implemented using the following approach:
1) The algorithms are implemented in MATLAB.
2) The algorithms are implemented in power non-optimized code.
3) The algorithms are implemented in power-optimized code, so that each of the constituent components is first coded in VHDL using the accepted optimisation techniques.
4) The optimisation techniques are clustered in Scenarios 1 to 4, which are tested for power dissipation, silicon consumption and the accuracy of detection in terms of time diagrams.
„PowerPlay Power Analysis“ is the tool used for power dissipation testing. It calculates the total thermal power dissipation for a given scenario at a range of frequencies (from 50 MHz down to 100 Hz). The following tools are used for simulation and testing of silicon consumption and detection accuracy: Quartus II, ModelSim and Matlab. The results are compared and the conclusions are derived. The FPGA used as the test case is the Altera integrated circuit EP4CE115F29C7 (circuit capacity is 114480 logic elements and 529 pins).

A. Testing of Balda algorithm
For the non-optimized Scenario 1 the word length was 32 bits. The total thermal dissipation at a frequency of 50 MHz was 147.74 mW, Figure 2. This scenario required 1118 logic elements and 36 pins. For Scenario 2 and Scenario 3 the dissipation reduction is considerable, while for Scenario 4 the total thermal dissipation at a frequency of 50 MHz is 134.75 mW (dissipation is reduced by 8.8%), Figure 2. This scenario requires 419 logic elements (a reduction of 62.5%) and 12 pins (a decrease of 66.67%).

Figure 2. Total thermal power dissipation for the four scenarios described in this chapter. Scenarios are shown on the x-axis; power in mW (130 to 148) is on the y-axis, with one series per clock frequency (50 MHz, 25 MHz, 1 MHz, 1 kHz and 100 Hz).

Figure 3 illustrates the time diagram of detected QRS complexes for Scenario 1 and Scenario 4. Since the raw test ECG signal was long enough to require a huge number of simulation operations in Quartus, ModelSim was used.
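The reduction percentages quoted for the Balda implementation follow directly from the Scenario 1 and Scenario 4 measurements reported in this section. A short check, using only those reported values (`reduction_percent` is a helper name introduced for the example):

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Scenario 1 versus Scenario 4 for the Balda algorithm at 50 MHz.
power = reduction_percent(147.74, 134.75)  # total thermal dissipation, mW
logic = reduction_percent(1118, 419)       # logic elements
pins = reduction_percent(36, 12)           # pins

print(f"power {power:.1f}%, logic {logic:.1f}%, pins {pins:.2f}%")
# prints "power 8.8%, logic 62.5%, pins 66.67%"
```

The resource savings are far larger than the power saving, which is consistent with a design small enough that most of the measured dissipation does not depend on the logic it occupies.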
In the second stage of the algorithm, the signal is squared, so that all samples take non-negative values. Then, a moving integration window is applied to the signal. The last stage of the algorithm compares the signal with a threshold value.
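The processing chain just described (three filters, squaring, window integration and threshold comparison) can be sketched as below. This is a simplified illustration, not the implementation tested in this paper: all filter coefficients, the window width and the fixed normalized threshold are hypothetical choices, and the original algorithm [6] uses adaptive thresholds that are omitted here.

```python
def fir(coeffs, x):
    """Causal FIR filter: y(n) = sum_k coeffs[k] * x(n-k), zero initial state."""
    return [
        sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(x))
    ]


def moving_window_integral(x, width):
    """Running mean over the last `width` samples (fewer at the start)."""
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]
        out.append(acc / width)
    return out


def pan_tompkins_sketch(ecg, lp, hp, deriv, width, threshold):
    """Filter, square, integrate, then compare with a normalized threshold."""
    filtered = fir(deriv, fir(hp, fir(lp, ecg)))
    squared = [v * v for v in filtered]
    integrated = moving_window_integral(squared, width)
    peak = max(integrated, default=0.0)
    if peak == 0.0:
        return [0] * len(ecg)
    return [1 if v / peak > threshold else 0 for v in integrated]


# Hypothetical coefficients, chosen only to make the example runnable.
lp = [0.2] * 5                               # 5-tap moving average
hp = [1 - 1 / 8] + [-1 / 8] * 7              # input minus 8-sample average
deriv = [0.25, 0.125, 0.0, -0.125, -0.25]    # five-point difference

ecg = [0.0] * 40 + [0.0, 2.0, 5.0, 9.0, 5.0, 2.0, 0.0] + [0.0] * 40
detected = pan_tompkins_sketch(ecg, lp, hp, deriv, width=8, threshold=0.5)
print("".join(str(v) for v in detected))
```

Every stage here is a fixed sequence of multiply-accumulate operations, which is what makes the algorithm a good fit for pipelining and for the embedded multiplier blocks of an FPGA.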
Figure 3. The results of QRS detection using Balda algorithm in FPGA technology, for scenarios 1 and 4. Input ECG (blue) and result (red). Both panels plot relative signal amplitude (-1 to 1) against time t [s] (0 to 2.5).
The results were as expected, because the implementation of the Balda algorithm is a design that takes up no more than 1% of the FPGA chip's resource capacity. The chip has a certain baseline energy consumption that does not depend on the project or on the design of the system. For this reason, scale is required for low-power design methods to take effect; in other words, a project that commits a much larger share of the FPGA chip's resources.

B. Testing of Pan-Tompkins algorithm
The study included four scenarios that are tested at five frequencies of the main clock (50 MHz, 25 MHz, 1 MHz, 1 kHz and 100 Hz). Scenario 1, the least favourable in terms of power dissipation, is characterized by a word length of 32 bits. The total thermal dissipation in this scenario, at a frequency of 50 MHz, is 566.88 mW. This scenario required 47015 logic elements and 100 pins.

Figure 4. The total thermal power dissipation for the four examples described in this chapter. Scenarios are shown on the x-axis; power in mW (100 to 600) is on the y-axis, with one series per clock frequency (50 MHz, 25 MHz, 1 MHz, 1 kHz and 100 Hz).

In Scenarios 2, 3 and 4, the dissipation reduction was considerable. The total thermal dissipation in the scenario with the lowest total thermal dissipation (Scenario 4), at a frequency of 50 MHz, was 166.71 mW (dissipation was reduced by 70.59%), Figure 4. This scenario required 4057 logic elements (a reduction of 91.37%) and 34 pins (a decrease of 66%). Figure 5 illustrates the time diagram of detected QRS complexes for Scenario 1 and Scenario 4 using the low-power Pan-Tompkins algorithm.

Figure 5. The results of QRS detection using the Pan-Tompkins algorithm in FPGA, for scenarios 1 and 4. Input ECG (blue) and output (red). Both panels plot relative signal amplitude (-1 to 1) against time t [s] (0 to 2.5).

C. Comparison of the optimisation results
The results of applying low-power design to the Balda and Pan-Tompkins algorithms can be compared on the basis of the degree of reduction in total power dissipation. In Figure 6 the degree of reduction in the total power dissipation is compared for the two algorithms. For the Balda algorithm, this parameter reaches a value of 8.8% at a frequency of 50 MHz, against the already mentioned 70.59% for the Pan-Tompkins algorithm at the same frequency; i.e. the degree of reduction in total power dissipation is about eight times higher for the Pan-Tompkins algorithm. The degree of reduction in the overall dissipation will be even higher if the main (non-optimized) design is more demanding in terms of the logic and I/O resources of the integrated circuit.

Figure 6. The comparison of the reduction degree in the total dissipation for Balda and Pan-Tompkins algorithm performances. Reduction degree in total dissipation [%] (0 to 80) is plotted against frequency [MHz] (0 to 50), with one series per algorithm.

V. CONCLUSION
The paper presents strategies for power consumption reduction in FPGAs, with a focus on the hardware implementation of algorithms for QRS detection. Several optimisation scenarios are compared, and conclusions and recommendations are drawn. The optimisation approach can be applied to other research and technological areas.

ACKNOWLEDGMENT
The research presented was funded as part of the National project MESI Montenegro and the Bilateral project between Montenegro and B&H.

REFERENCES
[1] Salditt P. and Bothell W. A, "Trends in medical device design and manufacturing." SMTA News And Journal Of Surface Mount Technology 17 (2004): 19-24.
[2] Kovačević J, Stojanović R, Bundalo Z, "An example of low-power design in FPGA technology for the purpose of medical measurements", Infoteh Symposium, Sarajevo, 2013.
[3] Poon, Kara KW, Andy Yan, and Steven JE Wilton. "A flexible power model for FPGAs." Field-Programmable Logic and Applications: Reconfigurable Computing Is Going Mainstream. Springer Berlin Heidelberg, 2002. 312-321.
[4] Najm, Farid N. "Power estimation techniques for integrated circuits." In Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design, pp. 492-499. IEEE Computer Society, 1995.
[5] Lamoureux J. and Luk W., "An overview of low-power techniques for field-programmable gate arrays." Adaptive Hardware and Systems, 2008. AHS'08. NASA/ESA Conference on. IEEE, 2008.
[6] Pan J. and Tompkins W. J., "A real-time QRS detection algorithm", IEEE Trans. on Biomedical Engineering, vol 32, (1985): 230-236.