An Evaluation of Microphone Array Sound Source Localization for Deforestation Detection with Wireless Sensor Networks

Lucian Petrica (1, 2)
1 University Politehnica of Bucharest, Romania
2 IMT Bucharest, Romania
September 20, 2015
Abstract

Illegal deforestation is a worldwide problem which may be alleviated by employing technological means of deforestation monitoring (DM), such as wireless sensor networks (WSNs) capable of identifying chain-saw noise. However, the practical utility of currently proposed WSNs for DM is restricted by their inability to precisely determine the location of the chain-saw activity. We propose to utilize microphone-array-equipped sensor nodes (SNs) and the Delay-and-Sum (DS) beam-forming algorithm to identify the direction of incoming chain-saw noise. Our work is the first application of this technique to chain-saw noise. We evaluate array configurations of 4, 8, and 16 microphones, and a multitude of DS algorithm configurations, on an online dataset of chain-saw noise. We also implement the DS algorithm as a digital circuit in a Xilinx Spartan-6 FPGA and analyze its energy consumption. Our analysis indicates that accurate chain-saw localization can be achieved with much simpler microphone arrays and DS configurations than in previous work. Furthermore, the localization capability increases the SN energy consumption by less than 10% compared to chain-saw identification alone.

1 Introduction

Illegal deforestation is a worldwide problem, with recent studies [1] indicating that between 20% and 40% of all deforestation is illegal in nature. As a direct result of illegal logging, national governments are losing significant timber sales revenue. Furthermore, deforestation is estimated to be the second largest anthropogenic source of carbon dioxide emissions, accounting for an estimated 12% of worldwide CO2 emissions [2]. Affected countries are looking to implement technological measures to monitor the forest and discourage deforestation. Wireless sensor networks (WSNs) have been proposed as an efficient deforestation monitoring (DM) solution. A WSN for deforestation monitoring (DM-WSN) consists of a number of microphone-equipped sensing nodes (SNs), capable of analyzing audio data individually or cooperatively in order to determine whether deforestation activities, e.g., chain-saw cutting, are taking place in the monitored area [3, 4]. For chain-saw identification in particular, several algorithms and their WSN implementations have been proposed and evaluated in previous work [5, 6]. Upon identification of chain-saw noise in the monitored area, the authorities may be called in to apprehend the illegal loggers.

This type of monitoring is useful in situations where logging is completely prohibited, e.g., in nature reserves. A more complex but also more realistic use-case for a DM-WSN is when cutting is permitted in certain areas of the forest but prohibited in others, such as steep slopes or riverbanks. Another frequently encountered form of illegal logging is clear-cutting, whereby all trees in an area are cut down, which prevents forest re-growth. Licensed loggers sometimes ignore such restrictions, looking to maximize the profit from an allocated license. Both of these scenarios are very difficult to detect with the DM-WSNs proposed in previous work, because they require localization, i.e., finding the approximate position of the deforestation activity, in addition to detection. The simplest form of localization is identifying the direction of the sound source. To date, no DM-WSN has been proposed with sound source localization (SSL) capability.

Figure 1 illustrates four SNs of a DM-WSN and a triangular area of restricted logging, e.g., a steep slope. If, as in Figure 1a, logging takes place in the restricted area, the sound propagates outward and reaches the SNs. Without localization capability, as in Figure 1b, the WSN cannot determine whether the logging activity is legal. At most, it may be said that there is a higher probability of the sound originating somewhere within the polygon enclosed by the SNs. Complicating matters further is the fact that the range of each SN, i.e., the maximum distance at which it is able to detect logging noise, depends on the acoustic propagation environment and varies with temperature, air pressure, wind and other factors [7, 8]. SSL-enabled SNs, capable of identifying the direction of the incoming sound, provide a clear advantage, as illustrated in Figure 1. Each SN determines the direction of the incoming sound, and by combining direction estimates from two or more SNs, the position of the sound source may be identified and an alarm may be raised if the source is within the restricted area. SNs may also identify clear-cutting indirectly, by indicating unusually concentrated chain-saw activity in a small area.

Figure 1: Example DM-WSN coverage of forest exploitation.

This paper evaluates an SSL-enabled SN architecture based on a circular array of microphones, in the context of DM-WSNs. We build upon a previously published, general-purpose microphone array SSL algorithm. We evaluate its effectiveness for chain-saw localization on a publicly available dataset of chain-saw noise, and perform extensive design space exploration in order to identify the best algorithm parameter configurations with regard to SSL accuracy and hardware implementation efficiency. Our work demonstrates that microphone array chain-saw localization is possible and effective even in hardware configurations much reduced compared to previous work. We also evaluate SSL from the perspective of energy consumption, the most important WSN metric, and demonstrate that microphone array SSL consumes at most approximately 2 mJ per localization and increases SN energy consumption by less than 10% compared to chain-saw identification alone. Together, these contributions demonstrate that microphone array SSL for DM-WSNs is not only possible, but also practical.

2 Previous Work

Sound source localization has been achieved for single-microphone SNs in previous work through the use of distributed time difference of arrival (TDOA) [6, 9]. In TDOA, the WSN functions as a microphone array. To perform the localization, nodes synchronize their internal clocks and sample a reference sound at a pre-determined moment in time. Because synchronization has already established that the sampling is simultaneous, any difference in audio signal phase at the sensing nodes is caused by differences in the position of the sound source relative to the respective SNs. If the geometry of the WSN is known a priori, the individual measurements are aggregated to determine the position of the sound source relative to the WSN, through a beam-forming algorithm such as SRP-PHAT [10]. As the beam-forming algorithm requires all the acoustic data captured by each SN, the data must be transported from each SN to a central processing location. In [11] it is demonstrated that communication energy costs for WSN nodes are orders of magnitude larger than computation and sensing energy costs; therefore, this centralized beam-forming exerts a large energy cost on the WSN.

A different, array-based approach to sound source localization (A-TDOA) has been explored in [12], utilizing 52-microphone multiple-ring circular arrays on each SN and a Field Programmable Gate Array (FPGA) on the SN to execute a Delay-and-Sum (DS) [13, 14] beam-forming algorithm for sound direction estimation. The multi-microphone data gathering and processing in A-TDOA is beyond the capabilities of a microprocessor, so an FPGA must be utilized for signal acquisition and processing. Therefore, at the cost of increased SN complexity and energy consumption, localization may be performed without significant inter-SN communication. DS-based SSL has been evaluated in [12] on monochromatic sounds and found to provide good localization accuracy for high-pitched sounds.

While the previous work on microphone array SSL is conclusive with regard to the general characteristics of microphone array SSL for monochromatic sounds, it remains an open question to what extent the DS algorithm is effective for chain-saw and tree-cutting sounds, which are inherently noisy and complex. Furthermore, a 52-microphone array is excessive for a practical WSN; it is therefore desirable to identify a lower bound on the number of microphones that enables effective sound source localization of chain-saw noise. Fewer microphones are expected to reduce not only the cost of an SN, but also its energy consumption during operation, thus extending WSN lifetime. Finally, an exploration of the FPGA design parameters is desirable in order to determine how factors such as the FFT size and the number of audio samples analyzed affect the size of the FPGA circuit and its power dissipation. While the authors of [12] report an FPGA implementation of their algorithm, no specifics are described in their work, and no power analysis is performed.

Figure 2: A circular array of 8 microphones, its principal directions, and their steered response power.

Figure 3: The sensor node with delay-and-sum algorithm FPGA implementation.
3 SSL Implementation

In this work we propose utilizing single-ring 4-, 8- and 16-microphone arrays with a symmetric circular geometry. The number of microphones is denoted NM. The radius of the array is 10 cm as in [12], a reasonable size for an SN. Figure 2 illustrates an 8-microphone array. The principal directions (PDs) of the array are radial directions which connect the center of the array with each of the array microphones. Because the array is symmetric, any PD may be chosen as the root PD0, and all other PDs are named according to the counter-clockwise angle between themselves and PD0. A 16-microphone array has 8 supplementary microphones and PDs relative to the 8-microphone array. The principal directions are separated by the step angle θS, the value of which is expressed in Equation 1, while the angle between PD0 and the direction of the incoming sound is denoted θI. The sound captured by the array microphones is processed by the Delay-and-Sum algorithm, which generates for each PD a steered signal, i.e., a signal which amplifies sounds along the respective PD and attenuates sounds from all other directions, resulting in a per-PD power value. The maximum power value corresponds to the PD closest to the incident sound direction.

The DS algorithm operates as follows. In the first stage of DS processing, a number of samples are captured from each microphone. The sound is assumed to be a plane wave, corresponding to a sound source at a distance much larger than the diameter of the microphone array. As the sound wave propagates, it reaches the array microphones in sequence, and the signal from each microphone represents the same audio information, but with a different phase. Subsequently, a steered signal is generated for PD0 by delaying the input signals and adding together the results. The delays at this stage are designed to compensate for the phase differences of the microphone signals if the sound is incident along PD0, i.e., coming from top to bottom in Figure 2. If this is the case, the steered signal is an amplified version of the input signals. If not, the input signals tend to cancel each other out, and the steered signal is mostly noise. The process is repeated to generate a steered signal for each PD. Power values are computed for each steered signal as the sum of squares of the FFT bin values.
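To make the processing chain above concrete, the following is a minimal NumPy sketch of far-field delay-and-sum steering over the principal directions of a circular array. It is not the Octave or FPGA code used in this work. It assumes a speed of sound of 343 m/s, rounds all delays to whole samples at 44.1 kHz, uses white noise as a stand-in for chain-saw noise, and borrows the Hanning window, zero-padding and 600 Hz to 5 kHz band described for the evaluation in Section 4. All function names are hypothetical.

```python
import numpy as np

FS = 44_100      # Hz, sampling rate of the ESC recordings
C = 343.0        # m/s, assumed speed of sound (not specified in the text)
RADIUS = 0.10    # m, array radius used in this work
# Largest possible inter-microphone offset in samples (N_O in the text).
N_O = int(np.ceil(2 * RADIUS / C * FS))


def mic_angles(n_mics):
    """Angle (rad) of each microphone, i.e., of each principal direction PD_i."""
    return 2 * np.pi * np.arange(n_mics) / n_mics


def arrival_lags(n_mics, theta):
    """Integer arrival lag (samples) at each microphone for a far-field plane
    wave arriving from angle theta, relative to the earliest possible arrival."""
    phi = mic_angles(n_mics)
    return np.round(RADIUS * (1 - np.cos(phi - theta)) / C * FS).astype(int)


def simulate_array(source, n_mics, theta):
    """Emulate the microphone signals by delaying a mono source (plane wave)."""
    lags = arrival_lags(n_mics, theta)
    mics = np.zeros((n_mics, len(source)))
    for m, k in enumerate(lags):
        mics[m, k:] = source[:len(source) - k]
    return mics


def steer(mics, theta, n_s):
    """Delay-and-sum: re-align all microphones for direction theta and add them."""
    lags = arrival_lags(mics.shape[0], theta)
    return sum(mics[m, k:k + n_s] for m, k in enumerate(lags))


def band_power(signal, n_fft, f_lo=600.0, f_hi=5000.0):
    """Sum of squared FFT magnitudes between f_lo and f_hi. The signal is
    truncated to n_fft if longer, windowed, and zero-padded by rfft if shorter."""
    seg = signal[:n_fft]
    seg = seg * np.hanning(len(seg))
    spectrum = np.fft.rfft(seg, n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.sum(np.abs(spectrum[band]) ** 2))


def steered_powers(mics, n_s, n_fft):
    """Power of the steered signal for every principal direction."""
    return np.array([band_power(steer(mics, th, n_s), n_fft)
                     for th in mic_angles(mics.shape[0])])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_s, n_fft = 512, 1024
    source = rng.standard_normal(n_s + N_O)  # broadband stand-in for chain-saw noise
    mics = simulate_array(source, 8, mic_angles(8)[3])  # sound incident along PD3
    powers = steered_powers(mics, n_s, n_fft)
    print("estimated PD:", int(np.argmax(powers)))
```

Because the simulated arrival delays and the steering delays come from the same rounding, the steered signal for the true direction is exactly NM times the source, which is the coherent gain the DS description relies on. A physical array would see fractional delays, which is one reason to treat this only as an illustration.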
$$\theta_S = \frac{360^\circ}{N_M} \qquad (1)$$

Figure 3 illustrates an SN with localization capability, whereby the microphone array in Figure 2 transmits audio samples to an FPGA, which computes the steered signal power values and communicates them to a microcontroller, which also controls communication with other SNs via WiFi. The delay-and-sum circuit is also illustrated. We assume the microphone outputs are encoded in Pulse Density Modulation (PDM) [15], as is the case with most micro-electromechanical systems (MEMS) microphones. Compatibility with MEMS microphones is important to our work because their compact size and energy efficiency make them especially suitable for WSN use. MEMS PDM microphones capture PCM samples at a low sampling frequency, which are subsequently PDM-encoded as 1-bit symbols at a high symbol frequency. The PDM oversampling rate ROS is defined as the ratio between the PDM symbol frequency and the PCM sampling frequency. For the rest of this paper, "samples" refers to PCM samples, while "symbols" refers to PDM symbols. For each sample captured by the microphone, a total of ROS symbols are transmitted to the FPGA.

In the FPGA, the symbols corresponding to NS + NO samples are loaded into memory buffers, where NO is the maximum possible sample offset between two microphones in the array, given the diameter of the array and the speed of sound. After the buffers have been filled, the symbols corresponding to NS samples are read out simultaneously from all memories, with each buffer reading from a different offset in order to re-align the microphone signals. The buffer outputs are added together, resulting in a signal of NS · ROS symbols of log2 NM + 1 bits each (the sum of NM one-bit symbols takes NM + 1 distinct values).

Our FPGA implementation differs from previous work in [12] in two ways. Firstly, [12] captures symbols for only NS samples and utilizes circular buffers. This introduces an artificial discontinuity in the signal at the output of the buffer when it is read. We instead capture sufficient samples to implement true DS delays without introducing discontinuities. Secondly, we have added an accumulator after the DS adder, which sums each group of ROS consecutive adder outputs and operates as a simple low-pass filter and PDM demodulator. The accumulator reduces the number of steered signal samples to NS and allows us to use smaller FFT sizes. The tuple (NS, NFFT), where NFFT is the number of FFT bins, defines a DS configuration. The tuple (NM, NS, NFFT) defines an SN configuration. The microphone array configuration is defined simply by NM.

4 SSL Accuracy for Chain-Saw Noise

We evaluate the chain-saw localization accuracy through simulation, utilizing a GNU Octave 3.8 [16] implementation of the DS algorithm and the Harvard ESC environmental sound dataset [17]. The ESC dataset contains 40 recordings of chain-saw noise, each 5 seconds long, sampled at 44.1 kHz and encoded as Ogg Vorbis [18] files. For the purposes of our evaluation, each ESC audio file was converted to the Octave-compatible WAV format and segmented into four 1-second files, resulting in an evaluation dataset of 160 files. Evaluation was performed for three array configurations, corresponding to NM values of 4, 8, and 16. Various incoming sound directions were simulated in Octave by applying the appropriate delays to the sound at each of the microphones. With regard to the DS algorithm configuration, we explored the effect of NS and NFFT on the localization accuracy. In our implementation, when NS < NFFT a Hanning window is applied to the audio, which is subsequently zero-padded up to the FFT size. Otherwise, the audio is truncated to the FFT size and the window is applied before FFT processing. All FFT bin values corresponding to frequencies between 600 Hz and 5 kHz are accumulated to obtain the power of the steered signal. We chose the 600 Hz lower threshold because of findings in [12] which indicate poor directionality for lower-frequency sounds, and set the upper threshold based on empirical observations of the spectrum of chain-saw noise, which is mostly concentrated below 5 kHz.

4.1 Correct Localizations

In order to obtain a first evaluation of the quality of SSL with the implemented DS algorithm, θI is fixed at 0 degrees for each of the SN configurations. Figure 4 illustrates the correct localization rate for the 4-microphone array configuration. The 128-bin FFT performs badly for all values of NS, while larger FFT sizes perform better and provide over 90% correct localizations for NS > 200. Overall, performance increases only marginally above 512 frequency bins. All localizations are correct for NFFT larger than 512 bins and NS larger than 500 samples. A similar situation is shown in Figures 5 and 6 for the 8- and 16-microphone array configurations, respectively. On these array configurations, the 128-bin FFT DS configuration obtained fewer than 75 correct localizations. Localization again improves slightly with FFT size up to 2048 bins. For NS > 200, the accuracy for different FFT sizes is practically indistinguishable. Overall, the localization evaluations reveal that the DS algorithm performs well on all array configurations if NS > 300 and the FFT size is 512 bins or more. For all three array configurations under analysis, over 99% of localizations are correct under these conditions, suggesting that utilizing larger FFT sizes, as in [12], is redundant and does not offer real benefit for chain-saw localization.

Figure 4: Correct localizations as a function of NS and NFFT, for NM = 4.
Figure 5: Correct localizations as a function of NS and NFFT, for NM = 8.
Figure 6: Correct localizations as a function of NS and NFFT, for NM = 16.

4.2 Localization Confidence

We define the confidence of a localization according to Equation 3, where Pi is the power of the steered signal for PDi and I, given by Equation 2, denotes the index of the PD which we expect to yield the largest power value, given the angle of incidence:

$$I = \mathrm{round}\left(\frac{\theta_I N_M}{360^\circ}\right) \qquad (2)$$

$$C_{\theta_I} = \frac{P_I - \max_{i \neq I} P_i}{\max_i P_i} \qquad (3)$$

For example, if the angle of incidence is 35 degrees for an 8-microphone array, the value of I is 1, and we expect the steered signal corresponding to PD1 to be the most powerful. The localization confidence is then the power of the steered signal of PD1, minus the largest power of the steered signals of the remaining PDs, divided by the largest overall power value. If the localization is correct, the confidence is positive; otherwise it is negative, and it always lies in the interval [-1, 1]. Confidence values approaching 1 are desirable, as they indicate that the DS result is unlikely to be perturbed by noise.

We compute and present the confidence of the localization for all array and DS configurations, omitting the 128-bin FFT, which was shown to perform badly in the previous section. Figures 7, 8, and 9 illustrate the localization confidence for the array configurations with 4, 8, and 16 microphones, respectively. For NM = 4, confidence rises sharply up to NS = 200 for all FFT sizes, then levels off to a peak of 0.62 at NS = 300 and subsequently drops off towards 0.6. Above 400 samples there is no noticeable difference between FFT sizes. The increase in confidence with NS can be attributed to the increase in spectral resolution, which also explains why larger FFT sizes perform better with smaller sampling windows. A similar scenario is observed for NM = 8, with slightly higher confidence overall and a larger difference between FFT sizes. This effect may be attributed to the amplification effect of having more microphones, and is also observed on the 16-microphone array, although it must be noted that the confidence difference between the 512-bin FFT and the 16384-bin FFT is only 20%. The 16-microphone array has lower confidence overall than the other configurations, a fact which may be attributed to the physical proximity of the microphones on a 10 cm radius circle, causing confusion between adjacent microphones.
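Equations 2 and 3 translate directly into code. The Python sketch below assumes the per-PD steered powers are already available, for instance from a delay-and-sum stage; the function names and the example power values are hypothetical, and the modulo wrap for angles close to 360 degrees is an addition of this sketch rather than part of Equation 2.

```python
import math

import numpy as np


def expected_pd_index(theta_i_deg, n_mics):
    """Equation 2: index I of the PD expected to yield the largest power.

    Rounds half up (like Octave's round for non-negative values); the modulo
    wrap for angles close to 360 degrees is an assumption of this sketch.
    """
    return math.floor(theta_i_deg * n_mics / 360.0 + 0.5) % n_mics


def localization_confidence(powers, expected_index):
    """Equation 3: (P_I - max_{i != I} P_i) / max_i P_i, always in [-1, 1]."""
    powers = np.asarray(powers, dtype=float)
    others = np.delete(powers, expected_index)
    return float((powers[expected_index] - others.max()) / powers.max())


if __name__ == "__main__":
    # Hypothetical steered powers for an 8-microphone array, sound at 35 degrees.
    powers = [1.0, 4.0, 2.0, 0.5, 0.2, 0.1, 0.1, 0.3]
    i = expected_pd_index(35, 8)
    print(i, localization_confidence(powers, i))  # prints: 1 0.5
```

Python's built-in round() rounds halves to even, so the sketch uses floor(x + 0.5) instead; this keeps an incidence angle of exactly θS/2 mapped to the next PD, as Octave's round would.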
Figure 7: SSL confidence as a function of capture window size, for NM = 4.
Figure 8: SSL confidence as a function of capture window size, for NM = 8.
Figure 9: SSL confidence as a function of capture window size, for NM = 16.

4.3 Directivity

So far, evaluations have been performed for θI = 0, but in a real-world setting sound sources will rarely be aligned with an array principal direction. We therefore evaluate the DS algorithm for angles of incidence between 0 and θS/2, in order to determine the localization correctness and confidence when the sound direction does not coincide with one of the array principal directions. For each selected SN configuration, five localizations were performed on the entire dataset, at the angles of incidence described in Equation 4.

$$\theta_I \in \left\{ \frac{\theta_S}{9}, \frac{2\theta_S}{9}, \frac{3\theta_S}{9}, \frac{4\theta_S}{9}, \frac{\theta_S}{2} \right\} \qquad (4)$$

$$\theta_{I_N} = \frac{\theta_I}{\theta_S} \qquad (5)$$

$$\bar{C}_{\theta_I} = \langle C_{\theta_I} \rangle \qquad (6)$$

$$C^N_{\theta_I} = \frac{\bar{C}_{\theta_I}}{C_0} \qquad (7)$$

The normalized incidence angles are described in Equation 5 in relation to the array step angle, while Equations 6 and 7 express the average SSL confidence over all values of θI and its normalized form, relative to the SSL confidence C0 of sounds incident along PD0. Figure 10 illustrates the loss in SSL confidence as the incidence angle increases. As expected, the confidence decreases to zero at the mid-point between microphones, with the 8-microphone array retaining slightly more confidence.

Figure 10: SSL confidence as a function of normalized angle of incidence, for NFFT = 1024 and NS = 500.

Table 1 lists the average normalized confidence for various array and DS configurations. The normalized confidence responds very little to changes in either the FFT size or the number of samples analyzed by the DS algorithm, suggesting that the drop in SSL confidence is more a natural result of the array geometry than of the DS processing. The maximum localization error θE is inversely proportional to NM, but again we observe a reduction in confidence at NM = 16, indicating that the array is becoming too densely packed and that its diameter should be increased. Overall, the localization performance is good even when sound sources are not aligned with the principal directions of the microphone array. The performance decreases when the angle of incidence approaches the mid-point between microphones, as expected for single-ring microphone arrays.

NM   max θE [deg]   NFFT   CNθI (NS = 300)   CNθI (NS = 400)   CNθI (NS = 500)
4    45             512    0.65              0.66              0.68
4    45             1024   0.64              0.65              0.66
8    22.5           512    0.72              0.72              0.72
8    22.5           1024   0.73              0.73              0.73
16   11.25          512    0.61              0.61              0.61
16   11.25          1024   0.62              0.62              0.62

Table 1: Normalized SSL Confidence of Selected Configurations.

5 SSL Power and Energy

The total SSL energy consumption consists of three components, as expressed in Equation 8, corresponding to the MEMS microphone sampling energy EMEMS, the FPGA processing energy EP, and the FPGA power-up and configuration energy EC. In turn, EMEMS may be approximated with good accuracy by Equation 9 from the active power, the sampling period and the power-up time. The FPGA power-up and configuration energy EC depends on the time required for configuration and on the quiescent current of the FPGA device, available in the datasheet; it is estimated in Equation 10 from the size of the FPGA bitstream SB, the configuration clock period TCCLK and the power-on-reset time TPOR. The quiescent power PQ is computed from the voltage and quiescent current specifications in the FPGA datasheet. The reader should note that the configuration energy is a rough approximation, as actual configuration currents are not specified in the FPGA datasheet.

$$E_T = E_{MEMS} + E_C + E_P \qquad (8)$$

$$E_{MEMS} = N_M P_a^{MEMS} \left( (N_S + 30)\, T_S^{PCM} + T_{PU} \right) \qquad (9)$$

$$E_C = P_Q^{FPGA} \left( \frac{S_B^{FPGA} T_{CCLK}}{16} + T_{POR} \right) \qquad (10)$$

$$E_{CSI} = T_{CSI} P_a^{MICA} \qquad (11)$$

$$\Delta E = \frac{E_{CSI} + E_T}{E_{CSI}} - 1 \qquad (12)$$

In order to compare the localization energy with that of chain-saw identification, we rely on data in [19], which states that the time TCSI required for identification is 3 seconds when the algorithm executes on a Mica sensor node equipped with an ATmega128RFA1 [20] microcontroller. We extract the microcontroller active power PaMICA from the datasheet in order to obtain the identification energy with Equation 11, and compute in Equation 12 the energy increase ∆E caused by adding localization to a system which is already performing identification. The FPGA processing energy EP depends upon the DS configuration and the FPGA device utilized, and therefore cannot be expressed analytically. We evaluate EP through simulation, by executing a localization with the DS algorithm implemented on a Xilinx Spartan-6 Low Power FPGA [21]. Table 2 lists the values of all the energy model parameters. In the case of FPGA parameters, values are listed for all the Spartan-6 FPGA devices utilized in our evaluation: LX4, LX9, and LX25. The MEMS parameters correspond to the ST Microelectronics MP32DB01 device [22].

Parameter                       Value
PaMEMS [mW]                     1.17
TPU [ms]                        10
TS (PCM sample period) [µs]     22.5
TCCLK [ns]                      40
TPOR [ms]                       20
PQ LX4 [mW]                     11.15
PQ LX9 [mW]                     11.15
PQ LX16 [mW]                    16.5
PQ LX25 [mW]                    21.6
SB LX4 [Mbits]                  2.73
SB LX9 [Mbits]                  2.74
SB LX25 [Mbits]                 6.44

Table 2: Energy Model Parameters

The FPGA DS implementation was written in Verilog HDL, utilizing behavioral code and FFT cores generated by the Spiral project [23], which offers an online FFT core generation portal. Iterative (non-pipelined), radix-2 cores were generated for each FFT size from 128 to 16384 bins. In order to evaluate the power dissipation and energy of the FPGA circuit, the implementation of each DS configuration was synthesized with Xilinx ISE 14.7 to determine the smallest Spartan-6 device into which it fits. Afterwards, each design was placed and routed on the identified device, with constraints for the system clock at 50 MHz and the PDM sampling frequency at 2 MHz. A delay-and-sum operation was simulated on a post-implementation FPGA netlist in order to obtain signal activity rates, which were utilized by the Xilinx Power Analyzer software to extract the average power dissipation with high confidence. During each post-implementation simulation, the duration of the localization was recorded in order to obtain an energy consumption figure for the localization.

For each array configuration, we select two pairs of DS configurations with roughly the same localization accuracy, resulting in the SSL configurations listed in Table 3, along with the approximate confidence value C0 for each pair and the power and energy required to perform the DS processing in the FPGA. At both accuracy points for the 4- and 8-microphone arrays, it is more expensive energy-wise to increase NS than NFFT. This is because the transfer of samples between the MEMS microphones and the FPGA occurs at a much lower frequency than FFT processing; therefore, a larger number of samples requires more processing time than a larger FFT size for the same number of samples. The only exception to this rule occurs at the 0.44 confidence point of the 16-microphone array, where the very large FFT is actually more expensive energetically. Overall, the results indicate that an increase in accuracy is achieved more energy-efficiently through an increase in NFFT than through an increase in NS, and that the difference between the two approaches is smaller as NM increases. The table also indicates the smallest Spartan-6 Low-Power FPGA device required to implement the DS algorithm in each configuration, demonstrating that larger and more expensive FPGA devices are necessary as NM increases. Larger FPGAs also have more leakage current, thereby increasing processing power, and larger configuration bitstreams, increasing configuration energy.

NM   NS    NFFT   C0     FPGA device   TP [ms]   PP [mW]   EP [µJ]   EC [µJ]   EMEMS [µJ]   ET [µJ]   ∆E [%]
4    256   256    0.51   LX4L          9.66      29.43     284       299       41           624       2.8
4    200   1024   0.51   LX4L          8.57      27.14     232       299       35           566       2.5
4    300   1024   0.59   LX9L          11.18     31.53     352       299       46           697       3.1
4    400   512    0.59   LX4L          13.53     28.69     388       299       57           744       3.3
8    200   2048   0.63   LX9L          10.82     37.04     400       299       60           759       3.4
8    300   512    0.63   LX9L          12.21     36.91     450       299       81           830       3.7
8    300   2048   0.69   LX9L          13.79     39.56     545       299       81           925       4.1
8    400   1024   0.69   LX9L          15.68     38.13     597       299       102          998       4.5
16   200   1024   0.41   LX9L          12.08     40.76     492       299       108          899       4.0
16   300   512    0.41   LX25L         14.78     60.78     898       779       150          1827      8.2
16   200   4096   0.44   LX25L         18.85     61.67     1162      779       108          2049      9.2
16   300   1024   0.44   LX25L         15.77     61.71     973       779       150          1902      8.6

Table 3: SSL Energy

Figure 11: Distribution of energy components for various SSL configurations.

Figure 11 illustrates the distribution of energy among the components EMEMS, EC, and EP for various SSL configurations, including those in Table 3. It is immediately evident that, for all configurations, the dominant energy components are EC and EP. Significant energy increases occur when larger FPGA devices are required, as is the case with most 16-microphone array configurations. EC varies between 30% and 50% of the total energy, and may be reduced by utilizing flash-based FPGAs, which do not require configuration at power-up.
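Equations 8 to 12 can be evaluated directly. The Python sketch below hard-codes the Table 2 parameters in SI units, takes the simulated FPGA processing time TP and power PP as inputs, and leaves the microcontroller active power PaMICA as an argument because its value is not given in Table 2. The function names are hypothetical, and the sketch illustrates the model rather than the tooling used to produce Table 3.

```python
# Energy model parameters from Table 2, converted to SI units.
P_A_MEMS = 1.17e-3   # W, MEMS microphone active power
T_PU = 10e-3         # s, microphone power-up time
T_S = 22.5e-6        # s, PCM sample period
T_CCLK = 40e-9       # s, FPGA configuration clock period
T_POR = 20e-3        # s, FPGA power-on-reset time
P_Q = {"LX4": 11.15e-3, "LX9": 11.15e-3, "LX16": 16.5e-3, "LX25": 21.6e-3}  # W
S_B = {"LX4": 2.73e6, "LX9": 2.74e6, "LX25": 6.44e6}  # bitstream size, bits


def e_mems(n_mics, n_s):
    """Equation 9: microphone sampling energy (J)."""
    return n_mics * P_A_MEMS * ((n_s + 30) * T_S + T_PU)


def e_config(device):
    """Equation 10: FPGA power-up and configuration energy (J)."""
    return P_Q[device] * (S_B[device] * T_CCLK / 16 + T_POR)


def e_total(n_mics, n_s, device, t_p, p_p):
    """Equation 8, with E_P = T_P * P_P taken from post-implementation simulation."""
    return e_mems(n_mics, n_s) + e_config(device) + t_p * p_p


def delta_e(e_t, t_csi, p_a_mica):
    """Equations 11 and 12: relative energy increase over chain-saw identification."""
    e_csi = t_csi * p_a_mica
    return (e_csi + e_t) / e_csi - 1


if __name__ == "__main__":
    print(f"E_C(LX4)  = {e_config('LX4') * 1e6:.1f} uJ")   # E_C(LX4)  = 299.1 uJ
    print(f"E_C(LX25) = {e_config('LX25') * 1e6:.1f} uJ")  # E_C(LX25) = 779.8 uJ
```

With these parameters, EC evaluates to about 299 µJ for the LX4 and about 780 µJ for the LX25, in line with the EC column of Table 3, and EP = TP · PP reproduces the EP column (for example, 9.66 ms × 29.43 mW ≈ 284 µJ).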
6 Conclusion

Sound source localization has the potential to enable practical deforestation monitoring with wireless sensor networks. Our paper provides an extensive design space exploration of microphone array localization for chain-saw noise, and demonstrates that chain-saw noise can be localized with good accuracy even with arrays of a small number of microphones. A 4-microphone array and the smallest Spartan-6 FPGA are sufficient to provide accurate localization at minimum hardware cost for a sensor network node. Our analysis also quantifies the energy cost of sound source localization, suggesting that the technique would not add significant power and energy overheads to a deforestation monitoring sensor network.

We envisage several avenues of future work, centered on the evaluation of our proposed SSL system in real-world scenarios. Factors such as reverberation and fading have not been evaluated in this work and may require modifications to the Delay-and-Sum algorithm or the microphone array. With regard to the FPGA implementation of the Delay-and-Sum algorithm, investigations into flash-based FPGAs are possible in order to reduce configuration energy.

Acknowledgments

The work was funded by the Sectoral Operational Programme Human Resources Development 2007-2013 of the Romanian Ministry of European Funds through the Financial Agreement POSDRU/159/1.5/S/132397.

References

[1] D. Brack, "Illegal logging and the illegal trade in forest and timber products," International Forestry Review, vol. 5, no. 3, pp. 195–198, 2003.
[2] G. R. Van der Werf, D. C. Morton, R. S. DeFries, J. G. Olivier, P. S. Kasibhatla, R. B. Jackson, G. J. Collatz, and J. Randerson, "CO2 emissions from forest loss," Nature Geoscience, vol. 2, no. 11, pp. 737–738, 2009.
[3] V. Harvanová, M. Vojtko, M. Babis, M. Duríček, and M. Pohronská, "Detection of wood logging based on sound recognition using ZigBee sensor network," in Proceedings of the International Conference on Design and Architectures for Signal and Image Processing, Tampere, Finland, vol. 24, 2011.
[4] J. Papán, M. Jurecka, and J. Púchyová, "WSN for forest monitoring to prevent illegal logging," in 2012 Federated Conference on Computer Science and Information Systems (FedCSIS), 2012.
[5] T. Soisoonthorn and S. Rujipattanapong, "Deforestation detection algorithm for wireless sensor networks," in Communications and Information Technologies, 2007. ISCIT'07. International Symposium on, pp. 1413–1416, IEEE, 2007.
[6] J. C. Chen, L. Yip, J. Elson, H. Wang, D. Maniezzo, R. E. Hudson, K. Yao, and D. Estrin, "Coherent acoustic array processing and localization on wireless sensor networks," Proceedings of the IEEE, vol. 91, no. 8, pp. 1154–1162, 2003.
[7] M. E. Swearingen and M. J. White, "Influence of scattering, atmospheric refraction, and ground effect on sound propagation through a pine forest," The Journal of the Acoustical Society of America, vol. 122, no. 1, pp. 113–119, 2007.
[8] U. Ingård, "A review of the influence of meteorological conditions on sound propagation," The Journal of the Acoustical Society of America, vol. 25, no. 3, pp. 405–411, 1953.
[9] T. Ajdler, I. Kozintsev, R. Lienhart, and M. Vetterli, "Acoustic source localization in distributed sensor networks," in Signals, Systems and Computers, 2004. Conference Record of the Thirty-Eighth Asilomar Conference on, vol. 2, pp. 1328–1332, IEEE, 2004.
[10] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays, pp. 157–180, Springer, 2001.
[11] M. A. Razzaque and S. Dobson, "Energy-efficient sensing in wireless sensor networks using compressed sensing," Sensors, vol. 14, no. 2, pp. 2822–2859, 2014.
[12] J. Tiete, F. Domínguez, B. d. Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: a distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.
[13] R. A. Mucci, "A comparison of efficient beamforming algorithms," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 32, no. 3, pp. 548–558, 1984.
[14] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications. Springer Science & Business Media, 2001.
[15] P. M. Aziz, H. V. Sorensen, et al., "An overview of sigma-delta converters," Signal Processing Magazine, IEEE, vol. 13, no. 1, pp. 61–84, 1996.
[16] J. W. Eaton, D. Bateman, and S. Hauberg, GNU Octave. Network Theory, 1997.
[17] K. Piczak, "ESC: Dataset for Environmental Sound Classification." http://dx.doi.org/10.7910/DVN/YDEPUT. [Online; accessed 2015-07-30].
[18] J. Moffitt, "Ogg Vorbis - open, free audio - set your media free," Linux Journal, vol. 2001, no. 81es, p. 9, 2001.
[19] L. Czúni and P. Z. Varga, "Lightweight acoustic detection of logging in wireless sensor networks," in The International Conference on Digital Information, Networking, and Wireless Communications (DINWC2014), pp. 120–125, The Society of Digital Information and Wireless Communication, 2014.
[20] "ATmega128RFA1 Datasheet." http://www.atmel.com/Images/Atmel-8266-MCU_Wireless-ATmega128RFA1_Datasheet.pdf. [Online; accessed 2015-07-30].
[21] P. Alfke, "Xilinx Spartan-6 FPGA user guide lite," EE Times-Design: UBM Electronics, 2009.
[22] "MP32DB01 MEMS audio sensor omnidirectional digital microphone." http://www.st.com/st-web-ui/static/active/en/resource/technical/document/datasheet/CD00284650.pdf. [Online; accessed 2015-07-30].
[23] P. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, "Computer generation of hardware for linear digital signal processing transforms," ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 17, no. 2, p. 15, 2012.