Detection of Hum in Audio Signals

Matthias Brandt, Joerg Bitzer
Institute for Hearing Technology and Audiology, Jade University of Applied Sciences, 26121 Oldenburg, Germany
Email: {brandt, joerg.bitzer}@jade-hs.de

Abstract: This article presents a method to automatically determine whether an audio signal contains one or more additive disturbing tones (hum). If hum is detected, the frequencies of the additive tones are identified precisely. The developed algorithm, which is based on an analysis of statistical properties of the short-term Fourier transform of the input signal, requires no a priori information about the type of wanted signal (e.g. speech or music) or about the intensities (i.e. powers) or frequencies of the tones to be detected. First, the problem of hum disturbances in audio signals is outlined, followed by a description of the detection algorithm. The article concludes with an evaluation that documents the good detection performance of the developed method.

I. INTRODUCTION

Audio signals may suffer from additive sinusoidal disturbances. These degrade the sound quality and usually result from power line interference during recording and/or copying processes. Nonlinearities in the signal chain often generate harmonics of the mains voltage frequency, which can in some cases reach up to several kHz. Because of its characteristic sound, this kind of disturbance is commonly called hum.

In this article we present a method to detect such sinusoids at frequencies up to 1 kHz. In contrast to existing solutions, which focus on determining whether a sinusoid is present at a specified frequency, our algorithm does not require any a priori information about the number of tones or about whether hum is present at all. Furthermore, the algorithm is capable of tracking changing hum scenarios (i.e. combinations of tones) over time, which is especially useful for material that is a montage of different recordings, each containing different hum disturbances. The developed method is quasi real-time capable in that it does not require access to future parts of the signal.

A. The challenge

To give an insight into the topic of this article, figure 1 shows a spectrogram of a music recording containing a typical hum disturbance. On close examination, the hum frequencies at 50 Hz and 150 Hz are clearly visible. They stand out because there is power at those frequencies for the complete duration of the recording. However, the intensity of the hum disturbance is rather low compared to tones actually contained in the wanted signal.

Based on what can be learned from examining figure 1, some definitions for the underlying signal model are made. We state that hum disturbances are characterized by

1) their long duration compared to actual elements of the wanted signal (e.g. bass notes),
2) their stability in frequency,
3) their stability in power.

Assuming long-term stability in frequency is justified in most cases by the origin of the hum: usually, sinusoidal disturbances are caused by power line interference during recording or copying processes, and since the frequency of the mains voltage is very stable in the majority of cases, so are the hum frequencies. (Sometimes, battery-powered mobile recording devices generate hum that slowly drifts monotonically in frequency as the batteries run down. Investigations have shown that this case is rare enough not to be pursued here.) The constancy of power can be justified by the fact that microphones and recording (or copying) equipment are usually fixed at a certain location, causing constant exposure to hum-inducing electromagnetic fields. Hence, the process of detecting hum disturbances aims at finding elements in the signal that exhibit the aforementioned properties.

B. The compromise

Defining hum disturbances as in section I-A leaves a systematic source of errors: elements of the wanted signal that feature all three of the stated properties are very likely to be indicated as hum disturbances. A possible way to lower the false alarm rate of the detector would be to incorporate higher-level information, e.g. the key of a piece of music and the tuning of the instruments. For example, if a piano is tuned to A4 = 442 Hz for a solo concert recording, a frequency of 50 Hz is very unlikely to be part of the music (as G1 ≈ 49.22 Hz and G♯1 ≈ 52.15 Hz). However, this approach has not been pursued and remains for future investigation.

II. THE DETECTION ALGORITHM

A. The signal model

The signal model that the detector is based on is shown in figure 2: one or more sine tones are added to the wanted signal, yielding the disturbed input signal.
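The signal model can be illustrated with a short Python sketch that adds sine tones to a wanted signal at a chosen "wanted signal to hum" ratio. It is only an illustration and not the implementation used for this article: the function name `add_hum`, the equal amplitudes and zero phases of the tones, and the white-noise stand-in for the wanted signal are assumptions made here.

```python
import numpy as np

def add_hum(wanted, fs, hum_freqs, snr_db):
    """Return (disturbed, hum) with the hum scaled to the requested ratio.

    Assumptions (not from the article): all tones share one amplitude
    and start at zero phase.
    """
    wanted = np.asarray(wanted, dtype=float)
    t = np.arange(wanted.size) / fs
    hum = np.zeros_like(wanted)
    for f in hum_freqs:
        hum += np.sin(2.0 * np.pi * f * t)
    p_wanted = np.mean(wanted ** 2)
    p_hum = np.mean(hum ** 2)
    # Scale so that 10*log10(p_wanted / p_scaled_hum) equals snr_db.
    gain = np.sqrt(p_wanted / (p_hum * 10.0 ** (snr_db / 10.0)))
    return wanted + gain * hum, gain * hum

# Hypothetical example: white noise as the wanted signal, hum at 50 Hz and 150 Hz.
rng = np.random.default_rng(0)
fs = 2000
wanted = rng.standard_normal(10 * fs)
disturbed, hum = add_hum(wanted, fs, hum_freqs=(50.0, 150.0), snr_db=20.0)
measured_snr_db = 10.0 * np.log10(np.mean(wanted ** 2) / np.mean(hum ** 2))
```

Here a positive `snr_db` means that the wanted signal is stronger than the hum, so large values correspond to barely audible disturbances.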

[Figure 1: spectrogram. Horizontal axis: time / s (0 to 55); vertical axis: frequency / Hz (0 to 1000); color scale: −160 dB to −40 dB.]

Fig. 1. Spectrogram of a music recording containing a hum disturbance. In this example, an excerpt of a BBC recording of Elton John’s “Sorry Seems To Be The Hardest Word” from 1976, the hum frequencies are $f_{\text{hum}} \approx (50\,\text{Hz}, 150\,\text{Hz})$. Although the hum power is rather low compared to the actual wanted signal, the disturbance is clearly audible.

[Figure 2: block diagram. Signal model: wanted signal + sinusoid(s). First detector stage: DFT → buffer → quantile ratio → remove local trends → $m_{\text{steady}}(n)$.]

Fig. 2. Signal model for the hum detection problem and block diagram of the first detector stage. The leftmost, framed part is the signal model: the wanted signal is degraded by one or more harmonic tone complexes. The detector works in the spectral domain, analyzing the variation of the power spectral density over time. A measure $m_{\text{steady}}(n)$ (where $n$ is the frequency index) is computed based on the quantile ratio of the power in each frequency band.

The presumed long-term stability is the main feature of the hum that is exploited. Hence, the detector analyses a section of the input signal of a certain length. The processing is based on the discrete Fourier transform (DFT), and each frequency band is analysed statistically and independently of all other bands. To achieve robustness against false alarms (i.e. erroneously detecting hum tones that are not actually present), a two-stage approach has been taken. Both stages are described in the following sections.

As the majority of hum disturbances consist of frequencies below 1 kHz, the detector does not consider frequencies above 1 kHz. Therefore, the input signal is first downsampled to a sampling rate of 2 kHz to discard unnecessary information and to lower the computational requirements.

B. Stage 1 – The steady tone detector

Although we stated in section I-A that the power of hum disturbances can be expected to be stable over a longer period of time, the intuitive approach of examining spectral minima within a certain time window yields only suboptimal results. Practice has shown that the hum power may occasionally drop well below its mean value, mainly because of destructive interference with elements of the wanted signal (e.g. bass tones). Hence, to gain robustness, quantiles are examined instead of minimum values. To create a measure indicating hum candidate frequencies, the 5% quantile of each frequency band is related to the 80% quantile. This quantile ratio is a measure of the amount of fluctuation of power over time in each frequency bin. In conjunction with an analysis window of the proper length,

[Figure 3: plot of $m_{\text{steady}}(f)$ (about −0.1 to 0.6) versus f / Hz (0 to 300).]

Fig. 3. The measure that indicates steady tones. Shown is the ratio of the 5% and 80% quantiles of the power spectral densities. The quantiles have been computed from a 15 s section of the spectrogram shown in figure 1, starting at 20 s. To account for broader frequency ranges that show low fluctuation, the local trends have been removed by subtracting the running-median-filtered version of the quantile ratio.
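The measure plotted in figure 3, together with the median-plus-scaled-standard-deviation threshold of section II-B, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation evaluated in this article: the DFT length, hop size, Hann window, running-median width and the factor `a` are placeholder values chosen here, all function names are hypothetical, and the additional median smoothing in the time direction over successive analysis sections is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def running_median(x, width):
    """Running median over frequency with edge padding; width must be odd."""
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    return np.median(sliding_window_view(padded, width), axis=-1)

def steady_tone_measure(x, n_dft=2000, hop=500, q_low=0.05, q_high=0.80,
                        median_width=15):
    """Detrended quantile ratio per frequency bin for one analysis section."""
    window = np.hanning(n_dft)
    frames = sliding_window_view(x, n_dft)[::hop] * window
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2   # shape: (frames, bins)
    low = np.quantile(power, q_low, axis=0)              # 5% quantile over time
    high = np.quantile(power, q_high, axis=0)            # 80% quantile over time
    ratio = low / np.maximum(high, np.finfo(float).tiny)
    return ratio - running_median(ratio, median_width)   # remove local trends

def hum_candidates(measure, a=5.0):
    """Indices of bins whose measure exceeds median + a * standard deviation."""
    threshold = np.median(measure) + a * np.sqrt(np.var(measure))
    return np.flatnonzero(measure > threshold), threshold

# Hypothetical example: 15 s of white noise plus a weak 50 Hz tone, fs = 2 kHz.
fs, n_dft = 2000, 2000
rng = np.random.default_rng(1)
t = np.arange(15 * fs) / fs
x = rng.standard_normal(t.size) + 0.5 * np.sin(2.0 * np.pi * 50.0 * t)
measure = steady_tone_measure(x, n_dft=n_dft)
bins, threshold = hum_candidates(measure)
peak_freq = np.argmax(measure) * fs / n_dft
```

A bin that carries a steady tone keeps most of its power even in its weakest frames, so its quantile ratio is much closer to one than that of a noise-like bin, whose 5% quantile is far below its 80% quantile.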

hum disturbances become clearly visible (see figure 3). To account for broader frequency ranges showing low fluctuation of power, the “local trends” are removed by subtracting the running-median-filtered version of the quantile ratio:

$$\tilde{m}_{\text{steady}}(n) = m_{\text{steady}}(n) - \operatorname{Median}_{\text{running}}\{m_{\text{steady}}(n)\}.$$

The robustness of this measure against false detections is further increased by smoothing it once more with a median filter, now in the time direction, applied to each frequency band. This final value, $\tilde{m}_{\text{steady}}(n)$, where $n$ is the frequency bin index, is used to indicate frequencies that are very likely to contain hum disturbances. A threshold

$$t = \operatorname{Median}\{\tilde{m}_{\text{steady}}(n)\} + a \cdot \sqrt{\operatorname{Var}(\tilde{m}_{\text{steady}}(n))}$$

is computed; frequency bins in which $\tilde{m}_{\text{steady}}(n)$ exceeds it are indicated as hum frequencies. The factor $a$ has to be chosen properly to find a compromise between hit and false alarm rates.

C. Stage 2 – The post processing stage

The output of the steady tone detector is fed through a post processing stage that sifts it and raises the detection accuracy. In this step, further requirements concerning the detected sinusoids can be defined. The most important parameter is the minimum tone duration $T_{\min}$. Of

[Figure 5: gain / dB (−100 to 0) versus normalized frequency $f/f_s$ (0 to 0.5).]

Fig. 5. Transfer function of an adaptive notch filter.
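To give an impression of such a transfer function, the following Python sketch evaluates the gain of a generic second-order notch filter with constrained poles on a normalized frequency grid. The filter structure, the pole radius `r` and the notch frequency are assumptions chosen for illustration; they are not necessarily those of the adaptive algorithm in [1], and the adaptation of the notch frequency itself is not shown.

```python
import numpy as np

def notch_gain_db(freqs_norm, f0_norm, r=0.98):
    """Gain in dB of a second-order notch at normalized frequency f0_norm = f0/fs.

    H(z) = (1 - 2cos(w0) z^-1 + z^-2) / (1 - 2 r cos(w0) z^-1 + r^2 z^-2)
    """
    w = 2.0 * np.pi * np.asarray(freqs_norm, dtype=float)
    w0 = 2.0 * np.pi * f0_norm
    z1 = np.exp(-1j * w)            # z^-1 evaluated on the unit circle
    z2 = z1 * z1                    # z^-2
    num = 1.0 - 2.0 * np.cos(w0) * z1 + z2
    den = 1.0 - 2.0 * r * np.cos(w0) * z1 + r * r * z2
    mag = np.abs(num / den)
    return 20.0 * np.log10(np.maximum(mag, 1e-12))   # floor avoids log10(0)

# Hypothetical example: notch at 0.1 * fs, evaluated from 0 to 0.5 * fs.
freqs = np.linspace(0.0, 0.5, 501)
gain = notch_gain_db(freqs, f0_norm=0.1)
```

The closer `r` is to one, the narrower the notch becomes, so that frequencies away from the notch pass almost unchanged.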

course, choosing $T_{\min}$ rather high, e.g. $T_{\min} = 30$ s, drastically reduces the false alarm rate but, on the other hand, raises the probability of missing hum sinusoids by dismissing short detections. Furthermore, different hum frequencies that appear (and disappear) at the same time are assigned to a hum group. This way, changing hum disturbances, each consisting of one or more hum frequencies, are tracked efficiently and are represented in a concise way for further processing.

D. Refinement of the detected frequencies

Since the steady tone detector is based on the DFT of the input signal, the frequency resolution is limited to $\Delta f = f_s / N_{\text{DFT}}$. Therefore, the frequency estimate of each detected tone is improved by running the input signal through an adaptive notch filtering algorithm [1]. The latter uses a gradient-based approach to minimize the power of its output signal. It features a transfer function as shown in figure 5 and converges to the frequency that leads to the smallest output power. To avoid convergence to erroneous frequencies that are close to the hum frequencies but have higher power, a bandpass filter with a bandwidth of $f_{\text{BP}} = 2$ Hz is centered on the estimated frequency to pre-filter the input signal of the adaptive filtering algorithm.

A schematic diagram of the whole algorithm is shown in figure 4. The depicted data flow is performed for every input block and for every frequency bin.

III. EVALUATION

To examine the detection behaviour of the presented method, an extensive evaluation has been performed.

A. The test method and performance measures

The detection performance was evaluated by computing the hit rate (correct frequency detected) and the false alarm rate (hum detected although the input signal contains no hum) of the developed method. For this purpose, a number of test signals (see section III-B) with known hum disturbances were processed by the algorithm. By comparing the detection results with the true values, the performance measures could be determined in terms of relative frequencies. To compute the false alarm rates, the same number of hum-free signals were also processed by the algorithm.

[Figure 6: hit rate and false alarm rate (relative frequency) versus SNR / dB.]

Fig. 6. Detection performance of the presented algorithm. The test signal base consists of six different wanted signals, containing artificial hum disturbances at different SNRs.
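The two performance measures can be computed along the following lines. The Python sketch is an illustration only: the function name `detection_rates`, the data layout and the example values are made up here, and how additional spurious detections in a hum-containing signal are counted is an assumption (they are ignored in this sketch).

```python
def detection_rates(hum_trials, clean_trials, tol_hz=1.0):
    """Return (hit_rate, false_alarm_rate) as relative frequencies.

    hum_trials:   list of (true_freq_hz, detected_freqs_hz) per hum signal
    clean_trials: list of detected_freqs_hz per hum-free signal
    A hum trial is a hit if any detected frequency lies within tol_hz of
    the true frequency; a clean trial is a false alarm if anything is detected.
    """
    hits = sum(
        any(abs(f - true_f) <= tol_hz for f in detected)
        for true_f, detected in hum_trials
    )
    false_alarms = sum(len(detected) > 0 for detected in clean_trials)
    return hits / len(hum_trials), false_alarms / len(clean_trials)

# Made-up detector outputs, for illustration only.
hum_trials = [(50.0, [50.02]), (150.0, [151.5]), (60.0, []), (100.0, [99.4, 300.0])]
clean_trials = [[], [], [120.0], []]
hit_rate, false_alarm_rate = detection_rates(hum_trials, clean_trials)
# hit_rate == 0.5, false_alarm_rate == 0.25
```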

To count as a hit, the detection algorithm had to determine the hum frequency with an error of no more than 1 Hz; otherwise the measurement was declared a miss.

B. The test signals

The test signals have been generated artificially by adding single sine tones between 50 Hz and 500 Hz to disturbance-free signals. To investigate the algorithm’s performance for a variety of wanted signals, each of very different character, the following signals have been chosen:

• a white noise signal
• a pink noise signal
• a speech signal [2]
• random excerpts from a classical music recording [3]
• random excerpts from a popular music recording [4]
• random excerpts from varying electronic Techno music
• random excerpts from a field recording from inside a passenger plane during flight (mostly consisting of long-lasting machine noises, e.g. from the engines and the air conditioning system)

The SNR, which in this case actually is the “wanted signal to hum” ratio, has been varied from −40 dB to 40 dB.

C. Results

1) Hit and false alarm rates vs. input SNR: It can be seen in figure 6 that for low input SNRs from −40 dB to 10 dB the algorithm features hit rates of 100%, and that the false alarm rate is low for all SNRs. For SNRs above 15 dB the hit rate decreases, but for these cases informal listening tests have shown that the hum is hardly noticeable anyway. To tune the balance between hit and false alarm rates, the threshold parameter $a$, described in section II-B, may be set to a different value to best fit the respective application’s requirements.

2) Accuracy of the frequency estimation: The deviation of the detected frequencies from the true values is shown in figure 7. Especially for low SNR conditions the precision is very high (the deviation is less than 0.01 Hz).

D. A reference algorithm

Research on hum detection algorithms for audio applications seems to have been largely neglected in the past. Only a single article on this topic could be found, dealing with a related problem: power-line interference in ECG signals. As a reference, we have implemented the

[Figure 4: flowchart. Blocks: tone detector; tone detected?; is in list of detected tones?; add to the list of tracked tones; is paused?; add pause duration; increment duration; increment pause duration / set preliminary end time; maximum pause duration exceeded?; remove from list of tracked tones; duration ≥ minimum duration?; tone is valid; is there a hum frequency?; group all tones that start close to each other; refine frequencies.]

Fig. 4. Flowchart of the hum detection algorithm. The depicted procedure is performed for every input data block and for every frequency bin. The output of the “tone detector” block is the steady tone indicator measure, an example of which is shown in figure 3.
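One possible reading of the tracking logic in this flowchart is sketched below in Python for a single frequency bin. It is an interpretation under assumptions, not the implementation used for this article: durations are counted in blocks rather than seconds, a tone is reported only once it has ended (or the input ends) rather than as soon as it reaches the minimum duration, grouping uses the start block only, and all names (`TrackedTone`, `track_tone`, `group_by_start`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrackedTone:
    start_block: int
    duration: int = 0   # blocks counted as belonging to the tone
    pause: int = 0      # consecutive blocks without detection

def track_tone(detections, min_duration, max_pause):
    """Return (start_block, end_block) spans of valid tones for one bin."""
    valid, tone = [], None
    for block, detected in enumerate(detections):
        if detected:
            if tone is None:
                tone = TrackedTone(start_block=block)
            elif tone.pause:
                tone.duration += tone.pause   # bridge a short gap
                tone.pause = 0
            tone.duration += 1
        elif tone is not None:
            tone.pause += 1
            if tone.pause > max_pause:        # tone has ended
                if tone.duration >= min_duration:
                    end = tone.start_block + tone.duration - 1
                    valid.append((tone.start_block, end))
                tone = None
    if tone is not None and tone.duration >= min_duration:
        valid.append((tone.start_block, tone.start_block + tone.duration - 1))
    return valid

def group_by_start(tones, max_start_gap):
    """Group (freq, start, end) tones whose start blocks lie close together."""
    groups = []
    for tone in sorted(tones, key=lambda t: t[1]):
        if groups and tone[1] - groups[-1][-1][1] <= max_start_gap:
            groups[-1].append(tone)
        else:
            groups.append([tone])
    return groups

# Hypothetical per-block detector output for one frequency bin.
detections = [False, True, True, False, True, True, True,
              False, False, False, True, False, False, False]
spans = track_tone(detections, min_duration=4, max_pause=2)
# spans == [(1, 6)]: the one-block gap is bridged, the isolated detection is dropped
groups = group_by_start([(50.0, 1, 6), (150.0, 2, 6), (100.0, 40, 80)], max_start_gap=3)
```

Raising `min_duration` discards more short detections, which corresponds to the trade-off described for $T_{\min}$ in section II-C.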

[Figure 7: $\Delta f$ / Hz (0 to 0.06) versus SNR / dB (−40 to 40).]

Fig. 7. Accuracy of the frequency detection. Shown is the absolute value of the difference between the estimated and the true hum frequency. If the detection algorithm yielded a deviation of more than 1 Hz for a signal, the measurement was declared a miss and has not been included in this plot.

[Figure 9: hit rate and false alarm rate (relative frequency, 0 to 1) versus SNR / dB.]

Fig. 9. Detection performance of the tone detector (stage 1) only. The test signal base is the same as for the evaluation of the reference algorithm. The post processing stage has been switched off in this simulation.

algorithm [5] that is based on linear discriminant analysis (LDA) to decide whether a block of a signal contains hum or not. As the algorithm has been developed for application to ECG signals containing 50 Hz or 60 Hz disturbances, the test scenario was adjusted to better match the originally intended circumstances. Hence, the test signal base consisted of different signals from all classes mentioned in section III-B, containing 60 Hz sine tones at different SNRs, and, to examine the false alarm behaviour, the same number of hum-free signals. Furthermore, we used the same data for testing and for training. Figure 8 makes clear that, even under these optimum conditions, the LDA is not able to effectively separate hum and no-hum signals, especially for SNRs of more than 0 dB. For a fair comparison, figure 9 shows the detection performance of the steady tone detector (stage 1) described in this article, working on the same test signal set as the reference algorithm. The post processing is skipped in this case, as it improves the detection performance significantly and would inhibit comparability.

[Figure 8: hit rate and false alarm rate (relative frequency, 0 to 1) versus SNR / dB (−40 to 40).]

Fig. 8. Detection performance of the reference algorithm. The test signal base consists of six different wanted signals, containing artificial hum disturbances in the form of sinusoid tones of 60 Hz at different SNRs.

IV. CONCLUSION

In this paper a multi-stage hum detection algorithm was presented. It was shown that accurate hum detection is possible up to a certain SNR. The developed method features good detection properties and precisely determines hum frequencies. To be able to work on material containing different hum disturbances (e.g. montages of different recordings), the method is able to identify changing scenarios.

ACKNOWLEDGEMENT

This research was (partly) funded by grant 17N3008 of the German Federal Ministry of Education and Research (BMBF). The views and conclusions contained in this document, however, are those of the authors.

REFERENCES

[1] P. K. Dash, B. R. Mishra, R. K. Jena, and A. C. Liew, “Estimation of Power System Frequency using Adaptive Notch Filters,” Proceedings of the International Conference on Energy Management and Power Delivery, 1998.
[2] Wikipedia spoken article, “Bird,” 2008.
[3] Johann Baptist Vanhal, “Symphony in c minor – Menuetto Moderato,” ≈1760–1770.
[4] J. Osborne, “St. Teresa,” 1996.
[5] Y.-D. Lin and Y. H. Hu, “Power-Line Interference Detection and Suppression in ECG Signal Processing,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 1, 2008.