Proceedings of the 2013 International Conference on Machine Learning and Cybernetics, Tianjin, 14-17 July, 2013

NOISE FILTERING AND OCCURRENCE IDENTIFICATION OF MOUSE ULTRASONIC VOCALIZATION CALL

NANCY YU SONG1, JÉRÔME NICOD2, BIAO MIN1, RAY C. C. CHEUNG1, MD ASHRAFUL AMIN1,3, HONG YAN1

1 Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong
2 The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, United Kingdom
3 Computer Vision and Cybernetics Research, Computer Science & Engineering, Independent University, Bangladesh
E-MAIL: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract: Currently, there exists a large amount of mouse ultrasonic vocalization data to be analyzed. However, manual annotation of mouse ultrasonic vocalization data requires considerable human effort and is sometimes infeasible. In this paper, we propose a method to filter out the noise in the vocalization recordings and automatically identify the occurrences of mouse vocalization calls. The method can speed up the process of annotating the vocalization data.

Keywords: Mouse ultrasonic vocalization; Occurrence detection; Denoising; Spectral subtraction; Mouse voice activity detection

1. Introduction

Mice are the most common mammalian model organism in biology and medicine. Compared with other mammals, mice are relatively cheap and easy to handle and maintain, and the mouse genome shares a high similarity with the human genome. Moreover, mice reproduce very quickly: the gestation period of a female house mouse is around 19 to 21 days, one female can have 5 to 10 litters every year, and each litter usually contains 3 to 14 pups. Female house mice reach sexual maturity at around 6 weeks and males at around 8 weeks. These properties make mice a very important animal for research on human disease.

Mice are highly vocal animals. Both male and female mice vocalize in same-sex and cross-sex social encounters [1-5]. Mice produce vocalizations both audible and inaudible to humans [6][2]. Examples of audible vocalizations are postpartum sounds and distress calls [7]. Inaudible calls are ultrasonic vocalizations with frequencies between 30 kHz and 110 kHz [8]. As shown by several studies, mice produce ultrasonic vocalizations in at least two situations: "isolation calls" are produced by pups when they are cold or removed from the nest [9], and "ultrasonic vocalizations" are produced by male mice when female mice are present or female urinary pheromones are detected [2][10-11][4]. It has been shown that mouse ultrasonic vocalization has the characteristics of song, consisting of several different syllable types and repeated phrases [7]. Adult mice are able to distinguish the vocalizations of pups from those of adults [12][9][13]. It was found that several developmental changes in mouse syllables and sequences could serve as acoustic cues for adult mice to determine the age of the caller [14]. It has been claimed that mice may possess a limited version of the brain and behavioral traits for vocal learning that are found in humans for learning speech and in birds for learning song [15][16]. This finding can aid scientists' studies of diseases such as autism and anxiety disorders.

There has been much research on analyzing mouse ultrasonic vocalizations [7][14][17]. An example of one vocalization clip is shown in Figure 1. Many existing studies analyze the shapes of mouse vocalization calls and the information they contain. However, little has been reported on the preprocessing of noise-contaminated vocalization recordings. Automatic identification of mouse vocalization call occurrences and extraction of the call shapes by computational methods are critical problems to be solved. Manual annotation of the call occurrences and call shapes requires considerable human effort. For example, annotating the occurrence of each call in a 300-second recording containing 1227 calls takes one person a full day using the software Praat [18]. Manually annotating the shapes is an even more difficult task: the call shapes are complicated and there is no general criterion for the annotation. In this paper, we explore a computational method to automatically identify the occurrences of mouse ultrasonic vocalization calls, as shown in Figure 2.


The difficulty in identifying the occurrence of calls is mainly caused by the noise in the recordings. Therefore, the noise should be eliminated first.

Figure 1. Two visual graphs of a mouse ultrasonic vocalization clip. The upper graph shows the vocalization in the time domain and the lower graph shows the spectrogram of the corresponding clip.

Figure 2. Flowchart of our proposed method.

2. Methods

Mouse vocalization recordings are usually contaminated by noise from two sources: the recording environment and the mice themselves. There are many types of environmental noise sources, e.g., wind, fluorescent lamps, computer fans, and the recording equipment itself. Environmental noise sources are hard to eliminate because they are difficult to identify, especially when the emitted noise is in the ultrasonic range, and because the exact sources vary with the recording environment. Moreover, some of the sources, e.g., the recording equipment itself, cannot be removed at all. In Figure 3, the noise marked by the red rectangles is possibly generated by an environmental source. Such noise may be suppressed in a well-designed recording studio with soundproofing and sound-absorbing treatment. However, for practical or economic reasons, recordings are often made without soundproofing.

Figure 3. Stationary noise is shown in the red rectangles.

Because it is hard to eliminate the noise at the recording stage, the recordings must be pre-processed by a computational method to filter the noise. The noise can be classified into two types. The first type is stationary noise, as shown in Figure 3. Stationary noise lasts from the beginning to the end of the recording and its frequency band is fixed. It often causes incorrect extraction of call shapes. The second type is spike noise, as shown in Figure 4(b). Spike noise appears randomly on the spectrogram and its frequency band usually spreads across the entire bandwidth. Spike noise is the major cause of incorrect identification of call occurrences. In our proposed method, the identification of call occurrences is based on the power spectral density (PSD). To obtain the PSD, we first calculate the spectrogram, as shown in Figure 4(a). Then we sum the power spectrum values at each time point, as shown in Figure 4(b). The stronger the PSD is at a time point, the more likely a call is generated at that time.
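As a rough illustration of this step, the sketch below (in Python with NumPy/SciPy, used here in place of the MATLAB environment the paper works in; the function name and the nfft parameter are ours) computes a spectrogram and sums the power over all frequency bins at each time point:

```python
import numpy as np
from scipy import signal

def psd_per_frame(x, fs, nfft=512):
    """Compute the spectrogram of waveform x (sampled at fs Hz), then
    sum the power over all frequency bins at each time point to get
    one PSD value per spectrogram frame."""
    f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=nfft, mode='psd')
    psd = Sxx.sum(axis=0)  # total power at each time point
    return f, t, Sxx, psd
```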


2.1. Removal of Stationary Noise

The noise with frequency below 20 kHz can easily be removed by a high-pass filter, because mouse ultrasonic vocalization usually ranges over 30 kHz-110 kHz. Here we use the first derivative of the signal as a high-pass filter. The high-frequency noise, however, is harder to remove because it is mixed with the mouse ultrasonic vocalization itself. Since the additive noise is wide-band and stationary, spectral subtraction is a simple and effective method to filter it. Spectral subtraction was originally developed to improve the intelligibility of human speech corrupted by broadband noise. In spectral subtraction theory, the noise and speech magnitudes are assumed to be independent of each other, and their phases are assumed to be independent of each other and of their magnitudes. In our experiment, the MATLAB function specsub in the speech processing toolbox VOICEBOX is used to perform the filtering [20]. The MATLAB code implements the algorithms in [21-23]. In our experiment, all parameters are set to their default values.
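The paper relies on VOICEBOX's specsub; as a language-consistent stand-in, the following sketch shows a generic magnitude spectral subtraction, simplified relative to the algorithms of [21-23]. The parameters noise_frames, alpha, and beta are illustrative choices of ours, and the sketch assumes the opening frames of the recording contain no calls:

```python
import numpy as np
from scipy import signal

def highpass_first_derivative(x):
    """First difference of the waveform; acts as a crude high-pass
    filter that attenuates low-frequency noise, as in the paper."""
    return np.diff(x)

def spectral_subtract(x, fs, noise_frames=10, nfft=512, alpha=2.0, beta=0.01):
    """Generic magnitude spectral subtraction: estimate the noise
    magnitude from the first few frames (assumed call-free) and
    subtract it from every frame, keeping the noisy phase."""
    f, t, Z = signal.stft(x, fs=fs, nperseg=nfft)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    # Over-subtraction factor alpha and spectral floor beta, in the
    # spirit of Berouti et al. [21].
    clean_mag = np.maximum(mag - alpha * noise_mag, beta * noise_mag)
    _, x_clean = signal.istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nfft)
    return x_clean
```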

2.2. Removal of Spike Noise

Spike noise interferes with the identification of call occurrences because the PSD values are large when spike noise appears, so a false alarm can easily be generated, as shown in Figure 4(b). We therefore propose a function to check whether spike noise exists at every time point:

$$ a = \frac{P_m}{\sum_{k=1}^{l} P_k} \times 100\% \qquad (1) $$

where $P_m$ is the maximum power in the spectrum at one time point, the denominator $\sum_{k=1}^{l} P_k$ is the total power of the spectrum, $l$ is the total number of points in the spectrum, and $a$ is the percentage that the strongest power takes of the total power at one time point. For spike noise the power spreads over many frequency bands, while for a real mouse call the power does not spread but is concentrated in one or a few frequencies. Therefore the value $a$ is smaller for spike noise than for a real call. In Figure 5, it can be observed that the two peaks at the beginning of Figure 5(b) are suppressed in Figure 5(e), where both noise filtering and equation (1) were applied.
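A direct transcription of equation (1), assuming the frequency-by-time spectrogram matrix Sxx from the earlier sketch (the zero-division guard is ours):

```python
import numpy as np

def percentage_value(Sxx):
    """Equation (1): for each time point, the share (in percent) of
    the total power taken by the single strongest frequency bin.
    Spike noise spreads power over many bins, so a is small; a real
    call concentrates power in a few bins, so a is large."""
    Pm = Sxx.max(axis=0)                         # strongest bin per frame
    total = np.maximum(Sxx.sum(axis=0), 1e-12)   # guard against division by zero
    return 100.0 * Pm / total
```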

Figure 4. (a) Spectrogram of a piece of mouse ultrasonic vocalization recording (the same as the lower graph of Figure 1), with the spike noise shown in the red rectangle; (b) plot of the power spectral density of the recording, with the peaks caused by spike noise shown in the red rectangle.


3. Results and Discussion

The occurrences of the mouse vocalization calls were manually annotated for three mouse ultrasonic vocalization recordings of length 300 seconds (1227 calls), 61 seconds (147 calls), and 45 seconds (142 calls), respectively. The manually annotated occurrences are used as the ground truth. The receiver operating characteristic (ROC) curve is used for performance evaluation: the larger the area under the ROC curve, the better the performance. Besides the ROC curve, accuracy is also taken as a measure of performance. Youden's index is applied to find the best threshold value, and the identification accuracy is then calculated at that threshold. In our proposed method in Figure 2, the percentage value a is used as the final decision value. After we obtain the percentage value a for one recording, as shown in Figure 5(e), we normalize it to lie between 0 and 100. The normalization is done in order to check whether the best threshold value varies much across different recordings.

Figure 5. (a) Spectrogram, the same as in Figure 4(a); (b) power spectral density obtained from (a), with the original noise; the red horizontal line shows the best threshold value; (c) identification of mouse call occurrences based on the PSD values in (b) (value 1 indicates occurrence and value 0 indicates silence); (d) spectrogram of the same piece of mouse vocalization recording as in (a), but with noise filtering; (e) percentage value obtained from (d) by the method shown in Figure 2; the red horizontal line shows the best threshold value; (f) identification of mouse call occurrences based on the percentage values in (e) (value 1 indicates occurrence and value 0 indicates silence).

Then we set the threshold value to 0. All time points with a value no smaller than the threshold are labeled as positive, and all time points with a value smaller than the threshold are labeled as negative.


Then we vary the threshold value from 0 to 100 to obtain the ROC curve. We compare the performance of our proposed method with that of a method without noise filtering, shown in Figure 6, in which the PSD value is used directly as the decision value instead of the percentage value. Table 1 shows that the accuracy achieved by the method with filtering is higher than that of the method without filtering. Another point to note is that the best threshold values do not vary much between recordings: the best threshold value of a is around 10. This is important because it suggests that, in the same recording environment, the best threshold value is almost the same for different recordings, which makes automatic identification of occurrences possible. Even if the environment changes, only a small number of recordings needs to be manually annotated in order to adjust the best threshold value.
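To make the evaluation concrete, here is a small sketch of the threshold sweep and Youden's-index selection described above (a_norm is the normalized decision value per time point and truth the manually annotated ground truth; both names are ours):

```python
import numpy as np

def sweep_thresholds(a_norm, truth, thresholds=np.arange(0, 101)):
    """Sweep the threshold over 0..100, label each time point positive
    when its value is no smaller than the threshold, accumulate the
    ROC points, and pick the best threshold by Youden's index
    J = sensitivity + specificity - 1 = TPR - FPR."""
    tpr, fpr, acc = [], [], []
    for th in thresholds:
        pred = a_norm >= th
        tp = np.sum(pred & truth)
        tn = np.sum(~pred & ~truth)
        fp = np.sum(pred & ~truth)
        fn = np.sum(~pred & truth)
        tpr.append(tp / max(tp + fn, 1))
        fpr.append(fp / max(fp + tn, 1))
        acc.append((tp + tn) / truth.size)
    best = int(np.argmax(np.asarray(tpr) - np.asarray(fpr)))  # Youden's J
    return thresholds[best], acc[best], np.asarray(fpr), np.asarray(tpr)
```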

Figure 7. Comparison of the performance in the presence (red) or absence (blue) of noise filtering.

Figure 6. Flowchart of an identification method without noise filtering.

Table 1. Threshold values and accuracy obtained by the two methods.

                               PSD (Figure 6)        Percentage value a (Figure 2)
                               Value    Accuracy     Value    Accuracy
1st recording (300 seconds)    3        82.1%        11       96.1%
2nd recording (61 seconds)     5        66.3%        9        90.4%
3rd recording (45 seconds)     5        89.6%        10       97.8%

4. Conclusion

Manual annotation of mouse ultrasonic vocalization data requires considerable human effort. We have proposed a method to filter out the noise in vocalization recordings and automatically identify the times when mouse ultrasonic vocalization calls occur. The method can speed up the process of annotating the vocalization data. The noise filtering and accurate identification of call occurrences also advance the development of an automatic call shape extraction algorithm.


References

[1] Panksepp J B, Jochman K A, Kim J U, et al., "Affiliative behavior, ultrasonic communication and social reward are influenced by genetic variation in adolescent mice", PLoS One, 2(4): e351, 2007.
[2] Gourbal B E, Barthelemy M, Petit G, Gabrion C, "Spectrographic analysis of the ultrasonic vocalisations of adult male and female BALB/c mice", Naturwissenschaften, 91(8): 381-385, 2004.
[3] Maggio J C, Whitney G, "Ultrasonic vocalizing by adult female mice (Mus musculus)", J Comp Psychol, 99: 420-436, 1985.
[4] Stowers L, Holy T E, Meister M, Dulac C, Koentges G, "Loss of sex discrimination and male-male aggression in mice deficient for TRP2", Science, 295: 1493-1500, 2002.
[5] Liu R C, Miller K D, Merzenich M M, Schreiner C E, "Acoustic variability and distinguishability among mouse ultrasound vocalizations", J Acoust Soc Am, 114: 3412-3422, 2003.
[6] Whitney G, Nyby J, "Sound communication among adults", in The Auditory Psychobiology of the Mouse, Springfield (Illinois): C. C. Thomas, pp. 98-129, 1983.
[7] Holy T E, Guo Z, "Ultrasonic songs of male mice", PLoS Biology, 3(12): e386, 2005.
[8] Sales G D, "Ultrasound and mating behaviour in rodents with some observations on other behavioural situations", J Zool Lond, 168: 149-164, 1972.
[9] Haack B, Markl H, Ehret G, "Sound communication between parents and offspring", in The Auditory Psychobiology of the Mouse, Springfield (Illinois): C. C. Thomas, pp. 57-97, 1983.
[10] Wysocki C, Nyby J, Whitney G, Beauchamp G, Katz Y, "The vomeronasal organ: primary role in mouse chemosensory gender recognition", Physiol Behav, 29: 315-327, 1982.
[11] Sipos M, Kerchner M, Nyby J, "An ephemeral sex pheromone in the urine of female house mice (Mus domesticus)", Behav Neural Biol, 58: 138-143, 1992.
[12] Hammerschmidt K, Radyushkin K, Ehrenreich H, Fischer J, "Female mice respond to male ultrasonic 'songs' with approach behavior", Biol Lett, 5: 589-592, 2009.
[13] Ehret G, Koch M, Haack B, Markl H, "Sex and parental experience determine the onset of an instinctive behavior in mice", Naturwissenschaften, 74: 47, 1987.
[14] Grimsley J M S, Monaghan J J M, Wenstrup J J, "Development of social vocalizations in mice", PLoS One, 6(3): e17460, 2011.
[15] Arriaga G, Jarvis E D, "Mouse vocal communication system: are ultrasounds learned or innate?", Brain and Language, 124(1): 96-116, 2013.
[16] Arriaga G, Zhou E P, Jarvis E D, "Of mice, birds, and men: the mouse ultrasonic song system has some features similar to humans and song-learning birds", PLoS One, 7(10): e46610, 2012.
[17] Zakaria J, Rotschafer S, Mueen A, et al., "Mining massive archives of mice sounds with symbolized representations", SDM, 2012.
[18] Boersma P, Weenink D, Praat, Phonetic Sciences, University of Amsterdam, http://www.praat.org.
[19] Johnson A M, Doll E J, Grant L M, et al., "Targeted training of ultrasonic vocalizations in aged and Parkinsonian rats", Journal of Visualized Experiments (JoVE), 54, 2011.
[20] Brookes M, "VOICEBOX: speech processing toolbox for MATLAB", software, available [Mar. 2011] from www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html, 1997.
[21] Berouti M, Schwartz R, Makhoul J, "Enhancement of speech corrupted by acoustic noise", Proc IEEE ICASSP, 4: 208-211, 1979.
[22] Martin R, "Noise power spectral density estimation based on optimal smoothing and minimum statistics", IEEE Trans. Speech and Audio Processing, 9(5): 504-512, July 2001.
[23] Gerkmann T, Hendriks R C, "Unbiased MMSE-based noise power estimation with low complexity and low tracking delay", IEEE Trans. Audio, Speech, and Language Processing, 20: 1383-1393, 2012.
