Neural Information Processing Scaled for Bioacoustics -from Neurons to Big Data-
Proceedings of NIPS4B, international workshop joint to NIPS, USA, 2013 Glotin H., LeCun Y., Artières T., Mallat S., Tchernichovski O., Halkias X. Toulon, New-York, Paris
ISBN: 979-10-90821-04-0
http://sabiod.univ-tln.fr
We acknowledge CNRS MASTODONS MI for its support of the SABIOD.ORG project, which co-organized this workshop. We thank BIOTOPE, Dr. Yves Bas, and O. Dufour, who co-organized the Bird challenge of this workshop, and ADEME, which supports O. Dufour's grant. These proceedings were compiled with the help of R. Balestriero, Toulon University. We thank O. Dufour for editing the flyers. Toulon University - LSIS UMR CNRS - DYNI SABIOD, Dec. 2013
Cover legend: Left image: spectrogram of the challenge 2 humpback whale song [from V. Lostanlen, ENS DI, Paris]. Right picture: Minke whale Fourier time-frequency representation, at a 5-minute scale (left) versus a two-year scale (right), showing seasonal effects and a global frequency shift [from L. Kindermann 2013, in this book, section 9.2].
This book is available online at http://sabiod.org/NIPS4B2013_book.pdf
Please cite its content as: "Proc. of Neural Information Processing Scaled for Bioacoustics: from Neurons to Big Data, 2013, Glotin H., LeCun Y., Artières T., Mallat S., Tchernichovski O., Halkias X., joint to NIPS Conf., http://sabiod.org/nips4b, ISBN 979-10-90821-04-0"
BibTeX:
@proceedings{procnips4B2013,
  title        = {Proc. Neural Information Processing Scaled for Bioacoustics, from Neurons to Big Data},
  year         = {2013},
  author       = {Glotin H. and LeCun Y. and Arti\`eres T. and Mallat S. and Tchernichovski O. and Halkias X.},
  organization = {NIPS Int. Conf.},
  address      = {USA},
  note         = {http://sabiod.org/nips4b},
  key          = {ISBN 979-10-90821-04-0}
}
Contents
Video support: most of the talks are available as videos at http://sabiod.univ-tln.fr/nips4b/
Authors list ............................ 7
Chapter 1 Introduction ............................ 9
1.1 Objectives ............................ 11
1.2 Overview of the Bird challenge ............................ 12
1.3 Whale song clustering challenge ............................ 14
1.4 Neurosonar analysis ............................ 15
1.5 Acknowledgements ............................ 20
Chapter 2 Natural Neural Bioacoustic Learning ............................ 21
2.1 Physiological brain processes that underlie song learning ............................ 22
Tchernichovski O.
2.2 Neuroethology of hearing in crickets: embedded neural process to avoid bats ............................ 39
Pollack G.
Chapter 3 Representation for Bioacoustics ............................ 47
3.1 Dynamic time warping and Gaussian process multinomial probit regression for bat call identification ............................ 48
Stathopoulos V., Zamora-Gutierrez V., Jones K., Girolami M.
3.2 Whale song classification using sparse coding ............................ 56
Glotin H., Razik J., Paris S., Adam O., Doh Y.
3.3 Classification of mysticete sounds using machine learning techniques .................69 Halkias X., Paris S., Glotin H.
Chapter 4 Advanced Artificial Neural Net ............................ 77
4.1 Convnets & DNN for bioacoustics ............................ 78
LeCun Y.
4.2 Mapping functional equations to the topology of networks yields a natural interpolation method for time series data...................................... 79 Kindermann L., Lewandowski A.
Chapter 5 Learning to Track by Passive Acoustics ............................ 87
5.1 Mono-channel spectral attenuation modeled by hierarchical neural net estimates hydrophone-whale distance ............................ 88
Doh Y., Glotin H., Razik J., Paris S.
5.2 Physeter localization: sparse coding & Fisher vectors ............................ 97
Paris S., Glotin H., Doh Y., Razik J.
5.3 Range-depth tracking of multiple sperm whales over large distances using a two element vertical array and rhythmic properties of click-trains ....................................................................................................103 Mathias D., Thode A., Straley J., Andrews R., Le Bot O., Gervaise C., Mars J.
5.4 Optimization of Levenberg-Marquardt 3D biosonar tracking ........................... 108
Mishchenko A., Giraudet P., Glotin H.
5.5 Data-driven approaches for identifying information-bearing features in communication calls ............................ 116
Elie J., Theunissen F.
Chapter 6 Non Human Speech Processing ............................ 133
6.1 Gabor Scalogram Reveals Formants in High-Frequency Dolphin Clicks ............................ 134
Trone M., Balestriero R., Glotin H.
6.2 Supervised classification of baboon vocalizations ............................ 143
Janvier M., Horaud R., Girin L., Berthommier F., Boe L., Kemp C., Rey A., Legou T.
6.3 Software tools for analyzing mice vocalizations with applications to pre-clinical models of human disease ............................ 153
Shokoohi-Yekta M., Zakaria J., Rotschafer S., Mirebrahim H., Razak K., Keogh E.
Chapter 7 Bird Song Classification Challenge ............................ 163
7.1 Multi-instance multi-label acoustic classification of a plurality of animals: birds, insects & amphibians ............................ 164
Dufour O., Glotin H., Bas Y., Artieres T., Giraudet P.
7.2 Bird song classification in field recordings: winning solution for NIPS4B 2013 competition ........................................................176 Lasseck M.
7.3 Feature design for multilabel bird song classification in noise (NIPS4B challenge) ............................ 182
Stowell D., Plumbley M.
7.4 Learning multi-labeled bioacoustic samples with an unsupervised feature learning approach ........................................................................ 184 Mencia E., Nam J., Lee D.
7.5 Ensemble logistic regression and gradient boosting classifiers for multilabel bird song classification in noise (NIPS4B challenge)....................... 190
Massaron L.
7.6 A novel approach based on ensemble learning to NIPS4B challenge .............. 195 Chen W., Zhao G., Li X.
Chapter 8 Whale Song Clustering ............................ 199
8.1 Analyzing the temporal structure of sound production modes within humpback whale sound sequences ............................ 200
Mercado III E.
8.2 Unsupervised whale song decomposition with Bayesian non-parametric Gaussian mixture ............................ 205
Bartcus M., Chamroukhi F., Razik J., Glotin H.
8.3 Classifying humpback whale sound units by their vocal physiology, including chaotic features ............................ 212
Cazau D., Adam O.
8.4 Gabor scalogram for robust whale song representation ........................................218 Balestriero R., Glotin H.
8.5 Automatic analysis of a whale song ............................ 227
Potamitis I., Ntalampiras S.
Chapter 9 Big Bioacoustic Data ............................ 237
9.1 Cabled observatory acoustic data: challenges and opportunities ............................ 238
Hoeberechts M.
9.2 A challenge for computational bioacoustics................................................................246 Kindermann L.
Annex: Schedule ............................ 255
Authors list (not including all participants)
5.5 Data-driven approaches for identifying information-bearing features in communication calls
Julie E. Elie and Frédéric E. Theunissen, UC Berkeley, Dept. of Psychology and Helen Wills Neuroscience Institute.
Bioacousticians have traditionally investigated the acoustical nature of information-bearing features in communication calls by describing sounds with a small number of acoustical parameters that appear particularly salient (e.g. mean frequency, duration, spectral balance). These measures are then used as inputs to linear discriminant analysis (LDA) or other supervised learning approaches to investigate which acoustic parameters drive sound categorization. This classical approach is computationally efficient and yields easily interpretable results. However, it is limited by the a priori choice of the putative information-bearing features: as long as the sound representation is not complete, one cannot determine whether the correct information-bearing features have been identified and, thus, whether the actual amount of information present in the calls (measured, for example, as the quality of a discrimination test) is correctly estimated.

To address this issue, we have adopted a data-driven approach. In the traditional approach, specific acoustical parameters are chosen for two reasons: dimensionality reduction, and the implementation of a non-linear transformation that may be required for linear discriminant approaches to discriminate effectively among sound categories. These two steps, however, can be implemented without a priori assumptions or loss of information. In our approach, the non-linear transformation is an invertible spectrographic representation of the sound. Before classification, the dimension of this high-dimensional representation is reduced using principal component analysis (PCA). For this approach to work, the spectrograms of the sounds must be aligned, and cross-validation techniques that evaluate both the number of PCs retained in the PCA and the number of parameters in the classifier must be implemented to prevent over-fitting. In our approach, spectrogram alignment was based on the cross-correlation between amplitude envelopes. We then used proven cross-validation techniques to prevent over-fitting: the bootstrap for LDA and the Random Forest algorithm for non-linear tree classifiers. Finally, we compared the classification performance of the two classifiers on this sparse spectrographic representation of the sound (PCA on the spectrogram) with that obtained for two other feature spaces: Mel-frequency cepstral coefficients (MFCC) and the modulation power spectrum (MPS).
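As a concrete illustration of this pipeline, here is a minimal Python sketch (not the authors' code): spectrograms are aligned by cross-correlating amplitude envelopes, reduced by PCA, and classified with a cross-validated Random Forest. The variables `calls` and `labels`, the sampling rate, the spectrogram settings, and the crude roll-based alignment are all hypothetical placeholders.

```python
# Minimal sketch of the data-driven pipeline described above:
# spectrogram -> envelope-based alignment -> PCA -> Random Forest.
# Audio I/O, variable names and parameters are hypothetical.
import numpy as np
from scipy.signal import spectrogram, correlate
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FRAME = 256  # hop used both for the envelope and the crude shift

def amplitude_envelope(x, frame=FRAME):
    # Short-term RMS amplitude envelope of the waveform.
    n = len(x) // frame
    return np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))

def align_lag(ref_env, env):
    # Lag (in frames) maximizing the cross-correlation of envelopes.
    c = correlate(env, ref_env, mode="full")
    return np.argmax(c) - (len(ref_env) - 1)

def spectrogram_features(calls, sr, ref_env):
    # Align each call to a reference envelope, then flatten its
    # log-spectrogram into one feature vector (assumes the calls were
    # already padded/cropped to a common duration).
    feats = []
    for x in calls:
        lag = align_lag(ref_env, amplitude_envelope(x))
        x = np.roll(x, -lag * FRAME)  # crude stand-in for alignment
        _, _, S = spectrogram(x, fs=sr, nperseg=256, noverlap=128)
        feats.append(np.log(S + 1e-10).ravel())
    return np.vstack(feats)

# calls: list of equal-length waveforms; labels: call-type strings
# (hypothetical; the database described here has 1275 zebra finch calls).
# X = spectrogram_features(calls, sr=44100, ref_env=amplitude_envelope(calls[0]))
# Z = PCA(n_components=50).fit_transform(X)   # 50 PCs, as in the best model
# clf = RandomForestClassifier(n_estimators=500)
# print(cross_val_score(clf, Z, labels, cv=5).mean())
```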
These approaches were tested on a unique database of 1275 zebra finch communication calls obtained in our laboratory. This database includes many exemplars of all the call types in the zebra finch repertoire, from a large number of male and female birds. Our algorithms were used to determine the discriminability of vocal types. PCA on the spectrogram yielded the best feature space: these features are both easy to interpret and give higher classifier performance. The best results were obtained with the Random Forest algorithm and a PCA spectrographic representation using 50 PCs; we show that the 11 distinct call types in the zebra finch repertoire, irrespective of the identity of the vocalizing bird, could be classified with 83.1% accuracy. This classification performance is significantly higher than what one could obtain from either of the two other sound representations tested with the Random Forest algorithm (MFCC, 53.1% accuracy; MPS, 63.5% accuracy). Moreover, the Random Forest yielded better classification performance than LDA, irrespective of the feature space used. In conclusion, the data-driven approach using PCA on the spectrogram showed superior results, and we propose that it could be used both for investigating behavioral and neural mechanisms of sound discrimination and for vocalization-based identification of species in ecological or environmental studies.
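For completeness, a hedged sketch of the feature-space comparison reported above: the same cross-validated Random Forest evaluated on MFCC features alongside the PCA-spectrogram features (`Z`) from the previous sketch. librosa is assumed only for MFCC extraction; the time-averaged MFCC summary and all variable names are illustrative, and the MPS feature space is omitted.

```python
# Sketch of the feature-space comparison: Random Forest accuracy on
# PCA-spectrogram vs. MFCC features. All names are illustrative.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def mfcc_features(calls, sr, n_mfcc=13):
    # Mean and std of the MFCCs over time: one common way to summarize
    # variable-length calls with a fixed-length vector.
    feats = []
    for x in calls:
        m = librosa.feature.mfcc(y=x.astype(float), sr=sr, n_mfcc=n_mfcc)
        feats.append(np.concatenate([m.mean(axis=1), m.std(axis=1)]))
    return np.vstack(feats)

# Compare cross-validated accuracy on each feature space
# (Z, calls, labels as in the previous sketch):
# for name, F in [("PCA-spectrogram", Z), ("MFCC", mfcc_features(calls, 44100))]:
#     acc = cross_val_score(RandomForestClassifier(500), F, labels, cv=5).mean()
#     print(name, acc)
```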
Motivation
• Classical approaches in bioacoustics and sound analyses use simple (ad hoc) features.
– Hyena vocalizations
• Perception and neural representation.
Individual signature in the hyena giggle sounds.
33% correct
Mathevon et al., BMC Ecology 2010
The complete zebra finch repertoire is complex.
[Figure: spectrograms, frequency (kHz) vs. time, of zebra finch call types, with panel labels including Distance, Nest, Long Tonal call, Begging, Tet, Tuck, and Call.]
AAA Begging 0!"#%123%45-&.677(89:&9%#%,-;6#"7"6*(%?%@=A>AAA 0!"#%123%45-&.677(89:&9%#%,-;6#"7"6*(%?%@=A>AAA Tet 0!"#%123%45-&.677(89$&:%;%;6#"7"6*(%?@%A=B?BBB 0!"#%12%34-&.566(78$&3%#%,-95#"6"5*(%:;%?;@=@@@ 0!"#%123%45-&.677(89"&:%#%,-;6#"7"6*(%?@%A=3?333 0!"#%11%23-&.455(678&2%#%,-94#"5"4*(%:;AAA uck 0!"#%123%45-&.677(89:&;%2@222 Call E222 C??? 1::: 8 D@@@