Monaural Blind Source Separation in the context of Vocal Detection
Bernhard Lehner, Gerhard Widmer
Department of Computational Perception, JKU Linz
[email protected]
Introduction
- We evaluate the usefulness of four monaural blind source separation (BSS) methods in the context of vocal detection (VD).
- BSS methods: Adaptive REpeating Pattern Extraction Technique (aREPET), Kernel Additive Modelling (KAM), Flexible Audio Source Separation Toolbox (FASST), Robust Principal Component Analysis (RPCA).
- First set of experiments: What is the best strategy to utilise BSS as pre-processing to improve VD?
- Second experiment: Can we improve BSS estimates by post-processing them according to the VD output?
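To illustrate the idea behind one of the methods: RPCA-style separation decomposes a magnitude spectrogram into a low-rank part (repetitive accompaniment) plus a sparse part (vocals). The following is a minimal NumPy sketch of that low-rank + sparse decomposition via alternating truncated SVD and soft-thresholding; it is a simplified stand-in, not the exact RPCA algorithm evaluated in the poster, and all names and parameters are our own illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink entries towards zero; only strong outliers survive (sparse part)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lowrank_sparse(M, rank=1, tau=0.5, iters=20):
    """Alternate a rank-truncated SVD (low-rank 'accompaniment') with
    soft-thresholding of the residual (sparse 'vocal' component)."""
    S = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        S = soft_threshold(M - L, tau)
    return L, S

# Toy magnitude "spectrogram": rank-1 repetitive background + a sparse burst.
rng = np.random.default_rng(0)
background = np.outer(rng.random(64), rng.random(200))
vocals = np.zeros((64, 200))
vocals[10, 50:60] = 5.0
M = background + vocals
L, S = lowrank_sparse(M, rank=1, tau=0.5)
```

After a few iterations, `S` concentrates on the burst while `L` recovers the repetitive background, which is the intuition behind using RPCA to isolate singing voice.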
Strategies to improve VD
- Foreground Separation: features only from the estimated vocals.
- Foreground Concatenation: features from both the mixed and the estimated vocals (yields a double-sized feature vector).
- Foreground Enhancement: remixing the estimated vocals with the original audio signal before feature extraction.
- We compare two state-of-the-art feature sets (IC14 and OS11) by feeding them to Random Forest (RF) and Support Vector Machine (SVM) classifiers.

Results
BSS for Pre-processing to improve VD:

Internal Data Set (framesize = 200 ms)
                 RF                           SVM
                 accuracy     F-measure       accuracy     F-measure
                 IC14  OS11   IC14  OS11      IC14  OS11   IC14  OS11
MIX              .837  .795   .846  .814      .855  .807   .863  .819
VOC              .916  .910   .920  .905      .939  .949   .943  .951
aREPETmix        .768  .756   .800  .781      .783  .742   .797  .789
aREPETsep        .841  .796   .850  .810      .861  .811   .866  .822
FASSTmix         .732  .670   .682  .603      .751  .711   .756  .686
FASSTsep         .826  .778   .835  .795      .845  .791   .854  .803
KAMmix           .752  .736   .773  .738      .631  .577   .728  .709
KAMsep           .826  .786   .835  .798      .849  .805   .855  .815
RPCAmix          .752  .691   .788  .763      .620  .563   .704  .703
RPCAsep          .845  .797   .851  .809      .861  .820   .867  .828

Table: Results of Foreground Separation. MIX: trained and tested with mixed audio; VOC: trained with mixed audio, tested with pure vocals. METHODmix: trained with mixed audio, tested with separated vocals; METHODsep: trained and tested with separated vocals.

Internal Data Set (framesize = 200 ms)
                 RF                           SVM
                 accuracy     F-measure       accuracy     F-measure
                 IC14  OS11   IC14  OS11      IC14  OS11   IC14  OS11
MIX              .837  .795   .846  .814      .855  .807   .863  .819
MIX+VOC          .960  .985   .962  .986      .976  .984   .977  .985
MIX+aREPET       .845  .800   .853  .817      .865  .825   .872  .834
MIX+FASST        .842  .798   .850  .816      .863  .825   .871  .835
MIX+KAM          .844  .800   .853  .815      .871  .830   .877  .839
MIX+RPCA         .850  .806   .858  .822      .870  .833   .877  .841

Table: Results of Foreground Concatenation. The classifier is given a double-sized vector containing the features from the mixed and the separated audio signal. MIX+VOC: concatenating features from the real vocals to simulate perfect separation.

Internal Data Set (framesize = 200 ms)
                 RF                           SVM
                 accuracy     F-measure       accuracy     F-measure
                 IC14  OS11   IC14  OS11      IC14  OS11   IC14  OS11
MIX              .837  .795   .846  .814      .855  .807   .863  .819
VOC −6dB         .880  .861   .886  .868      .907  .869   .911  .874
VOC 6dB          .937  .943   .940  .944      .960  .945   .961  .946
aREPET −6dB      .844  .792   .852  .809      .862  .807   .869  .818
aREPET 6dB       .845  .799   .854  .813      .867  .813   .874  .823
FASST −6dB       .844  .795   .852  .812      .861  .805   .868  .817
FASST 6dB        .844  .799   .852  .815      .864  .811   .871  .822
KAM −6dB         .845  .801   .854  .816      .866  .815   .873  .825
KAM 6dB          .845  .803   .854  .816      .870  .821   .876  .829
RPCA −6dB        .847  .803   .855  .817      .868  .817   .874  .826
RPCA 6dB         .850  .809   .858  .821      .873  .821   .878  .829

Table: Results of Foreground Enhancement. The classifier is given the features extracted from a signal where the separated vocals are remixed with the original audio signal. VOC: using the real vocals instead of the separated ones.

Utilising VD to improve BSS
- Vocals: mute the vocal estimate at non-vocal parts.
- Background: select the original signal at non-vocal parts.
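The Foreground Enhancement and Foreground Concatenation strategies from the pre-processing experiments can be sketched in a few lines of NumPy. The function names and the interpretation of the ±6 dB condition as a linear gain on the vocal estimate are our assumptions for illustration, not the poster's implementation.

```python
import numpy as np

def enhance_foreground(mix, vocal_estimate, gain_db):
    """Foreground Enhancement: remix the vocal estimate with the original
    signal, scaling the estimate by gain_db (e.g. -6 dB or +6 dB)."""
    gain = 10.0 ** (gain_db / 20.0)
    return mix + gain * vocal_estimate

def concat_features(features_mix, features_vocals):
    """Foreground Concatenation: one double-sized vector built from the
    features of the mixed and the separated signal."""
    return np.concatenate([features_mix, features_vocals])

# Toy example: 1 s of audio at 8 kHz; pretend the 440 Hz tone is the vocal.
t = np.linspace(0, 1, 8000, endpoint=False)
mix = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
vocal_est = 0.5 * np.sin(2 * np.pi * 440 * t)     # stand-in for a BSS estimate
enhanced = enhance_foreground(mix, vocal_est, gain_db=6.0)

# A hypothetical 30-dimensional feature set per signal -> 60-dim input vector.
double_vec = concat_features(np.zeros(30), np.ones(30))
```

Features (IC14 or OS11) would then be extracted from `enhanced` instead of `mix`, or the classifier would be fed `double_vec` directly.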
VD for Post-processing to improve BSS:

Figure: Example of RPCA-separated singing voice. The upper subplot shows the mixed signal (grey) and the embedded vocals (black). The lower subplot shows the vocals estimated by RPCA (grey) and the vocal estimates partially muted according to our VD (black).

Figure: RPCA vocal estimation evaluation results. A: raw RPCA output; B: VD-post-processed output; C: post-processed using the ground truth. The global measure OPS indicates better performance for the post-processed output. The higher performance regarding interferences (IPS) is caused by the parts that are muted when our VD classifies them as non-vocal.

Discussion
- Website with examples: www.cp.jku.at/misc/ismir2015bss/
- All four BSS methods show very similar characteristics regarding the (only limited) improvement of VD results.
- However, by utilising the VD output, we could improve the BSS estimates for both the vocals (useful for artist recognition) and the background (useful for karaoke track creation).
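The VD-based post-processing (mute the vocal estimate at non-vocal parts; fall back to the original signal for the background) can be sketched as follows. This is a minimal sketch assuming one boolean VD decision per fixed-length frame; the function names and interface are our illustration, not the paper's code.

```python
import numpy as np

def mute_nonvocal(vocal_estimate, vd_labels, frame_len):
    """Zero out the vocal estimate in frames the VD marks as non-vocal.
    vd_labels holds one boolean per frame (True = vocal)."""
    out = vocal_estimate.copy()
    for i, is_vocal in enumerate(vd_labels):
        if not is_vocal:
            out[i * frame_len:(i + 1) * frame_len] = 0.0
    return out

def background_from_vd(mix, background_estimate, vd_labels, frame_len):
    """Use the original signal at non-vocal parts, the BSS background otherwise."""
    out = background_estimate.copy()
    for i, is_vocal in enumerate(vd_labels):
        if not is_vocal:
            out[i * frame_len:(i + 1) * frame_len] = mix[i * frame_len:(i + 1) * frame_len]
    return out

# Toy example: two 4-sample frames, the second classified as non-vocal.
vocal_est = np.ones(8)
mix = np.full(8, 2.0)
bg_est = np.zeros(8)
vd = [True, False]
muted_vocals = mute_nonvocal(vocal_est, vd, frame_len=4)
background = background_from_vd(mix, bg_est, vd, frame_len=4)
```

Muting per VD decision trivially removes interference in non-vocal frames, which is consistent with the higher IPS reported for the post-processed output.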