Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 126 (2018) 363–372
22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems
A Comparative Study of Blind Source Separation for Bioacoustics Sounds based on FastICA, PCA and NMF
Norsalina Hassan a, Dzati Athiar Ramli b,*
a,b School of Electrical & Electronic Engineering, Universiti Sains Malaysia, Nibong Tebal, 14300, Penang, Malaysia
Abstract
Blind Source Separation (BSS) is the task of separating a set of source signals from a mixed signal with little or no information about either the sources or the mixing process. This paper addresses the problem of BSS in bio-acoustic mixed signals. In a noisy acoustic environment, animal species recognition based on vocalization remains a challenging task. In order to robustly recognize a specific species, the source signals of interest need to be separated from the mixed signals. This separation process is a significant pre-processing step before the recognition process takes place. In this paper, three different source separation algorithms, namely Fast Fixed-Point Independent Component Analysis (FastICA), Principal Component Analysis (PCA) and Non-Negative Matrix Factorization (NMF), are implemented. In this experiment, mixtures of frog sound signals are used as input. The quality of the source signals separated by the FastICA, PCA and NMF algorithms is compared and evaluated according to the BSS_EVAL toolbox metrics. These metrics consist of the signal to distortion ratio (SDR), signal to interference ratio (SIR) and signal to artifacts ratio (SAR). The results show that FastICA with the negentropy technique for finding a maximum of non-gaussianity has the best performance in separating the mixed signals.
© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of KES International.
Keywords: Frog Sound Signal Separation; Blind Source Separation (BSS); FastICA; NMF
1. Introduction
Frogs are among the most vocal of animals and generally communicate using sounds to convey warnings, attract mates and defend their territories against opponents [1].
__________
* Corresponding author. Tel.: +6-004-559-5999; fax: +6-004-594-1023.
E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
10.1016/j.procs.2018.07.270
The sound produced by frogs can be used as a species identifier. Each species has a unique call that carries sufficient individual information, making it feasible for humans to detect the presence of the species in a particular area [2]. Species identification based on vocalization is considered valuable for biological research and environmental monitoring applications [3]. Frogs usually interact acoustically in large social aggregations, comprising hundreds of males forming large breeding choruses at potential breeding sites. The high noise levels and temporally overlapping sound signals within frog choruses therefore interfere with the ability of listeners to detect, recognize and discriminate among vocalizations [4]. For several applications, such as automatic sound recognition, it is essential to separate the target source sounds from the complex acoustic environment as a pre-processing step, before they are used as input to the system, in order to increase performance.
In recent years, sound separation has received much attention in signal processing research, and one common method is Blind Source Separation (BSS). BSS is the procedure of estimating the original sources from signal mixtures. A typical example of a BSS case is the well-known 'cocktail party problem', the ability to focus on a specific human voice while filtering out other voices or background noise. Many BSS methods have been proposed that aim to provide a solution to the cocktail party problem [5]. Widely used techniques for solving BSS include Independent Component Analysis (ICA), Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF). ICA-based separation techniques are among the dominant successful BSS techniques [6]. ICA has diverse applications, including speech-music separation [7], speech separation [8][9], power system analysis [10], face recognition [11] and biomedical signal analysis [12][13].
FastICA [14], Infomax [5] and JADE [15] are among the algorithms available to compute the independent components. The fast fixed-point algorithm (FastICA) is the most popular due to its fast convergence and good separation [16]. While ICA exploits higher-order statistics of the data, PCA considers only second-order statistics to solve the BSS problem [17]. Non-negative matrix factorization (NMF), on the other hand, has been shown to produce good results in monaural separation [18][19][20], multichannel source separation [21], source separation in automatic speech recognition [22] and music analysis [23]. Although the use of ICA, PCA and NMF for BSS is well studied, they have not yet been reported for bio-acoustic sound separation.
The ICA method works in the time domain and estimates the sources and the mixing matrix by finding components that are statistically independent, whereas the NMF method works in the frequency domain and enforces a non-negativity constraint on the original sources and their mixing components. The distinction between PCA in the frequency domain and in the time domain lies in how the eigenvalues are determined: in the time domain, the correlation matrix is employed; in the frequency domain, the Fourier transform of the correlation matrix (the spectral density matrix) is employed. These differences have significant implications for the study presented in this paper.
This paper compares the ICA, PCA and NMF techniques and their performance in the context of frog sound separation. To the best of our knowledge, no comprehensive work has been dedicated to source separation of frog sounds. The basic principles of the ICA, PCA and NMF based BSS techniques are described in what follows.
2. Methodology
2.1. ICA approach for source separation
ICA is a statistical analysis method for Blind Source Separation [8].
The term blind is used here because there is no explicit knowledge of the source signals or the mixing system besides the mixtures. ICA is formulated as the extraction of independent original signals from the mixtures observed at multiple sensors. Imagine a room with two persons and two sensors (i.e. microphones) for recording. When the two persons speak at the same time, each sensor registers a particular linear combination of the two signals. As in Fig. 1, if the two original signals are denoted by $S_1$ and $S_2$, then their linear mixtures $X_1$ and $X_2$ can be expressed mathematically as:
$$X_1 = A_{11} S_1 + A_{12} S_2 \tag{1}$$
$$X_2 = A_{21} S_1 + A_{22} S_2 \tag{2}$$
where $A_{11}$, $A_{12}$, $A_{21}$ and $A_{22}$ are the elements of the mixing matrix that generates $X$ from $S$. ICA aims to recover the original signals $S$ from the signal observations $X$ alone, without specific prior information about the sources or the mixing system.
[Figure: block diagram. Source signals $S_1$, $S_2$ pass through the mixing system ($A_{11}$, $A_{12}$, $A_{21}$, $A_{22}$) to give the observed signals $X_1$, $X_2$, which pass through the separation system ($W_{11}$, $W_{12}$, $W_{21}$, $W_{22}$) to give the separated/estimated signals $Y_1$, $Y_2$.]
Fig. 1 Basic BSS
Using vector-matrix notation, the mixing process in equations (1) and (2) can be modelled as
$$X = AS \tag{3}$$
where $X$ and $S$ are random vectors and $A$ is the matrix of parameters. The model in equation (3) is known as the independent component analysis (ICA) model. It is a generative model that describes how the source signals, or independent components, $S$ are computed by estimating the mixing matrix $A$ from the known random vector $X$. After $A$ is determined, the demixing matrix $W = A^{-1}$ is computed. The estimated sources can then be recovered as $S = WX$ (where $S$ is the real independent component), with the non-gaussianity maximized to achieve independence of the sources [24]. In this work, negentropy and kurtosis are used to measure the non-gaussianity of the sources. The details of the algorithm can be found in [24].
2.1.1. Data pre-processing for ICA
Before applying a specific ICA algorithm to the signals, it is useful to pre-process the observed data in order to reduce the complexity of the ICA implementation. Common pre-processing involves signal centering and whitening.
Centering
Centering is achieved simply by subtracting the mean $E(x)$ of the signal from each reading of that signal. For an observed variable $x$, the centered observation vector $x_i$ is obtained as
$$x_i = x - E(x) \tag{4}$$
This ensures that the centered variable has zero mean. Taking expectations on both sides of $X = AS$ implies that $S$ is zero-mean as well. Once the mixing matrix $A$ is estimated with the centered data, the actual estimates of the independent components are obtained as follows:
$$S = A^{-1}(x_i + E(x)) \tag{5}$$
Whitening
After centering, the observed vector $x$ is linearly transformed into a new vector that is white, i.e. its components are uncorrelated and have unit variance. If $Z$ denotes the whitened vector, it satisfies the following equation:
$$E[ZZ^T] = I \tag{6}$$
where $E[ZZ^T]$ is the covariance matrix of $Z$. One simple method for the whitening transformation is to use the eigenvalue decomposition (EVD) of the covariance matrix of $x$, $E\{xx^T\} = YDY^T$, where $Y$ is the matrix of eigenvectors of $E\{xx^T\}$ and $D$ is the diagonal matrix of eigenvalues, $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$. The observation vector can be whitened by the following transformation:
$$Z = Y D^{-1/2} Y^T x \tag{7}$$
where the matrix $D^{-1/2}$ is obtained by a simple component-wise operation as $D^{-1/2} = \mathrm{diag}(d_1^{-1/2}, d_2^{-1/2}, \ldots, d_n^{-1/2})$. Whitening transforms the mixing matrix into a new one, $A_w$, which is orthogonal:
$$Z = Y D^{-1/2} Y^T A s = A_w s \tag{8}$$
Hence, for independent sources with unit variance, so that $E\{ss^T\} = I$,
$$E\{ZZ^T\} = A_w E\{ss^T\} A_w^T = A_w A_w^T = I \tag{9}$$
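The centering and whitening steps of equations (4), (6) and (7) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' Matlab code: the function names `center` and `whiten`, the synthetic Laplace-distributed sources and the mixing matrix `A` are all assumptions made for the example.

```python
import numpy as np

def center(x):
    """Subtract the per-channel mean, as in equation (4). x: (channels, samples)."""
    mean = x.mean(axis=1, keepdims=True)
    return x - mean, mean

def whiten(x_centered):
    """EVD whitening, as in equation (7): Z = Y D^{-1/2} Y^T x."""
    n_samples = x_centered.shape[1]
    cov = x_centered @ x_centered.T / n_samples   # sample estimate of E{x x^T}
    d, Y = np.linalg.eigh(cov)                    # eigenvalues d, eigenvectors Y
    V = Y @ np.diag(d ** -0.5) @ Y.T              # whitening matrix
    return V @ x_centered, V

# Illustrative data (assumption): two synthetic sources, an arbitrary mixing
# matrix and a constant offset so that centering has something to remove.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s + np.array([[2.0], [-1.0]])

xc, mean = center(x)
z, V = whiten(xc)
cov_z = z @ z.T / z.shape[1]   # should equal the identity, as in equation (6)
```

Because the whitening matrix is built from the same sample covariance it is applied to, `cov_z` matches the identity up to floating-point error.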
Whitening has the advantage of roughly halving the number of parameters to be estimated. Instead of the $n^2$ elements of the original matrix $A$, only the new orthogonal mixing matrix $A_w$ needs to be estimated, and it has only $n(n-1)/2$ degrees of freedom. Data whitening thus reduces the computational complexity of ICA and increases the likelihood of successful signal recovery.
2.1.2. FastICA algorithms
The FastICA algorithm proposed in [14] is based on a fixed-point iteration scheme for finding a maximum of the non-gaussianity of $w^T x$. The resulting FastICA learning rule finds a direction, i.e. a unit vector $w$, such that the projection $w^T x$ maximizes non-gaussianity. It is assumed that the data have been pre-processed by centering and whitening as discussed in the preceding section. The one-unit FastICA algorithm, applied to the data samples after whitening, consists of the following steps [14]:
1. Initialize the weight vector $w$ (e.g. randomly).
2. Iterate: $w^+ = E\{x\,\alpha(w^T x)\} - E\{\alpha'(w^T x)\}\,w$
3. Divide $w^+$ by its norm: $w = w^+ / \lVert w^+ \rVert$
4. If not converged, go back to step 2.
Here $w$ is a column vector of the unmixing matrix, $w^+$ is a temporary variable used to calculate $w$, $\alpha'$ is the derivative of $\alpha$ and $E(\cdot)$ is the expected value (mean). In this work, the FastICA algorithm is applied to two different frog sounds. A random mixing matrix was specified to mix the sound source signals. The mixtures are then pre-processed and separated to recover the estimated sources. Fig. 2 presents a flowchart of FastICA.
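The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the paper: the nonlinearity $\alpha = \tanh$ (a common negentropy-based choice), the deflation step that orthogonalizes each new vector against those already found, the fixed mixing matrix `A` and the synthetic sources standing in for frog recordings are all choices made for illustration.

```python
import numpy as np

def whiten(x):
    """Center x (channels x samples) and whiten it by EVD."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, Y = np.linalg.eigh(xc @ xc.T / xc.shape[1])
    return Y @ np.diag(d ** -0.5) @ Y.T @ xc

def fastica(z, n_components, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA repeated with deflation; alpha = tanh is assumed."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, z.shape[0]))
    for p in range(n_components):
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)                                  # step 1
        for _ in range(max_iter):
            y = w @ z                                           # projections w^T z
            a = np.tanh(y)                                      # alpha(w^T z)
            a_prime = 1.0 - a ** 2                              # alpha'(w^T z)
            w_new = (z * a).mean(axis=1) - a_prime.mean() * w   # step 2
            w_new -= W[:p].T @ (W[:p] @ w_new)                  # deflation (assumed)
            w_new /= np.linalg.norm(w_new)                      # step 3
            done = abs(abs(w_new @ w) - 1.0) < tol              # step 4: same direction?
            w = w_new
            if done:
                break
        W[p] = w
    return W

# Synthetic stand-ins for two sources (assumption): a sinusoid and Laplace noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 8000)
s = np.vstack([np.sin(2 * np.pi * 5 * t), rng.laplace(size=t.size)])
A = np.array([[1.0, 0.5], [0.7, 1.0]])   # fixed here for reproducibility
x = A @ s                                 # linear mixtures, X = AS
z = whiten(x)
W = fastica(z, n_components=2)
y = W @ z                                 # estimated sources
# Absolute correlation between each true source (rows) and each estimate (columns)
match = np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:])
```

The convergence test compares directions rather than vectors because the fixed-point update can flip the sign of $w$; the recovered sources are likewise defined only up to sign, scale and ordering.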
[Figure: flowchart. Start; center and whiten the mixed data $x$; choose an initial weight vector $w$; update $w$ by maximizing non-gaussianity (kurtosis or negentropy); normalize $w$; if $w$ does not point in the same direction as before (No), repeat the update; otherwise (Yes), stop.]
Fig. 2 Flowchart of FastICA
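For reference, the two non-gaussianity measures named in the flowchart are usually defined as below for a zero-mean variable $y$. These are the standard definitions from the ICA literature, given here as a reminder; the paper does not state which contrast function $G$ or constant was used, so $G$, $a$ and $c$ are assumptions.

```latex
\begin{align}
\operatorname{kurt}(y) &= E\{y^4\} - 3\left(E\{y^2\}\right)^2 \\
J(y) &= H(y_{\mathrm{gauss}}) - H(y) \\
J(y) &\approx c\,\bigl[E\{G(y)\} - E\{G(\nu)\}\bigr]^2,
\qquad G(u) = \tfrac{1}{a}\log\cosh(a u)
\end{align}
```

Here $H$ is the differential entropy, $y_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as $y$, $\nu$ is a standard Gaussian variable and $c$ is a positive constant. In the fixed-point update, kurtosis corresponds to the nonlinearity $\alpha(u) = u^3$, while the log-cosh approximation of negentropy corresponds to $\alpha(u) = \tanh(a u)$.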
2.2. PCA approach for source separation
PCA was originally developed by Karl Pearson in 1901 and has since been commonly used in signal processing for separating linear combinations of signals [25]. PCA reduces the dimensionality of multi-dimensional data by projecting it onto the directions of maximum variance. It simplifies a statistical problem by summarizing the sample characteristics, determining the elements with the highest variance. The main steps of the PCA algorithm are as follows:
1. Center the input signal.
2. Calculate the eigenvectors and eigenvalues of the covariance matrix using singular-value decomposition (SVD). The SVD produces a diagonal matrix S, with the same dimensions as the input signal X and non-negative diagonal elements, together with unitary matrices U and V such that X = U * S * V'.
3. Compute the principal components as Zpca = S * V'.
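The three PCA steps can be sketched with NumPy's SVD as follows. This is an illustrative sketch, not the authors' Matlab code: the function name `pca_svd`, the synthetic sources and the mixing matrix are assumptions, and the economy-size SVD is used so that `S` is square rather than the same size as `X`.

```python
import numpy as np

def pca_svd(x):
    """PCA of x (channels x samples) following steps 1 to 3."""
    xc = x - x.mean(axis=1, keepdims=True)               # step 1: centering
    U, sv, Vt = np.linalg.svd(xc, full_matrices=False)   # step 2: X = U * S * V'
    z_pca = np.diag(sv) @ Vt                             # step 3: Zpca = S * V'
    return z_pca, U, sv

# Illustrative mixtures (assumption): two synthetic sources, arbitrary mixing matrix.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 4000))
x = np.array([[1.0, 0.5], [0.7, 1.0]]) @ s

z_pca, U, sv = pca_svd(x)
cov = np.cov(x, bias=True)              # covariance matrix of the mixtures
eigvals = sv ** 2 / x.shape[1]          # eigenvalues recovered from singular values
```

The rows of `z_pca` are mutually uncorrelated, which is all that second-order statistics can guarantee; they are in general still rotated combinations of the sources, consistent with the weaker separation reported for PCA.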
2.3. NMF approach for source separation
Non-negative matrix factorization (NMF) is a technique for factorizing data under non-negativity constraints [22]. It can be written as
$$V \approx WH \tag{10}$$
where $V \in \mathbb{R}^{F \times T}$ is the audio spectrogram of the original non-negative data, $W \in \mathbb{R}^{F \times K}$ is a matrix in which each column is a basis vector, and $H \in \mathbb{R}^{K \times T}$ is the matrix of activations (weights or gains), in which each row gives the gain of the corresponding basis vector. Matrix $V$ is thus approximated by the basis matrix $W$ weighted by the components of $H$. The aim of NMF is to factorize $V$ into the product $WH$ in order to achieve source separation; $W$ and $H$ are chosen so as to minimize the reconstruction error.
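A small sketch of how such a pair $W$, $H$ can be found is given below. The paper does not state which cost function or update rule was used, so the Euclidean cost with multiplicative updates (the Lee and Seung scheme) is an assumption here, as are the function name `nmf` and the random test matrix.

```python
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Factorize non-negative V (F x T) as W (F x K) times H (K x T).

    Multiplicative updates for the Euclidean cost ||V - WH||^2 (assumed choice).
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

# Illustrative non-negative data of rank 2 (assumption), standing in for a spectrogram.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))

W1, H1 = nmf(V, K=2, n_iter=1)      # after a single update
W, H = nmf(V, K=2, n_iter=200)      # same start, after 200 updates
err_start = np.linalg.norm(V - W1 @ H1)
err_end = np.linalg.norm(V - W @ H)
```

Because every update multiplies by a non-negative ratio, $W$ and $H$ stay non-negative without any explicit projection, and the reconstruction error does not increase from one iteration to the next.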
[Figure: the spectrogram $V$ (F × T) is approximated by the product of $W$ (F × K) and $H$ (K × T).]
Fig. 3 NMF with spectrogram data
[Figure: flowchart. Mixed data $x$; STFT; source separation by NMF with $V = |X|$, giving $W$ and $H$; filter, giving $|X_1|$, $|X_2|$, $|X_3|$; ISTFT, giving $X_1$, $X_2$, $X_3$; evaluation.]
Fig. 4 Flowchart of NMF
Fig. 4 shows the flow of NMF source separation. The spectrogram of the signal $X$ is estimated by computing the squared magnitude of the short-time Fourier transform (STFT). The spectrogram is then factorized into $W$ and $H$ in order to derive masking filters for extracting the estimated sources. The original phase components of the mixture are added back to these estimates, which are then synthesized into time-domain signals via the inverse STFT (ISTFT) and passed to the next stage of the pipeline for evaluation.
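The masking step can be illustrated as follows, leaving out the STFT and ISTFT themselves. The paper does not specify the form of its masking filters, so the soft ratio mask with one NMF component per source is an assumption, and the names `nmf_soft_masks` and `apply_masks` and the random stand-in for the mixture STFT are hypothetical.

```python
import numpy as np

def nmf_soft_masks(W, H, eps=1e-12):
    """Return one ratio mask per NMF component, shape (K, F, T)."""
    # parts[k] is the rank-one spectrogram W[:, k] * H[k, :] of component k
    parts = W.T[:, :, None] * H[:, None, :]
    return parts / (parts.sum(axis=0) + eps)

def apply_masks(X, masks):
    """Filter the complex mixture STFT X (F x T); the mixture phase is kept."""
    return masks * X[None, :, :]

# Illustrative factors and mixture (assumption): random W, H and a complex
# matrix X whose magnitude is W @ H, with arbitrary phase.
rng = np.random.default_rng(0)
F, T, K = 6, 8, 2
W = rng.random((F, K)) + 0.1
H = rng.random((K, T)) + 0.1
X = (W @ H) * np.exp(1j * rng.uniform(-np.pi, np.pi, size=(F, T)))

masks = nmf_soft_masks(W, H)
estimates = apply_masks(X, masks)   # one complex STFT estimate per source
```

Each mask is real and lies between 0 and 1, so multiplying the complex mixture by it rescales the magnitude in every time-frequency bin while leaving the mixture phase untouched. The masks sum to one, so the estimates add back up to the mixture.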
3. Simulation and separation results
All experiments were performed using Matlab. To evaluate and compare the performance of FastICA, PCA and NMF, we applied these algorithms to 3 sets of original source signals, with a total duration of 184 seconds and a total of 370 syllables, from our in-house frog sound dataset. Each set consists of sounds from two different frogs, S1 and S2, as shown in Fig. 5. The FastICA, PCA and NMF algorithms were applied to each set of mixtures to estimate the original sources.
Fig. 5 Three sets of original source signals
Fig. 6 Source Signals from Set 1 and their linear mixtures
Fig. 7 Estimated Original Source Signals from Set 1 using FastICA, NMF and PCA
Here, we display the simulation result only for Set 1, while the complete results are collected in Figs. 8-10. Fig. 6 shows the original source signals from Set 1 and their linearly mixed signals. Fig. 7 illustrates the estimated source signals obtained from the mixtures after performing separation for Set 1. The performance of FastICA using negentropy and kurtosis is satisfactory, as both were able to separate the mixtures well. NMF offered a recognizable reconstruction, while PCA tended to show poorer separation. To measure the separation performance for the target species signal, we evaluate the separation quality using the BSS_EVAL toolkit [26], which provides the signal to distortion ratio (SDR), signal to interference ratio (SIR) and signal to artifacts ratio (SAR). SIR reflects the suppression of interference, SAR reflects the artifacts introduced by the separation process, and SDR reflects the overall performance.
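To make the three metrics concrete, the sketch below computes them with a simplified decomposition in which each estimate is split into a target part, an interference part and an artifact part by orthogonal projection onto the true sources. It is not the BSS_EVAL toolbox code: the function name `bss_metrics`, the use of a single gain per source instead of a distortion filter, the absence of a noise term and the synthetic test signals are all assumptions.

```python
import numpy as np

def bss_metrics(estimate, sources, target_index):
    """Return (SDR, SIR, SAR) in dB for one estimated signal.

    estimate: (n_samples,); sources: (n_sources, n_samples) true sources.
    """
    target = sources[target_index]
    # Part of the estimate explained by the true target source
    s_target = (estimate @ target) / (target @ target) * target
    # Projection of the estimate onto the span of all true sources
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    projection = sources.T @ coeffs
    e_interf = projection - s_target     # contribution of the other sources
    e_artif = estimate - projection      # whatever no true source explains

    def db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Illustrative signals (assumption): the estimate contains the target, 10 % of
# the other source (about 20 dB SIR) and 1 % unrelated noise (about 40 dB SAR).
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 10000))
estimate = sources[0] + 0.1 * sources[1] + 0.01 * rng.standard_normal(10000)
sdr, sir, sar = bss_metrics(estimate, sources, target_index=0)
```

Under this decomposition the interference and artifact terms are orthogonal, so SDR can never exceed either SIR or SAR, which is why it serves as the overall figure of merit.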
[Figure: bar chart of SDR (dB) for Set 1, Set 2, Set 3 and the average, with the data table below.]
          FastICA_neg       FastICA_kur       NMF               PCA
          S1       S2       S1       S2       S1       S2       S1      S2
Set 1     66.90    47.77    55.40    46.99    51.10    45.96    2.89    2.90
Set 2     57.84    59.11    57.08    58.61    55.82    64.14    6.21    6.23
Set 3     71.83    69.36    68.10    65.69    58.51    55.36    0.34    0.34
Average   65.52    58.75    60.19    57.10    55.14    55.16    3.15    3.156
Fig. 8 The comparative results of SDR for FastICA, NMF and PCA

[Figure: bar chart of SIR (dB) for Set 1, Set 2, Set 3 and the average, with the data table below.]
          FastICA_neg       FastICA_kur       NMF               PCA
          S1       S2       S1       S2       S1       S2       S1      S2
Set 1     66.91    63.25    55.40    54.26    51.10    50.39    2.90    2.89
Set 2     57.84    64.63    57.08    63.06    57.61    64.14    6.21    6.23
Set 3     71.84    72.24    68.10    70.43    58.51    55.45    0.34    0.34
Average   65.53    66.71    60.19    62.58    55.74    56.66    3.15    3.15
Fig. 9 The comparative results of SIR for FastICA, NMF and PCA

[Figure: bar chart of SAR (dB) for Set 1, Set 2, Set 3 and the average, with the data table below.]
          FastICA_neg       FastICA_kur       NMF               PCA
          S1       S2       S1       S2       S1       S2       S1      S2
Set 1     98.78    48.96    95.76    47.89    90.13    45.66    49.72   20.89
Set 2     110.1    60.54    109.8    60.54    60.54    58.00    67.64   61.48
Set 3     104.9    72.50    104.0    69.00    105.4    72.50    75.55   75.46
Average   104.6    60.66    103.2    59.14    85.37    58.72    64.30   52.61
Fig. 10 The comparative results of SAR for FastICA, NMF and PCA
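As a quick arithmetic check, the "Average" row of the SDR results can be recomputed from the per-set values reported in Fig. 8. The per-method mean over S1 and S2 computed at the end is an extra summary introduced here for illustration and does not appear in the paper.

```python
import numpy as np

# SDR values (dB) from Fig. 8. Columns: FastICA_neg S1, S2; FastICA_kur S1, S2;
# NMF S1, S2; PCA S1, S2. Rows: Set 1, Set 2, Set 3.
sdr = np.array([
    [66.90, 47.77, 55.40, 46.99, 51.10, 45.96, 2.89, 2.90],
    [57.84, 59.11, 57.08, 58.61, 55.82, 64.14, 6.21, 6.23],
    [71.83, 69.36, 68.10, 65.69, 58.51, 55.36, 0.34, 0.34],
])

avg = sdr.mean(axis=0)                       # reproduces the "Average" row
per_method = avg.reshape(4, 2).mean(axis=1)  # mean over S1 and S2 per method
```

The recomputed averages agree with the reported row to within rounding, and the per-method means decrease in the order FastICA with negentropy, FastICA with kurtosis, NMF, PCA. The ordering holds on average rather than in every cell: for S2 in Set 2, NMF (64.14 dB) exceeds both FastICA variants.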
Figs. 8-10 show that, of the two non-gaussianity measures applied in FastICA, negentropy gives better values of SDR, SIR and SAR than kurtosis. It can also be seen that, on average, the FastICA methods considerably outperform NMF, and that PCA gives the lowest values of all the algorithms in every metric.
4. Conclusion
We have presented a comparison of blind source separation using the FastICA, PCA and NMF methods on three sets of frog sound mixtures. Visually, the simulation shows that FastICA and NMF are better than PCA at estimating the original source signals from the mixtures. The objective evaluation further shows that FastICA gives better values than NMF. PCA has the lowest performance, which confirms that PCA is not preferable for the frog sound separation problem. From the experiment, we conclude that FastICA works well compared to NMF and PCA, as it provides clearer output that is perceptually more relevant to frog sound separation.
Acknowledgements
This work was supported by Universiti Sains Malaysia under Research University (RU) Grant (No: 1001.PELECT.8014057).
References
[1] Narins, Peter M., and Albert S. Feng. (2017) "Hearing and sound communication in amphibians: Prologue and prognostication." Hearing and sound communication in amphibians: 1-11.
[2] Obrist, Martin K., Gianni Pavan, Jérôme Sueur, Klaus Riede, Diego Llusia, and Rafael Márquez. (2010) "Bioacoustics approaches in biodiversity inventories." Abc Taxa 8: 68-99.
[3] Huang, Chenn-Jung, Yi-Ju Yang, Dian-Xiu Yang, and You-Jia Chen. (2009) "Frog classification using machine learning techniques." Expert Systems with Applications 36(2): 3737-3743.
[4] Bee, Mark A., and Christophe Micheyl. (2008) "The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it?" Journal of comparative psychology 122(3): 235-251.
[5] Bell, Anthony J., and Terrence J. Sejnowski. (1995) "An information-maximization approach to blind separation and blind deconvolution." Neural computation 7(6): 1004-1034.
[6] Hyvärinen, Aapo. (2013) "Independent component analysis: recent advances." Phil. Trans. R. Soc. A Math Phys. Eng Sci 371(1984): 20110534.
[7] Smita, Silk, Sharmila Biwas, and Sandeep Singh Solanki. (2014) "Audio Signal Separation and Classification: A Review Paper." International Journal of Innovative Research in Computer and Communication Engineering 2(11): 6960-66.
[8] Chien, Jen-Tzung, and Bo-Cheng Chen. (2006) "A new independent component analysis for speech recognition and separation." IEEE transactions on audio, speech, and language processing 14(4): 1245-1254.
[9] Anand, Vivek, and M. L. Dewal. (2011) "Implementation of blind source separation of speech signals using independent component analysis." International Journal of Computer Science and Information Technologies 2(5): 2147-2151.
[10] Arya, Yogendra. (2017) "AGC performance enrichment of multi-source hydrothermal gas power systems using new optimized FOFPID controller and redox flow batteries." Energy 127: 704-715.
[11] Bartlett, Marian Stewart, Javier R. Movellan, and Terrence J. Sejnowski. (2002) "Face recognition by independent component analysis." IEEE Transactions on neural networks 13(6): 1450-1464.
[12] Migliorelli, Carolina, Joan F. Alonso, Sergio Romero, Miguel A. Mañanas, Rafał Nowak, and Antonio Russi. (2015) "Automatic BSS-based filtering of metallic interference in MEG recordings: definition and validation using simulated signals." Journal of neural engineering 12(4): 046001.
[13] Szalai, János, and Ferenc Emil Mozes. (2014) "Determining fetal heart rate using independent component analysis." In Intelligent Computer Communication and Processing (ICCP): 11-16.
[14] Hyvärinen, Aapo, and Erkki Oja. (1997) "A Fast Fixed-Point Algorithm for Independent Component Analysis." Neural Comput. 9(7): 1483-1492.
[15] Rutledge, Douglas N., and D. Jouan-Rimbaud Bouveresse. (2013) "Independent components analysis with the JADE algorithm." TrAC Trends in Analytical Chemistry 50: 22-32.
[16] Min, Zhang, Zhu Mu, and Ma Wenjie. (2012) "Implementation of FastICA on DSP for Blind Source Separation." Procedia Engineering 29: 4228-4233.
[17] Oosugi, Naoya, Keiichi Kitajo, Naomi Hasegawa, Yasuo Nagasaka, Kazuo Okanoya, and Naotaka Fujii. (2017) "A new method for quantifying the performance of EEG blind source separation algorithms by referencing a simultaneously recorded ECoG signal." Neural Networks 93: 1-6.
[18] Lee, Daniel D., and H. Sebastian Seung. (1999) "Learning the parts of objects by non-negative matrix factorization." Nature 401(6755): 788.
[19] Schmidt, Mikkel N., and Rasmus K. Olsson. (2006) "Single-channel speech separation using sparse non-negative matrix factorization." In Ninth International Conference on Spoken Language Processing 2: 2-5.
[20] Virtanen, Tuomas. (2007) "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria." IEEE transactions on audio, speech, and language processing 15(3): 1066-1074.
[21] Ozerov, Alexey, Emmanuel Vincent, and Frédéric Bimbot. (2012) "A general flexible framework for the handling of prior information in audio source separation." IEEE Transactions on Audio, Speech, and Language Processing 20(4): 1118-1133.
[22] Kumar, S. Santosh, S. H. Bharathi, and M. Archana. (2016) "Non-negative matrix based optimization scheme for blind source separation in automatic speech recognition system." In International Conference on Communication and Electronics Systems (ICCES): 1-6.
[23] Févotte, Cédric, Nancy Bertin, and Jean-Louis Durrieu. (2009) "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis." Neural computation 2(3): 793-830.
[24] Hyvärinen, Aapo. (1999) "Survey on independent component analysis." 2: 94-128.
[25] Pearson, K. (1901) "Principal components analysis." The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 6(2): 559.
[26] Févotte, Cédric, Rémi Gribonval, and Emmanuel Vincent. (2015) "BSS_EVAL toolbox user guide--Revision 2.0.": 19.