A MATLAB Toolbox

A.1 Chapter Organization

Due to the strong link between theory and practice in speech communication applications, this book is supplemented with many experiments that demonstrate the usefulness of phase-aware processing in several applications. In addition, the authors strongly believe that access to the corresponding MATLAB code is necessary to move the young field of phase-aware speech processing forward. This appendix lists the implementations used to produce the results presented in the book, describes the contents of each, and identifies the section where each experiment is used. A detailed description of the PhaseLab Toolbox is also provided.

A.2 PhaseLab Toolbox

We introduce the PhaseLab Toolbox, which comprises MATLAB implementations of selected experiments presented in this book. For each chapter, the files are organized into two folders:

• main folder, containing implementations of the experiments themselves;
• additional functions, which are called from within the main files.

A.2.1 MATLAB Code

One subfolder is dedicated to each chapter. It holds the MATLAB implementations together with readme.txt and readme.pdf files, which explain how to use the files available in the PhaseLab Toolbox. Table A.1 lists the files included in the PhaseLab Toolbox, together with a description of each file and the figure or experiment in which it is used. The MATLAB code files can be downloaded as .rar archives from https://www.spsc.tugraz.at/PhaseLab. The web page also provides several audio examples as supplementary material.
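As a minimal sketch of how an unpacked archive might be placed on the MATLAB path: the root folder name `PhaseLab` and the assumption that all chapter subfolders sit beneath it are illustrative only and are not taken from the toolbox itself. The readme files shipped with the toolbox remain the authoritative instructions.

```matlab
% Hypothetical location of the unpacked archive; adjust to your system.
toolboxRoot = fullfile(pwd, 'PhaseLab');

if exist(toolboxRoot, 'dir') == 7
    % genpath returns the root folder and all of its subfolders, so both the
    % main files and the additional functions of every chapter become visible.
    addpath(genpath(toolboxRoot));
    onPath = true;
else
    warning('PhaseLab:notFound', 'Folder not found: %s', toolboxRoot);
    onPath = false;
end

% Exp1_2.m is one of the files listed in Table A.1.
% exist(..., 'file') returns 2 when a file of that name is on the path.
if onPath && exist('Exp1_2', 'file') == 2
    disp('Exp1_2.m found; it can be started by typing Exp1_2.');
end
```

Adding the whole tree with `genpath` is a convenience choice. If two chapters ship helper functions with the same name, adding only the subfolder of the chapter being reproduced avoids name shadowing.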

Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice, First Edition. Pejman Mowlaee, Josef Kulmer, Johannes Stahl, and Florian Mayer. © 2017 John Wiley & Sons, Ltd. Published 2017 by John Wiley & Sons, Ltd.

Table A.1 Filename, description, and experiment number for each MATLAB implementation used in the book and included in the PhaseLab Toolbox.

| Filename | Description | Exp./Fig. |
| --- | --- | --- |
| Exp1_2.m | Effects of phase modification | Exp. 1.2 |
| Exp1_3.m | Mismatched window experiment | Exp. 1.3 |
| Exp2_1.m | One-dimensional phase unwrapping | Exp. 2.1 |
| Exp2_3.m | Comparative study of group delay spectra | Exp. 2.3 |
| Exp2_5.m | Circular statistics of the spectral phase | Exp. 2.5 |
| Exp2_6.m | Comparative study of phase representations | Exp. 2.6 |
| Exp3_1.m | Monte Carlo simulation: ML versus MAP phase estimator | Exp. 3.1 |
| Exp3_2.m | Monte Carlo simulation: window impact on phase estimation | Exp. 3.2 |
| Exp3_3.m | GLA versus FGLA for phase retrieval | Exp. 3.3 |
| Exp3_4.m | Phase estimation comparative study | Exp. 3.4 |
| Fig4_9.m | Deterministic components and complex coefficients distribution | Fig. 4.9 |
| Exp4_3.m | Sensitivity analysis of phase-aware amplitude estimators | Exp. 4.3 |
| Exp5_1.m | Phase estimation for proof-of-concept signal reconstruction | Exp. 5.1 |
| Exp5_2.m | Comparative study of GLA-based phase reconstruction methods | Exp. 5.2 |
| Exp5_3.m | Phase-aware time frequency masks | Exp. 5.3 |
| Exp5_5.m | Complex matrix factorization (CMF): Figure 5.20 | Exp. 5.5 |
| Exp6_2.m | Phase and perceived quality estimation | Exp. 6.2 |
| Exp6_3.m | Phase and speech intelligibility estimation | Exp. 6.3 |
| Exp6_4.m | Evaluating the phase estimation accuracy | Exp. 6.4 |

A.2.2 Additional Material

Additional material is required when using the code to reproduce the experiments described in the book. For example, speech files selected from the GRID (Cooke et al. 2006), SiSEC (Araki et al. 2012), or TIMIT (Garofolo et al. 1993) databases need to be acquired separately. Also, some experiments require access to other speech processing toolboxes, including COVAREP (Degottex et al. 2014), VOICEBOX (Brookes et al. 2005), CircStat (Berens 2009), and CMF Toolbox (King and Atlas 2012). For performance evaluation in speech enhancement, perceptual evaluation of speech quality (PESQ; Rix et al. 2001) and short-time objective intelligibility measure (STOI; Taal et al. 2011) software is required. To quantify the source separation performance, the blind source separation evaluation (BSS EVAL; Vincent et al. 2006) is required.
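Before running an experiment, it can help to confirm that the external packages are visible to MATLAB. The following is a minimal sketch under stated assumptions: the function names used as probes (`circ_mean`, `bss_eval_sources`, `stoi`) are assumed to be representative entry points of commonly distributed versions of these packages, and they are not specified by the book. Verify them against the versions you have installed, and extend the list for the remaining toolboxes as needed.

```matlab
% One representative function per external package. These probe names are
% assumptions about the installed versions, not something stated in the book.
deps = {
    'CircStat', 'circ_mean'
    'BSS EVAL', 'bss_eval_sources'
    'STOI',     'stoi'
};

missing = {};
for k = 1:size(deps, 1)
    % exist(name, 'file') returns 0 when no file or folder with that name
    % is found on the MATLAB path.
    if exist(deps{k, 2}, 'file') == 0
        missing{end+1} = deps{k, 1}; %#ok<AGROW>
    end
end

if isempty(missing)
    disp('All checked packages appear to be on the MATLAB path.');
else
    fprintf('Not found on the path: %s\n', strjoin(missing, ', '));
end
```

A passing check only shows that a file with the probed name is reachable. It does not confirm the package version, and it does not cover the speech databases, which have to be obtained separately.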

References

S. Araki, F. Nesta, E. Vincent, Z. Koldovský, G. Nolte, A. Ziehe, and A. Benichoux, The 2011 Signal Separation Evaluation Campaign (SiSEC2011): Audio Source Separation, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), pp. 414–422, 2012.

P. Berens, CircStat: A MATLAB toolbox for circular statistics, Journal of Statistical Software, vol. 31, no. 10, pp. 1–21, 2009.

M. Brookes et al., VOICEBOX: Speech Processing Toolbox for MATLAB, [Online], http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html, 2005.

M. Cooke, J. Barker, S. Cunningham, and X. Shao, An audio-visual corpus for speech perception and automatic speech recognition, The Journal of the Acoustical Society of America, vol. 120, pp. 2421–2424, 2006.

G. Degottex, J. Kane, T. Drugman, T. Raitio, and S. Scherer, COVAREP: A Collaborative Voice Analysis Repository for Speech Technologies, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 960–964, 2014.

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus CDROM, National Institute of Standards and Technology (NIST), 1993.

B. King and L. Atlas, Complex Matrix Factorization Toolbox Version 1.0 for MATLAB, [Online], https://sites.google.com/a/uw.edu/isdl/projects/cmf-toolbox, University of Washington, 2012.

A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, Perceptual Evaluation of Speech Quality (PESQ): A New Method for Speech Quality Assessment of Telephone Networks and Codecs, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 749–752, 2001.

C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, An algorithm for intelligibility prediction of time–frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125–2136, 2011.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.