warping function for sub-band error equalisation in ... - CiteSeerX

WARPING FUNCTION FOR SUB-BAND ERROR EQUALISATION IN SPEAKER RECOGNITION

Roland Auckenthaler† and John S. Mason‡

†Department of Electronics, Technical University Graz, Inffeldgasse 12, A-8010 Graz, Austria
‡Department of Electrical & Electronic Engineering, University of Wales Swansea, SA2 8PP, UK

email: feeaucken, [email protected]

RÉSUMÉ

Sub-band processing of speech may offer certain advantages over single-band processing, in both speech and speaker recognition [1] [2] [3] [4]. In our previous work [4] we examined the error rates across sub-bands when using frequency warping and a linear frequency scale, which showed that an error minimum lies somewhere between the two techniques. Here, two warping functions are derived which show not only a more even distribution of error rates across the sub-bands but also an overall improvement in recognition rates compared with those obtained using linear frequency scales. This is particularly true for a set of female speakers, for which the linear scale has already been shown to be far from optimal. In this case, the error rate is reduced by more than 50%.

ABSTRACT

It is possible that sub-band processing might provide benefits over the conventional full-band approach in speech and speaker recognition [1] [2] [3] [4]. In our previous work [4] we examined error rates across sub-bands when using the standard mel frequency warping and a linear frequency scale, showing that an optimum was likely to lie somewhere between the two cases. Here, two new warping functions are derived which show not only a more even distribution of sub-band errors, but also an overall improvement in recognition rates when compared with the standard mel case. This is particularly true for a set of female speakers where the mel scale is shown to be distinctly sub-optimal, and error rates are reduced by more than 50%.

1. INTRODUCTION

It has been suggested recently that sub-band processing might provide some benefits over the conventional full-band approach in the context of speech and speaker recognition [1] [2] [3]. These include robustness against narrow-band noise, closer simulation of human perception [5], and the possibility of tailoring the processing in time and frequency. In our previous work [4] [6] we examined some of the related issues in the context of speaker recognition. In particular, we showed that errors across sub-bands are not evenly distributed when using either the standard mel-frequency warping or a linear frequency scale. We argue that an even error distribution would, at the very least, be convenient when narrow-band noise is severe enough that a sub-band must be dropped or de-weighted. In fact, in [4] we show that an analytical mel-like warping function, designed specifically in an attempt to equalise recognition errors across sub-bands, also leads to a small but consistent overall improvement in recognition performance. In this paper we first review this and related work, and then extend the drive towards equal errors across sub-bands by deriving a warping function which flattens the error profile. This is achieved using an iterative algorithm with a cost based directly on the error rates across the sub-bands.

2. SUB-BAND PROCESSING

[Figure 1 block diagram: Spectral Analysis (FFT), 128 bins → Frequency warping (linear, optimum, mel), 32 bins → Band Splitting into sub-bands of M bins → Similarity Measure per sub-band → Weight and Add L_i → Final Decision on L]

Figure 1: Concept of sub-band processing in a speaker recognition system

Figure 1 shows the band splitting approach which leads to sub-band processing. Spectral analysis via an FFT is followed by warping and smoothing in frequency. The diagram shows three warping functions: linear, mel, and the sought-after optimum. In the current context, the optimisation aims at producing an equal distribution of recognition errors across the sub-bands, while maintaining or even improving the final system recognition rate. The reduction in the number of bins (here from 128 to 32) provides frequency-domain smoothing without introducing excessive correlation across bins. Throughout the work the smoothing operation is based on the standard mid-band to mid-band triangular filters of [7]. The choice of 32 bins leads to simple divisions for multiple bands, and has also been shown to give good recognition performance [8]. The 32 bins are then divided into non-overlapping sub-bands of M bins; each sub-band classification produces a log-likelihood sub-band score, $L_i$. Recombination then occurs according to:

$$
L = \sum_{i=1}^{n} \alpha_i L_i \quad \text{with} \quad \alpha_1 = \alpha_2 = \dots = \alpha_n \tag{1}
$$

[Figure: % error (axis 30 to 65) against frequency (axis 0 to 4000), with curves for the mel scale and the linear scale]

standard mel function:

$$
f_{mel} =
\begin{cases}
f & : \; f < 1000\,\text{Hz} \\
2595 \log\left(1 + \dfrac{f}{700}\right) & : \; f \geq 1000\,\text{Hz}
\end{cases}
\tag{2}
$$

by a mel-like function:

$$
\begin{cases}
k f \\
a \log(f - b) + d
\end{cases}
$$
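As a rough illustration of the standard mel function of equation (2), the sketch below evaluates the two branches directly. It assumes the logarithm is base 10 (the text only writes "log"); with that assumption the two branches meet at 1000 Hz to within a small rounding error. The function name `mel_warp` is hypothetical and not taken from the paper.

```python
import math


def mel_warp(f_hz: float) -> float:
    """Standard mel function of equation (2).

    Assumption: the logarithm is base 10, which makes the two
    branches agree (to rounding) at the 1000 Hz boundary.
    """
    if f_hz < 1000.0:
        # Linear region below 1000 Hz.
        return f_hz
    # Logarithmic region at and above 1000 Hz.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


if __name__ == "__main__":
    for f in (250.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.1f} Hz -> {mel_warp(f):8.2f} mel")
```

Note that this mapping compresses the upper part of the band: an input of 4000 Hz maps to well below 4000 on the warped axis, so fewer of the 32 bins cover the higher frequencies than on a linear scale.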

$$
f_{eqm} =
\begin{cases}
2f \\
\frac{5}{4} f + 300 \\
\frac{7}{8} f + 600 \\
f + 450 \\
\frac{19}{28} f + 1285.7
\end{cases}
$$
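The piecewise-linear function $f_{eqm}$ can be explored with a short sketch. Everything beyond the surviving terms is an assumption: the slopes are read as 2, 5/4, 7/8, 1 and 19/28 (paired with the offsets 0, 300, 600, 450 and 1285.7 in that order), and the breakpoints of 400, 800, 1200 and 2600 Hz are not given in the text but are chosen here as the points where adjacent pieces intersect, so that the function is continuous. Under these assumptions 4000 Hz maps to approximately 4000. The names `EQM_SEGMENTS` and `eqm_warp` are hypothetical.

```python
# Each entry: (assumed upper limit in Hz, slope, offset).
# The upper limits are NOT from the paper; they are the intersections
# of neighbouring line segments, assumed so the mapping is continuous.
EQM_SEGMENTS = [
    (400.0, 2.0, 0.0),
    (800.0, 5.0 / 4.0, 300.0),
    (1200.0, 7.0 / 8.0, 600.0),
    (2600.0, 1.0, 450.0),
    (float("inf"), 19.0 / 28.0, 1285.7),
]


def eqm_warp(f_hz: float) -> float:
    """Evaluate the assumed piecewise-linear f_eqm at f_hz (in Hz)."""
    for upper, slope, offset in EQM_SEGMENTS:
        if f_hz < upper:
            return slope * f_hz + offset
    # Only reached for an infinite input; use the last segment.
    _, slope, offset = EQM_SEGMENTS[-1]
    return slope * f_hz + offset


if __name__ == "__main__":
    for f in (200.0, 400.0, 800.0, 1200.0, 2600.0, 4000.0):
        print(f"{f:7.1f} Hz -> {eqm_warp(f):8.2f}")
```

With these assumed breakpoints the slope is steepest below 400 Hz, which would devote proportionally more of the warped axis to the lowest frequencies than either the linear or the mel scale.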

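Returning to the band splitting and recombination of Section 2, the following is a minimal sketch of how 32 warped bins might be divided into non-overlapping sub-bands of M bins and how the sub-band log-likelihood scores might be combined with equal weights. The function names are hypothetical, the scores are made-up illustrative numbers rather than results, and normalising the equal weights to 1/n is an assumption (the text only states that the weights are equal).

```python
from typing import List, Optional, Sequence


def split_into_subbands(bins: Sequence[float], m: int) -> List[List[float]]:
    """Divide the bins into consecutive, non-overlapping sub-bands of m bins."""
    if m <= 0 or len(bins) % m != 0:
        raise ValueError("number of bins must be a positive multiple of m")
    return [list(bins[i:i + m]) for i in range(0, len(bins), m)]


def recombine(scores: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of sub-band log-likelihood scores.

    With weights=None, equal weights are used (assumed here to be 1/n).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("at least one sub-band score is required")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("one weight per sub-band score is required")
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    bins = [float(i) for i in range(32)]        # placeholder bin values
    subbands = split_into_subbands(bins, 8)     # 4 sub-bands of M = 8 bins
    print(len(subbands), [len(b) for b in subbands])

    scores = [-10.0, -12.0, -11.0, -13.0]       # illustrative scores only
    print(recombine(scores))                    # equal weights
    print(recombine(scores, [1 / 3, 1 / 3, 1 / 3, 0.0]))  # last band dropped
```

Passing explicit weights, as in the last call, corresponds to dropping or de-weighting a sub-band affected by narrow-band noise, which is the situation where an even error distribution across sub-bands is argued to be convenient.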