WARPING FUNCTION FOR SUB-BAND ERROR EQUALISATION IN SPEAKER RECOGNITION

Roland Auckenthaler† and John S. Mason‡

†Department of Electronics, Technical University Graz, Inffeldgasse 12, A-8010 Graz, Austria
‡Department of Electrical & Electronic Engineering, University of Wales Swansea, SA2 8PP, UK
email: eeaucken, [email protected]

RÉSUMÉ
Sub-band speech processing may offer certain advantages over single-band processing, in both speech and speaker recognition [1] [2] [3] [4]. In our previous research [4], we examined the error rates across sub-bands when using frequency warping and a linear frequency scale, which showed that an error minimum lies somewhere between these two techniques. Here, two warping functions are derived which show not only a more even distribution of sub-band error rates but also a general improvement in recognition rates compared with those obtained using linear frequency scales. This is particularly true for a set of female speakers, for whom the linear scale has already been shown to be far from optimal. In this case, the error rate is reduced by more than 50%.

ABSTRACT
Sub-band processing may provide benefits over the conventional full-band approach in speech and speaker recognition [1] [2] [3] [4]. In our previous work [4] we examined error rates across sub-bands when using the standard mel frequency warping and a linear frequency scale, and showed that an optimum was likely to lie somewhere between the two cases. Here, two new warping functions are derived which show not only a more even distribution of sub-band errors, but also an overall improvement in recognition rates when compared with the standard mel case. This is particularly true for a set of female speakers, for whom the mel scale is shown to be distinctly sub-optimal and error rates are reduced by more than 50%.

1. INTRODUCTION
It has been suggested recently that sub-band processing might provide some benefits over the conventional full-band approach in the context of speech and speaker recognition [1] [2] [3]. These include robustness against narrow-band noise, closer simulation of human perception [5], and the possibility of tailoring the processing in time and frequency. In our previous work [4] [6] we examined some of the related issues in the context of speaker recognition. In particular, we showed that errors across sub-bands are not evenly distributed when using either the standard mel-frequency warping or a linear frequency scale. We argue that an even error distribution would, at the very least, be convenient whenever narrow-band noise is strong enough to justify dropping or de-weighting a sub-band. In fact, in [4] we showed that an analytical mel-like warping function, designed specifically to equalise recognition errors across sub-bands, also leads to a small but consistent overall improvement in recognition performance. In this paper we first review this and related work, and then extend the drive towards equal errors across sub-bands by deriving a warping function which flattens the error profile. This is achieved using an iterative algorithm whose cost is based directly on the error rates across the sub-bands.

2. SUB-BAND PROCESSING

[Block diagram: Spectral Analysis (FFT), 128 bins → Frequency warping (linear, optimum or mel) → 32 bins → Band Splitting into sub-bands of M bins → Similarity Measure per sub-band → Weight and Add $L_i$ → Final Decision on $L$]

Figure 1: Concept of sub-band processing in a speaker recognition system

Figure 1 shows the band-splitting approach that leads to sub-band processing. Spectral analysis via an FFT is followed by warping and smoothing in frequency. The diagram shows three warping functions: linear, mel, and the sought-after optimum. In the current context, the optimisation aims to produce an equal distribution of recognition errors across the sub-bands while maintaining, or even improving, the final system recognition rate. Reducing the number of bins (here from 128 to 32) provides frequency-domain smoothing without introducing excessive correlation across bins. Throughout the work, the smoothing operation is based on the standard mid-band to mid-band triangular filters of [7]. The choice of 32 bins divides simply into multiple bands and has also been shown to give good recognition performance [8]. The 32 bins are then divided into non-overlapping sub-bands of M bins; each sub-band classification produces a log-likelihood sub-band score, $L_i$. The sub-band scores are then recombined according to:

$$ L = \sum_{i=1}^{n} \omega_i L_i \quad \text{with} \quad \omega_1 = \omega_2 = \dots = \omega_n \qquad (1) $$

where $n = 32/M$ is the number of sub-bands.

[Figure: % error (30 to 65) against frequency (0 to 4000 Hz) for the mel scale and the linear scale]

standard mel function:

$$ f_{mel} = \begin{cases} f, & f < 1000\ \text{Hz} \\ 2595 \log_{10}\left(1 + \dfrac{f}{700}\right), & f \ge 1000\ \text{Hz} \end{cases} \qquad (2) $$

by a mel-like function:

$$ f_{\text{mel-like}} = \begin{cases} k f, & \text{below the transition frequency} \\ a \log(f - b) + d, & \text{above it} \end{cases} $$
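The processing chain of Figure 1, with the standard mel warping of Eq. (2) and the equal-weight recombination of Eq. (1), can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the 8 kHz sampling rate (suggested by the 0 to 4000 Hz frequency axis), the even spread of the 128 FFT bins over 0 to 4 kHz, the construction of the triangular filters as equally spaced on the warped axis, the sub-band size M = 8, the weight value 1/n and the sub-band scores are all assumptions, and every function and constant name is hypothetical.

```python
import numpy as np

FS_HZ = 8000.0    # assumed sampling rate (the plotted frequency axis ends at 4000 Hz)
N_FFT_BINS = 128  # FFT magnitude bins, assumed evenly spread over 0..FS_HZ/2
N_WARPED = 32     # smoothed bins after warping, as in the text


def mel_warp(f_hz):
    """Eq. (2): identity below 1 kHz, 2595*log10(1 + f/700) from 1 kHz upwards."""
    f = np.asarray(f_hz, dtype=float)
    return np.where(f < 1000.0, f, 2595.0 * np.log10(1.0 + f / 700.0))


def triangular_filterbank(warp, n_in=N_FFT_BINS, n_out=N_WARPED, fs=FS_HZ):
    """Return an (n_out, n_in) matrix of triangular filters equally spaced on the
    warped axis; each rises from its lower neighbour's centre to its own centre
    and falls to its upper neighbour's centre (mid-band to mid-band)."""
    w_bins = warp(np.linspace(0.0, fs / 2.0, n_in))         # warped position of each FFT bin
    edges = np.linspace(w_bins[0], w_bins[-1], n_out + 2)  # n_out centres plus two end points
    fb = np.zeros((n_out, n_in))
    for j in range(n_out):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        rise = (w_bins - lo) / (mid - lo)
        fall = (hi - w_bins) / (hi - mid)
        fb[j] = np.clip(np.minimum(rise, fall), 0.0, None)  # zero outside (lo, hi)
    return fb


def split_subbands(smoothed, m):
    """Divide the smoothed bins into n = len(smoothed) / m non-overlapping sub-bands."""
    if smoothed.shape[-1] % m:
        raise ValueError("the number of bins must be a multiple of m")
    return np.split(smoothed, smoothed.shape[-1] // m, axis=-1)


def recombine(scores, weights=None):
    """Eq. (1): L = sum_i w_i * L_i, with equal weights 1/n unless weights are given."""
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape, 1.0 / scores.size)
    return float(np.dot(weights, scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectrum = rng.random(N_FFT_BINS)                      # stand-in for one frame's |FFT|
    smoothed = triangular_filterbank(mel_warp) @ spectrum  # 128 -> 32 bins
    subbands = split_subbands(smoothed, m=8)               # M = 8 gives n = 4 sub-bands
    # Hypothetical log-likelihood scores L_i of the four sub-bands for one speaker model
    print(len(subbands), recombine([-1.2, -0.8, -1.0, -1.4]))
```

Passing the warping function as a parameter lets the same smoothing code serve the linear scale (an identity warp), the mel scale, or an optimised warp. Any common positive weight value in Eq. (1) ranks competing speakers identically, since it only rescales $L$.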

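$$ f_{eqm} = \begin{cases} 2f, & 0 \le f < 400\ \text{Hz} \\ \tfrac{5}{4} f + 300, & 400 \le f < 800\ \text{Hz} \\ \tfrac{7}{8} f + 600, & 800 \le f < 1200\ \text{Hz} \\ f + 450, & 1200 \le f < 2600\ \text{Hz} \\ \tfrac{19}{28} f + 1285.7, & 2600 \le f \le 4000\ \text{Hz} \end{cases} $$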
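The piecewise-linear $f_{eqm}$ warping above can be evaluated with a short NumPy sketch. The slopes and intercepts come from the equation; the segment boundaries of 400, 800, 1200 and 2600 Hz are inferred by assuming the function is continuous (the coefficients satisfy this exactly at those points), so treat them as an assumption. The intercept 1285.7 is taken as 9000/7, which maps 4000 Hz to exactly 4000. The names `f_eqm`, `BREAKS_HZ` and `SEGMENTS` are hypothetical.

```python
import numpy as np

# Segment boundaries (Hz), inferred by assuming f_eqm is continuous (an assumption).
BREAKS_HZ = np.array([400.0, 800.0, 1200.0, 2600.0])
# (slope, intercept) of each segment, in order of increasing frequency.
SEGMENTS = np.array([
    [2.0, 0.0],
    [5.0 / 4.0, 300.0],
    [7.0 / 8.0, 600.0],
    [1.0, 450.0],
    [19.0 / 28.0, 9000.0 / 7.0],  # printed as 1285.7; 9000/7 maps 4000 Hz exactly to 4000
])


def f_eqm(f_hz):
    """Piecewise-linear error-equalising warp for frequencies in [0, 4000] Hz."""
    f = np.asarray(f_hz, dtype=float)
    seg = np.searchsorted(BREAKS_HZ, f, side="right")  # index 0..4 of the active segment
    return SEGMENTS[seg, 0] * f + SEGMENTS[seg, 1]


if __name__ == "__main__":
    print(f_eqm([0.0, 400.0, 800.0, 1200.0, 2600.0, 4000.0]))
```

With these boundaries the function maps 400, 800, 1200 and 2600 Hz to 800, 1300, 1650 and 3050, and 4000 Hz to 4000, so the warped axis spans the same range as a linear scale. Relative to that linear scale, the slope of 2 below 400 Hz doubles the resolution of the lowest band, while the slope of 19/28 above 2600 Hz compresses the top band.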
