The International Congress for global Science and Technology ICGST International Journal on Digital Signal Processing (DSP)

Volume (9), Issue (I) June, 2009 www.icgst.com www.icgst-amc.com www.icgst-ees.com © ICGST LLC, 2009

ACSE-DSP Journal ISSN: Print 1687-4811 ISSN Online 1687-482X ISSN CD-ROM 1687-4838 © ICGST LLC, Delaware, USA, 2009

Table of Contents
DSP, Volume 9, Issue I, June, 2009

P1180844451  Ruchi Pasricha and Sanjay Sharma, "An FPGA-Based Design of Fixed-Point Kalman Filter", pp. 1-9
P1180848514  P. Ramanathan and P. T. Vanathi, "Power Delay Optimized Adder for Multiply and Accumulate Units", pp. 11-17
P1180852542  K. M. Ravikumar, R. Rajagopal and H. C. Nagaraj, "An Approach for Objective Assessment of Stuttered Speech Using MFCC Features", pp. 19-24
P1180843439  Roshen Jacob, Tessamma Thomas and A. Unnikrishnan, "Analysis of Underwater Transients using Lifting Based Wave Packet Transform", pp. 25-32
P1180847507  Sumithra M G, Thanuskodi K and Anitha M R, "Modified Time Adaptive Wavelet Based Approach for Enhancing Speech from Adverse Noisy Environments", pp. 33-40
P1180924786  A. K. Panda, Hari N. Pratihari, Bibhu Prasad Panigrahi and L. Moharana, "A Zero Voltage Transition Synchronous Buck Converter with an Active Auxiliary Circuit", pp. 41-49

A publication of the International Congress for global Science and Technology - (ICGST)

ICGST Editor in Chief: Dr. rer. nat. Ashraf Aboshosha www.icgst.com, www.icgst-amc.com, www.icgst-ees.com [email protected]


An FPGA-Based Design of Fixed-Point Kalman Filter

Ruchi Pasricha1, Sanjay Sharma2
1 Assistant Professor, CEC, Landran
2 Assistant Professor, Thapar University, Patiala

Abstract
In this paper we study scaling rules and round-off noise variances in a fixed-point implementation of the Kalman filter for an ARMA time series observed noise free. The Kalman filter is realized in a fast form that uses the so-called fast Kalman gain algorithm. The algorithm for the gain is fixed point. Scaling rules and expressions for rounding error variances are derived, and the numerical results show that the fixed-point realization performs very close to the floating-point realization for relatively low-order ARMA time series that are not too narrow band. The floating-point model of the Kalman filter is simulated in Matlab, and the design is then translated into a fixed-point one using the C language. The RTL version of the model was created in VHDL. Experimental results were obtained by running the fixed-point and floating-point filters on identical data sets, and a close match is found between them. RTL simulation is also done and the results obtained are similar to those of the fixed-point as well as the floating-point models.

Keywords: FPGA, Kalman, Fixed-point, DQPSK

1. Introduction
Finite-dimensional Gaussian time series have stationary Markovian state-space descriptions. In such descriptions the initial conditions are multivariate normal and the state variables are predicted values of the series based on an infinite past of observations. The linear filtering problem is one of estimating the state at time t based on observations up to time t, and the prediction problem is one of predicting the state at time t+1 based on observations up to time t. Corresponding to the Markovian representation is the innovations representation. The essential characteristic of this nonstationary representation is that it may be used to synthesize a time series, starting from zero initial conditions, whose second-order statistics match the statistics of the original time series. The states are predicted values of the time series based on a finite past of observations. Using this representation, the Kalman filter may be written down by inspection as the causal and stable inverse of the representation. The so-called Kalman gain in the innovations representation (or equivalently in the Kalman filter) may be associated either with the Levinson recursions for factoring the inverse of the correlation matrix of the time series or with the LeRoux-Gueguen recursions for factoring the correlation matrix itself. The latter association leads to a fixed-point algorithm for computing Kalman gains. This so-called fast algorithm produces a fast Kalman filter. In this paper, we present results from a study of fast Kalman predictors, implemented in floating-point and in fixed-point arithmetic, for autoregressive moving average time series. More extensive results of this study, for noisy and noise-free filtering and prediction, may be found in the thesis of Sigurdsson [1]. In our summary of results for Kalman filtering we draw heavily upon the work of Morf, Kailath, Anderson, and Moore [2-3]. In our derivation of scaling rules and expressions for rounding error variances we adapt the stationary results of Jackson [4] and Mullis and Roberts [5] to our non-stationary problem. The organization of the paper is as follows: the communication system under consideration is discussed in section 2. The specifications of the system for the final simulation are described in section 3. Section 4 contains the square root filtering technique, which is used to improve the numerical stability of the filter. The simulation results are presented in section 5 and the FPGA implementation is illustrated in section 6.

2. The Communication System
The signal model for a communication system using the DQPSK signaling scheme is shown in figure 1. For a signal that has been differentially encoded, there is an obvious alternative method of demodulation. Instead of demodulating as usual and ignoring carrier-phase ambiguity, the phase between two successive received symbols is compared and used to determine what the data must have been. When differential encoding is used in this manner, the scheme is known as differential phase-shift keying (DPSK). Note that this is subtly different from just differentially-encoded PSK since, upon reception, the received symbols are not decoded one-by-one to constellation points but are instead compared directly to one another.

Figure 1 The Signal Model for Baseband Communication System

The fading channel includes the transmitter shaping filter, which leads to a corresponding state space model for this channel [6]. The state of such a system can be represented by a vector which consists of r subsequent channel impulse responses,

x_k = (h_k, h_{k-1}, h_{k-2}, ..., h_{k-r+1})^T        (1)

h_k = (h_{k,0}, h_{k,1}, ..., h_{k,β})^T               (2)

where h_k is a (β+1)-dimensional complex Gaussian random vector at sampling time kT.

A model for a two-ray fading channel is shown in figure 2. The Gaussian complex random signals x(k) and y(k) are shaped in the fading filter according to the maximum Doppler frequency shift to produce the multiplicative coefficients. The fading filter can be approximated by a third order filter and its transfer function can be written as

P(z) = D / (1 − A z^{-1} − B z^{-2} − C z^{-3})        (3)

Figure 2 The Fading Channel Model

An autoregressive moving average (ARMA) representation of the wide-sense stationary Gaussian random vector h_k is introduced as

h_k = A I h_{k-1} + B I h_{k-2} + C I h_{k-3} + D I w_k        (4)

where w_k is a (β+1) × 1 zero mean white Gaussian process with the covariance matrix defined as E{w_k w_l^T} = Q δ_kl. Equation (4) tells us that h_k depends only on three successive impulse responses, i.e. r = 3. Using (3) and (4), we can write

          | AI  BI  CI |        | DI |
x_{k+1} = |  I   0   0 | x_k +  |  0 | w_k        (5)
          |  0   I   0 |        |  0 |

or

x_{k+1} = F x_k + G w_k        (6)

where F and G are 3(β+1) × 3(β+1) and 3(β+1) × (β+1) matrices respectively, and I is the identity matrix. F is called the state transition matrix and G is the process noise-coupling matrix. By defining the 3(β+1) × 1 vector H_k as

H_k = (b_k, b_{k-1}, b_{k-2}, ..., b_{k-β}, 0, ..., 0)        (7)

where b_k is the transmitted information sequence at the sampling intervals and 2(β+1) zeros are inserted after b_{k-β}, we can write the received signal as

z_k = H_k x_k + n_k        (8)

where z_k is a convolutional sum, H_k is the input data sequence to a fading channel with impulse response x_k, and n_k is additive Gaussian noise with covariance E{n_k n_l^*} = N_0 δ_kl. Equations (6) and (8) describe a linear time-variant system whose state is represented by x_k. Some estimation method has to be used for channel estimation. There are many different estimation techniques, among which the Kalman filter is the optimum for minimizing the mean square estimation error [7]. However, the Kalman filter approach to tracking the channel is computationally intensive, and in practice sub-optimal methods are more advantageous due to their implementation simplicity.

To avoid the decision delay in data detection, the joint data and channel estimation method is used. In this method, there is a channel estimate for every possible transmitted data sequence, H_k.

Each estimator uses its own hypothesized data vector for H_k and, based on this and the received signal, it provides an estimate of the channel impulse response. Only the survivor paths keep and update the channel estimate, and hence the data sequence of the shortest path is used for the channel estimation along the same path.
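To make the third-order recursion of equation (4) concrete, the C sketch below simulates a single complex fading tap driven by complex white Gaussian noise. This is only an illustration: the coefficients A, B, C, D are placeholder values (chosen so the recursion has poles at 0.9, 0.85 and 0.8, i.e. it is stable) and are not taken from the paper.

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>

#define PI 3.14159265358979323846

/* Zero-mean, unit-variance Gaussian sample (Box-Muller). */
static double gauss(void)
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

int main(void)
{
    /* Placeholder AR(3) coefficients for one fading tap (poles 0.9, 0.85, 0.8). */
    const double A = 2.55, B = -2.165, C = 0.612, D = 0.01;

    double complex h1 = 0.0, h2 = 0.0, h3 = 0.0;   /* h[k-1], h[k-2], h[k-3] */

    for (int k = 0; k < 700; k++) {
        /* complex white Gaussian driving noise w[k] */
        double complex w = gauss() + I * gauss();

        /* h[k] = A h[k-1] + B h[k-2] + C h[k-3] + D w[k]   (cf. equation (4)) */
        double complex h = A * h1 + B * h2 + C * h3 + D * w;

        h3 = h2; h2 = h1; h1 = h;

        if (k % 100 == 0)
            printf("k=%3d  |h|=%f\n", k, cabs(h));
    }
    return 0;
}
```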

3. The Simulated System
To illustrate the design methodology, a data communication system based on the IS-136 standard is considered. The modulation is QPSK with four possible symbols (±1 ± j) and a symbol rate of 25 ksymbols/s. As in the IS-136 standard, the differentially encoded data sequence is arranged into 162-symbol frames. The first 14 symbols of each frame are a framing preamble sequence to help the adaptation of the channel estimator. For the shaping filter at the transmitter, we implement a finite impulse response (FIR) filter which approximates a raised cosine frequency response with an excess bandwidth of 25% (slightly different from the 35% selected in IS-136).

In order to keep the simulation simple, we consider a two-ray fading channel model as described in figure 2, where one ray has a fixed delay equal to one symbol period. The multiplicative coefficients α_0 and α_1 are produced at the output of two fading filters, whose inputs are two independent zero mean complex Gaussian processes with equal variances. The length of the discrete impulse response of the shaping filter is set equal to the symbol interval, so that the ISI at the receiver is only due to the multipath nature of the channel. Since one ray is delayed by an amount equal to one symbol interval, the total length of the CIR is two symbol intervals, i.e. β + 1 = 2 if there is one sample per symbol interval. Therefore, there is ISI between two neighbouring symbols and there are four possible states in the trellis diagram.

Implementation of the Channel Estimator
To estimate the states of the system described by (6) and (8), the Kalman filter and the RLS algorithm can be employed. The following are the Kalman filter and RLS algorithm equations.

Kalman Filter Algorithm

Measurement Update Equations:
R_k = H_k P_k H_k^T + N_0
K_k = P_k H_k^T R_k^{-1}
x̂_k|k = x̂_k + K_k (z_k − H_k x̂_k)
P_k|k = P_k − K_k H_k P_k

Time Update Equations:
x̂_{k+1} = F x̂_k|k
P_{k+1} = F P_k|k F^T + G Q G^T

The RLS Algorithm:
R_k = H_k P_k H_k^T + N_0
K_k = P_k H_k^T R_k^{-1}
x̂_{k+1} = x̂_k + K_k (z_k − H_k x̂_k)
P_{k+1} = λ^{-1} (P_k − K_k H_k P_k)

The measurement update estimate x̂_k|k is the linear least squares estimate of x_k given the observations (z_0, z_1, ..., z_k), and x̂_{k+1} is the time updated estimate of x_k given the same observations. In the RLS algorithm, λ is called the forgetting factor. The Kalman filter consists of two parts: the measurement update equations and the time update equations. As shown in [8], the RLS algorithm is essentially identical to the measurement update equations of the Kalman filter. The Kalman filter can be used for channel estimation when some a priori information about the channel is available at the receiver (i.e. the F and G matrices). The RLS algorithm, which is a sub-optimal method, does not require this a priori information and its computational complexity is lower than that of the Kalman filter. The main reason for the difference between theory and practice in implementing these algorithms can be found in the error analysis of the respective numerical methods. At the same precision, mathematically equivalent implementations can have different numerical stabilities, and some methods of implementation are more robust against round-off errors. In the Kalman and RLS algorithms, the estimation depends upon the correct computation of the error covariance matrix. Factorization methods and "square root" filtering are well known for their numerical stability and are widely employed as implementation techniques [9].
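For readers who prefer code to equations, the covariance-form recursion listed above can be written very compactly. The sketch below is a minimal floating-point illustration for a small state dimension (N = 2) with a scalar measurement; the matrices F, G, H and the noise variances are placeholder values for the example, and this is the plain dense recursion, not the fast or square-root implementation discussed later in the paper. (The RLS variant would simply replace the time update by P <- (P - K H P)/λ.)

```c
#include <stdio.h>

#define N 2   /* illustrative state dimension */

/* Placeholder model (illustration only). */
static double F[N][N] = {{0.9, 0.1}, {0.0, 0.8}};
static double G[N]    = {1.0, 0.5};
static double H[N]    = {1.0, 0.0};
static double Q  = 0.01;      /* process noise variance      */
static double N0 = 0.1;       /* measurement noise variance  */

static double x[N]    = {0.0, 0.0};
static double P[N][N] = {{1.0, 0.0}, {0.0, 1.0}};

/* One cycle: R = HPH'+N0, K = PH'/R, measurement update, then time update. */
static void kalman_step(double z)
{
    double PHt[N], K[N], R = N0, innov;
    double xf[N], Pf[N][N], xn[N], Pn[N][N];
    int i, j;

    for (i = 0; i < N; i++) {                 /* P H' and R = H P H' + N0 */
        PHt[i] = 0.0;
        for (j = 0; j < N; j++) PHt[i] += P[i][j] * H[j];
        R += H[i] * PHt[i];
    }
    for (i = 0; i < N; i++) K[i] = PHt[i] / R;   /* Kalman gain */

    innov = z;                                /* innovation z - H x */
    for (i = 0; i < N; i++) innov -= H[i] * x[i];
    for (i = 0; i < N; i++) xf[i] = x[i] + K[i] * innov;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) Pf[i][j] = P[i][j] - K[i] * PHt[j];

    for (i = 0; i < N; i++) {                 /* x+ = F x|k */
        xn[i] = 0.0;
        for (j = 0; j < N; j++) xn[i] += F[i][j] * xf[j];
    }
    for (i = 0; i < N; i++)                   /* P+ = F P|k F' + G Q G' */
        for (j = 0; j < N; j++) {
            double s = 0.0;
            for (int a = 0; a < N; a++)
                for (int b = 0; b < N; b++)
                    s += F[i][a] * Pf[a][b] * F[j][b];
            Pn[i][j] = s + G[i] * Q * G[j];
        }
    for (i = 0; i < N; i++) {
        x[i] = xn[i];
        for (j = 0; j < N; j++) P[i][j] = Pn[i][j];
    }
}

int main(void)
{
    double z[5] = {1.0, 0.9, 1.1, 1.05, 0.95};   /* toy measurements */
    for (int k = 0; k < 5; k++) {
        kalman_step(z[k]);
        printf("k=%d  x = [%f %f]\n", k, x[0], x[1]);
    }
    return 0;
}
```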


4. Square Root Filtering
There are different factorization methods, within which different techniques are used for changing the dependent variable of the recursive estimation algorithm to factors of the covariance matrix. A Cholesky factor of a symmetric nonnegative definite matrix M is a matrix C such that C C^T = M. Cholesky decomposition algorithms solve for a C that is either upper triangular or lower triangular. The modified Cholesky decomposition algorithms solve for a diagonal factor and either a lower triangular factor L or an upper triangular factor U such that M = U D_U U^T = L D_L L^T, where D_L and D_U are diagonal factors with nonnegative diagonal elements. The square root methods propagate the L√D or U√D factors of the covariance matrix rather than the covariance matrix itself. The propagation of square root matrices implicitly preserves the Hermitian symmetry and nonnegative definiteness of the computed covariance matrix. The condition number k(P) = eigenvalue_max(P) / eigenvalue_min(P) of the covariance matrix P can be written as

k(P) = k(L D L^T) = k(B B^T) = [k(B)]^2

where B = L√D. Therefore, the condition number of B used in the square root method is much smaller than the condition number of P, and this leads to improved numerical robustness of the algorithm.

There are also other factorization methods employed for increasing the numerical stability, such as triangularization (QR decomposition) and WGS orthonormalization, used for factorizing matrices as products of triangular and orthonormal matrices. The block matrix factorization of a matrix expression is a general approach that uses two different factorizations to represent the two sides of an equation such as

C C^T = A A^T + B B^T = [A  B] [A  B]^T

The alternative Cholesky factor C and [A  B] can be related by an orthogonal transformation. The measurement update equation for the covariance matrix of the Kalman filter can be written as

P_k|k = P_k − P_k H_k^T R_k^{-1} H_k P_k

It can be shown that, by choosing a suitable orthogonal transformation matrix Θ, we can have

| N_0^{1/2}   H_k P_k^{1/2} |       | R_k^{1/2}               0           |
|     0          P_k^{1/2}  |  Θ  = | P_k H_k^T R_k^{-T/2}    P_k|k^{1/2} |        (9)

and this can be immediately verified by squaring both sides of (9). Generally, computing the triangular factor P_k|k^{1/2} requires taking arithmetic square roots, which are usually more expensive than multiplication or division. This can be avoided by using LDU factorizations.

The implementation algorithm is based on working with LDU (unit lower triangular, diagonal, unit upper triangular) factorizations of P_k and P_k|k, and since P_k and P_k|k are symmetric, U = L^T. Writing

P_k = L_p D_p L_p^T   and   P_k|k = L D L^T        (10)

and dropping all time-index subscripts, equation (9) can be written as

| 1   H L_p |  | N_0   0  |^{1/2}        | 1   0 |  | R   0 |^{1/2}
| 0    L_p  |  |  0   D_p |         Θ  = | K   L |  | 0   D |               (11)

Therefore, to compute the measurement update equations for the covariance matrix, an orthogonal transformation is applied to the left hand side of (11); the upper triangular matrix on the left side can be converted to the lower triangular matrix on the right side. This is possible by introducing weighted norms and rotating the vector [1  p_2] to lie along the vector [1  0], keeping equality of the weighted norms:

            | d_p1   0   |  |  1   |            | d_q1   0   |  | 1 |
[1   p_2]   |  0    d_p2 |  | p_2* |  =  [1  0] |  0    d_q2 |  | 0 |        (12)

where p_2* is the complex conjugate of p_2. It can be verified that, knowing all the parameters on the right hand side, we must choose

d_q1 = d_p1 + |p_2|^2 d_p2        (13)

and

d_q2 = d_p1 d_p2 / d_q1           (14)

to obtain the proper orthogonal transformation matrix in

          | d_p1   0   |^{1/2}            | d_q1   0   |^{1/2}
[1  p_2]  |  0    d_p2 |        Θ = [1  0] |  0    d_q2 |                    (15)

By comparing (11) and (15), we can find the values of d_q1 and d_q2, which are components of R and D on the right hand side of (11), and the next step is to find K and L in the unit lower triangular matrix.

This can be done by applying the transformation in (15) to an arbitrary vector [p_1'  p_2'], and we obtain

               | d_p1   0   |^{1/2}                 | d_q1   0   |^{1/2}
[p_1'  p_2']   |  0    d_p2 |        Θ = [q_1'  q_2'] |  0    d_q2 |          (16)

where

q_2' = p_2' − p_1' p_2                       (17)

and

q_1' = p_1' + (p_2* d_p2 / d_q1) q_2'        (18)

Implementing Time Update Equations
The above algorithm would be sufficient for implementing the RLS estimator, since the RLS algorithm is basically the same as the measurement update equations of the Kalman filter. However, for implementing the Kalman filter we also need to calculate the time update equations. Instead of P_k|k we have computed the L_p and D_p factors, therefore we need to employ an algorithm that uses these factors. The Weighted Gram-Schmidt (WGS) orthogonalization is usually employed for this purpose [10]. In this method, the covariance update equation implementation is based on a block matrix factorization. In the time update equations of the Kalman filter, the covariance update equation can be written in the following matrix form:

P_{k+1} = [ F P_k|k^{1/2}   G Q^{1/2} ] | P_k|k^{T/2} F^T |
                                        |   Q^{T/2} G^T   |        (19)

Again, if we use the LDU factorization for the covariance matrix and denote the diagonal matrix Q by D_q, after dropping all time index subscripts equation (19) becomes

L D L^T = [ F L_p   G ] | D_p   0  |  | L_p^T F^T |
                        |  0   D_q |  |    G^T    |        (20)

L_p and D_p on the right side of equation (20) are known from the measurement update procedure, and L and D have to be computed. Before applying this algorithm, a matrix multiplication is required to compute F L_p. Also, in the time update equations, the x̂_k|k vector obtained at the output of the measurement update procedure must be premultiplied by F. If the multiplication is performed using array processors, the same structure can perform both of these multiplications without any change in the hardware [11].

The above algorithms were employed to simulate the Kalman filter and the RLS algorithm with different numbers of bits for the mantissa in the floating-point operations. In figure 3, the expectation of the mean square error (MSE) in estimation of the impulse response of a Rayleigh fading channel is plotted versus the mantissa word length in the floating-point operations. The simulated system is as described in section 3. Two different implementation methods are considered for the Kalman filter along with the RLS and LMS algorithms, when Eb/N0 is 15 dB. Based on the simulation results, the Kalman filter requires a minimum of 20 bits in the mantissa for channel estimation to a satisfactory level, while for the RLS algorithm the number of bits required is approximately 12. Of course, the best attainable result with the RLS algorithm is inferior to that of the Kalman filter, but the Kalman filter requires more intense computations and larger hardware resources. The effect of reducing the number of bits on the overall BER performance is shown in figure 3.

Figure 3 The effects of changing the word length on estimation methods (Eb/N0 = 15 dB)

The expectation of the mean square error (MSE) in estimation of the impulse response of a Rayleigh fading channel is plotted versus the mantissa word length in the floating-point operations. Three different implementation methods of the Kalman filter are considered, along with the RLS and LMS algorithms, when Eb/N0 is 15 dB (Eb is the energy per transmitted bit). For the measurement update of the Kalman filter, and also for the RLS estimator, the square-root method has been used. The step size in LMS and the forgetting factor in RLS are chosen to yield the best MSE. The initial values for the states are chosen randomly, and for the covariance we choose the identity matrix as the initial value. It is clear that with the Kalman filter the minimum achievable MSE is much lower than that of the LMS and RLS algorithms; however, a larger number of bits is required for the Kalman filter. Direct implementation of the Kalman filter requires at least 26 bits per mantissa, while the two other methods require 22 bits. The minimum numbers of bits per mantissa required for channel estimation with the RLS and LMS algorithms are 12 and 8, respectively.

To employ the Kalman filter as a channel estimator in a mobile communication receiver, or for digital predistortion of RF amplifiers to improve the nonlinearity, it is important to carry out all of the required computations in real time. The Kalman estimator is computationally intensive and, to speed up the estimation process, parallel VLSI structures have to be sought for implementation. It is also imperative to utilize the inherent parallelism of the proposed algorithm so that it can be mapped onto the parallel VLSI structure. The LDC algorithm is used in linear algebra to update the LD factorization of a matrix A to the LD factorization of A + v v^T, where A is symmetric and positive definite and v is an arbitrary vector of appropriate size [12].

The LDC algorithm is more appropriate than other methods for implementation of the time and measurement update equations of the Kalman channel estimator. A systolic array architecture of the LDC algorithm is shown in figure 4. The mapping of the systolic structure of the correction algorithm to a smaller number of processors is shown in figure 5. This allows us to employ the LDC algorithm, which results in a considerable saving in computations compared to the WGS method.

Figure 4 Systolic Array architecture for LDC algorithm

Figure 5 Mapping the systolic structure of the correction algorithm to a smaller number of processors
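The rank-one update that the LDC algorithm performs (obtaining the L and D factors of A + v v^T from those of A) can be illustrated with a classic Agee-Turner style update. The C sketch below works on a small 3x3 example with a unit lower triangular L stored densely; the matrix size and test values are arbitrary, and no attempt is made here to mirror the systolic mapping of figures 4 and 5.

```c
#include <stdio.h>

#define N 3

/* Rank-one update of an LDL' factorization:
 * given A = L diag(d) L' (L unit lower triangular) and a vector v,
 * overwrite L and d so that L diag(d) L' = A + v v'. */
static void ldl_rank1_update(double L[N][N], double d[N], const double v[N])
{
    double w[N], a = 1.0;          /* a tracks the remaining scalar weight */
    for (int i = 0; i < N; i++) w[i] = v[i];

    for (int j = 0; j < N; j++) {
        double p     = w[j];
        double d_new = d[j] + a * p * p;
        double b     = p * a / d_new;
        a            = d[j] * a / d_new;
        d[j]         = d_new;
        for (int i = j + 1; i < N; i++) {
            w[i]    -= p * L[i][j];
            L[i][j] += b * w[i];
        }
    }
}

int main(void)
{
    /* Example factors of a 3x3 symmetric positive definite matrix (arbitrary). */
    double L[N][N] = {{1, 0, 0}, {0.5, 1, 0}, {0.25, 0.4, 1}};
    double d[N]    = {4.0, 2.0, 1.0};
    double v[N]    = {1.0, -0.5, 2.0};

    ldl_rank1_update(L, d, v);

    /* Rebuild L diag(d) L'; it now equals the original A + v v'. */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++) s += L[i][k] * d[k] * L[j][k];
            printf("%8.4f ", s);
        }
        printf("\n");
    }
    return 0;
}
```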

5. Simulation Results of Kalman Filter

[Figure 6: three plots over 700 samples - the input signal without noise, the input signal with random noise, and the Kalman filter output]

Figure 6 Simulation Results of Floating Point Model of Kalman Filter

Figure 6 shows the simulation results of the Matlab-based floating-point model of the Kalman filter. The top plot shows the input signal without noise, the middle one shows the input signal with noise, and the bottom plot shows the output of the floating-point Kalman filter model.


Figure 7 Variation of the floating point (output values for the most positive and most negative cases, with a histogram of the floating-point values)

Figure 7 shows the variation of the floating point output for the relatively most positive and most negative cases.

A fixed-point model of the Kalman filter was created using C and the system simulation was done as in the case of the floating-point model. The results of the fixed-point model were the same as those of the floating-point model, as shown in figure 8. The floating point was varied in both directions, and the top plot shows the most negative and most positive cases. The top plot and the bottom plot clearly show that the fixed-point model gives almost the same results as its floating-point counterpart.

Figure 8 Comparison of the floating-point and fixed-point models (outputs and histograms for the most positive and most negative cases; largest error = 1.075016e-002 at sample 1)

6. FPGA Implementation
The VHDL model of the Kalman filter was created and its behavioral simulation was done using the Xilinx ISE 10.1 HDL simulator; the timing waveforms are shown in figure 9. The top-level RTL schematic is shown in figure 10. The Kalman filter was then synthesized using the built-in XST tool in Xilinx Foundation Series ISE 10.1 and the results are given in table 1. The target device was chosen to be the Virtex 5 XC5VSX50T.

Figure 9 Behavioral simulation of the Kalman Filter

Figure 10 RTL schematic

The requested frequency for the design was 145 MHz and the estimated frequency was 132.4 MHz (register-to-register worst case) with an estimated period of 7.5530 ns. The design can handle an input sampling rate of 396.400 KSPS. The snapshot of the software used is shown in figure 10.
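The fixed-point C model referred to above is not reproduced in the paper. As a rough, hedged illustration of the kind of arithmetic such a model relies on, the fragment below converts between doubles and a Q15 format and performs a rounded fixed-point multiply; the 16-bit word length and scaling are assumptions for the example, not the word lengths of the reported design.

```c
#include <stdio.h>
#include <stdint.h>

/* Q15: one sign bit, 15 fractional bits, values in [-1, 1). */
typedef int16_t q15_t;

static q15_t double_to_q15(double x)
{
    if (x >=  0.999969) return  32767;     /* saturate at the format limits */
    if (x <  -1.0)      return -32768;
    return (q15_t)(x * 32768.0 + (x >= 0 ? 0.5 : -0.5));   /* round to nearest */
}

static double q15_to_double(q15_t x)
{
    return (double)x / 32768.0;
}

/* Fixed-point multiply with rounding: the raw product has 30 fractional
 * bits, so add half an LSB and shift right by 15. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (q15_t)((p + (1 << 14)) >> 15);
}

int main(void)
{
    double a = 0.37, b = -0.58;
    q15_t  qa = double_to_q15(a), qb = double_to_q15(b);
    q15_t  qp = q15_mul(qa, qb);

    printf("floating-point product: %f\n", a * b);
    printf("Q15 product:            %f (error %g)\n",
           q15_to_double(qp), a * b - q15_to_double(qp));
    return 0;
}
```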


7. Conclusion
A fixed-point implementation of the Kalman filter is presented in this paper for its VLSI implementation on FPGA. The floating-point version is compared with its fixed-point counterpart. The overall architecture can be traded off: unfolding increases the speed, while folding minimizes the silicon area. Square-root and matrix decomposition techniques can be used to improve the immunity towards numerical errors. The design was implemented using Xilinx tools; the overall design achieved a speed of 132.4 MHz and can handle a sample rate of 396.400 KSPS.

Figure 10 Snapshot of the software used

Table 1 Synthesis Results
Sr. No. | Information                              | Count          | %age use
1       | No. of slices                            | 2145 of 32640  | 6%
2       | Slice LUTs                               | 3626 of 32640  | 11%
3       | Slice LUTs used as logic                 | 3626 of 32640  | 11%
4       | LUT Flip-Flop pairs used                 | 4168           | -
5       | LUT Flip-Flop pairs with an unused FF    | 2023 of 4168   | 48%
6       | LUT Flip-Flop pairs with an unused LUT   | 542 of 4168    | 13%
7       | Fully used FF pairs                      | 1603 of 4168   | 38%
8       | Bonded IOBs                              | 82 of 480      | 17%
9       | DSP48Es                                  | 16 of 288      | 5%

8. References
[1] S. Sigurdsson, "Fast Kalman filtering for ARMA processes: Fixed point implementation," M.S. thesis, Colorado State University, Ft. Collins, CO, June 1982.
[2] C. Gueguen and L. L. Scharf, "Exact maximum likelihood identification of ARMA models: A signal processing perspective," in Proc. EUSIPCO, Lausanne, Switzerland, Sept. 1980.
[3] J. P. Dugre, L. L. Scharf, and C. J. Gueguen, "Exact likelihood for stationary vector autoregressive moving average processes," presented at the Workshop on Fast Algorithms in Linear Systems, Aussois, France, Sept. 1981.
[4] L. B. Jackson, "On the interaction of roundoff noise and dynamic range in digital filters," Bell Syst. Tech. J., vol. 49, 1970.
[5] C. T. Mullis and R. A. Roberts, "Synthesis of minimum roundoff noise fixed point digital filters," IEEE Trans. Circuits Syst., vol. CAS-23, Sept. 1976.
[6] M. Bayoumi, P. Rao, and B. Alhalabi, "VLSI Parallel Architecture for Kalman Filter - An Algorithm Specific Approach," Journal of VLSI Signal Processing, vol. 4, pp. 147-163, Kluwer Academic Publishers, 1992.
[7] M. S. Grewal and A. P. Andrews, "Kalman Filtering, Theory and Practice," Englewood Cliffs, NJ, Prentice Hall, 1993.
[8] M. J. Omidi, G. Gulak, and S. Pasupathy, "Parallel Structures for Joint Estimation and Data Detection over Fading Channels," IEEE Journal on Selected Areas in Communications, vol. 16, no. 9, pp. 1616-1629, December 1998.


[9] S. Y. Kung, "VLSI Array Processors," Prentice Hall, 1988.
[10] Sanjay Sharma, Sanjay Attri, and R. C. Chauhan, "Joint Channel Estimation and Data Detection under Fading on Reconfigurable Fabric," Elsevier Science B.V., INTEGRATION, The VLSI Journal, vol. 37/3, pp. 177-189, August 2004.
[11] S. Uma and S. Annadurai, "Colour Image Restoration Using Morphological Neural Network," ICGST Journal of GVIP, vol. 5, issue 8, pp. 53-60, August 2005.

9. Biographies
Ms Ruchi Pasricha is currently working as Assistant Professor in Chandigarh Engineering College, Landran, near the city of Chandigarh in Punjab. She did her B.Tech. at SLIET, Longowal in 2000 and her M.E. at Punjab Technical University, Jalandhar in 2007. Currently, she is pursuing her Ph.D. at Thapar University, Patiala on the topic of digital compensation techniques for non-linearity mitigation in RF front-end amplifiers. Her main interests are VLSI signal processing, FPGA design and communication systems.

Dr. Sanjay Sharma is currently working as Assistant Professor in the Electronics and Communication Engineering Department of Thapar University, India. He did his B.Tech. in ECE at REC, Jalandhar in 1993, his M.E. in ECE at TTTI, Chandigarh in 2001 and his Ph.D. at PTU, Jalandhar in 2006. He completed all his education with honours. He has published many papers in journals and conferences of international repute and has to his credit the implementation of research projects worth 12000 USD. His main interests are VLSI signal processing, wireless system design using reconfigurable hardware, digital communication, channel coding, etc.


Power Delay Optimized Adder for Multiply and Accumulate Units

P. Ramanathan1, P. T. Vanathi2
1. Senior Lecturer and Research Scholar, 2. Assistant Professor, Department of Electronics and Communication Engineering, PSG College of Technology, Coimbatore - 641 004, Tamilnadu, India
[email protected]

Abstract
The prevalent blocks used in digital signal processing hardware are the adder, the multiplier and delay elements. The better the performance of the adder structure, the better the overall performance of the multipliers. Reducing power dissipation, delay and area at the circuit level is considered one of the major factors in developing low power systems. In this paper we introduce a new 14 transistor (14T) full adder which has better power and delay performance than the existing adders. A performance comparison of the proposed 14T adder has been made against the 10T SERF, 10T CLRCL and the existing 14T full adders. The proposed 14T full adder structure has improved performance characteristics and is suitable for Array, Carry Save and Dadda multipliers. In addition, three versions of a 3-tap FIR filter, namely the Broadcast, Unfolded Broadcast, and Unfolded and Retimed Broadcast structures, have been implemented using three different multipliers. Each of the multipliers used for the filters is implemented using all the existing full adders and the proposed 14T full adder. Results show that circuits implemented using the proposed 14T full adder have better power, delay and cascaded performance when compared with their peers. All the simulations were carried out using the TSMC Complementary Metal Oxide Semiconductor (CMOS) 180 nm technology file with a supply voltage of 1.8 V. The tools used are Cosmos and PathMill of Synopsys.

Keywords: Array Multiplier, Carry Save Multiplier, Dadda Multiplier, FIR Filter, Unfolding and Retiming.

1. Introduction
The full adder is a basic block in all digital circuits. A small change in transistor count, power and delay will cause a drastic change in the performance of a large VLSI circuit. The performance of multipliers depends on the full adder used. The important parameters to be considered while designing a full adder are power consumption, delay, area, full swing operation and the performance while cascading adders in a multiplier structure. Three existing low power full adder cells are the 10 transistor SERF adder, the 10 transistor CLRCL adder and the 14 transistor full adder. The circuit diagram of the 10T SERF adder [1] is shown in Figure 1. It can be seen that it has two four-transistor XNOR structures to perform the sum operation. Neither of these XNOR structures has a ground connection, so the cell has very low power consumption since there is no direct path from Vdd to ground. The carry logic is generated by pass transistor logic. Though the SERF adder [1] consumes less power, it suffers from the threshold loss problem since both sum and carry are generated from pass transistor logic.

Figure 1 Circuit diagram of 10T SERF full adder

In order to rectify the defects in the SERF adder [1], the 10T CLRCL adder [1] was introduced. The main aim of the design is that the carry signal must not suffer from distortion as it is propagated. The circuit diagram of the 10T CLRCL adder [1] is shown in Figure 2. It consists of two-transistor XOR structures. The inverters used prevent the threshold loss problem and propagate full swing signals to generate the carry. Though the carry signal has full swing operation, this circuit consumes more power. The circuit diagram of the existing 14T full adder [2] is shown in Figure 3. It has a four-transistor XOR structure and an inverter. The carry is generated using transmission gate logic and the sum is generated from pass transistor logic. The power consumed by this circuit is less when compared with that of the 10T CLRCL full adder [2] and more when compared with the 10T SERF full adder [1].


Figure 2 Circuit diagram of 10T CLRCL full adder

Figure 3 Circuit diagram of Existing 14T full adder

The hybrid adder proposed in [3] has a larger delay penalty in low voltage circuits. To overcome the above drawbacks, a new 14T full adder is designed and its performance characteristics are studied by implementing the filter structures proposed in [4]. The remainder of the paper is organized as follows: Section 2 focuses on the design of the new 14T full adder. Section 3 emphasizes the multipliers. Section 4 focuses on FIR filters. Section 5 emphasizes the results of the FIR filters. Section 6 discusses the conclusion of the work.

2. Design of Novel 14 T Full Adder
The proposed 14T adder has a four-transistor XOR structure, a four-transistor XNOR structure and an inverter. Figure 4 shows the XOR and XNOR structures used in the proposed 14T adder. The XOR structure gives a good logic '0' as it has a ground connection, and the XNOR structure gives a good logic '1' as it has a Vdd connection. The circuit diagram of the proposed 14T full adder is shown in Figure 5. The sum and carry are generated as per the equations given below, where ⊕ denotes XOR, ⊙ denotes XNOR and C' is the complement of C:

sum = (A ⊙ B) · C + (A ⊕ B) · C'        (1)

carry = (A ⊕ B) · C + (A ⊙ B) · A        (2)

Figure 4 Circuit diagram of XOR and XNOR structures

Figure 5 Circuit diagram of Proposed 14T full adder

The sum output logic is pass transistor logic while the carry output logic is transmission gate logic. Observing the XOR cell, we notice that the output is just the input value passing through a pass transistor, and the only power dissipation is caused by the discharging of capacitance to ground, which occurs for the input pattern (1 1) and thus has an activity factor of 25%. Compared to the existing 14T full adder, an XNOR cell is used here, in which the power dissipation is caused by the charging of capacitance to Vdd, which occurs for the input pattern (0 0) and again has an activity factor of 25%. Therefore, the sum circuit comprising the XOR and XNOR gates does not consume much power, since the switching activity to ground is 25%. The sum and carry output circuits require an inverted and a non-inverted signal from the first XOR gate output; the inverter used causes dynamic power dissipation. The rest of the circuit is just transmission gates. The proposed structure has better cascading capability, lower power consumption and a higher operating frequency than the existing full adder structures. The difference between the existing 14T full adder and the proposed structure is the implementation of the sum equation, which results in the better performance of the proposed 14T full adder.


Table 1: Comparison of power and delay of different full adders
Full Adder    | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
10T CLRCL     | 21.074        | 0.1680        | 3.540432
10T SERF      | 7.990         | 0.1078        | 0.861322
EXISTING 14T  | 16.960        | 0.1399        | 2.372704
PROPOSED 14T  | 13.074        | 0.1013        | 1.3243962

Table 1 shows the power, delay and power-delay product comparison of the three existing full adders and the proposed 14T full adder. It can be seen that the 10T SERF has the least power and power-delay product, but it suffers from a severe threshold loss problem which leads to circuit malfunction when cascaded in larger circuits. The proposed 14T full adder has less power consumption and delay when compared with the existing 14T and 10T CLRCL full adders. Figure 6 shows the output waveforms containing the sum and carry signals of all the full adders. The input pattern is varied from 000 to 111 with the input changing every nanosecond, so that the delay can be calculated from the waveform. It can be seen from the waveform that the carry (ca) of the 10T SERF suffers from the threshold loss problem (3rd and 5th bit). This is the main drawback of the 10T SERF full adder. In the 10T CLRCL waveform the carry (ca) signal also suffers from the threshold loss problem, and the width of the glitches shows that it has slow operation. It can be seen that the proposed 14T and existing 14T full adders have similar waveforms, but the proposed 14T adder is faster than the existing 14T adder because there is a glitch at 2 ns in the sum signal of the existing 14T adder. The proposed 14T adder also consumes less power than the existing 14T adder.

Figure 6 Waveforms of different full adders

3. Multipliers
In this paper three different multipliers are considered for the analysis of the adders. The multipliers are structures in which there are many cascading stages of the full adder, so the cascaded performance of the full adders can be easily studied by analyzing the power, delay and power-delay product of the different multipliers built from the different adders. The multiplier structures considered for analysis are Array, Carry-Save and Dadda. Both 4x4 and 8x8 versions of the above three multipliers have been implemented and their performance comparison has been made.

Table 2: Power, Delay Comparison of 4x4 Array Multipliers using Different Adders
Full Adder    | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
10T CLRCL     | 72.562        | 0.459         | 33.305958
10T SERF      | 55.493        | 0.531         | 29.46678
EXISTING 14T  | 69.436        | 0.395         | 27.42722
PROPOSED 14T  | 65.665        | 0.350         | 22.98275

The output results of the 4x4 and 8x8 array multipliers show that the multiplier with the SERF adder consumes low power but its output is highly distorted. Though the proposed 14T consumes more power than the SERF adder, it is faster, and the proposed 14T full adder has the least power-delay product. The output results of the 4x4 and 8x8 carry save multipliers show that the multiplier with the proposed 14T adder has the least power-delay product. Tables 6 and 7 show the 4x4 and 8x8 Dadda multiplier results obtained using the different full adders.
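Because the multipliers analysed in this section are essentially long cascades of the full adder cell of section 2, it can help to see that cascading written out. The sketch below chains the Boolean full adder of equations (1) and (2) into an 8-bit ripple-carry adder; it is a purely functional model (no timing or power), and the 8-bit width is an arbitrary choice for the example rather than a structure used in the paper.

```c
#include <stdio.h>
#include <stdint.h>

/* One-bit full adder using the sum/carry equations of section 2. */
static void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    int x  = a ^ b;      /* A XOR B  */
    int xn = !x;         /* A XNOR B */
    *sum  = (xn & cin) | (x & !cin);
    *cout = (x & cin)  | (xn & a);
}

/* 8-bit ripple-carry adder built from eight cascaded full adder cells. */
static uint16_t ripple_add8(uint8_t a, uint8_t b)
{
    int carry = 0, sum;
    uint16_t result = 0;
    for (int i = 0; i < 8; i++) {
        full_adder((a >> i) & 1, (b >> i) & 1, carry, &sum, &carry);
        result |= (uint16_t)sum << i;
    }
    result |= (uint16_t)carry << 8;      /* final carry-out becomes bit 8 */
    return result;
}

int main(void)
{
    uint8_t a = 0xB7, b = 0x5C;
    printf("%u + %u = %u (expected %u)\n",
           a, b, ripple_add8(a, b), (unsigned)a + b);
    return 0;
}
```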


Table 3: Power, Delay Comparison of 8x8 Array Multiplier using Different Adders
Multiplier using Adder | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
PROPOSED 14T           | 333.02        | 1.328         | 442.25056
EXISTING 14T           | 335.16        | 2.39          | 801.0324
10T CLRCL              | 388.42        | 1.611         | 625.74462
10T SERF               | 305.90        | 2.882         | 881.6038

Table 4: Power, Delay Comparison of 4x4 Carry Save Multiplier using Different Adders
Full Adder    | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
10T CLRCL     | 94.478        | 0.561         | 53.00216
10T SERF      | 51.927        | 0.539         | 27.98865
EXISTING 14T  | 58.818        | 0.463         | 27.23273
PROPOSED 14T  | 62.466        | 0.373         | 23.29982

Table 5: Power, Delay Comparison of 8x8 Carry Save Multiplier using Different Adders
Multiplier using Adder | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
PROPOSED 14T           | 316.88        | 0.9           | 285.192
EXISTING 14T           | 335.73        | 2.068         | 694.28964
10T CLRCL              | 390.15        | 2.22          | 866.133
10T SERF               | 295.18        | 1.9           | 560.842

Table 6: Power, Delay Comparison of 4x4 Dadda Multiplier using Different Adders
Full Adder    | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
10T CLRCL     | 108.23        | 0.675         | 73.05525
10T SERF      | 61.824        | 0.517         | 31.96301
EXISTING 14T  | 77.070        | 0.446         | 34.37322
PROPOSED 14T  | 72.126        | 0.436         | 31.44694

Table 7: Power, Delay Comparison of 8x8 Dadda Multiplier using Different Adders
Multiplier using Adder | Power (in µW) | Delay (in ns) | Power Delay Product (10^-15 Ws)
NEW 14T                | 326.57        | 0.95          | 310.2415
EXISTING 14T           | 336.43        | 2.37          | 797.3391
10T CLRCL              | 497.20        | 2.210         | 1098.812
10T SERF               | 274.49        | 2.25          | 617.602

4. FIR Filter
Three versions of a 3-tap FIR filter [5] have been implemented in this section. They are the Broadcast 3-tap FIR filter, the Unfolded Broadcast FIR filter, and the Unfolded and Retimed Broadcast FIR filter. All three structures have been implemented using the three types of multipliers (8x8 bit) with the four types of adders described above.

Figure 7 FIR filter in Direct form

Figure 7 shows the FIR structure in direct form; here the critical path consists of a multiplier and n full adders. Figure 8 shows the broadcast structure of the FIR filter, where the critical path consists of one full adder and one multiplier.

Figure 8 FIR filter in Broadcast form

Figures 9 and 10 show the unfolded and retimed versions of the Broadcast FIR filter. Unfolding is done by a factor of three. Both unfolding and retiming increase the throughput of the filter.
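To make the broadcast structure of figure 8 concrete, the fragment below is a minimal functional sketch of a 3-tap FIR filter in broadcast (transposed) form: each input sample is multiplied by every coefficient and the products are accumulated through a delay line, so the critical path never contains more than one adder and one multiplier. The coefficients and test input are arbitrary example values, not filters from the paper.

```c
#include <stdio.h>

#define TAPS 3

/* 3-tap FIR in broadcast (transposed direct) form:
 * the current input x is broadcast to all coefficient multipliers,
 * and partial sums move through the delay registers s[]. */
static double fir_broadcast(double x, const double h[TAPS], double s[TAPS - 1])
{
    double y = h[0] * x + s[0];          /* output tap        */
    s[0] = h[1] * x + s[1];              /* update delay line */
    s[1] = h[2] * x;
    return y;
}

int main(void)
{
    const double h[TAPS] = {0.25, 0.5, 0.25};     /* example coefficients */
    double s[TAPS - 1] = {0.0, 0.0};
    const double x[8] = {1, 0, 0, 0, 1, 2, 3, 4}; /* example input        */

    for (int n = 0; n < 8; n++)
        printf("y[%d] = %f\n", n, fir_broadcast(x[n], h, s));
    return 0;
}
```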


Figure 9 Unfolded FIR filter in Broadcast form

Figure 10 Unfolded and retimed FIR filter in Broadcast form

5. Results of FIR Filter
Figure 11 shows the power, delay and power-delay product comparison graphs of the 3-tap FIR broadcast structure implemented with the Array, Carry Save and Dadda multipliers. Each of these multipliers is designed using the four different full adders (10T CLRCL, 10T SERF, Proposed 14T and Existing 14T). It can be seen that the FIR filter using the Proposed 14T has the least power-delay product in all the cases.

Figure 11 Comparison of Broadcast 3 Tap FIR Filter using different Multipliers and Adders: (a) Power Comparison, (b) Delay Comparison, (c) Power-Delay Product Comparison

Figure 12 shows the power, delay and power-delay product comparison graphs of the 3-unfolded version of the 3-tap FIR filter implemented with the Array, Carry Save and Dadda multipliers. Each of these multipliers is designed using the four different full adders (10T CLRCL, 10T SERF, Proposed 14T and Existing 14T). It can be seen that the FIR filter using the Proposed 14T has the least power-delay product in the case of all three multipliers.


Figure 12 Comparison of the 3-Unfolded Version of a 3 Tap FIR Filter implemented using different Multipliers and Adders: (a) Power Comparison, (b) Delay Comparison, (c) Power-Delay Product Comparison

Figure 13 shows the power, delay and power-delay product comparison graphs of the 3-unfolded and retimed version of the 3-tap FIR filter implemented with the Array, Carry Save and Dadda multipliers. Each of these multipliers is designed using the four different full adders (10T CLRCL, 10T SERF, Proposed 14T and Existing 14T). It can be seen that the FIR filter using the Proposed 14T has the least power-delay product in the case of all three multipliers, the 10T CLRCL has the least power consumption for all three multipliers, and the Proposed 14T has the least delay for all three multipliers.

Figure 13 Comparison of the 3 Unfolded and Retimed Version of a 3 Tap FIR Filter using different multipliers and adders: (a) Power Comparison, (b) Delay Comparison, (c) Power-Delay Product Comparison

6. Conclusion
The proposed 14T full adder gives 23% power savings when compared with the existing 14T full adder and 38% power savings when compared with the 10T CLRCL adder. The proposed 14T full adder has the least delay when compared with its peers. Three multipliers, Array, Carry-Save and Dadda, have been designed using the four different full adders and their performance has been compared. The multipliers implemented using the proposed 14T full adder have the least power-delay product in all the cases. Three forms of the 3-tap FIR filter, namely broadcast, unfolded broadcast and unfolded retimed broadcast, have also been implemented. The results reveal that the filter structures implemented using the proposed 14T full adder possess the least power-delay product. This shows the suitability of the proposed 14T full adder for DSP applications.

8. References
[1] Jin-Fa Lin and Yin-Tsung Hwang, "A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design," vol. 54, no. 5, May 2007.
[2] E. Abu-Shama and M. Bayoumi, "A new cell for low power adders," in Proc. Int. Midwest Symp. Circuits Syst., 1995, pp. 1014-1017.


[3] T. Kowsalya, "Tree Structured Arithmetic Circuit by using different CMOS logic styles," ICGST-PDCS, Volume 8, Issue 1, December 2008.
[4] Deepak G., Meher P. K., Sluzek, "Performance Characteristics of Parallel and Pipelined Implementation of FIR Filters in FPGA Platform," in Signals, Circuits and Systems, 2007 (ISSCS 2007), International Symposium on, 13-14 July 2007.
[5] N. Zhuang and H. Wu, "A new design of the CMOS full adder," IEEE J. Solid-State Circuits, vol. 27, no. 5, pp. 840-844, May 1992.
[6] J. Wang, S. Fang, and W. Feng, "New efficient designs for XOR and XNOR functions on the transistor level," IEEE J. Solid-State Circuits, vol. 29, no. 7, pp. 780-786, Jul. 1994.
[7] A. M. Shams and M. Bayoumi, "A novel high-performance CMOS 1-bit full adder cell," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 5, pp. 478-481, May 2000.
[8] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Trans. Electronic Computers, vol. EC-13, pp. 14-17, February 1964.
[9] L.-F. Chao and E. H.-M. Sha, Dept. of Comput. Sci., Princeton Univ., NJ, "Efficient retiming and unfolding," in Acoustics, Speech, and Signal Processing, 1993 (ICASSP-93), 1993 IEEE International Conference on, 27-30 Apr. 1993.

P. Ramanathan completed his Master's degree in VLSI Design in May 2006 and is currently working as a Senior Lecturer in the ECE Department, PSG College of Technology, Coimbatore, India. He has about 9 years of teaching experience. His research area is VLSI Design.

P. T.Vanathi is Assistant professor of ECE Dept. at PSG College of Technology, Coimbatore, India. She has 20 years of teaching experience and has published 9 papers in reputed journals. Her Research areas are VLSI Design, Speech Signal Processing and Wireless Networks.


An Approach for Objective Assessment of Stuttered Speech Using MFCC Features K.M Ravikumar, R.Rajagopal, H.C.Nagaraj Nitti Meenakshi Institute of Technology, Bengaluru, India [email protected], [email protected], [email protected]

Abstract
Syllable repetition is one of the important parameters in assessing stuttered speech objectively. The existing methods, which use artificial neural networks (ANN) and Hidden Markov Models (HMM), require high levels of agreement as a prerequisite before attempting to train and test to separate fluent and nonfluent speech. We propose an automatic detection method for syllable repetition in read speech for the objective assessment of stuttered disfluencies. It uses a new approach and has four stages: segmentation, feature extraction, score matching and decision logic. Segmentation is assisted manually, which is tedious but straightforward. Feature extraction is implemented using the well-known Mel frequency cepstral coefficients (MFCC). Score matching is done using Dynamic Time Warping (DTW) between the syllables. The decision logic is implemented by a Support Vector Machine (SVM) and compared with our previous work, which uses the Perceptron method. The proposed objective approach has an advantage over the manual (subjective) one in that it provides the consistent measurement required for assessment. The assessments by human judges on the read speech of 15 adults who stutter are described. 80% of the data are used for training and 20% for testing. The average result was found to be 93.45%, which is better than our previous work [80.78%] using HMM.

Keywords: Assessment, DTW, MFCC, Objective, Perceptron, Stuttering, SVM.

1. Introduction
Stuttering, also known as stammering in the United Kingdom, is a speech disorder. The types of disfluencies that are employed are: 1. Interjections (extraneous sounds and words such as "uh" and "well"); 2. Revisions (a change in the content or grammatical structure of a phrase, or in the pronunciation of a word, as in "there was a young dog, no, a young rat named Arthur"); 3. Incomplete phrases (the content not completed); 4. Phrase repetitions; 5. Word repetitions; 6. Part-word repetitions; 7. Prolonged sounds (sounds judged to be unduly prolonged); 8. Broken words (words not completely pronounced) [6]. Stuttering is often associated with repetitions; as described above, part-word or syllabic repetitions are one of the defining elements of stuttering. The dominant features of Normal Nonfluent (NNF) speech reported are: 1. word repetitions, but not part-word repetitions, are a prevalent feature of early stuttering [25]; 2. in early stuttering there is a high proportion of repetition in general, as opposed to other types of disfluency such as prolongation [4]. The conventional way of making a stuttering assessment is to count the occurrences of these types of disfluencies and express them either as the number of disfluent words as a proportion of all words in a passage, or to measure the time the disfluencies take compared with the duration of the entire passage, manually (subjectively). The main difficulties in making such counts, which are subjective, are: 1. they are time consuming to make, and 2. there is poor agreement when different judges make counts on the same material [7]. As these counts are subjective, they are inconsistent and prone to error. Despite the fact that some researchers have made several attempts to use objective methods to evaluate patients' progress in speech therapy [1, 10, 11, 12], there is always a need for improvement. In our previous work we developed a procedure for recognition of disfluencies [14] using a Hidden Markov Model (HMM), and we also tested a perceptron classifier [15]. In our present work we use speech recognition technology with a new approach to automate the disfluency counts, thus providing an objective and consistent measurement. Different stuttering devices based on Altered Auditory Feedback, namely Delayed Auditory Feedback (DAF), Frequency Shifted Auditory Feedback (FAF) and Masked Auditory Feedback (MAF), as well as the Digital Speech Aid (DSA), are widely used to treat the stutterer and reduce those counts [13]. A 150-word standard English passage was selected for preparing the database. All 15 clients, around the age group of 25 on average, were made to read the passage, and the speech was recorded using Cool Edit version 2 at a sampling rate of 16000 samples per second with 16 bits per sample.


The remainder of the paper is organized as follows: Section 2 focuses on the automatic detection method and the steps involved in it. Section 3 emphasizes the 15 samples collected and the accuracy of the new approach on those samples. Section 4 concludes by comparing the present work with the previous one.

robust in some different pattern recognition tasks concerning human voice. They are widely used in speech recognition and also in speaker identification. The human voice is very well adapted to the ear sensitivity, most of the energy developed in speech being in the lower frequency energy spectrum, below 4 kHz. In speech recognition tasks, usually 12 coefficients are retained, which represent the slow variations of the spectrum of the signal, which characterizes the vocal tract shape of the uttered words [16].

2. Automatic Detection Method The detection scheme used for assessment is divided into four steps as shown in Figure 1:

Speech

Segmentation: Phonetics gives no exact specification of syllables. The characteristic feature of the syllable is the dynamical transient part consonantvowel or consonant –vowel –consonant. The feeling of syllable boundaries, although usually very strong, is subjective and often not unique. For Automatic segmentations of syllable many methods are available, which uses signal extremes, first Autoregressive (AR) coefficient, etc [19]. The speech samples collected in the databases are segmented manually, which is tedious but straightforward [5]. The segmented speech syllables are subjected to feature Extraction.

Segmentation

Feature Extraction

Score Matching

Decision Logic

No. of Repetitions

Feature Extraction: A common first step in feature extraction is frequency or spectral analysis. The signal processing techniques aim to extract features that are related to identify the characteristics. The speech signal is analyzed in successive narrow time windows of 10msec width, for its frequency content with 2msec offset [21]. For each and every window we obtain the intensity of several bands on the frequency scale using feature extraction algorithm. Several different feature extraction algorithms exist, namely [5] 1. Linear Predictive Cepstral Coefficients (LPCC) 2. Perceptual Linear Prediction (PLP) Cepstra. 3. Mel Frequency Cepstral Coefficient (MFCC) Most feature extraction package produce a multidimensional feature vector for every frame of speech. LPCC computes Spectral envelope, before converting it into Cepstral coefficient. The LPCC are LP-derived Cepstral coefficient. PLP integrates critical bands, equal loudness pre emphasis and intensity-to-loudness compression. The PLP is based on the Nonlinear Bark scale. It was originally designed to speech recognition with the removing of speaker dependent characteristics. MFCC is based on signal decomposition with the help of a filter bank, which uses the Mel scale. The MFCC results on Discrete Cosine Transform (DCT) of a real logarithm of the short-term energy expressed on the Mel frequency scale. Our work considers 12MFCC.The Cepstral coefficients are set of features reported to be

Figure 1: Block diagram of the automatic detection method (Speech → Segmentation → Feature Extraction → Score Matching → Decision Logic → No. of Repetitions)

The Mel-frequency scaling is done by a bank of triangular band-pass filters, nonuniformly distributed along the frequency axis. The Mel-scale equivalent value for frequency f expressed in Hz is:

Mel(f) = 2595 log10(1 + f/700)    (1)
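As an illustration of equation (1) and of the Mel-spaced binning described in the following paragraph, the Python sketch below computes 12 MFCCs for one frame. It is not the authors' MATLAB code: the window, FFT size and number of triangular filters are assumptions chosen only for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Equation (1): Mel(f) = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs=16000, n_filters=24, n_ceps=12, n_fft=512):
    """Minimal MFCC sketch: FFT -> Mel-spaced triangular filters -> log -> DCT."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft))
    mel_edges = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    log_energy = np.log(fbank @ spec + 1e-10)
    # DCT-II of the log filter-bank energies; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energy

frame = np.random.randn(160)          # one 10 ms frame at 16 kHz
print(mfcc_frame(frame).shape)        # (12,)
```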

The MFCCs are computed by redistributing the linearly spaced bins of the log-magnitude Fast Fourier Transform (FFT) into Mel-spaced bins according to equation (1) and applying a DCT to the redistributed spectrum. A relatively small number of coefficients (typically 12) provides a smoothed spectral envelope, isolating the vocal tract response by retaining only the desired amount of information. An additional advantage of MFCCs is that they have a decorrelating effect on the spectral data and maximize the variance of the coefficients, similar to the effect of Principal Component Analysis. Each dimension is a floating point value. Feature extraction modules are also called front-end or signal processing modules.
Score Matching: In this paper, DTW based score matching is done. The DTW procedure combines alignment and distance computation in one dynamic programming procedure. Basic DTW assumes (a)


global variation in speaking rate for a person uttering the same word at different times can be handled by linear time normalization, (b) local rate variations within each utterance are small and can be handled using distance penalties, (c) each frame of the test utterance contributes equally to recognition, and (d) a single distance measure applied uniformly across all frames is adequate. This gives intuitive distance measurements between time series by ignoring both global and local shifts in the time dimension. The 12-dimensional MFCC vectors obtained for each syllable are used to compute the angle between them (normalized inner product), which serves as the local distance and is arranged in the form of a matrix. Using Dynamic Programming (DP), the minimum-cost path through the matrix is found [8, 22]. These values are given to the decision logic to identify whether the syllable was repeated or not. A code sketch of this matching step is given below.
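The following Python sketch illustrates the score-matching step: a cosine-angle local distance between two MFCC sequences and the minimum-cost DP path. It is an illustration, not the authors' implementation; the symmetric step pattern and the absence of path constraints are assumptions.

```python
import numpy as np

def local_distance(a, b):
    # Angle between MFCC vectors (normalized inner product) used as the local distance
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def dtw_cost(seq1, seq2):
    """Minimum-cost alignment of two MFCC sequences (frames x 12) by dynamic programming."""
    n, m = len(seq1), len(seq2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(seq1[i - 1], seq2[j - 1])
            # symmetric step pattern: match, insertion, deletion
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # normalize by path length so scores of different syllable pairs are comparable
    return D[n, m] / (n + m)

syl_a = np.random.randn(30, 12)   # two syllables: 30 and 34 frames of 12 MFCCs each
syl_b = np.random.randn(34, 12)
print(dtw_cost(syl_a, syl_b))
```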

Decision Logic:
i) Perceptron Method: In our previous work [15] we tested the decision logic using the Perceptron to decide whether a syllable is repeated or not. The Perceptron was the first iterative algorithm for learning a linear classification. It is a single layer network with a threshold activation function:

y = sgn(w^T x + b)    (2)

The weight vector w is updated each time a training point is misclassified. The algorithm is guaranteed to converge when the data are linearly separable. Two classes of patterns are "linearly separable" if they can be separated by a linear hyperplane. Suppose that the target values d_t take either 1 or -1:

d_t = 1 if x_t ∈ c1;  d_t = -1 if x_t ∈ c2    (3)

Here we find w such that

w^T x > 0 for x ∈ c1 and w^T x < 0 for x ∈ c2    (4)

This implies that

w^T x d > 0 for all x    (5)

The Perceptron criterion leads to the following objective function

ε(w) = − Σ_{x_t ∈ m} w^T x_t d_t    (6)

where m is the set of misclassified vectors. The gradient of ε(w) is:

∂ε/∂w = − Σ_{x_t ∈ m} x_t d_t    (7)

If the pattern is correctly classified, do nothing; otherwise move w:

Δw = η Σ_{x_t ∈ m} x_t d_t    (8)

The Perceptron classifier minimizes the error probability much better than the Minimum Mean Square Error (MMSE) classifier. The Perceptron learning algorithm is given below.
a) Get a training sample.
b) Check to see if it is misclassified.
   i) If classified correctly, do nothing.
   ii) If classified incorrectly, update w by Δw = η x_t d_t    (9)
c) Repeat steps (a) and (b) until convergence.
The basic idea behind the Perceptron is shown in Figure 2, and a code sketch of the update rule is given after Figure 3.

Figure 2: Basic idea of Perceptron

ii) Support Vector Machine (SVM): In this paper we use the SVM method to classify fluent versus non-fluent syllables. The SVM [3, 18, 24] is a powerful machine learning tool which attempts to obtain a good separating hyper-plane between two classes in a higher dimensional space. The equation of the hyper-plane is:

w^T x + b = 0    (10)

where w is a weight vector and b is the bias. Nonlinearity is handled by mapping the input features x into a higher dimension using a function

φ(x): R^d → R^p, p > d    (11)

and hence the hyperplane becomes:

w^T φ(x) + b = 0    (12)

This leads to the following optimization problem:

min_{ξ,w,b}  (1/2)||w||² + C Σ_{i=1..N} ξ_i    (13)

subject to:

y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,  i = 1..N;   ξ_i ≥ 0,  i = 1..N    (14)

C is a constant determined by a cross validation process. The dual formulation of this problem is:

max_λ  Σ_{i=1..N} λ_i − (1/2) Σ_{i=1..N} Σ_{j=1..N} λ_i λ_j y_i y_j K(x_i, x_j)    (15)

subject to:

Σ_i λ_i y_i = 0;   0 ≤ λ_i ≤ C,  i = 1..N    (16)

Here λ_i, i = 1..N, are the Lagrange multipliers. The function K(x_i, x_j) = φ(x_i)^T φ(x_j) is called a kernel function. In the SVM literature there are many forms of the kernel function. If the probability density functions of the feature vectors in both classes are known, there is a possibility of defining natural kernels derived from these distributions [23].

Figure 3: Result of test data 1
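The sketch below implements the Perceptron update of equations (2) and (9) on labelled features. It is illustrative only: the feature construction, learning rate and stopping rule are assumptions, and it is not the SVM actually used for the reported results (an off-the-shelf SVM implementation would replace this decision step).

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, max_epochs=100):
    """Single-layer Perceptron: y = sgn(w^T x + b); update dw = eta * d_t * x_t
    on misclassified samples (equations (2) and (9))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_t, d_t in zip(X, d):
            y = np.sign(w @ x_t + b) or 1.0        # treat sgn(0) as +1
            if y != d_t:                           # misclassified -> update
                w += eta * d_t * x_t
                b += eta * d_t
                errors += 1
        if errors == 0:                            # converged: all points separated
            break
    return w, b

def classify(x, w, b):
    # +1 -> repetition, -1 -> non-repetition (this labelling is an assumption)
    return 1 if w @ x + b > 0 else -1

# Example with made-up 2-D features (e.g. a DTW score and a duration measure)
X = np.array([[0.2, 0.5], [0.3, 0.4], [1.2, 1.5], [1.0, 1.3]])
d = np.array([-1, -1, 1, 1])
w, b = train_perceptron(X, d)
print([classify(x, w, b) for x in X])
```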

3. Results
Out of the fifteen speech samples collected, twelve samples (80%) were used for training and the remaining three samples (20%) for testing. The percentage accuracy for test data 1 and test data 2 is computed. The confusion matrix is used for the analysis; it gives the following values: percentage of Repetition recognized as Repetition, Repetition as Non-repetition, Non-repetition as Repetition and Non-repetition as Non-repetition.

i) Test data 1
Percentage Confusion Matrix = [95.4  4.5; 14.6  85.3]
Classification Accuracy per class = (95.4, 85.3)
Overall Classification Accuracy = 90.35

ii) Test data 2
Percentage Confusion Matrix = [100  0; 3.4  96.6]
Classification Accuracy per class = (100, 96.6)
Overall Classification Accuracy = 98.3

Figures 3 and 4 show the results for test data 1 and test data 2. The percentage accuracy for the two test data sets is listed in Table 1. The syllables per minute (SPM) and percent disfluency (PD) were calculated using the following formulas:

SPM = (Total number of syllables read × 60) / (Total time in seconds)
PD = (Total number of disfluent syllables × 100) / (Total number of syllables)

Figure 4: Result of test data 2

Table 1: Percentage of accuracy for test data
Feature extraction algorithm | Test data 1 | Test data 2 | Average accuracy
MFCC                         | 90.35%      | 98.35%      | 94.35%

The results for the two test data sets are tabulated in Table 2.

Table 2: Percent Disfluency (PD)
Parameter             | Test data 1 | Test data 2
No. of syllables      | 171         | 147
Time in secs          | 68.4        | 62.4
Fluent syllables      | 130         | 121
Non-fluent syllables  | 41          | 26
SPM                   | 150         | 141
PD (%)                | 23.97       | 17.68

Table 2 helps the speech-language pathologist to assess the client and also improves inter-judge agreement about stuttered events [20].

4. Conclusion
In this paper a new approach for automatic detection of syllable repetition is presented for objective assessment of stuttered disfluencies. We discussed the different steps involved in finding the number of repetitions in the speech samples using the MFCC feature extraction algorithm. Compared to the previous work implemented using an ANN [10, 11] (78.01%), an HMM [14] (80%) and a Perceptron classifier [15] (83%), the present work using an SVM performs better, with an average result of 94.35%. Other feature extraction algorithms, like fused MFCC and IMFCC, may be taken up as future work [21].

5. Acknowledgements
We would like to thank L&T and itie Knowledge Solutions for providing technical support and timely guidance.

6. References
[1] M. Adams, "Voice onsets and segment duration of normal speakers and beginning stutterers," Journal of Fluency Disorders, vol. 6, pp. 133-140, 1987.
[2] K. R. Aida-Zade, C. Ardil and S. S. Rustamov, "Investigation of combined use of MFCC and LPC features in speech recognition systems," International Journal of Signal Processing, vol. 3, no. 2, pp. 105-111.
[3] C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining Knowl. Discov., vol. 2, pp. 121-167, 1998.
[4] E. G. Conture, "Stuttering," Englewood Cliffs, New Jersey: Prentice-Hall, 2nd edition, 1990.
[5] D. O'Shaughnessy, "Speech Communication: Human and Machine," Universities Press, 2nd edition, 2001.
[6] W. Johnson et al., "The Onset of Stuttering," Minneapolis: University of Minnesota Press, 1959.
[7] D. Kully and E. Boerg, "An investigation of inter-clinic agreement in the identification of fluent and stuttered syllables," Journal of Fluency Disorders, vol. 13, pp. 309-318, 1988.
[8] E. Keogh, "Exact indexing of dynamic time warping," In VLDB, pp. 406-417, Hong Kong, China, 2002.
[9] Neeta Awasthy, J. P. Saini and D. S. Chauhan, "Spectral analysis of speech: A new technique," International Journal of Signal Processing, vol. 2, no. 1, pp. 19-29, 2006.
[10] Peter Howell, Stevie Sackin and Kazan Glen, "Development of a two-stage procedure for the automatic recognition of dysfluencies in the speech of children who stutter: I. Psychometric procedure appropriate for selection of training material for lexical dysfluency classifiers," JSLHR, vol. 40, pp. 1073-1084, October 1997.
[11] Peter Howell, Stevie Sackin and Kazan Glen, "Development of a two-stage procedure for the automatic recognition of dysfluencies in the speech of children who stutter: II. ANN recognition of repetitions and prolongations with supplied word segment markers," JSLHR, vol. 40, pp. 1085-1096, October 1997.
[12] Peter Howell and Louise Vause, "Acoustic analysis and perception of vowels in stuttered speech," Journal of the Acoustical Society of America, vol. 79, no. 5, pp. 1571-1579, May 1986.

[13] K. M. Ravikumar and R. Rajagopal, “Altered Auditory Feedback Systems for Adult Stutter,” Proceedings of the Sonata International Conference on Computer Communication and Control, pp. 193-196, November 2006. [14] K. M. Ravikumar, Sachin Kudva, R. Rajagopal and H. C. Nagaraj, “Development of a Procedure for the Automatic Recognition of Disfluencies in the Speech of People Who Stutter,” International Conference on Advanced Computing Technologies, Hyderbad, India, pp. 514-519, December 2008. [15] K. M. Ravikumar, Balakrishna Reddy, R.Rajagopal and H. C. Nagaraj, “Automatic Detection of Syllable Repetition in Read Speech for Objective Assessment of Stuttered Disfluencies”. In Proceedings of World Academy Science, Engineering and Technology, vol.36, Bangkok, Thailand, pp. 270-273, October 2008. [16] L. Rabiner and B.H. Juang, “Fundamental of speech recognition,” PTR Prentice Hall, Englewood Cliffs, New Jersey, 1993. [17] Radounae Iqdour and Abdelouhab Zeroual, “The Multi-layered Perceptrons Neural Networks for the Prediction of Daily Solar Radiation,” Internatioal Journal of Signal Processing ,vol 3, no.1, pp. 24-29, 2007. [18] A. Reda, El-Khoribi, “Support Vector Machine Training of HMT Models for Land Cover Image Classification,” ICGST-GVIP, vol.8, issue 4, pp. 7-11, December 2008. [19] W. Reichl and G. Ruske, “Syllable segmentation of continuous speech with Artificial Neural Networks,” In Processing of Eurospeech, Berlin, vol.3, pp. 1771-1774, 1993. [20] V.V. Sairam, “Assessment of fluency in Adult”. In Proceedings of the National Workshop on Assessment and Management of Fluency Disorders, Mysore, India, pp. 11-26, October 2007. [21] Sandipan Chakroborthy and Goutam Saha,” Improved Text-Independent Speaker Identification Using Fused MFCC & IMFCC Feature Sets Based on Gaussian Filter,” International Journal of Signal Processing, vol. 5, no.1, pp. 11-19, 2009. [22] H. Silverman and D. Morgan, “The application of dynamic programming to connected speech segmentation,” IEEE ASSP Mag.7, no.3, pp. 725, 1990. [23] A. Sloin and D. Burshtein, “Support Vector Machine Training for Improved Hidden Markov Modelling,” IEEE Transactions on Signal Processing, vol. 56, no. 1, January 2008. [24] V. N. Vapnik, “Statistical Learning Theory, New York: Wiley,” 1998. [25] E.Yairi and B.Lewis, “Disfluencies at the onset of stuttering,” Journal of speech & Hearing Research, vol.27, pp. 154-159, 1984.




Ravikumar K. M., Assistant Professor, Department of Electronics and Communication Engg., Ghousia College of Engg., Ramanagara, completed his M.Tech at SJCE, Mysore, in the field of Biomedical Instrumentation in 2002 and is pursuing his PhD under VTU, Belgaum. He has been in the field of teaching for the past 12 years and has published four papers in international conferences and one in an international journal related to his research areas. His fields of interest include Digital Signal Processing, Speech Signal Processing and Communication Systems.

Dr. R. Rajagopal, Strategic Electronics Center, L&T, Bengaluru, obtained his PhD degree in 1992 from Bharathidasan University, Tiruchirappalli, with his research thesis in the area of Array Signal Processing for Passive Sonar. He received his M.E. degree in Communication Systems from Bharathidasan University in 1985 and his B.E. (Hons) degree from Madras University in 1982. He worked in the ECE department of the Regional Engineering College, Tiruchirappalli, from 1982 to May 1998. During this period, he carried out many sponsored research projects for N.P.O.L., Cochin, N.S.T.L., Visakhapatnam and D.O.E., Government of India, in the areas of Sonar System Modelling and Simulation, Underwater Propagation Modelling, Beamforming algorithm and software development, and Tracking algorithm development. He joined the Central Research Laboratory (CRL), Bharat Electronics, in June 1998 and served as Head, Radar Signal Processing Group, till May 2006. He joined L&T in May 2006 and is Head of Technology Development in the Strategic Electronics Center at Bengaluru. His current focus areas are Military Communication, Aviation and UAVs. He has more than 70 publications in the proceedings of international conferences and international/national journals. He has served on the technical program committees of many international conferences such as IRSI, FUSION 2000, etc.

Dr. H. C. Nagaraj, Principal, Nitte Meenakshi Institute of Technology, Bengaluru, completed his PhD at IIT Madras, Chennai, in 2000 and his M.E. in Communication Systems at P.S.G. College of Technology, Coimbatore, in 1984. He has been in the field of teaching for the past 26 years and has published 24 papers in national/international conferences and journals. His areas of interest are Digital Signal Processing, Image Processing, Digital Communication, Mobile Communication and Biomedical Engineering.


Analysis of Underwater Transients using Lifting based Wave Packet Transform Roshen Jacob* , Tessamma Thomas** and A.Unnikrishnan*

*Naval Physical & Oceanographic Laboratory, Kochi-682 021,India ** Cochin University of Science & Technology, Kochi-682021,India [[email protected], [email protected], [email protected]]

Abstract
This paper proposes a fast method for analyzing underwater transients buried in noise. The challenge here is to develop a method applicable to different types of transients with unknown waveforms and arrival times. Traditional signal analysis techniques are not ideally suited to transient analysis, as they assume that the signals are stationary or infinite in extent. The advent of time-frequency representations has opened up better methods for transient analysis. The method used here for transient analysis is the wavepacket transform. For the implementation, instead of the conventional filter bank scheme, a less computationally intensive method, namely lifting, is adopted. As a result, a tremendous reduction in computation (around 60%) has been achieved. The tiling plots of the wavepacket transform outputs clearly bring out the time of occurrence and the frequencies of the embedded transients. The proposed algorithm has been implemented in real time and its performance verified using a number of simulated and recorded underwater transients. The capability and time saving achieved by combining the wave packet transform and the lifting scheme will be very useful in real time underwater transient analysis systems.

Keywords: Wavelet transform, Wavepacket transform, Lifting scheme, transient analysis, underwater transients.

1. Introduction
A transient signal can be generally defined as a signal whose duration is short compared to the observation time. These signals are non-stationary, having a wide variety of signal characteristics such as shape, frequency content and duration, which are unknown and undergo wide variation from event to event. (This study has been implemented on an ADSP 21062 SHARC platform at NPOL.) The detection, analysis and classification of such signals is a problem of importance in fields such as underwater acoustics, biomedical engineering and industrial applications. This paper proposes a new method for the analysis of multiple transient signals with unknown waveforms and arrival times, embedded in noise.
Performance analysis of machinery is a very common application of transient analysis. A transient of a certain frequency range and wave shape may indicate a particular degradation, signaling an impending failure in the machinery, so early detection can prevent machinery damage. Biomedical equipment for health diagnosis also uses transient analysis extensively. If the transient signal were known, the problem would be trivial. The interest here is to develop an approach useful for various types of transients, regardless of their form, length or location.
Detection and analysis of transients is also gaining significance in the context of underwater acoustic signals. With ships and submarines becoming quieter by the day, it is difficult to detect them based on narrow-band machinery noise only. However, the transient signals emitted by naval targets during torpedo launch, sudden course changes etc. have comparatively higher power. Abrupt machinery failure or enemy scanning signals can also be considered as rarely occurring transient waveforms embedded in noise. Transient analysis acquires significance in this context.
Underwater transients can be divided into two main categories: those of biological origin and those of non-biological origin. The biological transients are further divided into two classes, namely snapping shrimp noise and clicks, emitted by shrimps, whales and dolphins. The non-biological transients are those emitted from submarines, ships etc. Typically, underwater transients have durations of 200-600 ms.
Traditional signal analysis techniques are not ideally suited to transient analysis, as they assume that the signals are stationary or infinite in extent. But transients are non-stationary


in nature and highly corrupted with noise. Recent developments have shown that time-frequency methods are very efficient in analyzing and classifying transients. The information in non-stationary signals that is lost in the Fourier Transform is characterized by small and highly concentrated areas in the time-frequency domain. In other words, fewer coefficients with significant values are generated, and hence we can say which frequencies are present at a particular time instant. Wavelet and wave packet transforms can give such efficient representations [1].
Discrete wavelet transforms have been widely applied to the problem of transient processing [2], primarily because the transform basis functions provide good time localization, and the approach involves tracking local transform maxima across analysis scales. The problem with the Discrete Wavelet Transform (DWT) is that it can only provide finer analysis for the low-band signal, whereas the Discrete Wave Packet Transform (DWPT) can divide the whole time-frequency plane into successive subtle tilings [3, 4, 5, 6]. Hence the DWPT is more competent to handle wide-band and high-frequency narrow-band signals like transients. It is well known that the DWT is a very computationally intensive process; in order to alleviate this problem, we have proposed a lifting based implementation [7-10].
Prior to transient analysis, the transient has to be detected. The onset time and duration of the transient event are estimated by a less complex algorithm. The Page test (or CUSUM test), studied by Douglas A. Abraham [11], is a well established procedure for transient detection. After the transient is detected using the Page test, the resulting data block is subjected to the discrete wave packet transform for analysis. The wavepacket transform can be used for transient detection also, but only at the cost of more hardware. A sketch of the Page test is given below.
Wavelet processing in real time is now possible thanks to the technology available at present. The proposed scheme of wavelet analysis has been implemented on a DSP processor (ADSP 21062 SHARC). By efficiently using the lifting technique, a very efficient implementation of the wave packet transform has been achieved.
In Sec. 2, the wavelet transform using Mallat's filter bank scheme is briefly reviewed, followed by the discrete wave packet transform in Sec. 3. In Section 4, the lifting scheme proposed by Wim Sweldens is presented. The simulation results and implementation details are shown in Sec. 5. Concluding remarks are presented in the last section.
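As an illustration of the detection step, the following Python sketch runs a Page (CUSUM) test on the short-time energy of incoming blocks. It is only a sketch of the general idea: the energy statistic, bias term and threshold are assumptions, not the parameters used in this paper or in [11].

```python
import numpy as np

def page_test(x, block=64, bias=1.5, threshold=10.0):
    """Page/CUSUM test on normalized block energies: W_k = max(0, W_{k-1} + e_k - bias).
    Returns the index of the first sample of the block at which W crosses the threshold."""
    noise_power = np.mean(x[:4 * block] ** 2)      # crude noise estimate from the first blocks
    W = 0.0
    for k in range(0, len(x) - block, block):
        e = np.mean(x[k:k + block] ** 2) / (noise_power + 1e-12)
        W = max(0.0, W + e - bias)                 # CUSUM update with drift (bias) term
        if W > threshold:
            return k                               # transient onset detected in this block
    return None

# Example: white noise with a short 2400 Hz burst starting at sample 4000
fs = 16000
x = np.random.randn(8000)
x[4000:4400] += 4.0 * np.sin(2 * np.pi * 2400 * np.arange(400) / fs)
print(page_test(x))
```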

2. Discrete Wavelet Transform using Filter Banks (DWT)
Wavelet transforms have been widely applied to the problem of transient processing, primarily

because the transform basis functions provide good time localization and it involves the tracking of local transform maxima across analysis scales. This technique relies on the observation that the evolution of the transform maxima across scales provides a measure of the local regularity of the signal. DWT brings out the arrival time of the transient as a sharp peak in the timefrequency tiling plot. By choosing the optimum values of the dilation and translation parameters, the sharpness of the peak can be controlled. Using this method, transients occurring at different time instants can be analysed. Also, transients overlapping in time and frequency can be resolved. The DWT method performs equally well for noisy signals at very low SNRs also. Wavelet analysis, viewed in the context of multiresolution analysis was developed by Mallat. Mallat’s filter bank algorithm involves the computation of approximation coefficients s(k) and detailed coefficients d(k) using the following filtering operations. Here, h(n) and g(n) are the wavelet and scaling filters respectively. The coefficients at scale j are convolved with the time reversed filter coefficients h(-n) and g(-n) and then down sampled to get the coefficients at scale (j-1). Figure 1 below shows the filter bank implementation for the decomposition.


Figure 1. DWT- Mallat’s decomposition tree
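The following Python sketch shows one level of the filter bank decomposition described above: convolution with the analysis filters followed by downsampling by 2. It is a minimal illustration, not the authors' ADSP-21062 C code; the periodic extension used for the boundaries is an assumption.

```python
import numpy as np
import pywt

def dwt_one_level(s, wavelet="db4"):
    """One level of Mallat's algorithm: filter with the scaling (low-pass) and
    wavelet (high-pass) analysis filters and keep every second sample."""
    w = pywt.Wavelet(wavelet)
    lo, hi = np.array(w.dec_lo), np.array(w.dec_hi)
    ext = np.concatenate([s, s[: len(lo) - 1]])       # periodic extension (assumption)
    approx = np.convolve(ext, lo, mode="valid")[::2]  # s_{j-1}(n)
    detail = np.convolve(ext, hi, mode="valid")[::2]  # d_{j-1}(n)
    return approx, detail

x = np.sin(2 * np.pi * 0.05 * np.arange(256)) + 0.1 * np.random.randn(256)
a, d = dwt_one_level(x)
print(len(a), len(d))   # 128 128
```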

3. Discrete Wavelet Packet Transform (DWPT)
A wavelet basis is a member of the larger collection of wave packet bases, so wave packets are even better suited to transient analysis. According to the multi-scale filtering structure, the wave packet transform can divide the entire time-frequency plane into subtle tilings, while the classical WT can only provide finer analysis for the lower band. Hence the DWPT is more competent to handle wide-band and high-frequency narrow-band signals like transients. The wave packet method is a generalization of wavelet decomposition that offers a richer range of possibilities for signal analysis. In wave packet analysis, the details as well as the approximations can be split. This yields 2^n different ways to encode the signal. The wavelet packet decomposition tree is shown in Figure 2, and a short code sketch of the decomposition follows.
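A minimal sketch of the full wave packet decomposition, in which both the approximation and the detail branches are split at every level, giving 2^n leaf nodes that tile the whole frequency axis. It uses PyWavelets' WaveletPacket class as a stand-in for the authors' implementation; the decomposition level and boundary mode are assumptions.

```python
import numpy as np
import pywt

x = np.sin(2 * np.pi * 0.05 * np.arange(1024)) + 0.1 * np.random.randn(1024)
wp = pywt.WaveletPacket(data=x, wavelet="db4", mode="periodization", maxlevel=4)
leaves = wp.get_level(4, order="freq")           # 16 nodes ordered from low to high frequency
energies = [np.sum(node.data ** 2) for node in leaves]
print(len(leaves), int(np.argmax(energies)))     # which frequency node holds most energy
```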


Figure 2. Wave Packet decomposition tree

4. Fast Wavelet Transform using Lifting Scheme
The DWT is a very computation intensive process, so faster implementation schemes are much needed. The lifting based implementation meets this requirement. It was developed by Wim Sweldens in 1997 as a method to improve a given WT to obtain some specific properties [6]. Later, it was extended to a generic method to create the so-called second generation wavelets. The theory behind the classical wavelets relies heavily on the Fourier Transform, while the lifting scheme can be used to introduce wavelets without using the concept of the Fourier Transform at all.
It is fruitful to view the DWT as a prediction-error decomposition. The scaling coefficients at a given scale are predictors for the data at the next higher resolution or scale (j-1). The wavelet coefficients are simply the "prediction errors" between the scaling coefficients and the higher resolution data. This interpretation has led to a new framework for the DWT known as the lifting scheme.
Suppose that the low-resolution part of a signal at level j+1 is given, represented by λ_{j+1}. This set is transformed into two other sets at level j: the low-resolution part λ_j and the high-resolution part γ_j. This is obtained first by splitting the data set λ_{j+1} into two data subsets. Traditionally, this is done by separating λ_{j+1} into the set of even samples and the set of odd samples. Such a splitting is sometimes referred to as the lazy wavelet transform. Doing just this does not improve our signal representation. For that, the two subsets are recombined in several lifting steps which decorrelate the two signals. Lifting steps usually come in pairs of a primal and a dual lifting step. A dual lifting step can be seen as a prediction: the data γ_j are predicted from the data λ_j. When the signals are highly correlated, such a prediction will be very good, and thus we need not keep this information in both signals. We need to store only that part of γ_j that differs from its prediction (the prediction error). Thus γ_j is replaced by γ_j − P(λ_j), where P represents the prediction operator. This is the real de-correlating step. However, the new representation has lost certain basic properties which one usually wants to keep, for example the mean of the signal. To restore this property, one needs a primal lifting step, whereby λ_j is updated with data from the new γ_j. Thus λ_j is replaced by λ_j + U(γ_j), where U represents the updating operator. These steps can be repeated by iteration on λ_j, creating a multilevel transform or multi-resolution decomposition. The lifting steps to go from level j+1 to level j are therefore summarized as follows:

Splitting (lazy wavelet transform): λ_{j+1} → (λ_j = even samples, γ_j = odd samples)
Prediction (dual lifting): γ_j ← γ_j − P(λ_j)
Update (primal lifting): λ_j ← λ_j + U(γ_j)

These three steps form a lifting stage (see Figure 3). Iteration of the lifting stage on the output s(n) creates the complete set of DWT scaling and wavelet coefficients (see Figure 4). By first factoring a classical wavelet filter into lifting steps, the computational complexity of the corresponding DWT can be reduced. The lifting steps can be easily implemented with ladder-type structures, which is different from the direct finite impulse response (FIR) implementations of Mallat's algorithm. Hence our implementation will require fewer hardware resources while achieving higher utilization.

Figure 3. Lifting stage - Split, Predict, Update

5. Implementation Details
Figure 5 below shows the functional block diagram for a typical transient detection and analysis system. On incoming blocks of data, the Page test is run continuously to detect the occurrence of transients/intercepts. Once a transient is detected, that particular block of data is analyzed to characterize the type of signal.


Figure. 5 Functional block diagram

The DWT implementation using Mallat's algorithm and the lifting scheme algorithm were first simulated using MATLAB for the various types of transients and various values of SNR. Subsequently these algorithms were proven on ADSP-21062 SHARC based hardware. The editor/debugger was used for developing, editing, debugging and optimising the C code for the ADSP-21062 SHARC. We have used DB4 filters in the implementation. Table 1 shows the filters used in the filter bank implementation, and Table 2 shows the coefficients used in the lifting scheme. In the lifting scheme, the sequence with DB4 is predict followed by update, then predict, then update and finally scaling, using the given coefficients. The computing times to perform one level of the transform on a 1-D signal were recorded for both the filter bank and the lifting scheme, for different input lengths (see Table 3). The results show that the lifting scheme is always faster than the filter bank scheme. The performance comparison was done for several filters of practical interest, addressing both the DWT and the IDWT. In all of them, up to 60% reduction in the computing times could be achieved by using the lifting scheme. Hence our implementations will require fewer hardware resources while achieving higher utilization. A sketch of this predict/update/scale sequence is given below.
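The sketch below illustrates the predict/update/predict/update/scale sequence described above, using the coefficients listed in Table 2. It is not the authors' C implementation: the signs applied to the coefficients, the boundary handling and the even/odd indexing are assumptions of this illustration. Because every lifting step is trivially invertible, the forward/inverse pair reconstructs the input exactly whatever coefficient values are used.

```python
import numpy as np

ALPHA, BETA, GAMMA, DELTA, ZETA = 1.586134342, 0.05298011854, 0.8829110762, 0.4435068522, 1.149604398

def lift(src, dst, coeff):
    # One lifting step: add coeff times the sum of two neighbouring samples of the other branch.
    n = np.pad(src, (1, 1), mode="edge")              # simple boundary handling (assumption)
    return dst + coeff * (n[:-2] + n[1:-1])[: len(dst)]

def lifting_forward(x):
    """Split / predict / update / predict / update / scale (one stage)."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)   # split into even and odd samples
    d = lift(s, d, -ALPHA)      # predict 1
    s = lift(d, s, -BETA)       # update 1
    d = lift(s, d, GAMMA)       # predict 2
    s = lift(d, s, DELTA)       # update 2
    return s * ZETA, d / ZETA   # scaling

def lifting_inverse(s, d):
    s, d = s / ZETA, d * ZETA
    s = lift(d, s, -DELTA)
    d = lift(s, d, -GAMMA)
    s = lift(d, s, BETA)
    d = lift(s, d, ALPHA)
    x = np.empty(len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x

x = np.random.randn(512)
s, d = lifting_forward(x)
print(np.allclose(lifting_inverse(s, d), x))   # True: lifting steps are invertible by construction
```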

Figure 4. – Two stage Lifting

The performance of the wavepacket transform on a variety of simulated transients with added noise is given below. The algorithm was also tried on recorded biological transient signals. This study may not be exhaustive in terms of the transients used. We have chosen the db4 wavelet for the simulation; other wavelets can be tried as future work to arrive at the best one. The simulated transients used are of short duration compared with the observation intervals. There are mainly three types of generic transients, namely the impulsive transient, the ringing transient and the chirp transient.

The model used for the impulsive transient h(t) of 10 ms duration is

h(t) = exp(αt²/2 + iβt²/2) · sin(2πft)

where α is the damping factor, β the oscillating frequency and f the centre frequency (2400 Hz).

The model used for the ringing transient r(t) of 100 ms duration is

r(t) = exp(αt²/2 + iβt²/2) · s(t),  with  s(t) = sin(2πft) + 0.5 sin(4πft) + 0.25 sin(6πft)

where f is the centre frequency (800 Hz).

The model used for the chirp transient c(t) of 250 ms duration is

c(t) = exp(−αt²/2 + ft/2)

where α is the chirp slope, f the start frequency (50 Hz) and BW the bandwidth (200 Hz).

Table 1 - DB4 filters
h(n): 0.0106, 0.0329, 0.0308, -0.1870, -0.0280, 0.6309, 0.7148, 0.2304
g(n): -0.2304, 0.7148, -0.6309, -0.0280, 0.1870, 0.0308, -0.0329, -0.0106

Table 2 - Lifting Scheme Coefficients
alpha = 1.586134342
beta  = 0.05298011854
gamma = 0.8829110762
delta = 0.4435068522
zeta  = 1.149604398

Table 3 - Execution Time
Data length | Filter Bank Scheme (msec) | Lifting Scheme (msec)
512         | 0.3                       | 0.1
1024        | 0.75                      | 0.27
2048        | 1.7                       | 0.6

A sketch that generates the three simulated transient models is given below.
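The following Python sketch generates noisy versions of the three transient models above and embeds one of them in a longer noise record. It is an illustration only: the damping/oscillation parameters (α, β), the sampling rate and the SNR are assumptions, since the paper specifies only the centre frequencies, durations and bandwidth; the chirp is generated as a standard linear sweep from f to f + BW rather than by literally evaluating the printed expression, whose reconstruction is uncertain.

```python
import numpy as np

FS = 10000                      # sampling rate (assumption)

def impulsive(dur=0.010, f=2400.0, alpha=-8e5, beta=2e5):
    t = np.arange(0, dur, 1 / FS)
    return np.real(np.exp(alpha * t**2 / 2 + 1j * beta * t**2 / 2)) * np.sin(2 * np.pi * f * t)

def ringing(dur=0.100, f=800.0, alpha=-2e3, beta=1e3):
    t = np.arange(0, dur, 1 / FS)
    s = np.sin(2 * np.pi * f * t) + 0.5 * np.sin(4 * np.pi * f * t) + 0.25 * np.sin(6 * np.pi * f * t)
    return np.real(np.exp(alpha * t**2 / 2 + 1j * beta * t**2 / 2)) * s

def chirp(dur=0.250, f0=50.0, bw=200.0):
    t = np.arange(0, dur, 1 / FS)
    return np.sin(2 * np.pi * (f0 * t + 0.5 * (bw / dur) * t**2))   # linear sweep f0 -> f0 + bw

def embed(transient, total=2048, start=500, snr_db=0.0):
    """Place the transient in white noise at a given SNR (SNR measured over the transient)."""
    noise = np.random.randn(total)
    sig = np.zeros(total)
    sig[start:start + len(transient)] = transient
    gain = np.sqrt(10 ** (snr_db / 10) * np.mean(noise**2) / (np.mean(transient**2) + 1e-12))
    return gain * sig + noise

record = embed(ringing(), snr_db=0.0)
print(record.shape)
```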


Figures 6-11 show the impulsive, ringing and chirp transients and their wave packet transforms. The frequencies and the times of occurrence are clearly brought out in the DWPT tiling plots. The 2400 Hz component in the impulsive transient and the 800 Hz component in the ringing transient are clearly marked at level 4 in the tiling plots, in the respective frequency nodes. The chirp transient frequency variation is also very evident at level 5 in the wavelet plot, in all the frequency nodes from 50 Hz to 250 Hz. The spectrum of the different transients is also plotted alongside the wavepacket transform plots, in order to highlight the advantage of the wavepacket scheme proposed here. The same algorithm was tested with recorded biological transients (Figures 12 and 13). As the DWPT tiling plots show, the multiple transients, their frequencies and times of occurrence are clearly brought out. Here also, the prominent frequencies in the whale noise as well as the chirps in the biological noise are marked in the tiling plots. The results show the potential of this method in analyzing any transient, irrespective of its duration and waveform. The reduction in computation achieved by incorporating lifting into the scheme is another great advantage.
In this paper, we have used the wavepacket transform for the analysis of transients. Detection can also be done using the same method, but this will demand more hardware depending on the number of channels to be processed. However, with the fast processing power available currently, this may not be too much to ask.

Figure 6. Impulsive Noisy Transient & Spectrum
Figure 7. Wavepacket Transform of Impulsive Transient
Figure 8. Ringing Transient Signal with noise
Figure 9. Wavepacket Transform of Ringing Transient
Figure 10. Chirping Transient Signal with noise
Figure 11. Wavepacket Transform of Chirping Transient
Figure 12a. Transient Signal, Figure 12b. Wavepacket Transform, Figure 12c. Spectrum (Analysis of Whale Noise)
Figure 13a. Transient Signal, Figure 13b. Wavepacket Transform, Figure 13c. Spectrum (Analysis of Biological Noise)

6. Conclusions
This paper proposes a new method for analyzing underwater transients buried in noise with unknown waveforms and arrival times with less computational complexity. Transient analysis is done using the wavepacket transform. Instead of the conventional filter bank implementation scheme, this paper proposes a less computationally intensive method, namely lifting based wave packet implementation. As a result, tremendous reduction in computation (around 60%) has been achieved. Simulation has been done using a number of simulated and recorded underwater transients. From the resulting tiling plots,


the time of occurrence and the frequencies of the embedded transients are clearly brought out. We have come to the conclusion that the lifting scheme is faster than the filter bank scheme, especially when the filter has more taps; up to 60% reduction is possible with the lifting scheme. The capability and time saving achieved by combining the wave packet transform and the lifting scheme will be very useful in real time transient analysis systems. This study may not be exhaustive in terms of the transients used. The db4 wavelet was chosen for the simulation; other wavelets can be tried as future work to arrive at the best one. Also, the SNR up to which the algorithm works, the optimum decomposition level, the best wavelet for specific transients and adaptive wavelet transforms are issues currently being addressed as a continuation of this work.

Acknowledgements
The authors wish to express their thanks to the Director, N.P.O.L., for giving the opportunity and encouragement to do this work.

References [1] Daubechies,”The Wavelet transform, timefrequency localization and signal analysis”, IEEE Tr. on Information Theory, Vol. 36, pp 916-1005, 1990 [2] Athina.P.Petropulu, “Detection of Transients using DWT”, IEEE,1992 [3] R R Coifman and M V Wickerhauser, “Entropy based algorithms for best basis selection”, IEEE Tr. on Information Theory, Vol. 38, pp 713-718, 1992 [4] Stephen Del Marco and John Weiss, “Improved Transient Signal Detection using Wave-packet based Detector”, IEEE Tr. On Signal Processing, Vol.45, No.4, April 1997 [5] Philip Ravier and Pierre-Olivier Amblard, “Wavelet packets and De-noising based on higherorder statistics for transient detection”, Signal Processing, 81(2001) 1909-1926 [6] Christopher Delfs and Frederich Jondral, “Classification of Transient time-varying signals using DFT and Wavepacket based methods”, ICASSP 98, Vol.3 [7] Ingrid Daubechies and Wim Sweldons, “Factoring Wavelet Transforms into Lifting Steps”, “Wavelets in Geosciences”, Roland Klees and Roger Haagmans (Eds) [8] Hongyu Liao, Mrinal Kr. Mandal and Bruce.F.Cockburn, “Efficient architectures for Lifting Based WT”, IEEE,Tr. Signal Processing, 2004, May, Vol.52, No.5 [9] Pei-Yin Chen and Shung-Chih Chen, “An efficient VLSI Architecture of 1-D Lifting DWT”, IEICE Tr. Electronics, Vol.E87-C, No.11, Nov 2004 [10] H Olkkonen,J T Olkkonen and P Pesola, “Efficient Lifting Wavelet Transform for

Microprocesor and VLSI Applications”, IEEE Signal Processing Letters, Vol.12, No.2, Feb 2005 [11] Douglas.A.Abraham, “Analysis of a signal starting time estimator based on the Page Test Statistic”, IEEE Tr, on AES, Oct 1997, pp 12251229

Biographies Roshen Jacob- Received BTech in Electrical Engineering (1979) from College of Engineering (Trivandrum), India, M.Tech in Digital Electronics (1987) from Cochin University of Science and Technology and pursuing Ph.D degree in the area of TimeFrequency analysis. For the last twenty years, she has been working in the Naval Physical and Oceanographical Laboratory, Kochi which is a premiere Laboratory of Defence Research and Development Organisation. Her field of interests includes Sonar Signal Processing, Time Frequency Analysis and Embedded Systems. She has published three Conference Papers and two Journal papers. She is a Fellow of IETE and IEI, India. List of her publications are given below 1) “Estimation of Chirp parameters using Fractional Fourier Transform” Proceedings of the National Symposium on Ocean Electronics, SYMPOL 2005, Dec 15-16 2) “Reverberation insensitive Waveforms for Active Sonars” NPOL Departmental Report No. NPOL-RR07/2007 3) “Fractional Fourier Transform based Matched Filter in Active Sonar” Sea Tech Journal (NPOL), Vol.3, No.1, June 2006 4) “Applications of Fractional Fourier Transform in Sonar Signal Processing” IETE Journal of Research (Under Review) 5) “Noise Normalization Schemes used for Signal Detection” Proceedings of the National Symposium on Ocean Electronics, SYMPOL 2005, Dec 15-16.

31

Digital Signal Processing Journal, ACSE-ISSN: 1687-4811, Volume 9, Issue 1, Delaware, USA, June 2009

Dr. Tessamma Thomas- Has 30 years of teaching experience (UG, PG & Ph.D) at the Department of Electronics in Cochin University of Science and Technology. She is currently Reader at the same Department. Graduated from Kerala University, India (1975) in Physics, Post Graduation in Physics (1977), Completed her M.Tech(1979) from Cochin University of Science and Technology in Electronics & Communications and Ph.D (1996) from Cochin University of Science and Technology in “A Modified Block Adaptive Predictive Coder for Speech Processing”. Her field of Research & teaching include Speech / Music Processing, Image Processing, Wavelet Transform Applications, Biomedical signal / Image processing, Artificial Neural Networks, Digital System Designs. She has more than forty National and International Journal and Conference Papers to her credit.

Dr. A UnnikrishnanGraduated from REC (Calicut), India in Electrical Engineering (1975), completed his M.Tech from IIT, Kanpur in Electrical Engineering (1978) and Ph.D from IISc, Bangalore in “Image Data Structures”(1988). Presently, he is The Associate Director Naval Physical and Oceanographical Laboratory, Kochi which is a premiere Laboratory of Defence Research and Development Organisation. His field of interests includes Sonar Signal Processing, Image Processing and Soft Computing. He has authored about fifty National and International Journal and Conference Papers. He is a Fellow of IETE & IEI, India.

32

Digital Signal Processing Journal, ACSE-ISSN: 1687-4811, Volume 9, Issue 1, Delaware, USA, June 2009

Modified Time Adaptive Wavelet Based Approach for Enhancing Speech from Adverse Noisy Environments

Sumithra M G, Thanuskodi K, Anitha M R
A. Professor, C. Lecturer, Bannari Amman Inst. of Technology, Tamil Nadu, India ([email protected], [email protected])
B. Principal, Coimbatore Inst. of Engg. & Information Tech., Tamil Nadu, India ([email protected])

Abstract A modified time adaptive wavelet based approach is proposed in this paper for enhancement of speech from adverse noise conditions by considering the musical noise encountered in most frequency domain speech enhancement algorithms. Enhancement of speech from adverse noisy environment is accomplished through time adaptation of discrete wavelet coefficients and the use of soft thresholding on wavelet co-efficients followed by post filtering to smooth the estimated speech. This approach is incorporated with time domain and frequency domain analysis. Results are measured objectively by Signal to Noise Ratio (SNR), Itakura-Saito (IS) distance, Minimum mean square error (MMSE) and subjectively by listening test (Mean Opinion Score) with TIMIT sentences corrupted by various types of colored noise. Visual inspection of spectrograms and power spectral density plots are also used to support the results. Keywords: Daubechies wavelets, Discrete wavelet transform, Time adaptation factor, Soft threshold, Band pass filter, Speech enhancement.

Introduction Speech enhancement methods can be used to increase the quality of the speech processing devices like mobile telephony, digital hearing aids and human-machine communication systems in our daily life and make them more robust under noisy conditions. Speech enhancement includes improving the speech quality, its intelligibility and reducing listener’s fatigue. The quality of speech signal is a subjective measure which reflects the way the signal is perceived by listeners. It can be expressed in terms of how pleasant the signal sounds are or how much effort is required to understand the message. Intelligibility, on the other hand is an objective measure of the amount of information that can be extracted by listeners from the given signal. Among various singlemicrophone algorithms for speech enhancement, the spectral subtraction has been mostly employed. Despite its capability of removing background

noise, spectral subtraction [1] introduces additional artifacts known as musical noise, and it faces difficulties in pause detection. This distortion is caused by inaccuracies in the short-time noise spectrum estimate. The spectrum of real world noise does not affect the speech signal uniformly over the entire spectrum. In our previous work, the fact that the background noise affects the speech spectrum differently at various frequencies is taken into account, and spectral subtraction is performed independently on each band by estimating the noise [2]. Other methods have focused on masking the musical noise using psychoacoustic models [3][4]. In recent years, several alternative approaches, such as signal subspace methods [5], have been proposed for enhancing degraded speech; in the subspace method, the estimation of the signal subspace dimension is difficult for unvoiced periods and transitional regions. Existing approaches to this task include traditional methods such as spectral subtraction [1] and Ephraim-Malah filtering [6]; a drawback of these techniques is the necessity to estimate the noise or the signal-to-noise ratio. This can be a strong limitation when recording in non-stationary noise and in situations where the noise cannot be estimated. For spectral subtraction, Wiener filtering and Ephraim-Malah filtering, the signal is divided into 25 ms windows with 12 ms overlap between frames; frequency analysis is done using a Hamming window, and noise estimation is accomplished using the first three frames of the signal. Wavelet-based techniques using coefficient thresholding [7] and adaptive thresholding [8] have also been applied to speech enhancement. Donoho [7] introduced wavelet thresholding (shrinking) as a powerful tool for denoising signals degraded by additive white noise, and more recently a number of attempts have been made to use perceptually motivated wavelet decompositions coupled with various thresholding and estimation methods. Although the application of wavelet shrinking to speech enhancement has been reported in the literature [9-11], there are many problems yet to be resolved for a successful application of the method to speech signals degraded by real environmental noise types. The results reported in the literature show lower SNR improvement and Mean Opinion Score (MOS).


The main objective of the proposed method is to improve on existing single-microphone schemes for an extended range of noise types and noise levels, thereby making the method more suitable for mobile speech communication applications than the existing ones. This algorithm introduces a speech enhancement system based on time adaptive wavelet denoising with post filtering. The performance of the proposed method was evaluated on several speakers and under various noise conditions, including white noise, pink noise, F-16 cockpit noise and car interior noise. Subjective experiments by means of a listening test show that a system based on this method gives a significant improvement over state-of-the-art speech enhancement systems. The results of the proposed method show that it is well suited to adverse noise conditions and yields better spectral performance, which is a very important characteristic for speech recognition or speaker verification.
This paper proposes a time adaptive wavelet based speech enhancement method for different noisy environments. Section 1 presents a survey of the related work. Section 2 introduces background about wavelets. Section 3 gives the steps for implementation. Section 4 explains the experimental results and discussion. Finally, section 5 summarizes the proposed research work.

2. Background – Wavelet Analysis
The wavelet transform has been intensively used in various fields of signal processing. It has the advantage of using variable size time-windows for different frequency bands. This results in a high frequency-resolution (and low time-resolution) in low bands and a low frequency-resolution in high bands. Consequently, the wavelet transform is a powerful tool for modeling non-stationary signals like speech that exhibit slow temporal variations at low frequency and abrupt temporal changes at high frequency. Moreover, when one is restricted to use only one (noisy) signal (as in single-microphone speech enhancement), the use of sub band processing can generally result in a better performance. Therefore, the wavelet transform provides an appropriate model for speech signal denoising applications. In the present work, the computation of the Discrete Wavelet Transform (DWT) provides sufficient information for both analysis and synthesis of the original signal, with a significant reduction in computation time. The DWT is considerably easier to implement, without needing to perform numerical integration as in the Continuous Wavelet Transform (CWT).
DWT Computation: The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detail information, as shown in Figure 1. The DWT employs two sets of functions, called scaling functions and wavelet functions, which are associated with low pass and high pass filters, respectively. The decomposition of the signal into different frequency bands is simply obtained by successive high pass and low pass filtering of the time domain signal, as shown in Figure 1. The original signal x[n] is first passed through a half band high pass filter g[n] and a low pass filter h[n]. The signal can then be sub sampled by 2, simply by discarding every other sample. This constitutes one level of decomposition. The proposed method of analysis goes up to a 4th level decomposition using the Daubechies db4 wavelet, one of the compactly supported orthonormal wavelets introduced by Ingrid Daubechies [12].

Figure.1. DWT computation

Proposed Scheme using Time Adaptive Discrete Wavelet Transform: A noisy speech signal can be modeled as the sum of clean speech and additive background noise. If the signal includes ambient noise, the result is the additive signal model

x = y + n    (1)

where x is the noisy signal, y is the clean speech and n is the additive noise component, so that

X = Y + N    (2)

where X = Wx, Y = Wy and N = Wn in the wavelet domain [12]. The matrix notation represents the coefficients across each scale and time. A block diagram of the proposed approach is shown in Figure 2. First, the DWT of the noisy speech is taken; then the time adaptive nature is captured by calculating a time varying linear factor T(a, τ) for each scale (a = 2^m) and time (τ = n·2^m) using equation (3). This factor affects only the duration of the amplitude envelope of the wavelet, not its frequency:

T(a, τ + Δτ) = [ (1 − Cs / (Cs + |X_TADWT(a, τ)|)) · (1 + |∂X_TADWT(a, τ)/∂t|) ]^(-1)    (3)

For implementation based on Yao and Zhang’s work [13] for cochlear implant coding, coefficients at 22 scales, m=7,8,…28 are calculated using numerical integration


of the CWT. These 22 scales correspond to center frequencies logarithmically spaced from 225 Hz to 5300 Hz and are considered in this method. Cs = 0.8 is a constant representing non-linear saturation effects in the cochlear model [14]. Since the primary adaptation mechanism involves variation of the wavelet time support, the impact of the initial time support was assessed by turning the adaptation mechanism off (T(a, τ) = 1). The resulting time adaptive wavelet transform coefficients X_TADWT(a, τ) are calculated as the product of the DWT coefficients X_DWT(a, τ) with a time constant K(a, τ), and the result is substituted in equation (3) for the time adaptation mechanism. From the reported analysis [8],

X_TADWT(a, τ) = K(a, τ) · X_DWT(a, τ)    (4)

with K(a, τ) = (1/C) · Π / (1 + T²(a, τ)), where C0 + C1 + C2 + C3 = C = 2 is a normalizing constant. Normalization is obtained via equation (5).

Figure 2. Proposed scheme: noisy speech x(n) → Discrete Wavelet Transform X_DWT(a, τ) → time adaptation X_TADWT = K(a, τ)·X_DWT(a, τ) → threshold value determination Th = σ(2 log N)^(1/2) → soft thresholding Y'_TADWT(a, τ) = Thresh{X_TADWT(a, τ)} → revert from the adaptive transform Y'_DWT(a, τ) = [1/K(a, τ)]·Y'_TADWT(a, τ) → inverse DWT → band pass filter → estimated speech y'(n)

C0 = (1 + √3)/4,  C1 = (3 + √3)/4,  C2 = (3 − √3)/4,  C3 = (1 − √3)/4    (5)

Then the h(k) = Ck/√2 coefficients for the db4 wavelet are as follows [12]:

h(0) = (1 + √3)/(4√2),  h(1) = (3 + √3)/(4√2),  h(2) = (3 − √3)/(4√2),  h(3) = (1 − √3)/(4√2)    (6)

Since this is a discrete wavelet, the computational method requires no integration and is more efficient. Removing noise components by thresholding the wavelet coefficients is based on the observation that in many signals (like speech) the energy is mostly concentrated in a small number of wavelet dimensions. The coefficients of these dimensions are relatively large compared to other dimensions or to any other signal (specifically noise) that has its energy spread over a large number of coefficients. Hence, by setting the smaller coefficients to zero, one can nearly optimally eliminate the noise while preserving the important information of the original signal. In the wavelet representation, noise tends to be characterized by smaller coefficients across time and scale, while signal energy is concentrated in larger coefficients. This offers the possibility of using a threshold to separate the signal from the noise.

Denoising by Thresholding: In the literature there are two types of thresholding [11] applicable to speech processing: hard thresholding and soft thresholding. In hard thresholding, all coefficients below a predefined threshold value are set to zero; in soft thresholding, in addition, the remaining coefficients are linearly reduced in value. In this work soft thresholding is applied, since noise components may be present at both high and low frequencies. The universal threshold is the most common method, typically implemented with a soft-threshold function. An estimate of the noise level is usually calculated during the initial silence period in most popular enhancement algorithms. Specifically, in this work, the variance of the noisy coefficients in each wavelet band is calculated using equation (7). In practical situations, colored noise is encountered rather than white noise. Assuming zero-mean Gaussian noise, the coefficients will be Gaussian random variables of zero mean and variance σ², and the standard deviation σ is estimated by

σ = (1 / 0.6745) · Median(|ci|)    (7)

where the ci are the high frequency wavelet coefficients at the first decomposition level, which are used to identify the noise components. The set of standard deviation values can now be used as the "noise profile" for threshold setting [13]. This noise profile estimation enables the algorithm to cope with colored noise. Thresholding can be done across all wavelet decomposition levels, referred to as level-dependent thresholding, which is applied in this work. The threshold value [11] can be determined by


Th = σ (2 log N)^1/2    (8)

where Th is the threshold value and N is the length of the noisy signal x. The soft-thresholding rule is

Y'_TADWT(a,τ) = sgn(X_TADWT) · ( |X_TADWT| − Th )  if |X_TADWT| > Th,  and  0  if |X_TADWT| ≤ Th    (9)

where Y'_TADWT(a,τ) is the thresholded time-adaptive wavelet coefficient of the estimated speech signal, X_TADWT(a,τ) is the time-adaptive wavelet coefficient of the noisy speech signal, and 'a' is the scale factor of the wavelet, which is taken as constant.
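As a concrete illustration of equations (7)-(9), the following Python sketch (an assumption of this edit; the paper's implementation used the MATLAB 7 wavelet toolbox) estimates the noise level from detail coefficients, forms the universal threshold and applies the soft-thresholding rule. The natural logarithm is assumed for log(N).

```python
import numpy as np

def noise_sigma(detail_coeffs):
    """Eq. (7): robust noise estimate from (first-level) detail coefficients."""
    return np.median(np.abs(detail_coeffs)) / 0.6745

def universal_threshold(sigma, n):
    """Eq. (8): Th = sigma * sqrt(2 * log(N)); natural log assumed."""
    return sigma * np.sqrt(2.0 * np.log(n))

def soft_threshold(coeffs, th):
    """Eq. (9): zero coefficients below Th and shrink the rest towards zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - th, 0.0)
```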

3. Implementation Details
The proposed method is implemented in the following steps (a sketch of these steps is given below):
Step 1) Compute the DWT of the noisy speech signal.
Step 2) Compute the time-adaptation factor and multiply it with the discrete wavelet coefficients using equation (4).
Step 3) Estimate the noise using equations (7) and (8), then apply the soft-thresholding technique to the wavelet coefficients using equation (9).
Step 4) Take the inverse time-adaptive discrete wavelet transform by dividing the coefficients by the adaptation factor, which yields enhanced speech with reduced noise components.
Step 5) Band-pass (post) filtering (100 Hz - 3.8 kHz) of the ITADWT signal components yields the enhanced speech with reduced noise components.
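A minimal end-to-end sketch of Steps 1-5, assuming Python with NumPy, PyWavelets and SciPy rather than the MATLAB toolbox used by the authors. The adaptation factors K(a,τ) are passed in as precomputed per-band arrays (defaulted to 1, i.e. adaptation switched off), so the sketch only illustrates the data flow, not the cochlear-model adaptation itself; the per-band noise estimate is one reading of the level-dependent thresholding described above.

```python
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

def enhance(x, fs, wavelet="db4", level=5, K=None):
    """Steps 1-5: DWT, time adaptation, level-dependent soft thresholding,
    inverse adaptation/DWT, and 100 Hz - 3.8 kHz post band-pass filtering."""
    coeffs = pywt.wavedec(x, wavelet, level=level)              # Step 1
    if K is None:                                               # adaptation off (T = 1)
        K = [np.ones_like(c) for c in coeffs]
    adapted = [k * c for k, c in zip(K, coeffs)]                # Step 2, Eq. (4)
    n = len(x)
    out = [adapted[0]]                                          # keep approximation band
    for band in adapted[1:]:                                    # Step 3, Eqs. (7)-(9)
        sigma = np.median(np.abs(band)) / 0.6745
        th = sigma * np.sqrt(2.0 * np.log(n))
        out.append(np.sign(band) * np.maximum(np.abs(band) - th, 0.0))
    restored = [o / k for o, k in zip(out, K)]                  # Step 4: divide by K
    y = pywt.waverec(restored, wavelet)[:n]
    b, a = butter(4, [100.0, 3800.0], btype="bandpass", fs=fs)  # Step 5: post filter
    return filtfilt(b, a, y)
```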

4. Experimental Results and Discussions
The proposed scheme is implemented using the MATLAB 7 wavelet toolbox, and the obtained results are evaluated using the COLEA speech analysis tool. Objective and subjective tests were conducted to evaluate the quality and intelligibility of the proposed method. Objective measures provide a mathematical comparison of the original and processed speech signals. The quality of a speech signal is a subjective measure which reflects the way the signal is perceived by listeners; it can be expressed in terms of how pleasant the signal sounds or how much effort is required to understand the message. Ten sentences from the TIMIT [16] database, produced by five male and five female speakers, were used in this method. Two sets of additive-noise experiments were carried out on this data. In the first, white Gaussian noise at SNR levels of -10, -5, 0, +5 and +10 dB was added to the sentences using the Speech demo software. In the second, specific noise types including F-16 cockpit noise, car noise and pink noise at 0 dB SNR were considered, to evaluate how well the methods work

with non-white and relatively non-stationary noise sources. The signal-to-noise ratio (SNR), the Itakura-Saito (IS) distance and the MMSE are used as objective measurement criteria for both sets of experiments. To evaluate the effectiveness of the proposed method for the enhancement of speech signals, we compare it with other standard approaches on this task: spectral subtraction, Wiener filtering, Ephraim Malah filtering and the bionic wavelet transform (BWT). Coefficient thresholding for the BWT [13] was done using a level-independent soft-thresholding function, whereas the proposed work uses a level-dependent soft-thresholding function.

Signal-to-Noise Ratio (SNR): The global SNR values are determined by

SNR_dB = 10 log10 [ Σn y(n)² / Σn ( y(n) − ŷ(n) )² ]    (10)

where y(n) is the clean speech and ŷ(n) is the estimated speech. If the summation is performed over the whole signal length, the result is called the global SNR.

Minimum Mean Square Error: The mean square error (MSE) is defined as the average power of the difference between the enhanced speech and the clean speech. It is obtained by [9]

r̂ = E{ [ y(n) − ŷ(n) ]² }    (11)

The objective of any speech-enhancement system is to minimize this MSE. For the proposed method the calculated MMSE lies between 0.07 and 0.5.

Itakura-Saito (IS) distance: The IS distance is a meaningful measure of performance when two waveforms differ in their phase spectra:

d(a, b) = (a − b)^T R (a − b) / (a^T R a)    (12)

where 'a' is the vector of prediction coefficients of the clean speech signal, R is the (Toeplitz) autocorrelation matrix of the clean speech signal, and 'b' is the vector of prediction coefficients of the enhanced signal. Many reported experiments confirm that two spectra are perceptually nearly identical if the distance lies between 1 and 10, with lower values indicating a smaller distance and better speech quality. For the proposed method the difference in the phase spectra of the enhanced and clean speech lies between 0.21 and 1.2.
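A short sketch of the objective measures in equations (10)-(12), again in Python/NumPy as an illustrative assumption; the LPC order and the way the prediction-coefficient vectors a (clean) and b (enhanced) are obtained are left to the caller.

```python
import numpy as np
from scipy.linalg import toeplitz

def global_snr_db(clean, estimate):
    """Eq. (10): global SNR computed over the whole signal length."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def mse(clean, estimate):
    """Eq. (11): average power of the enhancement error."""
    return np.mean((clean - estimate) ** 2)

def is_distance(a, b, clean):
    """Eq. (12): distance between LPC vectors a and b, using the Toeplitz
    autocorrelation matrix R of the clean speech signal."""
    r = np.array([np.dot(clean[: len(clean) - k], clean[k:]) for k in range(len(a))])
    R = toeplitz(r)
    d = a - b
    return float(d @ R @ d) / float(a @ R @ a)
```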


AWGN noise conditions across a range of SNR values: SNR results for the white-noise experiment are shown in figure 3. The methods compared include Ephraim Malah filtering, iterative Wiener filtering, spectral subtraction, BWT denoising and the proposed time-adaptive wavelet-based method. From the figure, the proposed method, BWT denoising and Ephraim Malah filtering clearly have the best performance for this noise condition compared with iterative Wiener filtering and spectral subtraction. The output SNR of the proposed method, BWT denoising and Ephraim Malah filtering shows a linear SNR improvement across the different noise levels.

Figure 3. SNR results for the white noise case at -10, -5, 0, +5, +10 dB

The proposed method gave about 14 dB improvement at the lower input SNRs, decreasing to about 9 dB improvement at the higher input SNRs. Similarly, BWT denoising and Ephraim Malah filtering gave about 12 dB improvement at the lower input SNRs, decreasing to about 3 dB and 4 dB improvement respectively at the higher input SNRs.

Realistic noise conditions at 0 dB SNR: SNR improvements for the various realistic noise conditions at 0 dB SNR are shown in figure 4. Here the results are given as net improvement, so that the relative effectiveness can be seen for all four noise conditions as a function of the enhancement method.

Figure 4. SNR comparisons for varying noise conditions at 0 dB SNR

The proposed method substantially outperforms the other methods in nearly all cases, but is only competitive with Ephraim Malah filtering for the car noise condition. From the chart, the proposed method has a 50% improvement over EM filtering for white noise, 44% for pink noise and 47% for F-16 cockpit noise. For the car noise environment, the proposed method performs about 30% worse than EM filtering.

Mean opinion score results across all test conditions: A subjective perceptual measure called the Mean Opinion Score (MOS) is computed by having a group of listeners rate the quality of the speech on a five-point scale and then averaging the results. For these tests twenty subjects were surveyed and asked to rate the sentences for quality on a standard MOS scale, where 5 indicates excellent quality with imperceptible distortion and 1 indicates unsatisfactory quality with annoying and objectionable distortion; the results are shown in figure 5. The net result is that Ephraim Malah filtering was second to the proposed method in overall opinion score, followed by iterative Wiener filtering, BWT and spectral subtraction.

Figure 5. MOS comparisons for varying noise conditions at 0 dB SNR

It can be seen from the chart that the proposed method performs best for the F-16 cockpit, pink and white noise cases, and poorly for the car noise case. The power spectra of the clean and enhanced speech (normalized frequency) for pink noise at 0 dB SNR, estimated with a Hamming windowing technique, are shown in figure 6.

Figure 6. Power spectrum of clean speech and enhanced speech from pink noise at 0 dB SNR

From this plot it can be seen that the spectra of the clean and the enhanced speech are nearly identical within the pass-band frequency range of the post filter. Figure 7 shows the time-domain representation of the noisy, clean and enhanced speech without and with post filtering. The signal level of the enhanced speech is slightly increased in figure 7(d) because of the gain factor of the post filter, which does not affect the quality of the speech. Speech spectrograms (showing the energy in the signal at each frequency and each time) of the enhanced


and clean speech are shown in figure 8. The intensity variations in the enhanced speech are due to the increase in signal strength. Owing to the band-pass post filtering, there is no obvious difference between the clean and estimated speech in the frequency range 100 Hz - 3.8 kHz. However, even after enhancement the proposed method is somewhat less efficient in the frequency domain because of residual unwanted noise components, indicated by the arrow mark in figure 8(c). With reference to the listening-test results, these limitations do not strongly degrade the intelligibility of the enhanced speech.

5. Conclusions
The proposed time-adaptive wavelet-based speech enhancement method is based on time adaptation and basic soft thresholding followed by a post-filtering stage. It has been shown that it can considerably enhance noisy speech corrupted by white and coloured noises. The ability of the proposed system to extract clear and intelligible speech from various adverse noisy environments, in comparison with other well-known methods, has been demonstrated through both objective and subjective measurements. The quality and intelligibility tests proved that the enhanced speech and the clean speech show good similarity in both the time- and frequency-domain analyses. Besides its strong performance in the additive white noise case, the proposed method also performs well in realistic noise environments such as F-16 cockpit and car noise. We conclude that the proposed method gives two important results: (1) the method is well suited to enhancing speech even under very strong noise conditions, since it performs better than the existing algorithms; (2) with the level-dependent thresholding, supported by informal subjective listening tests, the proposed method yields better spectral performance. The limitations of the proposed method are: (1) its performance is less efficient for the car noise and multi-talker (babble) noise cases; (2) the results show that beyond the pass-band frequency range unwanted noise components are introduced; (3) the absence of voiced/unvoiced classification makes the algorithm inefficient for highly non-stationary noisy environments. Future work on this approach will include V/UV classification and modified thresholding techniques with adaptive filtering for other noise cases such as street, helicopter and train noise.

6. Acknowledgements We gratefully acknowledge the cooperation of the people who participated in the subjective test. We would like to express our sincere thanks to the management of Bannari Amman Institute of

Technology, Sathyamangalam, India, which provided the facilities for this research.

7. References
[1] S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. Acoustics, Speech, Signal Processing, vol. 27, pp. 113-120, April 1979.
[2] M.G. Sumithra, D. Deepa and K. Thanuskodi, "Frequency Dependent Single Channel Speech Enhancement from Additive Background Noise", Proceedings of the International Conference on Advanced Communication Systems, pp. 85-90, 2007.
[3] D.E. Tsoukalas, J.N. Mourjopoulos and G. Kokkinakis, "Speech enhancement based on audible noise suppression", IEEE Trans. Speech Audio Processing, vol. 5, pp. 479-514, Nov. 1997.
[4] N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system," IEEE Trans. Speech Audio Processing, vol. 7, pp. 126-137, Mar. 1999.
[5] Y. Ephraim and H.L. Van Trees, "A signal subspace approach for speech enhancement", IEEE Trans. Speech and Audio Processing, vol. 3, no. 4, pp. 251-265, Sep. 1995.
[6] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator", IEEE Trans. Acoust. Speech Signal Processing, ASSP-32(6), pp. 1109-1121, 1984.
[7] D.L. Donoho, "De-noising by soft thresholding", IEEE Trans. on Information Theory, vol. 41, no. 3, pp. 613-627, May 1995.
[8] Michael T. Johnson, Xiaolong Yuan, Yao Ren, "Speech signal enhancement through adaptive wavelet thresholding", Speech Communication, 49(10), pp. 123-133, 2007.
[9] W. Seok and K.S. Bae, "Speech enhancement with reduction of noise components in the wavelet domain", in Proceedings of ICASSP, pp. 1323-1326, 1997.
[10] E. Ambikairajah, G. Tattersall and A. Davis, "Wavelet transform based speech enhancement", in Proceedings of ICSLP, 1998.
[11] Yasser Ghanbari, Mohammad Reza Karami Mollaei, "A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets", Speech Communication 48, pp. 927-940, 2006.
[12] K.P. Soman, K.I. Ramachandran, "Insight into Wavelets: From Theory to Practice", Prentice-Hall of India Private Ltd, 2nd edition, 2006.
[13] Yao, J., Zhang, Y.T., "The application of bionic wavelet transform to speech signal processing in cochlear implants using neural network simulations", IEEE Trans. Biomed. Eng. 49(11), pp. 1299-1309, 2002.


[14] Yao, J., "An active model for otoacoustic emissions and its application to time-frequency signal processing", Ph.D. thesis, The Chinese University of Hong Kong, Hong Kong, 2001.
[15] Hamid Sheikhzadeh and Hamid Reza Abutalebi, "An improved wavelet-based speech enhancement system", in Proc. of Eurospeech, 2001.
[16] J.S. Garofolo, "Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database", NIST speech disc 1-1.1, Oct. 1990.

Figure 7. Time domain representation: (a) noisy speech with pink noise (0 dB SNR), (b) clean speech (male, TIMIT database), (c) enhanced speech without post filtering, (d) enhanced speech with post filtering

Figure 8. Speech spectrograms: (a) noisy speech, (b) clean speech, (c) estimated speech (proposed method)


Biographies

Mrs. M.G. Sumithra obtained the B.E. (Electronics and Communication Engg.) degree from Govt. College of Engineering, Salem, India in 1994 and the M.E. (Medical Electronics) degree from College of Engineering, Anna University, Chennai, India in 2001. She is currently a Professor at Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India, and is pursuing her research in speech processing. Her areas of interest are signal processing and speech communications.

Dr. K. Thanushkodi, born in Theni District, Tamil Nadu State, India in 1948, received the B.E. in Electrical and Electronics Engineering from Madras University, Chennai, the M.Sc. (Engg) from Madras University, Chennai, and the Ph.D. in Electrical and Electronics Engineering from Bharathiar University, Coimbatore, in 1972, 1976 and 1991 respectively. His research interests lie in the areas of computer modeling and simulation, computer networking, signal processing and power systems. He has published 35 technical papers in national and international journals.

Ms. M.R. Anitha obtained the B.E. (Electronics and Communication Engg.) degree from Velalar College of Engg. and Tech., Erode, India in 2002 and the M.E. (Communication Systems) degree from Bannari Amman Institute of Technology, Sathyamangalam, India in June 2007. She has been working as a Lecturer at Bannari Amman Institute of Technology, Sathyamangalam, India since 2007. Her areas of interest are digital signal processing and wireless communication systems.


A Zero Voltage Transition Synchronous Buck Converter with an Active Auxiliary Circuit
A.K. Panda1, Hari N. Pratihari2, Bibhu Prasad Panigrahi3, L. Moharana4
1) Dept. of EE, NIT, Rourkela, www.nitrkl.ac.in 2) Dept. of ETC, MIET, Bhubaneswar, 3) Dept. of EE, IGIT, Sarang, 4) Dept. of Physics, Utkal University, Bhubaneswar.
[email protected], [email protected], [email protected], [email protected]

Abstract
An improved active auxiliary circuit that allows the power switch of a pulse-width-modulated synchronous buck converter to operate with zero-voltage switching is proposed in this paper. The proposed zero-voltage transition (ZVT) PWM synchronous buck converter is designed to operate at the low output voltage and high efficiency typically required for portable systems. To make the DC-DC converter efficient at low voltage, a synchronous converter is an obvious choice because of the lower conduction loss compared with a diode. The main feature of the auxiliary circuit is that the auxiliary switch can operate with zero-current switching at turn-on and turn-off without increasing the peak current stress of the main switch. Additionally, the resonant auxiliary circuit is itself free of switching losses. An analytical study of the proposed converter with the auxiliary circuit is presented in detail, and general guidelines for its design and implementation are given. The analysis has been verified with simulation and experimental results. The suggested procedure ensures an efficient converter.

Keywords: Zero voltage transition, Zero voltage switching, Zero current switching, active auxiliary circuit, resonant circuit

1. Introduction
The next generation of portable products, such as personal communicators and digital assistants, demands improvements in dc-dc converter topology in order to increase battery lifetime and enable smaller, cheaper systems. Since many portable devices operate in low-power standby modes for the majority of the time they are on, increasing light-load converter efficiency can significantly increase battery lifetime. A key element in this task, especially at the low output voltages that future microprocessor and memory chips will need, is the synchronous rectifier. The synchronous rectifier buck converter is popular for low-voltage power conversion because of its high efficiency and reduced area consumption [3], [9], [12], [21], and [26]-[27]. A synchronous rectifier is an electronic switch that improves power-conversion efficiency by placing a low-resistance conduction path across the diode rectifier in a switch-mode regulator; MOSFETs usually serve this purpose. However, higher input voltages and lower output voltages have led to very low duty cycles, increasing switching losses and decreasing conversion efficiency. In this paper we therefore optimize the efficiency of the synchronous buck converter by eliminating switching losses using a soft-switching technique.

The voltage-mode soft-switching method that has attracted most interest in recent years is the zero voltage transition [1], [2], [4]-[8], [10], [11], [13]-[21], [23]-[27], [29], because of its low additional conduction losses and because its operation is closest to that of the PWM converters. The auxiliary circuit of a ZVT converter is activated just before the main switch is turned on and ceases to operate after the transition is accomplished. The auxiliary circuit components have lower ratings than those in the main power circuit because the auxiliary circuit is active for only a fraction of the switching cycle; this allows a device that can turn on with fewer switching losses than the main switch to be used as the auxiliary switch. The improvement in efficiency provided by the auxiliary circuit is mainly due to the difference between the switching losses of the auxiliary switch and those the main power switch would incur if it operated without the help of the auxiliary circuit.

Previously proposed ZVT-PWM converters have at least one of the following key drawbacks. 1) The auxiliary switch is turned off while it is conducting current. This causes switching losses and EMI that offset the benefits of using the auxiliary circuit; in converters such as those proposed in [2], [10], [14] and [15] the turn-off is very hard. 2) The auxiliary circuit causes the main converter switch to operate with a higher peak current stress and with more circulating current. This results in the need for a higher current-rated device for the main switch and an increase in conduction losses; in the converters proposed in [3], [6], [8], [11], [12] and [16] the current stresses on the main switch are very high. 3) The auxiliary circuit components have high voltage and/or current stresses, as in the converters proposed in [1], [5], [6], [13] and [16]. The converters proposed in [25] and [26] reduce the current stress on the main switch, but the


circuit is very complex. 4) An improved ZVT boost converter with a coupled inductor and zener diode is proposed in [19], and the switching losses are reduced; however, the introduction of the coupled inductor and zener diode makes the circuit complex, with ten modes of operation. 5) The ZVT technique has been implemented in a single-phase active power-factor-correction circuit [24], but results are presented only at 40 kHz, which cannot be regarded as very high, so high power density may not be achieved. 6) Simulation results for a ZVT synchronous buck converter are presented in [27], but experimental results are not given. 7) A GA-tuned dc-dc converter is implemented in [28], but it is not a soft-switching converter and so is not suitable for high-frequency operation. 8) Only simulated results are presented in [29], and the problems of voltage and current stresses were not discussed.

Reducing the switching losses of a low-power circuit such as the synchronous buck converter has not, to the authors' knowledge, been addressed in the literature [1]-[24]. The converter shown in figure 1 is designed for a low-voltage, high-current application and is found to be highly efficient. Hence, this paper presents a new class of ZVT synchronous buck converter. By using a resonant auxiliary network in parallel with the main switch, the proposed converter achieves zero-voltage switching for the main switch and the synchronous switch, and zero-current switching for the auxiliary switch, without increasing their voltage and current stresses.

The paper is organized as follows: the next section gives a short description of the proposed circuit, followed by a review of the various modes of operation with their key waveforms and the corresponding equivalent circuits. Section 3 presents the design considerations and section 4 summarizes the basic features of the converter. Section 5 gives simulation and experimental results that illustrate the features of the proposed converter, and section 6 draws some conclusions.

2. Operation Principles and Analysis

A. Definitions and Assumptions
The circuit scheme of the proposed new ZVT synchronous buck converter is shown in figure 1. The auxiliary circuit consists of the switch S1, the resonant capacitor Cr and the resonant inductor Lr. The auxiliary circuit operates only during a short switching-transition time to create the ZVS condition for the main switch. The body diode of the main switch is also utilized in the converter. A high-frequency schottky diode DS is used to discharge the capacitor voltage to the output, which happens before the turn-on of the synchronous switch.

Figure 1. The proposed converter

During one switching cycle, the following assumptions are made in order to simplify the steady-state analysis of the circuit shown in figure 1:
1. The input voltage Vi is constant.
2. The output voltage V0 is constant, or the output capacitor C0 is large enough.
3. The output current I0 is constant, or the output inductor L0 is large enough.
4. The output inductor L0 is much larger than the resonant-circuit inductor Lr.
5. The resonant circuits are ideal.
6. The semiconductor devices are ideal.
7. The reverse recovery time of all diodes is ignored.

B. Modes of Operation
Eight stages take place in the steady-state operation of the proposed converter during one switching cycle. The key waveforms of these stages are given in figure 2 and the equivalent circuits of the operating stages are given in figure 3. A detailed analysis of every stage is presented below.

Mode 1 (t0, t1): Prior to t = t0, the body diode of S2 was conducting; the main switch S and the auxiliary switch S1 are turned off. At t0, the auxiliary switch S1 is turned on, which realizes zero-current turn-on since it is in series with the resonant inductor Lr. The current through the resonant inductor Lr and resonant capacitor Cr rises at the same rate as the current iS2 falls, so that

iS2(t) = I0 − iLr(t)    (1)

Resonance occurs between Lr and Cr during this mode, with resonant frequency and characteristic impedance

ω0 = 1/√(Lr·Cr),  Z0 = √(Lr/Cr)    (2)

The mode ends at t = t1, when iLr reaches I0 and iS2 falls to zero, as a result of which the body diode of S2 stops conducting. The remaining voltage and current expressions governing this mode are given by Eqs. (3)-(5).

Mode 2 (t1, t2): Lr and Cr continue to resonate. At t1, the synchronous switch S2 is turned on under ZVS. This mode is ended by turning off the switch S2 under ZVS when the current iLr reaches its maximum value iLrmax. The voltage and current expressions for this mode are given by Eqs. (6)-(10).

Mode 3 (t2, t3): At t2, iLr reaches its peak value iLrmax. Since iLr is larger than the load current I0, the capacitor CS is charged and discharged through the body diode of the main switch S, which leads to conduction of the body diode. This mode ends when the resonant current iLr falls to the load current I0, so that the current through the body diode of the main switch S becomes zero and the body diode turns off. At the same time the main switch S is turned on under ZVS. The voltage and current expressions for this mode are given by Eqs. (11)-(14).

Mode 4 (t3, t4): At t3, the main switch is turned on with ZVS. During this stage the growth rate of iS is determined by the resonance between Lr and Cr. The resonant process continues in this mode and the current iLr continues to decrease. This mode ends when iLr falls to zero and S1 can be turned off with ZCS. The voltage and current expressions for this mode are given by Eqs. (15)-(18).

Figure 2. Key theoretical waveforms of the operation stages in the proposed converter

Figure 3. Modes of operation

Mode 5 (t4, t5): At t4, the auxiliary switch S1 is turned off with ZCS. The body diode of S1 begins to conduct because the resonant capacitor Cr starts to discharge. The resonant current iLr rises in the reverse direction, reaches a negative maximum and returns to zero. At this moment the body diode of S1 turns off and the mode ends. The voltage and current equations for this mode are given by Eqs. (19)-(22).

Mode 6 (t5, t6): Since the body diode of S1 turned off at t5, only the main switch S now carries the load current. There is no resonance in this mode and the circuit operation is identical to that of a conventional PWM buck converter. The voltage and current equations for this mode are given by Eqs. (23)-(25).

Mode 7 (t6, t7): At t6, the main switch S is turned off with ZVS and the schottky diode DS starts conducting. The resonant energy stored in the capacitor Cr is discharged to the load through the high-frequency schottky diode DS for a very short period of time; hence the body-diode conduction losses and the drop in output voltage are very low. This mode finishes when Cr is fully discharged. The equations that define this mode are given by Eqs. (26)-(28).

Mode 8 (t7, t8): At t7, the body diode of switch S2 turns on as soon as Cr is fully discharged and the schottky diode is turned off under ZVS. The dead-time loss is negligibly small compared with the conventional synchronous buck converter. During this mode the converter operates like a conventional PWM buck converter until the switch S1 is turned on in the next switching cycle. The equation that defines this mode is given by Eq. (29).

3. Design Procedure
The design of conventional PWM converters has been well covered in the literature, so it is more significant to focus on the design procedure of the auxiliary circuit. The resonant inductor, the resonant capacitor and the delay time of the auxiliary switch are the most important quantities when designing the auxiliary circuit. The proposed auxiliary resonant circuit provides soft-switching conditions for the main transistor. The following design procedure is developed considering procedures such as those presented previously [20].

A. Delay Time
The on time of the auxiliary switch (S1) must be shorter than one tenth of the switching period:

TD = (1/10) · TS    (30)

B. Current Stress Factor (a)
The current stress factor of the auxiliary switch is defined as

a = ILrmax / Iin(max)    (31)

It is greater than one (1 ≤ a ≤ 1.5) and should be as small as possible. This factor can be used for the selection of the auxiliary switch.

C. Resonant Capacitor (Cr)
The resonant capacitor can be expressed as

Cr = (a − 1)² · Iin(max) · TD / ( Vo · [1 + (π/2)·(a − 1)] )    (32)

D. Resonant Inductor (Lr)
The resonant inductor is given by

Lr = Vo · TD / ( Iin(max) · [1 + (π/2)·(a − 1)] )    (33)

E. MOSFET Selection
A method of choosing the MOSFETs for the converter is to compare the power dissipation values of a number of different MOSFET types. Usually, a MOSFET with a low on-state drain resistance is chosen for the synchronous rectifier, and a MOSFET with a low gate charge is chosen for the switches.
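As an illustration of the design relations (30)-(33), the following Python sketch (an assumption of this edit; the authors used PSIM rather than code) evaluates the auxiliary-circuit values from the specifications quoted later in Section 5 (Vo = 3.3 V, Io = 10 A, fs = 200 kHz). The current stress factor a and the choice of Iin(max) as the full load current are example assumptions, so the values obtained need not coincide exactly with the components listed in Table 1.

```python
import math

# Operating point from Section 5; a and Iin_max are illustrative assumptions.
fs = 200e3                 # switching frequency [Hz]
Ts = 1.0 / fs              # switching period [s]
Vo = 3.3                   # output voltage [V]
Iin_max = 10.0             # maximum current handled by the main switch [A] (assumed)
a = 1.3                    # current stress factor, chosen within 1..1.5 (assumed)

Td = Ts / 10.0                                                        # Eq. (30)
k = 1.0 + (math.pi / 2.0) * (a - 1.0)                                 # common factor
Cr = (a - 1.0) ** 2 * Iin_max * Td / (Vo * k)                         # Eq. (32)
Lr = Vo * Td / (Iin_max * k)                                          # Eq. (33)

f0 = 1.0 / (2.0 * math.pi * math.sqrt(Lr * Cr))                       # resonant frequency
Z0 = math.sqrt(Lr / Cr)                                               # characteristic impedance
ILr_max = a * Iin_max                                                 # Eq. (31) rearranged

print(f"Td = {Td * 1e6:.2f} us, Lr = {Lr * 1e9:.0f} nH, Cr = {Cr * 1e9:.1f} nF")
print(f"f0 = {f0 / 1e3:.0f} kHz, Z0 = {Z0:.2f} ohm, ILr_max = {ILr_max:.1f} A")
```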

4. Basic Converter Features
The features of the proposed soft-switching converter are briefly summarized as follows.
1. All of the active and passive semiconductor devices are turned on and off under exact ZVS and/or ZCS.
2. The proposed converter has a simple structure, low cost and ease of control.
3. The converter acts as a conventional PWM converter during most of the switching cycle.
4. The presented snubber cell can easily be applied to the other basic PWM dc-dc converters and to all switching converters.
5. The proposed converter has a higher total efficiency and a wider load range.
6. The main switch and the auxiliary switch are not subjected to additional voltage stresses. The current stress on the main switch is slightly higher, but the current stress on the auxiliary switch is within a safe limit.

5. Simulation and Experimental Results
A prototype of the proposed converter, shown in figure 4, has been built in the laboratory. The proposed converter operates with an input voltage Vs = 12 V, an output voltage Vo = 3.3 V, a load current of 10 A and a switching frequency of 200 kHz. The converter is simulated using the simulation software PSIM version 6.0. The major parameters and components are given in Table 1.

Table 1: Components used in the proposed converter
Component                  Simulation   Experiment
Main switch, S             Ideal        IRF1312
Auxiliary switch, S1       Ideal        IRF1010E
Synchronous switch, S2     Ideal        IRF1010E
Schottky diode, D          Ideal        MBR60L45CTG
Capacitance, CS            0.05 nH      0.05 nH
Resonant inductor, Lr      200 nH       200 nH
Resonant capacitor, Cr     0.2 μF       0.2 μF
Output capacitor, Co       100 μF       100 μF
Output inductor, Lo        2 μH         2 μH

Figure 4. Experimental setup of the proposed converter

Figures 5, 6 and 7 show the simulation results of the proposed converter, and figures 8, 9 and 10 present the experimental results. All the waveforms except the efficiency curve represent one switching cycle, which is 5 μs in this case. The amplitudes are denoted below each waveform. Figures 5 and 8 show the ZVS switching of the main switch, figures 6 and 9 show the ZCS switching of the auxiliary switch, and figures 7 and 10 show the ZVS switching of the synchronous switch.

Figure 5. Switching waveform of S in the ZVT SBC

Figure 6. Switching waveform of S1 in the ZVT SBC

Figure 7. Switching waveform of S2 in the ZVT SBC

From figure 11 it can be observed that the efficiency values of the soft-switching converter are high relative to those of the hard-switching converter. The efficiency values towards the minimum output power naturally decrease, because the converter is designed for the maximum output current. At 70% output power, the overall efficiency of the proposed converter increases to about 97% from the value of 92% for its hard-switching counterpart. The high efficiency confirms the correctness of the design.
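A small worked check of the quoted efficiency figures. The operating point and the 92%/97% values are taken from the text; the implied loss figures below are illustrative arithmetic only, since the paper does not give a loss breakdown.

```python
# Loss reduction implied by the quoted efficiencies at 70% of full load.
Vo, Io = 3.3, 10.0                  # output voltage [V] and full-load current [A]
Po = 0.7 * Vo * Io                  # 70% output power = 23.1 W

for label, eta in (("hard switching", 0.92), ("proposed ZVT", 0.97)):
    Pin = Po / eta                  # required input power at this efficiency
    Ploss = Pin - Po                # total dissipated power
    print(f"{label}: Pin = {Pin:.2f} W, loss = {Ploss:.2f} W")
# The ZVT converter dissipates roughly 1.3 W less at this operating point.
```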

Figure 8. Main switch S: VS, IS (V: 10 V/div, I: 10 A/div, time: 0.5 μs/div)

Figure 9. Auxiliary switch S1: VS1, IS1 (V: 10 V/div, I: 10 A/div, time: 0.5 μs/div)

Figure 10. Synchronous switch S2: VS2, IS2 (V: 10 V/div, I: 10 A/div, time: 0.5 μs/div)

Figure 11. Converter efficiency versus output power

6. Conclusion
The ZVT concepts used at high power levels were implemented in a synchronous buck converter, and it was shown that the switching losses of the synchronous buck converter are eliminated. Besides the main switch being turned on and off under ZVS and the auxiliary switch being turned on and off under ZCS, the synchronous switch is also turned on and off under ZVS. Hence the switching losses are reduced, and the newly proposed ZVT synchronous buck converter is more efficient than the conventional converter. No additional current stresses appear on the main devices; however, the voltage stress on the top switch is slightly higher, which can be handled by using a slightly higher-rated MOSFET. The auxiliary devices are subjected to allowable voltage and current values. Moreover, the converter has a simple structure, low cost and ease of control. A prototype of a 3.3 V, 10 A, 200 kHz system was implemented to verify the improved performance experimentally.

7. References
[1] L. Yang and C.Q. Lee, "Analysis and design of boost zero-voltage-transition PWM converter," in Proc. IEEE APEC Conf., pp. 707-71, 1993.
[2] G. Hua, C.S. Leu, Y. Jiang, and F.C. Lee, "Novel zero-voltage-transition PWM converters," IEEE Trans. Power Electron., vol. 9, no. 2, pp. 213-219, Mar. 1994.
[3] A.J. Stratakos, S.R. Sanders, and R.W. Broderson, "A low-voltage CMOS dc-dc converter for a portable battery-operated system," in Proc. Power Electronics Specialists Conf., vol. 1, pp. 619-626, Jun. 1994.
[4] A.V. da Costa, C.H.G. Treviso, and L.C. de Freitas, "A new ZCS-ZVS-PWM boost converter with unity power factor operation," in Proc. IEEE APEC Conf., pp. 404-410, 1994.
[5] N.P. Filho, V.J. Farias, and L.C. de Freitas, "A novel family of DC-DC PWM converters using the self resonance principle," in Proc. IEEE PESC Conf., pp. 1385-1391, 1994.
[6] G. Moschopoulos, P. Jain, and G. Joos, "A novel zero-voltage switched PWM boost converter," in Proc. IEEE PESC Conf., pp. 694-700, 1995.
[7] A. Elasser and D.A. Torrey, "Soft switching active snubbers for dc/dc converters," IEEE Trans. Power Electron., vol. 11, no. 5, pp. 710-722, 1996.
[8] K.M. Smith and K.M. Smedley, "A comparison of voltage-mode soft switching methods for PWM converters," IEEE Trans. Power Electron., vol. 12, no. 2, pp. 376-386, Mar. 1997.
[9] O. Djekic, M. Brkovic, "Synchronous rectifiers vs. schottky diodes in a buck topology for low voltage applications," Power Electronics Specialists Conference, PESC '97 Record, 28th Annual IEEE, vol. 2, pp. 1374-1380, June 1997.


[10] Y. Xi, P.K. Jain, G. Joos, and H. Jin, "A zero voltage switching forward converter topology," in Proc. IEEE INTELEC Conf., pp. 116-123, 1997.
[11] C.J. Tseng and C.L. Chen, "Novel ZVT-PWM converter with active snubbers," IEEE Trans. Power Electron., vol. 13, no. 5, pp. 861-869, Sept. 1998.
[12] O. Djekic, M. Brkovic, A. Roy, "High frequency synchronous buck converter for low voltage applications," IEEE PESC '98 Record, vol. 2, pp. 1248-1254, 1998.
[13] G. Moschopoulos, P. Jain, G. Joos, and Y.F. Liu, "Zero voltage switched PWM boost converter with an energy feedforward auxiliary circuit," IEEE Trans. Power Electron., vol. 14, no. 4, pp. 653-662, Jul. 1999.
[14] T.W. Kim, H.S. Kim, and H.W. Ahn, "An improved ZVT PWM boost converter," in Proc. IEEE PESC Conf., pp. 615-619, 2000.
[15] J.H. Kim, D.Y. Lee, H.S. Choi, and B.H. Cho, "High performance boost PFP with an improved ZVT converter," in Proc. IEEE APEC Conf., pp. 337-342, 2001.
[16] N. Jain, P. Jain, and G. Joos, "Analysis of a zero voltage transition boost converter using a soft switching auxiliary circuit with reduced conduction losses," in Proc. IEEE PESC Conf., pp. 1799-1804, 2001.
[17] M.L. Martins, H.A. Grundling, H. Pinheiro, J.R. Pinheiro, and H.L. Hey, "A ZVT PWM boost converter using auxiliary resonant source," in Proc. IEEE APEC Conf., pp. 1101-1107, 2002.
[18] I.L. Oun, D.Y. Lee and B.H. Cho, "Improved zero-voltage-transition (ZVT) boost converter using coupled inductor and low voltage zener diode," Power Conversion Conference Proceedings, PCC Osaka, vol. 2, pp. 627-631, 2-5 April 2002.
[19] C.M. Wang, "Zero-voltage-transition PWM dc-dc converters using a new zero-voltage switch cell," in Proc. IEEE INTELEC Conf., pp. 784-789, 2003.
[20] M.L. Martins, J.L. Russi, H. Pinheiro, H.A. Grundling, H.L. Hey, "Unified design for ZVT PWM converters with resonant auxiliary circuit," Electric Power Applications, IEE Proceedings, vol. 151, issue 3, pp. 303-312, 8 May 2004.
[21] S. Kaewarsa, C. Prapanavarat, U. Yangyuen, "An improved zero-voltage-transition technique in a single-phase power factor correction circuit," International Conference on Power System Technology - POWERCON 2004, vol. 1, pp. 678-683, 21-24 Nov. 2004.
[22] M.D. Mulligan, B. Broach, and Thomas H. Lee, "A constant-frequency method for improving light-load efficiency in synchronous buck converters," IEEE Power Electronics Letters, vol. 3, no. 1, pp. 24-29, March 2005.

[23] M.L. Martins, J.L. Russi, H.L. Hey, "Zero-voltage transition PWM converters: a classification methodology," IEE Proceedings - Electric Power Applications, vol. 152, no. 2, pp. 323-334, March 2005.
[24] S. Kaewarsa, "An improved zero-voltage-transition technique in single-phase active power factor correction circuit," AU Journal of Technology, vol. 8, no. 4, pp. 207-214, April 2005.
[25] W. Huang and G. Moschopoulos, "A new family of zero-voltage-transition PWM converters with dual active auxiliary circuits," IEEE Trans. Power Electron., vol. 21, no. 2, pp. 370-379, March 2006.
[26] V. Yousefzadeh and D. Maksimovic, "Sensorless optimization of dead times in dc-dc converters with synchronous rectifiers," IEEE Trans. Power Electronics, vol. 21, no. 4, pp. 994-1002, July 2006.
[27] A.K. Panda, K. Aroul, "A novel technique to reduce the switching losses in a synchronous buck converter," IEEE PEDES Conference, IIT Delhi, pp. 1-5, 12-15 Dec. 2006.
[28] S.G. Kadwane, S. Gupta, B.M. Karan, T. Ghose, "Practical implementation of GA tuned DC-DC converter," ICGST ACSE Journal, vol. 6, issue 1, January 2006.
[29] P.R. Mohan, M.V. Kumar, O.V.R. Reddy, and R. Reddy, "Simulation of a novel zero voltage transition technique based on boost power factor correction converter with EMI filter," ARPN Journal of Engineering and Applied Sciences, vol. 2, no. 4, pp. 1-5, August 2007.


Anup Kumar Panda was born in 1964. He received the B.Tech. in Electrical Engineering from Sambalpur University, India, the M.Tech. in Power Electronics and Drives from the Indian Institute of Technology, Kharagpur, India in 1993, and the Ph.D. from Utkal University in 2001. He joined IGIT, Sarang, under Utkal University as a lecturer in January 1990, served there for eleven years, and then joined the National Institute of Technology, Rourkela, in January 2001 as an assistant professor; he is currently a Professor in the Department of Electrical Engineering. He has published around thirty papers in national and international journals and conferences, completed two MHRD projects amounting to twenty-one lakhs, and is currently handling a project of thirty-five lakhs sponsored by CDAC. He has guided one Ph.D. scholar and is presently guiding four scholars in the field of power electronics. His present research interests are soft-switching converters, electrical drives and power quality improvement.

Bibhu Prasad Panigrahi was born in 1968. He received the B.Sc. (Engineering) degree in Electrical Engineering from University College of Engineering, Burla, Sambalpur University (Orissa), India in 1989 and the M.Tech. (Power Electronics and Power Systems) degree from the Indian Institute of Technology, Bombay, Mumbai, India in 1997. He obtained his Ph.D. degree in Electrical Engineering from the Indian Institute of Technology, Kharagpur, India in 2007. He worked as an Assistant Engineer in the Fertilizer Corporation of India Limited for two years, from 1989 to 1991. After spending a few months as an Executive Trainee in the Nuclear Power Corporation of India Limited, he switched over to academics and joined the Electrical Engineering Department of Indira Gandhi Institute of Technology (IGIT), Sarang, Dhenkanal, Orissa (India) as a Lecturer in April 1992. In April 1997 he became Senior Lecturer in the same department, and in April 2002 he was promoted to Lecturer (Selection Grade)/Assistant Professor. In March 2009 he joined the same Electrical Engineering Department of IGIT, Sarang, as Professor. He has published twenty-four research papers in various journals and conference proceedings. His research interests include electrical machines, power electronics and power systems.

Hari Narayan Pratihari was born in 1967. He received the B.Tech. in Electronics and Communication Engineering and the M.Tech. in Electronics and Communication in 2005 from NIT Rourkela, and is now pursuing his Ph.D. at Utkal University. Currently he is an Associate Professor in the Department of Electronics and Telecommunication at Mahavir Institute of Engineering and Technology, Bhubaneswar. He has published around ten papers in national and international conferences and journals.

L. Maharana was born in 1950. He graduated in physics in 1969, completed his post-graduation in 1971 at Utkal University, and received his Ph.D. in 1980 from Utkal University, Orissa, India. He has 33 years of teaching and research experience and has published around forty-four national and international journal and conference papers. At present, Mr. Maharana is the Head of the Department of Physics, Utkal University, India.
