Linear predictive coding with modified filter structures

Aki Härmä
Abstract

In conventional one-step forward linear prediction, an estimate for the current sample value is formed as a linear combination of previous sample values. In this paper, a generalized form of this scheme is studied, where the prediction is based not simply on the previous sample values but on the signal history as seen through an arbitrary filterbank. It is shown how the coefficients of a modified model can be obtained and how the inverse and synthesis filters can be implemented. Various properties of such systems are derived. As an example, a novel linear predictive system with an inherently logarithmic frequency representation is introduced.
1 Introduction
Linear predictive coding, LPC, has been widely used especially in speech signal processing since its introduction in the late sixties [1]. During the last three decades the number of available techniques has been growing rapidly [2, 3, 4, 5]. However, relatively little has been done to change the basic principle of prediction. In linear prediction an estimate x̃(n) for a sample value x(n) is given as a linear combination of previous sample values.
There are infinitely many alternative ways to form a linear combination of the signal history and use it to predict the next signal value. The most obvious way, a weighted sum of a finite number N of previous signal values, is typically utilized in, e.g., coding applications. This selection for sampling of the signal history is not based on any mathematical necessity. The underlying mathematical theory is the Wold decomposition principle [6], stating that a regular sequence can be obtained from white noise by filtering with an IIR filter. In traditional LPC this filter is assumed to be a conventional all-pole filter, and the optimal coefficients, in a least-squares sense, can be obtained from the celebrated Yule-Walker equations, which obey a more general orthogonality principle of linear prediction [7, 8]. In classical LPC the filter coefficients can be computed directly in the time domain. The obtained coefficients can be used in an inverse filter to produce a prediction error signal. The original signal can be synthesized exactly from the prediction error signal using a synthesis filter. In this article, linear predictive systems of this type, where the coefficients of a power spectrum model can be estimated in the time domain and the inverse and synthesis filters can be directly implemented, are called complete linear predictive coding systems. An example of a modified formulation of the prediction principle which has been used in speech and audio applications is frequency-warped linear prediction, WLP [9, 10, 11]. Here, the unit delays of a conventional prediction filter are replaced by first-order allpass sections. The coefficients of a warped prediction filter can be estimated from the outputs of a chain of allpass filters [10] using, e.g., modified autocorrelation or covariance methods of linear prediction. In a coding application, prediction error and synthesis filters can be implemented directly as warped filters [12], i.e., WLP is a complete system.
The main advantage of this approach in speech and audio applications is the obtained modified frequency representation which, with a suitable selection of a warping parameter, approximates the frequency representation of human hearing [13]. However, the freedom of design lies only in choosing the value of the warping parameter. In mel-generalized cepstral modeling [14] the conventional warped linear prediction is extended by adding a second free parameter which controls the magnitude representation of the power spectrum continuously between linear and log-magnitude representations. In this technique, the filter coefficients are estimated in the frequency domain but time-domain filters can be implemented. In selective linear prediction [15] and perceptual linear prediction [16], the modification to the prediction principle is made indirectly by manipulating the power spectrum representation of a signal. Here, a set of coefficients is computed from a modified autocorrelation function obtained as an inverse Fourier transform of the power spectrum representation. This is a generic approach, but it usually does not give direct means for time-domain implementation of analysis and synthesis filters using those coefficients. In the system identification literature, various types of linear predictive spectrum estimation methods based on generalized orthogonal polynomial bases have been presented [17]. The most famous forms are Laguerre models [18, 19, 20, 21] and Kautz models [22, 23]. In these techniques the number of free parameters is, at most, the same as the order of the model. No synthesis filters have been presented; however, the implementation method for synthesis filters introduced in [12] can be used with these models. Therefore the class of LPC systems based on orthogonal polynomial bases is also complete. In this article we study a class of modified complete LPC systems, show how the coefficients can be estimated, and show how modified inverse and synthesis filters can be implemented. The presented work is partially based on the recent introduction of a generalized LPC synthesis filter in [12]; the implementation technique is also presented in the current paper. The approach proposed in this article significantly increases the number of free parameters.
This facilitates direct design and implementation of complete LPC systems having properties which would be difficult or impossible to obtain using any of the classical modifications.
The generalization of the prediction principle is introduced in Section 2. This is followed by Sections 2.1 and 2.2 introducing methods for estimation of filter coefficients, and implementation of filters, respectively. Various properties of modified LPC systems are studied in Section 3. At this point it is necessary to restrict the discussion to certain interesting classes of LPC systems. Guidelines for design of modified LPC systems are discussed in Section 4 with design examples.
2 Generalized blocks for linear predictive coding
The classical linear prediction for a sample value x(n) is given by

\tilde{x}(n) = \sum_{k=1}^{L} a_k x(n-k),    (1)

where a_k are the coefficients of the predictor. The z-transform of (1) is

\tilde{X}(z) = \sum_{k=1}^{L} a_k z^{-k} X(z).    (2)
In this paper, we generalize the prediction principle by replacing z^{-k} with a linear filter D_k(z). This yields

\tilde{X}(z) = \sum_{k=1}^{L} a_k D_k(z) X(z).    (3)

In the time domain one may write

\tilde{x}(n) = \sum_{k=1}^{L} a_k d_k[x(n)],    (4)
where

d_k[x(n)] = \begin{cases} \sum_{p=0}^{\infty} x(n-p) d_k(p), & \text{if } k > 0 \\ x(n), & \text{if } k = 0 \end{cases}    (5)

and d_k(p), p = 0, 1, \cdots, \infty is the impulse response of D_k(z).
2.1 Parameter estimation
There are numerous ways to find coefficients for (4). A commonly used error criterion is given by

e = \sum_{n=0}^{N} |x(n) - \tilde{x}(n)|^2.    (6)
This is called the Least Squares Error, LSE, principle. The least-squares optimal set of coefficients a_k can be solved from normal equations obeying the orthogonality principle of linear prediction, see, e.g., [3]. Substitution of d_p[x(n)] into the place of every x(n-p) in the classical normal equations gives

\sum_{n=0}^{N} \left[ d_p[x(n)] \left( d_0[x(n)] - \sum_{k=1}^{L} a_k d_k[x(n)] \right) \right] = 0,    (7)

where p = 1, 2, \cdots, L. This substitution does not violate the underlying theory of linear prediction. Equivalently with classical linear predictive modeling, Eq. (7) can be written as a matrix expression given by

\begin{bmatrix} C_{0,0} & \cdots & C_{0,L-1} \\ \vdots & \ddots & \vdots \\ C_{L-1,0} & \cdots & C_{L-1,L-1} \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_L \end{bmatrix} = - \begin{bmatrix} C_{0,1} \\ \vdots \\ C_{0,L} \end{bmatrix},    (8)

where the correlation terms are defined as

C_{k,p} = \frac{1}{N} \sum_{n=0}^{N-1} d_k[x(n)] d_p[x(n)].    (9)

Alternatively, Eq. (8) can also be written as

\Gamma a = \gamma,    (10)

where \Gamma and \gamma denote the matrix and the vector on the left- and right-hand sides of (8), respectively, and a = [a_1, a_2, \cdots, a_L]^T. The least-squares optimal coefficients are formally solved from

a = \Gamma^{-1} \gamma.    (11)
Figure 1: A network for computation of the filter coefficients a_k.
So far, the only requirement is that \Gamma must be invertible. The computation of the coefficients is sketched in Fig. 1. This is equivalent to the covariance method of linear prediction [3] because the correlation matrix \Gamma is not of Toeplitz form. A Toeplitz form would lead to Yule-Walker equations which can be solved efficiently, e.g., using the Levinson-Durbin algorithm [24]. A class of modified LPC systems with a Toeplitz correlation matrix is introduced in Section 3.1.
In practice, a signal of finite length is fed to the network of Fig. 1 and all correlations between inputs and outputs of the filters D_k(z) are computed over the signal. Then the matrix \Gamma and the vector \gamma are formed, and the filter coefficients are computed from Equation (11).
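As an illustration (not the author's code), the procedure of Fig. 1 can be sketched numerically: filter the signal through the bank of filters D_k(z), form the correlations of Eq. (9), and solve the normal equations of Eq. (7). With the classical choice D_k(z) = z^{-k} the result must agree with ordinary linear prediction, which gives a simple sanity check; all variable names and parameter values below are ad hoc.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)

# Test signal: AR(2) process x(n) = 0.9 x(n-1) - 0.5 x(n-2) + w(n)
w = rng.standard_normal(50000)
x = signal.lfilter([1.0], [1.0, -0.9, 0.5], w)

# Filterbank given as (b, a) coefficient pairs.  Here the classical choice
# D_k(z) = z^{-k} (a pure delay chain), so the generalized method must
# reduce to ordinary linear prediction.
L = 2
D = [([0.0] * k + [1.0], [1.0]) for k in range(1, L + 1)]

def lpc_generalized(x, D):
    """Coefficients a_k from the normal equations (7), with d_0[x(n)] = x(n)."""
    d = [x] + [signal.lfilter(b, a, x) for (b, a) in D]   # d_0 ... d_L
    L = len(D)
    C = np.array([[np.mean(d[k] * d[p]) for p in range(L + 1)]
                  for k in range(L + 1)])                 # C_{k,p}, Eq. (9)
    Gamma = C[1:, 1:]                                     # C_{p,k}, p,k = 1..L
    gamma = C[1:, 0]                                      # C_{p,0}
    return np.linalg.solve(Gamma, gamma)

a = lpc_generalized(x, D)
print(a)   # close to the AR coefficients [0.9, -0.5]
```

The solve follows Eq. (7) directly, so the indexing convention of the matrix in (8) is sidestepped; for an arbitrary filterbank only the list `D` changes.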
Figure 2: Block diagram of a generalized prediction error filter.
2.2 Filters
The prediction error signal, or residual, is given by

e(n) = x(n) - \tilde{x}(n) = x(n) - \sum_{k=1}^{L} a_k d_k[x(n)].    (12)
Its z-transform is

E(z) = X(z) \left( 1 - \sum_{k=1}^{L} a_k D_k(z) \right) = X(z) H(z),    (13)

where H(z) is the transfer function of a prediction error filter. The block diagram of this filter is shown in Fig. 2. This filter can be implemented directly. To synthesize the signal from the residual signal it is necessary to implement the inverse filter of (12). The resulting synthesis filter is given by

x(n) = e(n) + \sum_{k=1}^{L} a_k d_k[x(n)]    (14)
Figure 3: Block diagram of a generalized synthesis filter.

X(z) = \frac{E(z)}{H(z)} = \frac{E(z)}{1 - \sum_{k=1}^{L} a_k D_k(z)}    (15)
with its block diagram shown in Fig. 3.

Figure 4: a) A recursive delay-free filter; b) an equivalent circuit.

The implementation of the synthesis filter may be a problem if there are delay-free paths in any sub-filter D_k(z). In this case, the block diagram in Fig. 3 has delay-free loops. However, there are techniques that can be used to implement the filter directly. An implementation technique for this type of recursive filter was introduced in [25, 12]. The algorithm may be applied to any filter having the structure shown in Fig. 4a, where P is a linear system containing delay-free paths. If P has delay-free paths, the problem is that its output, denoted o(n) in Fig. 4a, is a function of y(n) and vice versa. A solution is to divide the computation of y(n) into two distinct steps. First, we may calculate an output of P so that P is temporarily disconnected from y(n). In Fig. 4b this is denoted o'(n). A practical way to do that is to feed a zero sample into P and read its output. After that, y'(n) = x(n) + o'(n) is fed to a pure delay-free structure (the upper loop in Fig. 4b). The pure delay-free structure may be obtained from the original filter structure by disconnecting all its inner unit delay elements. In the cases of our interest P reduces to a single coefficient \chi. The output of the disconnected part o'(n) depends only on the previous samples, while the output of the pure delay-free structure is a function of x(n) and o'(n). From Figure 4b one can write down the following difference equation:

y(n) = x(n) + o'(n) + \chi y(n).    (16)
The output of the filter at time n is easy to solve from this difference equation and is given by

y(n) = \frac{x(n) + o'(n)}{1 - \chi}.    (17)
The following algorithm for the implementation of a generic synthesis filter was proposed in [12]:

Algorithm 1

1. At a given time step, compute o'(n) using the disconnected structure shown in Fig. 4b.

2. The value of \chi may be computed in advance or obtained as an output of P, where the inner states are set to zeros, using 1 as an input value. That is, \chi is the first sample of the impulse response of P.

3. The final output y(n) of the delay-free recursive filter is given by

y(n) = \frac{x(n) + o'(n)}{1 - \chi}.    (18)

4. Update the inner states of P using y(n).

5. n \to n + 1 and go to step 1.

This algorithm can be directly used to implement (15) with any set of linear sub-filters D_k(z). The only limitation is the case where the loop gain is \chi = 1, in which the denominator of Eq. (18) vanishes. Since \chi is the first sample of the impulse response of P, the denominator 1 - \chi is the first sample of the impulse response of the prediction error filter H(z). The power series expansion, or z-transform of the impulse response, of H(z) is given by

H(z) = \sum_{p=0}^{\infty} h_p z^{-p}.    (19)
If h_0 = 1 - \chi = 0, the synthesis filter can be written as

\frac{1}{H(z)} = \frac{1}{0 + \sum_{p=1}^{\infty} h_p z^{-p}},    (20)

which is a non-causal filter. Therefore, in some cases the estimated set of filter coefficients a_k can result in a non-causal synthesis filter which cannot be implemented. It has been shown that in warped linear prediction this can happen only if the warped synthesis filter has a pole in a specific position outside of the unit circle [12].
3 Properties

3.1 Taxonomy of predictive structures
The most general case is when the D_k(z) are a set of arbitrary linear filters. It would be possible to choose the filters so that they represent some specific properties of an input signal. For example, in a speech coding application one of the filters would probably be a long-term pitch predictor and the others could be matched to the transfer functions of the vocal tract. Although any such complete LPC system can be implemented, they are typically difficult to use in coding applications due to the difficulties of controlling the stability of the synthesis filter. Generally, stability problems could be handled using, e.g., regularization techniques [26]. A scheme which provides a relatively clear analogy to conventional LPC is based on the principle of uniform sampling. In this article, uniform sampling means that the difference between transfer functions in any adjacent pair of outputs d_k[x(n)] and d_{k-1}[x(n)] is identical. In the z-domain, this can be expressed by

D_k(z)/D_{k-1}(z) = D_1(z),    (21)

where D_1(z) is a linear, but not necessarily causal, prototype filter. Consequently, each subfilter can be expressed by

D_k(z) = D_1(z)^k.    (22)
Uniform sampling systems constructed of allpass subfilters form a class which has many favorable properties. It also includes the classical LPC and warped LPC as special cases. Let us interpret the correlation terms of Eq. (8) as expectation values given by

C_{k,p} = E[d_k[x(n)] d_p[x(n)]].    (23)
For an allpass subfilter it is known that D_k(z^{-1}) = D_k^{-1}(z). Thus, one can apply Parseval's theorem to show that

C_{k,p} = E[d_k[x(n)] d_p[x(n)]] \sim \sum_{n=-\infty}^{\infty} d_k[x(n)] d_p[x(n)]
        = \frac{1}{i2\pi} \oint_C D(z)^k X(z) D(z^{-1})^p X(z^{-1}) \frac{dz}{z}
        = \frac{1}{i2\pi} \oint_C D(z)^{k-p} X(z) X(z^{-1}) \frac{dz}{z}
        = \sum_{n=-\infty}^{\infty} d_{|k-p|}[x(n)] d_0[x(n)] \sim E[d_{|k-p|}[x(n)] d_0[x(n)]] = C_{|k-p|,0},    (24)

where k and p are any integers and \sim indicates that the normalization of the expectation is omitted to simplify notation. Equation (24) shows that if the subfilters are allpass filters and form a cascade as in (21), the correlation matrix of (8) has Toeplitz symmetry. Consequently, the normal equations in (7) reduce to Yule-Walker equations where the corresponding correlation matrix \Gamma is a Toeplitz matrix. This facilitates the use of computationally efficient techniques, for example, the Levinson-Durbin algorithm [27], to solve the coefficients a_k. That is, the computation of the coefficients is analogous to the autocorrelation method of linear prediction [3].
3.2 Spectrum representation of prediction error
Spectrum interpretation of the performance of linear predictive systems is of fundamental importance since modified LPC systems typically have a different representation of the frequency scale: they feature a warped frequency representation. In the following, the spectrum representation is studied in the case of a uniformly sampling system, where the subfilters obey Eq. (22). In a more general case, the spectrum representation of the prediction error is difficult to formulate. The Fourier transform of the prediction error in a uniformly sampling LPC can be expressed as

E(\omega) = X(\omega) - \sum_{k=1}^{L} a_k D_1^k(\omega) X(\omega),    (25)
where \omega is given in radians. Using the notation

D_1(\omega) = A(\omega) e^{-i\psi(\omega)} = e^{-i(i \ln(A(\omega)) + \psi(\omega))} = e^{-i\nu(\omega)}    (26)

one may write Eq. (25) as

E(\omega) = X(\omega) - \sum_{k=1}^{L} a_k e^{-ik\nu(\omega)} X(\omega).    (27)
The resulting spectrum error variance is given by

\int_{-\pi}^{\pi} |E(\omega)|^2 \, d\nu(\omega) = \int_{-\pi}^{\pi} |X(\omega)|^2 \left| 1 - \sum_{k=1}^{L} a_k e^{-ik\nu(\omega)} \right|^2 d\nu(\omega).    (28)
Changing the integration variable to \omega leads to

\int_{-\pi}^{\pi} |X(\omega)|^2 \left| 1 - \sum_{k=1}^{L} a_k e^{-ik\nu(\omega)} \right|^2 \left( \frac{\partial \psi(\omega)}{\partial \omega} + \frac{i \, \partial A(\omega)}{A(\omega) \partial \omega} \right) d\omega.    (29)
The imaginary part of (29) is an integral over a product of a symmetrical power spectrum term and \partial A(\omega)/(A(\omega)\partial\omega). In the cases of interest A(\omega) is a symmetrical magnitude response and therefore the latter is an antisymmetrical function. Therefore, the imaginary component vanishes in integration and (29) reduces to

\int_{-\pi}^{\pi} |X(\omega)|^2 \left| 1 - \sum_{k=1}^{L} a_k A^k(\omega) e^{-ik\psi(\omega)} \right|^2 \frac{\partial \psi(\omega)}{\partial \omega} \, d\omega.    (30)
In (30), \psi(\omega) defines a frequency-warping function. This form is closely related to the theory of the orthogonal FAM functions [28]. The warping effect changes the spectral density of the estimated spectral model. The associated tilt in the power spectrum is produced by the function

W(\omega) = \frac{\partial \psi(\omega)}{\partial \omega}    (31)

which appears in (30). The negative value of (31), -W(\omega), is the group delay of D_1(\omega). Its inverse can be used to compensate this effect when necessary.
In practice, it is reasonable to choose \psi(\omega) to be a monotonic function of \omega. Otherwise, the frequency mapping is not unique, meaning that a single spectrum peak could produce multiple peaks in the estimated spectrum. On the other hand, if \psi(\omega) exceeds the range [\theta - \pi, \theta + \pi], where \theta is some phase offset, as \omega goes from -\pi to \pi, the system may produce spurious peaks in the spectral estimate due to aliasing. That is to be expected because the system performs sub-sampling of the waveform if the phase difference between two adjacent subfilters is larger than \pi.
3.3 Parallel and cascaded systems
If it is required that the phase difference between two adjacent filters D_k(z) and D_{k-1}(z) should be limited to \pi, there are two alternatives for D_1(z). First, D_1(z) can be a first-order causal filter given by

D_1(z) = \frac{a_0 + a_1 z^{-1}}{b_0 + b_1 z^{-1}}.    (32)

In this case the subfilters D_k(z) can be implemented as a cascade of first-order blocks. In the case of the uniform sampling approach the properties of the modified system, e.g., the warping function \psi(\omega), are limited to those obtained with a first-order subfilter. Alternatively, D_1(z) can be an arbitrarily high-order non-causal filter. This can be obtained using the parallel structure shown in Figs. 1, 2, and 3. For example, let D_{k-1}(z) and D_k(z) be (k-1)th and kth order filters. Now D_k(z)/D_{k-1}(z) becomes a (2k-1)th-order non-causal transfer function which still has a phase response between [\theta - \pi, \theta + \pi]. For example, let the sub-filters be allpass filters given by

D_k(z) = z^{-k} \frac{P_k(z^{-1})}{P_k(z)},    (33)
where P_k(z) is a kth order polynomial. The z-transform of the synthesis filter (15) is given by

H(z) = \frac{1}{1 - a_1 z^{-1} \frac{P_1(z^{-1})}{P_1(z)} - \cdots - a_L z^{-L} \frac{P_L(z^{-1})}{P_L(z)}}.    (34)
After simple manipulations this leads to

H(z) = \frac{-\prod_{k=1}^{L} P_k(z^{-1})}{-\prod_{k=1}^{L} P_k(z^{-1}) + \Xi_1(z) + \cdots + \Xi_L(z)},    (35)

where

\Xi_k(z) = a_k P_k(z) \prod_{p=1, p \neq k}^{L} P_p(z^{-1}) \, z^{-k}.    (36)
The estimated model is a pole-zero filter with a fixed set of zeros; that is, the numerator of (35) is not a function of the coefficients a_k. The order of the filter is L! even though only L filter coefficients a_k are estimated.
3.4 Gain factor
In this section it is convenient to express the transfer function H(z), which is a rational function, as an infinite power series expansion given by

H(z) = \sum_{p=0}^{\infty} h_p z^{-p},    (37)

where h_p are the samples of the impulse response of H(z). It was shown in [29] that for a minimum-phase filter H(z), having all the poles (or zeros) within the unit circle, the impulse response satisfies

\frac{1}{2\pi} \int_{-\pi}^{\pi} \ln |H(e^{-i\omega})|^2 \, d\omega = 0.    (38)
This also holds for a generalized filter of (12) or (15) if the integration is performed over \nu(\omega). However, it does not hold if the integration is performed over \omega. Therefore, there is an additional gain term in the filter. The proof of (38) for a conventional filter [3] reduces to

\lim_{z \to \infty} H(z) = 1,    (39)

which states that the gain of the filter is one. In the case of a generalized filter we have

\lim_{z \to \infty} H(z) = h_0 + \sum_{p=1}^{\infty} h_p \cdot 0 = h_0 \neq 1.    (40)
Therefore, the gain of the estimated filter H(z) is not unity but g = h_0, which is the first sample value of the impulse response of H(z). On the other hand, if \chi is the first sample of the impulse response of P(z) in Algorithm 1, then it also holds that h_0 = 1 - \chi. Dividing an estimated generalized filter by g, the filter satisfies (38). In practice, this gain term has to be taken into account when measuring, e.g., prediction gain values; otherwise, the results would be biased and meaningless.
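A small numerical illustration of the gain term (the coefficient values are chosen for the example, not taken from the paper): for warped sub-filters D_k(z) = D_1(z)^k with D_1(z) = (z^{-1} - \lambda)/(1 - \lambda z^{-1}), the first impulse-response sample of D_k is (-\lambda)^k, so h_0 has a closed form. For a minimum-phase example the log-magnitude integral of Eq. (38) taken over \omega then comes out as 2 ln(h_0) rather than 0.

```python
import numpy as np

lam = 0.5
a = np.array([0.3, 0.2])   # small coefficients keep H(z) minimum-phase here

# First impulse-response sample of D_k(z) = D_1(z)^k is (-lam)^k,
# so h_0 = 1 - sum_k a_k (-lam)^k   (illustrative values)
h0 = 1.0 - sum(a[k] * (-lam) ** (k + 1) for k in range(len(a)))

# Evaluate H on the unit circle: the log-magnitude mean over omega
# is not 0 but 2*ln(h0), exposing the extra gain term.
w = np.linspace(-np.pi, np.pi, 20001)
zinv = np.exp(-1j * w)
D1 = (zinv - lam) / (1.0 - lam * zinv)
H = 1.0 - sum(a[k] * D1 ** (k + 1) for k in range(len(a)))
integral = np.mean(np.log(np.abs(H) ** 2))

print(h0, integral, 2 * np.log(h0))   # integral ~ 2*ln(h0), not 0
```

Dividing H by h_0 before the measurement restores a zero integral, which is the normalization the text calls for.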
3.5 Stability of the synthesis filter

Nothing guarantees the stability of the synthesis filter (15) in its generalized form. Stability depends on the choice of the D_k(z) and the estimated filter coefficients a_k. According to our practical experience, stability problems easily appear in the following cases:

1. If the filterbank performs non-uniform or sparse sampling in the time domain. Sparse sampling means that the phase function between two adjacent stages D_k(z) and D_{k+1}(z) is not limited to the range of 2\pi. This approach often leads to a correlation matrix \Gamma which may lack the property of positive definiteness, or it may even be singular. If the correlation matrix is positive definite, the filter obtained in conventional LP is stable, see, e.g., [15, 30]. However, this does not guarantee stability in the generalized case (see below). It is a practical rule of thumb to design the filters such that the phase function of D_k(z) is equal to or approximates

\psi_k(\omega) = k \psi_1(\omega),    (41)

where \psi_1(\omega) is the phase function of a prototype sub-filter.

2. If the D_k(\omega) are not allpass filters.
There is a classical result for conventional LP that if the correlation matrix \Gamma is positive definite, all reflection coefficients corresponding to an estimated filter obey |K_p| < 1 [15, 31]. Moreover, the synthesis filter has all its poles within the unit circle. The reflection coefficients can be computed from the filter coefficients a_k using a simple recursive algorithm [3]. If the filters D_k(z) are a chain of identical elements, as in (21), a generalized lattice filter can be implemented by replacing the unit delays of a conventional lattice structure with these elements, see, e.g., [12]. The reflection coefficients of the modified lattice can be computed directly from the correlation matrix. Therefore, it holds that a positive definite correlation matrix yields |K_p| < 1. Let us study a simple example with

D_1(z) = \frac{\lambda_1 + z^{-1}}{1 + \lambda_2 z^{-1}}.    (42)
The transfer function of the first stage of a synthesis lattice is now given by

H(z) = \frac{1}{1 - K_1 \frac{\lambda_1 + z^{-1}}{1 + \lambda_2 z^{-1}}},    (43)

which has the following pole:

z = \frac{\lambda_2 - K_1}{-1 + K_1 \lambda_1}.    (44)
If -1 < \lambda_1 = \lambda_2 < 1 and |K_1| < 1, the pole is always within the unit circle, and the filter is stable. This is an allpass filter, and it can be shown that this holds also for higher-order allpass filters. If \lambda_1 \neq \lambda_2, it is easy to find parameter values for which a reflection coefficient |K_1| < 1 yields a pole outside of the unit circle. That is, if the sub-filters D_k(z) are not allpass filters, the positive definiteness of the correlation matrix does not necessarily induce a stable synthesis filter.
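The claim can be checked numerically from the pole formula of Eq. (44); the parameter values below are illustrative, not from the paper.

```python
import numpy as np

def pole(lam1, lam2, K1):
    # Pole location from Eq. (44) for the first synthesis-lattice stage
    return (lam2 - K1) / (-1.0 + K1 * lam1)

# Allpass case lam1 == lam2: |K1| < 1 keeps the pole inside the unit circle
rng = np.random.default_rng(3)
ok = all(abs(pole(l, l, k)) < 1.0
         for l in rng.uniform(-0.99, 0.99, 200)
         for k in rng.uniform(-0.99, 0.99, 200))
print(ok)   # True

# Non-allpass case lam1 != lam2: a reflection coefficient |K1| < 1
# can still put the pole outside the unit circle
print(abs(pole(0.1, 0.9, -0.8)))   # > 1, unstable
```

In the allpass case the pole is a Möbius-type map of (\lambda, K_1) into the unit disc, which is why the random sweep never escapes it.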
4 Examples
In principle, the D_k(z) can be chosen freely. However, it is obviously advantageous to design the set of filters such that their outputs are linearly independent. The classical choice D_k(z) = z^{-k} gives orthogonal uniform sampling in the time domain, and a Fourier filterbank gives orthogonal uniform sampling in the frequency domain. There is a large assortment of different time-frequency decompositions between these two extremes, see, e.g., [32]. However, due to the stability problems associated with non-allpass filters, we confine the discussion here to systems where the D_k(z) are a set of allpass filters.
4.1 Orthogonal polynomial bases

In control theory, there is a long tradition of estimating parameters for generalized autoregressive models which are related to certain classes of orthogonal functions. The contribution of the current paper is to point out that Algorithm 1 makes it possible to implement also the synthesis filters for these systems. Ninness and Gustafsson [17] have proposed a unifying approach where

D_k(z) = \frac{\sqrt{1 - |\lambda_k|^2}}{1 - \lambda_k z^{-1}} \prod_{p=1}^{k} \frac{z^{-1} - \lambda_p}{1 - \lambda_p z^{-1}}.    (45)
If \lambda_k = \lambda_p = 0, \forall k, p, this reduces to a traditional FIR filter. If \lambda_k = \lambda_p, \forall k, p, this is the Laguerre model [18, 19] which has been extensively studied, e.g., in [20, 21]. Kautz models [22, 23] also follow from (45) by a suitable choice of parameters. A simplified version of (45) has been used in various speech and audio applications [10, 33]. Here, a system with

D_k(z) = \prod_{p=1}^{k} \frac{z^{-1} - \lambda}{1 - \lambda z^{-1}}    (46)
is called a frequency-warped filter [34], and the corresponding modified LPC scheme is called warped linear predictive coding [9]. Equation (24) obviously holds, and therefore one can use the autocorrelation method of LP to compute the filter coefficients. In this case, the synthesis filter is guaranteed to be stable. When z = e^{-i\omega}, (46) yields

D_k(e^{-i\omega}) = \prod_{p=1}^{k} e^{-i\left(\omega + 2\arctan\left(\frac{\lambda \sin(\omega)}{1 - \lambda \cos(\omega)}\right)\right)} \equiv \prod_{p=1}^{k} e^{-i\nu(\omega)},    (47)

or simply

D_k(\omega) = e^{-ik\nu(\omega)},    (48)

where \nu(\cdot) is the frequency warping function. For example, if a Fourier transform is applied to the outputs of the filterbank in Fig. 1, then a warped spectral representation is obtained [35]. The spectral mapping is determined by the phase function of D_k(z), which may be controlled in this case by the value of \lambda. This is advantageous in speech and audio signal processing because it directly implements a non-uniform frequency resolution representation for LPC which can be adjusted so that it roughly approximates the frequency resolution of hearing [13].
4.2 Log-warped LPC
It is required that the warping function \nu(\omega) is a monotonic function with a range \nu(\omega) \in [0, \pi] for \omega \in [0, \pi]. A true logarithmic function does not obey this because \lim_{x \to 0} \log(x) = -\infty. Therefore, we choose to approximate the logarithmic warping function by a piecewise function which is linear below a limit frequency \omega_p and logarithmic above it, such that the function and its first derivative are continuous. This function is given by

\nu(\omega) \equiv \tilde{\omega} = \begin{cases} a\omega, & \text{if } 0 \le \omega < \omega_p \\ \pi \log(b\omega)/\log(b\pi), & \text{if } \omega_p \le \omega \le \pi \end{cases}    (49)

with

a = \frac{\pi}{\omega_p (1 + \log(\pi/\omega_p))}, \quad b = \frac{e}{\omega_p},    (50)
where e = exp(1). The inverse of (49) is given by

\nu^{-1}(\tilde{\omega}) = \begin{cases} \tilde{\omega}/a, & \text{if } 0 \le \tilde{\omega} < \nu(\omega_p) \\ \exp(\tilde{\omega} \log(b\pi)/\pi)/b, & \text{if } \nu(\omega_p) \le \tilde{\omega} \le \pi. \end{cases}    (51)
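A numerical sketch of the piecewise warping (49)-(50) and its inverse; since the printed form of (51) is hard to read, the inverse below is obtained algebraically from (49) (an assumption of this illustration, with the exponent divided by \pi).

```python
import numpy as np

def nu(w, wp):
    """Piecewise linear/logarithmic warping of Eqs. (49)-(50)."""
    a = np.pi / (wp * (1.0 + np.log(np.pi / wp)))
    b = np.e / wp
    return np.where(w < wp, a * w, np.pi * np.log(b * w) / np.log(b * np.pi))

def nu_inv(wt, wp):
    """Inverse warping, solved from Eq. (49)."""
    a = np.pi / (wp * (1.0 + np.log(np.pi / wp)))
    b = np.e / wp
    thr = a * wp                     # nu(wp): the two branches meet here
    return np.where(wt < thr, wt / a,
                    np.exp(wt * np.log(b * np.pi) / np.pi) / b)

wp = 0.1 * np.pi
w = np.linspace(1e-6, np.pi, 1000)

print(np.allclose(nu(np.pi, wp), np.pi))       # endpoint maps to pi
print(np.allclose(nu_inv(nu(w, wp), wp), w))   # round trip recovers w
```

The choice of a and b in (50) makes both \nu and its first derivative continuous at \omega_p, which the round trip exercises on both sides of the break point.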
To implement a log-warped LPC system, one has to design a set of allpass filters with phase responses given by n\tilde{\omega}. In this article, this was done using an allpass filter design algorithm introduced in [36]. The transfer function of an nth order allpass filter is

D_n(z) = \prod_{k=1}^{n} \frac{1 - \lambda_k z^{-1}}{z^{-1} - \lambda_k^*} = \frac{z^{-n} P(z^{-1})}{P(z)},    (52)
where it is assumed that P(z) is a polynomial with real-valued coefficients. D_n(z) is a combination of first-order blocks with real coefficients \lambda_k and/or pairs of complex-valued blocks with complex conjugate poles \lambda_k and zeros 1/\lambda_k, respectively. In practice, it is also necessary to implement the filter as a cascade of first- and second-order blocks due to the high density of poles in the filter. The network of Fig. 1 can be directly used for linear prediction by computing the correlation values from the outputs of the D_k(z) and forming normal equations from which the coefficients a_k of a prediction error filter can be solved. The z-transform of the prediction error is given by

E(z) = S(z) \left( 1 - \sum_{k=1}^{P} a_k D_k(z) \right) = S(z) A(z).    (53)
In this case the autocorrelation method of linear prediction is not available due to approximation errors in the low-order allpass sections, and therefore we are forced to use a version of the covariance method of LP. Direct application of the covariance method of linear prediction easily leads to an unstable synthesis filter. There are many alternative ways to stabilize autoregressive models. In this article, an algorithm introduced in [26] was used. The method sets a side constraint to the linear prediction equations which smoothes the produced spectrum model and, thus, usually leads to a stable and efficient solution.

Figure 5: Comparison of conventional (left) and log-warped (right) autoregressive modeling of a power spectrum. The order of the model is 40. The four test signals are musical C6 (or Am7) chords (C, G, C, E, and A notes at different octave ranges) constructed of pure tones. The fundamental frequency of the lowest C note (174, 348, 696, and 1392 Hz) is marked on the left side of each panel. The vertical grid-size is 40 dB.

Figure 5 illustrates the performance of the conventional and the log-warped algorithm for a set of musical C6 chords constructed of pure sinusoids. The duration of each signal was 1000 samples, and a 40th-order smoothness-constrained model was estimated from each signal. The estimated power spectra obtained with the conventional method are shown on the left-hand side of Fig. 5 and the log-warped spectrum estimates for the same signals on the right-hand side of the figure. Figure 5 illustrates the benefits of using logarithmic frequency resolution in the analysis of musical signals. In this case, the musical chords have a similar subjective timbre and they have a similar spectrum shape in the log-warped spectrum model independently of the fundamental frequency.
Figure 6: The impulse response of a log-warped IIR-type filter shown in Fig. 5 (top). Frequency response of the filter on a linear frequency scale (bottom).

Due to the smoothness-constrained algorithm [26], all the parametric models in Fig. 5 are also stable. The impulse response of a synthesis filter corresponding to the second frequency response (C is 348 Hz) in the right panel of Fig. 5 is shown in the top panel of Fig. 6. The corresponding frequency response on a linear frequency scale is in the bottom panel of Fig. 6.
4.3 Arbitrary time-frequency LPC

A large number of traditional IIR and FIR filters were tested as candidates for D_k(z) while writing this paper. However, the author was not able to find any class of bandpass subfilters for which the synthesis filter would be stable for a large number of different signals. In LPC algorithms based on analysis-by-synthesis techniques, e.g., code-excited linear prediction [37], the instability of the synthesis filter is not necessarily a problem because the search for an excitation vector is based on the output of the synthesis filter. However, the instability of the filter would restrict the space of suitable excitation vectors and therefore make the codec sub-optimal.
5 Conclusions
Conventional linear predictive coding techniques are based on the computation of predictor coefficients from a sequence of adjacent signal samples. In this paper we studied techniques where the filter coefficients are computed from a generalized linear representation of the signal history, that is, from the outputs of a filterbank. These filter coefficients are associated with modified prediction error and synthesis filters based on the same filterbank structure. In many cases the synthesis filter has delay-free loops, for which an implementation technique is introduced in the article. Therefore, it is possible to design linear predictive coding algorithms based on this type of modified prediction scheme. The main advantage of this generalized scheme is the possibility to incorporate a priori information about the signal into the coding algorithm and to selectively enhance the spectral modeling process in some specific frequency regions of the input signal. This was exemplified in this article by designing a complete LPC system where the spectral modeling is performed on a near-logarithmic frequency scale. The main drawback of this scheme is that the stability of the synthesis filter cannot always be guaranteed.
Acknowledgments

This work has been supported by the Academy of Finland and the GETA graduate school. The author is grateful to Vesa Välimäki and Timo Laakso, who provided software for allpass filter design. The author also wishes to thank Unto K. Laine in particular, and many others, for fruitful discussions and critical reading of early versions of the manuscript.
References

[1] B. S. Atal and M. R. Schroeder, “Predictive coding of speech signals,” in Proc. 1967 IEEE Conf. on Communication and Processing, 1967, pp. 360–361.

[2] T. Kailath, “A view of three decades of linear filtering theory,” IEEE Trans. Information Theory, vol. 20, no. 2, pp. 146–181, 1974. Reprinted in [38].

[3] J. D. Markel and A. H. Gray, Linear Prediction of Speech, Communication and Cybernetics 12, Springer-Verlag, 1976.

[4] N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, New Jersey, 1984.

[5] W. B. Kleijn and K. K. Paliwal, Eds., Speech Synthesis and Coding, Elsevier Science Publ., 1995.

[6] H. Wold, A Study in the Analysis of Stationary Time Series, Almquist and Wiksell, Stockholm, Sweden, 2nd edition, 1954 (first edition 1938).

[7] A. N. Kolmogorov, “Stationary sequences in Hilbert space,” Bull. Math. Univ. Moscow, vol. 2, no. 6, 1941. English translation available in [38].

[8] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Technology Press and John Wiley & Sons, Inc., New York, 1949.

[9] H. W. Strube, “Linear prediction on a warped frequency scale,” J. Acoust. Soc. Am., vol. 68, no. 4, pp. 1071–1076, October 1980.
[10] U. K. Laine, M. Karjalainen, and T. Altosaar, “WLP in speech and audio processing,” in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing, Adelaide, 1994, vol. III, pp. 349–352.

[11] A. Härmä and U. K. Laine, “A comparison between frequency-warped and conventional linear predictive coding,” IEEE Trans. Speech and Audio Processing, July 2001.

[12] A. Härmä, “Implementation of frequency-warped recursive filters,” Signal Processing, vol. 80, no. 3, pp. 543–548, February 2000.

[13] J. O. Smith and J. S. Abel, “Bark and ERB bilinear transforms,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 6, pp. 697–708, November 1999.

[14] K. Tokuda, T. Kobayashi, S. Imai, and T. Chiba, “Spectral estimation of speech by mel-generalized cepstral analysis,” Electr. and Comm. in Japan, Part 3, vol. 76, no. 2, pp. 30–43, 1993.

[15] J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, pp. 561–580, April 1975. Reprinted in [39].

[16] H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech,” J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738–1752, April 1990.

[17] B. Ninness and F. Gustafsson, “A unifying construction of orthogonal bases for system identification,” IEEE Trans. Automatic Control, vol. 42, no. 4, pp. 515–521, April 1997.

[18] Y. W. Lee, Statistical Theory of Communication, Wiley, New York, 1960.

[19] R. E. King and P. N. Paraskevopoulos, “Digital Laguerre filters,” J. Circuit Theory Applicat., vol. 5, pp. 81–91, 1977.
[20] B. Wahlberg, “System identification using Laguerre models,” IEEE Trans. Automatic Control, vol. 36, no. 5, pp. 551–562, May 1991.

[21] T. Oliveira e Silva, “Optimality conditions for truncated Laguerre networks,” IEEE Trans. Signal Processing, vol. 42, no. 9, pp. 2528–2530, 1995.

[22] W. H. Kautz, “Transient synthesis in the time domain,” IRE Trans. Circuit Theory, vol. CT-1, no. 3, pp. 29–39, 1954.

[23] B. Wahlberg, “System identification using Kautz models,” IEEE Trans. Automatic Control, vol. 39, no. 6, pp. 1276–1282, June 1994.

[24] J. Durbin, “The fitting of time-series models,” Rev. Inst. Int. Statist., vol. 28, no. 3, pp. 233–243, 1960.

[25] A. Härmä, “Implementation of recursive filters having delay free loops,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Seattle, Washington, May 1998, vol. III, pp. 1261–1264.

[26] G. Kitagawa and W. Gersch, Smoothness Priors Analysis of Time Series, Springer, New York, 1996.

[27] N. Levinson, “The Wiener RMS (root mean square) error criterion in filter design and prediction,” J. Math. Phys., vol. 25, no. 4, pp. 261–278, 1947. Also an appendix in [8].

[28] U. K. Laine, “Famlet, to be or not to be a wavelet,” in Proc. Int. Symp. Time-Frequency and Time-Scale Analysis, Victoria, Canada, October 1992, IEEE, pp. 335–338.
[29] F. Itakura and S. Saito, “A statistical method for estimation of speech spectral density and formant frequencies,” Electr. and Commun. in Japan, vol. 52-A, pp. 36–43, 1970.

[30] S. Haykin, Modern Filters, Macmillan, 1989.

[31] J. G. Proakis and D. G. Manolakis, Digital Signal Processing, 2nd edition, Macmillan, 1992.

[32] A. N. Akansu and R. A. Haddad, Multiresolution Signal Decomposition, Telecommunications, Academic Press, 1992.

[33] M. Karjalainen, A. Härmä, J. Huopaniemi, and U. K. Laine, “Warped filters and their audio applications,” in IEEE Workshop Appl. Signal Proc. Acoust. and Audio, New Paltz, New York, October 1997.

[34] W. Schüssler, “Variable digital filters,” Arch. Elek. Übertragung, vol. 24, pp. 524–525, 1970.

[35] A. V. Oppenheim, D. H. Johnson, and K. Steiglitz, “Computation of spectra with unequal resolution using the Fast Fourier Transform,” Proc. IEEE, vol. 59, pp. 299–301, February 1971.

[36] T. I. Laakso, T. Q. Nguyen, and R. D. Koilpillai, “Designing allpass filters using the eigenfilter method,” in Proc. Int. Conf. Acoust., Speech, and Signal Processing ’93, Minnesota, USA, April 1993, IEEE, vol. III, pp. 77–80.

[37] M. R. Schroeder and B. S. Atal, “Code-excited linear prediction (CELP): High quality speech at very low bit rates,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, March 1985, pp. 937–940.
[38] T. Kailath, Ed., Linear Least-Squares Estimation, vol. 17 of Benchmark Papers in Electrical Engineering and Computer Science, Dowden, Hutchinson & Ross, Pennsylvania, 1977.

[39] R. W. Schafer and J. D. Markel, Eds., Speech Analysis, Selected Reprints Series, IEEE Press, New York, 1979.