THEORY AND REAL TIME IMPLEMENTATION OF A CELP CODER AT 4.8 AND 6.0 KBITS/SECOND USING TERNARY CODE EXCITATION

C.S. XYDEAS, M.A. IRETON and D.K. BAGHBADRANI

SUMMARY

A geometrical description of the excitation codebooks required in CELP systems is used to demonstrate the equivalence of Gaussian, sparse and ternary excitation codebooks. An efficient algorithm for searching a ternary excitation codebook is presented, and the real time implementation, on a single DSP device, of a 4.8 and 6.0 kbits/sec CELP speech coder is described.

1. Introduction

During the past few years many new speech coding algorithms have been developed which utilize the principle of analysis by synthesis. These algorithms operate on successive short-term speech frames and calculate, at the transmitter, a set of quantized parameters which best reconstruct the input speech signal subject to some error criterion. This generally involves locally decoding the transmitted parameters and comparing the reconstructed signal with the desired signal. There are three main categories of such LPC systems, Multi-pulse [1], Backward Excitation Recovery (BER) [2] and Code Excited Linear Prediction (CELP) [3], which differ principally in the way that the excitation signal, used to drive the vocal tract filter, is derived. All of these systems aim to produce good quality speech at low bit rates, typically less than 8000 bits/sec. Furthermore, it is important that they can be implemented at a realistic complexity using current DSP technology.

CELP offers a potentially powerful solution to the low bit rate speech coding problem, although originally its algorithmic complexity appeared prohibitive for real-time implementations. In recent years, however, the CELP complexity issue has received considerable attention, and a number of simplifications to the original coding algorithm have been proposed [4,9]. This paper deals with both the theory of CELP and its real time implementation.
A geometrical interpretation of the CELP excitation codebook is utilized to show that sparse excitation vector codebooks and ternary codebooks are equivalent, in terms of coding performance, to the initially proposed Gaussian random codebooks. This interpretation of the codebook structure not only predicts many of the observations made about CELP coders but also leads to a design technique for very large virtual excitation codebooks, which require no storage and on which efficient non-exhaustive search algorithms can be performed. This will, however, be the subject of a later paper.

The main focus of this paper is the development of a real-time 6 kbits/sec speech coder implementation. This has been successfully achieved using the AT&T DSP32 digital signal processor. The use of ternary codes has meant that all of the information relating to the excitation codebook can be stored in 1k words of memory (one 32-bit word per excitation vector). This approach permits a very fast codebook search and results in no loss of objective or subjective performance. In order to realize this real-time CELP codec, novel techniques for implementing the codebook search algorithm in a DSP environment are required. A method for calculating the codeword error criterion is presented for ternary codes which is as quick as the table look-up procedures used in other implementations [5], without the added memory overhead. The ideas presented extend quite naturally to the case of a 4.8 kbits/sec speech coder implementation. Such an implementation is described, and the results obtained from informal subjective listening tests are presented which compare the 6.0 and 4.8 kbits/sec codecs with log PCM operating at a variety of bit rates.

2. System Description and Theory


The principle of the Code Excited Linear Predictive coder was first described by Atal [3] and its structure is given in Figure 1. It consists basically of three components: an LPC based vocal tract model, an excitation codebook and an error criterion which serves to select the codebook entry which best reconstructs the input speech signal.

The vocal tract model utilizes both long term and short term linear prediction filters in an attempt to remove efficiently the maximum amount of correlation from the signal. For voiced speech the long term predictor can be considered as modeling the pitch periodicity within the speech signal. This is, however, somewhat misleading, since multiples of the pitch period may be utilized by the long term prediction filter.

The excitation codebook generally consists of a collection of random vectors, each of which is constructed using samples from a set of Independent Identically Distributed Gaussian random variables (IID Gaussian rv's) having zero mean and unit variance. The motivation for this construction follows from the observation that the residual signal, obtained by inverse filtering the input speech signal through both the long and short term predictors, consists of IID Gaussian rv's with zero mean. The variance of these sequences is effectively normalized by the inclusion of the gain term (see Figure 1).
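As a minimal sketch of this construction (using NumPy; the function name and sizes are illustrative, not from the paper), a Gaussian random codebook can be built as follows. The per-codeword gain term makes overall scale irrelevant, so the vectors may equivalently be normalized to unit energy:

```python
import numpy as np

def gaussian_codebook(L=1024, n=40, seed=0):
    """Build an excitation codebook of L random vectors of length n,
    each drawn from zero-mean, unit-variance IID Gaussian samples."""
    rng = np.random.default_rng(seed)
    book = rng.standard_normal((L, n))
    # Normalize each codeword to unit energy; the coder's gain term
    # makes this scaling choice immaterial.
    book /= np.linalg.norm(book, axis=1, keepdims=True)
    return book

codebook = gaussian_codebook()
print(codebook.shape)  # (1024, 40)
```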

Figure 1: Analysis by synthesis in vector excited vocoders.

The error criterion serves to compare the locally decoded speech signal with that of the original and enables the system to select the codebook entry which minimizes this error. The measure used is the mean squared error (mse) between the two signals. For perceptual reasons the error signal is weighted in such a way that the coding noise in the recovered signal is concentrated in the formant regions of the speech signal and can therefore be effectively masked by the powerful frequency components present in these regions.

The mode of operation of the CELP coder can be briefly described as follows. First, the vocal tract model parameters are derived from a short segment of the input speech. Each of the excitation vectors stored in the excitation codebook is multiplied by its optimal gain factor and then used to excite the vocal tract model. The excitation vector which generates the synthetic output that most closely resembles the input speech segment is selected and its index is used to encode the excitation signal.

Codebook search error calculation

In matrix notation the excitation codebook search process can be stated as: find the codebook entry ck which minimizes

    Ek = ║x − γHck║²,   k = 1, ..., L                          (1)

where H is the impulse response matrix of the combined long term and perceptually weighted short term linear predictor filters, x is the perceptually weighted input speech signal after the filter memory from previous frames has been removed, γ is the excitation gain, L is the number of entries in the excitation codebook and ║•║ denotes the operation of taking the norm of a vector. Minimizing (1) with respect to γ leads to

    γ = (xTHck) / ║Hck║²

and

    Ek = ║x║² − (xTHck)² / ║Hck║²                              (2)
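The optimal-gain minimization above can be sketched as follows (a NumPy illustration, not the paper's DSP implementation; the filtering is collapsed into an explicit matrix H and all names are assumptions for clarity):

```python
import numpy as np

def search_codebook(x, H, codebook):
    """Exhaustive CELP codebook search: for each codeword c_k, use the
    optimal gain g = (x^T H c_k) / ||H c_k||^2 and select the k that
    minimizes E_k = ||x||^2 - (x^T H c_k)^2 / ||H c_k||^2 (equation 2)."""
    best_k, best_gain, best_err = -1, 0.0, np.inf
    for k, c in enumerate(codebook):
        y = H @ c                  # filtered codeword H c_k
        corr = x @ y               # x^T H c_k
        energy = y @ y             # ||H c_k||^2
        err = x @ x - corr * corr / energy
        if err < best_err:
            best_k, best_gain, best_err = k, corr / energy, err
    return best_k, best_gain, best_err
```

The selected index and quantized gain are what the coder would transmit for the frame.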


where T denotes matrix transposition. Minimization of (2) is equivalent to maximization of the expression

    (xTHck)² / (ckTHTHck)                                      (3)

which is still, however, a very complex operation to perform. This complexity can be reduced by retaining the numerator

    (xTHck)²                                                   (4)

and approximating the denominator by [9]

    ckTHTHck ≈ µ0ν0k + 2 Σ_{i=1}^{n−1} µiνik                   (5)

where n is the length, in samples, of the excitation vectors, µi is the autocorrelation function of the combined filter impulse response and νik is the autocorrelation function of the codeword ck.

2.1 Geometrical description of the excitation codebook

By considering a geometrical interpretation of the excitation codebook, some interesting and useful insights into the nature of the distribution of the excitation codewords can be formed. Because of the existence of the gain term in the analysis by synthesis loop, all of the Gaussian random codewords can be considered to have unit length (or energy) with no loss of generality. This leads to the notion that each individual excitation codeword represents a point on the surface of a unit sphere in n-dimensional space. Furthermore, due to the spherical symmetry of the multi-dimensional Gaussian distribution, these points must be uniformly distributed over the surface of the sphere. (This interpretation relies on the use of the Euclidean norm to generate the underlying metric space.) Now, as the codebook is required to model "residual vectors", which are themselves IID Gaussian rv's and therefore uniformly distributed over the surface of the same sphere, the Gaussian random codebook can be considered, in this sense, to be a good estimate of an optimal/residual excitation codebook.

A significant simplification of the CELP system results from using sparse excitation codebooks. Sparse excitation codebooks consist of Gaussian random excitation vectors in which most of the excitation sample values have been set to zero. The number of non-zero excitation samples in the codebook entries will be referred to as the weight w of the vectors. Typical values of w are generally chosen to be in the range 3 to 6 for excitation vectors of length n = 40. An important point to note is that every codebook entry has the same weight. The generation of an individual sparse codeword can be considered as a two stage process. In the first stage the positions of the non-zero excitation amplitudes are chosen at random, and in the second stage random amplitudes are assigned to these positions.
Thus the first stage of the sparse codeword generation defines a subset of the n-dimensional sphere, from which the codebook entry is chosen at random. It is useful at this point to define:

i) an s-code to be the binary code of length n and weight w which has ones corresponding to the positions of the non-zero excitation pulses of a given codeword and zeros elsewhere;

ii) an s-subset to be the set of all excitation vectors which can be described by the same s-code.

Since the s-codes of the excitation codebook entries are random binary vectors, and the s-subsets generated by the s-codes are uniformly distributed over the surface of the unit sphere, this technique for choosing

sparse codebook entries is as suitable as choosing full random vectors. Of course, this is only valid so long as the number of available s-subsets is much greater than the number of codebook entries required.

A ternary excitation vector is derived from a sparse excitation vector by considering only the signs of the non-zero excitation amplitudes. That is, each non-zero amplitude in the given excitation vector retains its sign and has its magnitude set to 1. (Alternatively, the energy of the excitation vectors can be thought of as being normalized to one, again with no loss of generality due to the existence of the coder gain term.) Each ternary excitation vector can thus be considered to be a member of the set of all ternary codes of length n and weight w. Again, as in the case of sparse excitation codebooks, we define a t-subset to be the set of all ternary excitation vectors which can be described by the same s-code. Since the s-codes and the signs of the non-zero excitation pulses are selected randomly, it follows that the ternary excitation vectors are uniformly distributed over the surface of the n-dimensional sphere. As a consequence, a ternary excitation codebook is "equivalent" to both the sparse excitation and Gaussian random codebooks.

All of the discussion so far has been concerned with using these geometrical ideas to demonstrate the equivalence of ternary, sparse and Gaussian excitation codebooks. The concepts are, however, much more powerful than this and can be used to predict observations made by other researchers in the field of CELP coders. One such example, given later, is a theoretical explanation of the curve relating the observed SNR to the weight w of the excitation vectors [4]. This analysis can be performed for the case of ternary codes using the Euclidean metric space [6], which has been implicit in the previous discussion.
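The two-stage sparse codeword generation and its ternary derivative can be sketched as follows (a NumPy illustration; function names and parameter defaults are assumptions, not from the paper):

```python
import numpy as np

def sparse_codeword(n=40, w=4, rng=None):
    """Two-stage sparse codeword generation: pick w pulse positions at
    random (this defines the s-code), then assign IID Gaussian
    amplitudes to those positions."""
    rng = rng or np.random.default_rng()
    c = np.zeros(n)
    positions = rng.choice(n, size=w, replace=False)  # stage 1: s-code
    c[positions] = rng.standard_normal(w)             # stage 2: amplitudes
    return c

def ternary_codeword(sparse):
    """Derive the ternary codeword: keep only the signs of the non-zero
    pulses, setting every magnitude to 1."""
    return np.sign(sparse)
```

Both vectors share the same s-code, i.e. they belong to the same s-subset.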
A more general form of this analysis, which covers both ternary and sparse Gaussian excitation vectors, can be obtained if one uses an alternative metric to define the underlying metric space. Such a metric can be defined by using the Hamming distance on the s-codes corresponding to the given excitation vectors. The metric space obtained by using this metric will be referred to as the Hamming metric space. The use of this space may seem a little strange at first, but can be justified as follows. All of the points belonging to a given s-subset or t-subset in the Euclidean metric space correspond to a single distinct point in the Hamming metric space, and thus selecting distinct points in the Hamming metric space for codebook entries is equivalent to the restriction that no more than one codebook entry can be selected from any given s- or t-subset. The use of the Hamming distance metric on ternary excitation vectors can be further justified by consideration of the following theorem.

Theorem I. For any two distinct ternary excitation vectors c1, c2, de(c1, c2) ≥ dh(c1, c2), where de is the Euclidean distance and dh is the Hamming distance between the two points.

Proof. Let s1 and s2 be the s-codes of the two ternary excitation codes c1 and c2. Every position at which s1 and s2 differ contributes at least 1 to the squared Euclidean distance, so for any two such codes

    de²(c1, c2) ≥ dh(s1, s2)

For any two distinct binary codes b1 and b2,

    dh(b1, b2) ≥ 1

and since s-codes are binary codes, this leads to the stated conclusion for ternary codes.


Theorem I states that any minimum codeword separation criterion used with the Hamming distance to design excitation codebooks implies a corresponding criterion defined using the Euclidean distance between points, where the measured Euclidean distance between any two points will be at least as great as the Hamming distance. In other words, the Hamming distance is a "stronger" measure than the Euclidean distance. It must be remembered, however, that this is only true for ternary excitation vectors.

The imposition of these ideas of a calculable distance between any pair of codebook entries allows a codebook design criterion to be specified. The most obvious such criterion is the definition of a minimum distance between pairs of adjacent codewords. Both the Hamming and Euclidean distance metrics can be used for this purpose.

Let us suppose, for a moment, that the set of codewords from which the codebook entries are chosen is finite. If this is the case then there is a non-zero probability that, if the codewords are selected independently, the same codeword may be selected more than once. For ternary codes it is always the case that there are only a finite number of possible distinct codewords. If sparse excitation vectors are used and the decision as to whether two vectors are distinguishable or not is based on the Hamming distance measure, then again there are only a finite number of distinct codewords available. This is because each s-subset maps to a single discrete point if the Hamming distance is used, and there are only a finite number of s-subsets available. Now if a given codebook contains a number of non-distinct codewords as entries, then all but one of these must be redundant. As a consequence, the codebook obtained by deleting all of the redundant entries will be equivalent to the original codebook and will in general have fewer entries. The number of entries in this smaller codebook will be referred to as the effective codebook size.
This concept is useful since it leads to a probabilistic explanation of the curve relating SNR to the weight w of a sparse codebook, as first observed by Gersho [4]. Specifically, the process of selecting random excitation codewords for inclusion within the excitation codebook can be analyzed as a binomial process. For the Hamming metric space each s-code determines a distinct point on the surface of the n-dimensional sphere, and there are a total of

    N = (n choose w) = n! / (w!(n−w)!)

of these for a given excitation vector length n and weight w. Each of the vectors is equally likely and will be selected at random with probability p = 1/N. Let L be the number of excitation codewords selected randomly for the codebook and Nα be the number of codewords chosen α times after the L selections. Then the effective codebook size SE is given by

    SE = N − N0                                                (6)

since Σ_{α≥0} Nα = N, but

    N0 = N · P(0)                                              (7)

where P(0) is the probability that an excitation codeword is not chosen after L independent selections from all of the available codewords. Thus SE = N(1 − P(0)). Using the Binomial theorem, the probability P(0) after L selections can be specified as

    P(0) = (1 − 1/N)^L                                         (8)

Combining equations (7) and (8) leads to

    SE = N (1 − (1 − 1/N)^L)                                   (9)

For excitation codebooks containing L = 1024 entries of length n = 40, Figure 2 can be obtained from equation (9) as the weight w of the sparse vectors is varied.
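Equation (9) is easy to evaluate directly; the following sketch (illustrative names) reproduces the behaviour behind Figure 2 for the paper's parameters:

```python
from math import comb

def effective_codebook_size(n, w, L):
    """Effective codebook size S_E = N(1 - (1 - 1/N)^L), equation (9),
    where N = C(n, w) is the number of distinct s-codes of length n
    and weight w."""
    N = comb(n, w)
    return N * (1.0 - (1.0 - 1.0 / N) ** L)

# L = 1024 codewords of length n = 40, as in Figure 2:
for w in range(1, 7):
    print(w, comb(40, w), round(effective_codebook_size(40, w, 1024), 1))
```

For w = 2 there are only C(40, 2) = 780 available s-codes, fewer than the 1024 entries required, while for w ≥ 5 the effective size stays close to 1024.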

Figure 2: Comparison of the curves (a) relating the SNR of the reconstructed speech to the excitation vector weight, and (b) relating the effective codebook size to the excitation vector weight.

This figure clearly demonstrates that we should expect random codebooks composed of codewords with w ≥ 5 all to be equivalent. However, for w < 4 the limited number of points in the space does not satisfy the condition stated earlier, that for random sparse codebooks the number of available codewords must be much greater than the required codebook size. For w = 2, for instance, with the conditions stated above, the number of available codewords is less than the required codebook size. This explains the drastic decline in performance observed for w = 2. These results also suggest that some of these problems can be avoided by designing codebooks using a set minimum distance criterion. This immediately distributes the excitation vectors in an efficient way within the excitation space and eliminates the possibility of a codebook containing multiple replicas of the same codeword.

2.2 Practical implementation of the codeword error calculation

Much of the preceding section was concerned with a geometrical representation of the excitation codebook which justified the use of ternary excitation vectors. Furthermore, the power of this representation to examine and explain excitation codebook structures, such as the observed relationship between the SNR of the encoded speech and the weight of the excitation vectors, was demonstrated. None of this, however, suggests how to use the properties of the excitation codevectors efficiently in a practical, real-time CELP implementation.
Since the greatest contribution to overall system complexity in a CELP coder is made by the excitation codebook search algorithm, our attention will be focused in this section on techniques which allow the efficient formulation of the codebook search error measure formed between the input and reconstructed speech segments. The error calculation that must be performed for every excitation vector is given, by combining equations (4) and (5), as:

    E′k = (xTHck)² / (µ0ν0k + 2 Σ_{i=1}^{n−1} µiνik)           (10)

where E′ represents the approximate error measure. By noticing that xTH is constant within each excitation analysis frame (i.e. it does not vary for different excitation codebook entries), and that there are typically only 4 non-zero entries in every codeword ck, it is evident that equation (10) is ripe for further simplification. It is easy to show that

for a sparse codebook with w = 4, say, most of the νik values must be zero. In fact, for w = 4, at most 6 of these values can be non-zero for any given codeword. A simple approach for calculating equation (10) is therefore to store the non-zero values of νik along with their positions i, together with the positions and amplitudes of the excitation vectors, in memory. This will be referred to as the "table look-up approach". The vector xTH and the autocorrelation function µi of the combined filter impulse response are pre-calculated and stored prior to the execution of the codebook search procedure. This imposes a small, constant overhead on the system complexity and will be ignored in the complexity figures given.

Table 1 gives a breakdown of how the estimated complexity of the "table look-up approach" is distributed between the various parts of the codeword error calculation, in terms of operations per codeword. This must be multiplied by the size of the codebook in order to obtain an estimate of the number of operations required for each excitation frame. Tables 1, 3 and 5 ignore any overheads introduced by latency or looping effects, since these are fairly constant for the three algorithms described but vary significantly between competing DSP devices. As a further note of explanation, it should be pointed out that the error measure denominator that is calculated is always half of that indicated in equation (10); this does not affect the search process, and the need for a division operation can be avoided. It is also assumed in the complexity calculation that there are always 6 non-zero values of νik, in order to avoid the loop overheads that would have to be included if this number were allowed to vary.

Table 1

  Action being performed                   Operations/codeword
  Fetch excitation pointers from memory    4
  Calculate (xTH)ck                        4
  Calculate Σµiνik:
    Fetch correlation lag i                6×1
    Fetch correlation νik                  6×1
  Add ½µ0ν0k                               1
  Total                                    21
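The table look-up evaluation can be sketched as follows (a NumPy illustration; the dictionary layout of a codebook entry is an assumption made for clarity, not the paper's memory format). It returns the numerator and the halved denominator of equation (10), which are compared across codewords without a division:

```python
def table_lookup_error(xtH, mu, entry):
    """Evaluate the search criterion of equation (10) for one codeword
    using pre-stored data: pulse positions/amplitudes plus the non-zero
    codeword autocorrelation lags and values ('table look-up approach').
    'entry' holds keys pos, amp, nu0, nu_lags, nu_vals (illustrative)."""
    # Numerator term x^T H c_k, using only the w non-zero pulses.
    corr = sum(xtH[p] * a for p, a in zip(entry["pos"], entry["amp"]))
    # Halved denominator, as in the paper: 0.5*mu0*nu0k + sum(mu_i*nu_ik).
    denom = 0.5 * mu[0] * entry["nu0"] + sum(
        mu[i] * v for i, v in zip(entry["nu_lags"], entry["nu_vals"]))
    return corr * corr, denom  # maximize corr^2 / denom over the codebook
```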

This "table look-up approach" also requires a substantial amount of memory in which to store both the excitation vectors and their associated autocorrelation functions: a total of 13½ words per codebook entry (each word consists of 32 bits). A breakdown of this requirement is given in Table 2. It is assumed that positions are in the range 1, ..., 40 and can therefore be stored in one byte of memory (8 bits). The amplitudes, however, are floating point values and require a 32-bit word each.

Table 2

  Item to be stored         Number of 32-bit words
  Excitation positions      4×¼
  Excitation amplitudes     4
  ν0k and νik               1+6
  νik positions             6×¼
  Total                     13½

The memory storage requirement can be drastically reduced by calculating the νik values on-line. This is, however, at the expense of considerable additional computational complexity. The summation part of equation (5) can be re-written as:

    Σ_{i=1}^{n−1} µiνik = Σ_{i=1}^{n−1} µi Σ_j ckj ck,j−i      (11)

where ckj is the j'th sample of the k'th excitation codeword. Using the change of variable l = j−i, and noting that the summation need only be performed for those terms in which both ckj and ckl are non-zero, equation (11) becomes:

    Σ_{i=1}^{n−1} µiνik = Σ_{j=1}^{w−1} Σ_{l=j+1}^{w} µ(pj−pl) · ckpj · ckpl   (12)

where pi, i = 1, 2, ..., w, are the positions in ck of the non-zero excitation pulses. Tables 3 and 4 give the respective complexity and memory requirements of this approach.

Table 3

  Action being performed                              Operations/codeword
  Fetch excitation positions from memory              4
  Calculate (xTH)ck                                   4
  Calculate ν0k                                       4
  Calculate Σµiνik:
    Subtraction pj−pl                                 6×1
    Multiplication ckpj·ckpl                          6×1
    Multiplication µ(pj−pl)·ckpj·ckpl and sum         6×1
  Add ½µ0ν0k                                          1
  Total                                               31

Table 4

  Item to be stored         Number of 32-bit words
  Excitation positions      4×¼
  Excitation amplitudes     4
  Total                     5
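The on-line evaluation of equation (12) can be sketched as follows (illustrative names; this computes the halved denominator of equation (10) directly from the w pulse positions and amplitudes, with no stored autocorrelation table):

```python
def online_denominator(mu, positions, amps):
    """Halved denominator of equation (10), computed on-line via
    equation (12): 0.5*mu0*nu0k plus, for every unordered pair of
    pulses, mu(|pj - pl|) times the product of the two amplitudes."""
    nu0 = sum(a * a for a in amps)         # nu_0k: codeword energy
    acc = 0.5 * mu[0] * nu0
    w = len(positions)
    for j in range(w):                     # all unordered pulse pairs
        for l in range(j + 1, w):
            acc += mu[abs(positions[j] - positions[l])] * amps[j] * amps[l]
    return acc
```

For w = 4 the double loop runs over exactly the 6 pair terms counted in Table 3.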

When ternary excitation codes are used it is possible not only to reduce the memory requirement still further, but also to reduce the complexity to the same level as that of the "table look-up approach". Since every ternary code has w excitation pulses, each with unit magnitude, ν0k is constant for every codeword, having the value ν0k = w, and need not therefore be calculated. A ternary excitation code is completely defined by the position and sign of each of its non-zero excitation pulses. This information is most conveniently stored in a byte format, with the least significant bits of the byte containing an excitation pulse position and the most significant bit determining its sign. Thus it is possible to store an entire ternary excitation vector of weight 4 in four bytes of memory. This corresponds to just one 32-bit word and represents an 80% improvement on the storage required for a Gaussian sparse codebook, and an improvement of more than 90% when compared with the "table look-up approach".

If the positions of the excitation pulses are stored in decreasing order (for the previous techniques the order is of no consequence) then the value of pj−pl calculated using equation (12) will always be positive, since j is always less than l. Furthermore, the binary subtraction performs an exclusive-or operation on the pulse


amplitude sign bits stored alongside the pulse position data, and since pj−pl is always positive this operation is never interfered with by carry bits propagated from less significant bits. Thus the sign bit of the result of the subtraction Pj−Pl, where Pj and Pl are pj and pl with the respective sign bits added, represents the sign of the multiplication ckpj·ckpl. Since the amplitude of this result must be one, the subtraction Pj−Pl calculates both the correlation lag and the value of ckpj·ckpl, and hence only one operation is required to calculate two results. The resulting breakdown of the number of operations required for the calculation of the excitation codebook search error criterion is given in Table 5, with the corresponding breakdown of the memory requirement for storing the excitation codebook given in Table 6.

Table 5

  Action being performed                   Operations/codeword
  Fetch excitation positions from memory   4
  Calculate (xTH)ck                        4
  Calculate Σµiνik:
    Subtraction Pj−Pl                      6×1
    Addition ±µ(pj−pl)                     6×1
  Add ½µ0ν0k                               1
  Total                                    21

Table 6

  Item to be stored                 Number of 32-bit words
  Excitation positions and signs    4×¼
  Total                             1
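The packed-byte trick above can be sketched as follows (a Python illustration of the bit manipulation, not the DSP32 code; function names are assumptions). One 8-bit subtraction yields both the correlation lag and the sign of the pulse product:

```python
def pack_pulse(position, sign):
    """Pack one ternary pulse into a byte: bit 7 holds the sign
    (1 for a -1 pulse), bits 0-6 hold the position (1..127 fits)."""
    return (0x80 if sign < 0 else 0) | position

def lag_and_sign(Pj, Pl):
    """One byte-wide subtraction, for positions stored in decreasing
    order (so pj > pl): the low 7 bits give the lag pj - pl, and bit 7
    gives the sign of ckpj*ckpl, because the positive low-bit difference
    never propagates a carry into the sign bits (their subtraction then
    acts as an exclusive-or)."""
    d = (Pj - Pl) & 0xFF          # emulate 8-bit wraparound arithmetic
    return d & 0x7F, (-1 if d & 0x80 else 1)
```

Checking all four sign combinations for a pulse pair confirms that the single subtraction recovers both the lag and the product sign.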

Thus we have argued in this section that sparse excitation vectors can be replaced by ternary excitation vectors with no loss of speech quality. Furthermore, this change leads to an efficient codebook search error criterion calculation algorithm, a more than 13-fold reduction in the codebook memory storage requirement at the transmitter, and a 5-fold reduction at the receiver, compared with other comparable techniques. The next section describes the real-time implementation of both the 6.0 and 4.8 kbits/sec speech coders, using the above approach, on an AT&T DSP32.

3. Real-time Implementation of 4.8 and 6.0 kbits/sec CELP Coders

The main aim of our work has been to produce a practical real-time implementation of a 6 kbits/sec CELP coder capable of producing commercially acceptable speech quality, with the design constraint that the solution should require only a single DSP and the minimum of extra external memory. The AT&T DSP32C was chosen as the implementation device because of its speed and floating point architecture. The ready availability of tried and tested development hardware and software for the non-CMOS devices was a further important criterion. The software for the real-time speech coder has been produced on a development kit containing a 250ns DSP32 and operates at approximately one third of real time. Real-time performance will, however, be achieved with the 80ns CMOS device. An important consequence of the floating point architecture was that numerical results identical to those of a Fortran simulation were produced.


This proved to be an invaluable debugging aid, though the advent of efficient C compilers for these devices is rendering the distinction between simulation and implementation obsolete.

The main characteristics of the 6.0 kbits/sec coder can be briefly summarized as follows. The input speech signal is band limited to 3.4kHz and sampled at 8 ksamples/sec. The LPC analysis is performed using the Burg algorithm, with the coefficients produced being quantized with 52 bits using log area ratios. The LPC analysis frame length is 240 samples (30ms). The long term predictor is updated (outside the analysis by synthesis loop) every 120 samples (15ms) and has an allowable delay range of between 16 and 143 samples. The pitch delay is calculated using a simple maximum covariance technique, and the 3 filter tap coefficients are estimated using an autocorrelation approach. The central LTP filter tap is restricted to lie in the range [0,1], with the other two taps confined to the range [-0.8, 0.8]. Each tap is quantized using a 4-bit uniform quantizer. The excitation signal is updated every 40 samples (5ms) and is modeled using a 10-bit ternary excitation codebook containing excitation vectors of weight 4. For the search process a perceptual weighting factor of 0.9 is used. The excitation gain is quantized using a logarithmic quantizer with 5 bits. These bit rate allocations are summarized, along with those for a 4.8 kbits/sec coder, in Table 7.

Table 7

  6.0 kbs-1 coder
  Parameter          Frame length (samples)   Bits/frame   Bits/sec
  STP coefficients   240                      52           1733⅓
  LTP coefficients   120                      12           800
  LTP delay          120                      7            466⅔
  Codeword index     40                       10           2000
  Codeword gain      40                       5            1000
  Total                                                    6000

  4.8 kbs-1 coder
  Parameter          Frame length (samples)   Bits/frame   Bits/sec
  STP coefficients   256                      54           1687½
  LTP coefficients   128                      12           750
  LTP delay          128                      7            437½
  Codeword index     64                       10           1250
  Codeword gain      64                       5            625
  Total                                                    4750

The implementation technique used for the 4.8 kbits/sec speech coder is analogous to that used for the 6.0 kbits/sec coder. We have found that it is subjectively preferable to save the bits necessary to achieve a coder operating at 4.8 kbits/sec by extending the analysis frame lengths, with the main saving being obtained by increasing the excitation frame length from 40 to 64 samples. Other researchers in the field [8] have attempted to save these bits by using vector quantization of one or more of the above sets of parameters, usually the long or short term predictor coefficients, but we have found these techniques to be unsatisfactory [7].

Informal subjective listening tests have been performed in which the quality of the speech generated by the two speech coders described here has been compared to speech obtained using log PCM at a variety of bit rates. These results show that our 6 kbits/sec CELP coder is judged to be subjectively superior to log PCM coded speech using 6 bits per sample, but inferior to that encoded using log PCM with 7 bits per sample. For the 4.8 kbits/sec CELP codec the speech quality was judged to lie between that of speech coded using log PCM with 5 and 6 bits per sample.

4. Conclusions

In this paper we have demonstrated the equivalence of ternary excitation codebooks to both the original full Gaussian excitation codebooks and sparse Gaussian excitation codebooks. We have also, using the same concepts, been able to explain the observed relationship between the SNR of the reconstructed speech segments and the weight of the excitation vectors in the codebook, using a probabilistic analysis of codebook construction. The use of ternary excitation vectors in the excitation codebook has allowed us to produce a 4.8/6.0 kbits/sec implementation which has a very small memory requirement and a low computational complexity and can, therefore, be realized using a single DSP device. This has enabled us to search a 1024-entry excitation codebook and to produce good quality speech at these low bit rates. The main limitation of CELP systems at present is their poor performance in encoding the higher frequency components of the speech signal. We are currently investigating a number of techniques that will reduce this problem and enable toll quality speech to be generated using the CELP algorithm for bit rates in the range 4.8 to 6.0 kbits/sec.

References

1. B.S. Atal, J.R. Remde, "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates", Proc. ICASSP 82, pp 614-617, 1982.
2. N. Gouvianakis, C.S. Xydeas, "Advances in Analysis by Synthesis LPC Speech Coders", Special Issue of IERE on Mobile Radio, January 1988.
3. B.S. Atal, M.R. Schroeder, "Stochastic Coding of Speech Signals at Very Low Bit Rates", Proc. ICC, pp 1610-1613, 1984.
4. G. Davidson, A. Gersho, "Complexity Reduction Methods for Vector Excitation Coding", Proc. ICASSP 86, pp 3055-3058, 1986.
5. G. Davidson, M. Yong, A. Gersho, "Real-Time Vector Excitation Coding of Speech at 4800 bps", Proc. ICASSP 87, pp 2189-2192, 1987.
6. M.A. Ireton, "Low Bit Rate Speech Coding", Thesis, to be submitted in 1988 at The University of Manchester.
7. D.K. Baghbadrani, "Low-Medium Bit Rate Speech Coding", Thesis, to be submitted in 1988 at the Loughborough University of Technology.
8. P. Kroon, B.S. Atal, "Quantisation Procedures for the Excitation in CELP Coding of Speech", Proc. ICASSP 87, pp 1953-1956, 1987.
9. I.M. Trancoso, B.S. Atal, "Efficient Procedures for Finding the Optimum Innovation in Stochastic Coders", Proc. ICASSP 86, pp 2375-2378, 1986.

Acknowledgements

This work has been sponsored by Signal Processors Ltd, Cambridge, England.

