Wireless speech coding: A Systematic Review

Swati Bahire Raina

Rohit Raina

Vinit Agarwal

MIET, Jammu

MIET, Jammu

SGVU, Jaipur


This paper reviews recent developments in speech coding. The growing demand for good-quality voice has made speech coding an important research area in computer science. The goal of this paper is to summarize techniques for coding high-quality speech signals in wireless telephony applications. A currently active area of speech coding is code-excited linear prediction (CELP), an emerging technique that operates at bit rates around 2.4 kbps and improves on the traditional vocoder method.

Keywords: Speech coding, CELP, low-bit-rate speech.

1. INTRODUCTION
Speech coding is the process of representing digital speech at a low bit rate while preserving high quality. It compresses the speech signal so that it can be transmitted efficiently over band-limited wired or wireless channels, and it is an essential component of telecommunication and multimedia systems. To maintain quality, standards have been set for speech coding. They are issued by a standards organization, the International Telecommunication Union (ITU-T), formerly known as the International Telegraph and Telephone Consultative Committee (CCITT). Recent surveys of audio coding are given in [6], [11], [17]. In speech and audio coding it is generally very difficult to recover exactly the original samples after decoding: there is a trade-off between bit rate and recovered speech quality. Real-time implementation of speech coding algorithms on digital signals extends their application in the world of telecommunication and multimedia. Wideband audio compression for high-quality voice reproduction has become an important activity in the last few decades. Most of the applications are in telecommunication, multimedia, telephony, satellite links, etc. The Moving Picture Experts Group (MPEG) of ISO has also developed an audio coding algorithm of its own [21]. The objective of this paper is to review different voice coding methods and algorithms. The paper is divided into four sections: Section 2 describes related work on CELP, Section 3 outlines future directions, and Section 4 concludes the paper.

2. RELATED WORK
Related work in wireless speech coding is as follows.

I. SPEECH CODING

A. History: Atal and Schroeder [13], [14] invented CELP, which is similar to the work of Copperi and Sereno [20]. In fact, multipulse LPC (MPLPC) is a special form of CELP in which multiple codebooks are used. CELP is an algorithm that delivers very high quality speech at a low bit rate. The first paper on CELP coding was published by Atal and Schroeder [13], [14]. Soon afterwards, many complexity-reduction methods were reported [15], [16]. In the following years CELP was combined with features of linear-prediction-based analysis-by-synthesis (LPAS) coding for generation of the excitation signal. CELP coding has since been developed further to reduce complexity, increase robustness to channel errors, and improve quality. CELP-based approaches have been widely adopted in telecommunication standards for speech coding. The first such adoption was U.S. Federal Standard 1016, a CELP algorithm operating at 4.8 kbps.

B. Excitation codebook: In the original CELP, each code vector was generated separately from Gaussian random numbers. Subsequent codebook designs aim to reduce complexity, storage space, and sensitivity to channel errors while increasing speech quality. Lin showed that an overlapped codebook reduces storage [28]: in this technique the code vectors are formed by circularly shifting a shared sequence of samples. A sparse excitation codebook, in which most code vector elements are zero, reduces the search complexity. Sparse codebooks were proposed by Davidson and Gersho [43] and Lin [25]. The nonzero values of the sparse codebook were then forced to be +1 or -1 by Lin [25] and Xydeas [44], which can be achieved by designing a definite ternary codebook structure. Another approach is the lattice codebook, which has been generated for multiple dimensions in [42] and [41]. Adoul proposed a lattice-structured excitation codebook in which the symbols +1 and -1 are replaced by the binary values 1 and 0. The binary code vectors are mapped to binary words, so that the excitation can be generated without any storage [40], [39], [38], [37].

C. Pitch periodicity: Another important element of CELP is the modeling of pitch periodicity. A high-resolution, or fractional-pitch, method was proposed by Marques [21], [19].
In the fractional-pitch method, speech quality is improved by searching the pitch period at a resolution finer than one sample, by means of interpolation. This enlarges the adaptive codebook and gives a more accurate excitation, at the cost of a higher bit rate. Transmitting the adaptive-codebook pitch value for every frame also increases the bit rate. To reduce this, differential coding of the pitch in each subframe is used, as in FS 1016 CELP [26]. Another technique, generalized analysis-by-synthesis, was introduced by Kleijn; it tracks the pitch with piecewise-linear segments, which considerably reduces the bit rate needed for the pitch [27]. More recently Kleijn proposed a smoothing method to obtain high periodicity. Shoham [22] observed that voiced speech coded by CELP can sound gravelly, and proposed to conceal this by constraining the gain of the excitation.

D. LP spectral parameters and their coding: In CELP, linear prediction (LP) parameters are used to model the speech spectrum. They are transmitted about every 20-30 ms, which consumes a large share of the bit rate; vector quantization (VQ) is used to reduce it. This method was introduced by Buzo [27] and is applied to the quantization of line spectral pairs, also known as line spectral frequencies (LSF). An important study of LSF parameters is due to Kang and Fransen [29], who addressed LPC at a bit rate of about 800 b/s and designed a vocoder with a 12-bit VQ. Many other authors have worked on achieving 'transparent' quality for the LP parameters at low bit rates using multistage VQ. A multistage VQ with a partially adaptive codebook was introduced by Tanaka and Taniguchi [30]. Many other methods are reported in [36]. An LSF coding method that exploits the history of the speech spectral parameters was given by Xydeas [31]. For further examples see [34], [23], [18].

II. VARIABLE-RATE SPEECH CODING: In digital communication, speech coding is usually done at a constant bit rate, but some telecommunication and digital storage applications require a variable bit rate, which led to the variable bit rate (VBR) coder. A VBR coder can change its bit rate every 10 ms. The bit rate can be controlled in two ways, internally and externally: internally by the statistical character of the incoming speech signal, and externally by the current traffic level in a multiuser communication network [33]. Variable-rate speech coding is now used for many purposes, such as speech storage and digital speech interpolation (DSI), and VBR is essential in particular for code division multiple access (CDMA) systems. Other coders, such as ADPCM [32] and sub-band coding [24], [35], can also be used. An important component of a VBR coder is voice activity detection (VAD), which differentiates between speech segments and pauses (when the user is silent and only background noise is present). With a VAD algorithm, a low average rate is achieved without a decrease in speech quality.

III. VOCODER: Good speech quality is obtained with waveform-matching coders such as CELP, but it generally degrades rapidly as the bit rate falls below 4 kbps. A vocoder instead provides the decoder with information about the spectral magnitude, from which the speech is synthesized. One of the best-known vocoders is the LPC vocoder. The LPC vocoder of Atal and Hanauer [10] has been used by the U.S. Government for many years for secure voice communication. It is well known as LPC-10, because it uses a 10th-order linear prediction model of speech production [41]. As in LPAS coders, an LP filter is used to synthesize speech from an excitation signal. Unlike in LPAS coders, however, the excitation is generated at the decoder without actually sending bits that specify the excitation waveform.
Each frame is classified as voiced or unvoiced. For voiced frames the pitch period is specified, whereas for unvoiced frames a random-noise excitation is used. Vocoders have been studied for many years, but most of these methods give poor results: because of their poor quality and 'unnatural', 'buzzy' sound, it becomes difficult to recognize the original voice. Recently, a new vocoder algorithm combined with a CELP coder has been introduced, which has a bit rate of about 2.4 kbps.
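
As a rough illustration of the LPC vocoder decoder described above, the following Python sketch regenerates the excitation locally (an impulse train at the pitch period for voiced frames, Gaussian noise for unvoiced frames) and passes it through an all-pole LP synthesis filter. The function names, the frame dictionary layout, the 180-sample frame length, and the first-order coefficient in the usage lines are assumptions made for this example; they are not taken from LPC-10 or from any of the cited coders.

```python
import random


def make_excitation(voiced, pitch_period, n_samples, rng):
    """Excitation rebuilt at the decoder; no waveform bits are transmitted."""
    if voiced:
        # Voiced frame: one unit impulse every pitch_period samples.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    # Unvoiced frame: zero-mean, unit-variance Gaussian noise.
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def lp_synthesize(excitation, lp_coeffs, gain, history=None):
    """All-pole synthesis: s[n] = gain * e[n] + sum_k a_k * s[n - k]."""
    order = len(lp_coeffs)
    # `past` holds the last `order` output samples, most recent first.
    past = list(history) if history is not None else [0.0] * order
    out = []
    for e in excitation:
        s = gain * e + sum(a * p for a, p in zip(lp_coeffs, past))
        out.append(s)
        past = [s] + past[:-1]
    return out, past


def decode_frame(frame, rng, history=None):
    """Decode one frame from its parameters (hypothetical dictionary layout)."""
    excitation = make_excitation(
        frame["voiced"], frame["pitch_period"], frame["n_samples"], rng
    )
    return lp_synthesize(excitation, frame["lp_coeffs"], frame["gain"], history)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative values only: a stable first-order filter and a 40-sample pitch.
    voiced_frame = {"voiced": True, "pitch_period": 40, "n_samples": 180,
                    "lp_coeffs": [0.9], "gain": 1.0}
    speech, state = decode_frame(voiced_frame, rng)
    # Pass the filter state on so the next frame continues smoothly.
    unvoiced_frame = dict(voiced_frame, voiced=False)
    noise_like, state = decode_frame(unvoiced_frame, rng, state)
    print(len(speech), len(noise_like))
```

Only the voicing flag, pitch period, gain, and LP coefficients are needed per frame in this sketch, which is why such a decoder can run at a much lower bit rate than a coder that transmits the excitation waveform.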

3. FUTURE DIRECTIONS
The best current speech and audio coding algorithms achieve good speech quality, but the quality can still be improved and the bit rate reduced further. At 2-3 kbps the quality of CELP degrades and the speech becomes noisy, so newer vocoders such as sinusoidal or mixed-excitation coders are used instead. Prototype waveform interpolation (PWI) also offers good quality. At rates of 1 kbps or below, vocoders operate on large segments of speech. Audio coding at 96-128 kbps per channel for a 15-20 kHz bandwidth can also be achieved with good quality.
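
To put the bit rates above in perspective, the short Python sketch below computes how many bits a coder has available per frame and how a coded rate compares with uncompressed PCM. The frame lengths (22.5 ms and 30 ms) and the PCM reference formats (8 kHz, 8-bit telephone speech; 44.1 kHz, 16-bit audio) are assumptions chosen for illustration and are not stated in this paper.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Number of bits available to code one frame of speech."""
    return bit_rate_bps * frame_ms / 1000.0


def compression_ratio(sample_rate_hz, bits_per_sample, coded_rate_bps):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    return sample_rate_hz * bits_per_sample / coded_rate_bps


if __name__ == "__main__":
    # Assumed frame lengths (not given in the text).
    print(bits_per_frame(2400, 22.5))
    print(bits_per_frame(4800, 30))
    # Assumed PCM references: telephone speech and 16-bit audio.
    print(compression_ratio(8000, 8, 2400))
    print(compression_ratio(44100, 16, 128000))
```

Under these assumptions, a 2.4 kbps coder has only 54 bits to describe each 22.5 ms frame, which is why low-rate coders transmit model parameters rather than the waveform itself.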

4. CONCLUSIONS
Over the last few years speech coding has advanced considerably, especially wireless speech coding algorithms at bit rates as low as 2.4 kbps. This paper presented a survey of various speech coding techniques that have been used in many systems.

REFERENCES
1. B. S. Atal, "The history of linear prediction," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 154-161, March 2006.
2. M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937-940, 1985.
3. M. R. Schroeder, "Vocoder: analysis and synthesis of speech," Proc. IEEE, vol. 54, no. 3, pp. 720-734, May 1966.
4. A. M. Kondoz, Digital Speech: Coding for Low Bit Rate Communications Systems, John Wiley and Sons, England, 1994.
5. B. S. Atal, "High quality speech at low bit-rate: multipulse and stochastically excited linear prediction coders," in Proc. Int. Conf. on Speech Processing, Toronto, Canada.
6. J. D. Gibson, "Adaptive prediction for speech differential encoding system," IEEE, vol. 68, Nov. 1997.
7. B. S. Atal, V. Cuperman, and A. Gersho, Advances in Speech Coding, Kluwer Academic Publishers, Boston.
8. M. J. T. Smith and T. P. Barnwell, "Exact reconstruction techniques for tree-structured sub-band coders," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 3, p. 434, June 1986.
9. B. S. Atal and S. L. Hanauer, "Speech analysis and synthesis by linear prediction of speech," vol. 55, 1971.
10. R. W. Schafer and L. R. Rabiner, "Digital representation of speech signal," Proc. IEEE, vol. 63, pp. 662-677.
11. K. Brandenburg and G. Stoll, "The ISO/MPEG audio codec: A generic standard for coding of high quality digital audio," presented at the 92nd Audio Eng. Soc. Conv., Austria.
12. "Code excited linear prediction (CELP): high quality at very low bit-rate," in Proc. IEEE Int. Conf. on Speech and Signal Processing, Mar. 1985.
13. G. Davidson and A. Gersho, "Complexity reduction methods for excitation coding," Japan, 1986.
14. L. A. Hernandez-Gomez and F. Casajus-Quiros, "On the behaviour of reduced complexity CELP," Japan, 1999.
15. P. Noll, "Wideband speech and audio coding," IEEE, May 1993.
16. J. S. Marques and J. M. Tribolet, "Pitch prediction with fractional delay in CELP coding," in European Conf. on Speech Communication and Technology, Paris.
17. M. Copperi and D. Sereno, "Improved LPC excitation based on pattern classification and perceptual criteria," in Proc. 7th Int. Conf., Canada, 1984.
18. "Pitch prediction with fractional delay in CELP coding," in European Conf. on Speech Communication, 1989.
19. "Constrained-excitation coding of speech at 4.8 kb/s," in Proc. IEEE, 1990.
20. R. Hagen and P. Hedelin, "Robust vector quantization in spectral coding," in IEEE Int. Conf., 2000.
21. C. Xydeas, "An overview of speech coding techniques," in Dig. IEEE, 1999.
22. W. B. Kleijn and D. J. Krasinski, "Improved speech quality in SELP," in Int. Conf., New York.
23. J. P. Campbell, Jr., T. E. Tremain, and V. C. Welch, "The DOD 4.8 kbps standard."
24. A. Buzo and A. H. Gray, "Speech coding based on vector quantization," IEEE Trans., 1980.
25. G. S. Kang and L. J. Fransen, "Low bit-rate encoder based on line spectrum frequency," Nov. 1984.
26. Y. Tanaka and T. Taniguchi, "Efficient coding for LPC parameter using adaptive prefiltering and MSVQ with partially adaptive codebook," in Int. Conf., 2000.
27. R. Di Francesco, "Variable rate speech coding with online segment," in Proc. IEEE, vol. 1, pp. 233-236.
28. Y. Yatsuzuka, "Highly sensitive speech detector and high-speed voice band in ADPCM system," IEEE, Apr. 1999.
29. A. Gersho and E. Paksoy, "Variable speech coding for cellular network," in Speech and Coding for Wireless Network Applications, 1989.
30. R. J. Di Francesco, "Real time speech segment using pitch and convexity jump model," in IEEE.
31. S. Saoudi and J. M. Boucher, "A new efficient algorithm to compute the LSP parameter for speech coding," 1992.
32. C. Laflamme and J. P. Adoul, "On reducing computational complexity of codebook search in CELP coder," Proc. Int. Conf., Apr. 1990.
33. M. A. Ireton and C. S. Xydeas, "On improving vector excited coder through the use of spherical lattice codebook."
34. C. Lamblin and J. P. Adoul, "Fast CELP coding based on the Barnes-Wall lattice in 16 dimensions," Scotland.
35. "On the structure of vector quantization," IEEE Trans. Theory, 1989.