2014 INTERNATIONAL CONFERENCE ON COMPUTATION OF POWER, ENERGY, INFORMATION AND COMMUNICATION (ICCPEIC)
Performance Evaluation of Code Excited Linear Prediction Speech Coders at various bit rates
Ancy S. Anselam
Dept. of Electronics and Communication Engg.
Mar Baselios College of Engg. & Technology, Thiruvananthapuram, India
Email: [email protected]
Abstract - Notable improvements have recently been made in coding speech with high quality at low bit rates and low delay. The need for low bit rate speech coding algorithms continues, driven by the ever increasing number of users of wireless communication networks. This paper discusses the implementation details of the Code Excited Linear Prediction speech coder at different bit rates (16 Kbps, 9.6 Kbps, 7 Kbps, 6.8 Kbps, 4.9 Kbps and 4.8 Kbps) and an analytical evaluation of its performance in terms of bit rate and quality using PRAAT software.
Keywords: Speech coding, CELP, Code book, LPC.
Sakuntala S. Pillai, Senior Member IEEE
Dept. of Electronics and Communication Engg.
Mar Baselios College of Engg. & Technology, Thiruvananthapuram, India
Email: [email protected]

I. INTRODUCTION

Speech coding is the process of obtaining a compact representation of speech signals for efficient transmission over a channel and for storage. Speech coding plays an important role in modern voice-enabled technology such as digital speech communication and voice over Internet Protocol (VoIP). The techniques developed for speech coding can also be applied to other application areas such as speech synthesis, audio coding, speech recognition and speaker recognition. The prime goal of speech coding is either to maximize the perceived quality at a particular bit rate or to minimize the bit rate for a particular perceptual quality, while maintaining low coding delay. All speech coders are designed to reduce the bit rate below the reference of 128 Kbps [1] [4].

Waveform coders are designed to preserve the original shape of the signal waveform and are better suited for high bit rate coding (≥ 32 Kbps). The signal-to-noise ratio can be used to measure the quality of a waveform coder. Vocoders (parametric coders) are based on lossy compression algorithms that make no attempt to reproduce the exact speech waveform at the receiver; instead they create a perceptual equivalent of the signal. The signal-to-noise ratio is therefore not a meaningful quality measure for them. In these coders, the speech signal is represented by a set of parameters of a model from which the speech is considered to be generated. These parameters, estimated from the input speech, are encoded and transmitted as a bit stream. The accuracy and sophistication of the underlying model control the perceptual quality of the decoded speech. The most successful parametric coding technique is Linear Prediction Coding (LPC).

One of the most effective coding methods for low bit rates is Code Excited Linear Prediction (CELP), proposed by Atal and Schroeder [1]. It can produce low bit rate coded speech of a quality comparable to that of medium rate waveform coders, and thereby bridges the gap between waveform coders and vocoders. Standardization of low rate speech coding is progressing successfully, but the goal of toll-quality low-rate coding continues to pose a research challenge. The CELP coding technique splits the input sampled speech into blocks of samples to form vectors. It is based on analysis-by-synthesis search procedures, linear prediction and perceptually weighted vector quantization. The name "code-excited" originates from the excitation codebook, which holds the codes used to excite the synthesis filters. In the challenging field of modern voice-enabled communication, CELP is very important because it eliminates the voiced/unvoiced classification, preserves partial phase information of the original signal and is effective in the mid-rate region.

The design and analysis of various types of speech coders are available in the literature, but a comparative analysis of the CELP codec at various levels of performance is not. In this work, the performance of the CELP coder is evaluated at various bit rates. The parameters that control the bit rate and perceptual quality are explained with experimental support.

II. CELP ENCODER

Input speech is segmented into frames of 240 samples (20 to 30 ms). Each frame is subdivided into subframes of 60 samples with a duration of 5 to 7.5 ms. Once the input speech has been segmented, short-term linear prediction and long-term linear prediction are performed. The detailed block diagram of the CELP encoder is shown in Fig. 1. Short-term linear prediction analysis is performed to obtain the LPC coefficients, which contain the formant information. These LPC coefficients are given to the prediction error filter to perform inverse filtering.
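The framing and short-term analysis steps described so far can be illustrated with a minimal sketch. It is not the authors' implementation: the predictor order of 10, the autocorrelation method with the Levinson-Durbin recursion, and all function names are assumptions made for illustration, since the text gives only the frame and subframe lengths.

```python
FRAME_LEN = 240      # samples per frame (from the text)
SUBFRAME_LEN = 60    # samples per subframe (from the text)
LPC_ORDER = 10       # assumption: the text does not state the predictor order


def split_frames(signal, frame_len=FRAME_LEN, subframe_len=SUBFRAME_LEN):
    """Cut the signal into whole frames, each returned as a list of subframes."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append([frame[i:i + subframe_len]
                       for i in range(0, frame_len, subframe_len)])
    return frames


def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion (assumed method).

    Returns the prediction error filter A(z) as [1, a1, ..., ap].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:           # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err           # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)     # remaining prediction error energy
    return a


def inverse_filter(x, a):
    """Prediction error filter: e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]
```

For one frame, `levinson_durbin(autocorrelation(frame, LPC_ORDER), LPC_ORDER)` yields the coefficients and `inverse_filter(frame, a)` yields the signal left after the short-term prediction is removed. The sampling rate is not stated in the text; at an assumed 8 kHz, 240 samples last 30 ms and 60 samples last 7.5 ms, which are the upper ends of the quoted ranges.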
This removes the formant information from the subframes, giving a resultant signal from which long-term information (pitch, intensity, etc.) is extracted through long-term linear prediction analysis [2]. The input frame is also given to the perceptual weighting filter. The filter weights and gain are computed based on a perceptual compression algorithm. The short-term prediction coefficients (i.e. LPCs) and the long-term prediction coefficients are given as input to formant
978-1-4799-3826-1/14/$31.00 ©2014 IEEE
Figure 2. Block diagram of CELP decoder (surviving labels: Input PCM, Synthesized speech)