6'th International Symposium on Telecommunications (IST'2012)
Secure Data Communication through GSM Adaptive Multi Rate Voice Channel M. Boloursaz, A. H. Hadavi, R. Kazemi, Student Member, IEEE, and F. Behnia, Senior Member, IEEE Department of Electrical Engineering Sharif University of Technology Tehran, Iran Email:
[email protected] Abstract— This paper considers the problem of digital data transmission through the Global System for Mobile communications (GSM) for security applications. A data modem is presented that utilizes codebooks of Speech-Like (SL) symbols to transmit data through the GSM Adaptive Multi Rate (AMR) voice codec. Using this codebook of finite alphabet, the continuous vocoder channel is modeled by a Discrete Memory less Channel (DMC). A heuristic optimization algorithm is proposed to select codebook symbols from a database of observed human speech such that the capacity of DMC is maximized. Using the DMC capacity, a lower bound on the capacity of the considered voice channel can be achived. Simulation results show that the proposed data modem achieves higher data rates and lower symbol error rates compared to previously reported results while requiring lower computational complexity for codebook optimization. Keywords- Data modem; Adaptive Multi Rate vocoder channel; capacity maximization; speech-like codebook optimization
I.
INTRODUCTION
Nowadays, with the increased penetration of wireless mobile networks, voice channels are widely available ubiquitously. Previous research [1], [2], [3] has demonstrated that it can be possible to use such channels for data communication in low-rate applications. Although high speed data dedicated channels are currently available, using compressed voice dedicated channels for data communication is still attractive because of wider coverage, better service availability, improved QoS (Quality of Service), enhanced reliability as well as robustness. These advantages are further described below: First, the wider coverage of mobile voice channel makes the proposed data channel available everywhere, especially in areas where data dedicated networks have not yet become widespread. Second, in order to maintain the required speech quality, voice channels are designed for superior QoS in comparison with data services. Generally, voice traffic is prioritized over data connections on networks where both exist. This leads the proposed low-rate data channel to be a reliable and consistent real-time channel with improved QoS especially in network congestion periods. Finally, the proposed modem will provide an end to end secure data channel by application of end to end encryption as well as avoiding common network gateways. Previous research on this subject has typically focused on the GSM voice channel as the compressed low-rate voice medium. A data modem of the type described here has been suggested for transmission of encrypted end-to-end secure
978-1-4673-2073-3/12/$31.00 ©2012 IEEE
voice or data over the GSM voice channel in [1], [4], [5]. This application is of central interest because it is evident that GSM network operators and third party attackers can easily eavesdrop all parlances. Using the proposed modem in combination with a low-rate vocoder such as MELP, makes transmission of encrypted voice through the GSM network possible. This modem has also been proposed for real-time and secure transmission of Point-of-Sale (POS) transaction information from the POS terminal to the financial host through the GSM voice channel [6]. It has also been proposed for NAT (Network Address Translator) traversal allowing mobile users to set up a direct P2P connection with minimal user intervention and without connecting to a middle server by making an end to end hole through the NATs[7]. Providing data connection over voice dedicated channels is a challenging problem due to unpredictable distortions caused by voice codecs used in such channels to the transmitted waveform. Such distortions can be summarized as follows: First, the vocoder speech coding process involves the lossy LPC-based analysis by synthesis compression technique that guarantees perceptual resemblance of the synthesized speech to the original one but the two waveforms differs strongly when compared sample by sample. Second, further loss of information of the original speech waveform occurs due to lossy vector and non-vector quantization of the extracted speech parameters in the vocoder speech coding procedure. Third, the vocoder compression/decompression process is a nonlinear process with long memory. Nonlinearity is due to quantization of extracted speech parameters and memory is caused by differential encoding of these parameters. Fourth, both linear prediction and differential encoding of extracted speech parameters assume that the neighboring input samples are highly correlated. This assumption holds for voice but not for regular data signals. In fact, the more data modulated within the voice bandwidth, the less correlated the adjacent samples are and consequently, the less the signal fits into the vocoder speech coding model. This leads to more distortions caused by the vocoder which in turn causes higher detection error rates. In this study, the previously proposed methods for the problem of data communication through compressed voice channels are classified to three different groups due to the conceptual idea they are based on. These three groups are referred to as “Parameter Mapping”, “Codebook
1021
In this research, the Codebook Optimization approach is taken because it proved to achieve higher throughputs and lower error rates. A heuristic codebook design algorithm is demonstrated to maximize the achievable data rate of this approach from an information theoretic point of view. The search space is confined to a data base of observed human speech to guarantee that the transmitted signal is classified as voice at the channel VAD block. Simulation results on the GSM Adaptive Multi Rate (AMR) voice codec [15] shows that in comparison with previous works, the proposed algorithm converges to codebooks with higher data rates and lower error rates in less number of iterations. The rest of this paper is organized as follows. Section 2 outlines the general modem structure and the encoding/decoding process. Section 3 presents the codebook symbol generation process. Section 4 describes the results of numerical simulations and compares them to previous studies. Section 5 concludes the paper and considers directions for further research. II.
GENERAL MODEM STRUCTURE
As mentioned earlier, the approach taken in this research is to design a set of predefined waveforms as symbols that can successfully pass through the desired vocoder and are separable after passing it. The data are mapped onto these symbols on the transmitter side and extracted on the receiver side. The codebook is a table containing symbols of samples each. The encoder and decoder are mappings E:{ } → and D: → { } that map the input data onto the codebook symbols and extract the output data from the received signals respectively. So the transmitter converts the input data bit stream into decimal messages of nR bits each and transmits through the vocoder channel. On the receiver side, message is detected from the . Finally, the by received distorted symbol estimate of the transmitted bits are extracted from . The overall block diagram of this system is shown in Fig. 1. Therefore, the main problem is to devise the codebook and the mappings E and D for efficient data communication through the desired vocoder.
Codebook Input Data Stream 00101110 (a)
(b)
Output Data Stream 00101110
Bit to Message Converter
Encoder
Message to Bit Converter
Decoder
GSM Voice Channel
Optimization” and “Modulation Optimization” methods and are described below respectively: The “Parameter Mapping” method involves mapping of the input bit stream onto speech parameters of some speech production model. In this approach the modulator is a signal synthesizer which uses the input data to synthesize a speechlike signal based on the speech production model. Katugampala et al. [1], [4], [5] mapped the input data on Pitch Frequency, Line Spectral Frequencies (LSF) and Energy of speech frames. This approach achieved a raw data rate of 3kbps with 2.9% Bit Error Rate (BER) on the GSM EFR voice channel. A similar method is utilized by [8] and [9] for data communication through codecs that apply Algebraic Code Excited Linear Prediction (ACELP) speech coding technique. This approach modulates the input data stream on algebraic codebook pulse positions making a PPM signal. This method achieved a raw data rate of 3kbps with 1.2% BER on the GSM Enhanced Full Rate (EFR) voice channel. The “Codebook Optimization” method is to encode the input bit stream onto a codebook of predefined waveforms (symbols) that are optimized for efficient communication through the desired voice codec. Sapozhnykov et al. [10] also numerically optimized the symbol waveforms by a pattern search algorithm over all symbols of a fixed length that meet the voice codec limited bandwidth constraint. This modem was tested on GSM HR, FR, AMR and EFR voice codecs and achieved a throughput of 4kbps with 2.5% symbol error rate (SER) on EFR voice codec. Shahbazi et al. [3], [11] generated the codebook by searching for the best performing symbols from amongst a database of waveforms derived from observed human speech. This modem was tested on different compression rates of GSM AMR and EFR voice codecs and achieved a raw data rate of 2kbps with SER of and on AMR 12.2 and EFR voice codecs respectively. Finally, the “Modulation Optimization” method is to use common digital modulations without or with little modifications and optimize their parameters for efficient data communication through compressed voice channels. Chmayssani et al. [12], [13] used the conventional FSK and QAM to modulate the input bit stream into audio signals and achieved a data rate of 3kbps with BER on GSM EFR voice channel. Dhananjay et al. [14] modified the common FSK and proposed the Hermes technique that achieved a throughput of 1.2kbps with BER on simulations with GSM EFR speech coder. The first approach of speech-like signal synthesis based on the use of speech models has the general advantage of generating audio signals that are voice like, and hence should trigger voice activity detection (VAD) systems in the voice networks to classify the signal as voice and not shut the signal transmission off as they would with noise-like signals. For the other two approaches the modulated signal is not as voice-like however, and the VAD is generally not triggered by the generated signals. For such approaches some further modifications may be needed to bypass the network’s VAD system.
Codebook
Figure 1. General modem structure. (a) transmitter, (b) receiver
1022
Looking at the problem from an information theoretic point of view, the random variables and are the input and output of the vocoder channel and the mappings E and D are the channel encoder and decoder functions. Regardless of the channel memory (considering the vocoder as a memory less channel), the maximum achievable rate R is the channel capacity given by And the mapping D determines the transmitted symbol at the receiver such that and are jointly typical. Equivalently, according to the classic detection theory the index of the transmitted symbol is selected by Maximum a Posteriori (MAP) detection given by Eq. (1) and (2) were applicable if the conditional probability distributions or were known, but this is not true for the vocoder channel. The problem gets more complicated considering the unknown model of channel memory where M denotes the set of previously transmitted symbols. Previous studies have modeled the total channel distortion caused by long memory, lossy compression and anticausality of the vocoder channel by an effective Gaussian random noise not necessarily centered on zero. LaDue et al. [2] supports this model by stating that proved to have a bell shaped distribution of finite variance by simulation experiments of modulating statistically efficient amount of random data on symbols, passing them through the vocoder and observing the output symbol histograms. Applying this model and assuming equal power for all codebook symbols, (2) gives where represents vector dot product. But (3) is true if and only if the assumed noise variance is equal for all codebook symbols, however simulation results show the opposite. Neglecting this fact, the same matched filter detector
as in (3) is used in this research. This completes the basic principles of modem operation. III.
CODEBOOK OPTIMIZATION PROBLEM
A. Objective Function The general modem structure described above is depicted in Fig. 2. As seen in this figure, the transmitter codebook symbols are represented by a random variable with alphabets from the set Similarly, the channel output symbols are represented by the random variable taking values from the set In fact, applying the modem structure demonstrated in section II and using the above notation, the vocoder channel is modeled by a Discrete Memory less Channel (DMC) (Fig. 2) with its probability distribution dependent on the choice of codebook symbols. The capacity for this channel is obtained by Assuming is constant, maximization of leads to minimization of . This assumption holds if the random variable is uniformly distributed which is true in most practical systems that apply scrambling, encryption, etc. on the input bit stream. Therefore, the goal of optimization is to find a set of codebook symbols that minimize or equivalently maximize the DMC channel capacity . It is obvious that depends on joint probability distribution of & . This probability distribution is not known for the vocoder channel but a histogram of it can be derived by simulation experiments on the vocoder channel simulator. This empirical histogram can be represented by a matrix whose elements denote the empirical probability of a transmitted symbol that is decoded as .
Figure 2. Discrete memoryless model of the vocoder channel
1023
An error occurs if and only if . This empirical probability distribution can be evaluated for any choice of codebook symbols and forms the basic idea of the proposed optimization algorithm illustrated in section C. Previous research suggest the empirical Symbol Error Rate (SER) as the cost function for minimization, but clearly minimization of is more general and leads to minimized SER. Shahbazi et al. [3] solves suboptimum problem by maximizing the Euclidean distance between codebook symbols. Though, experiments on the vocoder simulator imply that maximizing this Euclidean distance is not equivalent to a reduced SER due to imperfectness of the Gaussian noise model for channel distortions caused by vocoder long memory, lossy compression and anticausality. B. Search Space As noted before, the codebook symbols are chosen from TIMIT which is a data base of human speech. Limiting the search space to human voice samples has 2 main advantages described below: First, it cause the modulator output to have voice like properties i.e. it will fit completely to the voice frequency band in order not to be filtered by the channel, provide a dynamic spectrum to guarantee that it will be classified as voice by the channel VAD and finally meet the assumption of strong correlation among neighboring input samples to completely fit to the channel LPC model. Second, it reduces the search space to a subset which most probably contains the global minimum. As a result, the complexity of the search algorithm is dramatically decreased without noticeable performance degradation. LaDue et al. [2] and Sapozhnykov et al. [10] do not assume this search space reduction in the codebook design procedure and solve a more complex and time consuming optimization problem. In this research it is shown that this search space reduction used with the proposed algorithm leads to better data rate and BER performances with less search complexity. Shahbazi et al. [3] uses the same TIMIT database as the search space. The optimization process addressed in that work consists of a preprocessing stage followed by a symbol selection procedure. The preprocessing stage itself consists of pitch modification, filtering, power normalization, phase continuity adjustment and removal of similar symbols. Finally, the symbol selection procedure introduces an algorithm that reaches a suboptimal solution. In this work, the search space is further reduced using a similar preprocessing phase consisting of the same pre filtering, power normalization, phase continuity adjustment and similar symbols removal. The pith modification phase has been omitted due to the fact that it was observed to ruin the dynamic spectrum of the modulator output which in turn makes further modifications necessary to bypass the VAD. Applying this space limitation and the mentioned preprocessing, a symbol set consisting of symbols each having samples is acquired. Therefore, the constrained nonlinear search problem on the multidimensional and probably multimodal discrete space is reduced to finding a
codebook of symbols taken from this set such that is minimized. The next section illustrates the proposed heuristic algorithm for solving the addressed problem. C. Proposed Optimization Algorithm Before proceeding to the optimization algorithm definition, the term “most error causing symbol” should be defined. It is known from the classic information theory that the objective function can be written as Using the definition of P matrix (7) can be written as In which
is the probability of
obtained by
Contribution of the codebook symbol to sum of all terms from the above sum containing denoted by written as:
is the or
The index of the “most error causing symbol” is given by The proposed minimization algorithm is an iterative method that starts from an initial random codebook and tries to find a codebook with a reduced in each iteration. Iterations involve 4 steps as demonstrated below: Step1: Find the “most error causing symbol” of the current codebook using its histogram. This estimate of the histogram has been calculated in step of the last iteration and stored in matrix (empirical histogram). Step2: Generate new codebooks by substituting the most error causing symbol of the current codebook obtained in step 1, by random symbols from the search space. Step3: Estimate the empirical histogram for N codebooks generated in step2. This is done by modulating a statistically sufficient amount of random data on codebook symbols, passing them through the voice codec, demodulating the received symbols at the receiver and generating the matrix. Step4: Calculate for these N codebooks using the empirical histograms estimated in step3. Now, if the child with the least improves the objective function compared to its parent codebook, it is set as the current codebook, else return to step2. End if the stopping criterion is met, otherwise go to step1. The stopping criterion is set such that the algorithm stops if a predefined number of iterations is exceeded or is below a preset threshold. To prevent the algorithm from being trapped in a local minimum the following rule is applied: If the cost function dose not decrease in 3 consecutive iterations the algorithm substitutes the second “most error
1024
causing symbol” by random symbols of the search space (Step 2) for the next iteration and then continues regularly. This algorithm tries to maximize the cost function improvement in each iteration by eliminating “the most error causing symbol” that increases the most. As shown in the next section, this clever substitution significantly reduces the number of iterations needed for achieving a codebook with acceptable performance. IV.
SIMULATION RESULTS
To assess the performance of the proposed codebook design algorithm, codebooks with different dimensions providing various Modem Data Rates (MDR) were generated for each Codec Data Rate (CDR) of the AMR voice coder applying the proposed algorithm. The empirical Symbol Error Rate (SER) was evaluated for each codebook as described previously. The test results along with some codebook and algorithm parameters are reported in Table1. The Modem Data Rate (MDR) is calculated by In the above equation 8000 is the number of samples transmitted in a second and represents the number of bits transmitted by each usage of the channel or equivalently the number of bits transmitted on each sample. The reported MAEFR parameter represents the Maximum Achievable Error Free Rate. It can be interpreted as a lower bound on the vocoder channel capacity and is calculated by Table 1
CDR (Kbps) 12.2
10.2
7.95
7.4
6.7
5.9 5.15 4.75
In the above equation is the maximum number of symbols transmitted in a second and represents the maximum number of bits transmitted by each usage of the modeled discrete memory less channel or equivalently the maximum number of bits transmitted on each codebook symbol. The achievability of MAEFR data rate can be proved using the concept of random coding on the modeled discrete memory less channel and applying the packing lemma as described in [16]. The Convergence Iterations (CI) denotes the number of algorithm iterations that results in the corresponding codebook. And finally, the reported N parameter is the number of children of each parent codebook used in the codebook optimization algorithm. It should be noted that the simulations were done using the ANSI-C code for AMR speech codec [17]. Plots of SER and MAEFR against the iteration number are depicted in Fig. 3 and Fig. 4. These figures show that the proposed optimization algorithm achieves almost exponential convergence rate in comparison with almost linear convergence rate achieved in [2]. It should also be noted that the number of iterations needed for the algorithm to converge are at list an order of magnitude less than the results achieved in [2] and [10].
Performance of the proposed optimization algorithm over different compression levels of the AMR voice codec
Codebook Size 128 64 32 64 32 16 64 32 16 64 32 16 64 32 16 64 32 16 32 16 32 16
Symbol length (Samples) 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
MDR (bps) 1400 1200 1000 1200 1000 800 1200 1000 800 1200 1000 800 1200 1000 800 1200 1000 800 1000 800 1000 800
1025
SER
MAEFR (bps)
CI (Iterations)
N
1199.7 999.9 799.9 1199.5 999.2 799.7 1199.3 998.6 799.4 1183.1 981.4 797.3 1152.9 970 792.6 916.5 778.1 895.6 769.4
135 120 85 110 45 20 95 75 35 110 45 20 145 70 110 135 100 120 100 150 85 85
90 85 45 75 40 20 75 35 20 75 40 20 75 35 20 75 35 20 35 20 35 20
0.4
1000
0.35
990 Maximum Achievable Error Free Rate (bps)
Symbol Error Rate (SER)
0.3
0.25 AMR 4.75 AMR 5.15 AMR 5.9 AMR 6.7
0.2
0.15
0.1
0.05
0
980
970
960
950
940
0
10
20
30
40 Iterations
50
60
70
920
80
Figure 3. SER Convergence of 32x40 Codebook (MDR=1000)
V.
CONCLUSION
REFERENCES
[2]
[3]
[4]
[5]
0
10
20
30
40 Iterations
50
60
70
80
Figure 4. MAEFR Convegence of 32x40 Codebook (MDR=1000)
In this paper a data transfer technique for digital data communication through the GSM voice channel is presented that utilizes codebooks of Speech-Like (SL) symbols to transmit data through the Adaptive Multi Rate (AMR) voice codec. Compared to the previous works, a main contribution of the authors is modeling the AMR vocoder channel by a Discrete Memory less Channel (DMC) and setting the capacity of this DMC channel as the objective function for maximization instead of the Symbol Error Rate (SER) that has been optimized in most previous works. The other contribution of the authors is the heuristic algorithm proposed for codebook optimization that proved to achieve better rate and SER performance compared to previous works while having faster convergence rate. The presented data modem along with the proposed codebook optimization technique could be used for data communication through channels with unknown or complicated channel models which is ongoing. [1]
AMR 4.75 AMR 5.15 AMR 5.9 AMR 6.7
930
[6]
[7]
[8] [9]
[10]
[11]
[12]
[13]
N. Katugampala, S.Villette, and A. M. Kondoz, “Secure voice over GSM and other low bit rate systems,” IEE Secure GSM and Beyond: End to End Security for Mobile Communications, pp. 3/1- 3/4, London, Feb. 2003. C. K. LaDue, V. V. Sapozhnykov, and K. S. Fienberg, “A data modem for GSM voice channel,” IEEE Transactions on Vehicular Technology, vol.57, no.4, pp.2205-2218, July 2008 A. Shahbazi, A. H. Rezaei, A. Sayadiyan, and S. Mosayyebpour, “Data Transmission over GSM Adaptive Multi Rate Voice Channel Using Speech-Like Symbols,”. International Conference on Signal Acquisition and Processing ICSAP, pp. 63-67, Feb. 2010 N. N. Katugampala, K. T. Al-Naimi, S. Villette, and A. M. Kondoz, “Real time data transmission over GSM voice channel for secure voice and data applications,” Secure Mobile Communications Forum: Exploring the Technical Challenges in Secure GSM and WLAN, pp. 7/17/4, 23 Sept. 2004. N. N. Katugampala, K. T. Al-Naimi, S. Villette, and A. M. Kondoz, “Real time end to end secure voice communications over gsm voice
[14]
[15]
[16] [17]
1026
channel,” 13th European signal processing conference, Antalya, Turkey, September 2005 B. Kotnik, Z. Mezgeca, J. Svecko, and A. Chowdhury, “Data transmission over GSM voice channel using digital modulation technique based on autoregressive modeling of speech production” Digital Signal Processing, vol.19, no.4, pp. 612-627, July 2009 A. Patro, Y. Ma, F. Panahi, J. Walker, and S. Banerjee, “A System for Audio Signalling Based NAT Traversal,” Third International Conference on Communication Systems and Networks, pp. 1-10, Jan. 2011 A. Kondoz, N. Katugampala, K. T. Al-Naimi, S. Villette, Data Transmission, WIPO Patent GB05/001729, November 17, 2005. M. Boloursaz, R. Kazemi, D. Nashtaali and F. Behnia, “Secure Data over GSM based on Algebraic Codebooks,” 10th IEEE East-West Design & Test Symposium, pp. 409-412, September 2012. V. Sapozhnykov and S. Fienberg, “A Low-rate Data Transfer Technique for Compressed Voice Channels,” Journal of Signal Processing Systems, vol. 68, no. 2, pp. 151-170, 2012 A. Shahbazi, A. H. Rezaie, A. Sayadiyan, and S. Mosayyebpour, “A novel speech-like symbol design for data transmission through GSM voice channel,” IEEE International Symposium on Signal Processing and Information Technology, pp. 478-483, Dec. 2009 T. Chmayssani and G. Baudoin, “Data transmission over voice dedicated channels using digital modulations,” 18th International Conference on Radioelektronika, pp. 1-4, April 2008 T. Chmayssani and G. Baudoin, “Performances of digital modulations for data transmission over voice dedicated channels,” 50th International Symposium ELMAR, vol.1, pp. 273-276, Sept. 2008 A. Dhananjay, A. Sharma, M. Paik, J. Chen, T. K. Kuppusamy, J. Li, and L. Subramanian, “Hermes: data transmission over unknown voice channels,” In MobiCom ’10: Proceedings of the sixteenth annual international conference on Mobile computing and networking, pp. 113– 124, 2010. ETSI, Digital Cellular Telecommunications Systems (Phase 2+), Adaptive Multi-Rate (AMR) speech transcoding, (GSM 06.90 version 7.2.1 Release 1998). A. El Gamal, Y. H. Kim, Network Information Theory, Cambridge University Press, 2011, pp.39–43. ETSI, Digital Cellular Telecommunications Systems (Phase 2+), Adaptive Multi-Rate (AMR) speech transcoding, ANSI-C code for AMR speech codec, (GSM 06.90 version 7.2.1 Release 1998).