IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 65, NO. 11, NOVEMBER 2016
Capacity Bounds and Detection Schemes for Data Over Voice Reza Kazemi, Member, IEEE, Mahdi Boloursaz, Member, IEEE, Seyed M. Etemadi, and Fereydoon Behnia
Abstract—Cellular networks provide widespread and reliable voice communications among subscribers through mobile voice channels. These channels benefit from superior priority and higher availability compared with conventional cellular data communication services, such as General Packet Radio Service, Enhanced Data Rates for GSM Evolution, and High-Speed Downlink Packet Access. These properties are of major interest to applications that require transmitting small volumes of data urgently and reliably, such as an emergency call in vehicular applications. This encourages excessive research to make digital communication through voice channels feasible, leading to the emergence of Data over Voice (DoV) technology. In this research, we investigate the challenges of transmitting data through mobile voice channels. We introduce a simplified information-theoretic model of the vocoder channel and derive bounds on its capacity. By invoking detection theory concepts and conjecturing Weibull and chi-square distributions for approximately modeling the probability distribution of channel output, we propose improved detection schemes based on the mentioned distributions and compare the achieved performances with the calculated bounds and other state-of-the-art DoV structures. Moreover, in common mobile networks, the vocoder compression rate is adopted in accordance with the network traffic adaptively. Although this phenomenon affects the overall capacity significantly, it has been overlooked by previous research studies. In this research, we apply the Gilbert–Elliott (GE) model to the voice channel, extract the required model parameters from the Markov model, and bound the overall voice channel capacity by considering the adaptive rate adjustment phenomenon. Index Terms—Gilbert–Elliott (GE) channel model, Markov model, voice channel capacity, Weibull detection scheme.
I. INTRODUCTION
With the rapid increase in the popularity of mobile voice communications, the cellular network infrastructure has become widely available nowadays. This has encouraged research studies to look for methods and strategies to take full advantage of this omnipresent communication network. As an effort to utilize this widely spread communication network, this paper focuses on the feasibility of utilizing mobile voice
Manuscript received October 6, 2014; revised November 30, 2015; accepted January 12, 2016. Date of publication January 20, 2016; date of current version November 10, 2016. The review of this paper was coordinated by Prof. O. B. Akan. R. Kazemi, S. M. Etemadi, and F. Behnia are with the Department of Electronic and Electrical Engineering, Sharif University of Technology, Tehran, Iran (e-mail:
[email protected]). M. Boloursaz is with the Department of Electronic and Electrical Engineering, Sharif University of Technology, Tehran, Iran, and also with the Advanced Communications Research Institute (ACRI), Tehran, Iran. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVT.2016.2519926
channels for Data over Voice (DoV) communications. This application is motivated by the fact that mobile voice channels benefit from broader coverage and higher priority compared with conventional data channels. Thus, they are generally preferred in cases of emergency data transmissions. However, we emphasize that the proposed DoV solution is certainly not intended as a competitor or substitute for mobile network data transmission protocols, such as General Packet Radio Service and High-Speed Downlink Packet Access. Instead, we consider it as a supplementary system to be utilized in specific situations that require higher availability. Important instances of such situations are briefly stated in the following. One of the well-known vehicular applications, which is envisaged to operate in both congestion periods and outlying areas, is the so-called emergency call (eCall) service. This service offers transmission of automated messages over the cellular network voice channel to the public safety answering point following a road crash. These messages include a minimum set of data on the precise crash location, airbag deployment status, and other relevant information. An implementation of this system is the pan-European eCall system scheduled for deployment in 2018 [1], [2]. The WikiWalk system is another application in which the mobile phone is equipped with a device to modulate the location data and transmit it via the mobile voice channel. This service enables the WikiWalk server to provide the user with appropriate voiced guidance instructions [3]. In addition to the aforementioned applications, other applications that utilize the mobile voice channel for DoV communications are addressed in the following. Katugampala et al. [4]–[6] utilized this technique to securely transmit end-to-end encrypted voice or data through a mobile voice channel.
A new method for transmitting Point of Sale (POS) transactions from POS terminals to financial hosts over a mobile voice channel was introduced in [7]. This technology has also been introduced for Network Address Translator (NAT) traversal, allowing mobile users to set up a direct peer-to-peer connection with minimal user intervention and without connecting to a middle server by making an end-to-end hole through the NATs [8]. Other possible applications are online text and multimedia messaging, telemetry, and real-time monitoring systems [9]. In Fig. 1, we show the mentioned applications. Note that all these applications rely on utilizing the mobile voice channel as a medium for low-rate high-priority digital communications (DoV technique) [10]. Thus, the DoV concept deals with efficient methods of data transmission through commonly used voice channels in mobile networks. The study of DoV started
0018-9545 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
KAZEMI et al.: CAPACITY BOUNDS AND DETECTION SCHEMES FOR DATA OVER VOICE
Fig. 1. Some applications of DoV technology.
with the pioneering work of Katugampala et al. [4] in 2003 and was followed by other research studies as previously addressed. Particularly, the international patents registered by Preston et al. in 2006 [11], Dorr in 2007 [12], Eatherly et al. in 2011 [13], and Kondoz et al. in 2013 [14] and the standards issued by the European Telecommunications Standards Institute in 2009 and 2011 [15], [16] further paved the way to utilizing this technology in practical and commercial applications. Despite the aforementioned advantages, data transmission through the mobile voice channel is a challenging task for the following reasons.
• The voice coding process executed in the vocoder is a lossy compression technique that only guarantees perceptual resemblance of the original and the reconstructed signals to the human auditory system. The two signals may, in general, be significantly different on a sample-by-sample basis.
• The scalar and vector quantizations applied to speech parameters in the coding process are a source of additional data loss for the original voice waveform.
• Differential encoding of the speech parameters in the vocoder generates a long and unpredictable dependence relation between the encoded symbols in the output stream.
• The encoding of speech parameters is tailored for highly correlated signals. This is a valid assumption for voice but not for common data signals.
• The voice encoding process is performed frame by frame, and therefore, each sample depends on its preceding and following samples in the same frame. The dependence of a specific sample on its subsequent samples implies anticausality, which is another challenge to overcome for efficient data transmission.
• Adaptive adjustment of the vocoder compression rate during a conversation according to the network traffic is yet another source of voice channel memory that has to be considered.
The previous works on utilizing the mobile voice channel in DoV communications have generally taken three different
approaches to overcome channel nonidealities. These approaches, known as parameter adaption, codebook optimization, and modulation optimization methods, are briefly described in the following. The parameter adaption method involves mapping of the input bit stream into speech parameters of a speech production model. Katugampala et al. [6], Ozkan et al. [17], and Rashidi et al. [18] considered pitch or formant frequency, line spectral frequencies, and energy of speech frames as parameters to carry the input data bits. Yang et al. [19] and Kotnik et al. [7] utilized the linear prediction coding and autoregressive speech model, respectively. A similar method was introduced in [20], which applied the algebraic code excited linear prediction speech coding technique. So far, the highest data rate among these methods is 3 kb/s with the bit error rate (BER) of 1.2% achieved in [14] in the GSM enhanced full rate (EFR) voice channel. The codebook optimization approach employs a set of predefined waveforms (symbols) as codebook. These waveforms are optimized prior to transmission for an improved performance. For this purpose, LaDue et al. [21] and Sapozhnykov and Fienberg [9] applied the genetic algorithm and the pattern search algorithm, respectively. Shahbazi et al. [22] used exhaustive search over observed human speech waveforms to find optimal symbols. Boloursaz et al. [23] modeled the adaptive multirate (AMR) voice codec by a discrete memoryless channel (DMC) and proposed a heuristic algorithm that optimized the codebook for maximized DMC capacity. The highest achieved data rate of this category was 4 kb/s with a BER of 2.5%. Finally, the modulation optimization scheme is based on common digital modulation techniques and optimizes the involved parameters for efficient communication through the voice codec. Chmayssani and Baudoin [24] used conventional frequency-shift keying (FSK) and quadrature amplitude modulation techniques. Zdenko et al. [25] and Dhananjay et al. 
[26] proposed the phase-continuous context-dependent orthogonal frequency-division multiplexing (OFDM) amplitude-shift keying and Hermes (modified FSK) methods, respectively. Finally, Chen and Guo [27] proposed the OFDM-based method of modulating the input bit stream on orthogonal multifrequency sinusoidal carriers. The highest achieved data rate of this group is 3 kb/s with a BER of $3 \times 10^{-3}$ on the GSM EFR voice channel. The three aforementioned methods share common imperfections. First, they all presume the decision rule a priori and optimize the symbols or other parameters of the model to achieve a locally optimum performance. For instance, the “parameter adaptation” method minimizes a certain voice parametric distance measure as the symbol detection criterion. Similarly, the “codebook optimization” method introduced in [21] presumes maximum correlation as the decision rule, which works only for i.i.d. additive white Gaussian noise. Finally, the “modulation optimization” method is limited to conventional digital modulation techniques. These limitations and prior assumptions may preclude achieving a globally optimum detector. Second, as previously mentioned, differential encoding of the extracted speech parameters in the vocoder compression process causes long and unpredictable symbol dependence relations. The aforementioned approaches mostly ignore this
effect in design and performance analysis. This, in turn, leads to a noticeable degradation in the performance of the above structures. Third, all of the mentioned approaches disregard the perturbations that occur in practical situations. As a major example, the previous works do not discuss the effects of adaptive compression rate adjustment of the vocoder during a conversation. As outlined above, previous studies have suggested different approaches to modulate data and have reported different qualities. There are few prior works based on information theory concepts. In this regard, Boloursaz et al. [28] introduced and optimized the maximum achievable error-free rate as a lower bound on the capacity of the vocoder channel. Kazemi et al. [29] also calculated a lower bound on the capacity of the voice codec. This paper aims at proposing an efficient scheme for data communication through a mobile voice channel that not only addresses the aforementioned concerns but also offers superior performance compared with the state-of-the-art structures. To this end, we first focus on nonideal long-term dependence relations of the channel outputs and derive analytical bounds on vocoder capacity considering this effect. These bounds are then statistically calculated by simulations on the vocoder channel. We present the simulation results for the AMR voice codec that is commonly used in different generations of cellular networks [32]. Inspired by the presented bounds and invoking the maximum-likelihood detector (MLD), we propose new close-to-optimum detection schemes based on Weibull and chi-square distributions. Furthermore, the achieved rates are compared with the calculated lower bound as well as the performances reported by previous works in terms of the maximum achievable error-free rate (MAER) [46]. Performance evaluation indicates that our proposed detection schemes approach the lower bound derived on vocoder capacity.
We further address the vocoder rate adjustment problem and present an analytical investigation. For this purpose, we invoke queuing theory concepts to model the vocoder rate transition and apply the Gilbert–Elliott (GE) model to calculate the overall voice channel capacity. Notation: Throughout this paper, we denote scalar parameters and vectors by regular (not boldface) lowercase and uppercase letters, respectively, e.g., $x$ and $X$, and boldface lowercase letters denote random variables, e.g., $\mathbf{x}$. Moreover, matrices are denoted by boldface uppercase letters, e.g., $\mathbf{X}$, whereas the corresponding regular lowercase letters, with subscript indexes, represent the entries, e.g., $x_{1,2}$. Calligraphic letters are used for functions and mappings, e.g., $\mathcal{X}$. The probability distribution function of a random variable is denoted by $p(\cdot)$, and the probability of occurrence of a single event is denoted by $\Pr(\cdot)$. Furthermore, $H(\cdot)$, $I(\cdot\,;\cdot)$, and $E[\cdot]$ are the entropy, mutual information, and expected value of random variables. $C$ and $R$ represent the capacity and achievable rate through channels, and $h(\cdot)$ is the binary entropy function. Moreover, $1(\cdot)$ indicates 0 if its Boolean argument is false and 1 otherwise. Finally, $\|\cdot\|$ indicates the 2-norm of a vector, $\langle\cdot,\cdot\rangle$ represents the inner product of vectors, and $(\cdot)_N$ denotes a number in the radix of $N$ (i.e., $(ijk)_N = i \times N^2 + j \times N + k$). The rest of this paper is organized as follows. In Section II, we describe the vocoder channel model from an information-theoretic point of view. The model is followed by an outline on
Fig. 2. (a) Conceptual diagram of the vocoder channel. (b) Simplified vocoder channel model.
channel capacity derivations. We present our detection schemes in Section III. The equivalent GE channel is discussed in Section IV, where we also derive a lower bound on the channel capacity by considering the vocoder rate transition effects. Finally, Section V concludes this paper.
II. BOUNDS ON VOCODER CAPACITY
Here, the vocoder channel is studied from the information theory perspective, and bounds on its capacity are proposed. The derived bounds are analytically proved in Section II-A and numerically calculated in Section II-B by performing statistical simulations on the AMR voice codec.
A. Bounds and Proofs
A conceptual model of the vocoder channel is shown in Fig. 2(a). In this block diagram, each output sample of the AMR voice codec ($\hat{s}_i$) depends probabilistically on the corresponding input sample $s_i$ and the memory state $m_i$ through the density function $p(\hat{s}_i \mid s_i, m_i)$. The channel memory state models the symbol dependence relations caused by differential encoding in the vocoder and depends on the previous memory state and the input through the function $m_i = \mathcal{F}(s_{i-1}, m_{i-1})$. In this diagram, $D$ represents a unit delay block. A more simplified version of the given channel model, including channel encoder and decoder blocks, is shown in Fig. 2(b). In this model, $m_i$ is expanded to its terms that are the $l$ previous inputs $s_{i-1}, s_{i-2}, \ldots, s_{i-l}$ if the channel memory effects are assumed negligible after a distance exceeding $l$
channel uses. In the following theorem, we prove an upper bound on overall voice channel capacity based on the given simplified model.
Theorem 1: For the vocoder channel with memory modeled by the block diagram in Fig. 2(b), the following upper bound holds on the capacity:
$$C \le \max_{p(s_1, s_2, \ldots, s_{l+1})} I(s_1, s_2, \ldots, s_{l+1}; \hat{s}).$$
Converse Proof: See Appendix A.
Theorem 2: For the vocoder channel with memory modeled by the block diagram in Fig. 2(b), the following lower bound holds on the capacity:
$$\max_{p(s)} I(s; \hat{s}) \le C.$$
Achievability Proof: See Appendix B.
B. Numerical Results
Since the statistical model of the vocoder channel $p(\hat{s} \mid s)$ is not analytically tractable, a numerical calculation of the lower bound derived in the previous section is not straightforward. In the following, a numerical approach to calculate the mentioned lower bound is offered. In this approach, we assume that the input data stream is modulated on two equiprobable symbols with $n$-sample length (i.e., binary signaling). Now, assume that a total of $k$ symbols are transmitted through the vocoder channel. Denoting the $j$th output symbol by vector $\hat{S}_j$ and the $i$th sample of this symbol by $\hat{s}_{i,j}$, any output symbol represents a point in $\mathbb{R}^n$ as
$$\hat{S}_j = [\hat{s}_{1,j}, \hat{s}_{2,j}, \ldots, \hat{s}_{n,j}].$$
Now, divide the dynamic range of the output signal on each axis into $t$ equal-length portions leading to $t^n$ isometric cubes in $\mathbb{R}^n$. Note that this dynamic range on the $m$th axis is defined as the distance between the infimum and supremum of the output signal on that axis given by $l_m$ and $u_m$ in the following:
$$u_m = \sup_j (\hat{s}_{m,j}), \qquad l_m = \inf_j (\hat{s}_{m,j}). \tag{1}$$
Furthermore, we define the location distribution matrix $P$ as
$$P = [p_{q_1,\ldots,q_n}]_{\underbrace{t \times t \times \cdots \times t}_{n}}, \qquad q_m \in \{1, \ldots, t\}.$$
In the given equation, each element of matrix $P$ represents an empirical probability of an output symbol to be located in the corresponding cube. We assume that the total number of the transmitted symbols $k$ is statistically sufficient to yield accurate estimates of the exact location distribution matrix and hence estimate these empirical probabilities as in
$$p_{q_1,\ldots,q_n} = \frac{1}{k} \sum_{j=1}^{k} \prod_{m=1}^{n} 1\!\left( l_m + (q_m - 1) \times \frac{u_m - l_m}{t} \le \hat{s}_{m,j} < l_m + q_m \times \frac{u_m - l_m}{t} \right) \tag{2}$$
in which $q_m \in \{1, \ldots, t\}$. In other words, the proposed location distribution matrix $P$ represents the empirical distribution of the channel output. To elaborate further, we have partitioned the continuous $n$-dimensional space of the channel output into $t^n$ equal-volume segments to form a discrete set of channel output symbols of size $t^n$ and store the empirical probability of these symbols in $P$. We derive these empirical probabilities by feeding the channel with a statistically sufficient amount of equiprobable input symbols and evaluating the probabilities as in (2). Hence, the location distribution matrix $P$ gives the probability distribution of the channel output and helps calculate $H(\hat{s})$ as
$$H(\hat{s}) = -\sum_{q_1, q_2, \ldots, q_n} p_{q_1,q_2,\ldots,q_n} \times \log_2 p_{q_1,q_2,\ldots,q_n}.$$
Taking a similar approach, we can derive the location distribution matrix for the special cases in which the channel input is fixed at any of the binary input alphabets and estimate the empirical probabilities of the channel output conditioned on each of its input alphabets. Hence, $H(\hat{s} \mid s)$ can also be estimated, and consequently, $I(s; \hat{s})$ can be calculated as $I(s; \hat{s}) = H(\hat{s}) - H(\hat{s} \mid s)$. We observe that by increasing $t$ while guaranteeing the statistical sufficiency of $k$, which is roughly given by (3) [47], $I(s; \hat{s})$ converges to an asymptotic value in each case. This asymptotic value can be considered as a lower bound on vocoder capacity per channel use. The sufficiency condition on $k$ is
$$k > \frac{10}{\min(P)}. \tag{3}$$
To compare the derived lower bound with the previously mentioned rates in the literature, it should be reported as the achievable error-free rate $R$ in bits per second according to
$$R = I(s; \hat{s}) \times f_s \tag{4}$$
where fs denotes the vocoder sampling frequency (8000 for AMR). It also gives the symbol rate (in symbols per second) provided by the vocoder. This achievable rate is sketched versus t for different values of symbol length n and for both AMR values 12.2 and 4.75 in Fig. 3. According to this figure, the lower bounds of 3711 b/s for AMR 12.2 and 950 b/s for AMR 4.75 are achievable by using antipodal modulations of lengths 2 and 4 on these vocoders, respectively. As customary in binary signaling, the probabilities of transmitting symbols are denoted by p and 1 − p. Considering this denotation, to investigate the “capacity symmetric property” of the vocoder channel, I(s; ˆs) is plotted versus p in Fig. 4. As observed in this figure, the symmetric assumption is legitimate for the vocoder channel; hence, it can be approximated by a binary symmetric channel with negligible performance degradation for both compression rates of 12.2 and 4.75 kb/s.
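The histogram-based estimate of $I(s; \hat{s})$ described in Section II-B can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the AMR codec is replaced by a surrogate additive Gaussian noise channel, and the function names (`location_distribution`, `entropy_bits`, `mutual_information`) as well as the values of `k`, `t`, and the noise level are hypothetical choices made only for this example.

```python
import numpy as np

def location_distribution(symbols, lower, upper, t):
    """Empirical probability of each of the t**n cubes, following (2)."""
    k, n = symbols.shape
    width = (upper - lower) / t
    # Bin index q_m - 1 in {0, ..., t-1} for every sample of every symbol.
    idx = np.floor((symbols - lower) / width).astype(int)
    idx = np.clip(idx, 0, t - 1)  # the supremum itself goes to the last bin
    flat = np.ravel_multi_index(tuple(idx.T), (t,) * n)
    return np.bincount(flat, minlength=t ** n) / k

def entropy_bits(p):
    """Entropy in bits of a discrete distribution, ignoring empty cubes."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(outputs, labels, t):
    """Plug-in estimate of I(s; s_hat) = H(s_hat) - H(s_hat | s)."""
    lower = outputs.min(axis=0)  # l_m in (1)
    upper = outputs.max(axis=0)  # u_m in (1)
    h_out = entropy_bits(location_distribution(outputs, lower, upper, t))
    h_cond = 0.0
    for s in np.unique(labels):
        mask = labels == s
        p_cond = location_distribution(outputs[mask], lower, upper, t)
        h_cond += mask.mean() * entropy_bits(p_cond)
    return h_out - h_cond

# Surrogate experiment: binary antipodal signaling with n = 2 samples.
# ASSUMPTION: Gaussian noise stands in for the real vocoder channel.
rng = np.random.default_rng(0)
k, n, t = 20000, 2, 8
labels = rng.integers(0, 2, size=k)
codebook = np.array([[-0.1, 0.1], [0.1, -0.1]])
outputs = codebook[labels] + 0.05 * rng.standard_normal((k, n))
info = mutual_information(outputs, labels, t)
print(f"estimated I(s; s_hat) = {info:.3f} bits per symbol")
```

With binary inputs the estimate cannot exceed 1 bit per symbol. In practice one would repeat the computation for increasing `t` and check the sufficiency condition (3) before trusting the value.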
III. IMPROVED DETECTION SCHEMES
The vocoder channel, as previously described, comprises a variety of nonlinear blocks [9], [21]. Designing a perfect and optimum modem for this channel is therefore prohibitively complicated. Accordingly, we design a close-to-optimum modem for this channel in the following. To do so, the simplified model of the vocoder channel presented in Fig. 2(a) is taken into account. To design the suboptimum modem, an appropriate decoder should be designed to correctly detect the transmitted symbols. Considering the vocoder memory, this decoder could be derived according to maximum-likelihood detection, i.e.,
$$\hat{u} = \arg\max_{u \in \{1, \ldots, 2^{nR}\}} p(\hat{S}_j \mid S^u, \hat{S}_{j-1}, \ldots, \hat{S}_1) \tag{5}$$
in which $2^{nR}$ denotes the total number of codebook symbols (i.e., $2^{nR}$-ary signaling), and $S^u$ indicates the $u$th symbol among them. Moreover, similar to the previous section, $\hat{S}_j$ represents the $j$th symbol at the channel output. Since this probability is a function of all samples of the previously transmitted symbols, it is intricate to solve this optimization directly. The two main reasons are as follows.
1) To the best of our knowledge, as the vocoder memory denoted by function $\mathcal{F}$ in Fig. 2 has not been mathematically modeled so far, symbol dependence relations of this channel do not possess a mathematically tractable behavior; hence, it would be a major problem to obtain an analytical representation for $p(\hat{S}_j \mid S_j, \hat{S}_{j-1}, \ldots, \hat{S}_1)$.
2) For a similar reason, the joint probability distributions of samples for each symbol are not modeled either, and it is not straightforward to attain $p(\hat{S}_j \mid S_j, \hat{S}_{j-1}, \ldots, \hat{S}_1)$.
Fig. 3. Achievable error-free rate curves in (a) AMR 12.2 and (b) AMR 4.75 vocoder channel.
Fig. 4. Normalized mutual information over p.
Accordingly, two steps should be taken to overcome the mentioned problems. First, the channel memory effect should be eliminated through dispersion, which was accomplished by the interleaving technique akin to the achievability proof of the lower bound in Section II-A. Second, regarding the second concern, a proprietary and well-defined statistic should be proposed based on the samples of each symbol to facilitate the derivation of a straightforward decision rule. As previously noted, the joint distribution of the output samples for each symbol is neither known nor discriminative enough to be mathematically tractable. Hence, we need to propose a statistic with a well-defined mathematical behavior that also contains the most possible amount of information transmitted by all samples. Considering a square Euclidean distance from the mean of the output symbols as our statistic turns out to be an appropriate case for the aforementioned dilemma, which not only does not miss crucial information of output samples but also benefits from a well-defined mathematical probability distribution. Proposition 1: For orthogonal signaling, the Euclidean distance is a sufficient statistic or, better yet, it contains all necessary information from the output samples. Proof: Assuming a codebook of n orthogonal symbols with length m 2nR , we have an output signal space of, at most, 2nR dimensions. Hence, the codebook symbols form a basis for the output signal space. Let rhu denote the Euclidean
distance between the output symbol (i.e., $\hat{S}$) and the $h$th codebook symbol (i.e., $S^h$) subject to transmission of the $u$th codebook symbol (i.e., $S^u$). Now, subject to the transmission of $S^u$, by expanding the Euclidean distance statistic (i.e., $r^{hu}$) as
$$r^{hu} = \|\hat{S}\|^2 + \|S^h\|^2 - 2\langle S^h, \hat{S}\rangle \tag{6}$$
we can deduce that since the $S^h$ are assumed to have equal energy levels, the first two terms in $r^{hu}$, i.e., $(\|\hat{S}\|^2 + \|S^h\|^2)$, are common among all $S^h$ and can be omitted in the decision process. In conclusion, the $r^{hu}$ provide projections of the output signal on all bases of the signal space (the $S^h$); hence, no information will be ignored.
With regard to Proposition 1 and considering $r^{hu}$ as the proposed statistic, the concern about losing information by applying the proposed statistic is addressed. Next, to investigate the second concern, namely the unknown probability distribution, let us elaborate by deriving the MLD regarding the proposed statistic as
$$\hat{u} = \arg\max_{u} p\left(r^1, \ldots, r^{2^{nR}} \mid S^u\right). \tag{7}$$
Considering orthogonal codebook symbols while defining $r^h = \|\hat{S} - S^h\|^2$, it is legitimate to assume that $p(r^h \mid S^u) = p(r^{hu})$ are independent for all $h \in \{1, 2, \ldots, 2^{nR}\}$. Hence, $p(r^1, \ldots, r^{2^{nR}} \mid S^u)$ can be expanded as
$$p\left(r^1, \ldots, r^{2^{nR}} \mid S^u\right) = \prod_{h=1}^{2^{nR}} p(r^h \mid S^u) = p(r^{uu}) \prod_{h=1, h \ne u}^{2^{nR}} p(r^{hu}). \tag{8}$$
To proceed with further mathematical analysis, we need to make assumptions about the probability distribution of the proposed statistic [i.e., $p(r^{hu})$]. To do so, we consider two cases of Weibull and chi-square assumptions for approximating $p(r^{hu})$, and in the next step, we will verify the correctness and goodness of these conjectures by conducting the corresponding simulations. We consider the chi-square assumption as a conjecture with respect to the pseudo-Gaussian property of this channel [21] and recalling the premise that the square sum of pseudo-Gaussian samples can be approximated by either a central or a noncentral chi-square distribution [30]. Moreover, the intuition behind conjecturing a Weibull assumption corresponds to its flexibility and consistency in modeling positive semi-Gaussian distributions [31].
A. Weibull Assumption
Here, we verify the Weibull assumption for the proposed statistic and derive a close-to-optimum detector based on that. To this end, we modulate a statistically sufficient amount of random data on two orthogonal symbols of length 2 ($n = 2$) and then transmit them through the AMR vocoder channel simulator [32]. The scaled output histogram of the proposed statistic is then sketched in Fig. 5. As expected, for both compression rates of 12.2 and 7.4 kb/s, the histograms extracted for $r^{00}$ and $r^{11}$ have a lower mean value
Fig. 5. Empirical probability distributions of the suggested statistic (a) AMR 12.2 and (b) AMR 7.4.
compared with $r^{10}$ and $r^{01}$. Both mentioned histograms can be approximated by a Weibull probability distribution given by
$$p_w(x; \lambda, \theta) = \begin{cases} \dfrac{\theta}{\lambda} \left(\dfrac{x}{\lambda}\right)^{\theta - 1} e^{-\left(\frac{x}{\lambda}\right)^{\theta}}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where $\theta > 0$ is the shape parameter, and $\lambda > 0$ is the scale parameter of the probability density function [33]. We also conduct a two-sample Kolmogorov–Smirnov test [34] on channel output samples, which verifies the goodness of Weibull fit at the level of 0.05 in all cases. To check whether these assumptions are valid if $n > 2$, we conducted several simulations for $n = 4, 8, 16$, while utilizing different orthogonal symbols in each case. According to the simulation results, the achieved parameters for all $h \ne u$ in each case were almost the same, and the deviations in all cases were less than 3%. Leveraging this outcome, we show in Appendix C that the Weibull assumption yields the following detection rule:
$$\hat{u} = \arg\max_{u \in \{1, \ldots, 2^{nR}\}} \left[ p_w(r^u; \lambda_1, \theta_1) \prod_{h=1, h \ne u}^{2^{nR}} p_w(r^h; \lambda_2, \theta_2) \right]. \tag{9}$$
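As an illustration of how a rule of the form (9) could be evaluated, the sketch below computes the decision in the log domain, which avoids numerical underflow of the product. It assumes the statistic $r^h = \|\hat{S} - S^h\|^2$ defined in the text; the function names and all numerical parameter values ($\lambda_1$, $\theta_1$, $\lambda_2$, $\theta_2$, the codebook) are hypothetical placeholders chosen for the example and are not the fitted values from the paper.

```python
import numpy as np

def weibull_logpdf(x, lam, theta):
    """log p_w(x; lam, theta) for x > 0 (scale lam, shape theta)."""
    x = np.asarray(x, dtype=float)
    return np.log(theta / lam) + (theta - 1) * np.log(x / lam) - (x / lam) ** theta

def weibull_detect(received, codebook, lam1, theta1, lam2, theta2):
    """Return the index u that maximizes rule (9), computed with logs."""
    # r[h]: squared Euclidean distance between the output and symbol h.
    r = np.sum((codebook - received) ** 2, axis=1)
    r = np.maximum(r, 1e-12)  # guard against log(0)
    log_match = weibull_logpdf(r, lam1, theta1)  # term used when h == u
    log_other = weibull_logpdf(r, lam2, theta2)  # term used when h != u
    # score[u] = log_match[u] + sum over h != u of log_other[h]
    score = log_match + (log_other.sum() - log_other)
    return int(np.argmax(score))

# Hypothetical example: two orthogonal symbols of length 2.
codebook = np.array([[0.1, 0.0], [0.0, 0.1]])
params = dict(lam1=0.002, theta1=1.2, lam2=0.02, theta2=2.0)  # assumed values
print(weibull_detect(np.array([0.09, 0.01]), codebook, **params))
print(weibull_detect(np.array([0.01, 0.09]), codebook, **params))
```

In a real receiver the two parameter pairs would be fitted to measured histograms of the statistic, such as those in Fig. 5, for each compression rate.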
TABLE I ACHIEVED ERROR PROBABILITIES IN THE CASE OF WEIBULL ASSUMPTION
TABLE II ACHIEVED ERROR PROBABILITIES IN THE CASE OF CHI-SQUARE ASSUMPTION
As an expressive instance, for the binary signaling case, the given decision rule can be simplified as follows by doing some algebraic simplifications:
$$\left[r^1\right]^{(\theta_1 - \theta_2)} \cdot e^{\left(\frac{r^1}{\lambda_2}\right)^{\theta_2} - \left(\frac{r^1}{\lambda_1}\right)^{\theta_1}} \underset{S^2}{\overset{S^1}{\gtrless}} \left[r^2\right]^{(\theta_1 - \theta_2)} \cdot e^{\left(\frac{r^2}{\lambda_2}\right)^{\theta_2} - \left(\frac{r^2}{\lambda_1}\right)^{\theta_1}} \tag{10}$$
and by applying the natural logarithm and doing some mathematical operations, we have
$$\ln r^1 \cdot (\theta_1 - \theta_2) + \left(\frac{r^1}{\lambda_2}\right)^{\theta_2} - \left(\frac{r^1}{\lambda_1}\right)^{\theta_1} \underset{S^2}{\overset{S^1}{\gtrless}} \ln r^2 \cdot (\theta_1 - \theta_2) + \left(\frac{r^2}{\lambda_2}\right)^{\theta_2} - \left(\frac{r^2}{\lambda_1}\right)^{\theta_1}. \tag{11}$$
Finally, utilizing the given detection rule, the empirical error probability for $n = 2$ is attained by simulations as in Table I. It should be noted that in conducting this simulation, we utilized the Monte Carlo method while averaging over ten trials. We also considered two equiprobable symbols with two samples each, i.e., $S^1 = -S^2 = [-A, A]$. Although, due to normalization, the result is not sensitive to the value of $A$, we assume $A = 0.1$ in our simulations. Moreover, the achieved variance for the mentioned error probabilities was less than 3%; hence, the 95% confidence interval will be BER ± 6% in each case, e.g., (0.2359, 0.2661), (0.0784, 0.0986), and (0.0141, 0.0159) for the values in Table I.
B. Chi-Square Assumption
Here, we derive a close-to-optimum detector by using the chi-square assumption for the proposed statistic. In [21], the output samples of each symbol are assumed to have i.i.d. normal distributions. According to the classic theory of probability, the sum of squares of $n$ independent normal random variables forms a chi-square (central or noncentral) random variable with $n$ degrees of freedom [35]. Exploiting this knowledge, we get the following distributions for the proposed Euclidean distance statistic:
$$f(r^h \mid S^u) = \begin{cases} \dfrac{1}{\alpha_1^{n} 2^{n/2} \Gamma\left(\frac{n}{2}\right)} (r^h)^{\frac{n}{2} - 1} e^{-\frac{r^h}{2\alpha_1^2}}, & h = u \\[2ex] \dfrac{1}{2\alpha_2^2} e^{-\frac{r^h + \beta}{2\alpha_2^2}} \left(\dfrac{r^h}{\beta}\right)^{\frac{n}{4} - \frac{1}{2}} I_{\frac{n}{2} - 1}\!\left(\sqrt{\dfrac{\beta r^h}{\alpha_2^4}}\right), & h \ne u \end{cases} \tag{12}$$
where $\Gamma(\cdot)$ and $I_n(\cdot)$ are the gamma function and the modified Bessel function, respectively. Furthermore, $\beta = \|S^{h \ne u} - S^u\|^2$, and $\alpha_1$, $\alpha_2$ are constants depending on the characteristics of the applied vocoder.
We stress in Appendix D that the chi-square assumption yields the well-known minimum distance detector of
$$\hat{u} = \arg\min_{u} \left[r^u\right]. \tag{13}$$
Finally, utilizing the derived detection rule, the empirical error probability for $n = 2$ is calculated by simulations, as in Table II.
C. Performance Comparison
As stated in Section I, the best result for each method is achieved under specific transmission rates. As an example, in the case of the parameter mapping method, Kondoz et al. suggested a method that could transmit the bit stream at the rate of 3000 b/s with the BER of 0.012, whereas Sapozhnykov et al. (i.e., the best result for the codebook optimization method) offered a method that operates at the rate of 4000 b/s with the BER of 0.025. Considering these differences, to compare the proposed detection schemes with the previous works studied in Section I, we need to apply a fair benchmark that can handle the aforementioned issue. To do so, the MAER for each technique is reported in Table III as a fair performance benchmark. MAER can be computed as $R(1 - h(p))$, where $R$ indicates the transmission bit rate (i.e., second column in Table III), and $p$ denotes the BER (i.e., third column in Table III) at the mentioned transmission bit rate (i.e., $R$). In fact, MAER represents the equivalent rate at which the input bits can be transmitted without errors. As listed in Table III, the proposed schemes not only offer superior results compared with state-of-the-art structures, but their MAER performances also approach the lower bound obtained in Section II-B.
IV. ADAPTIVE RATE TRANSITION EFFECTS
As observed in Section II-B, different vocoder compression rates introduce different levels of distortion to the voice signal, causing them to offer different capacities for digital communication. Thus, different rates are achievable over the high-quality (full rate) and low-quality (half rate) channel states representing AMR compression rates of 12.2 and 4.75 kb/s, respectively.
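The MAER benchmark used in Section III-C, $R(1 - h(p))$, is simple to evaluate. The following sketch applies it to the two rate and BER pairs quoted in the text. The function names are hypothetical, and the script only reproduces the formula; it does not reproduce the entries of Table III.

```python
import math

def binary_entropy(p):
    """Binary entropy function h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def maer(rate_bps, ber):
    """Maximum achievable error-free rate R * (1 - h(p)) in bits per second."""
    return rate_bps * (1.0 - binary_entropy(ber))

# Rate/BER pairs quoted in the text for two earlier methods.
for rate, ber in [(3000, 0.012), (4000, 0.025)]:
    print(f"R = {rate} b/s, BER = {ber}: MAER = {maer(rate, ber):.0f} b/s")
```

For these two pairs the formula gives roughly 2.7 kb/s and 3.3 kb/s, which shows how a higher raw rate can still win after the BER penalty is applied.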
In this research, we model the overall voice channel by an equivalent GE channel that switches between full rate and half rate according to a two-state Markov model. Here, we first introduce the GE channel and its capacity. Then, we derive the corresponding Markov model for the full/half-rate state change and use it to compute the overall voice channel capacity.

A. GE Channel

The GE channel is a two-state binary symmetric channel with transition probabilities as shown in Fig. 6 [36], [37]. This type of channel emerged in the early works by Gilbert and Elliott to model time-varying channels with memory [38], [39].

TABLE III
MAER COMPARISON BETWEEN THE PROPOSED SCHEMES, THE PREVIOUS WORKS, AND THE ACHIEVED LOWER BOUND

Fig. 6. Equivalent GE model for voice channel.

Definition 1: Denote by $b_j \in \{0, 1\}$ and $\hat{b}_j \in \{0, 1\}$ the $j$th input and output bits of the GE channel, respectively. If we define the error process $\{e_j\}_{j=1}^{\infty}$ as $e_j = b_j \oplus \hat{b}_j$, it satisfies the following conditions. First, the channel input and error processes are independent, i.e.,

$$\Pr\left(\{\hat{b}_j\}_{j=1}^{k} \mid \{b_j\}_{j=1}^{k}\right) = \Pr\left(\{e_j\}_{j=1}^{k}\right) \quad \text{for } k = 1, 2, \ldots.$$

Second, although the channel error is a process with memory itself, it is memoryless when conditioned on the channel state process $\{s_j\}_{j=1}^{\infty}$, $s_j \in \{s_f, s_h\}$, where $s_f$ and $s_h$ denote the full-rate and half-rate channel states, respectively, i.e.,

$$\Pr\left(\{e_j\}_{j=1}^{k} \mid \{s_j\}_{j=1}^{k}\right) = \prod_{j=1}^{k} \Pr(e_j \mid s_j).$$

The conditioned error probabilities $\Pr(e_j \mid s_j)$, $j = 1, 2, \ldots$ are also independent of the channel use index $j$. According to Fig. 6, the channel state is a stationary first-order Markov process that satisfies

$$\Pr(s_j \mid s_{j-1}, s_{j-2}, \ldots) = \Pr(s_j \mid s_{j-1}).$$

Similarly, $\Pr(s_j \mid s_{j-1})$ is independent of the index $j$. Defining $f$ and $h$ as in (14) gives the state transition matrix

$$T = \begin{pmatrix} 1-h & f \\ h & 1-f \end{pmatrix}.$$

The steady-state probabilities of being in the full- or half-rate state are given by the eigenvector of $T$ associated with the eigenvalue 1, as in (15). The initial state probabilities are also assumed to equal these steady-state values to ensure stationarity, i.e.,

$$f = \Pr(s_j = s_f \mid s_{j-1} = s_h), \qquad h = \Pr(s_j = s_h \mid s_{j-1} = s_f) \tag{14}$$

$$\Pr(s_0 = s_f) = \frac{f}{f+h}, \qquad \Pr(s_0 = s_h) = \frac{h}{f+h}. \tag{15}$$

The crossover probabilities for the full- and half-rate states are

$$p_f = \Pr(e_j = 1 \mid s_j = s_f), \qquad p_h = \Pr(e_j = 1 \mid s_j = s_h).$$

Finally, the full-to-half ratio of the channel is defined by

$$c = \frac{f}{h}. \tag{16}$$
The following lemma explains the channel memory behavior. It can be easily proved by induction on $j$.

Lemma 1: For state $\sigma \in \{s_f, s_h\}$, with $\bar{\sigma}$ denoting the other state, we have

$$\Pr(s_j = \sigma \mid s_0 = \sigma) - \Pr(s_j = \sigma \mid s_0 = \bar{\sigma}) = (1 - f - h)^j.$$

Definition 2: The channel memory is defined as

$$m = 1 - f - h, \qquad m \in [-1, 1]. \tag{17}$$
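The state model in (14), (15), and (17) and the statement of Lemma 1 can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the values of f and h are arbitrary, and the matrix follows the column convention of T above (columns index the previous state, rows the next state, in the order full rate, half rate).

```python
def transition_matrix(f: float, h: float):
    """State transition matrix T; entry [r][c] = Pr(next = r | previous = c)."""
    return [[1.0 - h, f],
            [h, 1.0 - f]]


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def mat_pow(t, j: int):
    """j-step transition matrix T^j by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(j):
        result = mat_mul(result, t)
    return result


def steady_state(f: float, h: float):
    """Steady-state probabilities (Pr(s_f), Pr(s_h)) from (15)."""
    return f / (f + h), h / (f + h)


def memory(f: float, h: float) -> float:
    """Channel memory m = 1 - f - h from (17)."""
    return 1.0 - f - h


def lemma1_gap(f: float, h: float, j: int) -> float:
    """Pr(s_j = s_f | s_0 = s_f) - Pr(s_j = s_f | s_0 = s_h)."""
    tj = mat_pow(transition_matrix(f, h), j)
    return tj[0][0] - tj[0][1]


# Arbitrary illustrative values (not taken from the paper).
f, h = 0.1, 0.3
pi_f, pi_h = steady_state(f, h)
for j in range(1, 6):
    print(j, lemma1_gap(f, h, j), memory(f, h) ** j)
```

For these values m > 0, so the channel is persistent: the probability of staying in the full-rate state, 1 − h, exceeds the steady-state probability f/(f + h).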
Corollary: An $m$ value equal to zero implies that the channel is memoryless, i.e., the current channel state is independent of all previous states. For $m > 0$, the channel is called persistent, i.e., the probability of remaining in a particular state is higher than the steady-state probability of being in that state as given by (15). To give some insight into this case, assume that $\sigma = s_f$. We want to show that the probability of remaining in this state [i.e., $\Pr(s_1 = s_f \mid s_0 = s_f)$] is higher than the steady-state probability of being in it [i.e., $\Pr(s_0 = s_f) = f/(f + h)$ based on (15)]. We proceed as follows:

$$\Pr(s_0 = s_f) = \frac{f}{f+h} \tag{18}$$

$$\overset{\text{Lemma 1}}{=} \frac{\Pr(s_1 = s_f \mid s_0 = s_f) - (1 - f - h)}{f+h} \tag{19}$$

$$\le \frac{\Pr(s_1 = s_f \mid s_0 = s_f) - (1 - f - h) \cdot \Pr(s_1 = s_f \mid s_0 = s_f)}{1 - (1 - f - h)} = \Pr(s_1 = s_f \mid s_0 = s_f). \tag{20}$$

Here, (19) uses Lemma 1 with $j = 1$ together with (14), and the inequality in (20) holds because $1 - f - h > 0$ and $\Pr(s_1 = s_f \mid s_0 = s_f) \le 1$.
Finally, for $m < 0$, the channel is called oscillatory, i.e., in contrast with the persistent case, the channel stays in a particular state with a lower probability than the steady-state probability of being in that state. Moreover, note that (21) follows directly from (16) and (17):

$$m \ge \max\{-c, -c^{-1}\}. \tag{21}$$
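The step from (16) and (17) to (21) can be spelled out as follows. This is a short derivation sketch that assumes only that $f$ and $h$ are nonzero probabilities, so that $c = f/h$ is well defined and $f, h \le 1$.

```latex
\begin{align*}
m &= 1 - f - h = 1 - h\,(1 + c) \;\ge\; 1 - (1 + c) = -c,
  && \text{since } f = ch \text{ and } h \le 1, \\
m &= 1 - f - h = 1 - f\,(1 + c^{-1}) \;\ge\; 1 - (1 + c^{-1}) = -c^{-1},
  && \text{since } h = f/c \text{ and } f \le 1.
\end{align*}
% Both bounds hold simultaneously, hence m >= max{-c, -c^{-1}}.
```

Both inequalities hold at once, so the larger of the two lower bounds applies, which is exactly (21).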
Definition 3: $p_k^*(\{e_j\}_{j=1}^{k-1}, s_0)$ denotes the crossover probability for the $k$th transmitted bit conditioned on the initial state $s_0$ and the previous error process $\{e_j\}_{j=1}^{k-1}$, i.e.,

$$p_k^*\left(\{e_j\}_{j=1}^{k-1}, s_0\right) = \Pr\left(e_k = 1 \mid \{e_j\}_{j=1}^{k-1}, s_0\right).$$

Similarly, $p_k(\{e_j\}_{j=1}^{k-1})$ denotes the same probability conditioned only on the previous error process, i.e.,

$$p_k\left(\{e_j\}_{j=1}^{k-1}\right) = \Pr\left(e_k = 1 \mid \{e_j\}_{j=1}^{k-1}\right).$$

These crossover probabilities and their distributions can be recursively calculated using the formulas derived in [36]. Finally, Mushkin and Bar-David proved the GE capacity as in Theorem 3 [36].

Theorem 3: The GE capacity is calculated as

$$C = 1 - \lim_{k \to \infty} E[h(p_k)] = 1 - \lim_{k \to \infty} E[h(p_k^*)]$$

where $E[\cdot]$ is the expected value, and $h(\cdot)$ denotes the binary entropy function. It is shown in [36] that the two limits in Theorem 3 always exist and are equal.

Corollary: The terms $1 - h(p_k)$ and $1 - h(p_k^*)$ monotonically converge to the channel capacity as in (22). The end points of this chain are interpreted as the "memoryless" and "side information" bounds on GE capacity, i.e.,

$$C_{ML} = 1 - h(p_1) \le \cdots \le 1 - h(p_{k-1}) \le 1 - h(p_k) \le C \le 1 - h(p_k^*) \le 1 - h(p_{k-1}^*) \le \cdots \le 1 - E[h(p_0^*)] = C_{SI}. \tag{22}$$

In (22), the subscripts ML and SI stand for "memoryless" and "side information" capacities, respectively. It should be stressed that $C_{ML}$ is the capacity of a Gilbert channel with the same $p_h$, $p_f$, and $c$ but with $m = 0$ [40]. For arbitrary $m$, this is the rate achieved if we fragment and disperse the channel memory by interleaving. Moreover, it is clear from the definition $p_0^*(s_j) = \Pr(e_j = 1 \mid s_j)$ that $C_{SI}$ is the capacity of a channel with the same $p_h$, $p_f$, and $c$ and with arbitrary $m$, assuming that state side information is accessible to the receiver.

Concluding this section, if the GE crossover and transition probabilities ($p_h$, $p_f$, $h$, and $f$) are known, the ML and SI bounds on channel capacity can be numerically derived by the iterative calculation of Theorem 3 or by applying the statistical technique introduced in [41]. In the following section, we derive the state transition probabilities.

B. State Transition Model

In the previous section, the capacity of the voice channel was expressed in terms of the error probabilities in half- and full-rate modes and the transition probabilities between these states. Here, we proceed with our analysis by presenting the general traffic model of a cellular network based on the Markov chain. The aim of this model is to estimate the values of the state transition probabilities required for the overall capacity calculation, as stated
in the previous section. To achieve this goal, first, the proposed model is presented, and second, the state transition probabilities are calculated.

Following previous research studies [42], [43], by assuming a large number of users, we can model the voice call arrival process as a Poisson process of rate $\lambda$ [i.e., Pois$(k; \lambda)$]. Furthermore, the duration of each call is assumed to be an exponential random variable with mean $1/\mu$ [i.e., Exp$(t; 1/\mu)$]. Network operators adopt different strategies to maximize the number of users that can be served simultaneously. Sharing a specified time slot between two users is one of the common strategies in cellular networks, which is referred to as half-rate channel assignment. To elaborate, if the number of occupied time slots ($n_o$) is less than a predefined threshold ($\tau_{HR}$), new incoming calls are assigned a full-rate channel; otherwise, a half-rate channel is assigned. Denote the system state by the random process $n(t) = (i(t), j(t))$, in which $i$ and $j$ indicate the number of active calls assigned full- and half-rate channels at time $t$, respectively. In the following, the state transition of $n(t)$ is described for the two different cases in which the new incoming call is assigned a full-rate or a half-rate channel, respectively.

1) Full-Rate Case: If the total number of occupied slots $n_o = i + (j/2)$ is less than $\tau_{HR}$, the new incoming call is assigned a full-rate channel. The state transition diagram for this case is given in Fig. 7(a). Considering the small slot time interval $T$ and the diagram sketched in Fig. 7(b), the following transition probabilities can be computed:

$$\begin{aligned}
\Pr(c_{i,j} \to c_{i+1,j}) &= \text{Pois}(1; \lambda T) \approx \lambda T \\
\Pr(c_{i,j} \to c_{i-1,j}) &= \int_0^T \text{Exp}(t, i\mu)\,dt \approx i\mu T \\
\Pr(c_{i,j} \to c_{i,j-1}) &\approx j\mu T
\end{aligned} \tag{23}$$

$$\Pr(c_{i,j} \to c_{i,j}) \approx 1 - (\lambda + i\mu + j\mu)T. \tag{24}$$
In the given statements, $\Pr(c_{i,j} \to c_{i+1,j})$ denotes the transition probability from state $n(t_1) = (i, j)$ to state $n(t_2) = (i + 1, j)$.

2) Half-Rate Case: On the other hand, if $n_o > \tau_{HR}$, new calls are assigned in half-rate mode. The transition probabilities in this case, as shown in Fig. 7(b), are derived as follows:

$$\begin{aligned}
\Pr(c_{i,j} \to c_{i,j+1}) &\approx \lambda T, & \Pr(c_{i,j} \to c_{i-1,j}) &\approx i\mu T \\
\Pr(c_{i,j} \to c_{i,j-1}) &\approx j\mu T, & \Pr(c_{i,j} \to c_{i,j}) &\approx 1 - (\lambda + i\mu + j\mu)T.
\end{aligned} \tag{25}$$
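The per-slot transition probabilities in (23)–(25) can be collected into a single helper. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `slot_transitions` and the numerical values are hypothetical, the boundary case $n_o = \tau_{HR}$ is treated as half rate (following the "otherwise" rule stated earlier), and blocking of calls when the cell is full is not modeled.

```python
def slot_transitions(i: int, j: int, lam: float, mu: float,
                     slot: float, tau_hr: float) -> dict:
    """Approximate one-slot transition probabilities out of state (i, j).

    i, j   : active full-rate and half-rate calls
    lam    : call arrival rate (lambda)
    mu     : call departure rate (mean call duration is 1/mu)
    slot   : slot duration T, assumed small so first-order terms suffice
    tau_hr : half-rate threshold on the number of occupied slots
    """
    occupied = i + j / 2.0  # n_o = i + j/2
    # Assumption: n_o < tau_HR assigns full rate, anything else half rate.
    arrival = (i + 1, j) if occupied < tau_hr else (i, j + 1)

    probs = {arrival: lam * slot}
    if i > 0:
        probs[(i - 1, j)] = i * mu * slot  # a full-rate call ends
    if j > 0:
        probs[(i, j - 1)] = j * mu * slot  # a half-rate call ends
    probs[(i, j)] = 1.0 - (lam + i * mu + j * mu) * slot  # no event
    return probs


# Hypothetical numbers, chosen only to exercise both cases.
full_case = slot_transitions(2, 2, lam=0.5, mu=0.1, slot=0.01, tau_hr=4)
half_case = slot_transitions(3, 2, lam=0.5, mu=0.1, slot=0.01, tau_hr=4)
print(full_case)
print(half_case)
```

Each returned dictionary is one row of the traffic-state transition structure, and its entries sum to one by construction, which is a quick sanity check on the first-order approximations.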
As previously mentioned, this section aims at calculating the state transition probabilities that are required for channel capacity estimation. The first step in deriving these probabilities is to define the state transition matrix $M$. In doing so, we use $m_{a,b}$ to denote the elements of matrix $M$, i.e., $M = [m_{a,b}]_{n_t^2 \times n_t^2}$, in
eigenvalue of 1, Ps and, consequently, the required probabilities are derived as follows:
Pr(s0 = sf ) =
p(ij)nt =
τHR 2(τ HR −i) i=0
i+ j2 τHR k+ 2l