Blind Channel Identification for Direct-Sequence Spread-Spectrum Systems

Dennis L. Goeckel(1), Alfred O. Hero III(1), and Wayne E. Stark(1)
(1) EECS Department, University of Michigan, Ann Arbor, MI 48109

Abstract

Channel identification for a binary phase-shift keyed (BPSK) direct-sequence spread-spectrum (DS/SS) system operating over a fading channel with sampling at the chip rate is considered in this work. The system is mapped to a discrete oversampled system, thereby allowing channel identification via second order statistics under a few nonrestrictive conditions. Using the method of subchannel response matching (SRM), the offline solution to this channel identification problem involves determining the eigenvector corresponding to the minimum eigenvalue of a matrix that depends on the correlation statistics of the samples of the received signal. A low complexity stochastic gradient method for finding this eigenvector adaptively is derived, and a convergence analysis under a few weak assumptions is presented. For comparison, a method that utilizes trellis searching for joint data and channel identification when the system is not oversampled is extended in an obvious way to oversampled systems, and a different adaptive algorithm than has been used previously is developed. Numerical results in the form of channel estimation error are obtained for the case when the spreading code is unknown but periodic with period equal to the symbol period.

1 Introduction

In this paper, channel identification for uncoded single-user direct-sequence spread-spectrum (DS/SS) systems is considered. The systems are characterized by a symbol transmission period [0, T_s] and employ a rectangular chip pulse restricted to [0, T_c], T_c <= T_s, where the processing gain is defined as N = T_s/T_c. Although there is no loss in optimality for an ideal system on an additive white Gaussian noise (AWGN) channel with no intersymbol interference (ISI) in correlating with the full spreading waveform and sampling only once per symbol period, in actual application the sampling is normally done at a much higher rate. In many cases, sampling is done at the chip rate and, after multiplication by the appropriate spreading sequence bit, the results are summed to form the decision statistic for a given information bit, which can then be thresholded to produce an information estimate. For an ISI system that is oversampled, however, it is no longer possible to obtain a sufficient statistic for the metric calculation of a given path of the Viterbi algorithm, the optimal sequence estimator for a system corrupted by ISI, by multiplying the samples within a symbol period by a transmitter spreading coefficient and then summing. The optimal combining of the samples in a given symbol period depends on the channel impulse response at each sample period, not just at the symbol period. This is the motivation for channel identification of oversampled systems at the sample period.

Since the DS/SS system is oversampled in continuous time, one expects that it can be mapped to an oversampled system in discrete time. This is important because classical blind equalization results show that unique channel identification through second order statistics is not possible (unless the channel is known to have minimum phase) for non-oversampled discrete systems when the input sequence is ergodic and wide-sense stationary [1]. Recent work [2], however, has shown that if the input symbols to the channel in a digital communications system are independent and identically distributed (IID) and the output of the channel is sampled at a rate that is a multiple of the input symbol rate, the oversampling of the cyclostationary input waveform makes the sampled process also cyclostationary, thus allowing blind channel identification based on only second order statistics. Because estimation of second order statistics requires fewer samples than estimation of higher order statistics for a given level of accuracy, one expects algorithms based on second order statistics to exhibit faster convergence.
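The mapping from the chip-rate-sampled DS/SS waveform to a discrete oversampled (polyphase subchannel) system can be sketched numerically. The spreading code, channel taps, and dimensions below are illustrative stand-ins, not values from the paper; this is a minimal sketch of the structure, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4          # processing gain (chips per symbol); illustrative
L = 3          # subchannel response length, so the overall channel has L*N taps
K = 50         # number of data symbols

# Illustrative (not from the paper) spreading code and effective chip-rate channel.
a = rng.choice([-1.0, 1.0], size=N)                       # one period of the spreading sequence
g = rng.standard_normal(L * N) * 0.5 ** np.arange(L * N)  # decaying effective channel response

# Overall response: convolution of one code period with the effective channel,
# truncated to L*N taps (finite-duration assumption).
h = np.convolve(a, g)[: L * N]

# BPSK symbols and the noiseless chip-rate received sequence.
s = rng.choice([-1.0, 1.0], size=K)
s_up = np.zeros(K * N)
s_up[::N] = s                                  # upsample by N: one symbol per N chips
y = np.convolve(s_up, h)[: K * N]

# Polyphase/subchannel view: subchannel n has taps h_n(l) = h[l*N + n] and is
# driven by the same symbol sequence at the symbol rate.
for n in range(N):
    h_n = h[n::N]                              # taps of subchannel n
    y_n = np.convolve(s, h_n)[:K]              # symbol-rate subchannel output
    assert np.allclose(y_n, y[n::N])           # matches decimated chip-rate output

print("chip-rate output matches the N-subchannel model")
```

The assertion confirms that decimating the chip-rate output by the processing gain yields N parallel symbol-rate channels driven by the same data, which is exactly the oversampled discrete structure that the second-order methods below exploit.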

2 DS/SS Systems over ISI Channels

The input data bits, denoted by the sequence (s_k), are assumed to be IID. The spreading sequence will be denoted (a_l). Since binary phase-shift keying is employed, the output of the transmitter can be expressed (using complex baseband notation throughout) as

x(t) = \sqrt{E_b/(N T_c)}\, s(t)\, a(t)\, e^{j\theta}

where E_b is the energy per data bit, \theta is the transmitter phase (which will be assumed to be absorbed into the channel response and suppressed from here forward), s(t) = \sum_{k=-\infty}^{\infty} s_k p_{T_s}(t - k T_s), and a(t) = \sum_{l=-\infty}^{\infty} a_l p_{T_c}(t - l T_c), where p_T(t) is defined to be a unit-amplitude pulse that is nonzero on [0, T]. The time-nonselective, frequency-selective continuous channel response is denoted by g(t). The received signal is given by r(t) = g(t) * x(t) + n(t), where n(t) is complex Gaussian channel noise. The receiver, using a chip-matched filter matched to the rectangular pulse, calculates the complex statistic

y_n(k) = y(k T_s + n T_c) = \int_{k T_s + n T_c}^{k T_s + (n+1) T_c} r(t)\, dt.

It is straightforward to show that the response g_m at sample time m T_c due to a unit-amplitude pulse on [0, T_c] is given by

g_m = \int_{-\infty}^{\infty} g(\tau) \int_{0}^{T_c} p_{T_c}(t + m T_c - \tau)\, dt\, d\tau,

which implies by linearity that h_m^k, the response at sample time k T_s + m T_c due to the spreading waveform \hat{a}^k(t) = \sum_{p=0}^{N-1} \hat{a}_p^k\, p_{T_c}(t - p T_c - k T_s) on [k T_s, (k+1) T_s], is given by

h_m^k = \sum_{p=0}^{N-1} \hat{a}_p^k\, g_{m-p} = \hat{a}_m^k * g_m,

where one period of the spreading sequence \{\hat{a}_p^k, p = 0, \ldots, N-1\} is defined in the obvious way.

Thus, it has been established that the overall channel response can be written as the convolution of one period of the spreading sequence with an effective channel response. Note that the index k carried through the calculation allows the spreading sequence to vary over each symbol period. For the work considered in this paper, however, it is assumed that the spreading sequence is unknown and periodic in the symbol period. For this reason, the superscript k will be dropped from h in succeeding sections.

(1) This work was supported by a National Science Foundation Graduate Fellowship and by the National Science Foundation under grants BCS-90243370 and NCR-9115969.

3 Adaptive Algorithms Using Subchannel Response Matching

3.1 Subchannel Response Matching

Since the SRM algorithm, an offline algorithm for channel identification in oversampled systems, is explained thoroughly elsewhere [3], the aim here is only to set up the problem and present the final solution.

Let the impulse response \{h_m, m = 0, 1, \ldots, LN-1\} of the overall channel be of duration LN samples. Consider the division of the oversampled output into N subchannels, one subchannel for each offset T_c from the input sample time. Each subchannel operates at sample rate 1/T_s on the IID channel input symbols, and the output at time k of the nth subchannel can be written as

y_n(k) = \sum_{l=0}^{L-1} h_n(l)\, s(k-l) + j_n(k)

where h_n(l) = h_{lN+n} is the lth sample of the nth subchannel of the impulse response, and j_n(k) is the zero-mean observation noise of variance \sigma^2 on this sample.

The basic idea of the subchannel response matching algorithm is that since any two subchannels m and n have the same input, y_m(k) * h_n(k) = y_n(k) * h_m(k). Thus, for a given pair, one seeks to match the result of applying an estimate of channel n's impulse response to channel m's output with the result of applying an estimate of channel m's impulse response to channel n's output. If this is done in the mean-squared sense for all distinct pairs of subchannels, one seeks to minimize

E(\hat{h}) = \sum_{m=0}^{N-2} \sum_{n=m+1}^{N-1} E[\, |\hat{h}_m(k) * y_n(k) - \hat{h}_n(k) * y_m(k)|^2 \,]

subject to the constraint \|\hat{h}\|^2 = 1, where \hat{h} = [\hat{h}_0(0) \ldots \hat{h}_0(L-1)\ \hat{h}_1(0) \ldots \hat{h}_{N-1}(L-1)]^T. Define \underline{y}_n(k) = [y_n(k)\ y_n(k-1) \ldots y_n(k-L+1)]^T and R_{mn} = E[\underline{y}_m(k)\, \underline{y}_n^H(k)]. The offline problem can then be rewritten as: find the conjugate of the eigenvector corresponding to the minimum eigenvalue of the matrix S, where

S = \left[ I \otimes \sum_{n=0}^{N-1} R_{nn} \right] - V,

V = \begin{bmatrix} R_{00} & R_{10} & \cdots & R_{(N-1)0} \\ R_{01} & R_{11} & & \vdots \\ \vdots & & \ddots & \\ R_{0(N-1)} & \cdots & & R_{(N-1)(N-1)} \end{bmatrix}.

Thus, S is the difference of two matrices: one consisting of identical L-by-L blocks on the diagonal, and one that is seen not to be an outer product when considered carefully. Note that if the channel is identifiable via second order statistics [4], there will be a unique minimum eigenvalue, and the conjugate of its eigenvector will be the unique solution. In the case of a noiseless system, S will be a positive semidefinite matrix, denoted S_Z, with a nullspace of dimension one. In a noisy system, S = S_Z + (N-1)\sigma^2 I and thus will be positive definite.

3.2 Adaptive Algorithm

Although finding the eigenvector associated with the minimum eigenvalue of a matrix corresponds closely to Pisarenko's harmonic retrieval method [5], S is not simply an outer product of the observed vector unless N = 2, so Pisarenko's results cannot be applied. As an alternative, a low complexity stochastic gradient method that can be performed online is derived. The error signal considered is the Rayleigh quotient of S,

e(\hat{h}) = \frac{\hat{h}^H S^T \hat{h}}{\|\hat{h}\|^2} = \frac{\hat{h}^T S \hat{h}^*}{\|\hat{h}\|^2}     (1)

which is minimized at the conjugate of the minimum eigenvector of S. Since the adaptive algorithm is implemented with sample averages as opposed to ensemble averages, the matrix S_k is introduced, which is the estimate of S at time k based on the observed channel outputs at times k, k-1, \ldots, k-L+1 (i.e., S with the expectations removed). This definition suggests an empirical objective function at step k given by

J_k = \sum_{l=0}^{k} \lambda^{k-l}\, \frac{\hat{h}_l^T S_l \hat{h}_l^*}{\|\hat{h}_l\|^2}

where \lambda \in [0, 1] is a forgetting factor. For the rest of this paper, it will be assumed that \lambda = 0, leading to the simpler objective function

J(\hat{h}_k) = e(\hat{h}_k) = \frac{\hat{h}_k^T S_k \hat{h}_k^*}{\|\hat{h}_k\|^2}.

The standard update for the estimated channel \hat{h}_k in the stochastic gradient algorithm is then given by \hat{h}_k = \hat{h}_{k-1} - \mu \nabla e(\hat{h}_{k-1}), where \mu is a user-specified gain factor. Thus, the key is the efficient calculation of the gradient of e(\hat{h}). Direct calculation using the definition in equation (1) yields

\nabla e(\hat{h}) = \frac{2\left( \|\hat{h}\|^2\, S_k \hat{h} - (\hat{h}^H S_k \hat{h})\, \hat{h} \right)}{\|\hat{h}\|^4}.

Due to the complicated form of S_k, "brute force" computation of the above requires O(L^2 N^2) operations because of the matrix-vector multiplication S_k \hat{h}. However, noting the conjugate symmetry of S, \nabla(\hat{h}^T S_k \hat{h}^*) = 2 S_k \hat{h}, and it can be shown that

\frac{\partial (\hat{h}^T S_k \hat{h}^*)}{\partial \hat{h}_m(l)} = 2 \left( \sum_{n=0}^{N-1} \sum_{i=0}^{L-1} \hat{h}_m(i)\, y_n(k-i)\, y_n(k-l) - \sum_{n=0}^{N-1} \sum_{j=0}^{L-1} \hat{h}_n(j)\, y_m(k-j)\, y_n(k-l) \right).

The analysis of the number of operations now goes as follows: the two sums over n must be done for each l and each i (or j), leading to O(L^2 N) operations; given these sums, \nabla(\hat{h}^T S_k \hat{h}^*) can be found in O(LN) inner products of vectors of length L. Thus, the total number of operations required to obtain S_k \hat{h} (and thus \nabla e(\hat{h})) is O(L^2 N), a considerable savings if the processing gain is large.

3.3 Convergence Analysis

A gain factor \mu must be specified for the stochastic gradient algorithm that guarantees convergence of \hat{h}_k to the desired solution h_{opt}. Convergence of the stochastic gradient search for a particular sample path cannot be guaranteed; as a first approximation, convergence conditions are derived in this section for the gradient search assuming the matrix S is measured with no error. Even under this assumption, global convergence is still difficult to demonstrate due to the complexity of the error surface, including flat portions leading to algorithm stagnation at each of the eigenvectors. Therefore, a desirable goal is to choose \mu such that the algorithm will converge to the correct solution when it is near the minimizing eigenvector. The error at iteration k of the algorithm is given by

e_k = e(\hat{h}_k) = \frac{\hat{h}_k^T S \hat{h}_k^*}{\|\hat{h}_k\|^2} = \frac{\hat{h}_k^T S_Z \hat{h}_k^*}{\|\hat{h}_k\|^2} + (N-1)\sigma^2.     (2)

In the following, the term involving \sigma^2 will be ignored, as (N-1)\sigma^2 represents a constant residual error at the minimizing solution. Assuming a noiseless system (\sigma^2 = 0), substituting in \hat{h}_k = \hat{h}_{k-1} - \mu \nabla e(\hat{h}_{k-1}), and retaining only terms that are linear in e_{k-1} (since e_{k-1} \ll 1 near h_{opt} in a noiseless system), one obtains

e_k = \frac{\hat{h}_{k-1}^T S_Z \hat{h}_{k-1}}{\|\hat{h}_k\|^2} - 2\mu\, \frac{\hat{h}_{k-1}^T S_Z^2 \hat{h}_{k-1}}{\|\hat{h}_k\|^2 \|\hat{h}_{k-1}\|^2} + \mu^2\, \frac{\hat{h}_{k-1}^T S_Z^3 \hat{h}_{k-1}}{\|\hat{h}_k\|^2 \|\hat{h}_{k-1}\|^4}.

Applying the modal decomposition S_Z = \sum_{i=0}^{LN-1} \lambda_i v_i v_i^H, where v_i is the ith eigenvector of S_Z, convergence of e_k can be achieved by the convergence of each of the modes. For the ith mode, one obtains

e_k^i = \frac{\|\hat{h}_{k-1}\|^2}{\|\hat{h}_k\|^2} \cdot \frac{\lambda_i |\hat{h}_{k-1}^T v_i|^2}{\|\hat{h}_{k-1}\|^2} \left( 1 - \frac{2\mu\lambda_i}{\|\hat{h}_{k-1}\|^2} + \frac{\mu^2\lambda_i^2}{\|\hat{h}_{k-1}\|^4} \right)

so that

e_k^i \le e_{k-1}^i \left( 1 - \frac{\mu\lambda_i}{\|\hat{h}_{k-1}\|^2} \right)^2     (3)

where the last inequality comes from the fact that \|\hat{h}_k\|^2 is nondecreasing as a function of k. This nondecreasing property can be derived as follows: note from the definition of \nabla e(\hat{h}) that \hat{h}_{k-1}^H (\hat{h}_k - \hat{h}_{k-1}) = -\mu\, \hat{h}_{k-1}^H \nabla e(\hat{h}_{k-1}) = 0, which then implies

\|\hat{h}_k\|^2 = \|\hat{h}_k - \hat{h}_{k-1} + \hat{h}_{k-1}\|^2 = \|\hat{h}_k - \hat{h}_{k-1}\|^2 + \|\hat{h}_{k-1}\|^2 \ge \|\hat{h}_{k-1}\|^2,

thus establishing the nondecreasing property.

If the initial guess is chosen on the unit sphere (i.e., \|\hat{h}_0\|^2 = 1) and is sufficiently close to the solution h_{opt}, the nondecreasing property and (3) can be used to show that convergence occurs in all of the modes if \mu < 2/\lambda_{max}. Since \lambda_{max} is difficult to obtain, the conservative estimate \mu < 2/\mathrm{tr}(S_Z) is used instead, where \mathrm{tr}(S_Z) = L(N-1) \sum_{i=0}^{N-1} (E[|y_i(k)|^2] - E[|j_i(k)|^2]), which can be readily estimated if the signal-to-noise ratio (SNR) is known.
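The offline SRM eigen-problem of Section 3.1 can be checked with a short simulation. The sketch below uses real-valued data for brevity (the paper works in complex baseband), replaces expectations with sample correlations, and builds the quadratic form directly from the pairwise matching errors rather than assembling the blocks R_mn; the dimensions are illustrative, and this is a sanity check of the criterion, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, K = 4, 3, 2000                       # illustrative sizes, not the paper's

# True overall channel and its N subchannels h_n(l) = h[l*N + n]
# (real-valued here for brevity; the paper works in complex baseband).
h_true = rng.standard_normal(L * N)
subch = [h_true[n::N] for n in range(N)]

# Noiseless symbol-rate subchannel outputs y_n(k) = sum_l h_n(l) s(k-l).
s = rng.choice([-1.0, 1.0], size=K)
y = np.stack([np.convolve(s, hn)[:K] for hn in subch])

# SRM cost: sum over pairs (m,n) of |h_m * y_n - h_n * y_m|^2.  Each time k
# contributes one row per pair to a linear system acting on the stacked
# vector h_hat = [h_0(0..L-1), ..., h_{N-1}(0..L-1)].
rows = []
for k in range(L - 1, K):
    win = y[:, k - L + 1 : k + 1][:, ::-1]      # [y_n(k), ..., y_n(k-L+1)]
    for m in range(N - 1):
        for n in range(m + 1, N):
            row = np.zeros(N * L)
            row[m * L : (m + 1) * L] = win[n]   # h_m applied to subchannel n's output
            row[n * L : (n + 1) * L] = -win[m]  # minus h_n applied to subchannel m's
            rows.append(row)
A = np.array(rows)
S = A.T @ A / len(rows)                         # sample-average version of S

# The channel estimate is the (unit-norm) eigenvector of the minimum
# eigenvalue of S; np.linalg.eigh returns eigenvalues in ascending order.
h_hat = np.linalg.eigh(S)[1][:, 0]

# Re-interleave the per-subchannel blocks back to h[l*N + n] ordering and
# compare with the true channel, which is identified only up to scale.
h_est = h_hat.reshape(N, L).T.reshape(-1)
corr = abs(h_est @ h_true) / (np.linalg.norm(h_est) * np.linalg.norm(h_true))
assert corr > 0.99
print("correlation with true channel up to scale:", corr)
```

In this noiseless, real-valued setting the minimum eigenvalue of the sample matrix is essentially zero and the minimizing eigenvector matches the true channel up to a scale factor, consistent with the identifiability discussion above; the adaptive algorithm of Section 3.2 replaces the batch eigendecomposition with the stochastic gradient update on the same quotient.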

4 Trellis-Searching Algorithm

In [6], an algorithm for blind equalization is introduced. It extends the Viterbi algorithm, the optimal sequence estimator for channels with known ISI, to perform blind equalization by allowing M survivors per state and performing channel estimation for each survivor at each step using an LMS algorithm. To extend this algorithm to oversampled systems, a given branch metric is now calculated as the total Euclidean distance between the samples in a symbol period and the samples predicted along that branch in the trellis.

The channel update algorithm is also altered. The minimum mean squared error solution for the channel update is given by h_n^{opt}(l) = R_n(l) = E[y_n(k) s(k-l)], which is easily approximated. Thus, instead of using a gradient search algorithm, this ensemble average is approximated by the sample mean \hat{R}_n(l) = \frac{1}{T} \sum_{k=0}^{T-1} y_n(k) \hat{s}(k-l), where \hat{s}(k) is the estimate of the kth data bit for the path under consideration.

5 Results and Conclusions

The complexity of the two algorithms must be considered in evaluating performance. The subchannel response matching (SRM) method, as mentioned earlier, involves O(L^2 N) operations per symbol to perform one iteration of the gradient search. The trellis-searching algorithm discussed requires O(L M 2^L N) operations per symbol, significantly more than the SRM estimator, but it also decodes the bits as it runs, with no need for an additional sequence estimator.

For the case N = 4, L = 5, two channels labelled "good" and "bad" are considered, with zeroes as specified below. One hundred trials, each with a random initial guess, were averaged to obtain Figures 1-3 on the following page. The gain factor from Section 3.3 for these channels is approximately \mu = 0.07. The normalized mean squared error (NMSE) is used as the figure of merit and is defined for a channel estimate \hat{h} of h as

NMSE = \left\| \frac{\hat{h}}{\|\hat{h}\|} - \frac{h}{\|h\|} \right\|^2.

Subchannel | "Good" channel zeroes            | "Bad" channel zeroes
0          | 0.353 +/- 0.353j, 0.259 +/- 0.150j   | 0.636 +/- 0.636j, 0.692 +/- 0.400j
1          | -0.104 +/- 0.590j, 1.691 +/- 0.615j  | -0.121 +/- 0.689j, 1.122 +/- 0.410j
2          | 0.069 +/- 0.393j, -1.221 +/- 1.455j  | 0.148 +/- 0.837j, 1.034 +/- 0.376j
3          | 0.752 +/- 2.067j, -0.800 +/- 1.385j  | 0.393 +/- 1.081j, -0.525 +/- 0.909j

One key item not displayed in Figures 1-3 for the SRM algorithm is that the plots are dominated by a few sample paths that take a long time to converge. Since the complexity of the SRM algorithm is low, this suggests using multiple starting points for one trial and choosing, at the kth iteration, the one with the lowest e_k (which is calculated at each step to form the gradient in any case). Using a small number of starting points would greatly improve performance, with complexity still less than that of the trellis-searching algorithm. An example of the improved performance (averaged over 30 sample paths) is shown in Figure 4.

6 Future Work

There are a few key areas still requiring work on this problem. The convergence analysis presented for the stochastic gradient algorithm must be updated to take into account variation of the sample data from the ensemble averages, and convergence in terms of mean and variance needs to be considered. Global convergence conditions must also be considered more closely. It would also be desirable to have an algorithm with faster convergence properties; in particular, it would be desirable to find a fast method for computing the Hessian so that a Newton-type algorithm could be efficiently implemented.

References

[1] S. Haykin, Ed., Blind Deconvolution, PTR Prentice-Hall, Englewood Cliffs, 1994.

[2] L. Tong, G. Xu, and T. Kailath, "Blind Identification and Equalization Based on Second-Order Statistics: A Time Domain Approach," IEEE Transactions on Information Theory, Vol. 40, pp. 340-349, March 1994.

[3] S. Schell, D. Smith, and S. Roy, "Blind Channel Identification using Subchannel Response Matching," Proceedings of the 1994 Conference on Information Science and Systems, Vol. 2, pp. 858-862.

[4] L. Baccala and S. Roy, "A New Blind Time-domain Channel Identification Method Based on Cyclostationarity," IEEE Signal Processing Letters, pp. 89-91, June 1994.

[5] G. Mathew, S. Dasgupta, and V. U. Reddy, "Improved Newton-Type Algorithm for Adaptive Implementation of Pisarenko's Harmonic Retrieval Method and Its Convergence Analysis," IEEE Transactions on Signal Processing, Vol. 42, pp. 434-437, February 1994.

[6] N. Seshadri, "Joint Data and Channel Estimation Using Blind Trellis Search Techniques," IEEE Transactions on Communications, Vol. 42, pp. 1000-1011, February/March/April 1994.

[Figures 1 and 3 (plots not reproduced): NMSE versus iteration (0 to 2000) for SRM with gains 0.03, 0.07, 0.15 and for TS with M = 2 and M = 8.]

Figure 1: Normalized Mean Squared Error for the "Good" Channel, SNR = 18 dB: The gain factor of \mu = 0.07 appears too conservative.

Figure 3: Normalized Mean Squared Error for the "Bad" Channel, SNR = 18 dB: The "bad" channel adversely affects SRM but not TS.

[Figures 2 and 4 (plots not reproduced): NMSE versus iteration (0 to 2000); Figure 2 shows SRM with gains 0.03, 0.07, 0.15 and TS with M = 2 and M = 8; Figure 4 shows SRM with gain 0.15, SRM with gain 0.15 and 8 starts, and TS with M = 2 and M = 8.]

Figure 2: Normalized Mean Squared Error for the "Good" Channel, SNR = 8 dB: The drop in SNR of 10 dB affects both algorithms the same.

Figure 4: Normalized Mean Squared Error for the "Bad" Channel, SNR = 18 dB: The improved SRM algorithm shows a large gain vs. SRM on the "bad" channel.