Blind Separation of Synchronous Co-Channel Digital Signals Using an Antenna Array. Part I: Algorithms

Shilpa Talwar
Scientific Computing (S.C.C.M.), Building 460, Stanford University, Stanford, CA 94305

Mats Viberg
Dept. of Applied Electronics, Chalmers Univ. of Technology, S-41296 Gothenburg, Sweden

Arogyaswami Paulraj
Information Systems Lab., Durand 133, Stanford University, Stanford, CA 94305

July 18, 1995
Abstract

We propose a maximum-likelihood approach for separating and estimating multiple synchronous digital signals arriving at an antenna array. The spatial response of the array is assumed to be known imprecisely or unknown. We exploit the finite alphabet (FA) property of digital signals to simultaneously determine the array response and the symbol sequence for each signal. Uniqueness of the estimates is established for signals with linear modulation formats. We introduce a signal detection technique based on the FA property which is different from a standard linear combiner. Computationally efficient algorithms for both block and recursive estimation of the signals are presented. This new approach is applicable to an unknown array geometry and propagation environment, which is particularly useful in wireless communication systems. Simulation results demonstrate its promising performance.
Email: [email protected], Ph: (415) 723-0061, Fax: (415) 723-2411. This work was supported in part by the Computational Science Graduate Fellowship Program of the Office of Scientific Computing in the Department of Energy.
1 Introduction

Wireless communication systems are witnessing rapid advances in volume and range of services. A major challenge for these systems today is the limited radio frequency spectrum available. Approaches that increase spectrum efficiency are therefore of great interest. One promising approach is to use antenna arrays at the cell sites. Array processing techniques can then be used to receive and transmit multiple signals that are separated in space. Hence, multiple co-channel users can be supported per cell to increase capacity.

In this paper, we study the problem of separating multiple synchronous digital signals received at an antenna array [1]. The goal is to reliably demodulate each signal in the presence of the other co-channel signals and noise. The complementary problem of transmitting to multiple receivers with minimum interference at each receiver has been studied in [2, 3, 4].

Several algorithms have been proposed in the array processing literature for separating co-channel signals based on the availability of prior spatial or temporal information. The traditional spatial algorithms combine high-resolution direction-finding techniques such as MUSIC and ESPRIT [5, 6] with optimum beamforming to estimate the signal waveforms [7, 8]. However, these algorithms require that the number of signal wavefronts, including multipath reflections, be less than the number of sensors, which restricts their applicability in a wireless setting. In the recent past, several property-restoral techniques have been developed which exploit the temporal structure of communication signals while assuming no prior spatial knowledge. These techniques take advantage of signal properties such as constant modulus (CM) [9], discrete alphabet [10, 11], self-coherence [12], and high-order statistical properties [13, 14].

In this paper, we propose a new property-restoral approach that takes advantage of the discrete and finite alphabet (FA) property of digital signals. Our approach is termed blind since it does not require any training signals for signal demodulation. This is particularly useful in situations where training signals are not available. For example, in communications intelligence, training signals are not accessible. In cellular applications, blind algorithms can be used to reject interference from adjacent cells. In GSM, for example, adjacent cell interference appears only over a partial burst and training signals do not help. Blind algorithms are also bandwidth efficient due to the elimination of training sets. Moreover, the study of blind algorithms can complement existing non-blind techniques. For example, as a result of our investigation of uniqueness for the blind problem, we propose a minimal set of training signals which can be used in a non-blind multi-user scenario. An important advantage of our approach is
that in such scenarios, training sets can easily be incorporated to initialize our algorithms.

The algorithms presented herein can be used to demodulate multiple synchronous digital signals in a coherent multipath environment. To guarantee unique signal estimates, we assume that the number of signals does not exceed the number of sensors, and that the channel is constant over a sufficient number of snapshots. The synchronous assumption is reasonable in microcell air interfaces where symbol timing can be effectively controlled. Extension of our approach to asynchronous transmission and delay spread channels is relatively straightforward, the main differences in these scenarios being a) instead of matched filtering, the array output is oversampled, and b) the signal estimation step is replaced by maximum-likelihood sequence estimation (MLSE) since the channel is no longer memoryless. Recently, some non-ML techniques have also been proposed which use subspace information to first synchronize the signals and remove intersymbol interference, and then use one of the synchronous FA algorithms presented herein to separate the signals [15, 16, 17].

The outline of the paper is as follows. We introduce the data model in Section 2. In Section 3, we consider the problem of uniquely identifying the signals when the array response structure is unknown. In Section 4, the maximum likelihood (ML) estimator for the array responses and symbol sequences is discussed. Two efficient block algorithms are presented in Section 5, and their convergence is analyzed. Recursive extensions of these algorithms are discussed in Section 6. In Section 7, we present simulation results to demonstrate the performance of these algorithms. Finally, we conclude with directions for future work in Section 8.
2 Problem Formulation

Consider d narrowband signals impinging on an array of m sensors with arbitrary characteristics. The signal waveform received at each sensor is demodulated with respect to the carrier frequency (assuming perfect carrier phase lock recovery). The m \times 1 vector of sensor outputs, x(t), in the absence of multipath, is given by

    x(t) = \sum_{k=1}^{d} p_k a(\theta_k) s_k(t) + v(t)    (1)
where p_k is the amplitude of the kth signal, a(\theta_k) is the array response vector to a signal from direction \theta_k, s_k(\cdot) is the kth signal waveform, and v(\cdot) is additive white noise with covariance \sigma^2 I. In a realistic communication scenario, however, there are multiple reflected and diffracted paths from the source to the array. These paths arrive from different angles, and with different attenuations and time delays. The array output becomes

    x(t) = \sum_{k=1}^{d} \sum_{l=1}^{q_k} p_k \alpha_{kl} a(\theta_{kl}) s_k(t - \tau_{kl}) + v(t)

where q_k is the number of subpaths for the kth signal, and \alpha_{kl} and \tau_{kl} are, respectively, the attenuation and time delay corresponding to the lth subpath. We assume that the propagation delays associated with these paths are much smaller than the inverse bandwidth of the signals. The delays can thus be modeled as phase-shifts under the narrowband assumption. The new data model becomes

    x(t) = \sum_{k=1}^{d} p_k a_k s_k(t) + v(t)    (2)

where a_k is now the total array response vector a_k = \sum_{l=1}^{q_k} \alpha_{kl} e^{-j \omega_c \tau_{kl}} a(\theta_{kl}), and \omega_c is the carrier frequency. The spatial structure of the array response vector a_k cannot be exploited if the number of paths is larger than the number of sensors. However, we can exploit the temporal structure of digital signals with memoryless linear modulation formats

    s_k(t) = \sum_{n=1}^{N} b_k(n) g(t - nT),
where N is the number of symbols in a data batch (burst), {b_k(\cdot)} is the symbol sequence of the kth user, T is the symbol period, and g(\cdot) is the unit-energy signal waveform of duration T. For simplicity, we assume that the symbols belong to the alphabet \Omega = {\pm 1, \pm 3, \ldots, \pm(L-1)} for real signals, and \Omega = {a + jb : a, b \in {\pm 1, \pm 3, \ldots, \pm(L-1)}} for complex signals. These correspond to the important cases of PAM and QAM modulation formats. Now, assuming that the signals are symbol-synchronous, we perform matched filtering on (2) to obtain the following equivalent discrete representation of the data

    x(n) = \sum_{k=1}^{d} p_k a_k b_k(n) + v(n).    (3)

The noise term v(n) remains white, and it is easily seen that the output of the matched filter is a sufficient statistic for determining the transmitted symbols [18]. We can rewrite (3) in matrix form

    x(n) = A s(n) + v(n)    (4)

where x(n) is the filtered data, s(n) = [b_1(n) \ldots b_d(n)]^T, v(n) is additive white noise, and A is an m \times d matrix of array responses scaled by the signal amplitudes, A = [p_1 a_1 \ldots p_d a_d]. Assuming that the channel is constant over the available N symbol periods, we obtain the following block formulation of the data

    X(N) = A S(N) + V(N)    (5)

where X(N) = [x(1) \ldots x(N)], S(N) = [s(1) \ldots s(N)], and V(N) = [v(1) \ldots v(N)]. The matrix A represents the spatial structure of the data, and the matrix S represents its temporal structure. The problem addressed in this paper is the combined estimation of the array response matrix A and the symbol matrix S(N), given the array output X(N). We assume that the number of signals is known or has been estimated [19]. For notational convenience, we denote X \equiv X(N) and S \equiv S(N) from here on.
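Since everything that follows operates on the block model (5), a concrete instance is useful. The following Python sketch (ours, not from the paper; the half-wavelength uniform-linear-array response and all names are illustrative assumptions) generates X = AS + V for equal-power BPSK users:

```python
import numpy as np

def ula_response(theta_deg, m):
    """Response of a half-wavelength uniform linear array (illustrative assumption)."""
    theta = np.deg2rad(theta_deg)
    return np.exp(1j * np.pi * np.arange(m) * np.sin(theta))

def generate_data(m=4, d=3, N=100, doas=(10, 16, 25), snr_db=10, seed=None):
    """Generate X = A S + V as in (5) for equal-power BPSK signals."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([ula_response(th, m) for th in doas])  # m x d channel
    S = rng.choice([-1.0, 1.0], size=(d, N))                   # d x N BPSK symbols
    sigma2 = 10 ** (-snr_db / 10)                              # unit-power signals
    V = np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N))
                               + 1j * rng.standard_normal((m, N)))
    return A @ S + V, A, S

X, A, S = generate_data()
```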
3 Identifiability

Before discussing the estimation problem, we consider the problem of uniqueness of the signal estimates in the absence of noise. This problem can be viewed as a nonlinear factorization of the data matrix X_{m \times N} into factors A_{m \times d} and S_{d \times N}, such that X = AS. In the case that the columns of A lie on the array manifold (defined as the set {a(\theta) : \theta \in [0, 2\pi]}) and S is an arbitrary full-rank matrix, it is well-known that this factorization is "unique" provided 1) any set of m vectors from the array manifold is linearly independent, and 2) d < m [20]. There is an ordering ambiguity in the signal estimates since X = AS = (A P^T)(P S) = \tilde{A} \tilde{S}, where \tilde{A} = A P^T and \tilde{S} = P S are also a valid solution pair for any permutation matrix P.

Our problem is the opposite of the standard problem. We assume that A is an arbitrary full-rank matrix, but the elements of S belong to a finite alphabet \Omega. We first consider the case of binary antipodal signals, i.e. \Omega = {\pm 1}, and then generalize to larger alphabets. As before, the solution to the nonlinear system of equations X = AS is not unique under our assumptions. For example, \tilde{A} = AT and \tilde{S} = T^{-1} S is a solution for any nonsingular matrix T_{d \times d} that is a diagonal with \pm 1 entries, or a permutation matrix, or a product of the two. In this case, there is an additional ambiguity in the sign of the estimated signals, beyond the usual ordering ambiguity. However, the sign ambiguity can be easily removed once the signals have been estimated by appropriate decoding of each symbol sequence. Hence, the sign and ordering ambiguities do not present a serious problem, as the correct signals are still received. We define the system X = AS to be identifiable if all simultaneous solutions can be written as AT and T^{-1}S, where T is a nonsingular matrix with exactly one non-zero element (+1 or -1) in each row and column. For convenience, we call such a matrix an Admissible Transform Matrix (ATM). The following theorem gives a sufficient condition for the identifiability of binary signals.
Theorem 3.1 Let X = AS, where A_{m \times d} is an arbitrary full-rank matrix with d \le m, and S_{d \times N} is a full-rank matrix with \pm 1 elements. If the columns of S include all the 2^{d-1} possible distinct (up to a sign) d-vectors with \pm 1 elements, then A and S can be uniquely identified up to a matrix T with exactly one non-zero element, {+1, -1}, in each row and column.

Proof: It is easily seen that the condition on S also implies that it is full rank. Suppose there exists another pair, \tilde{A} and \tilde{S}, both full rank, such that X = \tilde{A}\tilde{S}. Then

    X = AS = \tilde{A}\tilde{S}.    (6)

Solving for \tilde{A}, we get

    \tilde{A} = A S \tilde{S}^{\dagger}    (7)

where (\cdot)^{\dagger} denotes the pseudo-inverse. We multiply (7) by \tilde{S} to obtain

    \tilde{A}\tilde{S} = A S \tilde{S}^{\dagger} \tilde{S}

and together with (6), this implies A(S - S \tilde{S}^{\dagger} \tilde{S}) = 0. Since A is full rank, we must have

    S = S \tilde{S}^{\dagger} \tilde{S}.    (8)

Note \tilde{S}^{\dagger} \tilde{S} is a projection matrix which projects the rows of S onto the row space of \tilde{S}. Thus from (8), we see that S and \tilde{S} must share a common row space. That is,

    \tilde{S} = T S.    (9)
Now, it remains to show that T is an ATM. We normalize the first element of each column of \tilde{S} to +1 by post-multiplying (9) by a diagonal matrix D with diagonal elements D_{jj} = \tilde{S}_{1j} (j = 1 \ldots N)

    \tilde{S} D = T S D  =>  \tilde{S}^{(1)} = T S^{(1)}.    (10)

Next, we post-multiply the resulting system (10) by an N \times N permutation matrix P that reorders the columns of S^{(1)} to make the first 2^{d-1} columns distinct

    \tilde{S}^{(1)} P = T S^{(1)} P  =>  \tilde{S}^{(2)} = T S^{(2)}.    (11)

Dropping the superscripts in (11), we have the following equivalent system of equations

    \tilde{S} = T S = [t_1^T; t_2^T; \ldots; t_d^T] [s_1 s_2 \ldots s_N]

where \tilde{S} is an arbitrary full rank matrix of \pm 1 elements, and S is a normalized matrix with the first n = 2^{d-1} columns distinct. We can partition S = [S_n S_r], where S_n is the submatrix of n distinct columns and S_r is the submatrix of remaining columns. Note that the columns in S_r are repeated from the first n, and thus provide no extra information in determining T. Therefore, we consider only the equations defined by S_n. Each row t of T must satisfy t^T S_n = s_n^T, where s_n^T is a subvector of the corresponding row of \tilde{S} and can be any one of the 2^n possible n-vectors. More explicitly, for t^T = [t_d \ldots t_2 t_1], we have a system of equations of the type

    t_d + (t_{d-1} + \ldots + t_2 + t_1) = \pm 1
    t_d - (t_{d-1} + \ldots + t_2 + t_1) = \pm 1
      \vdots
    t_d + (t_{d-1} + \ldots + t_2 - t_1) = \pm 1
    t_d - (t_{d-1} + \ldots + t_2 - t_1) = \pm 1.

In Appendix A.1, we show by induction that the solution of this system has the form: t_i = \pm 1 for some i, and t_j = 0 for j = 1, \ldots, d, j \ne i. This implies that each row of \tilde{S} is a multiple of a row of S. Since \tilde{S} has rank d, all rows of S, up to a sign, are included in \tilde{S}. Therefore, the rows of T are distinct, and it is an ATM.
The identifiability of larger alphabets follows easily from the binary case. Consider the alphabet \Omega of length L with symbols {\pm 1, \pm 3, \ldots, \pm(L-1)}. We assume that the columns of S include all the l = L^d/2 possible distinct (up to a sign) d-vectors. In particular, they include n = 2^{d-1} distinct vectors of \pm 1 elements and n = 2^{d-1} distinct vectors of \pm(L-1) elements. As before, (without loss of generality) we can normalize each column of S such that the first symbol is positive, and permute the columns so that we can partition S = [S_n (L-1)S_n S_r], where S_n is a submatrix of n distinct vectors with \pm 1 elements and S_r is the remaining submatrix. Each row t of T again satisfies

    t^T S = t^T [S_n (L-1)S_n S_r] = s^T,    (12)

where s^T is the corresponding row of \tilde{S}, and is partitioned likewise as s^T = [s_{1,n}^T s_{2,n}^T s_r^T]. Although there are l distinct equations defined by the columns of S, the subset of equations determined by the first 2n columns alone results in a trivial solution for t, i.e. only one non-zero element. This follows from Equation (12),

    t^T S_n = s_{1,n}^T    (13)
    t^T (L-1) S_n = s_{2,n}^T    (14)

which implies (L-1) s_{1,n}^T = s_{2,n}^T. Since the elements of both s_{1,n} and s_{2,n} belong to the alphabet \Omega, the only possibility for the entries of s_{1,n} is \pm 1. Thus, from Equation (13), we see that the problem is reduced to one of identifiability of binary signals. For this problem, we have shown previously that t is a trivial solution, and consequently, T is an ATM.

Finally, the generalization to complex signals is straightforward. In this case, we see that the signals can be identified uniquely up to a factor {+1, -1, +j, -j}. As before, we have the system of equations \tilde{S} = T S, where S and \tilde{S} are now complex matrices with elements in \Omega = {a + jb : a, b \in {\pm 1, \pm 3, \ldots, \pm(L-1)}}. We are interested in finding a condition on the columns of S such that T is an ATM. For complex signals, we extend the definition of an ATM to a nonsingular matrix with one non-zero element, {\pm 1, \pm j}, in each row and column. We begin by noting that multiplication of complex matrices is isomorphic to multiplication of real matrices with twice the dimensions [21]. In particular, we have

    [Re{\tilde{S}} -Im{\tilde{S}}; Im{\tilde{S}} Re{\tilde{S}}] = [Re{T} -Im{T}; Im{T} Re{T}] [Re{S} -Im{S}; Im{S} Re{S}],    (15)

and by denoting each of the real block matrices above with a subscript (\cdot)_R, we get \tilde{S}_R = T_R S_R. The matrix of received signals, S_R, now has dimensions \bar{d} \times 2N, for \bar{d} = 2d. If the first N columns of S_R include all the l = L^{\bar{d}}/2 possible distinct vectors, then there is exactly one non-zero element, a +1 or a -1, in each of the \bar{d} rows of the submatrix [Re{T} -Im{T}]. Furthermore, each row of this submatrix must be distinct since \tilde{S} is full rank, and thus, T = Re{T} + j Im{T} is an ATM. It is important to note here that the identifiability of d complex signals is equivalent to that of \bar{d} = 2d real signals. We can summarize the above discussion by the following theorem:
Theorem 3.2 Let X = AS, where A_{m \times d} is an arbitrary full-rank matrix with d \le m, and S_{d \times N} is a full-rank matrix with elements in \Omega.

1. Real Case: If the columns of S include all the L^d/2 possible distinct (up to a sign) d-vectors with elements in \Omega = {\pm 1, \pm 3, \ldots, \pm(L-1)}, then A and S can be uniquely identified up to a matrix T with exactly one non-zero element, {+1, -1}, in each row and column.

2. Complex Case: If the columns of S include all the L^{2d}/2 possible distinct (up to a sign) d-vectors with elements in \Omega = {a + jb : a, b \in {\pm 1, \pm 3, \ldots, \pm(L-1)}}, then A and S can be uniquely identified up to a matrix T with exactly one non-zero element, {+1, -1, +j, -j}, in each row and column.
In Theorem 3.2, we give a sufficient condition for identifiability of signals that belong to a finite alphabet \Omega. Now, we show that if N is sufficiently large, the probability of achieving identifiability approaches one. We consider the case with real signals, noting that the following result also holds for complex signals, with d replaced by \bar{d} = 2d.
Theorem 3.3 Let p denote the probability of receiving all the L^d/2 distinct (up to a sign) columns of S in N snapshots, N \ge L^d/2. Assuming that the probability of receiving each symbol in \Omega is equal, and that the snapshots are independent, p is bounded by

    1 - (L^d/2) (1 - 2/L^d)^N \le p \le 1.    (16)

Proof: See Appendix A.2.
We see from Equation (16) that for a fixed value of d, as N increases, p approaches 1. The probability q = 1 - p that one of the L^d/2 distinct vectors is not picked can then be bounded by

    q \le (L^d/2) (1 - 2/L^d)^N \le (L^d/2) exp(-N / (L^d/2)).

Hence, the probability of missing one of the distinct vectors approaches zero exponentially fast, the rate of decay depending on the ratio N / (L^d/2). Nevertheless, for large values of d, the number of snapshots required for identifiability seems quite large. But the condition requiring L^d/2 distinct vectors is only sufficient, and far from being necessary. In fact, we see in Appendix A.3 that if S contains a specific set of d + 1 columns for \Omega = {\pm 1}, it can be determined uniquely up to an ATM. Although this result remains to be generalized, it shows that in practice identifiability can be achieved with far fewer snapshots.
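As a rough numerical illustration of Theorem 3.3 (our sketch, not from the paper), the bound can be inverted to find the smallest N that guarantees a given miss probability q:

```python
import numpy as np

def snapshots_for_identifiability(L, d, q_max):
    """Smallest N with (L**d / 2) * (1 - 2 / L**d)**N <= q_max (bound above)."""
    l = L ** d / 2
    return int(np.ceil(np.log(q_max / l) / np.log(1 - 2 / L ** d)))

print(snapshots_for_identifiability(L=2, d=3, q_max=1e-2))  # -> 21
```

For BPSK (L = 2) and d = 3 users, only N = 21 snapshots suffice for a miss probability of 10^{-2}, so the sufficient condition is mild for small d, even before invoking the d + 1 column result above.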
4 Maximum Likelihood Estimation

In this section, we consider the problem of estimating the digital signals in the presence of noise. From Section 2, we see that the signals can be modeled as unknown deterministic sequences corrupted by white Gaussian noise: x(n) = A s(n) + v(n), where E[v(n) v^*(k)] = \sigma^2 I \delta_{nk}. The maximum likelihood estimator yields the following separable least-squares minimization problem

    min_{A, S(N) \in \Omega^{d \times N}} \|X(N) - A S(N)\|_F^2    (17)

in the variables A and S(N), which are respectively continuous and discrete. We assume N is large enough to ensure unique signal and array response estimates. It is proved in [22] that the minimization can be carried out in two steps. First, we minimize (17) with respect to A since it is unconstrained

    \hat{A} = X(N) S(N)^{\dagger} = X(N) S^*(N) (S(N) S^*(N))^{-1}.

Then, substituting \hat{A} back into (17), we obtain a new criterion which is a function of S(N) only

    min_{S(N) \in \Omega^{d \times N}} \|X(N) P^{\perp}_{S(N)}\|_F^2,    (18)

where P^{\perp}_{S(N)} = I_N - S^*(N) (S(N) S^*(N))^{-1} S(N). The global minimum of (18) can be obtained by enumerating over all possible choices of S(N). But this search has an exponential complexity
in the number of symbols N and the number of signals d (\bar{d} = 2d for complex signals), and can be computationally prohibitive even for modest size problems. In the next section, we consider two iterative block algorithms that have a lower computational complexity.
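To make the cost concrete, here is a brute-force sketch of the global search (ours; feasible only for tiny d and N, since it visits all L^{dN} candidate matrices):

```python
import itertools
import numpy as np

def ml_exhaustive(X, d, alphabet=(-1.0, 1.0)):
    """Global ML search over all |alphabet|**(d*N) symbol matrices (exponential in d*N)."""
    m, N = X.shape
    best_S, best_cost = None, np.inf
    for symbols in itertools.product(alphabet, repeat=d * N):
        S = np.array(symbols).reshape(d, N)
        A = X @ np.linalg.pinv(S)                      # concentrated LS channel estimate
        cost = np.linalg.norm(X - A @ S, 'fro') ** 2   # residual of criterion (18)
        if cost < best_cost:
            best_S, best_cost = S, cost
    return best_S, best_cost
```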
5 Block Algorithms

The block algorithms, ILSP and ILSE, were introduced in [1]. These algorithms take advantage of the ML estimator in Equation (17) being separable in the variables A and S. The ML criterion is minimized with respect to the two variables using an alternating minimization procedure. The idea is to visit the received data iteratively until a best fit with the channel (array response) and signal model is obtained. The ILSP algorithm performs well with no prior estimate of the array responses, and can be used to initialize ILSE. For a sufficiently good initialization, the ILSE algorithm converges rapidly to the ML estimate of the array responses and signal symbol sequences. Apart from their efficiency, the algorithms are naturally parallelizable, and can be easily extended for recursive estimation (see Section 6).
5.1 ILSP Algorithm

We begin by assuming that f(A, S; X) = \|X - AS\|_F^2 in (17) is a function of unstructured continuous matrix variables A and S. Given an initial estimate \hat{A} of A, the minimization of f(\hat{A}, S; X) with respect to continuous S is a least-squares problem. Each element of the solution \bar{S} is projected to its closest discrete value, giving \hat{S}. Then a better estimate of A is obtained by minimizing f(A, \hat{S}; X) with respect to A, keeping \hat{S} fixed. This is again a least-squares problem. We continue this process until \hat{A} and \hat{S} converge.
Iterative Least Squares with Projection (ILSP)

1. Given A_0, k = 0
2. k = k + 1
   \bar{S}_k = (A^*_{k-1} A_{k-1})^{-1} A^*_{k-1} X
   S_k = proj[\bar{S}_k]
   A_k = X S^*_k (S_k S^*_k)^{-1}
3. Repeat 2 until (A_k - A_{k-1}) = 0
In the above description of the algorithm, proj[\cdot] denotes projection onto the discrete alphabet. However, this definition may be extended to projection onto a constant modulus or any other signal characteristic. Moreover, in scenarios where the array manifold structure is applicable, A_k can be projected onto the manifold at each iteration. Hence, the ILSP algorithm outlines a general approach for imposing known structure on the variables A and S in a minimization criterion of the form \|X - AS\|_F^2. The main advantage of the algorithm is its low computational complexity. At each iteration, two least-squares problems are solved, each requiring O(Nmd) flops for N \gg m. In particular, Nmd + 2d^2(N - d/3) + md^2 flops are required to solve for \hat{A}, and Nmd + 2d^2(m - d/3) + Nd^2 flops to solve for \hat{S}. Thus, the algorithm's complexity is polynomial in N and d.
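A minimal numpy rendering of ILSP for BPSK signals might look as follows (our sketch; it uses the initialization A_0 = I discussed in Section 5.1.2 below, and stops when S_k repeats, which is equivalent to A_k being unchanged):

```python
import numpy as np

def ilsp(X, d, max_iter=50):
    """Iterative Least Squares with Projection, BPSK alphabet {+1, -1} (sketch)."""
    m, N = X.shape
    A = np.eye(m, d, dtype=X.dtype)                    # A0 = I_{m x d} (Section 5.1.2)
    S = None
    for _ in range(max_iter):
        S_bar = np.linalg.lstsq(A, X, rcond=None)[0]   # unconstrained LS solve for S
        S_new = np.where(S_bar.real >= 0, 1.0, -1.0)   # proj[.] onto {+1, -1}
        if S is not None and np.array_equal(S_new, S):
            break                                      # converged: S (hence A) unchanged
        S = S_new
        A = X @ S.T @ np.linalg.inv(S @ S.T)           # LS solve for A with S fixed
    return A, S
```

With the data generator sketched in Section 2, ilsp(X, 3) recovers the symbols up to the sign and ordering ambiguity discussed in Section 3.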
5.1.1 Real Signals

If the signals belong to a real alphabet, we take advantage of this fact by constraining the imaginary part of \bar{S} to be zero in each step, thus reducing the number of unknowns by half. Equivalently, we can minimize a slightly modified criterion f(A_R, S; X_R) by noting that

    \|X - AS\|_F^2 = \|[Re{X} - Re{A}S] + j[Im{X} - Im{A}S]\|_F^2
                   = \|Re{X} - Re{A}S\|_F^2 + \|Im{X} - Im{A}S\|_F^2
                   = \| [Re{X}; Im{X}] - [Re{A}; Im{A}] S \|_F^2
                   = \|X_R - A_R S\|_F^2,

where X_R and A_R are the real augmented matrices shown above. Thus, the algorithm proceeds as before, the only difference being that for real signals, we consider augmented matrices and replace (\cdot)^* by (\cdot)^T.
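In code, the augmentation is a one-liner (our sketch); after stacking, all ILSP computations stay in real arithmetic:

```python
import numpy as np

def augment_real(M):
    """Stack real and imaginary parts: ||X - A S||_F = ||X_R - A_R S||_F for real S."""
    return np.vstack([M.real, M.imag])

# X_R = augment_real(X); A_R = augment_real(A0)  # then run ILSP on (X_R, A_R) with S real
```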
5.1.2 Initialization

A common initialization strategy in optimization for nonlinear problems with mixed discrete and continuous variables is to use the solution of the continuous problem as an estimate for the mixed problem [23]. The continuous solution for the ML criterion in Equation (18) is V_d, the right singular vectors corresponding to the largest d singular values of X. Hence, we can initialize ILSP with S_0 = proj[V_d]. It is easily seen that this is equivalent to initializing with A_0 = U_d, the largest d left singular vectors of X. Since it is expensive to compute the singular value decomposition (SVD), we initialize instead with X_d, the first d rows of X (equivalently A_0 = I_{m \times d}), which have the same rowspan as V_d in the absence of noise. This approximation works well if all signals have nearly equal powers. If some signals are much stronger than the others, then S_0 = proj[X_d] may be rank deficient. In this case, we add a small diagonal perturbation matrix to S_0 in order to restore its rank. In general, if S_k or A_k becomes rank-deficient at any iteration, we have found this simple strategy of perturbing the diagonal to be quite effective in extracting all the signals in difficult scenarios. One may also use the soft-orthogonalization strategy proposed in [9] for the multiple source CM problem.

The availability of prior information to initialize the ILSP algorithm can improve the possibility of global convergence, and help reduce the number of iterations. For example, in scenarios where imprecise knowledge of the spatial structure of the array is available, estimates of the array responses obtained from traditional techniques, such as MUSIC or ESPRIT, can be used to construct A_0. Alternatively, if the users initially transmit a short set of training signals, such as the sequence of d symbols that define the matrix S_d given in Appendix A.3, Equation (56), then A_0 can be estimated from the received data by A_0 = X S_d^{-1}. The advantage of using this particular training set is that S_d^{-1}, given in (59), is easily computed and is sparse. Although the algorithm is no longer blind, revisiting the data iteratively can significantly improve channel and signal estimates in situations with low SNR's.
5.1.3 Comparison with Relevant Algorithms

The ILSP algorithm is similar to the LS-CMA and multi-target LS-CMA algorithms proposed in the CM literature [9, 24, 25]. But there are two key differences. First, the FA property is stronger than CM for digital signals (with PSK modulation format) since the signals are restricted to lie on discrete points on a disk. Second, these algorithms use a MMSE beamformer to estimate the signal waveforms. They minimize the following performance criterion using the alternating projections technique

    min_{W, S \in CM} \|W X - S\|_F^2.    (19)

It is shown in [24] that for a single user and an unknown spatial noise covariance, the above criterion yields the maximum likelihood signal estimate. However, when multiple signals are present, this criterion suffers from the drawback that only the strongest signals may be captured. For example, consider the case where two weight vectors have converged to the same (stronger) signal. The MMSE residual, \|\hat{W} X - \hat{S}\|_F^2 with \hat{S} = proj[\hat{W} X], may be small in this case. Yet the ML residual, \|X - \hat{A}\hat{S}\|_F^2 where \hat{A} = X \hat{S}^{\dagger}, will be large since a best fit of the array response and signal model to the received data is not obtained.

Now, let us examine the ML and MMSE schemes to see precisely how they differ. We can express the ILSP algorithm succinctly as

    S_{k+1} = proj[(X S_k^{\dagger})^{\dagger} X],    (20)

and a MMSE scheme such as multi-target LS-CMA [25] without soft-orthogonalization as

    S_{k+1} = proj[S_k X^{\dagger} X].    (21)

Since the pseudo-inverse does not satisfy (X S_k^{\dagger})^{\dagger} = S_k X^{\dagger} in general [26], the two algorithms yield different signal estimates. Note that the matrix A_k = X S_k^{\dagger} in (20) is poorly conditioned near the solution A if the angular separation between array response vectors is small. In contrast, the data matrix X in (21) becomes poorly conditioned for high SNR's. The MMSE algorithm is computationally less expensive. However, we have found the performance of the ML algorithm to be more favorable in a blind multiple signal scenario.

In [10], Swindlehurst et al. have proposed a decision directed technique for digital signals using both the MMSE and ML beamformers (similar to ILSP), assuming a rough estimate of the signal of interest is available. They have shown that the asymptotic symbol error rate for the MMSE beamformer W = R_{SX} R_{XX}^{-1} is lower than that of the ML beamformer W = A^{\dagger}. Hence, for large N, the converged signal estimates obtained from ILSP may be improved by applying the MMSE approach.
5.2 ILSE Algorithm

A limitation of the ILSP algorithm is that its performance is limited by that of the ML beamformer. This is easily seen by considering the case where A is known, and the ML criterion is to be minimized with respect to the variable S only. In ILSP, S \in \Omega^{d \times N} is not estimated directly, but in two steps: (i) least-squares and (ii) projection. The least-squares step causes noise enhancement if the array response vectors are not well separated in angle, i.e. A is ill-conditioned. The optimal approach is to enumerate over all possible S matrices with elements in \Omega, and choose the S that minimizes \|X - AS\|_F^2. But this is computationally demanding since L^{dN} matrices need to be considered. Fortunately, the search can be reduced to enumerating L^d vectors in \Omega^d (N times) by exploiting the following property of the Frobenius norm:

    min_{S \in \Omega^{d \times N}} \|X - AS\|_F^2 = min_{s(1) \in \Omega^d} \|x(1) - A s(1)\|^2 + \ldots + min_{s(N) \in \Omega^d} \|x(N) - A s(N)\|^2.    (22)

Hence, minimization over the signal vectors s(1), \ldots, s(N) can be carried out independently. For each s(n), the ML estimate \hat{s}(n) is obtained by enumerating over all L^d possible vectors s^{(j)} \in \Omega^d, and choosing the one that minimizes the residual

    \hat{s}(n) = arg min_{s^{(j)} \in \Omega^d} \|x(n) - A s^{(j)}\|^2,  j = 1 \ldots L^d.    (23)

The ILSE algorithm proceeds in a similar fashion as ILSP. Given an initial estimate of A (possibly from ILSP), we iterate using an alternating minimization technique: minimize f(A, S; X) with respect to S \in \Omega^{d \times N} and then with respect to A, at each step. The residual function f(A, S; X) is decreased at each iteration, and for a reasonably good initial A_0, convergence to the global minimum is rapid. The algorithm has complexity O(N m d L^d) flops per iteration: Nmd + 2d^2(N - d/3) + md^2 flops to solve for \hat{A}, and N m L^d (d + 1) flops to enumerate.
Iterative Least-Squares with Enumeration (ILSE)

1. Given A_0, k = 0
2. k = k + 1
   Let A := A_{k-1} in (22), and minimize for S_k (Enumeration)
   A_k = X S^*_k (S_k S^*_k)^{-1}
3. Repeat 2 until (A_k - A_{k-1}) = 0
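The enumeration step (23) vectorizes naturally: the L^d candidate vectors are tabulated once and every snapshot is decoded against all of them. A minimal BPSK sketch (ours), mirroring the ILSP sketch above:

```python
import itertools
import numpy as np

def ilse(X, A0, max_iter=50):
    """Iterative Least Squares with Enumeration, BPSK alphabet {+1, -1} (sketch)."""
    m, N = X.shape
    d = A0.shape[1]
    cands = np.array(list(itertools.product([-1.0, 1.0], repeat=d))).T  # d x 2^d
    A, S = A0, None
    for _ in range(max_iter):
        # Eqs. (22)-(23): decode each snapshot independently over all candidates
        dist = np.abs(X[:, :, None] - (A @ cands)[:, None, :]) ** 2     # m x N x 2^d
        idx = dist.sum(axis=0).argmin(axis=1)                           # best candidate per snapshot
        S_new = cands[:, idx]                                           # d x N
        if S is not None and np.array_equal(S_new, S):
            break
        S = S_new
        A = X @ S.T @ np.linalg.inv(S @ S.T)                            # LS update of A
    return A, S
```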
5.3 Convergence

We have recently discovered that the ILSE algorithm is very similar to the Segmental K-Means algorithm used for estimating parameters of hidden Markov models [27]. The K-means algorithm is an iterative scheme which alternates between two key steps: (i) segmentation, performed via a generalized Viterbi algorithm, and (ii) optimization. In [27], Juang and Rabiner prove fixed point convergence of the algorithm. Their proof is based on Zangwill's convergence theorem, which is a general result for algorithms in nonlinear programming [28]. It is shown that the K-means algorithm satisfies the conditions of the theorem. Rather than taking the same approach, we present a simple proof which shows that the ILSE algorithm converges to a fixed point in a finite number of iterations.

We first review some definitions that will be needed for the proof [28]. Let T : V \to V be a mapping from a point in space V to a point in V. An algorithm is an iterative process that generates a sequence {V_k}_{k=1}^{\infty}, given a point V_0 in V, by successive transformations V_k = T(V_{k-1}). A point \bar{V} is a fixed point of T if \bar{V} = T(\bar{V}). Let \Gamma be the set of fixed points of T. A function f is a descent function for the mapping T if it satisfies the conditions:

1. f : V \to IR is non-negative and continuous
2. f(\bar{V}) < f(V) for \bar{V} = T(V) and V \notin \Gamma
3. f(\bar{V}) \le f(V) for \bar{V} = T(V) and V \in \Gamma

Many algorithms use the objective function to be minimized as the descent function f. Conditions 1-3 require that the algorithm reduce f at each iteration until a fixed point is reached. Within this framework, we can consider the ILSE algorithm as a mapping on the cross-product space V = A \times S, where A = C^{m \times d} and S = \Omega^{d \times N}. Starting with an arbitrary pair V_0 = (A_0, S_0), the iterates V_k = (A_k, S_k) = T((A_{k-1}, S_{k-1})) are determined by two minimization steps: (i) S_k = arg min_{S \in \Omega^{d \times N}} \|X - A_{k-1} S\|_F^2, and (ii) A_k = arg min_A \|X - A S_k\|_F^2.

In order for the mapping T to be well-defined, we need to ensure that for each pair (A_{k-1}, S_{k-1}) \in V, the algorithm computes a unique pair (A_k, S_k) \in V. That is, the algorithm computes a unique S_k for a given A_{k-1}, and a unique A_k for a given S_k. For a given A_{k-1}, it may be possible that S_k in (i) is not unique, such as when A_{k-1} is rank deficient. In this case, we need to impose a rule that uniquely defines S_k. For example, in a computer implementation of the algorithm, there is an ordering associated with enumeration, and we choose S_k to be the minimizer of (i) with the lowest index. With respect to A_k in (ii), given a full rank S_k, the least-squares estimate A_k is unique. However, if S_k is rank-deficient, we need to compute the minimum-norm least-squares solution for A_k, which is unique.

It is natural to consider the residual function f(A, S; X) = \|X - AS\|_F^2 as a descent function of the algorithm. Clearly, f(A, S; X) is non-negative and continuous in A and S. Consider
    f(A_k, S_k; X) = \|X - A_k S_k\|_F^2    (24)
                   = min_A \|X - A S_k\|_F^2    (25)
                   \le \|X - A_{k-1} S_k\|_F^2    (26)
                   = min_{S \in \Omega^{d \times N}} \|X - A_{k-1} S\|_F^2    (27)
                   \le \|X - A_{k-1} S_{k-1}\|_F^2    (28)
                   = f(A_{k-1}, S_{k-1}; X).    (29)
Based on our assumptions, inequality (26) is strict unless A_k = A_{k-1}, and inequality (28) is strict unless S_k = S_{k-1}. Hence, at each iteration the residual is strictly decreased unless (A_{k-1}, S_{k-1}) = T((A_{k-1}, S_{k-1})) is a fixed point.

The convergence proof is based on the observation that each of the iterates, S_k \in \Omega^{d \times N}, belongs to a finite set. Let us assume there are P possible S_k matrices. Since there is a unique least-squares estimate A_k associated with each S_k, there are only P pairs (A_k, S_k) that can be generated by the ILSE algorithm (for k \ge 1). Consider the sequence of pairs {(A_k, S_k)}_{k=1}^{P+1} obtained from the algorithm. There are P + 1 iterates in the sequence taken from a set of P possible elements. Hence, at least two of the iterates must be the same. Let us assume j and j + l are the lowest indices for which (A_j, S_j) and (A_{j+l}, S_{j+l}) are the same. The residual for the two iterates is equal

    f(A_{j+l}, S_{j+l}; X) = f(A_j, S_j; X).    (30)

If l = 1, then (A_j, S_j) is a fixed point of the algorithm since (A_{j+1}, S_{j+1}) = T((A_j, S_j)) = (A_j, S_j). Let us assume (A_j, S_j) is not a fixed point, and l > 1. Then, f(A_{j+1}, S_{j+1}; X) must be strictly less than f(A_j, S_j; X), and hence

    f(A_{j+l}, S_{j+l}; X) < f(A_j, S_j; X).    (31)

However, since (30) and (31) are contradictory, (A_j, S_j) must be a fixed point. To summarize, we have the following convergence theorem for ILSE.
Theorem 5.1 Let (A_0, S_0) \in C^{m \times d} \times \Omega^{d \times N}, and let the iterates (A_k, S_k) = T((A_{k-1}, S_{k-1})), k \ge 1, be determined by the ILSE algorithm. Let f(A, S; X) be a descent function of the algorithm. Then, there exists some j_0 such that for k \ge j_0, (A_k, S_k) = T((A_{j_0}, S_{j_0})) and f(A_k, S_k; X) = f(A_{j_0}, S_{j_0}; X).

Hence, a fixed point is reached in a finite number of steps, and can be detected by a lack of change in the residual. The global minimum is a fixed point of the iteration. This follows from the fact that a transition to another point would be possible only if the residual is reduced, but this cannot occur since the global minimum is the point with the lowest residual. Note that the theorem is valid for any initial guess (A_0, S_0). However, the sequence {(A_k, S_k)} and the fixed point it converges to depend on the initial guess.
We have found in our simulations that for some initial iterates, the algorithm converges to fixed points that are not the global minima. This case can be detected by considering the magnitude of the residual, (1/(Nm)) \|X - \hat{A}\hat{S}\|_F^2, where \hat{A} and \hat{S} are the converged estimates from ILSE. If a global minimum is reached, this residual is close to the noise power, since the true residual

    \|X - AS\|_F^2 = \|V\|_F^2 \approx N tr(\sigma^2 I) = N m \sigma^2.

If the residual is not decreased to the noise level, we have reached a non-global solution. In this case, we restart the algorithm with another initial guess. We proceed in this fashion until the global minimum is reached. For a wide range of parameters (SNR and array response vector separation), the ILSE algorithm converges to the global solution when initialized with array response estimates from ILSP. If the global solution is not reached, we re-initialize ILSP with a random guess, and then re-initialize ILSE with the estimate from ILSP. Usually one or two re-initializations are sufficient to yield the global minimum. Since both ILSP and ILSE converge very rapidly, re-initialization is not computationally expensive. Hence, the success of our approach is not hindered by the presence of additional fixed points.

We cannot apply the above convergence theory to ILSP since the algorithm does not necessarily decrease the residual at each iteration. However, in our simulations, we have found that it also converges to a fixed point in a finite number of steps. Moreover, in scenarios where the array response vectors are well separated, it converges to the global minimum.
5.3.1 Cost Function

In Figure 1, we symbolically depict the ML cost function \|X - AS\|^2 in order to understand the path taken by the ILSE algorithm. We consider a scenario with d = 2 BPSK signals at 15 dB SNR, arriving from \theta = [0, 10] degrees, and a block size of N = 3. We choose this simple case due to the complexity involved in enumerating all possible S matrices. Nevertheless, this picture gives a good qualitative understanding of the path of the algorithm to a global minimum or some other fixed point. In this figure, the x-axis corresponds to the 2^6 = 64 possible signal matrices, S^{(j)}, j = 1 \ldots 64. The y-axis corresponds to the least-squares estimates A^{(i)} associated with each S^{(i)}, that is, A^{(i)} = arg min_A \|X - A S^{(i)}\|^2 for i = 1 \ldots 64. The graph itself corresponds to a matrix of residuals, wherein the entry at row i and column j corresponds to the value of the residual \|X - A^{(i)} S^{(j)}\|^2, i, j = 1 \ldots 64. In particular, the ith row in the graph corresponds to the residuals associated with matrix A^{(i)} for all possible S^{(j)}'s. For simplicity, we have chosen a 3-color scheme to depict the residuals. A point (i, j) is colored gray if the value of its associated residual \|X - A^{(i)} S^{(j)}\|^2 is minimum with respect to the other residuals in row i. Otherwise, it is colored white. If a point is a global minimum with respect to the whole matrix, it is colored black.

For a given A_0, ILSE generates an S_0 which is the starting point in our graph. From this S_0, the algorithm computes an A_1. This step is symbolically depicted by a vertical line starting at column j such that S^{(j)} = S_0, and moving up to the diagonal. Now, the algorithm generates S_1 by enumerating over all possible S's. We represent this step by a horizontal move on the line associated with A^{(j)} = A_1. Since we seek a unique minimum, the line segment from the diagonal ends at the leftmost gray point. This follows from the fact that in our implementation, we always pick the S^{(j)} with the lowest index that minimizes \|X - A S^{(j)}\|^2, j = 1 \ldots L^{dN}. Once S_1 is picked, the algorithm computes the corresponding A_2, and this once again corresponds to a move to the diagonal. We proceed in this fashion until a global minimum or a fixed point is reached.

In this graph, fixed points are located on the diagonal. It is clear that if the leftmost minimum with respect to S for a particular A^{(i)} is on the diagonal, the scheme described above will not generate any more line segments and the iteration will remain fixed at that point. Since the global minima are fixed points of the algorithm, they are also located on the diagonal. There are multiple global solutions since the signals can be identified only up to an ATM, as described in Section 3. Note that in rows 1, 16, 22, 27, etc., there are multiple gray points. These correspond to cases where A^{(i)} is singular, and thus multiple S^{(j)}'s yield the same residual. We have shown a few of the paths that may be taken by the algorithm, depending on the initial pair (A_0, S_0). Paths to the global minima are indicated by solid lines, and paths to other fixed points by dashed lines. We note that for this particular scenario, 56 of the 64 possible paths lead to global solutions, and 8 paths lead to other fixed points.
6 Recursive Algorithms

In this section, we consider two classes of recursive algorithms for estimating the received signals. In recursive estimation, we are interested in solving the following minimization problem at symbol period n

    min_{A, s(n) \in \Omega^d} \|(X(n) - A S(n)) B^{1/2}(n)\|_F^2    (32)

where X(n) = [X(n-1) x(n)], S(n) = [S(n-1) s(n)], and B(n) = diag(\lambda^{n-1}, \lambda^{n-2}, \ldots, 1) is a diagonal weighting matrix for some 0 < \lambda \le 1. Our objective is to compute A(n) and s(n), assuming that a good estimate of S(n-1) (or equivalently A(n-1)) is available. This estimate may be obtained blindly by using the block algorithms of the previous section, or by a short training set. The exponential weighting is used to de-emphasize old data in a time-varying environment. The "fading memory" least-squares solution for A(n) is given by

    A(n) = X(n) B(n) S^*(n) (S(n) B(n) S^*(n))^{-1}    (33)

which can be updated recursively (see [29]),

    A(n) = A(n-1) + ((x(n) - A(n-1) s(n)) / (\lambda + s^*(n) P(n-1) s(n))) s^*(n) P(n-1).    (34)

In the above equation, P(n) = (S(n) B(n) S^*(n))^{-1}, which can also be expressed recursively as

    P(n) = (1/\lambda) [ P(n-1) - (P(n-1) s(n) s^*(n) P(n-1)) / (\lambda + s^*(n) P(n-1) s(n)) ],  P(0) = I.

In addition, we define H(n) = X(n) B(n) S^*(n), so that we can rewrite A(n) = H(n) P(n) in Equation (33). Using this framework, we are now ready to consider the two classes of recursive algorithms.
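The update (34) and the P(n) recursion are standard exponentially weighted RLS steps; a minimal sketch (ours), taking the decoded symbol vector s as input:

```python
import numpy as np

def rls_update(A, P, x, s, lam=1.0):
    """One step of the weighted RLS channel update, Eq. (34), with the P(n) recursion."""
    Ps = P @ s                                          # P(n-1) s(n)
    denom = lam + s.conj() @ Ps                         # lambda + s*(n) P(n-1) s(n)
    A_new = A + np.outer(x - A @ s, Ps.conj()) / denom  # Eq. (34)
    P_new = (P - np.outer(Ps, Ps.conj()) / denom) / lam
    return A_new, P_new
```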
6.1 Class A

The first class includes algorithms that alternate between estimating s(n), and then updating A(n), at each symbol period. The recursive extensions of the two block algorithms belong to this class. For each data vector x(n), we first estimate s(n) by minimizing

    \hat{s}(n) = arg min_{s(n) \in \Omega^d} \|x(n) - A(n-1) s(n)\|^2    (35)

using either the least-squares with projection approach or the enumeration approach. This step requires O(md^2) flops or O(m d L^d) flops, respectively. Then, A(n) is computed using Equation (34), which requires O(md) flops. The enumeration approach is more robust for low SNR's, and is thus recommended whenever computationally feasible (for small values of d). The following recursive algorithm has complexity O(m d L^d) flops per snapshot:
Recursive Least Squares with Enumeration (RLSE)

1. Given A(n-1)
2. n = n + 1
   Minimize (35) for \hat{s}(n) (Enumeration)
   Update A(n) using (34) (RLS)
3. Continue with the next snapshot
6.2 Class B

In this class, we minimize Equation (32) jointly over A(n) and s(n). This is achieved by substituting the weighted least-squares solution for A(n), given by Equation (33), back into the original minimization criterion to yield a new criterion

    min_{s(n) \in \Omega^d} \|X(n) B^{1/2}(n) P^{\perp}_{B^{1/2}(n) S^*(n)}\|_F^2,    (36)

where P^{\perp}_{B^{1/2}(n) S^*(n)} = I_n - P_{B^{1/2}(n) S^*(n)}, and P_{B^{1/2}(n) S^*(n)} is a d-dimensional projection matrix defined by the rows of S(n) B^{1/2}(n)

    P_{B^{1/2}(n) S^*(n)} = B^{1/2}(n) S^*(n) (S(n) B(n) S^*(n))^{-1} S(n) B^{1/2}(n).    (37)

Note that the key difference between the criterion in Equation (36) and the block ML criterion in Equation (18) is that the minimization in (36) is over s(n) only, since S(n-1) is assumed to be known. Hence, recursive minimization of the ML criterion is computationally tractable. We can equivalently maximize

    max_{s(n) \in \Omega^d} \|X(n) B^{1/2}(n) P_{B^{1/2}(n) S^*(n)}\|_F^2    (38)

which can be expressed in terms of the trace operator

    max_{s(n) \in \Omega^d} tr[B^{1/2}(n) P_{B^{1/2}(n) S^*(n)} B^{1/2}(n) X^*(n) X(n)].

Using Equation (37), we see that

    B^{1/2}(n) P_{B^{1/2}(n) S^*(n)} B^{1/2}(n) = B(n) S^*(n) P(n) S(n) B(n)    (39)

which can be partitioned as

    [ \lambda^2 B(n-1) S^*(n-1) P(n) S(n-1) B(n-1)    \lambda B(n-1) S^*(n-1) P(n) s(n) ]
    [ \lambda s^*(n) P(n) S(n-1) B(n-1)               s^*(n) P(n) s(n)                ].    (40)

The above follows by noting that the weighting matrix can be expressed as

    B(n) = [ \lambda B(n-1)   0 ]
           [ 0                1 ].    (41)

Likewise, we can partition

    X^*(n) X(n) = [ X^*(n-1) X(n-1)   X^*(n-1) x(n) ]
                  [ x^*(n) X(n-1)     x^*(n) x(n)   ].    (42)

Multiplying Equations (40) and (42), and using properties of the trace, we can rewrite (39) as

    max_{s(n) \in \Omega^d} tr[\lambda^2 H(n-1) P(n) H^*(n-1) + \lambda H(n-1) P(n) s(n) x^*(n)]
                          + tr[\lambda P(n) H^*(n-1) x(n) s^*(n) + P(n) s(n) x^*(n) x(n) s^*(n)].    (43)

It is easily seen that H(n) can be updated recursively as

    H(n) = \lambda H(n-1) + x(n) s^*(n),    (44)

and thus, (43) can be simplified to

    max_{s(n) \in \Omega^d} tr[H(n) P(n) H^*(n)].

Since A(n) = H(n) P(n), we estimate s(n) by maximizing

    \hat{s}(n) = arg max_{s(n) \in \Omega^d} tr[A(n) H^*(n)].    (45)

Substituting H^*(n) = S(n) B(n) X^*(n) in (45), it follows that we choose s(n) to maximize the correlation between A(n) S(n) B^{1/2}(n) and X(n) B^{1/2}(n).

In Equation (45), we compute A(n) and H(n) for each of the L^d possible vectors s(n) \in \Omega^d. This is done recursively using Equations (34) and (44), and hence the computation requires O(m d L^d) flops. Next, we compute the diagonal entries of A(n) H^*(n), since we are only interested in the trace of the product. Computing the trace for all possible vectors s(n) also requires O(m d L^d) flops. Hence, this recursive approach has computational complexity O(m d L^d), which is the same as RLSE. However, we obtain better signal estimates using this approach. The algorithm is summarized below.
Recursive Projection Update (RPU)

1. Given A(n-1) and H(n-1)
2. n = n + 1
   Maximize (45) for \hat{s}(n)
   Update A(n) and H(n)
3. Continue with the next snapshot
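A per-snapshot sketch of the RPU search (ours): for each of the L^d candidates, tentatively update P, H, and A via the recursions above, and keep the candidate that maximizes tr[A(n) H^*(n)] as in (45):

```python
import numpy as np

def rpu_step(H, P, x, cands, lam=1.0):
    """One RPU snapshot: pick s maximizing tr[A(n) H*(n)] over all candidates, Eq. (45)."""
    best = None
    for s in cands.T:                                 # the L**d candidate symbol vectors
        Ps = P @ s
        P_new = (P - np.outer(Ps, Ps.conj()) / (lam + s.conj() @ Ps)) / lam
        H_new = lam * H + np.outer(x, s.conj())       # Eq. (44)
        A_new = H_new @ P_new                         # A(n) = H(n) P(n)
        crit = np.trace(A_new @ H_new.conj().T).real  # tr[A(n) H*(n)]
        if best is None or crit > best[0]:
            best = (crit, s, A_new, H_new, P_new)
    return best[1], best[2], best[3], best[4]         # s_hat, A(n), H(n), P(n)
```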
7 Simulation Results

We present the results of three different sets of simulations in this section. For simplicity, we assume a uniform linear array of m = 4 sensors. In the first set of simulations, we study the performance of the block algorithms for a block size of N = 100. We consider d = 3 digitally modulated BPSK signals arriving from [10, 16, 25] degrees relative to array broadside. We assume all three signals have equal powers. Starting with A_0 = I_{m \times d}, we first estimate \hat{A} and \hat{S} using ILSP. The estimate \hat{A} is then used to initialize ILSE for improved array response and signal estimates. To ensure that the global minimum is achieved, we check if the residual is close to the noise power level. If this is not the case, we re-initialize ILSP with a random initial guess. This blind estimation process is repeated 10^4 times, each time over a different noise realization. Hence, a total of 10^6 bits are estimated for each signal.

In Figure 2, we show the bit error rates achieved using ILSP with signal to noise ratios ranging from 0 to 6 dB. Although the signals are closely spaced in their directions of arrival (DOAs), this algorithm is successful in separating and estimating the three signals. We observe that the BER's depend strongly on the separation between the DOAs of the signals, since s_2, which is the closest to the other two signals, has the highest BER, and s_1, which is well separated, has the lowest BER. This follows from the fact that using least-squares to estimate the signals (\bar{S} = A^{\dagger} X) causes noise enhancement, especially if the columns of A are nearly dependent due to closely-spaced signals.
In ILSE, however, we use enumeration to estimate the signals. As seen in Figure 3, this algorithm yields significantly lower BER's. We achieve bit error rates lower than 10^{-3} for SNR's greater than 5 dB. In Figure 4, we present the average number of iterations required by both ILSP and ILSE. The number of iterations for ILSP decreases moderately fast with increasing SNR. ILSP requires more iterations than ILSE, but each iteration is computationally cheaper. Both algorithms converge fairly rapidly to the global solution. In this difficult scenario, re-initialization was needed in less than 4% of the runs.

In the next set of simulations, we compare the performance of the blind algorithms to the case where the array response matrix is completely known. The simulation set up is the same as before, except that 2 BPSK signals at \theta = [0, 5] degrees are received at the array. Again, both ILSP and ILSE are used to estimate the digital signals. We see from Equation (22) that in ILSE, we make a joint decision on all d bits in each snapshot s(n), n = 1 \ldots N. Hence, it is convenient to consider the Snapshot Error Rate (SER), which is the probability that s(n) does not equal \hat{s}(n). In Figure 5, the dashed and the solid lines indicate the theoretical SER for the signal detection approach used in ILSP and ILSE, respectively, with a known A. More specifically, we detect the signals in ILSP by obtaining a least-squares estimate of S and projecting onto the discrete alphabet, and in ILSE we enumerate over all possible S matrices. For the ML minimization criterion (17), the SER curve for ILSE represents the lowest SER attainable by any algorithm. The plotted markers indicate the SER performance of the blind algorithms, initialized with A_0 = I_{m \times d}. Again, for less than 0.5% of the runs, we needed to re-initialize with random A_0's. The plot shows that there is virtually no difference in the performance of the blind algorithms as compared to the non-blind approach. A more detailed analysis of these algorithms and their comparison is presented in [30, 31].

In the final set of simulations, we study the performance of the recursive algorithms. We use d training signals, given in Equation (56), to estimate A_0. For simplicity, we choose \lambda = 1. As in the previous experiment, we first consider the simple scenario with 2 BPSK signals arriving from 0 and 5 degrees. At 3 dB SNR, we estimate N = 250 snapshots using RLSE and RPU. We average the results over 10^4 such runs, so that a total of 2.5 x 10^6 bits per signal are estimated. The BER's achieved by the two algorithms are equal: 1.45 x 10^{-2} for each signal, although the number of bits in error is slightly larger for RLSE. The SER for the two algorithms is also 1.45 x 10^{-2}. In Figure 6, we plot the SER using RPU at each symbol period n, n = 1 \ldots N. The SER plot using RLSE is virtually identical for this scenario. We observe that the SER is high initially, since the estimate of A is poor. But as n increases, this estimate improves and the SER decreases. After
n = 120, the SER converges to about 1.14 x 10^{-2}, the theoretical SER for known A, as shown in Figure 5.

Finally, we make the current scenario more difficult by adding another signal at 10 degrees, with 3 dB SNR. Also, we reduce N to 100 to ease the computational burden. The BER's achieved by RLSE and RPU are given in Table 1. We see that RPU yields lower BER's than RLSE.
8 Conclusions

We have presented a blind approach for the separation of synchronous digital signals in a coherent multipath environment. We have shown that given a sufficient number of snapshots, the signal estimates obtained by this approach are unique. The block algorithms, ILSP and ILSE, take advantage of the FA property of digital signals to simultaneously estimate the array response and symbol sequence for each signal. We have proved that the ILSE algorithm converges to a fixed point in a finite number of iterations. A fixed point which is not the global solution can be detected by considering the magnitude of the residual. Hence, we are ensured that the global minimum will be reached by re-initializing one or more times. We may note that despite the difficulties associated with the existence of fixed points that are not global solutions, these algorithms are important in providing a computationally feasible alternative to complete enumeration, which is intractable. Our simulation experiments have shown that in scenarios of practical interest, the converged performance of ILSP and ILSE is virtually the same as predicted in the case where the array response matrix is completely known.

We have also described recursive extensions of ILSP and ILSE, and proposed a new algorithm, RPU, that minimizes the ML criterion recursively. In contrast with block minimization, recursive minimization of the ML criterion is computationally tractable. Our approach can be extended to asynchronous transmission and to multipath channels with large delay spread, as discussed in Section 1. Since our algorithms require that the number of signals be known or accurately estimated, we have proposed in [32] a robust scheme for determining the number of incident signals in a multipath propagation environment. Determining the length of the impulse response associated with each signal is a more difficult problem, and is currently under investigation. Other directions for future work include developing computationally efficient initialization strategies in order to avoid restarts, estimating signals in the presence of non-Gaussian interference, and combining coding schemes with blind estimation.
A Appendix (Identifiability)

A.1 Lemma 1

Let S_{d \times n} be a matrix with \pm 1 elements such that the columns of S include all the n = 2^{d-1} possible distinct d-vectors with the first element normalized to +1. Then, the solution t of the system of equations

    t^T [s_1 s_2 \ldots s_n] = [\pm 1 \pm 1 \ldots \pm 1],

where t^T = [t_d \ldots t_2 t_1], is such that t_i = \pm 1 for some i, and t_j = 0 for j = 1, \ldots, d, j \ne i.

Proof: The proof is by induction. The case d = 1 is trivial. Let us consider the case d = 2.

d = 2: Define \epsilon_{1,1} \equiv t_1. We have the following n = 2 equations

    t_2 + \epsilon_{1,1} = \pm 1    (46)
    t_2 - \epsilon_{1,1} = \pm 1    (47)

There are four possible right hand sides, which we denote as [x_1 x_2]. Then for [x_1 x_2] equal to:

1. [1 -1] or [-1 1]: t_2 = 0 and t_1 = +1 or -1, respectively.
2. [1 1] or [-1 -1]: t_1 = 0 and t_2 = +1 or -1, respectively.

We assume that the above lemma holds for some d = k. Then for d = k + 1, we obtain n = 2^k equations of the following type

    t_{k+1} + \epsilon_{k,1} = \pm 1    (48)
    t_{k+1} - \epsilon_{k,1} = \pm 1    (49)
      \vdots
    t_{k+1} + \epsilon_{k,n/2} = \pm 1    (51)
    t_{k+1} - \epsilon_{k,n/2} = \pm 1    (52)

where \epsilon_{k,j}, j = 1 \ldots 2^{k-1}, is of the form \pm t_k \pm \ldots \pm t_2 \pm t_1. From (48) and (49), four possibilities exist for t_{k+1} and \epsilon_{k,1}:

1. t_{k+1} = 0 and \epsilon_{k,1} = +1 or -1
2. \epsilon_{k,1} = 0 and t_{k+1} = +1 or -1

Now, if t_{k+1} = 0, from (48)-(52) we get the reduced system {\epsilon_{k,1} = \pm 1, \ldots, \epsilon_{k,n/2} = \pm 1}, which by assumption has the solution: t_i = \pm 1 for some i, and t_j = 0 for j = 1, \ldots, d, j \ne i. If t_{k+1} = +1 or -1, we must have {\epsilon_{k,1} = 0, \ldots, \epsilon_{k,n/2} = 0}. The first two equations of this system,

    t_k + \epsilon_{k-1,1} = 0
    t_k - \epsilon_{k-1,1} = 0,

imply t_k = \epsilon_{k-1,1} = 0. Given t_k = 0, we obtain the smaller system {\epsilon_{k-1,1} = 0, \ldots, \epsilon_{k-1,n/4} = 0}. Next, we consider the first two equations of this system to show t_{k-1} = 0, and continuing in this fashion, it is easily seen that t_{k-2} = \ldots = t_1 = 0. Thus, by induction, the lemma holds for all d.
!N
p 1:
(53)
Proof: Our sample space S is the set of all d-vectors with entries in = f1; 3; : : :; (L ? 1)g, S = fx : xk 2 ; k = 1 : : :dg. Let A be the event that all the l = Ld=2 distinct vectors from S are picked in N l independent trials. We are primarily interested in showing that given a xed value of d, the probability p of event A approaches 1 as N increases. For this reason, it suces to compute a lower bound for p. Consider the complement of A. Ac is the event in which at least one of the distinct vectors has not been picked. Let Aci ; i = 1 : : :l be the event where neither a particular +xi nor ?xi has been S picked in N trials. Then Ac = li=1 Aci . Furthermore l l [ X c c P (A ) = P ( Ai ) P (Aci ) i=1 i=1
(54)
since the events Aci are not disjoint. Now, the probability that +xi and ?xi is not picked in the kth trial is one minus the probability +xi or ?xi is picked, ie. (1 ? L2d ). Since the trials are independent, 27
P (Aci ) is (1 ? L2 )N . Then from (54), we get d
P (Ac ) = 1 ? p l(1 ? L2d )N : The result in Equation (53) easily follows. 2
A.3

In this appendix, we show that if the columns of S include a specific set of d + 1 vectors for \Omega = {\pm 1}, then A and S are identifiable up to an ATM. As shown previously, if A is full rank, we need only consider the system of equations

    \tilde{S} = T S    (55)

where \tilde{S}_{d \times N} is an arbitrary matrix of \pm 1 elements, T_{d \times d} is a nonsingular matrix, and S_{d \times N} is the matrix of received signals. We assume S includes, up to a sign factor, the d columns defined by the matrix S_d,

    S_d = [ +1 +1 +1 \ldots +1 ]
          [ +1 -1 +1 \ldots +1 ]
          [ +1 +1 -1 \ldots +1 ]
          [  \vdots             ]
          [ +1 +1 +1 \ldots -1 ],    (56)

and the vector s_{d+1} = [+1 -1 -1 \ldots -1]^T. We normalize the first element of each column of S to +1 and permute the columns, so that S can now be written as

    S = [S_d s_{d+1} S_r],

where S_r is the submatrix of remaining columns. Then, for each row t of T, we need to solve the following system of equations

    t^T [S_d s_{d+1} S_r] = s^T    (57)

where s^T is the corresponding row of \tilde{S}, and can be likewise partitioned as s^T = [s_d^T s_{d+1} s_r^T]. The first d columns of S form a linearly independent set, and thus define t uniquely in terms of the elements of s_d

    t^T = s_d^T S_d^{-1}.    (58)
The inverse of $S_d$ can be computed easily by noting that $S_d = ee^T + 2e_1e_1^T - 2I$, where $e = [1\ 1\ \cdots\ 1]^T$ and $e_1 = [1\ 0\ \cdots\ 0]^T$, and then applying the Sherman-Morrison-Woodbury formula [33]:

$$S_d^{-1} = \frac{1}{2}\begin{bmatrix} 3-d & +1 & +1 & \cdots & +1 \\ +1 & -1 & 0 & \cdots & 0 \\ +1 & 0 & -1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ +1 & 0 & 0 & \cdots & -1 \end{bmatrix}. \qquad (59)$$
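Both the rank-two-correction form of $S_d$ and the closed-form inverse (59) are easy to confirm numerically; the sketch below is our own check, not part of the original derivation.

```python
import numpy as np

for d in range(2, 9):
    e = np.ones((d, 1))
    e1 = np.zeros((d, 1)); e1[0] = 1.0
    Sd = e @ e.T + 2 * e1 @ e1.T - 2 * np.eye(d)   # S_d = e e^T + 2 e1 e1^T - 2I
    # Closed-form inverse (59): first row (3-d, 1, ..., 1)/2,
    # first column (.., 1, ..., 1)/2, and -I/2 in the trailing block.
    Sd_inv = np.zeros((d, d))
    Sd_inv[0, 0] = 3 - d
    Sd_inv[0, 1:] = Sd_inv[1:, 0] = 1.0
    Sd_inv[1:, 1:] = -np.eye(d - 1)
    Sd_inv /= 2
    assert np.allclose(Sd @ Sd_inv, np.eye(d))
print("closed-form inverse (59) verified for d = 2, ..., 8")
```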
Remark: In situations where blind identification is not required, the rows of $S_d$ define a useful set of $d$ training signals, one sent by each of the users, to learn the array response matrix $A$.

Hence, $t$ can be expressed in closed form as

$$t^T = \frac{1}{2}\left[(3-d)\tilde{s}_1 + \tilde{s}_2 + \cdots + \tilde{s}_d \ \ \ \tilde{s}_1 - \tilde{s}_2 \ \ \cdots \ \ \tilde{s}_1 - \tilde{s}_d\right]. \qquad (60)$$
Since $\tilde{s}_d$ is a vector with $\pm 1$ entries, there are a finite number of possibilities for $t$. The question then becomes whether there exists another vector of $\pm 1$'s in $S$ such that only a trivial $t$ is possible. The answer to this question is the vector $s_{d+1}$. We see from (57) that $t$ must satisfy $t^T s_{d+1} = \tilde{s}_{d+1}$, which yields the relation

$$(2-d)\tilde{s}_1 + \tilde{s}_2 + \cdots + \tilde{s}_d = \tilde{s}_{d+1}. \qquad (61)$$

If we let $k$ denote the number of $-1$'s in $\{\tilde{s}_2, \ldots, \tilde{s}_d\}$, then (61) becomes

$$(2-d)\tilde{s}_1 + k(-1) + (d-1-k)(+1) = \tilde{s}_{d+1},$$

and solving for $k$, we get

$$k = \frac{1}{2}\left((2-d)\tilde{s}_1 - \tilde{s}_{d+1} + (d-1)\right). \qquad (62)$$

We consider the four different possibilities for $[\tilde{s}_1\ \tilde{s}_{d+1}]$. In the case that $[\tilde{s}_1\ \tilde{s}_{d+1}]$ equals $+[1\ 1]$ or $-[1\ 1]$, $k$ equals $0$ or $d-1$, respectively. This implies $\tilde{s}_d^T = \pm[1\ 1\ 1\ \cdots\ 1]$, and then from (60), we see that $t^T = \pm[1\ 0\ 0\ \cdots\ 0]$. Similarly, if $[\tilde{s}_1\ \tilde{s}_{d+1}] = \pm[1\ {-1}]$, then $k$ equals $1$ or $d-2$, respectively, so $\tilde{s}_d^T = \pm[1\ \cdots\ 1\ {-1}\ 1\ \cdots\ 1]$ with the single $-1$ in some position $i \in \{2,\ldots,d\}$, and correspondingly $t^T = \pm[0\ \cdots\ 0\ 1\ 0\ \cdots\ 0]$, with the $1$ only in the $i$th position. In all four cases, a trivial solution is obtained. We have shown previously in Section 3 that each row $t$ of $T$ must be distinct. Therefore, $T$ is an ATM. □
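The identifiability argument can also be confirmed by brute force for a small array. The sketch below is our own illustration (the choice $d = 4$ is an arbitrary assumption): it enumerates all $\pm 1$ rows $\tilde{s}$, forms the unique candidate $t$ from (58), and keeps only those that also satisfy the constraint imposed by $s_{d+1}$, cf. (61).

```python
import itertools
import numpy as np

d = 4
# S_d from (56): all +1, with -1 on the diagonal below the (1,1) entry.
Sd = np.ones((d, d)) - 2 * np.diag(np.r_[0.0, np.ones(d - 1)])
Sd_inv = np.linalg.inv(Sd)             # equals the closed form (59)
s_next = np.r_[1.0, -np.ones(d - 1)]   # the extra column s_{d+1} = [+1 -1 ... -1]^T

admissible = []
for s in itertools.product((1.0, -1.0), repeat=d + 1):
    s_d, s_last = np.array(s[:d]), s[d]
    t = s_d @ Sd_inv                   # the unique row t from (58)
    if np.isclose(t @ s_next, s_last): # constraint from s_{d+1}, cf. (61)
        admissible.append(t)

# The only admissible rows are +/- rows of the identity, i.e. T is an ATM.
assert len(admissible) == 2 * d
for t in admissible:
    assert np.isclose(np.abs(t).sum(), 1.0) and np.isclose(np.abs(t).max(), 1.0)
print(f"all {len(admissible)} admissible rows are signed rows of the identity")
```

Exactly $2d$ of the $2^{d+1}$ sign patterns survive the extra constraint, matching the four cases analyzed above.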
References

[1] S. Talwar, M. Viberg, and A. Paulraj. Blind Estimation of Multiple Co-Channel Digital Signals Using an Antenna Array. IEEE Signal Processing Letters, 1(2):29-31, Feb. 1994.
[2] D. Gerlach and A. Paulraj. Adaptive Transmitting Antenna Arrays with Feedback. IEEE Signal Processing Letters, 1(10):150-152, Oct. 1994.
[3] G. Raleigh, S. Diggavi, V. Jones, and A. Paulraj. A Blind Adaptive Transmit Antenna Algorithm for Wireless Communication. In Proc. IEEE ICC (to appear), 1995.
[4] G. Xu and H. Liu. An Effective Transmission Beamforming Scheme for Frequency-Division-Duplex Digital Wireless Communication Systems. In Proc. IEEE ICASSP, pages 1729-1732, 1995.
[5] R. O. Schmidt. A Signal Subspace Approach to Multiple Source Location and Spectral Estimation. PhD thesis, Stanford University, Stanford, CA, 1981.
[6] A. Paulraj, R. Roy, and T. Kailath. Estimation of Signal Parameters via Rotational Invariance Techniques - ESPRIT. In Proc. Nineteenth Asilomar Conf. on Circuits, Systems and Computers, 1985.
[7] B. Ottersten, R. Roy, and T. Kailath. Signal Waveform Estimation in Sensor Array Processing. In Proc. Twenty-Third Asilomar Conf. on Signals, Systems and Computers, pages 787-791, 1989.
[8] S. Andersson, M. Millnert, M. Viberg, and B. Wahlberg. An Adaptive Array for Mobile Communication Systems. IEEE Trans. on Vehicular Technology, 40(1):230-236, 1991.
[9] B. Agee. Blind Separation and Capture of Communication Signals Using a Multitarget Constant Modulus Beamformer. In Proc. IEEE MILCOM, pages 340-346, 1989.
[10] A. Swindlehurst, S. Daas, and J. Yang. Analysis of a Decision Directed Beamformer. Submitted to IEEE Trans. on Signal Processing, Oct. 1993.
[11] R. Gooch and B. Sublett. Joint Spatial and Temporal Equalization in a Decision-Directed Adaptive Antenna System. In Proc. Asilomar Conf., pages 255-259, 1989.
[12] B. G. Agee, A. V. Schell, and W. A. Gardner. Spectral Self-Coherence Restoral: A New Approach to Blind Adaptive Signal Extraction Using Antenna Arrays. Proceedings of the IEEE, 78:753-767, Apr. 1990.
[13] J. Cardoso. Source Separation Using Higher Order Moments. In Proc. IEEE ICASSP, pages 2109-2112, May 1989.
[14] L. Tong, Y. Inouye, and R. Liu. Waveform-Preserving Blind Estimation of Multiple Independent Sources. IEEE Trans. on Signal Processing, 41:2461-2470, July 1993.
[15] H. Liu and G. Xu. A Deterministic Approach to Blind Symbol Estimation. IEEE Signal Processing Letters, 1(12):205-207, Dec. 1994.
[16] H. Liu and G. Xu. Multiuser Blind Channel Estimation and Spatial Channel Pre-Equalization. In Proc. IEEE ICASSP, pages 1756-1759, 1995.
[17] A. J. van der Veen, S. Talwar, and A. Paulraj. Blind Identification of FIR Channels Carrying Multiple Finite Alphabet Signals. In Proc. IEEE ICASSP, vol. 2, pages 1213-1216, 1995.
[18] H. L. Van Trees. Detection, Estimation and Modulation Theory, Volume I. Wiley, New York, 1968.
[19] M. Wax and T. Kailath. Detection of Signals by Information Theoretic Criteria. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-33:387-392, Apr. 1985.
[20] M. Wax and I. Ziskind. On Unique Localization of Multiple Sources by Passive Sensor Arrays. IEEE Trans. Acoust., Speech, Signal Processing, 37:996-1000, July 1989.
[21] N. V. Efimov and E. R. Rozendorn. Linear Algebra and Multi-Dimensional Geometry. MIR Publishers, Moscow, 1975.
[22] G. Golub and V. Pereyra. The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate. SIAM J. Numer. Anal., 10:413-432, 1973.
[23] P. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, San Diego, CA, 1981.
[24] B. Agee. Maximum-Likelihood Approaches to Blind Adaptive Signal Extraction Using Narrowband Antenna Arrays. In Proc. Asilomar Conf., pages 716-720, 1991.
[25] Y. Wang, Y. Pati, Y. Cho, A. Paulraj, and T. Kailath. A Matrix Factorization Approach to Signal Copy of Constant Modulus Signals Arriving at an Antenna Array. In Proc. CISS (Princeton), 1994.
[26] S. Barnett. Matrices: Methods and Applications. Clarendon Press, Oxford, 1990.
[27] B. Juang and L. R. Rabiner. The Segmental K-Means Algorithm for Estimating Parameters of Hidden Markov Models. IEEE Trans. Acoust., Speech, Signal Processing, 38(9):1639-1641, Sept. 1990.
[28] W. I. Zangwill. Nonlinear Programming: A Unified Approach. Prentice-Hall, New Jersey, 1969.
[29] S. S. Haykin. Introduction to Adaptive Filters. Macmillan, New York, 1984.
[30] S. Talwar and A. Paulraj. Performance Analysis of Blind Digital Signal Copy Algorithms. In Proc. MILCOM, volume I, pages 123-128, 1994.
[31] S. Talwar, A. Paulraj, and M. Viberg. Blind Separation of Synchronous Co-Channel Digital Signals Using an Antenna Array. Part II: Performance Analysis. Manuscript in preparation, June 1995.
[32] A. J. van der Veen, S. Talwar, and A. Paulraj. Blind Estimation of Multiple Digital Signals Transmitted over FIR Channels. IEEE Signal Processing Letters, 2(5):99-102, May 1995.
[33] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 1984.
Figure 1: This graph represents a matrix of residuals, wherein the entry at row $i$ and column $j$ corresponds to the value of the residual $\|X - A(i)S(j)\|^2$, $i, j = 1,\ldots,64$. Shown are paths of the ILSE algorithm that lead to global minima (solid lines), and to other fixed points (dashed lines).
Figure 2: Bit Error Rate for ILSP (BER vs. SNR in dB, one curve per signal $s_1$, $s_2$, $s_3$).
Figure 3: Bit Error Rate for ILSE (BER vs. SNR in dB, one curve per signal $s_1$, $s_2$, $s_3$).
Figure 4: Number of Iterations for ILSP and ILSE (iteration count vs. SNR in dB).
Figure 5: Snapshot Error Rate for ILSP and ILSE (SER vs. SNR in dB).
Figure 6: Snapshot Error Rate Profile for RPU (SER vs. symbol period $n$, $n = 1,\ldots,250$; the level $1.14 \times 10^{-2}$ is marked on the SER axis).
Alg.    $s_1$                  $s_2$                  $s_3$
RLSE    $2.88 \times 10^{-2}$  $5.26 \times 10^{-2}$  $2.59 \times 10^{-2}$
RPU     $2.66 \times 10^{-2}$  $4.92 \times 10^{-2}$  $2.39 \times 10^{-2}$

Table 1: Bit Error Rate for RLSE and RPU