
Optimized Signaling for MIMO Interference Systems with Feedback

Sigen Ye, Student Member, IEEE, and Rick S. Blum, Senior Member, IEEE

This material is based on research supported by the Air Force Research Laboratory under agreement No. F49620-01-1-0372 and by the National Science Foundation under Grant No. CCR-0112501. The authors are with the Electrical and Computer Engineering Department, Lehigh University, Bethlehem, PA 18015 USA (e-mail: [email protected]; [email protected]).

April 26, 2003

DRAFT


Abstract

The system mutual information of a multiple-input multiple-output (MIMO) system with multiple mutually interfering users is considered. Perfect channel state information is assumed to be known to both transmitters and receivers. Asymptotic analysis shows that the system mutual information changes behavior as the interference becomes sufficiently strong. In particular, beamforming is the optimum signaling for all users when the interference is large. We propose several numerical approaches for choosing the covariance matrices of the transmitted signals and compare their performance in terms of the system mutual information. We model the system as a noncooperative game and perform iterative water-filling to find the Nash equilibrium distributively. A centralized global approach and a distributed iterative approach, both based on the gradient projection method, are also proposed. Numerical results show that all proposed approaches outperform the standard signaling which is optimum for the interference-free case. Both the global and the iterative gradient projection methods are shown to outperform the Nash equilibrium significantly.

Index Terms

MIMO, mutual information, cochannel interference.

I. INTRODUCTION

In recent years, multiple-input multiple-output (MIMO) systems have attracted great attention [1]-[6]. MIMO systems have shown great potential for providing high spectral efficiency in isolated, single-user wireless links without interference [1][7][8]. Recently there has been research devoted to MIMO broadcast channels [6] and MIMO multiple access channels [9], where multiple users mutually interfere but share either the same transmitter or the same receiver. However, much less effort has been devoted to the scenario where interfering users have different transmitters and receivers. It was first shown in [2] that cochannel interference from adjacent cells in MIMO cellular systems can degrade the overall system performance significantly. We consider a MIMO interference system, in which multiple MIMO users mutually interfere, the transmitters send independent data streams, and the receivers decode independently. This scenario arises, for example, in MIMO cellular systems, where the user in one cell suffers from cochannel interference from the users in other cells, or in MIMO ad-hoc networks, where each transceiver pair suffers from interference from the other transceiver pairs operating in the same frequency band. A similar single-antenna system is usually referred to as an interference channel. The capacity of the interference channel is still theoretically unsolved, even for the simplest two-user additive white Gaussian interference channel. Although the optimum transmission scheme probably involves multiuser detection techniques, most work still concentrates on single-user detection. A distributed power control scheme involving iterative water-filling was proposed [10] for the digital subscriber line (DSL) interference channel. An iterative power control algorithm was proposed [11] for a MIMO interference system with feedback. The capacity of MIMO interference systems without feedback was treated in [12], where it was proved that putting all power into one antenna is the optimum signaling when the interference is sufficiently strong. Recently, multiuser detection techniques were employed [5] in interference-limited MIMO cellular systems to


improve the performance. In this paper, we focus on the optimum signaling for MIMO interference systems with feedback over flat Rayleigh fading channels. Both the transmitters and receivers are assumed to have perfect knowledge of the channel state information. We also assume each receiver performs single-user detection, i.e., no multiuser detection techniques are employed. Section II introduces the system model and formulates the optimum signaling problem. In Section III, the asymptotic behavior of the system mutual information is investigated analytically. In Section IV, several approaches for choosing the signaling are proposed. Simulation results are given and discussed in Section V. Section VI gives the conclusions.

II. PROBLEM FORMULATION

We first introduce the notation used in this paper. We use boldface to denote matrices and vectors. $(\cdot)^*$ denotes the conjugate. For a matrix $\mathbf{A}$, $\mathbf{A}^T$ denotes the transpose, $\mathbf{A}^\dagger$ denotes the conjugate transpose, and $\mathrm{tr}(\mathbf{A})$ denotes the trace. For a matrix $\mathbf{S} = [s_{ij}]$, $\mathbf{S}^+ = [\max(s_{ij}, 0)]$. Let $\mathbf{I}$ denote the identity matrix and $\mathbf{O}$ denote the zero matrix.

We consider a MIMO interference system with $L$ users, where each user employs $N_t$ transmit antennas and $N_r$ receive antennas, and suffers from cochannel interference from the other $L-1$ users. The transmitters of the $L$ users are sending independent data, and the receivers are performing independent decoding with single-user detection. The received complex baseband signal vector ($N_r \times 1$) of user

$\ell$ is given by
$$\mathbf{y}_\ell = \sqrt{\rho_\ell}\,\mathbf{H}_{\ell,\ell}\mathbf{x}_\ell + \sum_{j=1,\,j\neq\ell}^{L} \sqrt{\eta_{\ell,j}}\,\mathbf{H}_{\ell,j}\mathbf{x}_j + \mathbf{n}_\ell, \qquad (1)$$
where $\mathbf{H}_{\ell,j}$ ($N_r \times N_t$) denotes the channel matrix between the receive antennas of user $\ell$ and the transmit antennas of user $j$, and $\mathbf{n}_\ell$ ($N_r \times 1$) is the noise vector. Both $\mathbf{H}_{\ell,j}$ and $\mathbf{n}_\ell$ are assumed to have independent and identically distributed complex Gaussian entries with zero mean and unit variance. The transmitted signal of user $\ell$ is denoted by $\mathbf{x}_\ell$ ($N_t \times 1$). For simplicity, we assume all of the interfering signals $\mathbf{x}_j$, $j = 1,\dots,L$, $j \neq \ell$, are unknown to the receiver, and we model each of them as Gaussian distributed, the usual form of the optimum signal in MIMO problems. This model is well suited to the case where each user chooses his signaling without knowing the exact interference environment he will face. We normalize $\mathbf{x}_j$ such that the covariance matrix $\mathbf{Q}_j = E\{\mathbf{x}_j\mathbf{x}_j^\dagger\}$ satisfies $\mathrm{tr}(\mathbf{Q}_j) = 1$ for $j = 1,\dots,L$. The signal-to-noise ratio (SNR) of user $\ell$ is $\rho_\ell$, and $\eta_{\ell,j}$ is the interference-to-noise ratio (INR) of the interference which is generated by user $j$ and received by user $\ell$'s receiver.

Given $\mathbf{H}_{1,1},\dots,\mathbf{H}_{L,L}$, the interference-plus-noise of user $\ell$ is $\sum_{j=1,\,j\neq\ell}^{L} \sqrt{\eta_{\ell,j}}\,\mathbf{H}_{\ell,j}\mathbf{x}_j + \mathbf{n}_\ell$, with covariance matrix $\mathbf{R}_\ell = \mathbf{I} + \sum_{j=1,\,j\neq\ell}^{L} \eta_{\ell,j}\mathbf{H}_{\ell,j}\mathbf{Q}_j\mathbf{H}_{\ell,j}^\dagger$. Using the assumptions outlined, the mutual information of user $\ell$ conditioned on $\mathbf{H}_{1,1},\dots,\mathbf{H}_{L,L}$ is obtained [13] as
$$I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L) = \log_2 \det\left(\mathbf{I} + \rho_\ell \mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger \mathbf{R}_\ell^{-1}\right). \qquad (2)$$

Similarly, the mutual information of the overall $L$-user system conditioned on $\mathbf{H}_{1,1},\dots,\mathbf{H}_{L,L}$ is
$$I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) = \sum_{\ell=1}^{L} \log_2 \det\left(\mathbf{I} + \rho_\ell \mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger \mathbf{R}_\ell^{-1}\right) \qquad (3)$$
$$= \sum_{\ell=1}^{L} \left\{ \log_2 \det\left(\rho_\ell \mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger + \mathbf{R}_\ell\right) - \log_2 \det \mathbf{R}_\ell \right\}. \qquad (4)$$
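The quantities in (3)-(4) are straightforward to evaluate numerically. The following sketch (our own NumPy illustration with hypothetical argument names, not code from the paper) computes the interference-plus-noise covariances $\mathbf{R}_\ell$ and the system mutual information using the form (4):

```python
import numpy as np

def system_mutual_info(H, Q, rho, eta):
    """Evaluate eq. (3): system mutual information of an L-user MIMO
    interference system.

    H[l][j] : (Nr x Nt) channel from transmitter j to receiver l
    Q[j]    : (Nt x Nt) transmit covariance of user j, tr(Q[j]) = 1
    rho[l]  : SNR of user l;  eta[l][j] : INR from user j at receiver l
    """
    L = len(Q)
    Nr = H[0][0].shape[0]
    total = 0.0
    for l in range(L):
        # Interference-plus-noise covariance R_l = I + sum_j eta H Q H^dagger
        R = np.eye(Nr, dtype=complex)
        for j in range(L):
            if j != l:
                R += eta[l][j] * H[l][j] @ Q[j] @ H[l][j].conj().T
        S = rho[l] * H[l][l] @ Q[l] @ H[l][l].conj().T
        # log2 det(I + S R^{-1}) evaluated in the equivalent form (4)
        total += np.log2(np.linalg.det(S + R).real) - np.log2(np.linalg.det(R).real)
    return total
```

Setting all INRs to zero reduces each term to the familiar single-user expression $\log_2\det(\mathbf{I} + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger)$.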

We note that to compute the mutual information we compute $E\{I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)\}$ or $E\{I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)\}$ by averaging over the random channel matrices $\mathbf{H}_{1,1},\dots,\mathbf{H}_{L,L}$. However, since the transmitter is assumed to have perfect channel knowledge, we can maximize $E\{I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)\}$ or $E\{I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)\}$ by maximizing $I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ or $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ for the given value of $\mathbf{H}_{1,1},\dots,\mathbf{H}_{L,L}$. Thus we obtain signaling matrices that depend on the channel matrices, as one would expect in cases with feedback.

If we consider a single-user MIMO link for user $\ell$, the optimum signaling problem is to find the optimum $\mathbf{Q}_\ell$ to maximize $I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ given the interference from the other users. It is well known that for a single-user MIMO link without interference, the capacity is achieved by water-filling if the transmitter has the channel knowledge [7]. This result can be easily generalized to a single-user MIMO link with a fixed interference covariance matrix [14]. However, in this paper, we are interested in the $L$-user system performance $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ rather than the single-user performance $I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$. Let
$$\Omega_1^{PSD} = \{\mathbf{Q} \in \mathbb{C}^{N\times N} \mid \mathbf{Q} \text{ is positive semi-definite and } \mathrm{tr}(\mathbf{Q}) = 1\},$$
which is the convex set of positive semi-definite (PSD) matrices with unit trace. We study the optimum signaling problem of the $L$-user system, where the goal is to find the optimum $\mathbf{Q}_1,\dots,\mathbf{Q}_L \in \Omega_1^{PSD}$ which maximize $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ for known $\mathbf{H}_{1,1},\dots,\mathbf{H}_{L,L}$.

III. ASYMPTOTIC BEHAVIOR

In the MIMO interference system, there are multiple users which mutually interfere. If one user updates his covariance matrix, this will impact his own mutual information as well as the mutual information of the other users. Generally speaking, the system capacity in (3) is not a convex or concave function with respect to $\mathbf{Q}_1,\dots,\mathbf{Q}_L$. Therefore, the optimization problem is difficult to solve analytically, or even numerically, due to its complexity and non-convexity. In this section, we investigate the convexity and concavity of the function for two asymptotic cases: the case when the INR is extremely small and the case when the INR is extremely large.

"A function is convex if and only if it is convex when restricted to any line that intersects its domain." [15]. Given an arbitrary function $f(\mathbf{x})$ and two different feasible points $\mathbf{x}_1$ and $\mathbf{x}_2$, define $g(t) = f(t\mathbf{x}_1 + (1-t)\mathbf{x}_2)$, $0 \le t \le 1$. Then $f(\mathbf{x})$ is a convex (concave) function of $\mathbf{x}$ if and only if $g(t)$ is a convex (concave) function of $t$ for any feasible $\mathbf{x}_1$ and $\mathbf{x}_2$, which is equivalent to $\frac{d^2 g(t)}{dt^2} \ge 0$ ($\le 0$) for $0 \le t \le 1$. Based on this property, we have the following theorems.

A. Small Interference

Theorem 1: $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ in (3) is a concave function of $\mathbf{Q}_1,\dots,\mathbf{Q}_L$ when the INR is sufficiently small.

Proof: We consider the convex combination of two different feasible solutions $(\mathbf{X}_1,\dots,\mathbf{X}_L)$ and $(\mathbf{Z}_1,\dots,\mathbf{Z}_L)$ ($\mathbf{X}_\ell \in \Omega_1^{PSD}$ and $\mathbf{Z}_\ell \in \Omega_1^{PSD}$ for $\ell = 1,\dots,L$), which is
$$(\mathbf{Q}_1,\dots,\mathbf{Q}_L) = t(\mathbf{Z}_1,\dots,\mathbf{Z}_L) + (1-t)(\mathbf{X}_1,\dots,\mathbf{X}_L) = (\mathbf{X}_1,\dots,\mathbf{X}_L) + t(\mathbf{Y}_1,\dots,\mathbf{Y}_L), \qquad (5)$$
where $(\mathbf{Y}_1,\dots,\mathbf{Y}_L) = (\mathbf{Z}_1,\dots,\mathbf{Z}_L) - (\mathbf{X}_1,\dots,\mathbf{X}_L)$ and $0 \le t \le 1$ (note that $\mathbf{Y}_\ell \notin \Omega_1^{PSD}$, but $\mathbf{Y}_\ell$ is Hermitian). Clearly $\mathbf{Q}_\ell \in \Omega_1^{PSD}$, $\ell = 1,\dots,L$. Then $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ is a concave function of $\mathbf{Q}_1,\dots,\mathbf{Q}_L$ if and only if $\frac{d^2}{dt^2}I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) \le 0$ for any feasible $(\mathbf{X}_1,\dots,\mathbf{X}_L)$, $(\mathbf{Z}_1,\dots,\mathbf{Z}_L)$, $0 \le t \le 1$.

We first introduce some useful results from matrix differential calculus, which will be used extensively. Given a matrix function $\mathbf{A}(x)$ with scalar parameter $x$, we have [16]
$$\frac{d}{dx}\mathrm{tr}(\mathbf{A}(x)) = \mathrm{tr}\left(\frac{d\mathbf{A}(x)}{dx}\right) \qquad (6)$$
$$\frac{d}{dx}\ln\det\mathbf{A}(x) = \mathrm{tr}\left(\mathbf{A}(x)^{-1}\frac{d\mathbf{A}(x)}{dx}\right) \qquad (7)$$
$$\frac{d}{dx}\mathbf{A}(x)^{-1} = -\mathbf{A}(x)^{-1}\frac{d\mathbf{A}(x)}{dx}\mathbf{A}(x)^{-1} \qquad (8)$$

By applying (6)-(8) to calculate the derivative of (4), we derive
$$\frac{d}{dt}I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) = \frac{1}{\ln 2}\sum_{\ell=1}^{L}\mathrm{tr}\left[(\mathbf{R}_\ell + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger)^{-1}\left(\frac{d\mathbf{R}_\ell}{dt} + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Y}_\ell\mathbf{H}_{\ell,\ell}^\dagger\right) - \mathbf{R}_\ell^{-1}\frac{d\mathbf{R}_\ell}{dt}\right], \qquad (9)$$
where $\frac{d\mathbf{R}_\ell}{dt} = \sum_{j=1,\,j\neq\ell}^{L}\eta_{\ell,j}\mathbf{H}_{\ell,j}\mathbf{Y}_j\mathbf{H}_{\ell,j}^\dagger$ does not depend on $t$. Further,
$$\frac{d^2}{dt^2}I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) = \frac{1}{\ln 2}\sum_{\ell=1}^{L}\mathrm{tr}\left[-(\mathbf{R}_\ell + \mathbf{M}_\ell)^{-1}\left(\frac{d\mathbf{R}_\ell}{dt} + \mathbf{N}_\ell\right)(\mathbf{R}_\ell + \mathbf{M}_\ell)^{-1}\left(\frac{d\mathbf{R}_\ell}{dt} + \mathbf{N}_\ell\right) + \mathbf{R}_\ell^{-1}\frac{d\mathbf{R}_\ell}{dt}\mathbf{R}_\ell^{-1}\frac{d\mathbf{R}_\ell}{dt}\right], \qquad (10)$$
where $\mathbf{M}_\ell = \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger$ and $\mathbf{N}_\ell = \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Y}_\ell\mathbf{H}_{\ell,\ell}^\dagger$. As the INR $\eta_{\ell,j}$ becomes sufficiently small, roughly speaking, $\mathbf{R}_\ell$ approaches $\mathbf{I}$, $\frac{d\mathbf{R}_\ell}{dt}$ is proportional to $\eta_{\ell,j}$, and $\frac{d\mathbf{R}_\ell}{dt} + \mathbf{N}_\ell$ is dominated by $\mathbf{N}_\ell$. In other words, the second term inside the trace in (10) is proportional to $\eta_{\ell,j}^2$, while $\eta_{\ell,j}$ has little impact on the first term. Therefore, the second term can be ignored, and we obtain
$$\frac{d^2}{dt^2}I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) \approx -\frac{1}{\ln 2}\sum_{\ell=1}^{L}\mathrm{tr}\left[(\mathbf{R}_\ell + \mathbf{M}_\ell)^{-1}\left(\frac{d\mathbf{R}_\ell}{dt} + \mathbf{N}_\ell\right)(\mathbf{R}_\ell + \mathbf{M}_\ell)^{-1}\left(\frac{d\mathbf{R}_\ell}{dt} + \mathbf{N}_\ell\right)\right]. \qquad (11)$$

Consider a single term in the summation in (11). Let $\mathbf{A}_\ell = (\mathbf{R}_\ell + \mathbf{M}_\ell)^{-1}$ and $\mathbf{B}_\ell = \frac{d\mathbf{R}_\ell}{dt} + \mathbf{N}_\ell$. Since $\mathbf{A}_\ell$ is PSD, there exists a matrix $\mathbf{C}_\ell$ such that $\mathbf{A}_\ell = \mathbf{C}_\ell\mathbf{C}_\ell^\dagger$. Thus $\mathrm{tr}(\mathbf{A}_\ell\mathbf{B}_\ell\mathbf{A}_\ell\mathbf{B}_\ell) = \mathrm{tr}(\mathbf{C}_\ell\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell\mathbf{C}_\ell^\dagger\mathbf{B}_\ell) = \mathrm{tr}(\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell) = \mathrm{tr}\left((\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell)(\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell)^\dagger\right) \ge 0$, where we utilize the fact that $\mathbf{B}_\ell$ is Hermitian and $(\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell)(\mathbf{C}_\ell^\dagger\mathbf{B}_\ell\mathbf{C}_\ell)^\dagger$ is PSD. Therefore, when $\eta_{\ell,j}$ is sufficiently small, $\frac{d^2}{dt^2}I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) \le 0$, which means that $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ is concave.
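The small-INR concavity established above is easy to probe numerically. The sketch below (our own illustration, not from the paper) evaluates the system mutual information of (3) along the line segment (5) between two feasible points for a weak-interference instance and checks the midpoint concavity inequality $g(1/2) \ge \tfrac{1}{2}(g(0) + g(1))$:

```python
import numpy as np

def sys_mi(H, Q, rho, eta):
    """System mutual information, eq. (3), with common scalar SNR/INR."""
    L, Nr = len(Q), H[0][0].shape[0]
    total = 0.0
    for l in range(L):
        R = np.eye(Nr, dtype=complex)
        for j in range(L):
            if j != l:
                R += eta * H[l][j] @ Q[j] @ H[l][j].conj().T
        S = rho * H[l][l] @ Q[l] @ H[l][l].conj().T
        total += np.log2(np.linalg.det(S + R).real / np.linalg.det(R).real)
    return total

rng = np.random.default_rng(0)
L, N = 2, 2
H = [[(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
      for _ in range(L)] for _ in range(L)]
X = [np.eye(N, dtype=complex) / N for _ in range(L)]         # uniform power
Z = [np.diag([1.0, 0.0]).astype(complex) for _ in range(L)]  # a 1-stream point
g = lambda t: sys_mi(H, [t * z + (1 - t) * x for x, z in zip(X, Z)],
                     rho=1.0, eta=0.01)
# Midpoint concavity check along the segment (5), weak interference
assert g(0.5) >= 0.5 * (g(0.0) + g(1.0))
```

The same experiment with a large INR typically violates the inequality in the coordinate of one user, consistent with Theorem 2 below.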

Remark: Consider the extreme case without interference ($\eta_{\ell,j} = 0$), where the users become independent and the system mutual information decouples. By using the well-known fact that the single-user MIMO capacity is a concave function, we know that the system mutual information is then also concave. Theorem 1 essentially extends this concavity to cases with sufficiently small interference.

B. Large Interference

Theorem 2: $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ in (3) is a convex function of $\mathbf{Q}_i$ for fixed $\mathbf{Q}_1,\dots,\mathbf{Q}_{i-1},\mathbf{Q}_{i+1},\dots,\mathbf{Q}_L$, for any

$i = 1,\dots,L$, when the INR is sufficiently large.

Proof: Without loss of generality, we set $i = 1$. Define $I_s(\mathbf{Q}_1) = I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ for any fixed $\mathbf{Q}_2,\dots,\mathbf{Q}_L$. By investigating the convexity of $I_s(\mathbf{Q}_1)$ restricted to any line, we can prove that $I_s(\mathbf{Q}_1)$ is a convex function of $\mathbf{Q}_1$. Consider
$$\mathbf{Q}_1 = t\mathbf{Z}_1 + (1-t)\mathbf{X}_1 = \mathbf{X}_1 + t\mathbf{Y}_1, \qquad (12)$$
where $\mathbf{X}_1, \mathbf{Z}_1 \in \Omega_1^{PSD}$ and $\mathbf{Y}_1 = \mathbf{Z}_1 - \mathbf{X}_1$ is Hermitian. From (3), we have
$$I_s(\mathbf{Q}_1) = \log_2\det\left\{\mathbf{I} + \rho_1\mathbf{H}_{1,1}\mathbf{Q}_1\mathbf{H}_{1,1}^\dagger\mathbf{R}_1^{-1}\right\} + \sum_{\ell=2}^{L}\log_2\det\left\{\mathbf{I} + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger\mathbf{R}_\ell^{-1}\right\}. \qquad (13)$$

We define $\mathbf{A}_\ell = \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger$ and note that $\mathbf{R}_1$ and $\mathbf{A}_\ell$, $\ell = 2,\dots,L$, do not depend on $t$. By applying (7) and (8), we derive
$$\frac{dI_s(\mathbf{Q}_1)}{dt} = \frac{1}{\ln 2}\,\mathrm{tr}\left[(\mathbf{I} + \mathbf{A}_1\mathbf{R}_1^{-1})^{-1}\rho_1\mathbf{H}_{1,1}\mathbf{Y}_1\mathbf{H}_{1,1}^\dagger\mathbf{R}_1^{-1}\right] - \frac{1}{\ln 2}\sum_{\ell=2}^{L}\mathrm{tr}\left[(\mathbf{I} + \mathbf{A}_\ell\mathbf{R}_\ell^{-1})^{-1}\mathbf{A}_\ell\mathbf{R}_\ell^{-1}\,\eta_{\ell,1}\mathbf{H}_{\ell,1}\mathbf{Y}_1\mathbf{H}_{\ell,1}^\dagger\mathbf{R}_\ell^{-1}\right]$$
$$= \frac{1}{\ln 2}\,\mathrm{tr}\left[(\mathbf{R}_1 + \mathbf{A}_1)^{-1}\rho_1\mathbf{B}_1\right] - \frac{1}{\ln 2}\sum_{\ell=2}^{L}\eta_{\ell,1}\,\mathrm{tr}\left[(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{A}_\ell\mathbf{R}_\ell^{-1}\mathbf{B}_\ell\right], \qquad (14)$$
where $\mathbf{B}_\ell = \mathbf{H}_{\ell,1}\mathbf{Y}_1\mathbf{H}_{\ell,1}^\dagger$, $\ell = 1,\dots,L$, is Hermitian. Further,
$$\frac{d^2 I_s(\mathbf{Q}_1)}{dt^2} = -\frac{\rho_1^2}{\ln 2}\,\mathrm{tr}\left[(\mathbf{R}_1 + \mathbf{A}_1)^{-1}\mathbf{B}_1(\mathbf{R}_1 + \mathbf{A}_1)^{-1}\mathbf{B}_1\right] + \frac{1}{\ln 2}\sum_{\ell=2}^{L}\eta_{\ell,1}^2\,\mathrm{tr}\left[(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{B}_\ell(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{A}_\ell\mathbf{R}_\ell^{-1}\mathbf{B}_\ell + (\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{A}_\ell\mathbf{R}_\ell^{-1}\mathbf{B}_\ell\mathbf{R}_\ell^{-1}\mathbf{B}_\ell\right]. \qquad (15)$$

For a sufficiently large INR, $\mathbf{R}_\ell$ is roughly proportional to the INR, which means that $(\mathbf{R}_1 + \mathbf{A}_1)^{-1}$, $(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}$ and $\mathbf{R}_\ell^{-1}$ are roughly proportional to its inverse. Therefore, the first term in (15) can be ignored, since it decays as the inverse square of the INR, while the second term in (15) is of order one in the INR. We define $\mathbf{C}_\ell = (\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{A}_\ell\mathbf{R}_\ell^{-1} = \mathbf{R}_\ell^{-1} - (\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}$ to give
$$\frac{d^2 I_s(\mathbf{Q}_1)}{dt^2} \approx \frac{1}{\ln 2}\sum_{\ell=2}^{L}\eta_{\ell,1}^2\left\{\mathrm{tr}\left[(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{B}_\ell\mathbf{C}_\ell\mathbf{B}_\ell\right] + \mathrm{tr}\left[\mathbf{C}_\ell\mathbf{B}_\ell\mathbf{R}_\ell^{-1}\mathbf{B}_\ell\right]\right\}. \qquad (16)$$

Since $(\mathbf{R}_\ell + \mathbf{A}_\ell) - \mathbf{R}_\ell = \mathbf{A}_\ell$ is PSD, we have $\mathbf{R}_\ell + \mathbf{A}_\ell \succeq \mathbf{R}_\ell$ according to the definition of the PSD matrix ordering [17]. Then $(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1} \preceq \mathbf{R}_\ell^{-1}$ [17], which means that $\mathbf{C}_\ell = \mathbf{R}_\ell^{-1} - (\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}$ is PSD. Consider the first trace term in (16). We know that $(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}$ and $\mathbf{C}_\ell$ are PSD, so there exist matrices $\mathbf{D}_\ell$ and $\mathbf{E}_\ell$ such that $(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1} = \mathbf{D}_\ell\mathbf{D}_\ell^\dagger$ and $\mathbf{C}_\ell = \mathbf{E}_\ell\mathbf{E}_\ell^\dagger$. Therefore, $\mathrm{tr}\left[(\mathbf{R}_\ell + \mathbf{A}_\ell)^{-1}\mathbf{B}_\ell\mathbf{C}_\ell\mathbf{B}_\ell\right] = \mathrm{tr}\left[\mathbf{D}_\ell\mathbf{D}_\ell^\dagger\mathbf{B}_\ell\mathbf{E}_\ell\mathbf{E}_\ell^\dagger\mathbf{B}_\ell\right] = \mathrm{tr}\left[(\mathbf{D}_\ell^\dagger\mathbf{B}_\ell\mathbf{E}_\ell)(\mathbf{D}_\ell^\dagger\mathbf{B}_\ell\mathbf{E}_\ell)^\dagger\right] \ge 0$, because $(\mathbf{D}_\ell^\dagger\mathbf{B}_\ell\mathbf{E}_\ell)(\mathbf{D}_\ell^\dagger\mathbf{B}_\ell\mathbf{E}_\ell)^\dagger$ is PSD. Similarly, we have $\mathrm{tr}\left[\mathbf{C}_\ell\mathbf{B}_\ell\mathbf{R}_\ell^{-1}\mathbf{B}_\ell\right] \ge 0$. Therefore, $\frac{d^2 I_s(\mathbf{Q}_1)}{dt^2} \ge 0$ when $\eta_{\ell,1}$ is very large, which means that $I_s(\mathbf{Q}_1)$ is a convex function of $\mathbf{Q}_1$.

Theorem 3: When the INR is sufficiently large, at least one of the optimum solutions employs 1-stream signaling for all users. (Here the number of streams is defined as the number of non-zero eigenvalues of the transmit covariance matrix.)

Proof: Let $\mathbf{Q}_1,\dots,\mathbf{Q}_L$ be any feasible solution. First we concentrate on user 1. By eigenvalue decomposition, we obtain $\mathbf{Q}_1 = \sum_{i=1}^{N_t}\lambda_i\mathbf{v}_i\mathbf{v}_i^\dagger = \sum_{i=1}^{N_t}\lambda_i\mathbf{Q}_1^{(i)}$, where $\mathbf{Q}_1^{(i)} = \mathbf{v}_i\mathbf{v}_i^\dagger \in \Omega_1^{PSD}$ is a feasible 1-stream covariance matrix. According to Theorem 2, when $\eta_{\ell,j}$ becomes sufficiently large, $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ is a convex function of $\mathbf{Q}_1$ for fixed $\mathbf{Q}_2,\dots,\mathbf{Q}_L$. Therefore,
$$I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) \le \sum_{i=1}^{N_t}\lambda_i I(\mathbf{Q}_1^{(i)},\dots,\mathbf{Q}_L) \le \max_i\left\{I(\mathbf{Q}_1^{(i)},\dots,\mathbf{Q}_L)\right\}, \qquad (17)$$
where we employ the fact that $\sum_{i=1}^{N_t}\lambda_i = 1$. This means that for any $\mathbf{Q}_1$, we can find a 1-stream signaling approach for user 1 which gives performance no worse than that of $\mathbf{Q}_1$. By applying the same argument to all users, we conclude that we can always find a 1-stream approach which gives performance as good as or better than that of $\mathbf{Q}_1,\dots,\mathbf{Q}_L$. Therefore, at least one of the optimum solutions employs 1-stream signaling for all $L$ users. In other words, when the interference is sufficiently large, one optimum solution forces all users to perform beamforming along a certain direction, no matter how many transmit antennas they have.

IV. DIFFERENT SIGNALING APPROACHES

In the previous section, we discussed some properties of the system mutual information in two extreme cases. However, analytically solving for the optimum covariance matrices in general cases is very difficult. Therefore, in this section we turn to numerical methods to address the problem. We will introduce several specific numerical algorithms to determine the covariance matrices of the transmitted signals and compare their performance and complexity.


A. Independent Water-Filling

The first approach, which we call "independent water-filling", employs the signaling which is optimum for the no-interference case. The transmitter of each user always pretends that there is no interference from the other users and performs traditional water-filling [7] according to its own channel state information. To be more specific, user $\ell$ tries to maximize $I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ in (2) assuming $\eta_{\ell,j} = 0$ and thus $\mathbf{R}_\ell = \mathbf{I}$. Suppose $\rho_\ell\mathbf{H}_{\ell,\ell}^\dagger\mathbf{H}_{\ell,\ell} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\dagger$ by eigenvalue decomposition, where $\mathbf{U}$ is unitary and $\boldsymbol{\Lambda}$ is diagonal. The covariance matrix user $\ell$ employs is $\mathbf{Q}_\ell = \mathbf{U}(\mu\mathbf{I} - \boldsymbol{\Lambda}^{-1})^+\mathbf{U}^\dagger$, where $\mu$ is chosen so that $\mathrm{tr}\left((\mu\mathbf{I} - \boldsymbol{\Lambda}^{-1})^+\right) = 1$. Therefore, each user decides its covariance matrix locally and independently in a single step. The only information required by the transmitter of user $\ell$ is its own channel state information $\mathbf{H}_{\ell,\ell}$, and the complexity is very low. Since this approach achieves optimum performance when there is no interference, we expect it to give performance close to optimum when the interference is weak. However, we do not expect it to perform well when strong interference is present.

B. Nash Equilibrium

If we assume cooperation between users is not feasible, the problem can be modeled as a noncooperative game, where each user is concerned only with his own mutual information rather than the system mutual information. To be more specific, each user is a player, and user $\ell$ tries to determine the signaling strategy $\mathbf{Q}_\ell$ to maximize his payoff $I_\ell(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ (the mutual information of a single link) without cooperating with the other users.
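Before developing this game further, the water-filling computation used by the independent approach of Section IV-A can be sketched as follows. This is a minimal illustration under our own naming, not the authors' code; the bisection search for the water level $\mu$ is one of several standard ways to enforce the unit-trace constraint.

```python
import numpy as np

def water_filling(rho, H):
    """Sketch of the single-user water-filling step of Section IV-A:
    maximize log2 det(I + rho H Q H^dagger) s.t. tr(Q) = 1, Q PSD.

    Returns Q = U (mu I - Lambda^{-1})^+ U^dagger, with the water level mu
    found by bisection so that the retained eigenvalues sum to one.
    """
    # Eigen-decomposition of rho H^dagger H = U Lambda U^dagger
    lam, U = np.linalg.eigh(rho * H.conj().T @ H)
    lam = np.maximum(lam, 1e-12)          # guard against zero modes
    inv = 1.0 / lam
    # Bisection on mu: allocated power sum max(mu - 1/lambda_i, 0) is increasing
    lo, hi = 0.0, inv.max() + 1.0
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - inv, 0.0).sum() > 1.0:
            hi = mu
        else:
            lo = mu
    p = np.maximum(0.5 * (lo + hi) - inv, 0.0)
    return U @ np.diag(p) @ U.conj().T
```

The same routine, applied to the whitened channel $\mathbf{R}_\ell^{-1/2}\mathbf{H}_{\ell,\ell}$ (with $\mathbf{R}_\ell^{-1/2}$ formed from the eigendecomposition of $\mathbf{R}_\ell$), yields the generalized water-filling best response used by the Nash approach below.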

L

The non-cooperative game theory was also applied to some other problems where there are multiple users, including power control for wireless networks [18] [19] and DSLs [10]. The Nash equilibrium of a non-cooperative game is defined as a strategy profile in which no player could increase his expected payoff by unilaterally deviating from this profile [20]. In other words, each player’s strategy is a best response to other players’ strategies in terms of his payoff. At the Nash equilibrium of our MIMO interference system game, user ` should employ the covariance matrix which maximizes

I (Q1 ;    ; Q ) `

L

given the covariance matrices of the other users. That is, user

` should

perform generalized water-filling when taking the interference from other users into account [14]. Mathematically, suppose

 Hy `

R H = WDWy , with W unitary and D diagonal. User ` employs the covariance matrix 1

`;`

`;`

`

Q = W(I D ) Wy ; `

where



is chosen to satisfy

replaced with

R

1=2 `

H

`;`

tr(I

1 +

(18)

D ) = 1. This is similar to independent water-filling except that H 1 +

`;`

is

. The Nash equilibrium is reached only when the water-filling condition (18) is satisfied

for all users simultaneously. Once the Nash equilibrium is reached, each user tends not to deviate from this point individually since any deviation would result in his own performance degradation. However,

Q

`

in (18) depends on

equilibrium solution

Q ;;Q 1

L

Q ; j = 1;    ; L; j 6= ` implicitly. An explicit closed form for the Nash j

is not available. To solve for the Nash equilibrium, we borrow the idea of iterative

water-filling from [10], which was applied to the DSL interference channel. The main procedure is: at any given time slot, one user updates its covariance matrix via water-filling according to (18) for the given channel and interference, and all users do this iteratively until the whole procedure converges. The information required by the transmitter April 26, 2003

DRAFT

9

of user

`

is its own channel state information

H

`;`

and the covariance matrix of the interference-plus-noise

R. `
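The round-robin procedure just described can be sketched as follows. This is our own self-contained illustration (hypothetical names, fixed round count rather than a convergence test), not the authors' implementation; as noted below, convergence is not guaranteed in general.

```python
import numpy as np

def _waterfill(G):
    """Water-fill over an effective channel G: maximize
    log2 det(I + G Q G^dagger) s.t. tr(Q) = 1, Q PSD (cf. eq. (18))."""
    lam, W = np.linalg.eigh(G.conj().T @ G)
    lam = np.maximum(lam, 1e-12)
    inv = 1.0 / lam
    lo, hi = 0.0, inv.max() + 1.0
    for _ in range(100):                      # bisection on the water level
        mu = 0.5 * (lo + hi)
        lo, hi = (mu, hi) if np.maximum(mu - inv, 0).sum() <= 1 else (lo, mu)
    p = np.maximum(0.5 * (lo + hi) - inv, 0.0)
    return W @ np.diag(p) @ W.conj().T

def iterative_water_filling(H, rho, eta, n_rounds=50):
    """Round-robin best responses: each user re-water-fills against the
    current interference, seeking a Nash equilibrium."""
    L = len(rho)
    Nr, Nt = H[0][0].shape
    Q = [np.eye(Nt, dtype=complex) / Nt for _ in range(L)]
    for _ in range(n_rounds):
        for l in range(L):
            R = np.eye(Nr, dtype=complex)     # interference-plus-noise R_l
            for j in range(L):
                if j != l:
                    R += eta[l][j] * H[l][j] @ Q[j] @ H[l][j].conj().T
            # Whiten: R^{-1/2} H_{l,l}, then ordinary water-filling
            w, V = np.linalg.eigh(R)
            R_inv_sqrt = V @ np.diag(w ** -0.5) @ V.conj().T
            Q[l] = _waterfill(np.sqrt(rho[l]) * R_inv_sqrt @ H[l][l])
    return Q
```

With the INRs set to zero this collapses to the independent water-filling of Section IV-A, since the whitening step becomes the identity.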

Therefore, users can update their signaling covariance matrices in a distributed way, provided the covariance matrix of the interference-plus-noise can be estimated. However, it is not obvious that this algorithm always converges in our MIMO interference system. It was shown in [10] that the iterative water-filling procedure converges under certain conditions for the DSL interference channel. As will be shown later, numerical results suggest that this algorithm might not always converge in our problem. Apparently a similar algorithm was proposed in [11], although not from the game-theoretic point of view.

C. Global Gradient Projection Method

The nature of the Nash equilibrium is that users compete with each other rather than cooperating to achieve a socially optimum point. As we have already pointed out, a change in the covariance matrix of one user generally results in a change in the mutual information of all users. Therefore, to achieve the system capacity, the transmitters must cooperate in a certain way when deciding their covariance matrices, so that the right compromise between maximizing one's own mutual information and minimizing the interference to other users can be reached. As mentioned before, due to the non-concavity of our optimization problem, it is almost impossible to develop a general algorithm that finds the optimum solution. Nonetheless, we can develop numerical algorithms based on known nonlinear programming techniques which, in general, give a suboptimum solution and provide a lower bound on the gain obtainable by global optimization. However, considering the complexity of our problem, with matrix variables and matrix manipulations involved, we prefer simple algorithms.
The gradient projection (GP) method [21], an extension of unconstrained steepest descent to convexly constrained problems, is an excellent choice for our purpose due to its simplicity and its efficiency in dealing with the constraints at hand, as we will show. In this subsection, we propose a global approach based on the GP method.

First we give a brief review of the GP method. Making minor changes to the discussion in [21] to solve for a maximum instead of a minimum, the main idea is as follows. Suppose we need to maximize a function $f(\mathbf{x})$ over $\mathbf{x} \in X$, where $X$ is a convex set. Given an initial feasible point $\mathbf{x}_0 \in X$, we update the value according to
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k(\bar{\mathbf{x}}_k - \mathbf{x}_k), \qquad (19)$$
where
$$\bar{\mathbf{x}}_k = \left[\mathbf{x}_k + s_k\nabla f(\mathbf{x}_k)\right]_X. \qquad (20)$$
Here $0 < \alpha_k \le 1$ is the stepsize, $s_k > 0$ is a scalar, $\nabla f(\mathbf{x}_k)$ is the gradient of $f(\mathbf{x})$ at the point $\mathbf{x}_k$, and $[\cdot]_X$ denotes the projection onto the convex set $X$. The projection solves for the point in $X$ closest to $\mathbf{x}_k + s_k\nabla f(\mathbf{x}_k)$ in the sense of Euclidean distance, which becomes necessary when $\mathbf{x}_k + s_k\nabla f(\mathbf{x}_k)$ does not fall in the desired set $X$. Since $\mathbf{x}_{k+1}$ is a convex combination of $\mathbf{x}_k$ and $\bar{\mathbf{x}}_k$, $\mathbf{x}_{k+1}$ always belongs to $X$ due to the convexity of $X$. The update procedure is repeated until it converges. Note that if $\mathbf{x}_k + s_k\nabla f(\mathbf{x}_k) \in X$ and $\alpha_k = 1$, then (19) becomes $\mathbf{x}_{k+1} = \mathbf{x}_k + s_k\nabla f(\mathbf{x}_k)$, which is exactly the steepest ascent approach. However, an extra projection step is introduced here due to the set constraint. There are several important concepts and properties regarding the GP method:

1) $\tilde{\mathbf{x}}$ is called a stationary point if it satisfies $\langle\nabla f(\tilde{\mathbf{x}}), \mathbf{x} - \tilde{\mathbf{x}}\rangle \le 0$ for any $\mathbf{x} \in X$, which is a necessary condition for a local maximum. Here $\langle\cdot,\cdot\rangle$ denotes the inner product.

2) $\bar{\mathbf{x}}_k - \mathbf{x}_k$ in (19) is gradient related, which means that $\langle\nabla f(\mathbf{x}_k), \bar{\mathbf{x}}_k - \mathbf{x}_k\rangle > 0$ as long as $\mathbf{x}_k$ is not a stationary point. This guarantees that there exists some $\alpha_k$ such that $f(\mathbf{x}_{k+1}) > f(\mathbf{x}_k)$.

3) The sequence $\mathbf{x}_k$ generated by (19) always converges to a stationary point, provided that $\alpha_k$ and $s_k$ are chosen

properly.

Our system capacity optimization problem is to maximize (3) over $\mathbf{Q}_1,\dots,\mathbf{Q}_L \in \Omega_1^{PSD}$. By applying the GP method, we have the global GP algorithm as follows:

Algorithm 1: Global GP
  Choose the initial conditions $\mathbf{Q}_1(0),\dots,\mathbf{Q}_L(0)$.
  $k = 0$
  do
    Calculate the gradient $\mathbf{G}_{ki} = \nabla_{\mathbf{Q}_i} I(\mathbf{Q}_1(k),\dots,\mathbf{Q}_L(k))$, $i = 1,\dots,L$
    Choose appropriate $s_k$
    $\mathbf{Q}'_i(k) = \mathbf{Q}_i(k) + s_k\mathbf{G}_{ki}$, $i = 1,\dots,L$
    $\bar{\mathbf{Q}}_i(k)$ = projection of $\mathbf{Q}'_i(k)$ onto $\Omega_1^{PSD}$, $i = 1,\dots,L$
    Choose appropriate $\alpha_k$
    $\mathbf{Q}_i(k+1) = \mathbf{Q}_i(k) + \alpha_k(\bar{\mathbf{Q}}_i(k) - \mathbf{Q}_i(k))$, $i = 1,\dots,L$
    $k = k + 1$
  until the maximum absolute value of the elements in $\mathbf{Q}_i(k) - \mathbf{Q}_i(k-1)$ is less than $\epsilon$ for $i = 1,\dots,L$.

Several rules for the selection of $s_k$ and $\alpha_k$ were introduced in [21] and can be applied to this algorithm, including the limited minimization rule, the Armijo rule along the feasible direction, and the Armijo rule along the projection arc. In the simulation results presented in Section V, the Armijo rule along the feasible direction is used. For convenience, let $\mathbf{Q} = (\mathbf{Q}_1,\dots,\mathbf{Q}_L)$, $I(\mathbf{Q}) = I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$, and $\nabla I(\mathbf{Q}) = (\nabla_{\mathbf{Q}_1} I(\mathbf{Q}),\dots,\nabla_{\mathbf{Q}_L} I(\mathbf{Q}))$. According to the Armijo rule along the feasible direction, we choose $s_k = 1$ and $\alpha_k = \beta^{m_k}$, where $m_k$ is the first nonnegative integer $m$ which satisfies
$$I(\mathbf{Q}(k+1)) - I(\mathbf{Q}(k)) \ge \sigma\beta^m\langle\nabla I(\mathbf{Q}(k)), \bar{\mathbf{Q}}(k) - \mathbf{Q}(k)\rangle = \sigma\beta^m\sum_{i=1}^{L}\mathrm{tr}\left(\nabla_{\mathbf{Q}_i} I(\mathbf{Q}(k))^\dagger\left(\bar{\mathbf{Q}}_i(k) - \mathbf{Q}_i(k)\right)\right). \qquad (21)$$
Here $0 < \beta < 1$ and $0 < \sigma < 1$ are fixed scalars. Typically a very small value of $\sigma$ is chosen, and $\beta$ is chosen between $1/2$ and $1/10$. Note that the inner product of complex matrices is defined as
$$\langle\mathbf{A}, \mathbf{B}\rangle = \mathrm{tr}(\mathbf{A}^\dagger\mathbf{B}). \qquad (22)$$
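To make the update (19)-(21) concrete, the following generic sketch applies the GP iteration with the Armijo rule along the feasible direction to an arbitrary differentiable objective and projection operator. All names are ours, not the paper's; the toy example uses the probability simplex, which is the diagonal (eigenvalue-only) special case of $\Omega_1^{PSD}$ — the same simplex projection applied to the eigenvalues of a Hermitian matrix, keeping its eigenvectors, gives the projection onto $\Omega_1^{PSD}$ derived in (25)-(27) below.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {x >= 0, sum x = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def gp_armijo(f, grad, project, x0, s=1.0, beta=0.5, sigma=1e-4, iters=200):
    """Gradient projection with the Armijo rule along the feasible
    direction, eqs. (19)-(21), for maximization."""
    x = project(np.asarray(x0, dtype=float))
    for _ in range(iters):
        xbar = project(x + s * grad(x))        # eq. (20)
        d = xbar - x                           # feasible direction
        inner = grad(x) @ d
        if inner <= 1e-12:                     # stationary point reached
            break
        alpha = 1.0
        # Armijo: shrink alpha until sufficient increase, eq. (21)
        while f(x + alpha * d) - f(x) < sigma * alpha * inner:
            alpha *= beta
        x = x + alpha * d                      # eq. (19)
    return x

# Toy concave objective: the maximizer over the simplex is the projection of c
c = np.array([0.8, 0.3, -0.2])
f = lambda x: -np.sum((x - c) ** 2)
g = lambda x: -2.0 * (x - c)
x_star = gp_armijo(f, g, project_simplex, np.ones(3) / 3)
```

For the concave toy objective the iteration stops at the global maximizer; for the non-concave system objective (3) it can only be expected to reach a stationary point, as discussed below.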

However, to make the GP algorithm work for our particular problem, there are still two problems to be solved: how to calculate the gradient, and how to project $\mathbf{Q}'_i(k)$ onto the constraint set $\Omega_1^{PSD}$.

1) Gradient: The gradient $\nabla_{\mathbf{Q}_i} I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ depends on the derivative of $I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ with respect to $\mathbf{Q}_i$. It was shown in [16] that $\partial\det(\mathbf{B}\mathbf{X}\mathbf{C})/\partial\mathbf{X} = \det(\mathbf{B}\mathbf{X}\mathbf{C})\left[\mathbf{C}(\mathbf{B}\mathbf{X}\mathbf{C})^{-1}\mathbf{B}\right]^T$. Based on this, we derive
$$\frac{\partial}{\partial\mathbf{X}}\ln\det(\mathbf{A} + \mathbf{B}\mathbf{X}\mathbf{C}) = \left[\mathbf{C}(\mathbf{A} + \mathbf{B}\mathbf{X}\mathbf{C})^{-1}\mathbf{B}\right]^T. \qquad (23)$$
By applying (23) to (4), we obtain the derivative
$$\frac{\partial I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)}{\partial\mathbf{Q}_i} = \frac{\rho_i}{\ln 2}\left[\mathbf{H}_{i,i}^\dagger(\mathbf{R}_i + \rho_i\mathbf{H}_{i,i}\mathbf{Q}_i\mathbf{H}_{i,i}^\dagger)^{-1}\mathbf{H}_{i,i}\right]^T + \frac{1}{\ln 2}\sum_{\ell=1,\,\ell\neq i}^{L}\eta_{\ell,i}\left[\mathbf{H}_{\ell,i}^\dagger(\mathbf{R}_\ell + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger)^{-1}\mathbf{H}_{\ell,i} - \mathbf{H}_{\ell,i}^\dagger\mathbf{R}_\ell^{-1}\mathbf{H}_{\ell,i}\right]^T,$$
where we use the fact that $\mathbf{R}_i$ does not depend on $\mathbf{Q}_i$. Further, for a function $f(z)$, where $z = x + jy$ is a complex variable, the derivative is defined as $\frac{\partial f(z)}{\partial z} = \frac{1}{2}\left(\frac{\partial f(z)}{\partial x} - j\frac{\partial f(z)}{\partial y}\right)$, and the gradient is defined as $\nabla_z f(z) = \frac{\partial f(z)}{\partial x} + j\frac{\partial f(z)}{\partial y}$ [22]. Here $j = \sqrt{-1}$. Hence for a real-valued function $f(z)$ we have $\nabla_z f(z) = 2\left(\frac{\partial f(z)}{\partial z}\right)^*$. Therefore,
$$\nabla_{\mathbf{Q}_i} I(\mathbf{Q}_1,\dots,\mathbf{Q}_L) = 2\left(\frac{\partial I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)}{\partial\mathbf{Q}_i}\right)^* = \frac{2\rho_i}{\ln 2}\mathbf{H}_{i,i}^\dagger(\mathbf{R}_i + \rho_i\mathbf{H}_{i,i}\mathbf{Q}_i\mathbf{H}_{i,i}^\dagger)^{-1}\mathbf{H}_{i,i} - \frac{2}{\ln 2}\sum_{\ell=1,\,\ell\neq i}^{L}\eta_{\ell,i}\mathbf{H}_{\ell,i}^\dagger\left[\mathbf{R}_\ell^{-1} - (\mathbf{R}_\ell + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger)^{-1}\right]\mathbf{H}_{\ell,i}, \qquad (24)$$
where we note that $(\mathbf{R}_\ell + \rho_\ell\mathbf{H}_{\ell,\ell}\mathbf{Q}_\ell\mathbf{H}_{\ell,\ell}^\dagger)^{-1}$ and $\mathbf{R}_\ell^{-1}$, $\ell = 1,\dots,L$, are Hermitian matrices.

2) Projection: We recognize from (24) that $\nabla_{\mathbf{Q}_i} I(\mathbf{Q}_1,\dots,\mathbf{Q}_L)$ is a Hermitian matrix. Therefore, $\mathbf{Q}'_i(k) = \mathbf{Q}_i(k) + s_k\nabla_{\mathbf{Q}_i} I(\mathbf{Q}_1(k),\dots,\mathbf{Q}_L(k))$ is Hermitian, and the projection problem becomes how to project a Hermitian matrix onto the constraint set $\Omega_1^{PSD}$. We use the Frobenius norm as the matrix distance criterion, which is the counterpart of the Euclidean norm in the vector space and is consistent with the definition of the inner product in (22). By the definition of the Frobenius norm, the squared distance of two matrices $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ is $D^2(\mathbf{A}, \mathbf{B}) = \sum_{i,j}|a_{ij} - b_{ij}|^2 = \mathrm{tr}\left[(\mathbf{A} - \mathbf{B})^\dagger(\mathbf{A} - \mathbf{B})\right]$. Given a Hermitian matrix $\mathbf{A}_0$, we need to find the $\mathbf{A} \in \Omega_1^{PSD}$ which minimizes $D^2(\mathbf{A}, \mathbf{A}_0)$. By using the Lagrange multiplier method, we define
$$f = D^2(\mathbf{A}, \mathbf{A}_0) + \lambda(\mathrm{tr}(\mathbf{A}) - 1) = \mathrm{tr}\left[(\mathbf{A} - \mathbf{A}_0)^\dagger(\mathbf{A} - \mathbf{A}_0)\right] + \lambda(\mathrm{tr}(\mathbf{A}) - 1), \qquad (25)$$
where $\lambda$ is the Lagrange multiplier. Since $\partial\,\mathrm{tr}(\mathbf{A})/\partial\mathbf{A} = \mathbf{I}$ and $\partial\,\mathrm{tr}(\mathbf{A}^\dagger\mathbf{A})/\partial\mathbf{A} = (\mathbf{A}^\dagger)^T$, we have
$$\frac{\partial f}{\partial\mathbf{A}} = \left((\mathbf{A} - \mathbf{A}_0)^\dagger\right)^T + \lambda\mathbf{I} = \mathbf{O}. \qquad (26)$$
Therefore,
$$\mathbf{A} = \mathbf{A}_0 - \lambda\mathbf{I} = \mathbf{U}(\mathbf{D} - \lambda\mathbf{I})\mathbf{U}^\dagger, \qquad (27)$$
where $\mathbf{A}_0 = \mathbf{U}\mathbf{D}\mathbf{U}^\dagger$ is the eigenvalue decomposition, with $\mathbf{U}$ unitary and $\mathbf{D}$ diagonal. Recall that $\mathbf{A}$ is PSD and $\mathrm{tr}(\mathbf{A}) = 1$. Thus we need to solve for the $\lambda$ which satisfies $\mathrm{tr}\left((\mathbf{D} - \lambda\mathbf{I})^+\right) = 1$, and then $\mathbf{A} = \mathbf{U}(\mathbf{D} - \lambda\mathbf{I})^+\mathbf{U}^\dagger$. Therefore, to project a Hermitian matrix onto $\Omega_1^{PSD}$, we only need to adjust the eigenvalues in an appropriate way and keep the original eigenvectors.

According to the convergence analysis in [21], we know that Algorithm 1 converges to a stationary point $\tilde{\mathbf{Q}}$, which satisfies $\langle\nabla I(\tilde{\mathbf{Q}}), \mathbf{Q} - \tilde{\mathbf{Q}}\rangle \le 0$ for any $\mathbf{Q} \in \Omega_1^{PSD}$. However, its complexity is quite high compared to that

of iterative water-filling. A central control, which has access to all the channel state information and the covariance matrices of all users, is necessary. It performs the calculation and sends the information to all users so that they can update their covariance matrices accordingly.

D. Iterative Gradient Projection Method

Based on the global GP method given in the previous subsection, we propose a distributed algorithm in which users update their covariance matrices iteratively. At each iteration, one user updates his covariance matrix using the GP method while the covariance matrices of the other users remain unchanged, as described in Algorithm 2.

Algorithm 2: Iterative GP
  Choose the initial conditions $\mathbf{Q}_1(0), \ldots, \mathbf{Q}_L(0)$.
  $k = 0$
  do
    for $i = 1$ to $L$
      Calculate the gradient $\mathbf{G}_i = \nabla_{\mathbf{Q}_i} I(\mathbf{Q}_1(k+1), \ldots, \mathbf{Q}_{i-1}(k+1), \mathbf{Q}_i(k), \ldots, \mathbf{Q}_L(k))$
      Choose an appropriate step size $s_{ki}$
      $\mathbf{Q}'_i(k) = \mathbf{Q}_i(k) + s_{ki}\mathbf{G}_i$
      $\bar{\mathbf{Q}}_i(k) = $ projection of $\mathbf{Q}'_i(k)$ onto $\Omega_{PSD,1}$
      Choose an appropriate $\alpha_{ki}$
      $\mathbf{Q}_i(k+1) = \mathbf{Q}_i(k) + \alpha_{ki}(\bar{\mathbf{Q}}_i(k) - \mathbf{Q}_i(k))$
    end
    $k = k + 1$
  until the maximum absolute value of the elements in $\mathbf{Q}_i(k) - \mathbf{Q}_i(k-1)$ is less than $\epsilon$ for $i = 1, \ldots, L$.
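Both Algorithm 1 and Algorithm 2 repeatedly apply the projection onto $\Omega_{PSD,1}$ derived above. A minimal NumPy sketch of that projection follows; the function name and the bisection search for $\lambda$ are our own illustrative choices, not from the paper:

```python
import numpy as np

def project_psd_unit_trace(A0, iters=100):
    """Project a Hermitian matrix A0 onto {A : A is PSD, tr(A) = 1}.

    Per the derivation above: keep the eigenvectors of A0 and replace its
    eigenvalues d_i by (d_i - lam)^+, where lam solves sum_i (d_i - lam)^+ = 1.
    lam is found by bisection, since g(lam) = sum_i (d_i - lam)^+ is
    continuous and nonincreasing in lam.
    """
    d, U = np.linalg.eigh(A0)            # A0 = U diag(d) U^H
    lo, hi = d.min() - 1.0, d.max()      # g(lo) >= 1 and g(hi) = 0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.maximum(d - lam, 0.0).sum() > 1.0:
            lo = lam                     # lam too small: trace still above 1
        else:
            hi = lam
    d_new = np.maximum(d - 0.5 * (lo + hi), 0.0)
    return U @ np.diag(d_new) @ U.conj().T
```

Only the eigenvalues are modified; the eigenvectors of the input are preserved, exactly as the derivation requires.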

We still use the Armijo rule along the feasible direction to decide $s_{ki}$ and $\alpha_{ki}$. We choose $s_{ki} = 1$ and $\alpha_{ki} = \beta^{m_{ki}}$, where $m_{ki}$ is the first nonnegative integer $m$ which satisfies
$$I_k(\mathbf{Q}_i(k+1)) - I_k(\mathbf{Q}_i(k)) \ge \sigma\beta^m \operatorname{tr}\big[\nabla_{\mathbf{Q}_i} I_k(\mathbf{Q}_i(k))^\dagger (\bar{\mathbf{Q}}_i(k) - \mathbf{Q}_i(k))\big], \qquad (28)$$
with $0 < \beta < 1$ and $0 < \sigma < 1$ fixed scalars, and where $I_k(\mathbf{Q}_i) = I(\mathbf{Q}_1(k+1), \ldots, \mathbf{Q}_{i-1}(k+1), \mathbf{Q}_i, \mathbf{Q}_{i+1}(k), \ldots, \mathbf{Q}_L(k))$.
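For concreteness, the Armijo search in (28) can be sketched as below. This is a generic maximization-form Armijo rule along the feasible direction $\bar{\mathbf{Q}}_i(k) - \mathbf{Q}_i(k)$; the names `I_fn` and `grad_fn` and the toy quadratic objective are illustrative stand-ins for the mutual information and its gradient, not quantities from the paper:

```python
import numpy as np

def armijo_step(I_fn, grad_fn, Q, Q_bar, beta=0.5, sigma=0.1, m_max=50):
    """Return alpha = beta**m for the first nonnegative integer m satisfying
    I(Q + alpha*(Q_bar - Q)) - I(Q) >= sigma * alpha * Re tr(G^H (Q_bar - Q)),
    with G = grad_fn(Q), mirroring (28) in maximization form."""
    D = Q_bar - Q                                       # feasible direction
    slope = np.real(np.trace(grad_fn(Q).conj().T @ D))  # directional derivative
    I0 = I_fn(Q)
    alpha = 1.0
    for _ in range(m_max):
        if I_fn(Q + alpha * D) - I0 >= sigma * alpha * slope:
            return alpha
        alpha *= beta
    return alpha

# Toy check: maximize I(Q) = -||Q - T||_F^2; the full step alpha = 1 jumps
# straight to the maximizer T and therefore passes the Armijo test.
T = np.diag([0.6, 0.4])
I_fn = lambda Q: -np.linalg.norm(Q - T) ** 2
grad_fn = lambda Q: -2.0 * (Q - T)
alpha = armijo_step(I_fn, grad_fn, np.zeros((2, 2)), T)
```

With a more demanding sufficient-decrease parameter (sigma close to 1), the loop backtracks through alpha = 1, 1/2, 1/4, ... exactly as the rule prescribes.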

Based on the properties of the GP method, we have $I(\mathbf{Q}(k+1)) \ge I(\mathbf{Q}(k))$. Since $I(\mathbf{Q})$ is upper-bounded, Algorithm 2 is guaranteed to converge. Further, it stops at a point $\tilde{\mathbf{Q}}$ only if, for $i = 1, \ldots, L$, $\langle\nabla_{\mathbf{Q}_i} I(\tilde{\mathbf{Q}}), \mathbf{Q}_i - \tilde{\mathbf{Q}}_i\rangle \le 0$ for any $\mathbf{Q}_i \in \Omega_{PSD,1}$. This means that $\tilde{\mathbf{Q}}$ is a stationary point, since $\langle\nabla I(\tilde{\mathbf{Q}}), \mathbf{Q} - \tilde{\mathbf{Q}}\rangle = \sum_{i=1}^{L} \langle\nabla_{\mathbf{Q}_i} I(\tilde{\mathbf{Q}}), \mathbf{Q}_i - \tilde{\mathbf{Q}}_i\rangle \le 0$. Therefore, as for Algorithm 1, this iterative algorithm also converges to a stationary point. Recall that Theorem 1 in Section III proved that $I(\mathbf{Q}_1, \ldots, \mathbf{Q}_L)$ is a concave function when the INR is sufficiently small. In this case, any stationary point is also a global maximum, which assures that both algorithms converge to the optimum solution. In other cases, when the function is not concave, we cannot guarantee that these algorithms converge to the global optimum, since they may get stuck at other stationary points. Nonetheless, both algorithms give very good results, as will be shown in the next section.

The advantage of this algorithm over Algorithm 1 is that users do not need to update simultaneously, so centralized control is not necessary. Each user can update his own covariance matrix by himself as long as all the necessary information is known. In this sense it is similar to the Nash equilibrium: each user tries to maximize a function individually. The difference is that here each user tries to maximize the system mutual information instead of his individual mutual information, as in the Nash equilibrium. However, each user needs to know all the channel state information and the covariance matrices of all users, as in the global GP method, and its complexity is also comparable to that of the global GP method.

V. SIMULATION RESULTS

In this section, we compare the achieved system mutual information of the different approaches described in the previous section, and use the properties in Section III to interpret the solutions. For simplicity, we always let $N_t = N_r = 2$.

First we consider the symmetric case, in which all users have the same SNR and all interfering links have the same INR, i.e., the SNR of user $\ell$ and the INR from user $j$ to user $\ell$ do not depend on $\ell$ or $j$. Fig. 1, Fig. 2 and Fig. 3 plot the ergodic mutual information per user vs. INR for different SNR values for 2, 3 and 5 users, respectively. For the Nash equilibrium, the simulation results show that iterative water-filling almost always converges, and more than 95% of the time it converges in fewer than 10 iterations. The non-convergence rate is around 1%, and we exclude the non-convergence cases when calculating the ergodic mutual information. For the global and the iterative GP methods, we choose $\sigma = 0.1$, $\beta = 0.5$ and $\epsilon = 0.01$. In our attempts at changing these parameters or applying other rules for parameter selection, we obtained almost the same ergodic mutual information curves. We also tried several choices of initial conditions: scaled identity matrices, the result of independent water-filling, and the Nash equilibrium. Although the convergence points for different initial conditions are not necessarily identical for every channel realization, the ergodic mutual information curves are extremely close to each other, and there is no


evidence that one choice is better than another. In the figures we show the performance curves for the case in which the initial conditions are scaled identity matrices. Similarly, the global and the iterative GP methods give virtually identical performance curves (the curves were right on top of each other, so we only give one), although they may not converge to the same point for every channel realization; here we give the results for the iterative GP method. In our simulations we found that more than 95% of the time both algorithms converge in fewer than 30 iterations.

Not surprisingly, among all the approaches, independent water-filling is the worst, while the global and the iterative GP methods give the best performance. When the INR is small, as explained in Section IV, both GP methods actually achieve the optimum performance. Simulation results show that both independent water-filling and the Nash equilibrium give performance very close to the optimum for small INR, and all approaches essentially converge to the same point when the interference is diminishing. We observe from Fig. 1 - Fig. 3 that the interference dramatically degrades the system performance. The degradation becomes more severe as the INR increases or as the number of users increases. When the interference becomes strong, independent water-filling, as expected, causes significant performance degradation. The Nash equilibrium always outperforms independent water-filling and improves the system mutual information greatly when the SNR and INR are large. Both GP methods give dramatic improvement over the Nash equilibrium. To give a better idea of how much gain the GP methods provide, we plot the contour of the performance difference between the Nash equilibrium and the iterative GP method. For example, Fig. 4 gives the plots for $L = 5$: a 100% improvement is observed for an SNR of 5 dB and an INR of 10 dB.

Since the mutual information is a random variable, its distribution is more informative than the ergodic mutual information alone. As an example, the complementary cumulative distribution function of the mutual information is given in Fig. 5 for $L = 5$, SNR = 5 dB and INR = 10 dB. It shows that the relative performance of the different approaches at any outage probability is similar to that for the ergodic mutual information. In particular, both GP methods outperform the Nash equilibrium significantly.

Now we consider the non-symmetric case, where we generate the SNRs and INRs of all users independently and uniformly at random between -5 dB and 10 dB, values which are of practical interest. The ergodic mutual information per user is shown in Table I for different numbers of users. Again the GP methods show significant performance enhancement over the Nash equilibrium; the larger the number of users, the larger the improvement we obtain from the GP methods. When the INR becomes very large compared to the SNR, the Nash equilibrium actually employs a 1-stream scheme, since the water-filling procedure automatically shuts down the other streams. The global and the iterative GP methods, as observed in our simulations, also employ only one stream when the INR is large. All of this, in some sense, agrees with Theorem 3 in Section III, which states that 1-stream signaling is optimum for large interference. However, dramatic differences are observed between the performances of these 1-stream signaling schemes. Although the Nash equilibrium seems to be a reasonable approach, it still has an obvious gap when compared to the GP methods.


VI. CONCLUSION

We studied the optimum signaling in the MIMO interference system for the feedback case from the mutual information perspective. We showed analytically the behavior change of the system mutual information when the INR is increased from a sufficiently small value to a sufficiently large value. In particular, the system capacity is a concave function when the INR is small, but is a convex function of each user's covariance matrix when the INR is large. Moreover, we proved that the optimum solution is a 1-stream approach for all users when the INR is sufficiently large. Several signaling approaches were proposed. Numerical results showed that in a practical environment where many users are present, the Nash equilibrium, despite its relatively low complexity, might not be good enough. The proposed global and iterative gradient projection methods showed evident improvement over the Nash equilibrium.

We also point out that all the material presented in this paper applies to the case where each user employs an arbitrary number of transmit and receive antennas, although we assumed that all users employ $N_t$ transmit and $N_r$ receive antennas for brevity of presentation. Moreover, we used the system mutual information defined in (3) as our optimization criterion, where all users have equal priority. However, the analysis in Section III and the proposed gradient projection methods can be easily extended, with minimal changes, to cases where different priorities are assigned to different users when calculating the system mutual information.

Our results show that a certain form of cooperation between users is highly desirable in an environment with excessive interference. The iterative gradient projection method is a possible choice since it is distributed and gives the same performance as the global gradient projection method. However, it requires complete information on the channel matrices and the transmitted covariance matrices, which might be prohibitive in practice. Our future work will concentrate on the search for a distributed algorithm which is less complex and requires less information than the iterative gradient projection method but gives better performance than the Nash equilibrium.

REFERENCES

[1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, pp. 311–335, Mar. 1998.
[2] S. Catreux, L. J. Greenstein, and P. F. Driessen, “Simulation results for an interference-limited multiple-input multiple-output cellular system,” IEEE Commun. Lett., vol. 4, pp. 334–336, Nov. 2000.
[3] H. Bölcskei, D. Gesbert, and A. J. Paulraj, “On the capacity of OFDM-based spatial multiplexing systems,” IEEE Trans. Commun., vol. 50, pp. 225–234, Feb. 2002.
[4] C. C. Martin, J. H. Winters, and N. R. Sollenberger, “Multiple-input multiple-output (MIMO) radio channel measurements,” in Proc. IEEE Int. Symp. Antennas and Propagation Society, vol. 1, Boston, MA, July 2001, pp. 418–421.
[5] H. Dai, A. F. Molisch, and H. V. Poor, “Downlink multiuser capacity of interference-limited MIMO systems,” in Proc. IEEE Int. Symp. Personal, Indoor and Mobile Radio Communications, vol. 2, Lisbon, Portugal, Sept. 2002, pp. 849–853.
[6] R. W. Heath, Jr., M. Airy, and A. J. Paulraj, “Multiuser diversity for MIMO wireless systems with linear receivers,” in Proc. Asilomar Conf. Signals, Systems and Computers, vol. 2, Pacific Grove, CA, Oct. 2001, pp. 1194–1199.
[7] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, pp. 585–595, Nov.–Dec. 1999.
[8] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communications: performance criteria and code construction,” IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744–765, Mar. 1998.


[9] W. Yu, W. Rhee, S. Boyd, and J. M. Cioffi, “Iterative water-filling for Gaussian vector multiple access channels,” IEEE Trans. Inform. Theory, submitted for publication.
[10] W. Yu, G. Ginis, and J. M. Cioffi, “Distributed multiuser power control for digital subscriber lines,” IEEE J. Select. Areas Commun., vol. 20, no. 5, pp. 1105–1115, June 2002.
[11] M. F. Demirkol and M. A. Ingram, “Power-controlled capacity for interfering MIMO links,” in Proc. IEEE Veh. Technol. Conf., vol. 1, Atlantic City, NJ, Oct. 2001, pp. 187–191.
[12] R. S. Blum, “MIMO capacity with interference,” IEEE J. Select. Areas Commun., to appear.
[13] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, 1991.
[14] F. R. Farrokhi, G. J. Foschini, A. Lozano, and R. A. Valenzuela, “Link-optimal BLAST processing with multiple-access interference,” in Proc. IEEE Veh. Technol. Conf., vol. 1, Boston, MA, Sept. 2000, pp. 87–91.
[15] S. Boyd and L. Vandenberghe, Convex Optimization, to be published. [Online]. Available: http://www.stanford.edu/~boyd/cvxbook.html
[16] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Economics, 2nd ed. John Wiley & Sons, 1999.
[17] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge University Press, 1985.
[18] C. U. Saraydar, N. B. Mandayam, and D. J. Goodman, “Efficient power control via pricing in wireless data networks,” IEEE Trans. Commun., vol. 50, no. 2, pp. 291–303, Feb. 2002.
[19] A. B. MacKenzie and S. B. Wicker, “Game theory and the design of self-configuring, adaptive wireless networks,” IEEE Commun. Mag., vol. 39, pp. 126–131, Nov. 2001.
[20] R. B. Myerson, Game Theory: Analysis of Conflict. Cambridge, MA: Harvard University Press, 1991.
[21] D. P. Bertsekas, Nonlinear Programming. Belmont, MA: Athena Scientific, 1995 (especially Section 2.3).
[22] S. Haykin, Adaptive Filter Theory, 3rd ed. Prentice Hall, 1996 (especially Appendix B).

Fig. 1. Ergodic mutual information per user for $N_t = N_r = 2$ and $L = 2$ (the global GP method gives almost the same performance as the iterative GP method).

TABLE I
ERGODIC MUTUAL INFORMATION (MI) PER USER FOR RANDOMLY GENERATED SNRS AND INRS

Ergodic MI per user              | L = 2  | L = 3  | L = 4  | L = 5
---------------------------------|--------|--------|--------|-------
Independent water-filling        | 2.2149 | 1.6479 | 1.2713 | 1.0163
Nash equilibrium ($I_{nash}$)    | 2.3555 | 1.8274 | 1.4446 | 1.1693
Iterative GP method ($I_{gp}$)   | 2.5393 | 2.1890 | 1.8728 | 1.6024
$(I_{gp} - I_{nash})/I_{nash}$   | 7.8%   | 19.8%  | 29.6%  | 37.1%
Ergodic Mutual Information per User (b/s/Hz)

12 Iterative GP Nash Equilibrium Independent Water−filling SNR = −5 dB SNR = 0 dB SNR = 5 dB SNR = 10 dB SNR = 15 dB SNR = 20 dB

10

8

6

4

2

0 −5

0

5

10

15

20

INR (dB)

Fig. 2. Ergodic mutual information per user for Nt = Nr = 2 and L = 3 (The global GP method gives almost the same performance as the iterative GP method.).

Fig. 3. Ergodic mutual information per user for $N_t = N_r = 2$ and $L = 5$ (the global GP method gives almost the same performance as the iterative GP method).

Fig. 4. Contour plot of the ergodic mutual information difference per user between the Nash equilibrium ($I_{nash}$) and the iterative GP method ($I_{gp}$) for $N_t = N_r = 2$ and $L = 5$: (A) $I_{gp} - I_{nash}$ (b/s/Hz); (B) $100(I_{gp} - I_{nash})/I_{nash}$.

Fig. 5. Complementary cumulative distribution functions of mutual information per user for $N_t = N_r = 2$, $L = 5$, SNR = 5 dB, and INR = 10 dB.