Weighted Sum Rate Maximization for MIMO ... - Semantic Scholar

4 downloads 5316 Views 566KB Size Report
coding, multiuser multi-antenna communication, zero-forcing. ... E-mail: {ltran,markku.juntti}@ee.oulu.fi. ... E-mail: {mats.bengtsson,bjorn.ottersten}@ee.kth.se.
1

Weighted Sum Rate Maximization for MIMO Broadcast Channels using Dirty Paper Coding and Zero-forcing Methods Le-Nam Tran, Member, IEEE, Markku Juntti, Senior Member, IEEE, Mats Bengtsson, Senior Member, IEEE, and Björn Ottersten, Fellow, IEEE

Abstract—We consider precoder design for maximizing the weighted sum rate (WSR) of successive zero-forcing dirty paper coding (SZF-DPC). For this problem, the existing precoder designs often assume a sum power constraint (SPC) and rely on the singular value decomposition (SVD). The SVD-based designs are known to be optimal but require high complexity. We first propose a low-complexity optimal precoder design for SZF-DPC under SPC, using the QR decomposition. Then, we propose an efficient numerical algorithm to find the optimal precoders subject to per-antenna power constraints (PAPCs). To this end, the precoder design for PAPCs is formulated as an optimization problem with a rank constraint on the covariance matrices. A well-known approach to solve this problem is to relax the rank constraints and solve the relaxed problem. Interestingly, for SZF-DPC, we are able to prove that the rank relaxation is tight. Consequently, the optimal precoder design for PAPCs is computed by solving the relaxed problem, for which we propose a customized interior-point method that exhibits a superlinear convergence rate. Two suboptimal precoder designs are also presented and compared to the optimal ones. We also show that the proposed numerical method is applicable for finding the optimal precoders for block diagonalization scheme. Index Terms—MIMO systems, broadcast channels, dirty paper coding, multiuser multi-antenna communication, zero-forcing.

I. I NTRODUCTION Over the last decade, multiple-input multiple-output (MIMO) transmission techniques have drawn a lot of attention due to its capability of boosting the channel capacity without the need for additional bandwidth or power [3]–[5]. An important scenario is the transmission from a multi-antenna transmitter, e.g, a BS in cellular networks, to several receivers, possibly with multiple antennas. This scenario is referred to Le-Nam Tran and Markku Juntti are with Centre for Wireless Communications and Department of Communications Engineering, University of Oulu, Finland. E-mail: {ltran,markku.juntti}@ee.oulu.fi. Le-Nam Tran was with Signal Processing Laboratory, ACCESS Linnaeus Center, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden. Mats Bengtsson and Björn Ottersten are with Signal Processing Laboratory, ACCESS Linnaeus Center, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden. E-mail: {mats.bengtsson,bjorn.ottersten}@ee.kth.se. Björn Ottersten is also with the Interdisciplinary Center for Security, Reliability and Trust (SnT), University of Luxembourg, L-1359 LuxembourgKirchberg, Luxembourg (email: [email protected]). This research has been supported by Tekes, the Finnish Funding Agency for Technology and Innovation, Nokia Siemens Networks, Renesas Mobile Europe, Elektrobit, Xilinx, Academy of Finland, and by the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 228044. Parts of this paper were presented at the IEEE International Conference on Communications, Ottawa, Canada, June, 2012 [1], [2].

as the MIMO broadcast channel (BC) (alternatively, downlink channel). It is now well known that dirty paper coding (DPC) achieves the capacity region of the MIMO BC [6]. Although DPC is the optimal transmission scheme for MIMO BCs, finding the transmit covariances that achieve a point in the capacity region is generally of high complexity. For example, to find the sum-capacity point, several numerical algorithms were proposed, e.g., in [7]–[9], which are basically based on the iterative water-filling algorithm. Thus, it is of great interest to develop simplified DPC-based precoding techniques, where optimal precoders are easy to find. Successive zero-forcing dirty paper coding (SZF-DPC), introduced in [10] for single-antenna receivers, and later generalized in [11] for multiple-antenna receivers, combines the zero-forcing (ZF) technique with DPC. In the SZF-DPC scheme, for the kth user, the interference caused by users 1 to k − 1 is canceled by DPC, and that caused by users k + 1 to K is eliminated by the ZF technique, where K is the number of users. To be specific, let W k be the precoder of the kth user. Then, the ZF constraints impose that H j W k = 0 for all j < k, where H j is the channel matrix between the BS and the jth user. In this way, SZF-DPC decomposes a MIMO BC into a group of parallel interference-free point-to-point channels, which simplifies the problem of precoder design. It was shown in [10], [11] that SZF-DPC performs very close to DPC, and that the optimal precoders can be computed analytically. Thus, SZF-DPC can also be used as a benchmark for comparison purposes. It is convenient to consider a cascade structure for W k , i.e., W k = B k Dk , where B k is designed to satisfy the ZF constraints, and Dk is optimized under the power constraints. The precoder design methods for SZF-DPC differ in how B k ¯ k ) to satisfy the zerois calculated. Since W k must lie in N (H H H H ¯ interference constraints, where H k = [H H 1 H 2 · · · H k−1 ] , and N (H) denotes the null space of H, it is optimal to ¯ k ). This fact design B k as an orthonormal basis of N (H was exploited in [11], where B k is an orthonormal basis of ¯ k ), which is obtained from the singular value decompoN (H ¯ k . However, finding a basis of N (H ¯ k ) via sition (SVD) of H ¯ SVD is computationally costly, since the size of H k grows with the user index. We note that the SVD-based design in [11] needs to calculate a series of SVD to find the precoders for all users. Herein, we propose a low-complexity precoder design for SZF-DPC, using only a single QR decomposition (QRD).

2

First, we note that it is computationally cheaper to calculate the null space of a matrix using a QRD instead of an SVD. Thus, a natural way to reduce the complexity of the SVD¯ k ) using the QRD, based method is to find a basis of N (H instead of the SVD. This approach is referred to as the natural QRD-based design (NQRD-based design). Still, the NQRDbased design computes several QR decompositions. As one of our main contributions, we propose a precoder design that results from applying a QRD to a matrix composed of the channel matrices of all users. More explicitly, only a single QRD is performed to find all B k ’s, instead of ¯ k ) to find B k for each k as in separately computing N (H the SVD-based design. Thus, the proposed method requires much lower complexity, compared to the SVD- and NQRDbased designs. Even though the columns of B k in the proposed ¯ k ), we will prove that it is also method do not span N (H an optimal precoder design for SZF-DPC. Particularly, the proposed method reduces to the QRD-based design in [10] for multiple-input multiple-output (MISO) BCs, meaning that the QRD-based design in [10] is optimal for MISO BCs. The precoder designs for SZF-DPC mentioned above assume a sum-power constraint (SPC), for which the optimal precoders admit a water-filling solution. In practice, individual per-antenna power constraints (PAPCs) are more useful than the SPC, since each antenna is equipped with its own power amplifier. We notice that the precoder designs for channel inversion or block diagonalization with PAPCs in [12], [13] are applicable to SZF-DPC since SZF-DPC can be viewed as a relaxation of these schemes. Particularly, a numerical algorithm based on a dual decomposition method was proposed in [13], using a subgradient method to find the optimal precoders for block diagonalized systems. Since subgradient methods in general show slow convergence, a second contribution of this paper is to propose a more efficient solution. To the best of our knowledge, no analytical form for the optimal precoder design for SZFDPC with PAPCs has been reported. Hence, we resort to numerical algorithms. To this end, we first formulate the precoder design as a rank-constrained optimization problem, to which the rank relaxation technique is a popular approach [14]. Basically, this technique drops the rank constraints to form a relaxed problem, which is (very often) convex and, thus, easier to solve. In general, however, such an approach may yield suboptimal solutions to the original problem since the rank constraints are not guaranteed when solving the relaxed problem. Interestingly, due to the special structure of SZFDPC, we are able to show that all optimal solutions of the relaxed problem always satisfy the rank constraints. In other words, the relaxation is tight, and both the original and the relaxed problems are equivalent. Then, we propose a numerical algorithm to solve the relaxed problem, based on the barrier method. As a part of the proposed algorithm, we recognize that all the power constraints must be active at the optimum. This facilitates finding the optimal solutions numerically since equality constraints are easier to cope with. By numerical examples, we demonstrate that the proposed algorithm achieves remarkably fast convergence rate. In addition, we present two suboptimal precoder designs which have lower computational

complexity, and perform very close to the optimal one. As mentioned before, the precoder design for SZF-DPC is closely related to that for block diagonalization (BD) [15], [16]. Of the two precoding schemes, BD suffers from a stricter ZF condition than SZF-DPC. More explicitly, the ZF constraints for BD force the precoder of a user to lie in the intersection of null spaces of all other users’ channel matrices. Subsequently, in contrast to SZF-DPC, not all the antennas in the BD transmit scheme will use full power, since the number of degrees of freedom could be too low. In other words, some of the PAPCs are nonbinding. In spite of this difference, we show that the proposed precoder design method for SZF-DPC can be slightly modified to solve the problem of the precoder design for BD, leading to a numerical method that converges faster than the existing design using the subgradient method. The remainder of the paper is organized as follows. In Section II, we briefly review the system model and the precoder design for SZF-DPC schemes. Section III deals with the precoder design with the SPC. In this section, we present an optimal precoder design, and analyze the computational complexity. The precoder design with PAPCs is addressed in Section IV, where we present a specialized numerical algorithm to the find the optimal precoders. We also introduce two suboptimal designs which have lower computational complexity, but perform close to the optimal one. In Section V, we address a precoder design for BD under PAPCs, extending the proposed algorithm for SZF-DPC. Numerical results are given in Section VI, followed by some conclusions in Section VII. Notation: Standard notations are used in this paper. Bold lower and upper case letters represent vectors and matrices, respectively; H H and H T are Hermitian and normal transpose of H, respectively; ||H||F and |H| are the Frobenius norm and determinant of H, respectively. I M represents an M × M identity matrix. R(H) and N (H) denote the column space and the null space of H, respectively. diag(x), where x is a vector, denotes a diagonal matrix with elements x; diag(H), where H is a square matrix, denotes a vector of its diagonal elements. [x]i is the ith entry of vector x; [H]i,j is the entry at the ith row and jth column of H. II. MIMO BC S

AND

SZF-DPC

Consider a single-cell MIMO BC with a base station (BS) and K multiple antenna users. The channel between the BS and the kth user is generally modeled by a matrix H k ∈ Cnk ×N , where N and nk > 1 are the number of antennas at the BS and the kth user, respectively. The received signal at the kth user is given by X y k = H k xk + H k xj + nk (1) j6=k

where xk ∈ CN ×1 denotes the transmitted signals for the kth user, and nk ∈ Cnk ×1 is assumed to be a complex-Gaussian noise vector with zero mean and covariance matrix I nk . We can further write xk = W k sk (2)

3

where sk ∈ CLk ×1 , Lk ≤ min(N, nk ) with E[sk sH k ] = I, and W k ∈ CN ×Lk are the vector of transmitted symbols and the precoder of the kth user, respectively. Accordingly, (1) can be rewritten as X X y k = H k W k sk + H k W j sj + H k W j sj + nk . (3) jk

It is well known that DPC is a capacity achieving transmission strategy for MIMO PBCs. For the kth user, the BS views the interference term jk H k S j H H k |

where S j = W j W H j is the transmit covariance matrix for the jth user. Since (4) is non-convex with respect to S j , it is generally difficult to deal with. For the sum rate maximization problem for MIMO BCs under a total power constraint, optimal S j ’s cannot be found analytically, and iterative numerical algorithms are involved [7]–[9]. To overcome the difficulty in finding optimal covariance matrices in DPC, the authors in [11] proposed SZF-DPC, which admits a closed-form solution for optimal precoders. In fact, SZF-DPC is a generalization of the zero-forcing DPC (ZF-DPC) in [10], devised for single-antenna receivers. For P SZF-DPC, the interference term H W k jk H k W j sj is eliminated by designing W j such that H k W j = 0 for all j > k.

(5)

Accordingly, the resulting data rate of the kth user for SZFDPC is given by RkSZF-DPC = log |I + H k S k H H k |.

(6)

The goal of the precoder design for SZF-DPC schemes is to find W k ’s that maximize a performance measure under the ZF constraints in (5), and additional constraints on the transmit power. In this paper, we consider the precoder design for SZF-DPC to maximize the WSR under a sum power and per-antenna power constraints. III. P RECODER D ESIGNS

FOR

SZF-DPC

WITH

SPC

In this section, we address the maximization problem for SZF-DPC under a SPC, which is mathematically formulated as PK H H maximize k=1 γk log |I + H k W k W k H k | Wk (7) subject to H j W k = 0, ∀j < k PK H k=1 tr(W k W k ) ≤ P

where P is the maximum transmit power at the BS, and γk is the weighting factor associated with the rate of user k, which is typically included to achieve a certain fairness index among users. For example, to obtain the proportional fairness, we ¯ k , where R ¯ k is the average data rate of can set γk = 1/R user k in the previous time slots. Moreover, by choosing γk

appropriately, we can also find the full Pareto boundary, i.e., the outer boundary of the achievable rate region of SZF-DPC. In [11], an approach to find the optimal precoders W k ’s for (7) was proposed using the SVD, which is described next. A. SVD-based Design To solve (7), we can write W k = B k D k , where B k is designed to remove the interference, and D k is adjusted to maximize the WSR under some power constraints. Obviously, ¯ k ), the ZF constraints in (7) mean that B k must lie in N (H 1 where Pk−1 ni ×N ¯ k = [H T H T · · · H T ]T ∈ C i=1 H . (8) 1

2

k−1

Like DPC, the user ordering also affects the achievable sum rate of SZF-DPC [10], [11]. Note that a different user ordering ¯ k in (8), and accordingly different results in a different H ¯ N (H k ). More specifically, the singular values of the effective channel for the kth user are changed with a new user ordering, providing higher or lower sum rate. Obviously, the best sum rate can be obtained by searching all possible combinations of user ordering. For the sake of simplicity, we assume a natural user ordering in this paper. In [11], B k is chosen to be an ¯ k ), which can be found by an SVD orthonormal basis of N (H ¯ k . To be specific, denote the full-size SVD of H ¯ k as of H ¯k=U ¯ kΣ ¯ k [V¯ k B k ]H H

(9)

¯ k has the same dimensions as H ¯ k . Then columns where Σ Pk−1 of B k ∈ CN ׯnk , n ¯ k = N − i=1 ni , form an orthonormal ¯ k ). The condition for the BS to support all users basis of N (H ¯ k ) has a dimension larger than zero, for all k. is that N (H Assuming the rows of all H k ’s to be P linearly independent, this requirement is equivalent to N > K−1 i=1 ni . When the number of users is large, a user scheduling algorithm is needed to choose a set of users that satisfies the above condition and can exploit the multiuser PK−1 diversity gain [17]. In this paper, we assume N > i=1 ni and focus on the precoder design. H H H Since tr(W k W H k ) = tr(B k D k D k B k ) = tr(D k D k ), the maximization problem in (7) reduces to PK ˜ k Dk D H ˜H maximize γk log |I + H k Hk | k=1 Dk (10) PK H subject to k=1 tr(D k D k ) ≤ P ˜ k = H k B k . Then, D k can be easily found with where H ˜ k . More water-filling over non-zero singular values of H ˜ specifically, define a compact SVD of Hk as ˜ k = H k B k = U k Σk V H ∈ Cnk ׯnk H (11) k

where Σk is an Lk × Lk diagonal matrix that contains all ˜ k , Lk = min(nk , n non-zero singular values of H ¯ k ), and V k ˜ k . To maximize the contains the Lk singular vectors of H 1 WSR, D k is found as D k = V k Φk2 , where Φk ∈ CLk ×Lk is a diagonal matrix, which is a solution to the following problem RSZF-DPC = 1H ¯

maximize P

Φk :

tr(Φk )≤P

K X

k=1

γk log2 |I + (Σk )2 Φk |. (12)

k is only defined for k ≥ 2 since we just need to find the precoding matrices B k for k ≥ 2. The precoding matrix B 1 for the first user is set to be an identity matrix, i.e., B 1 = I N .

4

The set of optimal {Φk } to (12) can be easily calculated using the water-filling algorithm. The resulting precoder for the kth 1 user is given by W k = B k V k Φk2 . Since the columns of ¯ k ), it is not difficult to see that the SVD-based B k span N (H method is optimal for (7). However, Pk−1 this method employs SVD to find the null space of a ( i=1 ni ) × N matrix for the kth user, which is computationally costly. A simple method to reduce the complexity of the SVDbased design is to replace the SVD by the QRD, which has lower and deterministic complexity. Specifically, applying a ¯ k gives QRD to H ¯ k = [Lk 0][Q B k ]H H k

Pk−1

(13)

Pk−1

where L ∈PC i=1 ni × i=1 ni is a lower triangular matrix, k−1 ¯ H ), Qk ∈ CN × i=1 ni contains an orthonormal basis of R(H k N ׯ nk ¯ k ), i.e., and B k ∈ C forms an orthonormal basis of N (H ¯ k B k = 0, and B H B k = I. This simple method is referred H k to as the natural QRD (NQRD)-based design in the sequel. Note that B k in (13) and B k in (9) are both orthonormal basis ¯ k ), and, thus, the NQRD- and SVD-based precoder of N (H designs are equivalent. Apparently, the NQRD-based design can lower the complexity of SVD-based method, but its complexity is still high because, like SVD-based design, the QRD is sequentially applied to a matrix of large dimensions. In the following, we propose another optimal precoder design, which is derived directly from a single QRD of H in (14). B. Generalized QRD-based Design (GQRD-based design) The precoding matrix B k in the NQRD- and SVD-based ¯ k ) for each k. Due to the designs is found to be a basis of N (H ¯ concatenated structure of H k , the complexity of these methods ¯ k )) = N − increases with k. Note that rank(B k ) = dim(N (H n ¯ k . Since rank(H k B k ) = rank(H k ) = nk ≤ rank(B k ), we can reduce the dimension of B k as long as its singular values are ‘aligned’ with those of H k . This fact suggests that ¯ k ) to be an optimal design. B k need not be a basis of N (H In what follows, we propose a precoder design based on a ¯ k ). For single QRD, where columns of B k do not span N (H notational convenience, let us stack the channel matrix of all users in a matrix H defined as  H H = HH ∈ CnR ×N (14) HH · · · HH 1 2 K PK where nR = k=1 nk is the total number of receive antennas, and all precoders in a matrix W given by

W = [W 1 W 2 · · · W K ] ∈ CN ×LR (15) PK where LR = k=1 Lk is the total number of data streams that the BS is able to transmit to all users in the system. The ZF constraints in (5) can be equivalently expressed as  ˜ H1 ˜2  × H     . .   . (16) HW =  × ×    . ..  × × × ˜ × × × × HK

˜k = which is a lower block triangular matrix, where H H k W k ∈ Cnk ×Lk is the effective channel matrix of the kth user. In the generalized QRD (GQRD)-based design, we further force HW to be a lower triangular matrix. Specifically, consider a QRD of H given in (14) as H = LQ and partition L into  L1 ×   L= ×  × ×

(17) 

L2 × × ×

..

.

× ×

..

. ×

LK

      

(18)

where × stands for Lk ∈ Cnk ×nk is a lower triangular matrix, and Q into ¯1B ¯2 ··· B ¯ K ]H Q = [B (19)

¯ k ∈ CN ×nk satisfies H j B ¯ k = 0, ∀j < k, i.e. where B H ¯ ¯ ¯ ¯ H k B k = 0, and B k B k = I nk . Note that the columns of ¯ k does not form a basis of N (H ¯ k ). Let B   ¯k B ¯ k+1 · · · B ¯ K ∈ CN ׯnk . Bk = B (20)

Clearly, the columns of B k in (20) forms an orthogonal basis ¯ k ), and thus precoder for user k can always be written of N (H as W k = B k Dk . In this way, an orthogonal basis for each ¯ k ) is computed by a single QRD of H in (14). To gain N (H further insights into the structure of optimal precoders under SPC, let us analyze the effective channel of the kth user, which is given by     D k,1 H k W k = H k B k D k = Lk 0 = Lk Dk,1 (21) D k,0 where D k,1 contains the top nk rows of D k and D k,0 the remaining (¯ nk − nk ) rows. From (21), we can easily see that the cost function of (7) stays the same no matter how Dk,0 is chosen. We now show that it is optimal to set D k,0 = 0, which follows from H H H H tr(W k W H k ) = tr(B k D k D k B k ) = tr(B k B k D k D k ) H H = tr(D k D H k ) = tr(D k,1 D k,1 ) + tr(D k,0 D k,0 )

≥ tr(D k,1 D H k,1 )

(22)

with equality if and only if D k,0 = 0. In fact, we have proved the following theorem Theorem 1. The optimal precoders W k ’s for SZF-DPC under ¯ k D k,1 , where D k,1 is the SPC are of the form W k = B solution to the following problem PK H H maximize k=1 γk log |I + Lk D k,1 D k,1 Lk | Dk,1 (23) PK H subject to k=1 tr(D k,1 D k,1 ) ≤ P. Theorem 1 leads to the GQRD-based design, which is summarized in Algorithm 1.

Remark 1. As a special case when nk = 1 for all k, i.e. singleantenna receivers, the GQRD-based method is the same as the precoder design based on QRD proposed in [10]. Indeed,

5

this design is widely used in current works [18]–[20] without recognizing its optimality. As an intermediate consequence of Theorem 1, the precoder design based on QRD in [10] is shown to be optimal for SZF-DPC in MISO BCs under a SPC. We note that the optimality of the QRD design for SZF-DPC with single-antenna receivers under a SPC was also established in our earlier works using a different approach [21]–[23]. C. Complexity Comparison In this subsection, we analyze the complexity of the precoder designs for SZF-DPC schemes presented above. Since the process of finding D k is the same for all methods, we only consider the complexity of calculating B k . The complexity is measured by the number of flops as in [24], [25], denoted as ψ.2 Although flop counting is a crude measurement of the true computational complexity, it captures the order of the computation load. For simplicity, we show the complexity for the case where all users have the same number of antennas, i.e. nk = n ¯ , for all k. The number of supportable users is then given by K = N/¯ n, where N is the number of transmit antennas at the BS, assumed to be a multiple of n ¯ . The number of flops of some typical operations is given as follows. Multiplication of an m× p matrix and a p × n matrix requires 8mpn flops. The number of flops needed to compute QRD of a real matrix of size m×n, m ≥ n with fast Given transformations is 2n2 (m−n/3) [26]. For a complex matrix of the same size, we approximate the number of flops involved in a QRD by 4n2 (3m − n), i.e. treating every operation as a complex multiplication. Similarly, the number of flops for the SVD of an m × n complex-valued matrix is approximated by 24mn2 + 48m2 n + 54m3 [24]. 1) SVD-based design: For the kth user, k ≥ 2, the number ¯ k ∈ C(k−1)¯n×N is of flops needed to compute the SVD of H 2 2 2 24(k − 1)¯ nN + 48(k − 1) n ¯ N + 54(k − 1)3 n ¯ 3 . Thus, the total number of flops of the SVD-based precoder design is ψSVD ≈

K X

{24(k − 1)¯ nN 2 + 48(k − 1)2 n ¯2N

(24) +54(k − 1)3 n ¯3} ≈ 41KN 3. 2) GQRD-based Design: The complexity of the generalized QRD-based design is equivalent to the number of flops needed to compute a QRD of H, which is given by k=2

ψGQRD = 4N 2 (3N − N ) = 8N 3 .

(25)

2 A flop is equal to a real floating point operation [26]. A real addition, multiplication, or division is counted as one flop. A complex addition and multiplication have two flops and six flops, respectively.

·106 2 SVD-based design NQRD-based design GQRD-based design 1.5 Number of flops

Algorithm 1 GQRD-based precoder design for SZF-DPC. 1: Compute the QR decomposition as H = LQ ¯1B ¯2 ··· B ¯ K ]H , where B ¯ k ∈ CN ×nk . 2: Partition Q = [B 3: Solve (23) using water-filling algorithm over effective ˜ k = HkB ¯ k = Lk . Denote the optimal channel matrices H solution by Dk,1 . ¯ k D k,1 . 4: Optimal precoder for user k is found as W k = B

1

0.5

0 4

6

8

10 12 14 16 18 20 22 24 26 28 30 32 Number of transmit antennas, N

Fig. 1. Complexity comparison of various precoder designs for SZF-DPC schemes, nk = n ¯ = 2.

The complexity of the precoder designs mentioned above is compared in Fig. 1, where we plot the number of flops versus the number of transmit antennas, N . In Fig. 1, the number of receive antennas is the same for all users, nk = n ¯ = 2, for all k, and the number of users is K = N/¯ n. As we can see, the NQRD-based design requires significantly lower complexity than the SVD-based method. Noticeably, the GQRD-based precoder design greatly reduces the complexity compared to other precoder designs.

IV. P RECODER D ESIGNS

FOR

SZF-DPC

WITH

PAPC S

While the precoder designs for SZF-DPC with a sum power constraint can be found easily using the water-filling algorithm [11], that with PAPCs has not been extensively studied. In practice, PAPCs are more relevant since each antenna has its own power amplifier [12], [27], [28]. The maximization for SZF-DPC under PAPCs is formulated as maximize Wk

PK

k=1

H γk log |I + H k W k W H k Hk |

H j W k = 0, ∀j < k, PK H k=1 [W k W k ]n,n ≤ Pn , ∀n = 1, . . . , N (26) where Pn is the power constraint for the nth transmit antenna. ¯ k ) as given in (8). Then, solving Let B k be a basis of N (H (26) amounts to solving the following optimization subject to

maximize Ωk 0

subject to

PK

k=1

PK

˜ k Ωk H ˜ H| γk log |I + H k

H k=1 [B k Ωk B k ]n,n rank(Ωk ) ≤ nk

≤ Pn , ∀n

(27)

n ¯ k ׯ nk ˜ k = H k Bk ∈ where Ωk , Dk D H , and H k ∈ C nk ׯ nk C .

6

We begin by omitting the rank constraints in (27), and consider the following relaxed problem PK ˜ k Ωk H ˜ H| maximize γk log |I + H k k=1 Ωk 0 (28) PK H subject to k=1 [B k Ωk B k ]n,n ≤ Pn .

Now problem (28) is a convex program, and thus can be solved by numerical optimization tools, for example SDPT3 [29]. Regarding the relaxation technique, an important question to ask is whether the optimal solution to the relaxation problem is also optimal to the original problem. Interestingly, we show that (27) and (28) are equivalent, and thus the optimal solution to (28) is also optimal to (27). The proof is an immediate consequence of the following lemma.

Lemma 1. The optimal solutions Ωk to (28) satisfy rank(Ωk ) ≤ Lk = min(nk , n ¯ k ). Proof: Please refer to Appendix A. Although problem (28) can be solved by a general purpose optimization package, developing a specialized algorithm, if possible, that exploits the problem structure is always of great interest. Herein, we present a customized interior-point algorithm to solve (28), using the barrier method. Before proceeding, it is worth pointing out that a two-stage iterative algorithm was proposed for the precoder design of block diagonalization scheme in [13] using a subgradient method, which can be applied to solve (28). More explicitly, consider the partial Lagrangian function of (28), which is given by L(Ωk , λ) =

K X

k=1



H

˜ k Ωk H ˜ | γk log |I + H k

N X

n=1

K X λn ( [B k Ωk B H k ]n,n − Pn ) (29) k=1

where λn is the dual variable associated with the power constraint on the nth antenna . Since strong duality holds for (28), its optimal solution can be found via the following convex-concave optimization problem minimize maximize L(Ωk , λ) = minimize g(λ) λ≥0

Ωk 0

PK

P H = [Ω1 ]i,i + K k=2 [B k Ωk B k ]i,i < Pi . There exists ǫ > 0 to be small enough such that [Ω1 ]i,i + PK H k=2 [B k Ωk B k ]i,i + ǫ < Pi . Replacing [Ω1 ]i,i by ([Ω1 ]i,i + ǫ) yields a larger objective value in (28), which contradicts the assumption that {Ωk } are optimal. This observation is computationally useful since the inequality constraints in (28) can be replaced by equality ones, which facilitates a Newtontype method. In other words, (28) is equivalent to H k=1 [B k Ωk B k ]i,i

A. Optimal Precoder Design

λ≥0

(30)

where g(λ) , max L(Ωk , λ) is the dual function of problem Ωk 0

(28). The two-stage iterative algorithm in [13] works as follows. For fixed λ, the set of covariance matrices {Ωk } that maximizes L(Ωk , λn ) can be obtained by the waterfilling algorithm. Next, for a set of given {Ωk }, the iterative algorithm updates λ to minimize the dual function g(λ) based on the subgradient method. Generally, however, the subgradient method converges slowly to the optimum. It is known that for a minimax optimization problem, the infeasible-start Newton’s method [30] that solves the maximization and the minimization at the same time has a faster convergence rate [28], [30], [31]. In the following, we propose a numerical algorithm to solve (28), which exhibits a better convergence behavior. First, we observe that the constraints in (28) are active at the optimum. As proof, suppose the ith constraint is inactive, i.e.

PK ˜ k Ωk H ˜ H| maximize γk log |I + H k Pk=1 K H subject to [B Ω B ] = P , ∀n k k n,n n k k=1 Ωk  0, ∀k.

(31)

The proposed algorithm is based on the barrier method [30] to solve (31). As a standard step, we define a modified objective function f (t, {Ωk }) = −(t

K X

k=1

˜ k Ωk H ˜ H| + γk log |I + H k

K X

k=1

log |Ωk |)

(32) where log |Ωk | is the logarithmic barrier function to account for the positive semidefinite constraint Ωk  0, and t > 0 is a parameter that controls the logarithmic barrier terms. For mathematical convenience, let (n)

Ak

n ¯ k ׯ nk = BH k diag(0, . . . , 0, 1, 0, . . . , 0)B k ∈ C | {z } | {z } n−1

(33)

N −n

and consider a standard equality constrained minimization problem minimize subject to

f ({Ωk }, t) PK (n) k=1 tr[Ak Ωk ] = Pn , ∀n.

(34)

The general idea of a barrier method is that, for a fixed t, we find the optimal solutions {Ωk (t)} to (34) (which is known as the centering step), and increase t until the algorithm converges. In this paper, we employ the infeasible start Newton method to find the optimal solutions to (34). The purpose of using the infeasible start Newton method is to simplify the initialization of {Ωk } that satisfy the equality constraints. We start with the optimal conditions (i.e. KKT conditions) for (34), which are given by ˜ H (I + H ˜ k Ωk H ˜ H )−1 H ˜k −tγk H k k N X (n) −Ω−1 µn Ak = 0, ∀k k +

(35)

n=1

K X

k=1

(n)

tr[Ak Ωk ] = Pn , ∀n (36)

where {µn } are the dual variables. In (35) we have used the ˜ k Ωk H ˜ H | with respect fact that the gradient of log |I + H k ˜ k Ωk H ˜H ˜H to Ωk is given by ∇Ωk log |I + H k | = H k (I + −1 ˜ ˜ k Ωk H ˜H H H k . The main effort in a Newton method is k ) to calculate the Newton step. To do this, we replace Ωk by Ωk +∆Ωk and µn by µn +∆µn in the KKT conditions, which

7

yields the KKT system for the Newton step ˜H ˜ ˜H ˜ ˜ H −1 H ˜k tγk H k (I + H k Ωk H k + H k ∆Ωk H k ) + (Ωk + ∆Ωk )−1 − K X

k=1

N X

(n)

(µn + ∆µn )Ak

n=1

(n)

tr[Ak ∆Ωk ] = Pn −

K X

k=1

= 0, ∀k

(n)

tr[Ak Ωk ], ∀n.

(37)

(38)

˜k = I + H ˜ k Ωk H ˜ H . Since Ω ˜ k is invertible, and Denote Ω k −1 −1 −1 −1 (A + B) ≈ A − A BA for small B, we can write (37) as −1 ˜H ˜ −1 ˜ −1 ˜ ˜ H ˜ −1 ˜ tγk H k (Ωk − Ωk H k ∆Ωk H k Ωk )H k + (Ωk −1 − Ω−1 k ∆Ωk Ωk ) −

N X

(n)

(µn + ∆µn )Ak

n=1

= 0, ∀k

(39)

Algorithm 2 The proposed numerical algorithm to solve (28) Initinalization: Ωk = In¯ k , µ = 0, t = t0 , and φ and tolerance ǫ > 0 1: repeat {Outer iteration} 2: repeat {Inner iteration (centering step)} 3: Compute the Newton step ∆Ωk and dual step ∆µ from (41) and (44), respectively 4: Backtracking line search: 5: s=1 6: while r({Ωk } + s{∆Ωk }, µ + s∆µ) > (1 − αs)r({Ωk }, µ) or ({Ωk } + s{∆Ωk }  0) do 7: s = βs 8: end while 9: Update primal and dual variables: Ωk = Ωk +s∆Ωk ; µ = µ + s∆µ 10: until r({Ωk }, µ) < ǫ 11: Increase t. t = φt 12: until t is sufficiently large to tolerate the duality gap.

or equivalently as We define residual norm at {Ωk } and µ, which is used in the backtracking line search, as

˙ k ∆Ωk H ˙ k Ωk + ∆Ωk = tγk Ωk H ˙ k Ωk tγk Ωk H + Ωk − H

N X

(n)

(µn + ∆µn )Ωk Ak Ωk , ∀k

(40)

n=1

−1

˜ k . One possibility to find {∆Ωk } ˜ H ˜ Ω ˙k = H where H k k and {∆µn } is to vectorize ∆Ωk as a vector ωk of length n ¯ k (¯ nk +1)/2 for each k,Ptransform (38) and (40) into a form of K a linear system of (N + k=1 n ¯ k (¯ nk +1)/2) variables, and use a generic method to solve the resulting linear system. However, the complexity of such a method is O(N 6 ). In this paper, we present a low-complexity method to find {∆Ωk } and {∆µn }, using block elimination [30, Section 10.4]. Specifically, we express ∆Ωk as (0)

∆Ωk = Ξk +

N X

(i)

∆µi Ξk .

(41)

i=1

Substituting (41) into (40) yields a system of (N +1) discretetime Sylvester equations ˙ (0) H ˙ k Ωk + Ξ(0) = tγk Ωk H ˙ k Ωk + tγk Ωk HΞ k k N X (n) Ωk − µn Ω k A k Ω k n=1

˙ (i) H ˙ k Ωk + Ξ(i) = −Ωk A(i) Ωk , i = 1, . . . , N. tγk Ωk HΞ k k k (42) Numerical methods to solve the discrete-time Sylvester equations in (42) with complexity O(¯ n3k ) can be found e.g. in [32]. We note that the solution to each discrete-time Sylvester equation in (42) is a Hermitian matrix, and thus ∆Ωk in (41) is Hermitian as well. To compute the Newton step for the dual variable µ, we plug ∆Ωk from (41) into (38), which results in a linear system Ψ∆µ = ψ (43) PK (i) (j) tr(A Ξ ) for i, j = 1, 2, . . . , N , where [Ψ]i,j = PK PK k=1 (i) k (0)k (i) and [ψ]i = − k=1 tr(Ak Ξk ) + Pi − k=1 tr(Ak Ωk ).

r({Ωk }, µ) =

K X

k=1

˜ H (I + H ˜ k Ωk H ˜ H )−1 H ˜k || − tγk H k k −

Ω−1 k

+

N X

n=1

(n)

µn Ak ||F + kvk2

(44)

PK (i) where vi = Pi − k=1 tr[Ak Ωk ] for i = 1, 2, . . . , N. The proposed algorithm to solve (28) is summarized in Algorithm 2. The proposed algorithm can be easily modified into a primal-dual interior-point method which often shows faster convergence rate. However, we skip the details for the sake of simplicity. Remark 2. Here, we provide a rough comparison of the computational cost of an iteration for the proposed algorithm and the two-stage iterative method. Note that, to update Ωk in each iteration of the two-stage iterative method, presented in [13], we need to compute the singular value decomposition (SVD) of an nk × n ¯ k matrix, and the inverse of an n ¯k × n ¯k matrix, of which the complexity is O(¯ n2k nk ) and O(¯ n3k ), respectively. As mentioned earlier, the complexity of solving (42) is reduced to O(¯ n3k ). That is to say, the complexity of each iteration of the two-stage iterative method is of the same order as that of Algorithm 2. Moreover, Algorithm 2 is actually a customized interior-point method using the barrier method, and thus it shows a superlinear convergence rate as we show in the numerical result section. As a result, for the proposed algorithm, less computation effort is required to obtain the optimal precoders. B. Suboptimal Designs In practice, it is always interesting to find suboptimal designs that can achieve a significant fraction of the optimal performance, but require lower complexity. In this subsection we propose two suboptimal precoder designs, and briefly discuss their computational complexity, compared to (28).

8

1) QRD-based PAPC Precoder Design: The first suboptimal design, which is referred to as QRD-based PAPC design, is derived from the GQRD-based design for the SPC. Specifically, let Qk ∈ CN ×nk be the result of applying the QRD to H as given in (19). By Theorem 1, it holds that H H H H k Qk QH k H k = H k B k B k H k . This equality suggests that precoders based on a linear combination of Qk are expected to work well. In this way, the transmit covariance matrices of the first suboptimal design are given by Sk =

Qk Ωk QH k

(45)

where Ωk ∈ Cnk ×nk is the solution to the following problem maximize Ωk

subject to

PK

H H k=1 γk log |I + H k Qk Ωk Qk H k | PK H k=1 [Qk Ωk Qk ]n,n ≤ Pn , ∀n.

(46)

We note that problem formulation in (46) is analogous to that in (31). Hence, Algorithm (2) is used to compute optimal Qk in (46). Recall that the dimension of each Ωk in (46) is nk (nk + 1)/2, which is smaller than n ¯ k (¯ nk + 1)/2 for each optimal Ωk in (28). Thus, solving (46) requires much lower complexity, especially when N is large, and nk is small. The simulation results show that the sum-rate gap between the optimal design in (28) and suboptimal design in (46) is negligible. 2) Rescaled SPC Precoder Design: The second suboptimal design is obtained by scaling each column of the optimal precoders for the SPC to meet the PAPCs. To be specific, let W ⋆k be the optimal precoder for the kth user with the SPC, which is presented in Section III. Then a suboptimal precoder √ for the kth user can be given by W k = W ⋆k diag( αk ), where αk = [αk,1 αk,2 · · · αk,nk ], and αk,i is a factor used to the scale the ith column of W ⋆k to satisfy the PAPCs. Mathematically, αk is found through solving the following problem maximize αk ≥0

subject to

PK

⋆ ⋆H H k=1 γk log |I + H k W k diag(αk )W k H k | PK ⋆ ⋆H k=1 [W k diag(αk )W k ]n,n ≤ Pn , ∀n.

(47) We can particularize Algorithm 2 to solve (47), and the required computational complexity is lower, compared to the first one, since the dimension of each αk is just nk . V. P RECODER D ESIGN

B LOCK D IAGONALIZATION PAPC S

FOR

WITH

As mentioned earlier, an optimal precoder design was proposed in [13] for BD under PAPCs, using the subgradient method. In this section, we illustrate how to modify Algorithm 2 to attain the optimal precoder design for BD with significantly faster convergence rate. Let T k ∈ CN ×Lk be the precoder of the kth user in the BD scheme, where Lk ≤ nk is the number of data streams that the BS can allocate to user k. Then the ZF constraints for BD impose that H i T k = 0 for all i 6= k. That is, the precoder of a user is designed to cancel the interference induced by all other users. The maximization

for BD with PAPCs is formulated as PK H H maximize k=1 γk log |I + H k T k T k H k | Sk

subject to

(48)

H j T k = 0, ∀j 6= k PK H k=1 [T k T k ]n,n ≤ Pn , ∀n

To remove the ZF constraints in (48), for user k, we define ¯ ′k as a matrix, comprising all other users’ channel matrices H [15] ¯ ′ = [H T H T · · · H T H T · · · H T ]T ∈ C( H k 1 2 k−1 k+1 K

P

i6=k

ni )×N

(49) P ′ Similarly, let Πk ∈ CN ׯnk , where n ¯ ′k = N − i6=k ni , be ¯ ′ ). Then we can write T k = Πk T ′ , T ′ ∈ a basis of N (H k k k n′k ×Lk C , and problem (48) is equivalent to PK ′ ′H H H maximize k=1 γk log |I + H k Πk T k T k Πk H k | Sk PK ′ ′H H subject to k=1 [Πk T k T k Πk ]n,n ≤ Pn , ∀n (50) We can further rewrite (50) as PK ˜H ˜ maximize k=1 γk log |I + H k Ωk H k | Ωk 0 PK (51) subject to [Π Ω Π ] ≤ P , ∀n k

k=1

k

k n,n

n

rank(Ωk ) ≤ min(n′k , Lk )

′ ′ ˜ k = H k Πk is where Ωk = T ′k T ′H ∈ Cnk ×nk , and H k the effective channel matrix for user k in the BD scheme. Similarly, the relaxed problem is equivalently given by PK ˜ ˜H maximize k=1 γk log |I + H k Ωk H k | Ωk 0 (52) PK subject to k=1 [Πk Ωk Πk ]n,n ≤ Pn , ∀n.

We have shown in Section IV that the relaxation when designing the optimal precoders for SZF-DPC with PAPCs is tight. The same result holds also for BD, as shown in the following lemma. Lemma 2. The optimal solutions to (52) satisfy rank(Ωk ) ≤ min(n′k , nk ) for all k = 1, 2, . . . K.

Proof: The proof of Lemma 2 adopts that of Lemma 1. Further details are given in Appendix B. Different from SZF-DPC, the inequality constraints in (52) are not necessarily active at the optimum. However, introducing slack variables, we can transform (52) into an easier-tohandle formulation, which is expressed as PK ˜ ˜H maximize k=1 γk log |I + H k Ωk H k | PK H subject to (53) k=1 [Πk Ωk Πk ]n,n + ξn = Pn , ∀n ξn ≥ 0 ∀n Ωk  0, ∀k. The proposed algorithm for BD follows the same steps as those for SZF-DPC, with some modifications in the centering step. First, the modified objective function is changed to K  X ˜ k Ωk H ˜ H| f ({Ωk }, v, t) = − t γk log |I + H k k=1

+

K X

k=1

log |Ωk | +

N X

n=1

log ξn



(54)

.

9

101

and the problem for the centering step is (55)

Similar to (35) and (36), the KKT conditions for (55) are expressed as −

˜ H (I tγk H k

+

K X

k=1

˜ k Ωk H ˜ H )−1 H ˜k H k N X (n) − Ω−1 + µn Ak k n=1

= 0, ∀k

−ξn−1 + µn = 0, ∀n

(56) (57)

[Πk Ωk ΠH k ]n,n + ξn = Pn , ∀n

(58)

˙ ˙ ˙ tγk Ωk H∆Ω k H k Ωk + ∆Ωk = tγk Ωk H k Ωk + Ωk −

(µn +

(n) ∆µn )Ωk Ak Ωk ,

n=1

∀k

(59)

k=1

The Newton step ∆Ωk is computed analogously by a system of (N + 1) discrete-time Sylvester equations as in (42). Plugging (60) into (61), and using (41), we can find the Newton step ∆µ for the dual variables as the solution to the following Ψ∆µ = ψ (62) PK (i) (j) 2 where [Ψ]i,j = k=1 tr(Ak Ξk ) − ξi δi,j for i, j = 1, 2, . . . , N , where δi,j denotes the Kronecker’s function, i.e., δi,j = 1 if i = j and δi,j = 0 otherwise, and PK (i) (0) (i) [ψ]i = Pi − k=1 tr(Ak Ξk + Ak Ωk ) + ξn2 µn − 2ξi . The Newton direction for the slack variables is computed using (60), namely as ∆ξn = −ξn2 (µ + ∆µn ) + ξn .

10−2 10−3 10−4 10−5 10−6

20

40

60

Fig. 2.

80

100

Convergence behavior, N = 6, nk = 2 for k = 1, 2, 3.

VI. N UMERICAL

ξn−2 ∆ξn + ∆µn = ξn−1 − µn , ∀n (60) K K X X (n) (n) tr[Ak ∆Ωk ] + ∆ξn = Pn − tr[Ak Ωk ] − ξn ∀n (61) k=1

10−1

Iteration

and the resulting KKT system to find the Newton direction is given by

N X

Proposed method Two-stage iterative method

100

Error sum rate

f ({Ωk }, ξ, t) PK k=1 [B k Ωk B k ]n,n + ξn = Pn , ∀n.

minimize subject to

(63)

Remark 3. We have treated the cases of sum-power and per-antenna power constraints separately and their specific properties are exploited to arrive at a computationally efficient algorithm for each case. However, it could also be practical to consider both types of power constraints simultaneously. For example, in addition to PAPCs due to the physical limitation of the power amplifiers, we can impose a SPC on transmitted data to meet, e.g., a regulatory body requirement for health factors or to reduce the overall interference situation [31]. We note that the proposed algorithms for SZF-DPC and BD schemes presented in the paper can be easily modified to handle SPC and PAPCs simultaneously. For such a case, all the constraints will not be binding, in general. However, we can introduce some slack variables to convert all constraints to be equality ones as done in (53). Then, the steps from (54) to (63) can be slightly changed to solve the new problem.

RESULTS

In this section, we provide numerical examples to demonstrate the results in this paper. In the first numerical experiment, we compare the convergence rate of Algorithm 2 and the two-stage iterative method in [13]. A MIMO BC with N = 6 transmit antennas, K = 3 users, each with 2 receive antennas is simulated. The tolerance (for each centering step) is set to be ǫ = 10−5 . The barrier method parameters t0 and φ are set to 50 and 1, respectively. The effects of the initial value of φ and t0 are discussed in [30]. The backtracking line search parameters in Algorithm 2 are α = 0.01, and β = 0.5. Fig. 2 plots the error in the sum rate versus the number of iterations for a random realization of channel matrices. For the first iterations, the two-stage iterative method converges faster than the proposed method to the optimal solution. However, the proposed method performs better when precoders approach the optimum. Simulation results obtained with other randomly generated channel matrices illustrate the same convergence behavior of the two methods. Recall that a subgradient need not be a descent direction. Thus, an iteration can even decrease the objective function. Moreover, the convergence rate of subgradient methods relies strongly on the problem size. For example, we observe that the twostage iterative method fails to converge within three thousand iterations when N = 16, and K = 8, while Algorithm 2 still converges to the optimal solution within tens of iterations. As a conclusion, the proposed numerical algorithm demonstrates better convergence rate than the two-stage iterative algorithm in [13]. It is worth mentioning again that, since the proposed algorithm and the two-stage iterative method are alternatives to each other to find the optimal solution of the resulting convex problem given in (28), they will converge to the same optimal objective value, i.e., the proposed algorithm and the two-stage iterative approach yield the same sum-rate performance. Thus, it is sufficient to provide the sum rate offered by the proposed algorithm in Figs. 3 and 4 to follow. In Fig. 3, we plot the average sum rate, i.e., γk = 1 for all k, of optimal and suboptimal precoder design methods

10

Sum-power constraint Per-antenna optimal design QRD-based PAPC precoder design Rescaled SPC Precoder Design

Average sum rate (b/s/Hz)

Average sum rate (b/s/Hz)

20

Sum-power constraint Per-antenna optimal design QRD-based PAPC precoder design Rescaled SPC Precoder Design

6

15

10

5

4

3

2 0

2

4

6

8

0

2

Total transmit power, P (dB) Fig. 3. Sum rate comparison of optimal and suboptimal designs for SZF-DPC schemes with dk = 1, N = 8, K = 4, nk = 2, ∀k = 1, 2, . . . , 4.

for SZF-DPC schemes as a function of P , the total transmit power. The resulting power constraint for each antenna (when considering the PAPCs) is Pn = P/N for all n. A quasistatic fading model is used in our simulation. The channel √ ˘ k where dk is a for user k is generated as H k = dk H ˘k given parameter to capture the path loss and entries of H follow zero mean and unit variance complex Gaussian random variables for each snapshot. Fig. 3 considers a scenario with dk = 1, N = 8, K = 4, and nk = 2 for k = 1, 2, . . . , 4. That is, we ignore the effect of path loss and simply consider small scale fading in Fig. 3. We can see that the optimal precoder designs with PAPCs yield a slightly lower sum rate than those with the corresponding SPC. The sum-rate gap between the optimal and suboptimal designs are small, and decreases as P increases. The suboptimal design based on QRD performs slightly better than the suboptimal design based on a heuristic manner. We can expect that these precoder designs give the same performance as P approaches to infinity, since the equal power allocation is proved to be optimal in the high SNR regime. In Fig. 4, we investigate the sum rate performance of the precoder designs for SZF-DPC where users have different path loss. The simulation scenario for Fig. P 4 is the same as that K for Fig. 3 but now we set dk = k 2 / l=1 l2 . Due to nonuniformly distributed location of the users, the gap between the optimal precoder designs with SPC and PAPCs become larger. VII. C ONCLUSIONS This paper addresses the precoder design of SZF-DPC for MIMO BCs. For the SPC, we propose a precoder design, which is shown to be optimal and has greatly lower complexity than the existing method using the SVD. For PAPCs, the precoder design for SZF-DPC is first formulated as a rankconstrained optimization problem. Then we consider a relaxed problem, which is obtained by dropping the rank constraints. Exploiting the special features of SZF-DPC, we prove that the

4

6

8

Total transmit power, P (dB) Fig. 4. Sum rate comparison P of optimal and suboptimal designs for SZF2 DPC schemes with dk = k 2 / K l=1 l , N = 8, K = 4, nk = 2, ∀k = 1, 2, . . . , 4.

relaxed and original problems are equivalent. More explicitly, we show that optimal solutions of the relaxed problem always satisfy the rank constraints. Next, we propose an efficient numerical method based on a barrier method to solve the relaxed problem. The proposed numerical method is shown to have a superior convergence behavior, compared with the twostage iterative method based on the dual subgradient method. In addition, we illustrate that that the proposed precoder design for SZF-DPC can be slightly modified to solve the problem of precoder design for BD. A PPENDIX A. Proof of Lemma 1 In this appendix, we prove that the rank of optimal solutions to (28) is less than or equal to Lk . The proof follows similar arguments as in [23], [33]. We begin by forming the Lagrangian function of (28), which is given by L(Ωk , λ) = −

N X

n=1

λn

K X

k=1

K X

˜ k Ωk H ˜ H| γk log |I + H k

(n) tr(Ωk Ak )

k=1

− Pn

!

+

K X

tr(Φk Ωk )

(64)

k=1

˜ k = H k B k ∈ Cnk ׯnk , A(n) is defined in (33), where H k {λn } are dual variables associated with the PAPCs, and Φk  0 is the dual variable for the positive semidefinite constraint. Denote P = diag(P1 , P2 , . . . , PN ), Λ = diag(λ1 , λ2 , . . . , λN ), and Λk = B H k ΛB k . We can then rewrite (64) as L(Ωk , λ) =

K  X k=1

H

˜ k Ωk H ˜ | γk log |I + H k

 − tr(Λk Ωk ) + tr(Φk Ωk ) + tr(ΛP ).

(65)

11

At the optimum, we have H

H

˜ (I + H ˜ k Ωk H ˜ )−1 H ˜ k − Λk + Φk = 0. γk H k k

(66)

Using the complementary slackness property Φk Ωk = 0, we obtain H

H

˜ (I + H ˜ k Ωk H ˜ )−1 H ˜ k Ωk = Λ k Ωk . γk H k k

(67)

We now show that the dual optimal variables of (28) are strictly positive, i.e., λn > 0 for all 1 ≤ n ≤ N . As proof, consider the dual objective of (28), which can be expressed as, g(λ, Φk ) = max L(Ωk , λ, Φk ). (68) By contradiction, suppose λi = 0 for some 1 ≤ i ≤ N . We construct a set of Ωk such that Ω1 = diag(0, . . . , 0, α, 0, . . . , 0), and Ωk = 0 for 2 ≤ k ≤ K. | {z } | {z } i−1

N −i

Then, the objective function in (65) becomes

˜ 1 ]i ||2 ) + tr(Φ1 Ω1 ) (69) L(Ωk , λ, Φk ) = γ1 log(1 + α||[H 2

˜ 1 ]i is the ith column of H ˜ 1 . We can see that the where [H objective function in (69) is unbounded above if α → ∞. Since we are only interested in the case where g(λ, Φk ) is finite, we conclude that λi > 0 for all 1 ≤ i ≤ N . Thus, Λ must be positive definite, and Λk is invertible. It follows from ˜ k ) = Lk , which completes the (67) that rank(Ωk ) ≤ rank(H proof.

B. Proof of Lemma 2 In this appendix, we slightly modify the proof of Lemma 1 to show that the optimal solutions to problem (52) satisfy rank(Ωk ) ≤ min(nk , n′k ). Let {λn } be dual variables associated with the PAPCs in (52). Following the same derivations from (64) to (67) as in Appendix A, we have ˜ H (I + H ˜ k Ωk H ˜ H )−1 H ˜ k Ωk = Λ k Ωk γk H k k

(70)

˜ k are defined in (49)-(52),Λ = where Πk and H diag(λ1 , λ2 , . . . , λN ), and Λk is now defined as Λk = ΠH k ΛΠk . Unlike SZF-DPC where we are able to prove that λn > 0 for all n, the dual variables associated with the PAPCs for BD are not necessarily positive. That is to say, not all the power constraints are necessarily tight at the optimum. However, Λk in (70) is still invertible. To see this, let θ be a vector lying in N (Λk ), i.e. Λk θ = 0. Due to the assumption of independence among H k ’s, it is guaranteed with probability H ˜H one that H k θ = H k Πk θ 6= 0. Let Ωk = αθθ . Then, lim L(Ωk , λ, Φk ) = ∞, i.e., the objective function in (68) α→∞ is unbounded above, which is not of the interest. For Λk to be invertible, the number of non-zero λn is not less than n′k , which is a special case of [13, Lemma 3.2]. R EFERENCES [1] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, “Successive zero-forcing DPC with sum power constraint: Low-complexity optimal precoders,” in Proc. IEEE ICC 2012, Jun. 2012, pp. 4857–4861. [2] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, “Successive zeroforcing DPC with per-antenna power constraint: Optimal and suboptimal designs,” in Proc. IEEE ICC 2012, Jun. 2012, pp. 3746–3751.

[3] E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun, vol. 10, pp. 585–598, Nov. 1999. [4] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Pers.Commun, vol. 6, pp. 311–335, Mar. 1998. [5] A. Goldsmith, S. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 684 – 702, Jun. 2003. [6] H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3936–3964, Sep. 2006. [7] N. Jindal, W. Rhee, S. Vishwanath, S. Jafar, and A. Goldsmith, “Sum power iterative water-filling for multi-antenna Gaussian broadcast channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1570–1580, Apr. 2005. [8] W. Yu, “Sum-capacity computation for the Gaussian vector broadcast channel via dual decomposition,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 754 –759, Feb. 2006. [9] M. Kobayashi and G. Caire, “An iterative water-filling algorithm for maximum weighted sum-rate of Gaussian MIMO-BC,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1640–1646, Aug. 2006. [10] G. Caire and S. Shamai, “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1691–1706, Jul. 2003. [11] A. Dabbagh and D. Love, “Precoding for multiple antenna Gaussian broadcast channels with successive zero-forcing,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3837–3850, Jul. 2007. [12] A. Wiesel, Y. Eldar, and S. Shamai, “Zero-forcing precoding and generalized inverses,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4409–4418, 2008. [13] R. Zhang, “Cooperative multi-cell block diagonalization with per-basestation power constraints,” IEEE J. Sel. Areas Commun., vol. 28, no. 9, pp. 1435–1445, Dec. 2010. [14] Z. quan Luo, W. kin Ma, A.-C. So, Y. Ye, and S. Zhang, “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Process. Mag., vol. 27, no. 3, pp. 20–34, May 2010. [15] Q. Spencer, A. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels,” IEEE Trans. Signal Process., vol. 52, no. 2, pp. 461–471, Feb. 2004. [16] L.-N. Tran, M. Juntti, and E.-K. Hong, “On the precoder design for block diagonalized MIMO broadcast channels,” IEEE Commun. Lett., vol. 16, no. 8, pp. 1165–1168, Aug. 2012. [17] L.-N. Tran and E.-K. Hong, “Multiuser diversity for successive zeroforcing dirty paper coding: Greedy scheduling algorithms and asymptotic performance analysis,” IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3411–3416, Jun. 2010. [18] J. Jiang, R. Buehrer, and W. Tranter, “Greedy scheduling performance for a zero-forcing dirty-paper coded system,” IEEE Trans. Commun., vol. 54, no. 5, pp. 789–793, May 2006. [19] Z. Tu and R. Blum, “Multiuser diversity for a dirty paper approach,” IEEE Commun. Lett., vol. 7, no. 8, pp. 370–372, Aug. 2003. [20] M. Maddah-Ali, M. Sadrabadi, and A. Khandani, “Broadcast in MIMO systems based on a generalized QR decomposition: Signaling and performance analysis,” IEEE Trans. Inf. Theory, vol. 54, no. 3, pp. 1124 –1138, Mar. 2008. [21] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, “Beamformer designs for zero-forcing dirty paper coding,” in Proc. International Conference on Wireless Communications and Signal Processing (WCSP), Nov. 2011, pp. 1–5, invited paper. [22] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, “On the optimality of beamformer design for zero-forcing DPC with QR decomposition,” in Proc. IEEE ICC 2012, Jun. 2012, pp. 2536–2541. [23] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, “Beamformer designs for MISO broadcast channels with zero-forcing dirty paper coding,” IEEE Trans. Wireless Commun., vol. 12, no. 3, pp. 1173–1185, Mar. 2013. [24] Z. Shen, R. Chen, J. Andrews, J. Heath, R.W., and B. Evans, “Low complexity user selection algorithms for multiuser MIMO systems with block diagonalization,” IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3658–3663, Sep. 2006. [25] X. Zhang and J. Lee, “Low complexity MIMO scheduling with channel decomposition using capacity upperbound,” IEEE Commun. Lett., vol. 56, no. 6, pp. 871–876, Jun. 2008. [26] G. H. Golub and C. F. V. Loan, Matrix Computations, 3rd ed. The John Hopkins Univ. Press, 1996.

12

[27] F. Boccardi and H. Huang, “Zero-forcing precoding for the MIMO broadcast channel under per-antenna power constraints,” in Proc. IEEE SPAWC 2006, 2006, pp. 1–5. [28] W. Yu and T. Lan, “Transmitter optimization for the multi-antenna downlink with per-antenna power constraints,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2646–2660, Jun. 2007. [29] K. C. Toh, M. J. Todd, and R. Tutuncu, “SDPT3— a Matlab software package for semidefinite programming,” Optimization Methods and Software, Nov. 1999. [30] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004. [31] H. Huh, H. Papadopoulos, and G. Caire, “Multiuser MISO transmitter optimization for intercell interference mitigation,” IEEE Trans. Signal Process., vol. 58, no. 8, pp. 4272–4285, Aug. 2010. [32] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed. Society for Industrial and Applied Mathematics, 2002. [33] M. Vu, “MISO capacity with per-antenna power constraint,” IEEE Trans. Commun., vol. 59, no. 5, pp. 1268–1274, May 2011.

Le-Nam Tran received the B.S. degree in Electrical Engineering from Ho Chi Minh National University of Technology, Vietnam in 2003, and M.S and PhD in Radio Engineering from Kyung Hee University, Republic of Korea, in 2006 and 2009, respectively. In 2009, he joined the Department of Electrical Engineering, Kyung Hee University, Republic of Korea, as a lecturer. From September 2010 to July 2011, he was a postdoc fellow at the Signal Processing Laboratory, ACCESS Linnaeus Centre, KTH Royal Institute of Technology, Sweden. Since August 2011, he has been with Centre for Wireless Communications and Department of Communications Engineering, University of Oulu, Finland. His current research interests include multiuser MIMO systems, energy efficient communications, and full duplex transmission. He received the Best Paper Award from IITA in August 2005.

Markku Juntti (S’93-M’98-SM’04) received his M.Sc. (Tech.) and Dr.Sc. (Tech.) degrees in Electrical Engineering from University of Oulu, Oulu, Finland in 1993 and 1997, respectively. Dr. Juntti was with University of Oulu in 1992–98. In academic year 1994–95 he was a Visiting Scholar at Rice University, Houston, Texas. In 1999–2000 he was a Senior Specialist with Nokia Networks. Dr. Juntti has been a professor of communications engineering at University of Oulu, Department of Communication Engineering and Centre for Wireless Communications (CWC) since 2000. His research interests include signal processing for wireless networks as well as communication and information theory. He is an author or co-author in some 200 papers published in international journals and conference records as well as in book WCDMA for UMTS published by Wiley. Dr. Juntti is also an Adjunct Professor at Department of Electrical and Computer Engineering, Rice University, Houston, Texas, USA. Dr. Juntti is an Editor of IEEE T RANSACTIONS ON C OMMUNICATIONS and was an Associate Editor for IEEE T RANSACTIONS ON V EHICULAR T ECHNOLOGY in 2002–2008. He was Secretary of IEEE Communication Society Finland Chapter in 1996–97 and the Chairman for years 2000–01. He has been Secretary of the Technical Program Committee (TPC) of the 2001 IEEE International Conference on Communications (ICC’01), and the CoChair of the Technical Program Committee of 2004 Nordic Radio Symposium and 2006 IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2006). He is the General Chair of 2011 IEEE Communication Theory Workshop (CTW 2011).

Mats Bengtsson (M’00-SM’06) received the M.S. degree in computer science from Linköping University, Linköping, Sweden, in 1991 and the Tech. Lic and Ph.D. degrees in electrical engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 1997 and 2000, respectively. From 1991 to 1995, he was with Ericsson Telecom AB Karlstad. He currently holds a position as Associate Professor at the Signal Processing Laboratory, School of Electrical Engineering, KTH. His research interests include statistical signal processing and its applications to communications, multi-antenna processing, cooperative communication, radio resource management, and propagation channel modelling. Dr. Bengtsson served as Associate Editor for the IEEE Transactions on Signal Processing 2007-2009 and was a member of the IEEE SPCOM Technical Committee 2007-2012.

Björn Ottersten (S’87-M’89-SM’99-F’04) was born in Stockholm, Sweden, 1961. He received the M.S. degree in electrical engineering and applied physics from Linköping University, Linköping, Sweden, in 1986. In 1989 he received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA. Dr. Ottersten has held research positions at the Department of Electrical Engineering, Linköping University, the Information Systems Laboratory, Stanford University, the Katholieke Universiteit Leuven, Leuven, and the University of Luxembourg. During 96/97 Dr. Ottersten was Director of Research at ArrayComm Inc, a start-up in San Jose, California based on Ottersten’s patented technology. He has co-authored journal papers that received the IEEE Signal Processing Society Best Paper Award in 1993, 2001, and 2006 and 3 IEEE conference papers receiving Best Paper Awards. In 1991 he was appointed Professor of Signal Processing at the Royal Institute of Technology (KTH), Stockholm. From 1992 to 2004 he was head of the department for Signals, Sensors, and Systems at KTH and from 2004 to 2008 he was dean of the School of Electrical Engineering at KTH. Currently, Dr. Ottersten is Director for the Interdisciplinary Centre for Security, Reliability and Trust at the University of Luxembourg. As Digital Champion of Luxembourg, he acts as an adviser to European Commissioner Neelie Kroes. Dr. Ottersten has served as Associate Editor for the IEEE Transactions on Signal Processing and on the editorial board of IEEE Signal Processing Magazine. He is currently editor in chief of EURASIP Signal Processing Journal and a member of the editorial boards of EURASIP Journal of Applied Signal Processing and Foundations and Trends in Signal Processing. Dr. Ottersten is a Fellow of the IEEE and EURASIP. In 2011 he received the IEEE Signal Processing Society Technical Achievement Award. He is a first recipient of the European Research Council advanced research grant. His research interests include security and trust, reliable wireless communications, and statistical signal processing.

Suggest Documents