
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 63, NO. 16, AUGUST 15, 2015


A Parallel Low Complexity Zero-Forcing Beamformer Design for Multiuser MIMO Systems Via a Regularized Dual Decomposition Method

Bin Li, Chang Zhi Wu, Hai Huyen Dam, Antonio Cantoni, Life Fellow, IEEE, and Kok Lay Teo, Senior Member, IEEE

Abstract—This paper considers zero-forcing beamforming under per-antenna power constraints (PAPC), with the objective of maximizing the minimum user information rate. A parallel low complexity zero-forcing beamformer design is proposed for MU-MIMO systems by introducing a regularized dual decomposition method. The idea of this method is to solve the problem by solving its dual problem. Since the dual objective is not differentiable, a Tikhonov regularization is introduced. The regularized problem can be solved by a gradient-based method in a parallel manner. Moreover, the optimal solution of the Lagrangian is in closed form. The smoothness properties of the regularized dual function are investigated. We also estimate the error bound between the optimal function value of the primal problem and that of the regularized dual problem. The convergence analysis and the convergence rate of the proposed algorithm are established. A computational complexity analysis compares the proposed method with the state-of-the-art interior point method. Simulation results are provided to show the effectiveness of the proposed method.

Index Terms—Zero-forcing beamforming (ZFBF), per-antenna power constraint (PAPC), MIMO, dual decomposition, parallel computation, Tikhonov regularization.

Manuscript received June 14, 2014; revised February 03, 2015 and April 01, 2015; accepted May 15, 2015. Date of publication May 26, 2015; date of current version July 02, 2015. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Animashree Anandkumar. This work was supported by a Discovery Grant from the Australian Research Council (No. DP120103859).
B. Li, H. H. Dam, and K. L. Teo are with the Department of Mathematics and Statistics, Curtin University, GPO Box U1987, Perth, WA 6845, Australia.
C. Z. Wu is with the Australasian Joint Research Centre for Building Information Modelling, School of Built Environment, Curtin University, Bentley, WA 6102, Australia.
A. Cantoni is with the School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Crawley, WA 6009, Australia.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2015.2437846

I. INTRODUCTION

TRANSMITTER design for multi-user multiple-input multiple-output (MU-MIMO) systems has been studied intensively in the literature (see, for example, [1]–[7]). Dirty paper coding (DPC) [1] is known as the capacity-achieving scheme. Due to its high computational complexity, however, it is difficult to implement in practice. Consequently, emphasis has shifted to finding suboptimal strategies (see, for example, [8], [9]). Zero-forcing beamforming (ZFBF) is one of the most commonly used linear pre-coding methods in the MU-MIMO broadcast channel since it provides a good trade-off between complexity and performance. It is applied to serve multiple users simultaneously in a group by exploiting the spatial separation between users (space-division multiple access (SDMA)). Each user stream in the group is coded independently and multiplied by a beamforming weight vector for transmission through multiple Base Station (BS) antennas. In particular, these weight vectors are chosen such that the mutual interference among the users is ‘zero-forced’ (eliminated).

A. Literature Review

Traditionally, the pseudo-inverse is adopted as the precoding strategy for ZFBF. It has been proved to be the optimal beamformer design under the total power constraint [9]. However, in real-world applications, each antenna of the transmitter has its own amplifier, and each amplifier has a power limit. Thus, it is more realistic to impose a power constraint on each antenna. In addition, linear operation of the amplifier is important for modern efficient modulation schemes, e.g., Orthogonal Frequency Division Multiplexing (OFDM). Furthermore, the pseudo-inverse can be numerically unstable and becomes ill-conditioned if its condition number is large. In this case, the performance of the beamformer is unreliable. Hence, the per-antenna power constraints (PAPC) (see, for example, [9]–[19]) are more relevant for real-world applications than the total power constraint. ZFBF under the PAPC is a nontrivial problem.
In [9], it is shown that ZF pre-coding is closely related to generalized inverses, and that finding the optimal generalized inverse under PAPC is a difficult optimization problem that depends on the performance criterion. Two specific performance criteria (i.e., fairness and throughput) have been investigated. For the fairness criterion, the problem is transformed into a convex second-order cone program. For the throughput criterion, the problem is transformed into a standard determinant maximization program subject to linear matrix inequalities, which is a convex optimization problem. Both transformed problems can be solved by using standard optimization packages. In [11], the pseudo-inverse is scaled by a scale factor to satisfy the PAPC. However, this approach does not lead to an optimal solution.

1053-587X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In this paper, we consider a zero-forcing beamformer design problem under PAPC. For fairness among the users, the objective of the problem is to maximize the worst user information rate (see, for example, [9], [10]). In fact, the method proposed in this paper can be extended to a more general case in which the objective function has a separable structure, which implies that both of the performance criteria in [9] can be covered.

Although ZFBF under PAPC with the fairness performance measure can be solved by applying the interior point method [12], with the iteration bound shown in [20], realizing this algorithm in hardware is expensive in practice. This is because the algorithm involves many complex computations, such as calculating a Hessian matrix and inverting a matrix in each iteration [21]. More specifically, its complexity in each iteration is given in [20]. Furthermore, for real-world applications, parallel algorithms are more attractive [22], especially when the problem size becomes larger [23], since they allow complex calculations to be completed within a constrained time interval using a collection of computational units working in parallel. Moreover, a complex problem can be solved more easily within a constrained time if the computational complexity of each unit is low. These considerations motivate the development of a parallel low complexity algorithm for this problem.

To develop a parallel algorithm for this problem, there are two challenges. The first challenge is to deal with the coupled constraints. For example, the PAPC and the other constraints are coupled together.
In addition, the linear inequality constraints are coupled constraints. Thus, the problem is inseparable and cannot be computed in parallel. We solve this problem by proposing a dual decomposition based method. The idea of the dual decomposition method is to solve the original problem by solving its dual problem. This is achieved by appending all the constraints to the objective function to form the Lagrangian. Since all the constraints are combined in the objective function, there are no coupled constraints in the dual problem. Hence, the dual problem is separable and can be solved in parallel. Moreover, the dual decomposition method is attractive because it is a first-order algorithm, and hence its computational overhead in each iteration is low. In addition, first-order algorithms are more robust against the presence of noise during the computations. The drawback of a first-order method is that its convergence rate is slow, and this is more significant as the search approaches the optimal solution.

The second challenge is that the dual problem is not differentiable. This is because the newly formed Lagrangian is not strictly convex, which implies that its dual function is not differentiable. In the literature, there are some methods to tackle this challenge, for example, the alternating direction method [24], the proximal point method [25] and the partial inverse method [26]. The main idea of these methods is to smooth the dual function by introducing a so-called augmented Lagrangian. A computational drawback of these schemes is that the resulting objective function is inseparable. To overcome this obstacle, we introduce a Tikhonov regularization to smooth the dual function by regularizing the Lagrangian into a strictly convex function. By doing so, the dual function becomes differentiable while the regularized Lagrangian remains separable. Furthermore, this method is robust and the optimal solution of the Lagrangian is in closed form.

In the literature, the dual decomposition method is widely used in distributed beamforming for multi-cellular communication systems [27]–[30]. For example, the dual decomposition method is utilized for solving a coordinated multi-cell multi-antenna minimum power beamformer design problem with single-antenna users in [27]. In [28], the dual decomposition method is applied to a transmission power minimization problem in a multi-cell network. In [29], a minimum power downlink beamforming problem for a multi-cell system is solved by using the dual decomposition method. In [30], a dual decomposition technique with regularization is proposed for cooperative beamforming in relay networks. That regularization is different from ours and results in an augmented Lagrangian decomposition method. The problems investigated in those works are different from the problem stated here, and so is the regularization being used. To the best of our knowledge, the regularized dual decomposition algorithm has not been proposed for ZFBF under PAPC.

B. Contributions

In this paper, a new regularized dual decomposition method is proposed. For the standard dual decomposition method, there are two loops of optimization [31], [32]. One loop minimizes the Lagrangian and the other loop maximizes the dual function. In addition, since the resulting dual function is not differentiable, a subgradient method has to be used to maximize the dual function.
This method is known to suffer from a slow convergence rate [31]. Different from the standard dual decomposition method, we introduce the Tikhonov regularization by exploiting the structure of our problem. There are several advantages. First, the optimal solution of minimizing the Lagrangian is in closed form, which implies that there is only one loop of optimization. Second, the resulting dual function is differentiable, and hence we can obtain gradients of the dual problem. Third, the problem remains separable.

By applying the proposed method, as stated above, a parallel low complexity design can be obtained. Both the primal variables and the dual variables can be updated in a parallel manner. More specifically, the dual variables are updated by means of a gradient method. On the other hand, the primal variables are updated directly since the optimal solution of the Lagrangian is in closed form. The smoothness properties of the regularized dual function are proven, and the obtained Lipschitz constants are used for the choice of the step sizes of the dual updates. We also estimate the error bound between the optimal function value of the original problem and that of the regularized problem. The convergence analysis is carried out and the convergence rate is provided. The computational complexity analysis is carried out to compare the complexity of the proposed method with that of the state-of-the-art interior point method.

C. Organization

The rest of this paper is organized as follows. The problem is formulated in Section II. In Section III, we present a regularized dual decomposition method. The smoothness properties and the regularization error bound are provided. The computational complexity analysis is carried out. We also establish the convergence result and give the convergence rate of the proposed method in this section. Numerical examples are given to analyze the performance of the proposed method in Section IV. Finally, we conclude this paper with some remarks in Section V.

II. PROBLEM FORMULATION

Consider the standard MISO multiuser broadcast channel

(1)

where is the received signal of the th user, is the channel vector of length of the th user, is the transmitted vector of length , and is the circularly symmetric complex Gaussian noise with mean 0 and variance . Throughout the paper, we assume that . (1) can be written in a compact form given below:

(2)

where denotes the transpose, and denotes the conjugate transpose. Here, the linear zero-forcing pre-coding transmitter is applied, i.e.,

(3)

(4)

where is the information vector of length such that , denotes the identity matrix of appropriate dimension, is an complex matrix, and denotes a real and positive diagonal matrix. The information rate for each user is denoted by and is given by

(5)

where is the signal-to-interference-plus-noise ratio (SINR) for each user, which is given by

(6)

From (4), we have . In this paper, we take the minimum user information rate, i.e., , as the performance measure. To limit the power on the amplifier of each antenna, the per-antenna power constraints are imposed as follows:

(7)

where is the th column vector of , is a vector of length with a 1 in the th element and 0 in the other elements, and is the maximum allowable power on each antenna. The problem under consideration may now be formally stated below:

Problem 2.1:

Let

where and are the corresponding real and imaginary parts of . Similarly, we can also define and . As is shown in [12], Problem 2.1 is equivalent to the following convex optimization problem, which is referred to as Problem 2.2.

Problem 2.2:

(8a)
(8b)
(8c)
(8d)

where is a vector of ones with appropriate dimension, means each element of is less than or equal to 0, is a matrix with th row being

(9)

where, for , is expressed as (10), and . For , is expressed as (11), and . is a diagonal matrix with th row being

and is an identity matrix and is a diagonal matrix with 1 appearing in the th th positions and 0 elsewhere.
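To make the zero-forcing model concrete, the following is a minimal numerical sketch of the simple baseline discussed in the literature review: a pseudo-inverse beamformer scaled to meet the per-antenna limit, as in [11]. It is not the method proposed in this paper. The channel values, the power limit `P`, the noise variance `SIGMA2`, the use of a base-2 logarithm for the rate, and all function names are assumptions chosen for illustration.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def hermitian(A):
    """Conjugate transpose of a complex matrix."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix (enough for the two-user toy case)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical channel: 2 users (rows) and 3 transmit antennas (columns).
H = [[1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j],
     [0.2 - 0.7j, 1.1 + 0.4j, 0.6 + 0.1j]]
P = 1.0        # assumed per-antenna power limit
SIGMA2 = 0.1   # assumed noise variance

# Right pseudo-inverse W = H^H (H H^H)^{-1}, so that H W is the identity.
Hh = hermitian(H)
W = matmul(Hh, inverse_2x2(matmul(H, Hh)))

# Power on antenna i is the squared norm of row i of W (unit-power symbols assumed).
antenna_power = [sum(abs(w) ** 2 for w in row) for row in W]

# Scale the whole beamformer so that the most loaded antenna meets the limit.
scale = math.sqrt(P / max(antenna_power))
W = [[scale * w for w in row] for row in W]
antenna_power = [sum(abs(w) ** 2 for w in row) for row in W]

# Effective channel: zero-forcing leaves only the diagonal gains.
G = matmul(H, W)
sinr = [abs(G[k][k]) ** 2 / SIGMA2 for k in range(2)]
rates = [math.log2(1 + s) for s in sinr]
min_rate = min(rates)
```

In this sketch only one antenna transmits at its limit after scaling, which is one way to see why the scaled pseudo-inverse is generally suboptimal under PAPC and why an optimization over the beamformer is worthwhile.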

III. DUAL DECOMPOSITION AND REGULARIZATION

In this section, we shall solve Problem 2.2 by solving its dual problem, in such a way that all the constraints are appended to the Lagrangian; hence the problem becomes separable and the primal variables and can be computed in parallel. To begin with, we introduce the following Lagrangian of Problem 2.2 as

(12)

The corresponding dual function is denoted as

(13)

Then, the dual problem of Problem 2.2 can be written as

Problem 3.1:

where means each element of is greater than or equal to 0. Problem 2.2 can be solved through solving its dual problem, i.e., Problem 3.1, since strong duality holds. This can be verified by the fact that Problem 2.2 is convex and the following Slater’s condition holds [21].

Lemma 3.1: There exists an and a for Problem 2.2 such that

where means each element of is less than 0.

Remark 3.1: Lemma 3.1 is true when there are strictly feasible solutions for Problem 2.2. Throughout this paper, we assume that there are strictly feasible solutions for Problem 2.2.

A. Regularization of the Lagrangian

Note that the Lagrangian (12) is not strictly convex and the dual function (13) is not differentiable. As stated, the augmented Lagrangian method, ADMM, or the proximal point method will introduce an inseparable term, so that the augmented Lagrangian is not separable. Moreover, these methods are also known to be sensitive to the choice of the weighting parameters associated with the terms appended to the Lagrangian. Here, we apply a Tikhonov regularization to the Lagrangian (12). By doing so, the dual function (13), once regularized, becomes differentiable and remains separable. To regularize (12), we introduce two prox-functions and , where is a smoothing parameter, and denotes the Euclidean norm. As will be shown in the next section, is the error bound of the regularization. By appending and to (12), we obtain the regularized Lagrangian as

(14)

Note that the optimal solution of can be written in closed form as

(15)

where . By considering (13), (14) and (15), we obtain the regularized dual function as follows

(16)

(10)

(11)
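The effect of the Tikhonov term can be illustrated on a toy problem that is unrelated to the beamformer itself: minimize c·x subject to Ax ≤ b. Without regularization the Lagrangian is linear in x, so its minimizer is not unique and the dual function is not smooth. Adding (delta/2)·||x||² gives a closed-form, coordinate-wise minimizer and a differentiable dual whose gradient is the constraint residual at that minimizer. The sketch below checks this gradient against a finite difference. The problem data, `delta`, and all function names are assumptions made for illustration.

```python
def minimizer(lam, c, A, delta):
    """Closed-form minimizer of c.x + lam.(Ax - b) + (delta/2)*||x||^2.

    Each coordinate depends only on its own column of A, so the
    coordinates can be computed independently of one another.
    """
    return [-(c[j] + sum(lam[i] * A[i][j] for i in range(len(A)))) / delta
            for j in range(len(c))]

def residual(x, A, b):
    """Constraint residual Ax - b."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def dual_value(lam, c, A, b, delta):
    """Regularized dual function: the regularized Lagrangian at its minimizer."""
    x = minimizer(lam, c, A, delta)
    value = sum(cj * xj for cj, xj in zip(c, x))
    value += sum(l * r for l, r in zip(lam, residual(x, A, b)))
    value += 0.5 * delta * sum(xj * xj for xj in x)
    return value

def dual_gradient(lam, c, A, b, delta):
    """Gradient of the regularized dual: the residual at the minimizer."""
    return residual(minimizer(lam, c, A, delta), A, b)

# Hypothetical toy data: two variables, two inequality constraints.
c = [-1.0, -1.0]
A = [[1.0, 1.0], [1.0, -1.0]]
b = [2.0, 1.0]
delta = 0.1
lam = [0.5, 0.2]

grad = dual_gradient(lam, c, A, b, delta)

# Central finite differences as an independent check of differentiability.
h = 1e-6
fd = []
for i in range(len(lam)):
    up = list(lam); up[i] += h
    dn = list(lam); dn[i] -= h
    fd.append((dual_value(up, c, A, b, delta) - dual_value(dn, c, A, b, delta)) / (2 * h))
```

Because the minimizer is available in closed form, evaluating the dual gradient requires no inner optimization loop, which is the property the regularized method relies on.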


The regularized dual problem with the objective function replaced by (16) is referred to as Problem . Note that Problem 3.1 corresponds to the following primal problem:

Problem 3.2:

It is obvious that and are separable and, for each , is a well-defined concave function, which is continuously differentiable at any . Furthermore, the regularized Lagrangian has the following properties.

Property 3.1: The gradients (17)–(19)

(17)

(18)

(19)

are Lipschitz continuous with the Lipschitz constants given by

(20)

(21)

(22)

where is an matrix with all the elements being 1, denotes the Frobenius norm, and, for .

Proof: See Appendix A.

B. Regularization Error

In this section, we shall investigate the error introduced by the regularization.

Theorem 3.1: Suppose the Slater conditions hold and let and be optimal solutions of Problem 3.2 and Problem 2.2, respectively. Then, for any ,

(23)

Proof: Under the Slater condition, both Problem 2.2 and Problem 3.2 have solutions. Since is an optimal solution of Problem 3.2, while Problem 3.2 is strictly convex, we have

(24)

where and . Replacing with and with in (24) gives

(25)

From (8c), we have

(26)

Note that , then it follows that

(27)

By adding all the rows of (8b), we have and all the components of in where

Since , then it follows that

(28)

By considering the definition of the Euclidean norm, we know that . Thus, from (27) and (28), we obtain

(29)

Combining (25), (27) and (29) and knowing that , we obtain the desired relation. This completes the proof.

Remark 3.2: (27) and (29) provide a ‘good’ bound for this problem. In fact, this bound is achieved by taking the conjugate of the channel as the weights of the beamformer under the total power constraint. This is known as the matched filter in signal processing, which provides the best signal-to-noise ratio. We can refer to this as an ‘idealized’ scenario. This ‘idealized’ bound may not be achieved in this problem, since the feasible set bounded by PAPC is a subset of that bounded by the total power constraint.

Theorem 3.1 implies that the error bound of the regularization is . From the definition of , we know that only depends on for a specified scenario, where and are fixed. Thus, we can give a rule for choosing according to Theorem 3.1. For this, we define , where is called the error bound parameter. In fact, is the ratio of the bound of the regularization error to the performance of the ‘idealized’ scenario according to (29) and Remark 3.2. In addition, the definition of shows that the error bound depends linearly on and on the square root of .

C. Parallel Algorithm

In this section, we will show how the regularized problem, i.e., Problem , can be solved in parallel. For convenience, we denote the dual variable as

(30)

where is the th element for and . Obviously, , for and . From (15), it follows from the structures of and that the primal variables and can be updated in parallel as shown in (31)–(32).

(31)

(32)

The dual variables and can be updated by a gradient method. This is expressed by (33).

(33)

Remark 3.3: For the choice of the step sizes, we adopt constant step sizes and choose and . The implementation of (31), (32) and (33) is shown in Fig. 1. In Fig. 1, the blocks with the same color can be computed in parallel. More specifically, the dual variables and can be updated in parallel, the primal variables and can be updated in parallel, the residuals of the constraints and can be updated in parallel, and and can be updated in parallel.

Fig. 1. Parallel Implementation.
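The parallel structure described above follows from separability: once the dual variables are fixed, each block of primal variables has its own closed-form update that does not depend on the other blocks. The sketch below illustrates this with a generic separable quadratic and Python's standard `concurrent.futures` thread pool. It is only an assumption-laden illustration of the structure, not the update (31)–(32) itself: the scalar blocks, the `prices` vector standing in for the dual-weighted constraint terms, and the function names are all hypothetical. Python threads will not speed up such small arithmetic; dedicated hardware units would play the role of the workers.

```python
from concurrent.futures import ThreadPoolExecutor

def update_block(task):
    """Closed-form minimizer of one block: c*x + price*x + (delta/2)*x**2."""
    c_j, price_j, delta = task
    return -(c_j + price_j) / delta

def primal_update_parallel(c, prices, delta, workers=4):
    """Update every block independently, one task per block."""
    tasks = [(c_j, p_j, delta) for c_j, p_j in zip(c, prices)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_block, tasks))

def primal_update_serial(c, prices, delta):
    """Reference implementation that visits the blocks one after another."""
    return [update_block((c_j, p_j, delta)) for c_j, p_j in zip(c, prices)]

# Hypothetical data: four scalar blocks with fixed dual "prices".
c = [-1.0, -2.0, 0.5, 3.0]
prices = [0.3, 0.1, -0.2, 0.0]
delta = 0.1

x_parallel = primal_update_parallel(c, prices, delta)
x_serial = primal_update_serial(c, prices, delta)
```

Since `pool.map` preserves input order, the parallel and serial results agree element by element, which is what allows the blocks in Fig. 1 with the same color to be assigned to separate computational units.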


TABLE I
COMPUTATIONAL COMPLEXITY COMPARISON BETWEEN INTERIOR POINT METHOD & ALGORITHM 1

Now we state the parallel algorithm as follows:

Algorithm 1

Initialize and , choose the step sizes and by estimating the Lipschitz constants and according to (20)–(22), and set and the tolerance . Set , which can also be achieved by choosing since .

For the th iteration:

Step 1: Compute and according to (31) and (32).

Step 2: Compute according to (33).

Step 3: Compute the residual error of the duality gap , where and is defined as in (16). The algorithm stops when .
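The three steps of Algorithm 1 can be sketched on a toy problem (minimize c·x subject to Ax ≤ b, with a Tikhonov term of weight `delta`). This is an illustration of the loop structure only, under stated assumptions: the toy data are hypothetical, the step size uses the squared Frobenius norm of A as a simple upper bound in place of the Lipschitz constants (20)–(22), and the feasibility check in the stopping rule is an addition of this sketch rather than part of the algorithm as stated.

```python
def primal_update(lam, c, A, delta):
    """Step 1: closed-form minimizer of the regularized Lagrangian."""
    return [-(c[j] + sum(lam[i] * A[i][j] for i in range(len(A)))) / delta
            for j in range(len(c))]

def residual(x, A, b):
    """Constraint residual Ax - b, which is also the dual gradient."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def solve_toy(c, A, b, delta, tol=1e-8, max_iter=1000):
    # Constant step 1/L, with L bounded by ||A||_F^2 / delta (an assumption
    # of this sketch; it is never larger than the exact 1/L step).
    step = delta / sum(a * a for row in A for a in row)
    lam = [0.0] * len(A)
    for k in range(1, max_iter + 1):
        # Step 1: primal variables from the current dual variables.
        x = primal_update(lam, c, A, delta)
        # Step 2: projected gradient step on the dual variables.
        g = residual(x, A, b)
        lam = [max(0.0, l + step * gi) for l, gi in zip(lam, g)]
        # Step 3: residual of the duality gap at the new dual point.
        x = primal_update(lam, c, A, delta)
        g = residual(x, A, b)
        primal = (sum(cj * xj for cj, xj in zip(c, x))
                  + 0.5 * delta * sum(xj * xj for xj in x))
        dual = primal + sum(l * gi for l, gi in zip(lam, g))
        if abs(primal - dual) <= tol and max(g) <= tol:
            break
    return x, lam, k

# Hypothetical toy data; the unregularized problem has many optimal points.
x, lam, iterations = solve_toy([-1.0, -1.0],
                               [[1.0, 1.0], [1.0, -1.0]],
                               [2.0, 1.0],
                               delta=0.1)
```

There is a single loop: the primal step is a formula, so only the dual variables are iterated, which matches the one-loop structure claimed for the regularized method.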

Remark 3.4: In Algorithm 1, the primal variables ( and ) can be updated without solving an optimization problem, and hence the convergence rate of the algorithm depends on the optimization of the dual variables. Since the dual variables are optimized by the gradient descent method, the convergence rate of Algorithm 1 is according to [33].

Remark 3.5: From the definition of , we know that . According to Theorem 3.1, we have

(34)

It follows from (34) that the solution obtained by Algorithm 1 converges to the optimal solution of Problem 2.2 as .

D. Computational Complexity Analysis

In this section, we give a computational complexity analysis of the proposed method and compare its complexity in each iteration with that of the state-of-the-art interior point method. For Algorithm 1, the main computation overhead in each iteration is the update of the primal variables and in (31) and (32) and of the dual variables in (33). The primal variable updates involve parallel streams. In each stream, the main computation is a multiplication between a matrix and a vector. There are some minor computations such as vector additions and multiplications between scalars and vectors. For the dual updates there are only 3 parallel streams. In each stream, the main computation is a multiplication between an matrix and a vector, a matrix and a vector, and a matrix and a vector. Overall, the complexity of Algorithm 1 in each iteration is .

On the other hand, the main computational load for the primal-dual interior point method in each iteration is the calculation of the Newton search direction [21]. The computational complexity per iteration for the primal-dual interior point method is . From these comparisons, we can see that the computation for Algorithm 1 is significantly lower than that for the primal-dual interior point method. For the convergence rate, we know that the number of iterations for solving Problem 2.1 by Algorithm 1 is , while it is by applying the interior point method [20]. So there is a trade-off between the complexity per iteration and the convergence rate. More specifically, the interior point method gains convergence rate at the expense of a much higher computational complexity per iteration, resulting in a more expensive hardware implementation. More significantly, it cannot be implemented in a parallel manner. Hence, the proposed algorithm is much more attractive from the practical point of view. The details of the iteration complexity and convergence rate for both algorithms are listed in Table I. For further studies, the focus should be on an algorithm with a faster convergence rate that maintains the same level of computational complexity per iteration as the proposed algorithm. Ultimately, the question of whether the trade-off is favorable requires designing the hardware to implement the algorithm.

IV. NUMERICAL RESULTS

In this section, we shall apply Algorithm 1 to Problem 2.1 and then analyze its performance. The base station array considered in the numerical studies is a uniform planar circular array. It consists of isotropic elements and the inter-element spacing is equal to . Set and dB. Here, SNR denotes the signal-to-noise ratio corresponding to a single user channel from one BS array element to the user receiver. The computer used is a Dell desktop with an Intel i5-2500 CPU at 3.30 GHz and 8 GB of RAM. The simulation is implemented in the Matlab environment. We implement Algorithm 1 within a channel that was tested by the Commonwealth Scientific and Industrial Research Organization (CSIRO) in rural Australia [34]. is set as , and is set as 70%.

We study the performance with different values of the regularization parameter . The results are shown in Fig. 2 as a function of the number of transmit antennas for . As expected, we can see that the performance converges to the optimal performance, which is denoted as PAPC in Fig. 2, as . The performance is virtually identical for less than 0.3. With the same setting for SNR and , we compare the complexity per iteration between the interior point method and Algorithm 1. The results are shown in Fig. 3 as a function of the number of transmit antennas with different numbers of users . From Fig. 3, we can see that the computational complexity per iteration of the proposed method is much lower than that of the interior point method. Note that the logarithm of the complexity is plotted in the figure.

Fig. 2. ZFBF under PAPC as a function of with different and dB.

Fig. 3. Computational Complexity Comparison.

In Fig. 4, we plot the performance achieved with different values of the error bound parameter as a function of SNR. As expected, the performance converges to the optimum as decreases, and the performance is already very close to the optimal performance when . Recalling Theorem 3.1 and the definition of , we know that, in this case, the bound of the regularization error is 30% of the ‘idealized’ scenario (matched filter). We cannot tighten the error bound further since the ‘idealized’ scenario only serves as a reference and could not be achieved, as explained in Remark 3.2.

Fig. 4. ZFBF under PAPC as a function of SNR with different .

To investigate the convergence behavior of the proposed algorithm, we set dB and plot the residual error of the duality gap as a function of the number of iterations with different and in Fig. 5. As illustrated, the algorithm converges faster when is larger. This can be explained by the fact that the convexity of the problem is stronger with a larger , and hence the algorithm converges faster. As expected, the algorithm converges faster with a smaller . Set , and we also plot the information rate with different in Fig. 6. As illustrated, when is set as or , there is a certain performance loss. The performance is almost the same as the optimal performance when it decreases to .

Fig. 5. Convergence Behavior. (a) . (b) . (c) . (d) .

The computational times (in seconds) for the interior point method in the Optimization Toolbox within the Matlab environment and for the proposed method are listed in Table II. is set as . As can be seen in Table II, the per-iteration CPU time of the proposed method is significantly less than that of the interior point method. This implies that in a real system the overall implementation time of our method would be even smaller than that of the interior point method, although the total number of iterations of the proposed method is larger. The computational operations in our method are much simpler than those in the interior point method, and consequently simpler hardware realizations are possible with significantly faster execution time. Thus, even though the number of iterations required for convergence is large for the proposed method, it is feasible to achieve a shorter real time for convergence with it. Furthermore, if some performance loss is acceptable, can be set larger so that the number of iterations is dramatically decreased, as shown in Fig. 5.

Fig. 6. ZFBF under PAPC as a function of with different .

TABLE II
CPU TIME COMPARISON

V. CONCLUSION

A low complexity parallel beamformer design, which is based on a regularized dual decomposition method, is proposed. By applying this method, both the primal variables and the dual variables can be updated in parallel. A computational complexity of per iteration is achieved, while the computational complexity per iteration of the state-of-the-art interior point method is . The regularized dual function is shown to be Lipschitz continuous, and the corresponding Lipschitz constants can be used for the choice of the step sizes of the dual updates. The regularization bound is provided. The proposed algorithm is proved to be convergent with a convergence rate of . For future studies, the focus should be on an algorithm with a faster convergence rate and the same level of iteration complexity.

APPENDIX A
PROOF OF PROPERTY 3.1

Proof: The gradient formulas (17), (18) and (19) can be obtained readily by taking the derivatives of with respect to and and then using the min-max theory. We now move on to the Lipschitz continuity and find the corresponding Lipschitz constant for the gradient of each constraint. Towards this goal, we first consider (17) and choose and . From (17), we obtain (35).

(35)

Clearly, the first term in (35) is bounded due to the fact that and is bounded. Let

Then, it follows that (17) is Lipschitz continuous and the corresponding Lipschitz constant is as given by (20). The Lipschitz continuity and the corresponding Lipschitz constant of (18) can be obtained similarly.


For (19), choose and . From (19) and the definition of , and , it follows that

(36)

where is a diagonal matrix with the th element being

Then, we have (37),

(37)

where is an diagonal matrix with the th element being and the inequality follows from the fact that . Denote

Then, the Lipschitz continuity of (19) can be obtained by letting

and is the corresponding Lipschitz constant. This completes the proof.

ACKNOWLEDGMENT

The first author would like to thank Dr. Gang Wang for the helpful discussions. The authors would like to thank the anonymous reviewers for their helpful and constructive comments.

REFERENCES

[1] G. Caire and S. Shamai (Shitz), “On the achievable throughput of multiantenna Gaussian broadcast channel,” IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1691–1706, June 2003.
[2] M. Joham, W. Utschick, and J. A. Nossek, “Linear transmit processing in MIMO communications system,” IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2700–2712, Aug. 2005.
[3] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication—Part I: Channel inversion and regularization,” IEEE Trans. Commun., vol. 53, no. 3, pp. 195–202, Mar. 2005.
[4] H. Sung, S. R. Lee, and I. Lee, “Generalized channel inversion methods for multiuser MIMO systems,” IEEE Trans. Commun., vol. 57, no. 11, pp. 3489–3499, Nov. 2009.

4189

[5] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO chanels,” IEEE Trans. Signal Process., vol. 52, no. 2, pp. 461–471, Feb. 2004. [6] L. U. Choi and R. D. Murch, “A transmit preprocessing technique for multiuser MIMO systems using a decomposition approach,” IEEE Trans. Wireless Commun., vol. 3, no. 1, pp. 20–24, Jan. 2004. [7] P. S. Udupa and J. S. Lehnert, “Optimizing zero-forcing precoders for MIMO broadcast systems,” IEEE Trans. Commun., vol. 55, no. 8, pp. 1516–1524, Aug. 2007. [8] A. Wiesel, Y. C. Eldar, and S. Shamai (Shitz), “Linear precoding via conic optimizaiton for fixed MIMO receivers,” IEEE Trans. Signal Process., vol. 54, no. 1, pp. 161–176, Jan. 2006. [9] A. Wiesel, Y. C. Eldar, and S. Shamai (Shitz), “Zero-forcing precoding and generalized inverses,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4409–4418, Sep. 2008. [10] K. Karakayali, R. Yates, G. Foschini, and R. Valenzuela, “Optimal zero-forcing beamforming with per-antenna power constraints,” in Proc. IEEE Int. Symp. Circuits Syst. Inf. Theory., 2007, pp. 101–105. [11] S. R. Lee, J. S. Kim, S. H. Moon, H. B. Kong, and I. Lee, “Zeroforcing beamforming in multiuser MISO downlink systems under perantenna power constraint and equal-rate metric,” IEEE Trans. Wireless Commun., vol. 12, no. 1, pp. 20–24, Jan. 2013. [12] B. Li, H. H. Dam, A. Cantoni, and K. L. Teo, “A primal-dual interior point method for optimal zero-forcing beamformer design under perantenna power constraints,” Optim. Lett., vol. 8, no. 6, pp. 1829–1843, 2014. [13] H. H. Dam and A. Cantoni, “Interior point method for optimum zeroforcing beamforming with per-antenna power constraints and optimal step size,” Signal Process., vol. 106, pp. 10–14, 2015. [14] B. Li, H. H. Dam, K. L. Teo, and A. Cantoni, “A low complexity optimization algorithm for zero-forcing precoding under per-antenna power constraints,” presented at the 40th IEEE Int. Conf. 
Acoust., Speech, Signal Process. (ICASSP), Brisbane, Australia, Apr. 19–24, 2015. [15] B. Li, H. H. Dam, A. Cantoni, and K. L. Teo, “A first-order optimal zero-forcing beamformer design for multiuser MIMO systems via a regularized dual accelerated gradient method,” IEEE Commun. Lett., vol. 19, no. 2, pp. 195–198, Feb. 2015. [16] H. H. Dam, A. Cantoni, and B. Li, “A fast low complexity method for optimal zero-forcing beamformer MU-MIMO system,” IEEE Signal Process. Lett., vol. 22, no. 9, pp. 1443–1447, 2015. [17] B. Li, H. H. Dam, K. L. Teo, and A. Cantoni, “A survey on zeroforcing beamformer design under per-antenna power constraints for multiuser MIMO systems,” presented at the IEEE Int. Conf. Digit. Signal Process. (DSP), Singapore, Jul. 21–24, 2015. [18] B. Li, H. H. Dam, A. Cantoni, and K. L. Teo, “Some interesting properties for zero-forcing beamforming under per-antenna power constraints in rural areas,” J. Global Optim., DOI: 10.1007/s10898-014-0237-4. [19] B. Li, H. H. Dam, A. Cantoni, and K. L. Teo, “A global optimal zeroforcing beamformer design with Signed Power-of-Two coefficients,” J. Ind. Manag. Optim., vol. 12, no. 2, pp. 595–607, Apr. 2016. [20] S. J. Benson, Y. Ye, and X. Zhang, “Solving large-scale sparese semidefinite programs for combinational optimization,” SIAM J. Optim., vol. 10, no. 2, pp. 443–461, 2000. [21] S. Boyd and L. Vandenberghe, Covex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004. [22] H. Pham and X. Lu, “The inverse parallel machine scheduling problem with minimum total completion time,” J. Ind. Manag. Optim., vol. 10, no. 2, pp. 613–620, 2014. [23] T. Hirai, H. Masuyama, S. Kasahara, and Y. Takahashi, “Performance analysis of large-scale parallel-distributed processing with backup tasks for cloud computing,” J. Ind. Manag. Optim., vol. 10, no. 1, pp. 113–129, 2014. [24] S. Kontogiorgis, R. D. Leone, and R. Meyer, “Alternating direction splitings for block angular parallel optimization,” J. Optim. Theory Appl., vol. 
90, no. 1, pp. 1–29, 1996. [25] G. Chen and M. Teboulle, “A proximal-based decomposition method for convex minimization problems,” Math. Programm. (A), vol. 64, pp. 81–101, 1994. [26] J. E. Spingarn, “Applications of the method of partial inverses to convex programming: Decomposition,” Math. Programm. (A), vol. 32, pp. 199–223, 1985. [27] A. Tolli, H. Pennanen, and P. Komulainen, “Distributed coordinated multi-cell transmission based on dual decomposition,” in Proc. IEEE GLOBECOM, Nov. 2009, pp. 1–6. [28] S. Shen and T. M. Lok, “Asynchronous distributed downlink beamforming and power control in multi-cell networks,” IEEE Trans. Wireless Commun., vol. 13, no. 7, pp. 3892–3902, Jul. 2014.

4190


[29] J. Choi, “On the decomposition method for distributed downlink beamforming in multi-cell systems,” in Proc. IEEE 80th Veh. Technol. Conf. (VTC Fall), Sep. 14–17, 2014, pp. 1–5. [30] N. Chatzipanagiotis, A. Petropulu, and M. M. Zavlanos, “A distributed algorithm for cooperative relay beamforming,” in Proc. Amer. Control Conf. (ACC), Jun. 17–19, 2013, pp. 3796–3801. [31] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Belmont, MA, USA: Athena Scientific, 1997. [32] D. P. Palomar and M. Chiang, Member, “A tutorial on decomposition methods for network utility maximization,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1439–1451, Aug. 2006. [33] L. Vandenberghe, “Optimization methods for large-scale systems,” in Lecture Notes, UCLA, Spring 2013–2014 [Online]. Available: http:// www.seas.ucla.edu/~vandenbe/ee236c.html [34] H. Suzuki, D. Robertson, N. L. Ratnayake, and K. Ziri-Castro, “Prediction and measurement of multiuser MIMO-OFDM channel in rural Australia,” in Proc. IEEE 75th Veh. Technol. Conf., 2012, pp. 1–5.

Bin Li received the B.Sc. degree in automation and the M.A.Sc. degree in control science and engineering from Harbin Institute of Technology, China, in 2005 and 2008, respectively, and the Ph.D. degree in mathematics and statistics from Curtin University, Australia, in 2011. From 2012 to 2014, he was a Research Associate with the School of Electrical, Electronic and Computer Engineering, the University of Western Australia, Australia. Since 2014, he has been a Research Fellow with the Department of Mathematics and Statistics, Curtin University, Australia. His research interests include signal processing, optimization, and optimal control.

Chang Zhi Wu received the Ph.D. degree from Zhongshan University, China, in 2006. He joined Chongqing Normal University as a Lecturer in 2006 and was promoted to Professor in 2009. In 2013, he joined the Australasian Joint Research Centre for Building Information Modelling at Curtin University as a Senior Research Fellow. His main interests include both theoretical and practical aspects of optimization and optimal control and their applications in signal processing, civil engineering, and construction management.

Hai Huyen Dam received the Bachelor degree (first class Honours) and Ph.D. degree (with distinction) from Curtin University of Technology, Perth, Australia, in 1996 and 2001, respectively. From 1999 to 2000, she spent one year at the Blekinge Institute of Technology, Sweden. From 2001 to 2006, she was a Research Fellow/Senior Research Fellow with the Western Australian Telecommunications Research Institute (WATRI). Since 2007, she has been a Senior Lecturer with the Department of Mathematics and Statistics, Curtin University, Australia. Her research interests are signal processing, adaptive array processing, optimization, equalization, and filter design.

Antonio Cantoni (M’74–SM’83–F’98) was born in Soliera, Italy, on October 30, 1946. He received the B.E. (first-class Hons.) and Ph.D. degrees from The University of Western Australia, Nedlands, W.A., Australia, in 1968 and 1972, respectively. In 1972, he was a Lecturer in Computer Science at the Australian National University, Canberra, A.C.T., Australia. In 1973, he joined the Department of Electrical and Electronic Engineering, University of Newcastle, Shortland, N.S.W., Australia, where he held the Chair of Computer Engineering until 1986. In 1987, he joined QPSX Communications Ltd., Perth, W.A., Australia, as the Director of the Digital and Computer Systems Design Section for the development of the distributed-queue dual-bus Metropolitan Area Network. From 1987 to 1990, he was also a Visiting Professor in the Department of Electrical and Electronic Engineering, The University of Western Australia, where he is currently the Winthrop Professor in the Department of Electrical, Electronic and Computer Engineering. He also holds a fractional position as a Senior Research Scientist in the Information, Communication and Technology Centre, Commonwealth Scientific and Industrial Research Organisation, Epping, N.S.W., Australia. From 1992 to 1997, he was the Director of the Western Australian Telecommunications Research Institute and a Professor of Telecommunications at Curtin University of Technology, Bentley, W.A., Australia, and also the Director of the Cooperative Research Centre for Broad-Band Telecommunications and Networking. From 1997 to 2000, he was the Chief Technology Officer of Atmosphere Networks, an optical networks startup that he cofounded. From 2000 to 2009, he was the Research Director of the Western Australian Telecommunications Research Institute. His research interests include adaptive signal processing, wireless communications, electronic system design, phase-locked loops, and networking. Dr. Cantoni has been an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING. He is a Life Fellow of the Australian Academy of Technological Sciences and Engineering.

Kok Lay Teo (M’74–SM’87) received the B.Sc. degree in telecommunications engineering from Ngee Ann Technical College, Singapore, and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Ottawa, Canada. He was with the Department of Applied Mathematics, University of New South Wales, Australia, the Department of Industrial and Systems Engineering, National University of Singapore, Singapore, and the Department of Mathematics, the University of Western Australia, Australia. In 1996, he joined the Department of Mathematics and Statistics, Curtin University of Technology, Australia, as Professor. He then took up the position of Chair Professor of Applied Mathematics and Head of the Department of Applied Mathematics at the Hong Kong Polytechnic University, China, from 1999 to 2004. He is currently John Curtin Distinguished Professor at Curtin University. He has published 5 books and over 350 journal papers. He has a software package, MISER3.3, for solving general constrained optimal control problems. His editorial positions include serving as Editor-in-Chief of the Journal of Industrial and Management Optimization, and Numerical Algebra, Control and Optimization, and as a member of the editorial boards of Automatica, Journal of Global Optimization, Journal of Optimization Theory and Applications, Optimization and Engineering, Discrete and Continuous Dynamic Systems, Optimization Letters, Differential Equations and Dynamical Systems, and Applied Mathematical Modeling. His research interests include both the theoretical and practical aspects of optimal control and optimization, and their practical applications, such as signal processing in telecommunications and financial portfolio optimization.
