
2015 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 17–20, 2015, BOSTON, USA

AN ONLINE LEARNING APPROACH TO THROUGHPUT OPTIMIZATION IN WIRELESS NETWORKS UNDER DYNAMIC AND UNKNOWN INTERFERENCE CONDITIONS

Ramesh Annavajjala, Rami S. Mangoubi, Christopher C. Yu and James M. Zagami
The Charles Stark Draper Laboratory
555 Technology Square, Cambridge, MA, 02139

ABSTRACT

In this paper, we consider a multi-user communication system with dynamically varying interference on block fading channels. We focus on a multi-antenna receiver, single-antenna transmitters, and the case in which the receiver has no knowledge of the channel state information, the interference dynamics, or the variance of the additive noise. Pilot-assisted transmission techniques are employed to enable channel estimation at the receiver. For a given channel coherence length, increasing the number of pilots improves the estimation accuracy at the cost of a reduction in data throughput. Thus, we propose to optimize the pilot content within the data frame to maximize the average data throughput. We employ well-known cross-validation techniques from the machine learning literature to simultaneously improve the estimation accuracy as well as the average throughput. Simulation results with the proposed approach suggest that even when the average number of active interferers is larger than the number of degrees of freedom, at least 85% of the ideal throughput can be achieved with the optimum pilot overhead.

Index Terms: multiple-access communication, dynamic interference, optimum combining, diagonal loading, cross-validation

1. INTRODUCTION

1.1. Background

The recent decade has witnessed near-exponential growth in wireless data usage [1], and we are marching towards a fifth generation wireless technology to provide data rates of the order of multi-gigabits [2]. With network densification via small cells, and due to spatial-division multiple access (SDMA), interference in wireless networks has become one of the major bottlenecks to providing reliable communication.
A large body of research has focused on characterizing the effects of interference on the system performance [3]. A well-known approach to signal detection in the presence of interference is the use of the optimum combining receiver [4]. The optimum combining receiver is essentially a linear minimum-mean-square error (L-MMSE) receiver that maximizes the signal-to-interference-plus-noise ratio (SINR) at the output of the combiner. This is to be contrasted against the maximal ratio combining (MRC) receiver, which maximizes the signal-to-noise ratio (SNR) and is optimal when there is no interference. While MRC requires knowledge of the instantaneous channel state information (CSI) and the noise variance, the L-MMSE receiver requires additional information about the instantaneous interference covariance matrix. Also, in a dynamic interference environment, the number of active interferers can be random, in which case neither the MRC nor the L-MMSE receiver is optimal. To enable practical implementation of these algorithms, typically known (or pilot) symbols are inserted within the data frame so that the receiver can estimate the CSI of the desired and interfering users. Although pilot symbols can improve the estimation accuracy, they lead to a reduction in the data throughput.

978-1-4673-7454-5/15/$31.00 ©2015 IEEE

Traditionally, interference mitigation algorithms in the literature assume a fixed number of interferers, and focus on detection performance with either known or estimated CSI [5]. On the other hand, using random set theory and approximate Bayesian recursions, optimal joint detection of multiple users is formulated in [6, 7]. Since sample matrix inversion (SMI) requires estimating the sample covariance matrix from at least as many samples as the number of receive antennas, the impact of diagonal loading on the SMI-based L-MMSE receiver performance is studied in [8]. Also, the robustness of the Capon beamformer is studied in [9], wherein the authors employ the Lagrange multiplier methodology to precisely compute the diagonal loading based on the ellipsoidal uncertainty set of the array steering vector. While many works have addressed computation of the optimal diagonal loading, these approaches assume either known channel statistics or a deterministic number of interferers [10, 11].

Recently, there has been growing interest in the application of machine learning (ML) [12] techniques to wireless communications and networking [13]-[15], as problems in ML and communication share many similarities. For example, regression in ML is closely tied to continuous-valued parameter estimation in communication, whereas classification in ML bears similarity with detection of finite-dimensional signal constellations. In particular, for cognitive wireless networks, using support vector machines, the authors in [16] address the channel and modulation selection problem, whereas the radio-frequency channel characterization problem is studied in [17].
Using unsupervised learning, [18] studies the robust signal classification problem.

1.2. Problem Statement

In this paper, we address the problem of interference mitigation when the interference is dynamically varying and when the receiver has no knowledge of the CSI and the noise variance. We focus on a block fading channel model with single-antenna transmitters and multiple antennas at the receiver, and employ pilot-assisted transmission techniques to enable channel estimation at the receiver. Because increasing the number of pilots improves the channel estimation accuracy but the resulting overhead reduces the data throughput, we propose to optimize the pilot content within the data frame to maximize the average data throughput. We employ well-known cross-validation (CV) techniques from the machine learning literature to simultaneously improve the estimation accuracy as well as the average throughput. For practical channel coherence lengths, and for binary modulations, our simulation results suggest that significant throughput improvement can be achieved with minimal pilot overhead even when the total number of users is larger than the number of receive antennas.

The paper is organized as follows. In Section 2, we introduce the system model that captures dynamic interference conditions. The problem formulation is described in Section 3. A cross-validation approach to parameter estimation and signal detection is detailed in Section 4, and simulation results are presented in Section 5. We conclude this work in Section 6.

2. SYSTEM MODEL

Notation: Lower-case bold-faced variables denote column vectors (e.g., $\mathbf{x}$), whereas upper-case bold-faced variables denote matrices (e.g., $\mathbf{A}$). The identity matrix of size $N \times N$ is denoted by $\mathbf{I}_N$. The transpose (or Hermitian) of a vector or a matrix is denoted by $(\cdot)^\top$ (or $(\cdot)^\dagger$). A complex (or real) Gaussian random vector (cgRV or rgRV) $\mathbf{x}$ with mean $\mathbf{m}$ and covariance matrix $\mathbf{C}$ is denoted by $\mathbf{x} \sim \mathcal{CN}(\mathbf{m}, \mathbf{C})$ (or $\mathbf{x} \sim \mathcal{N}(\mathbf{m}, \mathbf{C})$). The expectation operator is denoted by $\mathbb{E}[\cdot]$. The size of (or the number of elements in) a set $\mathcal{S}$ is denoted by $|\mathcal{S}|$. For a scalar/vector/matrix $\square$, $\Re\{\square\}$ denotes the corresponding real part.

We consider a communication link with a desired transmitter and its receiver, which is affected by a number of interfering transmitters. The maximum number of interferers is denoted by $K_{\max}$. In this work, we focus on the case when each transmitting node is equipped with a single transmit antenna. The desired receiver is assumed to have $N_R$ receive antennas. The air interface is such that a channel use is defined as communication on a specific frequency tone during a symbol period. This model, for example, corresponds to an OFDMA (orthogonal frequency-division multiple access) air interface. We consider a block fading channel model wherein the channel remains constant within a block of $N$ channel uses, and varies slowly across the blocks.
We denote the block index by $b$, and the index of the channel use within a block by $n$. Assuming perfect symbol synchronization at the receiver, the $N_R \times 1$-dimensional signal vector at the receiver can be written as

$$
\begin{aligned}
\mathbf{y}(b,n) &= \mathbf{h}_0(b)\,\alpha_0(b)\,x_0(b,n) + \sum_{k=1}^{K_{\max}} \mathbf{h}_k(b)\,\alpha_k(b)\,x_k(b,n) + \mathbf{v}(b,n) \\
&= \mathbf{H}(b)\,\boldsymbol{\alpha}(b)\,\mathbf{x}(b,n) + \mathbf{v}(b,n),
\end{aligned}
\tag{1}
$$

where $x_0(b,n)$ is the desired user's signal, $\mathbf{h}_0(b)$ is the $N_R \times 1$-dimensional channel from the desired user, $x_k(b,n)$ is the signal from the $k$-th interferer, $\mathbf{h}_k(b)$ is the $N_R \times 1$-dimensional channel from the $k$-th interferer, and $\mathbf{v}(b,n)$ is the additive noise with mean $\mathbf{0}$ and spatial covariance matrix $\mathbf{R}$. We also have $\mathbf{H}(b) = [\mathbf{h}_0(b), \ldots, \mathbf{h}_{K_{\max}}(b)]$ the global channel matrix, $\boldsymbol{\alpha}(b) = \mathrm{diag}\left([\alpha_0(b), \ldots, \alpha_{K_{\max}}(b)]^\top\right)$ the diagonal matrix of the user activation factors, and $\mathbf{x}(b,n) = [x_0(b,n), \ldots, x_{K_{\max}}(b,n)]^\top$ the vector-valued symbols of all the users. For simplicity, we assume $\mathbf{v}(b,n)$ to be independent and identically distributed (i.i.d.) complex-Gaussian across channel uses.

The coefficients $\alpha_k(b)$ represent the activity factors for user $k$, $k = 0, \ldots, K_{\max}$. A simple model for $\alpha_k(b)$ is an i.i.d. Bernoulli distribution. That is, $\alpha_k(b) = 1$ with probability $p_k$, and is 0 with probability $1 - p_k$. In this work, we set $p_0 = 1$ and $p_k = p$, $k = 1, \ldots, K_{\max}$. That is, the desired user is always present in the received signal model of (1). Note that when $\alpha_k(b) = 1$, $\forall k = 0, \ldots, K_{\max}$, detection of $x_0(b,n)$ using a linear receiver requires $N_R \geq K_{\max} + 1$. It is important to realize that determining the set of active interferers from the received signal model in (1) is closely related to the model order determination problem [19]. With knowledge of the sample correlation matrix of the received signal and the underlying noise variance, this problem is well-studied in [20]-[24].

We note that a block of $N$ symbols, in practice, is generally partitioned into $N_P$ pilot symbols and $N_D$ data symbols. The pilot symbols enable the receiver to estimate the channel parameters, whereas the information is carried within the data symbols. At the receiver, a more realistic assumption is that the pilot symbols from the desired transmitter are known whereas they are unknown from the interfering transmitters. Without loss of generality, the first $N_P$ positions of the block are assumed to contain pilots. As a result, for $n = 1, \ldots, N_P$, we set $x_0(b,n) = 1$ and $x_k(b,n) = \pm 1$, with equal probability, for $k = 1, \ldots, K_{\max}$. The remaining $N_D$ of the $N$ symbols contain the modulation data that must be detected at the desired receiver. Note that each transmitter can employ a modulation format that is different from the other transmitters. For simplicity, we assume a common signal constellation that has $M$ modulation symbols.

The channels $\mathbf{h}_k(b)$, $k = 0, \ldots, K_{\max}$, can have a variety of distributions that strongly depend on the propagation environment. For simplicity, we assume that $\mathbf{h}_k(b) \sim \mathcal{CN}(\mathbf{0}, \Omega_k \mathbf{I}_{N_R})$. This model assumes a rich scattering environment, and the channel gains are spatially uncorrelated. This is a valid assumption for widely spaced antenna elements. The variable $\Omega_k$ captures the distance-dependent average channel power from user $k$. We also assume spatial independence of fading channels across the users.
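As an illustration, one coherence block of the received signal model (1) can be simulated with the following Python sketch. It assumes NumPy, binary ($\pm 1$) symbols for all users, and the Bernoulli activity and Rayleigh fading assumptions stated above. The function name `simulate_block`, its arguments, and their default values are hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_block(n_r=4, k_max=5, p=0.5, n=1000, n_p=50,
                   omega=None, sigma2=1.0, rng=None):
    """Draw one coherence block of model (1) (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    # Average channel powers Omega_k; unit power for every user is an assumption.
    omega = np.ones(k_max + 1) if omega is None else np.asarray(omega, dtype=float)
    # Block-fading channels h_k ~ CN(0, Omega_k I), one column per user.
    h = rng.standard_normal((n_r, k_max + 1)) + 1j * rng.standard_normal((n_r, k_max + 1))
    h *= np.sqrt(omega / 2.0)
    # Activity factors: desired user always on, interferers on with probability p.
    alpha = np.concatenate(([1.0], p > rng.random(k_max))).astype(float)
    # Binary symbols; the first n_p symbols of the desired user are known pilots (+1).
    x = rng.choice([-1.0, 1.0], size=(k_max + 1, n))
    x[0, :n_p] = 1.0
    # White complex Gaussian noise with covariance sigma2 * I.
    v = np.sqrt(sigma2 / 2.0) * (rng.standard_normal((n_r, n))
                                 + 1j * rng.standard_normal((n_r, n)))
    y = (h * alpha) @ x + v  # H(b) diag(alpha) x(b, n) + v(b, n)
    return y, h, alpha, x
```

Calling `simulate_block(rng=np.random.default_rng(0))` returns the $N_R \times N$ received matrix together with the channels, activity factors and transmitted symbols, so a detector can be scored against the true data.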
3. OPTIMUM ONLINE LEARNING

Our goal is to devise algorithms for channel parameter estimation and signal detection in dynamic interference conditions. From an implementation standpoint, we constrain the receiver to use linear detection algorithms (such as zero-forcing or L-MMSE receiver approaches). Note that we have a basic assumption that $K_{\max} \leq N_R - 1$ for the feasibility of linear receivers when $p = 1$. However, the average number of interferers is $K_{\max}\, p$, which can be significantly less than $N_R$, depending upon the value of $p$, and a linear receiver with a fixed number of receive antennas can potentially withstand a group of interferers that is larger than $N_R - 1$.

We note that the number of symbols within a coherence block, $N$, is a function of the channel selectivity in time and frequency. Roughly, $N$ is proportional to the product of the channel coherence lengths in time and frequency. Also, $N$ should be higher than $N_R$ to make estimation of the channel covariance matrix at the receiver feasible using a portion of the pilot symbols. We also assume flexibility in our choice of $N_P$ and $N_D$ such that $N = N_P + N_D$. Note that choosing a higher $N_P$ provides good channel estimation accuracy, but at the cost of information rate. In this work, the noise covariance matrix is set to $\mathbf{R} = \sigma_n^2 \mathbf{I}_{N_R}$, and, in addition to the channel gains, the receiver does not have knowledge of $\sigma_n^2$.

With linear processing constraints at the receiver, let us denote by $\mathbf{w}_b$ the weight vector employed within block $b$ to detect the symbols $x_0(b,n)$, $n = N_P + 1, \ldots, N$. The detected symbol $\hat{x}_0(b,n)$ is simply

$$
\hat{x}_0(b,n) = \mathrm{slicer}\left\{\mathbf{w}_b^\dagger \mathbf{y}(b,n), \mathcal{S}\right\}, \quad n = N_P + 1, \ldots, N,
\tag{2}
$$

where $\mathcal{S}$ is the signal constellation employed by the desired user, and $\mathrm{slicer}\{z, \mathcal{S}\} = \arg\min_{x \in \mathcal{S}} |z - x|^2$ is the inverse mapping of the complex-valued signal $z$ to produce the nearest modulation symbol within $\mathcal{S}$. Note that since the channel gains and the noise variance are unknown, pilot symbols are used to estimate these parameters, which in turn are used to form the weight vector $\mathbf{w}_b$. The fraction of symbols that are correctly detected is termed the normalized throughput, and is given by

$$
\mathcal{T}_b = \frac{\sum_{n=N_P+1}^{N} \mathbb{1}_{\{\hat{x}_0(b,n) \equiv x_0(b,n)\}}}{N},
\tag{3}
$$

where $\mathbb{1}_A$ is the indicator function that evaluates to 1 when the event $A$ is true, and is 0 when $A$ is false. Our goal is, for a block length $N$, to optimally allocate the pilots and data to maximize the average normalized throughput, $\mathbb{E}[\mathcal{T}_b]$. More formally, the optimization problem is:

$$
\begin{aligned}
[N_{P,\mathrm{opt}}, N_{D,\mathrm{opt}}] &= \underset{N_P, N_D}{\arg\max}\; \mathbb{E}[\mathcal{T}_b] \quad \text{subject to } N_P + N_D = N \\
&= \underset{N_P, N_D}{\arg\max}\; \frac{N_D}{N}\left(1 - \bar{P}_s\right) \quad \text{subject to } N_P + N_D = N,
\end{aligned}
\tag{4}
$$

where $\bar{P}_s$ is the average symbol error probability. Since the constraints are integer valued, and $\bar{P}_s$ is analytically intractable, it is rather hard to solve (4) analytically. Further, with pilot-based channel estimation, $\bar{P}_s$ itself is a function of $N_P$. To proceed further, we choose a set of pilot/data partitions, $\mathbf{n}^{(i)} = [N_P^{(i)}, N_D^{(i)}]^\top$, such that $N_P^{(i)} + N_D^{(i)} = N$. For each partition $i$, we employ cross-validation principles from the ML literature for robust weight vector computation, and record the throughput achieved, $\mathbb{E}[\mathcal{T}_b]^{(i)}$. The optimal partition, $i^*$, is simply $i^* = \arg\max_i \mathbb{E}[\mathcal{T}_b]^{(i)}$. The main advantage of this approach is that the search complexity is fully controlled by the number of partitions, and we only need to search around the small-to-moderate pilot sizes. Since many parameters in the model (1) are unknown, we expect that cross-validation approaches provide best-in-class estimation as well as detection performances for both in-sample and out-of-sample data.

4. CROSS VALIDATION APPROACH

4.1. Ideal Performance

Before we embark on cross-validation approaches to the throughput optimization problem in (4), we first look at the best possible performance under ideal channel knowledge. This ideal performance also serves as an upper bound on what is achievable by any learning algorithm. With ideal channel knowledge, we drop the index of the coherence block $b$. Within a coherence block, we denote by $\mathcal{I} = \{i_1, \ldots, i_K\}$ the set of active interferers. The instantaneous interference channel matrix can then be denoted by $\mathbf{H}_{\mathcal{I}}$, which is given by

$$
\mathbf{H}_{\mathcal{I}} = [\mathbf{h}_{i_1}, \ldots, \mathbf{h}_{i_K}].
\tag{5}
$$

Having knowledge of $\mathbf{h}_0$, $\mathbf{H}_{\mathcal{I}}$ and the noise variance $\sigma_n^2$, the linear MMSE weight vector at the receiver is

$$
\mathbf{w}_{\mathrm{eff}} = \frac{\mathbf{R}_{\mathrm{ideal}}^{-1}\mathbf{h}_0}{\mathbf{h}_0^\dagger \mathbf{R}_{\mathrm{ideal}}^{-1}\mathbf{h}_0},
\tag{6}
$$

where $\mathbf{R}_{\mathrm{ideal}}$ is the ideal channel covariance matrix, which is given by

$$
\mathbf{R}_{\mathrm{ideal}} = \mathbf{h}_0\mathbf{h}_0^\dagger + \sum_{k \in \mathcal{I}} \mathbf{h}_k\mathbf{h}_k^\dagger + \sigma_n^2 \mathbf{I}_{N_R},
\tag{7}
$$

and the denominator in (6) ensures that $\mathbf{w}_{\mathrm{eff}}^\dagger \mathbf{h}_0 = 1$ so that the desired user's signal, upon the application of $\mathbf{w}_{\mathrm{eff}}$, has no channel-specific scaling. The equalized symbols for the desired user are given by

$$
\hat{x}_0(n) = \mathrm{slicer}\left\{\mathbf{w}_{\mathrm{eff}}^\dagger \mathbf{y}(b,n), \mathcal{S}\right\}, \quad n = N_P + 1, \ldots, N.
\tag{8}
$$

The probability of error in detecting $x_0(n)$ when all the users employ a binary constellation is given by

$$
\bar{P}_s = \mathbb{E}\left[\mathcal{Q}\left(\sqrt{2}\,\frac{\Re\left\{\mathbf{w}_{\mathrm{eff}}^\dagger \mathbf{h}_0 + \sum_{k \in \mathcal{I}} \mathbf{w}_{\mathrm{eff}}^\dagger \mathbf{h}_k x_k\right\}}{\sigma_n \|\mathbf{w}_{\mathrm{eff}}\|}\right)\right]
\tag{9}
$$
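The ideal receiver of (6) and (7), and the argument of the expectation in (9) for one realization of the channels and interferer symbols, can be checked numerically with a short Python sketch. NumPy is assumed, and the function names `ideal_weights`, `q_func` and `conditional_error_prob` are hypothetical. Here `q_func` evaluates the standard Gaussian tail probability through the complementary error function.

```python
import math
import numpy as np

def ideal_weights(h0, h_int, sigma2):
    """L-MMSE weights of (6) using the ideal covariance matrix of (7).

    h0: desired channel, shape (N_R,); h_int: active interferer channels, shape (N_R, K).
    """
    n_r = h0.shape[0]
    r_ideal = (np.outer(h0, h0.conj())
               + h_int @ h_int.conj().T
               + sigma2 * np.eye(n_r))
    u = np.linalg.solve(r_ideal, h0)   # R_ideal^{-1} h0 without forming the inverse
    return u / (h0.conj() @ u)         # normalization so that w^dagger h0 = 1

def q_func(x):
    """Tail probability of a standard Gaussian random variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def conditional_error_prob(w, h0, h_int, x_int, sigma2):
    """Term inside the expectation of (9) for fixed channels and interferer symbols."""
    z = np.vdot(w, h0) + np.vdot(w, h_int @ x_int)  # np.vdot conjugates its first argument
    return q_func(math.sqrt(2.0) * z.real / (math.sqrt(sigma2) * np.linalg.norm(w)))
```

Averaging `conditional_error_prob` over many independent draws of the channels and of the interferer symbols would give a Monte Carlo estimate of the expectation in (9).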

In (9), $\mathcal{Q}(x)$ is the complementary cumulative distribution function of a standard Gaussian random variable, and the expectation is over $\mathbf{h}_0$ and, for $k \in \mathcal{I}$, $\{\mathbf{h}_{i_k}, x_k\}$. The modulation symbols of the $k$th active interferer are $x_k = \pm 1$ with equal probability. We also note that when there is no interference (i.e., $K = 0$), the optimal detection rule is maximal ratio combining (MRC) with the weights $\mathbf{w} = \mathbf{h}_0/\|\mathbf{h}_0\|^2$, and the error probability takes a form different from (9) as

$$
\bar{P}_{s,\mathrm{MRC},K=0} = \mathbb{E}\left[\mathcal{Q}\left(\sqrt{2}\,\frac{\|\mathbf{h}_0\|}{\sigma_n}\right)\right],
\tag{10}
$$

where the expectation in (10) is over the channel $\mathbf{h}_0$. Since (9) and (10) are not functions of $N_P$, it follows that the optimal throughput is achieved by setting $N_P = 0$ and $N_D = N$. That is, as one would expect, with genie-aided channel information, all the symbols within a block are used for data transmission.

4.2. Channel Estimation and Beamforming via Cross-Validation

We now describe a procedure that performs channel estimation, signal detection, and optimization of training and data phases to maximize the normalized throughput. We first divide the pilot portion of the frame into training and validation phases. We denote by $\delta$ the ratio between the number of symbols for training and the number of pilot symbols. With this, $N_{P,t} = \delta N_P$ is the number of pilot symbols available for training and $N_{P,v} = (1 - \delta) N_P$ is the number of pilot symbols available for validation. The set of pilot indices $\mathcal{I}_P$ is partitioned into $\mathcal{I}_t$ and $\mathcal{I}_v$ such that $\mathcal{I}_t$ contains the pilot indices for training, whereas $\mathcal{I}_v$ contains the indices for testing. Using $\mathcal{I}_t$, a sample-mean based channel estimate is

$$
\hat{\mathbf{h}}_0(b) = \frac{1}{|\mathcal{I}_t|} \sum_{n \in \mathcal{I}_t} \mathbf{y}(b,n) = \mathbf{h}_0(b) + \frac{1}{|\mathcal{I}_t|} \sum_{n \in \mathcal{I}_t} \sum_{k \in \mathcal{I}} \mathbf{h}_{i_k}(b)\, x_{i_k}(b,n) + \mathbf{v}_t(b),
\tag{11}
$$

where the second term in (11) is the inter-user (or multiple-access) interference, and $\mathbf{v}_t(b) \sim \mathcal{CN}\left(\mathbf{0}, \frac{\sigma_n^2}{|\mathcal{I}_t|} \mathbf{I}_{N_R}\right)$ is the channel estimation error (in the absence of any interference). An estimate of the overall covariance matrix using $\mathcal{I}_t$ is

$$
\hat{\mathbf{R}}_{\mathrm{total}}(b) = \frac{1}{|\mathcal{I}_t|} \sum_{n \in \mathcal{I}_t} \mathbf{y}(b,n)\, \mathbf{y}^\dagger(b,n),
\tag{12}
$$

and an estimate of the noise variance is given by

$$
\hat{\sigma}_n^2 = \frac{1}{|\mathcal{I}_t|\, N_R} \sum_{n \in \mathcal{I}_t} \left\|\mathbf{y}(b,n) - \hat{\mathbf{h}}_0(b)\right\|^2.
\tag{13}
$$

Note that the estimate (13) is biased, and this bias can be corrected relatively easily only when there is no interference. We propose the following weight vector to detect the desired user's modulation symbols:

$$
\mathbf{w}_\lambda(b) = \frac{\hat{\mathbf{R}}_{\mathrm{est},\lambda}^{-1}(b)\, \hat{\mathbf{h}}_0(b)}{\hat{\mathbf{h}}_0^\dagger(b)\, \hat{\mathbf{R}}_{\mathrm{est},\lambda}^{-1}(b)\, \hat{\mathbf{h}}_0(b)},
\tag{14}
$$

where

$$
\hat{\mathbf{R}}_{\mathrm{est},\lambda}(b) = \hat{\mathbf{R}}_{\mathrm{total}}(b) + \lambda\, \frac{\mathrm{trace}\left(\hat{\mathbf{R}}_{\mathrm{total}}(b)\right)}{N_R}\, \mathbf{I}_{N_R}
\tag{15}
$$

is an estimated covariance matrix of the received signal augmented with a diagonal load that is parameterized by $\lambda$. We note that (14) provides a robust beamformer in the presence of unknown noise variance and dynamic interference, and, unlike [8], [9], [10], and [11], we determine the optimal $\lambda$ solely based on the received data and known pilot symbols without regard to the statistics of interference and noise. The detected symbols using (14) are simply

$$
\hat{x}_0(n) = \mathrm{slicer}\left\{\mathbf{w}_\lambda^\dagger(b)\, \mathbf{y}(b,n), \mathcal{S}\right\}, \quad n = N_P + 1, \ldots, N.
\tag{16}
$$

Using the fact that $x_0(b,n) = 1$ for $n \in \mathcal{I}_v$, an optimal $\lambda$ can be obtained by minimizing the sample MSE between the estimated and true pilot symbols, or by minimizing the sample error rate between the detected and true pilot symbols. That is,

$$
\lambda_{\star,\mathrm{MSE}} = \underset{\lambda \in \boldsymbol{\Lambda}}{\arg\min}\; \frac{1}{|\mathcal{I}_v|} \sum_{n \in \mathcal{I}_v} \left|1 - \mathbf{w}_\lambda^\dagger(b)\, \mathbf{y}(b,n)\right|^2
\tag{17}
$$

is the optimal $\lambda$ that minimizes the sample MSE in the testing set, and

$$
\lambda_{\star,\mathrm{BER}} = \underset{\lambda \in \boldsymbol{\Lambda}}{\arg\min}\; \frac{1}{|\mathcal{I}_v|} \sum_{n \in \mathcal{I}_v} \mathbb{1}_{\left\{\mathrm{sign}\left\{\Re\left\{\mathbf{w}_\lambda^\dagger(b)\, \mathbf{y}(b,n)\right\}\right\} \neq 1\right\}}
\tag{18}
$$

is the optimal $\lambda$ that minimizes the sample BER in the testing set. Note that in (17) and (18) $\boldsymbol{\Lambda}$ is a set of $\lambda$s that the receiver must search over, and the overall detection complexity grows linearly with $|\boldsymbol{\Lambda}|$. Once the optimal $\lambda$ is found, the receiver employs all the pilot symbols to estimate the channel, the overall covariance matrix, and the noise variance. The resulting weight vector $\mathbf{w}_{\lambda_\star}(b)$ is used to detect all the data symbols within the frame. We refer to the optimal beamformer based on (17) as the MSE-CV-BF, and to the one based on (18) as the BER-CV-BF. The conventional beamformer without CV is termed C-BF. It is obtained by using all the pilots to estimate the desired channel, the interference-plus-noise covariance matrix, and the additive noise variance, and its diagonal load is simply the noise variance.

5. SIMULATION RESULTS

5.1. Parameters and Methodology

In all the simulations, we set $N_R = 4$ receive antennas, employ binary constellations for all the users (i.e., $\mathcal{S} = \{-1, +1\}$), and set $\delta = 0.8$ (i.e., 80% of pilots for training and the remaining 20% for validation), which is a general recommendation from the ML literature [12]. The channel coherence length $N$ is set to 1000 symbols, and the activity factor of interferers, $p$, is set to 0.5. The diagonal load search window $\boldsymbol{\Lambda}$, in dB, is chosen from $[-20, 20]$ in increments of 2. For each data/pilot partition, we generate 100000 independent realizations of (1). For each realization, we compute the optimal weight vector from (14) with the sample MSE minimizing $\lambda_{\star,\mathrm{MSE}}$ from (17), or the sample BER minimizing $\lambda_{\star,\mathrm{BER}}$ from (18). Using the optimal BF, the data symbols are detected as per (16), and the normalized throughput per realization is computed as per (3). Upon averaging (3) over the realizations, we obtain $\mathbb{E}[\mathcal{T}_b]$. In all the simulations, the throughput is further normalized by the ideal throughput with perfect CSI at the receiver.

The interference is modeled in two different approaches. In the first approach, all the active interferers transmit at the same average power level, which is denoted by $\bar{\gamma}_I = \Omega_i/\sigma_n^2$, $i \in \mathcal{I}$, relative to the thermal noise power. If $\bar{\gamma}_0 = \Omega_0/\sigma_n^2$ denotes the average received SNR from the desired user, then $\bar{\gamma}_0/\bar{\gamma}_I$ denotes the average carrier-to-interference ratio (CIR). In the second approach, each interferer is assumed to transmit at a power level that is uniformly distributed within $[-\Delta, \Delta]$ dB relative to a nominal value of $\bar{\gamma}_I$. This model allows for distance-dependent power variations and/or any residual errors due to open-loop power control. Under the second approach, we set the nominal CIR to be 0 dB and $\Delta = 10$ dB.

5.2. Results and Observations

Under the first approach to interference modeling, the throughput is plotted as a function of $N_P$ for $\bar{\gamma}_0 \in \{0, 20\}$ dB. Fig. 1 depicts the throughput performance when $K_{\max} = 0$, whereas Fig. 2 assumes $K_{\max} = 5$ and $p = 0.5$. We observe from Fig. 1 that, in the absence of any interference, there is very little to be gained from the CV approach as the optimum load is 0. In fact, at very low SNRs (i.e., $\bar{\gamma}_0 = 0$ dB) and at very low $N_P$ there is a small degradation in performance with both MSE-CV-BF and BER-CV-BF relative to the C-BF. As the SNR increases to 20 dB, all the approaches yield identical performances. However, with interference, the performances are remarkably different, as shown in Fig. 2. From Fig. 2, we observe that, at both lower and higher operating SNRs, the proposed MSE-CV-BF and BER-CV-BF approaches significantly outperform C-BF. For example, with 1% pilot overhead, the throughput of MSE-CV-BF is around 83%, which is 15% higher than that of C-BF. We also notice that at lower $N_P$, MSE-CV-BF has a small advantage over BER-CV-BF, whereas at higher SNR and with larger $N_P$ these two approaches have comparable performances.

[Figure 1 omitted: two panels, (a) SNR = 0 dB and (b) SNR = 20 dB, plotting Normalized Throughput versus Number of Pilots (10 to 100) for three curves: With CV: Based on MSE; With CV: Based on mean pilot error; Without CV.]

Fig. 1: Normalized throughput under the first approach to interference modeling. Here, $K_{\max} = 0$, $\delta = 0.8$, $N_R = 4$ antennas, and a frame length of $N = 1000$ symbols.

[Figure 2 omitted: two panels, (a) SNR = 0 dB and (b) SNR = 20 dB, with the number of pilots ranging from 10 to 100.]

Fig. 2: Normalized throughput under the first approach to interference modeling. Here, $K_{\max} = 5$, $p = 0.5$, $\delta = 0.8$, CIR = -10 dB, $N_R = 4$ antennas, and a frame length of $N = 1000$ symbols.

Under the second approach to interference modeling, Fig. 3 considers an over-loaded scenario with $K_{\max} = 10$ and $p = 0.5$. Note that the average number of interferers in this case is 5, which is higher than $N_R - 1 = 3$. When $\bar{\gamma}_0 = 0$ dB, we see that the normalized throughput with C-BF peaks around 87% with $N_P = 50$, whereas with just 25 pilots the normalized throughput improves to 92% with the MSE-CV-BF. As the SNR increases to 20 dB, we see a slight dip (to around 86% at $N_P = 50$) in the normalized throughput of C-BF, whereas it increases to around 93% with the MSE-CV-BF. We also observe that, in the region of higher pilot overhead, the BER-CV-BF has a slightly inferior performance compared with the MSE-CV-BF at lower SNRs, and the two approaches have comparable performances as the SNR increases. However, for lower pilot overhead, MSE-CV-BF offers superior performance compared with BER-CV-BF.

[Figure 3 omitted: two panels, (a) SNR = 0 dB and (b) SNR = 20 dB, with the number of pilots ranging from 10 to 100.]

Fig. 3: Normalized throughput under the second approach to interference modeling. Here, $K_{\max} = 10$, $p = 0.5$, $\delta = 0.8$, CIR = 0 dB, $\Delta = 10$ dB, $N_R = 4$ antennas, and a frame length of $N = 1000$ symbols.

6. CONCLUSION

Traditionally, interference mitigation algorithms in the literature have focused on either identifying/estimating a deterministic number of interference channels or employing a variety of receivers with either ideal/estimated channels. In this work, we have addressed the problem of robust interference mitigation with linear receivers for data throughput optimization when the receiver has no knowledge of the channel statistics, and when the interference itself is dynamically varying across the channel coherence length. Using the cross-validation principles from machine learning, we have obtained the optimum data and pilot allocation to maximize the average throughput. Our results have shown that even when the average number of active interferers, $K_{\max}\, p$, is larger than the number of degrees of freedom, $N_R - 1$, at least 85% of the normalized throughput can be achieved with the optimum pilot overhead.

7. REFERENCES

[1] Cisco White Paper, “Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2014–2019.” Available at http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html.

[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Selected Areas in Commun., vol. 32, no. 6, pp. 1065-1082, June 2014.

[3] P. Stavroulakis, Interference Analysis of Communication Systems, Edited, IEEE Press Selected Reprint Series, 1980.

[4] J. H. Winters, “Optimum combining in digital mobile radio with cochannel interference,” IEEE Journal on Selected Areas in Commun., vol. 2, no. 4, pp. 528-539, July 1984.

[5] M. L. Honig, Advances in Multiuser Detection, Edited, John Wiley & Sons, 2009.

[6] E. Biglieri and M. Lops, “Multiuser detection in a dynamic environment. Part I: User identification and data detection,” IEEE Trans. Info. Theory, vol. 53, no. 9, pp. 3158-3170, Sep. 2007.

[7] E. Biglieri and M. Lops, “Multiuser detection in a dynamic environment. Part II: Joint user identification and parameter estimation,” IEEE Trans. Info. Theory, vol. 55, no. 5, pp. 2365-2374, May 2009.

[8] B. D. Carlson, “Covariance matrix estimation errors and diagonal loading in adaptive arrays,” IEEE Trans. Aerospace and Electronics Systems, vol. 24, no. 4, pp. 397-401, July 1988.

[9] J. Li, P. Stoica, and Z. Wang, “On robust Capon beamforming and diagonal loading,” IEEE Trans. Signal Processing, vol. 51, no. 7, pp. 1702-1715, July 2003.

[10] N. Ma and J. Goh, “Efficient method to determine diagonal loading value,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Processing, vol. V, 2003, pp. 341-344.

[11] X. Mestre and M. A. Lagunas, “Finite sample size effect on minimum variance beamformers: Optimum diagonal loading factor for large arrays,” IEEE Trans. Sig. Processing, vol. 54, no. 1, pp. 69-82, Jan. 2006.

[12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Ed., Springer & Sons, 2013.

[13] C. Clancy, J. Hecker, E. Stuntebeck, and T. O'Shea, “Applications of machine learning to cognitive radio networks,” IEEE Wireless Commun., vol. 14, no. 4, pp. 47-52, Aug. 2007.

[14] A. He, K. K. Bae, T. Newman, J. Gaeddert, K. Kim, R. Menon, L. Morales-Tirado, J. Neel, Y. Zhao, J. Reed, and W. Tranter, “A survey of artificial intelligence for cognitive radios,” IEEE Trans. Vehicular Techno., vol. 59, no. 4, pp. 1578-1592, May 2010.

[15] M. Bkassiny, Y. Li and S. K. Jayaweera, “A survey on machine-learning techniques in cognitive radios,” IEEE Comm. Surveys & Tutorials, vol. 15, no. 3, pp. 1136-1159, Third Quarter 2013.

[16] G. Xu and Y. Lu, “Channel and modulation selection based on support vector machines for cognitive radio,” in Proc. International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), Sept. 2006, pp. 1-4.

[17] T. Atwood, “RF channel characterization for cognitive radio using support vector machines,” Ph.D. dissertation, University of New Mexico, Nov. 2009.

[18] T. Clancy, A. Khawar, and T. Newman, “Robust signal classification using unsupervised learning,” IEEE Trans. on Wireless Commun., vol. 10, no. 4, pp. 1289-1299, Apr. 2011.

[19] P. D. Grunwald, The Minimum Description Length Principle, The MIT Press, 2007.

[20] H. Akaike, “A new look at the statistical model identification,” IEEE Trans. Automat. Contr., vol. 19, pp. 716-723, 1974.

[21] G. Schwarz, “Estimating the dimension of a model,” Ann. Stat., vol. 6, pp. 461-464, 1978.

[22] J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, pp. 465-471, 1978.

[23] M. Wax and T. Kailath, “Detection of signals by information theoretic criteria,” IEEE Trans. on Acoustic, Speech, and Signal Processing (ASSP), vol. 33, pp. 387-392, Apr. 1985.

[24] R. R. Nadakuditi and A. Edelman, “Sample eigenvalue based detection of high-dimensional signals in white noise using relatively few samples,” IEEE Trans. Sig. Processing, vol. 56, no. 7, pp. 2625-2638, July 2008.
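For illustration, the cross-validation procedure of Section 4.2 (estimates (11) and (12), the loaded covariance (15), the weights (14), and the selection rules (17) and (18)) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the authors' implementation: NumPy is assumed, the function names are hypothetical, the training set is taken to be the first $\delta N_P$ pilots, and the dB search grid is converted to linear scale as $10^{\mathrm{dB}/10}$, none of which is specified in the paper.

```python
import numpy as np

def loaded_weights(y_pilot, lam):
    """Weights (14) from pilot observations (N_R x L); desired pilots are all +1."""
    n_r, n_obs = y_pilot.shape
    h_hat = y_pilot.mean(axis=1)                                    # sample mean, (11)
    r_tot = y_pilot @ y_pilot.conj().T / n_obs                      # sample covariance, (12)
    r_est = r_tot + lam * np.trace(r_tot).real / n_r * np.eye(n_r)  # diagonal load, (15)
    u = np.linalg.solve(r_est, h_hat)
    return u / np.vdot(h_hat, u)                                    # w^dagger h_hat = 1

def select_lambda(y_pilot, delta=0.8, lam_grid_db=np.arange(-20, 21, 2),
                  criterion="mse"):
    """Pick the diagonal load on the validation pilots using (17) or (18)."""
    n_t = int(round(delta * y_pilot.shape[1]))
    y_t, y_v = y_pilot[:, :n_t], y_pilot[:, n_t:]   # assumed contiguous split
    best_lam, best_cost = None, np.inf
    for lam in 10.0 ** (np.asarray(lam_grid_db) / 10.0):   # assumed dB-to-linear rule
        z = loaded_weights(y_t, lam).conj() @ y_v          # w^dagger y(b, n), n in I_v
        if criterion == "mse":
            cost = np.mean(np.abs(1.0 - z) ** 2)           # sample MSE, (17)
        else:
            cost = np.mean(np.sign(z.real) != 1)           # sample BER, (18)
        if best_cost > cost:
            best_lam, best_cost = lam, cost
    return best_lam

def cv_beamformer(y_pilot, **kwargs):
    """Select the load by CV, then refit the weights on all pilots."""
    lam = select_lambda(y_pilot, **kwargs)
    return loaded_weights(y_pilot, lam), lam
```

With the default grid the search visits 21 candidate loads per block, which reflects the linear growth of complexity with $|\boldsymbol{\Lambda}|$ noted in Section 4.2. Passing `criterion="ber"` switches from rule (17) to rule (18).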