VARIATIONAL INFERENCE METHODS FOR SIGNAL PROCESSING IN WIRELESS COMMUNICATIONS
by
Dexu Lin
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, in the University of Toronto.
Copyright © 2008 by Dexu Lin. All Rights Reserved.
Variational Inference Methods for Signal Processing in Wireless Communications Doctor of Philosophy Thesis Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto by Dexu Lin October 2007
Abstract

The transmission of information over a wireless medium often introduces additional unknown variables corrupting the primary signal of interest. These unknowns originate from different sources (e.g., channel distortions or multiuser interference) and possess different characteristics (additive or multiplicative, static or time-varying). A consequence of this phenomenon is that the extra unknowns need to be taken into consideration at the receiver when detecting the desired signal. The factor graph depiction of this problem setting is straightforward. Yet performing exact inference on the graph to optimally remove the effects of distortion via direct belief propagation is often computationally prohibitive. This thesis aims to establish a new framework, based on variational inference, to guide the design of near-optimal algorithms for joint detection and estimation in such a scenario.

The application of this framework in the OFDM area results in a novel detector with phase noise cancellation capability. The proposed algorithm generates near-optimal soft estimates of the desired OFDM signal in the presence of unknown phase noise. To provide accurate channel information for this detector, we also propose an optimal (maximum a posteriori) joint estimator for the channel impulse response, carrier frequency offset and phase noise. This implies that the frequency synchronization and channel estimation algorithms for OFDM can be derived from statistical estimation theory even in the presence of phase noise.

This variational inference framework is also used to establish a unified approach for studying joint detection and decoding in multiple-access and ISI channels. It produces low-complexity, near-optimal detector/estimator designs for turbo receivers, including the built-in capability to deal with channel uncertainty. Various existing turbo receivers can be seen as its special cases, leading to insights on how new improvements can be made.
Acknowledgements

I am genuinely grateful for the guidance I received from my advisor Professor T.J. Lim in the past six years. He has been the most reliable source of knowledge and vision in my effort to undertake the challenge of this thesis. His optimism and humour have also made it less intimidating. I am thankful for the freedom and trust he offered me, and the insights and encouragement that are never in short supply. While much of my life in the past few years has been spent on waiting (for simulation results, paper submission feedback, microwave oven to heat up my lunch, etc.), I could always count on his speedy response to my emails and the most patient and meticulous critique of my writing.

I also feel very fortunate to be studying under the influence of so many great researchers at U of T. Many of them are role models that I constantly look up to. In particular, I would like to thank Professors Ravi Adve, Frank Kschischang, Wei Yu and Brendan Frey for being great instructors of their courses, from which I benefited tremendously. I am also grateful for their personal help, to which I am deeply indebted.

Much credit goes to my friends Yi Zhao, Ryan Pacheco and Dongning Guo, for their insights when I needed them most. Their ingenuity is embedded in the material that will follow. Special thanks to Yi for keeping me company for all the graduate school years, and for constantly updating me with the most recent jokes. To all those great office-mates I cannot name one by one – you guys have made our room a lively place.

I would also like to gratefully acknowledge the funding support provided by NSERC, OGS, the Rogers Graduate Scholarship, and the Shahid Qureshi Memorial Scholarship. Thanks to Dr. Mihnea Moldoveanu and Redline Communications Inc. for offering me the internship opportunity in 2003, which eventually led to the OFDM portion of the thesis.

Last but not least, I would like to thank my wife and my parents for their constant support, and my friends at RHCCC, with whom I experienced the abundant provision of joy from our Lord Jesus Christ.
Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
  1.1 Frequency Offset and Phase Noise in OFDM
  1.2 Gaussian Multiple Access Channel
  1.3 The Exact Inference and Its Difficulties
  1.4 Variational Inference
  1.5 Organization and Contribution of the Thesis
    1.5.1 Part I: Mitigation of Frequency and Phase Distortion in OFDM
    1.5.2 Part II: Soft-In Soft-Out Detection in Multiple Access Channels
  1.6 Notations

I Mitigation of Frequency and Phase Distortion in OFDM

2 Channel Estimation in the Presence of Frequency Offset and Phase Noise
  2.1 Introduction
  2.2 System Description
    2.2.1 Prior Statistics of Phase Noise
    2.2.2 Signal Model
    2.2.3 Role of Channel Estimation in Receiver Design
  2.3 Channel Estimation with CFO and PHN
    2.3.1 Joint CFO/PHN/CIR Estimator (JCPCE)
    2.3.2 Complexity Analysis and Low Complexity Implementation
  2.4 CFO Estimation Based on Repeated Training Symbols
    2.4.1 Moose’s CFO Estimator
    2.4.2 CFO Estimator with PHN Rejection
    2.4.3 Joint Phase Noise and Channel Impulse Response Estimation
    2.4.4 Complexity Analysis and Low Complexity Implementation
  2.5 Simulations
    2.5.1 Channel Estimation with PHN Only
    2.5.2 Channel Estimation with both CFO and PHN
  2.6 Summary

3 Joint Data Detection and Phase Noise Estimation
  3.1 Introduction
  3.2 Signal Model
  3.3 Conventional and Optimal Phase Noise Cancellation
    3.3.1 Conventional Schemes
    3.3.2 Maximum Likelihood Detector
  3.4 Joint Estimation via Variational Inference
    3.4.1 Variational Inference
    3.4.2 Iterative Conditional Mode
    3.4.3 Expectation-Maximization (EM) Algorithm
    3.4.4 Summary of Variants of Variational Inference
  3.5 Complexity Analysis and Reduction
    3.5.1 Wiener Phase Noise
    3.5.2 Gaussian Phase Noise
  3.6 Simulations
  3.7 Summary

II Soft-In Soft-Out Detection in Multiple Access Channels

4 A Variational Inference Framework for Soft-In-Soft-Out Detection
  4.1 Introduction
  4.2 System Description
    4.2.1 Signal Model for BPSK Modulation
    4.2.2 Optimal SISO Detectors
  4.3 Message-Passing Scheduling in Turbo Multiuser Detection
    4.3.1 Obtaining Extrinsic Information: Sequential Schedule
    4.3.2 Obtaining Extrinsic Information: Flooding Schedule
    4.3.3 Obtaining Extrinsic Information: Hybrid Schedule
  4.4 Multiuser Detection via Variational Inference
    4.4.1 Variational Inference and Variational Free Energy Minimization
    4.4.2 VFEM Interpretation of Linear Multiuser Detectors
    4.4.3 VFEM Interpretation of Interference Cancellation Detectors
    4.4.4 VFEM Interpretation of Gaussian SISO Multiuser Detector
    4.4.5 VFEM Interpretation of Discrete SISO Multiuser Detector
    4.4.6 VFEM Interpretation of Decorrelating-Decision-Feedback SISO Multiuser Detector
    4.4.7 DDF-Aided Discrete SISO MUD
    4.4.8 Summary of Variational-Inference-Based SISO MUD
  4.5 Variational EM for Iterative Parameter Estimation
    4.5.1 Formulation of Variational EM Algorithm
    4.5.2 Channel and Noise Variance Estimation for Gaussian SISO MUD
    4.5.3 Channel and Noise Variance Estimation for Discrete SISO MUD
  4.6 Simulation Results
    4.6.1 Gaussian SISO MUD
    4.6.2 DDF-Aided Discrete SISO MUD
  4.7 Summary

5 Bit-Level SISO Detection for Gray-Coded Multilevel Modulation
  5.1 Introduction
  5.2 Signal Model
  5.3 Turbo Receiver Structure
    5.3.1 Existing Turbo Receiver for BPSK
    5.3.2 Existing Turbo Receiver for Multilevel Modulation
    5.3.3 Proposed Turbo Receiver for Multilevel Modulation
  5.4 Binary SISO Detection via Variational Inference
    5.4.1 Relation to Existing SISO Detectors
    5.4.2 Theoretical Implications of Variational Inference Interpretation
  5.5 Bit-Level Equalization and Soft Detection for $M$-QAM
    5.5.1 Gray Mapping and Multi-Linear Transformation
    5.5.2 Gaussian SISO Detector for $2^L$-PAM Modulation
    5.5.3 Discrete SISO Detector for $2^L$-PAM Modulation
    5.5.4 Decorrelating-Decision-Feedback SISO Detector for $M$-QAM Modulation
  5.6 Numerical Results
    5.6.1 Turbo Multiuser Detection
    5.6.2 Turbo Equalization
  5.7 Summary

6 Conclusions
  6.1 Thesis Summary
  6.2 Future Work

A Variational Inference for Joint Data and Phase Noise Estimation
  A.1 Closed Form Expression of $F(Q, p)$
  A.2 Derivation of Variational Inference Algorithm
  A.3 Derivation of ICM Algorithm

B Gaussian SISO Detector for Gray-Coded QAM Modulation

C Discrete SISO Detector for Gray-Coded QAM Modulation

D List of Publications

Bibliography
List of Tables

2.1 Joint CFO/PHN/CIR Estimator (JCPCE).
2.2 Conjugate Gradient algorithm for evaluating (2.22).
2.3 Modified JCPCE with closed-form CFO estimation.
3.1 Updating Equations for Variational Inference Algorithm.
3.2 Updating Equations for Iterative Conditional Mode Algorithm.
3.3 Updating Equations for EM I Algorithm.
3.4 Updating Equations for EM II Algorithm.
3.5 Comparison of Variants of Variational Inference Algorithm.
3.6 Conjugate Gradient Algorithm for Evaluating $\hat{\theta}$.
4.1 Three scheduling schemes of turbo MUD employing Gaussian SISO MUD.
4.2 Three scheduling schemes of turbo MUD employing discrete SISO MUD.
4.3 Variational-inference-based SISO multiuser detectors employing different scheduling schemes.
4.4 Variational EM algorithm employing Gaussian SISO MUD.
4.5 Variational EM algorithm employing Discrete SISO MUD.
5.1 Characteristics of two variations of Gaussian SISO detectors.
5.2 EXT obtained from two variations of Gaussian SISO detector.
5.3 Parameters for BPSK and 4-PAM Gaussian SISO BLESD detector.
5.4 Parameters for BPSK and 4-PAM discrete SISO BLESD detector.
5.5 Turbo MUD of 4-PAM/16-QAM implementing Gaussian SISO detection and discrete SISO detection.
5.6 Turbo equalization of 4-PAM/16-QAM implementing Gaussian SISO detection.
List of Figures

1.1 Phase noise and carrier frequency offset channel model.
1.2 Common phase noise versus random phase noise.
1.3 Multiple-access channel model.
1.4 (a) Factor graph description of an OFDM signal distorted by phase noise. (b) Factor graph description of a multiple-access channel.
2.1 OFDM packet structure.
2.2 OFDM transmitter/receiver structure and phase noise channel model. The channel is estimated using the proposed Joint CFO/PHN/CIR Estimator (JCPCE).
2.3 Power spectral density of the phase noise process $e^{j\theta(t)}$.
2.4 Effect of residual common phase rotation in JCPCE, SNR = 30 dB.
2.5 The predicted pdf vs. the histogram of $\delta$ at SNR = 15, 25, 35 dB.
2.6 Channel estimation performance for different number of conjugate gradient (CG) iterations ($\epsilon = 0$).
2.7 MSE vs. SNR channel estimation performance with PHN modelling error ($\epsilon = 0$).
2.8 MSE vs. SNR channel estimation performance at different levels of PHN ($\epsilon = 0$).
2.9 MSE vs. SNR channel estimation performance using JCPCE (Wiener PHN).
2.10 MSE vs. SNR channel estimation performance using JCPCE (Gaussian PHN).
2.11 MSE vs. SNR channel estimation performance using modified JCPCE (Wiener PHN).
2.12 MSE vs. SNR channel estimation performance using modified JCPCE (Gaussian PHN).
3.1 Structure of the data detection module incorporating PHN mitigation.
3.2 An instance of PHN sequence estimated using the conventional method and variational inference. (a) Wiener PHN, (b) Gaussian PHN.
3.3 BER performance comparison between the conventional method and proposed detection schemes. (a) Wiener PHN, (b) Gaussian PHN.
3.4 BER performance of the low complexity detection scheme. (a) Wiener PHN, (b) Gaussian PHN.
4.1 Graphical model of a coded multiuser channel. Note the time dependency among bits of the same user (code constraint), and the user dependency among bits at the same time (channel constraint).
4.2 An instance of sequential message-passing in the graphical model: the multiuser detector receives prior distributions of $b_2$, $b_3$ and $b_4$ to generate the extrinsic information for $b_1$. This process is repeated for $b_2$, $b_3$ and $b_4$ to complete one message-passing iteration.
4.3 An instance of flooding message-passing in the graphical model: the multiuser detector receives prior distributions of $b_1, \cdots, b_4$ to generate the extrinsic information for $b_1, \cdots, b_4$. This completes one message-passing iteration.
4.4 BER vs. SNR difference in two-user channel. (a) Strong user detected first. (b) Weak user detected first.
4.5 BER performance of turbo MUD employing Gaussian-SISO MUD ($K = 4$, $\rho = 0.7$). (a) Sequential schedule; (b) Flooding schedule.
4.6 BER performance of turbo MUD employing flooding-Gaussian-SISO MUD with joint noise variance and channel estimation ($N = 32$, $K = 32$). The single user bound is obtained by assuming perfect channel knowledge.
4.7 BER performance of turbo MUD employing discrete-SISO MUD with joint noise variance estimation ($K = 4$, $\rho = 0.7$). (a) Sequential schedule; (b) Flooding schedule.
4.8 BER performance of turbo MUD employing flooding-discrete-SISO MUD with joint noise variance and channel estimation ($N = 32$, $K = 32$). The single user bound is obtained by assuming perfect channel knowledge.
5.1 (a) Block diagram of a typical symbol-level turbo detector with multilevel modulation. (b) Block diagram of the proposed bit-level turbo detector with multilevel modulation (BLESD). $\Lambda_I$ and $\Lambda_O$ denote extrinsic information generated by the SISO detector and APP decoder, respectively. They are used as prior probabilities for the next decoding/detection stage.
5.2 The transformation from 4-PAM to 8-PAM Gray mapping.
5.3 BER performance of turbo MUD with 16-QAM modulation ($K = 8$, $\rho = 0.5$). (a) Gaussian SISO detection; (b) Discrete SISO detection.
5.4 BER performance of Gaussian-SISO BLESD and symbol-based turbo equalization for 16-QAM modulation.
Chapter 1

Introduction

    Now faith is the substance of things hoped for, the evidence of things not seen.
    -- Epistle to the Hebrews (11:1)

To meet the ever-growing demand for ubiquitous wireless connectivity, wireless receivers are required to be increasingly sophisticated and robust, at the same time providing high spectral efficiency, low manufacturing cost, low power consumption, and variable data rate. The design of receivers to meet such stringent requirements, on the other hand, is limited by the channel noise and fading effects, interference from nearby users, as well as non-idealities within the receiver’s own electronic components. Consequently, the search for powerful detection and estimation algorithms to combat the unforgiving wireless medium and imperfect electronics continues to pose one of the most critical challenges facing communications research today, despite the fact that detection and estimation theory in the traditional sense is a well-established and well-studied area. This is because conventional signal processing methods are often ill-equipped to fully address some important problems in wireless research that cannot be simply categorized as inferring an input based on an observed output. Alternative approaches to solve these challenges are crucial to help realize, with the help of coding, the ultimate channel capacity promised by information theory for various wireless applications.

Toward the goal of finding such alternatives, this thesis will present a new design framework for wireless detection and estimation for problems with certain defining characteristics. In particular, the proposed framework is studied with two direct applications in mind:

1. We will investigate the resolution of frequency and phase ambiguity in orthogonal frequency division multiplexing (OFDM). This includes both channel estimation and data detection in the presence of phase noise and frequency offsets.

2. We will propose a systematic design methodology for soft-in soft-out (SISO) detectors with joint channel parameter tracking capabilities, which may be used as the detection component in an iterative turbo receiver.

In the subsequent sections, we will summarize the sources of difficulties associated with each topic and reveal the commonalities underlying both. This is to motivate the unified approach we apply to tackle the problems, namely, variational inference.

Inference is an important topic in machine learning; its general goal is to derive beliefs¹ about unobserved variables based on the observable evidence. Previous research on decoding low-density parity-check (LDPC) codes using factor graphs and the sum-product (SP) algorithm is a great example of applying inference methods from machine learning to communications research [1]. Variational inference, on the other hand, is a structured method that generates approximate beliefs about the variables of interest by formulating and solving an optimization problem. For instance, in the signal processing field, variational inference has already been successfully used in image processing to perform scene analysis [2]. In this thesis, we will extend the realm of its application by showing that it is also well-suited for solving a wide range of detection and estimation problems in communications. A brief introduction to the theory of variational inference will be presented in Section 1.4.
¹ The beliefs are usually expressed as probability distributions.

1.1 Frequency Offset and Phase Noise in OFDM

First developed in the 1960s [3], OFDM is now a well-known multicarrier modulation technique [4–6] that has become a preferred choice in high-rate wireless and wireline communication systems such as wireless local area networks (Wi-Fi), broadband wireless access (WiMAX), high-speed digital subscriber lines (DSL) and digital broadcasting (DAB and DVB-T). It is also the dominant modulation choice for the 4G standards that are under development. This is due to its spectral efficiency – no guard bands are needed between adjacent frequency channels – and more importantly, its implementation simplicity compared to traditional time-domain modulation methods in channels with severe inter-symbol interference (ISI), encountered whenever bit rates are required to be very large.

OFDM’s combined advantage of high spectral efficiency (comparable to that of single carrier systems) and robustness to frequency-selective fading is made possible by the special mechanism in which it spreads a high-rate time-domain data stream over a number of low-rate frequency-domain subcarriers. As a result, the fading over each subcarrier is approximately flat, which gives resilience to ISI. Further, the subcarriers are not multiplexed using the conventional frequency division multiplexing (FDM) technique, which requires non-overlapping subcarriers and guard bands, but are allowed to overlap while still remaining orthogonal to each other, leading to the full utilization of the available spectrum.

In practice, OFDM is easily realized by taking a fixed-size data stream modulated in the normal manner and transmitting its inverse discrete Fourier transform (IDFT) in the time domain. After adding a cyclic prefix at the beginning of an OFDM symbol, mathematically it appears as though the data symbols are transmitted in parallel in the frequency domain, each symbol occupying one subcarrier. An introductory overview of this mechanism can be found, for example, in [4] and [7].

Nevertheless, OFDM does have its drawbacks relative to time-domain modulation, most significantly its extreme sensitivity to time-varying multiplicative effects such as fast fading, Doppler shifts, and oscillator jitter. The latter two effects lead to a mismatch between the carrier frequencies of the received signal and the local oscillator, so that a carrier frequency offset (CFO) of $\Delta f$ Hz is created. Oscillator jitter also creates a very damaging effect called phase noise (PHN), meaning that the phase of the locally generated sinusoid randomly changes over time. These time-varying distortions are particularly damaging to OFDM because they create inter-carrier interference (ICI) that can severely degrade the performance of data detection. Improved RF circuit design may conceivably alleviate the problem with CFO and PHN but cannot eliminate it.
Therefore, it is necessary to design digital signal processing techniques to combat residual CFO and PHN in the high-performance systems envisioned for the future.

While the CFO and PHN both manifest themselves as phase deviations in the time-domain received signal (as seen in Fig. 1.1), they are modeled as two separate effects because of each one’s unique characteristics. CFO produces a phase rotation which varies linearly with time. In other words, the phase deviation sequence can be written as $[\phi_1, \phi_2, \cdots, \phi_N]^T = [\xi, \xi + 2\pi\epsilon/N, \cdots, \xi + 2\pi(N-1)\epsilon/N]^T$, where $\xi$ is a constant phase offset that can be absorbed into the channel response. The quantity $\epsilon$, called normalized CFO, is related to the actual frequency offset $\Delta f$ and the OFDM symbol period $T$ as $\epsilon = \Delta f\, T$. On the other hand, the phase shift generated by PHN is more complex. In general, the PHN sequence $[\theta_1, \cdots, \theta_N]^T$ can be modeled as a low pass Gaussian process² whose parameters are determined by hardware components.

Figure 1.1: Phase noise and carrier frequency offset channel model. (Diagram: each time-domain OFDM sample $x_1, \cdots, x_N$ is multiplied by the phase noise term $e^{j\theta_i}$ and the frequency offset term $e^{j\phi_i}$, then additive noise $n_i$ is added to give the received samples $r_1, \cdots, r_N$.)

Overall, assuming the ideal sample at the receiver at time instance $i$ ($i = 1, \cdots, N$) is $x_i$, the actual sample measured by the receiver, corrupted by CFO, PHN and additive channel noise, is³

$$r_i = e^{j(\theta_i + 2\pi i \epsilon / N)} x_i + n_i. \tag{1.1}$$
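To make Eq. (1.1) concrete, the following NumPy sketch applies the CFO and PHN rotation to one OFDM symbol and inspects the result after the receiver's DFT. It is an illustration under stated assumptions, not the thesis's simulation setup: the channel is ideal (no multipath, so no cyclic prefix is modeled), the phase noise is a random walk (Wiener process) with a hypothetical step size `sigma`, and the function names `wiener_phase_noise` and `ofdm_receive` are invented for this example.

```python
import numpy as np


def wiener_phase_noise(N, sigma, rng):
    """Random-walk PHN: theta_i = theta_{i-1} + Gaussian step (assumed model)."""
    return np.cumsum(sigma * rng.standard_normal(N))


def ofdm_receive(d, eps=0.0, theta=None, noise_std=0.0, rng=None):
    """Pass one OFDM symbol through Eq. (1.1) and return the receiver DFT output.

    d         : frequency-domain data symbols, one per subcarrier
    eps       : normalized CFO (Delta_f * T)
    theta     : PHN sequence of length N (all zeros if None)
    noise_std : standard deviation of the complex additive noise n_i
    """
    rng = np.random.default_rng() if rng is None else rng
    N = d.size
    theta = np.zeros(N) if theta is None else theta
    x = np.fft.ifft(d) * np.sqrt(N)        # unitary IDFT: time-domain samples x_i
    i = np.arange(1, N + 1)                # sample index i = 1, ..., N
    n = noise_std * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    r = np.exp(1j * (theta + 2 * np.pi * i * eps / N)) * x + n   # Eq. (1.1)
    return np.fft.fft(r) / np.sqrt(N)      # unitary DFT back to the subcarriers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 64
    d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)  # QPSK symbols
    theta = wiener_phase_noise(N, sigma=0.05, rng=rng)
    y = ofdm_receive(d, eps=0.1, theta=theta)
    print("mean squared distortion per subcarrier:", np.mean(np.abs(y - d) ** 2))
```

With `eps = 0` and a constant `theta`, the DFT output is simply a rotated copy of `d`, so each subcarrier stays isolated. Once the phase varies within the symbol, energy leaks between subcarriers, which is the ICI discussed in this section.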
² For PHN with small variance, the Gaussian model is very accurate. But for very severe PHN, we may need more complex modeling [8].
³ See Fig. 2.2 for the detailed OFDM system diagram where the signals $x_i$ and $r_i$ are defined.

The major contributor to CFO is the frequency difference between the local oscillators at the transmitter and the receiver. This value is a constant for each transmitter-receiver pair. Therefore, CFO may be estimated much like the slow-fading channel (or along with it, as demonstrated in Chapter 2) in the channel estimation stage. But the PHN is a random process, creating different distortions from one OFDM symbol to another, making it difficult to mitigate using a training symbol approach.

Various methods to mitigate PHN have been proposed in the past, where PHN is generally decomposed into two components (as seen in Fig. 1.2): the common phase noise which is present in all subcarriers and the random phase noise which induces ICI. In other words, the common PHN is the average phase rotation over an OFDM symbol, and the random PHN is the variation about the average.

Figure 1.2: Common phase noise versus random phase noise. (Plots of $\theta$ versus $t$: the full PHN sequence, its common PHN component, and its random PHN component.)

In prior methods [9–11], the common phase noise is measured as the average angular rotation of the constellation on the pilot subcarriers and cancelled on the data subcarriers, while the random phase noise is simply ignored. It is clear that only handling the common PHN, although resulting in straightforward and easy-to-implement solutions, does not eliminate the ICI caused by the random PHN, which is the most damaging portion of the phase distortion experienced by the signal. Unfortunately, despite the inadequacy of this approach, any attempt to remove the complete PHN sequence (common PHN + random PHN) could easily lead to prohibitive receiver complexity. It is the objective of this thesis to show that it is possible to consider the complete PHN in OFDM receiver design without imposing prohibitive complexity.
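The split into common and random PHN, and the limitation of correcting only the common part, can be sketched numerically. The snippet below is a simplified illustration and not the pilot-based schemes of [9–11]: it is genie-aided (it uses the true PHN sequence instead of a pilot estimate), it takes the common part to be the arithmetic mean of the $\theta_i$, it assumes an ideal channel, and the function names are hypothetical.

```python
import numpy as np


def split_phase_noise(theta):
    """Return (common, random): the mean phase and the variation about it."""
    common = float(np.mean(theta))
    return common, theta - common


def residual_after_common_correction(d, theta):
    """Mean squared distortion per subcarrier after removing only the common PHN.

    Assumes an ideal channel and perfect knowledge of the common phase, so the
    number is an optimistic illustration of common-phase-only correction.
    """
    N = d.size
    x = np.fft.ifft(d) * np.sqrt(N)                       # time-domain OFDM samples
    y = np.fft.fft(np.exp(1j * theta) * x) / np.sqrt(N)   # receiver DFT output
    common, _ = split_phase_noise(theta)
    y_corrected = np.exp(-1j * common) * y                # de-rotate every subcarrier
    return float(np.mean(np.abs(y_corrected - d) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 64
    d = rng.choice([-1.0, 1.0], N) + 0j                   # BPSK symbols
    theta = 0.4 + np.cumsum(0.05 * rng.standard_normal(N))  # offset plus random walk
    print("residual with constant PHN :",
          residual_after_common_correction(d, np.full(N, 0.4)))
    print("residual with full PHN     :",
          residual_after_common_correction(d, theta))
```

A constant phase is removed completely by the de-rotation, whereas a time-varying sequence leaves a residual that the common-phase correction cannot reach. That residual is the distortion attributed to the random PHN in the text.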
1.2
Gaussian Multiple Access Channel
Our second problem of interest is data detection in Gaussian multiple-access channels. Consider a linear discrete memoryless channel of the form r = Hd + n,
(1.2)
where d is a K-dimensional input vector whose elements represents data symbols taken from a constellation set Ω ⊆ C. The channel matrix H ∈ CN ×K may represent various physical
meanings in different multiple-access settings. When dealing with synchronous direct-sequence
6
1.2. GAUSSIAN MULTIPLE ACCESS CHANNEL
d1
- Channel 1
d2
- Channel 2
d3
- Channel 3
.. . dK
- Channel K
S S S Noise Q S Q S Q S QQ ? S sw - + - r 7
Figure 1.3: Multiple-access channel model. code-division multiple-access (DS-CDMA), which is the application assumed in this thesis, H depends on the signature vector and channel of each user. The multiple-access channel occupies an important place in wireless communications because all wireless networks support multiple users, and at the network uplink, requires accurate detection of data transmitted from each user (dk ) based on the received signal (r) in the presence of inter-user interference and noise. Signal processing techniques developed for such a purpose are referred to as multiuser detection. Multiuser detection has been an extensively studied area for a number of years [12–19]. Apart from the conventional matched filter detector, the most celebrated linear detectors include the linear minimum mean-squared error (L-MMSE) detector and the decorrelating detector. Popular non-linear detectors include, for example, successive interference cancellation (SIC) detector, parallel interference cancellation (PIC) detector, and decorrelating decision-feedback (DDF) detector [20]. More recently, following the discovery of turbo codes [21], the principle of turbo processing has been used in various signal processing settings. Among these, the turbo receiver for coded multiple-access channels, which treats the error control code as the outer code and the channel as the inner code, has been shown to perform dramatically better than the conventional noniterative method of interference suppression followed by hard-decision decoding. At the centre of a turbo multi-user detector is a soft-in soft-out (SISO) detector that is able to exchange
extrinsic information (EXT) with a SISO decoder. Consequently, the search for low-complexity, high-performance SISO detectors has motivated a broad range of research activities. Such research efforts have led to numerous SISO detectors being derived using radically different approaches: minimizing the mean squared error (MSE) at the output of a linear filter, estimating and cancelling interference user by user, and so on. This diversity of viewpoints makes the comparison and analysis of multiuser detectors difficult, since each approach is valid in its own right, and there seems to be no way to tell how “suboptimal” each one is and how they relate to the optimal solution. The ability to bring different schemes into focus under a single theme would be an important step forward, as it allows for a systematic and rigorous framework to analyze existing methods and propose new ones.

[Figure 1.4: (a) Factor graph description of an OFDM signal distorted by phase noise: the variable nodes x and θ, with priors p(x) and p(θ), connect to the function node f = p(r|x, θ). (b) Factor graph description of a multiple-access channel: the variable nodes d1, d2, ..., dK, with priors p(d1), p(d2), ..., p(dK), connect to the function node f = p(r|d).]
1.3
The Exact Inference and Its Difficulties
On the surface, there seems to be no common ground between the CFO/PHN mitigation problem in OFDM and the SISO detection problem in multi-access channels. But some similarities emerge as we observe the factor graph descriptions of both problems in Fig. 1.4. It is seen that both graphical structures include multiple input variables and a single observation. In Fig. 1.4, the observation node r is omitted since it is a known quantity and may be absorbed into the neighbouring function node f . It is now clear that the common challenge associated with both cases is the inference of multiple inputs (the hidden variables) based on the observed outputs (the evidence). Without doubt, this simple graphical model with multiple inputs and a single observation is connected to a wide range of settings in communications. If we are able to handle these two particular
problems well, it will potentially offer useful intuition for many challenging situations that can be modeled similarly.

Using a graphical model to solve communications problems is not new. As mentioned earlier, in the coding community, this technique has been applied with remarkable success in decoding LDPC codes [22]. The message-passing algorithm is called belief propagation because local beliefs (soft estimates) of unknown variables, given the observed evidence, are passed along the edges of the factor graph according to fixed rules. After a number of iterations, depending on the topology of the graph, we arrive at posterior distributions (or their approximations) of the unknown variables conditioned on the evidence. The rule underlying such a message-passing procedure is the sum-product (SP) algorithm [23].

An important special case of applying the SP algorithm occurs when the factor graph is a tree (no loops). Under this condition, the SP algorithm will terminate after messages have been passed in both directions along every branch, and produce the exact posterior distribution. Noticing that the graphs describing our problems of interest are indeed trees, an obvious question to ask is whether we can apply the SP algorithm directly to extract the optimal solution. Unfortunately, it turns out that the complexity of the SP algorithm would be exactly the same as that of computing the exact posterior distributions of the desired hidden variables directly (since the SP algorithm performs exact inference in trees). We analyze each of the two cases in more detail as follows:

• In the phase noise case (Fig. 1.4(a)), the message from the function node f to the variable node x is
\[ M_{f \to \mathbf{x}} = \int_{\boldsymbol{\theta}} p(\mathbf{r}|\mathbf{x}, \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta} = p(\mathbf{r}|\mathbf{x}). \tag{1.3} \]
The integration in (1.3) is easily computed because a Gaussian prior distribution may be assumed for θ. However, because p(r|x) is a complicated distribution and x belongs to a discrete alphabet, finding the x that maximizes $p(\mathbf{x}|\mathbf{r}) \propto p(\mathbf{r}|\mathbf{x})p(\mathbf{x})$ often requires testing each symbol hypothesis, resulting in a complexity of $O(|\Omega|^N)$, where $|\Omega|$ is the symbol constellation size.
• In the multiple-access channel case (Fig. 1.4(b)), the message from the function node f to the variable node $d_k$ is
\[ M_{f \to d_k} = \sum_{\mathbf{d}:d_k} p(\mathbf{r}|\mathbf{d}) \prod_{i \neq k} p(d_i) = p(\mathbf{r}|d_k), \tag{1.4} \]
where the notation $\mathbf{d}:d_k$ indicates the set of vectors $\mathbf{d} \in \Omega^K$ with the k-th symbol being $d_k$. The difficulty here is that the summation in (1.4) requires a complexity of $O(|\Omega|^K)$, which is prohibitive for large $|\Omega|$ and K.
It is therefore clear from the above analysis that exact inference based on the SP algorithm does not help in our case⁴. We need a different set of tools to help overcome these obstacles.
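To make the cost of (1.4) concrete, the following is a minimal sketch that evaluates the message by brute-force enumeration. The BPSK alphabet, the real-valued channel matrix, the Gaussian likelihood and all numerical values are illustrative assumptions, and the function name `exact_message` is hypothetical.

```python
import itertools
import math

def exact_message(r, H, omega, priors, k, noise_var):
    """Evaluate (1.4) for user k by enumerating every vector d in omega**K.

    Returns (message, visited): message[s] is the sum over all d with d_k = s
    of p(r|d) * prod_{i != k} p(d_i), and visited counts the hypotheses.
    """
    K = len(priors)
    message = {s: 0.0 for s in omega}
    visited = 0
    for d in itertools.product(omega, repeat=K):
        visited += 1
        # Unnormalised Gaussian likelihood p(r|d) for r = H d + noise (assumed)
        err = sum(abs(r[n] - sum(H[n][i] * d[i] for i in range(K))) ** 2
                  for n in range(len(r)))
        likelihood = math.exp(-err / noise_var)
        prior_others = math.prod(priors[i][d[i]] for i in range(K) if i != k)
        message[d[k]] += likelihood * prior_others
    return message, visited

# Toy setting; every number below is an illustrative assumption
omega = (-1, 1)
H = [[1.0, 0.3, 0.2], [0.2, 1.0, 0.4], [0.1, 0.3, 1.0]]
d_true = (1, -1, 1)
r = [sum(H[n][i] * d_true[i] for i in range(3)) for n in range(3)]
priors = [{-1: 0.5, 1: 0.5} for _ in range(3)]

message, visited = exact_message(r, H, omega, priors, k=0, noise_var=0.5)
```

The loop visits $|\Omega|^K$ hypotheses (8 here), so adding users or enlarging the constellation quickly makes this direct evaluation impractical.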
1.4
Variational Inference
The variational method is a systematic method to approximate a complicated probability distribution. When applied to inferring the marginal distributions (or likelihood functions) of hidden variables in a statistical model, it is called variational inference. As this thesis is concerned with the practical application of variational inference methods specifically for wireless detection and estimation, we will not attempt a general and in-depth introduction. A more thorough survey, including various interpretations of variational inference and its connection to statistical physics, can be found in [1], [25], and [26]. For illustration purposes, we now provide a rudimentary example of variational inference using a simple communications model with K unknown inputs and one observed output. Assume E to be the observed output due to inputs $A_1, A_2, \cdots, A_K$ over a channel $p(E|\{A_k\}_1^K)$. If $\{A_k\}_1^K$ are to be inferred based on the evidence E, the distribution of interest is $p(\{A_k\}_1^K|E)$. Variational inference allows us, whenever the direct evaluation of $p(\{A_k\}_1^K|E)$ is computationally intractable, to resort to a tractable approximation of $p(\{A_k\}_1^K|E)$, written as $Q(\{A_k\}_1^K)$, where the constant E is omitted.
A good approximation $Q(\{A_k\}_1^K)$ needs to resemble $p(\{A_k\}_1^K|E)$ as closely as possible, and the Kullback-Leibler divergence $D\big(Q(\{A_k\}_1^K)\,\|\,p(\{A_k\}_1^K|E)\big)$ offers an excellent measure of similarity. However, since $p(\{A_k\}_1^K|E)$ is difficult to calculate as we have assumed, we may use an equivalent alternative by replacing it with $p(\{A_k\}_1^K, E) = p(E|\{A_k\}_1^K)\,p(\{A_k\}_1^K)$, which is proportional to $p(\{A_k\}_1^K|E)$ and is called the complete likelihood function. The variational free energy (or Gibbs free energy) is therefore defined as follows, equaling $D\big(Q(\{A_k\}_1^K)\,\|\,p(\{A_k\}_1^K|E)\big)$ up to an additive constant:
\[ \mathcal{F} = \int Q(\{A_k\}_1^K) \log \frac{Q(\{A_k\}_1^K)}{p(E|\{A_k\}_1^K)\,p(\{A_k\}_1^K)}\, dA_1\, dA_2 \cdots dA_K. \tag{1.5} \]

⁴ It is possible to carry out the SP algorithms for multiuser detection with the help of some approximations that lower the computational complexity [24], but this results in significant performance degradation. We will revisit the SP algorithm in Chapter 4, which will help shed light on how to generate messages for joint detection and decoding using the turbo principle.
Notice that in (1.5), $\{A_k\}_1^K$ are assumed to be continuous random variables. It is also possible that $\{A_k\}_1^K$ are discrete, in which case the integration would become a summation:
\[ \mathcal{F} = \sum_{A_1,\cdots,A_K} Q(\{A_k\}_1^K) \log \frac{Q(\{A_k\}_1^K)}{p(E|\{A_k\}_1^K)\,p(\{A_k\}_1^K)}. \tag{1.6} \]
If we do not place any constraints on $Q(\{A_k\}_1^K)$, by minimizing the variational free energy over $Q(\{A_k\}_1^K)$, we obtain $Q(\{A_k\}_1^K) = p(\{A_k\}_1^K|E)$ and $\mathcal{F} = -\log p(E)$. This particular value of $\mathcal{F}$ is called the exact Gibbs free energy since $Q(\{A_k\}_1^K)$ attains the exact posterior. However, this $Q(\{A_k\}_1^K)$ is not particularly useful, since $p(\{A_k\}_1^K|E)$ is intractable by assumption.
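As a small numerical check of (1.6), the sketch below evaluates the free energy for two binary inputs. The table `joint`, which holds the values of $p(E|\{A_k\})p(\{A_k\})$ for one fixed observation E, contains made-up numbers chosen only for illustration.

```python
import math

def free_energy(Q, joint):
    """F = sum_A Q(A) log( Q(A) / (p(E|A) p(A)) ), as in (1.6)."""
    return sum(q * math.log(q / joint[a]) for a, q in Q.items() if q > 0)

# joint[a] = p(E|a) p(a) for a fixed observation E (assumed toy values)
joint = {(0, 0): 0.30, (0, 1): 0.05, (1, 0): 0.10, (1, 1): 0.15}
evidence = sum(joint.values())                        # p(E)
posterior = {a: v / evidence for a, v in joint.items()}
uniform = {a: 0.25 for a in joint}

F_exact = free_energy(posterior, joint)    # Q equal to the exact posterior
F_uniform = free_energy(uniform, joint)    # any other Q gives a larger value
```

With Q set to the exact posterior the free energy equals $-\log p(E)$; for the uniform Q it exceeds that value by the Kullback-Leibler divergence between Q and the posterior.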
The versatility of the variational inference approach comes in when we try to parameterize $Q(\{A_k\}_1^K)$ by assuming that it comes from a restricted family of distributions (for example, a Gaussian). As such, we may find it much easier to obtain a closed-form expression for $Q(\{A_k\}_1^K)$ which approximates $p(\{A_k\}_1^K|E)$ well. The variational free energy obtained this way is an approximation to the exact Gibbs free energy, and is lower bounded by the exact Gibbs free energy.
To emphasize the fact that $Q(\{A_k\}_1^K)$ belongs to a simpler ensemble of parameterized distributions, we may alternatively write $Q(\{A_k\}_1^K)$ as $Q(\{A_k\}_1^K; \lambda)$, indicating that this distribution is parameterized by a set of parameters, represented by λ. Consequently, the task of variational inference becomes the task of optimizing $\mathcal{F}(\lambda)$ over λ.
We now provide an outline of the general procedure for deriving marginal distributions through variational free energy minimization (VFEM):
1. Postulation: Assume a postulated distribution⁵ $Q(\{A_k\}_1^K; \lambda)$;
2. Evaluation: Derive a closed-form expression for $\mathcal{F}(\lambda)$;
3. Optimization: Minimize $\mathcal{F}(\lambda)$ (exactly or iteratively) over λ.

⁵ Later on we will show that $p(\{A_k\}_1^K)$ and $p(E|\{A_k\}_1^K)$ may be adjusted to suit our needs as well.
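The three VFEM steps can be sketched on a toy problem with two binary inputs. The assumptions here are a fully factorized postulated distribution with parameters λ = (q1, q2), a made-up table `joint` of $p(E|\{A_k\})p(\{A_k\})$ values, and a plain grid search standing in for the optimization step; none of these choices comes from the thesis.

```python
import math

# Toy complete likelihood p(E|a1, a2) p(a1, a2) for a fixed E (assumed values)
joint = {(0, 0): 0.30, (0, 1): 0.05, (1, 0): 0.10, (1, 1): 0.15}

def free_energy(lam):
    """Step 2 (evaluation): F(lambda) for the postulated Q below.

    Step 1 (postulation): Q(a1, a2) = Q1(a1) Q2(a2), with
    Q1(a1 = 1) = q1, Q2(a2 = 1) = q2 and lambda = (q1, q2).
    """
    q1, q2 = lam
    F = 0.0
    for (a1, a2), p in joint.items():
        q = (q1 if a1 else 1 - q1) * (q2 if a2 else 1 - q2)
        if q > 0:
            F += q * math.log(q / p)
    return F

# Step 3 (optimization): exhaustive search over a grid of lambda values
grid = [i / 100 for i in range(1, 100)]
lam_star = min(((a, b) for a in grid for b in grid), key=free_energy)
F_star = free_energy(lam_star)
F_exact = -math.log(sum(joint.values()))   # exact Gibbs free energy
```

Because the restricted family cannot represent the dependence between the two inputs, the minimized value `F_star` stays above `F_exact`, in line with the lower bound stated above.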
Note that we have now transformed the general inference problem into a well-defined optimization problem, with the variational free energy as the objective function. One important instance of the variational method arises when $Q(A_1, A_2, \cdots, A_K)$ is assumed to be factorizable as $\prod_{k=1}^{K} Q_k(A_k)$ (we shall hereafter omit the subscript on Q for ease of notation), and we find independent functions $Q(A_k)$ that minimize the free energy. This factorization of a distribution and the independence assumption associated with it is referred to as the mean-field approximation. The mean-field approximation is rather crude, since it discards all posterior dependence among the variables. One way to make the approximation closer to the exact distribution is to introduce two-node beliefs in addition to one-node beliefs (allowing joint densities of pairs of variables in the postulated posterior distribution), i.e., writing the postulated posterior distribution as:
\[ Q(A_1, A_2, \cdots, A_K) = \prod_{(i,j)} Q(A_i, A_j) \prod_{i} [Q(A_i)]^{1-q_i}, \tag{1.7} \]
where (i, j) indicates connected variable nodes i and j in the factor graph, and $q_i$ is the number of nodes that are connected to node i. Postulating the posterior distribution using (1.7) is called the Bethe approximation [27, 28]. It is of special importance because when the factor graph is tree-like, the Bethe approximation is exact, meaning that minimizing the variational free energy given the Bethe approximation leads to the exact marginal distributions. This is much like the SP algorithm, which is also exact when the factor graph is tree-like. In fact, this similarity is no coincidence. The connection between variational inference and the SP algorithm is an intricate one: as Yedidia et al. showed, variational inference with the Bethe approximation and the SP algorithm are equivalent [29]. The SP algorithm can be seen as an efficient algorithm to minimize the Bethe free energy subject to the normalization conditions of the postulated distributions. When higher-order beliefs (as opposed to only one-node or two-node beliefs) are incorporated, one obtains increasingly accurate approximations to the exact Gibbs free energy after the minimization of the variational free energy, even when the factor graph is loopy. This is called the Kikuchi approximation. Utilizing the belief propagation understanding of the Bethe approximation, the Kikuchi approximation may be transformed into generalized belief propagation [29], which has been shown to improve dramatically upon conventional BP for some probabilistic models.
1.5
Organization and Contribution of the Thesis
Having taken a bird’s eye view of variational inference at an abstract level, it should be clear that the variational inference framework is immensely versatile and powerful. Yet the question of primary importance remains whether this tool is suitable for the particular problems of interest in this thesis. In the subsequent chapters, we will try to provide a positive answer to that. We will show that variational inference is indeed capable of generating efficient, low-complexity algorithms to undertake some very challenging detection/estimation tasks, even without using complicated techniques such as the Bethe or Kikuchi approximation. The organization of these chapters and the major contributions as a result of this endeavour are summarized below.
1.5.1
Part I: Mitigation of Frequency and Phase Distortion in OFDM
As described in Section 1.1, to alleviate the detrimental effects of CFO and PHN, a holistic receiver design must be implemented, with the CFO and PHN taken into account at both the channel estimation stage and the data detection stage. As the first step of the solution, in Chapter 2 we propose an optimal (maximum a posteriori) joint estimator for the channel impulse response, CFO and PHN. As such, accurate estimates of the channel and CFO can be obtained in the presence of PHN, and compensated for, before the subsequent data detection stage. PHN, being time-varying, cannot be removed based on the estimates in the channel estimation stage. Therefore, as the second step of the proposed solution, in Chapter 3 we derive a near-optimal joint estimator of the random PHN and the desired data signal through variational inference. This approach creates a paradigm shift for problems of this kind, where rather than generating “hard” estimates of the data and PHN profile, probability distributions for both data and PHN are found at the estimator output, which may be utilized for soft-input error control decoding.
1.5.2
Part II: Soft-In Soft-Out Detection in Multiple Access Channels
In Chapter 4 we explore the general problem of joint multiuser detection and decoding from the variational inference viewpoint. We concentrate on the design of soft-in soft-out detectors suitable for turbo receivers. In the absence of a general guiding principle, existing methods often do not allow for insights into design flaws and hence avenues for improvement. This thesis presents a comprehensive theory, centered on variational inference, that underlies SISO
detector designs. A unified treatment of this kind presents rigorous justifications for numerous detectors that were proposed on radically different grounds, and illuminates new and better alternatives. The framework can be applied to combat multiple access interference (MAI), inter-symbol interference (ISI), and interference in the multiple-antenna (MIMO) environment. We extend this framework to SISO detection of Gray coded multilevel modulations in Chapter 5. The conventional approach requires the detection of channel symbols first before converting them back to bits for decoding. To avoid the performance loss inherent in such a two-step process, we introduce a bit-level strategy, where the soft information about the bits that make up the symbols is directly estimated. This method combines the detection and demapping operations into one, thereby outperforming the symbol-based method substantially in certain channel conditions.
1.6
Notations
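Upper and lower case bold face letters indicate matrices and column vectors, respectively; $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively; $\mathbf{1}$ and $\mathbf{0}$ represent the all-one and all-zero column vectors, respectively; $\mathbf{X} \circ \mathbf{Y}$ stands for the Schur product (element-wise product) of matrices $\mathbf{X}$ and $\mathbf{Y}$; the notation $\not\prod_{n=1}^{N} \mathbf{X}_n$ indicates a series of Schur products, i.e., $\mathbf{X}_1 \circ \mathbf{X}_2 \circ \cdots \circ \mathbf{X}_N$; $\mathrm{diag}(\mathbf{x})$ is a diagonal matrix with the vector $\mathbf{x}$ on its diagonal; $\mathrm{diag}(\mathbf{X})$ is a diagonal matrix with the diagonal elements of square matrix $\mathbf{X}$ on its diagonal; $E[\mathbf{x}]$ stands for the expected value of a random vector $\mathbf{x}$, and $\mathrm{cov}[\mathbf{x}, \mathbf{y}] = E\big[(\mathbf{x} - E[\mathbf{x}])(\mathbf{y} - E[\mathbf{y}])^T\big]$; $V[\mathbf{x}] = \mathrm{cov}[\mathbf{x}, \mathbf{x}]$ denotes the covariance matrix of $\mathbf{x}$; $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathcal{CN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ represent respectively the probability density functions of real and circularly symmetric complex Gaussian random vectors with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. In particular, for an N-dimensional circularly symmetric complex Gaussian random vector $\mathbf{x}$,
\[ \mathcal{CN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\pi^N |\boldsymbol{\Sigma}|} \exp\big(-(\mathbf{x} - \boldsymbol{\mu})^H \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\big). \tag{1.8} \]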
Part I
Mitigation of Frequency and Phase Distortion in OFDM
Chapter 2
Channel Estimation in the Presence of Frequency Offset and Phase Noise
This chapter aims to obtain accurate channel estimates in the presence of frequency and phase distortions in preparation for the subsequent data detection stage. This is crucial because, although accurate channel estimates are easily obtained under the assumption of perfect phase and frequency synchronization, PHN and CFO create substantial inter-carrier interference that a conventional OFDM channel estimator cannot account for. In this chapter we will introduce an optimal (maximum a posteriori) joint estimator for the channel impulse response (CIR), CFO and PHN, utilizing prior statistical knowledge of PHN that can be obtained from measurements or data sheets. In addition, in cases where a training symbol consists of two identical halves in the time domain, we propose an algorithm that optimally removes the effect of PHN with lower complexity than with a non-repeating training symbol. To further reduce the complexity, simplified implementations based on the conjugate gradient (CG) method are also introduced to realize the proposed algorithms using the Fast Fourier Transform (FFT) with only minor performance degradation.
2.1
Introduction
In [30], the effect of PHN on the system performance was studied and it was found that OFDM is orders of magnitude more sensitive to PHN than a single carrier system. Tomba [31] provides a more detailed treatment of the OFDM error probability in the presence of PHN for different modulation schemes. Other works that have documented the detrimental effects of frequency
offset and phase noise are numerous [32–38]. However, the successful alleviation of these combined problems based on a statistically optimal receiver implementation, which must include channel estimation in the presence of CFO and PHN, has not been proposed. In most cases, the channel frequency response was assumed to be known prior to PHN suppression, which is obviously an over-simplified assumption. Recognizing the importance of obtaining accurate channel estimates in the presence of PHN, some earlier research was conducted, but the resulting solutions are rather limited. In [39], PHN was considered in the formulation of the channel estimation problem but was not directly used in the solution, and thus the method is not statistically optimal. In [40], channel estimation was performed, but the PHN was first estimated using at least one “carrier recovery” pilot tone that required frequency guard bands on both sides to minimize interference from data symbols. Only specific frequency selective channels were considered in the simulations, so the performance in more general Rayleigh frequency selective channels is unknown, but it is expected that performance will degrade if a channel null should occur in the vicinity of the pilot tone. In [41], PHN was estimated using pilot symbols based on a linearized parametric model, but it is suboptimal because the inter-carrier interference (ICI) introduced by the PHN in the received signal is ignored. In [42], a precoding method was proposed which used null guard bands in the frequency domain. The method is designed for band-limited multiplicative effects but was also tested on the Wiener phase noise model with some success. However, it requires expensive operations such as singular value decomposition (SVD), as well as null guard bands which reduce spectral efficiency.
In this chapter, our goal is to tackle the channel estimation problem when CFO and PHN are present with optimality in mind, in particular, through the maximization of a “complete likelihood function”¹. Special features of the complete likelihood function are taken advantage of to enable a unique and elegant joint estimation scheme achieving statistically optimal performance. The rest of the chapter will be organized as follows: Section 2.2 discusses the power spectral density (PSD) of the VCO output in connection with the prior distribution of the PHN process and presents the signal model of a CFO/PHN channel. Section 2.3 derives the Joint CFO/PHN/CIR Estimator (JCPCE), which performs accurate channel estimation even with CFO and PHN impairments. Section 2.4 considers a special case when repeating training symbols are available and introduces a variant to Moose’s CFO estimation algorithm [44] which optimally rejects the effects of PHN. The channel impulse response is subsequently estimated by optimally cancelling the remaining PHN. Section 2.5 presents simulations of the proposed algorithms and tests their robustness in a wide range of scenarios. Section 2.6 contains the summary of the chapter.

¹ The complete likelihood function is the joint distribution of the observed variables (e.g., $\mathbf{r}$) and latent variables (e.g., $\epsilon$, $\boldsymbol{\theta}$, $\mathbf{g}$) in the probabilistic model [43, ch. 11].
2.2
System Description
2.2.1
Prior Statistics of Phase Noise
Two different models of PHN are available in the literature [32]. The first one models a free-running oscillator and assumes the PHN process to be a Wiener process that is nonstationary, with its power growing with time. The second one models an oscillator controlled by a phase-locked loop (PLL) and approximates the PHN process as a zero-mean coloured Gaussian process that is wide-sense stationary (WSS) and has finite power. In this chapter, our solution covers both scenarios. For simplicity, we will refer to the first one as Wiener PHN and the second one as Gaussian PHN. In both cases, denoting the phase noise process at the output of the VCO by $\theta(t)$, the samples of $\theta(t)$ within an OFDM symbol, $\boldsymbol{\theta}$, have a multivariate Gaussian prior distribution: $p(\boldsymbol{\theta}) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Phi})$, where the samples are taken at a rate of N/T samples per second, N is the number of OFDM sub-carriers, and T is the period of the OFDM symbol. For this model to be useful, however, the covariance matrix, $\boldsymbol{\Phi}$, must be available. In the rest of this section we explain how $\boldsymbol{\Phi}$ can be determined from the power spectral density of the VCO output. We first write the output of the VCO with PHN as² $s(t) = e^{j(2\pi f_o t + \theta(t))}$. Then the autocorrelation function of $s(t)$, $R_s(\tau)$, can be calculated:
\[ R_s(\tau) \doteq E\{s^*(t)s(t+\tau)\} = e^{j2\pi f_o \tau} E\{e^{j(\theta(t+\tau) - \theta(t))}\}. \tag{2.1} \]
We will now briefly describe the autocorrelation functions of $s(t)$ for Wiener and Gaussian PHN respectively.

² This is equivalent to considering real-valued sinusoids. See [45, pgs. 369–372] for details.
Wiener Phase Noise

The Wiener PHN process is $\theta(t) = \int_0^t \phi(\tau)\, d\tau$, where $\phi(t)$ is a zero-mean stationary Gaussian process with autocorrelation function $R_\phi(\tau) = 4\pi\beta_l \delta(\tau)$. It is known that [30]
\[ R_s(\tau) = e^{j2\pi f_o \tau} e^{-2\pi\beta_l |\tau|}, \qquad S_s(f) = \frac{1}{\pi\beta_l}\, \frac{1}{1 + ((f - f_o)/\beta_l)^2}, \tag{2.2} \]
where $S_s(f) = \mathcal{F}\{R_s(\tau)\}$ denotes the power spectral density. So $\beta_l$ is the 3 dB bandwidth of the VCO output, which can be easily measured using a spectrum analyzer. The discrete-time samples of $\theta(t)$ form a random-walk process $\theta_n = \theta_{n-1} + \phi_n$, $n = 0, \cdots, N-1$, where $p(\phi_n) = \mathcal{N}(0, \alpha_\phi^2)$, $\alpha_\phi^2 = 4\pi\beta_l T/N$. Assuming $\theta_{-1} = 0$ due to perfect synchronization at the beginning of the OFDM symbol, the Gaussian-distributed PHN vector $\boldsymbol{\theta} = [\theta_0, \cdots, \theta_{N-1}]^T$ has a covariance matrix
\[ \boldsymbol{\Phi} = \alpha_\phi^2 \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 2 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & \cdots & N \end{bmatrix}. \tag{2.3} \]
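The structure of (2.3) follows from writing each sample as a cumulative sum of independent increments, so that entry (i, j) equals $\alpha_\phi^2 \min(i, j)$ for 1-based indices. The sketch below builds the matrix both ways; the function names and the numerical values of $\beta_l$, T and N are assumptions chosen only for illustration.

```python
import math

def wiener_phn_covariance(beta_l, T, N):
    """Phi from (2.3): entry (i, j) is alpha^2 * min(i, j) for 1-based i, j."""
    alpha2 = 4 * math.pi * beta_l * T / N      # variance of each increment phi_n
    return [[alpha2 * min(i, j) for j in range(1, N + 1)]
            for i in range(1, N + 1)]

def covariance_from_random_walk(beta_l, T, N):
    """Same matrix built as alpha^2 * S S^T, S being the cumulative-sum matrix."""
    alpha2 = 4 * math.pi * beta_l * T / N
    S = [[1.0 if m <= n else 0.0 for m in range(N)] for n in range(N)]
    return [[alpha2 * sum(S[i][m] * S[j][m] for m in range(N))
             for j in range(N)] for i in range(N)]

# Illustrative numbers (assumed): 1 kHz linewidth, 4 microsecond symbol, N = 8
Phi = wiener_phn_covariance(beta_l=1e3, T=4e-6, N=8)
Phi_check = covariance_from_random_walk(beta_l=1e3, T=4e-6, N=8)
```

The diagonal grows linearly with the sample index, which reflects the nonstationary, growing power of the Wiener PHN within the symbol.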
Gaussian Phase Noise

In this case $\theta(t)$ is modelled as a stationary random process with autocorrelation function $R_\theta(\tau)$. It is known that [46]
\[ R_s(\tau) = e^{j2\pi f_o \tau} e^{-R_\theta(0)} e^{R_\theta(\tau)} \approx e^{j2\pi f_o \tau} e^{-R_\theta(0)} (1 + R_\theta(\tau)), \tag{2.4} \]
where the approximation is tight when $R_\theta(0) \ll 1$ (since $|R_\theta(\tau)| \leq R_\theta(0)$). This is a common assumption made about the PHN process in coherent receivers. Exact analysis of the power spectral density is found in [46], but for our purposes it is enough to use the above approximation to obtain
\[ S_s(f) = e^{-R_\theta(0)} [\delta(f - f_o) + S_\theta(f - f_o)], \tag{2.5} \]
where $S_\theta(f) = \mathcal{F}\{R_\theta(\tau)\}$. The shape of $S_s(f)$ may be measured by a spectrum analyzer or provided as part of the VCO specifications (phase noise masks are commonly known), and hence $S_\theta(f)$ and $R_\theta(\tau)$ can be found. Finally, the value on the ith row and jth column of $\boldsymbol{\Phi}$ is
\[ \Phi_{i,j} = R_\theta\!\left(\frac{T|i-j|}{N}\right), \tag{2.6} \]
since T/N is the sampling period. In the subsequent derivations, since both types of PHN can be sufficiently characterized by the covariance matrix $\boldsymbol{\Phi}$, we shall not distinguish between the two unless required.

[Figure 2.1: OFDM packet structure. A preamble section (AGC & synchronization, followed by channel estimation) precedes a payload section carrying data in a variable number of OFDM symbols.]
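Equation (2.6) amounts to sampling the autocorrelation function on the lag grid T|i−j|/N, which yields a symmetric Toeplitz matrix. The sketch below illustrates this; the exponential form of `R_theta`, its power and its bandwidth are hypothetical stand-ins for a measured autocorrelation and are not taken from the thesis.

```python
import math

def gaussian_phn_covariance(R_theta, T, N):
    """Phi from (2.6): entry (i, j) is R_theta(T * |i - j| / N)."""
    return [[R_theta(T * abs(i - j) / N) for j in range(N)] for i in range(N)]

def R_theta(tau, power=0.01, bandwidth=5e4):
    """Hypothetical autocorrelation: exponential decay, R_theta(0) = power."""
    return power * math.exp(-2 * math.pi * bandwidth * abs(tau))

# Illustrative numbers (assumed): 4 microsecond symbol, N = 8 samples
Phi = gaussian_phn_covariance(R_theta, T=4e-6, N=8)
```

In practice `R_theta` would be obtained from the measured or specified $S_\theta(f)$ by an inverse Fourier transform, as described above.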
2.2.2
Signal Model
We consider a slow-fading frequency-selective channel where the channel impulse response is assumed to remain constant during each packet of transmission, which consists of multiple OFDM symbols including the initial preambles for synchronization and channel estimation as well as the variable-length payload that follows (as depicted in Fig. 2.1). Assuming perfect timing synchronization, the complex baseband received signal of an OFDM symbol within the training period sampled at rate N/T can be written as:
\[ r_n = \frac{1}{\sqrt{N}}\, e^{j(\theta_n + 2\pi\epsilon n/N)} \sum_{k=0}^{N-1} h_k d_k e^{j2\pi nk/N} + \eta_n, \quad n = 0, \cdots, N-1, \tag{2.7} \]
where $\epsilon = \Delta f\, T$ is the normalized CFO; $\{\theta_n\}_{n=0}^{N-1}$ is the discrete-time PHN sequence; $\{h_k\}_{k=0}^{N-1}$ is the channel frequency response at subcarriers 0 to N−1; $\{d_k\}_{k=0}^{N-1}$ are the transmitted data symbols belonging to an M-QAM constellation; and $\{\eta_n\}_{n=0}^{N-1}$ is circularly symmetric complex white Gaussian noise with variance $\sigma^2$ per dimension. Equation (2.7) may be written in matrix form as:
\[ \mathbf{r} = \mathbf{E}\mathbf{P}\mathbf{F}^H \mathbf{H}\mathbf{d} + \mathbf{n}, \tag{2.8} \]
where $\mathbf{F} \in \mathbb{C}^{N \times N}$ is the DFT matrix with the (l, m)th element being $F_{l,m} = \frac{1}{\sqrt{N}} e^{-j2\pi(l-1)(m-1)/N}$; $\mathbf{d} = [d_0, \cdots, d_{N-1}]^T$ is the data vector; $\mathbf{n} = [\eta_0, \cdots, \eta_{N-1}]^T$ is the noise vector with distribution $p(\mathbf{n}) = \mathcal{CN}(\mathbf{0}, 2\sigma^2\mathbf{I})$; $\mathbf{P} = \mathrm{diag}([e^{j\theta_0}, \cdots, e^{j\theta_{N-1}}]^T)$ is the PHN matrix; $\mathbf{E} = \mathrm{diag}([1, e^{j2\pi\epsilon/N}, \cdots, e^{j2\pi(N-1)\epsilon/N}]^T)$ is the CFO matrix; and $\mathbf{H} = \mathrm{diag}(\mathbf{h}) = \mathrm{diag}([h_0, \cdots, h_{N-1}]^T)$ is the channel matrix. Notice that although a full OFDM symbol contains $N_g + N$ time samples, $N_g$ being the length of the cyclic prefix, in this signal model we assume the cyclic prefix has been removed and so there are only N samples per OFDM symbol. We may rewrite (2.8) as
\[ \mathbf{r} = \mathbf{E}\mathbf{P}\mathbf{G}\mathbf{F}^H \mathbf{d} + \mathbf{n}, \tag{2.9} \]
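where $\mathbf{G}$ is defined as $\mathbf{G} = \mathbf{F}^H\mathbf{H}\mathbf{F}$ and is a circulant matrix. Using $\mathbf{g} = [g_0, \cdots, g_{L-1}]^T$ to denote the channel impulse response, where L is the channel length, the channel impulse response can be converted to the channel frequency response by writing $\mathbf{h} = \mathbf{W}\mathbf{g} \in \mathbb{C}^{N \times 1}$. Note that $\mathbf{g}/\sqrt{N}$ is the true channel impulse response, but for simplicity we shall also assign the name to $\mathbf{g}$. $\mathbf{W}$ is a partition of the DFT matrix, i.e.,
\[ \mathbf{F} = [\mathbf{W} \,|\, \mathbf{V}], \tag{2.10} \]
in which $\mathbf{W} \in \mathbb{C}^{N \times L}$ and $\mathbf{V} \in \mathbb{C}^{N \times (N-L)}$ have orthonormal columns and satisfy $\mathbf{W}^H\mathbf{V} = \mathbf{0}$ and $\mathbf{W}\mathbf{W}^H + \mathbf{V}\mathbf{V}^H = \mathbf{I}$.
Let $\mathbf{D} = \mathrm{diag}(\mathbf{d})$. We can now introduce the following equivalent representation of (2.8) for the convenience of channel estimation:
\[ \mathbf{r} = \mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g} + \mathbf{n}. \tag{2.11} \]
2.2.3
Role of Channel Estimation in Receiver Design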
The transmitter/receiver structure, and the channel model over a period of one OFDM symbol, taking into account the distortion caused by CFO and PHN, is illustrated in Fig. 2.2. In many OFDM standards, such as IEEE 802.11a and HiperLAN2, there are mainly two types of physical layer symbols: the preamble symbols and the payload symbols. Issues such as automatic gain control (AGC), frequency offset correction and timing synchronization are resolved by the earlier portion of preamble symbols, while fine frequency tuning and channel estimation are performed by the later portion of preamble symbols. Timing synchronization is commonly performed using autocorrelation based metrics [47, 48] which are insensitive to CFO and PHN disturbances. At the receiver, when the preamble symbols used for channel estimation are received, it is assumed
that perfect timing synchronization is achieved and the d vector is exactly known. On the other hand, when the payload is received, it can be assumed that the channel is known (except for possible fine tuning using the embedded pilot symbols) and the data symbols are to be detected. A diagram exemplifying this packet structure is given in Fig. 2.1. From (2.11), in the absence of the CFO term E and PHN term P, it is easy to obtain the optimal estimate of g given the training symbols d and the received signal r in the channel estimation phase. In the data detection phase, we assume perfect knowledge of g and simply extract the maximum-likelihood estimate of d. Unfortunately, this simple procedure is not possible in a CFO/PHN channel, as the presence of E and P first renders the channel estimate inaccurate and later (through a different PHN sequence) impairs the performance of data detection. This chapter will focus on obtaining accurate channel and CFO estimates in the presence of PHN. Our design methodology is as follows: we will start by introducing an optimal joint CFO/PHN/CIR estimator. Then, we seek means to decrease the complexity of the optimal estimator by transmitting repeating training symbols and also by using the conjugate gradient method. We will deal with the data detection problem in the next chapter, where, with accurate estimates of the channel and CFO (which are quasi-static), the data detection stage only suffers from the unknown PHN distortion (which is time-varying).

[Figure 2.2: OFDM transmitter/receiver structure and phase noise channel model. The channel is estimated using the proposed Joint CFO/PHN/CIR Estimator (JCPCE).]
2.3
Channel Estimation with CFO and PHN
2.3.1
Joint CFO/PHN/CIR Estimator (JCPCE)
An OFDM channel estimator may either estimate the channel frequency response (h) or the channel impulse response (g). In this work, we shall assume that we are always interested in the channel impulse response g, since its lower dimensionality leads to welcome computational savings as well as lower variance for the obtained estimates. Furthermore, we assume that the tap length of the impulse response (or equivalently, the dimension of g), L, is known perfectly a priori³. Looking at (2.11), it is obvious that the optimal estimates for E, P and g are coupled and in general difficult to obtain. But as the following derivation shows, we are very fortunate in this case as the joint optimization problem can in fact be decoupled.

³ In practice, if L is over-estimated, the mean-squared error of the channel estimate would degrade by a constant amount. On the other hand, if L is significantly under-estimated, the channel estimation may fail catastrophically.
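Taking the Bayesian approach, we first write the “complete likelihood function” $p(\mathbf{r}, \epsilon, \boldsymbol{\theta}, \mathbf{g}) = p(\mathbf{r}|\epsilon, \boldsymbol{\theta}, \mathbf{g})\, p(\epsilon)\, p(\boldsymbol{\theta})\, p(\mathbf{g})$, which is proportional to the a posteriori distribution $p(\epsilon, \boldsymbol{\theta}, \mathbf{g}|\mathbf{r})$. $p(\epsilon)$ and $p(\mathbf{g})$ are constants (representing non-informative priors) as no prior knowledge of $\epsilon$ and $\mathbf{g}$ is assumed. Also, we have assumed in Section 2.2.1 that the prior distribution of $\boldsymbol{\theta}$ is $\mathcal{N}(\mathbf{0}, \boldsymbol{\Phi})$, where $\boldsymbol{\Phi}$ is known. The “complete negative log-likelihood function” can therefore be written, up to an additive constant, as
\[ L(\epsilon, \boldsymbol{\theta}, \mathbf{g}) = -\log p(\mathbf{r}|\epsilon, \boldsymbol{\theta}, \mathbf{g}) - \log p(\boldsymbol{\theta}) = \frac{1}{2\sigma^2} (\mathbf{r} - \mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g})^H (\mathbf{r} - \mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g}) + \frac{1}{2} \boldsymbol{\theta}^T \boldsymbol{\Phi}^{-1} \boldsymbol{\theta}. \tag{2.12} \]
Our objective is to find the optimal estimates
\[ (\hat{\epsilon}, \hat{\boldsymbol{\theta}}, \hat{\mathbf{g}}) = \arg\min_{\epsilon, \boldsymbol{\theta}, \mathbf{g}} L(\epsilon, \boldsymbol{\theta}, \mathbf{g}). \tag{2.13} \]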
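Forward Substitution
Solving $\partial L(\epsilon, \boldsymbol{\theta}, \mathbf{g})/\partial \mathbf{g}^* = 0$ produces the optimal channel estimate in terms of $\epsilon$ and $\boldsymbol{\theta}$
$$\hat{\mathbf{g}} = (\mathbf{W}^H\mathbf{D}^H\mathbf{D}\mathbf{W})^{-1}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\mathbf{r}. \tag{2.14}$$
Note that when $\mathbf{E}\mathbf{P} = \mathbf{I}$, this is the expression for the conventional maximum-likelihood channel estimator without PHN and CFO. It shall be assumed hereafter that constant-modulus training symbols are used, i.e., $\mathbf{D}^H\mathbf{D} = 2\rho^2\mathbf{I}$. This assumption is reasonable for a practical channel estimator, and it simplifies the expressions in subsequent derivations (see footnote 4). This assumption leads to
$$\hat{\mathbf{g}} = (2\rho^2)^{-1}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\mathbf{r}. \tag{2.15}$$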
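Noticing that
$$\begin{aligned}
\mathbf{r} - \mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{W}\hat{\mathbf{g}} &= \mathbf{r} - (2\rho^2)^{-1}\mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\mathbf{r} \\
&= \mathbf{r} - (2\rho^2)^{-1}\mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}(\mathbf{I} - \mathbf{V}\mathbf{V}^H)\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\mathbf{r} \\
&= \mathbf{r} - \left(\mathbf{I} - (2\rho^2)^{-1}\mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{V}\mathbf{V}^H\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\right)\mathbf{r} \\
&= (2\rho^2)^{-1}\mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{V}\mathbf{V}^H\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\mathbf{r},
\end{aligned} \tag{2.16}$$
and substituting (2.16) into (2.12), we have after simplification
$$\begin{aligned}
L(\epsilon, \boldsymbol{\theta}) &= \frac{1}{4\sigma^2\rho^2}\mathbf{r}^H\mathbf{E}\mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{V}\mathbf{V}^H\mathbf{D}^H\mathbf{F}\mathbf{P}^H\mathbf{E}^H\mathbf{r} + \frac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta} \\
&= \frac{1}{4\sigma^2\rho^2}\mathbf{u}^T\mathbf{E}\underbrace{\mathbf{R}^H\mathbf{F}^H\mathbf{D}\mathbf{V}}_{\mathbf{C}}\underbrace{\mathbf{V}^H\mathbf{D}^H\mathbf{F}\mathbf{R}}_{\mathbf{C}^H}\mathbf{E}^H\mathbf{u}^* + \frac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta},
\end{aligned} \tag{2.17}$$
where $\mathbf{R} = \mathrm{diag}(\mathbf{r})$ and $\mathbf{u} = [e^{j\theta_0}, \cdots, e^{j\theta_{N-1}}]^T$. Realizing that for small $\boldsymbol{\theta}$, $\mathbf{u} \approx \mathbf{1} + j\boldsymbol{\theta}$ and letting $\mathbf{C} = \mathbf{R}^H\mathbf{F}^H\mathbf{D}\mathbf{V}$,
$$\begin{aligned}
L(\epsilon, \boldsymbol{\theta}) &\approx \frac{1}{4\sigma^2\rho^2}(\mathbf{1} + j\boldsymbol{\theta})^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H(\mathbf{1} - j\boldsymbol{\theta}) + \frac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta} \\
&= \frac{1}{4\sigma^2\rho^2}\left[\boldsymbol{\theta}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\boldsymbol{\theta} + 2\sigma^2\rho^2\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta} - j\mathbf{1}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\boldsymbol{\theta} + j\boldsymbol{\theta}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\mathbf{1} + \mathbf{1}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\mathbf{1}\right] \\
&= \frac{1}{4\sigma^2\rho^2}\left[\boldsymbol{\theta}^T\mathrm{Re}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)\boldsymbol{\theta} + 2\sigma^2\rho^2\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta} - 2\boldsymbol{\theta}^T\mathrm{Im}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)\mathbf{1} + \mathbf{1}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\mathbf{1}\right].
\end{aligned} \tag{2.18}$$
Note that the last equality holds for real valued $\boldsymbol{\theta}$. Solving $\partial L(\epsilon, \boldsymbol{\theta})/\partial\boldsymbol{\theta} = 0$ gives us the optimal estimate of $\boldsymbol{\theta}$ in terms of $\epsilon$
$$\hat{\boldsymbol{\theta}} = [\mathrm{Re}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H) + 2\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)\mathbf{1}. \tag{2.19}$$
Footnote 4: The extension to non-constant-modulus training symbols simply requires the use of (2.14) instead of (2.15) in the subsequent steps.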
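Substituting (2.19) into (2.18) and simplifying, we have, after scaling by a constant,
$$L(\epsilon) = -\mathbf{1}^T\mathrm{Im}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)^T[\mathrm{Re}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H) + 2\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)\mathbf{1} + \mathbf{1}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\mathbf{1}. \tag{2.20}$$
Hence by searching over a range of feasible values of $\epsilon$, we may find the optimal estimate of $\epsilon$
$$\hat{\epsilon} = \arg\min_{\epsilon} L(\epsilon). \tag{2.21}$$
In the absence of PHN, i.e., with $\boldsymbol{\theta} = \mathbf{0}$ in (2.11), $L(\epsilon) = \mathbf{1}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\mathbf{1}$ would be the metric for CFO estimation in the joint estimator for $\mathbf{g}$ and $\epsilon$.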
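Backward Substitution
After finding $\hat{\epsilon}$ and correspondingly $\hat{\mathbf{E}}$, the values of $\hat{\boldsymbol{\theta}}$ can be determined by substituting $\mathbf{E} = \hat{\mathbf{E}}$ into (2.19):
$$\hat{\boldsymbol{\theta}} = [\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + 2\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H)\mathbf{1}. \tag{2.22}$$
Letting $\hat{\mathbf{P}} = \mathrm{diag}(\exp(j\hat{\boldsymbol{\theta}}))$ and substituting $\mathbf{P} = \hat{\mathbf{P}}$ into (2.15), the optimal channel estimate after removing the CFO and PHN is therefore:
$$\hat{\mathbf{g}} = (2\rho^2)^{-1}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\hat{\mathbf{P}}^H\hat{\mathbf{E}}^H\mathbf{r}. \tag{2.23}$$

Table 2.1: Joint CFO/PHN/CIR Estimator (JCPCE).
Step 1: $\hat{\epsilon} = \arg\min_{\epsilon}\left\{\mathbf{1}^T\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H\mathbf{1} - \mathbf{1}^T\mathrm{Im}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)^T[\mathrm{Re}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H) + 2\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\mathbf{E}\mathbf{C}\mathbf{C}^H\mathbf{E}^H)\mathbf{1}\right\}$; $\hat{\mathbf{E}} = \mathrm{diag}([1, e^{j2\pi\hat{\epsilon}/N}, \cdots, e^{j2\pi(N-1)\hat{\epsilon}/N}]^T)$;
Step 2: $\hat{\boldsymbol{\theta}} = [\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + 2\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H)\mathbf{1}$; $\hat{\mathbf{P}} = \mathrm{diag}([e^{j\hat{\theta}_0}, \cdots, e^{j\hat{\theta}_{N-1}}]^T)$;
Step 3: $\hat{\mathbf{g}} = (2\rho^2)^{-1}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\hat{\mathbf{P}}^H\hat{\mathbf{E}}^H\mathbf{r}$.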
We summarize the complete JCPCE algorithm in Table 2.1. Note that the closed-form expressions for the jointly optimal estimates for ǫ, θ and g are due to the unitary property of CFO and PHN matrices: EH E = I and PH P = I. This property is also utilized in [49] to establish the optimality of MUSIC-based CFO estimation in OFDM.
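The three steps of Table 2.1 can be sketched numerically as follows. This is a minimal illustration rather than the implementation used in the thesis, and it rests on assumptions that this section does not state explicitly: $\mathbf{F}$ is taken to be the unitary $N$-point DFT matrix, $\mathbf{W}$ its first $L$ columns and $\mathbf{V}$ the remaining $N - L$ columns (so that $\mathbf{W}\mathbf{W}^H + \mathbf{V}\mathbf{V}^H = \mathbf{I}$). The function names and the dense matrix algebra are chosen for readability only.

```python
import numpy as np

def dft_matrix(n):
    """Unitary DFT matrix, F[k, m] = exp(-2j*pi*k*m/n) / sqrt(n) (assumed convention)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def cfo_diag(eps, n):
    """Diagonal of E = diag(1, e^{j2*pi*eps/N}, ..., e^{j2*pi*(N-1)*eps/N})."""
    return np.exp(2j * np.pi * eps * np.arange(n) / n)

def jcpce(r, d, L, Phi_inv, sigma2, rho2, eps_grid):
    """Sketch of Table 2.1: grid search for eps, then theta, then g."""
    n = len(r)
    F = dft_matrix(n)
    W, V = F[:, :L], F[:, L:]                    # assumed split of the DFT columns
    C = np.diag(r.conj()) @ F.conj().T @ np.diag(d) @ V   # C = R^H F^H D V
    CCh = C @ C.conj().T
    reg = 2.0 * sigma2 * rho2 * Phi_inv
    ones = np.ones(n)

    def pieces(eps):
        e = cfo_diag(eps, n)
        M = (e[:, None] * CCh) * e.conj()[None, :]        # E C C^H E^H
        return e, M, M.real + reg, M.imag @ ones

    def metric(eps):                                      # L(eps) in (2.20)
        _, M, A, b = pieces(eps)
        return (ones @ M @ ones).real - b @ np.linalg.solve(A, b)

    eps_hat = min(eps_grid, key=metric)                   # Step 1
    e, _, A, b = pieces(eps_hat)
    theta_hat = np.linalg.solve(A, b)                     # Step 2, (2.22)
    p = np.exp(1j * theta_hat)
    # Step 3, (2.23): g = (2 rho^2)^{-1} W^H D^H F P^H E^H r
    g_hat = W.conj().T @ (d.conj() * (F @ (p.conj() * e.conj() * r))) / (2.0 * rho2)
    return eps_hat, theta_hat, g_hat
```

The dense solve in Step 2 costs $O(N^3)$ per grid point, which is the cost that the complexity analysis in Section 2.3.2 sets out to reduce.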
2.3.2 Complexity Analysis and Low Complexity Implementation
In an OFDM system, computational complexity is a critical issue, since the use of FFT implies a low complexity order of $O(N\log N)$. In the implementation of the JCPCE, the main computational tasks reside in evaluating equations (2.20), (2.22) and (2.23). We will now investigate the complexity of each computation and seek means to reduce it. In (2.23), we see that $\mathbf{D}$, $\hat{\mathbf{P}}$ and $\hat{\mathbf{E}}$ are diagonal matrices, while $\mathbf{F}$ and $\mathbf{W}^H$ are DFT or partial DFT matrices. Thus each step of matrix-vector multiplication has a complexity order of $O(N\log N)$ or less.
The more challenging task is (2.22), which involves a matrix inversion requiring in general a complexity order of $O(N^3)$. However, as we will show in the following, with the help of the conjugate gradient method [50], we are able to lower the complexity to an acceptable level.
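The $O(N\log N)$ cost of (2.23) can be illustrated with a short sketch in which every matrix product is replaced by an element-wise product or an FFT. It assumes, as a convention not spelled out in this section, that $\mathbf{F}$ is the unitary DFT matrix and that $\mathbf{W}$ consists of its first $L$ columns; the function name and argument names are illustrative.

```python
import numpy as np

def channel_estimate_fft(r, d, e_hat, p_hat, L, rho2):
    """Evaluate (2.23) with FFTs; e_hat and p_hat hold the diagonals of E and P."""
    y = np.fft.fft(np.conj(p_hat) * np.conj(e_hat) * r, norm="ortho")  # F P^H E^H r
    z = np.conj(d) * y                                                 # D^H (.)
    # W^H z equals the first L entries of F^H z when W = F[:, :L]
    return np.fft.ifft(z, norm="ortho")[:L] / (2.0 * rho2)
```

Only two length-$N$ transforms and three element-wise products are needed, so no $N \times N$ matrix is ever formed.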
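Table 2.2: Conjugate Gradient algorithm for evaluating (2.22).
Initialization:
  $\hat{\boldsymbol{\theta}}_0 = \mathbf{0}$
  $\boldsymbol{\gamma}_0 = [\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\hat{\boldsymbol{\theta}}_0 - \mathbf{q} = -\mathbf{q}$
  $\boldsymbol{\nu}_0 = -\boldsymbol{\gamma}_0 = \mathbf{q}$
For $k = 0 : i - 1$
  $\alpha_k = \boldsymbol{\gamma}_k^H\boldsymbol{\gamma}_k / (\boldsymbol{\nu}_k^H[\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\boldsymbol{\nu}_k)$
  $\hat{\boldsymbol{\theta}}_{k+1} = \hat{\boldsymbol{\theta}}_k + \alpha_k\boldsymbol{\nu}_k$
  $\boldsymbol{\gamma}_{k+1} = \boldsymbol{\gamma}_k + \alpha_k[\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\boldsymbol{\nu}_k$
  $\beta_{k+1} = \boldsymbol{\gamma}_{k+1}^H\boldsymbol{\gamma}_{k+1} / (\boldsymbol{\gamma}_k^H\boldsymbol{\gamma}_k)$
  $\boldsymbol{\nu}_{k+1} = -\boldsymbol{\gamma}_{k+1} + \beta_{k+1}\boldsymbol{\nu}_k$
End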
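Wiener Phase Noise
The inverse of Wiener PHN covariance matrix $\boldsymbol{\Phi}$ has a convenient tridiagonal structure [51]. If we let $\boldsymbol{\Psi} = \frac{1}{2\sigma^2\rho^2}\boldsymbol{\Phi}$, $\boldsymbol{\Psi}^{-1} = 2\sigma^2\rho^2\boldsymbol{\Phi}^{-1}$ can be written as:
$$\boldsymbol{\Psi}^{-1} = \frac{2\sigma^2\rho^2}{\alpha_\phi^2}\begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{bmatrix}. \tag{2.24}$$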
Let $\mathbf{q} = \mathrm{Im}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H)\mathbf{1}$, where $\mathbf{q}$ can be computed efficiently using FFT since all matrices involved in calculating $\mathbf{q}$ are either diagonal or DFT (or partial DFT) matrices. The evaluation of (2.22) is now equivalent to solving a linear equation $[\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\hat{\boldsymbol{\theta}} = \mathbf{q}$. This problem can be easily tackled by the conjugate gradient method. The complete algorithm is presented in Table 2.2.
Of all the operations in Table 2.2, the dominant complexity is associated with the matrix-vector multiplication $[\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\boldsymbol{\nu}_k$. Thanks to the tridiagonal form of $\boldsymbol{\Psi}^{-1}$, this can be performed easily. More specifically, evaluating $[\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\boldsymbol{\nu}_k$ requires $7N + 6N\log N$ operations. Thus, the overall complexity of every iteration of the conjugate gradient algorithm is $O(N\log N)$. The conjugate gradient algorithm requires a maximum of $N$ iterations to converge to the exact solution. But our simulations show that the number of iterations required for good estimation performance is much smaller than $N$. In conclusion, the complexity of evaluating (2.22) is $O(iN\log N)$, where $i$ is the number of iterations in the conjugate gradient algorithm.
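The recursion of Table 2.2 and the $O(N)$ product with the tridiagonal matrix in (2.24) can be sketched as below. The solver takes the matrix only through a `matvec` callable, so that an FFT-based product could be plugged in; here that callable and both function names are illustrative assumptions, and the early exit on a zero residual is an added safeguard that does not appear in Table 2.2.

```python
import numpy as np

def wiener_psi_inv_matvec(v, scale):
    """Apply `scale` times the tridiagonal matrix of (2.24) to v in O(N)."""
    out = 2.0 * v
    out[-1] = v[-1]          # last diagonal entry is 1 rather than 2
    out[:-1] -= v[1:]        # superdiagonal entries (-1)
    out[1:] -= v[:-1]        # subdiagonal entries (-1)
    return scale * out

def conjugate_gradient(matvec, q, n_iter):
    """Table 2.2: iterate towards the solution of A theta = q, A symmetric positive definite."""
    theta = np.zeros_like(q)
    gamma = -q               # residual A theta_0 - q with theta_0 = 0
    nu = q.copy()            # first search direction
    for _ in range(n_iter):
        gg = gamma @ gamma
        if gg == 0.0:        # safeguard: exact solution already reached
            break
        A_nu = matvec(nu)
        alpha = gg / (nu @ A_nu)
        theta = theta + alpha * nu
        gamma = gamma + alpha * A_nu
        beta = (gamma @ gamma) / gg
        nu = -gamma + beta * nu
    return theta
```

In use, `matvec` would return $[\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \boldsymbol{\Psi}^{-1}]\boldsymbol{\nu}$ and `scale` would be $2\sigma^2\rho^2/\alpha_\phi^2$.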
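Gaussian Phase Noise
In the case of Gaussian PHN, we notice that $\boldsymbol{\Psi}$, as a Toeplitz matrix, can be approximated by a circulant matrix $\tilde{\boldsymbol{\Psi}}$ [52] according to this simple result:
Theorem 1 The best circulant approximation to a symmetric Toeplitz matrix $\boldsymbol{\Psi} \in \mathbb{C}^{N\times N}$, in the sense of minimizing the Frobenius norm $\|\boldsymbol{\Psi} - \tilde{\boldsymbol{\Psi}}\|_F$, is a circulant matrix $\tilde{\boldsymbol{\Psi}} \in \mathbb{C}^{N\times N}$ whose first row $\tilde{\boldsymbol{\psi}}^T = [\tilde{\psi}_0, \cdots, \tilde{\psi}_{N-1}]$ has entries
$$\tilde{\psi}_i = \frac{(N - i)\psi_i + i\psi_{N-i}}{N}, \tag{2.25}$$
where $\boldsymbol{\psi}^T = [\psi_0, \cdots, \psi_{N-1}]$ is the first row of $\boldsymbol{\Psi}$. This operation has complexity of $O(N)$.
Proof: See [52].
It can be shown that this approximation is asymptotically exact as $N \to \infty$ for an autocorrelation matrix $\boldsymbol{\Psi}$ of a first-order autoregressive process, which is a good fit to the Gaussian PHN process assumed in [53]. Being a circulant matrix, the eigenvalue decomposition (EVD) of $\tilde{\boldsymbol{\Psi}}$ is $\mathbf{F}\boldsymbol{\Lambda}_{\tilde{\Psi}}\mathbf{F}^H$ and $\tilde{\boldsymbol{\Psi}}^{-1} = \mathbf{F}\boldsymbol{\Lambda}_{\tilde{\Psi}}^{-1}\mathbf{F}^H$, where $\boldsymbol{\Lambda}_{\tilde{\Psi}}$ is a diagonal matrix. It is well-known that $\boldsymbol{\Lambda}_{\tilde{\Psi}} = \mathrm{diag}(\sqrt{N}\,\mathbf{F}^H\tilde{\boldsymbol{\varphi}}_1)$, where $\tilde{\boldsymbol{\varphi}}_1$ is the first column of $\tilde{\boldsymbol{\Psi}}$. Replacing $\boldsymbol{\Psi}$ by $\tilde{\boldsymbol{\Psi}}$, the simplification for (2.22) becomes
$$\hat{\boldsymbol{\theta}} = [\mathrm{Re}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H) + \tilde{\boldsymbol{\Psi}}^{-1}]^{-1}\,\mathrm{Im}(\hat{\mathbf{E}}\mathbf{C}\mathbf{C}^H\hat{\mathbf{E}}^H)\mathbf{1}. \tag{2.26}$$
This problem can be treated similar to the Wiener PHN case using the conjugate gradient method in Table 2.2 by replacing $\boldsymbol{\Psi}$ with $\tilde{\boldsymbol{\Psi}}$. It can be shown that the overall complexity is again $O(iN\log N)$. The practical value of $i$ is investigated by simulations in Section 2.5 and is found to be about five to ten.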
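The circulant approximation in (2.25) and the resulting FFT-based product with $\tilde{\boldsymbol{\Psi}}^{-1}$ can be sketched as follows. The function names are illustrative, and the inverse is applied through the standard fact that the eigenvalues of a circulant matrix are the DFT of its first column (equal to its first row in the symmetric case considered here).

```python
import numpy as np

def circulant_first_row(psi):
    """First row of the circulant approximation in (2.25) to a symmetric Toeplitz matrix."""
    n = len(psi)
    i = np.arange(n)
    psi_wrapped = np.roll(psi[::-1], 1)      # psi_wrapped[i] = psi[(n - i) % n]
    return ((n - i) * psi + i * psi_wrapped) / n

def circulant_inverse_matvec(first_row, v):
    """Apply the inverse of a symmetric circulant matrix to a real vector via the FFT."""
    eigvals = np.fft.fft(first_row)          # eigenvalues of the circulant matrix
    return np.fft.ifft(np.fft.fft(v) / eigvals).real
```

Each product with $\tilde{\boldsymbol{\Psi}}^{-1}$ then costs two FFTs and one element-wise division, which is what keeps each conjugate gradient iteration at $O(N\log N)$.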
The computation of (2.20) can be done almost identically as (2.22) using the circulant approximation of $\boldsymbol{\Psi}$ and the conjugate gradient algorithm. However, a crucial drawback of the search method for finding the optimal $\hat{\epsilon}$ is that the complexity scales with the inverse of the resolution required for $\hat{\epsilon}$. Certainly, the importance of this problem is offset by the fact that CFO synchronization need only be done once per data packet, but it is still advantageous if further simplifications can be made.
2.4 CFO Estimation Based on Repeated Training Symbols
For cases where the system has very limited computational power, it is beneficial to obtain a closed form solution for ǫ. When no PHN is present, the pioneering work of Moose [44] achieves just that by assuming the two halves of a training symbol are identical.
2.4.1 Moose’s CFO Estimator
In the Moose algorithm [44], we transmit an OFDM symbol with two identical halves in the time domain. Such a signal is easily generated [48] by transmitting $N/2$ training symbols $d_0, \cdots, d_{N/2-1}$ on the even sub-carriers, and zeros on the odd sub-carriers. The $N$-point sequence in time at the receiver, with CFO and PHN distortion, can be written as
$$r_n = \frac{1}{\sqrt{N/2}}\,e^{j(\theta_n + 2\pi\epsilon n/N)}\sum_{k=0}^{N/2-1} h_k d_k e^{j4\pi nk/N} + \eta_n, \tag{2.27}$$
for $n = 0, \cdots, N - 1$.
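We shall first assume no PHN, i.e., $\theta_n = 0$ for $n = 0, \cdots, N - 1$. Denoting $\mathbf{r}_1 = [r_0, \cdots, r_{N/2-1}]^T$ and $\mathbf{r}_2 = [r_{N/2}, \cdots, r_{N-1}]^T$, we have
$$\mathbf{r}_1 = \mathbf{x} + \mathbf{n}_1; \qquad \mathbf{r}_2 = e^{j\pi\epsilon}\mathbf{x} + \mathbf{n}_2, \tag{2.28}$$
where $\mathbf{x} = \mathbf{E}\mathbf{G}\mathbf{F}^H\mathbf{d} \in \mathbb{C}^{\frac{N}{2}\times 1}$, and $\mathbf{F}^H\mathbf{d} \in \mathbb{C}^{\frac{N}{2}\times 1}$ is the training symbol that is transmitted twice consecutively, $\mathbf{n}_1 \sim \mathcal{CN}(\mathbf{0}, 2\sigma^2\mathbf{I})$ and $\mathbf{n}_2 \sim \mathcal{CN}(\mathbf{0}, 2\sigma^2\mathbf{I})$ are independent additive noise vectors. Here the CFO matrix $\mathbf{E}$, channel circular convolution matrix $\mathbf{G}$ and DFT matrix $\mathbf{F}$ follow similar definitions as before but are only half the size. The optimal estimate of $\epsilon$ is
$$\hat{\epsilon} = \arg\max_{\epsilon} p(\mathbf{r}_1, \mathbf{r}_2|\epsilon) = \arg\max_{\epsilon} p(\mathbf{r}_2|\epsilon, \mathbf{r}_1)\,p(\mathbf{r}_1|\epsilon), \tag{2.29}$$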
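which reduces to $\hat{\epsilon} = \arg\max_{\epsilon} p(\mathbf{r}_2|\epsilon, \mathbf{r}_1)$ if we assume that $p(\mathbf{r}_1|\epsilon) = p(\mathbf{r}_1)$ (this is an approximation since $\mathbf{r}_1$ and $\epsilon$ are in general not independent). Notice that
$$\mathbf{r}_2 = e^{j\pi\epsilon}\mathbf{r}_1 - e^{j\pi\epsilon}\mathbf{n}_1 + \mathbf{n}_2 = e^{j\pi\epsilon}\mathbf{r}_1 + \mathbf{z}, \tag{2.30}$$
where $p(\mathbf{z}) = \mathcal{CN}(\mathbf{0}, 4\sigma^2\mathbf{I})$, because the instantaneous value of $\mathbf{z}$ depends on $\epsilon$, but the statistics do not. We then have $p(\mathbf{r}_2|\epsilon, \mathbf{r}_1) = \mathcal{CN}(e^{j\pi\epsilon}\mathbf{r}_1, 4\sigma^2\mathbf{I})$. Therefore, the negative log-likelihood function becomes
$$-\log p(\mathbf{r}_2|\epsilon, \mathbf{r}_1) = \frac{1}{4\sigma^2}(\mathbf{r}_2 - e^{j\pi\epsilon}\mathbf{r}_1)^H(\mathbf{r}_2 - e^{j\pi\epsilon}\mathbf{r}_1). \tag{2.31}$$
And it follows that
$$\hat{\epsilon} = \frac{1}{\pi}\measuredangle\,\mathbf{r}_1^H\mathbf{r}_2, \tag{2.32}$$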
where ∡x denotes the phase angle of a complex number x.
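As a minimal sketch of (2.32), the estimator reduces to a single inner product followed by a phase extraction. The function name is illustrative; `np.vdot` conjugates its first argument and therefore computes $\mathbf{r}_1^H\mathbf{r}_2$ directly.

```python
import numpy as np

def moose_cfo(r1, r2):
    """Closed-form CFO estimate of (2.32): the phase of r1^H r2 divided by pi."""
    return np.angle(np.vdot(r1, r2)) / np.pi
```

Because the phase angle is only defined on $(-\pi, \pi]$, the estimate is unambiguous only for $|\epsilon| < 1$ (in units of the subcarrier spacing).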
2.4.2 CFO Estimator with PHN Rejection
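In the presence of PHN, the derivation presented above fails because (2.30) no longer holds. We propose, in the following, a CFO estimation algorithm that optimally accounts for PHN. Rewriting (2.28) to include the PHN distortion, we have
$$\mathbf{r}_1 = \mathbf{P}_1\mathbf{x} + \mathbf{n}_1; \qquad \mathbf{r}_2 = e^{j\pi\epsilon}\mathbf{P}_2\mathbf{x} + \mathbf{n}_2, \tag{2.33}$$
where $\mathbf{P}_1$ and $\mathbf{P}_2$ contain consecutive PHN sequences $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$. The optimal estimate of $\epsilon$ is then
$$\hat{\epsilon} = \arg\max_{\epsilon} p(\mathbf{r}_1, \mathbf{r}_2|\epsilon) = \arg\max_{\epsilon}\int_{\boldsymbol{\theta}_2}\int_{\boldsymbol{\theta}_1} p(\mathbf{r}_1, \mathbf{r}_2, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2|\epsilon)\,d\boldsymbol{\theta}_1\,d\boldsymbol{\theta}_2, \tag{2.34}$$
where
$$\begin{aligned}
p(\mathbf{r}_1, \mathbf{r}_2, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2|\epsilon) &= p(\mathbf{r}_1, \mathbf{r}_2|\epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)\,p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) \\
&= p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)\,p(\mathbf{r}_1|\epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)\,p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2).
\end{aligned} \tag{2.35}$$
Assuming $p(\mathbf{r}_1|\epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2) = p(\mathbf{r}_1)$ as before, it follows that
$$\hat{\epsilon} = \arg\max_{\epsilon}\int_{\boldsymbol{\theta}_2}\int_{\boldsymbol{\theta}_1} p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)\,p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)\,d\boldsymbol{\theta}_1\,d\boldsymbol{\theta}_2. \tag{2.36}$$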
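Denoting the “differential PHN” sequence $\boldsymbol{\theta}_\Delta = \boldsymbol{\theta}_2 - \boldsymbol{\theta}_1 \in \mathbb{R}^{\frac{N}{2}\times 1}$, and $\mathbf{P}_\Delta = \mathrm{diag}([e^{j\theta_\Delta(0)}, \cdots, e^{j\theta_\Delta(\frac{N}{2}-1)}]^T)$, $\mathbf{r}_2$ can be written in terms of $\mathbf{r}_1$ as
$$\mathbf{r}_2 = e^{j\pi\epsilon}\mathbf{P}_\Delta\mathbf{r}_1 - e^{j\pi\epsilon}\mathbf{P}_\Delta\mathbf{n}_1 + \mathbf{n}_2 = e^{j\pi\epsilon}\mathbf{P}_\Delta\mathbf{r}_1 + \mathbf{z}, \tag{2.37}$$
where $p(\mathbf{z}) = \mathcal{CN}(\mathbf{0}, 4\sigma^2\mathbf{I})$. In other words, $p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2) = \mathcal{CN}(e^{j\pi\epsilon}\mathbf{P}_\Delta\mathbf{r}_1, 4\sigma^2\mathbf{I})$. This means that $p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ is only a function of $\boldsymbol{\theta}_\Delta$ instead of $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ individually. We may therefore rewrite (2.36) as
$$\hat{\epsilon} = \arg\max_{\epsilon}\int_{\boldsymbol{\theta}_\Delta} p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)\,p(\boldsymbol{\theta}_\Delta)\,d\boldsymbol{\theta}_\Delta = \arg\max_{\epsilon} p(\mathbf{r}_2|\mathbf{r}_1, \epsilon). \tag{2.38}$$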
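Lemma 1 If $[\boldsymbol{\theta}_1^T, \boldsymbol{\theta}_2^T]^T \in \mathbb{R}^{N\times 1}$ is a jointly Gaussian random vector with distribution $\mathcal{N}(\mathbf{0}, \boldsymbol{\Phi})$, where $\boldsymbol{\Phi} \in \mathbb{R}^{N\times N}$ can be partitioned into four $\frac{N}{2}\times\frac{N}{2}$ blocks:
$$\boldsymbol{\Phi} = \begin{bmatrix} \boldsymbol{\Phi}_{N/2} & \boldsymbol{\Upsilon} \\ \boldsymbol{\Upsilon}^T & \boldsymbol{\Phi}_{N/2} \end{bmatrix}, \tag{2.39}$$
then $p(\boldsymbol{\theta}_\Delta) = p(\boldsymbol{\theta}_2 - \boldsymbol{\theta}_1) = \mathcal{N}(\mathbf{0}, 2\boldsymbol{\Phi}_{N/2} - \boldsymbol{\Upsilon} - \boldsymbol{\Upsilon}^T)$.
Proof: From
$$\boldsymbol{\theta}_\Delta = \boldsymbol{\theta}_2 - \boldsymbol{\theta}_1 = [\,-\mathbf{I}\;\;\mathbf{I}\,]\begin{bmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{bmatrix} \tag{2.40}$$
and
$$p\left(\begin{bmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{bmatrix}\right) = \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \boldsymbol{\Phi}_{N/2} & \boldsymbol{\Upsilon} \\ \boldsymbol{\Upsilon}^T & \boldsymbol{\Phi}_{N/2} \end{bmatrix}\right), \tag{2.41}$$
we obtain
$$\begin{aligned}
p(\boldsymbol{\theta}_\Delta) &= \mathcal{N}\left([\,-\mathbf{I}\;\;\mathbf{I}\,]\begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix},\; [\,-\mathbf{I}\;\;\mathbf{I}\,]\begin{bmatrix} \boldsymbol{\Phi}_{N/2} & \boldsymbol{\Upsilon} \\ \boldsymbol{\Upsilon}^T & \boldsymbol{\Phi}_{N/2} \end{bmatrix}\begin{bmatrix} -\mathbf{I} \\ \mathbf{I} \end{bmatrix}\right) \\
&= \mathcal{N}\left(\mathbf{0}, 2\boldsymbol{\Phi}_{N/2} - \boldsymbol{\Upsilon} - \boldsymbol{\Upsilon}^T\right).
\end{aligned} \tag{2.42}$$
Denoting $\boldsymbol{\Phi}_\Delta = 2\boldsymbol{\Phi}_{N/2} - \boldsymbol{\Upsilon} - \boldsymbol{\Upsilon}^T$, we may write $p(\boldsymbol{\theta}_\Delta) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Phi}_\Delta)$. Finally, we use the following lemma to evaluate $p(\mathbf{r}_2|\mathbf{r}_1, \epsilon)$ in (2.38).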
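Lemma 2 Given $p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta) = \mathcal{CN}(e^{j\pi\epsilon}\mathbf{P}_\Delta\mathbf{r}_1, 4\sigma^2\mathbf{I})$ and $p(\boldsymbol{\theta}_\Delta) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Phi}_\Delta)$, then
$$p(\mathbf{r}_2|\mathbf{r}_1, \epsilon) = \mathcal{CN}(e^{j\pi\epsilon}\mathbf{r}_1, \mathbf{R}_1\boldsymbol{\Phi}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I}), \tag{2.43}$$
where $\mathbf{R}_1 = \mathrm{diag}(\mathbf{r}_1)$.
Proof: Using the Iterated Expectation Theorem [43, ch. 14] and its analog in covariance, given a Gaussian distributed $\mathbf{x}$ and a Gaussian conditional distribution for $\mathbf{y}|\mathbf{x}$, the marginal distribution of $\mathbf{y}$ (determined by the expected value $E(\mathbf{y})$ and the covariance matrix $V(\mathbf{y})$) is also Gaussian and is related to the conditional distribution by
$$\begin{aligned}
E(\mathbf{y}) &= E_{\mathbf{x}}E_{\mathbf{y}}(\mathbf{y}|\mathbf{x}) \\
V(\mathbf{y}) &= V_{\mathbf{x}}(E_{\mathbf{y}}(\mathbf{y}|\mathbf{x})) + E_{\mathbf{x}}(V_{\mathbf{y}}(\mathbf{y}|\mathbf{x})).
\end{aligned} \tag{2.44}$$
Applied to the conditional distribution $p(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)$, we have
$$\begin{aligned}
E(\mathbf{r}_2|\mathbf{r}_1, \epsilon) &= E_{\boldsymbol{\theta}_\Delta}(E_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)) \\
V(\mathbf{r}_2|\mathbf{r}_1, \epsilon) &= V_{\boldsymbol{\theta}_\Delta}(E_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)) + E_{\boldsymbol{\theta}_\Delta}(V_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)).
\end{aligned} \tag{2.45}$$
Also we have
$$\begin{aligned}
E_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta) &= e^{j\pi\epsilon}\mathbf{P}_\Delta\mathbf{r}_1 \approx e^{j\pi\epsilon}\mathbf{R}_1(\mathbf{1} + j\boldsymbol{\theta}_\Delta); \\
V_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta) &= 4\sigma^2\mathbf{I}.
\end{aligned} \tag{2.46}$$
Therefore, after simple matrix algebra we obtain
$$\begin{aligned}
E_{\boldsymbol{\theta}_\Delta}(E_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)) &= e^{j\pi\epsilon}\mathbf{r}_1; \\
V_{\boldsymbol{\theta}_\Delta}(E_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)) &= \mathbf{R}_1\boldsymbol{\Phi}_\Delta\mathbf{R}_1^H; \\
E_{\boldsymbol{\theta}_\Delta}(V_{\mathbf{r}_2}(\mathbf{r}_2|\mathbf{r}_1, \epsilon, \boldsymbol{\theta}_\Delta)) &= 4\sigma^2\mathbf{I}.
\end{aligned} \tag{2.47}$$
We then readily arrive at our final result
$$p(\mathbf{r}_2|\mathbf{r}_1, \epsilon) = \mathcal{CN}(e^{j\pi\epsilon}\mathbf{r}_1, \mathbf{R}_1\boldsymbol{\Phi}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I}). \tag{2.48}$$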
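From (2.38) and Lemma 2, we see that
$$\begin{aligned}
\hat{\epsilon} &= \arg\max_{\epsilon}\log p(\mathbf{r}_2|\mathbf{r}_1, \epsilon) \\
&= \arg\min_{\epsilon}\,(\mathbf{r}_2 - e^{j\pi\epsilon}\mathbf{r}_1)^H(\mathbf{R}_1\boldsymbol{\Phi}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I})^{-1}(\mathbf{r}_2 - e^{j\pi\epsilon}\mathbf{r}_1) \\
&= \frac{1}{\pi}\measuredangle\,\mathbf{r}_1^H(\mathbf{R}_1\boldsymbol{\Phi}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I})^{-1}\mathbf{r}_2.
\end{aligned} \tag{2.49}$$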
This expression is very similar to (2.32) except for a weighting matrix that accounts for the distortion caused by PHN. (Note that (2.49) represents a novel CFO estimation scheme in the PHN channel.)
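A minimal sketch of (2.49), together with the differential covariance $\boldsymbol{\Phi}_\Delta$ of Lemma 1, is given below. The function names are illustrative and the linear system is solved densely for clarity. The covariance helper uses the general block expression $\boldsymbol{\Phi}_{11} + \boldsymbol{\Phi}_{22} - \boldsymbol{\Phi}_{12} - \boldsymbol{\Phi}_{21}$, which coincides with $2\boldsymbol{\Phi}_{N/2} - \boldsymbol{\Upsilon} - \boldsymbol{\Upsilon}^T$ under the partition assumed in (2.39).

```python
import numpy as np

def differential_covariance(Phi):
    """Covariance of theta_2 - theta_1 (Lemma 1) from the N x N covariance Phi."""
    h = Phi.shape[0] // 2
    return Phi[:h, :h] + Phi[h:, h:] - Phi[:h, h:] - Phi[h:, :h]

def cfo_estimate_phn(r1, r2, Phi_delta, sigma2):
    """CFO estimate of (2.49): Moose's metric weighted by (R1 Phi_delta R1^H + 4 sigma^2 I)^{-1}."""
    Q = (r1[:, None] * Phi_delta) * r1.conj()[None, :] + 4.0 * sigma2 * np.eye(len(r1))
    return np.angle(np.vdot(r1, np.linalg.solve(Q, r2))) / np.pi
```

With $\boldsymbol{\Phi}_\Delta = \mathbf{0}$ the weighting matrix becomes a scaled identity and the expression reduces to (2.32).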
2.4.3 Joint Phase Noise and Channel Impulse Response Estimation
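With the CFO estimated using (2.49), we now turn to the remaining channel estimation issue in the presence of PHN. Because of the special structure of the repeating training symbol, we are required to re-derive the optimal joint PHN/CIR estimation algorithm. Expressing (2.27) in the matrix form yields
$$\mathbf{r} = \hat{\mathbf{E}}\mathbf{P}\check{\mathbf{F}}^H\mathbf{D}\mathbf{W}\mathbf{g} + \mathbf{n}, \tag{2.50}$$
where $\mathbf{r} = [\mathbf{r}_1^T, \mathbf{r}_2^T]^T \in \mathbb{C}^{N\times 1}$ is the time-domain received repeating training symbol. $\hat{\mathbf{E}} \in \mathbb{C}^{N\times N}$ is the CFO matrix already estimated; $\mathbf{P} \in \mathbb{C}^{N\times N}$ is the unknown PHN matrix; $\check{\mathbf{F}} = [\mathbf{F}, \mathbf{F}] \in \mathbb{C}^{N/2\times N}$ is the cascade of two DFT matrices; $\mathbf{D} = \mathrm{diag}(\mathbf{d}) \in \mathbb{C}^{N/2\times N/2}$ contains the length-$N/2$ training symbol; $\mathbf{g} \in \mathbb{C}^{L\times 1}$ is the channel impulse response.
Similar to (2.12), we obtain the “complete negative log-likelihood function”:
$$L(\boldsymbol{\theta}, \mathbf{g}) = \frac{1}{2\sigma^2}(\mathbf{r} - \hat{\mathbf{E}}\mathbf{P}\check{\mathbf{F}}^H\mathbf{D}\mathbf{W}\mathbf{g})^H(\mathbf{r} - \hat{\mathbf{E}}\mathbf{P}\check{\mathbf{F}}^H\mathbf{D}\mathbf{W}\mathbf{g}) + \frac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta}. \tag{2.51}$$
Solving $\partial L(\boldsymbol{\theta}, \mathbf{g})/\partial\mathbf{g}^* = 0$ produces the optimal channel estimate of $\mathbf{g}$ in terms of $\boldsymbol{\theta}$
$$\hat{\mathbf{g}} = (4\rho^2)^{-1}\mathbf{W}^H\mathbf{D}^H\check{\mathbf{F}}\mathbf{P}^H\hat{\mathbf{E}}^H\mathbf{r}. \tag{2.52}$$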
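Noticing that
$$\begin{aligned}
&(\mathbf{r} - \hat{\mathbf{E}}\mathbf{P}\check{\mathbf{F}}^H\mathbf{D}\mathbf{W}\hat{\mathbf{g}})^H(\mathbf{r} - \hat{\mathbf{E}}\mathbf{P}\check{\mathbf{F}}^H\mathbf{D}\mathbf{W}\hat{\mathbf{g}}) \\
&\quad= \mathbf{r}^H\mathbf{r} - (4\rho^2)^{-1}\mathbf{r}^H\hat{\mathbf{E}}\mathbf{P}\check{\mathbf{F}}^H\mathbf{D}\mathbf{W}\mathbf{W}^H\mathbf{D}^H\check{\mathbf{F}}\mathbf{P}^H\hat{\mathbf{E}}^H\mathbf{r} \\
&\quad= \frac{1}{4\rho^2}\mathbf{r}^H\hat{\mathbf{E}}\mathbf{P}\begin{bmatrix} \mathbf{F}^H\mathbf{D} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}^H\mathbf{D} \end{bmatrix}\begin{bmatrix} \mathbf{I} + \mathbf{V}\mathbf{V}^H & -\mathbf{W}\mathbf{W}^H \\ -\mathbf{W}\mathbf{W}^H & \mathbf{I} + \mathbf{V}\mathbf{V}^H \end{bmatrix}\begin{bmatrix} \mathbf{F}^H\mathbf{D} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}^H\mathbf{D} \end{bmatrix}^H\mathbf{P}^H\hat{\mathbf{E}}^H\mathbf{r},
\end{aligned} \tag{2.53}$$
and substituting (2.53) into (2.51), we have after simplification
$$L(\boldsymbol{\theta}) = \frac{1}{8\sigma^2\rho^2}\mathbf{u}^T\hat{\mathbf{E}}\mathbf{A}\hat{\mathbf{E}}^H\mathbf{u}^* + \frac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta}, \tag{2.54}$$
where
$$\mathbf{A} = \mathbf{R}^H\begin{bmatrix} \mathbf{F}^H\mathbf{D} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}^H\mathbf{D} \end{bmatrix}\begin{bmatrix} \mathbf{I} + \mathbf{V}\mathbf{V}^H & -\mathbf{W}\mathbf{W}^H \\ -\mathbf{W}\mathbf{W}^H & \mathbf{I} + \mathbf{V}\mathbf{V}^H \end{bmatrix}\begin{bmatrix} \mathbf{F}^H\mathbf{D} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}^H\mathbf{D} \end{bmatrix}^H\mathbf{R}, \tag{2.55}$$
and $\mathbf{R} = \mathrm{diag}(\mathbf{r})$, $\mathbf{u} = [e^{j\theta_0}, \cdots, e^{j\theta_{N-1}}]^T$. Solving $\partial L(\boldsymbol{\theta})/\partial\boldsymbol{\theta} = 0$ gives us, similar to (2.19), the optimal estimate of $\boldsymbol{\theta}$
$$\hat{\boldsymbol{\theta}} = [\mathrm{Re}(\hat{\mathbf{E}}\mathbf{A}\hat{\mathbf{E}}^H) + 4\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\hat{\mathbf{E}}\mathbf{A}\hat{\mathbf{E}}^H)\mathbf{1}. \tag{2.56}$$
We summarize the modified JCPCE algorithm for the case of repeating training symbols in Table 2.3.

Table 2.3: Modified JCPCE with closed-form CFO estimation.
Step 1: $\hat{\epsilon} = \frac{1}{\pi}\measuredangle\,\mathbf{r}_1^H(\mathbf{R}_1\boldsymbol{\Phi}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I})^{-1}\mathbf{r}_2$; $\hat{\mathbf{E}} = \mathrm{diag}([1, e^{j2\pi\hat{\epsilon}/N}, \cdots, e^{j2\pi(N-1)\hat{\epsilon}/N}]^T)$;
Step 2: $\hat{\boldsymbol{\theta}} = [\mathrm{Re}(\hat{\mathbf{E}}\mathbf{A}\hat{\mathbf{E}}^H) + 4\sigma^2\rho^2\boldsymbol{\Phi}^{-1}]^{-1}\,\mathrm{Im}(\hat{\mathbf{E}}\mathbf{A}\hat{\mathbf{E}}^H)\mathbf{1}$; $\hat{\mathbf{P}} = \mathrm{diag}([e^{j\hat{\theta}_0}, \cdots, e^{j\hat{\theta}_{N-1}}]^T)$;
Step 3: $\hat{\mathbf{g}} = (4\rho^2)^{-1}\mathbf{W}^H\mathbf{D}^H\check{\mathbf{F}}\hat{\mathbf{P}}^H\hat{\mathbf{E}}^H\mathbf{r}$.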
2.4.4 Complexity Analysis and Low Complexity Implementation
The derivation in the previous section shows that, with the help of repeating training symbols, the CFO estimation can be done in closed-form with the distortion due to PHN optimally rejected. However, the final expression in (2.49) still requires a matrix inversion. Fortunately, for both Wiener and Gaussian PHN, complexity reduction is also available for this computation since $\boldsymbol{\Phi}_\Delta$ is a Toeplitz matrix with a close circulant approximation $\tilde{\boldsymbol{\Phi}}_\Delta$. We may concentrate on the matrix vector product
$$\mathbf{x} = (\mathbf{R}_1\tilde{\boldsymbol{\Phi}}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I})^{-1}\mathbf{r}_2. \tag{2.57}$$
This is equivalent to solving a linear equation $(\mathbf{R}_1\tilde{\boldsymbol{\Phi}}_\Delta\mathbf{R}_1^H + 4\sigma^2\mathbf{I})\mathbf{x} = \mathbf{r}_2$, which can be computed efficiently using the conjugate gradient method analogous to the one described in Table 2.2.
2.5 Simulations
In this section, we simulate the performance of JCPCE and its variants based on algorithms presented in Tables 2.1 to 2.3. The following system parameters are assumed in our simulations unless stated otherwise:
1. A Rayleigh multipath fading channel with a delay of $L = 10$ taps and an exponentially decreasing power delay profile that has a decay constant of 4 taps (the channel is normalized such that $\|\mathbf{g}\|^2/N = 1$);
2. An OFDM training symbol size of $N = 64$ subcarriers with each subcarrier modulated in quadrature phase-shift keying (QPSK) format;
3. Baseband sampling rate $f_s = 20$ MHz (subcarrier spacing of 312.5 kHz);
4. The Wiener PHN is generated as a random-walk process with incremental PHN of $\alpha_\phi = 0.6$ deg. The covariance matrix $\boldsymbol{\Phi}$ is as depicted in (2.3).
5. The Gaussian PHN has a standard deviation of $\theta_{rms} = 3$ deg (i.e., $R_\theta(0) = (\pi\theta_{rms}/180)^2$). It is generated, according to the Matlab code recommended for the IEEE 802.11g standard [53], as i.i.d. Gaussian samples passed through a single pole Butterworth filter of 3dB bandwidth $\Omega_o = 100$ kHz. Hence, the PHN covariance matrix $\boldsymbol{\Phi}$ is
$$\Phi_{i,j} = (\pi\theta_{rms}/180)^2\,e^{-\frac{2\pi\Omega_o|i-j|}{f_s}}. \tag{2.58}$$
According to the assumed parameters for the PHN, the power spectral density of the random process $e^{j\theta(t)}$ is plotted in Fig. 2.3.
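The two PHN models in items 4 and 5 can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code of the thesis: the Wiener process is taken to start from its first increment, and the Gaussian PHN is drawn directly from the covariance (2.58) through a Cholesky factor instead of by filtering i.i.d. samples as described above. Function names are illustrative.

```python
import numpy as np

def wiener_phn(n, alpha_deg, rng):
    """Random-walk PHN in radians: cumulative sum of i.i.d. Gaussian increments."""
    return np.cumsum(np.deg2rad(alpha_deg) * rng.standard_normal(n))

def gaussian_phn_covariance(n, theta_rms_deg, kappa):
    """Covariance of (2.58) with kappa = Omega_o / f_s."""
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.deg2rad(theta_rms_deg) ** 2 * np.exp(-2.0 * np.pi * kappa * lag)

def gaussian_phn(Phi, rng):
    """Draw one zero-mean Gaussian PHN sequence with covariance Phi (assumed sampling method)."""
    chol = np.linalg.cholesky(Phi)
    return chol @ rng.standard_normal(Phi.shape[0])
```

For the parameters listed above one would call `gaussian_phn_covariance(64, 3.0, 100e3 / 20e6)` and `wiener_phn(64, 0.6, rng)` with `rng = np.random.default_rng()`.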
2.5.1 Channel Estimation with PHN Only
In general, PHN is a more complex effect than CFO and is harder to analyze. We will first perform simulations with no CFO to study the joint PHN and channel impulse response estimation described as part of the JCPCE algorithm (Steps 2 and 3 in Table 2.1 with $\hat{\mathbf{E}} = \mathbf{I}$), as well as its low complexity variant summarized in Table 2.2.

Unresolvable Residual Common Phase Rotation
Fig. 2.4 plots two instances of the PHN process (from the Wiener and Gaussian model, respectively) and their estimates via the JCPCE algorithm. The figure also depicts the peculiar effect of residual common phase rotation at the output of the JCPCE. At SNR = 30 dB, it is seen that the Wiener PHN is estimated accurately, while the estimator for the Gaussian PHN differs from the actual PHN by a constant phase rotation that shifts the estimate towards the zero degree line. This constant rotation creates an equal but opposite rotation in the channel estimate (which is difficult to illustrate graphically). The exact analysis of this residual common phase rotation is difficult, but we have a fairly good understanding of its origin which is summarized in the following proposition and is supported by further simulations:

Figure 2.3: Power spectral density of the phase noise process $e^{j\theta(t)}$. [Plot: dBc versus $\Delta f$ (Hz); curves for Gaussian PHN and Wiener PHN, with annotations $f_{3dB} \approx 100$ Hz and $f_{3dB} \approx 10$ kHz.]

Figure 2.4: Effect of residual common phase rotation in JCPCE, SNR = 30 dB. [Two panels, Wiener PHN and Gaussian PHN: phase in degrees versus sample index; curves for the actual PHN, the estimated PHN and the estimated PHN with $\delta$ removed; $\delta$ marks the residual phase rotation.]

Proposition 1 Assume the actual PHN process and channel impulse response are $\boldsymbol{\theta}_o$ and $\mathbf{g}_o$, respectively. As SNR $\to \infty$, the jointly optimal estimates, $\hat{\boldsymbol{\theta}}$ and $\hat{\mathbf{g}}$, calculated using the JCPCE algorithm approach
$$\hat{\boldsymbol{\theta}} \to \boldsymbol{\theta}_o + \delta\mathbf{1} \tag{2.59}$$
$$\hat{\mathbf{g}} \to e^{-j\delta}\mathbf{g}_o, \tag{2.60}$$
where
$$\delta = \arg\min_{\alpha}\,(\boldsymbol{\theta}_o + \alpha\mathbf{1})^T\boldsymbol{\Phi}^{-1}(\boldsymbol{\theta}_o + \alpha\mathbf{1}). \tag{2.61}$$
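Proof: Consider the minimization of the complete negative log-likelihood function $L(\boldsymbol{\theta}, \mathbf{g})$, where the actual values of the variables are $\mathbf{g}_o$ and $\boldsymbol{\theta}_o$. We examine the joint optimizers of $L(\boldsymbol{\theta}, \mathbf{g})$ as SNR $\to \infty$ in relation to $\mathbf{g}_o$ and $\boldsymbol{\theta}_o$.
Looking at (2.12), it is seen that $L(\boldsymbol{\theta}, \mathbf{g})$ has two components, associated with $p(\mathbf{r}|\boldsymbol{\theta}, \mathbf{g})$ and $p(\boldsymbol{\theta})$, respectively. Denote
$$\begin{aligned}
L_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}(\boldsymbol{\theta}, \mathbf{g}) &= \frac{1}{2\sigma^2}\|\mathbf{r} - \mathbf{P}\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g}\|^2; \\
L_{p(\boldsymbol{\theta})}(\boldsymbol{\theta}) &= \frac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta}.
\end{aligned} \tag{2.62}$$
As SNR $\to \infty$ (i.e., $\sigma^2 \to 0$)
$$(\boldsymbol{\theta}_o, \mathbf{g}_o) = \arg\min_{\boldsymbol{\theta}, \mathbf{g}} L_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}(\boldsymbol{\theta}, \mathbf{g}), \tag{2.63}$$
but the minimizer is not unique, since
$$(\boldsymbol{\theta}_o + \delta\mathbf{1}, e^{-j\delta}\mathbf{g}_o) = \arg\min_{\boldsymbol{\theta}, \mathbf{g}} L_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}(\boldsymbol{\theta}, \mathbf{g}), \tag{2.64}$$
for arbitrary angle $\delta$. This is because introducing two opposite phase rotations to $\mathbf{u}$ and $\mathbf{g}$ does not alter the overall channel response, and hence the likelihood.
Assume the uniqueness of (2.64), i.e., $S_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})} \equiv \{(\boldsymbol{\theta}_o + \delta\mathbf{1}, e^{-j\delta}\mathbf{g}_o)\}$ describes a complete set of optimizers for $L_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}(\boldsymbol{\theta}, \mathbf{g})$. Notice that any variable pair $(\boldsymbol{\theta}, \mathbf{g}) \in S_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}$ makes $L_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}(\boldsymbol{\theta}, \mathbf{g}) = 0$, or $p(\mathbf{r}|\boldsymbol{\theta}, \mathbf{g}) = \infty$. It then follows that the optimizer $(\hat{\boldsymbol{\theta}}, \hat{\mathbf{g}})$ of $L(\boldsymbol{\theta}, \mathbf{g})$ must belong to $S_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}$, as any other pair $(\boldsymbol{\theta}, \mathbf{g})$ would make the complete likelihood finite. Consequently, the only task remaining is to find $(\hat{\boldsymbol{\theta}}, \hat{\mathbf{g}}) = \arg\min_{\boldsymbol{\theta}, \mathbf{g}} L_{p(\boldsymbol{\theta})}(\boldsymbol{\theta})$ subject to $(\hat{\boldsymbol{\theta}}, \hat{\mathbf{g}}) \in S_{p(\mathbf{r}|\boldsymbol{\theta},\mathbf{g})}$.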
In brief, the residual common phase rotation, represented by an unknown constant $\delta$, is introduced to shift the optimal estimate $\boldsymbol{\theta}_o$ such that $\boldsymbol{\theta}_o + \delta\mathbf{1}$ is closer to a zero-mean Gaussian process defined by the covariance matrix $\boldsymbol{\Phi}$. Thus although the PHN estimate we have obtained is “maximum a posteriori”, it is not unbiased. This proposition not only gives us a qualitative understanding of the phenomenon, but also offers quantitative predictions. By making $\hat{\boldsymbol{\theta}}$ most likely, $\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}_o + \delta\mathbf{1}$ is approximately zero mean. That implies $\delta$ should be approximately the negative sample mean of $\boldsymbol{\theta}_o$:
$$\delta \approx -\frac{1}{N}\boldsymbol{\theta}_o^T\mathbf{1}. \tag{2.65}$$
Since $p(\boldsymbol{\theta}_o) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Phi})$, it is easy to see that $p(\delta) = \mathcal{N}(0, \mathbf{1}^T\boldsymbol{\Phi}\mathbf{1}/N^2)$.
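The quantities in (2.61) and (2.65) can be compared numerically with the sketch below. The closed form used for the exact minimizer, $\delta = -\mathbf{1}^T\boldsymbol{\Phi}^{-1}\boldsymbol{\theta}_o / (\mathbf{1}^T\boldsymbol{\Phi}^{-1}\mathbf{1})$, is obtained here by setting the derivative of the quadratic in (2.61) to zero; it is not stated in the text, and the function names are illustrative.

```python
import numpy as np

def gaussian_phn_covariance(n, theta_rms_deg, kappa):
    """Covariance of (2.58) with kappa = Omega_o / f_s."""
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.deg2rad(theta_rms_deg) ** 2 * np.exp(-2.0 * np.pi * kappa * lag)

def exact_delta(theta_o, Phi):
    """Minimizer over a of (theta_o + a*1)^T Phi^{-1} (theta_o + a*1), as in (2.61)."""
    w = np.linalg.solve(Phi, np.ones(len(theta_o)))   # Phi^{-1} 1
    return -(w @ theta_o) / w.sum()

def approx_delta(theta_o):
    """Negative sample mean of theta_o, the approximation in (2.65)."""
    return -theta_o.mean()

def delta_std(Phi):
    """Predicted standard deviation of delta: sqrt(1^T Phi 1 / N^2)."""
    return np.sqrt(Phi.sum()) / Phi.shape[0]
```

For a white process ($\boldsymbol{\Phi} \propto \mathbf{I}$) the exact and approximate values coincide; for correlated PHN they generally differ, which is why (2.65) is stated as an approximation.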
Fig. 2.5 shows the pdf of the measured $\delta$ compared to the Gaussian prediction, where $\delta$ is measured in simulation as the mean difference between $\hat{\boldsymbol{\theta}}$ and $\boldsymbol{\theta}_o$. It is seen that this prediction holds very well at different SNR’s. Hence, we now have a much better knowledge about the behaviour of the residual common phase rotation, and know that it is not significant, as its variance is a fraction of that of PHN. We call this residual common phase rotation “unresolvable”, because it cannot be corrected in the channel estimation stage using a likelihood-based estimator. The consequence of the residual common phase rotation is the rotation of the estimated channel impulse response from $\mathbf{g}_o$ by $\delta$. Left untreated, using this biased channel estimate in the data detection stage in a PHN channel is equivalent to having a perfect channel estimate but an exacerbated PHN. In fact, the equivalent PHN process would be zero mean Gaussian with a new covariance matrix
$$\boldsymbol{\Phi}_\delta = \boldsymbol{\Phi} + \sigma_\delta^2\mathbf{1}\mathbf{1}^T, \tag{2.66}$$
where $\sigma_\delta^2 = \mathbf{1}^T\boldsymbol{\Phi}\mathbf{1}/N^2$. Alternatively, $\delta$ can also be estimated in the data detection stage using pilot symbols embedded in the transmitted OFDM symbols. Since this is a practical implementation choice, we will not discuss it much further but assume from hereon that $\delta$ can be perfectly corrected to facilitate easy assessment of the quality of channel estimation. In particular, in each simulation $\delta$ is set to be the mean difference between the actual PHN process and the estimated PHN process.

Figure 2.5: The predicted pdf vs. the histogram of $\delta$ at SNR = 15, 25, 35 dB. [Three panels, one per SNR: $f(\delta)$ versus $\delta$ (rad) over $[-0.5, 0.5]$.]

Cramér-Rao Lower Bound
The minimum MSE the proposed channel estimator can possibly achieve is the Cramér-Rao Lower Bound (CRLB) for an OFDM channel without PHN distortion. It is an important measure to help gauge the practical value of the proposed schemes. In the following, we will derive the CRLB for estimating the impulse response $\mathbf{g} \in \mathbb{C}^{L\times 1}$:
Theorem 2 In the absence of CFO and PHN, the CRLB of estimating the channel impulse response $\mathbf{g} \in \mathbb{C}^{L\times 1}$ in an OFDM channel with additive white Gaussian noise is
$$\mathrm{CRLB}(\mathbf{g}) = \frac{L}{\mathrm{SNR}}. \tag{2.67}$$
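Proof: In the absence of CFO and PHN, the received signal in an OFDM channel can be written, similar to (2.11), as
$$\mathbf{r} = \mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g} + \mathbf{n}. \tag{2.68}$$
Thus,
$$p(\mathbf{r}|\mathbf{g}) = \mathcal{CN}(\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g}, 2\sigma^2\mathbf{I}), \tag{2.69}$$
or equivalently,
$$\log p(\mathbf{r}|\mathbf{g}) = -\frac{1}{2\sigma^2}(\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g} - \mathbf{r})^H(\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g} - \mathbf{r}). \tag{2.70}$$
Taking the derivative w.r.t. $\mathbf{g}^*$, we have
$$\frac{\partial}{\partial\mathbf{g}^*}\{\log p(\mathbf{r}|\mathbf{g})\} = -\frac{1}{2\sigma^2}\mathbf{W}^H\mathbf{D}^H\mathbf{F}(\mathbf{F}^H\mathbf{D}\mathbf{W}\mathbf{g} - \mathbf{r}) = \frac{1}{2\sigma^2}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\mathbf{n}. \tag{2.71}$$
From [54], the Fisher information matrix can be evaluated
$$\begin{aligned}
\mathbf{I}(\mathbf{g}) &= E\left\{\left[\frac{\partial}{\partial\mathbf{g}^*}\log p(\mathbf{r}|\mathbf{g})\right]\left[\frac{\partial}{\partial\mathbf{g}^*}\log p(\mathbf{r}|\mathbf{g})\right]^H\right\} \\
&= E\left\{\frac{1}{(2\sigma^2)^2}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\mathbf{n}\mathbf{n}^H\mathbf{F}^H\mathbf{D}\mathbf{W}\right\} \\
&= \frac{1}{(2\sigma^2)^2}\mathbf{W}^H\mathbf{D}^H\mathbf{F}\,E\{\mathbf{n}\mathbf{n}^H\}\,\mathbf{F}^H\mathbf{D}\mathbf{W} \\
&= \frac{1}{2\sigma^2}\mathbf{W}^H\mathbf{D}^H\mathbf{D}\mathbf{W} \\
&= \frac{\rho^2}{\sigma^2}\mathbf{I},
\end{aligned} \tag{2.72}$$
where the last equality follows from the constant modulus assumption of the training data, i.e., $\mathbf{D}^H\mathbf{D} = 2\rho^2\mathbf{I}$. Specifically in this simulation, QPSK training symbols are used, i.e., $\mathbf{d} = \rho \times [\{\pm 1\} + j\{\pm 1\}]^{N\times 1}$.
The Cramér-Rao Lower Bound is therefore
$$\mathrm{CRLB}(\mathbf{g}) = \mathrm{tr}\left(\mathbf{I}^{-1}(\mathbf{g})\right) = \frac{L\sigma^2}{\rho^2} = \frac{L}{\mathrm{SNR}}, \tag{2.73}$$
where $\mathrm{SNR} = \frac{E_s}{N_o} = \frac{2\rho^2}{2\sigma^2}$.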
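The bound in (2.73) can be checked numerically by forming the Fisher information matrix of (2.72) for a concrete training sequence. The sketch below assumes that $\mathbf{W}$ holds the first $L$ columns of the unitary $N$-point DFT matrix, so that $\mathbf{W}^H\mathbf{W} = \mathbf{I}$; this convention and the function name are assumptions made for illustration.

```python
import numpy as np

def crlb_channel(d, L, sigma2):
    """trace(I^{-1}(g)) with I(g) = W^H D^H D W / (2 sigma^2), following (2.72)."""
    n = len(d)
    k = np.arange(n)
    # Assumed partial DFT matrix: first L columns of the unitary DFT matrix
    W = np.exp(-2j * np.pi * np.outer(k, np.arange(L)) / n) / np.sqrt(n)
    DW = d[:, None] * W
    fisher = DW.conj().T @ DW / (2.0 * sigma2)
    return np.trace(np.linalg.inv(fisher)).real
```

For QPSK training symbols with $\rho^2 = 0.5$ and $\sigma^2 = 0.01$ the SNR is $2\rho^2/(2\sigma^2) = 50$, so $L = 10$ taps gives a bound of $10/50 = 0.2$.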
Figure 2.6: Channel estimation performance for different number of conjugate gradient (CG) iterations ($\epsilon = 0$). [Plot: MSE versus number of CG iterations; curves for CG-based JCPCE and CRLB.]

Figure 2.7: MSE vs. SNR channel estimation performance with PHN modelling error ($\epsilon = 0$). [Plot: MSE versus SNR; curves for $\theta_{rms} = 3$, $\kappa = 0.005$ (true value); $\theta_{rms} = 1$, $\kappa = 0.005$; $\theta_{rms} = 3$, $\kappa = 0.001$; $\theta_{rms} = 5$, $\kappa = 0.02$; and CRLB.]
Effect of Number of Iterations of Conjugate Gradient Algorithm
The number of iterations in the conjugate gradient method for PHN estimation is a crucial factor in the overall complexity of the low-complexity implementation of JCPCE. Fig. 2.6 illustrates the performance of the channel estimator at SNR = 20, 30, 40 dB as a function of the number of conjugate gradient iterations in Table 2.2. The plots reveal that, in general, reliable channel estimates can be obtained with 5 to 10 conjugate gradient iterations, thanks to the superior convergence properties of the conjugate gradient method. The Gaussian PHN model is assumed in this simulation. We omit the curves for the Wiener PHN case since very similar observations can be made. This also applies for the next two sets of simulations for conciseness.

Sensitivity to PHN Modelling Error
It is well-known that the accuracy of the prior statistics plays an important role in the performance of Bayesian estimators. The Bayesian estimation of the PHN in the JCPCE algorithm is therefore influenced by the accuracy of the prior distribution $p(\boldsymbol{\theta})$ and, more specifically, the covariance matrix $\boldsymbol{\Phi}$. From (2.58) it is seen that modelling errors may occur in two places: the PHN standard deviation $\theta_{rms}$ and the relative bandwidth $\kappa = \Omega_o/f_s$. In our simulations, the PHN is generated by setting $\theta_{rms} = 3$ deg and $\kappa = 0.005$. We artificially introduce erroneous PHN statistics in the channel estimator by varying $\boldsymbol{\Phi}$ in Table 2.1 over a range of values of $\theta_{rms}$ and $\kappa$ to test the robustness of JCPCE to inaccurate PHN statistics. Fig. 2.7 depicts the performance of JCPCE. The result demonstrates that even with significant errors in PHN statistics, JCPCE performs close to the CRLB for estimating $\mathbf{g}$ in an OFDM channel free of CFO or PHN, which has been shown to be $\mathrm{CRLB}(\mathbf{g}) = L/\mathrm{SNR}$.

Performance at Different Levels of PHN
Fig. 2.8 studies the channel estimation accuracy as a function of the severity of PHN distortion. We vary the parameters of the PHN generated in the simulations over different settings of $\theta_{rms}$ and $\kappa$, and assume perfect knowledge of these statistics at the channel estimator (no modelling error). It is seen that even in extreme cases such as $\theta_{rms} = 5$ deg and $\kappa = 0.02$, JCPCE is able to perform close to the CRLB, confirming that the proposed algorithm is robust to very severe PHN in the channel.

Figure 2.8: MSE vs. SNR channel estimation performance at different levels of PHN ($\epsilon = 0$). [Plot: MSE versus SNR; curves for $\theta_{rms} = 1$, $\kappa = 0.005$; $\theta_{rms} = 5$, $\kappa = 0.005$; $\theta_{rms} = 3$, $\kappa = 0.02$; $\theta_{rms} = 5$, $\kappa = 0.02$; and CRLB.]
Figure 2.9: MSE vs. SNR channel estimation performance using JCPCE (Wiener PHN). [Plot: MSE versus SNR; curves for JCPCE, CG-based JCPCE with $i = 5$ iterations, partial JCPCE ignoring PHN, and CRLB.]

2.5.2 Channel Estimation with both CFO and PHN
In this section we will examine the channel estimation performance in the presence of both CFO and PHN. The allowable CFO estimation range is $|\epsilon| < 0.5$ for the JCPCE algorithm and $|\epsilon| < 1$ for the modified JCPCE algorithm. In the following simulations, the CFO term $\epsilon$ will be generated from a uniform distribution in $[-0.4, 0.4]$ corresponding to a maximum CFO of 125 kHz. Care should be taken, however, when simulating both CFO and PHN. Now the effective PHN seen by the PHN estimator (Step 2 in both Table 2.1 and Table 2.3) is a combination of the residual CFO estimation error (as an additional PHN process with linearly varying phase) and the actual PHN. So $\delta$ should now be equal to the mean difference between the effective PHN process and the estimated PHN process, and removed before channel estimation.

Channel Estimation Performance of JCPCE
Fig. 2.9 and Fig. 2.10 plot the channel estimation MSE as a function of the system SNR ($\mathrm{SNR} = E_s/N_o = 2\rho^2/2\sigma^2$) in the presence of both CFO and PHN. The complete JCPCE algorithm is compared to the partial JCPCE where PHN estimation is omitted. (We cannot compare with the conventional channel estimator that ignores PHN and CFO because it completely fails when $\epsilon \neq 0$.) It is seen that the complete JCPCE algorithm almost completely cancels the effect of CFO and PHN distortion. The partial JCPCE, which optimally cancels CFO but ignores PHN, deviates from the CRLB at high SNR, demonstrating that PHN has a major effect in channel estimation even with optimal CFO estimation. We also plot the low complexity implementation of JCPCE using the conjugate gradient method described in Section 2.3.2. It is shown that for both types of PHN, there is only a small performance degradation when using $i = 5$ conjugate gradient iterations to evaluate (2.20) and (2.22).

Figure 2.10: MSE vs. SNR channel estimation performance using JCPCE (Gaussian PHN). [Plot: MSE versus SNR; curves for JCPCE, CG-based JCPCE with $i = 5$ iterations, partial JCPCE ignoring PHN, and CRLB.]

Channel Estimation Performance of Modified JCPCE
Fig. 2.11 and Fig. 2.12 simulate the modified JCPCE (Table 2.3) given repeating training symbols. Here we keep the same simulation settings except for letting the training symbol have a repeating structure. The performance of the modified JCPCE is compared against the CRLB and the conventional channel estimator with the CFO estimated using Moose’s method in (2.32) and the random PHN ignored. It is seen that the conventional method suffers from an error floor due to the untreated PHN. The modified JCPCE in Fig. 2.11 and Fig. 2.12 has very similar performance as the original JCPCE in Fig. 2.9 and Fig. 2.10, showing that the modified JCPCE, while having much lower complexity, does not perform worse. Therefore, in practice, the modified JCPCE is a preferred scheme over the regular JCPCE. We also plot the low complexity implementation of the modified JCPCE using the conjugate gradient method described in Section 2.4.4. It is shown that even with as few iterations as $i = 5$ for the evaluation of (2.49) and (2.56), little performance degradation is introduced as a result.

Figure 2.11: MSE vs. SNR channel estimation performance using modified JCPCE (Wiener PHN). [Plot: MSE versus SNR; curves for modified JCPCE, CG-based modified JCPCE (5 iterations), conventional scheme with Moose CFO estimator, and CRLB.]
2.6
Summary
This chapter addresses the problem of channel estimation in a practical OFDM receiver that suffers from phase noise and frequency offset. We first derived the maximum a posteriori (MAP) estimator of the channel response, phase noise and frequency offset, incorporating prior knowledge of the phase noise statistics and using a constant-modulus training sequence. Next, we proposed a less complex estimator which requires a training symbol that has two identical halves in the time domain. The lower complexity is obtained because an expensive exhaustive
search over all feasible frequency offsets is no longer needed. The proposed method’s CFO estimate is more accurate than the one from [44] since it is based on an accurate model of the PHN present in the signal. Furthermore, we explored ways to reduce the complexity of the proposed estimators through the use of the conjugate gradient iteration. It is demonstrated that the channel estimators are able to perform well with a very small number of conjugate gradient iterations, with each iteration efficiently computed using the FFT. It is evident that the proposed channel estimators can be readily implemented without substantial increase to the overall complexity of conventional OFDM receivers. This chapter provides a firm foundation for the design of OFDM detectors in the presence of PHN, which will be presented in the next chapter, where the channel impulse response and CFO can now be safely assumed known.

[Figure 2.12: MSE vs. SNR channel estimation performance using modified JCPCE (Gaussian PHN). Axes: MSE versus SNR. Curves: Modified JCPCE; CG-based Modified JCPCE (5 Iterations); Conventional Scheme with Moose CFO Estimator; CRLB.]
Chapter 3
Joint Data Detection and Phase Noise Estimation

This chapter studies the mitigation of phase noise (PHN) in OFDM data detection. We present a systematic probabilistic framework based on the variational inference technique that leads to both optimal and near-optimal OFDM detection schemes in the presence of unknown PHN. In contrast to the conventional approach that cancels the common (average) PHN, our aim is to jointly estimate the complete PHN sequence and the data symbol sequence. We derive a family of low-complexity OFDM detectors for this purpose. In deriving the proposed schemes, we also point out that the expectation-maximization (EM) algorithm is a special case of the variational-inference-based joint estimator. Further complexity reduction is obtained using the conjugate gradient (CG) method, and only a few conjugate gradient iterations are needed to closely approach the ideal joint estimator output.
3.1 Introduction
In recent years, various methods to mitigate the effect of PHN during data detection have been presented in the literature [9–11], where PHN is commonly decomposed into two components: the common phase noise, also known as the common phase rotation, which depends on the average value of the PHN over one OFDM symbol and has the same effect on all subcarriers, and the random phase noise which induces inter-carrier interference (ICI). Common PHN can be mitigated with a few pilot tones, as in [9] for example. Nikitopoulos and Polydoros [10] assume that the common PHN evolves slowly over consecutive OFDM symbols so that previous estimates of common PHN may be used in the processing of the current OFDM symbol. In [11], an MMSE equalization technique was used to suppress the common PHN after modelling the ICI caused by the random PHN as extra additive noise. These papers represent state-of-the-art PHN mitigation techniques, where common PHN is the target of estimation and cancellation while random PHN is treated as unavoidable noise. However, the removal of the complete PHN sequence (common PHN + random PHN) should lead to much improved performance, since the ICI introduced by random PHN can then be suppressed. This has never been rigorously studied due to the difficulty of jointly estimating both the PHN profile and data symbols. In this chapter, we will show that joint data detection and PHN estimation is feasible through a relaxation of the form of the likelihood function containing both the data and PHN. We will not distinguish between the two components of PHN, and instead investigate the general PHN issue, by first developing a probabilistic model, and then deriving variational inference algorithms to solve the problem. Because of the probabilistic nature of the technique used, rather than generating “hard” estimates of the data and PHN profile, we obtain probability distributions for the data and PHN, which provide measures of the reliability of the estimator outputs, or alternatively our uncertainty over those estimates.
The rest of the chapter will be organized as follows: Section 3.2 presents the signal model; Section 3.3 summarizes the conventional PHN cancellation algorithm and derives the optimal OFDM symbol detector; Section 3.4 presents a family of algorithms based on the variational inference framework that jointly detect the data symbols and estimate PHN; Section 3.5 provides complexity analysis on the proposed algorithms and derives complexity reduction methods based on the conjugate gradient algorithm; Section 3.6 presents simulation results that test the proposed algorithms in terms of the bit error rate (BER) of detected OFDM symbols; Section 3.7 contains the summary.
3.2 Signal Model
The signal model we consider in this chapter is rather similar to that of Chapter 2. The system block diagram, including the transmitter, channel and receiver, is given in Fig. 2.2. The Channel Estimation Module in Fig. 2.2 is already studied in Chapter 2, where an optimal joint CFO/PHN/CIR estimator (JCPCE) was proposed that performs almost as well as if no CFO or PHN existed. The exact CFO and PHN that distort the channel estimation training symbols are also estimated. It is therefore reasonable to assume that the channel
impulse response and CFO (which are quasi-static) are known at the data detection stage. However, the same assumption cannot be made about PHN since it is time varying and differs from one OFDM symbol to the next. Our goal in this chapter is to investigate methods for efficient OFDM data detection in the presence of unknown PHN distortion assuming known channel impulse response and removal of CFO (as depicted in the Data Detection Module in Fig. 2.2).

In the subsequent description of the signal model for data detection, we will assume that the CFO is perfectly estimated at the channel estimation stage and removed. The complex baseband received signal of one OFDM data symbol within the payload section sampled at rate $N/T$ can be written as an $N$-point sequence for $n = 0, \cdots, N-1$:

$$ r_n = \frac{1}{\sqrt{N}}\, e^{j\theta_n} \sum_{k=0}^{N-1} h_k d_k e^{j2\pi nk/N} + \eta_n, \tag{3.1} $$

where $\{\theta_n\}_{n=0}^{N-1}$ is the discrete-time PHN sequence; $\{h_k\}_{k=0}^{N-1}$ is the channel frequency response at subcarriers 0 to $N-1$; $\{d_k\}_{k=0}^{N-1}$ are the transmitted data symbols belonging to an $M$-QAM constellation; and $\{\eta_n\}_{n=0}^{N-1}$ is complex white Gaussian noise with variance $\sigma^2$ per dimension. (3.1) may be written in matrix form as:

$$ \mathbf{r} = \mathbf{P}\mathbf{F}^H\mathbf{H}\mathbf{d} + \mathbf{n}, \tag{3.2} $$

where $\mathbf{F} \in \mathbb{C}^{N\times N}$ is the DFT matrix with the $(l,m)$th element being $F_{l,m} = \frac{1}{\sqrt{N}} e^{-j\frac{2\pi(l-1)(m-1)}{N}}$; $\mathbf{d} = [d_0, \cdots, d_{N-1}]^T$ is the data vector; $\mathbf{n} = [\eta_0, \cdots, \eta_{N-1}]^T$ is the noise vector; $\mathbf{P} = \mathrm{diag}([e^{j\theta_0}, \cdots, e^{j\theta_{N-1}}]^T)$ is the PHN matrix; and $\mathbf{H} = \mathrm{diag}(\mathbf{h}) = \mathrm{diag}([h_0, \cdots, h_{N-1}]^T)$ is the channel matrix.
Note that in contrast to many papers discussing PHN, we do not assume that the channel is perfectly equalized before PHN estimation and cancellation, which would be difficult to achieve.
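As an illustration of the signal model, the following minimal sketch builds the received vector from the matrix form (3.2) and checks it against the sample-by-sample form (3.1). The helper names (`dft_matrix`, `received_signal`), the toy QPSK symbols and the random channel are assumptions made only for this example.

```python
import numpy as np

def dft_matrix(N):
    # Unitary DFT matrix with entries exp(-j*2*pi*l*m/N)/sqrt(N), 0-based l and m.
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

def received_signal(d, h, theta, noise):
    # Matrix form (3.2): r = P F^H H d + n.
    F = dft_matrix(len(d))
    P = np.diag(np.exp(1j * theta))
    H = np.diag(h)
    return P @ F.conj().T @ H @ d + noise

# Toy example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
N = 8
d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)  # QPSK stand-in
h = rng.normal(size=N) + 1j * rng.normal(size=N)                  # channel frequency response
theta = 0.05 * rng.normal(size=N)                                 # small phase noise (radians)
noise = np.zeros(N, dtype=complex)                                # noiseless for the check
r = received_signal(d, h, theta, noise)

# Sample-by-sample form (3.1) for comparison.
k = np.arange(N)
r_sum = np.array([
    np.exp(1j * theta[n]) * np.sum(h * d * np.exp(2j * np.pi * n * k / N)) / np.sqrt(N)
    for n in range(N)
])
```

The two constructions agree because row $n$ of $\mathbf{F}^H$ holds the IDFT kernel $e^{j2\pi nk/N}/\sqrt{N}$.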
3.3 Conventional and Optimal Phase Noise Cancellation

3.3.1 Conventional Schemes
In this section, we describe the essence of conventional PHN cancellation schemes [9–11]. The discrete Fourier Transform (DFT) of the time-domain received signal vector $\mathbf{r} = [r_0, \cdots, r_{N-1}]^T$ expressed in (3.1) produces a frequency-domain sequence $[R_0, \cdots, R_{N-1}]^T$ which can be written as:

$$ R_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} r_n e^{-j2\pi nk/N} = d_k h_k U_0 + \sum_{l=0,\, l\neq k}^{N-1} d_l h_l U_{(l-k)_N} + v_k, \tag{3.3} $$

for $k = 0$ to $N-1$. $[U_0, \cdots, U_{N-1}]^T$ is $\frac{1}{\sqrt{N}}$ times the DFT of the PHN vector $\mathbf{u} = [e^{j\theta_0}, \cdots, e^{j\theta_{N-1}}]^T$, and $\mathbf{v} = [v_0, \cdots, v_{N-1}]^T$ is the DFT of the noise vector $\mathbf{n}$. $(l-k)_N$ stands for $(l-k) \bmod N$. It can be shown that $v_k \sim \mathcal{CN}(0, 2\sigma^2)$, and the ICI term $\sum_{l=0,\, l\neq k}^{N-1} d_l h_l U_{(l-k)_N}$ is approximated as zero-mean complex Gaussian noise with variance $2\sigma_{ICI}^2 = E_s \sum_{l=0,\, l\neq k}^{N-1} |h_l|^2 E[|U_{(l-k)_N}|^2]$, where $E_s = 2\rho^2$ and $\rho^2$ is the symbol energy per dimension per subcarrier. Assuming pilot symbols transmitted on carriers with index set $S_p$, the least-squares (LS) estimate of the common PHN term $U_0$ is

$$ \hat{U}_0 = \frac{\sum_{k\in S_p} R_k d_k^* h_k^*}{\sum_{k\in S_p} |d_k h_k|^2}. \tag{3.4} $$

Given the estimated common PHN, the general phase noise suppression (GPNS) scheme [11] estimates $d_k$ as

$$ \hat{d}_k = \frac{R_k E_s \hat{U}_0^* h_k^*}{E_s |\hat{U}_0 h_k|^2 + 2\sigma_{tot}^2}, \tag{3.5} $$

where $2\sigma_{tot}^2 = 2\sigma_{ICI}^2 + 2\sigma^2$ is the effective additive noise variance. In [11], pilots are used to estimate both $U_0$ and $2\sigma_{tot}^2$. GPNS will be simulated in Section 3.6 as the “conventional scheme” to be compared with our proposed schemes, where no pilots are necessary. For comparison purposes, we will simply assume $U_0$ is known perfectly when simulating GPNS. Furthermore, with channel knowledge and known PHN statistics, $2\sigma_{tot}^2$ can be calculated exactly in advance. With these assumptions, pilots are not needed in the simulations, even for GPNS.
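The two estimators (3.4) and (3.5) can be sketched as follows. This is only an illustration under assumed toy settings: the function names, the pilot spacing and the deterministic channel are hypothetical, and the example uses a pure common phase rotation with no noise so that the ICI term vanishes.

```python
import numpy as np

def common_phn_ls(R, d, h, pilots):
    # LS estimate of the common PHN term U_0, as in (3.4), from pilot subcarriers only.
    num = np.sum(R[pilots] * np.conj(d[pilots] * h[pilots]))
    den = np.sum(np.abs(d[pilots] * h[pilots]) ** 2)
    return num / den

def gpns_detect(R, h, U0, Es, var_tot):
    # Per-subcarrier symbol estimate, as in (3.5); var_tot stands for 2*sigma_tot^2.
    return R * Es * np.conj(U0 * h) / (Es * np.abs(U0 * h) ** 2 + var_tot)

# Toy example: common phase rotation only, no additive noise (assumed values).
N = 16
rng = np.random.default_rng(0)
d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)
h = 1.0 + 0.5 * np.exp(2j * np.pi * np.arange(N) / N)    # nonzero toy channel
phi = 0.1                                                 # common phase in radians
r = np.exp(1j * phi) * np.fft.ifft(h * d) * np.sqrt(N)    # time-domain signal, cf. (3.1)
R = np.fft.fft(r) / np.sqrt(N)                            # unitary DFT, cf. (3.3)
pilots = np.arange(0, N, 4)                               # hypothetical pilot positions
U0_hat = common_phn_ls(R, d, h, pilots)
d_hat = gpns_detect(R, h, U0_hat, Es=2.0, var_tot=0.0)
```

With a time-varying PHN sequence, the ICI term in (3.3) no longer vanishes, and this per-subcarrier correction can only treat it as additional noise through `var_tot`.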
3.3.2 Maximum Likelihood Detector
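Here we give an original derivation of the optimal detector (also called exact inference in probabilistic inference literature) given the prior distribution of phase noise $\boldsymbol{\theta}$, and show that it has complexity exponential in $N$. From (3.2), the received signal can be expressed alternatively as

$$ \mathbf{r} = \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})\,\mathbf{u} + \mathbf{n} \approx \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})(\mathbf{1} + j\boldsymbol{\theta}) + \mathbf{n}, \tag{3.6} $$

where $\mathbf{u} = e^{j\boldsymbol{\theta}} = [e^{j\theta_0}, \cdots, e^{j\theta_{N-1}}]^T$. The approximation is tight since $\boldsymbol{\theta}$ is small. We thus have the following prior and conditional pdf’s:

$$ p(\boldsymbol{\theta}) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Phi}), \qquad p(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta}) = \mathcal{CN}\big(\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})(\mathbf{1} + j\boldsymbol{\theta}),\ 2\sigma^2\mathbf{I}\big). \tag{3.7} $$

The ML estimate of $\mathbf{d}$ is derived using classical estimation theory by treating $\boldsymbol{\theta}$ as a nuisance parameter and “integrating it out” [54] to obtain $p(\mathbf{r}|\mathbf{d})$.

Theorem 3 The likelihood function $p(\mathbf{r}|\mathbf{d})$ is complex Gaussian distributed with mean $\mathbf{F}^H\mathbf{H}\mathbf{d}$ and covariance matrix $\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})\,\boldsymbol{\Phi}\,\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})^H + 2\sigma^2\mathbf{I}$.

Proof: Since $p(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})$ and $p(\boldsymbol{\theta})$ are Gaussian distributed, it can be shown that the distribution of $p(\mathbf{r}|\mathbf{d})$ is also Gaussian. Denoting the mean of $\mathbf{r}$ given $\mathbf{d}$ to be $E(\mathbf{r}|\mathbf{d})$ and the variance as $V(\mathbf{r}|\mathbf{d})$, then applying the Iterated Expectation Theorem [43, ch. 14] and its analog in covariance, yields

$$ \begin{aligned} E(\mathbf{r}|\mathbf{d}) &= E_{\boldsymbol{\theta}}[E_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})] \\ V(\mathbf{r}|\mathbf{d}) &= V_{\boldsymbol{\theta}}[E_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})] + E_{\boldsymbol{\theta}}[V_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})]. \end{aligned} \tag{3.8} $$

Because $p(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta}) = \mathcal{CN}(\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})(\mathbf{1} + j\boldsymbol{\theta}),\ 2\sigma^2\mathbf{I})$, it is straightforward to infer that

$$ \begin{aligned} E_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta}) &= \mathbf{F}^H\mathbf{H}\mathbf{d} + j\,\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})\,\boldsymbol{\theta} \\ V_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta}) &= 2\sigma^2\mathbf{I}. \end{aligned} \tag{3.9} $$

Given that $p(\boldsymbol{\theta}) = \mathcal{N}(\mathbf{0}, \boldsymbol{\Phi})$, with some further manipulation, we obtain

$$ \begin{aligned} E_{\boldsymbol{\theta}}[E_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})] &= \mathbf{F}^H\mathbf{H}\mathbf{d} \\ V_{\boldsymbol{\theta}}[E_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})] &= \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})\,\boldsymbol{\Phi}\,\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})^H \\ E_{\boldsymbol{\theta}}[V_{\mathbf{r}}(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})] &= 2\sigma^2\mathbf{I}, \end{aligned} \tag{3.10} $$

which implies that

$$ \begin{aligned} E(\mathbf{r}|\mathbf{d}) &= \mathbf{F}^H\mathbf{H}\mathbf{d} \\ V(\mathbf{r}|\mathbf{d}) &= \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})\,\boldsymbol{\Phi}\,\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})^H + 2\sigma^2\mathbf{I}. \end{aligned} \tag{3.11} $$

Therefore,

$$ p(\mathbf{r}|\mathbf{d}) = \mathcal{CN}\big(\mathbf{F}^H\mathbf{H}\mathbf{d},\ \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})\,\boldsymbol{\Phi}\,\mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{d})^H + 2\sigma^2\mathbf{I}\big). \tag{3.12} $$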
Unfortunately, the maximizer of (3.12) does not have a closed form and the optimal $\mathbf{d}$ can only be found if each symbol hypothesis is tested, resulting in complexity of $O(M^N)$, where $M$ is the constellation size.
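To make the exponential cost concrete, the sketch below evaluates the Gaussian log-likelihood implied by (3.12) for every hypothesis of a very small system. It is an illustration only: the function names, the BPSK alphabet, the toy channel and the covariance values are assumptions, and the search is feasible here only because $M^N = 2^4 = 16$.

```python
import itertools
import numpy as np

def log_likelihood(r, d, h, Phi, noise_var):
    # log p(r|d) up to an additive constant, using the mean and covariance in (3.12);
    # noise_var stands for 2*sigma^2.
    N = len(r)
    x = np.fft.ifft(h * d) * np.sqrt(N)                    # F^H H d
    C = np.diag(x) @ Phi @ np.diag(x).conj().T + noise_var * np.eye(N)
    e = r - x
    _, logdet = np.linalg.slogdet(C)
    return -logdet - np.real(e.conj() @ np.linalg.solve(C, e))

def ml_detect(r, h, Phi, noise_var, alphabet):
    # Brute-force search over all len(alphabet)**N symbol hypotheses.
    N = len(r)
    best = max(itertools.product(alphabet, repeat=N),
               key=lambda c: log_likelihood(r, np.array(c), h, Phi, noise_var))
    return np.array(best)

# Toy example (assumed values): N = 4 subcarriers, BPSK, noiseless observation.
N = 4
h = np.array([1.0, 0.8j, -0.6, 0.9])
Phi = 1e-4 * np.minimum.outer(np.arange(1, N + 1), np.arange(1, N + 1))
d_true = np.array([1.0, -1.0, -1.0, 1.0])
r = np.fft.ifft(h * d_true) * np.sqrt(N)
d_ml = ml_detect(r, h, Phi, 1e-4, (-1.0, 1.0))
```

For the simulation settings used later in this chapter ($N = 64$ subcarriers with 64-QAM), the same search would have to visit $64^{64} = 2^{384}$ hypotheses, which is why the relaxations of Section 3.4 are needed.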
3.4 Joint Estimation via Variational Inference
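Our basic problem is the estimation of $\mathbf{d}$, whose optimal solution is the maximization of $p(\mathbf{d}|\mathbf{r}) = \int p(\mathbf{d}, \boldsymbol{\theta}|\mathbf{r})\,d\boldsymbol{\theta}$. This is hard to do because $\mathbf{d}$ is drawn from a discrete sample space and hence the problem is NP-complete. The variational inference approach first relaxes the problem constraints by allowing $\mathbf{d}$ to be continuous; then it approximates $p(\mathbf{d}, \boldsymbol{\theta}|\mathbf{r})$ with a function $Q(\mathbf{d}, \boldsymbol{\theta})$ that has convenient properties, such as $Q(\mathbf{d}, \boldsymbol{\theta}) = Q_d(\mathbf{d})Q_\theta(\boldsymbol{\theta})$. This last assumption is equivalent to assuming that $\mathbf{d}$ and $\boldsymbol{\theta}$ are independent conditioned on $\mathbf{r}$, and immediately leads to the result that the vector $\hat{\mathbf{d}}$ that maximizes $Q(\mathbf{d}, \boldsymbol{\theta})$ over $\mathbf{d}$ and $\boldsymbol{\theta}$ also maximizes $\int Q(\mathbf{d}, \boldsymbol{\theta})\,d\boldsymbol{\theta}$. In other words, the joint estimation of $\mathbf{d}$ and $\boldsymbol{\theta}$ directly yields the optimal estimate of $\mathbf{d}$. Finally, $Q_d(\mathbf{d})$ and $Q_\theta(\boldsymbol{\theta})$ must be chosen so that they can be easily manipulated. Details will now be provided.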
3.4.1 Variational Inference
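Consider the optimization of $p(\mathbf{d}, \boldsymbol{\theta}|\mathbf{r})$ over $\mathbf{d}$ and $\boldsymbol{\theta}$. As introduced in Chapter 1, the heart of the variational technique is that it looks for a parameterized Q-distribution, $Q(\mathbf{d}, \boldsymbol{\theta})$, which closely resembles $p(\mathbf{d}, \boldsymbol{\theta}|\mathbf{r})$, and then finds $\mathbf{d}$ and $\boldsymbol{\theta}$ that maximize $Q(\mathbf{d}, \boldsymbol{\theta})$. The versatility and simplicity of the variational technique lies in the fact that when $Q(\mathbf{d}, \boldsymbol{\theta})$ is properly selected (e.g., as a Gaussian distribution), its maximizers can be easily deduced. It can be shown that the problem has been transformed from the maximization of $p(\mathbf{d}, \boldsymbol{\theta}|\mathbf{r})$ itself to that of its lower bound [2], yielding enormous computational savings. The Variational Free Energy in the context of the PHN problem may be written as:

$$ F(Q, p) = \int_{\mathbf{d}, \boldsymbol{\theta}} Q(\mathbf{d}, \boldsymbol{\theta}) \log \frac{Q(\mathbf{d}, \boldsymbol{\theta})}{p(\mathbf{d}, \boldsymbol{\theta}, \mathbf{r})}\, d\mathbf{d}\, d\boldsymbol{\theta}. \tag{3.13} $$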
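Subsequently, we make a simplification by factorizing $Q(\mathbf{d}, \boldsymbol{\theta})$ into a product form (also known as mean-field approximation), i.e., $Q(\mathbf{d}, \boldsymbol{\theta}) = Q_d(\mathbf{d})Q_\theta(\boldsymbol{\theta})$. The mean-field approximation is also important for justifying the use of the $\mathbf{d}$ component of the maximizer of $Q(\mathbf{d}, \boldsymbol{\theta})$ as the optimal estimate of $\mathbf{d}$; without it, we should be finding $\int Q(\mathbf{d}, \boldsymbol{\theta})\,d\boldsymbol{\theta}$ and then maximizing the result, which may turn out to be infeasible. For PHN estimation, we assume that (after dropping the subscripts of the Q-functions for simplicity of notation)

$$ Q(\mathbf{d}) = \mathcal{CN}(\mathbf{m}_d, \mathbf{S}_d), \qquad Q(\boldsymbol{\theta}) = \mathcal{N}(\mathbf{m}_\theta, \mathbf{S}_\theta). \tag{3.14} $$

It is worth noting that the posteriors of $\mathbf{d}$ and $\boldsymbol{\theta}$ are now parameterized by their means and variances (namely, $\mathbf{m}_d$, $\mathbf{m}_\theta$, $\mathbf{S}_d$, and $\mathbf{S}_\theta$), which then become the targets of optimization instead of the Q-functions themselves. The complete likelihood function $p(\mathbf{d}, \boldsymbol{\theta}, \mathbf{r})$ in (3.13) may be written as $p(\mathbf{d}, \boldsymbol{\theta}, \mathbf{r}) = p(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})\,p(\boldsymbol{\theta})\,p(\mathbf{d})$, where $p(\mathbf{r}|\mathbf{d}, \boldsymbol{\theta})$ and $p(\boldsymbol{\theta})$ are given in (3.7). In addition, we let $p(\mathbf{d}) = \mathcal{CN}(\mathbf{0}, 2\rho^2\mathbf{I})$. Note that instead of defining a discrete distribution over the signal constellation, we have made a Gaussian approximation, which leads to a linear detector. Substituting the Q-functions from (3.14) into (3.13), we have the closed form expression of $F(Q, p)$, now expressed as a function of the parameters of the Q-functions, as derived in Appendix A.1:

$$ \begin{aligned} F(\mathbf{m}_d, \mathbf{S}_d, \mathbf{m}_\theta, \mathbf{S}_\theta) ={}& \frac{1}{2\rho^2}\left[\mathrm{tr}(\mathbf{S}_d) + \mathbf{m}_d^H\mathbf{m}_d\right] + \frac{1}{2}\mathrm{tr}\left(\boldsymbol{\Phi}^{-1}\mathbf{S}_\theta\right) + \frac{1}{2}\mathbf{m}_\theta^T\boldsymbol{\Phi}^{-1}\mathbf{m}_\theta - \frac{1}{2}\log|\mathbf{S}_\theta| - \log|\mathbf{S}_d| \\ &+ \frac{1}{2\sigma^2}\Big\{ \mathrm{tr}\left[(j\mathbf{M}_\theta + \mathbf{I})\mathbf{F}^H\mathbf{H}\mathbf{S}_d\mathbf{H}^H\mathbf{F}(j\mathbf{M}_\theta + \mathbf{I})^H\right] + \mathrm{tr}\left[\mathbf{H}^H\mathbf{F}\,\mathrm{diag}(\mathbf{S}_\theta)\,\mathbf{F}^H\mathbf{H}\mathbf{S}_d\right] \\ &+ \mathbf{m}_d^H\mathbf{H}^H\mathbf{F}\,\mathrm{diag}(\mathbf{S}_\theta)\,\mathbf{F}^H\mathbf{H}\mathbf{m}_d + \left[(j\mathbf{M}_\theta + \mathbf{I})\mathbf{F}^H\mathbf{H}\mathbf{m}_d - \mathbf{r}\right]^H\left[(j\mathbf{M}_\theta + \mathbf{I})\mathbf{F}^H\mathbf{H}\mathbf{m}_d - \mathbf{r}\right] \Big\}. \end{aligned} \tag{3.15} $$

Obviously, the optimal parameters are hard to obtain analytically in one step. The usual practice is to update each one of them in turn, while holding the others constant, a simple technique termed coordinate descent in the optimization literature [50]. The algorithm is guaranteed to converge to a local minimum of the free energy expression [55]. Taking the partial derivative of $F(\mathbf{m}_d, \mathbf{S}_d, \mathbf{m}_\theta, \mathbf{S}_\theta)$ w.r.t. each parameter and equating to zero, we obtain the following set of equations, with detailed derivations in Appendix A.2:

$$ \mathbf{S}_\theta = \sigma^2\left[\sigma^2\boldsymbol{\Phi}^{-1} + \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{S}_d\mathbf{H}^H\mathbf{F}) + \mathbf{X}_m\mathbf{X}_m^H\right]^{-1} \tag{3.16} $$

$$ \mathbf{m}_\theta = \sigma^{-2}\,\mathbf{S}_\theta \cdot \mathrm{Im}\left[\mathbf{X}_m^H(\mathbf{r} - \mathbf{F}^H\mathbf{H}\mathbf{m}_d)\right] \tag{3.17} $$

$$ \mathbf{S}_d = 2\sigma^2\left[\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{F}\,\mathrm{diag}(\mathbf{S}_\theta)\,\mathbf{F}^H\mathbf{H} + \mathbf{H}^H\mathbf{H}\right]^{-1} \tag{3.18} $$

$$ \mathbf{m}_d = (2\sigma^2)^{-1}\,\mathbf{S}_d\mathbf{H}^H\mathbf{F}\mathbf{P}_m^H\mathbf{r}, \tag{3.19} $$

where $\mathbf{X}_m = \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{m}_d)$ and $\mathbf{P}_m = \mathrm{diag}(e^{j\mathbf{m}_\theta})$.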
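Table 3.1: Updating Equations for Variational Inference Algorithm.

Initialization:
  Choose initial values for $\mathbf{S}_d^{(0)}$ and $\mathbf{m}_d^{(0)}$ (e.g. $\mathbf{S}_d^{(0)} = \mathbf{0}$ and $\mathbf{m}_d^{(0)} = \hat{\mathbf{d}}_{init}$).
Iterations:
  For $t = 1$ to $n$
    Update for $Q(\boldsymbol{\theta})$:
      $\mathbf{S}_\theta^{(t)} = \sigma^2\left[\sigma^2\boldsymbol{\Phi}^{-1} + \mathrm{diag}(\mathbf{F}^H\mathbf{H}\mathbf{S}_d^{(t-1)}\mathbf{H}^H\mathbf{F}) + \mathbf{X}_m^{H(t-1)}\mathbf{X}_m^{(t-1)}\right]^{-1}$
      $\mathbf{m}_\theta^{(t)} = \sigma^{-2}\,\mathbf{S}_\theta^{(t)} \cdot \mathrm{Im}\left[\mathbf{X}_m^{H(t-1)}(\mathbf{r} - \mathbf{F}^H\mathbf{H}\mathbf{m}_d^{(t-1)})\right]$
    Update for $Q(\mathbf{d})$:
      $\mathbf{S}_d^{(t)} = 2\sigma^2\left[\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{F}\,\mathrm{diag}(\mathbf{S}_\theta^{(t)})\,\mathbf{F}^H\mathbf{H} + \mathbf{H}^H\mathbf{H}\right]^{-1}$
      $\mathbf{m}_d^{(t)} = (2\sigma^2)^{-1}\,\mathbf{S}_d^{(t)}\mathbf{H}^H\mathbf{F}\mathbf{P}_m^{H(t)}\mathbf{r}$
  End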
We summarize the parameter updating procedure in Table 3.1. In each iteration, the parameters are updated in turn to generate new posterior estimates of $\mathbf{d}$ and $\boldsymbol{\theta}$ that decrease $F(\mathbf{m}_d, \mathbf{S}_d, \mathbf{m}_\theta, \mathbf{S}_\theta)$ monotonically. This particular update order was chosen based on the dependence of $\mathbf{m}_\theta$ and $\mathbf{m}_d$ on $\mathbf{S}_\theta$ and $\mathbf{S}_d$. For initialization, a tentative data decision is made assuming zero PHN and used as $\mathbf{m}_d^{(0)}$. $\mathbf{S}_d^{(0)}$ is set to the all-zero matrix. At the last iteration, approximate posterior distributions of $\mathbf{d}$ and $\boldsymbol{\theta}$ are extracted. Since $Q(\mathbf{d})$ is assumed to be Gaussian, it is maximized at $\mathbf{d} = \mathbf{m}_d$, and so hard decisions can be obtained from slicing $\mathbf{m}_d$, or $Q(\mathbf{d})$ can be used as a soft estimate of $\mathbf{d}$ for further error control decoding.

The difference between the variational inference and the ML approach in Section 3.3.2 lies in the fact that in variational inference the posterior density $p(\mathbf{d}, \boldsymbol{\theta}|\mathbf{r})$ is “forced” to be a product of Gaussian densities $Q(\mathbf{d})$ and $Q(\boldsymbol{\theta})$, making final symbol decisions easy to make since no integration over $\boldsymbol{\theta}$ is necessary, and the mean of $Q(\mathbf{d})$ is its maximizer. These approximations mean that the algorithm in general does not converge to the global maximum of the complete likelihood function, but it has been found to work very well in many applications. In the sequel, we investigate a few important variants of the variational inference algorithm. Although each is given a specific name, they only differ from the original version in their use of different Q functions.
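One pass of the updates in Table 3.1 can be sketched with dense matrices as below. This is a minimal illustration rather than the implementation used for the simulations: the function name `vi_iteration`, the argument layout and the toy values are assumptions, and `diag(.)` of a matrix is interpreted as keeping only its diagonal.

```python
import numpy as np

def vi_iteration(r, h, Phi, sigma2, rho2, m_d, S_d):
    # One round of the updates (3.16)-(3.19) using dense linear algebra.
    N = len(r)
    idx = np.arange(N)
    F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
    A = F.conj().T @ np.diag(h)                        # F^H H
    x = A @ m_d                                        # diagonal of X_m
    # (3.16): covariance of Q(theta)
    ici = np.real(np.diag(A @ S_d @ A.conj().T))       # diagonal of F^H H S_d H^H F
    S_theta = sigma2 * np.linalg.inv(
        sigma2 * np.linalg.inv(Phi) + np.diag(ici) + np.diag(np.abs(x) ** 2))
    # (3.17): mean of Q(theta)
    m_theta = S_theta @ np.imag(np.conj(x) * (r - x)) / sigma2
    # (3.18): covariance of Q(d)
    S_d_new = 2 * sigma2 * np.linalg.inv(
        (sigma2 / rho2) * np.eye(N)
        + A.conj().T @ np.diag(np.diag(S_theta)) @ A
        + np.diag(np.abs(h) ** 2))
    # (3.19): mean of Q(d)
    m_d_new = S_d_new @ (A.conj().T @ (np.exp(-1j * m_theta) * r)) / (2 * sigma2)
    return m_theta, S_theta, m_d_new, S_d_new

# Toy check (assumed values): noiseless signal with zero PHN, initialized at the true symbols.
N = 8
rng = np.random.default_rng(0)
h = 1.0 + 0.5 * np.exp(2j * np.pi * np.arange(N) / N)
d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)
Phi = 1e-4 * np.minimum.outer(np.arange(1, N + 1), np.arange(1, N + 1))
r = np.fft.ifft(h * d) * np.sqrt(N)
m_theta, S_theta, m_d, S_d = vi_iteration(r, h, Phi, 1e-6, 1.0, d, np.zeros((N, N)))
```

The dense inverses cost $O(N^3)$ per iteration; Section 3.5 discusses how the structure of $\boldsymbol{\Phi}$ and the FFT can be used to avoid them in the ICM variant.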
3.4.2 Iterative Conditional Mode
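Variational inference has made a very complex problem computationally tractable, but a further simplification is possible by assuming the posteriors $Q(\mathbf{d})$ and $Q(\boldsymbol{\theta})$ to be delta functions instead of Gaussian. In this case, the Q-functions are $\delta(\mathbf{d}, \hat{\mathbf{d}})$ and $\delta(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})$, respectively. The notation $\delta(\mathbf{a}, \hat{\mathbf{a}})$ denotes a vector Dirac delta function with the following properties: $\int \delta(\mathbf{a}, \hat{\mathbf{a}}) f(\mathbf{a})\, d\mathbf{a} = f(\hat{\mathbf{a}})$, and $\int \delta(\mathbf{a}, \hat{\mathbf{a}})\, d\mathbf{a} = 1$. The minimization of variational free energy over the parameters $\hat{\mathbf{d}}$ and $\hat{\boldsymbol{\theta}}$ is equivalent to minimizing $L(\hat{\mathbf{d}}, \hat{\boldsymbol{\theta}}) = -\log p(\mathbf{r}, \hat{\mathbf{d}}, \hat{\boldsymbol{\theta}})$ over $\hat{\mathbf{d}} \in \mathbb{C}^N$ and $\hat{\boldsymbol{\theta}} \in \mathbb{R}^N$. An algorithm based on coordinate descent will iteratively perform optimal point estimation for one of the two unknowns while holding the other fixed, hence the name iterative conditional mode (ICM). Since $p(\mathbf{r}, \hat{\mathbf{d}}, \hat{\boldsymbol{\theta}}) = p(\mathbf{r}|\hat{\mathbf{d}}, \hat{\boldsymbol{\theta}})\,p(\hat{\mathbf{d}})\,p(\hat{\boldsymbol{\theta}})$, $L(\hat{\mathbf{d}}, \hat{\boldsymbol{\theta}})$ is evaluated to be:

$$ L(\hat{\mathbf{d}}, \hat{\boldsymbol{\theta}}) = \frac{1}{2\rho^2}\,\hat{\mathbf{d}}^H\hat{\mathbf{d}} + \frac{1}{2}\,\hat{\boldsymbol{\theta}}^T\boldsymbol{\Phi}^{-1}\hat{\boldsymbol{\theta}} + \frac{1}{2\sigma^2}\,(\mathbf{r} - \hat{\mathbf{P}}\mathbf{F}^H\mathbf{H}\hat{\mathbf{d}})^H(\mathbf{r} - \hat{\mathbf{P}}\mathbf{F}^H\mathbf{H}\hat{\mathbf{d}}). \tag{3.20} $$

As derived in Appendix A.3, solving $\partial L/\partial \hat{\mathbf{d}}^* = 0$ and $\partial L/\partial \hat{\boldsymbol{\theta}} = 0$ leads to

$$ \hat{\mathbf{d}} = (\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H\mathbf{F}\hat{\mathbf{P}}^H\mathbf{r} \tag{3.21} $$

$$ \hat{\boldsymbol{\theta}} = (\sigma^2\boldsymbol{\Phi}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}})^{-1} \cdot \mathrm{Im}\left[\hat{\mathbf{X}}^H(\mathbf{r} - \mathbf{F}^H\mathbf{H}\hat{\mathbf{d}})\right], \tag{3.22} $$

where $\hat{\mathbf{X}} = \mathrm{diag}(\mathbf{F}^H\mathbf{H}\hat{\mathbf{d}})$ and $\hat{\mathbf{P}} = \mathrm{diag}(e^{j\hat{\boldsymbol{\theta}}})$.

Table 3.2: Updating Equations for Iterative Conditional Mode Algorithm.

Initialization:
  Choose initial values for $\hat{\mathbf{d}}^{(0)}$ (e.g. $\hat{\mathbf{d}}^{(0)} = \hat{\mathbf{d}}_{init}$).
Iterations:
  For $t = 1$ to $n$
    Update for $Q(\boldsymbol{\theta})$:
      $\hat{\boldsymbol{\theta}}^{(t)} = \left[\sigma^2\boldsymbol{\Phi}^{-1} + \hat{\mathbf{X}}^{H(t-1)}\hat{\mathbf{X}}^{(t-1)}\right]^{-1} \cdot \mathrm{Im}\left[\hat{\mathbf{X}}^{H(t-1)}(\mathbf{r} - \mathbf{F}^H\mathbf{H}\hat{\mathbf{d}}^{(t-1)})\right]$
    Update for $Q(\mathbf{d})$:
      $\hat{\mathbf{d}}^{(t)} = \left[\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{H}\right]^{-1}\mathbf{H}^H\mathbf{F}\hat{\mathbf{P}}^{H(t)}\mathbf{r}$
  End

The ICM algorithm is shown in Table 3.2. The saving in ICM is that we no longer require the covariance matrices $\mathbf{S}_d$ and $\mathbf{S}_\theta$ of the posterior distribution. The drawback is that point estimates do not model the uncertainties at each iteration, thus ICM in general produces inferior results compared to variational inference. It should be noted, however, that although ICM resembles the heuristic decision-directed approach where the data symbols and PHN are detected/estimated iteratively until decisions are made for both unknowns, it is in nature rather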
different as no symbol hard decisions (constrained to the symbol constellation) are made during the iterations.
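A single ICM pass, following (3.22) and then (3.21), can be sketched as below. The function name `icm_iteration` and all numerical values are assumptions for illustration; the PHN update uses a dense solve here, while the data update only needs FFTs and element-wise operations because $\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{H}$ is diagonal.

```python
import numpy as np

def icm_iteration(r, h, Phi_inv, sigma2, rho2, d_hat):
    # One ICM pass: PHN point estimate (3.22), then data point estimate (3.21).
    N = len(r)
    x = np.fft.ifft(h * d_hat) * np.sqrt(N)                 # diagonal of X = diag(F^H H d)
    q = np.imag(np.conj(x) * (r - x))                       # Im[X^H (r - F^H H d)]
    theta_hat = np.linalg.solve(sigma2 * Phi_inv + np.diag(np.abs(x) ** 2), q)
    z = np.fft.fft(np.exp(-1j * theta_hat) * r) / np.sqrt(N)   # F P^H r
    d_new = np.conj(h) * z / (sigma2 / rho2 + np.abs(h) ** 2)
    return theta_hat, d_new

# Toy example (assumed values): random-walk PHN, noiseless signal,
# initialization at the correct symbols, as if the tentative decisions were right.
rng = np.random.default_rng(1)
N = 16
d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)
h = rng.normal(size=N) + 1j * rng.normal(size=N)
alpha = 0.005                                               # PHN increment std (radians)
theta = np.cumsum(alpha * rng.normal(size=N))
k = np.arange(1, N + 1)
Phi_inv = np.linalg.inv(alpha ** 2 * np.minimum.outer(k, k))
r = np.exp(1j * theta) * np.fft.ifft(h * d) * np.sqrt(N)
theta_hat, d_hat = icm_iteration(r, h, Phi_inv, 1e-10, 1.0, d)
```

Note that the iterates `d_hat` are continuous-valued; slicing to the constellation would only be applied after the final iteration, which is the distinction from decision-directed schemes drawn in the text.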
3.4.3 Expectation-Maximization (EM) Algorithm
In the algorithms of Tables 3.1 and 3.2, the Q-functions were chosen to be Gaussian and delta functions respectively. We now show that these choices can be seen as the two extremes in a spectrum of choices that can be related to the EM algorithm [56]. The EM algorithm is used to estimate a vector of parameters, say $\boldsymbol{\phi}$, from observations $\mathbf{y}$ that are termed incomplete data, with the help of some auxiliary or hidden variables, say $\mathbf{x}$. The algorithm iteratively carries out two operations: the E-step and the M-step. The $t$-th iteration effectively computes a probability density $p(\mathbf{x}|\mathbf{y}, \boldsymbol{\phi}^{(t-1)})$, where $\boldsymbol{\phi}^{(t-1)}$ is the estimate of $\boldsymbol{\phi}$ in the previous iteration, and then maximizes

$$ U(\boldsymbol{\phi}, \boldsymbol{\phi}^{(t-1)}) = \int p(\mathbf{x}|\mathbf{y}, \boldsymbol{\phi}^{(t-1)}) \log p(\mathbf{x}, \mathbf{y}|\boldsymbol{\phi})\, d\mathbf{x} \tag{3.23} $$

over $\boldsymbol{\phi}$, yielding $\boldsymbol{\phi}^{(t)}$. In any given problem, the EM algorithm requires us to separate the unknown parameters from the unknown hidden variables. In other words, all unknowns are grouped into two classes. In each iteration, hard estimates are obtained for the parameters, while soft estimates in the form of probability densities are obtained for the hidden variables.

The link between EM and variational inference is made in [55], where it was shown that the EM algorithm is equivalent to jointly estimating the hidden variables and parameters by minimizing a single free-energy expression over a postulated distribution for the hidden variables, and over the parameters. This can be generalized even further: suppose we are simply faced with a problem with multiple unknowns. There are some unknowns which we want hard estimates of, and the rest we want soft estimates of. Using delta functions for the postulated distributions of the “hard” unknowns, and exact (or postulated) distributions for those of the “soft” unknowns in the variational inference algorithm will lead to the EM algorithm with the hard unknowns as parameters, and the soft unknowns as hidden variables. With this perspective, the algorithm in Table 3.1 can be seen as an EM algorithm with only hidden variables of Gaussian postulated distributions, and the ICM algorithm in Table 3.2 is an EM algorithm with only parameters. In between these extremes, we can set one unknown to be a variable and the other a parameter by simply adjusting the corresponding Q-functions.
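Table 3.3: Updating Equations for EM I Algorithm.

Initialization:
  Choose initial values for $\mathbf{S}_d^{(0)}$ and $\mathbf{m}_d^{(0)}$ (e.g. $\mathbf{S}_d^{(0)} = \mathbf{0}$ and $\mathbf{m}_d^{(0)} = \hat{\mathbf{d}}_{init}$).
Iterations:
  For $t = 1$ to $n$
    Update for $Q(\boldsymbol{\theta})$ (M Step):
      $\hat{\boldsymbol{\theta}}^{(t)} = \left[\sigma^2\boldsymbol{\Phi}^{-1} + \mathbf{X}_m^{H(t-1)}\mathbf{X}_m^{(t-1)}\right]^{-1} \cdot \mathrm{Im}\left[\mathbf{X}_m^{H(t-1)}(\mathbf{r} - \mathbf{F}^H\mathbf{H}\mathbf{m}_d^{(t-1)})\right]$
    Update for $Q(\mathbf{d})$ (E Step):
      $\mathbf{S}_d^{(t)} = 2\sigma^2\left[\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{H}\right]^{-1}$
      $\mathbf{m}_d^{(t)} = (2\sigma^2)^{-1}\,\mathbf{S}_d^{(t)}\mathbf{H}^H\mathbf{F}\hat{\mathbf{P}}^{H(t)}\mathbf{r}$
  End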
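Specifically, we have two options:

• Option I: $Q(\mathbf{d}) = \mathcal{CN}(\mathbf{m}_d, \mathbf{S}_d)$ and $Q(\boldsymbol{\theta}) = \delta(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})$. The free energy expression becomes

$$ F(\mathbf{m}_d, \mathbf{S}_d, \hat{\boldsymbol{\theta}}) = \frac{1}{2\rho^2}\left[\mathrm{tr}(\mathbf{S}_d) + \mathbf{m}_d^H\mathbf{m}_d\right] + \frac{1}{2}\hat{\boldsymbol{\theta}}^T\boldsymbol{\Phi}^{-1}\hat{\boldsymbol{\theta}} - \log|\mathbf{S}_d| + \frac{1}{2\sigma^2}\Big\{ \mathrm{tr}\left[\hat{\mathbf{P}}\mathbf{F}^H\mathbf{H}\mathbf{S}_d\mathbf{H}^H\mathbf{F}\hat{\mathbf{P}}^H\right] + (\hat{\mathbf{P}}\mathbf{F}^H\mathbf{H}\mathbf{m}_d - \mathbf{r})^H(\hat{\mathbf{P}}\mathbf{F}^H\mathbf{H}\mathbf{m}_d - \mathbf{r}) \Big\}, \tag{3.24} $$

where $\hat{\mathbf{P}} = \mathrm{diag}(e^{j\hat{\boldsymbol{\theta}}})$. Minimizing this free energy yields the EM I algorithm in Table 3.3.

• Option II: $Q(\mathbf{d}) = \delta(\mathbf{d}, \hat{\mathbf{d}})$ and $Q(\boldsymbol{\theta}) = \mathcal{N}(\mathbf{m}_\theta, \mathbf{S}_\theta)$. The free energy expression becomes

$$ F(\hat{\mathbf{d}}, \mathbf{m}_\theta, \mathbf{S}_\theta) = \frac{1}{2\rho^2}\hat{\mathbf{d}}^H\hat{\mathbf{d}} + \frac{1}{2}\mathrm{tr}(\boldsymbol{\Phi}^{-1}\mathbf{S}_\theta) + \frac{1}{2}\mathbf{m}_\theta^T\boldsymbol{\Phi}^{-1}\mathbf{m}_\theta - \frac{1}{2}\log|\mathbf{S}_\theta| + \frac{1}{2\sigma^2}\Big\{ \hat{\mathbf{d}}^H\mathbf{H}^H\mathbf{F}\,\mathrm{diag}(\mathbf{S}_\theta)\,\mathbf{F}^H\mathbf{H}\hat{\mathbf{d}} + [\mathbf{P}_m\mathbf{F}^H\mathbf{H}\hat{\mathbf{d}} - \mathbf{r}]^H[\mathbf{P}_m\mathbf{F}^H\mathbf{H}\hat{\mathbf{d}} - \mathbf{r}] \Big\}, \tag{3.25} $$

where $\mathbf{P}_m = \mathrm{diag}(e^{j\mathbf{m}_\theta})$. Minimizing this free energy yields the EM II algorithm in Table 3.4.

We omit the formal derivations for EM I and EM II since they are straightforward simplifications to the original variational inference derivation. The fact that the above algorithms are equivalent to EM may be verified through the conventional EM formulation by setting the hidden variables and parameters appropriately. It should be noted that in the EM algorithm the posterior distribution of the hidden variables is not necessarily Gaussian¹. But our Gaussian parameterization for the Q-functions is correct here because $p(\mathbf{d}|\mathbf{r}, \hat{\boldsymbol{\theta}})$ and $p(\boldsymbol{\theta}|\mathbf{r}, \hat{\mathbf{d}})$ are Gaussian distributions.

Following the conventional convergence analysis of the EM algorithm, it can be shown that EM I monotonically increases $p(\hat{\boldsymbol{\theta}}^{(t)}|\mathbf{r})$ with each iteration $t$, while EM II monotonically increases $p(\hat{\mathbf{d}}^{(t)}|\mathbf{r})$. However, the convergence point may not be the global optimum of $p(\boldsymbol{\theta}|\mathbf{r})$ or $p(\mathbf{d}|\mathbf{r})$, due to the potential existence of local maxima.

Table 3.4: Updating Equations for EM II Algorithm.

Initialization:
  Choose initial values for $\hat{\mathbf{d}}^{(0)}$ (e.g. $\hat{\mathbf{d}}^{(0)} = \hat{\mathbf{d}}_{init}$).
Iterations:
  For $t = 1$ to $n$
    Update for $Q(\boldsymbol{\theta})$ (E Step):
      $\mathbf{S}_\theta^{(t)} = \sigma^2\left[\sigma^2\boldsymbol{\Phi}^{-1} + \hat{\mathbf{X}}^{H(t-1)}\hat{\mathbf{X}}^{(t-1)}\right]^{-1}$
      $\mathbf{m}_\theta^{(t)} = \sigma^{-2}\,\mathbf{S}_\theta^{(t)} \cdot \mathrm{Im}\left[\hat{\mathbf{X}}^{H(t-1)}(\mathbf{r} - \mathbf{F}^H\mathbf{H}\hat{\mathbf{d}}^{(t-1)})\right]$
    Update for $Q(\mathbf{d})$ (M Step):
      $\hat{\mathbf{d}}^{(t)} = \left[\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{H}\right]^{-1}\mathbf{H}^H\mathbf{F}\mathbf{P}_m^{H(t)}\mathbf{r}$
  End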
3.4.4 Summary of Variants of Variational Inference
Through the preceding investigation, we have proposed not one, but a spectrum of algorithms, for the joint detection/estimation problem at hand, all under the unified framework of variational inference and all rigorously derived via free energy minimization. Among these, we have covered the EM algorithm as a special case. This framework admitted new insights into the problem and enabled us to look beyond conventional signal processing methods. Table 3.5 provides a summary of the above-mentioned variants of the variational inference algorithm, in which the term “objective function” stands for the function that the corresponding algorithm guarantees to improve after each iteration.

¹ If the true distribution is non-Gaussian, but we assume it to be Gaussian, or some other more convenient distribution, we have a variational EM algorithm [2].
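Table 3.5: Comparison of Variants of Variational Inference Algorithm.

Inference Schemes  | Variational | EM I | EM II | ICM
Hidden Variable    | $\mathbf{d}, \boldsymbol{\theta}$ | $\mathbf{d}$ | $\boldsymbol{\theta}$ | –
Parameter          | – | $\boldsymbol{\theta}$ | $\mathbf{d}$ | $\mathbf{d}, \boldsymbol{\theta}$
$Q(\mathbf{d})$    | $\mathcal{N}(\mathbf{m}_d, \mathbf{S}_d)$ | $\mathcal{N}(\mathbf{m}_d, \mathbf{S}_d)$ | $\delta(\mathbf{d}, \hat{\mathbf{d}})$ | $\delta(\mathbf{d}, \hat{\mathbf{d}})$
$Q(\boldsymbol{\theta})$ | $\mathcal{N}(\mathbf{m}_\theta, \mathbf{S}_\theta)$ | $\delta(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})$ | $\mathcal{N}(\mathbf{m}_\theta, \mathbf{S}_\theta)$ | $\delta(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})$
Objective Function | $F(\mathbf{m}_d, \mathbf{S}_d, \mathbf{m}_\theta, \mathbf{S}_\theta)$ | $p(\boldsymbol{\theta}|\mathbf{r})$ | $p(\mathbf{d}|\mathbf{r})$ | $p(\boldsymbol{\theta}, \mathbf{d}|\mathbf{r})$

3.5 Complexity Analysis and Reduction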
We consider ICM to be the most promising candidate for practical application, since it has the simplest form while retaining almost exactly the same performance as others. However, applying it directly may still substantially increase the complexity of a practical OFDM receiver. Further complexity reduction has to be devised to avoid a full $N \times N$ matrix inversion, which requires a complexity order of $O(N^3)$.

Observing the expression for $\hat{\mathbf{d}}$ in (3.21), we find that the evaluation of $\hat{\mathbf{d}}$ only involves the inversion of a diagonal matrix $\sigma^2\rho^{-2}\mathbf{I} + \mathbf{H}^H\mathbf{H}$ and multiplications by diagonal or DFT matrices. Hence the complexity associated with computing $\hat{\mathbf{d}}$ is $O(N \log N)$.

The computation for $\hat{\boldsymbol{\theta}}$ in (3.22) is more involved and its complexity depends on our assumptions about $\boldsymbol{\Phi}$. We now present two simplifying designs for both the Wiener and Gaussian PHN models. The simplifying techniques used here are similar to those for estimating PHN within the channel estimation stage using the JCPCE algorithm (Chapter 2), suggesting that the same processing unit may be used for both tasks in actual implementation.
3.5.1 Wiener Phase Noise
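Similar to Chapter 2, we make use of the fact that the inverse of Wiener PHN covariance matrix $\boldsymbol{\Phi}$ has a convenient tridiagonal structure [51]. If we let $\boldsymbol{\Psi} = \sigma^{-2}\boldsymbol{\Phi}$, $\boldsymbol{\Psi}^{-1} = \sigma^2\boldsymbol{\Phi}^{-1}$ can be written as:

$$ \boldsymbol{\Psi}^{-1} = \frac{\sigma^2}{\alpha_\phi^2} \begin{bmatrix} 2 & -1 & & & 0 \\ -1 & 2 & \ddots & & \\ & \ddots & \ddots & -1 & \\ & & -1 & 2 & -1 \\ 0 & & & -1 & 1 \end{bmatrix}. \tag{3.26} $$

Let $\mathbf{q} = \mathrm{Im}\left[\hat{\mathbf{X}}^H(\mathbf{r} - \mathbf{F}^H\mathbf{H}\hat{\mathbf{d}})\right]$, where $\mathbf{q}$ can be computed efficiently using FFT since all matrices involved in calculating $\mathbf{q}$ are either diagonal or DFT matrices. The evaluation of $\hat{\boldsymbol{\theta}}$ is now equivalent to solving a linear equation $[\boldsymbol{\Psi}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]\hat{\boldsymbol{\theta}} = \mathbf{q}$. This problem can be easily tackled by the conjugate gradient method. The complete algorithm is presented in Table 3.6.

Table 3.6: Conjugate Gradient Algorithm for Evaluating $\hat{\boldsymbol{\theta}}$.

Initialization:
  $\hat{\boldsymbol{\theta}}_0 = \mathbf{0}$
  $\boldsymbol{\gamma}_0 = [\boldsymbol{\Psi}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]\hat{\boldsymbol{\theta}}_0 - \mathbf{q} = -\mathbf{q}$
  $\boldsymbol{\nu}_0 = -\boldsymbol{\gamma}_0 = \mathbf{q}$
For $k = 0 : i-1$
  $\alpha_k = \boldsymbol{\gamma}_k^H\boldsymbol{\gamma}_k / (\boldsymbol{\nu}_k^H[\boldsymbol{\Psi}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]\boldsymbol{\nu}_k)$
  $\hat{\boldsymbol{\theta}}_{k+1} = \hat{\boldsymbol{\theta}}_k + \alpha_k\boldsymbol{\nu}_k$
  $\boldsymbol{\gamma}_{k+1} = \boldsymbol{\gamma}_k + \alpha_k[\boldsymbol{\Psi}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]\boldsymbol{\nu}_k$
  $\beta_{k+1} = \boldsymbol{\gamma}_{k+1}^H\boldsymbol{\gamma}_{k+1} / (\boldsymbol{\gamma}_k^H\boldsymbol{\gamma}_k)$
  $\boldsymbol{\nu}_{k+1} = -\boldsymbol{\gamma}_{k+1} + \beta_{k+1}\boldsymbol{\nu}_k$
End

The tridiagonal form of $\boldsymbol{\Psi}^{-1}$ helps to reduce the dominant complexity in Table 3.6, $[\boldsymbol{\Psi}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]\boldsymbol{\nu}_k$, to merely $6N$ operations. Thus, the overall complexity of every iteration of the conjugate gradient algorithm is $O(N)$. The conjugate gradient algorithm requires a maximum of $N$ iterations to converge to the exact solution. But simulations in Section 3.6 show that little performance degradation is introduced by setting $i = 8$. In conclusion, for Wiener PHN, the complexity of evaluating $\hat{\boldsymbol{\theta}}$ is $O(iN)$, where $i$ is the number of iterations in the conjugate gradient algorithm.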
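The conjugate gradient recursion of Table 3.6 can be sketched as follows, with the tridiagonal product computed in $O(N)$ without forming the matrix. The names `wiener_psi_inv_matvec` and `cg_theta`, the argument `scale` (standing for $\sigma^2/\alpha_\phi^2$) and the toy inputs are assumptions for illustration; the tridiagonal pattern follows the reconstruction in (3.26).

```python
import numpy as np

def wiener_psi_inv_matvec(v, scale):
    # Product with the tridiagonal matrix in (3.26): diagonal 2 (last entry 1),
    # off-diagonals -1, all multiplied by scale = sigma^2 / alpha_phi^2.
    out = 2.0 * v
    out[-1] = v[-1]
    out[:-1] -= v[1:]
    out[1:] -= v[:-1]
    return scale * out

def cg_theta(q, x_abs2, scale, n_iter):
    # Conjugate gradient iterations for [Psi^{-1} + X^H X] theta = q,
    # where x_abs2 holds the diagonal of X^H X (the values |x_n|^2).
    theta = np.zeros_like(q)
    gamma = -q                      # residual A*theta - q at theta = 0
    nu = q.copy()                   # initial search direction
    for _ in range(n_iter):
        A_nu = wiener_psi_inv_matvec(nu, scale) + x_abs2 * nu
        alpha = (gamma @ gamma) / (nu @ A_nu)
        theta = theta + alpha * nu
        gamma_new = gamma + alpha * A_nu
        beta = (gamma_new @ gamma_new) / (gamma @ gamma)
        nu = -gamma_new + beta * nu
        gamma = gamma_new
    return theta

# Toy example (assumed values), run for the full N iterations.
N = 8
rng = np.random.default_rng(0)
x_abs2 = rng.uniform(0.5, 2.0, N)
q = rng.normal(size=N)
theta_cg = cg_theta(q, x_abs2, scale=0.5, n_iter=N)
```

In a receiver one would stop after a small fixed number of iterations (the text uses $i = 8$) rather than running to $N$.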
3.5.2 Gaussian Phase Noise
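In the case of Gaussian PHN, we notice that $\boldsymbol{\Psi}$, as a Toeplitz matrix, can be approximated by a circulant matrix $\tilde{\boldsymbol{\Psi}}$ [52,57]. According to Theorem 1 in Chapter 2, letting $\boldsymbol{\psi}^T = [\psi_0, \cdots, \psi_{N-1}]$ be the first row of $\boldsymbol{\Psi}$, then the first row of $\tilde{\boldsymbol{\Psi}}$, $\tilde{\boldsymbol{\psi}}^T = [\tilde{\psi}_0, \cdots, \tilde{\psi}_{N-1}]$, may be written as

$$ \tilde{\psi}_i = \frac{(N-i)\psi_i + i\,\psi_{N-i}}{N}. \tag{3.27} $$

Replacing $\boldsymbol{\Psi}$ by $\tilde{\boldsymbol{\Psi}}$, the modified estimator for $\boldsymbol{\theta}$ becomes

$$ \hat{\boldsymbol{\theta}} = [\tilde{\boldsymbol{\Psi}}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]^{-1} \cdot \mathrm{Im}[\hat{\mathbf{X}}^H(\mathbf{r} - \mathbf{F}^H\mathbf{H}\hat{\mathbf{d}})]. \tag{3.28} $$

This problem can be treated similarly to the Wiener PHN case using the conjugate gradient method in Table 3.6 by replacing $\boldsymbol{\Psi}$ with $\tilde{\boldsymbol{\Psi}}$. Specifically, the evaluation of $[\tilde{\boldsymbol{\Psi}}^{-1} + \hat{\mathbf{X}}^H\hat{\mathbf{X}}]\boldsymbol{\nu}_k$ requires $4N + 2N \log N$ operations. Therefore, for Gaussian PHN, the overall computational complexity of $\hat{\boldsymbol{\theta}}$ is $O(iN \log N)$.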
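The circulant approximation (3.27) and the resulting FFT-based product with $\tilde{\boldsymbol{\Psi}}^{-1}$ can be sketched as below. The function names and the example first row $\psi_i = 0.9^i$ are assumptions chosen only for illustration; the $i = 0$ term of (3.27) is taken to carry zero weight on the undefined $\psi_N$.

```python
import numpy as np

def circulant_first_row(psi):
    # First row of the circulant approximation, following (3.27).
    N = len(psi)
    i = np.arange(N)
    # psi_wrapped[i] = psi[N - i] for i >= 1; the i = 0 entry is multiplied by zero.
    psi_wrapped = np.concatenate(([0.0], psi[:0:-1]))
    return ((N - i) * psi + i * psi_wrapped) / N

def apply_circulant_inverse(v, psi_tilde):
    # Multiply v by the inverse of the symmetric circulant matrix with first row
    # psi_tilde; a circulant matrix is diagonalized by the DFT, so this costs two FFTs.
    return np.real(np.fft.ifft(np.fft.fft(v) / np.fft.fft(psi_tilde)))

# Toy example (assumed values): an exponentially decaying correlation sequence.
N = 8
psi = 0.9 ** np.arange(N)
psi_t = circulant_first_row(psi)
v = np.arange(1.0, N + 1)
w = apply_circulant_inverse(v, psi_t)
```

Plugging `apply_circulant_inverse` into the matrix-vector product of the conjugate gradient recursion is what yields the $O(N \log N)$ cost per iteration quoted in the text.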
3.6 Simulations
[Figure 3.1: Structure of the data detection module incorporating PHN mitigation. Block diagram: $\mathbf{r}$ → Preliminary Detection → $\hat{\mathbf{d}}_{init}$ → PHN Estimation & Data Detection → $\hat{\boldsymbol{\theta}}$, $\hat{\mathbf{d}}$.]
To verify the effectiveness of the proposed PHN cancellation schemes, we present a set of simulations as follows. The data detection module is depicted in Fig. 3.1, where the received ˆ init signal first goes through a preliminary detection stage that detects the transmitted data d ignoring PHN distortion. Such a decision is inaccurate, but is necessary to initialize the next ˆ init is used as m(0) or d ˆ (0) . stage, which is the focus of this chapter. In the proposed schemes, d d We also simulate the conventional algorithm (i.e., GPNS in Section 3.3.1) for comparison. It is implemented differently from the original paper [11] since no pilot symbols are allocated in our 2 . Instead, simulation setting for estimating the common PHN U0 and ICI-plus-noise power 2σtot
3.6. SIMULATIONS
62
2 from the known channel response and we assume perfect knowledge for U0 and calculate 2σtot
phase noise statistics. In effect, this results in the best-achievable performance of GPNS. The following system parameters are assumed in our simulations: 1. A Rayleigh multipath fading channel with a delay of L = 10 taps and an exponentially decreasing power delay profile that has a decay constant of 4 taps; 2. An OFDM symbol size of N = 64 subcarriers with each subcarrier modulated in 64-QAM format; 3. Baseband sampling rate fs = 20 MHz (subcarrier spacing of 312.5 KHz); 4. The Wiener PHN is generated as a random-walk process with incremental PHN of αφ = 0.5 deg. The covariance matrix Φ is as depicted in (2.3); 5. The Gaussian PHN has a standard deviation of θrms = 3 deg (i.e., Rθ (0) = (πθrms /180)2 ). It is generated, according to the Matlab code recommended for the IEEE 802.11g standard [53], as i.i.d. Gaussian samples passed through a single pole Butterworth filter of 3dB bandwidth Ωo = 100 KHz. Fig. 3.2 compares the actual PHN profile with the PHN profile estimated using the variational inference algorithm (Table 3.1 with n = 3 iterations) at 30dB SNR (Eb /No ). The average PHN is also plotted. Through variational inference, we have very accurately estimated the phase noise profile, resulting in a much improved BER performance, as will be shown in Fig. 3.3 and Fig. 3.4. Note that the PHN estimated via variational inference is a distribution rather than a fixed value. Thus in addition to plotting the mean mθ , we also indicate one standard deviation around the mean, extracted from the diagonal elements of Sθ . The standard deviation quantitatively predicts the reliability of our estimates. Such an accuracy measure is also available in md , but is not shown here graphically. In Fig. 3.3, we demonstrate the performance of the proposed joint detector/estimator compared to the conventional method. 
The dotted line indicates the bit-error-rate (BER) of an OFDM receiver free of PHN (the ideal scenario), and the solid line indicates the BER of an OFDM receiver with PHN but without PHN mitigation (the worst case scenario). It should be noted that the system without PHN mitigation does not fail only because of the simulation assumption of good phase synchronization at the beginning of each OFDM symbol. Furthermore, we simulate at relatively high SNR ($E_b/N_o$) values because we are implementing an uncoded higher-order modulation system with diversity order one. In between the ideal scenario and the worst case scenario are the BER performance of receivers implementing the conventional PHN cancellation method (triangles) and the proposed schemes (crosses and circles). We plot the performance of the proposed algorithms after 1, 2 and 4 iterations (the BER does not improve significantly with more iterations). Both the variational and ICM algorithms significantly outperform the conventional one, even though we assume the conventional scheme has perfect knowledge of $U_0$. The curves for EM I and EM II are not shown here since they overlap with the curves in the plot, implying identical performance. The superiority of variational inference over ICM is not evident here since we only consider an uncoded system, where the extra reliability information on d is not fully utilized.

[Figure 3.2: An instance of PHN sequence estimated using the conventional method and variational inference. (a) Wiener PHN, (b) Gaussian PHN. Each panel plots PHN in degrees against sample index (0 to 60) for the actual PHN, the average PHN, and the estimated PHN $m_\theta$ (variational inference).]

[Figure 3.3: BER performance comparison between the conventional method and proposed detection schemes. (a) Wiener PHN, (b) Gaussian PHN. Each panel plots BER against SNR (20 to 36) for no PHN cancellation, perfect PHN cancellation, the conventional scheme, and the variational and ICM schemes after 1, 2 and 4 iterations.]

In Fig. 3.4 we study the performance of the low complexity simplified ICM technique by evaluating $\hat{\theta}$ through i = 8 conjugate gradient iterations as prescribed in Table 3.6. Again, we plot the performance after 1, 2 and 4 ICM iterations as in Table 3.2. Compared to Fig. 3.3, it is seen that the OFDM receiver performs almost equally well, demonstrating that the simplified ICM scheme can be implemented efficiently in a practical receiver.
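The two PHN models listed in the simulation parameters can be sketched as follows. This is only an illustrative sketch under stated assumptions: $\alpha_\phi$ is taken to be the standard deviation of the per-sample random-walk increment, and the Gaussian PHN uses a generic one-pole recursive low-pass filter rather than the IEEE 802.11g Matlab code cited in [53], which is not reproduced here. The function names `wiener_phn` and `gaussian_phn` are hypothetical.

```python
import numpy as np

def wiener_phn(n, alpha_deg, rng):
    """Random-walk PHN in radians; alpha_deg is assumed to be the per-sample increment std in degrees."""
    return np.cumsum(np.deg2rad(alpha_deg) * rng.standard_normal(n))

def gaussian_phn(n, rms_deg, f3db, fs, rng):
    """Stationary low-pass Gaussian PHN in radians with standard deviation rms_deg (degrees)."""
    a = np.exp(-2.0 * np.pi * f3db / fs)          # pole of a one-pole low-pass filter (assumed form)
    drive = np.sqrt(1.0 - a**2) * rng.standard_normal(n)
    theta = np.empty(n)
    theta[0] = rng.standard_normal()              # start in the stationary (unit-variance) distribution
    for i in range(1, n):
        theta[i] = a * theta[i - 1] + drive[i]    # unit-variance AR(1) process
    return np.deg2rad(rms_deg) * theta

rng = np.random.default_rng(0)
theta_w = wiener_phn(64, 0.5, rng)                # one OFDM symbol, N = 64 samples
theta_g = gaussian_phn(64, 3.0, 100e3, 20e6, rng)
```

The driving noise of the second model is scaled so that the filtered process has unit variance before the final scaling to $\theta_{rms}$.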
3.7 Summary
This chapter, together with Chapter 2, presented a complete physical layer design strategy for OFDM receivers in the presence of both CFO and PHN. Assuming the channel response and CFO have been accurately estimated adopting the methodology in Chapter 2, here we put forward a novel and low-complexity OFDM detection scheme based on variational inference techniques that combats PHN impairment. It is in essence an algorithm that updates the estimates for data and PHN iteratively. We have also demonstrated that each step of the iteration can be performed efficiently using FFT-based computations, leading to straightforward implementation in a practical OFDM receiver.
[Figure 3.4: BER performance of the low complexity detection scheme. (a) Wiener PHN, (b) Gaussian PHN. Each panel plots BER against SNR (20 to 36) for no PHN cancellation, perfect PHN cancellation, the conventional scheme, and CG-based ICM after 1, 2 and 4 iterations.]
Part II
Soft-In Soft-Out Detection in Multiple Access Channels
Chapter 4
A Variational Inference Framework for Soft-In-Soft-Out Detection

In this chapter, we propose a unified framework for deriving and studying soft-in-soft-out (SISO) detection in multiple-access channels using the concept of variational inference. The proposed framework may also be extended for SISO equalization in inter-symbol interference (ISI), and multiple-input multiple-output (MIMO) channels. Without loss of generality, we will focus our attention on turbo multiuser detection, to facilitate a more concrete discussion. It is shown that variational inference avoids the exponential complexity of maximum a posteriori probability (MAP) detection by optimizing the variational free energy. In addition to its systematic appeal, there are several other advantages to this viewpoint. First of all, it provides rigorous justifications for numerous detectors that were proposed on radically different grounds, and facilitates convenient joint detection and decoding (utilizing the turbo principle) when error-control codes are incorporated. Secondly, efficient joint parameter estimation and data detection is possible via the variational expectation maximization (EM) algorithm, such that the detrimental effect of inaccurate channel knowledge at the receiver may be dealt with systematically. We are also able to extend BPSK-based SISO detection schemes to arbitrary square QAM constellations in a rigorous manner using a variational argument, which will be shown in the next chapter.
4.1 Introduction
At the centre of physical layer wireless communications are the challenges of data detection and error control code (ECC) decoding. These two tasks were traditionally considered separately: the data detection stage first estimates the channel symbols; then, after symbol-bit demapping, the ECC decoder processes the channel bit decisions to further reduce error probability. Utilizing the turbo principle [21], the seminal work of [58, 59], among others, introduced a new philosophy for receiver design, in which the detector and decoder exchange soft information in an iterative manner, resulting in dramatically improved bit-error-rate (BER) performance without any substantial complexity increase. Such a turbo receiver structure, depending on the areas of application, is called turbo multiuser detector ([59, 60]), turbo equalizer ([58, 61]), or turbo MIMO (multiple-input multiple-output) equalizer ([62, 63]). A key component in the turbo receiver is a practical soft-in soft-out (SISO) detector/equalizer, which has to be able to receive and generate soft estimates with low computational overhead, whereas the optimal SISO detector, the a posteriori probability (APP) detector, has exponential complexity. The works by Wang and Poor [59], and Tüchler, Singer and Koetter [61] successfully addressed these requirements by using the simple minimum mean-squared error (MMSE) principle in their detector/equalizer designs. Nevertheless, viable solutions are not limited to the MMSE type. For instance, [60] and [64] proposed powerful turbo multiuser detectors using, respectively, parallel and successive interference cancellation schemes as the detector component. In this chapter, we intend to propose a generalized method for the design of a SISO MUD, adopting variational inference as the design framework. We will see that this approach not only successfully includes some important existing SISO MUD schemes as special cases, but easily leads to various improvements and extensions.
Although our study focuses on SISO MUD by treating it as an approximate inference engine, it also encompasses uncoded MUD (detectors with no prior information and only hard decision output), since uncoded MUD can be viewed as SISO MUD with uniform prior distributions for the channel symbols.

Prior to this work, recent attempts at providing a unified approach to study the wide range of multiuser detectors include, to name a few, [65], [66] and [67]. Boutros and Caire [65] generalize iterative multiuser joint decoding as an approximate sum-product algorithm in a factor graph containing both the multiuser channel and code constraints. Such a generalization leads to elegant performance analysis through density evolution. Tanaka [66] and Guo and Verdú [67] view the uncoded linear and optimal multiuser detectors as posterior mean estimators of the Bayes retrochannel such that, in the large system limit, the bit error rate (BER) may be evaluated through techniques from statistical physics. This chapter may be regarded as an extension of [66] and [67] into the realm of non-linear (and iterative) detectors. Specifically, we show that such detectors arise from approximating the posterior distributions and iteratively optimizing the approximate distributions, and address the design challenges of the MUD component within the iterative multiuser joint decoding problem, highlighted in [65].

The implications of this new generalized framework are significant in at least three ways:

1. Theoretical Justification for Existing Multiuser Detectors: Section 4.4 introduces the variational inference formulation for MUD, in which a quantity known as variational free energy is constructed and minimized, generating a procedure termed variational free energy minimization (VFEM). From this perspective, we will show how various uncoded linear multiuser detectors (e.g., decorrelating and MMSE detectors), as well as their interference cancellation extensions (e.g., unconstrained or clipped successive interference cancellation (SIC) detectors) may be derived. We will further argue that the VFEM approach naturally produces SISO multiuser detectors that can be used in turbo MUD. In particular, we will examine the celebrated algorithms proposed in [60] and [59], to reveal that they can both be derived with the VFEM approach.

2. Channel Parameter Joint Estimation Using Variational EM Algorithm: Section 4.5 considers the scenario where certain channel parameters are unknown or inaccurately estimated at the multiuser receiver, motivating the joint estimation of channel parameters together with unknown data symbols. The VFEM framework offers a natural solution to this problem. By iteratively minimizing the free energy over both the data symbols and the channel parameters, we arrive at the variational EM algorithm [55]. This is a generalized EM algorithm with exact inference in the E step replaced by variational inference. As examples of this parameter estimation mechanism, we will demonstrate how the unknown channel noise variance may be iteratively estimated, and inaccurate channel amplitude refined, in conjunction with turbo MUD.

3. Generalization of BPSK MUD to Square QAM Modulation: In bandwidth-constrained channels, extensions of the SISO multiuser detectors from BPSK modulation to square QAM modulation may also be carried out within the VFEM framework. These extensions are not ad hoc, but optimal in the sense that the variational free energy modified for M-QAM modulation is minimized. Such a scheme gives rise to an iterative detection technique for general linear Gaussian channels, called Bit-Level Equalization and Soft Detection (BLESD). A detailed discussion of BLESD will be presented in Chapter 5.

The rest of the chapter is organized as follows: Section 4.2 describes the multiple access channel model and formulates the optimal SISO multiuser detectors; Section 4.3 discusses the decoding/detection scheduling issue by studying the factor graph containing both the multiuser channel and code constraints. This will prove to be an important design parameter in the subsequent analysis of variational-inference-based detectors. Sections 4.4 and 4.5 contain the introduction and application examples of the proposed variational inference framework for MUD, and in two directions (the first two points summarized above) justify the merits of this new point of view; Section 4.6 presents some simulation results, and Section 4.7 provides a summary.
4.2 System Description

4.2.1 Signal Model for BPSK Modulation
Consider a synchronous DS-CDMA wireless link with K users. Assuming flat fading, by sampling the chip matched filter output at chip rate, the received signal in one symbol interval, $r \in \mathbb{R}^{N \times 1}$, can be written in the well-known vector form:

$$ r = SAb + n, \tag{4.1} $$

where $S = [s_1, s_2, \cdots, s_K]$ is the spreading code matrix containing the normalized spreading sequences of the K active users, $A = \mathrm{diag}(A_1, A_2, \cdots, A_K)$ is the channel matrix representing each user's signal amplitude and $b = [b_1, b_2, \cdots, b_K]^T$ contains the transmitted BPSK channel symbols from each user. n is a white Gaussian noise vector with distribution $p(n) = \mathcal{N}(0, \sigma^2 I)$.

After bit-level matched filtering at the receiver, we may write the matched filter output, $y \in \mathbb{R}^{K \times 1}$, as:

$$ y = S^T r = RAb + z, \tag{4.2} $$

where $R = S^T S$ is the symmetric normalized signature correlation matrix with unit diagonal elements, and z is a coloured Gaussian noise vector with distribution $p(z) = \mathcal{N}(0, \sigma^2 R)$.

The correlated noise statistics in y may be whitened by applying a noise whitening filter $F^{-T}$, yielding

$$ \bar{y} = F^{-T} y = FAb + \bar{n}, \tag{4.3} $$

where F is a lower triangular matrix (i.e., $F_{ij} = 0$ for $i < j$) resulting from the Cholesky factorization for R, $R = F^T F$. $\bar{n}$ is a white Gaussian noise vector, having the same distribution as n.

As y and $\bar{y}$ are sufficient statistics for detecting b, equations (4.1), (4.2) and (4.3) are equivalent starting points for the derivation of multiuser detectors, although certain computational savings are easier to identify with certain models. Note that the channel model for frequency selective and asynchronous channels takes a similar linear form as (4.1). Thus the adaptation to these more general channel types is possible, but will not be discussed explicitly here. Interested readers may refer to, e.g., [59], for further insights.
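The three equivalent models (4.1) to (4.3) can be illustrated with a minimal NumPy sketch. The dimensions, the random binary spreading codes, the amplitudes and the noise level are illustrative assumptions, not values taken from this chapter. One detail worth noting: `np.linalg.cholesky` returns a lower triangular L with $M = LL^T$, whereas the text uses $R = F^T F$ with F lower triangular, so the sketch factors the index-reversed matrix and flips the result back.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, sigma = 4, 16, 0.1                         # users, chips per symbol, noise std (illustrative)

# Normalized random binary spreading sequences: each column s_k has unit norm.
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
A = np.diag(rng.uniform(0.5, 1.5, size=K))       # per-user amplitudes
b = rng.choice([-1.0, 1.0], size=K)              # BPSK symbols

r = S @ A @ b + sigma * rng.standard_normal(N)   # (4.1): chip-rate received vector
y = S.T @ r                                      # (4.2): matched filter output
R = S.T @ S                                      # signature correlation matrix

# Obtain lower triangular F with R = F^T F by factoring the index-reversed R.
L = np.linalg.cholesky(R[::-1, ::-1])
F = L.T[::-1, ::-1]

y_bar = np.linalg.solve(F.T, y)                  # (4.3): whitened output F^{-T} y
```

In the noise-free case `y_bar` coincides with `F @ A @ b`, which is a quick way to check the factorization convention.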
4.2.2 Optimal SISO Detectors
Given the prior distribution p(b) and the conditional distribution p(r|b), the jointly optimal detector uses Bayes rule to compute

$$ p(b|r) = \frac{p(r|b)\,p(b)}{\sum_{b} p(r|b)\,p(b)}. \tag{4.4} $$

The posterior distribution p(b|r) is the "soft output" of the jointly optimal detector; hard decisions are obtained by maximizing over all possible symbol vectors b. Similarly, the individually optimal detector is obtained by evaluating the marginal posterior distribution of $b_k$ (k = 1 to K):

$$ p(b_k|r) = \frac{p(r|b_k)\,p(b_k)}{\sum_{b_k} p(r|b_k)\,p(b_k)}, \tag{4.5} $$

where $p(r|b_k)\,p(b_k) = \sum_{b \setminus b_k} p(r|b)\,p(b)$. Due to the discrete nature of the information symbols, both jointly optimal and individually optimal detectors require prohibitive exponential complexity. The individually optimal detector is the optimal SISO multiuser detector in terms of minimizing bit error rate (BER).

Practical suboptimal SISO multiuser detectors may be derived by taking in the prior information $p(b_k)$ and producing a posterior probability $p(b_k|r)$ or $p(b_k|y)$ through some intelligent approximation which does not have exponential complexity. As we have seen in previous chapters, variational inference is one example of these "intelligent approximations", where the outcome, $Q(b_k)$, which approximates $p(b_k|r)$, is found by optimizing an underlying cost function called variational free energy. Starting from Section 4.4, we will show how to use variational inference to derive suboptimal multiuser detectors.
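To make the exponential cost of (4.4) and (4.5) concrete, the following sketch evaluates both posteriors by enumerating all $2^K$ BPSK vectors. It is a brute-force illustration only, assuming independent priors and the white-noise model (4.1); the function name `optimal_posteriors` and its argument layout are hypothetical.

```python
import itertools
import numpy as np

def optimal_posteriors(r, S, A, sigma, prior_plus):
    """Return all 2^K symbol vectors, p(b|r) for each, and the marginals p(b_k = +1 | r).

    prior_plus[k] is p(b_k = +1), strictly between 0 and 1; priors are assumed independent.
    """
    prior_plus = np.asarray(prior_plus, dtype=float)
    K = S.shape[1]
    B = np.array(list(itertools.product([-1.0, 1.0], repeat=K)))      # 2^K x K
    resid = r[None, :] - B @ A @ S.T                                  # each row is r - S A b
    log_lik = -np.sum(resid**2, axis=1) / (2 * sigma**2)              # log p(r|b) up to a constant
    log_prior = np.sum(np.where(B > 0, np.log(prior_plus), np.log1p(-prior_plus)), axis=1)
    log_post = log_lik + log_prior
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                                                # normalization in (4.4)
    marg_plus = (post[:, None] * (B > 0)).sum(axis=0)                 # (4.5) evaluated at b_k = +1
    return B, post, marg_plus
```

The table `B` has $2^K$ rows, so memory and run time double with every additional user, which is the complexity barrier the approximations in the rest of the chapter aim to avoid.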
4.3 Message-Passing Scheduling in Turbo Multiuser Detection
In a turbo multiuser detector, the detector section needs to be able to accept prior estimates $\{p(b_k)\}_{k=1}^{K}$ from the APP decoder and generate a soft decision, called extrinsic information (EXT), to be sent back to the APP decoder. Such a mechanism for EXT exchange can be rigorously justified as the message passing algorithm in graphs [65, 68]. However, since any practical multiuser detector is at best an approximation to the exact sum-product algorithm (because exact inference, with the individually optimal detector, is NP-complete), good methods to generate and pass EXT are not unique. In addition, the factor graph describing the statistical dependencies among all unknowns (conditioned on the observations) contains cycles, and hence several message passing schedules are valid. In this section we describe the sequential, flooding and hybrid schedules, and show that the Wang-Poor algorithm corresponds to a hybrid scheduling, while the flooding schedule is novel. The sequential schedule takes K times as long as the flooding schedule, but may result in fewer iterations to achieve a given level of performance.

From Fig. 4.1, it is seen that the nodes representing the channel bits $\{b_{t,k}\}_{k=1}^{K}$ are the relay nodes that separate the graph into two halves, where on one side the decoder runs belief propagation to perform per-user APP decoding and on the other side the multiuser detector performs variational inference. The process by which the APP decoder retrieves prior information and generates extrinsic information is standard (see [69]) and will be skipped. We will therefore only discuss message passing between the detector and decoder.
4.3.1 Obtaining Extrinsic Information: Sequential Schedule
When a SISO detector is viewed as an approximate sum-product algorithm [65], the EXT may be obtained in a way analogous to the message-passing rule in graphs. Fig. 4.2 provides an example that demonstrates that the EXT for $b_1$ may be generated using the priors of $b_2$, $b_3$ and $b_4$, but not the prior of $b_1$. In its exact form the message (EXT) from node f to node $b_1$ is

$$ M_{f \to b_1} = \sum_{b_2, b_3, b_4} p(r|b)\,p(b_2)\,p(b_3)\,p(b_4) = p(r|b_1). \tag{4.6} $$

[Figure 4.1: Graphical model of a coded multiuser channel. Note the time dependency among bits of the same user (code constraint), and the user dependency among bits at the same time (channel constraint).]
In sequential scheduling, $M_{f \to b_1}$ will be passed into the APP decoder for user 1, which will generate a new prior for $b_1$ that will be used for EXT generation for $b_2$, and so on. So error control decoding is performed one user at a time, and not in parallel.

In an approximate evaluation of EXT for $b_k$ that follows the same vein, one would ignore the prior of $b_k$ even if it is available from a previous iteration, and use a simple multi-user detector such as linear MMSE to generate an estimated $p(r|b_k)$ using only $\{p(b_l)\}_{l \neq k}$. Thus in the sequential schedule,

• the EXT for each bit is obtained using different inputs (prior distributions), necessitating a substantially different EXT generator (multiuser detector) for each bit; and

• the prior knowledge of $b_k$ is ignored before detection in generating the EXT for $b_k$.

The sequential schedule to obtain extrinsic information is intuitive, since it resembles the message-passing protocol defined in the sum-product algorithm [43, ch. 4]. But it is also very restrictive, in that users have to be detected in series, introducing latency in the detection process. Furthermore, since a different joint detector must be devised for each user, the overall complexity in general increases linearly with K if no simplification measures are taken.

[Figure 4.2: An instance of sequential message-passing in the graphical model: the multiuser detector receives prior distributions of $b_2$, $b_3$ and $b_4$ to generate the extrinsic information for $b_1$. This process is repeated for $b_2$, $b_3$ and $b_4$ to complete one message-passing iteration.]
4.3.2 Obtaining Extrinsic Information: Flooding Schedule
In the flooding schedule, illustrated in Fig. 4.3, EXT's for all bits are generated in parallel. The message from node f to $b_k$ will be

$$ M_{f \to b_k} = \sum_{\{b_l\}_{l \neq k}} p(r|b) \prod_{l=1,\, l \neq k}^{K} p(b_l) \propto \frac{p(b_k|r)}{p(b_k)}. \tag{4.7} $$

Note that, unlike in sequential scheduling, all EXT's use the same priors. For instance, $M_{f \to b_2}$ and $M_{f \to b_4}$ both use $p(b_3)$ whereas in the sequential schedule, $M_{f \to b_4}$ would use $p_{new}(b_3)$ from the most recent round of APP decoding. As well, we can write the EXT of $b_k$ as

$$ M_{f \to b_k} = \frac{1}{p(b_k)} \sum_{\{b_l\}_{l \neq k}} p(r|b) \prod_{l=1}^{K} p(b_l) \tag{4.8} $$

and hence view the flooding schedule as making use of all prior probabilities from the same iteration. This reasoning, together with (4.7), leads to the following sub-optimal approximation:

• Use all prior probabilities from the same iteration to generate an approximate $p(b_k|r)$, say $Q(b_k)$;

• Form the EXT for $b_k$ by dividing $Q(b_k)$ by $p(b_k)$;

• Send all EXT's to the K APP decoders in parallel.

[Figure 4.3: An instance of flooding message-passing in the graphical model: the multiuser detector receives prior distributions of $b_1, \cdots, b_4$ to generate the extrinsic information for $b_1, \cdots, b_4$. This completes one message-passing iteration.]
The advantage of the flooding schedule is twofold: 1) by passing messages to the detector in one shot, the latency is low; 2) by generating the extrinsic information in one shot, the complexity of the detector is reduced. By implementing the flooding schedule, our MUD design challenge is shifted from approximating $p(r|b_k)$ to approximating $p(b_k|r)$, and the variational inference viewpoint of MUD allows us to do so easily.
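The exact flooding message of (4.7) and (4.8) can be sketched by enumeration, which also makes the "divide the posterior by the prior" step explicit. This is a brute-force illustration for small K, not the low-complexity approximation the chapter develops; the function name `flooding_ext` is hypothetical and the priors are assumed to lie strictly between 0 and 1.

```python
import itertools
import numpy as np

def flooding_ext(r, S, A, sigma, prior_plus):
    """Exact extrinsic messages M_{f->b_k}(b_k) of (4.7), normalized per user.

    Returns a K x 2 array; column 0 corresponds to b_k = -1, column 1 to b_k = +1.
    """
    prior_plus = np.asarray(prior_plus, dtype=float)
    K = S.shape[1]
    B = np.array(list(itertools.product([-1.0, 1.0], repeat=K)))       # all 2^K symbol vectors
    lik = np.exp(-np.sum((r - B @ A @ S.T) ** 2, axis=1) / (2 * sigma**2))
    pri = np.where(B > 0, prior_plus, 1.0 - prior_plus)                 # p(b_l) for every entry of B
    joint = lik * pri.prod(axis=1)                                      # p(r|b) p(b) up to a constant
    ext = np.zeros((K, 2))
    for k in range(K):
        for j, val in enumerate((-1.0, 1.0)):
            mask = B[:, k] == val
            # (4.8): marginalize the joint over b_l, l != k, then divide out p(b_k).
            ext[k, j] = joint[mask].sum() / pri[mask, k][0]
    return ext / ext.sum(axis=1, keepdims=True)
```

Because the user's own prior is divided out, changing `prior_plus[k]` leaves row k of the result unchanged, which is the defining property of extrinsic information.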
4.3.3 Obtaining Extrinsic Information: Hybrid Schedule
A hybrid schedule can be defined in which the EXT for $b_k$ is computed without using $p(b_k)$, as in the sequential schedule, and all EXT's are computed in parallel, as in the flooding schedule. This approach removes the latency issue in sequential scheduling, and has been used in the literature without justification.

If exact inference is used to compute $p(r|b_k)$ in the hybrid schedule, and $p(b_k|r)$ in the flooding schedule, the two implementations are identical, since the messages coming out of the MUD section are the same, namely $\{p(r|b_k)\}_{k=1}^{K}$. However, in practical detector design, $p(r|b_k)$ or $p(b_k|r)$ must be approximated. As will be demonstrated in Section 4.4.4, $p(b_k|r)$ may be approximated as $Q(b_k)$ given prior distributions $\{p(b_l)\}_{l=1}^{K}$, while $p(r|b_k)$ may be approximated as $Q(b_k)$ given $\{p(b_l)\}_{l \neq k}$ and non-informative $p(b_k)$. With these approximations, the hybrid and flooding scheduling schemes differ, as the former becomes the Wang-Poor turbo detector [59] and the latter turns into a brand-new design.
4.4 Multiuser Detection via Variational Inference
In [67], Guo and Verdú treat the linear multiuser detectors as posterior mean estimators (PME) with appropriately postulated distributions p(b) and p(r|b). For example, if a Gaussian prior is assumed, i.e., $p(b) = \mathcal{N}(0, I)$, and the channel is modelled as $p(r|b) = \mathcal{N}(SAb, \alpha^2 I)$, the posterior (or conditional) mean estimator, i.e., $E[b|r]$, is a generalized linear detector given by

$$ \hat{b} = \left( A^T S^T S A + \alpha^2 I \right)^{-1} A^T S^T r. \tag{4.9} $$

By choosing different values for $\alpha$, we arrive at different linear detectors. If $\alpha^2 = \sigma^2$, we get the MMSE detector. If $\alpha \to 0$, we approach the decorrelating detector. And if $\alpha \to \infty$, the matched filter output is attained.
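The role of $\alpha$ in (4.9) can be checked numerically with a short sketch. The function name `generalized_linear_detector` is hypothetical, and the matched filter limit is understood up to the positive scale factor $\alpha^2$, which does not affect hard decisions.

```python
import numpy as np

def generalized_linear_detector(r, S, A, alpha2):
    """Posterior mean estimator of (4.9) for a postulated noise variance alpha2 = alpha^2."""
    H = S @ A
    K = H.shape[1]
    return np.linalg.solve(H.T @ H + alpha2 * np.eye(K), H.T @ r)

# Illustrative use with arbitrary dimensions and amplitudes (assumptions, not thesis values).
rng = np.random.default_rng(3)
S = rng.standard_normal((8, 3))
S /= np.linalg.norm(S, axis=0)                              # unit-norm spreading sequences
A = np.diag([1.0, 0.7, 1.3])
sigma2 = 0.05
r = S @ A @ np.array([1.0, -1.0, 1.0]) + np.sqrt(sigma2) * rng.standard_normal(8)

b_mmse = generalized_linear_detector(r, S, A, sigma2)       # alpha^2 = sigma^2: MMSE detector
b_decor = generalized_linear_detector(r, S, A, 1e-12)       # alpha -> 0: decorrelating detector
b_mf = 1e8 * generalized_linear_detector(r, S, A, 1e8)      # alpha -> infinity: scaled matched filter
```

For large $\alpha^2$ the inverse in (4.9) tends to $I/\alpha^2$, so `b_mf` approaches $A^T S^T r$.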
However, this model has its limitations in that it excludes all nonlinear detectors, as p(b) can only be modelled as a continuous distribution for the evaluation of the posterior p(b|r) to be tractable¹, which is obviously unsatisfactory since b by nature takes on values in a discrete alphabet. In this work, we wish to extend the coverage of the posterior mean estimator by introducing an additional degree of freedom in approximating the posterior distribution. More specifically, we will not limit ourselves to applying Bayes rule to calculate the posterior, but instead use the more general and flexible variational inference technique.
4.4.1 Variational Inference and Variational Free Energy Minimization
As stated earlier, the general task of the SISO multiuser detector is to perform inference on b given the observation r, y or $\bar{y}$ (we will simply use r for now, as it is understood that they are equivalent). Suppose our objective is the jointly optimal detector, then the distribution of interest is² p(b|r). Very often, however, the direct evaluation of p(b|r) is computationally intractable when Bayes rule is applied directly, in particular, when p(b) is a discrete distribution. In such a case, the variational inference technique assumes a tractable approximation to p(b|r), written as Q(b), where the constant r is omitted for convenience. According to Chapter 1, we may formulate the variational free energy as:

$$ \mathcal{F}(\lambda) = \int_{b} Q(b) \log \frac{Q(b)}{p(b, r)} \, db, \tag{4.10} $$

which equals $D[Q(b) \| p(b|r)]$ up to an additive constant and where $\lambda$ contains parameters that specify Q(b) for K users' symbols. The general procedure of VFEM contains three steps:

1. Postulation: Assume postulated distributions for p(b), p(r|b) and Q(b);
2. Evaluation: Derive closed-form expression for $\mathcal{F}(\lambda)$;
3. Optimization: Minimize $\mathcal{F}(\lambda)$ (exactly or iteratively) over $\lambda$.
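The additive constant relating the free energy in (4.10) to the divergence can be made explicit with a one-line worked step. It uses only the factorization $p(b, r) = p(b|r)\,p(r)$ and the fact that Q(b) integrates to one:

```latex
\begin{align*}
\mathcal{F}(\lambda)
  &= \int_{b} Q(b) \log \frac{Q(b)}{p(b|r)\,p(r)} \, db \\
  &= \int_{b} Q(b) \log \frac{Q(b)}{p(b|r)} \, db \;-\; \log p(r) \int_{b} Q(b) \, db \\
  &= D\left[ Q(b) \,\|\, p(b|r) \right] - \log p(r).
\end{align*}
```

Since $\log p(r)$ does not depend on $\lambda$, minimizing $\mathcal{F}(\lambda)$ is equivalent to minimizing the divergence between Q(b) and the exact posterior.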
4.4.2 VFEM Interpretation of Linear Multiuser Detectors
We shall begin by deriving linear multiuser detectors from variational free energy minimization, and thus show that simply adjusting the postulated distributions p(b), p(r|b) and Q(b) leads to the well-known decorrelating and MMSE detectors. Although the exercises presented here are somewhat trivial, since uncoded linear MUD is the simplest instance of MUD, they lay the foundation for more sophisticated variations in later sections.

¹ We loosely define tractability as computations that can be done with polynomial complexity.
² Strictly speaking, the individually optimal detector minimizes the BER. But since the difference is minimal, we may consider the jointly optimal detector for simplicity.
Proposition 2 Decorrelating Detectors may be derived through the VFEM routine by assuming the following distributions:

$$ p(b) = \text{Constant}, \quad p(r|b) = \mathcal{N}(SAb, \sigma^2 I), \quad Q(b) = \mathcal{N}(\mu, \Sigma). \tag{4.11} $$

Proof: Evaluating $\mathcal{F}(\lambda)$ as in (4.10), we have a function of $\mu$ and $\Sigma$:

$$ \mathcal{F}(\mu, \Sigma) = -\frac{1}{2} \log |\Sigma| + \frac{1}{2\sigma^2} \left\{ \mu^T A^T S^T S A \mu + \mathrm{tr}[(A^T S^T S A)\Sigma] - 2 r^T S A \mu \right\}. \tag{4.12} $$

The final estimate of Q(b) is given by the minimizers $\hat{\mu}$ and $\hat{\Sigma}$ of $\mathcal{F}(\mu, \Sigma)$. Calculating $\partial \mathcal{F}/\partial \mu$ and $\partial \mathcal{F}/\partial \Sigma^{-1}$ and equating to zero, we have

$$ \hat{\mu} = (A^T S^T S A)^{-1} A^T S^T r, \qquad \hat{\Sigma} = \sigma^2 (A^T S^T S A)^{-1}. \tag{4.13} $$

If hard decisions are desired, $\hat{\mu}$ can be used as the detector output, since it maximizes Q(b), which is Gaussian. It is easy to recognize that $\hat{\mu}$ is identical to the decorrelating detector output.

Note that given the postulated priors in (4.11), the exact posterior p(b|r) is tractable and is in fact Gaussian. Therefore, the solved Q function, $\mathcal{N}(\hat{\mu}, \hat{\Sigma})$, is the exact posterior distribution which could also have been found by applying Bayes rule directly.
The decorrelating detector uses non-informative priors for the data bits transmitted, by setting p(b) to a constant. But in practice, side information is available. For instance, $\{b_k\}_{k=1}^{K}$ can be safely assumed to be i.i.d. and zero mean. For BPSK signaling, in particular, we also know that $E(b_k^2) = 1$. We will subsequently show that the Gaussian approximation about p(b), utilizing the first and second order statistics of b, gives rise to the familiar MMSE detector.
Proposition 3 MMSE Multiuser Detectors may be derived through the VFEM routine by assuming the following distributions:

$$ p(b) = \mathcal{N}(0, I), \quad p(r|b) = \mathcal{N}(SAb, \sigma^2 I), \quad Q(b) = \mathcal{N}(\mu, \Sigma). \tag{4.14} $$

Proof: Evaluating $\mathcal{F}(\lambda)$ yields a function of $\mu$ and $\Sigma$:

$$ \mathcal{F}(\mu, \Sigma) = -\frac{1}{2} \log |\Sigma| + \frac{1}{2\sigma^2} \left\{ \mu^T (A^T S^T S A + \sigma^2 I) \mu + \mathrm{tr}[(A^T S^T S A + \sigma^2 I)\Sigma] - 2 r^T S A \mu \right\}. \tag{4.15} $$

Solving $\partial \mathcal{F}/\partial \mu = 0$ and $\partial \mathcal{F}/\partial \Sigma^{-1} = 0$ leads to the following solution:

$$ \hat{\mu} = (A^T S^T S A + \sigma^2 I)^{-1} A^T S^T r, \qquad \hat{\Sigma} = \sigma^2 (A^T S^T S A + \sigma^2 I)^{-1}. \tag{4.16} $$

Apparently, $\hat{\mu}$ in (4.16) can be identified as the MMSE detector output.

Note that the variational inference interpretation of decorrelating and MMSE detectors also produces a covariance matrix $\Sigma$ of the Q function, which is not available through conventional signal processing techniques. $\Sigma$ indicates the reliability of the detector output, something the hard-decision detector is unable to make use of. But it will prove valuable in SISO detectors, as demonstrated in Sections 4.4.4 and 4.4.5.
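The three VFEM steps (postulation, evaluation, optimization) for the MMSE case can be checked numerically. The sketch below evaluates the free energy of (4.15), dropping terms that do not depend on $\mu$ or $\Sigma$, and returns the closed-form minimizer (4.16); the function names are hypothetical.

```python
import numpy as np

def free_energy_mmse(mu, Sigma, r, S, A, sigma2):
    """F(mu, Sigma) of (4.15), up to constants independent of mu and Sigma."""
    H = S @ A
    M = H.T @ H + sigma2 * np.eye(H.shape[1])
    _, logdet = np.linalg.slogdet(Sigma)                  # log |Sigma| for positive definite Sigma
    quad = mu @ M @ mu + np.trace(M @ Sigma) - 2 * r @ H @ mu
    return -0.5 * logdet + quad / (2 * sigma2)

def vfem_mmse_solution(r, S, A, sigma2):
    """Closed-form minimizer (4.16): posterior mean and covariance of Q(b)."""
    H = S @ A
    M = H.T @ H + sigma2 * np.eye(H.shape[1])
    return np.linalg.solve(M, H.T @ r), sigma2 * np.linalg.inv(M)
```

Evaluating `free_energy_mmse` at the returned pair and at perturbed values of `mu` or `Sigma` should never give a smaller value at the perturbed point, since the objective is convex in both arguments. The diagonal of the returned covariance is the per-user reliability measure mentioned above.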
4.4.3 VFEM Interpretation of Interference Cancellation Detectors
Iterative multiuser detectors, and especially their convergence behavior, have been actively researched in the past. In [70], linear SIC and PIC are categorized as the Gauss-Seidel and Jacobi iterations for solving linear equations. SIC is also analyzed in greater depth in [71] and [72]. The study is later extended to clipped SIC in [73] through the investigation of the variational inequality (VI) problem. Here we offer an alternative view of SIC as the coordinate descent algorithm applied to the minimization of $\mathcal{F}(\mu, \Sigma)$.

Proposition 4 Linear/Clipped SIC Detectors may be derived from assuming the same distributions as in (4.11) or (4.14), except by minimizing $\mathcal{F}(\mu, \Sigma)$ using the coordinate descent algorithm. That is, in the i-th iteration, for k = 1 to K,

$$ \min_{\mu_k} \ \mathcal{F}(\mu_1^{(i)}, \cdots, \mu_{k-1}^{(i)}, \mu_k, \mu_{k+1}^{(i-1)}, \cdots, \mu_K^{(i-1)}, \Sigma) \quad \text{s.t.} \quad \mu_{min} \le \mu_k \le \mu_{max}. \tag{4.17} $$

The algorithm describes a linear SIC if $\mu_{min} = -\infty$ and $\mu_{max} = \infty$, and a clipped SIC otherwise.

Proof: Setting $\partial \mathcal{F}(\mu, \Sigma)/\partial \mu_k = 0$ based on (4.15) yields

$$ A_k s_k^T S A \mu - A_k s_k^T r + \sigma^2 \mu_k = 0. \tag{4.18} $$

Rearranging the terms and defining $\mu_{\backslash k} = [\mu_1, \cdots, \mu_{k-1}, 0, \mu_{k+1}, \cdots, \mu_K]^T$, the optimal $\mu_k$ is then expressed in the familiar linear interference cancellation form if $\mu_k$ is unbounded (i.e., $\mu_{min} = -\infty$ and $\mu_{max} = \infty$):

$$ \hat{\mu}_k = \frac{1}{A_k^2 + \sigma^2} A_k s_k^T (r - S A \mu_{\backslash k}). \tag{4.19} $$

Since updating $\mu_k$ ($k = 1, \cdots, K$) consecutively subject to $\partial \mathcal{F}(\mu, \Sigma)/\partial \mu_k = 0$ is the coordinate descent algorithm for minimizing $\mathcal{F}(\mu, \Sigma)$, (4.19) corresponds to the coordinate descent implementation of the MMSE detector. On the other hand, setting $\partial \mathcal{F}(\mu, \Sigma)/\partial \mu_k = 0$ based on (4.12) leads to the coordinate descent implementation of the decorrelating detector:

$$ \hat{\mu}_k = \frac{1}{A_k} s_k^T (r - S A \mu_{\backslash k}), \tag{4.20} $$

which is the standard-form SIC detector seen in the literature. If $\mu_{min}$ and $\mu_{max}$ are finite, we need to solve (4.18) subject to $\mu_{min} \le \mu_k \le \mu_{max}$, which corresponds to clipped SIC.
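The coordinate descent reading of SIC in (4.17) to (4.20) can be sketched in a few lines. The sketch assumes unit-norm spreading sequences, as in Section 4.2.1, and handles the box constraint by clipping the unconstrained coordinate minimizer, which is exact for a one-dimensional convex quadratic. The function name and the default iteration count are illustrative choices.

```python
import numpy as np

def sic_coordinate_descent(r, S, A, sigma2, n_iter=50, mu_min=-np.inf, mu_max=np.inf):
    """Cyclic coordinate descent on F(mu, Sigma) over mu, following (4.17)-(4.20).

    sigma2 > 0 gives the MMSE form (4.19); sigma2 = 0 gives the decorrelating form (4.20).
    Finite mu_min / mu_max give clipped SIC. Columns of S are assumed to have unit norm.
    """
    K = S.shape[1]
    amp = np.diag(A)                                     # amplitudes A_1, ..., A_K
    mu = np.zeros(K)
    for _ in range(n_iter):
        for k in range(K):
            mu_wo_k = mu.copy()
            mu_wo_k[k] = 0.0                             # mu with the k-th entry zeroed
            residual = r - S @ (amp * mu_wo_k)           # r - S A mu_{\k}: interference removed
            mu_k = amp[k] * (S[:, k] @ residual) / (amp[k] ** 2 + sigma2)
            mu[k] = np.clip(mu_k, mu_min, mu_max)        # no-op when the bounds are infinite
    return mu
```

With infinite bounds and `sigma2 > 0`, the iterates approach the MMSE solution of (4.16); each inner step uses the most recent estimates of users 1 to k-1, which is what makes the scheme successive rather than parallel.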
To verify that (4.19) and (4.20) do converge to MMSE or decorrelator solutions, and to gain further insights into the convergence behavior when the optimization constraints are active (clipped SIC), we invoke the following theorem [74]:

Theorem 4 (Luo and Tseng, 1992) Consider an optimization problem:

$$ \min f(x) = g(Ex) + c^T x, \quad \text{s.t.} \quad x \in \mathcal{X}, \tag{4.21} $$

where $\mathcal{X}$ is a box (possibly unbounded) in $\mathbb{R}^n$, f is a proper closed convex function in $\mathbb{R}^n$, g is a proper closed convex function in $\mathbb{R}^m$, E is an $m \times n$ matrix having no zero column, and $c \in \mathbb{R}^n$. Also assume

1. The set of optimal solutions for (4.21), denoted by $\mathcal{X}^*$, is nonempty;
2. The domain of g is open and g is strictly convex and twice continuously differentiable on the domain;
3. $\nabla^2 g(Ex^*)$ is positive definite for all $x^* \in \mathcal{X}^*$.

Then if $\{x^r\}$ is a sequence of iterates generated by the coordinate descent method according to the Almost Cyclic Rule or Gauss-Southwell Rule, $\{x^r\}$ converges at least linearly to an element of $\mathcal{X}^*$.
Since the objective function of optimization, F(µ, Σ), satisfies all conditions in the theorem
when the spreading codes are linearly independent, it is clear that this theorem applies to the general linear/clipped SIC setting. Also due to the objective function being quadratic and the constraints being linear, there is a unique optimal solution in X ∗ . We may thus conclude the following:
Corollary 1 Linear/Clipped SIC are guaranteed to converge to the unique minimum free energy defined by F(µ, Σ) and the constraint (µmin ≤ µk ≤ µmax ), and the rate of convergence is at least linear.
This result is proven for the first time to our knowledge. Additionally, we may relax the conventional cyclic order of iteration for SIC and assert that as long as the coordinates are iterated upon according to either the Almost Cyclic Rule or Gauss-Southwell Rule, at least linear convergence rate is guaranteed. These relaxed iteration rules are discussed in [74].

In the sequel, we will investigate a few SISO multiuser detectors within the variational inference framework. Unlike the uncoded detectors studied previously, we will now make use of the soft output provided by $Q(\mathbf{b})$ to facilitate iterative multiuser joint decoding. We will demonstrate that a unique SISO detector is determined by choosing 1) the postulated distributions (like (4.11) and (4.14), but with biased priors), and 2) the message-passing schedule for joint decoding.
4.4.4 VFEM Interpretation of Gaussian SISO Multiuser Detector
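Definition 1 A Gaussian SISO Multiuser Detector is a multiuser detector that obtains soft estimates $Q(\mathbf{b})$ through the VFEM routine, subject to the following postulated distributions:
\[
\begin{aligned}
p(\mathbf{b}) &= \mathcal{N}(\tilde{\mathbf{b}}, \mathbf{W}) \\
p(\mathbf{r}|\mathbf{b}) &= \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2 \mathbf{I}) \\
Q(\mathbf{b}) &= \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
\end{aligned}
\tag{4.22}
\]
where $\tilde{\mathbf{b}} = [\tilde{b}_1, \cdots, \tilde{b}_K]^T$ are the soft bit estimates from the APP decoder, and $\mathbf{W} = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_K^2]^T)$.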
We name this detector Gaussian SISO MUD because, like the uncoded linear detectors in Section 4.4.2, Gaussian densities are assumed for the prior and posterior distributions of $\mathbf{b}$. But unlike the linear detectors, this detector is capable of accepting informative priors, as well as generating soft posterior bit probabilities.

The Existing Form of Gaussian SISO MUD

A ground-breaking turbo detection scheme was proposed by Wang and Poor [59], spurring a tremendous amount of interest in turbo MUD and turbo equalization in the years that followed. It involves a two-stage process. First, the soft bit estimate from the APP decoder is remodulated and subtracted from the matched filter output:
\[ \mathbf{y}_k \triangleq \mathbf{y} - \mathbf{R}\mathbf{A}\tilde{\mathbf{b}}_k. \tag{4.23} \]
In (4.23), $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$, which is equal to the soft bit estimates coming from the APP decoder, $\tilde{\mathbf{b}}$, except for the k-th element being 0. Second, a linear MMSE filter is used to further suppress the residual interference. It can be shown that the filter output is
\[ z_k = A_k \mathbf{e}_k^T \overbrace{\left[\mathbf{A}^T \mathbf{W}_k \mathbf{A} + \sigma^2 \mathbf{R}^{-1}\right]^{-1}}^{\text{MMSE with residual MAI}} \overbrace{\left[\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k\right]}^{\text{Soft interference cancellation}}, \tag{4.24} \]
where $\mathbf{e}_k$ denotes a K-vector of all zeros, except for the k-th element being 1, and $\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$.
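In order to convert the MMSE filter output $z_k$ into a soft estimate in the discrete domain, a Gaussian equivalent channel assumption is made about $z_k$, i.e.,
\[ z_k = \alpha_k b_k + \eta_k, \tag{4.25} \]
where $\alpha_k$ is a constant and $p(\eta_k) = \mathcal{N}(0, \nu_k^2)$. In other words, $p(z_k|b_k) = \mathcal{N}(\alpha_k b_k, \nu_k^2)$. Since $\alpha_k$ and $\nu_k^2$ can be found to be, respectively,
\[
\begin{aligned}
\alpha_k &= A_k^2 \left[(\mathbf{A}^T \mathbf{W}_k \mathbf{A} + \sigma^2 \mathbf{R}^{-1})^{-1}\right]_{k,k} \\
\nu_k^2 &= \alpha_k - \alpha_k^2,
\end{aligned}
\tag{4.26}
\]
the output EXT can be written as
\[ \mathrm{LLR}_{\mathrm{mud}}(b_k) = \log \frac{p(\mathbf{r}|b_k = 1)}{p(\mathbf{r}|b_k = -1)} \approx \log \frac{p(z_k|b_k = 1)}{p(z_k|b_k = -1)} = \frac{2 z_k}{1 - \alpha_k}. \tag{4.27} \]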
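The last equality in (4.27) follows in one line from the Gaussian equivalent channel (4.25). As a short worked step, assuming the variance $\nu_k^2 = \alpha_k - \alpha_k^2 = \alpha_k(1 - \alpha_k)$, the normalizing constants of the two densities cancel and only the exponents remain:

```latex
\begin{aligned}
\log \frac{\mathcal{N}(z_k;\, \alpha_k,\, \nu_k^2)}{\mathcal{N}(z_k;\, -\alpha_k,\, \nu_k^2)}
  &= \frac{(z_k + \alpha_k)^2 - (z_k - \alpha_k)^2}{2\nu_k^2}
   = \frac{2\alpha_k z_k}{\nu_k^2} \\
  &= \frac{2\alpha_k z_k}{\alpha_k (1 - \alpha_k)}
   = \frac{2 z_k}{1 - \alpha_k}.
\end{aligned}
```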
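In essence, the target distribution $p(\mathbf{r}|b_k)$ is approximated by $p(z_k|b_k)$ to obtain the EXT. We will now demonstrate that with the VFEM formulation, the two-stage process can be derived from a single optimization procedure, and without the heuristic Gaussian assumption about $z_k$.

Proposition 5 The SISO multiuser detection scheme described in [59] is an instance of Gaussian SISO MUD.

Proof: If the extrinsic information is extracted following the sequential schedule in Section 4.3.1, by ignoring the prior information for $b_k$, then (4.22) may be modified as
\[
\begin{aligned}
p(\mathbf{b}) &= \mathcal{N}(\tilde{\mathbf{b}}_k, \mathbf{W}_k) \\
p(\mathbf{r}|\mathbf{b}) &= \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2 \mathbf{I}) \\
Q(\mathbf{b}) &= \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\end{aligned}
\tag{4.28}
\]
where $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$ and $\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$. From (4.28), it can be shown that
\[
\begin{aligned}
\mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) ={}& \frac{1}{2\sigma^2}\left[\boldsymbol{\mu}_k^T(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2 \mathbf{W}_k^{-1})\boldsymbol{\mu}_k - 2(\mathbf{r}^T\mathbf{S}\mathbf{A} + \sigma^2 \tilde{\mathbf{b}}_k^T \mathbf{W}_k^{-1})\boldsymbol{\mu}_k\right] \\
& - \frac{1}{2}\log|\boldsymbol{\Sigma}_k| + \frac{1}{2}\mathrm{tr}(\mathbf{W}_k^{-1}\boldsymbol{\Sigma}_k) + \frac{1}{2\sigma^2}\mathrm{tr}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A}\boldsymbol{\Sigma}_k).
\end{aligned}
\tag{4.29}
\]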
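Let $\mu'_k$ denote the k-th element of $\boldsymbol{\mu}_k$. Solving $\partial \mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)/\partial \mu'_k = 0$ yields
\[ \mu'_k = \mathbf{e}_k^T (\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2 \mathbf{W}_k^{-1})^{-1} \mathbf{A}^T\mathbf{S}^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k), \tag{4.30} \]
which is identical to $z_k$ in (4.24). One piece of information that the MMSE-based detector in [59] does not have is the covariance matrix of the posterior distribution, $\boldsymbol{\Sigma}_k$, which can be shown to be
\[ \boldsymbol{\Sigma}_k = \left(\frac{1}{\sigma^2}\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \mathbf{W}_k^{-1}\right)^{-1}. \tag{4.31} \]
In other words, the marginal posterior distribution of $b_k$ is $Q(b_k) = \mathcal{N}(\mu'_k, [\boldsymbol{\Sigma}_k]_{k,k})$. Since the prior distribution of $b_k$ is ignored during the detection operation, $Q(b_k)$ obtained as such is in fact proportional to $p(\mathbf{r}|b_k)$. Therefore,
\[ \mathrm{LLR}_{\mathrm{mud}}(b_k) = \log \frac{p(\mathbf{r}|b_k = 1)}{p(\mathbf{r}|b_k = -1)} \approx \log \frac{Q(b_k = 1)}{Q(b_k = -1)} = \frac{2\mu'_k}{[\boldsymbol{\Sigma}_k]_{k,k}}. \tag{4.32} \]
Applying the matrix inversion lemma on $\boldsymbol{\Sigma}_k$ in (4.31), we have
\[ \boldsymbol{\Sigma}_k = \mathbf{W}_k - \mathbf{W}_k\mathbf{A}(\mathbf{A}\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}\mathbf{A}\mathbf{W}_k. \tag{4.33} \]
Since $[\mathbf{W}_k]_{k,k} = 1$, $[\boldsymbol{\Sigma}_k]_{k,k} = 1 - A_k^2[(\mathbf{A}\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k} = 1 - \alpha_k$, where $\alpha_k$ is as defined in (4.26). Therefore,
\[ \mathrm{LLR}_{\mathrm{mud}}(b_k) = \frac{2\mu'_k}{[\boldsymbol{\Sigma}_k]_{k,k}} = \frac{2 z_k}{1 - \alpha_k}. \tag{4.34} \]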
We have thus re-derived the Wang-Poor scheme via a radically different approach. It is remarkable how the variational inference viewpoint leads to exactly the same outcome as [59], while the conditional Gaussian assumption made about the MMSE filter output is no longer necessary. After taking APP decoding into account, the Wang-Poor turbo MUD algorithm as a whole can be seen as hybrid-Gaussian-SISO MUD. In the next section, we will systematically investigate all three possible scheduling schemes applied to Gaussian SISO MUD.
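The equivalence claimed in Proposition 5 can be checked numerically. The sketch below is only an illustration under assumed toy values (random unit-norm codes, made-up amplitudes and soft bits, a fixed user index `k`): it computes the extrinsic LLR once in the soft-cancellation-plus-MMSE form of (4.24) and (4.26), and once from the VFEM posterior mean and variance of (4.30) and (4.31).

```python
import numpy as np

# Toy setup; every numeric value here is an assumption for illustration.
rng = np.random.default_rng(1)
N, K, sigma2, k = 16, 4, 0.5, 2
S = rng.standard_normal((N, K))
S /= np.linalg.norm(S, axis=0)                  # unit-norm spreading codes
A = np.diag([1.0, 0.8, 1.2, 0.9])               # amplitude matrix
b_tilde = np.array([0.6, -0.3, 0.9, 0.1])       # soft bits from the decoder
r = S @ A @ rng.choice([-1.0, 1.0], K) + np.sqrt(sigma2) * rng.standard_normal(N)

R = S.T @ S                                     # code correlation matrix
y = S.T @ r                                     # matched filter output
b_k = b_tilde.copy()
b_k[k] = 0.0                                    # prior mean with user k's entry zeroed
w = 1.0 - b_tilde ** 2
w[k] = 1.0                                      # prior variance of user k set to 1
W_k = np.diag(w)
e_k = np.eye(K)[k]

# Soft cancellation + MMSE form: (4.24), (4.26), (4.27).
M = np.linalg.inv(A.T @ W_k @ A + sigma2 * np.linalg.inv(R))
z_k = A[k, k] * e_k @ M @ (np.linalg.inv(R) @ y - A @ b_k)
alpha_k = A[k, k] ** 2 * M[k, k]
llr_mmse = 2 * z_k / (1 - alpha_k)

# VFEM form: posterior mean (4.30), covariance (4.31), LLR (4.32).
G = A.T @ S.T @ S @ A
mu_k = e_k @ np.linalg.solve(G + sigma2 * np.linalg.inv(W_k),
                             A.T @ S.T @ (r - S @ A @ b_k))
Sigma_k = np.linalg.inv(G / sigma2 + np.linalg.inv(W_k))
llr_vfem = 2 * mu_k / Sigma_k[k, k]
```

Both routes should return the same filter output and the same LLR up to floating-point error, which mirrors (4.34).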
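Table 4.1: Three scheduling schemes of turbo MUD employing Gaussian SISO MUD.

Sequential-Gaussian-SISO
Initialization: $\tilde{\mathbf{b}} = \mathbf{0}$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$
        $\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$
        $\mu'_k = A_k \mathbf{e}_k^T [\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1} [\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k]$
        $\alpha_k = A_k^2 [(\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}$
        $\mathrm{LLR}_{\mathrm{mud}}(b_k) = 2\mu'_k / (1 - \alpha_k)$
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) \overset{\text{Decoding}}{\Longleftarrow} \mathrm{LLR}_{\mathrm{mud}}(b_k)$
        $\tilde{b}_k = \tanh[\mathrm{LLR}_{\mathrm{dec}}(b_k)/2]$
    END
END

Flooding-Gaussian-SISO
Initialization: $\tilde{\mathbf{b}} = \mathbf{0}$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$
        $\mathbf{W} = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_K^2]^T)$
        $\check{\mu}_k = A_k \mathbf{e}_k^T [\mathbf{A}^T\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1} [\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k]$
        $\check{\alpha}_k = (1 - \tilde{b}_k^2) A_k^2 [(\mathbf{A}^T\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}$
        $\mathrm{LLR}_{\mathrm{mud}}(b_k) = 2\check{\mu}_k / (1 - \check{\alpha}_k)$
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) \overset{\text{Decoding}}{\Longleftarrow} \mathrm{LLR}_{\mathrm{mud}}(b_k)$
        $\tilde{b}_k = \tanh[\mathrm{LLR}_{\mathrm{dec}}(b_k)/2]$
    END
END

Hybrid-Gaussian-SISO
Initialization: $\tilde{\mathbf{b}} = \mathbf{0}$
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$
        $\mathbf{W}_k = \mathrm{diag}([1 - \tilde{b}_1^2, \cdots, 1 - \tilde{b}_{k-1}^2, 1, 1 - \tilde{b}_{k+1}^2, \cdots, 1 - \tilde{b}_K^2]^T)$
        $\mu'_k = A_k \mathbf{e}_k^T [\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1} [\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k]$
        $\alpha_k = A_k^2 [(\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}$
        $\mathrm{LLR}_{\mathrm{mud}}(b_k) = 2\mu'_k / (1 - \alpha_k)$
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) \overset{\text{Decoding}}{\Longleftarrow} \mathrm{LLR}_{\mathrm{mud}}(b_k)$
        $\tilde{b}_k = \tanh[\mathrm{LLR}_{\mathrm{dec}}(b_k)/2]$
    END
END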
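The Standard Forms of Gaussian SISO MUD

In Table 4.1, we summarize three different versions of standard Gaussian SISO MUD. In the following, we point out some of the major characteristics associated with each one, and, in particular, introduce the new flooding schedule implementation.

Sequential-Gaussian-SISO: In Section 4.4.4, we presented a variational-inference-based approach to obtain the EXT at the SISO detector output, which coincides with the EXT conventionally calculated through soft interference cancellation and MMSE filtering. In contrast to [59], however, where the EXT’s are stored until all users are processed and then used for APP decoding in parallel, the sequential schedule requires that the EXT, $\mathrm{LLR}_{\mathrm{mud}}(b_k)$, be directly passed down to the APP decoder. Then the EXT from the APP decoder, viewed by the detector as the updated prior $\tilde{b}_k$, is immediately used for the detection of $b_{k+1}$.

Flooding-Gaussian-SISO: The flooding schedule allows the APP decoding of all users to be done in parallel. In the detection stage, some changes to the derivation presented in Section 4.4.4 are needed, since the prior information of $b_k$ should not be ignored as is done in (4.28). Instead, the postulated distributions in (4.22) are adopted. Given (4.22), the free energy becomes
\[
\begin{aligned}
\mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) ={}& \frac{1}{2\sigma^2}\left[\boldsymbol{\mu}^T(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2 \mathbf{W}^{-1})\boldsymbol{\mu} - 2(\mathbf{r}^T\mathbf{S}\mathbf{A} + \sigma^2 \tilde{\mathbf{b}}^T \mathbf{W}^{-1})\boldsymbol{\mu}\right] \\
& - \frac{1}{2}\log|\boldsymbol{\Sigma}| + \frac{1}{2}\mathrm{tr}(\mathbf{W}^{-1}\boldsymbol{\Sigma}) + \frac{1}{2\sigma^2}\mathrm{tr}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A}\boldsymbol{\Sigma}).
\end{aligned}
\tag{4.35}
\]
Solving $\partial \mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}, \boldsymbol{\Sigma})/\partial \boldsymbol{\mu} = 0$ and $\partial \mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}, \boldsymbol{\Sigma})/\partial \boldsymbol{\Sigma}^{-1} = 0$ leads to the minimizer of $\mathcal{F}_{\mathrm{gauss}}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ in (4.35):
\[
\begin{aligned}
\boldsymbol{\mu} &= \tilde{\mathbf{b}} + (\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \sigma^2 \mathbf{W}^{-1})^{-1}\mathbf{A}^T\mathbf{S}^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}) \\
\boldsymbol{\Sigma} &= (\sigma^{-2}\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} + \mathbf{W}^{-1})^{-1}.
\end{aligned}
\tag{4.36}
\]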
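This implies that the approximate posterior distribution is $p(\mathbf{b}|\mathbf{r}) \approx Q(\mathbf{b}) = \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. In other words, the marginal posterior distribution of $b_k$ is $p(b_k|\mathbf{r}) \approx \mathcal{N}(\mu_k, [\boldsymbol{\Sigma}]_{k,k})$. Recalling that in (4.22) $p(b_k) = \mathcal{N}(\tilde{b}_k, 1 - \tilde{b}_k^2)$, if we apply the flooding schedule in Section 4.3.2 to extract the EXT, then
\[ p(\mathbf{r}|b_k) \propto \frac{p(b_k|\mathbf{r})}{p(b_k)} \approx \frac{Q(b_k)}{p(b_k)} = \mathcal{N}(\mu_{\mathrm{ext}}, \sigma_{\mathrm{ext}}^2), \tag{4.37} \]
where
\[
\begin{aligned}
\frac{\mu_{\mathrm{ext}}}{\sigma_{\mathrm{ext}}^2} &= \frac{\mu_k}{[\boldsymbol{\Sigma}]_{k,k}} - \frac{\tilde{b}_k}{1 - \tilde{b}_k^2} \\
\frac{1}{\sigma_{\mathrm{ext}}^2} &= \frac{1}{[\boldsymbol{\Sigma}]_{k,k}} - \frac{1}{1 - \tilde{b}_k^2}.
\end{aligned}
\tag{4.38}
\]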
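(4.38) is true, because if $\mathcal{N}(\mu_1, \sigma_1^2)\,\mathcal{N}(\mu_2, \sigma_2^2) \propto \mathcal{N}(\mu_3, \sigma_3^2)$, then [23]
\[
\begin{aligned}
\frac{\mu_3}{\sigma_3^2} &= \frac{\mu_1}{\sigma_1^2} + \frac{\mu_2}{\sigma_2^2} \\
\frac{1}{\sigma_3^2} &= \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}.
\end{aligned}
\tag{4.39}
\]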
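The product rule (4.39) and the division used in (4.38) are the same identity read in opposite directions: precisions and precision-weighted means add under multiplication and subtract under division. A small sketch, with hypothetical helper names and arbitrary example numbers:

```python
import math

def gaussian_product(mu1, var1, mu2, var2):
    """Mean and variance of the Gaussian proportional to N(mu1,var1) * N(mu2,var2), per (4.39)."""
    precision = 1.0 / var1 + 1.0 / var2
    mean = (mu1 / var1 + mu2 / var2) / precision
    return mean, 1.0 / precision

def gaussian_quotient(mu_post, var_post, mu_prior, var_prior):
    """Mean and variance of the Gaussian proportional to posterior / prior, per (4.38)."""
    precision = 1.0 / var_post - 1.0 / var_prior
    if precision <= 0.0:
        raise ValueError("posterior must be narrower than the prior")
    mean = (mu_post / var_post - mu_prior / var_prior) / precision
    return mean, 1.0 / precision

# Round trip: combine a prior with an extrinsic term, then divide the prior back out.
prior = (0.4, 0.84)          # e.g. mean b~_k = 0.4, variance 1 - 0.4**2
extrinsic = (1.5, 2.0)       # arbitrary example values
posterior = gaussian_product(*prior, *extrinsic)
recovered = gaussian_quotient(*posterior, *prior)
```

Dividing the posterior by the prior recovers the extrinsic mean and variance, which is the operation the flooding schedule performs on $Q(b_k)$.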
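Finally, sampling $\mathcal{N}(\mu_{\mathrm{ext}}, \sigma_{\mathrm{ext}}^2)$ at $b_k = 1$ and $b_k = -1$, we obtain
\[
\begin{aligned}
\mathrm{LLR}_{\mathrm{mud}}(b_k) &= \log \frac{p(\mathbf{r}|b_k = 1)}{p(\mathbf{r}|b_k = -1)} \\
&\approx \frac{2\mu_{\mathrm{ext}}}{\sigma_{\mathrm{ext}}^2} \\
&= \frac{2\mu_k}{[\boldsymbol{\Sigma}]_{k,k}} - \frac{2\tilde{b}_k}{1 - \tilde{b}_k^2} \\
&= \frac{2\mathbf{e}_k^T\left[\tilde{\mathbf{b}} + \mathbf{W}\mathbf{A}(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}})\right]}{(1 - \tilde{b}_k^2) - (1 - \tilde{b}_k^2)^2 A_k^2 [(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}} - \frac{2\tilde{b}_k}{1 - \tilde{b}_k^2} \\
&= \frac{2\mathbf{e}_k^T\left[\mathbf{W}\mathbf{A}(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}})\right] + 2\tilde{b}_k(1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}}{(1 - \tilde{b}_k^2)\left\{1 - (1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}\right\}} \\
&= \frac{2(1 - \tilde{b}_k^2)A_k\mathbf{e}_k^T(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k)}{(1 - \tilde{b}_k^2)\left\{1 - (1 - \tilde{b}_k^2)A_k^2[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}]_{k,k}\right\}} \\
&= \frac{2\check{\mu}_k}{1 - \check{\alpha}_k},
\end{aligned}
\tag{4.40}
\]
where
\[
\begin{aligned}
\check{\mu}_k &= A_k\mathbf{e}_k^T(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}_k) \\
\check{\alpha}_k &= (1 - \tilde{b}_k^2)A_k^2\left[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}\right]_{k,k}.
\end{aligned}
\tag{4.41}
\]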
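The chain of equalities in (4.40) can be sanity-checked numerically. The sketch below, under assumed toy values, computes the flooding-schedule EXT in two ways: by dividing the Gaussian posterior marginal of (4.36) by the prior as in (4.38), and by the closed form (4.41). It assumes $|\tilde{b}_k| < 1$ for all users so that $\mathbf{W}$ is invertible, and a full-rank correlation matrix $\mathbf{R}$.

```python
import numpy as np

# Toy setup; every numeric value here is an assumption for illustration.
rng = np.random.default_rng(2)
N, K, sigma2 = 16, 4, 0.5
S = rng.standard_normal((N, K))
S /= np.linalg.norm(S, axis=0)                  # unit-norm spreading codes
A = np.diag([1.0, 0.8, 1.2, 0.9])
b_tilde = np.array([0.6, -0.3, 0.9, 0.1])       # decoder soft bits, |b~_k| < 1
r = S @ A @ rng.choice([-1.0, 1.0], K) + np.sqrt(sigma2) * rng.standard_normal(N)

R, y = S.T @ S, S.T @ r
prior_var = 1.0 - b_tilde ** 2
W = np.diag(prior_var)
G = A @ R @ A                                   # A^T S^T S A for diagonal A

# Posterior mean and covariance from (4.36).
Sigma = np.linalg.inv(G / sigma2 + np.linalg.inv(W))
mu = b_tilde + np.linalg.solve(G + sigma2 * np.linalg.inv(W),
                               A @ S.T @ (r - S @ A @ b_tilde))

# Route 1: extrinsic LLR via Gaussian division, 2 * mu_ext / sigma_ext^2 from (4.38).
llr_division = 2 * (mu / np.diag(Sigma) - b_tilde / prior_var)

# Route 2: closed form (4.41).
M = np.linalg.inv(A @ W @ A + sigma2 * np.linalg.inv(R))
Rinv_y = np.linalg.solve(R, y)
llr_closed = np.empty(K)
for k in range(K):
    b_k = b_tilde.copy()
    b_k[k] = 0.0                                # soft bits with user k's entry zeroed
    mu_check = A[k, k] * M[k] @ (Rinv_y - A @ b_k)
    alpha_check = prior_var[k] * A[k, k] ** 2 * M[k, k]
    llr_closed[k] = 2 * mu_check / (1 - alpha_check)
```

The two vectors should coincide up to floating-point error. Note that `M` is computed once and shared by all users, which is the computational appeal of the flooding schedule.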
In (4.41), $\check{\mu}_k$ can also be computed more efficiently as
\[ \check{\mu}_k = A_k\mathbf{e}_k^T(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}(\mathbf{R}^{-1}\mathbf{y} - \mathbf{A}\tilde{\mathbf{b}}) + \tilde{b}_k A_k^2\left[(\mathbf{A}\mathbf{W}\mathbf{A} + \sigma^2\mathbf{R}^{-1})^{-1}\right]_{k,k}, \tag{4.42} \]
such that common information may be utilized to evaluate $\check{\mu}_k$ for all k.

Hybrid-Gaussian-SISO: As mentioned earlier, the Wang-Poor turbo MUD scheme is exactly the hybrid-Gaussian-SISO MUD. It differs from the sequential schedule in that the EXT for $b_k$ generated by the SISO detector is now stored until the EXT’s of all users $k = 1, \cdots, K$ are ready. Then the EXT’s are passed down to the APP decoders, for decoding in parallel. Hybrid-Gaussian-SISO MUD brings computational savings compared to the sequential-Gaussian-SISO MUD, due to both the possibility of parallel decoding, and the ease of evaluating $[\mathbf{A}^T\mathbf{W}_k\mathbf{A} + \sigma^2\mathbf{R}^{-1}]^{-1}$.

So far, based on the Gaussian distributions assumed in the postulation step, we showed that the variational inference algorithm converges to a family of Gaussian SISO detectors, including the well-known Wang-Poor scheme as a special case. But the VFEM framework allows us to generalize even further, since the Gaussian distributions, albeit convenient, are unnatural
choices for BPSK symbols. The subsequent section will focus on a different family of detectors induced by a different set of assumptions in the postulation step.
4.4.5 VFEM Interpretation of Discrete SISO Multiuser Detector
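Definition 2 A Discrete SISO Multiuser Detector is a multiuser detector that obtains soft estimates $Q(\mathbf{b})$ through the VFEM routine, subject to the following postulated distributions:
\[
\begin{aligned}
p(\mathbf{b}) &= \prod_{k=1}^{K} \xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}, \quad b_k \in \{\pm 1\} \\
p(\mathbf{r}|\mathbf{b}) &= \mathcal{N}(\mathbf{S}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\
Q(\mathbf{b}) &= \prod_{k=1}^{K} \gamma_k^{\frac{1+b_k}{2}}(1 - \gamma_k)^{\frac{1-b_k}{2}}, \quad b_k \in \{\pm 1\},
\end{aligned}
\tag{4.43}
\]
where $\xi_k$ and $\gamma_k$ are the prior and posterior probabilities of $b_k$ being 1.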
The discrete SISO MUD has two salient features in the postulated distributions: 1) Both the prior and posterior distributions are discrete, conforming to the actual properties of the data; 2) The posterior distributions of individual bits, $\{b_k\}_{k=1}^{K}$, are assumed to be independent by applying the mean-field approximation. Indeed, the only distinction between this scheme and the jointly optimal detector is the mean-field approximation about the posterior, which, though a crude assumption in general, is asymptotically exact in the large system limit. This technique is closely tied to the replica method used to study the performance of randomly spread CDMA [67]. The mean-field approximation is also used in [75] and [24] to derive multiuser detectors for uncoded CDMA.

The Existing Form of Discrete SISO MUD

In [60], a simple (linear complexity) multiuser detector was proposed for coded CDMA, producing near-optimal performance at very high network load. Alexander, Grant and Reed applied a simple interference cancellation scheme and made the following observation:
\[
\begin{aligned}
p(y_k|b_k, \mathbf{b}_{\setminus k} = \tilde{\mathbf{b}}_k) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left(y_k - \mathbf{s}_k^T\mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k - A_k b_k\right)^2\right\} \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[A_k b_k - \mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k)\right]^2\right\},
\end{aligned}
\tag{4.44}
\]
where $\tilde{\mathbf{b}}$ is the average bit estimate received from the APP decoder, and $\tilde{\mathbf{b}}_k = [\tilde{b}_1, \cdots, \tilde{b}_{k-1}, 0, \tilde{b}_{k+1}, \cdots, \tilde{b}_K]^T$. Defining $\sigma_{\mathrm{tot}}^2 = \sigma^2 + \sigma_{\mathrm{MU}}^2$ as the variance of the combined channel noise and residual MAI modelled as Gaussian noise, $\sigma_{\mathrm{tot}}^2$ can be approximated as the sample average of $[\mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}})]^2$. The soft estimate of $b_k$ can then be drawn from (4.44) as a log-likelihood ratio:
\[ \mathrm{LLR}(b_k) = \frac{2}{\sigma_{\mathrm{tot}}^2} A_k \mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k), \tag{4.45} \]
and fed back to the APP channel code decoders. The decoders subsequently update $\tilde{\mathbf{b}}$ for the next iteration. Now we proceed to prove the link of this simple and effective scheme to the VFEM framework.

Proposition 6 The SISO multiuser detection scheme described in [60] is an instance of the Discrete SISO MUD.

Proof: Let the prior distribution $p(\mathbf{b})$ in (4.43) represent the EXT provided by the APP decoder. Also, $p(\mathbf{b}) = \prod_{k=1}^{K} p(b_k)$, where $p(b_k) = \xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}$, implies that $p(b_k = 1) = (\xi_k)^1(1 - \xi_k)^0 = \xi_k$ and $p(b_k = -1) = (\xi_k)^0(1 - \xi_k)^1 = 1 - \xi_k$. As seen from the derivation in (4.44), in the traditional MUD viewpoint, this information may be used for soft interference cancellation in the detection stage. We will now demonstrate that this IC technique corresponds to one iteration of recursive minimization of variational free energy. We let $\tilde{b}_k = 2\xi_k - 1$ and $m_k = 2\gamma_k - 1$ denote the prior mean and posterior mean of $b_k$. After some mathematical manipulation, we have, according to (4.43) and (4.10),
\[
\begin{aligned}
\mathcal{F}_{\mathrm{disc}}(\mathbf{m}) ={}& \sum_{k=1}^{K}\left[\frac{1 + m_k}{2}\log\frac{1 + m_k}{1 + \tilde{b}_k} + \frac{1 - m_k}{2}\log\frac{1 - m_k}{1 - \tilde{b}_k}\right] + \frac{N}{2}\log\sigma^2 \\
& + \frac{1}{2\sigma^2}\left[\mathbf{r}^T\mathbf{r} - 2\mathbf{r}^T\mathbf{S}\mathbf{A}\mathbf{m} + \mathbf{m}^T\mathbf{B}\mathbf{m} + \mathrm{tr}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A})\right],
\end{aligned}
\tag{4.46}
\]
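where $\mathbf{B} = \mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A} - \mathrm{diag}(\mathbf{A}^T\mathbf{S}^T\mathbf{S}\mathbf{A})$. (4.46) is obtained by utilizing the property that
\[ E(\mathbf{b}^T\mathbf{C}\mathbf{b}) = E\left(\sum_{i \neq j} C_{ij} b_i b_j + \sum_{i=1}^{K} C_{ii} b_i^2\right) = \mathbf{m}^T[\mathbf{C} - \mathrm{diag}(\mathbf{C})]\mathbf{m} + \mathbf{1}^T\mathrm{diag}(\mathbf{C})\mathbf{1}, \tag{4.47} \]
for $\mathbf{b} \in \{\pm 1\}^K$ and $\mathbf{C} = [C_{ij}] \in \mathbb{R}^{K \times K}$.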
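Property (4.47) relies on the bits being independent under the mean-field posterior, so that $E(b_i b_j) = m_i m_j$ for $i \neq j$ while $b_i^2 = 1$. A brute-force check over all $2^K$ bit patterns, with hypothetical function names and arbitrary test values, can be sketched as:

```python
import itertools
import numpy as np

def expected_quadratic_bruteforce(C, m):
    """E[b^T C b] by enumerating b in {-1,+1}^K with independent bits of mean m."""
    total = 0.0
    for bits in itertools.product([-1.0, 1.0], repeat=len(m)):
        b = np.array(bits)
        prob = np.prod((1 + b * m) / 2)      # P(b_k = +1) = (1 + m_k)/2, P(b_k = -1) = (1 - m_k)/2
        total += prob * (b @ C @ b)
    return total

def expected_quadratic_closed_form(C, m):
    """Right-hand side of (4.47): m^T [C - diag(C)] m + trace(C)."""
    D = np.diag(np.diag(C))
    return m @ (C - D) @ m + np.trace(C)

rng = np.random.default_rng(3)
C = rng.standard_normal((4, 4))              # arbitrary test matrix
m = np.array([0.2, -0.5, 0.7, 0.0])          # arbitrary posterior means
brute = expected_quadratic_bruteforce(C, m)
closed = expected_quadratic_closed_form(C, m)
```

The enumeration costs $2^K$ evaluations and is only meant as a check; the closed form is what makes the free energy (4.46) cheap to evaluate.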
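Rearranging $\partial \mathcal{F}_{\mathrm{disc}}(\mathbf{m})/\partial \mathbf{m} = 0$ gives a system of equations, for $k = 1, \cdots, K$, that determines the minimum of $\mathcal{F}_{\mathrm{disc}}(\mathbf{m})$,
\[ \log\frac{1 + m_k}{1 - m_k} = \log\frac{1 + \tilde{b}_k}{1 - \tilde{b}_k} + \frac{2}{\sigma^2}\left(\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}\right), \tag{4.48} \]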
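where $\boldsymbol{\eta}_k$ and $\boldsymbol{\beta}_k$ are the k-th column vectors of $\mathbf{S}\mathbf{A}$ and $\mathbf{B}$, respectively. The coordinate descent algorithm minimizes a function successively along one direction at a time. By setting $\partial \mathcal{F}_{\mathrm{disc}}(\mathbf{m})/\partial m_k$ to zero in turn, we have the following update for user k in iteration i:
\[ \mathrm{LLR}^{(i)}(b_k) = \mathrm{LLR}^{(0)}(b_k) + \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}^{(i)} - \boldsymbol{\beta}_k^T\mathbf{m}_{>k}^{(i-1)}\right]. \tag{4.49} \]
In (4.49), we defined the log-likelihood ratio $\mathrm{LLR}^{(i)}(b_k) \triangleq \log\frac{1 + m_k^{(i)}}{1 - m_k^{(i)}}$ (or equivalently, $m_k^{(i)} = \tanh[\mathrm{LLR}^{(i)}(b_k)/2]$). The iterations are initialized with the prior probabilities of $b_k$, i.e., $\mathbf{m}^{(0)} = \tilde{\mathbf{b}}$ and $\mathrm{LLR}^{(0)}(b_k) = \log\frac{1 + \tilde{b}_k}{1 - \tilde{b}_k}$. As well, $\mathbf{m}_{<k} = [m_1, \cdots, m_{k-1}, \overbrace{0, \cdots, 0}^{K-k+1}]^T$, while $\mathbf{m}_{>k} = [0, \cdots, 0, m_k, \cdots, m_K]^T$.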
The flooding schedule (see Fig. 4.3) indicates that the EXT is the ratio between the posterior and the prior distributions, or the difference between posterior and prior LLR’s, i.e., after I iterations, the multiuser detector passes the following EXT to the k-th decoder:
\[ \mathrm{LLR}_{\mathrm{mud}}(b_k) = \mathrm{LLR}_{\mathrm{pos}}(b_k) - \mathrm{LLR}^{(0)}(b_k) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}_{<k}^{(I)} - \boldsymbol{\beta}_k^T\mathbf{m}_{>k}^{(I-1)}\right]. \tag{4.50} \]
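The serial update (4.49) and the EXT extraction (4.50) can be sketched as below. This is an illustrative sketch under assumed toy values, not the thesis's code: the function names are hypothetical, the free energy follows (4.46) with the constant $(N/2)\log\sigma^2$ term dropped, and a tiny clip on `m` is added purely as a numerical guard against `log(0)`.

```python
import numpy as np

def free_energy(m, b_tilde, r, SA, sigma2):
    """F_disc(m) of (4.46), omitting the constant (N/2) log sigma^2 term."""
    G = SA.T @ SA
    B = G - np.diag(np.diag(G))
    kl = np.sum((1 + m) / 2 * np.log((1 + m) / (1 + b_tilde))
                + (1 - m) / 2 * np.log((1 - m) / (1 - b_tilde)))
    quad = r @ r - 2 * r @ SA @ m + m @ B @ m + np.trace(G)
    return kl + quad / (2 * sigma2)

def serial_update(b_tilde, r, SA, sigma2, iterations):
    """Coordinate descent on F_disc via (4.49); returns m, the EXT of (4.50), and F per sweep."""
    G = SA.T @ SA
    B = G - np.diag(np.diag(G))                  # zero diagonal, so m_k never feeds its own update
    llr0 = np.log((1 + b_tilde) / (1 - b_tilde))  # prior LLRs
    m = b_tilde.copy()
    history = [free_energy(m, b_tilde, r, SA, sigma2)]
    for _ in range(iterations):
        for k in range(len(m)):
            # Entries before k hold iteration-i values, entries after k hold iteration-(i-1) values.
            llr = llr0[k] + 2 / sigma2 * (SA[:, k] @ r - B[:, k] @ m)
            m[k] = np.clip(np.tanh(llr / 2), -1 + 1e-12, 1 - 1e-12)
        history.append(free_energy(m, b_tilde, r, SA, sigma2))
    llr_mud = np.log((1 + m) / (1 - m)) - llr0   # posterior LLR minus prior LLR
    return m, llr_mud, history

# Toy setup; every numeric value here is an assumption for illustration.
rng = np.random.default_rng(5)
N, K, sigma2 = 16, 4, 1.0
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
SA = S * np.array([1.0, 0.8, 1.2, 0.9])          # S diag(A)
r = SA @ rng.choice([-1.0, 1.0], K) + np.sqrt(sigma2) * rng.standard_normal(N)
b_tilde = np.array([0.6, -0.3, 0.0, 0.2])        # decoder soft bits

m, llr_mud, history = serial_update(b_tilde, r, SA, sigma2, iterations=5)
```

Because each coordinate step exactly minimizes a function that is convex in $m_k$, the recorded free energy should never increase from one sweep to the next.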
Consider simplifying (4.50) by removing the serial iterations; then
\[ \mathrm{LLR}_{\mathrm{mud}}(b_k) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}^{(0)}\right] = \frac{2}{\sigma^2}A_k\mathbf{s}_k^T(\mathbf{r} - \mathbf{S}\mathbf{A}\tilde{\mathbf{b}}_k), \tag{4.51} \]
which is similar to (4.45). Note that this simplified updating scheme does not guarantee the decrease of free energy, and thus is not as robust as the standard version in (4.50).

In the above proof, we have set $\sigma^2$ to be the channel noise variance, and assumed it known. This is in contrast to Alexander-Grant-Reed’s original derivation, where $\sigma_{\mathrm{tot}}^2$ is the noise-plus-MAI variance which has to be estimated iteratively. We will postpone the discussion of this issue until Section 4.5.3, where we will show that the iterative estimation of $\sigma_{\mathrm{tot}}^2$ can be interpreted as the M step in the variational EM algorithm for joint data detection and noise variance estimation.

Also from (4.49), an interesting link to uncoded multi-stage SIC can be made: in that case, $\mathrm{LLR}^{(0)}(b_k) = 0$. Defining $\hat{b}_k^{(i)} = m_k^{(i)} = \tanh[\mathrm{LLR}^{(i)}(b_k)/2]$, we get the hyperbolic-tangent SIC updates
\[ \hat{b}_k^{(i)} = \tanh\left[\frac{1}{\sigma^2}A_k\mathbf{s}_k^T\left(\mathbf{r} - \mathbf{S}\mathbf{A}\hat{\mathbf{b}}_{<k}^{(i)} - \mathbf{S}\mathbf{A}\hat{\mathbf{b}}_{>k}^{(i-1)}\right)\right]. \tag{4.52} \]
In addition to demonstrating a solid theoretical foundation for the Alexander-Grant-Reed scheme, this section also clearly revealed the underlying suboptimal simplifications made en route to the final result. In the following, we will compare it to the standard discrete SISO MUD based on the theory of VFEM and the associated scheduling rules.

The Standard Forms of Discrete SISO MUD

In Table 4.2, we summarize three different versions of the standard discrete SISO MUD. The following highlights the major characteristics of each scheme.

Sequential-Discrete-SISO: The sequential schedule obtains the EXT for $b_k$ through a serial update algorithm governed by (4.50). Before the inner iterations, $\{\mathrm{LLR}_{\mathrm{dec}}(b_l)\}_{l \neq k}$ are set to the most recent output from the APP decoder, except for $\mathrm{LLR}_{\mathrm{dec}}(b_k)$, which is set to 0. This is equivalent to setting $\xi_k = 1/2$ in (4.43), as required by the sequential scheduling rule. After the serial update, $\mathrm{LLR}_{\mathrm{mud}}(b_k)$ is immediately sent to the k-th APP decoder ($\{\mathrm{LLR}_{\mathrm{mud}}(b_l)\}_{l \neq k}$ are discarded), such that an updated prior $\mathrm{LLR}_{\mathrm{dec}}(b_k)$ is generated (see Fig. 4.2). The sequential schedule is inefficient, since a different serial update of $\mathrm{LLR}_{\mathrm{mud}}(b_k)$ needs to be done K times, one for each user. A SIC-based turbo MUD scheme proposed by Kobayashi, Boutros, and Caire [64] can be seen as a simplification of the full-blown sequential-discrete-SISO MUD, with I = 1 inner iteration.

Flooding-Discrete-SISO: The flooding schedule is much more efficient. The serial update algorithm in the inner iteration updates the posterior LLR’s, $\{\mathrm{LLR}_{\mathrm{pos}}(b_k)\}_{k=1}^{K}$. After I iterations, in which the free energy is monotonically reduced, reliable estimates of $\{\mathrm{LLR}_{\mathrm{pos}}(b_k)\}_{k=1}^{K}$ are attained. The SISO detector passes the EXT, $\mathrm{LLR}_{\mathrm{mud}}(b_k)$, into the APP decoder, where the decoding of K users can be done in parallel.

Hybrid-Discrete-SISO: The hybrid schedule differs from the sequential schedule, in that when $\mathrm{LLR}_{\mathrm{mud}}(b_k)$ is found, it is not immediately sent to the APP decoder to update $\mathrm{LLR}_{\mathrm{dec}}(b_k)$, but stored until all other users’ EXT’s are obtained, to facilitate parallel APP decoding.
4.4.6 VFEM Interpretation of Decorrelating-Decision-Feedback SISO Multiuser Detector
In [20], Duel-Hallen proposed the decorrelating-decision-feedback (DDF) multiuser detector. It has been shown to out-perform most of the linear and interference-cancellation detectors, especially in terms of near-far resistance. However, the soft-decision DDF MUD and its application
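Table 4.2: Three scheduling schemes of turbo MUD employing discrete SISO MUD.

Sequential-Discrete-SISO
Initialization: $\mathbf{m} = \mathbf{0}$ and $\mathrm{LLR}_{\mathrm{dec}}(b_k) = 0$ for all k
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) = 0$
        FOR i = 1 : I (Inner Iteration)
            FOR l = k : K, 1 : k − 1
                $\mathrm{LLR}_{\mathrm{mud}}(b_l) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_l^T\mathbf{r} - \boldsymbol{\beta}_l^T\mathbf{m}\right]$
                $\mathrm{LLR}_{\mathrm{pos}}(b_l) = \mathrm{LLR}_{\mathrm{dec}}(b_l) + \mathrm{LLR}_{\mathrm{mud}}(b_l)$
                $m_l = \tanh[\mathrm{LLR}_{\mathrm{pos}}(b_l)/2]$
            END
        END
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) \overset{\text{Decoding}}{\Longleftarrow} \mathrm{LLR}_{\mathrm{mud}}(b_k)$
    END
END

Flooding-Discrete-SISO
Initialization: $\mathbf{m} = \mathbf{0}$ and $\mathrm{LLR}_{\mathrm{dec}}(b_k) = 0$ for all k
FOR j = 1 : J (Outer Iteration)
    FOR i = 1 : I (Inner Iteration)
        FOR k = 1 : K
            $\mathrm{LLR}_{\mathrm{pos}}(b_k) = \mathrm{LLR}_{\mathrm{dec}}(b_k) + \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_k^T\mathbf{r} - \boldsymbol{\beta}_k^T\mathbf{m}\right]$
            $m_k = \tanh[\mathrm{LLR}_{\mathrm{pos}}(b_k)/2]$
        END
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{\mathrm{mud}}(b_k) = \mathrm{LLR}_{\mathrm{pos}}(b_k) - \mathrm{LLR}_{\mathrm{dec}}(b_k)$
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) \overset{\text{Decoding}}{\Longleftarrow} \mathrm{LLR}_{\mathrm{mud}}(b_k)$
    END
END

Hybrid-Discrete-SISO
Initialization: $\mathbf{m} = \mathbf{0}$ and $\mathrm{LLR}_{\mathrm{dec}}(b_k) = 0$ for all k
FOR j = 1 : J (Outer Iteration)
    FOR k = 1 : K
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) = 0$
        FOR i = 1 : I (Inner Iteration)
            FOR l = k : K, 1 : k − 1
                $\mathrm{LLR}_{\mathrm{mud}}(b_l) = \frac{2}{\sigma^2}\left[\boldsymbol{\eta}_l^T\mathbf{r} - \boldsymbol{\beta}_l^T\mathbf{m}\right]$
                $\mathrm{LLR}_{\mathrm{pos}}(b_l) = \mathrm{LLR}_{\mathrm{dec}}(b_l) + \mathrm{LLR}_{\mathrm{mud}}(b_l)$
                $m_l = \tanh[\mathrm{LLR}_{\mathrm{pos}}(b_l)/2]$
            END
        END
    END
    FOR k = 1 : K
        $\mathrm{LLR}_{\mathrm{dec}}(b_k) \overset{\text{Decoding}}{\Longleftarrow} \mathrm{LLR}_{\mathrm{mud}}(b_k)$
    END
END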
within the turbo MUD framework is relatively unknown. In this section, we will propose a SISO DDF multiuser detector using the VFEM principle. The subsequent discussion will allow new insights and new algorithms, including an interesting link to the discrete SISO detector discussed earlier.

Consider applying the VFEM routine to the following postulated distributions:
\[
\begin{aligned}
p(\mathbf{b}) &= \prod_{k=1}^{K} \xi_k^{\frac{1+b_k}{2}}(1 - \xi_k)^{\frac{1-b_k}{2}}, \quad b_k \in \{\pm 1\} \\
p(\bar{\mathbf{y}}|\mathbf{b}) &= \mathcal{N}(\mathbf{F}\mathbf{A}\mathbf{b}, \sigma^2\mathbf{I}) \\
Q(\mathbf{b}) &= \prod_{k=1}^{K} \gamma_k^{\frac{1+b_k}{2}}(1 - \gamma_k)^{\frac{1-b_k}{2}}, \quad b_k \in \{\pm 1\}.
\end{aligned}
\tag{4.53}
\]
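Notice that these distributions are identical to the discrete SISO case in (4.43), except that the received vector $\mathbf{r}$ is replaced by its sufficient statistic $\bar{\mathbf{y}}$ (defined in (4.3)). Therefore, we may directly make use of the derivation in Section 4.4.5, and arrive at an iterative detector similar to (4.48):
\[ \log\frac{1 + m_k}{1 - m_k} = \log\frac{1 + \tilde{b}_k}{1 - \tilde{b}_k} + \frac{2}{\sigma^2}\left[\bar{\boldsymbol{\eta}}_k^T\bar{\mathbf{y}} - \bar{\boldsymbol{\beta}}_k^T\mathbf{m}\right], \tag{4.54} \]
where $\bar{\boldsymbol{\eta}}_k$ and $\bar{\boldsymbol{\beta}}_k$ are the k-th column vectors of $\mathbf{F}\mathbf{A}$ and $\mathbf{A}^T\mathbf{F}^T\mathbf{F}\mathbf{A} - \mathrm{diag}(\mathbf{A}^T\mathbf{F}^T\mathbf{F}\mathbf{A})$, respectively. (4.54) is in fact identical to (4.48), since $\mathbf{F}^T\bar{\mathbf{y}} = \mathbf{S}^T\mathbf{r} = \mathbf{y}$ and $\mathbf{F}^T\mathbf{F} = \mathbf{S}^T\mathbf{S} = \mathbf{R}$.

Consider the uncoded scenario, i.e., $\tilde{b}_k = 0$ for $k = 1, \cdots, K$; then (4.54) reduces to
\[ m_k = \tanh\left[\frac{1}{\sigma^2}\left(\bar{\boldsymbol{\eta}}_k^T\bar{\mathbf{y}} - \bar{\boldsymbol{\beta}}_k^T\mathbf{m}\right)\right]. \tag{4.55} \]
The free energy is monotonically reduced if $m_k$ is evaluated in a SIC fashion similar to (4.49), i.e., in the i-th iteration:
\[ m_k^{(i)} = \tanh\left[\frac{1}{\sigma^2}\left(\bar{\boldsymbol{\eta}}_k^T\bar{\mathbf{y}} - \bar{\boldsymbol{\beta}}_k^T\mathbf{m}_{<k}^{(i)} - \bar{\boldsymbol{\beta}}_k^T\mathbf{m}_{>k}^{(i-1)}\right)\right]. \tag{4.56} \]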
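Now we take a crucial step that will produce the DDF SISO detector based on (4.56). We will alter the definition of $\bar{\boldsymbol{\eta}}_k$ and $\bar{\boldsymbol{\beta}}_k$, by replacing $\mathbf{F}$ with a new matrix $\mathbf{F}_k$. Let $\mathbf{F}_k$ be $\mathbf{F}$, except with elements $F_{k+1,k}$ to $F_{K,k}$ nulled, i.e.,
\[
\mathbf{F}_k =
\begin{pmatrix}
F_{1,1} & & & & & \\
F_{2,1} & \ddots & & & & \\
\vdots & & F_{k,k} & & & \\
\vdots & & 0 & F_{k+1,k+1} & & \\
\vdots & & \vdots & \vdots & \ddots & \\
F_{K,1} & \cdots & 0 & F_{K,k+1} & \cdots & F_{K,K}
\end{pmatrix}.
\tag{4.57}
\]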
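The effect of nulling column k of $\mathbf{F}$ below the diagonal can be illustrated numerically. The sketch below rests on an assumption not restated in this section: that $\mathbf{F}$ from (4.3) is a lower-triangular factor satisfying $\mathbf{F}^T\mathbf{F} = \mathbf{R}$. Such a factor is built here from a Cholesky factorization of the row- and column-reversed $\mathbf{R}$; the helper names, the dimensions and the zero-based user index `k` are hypothetical. The resulting k-th columns are compared against the closed forms (4.58) and (4.59).

```python
import numpy as np

def lower_factor(R):
    """Lower-triangular F with F.T @ F == R (Cholesky of the flipped matrix, flipped back)."""
    L = np.linalg.cholesky(R[::-1, ::-1])    # J R J = L L^T with L lower triangular
    return L.T[::-1, ::-1]                   # F = J L^T J is lower triangular

def null_below_diagonal(F, k):
    """Copy of F with entries (k+1, k) through (K-1, k) set to zero (zero-based k)."""
    F_k = F.copy()
    F_k[k + 1:, k] = 0.0
    return F_k

# Toy setup; every numeric value here is an assumption for illustration.
rng = np.random.default_rng(4)
K, k = 5, 2
S = rng.standard_normal((12, K))
S /= np.linalg.norm(S, axis=0)
R = S.T @ S
amps = np.array([1.0, 0.8, 1.2, 0.9, 1.1])
A = np.diag(amps)

F = lower_factor(R)
F_k = null_below_diagonal(F, k)

eta_bar = (F_k @ A)[:, k]                          # k-th column of F_k A
H = A @ F_k.T @ F_k @ A
beta_bar = (H - np.diag(np.diag(H)))[:, k]         # k-th column, diagonal removed

# Closed forms (4.58) and (4.59), written with zero-based indexing.
eta_expected = np.zeros(K)
eta_expected[k] = amps[k] * F[k, k]
beta_expected = np.zeros(K)
beta_expected[:k] = amps[k] * F[k, k] * amps[:k] * F[k, :k]
```

Only users with an index below k contribute to `beta_bar`, which is the decision-feedback structure: user k is detected using feedback from previously detected users only.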
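Then we let $\bar{\boldsymbol{\eta}}_k$ and $\bar{\boldsymbol{\beta}}_k$ be the k-th column vectors of $\mathbf{F}_k\mathbf{A}$ and $\mathbf{A}^T\mathbf{F}_k^T\mathbf{F}_k\mathbf{A} - \mathrm{diag}(\mathbf{A}^T\mathbf{F}_k^T\mathbf{F}_k\mathbf{A})$, respectively. Subsequently, we see that
\[ \bar{\boldsymbol{\eta}}_k = [0, \cdots, 0, A_k F_{k,k}, 0, \cdots, 0]^T \tag{4.58} \]
\[ \bar{\boldsymbol{\beta}}_k = A_k F_{k,k}[A_1 F_{k,1}, \cdots, A_{k-1}F_{k,k-1}, 0, \cdots, 0]^T, \tag{4.59} \]
and
\[
\begin{aligned}
\bar{\boldsymbol{\eta}}_k^T\bar{\mathbf{y}} &= A_k F_{k,k}\bar{y}_k \\
\bar{\boldsymbol{\beta}}_k^T\mathbf{m}_{>k} &= 0.
\end{aligned}
\]
Hence (4.56) becomes
\[ m_k = \tanh\left[\frac{1}{\sigma^2}\left(A_k F_{k,k}\bar{y}_k - \bar{\boldsymbol{\beta}}_k^T\mathbf{m}\right)\right]. \]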