Thesis for the degree of Doctor of Philosophy A dissertation submitted to the School of Computer Science and Engineering, Chalmers University of Technology, in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering
Generalized APP Detection for Communication over Unknown Time-Dispersive Waveform Channels Anders Å. Hansson
Department of Computer Engineering CHALMERS UNIVERSITY OF TECHNOLOGY Göteborg, Sweden 2003
Generalized APP Detection for Communication over Unknown Time-Dispersive Waveform Channels
ANDERS Å. HANSSON
ISBN 91-7291-250-2
Copyright © 2003 by Anders Å. Hansson. All rights reserved.
Doktorsavhandlingar vid Chalmers tekniska högskola, Ny serie nr 1932, ISSN 0346-718X
School of Computer Science and Engineering, Chalmers University of Technology, Technical report No. 10D, ISSN 1651-4971
Contact Information: Telecommunication Theory Group, Department of Computer Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden. Telephone: +46 (0)31-772 1000. Fax: +46 (0)31-772 3663. URL: http://www.ce.chalmers.se/TCT
This thesis was prepared with LaTeX and reproduced by Chalmers Reproservice from camera-ready copy supplied by the author. Göteborg, Sweden, January 2003
Generalized APP Detection for Communication over Unknown Time-Dispersive Waveform Channels ANDERS Å. HANSSON Department of Computer Engineering, Chalmers University of Technology
Abstract The principle of transmitting time-equidistant pulses to carry discrete information is fundamental. When the pulses overlap, intersymbol interference occurs. Maximum-likelihood sequence detection of such signals observed in additive white Gaussian noise (AWGN) was known in the early 1970s. Due to distortion and multipath propagation, it is less artificial to assume that the received pulse shape is unknown to the receiver designer, and in this thesis, the channel is modeled as an unknown (and time-dispersive) linear filter with AWGN. First, we discuss how the conventional optimal front-end (based on the notion of a sufficient statistic and matched filtering) is inappropriate in this context. We revisit continuous time in order to derive an equivalent vector channel, and an alternative optimality criterion is reviewed. Moreover, we present an optimal sequence detector that performs joint estimation/detection by employing the generalized maximum-likelihood technique, and it is seen how such a detector relies on an exhaustive tree search. Pruning the optimal search tree leads to a suboptimal complexity-constrained algorithm, where only a subset of all sequences is evaluated as candidates. These elementary ideas are subsequently extended to the case of blind (or semiblind) soft decision detection, which also incorporates the concept of bi-directional estimation. The soft decisions are generated in the form of approximate a posteriori probabilities (APPs), and their soundness is evaluated by considering iterative detection of interleaved serially concatenated codes. Keywords: Frequency-selective fading, intersymbol interference, channel estimation, adaptive estimation, recursive least-squares, per-survivor processing, blind (unsupervised, self-recovering) detection (deconvolution, equalization), symbol-by-symbol MAP detection, iterative detection, turbo codes, serially concatenated codes.
List of Publications This thesis is partly based on the listed publications. • A. Hansson and T. Aulin, “Iterative diversity detection for correlated continuous-time Rayleigh fading channels,” to appear in IEEE Transactions on Communications, Jan. 2003. • A. Hansson and T. Aulin, “On the discretization of unsupervised digital communication over time-dispersive channels,” in Proc. IEEE International Symposium on Information Theory, Lausanne, Switzerland, June/July 2002, p. 270. • A. Hansson and T. Aulin, “Unsupervised detection over time-dispersive vector channels,” in Proc. Radio Science and Communication, Stockholm, Sweden, June 2002, pp. 338–341. • A. Hansson, K. M. Chugg, and T. Aulin, “On forward-adaptive versus forward/backward-adaptive SISO algorithms for Rayleigh fading channels,” IEEE Communications Letters, vol. 5, pp. 477–479, Dec. 2001. • A. Hansson and T. Aulin, “Iterative array detection of CPM over continuous-time Rayleigh fading channels,” in Proc. IEEE International Conference on Communications, Helsinki, Finland, June 2001, pp. 2221–2225. • A. Hansson, T. Aulin, and K. M. Chugg, “An APP algorithm for fading channels using forward-only prediction,” in Proc. PCC Workshop / Nordic Radio Symposium, Nynäshamn, Sweden, Apr. 2001, pp. 85–89. • A. Hansson, K. M. Chugg, and T. Aulin, “A forward-backward algorithm for fading channels using forward-only estimation,” Technical Report CSI-00-11-02, Communication Sciences Institute, University of Southern California, Los Angeles, CA, Nov. 2000. • A. Hansson and T. Aulin, “On multi-antenna receiver principles for correlated Rayleigh fading channels,” in Proc. IEEE International Symposium on Information Theory, Sorrento, Italy, June 2000, p. 495.
• A. Hansson and T. Aulin, “On antenna array receiver principles for space–time-selective Rayleigh fading channels,” IEEE Transactions on Communications, vol. 48, pp. 648–657, Apr. 2000. • A. Hansson, “Detection principles for fast Rayleigh fading channels using an antenna array,” Technical Report No. 343L, Chalmers University of Technology, Göteborg, Sweden, Apr. 2000. • A. Hansson and T. Aulin, “Antenna array receiver principles for correlated Rayleigh channels,” in Proc. PCC Workshop, Lund, Sweden, Nov. 1999, pp. 88–93. • A. Hansson and T. Aulin, “Generation of N correlated Rayleigh fading processes for the simulation of space-time-selective radio channels,” in Proc. European Wireless ’99 / 4th ITG Conference on Mobile Communications, Munich, Germany, Oct. 1999, pp. 269–272. • A. Hansson and T. Aulin, “Simulation of N correlated Rayleigh fading diversity links,” in Proc. Radio Science and Communication, Karlskrona, Sweden, June 1999, pp. 567–570. • A. Hansson and T. Aulin, “Communication through a space-time selective time continuous vector Rayleigh fading channel,” in Proc. PCC Workshop, Stockholm, Sweden, Nov. 1998, pp. 85–86.
To the memory of my father, who stimulated my interest in science
Contents

Preface ix

Acknowledgments xi

1 Introduction 1
1.1 Communication over Unknown Channels 1
1.1.1 Research Approaches 4
1.2 Problem Definition 5
1.3 Organization of the Thesis 7

2 System Model and the Generalized Maximum-Likelihood Test 11
2.1 Discretization of Known Waveform Channels 12
2.2 The Physics of Multipath Channels 14
2.2.1 The Tapped Delay Line Channel Model 20
2.2.2 The Received Waveform Model 21
2.3 An Equivalent Vector Channel 22
2.3.1 The Equivalent Vector Channel in Matrix Form 28
2.4 Discussion on Front-End Optimality 29
2.4.1 The Short-Time Fourier Transform 31
2.4.2 Prolate Spheroidal Wave Functions 34
2.4.3 Fractionally-Spaced Sampling 35
2.5 Modulation Format 37
2.5.1 Decomposition of Binary CPM Signals into PAM Waveforms 38
2.6 Example Channels 41
2.7 Generalized Maximum-Likelihood Detection of Short Blocks 48
2.7.1 Numerical Results and Discussion 54

3 Hard Decision Tree Search Using Per-Survivor Processing 57
3.1 A Forward Recursion for the LS Metric 58
3.1.1 Initialization of the Forward Recursion 61
3.2 A Search Strategy Based on the Viterbi Operation and Per-Survivor Processing 64
3.3 Numerical Results and Discussion 68
3.4 A Backward Recursion for the Time-Reversed LS Metric 74
3.4.1 Initialization of the Backward Recursion 75

4 The Forward-Backward Algorithm in the Presence of Uncertainty 79
4.1 A Renaissance for Iterative Detection 80
4.2 The Forward-Backward Algorithm Using Forward-Only Estimation 82
4.2.1 Truncation or Finite Path Memory 88
4.3 The Forward-Backward Algorithm Using Bi-directional Estimation 93

5 Generalized APP Detection 97
5.1 The Completion Phase 99
5.1.1 Completion Without the Explicit Need for a Completion Term 105
5.2 Description of an Example System 106
5.3 Numerical Results 114

6 Conclusions and Suggestions for Future Work 127

A Details of the Metric Recursion 131

B Details of the Metric Partitioning 137

References 143
Preface

“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where. . . ” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“. . . so long as I get somewhere,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”
Lewis Carroll, Alice’s Adventures in Wonderland
The conversation between Alice and the Cheshire-Cat well describes the meandering path (full of dead ends, sharp turns, and hurdles) that one has to tread when doing research. It is now almost five years since my journey through Wonderland began, as I joined the Telecommunication Theory Group at Chalmers (I still remember how I called from Kawagoe Prince Hotel in suburban Tokyo to ask the Head of the Department for a slight extension of the application deadline). The present thesis, however, is the result of research I started some two years back, while I spent the 2000–2001 academic year as a Visiting Scholar in the Communication Sciences Institute at the University of Southern California.

During the past five years people sometimes asked me, “Isn’t it difficult?”, and I usually answered with Ralph Waldo Emerson’s sharp tongue, “Thinking is the hardest work in the world. That’s why so few of us do it.” Saying so, Emerson meant original thinking or research, in contrast to idle thinking or daydreaming. The wannabe scientist who does not love research will not withstand the moments of doubt and the empty weeks (or months) when ideas do not come, or come only to prove false. Now, what words of wisdom can I give to a new innocent Alice who is considering entering into Wonderland? I believe my luck is that I am stupid enough
to always think that the idea I have at the moment is going to work out. This saves me a lot of anguish, but more importantly, by the time I realize that it does not work, it has led to another idea, which of course is going to work. At the end of this iterative process, there is indeed something that works (which generally has nothing to do with the original idea—but who cares). Thus, my advice to a fresh Alice is that she must generate enough enthusiasm to persuade herself that it will work, because it will—at some point. I hope no Alice was intimidated by the passage above, because being a doctoral student is highly stimulating. Personally, I have gained insight and experience not only in digital communication theory and mathematics, but also in vastly different disciplines and activities, such as technical writing, the English language, teaching, oral presentation, programming, socializing at conferences, speeding the great freeways of Los Angeles, and late-night tea drinking. Only one word remains to be said to the lucky souls who get a similar chance to escape reality for five years: ENJOY!
Anders Hansson
Göteborg, January 2003
Acknowledgments

Did we make a difference?
Captain James Kirk, Star Trek
There are many people who have contributed in different ways to this thesis and I could easily have written several pages about the importance of the contribution from every one of them. I will, however, attempt the impossible task of restricting myself to one sentence per person.

This promise has to be broken at once; Professor Tor Aulin at Chalmers has been an eminent advisor ever since I first started my doctoral studies. He is gratefully acknowledged for knowing everything, or at least for knowing where to find it, which is just as good. It has been a pleasure all the way!

I owe very special thanks to Professor Keith M. Chugg at the University of Southern California for inviting me to be a Visiting Scholar in the Communication Sciences Institute. His ability to hit upon interesting ideas is remarkable.

Now, let me obey the one-sentence-per-person principle that I mentioned above; my heartfelt gratitude goes to (in no particular order). . .

. . . Dr. Jocelyn Chow at Chalmers for valuable advice,

Dr. Lars Rasmussen, professor at the University of South Australia, for numerous pep talks,

Dr. Gianluigi Ferrari from the University of Parma for putting up with me as a flatmate in LA (and for providing parmigiano),
Professor Mats Viberg, head of Chalmers’ signal processing group, for input,

Dr. Achilleas Anastasopoulos at the University of Michigan for helpful correspondence,

Dr. Stig-Göran Larsson at Chalmers for your commitment to the courses where I served as a teaching assistant,

Pär Moqvist, Zihuai Lin, Dhammika Bokolamulla, Ming Xiao, Anders Nilsson, Elisabeth Uhlemann, Fredrik Brännström, and Peng Hui Tan at Chalmers for being good friends,

Dr. Per Ödling and Dr. Jossy Sayir for inviting me to the Telecommunications Research Center in lovely Vienna,

Professor Emeritus Irving S. Reed of the Communication Sciences Institute, University of Southern California, for memorable chats,

Professor Giovanni Corazza from the University of Bologna for lunch discussions on life, the universe, and everything (while at USC),

Kyu-Hyuk Chung, Bob Weaver, Robert Wilson, Phunsak Thiennviboon, Yuankai Wang, Ali Taha, Durai Thirupathi, Jun Heo, Robert Golshan, Carlos Corrada-Bravo, Orhan Coskun, Mingrui Zhu, Jun Yang, and Guang-Cai Zhou, (former) graduate students at USC, for many unforgettable moments in the Big Orange (not to mention the chaotic trip to Globecom 2000 in San Francisco),

colleagues in The Personal Computing and Communication Research Program for broadening my knowledge,

Dr. Ulf Hansson at Ericsson for his genuine interest in my work and progress,

my friend Dr. Ulric Ljungblad at Micronic for persuading me to choose the path of science (which he did in 1997 at a hamburger bar in Täby),

my mother and my brother for their patience, understanding, and encouragement,
The Swedish Foundation for Strategic Research for funding my work by the grant PCC-9706-01,

The Sweden-America Foundation, Ericsson, The Swedish Foundation for International Cooperation in Research and Higher Education, Alice and Lars Erik Landahl’s Foundation, and Chalmers for granting scholarships during my academic year 2000–2001 at the University of Southern California,

Donald Knuth and Leslie Lamport for creating LaTeX (in lieu of conventional thanks to a typist for helping to prepare the manuscript), and

others, who know why.
Chapter 1
Introduction

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
Albert Einstein, when asked to explain radio
This work falls within the scope of point-to-point digital communications, although there are some connections to other aspects of information and signal processing, such as exploratory seismology, machine learning, data mining, and econometrics.
1.1 Communication over Unknown Channels
Figure 1.1 shows a block diagram of the sort of communication system that will be treated herein. It is assumed that the source of information is discrete by nature, and that it produces data with no redundancy [86]. The resulting data stream is first fed to a transmitter, which maps the data into a waveform sequence. In detail, one could divide the transmission into channel encoding followed by modulation, where the combined aim is to provide reliability. The channel represents the connection between the transmitter and the receiver. Common physical channels are twisted-pair wire lines, coaxial cables, optical fibers, or free space that serves as propagation medium for electromagnetic energy. Different physical channels possess different characteristics.
[Figure 1.1: Functional block diagram of a digital communication system (discrete source, transmitter, waveform channel, receiver).]

This thesis is devoted to linear filter channels with thermal noise [131]. The channel in Figure 1.1 is then modeled as in Figure 1.2. Thermal noise, which is encountered in all electrical circuits, is shown as an additive disturbance, while other characteristics of the transmission medium are represented by a linear filter. There are several motives for this construct, one being the intractability of nonlinear problems; only scarce theories exist for digital transmission over nonlinear channels [16]. Moreover, a linear filter model captures the most important characteristics of the great majority of the physical channels that we face in practice [131]. A third reason is offered by carefully designing the modulator in such a way that the overall system becomes less sensitive to nonlinearities. For example, the constant envelope of continuous phase modulation (CPM) makes a CPM signal immune to nonlinearities in the transmitter amplifier (a CPM signal carries its information in the phase) [7].
[Figure 1.2: The linear filter channel with additive noise (the waveform channel modeled as a linear filter whose output is corrupted by an additive noise term).]

The channel introduces unwanted distortions of the transmitted waveform sequence. If these effects are perfectly known to the receiver, there is no communication problem. However, the additive noise is adequately modeled as a white random Gaussian process [163], and we will only assume knowledge of its mean and correlation functions. Let us now turn our attention to the linear filter. Keeping our discussion on an introductory level, we can identify a few different types of problems. First, we could make the artificial assumption that the receiver designer
has perfect knowledge of the impulse response of the linear filter (exclusive of noise). Forney derived maximum-likelihood sequence detection (MLSD) for this problem (assuming linear modulation and an impulse response with finite time support) [55]. See also the work by Ungerboeck [159]. Second, we could adopt a parametric description of the overall waveform channel. Such an approach has, for example, resulted in successful detector structures in the case of statistically parameterized (continuous-time, frequency-flat) Rayleigh fading channels [79, 81, 71, 78]. In practice, it would then be necessary to feed the detector with estimated channel parameters, which implies the segregated receiver design illustrated in Figure 1.3. Note that this segregated design does not minimize the probability of making erroneous decisions on the data—it is merely used for conceptual simplicity. Also, (deterministic) basis expansion models, in which the channel impulse response is approximated by, for example, a weighted sum of complex exponentials, have recently gained popularity [124, 66, 140]. The prior knowledge encapsulated in the model allows one to design adaptive receivers for rapidly time-varying communication channels. A third type of problem arises when also statistical and structural knowledge of the (noiseless) channel impulse response is lacking. We are then forced to choose a nonparametric approach, and such an approach will be the focus of this thesis.
[Figure 1.3: The traditional segregated receiver design (a detector fed by a separate channel estimator).]
1.1.1 Research Approaches
A common approach to communication over unknown channels is to first transmit a block comprising known data symbols—what is called a training sequence or a pilot sequence—prior to the information carrying data stream.1 The known data symbols are used in the receiver to estimate the channel by studying the statistics of the corresponding observations. The subsequent detection phase is then typically based on this initial channel estimate. There are some applications where it is desirable for the receiver to detect the data without having access to a known training sequence. Techniques for data detection without the benefit of training are said to be blind (or unsupervised, self-recovering). Such schemes are motivated by the need for an efficient use of the (limited) frequency spectrum: in multipoint communication networks, throughput increases and management overhead drops, and in digital broadcasting it is possible to obviate interruptions due to training of new users entering the cell. There are also channels for which training is not efficient due to a rapidly time-varying nature, and scenarios where training is simply not accessible, such as communications intelligence. This thesis treats blind detection (also called blind deconvolution or blind equalization),2 or more specifically, semiblind detection, where both a small (or moderate) number of training symbols and unknown data symbols are used for (implicit) channel estimation.

To give a detailed overview of blind detection is difficult due to the plethora of material that has been written on this topic since Lucky pioneered the field in the mid 1960s [109]; most standard textbooks on communication theory include a chapter on equalization (in which blind equalization is treated) [131, 16]. A tutorial paper from 1985 by Qureshi is often referenced [132], while a recent book edited by Haykin intends to cover state-of-the-art advancements [88]. Some order is established by classifying the existing blind algorithms into four categories.3 First, there are ad hoc algorithms based on the (I) sequential Monte Carlo methodology [45]. Another class of algorithms is of (II) Bussgang type, and they recursively perform a stochastic gradient descent by employing a memoryless nonlinearity to adjust a finite-duration impulse response (FIR) filter [88]. Lucky’s decision-directed algorithm [109], novel work by Sato [138], and the Godard/constant-modulus algorithm [67] are all instances of this second class of algorithms. Further work along this line is referenced in Proakis’ bibliographical notes on adaptive equalization [131]. A third family of algorithms explicitly uses (III) higher-order statistics of the received signal to estimate the channel characteristics [85, 154]. Bussgang algorithms are related to control theory, while methods based on higher-order statistics stem from signal processing. Just like class (I), both (II) and (III) are ad hoc, but there is also a fourth class of algorithms that is rigorously based on concepts rooted in (IV) information and communication theory. Information theorists are supposedly advocates of a detection rule that results in maximum mutual information (MMI) (or maximum Kullback-Leibler divergence). This idea is exploited in independent component analysis [36, 15, 3], which comes from neural network theory and computer science. We shall, however, take a classical detection approach based on the standard maximum-likelihood (ML) criterion. Algorithms of this type have been devised by, for example, Ghosh and Weber [65], Seshadri [142], and Zervas et al. [166]. It is noteworthy that the two starting points (MMI and ML) are equivalent [24].

1 The transmitter usually lacks knowledge of the channel impulse response. Hence, the transmitter cannot convey information about the channel to the receiver.
2 The words “deconvolution” and “equalization” are often used interchangeably, but sometimes the latter term is restricted to problems involving discrete-amplitude pulses.
3 The classes (II)–(IV) are also identified in [131] and [88].
1.2 Problem Definition
It can be interesting to formulate our problem in general terms: To design an intelligent system in the presence of uncertainty, where the imprecise term “intelligent system” can be defined as a system making decisions according to a well-defined set of rules in order to achieve some objective (such as minimizing a cost function). With this general picture in mind, the aforementioned connection to other aspects of information processing becomes obvious; it should hardly be surprising to learn in Chapter 4 that one of the recent (and most promising) detection principles in communication theory has its counterpart algorithm in computer science. Now, let us narrow down our research goal a bit. The receiver will be pictured as a black box, and our aim is to develop receiver algorithms
that perform estimation and detection jointly, i.e., estimation should be an integral part of detection. We will hence attack a composite detection problem, in which both the data and the channel are unknown. Our research approach is to employ a generalized ML concept. It should also be mentioned that there have been attempts to design codes tailored to the problem of joint estimation/detection [145, 37]. We will not consider such a strategy. Instead, we intend to focus on detection algorithms associated with standard trellis codes.

In 1993, Berrou, Glavieux, and Thitimajshima revolutionized digital communications by introducing a new type of codes, which they named turbo codes, and iterative decoding was suggested as a practical means of decoding these codes [21]. In the wake of the tremendous success of turbo codes, similar techniques have been proposed for known linear filter channels under the generic term “turbo equalization” [47, 100, 151]. In this thesis, we also intend to derive an algorithm suited for turbo coding/decoding—with the special focus on unknown waveform channels. It should be stressed that iterative detection relies on soft information (also called soft decisions, reliability values, beliefs), and these quantities are most often a priori probabilities, a posteriori probabilities (APPs), or scaled versions thereof. For complexity reasons, it is not always reasonable to compute exact APPs, and one could then consider approximation methods, leading to approximate APPs. As long as these approximate quantities are normalized to probabilities, they can still be referred to as soft decisions. We are now ready to formulate our main objective in more precise terms; we set out to derive a joint estimation/detection algorithm that delivers soft decisions on symbols transmitted over an unknown time-dispersive waveform channel.

Let us briefly review some related work.4 In a first attempt, Garcia-Frias and Villasenor modeled the unknown nuisance parameter as a Markov chain with a finite number of states [62]. The optimal soft decision algorithm can then operate in an augmented trellis. Of more interest is the case of continuous parameters, and Garcia-Frias and Villasenor later addressed also this problem [63]. The solution in [63] is based on the Baum-Welch algorithm [133], or equivalently the expectation-maximization (EM) algorithm [42]. Since convergence to a local optimum is possible, the optimality of the EM algorithm cannot be guaranteed, and this approach requires accurate initialization. Moreover, the simulation results in [63] indicated that fairly many iterations are needed to obtain good performance, but for complexity reasons, it is desirable to keep down the number of iterations. There is also another suboptimal approach, which relies on fixed lag processing, i.e., only a few symbol intervals are taken into account when estimating the unknown parameters, see for example [83, 40].

We will instead explore a number of ideas first suggested by Anastasopoulos and Chugg for communication over a (first-order) Gauss-Markov channel [5]. Anastasopoulos has also investigated similar techniques for communication over a deterministic parameter model [4]. Just like Iltis et al. [90], Anastasopoulos and Chugg initially presented an optimal detector that consists of a bank of processing units in which each unit maintains a parameter estimate conditioned on one of the hypothesized candidate sequences [5]. This approach differs from, for example, work by Baccarelli and Cusani, who proposed a sub-optimal receiver structure based on a single (global) channel estimator [9]. In order to reduce the complexity, Anastasopoulos and Chugg later suggested to prune the tree of candidate sequences [5]. Inevitably, there is a risk that one loses the most likely sequence(s) due to the pruning process, and to mitigate the effect of such a loss, Anastasopoulos and Chugg proposed bi-directional estimation (more about this later on). This is their most significant contribution, which we hope to benefit from. Our results can be viewed as generalizations and extensions of those in [5] and [4]. What partly makes our exposition unique is that we start with a nonparametric continuous-time model. As a direct consequence, we face a difficult discretization problem, because, as will be explained in Chapter 2, the conventional matched filtering approach is not appropriate.

4 Our literature survey is by no means complete. Nevertheless, each selected article reflects a different approach to our problem.
1.3 Organization of the Thesis
This thesis is organized as follows.

Chapter 2: System Model and the Generalized Maximum-Likelihood Test

Chapter 2 presents a brief introduction to the physics of time-dispersive radio channels. It is explained how the standard discrete-time model for
known time-dispersive channels cannot be taken as a starting point for obtaining an adequate model for unknown channels. Instead, continuous time is revisited in order to derive an equivalent vector channel. The chapter also treats various system aspects, such as the choice of modulation scheme. It is seen how channel modeling and system design are closely intertwined when dealing with unknown channels. Moreover, a somewhat peculiar discussion on front-end optimality is included in this chapter. The conventional optimality criterion (based on the notion of a sufficient statistic) is not appropriate for finite complexity detection, and an alternative optimality criterion is discussed. Rather than delving into details, the discussion is kept at a high level; we will merely point out the relevance of related theories and their bearing on our problem. We still believe the reader can benefit from the presented ideas, and hopefully they can spur new ideas. Finally, the concept of generalized ML detection, which plays a central role in this thesis, is explained in Chapter 2. We exemplify this principle by considering detection of short blocks, and conclude that generalized MLSD (GMLSD) implies an exhaustive tree search.

Chapter 3: Hard Decision Tree Search Using Per-Survivor Processing

A suboptimal hard decision algorithm is proposed. In short, this suboptimal algorithm is based on tree pruning, which means that only a subset of all possible sequences is evaluated as candidates. The selection of survivor sequences is governed by the add-compare-select operation of the Viterbi algorithm. We will also explain how the metric (or score) of each survivor sequence may be calculated by means of the recursive least-squares algorithm and per-survivor processing. The search algorithm is initialized with a short preamble of known pilot symbols. As a key result, we observe that a small (or moderate) number of pilot symbols is not enough to guarantee a good starting estimate of the unknown channel parameter vector. In consequence, the risk of losing a whole frame cannot be neglected, and bi-directional estimation is mentioned as a potential remedy. Bi-directional estimation combines an estimate obtained from a forward-directed search with an estimate obtained from a time-reversed search, i.e., a backward-directed search. Since bi-directional estimation will be applied in subsequent chapters, Chapter 3 also presents a backward-directed version of the search algorithm.

Chapter 4: The Forward-Backward Algorithm in the Presence of Uncertainty

As stated above, our main objective is to derive a soft decision algorithm. In Chapter 4, we discuss the general problem of soft decision detection for channels with memory, and we note that the standard forward-backward algorithm is also capable of computing APPs in the presence of finite memory. The presented discussion provides a background to bi-directional estimation.

Chapter 5: Generalized APP Detection

In this chapter, different concepts from the previous chapters are combined in a final algorithm. In particular, we give a detailed description of the completion steps of forward and backward survivor sequences. Numerical results are also presented to illustrate the performance of the algorithm.

Chapter 6: Conclusions and Suggestions for Future Work

In Chapter 6, we recapitulate the main results and discuss their applicability to related problems, such as the complexity reduction problem of soft decision decoding. A brief outlook is also presented.
Chapter 2
System Model and the Generalized Maximum-Likelihood Test

No problem is so formidable that you can’t walk away from it.
Charles M. Schulz
Mainly, this chapter is devoted to the modeling issue of unknown time-dispersive waveform channels, and an equivalent vector channel is derived.1 Various system aspects are also treated, such as the receiver front-end and the modulation format. We will see that channel modeling and system design are closely intertwined when dealing with unknown channels. Finally, the generalized maximum-likelihood criterion is explained.

Let us first put the term “modeling” into perspective. By a scientific model is meant a mathematical construct (with certain verbal interpretations). The justification of such a construct is solely that it is expected to describe phenomena observed in the real world. Simple models are favorable due to their mathematical tractability, but simplicity often prevents a model from being precise. The trade-off between simplicity and fidelity has encouraged multiple models to evolve. On the other hand, the research community has a strong reason for adopting a rather limited number of models, since a unified framework facilitates comparison between different results. A universal model further allows the individual researcher to devote all his/her energy to theoretical investigations within the common framework, and the need to re-establish the very basics is eliminated. It is of paramount importance, however, that every researcher fully understands the derivation of the model and its underlying assumptions. One should also bear in mind that the model defines the problem and indirectly implies a solution (or a class of solutions). Thus, modeling is concluded to be crucial. Unfortunately it seems like the importance of modeling is sometimes overlooked, perhaps due to the existence of a few predominant models. A universally accepted model runs the risk of becoming so ingrained in the field that researchers tend to modify this model even when the underlying assumptions are violated—instead of deriving a new model that matches the new assumptions. As will be explained below, this has sometimes happened to the problem of communication over unknown time-dispersive waveform channels.

1 Parts of this chapter have been published in [73].
2.1 Discretization of Known Waveform Channels
A maximum a posteriori probability computer is known to achieve minimum probability of detection error, according to the classical Bayesian detection theory [155]. Such a device declares the data hypothesis with largest a posteriori probability to be the transmitted data. This detection rule thus involves the computation of likelihood functionals for all hypotheses, where the core is to first find a sufficient statistic, i.e., a set of observation variables that is sufficient for making an optimal decision [155]. Statistical sufficiency is based on two well-known theorems: the theorem of reversibility (TR) and the theorem of irrelevance (TI) [163].

Let us first consider the additive white Gaussian noise (AWGN) channel, where the noise is assumed to be independent of the transmitted data. The TR requires the mapping (from the continuous-time received signal) that defines the sufficient statistic to be one-to-one (otherwise information is irretrievably lost), while the TI implies that it is not necessary to satisfy the TR with respect to the noise (since the noise is independent of the data). For a modulation scheme with at most P distinct waveforms, we could then use the Gram-Schmidt orthogonalization process to obtain a set of N ≤ P basis functions spanning the P different pulses [163]. The Gram-Schmidt procedure is parsimonious in the sense that the obtained basis is not guaranteed to span an arbitrary function that does not belong to the set of possible modulation pulses. Projection into the N-dimensional space spanned by the N basis functions yields a minimal sufficient statistic comprising only N observables. Two common realizations of this projection are known as the correlation receiver and the matched filter receiver [163].

Forney showed that a similar front-end structure attains statistical sufficiency also for digital signaling corrupted by known finite-length intersymbol interference (ISI) and AWGN [55]. If the channel impulse response is a known (deterministic) function, say c(t), it is clear that a finite set of noiseless channel outputs can be calculated, and this set can then be used to once again derive a basis by means of the Gram-Schmidt procedure. The resulting front-end is a bank of matched filters,2 whose outputs are sampled at the symbol rate $1/T_s$. One inconvenience of this solution is that it leads to colored noise samples (due to the temporal overlap). Hence, Forney suggested the use of a (discrete-time) whitening post-filter, and his proposed front-end structure is called a whitened matched filter (WMF) [55]. Compare the work by Ungerboeck [159], in which the detection algorithm operates directly on the matched filter output without whitening the noise.

For simplicity, consider a system using pulse amplitude modulation (PAM) [131]. Denote the modulation pulse by g(t), and the noiseless channel response to this pulse by $h(t) = g * c(t)$. In mathematical notation, the WMF output $r_k$ in symbol interval k due to a data stream $\{d_k\}$ is then

$$r_k = \sum_{\ell=0}^{L-1} d_{k-\ell}\, h_\ell + w_k, \qquad (2.1)$$

where $\{h_\ell\}_{\ell=0}^{L-1}$ is a set of known coefficients, and $w_k$ is a white noise sample [55]. The number of terms $L = L_g + L_c$ depends on the support of the modulation pulse $[0, L_g T_s)$ and the support of the channel impulse response $[0, L_c T_s)$.

2 The standard implementation is in the form of a channel matched filter followed by a bank of signal matched filters.
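As a quick numerical illustration (not taken from the thesis), the discrete-time model (2.1) is just a convolution plus noise; the coefficient set and noise level below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

h = np.array([0.9, 0.5, -0.3])           # illustrative coefficients h_0..h_{L-1}
d = rng.choice([-1.0, 1.0], size=100)    # binary antipodal data stream {d_k}

N0 = 0.1                                 # variance of the whitened noise samples
w = np.sqrt(N0) * rng.standard_normal(d.size)

# WMF output (2.1): r_k = sum_l d_{k-l} h_l + w_k, i.e., a discrete convolution.
r = np.convolve(d, h)[:d.size] + w
```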
The numbers $\{h_\ell\}$ are often called channel coefficients or channel parameters. It should be noted that their relations to the channel impulse response c(t) are somewhat involved; the coefficients are related to the sampled autocorrelation function of the channel response $h(t) = g * c(t)$ [55],

$$\int_{-\infty}^{\infty} h(\tau)\, h^*(\tau - kT_s)\, d\tau = \sum_{\ell=0}^{L-1} h_\ell\, h^*_{\ell-k}. \qquad (2.2)$$

The set $\{h_\ell\}$ is conveniently calculated by means of a spectral factorization in the z-transform domain [55].

The model in (2.1) has become so ingrained in the study of known ISI channels that it has sometimes been adopted in the case of unknown ISI channels, then with unknown channel parameters [131, 142]. Clearly, if the channel is unknown, the WMF cannot be identified,3 and hence, equation (2.1) no longer corresponds to observables coming from an optimal front-end. We must revisit continuous time in order to develop an adequate model.

3 More specifically, a matched filter (bank) cannot be derived when the channel impulse response is unknown.
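To make the computation of $\{h_\ell\}$ concrete, the following sketch performs the spectral factorization numerically for an invented real-valued example (it is not code from [55] or from this thesis; the factor is recovered only up to a sign, and a minimum-phase solution is selected):

```python
import numpy as np

# Invented example: generate the sampled autocorrelation sequence of (2.2)
# from a known minimum-phase h, then recover h from it.
h_true = np.array([1.0, 0.6, 0.2])
L = h_true.size
R = np.correlate(h_true, h_true, mode='full')   # R_k for k = -(L-1), ..., L-1

# z^{L-1} R(z) is a polynomial whose roots come in pairs (rho, 1/rho).
# Keeping the L-1 roots inside the unit circle yields the minimum-phase factor.
roots = np.roots(R)
inside = roots[np.abs(roots) < 1.0]
h_hat = np.poly(inside)                          # monic minimum-phase polynomial

# Rescale so that the tap energy matches R_0, cf. the k = 0 case of (2.2).
h_hat *= np.sqrt(R[L - 1] / np.sum(h_hat**2))    # recovers [1.0, 0.6, 0.2]
```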
2.2 The Physics of Multipath Channels
Consider a data block $\mathbf{d}_0^{K-1} = [d_0, d_1, \ldots, d_{K-1}]$ that consists of (possibly complex-valued) data symbols $d_k$ drawn from a Q-ary alphabet. Assume that the possible data symbols at each time index k are associated with equal a priori probabilities. The (finite) block length K may be arbitrarily large. More specifically, let us confine our exposition to digital modulation waveforms that can be expressed as a sum of modulation pulses g equidistantly separated in time,

$$s(t; \mathbf{d}_0^{K-1}) = \sum_{k=0}^{K-1} g(t - kT_s; d_k, S_k), \qquad (2.3)$$

where $1/T_s$ is the rate with which symbols are being transmitted, and $S_k$ represents an internal state for modulation formats with memory. Equation (2.3) is general in the sense that the shape of the pulse g may vary from one time epoch to another, because g depends on both $d_k$ and $S_k$.
We will assume, however, that there is a finite set of possible states $\{S_k\}$, which implies that g(t) is selected from a finite set of waveforms, say $\mathcal{G} = \{g_0(t), g_1(t), \ldots, g_{\Xi-1}(t)\}$. The burden of mathematical notation is somewhat alleviated by introducing the state dependent data symbol

$$b_k = b_k(d_k, S_k), \qquad (2.4)$$

where the notation is meant to convey that the symbol $b_k$ depends on $d_k$ and $S_k$—although we do not explicitly write out its arguments. Modulation with memory could then be viewed as the concatenation of a finite state machine and a memoryless mapper, confer Rimoldi’s description of CPM [136].

After modulating the data stream $\mathbf{d}_0^{K-1}$ into a (complex) equivalent baseband signal $s(t; \mathbf{d}_0^{K-1})$, as was described by (2.3), we subsequently heterodyne (frequency translate) the signal up into the passband. The transmitted signal is written as

$$s_c(t; \mathbf{d}_0^{K-1}) = s(t; \mathbf{d}_0^{K-1})\, \sqrt{2}\, e^{j\omega_c t}, \qquad (2.5)$$

where $\omega_c$ is the carrier angular frequency. The normalizing factor $\sqrt{2}$ is included to maintain the energy of $s_c(t; \mathbf{d}_0^{K-1})$ equal to that of $s(t; \mathbf{d}_0^{K-1})$ [141].

Now, channel modeling is the problem of mathematically expressing the waveform that appears at the receiver when the passband signal $s_c(t; \mathbf{d}_0^{K-1})$ is transmitted. This requires a thorough understanding of the physical characteristics of radio channels, since a reliable model should ultimately be based on empirical observations rather than mathematical axioms. Several hundreds (or presumably thousands) of papers have been written on the characterization of radio propagation channels. A standard work on land-mobile radio channels is edited by Jakes [91], while Parsons offers a more recent treatise [128]. Another starting point for acquiring knowledge is a tutorial paper on indoor radio propagation channels written by Hashemi, which includes a list of more than 250 useful references [84].

A transmitted radio wave interacts with various physical objects within the wireless channel. The number of objects, as well as the coordinates and properties of these objects, define a certain propagation environment. Note that this structure might be time-varying as a consequence of moving objects and/or a moving receiver (and/or a moving transmitter). The
interaction of the wave with the surrounding objects is a highly complex process that includes diffraction, refraction, and reflection. As a result, a transmitted wave reaches the receiver through multiple propagation paths, where each path is associated with an attenuation factor, a time delay, a Doppler shift, and an angle of arrival. The Doppler shift is an apparent shift in frequency caused by motion of the receiver relative to the frame of reference of a scattering object [127],4

$$\omega_{c,m} = \omega_c\, \frac{1 + v/c_0 \cos\varphi_m}{1 - v^2/c_0^2} \approx \omega_c\, (1 + v/c_0 \cos\varphi_m), \qquad (2.6)$$

where v is the speed of the receiver, $c_0$ is the speed of electromagnetic waves (which is the speed of light), and $\varphi_m$ is the direction angle of the mth scatterer with respect to the receiver velocity vector. The approximation in (2.6) is accurate for practical communication systems. Moreover, for notational brevity it is convenient to use the notion of a maximum Doppler angular frequency shift $\omega_d$,

$$\omega_{c,m} = \omega_c + \eta\, v \cos\varphi_m = \omega_c + \omega_d \cos\varphi_m. \qquad (2.7)$$

The symbol $\eta$ in (2.7) stands for the circular wave number, which is inversely proportional to the carrier wavelength $\lambda_c$,

$$\eta = 2\pi/\lambda_c. \qquad (2.8)$$
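To fix ideas, here is a small numerical check of (2.6)–(2.8); the carrier frequency and receiver speed are invented for illustration and are not taken from the thesis:

```python
import numpy as np

c0 = 2.998e8                 # speed of light [m/s]
fc = 900e6                   # illustrative carrier frequency [Hz]
v = 30.0                     # illustrative receiver speed [m/s]

lam_c = c0 / fc              # carrier wavelength
eta = 2 * np.pi / lam_c      # circular wave number, (2.8)
f_d = eta * v / (2 * np.pi)  # maximum Doppler shift, about 90 Hz here

# Shift experienced from a scatterer at direction angle phi_m, cf. (2.7):
phi_m = np.deg2rad(60.0)
f_shift = f_d * np.cos(phi_m)   # about 45 Hz
```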
All of the just mentioned parameters, such as the range of time delays, are somewhat different for different types of propagation environments and for different frequency bands. A claim on realism seems possible only if fairly strict delimitations are made concerning the environment and the band. However, what is accepted as a general characterization of radio propagation channels is the existence of multiple propagation paths, and this explains why someone has coined the term multipath channels. A neat expression of the received passband signal, subject to multipath propagation but exclusive of additive thermal noise, is in the form of a sum of M attenuated, time delayed, as well as frequency shifted replicas of the transmitted wave,

$$Y_c(t) = \sum_{m=0}^{M-1} A_m\, s(t - t_m; \mathbf{d}_0^{K-1})\, \sqrt{2}\, e^{j\omega_{c,m}(t - t_m)}, \qquad (2.9)$$

4 Certainly, a Doppler shift can also arise from motion of the transmitter, but it all boils down to the choice of reference system.
where $A_m$ is the attenuation factor for path m, and $t_m$ is the delay associated with path m. The delay depends on the (Euclidean) geometric vector $\vec{\rho}$ for the particular antenna element that receives the signal, and can thus be decomposed as

$$t_m = t'_m - \frac{1}{c_0}\, \vec{\rho} \bullet \vec{\theta}_m, \qquad (2.10)$$

where $t'_m$ is the delay relative to the origin of coordinates (the reference point), and $\vec{\theta}_m$ denotes a unit vector normal to the wavefront impinging from an angle $\theta_m$. Note that the geometric vector $\vec{\rho}$ can be set to the null vector if the considered antenna element is taken as reference point. This can be done without loss of generality, since we confine ourselves to single antenna receivers.5 It should also be mentioned that the antenna element in (2.9) was assumed to be omnidirectional, i.e., the element was assumed to have an essentially nondirectional antenna pattern. Another assumption is that the receiver antenna operates in the Fraunhofer region (or far-field), which justifies the approximation involved in modeling a wavefront as plane [11]. Next, since $\omega_d \ll \omega_c$ and $\cos\varphi_m \leq 1$, the exponential factor in (2.9) is well approximated as

$$\exp\{j(\omega_c t - \omega_c t_m + \omega_d \cos\varphi_m\, t - \omega_d \cos\varphi_m\, t_m)\} \approx e^{j\omega_c t}\, e^{-j\omega_c t_m}\, e^{j\phi_m t}, \qquad (2.11)$$

if we introduce the Doppler angular frequency shift

$$\phi_m = \omega_d \cos\varphi_m. \qquad (2.12)$$
We can thus write the received (complex) equivalent baseband signal in the compact form

$$r(t) = \sum_{m=0}^{M-1} A_m\, s(t - t_m; \mathbf{d}_0^{K-1})\, e^{-j\omega_c t_m}\, e^{j\phi_m t} + w(t), \qquad (2.13)$$
where w(t) represents AWGN with autocorrelation function $E\{w(t)\, w^*(t+\tau)\} = N_0\, \delta(\tau)$ and zero mean [141]. In general, all the channel parameters in (2.13) are time-varying,

$$r(t) = \sum_{m=0}^{M(t)-1} A_m(t)\, s(t - t_m(t); \mathbf{d}_0^{K-1})\, e^{-j\omega_c t_m(t)}\, e^{j\phi_m(t)\, t} + w(t). \qquad (2.14)$$

5 For array receivers, it becomes necessary to model the angle spread (or spatial spread), and the reader is referred to a tutorial paper by Ertel et al. [49] and a book by Saunders [139].
The received signal in (2.14) can alternatively be viewed as the response Y(t) of a time-varying linear filter (system) observed in additive noise,

$$r(t) = Y(t) + w(t) = \int_{-\infty}^{\infty} c(\tau; t)\, s(t - \tau; \mathbf{d}_0^{K-1})\, d\tau + w(t), \qquad (2.15)$$

where the kernel $c(\tau; t)$ represents the (time-varying) channel impulse response,

$$c(\tau; t) = \sum_{m=0}^{M(t)-1} A_m(t)\, \delta(\tau - t_m(t))\, e^{-j\omega_c t_m(t)}\, e^{j\phi_m(t)\, t}. \qquad (2.16)$$
Here, the multipath response Y(t) and the noise process w(t) are assumed to be statistically independent. The time-invariant version of the impulse response in (2.16),

$$c(\tau) = \sum_{m=0}^{M-1} A_m\, \delta(\tau - t_m)\, e^{-j\omega_c t_m}, \qquad (2.17)$$

is more attractive due to its simplicity, and was first suggested by Turin for modeling long-distance multipath propagation via the ionosphere or troposphere [156]. The fact that the Doppler shift has been neglected in (2.17) is of course its major weakness. In conventional outdoor mobile radio channels (with elevated base station antennas and low-level mobile antennas), the scattering is mainly caused by large objects (such as buildings) that are fixed in time, but the Doppler shift may be significant due to motion of the receiver. Yet, (2.17) has been reported to successfully capture the essence of mobile radio propagation in urban environments [157]. An important remark, however, is that only snapshots of multipath profiles were investigated in [157], while no attention was paid to the (time) evolution of the phase shifts. For indoor radio channels, the propagation environment itself is more inclined to change over time (due to the motion of people and other scattering objects surrounding the low-level receiver antenna), but for short
to moderate-sized data blocks it is reasonable to assume a time-invariant impulse response as suggested by (2.17) [84]. The overall change of the propagation environment may be classified as a long-term variation (since it is slowly varying compared with the transmission rate), just like shadowing and path loss, and such phenomena can usually be omitted from a model that is solely used for the design and analysis of receiver algorithms [148]. The long-term variations are instead taken into account at a cell-planning stage.

We have not elaborated on the number M of propagation paths yet. The fact that few scatterers behave as ideal mirrors leads us to believe that there will be a great number of paths; confer the concept of diffuse sources and micro-angular spread [160]. Practical measurements have a limited resolution, however, and two paths are only resolvable as distinct paths if the condition

$$|t_m - t_n| \geq 1/W, \qquad m \neq n, \qquad (2.18)$$

is satisfied, where W is the transmission bandwidth [156]. This has been taken as an argument for lumping paths together in (2.17) [156]—usually into a quite small number of remaining terms (typically less than ten) [96]. In this way, each resulting term arises from a large number of superimposed paths. The intractable complexity of each superposition, and equally important, the unpredictable nature of the whole propagation environment, calls for statistical modeling. For example, it is common to assume that an ensemble of phase shifts are uniformly distributed on $[0, 2\pi)$. The argument for this assumption is easily understood if we express the phase shifts as

$$-\omega_c t_m = -\eta\, l_m, \qquad (2.19)$$

where $l_m$ is the length of path m. Clearly, two paths with slightly different path lengths may have significantly different phase shifts (one wavelength corresponds to a phase shift of $2\pi$ radians). However, if the temporal resolution associated with (2.18) is very high, it is not reasonable to believe that the phase shifts of two closely spaced paths are distinctly different. A literature survey reveals that very little work has been devoted to phase modeling; Nikookar and Hashemi have proposed two phase models for indoor radio propagation channels [125]. As pointed out in [84], the scanty material is probably due to difficulties associated with measuring the phase of individual multipath components.
The physics of time dispersive channels could be further characterized by the multipath spread (or the delay spread, or the temporal spread), which can be interpreted as a measure of the range of time delays over which the average power output of the channel is essentially nonzero. The multipath spread can be assumed to equal a number of symbol intervals,

$$T_m = L_c T_s, \qquad (2.20)$$

where $L_c$ is called the length of the impulse response. Note that this assumption merely restricts the maximum support of the impulse response, and hence it does not affect the generality of the model; $L_c$ can be arbitrarily large, and the impulse response can also be zero within $[0, L_c T_s)$. To conclude, the channel impulse response $c(\tau)$ is assumed to be causal and compactly supported on $[0, T_m)$.
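As a toy illustration of the time-invariant model (2.17), one can draw a random channel instance as follows; the path statistics are invented (the thesis deliberately leaves the impulse response unspecified):

```python
import numpy as np

rng = np.random.default_rng(1)

M = 6                                    # number of lumped paths (invented)
Tm = 5e-6                                # multipath spread [s] (invented)
wc = 2 * np.pi * 900e6                   # carrier angular frequency (invented)

t_m = np.sort(rng.uniform(0.0, Tm, M))   # path delays in [0, Tm)
A_m = rng.exponential(1.0, M)            # path attenuation factors
A_m /= np.sqrt(np.sum(A_m**2))           # normalize total power to one

# Complex path gains A_m * exp(-j wc t_m). Since the delays span very many
# carrier periods, the phases -wc t_m are effectively uniform on [0, 2*pi),
# in line with the common assumption discussed above.
gains = A_m * np.exp(-1j * wc * t_m)
```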
2.2.1 The Tapped Delay Line Channel Model

Proakis presents an alternative background to the discrete-time model defined by (2.17) and (2.18) [131]. He first assumes that the transmitted pulse g is band-limited to $[-W/2, W/2]$, before expanding g by means of the sampling theorem. Even if g may not be strictly band-limited, one could argue that all feasible signals have a practical bandwidth (according to some bandwidth measure, such as the 99% in-band power measure) [146]. More importantly, Proakis’ calculation is based on one more assumption that is not always clearly stated, namely that also c is band-limited to $[-W/2, W/2]$. Further, c is argued to be essentially time-limited to $[0, T_m)$.6 Starting in continuous time, Proakis thus arrives at a very neat discrete-time model, which is the celebrated tapped delay line model [131]. However, the assumptions are violated if the channel impulse response contains frequency components outside the band occupied by the modulation pulse. If the channel is unknown to the receiver designer, s/he has no reason to assume that c is limited to the same frequency band as g. Thus, the tapped delay line model seems somewhat ill motivated in the case of unknown channels.

We further note that one fundamental design problem reduces to a mere triviality for the tapped delay line, namely the question of obtaining a sufficient statistic. Since the received noiseless signal comprises nondistorted replicas of the modulation pulse—apart from the scaling factors $A_m e^{-j\omega_c t_m}$ and the delays $t_m$—the front-end can rely on matched filtering followed by fractionally-spaced sampling, which is a cornerstone of the RAKE receiver [130].

6 Strictly, no signal can be both band-limited and time-limited in consequence of the properties of the Fourier transform.
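For a rectangular modulation pulse (an invented choice; any known pulse would serve), such a matched filter plus fractionally-spaced sampling front-end can be sketched as follows:

```python
import numpy as np

rng = np.random.default_rng(2)

sps = 8                                        # samples per symbol (invented)
g = np.ones(sps)                               # rectangular pulse on [0, Ts)
d = rng.choice([-1.0, 1.0], size=20)           # data symbols

tx = np.repeat(d, sps)                         # transmitted pulse train
rx = tx + 0.1 * rng.standard_normal(tx.size)   # additive noise (no dispersion here)

mf = np.convolve(rx, g[::-1])                  # filter matched to g (real pulse)
samples = mf[sps - 1::sps // 2]                # Ts/2-spaced (fractional) sampling
```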
2.2.2 The Received Waveform Model
Note that the condition (2.18) is due to limitations of the measurement equipment, rather than an inherent characteristic of the channel. Some channels (such as the tropospheric scatter channel) are better modeled as consisting of a continuum of multipath components [131]. The channel impulse response in (2.17) could then be generalized to

$$c(\tau) = A(\tau)\, e^{-j\omega_c \tau}. \qquad (2.21)$$

We understand that great concern must be given to the sort of quantization implied by (2.18). The relevance of the last statement is also clear from [76], where the effect of a continuous azimuth spread was investigated for various detector structures. The lesson to be learned from [76] is that a receiver designed for a quantized model may perform poorly on the continuous (physical) channel.

Instead of adopting a discrete-time model from the start, we set out to solve the more general problem of communicating over continuous-time multipath channels. Moreover, it seems somewhat artificial to assume that the system designer has any knowledge of the channel impulse response. In this thesis, we will hence model the (continuous-time) impulse response as a time-invariant deterministic function (which is unknown). Since the impulse response is unknown, there is no reason to specify it. Later we need to give some numerical examples, but for now we intend to keep our presentation as general as possible.

Let us then focus our attention on the noiseless received waveform Y(t)—an approach that we may call the received waveform model. For any practical system, it is most reasonable to suppose that Y(t) belongs to the Lebesgue space $L^2(\mathbb{R})$, i.e., the Hilbert space of complex-
valued functions defined on $\mathbb{R}$ that are square-integrable,7

$$Y(t) \in L^2(\mathbb{R}) \;\Rightarrow\; \|Y\|^2_{L^2(\mathbb{R})} \triangleq \int_{-\infty}^{\infty} |Y(t)|^2\, dt = E_Y < \infty, \qquad (2.22)$$

which means that the squared $L^2$-norm of Y(t), or the energy of Y(t), denoted $E_Y$, is well-defined (or finite).

7 The integral is the Lebesgue integral, which is somewhat more general than the basic Riemann integral. The value of a Lebesgue integral is not affected by values of the function over any countable set of values of its argument (or, more generally, a set of measure zero) [41].
2.3 An Equivalent Vector Channel
The received signal r(t) is a continuous-time random process, and as such it is specified in terms of the joint probability density functions (PDFs) that it implies [163]. This means that the key to analyzing r(t) is to find some way to represent it by a (discrete) vector, which will be denoted $\mathbf{r}$. Except for very short block lengths K, it is advantageous to divide the discretization procedure by considering each symbol interval at a time (or alternatively a few symbol intervals at a time). Such an approach favors small decision delays in the receiver, which is desirable for real-time communication. In order to pursue this strategy we write the observation vector $\mathbf{r}$ as

$$\mathbf{r} = \left[\mathbf{r}_0^T\ \mathbf{r}_1^T\ \cdots\ \mathbf{r}_{K-1}^T\right]^T, \qquad (2.23)$$

where

$$\mathbf{r}_k^T = [r_{k,0}\ r_{k,1}\ \cdots\ r_{k,n}\ \cdots\ r_{k,N-1}] \qquad (2.24)$$

denotes a partial observation vector obtained in symbol interval $k = 0, 1, \ldots, K-1$. The numbers $\{r_{k,n}\}$ will be called observables. Note that the vector $\mathbf{r}_k$ could be infinite-dimensional, and the set of indices $\{n\}$ is then the natural numbers, denoted by $\mathbb{N}$. Sometimes it is convenient to index the elements of $\mathbf{r}_k$ with both positive and negative integers, i.e., $n \in \mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$. We will use E to denote either $\mathbb{N}$ or $\mathbb{Z}$. Likewise, Y(t) and w(t) are represented by two column vectors $\mathbf{Y}$ and $\mathbf{w}$ comprising the elements

$$(\mathbf{Y})_{k,n} = Y_{k,n}, \qquad k = 0, 1, \ldots, K-1, \quad n = 0, 1, \ldots, N-1, \qquad (2.25)$$

and

$$(\mathbf{w})_{k,n} = w_{k,n}, \qquad k = 0, 1, \ldots, K-1, \quad n = 0, 1, \ldots, N-1, \qquad (2.26)$$

such that

$$\mathbf{r} = \mathbf{Y} + \mathbf{w}. \qquad (2.27)$$

Now, since the information-carrying term Y(t) is assumed to be statistically independent of the noise w(t), the aforementioned theorem of irrelevance says that we only need to concern ourselves with the discretization of Y(t) [163]. We have postulated that Y(t) is an $L^2$-function, and a general method to obtain the vector $\mathbf{Y}$ is then defined by the (linear) mapping $\Psi$,

$$\Psi : L^2(\mathbb{R}) \to l^2(E^2), \qquad (2.28)$$

$$(\Psi Y)_{k,n} = \langle Y | \psi_{k,n} \rangle = Y_{k,n}, \qquad (2.29)$$

where the functional $Y_{k,n}$ is the $L^2$-inner product of Y(t) and a function $\psi_{k,n}(t)$,8

$$\langle Y | \psi_{k,n} \rangle \triangleq \int_{-\infty}^{\infty} Y(t)\, \psi^*_{k,n}(t)\, dt. \qquad (2.30)$$

The theorem of reversibility demands $\Psi$ to be one-to-one, otherwise $\mathbf{r}$ does not constitute a sufficient statistic, and optimality is lost [163]. Let us further require that the inversion (which is possible, in principle, if $\Psi$ is one-to-one) is numerically stable; loosely speaking, if the images $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are close for any two given functions $Y_1(t)$ and $Y_2(t)$, then it should imply that $Y_1(t)$ and $Y_2(t)$ are close as well. Strictly stated, we require that $\Psi$ is a frame operator [41],

$$R_0\, \|Y\|^2_{L^2(\mathbb{R})} \leq \sum_{k,n} |Y_{k,n}|^2 \leq R_1\, \|Y\|^2_{L^2(\mathbb{R})}, \qquad (2.31)$$

with positive frame bounds $R_0$ and $R_1$. If $R_0 = R_1$, the frame is called tight, and the inversion $\mathbf{Y} \to Y$ becomes trivial,

$$Y(t) = R_0^{-1} \sum_{k,n} \langle Y | \psi_{k,n} \rangle\, \psi_{k,n}(t) = R_0^{-1} \sum_{k,n} Y_{k,n}\, \psi_{k,n}(t), \qquad (2.32)$$

which is readily verified by taking the inner product of Y(t) with both sides of (2.32). The notion of frames was introduced in 1952 by Duffin and Schaeffer for the study of nonharmonic Fourier series [48, 164]. Note that frames (including tight frames) are not bases in general, in spite of the resemblance between the reconstruction formula (the synthesis formula) for a tight frame (2.32) and the standard expansion in an orthonormal (ON) basis. A frame is an over-representation of a basis: a frame may be redundant in the sense that any of the spanning functions lies in the closed linear span of all the others, whereas a basis is independent, which means that the coefficient functionals are unique for any particular element of the function space (no basis function can be written as a linear combination of the others). In a similar manner, a tight frame is an over-representation of an ON basis. If $R_0 = R_1 = 1$, we recognize (2.31) as Parseval's identity, and it then follows that the frame is an ON basis. We are all familiar with the advantage of an ON basis: it leads to uncorrelated noise observables $\{w_{k,n}\}$,

$$\sigma^2 \triangleq E\{w_{k,m}\, w^*_{\ell,n}\} = E\{\langle w | \psi_{k,m} \rangle \langle w | \psi_{\ell,n} \rangle^*\} = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} E\{w(t)\, w^*(u)\}\, \psi^*_{k,m}(t)\, \psi_{\ell,n}(u)\, du\, dt = N_0 \int_{-\infty}^{\infty} \psi^*_{k,m}(t)\, \psi_{\ell,n}(t)\, dt = N_0\, \delta_{k\ell}\, \delta_{mn}, \qquad (2.33)$$

8 By convention, the inner product is linear in the first argument and anti-linear in the second argument.
if we assume that basis functions that are associated with different symbol intervals are orthogonal (we will come back to this assumption below). Thus, why bother about frames? The reason for including the concept of frames is twofold. First, a frame operator provides a very general discretization, and it is always interesting to learn which options are available. Second, frames are useful in the context of Fourier series expansions, which will be discussed in Section 2.4. We assumed before that the impulse response is time-invariant. This allows the noiseless received signal Y to be expressed as a sum of timeshifted pulses y, Y (t) =
K−1 Tm k=0
0
c(τ ) g(t − τ − kTs ; bk )dτ =
K−1 k=0
y(t − kTs ; bk ),
(2.34)
25 which considerably simplifies the subsequent analysis (and is not possible for time-varying impulse responses). Moreover, let the modulation pulse g(t) be compactly supported on [0, Lg Ts ). Signals that are not compactly supported in the time domain (i.e., band-limited signals) are treated as Lg → ∞. It now follows that y(t − kTs ; bk ) has support on [kTs , (k + Lc + Lg )Ts ). For convenience, let us introduce L Lc + Lg . (2.35) Observable number n in symbol interval k can then be expressed as rk,n = r| ψk,n = Y | ψk,n + w| ψk,n K−1 (+L)Ts ∗ y(t − 1Ts ; b ) ψk,n (t)dt + wk,n , = =0
(2.36)
Ts
If we choose spanning functions {ψk,n (t)} with finite support, some terms in the sum will vanish, which is desirable for simplifying reasons. Assume that {ψk,n (t)} are all compactly supported on [kTs , (k + 1)Ts ). This also means that basis functions that are associated with different spanning sets (different symbol intervals) are trivially orthogonal, and the noise samples wk,n will be uncorrelated as was shown in (2.33). By considering the support of ψk,n (t), we understand that the nonzero terms satisfy 1+L>k
⇐⇒
1>k−L
(2.37)
and 1 < k + 1.
(2.38)
Hence, (2.36) reduces to rk,n =
k
(+L)Ts
=k−L+1 Ts
∗ y(t − 1Ts ; b ) ψk,n (t)dt + wk,n .
(2.39)
There is no reason to use different spanning sets in different symbol intervals k, so we let the spanning sets be time-shifted replicas of each other, ψk,n (t) = ψ0,n (t − kTs ) ψn (t − kTs ).
(2.40)
26
Chapter 2
Next, write out the sum in (2.39) and make the variable substitution t − kTs = u in each term (integral), rk,n =
Ts
−(L−1)Ts 2Ts
y u + (L − 1)Ts ; bk−(L−1) ψn∗ (u)du
y u + (L − 2)Ts ; bk−(L−2) ψn∗ (u)du
+
−(L−2)Ts LTs
+... + 0
y(u; bk ) ψn∗ (u)du + wk,n .
(2.41)
Let us once again write the expression in the form of a sum, rk,n =
L−1 (L−)Ts −Ts
=0
y(t + 1Ts ; bk− ) ψn∗ (t)dt + wk,n .
(2.42)
The limits of the integral in (2.42) can be simplified if we recall that ψn (t) is compactly supported on [0, Ts ), rk,n =
L−1 Ts =0
0
y(t + 1Ts ; bk− ) ψn∗ (t)dt + wk,n .
(2.43)
Next, replace y with the help of (2.34), rk,n =
L−1 Ts Lc Ts =0
0
c(τ ) g(t − τ + 1Ts ; bk− )dτ ψn∗ (t)dt + wk,n .
0
(2.44)
In general, the observables {rk,n } are seen to depend on the data symbols {bk } in a nonlinear fashion, and as we will see in Section 2.7, this proves to be mathematically intractable. When the modulation scheme is linear, i.e., when g(t; bk ) ≡ bk g(t),
(2.45)
equation (2.44) shows a linear data dependency, rk,n =
L−1 =0
Ts bk−
0
0
Lc Ts
c(τ ) g(t − τ + 1Ts )dτ ψn∗ (t)dt + wk,n .
(2.46)
27 By defining the channel coefficients 9 Ts Lc Ts ∗ h,n c(τ ) g(t − τ + 1Ts )dτ ψn (t)dt = 0
0
0
Ts
y(t + 1Ts ) ψn∗ (t)dt, (2.47)
we finally arrive at the discrete model rk,n =
L−1
bk− h,n + wk,n ,
(2.48)
=0
which is illustrated in Figure 2.1. Note that the concept of channel coefficients is meaningless for nonlinear modulation formats, since the decomposition (2.48) into data symbols and (unknown) projection coefficients is invalid.
✲D
bk h0,n
❄ ✲×
h1,n
bk−1
✲ ···
❄ ✲×
✲D
hL−1,n
❄ ✲+ ✲ ···
bk−L+1 ❄ ✲×
wk,n
❄ ✲+
❄ ✲+
✲ rk,n
Figure 2.1: The equivalent vector channel for observable rk,n . A box labeled “D” symbolizes a delay element. Before continuing, it is illustrative to rewrite (2.48) as Lc +Lg −1
rk,n = bk h0,n +
bk− h,n + wk,n .
(2.49)
=1
The first term is produced by the kth transmitted data symbol, while the second term represents a residual effect of the Lc + Lg − 1 previously transmitted symbols. This effect is called intersymbol interference (ISI). 9
In order to keep the notation plain and simple, y(t) is now being used as the noiseless channel response to a data independent modulation pulse g(t). Confer (2.34), where y(t; bk ) is the response to a data dependent pulse g(t; bk ). This should not lead to any confusion, since the dependency is clear from the argument.
28
Chapter 2
In physical terms, ISI arises because the transmission bandwidth is greater than the coherence bandwidth of the channel, which means that different frequency components of the modulation pulse are attenuated differently, and more significantly, delayed differently by the channel. The channel is said to be frequency-selective. Consequently, the pulse appearing at the output of the channel is temporally dispersed (or time spread) over an interval longer than the duration Ts of the pulse. Since the number of ISI terms increases linearly with the duration Lg of the modulation pulse, it seems reasonable to choose a pulse with short to moderate time support. We will see in the subsequent chapters how this choice keeps the complexity at a manageable level. A band-limited modulation pulse with infinite time support could certainly be approximated as being time-limited (e.g., consider a Nyquist pulse, which has small ripples outside some finite interval [131]). It should be noted, however, that such an approach leads to an approximate model, and the relevance of its results may be somewhat unclear.
2.3.1
The Equivalent Vector Channel in Matrix Form
Equation (2.48) is more conveniently expressed using matrix algebra, rk,n = bTk hn + wk,n , where we have introduced bTk = bk−(L−1) bk−(L−2)
(2.50)
· · · bk ,
(2.51)
· · · h0,n ]T .
(2.52)
and hn = [hL−1,n hL−2,n
Next, consider the kth partial observation vector in (2.24), · · · rk,N −1 ]T = Bk h + wk ,
rk = [rk,0 rk,1
where Bk is an N × N L band matrix defined as T 0 bk 0 · · · 0 bT 0 · · · 0 k .. = diag {bT , bT , . . . , bT }, .. Bk = ... . . k k k T 0 ··· 0 b 0 0
···
k
0
bTk
(2.53)
(2.54)
29 while h is an N L × 1 column vector T h = hT0 hT1 · · · hTN −1
(2.55)
comprising all channel coefficients, and wk = [wk,0 wk,1
· · · wk,N −1 ]T
(2.56)
is the kth partial noise vector. Finally, also the total observation vector r in (2.23) can be compactly written in a similar matrix form, and when doing so, let us add sub- and superscripts to r in order to clearly indicate the time span (expressed in the symbol index k) of the continuous function r(t) being discretized, B0 h + w0 B1 h + w1 = h + w0K−1 , (2.57) r ≡ rK−1 = BK−1 . 0 0 . . BK−1 h + wK−1 if we let
= BT0 BT1 BK−1 0
· · · BTK−1
be an N K × N L data matrix and w0K−1 = w0T w1T
T · · · wK−1
T
T
(2.58)
(2.59)
is the total noise vector w, which was introduced in (2.26).
2.4
Discussion on Front-End Optimality
The second equality in (2.47) implies that y(t + 1Ts ) may be expressed as y(t + 1Ts ) = h,n ψn (t), (2.60) y(t + 1Ts )|ψn (t) ψn (t) = n
n
if the set {ψn (t)} constitutes an ON basis.10 Next, take the inner product of y(t + 1Ts ) with both sides of (2.60), |h,n |2 . (2.61) y(t + 1Ts ) 2L2 (0,Ts ) = n 10
For a tight frame, the frame bound should be included as a scaling factor.
30
Chapter 2
It is then easy to see that L−1 =0
|h,n |
2
=
L−1 Ts =0 0 LTs
n
2
|y(t + 1Ts )| dt =
L−1 (+1)Ts =0
Ts
|y(t)|2 dt = Ey ,
= 0
|y(t)|2 dt (2.62)
where Ey is the same as the energy of y. The channel impulse response c(τ ) will be normalized such that Ey equals the energy of the modulation pulse g(t), which is the symbol energy Es . Differently put, this simply means that the impulse response c(τ ) neither amplifies nor attenuates the symbol energy Es . It follows that L−1 =0
or Σ|h|2
2
|h,n | = Ey ≡ Es
n
Lg Ts
0
|g(t)|2 dt,
L−1 L−1 1 1 2 |h,m | ≤ |h,n |2 = 1, Es E s m n =0
(2.63)
(2.64)
=0
where {m} ⊂ {n} = E. Also, the signal-to-noise ratio (SNR) is defined as SNR Eb /N0 ,
(2.65)
Eb Es / log2 (Q).
(2.66)
where Eb is the bit energy,
Let us now reflect upon the results above. Equation (2.60) reveals that {h,n } are the coefficient functionals of the pulse y(t + 1Ts ) associated with the expansion {ψn (t)}. This actually creates a deep problem: since the receiver designer only knows that y(t + 1Ts ) is an L2 -function, s/he must use a spanning set {ψn (t)} that is complete in L2 (0, Ts )—otherwise there is no guarantee that the mapping Ψ is one-to-one, and the theorem of reversibility may be violated—but relation (2.64) indicates that the pulse y(t + 1Ts ) cannot have a large functional h,n for every function in the set {ψn (t)}. Now, from (2.48) it is seen that a small coefficient h,n has the effect of attenuating the data, which consequently means that mostly noise is extracted. If the space spanned by {ψn (t)} is infinite-dimensional
31 (which is the case for all expansions that are complete in L2 ) we expect this to happen in an infinite number of dimensions, and the breakdown of the optimal front-end becomes a matter of fact. Chugg and Polydoros have given a mathematical justification of this claim [33]. Intuitively, it may seem strange that one cannot gain from extracting more information, and strictly, it would be beneficial—if we could only afford to invest an infinite amount of complexity, as well as wait an infinite time. More about this in Section 2.7.
2.4.1
The Short-Time Fourier Transform
We recognize that (2.32) is the foundation of harmonic analysis, the most familiar expansion of which is doubtless the Fourier series representation.11 It may thus seem natural to propose a (truncated) Fourier series expansion for the purpose of discretization. In order to analyze one symbol interval at a time, we first define the interval Ik , (2.67) Ik = (kTs , (k + 1)Ts ), and a window function ζk (t) with compact support on Ik , ζk (t) = ζ(t − kTs )χIk (t),
(2.68)
where χIk (t) denotes the characteristic function of the interval Ik (or the indicator function of the interval Ik , i.e., the function that is one on Ik and zero elsewhere). Now, the short-time Fourier transform (STFT)— sometimes called the windowed Fourier transform—makes use of the spanning functions 1 2π k = 0, 1, . . . , K − 1, (2.69) ψk,n (t) = √ exp jn t ζk (t), n = 0, ±1, ±2, . . . Ts Ts This transform was first proposed by Gabor, who employed Gaussian window functions (which are not compactly supported) [58], and has been extensively studied in quantum theory as Weyl-Heisenberg coherent states; a name that is accounted for by the physicists’ way of viewing the coefficients Yk,n in (2.32) as inner products of the signal Y (t) with a discrete 11 As late as 1966, Carleson proved that a Fourier series expansion converges almost everywhere for all functions that belong to L2 (0, Ts ) [25].
32
Chapter 2
lattice of coherent states [39]. The coherent states are the family of functions {ψk,n (t)} generated from time-frequency translations of ψ, while the set of labels {k, n} specifying the translations is identified with a discrete lattice in the time-frequency space. In physics, these kind of expansions are also called atomic decompositions. When the window function ζk (t) is the rectangle window, i.e., when ζ(t) ≡ 1, we can think of the STFT as a classical Fourier series expansion. This is seen from the following. First, introduce the function Yk (t) such that (2.70) Yk (t) ≡ Y (t), ∀t ∈ Ik . The Fourier series expansion is only valid for functions satisfying the Dirichlet conditions, I. Yk (t + Ts ) = Yk (t), II. Yk (t) is bounded and piece-wise differentiable, III. Yk (t) = 1/2 Yk (t− ) + Yk (t+ ) at jump discontinuities (the average of the left-hand and right-hand limits), but these requirement are trivially met by defining Yk (t) to be the L2 function Y (t)χIk (t) periodically extended on R. See Figure 2.2.
Y (t)χIk (t) ✲
(k − 2)Ts
(k − 1)Ts
kTs
(k + 1)Ts
t
(k + 2)Ts
Figure 2.2: The periodic extension of Y (t)χIk (t). We note that the set of basis functions defined by (2.69) is infinite. This is an inherent property for all atomic decompositions that are complete in L2 . However, we can only use finitely many observables, so any practical receiver front-end will approximate Yk (t) by its projection into a finite-dimensional subspace, the linear span of a finite collection {ψk,n (t); n = n0 , n1 , . . . , nN −1 } of N synthesis functions. Moreover, in order to keep the complexity low, we would like to use rather few basis
33 functions, yielding an observation vector of low (or moderate) dimension N . The STFT is then involved with a problem. First, consider the rectangle window, which is well localized in the time domain. Windowing by a rectangle function followed by periodic extension is likely to introduce jump discontinuities in the (imaginary) periodic function Yk (t), and this implies unwanted high frequencies. We expect that fairly many coefficients are needed in order to get a small representation error (confer the well-known Gibbs/Walbraham phenomenon [92]). Now, could we instead look for a smooth window function (similar to the Gaussian window used in the Gabor transform); a function that is well localized in both time and frequency? The answer is negative, because the Balian-Low theorem says that the Heisenberg product of the window function is infinite if we want our Weyl-Heisenberg coherent states to be a (Hilbertian) basis for L2 [94].12 For the Weyl-Heisenberg case, we are thus forced to consider frames instead of bases, but as was previously concluded, bases have the advantage of creating uncorrelated noise samples. It is interesting to note that a different type of coherent states, better known as wavelets, are well localized in the time-frequency space, and some of these expansions actually constitutes an orthogonal basis for L2 [39]. Wavelets have received a lot of attention in the past fifteen to twenty years. Without going into any details, it has been claimed that wavelets are the best bases for representing functions with singularities [46]. Since the modulation pulse is usually fairly smooth (in order to achieve spectrum efficiency), it follows that the noiseless ISI response will also be fairly smooth (it will be essentially band-limited to the same frequency band as the modulation pulse)—provided that the energy of the channel impulse response is concentrated to the same band as the modulation pulse. It is then doubtful whether wavelets are much better than the STFT in the context of communication. Relatively few papers have investigated wavelets and communication, but a paper by Friedlander and Porat strengthens the conjecture that the advantage of using wavelets for detection is yet to be seen [57]. Also, many wavelets are difficult to synthesize in continuous time, and they are better suited for processing discrete-time signals. 12
The Heisenberg product is a standard measure of how well localized a function is in the time-frequency space; a measure that is limited according to Heisenberg’s inequality (and attains it minimum value only for Gaussian functions) [94].
34
Chapter 2
2.4.2
Prolate Spheroidal Wave Functions
Let us free ourselves from the demand for completeness in L2 and accept sub-optimality. Then, one idea is to develop an algorithm that adaptively searches through a large—but finite—set of basis functions, ranks them according to some cost functional (for example based on an energy measure), and selects the best ones. Algorithms that explore this idea have been proposed for data compression (the algorithms typically operate on a few thousand samples), such as the matching pursuit algorithm [112] and wavepackets [35]. It should be noted that these two algorithms both rely on a priori information about the signal of interest in order to define the finite set of basis functions. Since we lack that information, it is not clear how these algorithms could be generalized to fit our problem. We understand that the core issue is the fact that we have not yet been able to think of any intelligent way of selecting a finite set of basis functions. Loosely speaking, the chief problem is that L2 is a too wide search space. Instead of completeness in L2 , we should try to come up with a more meaningful optimality criterion. We know that all signals encountered in practice are constrained in bandwidth in one way or another, which means that the essential part of the power is confined to a limited frequency interval, and just an insignificant fraction of the power spill over outside this band [146]. A signal y(t) that is time-limited to [−Ty , Ty ], supp {y(t)} = [−Ty , Ty ],
(2.71)
and essentially band-limited to (−Wy , Wy ), F(y)2L2 (−Wy ,Wy ) F(y)2L2 (R)
≥ 1 − ∆,
(2.72)
where F(y) is the Fourier transform of y and ∆ is small, can be expanded in (orthonormal) prolate spheroidal wave functions (PSWFs), which minimizes the representation error
6 = sup y(t)∈Y∆
inf
{nm }0N −1 ⊂N
2 nN −1 y, ψk,n ψk,n (t) y(t) − n=n 0
L2 (R)
(2.73)
35 associated with a finite expansion [147, 98, 99].13 The symbol Y∆ denotes the space of functions satisfying (2.71) and (2.72). We see that the PSWFs are optimal in the minimax sense, i.e., they lead to the smallest possible energy loss for the worst function in Y∆ . Moreover, the PSWFs are eigenfunctions to the operator singling out (−Wy , Wy ) × (−Ty , Ty ), and they form a complete basis for Y∆ . We have finally found a well-defined way of truncating the (infinite) expansion: if an error 6 < 12∆ is desired, we only have to include the PSWFs with eigenvalues larger than 6, and the number of functions is given by [99] N = !2Wy Ty " + 1.
(2.74)
The drawback is that PSWFs cannot be expressed in closed form, and their practical value is hence limited.
2.4.3
Fractionally-Spaced Sampling
We have seen how there is a choice of expansions. In a complexity constrained system, there is competition between ease of front-end processing and ease of subsequent computation. Normally, it is tractable to have a simple discretizer, and then invest most of ones complexity in the detector unit. The simplest (complete) ON system in L2 is presumably the Haar system [95]. An expansion of this type has been suggested for discretizing statistically parameterized Rayleigh fading channels [82]. However, as we have already seen, any L2 -complete basis involves the problem of selecting a (finite) subset of basis functions. How about sampling functions [163]? They are not complete in L2 , and nor do they minimize the representation error 6 defined by (2.73). On the other hand, sampling offers a very attractable front-end in that the observables may be evaluated by means of Dirac’s delta function [163], n = 0, 1, . . . , N − 1, (2.75) ψn (t) = δ(t − nTδ ), N = Ts /Tδ , 13
The problem considered in [99] is the dual of the problem of interest to us: we have time-limited and essentially band-limited functions, whereas [99] treats band-limited and essentially time-limited functions. However, the prolate spheroidal wave functions are completely specified by the product Wy Ty , and the results in [99] can thus be carried trough with a dual interpretation.
36
Chapter 2
which yields the channel coefficients h,n = y(1Ts + nTδ ).
(2.76)
Recall that the correlation function of the (ideal) noise is assumed to equal E{w(t)w∗ (t+τ )} = N0 δ(τ ). This means that we must pass the signal through a lowpass filter prior to sampling. If the (ideal) lowpass filter has a cutoff frequency W , the correlation between two filtered noise samples becomes σ 2 E{wm wn∗ } = 2W N0 δmn ,
(2.77)
provided that the filter output is sampled at the Nyquist rate [141]. We see that Nyquist sampling preserves the white property of the additive noise. Since Y is not band-limited (in consequence of the pulse y being timelimited), information will be lost due to the lowpass filtering. However, provided that the cutoff frequency of the filter is sufficiently high, the loss of information will be negligible. If we assume that the channel impulse response has its energy concentrated to the same band as the modulation pulse (whose shape is known), a modulation bandwidth measure (such as the 99% in band power measure) may be used for designing the lowpass filter. Note that a band-limited pulse y is not time-limited and requires an infinite number of samples, taken over the entire time axis, to attain statistical sufficiency. Likewise, a time-limited pulse is not band-limited and requires infinitely dense sampling [155]. Hence, sampling implies suboptimal detection, because an optimal detector must operate on a sufficient statistic, which obviously becomes impractical. More importantly, samples extracted over the entire time axis are (most likely) associated with an unlimited number of symbol intervals. Hence, the number of hypotheses becomes unlimited, and the optimal detector must perform an unlimited tree search. We will see in the next chapter, however, that our joint estimation/detection problem implies an infinite path memory. This means that the entire data path is needed for making optimal decisions—irrespective of how the observables are obtained—and no search other than an exhaustive tree search is optimal.
37
2.5
Modulation Format
To keep down the number of terms in (2.48), we would like to employ a modulation pulse with a rather short time support. In addition, it has been mentioned that linear modulation is desirable for complexity reasons. A very simple choice of modulation would then be binary phase shift keying (BPSK) with rectangle shaped pulses, Es /Ts , when |t| < Ts /2, (2.78) grec (t) = 0, elsewhere. Rectangle shaped pulses are often used as a very first approximation to practical modulation pulses. However, a rectangle pulse has the drawback of not being well-localized in the frequency domain; its Fourier transform is (2.79) Grec (ν) = Es Ts sinc(Ts ν). It is easy to show that the energy Eξ confined to the band [−ξ/Ts , ξ/Ts ], where ξ ∈ N, is given by ξ/Ts 2Es 2Es 2πξ sin x dx = Si(2πξ), (2.80) |Grec (ν)|2 dν = Eξ π x π −ξ/Ts 0 which means that ξ = 11 is required if we want Eξ to be at least 99% of the total energy Es . Further, a lowpass filter with cutoff frequency W = ξ/Ts has a corresponding Nyquist rate of N = 22 samples per symbol. This example shows that bandwidth inefficient pulses are ill-suited when the front-end performs fractionally-spaced sampling. Also, the frequency spectrum is a limited resource (provided by nature), which is yet another reason for employing spectrally efficient modulation. One class of bandwidth efficient modulation schemes is well-known as continuous phase modulation (CPM) [7]. A CPM signal has the additional advantage of a constant envelope, which makes the modulation robust to nonlinearities in the transmitter amplifier. Moreover, CPM can be viewed as a recursive trellis code, and this proves to be valuable later (when we investigate a coded system). Following the classic description of CPM [7], the equivalent baseband signal can be written as K−1 pn Es K−1 , t ≥ 0, exp j 2π dk q(t − kTs ) + φc s(t; d0 ) = Ts pd k=0 (2.81)
38
Chapter 2
where pn and pd are relatively prime integers, and q(t) is the integral of the frequency response fq (t),
t
q(t) = −∞
fq (τ )dτ,
(2.82)
The pulse fq (t) is time-limited to the interval (0, Lq Ts ) and satisfies
fq (t) = fq (Lq Ts − t),
(2.83)
fq (τ )dτ = q(Lq Ts ) = 1/2.
(2.84)
Lq Ts
0
Without loss of generality, the initial phase of the carrier φc can be set to zero. Considering our quest for efficient communication over unknown ISI channels, this representation does not seem very handy due to its nonlinear data dependency. However, the section below presents a convenient alternative form for binary CPM signals.
2.5.1
Decomposition of Binary CPM Signals into PAM Waveforms
Laurent showed [103] that a binary (i.e., Q = 2) CPM signal can be expressed as a sum of I = 2Lq −1 PAM waveforms ) s(t; dK−1 0
=
s(t; bK−1 ) 0
=
I−1 K−1 Eb bi,k gi (t − kTs ), Ts
(2.85)
i=0 k=0
where the modified data symbols bi,k are related to the (binary) information symbols dk ∈ {±1} as Lq −1 k p n bi,k = exp jπ dm − dk− βi, . (2.86) pd m=0
=0
For 1 ≤ 1 ≤ Lq − 1, the parameter βi, ∈ {0, 1} is the 1th bit in the radix-2 representation of i, Lq −1
i=
=1
2−1 βi, , 0 ≤ i ≤ I − 1,
(2.87)
39 while βi,0 ≡ 0. Finally, the pulse gi (t) is given by Lq −1
gi (t) =
"
u(t + 1Ts + βi, Lq Ts ), 0 ≤ i ≤ I − 1,
(2.88)
=0
where u(t) is defined as sin{2πq(t)pn /pd }/ sin(πpn /pd ), 0 ≤ t ≤ Lq Ts , u(t) = Lq Ts < t ≤ 2Lq Ts , u(2Lq Ts − t), 0, elsewhere.
(2.89)
A similar expansion has also been derived for higher order CPM schemes [118]. Let us choose minimum shift keying (MSK) as our modulation scheme due to its simplicity; MSK is the most elementary (nontrivial) member of the CPM class. For MSK, Lq = 1 (full response CPM), pn /pd = 1/2, and 1/(2Ts ), 0 < t < Ts , fq = (2.90) 0, elsewhere. It follows that I = 2Lq −1 = 1, so the Laurent representation includes only one PAM signal g0 (t), g0 (t) ≡ u(t) = χ[0,2Ts ) (t) sin {πt/(2Ts )} .
(2.91)
We recognize the familiar interpretation of MSK as offset quadrature phase shift keying (OQPSK) where the modulation pulse is a half-cycle sinusoid with period 4Ts [129], i.e., ) s(t; bK−1 0
=
K−1
bk g(t − kTs ),
(2.92)
k=0
with k π {±j}, k = 0, 2, 4, . . . , dm ∈ bk = exp j {±1}, k = 1, 3, 5, . . . , 2
(2.93)
m=0
and g(t) =
Eb χ (t) sin {πt/(2Ts )} . Ts [0,2Ts )
(2.94)
40
Chapter 2
The state diagram of the just described MSK representation has been depicted in Figure 2.3 on the next page, where the previously transmitted symbol bk−1 has been used as state label. Note that (2.93) assumes that “1” is the starting state. j 1/ − 1
1/j −1 /j
−1 / 1
−1
1 −1 / − 1
1/ − j
−1 / − j 1/1
−j
Figure 2.3: MSK state diagram. The transitions are labeled dk /bk , while the states are labeled with bk−1 . Another advantage of MSK is that the modified data symbols are automatically differentially encoded. This is an efficient way to cope with the sign ambiguity in (2.48)—how could we tell whether the sign of the product bk− h,n stems from the data symbol bk− or from the channel coefficient h,n ? For example, consider the data block d50 = [−1, 1, 1, 1, −1, 1] with corresponding modified data block b50 = [−j, 1, j, −1, j, −1] given by (2.93). Now, it is easy to recover d50 from b50 —with the exception of the very first symbol d0 , which is a dummy symbol—by looking at the incremental phase change associated with each pair of consecutive symbols bk and bk+1 . If the phase changes clockwise (2.93) shows that the associated data symbol dk+1 is −1, otherwise it must be +1. This is also seen in Figure 2.3. We understand that both b50 and −b50 = [j, −1, −j, 1, −j, 1] correspond to the data block d50 = [d0 , 1, 1, 1, −1, 1], where d0 is the dummy symbol. Similarly, MSK is insensitive to an arbitrary phase offset, which makes it suitable for noncoherent communication. MSK is also very efficient in terms of bandwidth. Consider a front-end that employs fractionally-spaced sampling. A lowpass filter with cutoff frequency W = 1/Ts is then enough to guarantee that less than 0.25% of
41 the total power is discarded, which is readily verified by integrating the power spectral density ΦMSK (ν) on (−W, W ) [131], # $ W 16Eb 1/Ts cos(2πνTs ) 2 ΦMSK (ν)dν = dν ≈ 0.99756 Eb /Ts π 2 −1/Ts 1 − 16ν 2 Ts2 −W (2.95) The cutoff frequency 1/Tb implies a sampling rate of 2/Tb . Thus, extracting only two samples per symbol interval (i.e., N = 2) gives an insignificant energy loss. Finally, the pulse duration is Lg = 2 for MSK, and (2.47) and (2.48) lead to the model L c +1 bk− h,n + wk,n . (2.96) rk,n = =0
2.6
Example Channels
The characteristics of the channel impulse response only affect the performance of our detection algorithms; the channel characteristics cannot be exploited since the impulse response is unknown. Any type of channel could thus be used in order to evaluate the algorithms—the algorithms will not change, only their performance. Let us now define a few example channels based on a simple parametric modeling of the channel impulse response,14 c(t) = C
Tukey ζ[0,T (t; α) m)
M −1
Am exp{−am (t − tm )2 + jθm },
(2.97)
m=0
where C is a normalization constant such that Ey = Es , the parameter Tukey am controls the time-localization of the mth term, while ζ[0,T (t; α) is a m) cosine-tapered window function, also known as a Tukey window, supported on [0, Tm ) = [0, Lc Ts ), & & Tm & Tm & 1,' ( 0 ≤ t −& 2 < &α 2 |t−Tm /2|−α Tm /2 Tukey 1 (t; α) = ζ[0,T , α T2m ≤ &t − T2m & < T2m 2 1 + cos π (1−α)Tm /2 m) 0, elsewhere. (2.98) 14 Our model of the impulse response could be interpreted as a number of diffuse sources [160].
42
Chapter 2
The value α is the ratio of constant section to taper and is between 0 and 1; for α = 1 we get a rectangle window and for α = 0 we get a Hanning window. In this thesis, we will use α = 0.9, and the five impulse responses specified by (2.97)–(2.98), together with the parameters in Table 2.1. Table 2.1: Parameters for five example channels: Channel A, B, C, D, and E. The normalization constant is C ≈ 3.6920 for Channel A, C ≈ 3.4118 for Channel B, C ≈ 2.5980 for Channel C, C ≈ 6.5747 for Channel D, and C ≈ 9.6276 for Channel E. The localization parameters are normalized to 1/Ts2 , the delays are normalized to Ts , and Lc = 2. Ex. A B C D E
Attenuation {A0 , . . . , AM −1 } {0.9, 0.7} {1.0, 1.0, 1.0} {1.0, 1.0} {0.5, 0.5, 1.0} {0.5, 0.3, 0.6}
Localization {a0 , . . . , aM −1 } {100, 50} {50, 150, 150} {150, 50} {50, 50, 100} {50, 250, 200}
Delay {t0 , . . . , tM −1 } {0.3, 1.3} {0.5, 0.9, 1.8} {1.4, 1.6} {0.3, 1.2, 1.3} {0.1, 1.0, 1.8}
Phase shift {θ0 , . . . , θM −1 } {4π/5, 4π/5} {π, π/2, 3π/2} {5π/4, 4π/3} {8π/5, 5π/4, π/2} {π, 0, 3π/5}
It is also common to visualize the impulse response by means of the amplitude spectra that are implied by the discrete-time channel parameters (2.52). Each partial vector hn could be associated with a time function in the form L−1 h,n δ(t − 1Ts ), (2.99) hn (t) = =0
with Fourier transform Hn (ν) =
L−1
h,n e−jνTs ,
(2.100)
=0
and amplitude spectrum |Hn (ν)|. In order to judge the quality of the channel, one should calculate a distance spectrum of the error events (or at least the minimum distance, which gives an asymptotic characterization) [16]. Figure 2.4–Figure 2.13 illustrate our five example channels by showing the noiseless channel response, y(t) in (2.34), given a transmitted MSK modulation pulse, g(t) in (2.94), as well as the associated amplitude spectrum |Hn (ν)| for both sampling and the first two harmonics of the STFT in (2.69).
½
Re y Re g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
Amplitude [ ( Es / Ts ) ]
1
½
Amplitude [ ( Es / Ts ) ]
43
4
1
Im y Im g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
4
Figure 2.4: The noiseless response y(t) of Channel A to the MSK modulation pulse g(t). The asterisks mark the sampling times.
Amplitude [dB]
6 0 −6 −12 −18 Sample n=0 Sample n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Amplitude [dB]
6 0 −6 −12 −18 STFT harmonic n=0 STFT harmonic n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Figure 2.5: Amplitude spectra of Channel A associated with sampling and the STFT.
½
½
1
Re y Re g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
Amplitude [ ( Es / Ts ) ]
Chapter 2
Amplitude [ ( Es / Ts ) ]
44
4
1
Im y Im g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
4
Figure 2.6: The noiseless response y(t) of Channel B to the MSK modulation pulse g(t). The asterisks mark the sampling times.
Amplitude [dB]
6 0 −6 −12 −18 Sample n=0 Sample n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Amplitude [dB]
6 0 −6 −12 −18 STFT harmonic n=0 STFT harmonic n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Figure 2.7: Amplitude spectra of Channel B associated with sampling and the STFT.
½
Re y Re g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
Amplitude [ ( Es / Ts ) ]
1
½
Amplitude [ ( Es / Ts ) ]
45
4
1
Im y Im g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
4
Figure 2.8: The noiseless response y(t) of Channel C to the MSK modulation pulse g(t). The asterisks mark the sampling times.
Amplitude [dB]
6 0 −6 −12 −18 Sample n=0 Sample n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Amplitude [dB]
6 0 −6 −12 −18 STFT harmonic n=0 STFT harmonic n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Figure 2.9: Amplitude spectra of Channel C associated with sampling and the STFT.
½
½
1
Re y Re g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
Amplitude [ ( Es / Ts ) ]
Chapter 2
Amplitude [ ( Es / Ts ) ]
46
4
1 0.5 0 −0.5 −1 0
Im y Im g 1
2 3 Time [ T ]
4
s
Figure 2.10: The noiseless response y(t) of Channel D to the MSK modulation pulse g(t). The asterisks mark the sampling times.
Amplitude [dB]
6 0 −6 −12 −18 Sample n=0 Sample n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Amplitude [dB]
6 0 −6 −12 −18 STFT harmonic n=0 STFT harmonic n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Figure 2.11: Amplitude spectra of Channel D associated with sampling and the STFT.
½
Re y Re g
0.5 0 −0.5 −1 0
1
2 3 Time [ Ts ]
Amplitude [ ( Es / Ts ) ]
1
½
Amplitude [ ( Es / Ts ) ]
47
4
1 0.5 0 −0.5 −1 0
Im y Im g 1
2 3 Time [ T ]
4
s
Figure 2.12: The noiseless response y(t) of Channel E to the MSK modulation pulse g(t). The asterisks mark the sampling times.
Amplitude [dB]
6 0 −6 −12 −18 Sample n=0 Sample n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Amplitude [dB]
6 0 −6 −12 −18 STFT harmonic n=0 STFT harmonic n=1
−24 −30 −3.14
−1.57
0 Frequency ν Ts
1.57
3.14
Figure 2.13: Amplitude spectra of Channel E associated with sampling and the STFT.
48
2.7
Chapter 2
Generalized Maximum-Likelihood Detection of Short Blocks
Had the channel been known, it would have been straightforward to employ the ML detection rule, which is optimal in the sense that it minimizes the average probability of detection error [104]. Let us mention that alternative performance measures exist, and researchers with different backgrounds (communication theory, signal processing, etc.) tend to appreciate different measures. For example, the performance of a blind (or semiblind) receiver is frequently evaluated by plotting the evolution of mean-square error (MSE) of the estimated channel coefficients over time. Such a measure may shed additional light on the detection algorithm, but we will adopt the communication purist’s view and use error rate as our principal design criterion. It should be noted that minimizing the MSE happens to be the same as minimizing detection errors for our particular problem. Given the observation vector, the ML sequence detector declares that the transmitted data sequence is the (in a probabilistic sense) most likely sequence, K−1 & K−1 * ) K−1 = arg max Pr d &r , (2.101) d 0 0 0 * K−1 d 0
* K−1 denotes a hypothesized data sequence. Obviously, this detecwhere d 0 tion rule minimizes the sequence error rate (or frame error rate), which is not necessarily the same as minimizing the bit error rate. We will have reason to come back to the issue of optimal symbol-by-symbol detection in Chapter 4 and 5, but for now it is enough to state that the performances of these two strategies are virtually identical in most applications [141]. The ML criterion (2.101) is usually expressed in terms of the likelihood * K−1 ), |d function p(rK−1 0 0 & K−1
) K−1 = arg max p rK−1 &* , d d0 0 0 * K−1 d 0
(2.102)
which is easily done by invoking Bayes’ rule. When both the data and the channel are unknown, (2.102) can no longer be applied explicitly due to the composite nature of the hypotheses—the likelihood function in (2.102) is undefined. The problem could have been reduced to a simple hypothesis-
49 testing problem provided that we had a statistical description of the nuisance parameter vector h in the form of a PDF p(h). The desired simplification is then obtained by integration, i.e., by averaging over the channel parameters [155], & K−1 & K−1
&* &* = p(h) p rK−1 (2.103) p rK−1 d d0 , h dh. 0 0 0 We are facing another case of interest, which seems less artificial, since the channel model has no PDF over which to average. Three approaches have been proposed: (I) maximization methods, (II) Bayesian methods suggesting integration over a noninformative prior, and (III) competitive minimax criteria [52] (confer the notion of a most stringent test in [104]). By a noninformative prior is meant a prior that contains no information about h (loosely speaking, a prior that favors no possible values of h over others) [19].15 Strictly, it may seem incorrect to associate a PDF with the parameter vector since the channel impulse response is modeled as a deterministic function. The impulse response could instead be thought of as a realization of a random process with unknown statistics. The difference between the two modeling views is philosophical. Berger et al. discuss the strengths and weaknesses of the Bayesian approach [20]. A practical weakness is that the mixture PDF in (2.103) is hard to compute in general. In this thesis, we will explore the first approach (which leads to a nice recursive detection algorithm as will be seen in the subsequent chapters). This is done by considering the generalized maximum-likelihood (GML) test, i.e., condition on the channel vector and perform a joint optimization over both the hypothesized data sequences and the possible channel vectors,
K−1
K−1 ) = arg max p r|d * * . ) ,h ,h (2.104) d 0 0 NL * K−1 , h∈C * d 0
Let us here discuss the question of detection optimality. The best performance we could achieve would be obtained if we knew the continuoustime channel impulse response—the detector could then use an optimal front-end (a bank of matched filters). However, due to the problems involved with the discretization, a comparison with such an imaginary detector seems unfair. We will instead use the known discrete-time channel 15
A uniform density is a common choice.
50
Chapter 2
as our benchmark. In other words, the performance of an ML detector for the corresponding known channel vector is the limit that the designer of detectors for unknown channels attempts to reach. The design goal is called a universal detector, which can be implemented without prior knowledge of the particular (discrete-time) channel over which transmission takes place, and it yet attains the same random-coding error exponent as the ML detector tuned to the channel in use (i.e., the optimal detector given knowledge of the actual channel vector) [155]. A universal detector will perform as well as the ML detector in an asymptotic sense, which means that it achieves the same performance for infinite sequences (i.e., K → ∞). It is relevant to ask if there exist universal detectors for the Gaussian ISI channel that we treat, and if so, is the GML detector universal? Let us mention that a training sequence approach is not universal even for very simple families of channels [51]. However, it has been shown that universal detectors do exist for the class of ISI channels [51]. To our disappointment, the existence proof in [51] suggests a very complicated construction, which requires the merging of all ML detectors16 for all the possible ISI channels into a single universal detector, and the complexity of evaluating any candidate data block is extremely high. The situation is simpler in the case of discrete memoryless channels, because the MMI detector is universal for this class of channels, and it is also equivalent to the GML detector [38]. This fact might lead one to conjecture that the GML detector is universal for any family of channels for which a universal detector exists, but this has proven to be false [102]. Yet, Lapidoth and Telatar have shown that the GML detector can achieve the same rates as the corresponding ML detector over Gaussian ISI channels [101]. Let us then apply GML detection. By noting that the noise terms {wk,n } in (2.48) are independent and Gaussian, (2.104) can be written as a joint least-squares (LS) minimization,
) = ) K−1 , h b 0
arg min
&2 L−1 && & ˜bk− h ˜ ,n & , &rk,n − & &
NL * K−1 , h∈C * b k,n 0
(2.105)
=0
where ˜bk− denotes a modified data symbol on a particular hypothesis 16 The ML detector is not unique, since ties in the detection rule can be resolved in various ways without changing the average probability of error.
51 transmitted at time index k − 1. Here, it was used that the mapping between the data block dK−1 = [d0 , d1 , . . . , dK−1 ] and the modified data 0 = [b , b , . . . , b ] block bK−1 0 1 K−1 is one-to-one. Note that a nonlinear mod0 ulation scheme, which gives rise to (2.44), leads to a much more difficult optimization problem. Now, since the symbols in (2.105) are discrete, it is natural to find the LS channel estimate for every (fixed) candidate data block. We could thus consider the metric (or score) for each hypothesized block as a function of the channel coefficients, & &2 L−1 & & K−1 ˜bk− h ˜ ,n & . * b * &rk,n − ) = L h( 0 & & k,n
(2.106)
=0
* ML , is calculated from the system The extremum point, which is denoted h of equations that arises from setting the partial derivatives of L to zero.17 This is more conveniently done if we first formulate the GML detection rule (2.105) in a form analogous to (2.57),
K−1 ) = ) ,h B 0 =
arg min
* B * K−1 ) L h( 0
arg min
K−1 * 2 , * K−1 h r −B 0 0
NL * * K−1 , h∈C B 0
NL * * K−1 , h∈C B 0
(2.107)
where z is the 12 -norm (the Euclidean norm) defined by z
√
zH z =
|z0 |2 + |z1 |2 + . . . + |zM −1 |2 , z ∈ CM .
(2.108)
* K−1 , we are thus facing the (standard) For each hypothesized data matrix B 0 LS problem, * 2 . * ML = arg min rK−1 − B * K−1 h (2.109) h 0 0 NL * h∈C
* ML is the maximum-likelihood estimate The index “ML” indicates that h of the unknown parameter vector, assuming that the hypothesized data * K−1 is true [155]. matrix B 0 17 The extremum is a indeed a minimum since L is readily verified to be convex on CN L .
52
Chapter 2
Before addressing the extremum point, it is important to realize that the system in (2.109) must be overdetermined, which is the case when N K > N L, i.e., K > L. (2.110) Otherwise it will be possible to achieve metric zero18 for all possible data * K−1 , and the candidates cannot be distinguished. Let us exmatrices B 0 emplify this by considering (2.106) and a specific example where K ≤ L. Assume that the block length is K = 2, the length of the channel impulse response is Lc = 1, the length of the modulation pulse is Lg = 1, and (without loss of generality) the number of basis functions is N = 1. Now, the metric
˜ 0,1 2 + r1,1 − ˜b1 h ˜ 0,1 − ˜b0 h ˜ 1,1 2 (2.111) L = r0,1 − ˜b0 h has a trivial extremum point, ˜ 0,1 = r0,1 , h ˜b0 ˜ ˜ ˜ 1,1 = r1,1 − b1 h0,1 , h ˜b0
(2.112) (2.113)
* K−1 attain L = 0 for any observasuch that all hypothesized blocks B 0 tion vector r = [r0,1 r1,1 ]T . One implication is that one-shot detection, where K = 1, as well as optimal GML detection of band-limited modulation, where Lg → ∞, becomes impossible over unknown ISI channels. A more detailed discussion on distinguishability of blocks over ISI channels is presented in [32]. After this digression, we return to the question of finding the extremum point in (2.109). Let us use the conventional definition of the conjugate derivative with respect to a complex-valued vector z having elements zm = xm + jym , m = 0, 1, . . . , M − 1 [87], ∂/∂x0 + j∂/∂y0 ∂/∂x1 + j∂/∂y1 1 ∂ (2.114) . .. ∂z∗ 2 . ∂/∂xM −1 + j∂/∂yM −1 18 Metric zero is trivially optimal due to the properties of the norm. This is also clear from the quadratic form of the objective function L in (2.106).
53 With this definition it is straightforward to prove that ∂L *∗ ∂h
=
∂ K−1 * K−1 * H K−1 * K−1 * r r0 − B0 h − B0 h *∗ 0 ∂h
*, * K−1 h * K−1 )H B * K−1 )H rK−1 + (B = −(B 0 0 0 0 which leads to the extremum point K−1 H K−1 −1 K−1 H K−1 * ML = (B * * * * K−1 )† rK−1 , h ) B (B ) r0 = (B 0 0 0 0 0
(2.115)
(2.116)
* K−1 )† is the (Moore-Penrose) pseudo-inverse of B * K−1 [68]. The where (B 0 0 last form (with the pseudo-inverse) is a standard solution to full rank LS problems [68].19 * K−1 that only To conclude, the channel estimate above yields a metric Λ 0 * K−1 (and the given observation depends on the hypothesized data matrix B 0 ), vector rK−1 0
* K−1 * K−1 ≡ Λ rK−1 , B Λ 0 0 0 * 2 = rK−1 − B * ML 2 * K−1 h * K−1 h min rK−1 − B NL * h∈C
0
0
0
0
* K−1 (B * K−1 )† rK−1 2 , = I−B 0 0 0 where I denotes an identity matrix. The detection criterion
* K−1 ) K−1 = arg min Λ rK−1 , B B 0 0 0 * K−1 B 0
(2.117)
(2.118)
K−1 * is an exhaustive search in the data space b , which grows exponen0 tially with the block length K. Thus, (2.118) does not lend itself to practical detection—unless K is very small—and the subsequent chapters will be devoted to the problem of practical evaluation of the data candidates.
* K−1 guarantees that all its columns are linearly The structure of the data matrix B 0 K−1 * independent, which implies that B0 has full rank; recall that b0 = [0 0 · · · 0 b0 ]T , b1 = [0 · · · 0 b0 b1 ]T , etc. 19
54
Chapter 2
2.7.1
Numerical Results and Discussion
Let us illustrate some of our previous claims with a few simple Monte Carlo simulations. Table 2.2 shows the error performance in terms of bit error rate (BER) versus SNR for a few different STFT expansions.20 Al++ |h,n |2 though system S1 extracts more energy than system S2 , since is greater, it performs slightly worse at 10 dB in SNR. This is because the channel parameters that arise from projection into the dimension spanned by the second harmonic (n = 2) are small and difficult to estimate. For a higher SNR (cf. 15 dB) or for a more overdetermined LS problem (cf. K = 12, i.e., systems S3 and S4 ), the detector starts to gain from including an additional dimension. Table 2.2: Error performance of different STFT expansions. System S1 S2 S3 S4
Expansion STFT, STFT, STFT, STFT,
n = 1, 2 n=1 n = 1, 2 n=1
Σ Σ |h,n |2
N
K
0.7001 0.6669 0.7001 0.6669
2 1 2 1
8 8 12 12
BER @ SNR=10 dB 0.0897 0.0834 0.0423 0.0427
BER @ SNR=15 dB 0.0052 0.0098 0.0005 0.0012
Figure 2.14 shows the performance of GMLSD for the previously defined example channels. Here, MSK is the modulation scheme, the frontend employs fractionally-spaced sampling with W = 1/Ts (N = 2 samples are extracted per symbol interval), and the block length is K = 6 symbols. The performance is seen to be totally different for different channels, and this is of course not desirable. In particular, the performance over Channel C is very bad. We can improve the performance by extending the block length as shown in Figure 2.15. Unfortunately, optimal sequence detection of considerably larger blocks is not reasonable due to the prohibitive complexity, and we have to find other ways. This is the main objective of the subsequent chapters, and we also hope to arrive at a solution that performs well irrespective of the particular impulse response. 20
We use the example system in [73].
55 0
10
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
Channel C Channel B Channel A Channel D Channel E 5
10 15 20 Signal−to−noise ratio [dB]
25
30
Figure 2.14: Error performance of GMLSD for the five example channels specified in Table 2.1. The front-end employs fractionally-spaced sampling to extract N = 2 samples per symbol, and the block length is K = 6 symbols. 0
10
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
Channel C; K = 6, 8, 10, 12 Channel A; K = 6, 8, 10, 12 Channel E; K = 6, 8, 10, 12 5
10 15 20 Signal−to−noise ratio [dB]
25
30
Figure 2.15: Error performance of GMLSD for various block lengths.
56
Chapter 2
Finally, Figure 2.16 illustrates the sub-optimality of sampling by comparing sampling with a front-end in which the observables are defined as the first Fourier coefficients in the STFT (which by no means is an optimized front-end). For a very long block, we could argue that sampling builds up an observation vector that approximates a sufficient statistic, but for the short block in Figure 2.16, comprising K = 6 symbols, it is clear that this approximation is crude. 0
10
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
Channel C, Sampling Channel C, STFT Channel B, Sampling Channel B, STFT Channel E, Sampling Channel E, STFT 5
10 15 20 Signal−to−noise ratio [dB]
25
30
Figure 2.16: Error performance of GMLSD for sampling versus the two first harmonics of the STFT. The block length is K = 6 symbols.
Chapter 3
Hard Decision Tree Search Using Per-Survivor Processing “I could have done it in a much more complicated way” said the Red Queen, immensely proud. apocryphal quote erroneously attributed to Lewis Carroll
A
s concluded in Chapter 2, the number of GMLSD metrics (2.117) to be evaluated in (2.118) grows exponentially with the block length K. Another problem is the increasing complexity of each metric due to an increasing block length. An amendment for the latter problem would be to express each metric recursively, i.e., to find a way of exploiting (and refining) previous computations as time progresses. It is then crucial that the complexity of the recursion algorithm does not increase over time (i.e., as we move deeper and deeper into the search tree). We will start by deriving a recursive form of the LS metric (2.117). At the end of this chapter, we will further address the problem with the exponentially growing number of data hypotheses. 57
58
3.1
Chapter 3
A Forward Recursion for the LS Metric
* ML . This is Let us first find a recursion for the conditional ML estimate h done by means of the recursive least-squares (RLS) algorithm [87, 50]. It is well known that the standard RLS algorithm may suffer from numerical instability [50]. If we think of the RLS algorithm as an adaptive Newton search, we understand the importance of a well-conditioned Hessian [123]. Thus, it is common to consider a weighted version of the LS metric in (2.117) [32],
* k 2 * k ) (Xk )1/2 rk − B *k h * k0 ; x ≡ Λx (rk0 , B Λ 0 0 0 0 0
* k H Xk rk − B *k *k h *k h = rk0 − B (3.1) 0 0 0 0 0 0 , where we choose to define the N (k + 1) × N (k + 1) weight matrix Xk0 as Xk0 diag {xk I, . . . , x I, I}.
(3.2)
Here, I is N × N , while the forgetting factor x is constrained by 0 < x ≤ 1. Our simulations indicated well-conditioned matrices, so the forgetting factor was always discarded, i.e., x = 1 ⇒ Xk0 = I. Note that this is the optimal solution for the time-invariant problem under consideration. However, for time-varying problems it is common to employ the ad hoc solution implied by a non-identity weight matrix Xk0 [4]. For generality, Xk0 is retained in the subsequent calculations. Now, if we first introduce the shorthand notation (−1 ' *k *k * k H Xk B P B , (3.3) 0 0 0 0
k H k k * *k0 B X0 r0 , (3.4) q 0 it is straightforward to extend the results in Section 2.7 to the case of a weighted ML estimate (cf. (2.116) and the preceding derivation), k *k = P *k q h 0 0 *0 .
Let us follow Farhang-Boroujeny [50]. *k * k )−1 by first partitioning B Hessian P 0 0 (k−1 diag xX0 , I , (, xXk−1 0 k −1 ' k−1 H 0 * *H * P B (B 0 k 0 ) 0 I
(3.5)
We can find a recursion for the k−1 T * T T * = (B0 ) Bk and Xk0 = * k−1 B 0 * Bk
k−1 −1 * k. * *H B =x P +B k 0 (3.6)
59 Next, apply the Sherman-Morrison-Woodbury formula (also called the matrix inversion lemma) [50],
−1 H −1 −1
= A−1 − A−1 C B−1 + CH A−1 C C A , (3.7) A + C B CH with * k−1 )−1 , A = x (P 0 B = I, *H, C = B k
(3.8) (3.9) (3.10)
to (3.6), k−1 −1 * * k −1 *H B x P +B k 0 * k−1 − x−1 P * k−1 B * H I + x−1 B *k P * k−1 B * H −1 B * k x−1 P * k−1 . = x−1 P k k 0 0 0 0
*k = P 0
(3.11) Alternatively, (3.11) has the compact form
k−1 * *k P * k = x−1 I − G *k B P 0 0
(3.12)
expressed in the gain matrix * k−1 B * H I + x−1 B *k P * k−1 B * H −1 , * k x−1 P G (3.13) k k 0 0 *k P * k−1 B * H , can be which, if we multiply from the right by I + x−1 B 0 k rewritten as
k−1 H *k P * , * * k = x−1 I − G *k B B (3.14) G k 0 or using (3.12),
rk0
*k = P *H *k B G 0 k .
(3.15)
*k0 by partitioning In a similar way, T we can also find a recursion for q k−1 T T = (r0 ) rk , (, xXk−1 0 -, rk−1 ' k−1 H * H k 0 0 * * H rk . (3.16) *k−1 *0 (B0 ) = xq q +B Bk k 0 rk 0 I
Equation (3.16) leads to a new expression for (3.5), k *k P *H *k q * k *k−1 + P *k B * k *k−1 + G * k rk , h 0 0 * 0 = x P0 q 0 k rk = x P0 q 0 0
(3.17)
60
Chapter 3
* k by means of (3.12), where (3.15) was also used. Next, replace P 0
k−1 k −1 *k P *k q * k rk *k P *k B * k−1 q *0 + G h I−G 0 0 *0 = x x 0 *k P * k−1 q * k−1 q *k B * k rk *0k−1 − G *0k−1 + G = P 0 0
* k−1 + G * k−1 . * k rk − B *k h = h 0 0
(3.18)
The last relation is usually expressed in terms of the a priori estimation error, * k−1 , *k h * (3.19) ek−1 rk − B 0 that is,
* k−1 + G *k = h *k* ek−1 . h 0 0
(3.20)
We have thus derived the desired recursion for the (conditional) ML estimate. Finally, let us define an N L × N intermediate matrix, *H, * k−1 B *k P U k 0 as well as an N × N intermediate matrix,
* k −1 , *k xI + B *k U V
(3.21)
(3.22)
which yields yet another expression for the gain matrix in (3.13), *k V * k. *k = U G
(3.23)
* k will be reused below. Thus, U * k and V * k are motivated The quantity V for complexity reasons, since they make it possible to reduce the number of matrix multiplications. * k . Next, So far we have derived a recursion for the parameter vector h 0 we should also express the LS metric in (3.1) recursively. The details are left to Appendix A, and only the result is presented here,
with increment
˜k * k−1 + x λ * k0 ; x = x Λ Λ 0;x
(3.24)
˜k * * ek−1 . eH λ k−1 Vk *
(3.25)
The derived metric recursion is summarized in Table 3.1 on the next page. Later, Section 3.2 will describe how the tree implied by Table 3.1 may be searched.
61 Table 3.1: Metric forward recursion. • Input
* k−1 and P * k and h * k−1 and Λ * k−1 rk and B 0 0 0;x
• Output
* k and P * k and Λ *k h 0 0 0;x
1. Error estimation * k−1 *k h * ek−1 = rk − B 0 2. Computation of intermediate matrices * k−1 B *k = P *H U k
0 *k = xI + B * k −1 *k U V 3. Computation of the gain matrix *k = U *k *k V G 4. Channel vector adaptation * k−1 + G *k = h *k* ek−1 h 0 0 5. Inverted Hessian update
k−1 * k = x−1 I − G *k P *k B * P 0
0
6. Metric update * ek−1 *k = x Λ * k−1 + x * Λ eH 0;x 0;x k−1 Vk *
3.1.1
Initialization of the Forward Recursion
The standard RLS initialization is ad hoc and classified as a soft-constrained * k−1 = 0 and P * k−1 = ε−1 I, where ε is a small positive initialization: let h 0 0 constant [87, 50]. We will instead compute exact values on the desired quantities. The basic idea is to perform an exhaustive search over a moderate number of short blocks. This was described in detail in Section 2.7, but the solution can now also be expressed in equations (3.3)–(3.5), '
(−1 * k−1 H Xk−1 B * k−1 * k−1 = B , (3.26) P 0 0 0 0
* k−1 H Xk−1 rk−1 , *k−1 = B (3.27) q 0 0 0 0 k−1 k−1 k−1 * * * . q = P (3.28) h 0
0
0
* k−1 is thus computed for each hypothesized data matrix An estimate h 0 k−1 * B 0 . Note that b0 is the first symbol that is modulated and transmitted
62
Chapter 3
over the channel. By adding L − 1 leading zeros, i.e., [˜b−(L−1) , ˜b−(L−2) , . . . , ˜b−1 ] ≡ [0, 0, . . . , 0], it is readily shown that B * k−1 has full rank, and the 0 * k−1 exists on all hypotheses.1 inverted Hessian P 0 For binary modulation, there are 2k candidate data matrices. However, Chapter 2 explained that our detection problem involves a sign ambiguity, i.e., the data sequence [−1, −1, . . . , −1] yields the same metric as * k−1 . The [1, 1, . . . , 1], the only difference being a sign-reversed estimate h 0 initialization at time k − 1 can therefore be confined to only half the candidates, i.e., to only 2k−1 candidates. Further, recall the necessary condition from Section 2.7: the length k of the block must exceed the length L of the channel response. Loosely speaking, k should be large enough to yield a good parameter estimate * k−1 for the correct hypothesis. In other words, the initialization (exhaush 0 tive search) should hopefully indicate the correct starting node for the subsequent tree search. This corresponds to a low metric
* k−1 2 . * k−1 h * k−1 = (Xk−1 )1/2 rk−1 − B Λ 0;x 0 0 0 0
(3.29)
We understand that the number of evaluated metrics (the number of starting nodes) is a design parameter. After computing metrics for all starting nodes, a search tree is grown, where the metric for each child node is a refined value of the metric assigned to its parent node. This updating process was summarized in Table 3.1. One problem remains to be solved, namely the exponentially growing number of nodes, and a practical (suboptimal) search strategy will be presented in Section 3.2. Let us finally illustrate the forward initialization by an example where k − 1 = 3 and L = 3. As mentioned before, bit d0 is lost due to the differential encoding—only a relative difference between two symbols can be decoded—so the detector can assume that d˜0 = −1. Recall that we assumed (without loss of generality) that state “1” in Figure 2.3 is a known *3 starting state. It is therefore enough to evaluate the eight data words b 0 listed in Figure 3.1. 1 In practice, the zero-padding is done by separating any two blocks by a guard interval comprising all zeros.
63
Zero- ✛ padding
˜b−2 ˜b−1 ˜b0 ˜b1 ˜b2 ˜b3 0
0 −j −1 j
1
Node 0
−1
Node 1
0 −j −1 −j −1
Node 2
−1 −1 −1 d˜1 d˜2 d˜3
0
0 −j −1 j −1 −1
0
−1
0
0 −j −1 −j
0 −j
0 −j
0 −j
−1
Zero- ✛ padding
Node 5
1
Node 6
−1
Node 7
−1
1
j
1 1
1 1
j
1
0 −j
Node 4
−1 −1
1
0
Node 3
1
1 −j 1
0
1
1
1 −j −1 1
0
−1
1
−1
0
1
1
1
Figure 3.1: Example of the how the forward recursion is initialized at k − 1 = 3 when L = 3. The initialization time (k − 1) has been marked with double circles.
64
Chapter 3
3.2
A Search Strategy Based on the Viterbi Operation and Per-Survivor Processing
Let us now describe how the dyadic (or binary) tree can be efficiently searched.2 Each node at depth k − 1 is associated with the hypothesized * k−1 leading to the node. We first note that the number of nodes data path b 0 grows exponentially with time. More importantly, each emerging branch ˜ k . The expla(or edge) is assigned its own distinct incremental metric λ ˜ k depends on the entire hypothesized pre-history through nation is that λ * k−1 . This is clear from (3.19) and (3.25), and it the associated estimate h 0 means that we must rely on suboptimal search strategies—no search other than the exhaustive tree search is optimal. Before the actual search starts, we have already computed metrics for a certain number of starting nodes, say, for 2κ−1 starting nodes. This was done during the initialization phase described above. Here, we have denoted the initialization time with κ − 1 rather than k − 1 to stress that it is a fixed time index, i.e., κ is a constant chosen by the receiver designer. First, let us give a high-level description of the search strategy. The initial step is to expand the 2κ−1 starting nodes into 2κ children nodes. Refined metrics are computed for all 2κ children nodes. Next, a selection process prunes away half of the children nodes, and the metrics of the remaining children nodes are stored in a node vector.3 Once the survivor nodes have been selected, one cycle of the search has been completed. The whole procedure is subsequently repeated, i.e., the survivor nodes are expanded into new children nodes, and so on. Figure 3.2 exemplifies the forward search in a schematic way. The node vector obviously contains the same number of metrics as the number of starting nodes. We could also have started with a smaller number of nodes and then grown a full search tree (and computed metrics for all emerging nodes) until the number of nodes reaches a complexity barrier defined as 2κ−1 nodes. Instead, we choose to directly compute metrics for all the 2κ−1 nodes via an exhaustive search, i.e., by means of the previously described initialization process. The difference between the two approaches is merely an implementation detail. 2
Without loss of generality, the exposition is confined to binary modulation. Alternatively, the corresponding survivor sequences could have been stored in the node vector. This approach will be revisited in Chapter 5. 3
65
(a)
(b)
Figure 3.2: Schematic illustration of the forward tree search, where the number of starting nodes is 2κ−1 = 4. (a) The explored paths. (b) The survivor paths. It is now time to explain the search in more detail. The core operation is provided by the add-compare-select (ACS) unit of the Viterbi algorithm [56, 108]. A similar idea was also used in [80]. To make things more concrete, let us look at a specific pair of starting nodes in Figure 3.1 whose differential bits only differ in the first position d˜1 . In particular, consider node 0 and node 4, and the expansion of these two nodes into four children nodes. This is depicted in Figure 3.3 on the next page. The zero padding is not shown for simplifying reasons. First, incremental metrics are calculated and added to the metrics of the parent nodes (or starting nodes) according to Table 3.1, where the * 4 comprises ˜b2 , ˜b3 , and ˜b4 . This is done *k = B hypothesized data matrix B during the add operation of the ACS process. Second, the nodes are grouped in pairs of two such that the κ − 1 last differential bits are identical within each pair. The ACS unit subsequently compares the metrics of the two nodes in each pair and selects the node having lowest metric to be the survivor node of that pair. In our example, the two children labeled “Node a” and “Node A” are grouped, as well as “Node b” and “Node B.” Just like in the Viterbi algorithm, the survivor sequences are thus forced to have different recent (differential) bits. Differently put, the survivor sequences comprise all 2κ−1 bit combinations of
66
Chapter 3
the last κ − 1 bits, which at time epoch k corresponds to the hypothe*k ˜ ˜ ˜ sized bit strings d k−κ+2 = [dk−κ+2 , . . . , dk−1 , dk ]. In Figure 3.3, “Node a” and “Node A” correspond to the two candidate sequences ending with * 4 = [−1, −1, −1], while “Node b” and “Node B” are associated with d 2 * 4 = [−1, −1, 1]. d 2
˜b0
˜b1
✯ −j −1 ˜b0
˜b2
˜b3
−j −1 j
1
.
˜b1
−1 −1 −1 d˜1 d˜2 d˜3
/0 Node 0
.
1
−1 −1
/0 Node 4
1 −j
j
1
j
1
/0 Node b
1
1
1 −j −1 j 1
1 −j −1 1
j
−1 −1 −1
.
.
˜b4
/0 Node a
✯ −j
−j
˜b3
−1 −1 −1 −1 d˜1 d˜2 d˜3 d˜4
❥ −j −1
.
˜b2
−1 −1 −1
/0 Node A
1
1 ❥ −j
.
1 −j −1 −j 1
−1 −1
/0 Node B
1
1
Figure 3.3: Example of how two parent nodes are expanded into four children nodes. The L = 3 symbols used in the computation of the incremental ˜ k are marked with double circles. metric λ
67 We will see in Chapter 4 and Chapter 5 how a forward search may be combined with a search going backwards. Breadth-first algorithms facilitate the completion step due to a time-regular search front. As an alternative to the Viterbi algorithm, one could then employ the M-algorithm [6], or more generally, the SA(B, C) [8]. We will return to the issue of a bi-directional search in the next section. Moreover, note that the Viterbi condition imposes an additional restriction on our search, and as such, it might affect the performance negatively. The condition however guarantees a diversified set of survivor sequences, something which may seem advantageous since we are ultimately interested in generating soft information. It can here be interesting to refer to a related work where the A*algorithm [126] was employed to compute approximate a posteriori probabilities (APPs) [D. Bokolamulla, personal communication, Chalmers University of Technology, Sept. 2002]. As will be described in detail in the next chapter, the APP of an edge was calculated by averaging over a small number of consistent sequences. This did not work very well. At first, it was conjectured that the weak performance was due to the small number of used sequences. In this thesis, as well as in a related study [23], we find that a small number of sequences works fine. The trick is that we use different (small) sets of sequences for different time indices rather than a fixed set of sequences for all time indices. More about this is Chapter 5. The just presented search strategy seems to imply a dynamical data structure. It is often easier to implement a regular structure than a dynamical one, so let us investigate this matter. Actually, we will see how a regular structure may be used for our search algorithm. We first note that only one survivor path enters each node. This means that each child node in Figure 3.3 could have been built up by four data symbols instead of five, because the very first symbol ˜b0 can be found by a trace-back operation, i.e., by following the branch entering the child node backwards. Now, look at “Node a” and “Node A.” Apart from the first data symbol, we see that the two nodes correspond to sign-reversed data words. One could then ask whether it is somehow possible to always continue the search from, say, “Node a”? To answer this question, imagine that the metrics of “Node a” and “Node A” have already been computed. If “Node a” is selected to be the survivor node, there is of course no problem in continuing the search from “Node a.” When “Node A” is selected to be the survivor node, we
68
Chapter 3
would still like to continue from “Node a.” Since an imaginary transition from “Node A” to “Node a” involves a sign switch of the hypothesized sequence, we must also switch the sign of the associated channel estimate ˜ 4 . Thus, if “Node A” is selected to be the survivor node, we store its h 0 metric in “Node a,” together with a sign switched copy of the estimate calculated for “Node A.” The search is then continuing from “Node a” as if the sign-reversed sequence had been the original candidate. It should further be mentioned that also the inverted Hessian needs to be stored. For an odd time epoch, the node vector may be updated as in Figure 3.4 on the next page, where κ − 1 = 3. It should be stressed that the concept of mergers in not valid; the figure merely illustrates how a regular data structure may be used to represent the metric recursion. In spite of the similarity to one recursion of the Viterbi algorithm, one must remember that our search is performed in a tree. Naturally, each survivor sequence keeps and updates its own individual vector of estimated parameters, based on its own data sequence, namely its associated data history. This concept is known as per-survivor processing [142, 134].
3.3
Numerical Results and Discussion
In this section, we give numerical results in the form of (Monte Carlo simulated) error rates versus SNR for hard decision decoding of data transmitted over Channel A. The bit decisions were delivered by means of the standard Viterbi trace-back operation, where the Viterbi decision depth was 25 symbol intervals [56, 108]. The size of the node vector was 2κ−1 = 32 nodes, which means that κ − 1 = 5 differential bits or κ = 6 symbols were used in the initialization phase. To get a more reliable starting estimate, we further used a preamble comprising 25 pilot symbols. As described before, MSK was used as modulation scheme, while the front-end employed fractionally-spaced sampling with W = 1/Ts , i.e., two samples (N = 2) were extracted per symbol interval. Figure 3.5 shows the average bit error rate. Since the curve is obtained by averaging over a large number of blocks (of length K = 1024 data symbols) the figure is not able to capture the main weakness of the presented algorithm, namely the risk of loosing a whole frame (or block).
69
˜bk−4 ˜bk−3 ˜bk−2 ˜bk−1
Node 0
−j −1 j
˜bk−3 ˜bk−2 ˜bk−1 ˜bk
1
−1 −1 −1 d˜k−3 d˜k−2 d˜k−1
Node 1
−j −1 j
Node 2
Node 3
−j −1 −j −1
Node 4
−j
−j
−j
−1
−j .
1
1
Node 2
−1
1
−1 −j 1
−1
1
1
Switch Switch
j
Node 4
1
Node 6
−1
j
1 1
Node 5
1
1 −j 1
✲ −1 −j
.
Node 3
1
−1 −1
−1 −j ✯
−1 1
1
Switch Switch
−1
/0 Even time epoch
j
−1 j
1
✯ −1 −j −1 −j
1
j
1
❘ −1 ✒
1
1
1
j
Node 1
❘ −1 −j −1 ✒
1
j
1
❥ −1 ✣
Node 0
j
1
Switch −1
1
−1 −1
1
Node 7
1
1 −j 1
Node 6
1
j
Switch −1
1 −j −1 1
Node 5
−1
1
❥ −1 ✣
1 −j
Switch −1 −1
1
−j −1 −j −1 −1
j
Switch −1 −1 −1 d˜k−2 d˜k−1 d˜k
−1
−1 −1
✲ −1 ✕
/0 Odd time epoch
1
Node 7 1
Figure 3.4: Example of how the node vector is updated in the forward direction at an odd time k, where κ − 1 = 3. The L = 3 symbols used in ˜ k are marked with double circles. Tranthe computation of the increment λ sitions marked “Switch” involves a sign switch of the calculated channel estimate.
70
Chapter 3 0
10
−1
Bit error rate
10
−2
10
−3
10
0
2
4 6 8 Signal−to−noise ratio [dB]
10
12
Figure 3.5: Bit error performance of joint forward-only estimation/detection over Channel A. The node vector contained 32 nodes, and 25 pilot symbols were used. Let us instead look at Figure 3.6, where we have chosen to consider the fraction of blocks containing more than 0, 10, 100, 250, and 500 errors, respectively. The block length is K = 1024 data symbols. Note that the 0error threshold corresponds to the standard definition of frame error rate, i.e., the fraction of erroneous frames. We observe that the three curves associated with the 10-error threshold, the 100-error threshold, and the 250-error threshold are clustered together. Also, the probability that any one of these thresholds are exceeded is quite high. To conclude, there is a fairly high risk that the decoded block contains a considerable number of erroneous bits. What is the reason for the catastrophic behavior? Loosely speaking, a grossly erroneous data hypothesis associated with a gravely erroneous parameter estimate might yield a good overall metric. A bad estimate deludes the receiver even further along the false candidate path, which, in turn, leads to an even worse estimate, etc. The situation might escalate. One remedy for this problem would be to increase the number of pilot symbols in order to get started with a better estimate. However, the MSE
71 0
Fraction of corrupt blocks
10
−1
10
−2
10
Threshold: 0 errors Threshold: 10 errors Threshold: 100 errors Threshold: 250 errors Threshold: 500 errors
−3
10
0
2
4 6 8 Signal−to−noise ratio [dB]
10
12
Figure 3.6: Error performance of joint forward-only estimation/detection over Channel A. The vertical axis is the fraction of corrupt blocks, where a block is defined as corrupt if the number of bit errors exceeds a given threshold. The block size was K = 1024 symbols, and 25 pilot symbols were used. The node vector contained 32 nodes. of the estimate decreases slowly with an increased block length, which is seen from a back-of-the-envelope calculation if we let L = 1 and N = 1, & & ˆ E |h ˆ − h|2 = E &(BK−1 )† wK−1 &2 MSE(h) 0 0 & & K−1 H K−1 &2 H K−1 −1 ) B (B ) w = E & (BK−1 0 0 0 0 2& + 3 & ∗ &2 & K−1 Kσ 2 1 ∗ ∗ k=0 bk wk & & = = E & +K−1 E b w b w = k k 2& K2 K2 k=0 |bk | k
=
σ2 K
∼
1 . K
(3.30)
We must also bear in mind that the redundancy increases due to an increased number of pilot symbols. It is not reasonable (or advantageous) to use a large number of pilot symbols for short and moderate blocks. Hence, to circumvent the problem of catastrophic blocks by increasing the number
72
Chapter 3
of pilot symbols seems to be a poor solution. Ideally, paths with bad estimates should also have bad metrics, or put differently, a path with a bad estimate should pay a penalty. Chugg and Anastasopoulos suggested bi-directional estimation as a possible solution [5]. Survivor paths at depth m are completed with another set of survivor paths that are found by searching a time-reversed tree from the end of the data block down to time index m + 1. The ingenuity of the method in [5] is that the forward and backward metrics are decoupled, i.e., the forward and backward survivors are associated with independent estimates. This provides a desired estimation diversity. More specifically, if the forward estimate is bad, there is a chance that the backward estimate is good, and vice versa. A somewhat related idea was proposed by Chen et al. as a means of improving the error performance of SOVA [28].4 However, the completion in [28] is completely ad hoc. We will come back to bi-directional estimation and derive an exact completion procedure in Chapters 4 and 5. A completely different approach is to detect whether a frame is catastrophic or not, and request retransmission of catastrophic frames. This idea has been explored in for example [89, 158, 135]. Retransmission of a frame would further enable the possibility of combining soft information from multiple frames [122, 158]. Such a technique lies beyond the scope of this thesis. Let us merely indicate how it would be possible to detect a catastrophic frame by considering Figure 3.7 and Figure 3.8. Here, ad hoc soft information on a particular bit was obtained by adding (and normalizing) the forward and backward metrics associated with sequences consistent with the considered bit.5 Details of this computation will be explained in Chapters 4 and 5. For now it is enough to say that the block length was 512 bits, Channel A was used, and the SNR was 6 dB. Final decisions were produced by rounding the soft information to the nearest integer. The two figures show how the soft information is distributed for typical blocks with few bit errors and a large number of bit errors, respectively. Based upon these examples, a simple retransmission criterion is to retransmit all bits whose soft information is between the thresholds α and 1 − α. Alternatively, we could request retransmission of the whole block if a certain number of bits have soft information in the interval [α, 1 − α]. A good value on α is determined experimentally (it may vary with the SNR). 4 5
Forward and backward metrics were combined in order to produce improved metrics. The completion term, which will be defined in Chapter 4, was discarded.
73 Also, compare the similarity between these ideas and standard stopping rules for iterative detection [114]. 140
120
Number of bits
100
80
60
40
20
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Soft information
Figure 3.7: A histogram of the soft information in a typical noncatastrophic frame. 18
16
Number of bits
14
12
10
8
6
4
2
0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Soft information
Figure 3.8: A histogram of the soft information in a typical catastrophic frame.
74
Chapter 3
3.4
A Backward Recursion for the Time-Reversed LS Metric
As explained above, we will later make use of a backward recursion. Hence, let us reverse the time arrow and write down the expression for the backward recursion, b ˜b = x Λ *b * * K−1 + x λ * K−1 + x (* * K−1 = x Λ ebk+2 )H V Λ k+1 k+1 ek+2 , (3.31) k+1 ; x k+2 ; x k+2 ; x
where the time-reversed LS metric is analogous to (3.1), K−1 1/2 K−1 K−1 * K−1 * K−1 2 * K−1 h * K−1 rk+1 − B Λ k+1 ; x ≡ Λx (rk+1 , Bk+1 ) (Xk+1 ) k+1 k+1
* K−1 ,(3.32) * K−1 * K−1 H XK−1 rK−1 − B * K−1 h = rK−1 k+1 − Bk+1 hk+1 k+1 k+1 k+1 k+1 with an N (K − k − 1) × N (K − k − 1) reversed weight matrix K−k−2 I}. XK−1 k+1 diag {I, x I, . . . , x
(3.33)
Like in (3.2), I in (3.33) is N × N , while the forgetting factor x is limited to (0, 1]. Because of the similarity to the forward recursion, we omit all details and directly summarize the backward recursion in Table 3.2 on the next * b in (3.31) are defined in Table 3.2. page. The quantities * ebk+2 and V k+1 It should be noted, however, that the backward recursion is not a mirrored version of the forward recursion, because the ISI channel memory goes backwards in time. For example, when we consider the backward recursion at time epoch k, it means that the current observable rk is pre* k−1 . This is a direct consequence ceded by L − 1 hypothesized symbols b k−L+1 * k . Thus, the backward metric at time k involves of the structure of B * K−1 . The forward metric at time k, on the other hand, the symbols b k−L+1 * k . The conclusion is that the backward metric is involves the symbols b 0 computed by means of smoothing, whereas the forward metric is computed by means of filtering [87]. We could then conjecture that the backward estimate should be slightly better than the forward estimate (although we observed no discrepancy in our simulations).
75 Table 3.2: Metric backward recursion. • Input
* K−1 and P * k+1 and h * K−1 and Λ * K−1 rk+1 and B k+2 k+2 k+2 ; x
• Output
* K−1 and P * K−1 and Λ * K−1 h k+1 k+1 k+1 ; x
1. Backward error estimation * K−1 * k+1 h * ebk+2 = rk+1 − B k+2 2. Computation of intermediate matrices *H *b = P * K−1 B U k+1 k+1 k+2
−1 *b = xI + B *b * k+1 U V k+1
k+1
3. Computation of the backward gain matrix *b V *b = U *b G k+1 k+1 k+1 4. Channel vector adaptation b * K−1 = h * K−1 + G *b * h k+1 ek+2 k+1 k+2 5. Backward inverted Hessian update
K−1 * * K−1 = x−1 I − G * *b B P k+1 k+1 Pk+2 k+1 6. Metric backward update b *b * * K−1 + x (* * K−1 = x Λ ebk+2 )H V Λ k+1 ek+2 k+1 ; x k+2 ; x
3.4.1
Initialization of the Backward Recursion
By analogy with (3.26)–(3.28), one can compute the time-reversed quantities (−1 '
* K−1 * K−1 = * K−1 H XK−1 B P B , (3.34) k+2 k+2 k+2 k+2
K−1 H K−1 K−1 * *K−1 = B Xk+2 rk+2 , (3.35) q k+2 k+2 * K−1 = P * K−1 q *K−1 , (3.36) h k+2
and
k+2
k+2
* K−1 2 , * K−1 h * K−1 = (XK−1 )1/2 rK−1 − B Λ k+2 ; x k+2 k+2 k+2 k+2
(3.37)
76
Chapter 3
* K−1 . By adding L − 1 trailing zeros, for each hypothesized data matrix B k+2 * K−1 has i.e., [˜bK−(L−1) , . . . , ˜bK−2 , ˜bK−1 ] ≡ [0, . . . , 0, 0], the data matrix B k+2 * K−1 exists. Following this notation, the L − 1 last data full rank, and P k+2 symbols are the trailing zeros, so the effective block length becomes K − L + 1. It is clear that the last real (physical) symbol, which is modulated and transmitted at time epoch K − L, has a nonzero energy at time epoch K − 1 due to the time dispersion. Hence, an optimal detector must also process the observables rK−1 K−L . The backward initialization is illustrated in Figure 3.9 on the next page, where k + 2 = 62, K − 1 = 65, and L = 3. If state “−1” in Figure 2.3 is a known termination state, it is enough to evaluate the eight data words * 63 listed in Figure 3.9. The data words may be considered to be built b 60 up in the time-reversed direction, i.e., starting from ˜b63 and ending with ˜b60 . Starting with the differential bit d˜63 , one should then go against the arrows in Figure 2.3. If we instead choose state “1” as termination state the hypothesized * 63 has to be sign-reversed, but the metrics remain unchanged. data words b 60 We can thus postulate either one of the two states “1” and “−1” as our termination state. If “1” (or “−1”) is the starting state, this assumption corresponds to an even number K − L + 1 of transmitted symbols. By switching sign of some of the nodes and/or reordering some of the nodes, it is possible to transform the data words in Figure 3.9 into the ones listed in Figure 3.1. In particular, switch sign of node 0 and node 5, swap node 1 and node 4, and switch sign of node 3 and node 6 and swap these two nodes. By doing so, the structure of the backward tree becomes identical with the structure of the forward tree. What is somewhat complex is the completion step, described in Chapters 4 and 5. In a related work, devoted to bi-directional joint estimation/detection over statistically parameterized Rayleigh fading channels, we introduced the concept of a virtual trellis section in order to describe the completion [74]. For pedagogical reasons it is easier to imagine the forward and backward recursions operating in two separate structures. The forward and backward searches may anyhow run simultaneously.
77
✲
Zeropadding
✲
Zeropadding
˜b60 ˜b61 ˜b62 ˜b63 ˜b64 ˜b65 Node 0
1 −j −1 0
j
0
−1 −1 −1 d˜61 d˜62 d˜63
Node 1
−j
1 −j −1 0 −1 −1
1
Node 2
−j −1 −j −1 0 −1
Node 3
1
Node 4
−j −1 j
1
Node 6
j
−1
Node 7
−j
1
1
1
−1 0
0
−1 0
0
−1 0
0
1
j
1
0
1
j
1 −1
−1 0 1
−1 j
j
0
−1
1
−1 −1
Node 5
0
−1
1
−1 −j −1 0
j
0
1
Figure 3.9: Example of the how the backward recursion is initialized at k + 2 = 62 when K − 1 = 65 and L = 3. The initialization time (k + 2) has been marked with double circles.
78
Chapter 3
Chapter 4
The Forward-Backward Algorithm in the Presence of Uncertainty For particulars, as every one knows, make for virtue and happiness; generalities are intellectually necessary evils. Aldous Huxley, Brave New World
S
o far, we have discussed hard decision decoding and mono-directional joint estimation/detection, i.e., forward-only or backward-only joint estimation/detection. This chapter is devoted to soft decision decoding and bi-directional joint estimation/detection. In particular, we will present the celebrated forward-backward algorithm, which operates on soft information and also delivers soft information. We will further discuss the condition for the applicability of this algorithm with implications for unknown channels. It will be seen how the forward-backward algorithm can execute in different modes: (I) a forward-only estimation mode, (II) a backward-only estimation mode, as well as (III) a bi-directional estimation mode. As concluded in the previous chapter, we are especially interested in the bi-directional variant.
79
80
Chapter 4
4.1
A Renaissance for Iterative Detection
In 1948, Shannon derived a theoretical capacity limit to reliable communication [143]. He further proved that a completely random code (i.e., a randomly chosen mapping set of codewords) can achieve arbitrarily small probability of error at any communication rate up to the capacity of the channel [143]. Unfortunately, this performance is approached only as the lengths of the codewords tend to infinity, and since the number of codewords then increases exponentially, it makes the decoder’s search for the most likely codeword impractical—unless the code provides for a simple search technique. It is hardly surprising, therefore, that the interest in algorithms delivering soft information was greatly intensified in 1993 due to the introduction of turbo codes. In a seminal paper [21], Berrou et al. announced how turbo codes—a system based on parallel concatenated recursive systematic convolutional codes and soft decisions—attains the Holy Grail of communication theory, sought for nearly half a century; turbo codes were the first (practical) near-optimal codes with performance close to the Shannon capacity limit for AWGN channels. Shortly after the invention of turbo codes, it was clear that also interleaved serially concatenated codes offer comparable performance [18]. Concatenated codes were studied by Forney already in the mid 1960s [54], but the paramount novelty of turbo codes was to embed a (large) pseudo-random interleaver in the code structure to form an overall concatenated code with a distance distribution that mimics that of random coding [13]. Rather than attacking the weights of the codewords (which is the conventional way of designing asymptotically good codes), turbo codes attack the bit error multiplicities, an approach called spectral thinning [141]. Influential work along these lines of thought was made by Battail a few years before the dawn of turbo codes [12]. Another key concept of turbo codes was that of iterative detection, the computational complexity of which is similar to that needed to separately decode the two constituent codes. Interestingly, iterative detection was first suggested by Gallager in the early 1960s as a method for decoding low-density parity-check codes [59, 60], but these codes largely fell into oblivion for three decades before MacKay and Neal rediscovered them in 1995 [111, 110].1 Modern computer simulations (unavailable at the time of 1
The 1995 MacKay-Neal paper contains two flaws, and the reader is instead referred
81 Gallager’s work) have shown that low-density parity-check codes perform similar to turbo codes. Moreover, it should be mentioned that Lodge et al. independently proposed iterative detection of parallel concatenated codes at the same time as Berrou’s work appeared [106]. However, the interleaver was tailored to the constituent codes, thus loosing some of the magic of turbo codes. The core of iterative detection is the exchange of soft information in an iterative manner. Berrou et al. (essentially) generated soft information in the form of APPs by employing the BCJR algorithm, named after Bahl, Cocke, Jelinek, and Raviv [10]. The term “BCJR algorithm” is slightly inequitable, and we will instead write the “forward-backward algorithm,” because this algorithm was invented by Lloyd Welch in 1962 [unpublished work], and seems to have first appeared in two independent publications in 1966 [14, 27]. See also a survey on the Baum-Welch algorithm [133], and the article by McAdam, Welch, and Weber published in 1972 [115]. Later, Benedetto et al. proposed a slight generalization of the original forwardbackward algorithm in order to handle parallel trellis transitions, and they called it the “soft-input soft-output (SISO) algorithm” [17]. A plethora of papers have been written on turbo-like coding techniques and iterative detection during the past decade. To recapitulate all the important achievements is beyond the scope of this thesis, and the reader is instead referred to a recent book by Chugg et al. [30], which presents a general view of iterative detection and its applications. Let us just mention that Gallager was aware that his iterative decoding algorithm could be proved to yield exact solutions only when the underlying graphical representation of the code had no cycles, but he also noted that the algorithm gave good experimental results even when cycles were present [59, 60]. Gallager’s observation did not attract much attention until Tanner, who realized its importance in 1981, founded the general study of codes defined on graphs and recast Gallager’s algorithm in message-passing form [150]. Tanner’s work also went relatively unnoticed until Wiberg, Loeliger, and K¨ otter rediscovered his results, and introduced states in Tanner’s graphical structure [161, 162]. This allowed a connection to trellises and turbo codes, and turbo decoding was interpreted as a message-passing algorithm on a graphical structure with cycles. At the same time, MacKay and Neal realto a publication from 1997. See also the work by Sipser and Spielman [144].
82
Chapter 4
ized that turbo decoding is an instance of Pearl’s belief propagation algorithm, which was developed in the area of artificial intelligence for solving the problem of probabilistic inference on a Bayesian network [111, 116]. Recently, the subject has been abstracted and refined under the rubrics of factor graphs [97] and the generalized distributive law [1]. Although these are powerful tools for anyone who is trying to devise new decoding algorithms, our exposition will follow a more traditional approach. In the wake of the tremendous success of turbo codes, we have witnessed a paradigmatic shift (at least in academia): the forward-backward algorithm has replaced the Viterbi algorithm as the standard decoding block in digital communications.
4.2
The Forward-Backward Algorithm Using Forward-Only Estimation
The classic derivation of the forward-backward algorithm, given by Bahl el al. [10], assumes that the channel is discrete and memoryless. In contrast to those assumptions, this section treats APP algorithms for problems characterized by path memory (or correlation),2 where it is worth pointing out that the memory is due to some degree of uncertainty about the channel. We will see that a slightly modified version of the forward-backward algorithm is also capable of computing APPs for the more generic problem associated with (finite) memory. A similar conclusion was independently drawn by Gertsman and Lodge [64]. We have seen in Chapter 3 that the GML detector operates in a tree. Hence, let us start by considering a Q-ary tree, and in particular, the edge (or the branch) labeled 1 ∈ {0, 1, . . . , Q −1} at time index k ∈ {0, 1, . . . , K− 1}. This particular edge will be denoted ek, . Note that the highest label is limited by the relation Q ≤ Qk+1 , where we have equality when all edges (up to time k) have a nonzero a priori probability. As an example, Figure 4.1 depicts a dyadic tree (i.e., Q = 2) with depth K = 8 and edge e4,18 marked. This tree has been constructed such that the upper edge from each node is associated with bit zero (d˜k = 0), while the lower edge corresponds to bit one (d˜k = 1). Moreover, the last two bits are known to 2
The work presented in this section has partly been published in [74] and [75].
83 be zero (d6 = d7 = 0); the reason for this redundancy will become evident below.
d˜1 = 0 d˜1 = 1
d˜0 = 0
e4,18
d˜0 = 1
d˜1 = 0 d˜1 = 1
k = 0
1
2
3
4
5
6
7
Figure 4.1: Dyadic search tree for K = 8 and d76 = [0 0].
84
Chapter 4
Now, assume that we are interested in finding the APP of edge ek, , i.e., edge e4,18 in our example. The desired APP is defined as the probability for the whole block of K symbol of ek, given the observation vector rK−1 0 intervals, ). (4.1) APP(ek, ) Pr(ek, |rK−1 0 This probability is easily rewritten with Bayes’ rule, )= Pr(ek, |rK−1 0
|ek, ) Pr(ek, ) p(rK−1 0 p(rK−1 ) 0
=
1 |ek, ) Pr(ek, ), p(rK−1 0 C
(4.2)
where C is a normalization constant that may be calculated by averaging * K−1 , i.e., over all hypothesized bit sequences d 0 * K−1 ) p(rK−1 |d * K−1 ). Pr(d (4.3) C= 0 0 0 * K−1 d 0
However, we are ultimately interested in calculating the APPs for all edges at time index k. This means that there is no need for computing the normalization constant C, because we know that all APPs for each time index k must sum up to one. The normalization can thus readily be done afterwards, and we realize that C does not provide any additional information. Let us henceforth disregard the distinction between true APPs and quantities that are equivalent in an information theoretical sense, i.e., let us use the term “APP” also when the scale factor C has been omitted. * K−1 , Next, consider (4.2), and average over all candidate sequences d 0 * K−1 ) p(rK−1 |ek, , d * K−1 ) Pr(ek, |d * K−1 ). Pr(d (4.4) APP(ek, ) = 0 0 0 0 * K−1 d 0
* K−1 ), is a (discrete) indicator funcThe last factor in each term, Pr(ek, |d 0 * K−1 is tion that is either one or zero depending on whether the sequence d 0 consistent with the edge ek, or not. In other words, it is enough to sum over those sequences that are consistent with the considered edge, * K−1 ) p(rK−1 |ek, , d * K−1 ). Pr(d (4.5) APP(ek, ) = 0 0 0 * K−1 : ek, d 0
We can simplify the APP expression further by observing that it is superfluous to condition on the edge ek, if we also condition on a particular
85 * K−1 going through this edge, sequence d 0 * K−1 ) p(rK−1 |d * K−1 ) = APP(ek, ) = Pr(d 0 0 0 * K−1 : ek, d 0
* K−1 ). p(rK−1 ,d 0 0
* K−1 : ek, d 0
(4.6) In our simple example in Figure 4.1, only two sequences are consistent with edge e4,18 , and hence the sum (4.6) consists of only two terms. Except for very short block lengths K, however, a direct computation of this sum is not practical because of the astronomical number of terms. Thus, we need to come up with an efficient way of computing (4.6). Before continuing, let us make a relevant comment. The observant * K−1 ) |d reader recalls from Chapter 2 that the likelihood function p(rK−1 0 0 is undefined in the case of nonparametric uncertainty. For convenience, let us temporarily adopt a model in which the uncertainty may be statistically parameterized. More specifically, assume that the likelihood function, 1 * K−1 | π N K |K 0 * K−1 )−1 (rK−1 − m * K−1 * K−1 −m )H (K ) , × exp − (rK−1 0 0 0 0 0
* K−1 ) = |d p(rK−1 0 0
(4.7) may be expressed by means of a well-defined first moment, * K−1 , * K−1 E rK−1 |d m 0 0 0
(4.8)
and a well-defined second moment, * K−1 . * K−1 E rK−1 (rK−1 )H |d K 0 0 0 0
(4.9)
Like before, N denotes the number of observables extracted per symbol interval. A common model that satisfies this description is the statistically parameterized Rayleigh fading channel model [22, 34]. The motivation for adopting such a model is that we can easily plant a number of conceptual ideas without delving into mathematical details. In Section 4.3, we will return to the nonparametric case. Assume that the source produces independent bits, i.e., Pr(d˜k |d˜i ) = Pr(d˜k ) for all k = i. Trivially, the source is independent of previously received observables (since there is no feedback channel), i.e., Pr(d˜k |ri ) =
86
Chapter 4
Pr(d˜k ) for all k > i. As a first step towards an efficient computation of (4.6), each term in the sum is now factorized into three parts, K−1 * K−1 ) = α p(rK−1 ,d *0m−1 γ *0m β*m+1 , 0 0
(4.10)
* m−1 ), ,d α *0m−1 p(rm−1 0 0
(4.11)
* m ), ,d γ *0m Pr(d˜m ) p(rm |rm−1 0 0
(4.12)
K−1 * K−1 m * m p(rK−1 β*m+1 m+1 , dm+1 |r0 , d0 ).
(4.13)
(I) the past
(II) the present
and (III) the future
Further factorization leads to convenient recursions for the past, * m−2 ) Pr(d˜m−1 ) p(rm−1 |rm−2 , d * m−1 ) α *0m−1 = p(rm−2 ,d 0 0 0 0 *0m−1 , = α *0m−2 γ
(4.14)
and the future, K−1 * m+1 ) p(rK−1 , d * K−1 |rm+1 , d * m+1 ) β*m+1 = Pr(d˜m+1 ) p(rm+1 |rm 0 , d0 m+2 m+2 0 0 m+1 *K−1 = γ * . (4.15) β 0
m+2
To emphasize that (4.10) yields the APP of a hypothesized sequence (apart from the normalization constant C), (4.14) and (4.15) will be called the sequence APP forward recursion and the sequence APP backward recursion, respectively. The choice of the particular point in time m when the past and future paths are combined is completely arbitrary. Later, we will give a reason for choosing m = k, so let us already make this choice. (Recall that we are interested in the APP of edge number 1 at time k.) The recursions are updated with the factor γ *0i . This factor involves *i the a priori probability of bit d˜i and a PDF of the form p(ri |ri−1 0 , d0 ). At least two different approaches have been suggested for calculating the latter quantity: linear prediction as in, for example, [107, 165], or directly
87 (by factorization) as in [79, 81, 26], i.e., *i p(ri |ri−1 0 , d0 ) = =
*i ) p(ri0 |d 0 * i−1 ) p(ri−1 |d 0
0
* i−1 | |K H * i−1 −1 i−1 0 * i−1 * i−1 exp (ri−1 −m −m 0 0 ) (K0 ) (r0 0 ) i N * π |K 0 | * i )−1 (ri − m * i0 )H (K * i0 ) . (4.16) −(ri0 − m 0 0
Moreover, note that both the forward recursion and the backward recursion are updated with the same factor γ *0i —the only difference being the time index i. Thus, an algorithm based on (4.14) and (4.15) exploits the channel memory in only one direction and performs forward-only estimation. Now, let us once again consider the toy example in Figure 4.1. There are 32 edges at the depth k = 4. For infinite memory channels, we understand that each one of these edges is assigned its own distinct forward metric update, since the update depends on the entire hypothesized prehistory. This means that any forced-folding of the exponentially growing tree search into a trellis search is suboptimal, as is any search other than the exhaustive search. So far, no distinct steps have been taken to attack the core of our problem, namely the huge number of hypotheses; as we move deeper and deeper into the search tree, starting at the root, there will be an exponentially increasing number of paths to update (which is done according to (4.14)). Moreover, the sequence APP backward recursion (4.15) depends on the entire hypothesized prehistory. In consequence, the forward and backward recursions do not decouple in the sense that it would be possible to create a time-reversed search tree (starting from the end of the data block) with a unique root. The initialization of the backward recursion hence becomes impractical, * K−2 ) = Pr(d˜K−1 ) p(rK−1 |rK−2 , d * K−1 ) ,d β*K−1 p(rK−1 , d˜K−1 |rK−2 0 0 0 0 K−1 * K−1 p(r0 |d0 ) , (4.17) = γ *0K−1 = Pr(d˜K−1 ) * K−2 ) p(rK−2 |d 0 0 * K−1 ). At this point, |d because it involves the likelihood function p(rK−1 0 0 we may ask ourselves if we have achieved anything, since the very same
88
Chapter 4
likelihood was part of our original expression (4.6). Our findings have raised an interesting question: It is possible to split the sequence APP in (4.6) such that the past and future parts decouple? Put differently: Can we derive an algorithm that performs bi-directional estimation? Before addressing this question, let us first discuss how (4.6) may be simplified. There are two basic approximation methods (and combinations thereof): (I) we can either approximate each term/metric in the sum, or (II) we can approximate the sum by including only a few exact terms/metrics. The second approach was followed in Chapter 3. More specifically, we employed pruning in order to evaluate only a subset of all possible candidate sequences. The chief advantage of the strategy in Chapter 3 is that the survivor sequences retain exact metrics. For example, the exact GMLSD metrics3 are computed by the steps in Table 3.1, where the core is the RLS operation. For a Rayleigh fading channel with known first and second moments, the RLS algorithm is replaced by a Kalman filter to provide exact sequence metrics, as shown by Lodge and Moher [107]. As we have seen in Chapter 3, approach (II) is advantageous in the case of unknown ISI channels due to the slow convergence of the channel parameter estimates. It is still interesting to explore the implications of approach (I), and this is done below.
4.2.1
Truncation or Finite Path Memory
Finite path memory is a natural approximation for many channels having infinite memory—although optimality is lost. For example, consider a multiplicative fading channel model [22] with (infinite) channel correlation given by the zeroth order Bessel function of the first kind [34]; since the envelope of the correlation is monotonically decreasing (in time lag), the approximation entailed in truncating the memory becomes increasingly accurate as the truncation depth is extended. If the path memory is limited to µ previous symbol intervals, we arrive at the simplified recursions k−2 * k−2 k−2 * k−1 * k−1 ˜ α *0k−1 p(rk−1 0 , d0 ) = p(r0 , d0 ) Pr(dk−1 ) p(rk−1 |r0 , d0 ) * k−1 ), (4.18) ≈ α *k−2 Pr(d˜k−1 ) p(rk−1 |rk−2 , d 0
k−1−µ
k−1−µ
3 The relationship between the GMLSD metrics and the APPs will be explained in the next section.
89 and K−1 * K−1 k * k p(rK−1 β*k+1 k+1 , dk+1 |r0 , d0 ) * k+1 ) p(rK−1 , d * K−1 |rk+1 , d * k+1 ) = Pr(d˜k+1 ) p(rk+1 |rk , d
≈
0 0 k+2 k+2 k * k+1 ) β*K−1 . Pr(d˜k+1 ) p(rk+1 |rk+1−µ , d k+1−µ k+2
0
0
(4.19)
Note that the approximations in (4.18) and (4.19) become equalities when the model satisfies a µth-order Markovian property [117]. By introducing a finite memory update, k *k Pr(d˜k ) p(rk |rk−1 γ *k−µ k−µ , dk−µ ),
(4.20)
we can express (4.10) as K−1 k * K−1 ) ≈ α ,d *0k−1 γ *k−µ , β*k+1 p(rK−1 0 0
(4.21)
with (finite memory) sequence APP forward recursion k−1 *0k−2 γ *k−1−µ α *0k−1 = α
(4.22)
K−1 k+1 K−1 =γ *k+1−µ . β*k+2 β*k+1
(4.23)
and backward recursion
Moreover, the backward initialization does no longer involve the entire memory, K−1 *K−1−µ . (4.24) β*K−1 = γ Let us now see what happens to the search tree in Figure 4.1 on the assumption that the truncated path memory is µ = 2. The 32 metric updates at k = 4 are not distinct anymore , and the following eight sequences d˜0 0 0 1 1 0 0 1 1
d˜1 0 1 0 1 0 1 0 1
d˜2 0 0 0 0 0 0 0 0
d˜3 1 1 1 1 1 1 1 1
d˜4 0 0 0 0 0 0 0 0
d˜5 0 0 0 0 1 1 1 1
d˜6 0 0 0 0 0 0 0 0
d˜7 0 0 0 0 0 0 0 0
90
Chapter 4 k=0
k=1
k=2
k=3
k=4
k=5
k=6
k=7
00 01 e4,2 10 11 state labels Figure 4.2: Example of a terminated trellis with truncation depth µ = 2. * 4 ), since the have identical sequence APP updates γ *24 = Pr(d˜4 ) p(r4 |r32 , d 2 * 4 = [0 1 0] is the same.4 When the path memory hypothesized bit vector d 2 is truncated, the search tree becomes a graphical over-representation of our problem, and we can thus redraw the tree such that these eight sequences overlap at the time epoch k = 4. The whole tree then folds into the trellis illustrated in Figure 4.2, and the four dashed edges in Figure 4.1 are now superimposed and represented by edge e4,2 . A trellis state is defined as the µ most recent hypothesized symbols. We understand that the size of the resulting trellis is determined by the truncation depth µ, which is a design parameter. Also, the two zeros (d6 = d7 = 0) at the end of the block are seen to terminate the paths into a singular trellis state. Instead of a formal derivation, we will make use of the example in Figure 4.2 to illustrate how the desired sum (4.6) can be computed in a highly efficient manner. Hence, consider the APP of edge e4,2 , where the sequence APP is expressed by means of (4.21), α *03 γ *24 β*57 . (4.25) APP(e4,2 ) = * 7 : e4,2 d 0
Equations (4.22), (4.18), and (4.20) reveal that α *03 depends on the hy* 3 , i.e., α * 3 ). Likewise, it is clear from pothesized data vector d *03 ≡ α(d 0 0 7 7 * * 4 ). The sum * *24 ≡ γ(d (4.23), (4.18), and (4.24) that β5 ≡ β(d3 ), while γ 2 (4.25) —which should be taken over the eight hypothesized sequences listed 4 We are referring to the forward recursion update. An analogous statement holds for the backward recursion update.
91 previously—is then conveniently factorized into three parts, ' 3 * 3 = [0 0 0 1] + α d * = [0 1 0 1] APP (e4,2 ) = α d 0 0 3 ( 3 * = [1 1 0 1] * = [1 0 0 1] + α d + α d 0 0 4 ' 7 7 ( * = [0 1 0] β d * = [1 0 0 0 0] + β d * = [1 0 1 0 0] .(4.26) ×γ d 2 3 3 If we adopt the same notation as Benedetto et al. [17], we can write the last expression as APP(e4,2 ) = A sS (e4,2 ) γ(e4,2 ) B sE (e4,2 ) ,
(4.27)
where sS (ek, ) denotes the starting trellis state of edge ek, , while sE (ek, ) is the corresponding ending state. The state sS (e4,2 ) is thus assigned a forward metric ' 3 * 3 = [0 0 0 1] + α d * = [0 1 0 1] A sS (e4,2 ) = α d 0 0 3 3 ( * * + α d0 = [1 0 0 1] + α d0 = [1 1 0 1] , (4.28) whereas sE (e4,2 ) is assigned a backward metric ' 7 7 ( * = [1 0 0 0 0] + β d * = [1 0 1 0 0] . B sE (e4,2 ) = β d 3 3
(4.29)
The fact that sequence APPs of any two sequences merging in sS (ek, ) K−1 k is the key to the β*k+1 have an identical set of hypothesized factors γ *k−µ factorization (4.26). This further yields a nice recursion for the forward metric of any state s at time k + 1, Ak sS (ek, ) γ(ek, ), (4.30) Ak+1 [s] = ek, : sE (ek, )=s
where the metric has been explicitly indexed with the time k + 1 to make the recursion more apparent. Mergers in the backward direction allows a similar recursion for the backward state metric, Bk+1 sE (ek, ) γ(ek, ). (4.31) Bk [s] = ek, : sS (ek, )=s
92
Chapter 4
Equations (4.30) and (4.31), together with the completion step in (4.27), (4.32) APP(ek, ) = A sS (ek, ) γ(ek, ) B sE (ek, ) , constitutes the forward-backward algorithm. To conclude, it should be emphasized that it was the trellis description (i.e., the finite memory) that enabled this efficient computation of the sum (4.6). We could also have taken the standard forward-backward algorithm as our starting point, and redefined the original metrics αk (·), βk (·), and γk (·) given by Bahl et al. [10],
Ak [s] p sk = s; rk0 , &
k & Bk [s] p rK−1 k+1 sk = s; rk−µ+1 , &
γk s, s p sk = s; rk &sk−1 = s ; rk−1 , k−µ
(4.33) (4.34) (4.35)
where we have adopted the same simplified notation.5 A recursion for Ak [s] is then readily obtained by averaging over the previous states, followed by a factorization of the PDF, Ak [s] =
p sk−1 = s ; sk = s; rk0 s
4
& p sk−1 = s ; r0k−1 p sk = s; rk &sk−1 = s ; r0k−1 = s
≈
s
=
&
Ak−1 s ] p sk = s; rk &sk−1 = s ; rk−1 k−µ Ak−1 s γk (s , s).
(4.36)
s
The backward recursion may be found in a similar way, Bk [s] =
Bk+1 s γk+1 (s, s ).
(4.37)
s
We presented this derivation in [77]. 5 In comparison, Benedetto’s notation may seem a bit clumsy, but working with edges allows the handling of parallel trellis transitions [17].
93
4.3
The Forward-Backward Algorithm Using Bi-directional Estimation
In this section, we will see that it is possible to derive a form of the sequence APP where the forward and backward recursions decouple. Our presentation is devoted to the problem of unknown nonparametric channels. We have treated the case of statistically parameterized frequency-flat Rayleigh fading channels in [74] and [75].6 Following the GML concept, we will employ a generalized sequence APP in the form
K−1 * K−1 ) p(rK−1 |b * K−1 , h) * * GAPP b = max Pr(b 0 0 0 0 N L * h∈C ' 2 ( K−1 1 K−1 K−1 * * * , = exp max ln Pr(b ) − 2 r0 −B h 0 0 NL σ * h∈C (4.38) where σ 2 is the variance of a (discrete-time) noise observable. In (4.38), we dropped a constant (which can be subsumed into the previously omitted normalization constant C), and we further used that max(·) and exp(·) commutate since exp(·) is monotonically increasing.7 Let us rewrite (4.38) such that it involves a minimization, 3 2 K−1 1 K−1 * , * ln Pr(˜bk ) − 2 min r0 −B h 0 0 NL σ h∈C * k=0 (4.39) K−1 * which, in its turn, can be expressed by means of the LS metric Λ0 from Chapter 2,
K−1 * = exp GAPP b
2 K−1
2
, -3 K−1
K−1 K−1 1 2 * − 2 Λ r0 , B −σ ln Pr(˜bk ) . 0 σ k=0 (4.40) K−1 * Now, let us define a new metric Γ0 that includes the a priori informa K−1 * = exp GAPP b 0
6
Note that [74] and [75] use truncation, where the recursions and the completion are approximations. Optimal sequence metrics would involve Kalman filtering, as in [107]. 7 As before, we write “=” when only a redundant constant is missing.
94
Chapter 4
tion, K−1
* K−1 Λ rK−1 , B * K−1 − σ 2 * K−1 ≡ Γ rK−1 , B Γ ln Pr(˜bk ). 0 0 0 0 0
(4.41)
k=0
As mentioned above, our aim is to split this quantity into (I) a term for the past, (II) a term for the future that does not depend on the past, and possibly (III) a completion term. The details are left to Appendix B, and only the obtained result is presented here, * K−1 ) ≡ Λ * K−1 = Λ * k0 ; x + x Λ * K−1 + Λ *c . ,B Λx (rK−1 k 0 0 0;x k+1 ; x
(4.42)
For generality, the weighted LS metrics was used. Recall that 0 < x ≤ 1 * k and is the forgetting factor. It was shown in Chapter 3 how both Λ 0;x * c will be called the * K−1 can be computed recursively. The third term Λ Λ k k+1 ; x completion term (or the binding term), k −1/2 K−1 K−1 −1/2 k * c 2 + x (P * c 2 . * * −h * ) * * c (P h h Λ 0 0 k k k+1 − hk k+1 )
(4.43)
* k , the backward inverted It depends on the forward inverted Hessian P 0 K−1 *k * K−1 * Hessian P k+1 , the forward estimate h0 , the backward estimate hk+1 , and a completion estimate
where
−1 * K−1 * c * K−1 *k *c P *c h *k V h 0 k k 0 + x Pk+1 Vk hk+1 ,
(4.44)
k * K−1 −1 . * + x−1 P *c P V 0 k k+1
(4.45)
* c is somewhat involved, but it should be interpreted The expression for Λ k as a penalty term due to mismatching forward/backward estimation processes. Equations (4.44) and (4.45) show that identical forward/backward * K−1 , and identical forward/backward inverted Hes*k ≡ h estimates, i.e., h 0 k+1 *c ≡ h *k ≡ h * K−1 , so the * K−1 , yields the completion estimate h *k ≡ P sians P 0 0 k k+1 k+1 * c in (4.42) vanishes, as seen from (4.43).8 In general, completion term Λ k however, we understand that a penalty is paid by increasing the corresponding sequence metric. 8
Here, we assumed that x = 1.
95 The completion step will be described in more detail in the next chapter. As we will see, not all forward survivor sequences are consistent with the backward survivor sequences. This is due to the channel memory, which leads to a symbol overlap. To conclude, we have seen how it is possible to split the sequence APP in such a form that it may be computed by means of two independent recursions and a completion step. Estimation is then performed from both ends of the data block, and the algorithm is indeed bi-directional.
96
Chapter 4
Chapter 5
Generalized APP Detection It looked insanely complicated, and this was one of the reasons why the snug plastic cover it fitted into had the words DON’T PANIC printed on it in large friendly letters. Douglas Adams, The Hitchhiker’s Guide to the Galaxy
H
aving introduced most of the necessary ingredients, we are now ready to summarize our detection algorithm. In doing so, the metric completion needs to be explained in somewhat more detail. This chapter finally presents numerical results for a serially concatenated coded system.1 In order to make the algorithm easier to grasp, let us first identify its three phases. I. Initialization of the forward and backward recursions. II. Recursive computation of the forward and backward metrics, and pruning of the forward and backward search trees. III. Completion of the forward and backward metrics. Equations (3.26)–(3.29) and (3.34)–(3.37) together with the zero-padding explained in Sections 3.1.1 and 3.4.1 specify phase I. Recall that the number of (forward/backward) hypotheses for which initial metrics are computed is a design choice that determines the number of (forward/backward) 1
Parts of this chapter have been submitted for publication [72].
97
98
Chapter 5
sequences that will be retained during phase II. Moreover, the forward recursion and the backward recursion in phase II are computed according to the steps listed in Table 3.1 and Table 3.2, respectively. Note that those steps only yield forward and backward LS metrics, and we are interested in (generalized) sequence APPs. The a priori probabilities must be included in order to find the sequence APPs, and this is most conveniently done by first considering (4.41) and (4.42),2 * K−1 − σ 2 * K−1 = Λ Γ 0 0
ln Pr(˜bm )
m=0
2 =
K−1
* k0 − σ 2 Λ
k
3
ln Pr(˜bm )
2 +
* K−1 − σ 2 Λ k+1
m=0
K−1
3 ln Pr(˜bm )
*c . +Λ k
m=k+1
(5.1) *k , is then The first term within curly brackets, which may be denoted A updated by means of the forward recursion in (3.24)–(3.25), * k0 − σ 2 *k Λ A
k
ln Pr(˜bm )
m=0
˜k − σ2 * k−1 + λ = Λ 0
k−1
ln Pr(˜bm ) − σ 2 ln Pr(˜bk )
m=0
* ek−1 − σ 2 ln Pr(˜bk ), *k−1 + * = A eH k−1 Vk *
(5.2)
*k+1 , can be expressed whereas the second term within curly brackets, say B in terms of the backward recursion (3.31), *k+1 Λ * K−1 − σ 2 B k+1
K−1
ln Pr(˜bm )
m=k+1
˜ b − σ 2 ln Pr(˜bk+1 ) − σ 2 * K−1 + λ = Λ k+1 k+2
K−1
ln Pr(˜bm )
m=k+2
*k+2 + = B 2
(* ebk+2 )H
b *b * V k+1 ek+2
For notational brevity, we let x = 1.
2
− σ ln Pr(˜bk+1 ).
(5.3)
99 Equation (5.2) replaces the sixth step in Table 3.1, while (5.3) replaces the sixth step in Table 3.2. Finally, the generalized APP of a hypothesized * K−1 is calculated as indicated by (4.40) and (5.1), sequence b 0 (
K−1 1 '* c * * * GAPP b0 = exp − 2 Ak + Bk+1 + Λk . σ
(5.4)
Ideally, we want the survivor sequences to be the most likely sequences, *k+1 + Λ * c should then be de*k + B and the sequences with lowest metrics A k clared as survivors. In the terminology of (iterative) detection, the metric of the best sequence that is consistent with the sought output is called minimum sequence metric (MSM) [30]. The forward tree search tries to meet *k+1 + Λ * c by only expanding *k + B the goal of achieving a low overall metric A k *k . Likewise, the backward tree search only exnodes with low metrics A *k+1 . The details of the forward/backward pands nodes with low metrics B search have been explained in Section 3.2. Note that the completion term * c is not taken into account during the search process. Thus, we cannot Λ k guarantee that the MSM is found. Besides, we may loose the MSM when we prune the two search trees.
5.1
The Completion Phase
It has now become time to look at the completion phase in more detail. Assume that we are interested in finding the generalized APP of a certain edge, say edge ek, . This is done with (4.6) and (5.4), GAPP(ek, ) =
* K−1 : ek, b 0
( 1 '* c * * exp − 2 An + Bn+1 + Λn . σ
(5.5)
From (5.5) it is clear that the particular point in time n when the past and future paths are combined is completely arbitrary. In fact, the two extreme values n = K − 1 and n = 0 correspond to a single forward tree and a single backward tree, respectively. In practice, however, the point n is chosen to be in the vicinity of the edge ek, whose (generalized) APP we are looking for. This is done in order to maximize the number of consistent survivor sequences that can be used to produce soft information on ek, . More specifically, it is not certain that an explored path survives one step
100
Chapter 5
into the future, so one cannot guarantee that there exists any sequence consistent with the edge ek, if n > k. If we instead choose n = k, any forward survivor sequence that ends with an edge ek, at time k can be completed with a backward survivor sequence that has been grown down to the time index k +1. It is irrelevant whether this particular edge belongs to a sequence that is pruned away by the forward search later on (i.e., at a higher time index)—by then we have already completed the forward sequence with the help of a backward sequence and calculated the total sequence metric. Note that the forward tree and the backward tree can be simultaneously searched during phase II. See Figure 5.1, which also illustrated the completion in phase III. Although rather few survivor sequences are retained during phase II, and although a quite small number of consistent sequences are used in (5.5) for approximating the APP of a particular edge, we understand that the total number of used sequences is fairly large. The explanation is that different sequences are created at each completion. To illustrate this, let us make a simple example in which we assume that the forward and backward recursions retain only two survivor sequences each. For simplicity, we further assume that the block comprises five bits. Now, consider the following • forward survivor sequences of length two: 11 and 00, • forward survivor sequences of length three: 111 and 001, • backward survivor sequences of length two: 10 and 01, • backward survivor sequences of length three: 010 and 001. If the forward survivor sequences of length two are completed with the backward survivor sequences of length three, we get the four total sequences 11010, 11001, 00010, and 00001. Likewise, if the forward survivors of length three are completed with the backward survivors of length two, we get the sequences 11110, 11101, 00110, and 00101. We note that the four sequences created at the first completion are all different from the four sequences created at the second completion. Thus, growing two search trees provides not only an estimation diversity as explained in the previous chapters, but also sequence diversity.
101 K −1
Time
k
0 Forward tree
Backward tree
Completion front at time k
Figure 5.1: Schematic illustration of how the forward survivor sequences are completed with the backward survivor sequences at time k.
In the example above, it was assumed that every forward survivor sequence grown up to time k may be combined together with any backward survivor sequence grown down to time k + 1. This is of course not the case in general, and especially not for our problem. To understand how forward and backward survivors may be completed, consider Figure 5.2. Each node in the forward search node vector has been assigned a metric *k , while each node in the backward search node vector has been assigned A *k depends on the for*k+1 . From (5.2), it can be seen that A a metric B k k * * ward LS metric Λ0 , which, in turn, depends on B0 . Likewise, (5.3) shows * K−1 via the backward LS metric Λ * K−1 . Now, *k+1 depends on B how B k+1 k+1
102
Chapter 5
* k is built up of the hypothesized recall that the memory implies that B 0 *k * K−1 . This * K−1 comprises the symbols b data symbols b , whereas B −L+1 k+1 k−L+2 *k are common for the forward means that the hypothesized symbols b k−L+2 and backward metrics, i.e., any given forward survivor sequence at time k *k must have the L − 1 symbols b k−L+2 in common with a possible backward survivor sequence. These overlapping symbols have been marked with double circles in Figure 5.2. Expressed in the differential bits, the overlapping *k symbols correspond to the L − 2 bits d k−L+3 . Recall that each node represents two sequences, where the two sequences are sign-reversed replicas of each other. This means that we can switch the sign of any backward survivor sequence, if we at the same time switch the sign of its associated parameter estimate. By switching the sign of a backward survivor sequence, we simply assume that the termination state in Figure 2.3 is “−1” instead of “1,” or vice versa. In Chapter 3 it was only assumed that the transmitted data block comprises an even number of data symbols, which corresponds to the assumption that the termination state is either “−1” or “1.” Some completions require that the backward survivor is sign-reversed in order to satisfy the consistency condition, and such completions have been marked with a dashed line in Figure 5.2. In the particular example illustrated in Figure 5.2, it is seen how each one of the eight forward survivor sequences may be completed with the help of four different backward survivors. Now, the forward and backward metrics have already been computed, but (4.43)–(4.45) must be employed in * c for the particular sequence of interest. order to find the completion term Λ k * k and P * k are associated with the As already mentioned, the quantities h 0 0 * K−1 and P * K−1 are associated with the consistent forward survivor, while h k+1 k+1 backward survivors. The problem with (4.43)–(4.45) is their computational complexity—these equations involve three matrix inversions, * k )−1/2 , 1. (P 0 * K−1 )−1/2 , 2. (P k+1
k * + x−1 P * K−1 −1 , *c P 3. V 0 k k+1
103
˜bk−3 ˜bk−2 ˜bk−1 ˜bk
Node 0
−j −1 j
1
˜bk−1 ˜bk ˜bk+1 ˜bk+2
Node 1
−j −1 j
Node 2
Node 3
−j −1 −j
Node 4
−j
−j
Node 6
1
−j
−1
−1
j
1 1
1 −1
1
1
1
. /0 1 Forward search node vector
j
−1
−j
1
1
1
−1
Node 5
−1
Node 6
−1
Node 7
1
j
1
Node 4
1
j
1 −1
−1 1
−1 j
j 1
j
1
−j
−1 −1
1
Node 3
−1
1
−j −1 j
1
Node 2
−1
1
−1 −j −1
j
−1 −1
1
Node 7
1
1 −j 1
−j −1 −j −1
1
Node 1
−1 −1
−1
1 −j −1 1
Node 5
1
1 −j −1 1
−1
1
−1
−j
1
−j −1 −j −1 −1
Node 0
−1 −1 −1 d˜k d˜k+1 d˜k+2
−1
−1 −1
1 −j −1
j
−1 −1 −1 d˜k−2 d˜k−1 d˜k
1
. /0 1 Backward search node vector
Figure 5.2: Example of how the forward survivor sequences are completed with a number of consistent backward survivor sequences when L = 3. We have assumed that the completion time index k is an odd number. The L − 1 = 2 overlapping symbols have been marked with double circles, and a dashed completion implies a need for sign-reversing the associated backward survivor sequence.
104
Chapter 5
* K−1 , where the latter two, which involve the backward inverted Hessian P k+1 are unique for each backward survivor sequence. Although these inverses are rather small (the dimension is N L×N L), it would be desirable to compute only a few of them. One way to achieve a lower complexity would be to complete only one of the four consistent sequences for each forward survivor in Figure 5.2. Naturally, we could for example complete the forward *k+1 . survivor with the backward survivor having lowest metric B The bottleneck of the algorithm is perhaps more its memory requirement than its computational load. Even if we choose to complete each forward survivor with only one backward survivor, all forward survivors cannot be completed with the same backward survivor due to the consistency constraint. The backward survivor sequences may be divided into 2L−2 groups, where each group corresponds to a particular differential hy*k pothesis d k−L+3 . Let us call these groups backward survivor classes. In Figure 5.2, there are two backward survivor classes: the nodes {0, 2, 4, 6} corresponding to d˜k = −1, and the nodes {1, 3, 5, 7} corresponding to * K−1 and the pad˜k = 1. At least, we need to store the inverted Hessian P k+1 K−1 * *k+1 in rameter estimate hk+1 associated with the best backward metric B each backward survivor class. If the forward and backward trees are grown *k and searched simultaneously, we also need to store the inverted Hessian P 0 k * and the parameter estimate h0 for every node in the forward node vector, and this should be done for all time indices. Recall that the sign of the selected backward survivor also needs to be stored, because one must be able * K−1 should be to determine whether the backward parameter estimate h k+1 sign-reversed or not. To conclude, it is clear that the memory requirement may be rather significant. Another idea is to search the forward tree first, and save the quantities * k for each node at each time index. Once this is done, we * k , and h *k , P A 0 0 can grow and search the backward tree in order to complete the forward metrics. In this way, there is no need for saving the backward quantities, because they are used as soon as they are computed. Even better, we can * K−1 grow and search the backward tree first, only storing the quantities P k+1 * K−1 for the best backward metric B *k+1 in each backward survivor and h k+1 class. Considering our previous conjecture that the backward recursion might be slightly better, it would perhaps be beneficial to use all backward survivor sequences, and instead group the forward survivors into survivor
105 classes. If the two search trees are not searched simultaneously, the execution time of the detection process is of course extended. Also, the storage requirement is still higher than that of the conventional (mono-directional) forward-backward algorithm, which only requires the storage of scalar forward/backward metrics.
5.1.1
Completion Without the Explicit Need for a Completion Term
Anastasopoulos and Chugg treated a different problem with a much smaller dimensionality, and the memory consumption of their bi-directional estimation algorithm was never mentioned as a potential bottleneck [5]. On the other hand, they proposed an approximate completion term as a means of reducing the computational load [5]. Let us instead present an alternative approach whose storage requirement is similar to that of the standard forward-backward algorithm. Further, this approach has the advantage of being conceptually easier to understand, and we also avoid the three matrix inversions listed previously. Rather than delivering intermediate results (in the form of inverted Hessians and estimated parameter vectors) needed for computing the penal* c , we suggest that the forward/backward search delivers a set of ty term Λ k candidate sequences. Each forward search node at depth k simply needs to keep its associated survivor sequence of length k + 1, and correspondingly for the backward node. Once the forward/backward survivors have been found, we can complete the sequences and compute the total sequence met* K−1 in (2.117). An advantage ric directly by means of the full LS metric Λ 0 is that the forward and backward searches can be executed simultaneously. To conclude, the very concept of a completion term becomes superfluous for this approach, because the LS metric is never partitioned. Although the LS metric 2 ' ( K−1 H * * K−1 −1 ×(B * K−1 )H ×rK−1 * K−1 × (B * K−1 = ) × B I− B Λ (5.6) 0 0 0 0 0 0 only involves one matrix inversion of dimension N L × N L, the complexity grows linearly with the block length because of the four matrix multiplications. In consequence, the complexity for this approach is very high.
106
5.2
Chapter 5
Description of an Example System
In Chapter 4, we gave a brief background to turbo codes and iterative detection, and we are now ready to explore these ideas in the context of unsupervised communication, using our newly derived SISO algorithm. In particular, we will discuss the choice of constituent encoders, as well as some implementation aspects of iterative detection. The original turbo codes, introduced by Berrou et al. in 1993, were based on parallel concatenated convolutional codes with an embedded pseudo-random interleaver [21]. By deriving an upper bound to the average bit error probability for ML detection over AWGN channels, Benedetto et al. later showed that the interleaver gain for serially concatenated coding schemes can be made significantly higher than for conventional turbo codes [18]. The interleaver gain is defined as the factor that decreases the bit error probability as a function of the interleaver size. In particular, it was shown that an interleaver gain is guaranteed for serial concatenation if, for the inner encoder, the minimum weight of input sequences generating error events is two [18]. Recursive convolutional encoders possess this property and are thus suitable as inner encoders [18]. Moreover, by decomposing CPM into a recursive continuous-phase encoder (CPE) and a memoryless modulator [136], it is clear that also CPM schemes may be suitable as inner encoders. Here, MSK will be employed as inner encoder, and since the minimum weight of input sequences generating error events is then two, we can expect an interleaver gain over the AWGN channel. See for example work by Moqvist and Aulin [121, 120, 119], who have thoroughly investigated the performance of serially concatenated CPM over the AWGN channel. The decomposition of MSK into a recursive CPE and a memoryless modulator is shown in Figure 5.3 at the top of the next page [136]. Considering the existence of multiple avatars of MSK, it should be mentioned that there is a precoded version without interleaver gain [2]. The explanation is that this form, known as differential MSK, involves a nonrecursive CPE with error events of input weight one [16, 18]. Based on the upper AWGN bound to the average ML bit error probability, Benedetto et al. also gave a number of design rules for the outer encoder; we should choose an outer nonrecursive encoder with a large (and preferably odd) value of the free distance [18]. A simple choice is then the
107 ✲
Interleaved bit stream from the outer encoder
✲+ ✲ D ✻
Memoryless ✲ modulator
Figure 5.3: The inner encoder, MSK, can be decomposed into a simple recursive encoder, followed by a memoryless mapper. The addition ⊕ is performed modulo two. standard feedforward convolutional code with generators (7, 5) in octal notation and with free distance five [131]. Note that the given rules of thumb for designing good serially concatenated codes are valid for the AWGN channel, and their bearing on our problem will be discussed below. A block diagram of the transmitter, the channel, and the receiver is depicted in Figure 5.4 on the next page. The information bit stream is first fed to the outer encoder, which produces a block of coded bits. Whenever the outer code has a trellis description, we choose to drive the sequence to a known termination state, such as the all-zero state. Termination is generally achieved by appending a number of zeros. Next, the coded bits are permuted by the interleaver and fed to an inner encoder, and as already explained, MSK constitutes this inner encoder. Let us further recall that we choose a semiblind approach, i.e., the first as well as the last modulated symbols are known pilot symbols. Finally, the symbols are transmitted over the complex baseband channel, which is characterized by the (noiseless) channel impulse response, i.e., an unknown (linear) filter, as well as the additive thermal noise. Now, what are the main differences between communication over the AWGN channel and our problem? First, our channel has memory, and it can hence be thought of as an additional code, in series with the inner code. See the box labeled “joint inner code” in Figure 5.4. Had the (noiseless) channel impulse response been perfectly known, this joint inner code would have reduced to MSK followed by a finite impulse response (FIR) filter. Trivially, the joint inner code would then have been recursive, just as the CPE of MSK, and we should have expected an interleaver gain.
108
Chapter 5
Outer encoder
✲ Interleaver
✲
Inner encoder JOINT INNER CODE
❄
Unknown filter ❄ ✲+
AWGN
❄
Receiver front-end Outer SISO
✛
Deinterleaver ✛ ✲ Interleaver
✲
Inner SISO
✛
ITERATIVE DETECTOR
Figure 5.4: System block diagram. In this thesis, we treat a different, more realistic case in which we lack perfect knowledge of the channel (exclusive of noise). Now, is the joint inner code still recursive, i.e., can we expect an interleaver gain? Before addressing this question, it could be interesting to recall our results for statistically parameterized frequency-flat Rayleigh fading channels [71, 78]. In [71, 78], we only assumed knowledge of the first and second moments of the fading process. Special attention was paid to channels with high Doppler rate, i.e., rapidly fading channels. This fast fading causes a weak correlation between any two symbols separated more than a few symbol intervals, and in consequence, the recursive nature of the joint inner code is impaired. As expected, the interleaver gain was found to be somewhat smaller in [71, 78] compared with the corresponding system applied over the AWGN channel [119, 120]. Our current problem is more complicated. However, for an unlimited block length in combination with an unlimited tree search, it is conjectured
109 that the inherent estimation process of the unknown channel parameter vector yields a perfect estimate (cf. [101] and (3.30)). In the limit, the impulse response can once again be regarded as a known FIR filter, and the recursive nature of the inner code is preserved. For this reason, it seems motivated to employ a recursive inner encoder also for our problem. Let us return to the block diagram in Figure 5.4. The front-end obtains an observation vector by Nyquist sampling the lowpass filtered received signal. This observation vector is then fed to the iterative detector, which consists of two SISO modules, one deinterleaver, and one interleaver. Each SISO module can be viewed as the soft inverse [30] of the corresponding code block, i.e., it performs soft decoding of the corresponding encoder. In particular, the outer SISO module employs the standard forward-backward algorithm, with the only difference that it delivers extrinsic information [21, 17] instead of APPs. Extrinsic information is formed by normalizing3 the computed APP with the corresponding a priori information, which means that its value is based on information coming from all symbols in the block, except the one corresponding to the same symbol interval [17]; we will give a simple example below for a rate 1/2 repetition code. The use of extrinsic information has been interpreted as an attempt to maintain independence between the two decoders by only delivering information that is new to each decoder [70]. From another viewpoint, extrinsic information helps to overcome local maxima on the likelihood surface(s) of the constituent decoder(s) [93]. However, it is not obvious that extrinsic information is preferable also for our system—the inner SISO module does not compute exact APPs—but empirically a small gain was observed. Considering practical implementation, the SISO algorithm operating in the logarithmic domain offers somewhat higher numerical precision [18]. It is then crucial to use the Jacobian logarithm, as explain in [137]. In our simulations, the outer SISO operated in the probability domain, and we did not experience any numerical problems. The inner SISO, however, suffered from precision problems due to large *k + B *k+1 + Λ * c , or as seen from (5.4), due to small * K−1 = A metrics Γ 0 k K−1 −2 * GAPPs exp{−σ Γ }. This problem was readily eliminated by means 0 of a simple renormalization procedure. As an illustrative example, assume 3 The normalization is equivalent to a division in the probability domain, and a subtraction in the logarithmic domain.
110
Chapter 5
that two edges correspond to bit zero at time k with GAPP(0k )1 = e−1000 GAPP(0k )2 = e−1001 , while the two generalized APPs GAPP(1k )1 = e−1003 GAPP(1k )2 = e−1004 are associated with bit one. Now, these values have to be normalize in order to find true probability measures, Pr(0k ) = (e−1000 + e−1001 ) ÷ (e−1000 + e−1001 + e−1003 + e−1004 ), Pr(1k ) = (e−1003 + e−1004 ) ÷ (e−1000 + e−1001 + e−1003 + e−1004 ), and we can then discard the greatest factors common to all terms in the numerator and the denominator, Pr(0k ) = (1 + e−1 ) ÷ (1 + e−1 + e−3 + e−4 ) ≈ 0.95, Pr(1k ) = (e−3 + e−4 ) ÷ (1 + e−1 + e−3 + e−4 ) ≈ 0.05. From the block diagram in Figure 5.4, it is further easy to understand the conceptual idea of iterative decoding. First, the inner SISO decodes the joint inner code, which means that it runs the presented algorithm to find GAPPs of all edges in the pruned search tree. Trivially, each transition (or edge) is driven by a corresponding input bit, and we can then calculate the GAPPs of bit one and bit zero in each symbol interval; this was exemplified above. The obtained values are then normalized by the a priori information in order to form extrinsic information.4 As soon as extrinsic information has been generated by the inner SISO module, these quantities are deinterleaver and used as a priori probabilities of the coded bits coming from the outer code. We can then decode the outer code and make final decisions. However, we can also let the outer SISO module deliver extrinsic information on the coded bits, interleave this information, and use it as new a priori information on the bits fed to the inner encoder. 4 For the very first iteration, the a priori information is assumed to be uniformly distributed.
111 The whole procedure is repeated a number of times, and the soft decisions hopefully converge to the correct hard decisions. See the articles by Bahl et al. [10] and Benedetto et al. for details on the conventional SISO algorithm [17]. Let us now comment on the outer code. Today, almost a decade since turbo codes first appeared on the scene, extrinsic information transfer (EXIT) charts [152] are commonly used to provide design guidelines for the constituent codes. In this type of chart, the exchange of extrinsic information between the constituent decoders is visualized as a decoding trajectory. The method offers a numerical analysis of the convergence behavior of iterative detection, and it can further predict the error performance of interleaved concatenated codes. See also the analysis method proposed by El Gamal and Hammons [61], as well as the work by Divsalar, Dolinar, and Pollara [43]. Our primary objective is not to design an excellent overall system for communication over unknown time-dispersive waveform channels, but merely to provide some insight into the design issues. Rather, we have chosen to focus on the development of a soft decision decoding algorithm. This does not mean that we can ignore the design of the outer code—it only means that we have no intention of finding the best possible outer code. Indeed, we will see in the next section that the simple (7, 5) convolutional code performs poorly, whereas the weaker rate 1/2 repetition code yields good performance. An EXIT chart should be able to explain this difference in performance, and such a study is part of future work. For now, we conclude that each constituent decoder can be associated with a mutual information (MI) transfer function in the EXIT chart [152]. A tunnel opens between the two functions at the decoding threshold (sometimes called the turbo cliff position, or the pinch-off limit)—normally defined as the minimum SNR for which the bit error rate approaches zero as the number of iterations increases (to infinity)—under the assumption of an infinite interleaver size, and under the assumption that the two transfer functions converge to one-one in MI [152, 61, 43]. Since our inner decoder only computes approximate APPs, it is not certain that its associated MI transfer function reaches one-one in MI. Anyhow, we conjecture that the tunnel does not open for a (7, 5) convolutional code as outer code. Instead, a rate 1/2 repetition code (which corresponds to a linear MI transfer function that goes from (0, 0) to (1,1) along the first diagonal of the EXIT chart [153]) supposedly opens up a tunnel. Finally, it should be noted that the
112
Chapter 5
EXIT chart method has the same weakness as conventional Monte Carlo simulations in that it also relies on the implementation of the constituent decoders—it does not provide analytical exactness. Finally, let us mention that the SISO algorithm becomes trivial for a rate 1/2 repetition code, since such a code has a single state as illustrated in Figure 5.5. ck,1 = uk uk
0/00
1/11
ck,2 = uk
(a)
(b)
Figure 5.5: (a) Rate 1/2 repetition encoder. (b) State diagram of a rate 1/2 repetition code. The transitions are labeled uk /ck,1 ck,2 . In consequence, there is no need for a forward recursion or a backward recursion. As an example, consider the following a priori information on the two coded bits ck,1 and ck,2 (at time k) Pr(ck,1 = 0) = 1 − p, Pr(ck,2 = 0) = 1 − q, Pr(ck,1 = 1) = p, Pr(ck,2 = 1) = q. We can then easily calculate the probabilities of the information bit uk , Pr(uk = 0) ∝ (1 − p)(1 − q), Pr(uk = 1) ∝ p q, which is normalized to Pr(uk = 0) = Pr(uk = 1) =
(1 − p)(1 − q) , (1 − p)(1 − q) + p q pq . (1 − p)(1 − q) + p q
113 Next, consider the extrinsic information on the coded bits, 1−q (1 − p)(1 − q) + p q 1−p = 0) = (1 − p)(1 − q) + p q q = 1) = (1 − p)(1 − q) + p q p = 1) = (1 − p)(1 − q) + p q
Pre (ck,1 = 0) ∝ Pr(uk = 0) ÷ Pr(ck,1 = 0) =
a,
Pre (ck,2 = 0) ∝ Pr(uk = 0) ÷ Pr(ck,2
b,
Pre (ck,1 = 1) ∝ Pr(uk = 1) ÷ Pr(ck,1 Pre (ck,2 = 1) ∝ Pr(uk = 1) ÷ Pr(ck,2
c, d,
which is normalized to Pre (ck,1 = 0) = a ÷ (a + c) = 1 − q, Pre (ck,2 = 0) = b ÷ (b + d) = 1 − p, Pre (ck,1 = 1) = c ÷ (a + c) = q, Pre (ck,2 = 1) = d ÷ (b + d) = p. We see that the two incoming a priori probabilities p and q are just swapped and propagated back as extrinsic information from the outer SISO.5 Thus, the outer SISO can be subsumed into the deinterleaving/interleaving process in Figure 5.4, which means that the inner SISO module is concatenated with a simple reordering device as shown in Figure 5.6 below. The generalization to arbitrary rate repetition codes is straightforward [44]. Reordering ✛ ✲ device
Inner SISO
✛
Figure 5.6: Block diagram of the iterative detector when a rate 1/2 repetition code is used as outer encoder. Highly complex systems have previously been proposed for communication in the presence of parametric and nonparametric uncertainty [113, 53, 69]. These proposals suggest a turbo encoder (i.e., a configuration comprising an interleaver and two concatenated encoders) as outer constituent 5 The fact the outer constituent decoder is a plain swapper explains why its MI transfer function is represented by a diagonal line in the EXIT chart [153].
114
Chapter 5
encoder, which leads to a rather intricate detector structure. In contrast, our example system is extremely simple, with the sole exception of the inner SISO algorithm.
5.3
Numerical Results
As a benchmark system on the transmitter side, we will consider a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. 25 pilot symbols are inserted in the beginning and in the end of the block. Our standard receiver configuration keeps 32 nodes in the forward node vector, as well as in the backward node vector. In general, we will only use one consistent sequence to generate soft information on each edge, and this is done for complexity reasons. Figure 5.7 shows the simulated probability of making an erroneous decision on an information bit as a function of the SNR when our standard configuration (defined in the previous paragraph) is employed for communication over Channel A. By repeating the tree search three times, i.e., by feeding back the extrinsic information from the outer SISO module twice, we gain slightly more than two dB in SNR. This gain is very much the same as the iteration gain obtained by conventional turbo codes over AWGN channels [21], which is pleasing. On the other hand, iterating more than twice is not especially meaningful, since we have observed that the extrinsic information entropy is close to zero after the second iteration, i.e., practically hard decisions are circulating between the inner and outer SISO modules after the second iteration. As an example, if we iterate a third time at 3 dB in SNR, the error rate decreases from 4.4% to 4.2%, and such a small improvement could hardly be seen in Figure 5.7. This is different to iterative detection of interleaved concatenated codes over AWGN channels, where as many as 10 or 15 iterations may be fruitful in order to improve the error performance [21, 18].6 The fast convergence that we experience for our system is due to the approximations in the inner SISO algorithm and the properties of the joint inner code (confer the previous discussion on the weak recursive nature of the joint inner code). In any case, a great number of iterations is not reasonable because of the high 6 Codes that operate very close to the Shannon limit may require more than 100 iterations [153].
115 10
Bit error rate
10
10
10
10
0
Iteration 0 Iteration 1 Iteration 2 −1
−2
−3
−4
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.7: Error performance over Channel A for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
complexity involved in each iteration. Moreover, let us recall one observation from Chapter 2, where generalized ML detection of short blocks was investigated. It was seen how the error performance was very sensitive to the particular channel, with a huge discrepancy (over 20 dB in SNR) between the best and the worst case. As already concluded, a more robust performance is desirable. The bit error probability obtained by simulating the standard configuration over Channel B, Channel C, Channel D, and Channel E is reported in Figure 5.8, Figure 5.9, Figure 5.10, and Figure 5.11, respectively. The gain from iterating twice is obviously more than two dB in SNR for all the five example channels, while the SNR that is required to attain 10−3 in error probability varies between 5.7 (Channel C) and 7.25 (Channel D). To conclude, our SISO algorithm seems reasonably robust to the particular shape of the channel impulse response.
116
Chapter 5 10
Bit error rate
10
10
10
10
0
Iteration 0 Iteration 1 Iteration 2 −1
−2
−3
−4
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.8: Error performance over Channel B for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search. 10
Bit error rate
10
10
10
10
0
Iteration 0 Iteration 1 Iteration 2 −1
−2
−3
−4
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.9: Error performance over Channel C for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
117 10
Bit error rate
10
10
10
10
0
Iteration 0 Iteration 1 Iteration 2 −1
−2
−3
−4
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.10: Error performance over Channel D for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search. 10
Bit error rate
10
10
10
10
0
Iteration 0 Iteration 1 Iteration 2 −1
−2
−3
−4
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.11: Error performance over Channel E for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
118
Chapter 5
As a reference, Figure 5.12 further presents the error performance over a discrete-time channel with eight real-valued, identical coefficients. This discrete-time channel is normalized such that the sum of the squared coefficients equals one. We find that the performance slightly deteriorates compared to our five previous examples. In particular, 7.5 dB in SNR is required to attain 10−3 in BER, whereas 7.25 dB was sufficient for the worst case continuous-time example channel (Channel D). The iteration gain for this discrete-time channel is close to 3 dB. Figure 5.13 on the next page summarizes the performance over our different example channels by showing the simulated BER after three decoding iterations assuming our standard configuration (i.e., a rate 1/2 repetition code in series with a 512-bit interleaver and MSK, and a bi-directional search using one consistent sequence per edge, 2 × 32 nodes, and 2 × 25 pilot symbols).
10
Bit error rate
10
10
10
10
0
Iteration 0 Iteration 1 Iteration 2 −1
−2
−3
−4
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.12: Error performance over an eight-tap real-valued equal gain channel for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
119 The main conclusion to be drawn from Figure 5.7–5.13 is once again the robustness compared to the numerical results presented in Chapter 2. Recall that all examples are valid for a channel impulse response supported on [0, 2Ts ), and a greater delay spread may of course impair the performance since it leads to an LS problem of higher dimension. It could certainly be interesting to evaluate the performance over channels with substantial delay spread, but the proposed SISO algorithm is then rather expensive in terms of complexity, and further approximations would be necessary. A natural option is to replace the RLS algorithm with the considerably less complex least mean squares (LMS) algorithm [87, 33]. Such an approach may be explored in a future study. 0
10
Channel A Channel B Channel C Channel D Channel E Equal gain
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.13: Error performance after three decoding iterations of a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search. Compared to the AWGN channel, over which an error rate of 10−3 is attained at less than 3 db in SNR with (7,5)-coded MSK for a 512-bit interleaver [119, 120], there is a gap of about 3 dB for our standard config-
120
Chapter 5
uration. Let us now investigate various ways of improving the performance and/or closing this gap. First, consider the influence of the number of pilot symbols. This is done in Figure 5.14 below, where the error rate is seen to initially decrease rapidly as more and more pilot symbols are inserted, until the curve flattens out. Trivially, there is a point where we start to loose from inserting more pilot symbols, since an increased number of pilot symbols means that more energy is spent to transmit a certain amount of information. In Figure 5.14, we have used the block length K = 512, and for this block length, it is seen how the curve flattens out around 25 pilot symbols. Note that the forward and backward searches are initialized with 25 pilot symbols each, which means that 50 pilot symbols are used in total, or approximately 10% of the transmitted symbols.
0
10
Iteration 0 Iteration 1 Iteration 2
−1
Bit error rate
10
−2
10
−3
10
6
10
15
20 25 30 Number of pilot symbols
35
40
Figure 5.14: Influence of the number of pilot symbols. The ordinate shows the error rate over Channel A at 6 dB in SNR for a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
121 In larger blocks, more pilot symbols (in absolute numbers) may be inserted before the performance is impaired. Still, we have found that the BER curve starts to flatten out after around 20 pilot symbols. The main reason for going up in block size is hence not because we would like to insert a few more pilot symbols without significantly degrading the overall code rate. Rather, we hope to achieve an interleaver gain. The effects of an increased interleaver size is shown in Figure 5.15, where the number of pilot symbols was fixed to 25. At a bit error probability of 10−3 we gain about 1.1 dB in SNR when the interleaver size is increased from 256 bits to 1024 bits. This result is similar to the gain obtained by interleaved (7, 5)-coded MSK over AWGN channels [119, 120]. In our case, however, the gain is not only due to the increased interleaver. As already pointed out, there is a small gain from the increased energy per information symbol
0
10
256−bit interleaver 512−bit interleaver 1024−bit interleaver
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.15: Influence of the interleaver size. The ordinate shows the error rate over Channel A after three decoding iterations of a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
122
Chapter 5
(the ratio of pilot symbols becomes smaller as the block size is extended). Further, recall that longer sequences yields more accurate estimates, and this is also expected to improve the performance somewhat. A drawback of large interleavers is the inevitable decoding delay, since it is necessary to wait until the whole block has been received before the deinterleaver can permute and deliver the information to the outer SISO module. Another approach is to try different encoders. See Figure 5.16 in which a rate 1/2 repetition code is compared with a convolutional code whose generators are 7 and 5 in octal notation. Clearly, the (7, 5) convolutional code performs poorly both in terms of the iteration gain and the absolute BER. This plot merely exemplifies the importance of choosing a proper outer encoder—to provide design guidelines for the outer constituent encoder lies beyond the scope of this thesis. Yet, let us recall that EXIT chart techniques offer a more direct way of analyzing the convergence properties 0
10
(7,5) convolutional code Rate 1/2 repetition code −1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.16: Influence of the outer code. The ordinate shows the error rate over Channel A after the first three decoding iterations of two different outer codes in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
123 of the outer code compared to Monte Carlo simulating the whole system [152]. Confer the discussion in Section 5.2. Furthermore, our inner SISO algorithm is general and could be employed to other modulation formats than MSK. For AWGN channels, some interleaved concatenated higher order CPM schemes have been proven to achieve better power efficiency than interleaved concatenates MSK [121]. This is also true for statistically parameterized Rayleigh fading channels [Anders Hansson, unpublished work]. It would be straightforward to extend the present results to higher order CPM, but in order to keep the complexity as low as possible, we have chosen to work with MSK. If we are willing to invest more complexity, it may be more rewarding to change the outer code to a turbo code, as in [113, 53, 69]. Next, Figure 5.17 shows how very crucial the completion term is. The dashed curves represent the error rates of the first three decoding iterations 0
10
Without completion term With completion term
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.17: Importance of the completion term. The ordinate shows the error rate over Channel A after the first three decoding iterations of a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 32 nodes and 25 pilot symbols are used in the forward/backward search.
124
Chapter 5
when the completion term is neglected, i.e., when the completion term is set to zero. From the last two plots it can concluded that both the structure of the outer encoder and the exactness of the sequence metrics play important roles and must be handled with care. One parameter that has an insignificant effect on the error performance is the number of nodes in the forward/backward node vector, i.e., the number of forward/backward survivor sequences at any given time index. This is seen in Figure 5.18 below, where we compare our reference system using either 32 or 128 nodes. 0
10
32 nodes 128 nodes
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.18: Influence of the number of nodes in the forward/backward node vector. The ordinate shows the error rate over Channel A after three decoding iterations of a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. One sequence is completed through each edge. 25 pilot symbols are used in the forward/backward search. It may seem a bit astonishing that the gain from including an additional 96 (= 128 − 32) forward/backward survivor sequences is so modest, but these new survivor sequences are obviously not able to contribute very much (their metrics are high). Put differently, 32 survivor sequences are
125 enough to capture the essential information. It may of course be necessary to use more than 32 nodes if the delay spread (i.e., the dimensionality of the LS problem) is large, but our conclusion is that there is not to be gained from increasing the number of nodes beyond a certain threshold (which happens to be 32 nodes in our example). Confer a related study in which we devised a similar bi-directional search technique for the purpose of reduced-complexity iterative detection [23]. So far, we have identified the interleaver size as a key parameter for improving the error performance. Are there other ways of obtaining higher power efficiency? The answer is yes, as seen from Figure 5.19 below and Figure 5.20 on the next page. These two plots illustrate how the bit error probability decreases as more than one consistent sequence is used for computing soft information on each survivor edge. The discrepancy between one and two sequences is 0.6 dB in SNR at 10−3 in BER, while the additional gain is insignificant when four sequences is used instead of two. 0
10
One sequence Two sequences
−1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.19: Influence of the number of sequences that are completed through each edge. The ordinate shows the error rate over Channel A after the first three decoding iterations of a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. 32 nodes and 25 pilot symbols are used in the forward/backward search.
126
Chapter 5 0
10
Two sequence Four sequences −1
Bit error rate
10
−2
10
−3
10
−4
10
0
1
2
3 4 5 6 Signal−to−noise ratio [dB]
7
8
9
Figure 5.20: Influence of the number of sequences that are completed through each edge. The ordinate shows the error rate over Channel A after the first three decoding iterations of a rate 1/2 repetition code in series with a 512-bit interleaver and MSK. 32 nodes and 25 pilot symbols are used in the forward/backward search.
Chapter 6
Conclusions and Suggestions for Future Work If I have not seen as far as others, it is because giants were standing on my shoulders. Hal Abelson, paraphrasing Sir Isaac Newton’s correspondence to fellow scientist Robert Hooke on February 5, 1676, “If I have seen further it is by standing on the shoulders of giants.”
A
semiblind algorithm that delivers soft decisions on symbols transmitted over an unknown time-dispersive waveform channel was derived. Special attention was paid to the discretization problem, i.e., the derivation of an equivalent (discrete-time) vector channel. We discussed how the notion of a sufficient statistic (in the sense of L2 completeness) is involved with difficulties and not directly applicable for complexity-constrained detection. Instead, by confining us to the space of time-limited and essentially band-limited functions we were able to put the quest for an optimal discretization scheme on a more rigorous basis. Next, the generalized maximum-likelihood concept was reviewed and employed for optimal sequence detection. It was seen how such an optimal algorithm needs to operate in a search tree whose size grows exponentially with the block length. In order to reduce the complexity, we presented a pruning scheme based on the add-compare-select steps of the Viterbi algorithm in combination with per-survivor processing. The obvious weakness of this suboptimal approach is that we risk to loose the sequence(s) with 127
128
Chapter 6
minimum metric(s), and in consequence, we risk estimation error propagation and even error escalation (an erroneous estimate favors an erroneous data sequence, which, in turn, leads to a worse estimate). As a means of mitigating this problem, we partitioned the sequence metric into a forward term, a backward term, and a penalty term, where the forward and backward terms were computed in a recursive fashion by means of the recursive least-squares algorithm. The purpose of this approach was twofold. First, we hoped to benefit from estimation diversity, i.e., we hoped that either the forward sequence metric or the backward sequence metric could be associated with a good estimate of the unknown parameter vector. When the forward sequences were completed with the backward sequences, the penalty term increased the total sequence metric whenever there was a divergence between the associated forward and backward estimation processes. Second, our bi-directional search method was able to provide a fairly large number of diversified sequences at reasonable cost. It should be noted that the conventional forward-backward algorithm (which handles an astronomical number of sequences) is not directly applicable to our problem, since it relies on the concept of mergers, i.e., it relies on a trellis description. The derived algorithm was finally evaluated by considering iterative detection of interleaved serially concatenated codes. The outer code was a simple rate 1/2 repetition code, while minimum-shift keying constituted the inner encoder. This system lead to a trivial decoding rule concerning the outer code, and the receiver complexity was thus determined by the inner soft decoder, i.e., our generalized APP algorithm. Concerning future work, it could be interesting to evaluate other pruning schemes, such as one based on the M-algorithm [6], or more generally, one based on the SA(B, C) [8]. One could further try to generate additional sequences using ad hoc heuristic search strategies, similar to the methods proposed by Lieberman and Allebach [105]. Since the RLS algorithm is computationally expensive, it is desirable to employ some alternative algorithm with similar convergence properties but lower complexity [87]. An obvious choice is the LMS estimator [87, 33]. We should provide some analytical bounds on the investigated algorithm. It is also interesting to devise optimized codes, which could supposedly be done with the help of an EXIT chart [152]. Moreover, we should investigate the error performance and tracking capability of our algorithm when the unknown channel is time-varying.
129 To conclude, the strength of the presented material is its generality. Although we have focused on point-to-point digital communications, there are a great number of relevant applications in fields as diverse as exploratory seismology, machine learning, data mining, and econometrics. More obvious, some of the employed concepts can be used in other communication problems, such as low-complexity iterative detection. Since the standard forward-backward algorithm is computationally expensive for a large trellis, one could perform a bi-directional search in this large trellis—very much the same as our bi-directional tree search—to reduce the size/complexity [23, 29].
130
Chapter 6
Appendix A
Details of the Metric Recursion The aim of this appendix is to derive a recursion for the LS metric. Various recursive expressions for the LS metric are known [31, 33, 149]. We will generalize the form presented in [31]. Let us substitute the forward estimate, given by (3.5) and (3.4),
k *k = P *k q * k * k H Xk rk , h 0 0 * 0 = P0 B 0 0 0
(A.1)
in the weighted forward LS metric (3.1), * k 2 . * k ) (Xk )1/2 rk − B *k h * k0 ; x ≡ Λx (rk0 , B Λ 0 0 0 0 0
(A.2)
The alternative form of the metric becomes * k ) = (Xk )1/2 Q * k rk 2 , Λx (rk0 , B 0 0 0 0
(A.3)
expressed in the error projection matrix
* k * k H Xk . *k P *k I − B Q 0 0 0 B0 0
(A.4)
In order to obtain a recursion for the LS metric, we first set out to find a * k. recursion for Q 0 * k, Rewrite Q 0 * *k = I − B *k C Q (A.5) 0 0 k, 131
132
Appendix A
* k, in terms of C
k H k *k P * *k B C X0 , 0 0
(A.6)
* k in (3.12), and recall that we have already found a recursion for P 0
k−1 k −1 * =x I−G *k P *k B * P (A.7) 0 0 . * k, * k in C Use this recursion to substitute P 0
k H k
k−1 k H k *k B *k P * *k B * * *k P X0 = x−1 I − G X0 . B C 0 0 0 0
(A.8)
*k = * k recursively by applying the partitioning B can now express C 0 k−1 We k−1 T T T k * * and X0 = diag xX0 , I , (B0 ) Bk ( , xXk−1 0
k−1 ' k−1 H −1 H 0 * * * * * * Bk (B0 ) Ck = x I − Gk Bk P0 0 I , k−1 k−1 H k−1
k−1 H
* *k P *k P * * *k B * *k B = I−G (B ) X x−1 I − G B 0
=
'
0
* k−1 *k C *k B I−G
0
0
( *k , G
k
(A.9)
* k from (A.6), as well as the form of G *k where we used the definition of C given in (3.14). Next, return to the expression for the error projection matrix in (A.5), * k , given by (A.9), and employ our newly derived recursion for C , - * k−1 ' ( , Q Q
B0 I 0 1 2 k * * * * * Q0 = . I − Gk Bk Ck−1 Gk = − * Q Q 0 I 3 4 Bk (A.10) Here, the sub-matrices Q1 , Q2 , Q3 , and Q4 are found by identification, * k−1 C * k−1 G * k−1 + B *k B *k C * k−1 Q1 = I − B 0 0 k−1 k−1 *k B *k C * k−1 , * * G = Q +B 0
0
* k−1 G * k, Q2 = −B
0 *k G *k C *k B * k−1 , Q3 = − I − B *k G * k. Q4 = I − B
(A.12) (A.13) (A.14)
* k , which means that *k V *k = U Equation (3.23) stated that G
−1 *k − B *k V *k = V *k V * k, * −1 V *k U * −B *k U Q4 = V k
(A.11)
k
(A.15)
133
*k xI + B *k U * k −1 , which yields while (3.22) defined V * k. Q4 = x V
(A.16)
It is then possible to rewrite Q3 as *k B *k C * k−1 . Q3 = −x V
(A.17)
* k , we are now ready to After having found a desired recursion for Q 0 return to the metric in (A.3), * k rk 2 = (Xk )1/2 Q * k rk H (Xk )1/2 Q * k rk , * k ) (Xk )1/2 Q Λx (rk0 , B 0 0 0 0 0 0 0 0 0 0 (A.18) which contains the factor , (k−1)/2 Q1 Q2 x1/2 X0 0 k 1/2 * k k k 1/2 k r0 = (X0 ) Q0 r0 = (X0 ) Q3 Q4 0 I , - , * k−1 G * k−1 G *k B *k C * k−1 −B *k * k−1 + B a Q rk−1 0 0 0 0 = , × *k B *k *k C * k−1 b rk −x V xV (A.19) where the two sub-matrices a and b are easily identified, ' ( (k−1)/2 * k−1 k−1 * k−1 G *k C * k rk − B * k−1 rk−1 , (A.20) a = x1/2 X0 Q0 r0 − B 0 0 ' ( * k−1 rk−1 . *k C * k rk − B (A.21) b = xV 0 Consider the expression within the square brackets,
k−1 H k−1 k−1 * k−1 *k C *k P *k h * k−1 rk−1 = rk − B * k−1 B * X0 r0 = rk − B rk − B 0 0 0 0 =* ek−1 ,
(A.22)
where we employed (A.6), (A.1), and (3.19), respectively. Once again, look at the forward LS metric (cf. (A.18) and (A.19)), * k) Λx (rk0 , B 0
, H H a = aH a + bH b, = a b b
(A.23)
134
Appendix A
and expand the first term by means of (A.20) and (A.22), ' (H ' ( k−1 * k−1 k−1 * k−1 rk−1 − B * k−1 G * k−1 G *k* *k* e e X r − B aH a = x Q Q k−1 k−1 0 0 0 0 0 0 0 k−1 k−1 H k−1 k−1 k−1 * * Q =x Q r0 X0 r0 0 0 k−1 k−1 H k−1 k−1 * *k* * ek−1 G − x Q r0 X0 B 0 0 H k−1 k−1 k−1 k−1 *k* * * ek−1 G − x B X0 Q r0 0 0 k−1 H k−1 k−1 * *k* * *k* ek−1 ek−1 G G + x B X0 B 0 0 = a1 − a2 − a3 + a4 , where
(A.24)
1/2 * k−1 k−1 2 , a1 = x (Xk−1 Q0 r0 0 ) a2 = a3 = a4 =
aH 3 , *H x* eH k−1 Gk *H x* eH k−1 Gk
(A.25) (A.26)
* k−1 )H (B 0 H * (Bk−1 0 )
* k−1 rk−1 , X0k−1 Q 0 0 k−1 * k−1 * ek−1 . X 0 B 0 Gk *
(A.27) (A.28)
*H *k = P *k B The term a3 can be rewritten by means of the relation G 0 k given in (3.15), * * k * k−1 H k−1 Q * k−1 rk−1 , eH a3 = x * k−1 Bk P0 (B0 ) X0 0 0
(A.29)
* k is trivially Hermitian as seen from the where it was also used that P 0 definition in (3.3). Next, employ the recursion (A.7) and invoke (A.6), ' k−1 k−1 H k−1 ( k−1 k−1 * *k I − G *k P *k B * * eH (B r0 a3 = * B Q k−1 0 0 ) X0 0 * * k−1 rk−1 . * * * = * eH (A.30) k−1 Bk I − Gk Bk Ck−1 Q0 0 * k−1 rk−1 is an all-zero vector. This is seen by first re* k−1 Q The factor C 0 0 calling that the metric
* 2 = rk − B * H Xk rk − B * *k h *k h *k h (A.31) Lk0 (Xk0 )1/2 rk0 − B 0 0 0 0 0 0 *=h * k , i.e., is minimum when h 0 & & ∂Lk0 & * k *k * k )H Xk rk + (B * k )H Xk B = −(B & 0 0 0 0 0 0 h0 = 0 ∗ * & ∂ h h= * h *k 0 * k = 0, *k h * k )H Xk rk − B ⇐⇒ (B 0
0
0
0
0
(A.32)
135 which is known as the principle of orthogonality [50]. Moreover, (A.1) and *k = C * k rk , so (A.32) implies that (A.6) show that h 0 0 k *k H k *k k * k )H Xk I − B * *k C (A.33) (B 0 0 0 k r0 = (B0 ) X0 Q0 r0 = 0, * k (which is where (A.5) was used. Finally, multiply from the left by P 0 nonzero), and use (A.6) another time, * k rk = 0, *k Q C 0 0
(A.34)
which holds irrespective of the time index k. We have thus proved that the * k−1 rk−1 is zero, which means that both a3 and a2 vanish. * k−1 Q factor C 0 0 Now, start to manipulate the term a4 by using the definition in (3.3), k H k k −1 * * k = (B * ) X B P , 0 0 0 0 * H * k−1 −1 G *k* ek−1 . eH a4 = x * k−1 Gk (P0 )
(A.35)
* k , and (3.21), *k V *k = U Rewrite the new expression by invoking (3.23), G k−1 * H * * Uk = P0 Bk , * H * * k−1 B *H V *k * ek−1 , a4 = x * eH k−1 Vk Bk P0 k
(A.36)
* k−1 is Hermitian as seen from (3.3). where it was used that P 0 Next, expand the second term in (A.23) by means of (A.21) and (A.22), * H * ek−1 . eH bH b = x2 * k−1 Vk Vk *
(A.37)
The weighted forward LS metric in (A.3) can finally be expressed by means of (A.23), (A.24), (A.25), (A.36), and (A.37), * k ) = (Xk )1/2 Q * k rk 2 = aH a + bH b = a1 + a4 + bH b Λx (rk0 , B 0 0 0 0 * k−1 rk−1 2 = x (Xk−1 )1/2 Q =
0 0 0 H H * * k−1 * H * * *H ek−1 + x2 * eH + x* ek−1 Vk Bk P0 Bk Vk * k−1 Vk * k−1 x Λx (rk−1 0 , B0 ) ' ( *H B * k−1 B *H + xI V *k P *k * ek−1 V + x* eH k−1 k k 0
* k−1 * H * −1 * ek−1 = x Λx (rk−1 eH k−1 Vk Vk Vk * 0 , B0 ) + x * *H * * k−1 ) + x * ek−1 , = x Λx (rk−1 , B eH V 0
0
k−1
k
*k * ek−1 V
(A.38)
136
Appendix A
*k = P * k−1 B * k = (x I + B *k U * H , and (3.22), V * k )−1 . Also, if we use (3.21), U 0 k * k is Hermitian, which is obvious from (3.21) and (3.22), so the matrix V the metric increment may be written as ˜k x * * ek−1 . xλ eH k−1 Vk *
(A.39)
Appendix B
Details of the Metric Partitioning In this appendix, we will prove how it is possible to partition the LS metric into (I) a term that involves the past through an associated forward estimate, (II) a term that involves the future through an associated backward estimate, and (III) a completion term that functions as a penalty for mismatching forward/backward estimates. In our calculations, we will employ the Sherman-Morrison-Woodbury (SMW) formula [50] in order to split the LS expression. This idea was first sketched in [4]. Just like in Chapter 3, we consider a weighted LS metric for generality [32],
* ML,x 2 * K−1 ) (XK−1 )1/2 rK−1 − B * K−1 h * K−1 ≡ Λx (rK−1 , B Λ 0;x 0 0 0 0 0
* ML,x H XK−1 rK−1 − B * ML,x , (B.1) * K−1 h * K−1 h = rK−1 −B 0 0 0 0 0 , and this time we define an N K × N K forgetting matrix XK−1 0 , k X0 0 K−1 , X0 0 xXK−1 k+1
(B.2)
comprising the N (k + 1) × N (k + 1) past weight matrix Xk0 defined in (3.2) and the N (K − k − 1) × N (K − k − 1) future weight matrix in (3.33). By defining an N L × N L matrix '
(−1 * n H Xn B *n *n B , 0 ≤ m < n ≤ K − 1, (B.3) P m m m m 137
138
Appendix B
and an N L × 1 column vector
n H n n * *nm B q Xm rm , 0 ≤ m < n ≤ K − 1, m
(B.4)
the weighted ML estimate can be expressed as * ML,x = P * K−1 q *K−1 , h 0 0
(B.5)
which is a straightforward extension of the results in Chapter 2. Likewise, in Chapter 2 we introduced a weighted ML forward estimate¹

$$
\tilde{\mathbf{h}}_0^k = \tilde{\mathbf{P}}_0^k \tilde{\mathbf{q}}_0^k
\quad \Longleftrightarrow \quad
(\tilde{\mathbf{P}}_0^k)^{-1} \tilde{\mathbf{h}}_0^k = \tilde{\mathbf{q}}_0^k, \tag{B.6}
$$

and a weighted ML backward estimate

$$
\tilde{\mathbf{h}}_{k+1}^{K-1} = \tilde{\mathbf{P}}_{k+1}^{K-1} \tilde{\mathbf{q}}_{k+1}^{K-1}
\quad \Longleftrightarrow \quad
(\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1} = \tilde{\mathbf{q}}_{k+1}^{K-1}. \tag{B.7}
$$

Now, by first partitioning $\tilde{\mathbf{B}}_0^{K-1} = \big[ (\tilde{\mathbf{B}}_0^k)^T \;\; (\tilde{\mathbf{B}}_{k+1}^{K-1})^T \big]^T$, we can easily partition $(\tilde{\mathbf{P}}_0^{K-1})^{-1}$,

$$
\begin{aligned}
(\tilde{\mathbf{P}}_0^{K-1})^{-1}
&= \big[ (\tilde{\mathbf{B}}_0^k)^H \;\; (\tilde{\mathbf{B}}_{k+1}^{K-1})^H \big]
\begin{bmatrix} \mathbf{X}_0^k & \mathbf{0} \\ \mathbf{0} & x \mathbf{X}_{k+1}^{K-1} \end{bmatrix}
\begin{bmatrix} \tilde{\mathbf{B}}_0^k \\ \tilde{\mathbf{B}}_{k+1}^{K-1} \end{bmatrix} \\
&= (\tilde{\mathbf{B}}_0^k)^H \mathbf{X}_0^k \tilde{\mathbf{B}}_0^k
+ x (\tilde{\mathbf{B}}_{k+1}^{K-1})^H \mathbf{X}_{k+1}^{K-1} \tilde{\mathbf{B}}_{k+1}^{K-1}
= (\tilde{\mathbf{P}}_0^k)^{-1} + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1}.
\end{aligned} \tag{B.8}
$$

Likewise, if we also partition $\mathbf{r}_0^{K-1} = \big[ (\mathbf{r}_0^k)^T \;\; (\mathbf{r}_{k+1}^{K-1})^T \big]^T$, we get

$$
\tilde{\mathbf{q}}_0^{K-1}
= (\tilde{\mathbf{B}}_0^k)^H \mathbf{X}_0^k \mathbf{r}_0^k
+ x (\tilde{\mathbf{B}}_{k+1}^{K-1})^H \mathbf{X}_{k+1}^{K-1} \mathbf{r}_{k+1}^{K-1}
= \tilde{\mathbf{q}}_0^k + x \tilde{\mathbf{q}}_{k+1}^{K-1}
= (\tilde{\mathbf{P}}_0^k)^{-1} \tilde{\mathbf{h}}_0^k
+ x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}, \tag{B.9}
$$

where we used the rightmost relations in (B.6) and (B.7). The estimate $\tilde{\mathbf{h}}_{\mathrm{ML},x}$ in (B.5) can then be rewritten by means of (B.8) and (B.9),

$$
\tilde{\mathbf{h}}_{\mathrm{ML},x}
= \big( (\tilde{\mathbf{P}}_0^k)^{-1} + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \big)^{-1}
\big( (\tilde{\mathbf{P}}_0^k)^{-1} \tilde{\mathbf{h}}_0^k + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1} \big). \tag{B.10}
$$

Next, we will partition this estimate (into a past term, a future term, and a residual term) by repeated use of the SMW formula. This identity exists in various forms, and we will first apply the variant

$$
(\mathbf{A} + \mathbf{B})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1} \big( \mathbf{A}^{-1} + \mathbf{B}^{-1} \big)^{-1} \mathbf{A}^{-1}. \tag{B.11}
$$

¹The indices ML and x are suppressed for notational brevity.
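Since (B.11) is less common than the standard Woodbury form, a quick numerical sanity check may be reassuring; the sketch below uses random Hermitian positive-definite A and B of illustrative size.

import numpy as np

# Numerical check of the SMW variant (B.11).
rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M @ M.conj().T + n * np.eye(n)       # Hermitian positive definite
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = M @ M.conj().T + n * np.eye(n)

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = np.linalg.inv(A + B)
rhs = Ai - Ai @ np.linalg.inv(Ai + Bi) @ Ai
assert np.allclose(lhs, rhs)
print("SMW variant (B.11) verified")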
Applying (B.11) with $\mathbf{A} = (\tilde{\mathbf{P}}_0^k)^{-1}$ and $\mathbf{B} = x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1}$ yields

$$
\begin{aligned}
\tilde{\mathbf{h}}_{\mathrm{ML},x}
&= \Big[ \tilde{\mathbf{P}}_0^k
- \tilde{\mathbf{P}}_0^k \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{P}}_0^k \Big]
\big( (\tilde{\mathbf{P}}_0^k)^{-1} \tilde{\mathbf{h}}_0^k + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1} \big) \\
&= \tilde{\mathbf{h}}_0^k
+ x \tilde{\mathbf{P}}_0^k (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}
- \tilde{\mathbf{P}}_0^k \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{h}}_0^k \\
&\quad - x \tilde{\mathbf{P}}_0^k \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{P}}_0^k (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}.
\end{aligned} \tag{B.12}
$$

Once again, apply the SMW formula to the last term in (B.12), now with $\mathbf{A} = \tilde{\mathbf{P}}_0^k$ and $\mathbf{B} = x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1}$. The last term then takes the form

$$
\begin{aligned}
&-x \tilde{\mathbf{P}}_0^k \Big[ (\tilde{\mathbf{P}}_0^k)^{-1}
- (\tilde{\mathbf{P}}_0^k)^{-1} \big( (\tilde{\mathbf{P}}_0^k)^{-1} + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \big)^{-1} (\tilde{\mathbf{P}}_0^k)^{-1} \Big]
\tilde{\mathbf{P}}_0^k (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1} \\
&= -x \tilde{\mathbf{P}}_0^k (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}
+ x \big( (\tilde{\mathbf{P}}_0^k)^{-1} + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \big)^{-1} (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1},
\end{aligned} \tag{B.13}
$$

which, in turn, means that we can rewrite the ML estimate in (B.12) as

$$
\tilde{\mathbf{h}}_{\mathrm{ML},x}
= \tilde{\mathbf{h}}_0^k
- \tilde{\mathbf{P}}_0^k \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{h}}_0^k
+ x \big( (\tilde{\mathbf{P}}_0^k)^{-1} + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \big)^{-1} (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}. \tag{B.14}
$$

Next, consider the last term in (B.14) and use the SMW formula with $\mathbf{A} = x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1}$ and $\mathbf{B} = (\tilde{\mathbf{P}}_0^k)^{-1}$,

$$
\begin{aligned}
&x \big( (\tilde{\mathbf{P}}_0^k)^{-1} + x (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \big)^{-1} (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1} \\
&= x \Big[ x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1}
- x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big( x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} + \tilde{\mathbf{P}}_0^k \big)^{-1} x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \Big]
(\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1} \\
&= \tilde{\mathbf{h}}_{k+1}^{K-1}
- x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}.
\end{aligned} \tag{B.15}
$$

We have thus finally arrived at an elegant partitioning of the ML estimate,

$$
\tilde{\mathbf{h}}_{\mathrm{ML},x} = \tilde{\mathbf{h}}_0^k + \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c, \tag{B.16}
$$

if we define the completion estimate

$$
\tilde{\mathbf{h}}_k^c \triangleq
\tilde{\mathbf{P}}_0^k \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{h}}_0^k
+ x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big( \tilde{\mathbf{P}}_0^k + x^{-1} \tilde{\mathbf{P}}_{k+1}^{K-1} \big)^{-1} \tilde{\mathbf{h}}_{k+1}^{K-1}. \tag{B.17}
$$
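The partitioning (B.16)-(B.17) is straightforward to verify numerically. The following minimal sketch (illustrative names and sizes, not the thesis's software) uses one stacked past block and one stacked future block with unit within-block weights, and builds the forward, backward, joint, and completion estimates directly from their definitions.

import numpy as np

# Numerical check of the estimate partitioning (B.16)-(B.17).
# Bp/rp collect the past data (times 0..k), Bf/rf the future data
# (times k+1..K-1); x is the future weight from (B.2).
rng = np.random.default_rng(1)
Np, Nf, L, x = 8, 7, 3, 0.8

Bp = rng.standard_normal((Np, L)) + 1j * rng.standard_normal((Np, L))
rp = rng.standard_normal(Np) + 1j * rng.standard_normal(Np)
Bf = rng.standard_normal((Nf, L)) + 1j * rng.standard_normal((Nf, L))
rf = rng.standard_normal(Nf) + 1j * rng.standard_normal(Nf)

Phip = Bp.conj().T @ Bp                        # (P_0^k)^{-1}, cf. (B.3)
Phif = Bf.conj().T @ Bf                        # (P_{k+1}^{K-1})^{-1}
hp = np.linalg.solve(Phip, Bp.conj().T @ rp)   # forward estimate (B.6)
hf = np.linalg.solve(Phif, Bf.conj().T @ rf)   # backward estimate (B.7)

# Overall weighted ML estimate (B.10).
h_ml = np.linalg.solve(Phip + x * Phif, Phip @ hp + x * Phif @ hf)

# Completion estimate (B.17): with P = inv(Phi), S = (P_p + P_f/x)^{-1}.
Pp, Pf = np.linalg.inv(Phip), np.linalg.inv(Phif)
S = np.linalg.inv(Pp + Pf / x)
hc = Pp @ S @ hp + (Pf / x) @ S @ hf
assert np.allclose(h_ml, hp + hf - hc)         # (B.16)
print("estimate partitioning (B.16) verified")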
Let us now return to the weighted LS metric in (B.1),

$$
\begin{aligned}
\tilde{\Lambda}_{0;x}^{K-1}
&= \begin{bmatrix} \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_{\mathrm{ML},x} \\ \mathbf{r}_{k+1}^{K-1} - \tilde{\mathbf{B}}_{k+1}^{K-1} \tilde{\mathbf{h}}_{\mathrm{ML},x} \end{bmatrix}^H
\begin{bmatrix} \mathbf{X}_0^k & \mathbf{0} \\ \mathbf{0} & x \mathbf{X}_{k+1}^{K-1} \end{bmatrix}
\begin{bmatrix} \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_{\mathrm{ML},x} \\ \mathbf{r}_{k+1}^{K-1} - \tilde{\mathbf{B}}_{k+1}^{K-1} \tilde{\mathbf{h}}_{\mathrm{ML},x} \end{bmatrix} \\
&= \big\| (\mathbf{X}_0^k)^{1/2} \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_{\mathrm{ML},x} \big) \big\|^2
+ x \big\| (\mathbf{X}_{k+1}^{K-1})^{1/2} \big( \mathbf{r}_{k+1}^{K-1} - \tilde{\mathbf{B}}_{k+1}^{K-1} \tilde{\mathbf{h}}_{\mathrm{ML},x} \big) \big\|^2.
\end{aligned} \tag{B.18}
$$

Consider the first term and use (B.16),

$$
\begin{aligned}
\big\| (\mathbf{X}_0^k)^{1/2} \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_{\mathrm{ML},x} \big) \big\|^2
&= \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_0^k - \tilde{\mathbf{B}}_0^k (\tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c) \big)^H
\mathbf{X}_0^k
\big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_0^k - \tilde{\mathbf{B}}_0^k (\tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c) \big) \\
&= \big\| (\mathbf{X}_0^k)^{1/2} \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_0^k \big) \big\|^2
+ \big\| (\mathbf{X}_0^k)^{1/2} \tilde{\mathbf{B}}_0^k \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big) \big\|^2 \\
&\quad - \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_0^k \big)^H \mathbf{X}_0^k \tilde{\mathbf{B}}_0^k \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big)
- \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big)^H (\tilde{\mathbf{B}}_0^k)^H \mathbf{X}_0^k \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_0^k \big).
\end{aligned} \tag{B.19}
$$

Here, the last two terms are the Hermitian transpose of each other, and they are zero due to the principle of orthogonality in (A.32). The second term in (B.18) can be expressed in an equivalent form, and we thus arrive at

$$
\begin{aligned}
\tilde{\Lambda}_{0;x}^{K-1}
&= \big\| (\mathbf{X}_0^k)^{1/2} \big( \mathbf{r}_0^k - \tilde{\mathbf{B}}_0^k \tilde{\mathbf{h}}_0^k \big) \big\|^2
+ \big\| (\mathbf{X}_0^k)^{1/2} \tilde{\mathbf{B}}_0^k \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big) \big\|^2 \\
&\quad + x \big\| (\mathbf{X}_{k+1}^{K-1})^{1/2} \big( \mathbf{r}_{k+1}^{K-1} - \tilde{\mathbf{B}}_{k+1}^{K-1} \tilde{\mathbf{h}}_{k+1}^{K-1} \big) \big\|^2
+ x \big\| (\mathbf{X}_{k+1}^{K-1})^{1/2} \tilde{\mathbf{B}}_{k+1}^{K-1} \big( \tilde{\mathbf{h}}_0^k - \tilde{\mathbf{h}}_k^c \big) \big\|^2.
\end{aligned} \tag{B.20}
$$

Next, rewrite the second term in (B.20),

$$
\begin{aligned}
\big\| (\mathbf{X}_0^k)^{1/2} \tilde{\mathbf{B}}_0^k \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big) \big\|^2
&= \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big)^H (\tilde{\mathbf{B}}_0^k)^H \mathbf{X}_0^k \tilde{\mathbf{B}}_0^k \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big) \\
&= \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big)^H (\tilde{\mathbf{P}}_0^k)^{-1} \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big)
= \big\| (\tilde{\mathbf{P}}_0^k)^{-1/2} \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big) \big\|^2,
\end{aligned} \tag{B.21}
$$

and analogously, rewrite the fourth term in (B.20),

$$
x \big\| (\mathbf{X}_{k+1}^{K-1})^{1/2} \tilde{\mathbf{B}}_{k+1}^{K-1} \big( \tilde{\mathbf{h}}_0^k - \tilde{\mathbf{h}}_k^c \big) \big\|^2
= x \big\| (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1/2} \big( \tilde{\mathbf{h}}_0^k - \tilde{\mathbf{h}}_k^c \big) \big\|^2. \tag{B.22}
$$

Expressed in terms of the completion term (or binding term)

$$
\tilde{\Lambda}_k^c \triangleq
\big\| (\tilde{\mathbf{P}}_0^k)^{-1/2} \big( \tilde{\mathbf{h}}_{k+1}^{K-1} - \tilde{\mathbf{h}}_k^c \big) \big\|^2
+ x \big\| (\tilde{\mathbf{P}}_{k+1}^{K-1})^{-1/2} \big( \tilde{\mathbf{h}}_0^k - \tilde{\mathbf{h}}_k^c \big) \big\|^2, \tag{B.23}
$$

the metric in (B.20) may finally be written as

$$
\tilde{\Lambda}_{0;x}^{K-1}
\equiv \Lambda_x\big( \mathbf{r}_0^{K-1}, \tilde{\mathbf{B}}_0^{K-1} \big)
= \Lambda_x\big( \mathbf{r}_0^k, \tilde{\mathbf{B}}_0^k \big)
+ x \Lambda_x\big( \mathbf{r}_{k+1}^{K-1}, \tilde{\mathbf{B}}_{k+1}^{K-1} \big)
+ \tilde{\Lambda}_k^c
\equiv \tilde{\Lambda}_{0;x}^k + x \tilde{\Lambda}_{k+1;x}^{K-1} + \tilde{\Lambda}_k^c. \tag{B.24}
$$
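As with the estimate partitioning, (B.24) can be confirmed numerically. Below is a minimal sketch under the same illustrative past/future setup as before (one stacked past block, one stacked future block, unit within-block weights).

import numpy as np

# Numerical check of the metric partitioning (B.24).
rng = np.random.default_rng(2)
Np, Nf, L, x = 8, 7, 3, 0.8

Bp = rng.standard_normal((Np, L)) + 1j * rng.standard_normal((Np, L))
rp = rng.standard_normal(Np) + 1j * rng.standard_normal(Np)
Bf = rng.standard_normal((Nf, L)) + 1j * rng.standard_normal((Nf, L))
rf = rng.standard_normal(Nf) + 1j * rng.standard_normal(Nf)

Phip, Phif = Bp.conj().T @ Bp, Bf.conj().T @ Bf
hp = np.linalg.solve(Phip, Bp.conj().T @ rp)    # forward estimate
hf = np.linalg.solve(Phif, Bf.conj().T @ rf)    # backward estimate
h_ml = np.linalg.solve(Phip + x * Phif, Phip @ hp + x * Phif @ hf)
Pp, Pf = np.linalg.inv(Phip), np.linalg.inv(Phif)
S = np.linalg.inv(Pp + Pf / x)
hc = Pp @ S @ hp + (Pf / x) @ S @ hf            # completion estimate (B.17)

# Left-hand side of (B.24): the jointly minimized weighted metric (B.18).
lam_total = (np.linalg.norm(rp - Bp @ h_ml) ** 2
             + x * np.linalg.norm(rf - Bf @ h_ml) ** 2)

# Right-hand side: forward metric, weighted backward metric, and the
# completion (binding) term (B.23).
lam_fwd = np.linalg.norm(rp - Bp @ hp) ** 2
lam_bwd = np.linalg.norm(rf - Bf @ hf) ** 2
lam_c = ((hf - hc).conj() @ Phip @ (hf - hc)
         + x * (hp - hc).conj() @ Phif @ (hp - hc)).real
assert np.isclose(lam_total, lam_fwd + x * lam_bwd + lam_c)
print("metric partitioning (B.24) verified")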
References

[1] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 325–343, Mar. 2000.
[2] I. Altunbaş and K. R. Narayanan, "Serial concatenated coding schemes with minimum shift keying," Electronics Letters, vol. 37, no. 23, pp. 1393–1395, Nov. 2001.
[3] S. Amari, A. Cichocki, and H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems, David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, Eds., Cambridge, MA, 1996, vol. 8, pp. 757–763, MIT Press.
[4] A. Anastasopoulos, Adaptive soft-input soft-output algorithms for iterative detection, Ph.D. thesis, University of Southern California, Los Angeles, CA, Oct. 1999.
[5] A. Anastasopoulos and K. M. Chugg, "Adaptive soft-input soft-output algorithms for iterative detection with parametric uncertainty," IEEE Transactions on Communications, vol. 48, no. 10, pp. 1638–1649, Oct. 2000.
[6] J. B. Anderson, "Limited search trellis decoding of convolutional codes," IEEE Transactions on Information Theory, vol. 35, no. 5, pp. 944–955, Sept. 1989.
[7] J. B. Anderson, T. Aulin, and C.-E. Sundberg, Digital Phase Modulation, Plenum Press, New York, NY, 1986.
[8] T. M. Aulin, "Breadth-first maximum-likelihood sequence detection: Basics," IEEE Transactions on Communications, vol. 47, no. 2, pp. 208–216, Feb. 1999.
[9] E. Baccarelli and R. Cusani, "Combined channel estimation and data detection using soft statistics for frequency-selective fast-fading digital links," IEEE Transactions on Communications, vol. 46, no. 4, pp. 424–427, Apr. 1998.
[10] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.
[11] C. A. Balanis, Antenna Theory: Analysis and Design, Harper & Row, Cambridge, MA, 1982.
[12] G. Battail, "Construction explicite de bons codes longs," Annales des Télécommunications, vol. 44, no. 7–8, pp. 392–404, July/Aug. 1989.
[13] G. Battail, "A conceptual framework for understanding turbo codes," IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 245–254, Feb. 1998.
[14] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," Annals of Mathematical Statistics, vol. 37, pp. 1554–1563, 1966.
[15] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.
[16] S. Benedetto and E. Biglieri, Principles of Digital Transmission: With Wireless Applications, Kluwer Academic / Plenum Publishers, New York, NY, 1999.
[17] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "A soft-input soft-output APP module for iterative decoding of concatenated codes," IEEE Communications Letters, vol. 1, no. 1, pp. 22–24, Jan. 1997.
[18] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 909–926, May 1998.
[19] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York, NY, 2nd edition, 1985.
[20] J. O. Berger, B. Liseo, and R. L. Wolpert, "Integrated likelihood methods for eliminating nuisance parameters," Technical Report No. 96-7C, Purdue University, West Lafayette, IN, Nov. 1996.
[21] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," in Proc. IEEE International Conference on Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
[22] E. Biglieri, J. Proakis, and S. Shamai (Shitz), "Fading channels: Information-theoretic and communication aspects," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2619–2692, Oct. 1998.
[23] D. N. Bokolamulla, A. Å. Hansson, and T. M. Aulin, "Low-complexity iterative detection based on bi-directional trellis search," submitted to IEEE International Symposium on Information Theory, Yokohama, Japan, June/July 2003.
[24] J.-F. Cardoso, "Infomax and maximum likelihood for blind source separation," IEEE Signal Processing Letters, vol. 4, no. 4, pp. 112–114, Apr. 1997.
[25] L. Carleson, "On convergence and growth of partial sums of Fourier series," Acta Mathematica, vol. 116, pp. 135–157, 1966.
[26] P. Castoldi and R. Raheli, "New recursive formulations of optimal detectors for Rayleigh fading channels," in Proc. IEEE Global Telecommunications Conference, London, UK, Nov. 1996, pp. 11–15.
[27] R. W. Chang and J. C. Hancock, "On receiver structures for channels having memory," IEEE Transactions on Information Theory, vol. 12, no. 4, pp. 463–468, Oct. 1966.
[28] J. Chen, M. P. C. Fossorier, S. Lin, and C. Xu, "Bi-directional SOVA decoding for turbo-codes," IEEE Communications Letters, vol. 4, no. 12, pp. 405–407, Dec. 2000.
[29] X. Chen and K. M. Chugg, "Reduced-state soft-input/soft-output algorithms for complexity reduction in iterative and non-iterative data detection," in Proc. IEEE International Conference on Communications, New Orleans, LA, June 2000, vol. 1, pp. 6–10.
[30] K. Chugg, A. Anastasopoulos, and X. Chen, Iterative Detection: Adaptivity, Complexity Reduction, and Applications, Kluwer Academic Publishers, Norwell, MA, 2001.
[31] K. M. Chugg, Sequence estimation in the presence of parametric uncertainty, Ph.D. thesis, University of Southern California, Los Angeles, CA, Aug. 1995.
[32] K. M. Chugg, "Blind acquisition characteristics of PSP-based sequence detectors," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1518–1529, Oct. 1998.
[33] K. M. Chugg and A. Polydoros, "MLSE for an unknown channel—Part I: Optimality considerations," IEEE Transactions on Communications, vol. 44, no. 7, pp. 836–846, July 1996.
[34] R. H. Clarke, "A statistical theory of mobile-radio reception," Bell System Technical Journal, vol. 47, no. 6, pp. 957–1000, July/Aug. 1968.
[35] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713–718, Mar. 1992.
[36] P. Comon, C. Jutten, and J. Herault, "Blind separation of sources, Part II: Problem statement," Signal Processing, Elsevier Science, vol. 24, pp. 11–21, 1991.
[37] O. Coskun and K. M. Chugg, "Combined coding and training for unknown ISI channels," submitted to IEEE Transactions on Communications.
[38] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, NY, 1981.
[39] I. Daubechies, Ten Lectures on Wavelets, vol. 61 of CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
[40] L. M. Davis, I. B. Collings, and P. Hoeher, "Joint MAP equalization and channel estimation for frequency-selective and frequency-flat fast-fading channels," IEEE Transactions on Communications, vol. 49, no. 12, pp. 2106–2114, Dec. 2001.
[41] L. Debnath and P. Mikusiński, Introduction to Hilbert Spaces with Applications, Academic Press, San Diego, CA, 2nd edition, 1999.
[42] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. 39, pp. 1–38, 1977.
[43] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 891–907, May 2001.
[44] D. Divsalar and F. Pollara, "Hybrid concatenated codes and iterative decoding," TDA Progress Report 42-130, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, Aug. 1997.
[45] P. M. Djurić and S. J. Godsill, Eds., Special issue on Monte Carlo methods for statistical signal processing, IEEE Transactions on Signal Processing, vol. 50, no. 2, Feb. 2002.
[46] D. L. Donoho, "Unconditional bases are optimal bases for data compression and for statistical estimation," Applied and Computational Harmonic Analysis, vol. 1, no. 1, pp. 100–115, Dec. 1993.
[47] C. Douillard, M. Jézéquel, C. Berrou, A. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: Turbo-equalization," European Transactions on Telecommunications, vol. 6, no. 5, pp. 507–511, Sept./Oct. 1995.
[48] R. J. Duffin and A. C. Schaeffer, "A class of nonharmonic Fourier series," Transactions of the American Mathematical Society, vol. 72, pp. 341–366, 1952.
[49] R. B. Ertel, P. Cardieri, K. W. Sowerby, T. S. Rappaport, and J. H. Reed, "Overview of spatial channel models for antenna array communication systems," IEEE Personal Communications, vol. 5, no. 1, pp. 10–22, Feb. 1998.
[50] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications, John Wiley & Sons, Chichester, UK, 1998.
[51] M. Feder and A. Lapidoth, "Universal decoding for channels with memory," IEEE Transactions on Information Theory, vol. 44, no. 5, pp. 1726–1745, Sept. 1998.
[52] M. Feder and N. Merhav, "Universal composite hypothesis testing: A competitive minimax approach," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1504–1517, June 2002.
[53] G. Ferrari, On iterative detection for channels with memory, Ph.D. thesis, University of Parma, Parma, Italy, Nov. 2001.
[54] G. D. Forney, Jr., Concatenated Codes, MIT Press, Cambridge, MA, 1966.
[55] G. D. Forney, Jr., "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Transactions on Information Theory, vol. 18, no. 3, pp. 363–378, May 1972.
[56] G. D. Forney, Jr., "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268–278, Mar. 1973.
[57] B. Friedlander and B. Porat, "Performance analysis of transient detectors based on a class of linear data transforms," IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 665–673, Mar. 1992.
[58] D. Gabor, "Theory of communication," Journal of the Institution of Electrical Engineers, vol. 93, no. III, pp. 429–457, Nov. 1946.
[59] R. G. Gallager, Low Density Parity Check Codes, Sc.D. thesis, MIT, Cambridge, MA, Sept. 1960.
[60] R. G. Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[61] H. El Gamal and A. R. Hammons, Jr., "Analyzing the turbo decoder using the Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 671–686, Feb. 2001.
[62] J. Garcia-Frias and J. D. Villasenor, "Combining hidden Markov source models and parallel concatenated codes," IEEE Communications Letters, vol. 1, no. 4, pp. 111–113, July 1997.
[63] J. Garcia-Frias and J. D. Villasenor, "Blind turbo decoding and equalization," in Proc. IEEE Vehicular Technology Conference, Houston, TX, May 1999, vol. 3, pp. 1881–1885.
[64] M. J. Gertsman and J. H. Lodge, "Symbol-by-symbol MAP demodulation of CPM and PSK signals on Rayleigh flat-fading channels," IEEE Transactions on Communications, vol. 45, no. 7, pp. 788–799, July 1997.
[65] M. Ghosh and C. L. Weber, "Maximum-likelihood blind equalization," in Proc. SPIE, S. Haykin, Ed., San Diego, CA, July 1991, The International Society for Optical Engineering, vol. 1565, Adaptive Signal Processing, pp. 188–195.
[66] G. B. Giannakis and C. Tepedelenlioğlu, "Basis expansion models and diversity techniques for blind identification and equalization of time-varying channels," Proceedings of the IEEE, vol. 86, no. 10, pp. 1969–1986, Oct. 1998.
[67] D. N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Transactions on Communications, vol. 28, no. 11, pp. 1867–1875, Nov. 1980.
[68] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.
[69] P. Ha and B. Honary, "Improved blind turbo detector," in Proc. IEEE Vehicular Technology Conference, Tokyo, Japan, 2000, vol. 2, pp. 1196–1199.
[70] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.
[71] A. Hansson, "Detection principles for fast Rayleigh fading channels using an antenna array," Technical Report No. 343L, Chalmers University of Technology, Gothenburg, Sweden, Apr. 2000.
[72] A. Å. Hansson and T. M. Aulin, "Generalized APP detection for unknown ISI channels," submitted to IEEE International Symposium on Information Theory, Yokohama, Japan, June/July 2003.
[73] A. Hansson and T. Aulin, "On the discretization of unsupervised digital communication over time-dispersive channels," in Proc. IEEE International Symposium on Information Theory, Lausanne, Switzerland, June/July 2002, p. 270.
[74] A. Hansson, K. M. Chugg, and T. Aulin, "A forward-backward algorithm for fading channels using forward-only estimation," Technical Report CSI-00-11-02, Communication Sciences Institute, USC, Los Angeles, CA, Nov. 2000.
[75] A. Hansson, K. M. Chugg, and T. Aulin, "On forward-adaptive versus forward/backward-adaptive SISO algorithms for Rayleigh fading channels," IEEE Communications Letters, vol. 5, no. 12, pp. 477–479, Dec. 2001.
[76] A. Å. Hansson and T. M. Aulin, "On antenna array receiver principles for space–time-selective Rayleigh fading channels," IEEE Transactions on Communications, vol. 48, no. 4, pp. 648–657, Apr. 2000.
[77] A. Å. Hansson and T. M. Aulin, "Iterative array detection of CPM over continuous-time Rayleigh fading channels," in Proc. IEEE International Conference on Communications, Helsinki, Finland, June 2001, pp. 2221–2225.
[78] A. Å. Hansson and T. M. Aulin, "Iterative diversity detection for correlated continuous-time Rayleigh fading channels," to appear in IEEE Transactions on Communications, Jan. 2003.
[79] U. Hansson, Efficient Digital Communication over the Time Continuous Rayleigh Fading Channel, Ph.D. thesis, Chalmers University of Technology, Gothenburg, Sweden, Dec. 1997.
[80] U. Hansson and T. Aulin, "Soft information transfer for sequence detection with concatenated receivers," IEEE Transactions on Communications, vol. 44, no. 9, pp. 1086–1095, Sept. 1996.
[81] U. Hansson and T. Aulin, "Digital signaling on the time continuous Rayleigh fading channel," in Proc. International Conference on Telecommunications, Porto Carras, Greece, June 1998, pp. 260–264.
[82] U. Hansson and T. Aulin, "Aspects on single symbol signaling on the frequency flat Rayleigh fading channel," IEEE Transactions on Communications, vol. 47, no. 6, pp. 874–883, June 1999.
[83] B. D. Hart and S. Pasupathy, "Innovations-based MAP detection for time-varying frequency-selective channels," IEEE Transactions on Communications, vol. 48, no. 9, pp. 1507–1519, Sept. 2000.
[84] H. Hashemi, "The indoor radio propagation channel," Proceedings of the IEEE, vol. 81, no. 7, pp. 943–968, July 1993.
[85] D. Hatzinakos and C. L. Nikias, "Blind equalization using a tricepstrum-based algorithm," IEEE Transactions on Communications, vol. 39, no. 5, pp. 669–682, May 1991.
[86] S. Haykin, Digital Communications, John Wiley & Sons, New York, NY, 1988.
[87] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, NJ, 3rd edition, 1996.
[88] S. Haykin, Unsupervised Adaptive Filtering, Volume II: Blind Deconvolution, John Wiley & Sons, New York, NY, 2000.
[89] J. Hámorský, U. Wachsmann, J. B. Huber, and A. Čižmár, "Hybrid automatic repeat request scheme with turbo codes," in Proc. International Symposium on Turbo Codes & Related Topics, Brest, France, Sept. 1997, pp. 247–250.
[90] R. A. Iltis, J. J. Shynk, and K. Giridhar, "Bayesian algorithms for blind equalization using parallel adaptive filtering," IEEE Transactions on Communications, vol. 42, no. 2/3/4, pp. 1017–1032, Feb./Mar./Apr. 1994.
[91] W. C. Jakes, Jr., Microwave Mobile Communications, John Wiley & Sons, New York, NY, 1974.
[92] A. J. Jerri, The Gibbs Phenomenon in Fourier Analysis, Splines, and Wavelet Approximations, Kluwer Academic Publishers, Boston, MA, 1998.
[93] L. A. Johnston, V. Krishnamurthy, and L. Davis, "On the formation of extrinsic information in turbo decoding," in Proc. IEEE International Symposium on Information Theory, Washington, DC, June 2001, p. 192.
[94] J.-P. Kahane and P.-G. Lemarié-Rieusset, Fourier Series and Wavelets, vol. 3 of Studies in the Development of Modern Mathematics, Gordon and Breach Publishers, Luxembourg, 1995.
[95] B. S. Kashin and A. A. Saakyan, Orthogonal Series, vol. 75 of Translations of Mathematical Monographs, American Mathematical Society, Providence, RI, 1989.
[96] R. Kohno, "Structures and theories of software antennas for software defined radio," IEICE Transactions on Communications, vol. E83-B, no. 6, pp. 1189–1199, June 2000.
[97] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[98] H. J. Landau and H. O. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty—II," Bell System Technical Journal, vol. 40, pp. 65–84, Jan. 1961.
[99] H. J. Landau and H. O. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty—III: The dimension of the space of essentially time- and band-limited signals," Bell System Technical Journal, vol. 41, pp. 1295–1336, July 1962.
[100] C. Laot, A. Glavieux, and J. Labat, "Turbo equalization: Adaptive equalization and channel decoding jointly optimized," IEEE Journal on Selected Areas in Communications, vol. 19, no. 9, pp. 1744–1752, Sept. 2001.
[101] A. Lapidoth and E. Telatar, "Gaussian ISI channels and the generalized likelihood ratio test," in Proc. IEEE International Symposium on Information Theory, Sorrento, Italy, June 2000, p. 460.
[102] A. Lapidoth and J. Ziv, "On the universality of the LZ-based decoding algorithm," IEEE Transactions on Information Theory, vol. 44, no. 5, pp. 1746–1755, Sept. 1998.
[103] P. A. Laurent, "Exact and approximate construction of digital modulations by superposition of amplitude modulated pulses (AMP)," IEEE Transactions on Communications, vol. 34, no. 2, pp. 150–160, Feb. 1986.
[104] E. L. Lehmann, Testing Statistical Hypotheses, John Wiley & Sons, New York, NY, 1959.
[105] D. J. Lieberman and J. P. Allebach, "A dual interpretation for direct binary search and its implications for tone reproduction and texture quality," IEEE Transactions on Image Processing, vol. 9, no. 11, pp. 1950–1963, Nov. 2000.
[106] J. Lodge, R. Young, P. Hoeher, and J. Hagenauer, "Separable MAP 'filters' for the decoding of product and concatenated codes," in Proc. IEEE International Conference on Communications, Geneva, Switzerland, May 1993, pp. 1740–1745.
[107] J. H. Lodge and M. L. Moher, "Maximum likelihood sequence estimation of CPM signals transmitted over Rayleigh flat-fading channels," IEEE Transactions on Communications, vol. 38, no. 6, pp. 787–794, June 1990.
[108] H.-L. Lou, "Implementing the Viterbi algorithm—Fundamentals and real-time issues for processor designers," IEEE Signal Processing Magazine, vol. 12, no. 5, pp. 42–52, Sept. 1995.
[109] R. W. Lucky, "Techniques for adaptive equalization of digital communication systems," Bell System Technical Journal, vol. 45, pp. 255–286, Feb. 1966.
[110] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999; Errata, ibid., vol. 47, no. 5, p. 2101, July 2001.
[111] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," Electronics Letters, vol. 33, no. 6, pp. 457–458, Mar. 1997.
[112] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[113] I. D. Marsland and P. T. Mathiopoulos, "Multiple differential detection of parallel concatenated convolutional (turbo) codes in correlated fast Rayleigh fading," IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 265–275, Feb. 1998.
[114] A. Matache, S. Dolinar, and F. Pollara, "Stopping rules for turbo decoders," TMO Progress Report 42-142, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, Aug. 2000.
[115] P. L. McAdam, L. R. Welch, and C. L. Weber, "M.A.P. bit decoding of convolutional codes," in Proc. IEEE International Symposium on Information Theory, Asilomar, CA, Jan. 1972, p. 91.
[116] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's 'belief propagation' algorithm," IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 140–152, Feb. 1998.
[117] J. S. Meditch, Stochastic Optimal Linear Estimation and Control, McGraw-Hill, New York, NY, 1969.
[118] U. Mengali and M. Morelli, "Decomposition of M-ary CPM signals into PAM waveforms," IEEE Transactions on Information Theory, vol. 41, no. 5, pp. 1265–1275, Sept. 1995.
[119] P. Moqvist, "Serially concatenated systems: An iterative decoding approach with application to continuous phase modulation," Technical Report No. 331L, Chalmers University of Technology, Gothenburg, Sweden, Nov. 1999.
[120] P. Moqvist, Multiuser Serially Concatenated Continuous Phase Modulation, Ph.D. thesis, Chalmers University of Technology, Gothenburg, Sweden, Oct. 2002.
[121] P. Moqvist and T. Aulin, "Power and bandwidth efficient serially concatenated CPM with iterative decoding," in Proc. IEEE Global Telecommunications Conference, San Francisco, CA, Nov./Dec. 2000, pp. 790–794.
[122] K. R. Narayanan and G. L. Stüber, "A novel ARQ technique using the turbo coding principle," IEEE Communications Letters, vol. 1, no. 2, pp. 49–51, Mar. 1997.
[123] S. G. Nash and A. Sofer, Linear and Nonlinear Programming, McGraw-Hill, New York, NY, 1996.
[124] M. Niedźwiecki, Identification of Time-varying Processes, John Wiley & Sons, Chichester, UK, 2000.
[125] H. Nikookar and H. Hashemi, "Phase modeling of indoor radio propagation channels," IEEE Transactions on Vehicular Technology, vol. 49, no. 2, pp. 594–606, Mar. 2000.
[126] N. J. Nilsson, Principles of Artificial Intelligence, Tioga Publishing Company, Palo Alto, CA, 1980.
[127] C. Nordling and J. Österman, Physics Handbook: Elementary Constants and Units, Tables, Formulae and Diagrams and Mathematical Formulae, Studentlitteratur, Lund, Sweden, 4th edition, 1987.
[128] J. D. Parsons, The Mobile Radio Propagation Channel, Pentech Press, London, UK, 1992.
[129] S. Pasupathy, "Minimum shift keying: A spectrally efficient modulation," IEEE Communications Magazine, vol. 17, no. 4, pp. 14–22, July 1979.
[130] R. Price and P. E. Green, Jr., "A communication technique for multipath channels," Proceedings of the IRE, vol. 46, no. 3, pp. 555–570, Mar. 1958.
[131] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, 3rd edition, 1995.
[132] S. U. H. Qureshi, "Adaptive equalization," Proceedings of the IEEE, vol. 73, no. 9, pp. 1349–1387, Sept. 1985.
[133] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
[134] R. Raheli, A. Polydoros, and C. Tzou, "Per-survivor processing: A general approach to MLSE in uncertain environments," IEEE Transactions on Communications, vol. 43, no. 2/3/4, pp. 354–364, Feb./Mar./Apr. 1995.
[135] A. C. Reid, T. A. Gulliver, and D. P. Taylor, "Convergence and errors in turbo-decoding," IEEE Transactions on Communications, vol. 49, no. 12, pp. 2045–2051, Dec. 2001.
[136] B. Rimoldi, "A decomposition approach to CPM," IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 260–270, Mar. 1988.
[137] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," in Proc. IEEE International Conference on Communications, Seattle, WA, June 1995, pp. 1009–1013.
[138] Y. Sato, "A method of self-recovering equalization for multilevel amplitude-modulation systems," IEEE Transactions on Communications, vol. 23, no. 6, pp. 679–682, June 1975.
[139] S. R. Saunders, Antennas and Propagation for Wireless Communication Systems, John Wiley & Sons, Chichester, UK, 1999.
[140] A. M. Sayeed and B. Aazhang, "Joint multipath-Doppler diversity in mobile wireless communications," IEEE Transactions on Communications, vol. 47, no. 1, pp. 123–132, Jan. 1999.
[141] C. Schlegel, Trellis Coding, IEEE Press, Piscataway, NJ, 1997.
[142] N. Seshadri, "Joint data and channel estimation using blind trellis techniques," IEEE Transactions on Communications, vol. 42, no. 2/3/4, pp. 1000–1011, Feb./Mar./Apr. 1994.
[143] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423 and 623–656, July and Oct. 1948.
[144] M. Sipser and D. A. Spielman, "Expander codes," IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 1710–1722, Nov. 1996.
[145] M. Skoglund, J. Giese, and S. Parkvall, "Code design for combined channel estimation and error protection," IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1162–1171, May 2002.
[146] D. Slepian, "On bandwidth," Proceedings of the IEEE, vol. 64, no. 3, pp. 292–300, Mar. 1976.
[147] D. Slepian and H. O. Pollak, "Prolate spheroidal wave functions, Fourier analysis and uncertainty—I," Bell System Technical Journal, vol. 40, pp. 43–63, Jan. 1961.
[148] S. Stein, "Fading channel issues in systems engineering," IEEE Journal on Selected Areas in Communications, vol. 5, no. 2, pp. 68–89, Feb. 1987.
[149] S. Talwar, Blind space-time algorithms for wireless communication systems, Ph.D. thesis, Stanford University, Stanford, CA, Jan. 1996.
[150] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Transactions on Information Theory, vol. 27, no. 5, pp. 533–547, Sept. 1981.
[151] M. Tüchler, R. Koetter, and A. C. Singer, "Turbo equalization: Principles and new results," IEEE Transactions on Communications, vol. 50, no. 5, pp. 754–767, May 2002.
[152] S. ten Brink, "Convergence of iterative decoding," Electronics Letters, vol. 35, no. 13, pp. 1117–1118, June 1999.
[153] S. ten Brink, "Rate one-half code for approaching the Shannon limit by 0.1 dB," Electronics Letters, vol. 36, no. 15, pp. 1293–1294, July 2000.
[154] L. Tong, G. Xu, and T. Kailath, "Blind identification and equalization based on second-order statistics: A time domain approach," IEEE Transactions on Information Theory, vol. 40, no. 2, pp. 340–349, Mar. 1994.
[155] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, John Wiley & Sons, New York, NY, 1968.
[156] G. L. Turin, "Communication through noisy, random-multipath channels," IRE National Convention Record, vol. 4, pp. 154–166, Mar. 1956.
[157] G. L. Turin, F. D. Clapp, T. L. Johnston, S. B. Fine, and D. Lavry, "A statistical model of urban multipath propagation," IEEE Transactions on Vehicular Technology, vol. 21, no. 1, pp. 1–9, Feb. 1972.
[158] E. Uhlemann, "Hybrid ARQ using serially concatenated block codes for real-time communication: An iterative decoding approach," Technical Report No. 374L, Chalmers University of Technology, Gothenburg, Sweden, Oct. 2001.
[159] G. Ungerboeck, "Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems," IEEE Transactions on Communications, vol. 22, no. 5, pp. 624–636, May 1974.
[160] M. C. Vanderveen, Estimation of parametric channel models in wireless communication networks, Ph.D. thesis, Stanford University, Stanford, CA, Dec. 1997.
[161] N. Wiberg, Codes and Decoding on General Graphs, Ph.D. thesis, Linköping University, Linköping, Sweden, Apr. 1996.
[162] N. Wiberg, H.-A. Loeliger, and R. Kötter, "Codes and iterative decoding on general graphs," European Transactions on Telecommunications, vol. 6, no. 5, pp. 513–526, Sept./Oct. 1995.
[163] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering, John Wiley & Sons, New York, NY, 1965.
[164] R. M. Young, An Introduction to Nonharmonic Fourier Series, Academic Press, New York, NY, 1980.
[165] X. Yu and S. Pasupathy, "Innovations-based MLSE for Rayleigh fading channels," IEEE Transactions on Communications, vol. 43, no. 2/3/4, pp. 1534–1544, Feb./Mar./Apr. 1995.
[166] E. Zervas, J. G. Proakis, and V. Eyuboglu, "A quantized channel approach to blind equalization," in Proc. IEEE International Conference on Communications, Chicago, IL, June 1992, pp. 1539–1543.
I find all books too long. Voltaire