Discrete-Input Two-Dimensional Gaussian Channels ...

IRWIN AND JOAN JACOBS

CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES

Discrete-Input Two-Dimensional Gaussian Channels with Memory: Estimation and Information Rates via Graphical Models and Statistical Mechanics Ori Shental, Noam Shental, Shlomo Shamai (Shitz), Ido Kanter, Anthony J. Weiss and Yair Weiss CCIT Report #679 December 2007

Electronics Computers Communications

DEPARTMENT OF ELECTRICAL ENGINEERING TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY, HAIFA 32000, ISRAEL

CCIT Report #679

December 2007

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. XX, NO. Y, MONTH 200Z

1

Discrete-Input Two-Dimensional Gaussian Channels with Memory: Estimation and Information Rates via Graphical Models and Statistical Mechanics Ori Shental, Member, IEEE, Noam Shental, Shlomo Shamai (Shitz), Fellow, IEEE, Ido Kanter, Anthony J. Weiss, Fellow, IEEE, and Yair Weiss

Abstract Manuscript received September 18, 2006; revised September 18, 2007. The material in this paper was presented in part at the IEEE Information Theory Workshop (ITW), San-Antonio, Texas, USA, October 2004, and in the IEEE International Symposium on Information Theory (ISIT), Adelaide, Australia, September 2005. O. Shental was with the Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 69978, Israel. He is now with the Center for Magnetic Recording Research (CMRR), University of California - San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA 92093, USA (e-mail: [email protected]). N. Shental is with the Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel (e-mail: [email protected]). S. Shamai (Shitz) is with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel (e-mail: [email protected]). I. Kanter is with the Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel (e-mail: [email protected]). A. J. Weiss is with the Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 69978, Israel (e-mail: [email protected]). Y. Weiss is with the School of Computer Science and Engineering, Center for Neural Computation, Hebrew University of Jerusalem, Jerusalem 91904, Israel (e-mail: [email protected]). The first two authors contributed equally to this work. Communicated by G. Kramer, Associate Editor for Shannon Theory.


Discrete-input two-dimensional (2-D) Gaussian channels with memory represent an important class of systems, which appears extensively in communications and storage. In spite of their widespread use, the workings of 2-D channels are still very much unknown. In this work we try to explore their properties from the perspective of estimation theory and information theory. At the heart of our approach is a mapping of a 2-D channel to an undirected graphical model, and inferring its a-posteriori probabilities using generalized belief propagation (GBP). The derived probabilities are shown to be practically accurate, thus enabling optimal maximum a-posteriori (MAP) estimation of the transmitted symbols. Also, the Shannon-theoretic information rates are deduced either via the vector-wise Shannon-McMillan-Breiman theorem, or via the recently derived symbol-wise GuoShamai-Verdú theorem. Our approach is also described from the perspective of statistical mechanics, as the graphical model and inference algorithm have their analogues in physics. Our experimental study, based on common channel settings taken from cellular networks and magnetic recording devices, demonstrates that under non-trivial memory conditions, the performance of this fully tractable GBP estimator is almost identical to the performance of the optimal MAP estimator. It also enables a practically accurate simulation-based estimate of the information rate. Rationalization of this excellent performance of GBP in the 2-D Gaussian channel setting is addressed.

Index Terms Two-dimensional channels, intersymbol interference, maximum a-posteriori estimation, information rate, generalized belief propagation, cluster variation method, Shannon-McMillan-Breiman theorem, Guo-Shamai-Verdú theorem, magnetic recording channels, multiple-access channels.

2


3

I. I NTRODUCTION In discrete-input Gaussian channels with memory, a.k.a. dispersive channels, a coded information symbol, chosen from a finite alphabet, is impaired by other transmitted symbols, and by an ambient additive white Gaussian noise (AWGN). The dependence between observations of the transmitted symbols, i.e. , the effect of memory or interference, may be formed over time, space, frequency or any other transmission aspect. The memory may be introduced either by design, as in the case of error-correcting codes, or by nature, as information is dispersed while propagating via the physical channel medium. The memory is primarily related to the dimensionality of the system, i.e. , the particular extent and structure of interference. In this paper we consider two-dimensional (2-D) channels, which are characterized by symbols ordered, at least logically, on a 2-D grid, causing interference in a limited neighborhood. Such channels play a fundamental role in various applications in modern communications and storage, and several examples of such applications are concisely stated below. A popular and important instance of this class of channels is 2-D intersymbol interference (ISI) channels, which appear, e.g. , in magnetic and optical recording devices [1]. In the 2-D ISI channel, symbols, stored on a magnetic or optical planar medium, interfere with adjacent symbols during their retrieval process. A second example concerns multiple-access (MA) channels in cellular networks. In a seminal work [2], Wyner introduced a simple, yet insightful, analytically solvable uplink model for cellular networks with continuous Gaussian signaling. This model yields a considerable insight into the ultimate information-theoretic limits of realistic cellular networks (e.g. , the 3rd generation Universal Mobile Telecommunication System standard, UMTS [3]) and commercial wireless local area networks (W-LAN). In addition to a naive one-dimensional (1-D) cell array extension of a single cell system, Wyner also analyzed the traditional 2-D hexagonal topology, where multiple-access interference (MAI) arises not only between intra-cell users, but is also caused by users within geographically neighboring cellular tiers. Hence, Wyner’s model can be viewed as an instance of dispersive channels. Another example of spatial interference stems from symbols transmitted (received) simultaneously from (in) nearby antennas in an array used in multiple-in multiple-out (MIMO) systems, for instance, in a single cell base-station with a planar antenna array and widely spaced antenna elements or in pixelated wireless optical channels [4].


4

A thorough analysis of 2-D channels involves both estimation-theoretic and informationtheoretic aspects. In the following, the state-of-the-art in these two major facets is reviewed, starting with estimation, then moving to information-theoretic issues. As in any other dispersive channel, the main task in communication through a 2-D channel is to overcome interference and noise, in order to correctly estimate the transmitted symbols. In the case of ISI this process is termed channel equalization [5], while in MA it is called multiuser detection (MUD, [6]). Optimal estimation, or detection, can be achieved by a maximum a-posteriori (MAP) joint sequence decision based on the matched filter (MF) outputs of all corrupted symbols, and yields a significant capacity improvement over the conventional MF detector with symbol-wise hard-decision. However as we shall see, its complexity for 2-D channels is exponential in the grid’s width, thus being impractical even for rather small channels. In fact, the NP completeness of the 2-D channel detection problem has recently been shown [7]. Hence, various practical sub-optimal detection methods have been proposed (e.g. , [8]–[15]). For example, Marrow and Wolf [14] evaluated the performance of several iterative estimation methods for the binary-input 2-D ISI channel. Their detection schemes operate iteratively on the rows and columns of the 2-D channel and approximate the optimal detector’s bit error rate (BER) to within 0.5dB. As for information-theoretic aspects, the capacity of discrete-input dispersive channels is given as the maximum mutual information rate over all discrete-input distributions. Computing this capacity for 1-D and 2-D channels is a longstanding open problem. Calculating the mutual information rate in the case of a predefined stationary input distribution is, in principle, a simpler problem. For example, for input symbols which are identically independently distributed (i.i.d.) and equiprobable, this is termed the symmetric (a.k.a. uniform-input) information rate (SIR), thus providing a limit on the achievable rate of reliable communication in this common case. Various bounds, either rigorous [16]–[18], numerical [19], [20] or conjectured [21], on the capacity and SIR of certain discrete-input 1-D dispersive channels have been proposed. Several authors have recently introduced simulation-based methodologies for computing such information rates (Arnold et al. [22] and references therein). In this approach, the forward recursion of the sum-product (BCJR) algorithm [23] is used for estimating the a-posteriori probability (APP) and consequently deriving the 1-D information rates. As for 2-D channels, due to their inherent complexity Chen and Siegel introduced upper and lower bounds on the information rate [24]–


5

[27]. In spite of the efforts reported in the aforementioned contributions on 2-D channels, neither a near-optimal practical detector nor an accurate, yet tractable, computation of the information rate has been proposed for this important class of channels. In this paper, we apply simulationbased methodologies, originally developed in the interrelated fields of probabilistic graphical models [28]–[30] and statistical mechanics [31], [32], for the analysis of 2-D channels. This analysis yields practically accurate schemes for optimal detection [33] and information rate computation [34]. The basic observation is that 2-D channels can be viewed as an undirected graphical model, a.k.a. pairwise Markov random field (MRF, [35]), and that performing inference in the graphical model yields the required estimation-theoretic and information-theoretic values. As our inference engine we apply a fully tractable generalized belief propagation (GBP, [36], [37]) algorithm, which yields a practically accurate APP. The marginal probabilities are then used for detection, and also for estimating the information rate, via the Guo-Shamai-Verd u´ (GSV) theorem. As an alternative approach the information rate is also estimated using the joint APP, via the ShannonMcMillan-Breiman (SMB) theorem, where the connection to statistical mechanics is established. Graphical models provide powerful tools for exact (optimal) or approximate inference. For example, loopy belief propagation (LBP, sum-product algorithm, [38]) is an efficient way to solve inference problems in graphical models, which has been shown to serve remarkably well as a decoding engine [39] in low density parity check (LDPC, [40]) codes and Turbo codes [41]– [43]. 1-D channels, depicted by a tree graph, can be also accurately analyzed, as was shown by Arnold et al. [22], using LBP which is equivalent in this case to the BCJR algorithm. Hence, our approach can be viewed as an extension of its 1-D Monte-Carlo counterpart, where GBP replaces LBP, due to the latter’s poor performance on loopy graphs 1 . Our findings are of practical and theoretical interest. They may enable more reliable communications through the 2-D channel and denser data storage devices, along with a better understanding of the theoretical limits of such systems. The paper is organized as follows. Section II introduces the dispersive 2-D channel model, 1

Recently, Pakzad and Anantharam [44] have applied GBP to the problem of joint decoding of a LDPC code and a 1-D ISI

channel. The performance of GBP in this 1-D case was shown to outperform the best conventional iterative method.


6

while Section III presents its connection to graphical models. Section IV discusses the issue of exact and approximate inference in the graphical model. The outputs of our inference algorithm are both the marginal and the joint probability distributions. In Section V we apply the marginal probabilities for performing detection and for estimating the information rate via the GSV theorem. In Section VI we use the joint probability distribution for estimating the information rate via the SMB theorem, which leads to the connection to statistical mechanics. Section VII provides thorough simulation results for both detection and information rate of 2-D ISI and MA channels in various topologies. The remarkable performance of GBP in this problem is also discussed in Section VII. We conclude in Section VIII. We shall use the following notations. The operator {·}T denotes a vector or matrix transpose, the matrix IN is an N × N identity matrix, while the symbols {·}i and {·}ij denote entries of a vector and matrix, respectively. The operators E{·} and k · k stand for expectation and Euclidean norm of a vector, respectively, while log is the natural logarithm. II. C HANNEL M ODEL Consider an N × N 2-D discrete-input channel with memory in the form X yk,l = xk,l + vk,l + αi,j,k,l xi,j ∀k, l = 1, . . . , N,

(1)

(i,j)∈hk,li

where yk,l , the channel’s output observation at symbol (k, l) ∈ Z 2 , is the sum of the finite alphabet input symbol xk,l , assumed to be taken from a stationary process, and two additional terms. The first term vk,l represents the common ambient AWGN, while the second term is the scaled interference caused by adjacent symbols to (k, l), denoted by hk, li. The parameter

αi,j,k,l (|αi,j,k,l | ≤ 1)2 controls the interference attenuation. The interference term is assumed to

be spatially invariant3 (excluding boundary symbols, as we do not assume periodic boundary conditions), which together with the assumptions regarding x k,l and vk,l guarantees that yk,l

(k, l = 1, . . . , N ) are stationary random variables4 . We also assume that the channel (including 2

This limit on α arises from the physical model, but is not crucial to our approach.

3

Assuming αi,j,k,l = αk−i,l−j being spatially invariant, the summation in the channel model (1) is simply a convolution.

4

Note that stationarity is needed only in order to enable the proper definition of the mutual information rate. For estimation

purposes, stationarity does not have to be assumed (e.g. , see Section VII-A where α i,j,k,l are taken randomly).


7

the noise variance) is perfectly known on the receiver’s side, which can jointly process all observations. Stacking all the observations, data symbols and noise samples into N 2 × 1 vectors y, x and v, respectively, the channel (1) can be rewritten as y = Sx + v,

(2)

where the N 2 × N 2 square matrix S encapsulates the memory/interference structure, which uniquely defines the channel. Our basic assumption, which later allows for a graphical model interpretation, is that interference is caused by neighboring symbols, i.e. , matrix S is a relatively sparse matrix. The upper pane in Fig. 1 represents the interference structure of four 2-D topologies: two ISI examples (a) and (b), a rectangular cellular network (c), and the typical Wyner hexagonal cellular network (d). In all four examples, 9 symbols (full nodes denoting bits in this case) are ordered on N × N = 3 × 3 grids, which differ in the way the bits interfere with each other. In the ISI examples, where the memory is unidirectional, an information bit is interfered with preceding bits along the same row and column (a) and also along the diagonal (b). In the case of cellular networks the memory is bidirectional, hence a bit is corrupted by both preceding and succeeding bits along its row and column (rectangular network, (c)) and also along the diagonal (hexagonal network, (d)). Note that ISI in 2-D storage systems can also exhibit bidirectional memory effects. In the following derivations, real-space data signaling x, interference S and noise v ∼

N (0, σ 2 IN ) are assumed. An extension to the complex domain is straightforward. The joint posterior probability distribution of the channel is 1 2 Pr(x|y) ∝ Pr (x) exp − 2 ||y − Sx|| , 2σ

(3)

where Pr (x) is the prior input distribution. Hereinafter, for purposes of exposition only, we consider an equiprobable i.i.d. binary-input alphabet, i.e. , x ∈ (−1, +1) N ×N . The binary alphabet is of interest, in both multi-user frameworks, as well as the magnetic recording setting. Hence, the conditional distribution (3) in this case takes the following form X 1 X Rij xi xj − h i xi , Pr(x|y) ∝ exp − 2 σ i (i>j)

(4)


8

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

(i)

(j)

(k)

(l)

Fig. 1. Upper pane: Interference structures for four types of 3 × 3 2-D channels: (a) and (b) ISI grids; (c) rectangular cellular network; (d) hexagonal Wyner cellular network. The arrows mark the direction of interference. Middle pane: The corresponding undirected graphical model representation of the channels in the upper pane: (e) and (f) ISI grids; (g) and (h) rectangular and hexagonal cellular networks, respectively. Full nodes represent (hidden) transmitted bits, while empty nodes correspond to the observations. Interaction couplings (compatibility functions) ψ ij are denoted by a solid line connecting two full nodes, while the external field potential (evidence) φi is depicted by a solid line connecting a full node and an empty node. For clarity we use dotted edges in (f), (g) and (h) to represent the extra edges added compared to the graphs (e), (f) and (g), respectively. Lower pane: The corresponding factor graph representation of the channels in the upper pane: (i) and (j) ISI grids; (k) and (l) rectangular and hexagonal cellular networks, respectively. Following common notation, the variables are denoted by empty circles and the local factor functions by full squares. The Factor graph’s variables and functions are enumerated from left to right, corresponding to a left-right/up-down numbering of the nodes in the 2-D grids.

where the symmetric matrix R = ST S is the interference cross-correlation matrix, and the vector h = ST y is the output vector of a filter matched to the channel’s interference structure. The notation (i > j) stands for a summation over all non-zero entries in the upper triangular of the


9

symmetric cross-correlation matrix R. In the remainder of this paper the main goal is to efficiently calculate either the joint posterior P probability Pr(x|y) or its marginal probabilities Pr(x i |y) = x\xi Pr(x|y), where summation is

performed over all symbols except xi . In order to do so we adopt the methodology of graphical models. III. T HE C ONNECTION TO U NDIRECTED G RAPHICAL M ODELS Undirected graphical models appear in many fields of research, e.g. , spin systems in statistical physics [45], computer vision [46], digital communications and error correcting codes [30], [47]. An undirected graphical model with pairwise potentials (a.k.a. pairwise MRF [35]), consists of

a graph G and real positive potential functions ψij (xi , xj ) and φi (xi ) such that the probability of an assignment x is given by Pr(x) ∝

Y

ψij (xi , xj )

Y

φi (xi ).

(5)

i

(i>j)

Here the previously-defined notation (i > j) represents the set of all edges connecting pairs of nodes in the graph. The implicit assumption is that (i > j) contains only relatively small sets of variables, such that the dependencies between the variables are ‘local’ (as opposed to ‘global’ dependency as in, for example, a fully connected graph.) The model is undirected since the potentials do not contain causal dependencies among the variables as in the case of directed graphical models (i.e. , Bayesian networks [28]). Hence, the posterior probability distribution (4) defines the following undirected graphical model Pr(x|y) ∝

Y

ψij (xi , xj )

(i>j)

Y

φi (xi , hi ),

(6)

!

(7)

i

where Rij xi xj ψij (xi , xj ) = exp − σ2

is a compatibility function (interaction coupling), i.e. , ‘potential’ representing the structure of the system, and the potential φi (xi , yi ) = exp

h i xi σ2

!

(8)


10

is the ‘evidence’ (local likelihood or external field), which describes the statistical dependency between the hidden variable xi and the observed variable hi .5 The middle pane in Fig. 1 presents the corresponding graphical model of our four application examples - ISI (e) and (f), a rectangular cellular network (g) and a hexagonal cellular network (h). Full nodes represent transmitted bits, while empty nodes correspond to the observations. Compatibility functions ψij are denoted by a solid line connecting two full nodes, while the evidence φi is depicted by a solid line connecting a full node and an empty node. Fig. 1 also displays (lower pane) the corresponding factor graphs, c.f. [47]. Estimating the joint posterior probability (4) or its marginals is termed ‘inference’, and the field of graphical models has developed tools for either calculating them exactly or approximately. In the next section we present the intractability of exact inference over 2-D channels, and then discuss the issue of approximate inference using belief propagation and its generalizations. IV. I NFERENCE IN G RAPHICAL M ODELS A. Exact Inference The canonical algorithm for exact inference is the junction-tree algorithm [48]. The algorithm converts a general graph into an equivalent tree whose nodes contain clusters (cliques) of nodes of the original graph. Inference can then be performed by passing messages forward and backward among neighboring cliques in the tree. The complexity of the algorithm is exponential in the size of the largest clique in the derived tree, which for N × N grid-like graphs, such as our 2-D problem, is N times the memory depth of the channel. Hence, exact inference becomes impractical for large grids, or even for moderate size graphs (e.g. , a small network of tens of cells) with non-binary modulation, or for non-trivial memory effects. Therefore one must resort to approximate inference methods. 5

In the case of a non-equiprobable i.i.d binary-input alphabet the MRF modelling is identical, except for an additional external

field potential operating on each hidden variable node, which can be absorbed into φ i . The same occurs for the non-binary discrete-input alphabet case, where an external field potential, which arises from the auto-correlations R ii , is added and should be absorbed into φi . In the case of correlated input (e.g. , a Markov process), additional general potentials appear, and the graph may no longer be defined as a pairwise MRF, although the proposed approach still holds.


11

B. Approximate Inference 1) Loopy Belief Propagation: Loopy belief propagation (LBP) is equivalent to applying Pearl’s local message-passing algorithm [38], originally derived for trees, to a general graph even if it contains cycles (loops). As mentioned previously, LBP has been found to have outstanding empirical success in many applications, e.g. , in decoding Turbo codes and LDPC codes. The excellent performance of LBP in these applications may be attributed to the sparsity of the graphs, which ensures that cycles in the graph are long, and inference may be performed as if it were a tree. Can LBP be used over dispersive 2-D channels? Our studies show that LBP frequently fails to converge (for both synchronous and asynchronous message update scheduling) and therefore its associated performance is poor. This result is not surprising since, in contrast to the sparse graphs of LDPC codes, the graph of 2-D channels contains many short cycles. As a result LBP’s implicit tree-like assumption does not hold, and its approximation is poor. In order to recover the near-optimal characteristics of message-passing inference one must circumvent this problem. 2) Generalized Belief Propagation: The generalized belief propagation (GBP) algorithm [37] is an extension of LBP, which has been shown to provide better approximations than LBP in many applications. In GBP, messages are sent between clusters (regions) of nodes, in contrast to LBP, where messages are passed along the graph’s edges. The GBP algorithm has the nice property of reducing to LBP when taking any pair of nodes connected by an edge on the MRF to be a region. The ‘region graph’ approach to GBP begins by defining ‘basic’ regions, which completely cover the graph and include all linked nodes. Let R be this set of regions, their intersections, the intersections of the intersection, and so on. Based on the set R, a ‘region graph’ of the graphical model can be composed, along which messages are passed in an analogous way to LBP. An example of such basic regions and the resulting region graph appears in Fig. 2. The initial issue in applying GBP concerns the selection of the basic regions. A smart choice of the basic regions will be such that it encompasses all nodes along the shortest loops. Since inference within a basic cluster is performed exactly, we may avoid the short loops which probably caused LBP not to converge. As for complexity, it is striking to reveal that the GBP involves only minimally more computations than LBP, as the complexity of GBP grows exponentially only with the size of the chosen


(a) Fig. 2.

12

(b)

(a) Covering a 4 × 4 nodes graph using a 3 × 3 basic region in an overlapping up-and-down sliding window manner,

and (b) the corresponding region graph, with two-region intersection levels.

basic region. Enlarging the basic region often entails a more accurate inference at the cost of complexity. For an elaborate presentation of GBP, see Yedidia et al. [37], and more specifically Appendix E in [37] for the ‘two-way GBP algorithm’ which we apply. Can GBP be applied to 2-D channels? Since the graphical models of our 2-D channels contain interactions between nearest neighbors and next-nearest neighbors, as displayed in Fig. 1-(e,f,g,h), a natural choice of overlapping regions is a sliding 3 × 3 square of nodes (e.g. , see Fig. 2(h)), which were used in all of our simulations. This size of basic regions is small enough to perform exact inference within each region, yet still can well capture the interaction between neighboring nodes. Our empirical study shows that the GBP message-passing algorithm, which is fully tractable, yields remarkable results in 2-D channels, even for harsh interference conditions. V. M ARGINAL P ROBABILITIES -BASED A NALYSIS A. Detection The marginal probabilities Pr(xi |y) estimated by GBP are applied in two contexts. The first and more straightforward application is detection, as the estimated symbol is simply arg max xi Pr(xi |y). As described in Section VII, our detected symbols are practically identical to optimal detection, which basically reflects the accuracy of estimated marginal probabilities. B. Information rate: Guo-Shamai-Verdu´ theorem The second application of the marginal probabilities is to estimate the symmetric information rate (SIR) via the recently derived Guo-Shamai-Verdu´ (GSV) theorem [49]. The GSV theorem


13

reveals a simple, yet powerful relationship between information and estimation theories, as it bridges over the notions of Shannon’s mutual information and minimum mean-square error (MMSE) for the common AWGN channel. The GSV theorem for the vector Gaussian channel may be stated in the following way. Consider a real-valued channel with input and output random variable vectors x and y, respectively, of the form y=

√ snr Hx + n,

(9)

where snr ≥ 0 is the channel’s signal-to-noise ratio (SNR), H is a deterministic L × K matrix, and n ∼ N (0, IL ) is a standard Gaussian noise independent of x. The input x and the output y are column vectors of appropriate dimensions. The input-output mutual information, I(x; y), and the MMSE in estimating Hx denoted by mmse(snr), maintain the following relation. GSV Theorem. [49, Theorem 2] For every input distribution P (x) that satisfies Ekxk 2 < ∞, 1 d I(x; y) = mmse(snr). dsnr 2

(10)

Hence by adapting the GSV theorem to our problem, we may evaluate mmse(snr) = E{k Sx − Sˆ x(y; snr) k2 }

(11)

via the marginal probabilities Pr(xi |y) estimated by GBP, where the conditional mean estimate ˆ (y; snr) , {ˆ is defined by x x1 , . . . , xˆN 2 }T , with

xî , Pr(xi = 1|y) − Pr(xi = −1|y),

(12)

and snr = 1/σ 2 . The expectation over all instances of the input x and output y in calculating the MMSE (11) is performed empirically by averaging over a sufficiently large 2-D channel grid. Integrating over snr one gets the desired mutual information I(x; y). Now the information rate is given by the mutual information per symbol over the sufficiently large symbol block. VI. J OINT P ROBABILITY-BASED A NALYSIS The second approach to estimating the information rate corresponds to estimating the joint probability distribution of the channel. This is a complementary approach to the former GSV approach, which also draws an interesting connection to statistical mechanics, as described below.


14

The information rate, i.e. , mutual information per symbol, between the channel’s input X and output Y is I(X ; Y) = h(Y) − h(Y|X ),

(13)

where h(·) are (differential) entropy rates, where, by definition, the entropy rate h(Q) of a stationary process q = {q1 , . . . , qL }T is given by limL→∞ h(q)/L. Let us deal separately with the two terms of the information rate (13). The second term, h(Y|X ), is given by limN →∞ h(y|x)/N 2 , but since h(y|x) = h(v), it is straightforward to validate that h(Y|X ) = (log 2πeσ 2 )/2. As for the term h(Y) we apply the Shannon-McMillan-Breiman theorem6 , which states that for a stationary and ergodic channel the entropy rate can be calculated by −

1 N →∞ log p(y) −→ h(Y) with probability 1, 2 N

(14)

where p(y) is the joint distribution of the channel’s output y. Hence, in order to calculate the information rate one needs to calculate p(y) in the limit of large systems. Applying Bayes’ law and using the channel model (1), p(y) can be rewritten as X X p(y) = p(y|x) Pr(x) = pv (y − Sx) Pr(x), x

where

P

x

(15)

x

corresponds to a sum over all the possible values of the transmitted symbols x. As we

consider the case of an equiprobable i.i.d. binary-input alphabet, using the AWGN’s probability density function, pv (·), the joint distribution of the channel’s output (15) can be rewritten as 2

p(y) = Z(y) · (2C)−N , where C , (2πσ 2 )1/2 , and Z(y) ,

X

exp

x

1 − 2 ||y − Sx||2 2σ

(16)

(17)

is the partition function of the channel’s joint posterior distribution (3). Inserting (16) into the entropy rate (14), I(X ; Y), in nats, can be written as N →∞

log 2 − 1/2 + F −→ I(X ; Y) with probability 1, 6

(18)

Proof of the SMB theorem for multidimensional channels, extending its 1-D version [50], can be found in Orenstein and

Weiss [51] and the references therein (e.g. , for processes indexed by Z d , d ≥ 2 [52] or even for all discrete amenable groups [53]). The SMB theorem also applies to continuous random variables [54].


15

where F ,−

1 log(Z(y)) N2

(19)

is recognized as the free energy per symbol [31], [55]. Hence, the problem of calculating the symmetric information rate boils down to estimating the free energy of an infinite system. The next section provides the statistical mechanics view of 2-D channels and draws the connection to GBP. A. A Statistical mechanics view of 2-D channels and the connection to GBP The connection between 2-D channels and basic models in statistical mechanics stems from the 2-D channel’s model. The posterior probability (4) actually corresponds to a Boltzmann distribution of an Ising Hamiltonian, with pairwise interactions R ij and external random fields hi [31]. The difficulty in estimating the posterior probability (4) lies in estimating the partition function (17), or similarly, the free energy (19). The field of statistical mechanics has devoted considerable effort to the development of methods for calculating the free energy. However, evaluating the free energy of infinitely large 2-D channels (1) is infeasible due to the intractability of computing p(y), and one must resort to approximate methods. One of the classic approximation methods for free energies is the Kikuchi approximation, also known as the cluster variation method (CVM, [37]). The CVM follows a variational principle: It defines the free energy as a functional of p(y), F(p(y)), replaces p(y) by a tractable trial belief vector b(y), and seeks to minimize F(b(y)). The trial belief vector in CVM is taken as Q b(y) , λ∈M p(yλ )cλ , where λ is a ‘cluster’ of ‘neighboring’ observations y λ , taken from the

set of ‘clusters’ M. The integers cλ , a.k.a. counting numbers, are provided by the CVM in order to ascertain that each symbol is counted exactly once in the corresponding free energy. Since

b(y) depends only on local probabilities, p(yλ ), its computation is tractable7 . Then, minimizing the functional F(b(y)) w.r.t b(y), yields an approximation to the free energy. Recently, Yedidia et al. [37] have shown a correspondence between the stationary points of the CVM-based free energy and the fixed points of GBP. Note, in passing, that Yedidia et al. , have also shown that the fixed points of LBP correspond to the stationary points of the Bethe 7

However, b(y) need not necessarily form a valid probability distribution function [37].


16

free energy, which is a special case of the CVM, in the same way that BP is a special case of GBP. For an elaborate discussion of CVM and its relation to GBP, see Yedidia et al. [37]. Hence, our idea for estimating the information rate is to use GBP to find the minimum (or stationary points) of the CVM of a large enough, yet finite, system, thus assuming that the computed free energy per symbol converges to its exact value for infinite systems. VII. R ESULTS AND D ISCUSSION A. Estimation The performance of the proposed GBP detector was evaluated using Monte-Carlo simulations of three examples of the dispersive 2-D channel: a 2-D ISI channel, and two topologies of a cellular network - rectangular and hexagonal (see Fig. 1-(a), Fig. 1-(c) and Fig. 1-(d), respectively.) The performance of GBP was compared to the optimal detector and to several other standard detectors. GBP with a standard8 parallel (synchronous) message-passing schedule was applied. Convergence is achieved when the total difference between beliefs along iterations is lower than 10−8 . In many of our experiments the LBP detector did not converge in a substantial percentage of the simulations, for both synchronous (parallel) and asynchronous (serial) message scheduling 9 , in contrast to the consistent convergence of GBP. As a result, its performance was worse than an MF detector and in certain cases was close to a random guess. Hence, LBP’s results are omitted from the performance evaluation figures. 1) 2-D ISI Channel: Following a recent contribution [14], we simulated a 6 × 6 binary ISI channel, under two non-trivial constant interference levels, α i,j,k,l = α = 0.5 and α = 0.75. Outside the grid the symbols were assumed to have value (−1). Fig. 3 compares the equalization performance of the optimal detector and GBP in terms of average BER per symbol as a function of SNR10 . 8

GBP always converged without the need to use ‘damping’, i.e. , averaging old and newly computed messages.

9

The stopping criteria used for LBP were much looser than the one for GBP.

10

There are different reasonable definitions of SNR for channels with ISI. Besides the definition we use, i.e. , snr = 1/σ 2 ,

it is also common to define snr =k h k2 /σ 2 , where h is the Kronecker delta impulse response of the channel. Evidently, the comparison of the error and information rates, under different α values and SNR, may change depending on the definition of the latter.


17

−1

10

α=0.5 MAP GBP α=0.75 MAP GBP

−2

10

−3

BER

10

α=0.75

−4

10

α=0.5

−5

10

−6

10

3

4

5

6

7

8

9

10

11

12

SNR [dB]

Fig. 3. 2-D ISI equalization in a 6 × 6 channel for αi,j,k,l = α = 0.5 and α = 0.75. Equalization performance of GBP (line) vs. optimal (MAP, squares) detectors in terms of average BER per symbol, as a function of SNR. The full squares and solid line correspond to α = 0.5, while empty squares and a dashed line correspond to α = 0.75.

As can be seen, the GBP error decreases with SNR and its performance coincides with MAP error levels or is extremely close to it. The standard deviation of the results (including those in subsequent sections) was small, thus here, and hereinafter it is omitted from the figures (unless otherwise specifically stated). For comparison, note that the performance of the best iterative detector proposed by Marrow and Wolf [14] for this setting is approximately 2/3dB above the optimal BER. To avoid confusion, note that in Figs. 3-7 the GBP curves have no markers, but the MAP performance points - represented by markers alone - fall exactly on top of the GBP lines. The performance of the GBP detector in a similar setting, but for a larger 20 × 20 channel with α = 0.5, was also evaluated. As MAP equalization is infeasible for such a system, GBP’s performance was compared to an analytical lower bound on the optimal error probability. This bound was obtained by assuming that all interfering bits are known at the receiver’s side, thus transforming the dispersive channel into a memoryless one, where optimal detection is achieved by using a simple MF. Fig. 4 displays the equalization performance of GBP and the lower bound on the optimal error probability. The results show that the GBP detector’s BER is very close to the bound, which implies that its performance is also near-optimal for this system. Moreover, apart from providing the correct hard decisions, GBP infers the marginal probabilities which may become useful in coded systems. We observed empirically in all our experiments


18

GBP Lower Bound on MAP

−2

10

−3

BER

10

α=0.5

−4

10

−5

10

−6

10

5

6

7

8

9

10

11

12

SNR [dB]

Fig. 4.

2-D ISI equalization in a 20 × 20 channel for α = 0.5. Equalization performance of a GBP detector (solid line) vs.

lower bound (dashed line) on optimal (MAP) detector error probability in terms of average BER per symbol, as a function of SNR.

that the marginal beliefs in GBP well-approximate the APP. Thus GBP may also operate, e.g. , as a detection stage in an iterative joint detection and decoding scheme.

−1

10

−2

10 BER

Fixed α MF MMSE GBP MAP Random α MF MMSE GBP MAP

−3

10

−4

10

0

2

4

6

8

10

SNR [dB]

Fig. 5. MUD in a 9 × 9 rectangular topology. Average BER per cell as a function of SNR, for the optimal (MAP, squares) and GBP (line) detectors. Also shown are the BER for the linear minimum mean square error (MMSE, circles and line) and single-user matched filter (MF, triangles and line) detectors. The BER was evaluated for two channel profiles of fixed (α i,j,k,l = α = 0.5, full markers and solid lines) and random interference (α ∼ 0.5 · N (0, 1) and hard-limited at ±1, empty markers and dashed lines).


19

2) Rectangular Topology Cellular Network: A cellular network of 9×9 cells 11 with rectangular topology (Fig. 1-(c)) was also simulated. Inter-cell interference scaling was either constant αi,j,k,l = α = 0.5, or taken from a zero-mean Gaussian distribution with standard deviation αstd = 0.5, hard-limited at α = ±1. The latter corresponds to a random MAI which may be interpreted as being caused by channel fading. Fig. 5 presents the performance of the optimal MAP detector, the GBP-MUD and also two other standard detectors - the linear MMSE and the naive single-user MF detectors [6]. (Note that the MF and MMSE BER in Figs. 5-7, in contrast to the GBP and MAP BER, are represented by curves composed of lines and markers.) It may be observed that GBP practically coincides with the optimal MUD for both constant and random interference scenarios, while MMSE and MF are substantially inferior. Note that another attractive property of the GBP detector in the Wyner-like cellular MUD context is its potential decentralized implementation. GBP’s messages between neighboring regions (or cells) may be implemented in the network itself, instead of being computed in a central processor, as imposed by the other detectors.

−1

10

BER

Fixed α MF MMSE GBP MAP Random α MF MMSE GBP MAP

−2

10

−3

10

0

2

4

6

8

10

SNR [dB]

Fig. 6. MUD in a 9×9 hexagonal topology. Average BER per cell as a function of SNR, for the optimal (MAP, squares) and GBP (line) detectors. Also shown are the BER for the linear minimum mean square error (MMSE, circles and line) and single-user matched filter (MF, triangles and line) detectors. The BER was evaluated for two channel profiles of fixed (α i,j,k,l = α = 0.5, full markers and solid lines) and random interference (α ∼ 0.5 · N (0, 1) and hard-limited at ±1, empty markers and dashed lines).

11

This was the largest system for which exact inference was possible.


20

3) Hexagonal Topology Cellular Network: Another instance of the 2-D model, where a 9 × 9 cellular network is planned according to a hexagonal topology (Fig. 1-(d)), was studied empirically. As in the rectangular case the inter-cell interference was either constant or random. Fig. 6 compares the GBP-MUD to the optimal MAP, and to MMSE and MF detectors, showing similar performance as in the rectangular case. We also evaluated the performance of the GBP-MUD in an additional complementary setting, in which the SNR level was fixed (4dB), and the interference scaling α was varied. We used α ∼ αstd · N (0, 1), hard-limited at α = ±1, where 0 ≤ αstd ≤ 1. Fig. 7 displays the performance of GBP, MAP, MMSE and MF detectors in this setting. GBP is extremely close to the optimum for all interference levels, even for harsh (α → 1) conditions.

MF MMSE GBP MAP

−1

BER

10

α~αstdN(0,1) SNR=4dB

−2

10

0

0.2

0.4

0.6

0.8

Interference Standard Deviation αstd

1

Fig. 7. MUD in 9 × 9 hexagonal topology over a range of random interferences. Average BER per cell as a function of random interference standard deviation αstd , for the optimal (MAP, squares), GBP (solid line), MMSE (circles and solid line), and MF (triangles and solid line) detectors. The BER was evaluated for a random interference scenario α ∼ α std · N (0, 1) (hard-limited at ±1) and a fixed SNR (= 4dB).

As for computation, note the enormous advantage in using a GBP detector with complexity dominated by the order of the number of all discrete-input combinations within the basic region (e.g. , O(29 ) for the systems examined in Figs. 3-7), in contrast to the O(|ℵ| N ×N ) complexity of the optimal MAP detector, where |ℵ| denotes the size of the input alphabet. As shown in the experimental study illustrated in Figs. 3-7, this significant reduction in complexity almost does not affect the BER. The sub-optimal MF and MMSE detectors are of polynomial complexity, namely O(N × N ) and O({N × N }3 ), respectively, however resulting in considerably inferior


21

error performance. To conclude this section, we show how well the marginal beliefs (and not only the BER) inferred by GBP approximate the true (MAP) marginals. For a 9×9 hexagonal topology network with interference scaling α = 0.5, the absolute value of the difference between the marginals computed using GBP and optimal MAP detectors is obtained for each of the 81 nodes. The maximal difference (out of the 81 results) is then recorded for 100 simulation rounds. Fig. 8 plots the median over this maximal difference record, i.e. δ = med( max Pr(xi = 1|y)MAP − Pr(xi = 1|y)GBP ), i=1,...,81

(20)

as a function of SNR. As expected based on the above BER results, the difference is miniscule, on the order of 10−3 and less. As LBP did not converge, in either synchronous or asynchronous modes, for a majority of the simulation rounds in the hexagonal topology case, it is omitted from the figure. −2

10

α=0.5 −3

10

−4

δ

10

−5

10

−6

10

−7

10

−8

10

0

2

4

6

8

10

SNR

Fig. 8.

The median (over 100 rounds), δ, of the maximal difference between the marginals computed using GBP and optimal

MAP detectors vs. SNR in 9 × 9 hexagonal topology with α = 0.5.

B. Information Rate The proposed GBP-based algorithm is also used for estimating the SIR of two examples of dispersive 2-D channels: a 2-D ISI channel and a hexagonal Wyner cellular network. The two aforementioned schemes for computing the information rate, i.e. , the symbol-wise GSV-based and the vector-wise SMB-based, were applied. Both approaches gave practically the same results,


22

labelled as ‘SIR’ in the following figures. Since GBP well-approximates the joint probability, and its subsequent marginals, both schemes may equally be applied (and were applied in our experimental study, producing the same information rates). The following results were obtained by averaging over 1000 realizations of 30 × 30 channels12 . 1 0.9

Bit Per Symbol

0.8 0.7 0.6 0.5 0.4 0.3 α=0.5

0.2

UB SIR LB

0.1 0

−5

0

5

10

SNR [dB]

Fig. 9. 2-D ISI channel. Estimated SIR, in terms of bit per symbol, evaluated using GBP-based simulations (squares and a solid line), as a function of SNR for α = 0.5. Also shown are upper (UB) and lower (LB) bounds (dashed-dotted) on the SIR [24], [25].

1) 2-D ISI Channel: Fig. 9 presents the estimated SIR, in terms of bit per symbol, computed using the GBP-based algorithms, as a function of SNR, for a binary ISI channel with non-trivial (α = 0.5) interference structure as depicted in Fig. 1-(b). Also drawn are the lower and upper bounds on the SIR, recently suggested by Chen and Siegel [24], [25]. As can be seen, the evaluated SIR increases with SNR up to the trivial 1-bit Shannon bound. The estimated SIR curve falls within these tight bounds and agrees with them perfectly. We would like to comment that recently another simulation-based method for estimating the mutual information of a certain instance of 2-D channels in optical data storage systems, known as the full-surface channel, was presented [56]. In this method the joint APP is approximated using a heuristic greedy algorithm which has a stochastic element. However, our approach seems to be more promising in obtaining good estimates of the information rates. 12

A grid width of 30 was chosen based on our free-energy approximation quality analysis, as discussed in Section VII-C.


23

2 1.8

Capacity SIR

K=1

Bits Per Channel Use

1.6 1.4 1.2 SNR=8dB

1 0.8 SNR=0dB

0.6 0.4 0.2 0 0

Fig. 10.

SNR=−10dB

0.2

0.4

0.6

Interference Scaling α

0.8

1

2-D Hexagonal Wyner Cellular Network. Estimated SIR (squares and solid line) and Gaussian signaling capacity

(dashed line), in bits per channel use, as a function of the inter-cell interference scaling α for three SNR values: −10, 0 and 8dB, and a single user within each cell (K = 1).

2) 2-D Hexagonal Wyner Cellular Network: In a similar way, the estimated SIR of a hexagonal topology Wyner model (as conceptually described in Fig. 1-(d)), under binary signaling, with a single user within each cell (i.e. , K = 1 in Wyner’s original notation [2], describing the case of intra-cell TDMA or orthogonal CDMA) was computed. Fig. 10 displays the estimated SIR calculated for the possible range of inter-cell interference scaling α, for three SNR levels. For comparison, the Gaussian signaling capacity as derived by Wyner, is also presented. As may be expected for low SNR (−10dB), the estimated SIR and Wyner’s capacity almost coincide. For the intermediate SNR level (0dB), Wyner’s capacity provides a tight upper bound on the SIR for α < 0.5. As for high SNR (8dB), the SIR saturates the 1-bit bound for almost all values of α. The SIR, as opposed to the capacity, always increases with the inter-cell interference scaling α. Note, in passing, that since the capacity of a binary channel is bounded between the SIR and Wyner’s Gaussian signaling capacity, one can also infer the ultimate (possibly non-uniform) information rate in these low and intermediate SNR regimes. Also note that a proper optimal joint processing of the channel observations yields an increase in the achievable error-free information throughput as the interference intensity grows. Thus interference, conceived at first as having a detrimental effect, may assist in the process of reliable information transfer.


24

1.5

Bits Per Channel Use

α=0.5, K=1

1

0.5

Capacity SIR 0

−2

0

2

4

6

8

10

SNR [dB]

Fig. 11.

2-D Hexagonal Wyner Cellular Network. Estimated SIR (squares and solid line) and Gaussian signaling capacity

(dashed line), in bits per channel use, as a function of SNR for α = 0.5 and K = 1.

Fig. 11 evaluates the SIR for the complementary case of a fixed α = 0.5 as a function of SNR. The estimated SIR coincides with Wyner’s capacity for SNR . 4dB. It should be emphasized that the analysis could have been performed in a similar manner for the case of several intra-cell users, i.e. , K > 1. In this case, the sum of all K intra-cell users’ (scaled) transmissions is composed of K + 1 discrete amplitude levels. Thus, the scenario of K intra-cell users with equiprobable binary signaling can be seamlessly replaced by the setting of a single intra-cell user with (K + 1)-ary signaling taken from a binomial distribution. C. Quality of the Free-Energy Approximation Our free-energy approximations were based on simulating channels of size 30 × 30. In this section we aim to justify this choice. In order to evaluate the quality of our free-energy approximation we first performed Monte-Carlo simulations of several channels with different sizes and compared our results to an exact calculation. The following results, averaged over 500 realizations, were obtained for a specific channel, i.e. , Wyner’s hexagonal cellular network, as depicted in Fig 1-(d), with α = 0.5, SNR of 0dB and assuming a single user per cell. Fig. 12-(a) displays the root mean square (RMS) error per symbol, in percentage, between the approximated and exact free energies as a function of the channel’s size N 2 (width N = 4, . . . , 9, where a 9 × 9 channel is the largest case for which exact computation was feasible.) As can be


Fig. 12.

25

Quality of the free-energy approximation. (a) Root mean square (RMS) error (in %) in computing the free energy

per symbol exactly and using the CVM, for N × N channels. The results were obtained using 500 realizations of Wyner’s hexagonal cellular networks (assuming a single user per cell), with α = 0.5 and SNR=0dB. (b) The corresponding CVM free energy per symbol as a function of N . For N ≤ 9 we used the same realizations as in (a). For larger systems (dashed line) the exact free energy cannot be calculated. Vertical bars stand for standard deviation in simulation results.

observed the difference between the approximated and exact free energies is miniscule (in the order of 10−4 %) and further decreases with the channel’s width. Fig. 12-(b) presents the CVM approximation of the free energy per symbol as a function of the channel’s width. It may be seen that the free energy per symbol converges with the size of the system, and that the differences among realizations become smaller. In principle, larger systems can be simulated, for which these differences are expected to be even smaller. However, it seems that a 30×30 channel size suffices as an effective approximation of the exact free energy per symbol of infinite size systems, thus providing a proper estimate of the information rate. Similar behavior was observed for all other channels throughout the entire interference range and for a wide scope of SNR. D. Remarks about GBP and 2-D channels In this section we aim to justify the use of GBP for 2-D channels and try to understand the reasons for its excellent performance. We first compare its performance to two straightforward alternatives:


•

26

LBP over a Galois field (GF) The system is coarsened using 3×3 (non-overlapping) blocks. We apply LBP, using symbols with 29 states (i.e. , LBP over GF (q = 29 ), [57]), and estimate the free energy using the Bethe approximation. Detection is then performed by marginalizing a single block’s length29 belief vector. We refer to this method as Coarsened LBP (CLBP). The main difference between CLBP and GBP is the selection of the basic regions. While GBP uses a sliding 3×3 window, the regions of CLBP are non-overlapping. Hence, better performance of GBP over CLBP may indicate the importance of the region selection in GBP. The complexity of CLBP is similar to GBP. 1

1

0.9 0.9

Bit Per Symbol

0.8 0.7

0.8

0.6 0.7 0.5 0.6

6

8

10 GBP CLBP LEI(2) LEI(3) LEI(4) UB LB

0.4 0.3 α=0.5

0.2 0.1 0

Fig. 13.

−10

−5

0 SNR [dB]

5

10

2-D ISI channel. Estimated SIR in terms of bit per symbol as a function of SNR for α = 0.5. The SIR is evaluated

using GBP (squares and a solid line), CLBP (circles), LEI of width ±2 (triangles and a solid line), ±3 (stars and a solid line) or ±4 (diamonds and a solid line). Also shown are upper (UB) and lower (LB) bounds (dashed-dotted) on the SIR. The inset displays an enlargement of the curves.

•

Local Exact Inference In this local estimate method we perform exact inference over a certain square region around each symbol, and ignore the rest of the system. The width of a region was ±2, ±3 or ±4, where the latter corresponds to a 9 × 9 system centered around the relevant symbol. Estimation is performed for each symbol separately, and the free energy corresponds to a mean-field approximation [45], as the joint probability distribution is completely factorized. We refer to this method as Local Exact Inference (LEI). LEI is highly non-efficient as the computation is performed independently for each symbol and we include it in order to check the necessity of passing messages.


27

100

% Converged Systems

90 80 70 60 50 40 30 20

SNR=5dB

10 0

Fig. 14.

0.2

0.4

0.6

Interference Scaling α

0.8

1

Convergence of CLBP. Percentage of 2-D ISI systems (SNR=5dB) over which CLBP has converged as a function of

the interference scaling α.

Fig. 13 presents the information rate as a function of SNR, for an ISI channel (Fig. 1-(b)) with interference level of α = 0.5, together with the known bounds [24], [25]. The information rate is estimated using three methods: GBP, CLBP (with parallel message-passing schedule) and LEI. One observes that GBP and CLBP yield an identical estimate, which falls within the theoretical bounds, while LEI falls outside the bounds starting from SNR of −7dB. Therefore local estimates are adequate only for low SNR values. Our empirical study has shown that the difference between GBP and CLBP appears only when the interference conditions are more harsh. Fig. 14 presents the percentage of 9 × 9 systems (Fig. 1-(b), SNR=5dB) over which CLBP has converged as a function of α. As α becomes larger, CLBP tends not to converge, thus providing poor detection and information rate estimates, while GBP converges for all α. It appears that at increased values of α the system is ‘rescaled’ such that CLBP effectively suffers the same problems as a regular LBP over lower values of α. 1) A necessary condition for GBP to succeed: The physics and graphical models literature does not provide a rigorous explanation for the quality of approximations of GBP-based CVM for general graphs, or even for graphs representing 2-D channels. We conjecture that the reason for the success of GBP stems from the local nature of the system. More specifically, we claim that the correlation length, ξ, which measures the range over which symbols are correlated, is of the order of a GBP region over the relevant range of SNR. In order to show that we first examined the classical 2-D Ising ferromagnet spin-glass


28

0.5 0 ∆−0.5

(a)

−1 −1.5 −2 −30

−25

−20

−15

−10 −5 SNR [dB]

0

5

10

3

ξ

2 (b)

1 0 −30

Fig. 15.

−25

−20

−15

−10 −5 SNR [dB]

0

5

10

2-D Ising ferromagnet (without external fields). (a) The difference, ∆, between the exact free energy of an infinite

system and the CVM free energy of a 30 × 30 system. (b) Correlation length ξ (in 2 × 2 block units) as a function of SNR.

model without external fields. Fig. 15-(a) presents the difference, ∆, between the exact free energy of an infinite system13 and the CVM free energy of a 30 × 30 system14 . Fig. 15-(b) presents the 2-D Ising ferromagnet’s correlation length as a function of SNR. The correlation length in this case is the ‘typical’ distance over which two spin sites tend to be correlated and is calculated as ξ = 1/ log (λ0 /λ1 ),

(21)

where λ0 and λ1 are the largest and second largest eigenvalues of the transfer matrix, respectively. The transfer matrix M in this case is a 24 ×24 matrix where Mij is proportional to the probability that a 2 × 2 block is in a certain (4 symbol) state i and its adjacent block is in state j [59]. One observes that the CVM successfully approximates the free energy, as long as the correlation length is close to the block size, i.e. , ξ = 1 which corresponds to a single block. As for the case of 2-D channels with memory, there is no simple way of estimating the correlation length, due to the random external fields. Hence, in order to supply an estimate we calculate the correlation length for a 30 × 30 hexagonal topology (Fig. 1-(d)) in the following 13

A closed-form solution for the exact free energy of the 2-D Ising ferromagnet was obtained by Onsager in his renowned

work of 1944 [58]. 14

In this case the GBP regions were taken as a sliding 2 × 2 window, which is the natural choice for nearest-neighbor

interactions.


29

0.5 0.45 0.4 0.35 0.3

ξ

0.25 0.2 0.15 0.1 α=0.5

0.05 0 −10

Fig. 16.

−5

0 SNR [dB]

5

Mean ‘correlation length’ ξ (in 3 × 3 block units) over all pairs of adjacent blocks as a function of SNR (α = 0.5).

way. We take each pair of adjacent 3 × 3 blocks and calculate the correlation length as if they were replicated indefinitely. Therefore, due to this replication, the 2-D channel can be now viewed as a classical 2-D Ising spin-glass model for which the correlation length is known (21), except here the eigenvalues are of a 29 × 29 transfer matrix. Now, the correlation length estimate can be computed by averaging over the correlation lengths of the different replicated 2-D channels, corresponding to all the pairs of adjacent blocks in the original 2-D channel’s graph. Fig. 16 presents the mean ‘correlation length’ as a function of SNR (α = 0.5). One observes that this value is lower than 1 for all SNR, which probably accounts for the excellent CVM approximation. The reason for the low correlation length, even at high SNR values, is the positive effect of external fields, originating from the observations. These effects can be related to the pinning and percolation theories in the physics literature [60]. VIII. C ONCLUSIONS A GBP estimator for 2-D channels with memory was introduced. This estimator arose from the frameworks of probabilistic graphical models and statistical mechanics. Extensive simulation results of four different 2-D memory types strongly indicate a practically optimal behavior of GBP in inferring the channel’s both marginal and joint posterior probabilities. The proposed fully tractable message-passing scheme then shows a near-optimal error performance in detecting the


30

channel input symbols. Based on this observation, two complementary methods for a simulationbased computation of the information rates of 2-D discrete-input channels with memory are developed. In the first method the inferred marginals are used to evaluate the MMSE, then the GSV theorem is being applied. The second approach is based on the computed joint APP and the SMB theorem. In order to validate our method, we compared the resulting information rate to previouslycalculated bounds. The quality of the free-energy approximation, which is identical (according to the previously-derived relation (18)) to the quality of the information rate approximation, was compared to the exact free energy using small-grid size channels and was found to exhibit practically accurate behavior. The error performance of GBP is found to be almost optimal throughout the whole SNR and interference ranges. As rigorous analysis of the workings of GBP is a challenging open problem, we have suggested an explanation for its remarkable performance. The attractive usage of GBP illustrated for 2-D channels naturally calls for its real-time implementation as an inference engine in communication and storage systems. ACKNOWLEDGMENT O. Shental acknowledges the partial support of the NSF (Grant CCF-0514859). The work of S. Shamai was supported by the US-Israel, Binational Science Foundation. I. Kanter thanks the partial support of the Israel Science Foundation. The authors are grateful to Dongning Guo for constructive comments on an early version of the manuscript and Neri Merhav and Tsachy Weissman for pointing out the references about the multideminsional version of the ShannonMcMillan-Breiman theorem. The authors also wish to thank the reviewers and the Associate Editor for their contribution in enhancing the readability and technical quality of the paper. R EFERENCES [1] B. Vasic and E. M. Kurtas, Eds., Coding and Signal Processing for Magnetic Recording Systems.

CRC Press, 2005.

[2] A. D. Wyner, “Shannon-theoretic approach to a Gaussian cellular multiple-access channel,” IEEE Trans. Inform. Theory, vol. 40, pp. 1713–1727, Nov. 1994. [3] (2003) The 3rd generation partnership project website. [Online]. Available: http://www.3gpp.org [4] S. Hranilovic and F. R. Kschischang, “An indoor wireless optical MIMO channel using pixelated arrays of transmitters and receivers,” in Proc. 22nd Biennial Symposium on Communications, Kingston, Ontario, Canada, May 2004, pp. 392–394. [5] J. G. Proakis, Digital Communications, 4th ed. [6] S. Verdú, Multiuser Detection.

New York, USA: McGraw-Hill, 2000.

Cambridge, UK: Cambridge University Press, 1998.


31

[7] E. Ordentlich and R. M. Roth, “On the computational complexity of 2D maximum-likelihood sequence detection,” Hewlett-Packard Laboratories, Palo Alto, CA, Tech. Rep. HPL-2006-69, Apr. 2006. [Online]. Available: http://www.hpl.hp.com [8] P. S. Kumar and S. Roy, “Two-dimensional equalization: Theory and application to high density magnetic recording,” IEEE Trans. Commun., vol. 42, pp. 386–395, Feb. 1994. [9] W. Weeks IV, “Full surface data storage,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, USA, 1996. [10] K. M. Chugg, X. Chen, and M. A. Neifeld, “Two-dimensional equalization in coherent and incoherent page-oriented optical memory,” J. Opt. Soc. Amer. A, vol. 16, pp. 549–562, Mar. 1999. [11] N. Singla, J. A. O’Sullivan, R. S. Indeck, and Y. Wu, “Iterative decoding and equalization for 2-D recording channels,” IEEE Trans. Magn., vol. 38, pp. 2328–2330, Sept. 2002. [12] Y. Wu, J. A. O’Sullivan, R. S. Indeck, and N. Singla, “Iterative detection and decoding for separable two-dimensional intersymbol interference,” IEEE Trans. Magn., vol. 39, pp. 2115–2120, July 2003. [13] A. H. J. Immink et al., “Signal processing and coding for two-dimensional optical storage,” in Proc. IEEE Global Conference on Communications (GLOBECOM), Piscataway, New Jersey, USA, Dec. 2003, pp. 3904–3908. [14] M. Marrow and J. K. Wolf, “Iterative detection of 2-dimensional ISI channels,” in Proc. IEEE Inform. Theory Workshop (ITW), Paris, France, Mar. 2003, pp. 131–134. [15] N. Singla and J. A. O’Sullivan, “Joint equalization and decoding for nonlinear two-dimensional intersymbol interference channels,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Adelaide, Australia, Sept. 2005. [16] W. Hirt, “Capacity and information rates of discrete-time channels with memory,” Ph.D. dissertation, Swiss Federal Inst. of Tech. (ETH), Zurich, Switzerland, 1998. [17] S. Shamai (Shitz), L. H. Ozarow, and A. D. Wyner, “Information rates for a discrete-time Gaussian channel with intersymbol interference and stationary inputs,” IEEE Trans. Inform. Theory, vol. 37, no. 6, pp. 1527–1539, Nov. 1991. [18] J. Chen and P. H. Siegel, “Markov processes asymptotically achieve the capacity of finite-state intersymbol interference channels,” IEEE Trans. Inform. Theory, submitted for publication. [19] A. Kavˇcić, “On the capacity of Markov sources over noisy channels,” in Proc. IEEE Global Conference on Communications (GLOBECOM), San Antonio, Texas, USA, Nov. 2001, pp. 2997–3001. [20] S. Yang and A. Kavˇcić, “Markov sources achieve feedback capacity of finite-state machine channels,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Lausanne, Switzerland, June 2002, p. 361. [21] S. Shamai (Shitz) and R. Laroia, “The intersymbol interference channel: lower bounds on capacity and precoding loss,” IEEE Trans. Inform. Theory, vol. 42, no. 5, pp. 1388–1404, Sept. 1998. [22] D. Arnold, H. A. Loeliger, P. O. Vontobel, A. Kavˇcić, and W. Zeng, “Simulation-based computation of information rates for channels with memory,” IEEE Trans. Inform. Theory, vol. 52, no. 8, pp. 3498–3508, Aug. 2006. [23] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. 20, no. 3, pp. 284–287, Mar. 1974. [24] J. Chen and P. H. Siegel, “On the symmetric information rate of two-dimensional finite state ISI channels,” in Proc. IEEE Information Theory Workshop (ITW), Paris, France, Mar. 2003. [25] ——, “On the symmetric information rate of two-dimensional finite state ISI channels,” IEEE Trans. Inform. Theory, vol. 52, no. 1, pp. 227–236, Jan. 2006. [26] ——, “Information rates of two-dimensional finite state ISI channels,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Yokohama, Japan, June 2003, p. 118.


32

[27] ——, “Information rates of two-dimensional finite state ISI channels,” IEEE Trans. Inform. Theory, submitted for publication. [28] M. I. Jordan, Ed., Learning in Graphical Models.

Cambridge, MA: The MIT Press, 1999.

[29] M. I. Jordan and C. M. Bishop, Eds., An Introduction to Graphical Models. draft, 2000. [30] B. J. Frey, Graphical Models for Machine Learning and Digital Communication. Cambridge, MA: The MIT Press, 1998. [31] M. Mézard, G. Parisi, and M. A. Virasoro, Spin Glass Theory and Beyond.

Singapore: World Scientific Lecture Notes

in Physics Vol. 9, 1987. [32] H. Nishimori, Statistical Physics of Spin Glasses and Information Processing.

Oxford, UK: Oxford University Press,

2001. [33] O. Shental, N. Shental, A. J. Weiss, and Y. Weiss, “Generalized belief propagation receiver for near-optimal detection of two-dimensional channels with memory,” in Proc. IEEE Information Theory Workshop (ITW), San Antonio, Texas, USA, Oct. 2004. [34] O. Shental, N. Shental, and S. Shamai (Shitz), “On the achievable information rates of two-dimensional channels with memory,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Adelaide, Australia, Sept. 2005. [35] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications.

Providence, Rhode Island: American

Mathematical Society, 1980. [36] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Generalized belief propagation.” in Advances in Neural Information Processing Systems, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. Cambridge, MA: The MIT Press, 2001, vol. 13. [37] ——, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2282–2312, July 2005. [38] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.

San Francisco: Morgan

Kaufmann, 1988. [39] D. J. C. Mackay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Elect. Lett., vol. 33, no. 6, pp. 457–458, Mar. 1997. [40] R. G. Gallager, “Low density parity check codes,” IRE Trans. Inform. Theory, vol. 8, pp. 21–28, 1962. [41] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding,” in Proc. IEEE International Conference on Communications (ICC), Geneva, Switzerland, May 1993, pp. 1064–1070. [42] C. Berrou and A. Glavieux, “Near optimum error-correcting coding and decoding: Turbo codes,” IEEE Trans. Commun., vol. 44, no. 10, Oct. 1996. [43] R. J. McEliece, D. J. C. MacKay, and J. F. Cheng, “Turbo decoding as an instance of Pearl’s ’belief propagation’ algorithm,” IEEE J. Select. Areas Commun., vol. 16, pp. 140–152, Feb. 1998. [44] P. Pakzad and V. Anantharam, “Kikuchi approximation method for joint decoding of LDPC codes and partial-response channels,” IEEE Trans. Commun., vol. 54, no. 7, pp. 1149–1153, July 2006. [45] M. Opper and D. Saad, Advanced Mean Field Methods: Theory and Practice.

Cambridge, MA: MIT Press, 2001.

[46] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721–741, Nov. 1984. [47] F. Kschischang, B. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory, vol. 47, pp. 498–519, Feb. 2001. [48] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems. Springer, 1999.


33

[49] D. Guo, S. Shamai (Shitz), and S. Verdu´ , “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005. [50] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons, 1991. [51] D. Ornstein and B. Weiss, “Entropy and recurrence rates for stationary random fields,” IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1694–1697, June 2002. [52] ——, “The ShannonMcMillanBreiman theorem for a class of amenable groups,” Israel J. Math., vol. 44, pp. 53–60, 1983. [53] E. Lindenstrauss, “Pointwise theorems for amenable groups,” Invent. Math., vol. 146, no. 2, pp. 259–295, Jan. 2001. [54] B. G. Leroux, “Maximum-likelihood estimation for hidden Markov models,” Stochastic Processes and their Applications, vol. 40, pp. 127–143, 1992. [55] T. Tanaka, “A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors,” IEEE Trans. Inform. Theory, vol. 48, pp. 2888–2910, Nov. 2002. [56] M. A. Iqbal and W. Weeks IV, “Simulation-based estimation of the capacity of full-surface channels,” in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Adelaide, Australia, Sept. 2005, p. 1778. [57] I. Kanter and H. Kfir, “Statistical mechanical aspects of joint source-channel coding,” Europhys. Lett., vol. 63, no. 2, pp. 310–316, 2003. [58] L. Onsager, “Crystal statistics: A two-dimensional model with order-disorder transition,” Phys. Rev., vol. 65, pp. 117–149, 1944. [59] R. J. Baxter, Exactly Solved Models in Statistical Mechanics.

London, UK: Academic Press, 1982.

[60] G. Blatter, M. V. Feigel’man, V. B. Geshkenbein, A. I. Larkin, and V. M. Vinokur, “Vortices in high-temperature superconductors,” Rev. Mod. Phys., vol. 66, no. 4, pp. 1125–1388, Oct. 1994.

Ori Shental Ori Shental (S’03-M’07) received the B.Sc. and Ph.D. degrees in Electrical Engineering from the Tel-Aviv University, Israel, in 1996 and 2006. He is currently a post-doctoral researcher at the Center for Magnetic Recording Research, University of California, San Diego. His research interests are in the application of computational tools from statistical physics and probabilistic graphical models into information-theoretic problems in communications and data storage. He received several awards for academic excellence.

Noam Shental Noam Shental received a B.Sc. degree in physics and mathematics from the Tel-Aviv University, Israel, in 1991, and a Ph.D. degree in Neural Computation from the Hebrew University, Jerusalem, Israel, in 2004. He is currently a post-doctoral fellow at the Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel. His research interests are applications of machine learning and probabilistic graphical models, especially in biological systems.


34

Shlomo Shamai (Shitz) Shlomo Shamai (Shitz) received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the Technion—Israel Institute of Technology, in 1975, 1981 and 1986 respectively. During 1975-1985 he was with the Communications Research Labs in the capacity of a Senior Research Engineer. Since 1986 he is with the Department of Electrical Engineering, Technion—Israel Institute of Technology, where he is now the William Fondiller Professor of Telecommunications. His research interests encompasses a wide spectrum of topics in information theory and statistical communications. Dr. Shamai (Shitz) is an IEEE Fellow and a member of the Union Radio Scientifique Internationale (URSI). He is the recipient of the 1999 van der Pol Gold Medal of URSI, and a co-recipient of the 2000 IEEE Donald G. Fink Prize Paper Award, the 2003, and the 2004 joint IT/COM societies paper award, and the 2007 IEEE Information Theory Society Paper Award. He is also the recipient of 1985 Alon Grant for distinguished young scientists and the 2000 Technion Henry Taub Prize for Excellence in Research. He has served as Associate Editor for the S HANNON T HEORY

OF THE

IEEE T RANSACTIONS

ON I NFORMATION

T HEORY, and also serves on the Board of Governors of the Information Theory Society.

Ido Kanter Ido Kanter received the B.Sc. in Physics and Computer Science, and Ph.D degree in Physics from Bar-Ilan University in 1983 and 1987, respectively. During 1988-1989 he was a visiting research fellow at Princeton university working with P.W. Anderson. In 1989 he was also a visiting research fellow at Bell Laboratories. Since 1990 he is with the Department of Physics, Bar-Ilan University, and since 1996 he is a Professor. His research interests are statistical mechanics, phase transitions and critical phenomena, physics of disordered magnetic systems, theory of neural networks, optimization problems, error correcting codes, CDMA, nonlinear dynamics and synchronization of chaotic lasers. Ido Kanter is the recipient of the Wolf Foundation Prize for students (1982), Landau Prize for Ph.D students (1986), Weizmann Postdoctoral Fellow (1988-1989) and the Humbouldt prize for senior scientist (2001). He was acting as an Associate Editor of the Physical Review E (1991-1997) and he is currently acting as an Associate Editor for the International Journal of Neural Systems and for Journal of Statistical Physics.

Anthony J. Weiss Anthony J. Weiss (S’84-M’85-SM’86-F’97) received the B.Sc. degree from the Technion-Israel Institute of Technology, in 1973, and the M.Sc. and Ph.D. degrees from Tel Aviv University, Tel Aviv, Israel, in 1982 and 1985, all in Electrical Engineering. From 1973 to 1983, he was involved in research and development of numerous projects in the fields of communications, command and control, and emitter localization. In 1985, he joined the Department of Electrical Engineering-Systems, Tel Aviv University. From 1996 to 1999 he served as the department chairman and as the chairman of IEEE Israel Section. He is currently the head of the EE School in Tel Aviv University. His research interests include detection and estimation theory, signal processing, sensor array processing, and wireless networks. He also held leading scientific positions with Signal Processing Technology Ltd., Wireless on Line Ltd. and SigmaOne Ltd. Professor Weiss published over 100 papers in professional magazines and conferences and holds 9 US patents. He was a recipient of the IEEE 1983 Acoustics, Speech, and Signal Processing Society’s Senior Award and the IEEE third millennium medal. He is an IEEE Fellow since 1997 and an IET Fellow since 1999.


35

Yair Weiss Yair Weiss is an associate professor of Computer Science and Engineering at the Hebrew University of Jerusalem, joining the faculty in 2001. His research interests include human and machine vision, machine learning and information theory. He received his Ph.D. from the Massachussetts Institute of Technology in 1998, working with Ted Adelson on Bayesian models of motion analysis. He was the program chair for the Neural Information Processing Systems (NIPS) 2004 conference and general chair of NIPS 2005. In 2006 he was appointed a fellow of the Canadian Institute of Advanced Research.

Discrete-Input Two-Dimensional Gaussian Channels ...

Discrete-Input Two-Dimensional Gaussian Channels ...

Suggest Documents

Multi-mode bosonic Gaussian channels

Multi-mode bosonic Gaussian channels

Optimal nonuniform signaling for Gaussian channels - CiteSeerX

MIMO Gaussian Bidirectional Broadcast Channels ... - Semantic Scholar

Gaussian Intersymbol Interference Channels With Mismatch

Gaussian Channels - You should not be here.

MIMO Gaussian Broadcast Channels with Confidential ... - CiteSeerX

Majorization preservation of Gaussian bosonic channels

Amendable Gaussian channels: Restoring ... - Google Sites

Evaluating capacities of Bosonic Gaussian channels

Black holes as bosonic Gaussian channels

MIMO Gaussian Broadcast Channels with Confidential ... - CiteSeerX

Positivity of TwoDimensional Elliptic Differential ...

Multiple-Input Multiple-Output Gaussian Broadcast Channels ... - arXiv

Multiplicativity of maximal output purities of Gaussian channels under ...

Improper Gaussian Signaling in Full-Duplex Relay Channels ... - arXiv

On Gaussian Interference Channels with mixed ... - ITA @ UCSD

On the Capacity of Quantized Gaussian MAC Channels ... - ECE@IISc

MIMO Gaussian Broadcast Channels with Common, Private ... - arXiv

Amendable Gaussian channels: restoring entanglement via a unitary ...

Transmit Power Optimization for Gaussian Vector Broadcast Channels

On the capacity of multi-antenna gaussian channels - Infoscience

Secure Degrees of Freedom for Gaussian Channels ... - Google Sites

Outage analysis of Block-Fading Gaussian Interference Channels - arXiv