Near-Ideal $M$-ary LDGM Quantization with Recovery

Qingchuan Wang, Student Member, IEEE, Chen He, Member, IEEE, and Lingge Jiang, Member, IEEE
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 59, NO. 7, JULY 2011
Abstract—For iterative mean-square error (MSE) quantizers with alphabet size $M = 2^K$ using low-density generator-matrix (LDGM) code constructions, an efficient recovery algorithm is proposed, which adjusts the priors used in belief propagation (BP) to limit the impact of previous non-ideal decimation steps. Based on an analysis of the BP process under ideal or non-ideal decimation, the algorithm first estimates the conditional probability distributions describing the effect of non-ideal decimation, then adjusts the priors to make the distributions match the ideal situation. As shown in simulation results, the recovery algorithm can improve quantization performance greatly, reducing the shaping loss to as low as 0.012 dB, while the increase in computational complexity is modest thanks to the use of FFT techniques.

Index Terms—Low-density generator-matrix, quantization, decimation, recovery.
I. INTRODUCTION
SPARSE-GRAPH codes have recently found some use in long-block lossy source coding problems due to their potential to achieve near-ideal rate-distortion performance at a lower computational complexity than traditional methods like trellis-coded quantization (TCQ) [1]. Although structured constructions such as polar codes [2] have been shown to be fast and effective in many channel and source coding problems [3], including those involving side information and binning [4], more randomized constructions based on low-density generator-matrix (LDGM) codes remain attractive due to their more moderate block-length requirements, their efficient integration with e.g. superposition coding schemes, as well as the availability of well-established optimization methods from the low-density parity-check (LDPC) literature. Indeed, being duals to the LDPC codes used in channel coding [5], LDGM codes are known to be able to approach the Shannon limit under optimal encoding for the binary symmetric case [6], and, with appropriate modulation mappings, for more general sources possibly requiring non-uniform reconstruction alphabets as well [7]; variants with additional parity [8] or Hamming-weight [9] constraints have also been proposed to improve finite-degree performance or to allow channel coding and binning. Naturally, most practical encoding algorithms (or quantizers) for such sparse-graph codes employ some form of message passing. Although more elaborate algorithms like survey propagation exist [6], [7], ordinary belief propagation

Paper approved by Z. Xiong, the Editor for Distributed Coding and Processing of the IEEE Communications Society. Manuscript received August 5, 2010; revised December 19, 2010. The authors are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China (e-mail: {r6144, chenhe, lgjiang}@sjtu.edu.cn). This paper was supported by the National Natural Science Foundation of China Grants No. 60772100, 60832009, and 60872017, as well as National 863 Program Grant No. 2009AA011505. Digital Object Identifier 10.1109/TCOMM.2011.061511.100462
(BP) appears to be sufficient for quantization when a good degree distribution is used [10]. In any case, decimation steps (i.e. hard decisions) usually need to be carried out according to e.g. the BP marginals (extrinsic probabilities) in order to make BP converge to one codeword among many similarly good ones; other positive-feedback mechanisms used in [11] and [12] serve essentially the same purpose. For randomized LDGM constructions, loops in the factor graph and the limited number of iterations cause the extrinsic probabilities from BP to be approximate; this leads to non-ideal decimation choices, which adversely affect future iterations and make the performance of such BP algorithms difficult to predict theoretically. Our work mainly focuses on a specific lossy source coding problem, mean-square error (MSE) quantization of Euclidean space [13, Sec. II-C], which plays an important role in high-rate source coding, as well as in various channel coding schemes on Gaussian channels, such as the shaping component of dirty-paper coding [14]. As shown in previous works [15] and [16], an LDGM-based construction using BP and decimation for encoding can approach the Shannon limit of this problem quite well, and methods for optimizing the degree distribution, pace of decimation, etc. have been proposed. The non-ideal decimation problem above also exists here; in [16, Sec. VI-C], a recovery step run before each BP iteration to adjust the priors has been found to reduce its impact and improve quantization performance significantly. However, the recovery algorithm in [16] is somewhat ad hoc and poorly understood, and it is only applicable to binary constructions, whose alphabet limitation leaves a significant gap to the Shannon limit.
In this paper, after giving an overview of the quantization algorithm in Section II, in Section III we present some theoretical arguments in an attempt to gain a better understanding of non-ideal decimation and of how recovery might be performed. Based on this analysis, a recovery algorithm is designed in Section IV for LDGM-based MSE quantization with an $M$-ary alphabet, whose computational complexity turns out to be quite modest. The simulation results in Section V demonstrate that recovery can improve quantization performance greatly and approach the Shannon limit to within 0.012 dB. Finally, Section VI concludes the paper.

A. Notations and Conventions

Notations are similar to those in [16]. Bold letters denote sequences, vectors or matrices whose elements are indicated by subscripts, e.g. $\boldsymbol y = (y_1, \dots, y_n)$; conversely, a vector or matrix can also be defined element-wise, e.g. $(e^{-\mathrm j 2\pi uv/M})_{uv}$ is the DFT (discrete Fourier transform) matrix. $(\cdot)^{\mathrm T}$ and $(\cdot)^{\mathrm H}$ denote matrix transposition and Hermitian transposition, $\|\cdot\|$ is the Euclidean norm, $1[\cdot]$ is 1 if the condition is true and 0 otherwise, $\oplus$ and $\ominus$ denote addition and subtraction modulo-2 or modulo-$M$ (which one should be clear from context), $(\cdot)_\circledast$ is the modulo-$M$ operation mapping a real number into $[-M/2, M/2)$, and $(\cdot)_{\circledast M}$
is the element-wise modulo operation into $[-M/2, M/2)^n$. In the BP algorithm, symbols like $p^{\mathrm b}_i$, $\mu^{\mathrm{bc}}_{ia}$, $q^{\mathrm b}_i$ are binary or $M$-ary probability tuples representing the priors, messages and extrinsic probabilities in BP; each $M$-ary probability tuple $p$ is a tuple of $M$ real numbers representing a probability distribution over $\{0, \dots, M-1\}$, with each component denoted by $p(u)$, $u = 0, \dots, M-1$. For conciseness, all probability tuples are implicitly normalized; that is, when we define an $M$-ary probability tuple $p$ by writing $p(u) = a_u$, $u = 0, \dots, M-1$, it actually means that $p(u) = a_u / (a_0 + \cdots + a_{M-1})$, and later mentions of $p(u)$ refer to these normalized components. For $u = 0, \dots, M-1$, $\boldsymbol u$ denotes the "sure-$u$" probability tuple with $u(u) = 1$ and all other components being zero, while $* = (\frac{1}{M}, \dots, \frac{1}{M})$ denotes the "unknown" probability tuple. For $M$-ary probability tuples $p'$ and $p''$, $p' \odot p''$ and $p' \circledast p''$ are also $M$-ary probability tuples, with $(p' \odot p'')(u) \propto p'(u)\,p''(u)$ and $(p' \circledast p'')(u) \propto \sum_{u'=0}^{M-1} p'(u')\,p''(u \ominus u')$, $u = 0, \dots, M-1$, similar to the variable-node and check-node operations in the LDPC literature. The informativeness of an $M$-ary probability tuple $p$ is measured in bits according to

    $I(p) \triangleq \log M + \sum_{u=0}^{M-1} p(u) \log p(u)$;    (1)

for example, among binary probability tuples, $\boldsymbol 0$ and $\boldsymbol 1$ are the most informative and $*$ is the least. Often we will have a random $M$-ary probability tuple $p$ and a random variable $u^* \in \{0, \dots, M-1\}$, where $P(u^*)$ is uniform and $P(p \mid u^*)$ satisfies, for any deterministic $p'$,

    $P(p = p' \mid u^* = 0) = P(p = p' \oplus u \mid u^* = u)$, $u = 0, \dots, M-1$,    (2)

due to the dithering performed below. $P(p \mid u^*)$ can then be fully characterized by $P(p \mid u^* = 0)$; extending similar notions in LDPC analysis [17], we call the latter the density of $p$ with respect to $u^*$, and say $p$ has a symmetric density (or simply is symmetric) w.r.t. $u^*$ if, for any deterministic $p'$,

    $P(u^* = u \mid p = p') = p'(u)$, $u = 0, \dots, M-1$.    (3)

Eq. (3) implies that the mutual information

    $I(u^*; p) = \log M + \mathrm E_p \left[ \sum_{u^*=0}^{M-1} P(u^* \mid p) \log P(u^* \mid p) \right]$    (4)
is equal to $\mathrm E[I(p)]$; if $\mathrm E[I(p)] > I(u^*; p)$, $p$ is then said to be over-confident w.r.t. $u^*$.

II. OVERVIEW OF THE QUANTIZATION PROBLEM AND ALGORITHM

We consider MSE quantization with reconstruction alphabet size $M \triangleq 2^K$, which is equivalent to the following [16]: given $M$, rate $R$ and block length $n$, design a codebook $\mathcal U \subset \{0, \dots, M-1\}^n$ with $|\mathcal U| = 2^{n_{\mathrm b}}$ and $n_{\mathrm b} \triangleq nR$, as well as a quantization algorithm that quantizes a source sequence $\tilde{\boldsymbol y}$, uniformly distributed in $[0, M)^n$, into a $\boldsymbol u \in \mathcal U$, such that the modulo-$M$ MSE $\sigma^2 \triangleq \mathrm E[d(\tilde{\boldsymbol y}, \boldsymbol u)]$ with
[Fig. 1: the factor graph, with information-bit nodes $b_1, \dots, b_{n_{\mathrm b}}$, check nodes $c_1, \dots, c_{2n}$ and symbol nodes $u_1, \dots, u_n$, connected by the messages $\mu^{\mathrm{bc}}$, $\mu^{\mathrm{cb}}$, $\mu^{\mathrm{cu}}$ and $\mu^{\mathrm{uc}}$; the scrambling bits $s_a$ enter the check nodes.]

Fig. 1. The factor graph of the LDGM quantizer when $M = 4$. Since the scrambling sequence $\boldsymbol s$ has already been fixed when the quantization algorithm is run, the corresponding variable nodes can be omitted.
$d(\tilde{\boldsymbol y}, \boldsymbol u) \triangleq \frac{1}{n} \|(\tilde{\boldsymbol y} - \boldsymbol u)_{\circledast M}\|^2$ is minimized. As $n \to \infty$, $\sigma^{*2}(R) \triangleq (2\pi e (2^R/M)^2)^{-1}$ is a lower bound of $\sigma^2$ that becomes tight as $M$ increases (see [13] and [16, Sec. II]), so we define $10 \log_{10}(\sigma^2 / \sigma^{*2}(R))$ as the shaping loss in decibels.

The $M$-ary LDGM codebook is constructed in a way similar to [16, Sec. VII]. We let $\boldsymbol G$ be an $n_{\mathrm b} \times n_{\mathrm c}$ ($n_{\mathrm c} \triangleq Kn$) binary low-density generator matrix from a suitably optimized LDGM code ensemble, and let $\boldsymbol\delta \in \{0, \dots, M-1\}^n$ and $\boldsymbol s \in \{0, 1\}^{Kn}$ be predetermined i.i.d. uniform dithering and scrambling sequences known at both the encoder and the decoder. For each $\boldsymbol b \in \{0, 1\}^{n_{\mathrm b}}$, we divide $\boldsymbol c = \boldsymbol c(\boldsymbol b, \boldsymbol s) \triangleq \boldsymbol b \boldsymbol G \oplus \boldsymbol s$ into $n$ sub-sequences $\tilde{\boldsymbol c}_j \triangleq (c_{a_1}, \dots, c_{a_K})$ with $a_k \triangleq K(j-1) + k$, $k = 1, \dots, K$, and map each resulting $\tilde{\boldsymbol c}_j$ into one $M$-ary symbol $u_j = G_j(\tilde{\boldsymbol c}_j)$, with $G_j(\cdot) \triangleq G(\cdot) \oplus \delta_j$ being the dithered version of the Gray mapping $G(\cdot)$; this yields a codeword $\boldsymbol u = \boldsymbol u(\boldsymbol b, \boldsymbol s)$, and all $2^{n_{\mathrm b}}$ codewords from the given $\boldsymbol G$, $\boldsymbol\delta$ and $\boldsymbol s$ form the codebook $\mathcal U$. Fig. 1 shows the factor graph [18] describing the code; as in [16], the variable nodes for all the $b_i$'s have the same right-degree, denoted by $d_{\mathrm b}$. The BP messages used in the algorithm, denoted by e.g. $\mu^{\mathrm{bc}}_{ia}$, are also depicted in the figure; the two subscripts are the indices of the message's source and destination nodes. Given parameter $t > 0$, the quantization algorithm carries out BP with a priori probabilities

    $\hat p^{\mathrm u}_j(u) = e^{-t(\hat y_j - u)^2_\circledast} / Z(\hat y_j)$, $u = 0, \dots, M-1$    (5)

for each $u_j$, where $Z(\hat y_j) \triangleq \sum_{u=0}^{M-1} e^{-t(\hat y_j - u)^2_\circledast}$ is the normalization factor that will henceforth be omitted as noted in the conventions, and $\hat{\boldsymbol y}$ is an adjusted version of the source sequence $\tilde{\boldsymbol y}$ produced by the proposed recovery algorithm. To make BP converge, a number of decimation steps are performed in each iteration, each of which fixes a certain $b_{i^*}$ to a hard decision $b^* \in \{0, 1\}$ by setting its prior $p^{\mathrm b}_{i^*}$ to the sure message $\boldsymbol b^*$. The choice of $i^*$ and $b^*$ in each decimation step is made according to the extrinsic probabilities $q^{\mathrm b}_i$ of the $b_i$'s from BP. Specifically, we can use either the greedy decimator (GD), which chooses $i^*$ and $b^*$ among the undecimated positions such that $q^{\mathrm b}_{i^*}(b^*)$ is maximized, or the typical (probabilistic) decimator (TD), which chooses an undecimated bit index $i^*$
randomly and then $b^* \in \{0, 1\}$ with probability proportional to $q^{\mathrm b}_{i^*}(b^*)$; while GD performs better in practice, TD is more amenable to analysis and is thus used there. For conciseness, the quantization algorithms using TD/GD with recovery will henceforth be called TD-R and GD-R, respectively, while if recovery is not done (i.e. with $\hat{\boldsymbol y}$ made the same as $\tilde{\boldsymbol y}$) they are called plain TD or GD. The number of decimation steps performed in each iteration, known as the pace of decimation, is controlled via $I_{\mathrm{bc}} \triangleq \langle I(\mu^{\mathrm{bc}}) \rangle$, where $\langle \cdot \rangle$ means averaging $\mu^{\mathrm{bc}}_{ia}$ over the subscripts $i$ and $a$. Based on the analysis in [16, Sec. VI-D], in each iteration we simply carry out decimation until $I_{\mathrm{bc}}$ has increased by at least

    $dI_{\mathrm{bc}} = \frac{2(d_{\mathrm b} - 1)}{d_{\mathrm b} L_0} \, (1 - I_{\mathrm{bc}})^{\frac{d_{\mathrm b} - 2}{2(d_{\mathrm b} - 1)}}$    (6)

since the last iteration, which ensures that the algorithm finishes within $L_0$ iterations. When the block length $n$ is small, the actual iteration count $L$ may be significantly smaller than $L_0$; a "throttling" mechanism has been used in [16] to reduce the shaping loss in this case at the cost of a higher $L$, but since it does not improve the performance-speed tradeoff significantly, it is not adopted here for simplicity.

The entire quantization algorithm used in this paper is outlined in Fig. 2 and is largely the same as that in [16, Sec. VII]. The main difference is the introduction of the recovery algorithm at the beginning of each iteration, which is the focus of this paper. As the recovery algorithm recomputes $\hat{\boldsymbol y}$ in every iteration, the $\hat p^{\mathrm u}_j$'s must be updated accordingly, rather than being computed once before BP; to reduce the computational cost, (5) is pre-evaluated for different $\hat y_j$ at step size 1/32 and then approximated via linear interpolation, which turns out to have negligible performance impact. Another difference is that the $\mu^{\mathrm{cu}}_{aj}$'s are now updated after the decimation steps, unlike in [16] where they were updated at the same time as the $\mu^{\mathrm{cb}}_{ai}$'s. This allows the $\mu^{\mathrm{cu}}_{aj}$'s to include the information in the $\mu^{\mathrm{bc}}_{ia}$'s from the current iteration, thus making the $q^{\mathrm u}_j$'s used in recovery in the next iteration more up-to-date; the importance of this will be explained in Section III-D. Finally, the introduction of $\boldsymbol\delta$ and $\boldsymbol s$, which is necessary in the analysis, also leads to some minor changes.

III. THEORETICAL ARGUMENTS FOR RECOVERY

A. Analysis Framework for the Decimation Process

Decimation in the LDGM quantization algorithm is both a significant obstacle in the algorithm's analysis and the main reason for the use of recovery. The analysis in [16] was for TD and under ideal decimation in the sense defined below, which was sufficient for the purpose of degree distribution optimization. On the other hand, when designing a recovery algorithm, a good understanding of the behavior under non-ideal decimation is necessary, so we will attempt below to extend the ideas in [16] to model this. To avoid complications, we assume without loss of generality that the bits in $\boldsymbol b$ are decimated by TD sequentially rather than in a random order, and the number of decimation steps carried out in each iteration is likewise assumed to be deterministic over the ensemble defined below, unaffected by statistical fluctuations in $I_{\mathrm{bc}}$.
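As a quick sanity check of the pace rule (6), the following minimal Python sketch (function names are ours, not the paper's) steps $I_{\mathrm{bc}}$ by exactly $dI_{\mathrm{bc}}$ per iteration and confirms numerically that $I_{\mathrm{bc}}$ reaches 1 within $L_0$ iterations; the quantizer itself is not simulated.

```python
def decimation_pace(I_bc, d_b, L0):
    """dI_bc of eq. (6): the required increase of the average message
    informativeness I_bc before further decimation is allowed."""
    return (2.0 * (d_b - 1) / (d_b * L0)
            * (1.0 - I_bc) ** ((d_b - 2) / (2.0 * (d_b - 1))))

def iterations_to_finish(d_b, L0):
    """Step I_bc from 0 by exactly dI_bc per iteration and count the steps
    needed to reach 1; by the design of (6) this should not exceed L0."""
    I_bc, n_iter = 0.0, 0
    while I_bc < 1.0 - 1e-12 and n_iter < 10 * L0:
        I_bc = min(1.0, I_bc + decimation_pace(I_bc, d_b, L0))
        n_iter += 1
    return n_iter
```

In the continuous limit, $dI_{\mathrm{bc}}/dn = f(I_{\mathrm{bc}})$ with $f$ as in (6) reaches $I_{\mathrm{bc}} = 1$ at exactly $n = L_0$, which is consistent with the claim that the algorithm finishes within $L_0$ iterations.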
Input: quantizer parameters $\boldsymbol G$, $\boldsymbol\delta$, $\boldsymbol s$, $t$; source sequence $\tilde{\boldsymbol y}$
Output: quantized codeword $\boldsymbol u = \boldsymbol u(\boldsymbol b, \boldsymbol s)$ labeled by $\boldsymbol b$
$\mu^{\mathrm{bc}}_{ia} \leftarrow *$, $\mu^{\mathrm{cu}}_{a_k j} \leftarrow *$, $i = 1, \dots, n_{\mathrm b}$, $j = 1, \dots, n$, $k = 1, \dots, K$
$p^{\mathrm b}_i \leftarrow *$, $i = 1, \dots, n_{\mathrm b}$
$\mathcal E \leftarrow \{1, 2, \dots, n_{\mathrm b}\}$  {the set of bits not yet decimated}
repeat  {belief-propagation iteration}
    $q^{\mathrm u}_j(u) \leftarrow \prod_{k=1}^{K} \mu^{\mathrm{cu}}_{a_k j}(\tilde c_k)$, $j = 1, \dots, n$, $u = G_j(\tilde{\boldsymbol c}) = 0, \dots, M-1$
    Compute $\hat{\boldsymbol y}$ from $\tilde{\boldsymbol y}$ and the $q^{\mathrm u}_j$'s using the recovery algorithm
    $\hat p^{\mathrm u}_j(u) \leftarrow e^{-t(\hat y_j - u)^2_\circledast}$, $j = 1, \dots, n$, $u = 0, \dots, M-1$
    for $j = 1$ to $n$ and $k = 1$ to $K$ do
        $\mu^{\mathrm{uc}}_{j a_k}(c) \leftarrow \sum_{\tilde{\boldsymbol c}: \tilde c_k = c} \hat p^{\mathrm u}_j(G_j(\tilde{\boldsymbol c})) \prod_{k' \neq k} \mu^{\mathrm{cu}}_{a_{k'} j}(\tilde c_{k'})$, $c = 0, 1$
    end for
    for $a = a_k = 1$ to $n_{\mathrm c}$ do
        $\mu^{\mathrm{cb}}_{ai} \leftarrow (\mu^{\mathrm{uc}}_{ja} \oplus s_a) \circledast \left( \circledast_{i' \in \mathcal N^{\mathrm{cb}}_{a\cdot} \setminus \{i\}} \, \mu^{\mathrm{bc}}_{i'a} \right)$, $i \in \mathcal N^{\mathrm{cb}}_{a\cdot}$
    end for
    for $i = 1$ to $n_{\mathrm b}$ do
        $\mu^{\mathrm{bc}}_{ia} \leftarrow p^{\mathrm b}_i \odot \left( \odot_{a' \in \mathcal N^{\mathrm{bc}}_{i\cdot} \setminus \{a\}} \, \mu^{\mathrm{cb}}_{a'i} \right)$, $a \in \mathcal N^{\mathrm{bc}}_{i\cdot}$
        $q^{\mathrm b}_i \leftarrow \odot_{a' \in \mathcal N^{\mathrm{bc}}_{i\cdot}} \, \mu^{\mathrm{cb}}_{a'i}$
    end for
    while $\mathcal E \neq \emptyset$ and more decimation is necessary in this iteration do
        Choose the bit index $i^*$ to decimate and its value $b^*$
        $p^{\mathrm b}_{i^*} \leftarrow \boldsymbol b^*$, $\mu^{\mathrm{bc}}_{i^* a} \leftarrow \boldsymbol b^*$, $a \in \mathcal N^{\mathrm{bc}}_{i^*\cdot}$  {decimate $b_{i^*}$ to $b^*$}
        $\mathcal E \leftarrow \mathcal E \setminus \{i^*\}$
    end while
    $\mu^{\mathrm{cu}}_{aj} \leftarrow s_a \oplus \left( \circledast_{i' \in \mathcal N^{\mathrm{cb}}_{a\cdot}} \, \mu^{\mathrm{bc}}_{i'a} \right)$, $a = a_k = 1, \dots, n_{\mathrm c}$
until $\mathcal E = \emptyset$
$b_i \leftarrow 0$ (resp. 1) if $p^{\mathrm b}_i = \boldsymbol 0$ (resp. $\boldsymbol 1$), $i = 1, \dots, n_{\mathrm b}$

Fig. 2. A brief outline of the quantization algorithm. $\mathcal N^{\mathrm{bc}}_{i\cdot} = \mathcal N^{\mathrm{cb}}_{\cdot i}$ is the set of positions of 1's in the $i$-th row of $\boldsymbol G$, and similarly $\mathcal N^{\mathrm{cb}}_{a\cdot} = \mathcal N^{\mathrm{bc}}_{\cdot a}$ is the set of positions of 1's in the $a$-th column of $\boldsymbol G$.
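The message updates in Fig. 2 are built from the probability-tuple operations of Section I-A. The $\circledast$ (check-node) operation is a cyclic convolution over $\mathbb Z_M$, which is where the FFT techniques mentioned in the abstract apply. A minimal numpy sketch, with our own function names rather than anything from the paper's implementation:

```python
import numpy as np

def normalize(p):
    """Normalize a nonnegative vector into an M-ary probability tuple."""
    p = np.asarray(p, dtype=float)
    return p / p.sum()

def vnode(p1, p2):
    """Variable-node product: (p' ⊙ p'')(u) ∝ p'(u) p''(u)."""
    return normalize(np.asarray(p1, float) * np.asarray(p2, float))

def cnode(p1, p2):
    """Check-node operation: (p' ⊛ p'')(u) ∝ Σ_{u'} p'(u') p''(u ⊖ u'),
    a cyclic convolution, computed via FFT (convolution theorem)."""
    conv = np.fft.ifft(np.fft.fft(p1) * np.fft.fft(p2)).real
    return normalize(np.maximum(conv, 0.0))   # clip tiny negative round-off

def informativeness(p):
    """I(p) = log M + Σ_u p(u) log p(u), in bits, as in eq. (1)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return np.log2(len(p)) + float(np.sum(nz * np.log2(nz)))
```

For $M = 2$ the $\circledast$ operation reduces to the usual LDPC check-node rule; for a sure tuple $I(p) = \log_2 M$ bits and for the unknown tuple $*$ it is 0, matching Section I-A.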
The analysis uses a fully-adjusted sequence $\boldsymbol y$, obtained from an idealized recovery algorithm described in Section III-C, in place of $\hat{\boldsymbol y}$ in (5) to define the priors $p^{\mathrm u}_j$, while $\hat{\boldsymbol y}$ and the $\hat p^{\mathrm u}_j$'s used in practice are regarded as approximations. The $q^{\mathrm b}_{i^*}$ used in each TD decimation step then approximates the corresponding "true" extrinsic probabilities

    $q^{\mathrm b*}_{i^*}(b) \triangleq \sum_{(\boldsymbol b', \boldsymbol s') \in \mathcal S,\, b'_{i^*} = b} \; \prod_j p^{\mathrm u}_j(u_j(\boldsymbol b', \boldsymbol s')) = \sum_{(\boldsymbol b', \boldsymbol s') \in \mathcal S,\, b'_{i^*} = b} e^{-ntd(\boldsymbol y, \boldsymbol u(\boldsymbol b', \boldsymbol s'))}$,    (7)

where $\mathcal S$ is defined as the set of $(\boldsymbol b', \boldsymbol s')$ consistent with previous decimation choices, meaning here that $\boldsymbol s'$ is equal to the scrambling sequence $\boldsymbol s$, while each $b'_{i'}$ is equal to $b^* \in \{0, 1\}$ if $b_{i'}$ has previously been decimated to $b^*$ (i.e. $p^{\mathrm b}_{i'} = \boldsymbol b^*$), but can be either 0 or 1 if $b_{i'}$ has not been decimated yet (i.e. $p^{\mathrm b}_{i'} = *$). The i.i.d. choice of each $s_a$ in the scrambling sequence $\boldsymbol s$, which determines the codebook $\mathcal U$, can likewise be viewed as a TD decimation step using $*$ as the extrinsic probabilities. Assuming for convenience that $s_1, s_2, \dots, s_{n_{\mathrm c}}$ are chosen sequentially, the corresponding true extrinsic probabilities for the choice of $s_a$ can similarly be given by

    $q^{\mathrm a*}_a(s) \triangleq \sum_{(\boldsymbol b', \boldsymbol s') \in \mathcal S,\, s'_a = s} \; \prod_j p^{\mathrm u}_j(u_j(\boldsymbol b', \boldsymbol s'))$,    (8)

where $\mathcal S$ is still the set of $(\boldsymbol b', \boldsymbol s')$ consistent with previous decimation choices, i.e. $s'_{a'} = s_{a'}$ for $a' < a$, while the other bits in $\boldsymbol s'$ as well as the entire $\boldsymbol b'$ are arbitrary. We can thus define the true typical decimator (TTD) as an idealized version
of TD that uses respectively $q^{\mathrm a*}_a$ and $q^{\mathrm b*}_{i^*}$ in the choice of each $s_a$ and $b_{i^*}$, and say ideal decimation has been performed if all the bits in $\boldsymbol s$ and $\boldsymbol b$ decimated so far have used the TTD.¹ Using TTD in all decimation steps yields each $\boldsymbol u \in \{0, \dots, M-1\}^n$ with probability proportional to $e^{-ntd(\boldsymbol y, \boldsymbol u)}$, giving

    $\mathrm E[d(\boldsymbol y, \boldsymbol u)] = \sigma_t^2 \triangleq \frac{1}{M} \int_0^M \sum_{u=0}^{M-1} (y - u)^2_\circledast \, \frac{e^{-t(y-u)^2_\circledast}}{Z(y)} \, dy = \int_{-M/2}^{M/2} z^2 \, \frac{e^{-tz^2}}{Z(z)} \, dz$    (9)

(denoted by $\sigma_t^2$ in [16]); for example, if $\boldsymbol y = \tilde{\boldsymbol y}$, then the achieved MSE is simply $\sigma_t^2$. TD can yield the same result if each $q^{\mathrm a*}_a$ can be made equal to $*$ and each $q^{\mathrm b*}_{i^*}$ equal to the $q^{\mathrm b}_{i^*}$ from BP. Similar approaches have been used in [19] and [3] to analyze the decimation process in satisfiability problems and in quantization with polar codes.

We now consider an ensemble of quantizer instances with a probability measure over it, so that probabilistic analytical approaches including density evolution (DE) can be carried out. Recall that in the DE analysis of the LDPC decoder [17], the probabilities are defined over the possible channel realizations and the LDPC code ensemble with a specific degree distribution, and DE is performed in reference to the transmitted codeword (usually assumed to be all-zero). In contrast, when analyzing the LDGM quantizer with TD or TTD, each quantizer instance in the ensemble has not only a specific $\tilde{\boldsymbol y}$, a $\boldsymbol G$ in the LDGM code ensemble and a dither $\boldsymbol\delta$, but also a specific sequence of values used as the random source for the decimation of both $\boldsymbol s$ and $\boldsymbol b$; for example, for each $b_i$ the random source can be denoted by an i.i.d. uniform variable $\theta^{\mathrm b}_i$ over $[0, 1)$, such that $b_i$ is decimated to $1[\theta^{\mathrm b}_i \geq \tilde q^{\mathrm b}_i(0)]$, with $\tilde q^{\mathrm b}_i$ being $q^{\mathrm b}_i$ for TD and $q^{\mathrm b*}_i$ for TTD. Over this ensemble, $\boldsymbol y$ as well as all the priors, BP messages, etc. are now random variables whose distributions depend on whether TD or TTD (ideal decimation) is used, and on whether recovery is performed. The reference codeword used by DE, denoted by $(\boldsymbol b^*, \boldsymbol s^*)$ or the corresponding $\boldsymbol u^* \triangleq \boldsymbol u(\boldsymbol b^*, \boldsymbol s^*)$, is defined as the quantization result that would be obtained if previous decimation choices were kept and the remaining decimation steps used TTD. Therefore, $(\boldsymbol b^*, \boldsymbol s^*)$ is random (specific to each quantizer instance); given $\boldsymbol y$ and $\mathcal S$, it is any $(\boldsymbol b', \boldsymbol s') \in \mathcal S$ with probability proportional to $e^{-ntd(\boldsymbol y, \boldsymbol u(\boldsymbol b', \boldsymbol s'))}$. Over the entire ensemble, by definition it remains unchanged after a decimation step using TTD, but it is otherwise also specific to each decimation step.

¹ Note that the definition of TTD here also encompasses the concept of the true typical source generator (TTSG) introduced in [16, Sec. V-A], by viewing the choice of the bits in $\boldsymbol s$ as decimation steps as well.

B. Analysis of Ideal Decimation

The analysis of a decimation step under ideal decimation (in previous steps), as was done in the DE analysis for degree distribution optimization [16, Sec. V], can be expressed in the above framework as follows. As recovery is not necessary in this case, $\boldsymbol y$ is identical to $\tilde{\boldsymbol y}$. For clarity, we use $\bar p(\cdot)$ to denote a pdf under ideal decimation and $p(\cdot)$ one when TD is used with recovery, and say a certain $p(\cdot)$ is ideal if it matches the corresponding $\bar p(\cdot)$. Now the reference codeword is simply the result of TTD after all decimation steps, so

    $\bar p(u^*_j \mid y_j) = \bar p(u^*_j \mid \boldsymbol y) = e^{-t(y_j - u^*_j)^2_\circledast} / Z(y_j)$,    (10)

which implies $\bar p(u^*_j \mid p^{\mathrm u}_j) = p^{\mathrm u}_j(u^*_j)$, i.e. $p^{\mathrm u}_j$ has a symmetric density w.r.t. $u^*_j$. As $\bar p(y_j)$ is uniform, this density can be computed from

    $\bar p(y_j \mid u^*_j) = e^{-t(y_j - u^*_j)^2_\circledast} / Z(y_j)$.    (11)

For asymptotically large $n$, DE can then be carried out to obtain the densities of the $q^{\mathrm a}$'s and $q^{\mathrm b}$'s after any given number of BP iterations with different initial symmetric densities of BP messages. The corresponding $q^{\mathrm a*}$'s and $q^{\mathrm b*}$'s are bounded by these densities in terms of physical degradation, making it possible to evaluate how closely each $q^{\mathrm a*}_a$ matches $*$ and each $q^{\mathrm b*}_{i^*}$ matches $q^{\mathrm b}_{i^*}$. For given degree distributions, this DE analysis yields a monotonicity threshold $t_{\mathrm{thr}}$ such that, as long as $t \leq t_{\mathrm{thr}}$, when the iteration count $L$ goes to infinity along with $n$ (i.e. decimation is performed slowly), the mean-square difference between each $q^{\mathrm a*}_a$, $q^{\mathrm b*}_{i^*}$ and respectively $*$, $q^{\mathrm b}_{i^*}$ goes to zero. In other words, assuming ideal decimation in previous steps, when the current decimation step is carried out using TD after a sufficiently large number of BP iterations, it will yield the same result as the TTD with high probability and thus be ideal as well. The MSE $\sigma^2_{t_{\mathrm{thr}}}$ potentially achievable by TD, as defined in (9), can be very close to the theoretical limit $\sigma^{*2}(R)$ when using degree distributions optimized for a high $t_{\mathrm{thr}}$.

The extrinsic probabilities $q^{\mathrm u}_j$ of each $u_j$, given by

    $q^{\mathrm u}_j(u) \triangleq \prod_{k=1}^{K} \mu^{\mathrm{cu}}_{a_k j}(\tilde c_k)$, $u = G_j(\tilde{\boldsymbol c}) = 0, \dots, M-1$,    (12)

are used by the proposed recovery algorithm as a summarization of the BP messages. The corresponding true extrinsic probabilities can be defined as

    $q^{\mathrm u*}_j(u) \triangleq \sum_{(\boldsymbol b', \boldsymbol s') \in \mathcal S,\, u_j(\boldsymbol b', \boldsymbol s') = u} \; \prod_{j' \neq j} p^{\mathrm u}_{j'}(u_{j'}(\boldsymbol b', \boldsymbol s'))$.    (13)

As the $p^{\mathrm u}_{j'}$'s have symmetric densities, it is easy to prove that each $q^{\mathrm u*}_j$ is symmetric w.r.t. $u^*_j$ as well, i.e. $\bar p(u^*_j \mid q^{\mathrm u*}_j) = q^{\mathrm u*}_j(u^*_j)$; $y_j$ (or $p^{\mathrm u}_j$) $\to u^*_j \to q^{\mathrm u*}_j$ can also be shown to form a Markov chain, as $q^{\mathrm u*}_j$ only depends on the $y_{j'}$'s with $j' \neq j$. Using (11), we thus have

    $\bar p(y_j \mid q^{\mathrm u*}_j) = \sum_{u=0}^{M-1} \bar p(u^*_j = u \mid q^{\mathrm u*}_j) \, \bar p(y_j \mid u^*_j = u) = \sum_{u=0}^{M-1} q^{\mathrm u*}_j(u) \, \bar p(y_j \mid u^*_j = u)$.    (14)

Since the neighborhood in the factor graph involved in computing $q^{\mathrm u}_j$ is loop-free with high probability as $n \to \infty$, for asymptotically large $n$, $q^{\mathrm u}_j$ can be shown to be symmetric w.r.t. $u^*_j$, and $y_j \to u^*_j \to q^{\mathrm u}_j$ is a Markov chain as well, so (14) remains true with $q^{\mathrm u*}_j$ replaced by $q^{\mathrm u}_j$.
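The ideal conditional density (11) and the mixture (14) can be tabulated directly on a grid; the following small numpy sketch uses our own naming, with $M$ and $t$ as arbitrary illustration values.

```python
import numpy as np

M, t = 4, 2.0   # illustration values, not from the paper

def mod_star(z):
    """(z)_⊛: reduce into [-M/2, M/2)."""
    return (z + M / 2) % M - M / 2

def p_bar_y_given_u(y, u):
    """Ideal density p̄(y_j | u*_j = u) = e^{-t(y-u)²_⊛} / Z(y), eq. (11),
    evaluated on an array of y values."""
    Z = sum(np.exp(-t * mod_star(y - v) ** 2) for v in range(M))
    return np.exp(-t * mod_star(y - u) ** 2) / Z

def p_bar_y_given_q(y, q):
    """Mixture of eq. (14): p̄(y_j | q) = Σ_u q(u) p̄(y_j | u*_j = u)."""
    return sum(q[u] * p_bar_y_given_u(y, u) for u in range(M))
```

With $q = *$ the mixture collapses to the uniform density $1/M$, while an informative $q$ concentrates it around the symbols that $q$ favors.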
C. Decimation Errors and Idealized Recovery

Plain TD gives poor quantization performance in practice, and the reason can be understood as follows. The finite $n$ and $L$ used in practice leave a small but finite mean-square difference between $q^{\mathrm b*}_{i^*}$ and $q^{\mathrm b}_{i^*}$, and between $q^{\mathrm a*}_a$ and $*$, so each TD decimation step has a finite probability of giving a result different from that of the TTD; we call such events decimation errors for convenience. Without recovery ($\boldsymbol y = \tilde{\boldsymbol y}$), such erroneous decimations will usually not favor codewords close to $\tilde{\boldsymbol y}$ as much as the TTD would, and after $\mathcal S$ shrinks due to decimation, the subsequent reference codeword will, on average, have a larger distance to $\tilde{\boldsymbol y}$. The quantization error $(\tilde{\boldsymbol y} - \boldsymbol u^*)_{\circledast M}$ is analogous to the noise in the LDPC decoder; its increase in magnitude will change the density of the $p^{\mathrm u}_j$'s compared to ideal decimation, usually making them over-confident, which will slow down BP convergence and reduce the informativeness of the $q^{\mathrm b}_i$'s in future BP iterations, thus worsening the quality of future decimation choices.

The recovery algorithm is intended to allow BP to recover from decimation errors. It is inspired by the analysis of the similar issue in binary erasure quantization (BEQ) [16, Sec. VI-B], where erroneous decimations are evident as contradictions among the BP messages and the source sequence $\tilde{\boldsymbol y}$, and can similarly result in less informative $q^{\mathrm b}_i$'s in the future. In that case, a solution is to flip the contradictory bits in the source sequence $\tilde{\boldsymbol y}$ and use the resulting flipped sequence $\hat{\boldsymbol y}$ to generate the $p^{\mathrm u}_j$'s for subsequent iterations (also reminiscent of the approach in [12]). These flipped bits intuitively represent errors already made which we do not attempt to fix, and flipping them can be shown to make future $q^{\mathrm b}_i$'s as informative as if no decimation errors had occurred, so fewer decimation errors will be made in the future.
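Recovery aims to keep the quantization process statistically identical to ideal decimation, for which the achieved MSE is the $\sigma_t^2$ of (9). That quantity, and the shaping loss relative to $\sigma^{*2}(R)$, can be evaluated numerically; a small sketch with arbitrary parameter values and our own function names:

```python
import numpy as np

def mod_star(z, M):
    """(z)_⊛: reduce a real number into [-M/2, M/2)."""
    return (z + M / 2) % M - M / 2

def sigma_t2(t, M, npts=20000):
    """σ_t² from eq. (9): the mean of (y-u)²_⊛ when u is weighted by
    e^{-t(y-u)²_⊛}/Z(y), averaged over y uniform on [0, M), via midpoint
    numerical integration."""
    y = (np.arange(npts) + 0.5) * M / npts
    z = mod_star(y[:, None] - np.arange(M)[None, :], M)   # (y - u)_⊛ for all pairs
    w = np.exp(-t * z**2)
    w /= w.sum(axis=1, keepdims=True)                     # e^{-t(...)²}/Z(y)
    return float(np.mean(np.sum(w * z**2, axis=1)))

def shaping_loss_dB(sigma2, M, R):
    """10 log10(σ² / σ*²(R)) with σ*²(R) = (2πe (2^R / M)²)^{-1}."""
    sigma_star2 = 1.0 / (2 * np.pi * np.e * (2.0**R / M) ** 2)
    return 10 * np.log10(sigma2 / sigma_star2)
```

For instance, `shaping_loss_dB(sigma_t2(2.0, 4), 4, 1.0)` evaluates the loss at one (arbitrary) operating point; the optimized constructions in this paper reach losses as low as 0.012 dB.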
The proposed recovery algorithm for $M$-ary MSE quantization uses a similar idea, which can be better explained by first considering an idealized recovery algorithm that defines the $\boldsymbol y$ above and is approximated by the actual one. Whenever $\mathcal S$, and thus the reference codeword $\boldsymbol u^*$, changes due to decimation errors, the algorithm adjusts $\boldsymbol y$ accordingly to make each $p(y_j \mid u^*_j)$ over the ensemble identical to its ideal version (11). This keeps the density of each $p^{\mathrm u}_j$ generated from this $y_j$, and thus that of each $q^{\mathrm b}_i$ computed from these $p^{\mathrm u}_j$'s in subsequent iterations, identical to their counterparts under ideal decimation (which are computable via DE for asymptotically large $n$), so decimation errors no longer degrade the quality of future decimations. When the quantization algorithm finishes after decimating all the bits in $\boldsymbol b$, $\mathcal S$ will contain only the resulting $(\boldsymbol b, \boldsymbol s)$, which will also be the reference codeword then; since $p(y_j \mid u^*_j)$ is ideal, $\mathrm E[d(\boldsymbol y, \boldsymbol u)] = \mathrm E[d(\boldsymbol y, \boldsymbol u^*)]$ is also equal to the ideal $\sigma_t^2$, while the difference between $\tilde{\boldsymbol y}$ and $\boldsymbol y$ corresponds to decimation errors that cause the actual MSE $\sigma^2 = \mathrm E[d(\tilde{\boldsymbol y}, \boldsymbol u)]$ to be higher.

While the adjustment of each $y_j$ causes $u^*_j$ to change as well, the desired $p(y_j \mid u^*_j)$ can still be achieved in idealized recovery using $q^{\mathrm u*}_j$ from (13), which may be approximated by $q^{\mathrm u}_j$ in practice. To see this, recall that given $\boldsymbol y$ and $\mathcal S$, the reference codeword is any $(\boldsymbol b', \boldsymbol s') \in \mathcal S$ with probability proportional to $e^{-ntd(\boldsymbol y, \boldsymbol u(\boldsymbol b', \boldsymbol s'))}$, so ignoring normalization,
    $p(u^*_j = u \mid \boldsymbol y, \mathcal S) = \sum_{(\boldsymbol b', \boldsymbol s') \in \mathcal S,\, u_j(\boldsymbol b', \boldsymbol s') = u} e^{-ntd(\boldsymbol y, \boldsymbol u(\boldsymbol b', \boldsymbol s'))} = \sum_{(\boldsymbol b', \boldsymbol s') \in \mathcal S,\, u_j(\boldsymbol b', \boldsymbol s') = u} \; \prod_{j'} p^{\mathrm u}_{j'}(u_{j'}(\boldsymbol b', \boldsymbol s')) = (p^{\mathrm u}_j \odot q^{\mathrm u*}_j)(u)$;    (15)
that is, the dependence on $\boldsymbol y$ and $\mathcal S$ can be subsumed into $p^{\mathrm u}_j$ (a function of $y_j$) and $q^{\mathrm u*}_j$ (a function of $\mathcal S$ and the other elements of $\boldsymbol y$). $p(u^*_j \mid y_j, q^{\mathrm u*}_j)$ is thus also given by (15) and identical to the ideal $\bar p(u^*_j \mid y_j, q^{\mathrm u*}_j)$. Using this fact, we let $\bar F_{q^{\mathrm u*}_j}(\cdot)$ and $F_{q^{\mathrm u*}_j}(\cdot)$ be the cdfs corresponding to respectively $\bar p(y_j \mid q^{\mathrm u*}_j)$ and $p(\tilde y_j \mid q^{\mathrm u*}_j)$, and define

    $y_j \triangleq \bar F^{-1}_{q^{\mathrm u*}_j}(F_{q^{\mathrm u*}_j}(\tilde y_j))$    (16)

as the fully-adjusted version of $\tilde y_j$. Conditioned on $q^{\mathrm u*}_j$, $F_{q^{\mathrm u*}_j}(\tilde y_j)$ is uniformly distributed over $[0, 1]$, so the application of $\bar F^{-1}_{q^{\mathrm u*}_j}(\cdot)$ makes $p(y_j \mid q^{\mathrm u*}_j) = \bar p(y_j \mid q^{\mathrm u*}_j)$. We then have $p(y_j, u^*_j \mid q^{\mathrm u*}_j) = \bar p(y_j, u^*_j \mid q^{\mathrm u*}_j)$ and $p(y_j \mid u^*_j, q^{\mathrm u*}_j) = \bar p(y_j \mid u^*_j, q^{\mathrm u*}_j) = \bar p(y_j \mid u^*_j)$, the final equality being implied by the Markov chain property in Section III-B, so $p(y_j \mid u^*_j)$ now matches $\bar p(y_j \mid u^*_j)$, as desired. It should be noted that updating a certain $y_j$ in this manner changes the corresponding $p^{\mathrm u}_j$ and thus the $q^{\mathrm u*}_{j'}$'s for $j' \neq j$, so each $y_{j'}$ has to be updated again according to the new $p(\tilde y_{j'} \mid q^{\mathrm u*}_{j'})$ in order to maintain the desired $p(y_{j'} \mid u^*_{j'})$. A rigorous definition of $\boldsymbol y$ will thus require an iterative sequential update process, with each step updating one $y_j$; examination of the BEQ case leads us to conjecture that this process always converges to a fixed point. In any case, such details are no longer relevant when $q^{\mathrm u*}_j$ is approximated by $q^{\mathrm u}_j$ below.

D. Principles of the Proposed Recovery Algorithm

The proposed recovery algorithm approximates the idealized version by using $q^{\mathrm u}_j$ instead of $q^{\mathrm u*}_j$: letting $\bar F_{q^{\mathrm u}_j}(\cdot)$ and $F_{q^{\mathrm u}_j}(\cdot)$ be the cdfs corresponding to respectively $\bar p(y_j \mid q^{\mathrm u}_j)$ and $p(\tilde y_j \mid q^{\mathrm u}_j)$, the algorithm yields, for each $j$,

    $\hat y_j \triangleq \bar F^{-1}_{q^{\mathrm u}_j}(F_{q^{\mathrm u}_j}(\tilde y_j))$.    (17)

While $\bar p(y_j \mid q^{\mathrm u}_j)$ is well approximated by the asymptotic result (14) (with $q^{\mathrm u}_j$ in place of $q^{\mathrm u*}_j$), $p(\tilde y_j \mid q^{\mathrm u}_j)$ must, in practice, be estimated from the $(\tilde y_j, q^{\mathrm u}_j)$ samples in the actual quantizer instance. Approximating $q^{\mathrm u*}_j$ with $q^{\mathrm u}_j$ causes the following inaccuracies:

- $q^{\mathrm u}_j$ is computed with a finite number of BP iterations and thus fails to include the information in the $\hat p^{\mathrm u}_{j'}$'s from faraway nodes in the factor graph.
- The priors used are the $\hat p^{\mathrm u}_{j'}$'s from the $\hat{\boldsymbol y}$ given by the actual recovery algorithm, which can be inaccurate compared to the $p^{\mathrm u}_{j'}$'s from idealized recovery, due both to estimation errors in $p(\tilde y_j \mid q^{\mathrm u}_j)$ and to the incomplete-recovery issue discussed below.
- The $\hat p^{\mathrm u}_{j'}$'s involved in the computation of the current $q^{\mathrm u}_j$ come from recovery steps in earlier iterations, which do not take more recent decimation errors into account.

As a result, while (17) makes $p(\hat y_j \mid q^{\mathrm u}_j)$ identical to $\bar p(y_j \mid q^{\mathrm u}_j)$ in the absence of estimation errors, this does not imply that
$p(\hat y_j \mid u^*_j)$ will match $\bar p(y_j \mid u^*_j)$, as is the case for idealized recovery. In general, the effect is that $\hat{\boldsymbol y}$ is insufficiently adjusted compared to $\boldsymbol y$; for example, when $q^{\mathrm u}_j = *$, $\bar p(y_j \mid q^{\mathrm u}_j)$ and $p(\tilde y_j \mid q^{\mathrm u}_j)$ will likely both be uniform distributions, so $\hat y_j$ will be identical to $\tilde y_j$ even if $q^{\mathrm u*}_j$ indicates that $y_j$ should be adjusted to some other value. However, such incomplete recovery does not cause much performance degradation in practice (actually none in the case of BEQ). Roughly speaking, if $q^{\mathrm u}_j$ is uninformative at some iteration in a quantizer instance, most of the corresponding outgoing BP messages $\mu^{\mathrm{uc}}_{ja}$ and $\mu^{\mathrm{cb}}_{ai}$ will be uninformative as well and not much affected by the difference between $\hat y_j$ and $y_j$ due to incomplete recovery, while the affected messages are also prevented from propagating far by the use of recovery in future iterations. In effect, recovery is carried out incrementally, adjusting $\hat y_j$ toward $y_j$ as the corresponding $q^{\mathrm u}_j$ becomes informative. Due to this reliance on future recovery to stop the propagation of BP messages affected by incomplete recovery, it is important that recovery be carried out in every iteration, and that the $\mu^{\mathrm{cu}}_{aj}$'s be updated late so as to make the $q^{\mathrm u}_j$'s more up-to-date (see Section II). Indeed, the algorithm often exhibits oscillatory behavior otherwise, which can affect $I_{\mathrm{bc}}$ and thus disrupt the pace of decimation so much as to increase the shaping loss above 1 dB.

IV. THE RECOVERY ALGORITHM

Based on the above principles, we will now explain in detail how to estimate $p(\tilde y_j \mid q^{\mathrm u}_j)$ and perform the recovery using (17) with reasonable computational complexity.

A. Modeling $p(\tilde y_j \mid q^{\mathrm u}_j)$

Since the LDGM code ensemble is symmetric with respect to symbol permutation, $p(\tilde y_j \mid q^{\mathrm u}_j)$ is the same for any $j$.
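The CDF-matching step (17) itself is mechanically simple once the two densities are available on a grid. The sketch below is a toy illustration under stated assumptions, not the paper's estimator: both densities are modeled as mixtures of the form (14) with Gaussian-shaped kernels of different widths, a wider one standing in for the estimated $p(\tilde y_j \mid q^{\mathrm u}_j)$ and a narrower one for the ideal $\bar p(y_j \mid q^{\mathrm u}_j)$; all names are ours.

```python
import numpy as np

M = 4
grid = np.linspace(0.0, M, 1025)   # grid of y values on [0, M]

def mod_star(z):
    return (z + M / 2) % M - M / 2

def mixture_pdf(q, kt):
    """Σ_u q(u) p_z((y-u)_⊛) on the grid, with kernel shape e^{-kt·z²}
    (per-point normalized over u, then renormalized to integrate to 1)."""
    Z = sum(np.exp(-kt * mod_star(grid - v) ** 2) for v in range(M))
    pdf = sum(q[u] * np.exp(-kt * mod_star(grid - u) ** 2) / Z for u in range(M))
    pdf /= np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid))
    return pdf

def cdf(pdf):
    """Trapezoidal cdf on the grid, normalized to end at 1."""
    c = np.concatenate(([0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid))))
    return c / c[-1]

def recover(y_tilde, q, kt_ideal, kt_actual):
    """ŷ = F̄⁻¹(F(ỹ)), eq. (17): push ỹ through the cdf of the modeled
    actual density, then through the inverse cdf of the ideal one."""
    F_actual = cdf(mixture_pdf(q, kt_actual))
    F_ideal = cdf(mixture_pdf(q, kt_ideal))
    v = np.interp(y_tilde, grid, F_actual)      # F(ỹ) ∈ [0, 1]
    return float(np.interp(v, F_ideal, grid))   # F̄⁻¹(v)
```

When $q = *$ both densities are uniform and $\hat y = \tilde y$, illustrating the incomplete-recovery behavior above; for an informative $q$ with a wider "actual" kernel, the mapping pulls $\tilde y_j$ toward the symbol that $q$ favors.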
Its estimation must still rely on the $n$ $(\hat{y}_i, \mu_i^u)$ samples from the current quantizer instance, so some assumptions must be made about it in order to reduce the number of parameters to estimate and minimize random estimation errors. Recall that $\mu_i^u$ is asymptotically symmetric w.r.t. $u_i^*$ as $n \to \infty$ under ideal decimation; this asymptotic symmetry should remain true under non-ideal decimation but idealized recovery, so it is reasonable to assume approximately that it holds under the actual recovery algorithm as well. Analogous to (14), we can then express $p(\hat{y}_i \mid \mu_i^u)$ as

$$p(\hat{y}_i \mid \mu_i^u) = \sum_{u=0}^{M-1} p(u_i^* = u \mid \mu_i^u)\, p(\hat{y}_i \mid u_i^* = u, \mu_i^u) = \sum_{u=0}^{M-1} \mu_i^u(u)\, p(\hat{y}_i \mid u_i^* = u, \mu_i^u). \qquad (18)$$
In general, the deviation of $p(\hat{y}_i \mid u_i^*, \mu_i^u)$ from the ideal $\bar{p}(y_i \mid u_i^*)$ is related to the pace of decimation and can vary with $\mu_i^u$; analysis of BEQ suggests that there is usually more deviation when $\mu_i^u$ is informative. However, this effect is difficult to estimate because only the $(\hat{y}_i, \mu_i^u)$ samples with informative $\mu_i^u$'s are useful in the estimation. In practice, attempts to model this dependence do not improve the results, so we ignore it and model $p(\hat{y}_i \mid u_i^*, \mu_i^u)$ as

$$p(\hat{y}_i \mid u_i^*, \mu_i^u) = p_z((\hat{y}_i - u_i^*)^\circ), \qquad (19)$$
such that

$$p(\hat{y}_i \mid \mu_i^u) = \sum_{u=0}^{M-1} \mu_i^u(u)\, p_z((\hat{y}_i - u)^\circ). \qquad (20)$$
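As a purely illustrative sketch (not from the paper), the mixture above can be evaluated directly. The kernel `p_z` below is a hypothetical wrapped-Gaussian stand-in, normalized so that its $M$ shifted copies sum to one, i.e. it satisfies the symmetry and uniformity constraints imposed on $p_z(\cdot)$ in the text; the alphabet size and sharpness parameter `t` are toy values:

```python
import math

M = 4  # alphabet size (illustrative)

def wrap(z, m=M):
    """The (.)^o operation: wrap z into [-m/2, m/2)."""
    return (z + m / 2) % m - m / 2

def p_z(z, t=2.0, m=M):
    """Hypothetical stand-in for the kernel p_z(.): a wrapped-Gaussian
    shape normalized so that its m shifted copies sum to one for any
    argument, matching the constraints imposed on p_z in the text."""
    num = math.exp(-t * wrap(z, m) ** 2)
    den = sum(math.exp(-t * wrap(z + u, m) ** 2) for u in range(m))
    return num / den

def p_yhat_given_mu(y_hat, mu):
    """Eq. (20): p(y_hat | mu) = sum_u mu(u) p_z((y_hat - u)^o)."""
    return sum(mu[u] * p_z(wrap(y_hat - u)) for u in range(M))
```

With a uniform prior the mixture collapses to the uniform density $1/M$, as the incomplete-recovery discussion above expects for uninformative $\mu_i^u$.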
$p_z(\cdot)$ is further made to satisfy the constraints

$$p_z(z) = p_z(-z), \quad 0 \le z < M/2, \qquad (21)$$

$$\sum_{u=0}^{M-1} p_z((y-u)^\circ) = 1, \quad y \in [0, M), \qquad (22)$$
where (22) corresponds to the uniformity of $p(\hat{y}_i)$ and implies the normalization condition $\int p_z(z)\,dz = 1$. Note that the ideal $\bar{p}(y_i \mid u_i^*)$ in (11) also has the form of (19), with the corresponding $\bar{p}_z(z) \triangleq e^{-tz^2}/c(z)$ ($z \in \mathbb{R}$) satisfying (21) and (22).

The interval $[0, M)$ of possible $\hat{y}_i$ values is now equally divided into $nM$ subintervals $A_{sv}$, $s = 0, \ldots, n-1$, $v = 0, \ldots, M-1$, with each $A_{sv} \triangleq [v + s/n,\, v + (s+1)/n)$. This discretizes $p_z(\cdot)$ into $q_{sv} \triangleq \int_{A_{sv}} p_z((y)^\circ)\,dy$, which can be further grouped into column vectors $\boldsymbol{q}_s \triangleq (q_{sv})_{v=0}^{M-1}$, and the constraints above become, for all $s$ and $v$,

$$q_{sv} = q_{n-1-s,\,M-1-v}, \ \text{or}\ \boldsymbol{q}_s = \mathbf{R}\boldsymbol{q}_{n-1-s}, \qquad (23)$$

$$\sum_{v=0}^{M-1} q_{sv} = 1/n, \ \text{or}\ \mathbf{1}^{\mathrm{T}}\boldsymbol{q}_s = 1/n, \qquad (24)$$

where $\mathbf{R} = \mathbf{R}^{\mathrm{T}} = \mathbf{R}^{-1}$ is the matrix such that $\mathbf{R}\boldsymbol{q}$ is $\boldsymbol{q}$ with its elements in reverse order, and $\mathbf{1}$ is the all-one vector. By (23), it is sufficient to estimate those $\boldsymbol{q}_s$ with $s = 0, \ldots, n/2-1$, each satisfying (24), from the $n$ $(\hat{y}_i, \mu_i^u)$ samples according to the discretized version of (20),

$$\Pr\left[\hat{y}_i \in A_{sv} \mid \mu_i^u\right] = \sum_{u=0}^{M-1} \mu_i^u(u)\, q_{s,v-u}, \quad \forall i, s, v. \qquad (25)$$
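The discretization and the constraints (23)-(25) can be sketched numerically. In the toy example below (hypothetical kernel, toy values of $M$ and $n$; midpoint integration stands in for the exact integrals $q_{sv} = \int_{A_{sv}} p_z((y)^\circ)\,dy$), the symmetry (23) holds by construction and the probabilities (25) sum to one over all subintervals:

```python
import math

M, n = 4, 8  # alphabet size and subintervals per unit interval (toy values)

def wrap(z):
    return (z + M / 2) % M - M / 2

def kernel(z, t=2.0):
    # hypothetical symmetric kernel standing in for p_z(.); only its
    # symmetry matters here, not its exact shape
    return math.exp(-t * wrap(z) ** 2)

# Discretize: q[s][v] ~ integral of p_z((y)^o) over A_sv = [v + s/n, v + (s+1)/n),
# approximated by midpoint values and normalized to total mass 1.
raw = [[kernel(v + (s + 0.5) / n) for v in range(M)] for s in range(n)]
Z = sum(sum(row) for row in raw)
q = [[x / Z for x in row] for row in raw]

def pr_in_A(s, v, mu):
    """Eq. (25): Pr[y_hat in A_sv | mu] = sum_u mu(u) q_{s, v-u mod M}."""
    return sum(mu[u] * q[s][(v - u) % M] for u in range(M))

# Constraint (23), q_{s,v} = q_{n-1-s, M-1-v}, holds by kernel symmetry:
ok23 = all(abs(q[s][v] - q[n - 1 - s][M - 1 - v]) < 1e-12
           for s in range(n) for v in range(M))

# The probabilities (25) sum to one over all subintervals:
mu_example = [0.7, 0.1, 0.1, 0.1]  # toy prior (assumption)
total = sum(pr_in_A(s, v, mu_example) for s in range(n) for v in range(M))
```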
B. Computing the Raw Estimate of $q_{sv}$

Maximum-likelihood estimation using (25) turns out to be computationally intractable, so we adopt a regularized least-squares method instead. For each $i$, $s$ and $v$, the left-hand side of (25) is just the expectation $\mathrm{E}\left[\mathbf{1}[\hat{y}_i \in A_{sv}] \mid \mu_i^u\right]$, so we can approximate it with its sample $\mathbf{1}[\hat{y}_i \in A_{sv}]$ obtained from the actual $\hat{y}_i$, yielding

$$\mathbf{1}[\hat{y}_i \in A_{sv}] = \sum_{u=0}^{M-1} \mu_i^u(u)\, q_{s,v-u} + w_{isv}, \quad \forall i, s, v, \qquad (26)$$

$w_{isv}$ being the sampling error. These equations can be grouped by $s$ and re-expressed as

$$b_{isv} = \sum_{v'=0}^{M-1} A^i_{vv'}\, q_{sv'} + w_{isv}, \ \text{or}\ \boldsymbol{b}_s = \boldsymbol{A}\boldsymbol{q}_s + \boldsymbol{w}_s, \qquad (27)$$

where $A^i_{vv'} \triangleq \mu_i^u(v - v')$, $b_{isv} \triangleq \mathbf{1}[\hat{y}_i \in A_{sv}]$, and they form respectively an $nM \times M$ matrix $\boldsymbol{A}$ and $nM \times 1$ vectors $\boldsymbol{b}_s$. For each $i$, the corresponding $M$ rows of $\boldsymbol{A}$ and $\boldsymbol{b}_s$ are denoted $\boldsymbol{A}_i$ and $\boldsymbol{b}_{is}$, respectively; note that each $\boldsymbol{A}_i$ is a circulant matrix that does not vary with $s$.
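The circulant structure of the blocks $\boldsymbol{A}_i$ and the sparse form of $\boldsymbol{b}_{is}$ can be made concrete with a small sketch (toy prior and toy $\boldsymbol{q}_s$, both assumptions for illustration):

```python
M = 4  # illustrative alphabet size

def circulant_A(mu):
    """A^i_{vv'} = mu(v - v' mod M): the M x M circulant block of eq. (27)."""
    return [[mu[(v - vp) % M] for vp in range(M)] for v in range(M)]

def indicator_b(y_hat, s, n=8):
    """b_{isv} = 1[y_hat in A_sv], with A_sv = [v + s/n, v + (s+1)/n)."""
    return [1.0 if v + s / n <= y_hat < v + (s + 1) / n else 0.0
            for v in range(M)]

mu = [0.55, 0.25, 0.15, 0.05]    # toy prior mu_i^u (assumption)
q_s = [0.04, 0.03, 0.02, 0.035]  # toy slice of the discretized kernel (assumption)
A_i = circulant_A(mu)

# Row v of A_i q_s is exactly the mixture on the right-hand side of (25):
Aq = [sum(A_i[v][vp] * q_s[vp] for vp in range(M)) for v in range(M)]
mix = [sum(mu[u] * q_s[(v - u) % M] for u in range(M)) for v in range(M)]

# A_i^T b_is simply copies a cyclic shift of mu when y_hat falls in one of
# the A_sv; here y_hat = 1.3 lies in A_{2,1} = [1.25, 1.375) for n = 8:
b_is = indicator_b(1.3, 2)
ATb = [sum(A_i[v][vp] * b_is[v] for v in range(M)) for vp in range(M)]
```

The last two lines anticipate the closed form used later for $\boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{b}_{is}$: since at most one entry of $\boldsymbol{b}_{is}$ is nonzero, the product is just one row of $\boldsymbol{A}_i$.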
Since $\boldsymbol{q}_s = \mathbf{R}\boldsymbol{q}_{n-1-s}$, for each $s \in \{0, \ldots, n/2-1\}$, (27) for $s$ and $n-1-s$ can be combined to yield

$$\begin{bmatrix} \boldsymbol{b}_s \\ \boldsymbol{b}_{n-1-s} \end{bmatrix} = \begin{bmatrix} \boldsymbol{A} \\ \boldsymbol{A}\mathbf{R} \end{bmatrix} \boldsymbol{q}_s + \begin{bmatrix} \boldsymbol{w}_s \\ \boldsymbol{w}_{n-1-s} \end{bmatrix}, \qquad (28)$$

or simply $\tilde{\boldsymbol{b}}_s = \tilde{\boldsymbol{A}}\boldsymbol{q}_s + \tilde{\boldsymbol{w}}_s$. The estimation of $\boldsymbol{q}_s$ is then formulated as the following regularized least-squares problem

$$\begin{aligned} \text{minimize}\quad & \frac{1}{\sigma_w^2}\left\|\tilde{\boldsymbol{b}}_s - \tilde{\boldsymbol{A}}\boldsymbol{q}_s\right\|^2 + \frac{1}{\sigma_q^2}\left\|\boldsymbol{q}_s - \bar{\boldsymbol{q}}_s\right\|^2 \\ \text{subject to}\quad & \mathbf{1}^{\mathrm{T}}\boldsymbol{q}_s = 1/n, \end{aligned} \qquad (29)$$

where we have let $\bar{q}_{sv} \triangleq \int_{A_{sv}} \bar{p}_z(z)\,dz$, or $\bar{\boldsymbol{q}}_s$ in vector form, be the "ideal" $\boldsymbol{q}_s$ (which also satisfies $\mathbf{1}^{\mathrm{T}}\bar{\boldsymbol{q}}_s = 1/n$), and the regularization parameters $\sigma_w^2$ and $\sigma_q^2$ can respectively be understood as the variance of each $w_{isv}$ and the mean-square deviation of each $q_{sv}$ from $\bar{q}_{sv}$. Solving (29), we obtain the estimate

$$\begin{aligned} \boldsymbol{q}_s &= (\tilde{\boldsymbol{A}}^{\mathrm{T}}\tilde{\boldsymbol{A}} + \lambda\mathbf{I})^{-1} (\tilde{\boldsymbol{A}}^{\mathrm{T}}\tilde{\boldsymbol{b}}_s + \lambda\bar{\boldsymbol{q}}_s - \nu\mathbf{1}) \\ &= (\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A} + \mathbf{R}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}\mathbf{R} + \lambda\mathbf{I})^{-1} \cdot (\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_s + \mathbf{R}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_{n-1-s} + \lambda\bar{\boldsymbol{q}}_s - \nu\mathbf{1}), \end{aligned} \qquad (30)$$

where $\lambda \triangleq \sigma_w^2/\sigma_q^2$ and $\nu$ is the Lagrange dual variable determined by the constraint $\mathbf{1}^{\mathrm{T}}\boldsymbol{q}_s = 1/n$.

There remains the computation of the $M \times M$ matrix $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ and the $M \times 1$ vectors $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_s$. Note that $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A} = \sum_i \boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{A}_i$; since each $\boldsymbol{A}_i$ is circulant, so is $\boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{A}_i$ and thus $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$, and the matrix inversion in (30) can be computed efficiently using FFT. Specifically, we first use the circulant property of $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$ to see that $(\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A} + \mathbf{R}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}\mathbf{R} + \lambda\mathbf{I})\mathbf{1} = (2\alpha + \lambda)\mathbf{1}$, where $\alpha$ is the sum of any row in $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}$; this allows the term with $\nu$ in (30) to be separated out, yielding

$$\boldsymbol{q}_s = \hat{\boldsymbol{q}}_s - \nu'\mathbf{1}, \qquad (31)$$

where $\hat{\boldsymbol{q}}_s$ is (30) sans the $\nu$-term, and $\nu' \triangleq \nu/(2\alpha + \lambda)$ can be computed from the $\mathbf{1}^{\mathrm{T}}\boldsymbol{q}_s = 1/n$ constraint as $\nu' = (\mathbf{1}^{\mathrm{T}}\hat{\boldsymbol{q}}_s - 1/n)/M$.

Let $\mathbf{F} \triangleq (e^{-\mathrm{j}2\pi k v/M})_{kv}$ be the DFT matrix with $\mathbf{F}^{-1} = \mathbf{F}^{\mathrm{H}}/M$ (both the row index $k$ and the column index $v$ start at 0 for convenience); then (30) can be transformed into

$$\mathbf{F}\hat{\boldsymbol{q}}_s = (\mathbf{F}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}\mathbf{F}^{-1} + \mathbf{F}\mathbf{R}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}\mathbf{R}\mathbf{F}^{-1} + \lambda\mathbf{I})^{-1} \cdot \mathbf{F}(\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_s + \mathbf{R}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_{n-1-s} + \lambda\bar{\boldsymbol{q}}_s). \qquad (32)$$

Now $\mathbf{F}\boldsymbol{A}_i\mathbf{F}^{-1} = \mathrm{diag}(\mathrm{A}_i)$ and $\mathbf{F}\boldsymbol{A}_i^{\mathrm{T}}\mathbf{F}^{-1} = \mathrm{diag}(\mathrm{A}_i)^*$, where $(\cdot)^*$ denotes complex conjugation and $\mathrm{A}_i$ (not to be confused with $\boldsymbol{A}_i$) is an $M \times 1$ vector representing the DFT of $\mu_i^u$, i.e.

$$(\mathrm{A}_i)_k \triangleq \sum_{u=0}^{M-1} e^{-\mathrm{j}2\pi k u/M}\, \mu_i^u(u). \qquad (33)$$

Moreover, $\mathbf{F}\mathbf{R} = \boldsymbol{\Phi}\mathbf{F}^*$, where $\boldsymbol{\Phi} \triangleq \mathrm{diag}((e^{\mathrm{j}2\pi k/M})_{k=0}^{M-1})$. Therefore,

$$\mathbf{F}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}\mathbf{F}^{-1} = \sum_i \mathbf{F}\boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{A}_i\mathbf{F}^{-1} = \mathrm{diag}(\mathrm{Q}), \qquad (34)$$

$$\mathbf{F}\mathbf{R}\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A}\mathbf{R}\mathbf{F}^{-1} = \boldsymbol{\Phi}\,\mathrm{diag}(\mathrm{Q})^*\,\boldsymbol{\Phi}^{-1} = \mathrm{diag}(\mathrm{Q}), \qquad (35)$$

where $\mathrm{Q} \triangleq \sum_i |\mathrm{A}_i|^2$ is an $M \times 1$ vector and $|\cdot|$ takes the complex magnitude element-wise. On the other hand, $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_s = \sum_i \boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{b}_{is}$, with each $\boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{b}_{is}$ given by

$$(\boldsymbol{A}_i^{\mathrm{T}}\boldsymbol{b}_{is})_{v'} = \sum_v A^i_{vv'}\, b_{isv} = \begin{cases} \mu_i^u(v - v'), & \text{if}\ \hat{y}_i \in A_{sv}, \\ 0, & \text{if}\ \hat{y}_i \notin \bigcup_v A_{sv}, \end{cases} \qquad (36)$$

and $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{b}_{n-1-s}$ can be computed analogously. $\hat{\boldsymbol{q}}_s$ can thus be obtained from (32), with the left-multiplication by $\mathbf{F}$ and $\mathbf{F}^{-1}$ carried out using the FFT and inverse FFT respectively, and $\boldsymbol{q}_s$ follows from (31).
C. Further Regularization of the Estimate and Computation of the Adjusted $\tilde{y}$

As the final step of the recovery algorithm, we regularize the estimated $q_{sv}$'s, compute the corresponding $F_{\mu_i^u}(\cdot)$'s, and use (17) to obtain the adjusted $\tilde{y}$. By (20), each $F_{\mu_i^u}(\cdot)$ is a linear combination

$$F_{\mu_i^u}(y) = \sum_{u=0}^{M-1} \mu_i^u(u)\, F_u(y) \qquad (37)$$

of the $F_u(\cdot)$'s, the cdfs corresponding to $p_z((\cdot - u)^\circ)$. For convenience, we define the "periodically extended" cdf $F(y) \triangleq \int_0^y p_z((y')^\circ)\,dy'$ for $p_z(\cdot)$, so that $F_u(y)$ becomes simply $F(y - u) - F(-u)$. After discretization, for $y \in [0, M]$ that are integer multiples of $1/n$,

$$F(y) = \sum_{(s,v):\, v + s/n < y} q_{sv}, \qquad (38)$$