DEQUANTIZING COMPRESSED SENSING WITH NON-GAUSSIAN CONSTRAINTS

L. Jacques (1,2), D. K. Hammond (1), M. J. Fadili (3)

(1) Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
(2) Communications and Remote Sensing Laboratory, Université catholique de Louvain (UCL), B-1348 Louvain-la-Neuve, Belgium
(3) GREYC CNRS-ENSICAEN-Université de Caen, 14050 Caen, France

ABSTRACT

In this paper, following the Compressed Sensing paradigm, we study the problem of recovering sparse or compressible signals from uniformly quantized measurements. We present a new class of convex optimization programs, or decoders, coined Basis Pursuit DeQuantizer of moment p (BPDQ_p), that model the quantization distortion more faithfully than the commonly used Basis Pursuit DeNoise (BPDN) program. Our decoders proceed by minimizing the sparsity of the signal to be reconstructed while enforcing a data fidelity term of bounded ℓ_p-norm, for 2 < p ≤ ∞. We show that in oversampled situations the performance of the BPDQ_p decoders is significantly better than that of BPDN, with reconstruction error due to quantization divided by √(p+1). This reduction relies on a modified Restricted Isometry Property of the sensing matrix expressed in the ℓ_p-norm (RIP_p), a property satisfied by Gaussian random matrices with high probability. We conclude with numerical experiments comparing BPDQ_p and BPDN on signal and image reconstruction problems.

Index Terms — Compressed Sensing, Quantization, Sampling, Uniform noise, Convex Optimization, Basis Pursuit.

1. INTRODUCTION

The theory of Compressed Sensing (CS) [1] enables reconstruction of sparse or compressible signals from a small number of linear measurements relative to the dimension of the signal space. In this setting, knowledge of a signal x ∈ R^N is contained in the m ≤ N linear measurements provided by a sensing matrix Φ ∈ R^{m×N}, i.e. we know only the m inner products ⟨φ_i, x⟩, where (φ_i)_{i=0}^{m−1} are the rows of Φ. In any realistic digital acquisition system, these analog measurements must be quantized before they can be stored or transmitted. The study of signal recovery from quantized measurements is thus of fundamental interest. In this paper we are therefore interested in the noiseless and uniformly quantized sensing (or coding) model

    y_q = Q_α[Φx] = Φx + n,    (1)

where y_q ∈ (αZ + α/2)^m is the quantized measurement vector, (Q_α[·])_i = α⌊(·)_i/α⌋ + α/2 is the uniform quantization operator in R^m of bin width α, and n ∈ R^m is the quantization distortion. This model is a realistic description of systems where the quantization distortion dominates other secondary noise sources (e.g. thermal noise), an assumption valid for many electronic measurement devices.

(L. J. is a Postdoctoral Researcher of the Belgian National Science Foundation (F.R.S.-FNRS).)

A CS reconstruction program (or decoder) relies on the assumption that the sensed signal x is sparse or compressible in an orthogonal basis Ψ ∈ R^{N×N} (a generalization for redundant bases, or dictionaries, exists [2]), i.e. the best K-term approximation x_K in Ψ is an exact or accurate representation of this signal even for small K < N. For simplicity, only the canonical basis Ψ = Id is considered here. The commonly used Basis Pursuit algorithm for CS recovery finds the sparsest signal (in the ℓ_1 sense) that could have produced the observed measurements. Directly using the quantized measurements in Basis Pursuit fails, however, as there may be no signal whose (unquantized) measurements reproduce the observed quantized values. This problem may be resolved by relaxing the data fidelity constraint. Using a quadratic constraint yields the standard Basis Pursuit DeNoise (BPDN) program [3]:

    Δ(y_q, ε) = arg min_{u ∈ R^N} ‖u‖_1  s.t.  ‖y_q − Φu‖_2 ≤ ε,    (BPDN)

where ‖·‖_2 is the ℓ_2-norm. The value of ε depends on the magnitude of the quantization distortion, and should be chosen just large enough to ensure that the measurements of the original true signal satisfy the data fidelity constraint. In [3], an estimator of ε is obtained by considering n as a uniform random vector X with X_i ~iid U([−α/2, α/2]), i.e.

    ε² = E[‖X‖_2²] + κ √(Var[‖X‖_2²]) = (α²/12) m + κ (α²/(6√5)) m^{1/2}.

In that case, u = x respects the constraint of BPDN with probability higher than 1 − e^{−c₀κ²} for a certain constant c₀ > 0.

The stability of BPDN is guaranteed if the sensing matrix Φ ∈ R^{m×N} satisfies the Restricted Isometry Property (RIP) of order K and radius δ ∈ (0, 1), i.e. if there exists a constant µ such that

    µ √(1−δ) ‖u‖_2 ≤ ‖Φu‖_2 ≤ µ √(1+δ) ‖u‖_2,

for all K-sparse signals u ∈ R^N. Generally, CS is described with normalized matrices Φ̄ = Φ/µ having unit-norm columns (in expectation), so that µ is absorbed in the normalizing constant. Interestingly, Standard Gaussian Random (SGR) matrices, i.e. with entries drawn as Φ_ij ~iid N(0, 1), satisfy the RIP with controllable high probability (with µ = √m) as soon as m ≥ O(K log N/K) [3]. Moreover, other random constructions satisfying the RIP exist (e.g. Bernoulli matrices, Fourier ensembles) [1, 3].

For completeness, we include the following theorem expressing the aforementioned stability result, i.e. the ℓ_2−ℓ_1 instance optimality² of BPDN.

Theorem 1 ([5]). Let x ∈ R^N be a compressible signal with a K-term ℓ_1-approximation error e_0(K) = K^{−1/2} ‖x − x_K‖_1, for 0 ≤ K ≤ N, where x_K is the best K-term ℓ_2-approximation of x. Let Φ be a RIP matrix of order 2K and radius 0 < δ_2K < √2 − 1. Given a measurement vector y = Φx + n corrupted by a noise n with power ‖n‖_2 ≤ ε, the solution x* = Δ(y, ε) obeys the ℓ_2−ℓ_1 instance optimality

    ‖x* − x‖_2 ≤ A e_0(K) + B ε/µ,    (2)

for values A = 2 (1 + (√2−1) δ_2K) / (1 − (√2+1) δ_2K) and B = 4 √(1+δ_2K) / (1 − (√2+1) δ_2K). For instance, for δ_2K = 0.2, A < 4.2 and B < 8.5.
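To make the sensing model (1) and the BPDN radius ε concrete, here is a minimal NumPy sketch. The helper names `quantize` and `bpdn_epsilon` are ours, and the dimensions, bin width, and κ are arbitrary demo choices; the formulas themselves follow the text above.

```python
import numpy as np

def quantize(y, alpha):
    """Uniform quantizer of bin width alpha: Q_alpha[y] = alpha*floor(y/alpha) + alpha/2."""
    return alpha * np.floor(y / alpha) + alpha / 2.0

def bpdn_epsilon(m, alpha, kappa=2.0):
    """BPDN radius estimate from [3]:
    eps^2 = (alpha^2/12) m + kappa * (alpha^2/(6 sqrt(5))) * sqrt(m)."""
    eps2 = (alpha**2 / 12.0) * m + kappa * (alpha**2 / (6.0 * np.sqrt(5.0))) * np.sqrt(m)
    return np.sqrt(eps2)

rng = np.random.default_rng(0)
m, N, K, alpha = 128, 512, 16, 0.1
Phi = rng.standard_normal((m, N))                 # SGR sensing matrix
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
yq = quantize(Phi @ x, alpha)                     # model (1)
n = yq - Phi @ x                                  # quantization distortion
assert np.all(np.abs(n) <= alpha / 2 + 1e-12)     # each |n_i| <= alpha/2 by construction
print(np.linalg.norm(n), bpdn_epsilon(m, alpha))  # ||n||_2 vs. the chosen eps
```

With κ = 2 the printed ‖n‖_2 typically falls below the estimated ε, consistent with the high-probability guarantee quoted above.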

However, using the BPDN decoder to account for quantization distortion is theoretically unsatisfying for several reasons. First, there is no guarantee that the BPDN solution x* respects Quantization Consistency (QC), i.e. Q_α[Φx*] = y_q. This holds iff ‖y_q − Φx*‖_∞ ≤ α/2, which is not necessarily implied by the BPDN ℓ_2 fidelity constraint. Second, from a Bayesian Maximum a Posteriori (MAP) standpoint, BPDN can be viewed as solving an ill-posed inverse problem where the ℓ_2-norm used in the fidelity term corresponds to the conditional log-likelihood of an additive white Gaussian noise. The quantization distortion, however, is not Gaussian but uniformly distributed. This motivates the need for a new kind of CS decoder that models the quantization distortion more faithfully.

Recently, a few works have focused on this problem. In [6], the extreme case of 1-bit CS is studied, i.e. when only the signs of the measurements are sent to the decoder. The authors tackle the reconstruction problem by adding a sign consistency constraint to a modified BPDN program working on the sphere of unit-norm signals. In [7], an adaptation of both BPDN and Subspace Pursuit integrates the QC constraint explicitly. However, despite interesting experimental results, no theoretical guarantees are given on the approximation error reached by these solutions. In oversampled ADC conversion of signals [8] and in image restoration problems [9], dequantization obtained from global optimization with an equivalent QC constraint expressed in the ℓ_∞-norm can also be found.

This paper, linked to the companion technical report [11], is structured as follows. In Section 2, we present a new class of abstract decoders, coined Basis Pursuit DeQuantizer of moment p (BPDQ_p), that model the quantization distortion more faithfully. These decoders consist in minimizing the sparsity of the signal to be reconstructed while imposing a data fidelity term of bounded ℓ_p-norm, for 2 ≤ p ≤ ∞. Section 3 introduces a modified Restricted Isometry Property (RIP_p) expressed in the ℓ_p-norm. With this tool, we then prove the stability of the BPDQ_p programs, i.e. their ℓ_2−ℓ_1 instance optimality. In Section 4, we show that, given a sufficient number of measurements, the approximation error due to quantization scales inversely with √(p+1). Finally, Section 5 reports numerical simulations on signal and image reconstruction problems.

² Adopting the definition of mixed-norm instance optimality [4].

2. BASIS PURSUIT DEQUANTIZERS

We introduce a new class of optimization programs (or decoders) that generalize the fidelity term of the BPDN program to noises following a centered Generalized Gaussian Distribution (GGD) of shape parameter p ≥ 1 [10], the uniform noise case corresponding to p → ∞. These decoders reconstruct an approximation of the sparse or compressible signal x from its distorted measurements y = Φx + n when the distortion has a bounded pth moment, i.e. ‖n‖_p ≤ ε. Formally, the decoder writes

    Δ_p(y, ε) = arg min_{u ∈ R^N} ‖u‖_1  s.t.  ‖y − Φu‖_p ≤ ε.    (BPDQ_p)

We dub this class of decoders Basis Pursuit DeQuantizer of moment p (or BPDQ_p) since, as shown in Section 4, their approximation error when Φx is uniformly quantized decreases as both the moment p and the oversampling factor m/K increase.

3. RIP_p AND ℓ_2−ℓ_1 INSTANCE OPTIMALITY

In order to study the approximation error of BPDQ_p, we introduce the Restricted Isometry Property of moment p (or RIP_p).

Definition 1. A matrix Φ ∈ R^{m×N} satisfies the RIP_p (1 ≤ p ≤ ∞) property of order K and radius δ if there exists a constant µ_p > 0 such that

    µ_p √(1−δ) ‖x‖_2 ≤ ‖Φx‖_p ≤ µ_p √(1+δ) ‖x‖_2,    (3)

for all x ∈ R^N with ‖x‖_0 ≤ K, where ‖·‖_p is the ℓ_p-norm on R^m.

The common RIP previously introduced is thus the RIP_2. Interestingly, SGR matrices Φ ∈ R^{m×N} also satisfy the RIP_p of order K and radius 0 < δ < 1 with high probability provided that m ≥ O((δ^{−2} K log N/K)^{p/2}) for 2 ≤ p < ∞, or log m ≥ O(δ^{−2} K log N/K) for p = ∞; see [11] for details. Moreover, for these matrices an asymptotic (in m) approximation of µ_p is

    µ_p ≃ √2 π^{−1/(2p)} Γ[(p+1)/2]^{1/p} m^{1/p}    (4)
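The approximation (4) is easy to check numerically: for unit-norm x and an SGR Φ, the entries of Φx are i.i.d. N(0, 1), so ‖Φx‖_p concentrates around µ_p for large m. A small sketch (the function name and Monte Carlo sizes are ours):

```python
import math
import numpy as np

def mu_p(m, p):
    """Asymptotic mu_p from (4): sqrt(2) pi^(-1/(2p)) Gamma((p+1)/2)^(1/p) m^(1/p)."""
    return (math.sqrt(2.0) * math.pi**(-1.0 / (2 * p))
            * math.gamma((p + 1) / 2.0)**(1.0 / p) * m**(1.0 / p))

# sanity check: for p = 2 this reduces to the RIP_2 normalization mu = sqrt(m)
assert abs(mu_p(100, 2) - 10.0) < 1e-9

# Monte Carlo: ||Phi x||_p / mu_p should be close to 1 for large m
rng = np.random.default_rng(1)
m, N, p = 2000, 64, 4
x = rng.standard_normal(N)
x /= np.linalg.norm(x)
ratios = [np.linalg.norm(rng.standard_normal((m, N)) @ x, ord=p) / mu_p(m, p)
          for _ in range(20)]
print(np.mean(ratios))   # close to 1
```

For p = 4 the closed form gives µ_4^4 = 3m, which is exactly E Σ_i g_i^4 for g_i ~ N(0, 1), so the ratio sits within a few percent of 1 already at m = 2000.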

Fig. 1: Quality of BPDQ_p for different m/K and p. [The plot shows reconstruction SNR (dB) versus the oversampling factor m/K ∈ [10, 40] for BPDN and for BPDQ_p with p = 3, 4, 10.]

The approximation (4) holds for 2 ≤ p < ∞; for p = ∞, µ_∞ ≥ ρ^{−1} √(log m) for a certain ρ > 0 [11]. This results from the specialization to SGR vectors of a study made in [12]. We are now ready to state our main result.

Theorem 2. Let x ∈ R^N be a compressible signal with a K-term ℓ_1-approximation error e_0(K) = K^{−1/2} ‖x − x_K‖_1, for 0 ≤ K ≤ N, where x_K is the best K-term ℓ_2-approximation of x. Let Φ be a RIP_p matrix on s-sparse signals with radius δ_s, for s ∈ {K, 2K, 3K} and 2 ≤ p < ∞. Given a measurement vector y = Φx + n with ‖n‖_p ≤ ε, the solution x*_p = Δ_p(y, ε) obeys the ℓ_2−ℓ_1 instance optimality

    ‖x*_p − x‖_2 ≤ A_p e_0(K) + B_p ε/µ_p,    (5)

for A_p = 2 (1 + C_p − δ_2K) / (1 − δ_2K − C_p), B_p = 4 √(1+δ_2K) / (1 − δ_2K − C_p), and with C_p = C_p(δ_K, δ_2K, δ_3K) well behaved.

The approximation error reached by BPDQ_p is thus bounded in (5) by the sum of the compressibility and noise errors. As shown in [11], this theorem uses explicitly the 2-smoothness of the Banach spaces ℓ_p for 2 ≤ p < ∞ [13] and their possible embedding in ℓ_2. The value C_p behaves as (δ_K + δ_3K) √((1 + δ_2K) p) for large p, and as δ_3K + (4/3)(1 + δ_3K)(p − 2) for p ≃ 2. For p = 2, Theorem 2 reduces to Theorem 1 if δ_3K = √2 δ_2K.

4. QUANTIZATION ERROR REDUCTION

We now turn to the behavior of the BPDQ_p decoders on quantized measurements of sparse or compressible signals. First, assuming the quantization distortion n = Q_α[Φx] − Φx is uniformly distributed in each quantization bin, we proved in [11] that for

    ε = ε_p(α) ≜ (α/2) [ (m + κ (p+1) √m) / (p+1) ]^{1/p},    (6)

u = x satisfies the BPDQ_p fidelity constraint with probability at least 1 − e^{−2κ²}. Second, by Theorem 2, when Φ is RIP_p with 2 ≤ p < ∞, i.e. when m ≥ O((δ^{−2} K log N/K)^{p/2}) for SGR matrices, we have

    ‖x − x*_p‖_2 ≤ A_p e_0(K) + B_p ε_p(α)/µ_p.    (7)
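The error reduction can be read off numerically from (4) and (6). The sketch below (helper names ours, parameter values arbitrary) evaluates the noise term ε_p(α)/µ_p for increasing p at fixed m, next to the α/√(p+1) scale that the bound (8) predicts:

```python
import math

def mu_p(m, p):
    """Asymptotic mu_p from (4)."""
    return (math.sqrt(2.0) * math.pi**(-1.0 / (2 * p))
            * math.gamma((p + 1) / 2.0)**(1.0 / p) * m**(1.0 / p))

def eps_p(alpha, m, p, kappa=2.0):
    """eps_p(alpha) from (6): (alpha/2) * ((m + kappa (p+1) sqrt(m)) / (p+1))^(1/p)."""
    return (alpha / 2.0) * ((m + kappa * (p + 1) * math.sqrt(m)) / (p + 1.0))**(1.0 / p)

alpha, m = 0.1, 4096
for p in (2, 4, 10, 100):
    # per (8), the noise term is bounded by roughly C * alpha / sqrt(p+1), C < 1.497
    print(p, eps_p(alpha, m, p) / mu_p(m, p), alpha / math.sqrt(p + 1))
# note: eps_p(alpha) -> alpha/2 as p -> infinity (cf. footnote 3 below)
```

The printed noise term decreases monotonically in p at this m, illustrating that the gain is realized only when m is large enough for Φ to be RIP_p at that p.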

Third, from (4), we have the approximation µ_p ≃ c_p m^{1/p}, with c_p = √2 π^{−1/(2p)} Γ[(p+1)/2]^{1/p} ≥ 2^{−1/2} e^{−3/4} √(p+1) (1 + O(p^{−2})), using the Stirling formula Γ(z) = √(2π/z) (z/e)^z (1 + O(1/z)). Finally, bounding the different functions involved yields

    ε_p(α)/µ_p ≲ C α/√(p+1) (1 + O(p^{−2})),  with C < 1.497.    (8)

In short, the noise error term in the ℓ_2−ℓ_1 instance optimality relation (7) for the quantized model (1) is thus divided by √(p+1) if the sensing matrix Φ satisfies the RIP_p! More precisely, with a philosophy close to oversampled ADC conversion [8], this noise error reduction happens in oversampled sensing, i.e. when the oversampling factor m/K is high. Indeed, in that case an SGR matrix Φ satisfies the RIP_p with high probability for high p. Moreover, oversampling gives a smaller δ, i.e. δ ∝ m^{−1/p}, hence counteracting the increase of p in the factor C_p of the values A_p ≥ 2 and B_p ≥ 4. This decrease of δ also favors BPDN, but since the values A = A_2 and B = B_2 in (2) are bounded from below, this effect is limited. This is confirmed experimentally in Section 5.

Finally, note that the necessity of satisfying the RIP_p implies that we cannot directly set p = ∞ in BPDQ_p to impose Quantization Consistency (QC) of this decoder³. Indeed, for a given oversampling factor m/K, an SGR matrix Φ can be RIP_p only over a finite interval p ∈ [2, p_max].

³ Observe also that lim_{p→∞} ε_p(α) = α/2.

5. EXPERIMENTAL RESULTS

The BPDQ_p decoders are solved in practice by monotone operator splitting proximal methods [14, 11]. More precisely, as both the ℓ_1-norm and the indicator function of the constraint in BPDQ_p are non-differentiable, Douglas-Rachford splitting is used. The Douglas-Rachford recursion to solve BPDQ_p can be written in the compact form

    u^{(t+1)} = (1 − α_t/2) u^{(t)} + (α_t/2) (2S_γ − Id) ∘ (2P_{T_p(ε)} − Id)(u^{(t)}),

where α_t ∈ (0, 2) for all t ∈ N, γ > 0, S_γ is the componentwise soft-thresholding operator with threshold γ, and P_{T_p(ε)} is the orthogonal projection onto the closed convex constraint set T_p(ε) = {u ∈ R^N : ‖y_q − Φu‖_p ≤ ε}. From [14], one can show that the sequence (u^{(t)})_{t∈N} converges to some point u*, and that P_{T_p(ε)}(u*) is a solution of BPDQ_p. The projection P_{T_p(ε)} was computed iteratively using Newton's method to solve the Lagrange multiplier equations arising from minimizing the distance to the constraint set.

For the first experiment, setting the dimension N = 1024 and the sparsity level K = 16, we generated 500 K-sparse signals with support selected uniformly at random in {1, ..., N}. The non-zero elements were drawn from a standard Gaussian distribution N(0, 1). For each sparse signal, m quantized measurements were recorded as in
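The Douglas-Rachford recursion above can be sketched compactly. The projection P_{T_p(ε)} for general p requires the Newton scheme just described; for a self-contained, runnable illustration we specialize to p = 2 and to a Φ with orthonormal rows, where the projection has a closed form. This is a simplification of the paper's solver, not a reproduction of it:

```python
import numpy as np

def soft(u, gamma):
    """Componentwise soft-thresholding S_gamma (prox of gamma * ||.||_1)."""
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

def project(u, Phi, y, eps):
    """P_{T_2(eps)} assuming Phi has orthonormal rows (Phi Phi^T = Id);
    for general p the paper solves the Lagrange-multiplier equations by Newton's method."""
    r = y - Phi @ u
    nr = np.linalg.norm(r)
    if nr <= eps:
        return u
    return u + Phi.T @ ((1.0 - eps / nr) * r)

def bpdq_dr(Phi, y, eps, gamma=0.05, alpha_t=1.0, iters=2000):
    """u <- (1 - a/2) u + (a/2)(2 S_gamma - Id)(2 P - Id)(u); the solution is P(u*)."""
    u = np.zeros(Phi.shape[1])
    for _ in range(iters):
        v = 2.0 * project(u, Phi, y, eps) - u
        u = (1.0 - alpha_t / 2.0) * u + (alpha_t / 2.0) * (2.0 * soft(v, gamma) - v)
    return project(u, Phi, y, eps)

# tiny demo: sparse recovery from (nearly) noiseless measurements
rng = np.random.default_rng(2)
m, N, K = 64, 128, 5
Phi = np.linalg.qr(rng.standard_normal((N, m)))[0].T   # orthonormal rows
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x
xhat = bpdq_dr(Phi, y, eps=1e-6)
print(np.linalg.norm(x - xhat))   # reconstruction error
```

The final projection guarantees the returned point is feasible, matching the statement that P_{T_p(ε)}(u*), not u* itself, solves BPDQ_p.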

Fig. 2: Reconstruction from quantized undersampled Fourier measurements. (a) Original; details from (b) BPDN (SNR = 2.96 dB); and (c) BPDQ_10 (SNR = 3.93 dB).
model (1) with an SGR matrix Φ ∈ R^{m×N}. The bin width was set to α = ‖Φx‖_∞/40. In Figure 1, we plot the average reconstruction quality of BPDQ_p for various p ≥ 2 and m/K ∈ [10, 40], using the quality metric SNR(x̂; x) = 20 log₁₀(‖x‖_2 / ‖x − x̂‖_2), where x is the true original signal and x̂ the reconstruction. The different decoders become dominant at oversampling factors m/K that increase with p. This confirms that the noise error can be reduced when both p and m/K are high.

In the second experiment, we applied our methods to a model undersampled MRI reconstruction problem. Using an example similar to [15], the original signal is a 256 × 256 pixel "simulated angiogram" comprising 10 randomly placed ellipses. The linear measurements are the real and imaginary components of one sixth of the Fourier coefficients at randomly selected locations in Fourier space, giving m = 256²/6 independent measurements. These are quantized with a bin width α giving at most 12 quantization levels per measurement. We use the Haar wavelet transform as the sparsity basis. The measurement matrix is then Φ = FΨ, where Ψ is the Haar matrix and F is formed by the randomly selected rows of the Discrete Fourier Transform matrix. The original image has K = 821 non-zero wavelet coefficients, giving an oversampling ratio m/K = 13.3. In Figure 2, we show 100 × 100 pixel details of the results of reconstruction with BPDN and with BPDQ_p for p = 10. Note that we have no proof that this sensing matrix Φ satisfies the RIP_p (3); we nonetheless obtain results similar to the previous 1-D example. The BPDQ reconstruction shows improvements in both SNR and visual quality compared to BPDN.

6. CONCLUSION

The objective of this paper was to show that the BPDN reconstruction program commonly used in Compressed Sensing with noisy measurements is not always adapted to quantization distortion.
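The quality metric used in both experiments is straightforward to implement; a quick sketch (the function name is ours):

```python
import numpy as np

def snr_db(xhat, x):
    """SNR(xhat; x) = 20 log10( ||x||_2 / ||x - xhat||_2 )."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - xhat))

x = np.array([1.0, 0.0, -2.0])
print(snr_db(x + 0.01, x))   # ~= 42.2 dB for this small perturbation
```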
We introduced a new class of decoders, the Basis Pursuit DeQuantizers, and we have shown both theoretically and experimentally that BPDQ_p exhibits a substantial

reduction of the approximation error in oversampled situations. An interesting perspective is to characterize the evolution of the optimal moment p with the oversampling ratio. This could allow the selection of the best BPDQ decoder as a function of the precise CS coding/decoding scenario.

7. REFERENCES

[1] E.J. Candès and J. Romberg, "Quantitative Robust Uncertainty Principles and Optimally Sparse Decompositions," Found. Comp. Math., 6(2):227–254, 2006.
[2] H. Rauhut, K. Schnass, and P. Vandergheynst, "Compressed sensing and redundant dictionaries," IEEE T. Inform. Theory, 54(5):2210–2219, 2007.
[3] E. Candès, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
[4] A. Cohen, R. DeVore, and W. Dahmen, "Compressed sensing and best k-term approximation," J. Amer. Math. Soc., 22:211–231, 2009.
[5] E. Candès, "The restricted isometry property and its implications for compressed sensing," Compte Rendus de l'Academie des Sciences, Paris, Série I, 346:589–592, 2008.
[6] P. Boufounos and R.G. Baraniuk, "1-bit compressive sensing," in 42nd Conf. Inf. Sc. Systems (CISS), Princeton, NJ, March 2008, 19–21.
[7] W. Dai, H. Vinh Pham, and O. Milenkovic, "Distortion-rate functions for quantized compressive sensing," preprint, arXiv:0901.0749, 2009.
[8] N.T. Thao and M. Vetterli, "Deterministic analysis of oversampled A/D conversion and decoding improvement based on consistent estimates," IEEE T. Sig. Proc., 42(3):519–531, 1994.
[9] P. Weiss, L. Blanc-Féraud, T. André, and M. Antonini, "Compression artifacts reduction using variational methods: Algorithms and experimental study," IEEE ICASSP 2008, 1173–1176.
[10] M.K. Varanasi and B. Aazhang, "Parametric generalized Gaussian density estimation," J. Acoust. Soc. Amer., 86:1404–1415, 1989.
[11] L. Jacques, D.K. Hammond, and M.J. Fadili, "Dequantizing compressed sensing," Tech. Rep., 2009, http://www.tele.ucl.ac.be/~jacques/preprint/TR-LJ-2009.01.pdf.
[12] D. François, V. Wertz, and M. Verleysen, "The Concentration of Fractional Distances," IEEE T. Know. Data Eng., 873–886, 2007.
[13] W.L. Bynum, "Weak parallelogram laws for Banach spaces," Canad. Math. Bull., 19(3):269–275, 1976.
[14] P.L. Combettes, "Solving monotone inclusions via compositions of nonexpansive averaged operators," Optimization, 53:475–504, 2004.
[15] M. Lustig, D. Donoho, and J.M. Pauly, "Sparse MRI: The Application of Compressed Sensing for Rapid MR Imaging," Magnetic Resonance in Medicine, 58:1182–1195, 2007.