BAYESIAN COMPRESSED SENSING USING ... - Google Sites

1 downloads 216 Views 195KB Size Report
weight to small components encourages sparse solutions. The CS reconstruction ... knowledge about the signal. ... MERIDI
BAYESIAN COMPRESSED SENSING USING GENERALIZED CAUCHY PRIORS Rafael E. Carrillo, Tuncer C. Aysal and Kenneth E. Barner Department of Electrical and Computer Engineering University of Delaware uses the unconstrained convex program

ABSTRACT Compressed sensing shows that a sparse or compressible signal can be reconstructed from a few incoherent measurements. Noting that sparse signals can be well modeled by algebraic-tailed impulsive distributions, in this paper, we formulate the sparse recovery problem in a Bayesian framework using algebraic-tailed priors from the generalized Cauchy distribution (GCD) family for the signal coefficients. We develop an iterative reconstruction algorithm from this Bayesian formulation. Simulation results show that the proposed method requires fewer samples than most existing reconstruction methods to recover sparse signals, thereby validating the use of GCD priors for the sparse reconstruction problem. Index Termsβ€” Compressed sensing, Bayesian methods, signal reconstruction, nonlinear estimation, impulse noise 1. INTRODUCTION Compressed sensing (CS) is a novel framework that goes against the traditional data acquisition paradigm. CS demonstrates that a sparse, or compressible, signal can be acquired using a low rate acquisition process that projects the signal onto a small set of vectors incoherent with the sparsity basis [1]. Let π‘₯ ∈ ℝ𝑛 be an sparse signal, and 𝑦 = Ξ¦π‘₯ a set of measurements with Ξ¦ an π‘š Γ— 𝑛 sensing matrix (π‘š < 𝑛). The optimal algorithm to recover π‘₯ from the measurements is min βˆ₯π‘₯βˆ₯0 subject to Ξ¦π‘₯ = 𝑦 π‘₯

(1)

(optimal in the sense that finds the sparsest vector π‘₯ such that is consistent with the measurements). Since noise is always present in real data acquisition systems, the acquisition system can be modeled as 𝑦 = Ξ¦π‘₯ + π‘Ÿ

(2)

where π‘Ÿ represents the sampling noise. The problem in (1) is combinatorial and NP-hard; however, a range of different algorithms have been developed that enable approximate reconstruction of sparse signals from noisy compressive measurements [1, 2, 3]. The most common approach is to use Basis Pursuit Denoising (BPD) [1], which

min βˆ₯𝑦 βˆ’ Ξ¦π‘₯βˆ₯22 + πœ†βˆ₯π‘₯βˆ₯1 , π‘₯

(3)

to estimate a solution of the problem. A family of iterative greedy algorithms ([2] and references therein) are shown to enjoy a similar approximate reconstruction property, generally with less computational complexity. However, these algorithms require more measurements for exact reconstruction than the 𝐿1 minimization approach. Recent works show that nonconvex optimization problems can recover a sparse signal with fewer measurements than current geometric methods, while preserving the same reconstruction quality [4, 5]. In [4], the authors replace the 𝐿1 norm in BPD with the 𝐿𝑝 norms, for 0 < 𝑝 < 1, to approximate the 𝐿0 norm and encourage sparsity in the solution. Cand`es et. al use a re-weighted 𝐿1 minimization approach to find a sparse solution in [5]. The idea is that giving a large weight to small components encourages sparse solutions. The CS reconstruction problem can also be formulated in a Bayesian framework (see [6] and references therein), where the coefficients of π‘₯ are modeled with Laplacian priors and a solution is iteratively constructed. The basic premise in CS is that a small set of coefficients in the signal have larger value than the rest of the coefficients (ideally zero), yielding a very impulsive characterization. Algebraic-tailed distributions put more mass in very high amplitude values and also in β€œzerolike” small values, and are therefore more suitable models for sparse coefficients of compressible signals. In this paper, we formulate the CS recovery problem in a Bayesian framework using algebraic-tailed priors from the generalized Cauchy distribution (GCD) family for the signal coefficients. An iterative reconstruction algorithm is developed from this Bayesian formulation. Simulation results show that GCD priors are a good model for sparse representations. Numerical results also show that the proposed method requires fewer samples than most existing recovery strategies to perform the reconstruction. 2. BAYESIAN MODELING AND GENERALIZED CAUCHY DISTRIBUTION In Bayesian modeling, all unknowns are treated as stochastic quantities with assigned probability distributions. Consider

the observation model in (2). The unknown signal π‘₯ is modeled by a prior distribution 𝑝(π‘₯), which represents the a priori knowledge about the signal. The observation 𝑦 is modeled by the likelihood function 𝑝(π‘¦βˆ£π‘₯). Modeling the sampling noise as white Gaussian noise and using a Laplacian prior for π‘₯, the maximum a posteriori (MAP) estimate of π‘₯ is equivalent to find the solution of (3) [6]. The generalized Cauchy distribution (GCD) family has algebraic tails which makes it suitable to model many impulsive processes in real life (see [7, 8] and references therein). The PDF of the GCD is given by 2

𝑓 (𝑧) = π‘Žπ›Ώ(𝛿 𝑝 + βˆ£π‘§βˆ£π‘ )βˆ’ 𝑝

Since 𝑝(π‘₯βˆ£π‘¦; 𝜎, 𝛿) ∝ 𝑝(π‘¦βˆ£π‘₯; 𝜎)𝑝(π‘₯βˆ£π›Ώ), the MAP estimate, assuming 𝜎 and 𝛿 known, is 1 π‘₯ Λ† = arg min βˆ₯𝑦 βˆ’ Ξ¦π‘₯βˆ₯22 + πœ†βˆ₯π‘₯βˆ₯𝐿𝐿1 ,𝛿 π‘₯ 2

(8)

where πœ† = 2𝜎 2 . One remark to make is that the 𝐿𝐿1 norm has been previously used to approximate the 𝐿0 norm but without making a statistical connection to the signal model. The re-weighted 𝐿1 approach proposed in [5] is equivalent to finding a solution for the first order approximation of the problem in (8) using a decreasing sequence for 𝛿.

(4) 4. FIXED POINT ALGORITHM

with π‘Ž = 𝑝Γ(2/𝑝)/2(Ξ“(1/𝑝))2. In this representation, 𝛿 is the scale parameter and 𝑝 is the tail constant. The GCD family contains the meridian [7] and Cauchy distributions as special cases with 𝑝 = 1 and 𝑝 = 2, respectively. For 𝑝 < 2, the tail of the PDF decays slower than in the Cauchy distribution, resulting in a heavier-tailed PDF. Similar to 𝐿𝑝 norms derived from the generalized Gaussian density (GGD) family, a family of robust metrics are derived from the GCD family [3].

In this paper, instead of directly minimizing (8), we develop a fixed point search to find a sparse solution. The fixed point algorithm is based on first order optimality conditions and is inspired in the robust statistics literature [9]. Let π‘₯βˆ— be a stationary point of (8), then the first order optimality condition is

Definition 1 For 𝑒 ∈ β„π‘š , the 𝐿𝐿𝑝 norm of 𝑒 is defined as

Noting that the gradient βˆ‡βˆ₯π‘₯βˆ— βˆ₯𝐿𝐿1 ,𝛿 , can be expressed as

βˆ₯𝑒βˆ₯𝐿𝐿𝑝,𝛿 =

π‘š βˆ‘

log{1 + 𝛿

βˆ’π‘

𝑝

βˆ£π‘’π‘– ∣ }, 𝛿 > 0.

Φ𝑇 Ξ¦π‘₯βˆ— βˆ’ Φ𝑇 𝑦 + πœ†βˆ‡βˆ₯π‘₯βˆ— βˆ₯𝐿𝐿1 ,𝛿 = 0.

(9)

βˆ‡βˆ₯π‘₯βˆ— βˆ₯𝐿𝐿1 ,𝛿 = π‘Š (π‘₯βˆ— )π‘₯βˆ— , (5)

𝑖=1

The 𝐿𝐿𝑝 norm (quasi-norm) doesn’t over penalize large deviations, and is therefore a robust metric appropriate for impulsive environments [3].

(10)

where π‘Š (π‘₯) is a diagonal matrix with diagonal elements given by π‘Šπ‘–π‘– (π‘₯) = [(𝛿 + ∣π‘₯𝑖 ∣)∣π‘₯𝑖 ∣]βˆ’1 , (11) the first order optimality condition, (9), is equivalent to Φ𝑇 Ξ¦π‘₯βˆ— βˆ’ Φ𝑇 𝑦 + πœ†π‘Š (π‘₯βˆ— )π‘₯βˆ— = 0.

3. BAYESIAN COMPRESSED SENSING WITH MERIDIAN PRIORS

Solving for π‘₯βˆ— we find the fixed point function

Of interest here is the development of a sparse reconstruction strategy using a Bayesian framework. To encourage sparsity in the solution, we propose the use of meridian priors for the signal model. The meridian distribution possesses heavier tails than the Laplacian distribution, thus yielding more impulsive (sparser) signal models and intuitively lowering the number of samples to perform the reconstruction. We model the sampling noise as independent, zero mean, Gaussian distributed samples with variance 𝜎 2 . Using the observation model in (2) the likelihood function becomes 𝑝(π‘¦βˆ£π‘₯; 𝜎) = 𝒩 (Ξ¦π‘₯, Ξ£), Ξ£ = 𝜎 2 𝐼.

(12)

(6)

Assuming the signal π‘₯ (or coefficients in a sparse basis) are independent meridian distributed samples yields the following prior 𝑛 𝛿𝑛 ∏ 1 𝑝(π‘₯βˆ£π›Ώ) = 𝑛 . (7) 2 𝑖=1 (𝛿 + ∣π‘₯𝑖 ∣)2

π‘₯βˆ— = [Φ𝑇 Ξ¦ + πœ†π‘Š (π‘₯βˆ— )]βˆ’1 Φ𝑇 𝑦 =π‘Š

βˆ’1

𝑇

(π‘₯ )Ξ¦ [Ξ¦π‘Š βˆ—

βˆ’1

(13) 𝑇

(π‘₯ )Ξ¦ + πœ†πΌ] βˆ—

βˆ’1

𝑦.

The fixed point search uses the solution at previous iteration as input to update the solution. The estimate at iteration time 𝑑 + 1 is given by π‘₯ ˆ𝑑+1 = π‘Š βˆ’1 (Λ† π‘₯𝑑 )Φ𝑇 [Ξ¦π‘Š βˆ’1 (Λ† π‘₯𝑑 )Φ𝑇 + πœ†πΌ]βˆ’1 𝑦.

(14)

The fixed point algorithm turns out to be a reweighted least squares recursion, which iteratively finds a solution and updates the weight matrix using (11). As in other robust regression problems, the estimate in (8) is scale dependent (𝛿 in the meridian prior formulation). In fact, 𝛿 controls the sparsity of the solution and in the limiting case when 𝛿 β†’ 0 the solution of (8) is equivalent to the 𝐿0 norm solution [5]. To address this problem we propose to jointly estimate 𝛿 and π‘₯ at each iteration similar to joint scalelocation estimates [8, 9].

A fast way to estimate 𝛿 from π‘₯ is using order statistics (although more elaborate estimates can be used as in [8]). Let 𝑋 be a meridian distributed random variable with zero location and scale parameter 𝛿 and denote the π‘Ÿ-th quartile of 𝑋 as 𝑄(π‘Ÿ) . The interquartile distance is 𝑄(3) βˆ’ 𝑄(1) = 2𝛿, thus, a fast estimate of 𝛿 is half the interquartile distance of the samples π‘₯. Let 𝑄𝑑(π‘Ÿ) denote the π‘Ÿ-th quartile of the estimate π‘₯ˆ𝑑 at time 𝑑, then the estimate of 𝛿 at iteration time 𝑑 is given by 𝛿ˆ𝑑 = 0.5(𝑄𝑑(3) βˆ’ 𝑄𝑑(1) ).

(15)

To summarize, the final algorithm is depicted in Algorithm 1, where 𝐽 is the maximum number of iterations and 𝛾 is a tolerance parameter for the error between subsequent solutions. To prevent numerical instabilities we pre-define a minimum value for 𝛿 denoted as π›Ώπ‘šπ‘–π‘› . We start the recursion with the LS solution (π‘Š = 𝐼) and we also assume a known noise variance, 𝜎 2 (recall πœ† = 2𝜎 2 ). The resulting algorithm is coined meridian Bayesian compressed sensing (MBCS). Algorithm 1 MBCS Require: πœ†, π›Ώπ‘šπ‘–π‘› , 𝛾 and 𝐽. 1: Initialize 𝑑 = 0 and π‘₯ Λ†0 = Φ𝑇 (ΦΦ𝑇 + πœ†πΌ)βˆ’1 𝑦. 2: while βˆ₯Λ† π‘₯𝑑 βˆ’ π‘₯ Λ†π‘‘βˆ’1 βˆ₯2 > 𝛾 or 𝑑 < 𝐽 do 3: Update 𝛿ˆ𝑑 and π‘Š . 4: Compute π‘₯ ˆ𝑑+1 as in equation (14). 5: 𝑑←𝑑+1 6: end while 7: return π‘₯ Λ† As mentioned in the last section the reweighted 𝐿1 approach of [5] and MBCS minimize the same objective. Moreover, the reweighted 𝐿1 may require fewer iterations to converge, but the computational cost of one iteration of MBCS is substantially lower than the computational cost of an iteration of reweighted 𝐿1 , thereby resulting in a faster algorithm. 5. EXPERIMENTAL RESULTS In this section we present numerical experiments that illustrate the effectiveness of MBCS for sparse signal reconstruction. For all experiments we use random Gaussian measurements matrices with normalized columns and π›Ώπ‘šπ‘–π‘› = 10βˆ’8 in the algorithm. The first experiment shows the validity of the joint estimation approach of MBCS. Meridian distributed signals with length 𝑛 = 1000 and 𝛿 ∈ {10βˆ’3 , 10βˆ’2 , 10βˆ’1 } are used. The signals are sampled taking π‘š = 200 measurements and zero mean Gaussian distributed sampling noise with variance 𝜎 2 = 10βˆ’2 is added. Table 1 shows the average reconstruction SNR (R-SNR) for 200 repetitions. The performance loss is of 6 dB approximately in the worst case, but fully automated MBCS still yields a good reconstruction.

Table 1. Comparison of reconstruction quality between known 𝛿 and estimated 𝛿 MBCS. Meridian distributed signals, 𝑛 = 1000, π‘š = 200. R-SNR (dB). 𝛿 = 10βˆ’3 𝛿 = 10βˆ’2 𝛿 = 10βˆ’1 Known 𝛿 9.91 21.5 30.69 Estimated 𝛿 8.16 17.58 24.98 The next set of experiments compare MBCS with current reconstruction strategies for noiseless samples and noisy samples. The algorithms used for comparison are 𝐿1 minimization [1], re-weighted 𝐿1 minimization [5], RWLS to approach 𝐿𝑝 [4], and CoSaMP [2]. We use π‘˜-sparse signals (π‘˜ nonzero coefficients) of length 𝑛 = 1000, in which the amplitudes of the nonzero coefficients are Gaussian distributed with zero mean and standard deviation 𝜎π‘₯ = 10. Each experiment is averaged over 200 repetitions. The first experiment compares MBCS in a noiseless setting for different sparsity levels, fixing π‘š = 200. We use the probability of exact reconstruction as a measure of performance, where a reconstruction is considered exact if βˆ₯Λ† π‘₯βˆ’ π‘₯βˆ₯∞ ≀ 10βˆ’4 . The results are shown in Fig. 1 (Middle). Results show that MBCS outperforms CoSaMP and 𝐿1 minimization (giving larger probability of success for lager values of π‘˜) and yielding a slightly better performance than 𝐿𝑝 minimization. It is of notice that MBCS has similar performance to reweighted 𝐿1 , since they are minimizing the same objective, but with a different approach. The second experiment compares MBCS in the noisy case varying the number of samples (π‘š) and fixing π‘˜ = 10. The sampling noise is Gaussian distributed with variance 𝜎 2 = 10βˆ’2 . The R-SNR is used as the performance metric. Results are presented in Fig. 1 (Right). In the noisy case MBCS outperforms all other reconstruction strategies, yielding a larger R-SNR for fewer samples with a good approximation for 60 samples and above. Moreover, the R-SNR of MBCS is better than reweighted 𝐿1 minimization. An explanation for this that 𝐿1 minimization methods suffer from bias problems needing a de-biasing step after the solution is found (see [3] and references therein), therefore to achieve a similar performance an additional step is needed for reweighted 𝐿1 . As a practical experiment, we present an example utilizing a 256 Γ— 256 image. We use a Daubechies db8 wavelets as sparse basis and the number of measurements, π‘š, is set to 256 Γ— 256/4 (25% of the number of pixels of the original image). Fig. 1 (Right) shows a zoom of the normalized histogram of the coefficients along with a plot of meridian and Laplacian distributions. It can be noticed that the meridian is a better fit for the tails of the coefficient distribution. Fig. 2 (Left) shows the original image, Fig. 2 (Middle) the reconstructed image by 𝐿1 minimization, and Fig. 2 (Right) the reconstructed image by MBCS. The reconstruction SNR is 15.2 dB and 19.3 dB for 𝐿1 minimization and MBCS, respectively. This example shows the effectiveness of MBCS to model and recover sparse representations of real signals.

L1 minimization

rwβˆ’L

1

rwlsβˆ’L

p

0.6

CoSaMP MBCS

0.4

0.2

0 30

30

rwβˆ’L1

20

CoSaMP MBCS

10 0

70

βˆ’20 20

0.6

0.4

0.2

βˆ’10

40 50 60 Number of nonzero elements, k

Meridian Laplacian

0.8 Normalized histogram

1

0.8

1

40

L minimization Reconstruction SNR, dB

Probability of exact reconstruction

1

40

60 80 Number of samples, m

100

0 0

10

20

30 40 Value

50

60

70

Fig. 1. MBCS experiments. L: π‘˜-sparse (noiseless case), M: π‘˜-sparse (noisy measurements), R: Wavelet coefficient histogram.

Fig. 2. Image reconstruction example. L: Original image, M: 𝐿1 reconstruction, R: MBCS reconstruction. 6. CONCLUSIONS In this paper, we formulate the CS recovery problem in a Bayesian framework using algebraic-tailed priors from the GCD family for the signal coefficients. An iterative reconstruction algorithm, referred to as MBCS, is developed from this Bayesian formulation. Simulation results show that the proposed method requires fewer samples than most existing reconstruction algorithms for compressed sensing, thereby validating the use of GCD priors for sparse reconstruction problems. Methods to estimate the sampling noise variance are still an open problem. A future research direction is to explore the use of GCD priors with 𝑝 different from 1 to give more flexibility in the sparsity model.

sampling and reconstruction methods for sparse signals in the presence of impulsive noise,” IEEE Journal of Selected Topics in Signal Processing, accepted for publication. [4] R. Chartrand and V. Staneva, β€œRestricted isometry properties and nonconvex compressive sensing,” Inverse Problems, vol. 24, no. 3, 2008. [5] E. J. Cand`es, M. Wakin, and S. Boyd, β€œEnhacing sparsity by reweighted β„“1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, Dec. 2008, pecial issue on sparsity.

7. REFERENCES

[6] S. D. Babacan, R. Molina, and A. K. Katsaggelos, β€œBayesian compressive sensing using laplace priors,” IEEE Transactions on Image Processing, 2009, accepted for publication.

[1] E. J. Cand`es and M. B Wakin, β€œAn introduction to compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, Mar. 2008.

[7] T. C. Aysal and K. E. Barner, β€œMeridian filtering for robust signal processing,” IEEE Transactions on Signal Processing, vol. 55, no. 8, pp. 3949–3962, Aug. 2007.

[2] D. Needell and J. A. Tropp, β€œCosamp: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, Apr. 2008.

[8] R. E. Carrillo, T. C. Aysal, and K. E. Barner, β€œGeneralized cauchy distribution based robust estimation,” in Proceedings, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Apr. 2008.

[3] R. E. Carrillo, K. E. Barner, and T. C. Aysal, β€œRobust

[9] Huber, Robust Statistics, John Wiley & Sons, Inc., 1981.

Suggest Documents